
ImageLoadingEstimator for TensorFlow scoring should allow in-memory image streams as input in addition to images from files on drive #2121

Open
CESARDELATORRE opened this Issue Jan 11, 2019 · 3 comments

@CESARDELATORRE
Contributor

CESARDELATORRE commented Jan 11, 2019

Right now, the only way for ML.NET to load images is via ImageLoadingEstimator, which can read them only from files on disk (as confirmed by @yaeldekel and Pete a few weeks ago).

However, there is a very common scenario in applications such as web apps where users submit images over HTTP; in that case the DataView/pipeline should load in-memory image streams (Bitmap, byte[], or Image) rather than images from files in a folder on disk.

That is the right approach for many web-app and service (Web API) scenarios. It is possible today, for instance, when using TensorFlowSharp in C#, but not in ML.NET.

When implementing this feature improvement in ML.NET, there are two possible approaches:

  • Modify schema comprehension to be able to map Bitmap fields/properties to Image columns of a data view.

  • Add another version of ImageLoading transformer that loads/decodes the image from a byte vector, rather than from a disk file identified by path.
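To make the second approach concrete, here is a minimal sketch (plain .NET, no ML.NET dependency) of producing the byte vector such a transformer would decode. The helper name `ReadAllBytes` is mine for illustration, not an existing API:

```csharp
using System.IO;

public static class ImageBytes
{
    // Copy an arbitrary (possibly non-seekable) stream into a byte vector.
    // This byte[] is the representation a byte-vector-based ImageLoading
    // transformer would decode, instead of a file path.
    public static byte[] ReadAllBytes(Stream imageStream)
    {
        using (var buffer = new MemoryStream())
        {
            imageStream.CopyTo(buffer);
            return buffer.ToArray();
        }
    }
}
```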

In any case, this is an important scenario to support: being able to load images only from files, and not from in-memory streams, is a significant performance handicap for online scenarios like the ones mentioned.

With the current implementation in ML.NET, the only workaround is to save the incoming in-memory image from HTTP into a temporary file on disk and load it from there. That is a very coarse workaround, and not performant at all for a real application in production.
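For reference, the temp-file workaround boils down to something like the following minimal sketch using only System.IO (the class and method names here are illustrative, not from the sample app):

```csharp
using System;
using System.IO;

public static class TempImageWorkaround
{
    // WORKAROUND: persist the in-memory image to a temporary file so that
    // ImageLoadingEstimator can read it back from disk by path.
    public static string SaveToTempFile(byte[] imageBytes)
    {
        string imageFilePath = Path.Combine(
            Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".jpg");
        File.WriteAllBytes(imageFilePath, imageBytes);
        return imageFilePath; // this path is what the pipeline consumes today
    }
}
```

The extra disk round-trip per request is exactly the cost this issue proposes to eliminate.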

The following is a sample app I created for this online scenario, where the user uploads an image from the browser to a service (Web API), which ultimately receives it as an in-memory image stream.

SEE CODE HERE:

https://github.com/CESARDELATORRE/TensorFlowImageClassificationWebAPI

[Screenshot: web form for uploading an image]

  • That web form uploads the image over HTTP to a service (Web API) on the server side. At that point, the image is an in-memory image stream.

  • In this implementation the sample app works because of a workaround: the submitted image is temporarily stored as a file, then loaded from that file into the DataView through the pipeline.

Basically, when the C# method in the Web API receives the image as an in-memory stream, it should be able to load it directly into the DataView. The following code is an example:

    // Controller method from the Web API
    [HttpPost]
    [ProducesResponseType(200)]
    [ProducesResponseType(400)]
    [Route("classifyimage")]
    public async Task<IActionResult> ClassifyImage(IFormFile imageFile)
    {
        if (imageFile.Length == 0)
            return BadRequest();

        // WORKAROUND: save the in-memory image into a file
        // in the temp folder, then work with its file path
        string fileName = await _imageWriter.UploadImageAsync(imageFile, _imagesTmpFolder);
        string imageFilePath = Path.Combine(_imagesTmpFolder, fileName);

        // Use the image's file path as the workaround...
        // Rest of the implementation with the ML.NET API for scoring the TensorFlow model...
        // ...
    }
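For contrast, the desired direct in-memory flow could look roughly like the sketch below. This is illustrative pseudocode in C# syntax: `LoadImagesFromBytes` and `InMemoryImageData` are hypothetical names for the byte-vector approach proposed above, not existing ML.NET APIs.

```csharp
// Hypothetical input schema: the image arrives as an encoded byte vector
// (e.g. the JPEG/PNG bytes read from the HTTP request), not as a file path.
public class InMemoryImageData
{
    public byte[] ImageBytes;
}

// Hypothetical pipeline: a byte-vector overload of image loading would
// decode the image directly from memory instead of reading it from disk.
var pipeline = mlContext.Transforms
    .LoadImagesFromBytes("Image", nameof(InMemoryImageData.ImageBytes)) // does not exist today
    .Append(mlContext.Transforms.ResizeImages("Image", imageWidth, imageHeight));
```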

To sum up:

I believe it is a must for ML.NET to be able to load in-memory image streams into the DataView for scoring TensorFlow models (in addition to loading from files), given how common the online, in-memory scenarios described above are.

@CESARDELATORRE CESARDELATORRE changed the title TensorFlow scoring should allow in-memory image streams as input in addition to images from files on drive ImageLoadingEstimator for TensorFlow scoring should allow in-memory image streams as input in addition to images from files on drive Jan 11, 2019

@zeahmed

Member

zeahmed commented Jan 11, 2019

I think #1609 is a duplicate of this. Can we close #1609?

@zeahmed zeahmed added the enhancement label Jan 11, 2019

@CESARDELATORRE

Contributor

CESARDELATORRE commented Jan 11, 2019

Let's close issue #1609 once this new issue is added to the 0.10 backlog, ok?

I agree that #1609 is related, but this issue (#2121) provides further information about the impacted scenarios.

@mareklinka


mareklinka commented Jan 14, 2019

The project I'm currently working on is looking to use ML.NET in a scenario very similar to the one outlined in this issue. And while in our case performance is not that much of a factor, having the capability to score in-memory images would be a great simplification.

+1
