tc.image_classifier.create should have a way to use extracted features #3126

NeerajKomuravalli · 2020-04-14T10:04:12Z

The model building with turi create takes most of its time building features and if someone wants to retrain the model with the same data or add just some amount of data it goes through the process of extracting features again and this is a waste of time. Turi Create should have a method of extracting the features and saving the same in sFrame data format and using the same to create the model with either the extracted features or the image data itself.
I could not really find a method which addresses this anywhere in the code or documentation and I would really think this a great feature to have. Please let me know if this is already there and I am not able to find it and if it's not there I would not mind working on it and send a merge request.

TobyRoseman · 2020-04-14T17:42:36Z

This is a great idea. It's something we have talked about doing before, see #1916.

The main reason we haven't done it already is that we're not sure where to put this new function. The same deep features are used by both the image classifier and the image similarity toolkit.

So do we add the get_deep_features method under tc.image_classifier.? Or add it under tc.image_similarity.? We could add it in both places but I think that could be confusing.

My current thinking is that the best places to add it might be under tc.image_analysis..

If you have any opinions about where to add it, please let me know.

NeerajKomuravalli · 2020-04-14T17:49:30Z

I think using the same piece of code at two places will be both confusing and redundant so yes that might be a bad idea but why not create a new module where we can just extract features and use the same in both places. In addition we can make the tc.image_classifier.create accept data in both formats, sframe with image_path or sframe with extracted features and will have an option of saving features after model creation.

What are your thoughts?

TobyRoseman · 2020-04-15T00:16:49Z

I agree we'll need to modify the tc.image_classifier.create (and tc.image_similarity.create) to accept deep features rather than just images.

I think adding a new module (say we call it deep_features) would be a little strange since there would only be one thing inside that of that module. If we add:
tc.deep_features.get_deep_features(image_sarray, model_name).
There isn't going to be anything else under tc.deep_features.

Another option would be to add a new module and make each model have its own function, i.e. remove the model_name parameter. Then we would have three method:
tc.deep_features.Resnet(image_sarray)
tc.deep_features.Squeezenet(image_sarray)
tc.deep_features.VisionFeaturePrint_Scene(image_sarray)
This might be a cleaner solution.

Let me know which of these two options you had in mind. Or if I misunderstood and you had something else in mind.

NeerajKomuravalli · 2020-04-15T08:17:14Z

I like the later one better and yes It will be relatively cleaner and we can definitely do it.
I don't mind working on it and sending a PR but is there anything in specific I need to keep in mind before working on it?
Or let me know if you have someone else in your mind who will be working on it.

NeerajKomuravalli · 2020-04-17T06:39:00Z

any comments?

TobyRoseman · 2020-04-18T03:54:54Z

@NeerajKomuravalli - Sorry for the delay. I wanted to discuss this matter with my manager before responding. We think having one function under tc.image_analysis would be best.

I think it should look like tc.image_analysis.get_deep_features(images, model_name)
Where images is an SArray of images and model_name is a string.

Below is a list of things that will should to be done. There is likely more needed but this should be most of it.

Get Deep Features

Create tc.image_analysis.get_deep_features and write the docstring
Parameter checking for tc.image_analysis.get_deep_features. Make sure images is a non-empty SArray of images. Make sure model_name is a string with one of the three allowed values (note “VisionFeaturePrint_Scene” is only allowed on certain macOS version -- see existing code). If any of the parameter aren't right the user should get an informative error message.
Implement tc.image_analysis.get_deep_features. You should be able to just use the existing code from either the image classifier or image similarity.

Image Classifier

Modify code to check if the input are deep features. We already do something similar for the sound classifier. Creating a helper function here would probably be good so you can also use it for image similarity too.
If deep features are not passed in, the code should just use tc.image_analysis.get_deep_features to get the deep features. This is so we don't have duplicated code.
Update the docstring.
Write image classifier unit tests for this new functionality.

Image Similarity

Modify code to check if the input are deep features.
If deep features are not passed in, the code should use tc.image_analysis.get_deep_features`.
Update the docstring.
Write unit tests for this new functionality.

Let me know if you have any questions. Feel free to put up a work in progress pull request at any point, if you would like feedback.

NeerajKomuravalli · 2020-04-18T11:11:14Z

@TobyRoseman Thanks for the response, that was a very detailed doc. I agree with all your point and will try to send a PR at the earliest and will let you know if I have any doubts.
Thanks!

NeerajKomuravalli · 2020-04-27T11:55:32Z

@TobyRoseman I was working on all the things we discussed and I realized, to extract features we need to call the _image_feature_extractor._create_feature_extractor.extract_features and it accepts parameters dataset and feature which is basically SFrame and feature column name respectively. So if we want to call this from tc.image_analysis.get_deep_features we need to send SFrame dataset in get_deep_features or we need to change the extract_feature function in _image_feature_extractor.
I personally feel like we should change the input parameters of tc.image_analysis.get_deep_features from SArray to SFrame. This will make sure we don't have to change a lot of the code and yet the same can be achieved by just passing dataset, feature, model_name and batch_size. And even when people want to save features they can directly save the the output as it has a direct association of images and the features (This is a side benefit and it's advantages are subjective and might not be as important).
Let me know what are you thoughts?

TobyRoseman · 2020-04-27T23:33:48Z

I think dataset should be a SArray not an SFrame. There's no reason for it to be an SFrame since we only need the image column in order to generate the deep features. It's just cleaner/simpler to have it be an SArray.

I don't think it should be much work to for it be an SArray rather than an SFrame. The current code should not need to change too much.
If dataset is now a SArray, I think you can just change:
image_sf = tc.SFrame({"image": dataset[feature]})
to:
image_sf = tc.SFrame({"image": dataset})

You're right we will need a batch_size parameter in tc.image_analysis.get_deep_features. We should also have an optional verbose flag. So let's add verbose=True and batch_size=64 to the definition of tc.image_analysis.get_deep_features.

NeerajKomuravalli · 2020-04-28T03:38:50Z

yes, you are right. I can just create a SFrame and get this done and there is no need for accepting SFrame as input.

…issing value test (apple#3126)

…#3126)

…wo support functions (apple#3126)

…e column (apple#3126)

NeerajKomuravalli · 2020-05-21T14:44:16Z

Issue turicreate:

I am sorry for the delay I have caused in working on this, I could not really get a lot of time to work on it.
I have made the required changes and I have run into some issues and I would appreciate your input input in it.
Below are the changes I made with respect to our conversation a few weeks before:

Image Analysis

Added get_deep_features function as discussed
Added support function to find if extracted feature column is present in the SFrame and if present it returns the column name of that feature otherwise raises error saying it’s not present. This function is used in image_classifier and image_similarity to find out if the feature column is of type image or extracted feature and if the feature column is not passed it will try to find the feature column with either type image or extracted_features. (This is very similar to _find_only_image_column in _internal_utils but I was not sure if I should add it there or in image_analysys but I ended up adding it in image_analysis itself)

Image classifier

Changed the create function to accept feature column in either Image format or extracted feature format (of the model in use).
If the feature column is not mentioned by the user then the algorithm first checks If there is a column with data type of extracted feature (of the model in use) and if not present it chooses the Image type column
Changed _extract_features to extract features only when the datatype of the feature column is Image and do nothing and pass the feature column as it is when the data type of the feature of the type extracted features (of the model in use)
Accept the data type array (this will be the data type of the extracted feature, if present) in functions: predict_topk, _canonize_input, predict and classify.

Image similarity

Changed the create function to accept feature column in either Image format or extracted feature format (of the model in use).
If the feature column is not mentioned by the user then the algorithm first checks If there is a column with data type of extracted feature (of the model in use) and if not present it chooses the Image type column
Changed _extract_features to extract features only when the datatype of the feature column is Image and do nothing and pass the feature column as it is when the data type of the feature of the type extracted features (of the model in use)
Accept the data type array (this will be the data type of the extracted feature, if present) in functions: query

Test image classifier and test image similarity

Changed the get_test_data function to also include columns with extracted features for the corresponding images (from “awesome_image” column)
After making the changes in create functions of both image_classification and image_similarity, they accept the feature in either Image form or array form (extracted feature column type) and the test cases are changed to include models that are trained with both types of features. 
Below are the issues I am facing in detail:

After finishing all this, I tested the same with both c++ unit test cases and python unit test cases:

It passed all the c++ test cases
It passed all the python test cases except test_export_coreml (in test_image_similarity) and test_export_coreml_predict (in test_image_classifier). It throws the error : ValueError: Cannot call 'resize' on objects that are not either an Image or SArray of Images. And as far as I know this is happening because the module predict function in coremltools accepts only Image as data type and I am not sure where to make these changes

Please let me know if the approach I have taken is correct and what do you think of it and if you want I pull put out a WIP pull request for you to check what I have done.
Please share your thoughts!

TobyRoseman · 2020-05-21T18:58:48Z

Hi @NeerajKomuravalli - Wow, this all sounds great. Yes, please put up a WIP pull request and I'll be happy to take a detailed look. I'm not sure what's happening with those exporting to CoreML Python unit test but I'm sure we can figure it out.

NeerajKomuravalli · 2020-05-21T19:12:31Z

I have put a pull request, please let me know what you think.
Thanks!

NeerajKomuravalli · 2020-05-23T13:40:44Z

Hi,

I have made the required changes, please take a look whenever you can.

Thanks!

NeerajKomuravalli · 2020-06-16T13:53:21Z

Hi,

I have made all the necessary changes and have run the test cases as well to see if everything is working as it should. All the test cases are passed successfully except:

test_export_coreml_predict in test_image_classifier.py: Here when the code comes to coreml_model.predict it runs successfully when the feature is awesome_images but fails when we use deep_features.
Error thrown : "The model expects input feature resnet-50_deep_features to be an image, but the input is of type 5."
test_export_coreml in test_image_similarity.py : Same as above.
Error thrown : "Error Domain=com.apple.CoreML Code=0 "Input image feature 'VisionFeaturePrint_Scene_deep_features' not found" UserInfo={NSLocalizedDescription=Input image feature 'VisionFeaturePrint_Scene_deep_features' not found}"

My assumption is predict method on coremltools.models.MLModel can be called only with image as input and doesn't work with deep features.

Please let me know your thoughts and what are the next steps.

* Adding features column to the data and compensating for the same in missing value test (#3126) * Adding features column to the data to test with features (#3126) * Changing import statements to make the image_analysis callable (#3126) * Adding get_deep_features module to extract features and adding more two support functions (#3126) * Adding a way to create model by passing either feature column or image column (#3126) * Adding a way to create model by passing either feature column or image column (#3126) * Fixing syntax error * Adding more elaborate error messages to also include Missing value errors * Adding image_classifier test methods to include testing for data with and without extracted_features column * Making Image feature extraction feature more robust and indipendent * Changing the _extract_features functions to first recognize the feature type and then take the decision * Doing small edits * Changing the _extract_features functions to first recognize the feature type and then take the decision * passing model name specifically in test_select_correct_feature_column_to_train test case * Adding a way to account for extracted features instead of images as inoput features in classify, predict and _canonize_input functions * Adding feature based test cases for both image as a feature and extracted features as feature * Using actual features using get_deep_features instead of randomly generating features * Adding test cases for model using both image and extracted_feature as input feature * Removing unnecessary code * Adding a way to accept input as extracted feature in quary function * Created test cases for model that is being trained with deep features * Created test cases for model that is being trained with deep features * Hiding the functions is_image_deep_feature_sarray and find_only_image_extracted_features_column * Changing the function name find_only_image_extracted_features_column to _find_only_image_extracted_features_column as chnages were made in image_analysis * Changing the generic deep feature column name in data generation to the model specific name * Simplifing the code to detect type of sfrane column by using _find_only_column_of_type instead of writing a new logic * Removing unnecessary try except block and correcting spelling mistakes * Debugging to fix _find_only_column_of_type not found error * Fixing all the errors where image_classifier was failing in test cases * Fixing all the errors where image_similarity was failing in test cases * Changing test_export_coreml_predict function to also account for deep features * Changing test_export_coreml function to also account for deep features * Ignoring the coremltools predict test case when deep_features are used * Ignoring the coremltools predict test case when deep_features are used * Adding docstring for get_deep_features * Removing unnecessary code and makeing recommended changes * Ignoring getting deep features when mac version is less than 10.14 or if its a Linux system or any other OS * Removing coremltools import line to adhere to the turicreate testing practices * Shifting deep feature building in from get_test_data to setUpClass (no tested) * Shifting deep feature building in from get_test_data to setUpClass (fully tested) * Shifting deep feature building in from get_test_data to setUpClass * Removing unnecessary comments * Changed the logic that decides when to extract deepFeatures and changed the name of the same to WithDeepFeatures * Using the feature fed to unit test classes based on their names to maintain consistency

TobyRoseman · 2020-08-31T19:05:01Z

The relevant pull request here has been merged and was included in our latest release. Closing.

TobyRoseman added enhancement image classifier image similarity images labels Apr 14, 2020

TobyRoseman assigned NeerajKomuravalli Apr 18, 2020

NeerajKomuravalli added a commit to NeerajKomuravalli/turicreate that referenced this issue May 6, 2020

Adding features column to the data and compensating for the same in m…

daa5b4a

…issing value test (apple#3126)

NeerajKomuravalli added a commit to NeerajKomuravalli/turicreate that referenced this issue May 6, 2020

Adding features column to the data to test with features (apple#3126)

63736cf

NeerajKomuravalli added a commit to NeerajKomuravalli/turicreate that referenced this issue May 6, 2020

Changing import statements to make the image_analysis callable (apple…

37d5ccd

…#3126)

NeerajKomuravalli added a commit to NeerajKomuravalli/turicreate that referenced this issue May 6, 2020

Adding get_deep_features module to extract features and adding more t…

67216b5

…wo support functions (apple#3126)

NeerajKomuravalli added a commit to NeerajKomuravalli/turicreate that referenced this issue May 6, 2020

Adding a way to create model by passing either feature column or imag…

d891038

…e column (apple#3126)

NeerajKomuravalli added a commit to NeerajKomuravalli/turicreate that referenced this issue May 6, 2020

Adding a way to create model by passing either feature column or imag…

18d5b70

…e column (apple#3126)

TobyRoseman mentioned this issue May 21, 2020

Feature extraction #3210

Merged

TobyRoseman closed this as completed Aug 31, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

tc.image_classifier.create should have a way to use extracted features #3126

tc.image_classifier.create should have a way to use extracted features #3126

NeerajKomuravalli commented Apr 14, 2020

TobyRoseman commented Apr 14, 2020

NeerajKomuravalli commented Apr 14, 2020

TobyRoseman commented Apr 15, 2020

NeerajKomuravalli commented Apr 15, 2020

NeerajKomuravalli commented Apr 17, 2020

TobyRoseman commented Apr 18, 2020

NeerajKomuravalli commented Apr 18, 2020

NeerajKomuravalli commented Apr 27, 2020

TobyRoseman commented Apr 27, 2020

NeerajKomuravalli commented Apr 28, 2020

NeerajKomuravalli commented May 21, 2020

TobyRoseman commented May 21, 2020

NeerajKomuravalli commented May 21, 2020

NeerajKomuravalli commented May 23, 2020

NeerajKomuravalli commented Jun 16, 2020

TobyRoseman commented Aug 31, 2020

tc.image_classifier.create should have a way to use extracted features #3126

tc.image_classifier.create should have a way to use extracted features #3126

Comments

NeerajKomuravalli commented Apr 14, 2020

TobyRoseman commented Apr 14, 2020

NeerajKomuravalli commented Apr 14, 2020

TobyRoseman commented Apr 15, 2020

NeerajKomuravalli commented Apr 15, 2020

NeerajKomuravalli commented Apr 17, 2020

TobyRoseman commented Apr 18, 2020

Get Deep Features

Image Classifier

Image Similarity

NeerajKomuravalli commented Apr 18, 2020

NeerajKomuravalli commented Apr 27, 2020

TobyRoseman commented Apr 27, 2020

NeerajKomuravalli commented Apr 28, 2020

NeerajKomuravalli commented May 21, 2020

TobyRoseman commented May 21, 2020

NeerajKomuravalli commented May 21, 2020

NeerajKomuravalli commented May 23, 2020

NeerajKomuravalli commented Jun 16, 2020

TobyRoseman commented Aug 31, 2020