Skip to content
This repository has been archived by the owner on Dec 21, 2023. It is now read-only.

tc.image_classifier.create should have a way to use extracted features #3126

Closed
NeerajKomuravalli opened this issue Apr 14, 2020 · 16 comments
Closed

Comments

@NeerajKomuravalli
Copy link
Contributor

The model building with turi create takes most of its time building features and if someone wants to retrain the model with the same data or add just some amount of data it goes through the process of extracting features again and this is a waste of time. Turi Create should have a method of extracting the features and saving the same in sFrame data format and using the same to create the model with either the extracted features or the image data itself.
I could not really find a method which addresses this anywhere in the code or documentation and I would really think this a great feature to have. Please let me know if this is already there and I am not able to find it and if it's not there I would not mind working on it and send a merge request.

@TobyRoseman
Copy link
Collaborator

This is a great idea. It's something we have talked about doing before, see #1916.

The main reason we haven't done it already is that we're not sure where to put this new function. The same deep features are used by both the image classifier and the image similarity toolkit.

So do we add the get_deep_features method under tc.image_classifier.? Or add it under tc.image_similarity.? We could add it in both places but I think that could be confusing.

My current thinking is that the best places to add it might be under tc.image_analysis..

If you have any opinions about where to add it, please let me know.

@NeerajKomuravalli
Copy link
Contributor Author

I think using the same piece of code at two places will be both confusing and redundant so yes that might be a bad idea but why not create a new module where we can just extract features and use the same in both places. In addition we can make the tc.image_classifier.create accept data in both formats, sframe with image_path or sframe with extracted features and will have an option of saving features after model creation.

What are your thoughts?

@TobyRoseman
Copy link
Collaborator

I agree we'll need to modify the tc.image_classifier.create (and tc.image_similarity.create) to accept deep features rather than just images.

I think adding a new module (say we call it deep_features) would be a little strange since there would only be one thing inside that of that module. If we add:
tc.deep_features.get_deep_features(image_sarray, model_name).
There isn't going to be anything else under tc.deep_features.

Another option would be to add a new module and make each model have its own function, i.e. remove the model_name parameter. Then we would have three method:
tc.deep_features.Resnet(image_sarray)
tc.deep_features.Squeezenet(image_sarray)
tc.deep_features.VisionFeaturePrint_Scene(image_sarray)
This might be a cleaner solution.

Let me know which of these two options you had in mind. Or if I misunderstood and you had something else in mind.

@NeerajKomuravalli
Copy link
Contributor Author

I like the later one better and yes It will be relatively cleaner and we can definitely do it.
I don't mind working on it and sending a PR but is there anything in specific I need to keep in mind before working on it?
Or let me know if you have someone else in your mind who will be working on it.

@NeerajKomuravalli
Copy link
Contributor Author

any comments?

@TobyRoseman
Copy link
Collaborator

@NeerajKomuravalli - Sorry for the delay. I wanted to discuss this matter with my manager before responding. We think having one function under tc.image_analysis would be best.

I think it should look like tc.image_analysis.get_deep_features(images, model_name)
Where images is an SArray of images and model_name is a string.

Below is a list of things that will should to be done. There is likely more needed but this should be most of it.

Get Deep Features

  • Create tc.image_analysis.get_deep_features and write the docstring
  • Parameter checking for tc.image_analysis.get_deep_features. Make sure images is a non-empty SArray of images. Make sure model_name is a string with one of the three allowed values (note “VisionFeaturePrint_Scene” is only allowed on certain macOS version -- see existing code). If any of the parameter aren't right the user should get an informative error message.
  • Implement tc.image_analysis.get_deep_features. You should be able to just use the existing code from either the image classifier or image similarity.

Image Classifier

  • Modify code to check if the input are deep features. We already do something similar for the sound classifier. Creating a helper function here would probably be good so you can also use it for image similarity too.
  • If deep features are not passed in, the code should just use tc.image_analysis.get_deep_features to get the deep features. This is so we don't have duplicated code.
  • Update the docstring.
  • Write image classifier unit tests for this new functionality.

Image Similarity

  • Modify code to check if the input are deep features.
  • If deep features are not passed in, the code should use tc.image_analysis.get_deep_features`.
  • Update the docstring.
  • Write unit tests for this new functionality.

Let me know if you have any questions. Feel free to put up a work in progress pull request at any point, if you would like feedback.

@NeerajKomuravalli
Copy link
Contributor Author

@TobyRoseman Thanks for the response, that was a very detailed doc. I agree with all your point and will try to send a PR at the earliest and will let you know if I have any doubts.
Thanks!

@NeerajKomuravalli
Copy link
Contributor Author

@TobyRoseman I was working on all the things we discussed and I realized, to extract features we need to call the _image_feature_extractor._create_feature_extractor.extract_features and it accepts parameters dataset and feature which is basically SFrame and feature column name respectively. So if we want to call this from tc.image_analysis.get_deep_features we need to send SFrame dataset in get_deep_features or we need to change the extract_feature function in _image_feature_extractor.
I personally feel like we should change the input parameters of tc.image_analysis.get_deep_features from SArray to SFrame. This will make sure we don't have to change a lot of the code and yet the same can be achieved by just passing dataset, feature, model_name and batch_size. And even when people want to save features they can directly save the the output as it has a direct association of images and the features (This is a side benefit and it's advantages are subjective and might not be as important).
Let me know what are you thoughts?

@TobyRoseman
Copy link
Collaborator

I think dataset should be a SArray not an SFrame. There's no reason for it to be an SFrame since we only need the image column in order to generate the deep features. It's just cleaner/simpler to have it be an SArray.

I don't think it should be much work to for it be an SArray rather than an SFrame. The current code should not need to change too much.
If dataset is now a SArray, I think you can just change:
image_sf = tc.SFrame({"image": dataset[feature]})
to:
image_sf = tc.SFrame({"image": dataset})

You're right we will need a batch_size parameter in tc.image_analysis.get_deep_features. We should also have an optional verbose flag. So let's add verbose=True and batch_size=64 to the definition of tc.image_analysis.get_deep_features.

@NeerajKomuravalli
Copy link
Contributor Author

yes, you are right. I can just create a SFrame and get this done and there is no need for accepting SFrame as input.

NeerajKomuravalli added a commit to NeerajKomuravalli/turicreate that referenced this issue May 6, 2020
NeerajKomuravalli added a commit to NeerajKomuravalli/turicreate that referenced this issue May 6, 2020
NeerajKomuravalli added a commit to NeerajKomuravalli/turicreate that referenced this issue May 6, 2020
NeerajKomuravalli added a commit to NeerajKomuravalli/turicreate that referenced this issue May 6, 2020
NeerajKomuravalli added a commit to NeerajKomuravalli/turicreate that referenced this issue May 6, 2020
NeerajKomuravalli added a commit to NeerajKomuravalli/turicreate that referenced this issue May 6, 2020
@NeerajKomuravalli
Copy link
Contributor Author

Issue turicreate:

I am sorry for the delay I have caused in working on this, I could not really get a lot of time to work on it.
I have made the required changes and I have run into some issues and I would appreciate your input input in it.
Below are the changes I made with respect to our conversation a few weeks before:

Image Analysis

  1. Added get_deep_features function as discussed
  2. Added support function to find if extracted feature column is present in the SFrame and if present it returns the column name of that feature otherwise raises error saying it’s not present. This function is used in image_classifier and image_similarity to find out if the feature column is of type image or extracted feature and if the feature column is not passed it will try to find the feature column with either type image or extracted_features. (This is very similar to _find_only_image_column in _internal_utils but I was not sure if I should add it there or in image_analysys but I ended up adding it in image_analysis itself)

Image classifier

  1. Changed the create function to accept feature column in either Image format or extracted feature format (of the model in use).
  2. If the feature column is not mentioned by the user then the algorithm first checks If there is a column with data type of extracted feature (of the model in use) and if not present it chooses the Image type column
  3. Changed _extract_features to extract features only when the datatype of the feature column is Image and do nothing and pass the feature column as it is when the data type of the feature of the type extracted features (of the model in use)
  4. Accept the data type array (this will be the data type of the extracted feature, if present) in functions: predict_topk, _canonize_input, predict and classify.

Image similarity

  1. Changed the create function to accept feature column in either Image format or extracted feature format (of the model in use).
  2. If the feature column is not mentioned by the user then the algorithm first checks If there is a column with data type of extracted feature (of the model in use) and if not present it chooses the Image type column
  3. Changed _extract_features to extract features only when the datatype of the feature column is Image and do nothing and pass the feature column as it is when the data type of the feature of the type extracted features (of the model in use)
  4. Accept the data type array (this will be the data type of the extracted feature, if present) in functions: query

Test image classifier and test image similarity

  1. Changed the get_test_data function to also include columns with extracted features for the corresponding images (from “awesome_image” column)
  2. After making the changes in create functions of both image_classification and image_similarity, they accept the feature in either Image form or array form (extracted feature column type) and the test cases are changed to include models that are trained with both types of features.

    Below are the issues I am facing in detail:

After finishing all this, I tested the same with both c++ unit test cases and python unit test cases:

  1. It passed all the c++ test cases
  2. It passed all the python test cases except test_export_coreml (in test_image_similarity) and test_export_coreml_predict (in test_image_classifier). It throws the error : ValueError: Cannot call 'resize' on objects that are not either an Image or SArray of Images. And as far as I know this is happening because the module predict function in coremltools accepts only Image as data type and I am not sure where to make these changes

Please let me know if the approach I have taken is correct and what do you think of it and if you want I pull put out a WIP pull request for you to check what I have done.
Please share your thoughts!

@TobyRoseman
Copy link
Collaborator

Hi @NeerajKomuravalli - Wow, this all sounds great. Yes, please put up a WIP pull request and I'll be happy to take a detailed look. I'm not sure what's happening with those exporting to CoreML Python unit test but I'm sure we can figure it out.

@NeerajKomuravalli
Copy link
Contributor Author

I have put a pull request, please let me know what you think.
Thanks!

@NeerajKomuravalli
Copy link
Contributor Author

Hi,

I have made the required changes, please take a look whenever you can.

Thanks!

@NeerajKomuravalli
Copy link
Contributor Author

Hi,

I have made all the necessary changes and have run the test cases as well to see if everything is working as it should. All the test cases are passed successfully except:

  1. test_export_coreml_predict in test_image_classifier.py: Here when the code comes to coreml_model.predict it runs successfully when the feature is awesome_images but fails when we use deep_features.
    Error thrown : "The model expects input feature resnet-50_deep_features to be an image, but the input is of type 5."
  2. test_export_coreml in test_image_similarity.py : Same as above.
    Error thrown : "Error Domain=com.apple.CoreML Code=0 "Input image feature 'VisionFeaturePrint_Scene_deep_features' not found" UserInfo={NSLocalizedDescription=Input image feature 'VisionFeaturePrint_Scene_deep_features' not found}"

My assumption is predict method on coremltools.models.MLModel can be called only with image as input and doesn't work with deep features.

Please let me know your thoughts and what are the next steps.

TobyRoseman pushed a commit that referenced this issue Jul 16, 2020
* Adding features column to the data and compensating for the same in missing value test (#3126)

* Adding features column to the data to test with features (#3126)

* Changing import statements to make the image_analysis callable (#3126)

* Adding get_deep_features module to extract features and adding more two support functions (#3126)

* Adding a way to create model by passing either feature column or image column (#3126)

* Adding a way to create model by passing either feature column or image column (#3126)

* Fixing syntax error

* Adding more elaborate error messages to also include Missing value errors

* Adding image_classifier test methods to include testing for data with and without extracted_features column

* Making Image feature extraction feature more robust and indipendent

* Changing the _extract_features functions to first recognize the feature type and then take the decision

* Doing small edits

* Changing the _extract_features functions to first recognize the feature type and then take the decision

* passing model name specifically in test_select_correct_feature_column_to_train test case

* Adding a way to account for extracted features instead of images as inoput features in classify, predict and _canonize_input functions

* Adding feature based test cases for both image as a feature and extracted features as feature

* Using actual features using get_deep_features instead of randomly generating features

* Adding test cases for model using both image and extracted_feature as input feature

* Removing unnecessary code

* Adding a way to accept input as extracted feature in quary function

* Created test cases for model that is being trained with deep features

* Created test cases for model that is being trained with deep features

* Hiding the functions is_image_deep_feature_sarray and  find_only_image_extracted_features_column

* Changing the function name find_only_image_extracted_features_column to _find_only_image_extracted_features_column as chnages were made in image_analysis

* Changing the generic deep feature column name in data generation to the model specific name

* Simplifing the code to detect type of sfrane column by using _find_only_column_of_type instead of writing a new logic

* Removing unnecessary try except block and correcting spelling mistakes

* Debugging to fix _find_only_column_of_type not found error

* Fixing all the errors where image_classifier was failing in test cases

* Fixing all the errors where image_similarity was failing in test cases

* Changing test_export_coreml_predict function to also account for deep features

* Changing test_export_coreml function to also account for deep features

* Ignoring the coremltools predict test case when deep_features are used

* Ignoring the coremltools predict test case when deep_features are used

* Adding docstring for get_deep_features

* Removing unnecessary code and makeing recommended changes

* Ignoring getting deep features when mac version is less than 10.14 or if its a Linux system or any other OS

* Removing coremltools import line to adhere to the turicreate testing practices

* Shifting deep feature building in from get_test_data to setUpClass (no tested)

* Shifting deep feature building in from get_test_data to setUpClass (fully tested)

* Shifting deep feature building in from get_test_data to setUpClass

* Removing unnecessary comments

* Changed the logic that decides when to extract deepFeatures and changed the name of the same to WithDeepFeatures

* Using the feature fed to unit test classes based on their names to maintain consistency
@TobyRoseman
Copy link
Collaborator

The relevant pull request here has been merged and was included in our latest release. Closing.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

2 participants