This repository has been archived by the owner on Dec 21, 2023. It is now read-only.

Feature extraction #3210

Merged
merged 45 commits into from
Jul 16, 2020

Conversation

NeerajKomuravalli
Contributor

@TobyRoseman, please have a look and let me know what you think.
Thank you!

…noput features in classify, predict and _canonize_input functions
@TobyRoseman TobyRoseman self-requested a review May 21, 2020 22:25
@TobyRoseman
Collaborator

Address #3126

Collaborator

@TobyRoseman TobyRoseman left a comment

@NeerajKomuravalli - Thanks for working on it! I took a first look. Generally things look good; this is a great start.

I left a few comments about things to work on. My suggestion would be to work on the unit tests first. Let's make sure we're testing using both deep features and using images.

Collaborator

@TobyRoseman TobyRoseman left a comment

This is looking good. I pulled down your branch and tried it out. Things seem to be generally working.

If we can get the unit tests all fixed up, I'll run your branch on a large set of internal tests that we have.

from array import array

try:
    feature_columns, _ = zip(*list(filter(lambda x: x[1] == array, list(zip(sframe.column_names(), sframe.column_types())))))
Collaborator

This line is a bit complicated. I think you should just be able to use a helper function that we already have.

Contributor Author

This makes sense. I will use _find_only_column_of_type.
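
The simplification discussed in this thread can be sketched without Turi Create. The snippet below is a minimal stand-in and not the library's _find_only_column_of_type: the name find_only_column_of_type, its parameters, and the example column names and types are assumptions made for illustration, and plain lists stand in for sframe.column_names() and sframe.column_types().

```python
from array import array


def find_only_column_of_type(column_names, column_types, target_type):
    """Return the name of the single column whose type is target_type.

    Hypothetical stand-in for illustration. Raises ValueError when zero or
    more than one column matches.
    """
    matches = [
        name
        for name, col_type in zip(column_names, column_types)
        if col_type == target_type
    ]
    if len(matches) != 1:
        raise ValueError(
            "Expected exactly one %s column, found %d"
            % (target_type.__name__, len(matches))
        )
    return matches[0]


# Assumed example schema: one label, one raw column, one deep feature column.
names = ["label", "awesome_image", "resnet-50_deep_features"]
types = [str, bytes, array]
feature_column = find_only_column_of_type(names, types, array)
```

Compared with the nested zip/filter/lambda expression, the comprehension keeps the matching rule on one readable line and makes the "exactly one column" requirement explicit.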

feature = None

if feature is None:
    try:
Collaborator

Why not do all this in the except block above?

Contributor Author

Basically, that part of the code is responsible for figuring out which column is the feature column when the user does not pass one.
In that case we first check whether there is a deep feature column in the SFrame. If there is, we use it; if not, we go ahead and look for an image column.
The try block will fail for two reasons:

  1. There is no deep feature column.
  2. More than one deep feature column is present in the SFrame. (I am assuming this is by design, because we throw a similar error when more than one image column is found in the dataset.)

When the try block fails, one of the two cases above has happened. In that case we ignore the exception and move on to see whether we can find an image column in the SFrame.
So this choice was by design. Please let me know if I am missing something here.
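
The lookup order described above (explicit feature, then a single deep feature column, then a single image column) could be sketched roughly as follows. This is not the merged code: the Image placeholder class, the dict-based schema, and the names only_column_of_type and resolve_feature are all assumptions made for illustration. The sketch also places the image fallback inside the except block, which is one way to read the reviewer's question.

```python
from array import array


class Image:
    """Placeholder for an image column type (hypothetical)."""


def only_column_of_type(schema, target_type):
    """Return the one column of target_type; raise ValueError otherwise."""
    matches = [name for name, col_type in schema.items() if col_type is target_type]
    if len(matches) != 1:
        raise ValueError(
            "expected exactly one matching column, found %d" % len(matches)
        )
    return matches[0]


def resolve_feature(schema, feature=None):
    """Pick the feature column: explicit choice, deep features, then images."""
    if feature is not None:
        return feature
    try:
        return only_column_of_type(schema, array)
    except ValueError:
        # Zero or several deep feature columns: fall back to the image column.
        # A ValueError from this second lookup propagates to the caller.
        return only_column_of_type(schema, Image)
```

With this shape, a schema holding two deep feature columns and one image column resolves to the image column, which matches the behaviour described in the comment.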

list(self.model.predict(deep_features, output_type="probability_vector")),
self.tolerance,
)
# If the code came here that means the type of the feature used is deep_deatures and the predict fwature in coremltools doesn't work with deep_features yet so we will ignore this specific test case unitl the same is written.
Collaborator

Should "fwature" be "function"? We try to maintain an 80 or 100 character line length limit. It would be nice if the text wrapped to the next line after 80 or 100 characters.

tc_distances = tc_ret.sort("reference_label")["distance"].to_numpy()
psnr_value = get_psnr(coreml_distances, tc_distances)
self.assertTrue(psnr_value > 50)
# If the code came here that means the type of the feature used is deep_deatures and the predict fwature in coremltools doesn't work with deep_features yet so we will ignore this specific test case unitl the same is written.
Collaborator

same comment

@TobyRoseman
Collaborator

@NeerajKomuravalli - Your most recent changes look good, just one minor comment. I'm running your changes now on an internal testing system. If that all passes, I think we're ready to merge this change.

@TobyRoseman
Collaborator

The unit tests are failing on Linux. The VisionFeaturePrint_Scene model is not available on Linux.

We should skip the tests for that model if _mac_ver() < (10, 14) evaluates to True. Also, we should not be trying to extract deep features for that model.

@NeerajKomuravalli
Contributor Author

@TobyRoseman
I went through both, test_image_classifier.py and test_image_similarity.py and in both the files we are only testing for VisionFeaturePrint_Scene model if _mac_ver() < (10, 14).
Can you point me to the code where this might be failing?

@TobyRoseman
Collaborator

> @TobyRoseman
> I went through both, test_image_classifier.py and test_image_similarity.py and in both the files we are only testing for VisionFeaturePrint_Scene model if _mac_ver() < (10, 14).
> Can you point me to the code where this might be failing?

Ok, if we're already skipping those classes on Linux, then we just need to skip extracting the deep features on Linux (since that is done independently of the test classes). I've just added comments in the code about that.
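
The two pieces discussed here, skipping the test class and skipping the deep feature extraction, can be sketched with the standard library alone. The helper mac_ver below is a hypothetical stand-in for the project's _mac_ver, and models_to_extract is an assumed name. The sketch relies on the fact that an empty tuple compares as less than (10, 14), so a non-macOS system is treated like an old macOS.

```python
import platform
import unittest


def mac_ver():
    """Return the macOS version as a tuple of ints, or () on other systems."""
    release = platform.mac_ver()[0]
    if not release:
        return ()
    return tuple(int(part) for part in release.split(".")[:2])


# () < (10, 14) is True, so Linux takes the same path as macOS older than 10.14.
VISION_MODEL_UNAVAILABLE = mac_ver() < (10, 14)


def models_to_extract():
    """List the models whose deep features should be precomputed."""
    models = ["resnet-50"]
    if not VISION_MODEL_UNAVAILABLE:
        models.append("VisionFeaturePrint_Scene")
    return models


@unittest.skipIf(VISION_MODEL_UNAVAILABLE, "VisionFeaturePrint_Scene needs macOS 10.14+")
class VisionFeaturePrintSceneTest(unittest.TestCase):
    def test_model_is_listed(self):
        self.assertIn("VisionFeaturePrint_Scene", models_to_extract())
```

The decorator only guards the test class. The separate check inside models_to_extract is what keeps the extraction step itself from touching the unavailable model.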

@NeerajKomuravalli
Contributor Author

Hi @TobyRoseman ,

I have made the requested changes, let me know if there is anything else.

@TobyRoseman
Collaborator

Thanks @NeerajKomuravalli. Those changes look good. I've just kicked off another internal test run. I'll let you know the results.

from turicreate.toolkits._main import ToolkitError as _ToolkitError
from turicreate.toolkits.image_analysis.image_analysis import MODEL_TO_FEATURE_SIZE_MAPPING, get_deep_features

import coremltools
Collaborator

This is no longer getting used.

Collaborator

Unfortunately, due to the (brittle) way that we test the minimal version of TuriCreate, we are going to need to remove this unnecessary line. coremltools is not a dependency of the minimal version of TuriCreate, so importing it at the top level breaks our unit tests even though the tests in this file are not run for the minimal version.

I believe removing this line should be the final required change for this pull request. All of our other internal tests are passing.

Contributor Author

Makes sense. I will remove the import coremltools line and push the changes.
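
In this pull request the fix was simply to delete the unused import. For cases where an optional dependency is actually needed, a common alternative is to import it on demand instead of at module level, so that importing the file never fails. The sketch below illustrates that general pattern only; optional_import and export_model are hypothetical names and not part of Turi Create.

```python
import importlib


def optional_import(module_name):
    """Import module_name on demand; return None when it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def export_model(model_path):
    """Hypothetical export step that needs the optional dependency."""
    coremltools = optional_import("coremltools")
    if coremltools is None:
        raise RuntimeError("coremltools is required to export " + model_path)
    return coremltools
```

Because the import happens inside export_model, a minimal installation can still import the module and only fails if the export path is actually exercised.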

@NeerajKomuravalli
Contributor Author

Made the required changes!

@TobyRoseman
Collaborator

Thanks @NeerajKomuravalli - I'm rerunning the internal tests now.

@TobyRoseman
Collaborator

Our unit tests for the minimal version of our package are still failing with this change. When our minimal package is installed we can't call get_deep_features. Since get_test_data is called just by importing the unit test files, that means get_deep_features gets called on import as well. We need to make it so that get_deep_features is only called from inside a setUpClass method.

@NeerajKomuravalli
Contributor Author

Makes sense. I moved the get_deep_features call from get_test_data to setUpClass in both test_image_similarity.py and test_image_classifier.py.
Let me know if there is anything else that is required!

@@ -84,19 +88,24 @@ class ImageClassifierTest(unittest.TestCase):
    def setUpClass(
        self,
        model="resnet-50",
        feature="resnet-50_deep_features",
Collaborator

I don't think this is right. If the class name doesn't end in WithDeepFeature, then I think it should be feature="awesome_image". This code is only calling get_deep_features when self.feature != "awesome_image".

I think you need to make this change to all of your test classes.

Contributor Author

Changed the deep feature column name in test data to the one suggested by you and made all the requested changes.

Collaborator

I think you misunderstood my previous comment. I was not suggesting you need to change the name of the feature. I was basically just trying to say that it's important for the name of the class to represent what that class is actually testing.

For example, if the name of the class is ImageClassifierResnetTestWithDeepFeatures, then it should be feature="resnet-50_WithDeepFeature" or feature="resnet-50_deep_features", but not feature="awesome_image". Similarly, if the name of the class is VisionFeaturePrintSceneTest, then it should not be creating the model from deep features, so it should be feature="awesome_image", not feature="VisionFeaturePrint_Scene_WithDeepFeature".

Does that make sense? Let me know if you have any questions.

Contributor Author

Oh yeah, you are right, I did not see it there. It was correct before, but during one of my recent pushes I changed it to test something and forgot to revert it. My bad, I will change it and push.
As for the feature column names in the test data in test_image_classifier.py and test_image_similarity.py, the suffix _WithDeepFeature makes more sense to me than _deep_features, so I am keeping that change as it is.
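
The convention the reviewer describes, where the class name tells you which kind of feature the class trains from, could look roughly like this. It is a simplified sketch and not the project's test file: the needs_deep_features attribute and the test_name_matches_feature check are assumptions added for illustration, and the feature name used is one of the two spellings mentioned in the thread.

```python
import unittest

IMAGE_FEATURE = "awesome_image"


class ImageClassifierTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls, model="resnet-50", feature=IMAGE_FEATURE):
        cls.model = model
        cls.feature = feature
        # Deep features are only needed when the class trains from them.
        cls.needs_deep_features = feature != IMAGE_FEATURE

    def test_name_matches_feature(self):
        named_for_deep_features = type(self).__name__.endswith("WithDeepFeatures")
        self.assertEqual(named_for_deep_features, self.needs_deep_features)


class ImageClassifierResnetTestWithDeepFeatures(ImageClassifierTest):
    @classmethod
    def setUpClass(cls):
        super().setUpClass(model="resnet-50", feature="resnet-50_deep_features")
```

The base class defaults to the raw image column, and only subclasses whose names end in WithDeepFeatures pass a deep feature column, so a mismatch between name and behaviour shows up as a failing test.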

@NeerajKomuravalli
Contributor Author

I made the required changes. Let me know if you need anything else.

@TobyRoseman
Collaborator

@NeerajKomuravalli - Those changes look good. Thank you. I'm rerunning our internal tests now.

@TobyRoseman
Collaborator

Internal tests now pass.

@NeerajKomuravalli - thanks so much for all your work! I think this is a great feature. It will be included in our next release.

@TobyRoseman TobyRoseman changed the title Feature extraction [WIP] Feature extraction Jul 16, 2020
@TobyRoseman TobyRoseman merged commit 4f4f1a6 into apple:master Jul 16, 2020
@NeerajKomuravalli
Contributor Author

That's great news, @TobyRoseman! It was my first contribution to an open source project, so I am really excited.

It was great working on this, let me know if I can help contribute more!

Thanks!

3 participants