Feature extraction #3210
Conversation
…issing value test (apple#3126)
…wo support functions (apple#3126)
… and without extracted_features column
…re type and then take the decision
…re type and then take the decision
…_to_train test case
…noput features in classify, predict and _canonize_input functions
…cted features as feature
Address #3126
@NeerajKomuravalli - Thanks for working on it! I took a first look. Generally things look good; this is a great start.
I left a few comments about things to work on. My suggestion would be to work on the unit tests first. Let's make sure we're testing using both deep features and using images.
src/python/turicreate/toolkits/image_analysis/image_analysis.py
src/python/turicreate/toolkits/image_analysis/image_analysis.py
…_extracted_features_column
…to _find_only_image_extracted_features_column as chnages were made in image_analysis
This is looking good. I pulled down your branch and tried it out. Things seem to be generally working.
If we can get the unit tests all fixed up, I'll run your branch on a large set of internal tests that we have.
from array import array

try:
    feature_columns, _ = zip(*list(filter(lambda x: x[1] == array, list(zip(sframe.column_names(), sframe.column_types())))))
This line is a bit complicated. I think you should just be able to use a helper function that we already have.
This makes sense. Will use _find_only_column_of_type.
feature = None

if feature is None:
    try:
Why not do all this in the except block above?
Basically, that part of the code is responsible for figuring out which column is the feature column when the user does not pass one.
In that case we first check whether there is a deep feature column in the SFrame; if there is, we use it, and if it's not found we go ahead and look for an Image column.
The try block will fail for two reasons:
- There is no deep feature column.
- More than one deep feature column is present in the SFrame. (This, I am assuming, is by design, because we throw a similar error when more than one Image column is found in the dataset.)
When the try/except fails, one of the two cases above has happened, so we ignore the exception and move ahead to see if we can find an Image column in the SFrame dataset.
So this choice was by design. Please let me know if I am missing something here.
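The fallback just described can be sketched as a small helper. This is an illustrative sketch only: infer_feature_column is a hypothetical name (the actual code uses Turi Create's internal _find_only_column_of_type), and the Image type is passed in so the sketch stays self-contained.

```python
from array import array

def infer_feature_column(column_names, column_types, image_type):
    """Prefer a single deep-feature (array) column; otherwise fall back
    to a single column of image_type. Hypothetical helper, not the
    actual Turi Create implementation."""
    def only_column_of_type(target_type):
        matches = [name for name, ctype in zip(column_names, column_types)
                   if ctype is target_type]
        if len(matches) != 1:
            # Zero or several candidates is ambiguous, so give up.
            raise ValueError(
                "expected exactly one column of type %s" % target_type.__name__)
        return matches[0]

    try:
        return only_column_of_type(array)
    except ValueError:
        # No unique deep-feature column: look for the Image column instead.
        return only_column_of_type(image_type)
```

The except clause deliberately swallows both failure modes (no deep-feature column, or more than one) and retries with the image type, matching the behavior described above.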
…he model specific name
…ly_column_of_type instead of writing a new logic
list(self.model.predict(deep_features, output_type="probability_vector")),
self.tolerance,
)
# If the code came here that means the type of the feature used is deep_deatures and the predict fwature in coremltools doesn't work with deep_features yet so we will ignore this specific test case unitl the same is written.
Should "fwature" be "function"? We try to maintain an 80 or 100 character line-length limit. It would be nice if the text wrapped to the next line after 80 or 100 characters.
tc_distances = tc_ret.sort("reference_label")["distance"].to_numpy()
psnr_value = get_psnr(coreml_distances, tc_distances)
self.assertTrue(psnr_value > 50)
# If the code came here that means the type of the feature used is deep_deatures and the predict fwature in coremltools doesn't work with deep_features yet so we will ignore this specific test case unitl the same is written.
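For reference, a PSNR comparison like the one asserted above might be computed as follows. This is a hedged sketch; the actual get_psnr helper in the test suite may define the peak or normalization differently.

```python
import math

def get_psnr(reference, estimate):
    """Peak signal-to-noise ratio in dB between two equal-length
    sequences. Higher is better; identical inputs give infinity.
    (Sketch of a helper like the test suite's get_psnr.)"""
    assert len(reference) == len(estimate) and len(reference) > 0
    mse = sum((a - b) ** 2 for a, b in zip(reference, estimate)) / len(reference)
    if mse == 0.0:
        return float("inf")  # signals are identical
    peak = max(abs(v) for v in reference)  # peak of the reference signal
    return 20.0 * math.log10(peak / math.sqrt(mse))
```

With this definition, psnr_value > 50 requires the Core ML and Turi Create distances to agree to within roughly 0.3% RMS of the peak distance.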
same comment
@NeerajKomuravalli - Your most recent changes look good, just one minor comment. I'm running your changes now on an internal testing system. If that all passes, I think we're ready to merge this change.
The unit tests are failing on Linux. We should skip the tests for that model if …
@TobyRoseman …
Ok, if we're already skipping those classes on Linux, then we just need to skip extracting the deep features on Linux (since that is done independently of the test classes). I've just added comments in the code about that.
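The platform gating described here could look roughly like the following. This is a sketch with assumed names: maybe_extract_deep_features and the class name are illustrative, not the actual test code.

```python
import sys
import unittest

# The deep-feature models are unavailable on Linux (assumption stated above).
IS_LINUX = sys.platform.startswith("linux")

def maybe_extract_deep_features(data, model_name):
    """Extract deep features only on platforms where the models exist."""
    if IS_LINUX:
        return None  # skip extraction entirely on Linux
    # Imported lazily so this module stays importable everywhere.
    from turicreate.toolkits.image_analysis.image_analysis import get_deep_features
    return get_deep_features(data, model_name)

@unittest.skipIf(IS_LINUX, "deep-feature models are unavailable on Linux")
class ImageClassifierWithDeepFeaturesTest(unittest.TestCase):
    pass
```

Because the extraction happens outside the test classes, the skipIf decorator alone is not enough; the extraction call itself also has to be guarded, which is what maybe_extract_deep_features illustrates.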
… if its a Linux system or any other OS
Hi @TobyRoseman, I have made the requested changes, let me know if there is anything else.
Thanks @NeerajKomuravalli. Those changes look good. I've just kicked off another internal test run. I'll let you know the results.
from turicreate.toolkits._main import ToolkitError as _ToolkitError
from turicreate.toolkits.image_analysis.image_analysis import MODEL_TO_FEATURE_SIZE_MAPPING, get_deep_features

import coremltools
This is no longer getting used.
Unfortunately, due to the (brittle) way that we test the minimal version of TuriCreate, we are going to need to remove this unnecessary line. coremltools is not a dependency for the minimal version of TuriCreate, so having it imported at the top level breaks our unit tests even though the tests in this file are not run for the minimal version.
I believe removing this line should be the final required change for this pull request. All of our other internal tests are passing.
Makes sense, will remove the import coremltools line and push changes.
Made the required changes!
Thanks @NeerajKomuravalli - I'm rerunning the internal tests now.
Our unit tests for the minimal version of our package are still failing with this change. When our minimal package is installed we can't call …
Makes sense, I shifted the …
@@ -84,19 +88,24 @@ class ImageClassifierTest(unittest.TestCase):
    def setUpClass(
        self,
        model="resnet-50",
        feature="resnet-50_deep_features",
I don't think this is right. If the class name doesn't end in WithDeepFeature, then I think it should be feature="awesome_image". This code is only calling get_deep_features when self.feature != "awesome_image".
I think you need to make this change to all of your test classes.
Changed the deep feature column name in test data to the one suggested by you and made all the requested changes.
I think you misunderstood my previous comment. I was not suggesting you need to change the name of the feature. I was basically just trying to say that it's important for the name of the class to represent what that class is actually testing.
For example, if the name of the class is ImageClassifierResnetTestWithDeepFeatures, then it should be feature="resnet-50_WithDeepFeature" or feature="resnet-50_deep_features", but not feature="awesome_image". Similarly, if the name of the class is VisionFeaturePrintSceneTest, then it should not be creating the model from deep features, so it should be feature="awesome_image", not feature="VisionFeaturePrint_Scene_WithDeepFeature".
Does that make sense? Let me know if you have any questions.
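The convention being described can be sketched like so. This is illustrative only: the class names and column names come from the test data discussed above, and get_deep_features is passed in as a parameter rather than imported, so the sketch stays self-contained.

```python
class ImageClassifierResnetTest:
    # Raw-image class: features are extracted during test setup.
    feature = "awesome_image"

class ImageClassifierResnetTestWithDeepFeatures:
    # Deep-feature class: the column already holds extracted features.
    feature = "resnet-50_deep_features"

def prepare_features(data, test_cls, get_deep_features):
    """Extract deep features only for classes that start from raw images."""
    if test_cls.feature == "awesome_image":
        # Raw image column: extract the deep features now.
        return get_deep_features(data, "resnet-50")
    return data  # precomputed deep features: use as-is
```

The point of the naming rule is exactly this branch: the class name tells the reader which side of the if the setup takes.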
Oh yeah, you are right, I did not see it there. It was correct before, but during one of my recent pushes I changed it to test something and forgot to revert it. My bad; will change it and push it.
And changing the names of the feature columns in the test data in test_image_classifier.py and test_image_similarity.py to have the suffix _WithDeepFeature instead of _deep_features makes more sense to me, so I am keeping that change as it is.
…ed the name of the same to WithDeepFeatures
…intain consistency
I made the required changes. Let me know if you need anything else.
@NeerajKomuravalli - Those changes look good. Thank you. I'm rerunning our internal tests now.
Internal tests now pass. @NeerajKomuravalli - thanks so much for all your work! I think this is a great feature. It will be included in our next release.
That's great news @TobyRoseman! It was my first contribution to an open source project, so I am really excited. It was great working on this; let me know if I can help contribute more! Thanks!
@TobyRoseman, please have a look and let me know what you think.
Thank you!