feat: Support deep vision #1518
Conversation
Giddy with excitement about this :)
Branch force-pushed from aa278d9 to c08cbe7
Branch force-pushed from b1b241d to 50fe287
core/src/test/scala/com/microsoft/azure/synapse/ml/nbtest/DatabricksUtilities.scala
Looks awesome!! Can we have a call to walk through this so I can be better at reviewing?
core/src/test/scala/com/microsoft/azure/synapse/ml/nbtest/DeepLearningTests.scala
core/src/test/scala/com/microsoft/azure/synapse/ml/nbtest/GPUTests.scala
python/setup.py
        version
    ),
    "Source Code": "https://github.com/Microsoft/SynapseML",
},
We should have a deeper discussion about packaging next week, because I'm not quite sure why this file was needed.
Yes, we should discuss how to release this. I created this setup.py for the python folder because it has no dependency on jars, so we can release it directly, and it has different environment requirements than our other pieces since it relies on horovod, pytorch, etc. Since it's an independent package, I didn't use our codegen system or the Scala-side version generation; we can just pack it and install it with pip.
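For reference, a jar-free Python-only package like the one described above can be captured in a fairly minimal setup.py. The package name, version, and dependency pins below are illustrative placeholders, not the actual SynapseML packaging:

```python
# Hypothetical minimal setup.py for a standalone, jar-free Python package.
# Name, version, and requirements are placeholders for illustration only.
from setuptools import setup, find_packages

setup(
    name="synapseml-deep-learning",  # placeholder package name
    version="0.0.1",                 # placeholder version
    packages=find_packages(),
    install_requires=[
        # heavy DL dependencies live only in this package,
        # keeping them out of the core SynapseML environment
        "torch",
        "pytorch-lightning",
        "horovod",
    ],
)
```

Because this file has no jar or codegen dependency, `pip install .` in the folder would be enough to build and install it.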
    "True/False, whether to use pretrained weights for the backbone model or not",
)

feature_extracting = Param(
Might want to name this something active like extract_features; also I'm not sure what this does.
In LitDeepVisionModel:

    if feature_extracting:
        for p in self.model.parameters():
            p.requires_grad = False

Should I also remove this param, always freeze all weights at the beginning, and only unfreeze the layers users explicitly want to train?
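A torch-free sketch of that alternative (freeze everything, then re-enable only explicitly named layers). `FakeParam` is a stand-in for `torch.nn.Parameter`, and the dict stands in for the model's `named_parameters()`; the parameter names are made up for illustration:

```python
class FakeParam:
    """Stand-in for torch.nn.Parameter: only the requires_grad flag matters here."""
    def __init__(self):
        self.requires_grad = True

def freeze_all_but(named_params, trainable_prefixes):
    """Freeze every parameter, then keep trainable only those whose
    name starts with one of the requested layer prefixes."""
    for name, p in named_params.items():
        p.requires_grad = any(name.startswith(pref) for pref in trainable_prefixes)

# Hypothetical parameter names mimicking a vision backbone plus a head.
params = {
    "backbone.conv1.weight": FakeParam(),
    "backbone.layer4.weight": FakeParam(),
    "fc.weight": FakeParam(),
}
freeze_all_but(params, ["fc", "backbone.layer4"])
```

With this shape, `feature_extracting` disappears as a boolean and becomes a list of layers to fine-tune, which maps naturally onto the existing "number of last layers to fine tune" parameter.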
    "string representation of optimizer function for the underlying pytorch_lightning model",
)

pretrained = Param(
I think we'll always have to use pretrained weights if we aren't training the full net.
So you mean we should just remove this parameter, fix it to True internally, and not let users train the full net?
@@ -0,0 +1,350 @@
{
Some comments on this file (hard to leave at the right place so I'll stick them in here):
1. You have some complex path and mounting stuff; consider using spark.readBinaryFiles, filtering out the content to just get the paths, and then modifying the names to point to mounted storage instead.
2. Some functions you define are camelCase; consider making these snake_case because of Python land.
3. Is the _transform row function used?
4. In the definition of the udf you don't need a lambda; you can just pass the function in directly.
5. The comment "This is an important parameter" doesn't tell users enough; consider a brief explanation.
6. Is the dummy callback required?
7. Might want to give the deep vision classifier a few more defaults so that we can keep init simple and clean for this demo.
8. You do this thing where you have to set the estimator and the model to different names for "features"; any way to make that code re-used so it works without setting it on the model?
9. It looks like we need an argmax udf before we send results into the evaluator. Would other Spark multiclass stuff need this too? If not, we should align our API with the other Spark multiclass tooling if possible.
10. The top of your cell looks a lot like the horovod install script; any way to just use that script directly?
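On the camelCase point, the rename is mechanical; a small helper (purely illustrative, not part of the PR) shows the intended mapping:

```python
import re

def to_snake_case(name):
    """Convert a camelCase identifier to snake_case,
    e.g. readImageData -> read_image_data."""
    # Insert an underscore before any uppercase letter that follows
    # a lowercase letter or digit, then lowercase the whole name.
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()
```

This covers simple camelCase names; identifiers with embedded acronyms (e.g. `HTTPServer`) would need extra rules, which the notebook functions don't require.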
1, 2, 4, 6, 7, 10: Done.
3: Yes, it's used in DeepVisionClassifier's transformation_fn parameter, so that it transforms the image internally.
5: I'll remove this parameter; I haven't figured out how it should be set, to be honest. If I leave it at the default value in Databricks it triggers errors.
8: That's because only the estimator accepts transformation_fn, which lets us use a path as input, while the model reads the features directly; so I use another udf to transform the path into features before inference.
9: I think it's needed, as our model output contains an array of label probabilities instead of the actual label prediction.
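Since the model emits a probability array rather than a label (point 9), the argmax step boils down to a small pure function; the udf wrapping shown in the comments is the assumed usage, not code from the PR:

```python
def argmax(probs):
    """Index of the largest class probability, returned as a float
    so it matches the double-typed label column Spark ML evaluators expect."""
    return float(max(range(len(probs)), key=probs.__getitem__))

# In the notebook this would become a column transform, roughly:
#   from pyspark.sql.functions import udf
#   from pyspark.sql.types import DoubleType
#   pred_udf = udf(argmax, DoubleType())  # no lambda needed, pass the function directly
#   df = df.withColumn("prediction", pred_udf("probability"))
```

If the API later moves to emitting the prediction itself (aligning with other Spark multiclass estimators), this helper disappears from user code.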
    "number of last layers to fine tune for the model, should be smaller or equal to 3",
)

num_classes = Param(Params._dummy(), "num_classes", "number of target classes")
Do these align with SparkML names? Not saying they don't right now, but it's a good thing to double check.
You're right, they use camelCase
But there's a conflict with that format, as Horovod uses snake_case and we're extending their parameters. Should we just keep snake_case?
    epochs=epochs,
    validation=0.1,
    verbose=1,
    profiler=None,
You might want to apply the same simplifications from the notebook to this test, perhaps minus the removal of callbacks if you find those useful for debugging.
Still applicable
python/version.py
# Licensed under the MIT License. See LICENSE in project root for information.

import re
I think we'll want to add this code as extra files in the deep learning Python code, so that the codegen system fits in naturally. If the setup.py that gets generated is wrong in some way for this project, perhaps we should use this as a first example in codegen of a custom or overriding setup.py for a project, so that future developers can benefit from this exploration.
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Summary
Support common DNN models for deep vision classification.
Tests
Added unit test.
Dependency changes
Added a requirements.txt file to document the Python package's dependencies.