Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Remove CNTK functionality and replace with ONNX #1593

Merged
merged 65 commits into from Nov 11, 2022

Conversation

svotaw
Copy link
Collaborator

@svotaw svotaw commented Aug 2, 2022

What changes are proposed in this pull request?

Currently the ImageFeaturizer uses CNTK models. This PR replaces this underlying dependency with ONNX models, effectively removing usage of CNTK from the library.

Associated changes:
Create ONNXHub for referring to predefined SynapseML models in the cloud
Directly use ONNX-ML protobuf classes for splitting models at internal nodes

How is this patch tested?

TODO

  • I have written tests (not required for typo or doc fix) and confirmed the proposed feature/bug-fix/change works.

Does this PR change any dependencies?

TODO

  • No. You can skip this section.
  • Yes. Make sure the dependencies are resolved correctly, and list changes here.

Does this PR add a new feature? If so, have you added samples on website?

TODO

  • No. You can skip this section.
  • Yes. Make sure you have added samples following below steps.

TODO

  1. Find the corresponding markdown file for your new feature in website/docs/documentation folder.
    Make sure you choose the correct class estimators/transformers and namespace.
  2. Follow the pattern in markdown file and add another section for your new API, including pyspark, scala (and .NET potentially) samples.
  3. Make sure the DocTable points to correct API link.
  4. Navigate to website folder, and run yarn run start to make sure the website renders correctly.
  5. Don't forget to add <!--pytest-codeblocks:cont--> before each python code blocks to enable auto-tests for python samples.
  6. Make sure the WebsiteSamplesTests job pass in the pipeline.

AB#1910261

@github-actions
Copy link

github-actions bot commented Aug 2, 2022

Hey @svotaw 👋!
Thank you so much for contributing to our repository 🙌.
Someone from SynapseML Team will be reviewing this pull request soon.
We appreciate your patience and contributions 💯!

Copy link
Collaborator

@mhamilton723 mhamilton723 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow so awesome!

  1. We might want to also deprecate old CNTK stack, and the old Model Downloader
  2. We should replace the instances of these in E2E Tests and notebooks
  3. We should also now see if theres a way to contribute java bindings back to onnx team so we dont have to carry them around and update them. I'll kick off a thread with ONNX team

@svotaw
Copy link
Collaborator Author

svotaw commented Aug 2, 2022

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@codecov-commenter
Copy link

codecov-commenter commented Aug 2, 2022

Codecov Report

Merging #1593 (f82069e) into master (abdfe19) will increase coverage by 1.03%.
The diff coverage is 77.55%.

@@            Coverage Diff             @@
##           master    #1593      +/-   ##
==========================================
+ Coverage   83.99%   85.02%   +1.03%     
==========================================
  Files         272      277       +5     
  Lines       14224    14495     +271     
  Branches      732      777      +45     
==========================================
+ Hits        11948    12325     +377     
+ Misses       2276     2170     -106     
Impacted Files Coverage Δ
...om/microsoft/azure/synapse/ml/onnx/ONNXUtils.scala 54.22% <54.22%> (ø)
.../microsoft/azure/synapse/ml/onnx/ONNXRuntime.scala 73.52% <73.52%> (ø)
.../com/microsoft/azure/synapse/ml/onnx/ONNXHub.scala 83.01% <83.01%> (ø)
...om/microsoft/azure/synapse/ml/onnx/ONNXModel.scala 72.48% <83.33%> (-6.53%) ⬇️
...oft/azure/synapse/ml/opencv/ImageTransformer.scala 96.45% <93.93%> (-0.05%) ⬇️
...rosoft/azure/synapse/ml/cntk/ImageFeaturizer.scala 95.37% <94.94%> (+6.90%) ⬆️
...oft/azure/synapse/ml/codegen/GenerationUtils.scala 79.16% <100.00%> (ø)
.../microsoft/azure/synapse/ml/codegen/RCodegen.scala 90.32% <100.00%> (ø)
...soft/azure/synapse/ml/core/env/FileUtilities.scala 79.41% <100.00%> (+0.62%) ⬆️
...osoft/azure/synapse/ml/core/env/PackageUtils.scala 100.00% <100.00%> (ø)
... and 19 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

Copy link
Contributor

@memoryz memoryz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job Scott! Thank you for the revamp!

@svotaw
Copy link
Collaborator Author

svotaw commented Aug 4, 2022

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@svotaw
Copy link
Collaborator Author

svotaw commented Aug 4, 2022

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Collaborator

@mhamilton723 mhamilton723 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scott thanks so much for iterating on this, I think the big things still left are

  1. Deprecation of CNTK Stuff
  2. Replacing CNTK in Demos where possible (its okay if we need to drop the RNN notebook)
  3. Moving the spruced up notebooks to the deep-learning category instead of other!

@svotaw
Copy link
Collaborator Author

svotaw commented Aug 5, 2022

@mhamilton723

Scott thanks so much for iterating on this, I think the big things still left are

  1. Deprecation of CNTK Stuff
  2. Replacing CNTK in Demos where possible (its okay if we need to drop the RNN notebook)
  3. Moving the spruced up notebooks to the deep-learning category instead of other!

Can we do this in a separate PR? I think it will be easier to separate doc/deprecation/namespace moves into a PR right after this.

@svotaw
Copy link
Collaborator Author

svotaw commented Aug 5, 2022

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

// cannot be used in python. Use simple ImageFeaturizer.
override def pyTestObjects(): Seq[TestObject[ImageFeaturizer]] = {
val testObject: ImageFeaturizer = new ImageFeaturizer()
.setInputCol(inputCol)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this default to using the model string API? or is the model not set here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we set the model we get an error due to the ComplexParam. Is there a known solution for this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note that setting the model as a string (the onnxhub name) cause the model bytes to be set under the covers

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can implement the APIs for pythron wrapping that complex param in the tests. Shoulkd be something like extending PyParamWrappable or something like that. Dont worry if you want to leave this for later

Copy link
Collaborator

@mhamilton723 mhamilton723 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lovely work!

@svotaw
Copy link
Collaborator Author

svotaw commented Nov 5, 2022

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@svotaw
Copy link
Collaborator Author

svotaw commented Nov 8, 2022

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Contributor

@memoryz memoryz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good. The only main comment is on the recursion function - it should be possible to convert that into tailrec.

@svotaw
Copy link
Collaborator Author

svotaw commented Nov 9, 2022

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Contributor

@memoryz memoryz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you for fixing the recursion code!

@mhamilton723 mhamilton723 enabled auto-merge (squash) November 10, 2022 22:01
@mhamilton723
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mhamilton723 mhamilton723 merged commit f2e88fd into microsoft:master Nov 11, 2022
mhamilton723 added a commit that referenced this pull request Dec 1, 2022
* docs: fix command to launch jupyter notebook (#1649)

* docs: add aisample uplift modelling (#1640)

* add uplift model sample

* update uplift model notebook

* Update notebooks/community/aisamples/AIsample - Uplift Modelling.ipynb

Co-authored-by: weitian <weitian@microsoft.com>
Co-authored-by: Mark Hamilton <mhamilton723@gmail.com>

* chore: publish test jars

* docs: improve example notebooks

* docs: remove unused docs and fix links

* docs: move magic command forward since it restarts interpreter

* chore: added `synapse-internal` to platform detector function (#1651)

* chore: added trident platform

* chore: update naming

* Update core/src/main/python/synapse/ml/core/platform/Platform.py

* chore: update detection rules

* chore: remove redundant check

* docs: apply black formatter

Co-authored-by: Li Jiang <lijiang1@microsoft.com>
Co-authored-by: Mark Hamilton <mhamilton723@gmail.com>

* fix: KernelSHAP throws error when the key type in the ZipMap output is LongType (#1656)

Signed-off-by: Jason Wang <jasowang@microsoft.com>

Signed-off-by: Jason Wang <jasowang@microsoft.com>

* chore: turn off failing synapse tests temporarily (#1658)

* docs: simplify data downloading and add mlflow to uplift modelling (#1659)

* chore: simplify data downloading section

update save dataframe with delta format

* docs: add mlflow logging and loading

* Update notebooks/community/aisamples/AIsample - Uplift Modelling.ipynb

Co-authored-by: Li Jiang <lijiang1@microsoft.com>

* fix: fix Uplift Modelling style

* docs: improve error msg to make it clearer for users and fix typos (#1662)

* docs: improve error msg to make it clearer for users.

* fix: fix typos

Co-authored-by: Li Jiang <lijiang1@microsoft.com>

* docs: fix multiple typos and update error hintings in ai-samples-timeseries notebook (#1663)

Co-authored-by: Serena Ruan <82044803+serena-ruan@users.noreply.github.com>

* chore: clean up github workflows and add issue label remover (#1674)

* chore: re open github issues after a comment (#1676)

* chore: fix typo in issue reopen yaml

* chore: fix reopen on comment workflow

* chore: fix reopen comment action

* Update reopen-issue-on-comment.yml

* build: bump amannn/action-semantic-pull-request from 4 to 5.0.1 (#1680)

Bumps [amannn/action-semantic-pull-request](https://github.com/amannn/action-semantic-pull-request) from 4 to 5.0.1.
- [Release notes](https://github.com/amannn/action-semantic-pull-request/releases)
- [Changelog](https://github.com/amannn/action-semantic-pull-request/blob/main/CHANGELOG.md)
- [Commits](amannn/action-semantic-pull-request@v4...v5.0.1)

---
updated-dependencies:
- dependency-name: amannn/action-semantic-pull-request
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* fix: fix linked service on cog service base (#1685)

* fix: fix pyarrow failure in deeplearning test (#1689)

* build: bump amannn/action-semantic-pull-request from 5.0.1 to 5.0.2 (#1688)

Bumps [amannn/action-semantic-pull-request](https://github.com/amannn/action-semantic-pull-request) from 5.0.1 to 5.0.2.
- [Release notes](https://github.com/amannn/action-semantic-pull-request/releases)
- [Changelog](https://github.com/amannn/action-semantic-pull-request/blob/main/CHANGELOG.md)
- [Commits](amannn/action-semantic-pull-request@v5.0.1...v5.0.2)

---
updated-dependencies:
- dependency-name: amannn/action-semantic-pull-request
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Mark Hamilton <mhamilton723@gmail.com>

* docs: update developer readme instruction on python env creation (#1693)

* update developer readme instruction on python env creation

* Update developer-readme.md

* Update developer-readme.md

* fix: don't throw on invalid columns in DropColumns (#1695)

* remove unused imports

* batch prompts

* don't throw for invalid columns in DropColumn

* remove now unused verify method from DropColumns

* Remove trident.mlflow APIs. (#1687)

Co-authored-by: Li Jiang <lijiang1@microsoft.com>

* using predictionCol for isolation forest (#1686)

fixing issue #1060

Co-authored-by: Mark Hamilton <mhamilton723@gmail.com>

* fix: update isolation forest notebook (#1696)

* fix: remove synapse E2E testing exclusion - cyber ml  (#1699)

* remove CyberML exclusion from Synapse test pipeline

* black style

Co-authored-by: Jessica Wang <jessiwang@microsoft.com>

* chore: remove notebooks (#1703)

* chore: fix ado integration (#1704)

* chore: pin az and python versions (#1705)

* chore: remove synapse test exclusions (#1698)

* chore: update scalatest and scalactic (#1706)

* remove unused imports

* batch prompts

* don't throw for invalid columns in DropColumn

* remove now unused verify method from DropColumns

* update scalatest and scalactic

* strip unicode out of transliteration result

* clean transliterate test

* override assertDFEq in TransliterateSuite to strip out zero-width chars

* fix: remove Vowpal Wabbit exclusion, add Interpretability exclusion (#1708)

* remove Vowpal Wabbit exclusion, add Interpretability exclusion

* exclude all Interpretability Notebooks from Synapse E2E

Co-authored-by: Jessica Wang <jessiwang@microsoft.com>
Co-authored-by: Mark Hamilton <mhamilton723@gmail.com>

* feat: Remove CNTK functionality and replace with ONNX (#1593)

* Switch usage of CNTK to use ONNX

* style fix

* small test fixes

* resolved comments

* some test fixes

* test fixes and comment responses

* more test fixes

* fix dbx tests

* fix notebooks

* misc small fixes

* Switch usage of CNTK to use ONNX

* style fix

* small test fixes

* resolved comments

* some test fixes

* test fixes and comment responses

* more test fixes

* fix dbx tests

* fix notebooks

* misc small fixes

* test fix

* dotnet fix

* add protobuf to sbt

* notebook style fixes

* add protobuf to databricks

* revert dependency

* dependency override

* add shading

* more shading

* add maven shade plugin

* add shade to subproject

* switch to using separate onnx dependency

* increase onnx version

* add ignoreDecodingErrors

* fix decoding errors param

* fix style error

* changed to sonatype package

* add snapshots repo to tests - revert later

* downgrade package to Java 8

* more snapshot repo refs

* databricks onnx version fix

* add error comment

* centralize package constants

* add debugging for image name

* debugging image name

* add conversion of black and white

* style fix

* fix to b&w usage

* cherry pick python version fix

* throw if only 1 channel

* cli fix

* cleanup of review 1

* fix typo

* responded to comments

* remove explicit dependency

* responded to comments

Co-authored-by: Mark Hamilton <mhamilton723@gmail.com>

* build: bump loader-utils from 2.0.2 to 2.0.3 in /website (#1709)

Bumps [loader-utils](https://github.com/webpack/loader-utils) from 2.0.2 to 2.0.3.
- [Release notes](https://github.com/webpack/loader-utils/releases)
- [Changelog](https://github.com/webpack/loader-utils/blob/v2.0.3/CHANGELOG.md)
- [Commits](webpack/loader-utils@v2.0.2...v2.0.3)

---
updated-dependencies:
- dependency-name: loader-utils
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Mark Hamilton <mhamilton723@gmail.com>

* chore: ScalaStyle fixes (#1716)

* Style fixes

* Style fixes

* Style fixes in tests

* test style fixes

* make some warnings errors

* make some warnings errors

* make some warnings errors

* make some warnings errors

* fix vw

* fixes from review pass

* test: Improve ONNXtests reliability (#1713)

* Improve ONNXtests reliability

* fix constructor build

* make new defaults

* chore: Move new ImageFeaturizer to onnx namespace (#1711)

* feat: Deprecate CNTK objects (#1712)

* Deprecate CNTK objects

* remove CNTK-based notebooks

* ignore 1 failing test

* deprecate more util classes

* chore: add secret scanning infrastructure (#1724)

* feat: Add SpeakerEmotionInference transformer for generating SSML t… (#1691)

* feat: Add TextToSpeechSSMLGenerator transformer for generating SSML to augment TTS requestsfeat: Add TextToSpeechSSMLGenerator transformer for generating SSML to augment TTS requests.  Adding support for SSML to the TTS endpoint.

* Fixing style issues

* More style issues

* ok it finally passes scalastyle

Co-authored-by: Brendan Walsh <brwals@outlook.com>

* test: Additional E2E testing infrastructure (#1727)

* Addition E2E testing infrastructure

* Ignore cleanup test

* Style fix

* Update platform.py

Co-authored-by: Mark Hamilton <mhamilton723@gmail.com>

* chore: Making secrets optional and cached (#1726)

* Making secrets optional and cached

* do not cache normal secrets

* set publishing for dotnet

* more dotnet pipeline edits

* testing dotnet no env vars

* enable publish only

Co-authored-by: Mark Hamilton <mhamilton723@gmail.com>

* chore: autodelete old models (#1729)

* autodelete old models

* move cleanup calls to afterAll

* fix style

Co-authored-by: Mark Hamilton <mhamilton723@gmail.com>

* clarify date comparisons when deleting old models/groups (#1733)

* chore: automate clean-acr with github action workflow (#1735)

* hello workflow

* make pre-commit executable

* clean acr

* gh-set-secret

* pat->github

* echo

* env

* Create gh-set-secret.yml

* Update gh-set-secret.yml

* Update gh-set-secret.yml

* Create manual.yml

* Update manual.yml

* Update manual.yml

* Update manual.yml

* Update manual.yml

* use azurecli for keyvault access

* remove pip cache

* remove columns

* fix indent

* fix env var name

* split off script file

* azurecli@v1

* shorten path

* lengthen path

* add query option to az cmd

* re-indent

* re-indent again

* echo

* print

* test maniehtestkv

* back to azure kv task

* back to mmlspark-keys

* quot arg

* typo

* use popen for pipeline-run

* run through deletions in whatif mode

* print result code

* delete result code

* format changes and check result of transfer

* remove tqdn

* remove tqdn

* restore actual deletion

* formatize prints

* delete cruft

* switch from manual to cron

* sundays at 1am

* chmod pre-commit

Co-authored-by: Mark Hamilton <mhamilton723@gmail.com>

* feat: add simple deep learning text classifier (#1591)

* refactor deep vision model params

* add Text Classifier and tests

* update text classifier

* add default values for transformation edit fields and removed fields

* add deep text classification notebook

* update hovorod installation script

* update environment

* update env

* add installing packages on dbx

* fix python environment

* add more tests

* skip deep text model test

* address comments

* add learning_rate param

* fix notebook style

* fix _find_num_proc

* update newtonsoft.json version in dotnet to resolve security issue

* fix missing learning rate param

* fix dataframe partition error and strange read output

* fix failing test

* fix param updates

* update notebook to make it run faster

* make train data size smaller

* update models

* remove minor version of python to avoid warnings in pipeline

* ignore dl notebooks in Synapse tests

* update mardown name

* chore: fix style (#1736)

* build: bump actions/checkout from 2 to 3 (#1737)

Bumps [actions/checkout](https://github.com/actions/checkout) from 2 to 3.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v2...v3)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore: bump version to 0.10.2 (#1738)

* docs: removing beta tag from R

* build: bump loader-utils from 2.0.3 to 2.0.4 in /website (#1719)

Bumps [loader-utils](https://github.com/webpack/loader-utils) from 2.0.3 to 2.0.4.
- [Release notes](https://github.com/webpack/loader-utils/releases)
- [Changelog](https://github.com/webpack/loader-utils/blob/v2.0.4/CHANGELOG.md)
- [Commits](webpack/loader-utils@v2.0.3...v2.0.4)

---
updated-dependencies:
- dependency-name: loader-utils
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Mark Hamilton <mhamilton723@gmail.com>

* chore: bump docusaurus (#1740)

* chore: bump spark to 3.2.3 (#1744)

* downgrade spark 3.1

* remove exlusion of org.json4s

* fix json4s-ast version

* resolve dependency conflicts

* remove notebook

* fix environment

* fix onnxhub load method and add error message in platform.py

* update platform.py

Signed-off-by: Jason Wang <jasowang@microsoft.com>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: elswork <elswork@gmail.com>
Co-authored-by: Tian Wei <12080578+whiskyboy@users.noreply.github.com>
Co-authored-by: weitian <weitian@microsoft.com>
Co-authored-by: Mark Hamilton <mhamilton723@gmail.com>
Co-authored-by: lhrotk <lhrotk@gmail.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
Co-authored-by: Li Jiang <lijiang1@microsoft.com>
Co-authored-by: Jason Wang <jasowang@microsoft.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: niehaus59 <41398747+niehaus59@users.noreply.github.com>
Co-authored-by: Markus Cozowicz <marcozo@microsoft.com>
Co-authored-by: JessicaXYWang <108437381+JessicaXYWang@users.noreply.github.com>
Co-authored-by: Jessica Wang <jessiwang@microsoft.com>
Co-authored-by: Scott Votaw <svotaw@gmail.com>
Co-authored-by: Brendan Walsh <37676373+BrendanWalsh@users.noreply.github.com>
Co-authored-by: Brendan Walsh <brwals@outlook.com>
Co-authored-by: Kyle Rush <kyrush@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants