Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Make SynapseE2E Tests work now with Spark 3.2 #1362

Merged
merged 25 commits into from
Feb 1, 2022
Merged

fix: Make SynapseE2E Tests work now with Spark 3.2 #1362

merged 25 commits into from
Feb 1, 2022

Conversation

riserrad
Copy link
Contributor

@riserrad riserrad commented Jan 25, 2022

Problem

SynapseE2E tests likely stopped passing after two events:

  • it started building & testing SynapseML 0.9.5, given it only supports Spark 3.2.
    • The reason is that the Synapse Analytics Workspace it was using to run the Jobs only had Apache Spark Pools using Spark 3.1.
  • the notebooks it was using to submit the jobs were moved from the /notebooks folder to subfolders in /notebooks/features, and the current code did not support fetching files from folders recursively.

Changes

Given these issues, the following changes are presented in this PR:

  • FileUtilities.scala now presents a function to read files in a folder recursively. This function is later consumed by SynapseUtilities.scala;
  • In SynapseTests.scala:
    • Made the tests now submit jobs to the workspace that supports Spark 3.2 (private preview);
    • To make sure the tests succeed and run in a timely fashion, it is now using 5 pools instead of 3.
      • Currently, since it has 13 Jobs, it requires 3 batches to complete. If we want this to run even faster, we could add 2 more pools and make it run in 2 baches only.
  • In SynapseUtilities.scala:
    • Fixed the livyPayload to install synapseml_2.12 instead of synapseml;
    • Fixed the livyPayload to also exclude org.slf4j:slf4j-api when configuring the session;
    • Increased the test timeout from 20 to 30 minutes - made this change by observing the average time jobs were taking and 20 minutes seemed not to be enough. Empirically, 30 minutes worked well.

Additionally, as I was going through this, I noticed some opportunities for improvement in developer-readme:

  • Added some links to additional software it needs, and for a newcomer it might not be obvious to have installed (e.g., JDK 11, Miniconda);
  • Added additional links to guidance that might be useful for folks getting started, such as "Forking a repo" and "Working with remotes";
  • Added an additional step to prepare the Python/Conda environment
    • This one might be needed to run tests manually, run nbconvert and all that.

Results

After these changes were pushed to the fork, we were able to see consistent success in the Pipeline execution.

Opportunities

There are still opportunties for improvements here, but I'd rather have them in a separate PR/effort, such as:

  • Understand the dependency on Azure CLI when setting up the local development environment.
    • Some folks went through issues when setting it up, when not having the cli installed/configured.
  • Have a separate test result for each notebook run in SynapseE2E
    • Today, if any job fails in SynapseE2E, it's hard to identify which job(s) failed for troubleshooting.
    • We can leverage Scala's runtime test generation for that.
  • Manage the Desired State Configuration of the SynapseE2E Workspace in code
    • It is risky not to store the Desired State Configuration of this workspace our SynapseE2E tests use to run the tests, since if anyone makes a change to it (delete it, deletes a pool, etc), tests are going to break.

@riserrad
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Commenter does not have sufficient privileges for PR 1362 in repo microsoft/SynapseML

@serena-ruan
Copy link
Contributor

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@codecov-commenter
Copy link

codecov-commenter commented Jan 25, 2022

Codecov Report

Merging #1362 (ba2cc60) into master (865d189) will decrease coverage by 0.10%.
The diff coverage is 0.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1362      +/-   ##
==========================================
- Coverage   84.85%   84.75%   -0.11%     
==========================================
  Files         287      287              
  Lines       14234    14239       +5     
  Branches      728      728              
==========================================
- Hits        12078    12068      -10     
- Misses       2156     2171      +15     
Impacted Files Coverage Δ
...soft/azure/synapse/ml/core/env/FileUtilities.scala 63.63% <0.00%> (-11.37%) ⬇️
...ala/org/apache/spark/ml/param/DataFrameParam.scala 70.83% <0.00%> (-16.67%) ⬇️
...crosoft/azure/synapse/ml/io/http/HTTPClients.scala 76.66% <0.00%> (-13.34%) ⬇️
.../execution/streaming/continuous/HTTPSourceV2.scala 92.80% <0.00%> (+0.71%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 865d189...ba2cc60. Read the comment docs.

@serena-ruan
Copy link
Contributor

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@riserrad
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Commenter does not have sufficient privileges for PR 1362 in repo microsoft/SynapseML

@svotaw
Copy link
Collaborator

svotaw commented Jan 27, 2022

@svotaw is added to the review. #Closed

@mhamilton723
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@riserrad
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@riserrad
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@riserrad
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@serena-ruan serena-ruan changed the title Make SynapseE2E Tests work now with Spark 3.2 Fix: Make SynapseE2E Tests work now with Spark 3.2 Jan 28, 2022
@serena-ruan
Copy link
Contributor

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@serena-ruan serena-ruan changed the title Fix: Make SynapseE2E Tests work now with Spark 3.2 fix: Make SynapseE2E Tests work now with Spark 3.2 Jan 28, 2022
@riserrad
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@riserrad
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Collaborator

@mhamilton723 mhamilton723 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastique! Just minor comments

Comment on lines 88 to 105

test("listPythonFiles") {
val allPythonFiles = SynapseUtilities.listPythonFiles()

allPythonFiles.foreach(file => println(file))
}

test("listNoteBookFiles") {
val allPythonNotebooks = SynapseUtilities.listNoteBookFiles()

allPythonNotebooks.foreach(file => println(file))
}

test("listPythonJobFiles") {
val allPythonJobFiles = SynapseUtilities.listPythonJobFiles()

allPythonJobFiles.foreach(file => println(file))
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these tests that you would like to keep or something that you use for debugging? If the former we might want to add some sort of assert in here so that we can ensure they don't break. If the latter we might want to set them to "ignore" with a comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is now fine to remove them. Thanks for catching this one!

"spark.jars.packages": "com.microsoft.azure:synapseml_2.12:0.9.5",
"spark.jars.packages": "com.microsoft.azure:synapseml_2.12:0.9.4",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this clarification was already added to website

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very true!

@riserrad
Copy link
Contributor Author

riserrad commented Feb 1, 2022

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mhamilton723 mhamilton723 merged commit 0840e31 into microsoft:master Feb 1, 2022
serena-ruan added a commit that referenced this pull request Feb 7, 2022
* Trying to use only pool with Spark 3.2

* Updating install instructions for synapse to use 0.9.4

* Changing syntax to grab ipynb files

* Line breaking to comply with styling

* Changing ipynb filter for windows

* Fixing string new line syntax

* Improvements to SynapseTests

* Adding more spark pools 3.2

* Adjusting list tests not to assert

* Improving dev doc, livyPayLoad

* Changing SynapseWS to mmlsparkppe

* Changing synapse URL to dogfood

* Removing dogfood from token acquisition

* Fixing exludes syntax

* Adding 2 more Apache Spark Pools

* Improving the developer docs

* Adjusting identation on developer-readme

* Bumping Synapse test timeout to 40 min

* Applying PR feedback

Co-authored-by: Serena Ruan <82044803+serena-ruan@users.noreply.github.com>
serena-ruan added a commit that referenced this pull request Feb 7, 2022
serena-ruan added a commit to serena-ruan/SynapseML that referenced this pull request Feb 7, 2022
* Trying to use only pool with Spark 3.2

* Updating install instructions for synapse to use 0.9.4

* Changing syntax to grab ipynb files

* Line breaking to comply with styling

* Changing ipynb filter for windows

* Fixing string new line syntax

* Improvements to SynapseTests

* Adding more spark pools 3.2

* Adjusting list tests not to assert

* Improving dev doc, livyPayLoad

* Changing SynapseWS to mmlsparkppe

* Changing synapse URL to dogfood

* Removing dogfood from token acquisition

* Fixing exludes syntax

* Adding 2 more Apache Spark Pools

* Improving the developer docs

* Adjusting identation on developer-readme

* Bumping Synapse test timeout to 40 min

* Applying PR feedback

Co-authored-by: Serena Ruan <82044803+serena-ruan@users.noreply.github.com>
serena-ruan added a commit that referenced this pull request Feb 12, 2022
* revert: revert changes of spark 3.2

* fix: change azure-ai-textanalytics dependency to shaded jar and rename namespace to make it compatible with spark 3.1

* allow branch spark3.1 to trigger pipeline

* fix shaded jar

* fix fasterxml by adding it ahead of coreDependencies

* fix io.netty issue

* fix io.netty issue

* fix databricks conflicts

* fix libraries syntax

* exclude io.netty:netty-tcnative-boringssl-static

* update adbRuntime

* exclude org.antlr while installing libraries on dbx clusters

* fix adbruntime

* fix adbruntime

* fix adb runtime

* fix adb submit job error

* ignore geospatialServices notebooks for adb because adb 9.1 runtime doesn"t support sending http requests to them

* fix: Make SynapseE2E Tests work now with Spark 3.2 (#1362)

* Trying to use only pool with Spark 3.2

* Updating install instructions for synapse to use 0.9.4

* Changing syntax to grab ipynb files

* Line breaking to comply with styling

* Changing ipynb filter for windows

* Fixing string new line syntax

* Improvements to SynapseTests

* Adding more spark pools 3.2

* Adjusting list tests not to assert

* Improving dev doc, livyPayLoad

* Changing SynapseWS to mmlsparkppe

* Changing synapse URL to dogfood

* Removing dogfood from token acquisition

* Fixing exludes syntax

* Adding 2 more Apache Spark Pools

* Improving the developer docs

* Adjusting identation on developer-readme

* Bumping Synapse test timeout to 40 min

* Applying PR feedback

Co-authored-by: Serena Ruan <82044803+serena-ruan@users.noreply.github.com>

* change to spark3.1 pools

* add more spark pools

* Show detailed response of livy

* Update url cuz spark3.1 is in prod already

* Update SynapseTests.scala

* Update SynapseUtilities.scala

* fix: remove concurrency parameter for MVAD (#1383)

* remove concurrency parameter for MVAD

* fix: fix node-fetch version security & error in MVAD sample

Co-authored-by: Mark Hamilton <mhamilton723@gmail.com>

* fix: expose response error out for better debugging if the error is returned by http directly (#1391)

* merge `turn synapse tests into multiple test`

Co-authored-by: Ric Serradas <riserrad@microsoft.com>
Co-authored-by: Mark Hamilton <mhamilton723@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants