Added notebook for running Simple Model with large data #1368

Merged
merged 3 commits into main on Jun 2, 2023

@hsubbaraj-spiral hsubbaraj-spiral commented May 24, 2023

Describe your changes and why you are making these changes

This notebook takes a dataset (currently "LARGE_HOTEL_REVIEWS") and generates rows of data based on the parameter provided.

I ran this with a ~400 GiB dataset, and it finished in about 40 minutes, including cluster spin-up time. Most of that time goes to reading and writing the data to the views of the SparkSession and to spinning up and tearing down executors. This is expected behavior, but I will try to understand more deeply what is happening under the hood.
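
For readers without access to the notebook, here is a minimal sketch of the general idea of scaling a base dataset up to a requested row count. It is plain Python rather than Spark, and it is not the notebook's actual code: the function name `generate_rows`, the `target_rows` parameter, and the `copy_id` column are all hypothetical and used only for illustration.

```python
from typing import Dict, Iterator, List

Row = Dict[str, object]


def generate_rows(base_rows: List[Row], target_rows: int) -> Iterator[Row]:
    """Yield `target_rows` rows by cycling through `base_rows`.

    Hypothetical helper: each emitted row is a copy of a base row with an
    extra `copy_id` field recording which pass over the base data produced it.
    """
    if target_rows < 0:
        raise ValueError("target_rows must be non-negative")
    if not base_rows and target_rows > 0:
        raise ValueError("base_rows must not be empty")

    n = len(base_rows)
    for i in range(target_rows):
        # Copy the base row so the caller's data is never mutated.
        row = dict(base_rows[i % n])
        row["copy_id"] = i // n
        yield row


if __name__ == "__main__":
    base = [{"review": "great"}, {"review": "bad"}]
    for row in generate_rows(base, 5):
        print(row)
```

Because the function is a generator, rows are produced lazily. That is the main design choice worth carrying over to a large run, where materializing every row in driver memory would not be practical.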

Related issue number (if any)

Loom demo (if any)

Checklist before requesting a review

  • I have created a descriptive PR title. The PR title should complete the sentence "This PR...".
  • I have performed a self-review of my code.
  • I have included a small demo of the changes. For the UI, this would be a screenshot or a Loom video.
  • If this is a new feature, I have added unit tests and integration tests.
  • I have run the integration tests locally and they are passing.
  • I have run the linter script locally (See python3 scripts/run_linters.py -h for usage).
  • All features on the UI continue to work correctly.
  • Added one of the following CI labels:
    • run_integration_test: Runs integration tests
    • skip_integration_test: Skips integration tests (Should be used when changes are ONLY documentation/UI)

@hsubbaraj-spiral hsubbaraj-spiral requested review from saurav-c and removed request for cw75 June 1, 2023 17:21

@saurav-c saurav-c left a comment

It would be good to add a comment in the notebook itself about the size of the data we are fetching.

@hsubbaraj-spiral hsubbaraj-spiral added the skip_integration_test Skips required integration test (only documentation/UI changes) label Jun 2, 2023
@hsubbaraj-spiral hsubbaraj-spiral merged commit 39437ad into main Jun 2, 2023
6 checks passed