Add spark example to Sample Notebook #1003

Closed
wants to merge 4 commits into from

kristin-kim
Contributor

Add a simple spark example to existing sample Notebook of folder

@pull-request-size pull-request-size bot added the size/L Denotes a PR that changes 100-499 lines. label Mar 14, 2023
Member

@NiloFreitas NiloFreitas left a comment

Hi Kristin. Please take a look at my comments. Cheers

" .appName(\"Spark Session for Bike Sharing Data\") \\\n",
" .getOrCreate()\n",
"\n",
"path=\"gs://kristin_serverless_pyspark/bikesharingdemand-train.csv\"\n",
Member

Is this CSV too big? If it is small, it is better to keep it under a resources folder than to hardcode a GCS path that may not exist in the future.
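
A minimal sketch of what this suggestion could look like, assuming the CSV is small enough to commit. The `resources` folder location, the `BIKE_SHARING_CSV` environment variable, and both helper names are hypothetical and not part of the PR; only the file name and the app name come from the diff above.

```python
import os
from pathlib import Path

# Assumption: the CSV is committed next to the notebook under resources/.
DEFAULT_CSV = Path("resources") / "bikesharingdemand-train.csv"


def resolve_csv_path(env=None):
    """Return the CSV location.

    An optional BIKE_SHARING_CSV variable (hypothetical name) overrides the
    bundled file, so a GCS path can still be used without hardcoding it.
    """
    env = os.environ if env is None else env
    return env.get("BIKE_SHARING_CSV", str(DEFAULT_CSV))


def load_bike_sharing(spark, path=None):
    """Read the CSV into a DataFrame, using the header row and inferred types."""
    return spark.read.csv(path or resolve_csv_path(), header=True, inferSchema=True)


# Usage inside the notebook (requires pyspark):
# from pyspark.sql import SparkSession
# spark = SparkSession.builder \
#     .appName("Spark Session for Bike Sharing Data") \
#     .getOrCreate()
# df = load_bike_sharing(spark)
```

With this shape the notebook runs from a fresh clone, and anyone who wants the data in a bucket sets the variable instead of editing the cell.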

"This notebook let users \n",
"* run a sample Spark Session with a sample CSV file\n",
"* verify if GCS buckets are properly mounted as a file system and\n",
"* execute Python files that are stored in mounted GCS buckets by !python and %run commands and\n"
Member

Do we need to keep this part of the implementation, i.e. gcsfuse and running a Python file from the notebook? I thought you were going to create a separate example to demonstrate that and keep this project running only PySpark.

Contributor

@kristin-kim If you can address the review questions, we can try to get this merged.

@agold-rh
Contributor

Closed as stale. Please re-open if I'm wrong.

@agold-rh agold-rh closed this May 17, 2024
Labels
size/L Denotes a PR that changes 100-499 lines.
4 participants