Easy-to-use environment with `jupyter notebook` and Apache Spark.

Instead of opening Jupyter Notebook directly, copy the prepareSparkEnvironment.sh script into your preferred directory and, from inside that directory, run:

    $ source prepareSparkEnvironment.sh

Alternatively, run it directly from GitHub:

    $ source <(curl -s https://raw.githubusercontent.com/boyander/runpyspark/master/prepareSparkEnvironment.sh)

The script prepares your shell so that pyspark is configured to use Jupyter Notebook. Once the environment is ready, run pyspark and it will open a Jupyter notebook.
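The script's contents are not reproduced here. As a rough sketch only, assuming it relies on Spark's standard PYSPARK_PYTHON, PYSPARK_DRIVER_PYTHON and PYSPARK_DRIVER_PYTHON_OPTS environment variables (the real script may do more, or work differently), an equivalent manual setup could look like this:

```shell
# Assumption: these three variables are enough to make `pyspark` start Jupyter.
# They are standard Spark settings, not copied from prepareSparkEnvironment.sh.
export PYSPARK_PYTHON=python3              # interpreter used by Spark's Python workers
export PYSPARK_DRIVER_PYTHON=jupyter       # program launched as the driver front end
export PYSPARK_DRIVER_PYTHON_OPTS=notebook # argument passed to it, i.e. `jupyter notebook`
```

With these exported in the current shell, running pyspark would be expected to launch a notebook server instead of the plain Python REPL. Because they are set with export, they only last for the current shell session, which is why the script is sourced rather than executed.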

IMPORTANT: Before running the prepareSparkEnvironment.sh script, make sure you have followed the checklist for your OS.

Checklist for Ubuntu

Install Spark following this Medium post.
Create a symlink named "spark" in your home directory that points to the installation, or rename the installation directory to just "spark":

    $ ln -s ~/spark-2.4.0-bin-hadoop2.7 ~/spark

Ensure spark-shell is in your $PATH variable. Note: the following command assumes you are running zsh or oh-my-zsh; if that's not the case, or you are not sure, change .zshrc to .bashrc in both lines.

    $ echo "export PATH=\"\$PATH:$HOME/spark/bin\"" >> ~/.zshrc
    $ source ~/.zshrc

To check that it works, confirm you can run spark-shell from your terminal.
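The check above can also be done without starting Spark. The following is a minimal sketch; check_command is a hypothetical helper introduced here for illustration and is not part of the repository:

```shell
# Hypothetical helper: report where a command resolves from, or that it is missing.
check_command() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found in PATH" >&2
        return 1
    fi
}

# The two commands this setup depends on.
check_command spark-shell || echo "Re-check the PATH line in your shell rc file."
check_command python3 || echo "Install python3 before sourcing the script."
```

If spark-shell is reported as missing right after editing the rc file, open a new terminal or source the rc file again so the updated PATH takes effect.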

Checklist for MacOSX

You need Homebrew (brew) installed. Then install the following packages:

    $ brew install jq apache-spark

Important Notes

This script uses python3; ensure python3 is installed and runs from your terminal.
When creating a Jupyter notebook, ensure you've chosen the Python 3 kernel, otherwise it will not work.
There's also a notebook, PySparkDemo.ipynb, to test that Apache Spark works.
In case you've created multiple Spark contexts, run $ killall java to stop all Apache Spark instances. Note that this stops every running Java process, not only Spark.
