An easy-to-use environment with Jupyter Notebook and Apache Spark.

Instead of launching Jupyter Notebook directly, copy the prepareSparkEnvironment.sh script into your preferred directory and run it from there:

    $ source prepareSparkEnvironment.sh

Or run it directly from GitHub:

    $ source <(curl -s https://raw.githubusercontent.com/boyander/runpyspark/master/prepareSparkEnvironment.sh)

The script prepares your shell with PySpark configured to use Jupyter Notebook. Once the environment is ready, run pyspark and a Jupyter notebook will open.
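
For reference, here is a minimal sketch of the kind of environment such a script prepares; the actual prepareSparkEnvironment.sh may differ in details (for example, where it expects Spark to be installed):

    # Hypothetical sketch, not the real script
    export SPARK_HOME="$HOME/spark"             # assumes the "spark" symlink from the Ubuntu checklist
    export PATH="$PATH:$SPARK_HOME/bin"
    export PYSPARK_PYTHON=python3               # run workers with Python 3
    export PYSPARK_DRIVER_PYTHON=jupyter        # make the pyspark command launch Jupyter...
    export PYSPARK_DRIVER_PYTHON_OPTS=notebook  # ...in notebook mode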

IMPORTANT: Before running the prepareSparkEnvironment.sh script, ensure you have followed the checklist for your OS.

Checklist for Ubuntu

  • Install Spark following this Medium post
  • Create a symlink to Spark in your home directory (or rename the installation directory to just "spark"):
    $ ln -s ~/spark-2.4.0-bin-hadoop2.7 ~/spark
  • Ensure spark-shell is on your $PATH (Note: the following assumes you are running zsh or oh-my-zsh; if that's not the case, or you are not sure, just change .zshrc to .bashrc):
    $ echo "export PATH=\"\$PATH:$HOME/spark/bin\"" >> ~/.zshrc
    $ source ~/.zshrc

To verify the setup, you should be able to run spark-shell from your terminal.
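
For example, this quick sanity check should print the installed Spark version and exit:

    $ spark-shell --version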

Checklist for MacOSX

  • You need Homebrew installed
  • brew install jq (see the note after this list)
  • brew install apache-spark
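
A guess at why jq is needed: the script presumably parses Homebrew's JSON output to locate the installed Spark, along these lines (a sketch, not necessarily the script's actual command):

    $ brew info --json=v1 apache-spark | jq -r '.[0].installed[0].version'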

Important Notes

  • This script uses Python 3; ensure python3 is installed and available in your terminal.
  • When creating a Jupyter notebook, make sure you've chosen the Python 3 kernel, otherwise it will not work.
  • There's also a notebook, PySparkDemo.ipynb, to check that Apache Spark works.
  • In case you've created multiple Spark contexts, run $ killall java to stop all Apache Spark instances (a way to check what is still running is shown below).
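
Before running killall, you can list which Spark-related processes are still alive (note that killall java will also terminate any other Java processes, not just Spark):

    $ ps aux | grep -i spark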