Easy-to-use environment with Jupyter Notebook and Apache Spark.
Instead of opening Jupyter Notebook directly, just copy the prepareSparkEnvironment.sh script into your preferred directory and, inside it, run:
$ source prepareSparkEnvironment.sh
Or run it directly from GitHub:
$ source <(curl -s https://raw.githubusercontent.com/boyander/runpyspark/master/prepareSparkEnvironment.sh)
It prepares your shell with pyspark configured to use Jupyter Notebook. Once the environment is ready, run pyspark and it will open a Jupyter notebook.
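The usual way to achieve this is through two PySpark environment variables; a minimal sketch of what such a script typically sets (the actual script may do more, e.g. locate SPARK_HOME):
$ export PYSPARK_DRIVER_PYTHON=jupyter
$ export PYSPARK_DRIVER_PYTHON_OPTS=notebook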
IMPORTANT: Before running the prepareSparkEnvironment.sh script, ensure you have followed the checklist for your OS.
Checklist for Ubuntu
- Install Spark following this Medium post
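If the post is unavailable, a manual install matching the paths used below would look roughly like this (example commands, not from the original guide):
$ wget https://archive.apache.org/dist/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz
$ tar -xzf spark-2.4.0-bin-hadoop2.7.tgz -C ~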
- Create a symlink to the Spark installation in your home directory, or rename the installation to just "spark":
$ ln -s ~/spark-2.4.0-bin-hadoop2.7 ~/spark
- Ensure you have spark-shell in your $PATH variable. (Note: this assumes you are running zsh or oh-my-zsh; if that's not the case, or you are not sure, just replace ~/.zshrc with ~/.bashrc in the following commands.)
$ echo "export PATH=\"\$PATH:$HOME/spark/bin\"" >> ~/.zshrc
$ source ~/.zshrc
To check that it works, you should be able to run spark-shell from your terminal.
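For a quick, non-interactive check, this should print the Spark version and exit:
$ spark-shell --version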
Checklist for macOS
- You need Homebrew (brew) installed
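If you don't have it, Homebrew's official installer is a one-liner:
$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"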
- Install jq:
$ brew install jq
- Install Apache Spark (the apache-spark formula provides spark-shell):
$ brew install apache-spark
- This script uses python3; ensure Python 3 is installed and available in your terminal.
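You can confirm with:
$ python3 --version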
- When creating a Jupyter notebook, ensure you've chosen the Python 3 kernel, otherwise it will not work.
- There's also a notebook, PysparkDemo.ipynb, to test that Apache Spark works.
- In case you've created multiple Spark contexts, stop all Apache Spark instances with:
$ killall java
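If you'd like to see which Java processes are running before killing them, pgrep can list them:
$ pgrep -fl java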