Distributed Big Data Processing: PySpark Workshop
NOTE: If you are using the Jupyter instance available on the cluster, you can skip this setup. It is intended for people who want to run the workshop exercises locally.
- Install Anaconda: https://conda.io/docs/user-guide/install/index.html
- Create a conda environment with the packages from the requirements file:
> conda create --name pyspark_env --file environment/requirements.txt python=3.5
When prompted to install the list of packages, press Enter to accept.
- Activate the newly created conda environment:
> source activate pyspark_env
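You can optionally verify that the environment is active and PySpark is importable (this assumes the `pyspark` package is listed in `environment/requirements.txt`):
> python -c "import pyspark; print(pyspark.__version__)"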
- Run Jupyter Notebook and open the workshop exercises:
> jupyter notebook
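Once the notebook server is running, a minimal smoke test like the sketch below can confirm that Spark works end to end. It assumes `pyspark` was installed from the requirements file and runs Spark in local mode; the exact Spark version depends on what `environment/requirements.txt` pins.

```python
# Minimal smoke test for the local PySpark setup.
from pyspark.sql import SparkSession

# Start (or reuse) a local Spark session; local[*] uses all available cores.
spark = (SparkSession.builder
         .master("local[*]")
         .appName("workshop-smoke-test")
         .getOrCreate())

# Parallelize a small list and compute a sum to exercise the RDD API.
rdd = spark.sparkContext.parallelize(range(10))
print("sum:", rdd.sum())  # expected: 45

spark.stop()
```

If the cell prints `sum: 45` without errors, the local environment is ready for the workshop exercises.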