Type Name Latest commit message Commit time
basics
data initial local version - WIP Jun 6, 2018
environment Fix CVE Apr 2, 2019
notebooks added workaround for windows bug Jun 19, 2018
.gitignore repo cleanup and typos Jun 13, 2018
README.md typo in README Jun 18, 2018


Distributed Big Data Processing pySpark workshop

Local setup

NOTE: If you are using the Jupyter instance available on the cluster, you can skip this setup. It is only needed if you want to run the workshop exercises locally.

  1. Install Anaconda: https://conda.io/docs/user-guide/install/index.html

  2. Create a conda environment with the packages from the requirements file:

> conda create --name pyspark_env --file environment/requirements.txt python=3.5

When conda lists the packages to install and asks for confirmation, press Enter to accept.

  3. Activate the newly created conda environment (on conda 4.4 or newer, `conda activate pyspark_env` also works):

> source activate pyspark_env

  4. Start Jupyter Notebook and open the workshop exercises:

> jupyter notebook
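
Once the notebook server is running, a quick sanity check can confirm that the environment has the packages the exercises need. The sketch below is not part of the workshop materials: the function `check_environment` and the module list are hypothetical, and the authoritative list of packages is whatever `environment/requirements.txt` contains. It only checks whether each module can be found and does not start Spark. It avoids f-strings so that it runs on the Python 3.5 interpreter the environment is created with.

```python
import importlib.util
import sys

# Assumed module names for illustration only; adjust to match
# the packages listed in environment/requirements.txt.
EXPECTED_MODULES = ["pyspark", "notebook"]


def check_environment(modules=EXPECTED_MODULES):
    """Return a dict mapping each top-level module name to True if it is importable."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in modules
    }


if __name__ == "__main__":
    print("Python {}".format(sys.version.split()[0]))
    for name, found in check_environment().items():
        print("{}: {}".format(name, "ok" if found else "MISSING"))
```

Paste it into a notebook cell, or run it as a script with the `pyspark_env` environment active. If a module is reported as missing, the environment was probably not activated or the `conda create` step did not complete.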