## Method 1 (Docker)
We will be using:
- Docker Desktop for Windows 10 with the WSL2 backend (https://www.docker.com/)
- AWS Glue container (https://hub.docker.com/r/amazon/aws-glue-libs)

Steps:  
0. (Pre-Req) Install WSL2.
1. Download and install Docker Desktop.
2. Set WSL2 as the backend (in Docker Desktop's settings) and restart Docker.
3. Launch WSL2 and pull the AWS Glue image:
   ```bash
   docker pull amazon/aws-glue-libs:glue_libs_1.0.0_image_01
   ```
   - The tag `glue_libs_1.0.0_image_01` was the latest as of 2021.
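4. Create and start the container. The host paths below use Windows syntax (`%UserProfile%`, `C:\...`), so run this from the Windows Command Prompt:
   ```bat
   docker run -itd -p 8888:8888 -p 4040:4040 -v %UserProfile%\.aws:/root/.aws:rw -v C:\Users\YOUR_USERNAME\Documents\GitHub:/home/jupyter/jupyter_default_dir --name glue_jupyter amazon/aws-glue-libs:glue_libs_1.0.0_image_01 /home/jupyter/jupyter_start.sh
   ```
   - `-itd` runs the container detached (`-d`) with an interactive terminal (`-it`).
   - `-p` publishes a container port on the host: Jupyter will be at `http://localhost:8888` and the Spark UI at `http://localhost:4040`.
   - `-v` mounts a host directory into the container: your AWS credentials into `/root/.aws`, and your working folder (here `Documents\GitHub`) into Jupyter's default directory. From a WSL2 shell instead, use `/mnt/c/Users/YOUR_USERNAME/.aws` and `/mnt/c/Users/YOUR_USERNAME/Documents/GitHub` as the host paths.
   - `--name` names the container `glue_jupyter`, so you can refer to it by name instead of its auto-generated container ID.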
5. Check that the container is running with `docker ps`.
6. Open `http://localhost:8888` in your browser to reach Jupyter, and create a notebook with the `PySpark` kernel.
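
If the page does not load, a quick check is whether the ports published by `docker run` accept connections on the host. Below is a minimal sketch using only Python's standard library, run from the host (not inside the container); `port_is_open` and `PORTS` are hypothetical names, and the port numbers come from the command above. The Spark UI on 4040 probably only answers once a Spark session has started, so "not reachable" there is not necessarily an error.

```python
import socket

# Ports published by the `docker run` command above (Jupyter and the Spark UI).
PORTS = {"Jupyter": 8888, "Spark UI": 4040}


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections and timeouts (both are OSError subclasses)
        return False


if __name__ == "__main__":
    for name, port in PORTS.items():
        status = "reachable" if port_is_open("localhost", port) else "not reachable"
        print(f"{name} (localhost:{port}): {status}")
```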

## Method 2 (Preferred)
We will be using:
- Ubuntu 20.04 (WSL2) or macOS.

Steps:  
0. (Pre-Req) Windows 10 users: install WSL2. macOS users: make sure your terminal shell is set to `bash`.
1. Set up your Python environment (e.g. `pip3 install notebook pandas numpy ...`).
2. Install `Java` and `PySpark`:  
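- Linux
```bash
# install Java 8
sudo apt update
sudo apt install openjdk-8-jdk -y
# set JAVA_HOME system-wide (this path is for amd64/x86_64 machines)
echo 'JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"' | sudo tee -a /etc/environment
# apply it to the current shell too; `export` makes it visible to child processes such as Jupyter
export JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"
# install Spark (Spark 4.x requires Java 17+, so pin 3.x to stay on Java 8)
pip3 install "pyspark<4"
```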
- macOS
```bash
# install Java 8 and link it for the system Java wrapper (/usr/local is the Homebrew prefix on Intel Macs)
brew install openjdk@8
sudo ln -sfn /usr/local/opt/openjdk@8/libexec/openjdk.jdk /Library/Java/JavaVirtualMachines/openjdk-8.jdk
# set JAVA_HOME (older macOS versions default to bash, Catalina and later to zsh)
echo 'export JAVA_HOME="$(/usr/libexec/java_home -v1.8)"' | tee -a $HOME/.bashrc $HOME/.zshrc
# reload the config of your current shell only (use $HOME/.zshrc instead if your shell is zsh)
source $HOME/.bashrc
# install Spark (Spark 4.x requires Java 17+, so pin 3.x to stay on Java 8).
# With Anaconda/conda, make sure `pip3` belongs to the active environment, or install with conda instead.
pip3 install "pyspark<4"
```

- Macs with Apple Silicon (M1) may need to follow this guide to install a native Java JDK:
https://code2care.org/q/install-native-java-jdk-jre-on-apple-silicon-m1-mac
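
Before moving on, it can help to confirm which Java PySpark will pick up. As far as we know, Spark's launch scripts use `$JAVA_HOME/bin/java` when `JAVA_HOME` is set and otherwise the `java` on your `PATH`; the sketch below approximates that lookup and parses `java -version`, which reports Java 8 as `1.8.x` and later releases as `11.x`, `17.x` and so on. The helper names are hypothetical. As of this writing Spark 3.x runs on Java 8, while Spark 4.x needs Java 17 or newer; check the Spark documentation for the version you install.

```python
import os
import re
import shutil
import subprocess
from typing import Optional


def java_major_version(version_output: str) -> Optional[int]:
    """Parse the major version from `java -version` output.

    Java 8 and earlier use the legacy "1.8.0_292" scheme; Java 9+ report "11.0.2", "17.0.1", "21", ...
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))  # legacy scheme: "1.8" means Java 8
    return major


def java_executable() -> Optional[str]:
    """Prefer $JAVA_HOME/bin/java; if it is missing, fall back to `java` on PATH for this check."""
    java_home = os.environ.get("JAVA_HOME")
    if java_home:
        candidate = os.path.join(java_home, "bin", "java")
        if os.path.isfile(candidate):
            return candidate
    return shutil.which("java")


def installed_java_version() -> Optional[int]:
    """Run `java -version` and return its major version, or None if Java is not found."""
    java = java_executable()
    if java is None:
        return None
    # `java -version` writes to stderr, so check both streams
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    return java_major_version(result.stderr + result.stdout)


if __name__ == "__main__":
    print("JAVA_HOME:", os.environ.get("JAVA_HOME", "<not set>"))
    print("java executable:", java_executable() or "<not found>")
    print("Java major version:", installed_java_version())
```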
    
3. Launch Jupyter Notebook (`jupyter notebook`).
4. Start a Spark session. If the following code runs without errors, your setup works:

```python
from pyspark.sql import SparkSession

# Create a Spark session (which will run Spark jobs)
spark = SparkSession.builder.getOrCreate()
```
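
If the import fails with `ModuleNotFoundError: No module named 'pyspark'`, the notebook kernel is probably running a different Python from the `pip3` you installed into (see the conda note in the macOS instructions). Here is a minimal sketch to run in a notebook cell: it prints the kernel's interpreter and, if PySpark is missing, a pip command bound to that exact interpreter through `python -m pip`. `pyspark_available` and `pip_install_command` are hypothetical helpers, not part of PySpark.

```python
import importlib.util
import shlex
import sys
from typing import List


def pyspark_available() -> bool:
    """True if `pyspark` can be imported by *this* interpreter (e.g. the Jupyter kernel)."""
    return importlib.util.find_spec("pyspark") is not None


def pip_install_command(*packages: str) -> List[str]:
    """Build a pip command tied to the current interpreter, avoiding a mismatched `pip3` on PATH."""
    return [sys.executable, "-m", "pip", "install", *packages]


if __name__ == "__main__":
    print("Python interpreter:", sys.executable)
    if pyspark_available():
        print("pyspark is importable here")
    else:
        print("pyspark not found; try:", shlex.join(pip_install_command("pyspark<4")))
```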

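Once the session starts, try reading a CSV file into a Spark DataFrame:

```python
from pyspark.sql import SparkSession

# Create a Spark session (which will run Spark jobs)
spark = SparkSession.builder.getOrCreate()

# Render DataFrames as tables when they are the last expression in a notebook cell (configs are covered later on)
spark.conf.set('spark.sql.repl.eagerEval.enabled', True)

# Read the CSV, using its first row as column names
sdf = spark.read.csv('../data/sample.csv', header=True)

# Display the DataFrame
sdf
```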
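
The cell above expects a file at `../data/sample.csv`, relative to the notebook's working directory. With `header=True`, Spark takes column names from the first row; without `inferSchema=True`, every column is read as a string. If you don't have a file yet, here is a minimal sketch that writes a small placeholder CSV with the standard library. The rows and the `write_sample_csv` helper are hypothetical stand-ins, not real data.

```python
import csv
from pathlib import Path
from typing import Dict, List

# Hypothetical placeholder rows: NOT real data, only something for spark.read.csv to parse.
SAMPLE_ROWS: List[Dict[str, str]] = [
    {"id": "1", "name": "alpha", "value": "3.5"},
    {"id": "2", "name": "beta", "value": "7.25"},
    {"id": "3", "name": "gamma", "value": "1.0"},
]


def write_sample_csv(path: Path, rows: List[Dict[str, str]] = SAMPLE_ROWS) -> Path:
    """Write `rows` to `path` as CSV with a header row, creating parent folders as needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()  # becomes the column names when read with header=True
        writer.writerows(rows)
    return path
```

For example, call `write_sample_csv(Path('../data/sample.csv'))` once before running the Spark cell.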