## Creating an AWS EC2 instance with Jupyter installed and access to Earthdata S3 buckets

This guide is adapted from the original tutorial at https://github.com/podaac/the-coding-club/blob/main/notebooks/EC2-Jupyter-Setup.md. I found that some modifications were needed to get it running with a general AMI and from my Mac.

### Creating EC2 instance and SSH'ing

The instance is created from an AMI (Amazon Machine Image). An AMI is a template that contains the software configuration (operating system, application server, and applications) required to launch the instance.

**Create EC2**

1. Log into AWS, go to EC2, and go to Images -> AMI Catalog.
2. Go to Quickstart AMIs and select one (e.g. an Amazon Linux AMI), then click `Launch Instance with AMI`.
3. Follow prompts for EC2 specifications.
4. Create a key pair (the private key is a `.pem` file), or choose an existing key pair whose private key is already on your machine.
>Example: I am using the public/private key pair dean_compute_tests/dean_compute_tests.pem for my downscaling tests.

**SSH into the instance**

1. Start the instance in the AWS console.
2. Place the private key `.pem` file in the `.ssh/` folder in your home directory (`~/.ssh/`). Restrict the permissions of the key so that only you can read it, e.g. if the name of the key is `private_key.pem` then run
```
chmod 0400 ~/.ssh/private_key.pem
```
3. Add the following to the `config` file in the `.ssh/` folder:
```
Host <EC2_alias_of_choice>
    HostName                <instance_Private_IPv4_address>
    IdentityFile            ~/.ssh/<private_key_name.pem>
    User                    ec2-user
    LocalForward            xxxx localhost:xxxx
```
> where
> * `<EC2_alias_of_choice>` can be anything you want to refer to the instance by.
> * `<instance_Private_IPv4_address>` is the instance's private IPv4 address, e.g. 100.104.xx.xx
> * `<private_key_name.pem>` is the private key file.
> * I had success with `ec2-user` (the default user on Amazon Linux AMIs) rather than `jpluser`.
> * `xxxx` is the port number to tunnel through for Jupyter, e.g. 9881 (you may want to make this number different from that of any other entry in the config file).

> e.g.:
```
Host dean_computetests_t2xlarge
    HostName                100.104.58.84
    IdentityFile            ~/.ssh/dean_compute_tests.pem
    User                    ec2-user
    LocalForward            9881 localhost:9881
```


4. SSH into the EC2 instance with `ssh <EC2_alias_of_choice>`
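
Before connecting, the key permissions from step 2 and the advice in step 3 about choosing a port not used by another entry can be checked from the shell. The following is a minimal sketch, not part of the original tutorial: `key_perms` and `forwarded_ports` are hypothetical helper names, and the port check assumes each `LocalForward` entry is written as `LocalForward <port> <target>` on its own line.

```shell
# Hypothetical helpers (not from the original tutorial) for checking the
# SSH setup before connecting.

# Print the permission string of a key file, e.g. "-r--------" after chmod 0400.
key_perms() {
    ls -l "$1" | cut -c1-10
}

# Print every local port claimed by a LocalForward line in an SSH config file
# (defaults to ~/.ssh/config). Assumes the "LocalForward <port> <target>" layout.
forwarded_ports() {
    awk 'tolower($1) == "localforward" { print $2 }' "${1:-$HOME/.ssh/config}"
}

# Usage on your local machine:
#   key_perms ~/.ssh/private_key.pem     # owner read-only after chmod 0400
#   forwarded_ports | sort | uniq -d     # prints any port used more than once
```

If the second usage line prints nothing, no port is forwarded by more than one entry.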

### Get Jupyter and access to Earthdata S3 buckets in the EC2 instance
The following should be done from within the EC2 instance (i.e. after SSH'ing in).

**Get Jupyter**

1. Update the installed packages and install wget, git, and screen:
```
sudo yum update -y && sudo yum install wget git screen -y
```
2. Download the Miniconda install script and execute it with bash. Go through the setup prompts and enter `yes` when asked whether `conda init` should be run.
```
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

bash Miniconda3-latest-Linux-x86_64.sh
```
3. Re-run the `.bashrc` file to add conda to `PATH` (this doesn't need to be done for subsequent uses of the instance):
```
source ~/.bashrc
```
4. Create a new environment called jupyter running Python 3.7; activate it; install JupyterLab and other required packages.
```
conda create -n jupyter python=3.7 -y && \
  conda activate jupyter
  
conda install -c conda-forge requests tqdm -y

conda install -c conda-forge xarray h5netcdf h5py netCDF4 -y

conda install -c conda-forge jupyter -y

conda install -c conda-forge dask -y

conda install -c conda-forge s3fs -y
```
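5. Use Python to generate a hashed password and store it in an exported shell variable. Exporting it lets a script launched from this shell (such as the one in the next step) read `$PW`. The variable lasts only for the current login session, so repeat this step after logging in again:
```
export PW="$(python3 -c 'from notebook.auth import passwd; import getpass; print(passwd(getpass.getpass(), algorithm="sha256"))')"
```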
6. Write the following into a bash script (e.g. `start_jupyter.sh`):
```
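# Make conda accessible from any dir. Adjust the path if Miniconda is
# installed elsewhere (the installer's default location is ~/miniconda3).
source ~/conda/bin/activate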

conda activate jupyter

fuser -k 9881/tcp # Kill any process already using port 9881 so the port is free

jupyter lab \
    --port=9881 \
    --ip='127.0.0.1' \
    --NotebookApp.token='' \
    --NotebookApp.password="$PW" \
    --notebook-dir="$HOME" \
    --no-browser \
    &
```
> where the port number (`9881` in the above example) should be modified as needed so that it matches the `LocalForward` port in your SSH config.

7. Make the bash script executable, e.g.
```
chmod u+x start_jupyter.sh
```

8. Optionally, run the following command once. It appends a line to your `~/.bash_profile` that prints the running Jupyter servers upon SSH login:
```
printf '\n~/conda/envs/jupyter/bin/jupyter server list && echo\n\n' >> ~/.bash_profile
```
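
After running `./start_jupyter.sh`, it can be useful to confirm that something is actually listening on the chosen port before opening `http://localhost:9881` in a browser on your local machine. The following is a minimal sketch under stated assumptions: `port_is_listening` is a hypothetical helper name, and it relies on bash's `/dev/tcp` pseudo-device, so it should be run in bash.

```shell
# Hypothetical helper: succeed if something accepts TCP connections on
# 127.0.0.1:<port>. Uses bash's /dev/tcp pseudo-device; the subshell keeps
# the temporary file descriptor from leaking into the calling shell.
port_is_listening() {
    (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# Usage on the instance, after starting Jupyter:
#   port_is_listening 9881 && echo "Jupyter is up" || echo "nothing on 9881 yet"
```

The same check run on your local machine tells you whether the SSH tunnel from the `LocalForward` entry is active.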



**Getting access to Earthdata S3 buckets in the EC2 instance**

1. Add the following to the `.netrc` file in the home directory of the EC2 instance (`~/.netrc`):
```
machine urs.earthdata.nasa.gov
    login <Earthdata username>
    password <Earthdata password>
```
where `<Earthdata username>` and `<Earthdata password>` are your Earthdata login credentials.

2. Python code that uses the `s3fs` module (e.g. in the PO.DAAC cookbook and coding club example notebooks) can now be used to access the S3 buckets.
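
Because `~/.netrc` holds the Earthdata password in plain text, it is worth restricting the file to your own user. The following is a minimal sketch that checks both the entry from step 1 and the file permissions; `netrc_ready` is a hypothetical helper name, not part of any library.

```shell
# Hypothetical helper: succeed only if the netrc file (default ~/.netrc) has an
# entry for the Earthdata Login host and grants no access to group or others.
netrc_ready() {
    local file="${1:-$HOME/.netrc}"
    grep -Eq '^[[:space:]]*machine[[:space:]]+urs\.earthdata\.nasa\.gov' "$file" || return 1
    # Characters 5-10 of the `ls -l` mode string are the group and other bits.
    case "$(ls -l "$file" | cut -c5-10)" in
        ------) return 0 ;;
        *)      return 1 ;;
    esac
}

# Usage on the instance:
#   chmod 600 ~/.netrc
#   netrc_ready && echo ".netrc looks fine"
```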