![encodelogo](images/encodelogo.gif)

# Exploring ENCODE data from EC2 with Jupyter notebook

This notebook demonstrates how to mount *s3://encode-public* on an EC2 instance using Goofys, which makes an S3 bucket appear as a normal file system and is useful for tools that expect a local file path. Once the bucket is mounted, we can launch a Jupyter notebook on the instance and connect to it remotely. The benefit of using EC2 is that the compute can be scaled to the analysis you would like to perform, and you don't have to download any data to your local machine.

# Spin up instance

We will log into our AWS console and start an EC2 instance from a base Ubuntu image (it is also possible to find images that include most of the dependencies that we will install manually below).

![launch1](images/ec2_goofys_jupyter/launch1.png)

![launch2](images/ec2_goofys_jupyter/launch2.png)

![launch3](images/ec2_goofys_jupyter/launch3.png)

![launch4](images/ec2_goofys_jupyter/launch4.png)

![launch5](images/ec2_goofys_jupyter/launch5.png)

For this example we will use the `t2.xlarge` instance type. Make sure to provide or create a key pair for your instance so that we can SSH in later.

# SSH into the instance

Search for the instance you just created and find its public DNS.

![launch6](images/ec2_goofys_jupyter/launch6.png)

Open a terminal and connect to the instance using SSH, filling in the path to your private key file and your instance's public DNS:
```
$ ssh -i ~/.ssh/keenan.pem ubuntu@ec2-54-191-241-6.us-west-2.compute.amazonaws.com
```

# Install dependencies

We will install:

[Anaconda](https://www.anaconda.com/distribution/)
```
$ curl -O https://repo.anaconda.com/archive/Anaconda3-2019.03-Linux-x86_64.sh
$ bash Anaconda3-2019.03-Linux-x86_64.sh
$ source ~/.bashrc
$ conda create -n encode-public python=3.7
$ conda activate encode-public
```

[awscli](https://github.com/aws/aws-cli)
```
$ pip install awscli
```

[Jupyter notebook](https://jupyter.org/)
```
$ conda install jupyter
```

[Go](https://golang.org/)
```
$ sudo apt-get update
$ sudo apt-get install golang-go
```


[Goofys](https://github.com/kahing/goofys)
```
$ export GOPATH=$HOME/work
$ go get github.com/kahing/goofys
$ go install github.com/kahing/goofys
```

[Tree](http://manpages.ubuntu.com/manpages/trusty/man1/tree.1.html)
```
$ sudo apt-get install tree
```
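
Before moving on, it can help to confirm that the tools installed above are actually reachable from the shell. The following is a minimal sketch, not part of the original walkthrough: `missing_tools` is a hypothetical helper name, and the command names (`conda`, `aws`, `jupyter`, `go`, `tree`) are assumed to be the executables provided by the packages above. Goofys is left out because `go install` places it in `$GOPATH/bin`, which may not be on your `PATH`.

```python
import shutil


def missing_tools(names):
    """Return the command names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


# Assumed executable names for the packages installed above.
# An empty list means every command was found.
print(missing_tools(["conda", "aws", "jupyter", "go", "tree"]))
```

Save this as a script and run it with `python` inside the `encode-public` environment; any name that is printed still needs to be installed or added to `PATH`.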

# Mount S3 bucket

Goofys expects valid AWS credentials (though they don't need to have permission to do anything since we are mounting a public bucket). Run `aws configure` and enter your *aws_access_key_id*, *aws_secret_access_key*, and default region (e.g. `us-west-2`).

Mount *s3://encode-public* to a local folder called *encode-public*:

```
$ mkdir encode-public
$ $GOPATH/bin/goofys encode-public encode-public/
```
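
To check that the mount succeeded, one option is to ask the operating system whether the folder is now a mount point and to list what it contains. This is a rough sketch under the assumption that the mount folder is the `encode-public` directory created above; `describe_mount` is a hypothetical helper, not part of Goofys.

```python
import os


def describe_mount(path):
    """Return (is_mount, entries) for a directory path.

    is_mount is True when the path is a mount point; entries is the
    sorted directory listing, or an empty list if the path is not a
    directory.
    """
    is_mount = os.path.ismount(path)
    entries = sorted(os.listdir(path)) if os.path.isdir(path) else []
    return is_mount, entries


# Assumes the script is run from the directory that contains encode-public/.
print(describe_mount("encode-public"))
```

If the first value is `False` or the listing is empty, the bucket is probably not mounted and the Goofys command is worth re-running.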

# Start Jupyter notebook

Now we can run a Jupyter notebook on the EC2 instance and connect to it remotely.
```
$ jupyter notebook --no-browser --port=8888
```
Note the token in the returned URL (e.g. http://localhost:8888/?token=213b9a2799fe83807ab9e2e1254677ed3eb82cea9d05f452).
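
If you would rather not copy the token by hand, it can be pulled out of the URL with the standard library. This is only an illustrative sketch: `extract_token` is a hypothetical helper, and it assumes the token appears as a `token` query parameter, as in the example URL above.

```python
from urllib.parse import parse_qs, urlparse


def extract_token(url):
    """Return the value of the `token` query parameter, or None if absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("token", [])
    return values[0] if values else None


# Example URL copied from the text above.
url = "http://localhost:8888/?token=213b9a2799fe83807ab9e2e1254677ed3eb82cea9d05f452"
print(extract_token(url))
```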

# Link local port to remote port

Open another terminal window on your local machine and run (again filling in your key file and instance address):

```
$ ssh -i ~/.ssh/keenan.pem -L 8000:localhost:8888 ubuntu@ec2-54-191-241-6.us-west-2.compute.amazonaws.com
```

This forwards port 8000 on your local machine to the Jupyter notebook running on port 8888 of your EC2 instance. Launch a browser and go to `localhost:8000`. You should see a Jupyter window asking for the token noted above.

![launch7](images/ec2_goofys_jupyter/launch7.png)
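
If the browser cannot reach the page, a quick way to tell whether the tunnel is up is to try a TCP connection to the forwarded port from your local machine. The sketch below is an assumption-laden illustration: `port_is_open` is a hypothetical helper, and it assumes the local end of the tunnel listens on `127.0.0.1:8000` as in the SSH command above.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


# Run this on your local machine while the SSH tunnel is active.
print(port_is_open("127.0.0.1", 8000))
```

A result of `False` suggests the SSH session with `-L` is not running, or that a different local port was chosen.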

# Create notebook

Create a new Jupyter notebook using Python 3.

![launch8](images/ec2_goofys_jupyter/launch8.png)

# Open ENCODE file using local path

In the notebook we can `ls` the *encode-public* folder to list the contents of the S3 bucket.

In [1]:
```
!ls encode-public/
```

```
2008  2010  2012  2014  2016  2018  encode_file_manifest.tsv
2009  2011  2013  2015  2017  2019  robots.txt
```
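
Because the bucket is mounted, files in it can be opened with ordinary Python file handling. As a hedged example, the sketch below peeks at `encode_file_manifest.tsv` from the listing above. The manifest's columns are not described here, so the code makes no assumption about them beyond the file being tab-separated with a header row; `read_manifest_rows` is a hypothetical helper. It reads only the first few rows, since the manifest may be large.

```python
import csv
import os


def read_manifest_rows(path, limit=5):
    """Return (header, rows) with at most `limit` data rows from a TSV file."""
    with open(path, newline="", encoding="utf-8", errors="replace") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader, [])
        # zip stops once range(limit) is exhausted, so the rest is never read.
        rows = [row for _, row in zip(range(limit), reader)]
    return header, rows


# Path taken from the directory listing above; skipped if the bucket is not mounted.
manifest = "encode-public/encode_file_manifest.tsv"
if os.path.exists(manifest):
    header, rows = read_manifest_rows(manifest)
    print(header)
    print(len(rows), "rows read")
```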
