# Running Jupyter in a Docker Container on an EC2 Instance






[Docker Reference](http://docs.aws.amazon.com/AmazonECS/latest/developerguide/docker-basics.html)



According to the Docker website, Docker is the world’s leading software container platform. Developers use Docker to eliminate "works on my machine" problems when collaborating on code with co-workers. Operators use Docker to run and manage apps side-by-side in isolated containers to get better compute density. Enterprises use Docker to build agile software delivery pipelines to ship new features faster, more securely and with confidence for Linux, Windows Server, and Linux-on-mainframe apps.

#### What is a Container?

Containers are a way to package software in a format that can run isolated on a shared operating system. Unlike VMs, containers do not bundle a full operating system; only the libraries and settings required to make the software work are included. This makes for efficient, lightweight, self-contained systems and guarantees that software will always run the same, regardless of where it’s deployed.


#### Docker For Developers

Docker automates the repetitive tasks of setting up and configuring development environments so that developers can focus on what matters: building great software.

In this notebook, we set up a Docker container inside an EC2 instance and use it to run Jupyter.

### Overview of the steps

* Launch an EC2 instance with the Amazon Linux AMI.
* SSH into the instance.
* Update the packages and package cache on the instance.
* Install additional required packages.
* Install Docker on the instance and run a Docker image that already has Jupyter installed.
* Open the Jupyter server running in the Docker container from your local browser.


In [None]:
################################### SET THE FOLLOWING PARAMETERS ###################################################
#Set the AWS Region
region = 'us-east-1'


#Set the AMI ID (an Amazon Linux AMI; AMI IDs are region-specific, so it must exist in the region above)
ami_image = 'ami-8c1be5f6'

#Set the AWS Access ID (Given to you by the DSA staff: "Access key ID")
access_id = 'Put your access key id here'

#Set the AWS Access Key (Given to you by the DSA staff: "Secret access key")
access_key = 'Put your secret access key here'

#Security group name: Add your pawprint here
Sec_group_name= "Docker_your_pawprint_Sec_group"


We will use the Boto3 Python package to work with AWS services. 
Boto3 is the Amazon Web Services (AWS) Software Development Kit (SDK) for Python, 
which allows Python developers to write software that uses services like Amazon S3 and Amazon EC2. 
Boto3 has two distinct levels of APIs. Client (or "low-level") APIs provide one-to-one mappings to the underlying HTTP API operations. 
Resource APIs hide explicit network calls and instead provide resource objects and collections to access attributes and perform actions.


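It is often unclear when to use a client and when to use a resource. 
For simple tasks the two feel similar: a client method returns a plain dictionary that mirrors the HTTP response, while a resource gives you Python objects with attributes and methods (this notebook uses a client for most calls and a resource to terminate the instance). 
Note that the Boto3 team has stated it does not plan to add new features to the resource interface, so newer AWS features are available only through clients.
The readings below explain the difference between clients and resources in more detail.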



[Client](http://boto3.readthedocs.io/en/latest/guide/clients.html)

[Resource](http://boto3.readthedocs.io/en/latest/guide/resources.html)

### Create an EC2 client object




In [None]:
#Import AWS' Python Based DEVOPS tools
import boto3
from botocore.exceptions import ClientError

#Import System Tools
import collections
import json
import os
import datetime
import pandas
import time
import getpass
from subprocess import call

#Set the username from system
system_user_name=getpass.getuser()

# Client interface.
# Establish credentials/session
ec2 = boto3.client(
    'ec2', 
    region_name=region,
    aws_access_key_id=access_id,
    aws_secret_access_key=access_key
)

In module 1 we created a security group from the web console. Here we create a similar security group with Boto3.

In [None]:
sg = ec2.create_security_group(
    Description='security grp for docker',
    GroupName=Sec_group_name   # We have set this variable above
)
Sec_group=sg["GroupId"]     # Sec_group should have the new security group ID.

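As in module 1, we have to SSH into the EC2 instance, so the security group must allow SSH requests and other inbound TCP traffic. The cell below configures the inbound rules. For simplicity, the first rule opens all TCP ports to any IPv4 address (0.0.0.0/0); in practice you should restrict the CIDR block to MU's address range.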

In [None]:
#Modify Security Configuration to allow inbound traffic
# NOTE: 0.0.0.0/0 means "any IPv4 address". Outside of this lab, restrict it to MU's range.

# (description, IpPermissions entry) pairs.
# For ICMP, FromPort/ToPort hold the ICMP type/code, and -1 means "all".
ingress_rules = [
    ("ALL TCP from anywhere",
     {'IpProtocol': 'tcp', 'FromPort': 0, 'ToPort': 65535,
      'IpRanges': [{'CidrIp': '0.0.0.0/0'}]}),
    ("ALL TCP from this security group",
     {'IpProtocol': 'tcp', 'FromPort': 0, 'ToPort': 65535,
      'UserIdGroupPairs': [{'GroupId': Sec_group}]}),
    ("Custom ICMP Rule - IPv4",
     {'IpProtocol': 'icmp', 'FromPort': 0, 'ToPort': -1,
      'IpRanges': [{'CidrIp': '173.31.192.195/32'}]}),
    ("ALL UDP from this security group",
     {'IpProtocol': 'udp', 'FromPort': 0, 'ToPort': 65535,
      'UserIdGroupPairs': [{'GroupId': Sec_group}]}),
    ("ALL ICMP from this security group",
     {'IpProtocol': 'icmp', 'FromPort': -1, 'ToPort': -1,
      'UserIdGroupPairs': [{'GroupId': Sec_group}]}),
    ("ALL ICMP from anywhere",
     {'IpProtocol': 'icmp', 'FromPort': -1, 'ToPort': -1,
      'IpRanges': [{'CidrIp': '0.0.0.0/0'}]}),
]

for sec_rule, permission in ingress_rules:
    try:
        ec2.authorize_security_group_ingress(GroupId=Sec_group, IpPermissions=[permission])
        print("Ingress "+sec_rule+" added")
    except ClientError as e:
        # Re-running the cell hits rules that already exist; any other error is a real problem
        if e.response['Error']['Code'] == 'InvalidPermission.Duplicate':
            print(sec_rule+" already added")
        else:
            raise
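
Every rule above is a dictionary with the same handful of keys, so a mistyped port or CIDR block is easy to miss. The sketch below is a hypothetical helper, `make_ip_permission` (not part of Boto3), that builds one `IpPermissions` entry and validates it with Python's standard `ipaddress` module before anything is sent to AWS. It also shows why the CIDR matters: a block such as `0.0.0.0/16` covers only the 65,536 addresses from 0.0.0.0 to 0.0.255.255, while `0.0.0.0/0` means any IPv4 address.

```python
import ipaddress


def make_ip_permission(protocol, from_port, to_port, cidr=None, source_group_id=None):
    """Build one entry for the IpPermissions list of authorize_security_group_ingress.

    Exactly one source must be given: an IPv4 CIDR block or a security group ID.
    For ICMP, from_port/to_port carry the ICMP type/code, and -1 means "all".
    """
    if (cidr is None) == (source_group_id is None):
        raise ValueError("pass exactly one of cidr or source_group_id")
    if protocol in ("tcp", "udp") and not (0 <= from_port <= to_port <= 65535):
        raise ValueError("TCP/UDP ports must satisfy 0 <= from_port <= to_port <= 65535")

    permission = {"IpProtocol": protocol, "FromPort": from_port, "ToPort": to_port}
    if cidr is not None:
        # strict=True raises ValueError for blocks with host bits set, e.g. '10.0.0.5/24'
        network = ipaddress.IPv4Network(cidr, strict=True)
        permission["IpRanges"] = [{"CidrIp": str(network)}]
    else:
        permission["UserIdGroupPairs"] = [{"GroupId": source_group_id}]
    return permission


print(ipaddress.IPv4Network("0.0.0.0/16").num_addresses)  # 65536
print(make_ip_permission("tcp", 22, 22, cidr="0.0.0.0/0"))
```

The result can be passed straight to the client, for example `IpPermissions=[make_ip_permission('tcp', 22, 22, cidr='0.0.0.0/0')]`, assuming an SSH-only rule fits your setup.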


Create a key pair for the EC2 instance. We first generate a unique key name from the current date, time and your user name, then call `ec2.create_key_pair()` with that name. The response contains the private key (`KeyMaterial`), which we write to a local file named after the key pair, with a `.pem` extension.

SSH refuses to use a private key file that other users can read, so we make the file read-only for its owner with `os.chmod(file, 0o400)`.

In [None]:
import time 
import os

ec2_pem_file=time.strftime("EC2-%d%m%Y%H%M%S-"+system_user_name)
ec2_key=ec2.create_key_pair(KeyName=ec2_pem_file)

#Don't print the private key unless you have a good reason
#print(ec2_key['KeyMaterial'])

# Write the private key with Python rather than a shell echo command, so quotes or
# special characters in the key material cannot break the command
with open(ec2_pem_file+".pem", "w") as pem_file:
    pem_file.write(ec2_key['KeyMaterial'])
os.chmod(ec2_pem_file+".pem",0o400)

print("KeyName         : "+ec2_key['KeyName']+"\nKey Fingerprint : "+ec2_key['KeyFingerprint'])
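
Writing the file first and calling `chmod` afterwards leaves a brief window in which the key has default permissions. A minimal sketch of a stricter approach, assuming a POSIX system (Linux or macOS): `save_private_key` is a hypothetical helper that creates the file with mode `0o400` from the start via `os.open`, and refuses to overwrite an existing key file.

```python
import os


def save_private_key(key_material, path):
    """Write an SSH private key readable only by its owner (mode 0o400).

    O_EXCL makes the call fail with FileExistsError instead of overwriting an
    existing file. The restrictive mode applies from the moment the file is
    created, and the descriptor returned here is still writable because this
    call created the file.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o400)
    with os.fdopen(fd, "w") as key_file:
        key_file.write(key_material)
        if not key_material.endswith("\n"):
            key_file.write("\n")  # end the file with a newline, as shell echo would
    # Set the mode explicitly in case an unusual umask removed the owner-read bit
    os.chmod(path, 0o400)
    return path


# In this notebook it would be called as:
# save_private_key(ec2_key['KeyMaterial'], ec2_pem_file + ".pem")
```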

Launch an instance using the key pair and the security group created above. We only need one instance.

**MaxCount:** The maximum number of instances to launch. If you specify more instances than Amazon EC2 can launch in the target Availability Zone, Amazon EC2 launches the largest possible number of instances above MinCount.


**MinCount:** The minimum number of instances to launch. If you specify a minimum that is more instances than Amazon EC2 can launch in the target Availability Zone, Amazon EC2 launches no instances.

Constraints: Between 1 and the maximum number you're allowed for the specified instance type.
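
The MinCount/MaxCount rule described above can be read as a small function. The sketch below is a toy model, not an AWS API: `instances_launched` and its `available` argument are hypothetical, with `available` standing in for the capacity EC2 has for that instance type in the target Availability Zone.

```python
def instances_launched(min_count, max_count, available):
    """Toy model of run_instances: launch as many instances as fit, up to
    max_count, or launch nothing if fewer than min_count fit."""
    if not 1 <= min_count <= max_count:
        raise ValueError("need 1 <= MinCount <= MaxCount")
    if available < min_count:
        return 0
    return min(max_count, available)


print(instances_launched(1, 1, 100))  # this notebook's settings: 1
print(instances_launched(2, 5, 3))    # partial launch: 3
print(instances_launched(4, 5, 3))    # below MinCount: 0
```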

In `TagSpecifications`, we give the instance a `Name` tag with the value `Docker_Jupyter` so that we can identify it. 

In [None]:
# Create Instance
instances = ec2.run_instances(
    ImageId=ami_image,
    MinCount=1,
    MaxCount=1,
    KeyName=ec2_pem_file,
    TagSpecifications=[
        {
            'ResourceType': 'instance',
            'Tags': [
                        {   'Key': 'Name',
                            'Value': 'Docker_Jupyter'
                        }
                    ]
        }
    ],
    InstanceType="t2.micro",
    SecurityGroupIds=[
        Sec_group
    ],
)
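
The nested `TagSpecifications` structure above is easy to mistype. A small sketch, assuming you only want to tag the instance itself: `build_tag_specifications` is a hypothetical helper (not part of Boto3) that turns a plain dictionary into that structure. The `Owner` tag in the example is also hypothetical.

```python
def build_tag_specifications(tags, resource_type="instance"):
    """Convert {'Key': 'Value', ...} into the TagSpecifications list used by run_instances."""
    return [
        {
            "ResourceType": resource_type,
            "Tags": [{"Key": key, "Value": value} for key, value in tags.items()],
        }
    ]


spec = build_tag_specifications({"Name": "Docker_Jupyter", "Owner": "your_pawprint"})
print(spec)
```

It could replace the literal list in the cell above: `TagSpecifications=build_tag_specifications({'Name': 'Docker_Jupyter'})`.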

Get the instance ID of the new instance. The variable `instances` holds the `run_instances` response from the cell above; it's a dictionary. 

- In the cell below, `instances["Instances"]` returns the value stored under the key `"Instances"`: a list with the details of each launched instance.
- We launched exactly one instance (MinCount and MaxCount are both 1), so we access it with index 0. 
- Finally, we store that instance's `InstanceId` in the `new_instance_id` variable.

In [None]:
new_instance_id = instances["Instances"][0]["InstanceId"]

Using the instance ID captured above, call `describe_instances()` to get the instance details, including its public DNS address. 
Passing the ID in `InstanceIds`, as shown below, restricts the results to the instance we created in this notebook. 
Without this filter, we would get details for every instance in the region. 
The public DNS name can be empty while the instance is still pending; if `instance_pub_dns` comes back empty, re-run these two cells after the polling step has reported that the instance is running.

```
InstanceIds=[
        new_instance_id,
    ]

```

In [None]:
inst_det = ec2.describe_instances(
    InstanceIds=[
        new_instance_id,
    ]
)

Get the public DNS name of the new instance. The variable `inst_det` holds the instance details, such as the public DNS name, public IP address and private IP address. Again, it's a dictionary. 

- In the cell below, `inst_det["Reservations"]` returns the list of reservations. 
- We access the only reservation in that list with index 0. 
- `inst_det["Reservations"][0]["Instances"][0]` then selects the first (and only) instance in that reservation.
- Finally, we store the instance's `PublicDnsName` in `instance_pub_dns`.

In [None]:
instance_pub_dns=inst_det["Reservations"][0]["Instances"][0]["PublicDnsName"]
instance_pub_dns
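
`run_instances` and `describe_instances` nest the instance details differently: the first returns `{"Instances": [...]}`, while the second wraps instances in reservations. The sketch below is a hypothetical helper, `first_instance`, that handles both shapes and fails clearly when the list is empty. The sample responses are made up and trimmed to the keys this notebook reads; real responses contain many more fields.

```python
def first_instance(response):
    """Return the first instance dict from a run_instances or describe_instances response.

    run_instances returns {'Instances': [...]}, while describe_instances wraps
    instances in reservations: {'Reservations': [{'Instances': [...]}]}.
    """
    if "Reservations" in response:
        reservations = response["Reservations"]
        if not reservations or not reservations[0]["Instances"]:
            raise LookupError("no instances in response")
        return reservations[0]["Instances"][0]
    if not response.get("Instances"):
        raise LookupError("no instances in response")
    return response["Instances"][0]


# Made-up, trimmed responses containing only the keys used in this notebook
run_response = {"Instances": [{"InstanceId": "i-0123456789abcdef0"}]}
describe_response = {
    "Reservations": [
        {"Instances": [{"InstanceId": "i-0123456789abcdef0",
                        "PublicDnsName": "",  # can be empty while the instance is pending
                        "State": {"Name": "pending"}}]}
    ]
}

print(first_instance(run_response)["InstanceId"])
instance = first_instance(describe_response)
print(instance["State"]["Name"], repr(instance["PublicDnsName"]))
```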

The function below accepts an EC2 client and an instance ID. 
It uses the same `describe_instances()` call as above, but reads the state of the instance instead of its DNS name. 
Each pass prints the current state; once the instance is `running` or `terminated`, the function breaks out of the `while` loop.

While the instance is in any other state (for example `pending` or `shutting-down`), the function keeps polling. 
Between checks, `time.sleep()` pauses for the current delay, which grows by a random factor each time (exponential backoff with jitter) and is capped at 30 seconds:

```
delay = min(delay * random.uniform(1.1, 2.0), 30)
time.sleep(delay)
```

In [None]:
import datetime
import random
import time

def poll_until_completed(client, ins_id, max_delay=30):
    delay = 2
    while True:
        instance = client.describe_instances(InstanceIds=[ins_id,])
        status = instance["Reservations"][0]["Instances"][0]["State"]["Name"]
        now = str(datetime.datetime.now().time())
    
        print("instance %s is %s at %s" % (ins_id, status, now))
        if status in ['running','terminated']:
            break

        # exponential backoff with jitter, capped at max_delay seconds
        delay = min(delay * random.uniform(1.1, 2.0), max_delay)
        time.sleep(delay)
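
The waiting logic itself does not depend on EC2. Below is a sketch of the same backoff loop as a generic, hypothetical helper, `poll_with_backoff`, which takes any status function, adds an overall timeout, and accepts a `sleep` function so that it can be tested without actually waiting. (Boto3 also ships built-in waiters for this purpose, for example `ec2.get_waiter('instance_running')`.)

```python
import random
import time


def poll_with_backoff(get_status, done_states, initial_delay=2.0, max_delay=30.0,
                      timeout=600.0, sleep=time.sleep):
    """Call get_status() until it returns a value in done_states.

    Between calls the delay grows by a random factor in [1.1, 2.0]
    (exponential backoff with jitter) but never exceeds max_delay.
    Raises TimeoutError if the status is still not done after sleeping
    at least `timeout` seconds in total.
    """
    delay = initial_delay
    slept = 0.0
    while True:
        status = get_status()
        if status in done_states:
            return status
        if slept >= timeout:
            raise TimeoutError(f"still {status!r} after {slept:.0f}s")
        delay = min(delay * random.uniform(1.1, 2.0), max_delay)
        sleep(delay)
        slept += delay


# Hypothetical usage with the EC2 client from this notebook:
# get_state = lambda: ec2.describe_instances(InstanceIds=[new_instance_id])[
#     "Reservations"][0]["Instances"][0]["State"]["Name"]
# poll_with_backoff(get_state, {"running", "terminated"})
```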

Call `poll_until_completed()` with the EC2 client and the instance ID as parameters. 

In [None]:
import random
import time

poll_until_completed(ec2, new_instance_id)  # Wait until the instance is running before using it

### Upload a file to S3

You will access this file from the Jupyter notebook that runs inside Docker.

"bgg_db_2017_03.csv" is a file in your local working directory. Upload it to S3 so that the Jupyter server running in the Docker container on the EC2 instance can read the same file. 

In [None]:
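# Creating the Connection

import boto3
s3 = boto3.client('s3', 
                   region_name=region,
                   aws_access_key_id = access_id, 
                   aws_secret_access_key = access_key)

In [None]:
bucket_name= system_user_name+time.strftime(".%d%m%Y%H%M%S")+'.dsabucket'
# Create the bucket in the same region as the EC2 instance. us-east-1 is the
# default location and must not be passed as a LocationConstraint.
if region == 'us-east-1':
    s3.create_bucket(Bucket=bucket_name)
else:
    s3.create_bucket(Bucket=bucket_name, CreateBucketConfiguration={
        'LocationConstraint': region})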

In [None]:
bucket_name
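
S3 bucket names must be 3 to 63 characters long, may contain only lowercase letters, digits, dots and hyphens (with no two dots in a row), and must start and end with a letter or digit. A user name such as `John_Doe` therefore produces an invalid name with the pattern above. A minimal sketch of a sanitizing helper, assuming you keep the notebook's `user.DDMMYYYYHHMMSS.dsabucket` pattern; `make_bucket_name` is hypothetical and only covers the rules listed here.

```python
import re
from datetime import datetime


def make_bucket_name(user, now=None, suffix="dsabucket"):
    """Build user.DDMMYYYYHHMMSS.suffix while respecting the basic S3 naming rules."""
    now = now or datetime.now()
    tail = "." + now.strftime("%d%m%Y%H%M%S") + "." + suffix
    max_user_len = 63 - len(tail)
    if max_user_len < 1:
        raise ValueError("suffix too long for a 63-character bucket name")
    # Lowercase, replace disallowed characters (such as '_') with '-', collapse '..'
    safe_user = re.sub(r"[^a-z0-9.-]", "-", user.lower())
    safe_user = re.sub(r"\.{2,}", ".", safe_user)
    # Trim to fit, and make the user part start and end with a letter or digit
    safe_user = safe_user[:max_user_len].strip(".-") or "u"
    return safe_user + tail


print(make_bucket_name("John_Doe"))
```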

In [None]:
# Upload the local CSV file to the bucket under the key board_games.csv
s3.upload_file('bgg_db_2017_03.csv',bucket_name,'board_games.csv')

### SSH into the EC2 instance

SSH into the EC2 instance you just created from a terminal. 

Print the key pair file name:

In [None]:
print("keypair file name:",ec2_pem_file)

* Open up a terminal.


* Step 1: Change into the course folder, the folder that contains this notebook and the key pair file.


* Step 2: Update the permissions on the key pair file.

    Run the cell below, copy the output, and paste it into the terminal. This makes the key pair file read-only.

In [None]:
print("chmod 400 "+os.getcwd()+"/"+ec2_pem_file+".pem")

* Run the cell below and copy the output. 

* Paste the output into the terminal and press Enter.

In [None]:
print("ssh -i "+os.getcwd()+"/"+ec2_pem_file+".pem ec2-user@"+instance_pub_dns)

<img src="../images/SSH_command.PNG">
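
The two `print` cells above produce broken commands if the notebook folder's path contains spaces, because the shell splits the path into several arguments. A small sketch that quotes the path with Python's `shlex.quote`; `build_ssh_commands` is a hypothetical helper, and it assumes a POSIX shell such as bash or zsh (Windows `cmd.exe` quotes differently).

```python
import os
import shlex


def build_ssh_commands(key_file, public_dns, user="ec2-user", directory=None):
    """Return (chmod command, ssh command) with the key path safely quoted."""
    directory = directory or os.getcwd()
    key_path = os.path.join(directory, key_file)
    chmod_cmd = "chmod 400 " + shlex.quote(key_path)
    ssh_cmd = "ssh -i " + shlex.quote(key_path) + " " + shlex.quote(f"{user}@{public_dns}")
    return chmod_cmd, ssh_cmd


for line in build_ssh_commands("EC2-demo.pem", "ec2-1-2-3-4.compute-1.amazonaws.com",
                               directory="/home/student/my labs"):
    print(line)
```

In this notebook you would call it as `build_ssh_commands(ec2_pem_file + ".pem", instance_pub_dns)`.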

## Run the following commands 

The next 5 screenshots show how to update the packages and install the new software. 

* sudo su

* yum update -y


<img src="../images/update_packages.png">

----



* yum install -y docker

* service docker start


<img src="../images/install_docker.PNG">

* usermod -a -G docker ec2-user

* yum install python38

* yum -y update


<img src="../images/update_packages_again.png">

* wget https://bootstrap.pypa.io/get-pip.py

* python3 get-pip.py

* /usr/local/bin/pip install boto3


<img src="../images/install_pip_boto3.PNG">

### Start docker service and download docker image that has Jupyter installed

* sudo service docker start

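* docker run -it --rm -p 8888:8888 jupyter/scipy-notebook

    `-it` attaches an interactive terminal, `--rm` removes the container when it stops, and `-p 8888:8888` maps port 8888 on the EC2 instance to port 8888 in the container, where Jupyter listens. The first run downloads the image, which can take a few minutes.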


<img src="../images/run_docker.PNG">

<br>
Docker prints a URL with a token. This URL lets you open the Jupyter server running on the EC2 instance. Copy the URL from the output (**select the URL with the mouse, right-click, and then copy**) and paste it into a browser window. Before loading it, replace `localhost` (or `127.0.0.1`) in the URL with the public DNS address of the instance. 

For example (not a usable link),

http://ec2-54-201-248-103.us-west-2.compute.amazonaws.com:8888/?token=d8e2791504cb6623b3ab0d97f69aa74c583457f59f379c9a

-----
<br>
Paste the URL in your local browser


<img src="../images/jupyter_running.PNG">
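
The URL edit described above can also be done in Python. A sketch using the standard `urllib.parse` module; `to_public_url` is a hypothetical helper that swaps only the host and keeps the scheme, port, path and token. The token in the example is made up.

```python
from urllib.parse import urlsplit, urlunsplit


def to_public_url(jupyter_url, public_dns):
    """Replace the host in the URL printed by Jupyter with the instance's public DNS name."""
    parts = urlsplit(jupyter_url.strip())
    # Keep the original port (8888 here), since that is the port docker run publishes
    netloc = public_dns if parts.port is None else f"{public_dns}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


print(to_public_url("http://localhost:8888/?token=abc123",
                    "ec2-54-201-248-103.us-west-2.compute.amazonaws.com"))
```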

### Download Access_S3.ipynb notebook


Now that Jupyter is running in the Docker container, let's run a Python notebook there. The notebook "Access_S3.ipynb" is in the current directory "CloudComputingDataAnalytics/module2/labs/". In your local Jupyter file browser, select Access_S3.ipynb and click **Download** to save it to your local machine.



### Upload the file Access_S3.ipynb


Go to the Jupyter server running in Docker and use its upload button to upload Access_S3.ipynb. 
This copies the Access_S3 notebook from your local machine into Jupyter on Docker.



* Once Jupyter is up and running, open a terminal in the Docker Jupyter server and run the command below to install boto3:
    
    pip install boto3
    
* Close the terminal.

* Open Access_S3.ipynb and run all of its cells. The board games dataset should be downloaded into the Docker Jupyter server.

### Delete SSH Keypair


In [None]:
# Delete SSH Keypair

try:
    os.remove(ec2_pem_file+'.pem')
    print('Local Key Deleted')
except FileNotFoundError:
    print('Local Key Not Found')
    
response = ec2.delete_key_pair(KeyName=ec2_pem_file)
print('\nAWS Metadata: ')
print('http Status Code : '+str(response['ResponseMetadata']['HTTPStatusCode']))
print('Request ID       : '+response['ResponseMetadata']['RequestId'])
print('Retries          : '+str(response['ResponseMetadata']['RetryAttempts']))

## Terminate the EC2 instance

In [None]:
ec22 = boto3.resource('ec2',region_name=region,
                   aws_access_key_id = access_id, 
                   aws_secret_access_key = access_key)

ec22.Instance(new_instance_id).terminate()

## Delete the security group

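Note: Make sure the instance is terminated before deleting the security group. AWS refuses to delete a security group that is still in use by an instance, so the next cell waits for the instance to reach the terminated state.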

In [None]:
import random
import time

poll_until_completed(ec2, new_instance_id)  # Wait until the instance is terminated

In [None]:
SG_delete_response = ec2.delete_security_group(
    GroupId=Sec_group,
)
SG_delete_response

<h1><span style="background:yellow">The cells below are for reference only</span></h1>


### Paramiko Python Package

There is a Python library called paramiko that lets you SSH into a remote machine and execute commands there. For now, let's keep it simple. The commands in the cells below use paramiko to establish an SSH connection to the EC2 instance. They install Docker, start the Docker service, and add ec2-user to the docker group so that it can run Docker commands without sudo. At the end, Python 3.4 and the boto3 package are installed. 


**Note:**

When you SSH into an EC2 instance for the first time, SSH checks the server's host key and asks you to confirm that you trust the machine. A script has to get past this check before it can actually work on the instance. 

If you often launch new EC2 instances, or start and stop them without Elastic IPs (addresses that stay attached to a server), you will be dealing with new or changing IPs and hostnames all the time. In that case, if you want to permanently skip host key checking and stop storing server fingerprints for EC2 public hostnames, add the lines below to your ~/.ssh/config file. 

----

```
# AWS EC2 public hostnames (changing IPs)
Host *.compute.amazonaws.com
    StrictHostKeyChecking no
    UserKnownHostsFile /dev/null
```

In [None]:
# # Don't run this cell. 

# import boto3
# import botocore
# import paramiko

# client = paramiko.SSHClient()
# client.set_missing_host_key_policy(paramiko.AutoAddPolicy())

# # Connect/ssh to an instance
# try:
#     # hostname is public DNS address of EC2 instance, key_filename is the private key to connect to instance.
#     print("Trying to connect")
    
#     client.connect(hostname=instance_pub_dns, username='ec2-user', key_filename=ec2_pem_file+".pem")
#     print("connected to instance")
    
#     print("""Update all the existing packages, 
#           Install the most recent Docker Community Edition package, 
#           Start the Docker service.
#           Add the ec2-user to the docker group so you can execute Docker commands without using sudo. 
#           Log out with exit() command. """ )
    
#     # exec_command has no interactive shell, so prefix each command with sudo instead of running "sudo su"
#     stdin, stdout, stderr = client.exec_command("sudo yum update -y; sudo yum install -y docker; sudo service docker start;\
#     sudo usermod -a -G docker ec2-user; sudo yum install -y python34; sudo yum -y update; sudo yum install -y boto3; exit")
    
#     print("stdout: ",stdout.read().decode())
#     print("stderr: ",stderr.read().decode())
    
#     print(" log back in again to pick up the new docker group permissions.")
#     client.connect(hostname=instance_pub_dns, username='ec2-user', key_filename=ec2_pem_file+".pem")
#     print("connected to instance back")
    
# except Exception as e:
#     print(e)

The code in the cell below looks intimidating, but it simply opens a terminal session on the instance and starts the Docker service. The command "docker run -it --rm -p 8888:8888 jupyter/scipy-notebook" tells Docker to run the image jupyter/scipy-notebook, which has Jupyter installed. 

Once the image is running, Docker prints a URL with a token for accessing Jupyter, like the one below:

http://localhost:8888/?token=69df31aeebc2f1cd7bbdc8e78465bef30d11f02d462307a6


Copy the URL, replace localhost with the EC2 instance's public DNS, and paste the result into a browser window to access Jupyter on the EC2 instance. For example, 


http://ec2-54-202-203-168.us-west-2.compute.amazonaws.com:8888/?token=69df31aeebc2f1cd7bbdc8e78465bef30d11f02d462307a6

In [None]:
# # ping from ec2-54-202-240-233.us-west-2.compute.amazonaws.com to 34.214.123.79

# # two servers are in Oregon center

# import paramiko
# import sys
# import time

# class sampleParamiko:
#     ssh = ""
#     def __init__(self, host_DNS, uname, keyfile):
#         try:
#             self.ssh = paramiko.SSHClient()
#             self.ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
#             self.ssh.connect(host_DNS, username=uname, key_filename=keyfile)
#             #print "In init function"
#         except (paramiko.BadHostKeyException, paramiko.AuthenticationException, paramiko.SSHException) as e:
#             print(str(e))
#             sys.exit(-1)

#     def executeCmd(self,cmd):
#         try:
#             channel = self.ssh.invoke_shell()
#             timeout = 60 # timeout is in seconds
#             channel.settimeout(timeout)
#             newline        = '\r'
#             line_buffer    = ''
#             channel_buffer = ''
#             channel.send(cmd + ' ; exit ' + newline)
                
#             while True:
#                 channel_buffer = channel.recv(1).decode('UTF-8')
#                 if len(channel_buffer) == 0:
#                     break
#                 channel_buffer  = channel_buffer.replace('\r', '')
#                 if channel_buffer != '\n':
#                     line_buffer += channel_buffer
#                 else:
#                     print(line_buffer)
#                     line_buffer   = ''

#         except paramiko.SSHException as e:
#             print(str(e))
#             sys.exit(-1)
            
# host_DNS = instance_pub_dns
# username='ec2-user'
# filename=ec2_pem_file+".pem"

# cmd = "docker info"
# conn_obj = sampleParamiko(host_DNS, username, filename)
# print("Verify that the ec2-user can run Docker commands without sudo.")

# try:
#     print("Start the Docker service.")
#     stdin, stdout, stderr = conn_obj.ssh.exec_command("sudo service docker start")
    
# except Exception as e:
#     print(e)
    
# conn_obj.executeCmd(cmd)

# cmd = "docker run -it --rm -p 8888:8888 jupyter/scipy-notebook"
# conn_obj = sampleParamiko(host_DNS, username, filename)
# conn_obj.executeCmd(cmd)





# cmd_list = ["sudo su","yum update -y","yum install -y docker","service docker start","usermod -a -G docker ec2-user","yum install python34","yum -y update","yum install boto3","exit"]

# try:
#     for cmd in cmd_list:
#         print(cmd)
#         conn_obj = sampleParamiko(host_DNS, username, filename)
#         conn_obj.executeCmd(cmd)
    
# except Exception as e:
#     print(e)


In [None]:
# client.close()  # only needed if you ran the paramiko cells above

# Save your notebook and then `File > Close and Halt`