# Run Jupyter on a Docker Container


This notebook tests the concepts you learned in the lab [Jupyter from Docker](../labs/Jupyter_from_Docker.ipynb). Most activities contain partially complete code, and each activity builds on the previous one to achieve the final result. The activities are similar to those you have seen in the labs. In some cases you are expected to look up syntax online to complete the code cells.

The exercises ask you to upload a dataset to an S3 bucket, install Docker on an EC2 instance, and fetch the previously uploaded dataset from S3. At the end, you perform a simple linear regression to predict the salary of employees.

### Launch a new EC2 instance and install Docker on it

To install Docker on an Amazon Linux instance, launch an instance with the Amazon Linux AMI or use one of the existing instances. Connect to your instance using SSH. Update the installed packages and package cache on the instance.

## Note: 

Update the following credentials in the code cell below:

1. aws_access_key_id
2. aws_secret_access_key

In [None]:
#Import AWS' Python Based DEVOPS tools
import boto3
from botocore.exceptions import ClientError

#Import System Tools
import collections
import json
import os
import datetime
import pandas
import time
import getpass
from subprocess import call

#Set important Variables
system_user_name=getpass.getuser()

def datetime_handler(x):
    if isinstance(x, datetime.datetime):
        return x.isoformat()
    raise TypeError("Unknown type")
aws_access_id='<enter your aws access key id>'
aws_secret_key='<enter your aws secret access key>'
# Client interface.
# Establish credentials/session
ec2 = boto3.client(
    'ec2', 
    region_name='us-west-2',
    aws_access_key_id=aws_access_id,
    aws_secret_access_key=aws_secret_key
)
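
Hard-coded keys are easy to leak when a notebook is shared. As a minimal sketch, assuming the keys are exported as the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables, a hypothetical helper named `load_aws_credentials` (not part of the original exercise) could read them instead of placing them in the cell:

```python
import os


def load_aws_credentials(env=None):
    """Return (access_key_id, secret_access_key) read from environment variables.

    Hypothetical helper for illustration; it is not part of the exercise code.
    """
    env = os.environ if env is None else env
    names = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
    # Collect every variable that is unset or empty so the error names them all.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return env[names[0]], env[names[1]]
```

The result could then be unpacked as `aws_access_id, aws_secret_key = load_aws_credentials()` before creating the `ec2` client.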

**Activity 1:** Create a new security group with the name stored in `New_Sec_Group_Name` (replace `your_pawprint` with your own pawprint) and save the `GroupId` of the newly created security group in the `Sec_group` variable.

In [None]:
# Store the security group name in a variable
New_Sec_Group_Name= "Docker_your_pawprint_SG"

Create_SG_response = ec2.create_security_group(
    Description='security grp for docker',
    GroupName=<security group name>
)
Sec_group=<get the group id of new security group>

In [None]:
# Modify the security configuration (note: 0.0.0.0/0 allows any IP address, not only MU's)

try:
    sec_rule="ALL TCP"
    data = ec2.authorize_security_group_ingress(
        GroupId=Sec_group,
        IpPermissions=[
            {'IpProtocol': 'tcp',
             'FromPort': 0,
             'ToPort': 65535,
             'IpRanges': [{'CidrIp': '0.0.0.0/0'}]},
        ],)
    print("Ingress "+sec_rule+" added")
except:
    print(sec_rule+" already added")
#     print(data)

try:
    sec_rule="ALL TCP"
    data = ec2.authorize_security_group_ingress(
        GroupId=Sec_group,
        IpPermissions=[
            {'IpProtocol': 'tcp',
             'FromPort': 0,
             'ToPort': 65535,
             'UserIdGroupPairs': [{ 'GroupId': Sec_group }]
#              'IpRanges': [{'CidrIp': Sec_group}]},
            }],
#         SourceSecurityGroup=Sec_group_name
    )
    print("Ingress "+sec_rule+" added")
except:
    print(sec_rule+" already added")

try:
    sec_rule="Custom ICMP Rule - IPv4"
    data = ec2.authorize_security_group_ingress(
        GroupId=Sec_group,
        IpPermissions=[
            {'IpProtocol': 'icmp',
             'FromPort': 0,
             'ToPort': -1,
             'IpRanges': [{'CidrIp': '173.31.192.195/32'}]},
        ])
    print("Ingress "+sec_rule+" added")
except:
    print(sec_rule+" already added")
#     print(data)

try:
    sec_rule="ALL UDP"
    data = ec2.authorize_security_group_ingress(
        GroupId=Sec_group,
        IpPermissions=[
            {'IpProtocol': 'udp',
             'FromPort': 0,
             'ToPort': 65535,
             'UserIdGroupPairs': [{ 'GroupId': Sec_group }]
            }],
    )
    print("Ingress "+sec_rule+" added")
except:
    print(sec_rule+" already added")
#     print(data)

    
try:
    sec_rule="ALL ICMP"
    data = ec2.authorize_security_group_ingress(
        GroupId=Sec_group,
        IpPermissions=[
            {'IpProtocol': 'icmp',
             'FromPort': -1,
             'ToPort': -1,
             'UserIdGroupPairs': [{ 'GroupId': Sec_group }]
            }],
    )
    print("Ingress "+sec_rule+" added")
except:
    print(sec_rule+" already added")

    
try:
    sec_rule="ALL ICMP"
    data = ec2.authorize_security_group_ingress(
        GroupId=Sec_group,
        IpPermissions=[
            {'IpProtocol': 'icmp',
             'FromPort': -1,
             'ToPort': -1,
             'IpRanges': [{'CidrIp': '0.0.0.0/16'}]
            }],
    )
    print("Ingress "+sec_rule+" added")
except:
    print(sec_rule+" already added")
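
The cells above repeat the same `try`/`except` pattern, and the bare `except` reports "already added" for every failure, including a wrong `GroupId` or missing permissions. The sketch below shows one way to factor this out. `cidr_rule` and `add_ingress` are hypothetical helper names, and the code assumes the error exposes the AWS error code under `err.response["Error"]["Code"]`, as botocore's `ClientError` does. It catches `Exception` only so that it runs without botocore installed; in the notebook, `except ClientError` would be the tighter choice.

```python
def cidr_rule(protocol, from_port, to_port, cidr):
    """Build one IpPermissions entry that allows traffic from a CIDR block."""
    return {
        "IpProtocol": protocol,
        "FromPort": from_port,
        "ToPort": to_port,
        "IpRanges": [{"CidrIp": cidr}],
    }


def add_ingress(client, group_id, label, permission):
    """Authorize one rule; report duplicates and re-raise every other error."""
    try:
        client.authorize_security_group_ingress(
            GroupId=group_id, IpPermissions=[permission]
        )
    except Exception as err:
        # Assumption: the AWS error code is stored in err.response, as on ClientError.
        code = getattr(err, "response", {}).get("Error", {}).get("Code")
        if code != "InvalidPermission.Duplicate":
            raise
        print(label + " already added")
        return False
    print("Ingress " + label + " added")
    return True
```

For example, `add_ingress(ec2, Sec_group, "SSH", cidr_rule("tcp", 22, 22, "0.0.0.0/0"))` would open only the SSH port. Whether a narrower rule set is acceptable for this exercise is a judgment call; the exercise itself opens all TCP ports.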


**Activity 2:** Create a new key pair. Its name is specified in the `emr_pem_file` variable below. Also write the private key to the current directory so that you can use it for authentication.

In [None]:
import time 
import os

# Generate a unique name for keypair 
emr_pem_file=time.strftime("EMR-%d%m%Y%H%M%S-"+system_user_name)

# Create a new key pair
emr_key=ec2.<what goes in here>(KeyName=<what goes in here>)

os.system("echo \""+emr_key['KeyMaterial']+"\" > "+emr_pem_file+".pem")
os.chmod(emr_pem_file+".pem",0o400)

print("KeyName         : "+emr_key['KeyName'])
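
Writing the key through `echo` in a shell leaves the file briefly readable under the default permissions and depends on shell quoting. As an alternative sketch, the hypothetical helper `save_private_key` below (not part of the exercise) creates the file with owner-read-only permissions from the start, using only the standard library:

```python
import os


def save_private_key(path, key_material):
    """Write key material to `path`, creating it readable only by the owner."""
    # O_EXCL refuses to overwrite an existing key file; 0o400 is owner read-only.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o400)
    with os.fdopen(fd, "w") as pem:
        pem.write(key_material)
    return path
```

In the cell above, this would be called as `save_private_key(emr_pem_file + ".pem", emr_key['KeyMaterial'])`.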

**Activity 3:** Launch an EC2 instance using "emr_pem_file" as KeyName and "Sec_group" as security group

In [None]:
# Create Instance
instances = ec2.run_instances(
    ImageId='ami-aa5ebdd2',
    MinCount=1, 
    MaxCount=1,
    KeyName=<what goes in here>,
    TagSpecifications=[
        {
            'ResourceType': 'instance',
            'Tags': [
                        {   'Key': 'Name',
                            'Value': 'Docker_Exercise'
                        }
                    ]
        }
    ],
    InstanceType="t2.micro",
    SecurityGroupIds=[
        <what goes in here>
    ],
)

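Run the cells below to get the public DNS address of the new instance. The address may be empty while the instance is still starting; if so, rerun these cells after the polling cell below reports `running`.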

In [None]:
new_instance_id = instances["Instances"][0]["InstanceId"]

In [None]:
inst_det = ec2.describe_instances(
    InstanceIds=[
        new_instance_id,
    ]
)

In [None]:
instance_pub_dns=inst_det["Reservations"][0]["Instances"][0]["PublicDnsName"]
instance_pub_dns

In [None]:
def poll_until_completed(client, ins_id):
    delay = 2
    while True:
        instance = client.describe_instances(InstanceIds=[ins_id,])
        status = instance["Reservations"][0]["Instances"][0]["State"]["Name"]
#         message = cluster.get('Message', '')
        now = str(datetime.datetime.now().time())
    
        print("instance %s is %s at %s" % (ins_id, status, now))
        if status in ['running','terminated']:
            break

        # exponential backoff with jitter
        delay *= random.uniform(1.1, 2.0)
        time.sleep(delay)
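
The delay in the loop above grows without bound, and the loop never gives up. A possible variant, shown here as a sketch, caps the delay and raises after a timeout. The name `wait_for_state` and its `timeout`, `max_delay`, and `sleep` parameters are assumptions introduced for illustration.

```python
import random
import time


def wait_for_state(client, instance_id, targets=("running",), timeout=600,
                   max_delay=30, sleep=time.sleep):
    """Poll describe_instances until the instance reaches one of `targets`."""
    delay = 2.0
    waited = 0.0
    while True:
        response = client.describe_instances(InstanceIds=[instance_id])
        state = response["Reservations"][0]["Instances"][0]["State"]["Name"]
        if state in targets:
            return state
        if waited >= timeout:
            raise TimeoutError(f"{instance_id} still {state} after {waited:.0f}s")
        sleep(delay)
        waited += delay
        # Exponential backoff with jitter, capped so a single wait stays short.
        delay = min(max_delay, delay * random.uniform(1.1, 2.0))
```

boto3 also ships built-in waiters, for example `ec2.get_waiter('instance_running').wait(InstanceIds=[new_instance_id])`, which may be simpler when no progress output is needed.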

Run the poll function and wait until the instance is up and running. 

In [None]:
import random
import time

poll_until_completed(ec2, new_instance_id)  # The instance can't be used until it is running
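
A public DNS name read while the instance was still pending may be an empty string. Once polling reports `running`, the name can be read again. The helper below is a sketch with a hypothetical name, `get_public_dns`; it only assumes the `describe_instances` response layout already used in this notebook.

```python
def get_public_dns(client, instance_id):
    """Return the instance's public DNS name, or None if none is assigned yet."""
    response = client.describe_instances(InstanceIds=[instance_id])
    instance = response["Reservations"][0]["Instances"][0]
    # An empty or missing value means no public DNS name has been assigned.
    return instance.get("PublicDnsName") or None
```

For instance, `instance_pub_dns = get_public_dns(ec2, new_instance_id)` refreshes the variable used by the SSH step.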

### SSH through terminal

SSH into the EC2 instance you just created from a terminal. Run the cell below and copy its output. Then open a terminal, paste the command, and press Enter.

**Activity 4:** Print the SSH command used to connect to your EC2 instance.

Example: 

`ssh -i CloudComputingDataAnalytics/module2/labs/EMR-09102017111210-skaf48.pem ec2-user@ec2-54-201-248-103.us-west-2.compute.amazonaws.com`

In [None]:
print("ssh -i " +os.getcwd()+"/"+emr_pem_file+".pem ec2-user@"+instance_pub_dns)
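
If the working directory contains spaces or other shell-special characters, the printed command breaks when pasted. A small sketch using the standard library's `shlex.quote` avoids that; `build_ssh_command` is a hypothetical helper name, and `ec2-user` is the login user taken from the example above.

```python
import shlex


def build_ssh_command(pem_path, host, user="ec2-user"):
    """Return an ssh command string with the key path and target shell-quoted."""
    target = user + "@" + host
    return "ssh -i {} {}".format(shlex.quote(pem_path), shlex.quote(target))
```

It could be used as `print(build_ssh_command(os.path.join(os.getcwd(), emr_pem_file + ".pem"), instance_pub_dns))`.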

## Note:

This is the last activity for this module.

**Activity 5:** Open a terminal and SSH into the EC2 instance. Run the following commands in order.

* sudo su

* yum -y update

* yum install -y docker

* service docker start

* usermod -a -G docker ec2-user

* yum install python38

* wget https://bootstrap.pypa.io/get-pip.py

* python3 get-pip.py

* /usr/local/bin/pip install boto3

* docker run -it --rm -p 8888:8888 jupyter/scipy-notebook


-----

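Copy the URL printed by Docker and paste it into a browser window. If the URL points to `127.0.0.1` or a container hostname, you may need to replace that host with the public DNS name of your instance.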
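
The URL printed inside the container typically refers to a host that is reachable only from the instance itself. As a sketch, assuming the printed URL has the form `http://127.0.0.1:8888/...?token=...`, the hypothetical helper `public_jupyter_url` swaps in the public DNS name of the instance while keeping the path and token:

```python
from urllib.parse import urlsplit, urlunsplit


def public_jupyter_url(printed_url, public_dns):
    """Replace the host in a Jupyter URL with the instance's public DNS name."""
    parts = urlsplit(printed_url)
    # Keep the port from the printed URL; fall back to the published port 8888.
    port = parts.port or 8888
    return urlunsplit(parts._replace(netloc="{}:{}".format(public_dns, port)))
```

Calling `public_jupyter_url(url_from_docker, instance_pub_dns)` in this notebook yields a link that can be opened from your local browser, provided the security group allows port 8888.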

### Download Linear_Regression.ipynb notebook


There is a Python notebook, "Linear_Regression.ipynb", in the current directory "CloudComputingDataAnalytics/module2/exercises/". Select the Linear_Regression.ipynb file and click Download to save it to your local machine, so that you can upload it into the Jupyter instance running in Docker.


In the Jupyter instance running in Docker, use the upload button to upload the "Linear_Regression.ipynb" file that you downloaded to your local machine.

The notebook uses the Boston housing prices dataset and fits a simple linear regression model to predict house prices (MEDV is the dependent variable). The Jupyter Docker container does not come with the ggplot package preinstalled. **Install 'ggplot' from a terminal to generate the plots in the notebook.**
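
For orientation, simple linear regression fits a line `y = intercept + slope * x` by least squares. The sketch below is plain Python for illustration only; it is not the code in Linear_Regression.ipynb, and the name `fit_simple_linear_regression` and the toy data are assumptions.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the common 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data lying exactly on y = 2x + 1.
slope, intercept = fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # 2.0 1.0
```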


### Upload the Linear_Regression file to docker jupyter notebook.


Run all the cells in the notebook. Download the **completed notebook "Linear_Regression.ipynb"** and upload it into DSA JupyterHub in your exercises folder for grading. 


You are evaluated on whether you were able to launch an EC2 instance and start a Docker container running Jupyter on it. You have root access on the EC2 instance, so you can install any packages needed to run all cells in the notebook.

# <span style='background:yellow'>Save your notebook</span>

# Note
<h1><span style="background:red">Don't run the cells below until the exercises are graded. We will notify you when to run the cells and terminate the instance.</span></h1>


### Delete SSH Keypair


In [None]:
# Delete SSH Keypair

try:
    os.remove(emr_pem_file+'.pem')
    print('Local Key Deleted')
except:
    print('Local Key Not Found')
    
response = ec2.delete_key_pair(KeyName=emr_pem_file)
print('\nAWS Metadata: ')
print('http Status Code : '+str(response['ResponseMetadata']['HTTPStatusCode']))
print('Request ID       : '+response['ResponseMetadata']['RequestId'])
print('Retries          : '+str(response['ResponseMetadata']['RetryAttempts']))

## Terminate the EC2 instance

In [None]:
ec22 = boto3.resource('ec2',region_name='us-west-2',
                   aws_access_key_id=aws_access_id,
                   aws_secret_access_key=aws_secret_key)
print(new_instance_id)
ec22.Instance(new_instance_id).terminate()

## Delete the security group

Note: Make sure the instance is terminated before deleting the security group

In [None]:
import random
import time

# Redefined here because the container is likely to have gone idle during grading
def poll_until_completed(client, ins_id):
    delay = 2
    while True:
        instance = client.describe_instances(InstanceIds=[ins_id,])
        status = instance["Reservations"][0]["Instances"][0]["State"]["Name"]
#         message = cluster.get('Message', '')
        now = str(datetime.datetime.now().time())
    
        print("instance %s is %s at %s" % (ins_id, status, now))
        # Wait for 'terminated' only: a 'running' instance still holds the security group
        if status == 'terminated':
            break

        # exponential backoff with jitter
        delay *= random.uniform(1.1, 2.0)
        time.sleep(delay)

poll_until_completed(ec2, new_instance_id)  # Can't delete the security group until the instance is terminated

In [None]:
SG_delete_response = ec2.delete_security_group(
    GroupId=Sec_group,
)
SG_delete_response
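
Deleting a security group that is still attached to an instance fails with the AWS error code `DependencyViolation`. The sketch below retries a few times in that case. `delete_group_when_free` and its `attempts` and `delay` parameters are hypothetical, and the code assumes the error exposes its code under `err.response["Error"]["Code"]`, as botocore's `ClientError` does.

```python
import time


def delete_group_when_free(client, group_id, attempts=5, delay=5, sleep=time.sleep):
    """Delete a security group, retrying while AWS reports it is still in use."""
    for attempt in range(1, attempts + 1):
        try:
            return client.delete_security_group(GroupId=group_id)
        except Exception as err:
            code = getattr(err, "response", {}).get("Error", {}).get("Code")
            # Re-raise unrelated errors, and give up after the last attempt.
            if code != "DependencyViolation" or attempt == attempts:
                raise
            sleep(delay)
```

In this notebook it would be called as `delete_group_when_free(ec2, Sec_group)` after the instance has been terminated.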

# Save your notebook, then `File > Close and Halt`