> **Jupyter slideshow:** This notebook can be displayed as slides. To view it as a slideshow in your browser, type the following in the console:


> `> jupyter nbconvert [this_notebook.ipynb] --to slides --post serve`


> To toggle off the slideshow cell formatting, click the `CellToolbar` button, then `View --> Cell Toolbar --> None`.

<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Amazon Web Services (AWS) 

_Authors: David Yerrington (SF)_

---

![](https://snag.gy/dFoKAy.jpg)




### Learning Objectives
*By the end of this lesson, you will be able to:*
- Explain the services AWS offers and which ones are relevant to data science.
- Start and terminate an elastic compute cloud (EC2) instance.
- Understand how to use the AWS command line interface (AWS CLI).
- Use EC2 from the command line.

### Student Pre-Work
*Before this lesson, you should already be able to:*
- Set up an account on AWS using two-factor authentication for security.
- Connect to a remote computer via SSH.
- Have a credit card available to sign up for AWS, or have access to an existing AWS account.

### Lesson Guide
- [Introduction](#intro)
- [What is AWS?](#what-is)
- [Elastic Compute Cloud (EC2) Overview](#ec2)
    - [Signing In](#signing-in)
- [EC2 Tutorial](#ec2-tutorial)
    - [Step 1: Launch an Amazon EC2 Instance](#step1)
    - [Step 2: Configure Your Instance](#step2)
    - [Step 3: Connect to Your Instance](#step3)
    - [Step 4: Terminate Your Instance](#step4)
    - [Additional Remarks](#addl)
- [Simple Storage Service (S3) Tutorial](#s3-tutorial)
- [AWS Command Line (AWS CLI)](#awscli)
- [AWS CLI Tutorial](#awscli-tutorial)
    - [Step 1: Create an AWS IAM User](#cli-step1)
    - [Step 2: Install and Configure the AWS CLI](#cli-step2)
    - [Step 2B: Setting Up Your Environmental Variables](#cli-step2b)
    - [Step 3: Using the AWS CLI With Amazon S3](#cli-step3)
- [EC2 From the Command Line](#ec2-cli)
    - [Get the Security Group ID](#security)
    - [Get the AMI ID](#ami-id)
    - [Launch Spot Instance](#launch)
    - [Connect to the Spot Instance](#connect)
    - [Terminate the Spot Instance](#terminate)
- [Conclusion](#conclusion)
- [Additional Resources](#resources)

<a name="intro"></a>
## Introduction
---


Today we're going to walk through Amazon Web Services (AWS). In particular, we'll focus on the services that are commonly used in data science. AWS is a suite of cloud computing services; for our purposes, these are essentially virtual machines and storage that you pay to access for the amount of time you need them.

**Check:** What is a server?

> **Answer**: A server is a computer or software that performs administration or coordination functions within a network.

**Check:** What did the world look like before AWS and Google Cloud?

> **Answer**: Computation was expensive to set up, access, and maintain. Only large companies, governments, and institutions had access to it. Now, anyone can rent it for pennies.

<a name="what-is"></a>
## What is AWS?
---

> _Amazon Web Services (AWS)_ is a subsidiary of Amazon.com that offers a suite of cloud computing services that make up an on-demand computing platform. These services operate from 12 geographical regions around the world. 

> Amazon’s Elastic Compute Cloud (EC2) and Simple Storage Service (S3) are arguably the most essential and best-known services. **AWS now has more than 70 wide-ranging services, including computing, storage, networking, database, analytics, application services, deployment, management, mobile, developer tools, and tools for the Internet of things (IoT).** 

> Amazon markets AWS as a service to provide large computing capacity quicker and cheaper than a client company building an actual physical server farm. _(from Wikipedia)_

Today, we will explore two services that are relevant to many big data scenarios:

1) EC2 (Elastic Compute Cloud)
2) S3 (Simple Storage Service)

By the end of this lesson, you will be able to start and stop a computer and store data in the cloud. How cool is that?

> **Note:** In the absence of Amazon credits, you can sign up with a new account and get free-tier usage for one year.

**Check:** What could be some advantages of using a server in the cloud instead of managing our own data center?

- **Cost reduction**: You don't pay infrastructure costs when you don't need it.
- **Reliability**: Servers are maintained and guaranteed by a company whose only job is to make sure the servers are available for you.
- **Scalability**: You can add more computing power when necessary.

<a name="ec2"></a>
## Elastic Compute Cloud (EC2) Overview
---

The first service we will explore is _Elastic Compute Cloud_ or _EC2_. EC2 forms a central part of Amazon.com's cloud computing platform, allowing users to rent virtual computers on which to run their own computer applications. 

Let's learn some key terms:

- **Instance**: The virtual machine hosted in Amazon's cloud running the software we want.
- **Amazon Machine Image (AMI)**: A snapshot of a configured machine that we can use as a starting point to boot an instance. We can also save a running instance to a new AMI so that, in the future, we can boot a new machine with an identical configuration.
- **SSH key**: A [pair of keys](https://en.wikipedia.org/wiki/Public-key_cryptography) necessary for connecting to an instance remotely. The private key will be downloaded to your laptop and the matching public key will be automatically configured on the instance.

Our main conceptual shift from using a laptop to running an instance in the cloud is that we can think of computing power as ephemeral. We request computing power when we need it, perform a calculation, and dismiss that power when we're done. 

Input and output will not be stored on the machine. Instead, they're stored somewhere else in the cloud (Hint: S3). In this sense, computing power is a commodity that we purchase and use in the amount and time we need.

<a id='signing-in'></a>
### Let's See How It Works

> 1) Create a new account on AWS [here](https://aws.amazon.com/).

It will ask you for your contact and credit card information. Don't worry — most of the things we'll do are free for first-time users and, when we do use paid services, it likely won't cost more than $10.

Here are some screenshots of the process:

![](./assets/images/aws1.png)

![](./assets/images/aws2.png)

![](./assets/images/aws3.png)

**Once you're done, you should get to this page:**

![](./assets/images/aws4.png)

**Let's sign in to the console. You should get to this page:**

![](./assets/images/services1.png)

<a name="ec2-tutorial"></a>
## EC2 Tutorial
---

Let's go ahead and follow the [tutorial for EC2](https://aws.amazon.com/getting-started/tutorials/launch-a-virtual-machine/).

<a id='step1'></a>
### Step 1: Launch an Amazon EC2 instance.

![](./assets/images/launch-instance.png)

<a id='step2'></a>
### Step 2: Configure your instance.

Follow the suggested steps until you see your image booting up:

![](./assets/images/launched1.png)

Notice that we have a lot of information about the instance. In particular:

- Its DNS name and IP address.
- The type of instance.
- The key necessary to connect to it.

**Check:** What is an IP address?

<a id='step3'></a>
### Step 3: Connect to your instance.

Go ahead and follow the instructions for connecting to the instance. In particular:

1) Download a bash shell (optional).
2) Copy the SSH key you downloaded to the appropriate location.
3) Use the SSH key to connect to it, as explained in the tutorial.
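Steps 2 and 3 above might look like the following sketch. The helper name `install_key`, the key file name `MyFirstKey.pem`, and the download location are assumptions for illustration; use the names from your own setup.

```bash
# install_key SRC [DEST_DIR]: copy a downloaded private key into place and
# make it readable only by its owner (ssh refuses keys with open permissions).
install_key() {
    local src="$1"
    local dest_dir="${2:-$HOME/.ssh}"
    mkdir -p "$dest_dir"
    chmod 700 "$dest_dir"
    cp "$src" "$dest_dir/"
    chmod 400 "$dest_dir/$(basename "$src")"
}

# Example usage (placeholders, not real values):
#   install_key ~/Downloads/MyFirstKey.pem
#   ssh -i ~/.ssh/MyFirstKey.pem ubuntu@<YOUR INSTANCE DNS>
```

The login user depends on the AMI you chose: Ubuntu images typically use `ubuntu`, while Amazon Linux images use `ec2-user`.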

![](./assets/images/connected.png)


#### Congratulations! You've just connected to an instance in the cloud.

Try launching Python from the shell and do something with it.

![](./assets/images/python.png)

<a id='step4'></a>
### Step 4: Terminate your instance.

Once you're done with your calculation and you no longer need the instance, you can terminate it. This will kill the instance and it will no longer be available to you. You should make sure you've saved all of the data and code you need somewhere else.

![](./assets/images/terminate.png)

![](./assets/images/terminated.png)

Unless you're using your machine to serve a live application (such as a web app or API), it's essential that you terminate your instance when it's not in use so you don't incur additional costs.
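To get a feel for why this matters, here is a rough back-of-the-envelope sketch. The hourly rate is a made-up figure, not a real AWS price; check the pricing page for the instance type and region you actually use.

```bash
# estimate_cost HOURLY_RATE HOURS: print the total cost in dollars.
estimate_cost() {
    awk -v rate="$1" -v hours="$2" 'BEGIN { printf "%.2f\n", rate * hours }'
}

# A hypothetical $0.10/hour instance left running for 30 days (720 hours):
estimate_cost 0.10 720   # prints 72.00
```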


<a id='addl'></a>
### Additional Remarks

We've walked through the simplest way to launch and terminate an instance in the cloud.

In time, you'll discover that it has more complexities. Here are some pointers you may find useful:

- [Pricing](https://aws.amazon.com/ec2/pricing/): EC2 pricing depends on the type of instance and the chosen region. Make sure you understand the cost of the instance you request in order to avoid surprise bills. If in doubt, you can use the convenient [cost calculator](http://calculator.s3.amazonaws.com/index.html) to estimate your costs.

![](./assets/images/costcalculator.png)

- [Spot instances](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-spot-instances.html): Spot instances are even more ephemeral than normal instances. They only live while the current spot price stays at or below the maximum price you agreed to pay. They’re a great way to save money when using more powerful machines.
- [AMIs](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html): AMIs are a snapshot of your machine. They're useful if you installed a lot of software and want to save that particular configuration.

![](./assets/images/createimage.png)


**Check:** Can you give an example of when AMIs could be useful?

- [Security groups](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html): Security groups act as virtual firewalls. They control which ports on your machine are open to incoming traffic, and from which addresses.

- [Elastic IPs](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/elastic-ip-addresses-eip.html): You can rent a fixed IP address and associate it to your instance. This way, you can configure tools to always connect to the same address, independent of machine.

<a name="demo"></a>
## Simple Storage Service (S3)

We’ve learned how to start and stop an instance in the cloud. This is helpful because it provides us with "computing power as a service." Now, let's learn how we can also store data in the cloud.

**Amazon's Simple Storage Service (S3)** is an online file storage system. It provides storage through web service interfaces (REST, SOAP, and BitTorrent) using an _object storage architecture_. According to Amazon, S3's design aims to provide scalability, high availability, and low latency at commodity costs.

Objects are organized into **buckets** (each owned by an AWS account) and identified within each bucket by a unique, user-assigned key. Buckets and objects can be created, listed, and retrieved using either a REST-style HTTP interface or a SOAP interface. Additionally, objects can be downloaded using the **HTTP GET interface and the BitTorrent protocol.**
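As a small illustration of how a bucket name and a key together identify an object, the sketch below builds the virtual-hosted-style URL for an object. The helper name `s3_object_url`, the bucket `my-example-bucket`, and the key `data/iris.csv` are hypothetical. A plain HTTP GET on such a URL only succeeds if the object has been made publicly readable.

```bash
# s3_object_url BUCKET KEY REGION: print the HTTPS URL of an S3 object.
s3_object_url() {
    local bucket="$1"
    local key="$2"
    local region="$3"
    printf 'https://%s.s3.%s.amazonaws.com/%s\n' "$bucket" "$region" "$key"
}

# Example usage (hypothetical bucket and key):
#   curl -O "$(s3_object_url my-example-bucket data/iris.csv us-west-2)"
```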


<a name="s3-tutorial"></a>
## Simple Storage Service (S3) Tutorial

In pairs, follow the [tutorial for S3](https://aws.amazon.com/getting-started/tutorials/backup-files-to-amazon-s3/).

The steps should be simple to complete, but feel free to ask any questions you may have.

**Check:** What's a practical use case you can envision for S3?


<a name="awscli"></a>
## AWS Command Line (AWS CLI)
---

We've learned to request and access computing power and storage as a service through AWS. Wouldn't it be nice to be able to do this quickly from the command line? Yeah it would! So, let's get acquainted with AWS CLI.

[AWS CLI](https://github.com/aws/aws-cli) is a unified tool that lets us control most AWS services from a single command line interface.

**Check:** Why is this useful? Why is it powerful? Can you give some examples?

> **Example**: To be able to programmatically turn instances on and off, to create complex architectures, or to provision clusters in response to a demand.

<a name="awscli-tutorial"></a>
## AWS CLI Tutorial
---

Next, follow the [tutorial for AWS CLI](https://aws.amazon.com/getting-started/tutorials/backup-to-s3-cli/).


<a id='cli-step1'></a>
### Step 1: Create an AWS IAM user.

In order to use the command line, we'll have to configure a set of access credentials on our laptop. It's important to create a separate identity with limited permissions instead of using our root account credentials.

**Check:** Why is this so important?

![](./assets/images/identitymanager.png)

> **Note:** It's also a good idea to set up two-factor authentication.

> **Note:** When attaching a policy, you can be more restrictive and only give users permission for the services you intend for them to use.
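For instance, a restrictive policy might grant access to a single S3 bucket and nothing else. The sketch below writes such a policy document to a file; the bucket name `my-example-bucket`, the file name, and the user name `cli-user` are placeholders, and the exact set of actions you need may differ. The same JSON can also be pasted into the policy editor in the web console.

```bash
# Write a policy that allows listing one bucket and reading/writing its objects.
# "my-example-bucket" is a placeholder bucket name.
cat > s3-only-policy.json <<'EOF'
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": "arn:aws:s3:::my-example-bucket"
        },
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::my-example-bucket/*"
        }
    ]
}
EOF

# To attach it as an inline policy from an account with IAM permissions
# ("cli-user" is a placeholder user name):
#   aws iam put-user-policy --user-name cli-user \
#       --policy-name s3-only --policy-document file://s3-only-policy.json
```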


<a id='cli-step2'></a>
### Step 2: Install and configure the AWS CLI.

Follow the [installation instructions](http://docs.aws.amazon.com/cli/latest/userguide/installing.html) for your operating system.

Notice that one of the methods is to simply use `pip` to install the AWS CLI.

> **Note:** If you already have the AWS CLI configured and would like to have multiple roles, you can do so as explained [here](http://docs.aws.amazon.com/cli/latest/userguide/cli-roles.html).

<a id='cli-step2b'></a>
### Step 2.B: Setting up your environmental variables.

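The AWS client needs your access credentials and a default region in order to authenticate and communicate with AWS. Running `aws configure` prompts for these values and stores them on your machine; they can also be supplied as environment variables. This process is outlined [here](http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html).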

```bash
$ aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-west-2
Default output format [None]: ENTER
```

We should go through where to find these in our [account settings](https://console.aws.amazon.com/iam/home?#security_credential).
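If you prefer environment variables over the stored configuration, a minimal sketch looks like this. The key values are the placeholder examples from the AWS documentation, not working credentials; substitute your own.

```bash
# Placeholder credentials from the AWS documentation; replace with your own.
export AWS_ACCESS_KEY_ID="AKIAIOSFODNN7EXAMPLE"
export AWS_SECRET_ACCESS_KEY="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
export AWS_DEFAULT_REGION="us-west-2"

# To see which source each setting is being read from:
#   aws configure list
```

Environment variables take precedence over the values saved by `aws configure`, so remember to unset them if you switch accounts.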

<a id='cli-step3'></a>
### Step 3: Using the AWS CLI with Amazon S3.

Now you can copy files back and forth from your command line without ever having to click on the web interface. 

Here's a [cheat sheet](https://github.com/toddm92/aws/wiki/AWS-CLI-Cheat-Sheet) for the AWS CLI.
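As a rough sketch of what copying files back and forth looks like, the two helpers below wrap `aws s3 cp` and `aws s3 ls`. The function names, the bucket `my-example-bucket`, and the file names are hypothetical, and the bucket must already exist (for example, created with `aws s3 mb s3://my-example-bucket`).

```bash
# backup_to_s3 FILE BUCKET: upload FILE to BUCKET, then list the bucket.
backup_to_s3() {
    local file="$1"
    local bucket="$2"
    aws s3 cp "$file" "s3://$bucket/" && aws s3 ls "s3://$bucket/"
}

# restore_from_s3 BUCKET KEY DEST: download one object to a local path.
restore_from_s3() {
    aws s3 cp "s3://$1/$2" "$3"
}

# Example usage (hypothetical names):
#   backup_to_s3 notes.txt my-example-bucket
#   restore_from_s3 my-example-bucket notes.txt ./notes-copy.txt
```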

<a name="ec2-cli"></a>
## EC2 From the Command Line
---

Empowered with a well-configured AWS CLI, we can now start and stop EC2 instances from the command line. Let's use it to create a spot instance.


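First, let's check the recent spot price history for the instance type we want:

```bash
# You must run aws configure in a terminal first!
aws ec2 describe-spot-price-history \
    --start-time $(date -u +"%Y%m%dT%H0000") \
    --product "Linux/UNIX" \
    --instance-type "m3.medium" \
    --region us-west-2 \
    --output table
```

> **Note:** If you see `aws: command not found`, the AWS CLI is not installed or is not on your `PATH`. Revisit Step 2 of the AWS CLI tutorial.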


<a id='security'></a>
### Get the Security Group ID

In the previous activity, we launched an instance and created a security group that allows SSH access. Let's use the same security group, in the same region (`us-west-2`, or the region you chose).

The command below will return a JSON string. You want to copy the ID (the `GroupId` field) of the security group that has port 22 open (if there are running instances).

```bash
aws ec2 describe-security-groups --region us-west-2
```

The (truncated) output looks like this:

```
{
    "SecurityGroups": [
        {
            "IpPermissionsEgress": [
                {
                    "PrefixListIds": [],
                    "FromPort": 80,
                    "IpRanges": [
                        {
                            "CidrIp": "0.0.0.0/0"
                        }
                    ],
                    "ToPort": 80,
                    "IpProtocol": "tcp",
                    "UserIdGroupPairs": []
                }
            ],
            "Description": "Elastic Beanstalk created security group used when no ELB security groups are specified during ELB creation",
            "Tags": [
                {
                    "Value": "awseb-e-fmndsshbqz-stack",
                    "Key": "aws:cloudformation:stack-name"
                },
                {
                    "Value": "mxlPublicWeb-env",
                    "Key": "elasticbeanstalk:environment-name"
                },
                {
                    "Value": "a
```
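Rather than reading through the full JSON, you can ask the CLI to filter and project the result. The sketch below assumes the SSH rule was created with port 22 as its starting port; the helper name `ssh_security_groups` is hypothetical.

```bash
# ssh_security_groups [REGION]: list the ID and name of every security group
# that has an inbound rule starting at port 22.
ssh_security_groups() {
    local region="${1:-us-west-2}"
    aws ec2 describe-security-groups \
        --region "$region" \
        --filters Name=ip-permission.from-port,Values=22 \
        --query 'SecurityGroups[*].[GroupId,GroupName]' \
        --output text
}

# Example usage:
#   ssh_security_groups us-west-2
```

The `--query` option takes a JMESPath expression, which is handy whenever `grep` on raw JSON becomes awkward.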

<a id='ami-id'></a>
### Get the AMI ID

Get the AMI ID of the Ubuntu Linux 14.04 image. You can find it by checking the name in the [launch instance window](https://us-west-2.console.aws.amazon.com/ec2/v2/home?region=us-west-2#LaunchInstanceWizard). 

> At the time this lesson was created, it was `ami-9abea4fb`.

You can check it by typing:

```bash
aws ec2 describe-images --image-ids ami-9abea4fb --region us-west-2
```
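Because AMI IDs change as images are updated, you may prefer to look up the most recent one. The sketch below makes two assumptions you should verify: that `099720109477` is Canonical's AWS account ID, and that Canonical's Ubuntu 14.04 images follow the name pattern shown. The helper name `latest_ubuntu_ami` is hypothetical.

```bash
# latest_ubuntu_ami [REGION]: print the ID of the newest matching Ubuntu 14.04 AMI.
latest_ubuntu_ami() {
    local region="${1:-us-west-2}"
    aws ec2 describe-images \
        --region "$region" \
        --owners 099720109477 \
        --filters "Name=name,Values=ubuntu/images/hvm-ssd/ubuntu-trusty-14.04-amd64-server-*" \
        --query 'sort_by(Images, &CreationDate)[-1].ImageId' \
        --output text
}

# Example usage:
#   AMI_ID=$(latest_ubuntu_ami us-west-2)
```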

<a id='launch'></a>
### Launch Spot Instance

You're now ready to submit the spot instance request:

```bash
aws ec2 request-spot-instances \
    --region us-west-2 \
    --spot-price 0.02 \
    --launch-specification "{
        \"KeyName\": \"MyFirstKey\",
        \"ImageId\": \"<MOST RECENT UBUNTU AMI ID>\",
        \"InstanceType\": \"m3.medium\" ,
        \"SecurityGroupIds\": [\"<YOUR SECURITY GROUP ID>\"]
    }"
```

If it's working, this should return a JSON description of the instance request.
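The escaped quotes in the launch specification are easy to get wrong. One alternative, sketched below, is to write the specification to a file and pass it with `file://`. The helper name `write_launch_spec`, the file name `spec.json`, and the example IDs are placeholders.

```bash
# write_launch_spec KEY_NAME AMI_ID INSTANCE_TYPE SECURITY_GROUP_ID:
# print a launch specification as JSON.
write_launch_spec() {
    cat <<EOF
{
    "KeyName": "$1",
    "ImageId": "$2",
    "InstanceType": "$3",
    "SecurityGroupIds": ["$4"]
}
EOF
}

# Example usage (placeholder IDs):
#   write_launch_spec MyFirstKey ami-9abea4fb m3.medium sg-0123abcd > spec.json
#   aws ec2 request-spot-instances --region us-west-2 --spot-price 0.02 \
#       --launch-specification file://spec.json
```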

You can check that the instance request has been opened here:

![](./assets/images/instancerequest.png)

Or, by using the command line:

```bash
aws ec2 describe-spot-instance-requests --region us-west-2
```

When the request has been accepted, an instance is spawned:

![](./assets/images/spotinstance.png)

Let's retrieve the DNS name:
```bash
aws ec2 describe-instances --region us-west-2 --output json | grep PublicDnsName | head -n 1
```
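The `grep` approach returns the first DNS name it finds, which may belong to a different instance if you have several. A more targeted sketch, assuming you already know the instance ID, is shown below; the helper name `instance_dns` and the example ID are hypothetical.

```bash
# instance_dns INSTANCE_ID [REGION]: print the public DNS name of one instance.
instance_dns() {
    local instance_id="$1"
    local region="${2:-us-west-2}"
    aws ec2 describe-instances \
        --region "$region" \
        --instance-ids "$instance_id" \
        --query 'Reservations[0].Instances[0].PublicDnsName' \
        --output text
}

# Example usage (placeholder instance ID):
#   instance_dns i-0123456789abcdef0 us-west-2
```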

<a id='connect'></a>
### Connect to the Spot Instance

```bash
ssh -i ~/.ssh/MyFirstKey.pem ubuntu@<YOUR INSTANCE DNS>
```


<a id='terminate'></a>
### Terminate the Spot Instance

Let's retrieve the instance ID and kill it:

```bash
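# List the instance IDs in the region.
aws ec2 describe-instances --region us-west-2 --output json | grep InstanceId

# Replace the ID below with your own instance ID.
aws ec2 terminate-instances --region us-west-2 --instance-ids i-0aa55cd3363b0f187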
```

![](./assets/images/terminatedspot.png)


<a name="conclusion"></a>
## Conclusion
---

In this lesson, we’ve learned about two fundamental Amazon Web Services: Elastic Compute Cloud (EC2) and Simple Storage Service (S3). These two services are widely used because they provide on-demand computation and storage at an affordable cost.

We’ve learned how to use them both from the web interface and the command line.

**Check:** Can you think of a situation where this could be useful?

<a id='resources'></a>
## Additional Resources
---

- [EC2](https://aws.amazon.com/ec2/?nc2=h_m1)
- [S3](https://aws.amazon.com/s3/?nc2=h_m1)
- [Tutorials](https://aws.amazon.com/getting-started/tutorials/)
- [AWS CLI Tutorial](http://www.joyofdata.de/blog/guide-to-aws-ec2-on-cli/)