## Using Boto3 to interact with Amazon Web Services

```python
# install the library if you don't have it
# !pip install boto3
```

```python
# import the library
import boto3
```

## SETUP: AWS Key and Secret

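You need an AWS account. From the security credentials section of your account in the AWS console you can create an access key, which consists of an access key ID (the "key") and a secret access key (the "secret"). The secret is shown only once, when the key is created, so store it somewhere safe. There are two ways to set up your local environment with the key and secret: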

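* Download the <a href="https://aws.amazon.com/cli/">AWS CLI</a> and, once it is installed, run `aws configure` in a command prompt/terminal. It asks for your access key ID and secret access key (plus a default region and output format) and saves them in files under `~/.aws/`, where boto3 finds them automatically.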

**OR**

* Enter the key and secret directly into your Python script

I recommend the first method: keeping secrets, keys, passwords, etc. directly in your code files is poor security practice, because anyone who can see the file (for example, after it is committed to version control) also gets your credentials.

## LESSON: Accessing AWS S3 Files with Boto3

### Step 1) Initializing boto3

```python
# IF YOU DON'T WANT TO DOWNLOAD THE AWS CLI... here is a way (not recommended) to enter your key/secret combo directly into Python

# NO NEED TO RUN THIS IF YOU ENTERED YOUR KEY/SECRET THROUGH THE AWS CLI
# replace the placeholder strings with your own values (and never commit them)
s3 = boto3.resource('s3',
                    aws_access_key_id='YOUR_ACCESS_KEY_ID',
                    aws_secret_access_key='YOUR_SECRET_ACCESS_KEY')
```

```python
# IF YOU ENTERED YOUR KEY/SECRET THROUGH THE AWS CLI (HIGHLY RECOMMENDED!!)... run this code block:

# this tells boto3 we want to access AWS's S3 service specifically
s3 = boto3.resource('s3')
```


### Step 2) Accessing a specific file

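Getting a file follows the format `s3.Object('my-bucket-name', 'my_path/to/file.txt')`: the first argument is the bucket name and the second is the object's key (its path within the bucket).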

```python
# Bucket = 'data-science-gym'
# File Path = 'iris_data.csv'

file_object = s3.Object('data-science-gym', 'iris_data.csv')
# get() downloads the object; 'Body' is a stream, and read() returns its raw bytes
file_content = file_object.get()['Body'].read()
```

### Step 3) Read the data into Pandas

Loading the data into Pandas is just one option. If we only wanted to download the file, we could do that instead, but in this example we have no reason to save the data to disk, so we'll go straight to Pandas.
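If you do want a local copy (for example, to avoid re-downloading a large file every time the notebook runs), boto3's resource `Object` has a `download_file(Filename)` method that writes the object to disk. A minimal sketch; `load_csv_cached` and the local file name are hypothetical, and `s3_object` is assumed to be an `s3.Object` like the one from Step 2:

```python
import os

import pandas as pd


def load_csv_cached(s3_object, local_path):
    """Hypothetical helper: download s3_object to local_path unless it is
    already there, then read the local file with Pandas.

    s3_object is assumed to be a boto3 s3.Object (anything with a
    download_file(path) method works).
    """
    if not os.path.exists(local_path):
        s3_object.download_file(local_path)
    return pd.read_csv(local_path)


# Usage (needs credentials; writes iris_data.csv to the working directory):
# import boto3
# s3 = boto3.resource("s3")
# df = load_csv_cached(s3.Object("data-science-gym", "iris_data.csv"), "iris_data.csv")
```

The trade-off of this design is that the cached copy is never refreshed; delete the local file if the object in S3 changes.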

```python
import pandas as pd
import io

df = pd.read_csv(io.BytesIO(file_content))

# you may find that you need to specify the encoding of the file, like this
# df = pd.read_csv(io.BytesIO(file_content), encoding='utf8')
```
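If you don't know the encoding in advance, one option is to try UTF-8 first and fall back to another encoding when decoding fails. A sketch under that assumption; the helper name and the fallback order are illustrative choices. Note that `latin-1` never fails to decode (it maps every byte to a character), so it only makes sense as the last entry.

```python
import io

import pandas as pd


def read_csv_bytes(raw, encodings=("utf-8", "latin-1")):
    """Hypothetical helper: parse CSV bytes with the first encoding in
    `encodings` that decodes without error."""
    errors = []
    for encoding in encodings:
        try:
            return pd.read_csv(io.BytesIO(raw), encoding=encoding)
        except UnicodeDecodeError as err:
            errors.append(f"{encoding}: {err}")  # remember why it failed, try the next one
    raise ValueError("could not decode CSV with any of: " + "; ".join(errors))


# Usage with the bytes from Step 2:
# df = read_csv_bytes(file_content)
```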

```python
df.head()
```

|   | Flower Type | Sepal Length | Sepal Width | Petal Length | Petal Width |
|---|---|---|---|---|---|
| 0 | setosa | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | setosa | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | setosa | 4.7 | 3.2 | 1.3 | 0.2 |
| 3 | setosa | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | setosa | 5.0 | 3.6 | 1.4 | 0.2 |


Cool, we got our data straight from S3 into a Pandas DataFrame without writing it to disk (that's what the whole `io.BytesIO` part is about).
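`pd.read_csv` accepts either a file path or a file-like object with a `read()` method. The downloaded `file_content` is plain `bytes`, so `io.BytesIO` wraps it in an in-memory file-like object. A small self-contained illustration, using made-up CSV bytes in place of the S3 download:

```python
import io

import pandas as pd

# made-up bytes standing in for file_object.get()['Body'].read()
file_content = b"Flower Type,Sepal Length\nsetosa,5.1\nsetosa,4.9\n"

buffer = io.BytesIO(file_content)  # in-memory, file-like view of the bytes
df = pd.read_csv(buffer)           # Pandas reads it as if it were a file on disk

print(type(file_content).__name__, "->", type(buffer).__name__)  # bytes -> BytesIO
print(df.shape)  # (2, 2)
```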

If your data is in a format other than CSV or plain text, you'll need to adjust the parsing step accordingly. For instance, here is an example of getting a JSON file:

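```python
import json  # needed to parse the JSON text

# we're getting gameplay data from a video game!
file_object = s3.Object('riot-developer-portal',
                        'seed-data/matches1.json')

# read the raw bytes and decode them to text; 'latin-1' maps every byte to a
# character, so it never raises, but if the file is UTF-8 (the usual encoding
# for JSON), .decode('utf-8') renders non-ASCII characters correctly
json_content = file_object.get()['Body'].read().decode('latin-1')
json_content = json.loads(json_content)  # parse the text into Python dicts/lists

# each entry under "matches" becomes one row of the DataFrame
df = pd.DataFrame.from_records(json_content["matches"])

df.head()
```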

|   | gameCreation | gameDuration | gameId | gameMode | gameType | gameVersion | mapId | participantIdentities | participants | platformId | queueId | seasonId | teams |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1504029097863 | 3509 | 2585563902 | CLASSIC | MATCHED_GAME | 7.17.200.3955 | 11 | [{'participantId': 1, 'player': {'platformId':... | [{'participantId': 1, 'teamId': 100, 'champion... | NA1 | 420 | 9 | [{'teamId': 100, 'win': 'Win', 'firstBlood': T... |
| 1 | 1504029495717 | 3105 | 2585564285 | CLASSIC | MATCHED_GAME | 7.17.200.3955 | 11 | [{'participantId': 1, 'player': {'platformId':... | [{'participantId': 1, 'teamId': 100, 'champion... | NA1 | 420 | 9 | [{'teamId': 100, 'win': 'Fail', 'firstBlood': ... |
| 2 | 1504029750399 | 2764 | 2585564561 | CLASSIC | MATCHED_GAME | 7.17.200.3955 | 11 | [{'participantId': 1, 'player': {'platformId':... | [{'participantId': 1, 'teamId': 100, 'champion... | NA1 | 420 | 9 | [{'teamId': 100, 'win': 'Fail', 'firstBlood': ... |
| 3 | 1504029831363 | 2785 | 2585564610 | CLASSIC | MATCHED_GAME | 7.17.200.3955 | 11 | [{'participantId': 1, 'player': {'platformId':... | [{'participantId': 1, 'teamId': 100, 'champion... | NA1 | 420 | 9 | [{'teamId': 100, 'win': 'Fail', 'firstBlood': ... |
| 4 | 1504029887271 | 2841 | 2585564622 | CLASSIC | MATCHED_GAME | 7.17.200.3955 | 11 | [{'participantId': 1, 'player': {'platformId':... | [{'participantId': 1, 'teamId': 100, 'champion... | NA1 | 420 | 9 | [{'teamId': 100, 'win': 'Win', 'firstBlood': T... |


We just accessed a JSON file published by the makers of the video game League of Legends and loaded some of its gameplay data.
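Several columns (`participantIdentities`, `participants`, `teams`) still hold lists of dictionaries. `pd.json_normalize` can flatten one of those lists into its own table. A sketch using a tiny invented record in the same shape as the match data; only the field names come from the output above, the values are made up:

```python
import pandas as pd

# invented sample shaped like json_content["matches"]; real records have many more fields
matches = [
    {"gameId": 1, "participants": [{"participantId": 1, "teamId": 100},
                                   {"participantId": 2, "teamId": 200}]},
    {"gameId": 2, "participants": [{"participantId": 1, "teamId": 100}]},
]

# one row per participant, with the parent match's gameId copied onto each row
participants = pd.json_normalize(matches, record_path="participants", meta=["gameId"])
print(participants)
```

With the real data, the same call would be `pd.json_normalize(json_content["matches"], record_path="participants", meta=["gameId"])`; any nested dictionaries inside each participant become dotted column names.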

So now you know how to get a couple types of data straight from S3!
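To handle several formats with one call, you could dispatch on the key's file extension. A minimal sketch, assuming CSV/TXT files are comma-separated and JSON files hold a list of records at the top level (unlike the match file above, whose records sit under `"matches"`); `parse_s3_bytes` is a hypothetical helper:

```python
import io
import json
import os

import pandas as pd


def parse_s3_bytes(key, raw):
    """Hypothetical helper: turn an S3 object's bytes into a DataFrame
    based on the key's file extension."""
    extension = os.path.splitext(key)[1].lower()
    if extension in (".csv", ".txt"):
        return pd.read_csv(io.BytesIO(raw))
    if extension == ".json":
        # json.loads accepts bytes directly (Python 3.6+)
        return pd.DataFrame.from_records(json.loads(raw))
    raise ValueError(f"no parser configured for {extension!r} ({key})")


# Usage with the resource from Step 1:
# raw = s3.Object("data-science-gym", "iris_data.csv").get()["Body"].read()
# df = parse_s3_bytes("iris_data.csv", raw)
```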

***

## That's all for now!