# Access data stored in Amazon S3

### Way 1 (as we commonly did in class):

In [1]:
import pandas as pd
import os 

In [23]:
# Read data from AWS S3 (public access)
df = pd.read_csv("https://s3groupmorocco.s3.eu-central-1.amazonaws.com/data/CSPP_breakdown_history.csv")
df.head()

Unnamed: 0,End of Month,Primary market holdings,Share PM,Secondary market holdings,Share SM,Total holdings
0,Jun 16,241,3.76%,6158,96.24%,6398
1,Jul 16,775,5.86%,12439,94.14%,13214
2,Aug 16,1299,6.52%,18622,93.48%,19921
3,Sep 16,3243,10.91%,26479,89.09%,29722
4,Oct 16,4222,11.07%,33922,88.93%,38144


Notice that, compared with how we loaded data in class, only two parts of the URL have changed: the bucket name (the bucket is the root directory under which all items are stored), which is now "s3groupmorocco" instead of "goz39a", and the final part, "data/CSPP_breakdown_history.csv". Any other file can be accessed by replacing that final part with, for instance, "data/CSPP_PEPP_corporate_bond_holdings_20210423.csv". Note also that we can read the data files this way only because I previously stored them with public access (see below for how to read data files under private access).
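To make the URL pattern explicit, here is a minimal sketch that assembles the address from its three parts. The helper name `public_s3_url` is my own (it is not part of pandas or AWS), and it assumes the object is publicly readable and that the bucket lives in the region given.

```python
from urllib.parse import quote


def public_s3_url(bucket, region, key):
    """Build the HTTPS URL of a publicly readable S3 object.

    Hypothetical helper for illustration: bucket name, region and key
    are simply slotted into the URL pattern used above.
    """
    # quote() percent-encodes characters such as spaces but leaves "/" intact
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"


url = public_s3_url(
    "s3groupmorocco", "eu-central-1", "data/CSPP_breakdown_history.csv"
)
print(url)
```

The resulting string can be passed straight to `pd.read_csv(url)`, and switching files only requires changing the `key` argument.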

### Way 2 (boto3 package):

This is the approach suggested in the email for accessing our S3 bucket. I find the first way easier; however, it requires the data files to be accessible to the public. Therefore, if we are working with sensitive data files, we will prefer this way. First, we need to install the boto3 package:

In [17]:
pip install boto3

Note: you may need to restart the kernel to use updated packages.


We can pin a specific version by typing "pip install boto3==1.17.75". Next, we install the AWS Command Line Interface (AWS CLI), which we will use to configure the credentials that boto3 needs to access the data files.

In [18]:
pip install awscli

Note: you may need to restart the kernel to use updated packages.


After installing the AWS CLI, we need to configure our credentials. To do this, open the Anaconda prompt and type "aws configure". Then enter the AWS Access Key ID and AWS Secret Access Key (I emailed them to you) and the Default region name (eu-central-1). The Default output format can be left empty. It is also possible to configure our credentials directly in Jupyter; however, this would expose them to anyone with access to this file. Once we have configured our credentials, we can finally access our data through the boto3 package.
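To check that "aws configure" actually stored something, without printing the keys themselves, we can look at the profile names in the credentials file. This is only a sketch: the helper `list_aws_profiles` is hypothetical, and it assumes the default location `~/.aws/credentials` (the location can be changed, for example through the `AWS_SHARED_CREDENTIALS_FILE` environment variable, which this sketch ignores).

```python
import configparser
from pathlib import Path


def list_aws_profiles(credentials_path=None):
    """Return the profile names in an AWS credentials file, never the keys.

    Hypothetical helper for illustration; assumes the default file
    location unless another path is given.
    """
    if credentials_path is None:
        path = Path.home() / ".aws" / "credentials"
    else:
        path = Path(credentials_path)
    if not path.exists():
        return []  # nothing has been configured at this location
    parser = configparser.ConfigParser()
    parser.read(path)
    # Each [section] of the file is one profile, e.g. "default"
    return parser.sections()


print(list_aws_profiles())
```

An empty list suggests that the configuration step did not write to the default location; a list containing "default" means boto3 should be able to pick up the credentials without any extra arguments.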

In [19]:
import boto3

Among all the services provided by AWS, we want to use Amazon S3. Thus, we create an S3 resource object.

In [20]:
resource = boto3.resource('s3')

This object can be used to create buckets or to list the existing ones (the buckets from the other groups will show up too).

In [25]:
# List buckets' names
for bucket in resource.buckets.all():
    print(bucket.name)

cf-templates-16ihypz3vva90-eu-central-1
goz39a
kulsagemaker20210301
lfspf
s3groupandorra
s3grouparmenia
s3groupaustralia
s3groupbahamas
s3groupbarbados
s3groupbelgium
s3groupcameroon
s3groupchad
s3groupcroatia
s3groupczechia
s3groupdenmark
s3groupegypt
s3groupfinland
s3groupgeorgia
s3groupgermany
s3grouphaiti
s3grouphungary
s3groupiceland
s3groupireland
s3groupjordan
s3grouplatvia
s3grouplebanon
s3groupmalawi
s3groupmalaysia
s3groupmauritius
s3groupmonaco
s3groupmongolia
s3groupmorocco
s3groupmozambique
s3groupnetherlands
s3groupnewzealand
s3grouppanama
s3groupperu
s3groupphilippines
s3grouppoland
s3grouprwanda
s3groupseychelles
s3groupsingapore
s3groupsomalia
s3groupsweden
s3groupswitzerland
s3grouptajikistan
s3grouptest
s3groupthailand
s3grouptimorleste
s3groupturkey
s3grouptuvalu
s3groupvaleria


To access our data files, we need to create a different object (a client object). Among other things, this object allows us to upload data files to AWS S3 as well as retrieve any data files stored previously. For instance, we can access "CSPP_breakdown_history.csv" in the following way:

In [26]:
client = boto3.client('s3')

# Request the object; the response's 'Body' is a file-like stream of its contents
obj = client.get_object(
    Bucket='s3groupmorocco',
    Key='data/CSPP_breakdown_history.csv'
)

# Read the CSV directly from the response body
data = pd.read_csv(obj['Body'])
data.head()

Unnamed: 0,End of Month,Primary market holdings,Share PM,Secondary market holdings,Share SM,Total holdings
0,Jun 16,241,3.76%,6158,96.24%,6398
1,Jul 16,775,5.86%,12439,94.14%,13214
2,Aug 16,1299,6.52%,18622,93.48%,19921
3,Sep 16,3243,10.91%,26479,89.09%,29722
4,Oct 16,4222,11.07%,33922,88.93%,38144
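
The client can also list what is stored in the bucket and upload new files, as mentioned above. The sketch below wraps the boto3 client calls `get_paginator("list_objects_v2")` and `upload_file` in two small helpers. The helper names, the local file "results.csv" and the key "data/results.csv" are made up for illustration, and the upload assumes our credentials have write permission on the bucket.

```python
def list_keys(client, bucket, prefix=""):
    """Return the keys of all objects in `bucket` that start with `prefix`."""
    keys = []
    # A single listing call returns a limited number of objects,
    # so we let the paginator walk through every page of results.
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages with no matching objects carry no "Contents" entry
        for item in page.get("Contents", []):
            keys.append(item["Key"])
    return keys


def upload_to_bucket(client, local_path, bucket, key):
    """Upload a local file and return its s3:// address."""
    client.upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"


# Usage with the client created above (needs configured credentials):
# print(list_keys(client, "s3groupmorocco", prefix="data/"))
# upload_to_bucket(client, "results.csv", "s3groupmorocco", "data/results.csv")
```

Passing the client in as an argument keeps the helpers independent of how the credentials were configured, so the same functions work for any bucket we have access to.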
