**Created by Berkay Alan**


**Reading Data from AWS S3**

**4 September 2022**

**For more tutorials**: https://www.kaggle.com/berkayalan

# Content

- Reading Data from AWS S3 with Boto3

# Resources

- [**Connecting to AWS S3 with Python**](https://www.gormanalysis.com/blog/connecting-to-aws-s3-with-python/)

# Importing Libraries

In [2]:
import pandas as pd
import numpy as np
import boto3
import os

from warnings import filterwarnings
filterwarnings('ignore')

# Authenticating with Boto3

To connect to S3, we need to authenticate, and boto3 supports several ways of doing this. The most direct one is to pass our credentials as parameters to boto3.resource(). For example, here we create a ServiceResource object that we can use to connect to S3.

In [3]:
s3 = boto3.resource(
    service_name='s3',
    region_name='eu-west-1',
    aws_access_key_id='mykey',
    aws_secret_access_key='mysecretkey'
)

Note that *region_name* should be the region of our S3 bucket.
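
Since there are several ways to authenticate, here is a minimal sketch of one alternative: taking the keys from environment variables instead of typing them into the notebook. `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` and `AWS_DEFAULT_REGION` are the standard variable names that boto3 also looks up by itself. The helper names `s3_kwargs_from_env` and `make_s3_resource` and the `eu-west-1` fallback are assumptions made for illustration.

```python
import os


def s3_kwargs_from_env(env=os.environ, default_region='eu-west-1'):
    """Build the keyword arguments for boto3.resource() from environment variables."""
    return {
        'service_name': 's3',
        # Fall back to the default region if the variable is not set
        'region_name': env.get('AWS_DEFAULT_REGION', default_region),
        'aws_access_key_id': env['AWS_ACCESS_KEY_ID'],          # KeyError if not set
        'aws_secret_access_key': env['AWS_SECRET_ACCESS_KEY'],  # KeyError if not set
    }


def make_s3_resource():
    """Create the S3 ServiceResource from the current environment."""
    import boto3  # imported here so the helper above works without boto3
    return boto3.resource(**s3_kwargs_from_env())
```

With the variables set in the shell that starts the notebook, `s3 = make_s3_resource()` could stand in for the cell above.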

We can print the names of all the S3 buckets that our credentials can access like this.

In [4]:
# Print out bucket names
for bucket in s3.buckets.all():
    print(bucket.name)

berkayalan


We can also list all the objects in our bucket:

In [7]:
for obj in s3.Bucket('berkayalan').objects.all(): # bucket name
    print(obj)

s3.ObjectSummary(bucket_name='berkayalan', key='Apple.csv')
s3.ObjectSummary(bucket_name='berkayalan', key='starbucks.csv')
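
When a bucket holds more than a couple of files, it can help to narrow the listing. The sketch below keeps only the keys with a given extension. `keys_with_suffix` is a hypothetical helper, and it works on any iterable of objects that have a `key` attribute, such as the `ObjectSummary` items printed above.

```python
def keys_with_suffix(object_summaries, suffix='.csv'):
    """Return the keys of all objects whose key ends with `suffix` (case-insensitive)."""
    suffix = suffix.lower()
    return [obj.key for obj in object_summaries if obj.key.lower().endswith(suffix)]


# Usage with the resource created earlier (needs valid credentials):
# csv_keys = keys_with_suffix(s3.Bucket('berkayalan').objects.all())
```

This filters on the client side after listing. To restrict the listing by key prefix on the S3 side, boto3 collections also offer `objects.filter(Prefix=...)`.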


The loop above yields one ObjectSummary per file in the bucket. We can read one of these CSV files from S3 into Python by fetching an object and then reading the object’s Body, like below.

In [14]:
# Fetch the object; the response is a dict whose 'Body' holds the file contents
object_name = s3.Bucket('berkayalan').Object('Apple.csv').get() # bucket name and object key

In [15]:
object_name

{'ResponseMetadata': {'RequestId': 'J8VXZS4F3KD4QAGY',
  'HostId': 'JJJXNFIERcN3N1+VpnFyPX/0iTICDdaGvif7XYAWJ2Q2gnx77IKCuUWbvUS/9MYPv0EhizDeBzI=',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amz-id-2': 'JJJXNFIERcN3N1+VpnFyPX/0iTICDdaGvif7XYAWJ2Q2gnx77IKCuUWbvUS/9MYPv0EhizDeBzI=',
   'x-amz-request-id': 'J8VXZS4F3KD4QAGY',
   'date': 'Sun, 04 Sep 2022 17:33:58 GMT',
   'last-modified': 'Sun, 04 Sep 2022 17:25:52 GMT',
   'etag': '"1033b16bc34e2aa0a23fc756c909e807"',
   'accept-ranges': 'bytes',
   'content-type': 'binary/octet-stream',
   'server': 'AmazonS3',
   'content-length': '67664'},
  'RetryAttempts': 0},
 'AcceptRanges': 'bytes',
 'LastModified': datetime.datetime(2022, 9, 4, 17, 25, 52, tzinfo=tzutc()),
 'ContentLength': 67664,
 'ETag': '"1033b16bc34e2aa0a23fc756c909e807"',
 'ContentType': 'binary/octet-stream',
 'Metadata': {},
 'Body': <botocore.response.StreamingBody at 0x7f908802baf0>}

In [16]:
apple = pd.read_csv(object_name['Body'], index_col=0)

In [17]:
apple.head()

Date,Open,High,Low,Close,Adj Close,Volume
2019-01-02,38.7225,39.712502,38.557499,39.48,38.168346,148158800
2019-01-03,35.994999,36.43,35.5,35.547501,34.366497,365248800
2019-01-04,36.1325,37.137501,35.950001,37.064999,35.83358,234428400
2019-01-07,37.174999,37.2075,36.474998,36.982498,35.753822,219111200
2019-01-08,37.389999,37.955002,37.130001,37.6875,36.435398,164101200
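
The `Body` returned by `get()` is a stream, so it can be consumed only once: after `pd.read_csv` has read it, nothing is left for a second read. A minimal sketch of one way around this is to copy the bytes into an in-memory buffer first. `body_to_buffer` is a hypothetical helper name, and it only assumes that the body has a `read()` method.

```python
import io


def body_to_buffer(body):
    """Read a streaming body fully and return a seekable in-memory copy."""
    return io.BytesIO(body.read())


# Usage with a fresh response from .get() (needs valid credentials):
# response = s3.Bucket('berkayalan').Object('Apple.csv').get()
# buffer = body_to_buffer(response['Body'])
# apple = pd.read_csv(buffer, index_col=0)
# buffer.seek(0)  # rewind to parse the same bytes again
```

This holds the whole file in memory, which is fine for small CSV files like the ones here.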


Alternatively, we could download a file from S3 and then read it from disk.

In [18]:
s3.Bucket('berkayalan').download_file(Key='starbucks.csv', Filename='starbucks_local.csv')
str_local = pd.read_csv('starbucks_local.csv', index_col=0)

In [19]:
str_local.head()

Date,Close,Volume
2015-01-02,38.0061,6906098
2015-01-05,37.2781,11623796
2015-01-06,36.9748,7664340
2015-01-07,37.8848,9732554
2015-01-08,38.4961,13170548
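
If the notebook is re-run often, downloading the same file every time is unnecessary. The sketch below skips the download when a local copy already exists. `download_if_missing` is a hypothetical helper. It assumes `bucket` offers the same `download_file(Key=..., Filename=...)` call used above, and it does not check whether the file on S3 has changed since the last download.

```python
import os


def download_if_missing(bucket, key, filename):
    """Download `key` from `bucket` to `filename` unless that file already exists.

    Returns True if a download happened, False if the local copy was reused.
    """
    if os.path.exists(filename):
        return False
    bucket.download_file(Key=key, Filename=filename)
    return True


# Usage with the resource created earlier (needs valid credentials):
# download_if_missing(s3.Bucket('berkayalan'), 'starbucks.csv', 'starbucks_local.csv')
```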


That's all!