# Accessing an AWS S3 bucket and downloading a CSV file



In this notebook we will learn how to access an AWS S3 bucket, download a CSV file from that bucket, create a dataframe from the CSV file contents, and finally perform some calculations on the downloaded data. 

**Note:** If you are not familiar with manipulating CSV files, work through the **simpleCalc.ipynb** Jupyter notebook first to become familiar with file manipulation.



In [1]:
!pip install -r requirements.txt



## Imports
We need to import several packages. They are either built into the notebook image you are running or were installed in the previous step.

In [2]:
#========================================================================================
# import needed libraries/packages 
#
# Note:  we use boto3 which is a Python SDK for AWS.  It allows you to create,
# configure and manage AWS resources from your Python scripts.  
#========================================================================================
import os
import pandas as pd
import boto3
import botocore

from botocore import UNSIGNED
from botocore.client import Config

In [6]:
#========================================================================================
# Identify the name of your S3 bucket, and then the file you wish to download.
#========================================================================================
bucket_name     = 'rhods-pilot'
file_name       = 'truckdata.csv'
new_file_name   = 'newtruckdata.csv'

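local_dest_dir  = os.path.join(os.getcwd(), 'datasets')
s3              = boto3.client('s3', config=Config(signature_version=UNSIGNED))

# Create the destination directory if it does not exist; download_file will not create it.
os.makedirs(local_dest_dir, exist_ok=True)

s3.download_file(bucket_name, file_name, os.path.join(local_dest_dir, new_file_name))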
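
The download step above can be wrapped in a small helper so that the destination directory is always created first and the final local path is returned for later use. This is only a minimal sketch: `download_to_dir` is a hypothetical name that does not appear in the notebook, and it assumes `s3_client` is any object exposing boto3's `download_file(bucket, key, filename)` method.

```python
import os


def download_to_dir(s3_client, bucket, key, dest_dir, local_name=None):
    """Download `key` from `bucket` into `dest_dir` and return the local path.

    Hypothetical helper: `s3_client` is assumed to behave like a boto3 S3 client.
    """
    # Make sure the target directory exists before writing into it.
    os.makedirs(dest_dir, exist_ok=True)

    # Fall back to the object's own file name when no local name is given.
    local_path = os.path.join(dest_dir, local_name or os.path.basename(key))

    s3_client.download_file(bucket, key, local_path)
    return local_path
```

With the variables defined in the cell above, a call might look like `local_path = download_to_dir(s3, bucket_name, file_name, local_dest_dir, new_file_name)`, and `local_path` can then be passed straight to `pd.read_csv`.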


In [None]:
#========================================================================================
# Read the CSV file data into a dataframe
#========================================================================================
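# The file was downloaded into local_dest_dir, so read it from there.
df              = pd.read_csv(os.path.join(local_dest_dir, new_file_name))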

In [None]:
#========================================================================================
# Print the contents of the dataframe
#========================================================================================
print(df)
print(df.mileage)
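
Printing a whole dataframe can be unwieldy for larger files, so it may help to look at only the first rows and the column types. The sketch below uses made-up sample rows rather than the real contents of `truckdata.csv`; the `truck_id` column is an assumption for illustration, and only `mileage` is known to exist in the notebook's data.

```python
import io

import pandas as pd

# Made-up sample data standing in for the downloaded file.
sample_csv = io.StringIO("truck_id,mileage\nA,100\nB,200\nC,300\n")
sample_df = pd.read_csv(sample_csv)

# First rows and inferred column types give a quick overview.
print(sample_df.head())
print(sample_df.dtypes)

# Bracket access works for any column name, including names with spaces.
mileage = sample_df["mileage"]
print(mileage)
```

Attribute access such as `df.mileage` is convenient, but bracket access (`df["mileage"]`) also works when a column name contains spaces or clashes with a dataframe method.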

In [None]:
#========================================================================================
# Perform calculations on imported data
#========================================================================================
total_mileage   = sum(df.mileage)  # or: total_mileage = df['mileage'].sum()
print(total_mileage)

total_rows      = len(df.index)
print(total_rows)

average_mileage = (total_mileage/total_rows)
print(average_mileage)
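
The same calculations can also be expressed with pandas' own aggregation methods. The following is a sketch on made-up values, not on the real truck data, and the variable `sample_df` is introduced here only for illustration.

```python
import pandas as pd

# Made-up mileage values for illustration only.
sample_df = pd.DataFrame({"mileage": [100, 200, 300]})

total_mileage = sample_df["mileage"].sum()     # sum of the column
total_rows = len(sample_df)                    # number of rows
average_mileage = sample_df["mileage"].mean()  # column mean

print(total_mileage, total_rows, average_mileage)
```

One difference to keep in mind: `mean()` skips missing values by default, whereas dividing the total by `len(df.index)` counts every row, so the two results can diverge if the column contains empty cells.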