# Ingestion and Storage of Static Files in an S3 Bucket

## **Objective**
This document describes how to upload static files (e.g., CSVs or PDFs) from Google Drive to an AWS S3 bucket using Google Colab. The uploaded files form part of the Data Lake used for later analysis and integration.

---

## **Prerequisites**
1. **AWS Account**: Ensure you have an active AWS account.
2. **Google Colab**: A Python notebook environment in which to run the scripts.
3. **AWS Credentials**: Have the following ready:
   - `AWS_ACCESS_KEY_ID`
   - `AWS_SECRET_ACCESS_KEY`
   - `AWS_REGION`

---

## **Step 1: Install Required Libraries**

Install the `boto3` library to interact with AWS services.

```python
!pip install boto3
```

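---

## **Step 2: Import Required Modules**

Import the Python libraries needed for AWS access and file handling.

```python
import boto3
import os
```

Mount Google Drive so the notebook can read the files stored there.

```python
from google.colab import drive
drive.mount('/content/drive')
```

Optionally, list the files that will be uploaded. This cell uses `local_directory`, which is defined in Step 4, so run it after that step.

```python
for file_name in os.listdir(local_directory):
    print(file_name)
```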


---

## **Step 3: Configure AWS Credentials**

Set the AWS credentials and the bucket name, then create the S3 client used to access the bucket. Do not leave real keys in a notebook that is shared or uploaded.

```python
# AWS credentials and S3 bucket name (fill in your own values)
aws_access_key = ''
aws_secret_key = ''
aws_region = 'us-east-1'
bucket_name = 'mysbb-static-files-bucket'

# Initialize the S3 client
s3_client = boto3.client(
    's3',
    aws_access_key_id=aws_access_key,
    aws_secret_access_key=aws_secret_key,
    region_name=aws_region
)
```
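Hardcoding keys in a notebook cell is easy to leak. The sketch below is one possible alternative, not part of the original notebook: it reads the values from the environment variables named in the prerequisites and prompts for any that are missing. The helper name `load_aws_credentials` and the `us-east-1` fallback are assumptions made for illustration.

```python
import os
from getpass import getpass


def load_aws_credentials(env=os.environ, prompt=getpass):
    """Return boto3 client keyword arguments without hardcoding secrets.

    Hypothetical helper: values come from environment variables when set;
    otherwise the user is prompted (input is hidden by getpass).
    """
    access_key = env.get("AWS_ACCESS_KEY_ID") or prompt("AWS access key ID: ")
    secret_key = env.get("AWS_SECRET_ACCESS_KEY") or prompt("AWS secret access key: ")
    region = env.get("AWS_REGION", "us-east-1")  # assumed default region
    return {
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
        "region_name": region,
    }
```

With this helper, the client could be created as `s3_client = boto3.client('s3', **load_aws_credentials())`.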


---

## **Step 4: Define the S3 Bucket and Local Directory**

The bucket name (`bucket_name`) was already set in Step 3. Here, specify the path to the local directory that contains the files to upload.

```python
# Local directory containing static files
local_directory = '/content/drive/My Drive/Data Warehouse & Lake System'
```


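---

## **Step 5: Create the Upload Function**

Define a function that uploads files from the local directory to the S3 bucket.

The function walks the directory tree, including subdirectories, and uploads each file. The file's path relative to `local_directory` becomes its S3 object key. It uses the `s3_client` created in Step 3.

```python
def upload_files_to_s3(local_directory, bucket_name):
    """Upload every file under local_directory to the given S3 bucket."""
    # Walk the directory tree, including subdirectories
    for root, _, files in os.walk(local_directory):
        for file in files:
            local_path = os.path.join(root, file)
            # Use the path relative to local_directory as the S3 object key
            relative_path = os.path.relpath(local_path, local_directory)
            s3_path = relative_path.replace("\\", "/")  # S3 keys use forward slashes

            try:
                s3_client.upload_file(local_path, bucket_name, s3_path)
                print(f"Uploaded {local_path} to s3://{bucket_name}/{s3_path}")
            except Exception as e:
                # Report the failure and continue with the remaining files
                print(f"Failed to upload {local_path}: {e}")
```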
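Because the function uploads everything under the directory, files such as the notebook itself are uploaded too. The following sketch shows one way to preview the uploads and restrict them by file extension before touching S3. The function name `plan_uploads` and the default extension list are assumptions for illustration, not part of the original notebook.

```python
import os


def plan_uploads(local_directory, allowed_extensions=(".csv", ".pdf")):
    """Return sorted (local_path, s3_key) pairs for files with an allowed extension.

    Hypothetical dry-run helper: it only builds the list and uploads nothing.
    """
    planned = []
    for root, _, files in os.walk(local_directory):
        for name in files:
            # str.endswith accepts a tuple of suffixes; compare case-insensitively
            if not name.lower().endswith(allowed_extensions):
                continue  # skip notebooks and other non-data files
            local_path = os.path.join(root, name)
            relative_path = os.path.relpath(local_path, local_directory)
            s3_key = relative_path.replace(os.sep, "/")  # S3 keys use forward slashes
            planned.append((local_path, s3_key))
    return sorted(planned)
```

Printing the result of `plan_uploads(local_directory)` shows which keys would be created, which makes it easier to catch unwanted files before running the upload.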


---

## **Step 6: Run the Upload**

Call the function to upload every file in the specified local directory to the S3 bucket.

```python
upload_files_to_s3(local_directory, bucket_name)
```

Output:

```
Uploaded /content/drive/My Drive/Data Warehouse & Lake System/anzahl-sbb-bahnhofbenutzer-tagesverlauf.csv to s3://mysbb-static-files-bucket/anzahl-sbb-bahnhofbenutzer-tagesverlauf.csv
Uploaded /content/drive/My Drive/Data Warehouse & Lake System/anzahl-sbb-bahnhofbenutzer.csv to s3://mysbb-static-files-bucket/anzahl-sbb-bahnhofbenutzer.csv
Uploaded /content/drive/My Drive/Data Warehouse & Lake System/linie.csv to s3://mysbb-static-files-bucket/linie.csv
Uploaded /content/drive/My Drive/Data Warehouse & Lake System/data_ingestion_static_files.ipynb to s3://mysbb-static-files-bucket/data_ingestion_static_files.ipynb
```


---

## **Step 7: Verify the Uploads**

List the contents of the bucket to confirm that the files were uploaded. The listing covers the whole bucket, so it can include objects from earlier uploads. A single `list_objects_v2` call returns at most 1,000 objects.

```python
response = s3_client.list_objects_v2(Bucket=bucket_name)

if 'Contents' in response:
    for obj in response['Contents']:
        print(obj['Key'])
else:
    print("No files found in the bucket.")
```

Output:

```
OpenWeatherMap .ipynb
anzahl-sbb-bahnhofbenutzer-tagesverlauf.csv
anzahl-sbb-bahnhofbenutzer.csv
data_ingestion_static_files.ipynb
linie.csv
swiss_weather_data.csv
swiss_weather_data_over_time.csv
```
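A single `list_objects_v2` call returns at most 1,000 objects, so larger buckets need pagination. The sketch below uses boto3's paginator for that operation. The helper name `list_all_keys` is an assumption for illustration, and the client is passed in as an argument rather than read from a global.

```python
def list_all_keys(s3_client, bucket_name):
    """Return every object key in the bucket, following pagination.

    Hypothetical helper: s3_client is expected to be a boto3 S3 client.
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket_name):
        # Pages for an empty bucket carry no "Contents" entry
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys
```

It could be called as `list_all_keys(s3_client, bucket_name)`. For the small bucket shown here, a single call is sufficient and both approaches return the same keys.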


To also see the size of each object, print the `Size` field:

```python
# List all files in the S3 bucket
response = s3_client.list_objects_v2(Bucket=bucket_name)

if 'Contents' in response:
    for obj in response['Contents']:
        print(f"File: {obj['Key']}, Size: {obj['Size']} bytes")
else:
    print("No files found in the bucket.")
```

Output:

```
File: OpenWeatherMap .ipynb, Size: 28909 bytes
File: anzahl-sbb-bahnhofbenutzer-tagesverlauf.csv, Size: 23953 bytes
File: anzahl-sbb-bahnhofbenutzer.csv, Size: 6566 bytes
File: data_ingestion_static_files.ipynb, Size: 11254 bytes
File: linie.csv, Size: 100536 bytes
File: swiss_weather_data.csv, Size: 359 bytes
File: swiss_weather_data_over_time.csv, Size: 8674 bytes
```
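The reported sizes can also be used for a simple automated check. The sketch below compares local file sizes with the sizes S3 reports. The function name `compare_sizes` is an assumption for illustration, and matching sizes indicate only that an upload is probably complete, not that the content is identical.

```python
def compare_sizes(local_sizes, remote_sizes):
    """Return (missing, mismatched) keys, given {key: size_in_bytes} mappings.

    Hypothetical helper:
    - missing: keys present locally but absent from the bucket
    - mismatched: keys present in both whose sizes differ
    """
    missing = sorted(key for key in local_sizes if key not in remote_sizes)
    mismatched = sorted(
        key
        for key, size in local_sizes.items()
        if key in remote_sizes and remote_sizes[key] != size
    )
    return missing, mismatched
```

The remote mapping can be built from the listing response, for example `{obj['Key']: obj['Size'] for obj in response.get('Contents', [])}`, and the local one from `os.path.getsize` on each uploaded file.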
