# 📘 Upload Data to S3 – Google Colab + AWS
This notebook connects your Colab environment to AWS and uploads a sample customer churn dataset to Amazon S3.

## 🔐 Step 1: Set Your AWS Credentials
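_(Use IAM user credentials that can write to your S3 bucket. S3 full access works, but permission to put objects in the target bucket is enough for the upload in Step 3. NEVER share these credentials.)_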

In [None]:
# ⛔ IMPORTANT: Replace the placeholders with your own AWS credentials, and clear them before sharing this notebook
aws_access_key = 'YOUR_AWS_ACCESS_KEY_ID'
aws_secret_key = 'YOUR_AWS_SECRET_ACCESS_KEY'
region_name = 'us-east-1'  # or your preferred region
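
Hard-coded keys are easy to leak when a notebook is shared. As one possible alternative, the sketch below asks for the values at run time with Python's standard `getpass` module, so they never appear in the saved notebook. The helper name `read_aws_credentials` and its `prompt` parameter are hypothetical and exist only for this illustration.

```python
import os
from getpass import getpass


def read_aws_credentials(prompt=getpass):
    """Return (access_key, secret_key), prompting only for values not already set.

    `prompt` is any callable that takes a message and returns a string.
    It defaults to getpass, which hides the typed input.
    """
    # Reuse values that are already in the environment, otherwise ask for them
    access_key = os.environ.get("AWS_ACCESS_KEY_ID") or prompt("AWS access key ID: ")
    secret_key = os.environ.get("AWS_SECRET_ACCESS_KEY") or prompt("AWS secret access key: ")
    # Strip stray whitespace that often sneaks in when pasting
    return access_key.strip(), secret_key.strip()
```

In the notebook, `aws_access_key, aws_secret_key = read_aws_credentials()` could stand in for the two hard-coded assignments above.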


In [None]:
!pip install boto3 pandas scikit-learn --quiet

In [None]:

import boto3
import pandas as pd
from sklearn.datasets import fetch_openml
import os

# Set credentials as environment variables (used by boto3)
os.environ['AWS_ACCESS_KEY_ID'] = aws_access_key
os.environ['AWS_SECRET_ACCESS_KEY'] = aws_secret_key
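
Before moving on, it can help to confirm that boto3 picks up the credentials and that AWS accepts them. The sketch below is one way to do that, using the STS `get_caller_identity` call, which to my knowledge needs no extra IAM permissions. The helper name `caller_arn` is hypothetical.

```python
def caller_arn(sts_client):
    """Return the ARN of the identity behind the client's credentials."""
    identity = sts_client.get_caller_identity()
    return identity["Arn"]


# In the notebook (needs network access and valid credentials):
# print(caller_arn(boto3.client("sts", region_name=region_name)))
```

If the keys are wrong or missing, the call raises an exception from botocore instead of returning an ARN, so a failure here points at the credentials and not at the S3 bucket.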


## 📊 Step 2: Load and Save Dataset

In [None]:

# Load the Telco customer churn dataset from OpenML as a pandas DataFrame
data = fetch_openml("telco-customer-churn", version=1, as_frame=True)
# Drop rows that contain missing values
df = data.frame.dropna()

# Save to CSV
df.to_csv("churn.csv", index=False)
df.head()
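
A quick check of the file on disk can catch an empty or truncated CSV before it is uploaded. The sketch below uses only the standard `csv` module and assumes the file has a header row, as `to_csv` writes by default. The helper name `summarize_csv` is hypothetical.

```python
import csv


def summarize_csv(path):
    """Return (column_names, row_count) for a CSV file with a header row."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)               # first row holds the column names
        row_count = sum(1 for _ in reader)  # every remaining row is data
    return header, row_count
```

In the notebook, `columns, n_rows = summarize_csv("churn.csv")` should give an `n_rows` equal to `len(df)` and a `columns` list matching `list(df.columns)`.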


## ☁️ Step 3: Upload to S3

In [None]:

# Connect to S3
s3 = boto3.client('s3', region_name=region_name)

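# Define the target bucket and object key
# The bucket must already exist; upload_file does not create it
bucket_name = 'your-sagemaker-bucket-name'  # 🔁 replace with your S3 bucket name
s3_key = 'automl-churn/churn.csv'

# Upload the local CSV to s3://<bucket_name>/<s3_key>
s3.upload_file('churn.csv', bucket_name, s3_key)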
print(f"✅ Uploaded to s3://{bucket_name}/{s3_key}")
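
To confirm the object really landed in the bucket, one option is to ask S3 for its metadata and compare the reported size with the local file. The sketch below assumes the credentials are also allowed to read the object, since `head_object` needs read permission on it. The helper name `upload_matches_local` is hypothetical.

```python
import os


def upload_matches_local(s3_client, bucket, key, local_path):
    """Return True if the S3 object's size equals the local file's size."""
    # head_object fetches metadata only, without downloading the object
    response = s3_client.head_object(Bucket=bucket, Key=key)
    return response["ContentLength"] == os.path.getsize(local_path)
```

In the notebook this would be called as `upload_matches_local(s3, bucket_name, s3_key, 'churn.csv')`. If the object does not exist, `head_object` raises a botocore `ClientError` instead of returning.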


🚀 Done! Your dataset is now in S3 and ready for SageMaker Autopilot.