# Amazon SageMaker Learning Platform - Interactive Tutorial

## 🚀 Comprehensive Guide to Building and Using SageMaker Learning Resources

Welcome to this interactive notebook, which demonstrates how to use the Amazon SageMaker Learning Platform. It collects the resources, links, and practical examples used throughout the tutorial.

### 📋 What You'll Learn:
- How to access and use the downloadable resources (PDF, PPT)
- Interactive Google Colab notebooks for hands-on practice
- Complete SageMaker development workflow
- Best practices and real-world examples

### 🔗 Available Resources:
1. **Developer PDF**: [SageMaker_doc.pdf](https://raw.githubusercontent.com/07Sushant/dump/main/SageMaker_doc.pdf)
2. **Workflow PPT**: [SageMaker_PPT.pptx](https://raw.githubusercontent.com/07Sushant/dump/main/SageMaker_PPT.pptx)
3. **Getting Started Notebook**: [Colab Link 1](https://colab.research.google.com/drive/1k6DfzXKMih7BvzJ5OXFU_WEfQfI7-RC_?usp=sharing)
4. **Model Training Notebook**: [Colab Link 2](https://colab.research.google.com/drive/1F0L2gTWSrZQH0BwIaRrmeS2tE500A-CL?usp=sharing)

---

In [None]:
# Setup and Imports
import boto3
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

# SageMaker specific imports
import sagemaker
from sagemaker import get_execution_role
from sagemaker.sklearn.estimator import SKLearn
from sagemaker.tensorflow import TensorFlow
from sagemaker.pytorch import PyTorch

# Display versions
print("Python Libraries Setup Complete!")
print(f"SageMaker SDK Version: {sagemaker.__version__}")
print(f"Boto3 Version: {boto3.__version__}")
print(f"Current Time: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

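# Basic AWS Configuration Check
try:
    session = boto3.Session()
    region = session.region_name or 'us-east-1'  # fall back if no region is configured
    print(f"AWS Region: {region}")
    # boto3.Session() succeeds even without credentials, so check for them explicitly
    if session.get_credentials() is None:
        print("Note: Configure AWS credentials for full functionality")
except Exception as e:
    print(f"Note: Configure AWS credentials for full functionality ({e})")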

## 🏗️ SageMaker Architecture Overview

Amazon SageMaker is a fully managed machine learning service that enables developers and data scientists to build, train, and deploy ML models at scale.

### Key Components:

#### 1. **SageMaker Studio**
- Integrated development environment for ML
- Jupyter notebook interface
- Visual workflow designer

#### 2. **SageMaker Notebooks**
- Pre-configured Jupyter notebooks
- Built-in algorithms and frameworks
- Collaborative development

#### 3. **SageMaker Training**
- Distributed training capabilities
- Automatic model tuning
- Built-in algorithms

#### 4. **SageMaker Hosting**
- Real-time inference endpoints
- Batch transform jobs
- Multi-model endpoints

#### 5. **SageMaker Pipelines**
- ML workflow orchestration
- CI/CD for ML models
- Automated retraining

### 📊 ML Lifecycle with SageMaker:

```
Data Preparation → Model Training → Model Validation → Model Deployment → Monitoring
      ↓                ↓               ↓                ↓              ↓
  Ground Truth    Training Jobs    Model Registry   Endpoints    Model Monitor
  Data Wrangler   Hyperparameter   A/B Testing     Batch Jobs   Data Quality
  Processing Jobs    Tuning        Model Cards     Multi-Model   Model Drift
```

---

In [None]:
# SageMaker Session Setup
import sagemaker
from sagemaker import get_execution_role

# Initialize SageMaker session
sagemaker_session = sagemaker.Session()

# Get the execution role (when running on SageMaker)
try:
    role = get_execution_role()
    print(f"SageMaker Execution Role: {role}")
except Exception:
    # For local development, replace this placeholder with your own role ARN
    role = "arn:aws:iam::YOUR_ACCOUNT:role/service-role/AmazonSageMaker-ExecutionRole"
    print("Using local development role (placeholder ARN, replace before training)")

# Get default bucket (the SDK creates it if it does not already exist)
bucket = sagemaker_session.default_bucket()
print(f"Default S3 Bucket: {bucket}")

# Get region
region = sagemaker_session.boto_region_name
print(f"AWS Region: {region}")

# Display SageMaker session information
print("\n=== SageMaker Session Information ===")
print(f"SageMaker Session: {sagemaker_session}")
print(f"Boto Session Region: {sagemaker_session.boto_region_name}")
print(f"Default Bucket: {sagemaker_session.default_bucket()}")

# Create S3 prefix for our experiments
prefix = 'sagemaker-learning-platform'
print(f"S3 Prefix: {prefix}")

# Test S3 connectivity
try:
    s3_client = sagemaker_session.boto_session.client('s3')
    buckets = s3_client.list_buckets()
    print(f"✅ S3 Connection Successful - Found {len(buckets['Buckets'])} buckets")
except Exception as e:
    print(f"❌ S3 Connection Error: {e}")

print("\n🎉 SageMaker environment setup complete!")
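
With the session, bucket, and prefix in place, processed splits are typically written as CSV and uploaded under the prefix. The sketch below is one possible way to do that, not the notebook's definitive method. The helper name `upload_splits`, the stand-in class `FakeSession`, and the `data/<split>` key layout are hypothetical; `upload_data(path, bucket=..., key_prefix=...)` is the SageMaker Python SDK `Session` method. The session is passed in as an argument so the helper can be dry-run with a stand-in object when no AWS credentials are available. The label is written as the first column with no header row, which is the CSV layout that built-in algorithms such as XGBoost expect.

```python
import os
import tempfile

import pandas as pd


def upload_splits(session, splits, bucket, prefix, target="target"):
    """Write each split to CSV and upload it; return {split_name: S3 URI}.

    `session` is any object with an `upload_data(path, bucket=..., key_prefix=...)`
    method, such as `sagemaker.Session()`. The helper name and the key layout
    are illustrative assumptions.
    """
    uris = {}
    with tempfile.TemporaryDirectory() as tmp_dir:
        for name, frame in splits.items():
            # Label first, no header row and no index column
            ordered = [target] + [c for c in frame.columns if c != target]
            local_path = os.path.join(tmp_dir, f"{name}.csv")
            frame[ordered].to_csv(local_path, header=False, index=False)
            uris[name] = session.upload_data(
                local_path, bucket=bucket, key_prefix=f"{prefix}/data/{name}"
            )
    return uris


class FakeSession:
    """Stand-in that records calls instead of contacting S3 (dry runs only)."""

    def __init__(self):
        self.calls = []
        self.first_lines = []

    def upload_data(self, path, bucket=None, key_prefix="data"):
        file_name = os.path.basename(path)
        with open(path) as fh:
            self.first_lines.append(fh.readline().strip())
        self.calls.append((file_name, bucket, key_prefix))
        return f"s3://{bucket}/{key_prefix}/{file_name}"


demo = pd.DataFrame({"feature_1": [0.1, 0.2], "target": [0, 1]})
fake = FakeSession()
uris = upload_splits(fake, {"train": demo}, "example-bucket", "sagemaker-learning-platform")
print(uris["train"])
```

In the notebook itself, passing `sagemaker_session`, `bucket`, and `prefix` from the cell above in place of the stand-in would perform the real upload.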

## 📊 Data Preparation and Processing

Data preparation is crucial for successful machine learning projects. SageMaker provides several tools and services for data preparation:

### Available Tools:
- **SageMaker Data Wrangler**: Visual data preparation tool
- **SageMaker Processing Jobs**: Scalable data processing
- **SageMaker Ground Truth**: Data labeling service
- **Built-in Algorithms**: Pre-processing capabilities

### Data Processing Workflow:
1. **Data Ingestion**: Load data from various sources
2. **Data Exploration**: Understand data characteristics
3. **Data Cleaning**: Handle missing values, outliers
4. **Feature Engineering**: Create meaningful features
5. **Data Splitting**: Train/validation/test sets
6. **Data Upload**: Store processed data in S3
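
The data cleaning step in the list above can be sketched with plain pandas. This is a minimal illustration under stated assumptions: the helper name `clean_numeric`, median imputation, and clipping at 1.5 times the interquartile range are choices made here for the example, not techniques prescribed by the original notebook.

```python
import numpy as np
import pandas as pd


def clean_numeric(df, columns, iqr_factor=1.5):
    """Fill missing values with the column median, then clip IQR outliers."""
    cleaned = df.copy()
    for col in columns:
        median = cleaned[col].median()  # NaNs are skipped by default
        cleaned[col] = cleaned[col].fillna(median)
        q1, q3 = cleaned[col].quantile([0.25, 0.75])
        spread = q3 - q1
        # Pull extreme values back to the edge of the accepted range
        cleaned[col] = cleaned[col].clip(
            q1 - iqr_factor * spread, q3 + iqr_factor * spread
        )
    return cleaned


# Tiny hypothetical frame with one missing value and one outlier
raw = pd.DataFrame({"feature_1": [1.0, 2.0, np.nan, 3.0, 100.0]})
cleaned = clean_numeric(raw, ["feature_1"])
print(cleaned)
```

Clipping keeps every row, which suits small datasets; dropping outlier rows instead is an equally valid choice when data is plentiful.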

Let's walk through a practical example using a sample dataset:

---

In [None]:
# Sample Data Creation and Preparation
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.datasets import make_classification
import matplotlib.pyplot as plt
import seaborn as sns

# Create a sample dataset for demonstration
print("Creating sample dataset...")
X, y = make_classification(
    n_samples=1000,
    n_features=10,
    n_informative=8,
    n_redundant=2,
    n_clusters_per_class=1,
    random_state=42
)

# Create feature names
feature_names = [f'feature_{i+1}' for i in range(X.shape[1])]

# Create DataFrame
df = pd.DataFrame(X, columns=feature_names)
df['target'] = y

print(f"Dataset Shape: {df.shape}")
print(f"Features: {list(df.columns[:-1])}")
print(f"Target Distribution:\n{df['target'].value_counts()}")

# Display first few rows
print("\nFirst 5 rows:")
print(df.head())

# Basic statistics
print("\nDataset Statistics:")
print(df.describe())

# Check for missing values
print(f"\nMissing Values: {df.isnull().sum().sum()}")

# Visualize target distribution
plt.figure(figsize=(12, 4))

plt.subplot(1, 2, 1)
df['target'].value_counts().plot(kind='bar')
plt.title('Target Distribution')
plt.xlabel('Class')
plt.ylabel('Count')

plt.subplot(1, 2, 2)
# Correlation heatmap of first 5 features
correlation_matrix = df[feature_names[:5]].corr()
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', center=0)
plt.title('Feature Correlation Matrix (First 5 Features)')

plt.tight_layout()
plt.show()

print("✅ Sample dataset created and explored successfully!")
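
The cell above covers ingestion and exploration, and it imports `train_test_split` and `StandardScaler` without using them. A minimal sketch of the splitting step from the workflow list might look like the following. The 60/20/20 train/validation/test ratios and the helper name `split_and_scale` are illustrative assumptions, not part of the original notebook. The scaler is fitted on the training rows only so that validation and test statistics do not leak into training.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def split_and_scale(df, target="target", seed=42):
    """Return scaled train/validation/test DataFrames (60/20/20, stratified)."""
    # Carve off 20% for the test set, then 25% of the remainder
    # (0.25 * 0.8 = 0.2 of the total) for validation.
    train_val, test = train_test_split(
        df, test_size=0.2, stratify=df[target], random_state=seed
    )
    train, val = train_test_split(
        train_val, test_size=0.25, stratify=train_val[target], random_state=seed
    )

    features = [c for c in df.columns if c != target]
    scaler = StandardScaler().fit(train[features])  # fit on training rows only

    scaled = {}
    for name, part in {"train": train, "validation": val, "test": test}.items():
        part = part.copy()
        part[features] = scaler.transform(part[features])
        scaled[name] = part
    return scaled


# Rebuild the same sample dataset so this block runs on its own
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=8, n_redundant=2,
    n_clusters_per_class=1, random_state=42,
)
df = pd.DataFrame(X, columns=[f"feature_{i+1}" for i in range(X.shape[1])])
df["target"] = y

splits = split_and_scale(df)
for name, part in splits.items():
    print(f"{name}: {part.shape}")
```

With 1,000 samples this yields 600 training rows and 200 rows each for validation and test, and each split keeps the target column alongside the scaled features.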