# Social Media Engagement Prediction - Azure ML Studio

This notebook demonstrates how to:
1. Download your files from Azure Blob Storage
2. Load and explore your cleaned dataset
3. Train ML models for engagement prediction
4. Make predictions on new data
5. Upload models and results back to Azure Storage
6. Visualize the data and results

## Step 1: Setup and Download Files from Blob Storage

In [None]:
# Install required packages (matplotlib and seaborn are needed for Step 6)
%pip install azure-storage-blob azure-identity scikit-learn pandas numpy joblib matplotlib seaborn --quiet

In [None]:
import os
import pandas as pd
import numpy as np
from azure.storage.blob import BlobServiceClient
from azure.identity import DefaultAzureCredential

# Configuration
STORAGE_ACCOUNT = "stsocialmediajkvqol"
STORAGE_URL = f"https://{STORAGE_ACCOUNT}.blob.core.windows.net"

print("✅ Imports successful!")

In [None]:
# Download Python scripts from blob storage
print("📥 Downloading files from Azure Blob Storage...\n")

# Use Azure CLI to download files (easier authentication)
!az storage blob download-batch \
    --account-name stsocialmediajkvqol \
    --source notebooks \
    --destination . \
    --auth-mode login \
    --pattern "*.py" \
    --output table

print("\n✅ Files downloaded!")

In [None]:
# Download the cleaned dataset
print("📥 Downloading cleaned dataset...\n")

!az storage blob download \
    --account-name stsocialmediajkvqol \
    --container-name data \
    --name social_media_cleaned.csv \
    --file social_media_cleaned.csv \
    --auth-mode login

print("✅ Dataset downloaded!")
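
The imports cell brings in `BlobServiceClient` and `DefaultAzureCredential`, but the downloads above go through the Azure CLI. As an alternative, here is a minimal sketch of the same dataset download using the Python SDK. It assumes the signed-in identity has read access to the `data` container; the helper names `get_container_client` and `download_blob_to_file` are hypothetical and not part of the original notebook.

```python
from pathlib import Path


def get_container_client(account_url, container_name):
    """Build a container client with DefaultAzureCredential (hypothetical helper)."""
    # Imported lazily so download_blob_to_file can be used with any compatible client.
    from azure.identity import DefaultAzureCredential
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient(account_url, credential=DefaultAzureCredential())
    return service.get_container_client(container_name)


def download_blob_to_file(container_client, blob_name, dest_path):
    """Download one blob to a local file and return the number of bytes written."""
    dest = Path(dest_path)
    dest.parent.mkdir(parents=True, exist_ok=True)
    # download_blob returns a stream object; readall() loads the whole blob into memory.
    data = container_client.download_blob(blob_name).readall()
    dest.write_bytes(data)
    return len(data)


# Example (needs network access and Azure credentials):
# client = get_container_client(STORAGE_URL, "data")
# download_blob_to_file(client, "social_media_cleaned.csv", "social_media_cleaned.csv")
```

Reading the whole blob into memory is fine for a small CSV; for large files a streaming approach would likely be a better fit.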

In [None]:
# Download trained models
print("📥 Downloading trained models...\n")

# download-batch expects the destination folder to exist already
os.makedirs("models", exist_ok=True)

!az storage blob download-batch \
    --account-name stsocialmediajkvqol \
    --source models \
    --destination ./models \
    --auth-mode login \
    --pattern "*.pkl" \
    --output table

print("\n✅ Models downloaded!")

## Step 2: Explore the Dataset

In [None]:
# Load the cleaned dataset
df = pd.read_csv('social_media_cleaned.csv')

print("📊 Dataset Overview:")
print(f"Shape: {df.shape}")
print(f"\nColumns: {list(df.columns)}")
print("\nFirst few rows:")
df.head()

In [None]:
# Statistical summary
print("📈 Statistical Summary:")
df.describe()

In [None]:
# Check for missing values
print("🔍 Missing Values:")
print(df.isnull().sum())
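
Raw counts are harder to judge on a large dataset, so it can help to see them as percentages too. The sketch below builds a small report limited to columns that actually have gaps; `missing_value_report` is a hypothetical helper name, not something defined elsewhere in this notebook.

```python
import pandas as pd


def missing_value_report(frame):
    """Return missing counts and percentages per column, most affected first."""
    counts = frame.isnull().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(2),
    })
    # Keep only the columns that contain at least one missing value.
    return report[report["missing"] > 0].sort_values("missing", ascending=False)
```

Calling `missing_value_report(df)` on a fully cleaned dataset should return an empty table.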

## Step 3: Train ML Models (Option 1: Run existing script)

In [None]:
# Run the training script
print("🚀 Starting model training...\n")
%run TRAIN_FINAL_OPTIMIZED.py

## Step 4: Load Trained Models and Make Predictions

In [None]:
import joblib

# Load the best model
model_path = 'models/best_engagement_model.pkl'

if os.path.exists(model_path):
    model = joblib.load(model_path)
    print(f"✅ Model loaded from {model_path}")
    print(f"Model type: {type(model).__name__}")
else:
    print("❌ Model not found. Please run training first.")

In [None]:
# Make predictions on test data
print("🔮 Making predictions...\n")
%run test_model_on_real_data.py
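
The prediction script itself is not shown here. As a rough illustration of scoring new rows with the `model` loaded above, the sketch below assumes a scikit-learn estimator that was fitted on a DataFrame, so that `feature_names_in_` is available. `predict_engagement` is a hypothetical helper name.

```python
import pandas as pd


def predict_engagement(model, new_rows):
    """Score new rows using the feature columns the model was fitted on."""
    # scikit-learn sets feature_names_in_ when an estimator is fitted on a DataFrame.
    expected = list(model.feature_names_in_)
    missing = [col for col in expected if col not in new_rows.columns]
    if missing:
        raise ValueError(f"New data is missing feature columns: {missing}")
    # Select and order the columns to match training; extra columns are ignored.
    predictions = model.predict(new_rows[expected])
    return pd.Series(predictions, index=new_rows.index, name="predicted_engagement")
```

Checking the columns up front tends to give a clearer error than letting the estimator fail on a shape or name mismatch.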

## Step 5: Upload Results Back to Azure Storage

In [None]:
# Upload newly trained models back to blob storage
print("📤 Uploading models to Azure Storage...\n")

!az storage blob upload-batch \
    --account-name stsocialmediajkvqol \
    --destination models \
    --source ./models \
    --auth-mode login \
    --pattern "*.pkl" \
    --overwrite \
    --output table

print("\n✅ Models uploaded!")

In [None]:
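# Upload experiment results
# Note: the "*.csv" pattern matches every CSV in the working directory,
# including social_media_cleaned.csv downloaded in Step 1.
print("📤 Uploading experiment results...\n")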

!az storage blob upload-batch \
    --account-name stsocialmediajkvqol \
    --destination experiments \
    --source . \
    --auth-mode login \
    --pattern "*.csv" \
    --overwrite \
    --output table

print("\n✅ Results uploaded!")
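
Because the `*.csv` pattern above matches every CSV in the working directory, the input dataset from Step 1 would be uploaded along with the results. One way to check what would be sent is to list the candidates first. This is a sketch only; `find_result_csvs` and its default exclusion list are hypothetical.

```python
from pathlib import Path


def find_result_csvs(directory=".", exclude=("social_media_cleaned.csv",)):
    """List CSV files in a directory, skipping known input files."""
    excluded = set(exclude)
    return sorted(
        path for path in Path(directory).glob("*.csv")
        if path.is_file() and path.name not in excluded
    )


# Print the files that would count as results in the current directory.
for path in find_result_csvs():
    print(path.name)
```

The listed files could then be uploaded one at a time, or moved into a dedicated results folder that is used as the upload source.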

## Step 6: Monitor and Visualize Results

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Set style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

print("✅ Visualization libraries loaded!")

In [None]:
# Visualize engagement distribution
if 'Engagement' in df.columns:
    plt.figure(figsize=(10, 6))
    plt.hist(df['Engagement'], bins=50, edgecolor='black', alpha=0.7)
    plt.xlabel('Engagement Score')
    plt.ylabel('Frequency')
    plt.title('Distribution of Engagement Scores')
    plt.grid(True, alpha=0.3)
    plt.show()
else:
    print("⚠️ 'Engagement' column not found in dataset")

In [None]:
# Feature correlation heatmap
numeric_cols = df.select_dtypes(include=[np.number]).columns
if len(numeric_cols) > 1:
    plt.figure(figsize=(12, 8))
    correlation = df[numeric_cols].corr()
    sns.heatmap(correlation, annot=True, fmt='.2f', cmap='coolwarm', center=0)
    plt.title('Feature Correlation Heatmap')
    plt.tight_layout()
    plt.show()
else:
    print("⚠️ Not enough numeric columns for correlation analysis")
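
A full heatmap can be hard to read when there are many features. If the main interest is which features track the target, a ranked list may be easier to scan. The sketch below assumes the target column is `Engagement`, as in the histogram cell; `rank_by_target_correlation` is a hypothetical helper name.

```python
import pandas as pd


def rank_by_target_correlation(frame, target="Engagement"):
    """Return numeric features sorted by absolute Pearson correlation with the target."""
    numeric = frame.select_dtypes(include="number")
    correlations = numeric.corr()[target].drop(target)
    # Sort by magnitude so strong negative correlations rank alongside positive ones.
    return correlations.sort_values(key=lambda s: s.abs(), ascending=False)
```

For example, `rank_by_target_correlation(df).head(10)` would show the ten features most strongly correlated with the target. Correlation only captures linear relationships, so a low value does not rule out a useful feature.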

## Summary

This notebook demonstrated:
- ✅ Downloading files from Azure Blob Storage
- ✅ Loading and exploring the dataset
- ✅ Training ML models
- ✅ Making predictions
- ✅ Uploading results back to Azure Storage
- ✅ Visualizing data and results

### Next Steps:
1. Experiment with different model parameters
2. Try feature engineering
3. Deploy the model as a web service
4. Set up automated retraining pipelines