# 🌸 Your First ML Model in Google Colab

**Welcome to machine learning!** This notebook will guide you through building your first ML model.

**What you'll build**: A flower species classifier that can identify iris flowers from their measurements.

**How to use this notebook**:
1. **Save a copy**: File → Save a copy in Drive (so you can edit it)
2. **Run each cell**: Click the ▶️ button or press Shift + Enter
3. **Follow along**: Read the explanations and watch the magic happen!

**📝 Note**: This notebook is hosted on GitHub and opens directly in Colab!

**Time needed**: 15-20 minutes

---

## Step 1: Import Libraries 📚

First, let's import all the tools we need for machine learning:

In [None]:
# Import the tools we need for machine learning
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Set up plotting
plt.style.use('default')
sns.set_palette("husl")

print("✅ All libraries imported successfully!")
print("🚀 Ready to build your first ML model!")

## Step 2: Load and Explore the Data 🔍

Let's load the famous iris flower dataset and see what we're working with:

In [None]:
# Load the famous iris flower dataset
print("🌸 Loading the Iris dataset...")
data = load_iris()

# Convert to a pandas DataFrame for easier handling
df = pd.DataFrame(data.data, columns=data.feature_names)
df['species'] = data.target_names[data.target]

# Display basic information
print(f"📊 Dataset shape: {df.shape[0]} flowers, {df.shape[1]-1} measurements")
print(f"🏷️  Species: {', '.join(data.target_names)}")
print(f"📏 Measurements: {', '.join(data.feature_names)}")

# Show the first few flowers
print("\n🔍 First 5 flowers in our dataset:")
display(df.head())

# Check for any missing data
print(f"\n❓ Missing values: {df.isnull().sum().sum()}")
print("✅ Dataset is clean and ready!")
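
If you're curious what `load_iris()` actually returns, the short sketch below pokes at the object directly. It uses its own variable name (`iris`, an assumption chosen so it doesn't overwrite `data` from the cell above) and only prints shapes and names.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# load_iris() returns a dictionary-like container with arrays and metadata
print("Container type:", type(iris).__name__)
print("Measurements array shape:", iris.data.shape)   # (rows, columns)
print("Labels array shape:", iris.target.shape)
print("Species names:", list(iris.target_names))
```

Each row of `iris.data` is one flower, and the matching entry in `iris.target` is that flower's species as a number.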

## Step 3: Visualize the Data 📈

Let's create beautiful visualizations to understand our data better:

In [None]:
# Create beautiful visualizations to understand our data
print("📈 Creating data visualizations...")

# Set up a 2x2 grid of plots
fig, axes = plt.subplots(2, 2, figsize=(15, 12))
# (No emoji in plot titles: matplotlib's default font usually can't render them)
fig.suptitle('Iris Dataset Analysis', fontsize=16, fontweight='bold')

# Plot 1: Sepal measurements by species
axes[0, 0].scatter(df['sepal length (cm)'], df['sepal width (cm)'], 
                  c=data.target, cmap='viridis', alpha=0.7, s=50)
axes[0, 0].set_xlabel('Sepal Length (cm)')
axes[0, 0].set_ylabel('Sepal Width (cm)')
axes[0, 0].set_title('Sepal Measurements by Species')
axes[0, 0].grid(True, alpha=0.3)

# Plot 2: Petal measurements by species  
axes[0, 1].scatter(df['petal length (cm)'], df['petal width (cm)'], 
                  c=data.target, cmap='viridis', alpha=0.7, s=50)
axes[0, 1].set_xlabel('Petal Length (cm)')
axes[0, 1].set_ylabel('Petal Width (cm)')
axes[0, 1].set_title('Petal Measurements by Species')
axes[0, 1].grid(True, alpha=0.3)

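# Plot 3: Distribution of measurements
# (DataFrame.hist would draw one subplot per column and clear the figure,
# so overlay all four histograms on this single axis instead)
df[data.feature_names].plot.hist(bins=20, ax=axes[1, 0], alpha=0.7)
axes[1, 0].set_xlabel('Measurement (cm)')
axes[1, 0].set_title('Distribution of All Measurements')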

# Plot 4: Species count
species_counts = df['species'].value_counts()
bars = axes[1, 1].bar(species_counts.index, species_counts.values, 
                     color=['#FF6B6B', '#4ECDC4', '#45B7D1'])
axes[1, 1].set_title('Number of Flowers per Species')
axes[1, 1].set_ylabel('Count')
# Add value labels on bars
for bar in bars:
    height = bar.get_height()
    axes[1, 1].text(bar.get_x() + bar.get_width()/2., height + 0.5,
                    f'{int(height)}', ha='center', va='bottom')

plt.tight_layout()
plt.show()

print("📊 Key observations:")
print("• Each species has distinct petal characteristics")
print("• Setosa has the smallest petals")
print("• Virginica has the largest petals")
print("• This should make classification easier!")
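
The observations printed above are easy to double-check with numbers. Here is a minimal sketch, assuming a fresh copy of the dataset stored under the hypothetical names `iris` and `features`, that averages each measurement per species and sorts the species by petal length.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
features = pd.DataFrame(iris.data, columns=iris.feature_names)
features["species"] = iris.target_names[iris.target]

# Average of each measurement, per species
species_means = features.groupby("species").mean().round(2)
print(species_means)

# Species ordered from smallest to largest average petal length
petal_order = species_means["petal length (cm)"].sort_values().index.tolist()
print("Smallest to largest petals:", petal_order)
```

If the ordering matches what you saw in the scatter plots, the visual impression and the numbers agree.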

## Step 4: Prepare Data for Machine Learning 🔧

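Now let's split our data into a training set (used to fit the model) and a testing set (held back to check how well the model handles flowers it has never seen):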

In [None]:
# Prepare our data for training a machine learning model
print("🔧 Preparing data for machine learning...")

# Separate features (measurements) from target (species)
X = data.data  # Features: sepal length, sepal width, petal length, petal width
y = data.target  # Target: species (0=setosa, 1=versicolor, 2=virginica)

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, 
    test_size=0.3,      # Use 30% for testing
    random_state=42,    # For reproducible results
    stratify=y          # Keep the same species proportions in both sets
)

print(f"📚 Training set: {X_train.shape[0]} flowers")
print(f"🧪 Testing set: {X_test.shape[0]} flowers")
print(f"📊 Features per flower: {X_train.shape[1]}")

# Show the split by species
train_species = pd.Series(y_train).value_counts().sort_index()
test_species = pd.Series(y_test).value_counts().sort_index()

print("\n🌸 Training set by species:")
for i, species in enumerate(data.target_names):
    print(f"  {species}: {train_species[i]} flowers")

print("\n🧪 Testing set by species:")
for i, species in enumerate(data.target_names):
    print(f"  {species}: {test_species[i]} flowers")
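
To see what `stratify` changes, the sketch below runs the same split twice, once with stratification and once without, and counts the species in each test set. The variable names (`y_test_strat`, `y_test_plain`) are made up for this illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()

# Same settings as the split above, with stratification...
_, _, _, y_test_strat = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42, stratify=iris.target
)
# ...and without it
_, _, _, y_test_plain = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

# Count how many flowers of each species landed in each test set
strat_counts = np.bincount(y_test_strat, minlength=3)
plain_counts = np.bincount(y_test_plain, minlength=3)
print("Test-set counts with stratify:   ", strat_counts)
print("Test-set counts without stratify:", plain_counts)
```

With stratification the test set mirrors the balance of the full dataset; without it the counts depend on the luck of the shuffle.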

## Step 5: Train the Machine Learning Model 🤖

Time to create and train our AI model!

In [None]:
# Create and train our machine learning model
print("🤖 Training the machine learning model...")

# Create a Random Forest classifier
model = RandomForestClassifier(
    n_estimators=100,    # Use 100 decision trees
    random_state=42,     # For reproducible results
    max_depth=3          # Limit tree depth to reduce overfitting
)

# Train the model on our training data
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)

print("🎯 Model trained successfully!")
print(f"📈 Accuracy on test set: {accuracy:.1%}")

# Show which features are most important
feature_importance = pd.DataFrame({
    'feature': data.feature_names,
    'importance': model.feature_importances_
}).sort_values('importance', ascending=False)

print("\n🔍 Most important features for classification:")
for _, row in feature_importance.iterrows():
    print(f"  {row['feature']}: {row['importance']:.3f}")
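
A quick way to read the importance scores: in scikit-learn they are relative shares that add up to 1, so each value says roughly how much of the forest's decision-making leaned on that feature. The sketch below trains a separate forest (named `forest` here, an assumption to avoid touching `model`) on the full dataset just to show this.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
forest = RandomForestClassifier(n_estimators=100, random_state=42, max_depth=3)
forest.fit(iris.data, iris.target)

# Importance scores are normalized, so together they sum to 1
importances = forest.feature_importances_
print("Sum of importances:", importances.sum())

# Name of the feature the forest relied on most
top_feature = iris.feature_names[int(np.argmax(importances))]
print("Most important feature:", top_feature)
```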

## Step 6: Evaluate Model Performance 📊

Let's see how well our model performs in detail:

In [None]:
# Evaluate how well our model performs
print("📊 Evaluating model performance...")

# Detailed classification report
print("\n📋 Detailed Classification Report:")
print(classification_report(y_test, y_pred, target_names=data.target_names))

# Create a confusion matrix to see where the model makes mistakes
cm = confusion_matrix(y_test, y_pred)

# Visualize the confusion matrix
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
            xticklabels=data.target_names, 
            yticklabels=data.target_names,
            cbar_kws={'label': 'Number of Flowers'})
# (No emoji in the title: matplotlib's default font usually can't render them)
plt.title('Confusion Matrix: Actual vs Predicted Species', fontsize=14, fontweight='bold')
plt.xlabel('Predicted Species', fontsize=12)
plt.ylabel('Actual Species', fontsize=12)
plt.show()

# Calculate per-species accuracy
print("\n🌸 Accuracy by species:")
for i, species in enumerate(data.target_names):
    species_mask = y_test == i
    if species_mask.sum() > 0:
        species_accuracy = (y_pred[species_mask] == i).mean()
        print(f"  {species}: {species_accuracy:.1%}")

# Show any misclassifications
misclassified = X_test[y_test != y_pred]
if len(misclassified) > 0:
    print(f"\n❌ Misclassified flowers: {len(misclassified)}")
    print("These are the flowers our model got wrong - let's learn from them!")
else:
    print("\n🎉 Perfect classification! No mistakes on the test set!")
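
The per-species accuracy above can also be read straight off a confusion matrix: each row holds the actual flowers of one species, and the diagonal holds the ones predicted correctly. The sketch below uses small made-up label arrays (hypothetical values, not this notebook's results) so the arithmetic is easy to follow by hand.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels for illustration only (not the notebook's results)
y_true_demo = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
y_pred_demo = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 1])

# Rows = actual species, columns = predicted species
cm_demo = confusion_matrix(y_true_demo, y_pred_demo)
print(cm_demo)

# Diagonal = correct predictions; row sums = actual flowers per species
per_species_accuracy = cm_demo.diagonal() / cm_demo.sum(axis=1)
print("Accuracy by species:", per_species_accuracy.round(3))
```

This per-row figure is the same quantity the classification report calls "recall".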

## Step 7: Make Predictions on New Flowers 🔮

Now for the exciting part - let's use our model to predict new flower species!

In [None]:
# Use our trained model to predict new flower species
print("🔮 Making predictions on new flowers...")

# Create some example new flowers to classify
new_flowers = np.array([
    [5.1, 3.5, 1.4, 0.2],  # Small petals - likely Setosa
    [6.2, 2.8, 4.8, 1.8],  # Large petals - likely Virginica  
    [5.7, 2.8, 4.1, 1.3],  # Medium petals - likely Versicolor
    [4.9, 3.1, 1.5, 0.1],  # Very small petals - likely Setosa
    [7.2, 3.0, 5.8, 1.6]   # Very large petals - likely Virginica
])

# Make predictions
predictions = model.predict(new_flowers)
probabilities = model.predict_proba(new_flowers)

print("\n🌸 Prediction Results:")
print("=" * 60)

for i, (flower, pred, prob) in enumerate(zip(new_flowers, predictions, probabilities)):
    species = data.target_names[pred]
    confidence = prob.max()
    
    print(f"\n🌺 Flower #{i+1}:")
    print(f"   Measurements: {flower}")
    print(f"   Predicted species: {species}")
    print(f"   Confidence: {confidence:.1%}")
    
    # Show probability for each species
    print("   Probabilities:")
    for j, (species_name, probability) in enumerate(zip(data.target_names, prob)):
        emoji = "🎯" if j == pred else "  "
        print(f"     {emoji} {species_name}: {probability:.1%}")

print("\n✨ Amazing! Your model can now identify iris species from measurements!")
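
The "confidence" shown above comes from `predict_proba`, and the predicted species is simply the one with the highest probability. A minimal sketch of that relationship, using a separately trained forest (`demo_model`, a name chosen for this example) and a single flower:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
demo_model = RandomForestClassifier(n_estimators=100, random_state=42, max_depth=3)
demo_model.fit(iris.data, iris.target)

# One flower: sepal length, sepal width, petal length, petal width (cm)
sample = np.array([[5.1, 3.5, 1.4, 0.2]])
proba = demo_model.predict_proba(sample)[0]

print("Probabilities:", proba.round(3))
print("Sum of probabilities:", proba.sum())

# The predicted class is the one with the highest probability
predicted_from_proba = demo_model.classes_[np.argmax(proba)]
print("Predicted species:", iris.target_names[predicted_from_proba])
```

Note that the model expects a 2D array (one row per flower), which is why `sample` has double brackets.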

## 🚀 Google Colab Special Features

Since you're using Colab, let's explore some unique features:

In [None]:
# Check whether a GPU is attached to this runtime
try:
    import torch
    gpu_available = torch.cuda.is_available()
    print(f"🔥 GPU available: {gpu_available}")
    if gpu_available:
        print(f"🎮 GPU name: {torch.cuda.get_device_name(0)}")
        print("💡 Note: scikit-learn models like this Random Forest run on the CPU, so the GPU isn't used here.")
    else:
        print("💡 Tip: Enable GPU in Runtime → Change runtime type → Hardware accelerator → GPU")
except ImportError:
    print("📦 PyTorch not installed, but that's okay for this example!")

# Show system information
import platform
print(f"\n💻 System: {platform.system()}")
print(f"🐍 Python version: {platform.python_version()}")
print("📍 You're running this in the cloud! ☁️")

In [None]:
# Save your model to Google Drive (optional)
print("💾 Want to save your model? Uncomment the code below:")
print("")

# Uncomment these lines to save to Google Drive:
# from google.colab import drive
# import joblib
# 
# # Mount Google Drive
# drive.mount('/content/drive')
# 
# # Save the model
# joblib.dump(model, '/content/drive/MyDrive/iris_classifier.pkl')
# print("✅ Model saved to your Google Drive!")

print("🔗 Sharing tip: Click 'Share' button (top right) to send this notebook to friends!")
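
Saving is only half the story: a saved model is useful once you load it back. The sketch below shows a save-and-load round trip with `joblib`, writing to a temporary local folder rather than Google Drive so it runs anywhere. The names `saved_model`, `restored_model`, and `model_path` are assumptions for this example; swap in a Drive path after mounting if you prefer.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
saved_model = RandomForestClassifier(n_estimators=100, random_state=42, max_depth=3)
saved_model.fit(iris.data, iris.target)

# Save to a temporary local file
model_path = os.path.join(tempfile.mkdtemp(), "iris_classifier.pkl")
joblib.dump(saved_model, model_path)

# Load it back and confirm it predicts the same species as the original
restored_model = joblib.load(model_path)
same_predictions = np.array_equal(
    saved_model.predict(iris.data), restored_model.predict(iris.data)
)
print("Restored model matches original:", same_predictions)
```

Only load model files you created yourself or trust, since loading a pickle file can run arbitrary code.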

## 🎉 Congratulations!

You've successfully built your first machine learning model in Google Colab!

### What you accomplished:
✅ Loaded and explored a dataset of 150 flowers  
✅ Visualized data patterns across 4 features  
✅ Trained a Random Forest model with 100 trees  
✅ Measured the model's accuracy on unseen test data (see the Step 5 output)  
✅ Made predictions on new flower measurements  

### Key ML concepts you learned:
🧠 Data loading and exploration  
🧠 Data visualization and pattern recognition  
🧠 Train/test split for model evaluation  
🧠 Model training and prediction  
🧠 Performance evaluation and interpretation  

### Next steps:
🚀 Try different algorithms (SVM, Neural Networks)  
🚀 Work with larger, more complex datasets  
🚀 Learn feature engineering and data preprocessing  
🚀 Build models for regression and clustering  
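
As a taste of the first suggestion, here is a minimal sketch that swaps the Random Forest for a support vector machine. It assumes scikit-learn's `SVC` with default settings and reuses the same split parameters as Step 4; the variable names are chosen for this example. The point is that `fit` and `predict` work the same way across scikit-learn models.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = load_iris()
X_tr, X_te, y_tr, y_te = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42, stratify=iris.target
)

# Same workflow as before: create, fit, predict, score
svm_model = SVC()
svm_model.fit(X_tr, y_tr)
svm_accuracy = accuracy_score(y_te, svm_model.predict(X_te))
print(f"SVM accuracy on test set: {svm_accuracy:.1%}")
```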

### Resources:
📚 [Complete ML Guide](../04-first-ml-example.md)  
📚 [Next Steps](../05-next-steps.md)  
📚 [Jupyter Version](jupyter-sample.ipynb)  
📚 [Python Script Version](python-sample.py)  

---

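**Remember**: Almost all of this code works in Jupyter and Python IDEs too! Only the Colab-specific cells (Google Drive, runtime settings) and `display()` (swap in `print()` for a plain Python script) need adjusting. You now have transferable skills across ML environments. 🌟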

**Happy learning!** 🎓✨