# Machine Learning Classification - Beginner Starter

**Welcome to Machine Learning!** This notebook introduces you to training a computer to classify data automatically.

## What you're starting with:
- A real dataset (iris flowers) already loaded
- Code to explore and understand the data
- All libraries pre-installed in Colab!

## Your tasks:
1. Split the data into training and testing sets
2. Train a simple classifier
3. Make predictions on test data
4. Evaluate the model's accuracy
5. Visualize the results

## How to use this notebook:
1. **Run each cell** to see what happens
2. **Read the output** and understand what you're seeing
3. **Copy code** you want help with to Claude.ai
4. **Ask questions** about ML concepts you don't understand
5. **Experiment** with different approaches!

---

## Import Libraries

First, we import the tools we need. Colab has these pre-installed!

In [None]:
# Import machine learning tools
from sklearn.datasets import load_iris
import pandas as pd
import numpy as np

print("✓ Libraries imported successfully!")
print("Ready to start machine learning.")

## Load and Explore the Dataset

The Iris dataset contains measurements of iris flowers from three species. Given measurements of a flower, we'll train a model to predict its species.

In [None]:
def load_dataset():
    """
    Loads the Iris dataset.
    
    Returns:
        tuple: (features, labels, feature_names, target_names)
    """
    iris = load_iris()
    
    # Features are the measurements
    features = iris.data
    
    # Labels are the species (0, 1, or 2)
    labels = iris.target
    
    # Names of features and targets
    feature_names = iris.feature_names
    target_names = iris.target_names
    
    return features, labels, feature_names, target_names


def display_dataset_info(features, labels, feature_names, target_names):
    """
    Displays information about the dataset.
    """
    print("📊 Dataset Information:")
    print(f"Number of samples: {len(features)}")
    print(f"Number of features: {len(feature_names)}")
    print(f"\nFeature names: {feature_names}")
    print(f"\nTarget classes: {target_names}")
    print(f"Class distribution: {pd.Series(labels).value_counts().to_dict()}")
    
    print("\n📋 First 5 samples:")
    df = pd.DataFrame(features, columns=feature_names)
    df['species'] = [target_names[label] for label in labels]
    print(df.head())

In [None]:
# Run this cell to load and explore the data!

print("=" * 60)
print("  MACHINE LEARNING: Classification Project")
print("=" * 60)
print("\nLoading dataset...\n")

# Load the dataset
features, labels, feature_names, target_names = load_dataset()

# Display information about it
display_dataset_info(features, labels, feature_names, target_names)

print("\n" + "=" * 60)
print("✓ Data loaded successfully!")
print("Next step: Split into training and testing sets.")
print("=" * 60)

---

## Suggested Next Steps

### Step 1: Split the Data
Ask Claude.ai:
> "Import train_test_split from sklearn.model_selection and split the features and labels into training and testing sets with 80% for training and 20% for testing. Use random_state=42 for reproducibility."
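
If you want something to compare against, here is a minimal sketch of the split described in the prompt above. It reloads the Iris data so it runs on its own; the names `X_train`, `X_test`, `y_train`, and `y_test` are illustrative choices, not part of the original notebook.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Reload the data so this sketch is self-contained.
# In the notebook, `features` and `labels` already exist after the loading cell.
iris = load_iris()
features, labels = iris.data, iris.target

# Hold out 20% of the samples for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=42
)

print(f"Training samples: {len(X_train)}")
print(f"Testing samples: {len(X_test)}")
```

With 150 samples, an 80/20 split leaves 120 for training and 30 for testing. Passing `stratify=labels` is an optional extra that keeps the three species in the same proportions in both sets.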

### Step 2: Train a Model
Ask Claude.ai:
> "Import DecisionTreeClassifier from sklearn.tree and create a classifier. Train it on the training data using the fit method."
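
One possible version of the training step from the prompt above is sketched below. The variable name `model` and the `random_state=42` on the classifier are assumptions added so that repeated runs build the same tree; the prompt itself does not ask for them.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Recreate the 80/20 split so this sketch runs on its own.
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Create the classifier, then learn decision rules from the training data only.
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

print(f"Tree depth: {model.get_depth()}")
print(f"Classes learned: {model.classes_}")
```

The test set is deliberately left untouched here; it is only used later to check how well the model handles flowers it has never seen.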

### Step 3: Make Predictions
Ask Claude.ai:
> "Use the trained model to make predictions on the test data. Display the predicted labels alongside the actual labels for the first 10 samples."
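
The prediction step from the prompt above could look roughly like this. It is a sketch that repeats the split and training so it is runnable by itself; `predictions` and `comparison` are hypothetical names chosen for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Repeat the earlier steps so this sketch is self-contained.
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# Predict a species label (0, 1, or 2) for every test sample.
predictions = model.predict(X_test)

# Show the first 10 predictions next to the true labels, using species names.
comparison = pd.DataFrame({
    "predicted": iris.target_names[predictions[:10]],
    "actual": iris.target_names[y_test[:10]],
})
print(comparison)
```

Rows where the two columns differ are the flowers the model got wrong.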

### Step 4: Evaluate Accuracy
Ask Claude.ai:
> "Import accuracy_score from sklearn.metrics and calculate how accurate the model's predictions are. Display the accuracy as a percentage."
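
As a rough sketch of the evaluation asked for in the prompt above, `accuracy_score` compares the true labels with the predicted ones and returns the fraction that match. The setup lines repeat the earlier steps, and the variable names are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Repeat the earlier steps so this sketch is self-contained.
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# accuracy_score returns a value between 0.0 and 1.0 (correct / total).
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy * 100:.1f}%")
```

Because the test set has only 30 flowers, each mistake changes the accuracy by about 3.3 percentage points, so small differences between runs are not very meaningful.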

### Step 5: Create a Confusion Matrix
Ask Claude.ai:
> "Create a confusion matrix using confusion_matrix from sklearn.metrics to see which classes the model confuses most often. Display it in a readable format or as a heatmap using seaborn."
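
A minimal sketch of the confusion matrix from the prompt above follows. It takes the "readable format" route and labels the matrix with pandas instead of drawing a seaborn heatmap; `cm` and `cm_df` are hypothetical names, and the setup lines repeat the earlier steps.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Repeat the earlier steps so this sketch is self-contained.
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# Rows are the true species, columns are the predicted species.
# Passing labels explicitly guarantees a 3x3 matrix in class order 0, 1, 2.
cm = confusion_matrix(y_test, predictions, labels=[0, 1, 2])

# Wrap the raw counts in a DataFrame so rows and columns carry species names.
cm_df = pd.DataFrame(
    cm,
    index=[f"actual {name}" for name in iris.target_names],
    columns=[f"predicted {name}" for name in iris.target_names],
)
print(cm_df)
```

Counts on the diagonal are correct predictions; any number off the diagonal shows one species being mistaken for another.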

---

## Your Code Workspace

Use the cells below to build your ML pipeline!

In [None]:
# Step 1: Split the data
# Add your code here!


In [None]:
# Step 2: Train your model
# Add your code here!


In [None]:
# Steps 3 & 4: Make predictions and evaluate
# Add your code here!


In [None]:
# Step 5: Visualizations
# Add your code here!


---

## Tips for Success

- **Run cells in order** - Each cell builds on previous ones
- **Read error messages carefully** - They tell you what went wrong
- **Test after each step** - Make sure each piece works before moving on
- **Experiment!** - Try different models, different test split sizes, etc.
- **Ask Claude.ai** - Copy code + error messages when you get stuck

Remember: Machine learning is experimental. Try different things and see what works!