# 🌱 Introduction to a Simple Machine Learning Pipeline

In this notebook, we'll go through a simple example of how a **machine learning pipeline** works — from loading data, to training a model, to making predictions and evaluating performance.

We'll use the **Iris dataset**, one of the most popular beginner datasets in machine learning. It contains 150 flowers, 50 from each of three species, with four measurements per flower: sepal length, sepal width, petal length, and petal width (all in centimeters). Our task is to classify each flower as *Setosa*, *Versicolor*, or *Virginica*.

## 1️⃣ Loading the Data

We’ll start by loading the dataset with `scikit-learn`, a popular machine learning library. `load_iris()` returns a dictionary-like object whose main fields are:
- `data`: numerical measurements of each flower (features)
- `target`: the class labels (which species)
- `feature_names`: names of the features
- `target_names`: names of the flower species

In [None]:
from sklearn.datasets import load_iris

iris = load_iris()

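print(iris.data[:5])       # first 5 rows of measurements (one row per flower)
print(iris.feature_names)  # what each column measures
print(iris.target[:10])    # first 10 labels, encoded as integers 0, 1, 2
print(iris.target_names)   # species name for each integer label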
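
Before moving on, it can help to check the size of the dataset and how many flowers belong to each species. The snippet below is a small sketch of that check; the variable names `n_samples`, `n_features`, and `class_counts` are chosen here for illustration and are not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# Shape of the feature matrix: (number of flowers, number of features)
n_samples, n_features = iris.data.shape
print("Samples:", n_samples, "Features:", n_features)

# Count how many flowers carry each integer label (0, 1, 2)
class_counts = np.bincount(iris.target)
for name, count in zip(iris.target_names, class_counts):
    print(name, count)
```

A balanced dataset like this one (equal counts per class) makes plain accuracy a reasonable first metric later on.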

## 2️⃣ Splitting the Data — Training vs Testing

Before training, we split our dataset into **training** and **testing** sets:
- **Training data** is used to teach the model.
- **Testing data** is used to check how well the model performs on unseen data.

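We'll use an 80/20 split: 80% of the flowers (120) for training and 20% (30) for testing. Setting `random_state=42` fixes the shuffle, so the same split comes out on every run.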

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

print('Training data shape:', X_train.shape)
print('Testing data shape:', X_test.shape)
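
A plain random split does not guarantee that each species is equally represented in the test set. As an optional variation (not what the cell above does), `train_test_split` accepts a `stratify` argument that preserves class proportions. The sketch below uses `_s` suffixed names, which are made up here so they don't overwrite the notebook's variables.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()

# stratify=iris.target keeps the species proportions the same in both splits
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target
)

# Number of test flowers per species (labels 0, 1, 2)
test_counts = np.bincount(y_test_s)
print("Training data shape:", X_train_s.shape)
print("Testing data shape:", X_test_s.shape)
print("Test flowers per species:", test_counts)
```

With 50 flowers per species and a 20% test set, stratification puts 10 flowers of each species in the test set.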

## 3️⃣ Building and Training the Model

We'll use a simple **K-Nearest Neighbors (KNN)** classifier with *k* = 3 (`n_neighbors=3` in the code).

**How it works:**
- When given a new data point, KNN looks at the *k* nearest points in the training data.
- It predicts the most common label among those neighbors.

It’s a good first model because it’s simple and intuitive: there are no weights to learn, and training mostly amounts to storing the training data.
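
To make the two steps above concrete, here is a minimal from-scratch sketch of the idea using NumPy. It is an illustration only, not how `scikit-learn` implements KNN internally; the function `knn_predict` and the toy arrays `X_toy` and `y_toy` are made up for this example. It assumes Euclidean distance, which is also the default distance in `KNeighborsClassifier`.

```python
import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    """Predict the label of x_new by majority vote among its k nearest training points."""
    # Euclidean distance from x_new to every training point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # Most common label among those neighbors (ties go to the smallest label)
    return int(np.bincount(y_train[nearest]).argmax())

# Tiny made-up dataset: two well-separated clusters labeled 0 and 1
X_toy = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1],
                  [5.0, 5.0], [5.1, 4.8], [4.9, 5.2]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_toy, y_toy, np.array([1.1, 1.0])))  # point near the first cluster
print(knn_predict(X_toy, y_toy, np.array([5.0, 5.1])))  # point near the second cluster
```

An odd *k* such as 3 avoids ties when there are only two classes; with three classes ties are still possible, and this sketch simply picks the smallest label in that case.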

In [None]:
from sklearn.neighbors import KNeighborsClassifier

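# k = 3: each prediction is a vote among the 3 nearest training flowers
model = KNeighborsClassifier(n_neighbors=3)
# For KNN, fitting mainly stores the training data for later lookups
model.fit(X_train, y_train)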

## 4️⃣ Making Predictions

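After training, the model can predict labels for the **test data**, which it has not seen before. The predictions are integer class labels (0, 1, 2) that index into `iris.target_names`. Let’s compare them with the true labels.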

In [None]:
y_pred = model.predict(X_test)

print('Predictions:', y_pred)
print('Actual:', y_test)
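
Two raw arrays of integers are hard to compare by eye. One possible way to make the comparison readable is to translate the labels into species names and list only the positions where the model was wrong. The sketch below repeats the earlier steps so it runs on its own; `predicted_names`, `actual_names`, and `mismatches` are names introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Same pipeline as above, repeated so this cell is self-contained
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
model = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Translate integer labels into species names via array indexing
predicted_names = iris.target_names[y_pred]
actual_names = iris.target_names[y_test]

# Positions in the test set where the prediction differs from the true label
mismatches = np.flatnonzero(y_pred != y_test)
print("Number of mistakes:", len(mismatches), "out of", len(y_test))
for i in mismatches:
    print(f"Flower {i}: predicted {predicted_names[i]}, actual {actual_names[i]}")
```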

## 5️⃣ Evaluating the Model

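Now we’ll check how accurate our model is. The simplest metric is **accuracy**, the fraction of predictions that are correct. `accuracy_score` returns it as a number between 0 and 1, so a value of 0.95 means 95% of the test flowers were classified correctly.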

We can also use a **classification report** for a deeper look at performance. For each species, it reports:
- **Precision:** Of all flowers predicted to be this species, the fraction that actually are.
- **Recall:** Of all flowers that actually are this species, the fraction the model identified correctly.
- **F1-score:** The harmonic mean of precision and recall, which is high only when both are high.
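
The three definitions above can be worked out by hand for a single class. The sketch below does this on a small made-up set of labels (not the Iris results); the helper `per_class_metrics` and the arrays `y_true` and `y_hat` are hypothetical names used only here. It assumes the class appears at least once in both the true and predicted labels, so no division by zero occurs.

```python
import numpy as np

# Small made-up example with three classes (0, 1, 2)
y_true = np.array([0, 0, 1, 1, 1, 2, 2, 2])
y_hat = np.array([0, 1, 1, 1, 2, 2, 2, 2])

def per_class_metrics(y_true, y_hat, cls):
    """Return (precision, recall, f1) for one class, treating it as the 'positive' class."""
    tp = np.sum((y_hat == cls) & (y_true == cls))  # predicted cls, really cls
    fp = np.sum((y_hat == cls) & (y_true != cls))  # predicted cls, really something else
    fn = np.sum((y_hat != cls) & (y_true == cls))  # really cls, predicted something else
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return float(precision), float(recall), float(f1)

for cls in (0, 1, 2):
    p, r, f = per_class_metrics(y_true, y_hat, cls)
    print(f"class {cls}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

For class 0 in this toy data, the single prediction of class 0 is correct (precision 1.0), but only one of the two true class-0 items was found (recall 0.5), which is why the two numbers can differ.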

In [None]:
from sklearn.metrics import accuracy_score, classification_report

accuracy = accuracy_score(y_test, y_pred)
print('Model Accuracy:', accuracy)

print('\nDetailed Report:')
print(classification_report(y_test, y_pred, target_names=iris.target_names))
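
As a sanity check, accuracy can also be computed by hand and compared with `accuracy_score`. The sketch below additionally prints a confusion matrix, which the original notebook does not use but which shows where any mistakes fall. It repeats the earlier steps so it runs on its own; `manual_accuracy`, `library_accuracy`, `cm`, and `diagonal_accuracy` are names introduced here.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Same pipeline as above, repeated so this cell is self-contained
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
y_pred = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train).predict(X_test)

# Accuracy by hand: fraction of positions where prediction equals the true label
manual_accuracy = float(np.mean(y_pred == y_test))
library_accuracy = accuracy_score(y_test, y_pred)
print(f"Manual: {manual_accuracy:.3f}  Library: {library_accuracy:.3f}")
print(f"As a percentage: {manual_accuracy:.1%}")

# Confusion matrix: rows are true species, columns are predicted species
cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2])
print(cm)

# Correct predictions lie on the diagonal, so this is the same accuracy again
diagonal_accuracy = cm.trace() / cm.sum()
print(f"From confusion matrix: {diagonal_accuracy:.3f}")
```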

## ✅ Summary

In this notebook, we went through the complete **machine learning workflow**:
1. Load and explore data
2. Split data into training and testing sets
3. Build and train a model (KNN)
4. Make predictions
5. Evaluate results

The same load, split, train, predict, and evaluate steps underlie most supervised ML projects, whether they use a simple model like KNN or a more advanced one like a neural network. 🚀