# Machine Learning Stages

A typical machine learning workflow consists of three main stages: **Training**, **Test (Evaluation)**, and **Inference (Production)**.

![Machine Learning Stages Overview](images/ml-stages-overview.png)



## The Three Stages

1. **Training**  
   - Use a **training set** (labeled data) to learn a function `f(x)` that explains the training data well.
   - Goal: Find the best model parameters that map inputs `x` to correct outputs `y`.

2. **Test (Evaluation)**  
   - Evaluate the trained model on a separate **test set** (data never seen during training).
   - Goal: Measure how well the model **generalizes** to new, unseen data.

3. **Inference (Production)**  
   - Deploy the final trained model to make predictions on new real-world data (no labels available).
   - This is the actual use of the model in applications.
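The separation between training and test data described above can be sketched as a simple hold-out split. This is a minimal illustration using only the Python standard library. The helper name `holdout_split`, the 25% test fraction, and the file names are assumptions made for the example, not part of any particular library.

```python
import random

def holdout_split(examples, test_fraction=0.25, seed=0):
    """Shuffle labeled examples and split them into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(examples)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    # The first n_test shuffled examples are held out; the rest are for training.
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical labeled data: (image file name, label) pairs.
labeled = [(f"img_{i}.png", "Cat" if i % 2 == 0 else "Dog") for i in range(8)]

train_set, test_set = holdout_split(labeled)
print(len(train_set), "training examples,", len(test_set), "test examples")
```

Shuffling before splitting helps avoid a test set that accidentally contains only one class, and fixing the seed keeps the split reproducible between runs.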

## Example: Cat vs Dog Image Classification

Let's walk through the stages using a simple binary classification task.

### Phase 1: Training (Learning)

![Training Phase - Cat vs Dog](images/cat-dog-training.png)

- We collect a **training set** with images (`x`) and corresponding labels (`y`): "Cat" or "Dog".
- We feed these input-output pairs to the machine learning algorithm.
- The algorithm learns a function `f(x)` that tries to correctly predict the label for each training image.

Goal: The model should fit the training data well, that is, predict the correct label for most training images.
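As a rough sketch of what "learning `f(x)`" can look like, the snippet below trains a nearest-centroid classifier in plain Python. This is only one possible learning algorithm, chosen here for brevity. The two-number feature vectors are made-up stand-ins for real images, and the function names are hypothetical.

```python
def train_nearest_centroid(features, labels):
    """Learn one centroid (mean feature vector) per class label."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def predict(centroids, x):
    """f(x): return the label whose centroid is closest to x."""
    def squared_distance(c):
        return sum((a - b) ** 2 for a, b in zip(c, x))
    return min(centroids, key=lambda y: squared_distance(centroids[y]))

# Made-up 2-D features standing in for images (not real image data).
X_train = [[0.9, 0.2], [0.8, 0.3], [0.2, 0.9], [0.3, 0.8]]
y_train = ["Cat", "Cat", "Dog", "Dog"]

model = train_nearest_centroid(X_train, y_train)  # the learned parameters
train_predictions = [predict(model, x) for x in X_train]
print("Fits the training data:", train_predictions == y_train)
```

Here the "model parameters" are simply the two centroids. Real image classifiers learn far more parameters, but the input-output structure of training is the same.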

### Phase 2: Test (Evaluation)

![Test Phase - Correct and Incorrect](images/cat-dog-test.png)

- We use a separate **test set** (new images not used in training).
- The model predicts labels for these images.
- We compare predictions with true labels to measure performance.

Key concept: **Generalization**  
A good model performs well on unseen data rather than merely memorizing the training set.

#### Example Evaluation
Suppose we test on 4 images:
- 3 correct predictions ✓
- 1 incorrect ✗

Test accuracy:
$$\text{Accuracy (\%)} = \frac{N_{\text{correct}}}{N_{\text{total}}} \times 100 = \frac{3}{4} \times 100 = 75\%$$
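The same calculation can be written as a short function. The label lists below are illustrative values chosen to reproduce the 3-out-of-4 case, and `accuracy_percent` is a hypothetical helper name.

```python
def accuracy_percent(y_true, y_pred):
    """Return the percentage of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    n_correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * n_correct / len(y_true)

# Illustrative test labels and predictions: the last prediction is wrong.
y_true = ["Cat", "Dog", "Cat", "Dog"]
y_pred = ["Cat", "Dog", "Cat", "Cat"]

print(accuracy_percent(y_true, y_pred))  # 75.0
```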

### Phase 3: Inference (Production)

![Inference Phase](images/cat-dog-inference.png)

- Once we are satisfied with the test performance, we deploy the model.
- New images come in with no known labels.
- The model predicts "Cat" or "Dog" for each one in real time.

**Inference** = running the trained model on live/unlabeled data.

This is how ML powers apps like:
- Photo tagging
- Spam filters
- Medical diagnosis tools
- Self-driving cars
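The inference step for the cat vs dog example can be sketched as follows. The centroid values are assumed to come from an earlier training run, and both they and the incoming feature vectors are made up for illustration. The point is that no labels are involved: the code only produces predictions.

```python
# Assumed result of a previous training run: one centroid per class.
TRAINED_CENTROIDS = {"Cat": [0.85, 0.25], "Dog": [0.25, 0.85]}

def predict(centroids, x):
    """Return the label whose centroid is closest to the feature vector x."""
    def squared_distance(c):
        return sum((a - b) ** 2 for a, b in zip(c, x))
    return min(centroids, key=lambda y: squared_distance(centroids[y]))

def serve(incoming):
    """Yield (features, predicted_label) for each unlabeled input."""
    for x in incoming:
        yield x, predict(TRAINED_CENTROIDS, x)

# Hypothetical new inputs arriving in production, with no labels attached.
new_images = [[0.7, 0.1], [0.1, 0.6]]
for features, label in serve(new_images):
    print(features, "->", label)
```

Because true labels are unavailable at this stage, accuracy cannot be measured directly here, which is why the test stage comes before deployment.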

## Summary of Stages

| Stage       | Data Used              | Goal                                      | Has Labels? |
|-------------|------------------------|-------------------------------------------|-------------|
| **Training**    | Training set           | Learn the function `f(x)`                 | Yes         |
| **Test**        | Test set (held-out)    | Evaluate generalization / performance     | Yes         |
| **Inference**   | New real-world data    | Make predictions in production            | No          |

Understanding these stages is crucial: they form the foundation of every machine learning project!

---

