### 4.2.9 Baseline Models

Before developing complex predictive models, it is important to establish a **baseline performance**.  
A **baseline model** provides a simple point of comparison for evaluating whether more advanced models actually improve performance.

Baseline predictions are evaluated with the **same metrics** you apply to your main model,  
for example **Classification Accuracy** for classification tasks or **RMSE** (root mean squared error) for regression tasks.
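
As a minimal sketch of the two metrics named above, assuming labels and predictions are plain Python lists of equal length. The function names `accuracy_metric` and `rmse_metric` are illustrative and do not come from the source:

```python
from math import sqrt


def accuracy_metric(actual, predicted):
    """Return the percentage of predictions that match the actual labels."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual) * 100.0


def rmse_metric(actual, predicted):
    """Return the root mean squared error between two numeric sequences."""
    squared_error = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    return sqrt(squared_error / len(actual))


# Three of the four labels match, so accuracy is 75%.
print(accuracy_metric([0, 0, 1, 1], [0, 1, 1, 1]))  # 75.0

# Only the last prediction is off (by 2), so RMSE is sqrt(4 / 3).
print(rmse_metric([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```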

A good baseline helps answer the question:

> *“Is my model performing better than a simple or random approach?”*

---

### 🔹 Common Baseline Algorithms

1. **Random Prediction Algorithm**  
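   - Generates predictions **randomly**: for classification, by drawing from the class labels seen in the training data, either uniformly or in proportion to their frequency; for regression, by drawing random values.  
   - Serves as a *minimum performance threshold*: a useful model should do better than chance.  
   - Example: predicting “spam” or “not spam” with equal probability for every message.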

2. **Zero Rule (ZeroR) Algorithm**  
   - A very simple method that **always predicts the most frequent class** (for classification) or the **mean value** (for regression).  
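   - Provides an interpretable baseline that is typically at least as accurate as random prediction.  
   - Example: always predicting “not spam” if 80% of training samples are “not spam”, which yields about 80% accuracy when the test data has the same class balance.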
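
The Zero Rule described above can be sketched as follows. This is an illustrative sketch, not the source's implementation: it assumes the target is the last column of each row (the same convention as `row[-1]` in the notebook code), and the names `zero_rule_classification` and `zero_rule_regression` are hypothetical.

```python
from collections import Counter


def zero_rule_classification(train, test):
    """Predict the most frequent training class for every test row."""
    output_values = [row[-1] for row in train]
    # most_common(1) returns a list with one (label, count) pair.
    most_common = Counter(output_values).most_common(1)[0][0]
    return [most_common for _ in test]


def zero_rule_regression(train, test):
    """Predict the mean of the training targets for every test row."""
    output_values = [row[-1] for row in train]
    mean = sum(output_values) / len(output_values)
    return [mean for _ in test]


# 4 of 5 training rows (80%) are "not spam", so that class is always predicted.
train = [["not spam"], ["not spam"], ["not spam"], ["not spam"], ["spam"]]
test = [[None], [None], [None]]
print(zero_rule_classification(train, test))
# ['not spam', 'not spam', 'not spam']

# The mean of 10, 15 and 20 is 15.0.
print(zero_rule_regression([[10], [15], [20]], [[None], [None]]))
# [15.0, 15.0]
```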

---

### 🔹 Purpose of a Baseline

- Establishes a **minimum expected performance**.  
- Helps identify whether a new model provides **meaningful improvement**.  
- Acts as a **diagnostic tool**: if your model cannot outperform the baseline, the data preparation, features, or model choice need revisiting.

---

**In summary:**  
A baseline model may be simple, but it is an **essential first step** in building and validating any predictive modeling pipeline.


In [1]:
from random import seed, randrange

In [5]:
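def random_algorithm(train, test):
    """Predict a uniformly random training class for each test row."""
    # The class label is assumed to be the last column of each training row.
    output_values = [row[-1] for row in train]
    # set() has no guaranteed order; with string labels the order can change
    # between runs, so seed() alone may not reproduce the same predictions.
    unique = list(set(output_values))
    predicted = []
    for _ in range(len(test)):
        # Choose one of the unique class labels with equal probability.
        index = randrange(len(unique))
        predicted.append(unique[index])
    return predicted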

In [8]:
seed(1)  # fix the random seed so the example is reproducible
train = [[0], [1], [0], [1], [0], [1]]
test = [[None], [None], [None], [None]]  # test labels are unknown
predictions = random_algorithm(train, test)
print(predictions)

[0, 0, 1, 0]
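
`random_algorithm` above picks uniformly among the unique class labels, so it ignores how often each class occurs. A possible variant that follows the class distribution is sketched below. The name `weighted_random_algorithm` is hypothetical; the sketch relies on the fact that drawing from the full list of training labels (duplicates included) selects each class in proportion to its frequency.

```python
from random import seed, choice


def weighted_random_algorithm(train, test):
    """Sample each prediction from the training labels, duplicates included,
    so that frequent classes are drawn proportionally more often."""
    output_values = [row[-1] for row in train]
    return [choice(output_values) for _ in test]


seed(1)
# 80% of the training labels are class 0.
train = [[0], [0], [0], [0], [1]]
test = [[None] for _ in range(1000)]
predictions = weighted_random_algorithm(train, test)

# The share of class 0 among the predictions should be close to 0.8.
print(predictions.count(0) / len(predictions))
```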
