### 4.2.9 Baseline Models

Before developing complex predictive models, it is important to establish a **baseline performance**.  
A **baseline model** provides a simple point of comparison for evaluating whether more advanced models actually improve performance.

Baseline predictions are evaluated using the **same metrics** you would apply to your main model —  
for example, **Classification Accuracy** for classification tasks or **RMSE** for regression tasks.
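
The two metrics named above can be sketched in plain Python as follows. This is a minimal illustration rather than a reference implementation; the function names `accuracy_metric` and `rmse_metric` are assumptions chosen for this example and do not appear elsewhere in this section.

```python
from math import sqrt


def accuracy_metric(actual, predicted):
    """Percentage of predictions that exactly match the actual labels."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual) * 100.0


def rmse_metric(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(actual))


# Three of the four labels match, so accuracy is 75.0.
print(accuracy_metric([0, 0, 1, 0], [0, 0, 0, 0]))
# Errors are -10, 0 and 10, so RMSE is sqrt(200 / 3), about 8.165.
print(rmse_metric([10, 20, 30], [20, 20, 20]))
```

Scoring the baseline and the main model with the same function keeps the comparison fair.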

A good baseline helps answer the question:

> *“Is my model performing better than a simple or random approach?”*

---

### 🔹 Common Baseline Algorithms

1. **Random Prediction Algorithm**  
   - Generates predictions **randomly**, choosing among the class values seen in the training data (for classification) or drawing random values (for regression).  
   - Serves as a *minimum performance threshold*.  
   - Example: randomly predicting “spam” or “not spam” with equal probability.

2. **Zero Rule (ZeroR) Algorithm**  
   - A very simple method that **always predicts the most frequent class** (for classification) or the **mean value** (for regression).  
   - Provides a strong and interpretable baseline.  
   - Example: always predicting “not spam” if 80% of training samples are “not spam”.
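
A random baseline can also draw labels in proportion to how often each class appears in the training data, rather than with equal probability. The sketch below shows one way to do that; the name `weighted_random_algorithm` and the toy spam data are assumptions made for illustration.

```python
from random import seed, choice


def weighted_random_algorithm(train, test):
    """Predict by sampling labels in proportion to their training frequency."""
    # The class label is assumed to be the last column of each row.
    output_values = [row[-1] for row in train]
    # choice() over the raw label list picks each class with probability
    # equal to its share of the training rows.
    return [choice(output_values) for _ in test]


seed(1)
# Toy data: 80% "not spam", 20% "spam".
train = [["a", "not spam"]] * 8 + [["b", "spam"]] * 2
test = [[None]] * 1000
predictions = weighted_random_algorithm(train, test)
share = predictions.count("not spam") / len(predictions)
print(round(share, 2))  # close to 0.8, though not exactly
```

With this variant the predicted labels follow the training distribution, so the majority class is guessed more often than the minority class.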

---

### 🔹 Purpose of a Baseline

- Establishes a **minimum expected performance**.  
- Helps identify whether a new model provides **meaningful improvement**.  
- Acts as a **diagnostic tool** — if your model cannot outperform the baseline, it needs revisiting.

---

**In summary:**  
A baseline model may be simple, but it is an **essential first step** in building and validating any predictive modeling pipeline.


In [1]:
from random import seed, randrange

In [5]:
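def random_algorithm(train, test):
    """Predict a randomly chosen class value for every test row."""
    # The class label is assumed to be the last column of each training row.
    output_values = [row[-1] for row in train]
    # Each distinct class is equally likely, regardless of its frequency.
    unique = list(set(output_values))
    predicted = list()
    for _ in range(len(test)):
        index = randrange(len(unique))
        predicted.append(unique[index])
    return predicted
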

In [8]:
seed(1)
train = [[0], [1], [0], [1], [0], [1]]
test = [[None], [None], [None], [None]]
predictions = random_algorithm(train, test)
print(predictions)

[0, 0, 1, 0]


### Zero Rule Algorithm

The **Zero Rule Algorithm (ZeroR)** is a simple baseline method that uses the most frequent class or average value to make predictions. It provides a more informed baseline than the random prediction algorithm.

#### Classification
For classification problems:
- ZeroR predicts the **most common class** in the training dataset.
- Example: If a dataset has 90 instances of class 0 and 10 of class 1, ZeroR will always predict class 0.
- Baseline accuracy = 90 / 100 = **90%**
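- This is higher than the accuracy of random prediction: about 82% (0.9 × 0.9 + 0.1 × 0.1) when classes are guessed in proportion to their frequency, or 50% when both classes are guessed with equal probability.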
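
The arithmetic behind these figures can be checked with a few lines of Python. This is a small worked example using the 90/10 class split from the bullet list above; the variable names are illustrative.

```python
# Class counts from the example: 90 instances of class 0, 10 of class 1.
class_counts = {0: 90, 1: 10}
total = sum(class_counts.values())
shares = {label: count / total for label, count in class_counts.items()}

# ZeroR: always predict the majority class.
zero_rule_accuracy = max(shares.values())

# Random guessing that follows the class distribution: a guess is correct
# when the drawn class matches the true class, so sum the squared shares.
proportional_random_accuracy = sum(p * p for p in shares.values())

# Random guessing with equal probability for every class.
uniform_random_accuracy = 1 / len(shares)

print(zero_rule_accuracy)                      # 0.9
print(round(proportional_random_accuracy, 2))  # 0.82
print(uniform_random_accuracy)                 # 0.5
```

The more imbalanced the classes, the larger the gap between ZeroR and either form of random guessing.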

#### Key Idea
ZeroR creates **one simple rule** using the target distribution — a strong baseline for comparing more complex models.


In [7]:
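def zero_rule_algorithm_classification(train, test):
    """Predict the most frequent training class for every test row."""
    # The class label is assumed to be the last column of each training row.
    output_values = [row[-1] for row in train]
    # Count each distinct class and keep the most common one.
    prediction = max(set(output_values), key=output_values.count)
    predicted = [prediction for _ in range(len(test))]
    return predicted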

In [8]:
# ZeroR is deterministic, so no random seed is needed here.
train1 = [
    [1, 0],
    [2, 0],
    [3, 1],
    [4, 0]
]
test1 = [
    [5, None],
    [6, None]
]
print("Test 1:", zero_rule_algorithm_classification(train1, test1))

Test 1: [0, 0]


### Zero Rule Algorithm — Regression

For **regression problems**, the Zero Rule Algorithm (ZeroR) predicts a constant real value — typically the **mean** (average) of all observed target values in the training data.

#### How It Works
- Calculates the mean of the $n$ output values in the training data:
  
  $$
  \text{mean} = \frac{\sum_{i=1}^{n} \text{value}_i}{n}
  $$

- This mean value is then used as the prediction for all inputs.
- Using the mean (or sometimes the median) provides a strong baseline: among all constant predictions, the mean gives the lowest RMSE on the training data, so it usually produces lower error than random predictions.

#### Key Idea
ZeroR for regression predicts the **central tendency** (mean or median) of the target variable — offering a simple yet effective performance benchmark for evaluating advanced models.
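
To make the mean-versus-median choice concrete, the sketch below scores both constants on a small target column that contains one outlier. The helper names (`zero_rule_regression`, `rmse`, `mae`), the `summary` parameter, and the data are assumptions made for this illustration, and MAE (mean absolute error) is brought in only for contrast with RMSE.

```python
from math import sqrt
from statistics import mean, median


def zero_rule_regression(train, test, summary=mean):
    """Predict one constant (mean by default) for every test row."""
    output_values = [row[-1] for row in train]
    prediction = summary(output_values)
    return [prediction for _ in test]


def rmse(actual, predicted):
    """Root mean squared error."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Toy data: the last target (200) is an outlier.
train = [[1, 10], [2, 20], [3, 30], [4, 200]]
actual = [row[-1] for row in train]

for name, summary in (("mean", mean), ("median", median)):
    # Score each constant on the training targets themselves.
    predicted = zero_rule_regression(train, train, summary=summary)
    print(name, round(rmse(actual, predicted), 2), round(mae(actual, predicted), 2))
```

On this data the mean (65) scores better on RMSE, about 78.3 against 87.9, while the median (25) scores better on MAE, 50.0 against 67.5. Which constant makes the better baseline therefore depends on the metric used for the main model.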


In [10]:
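def zero_rule_algorithm_regression(train, test):
    """Predict the mean training target for every test row."""
    # The target value is assumed to be the last column of each training row.
    output_values = [row[-1] for row in train]
    # Mean of the targets; an empty training set raises ZeroDivisionError.
    prediction = sum(output_values) / float(len(output_values))
    predicted = [prediction for _ in range(len(test))]
    return predicted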

In [11]:
# ✅ Test Case 1: Basic integer values
train1 = [
    [1, 10],
    [2, 20],
    [3, 30]
]
test1 = [
    [4, None],
    [5, None]
]
print("Test 1:", zero_rule_algorithm_regression(train1, test1))
# Expected mean = (10 + 20 + 30) / 3 = 20
# Output → [20.0, 20.0]


# ✅ Test Case 2: Mixed positive and negative values
train2 = [
    [1, -5],
    [2, 5],
    [3, 15]
]
test2 = [
    [4, None],
    [5, None],
]
print("Test 2:", zero_rule_algorithm_regression(train2, test2))
# Expected mean = (-5 + 5 + 15) / 3 = 5.0
# Output → [5.0, 5.0]


# ✅ Test Case 3: Decimal (float) values
train3 = [
    [1, 2.5],
    [2, 3.5],
    [3, 4.5]
]
test3 = [
    [4, None]
]
print("Test 3:", zero_rule_algorithm_regression(train3, test3))
# Expected mean = (2.5 + 3.5 + 4.5) / 3 = 3.5
# Output → [3.5]


# ✅ Test Case 4: All same output values
train4 = [
    [1, 100],
    [2, 100],
    [3, 100]
]
test4 = [
    [4, None],
    [5, None],
    [6, None]
]
print("Test 4:", zero_rule_algorithm_regression(train4, test4))
# Expected mean = 100.0
# Output → [100.0, 100.0, 100.0]


# ✅ Test Case 5: Single training example
train5 = [
    [1, 42]
]
test5 = [
    [2, None],
    [3, None]
]
print("Test 5:", zero_rule_algorithm_regression(train5, test5))
# Expected mean = 42
# Output → [42.0, 42.0]

Test 1: [20.0, 20.0]
Test 2: [5.0, 5.0]
Test 3: [3.5]
Test 4: [100.0, 100.0, 100.0]
Test 5: [42.0, 42.0]
