# Model Bias versus Model Variance

| Signs of Bias | Signs of Variance |
| --- | --- |
| Poor or inconsistent intuition | Poor intuition with new and test data |
| Poor intuition with training data | Noise in data set |
| Poor intuition compared to similar models | Overfitting |
| Underfitting | Complexity |
| Simplicity | High MSE |

Model Bias ≠ ML/AI Bias. They are different concepts: Model Bias is about generalisation error, the systematic gap between a model's predictions and the true values.

| Term | Definition |
| --- | --- |
| Generalisation | The ability of a model to make accurate predictions or classifications on data it wasn't trained on |
| Bias | Bias represents the difference between the average prediction and the true value |
| Variance | Variance measures how much, on average, predictions vary for a given data point |
| Noise | Random or irrelevant data points or variations that obscure meaningful patterns and can negatively impact the accuracy of analysis or machine learning models |
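
The definitions of bias and variance above can be checked numerically. The following is a minimal sketch, not part of the original notebook: it assumes a known ground-truth function (the same quadratic that the plotting cell later in this notebook uses), draws many noisy training sets from it, fits a polynomial to each, and then measures how far the average prediction sits from the truth (squared bias) and how much the predictions scatter around their own average (variance). The names `true_fn` and `bias_variance`, the noise level, and the sample sizes are illustrative assumptions.

```python
import numpy as np

def true_fn(x):
    # Assumed ground truth, borrowed from the quadratic in the plotting cell
    return -x**2 + 4 * x + 2

def bias_variance(degree, n_trials=200, n_train=30, noise_sd=10.0, seed=0):
    """Estimate squared bias and variance of a polynomial fit by simulation."""
    rng = np.random.default_rng(seed)
    x_test = np.linspace(-10, 10, 50)
    preds = np.empty((n_trials, x_test.size))
    for t in range(n_trials):
        # Draw a fresh noisy training set from the same underlying function
        x_train = rng.uniform(-10, 10, n_train)
        y_train = true_fn(x_train) + rng.normal(0, noise_sd, n_train)
        coeffs = np.polyfit(x_train, y_train, degree)
        preds[t] = np.polyval(coeffs, x_test)
    # Bias: difference between the average prediction and the true value
    mean_pred = preds.mean(axis=0)
    bias_sq = float(np.mean((mean_pred - true_fn(x_test)) ** 2))
    # Variance: spread of the predictions at each test point, averaged
    variance = float(np.mean(preds.var(axis=0)))
    return bias_sq, variance

for degree in (1, 2, 6):
    b, v = bias_variance(degree)
    print(f"degree={degree}: bias^2={b:.1f}, variance={v:.1f}")
```

Under these assumptions, the straight line (degree 1) should show a large squared bias because it cannot follow the curve, while the degree 6 polynomial should show more variance than the degree 2 fit because its extra coefficients chase the noise.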

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from utils_common import compute_model_output, generate_data

In [None]:
# Generate a random data set
m = generate_data(25, 50, 25, 50, 300, 0.9)
n = generate_data(0, 50, 0, 50, 10, 0.01)

In [None]:
# Variance
plt.xlabel("Feature")
plt.ylabel("Target")
plt.scatter(m[0], m[1], color='blue')
plt.scatter(n[0], n[1], color='blue')
plt.show()

In [None]:
# Biased data
plt.xlabel("Feature")
plt.ylabel("Target")
plt.scatter(m[0], m[1], color='pink', label='Women')
plt.scatter(n[0], n[1], color='blue', label='Men')
plt.legend()
plt.show()

#### Relationship between Bias, Variance and Generalisation (Fit/Intuition)

In [None]:
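# Three panels: high variance (overfit), high bias (underfit) and a good fit
# Sort the points by feature value so the line visits them left to right
# while keeping each (feature, target) pair together
o = np.asarray(generate_data(-10, 10, -10, 10, 8, 0.2))
o = o[:, np.argsort(o[0])]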

fig, ax = plt.subplots(1,3, figsize=(15, 5))
ax[0].scatter(o[0], o[1], color='blue', s=100)
ax[0].plot(o[0], o[1], color='red')
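# Noise-free quadratic: the good-fit curve for the third panel
x = np.linspace(10, -10, 100)
y = -x**2 + 4*x + 2
ax[2].plot(x, y, color='red')
# Add uniform noise to both coordinates to create the observations
x = x + np.random.uniform(-5, 5, size=x.shape)
y = y + np.random.uniform(-10, 10, size=y.shape)
ax[1].scatter(x, y, c='b')
ax[2].scatter(x, y, c='b')
# Straight line for the second panel: too simple for the curved data
x_lin = np.array([-10, 10])
tmp_f_mb = compute_model_output(x_lin, 10, -10)
ax[1].plot(x_lin, tmp_f_mb, c='r')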
ax[0].title.set_text("High Variance/Overfitting")
ax[1].title.set_text("High Bias/Underfitting")
ax[2].title.set_text("Low Bias and Low Variance/Good Fit")
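# Hide ticks and tick labels on every panel
# (use a separate loop variable so the axes array `ax` is not overwritten)
for panel in ax.flat:
    panel.set_xticks([])
    panel.set_xticklabels([])
    panel.set_yticks([])
    panel.set_yticklabels([])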

plt.show()
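
The three panels can also be read off from error numbers rather than pictures. The sketch below is an illustration under stated assumptions, not the notebook's own method: it samples a small training set and a larger test set from the same noisy quadratic, fits polynomials of degree 1, 2 and 10 with `np.polynomial.Polynomial.fit`, and compares mean squared error (MSE) on both sets. The helper names `sample` and `mse`, the seed, the noise level, and the sample sizes are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(42)

def sample(n):
    # Noisy observations of the assumed quadratic relationship
    x = rng.uniform(-10, 10, n)
    y = -x**2 + 4 * x + 2 + rng.normal(0, 10, n)
    return x, y

def mse(model, x, y):
    # Mean squared error of a fitted model on the given data
    return float(np.mean((model(x) - y) ** 2))

x_train, y_train = sample(12)    # small training set
x_test, y_test = sample(200)     # unseen data from the same process

results = {}
for degree in (1, 2, 10):
    # Polynomial.fit rescales x internally, which keeps high degrees stable
    model = np.polynomial.Polynomial.fit(x_train, y_train, degree)
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    train_err, test_err = results[degree]
    print(f"degree={degree:2d}  train MSE={train_err:10.1f}  test MSE={test_err:10.1f}")
```

A high-bias model (degree 1) is expected to score poorly on both training and test data, while a high-variance model (degree 10) is expected to score very well on training data and much worse on the test set. The degree 2 model, which matches the assumed data-generating process, should sit between the two on training error and do best on test error.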