## 1. Theory: Introduction to Diagnosing Bias/Variance

One of the critical challenges in machine learning is determining whether a model's error comes mainly from high bias (underfitting) or high variance (overfitting).

- **Bias**: The error introduced by overly simplistic assumptions in the learning algorithm, which cause it to underfit the data. A high-bias model oversimplifies the problem and performs poorly on both the training set and unseen data.

- **Variance**: The error introduced by excessive model complexity, which makes the fitted model overly sensitive to the particular training sample and causes it to overfit. A high-variance model captures noise in the training data, so it performs well on the training set but poorly on unseen data.

### Trade-off
In general, as you increase the complexity of your model, the error due to bias falls while the error due to variance rises. Balancing these two sources of error is crucial for building models that generalize well to new data.
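
For squared-error loss, the trade-off has a standard closed form. The decomposition below is a textbook result, not something derived in this notebook. The notation is assumed here: $f$ is the true function, $\hat{f}$ is the model fitted on a randomly drawn training set, and $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$. The expectation is taken over training sets and noise. This notebook measures classification accuracy, for which the additive form does not hold exactly, so read it as intuition only.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Increasing model complexity typically shrinks the first term and grows the second; the third term cannot be reduced by any model.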

## Libraries

In [None]:
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt

## 2. Dataset

In [None]:
# Generate a dataset
X, y = make_moons(n_samples=500, noise=0.3, random_state=42)

# Splitting the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

## 3. Model coded in Python


In [None]:
# Training a decision tree classifier with varying depths
tree_depths = [1, 5, 10, 15]
train_accuracies = []
test_accuracies = []

for depth in tree_depths:
    # Fix random_state so repeated runs build the same tree
    clf = DecisionTreeClassifier(max_depth=depth, random_state=42)
    clf.fit(X_train, y_train)
    
    # Calculate accuracy on training and test set
    train_accuracy = accuracy_score(y_train, clf.predict(X_train))
    test_accuracy = accuracy_score(y_test, clf.predict(X_test))
    
    train_accuracies.append(train_accuracy)
    test_accuracies.append(test_accuracy)
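
Before plotting, it can help to look at the raw numbers. The sketch below tabulates the train/test gap for each depth and marks the depth with the highest test accuracy. The helper names `summarize_depths` and `print_summary` are hypothetical and not part of the original notebook.

```python
def summarize_depths(depths, train_acc, test_acc):
    """Return (rows, best_depth).

    Each row is (depth, train accuracy, test accuracy, train - test gap).
    best_depth is the depth with the highest test accuracy; on ties,
    the first such depth in the input order wins.
    """
    rows = [
        (d, tr, te, tr - te)
        for d, tr, te in zip(depths, train_acc, test_acc)
    ]
    best_depth = max(rows, key=lambda row: row[2])[0]
    return rows, best_depth


def print_summary(rows, best_depth):
    """Print one aligned line per depth and mark the best test accuracy."""
    print(f"{'depth':>5} {'train':>7} {'test':>7} {'gap':>7}")
    for depth, train, test, gap in rows:
        marker = "  <- best test accuracy" if depth == best_depth else ""
        print(f"{depth:>5} {train:>7.3f} {test:>7.3f} {gap:>7.3f}{marker}")
```

With the lists built in the loop above, a call would look like `rows, best = summarize_depths(tree_depths, train_accuracies, test_accuracies)` followed by `print_summary(rows, best)`.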



## 4. Explanation


In [None]:
# Plotting training and test accuracies
plt.plot(tree_depths, train_accuracies, '-o', label='Training Accuracy')
plt.plot(tree_depths, test_accuracies, '-o', label='Test Accuracy')
plt.xlabel('Depth of Decision Tree')
plt.ylabel('Accuracy')
plt.legend()
plt.title('Bias vs. Variance Analysis')
plt.show()

From the plot:

1. **Low Depth (High Bias)**: At a low depth, the training and test accuracies are both comparatively low and close to each other, indicating that the model is too simplistic and suffers from high bias.

2. **High Depth (High Variance)**: At very high depths, the training accuracy is high while the test accuracy is noticeably lower. This gap indicates that the model has become too complex and is overfitting the training data, capturing its noise.

3. **Optimal Point**: The optimal depth lies somewhere in the middle, where the test accuracy is at its highest and the gap between training and test accuracy is still small. A small gap alone is not enough, since the shallowest tree also has a small gap.
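
The three cases above can be turned into a rough rule of thumb. This is a minimal sketch: the function name `diagnose` and the thresholds `target_acc` and `max_gap` are illustrative assumptions, not values from this notebook, and sensible thresholds depend on the problem.

```python
def diagnose(train_acc, test_acc, target_acc=0.85, max_gap=0.05):
    """Label a (train, test) accuracy pair with two rule-of-thumb checks.

    target_acc: training accuracy below this suggests underfitting (assumed value).
    max_gap:    a train - test gap above this suggests overfitting (assumed value).
    """
    issues = []
    if train_acc < target_acc:
        issues.append("high bias (underfitting)")
    if train_acc - test_acc > max_gap:
        issues.append("high variance (overfitting)")
    return " and ".join(issues) if issues else "reasonable fit"
```

A model can trip both checks at once, which is why the function collects issues instead of returning at the first match.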

This analysis demonstrates the bias-variance trade-off. Too simple a model, and it can't capture the underlying patterns. Too complex, and it starts to fit the noise in the training data, failing to generalize well to new data.
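
One caveat: choosing the depth by looking at test accuracy uses the test set for model selection, which makes the final test score optimistic. A common alternative is to run the same diagnosis with cross-validation on the training set only. The sketch below uses scikit-learn's `validation_curve`; the depth grid and the choice of 5 folds are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split, validation_curve
from sklearn.tree import DecisionTreeClassifier

# Same data and split as in the cells above.
X, y = make_moons(n_samples=500, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

depths = [1, 2, 3, 5, 10, 15]  # assumed grid, finer than the original four depths

# 5-fold cross-validation on the training set only; the test set is not touched.
train_scores, val_scores = validation_curve(
    DecisionTreeClassifier(random_state=42),
    X_train,
    y_train,
    param_name="max_depth",
    param_range=depths,
    cv=5,
    scoring="accuracy",
)

# Each array has shape (len(depths), 5): one row per depth, one column per fold.
mean_train = train_scores.mean(axis=1)
mean_val = val_scores.mean(axis=1)
best_depth = depths[int(np.argmax(mean_val))]

for depth, tr, va in zip(depths, mean_train, mean_val):
    print(f"max_depth={depth:>2}  train={tr:.3f}  validation={va:.3f}")
print("Depth with the best mean validation accuracy:", best_depth)
```

The held-out test set can then be used once, at the end, to estimate how the chosen depth generalizes.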