
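In this notebook, we generate synthetic data from the linear relationship y = 2 + 3x plus Gaussian noise and split it into training and test sets. We then fit a linear regression model on the training set and evaluate its mean squared error (MSE) on both the training and test sets. Because the model family matches the data-generating process, both MSEs should be close to the noise variance of 1, so neither overfitting nor underfitting is expected here.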

If the training set MSE is much lower than the test set MSE, the model has likely overfit the training data: it has learned the noise in the training set rather than the underlying pattern. Overfitting is characterized by high variance (the fitted model changes substantially from one training sample to another) and low bias.
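To see overfitting in action, here is a minimal sketch that is not part of the original notebook. It keeps the same y = 2 + 3x + noise relationship but swaps in a deliberately over-complex model: a degree-12 polynomial (scikit-learn's `PolynomialFeatures` feeding `LinearRegression`) fitted to only 15 points. The sample sizes, the degree, and the `make_data` helper are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

def make_data(n):
    # Hypothetical helper: same linear ground truth, y = 2 + 3x + N(0, 1) noise
    X = rng.random((n, 1))
    y = 2 + 3 * X[:, 0] + rng.standard_normal(n)
    return X, y

X_train, y_train = make_data(15)   # deliberately tiny training set
X_test, y_test = make_data(1000)   # large test set for a stable estimate

# 13 polynomial coefficients for 15 points: enough flexibility to chase the noise
overfit = make_pipeline(PolynomialFeatures(degree=12), LinearRegression())
overfit.fit(X_train, y_train)

mse_train = mean_squared_error(y_train, overfit.predict(X_train))
mse_test = mean_squared_error(y_test, overfit.predict(X_test))
print(f"Train MSE: {mse_train:.2f}  Test MSE: {mse_test:.2f}")
```

The training MSE should come out well below the noise variance of 1 while the test MSE is far larger, which is the gap described above.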

Conversely, if the training set MSE and test set MSE are both high (well above the irreducible noise level), the model has likely underfit the data: it is too simple to capture the underlying pattern. Underfitting is characterized by high bias and low variance.
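A minimal sketch of underfitting, again an illustration rather than part of the original notebook: a straight line is fitted to data whose hypothetical ground truth is the parabola y = x² plus noise with standard deviation 0.5. The quadratic target, input range, and noise level are assumptions chosen so that the noise floor is known exactly.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
# Hypothetical nonlinear ground truth: y = x^2 + N(0, 0.5^2) noise
y = X[:, 0] ** 2 + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# A straight line cannot follow a parabola, no matter how much data it sees
underfit = LinearRegression().fit(X_train, y_train)
mse_train = mean_squared_error(y_train, underfit.predict(X_train))
mse_test = mean_squared_error(y_test, underfit.predict(X_test))

noise_var = 0.5 ** 2  # irreducible error for this synthetic data
print(f"Train MSE: {mse_train:.2f}  Test MSE: {mse_test:.2f}  Noise floor: {noise_var:.2f}")
```

Both errors should land at a similar level and far above the 0.25 noise floor: the model is consistently wrong rather than erratically wrong.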

A good balance between bias and variance comes from choosing an appropriate model complexity (for example, polynomial degree) and regularization strength, ideally by comparing candidate settings on held-out data.
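One way to pick both knobs on held-out data is a cross-validated grid search. The sketch below is an assumption-laden illustration, not the notebook's method: it uses a hypothetical sine-shaped target, scikit-learn's `GridSearchCV` over a `PolynomialFeatures` → `StandardScaler` → `Ridge` pipeline, and arbitrary grids for the degree (complexity) and `alpha` (regularization strength).

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)
X = rng.random((60, 1))
# Hypothetical nonlinear ground truth: y = sin(2*pi*x) + N(0, 0.2^2) noise
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(scale=0.2, size=60)

# Scaling the polynomial features lets the Ridge penalty treat them evenly
pipe = make_pipeline(PolynomialFeatures(include_bias=False), StandardScaler(), Ridge())
param_grid = {
    "polynomialfeatures__degree": [1, 3, 5, 9, 15],  # model complexity
    "ridge__alpha": [1e-4, 1e-2, 1.0, 100.0],        # regularization strength
}

# 5-fold cross-validation; sklearn maximizes scores, hence the negated MSE
search = GridSearchCV(pipe, param_grid, scoring="neg_mean_squared_error", cv=5)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Cross-validated MSE: {-search.best_score_:.3f}")
```

Degree 1 underfits the sine curve, while high degrees with tiny `alpha` risk overfitting; the search should settle on a middle ground, with a larger `alpha` able to tame a higher degree. Very small `alpha` values on high-degree features may trigger ill-conditioning warnings, which do not stop the search.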

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Generate synthetic data: y = 2 + 3x plus standard normal noise (variance 1)
np.random.seed(42)
X = np.random.rand(100, 1)
y = 2 + 3 * X + np.random.randn(100, 1)

# Hold out 20% of the samples as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a linear regression model on the training set
model = LinearRegression()
model.fit(X_train, y_train)

# Predict on the training set and calculate the mean squared error
y_train_pred = model.predict(X_train)
mse_train = mean_squared_error(y_train, y_train_pred)

# Predict on the test set and calculate the mean squared error
y_test_pred = model.predict(X_test)
mse_test = mean_squared_error(y_test, y_test_pred)

print(f"Training set MSE: {mse_train:.2f}")
print(f"Test set MSE: {mse_test:.2f}")
# The fitted parameters should be close to the true intercept (2) and slope (3)
print(f"Fitted intercept: {model.intercept_[0]:.2f}, slope: {model.coef_[0, 0]:.2f}")
```
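The training and test MSEs only hint at bias and variance. As a rough sketch (an illustration added here, not the notebook's method), both terms can be estimated directly: repeatedly draw fresh training sets from the same y = 2 + 3x + noise process, refit, and examine the predictions at fixed points. Bias² is the squared gap between the average prediction and the true line, and variance is the spread of predictions across refits. The `bias_variance` helper, the 20-point training size, the 200 repeats, and the degree-10 comparison model are all hypothetical choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x_grid = np.linspace(0.05, 0.95, 50).reshape(-1, 1)  # fixed evaluation points
f_true = 2 + 3 * x_grid[:, 0]                        # noise-free target y = 2 + 3x

def bias_variance(degree, n_train=20, n_repeats=200):
    # Hypothetical helper: refit on many fresh training sets, collect predictions
    preds = np.empty((n_repeats, len(x_grid)))
    for r in range(n_repeats):
        X = rng.random((n_train, 1))
        y = 2 + 3 * X[:, 0] + rng.standard_normal(n_train)
        model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
        preds[r] = model.fit(X, y).predict(x_grid)
    mean_pred = preds.mean(axis=0)
    bias_sq = np.mean((mean_pred - f_true) ** 2)  # averaged over the grid
    variance = np.mean(preds.var(axis=0))         # averaged over the grid
    return bias_sq, variance

results = {degree: bias_variance(degree) for degree in (1, 10)}
for degree, (bias_sq, variance) in results.items():
    print(f"degree={degree:2d}  bias^2={bias_sq:.3f}  variance={variance:.3f}")
```

Since both models contain the true line, their bias² estimates should be small, but the degree-10 model's variance should be far larger: extra flexibility buys nothing here except sensitivity to the particular training sample.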