## Linear models make a prediction using a linear function of the input features.

### Linear models for regression
For regression, the general prediction formula for a linear model looks as follows:

$$
\hat{y} = w_0\,x_0 + w_1\,x_1 + \dots + w_p\,x_p + b
$$

Here, \(x_0\) to \(x_p\) denote the features of a single data point, so in this notation there are \(p+1\) features. The weights \(w_0, \dots, w_p\) and the bias \(b\) are parameters learned during training, and \(\hat{y}\) is the model's prediction.
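As a minimal sketch of the formula above, the prediction for one data point is the dot product of the weight vector with the feature vector, plus the bias; for a whole dataset the same computation becomes a matrix-vector product. The weights, bias, and feature values below are made up for illustration and are not learned parameters.

```python
import numpy as np

# Hypothetical parameters for p + 1 = 3 features (x_0, x_1, x_2)
w = np.array([0.5, -1.0, 2.0])  # weights w_0..w_2
b = 0.1                         # bias

# One data point: y_hat = w_0*x_0 + w_1*x_1 + w_2*x_2 + b
x = np.array([1.0, 2.0, 3.0])
y_hat = np.dot(w, x) + b

# Many data points (one per row): one prediction per row
X = np.array([[1.0, 2.0, 3.0],
              [0.0, 0.0, 0.0]])
y_hat_all = X @ w + b

print(y_hat, y_hat_all)
```

The second row is all zeros, so its prediction is just the bias \(b\).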

For a dataset with a single feature, the formula simplifies to:

$$
\hat{y} = w_0\,x_0 + b
$$

In this case, \(w_0\) is the slope of the line and \(b\) is its y-axis offset (the prediction at \(x_0 = 0\)). With multiple features, each \(w_i\) is the slope along the \(i\)-th feature axis, so the prediction is a weighted sum of the input features plus the bias \(b\); individual weights can be negative.
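To make the slope interpretation concrete, the sketch below uses made-up weights, increases one feature at a time by one unit while holding the others fixed, and records how much the prediction moves. Each change equals the corresponding weight, including the negative one.

```python
import numpy as np

# Hypothetical weights and bias for a 3-feature linear model
w = np.array([0.5, -1.0, 2.0])
b = 0.1

def predict(x):
    """Linear model prediction: weighted sum of the features plus the bias."""
    return np.dot(w, x) + b

x = np.array([1.0, 2.0, 3.0])
deltas = []
for i in range(len(w)):
    x_step = x.copy()
    x_step[i] += 1.0  # unit change in feature i only
    deltas.append(predict(x_step) - predict(x))

# Each entry matches w_i up to floating-point rounding
print(deltas)
```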

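```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
import mglearn

# One-feature synthetic "wave" regression dataset with 100 points
X, y = mglearn.datasets.make_wave(n_samples=100)
# Hold out part of the data for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Learn w and b from the training data
lr = LinearRegression().fit(X_train, y_train)
```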

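```python
# Default test_size=0.25: 100 samples -> 75 training, 25 test.
# X is 2-D (n_samples, n_features); y is 1-D (n_samples,)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```

```
(75, 1) (25, 1) (75,) (25,)
```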


**`lr.coef_`**: the coefficients (also called weights) of the features, stored as a NumPy array with one entry per feature. Each entry tells you how much the predicted value changes when that feature increases by one unit while all other features are held constant.

**`lr.intercept_`**: the bias term (\(b\) in the equation above, also called the y-intercept). It is the predicted value \(\hat{y}\) when all features are 0.
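A quick way to confirm what these two attributes mean is to rebuild the prediction from them. The sketch below fits `LinearRegression` on a small synthetic two-feature dataset (made up for illustration, not the wave data), then checks that `X @ coef_ + intercept_` matches `predict(X)` and that an all-zero input yields the intercept.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumption for illustration): y = 3*x_0 - 2*x_1 + 1 plus small noise
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + 1 + rng.normal(scale=0.1, size=50)

lr = LinearRegression().fit(X, y)
print("coef_:", lr.coef_)            # one weight per feature, close to [3, -2]
print("intercept_:", lr.intercept_)  # close to 1

# predict() is the weighted sum of the features plus the bias
manual = X @ lr.coef_ + lr.intercept_
print(np.allclose(manual, lr.predict(X)))

# With every feature set to 0, the prediction equals the intercept
zero_pred = lr.predict(np.zeros((1, 2)))[0]
print(zero_pred, lr.intercept_)
```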

```python
print("lr.coef_: ", lr.coef_)               # Weight (slope)
print("lr.intercept_: ", lr.intercept_)     # Bias (intercept)
```

```
lr.coef_:  [0.40443939]
lr.intercept_:  -0.02256802817336538
```
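Plugging the printed values into the single-feature formula gives the fitted line. As a worked example at \(x_0 = 1\) (a point chosen only for illustration, with the intercept rounded to eight decimals):

$$
\hat{y} = 0.40443939\,x_0 - 0.02256803, \qquad \hat{y}\big|_{x_0 = 1} \approx 0.40443939 - 0.02256803 = 0.38187136
$$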


```python
# score() returns the R^2 of the model's predictions on the given data
print("Training set score: {:.2f}".format(lr.score(X_train, y_train)))
print("Test set score: {:.2f}".format(lr.score(X_test, y_test)))
```

```
Training set score: 0.59
Test set score: 0.66
```
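For a scikit-learn regressor, `score` returns the coefficient of determination \(R^2 = 1 - \sum_i (y_i - \hat{y}_i)^2 / \sum_i (y_i - \bar{y})^2\). A score of 1 means perfect predictions, and a model that always predicts the mean of the evaluated targets scores 0. The sketch below computes it by hand with NumPy on small made-up arrays; the helper name `r2` is hypothetical.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(round(r2(y_true, y_pred), 3))  # 0.949

# Always predicting the mean of y_true gives R^2 = 0
print(r2(y_true, [np.mean(y_true)] * 4))
```

Here the training and test scores (0.59 and 0.66) are close to each other, which suggests the one-feature model is underfitting rather than overfitting.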


### Overfitting example

```python
# Boston housing features, rescaled and expanded with all degree-2
# products of the original features (mglearn helper)
X, y = mglearn.datasets.load_extended_boston()

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

lr = LinearRegression().fit(X_train, y_train)

print("Training set score: {:.2f}".format(lr.score(X_train, y_train)))
print("Test set score: {:.2f}".format(lr.score(X_test, y_test)))
```

```
Training set score: 0.95
Test set score: 0.61
```
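The large gap between the training score (0.95) and the test score (0.61) is the signature of overfitting: with many features relative to the number of training points, ordinary least squares can fit noise in the training set. The NumPy sketch below reproduces the effect on synthetic data (an assumption, not the Boston data): only feature 0 carries signal, the other 18 are pure noise, and with 19 weights plus a bias for 20 training points the fit is essentially exact on the training set but much worse on held-out data. The helpers `r2`, `make_data`, and `add_bias` are hypothetical names for this illustration.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
n_train, n_test, n_features = 20, 200, 19

def make_data(n):
    X = rng.normal(size=(n, n_features))
    y = 2.0 * X[:, 0] + rng.normal(scale=1.0, size=n)  # only feature 0 matters
    return X, y

def add_bias(X):
    # Append a column of ones so least squares also learns the bias b
    return np.hstack([X, np.ones((X.shape[0], 1))])

X_train, y_train = make_data(n_train)
X_test, y_test = make_data(n_test)

# Ordinary least squares: 19 weights + 1 bias = 20 parameters for 20 points
params, *_ = np.linalg.lstsq(add_bias(X_train), y_train, rcond=None)

train_r2 = r2(y_train, add_bias(X_train) @ params)
test_r2 = r2(y_test, add_bias(X_test) @ params)
print(f"Training R^2: {train_r2:.2f}")  # essentially 1.00
print(f"Test R^2: {test_r2:.2f}")
```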
