In [32]:
from sklearn.linear_model import LinearRegression
from sklearn import datasets
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge, RidgeCV, Lasso, LassoCV
from sklearn.preprocessing import StandardScaler

# **Fitting a Line**

In [2]:
# Load the California housing data and keep only the first two features
housing = datasets.fetch_california_housing()
features = housing.data[:,0:2]
target = housing.target

# Create linear regression
regression = LinearRegression()

# Fit the linear regression
model = regression.fit(features, target)

Linear regression assumes that the relationship between the features and the target vector is approximately linear.

$$
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon
$$
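
As a minimal sketch of what the equation above means in practice, a fitted `LinearRegression` predicts by adding the intercept to the weighted sum of the features. The data here is synthetic and purely illustrative (the true coefficients 1.5, 2.0 and -0.5 are assumptions chosen for the example, not values from the housing data).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic two-feature data (illustrative only, not the housing data)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 1.5 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=100)

model = LinearRegression().fit(X, y)

# Prediction by hand: intercept plus the weighted sum of the features
manual = model.intercept_ + X @ model.coef_

# Compare with the estimator's own predictions
print(np.allclose(manual, model.predict(X)))
```

The same arithmetic applies to the housing model: `intercept_` plays the role of $\beta_0$ and `coef_` holds $\beta_1$ and $\beta_2$.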

In [3]:
# View the intercept
model.intercept_

np.float64(-0.1018903275908265)

In [4]:
# View the feature coefficients
model.coef_

array([0.43169191, 0.01744134])

In [5]:
# First value in the target vector multiplied by 1000
target[0]*1000

np.float64(4526.0)

In [6]:
# Predict the target value of the first observation, multiplied by 1000
model.predict(features)[0]*1000

np.float64(4207.126263821179)

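The first coefficient is about 0.43. The target in this dataset is the median house value in units of 100,000 dollars, so a one-unit increase in the first feature is associated with an increase of roughly 0.43 × 100,000 ≈ 43,000 dollars in the predicted value, holding the second feature fixed. Multiplying the coefficient by 1,000, as the next cell does, expresses the same change in units of 100 dollars.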

In [7]:
# First coefficient multiplied by 1000
model.coef_[0]*1000

np.float64(431.6919075449537)

# **Fitting a Nonlinear Relationship**

In [8]:
# Create polynomial features up to degree 3 (powers and interaction terms)
polynomial = PolynomialFeatures(degree=3, include_bias=False)
features_polynomial = polynomial.fit_transform(features)

# Create linear regression
regression = LinearRegression()

# Fit the linear regression
model = regression.fit(features_polynomial, target)

In [9]:
# First value in the target vector multiplied by 1000
target[0]*1000

np.float64(4526.0)

In [10]:
# Predict the target value of the first observation, multiplied by 1000
model.predict(features_polynomial)[0]*1000

np.float64(4599.137386094626)

# **Reducing Variance with Regularization**

In [19]:
housing = datasets.fetch_california_housing()
features = housing.data
target = housing.target

In [20]:
# Standardize features
scaler = StandardScaler()
features_standardized = scaler.fit_transform(features)

# Create ridge regression with an alpha value
regression = Ridge(alpha=0.5)

# Fit the ridge regression
model = regression.fit(features_standardized, target)

In [21]:
model.coef_

array([ 0.82960595,  0.11878426, -0.26546186,  0.3056104 , -0.00449153,
       -0.03932802, -0.89957593, -0.87022841])

**What L2 Regularization Does (Ridge Regression)**
- **Adds penalty for large weights**  
  \\( \Rightarrow \\) Encourages model coefficients (\\( \beta_j \\)) to be small.

- **Reduces model complexity**  
  \\( \Rightarrow \\) Helps prevent overfitting by shrinking the coefficients of less important features.

- **Keeps all features**  
  \\( \Rightarrow \\) Unlike L1 (Lasso), L2 doesn’t set coefficients exactly to zero.

- **Smooths the solution**  
  \\( \Rightarrow \\) More stable when features are correlated (multicollinearity).

- **Controlled by** \\( \alpha \\)  
  \\( \Rightarrow \\) Higher \\( \alpha \\) = more penalty = simpler model.
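
The points above can be illustrated with a small sketch on synthetic data (the features, the true coefficients, and the list of alpha values are all assumptions made for the example). Two of the three features are nearly identical, which mimics multicollinearity.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

# Synthetic data with two strongly correlated features (illustrative only)
rng = np.random.default_rng(0)
x0 = rng.normal(size=200)
x1 = x0 + rng.normal(scale=0.1, size=200)  # nearly a copy of x0
x2 = rng.normal(size=200)
X = StandardScaler().fit_transform(np.column_stack([x0, x1, x2]))
y = 2.0 * x0 + 1.0 * x2 + rng.normal(scale=0.5, size=200)

# Larger alpha means a stronger L2 penalty and smaller coefficients
norms = []
for alpha in [0.01, 1.0, 100.0, 10000.0]:
    coef = Ridge(alpha=alpha).fit(X, y).coef_
    norms.append(np.linalg.norm(coef))
    print(alpha, np.round(coef, 3))
```

The coefficient norm should shrink as alpha grows, while every coefficient stays nonzero, which is the behavior that distinguishes ridge from lasso.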


In [28]:
# Create ridge regression with four alpha values
regr_cv = RidgeCV(alphas=[0.1, 1.0, 10.0, 20.0])

# Fit the ridge regression, selecting alpha by cross-validation
model_cv = regr_cv.fit(features_standardized, target)

# View coefficients
model_cv.coef_

array([ 0.82906042,  0.12003317, -0.26291284,  0.30226752, -0.00405189,
       -0.03939387, -0.88769099, -0.85823034])

In [29]:
# View alpha
model_cv.alpha_

np.float64(20.0)
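
The selected value, 20.0, is the largest candidate in the list, which can indicate that a better alpha lies outside the grid. One common response is to search a wider, log-spaced grid and check whether the winner still sits on the boundary. The sketch below uses synthetic data, and the grid range from 0.001 to 1000 is an assumption, not a recommendation for the housing data.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.preprocessing import StandardScaler

# Synthetic data (illustrative only)
rng = np.random.default_rng(0)
X = StandardScaler().fit_transform(rng.normal(size=(200, 5)))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, -1.0]) + rng.normal(size=200)

# Log-spaced candidates from 0.001 to 1000 (the range is an assumption)
alphas = np.logspace(-3, 3, 13)
model_cv = RidgeCV(alphas=alphas).fit(X, y)

# If alpha_ is the first or last candidate, consider widening the grid
on_boundary = bool(np.isclose(model_cv.alpha_, [alphas[0], alphas[-1]]).any())
print(model_cv.alpha_, on_boundary)
```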

# **Reducing Features with Lasso Regression**

Lasso regression can simplify a linear regression model by reducing the number of features it uses.

In [40]:
housing = datasets.fetch_california_housing()
features = housing.data
target = housing.target

In [41]:
# Standardize features
scaler = StandardScaler()
features_standardized = scaler.fit_transform(features)

# Create lasso regression with alpha value
regression = Lasso(alpha=0.05)

# Fit the lasso regression
model = regression.fit(features_standardized, target)

In [42]:
model.coef_

array([ 0.73654297,  0.13900648, -0.        ,  0.        ,  0.        ,
       -0.        , -0.25950684, -0.21678118])

**What L1 Regularization Does (Lasso Regression)**

- **Adds penalty for absolute weights**  
  \\( \Rightarrow \\) Encourages sparsity by penalizing large coefficients using their absolute values.

- **Performs feature selection**  
  \\( \Rightarrow \\) Can shrink some coefficients exactly to zero \\( (\beta_j = 0) \\), effectively removing irrelevant features.

- **Reduces model complexity**  
  \\( \Rightarrow \\) Helps prevent overfitting, especially in high-dimensional data.

- **Useful for sparse models**  
  \\( \Rightarrow \\) Great when only a few features are important.

- **Controlled by** \\( \alpha \\)  
  \\( \Rightarrow \\) Higher \\( \alpha \\) = stronger regularization = more zero coefficients.
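
As a rough illustration of the sparsity effect described above, the sketch below fits `Lasso` at three alpha values on synthetic data in which only the first two of six features influence the target. The data, the true coefficients (3.0 and -2.0), and the alpha values are assumptions chosen for the example.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic data: only features 0 and 1 matter (illustrative only)
rng = np.random.default_rng(0)
X = StandardScaler().fit_transform(rng.normal(size=(500, 6)))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=500)

# Count how many coefficients are exactly zero at each penalty strength
zero_counts = {}
for alpha in [0.01, 0.5, 10.0]:
    coef = Lasso(alpha=alpha).fit(X, y).coef_
    zero_counts[alpha] = int(np.sum(coef == 0))
    print(alpha, np.round(coef, 3), "zeros:", zero_counts[alpha])
```

At the largest alpha the penalty dominates and every coefficient is driven to zero, so in practice alpha is usually chosen by cross-validation rather than set by hand.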


In [43]:
# Create lasso regression with four alpha values
lasso_cv = LassoCV(alphas=[0.1, 1.0, 10.0, 20.0])

# Fit the lasso regression, selecting alpha by cross-validation
model_cv = lasso_cv.fit(features_standardized, target)

# View coefficients
model_cv.coef_

array([ 0.70571337,  0.10601099, -0.        , -0.        , -0.        ,
       -0.        , -0.01121267, -0.        ])

In [44]:
# View alpha
model_cv.alpha_

np.float64(0.1)
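
Here the selected alpha, 0.1, is the smallest candidate, so a smaller value might do better still. When no `alphas` argument is given, `LassoCV` builds its own grid along the regularization path, which avoids guessing a range by hand. The sketch below shows this on synthetic data (the data and the choice of `cv=5` are assumptions for the example).

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

# Synthetic data: only features 0 and 1 matter (illustrative only)
rng = np.random.default_rng(0)
X = StandardScaler().fit_transform(rng.normal(size=(300, 6)))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=300)

# No alphas argument: LassoCV generates its own candidate grid
model_cv = LassoCV(cv=5).fit(X, y)

print(model_cv.alpha_)                                  # selected alpha
print(model_cv.alphas_.min(), model_cv.alphas_.max())   # range that was searched
print(np.round(model_cv.coef_, 3))
```

Comparing `alpha_` with the ends of `alphas_` gives a quick check on whether the selected value lies inside the searched range.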