# 1. Creating Features


### a) Basics

In [2]:
import numpy as np

# Create X from the radio column's values
X = sales_df["radio"].values

# Create y from the sales column's values
y = sales_df["sales"].values

# Reshape X
X = X.reshape(-1,1)

# Check the shape of the features and targets
print(X.shape, y.shape)
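
The reshape step above is needed because scikit-learn estimators expect a 2-D feature array of shape `(n_samples, n_features)`. A minimal sketch, assuming a made-up 1-D array in place of `sales_df["radio"].values` (the names `radio` and `X_demo` are hypothetical):

```python
import numpy as np

# Hypothetical stand-in for sales_df["radio"].values: a 1-D array of 4 values
radio = np.array([10.0, 20.0, 30.0, 40.0])
print(radio.shape)   # (4,)

# -1 lets NumPy infer the number of rows; 1 fixes a single feature column
X_demo = radio.reshape(-1, 1)
print(X_demo.shape)  # (4, 1)
```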

### b) Splitting Test Data

In [None]:
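# Import train_test_split
from sklearn.model_selection import train_test_split

# Create X and y arrays
X = sales_df.drop("sales", axis=1).values
y = sales_df["sales"].values

# Hold out 30% of the rows as a test set; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)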
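
To see what the split produces, here is a small sketch on synthetic data (the arrays `X_demo` and `y_demo` are made up for illustration and are not the sales data). It checks the resulting sizes and that each feature row stays paired with its target:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 10 samples, 2 features; row i is [2*i, 2*i + 1], target is i
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)

# 30% of 10 samples go to the test set, the rest to the training set
print(X_tr.shape, X_te.shape)

# Rows are shuffled, but features and targets stay aligned
print(np.array_equal(X_tr[:, 0], 2 * y_tr))
```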


# 2. Linear Regression Functions

### a) Basics

In [None]:
# Import LinearRegression
from sklearn.linear_model import LinearRegression 

# Create the model
reg = LinearRegression()

# Fit the model to the data
reg.fit(X,y)

# Make predictions
predictions = reg.predict(X)

print(predictions[:5])

### b) Using Split Data

In [None]:
# Fit the model to the data
reg.fit(X_train,y_train)

# Make predictions
y_pred = reg.predict(X_test)
print("Predictions: {}, Actual Values: {}".format(y_pred[:2], y_test[:2]))

# 3. Visualizing

In [None]:
# Import matplotlib.pyplot
import matplotlib.pyplot as plt

# Create scatter plot
plt.scatter(X, y, color="blue")

# Create line plot
plt.plot(X, predictions, color="red")
plt.xlabel("Radio Expenditure ($)")
plt.ylabel("Sales ($)")

# Display the plot
plt.show()

# 4. Testing Performance

### a) R-squared and mean_squared_error

In [None]:
# Import mean_squared_error
from sklearn.metrics import mean_squared_error

# Compute R-squared
r_squared = reg.score(X_test, y_test)

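# Compute RMSE as the square root of the MSE
# (the squared=False argument is deprecated in newer scikit-learn releases)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))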

# Print the metrics
print("R^2: {}".format(r_squared))
print("RMSE: {}".format(rmse))

### mean_squared_error vs mean_absolute_error:

In short, if there are many outliers, consider using the Mean Absolute Error (MAE, also called the Average Absolute Deviation). RMSE is more sensitive to outliers than MAE because it squares each error before averaging. But when outliers are exponentially rare (as in a bell-shaped curve), the RMSE performs very well and is generally preferred.

Both the RMSE and the MAE are ways to measure the distance between two vectors: the vector of predictions and the vector of target values. MAE corresponds to the l1 norm (Manhattan norm), while RMSE corresponds to the l2 norm (Euclidean norm). The higher the norm index, the more it focuses on large values and neglects small ones.
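
The difference in outlier sensitivity can be illustrated with a small sketch. The target and prediction values below are invented for illustration only:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Made-up targets and two sets of made-up predictions (illustration only)
y_true = np.array([10.0, 12.0, 14.0, 16.0, 18.0])
y_close = y_true + 1.0               # every prediction is off by 1
y_outlier = y_close.copy()
y_outlier[-1] = y_true[-1] + 21.0    # one prediction is off by 21

mae_close = mean_absolute_error(y_true, y_close)
rmse_close = np.sqrt(mean_squared_error(y_true, y_close))
mae_outlier = mean_absolute_error(y_true, y_outlier)
rmse_outlier = np.sqrt(mean_squared_error(y_true, y_outlier))

print(f"No outlier:  MAE={mae_close:.2f}, RMSE={rmse_close:.2f}")
print(f"One outlier: MAE={mae_outlier:.2f}, RMSE={rmse_outlier:.2f}")
```

With a single large error, the MAE rises from 1 to 5, while the RMSE rises from 1 to about 9.43, since the squared term lets that one error dominate.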

### b) Cross Validation
Basics:
![Screenshot](Cross-validation.png)

- 5 folds -> 5-fold CV
- k folds -> k-fold CV
- ...
- More folds, more computationally expensive
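
As a rough sketch of what the folds look like, the snippet below splits ten synthetic samples into 5 folds and prints the row indices used for training and testing in each one (`X_demo` and `kf_demo` are hypothetical names; the data is made up):

```python
import numpy as np
from sklearn.model_selection import KFold

# Ten synthetic samples with a single feature
X_demo = np.arange(10).reshape(-1, 1)

kf_demo = KFold(n_splits=5, shuffle=True, random_state=0)

test_indices = []
for fold, (train_idx, test_idx) in enumerate(kf_demo.split(X_demo), start=1):
    # Each fold holds out a different subset of rows for testing
    print(f"Fold {fold}: train={train_idx}, test={test_idx}")
    test_indices.extend(test_idx.tolist())

# Every sample is used as test data exactly once across the folds
print(sorted(test_indices))
```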

In [None]:
# Import the necessary modules
from sklearn.model_selection import cross_val_score, KFold

# Create a KFold object
kf = KFold(n_splits=6, shuffle=True, random_state=5)

reg = LinearRegression()

# Compute 6-fold cross-validation scores
cv_scores = cross_val_score(reg, X, y, cv=kf)

# Print scores
print(cv_scores)

Note: we pass a seed to the `random_state` keyword argument so that the data is split the same way on every run. This makes results repeatable downstream.

Now, we will analyze our cross-validation metrics.

In [None]:
# Print the mean of the fold scores
print(np.mean(cv_scores))

# Print the standard deviation
print(np.std(cv_scores))

# Print the 95% interval of the fold scores (2.5th and 97.5th percentiles)
print(np.quantile(cv_scores, [0.025, 0.975]))

### c) Regularization
\>> Technique for avoiding overfitting

##### Why regularize?
- Linear regression minimizes a loss function
- It chooses a coefficient a for each feature, plus an intercept b
- Large coefficients can lead to overfitting
- Regularization: penalize large coefficients

#### 1. Ridge Regression
- Loss function = Ordinary Least Squares (OLS) loss + alpha * sum of squared coefficients a
- Alpha: hyperparameter we need to choose, similar to picking k in KNN
- Alpha = 0 -> plain OLS (can lead to overfitting)
- Very large alpha -> can lead to underfitting
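
The effect of alpha can be sketched on synthetic data. The data, the "true" coefficients, and the alpha grid below are all assumptions chosen for illustration:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data: 3 features with made-up true coefficients plus noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = X_demo @ np.array([3.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=100)

coef_norms = []
for alpha in [0.1, 1.0, 10.0, 100.0, 1000.0]:
    ridge = Ridge(alpha=alpha)
    ridge.fit(X_demo, y_demo)
    # Track the overall size of the coefficient vector for each alpha
    coef_norms.append(np.linalg.norm(ridge.coef_))
    print(f"alpha={alpha}: coefficients={ridge.coef_.round(3)}")
```

As alpha grows, the coefficients shrink toward 0, which is the mechanism that trades overfitting for underfitting.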

#### 2. Lasso Regression
- Loss function = OLS loss + alpha * sum of absolute values of coefficients a
- Can be used to select important features in a dataset
- Shrinks the coefficients of less important features to 0
- Features whose coefficients are not shrunk to 0 are selected by lasso
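
A minimal sketch of lasso-based feature selection, assuming synthetic data in which only the first two of four features influence the target (the data and the choice `alpha=0.3` are illustrative assumptions, not tuned values):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data: features 0 and 1 drive the target, features 2 and 3 are noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
y_demo = 5.0 * X_demo[:, 0] - 3.0 * X_demo[:, 1] + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.3)
lasso.fit(X_demo, y_demo)

# Coefficients of unimportant features are shrunk to exactly 0
print(lasso.coef_)

# Indices of the features that lasso kept (non-zero coefficients)
selected = np.flatnonzero(lasso.coef_)
print(selected)
```

On real data, alpha would normally be chosen by cross-validation rather than fixed by hand.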