3) Implement a simple linear regression model for the sales.csv dataset

Linear regression is a supervised machine learning algorithm. In supervised learning, we are
given a dataset for which we already know what the correct output should look like, and we
assume that there is a relationship between the input and the output.
Supervised learning broadly covers two types of problems:

1. Regression problems
2. Classification problems

In simple words, regression problems try to predict results within a continuous output, i.e.,
they try to map input variables to some continuous function. A problem is a regression problem
when the target variable we are trying to predict is continuous. Classification problems, by
contrast, predict a discrete label.

Simple Linear Regression: This is the simplest form of linear regression, and it involves
only one independent variable and one dependent variable. The equation for simple linear
regression is: Y = β0 + β1·X, where Y is the dependent variable, X is the independent
variable, β0 is the intercept, and β1 is the slope.
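
As a rough illustration of how β0 and β1 can be estimated, the sketch below applies the ordinary least squares formulas to a small toy dataset. The data and the helper name `fit_simple_linear_regression` are invented for this example and are not part of sales.csv or of the exercise code.

```python
import numpy as np

# Hypothetical toy data (not from sales.csv). The points lie exactly on
# the line y = 1 + 2x, so the fit should recover beta0 = 1 and beta1 = 2.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])


def fit_simple_linear_regression(x, y):
    """Return (beta0, beta1) using the ordinary least squares formulas."""
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: co-variation of x and y divided by the variation of x.
    beta1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: chosen so the line passes through the point of means.
    beta0 = y_mean - beta1 * x_mean
    return beta0, beta1


beta0, beta1 = fit_simple_linear_regression(x, y)
print(f"beta0 = {beta0:.2f}, beta1 = {beta1:.2f}")  # beta0 = 1.00, beta1 = 2.00
```

On real data the points will not fall exactly on a line, and the same formulas then give the line with the smallest sum of squared errors.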

Scatter Plot: A scatter plot is a type of plot that displays values for two variables as points
on a Cartesian plane. Each point represents a single observation, with the values of the two
variables plotted along the x-axis and y-axis. Scatter plots are useful for visually inspecting
the relationship between two variables and identifying patterns or trends.

Regression Line: In statistics and machine learning, a regression line is the straight line that
best fits the data points in a scatter plot. It represents the relationship between the
independent variable (x-axis) and the dependent variable (y-axis) in a linear regression
model. The regression line is typically expressed as y = mx + c, where y is the predicted
value of the dependent variable, x is the independent variable, m is the slope of the line
(the rate of change of y with respect to x), and c is the y-intercept (the value of y when
x is 0). This is the same equation as the simple linear regression equation, with m = β1 and
c = β0.
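
To make "best fits" concrete, the sketch below treats it as least squares, which is an assumption of this example: the regression line is the one with the smallest sum of squared errors. It fits y = mx + c with `np.polyfit` on invented observations and checks that nudging the slope or the intercept only increases the error. The data and the helper `sum_squared_error` are hypothetical.

```python
import numpy as np

# Hypothetical observations, invented for illustration only.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.0])

# A degree-1 polynomial fit returns the least-squares slope and intercept.
m, c = np.polyfit(x, y, 1)


def sum_squared_error(slope, intercept):
    """Sum of squared vertical distances between the points and the line."""
    residuals = y - (slope * x + intercept)
    return float(np.sum(residuals ** 2))


best = sum_squared_error(m, c)
print(f"m = {m:.3f}, c = {c:.3f}, SSE = {best:.4f}")

# Any other line fits worse in the least-squares sense.
print(sum_squared_error(m + 0.1, c) > best)  # True
print(sum_squared_error(m, c + 0.1) > best)  # True
```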

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
import matplotlib.pyplot as plt

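# Load the dataset: temperature is the feature, sales is the target
data = pd.read_csv('sales.csv')
X = data['fahrenheit'].values.reshape(-1, 1)  # scikit-learn expects a 2D feature array
y = data['sales'].values

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the feature; fit the scaler on the training data only
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Fit the linear regression model
model = LinearRegression()
model.fit(X_train_scaled, y_train)

# Predict on both splits
y_pred_train = model.predict(X_train_scaled)
y_pred_test = model.predict(X_test_scaled)

# Evaluate on the held-out test set
mae = mean_absolute_error(y_test, y_pred_test)
mse = mean_squared_error(y_test, y_pred_test)
rmse = np.sqrt(mse)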

print("\nModel Evaluation Metrics:")
print(f"Mean Absolute Error: {mae:.2f}")
print(f"Mean Squared Error: {mse:.2f}")
print(f"Root Mean Squared Error: {rmse:.2f}")
print(f"R² Score: {model.score(X_test_scaled, y_test):.4f}")

plt.figure(figsize=(12, 6))

# Plot training data
plt.scatter(X_train, y_train, color='blue', label='Training Data', alpha=0.5)
# Plot test data
plt.scatter(X_test, y_test, color='green', label='Test Data', alpha=0.5)

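# Sort along axis 0 so the temperatures are in ascending order; np.sort's
# default (last) axis would leave an (n, 1) array unchanged.
X_sorted = np.sort(X, axis=0)
X_sorted_scaled = scaler.transform(X_sorted)
y_pred_sorted = model.predict(X_sorted_scaled)
plt.plot(X_sorted, y_pred_sorted, color='red', label='Regression Line')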

plt.xlabel('Temperature (Fahrenheit)')
plt.ylabel('Sales')
plt.title('Linear Regression: Sales vs Temperature')
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

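print("\nModel Coefficients:")
# The model was fit on the standardized temperature, so the slope is the change
# in sales per one standard deviation of temperature, not per degree, and the
# intercept is the predicted sales at the mean training temperature.
print(f"Slope (per std. dev. of temperature): {model.coef_[0]:.2f}")
print(f"Intercept (at mean temperature): {model.intercept_:.2f}")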
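
Because the model above is trained on a standardized feature, its coefficients are not in degrees Fahrenheit. The sketch below shows one way to convert them back to the original units using the scaler's `scale_` and `mean_` attributes. The data is made up to stand in for sales.csv (it follows sales = 10 + 2 × temperature exactly), and the variable names `slope_per_degree` and `intercept_at_zero` are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for sales.csv: sales = 10 + 2 * temperature, no noise.
temps = np.array([[50.0], [60.0], [70.0], [80.0], [90.0]])
sales = 10.0 + 2.0 * temps.ravel()

# Same pipeline as in the exercise: standardize, then fit.
scaler = StandardScaler()
temps_scaled = scaler.fit_transform(temps)
model = LinearRegression().fit(temps_scaled, sales)

# Undo the scaling: x_scaled = (x - mean) / scale, so
# y = intercept + coef * (x - mean) / scale.
slope_per_degree = model.coef_[0] / scaler.scale_[0]
intercept_at_zero = model.intercept_ - slope_per_degree * scaler.mean_[0]

print(f"Slope per degree: {slope_per_degree:.2f}")         # 2.00
print(f"Intercept at 0 degrees: {intercept_at_zero:.2f}")  # 10.00
```

For a single feature, standardizing does not change the predictions of ordinary linear regression; it only changes the units in which the slope and intercept are reported.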