## DSI-06 Homework 1: ANSWERS
From Chapter 3, found on page 129 of ISLP

*This question involves the use of simple linear regression on the Auto data set*

In [None]:
# Import standard libraries
import numpy as np
import pandas as pd
from matplotlib.pyplot import subplots
import statsmodels.api as sm

# Import specific objects
from textwrap import wrap  # to avoid overlapping labels in plots
from ISLP import load_data
from ISLP.models import (ModelSpec as MS,
                         summarize,
                         poly)

# Load dataset
Auto = load_data('Auto')
Auto

_a)_	Use the `sm.OLS()` function to perform a simple linear regression with `mpg` as the response and `horsepower` as the predictor. Use the `summarize()` function to print the results. Comment on the output.

(i) Is there a relationship between the predictor and the response?

*Yes, there is a relationship between the predictor and the response. We can reject the null hypothesis that the coefficient on `horsepower` is zero: the F-statistic is much larger than 1, and its p-value is effectively zero (far below 0.05, though not literally zero).*
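With only one predictor, the F-test and the t-test on the slope test the same hypothesis, and F equals the square of the slope's t-statistic. The NumPy sketch below checks this identity by fitting a simple regression by hand. It uses synthetic data (invented stand-ins for `horsepower` and `mpg`, not the Auto data set), so the specific numbers are illustrative only.

```python
import numpy as np

# Synthetic stand-in for (horsepower, mpg); NOT the Auto data set
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(50, 200, n)
y = 40 - 0.16 * x + rng.normal(0, 5, n)

# Least squares estimates for y = b0 + b1 * x
x_bar, y_bar = x.mean(), y.mean()
sxx = np.sum((x - x_bar) ** 2)
b1 = np.sum((x - x_bar) * (y - y_bar)) / sxx
b0 = y_bar - b1 * x_bar

# Residual standard error with n - 2 degrees of freedom
rss = np.sum((y - (b0 + b1 * x)) ** 2)
s = np.sqrt(rss / (n - 2))

# t-statistic for H0: beta_1 = 0
t_stat = b1 / (s / np.sqrt(sxx))

# F-statistic comparing the fitted model with the intercept-only model
tss = np.sum((y - y_bar) ** 2)
f_stat = ((tss - rss) / 1) / (rss / (n - 2))

print(f"t = {t_stat:.2f}, t^2 = {t_stat ** 2:.2f}, F = {f_stat:.2f}")
```

A large |t| (equivalently, a large F) is what drives the p-value toward zero.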

(ii) How strong is the relationship between the predictor and the response?

*Since the R-squared value is 0.606, about 60.6% of the variance in mpg is explained by a linear function of horsepower, which indicates a moderately strong relationship.*
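R-squared is the fraction of the total variation in the response that the fitted line explains, R² = 1 − RSS/TSS, and in simple linear regression it also equals the squared correlation between predictor and response. A minimal NumPy check on synthetic data (again not the Auto data):

```python
import numpy as np

# Synthetic stand-in data; NOT the Auto data set
rng = np.random.default_rng(1)
n = 200
x = rng.uniform(50, 200, n)
y = 40 - 0.16 * x + rng.normal(0, 5, n)

# np.polyfit with deg=1 returns [slope, intercept]
b1, b0 = np.polyfit(x, y, deg=1)
fitted = b0 + b1 * x

rss = np.sum((y - fitted) ** 2)    # variation left unexplained by the line
tss = np.sum((y - y.mean()) ** 2)  # total variation around the mean
r_squared = 1 - rss / tss

# With one predictor, R^2 equals the squared sample correlation
r = np.corrcoef(x, y)[0, 1]
print(f"R^2 = {r_squared:.3f}, corr^2 = {r ** 2:.3f}")
```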

(iii) Is the relationship between the predictor and the response positive or negative?

*The relationship is negative because the coefficient on horsepower is -0.1578: each additional unit of horsepower is associated with an average decrease of about 0.16 mpg.*
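Because the fitted model is a straight line, the slope translates directly into a fixed change in predicted mpg per change in horsepower. The small sketch below uses the rounded coefficients reported in the regression output (39.9359 and -0.1578); `predicted_mpg` is a hypothetical helper written only for this illustration.

```python
# Rounded coefficients as reported in the regression output
intercept = 39.9359
slope = -0.1578

def predicted_mpg(horsepower):
    """Point prediction from the fitted least squares line (hypothetical helper)."""
    return intercept + slope * horsepower

# A 10-horsepower increase changes the prediction by 10 * slope, whatever the starting value
change = predicted_mpg(110) - predicted_mpg(100)
print(f"{change:.3f}")  # → -1.578
```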


(iv) What is the predicted mpg associated with a horsepower of 98? What are the associated 95% confidence and prediction intervals?

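*The predicted `mpg` is 24.47. The 95% confidence interval for the mean mpg of cars with a horsepower of 98 is (23.97, 24.96), and the 95% prediction interval for a single such car is (14.81, 34.12). The interval (-0.17, -0.15) is the confidence interval for the slope coefficient, not for the predicted mpg.*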
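The prediction interval is much wider than the confidence interval because it adds the irreducible error of an individual car to the uncertainty in the estimated mean. The sketch below computes both intervals from the textbook formulas for simple linear regression, using SciPy for the t quantile. `mean_and_prediction_intervals` is a hypothetical helper, and the data are synthetic rather than the Auto data set.

```python
import numpy as np
from scipy import stats

def mean_and_prediction_intervals(x, y, x0, level=0.95):
    """Return (fit, CI for the mean response, PI for a new observation) at x0."""
    n = len(x)
    x_bar = x.mean()
    sxx = np.sum((x - x_bar) ** 2)
    b1 = np.sum((x - x_bar) * (y - y.mean())) / sxx
    b0 = y.mean() - b1 * x_bar
    s = np.sqrt(np.sum((y - b0 - b1 * x) ** 2) / (n - 2))  # residual standard error

    fit = b0 + b1 * x0
    h0 = 1 / n + (x0 - x_bar) ** 2 / sxx           # leverage of the new point
    t_crit = stats.t.ppf(0.5 + level / 2, df=n - 2)

    se_mean = s * np.sqrt(h0)       # uncertainty in the estimated mean only
    se_pred = s * np.sqrt(1 + h0)   # adds the error term of one new observation
    ci = (fit - t_crit * se_mean, fit + t_crit * se_mean)
    pi = (fit - t_crit * se_pred, fit + t_crit * se_pred)
    return fit, ci, pi

# Synthetic stand-in data; NOT the Auto data set
rng = np.random.default_rng(2)
x = rng.uniform(50, 200, 392)
y = 40 - 0.16 * x + rng.normal(0, 5, 392)
fit, ci, pi = mean_and_prediction_intervals(x, y, x0=98)
print(f"fit={fit:.2f}, CI=({ci[0]:.2f}, {ci[1]:.2f}), PI=({pi[0]:.2f}, {pi[1]:.2f})")
```

Both intervals are centred on the same fitted value; only the standard error differs.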

In [None]:
# Adding a constant term to the predictor variable
X = sm.add_constant(Auto['horsepower'])
y = Auto['mpg']

# Fit the OLS model
model = sm.OLS(y, X).fit()

# Print summary
print(model.summary())

In [None]:
# Summarize results
summarize(model)

In [None]:
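# Coefficients taken from the fitted model rather than typed in by hand
intercept = model.params['const']
slope = model.params['horsepower']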
horsepower_value = 98

# Calculate predicted MPG
predicted_mpg = intercept + (slope * horsepower_value)
print(f"Predicted MPG: {predicted_mpg:.2f}")

In [None]:
# Intervals at horsepower = 98; the new design row must match X's columns (const, horsepower)
new_X = pd.DataFrame({'const': [1.0], 'horsepower': [98]})
frame = model.get_prediction(new_X).summary_frame(alpha=0.05)

# mean_ci_* is the confidence interval for the mean response;
# obs_ci_* is the prediction interval for a single new observation
ci = frame[['mean_ci_lower', 'mean_ci_upper']].iloc[0]
pi = frame[['obs_ci_lower', 'obs_ci_upper']].iloc[0]

print(f"Predicted MPG: {frame['mean'].iloc[0]:.2f}")
print(f"95% Confidence Interval: ({ci.iloc[0]:.2f}, {ci.iloc[1]:.2f})")
print(f"95% Prediction Interval: ({pi.iloc[0]:.2f}, {pi.iloc[1]:.2f})")

_b)_	Plot the response and the predictor in a new set of axes `ax`. Use the `ax.axline()` method or the `abline()` function defined in the lab to display the least squares regression line.

In [None]:
# Plotting the response and predictor
fig, ax = subplots()
ax.scatter(Auto['horsepower'], Auto['mpg'], alpha=0.7, label='Data points')
ax.set_xlabel('Horsepower')
ax.set_ylabel('MPG')

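# Adding the least squares line; it always passes through (mean horsepower, mean mpg)
ax.axline((Auto['horsepower'].mean(), Auto['mpg'].mean()),
          slope=model.params['horsepower'], color='red', label='Least squares line')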
ax.legend()

_c)_	Produce some diagnostic plots of the least squares regression fit as described in the lab. Comment on any problems you see with the fit.

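- The Residuals vs Fitted plot shows a clear U-shaped pattern, which suggests that the relationship between `mpg` and `horsepower` is non-linear and that a straight line does not capture it fully.
- The QQ plot suggests that the normality assumption is reasonable, since the residuals lie close to a straight line.
- The Scale-Location plot shows the square root of the absolute standardized residuals against the fitted values; a trend in the spread of these points would indicate non-constant error variance (heteroscedasticity).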
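The quantities behind the Scale-Location and leverage plots can be computed directly from the design matrix. The sketch below, on synthetic data rather than the Auto data set, computes leverages (the diagonal of the hat matrix), internally studentized (standardized) residuals, and the Scale-Location y-values. As we understand it, `model.get_influence().resid_studentized_internal` in statsmodels uses the same standardized-residual definition, but treat that correspondence as an assumption.

```python
import numpy as np

# Synthetic stand-in data; NOT the Auto data set
rng = np.random.default_rng(3)
n = 100
x = rng.uniform(50, 200, n)
y = 40 - 0.16 * x + rng.normal(0, 5, n)

# Least squares fit using the design matrix [1, x]
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
p = X.shape[1]
s = np.sqrt(resid @ resid / (n - p))  # residual standard error

# Leverage: diagonal of the hat matrix H = X (X'X)^{-1} X'
leverage = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)

# Standardized residuals divide each residual by its own estimated standard deviation
std_resid = resid / (s * np.sqrt(1 - leverage))

# y-values for the Scale-Location plot
scale_location = np.sqrt(np.abs(std_resid))
```

In simple regression the leverages sum to 2 (the number of coefficients), and each one equals 1/n plus the point's squared distance from the mean of x divided by the sum of squared deviations.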

In [None]:
# Diagnostic plots
fig, ax = subplots(2, 2, figsize=(10, 8))

# Residuals vs Fitted Values
ax[0, 0].scatter(model.fittedvalues, model.resid, alpha=0.7)
ax[0, 0].set_xlabel('Fitted Values')
ax[0, 0].set_ylabel('Residuals')
ax[0, 0].set_title('Residuals vs Fitted Values')

# QQ Plot
sm.qqplot(model.resid, line='s', ax=ax[0, 1])
ax[0, 1].set_title('QQ Plot')

# Scale-Location plot
ax[1, 0].scatter(model.fittedvalues, np.sqrt(np.abs(model.get_influence().resid_studentized_internal)), alpha=0.7)
ax[1, 0].set_xlabel('Fitted Values')
ax[1, 0].set_ylabel('Square Root of Standardized Residuals')
ax[1, 0].set_title('Scale-Location Plot')

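# Residuals vs Leverage (influence plot): studentized residuals against leverage,
# with marker size scaled by Cook's distance
sm.graphics.influence_plot(model, ax=ax[1, 1], criterion="cooks")
ax[1, 1].set_title('Residuals vs Leverage')

# Prevent titles and axis labels of the four panels from overlapping
fig.tight_layout()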
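With `criterion="cooks"`, the marker sizes in the influence plot reflect Cook's distance, which combines a point's standardized residual and its leverage: D_i = r_i² / p · h_ii / (1 − h_ii). The sketch below applies this standard formula to synthetic data with one hypothetical outlier planted at index 0, so that point should receive the largest Cook's distance. Neither the data nor the outlier come from the Auto data set.

```python
import numpy as np

# Synthetic stand-in data; NOT the Auto data set
rng = np.random.default_rng(4)
n = 100
x = rng.uniform(50, 200, n)
y = 40 - 0.16 * x + rng.normal(0, 5, n)
y[0] += 50  # hypothetical outlier planted for illustration

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta
p = X.shape[1]
s = np.sqrt(resid @ resid / (n - p))

leverage = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
std_resid = resid / (s * np.sqrt(1 - leverage))

# Cook's distance: large when a point has a big residual, high leverage, or both
cooks_d = std_resid ** 2 / p * leverage / (1 - leverage)
print(int(np.argmax(cooks_d)))
```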