# Visualizing Linear Regression

## Getting ready


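In addition to `plotly`, `numpy`, and `pandas`, make sure the `scipy` and `statsmodels` Python libraries are available in your Python environment. Plotly Express relies on `statsmodels` to fit OLS trend lines, and this recipe also imports from `statsmodels` directly.
You can install both libraries using the command:

```
pip install scipy statsmodels
```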

For this recipe, we will create two data sets.

1. Import the `numpy` and `pandas` modules, and import the [`norm`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html) object from `scipy.stats`. `norm` represents the normal distribution and lets us draw random samples from it, which we use to build the data sets for this recipe.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm
```
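`norm()` with no arguments is the standard normal distribution, and `norm(loc=..., scale=...)` shifts and scales it. The short sketch below, which is not part of the recipe, shows the `rvs` method drawing samples. The `random_state` argument and the seed values are additions here, useful if you want the data sets to be reproducible.

```python
import numpy as np
from scipy.stats import norm

# Standard normal: mean 0, standard deviation 1
standard = norm()
# Normal with mean 20 and standard deviation 100, as used for data2
shifted = norm(loc=20, scale=100)

# Passing the same random_state makes the draws reproducible
a = standard.rvs(5, random_state=0)
b = standard.rvs(5, random_state=0)
print(np.array_equal(a, b))

# The mean and standard deviation of a large draw approach loc and scale
big = shifted.rvs(100_000, random_state=1)
print(big.mean(), big.std())
```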

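2. Create the two data sets used in this recipe: `data1`, where `y` depends linearly on `x` plus normally distributed noise, and `data2`, where `y` follows a cubic function of `x` plus much noisier normal errors.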

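```python
# data1: a linear relationship y = 2x plus Gaussian noise with standard deviation sigma
n = 400
x = np.linspace(0, 15, n)
epsilon = norm().rvs(n)  # n draws from the standard normal distribution
sigma = 2
y = 2*x + sigma*epsilon
data1 = pd.DataFrame({'x': x, 'y': y})
```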

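```python
# data2: a cubic relationship with noisier Gaussian errors that have a non-zero mean
n = 200
x = np.linspace(0, 15, n)
epsilon = norm(loc=20, scale=100).rvs(n)  # mean 20, standard deviation 100
y = 0.5*x**3 + epsilon - 10
data2 = pd.DataFrame({'x': x, 'y': y})
```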
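Written out, the two data-generating models are as follows, with the `x` values evenly spaced on [0, 15]. To keep the two noise terms apart, the draws from `norm()` are written as ε and the draws from `norm(loc=20, scale=100)` as η. Because η has mean 20, the systematic part of `data2` is 0.5x³ + 10 rather than 0.5x³ - 10, while for `data1` the true intercept is 0 and the true slope is 2.

```latex
\begin{aligned}
\text{data1:}\quad & y_i = 2x_i + \sigma\,\varepsilon_i, && \varepsilon_i \sim \mathcal{N}(0, 1),\ \sigma = 2,\ i = 1,\dots,400 \\
\text{data2:}\quad & y_i = 0.5x_i^3 + \eta_i - 10, && \eta_i \sim \mathcal{N}(20, 100^2),\ i = 1,\dots,200 \\
& \Rightarrow\ \mathbb{E}[y_i] = 0.5x_i^3 + 10 \quad \text{for data2}
\end{aligned}
```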

## How to do it

1. Import the `plotly.express` module as `px`.

```python
import plotly.express as px
```

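Select the data set to analyze. Here we use `data1`; setting `df = data2` instead runs the same diagnostics on the cubic data set.

```python
df = data1
```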

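Plot the data as a scatter plot and let Plotly Express fit an ordinary least squares (OLS) trend line by passing `trendline="ols"`:

```python
fig = px.scatter(df, x='x', y='y',
                 trendline="ols",  # fit an OLS regression line (requires statsmodels)
                 trendline_color_override="red",
                 height=600, width=800,
                 title='Scatter with OLS trend line')
fig.show()
```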
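For a single predictor, the OLS trend line has a closed form: the slope is cov(x, y)/var(x) and the intercept is mean(y) - slope * mean(x). The sketch below recomputes it with NumPy on freshly simulated data shaped like `data1` (the seed is an arbitrary assumption) and checks it against `np.polyfit`. It illustrates the computation only; Plotly itself delegates the fit to statsmodels.

```python
import numpy as np

rng = np.random.default_rng(42)  # seed chosen arbitrarily for reproducibility
n = 400
x = np.linspace(0, 15, n)
y = 2 * x + 2 * rng.standard_normal(n)  # same model as data1

# Closed-form least-squares estimates for y = intercept + slope * x
x_mean, y_mean = x.mean(), y.mean()
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean

# np.polyfit with deg=1 returns [slope, intercept]
slope_np, intercept_np = np.polyfit(x, y, deg=1)
print(f"slope={slope:.4f}, intercept={intercept:.4f}")
```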

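Retrieve the fitted model. `px.get_trendline_results` returns a data frame with one row per trend line, and its `px_fit_results` column holds the statsmodels results objects:

```python
results_table = px.get_trendline_results(fig)
results = results_table['px_fit_results'].iloc[0]  # the only trend line in this figure
```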

Display the regression summary:

```python
results.summary()
```

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Dep. Variable: | y | R-squared: | 0.951 |
| Model: | OLS | Adj. R-squared: | 0.951 |
| Method: | Least Squares | F-statistic: | 7772.0 |
| Date: | Sat, 05 Oct 2024 | Prob (F-statistic): | 2.87e-263 |
| Time: | 13:36:29 | Log-Likelihood: | -837.06 |
| No. Observations: | 400 | AIC: | 1678.0 |
| Df Residuals: | 398 | BIC: | 1686.0 |
| Df Model: | 1 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | -0.0351 | 0.196 | -0.179 | 0.858 | -0.421 | 0.351 |
| x1 | 1.9968 | 0.023 | 88.158 | 0.000 | 1.952 | 2.041 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Omnibus: | 3.657 | Durbin-Watson: | 1.867 |
| Prob(Omnibus): | 0.161 | Jarque-Bera (JB): | 3.611 |
| Skew: | 0.194 | Prob(JB): | 0.164 |
| Kurtosis: | 2.744 | Cond. No. | 17.5 |
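Most of the numbers in this summary follow from a few matrix formulas. The sketch below simulates data like `data1` (the seed is an assumption) and computes the coefficients, their standard errors, the t statistics, R-squared, and the F-statistic with NumPy. With a single predictor, the F-statistic equals the square of the slope's t statistic, which agrees with the table (88.158² ≈ 7772). Note also that the true values used to generate `data1`, intercept 0 and slope 2, lie inside the reported 95% intervals.

```python
import numpy as np

rng = np.random.default_rng(7)  # arbitrary seed
n = 400
x = np.linspace(0, 15, n)
y = 2 * x + 2 * rng.standard_normal(n)

# Design matrix with a constant column, matching the "const" and "x1" rows
X = np.column_stack([np.ones(n), x])
p = X.shape[1]

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Residual variance with n - p degrees of freedom ("Df Residuals")
s2 = resid @ resid / (n - p)
cov_beta = s2 * np.linalg.inv(X.T @ X)
std_err = np.sqrt(np.diag(cov_beta))
t_stats = beta / std_err

# R-squared and the overall F-statistic (one predictor, so Df Model = 1)
ss_res = resid @ resid
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
f_stat = (r_squared / (p - 1)) / ((1 - r_squared) / (n - p))

print("coef:", beta.round(4))
print("std err:", std_err.round(3))
print("t:", t_stats.round(3))
print(f"R-squared: {r_squared:.3f}, F-statistic: {f_stat:.1f}")
```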


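Check the type of the fitted model:

```python
type(results)
```

The output, `statsmodels.regression.linear_model.RegressionResultsWrapper`, shows that `results` is a regular statsmodels results object, so its residuals, fitted values, and influence measures are all available.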

Extract the residuals and the fitted values from the model:

```python
residuals = results.resid
fitted = results.fittedvalues
```
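Residuals are the observed values minus the fitted values. Because the model includes an intercept, OLS residuals sum to zero and are orthogonal to the fitted values. As a consequence, an OLS trend line fitted to residuals against fitted values is always flat at zero, so any pattern has to be judged from the scatter itself. The NumPy sketch below checks these properties on simulated data (seed assumed).

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed
n = 400
x = np.linspace(0, 15, n)
y = 2 * x + 2 * rng.standard_normal(n)

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
residuals = y - fitted

# With an intercept in the model, both quantities are zero up to rounding error
print(residuals.sum())
print(residuals @ fitted)

# Consequently an OLS line through (fitted, residuals) has zero slope and intercept
slope, intercept = np.polyfit(fitted, residuals, deg=1)
print(slope, intercept)
```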

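Plot the residuals against the fitted values. For a well-specified model, the points should scatter randomly around zero with no visible pattern:

```python
fig = px.scatter(x=fitted, y=residuals,
                 trendline="ols",
                 trendline_color_override="red",
                 height=600, width=800,
                 title='Residuals vs Fitted Plot')
fig.update_layout(xaxis_title="Fitted values", yaxis_title="Residuals")
fig.show()
```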
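The recipe plots `data1`, for which a linear model is correct. `data2` is cubic, so a straight-line fit leaves systematic structure in the residuals: because 0.5x³ is convex on [0, 15], the residuals are positive at both ends of the fitted range and negative in the middle, which shows up as a U shape in the residuals vs fitted plot. The NumPy sketch below (seed assumed) fits a line to data shaped like `data2` and compares the mean residual across thirds of the fitted range.

```python
import numpy as np

rng = np.random.default_rng(11)  # arbitrary seed
n = 200
x = np.linspace(0, 15, n)
y = 0.5 * x**3 + rng.normal(loc=20, scale=100, size=n) - 10  # same model as data2

# Fit a straight line, as trendline="ols" would
slope, intercept = np.polyfit(x, y, deg=1)
fitted = intercept + slope * x
residuals = y - fitted

# x is sorted and the slope is positive, so splitting by position splits by fitted value
low, middle, high = np.array_split(residuals, 3)
print(f"mean residual, low fitted values:  {low.mean():8.1f}")
print(f"mean residual, middle:             {middle.mean():8.1f}")
print(f"mean residual, high fitted values: {high.mean():8.1f}")
```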

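Compute the influence measures used by the remaining diagnostic plots: the internally studentized (standardized) residuals, the leverage of each observation, and Cook's distance.

```python
influence = results.get_influence()
# Internally studentized residuals: each residual divided by its estimated standard deviation
residual_norm = influence.resid_studentized_internal
# Leverage: the diagonal of the hat matrix
leverage = influence.hat_matrix_diag
# cooks_distance returns (distances, p-values); keep the distances
cooks_distance = influence.cooks_distance[0]

# Number of fitted parameters and of residuals
nparams = len(results.params)
nresids = len(residual_norm)

# Square root of the absolute standardized residuals, for the scale-location plot
residual_norm_abs_sqrt = np.sqrt(np.abs(residual_norm))
```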
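These influence measures have closed forms. For a design matrix X with n rows and p columns, the leverages h_i are the diagonal of the hat matrix H = X(XᵀX)⁻¹Xᵀ, the internally studentized residuals are r_i = e_i / (s·sqrt(1 - h_i)) with s² the residual variance, and Cook's distance is D_i = r_i² h_i / (p(1 - h_i)). The NumPy sketch below computes them on simulated data (seed assumed); `p` plays the role of `nparams` above. The leverages always sum to p, which makes a handy check.

```python
import numpy as np

rng = np.random.default_rng(5)  # arbitrary seed
n = 400
x = np.linspace(0, 15, n)
y = 2 * x + 2 * rng.standard_normal(n)

X = np.column_stack([np.ones(n), x])
n_obs, p = X.shape

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta
s2 = e @ e / (n_obs - p)

# Leverage: diagonal of H = X (X^T X)^{-1} X^T, computed without forming the n x n matrix
XtX_inv = np.linalg.inv(X.T @ X)
leverage = np.einsum("ij,jk,ik->i", X, XtX_inv, X)

# Internally studentized (standardized) residuals
r = e / np.sqrt(s2 * (1 - leverage))

# Cook's distance for each observation
cooks_d = r**2 * leverage / (p * (1 - leverage))

print(leverage.sum())  # equals p up to rounding
print(np.abs(r).max(), cooks_d.max())
```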

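Draw the scale-location plot, which shows the square root of the absolute standardized residuals against the fitted values. A roughly flat trend line suggests that the error variance is constant:

```python
fig = px.scatter(x=fitted, y=residual_norm_abs_sqrt,
                 trendline="ols",
                 trendline_color_override="red",
                 height=600, width=800,
                 title='Scale-Location Plot')
# \text{} keeps the spaces in the LaTeX axis label
fig.update_layout(xaxis_title="Fitted values",
                  yaxis_title=r'$\sqrt{|\text{Standardized Residuals}|}$')
fig.show()
```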
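The scale-location plot targets non-constant variance. As an illustration that goes beyond the recipe, the NumPy sketch below simulates one data set with constant noise like `data1` and a hypothetical heteroscedastic variant whose noise standard deviation grows with x, then measures the correlation between the fitted values and sqrt(|standardized residuals|). Only the heteroscedastic data should show a clear upward trend. The helper `sqrt_abs_standardized_resid` and the seed are illustrative assumptions.

```python
import numpy as np

def sqrt_abs_standardized_resid(x, y):
    """Fit y = b0 + b1*x by OLS; return fitted values and sqrt(|standardized residuals|)."""
    X = np.column_stack([np.ones_like(x), x])
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    e = y - fitted
    s2 = e @ e / (n - p)
    leverage = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)
    r = e / np.sqrt(s2 * (1 - leverage))
    return fitted, np.sqrt(np.abs(r))

rng = np.random.default_rng(2)  # arbitrary seed
x = np.linspace(0, 15, 400)
eps = rng.standard_normal(400)

y_const = 2 * x + 2 * eps    # constant variance, like data1
y_growing = 2 * x + x * eps  # hypothetical: noise standard deviation proportional to x

corrs = {}
for label, y in [("constant", y_const), ("growing", y_growing)]:
    fitted, scale = sqrt_abs_standardized_resid(x, y)
    corrs[label] = np.corrcoef(fitted, scale)[0, 1]
    print(label, round(corrs[label], 2))
```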

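Draw the residuals vs leverage plot, which helps identify observations that combine a large standardized residual with high leverage:

```python
fig = px.scatter(x=leverage, y=residual_norm,
                 trendline="ols",
                 trendline_color_override="red",
                 height=600, width=800,
                 title='Residuals vs Leverage Plot')
fig.update_layout(xaxis_title="Leverage", yaxis_title="Standardized Residuals")
fig.show()
```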
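A common addition to the residuals vs leverage plot is a set of Cook's distance contours, for example at D = 0.5 and D = 1; points beyond them are considered influential. Rearranging D = r² h / (p(1 - h)) gives the standardized residual on a contour as r = ±sqrt(D p (1 - h)/h). The sketch below computes these curves with NumPy. The helper name `cooks_contour`, the leverage grid, and the contour levels are illustrative choices, and `p = 2` matches `nparams` for a line with an intercept.

```python
import numpy as np

def cooks_contour(leverage, level, p):
    """Positive standardized residual at which Cook's distance equals `level`."""
    leverage = np.asarray(leverage, dtype=float)
    return np.sqrt(level * p * (1 - leverage) / leverage)

p = 2  # number of model parameters (const and x1)
h = np.linspace(0.001, 0.05, 100)  # leverage grid; adjust it to the range of your data

for level in (0.5, 1.0):
    r_upper = cooks_contour(h, level, p)
    # Recomputing Cook's distance on the contour should give back the level
    d = r_upper**2 * h / (p * (1 - h))
    print(level, np.allclose(d, level))
```

To overlay a contour on the Plotly figure, one option is `fig.add_scatter(x=h, y=r_upper, mode="lines")`, repeated with `-r_upper` for the lower branch.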

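Finally, build a normal Q-Q plot of the standardized residuals. Use the `ProbPlot` class from statsmodels to compute the theoretical and sample quantiles:

```python
from statsmodels.graphics.gofplots import ProbPlot

# Compare the standardized residuals with a normal distribution (ProbPlot's default)
QQ = ProbPlot(residual_norm)
theoretical_quantiles = QQ.theoretical_quantiles
sample_quantiles = QQ.sample_quantiles
```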
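Conceptually, a normal Q-Q plot pairs the sorted residuals with quantiles of the standard normal distribution at evenly spaced probabilities. The sketch below builds both with NumPy and SciPy using the plotting positions i/(n + 1), i = 1, ..., n. Treat that choice as an assumption: `ProbPlot` lets you configure the plotting positions through its `a` parameter, and other conventions such as (i - 0.5)/n give very similar values. The simulated residuals and the seed are stand-ins.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(9)  # arbitrary seed
residuals = rng.standard_normal(400)  # stand-in for the standardized residuals

n = residuals.size
positions = np.arange(1, n + 1) / (n + 1)  # plotting positions i/(n+1)
theoretical = norm.ppf(positions)          # standard normal quantiles
sample = np.sort(residuals)                # sample quantiles

# For normally distributed residuals the points lie close to the line y = x
print(np.corrcoef(theoretical, sample)[0, 1])
```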

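Plot the sample quantiles against the theoretical quantiles, together with the reference line y = x. If the residuals are normally distributed, the points should lie close to this line:

```python
fig = px.scatter(x=theoretical_quantiles, y=sample_quantiles,
                 height=600, width=800,
                 title='Normal QQ Plot')
# Add the reference line y = x in red
fig.add_traces(px.line(x=theoretical_quantiles, y=theoretical_quantiles,
                       color_discrete_sequence=["red"]).data)
fig.update_layout(xaxis_title="Theoretical Quantiles", yaxis_title="Standardized Residuals")
fig.show()
```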
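The Q-Q plot is a visual check. The Jarque-Bera statistic in the summary table tests the same normality assumption numerically, using the skewness and kurtosis of the residuals, and SciPy provides it as `scipy.stats.jarque_bera`. The sketch below applies it to a simulated normal sample and to a clearly skewed exponential sample (both stand-ins, seed assumed); the skewed sample should produce a tiny p-value.

```python
import numpy as np
from scipy.stats import jarque_bera

rng = np.random.default_rng(4)  # arbitrary seed
normal_sample = rng.standard_normal(400)
skewed_sample = rng.exponential(scale=1.0, size=400)

# jarque_bera returns the test statistic and its p-value
jb_normal, p_normal = jarque_bera(normal_sample)
jb_skewed, p_skewed = jarque_bera(skewed_sample)

print(f"normal sample: JB={jb_normal:.2f}, p={p_normal:.3f}")
print(f"skewed sample: JB={jb_skewed:.2f}, p={p_skewed:.2e}")
```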