Back to the [README](./README.md)

Back to the [fitting notebook](./05-fitting-df_low.ipynb)

Back to the [previous notebook](./07-linear-regression-using-statsmodels-api.ipynb)

--------------------

In [1]:
import numpy as np
from scipy.stats import linregress, t

from setup import df_low, Col

--------------------

# Linear Regression Using SciPy.Stats

In [2]:
# Set up the data
x = df_low[Col.age2]    # age**2
y = df_low[Col.charges] # charges

# Perform the regression
result = linregress(x, y)

# Quickly check the result
print(f'R^2:\t\t\t{result.rvalue**2}')
print(f'Intercept:\t\t{result.intercept}')
print(f'Intercept StdErr:\t{result.intercept_stderr}')
print(f'Slope:\t\t\t{result.slope}')
print(f'Slope StdErr:\t\t{result.stderr}')

R^2:			0.9581524960085099
Intercept:		1160.1072156361488
Intercept StdErr:	47.14313657362188
Slope:			3.363981303057231
Slope StdErr:		0.022654677900924087


Again, these are essentially the same parameters the other packages
reported (apart from the new ones, `result.stderr` and `result.intercept_stderr`).
The advantage over `sklearn` is that we get some statistics on our result
at all; the disadvantage with respect to `statsmodels.api` is that we have
to compute the confidence intervals for the slope and the intercept ourselves.

For that, we will employ the *t-distribution* that ships with `scipy.stats`.
More precisely, we need its two-sided inverse `tinv(p, df)` for a given
error probability `p` and known degrees of freedom `df`:

In [3]:
tinv = lambda p, df: abs(t.ppf(p/2, df))
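
As a quick sanity check (a minimal sketch, not part of the original analysis), the critical value returned by `tinv` can be compared against the upper-tail quantile of the same distribution. The value `df = 1000` is an arbitrary illustrative choice, not the size of `df_low`; for that many degrees of freedom the result should lie close to the familiar normal value of about 1.96.

```python
from scipy.stats import t


def tinv(p, df):
    """Two-sided critical value of Student's t for error probability p."""
    return abs(t.ppf(p / 2, df))


df = 1000  # arbitrary illustrative value (assumption), not taken from df_low
crit = tinv(0.05, df)

# By symmetry of the t-distribution, the lower-tail quantile at p/2 has
# the same magnitude as the upper-tail quantile at 1 - p/2.
upper = t.ppf(1 - 0.05 / 2, df)

print(f'tinv(0.05, {df}):\t{crit}')
print(f't.ppf(0.975, {df}):\t{upper}')
```

Taking the absolute value of the lower-tail quantile is therefore just a compact way of writing the upper-tail quantile.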

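With this, we can obtain the factor that yields the confidence intervals
when multiplied by the standard errors. The degrees of freedom are the number
of data points minus two, since we estimate two parameters (slope and intercept):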

In [5]:
ts = tinv(.05, len(df_low) - 2)

The resulting estimates for our parameters thus look like this:

In [6]:
print(f'Slope (95%):\t\t{result.slope} +/- {ts*result.stderr}')
print(f'Intercept (95%):\t{result.intercept} +/- {ts*result.intercept_stderr}')

Slope (95%):		3.363981303057231 +/- 0.04445822971698465
Intercept (95%):	1160.1072156361488 +/- 92.515126656635
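
The steps above can also be bundled into a small helper that returns the interval bounds directly. The following is a minimal sketch under stated assumptions: the function name `conf_intervals` is hypothetical, and the data are synthetic stand-ins generated with NumPy, not `df_low`.

```python
import numpy as np
from scipy.stats import linregress, t


def conf_intervals(x, y, alpha=0.05):
    """Return ((slope_lo, slope_hi), (intercept_lo, intercept_hi))."""
    res = linregress(x, y)
    # Two-sided critical value with n - 2 degrees of freedom
    ts = abs(t.ppf(alpha / 2, len(x) - 2))
    slope_ci = (res.slope - ts * res.stderr,
                res.slope + ts * res.stderr)
    intercept_ci = (res.intercept - ts * res.intercept_stderr,
                    res.intercept + ts * res.intercept_stderr)
    return slope_ci, intercept_ci


# Synthetic stand-in data (assumption): squared "ages" and noisy "charges"
rng = np.random.default_rng(0)
x = rng.uniform(18, 64, size=200) ** 2
y = 3.0 * x + 1000.0 + rng.normal(0.0, 500.0, size=200)

slope_ci, intercept_ci = conf_intervals(x, y)
print(f'Slope (95%):\t\t{slope_ci}')
print(f'Intercept (95%):\t{intercept_ci}')
```

Returning explicit lower and upper bounds instead of a `+/-` margin makes the result easier to compare with packages that report intervals as pairs.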


And for our line, we would write
$$c_{1, r}(a) = (3.363981 \pm 0.044458) \cdot a^2 + \underbrace{(1160.107216 \pm 92.515127)}_{n_{1, r}}$$

Note that this is the same as the `statsmodels.api` result, except that the confidence margins
are slightly different.  Here, we computed them ourselves using the *t-distribution*,
whereas `statsmodels.api` let us read them off directly.
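
As a plausibility check, the fitted line can be evaluated at a concrete age. This is a minimal sketch: the point estimates are copied from the output above, while the function name `predicted_charges` and the example age of 40 are illustrative assumptions. It evaluates the central estimate only and does not propagate the uncertainties.

```python
# Point estimates copied from the regression output above
SLOPE = 3.363981303057231
INTERCEPT = 1160.1072156361488


def predicted_charges(age):
    """Central estimate of the charges for a given age (quadratic in age)."""
    return SLOPE * age**2 + INTERCEPT


age = 40  # arbitrary example value (assumption)
print(f'Predicted charges at age {age}: {predicted_charges(age):.2f}')
```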

This concludes our short discussion of linear regression in Python using different
packages, and we return to the [fitting notebook](./age-charge-relation.ipynb) for
this part of our analysis.
