## Packages

In [36]:
import yfinance as yf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# from sklearn.linear_model import LinearRegression
import statsmodels.api as sm
import statsmodels.formula.api as smf

In [2]:
from IPython.display import display, Latex

In [3]:
import warnings
warnings.filterwarnings("ignore")

## Collecting Data

### kaggle.com

In [46]:
import os
from dotenv import load_dotenv

In [None]:
load_dotenv()
os.environ["KAGGLE_USERNAME"] = os.environ.get("KAGGLE_USERNAME")
os.environ['KAGGLE_KEY'] = os.environ.get("KAGGEL_API_TOKEN")

In [50]:
import kaggle
kaggle.api.authenticate()

### github.com

In [42]:
gdp = pd.read_csv('https://raw.githubusercontent.com/datasets/gdp/refs/heads/main/data/gdp.csv')
gdpUSA = gdp[gdp['Country Code']=='USA']
gdpUSA.head(2)

Unnamed: 0,Country Name,Country Code,Year,Value
13340,United States,USA,1960,541988600000.0
13341,United States,USA,1961,561940300000.0


### `yf`

In [None]:
eurusd = yf.Ticker("EURUSD=X").history(start="2024-01-01", end="2024-02-01", interval="1d")
gbpusd = yf.Ticker("GBPUSD=X").history(start="2024-01-01", end="2024-02-01", interval="1d")
# eurusd.to_csv('../00-data/eurusd.csv')
# gbpusd.to_csv('../00-data/gbpusd.csv')

In [None]:
# apple = yf.Ticker("AAPL").history(start="2025-06-01", end="2025-06-20", interval="1d")
apple = yf.Ticker("AAPL").history(start="1996-06-01", end="1996-06-16", interval="1d")
microsoft = yf.Ticker("MSFT").history(start="1996-06-01", end="1996-06-16", interval="1d")

google = yf.Ticker("GOOG").history(start="2021-06-01", end="2025-06-16", interval="1d")
amazon = yf.Ticker("AMZN").history(start="2025-06-01", end="2025-06-16", interval="1d")

byd = yf.Ticker("BYDDY").history(start="2025-09-01", end="2025-09-10", interval="1d")
tesla = yf.Ticker("TSLA").history(start="2025-09-01", end="2025-09-10", interval="1d")

In [116]:
coke = yf.Ticker("COKE").history(start="2025-09-01", end="2025-09-10", interval="1d")
coke.Open.round()

Date
2025-09-02 00:00:00-04:00    117.0
2025-09-03 00:00:00-04:00    117.0
2025-09-04 00:00:00-04:00    117.0
2025-09-05 00:00:00-04:00    119.0
2025-09-08 00:00:00-04:00    121.0
2025-09-09 00:00:00-04:00    123.0
Name: Open, dtype: float64

In [117]:
pepsi = yf.Ticker("PEP").history(start="2025-09-01", end="2025-09-10", interval="1d")
pepsi.Open.round()

Date
2025-09-02 00:00:00-04:00    156.0
2025-09-03 00:00:00-04:00    148.0
2025-09-04 00:00:00-04:00    148.0
2025-09-05 00:00:00-04:00    146.0
2025-09-08 00:00:00-04:00    146.0
2025-09-09 00:00:00-04:00    141.0
Name: Open, dtype: float64

In [126]:
pepper = yf.Ticker("KDP").history(start="2025-09-01", end="2025-09-10", interval="1d")
pepper.Open.round()

Date
2025-09-02 00:00:00-04:00    29.0
2025-09-03 00:00:00-04:00    28.0
2025-09-04 00:00:00-04:00    29.0
2025-09-05 00:00:00-04:00    29.0
2025-09-08 00:00:00-04:00    28.0
2025-09-09 00:00:00-04:00    27.0
Name: Open, dtype: float64

In [127]:
monster = yf.Ticker("MNST").history(start="2025-09-01", end="2025-09-10", interval="1d")
monster.Open.round()

Date
2025-09-02 00:00:00-04:00    62.0
2025-09-03 00:00:00-04:00    62.0
2025-09-04 00:00:00-04:00    64.0
2025-09-05 00:00:00-04:00    64.0
2025-09-08 00:00:00-04:00    62.0
2025-09-09 00:00:00-04:00    63.0
Name: Open, dtype: float64

## Ordinary Least Squares (OLS)

Linear regression is a fundamental statistical technique for modelling the relationship between a dependent variable $Y$ (also known as the response or target variable) and one or more independent variables $X$ (also known as predictors or features). 

The **Gauss-Markov theorem** states 
that, when the errors have zero mean, constant variance and are uncorrelated, 
the **Ordinary Least Squares (OLS)** estimators 
are the **best linear unbiased estimators (BLUE)** 
in a linear regression model, 
meaning they have the minimum variance among all linear unbiased estimators.
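The BLUE property can be illustrated with a small simulation. The sketch below is not part of the original notebook: the "true" coefficients, the noise level and the competing "endpoint" estimator are all assumptions chosen for illustration. It compares the OLS slope with another linear unbiased estimator (the slope through the first and last observation) over many simulated samples.

```python
import numpy as np

rng = np.random.default_rng(0)                 # fixed seed so the run is repeatable
true_alpha, true_beta, sigma = 1.0, 2.0, 1.0   # assumed "true" model
x = np.linspace(0.0, 10.0, 30)                 # fixed design points
n_sim = 2000

ols_slopes = np.empty(n_sim)
endpoint_slopes = np.empty(n_sim)
for i in range(n_sim):
    y = true_alpha + true_beta * x + rng.normal(0.0, sigma, size=x.size)
    # OLS slope: COV(x, y) / VAR(x)
    ols_slopes[i] = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    # Another linear unbiased estimator: slope through the first and last point
    endpoint_slopes[i] = (y[-1] - y[0]) / (x[-1] - x[0])

print("mean OLS slope:     ", ols_slopes.mean())
print("mean endpoint slope:", endpoint_slopes.mean())
print("var  OLS slope:     ", ols_slopes.var())
print("var  endpoint slope:", endpoint_slopes.var())
```

Both estimators should average close to the true slope, while the OLS slope shows the smaller variance, which is what the theorem predicts.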

### Hypothesis Testing

In [138]:
from scipy.stats import t, f

In statistics, a test of significance is a method for deciding, based on the data, whether to reject a claim (the null hypothesis) or fail to reject it. 

In regression analysis, it is used to determine whether an independent variable is significant in explaining the variance of the dependent variable.

$$y = \beta x + \alpha$$

* The **null hypothesis H0** would be: $\beta=0$, i.e. the predictor $x$ does not explain the variance of the dependent variable $y$.
* The **alternative hypothesis H1** would be: $\beta\neq 0$, i.e. $x$ is significant in predicting the value of $y$.
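The test of H0 against H1 for a single predictor can be sketched as a t-test on the slope. The data below are the five points used in example 4 of this notebook; the variable names are illustrative, and the two-sided p-value is one common choice rather than the only one.

```python
import numpy as np
from scipy.stats import t

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([6, 16, 35, 55, 86], dtype=float)
n = x.size

sxx = np.sum((x - x.mean()) ** 2)
beta = np.sum((x - x.mean()) * (y - y.mean())) / sxx
alpha = y.mean() - beta * x.mean()

residuals = y - (alpha + beta * x)
sigma2 = np.sum(residuals ** 2) / (n - 2)     # unbiased estimate of the error variance
se_beta = np.sqrt(sigma2 / sxx)               # standard error of the slope

t_stat = beta / se_beta
p_value = 2 * t.sf(abs(t_stat), df=n - 2)     # two-sided p-value

print(f"t = {t_stat:.3f}, p = {p_value:.5f}")
```

The t-ratio of about 9.31 matches the `beta/sigma_b` value computed in example 4, and the p-value is below 0.05, so H0 is rejected for this data set.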

#### F-statistic

Since we have only one predictor here, a **t-test** is enough. 
However, in practice a model usually includes several independent variables. 
This is where the **F-statistic** comes into play: it tests all slope coefficients jointly.

An insignificant **F-test** means the data give no evidence of a linear relationship between the predictors and the target variable.

The **F-statistic** is the ratio of two variances: the explained variance (based on $\hat{y}-\bar{y}$, due to the model) and the unexplained variance (based on the residuals $y-\hat{y}$), each divided by its degrees of freedom. 
By comparing these variances, the **F-statistic** helps us determine whether the regression model significantly explains the variation in the dependent variable or whether that variation can be attributed to random chance.

The **F-statistic** follows an **F-distribution**, and its value helps to determine the probability (**p-value**) of observing such a statistic if **the null hypothesis** is true 
(i.e., no relationship between the dependent and independent variables). 
If the **p-value** is smaller than a predetermined significance level (e.g., 0.05), the null hypothesis is rejected, 
and we conclude that the regression model is statistically significant.
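As a rough illustration of the ratio described above, the following sketch computes the F-statistic and its p-value from the explained and unexplained sums of squares. It uses the five data points from example 4 of this notebook; `np.polyfit` is used here only as a convenient way to get the least-squares line.

```python
import numpy as np
from scipy.stats import f

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([6, 16, 35, 55, 86], dtype=float)
n, p = x.size, 1                      # p = number of predictors

beta, alpha = np.polyfit(x, y, deg=1) # least-squares slope and intercept
y_hat = alpha + beta * x

explained = np.sum((y_hat - y.mean()) ** 2)   # variation captured by the model
unexplained = np.sum((y - y_hat) ** 2)        # residual variation

f_stat = (explained / p) / (unexplained / (n - p - 1))
p_value = f.sf(f_stat, dfn=p, dfd=n - p - 1)  # P(F >= f_stat) under H0

print(f"F = {f_stat:.2f}, p-value = {p_value:.5f}")
```

The result agrees with the `F-statistic` and `Prob (F-statistic)` entries of the statsmodels summary for the same data (86.65 and 0.00262).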

* `dfn`: degrees of freedom for the numerator. In regression this is the number of predictors $p$ (in ANOVA it is associated with the variance of the group means).
* `dfd`: degrees of freedom for the denominator. In regression this is $n-p-1$ (in ANOVA it is associated with the variance within groups).

* If the **p-value** associated with the **F-statistic** is ≥ 0.05: we fail to reject the null hypothesis; there is no evidence of a relationship between ANY of the independent variables and Y.
* If the **p-value** associated with the **F-statistic** is < 0.05: we conclude that AT LEAST 1 independent variable is related to Y.

In **F-test** hypothesis testing for linear regression, the **F-statistic** is primarily assessed by comparing it against a critical value from the **F-distribution**, based on the model’s and error’s degrees of freedom and a chosen significance level, like 0.05. 

Additionally, the **p-value** associated with the **F-statistic**, typically calculated using statistical software, plays a key role; if it’s below the significance threshold, it indicates the model’s statistical significance, leading to the rejection of the null hypothesis.
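The critical-value approach can be sketched with `scipy.stats.f.ppf`. The F-statistic and degrees of freedom below are taken from the single-predictor fit summarised later in this notebook (5 observations, 1 predictor); the 0.05 level is the conventional choice mentioned above.

```python
from scipy.stats import f

f_stat = 86.65          # F-statistic of the fit (value from the statsmodels summary)
dfn, dfd = 1, 3         # one predictor, n - p - 1 = 5 - 1 - 1 residual degrees of freedom
significance = 0.05

f_crit = f.ppf(1 - significance, dfn, dfd)   # upper 5% point of F(dfn, dfd)
reject_h0 = f_stat > f_crit

print(f"critical value = {f_crit:.3f}, reject H0: {reject_h0}")
```

Comparing the statistic with the critical value and comparing the p-value with the significance level always lead to the same decision.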

### 2D Playground

Classical normal linear regression assumptions (LINE):

L (**Linearity**). A linear relation exists between $Y_i$ and $X_i$. This means the mean value of $Y$ at each level of $X$ falls on the regression line. 
$$Y = \beta X + \alpha + \varepsilon$$
I (**Independence**). The error terms $\varepsilon$ are independent of each other and of the values of the independent variable $X$. That is, there's no connection between how far any two points lie from the regression line. \
N (**Normality**). For any value of $X$, the error term $\varepsilon$ has a normal distribution. \
E (**Equal variance**, homoscedasticity). The variance of the error term $\varepsilon$, denoted $\sigma^2$, is the same for all $X$. That is, the spread in the $Y$'s for each level of $X$ is the same.

When $\sigma^2$ is small, an observed point $(X_i,  Y_i)$ will almost always fall quite close to the true regression line, 
whereas observations may deviate considerably from their expected values (corresponding to points far from the line) when $\sigma^2$ is large.

Thus, this variance can be used to tell us how good the linear fit is. 
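A quick simulation makes the role of $\sigma^2$ concrete. This is only a sketch: the true coefficients, the two noise levels and the sample size are assumptions picked for illustration, and $R^2$ is used as a simple summary of how closely the points follow the fitted line.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 50)
alpha, beta = 1.0, 2.0                 # assumed true intercept and slope

def r_squared(sigma):
    """Simulate Y = beta*X + alpha + eps with eps ~ N(0, sigma^2); return R^2 of the OLS fit."""
    y = beta * x + alpha + rng.normal(0.0, sigma, size=x.size)
    b, a = np.polyfit(x, y, deg=1)
    residuals = y - (a + b * x)
    return 1 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)

r2_small = r_squared(sigma=0.5)
r2_large = r_squared(sigma=10.0)
print(f"R^2 with sigma=0.5:  {r2_small:.3f}")
print(f"R^2 with sigma=10.0: {r2_large:.3f}")
```

With the small noise level the points hug the line and $R^2$ is close to 1; with the large one the same underlying relation explains far less of the observed variation.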

In [5]:
# sns.scatterplot(data=gdpUSA, x='Year', y='Value')
# plt.show()

$$\beta = \frac{n\sum X_i Y_i -\big(\sum X_i\big)\big(\sum Y_i\big)}{n\sum X_i^2 - \big(\sum X_i\big)^2} = \frac{\text{COV}(X, Y)}{\text{VAR}(X)}$$

$$\alpha = \frac{1}{n}\sum Y_i - \beta\,\frac{1}{n}\sum X_i = \bar{Y}-\bar{X}\beta$$
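The two closed-form expressions translate directly into a few lines of NumPy. The helper name `simple_ols` is hypothetical (it is not defined elsewhere in this notebook); the data are the points of example 2, and `np.cov` / `np.polyfit` serve only as cross-checks.

```python
import numpy as np

def simple_ols(x, y):
    """Return (beta, alpha) for y = beta * x + alpha using the closed-form sums."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    beta = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x ** 2) - np.sum(x) ** 2)
    alpha = y.mean() - beta * x.mean()
    return beta, alpha

x = [2, 4, 5, 6, 8]
y = [135, 128, 120, 118, 110]
beta, alpha = simple_ols(x, y)

# Cross-checks: covariance / variance form, and NumPy's least-squares polynomial fit
beta_cov = np.cov(x, y, bias=True)[0, 1] / np.var(x)
beta_np, alpha_np = np.polyfit(x, y, deg=1)

print(beta, alpha)
```

All three routes should agree up to floating-point rounding, giving a slope of -4.25 and an intercept of 143.45 for this data set.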

#### 1

In [35]:
# x = np.array([2, 4, 6, 7, 9])
# y = np.array([5, 10, 10, 15, 20])
x = np.array([5, 11, 15, 17, 20, 22, 25, 27, 30, 35]) 
y = np.array([70, 65, 55, 60, 50, 35, 40, 30, 25, 32])

In [36]:
display(Latex(rf"$\sum xy = {np.inner(x, y)}$"))
display(Latex(rf"$\sum x = {np.sum(x)}$"))
display(Latex(rf"$\sum y = {np.sum(y)}$"))
display(Latex(rf"$\sum x^2 = {np.sum(x ** 2)}$"))
display(Latex(rf"$\sum y^2 = {np.sum(y ** 2)}$"))

In [37]:
beta = (len(x)*np.inner(x, y)-np.sum(x)*np.sum(y))/(len(x)*sum([num ** 2 for num in x]) - np.sum(x)*np.sum(x))
beta

np.float64(-1.6304023845007451)

In [34]:
alpha = (np.sum(y) - beta * np.sum(x))/len(x)
alpha

np.float64(79.94932935916543)

In [22]:
x * beta + alpha

array([71.79731744, 62.01490313, 55.49329359, 52.23248882, 47.34128167,
       44.0804769 , 39.18926975, 35.92846498, 31.03725782, 22.8852459 ])

In [23]:
1 - sum([num ** 2 for num in x * beta + alpha - y])/sum([num ** 2 for num in y - np.mean(y)])

np.float64(0.8606888179979808)

#### 2

In [6]:
# x = np.array([1, 3, 5, 5, 6])
# y = np.array([2, 1.5, 1.6, 1.4, 1])
x = np.array([2, 4, 5, 6, 8])
y = np.array([135, 128, 120, 118, 110])

In [11]:
display(Latex(rf"$\sum xy = {np.inner(x, y)}$"))
display(Latex(rf"$\sum x = {np.sum(x)}$"))
display(Latex(rf"$\sum y = {np.sum(y)}$"))
display(Latex(rf"$\sum x^2 = {np.sum(x ** 2)}$"))

In [12]:
2970/5

594.0

In [13]:
-(611-594)/(25+29)

-0.3148148148148148

In [14]:
beta = (len(x)*np.inner(x, y)-np.sum(x)*np.sum(y))/(len(x)*sum([num ** 2 for num in x]) - np.sum(x)*np.sum(x))
beta

np.float64(-4.25)

In [17]:
(5*2970-25*611)/(5*145-25**2)

-4.25

In [18]:
alpha = (np.sum(y) - beta * np.sum(x))/len(x)
alpha

np.float64(143.45)

**Unbiased estimate of the error variance**

In [20]:
e = y - alpha - beta * x  # residuals
sigma2 = sum([num ** 2 for num in e])/(len(x)-2)  # unbiased estimate of the error variance
sigma2

np.float64(2.516666666666667)

**Variance of the estimate of** $\beta$

In [28]:
sigma_b = np.sqrt( sigma2/sum( [ num**2 for num in (x-x.mean()) ] ) )
sigma_b

np.float64(0.3547299442298794)

In [24]:
2.516/20

0.1258

**Variance of the estimate of** $\alpha$

In [27]:
sigma_a = np.sqrt( sigma2 * sum ([num**2 for num in x]) / (len(x) * sum([num**2 for num in x-x.mean()])) )
sigma_a**2

np.float64(3.6491666666666673)

In [26]:
sum([num**2 for num in x-x.mean()])

np.float64(20.0)

In [29]:
print(beta/sigma_b, alpha/sigma_a)

-11.980945136240958 75.09373452903537
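For reference, the cells of this example evaluate the standard formulas below (written with the notebook's own names `sigma2`, `sigma_b`, `sigma_a`; the symbol $e_i$ for the residuals is introduced here for convenience):

$$\sigma^2 = \frac{1}{n-2}\sum e_i^2, \qquad e_i = Y_i - \alpha - \beta X_i$$

$$\sigma_b^2 = \frac{\sigma^2}{\sum (X_i-\bar{X})^2}, \qquad \sigma_a^2 = \frac{\sigma^2 \sum X_i^2}{n\sum (X_i-\bar{X})^2}$$

$$t_b = \frac{\beta}{\sigma_b}, \qquad t_a = \frac{\alpha}{\sigma_a}$$

With $\sigma^2 \approx 2.517$ and $\sum (X_i-\bar{X})^2 = 20$ this gives $\sigma_b^2 \approx 0.126$ and $\sigma_a^2 \approx 3.649$, hence the t-ratios of about $-11.98$ and $75.09$ printed above.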


#### 3

In [41]:
x = np.array([1, 3, 4, 6, 8])
y = np.array([8, 12, 15, 20, 24])

In [13]:
display(Latex(rf"$\sum xy = {np.inner(x, y)}$"))
display(Latex(rf"$\sum x = {np.sum(x)}$"))
display(Latex(rf"$\sum y = {np.sum(y)}$"))
display(Latex(rf"$\sum x^2 = {np.sum(x ** 2)}$"))

In [14]:
beta = (len(x)*np.inner(x, y)-np.sum(x)*np.sum(y))/(len(x)*sum([num ** 2 for num in x]) - np.sum(x)*np.sum(x))
beta

np.float64(2.3424657534246576)

In [15]:
alpha = (np.sum(y) - beta * np.sum(x))/len(x)
alpha

np.float64(5.493150684931507)

In [32]:
e = y - alpha - beta * x  # residuals
sigma2 = sum([num ** 2 for num in e])/(len(x)-2)  # unbiased estimate of the error variance
np.round(sigma2, 4)

np.float64(0.1918)

In [35]:
sigma_b = np.sqrt( sigma2/sum( [ num**2 for num in (x-x.mean()) ] ) )
np.round(sigma_b**2, 6)

np.float64(0.006568)

In [44]:
sigma_a = np.sqrt( sigma2 * sum ([num**2 for num in x]) / (len(x) * sum([num**2 for num in x-x.mean()])) )
sigma_a


np.float64(0.4068285590388356)

In [45]:
print(beta/sigma_b, alpha/sigma_a)

28.904275511715266 13.50237234551455


In [46]:
5.493/0.4

13.7325

In [69]:
x = np.array([5, 6, 7, 8, 9])
y = np.array([200, 140, 110, 85, 70])

In [70]:
lnx = np.log(x)
lny = np.log(y)

In [71]:
display(Latex(rf"$\sum \ln x \ln y = {np.inner(lnx, lny)}$"))
display(Latex(rf"$\sum \ln x = {np.sum(lnx)}$"))
display(Latex(rf"$\sum \ln y = {np.sum(lny)}$"))
display(Latex(rf"$\sum (\ln x)^2 = {np.sum(lnx ** 2)}$"))

In [75]:
beta = (len(lnx)*np.inner(lnx, lny)-np.sum(lnx)*np.sum(lny))/(len(lnx)*sum([num ** 2 for num in lnx]) - np.sum(lnx)*np.sum(lnx))
beta

np.float64(-1.7782433547169036)

In [76]:
alpha = (np.sum(lny) - beta * np.sum(lnx))/len(lnx)
alpha

np.float64(8.148999638685542)

In [80]:
e = lny - alpha - beta * lnx  # residuals of the log-log fit
sigma2 = sum([num ** 2 for num in e])/(len(lnx)-2)  # unbiased estimate of the error variance
sigma2

np.float64(0.00027776375504360017)

#### 4

##### 2D

In [89]:
x = np.array([1, 2, 3, 4, 5])
y = np.array([6, 16, 35, 55, 86])

In [90]:
display(Latex(rf"$\sum xy = {np.inner(x, y)}$"))
display(Latex(rf"$\sum x = {np.sum(x)}$"))
display(Latex(rf"$\sum y = {np.sum(y)}$"))
display(Latex(rf"$\sum x^2 = {np.sum(x ** 2)}$"))

In [91]:
beta = (len(x)*np.inner(x, y)-np.sum(x)*np.sum(y))/(len(x)*sum([num ** 2 for num in x]) - np.sum(x)*np.sum(x))
beta

np.float64(19.9)

In [92]:
alpha = (np.sum(y) - beta * np.sum(x))/len(x)
alpha

np.float64(-20.1)

In [93]:
alpha + beta * x

array([-0.2, 19.7, 39.6, 59.5, 79.4])

In [96]:
e = y - alpha - beta * x  # residuals
sigma2 = sum([num ** 2 for num in e])/(len(x)-2)  # unbiased estimate of the error variance
sigma2

np.float64(45.69999999999996)

In [101]:
sum( [ num**2 for num in (x-x.mean()) ] )

np.float64(10.0)

In [99]:
sigma_b = np.sqrt( sigma2/sum( [ num**2 for num in (x-x.mean()) ] ) )
print(sigma_b**2, sigma_b)

4.569999999999997 2.1377558326431942


In [104]:
sigma_a = np.sqrt( sigma2 * sum ([num**2 for num in x]) / (len(x) * sum([num**2 for num in x-x.mean()])) )
print(sigma_a**2, sigma_a)

50.26999999999995 7.090133990271267


In [105]:
print(beta/sigma_b, alpha/sigma_a)

9.308827367527263 -2.8349252676437757


In [106]:
TSS = sum([num**2 for num in y-y.mean()])
TSS

np.float64(4097.200000000001)

In [107]:
sigma2*(len(x)-2)

np.float64(137.09999999999988)

In [108]:
R=1-sigma2*(len(x)-2)/TSS  # coefficient of determination R^2 (stored in the variable R)
R

np.float64(0.9665381235966026)

In [25]:
F=(len(x)-2)*R/(1-R)
F

np.float64(86.6542669584247)

In [22]:
# 2. Add a constant to the independent variable for the intercept
# This creates the design matrix for statsmodels
X_with_intercept = sm.add_constant(x)
# 3. Create and fit the OLS (Ordinary Least Squares) model
model = sm.OLS(y, X_with_intercept)
results = model.fit()
# 4. Print the detailed summary
print(results.summary())

                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.967
Model:                            OLS   Adj. R-squared:                  0.955
Method:                 Least Squares   F-statistic:                     86.65
Date:                Wed, 11 Feb 2026   Prob (F-statistic):            0.00262
Time:                        09:21:25   Log-Likelihood:                -15.373
No. Observations:                   5   AIC:                             34.75
Df Residuals:                       3   BIC:                             33.96
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const        -20.1000      7.090     -2.835      0.0

##### 3D

In [52]:
x1 = np.array([40,50,40,30,30])
x2 = np.array([60,40,40,20,90])
y = np.array([3,6,5,3,1])
p=2


In [53]:
I = np.array([1, 1, 1, 1, 1])

In [60]:
x = np.array([I, x1, x2]) @ np.array([I, x1, x2]).T
x

array([[    5,   190,   250],
       [  190,  7500,  9300],
       [  250,  9300, 15300]])

In [55]:
np.linalg.inv(x) 

array([[ 7.59677419e+00, -1.56451613e-01, -2.90322581e-02],
       [-1.56451613e-01,  3.76344086e-03,  2.68817204e-04],
       [-2.90322581e-02,  2.68817204e-04,  3.76344086e-04]])

In [73]:
print(471/62, -97/620, -9/310)

7.596774193548387 -0.15645161290322582 -0.02903225806451613


In [68]:
np.array([I, x1, x2]) @ y

array([ 18, 740, 770])

In [86]:
beta = np.linalg.inv(x) @ np.array([I, x1, x2]) @ y
print( beta)

[-1.38709677  0.17580645 -0.03387097]
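The same coefficients can be obtained without forming the inverse explicitly, which is generally the numerically safer route. The sketch below rebuilds the design matrix with one row per observation (the name `X` is introduced here and differs from the notebook's `x`, which holds the product $X^TX$) and uses two standard NumPy solvers.

```python
import numpy as np

x1 = np.array([40, 50, 40, 30, 30], dtype=float)
x2 = np.array([60, 40, 40, 20, 90], dtype=float)
y = np.array([3, 6, 5, 3, 1], dtype=float)

# Design matrix with one row per observation: [1, x1, x2]
X = np.column_stack([np.ones_like(x1), x1, x2])

# Normal equations (X^T X) beta = X^T y, solved without forming the inverse
beta_solve = np.linalg.solve(X.T @ X, X.T @ y)

# Least-squares solver working directly on X
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_solve)
print(beta_lstsq)
```

Both calls should reproduce the vector printed above, up to rounding.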


In [88]:
y-beta @ np.array([I, x1, x2])

array([-0.61290323, -0.0483871 ,  0.70967742, -0.20967742,  0.16129032])

In [85]:
e = y - beta @ np.array([I, x1, x2])  # residuals
sigma2 = sum([num ** 2 for num in e])/(len(y)-p-1)  # unbiased estimate of the error variance
sigma = np.sqrt(sigma2)

display(Latex(rf"$\sum  e^2 = {sum([num ** 2 for num in e])}$"))
display(Latex(r"$\sigma^2 = \frac{1}{n-p-1}\sum e^2" + f" =  {sigma2}$"))
display(Latex(rf"$\sigma = {sigma}$"))

In [80]:
# Standard errors of the coefficients: square roots of the diagonal of sigma^2 (X^T X)^{-1}
xtx_inv = np.linalg.inv(x)
sigma_0 = np.sqrt( sigma2 * xtx_inv[0, 0])
sigma_1 = np.sqrt( sigma2 * xtx_inv[1, 1])
sigma_2 = np.sqrt( sigma2 * xtx_inv[2, 2])
display(Latex(r"$\sigma^2_0 = \sigma^2 \left[(X^TX)^{-1}\right]_{00}" + f" =  {sigma_0**2}$"))
display(Latex(rf"$\sigma_0 =  {sigma_0}$"))
display(Latex(r"$\sigma^2_1 = \sigma^2 \left[(X^TX)^{-1}\right]_{11}" + f" =  {sigma_1**2}$"))
display(Latex(rf"$\sigma_1 =  {sigma_1}$"))
display(Latex(r"$\sigma^2_2 = \sigma^2 \left[(X^TX)^{-1}\right]_{22}" + f" =  {sigma_2**2}$"))
display(Latex(rf"$\sigma_2 =  {sigma_2}$"))

In [46]:
# Convert to pandas DataFrame for easier handling, especially with the formula API
df = pd.DataFrame(np.array([x1, x2]).T, columns=['x1', 'x2'])
df['y'] = y

In [47]:
# Method 2: Using the formula API (often more convenient)
# Requires 'statsmodels.formula.api'
model_formula = smf.ols("y ~ x1 + x2", data=df)
results_formula = model_formula.fit()

In [48]:
# 4. Print the summary of the regression results
print("Results from formula OLS:")
print(results_formula.summary())

Results from formula OLS:
                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.937
Model:                            OLS   Adj. R-squared:                  0.875
Method:                 Least Squares   F-statistic:                     14.97
Date:                Wed, 11 Feb 2026   Prob (F-statistic):             0.0626
Time:                        10:04:30   Log-Likelihood:                -2.9471
No. Observations:                   5   AIC:                             11.89
Df Residuals:                       2   BIC:                             10.72
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -1.3871     
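The headline numbers of this summary can be reproduced by hand from the sums of squares. The following is a sketch under the same data; the variable names (`resid_ss`, `total_ss`, `std_err`) are chosen here for readability and do not appear elsewhere in the notebook.

```python
import numpy as np
from scipy.stats import f

x1 = np.array([40, 50, 40, 30, 30], dtype=float)
x2 = np.array([60, 40, 40, 20, 90], dtype=float)
y = np.array([3, 6, 5, 3, 1], dtype=float)

X = np.column_stack([np.ones_like(x1), x1, x2])   # design matrix [1, x1, x2]
n, p = X.shape[0], X.shape[1] - 1                 # p = number of predictors

beta = np.linalg.solve(X.T @ X, X.T @ y)
residuals = y - X @ beta

resid_ss = np.sum(residuals ** 2)        # unexplained (residual) sum of squares
total_ss = np.sum((y - y.mean()) ** 2)   # total sum of squares
sigma2 = resid_ss / (n - p - 1)          # unbiased estimate of the error variance

# Standard errors: square roots of the diagonal of sigma^2 (X^T X)^{-1}
std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

r2 = 1 - resid_ss / total_ss
f_stat = ((total_ss - resid_ss) / p) / (resid_ss / (n - p - 1))
p_value = f.sf(f_stat, p, n - p - 1)

print(f"R^2 = {r2:.3f}, F = {f_stat:.2f}, Prob(F) = {p_value:.4f}")
print("t-ratios:", beta / std_err)
```

The printed R², F and Prob(F) should agree with the `R-squared`, `F-statistic` and `Prob (F-statistic)` entries reported by statsmodels.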

#### Control Work 1

In [85]:
# x = coke.Open.round().values #.reshape(-1, 1)
# y = pepsi.Open.round().values
# x = np.array([103, 127, 126, 124, 124])
# y = np.array([26, 24, 25, 26, 27]) 
# x = np.array([9, 12, 4, 3, 10])
# y = np.array([80, 82, 65, 62, 90]) 
# x = np.array([8, 10, 5, 5, 10])
# y = np.array([85, 85, 68, 65, 90]) 
# x = np.array([75, 80, 82, 85, 90])
# y = np.array([52, 60, 64, 68, 75]) 
x = np.array([1, 2, 3, 4, 5])
y = np.array([6, 16, 35, 55, 86])
p = 1

In [86]:
from IPython.display import display, Latex

In [87]:
display(Latex(rf"$\sum xy = {np.inner(x, y)}$"))
display(Latex(rf"$\sum x = {np.sum(x)}$"))
display(Latex(rf"$\sum y = {np.sum(y)}$"))
display(Latex(rf"$\sum x^2 = {np.sum(x ** 2)}$"))
display(Latex(rf"$\sum y^2 = {np.sum(y ** 2)}$"))

In [88]:
beta = (len(x)*np.inner(x, y)-np.sum(x)*np.sum(y))/(len(x)*sum([num ** 2 for num in x]) - np.sum(x)*np.sum(x))
beta

np.float64(19.9)

In [89]:
# beta = 2.1724
alpha = (np.sum(y) - beta * np.sum(x))/len(x)
alpha

np.float64(-20.1)

In [90]:
e = y - alpha - beta * x  # residuals
sigma2 = sum([num ** 2 for num in e])/(len(x)-p-1)  # unbiased estimate of the error variance
sigma = np.sqrt(sigma2)

display(Latex(rf"$\sum  e^2 = {sum([num ** 2 for num in e])}$"))
display(Latex(r"$\sigma^2 = \frac{1}{n-2}\sum e^2" + f" =  {sigma2}$"))
display(Latex(rf"$\sigma = {sigma}$"))

In [91]:
# sigma2 is rounded to two decimals before use, as in a hand calculation
sigma_b = np.sqrt( sigma2.round(decimals=2)/sum( [ num**2 for num in (x-x.mean()) ] ) )

display(Latex(r"$\sigma^2_b = \frac{\sigma^2}{\sum (x-\bar{x})^2}" + f" =  {sigma2/sum( [num**2 for num in (x-x.mean())])}$"))
display(Latex(rf"$\sigma_b =  {np.sqrt(sigma2/sum( [num**2 for num in (x-x.mean())]))}$"))

In [92]:
sigma_a = np.sqrt( sigma2 * sum ([num**2 for num in x]) / (len(x) * sum([num**2 for num in x-x.mean()])) )

print(f"\u03C3^2_a = \u03C3^2\u03A3 x^2/ n\u03A3 (x-x_avg)^2=  {sigma_a**2}")
print(f"\u03C3_a =  {sigma_a}")

σ^2_a = σ^2Σ x^2/ nΣ (x-x_avg)^2=  50.26999999999995
σ_a =  7.090133990271267


In [93]:
print(f"\u03C3_b = {sigma_b},    \u03C3_a = {sigma_a}")

σ_b = 2.137755832643195,    σ_a = 7.090133990271267


In [94]:
print(f"TSS = {sum([num**2 for num in (y-np.mean(y))])}")
# RSS here is the regression (explained) sum of squares, not the residual sum of squares
print(f"RSS = {sum([num**2 for num in (beta*x+alpha-np.mean(y))])}")

TSS = 4097.200000000001
RSS = 3960.100000000001


In [95]:
# 2. Add a constant to the independent variable for the intercept
# This creates the design matrix for statsmodels
X_with_intercept = sm.add_constant(x)
# 3. Create and fit the OLS (Ordinary Least Squares) model
model = sm.OLS(y, X_with_intercept)
results = model.fit()
# 4. Print the detailed summary
print(results.summary())

                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.967
Model:                            OLS   Adj. R-squared:                  0.955
Method:                 Least Squares   F-statistic:                     86.65
Date:                Tue, 11 Nov 2025   Prob (F-statistic):            0.00262
Time:                        20:02:16   Log-Likelihood:                -15.373
No. Observations:                   5   AIC:                             34.75
Df Residuals:                       3   BIC:                             33.96
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const        -20.1000      7.090     -2.835      0.0

In [97]:
print(sigma_a, sigma_b)

7.090133990271267 2.137755832643195


In [30]:
t.ppf(0.025, len(x)-2)

np.float64(-3.1824463052842638)
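The quantile above is the building block of a confidence interval for a coefficient: estimate ± t-quantile × standard error. The helper `confidence_interval` below is a hypothetical name introduced for this sketch; the estimates and standard errors are the values computed earlier in this section for the data x = 1..5.

```python
from scipy.stats import t

def confidence_interval(estimate, std_err, dof, level=0.95):
    """Two-sided interval: estimate -/+ t_{1-(1-level)/2, dof} * std_err."""
    tail = (1 - level) / 2
    t_crit = t.ppf(1 - tail, dof)
    return estimate - t_crit * std_err, estimate + t_crit * std_err

beta, sigma_b = 19.9, 2.137755832643195     # slope and its standard error
alpha, sigma_a = -20.1, 7.090133990271267   # intercept and its standard error
dof = 5 - 2                                 # n - 2 for simple regression

beta_lo, beta_hi = confidence_interval(beta, sigma_b, dof)
alpha_lo, alpha_hi = confidence_interval(alpha, sigma_a, dof)
print(f"95% CI for beta:  ({beta_lo:.3f}, {beta_hi:.3f})")
print(f"95% CI for alpha: ({alpha_lo:.3f}, {alpha_hi:.3f})")
```

An interval that excludes zero corresponds to rejecting $H_0$ at the matching significance level; here that is the case for the slope but not for the intercept.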

In [67]:
palpha = 0.025  # tail probability on each side, i.e. a 95% confidence interval for beta
half_width = -t.ppf(palpha, len(x)-2)*sigma_b
print(f"{beta - half_width:.2f}, {beta + half_width:.2f}")

13.10, 26.70


In [102]:
palpha = 0.01  # tail probability on each side, i.e. a 98% confidence interval for alpha
half_width = -t.ppf(palpha, len(x)-2)*sigma_a
print(f"{alpha - half_width:.2f}, {alpha + half_width:.2f}")

-52.29, 12.09


## References

- [The Multiple Linear Regression Model](https://online.stat.psu.edu/stat462/node/131/)
- [Hypothesis Test for Linear Regression](https://stats.libretexts.org/Bookshelves/Introductory_Statistics/Mostly_Harmless_Statistics_(Webb)/12%3A_Correlation_and_Regression/12.02%3A_Simple_Linear_Regression/12.2.01%3A_Hypothesis_Test_for_Linear_Regression)
- [F-statistic: Understanding model significance using python](https://medium.com/analytics-vidhya/f-statistic-understanding-model-significance-using-python-c1371980b796)
- [F-test & F-statistics in Linear Regression: Formula, Examples](https://vitalflux.com/interpreting-f-statistics-in-linear-regression-formula-examples/)
- [Understand the F-statistic in Linear Regression](https://quantifyinghealth.com/f-statistic-in-linear-regression/)
- [P Value Calculator](https://www.graphpad.com/quickcalcs/pvalue1/)
- [F-distribution table](https://numiqo.com/tutorial/f-distribution)