# Linear Regression with Categorical Features

## Load data

In [1]:
path = '../../data/FRED/CPI_MR_2008.parquet'

In [2]:
import pandas as pd
df = pd.read_parquet(path)

df

date,CPI,MR,period
2007-06-01,2.602436,6.581667,Before
2007-06-15,2.554240,6.680000,Before
...,...,...,...
2009-04-03,2.324039,4.845000,After
2009-04-17,2.295340,4.790000,After


## Visualize relationship

### Without categorical features

In [3]:
import plotly.express as px
fig = px.scatter(df, x='CPI', y='MR', trendline='ols')

fig

### With categorical features

## Access results by category

### Get table with all results

period,px_fit_results
Before,<statsmodels.regression.linear_model.Regressio...
During,<statsmodels.regression.linear_model.Regressio...
After,<statsmodels.regression.linear_model.Regressio...


### Access the summary tables by category

#### Before

,0,1,2,3
0,Model:,OLS,Adj. R-squared:,0.582
1,Dependent Variable:,y,AIC:,-14.2762
2,Date:,2025-09-30 12:36,BIC:,-12.8601
3,No. Observations:,15,Log-Likelihood:,9.1381
4,Df Model:,1,F-statistic:,20.46
5,Df Residuals:,13,Prob (F-statistic):,0.000572
6,R-squared:,0.611,Scale:,0.019977


,Coef.,Std.Err.,t,P>|t|,[0.025,0.975]
const,10.897281,0.990315,11.003853,5.866413e-08,8.757836,13.036727
x1,-1.741092,0.384912,-4.523355,0.0005723178,-2.572643,-0.909541


#### During

In [10]:
lr = r.px_fit_results["During"]
result = lr.summary2().tables

In [11]:
result[0].style

,0,1,2,3
0,Model:,OLS,Adj. R-squared:,0.799
1,Dependent Variable:,y,AIC:,-20.1985
2,Date:,2025-09-30 12:36,BIC:,-18.5321
3,No. Observations:,17,Log-Likelihood:,12.099
4,Df Model:,1,F-statistic:,64.46
5,Df Residuals:,15,Prob (F-statistic):,8.24e-07
6,R-squared:,0.811,Scale:,0.015984


In [12]:
result[1]

,Coef.,Std.Err.,t,P>|t|,[0.025,0.975]
const,1.591036,0.561563,2.833229,0.01258617,0.394093,2.787979
x1,1.586528,0.197611,8.028545,8.241145e-07,1.16533,2.007725


#### After

In [13]:
lr = r.px_fit_results["After"]
result = lr.summary2().tables

In [14]:
result[0].style

,0,1,2,3
0,Model:,OLS,Adj. R-squared:,0.813
1,Dependent Variable:,y,AIC:,3.038
2,Date:,2025-09-30 12:36,BIC:,4.8187
3,No. Observations:,18,Log-Likelihood:,0.48102
4,Df Model:,1,F-statistic:,74.72
5,Df Residuals:,16,Prob (F-statistic):,2e-07
6,R-squared:,0.824,Scale:,0.062441


In [15]:
result[1]

,Coef.,Std.Err.,t,P>|t|,[0.025,0.975]
const,-0.401125,0.683191,-0.587135,0.5653071,-1.849424,1.047174
x1,2.245705,0.25979,8.6443,2.001604e-07,1.694974,2.796436


### All results at once

Period,Coef,StdErr,t,p,R²,n
Before,-1.741092,0.384912,-4.523355,0.0005723178,0.611485,15
During,1.586528,0.197611,8.028545,8.241145e-07,0.81122,17
After,2.245705,0.25979,8.6443,2.001604e-07,0.823641,18


## Manual encoding for categorical feature

{'Before': <statsmodels.regression.linear_model.RegressionResultsWrapper at 0x32910f240>,
 'During': <statsmodels.regression.linear_model.RegressionResultsWrapper at 0x329890b50>,
 'After': <statsmodels.regression.linear_model.RegressionResultsWrapper at 0x329891350>}

0,1,2,3
Model:,OLS,Adj. R-squared:,0.799
Dependent Variable:,CPI,AIC:,-39.4478
Date:,2025-09-30 12:36,BIC:,-37.7814
No. Observations:,17,Log-Likelihood:,21.724
Df Model:,1,F-statistic:,64.46
Df Residuals:,15,Prob (F-statistic):,8.24e-07
R-squared:,0.811,Scale:,0.0051515

,Coef.,Std.Err.,t,P>|t|,[0.025,0.975]
const,-0.2779,0.3884,-0.7153,0.4854,-1.1058,0.5501
MR,0.5113,0.0637,8.0285,0.0000,0.3756,0.6471

0,1,2,3
Omnibus:,3.943,Durbin-Watson:,1.207
Prob(Omnibus):,0.139,Jarque-Bera (JB):,2.155
Skew:,-0.857,Prob(JB):,0.34
Kurtosis:,3.322,Condition No.:,140.0
