## Entity & Time Fixed Effects

I mentioned fixed effects in my [difference-in-differences post](https://yigitasik.github.io/posts/Diff-in-diffs/), but I wanted to elaborate on the topic and show where it's useful. I'll dive right into an example and explain along the way.

In [1]:
import pandas as pd
import numpy as np

from matplotlib import pyplot as plt
import seaborn as sns

import statsmodels.api as sm
import statsmodels.formula.api as smf
from linearmodels.panel import PanelOLS

import warnings

warnings.filterwarnings('ignore')

pd.set_option('display.float_format', lambda x: '%.3f' % x)

In [2]:
df = pd.read_csv('Grunfeld.csv', index_col=0)
df.head()

| | invest | value | capital | firm | year |
|---|---|---|---|---|---|
| 1 | 317.6 | 3078.5 | 2.8 | General Motors | 1935 |
| 2 | 391.8 | 4661.7 | 52.6 | General Motors | 1936 |
| 3 | 410.6 | 5387.1 | 156.9 | General Motors | 1937 |
| 4 | 257.7 | 2792.2 | 209.2 | General Motors | 1938 |
| 5 | 330.8 | 4313.2 | 203.4 | General Motors | 1939 |


I have data on 11 firms: their capital, market value, and investment for each year from 1935 to 1954. This is panel data, since I have multiple observations for each firm across different time periods.
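To make the panel structure concrete, here is a minimal sketch of how one might check that a panel is balanced (every firm observed in every year) and index it by firm and year. The `toy` frame and its numbers are made up for illustration; only the column names mirror the Grunfeld data.

```python
import pandas as pd

# Tiny made-up panel (hypothetical values) with the same column layout.
toy = pd.DataFrame({
    'firm': ['A', 'A', 'A', 'B', 'B', 'B'],
    'year': [1935, 1936, 1937, 1935, 1936, 1937],
    'invest': [10.0, 12.0, 11.0, 3.0, 4.0, 6.0],
})

# Number of distinct years observed for each firm.
obs_per_firm = toy.groupby('firm')['year'].nunique()

# Balanced: every firm has the same number of years and no firm-year repeats.
is_balanced = obs_per_firm.nunique() == 1 and not toy.duplicated(['firm', 'year']).any()

# A (firm, year) MultiIndex is the layout panel estimators usually expect.
panel = toy.set_index(['firm', 'year']).sort_index()

print(obs_per_firm)
print('balanced:', is_balanced)
```

The same three lines should work on the real data by swapping `toy` for `df`.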

Let's say that I am interested in the relationship between market value and investment. For simplicity, if we had data for a single year, we could estimate the following, with one observation per firm $i$:

$\displaystyle invest_i = \beta_0 + \beta_1 value_i + \beta_2 capital_i + \epsilon_i$

However, there are things that we miss with this approach:

1. There might be firm-level variables that we would like to have in the model but don't observe. These are assumed to be constant over time for a given firm.

The idea is pretty neat actually. Think of having two years of data, say 1935 and 1936, and let $\alpha_i$ stand for the unobserved firm-level component:

$\displaystyle invest_{i \, 1936} = \beta_0 + \beta_{1}value_{i \, 1936} + \beta_{2}capital_{i \, 1936} + \beta_{3}\alpha_i + \epsilon_{i \, 1936}$

$\displaystyle invest_{i \, 1935} = \beta_0 + \beta_{1}value_{i \, 1935} + \beta_{2}capital_{i \, 1935} + \beta_{3}\alpha_i + \epsilon_{i \, 1935}$

Now, if you take the difference, the $\beta_0$ and $\beta_{3}\alpha_i$ terms cancel, because neither changes between years. What you're left with is:

$\displaystyle invest_{i \, 1936} - invest_{i \, 1935} = \beta_{1}(value_{i\,1936} - value_{i\,1935}) + \beta_{2}(capital_{i\,1936} - capital_{i\,1935}) + (\epsilon_{i \, 1936} - \epsilon_{i \, 1935})$

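I believe this is a very intuitive example. Including a dummy for each firm achieves the same thing: the dummy absorbs everything that is constant within a firm. So, accounting for unobserved firm-level characteristics is just adding firm as a dummy in the regression!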
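The link between differencing and firm dummies can be checked numerically. The sketch below uses simulated data (not the Grunfeld data) with a single regressor, two periods, and a firm effect that is deliberately correlated with the regressor; all names and parameter values are assumptions made for the illustration. With exactly two periods, the first-difference slope and the firm-dummy slope should coincide, while pooled OLS that ignores the firm effect is biased.

```python
import numpy as np

rng = np.random.default_rng(0)

n_firms = 50
alpha = rng.normal(size=n_firms)  # unobserved firm effect (simulated)

# Two periods per firm; x is correlated with the firm effect on purpose.
x = rng.normal(size=(n_firms, 2)) + alpha[:, None]
y = 1.0 + 2.0 * x + 3.0 * alpha[:, None] + rng.normal(scale=0.1, size=(n_firms, 2))

# (a) First differences: the intercept and the firm effect both drop out,
#     so the regression has no constant term.
dx = x[:, 1] - x[:, 0]
dy = y[:, 1] - y[:, 0]
beta_fd = (dx @ dy) / (dx @ dx)

# (b) One dummy per firm (no separate intercept, to avoid collinearity).
firm_ids = np.repeat(np.arange(n_firms), 2)
dummies = (firm_ids[:, None] == np.arange(n_firms)).astype(float)
X = np.column_stack([x.ravel(), dummies])
beta_dummy = np.linalg.lstsq(X, y.ravel(), rcond=None)[0][0]

# (c) Pooled OLS that ignores the firm effect, for comparison.
X_pooled = np.column_stack([np.ones(2 * n_firms), x.ravel()])
beta_pooled = np.linalg.lstsq(X_pooled, y.ravel(), rcond=None)[0][1]

print('first difference:', beta_fd)
print('firm dummies:    ', beta_dummy)
print('pooled OLS:      ', beta_pooled)
```

The true slope in this simulation is 2. With more than two periods the two estimators are no longer numerically identical, although both remove the firm effect.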

2. The other thing that I haven't mentioned above is effects that are constant within a time period but may differ between years. These are shared across firms. Think of things like inflation, market trends, etc.

Well, I've got the idea. Let's add that as a dummy as well?

In [3]:
lm = smf.ols(
    'invest ~ value + capital + C(firm) + C(year)',
    data=df
)
res = lm.fit()

res.summary()

| | | | |
|---|---|---|---|
| Dep. Variable: | invest | R-squared: | 0.953 |
| Model: | OLS | Adj. R-squared: | 0.945 |
| Method: | Least Squares | F-statistic: | 122.1 |
| Date: | Sun, 12 Oct 2025 | Prob (F-statistic): | 5.2e-108 |
| Time: | 01:14:10 | Log-Likelihood: | -1153.0 |
| No. Observations: | 220 | AIC: | 2370.0 |
| Df Residuals: | 188 | BIC: | 2479.0 |
| Df Model: | 31 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 18.0876 | 18.656 | 0.970 | 0.334 | -18.715 | 54.890 |
| C(firm)[T.Atlantic Refining] | -112.5008 | 17.752 | -6.337 | 0.000 | -147.520 | -77.482 |
| C(firm)[T.Chrysler] | -13.5993 | 17.540 | -0.775 | 0.439 | -48.199 | 21.001 |
| C(firm)[T.Diamond Match] | 16.4928 | 15.692 | 1.051 | 0.295 | -14.462 | 47.448 |
| C(firm)[T.General Electric] | -241.0850 | 28.000 | -8.610 | 0.000 | -296.319 | -185.851 |
| C(firm)[T.General Motors] | -101.7696 | 55.177 | -1.844 | 0.067 | -210.615 | 7.075 |
| C(firm)[T.Goodyear] | -77.9628 | 16.435 | -4.744 | 0.000 | -110.383 | -45.543 |
| C(firm)[T.IBM] | -6.4573 | 16.271 | -0.397 | 0.692 | -38.554 | 25.640 |
| C(firm)[T.US Steel] | 100.5492 | 28.438 | 3.536 | 0.001 | 44.450 | 156.648 |

| | | | |
|---|---|---|---|
| Omnibus: | 32.466 | Durbin-Watson: | 0.988 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 180.276 |
| Skew: | 0.311 | Prob(JB): | 7.14e-40 |
| Kurtosis: | 7.391 | Cond. No. | 39200.0 |


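You can fit the same model with `PanelOLS`, like below, and get a cleaner table, since the firm and year effects are absorbed instead of being listed as coefficients. Note that the R-squared is not comparable to the one above: `PanelOLS` reports it for the model after the effects have been removed.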

In [4]:
fe_model = PanelOLS.from_formula('invest ~ value + capital + EntityEffects + TimeEffects', data=df.set_index(['firm', 'year']))
fe_res = fe_model.fit()

fe_res.summary

| | | | |
|---|---|---|---|
| Dep. Variable: | invest | R-squared: | 0.7253 |
| Estimator: | PanelOLS | R-squared (Between): | 0.7637 |
| No. Observations: | 220 | R-squared (Within): | 0.7566 |
| Date: | Sun, Oct 12 2025 | R-squared (Overall): | 0.7625 |
| Time: | 01:14:15 | Log-likelihood | -1153.0 |
| Cov. Estimator: | Unadjusted | | |
| | | F-statistic: | 248.15 |
| Entities: | 11 | P-value | 0.0000 |
| Avg Obs: | 20.000 | Distribution: | F(2,188) |
| Min Obs: | 20.000 | | |

| | Parameter | Std. Err. | T-stat | P-value | Lower CI | Upper CI |
|---|---|---|---|---|---|---|
| value | 0.1167 | 0.0129 | 9.0219 | 0.0000 | 0.0912 | 0.1422 |
| capital | 0.3514 | 0.0210 | 16.696 | 0.0000 | 0.3099 | 0.3930 |
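Absorbing entity and time effects gives the same slopes as explicit dummies. One way to see why is a sketch on simulated data (not the Grunfeld data; the single regressor, effect sizes, and names are assumptions): in a balanced panel, subtracting firm means and year means and adding back the grand mean removes both sets of effects, and the slope on the demeaned data equals the slope from the dummy regression.

```python
import numpy as np

rng = np.random.default_rng(1)
n_firms, n_years = 11, 20  # same shape as the panel above, simulated values

firm_fe = rng.normal(size=(n_firms, 1))
year_fe = rng.normal(size=(1, n_years))
x = rng.normal(size=(n_firms, n_years)) + firm_fe + year_fe
y = 0.5 * x + 2.0 * firm_fe - 1.0 * year_fe + rng.normal(scale=0.3, size=(n_firms, n_years))


def two_way_demean(a):
    """Subtract firm (row) means and year (column) means, add back the grand mean."""
    return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()


# (a) Slope from the two-way demeaned data.
x_dm, y_dm = two_way_demean(x), two_way_demean(y)
beta_within = (x_dm * y_dm).sum() / (x_dm ** 2).sum()

# (b) Slope from explicit firm and year dummies (first level of each dropped).
firm_idx = np.repeat(np.arange(n_firms), n_years)
year_idx = np.tile(np.arange(n_years), n_firms)
firm_d = (firm_idx[:, None] == np.arange(1, n_firms)).astype(float)
year_d = (year_idx[:, None] == np.arange(1, n_years)).astype(float)
X = np.column_stack([np.ones(n_firms * n_years), x.ravel(), firm_d, year_d])
beta_dummies = np.linalg.lstsq(X, y.ravel(), rcond=None)[0][1]

print('two-way demeaning:', beta_within)
print('explicit dummies: ', beta_dummies)
```

The simple demeaning formula relies on the panel being balanced; with missing firm-years the dummy regression is still valid but the one-shot demeaning is not equivalent.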


One more thing though: check the covariance type on both tables (nonrobust, unadjusted). It means the errors are assumed to be independent with constant variance, which might be violated here. Think about it: observations are grouped in the sense that they belong to the same firm, so they share some unobserved component. Hence, errors might be correlated within each firm (across years).

For the same reason, errors might be correlated within each year (e.g., firms are subject to the same inflation).

So, we should allow residuals to be correlated within groups.

`statsmodels` can produce clustered standard errors too (`cov_type='cluster'`), and the usual usage clusters along a single dimension: either the entity dimension (e.g., firm) or the time dimension (e.g., year). `PanelOLS`, on the other hand, makes _two-way clustering_ a matter of two flags.

In [5]:
fe_model = PanelOLS.from_formula('invest ~ value + capital + EntityEffects + TimeEffects', data=df.set_index(['firm', 'year']))
fe_res = fe_model.fit(cov_type='clustered', cluster_entity=True, cluster_time=True)

fe_res.summary

| | | | |
|---|---|---|---|
| Dep. Variable: | invest | R-squared: | 0.7253 |
| Estimator: | PanelOLS | R-squared (Between): | 0.7637 |
| No. Observations: | 220 | R-squared (Within): | 0.7566 |
| Date: | Sun, Oct 12 2025 | R-squared (Overall): | 0.7625 |
| Time: | 01:15:17 | Log-likelihood | -1153.0 |
| Cov. Estimator: | Clustered | | |
| | | F-statistic: | 248.15 |
| Entities: | 11 | P-value | 0.0000 |
| Avg Obs: | 20.000 | Distribution: | F(2,188) |
| Min Obs: | 20.000 | | |

| | Parameter | Std. Err. | T-stat | P-value | Lower CI | Upper CI |
|---|---|---|---|---|---|---|
| value | 0.1167 | 0.0117 | 10.015 | 0.0000 | 0.0937 | 0.1397 |
| capital | 0.3514 | 0.0447 | 7.8622 | 0.0000 | 0.2633 | 0.4396 |
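The coefficients are unchanged; only the standard errors move. To show roughly what a clustered covariance does, here is a stripped-down sketch on simulated data with no fixed effects, written with plain `numpy`. The helper names `cluster_cov` and `two_way_cluster_cov` are hypothetical, and no small-sample correction is applied, so this is not a reproduction of what `PanelOLS` computes internally; library implementations typically add their own adjustments. The two-way version adds the firm-clustered and year-clustered covariances and subtracts the one clustered on the firm-year intersection.

```python
import numpy as np


def cluster_cov(X, resid, groups):
    """One-way cluster-robust sandwich covariance (no small-sample correction)."""
    bread = np.linalg.inv(X.T @ X)
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(groups):
        mask = groups == g
        score = X[mask].T @ resid[mask]  # sum of x_i * e_i within the cluster
        meat += np.outer(score, score)
    return bread @ meat @ bread


def two_way_cluster_cov(X, resid, g1, g2):
    """V(g1) + V(g2) - V(g1-by-g2 intersection); expects integer group codes."""
    both = g1 * (g2.max() + 1) + g2  # unique id for each (g1, g2) pair
    return (cluster_cov(X, resid, g1)
            + cluster_cov(X, resid, g2)
            - cluster_cov(X, resid, both))


# Simulated panel: errors share a firm component and a year component.
rng = np.random.default_rng(2)
n_firms, n_years = 30, 20
firm = np.repeat(np.arange(n_firms), n_years)
year = np.tile(np.arange(n_years), n_firms)

x = rng.normal(size=n_firms)[firm] + rng.normal(size=n_firms * n_years)
e = (rng.normal(size=n_firms)[firm]
     + rng.normal(size=n_years)[year]
     + rng.normal(size=n_firms * n_years))
y = 1.0 + 0.5 * x + e

X = np.column_stack([np.ones_like(x), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta

# Classical (nonrobust) standard errors assume independent, homoskedastic errors.
sigma2 = resid @ resid / (len(y) - X.shape[1])
se_nonrobust = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
se_firm = np.sqrt(np.diag(cluster_cov(X, resid, firm)))
se_two_way = np.sqrt(np.diag(two_way_cluster_cov(X, resid, firm, year)))

print('nonrobust:', se_nonrobust)
print('firm:     ', se_firm)
print('two-way:  ', se_two_way)
```

Because of the subtraction, a two-way covariance is not guaranteed to be positive semi-definite in small samples, which is one reason to rely on a library implementation in practice.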


I feel like this is a very intuitive example, but for more, you can check [this chapter on panel data and fixed effects](https://matheusfacure.github.io/python-causality-handbook/14-Panel-Data-and-Fixed-Effects.html).