# Statsmodels

![](images/statsmodels.png)

Statsmodels is a Python library of statistical models and tools for data analysis. It supports estimating and testing statistical models in many settings, including econometrics, linear and generalized linear regression, and time series analysis. Its modules cover a wide range of techniques for hypothesis testing, statistical modeling, and exploring relationships within datasets. Models can be specified either with arrays and DataFrames through `statsmodels.api` or with R-style formulas through `statsmodels.formula.api`; both interfaces appear below.

```python
import statsmodels.api as sm
```

```python
import numpy as np
import pandas as pd

np.random.seed(0)

data = pd.DataFrame({
    'Target': np.random.normal(100, 10, 10),
    'X1': np.random.normal(0, 5, 10),
    'X2': np.random.uniform(0, 100, 10),
    'X3': np.random.choice(['A', 'B'], 10)
})

data = pd.get_dummies(data, columns=['X3'], drop_first=True)

data
```

|   | Target | X1 | X2 | X3_B |
| :-: | :----: | :-: | :-: | :--: |
| 0 | 117.640523 | 0.720218 | 14.335329 | False |
| 1 | 104.001572 | 7.271368 | 94.466892 | False |
| 2 | 109.78738 | 3.805189 | 52.184832 | False |
| 3 | 122.408932 | 0.608375 | 41.466194 | True |
| 4 | 118.67558 | 2.219316 | 26.455561 | True |
| 5 | 90.227221 | 1.668372 | 77.423369 | False |
| 6 | 109.500884 | 7.470395 | 45.615033 | True |
| 7 | 98.486428 | -1.025791 | 56.843395 | False |
| 8 | 98.967811 | 1.565339 | 1.87898 | False |
| 9 | 104.105985 | -4.270479 | 61.76355 | True |


## Regression & Classification

| Model | Function |
| :---: | :------: |
| Linear | `sm.OLS` |
| Logit | `sm.Logit` |
| Poisson | `sm.GLM` with `family=sm.families.Poisson()` |
| Gamma | `sm.GLM` with `family=sm.families.Gamma()` |
| Negative Binomial | `sm.GLM` with `family=sm.families.NegativeBinomial()` |
| Zero-Inflated Poisson | `sm.ZeroInflatedPoisson` |
| Zero-Inflated Negative Binomial | `sm.ZeroInflatedNegativeBinomialP` |
| Robust Regression | `sm.RLM` |
| Quantile Regression | `sm.QuantReg` |

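```python
# X3 was replaced by the dummy X3_B; cast to float and add an intercept
X = sm.add_constant(data[['X1', 'X2', 'X3_B']].astype(float))
results = sm.OLS(data['Target'], X).fit()

results.summary()  # summary() belongs to the fitted results, not the model
```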

## ANOVA & ANCOVA

**ANOVA**

```python
from statsmodels.formula.api import ols

# One-way ANOVA of Target on the binary factor (X3 is stored as the dummy X3_B)
model = ols('Target ~ C(X3_B)', data=data).fit()

anova_table = sm.stats.anova_lm(model, typ=2)
anova_table
```

**ANCOVA**

Only the `ols` formula changes: the continuous covariates `X1` and `X2` are added alongside the categorical factor.

```python
from statsmodels.formula.api import ols

# ANCOVA: categorical factor (dummy X3_B) plus continuous covariates X1 and X2
model = ols('Target ~ X1 + X2 + C(X3_B)', data=data).fit()

anova_table = sm.stats.anova_lm(model, typ=2)
anova_table
```

## Time Series

### ARIMA
### SARIMA
### VAR


## Panel Data
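One way to model panel data within statsmodels is a linear mixed model with a random intercept per unit, fitted with `statsmodels.formula.api.mixedlm`. The sketch assumes a synthetic balanced panel (30 units over 8 periods) with made-up coefficients; the column names `unit`, `period`, `x` and `y` are hypothetical. A fixed-effects alternative is to add unit dummies to an OLS formula, e.g. `ols('y ~ x + C(unit)', data=panel)`.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n_units, n_periods = 30, 8

# Synthetic balanced panel: each unit has its own random intercept
unit = np.repeat(np.arange(n_units), n_periods)
period = np.tile(np.arange(n_periods), n_units)
unit_effect = rng.normal(scale=2.0, size=n_units)[unit]
x = rng.normal(size=unit.size)
y = 1.0 + 0.5 * x + unit_effect + rng.normal(size=unit.size)
panel = pd.DataFrame({'unit': unit, 'period': period, 'x': x, 'y': y})

# Random-intercept model: groups= identifies the panel unit
res = smf.mixedlm('y ~ x', data=panel, groups=panel['unit']).fit()
print(res.params[['Intercept', 'x']])
```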


## Tests

| Category | Type | Function |
| :------: | :--: | :------: |
| ANOVA & ANCOVA    |                        | `sm.stats.anova_lm`         |
| Autocorrelation   | Ljung-Box              | `sm.stats.diagnostic.acorr_ljungbox` |
| Homoscedasticity  | Breusch-Pagan          | `sm.stats.het_breuschpagan` |
|                   | White                  | `sm.stats.het_white`        |
| Normality         | D'Agostino-Pearson (omnibus) | `statsmodels.stats.stattools.omni_normtest` |
|                   | Kolmogorov-Smirnov (Lilliefors) | `statsmodels.stats.diagnostic.lilliefors` |
|                   | Shapiro-Wilk           | `scipy.stats.shapiro` (not in statsmodels) |
| Parameter Values  | Likelihood Ratio       | `results.compare_lr_test`   |
|                   | Lagrange Multiplier    | `results.compare_lm_test` (OLS), `results.score_test` (GLM) |
|                   | Wald                   | `results.wald_test`         |
| Stationarity      | Augmented Dickey-Fuller (ADF) | `sm.tsa.stattools.adfuller` |
|                   | Kwiatkowski-Phillips-Schmidt-Shin (KPSS) | `sm.tsa.stattools.kpss` |
| Time Series Causality | Granger            | `sm.tsa.stattools.grangercausalitytests` |