# Anomaly Detection API Demonstration

This notebook fits a binomial GLM with `statsmodels` to a simple synthetic
dataset and uses two of its diagnostic measures, Cook's distance and deviance
residuals, for anomaly detection.


In [16]:
import numpy as np
import pandas as pd
import statsmodels.api as sm


In [17]:
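# Fix the seed so the synthetic data is reproducible.
np.random.seed(0)

# One standard-normal feature; the label is 1 when x plus Gaussian noise is positive.
X = np.random.normal(size=100)
y = (X + np.random.normal(scale=0.5, size=100) > 0).astype(int)

df = pd.DataFrame({"x": X, "y": y})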


In [18]:
# Add an intercept column, then fit a binomial GLM (logit is the default link).
X_design = sm.add_constant(df["x"])
model = sm.GLM(df["y"], X_design, family=sm.families.Binomial())
results = model.fit()


In [19]:
influence = results.get_influence()
# cooks_distance is a (distances, p-values) pair; keep only the distances.
df["cooks_distance"] = influence.cooks_distance[0]
df["deviance_resid"] = results.resid_deviance
df.head()


|   | x | y | cooks_distance | deviance_resid |
|---|---|---|---|---|
| 0 | 1.764052 | 1 | 1.626526e-06 | 0.050515 |
| 1 | 0.400157 | 0 | 0.07830727 | -1.920425 |
| 2 | 0.978738 | 1 | 0.000175582 | 0.211465 |
| 3 | 2.240893 | 1 | 7.788572e-08 | 0.021114 |
| 4 | 1.867558 | 1 | 8.482332e-07 | 0.041803 |


The two diagnostics answer different questions. A deviance residual measures how
well the model explains an individual observation, while Cook's distance
measures how strongly that observation influences the fitted parameters.
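To make the deviance residual concrete, here is a minimal sketch of the formula for a single binary observation, assuming a fitted probability `p` strictly between 0 and 1. The function name `bernoulli_deviance_residual` and the two probabilities are hypothetical, chosen only for illustration; they are not taken from the fitted model above.

```python
import math


def bernoulli_deviance_residual(y, p):
    """Deviance residual for one binary observation y (0 or 1).

    p is the fitted probability that y == 1 and must satisfy 0 < p < 1.
    """
    # Log-likelihood contribution of this observation under the fitted model.
    log_lik = math.log(p) if y == 1 else math.log(1.0 - p)
    # For binary data the saturated model has log-likelihood 0, so the
    # unit deviance is -2 * log_lik; the residual takes the sign of y - p.
    return math.copysign(math.sqrt(-2.0 * log_lik), y - p)


# Hypothetical fitted probabilities, for illustration only.
well_fit = bernoulli_deviance_residual(1, 0.95)    # y = 1 and the model agrees
poorly_fit = bernoulli_deviance_residual(0, 0.85)  # y = 0 but the model expects 1
print(well_fit, poorly_fit)
```

The residual stays near zero when the fitted probability agrees with the observed label and grows in magnitude as the two disagree, which is why a large absolute value marks an observation the model explains poorly.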

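Observations whose Cook's distance or absolute deviance residual is unusually
large relative to the rest of the data are candidate anomalies and merit
further investigation. Among the five rows shown, row 1 (x = 0.40, y = 0) stands
out: its Cook's distance of 0.078 and deviance residual of -1.92 are far larger
in magnitude than those of the other four rows.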
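One possible way to turn the diagnostics into flags is sketched below. The cutoffs are assumptions, not part of the original analysis: `4 / n` for Cook's distance and `2` for the absolute deviance residual are common rules of thumb, and the helper `flag_anomalies` is hypothetical. The frame is rebuilt from the five rows of the `df.head()` output so that the example runs on its own; in the notebook the same function could be applied to `df` directly.

```python
import pandas as pd

# The five rows shown in the df.head() output above.
diag = pd.DataFrame({
    "x": [1.764052, 0.400157, 0.978738, 2.240893, 1.867558],
    "y": [1, 0, 1, 1, 1],
    "cooks_distance": [1.626526e-06, 0.07830727, 0.000175582,
                       7.788572e-08, 8.482332e-07],
    "deviance_resid": [0.050515, -1.920425, 0.211465, 0.021114, 0.041803],
})


def flag_anomalies(frame, n_obs, resid_cutoff=2.0):
    """Return a copy of frame with boolean flag columns (hypothetical helper)."""
    out = frame.copy()
    # Rule-of-thumb cutoff for Cook's distance (an assumption, not a hard rule).
    cooks_cutoff = 4.0 / n_obs
    out["high_influence"] = out["cooks_distance"] > cooks_cutoff
    out["large_resid"] = out["deviance_resid"].abs() > resid_cutoff
    # Flag a row if either diagnostic exceeds its cutoff.
    out["flagged"] = out["high_influence"] | out["large_resid"]
    return out


# The model above was fitted on 100 observations, so n_obs=100.
flagged = flag_anomalies(diag, n_obs=100)
print(flagged[["x", "y", "high_influence", "large_resid", "flagged"]])
```

With these cutoffs only row 1 is flagged, and only through Cook's distance; its deviance residual falls just short of the cutoff of 2. Both cutoffs are conventions and are worth tuning to the dataset at hand.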
