<a href="https://colab.research.google.com/github/cristiandarioortegayubro/BA/blob/main/da_01.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![logo](https://github.com/cristiandarioortegayubro/BA/blob/main/dba.png?raw=true)

![](https://scikit-learn.org/stable/_static/scikit-learn-logo-small.png)

[scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.covariance.EllipticEnvelope.html)

# **Anomaly Detection - FastMCD**


## **Loading the required libraries**

### **Data analysis**

In [1]:
import numpy as np
import pandas as pd

### **Model and preprocessing**

In [2]:
from sklearn.covariance import EllipticEnvelope

### **Plots**

In [3]:
import matplotlib.pyplot as plt
import seaborn as sns
import altair as alt
import plotly.express as px

## **Obtaining the data**

### **Data generation**

In [4]:
d1 = np.random.multivariate_normal(mean = np.array([-.5, 0]), cov = np.array([[1, 0], [0, 1]]), size = 100)

In [5]:
d2 = np.random.multivariate_normal(mean = np.array([15, 10]), cov = np.array([[1, 0.3], [.3, 1]]), size = 100)

In [6]:
outliers = np.array([[0, 10],[0, 9.5]])

### **Creating the dataframe**

In [7]:
df = pd.DataFrame(np.concatenate([d1, d2, outliers], axis = 0), columns = ['Var 1', 'Var 2'])

In [8]:
df

| | Var 1 | Var 2 |
|---|---|---|
| 0 | 0.173589 | 0.443017 |
| 1 | -0.637995 | 0.381352 |
| 2 | -1.328323 | 1.278494 |
| 3 | 0.831462 | -1.381213 |
| 4 | 0.847861 | 0.489595 |
| ... | ... | ... |
| 197 | 12.735312 | 9.025267 |
| 198 | 16.098521 | 10.243032 |
| 199 | 16.723681 | 10.123145 |
| 200 | 0.000000 | 10.000000 |
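
The draws above use NumPy's global random state without a seed, so the values in the table change on every run. As a minimal sketch, assuming a reproducible dataset is wanted, the same three pieces can be generated from a seeded `Generator`. The seed value `42` and the names `rng` and `df_seeded` are illustrative choices, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Seeded generator: the seed (42) is an arbitrary, illustrative choice
rng = np.random.default_rng(42)

# Same two Gaussian clusters as in the cells above
d1 = rng.multivariate_normal(mean=[-0.5, 0], cov=[[1, 0], [0, 1]], size=100)
d2 = rng.multivariate_normal(mean=[15, 10], cov=[[1, 0.3], [0.3, 1]], size=100)

# Same two hand-placed outliers
outliers = np.array([[0, 10], [0, 9.5]])

df_seeded = pd.DataFrame(np.concatenate([d1, d2, outliers], axis=0),
                         columns=['Var 1', 'Var 2'])
print(df_seeded.shape)  # (202, 2)
```

The resulting numbers will differ from the table above, which came from an unseeded run.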


## **Model**

### **Setting up the algorithm**

In [9]:
el = EllipticEnvelope(store_precision=True, 
                      assume_centered=False, 
                      support_fraction=None, 
                      contamination=0.0075, 
                      random_state=0)

### **Fitting the algorithm to the dataframe**

In [10]:
el.fit(df)

EllipticEnvelope(contamination=0.0075, random_state=0)
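
After fitting, the estimator exposes the robust location and covariance found by the Minimum Covariance Determinant fit. The sketch below is self-contained and builds its own seeded synthetic data (an assumption, so it does not reproduce the numbers in this notebook). `location_`, `covariance_`, `support_` and `offset_` are attributes of a fitted `EllipticEnvelope`; the names `X` and `el_demo` are illustrative.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

# Illustrative data shaped like the notebook's: two clusters plus two outliers
rng = np.random.default_rng(0)
X = np.concatenate([
    rng.multivariate_normal([-0.5, 0], [[1, 0], [0, 1]], size=100),
    rng.multivariate_normal([15, 10], [[1, 0.3], [0.3, 1]], size=100),
    [[0, 10], [0, 9.5]],
])

el_demo = EllipticEnvelope(contamination=0.0075, random_state=0).fit(X)

print(el_demo.location_)       # robust estimate of the center
print(el_demo.covariance_)     # robust estimate of the covariance matrix
print(el_demo.support_.sum())  # number of observations used for the robust estimates
print(el_demo.offset_)         # threshold on the scores that separates inliers from outliers
```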

### **Making the prediction**

In [11]:
df['Anomalia'] = el.predict(df)

In [12]:
df

| | Var 1 | Var 2 | Anomalia |
|---|---|---|---|
| 0 | 0.173589 | 0.443017 | 1 |
| 1 | -0.637995 | 0.381352 | 1 |
| 2 | -1.328323 | 1.278494 | 1 |
| 3 | 0.831462 | -1.381213 | 1 |
| 4 | 0.847861 | 0.489595 | 1 |
| ... | ... | ... | ... |
| 197 | 12.735312 | 9.025267 | 1 |
| 198 | 16.098521 | 10.243032 | 1 |
| 199 | 16.723681 | 10.123145 | 1 |
| 200 | 0.000000 | 10.000000 | -1 |
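
`predict` labels inliers as `1` and outliers as `-1`, so a truncated dataframe view shows only a few of the labels. A short sketch for summarising them follows; it regenerates illustrative seeded data (an assumption), and the names `demo`, `model`, `counts` and `flagged` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.covariance import EllipticEnvelope

# Illustrative data shaped like the notebook's
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    np.concatenate([
        rng.multivariate_normal([-0.5, 0], [[1, 0], [0, 1]], size=100),
        rng.multivariate_normal([15, 10], [[1, 0.3], [0.3, 1]], size=100),
        [[0, 10], [0, 9.5]],
    ]),
    columns=['Var 1', 'Var 2'],
)

model = EllipticEnvelope(contamination=0.0075, random_state=0)
# fit_predict fits the model and returns the 1 / -1 labels in one call
demo['Anomalia'] = model.fit_predict(demo[['Var 1', 'Var 2']])

counts = demo['Anomalia'].value_counts()  # how many rows got each label
flagged = demo[demo['Anomalia'] == -1]    # only the rows flagged as anomalies
print(counts)
print(flagged)
```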


## **Model plots**

### **Contamination 0.0075**

In [21]:
px.scatter(df,
           x='Var 1', 
           y='Var 2',
           color=df["Anomalia"].astype(str),
           color_continuous_scale='Bluered_r',
           template="simple_white",
           labels={"Var 1":"Variable 1",
                   "Var 2":"Variable 2",
                   "color":"Anomalia"})

In [22]:
px.scatter(df,
           x='Var 1', 
           y='Var 2',
           color="Anomalia",
           color_continuous_scale='Bluered_r',
           template="simple_white",
           labels={"Var 1":"Variable 1",
                   "Var 2":"Variable 2",
                   "color":"Anomalia"})
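
The `contamination` parameter is the expected proportion of outliers, and the fitted threshold `offset_` is chosen so that roughly that share of the training data is flagged. The following sketch, on illustrative seeded data (an assumption) and with the hypothetical names `X` and `fractions`, compares the two values used in this notebook.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

# Illustrative data shaped like the notebook's
rng = np.random.default_rng(0)
X = np.concatenate([
    rng.multivariate_normal([-0.5, 0], [[1, 0], [0, 1]], size=100),
    rng.multivariate_normal([15, 10], [[1, 0.3], [0.3, 1]], size=100),
    [[0, 10], [0, 9.5]],
])

fractions = {}
for contamination in (0.0075, 0.1):
    model = EllipticEnvelope(contamination=contamination, random_state=0).fit(X)
    # Share of training points labelled -1 (outlier)
    fractions[contamination] = np.mean(model.predict(X) == -1)
    print(contamination, model.offset_, fractions[contamination])
```

A larger contamination raises the threshold, so more points fall on the outlier side.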

### **Contamination 0.1**

In [23]:
el = EllipticEnvelope(store_precision=True, 
                      assume_centered=False, 
                      support_fraction=None, 
                      contamination=0.1, 
                      random_state=0)

In [24]:
# Fit on the two features only; df also holds the earlier 'Anomalia' labels
el.fit(df[['Var 1', 'Var 2']])

EllipticEnvelope(random_state=0)

In [25]:
df['Anomalia'] = el.predict(df[['Var 1', 'Var 2']])

In [26]:
px.scatter(df,
           x='Var 1',
           y='Var 2',
           color=df["Anomalia"].astype(str),
           color_continuous_scale='Bluered_r',
           template="simple_white",
           labels={"Var 1":"Variable 1",
                   "Var 2":"Variable 2",
                   "color":"Anomalia"})

In [27]:
px.scatter(df,
           x='Var 1',
           y='Var 2',
           color="Anomalia",
           color_continuous_scale='Bluered_r',
           template="simple_white",
           labels={"Var 1":"Variable 1",
                   "Var 2":"Variable 2",
                   "color":"Anomalia"})

## **Estimating the contamination**

### **Mahalanobis distance**

In [28]:
# mahalanobis() returns squared distances; pass only the feature columns
df['Distancia de Mahalanobis'] = el.mahalanobis(df[['Var 1', 'Var 2']])

In [29]:
df

| | Var 1 | Var 2 | Anomalia | Distancia de Mahalanobis |
|---|---|---|---|---|
| 0 | 0.173589 | 0.443017 | 1 | 0.891479 |
| 1 | -0.637995 | 0.381352 | 1 | 1.119987 |
| 2 | -1.328323 | 1.278494 | 1 | 4.751889 |
| 3 | 0.831462 | -1.381213 | -1 | 9.110237 |
| 4 | 0.847861 | 0.489595 | 1 | 1.162981 |
| ... | ... | ... | ... | ... |
| 197 | 12.735312 | 9.025267 | 1 | 0.741682 |
| 198 | 16.098521 | 10.243032 | 1 | 1.556690 |
| 199 | 16.723681 | 10.123145 | 1 | 2.800120 |
| 200 | 0.000000 | 10.000000 | -1 | 127.270967 |
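
The `mahalanobis` method returns squared Mahalanobis distances, that is $(x - \mu)^T \Sigma^{-1} (x - \mu)$, with $\mu$ and $\Sigma$ the robust location and covariance of the fitted model. As a sketch on illustrative seeded data (an assumption), the same quantity can be recomputed by hand from `location_` and `precision_`; the names `X`, `model`, `diff` and `manual_sq` are hypothetical.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

# Illustrative data shaped like the notebook's
rng = np.random.default_rng(0)
X = np.concatenate([
    rng.multivariate_normal([-0.5, 0], [[1, 0], [0, 1]], size=100),
    rng.multivariate_normal([15, 10], [[1, 0.3], [0.3, 1]], size=100),
    [[0, 10], [0, 9.5]],
])

# store_precision=True (the default) keeps the inverse covariance in precision_
model = EllipticEnvelope(contamination=0.1, random_state=0).fit(X)

# Center each row on the robust location, then evaluate the quadratic form row by row
diff = X - model.location_
manual_sq = np.einsum('ij,jk,ik->i', diff, model.precision_, diff)

# Compare the manual values against the library's
print(np.allclose(manual_sq, model.mahalanobis(X)))
```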


In [34]:
px.box(df, 
       y="Distancia de Mahalanobis",
       template= "simple_white")

In [33]:
px.box(df, 
       y="Distancia de Mahalanobis",
       color="Anomalia",
       template= "gridon")
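
One possible way to turn the box plot into a number, offered here as an assumption rather than as this notebook's method, is to count the points above the upper Tukey fence (Q3 + 1.5 × IQR) of the distances and use their share as a rough contamination estimate. The sketch runs on illustrative seeded data, and the names `dist`, `upper_fence` and `estimated_contamination` are hypothetical.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

# Illustrative data shaped like the notebook's
rng = np.random.default_rng(0)
X = np.concatenate([
    rng.multivariate_normal([-0.5, 0], [[1, 0], [0, 1]], size=100),
    rng.multivariate_normal([15, 10], [[1, 0.3], [0.3, 1]], size=100),
    [[0, 10], [0, 9.5]],
])

model = EllipticEnvelope(contamination=0.1, random_state=0).fit(X)
dist = model.mahalanobis(X)  # squared Mahalanobis distances

# Upper whisker limit of a standard box plot (Tukey fence)
q1, q3 = np.percentile(dist, [25, 75])
upper_fence = q3 + 1.5 * (q3 - q1)

# Share of points beyond the fence, used as a rough contamination estimate
estimated_contamination = np.mean(dist > upper_fence)
print(upper_fence, estimated_contamination)
```

`EllipticEnvelope` only accepts contamination values in the range (0, 0.5], so an estimate outside that range would need to be clipped before refitting.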