<a href="https://colab.research.google.com/github/cristiandarioortegayubro/BA/blob/main/da_02.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![logo](https://github.com/cristiandarioortegayubro/BA/blob/main/dba.png?raw=true)

![](https://scikit-learn.org/stable/_static/scikit-learn-logo-small.png)

[scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.covariance.EllipticEnvelope.html)

# **Anomaly Detection - Isolation Forest**


## **Loading the required libraries**

### **Data analysis**

In [1]:
import numpy as np
import pandas as pd

### **Model and preprocessing**

In [2]:
from sklearn.ensemble import IsolationForest

### **Plots**

In [3]:
import plotly.express as px

## **Obtaining the data**

### **Data generation**

In [4]:
d1 = np.random.multivariate_normal(mean = np.array([-.5, 0]), cov = np.array([[1, 0], [0, 1]]), size = 100)

In [5]:
d2 = np.random.multivariate_normal(mean = np.array([15, 10]), cov = np.array([[1, 0.3], [.3, 1]]), size = 100)

In [6]:
outliers = np.array([[0, 10],[0, 9.5]])
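
The cells above draw from NumPy's global random state without a seed, so the exact values change on every run. A minimal sketch of a reproducible variant follows, assuming a seeded `Generator`; the seed value 42 is an arbitrary choice and not part of the original notebook.

```python
import numpy as np

# Assumption: the seed 42 is arbitrary; any fixed integer makes the draws repeatable.
rng = np.random.default_rng(42)

# Same two Gaussian clusters as in the cells above, drawn from the seeded generator.
d1 = rng.multivariate_normal(mean=[-0.5, 0], cov=[[1, 0], [0, 1]], size=100)
d2 = rng.multivariate_normal(mean=[15, 10], cov=[[1, 0.3], [0.3, 1]], size=100)

# The two hand-placed outliers are unchanged.
outliers = np.array([[0, 10], [0, 9.5]])

print(d1.shape, d2.shape, outliers.shape)
```

Re-running this cell with the same seed yields the same arrays, which makes the predictions and plots further down comparable between sessions.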

### **Creating the dataframe**

In [7]:
df = pd.DataFrame(np.concatenate([d1, d2, outliers], axis = 0), columns = ['Var 1', 'Var 2'])

In [8]:
df

|     | Var 1     | Var 2     |
|-----|-----------|-----------|
| 0   | -1.044292 | -0.362318 |
| 1   | -2.047155 | -2.223847 |
| 2   | 1.645173  | 0.874702  |
| 3   | 0.948420  | 1.574408  |
| 4   | -1.862979 | 0.634817  |
| ... | ...       | ...       |
| 197 | 13.866867 | 10.884350 |
| 198 | 13.922819 | 8.466816  |
| 199 | 15.148751 | 12.095164 |
| 200 | 0.000000  | 10.000000 |


## **Model**

### **Setting up the algorithm**

In [9]:
iforest = IsolationForest(n_estimators = 50, 
                          max_samples = 202, 
                          contamination = .01, 
                          max_features = 2, 
                          bootstrap = False, 
                          n_jobs = 1, 
                          random_state = 1, 
                          verbose = 0, 
                          warm_start = False)
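
The constructor above passes `max_samples = 202`, which matches the 202 rows of `df` (100 + 100 + 2). As a minimal sketch, assuming stand-in random data of the same shape (the array `X_demo` and its seed are illustrative, not from the notebook), the fitted attribute `max_samples_` shows that the default `max_samples="auto"` resolves to the same value here, because scikit-learn uses `min(256, n_samples)` in that case.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Assumption: stand-in data with the same shape as the notebook's dataframe (202 x 2).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(202, 2))

explicit = IsolationForest(n_estimators=50, max_samples=202, random_state=1).fit(X_demo)
auto = IsolationForest(n_estimators=50, max_samples="auto", random_state=1).fit(X_demo)

# max_samples_ is the number of rows actually drawn to build each tree.
print(explicit.max_samples_, auto.max_samples_)
```

For data sets with more than 256 rows the two settings would differ, so the explicit value only matters once the data grows.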

### **Applying the algorithm to the dataframe**

In [17]:
X = np.array(df)

In [19]:
iforest.fit(X)

IsolationForest(contamination=0.01, max_features=2, max_samples=202,
                n_estimators=50, n_jobs=1, random_state=1)

### **Making the prediction**

In [20]:
prediccion = iforest.predict(X)

In [21]:
prediccion

array([ 1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1, -1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1, -1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1, -1,  1])
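
In the array above, `1` marks a point treated as normal and `-1` a point flagged as an anomaly. The long printout is hard to scan, so the following sketch extracts the positions of the flagged rows instead. It rebuilds comparable data with a fixed seed (an assumption; the notebook itself is unseeded), so the flagged indices need not match the output shown above.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Assumption: seeded stand-in for the notebook's d1, d2 and outliers.
rng = np.random.default_rng(0)
X_demo = np.concatenate([
    rng.multivariate_normal([-0.5, 0], [[1, 0], [0, 1]], size=100),
    rng.multivariate_normal([15, 10], [[1, 0.3], [0.3, 1]], size=100),
    np.array([[0, 10], [0, 9.5]]),
])

iforest_demo = IsolationForest(n_estimators=50, max_samples=202,
                               contamination=0.01, random_state=1)

# fit_predict fits the forest and returns 1 (inlier) or -1 (anomaly) per row.
prediccion_demo = iforest_demo.fit_predict(X_demo)

# Row positions of the points flagged as anomalies.
anomaly_idx = np.flatnonzero(prediccion_demo == -1)
print(len(anomaly_idx), "rows flagged at positions", anomaly_idx)
print(X_demo[anomaly_idx])
```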

### **Score**

In [22]:
score = iforest.decision_function(X)
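
`decision_function` returns a shifted anomaly score: negative values correspond to the points that `predict` labels `-1`. A minimal sketch of that relationship follows, using seeded stand-in data (an assumption, since the notebook data is unseeded). It relies on the fitted attribute `offset_` and on `score_samples`, both part of scikit-learn's `IsolationForest`.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Assumption: seeded stand-in for the notebook's d1, d2 and outliers.
rng = np.random.default_rng(0)
X_demo = np.concatenate([
    rng.multivariate_normal([-0.5, 0], [[1, 0], [0, 1]], size=100),
    rng.multivariate_normal([15, 10], [[1, 0.3], [0.3, 1]], size=100),
    np.array([[0, 10], [0, 9.5]]),
])
model = IsolationForest(n_estimators=50, max_samples=202,
                        contamination=0.01, random_state=1).fit(X_demo)

raw = model.score_samples(X_demo)            # unshifted score, lower = more anomalous
shifted = model.decision_function(X_demo)    # raw score minus the fitted threshold
labels = model.predict(X_demo)

# decision_function is score_samples shifted by offset_ ...
shift_ok = np.allclose(shifted, raw - model.offset_)
# ... and predict is the sign of that shifted score.
sign_ok = np.array_equal(labels, np.where(shifted < 0, -1, 1))
print(shift_ok, sign_ok)
```

The `contamination` parameter only moves `offset_`, which is why the score plots in the sections below keep the same shape while the predicted labels change.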

## **Model plots**

### **Contamination 0.01**

In [23]:
# df1 is a reference to df, not a copy: the new column is also added to df.
df1 = df
df1["Score"] = score

In [24]:
# Color each point by its anomaly score.
# labels maps column names to axis/legend titles.
px.scatter(df,
           x='Var 1', 
           y='Var 2',
           color="Score",
           color_continuous_scale='Inferno',
           template="simple_white",
           labels={"Var 1":"Variable 1",
                   "Var 2":"Variable 2"})

In [25]:
df1["Prediccion"] = prediccion

In [26]:
# Color each point by its predicted label (1 = normal, -1 = anomaly).
px.scatter(df,
           x='Var 1', 
           y='Var 2',
           color="Prediccion",
           color_continuous_scale='Bluered_r',
           template="simple_white",
           labels={"Var 1":"Variable 1",
                   "Var 2":"Variable 2",
                   "Prediccion":"Prediction"})

### **Contamination 0.1**

In [27]:
iforest = IsolationForest(n_estimators = 50, 
                          max_samples = 202, 
                          contamination = .1, 
                          max_features = 2, 
                          bootstrap = False, 
                          n_jobs = 1, 
                          random_state = 1, 
                          verbose = 0, 
                          warm_start = False)

In [28]:
iforest.fit(X)
prediccion = iforest.predict(X)
score = iforest.decision_function(X)

In [29]:
# df1 is a reference to df, so this overwrites the Score column from the previous run.
df1 = df
df1["Score"] = score

In [30]:
# Color each point by its anomaly score.
# labels maps column names to axis/legend titles.
px.scatter(df,
           x='Var 1', 
           y='Var 2',
           color="Score",
           color_continuous_scale='Inferno',
           template="simple_white",
           labels={"Var 1":"Variable 1",
                   "Var 2":"Variable 2"})

In [31]:
df1["Prediccion"] = prediccion

In [32]:
# Color each point by its predicted label (1 = normal, -1 = anomaly).
px.scatter(df,
           x='Var 1', 
           y='Var 2',
           color="Prediccion",
           color_continuous_scale='Bluered_r',
           template="simple_white",
           labels={"Var 1":"Variable 1",
                   "Var 2":"Variable 2",
                   "Prediccion":"Prediction"})

### **Contamination 0.3**

In [33]:
iforest = IsolationForest(n_estimators = 50, 
                          max_samples = 202, 
                          contamination = .3, 
                          max_features = 2, 
                          bootstrap = False, 
                          n_jobs = 1, 
                          random_state = 1, 
                          verbose = 0, 
                          warm_start = False)

In [34]:
iforest.fit(X)
prediccion = iforest.predict(X)
score = iforest.decision_function(X)

In [35]:
# df2 is a reference to df, so this overwrites the Score column from the previous run.
df2 = df
df2["Score"] = score

In [36]:
# Color each point by its anomaly score.
# labels maps column names to axis/legend titles.
px.scatter(df,
           x='Var 1', 
           y='Var 2',
           color="Score",
           color_continuous_scale='Inferno',
           template="simple_white",
           labels={"Var 1":"Variable 1",
                   "Var 2":"Variable 2"})

In [37]:
df2["Prediccion"] = prediccion

In [38]:
# Color each point by its predicted label (1 = normal, -1 = anomaly).
px.scatter(df,
           x='Var 1', 
           y='Var 2',
           color="Prediccion",
           color_continuous_scale='Bluered_r',
           template="simple_white",
           labels={"Var 1":"Variable 1",
                   "Var 2":"Variable 2",
                   "Prediccion":"Prediction"})
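
The three sections above differ only in the `contamination` value. As a compact way to compare them, the sketch below counts how many points each setting flags. It uses seeded stand-in data (an assumption; the notebook data is unseeded), so the exact counts may differ from the notebook's, but they should be close to `contamination` times the number of rows.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Assumption: seeded stand-in for the notebook's d1, d2 and outliers.
rng = np.random.default_rng(0)
X_demo = np.concatenate([
    rng.multivariate_normal([-0.5, 0], [[1, 0], [0, 1]], size=100),
    rng.multivariate_normal([15, 10], [[1, 0.3], [0.3, 1]], size=100),
    np.array([[0, 10], [0, 9.5]]),
])

# Number of points labelled -1 for each contamination value used above.
counts = {}
for c in (0.01, 0.1, 0.3):
    model = IsolationForest(n_estimators=50, max_samples=202,
                            contamination=c, random_state=1)
    counts[c] = int((model.fit_predict(X_demo) == -1).sum())

print(counts)
```

Because `contamination` sets the score threshold as a percentile of the training scores, raising it flags more of the ordinary cluster points as anomalies, not only the two hand-placed outliers.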