<a href="https://colab.research.google.com/github/cristiandarioortegayubro/BA/blob/main/pc_da_01.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![logo](https://github.com/cristiandarioortegayubro/BA/blob/main/dba.png?raw=true)

![logo](https://github.com/cristiandarioortegayubro/BA/blob/main/PyCaret.png?raw=true)

# **Anomaly Detection - PyCaret**


## **Installing the library**

In [1]:
!pip install --pre pycaret[full]



## **Loading the required libraries**

### **Data analysis**

In [2]:
import numpy as np
import pandas as pd

### **Model and preprocessing**

In [3]:
from pycaret.anomaly import *

## **Obtaining the data**

### **Data generation**

In [4]:
d1 = np.random.multivariate_normal(mean = np.array([-.5, 0]), cov = np.array([[1, 0], [0, 1]]), size = 100)

In [5]:
d2 = np.random.multivariate_normal(mean = np.array([15, 10]), cov = np.array([[1, 0.3], [.3, 1]]), size = 100)

In [6]:
outliers = np.array([[0, 10],[0, 9.5]])

### **Creating the dataframe**

In [7]:
df = pd.DataFrame(np.concatenate([d1, d2, outliers], axis = 0), columns = ['Var 1', 'Var 2'])

In [8]:
df

| | Var 1 | Var 2 |
|---|---|---|
| 0 | -0.290685 | 0.157077 |
| 1 | 1.950299 | -1.316325 |
| 2 | 0.229396 | -0.135638 |
| 3 | -1.471264 | -0.107086 |
| 4 | -1.513783 | 0.925924 |
| ... | ... | ... |
| 197 | 15.238485 | 10.589741 |
| 198 | 14.535354 | 9.701652 |
| 199 | 14.242991 | 9.689799 |
| 200 | 0.000000 | 10.000000 |
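
The cells above draw from NumPy's global random state without a seed, so the values change on every run (`session_id` in `setup` only seeds PyCaret itself). A minimal sketch of a reproducible variant, assuming a seeded `Generator`; the seed value `123` and the name `df_seeded` are illustrative choices, not part of the original notebook:

```python
import numpy as np
import pandas as pd

# Seeded generator so the synthetic data is identical on every run.
# The seed 123 is an arbitrary choice for this sketch.
rng = np.random.default_rng(123)

# Two Gaussian clusters with the same parameters as d1 and d2 above.
d1 = rng.multivariate_normal(mean=[-0.5, 0], cov=[[1, 0], [0, 1]], size=100)
d2 = rng.multivariate_normal(mean=[15, 10], cov=[[1, 0.3], [0.3, 1]], size=100)

# Two hand-placed points far from both clusters.
outliers = np.array([[0, 10], [0, 9.5]])

# Stack everything into a single two-column dataframe.
df_seeded = pd.DataFrame(
    np.concatenate([d1, d2, outliers], axis=0),
    columns=['Var 1', 'Var 2'],
)
print(df_seeded.shape)
```

The numbers will differ from the table above, since that output came from an unseeded run.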


In [9]:
# Initialize the anomaly-detection experiment; session_id fixes PyCaret's random seed.
anomalia = setup(df, session_id = 123)

| | Description | Value |
|---|---|---|
| 0 | Session id | 123 |
| 1 | Original data shape | (202, 2) |
| 2 | Transformed data shape | (202, 2) |
| 3 | Numeric features | 2 |
| 4 | Preprocess | True |
| 5 | Imputation type | simple |
| 6 | Numeric imputation | mean |
| 7 | Categorical imputation | constant |
| 8 | Low variance threshold | 0 |
| 9 | CPU Jobs | -1 |


In [10]:
models()

| ID | Name | Reference |
|---|---|---|
| abod | Angle-base Outlier Detection | pyod.models.abod.ABOD |
| cluster | Clustering-Based Local Outlier | pyod.models.cblof.CBLOF |
| cof | Connectivity-Based Local Outlier | pyod.models.cof.COF |
| iforest | Isolation Forest | pyod.models.iforest.IForest |
| histogram | Histogram-based Outlier Detection | pyod.models.hbos.HBOS |
| knn | K-Nearest Neighbors Detector | pyod.models.knn.KNN |
| lof | Local Outlier Factor | pyod.models.lof.LOF |
| svm | One-class SVM detector | pyod.models.ocsvm.OCSVM |
| pca | Principal Component Analysis | pyod.models.pca.PCA |
| mcd | Minimum Covariance Determinant | pyod.models.mcd.MCD |


In [11]:
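# NOTE: the printed model below still reports contamination=0.05, so the
# `contamination` keyword does not appear to take effect in this call.
# PyCaret's create_model has a `fraction` argument (default 0.05) that sets
# the expected share of outliers.
iforest = create_model('iforest',
                       max_samples = 202,
                       contamination = 0.01,
                       n_estimators = 50)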


In [12]:
print(iforest)

IForest(behaviour='new', bootstrap=False, contamination=0.05,
    max_features=1.0, max_samples=202, n_estimators=50, n_jobs=-1,
    random_state=123, verbose=0)
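
The `contamination` value in the printout is the expected share of anomalous rows. A minimal sketch of how such a share can be turned into 0/1 labels with a percentile cutoff on the anomaly scores, which is a common approach in outlier-detection libraries. The helper `label_by_contamination` and the scores below are hypothetical and made up for illustration; they are not PyCaret or PyOD API and not values from this notebook:

```python
import numpy as np

def label_by_contamination(scores, contamination):
    """Flag scores above the (1 - contamination) percentile as anomalies."""
    threshold = np.percentile(scores, 100 * (1 - contamination))
    labels = (scores > threshold).astype(int)
    return labels, threshold

# Illustrative scores only: 200 ordinary values plus two clearly larger ones.
rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(-0.15, 0.03, size=200), [0.10, 0.12]])

# Compare how many rows each contamination share flags.
for c in (0.01, 0.05):
    labels, threshold = label_by_contamination(scores, c)
    print(f"contamination={c}: {int(labels.sum())} rows flagged")
```

A larger share lowers the cutoff, so more ordinary points end up flagged along with the true outliers.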


In [13]:
iforest_results = assign_model(iforest)
iforest_results

| | Var 1 | Var 2 | Anomaly | Anomaly_Score |
|---|---|---|---|---|
| 0 | -0.290685 | 0.157077 | 0 | -0.184737 |
| 1 | 1.950299 | -1.316325 | 1 | 0.010671 |
| 2 | 0.229396 | -0.135638 | 0 | -0.179353 |
| 3 | -1.471264 | -0.107086 | 0 | -0.152105 |
| 4 | -1.513783 | 0.925924 | 0 | -0.134594 |
| ... | ... | ... | ... | ... |
| 197 | 15.238484 | 10.589741 | 0 | -0.168360 |
| 198 | 14.535355 | 9.701652 | 0 | -0.176116 |
| 199 | 14.242991 | 9.689799 | 0 | -0.161414 |
| 200 | 0.000000 | 10.000000 | 1 | 0.102779 |
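
To inspect only the rows the model flagged, the result frame can be filtered on the `Anomaly` column. A small sketch, assuming just the rows visible in the truncated output above (rebuilt by hand into a frame named `preview`, which is an illustrative name):

```python
import pandas as pd

# Rows copied from the truncated output shown above.
preview = pd.DataFrame(
    {
        'Var 1': [-0.290685, 1.950299, 0.229396, -1.471264, -1.513783,
                  15.238484, 14.535355, 14.242991, 0.000000],
        'Var 2': [0.157077, -1.316325, -0.135638, -0.107086, 0.925924,
                  10.589741, 9.701652, 9.689799, 10.000000],
        'Anomaly': [0, 1, 0, 0, 0, 0, 0, 0, 1],
        'Anomaly_Score': [-0.184737, 0.010671, -0.179353, -0.152105, -0.134594,
                          -0.168360, -0.176116, -0.161414, 0.102779],
    },
    index=[0, 1, 2, 3, 4, 197, 198, 199, 200],
)

# Keep only flagged rows, highest anomaly score first.
flagged = (
    preview[preview['Anomaly'] == 1]
    .sort_values('Anomaly_Score', ascending=False)
)
print(flagged)
```

On the full notebook, the same expression should work with `iforest_results` in place of `preview`.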


In [14]:
plot_model(iforest, plot = 'umap')