Automated (Unsupervised) Anomaly Detection Preprocessing Pipeline

I used sklearn's Pipeline and Transformer concepts to build this preprocessing pipeline.
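For readers unfamiliar with the Pipeline and Transformer concept, here is a minimal sketch of how a custom transformer plugs into an sklearn `Pipeline`. The `DropConstantColumns` class and the toy DataFrame are hypothetical and only illustrate the pattern; they are not taken from this repository's code.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class DropConstantColumns(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: drops columns that hold a single value."""

    def fit(self, X, y=None):
        # Remember which columns actually vary in the training data.
        self.keep_columns_ = [
            col for col in X.columns if X[col].nunique(dropna=False) > 1
        ]
        return self

    def transform(self, X):
        # Keep only the columns selected during fit.
        return X[self.keep_columns_]


# Each step is a (name, transformer) pair; steps run in order.
pipe = Pipeline([
    ("drop_constant", DropConstantColumns()),
    ("scale", StandardScaler()),
])

# Toy data (assumption): column "b" is constant and should be dropped.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [5.0, 5.0, 5.0],
    "c": [0.0, 10.0, 20.0],
})

X_scaled = pipe.fit_transform(df)
print(X_scaled.shape)
```

Any object that implements `fit` and `transform` can be chained this way, which is what makes the approach convenient for stacking preprocessing steps.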

Abstract View - Project

Decision rules of the pipeline

How to use the pipeline

import numpy as np
import pandas as pd
from dataqualitypipeline import initialize_autoencoder, initialize_autoencoder_modified
from pyod.models.iforest import IForest
from pyod.models.lof import LOF

df_data = pd.read_csv("./HOWTO/players_20.csv")
clf_lof = LOF(n_jobs=-1)

# Init Preprocessing Pipeline
from dataqualitypipeline import DQPipeline
dq_pipe = DQPipeline(
    nominal_columns=["player_tags","preferred_foot",
                     "work_rate","team_position","loaned_from"],

    exclude_columns=["player_url","body_type","short_name", "long_name", 
                     "team_jersey_number","joined","contract_valid_until",
                     "real_face","nation_position","player_positions","nationality","club"],

    time_column_names=["dob"],
    deactivate_pattern_recognition=True,
    remove_columns_with_no_variance=True,
)


# Run the preprocessing pipeline (named dq_pipe) and pass in
# the anomaly detection model (clf)
X_output = dq_pipe.run_pipeline(
    X_train=df_data.iloc[:,0:37],
    clf=clf_lof,
    dump_model=False,
)

X_output.head(40)

Check out the how_to.ipynb notebook to see how to use this pipeline.
- It contains an example that uses only training data (unsupervised).

Feel free to contribute 🙂

Reference

https://www.researchgate.net/publication/379640146_Detektion_von_Anomalien_in_der_Datenqualitatskontrolle_mittels_unuberwachter_Ansatze (thesis, in German)

JAdelhelm/Automated-Anomaly-Detection-Preprocessing-Pipeline
