# Credit Card Anomaly Detection Using Unsupervised Techniques

Anomaly detection is the identification of data points, items, observations, or events that do not conform to the expected pattern of a given group. Such anomalies are rare, but they can signal a serious threat such as a cyber intrusion or fraud. Anomaly detection is widely used in behavioral analysis and related fields to detect, identify, and predict these occurrences.

##### What to expect?

This notebook covers 10 steps toward fraud detection, so sit back and I will try to be as clear as possible.

    1. Algorithm/Model Selection
    2. Data identification and exploration
    3. Data visualization and presentation
    4. Dataset splitting and training
    5. Dataset pre-processing 
    6. Resampling of data in the dataset
    7. Dataset outlier detection using various algorithms (IForest, LOF, COPOD, and DAN)
    8. Visualization of the outliers and inliers (concentrating more on the outliers)
    9. Evaluation and metrics
    10. Predicting fraudulent transactions with ‘unseen data’

Let's Go!!!

### 1. Algorithm/Model Selection

Anomaly detection can be approached in many ways, depending on the nature of the data and the circumstances. One classification of these techniques is given at https://iwringer.wordpress.com/2015/11/17/anomaly-detection-concepts-and-techniques/.

A system for anomaly detection should ideally NOT rely on a supervised ML algorithm, because such a model tends to learn only the kinds of anomalies it has seen during training. The true magic lies in being able to identify an anomaly never seen before...

Some of the algorithms we will test out are:
- Multivariate Gaussian probability
- Autoencoders
- Local Outlier Factor (LOF)
- Robust Covariance (Elliptic Envelope)
- Isolation Forest
- One Class SVM
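
To make the unsupervised idea concrete, here is a minimal sketch of two of the listed detectors (Isolation Forest and LOF) fitted without any labels. The data is synthetic, and the `contamination` value, the cluster size, and the planted outlier coordinates are all assumptions chosen for illustration; they are not taken from the credit card dataset.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)

# Synthetic stand-in data (an assumption, not the real dataset):
# 250 "normal" points around the origin plus 5 planted outliers far away.
normal = rng.normal(loc=0.0, scale=1.0, size=(250, 2))
planted = np.array([[8.0, 8.0], [-9.0, 7.5], [10.0, -8.0], [-8.5, -9.0], [9.5, 0.0]])
X = np.vstack([normal, planted])

# contamination is the assumed share of outliers; 0.02 is an illustrative guess.
iso = IsolationForest(contamination=0.02, random_state=0)
iso_labels = iso.fit_predict(X)  # +1 = inlier, -1 = outlier

lof = LocalOutlierFactor(n_neighbors=20, contamination=0.02)
lof_labels = lof.fit_predict(X)  # same label convention

# The planted outliers occupy the last five row indices (250..254).
print("IsolationForest flagged rows:", np.flatnonzero(iso_labels == -1))
print("LOF flagged rows:", np.flatnonzero(lof_labels == -1))
```

Neither model is ever shown which rows are anomalous; both decide purely from how isolated or how sparsely surrounded each point is.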

### 2. Data identification and exploration

The dataset contains transactions made by credit cards in September 2013 by European cardholders.
This dataset presents transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.
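
A quick sketch of the arithmetic behind that imbalance, using only the two counts quoted above. The "always predict legitimate" baseline is an illustration I am adding to show why plain accuracy says little on data this skewed; it is not a model from this notebook.

```python
# Counts quoted in the dataset description.
n_total = 284_807
n_fraud = 492

# Share of transactions that are fraudulent.
fraud_rate = n_fraud / n_total

# Hypothetical baseline: a "model" that labels every transaction as legitimate.
# It is right on every non-fraud row and wrong on every fraud row.
baseline_accuracy = (n_total - n_fraud) / n_total

print(f"fraud rate: {fraud_rate:.4%}")
print(f"accuracy of always predicting 'legitimate': {baseline_accuracy:.4%}")
```

A detector that catches zero frauds would still look nearly perfect by accuracy alone, which is why precision, recall, and F1 matter more here.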

Due to confidentiality issues, features V1 to V28 have been transformed using PCA. The only features that have not been transformed with PCA are 'Time' and 'Amount'. Feature 'Time' contains the seconds elapsed between each transaction and the first transaction in the dataset.
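
For intuition, here is a rough sketch of how a table with this shape could be produced. Everything in it is hypothetical: the timestamps, the `secret_*` column names, and the values are invented for illustration, and the actual preprocessing applied by the dataset's publishers is not known beyond the description above.

```python
import pandas as pd
from sklearn.decomposition import PCA

# Hypothetical raw transactions; none of these values come from the real dataset.
raw = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2013-09-01 00:00:00",
        "2013-09-01 00:00:01",
        "2013-09-01 00:02:30",
        "2013-09-02 12:00:00",
    ]),
    "amount": [12.5, 80.0, 3.2, 250.0],
    "secret_a": [0.10, 0.40, 0.35, 0.90],
    "secret_b": [1.00, 0.80, 0.82, 0.10],
    "secret_c": [5.00, 4.10, 4.30, 1.20],
})

# 'Time': seconds elapsed since the first transaction.
elapsed = (raw["timestamp"] - raw["timestamp"].iloc[0]).dt.total_seconds()

# Replace the confidential columns with principal components (V1, V2 here).
pca = PCA(n_components=2)
components = pca.fit_transform(raw[["secret_a", "secret_b", "secret_c"]])

anonymized = pd.DataFrame(components, columns=["V1", "V2"])
anonymized.insert(0, "Time", elapsed)
anonymized["Amount"] = raw["amount"]

print(anonymized)
```

The principal components keep much of the variance of the original columns while hiding what each column meant, so 'Time' and 'Amount' remain the only directly interpretable features.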

##### Import all modules

```python
# Data processing and visualization
import numpy as np
from numpy import ma
import pandas as pd
import math
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.colors as colors
%matplotlib inline
from matplotlib import ticker, cm
from matplotlib.pyplot import figure
import seaborn as sns

# Modeling
from scipy.stats import multivariate_normal
from sklearn.metrics import f1_score, confusion_matrix, classification_report, precision_recall_fscore_support
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.neighbors import LocalOutlierFactor
from sklearn.ensemble import IsolationForest
from sklearn.covariance import EllipticEnvelope
from sklearn.svm import OneClassSVM

# Others
import os
```