# Fraud Detection

This notebook studies a fraud detection problem. Fraud cases are so rare that a dummy classifier, which learns nothing and labels every transaction as non-fraud, still achieves a very high and therefore misleading score.

## Importing needed libraries

In [1]:
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_validate
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import ConfusionMatrixDisplay
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report



## Importing and splitting the data

In [None]:

df = pd.read_csv('/content/creditcard.csv')
df.head(3)

,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
0,0.0,-1.359807,-0.072781,2.536347,1.378155,-0.338321,0.462388,0.239599,0.098698,0.363787,...,-0.018307,0.277838,-0.110474,0.066928,0.128539,-0.189115,0.133558,-0.021053,149.62,0
1,0.0,1.191857,0.266151,0.16648,0.448154,0.060018,-0.082361,-0.078803,0.085102,-0.255425,...,-0.225775,-0.638672,0.101288,-0.339846,0.16717,0.125895,-0.008983,0.014724,2.69,0
2,1.0,-1.358354,-1.340163,1.773209,0.37978,-0.503198,1.800499,0.791461,0.247676,-1.514654,...,0.247998,0.771679,0.909412,-0.689281,-0.327642,-0.139097,-0.055353,-0.059752,378.66,0


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 41841 entries, 0 to 41840
Data columns (total 31 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Time    41841 non-null  int64  
 1   V1      41841 non-null  float64
 2   V2      41841 non-null  float64
 3   V3      41841 non-null  float64
 4   V4      41841 non-null  float64
 5   V5      41841 non-null  float64
 6   V6      41841 non-null  float64
 7   V7      41841 non-null  float64
 8   V8      41841 non-null  float64
 9   V9      41840 non-null  float64
 10  V10     41840 non-null  float64
 11  V11     41840 non-null  float64
 12  V12     41840 non-null  float64
 13  V13     41840 non-null  float64
 14  V14     41840 non-null  float64
 15  V15     41840 non-null  float64
 16  V16     41840 non-null  float64
 17  V17     41840 non-null  float64
 18  V18     41840 non-null  float64
 19  V19     41840 non-null  float64
 20  V20     41840 non-null  float64
 21  V21     41840 non-null  float64
 22

In [None]:
df = df.dropna()

In [None]:
df_train, df_test = train_test_split(df, test_size=0.3, random_state=111)

In [None]:
df_train.describe(include='all', percentiles=[])

,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
count,199364.0,199364.0,199364.0,199364.0,199364.0,199364.0,199364.0,199364.0,199364.0,199364.0,...,199364.0,199364.0,199364.0,199364.0,199364.0,199364.0,199364.0,199364.0,199364.0,199364.0
mean,94888.815669,0.000492,-0.000726,0.000927,0.00063,3.6e-05,1.1e-05,-0.001286,-0.002889,-0.000891,...,0.001205,0.000155,-0.000198,0.000113,0.000235,0.000312,-0.000366,0.000227,88.164679,0.0017
std,47491.435489,1.95987,1.645519,1.505335,1.413958,1.361718,1.327188,1.210001,1.214852,1.096927,...,0.74851,0.726634,0.628139,0.60506,0.520857,0.48196,0.401541,0.333139,238.925768,0.041201
min,0.0,-56.40751,-72.715728,-31.813586,-5.683171,-42.147898,-26.160506,-43.557242,-73.216718,-13.320155,...,-34.830382,-8.887017,-44.807735,-2.824849,-10.295397,-2.24162,-22.565679,-11.710896,0.0,0.0
50%,84772.5,0.018854,0.065463,0.17908,-0.019531,-0.056703,-0.27529,0.040497,0.022039,-0.052607,...,-0.029146,0.007666,-0.011678,0.041031,0.016587,-0.05279,0.001239,0.011234,22.0,0.0
max,172792.0,2.451888,22.057729,9.382558,16.491217,34.801666,23.917837,44.054461,19.587773,15.594995,...,27.202839,10.50309,22.083545,4.022866,6.07085,3.517346,12.152401,33.847808,11898.09,1.0


In [None]:
X_train_big, y_train_big = df_train.drop(columns=['Class']), df_train['Class']
X_test, y_test = df_test.drop(columns=['Class']), df_test['Class']
X_train, X_valid, y_train, y_valid = train_test_split(X_train_big, y_train_big, test_size=0.3, random_state=123)

Instead of relying on cross-validation alone, we also split the training data into train and validation sets so that the scores can be examined more clearly.
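
To make the resulting proportions concrete, here is a minimal sketch of the same two-stage split applied to a hypothetical array of 1,000 row indices (the array `rows` is an assumption for illustration and stands in for the real data). It suggests that roughly 49% of the rows end up in the training set, 21% in the validation set and 30% in the test set.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the real data: 1,000 row indices.
rows = np.arange(1000)

# First split: 70% train+valid, 30% test (same proportions as above).
train_big, test = train_test_split(rows, test_size=0.3, random_state=111)

# Second split: 30% of the remaining 70% becomes the validation set.
train, valid = train_test_split(train_big, test_size=0.3, random_state=123)

sizes = {"train": len(train), "valid": len(valid), "test": len(test)}
fractions = {name: n / len(rows) for name, n in sizes.items()}
print(sizes)
print(fractions)
```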

## Training Models

### Baseline Model: Dummy Classifier

In [None]:

dummy = DummyClassifier(strategy='most_frequent')

pd.DataFrame(cross_validate(dummy, X_train, y_train, return_train_score=True)).mean()

fit_time       0.018975
score_time     0.002839
test_score     0.998302
train_score    0.998302


Even the dummy classifier scores highly because it simply returns the majority class: it has no real method, ignores the features and looks only at the target. Because the target values are so imbalanced, predicting 0 for everything is wrong on only a small fraction of rows (about 0.17%), whereas the actual goal is to predict the 1s, the fraudulent transactions.
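
The effect can be reproduced on a small synthetic example. The sketch below assumes a made-up label vector with 17 frauds in 10,000 rows (not the real data) and compares accuracy with recall for the majority-class baseline.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# Synthetic labels (an assumption for illustration): 17 frauds in 10,000 rows.
y = np.zeros(10_000, dtype=int)
y[:17] = 1
X = np.zeros((len(y), 1))  # the dummy model ignores the features

dummy = DummyClassifier(strategy="most_frequent").fit(X, y)
pred = dummy.predict(X)

accuracy = accuracy_score(y, pred)  # share of rows labelled correctly
recall = recall_score(y, pred)      # share of frauds that were caught
print(f"accuracy={accuracy:.4f}, recall={recall:.4f}")
```

Accuracy is close to 1 while recall on the fraud class is 0, which is why accuracy alone says little here.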




In [None]:
df_train['Class'].value_counts(normalize=True)

Class  proportion
0      0.9983
1      0.0017


### Decision Tree

In [None]:
pipe_tree = make_pipeline(
    StandardScaler(),
    DecisionTreeClassifier()
)
pd.DataFrame(cross_validate(pipe_tree, X_train, y_train, return_train_score=True)).mean()



fit_time       13.51194
score_time     0.013335
test_score     0.999197
train_score    1.0



As noted above, the goal is to predict the 1s. When looking at the score, it therefore matters how many 1s were missed and how many 0s were misclassified.
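
One way to see more than accuracy during cross-validation is to pass several metric names through the `scoring` argument of `cross_validate`. The sketch below is only an illustration on synthetic data: `X_demo`, `y_demo`, `tree` and `cv_scores` are hypothetical names, and `make_classification` stands in for the credit card data.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data (an assumption; not the credit card data).
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=10, weights=[0.95], random_state=0
)

tree = DecisionTreeClassifier(random_state=0)

# Request several metrics at once; each one gets its own test_<name> column.
cv_scores = pd.DataFrame(
    cross_validate(
        tree, X_demo, y_demo, scoring=["accuracy", "precision", "recall", "f1"]
    )
)
print(cv_scores.mean())
```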


What are positive and negative?
There are two types of binary classification problems:
- Distinguishing between two classes
- Spotting a class (fraudulent transaction, spam, disease). In these cases, the class being spotted is positive and the other one is negative.
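
The choice of positive class changes what a metric reports. As a small sketch on hypothetical toy labels, `recall_score` accepts a `pos_label` argument that selects which class is treated as positive.

```python
from sklearn.metrics import recall_score

# Toy labels (hypothetical): 1 marks the class we want to spot.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 0]

# Treating class 1 as positive: 1 of the 2 real positives was found.
recall_fraud = recall_score(y_true, y_pred, pos_label=1)

# Treating class 0 as positive instead: 5 of the 6 real 0s were found.
recall_non_fraud = recall_score(y_true, y_pred, pos_label=0)

print(recall_fraud, recall_non_fraud)
```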

In [None]:
pipe_tree.fit(X_train, y_train)

### Confusion Matrix

In [None]:


# ConfusionMatrixDisplay.from_estimator(pipe_tree, X_valid, y_valid, display_labels=['Non Fraud', 'Fraud'], values_format='d', cmap='Blues')
# Arguments: estimator, X, y, display labels (without them the labels would be 0 and 1), number format ('d' = integer digits, otherwise scientific notation may be used), colormap

In [None]:
# Confusion matrix: rows are true classes, columns are predicted classes
predictions = pipe_tree.predict(X_valid)
confusion_matrix(y_valid, predictions)  # argument order: true values, predictions

array([[59679,    29],
       [   26,    76]])

In [None]:
TN, FP, FN, TP = confusion_matrix(y_valid, predictions).ravel()
TN, FP, FN, TP

(59679, 29, 26, 76)
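
To see why `ravel()` yields the order TN, FP, FN, TP, here is a minimal sketch on hypothetical toy labels. scikit-learn puts the true classes on the rows and the predicted classes on the columns, so flattening the 2x2 matrix row by row gives that order.

```python
from sklearn.metrics import confusion_matrix

# Toy labels (hypothetical), with 1 as the positive class.
y_true = [0, 0, 0, 1, 1]
y_pred = [0, 1, 0, 1, 0]

cm = confusion_matrix(y_true, y_pred)
# Row 0 holds the true 0s -> [TN, FP]; row 1 holds the true 1s -> [FN, TP].
tn, fp, fn, tp = cm.ravel()
print(cm)
print(tn, fp, fn, tp)
```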

### Precision, Recall and F1 score


.score returns accuracy, which is misleading on imbalanced data, so we need other measures.

Both recall and precision matter, but which one matters more depends on the nature of the problem. In fraud detection, finding the frauds is the priority: the more of the real frauds are caught, the better, which means recall is more important.

The F1 score, the harmonic mean of precision and recall, combines both and can be a suitable single measure.
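
As a worked example, the sketch below applies the standard definitions of precision, recall and F1 to the counts from the confusion matrix of the unweighted decision tree shown earlier. The variable names are only illustrative.

```python
# Counts copied from the validation confusion matrix of the unweighted tree.
TN, FP, FN, TP = 59679, 29, 26, 76

precision = TP / (TP + FP)  # of the rows flagged as fraud, how many were fraud
recall = TP / (TP + FN)     # of the real frauds, how many were flagged
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
accuracy = (TP + TN) / (TP + TN + FP + FN)

print(f"precision={precision:.2f} recall={recall:.2f} "
      f"f1={f1:.2f} accuracy={accuracy:.4f}")
```

Accuracy stays above 0.999 even though about a quarter of the frauds are missed, which is the gap the other three measures expose.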

In [None]:
y_train.value_counts(normalize=True)

Unnamed: 0_level_0,proportion
Class,Unnamed: 1_level_1
0,0.998302
1,0.001698


In [None]:

print(classification_report(y_valid, pipe_tree.predict(X_valid),
                            target_names=['non_fraudulant', 'fraudulant']))

# Without print(), the report is displayed as a raw string and is hard to read

                precision    recall  f1-score   support

non_fraudulant       1.00      1.00      1.00     59708
    fraudulant       0.72      0.75      0.73       102

      accuracy                           1.00     59810
     macro avg       0.86      0.87      0.87     59810
  weighted avg       1.00      1.00      1.00     59810



### Addressing Imbalance in the training data

#### class_weight

We will repeat the process for two ways of setting class weights:


*   Assigning weights to each class (dict)
*   Using automatically adjusted weights inversely proportional to class frequencies (balanced)
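
As a rough sketch of what the balanced option computes, the example below uses `compute_class_weight` from `sklearn.utils.class_weight` on a made-up label vector (998 non-fraud rows and 2 fraud rows, an assumption for illustration) and compares it with the formula n_samples / (n_classes * class_count). With a dict such as {1: 100}, class 1 gets weight 100; to our understanding, classes left out of the dict keep weight 1.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Synthetic labels (an assumption): 998 non-fraud rows and 2 fraud rows.
y = np.array([0] * 998 + [1] * 2)
classes = np.unique(y)

# 'balanced' weights: n_samples / (n_classes * count of each class).
balanced = compute_class_weight(class_weight="balanced", classes=classes, y=y)
manual = len(y) / (len(classes) * np.bincount(y))

# The rare class receives a much larger weight than the common class.
print(dict(zip(classes.tolist(), balanced.tolist())))
```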


#### Dict Weights

In [None]:

pipe_tree_100 = make_pipeline(
    StandardScaler(),
    DecisionTreeClassifier(random_state=7, class_weight={1: 100})
)
pd.DataFrame(cross_validate(pipe_tree_100, X_train, y_train, return_train_score=True)).mean()



fit_time       9.742985
score_time     0.012791
test_score     0.998875
train_score    1.0


In [None]:
pipe_tree_100.fit(X_train, y_train)
predictions = pipe_tree_100.predict(X_valid)
confusion_matrix(y_valid, predictions)

array([[8752,    3],
       [   6,   26]])

In [None]:
TN, FP, FN, TP = confusion_matrix(y_valid, predictions).ravel()
TN, FP, FN, TP

(8752, 3, 6, 26)

In [None]:
print(classification_report(y_valid, pipe_tree_100.predict(X_valid),
                            target_names=['non_fraudulant', 'fraudulant']))


                precision    recall  f1-score   support

non_fraudulant       1.00      1.00      1.00      8755
    fraudulant       0.90      0.81      0.85        32

      accuracy                           1.00      8787
     macro avg       0.95      0.91      0.93      8787
  weighted avg       1.00      1.00      1.00      8787



#### Balanced Weights

In [None]:

pipe_tree_balanced = make_pipeline(
    StandardScaler(),
    DecisionTreeClassifier(random_state=7, class_weight='balanced')
)
pd.DataFrame(cross_validate(pipe_tree_balanced, X_train, y_train, return_train_score=True)).mean()


fit_time       6.918325
score_time     0.011048
test_score     0.99881
train_score    1.0


In [None]:
pipe_tree_balanced.fit(X_train, y_train)
predictions = pipe_tree_balanced.predict(X_valid)
confusion_matrix(y_valid, predictions)

array([[59667,    41],
       [   24,    78]])

In [None]:
TN, FP, FN, TP = confusion_matrix(y_valid, predictions).ravel()
TN, FP, FN, TP

(59667, 41, 24, 78)

In [None]:
print(classification_report(y_valid, pipe_tree_balanced.predict(X_valid),
                            target_names=['non_fraudulant', 'fraudulant']))


                precision    recall  f1-score   support

non_fraudulant       1.00      1.00      1.00     59708
    fraudulant       0.66      0.76      0.71       102

      accuracy                           1.00     59810
     macro avg       0.83      0.88      0.85     59810
  weighted avg       1.00      1.00      1.00     59810



### Comparison

##### Decision Tree class_weight=None (default):

TN, FP, FN, TP: (59679, 29, 26, 76)

precision = 0.72

recall = 0.75

f1_score = 0.73

accuracy = 1.00

##### Decision Tree class_weight={1: 100}:

TN, FP, FN, TP: (8752, 3, 6, 26)

precision = 0.90

recall = 0.81

f1_score = 0.85

accuracy = 1.00

##### Decision Tree class_weight='balanced':

TN, FP, FN, TP: (59667, 41, 24, 78)

precision = 0.66 (78 / (78 + 41))

recall = 0.76 (78 / (78 + 24))

f1_score = 0.71

accuracy = 1.00


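With the dictionary weights, every reported score is higher than for the other two trees. Note, however, that its confusion matrix sums to 8,787 validation rows rather than 59,810, so that run was evaluated on a smaller validation set and the comparison is not like-for-like. It is also worth stressing that false negatives are the most costly error in this problem: we do not want to miss a fraudulent transaction by classifying it as non-fraud.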