# ***CREDIT CARD FRAUD DETECTION***

Build a model to detect fraudulent credit card transactions. Use a
dataset containing information about credit card transactions, and
experiment with algorithms like Logistic Regression, Decision Trees,
or Random Forests to classify transactions as fraudulent or legitimate.

In [1]:
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score,f1_score,confusion_matrix,classification_report
import matplotlib.pyplot as plt
import seaborn as sns


In [2]:
df=pd.read_csv('fraud1Train.csv')

***First 5 rows of the dataset***

In [3]:
df.head()

Unnamed: 0.1,Unnamed: 0,trans_date_trans_time,cc_num,merchant,category,amt,first,last,gender,street,...,lat,long,city_pop,job,dob,trans_num,unix_time,merch_lat,merch_long,is_fraud
0,0,01-01-2019 00:00,2703190000000000.0,"fraud_Rippin, Kub and Mann",misc_net,4.97,Jennifer,Banks,F,561 Perry Cove,...,36.0788,-81.1781,3495,"Psychologist, counselling",09-03-1988,0b242abb623afc578575680df30655b9,1325376018,36.011293,-82.048315,0
1,1,01-01-2019 00:00,630423000000.0,"fraud_Heller, Gutmann and Zieme",grocery_pos,107.23,Stephanie,Gill,F,43039 Riley Greens Suite 393,...,48.8878,-118.2105,149,Special educational needs teacher,21-06-1978,1f76529f8574734946361c461b024d99,1325376044,49.159047,-118.186462,0
2,2,01-01-2019 00:00,38859500000000.0,fraud_Lind-Buckridge,entertainment,220.11,Edward,Sanchez,M,594 White Dale Suite 530,...,42.1808,-112.262,4154,Nature conservation officer,19-01-1962,a1a22d70485983eac12b5b88dad1cf95,1325376051,43.150704,-112.154481,0
3,3,01-01-2019 00:01,3534090000000000.0,"fraud_Kutch, Hermiston and Farrell",gas_transport,45.0,Jeremy,White,M,9443 Cynthia Court Apt. 038,...,46.2306,-112.1138,1939,Patent attorney,12-01-1967,6b849c168bdad6f867558c3793159a81,1325376076,47.034331,-112.561071,0
4,4,01-01-2019 00:03,375534000000000.0,fraud_Keeling-Crist,misc_pos,41.96,Tyler,Garcia,M,408 Bradley Rest,...,38.4207,-79.4629,99,Dance movement psychotherapist,28-03-1986,a41d7549acf90789359a9aa5346dcb46,1325376186,38.674999,-78.632459,0


**Describing the dataset**

In [4]:
df.describe()

Unnamed: 0.1,Unnamed: 0,cc_num,amt,zip,lat,long,city_pop,unix_time,merch_lat,merch_long,is_fraud
count,1048575.0,1048575.0,1048575.0,1048575.0,1048575.0,1048575.0,1048575.0,1048575.0,1048575.0,1048575.0,1048575.0
mean,524287.0,4.171565e+17,70.2791,48801.59,38.53336,-90.22626,89057.76,1344906000.0,38.53346,-90.22648,0.005727773
std,302697.7,1.308811e+18,159.9518,26898.04,5.076852,13.75858,302435.1,10197000.0,5.111233,13.77093,0.07546503
min,0.0,60416210000.0,1.0,1257.0,20.0271,-165.6723,23.0,1325376000.0,19.02779,-166.6712,0.0
25%,262143.5,180040000000000.0,9.64,26237.0,34.6205,-96.798,743.0,1336682000.0,34.72954,-96.89864,0.0
50%,524287.0,3520550000000000.0,47.45,48174.0,39.3543,-87.4769,2456.0,1344902000.0,39.36295,-87.43923,0.0
75%,786430.5,4642260000000000.0,83.05,72042.0,41.9404,-80.158,20328.0,1354366000.0,41.95602,-80.23228,0.0
max,1048574.0,4.99235e+18,28948.9,99783.0,66.6933,-67.9503,2906700.0,1362932000.0,67.51027,-66.9509,1.0


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1048575 entries, 0 to 1048574
Data columns (total 23 columns):
 #   Column                 Non-Null Count    Dtype  
---  ------                 --------------    -----  
 0   Unnamed: 0             1048575 non-null  int64  
 1   trans_date_trans_time  1048575 non-null  object 
 2   cc_num                 1048575 non-null  float64
 3   merchant               1048575 non-null  object 
 4   category               1048575 non-null  object 
 5   amt                    1048575 non-null  float64
 6   first                  1048575 non-null  object 
 7   last                   1048575 non-null  object 
 8   gender                 1048575 non-null  object 
 9   street                 1048575 non-null  object 
 10  city                   1048575 non-null  object 
 11  state                  1048575 non-null  object 
 12  zip                    1048575 non-null  int64  
 13  lat                    1048575 non-null  float64
 14  long              

**Checking for null values in the dataset**

In [6]:
df.isnull().sum()

column,null_count
Unnamed: 0,0
trans_date_trans_time,0
cc_num,0
merchant,0
category,0
amt,0
first,0
last,0
gender,0
street,0


In [7]:
df.dtypes

column,dtype
Unnamed: 0,int64
trans_date_trans_time,object
cc_num,float64
merchant,object
category,object
amt,float64
first,object
last,object
gender,object
street,object
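
The `object` dtype on `trans_date_trans_time` and `dob` means both are still plain strings. A minimal sketch of turning them into numeric features is shown here; the two sample rows are copied from the preview above, and the derived column names (`trans_hour`, `trans_dayofweek`, `age_years`) are illustrative assumptions, not something the notebook uses.

```python
import pandas as pd

# Two sample rows copied from the preview above (date columns only).
sample = pd.DataFrame({
    "trans_date_trans_time": ["01-01-2019 00:00", "01-01-2019 00:03"],
    "dob": ["09-03-1988", "28-03-1986"],
})

# The strings are day-first, so the format is given explicitly.
trans_time = pd.to_datetime(sample["trans_date_trans_time"], format="%d-%m-%Y %H:%M")
dob = pd.to_datetime(sample["dob"], format="%d-%m-%Y")

# Hypothetical derived features; names are chosen for illustration.
features = pd.DataFrame({
    "trans_hour": trans_time.dt.hour,
    "trans_dayofweek": trans_time.dt.dayofweek,      # Monday = 0
    "age_years": trans_time.dt.year - dob.dt.year,   # rough age, ignores the birthday
})
print(features)
```

Numeric features like these avoid the one-column-per-distinct-date expansion that one-hot encoding a raw date string produces.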


**Dropping the unwanted columns**

In [8]:
df.drop(columns=["Unnamed: 0","trans_num","street"],inplace=True)
df

Unnamed: 0,trans_date_trans_time,cc_num,merchant,category,amt,first,last,gender,city,state,zip,lat,long,city_pop,job,dob,unix_time,merch_lat,merch_long,is_fraud
0,01-01-2019 00:00,2.703190e+15,"fraud_Rippin, Kub and Mann",misc_net,4.97,Jennifer,Banks,F,Moravian Falls,NC,28654,36.0788,-81.1781,3495,"Psychologist, counselling",09-03-1988,1325376018,36.011293,-82.048315,0
1,01-01-2019 00:00,6.304230e+11,"fraud_Heller, Gutmann and Zieme",grocery_pos,107.23,Stephanie,Gill,F,Orient,WA,99160,48.8878,-118.2105,149,Special educational needs teacher,21-06-1978,1325376044,49.159047,-118.186462,0
2,01-01-2019 00:00,3.885950e+13,fraud_Lind-Buckridge,entertainment,220.11,Edward,Sanchez,M,Malad City,ID,83252,42.1808,-112.2620,4154,Nature conservation officer,19-01-1962,1325376051,43.150704,-112.154481,0
3,01-01-2019 00:01,3.534090e+15,"fraud_Kutch, Hermiston and Farrell",gas_transport,45.00,Jeremy,White,M,Boulder,MT,59632,46.2306,-112.1138,1939,Patent attorney,12-01-1967,1325376076,47.034331,-112.561071,0
4,01-01-2019 00:03,3.755340e+14,fraud_Keeling-Crist,misc_pos,41.96,Tyler,Garcia,M,Doe Hill,VA,24433,38.4207,-79.4629,99,Dance movement psychotherapist,28-03-1986,1325376186,38.674999,-78.632459,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1048570,10-03-2020 16:07,6.011980e+15,fraud_Fadel Inc,health_fitness,77.00,Haley,Wagner,F,Annapolis,MD,21405,39.0305,-76.5515,92106,"Accountant, chartered certified",28-05-1943,1362931649,38.779464,-76.317042,0
1048571,10-03-2020 16:07,4.839040e+15,"fraud_Cremin, Hamill and Reichel",misc_pos,116.94,Meredith,Campbell,F,Hedrick,IA,52563,41.1826,-92.3097,1583,Geochemist,28-06-1999,1362931670,41.400318,-92.726724,0
1048572,10-03-2020 16:08,5.718440e+11,"fraud_O'Connell, Botsford and Hand",home,21.27,Susan,Mills,F,Louisville,KY,40202,38.2507,-85.7476,736284,Engineering geologist,02-04-1952,1362931711,37.293339,-84.798122,0
1048573,10-03-2020 16:08,4.646850e+18,fraud_Thompson-Gleason,health_fitness,9.52,Julia,Bell,F,West Sayville,NY,11796,40.7320,-73.1000,4056,Film/video editor,25-06-1990,1362931718,39.773077,-72.213209,0


**Selecting the first 20000 rows**

In [9]:
df1=df.head(n=20000)
df1.is_fraud.value_counts()

is_fraud,count
0,19850
1,150
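
Taking the first rows with `head` keeps only the earliest transactions. One possible alternative is to sample each class separately so the subset keeps the original fraud rate. The sketch below uses a made-up toy frame (`toy`) in place of `df`; the 20% fraction is an arbitrary choice for illustration.

```python
import pandas as pd

# Toy frame standing in for df: 990 legitimate rows and 10 fraudulent ones.
toy = pd.DataFrame({
    "amt": range(1000),
    "is_fraud": [0] * 990 + [1] * 10,
})

# Draw 20% of each class separately so the fraud rate is preserved.
subset = toy.groupby("is_fraud").sample(frac=0.2, random_state=1)

counts = subset["is_fraud"].value_counts()
print(counts)
print("fraud rate:", subset["is_fraud"].mean())
```

The subset here keeps the 1% fraud rate of the toy frame, whereas a time-ordered slice can over- or under-represent fraud depending on when it occurred.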


In [10]:
df1_process=pd.get_dummies(data=df1)
df1_process

Unnamed: 0,cc_num,amt,zip,lat,long,city_pop,unix_time,merch_lat,merch_long,is_fraud,...,dob_31-05-1948,dob_31-05-1994,dob_31-05-1999,dob_31-07-1941,dob_31-07-1961,dob_31-07-1975,dob_31-08-1984,dob_31-08-1985,dob_31-12-1972,dob_31-12-1986
0,2.703190e+15,4.97,28654,36.0788,-81.1781,3495,1325376018,36.011293,-82.048315,0,...,False,False,False,False,False,False,False,False,False,False
1,6.304230e+11,107.23,99160,48.8878,-118.2105,149,1325376044,49.159047,-118.186462,0,...,False,False,False,False,False,False,False,False,False,False
2,3.885950e+13,220.11,83252,42.1808,-112.2620,4154,1325376051,43.150704,-112.154481,0,...,False,False,False,False,False,False,False,False,False,False
3,3.534090e+15,45.00,59632,46.2306,-112.1138,1939,1325376076,47.034331,-112.561071,0,...,False,False,False,False,False,False,False,False,False,False
4,3.755340e+14,41.96,24433,38.4207,-79.4629,99,1325376186,38.674999,-78.632459,0,...,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19995,4.104310e+15,98.86,92561,33.6401,-116.5567,1661,1326428819,33.767565,-117.341778,0,...,False,False,False,False,False,False,False,False,False,False
19996,3.034330e+13,2.04,16239,41.4622,-79.1306,4172,1326428902,40.594885,-79.119914,0,...,False,False,False,False,False,False,False,False,False,False
19997,3.502090e+15,38.52,20895,39.0298,-77.0793,19054,1326428946,39.552430,-77.582253,0,...,False,False,False,False,False,False,False,False,False,False
19998,2.706980e+15,110.12,45698,39.2830,-82.3977,341,1326428969,38.402828,-82.689669,0,...,False,False,False,False,False,False,False,False,False,False
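
The `dob_...` columns in the output above show that `pd.get_dummies` creates one column for every distinct string value, including names and birth dates. A hedged sketch of a narrower encoding follows: it drops identifier-like columns and one-hot encodes only low-cardinality ones. The toy rows are copied from the preview, and the split into `identifier_cols` and `categorical_cols` is an assumption made for illustration.

```python
import pandas as pd

# Four rows copied from the preview above, reduced to a few columns.
toy = pd.DataFrame({
    "amt": [4.97, 107.23, 220.11, 45.00],
    "category": ["misc_net", "grocery_pos", "entertainment", "gas_transport"],
    "gender": ["F", "F", "M", "M"],
    "first": ["Jennifer", "Stephanie", "Edward", "Jeremy"],
    "dob": ["09-03-1988", "21-06-1978", "19-01-1962", "12-01-1967"],
    "is_fraud": [0, 0, 0, 0],
})

# Assumed split: names and raw birth dates act as identifiers and are dropped.
identifier_cols = ["first", "dob"]
categorical_cols = ["category", "gender"]

# Only the listed categorical columns are expanded into indicator columns.
encoded = pd.get_dummies(toy.drop(columns=identifier_cols), columns=categorical_cols)
print(encoded.columns.tolist())
```

With this layout the number of feature columns depends on the number of categories, not on the number of distinct customers in the sample.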


In [11]:
df1.duplicated().sum()

0

In [12]:
df.shape

(1048575, 20)

**Defining X train and Y train**

In [13]:
xtrain=df1_process.drop(columns=["is_fraud"])
ytrain=df1_process["is_fraud"]

**Repeating the same process for the test data**

In [14]:
df_test=pd.read_csv('fraud1test.csv')
df_test

Unnamed: 0.1,Unnamed: 0,trans_date_trans_time,cc_num,merchant,category,amt,first,last,gender,street,...,lat,long,city_pop,job,dob,trans_num,unix_time,merch_lat,merch_long,is_fraud
0,0,21-06-2020 12:14,2.291160e+15,fraud_Kirlin and Sons,personal_care,2.86,Jeff,Elliott,M,351 Darlene Green,...,33.9659,-80.9355,333497,Mechanical engineer,19-03-1968,2da90c7d74bd46a0caf3777415b3ebd3,1371816865,33.986391,-81.200714,0
1,1,21-06-2020 12:14,3.573030e+15,fraud_Sporer-Keebler,personal_care,29.84,Joanne,Williams,F,3638 Marsh Union,...,40.3207,-110.4360,302,"Sales professional, IT",17-01-1990,324cc204407e99f51b0d6ca0055005e7,1371816873,39.450498,-109.960431,0
2,2,21-06-2020 12:14,3.598220e+15,"fraud_Swaniawski, Nitzsche and Welch",health_fitness,41.28,Ashley,Lopez,F,9333 Valentine Point,...,40.6729,-73.5365,34496,"Librarian, public",21-10-1970,c81755dbbbea9d5c77f094348a7579be,1371816893,40.495810,-74.196111,0
3,3,21-06-2020 12:15,3.591920e+15,fraud_Haley Group,misc_pos,60.05,Brian,Williams,M,32941 Krystal Mill Apt. 552,...,28.5697,-80.8191,54767,Set designer,25-07-1987,2159175b9efe66dc301f149d3d5abf8c,1371816915,28.812398,-80.883061,0
4,4,21-06-2020 12:15,3.526830e+15,fraud_Johnston-Casper,travel,3.19,Nathan,Massey,M,5783 Evan Roads Apt. 465,...,44.2529,-85.0170,1126,Furniture designer,06-07-1955,57ff021bd3f328f8738bb535c302a31b,1371816917,44.959148,-85.884734,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
555714,555714,31-12-2020 23:59,3.056060e+13,fraud_Reilly and Sons,health_fitness,43.77,Michael,Olson,M,558 Michael Estates,...,40.4931,-91.8912,519,Town planner,13-02-1966,9b1f753c79894c9f4b71f04581835ada,1388534347,39.946837,-91.333331,0
555715,555715,31-12-2020 23:59,3.556610e+15,fraud_Hoppe-Parisian,kids_pets,111.84,Jose,Vasquez,M,572 Davis Mountains,...,29.0393,-95.4401,28739,Futures trader,27-12-1999,2090647dac2c89a1d86c514c427f5b91,1388534349,29.661049,-96.186633,0
555716,555716,31-12-2020 23:59,6.011720e+15,fraud_Rau-Robel,kids_pets,86.88,Ann,Lawson,F,144 Evans Islands Apt. 683,...,46.1966,-118.9017,3684,Musician,29-11-1981,6c5b7c8add471975aa0fec023b2e8408,1388534355,46.658340,-119.715054,0
555717,555717,31-12-2020 23:59,4.079770e+12,fraud_Breitenberg LLC,travel,7.99,Eric,Preston,M,7020 Doyle Stream Apt. 951,...,44.6255,-116.4493,129,Cartographer,15-12-1965,14392d723bb7737606b2700ac791b7aa,1388534364,44.470525,-117.080888,0


In [15]:
df_test.drop(columns=["Unnamed: 0","trans_num","street"],inplace=True)
df_test

Unnamed: 0,trans_date_trans_time,cc_num,merchant,category,amt,first,last,gender,city,state,zip,lat,long,city_pop,job,dob,unix_time,merch_lat,merch_long,is_fraud
0,21-06-2020 12:14,2.291160e+15,fraud_Kirlin and Sons,personal_care,2.86,Jeff,Elliott,M,Columbia,SC,29209,33.9659,-80.9355,333497,Mechanical engineer,19-03-1968,1371816865,33.986391,-81.200714,0
1,21-06-2020 12:14,3.573030e+15,fraud_Sporer-Keebler,personal_care,29.84,Joanne,Williams,F,Altonah,UT,84002,40.3207,-110.4360,302,"Sales professional, IT",17-01-1990,1371816873,39.450498,-109.960431,0
2,21-06-2020 12:14,3.598220e+15,"fraud_Swaniawski, Nitzsche and Welch",health_fitness,41.28,Ashley,Lopez,F,Bellmore,NY,11710,40.6729,-73.5365,34496,"Librarian, public",21-10-1970,1371816893,40.495810,-74.196111,0
3,21-06-2020 12:15,3.591920e+15,fraud_Haley Group,misc_pos,60.05,Brian,Williams,M,Titusville,FL,32780,28.5697,-80.8191,54767,Set designer,25-07-1987,1371816915,28.812398,-80.883061,0
4,21-06-2020 12:15,3.526830e+15,fraud_Johnston-Casper,travel,3.19,Nathan,Massey,M,Falmouth,MI,49632,44.2529,-85.0170,1126,Furniture designer,06-07-1955,1371816917,44.959148,-85.884734,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
555714,31-12-2020 23:59,3.056060e+13,fraud_Reilly and Sons,health_fitness,43.77,Michael,Olson,M,Luray,MO,63453,40.4931,-91.8912,519,Town planner,13-02-1966,1388534347,39.946837,-91.333331,0
555715,31-12-2020 23:59,3.556610e+15,fraud_Hoppe-Parisian,kids_pets,111.84,Jose,Vasquez,M,Lake Jackson,TX,77566,29.0393,-95.4401,28739,Futures trader,27-12-1999,1388534349,29.661049,-96.186633,0
555716,31-12-2020 23:59,6.011720e+15,fraud_Rau-Robel,kids_pets,86.88,Ann,Lawson,F,Burbank,WA,99323,46.1966,-118.9017,3684,Musician,29-11-1981,1388534355,46.658340,-119.715054,0
555717,31-12-2020 23:59,4.079770e+12,fraud_Breitenberg LLC,travel,7.99,Eric,Preston,M,Mesa,ID,83643,44.6255,-116.4493,129,Cartographer,15-12-1965,1388534364,44.470525,-117.080888,0


**Selecting 5000 rows for testing**

In [17]:
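# The shuffled copy on the next line is overwritten immediately, so d1_test
# ends up as the first 5000 rows of df_test in file order.
d1_test=df_test.sample(frac=1,random_state=1).reset_index()
d1_test=df_test.head(n=5000)
# Note: this counts classes in the full df_test, not in the 5000-row d1_test.
df_test.is_fraud.value_counts()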

is_fraud,count
0,553574
1,2145
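
If a random 5000-row subset is wanted instead of the first 5000 rows, `DataFrame.sample` can draw it in one step. This is a sketch on a made-up frame (`toy_test`) with a smaller `n`, not what the notebook ran.

```python
import pandas as pd

# Toy stand-in for df_test: 100 time-ordered rows, 2 of them fraudulent.
toy_test = pd.DataFrame({
    "unix_time": range(100),
    "is_fraud": [0] * 98 + [1] * 2,
})

# Draw 10 rows at random (the notebook's target would be n=5000) and
# discard the old index so the result is numbered 0..n-1.
subset = toy_test.sample(n=10, random_state=1).reset_index(drop=True)

# Count classes on the subset itself, not on the full frame.
print(subset["is_fraud"].value_counts())
```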


In [18]:
test_process=pd.get_dummies(data=d1_test)
test_process

Unnamed: 0,cc_num,amt,zip,lat,long,city_pop,unix_time,merch_lat,merch_long,is_fraud,...,dob_31-05-1948,dob_31-05-1994,dob_31-05-1999,dob_31-07-1941,dob_31-07-1961,dob_31-07-1975,dob_31-08-1984,dob_31-08-1985,dob_31-12-1972,dob_31-12-1986
0,2.291160e+15,2.86,29209,33.9659,-80.9355,333497,1371816865,33.986391,-81.200714,0,...,False,False,False,False,False,False,False,False,False,False
1,3.573030e+15,29.84,84002,40.3207,-110.4360,302,1371816873,39.450498,-109.960431,0,...,False,False,False,False,False,False,False,False,False,False
2,3.598220e+15,41.28,11710,40.6729,-73.5365,34496,1371816893,40.495810,-74.196111,0,...,False,False,False,False,False,False,False,False,False,False
3,3.591920e+15,60.05,32780,28.5697,-80.8191,54767,1371816915,28.812398,-80.883061,0,...,False,False,False,False,False,False,False,False,False,False
4,3.526830e+15,3.19,49632,44.2529,-85.0170,1126,1371816917,44.959148,-85.884734,0,...,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4995,3.770270e+14,19.53,63665,37.3272,-91.0243,241,1371926348,37.406922,-90.809249,0,...,False,False,False,False,False,False,False,False,False,False
4996,2.131260e+14,37.48,12410,42.0740,-74.4530,397,1371926356,41.951789,-74.605906,0,...,False,False,False,False,False,False,False,False,False,False
4997,6.011650e+15,6.79,65072,38.2911,-92.7059,1847,1371926398,37.802784,-93.461897,0,...,False,False,False,False,False,False,False,False,False,False
4998,3.541160e+15,1.42,62668,39.5723,-90.2379,1512,1371926430,39.261794,-90.922877,0,...,False,False,False,False,False,False,False,False,False,False


In [29]:
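# NOTE: x_test and y_test are built from df1_process (the training frame),
# not from test_process, so every score below is measured on the training rows.
x_test=df1_process.drop(columns=["is_fraud"])
y_test=df1_process["is_fraud"]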
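
Evaluating on genuinely separate test rows requires the encoded test frame to have the same columns as the training frame, which separate `get_dummies` calls do not guarantee. A minimal sketch of one way to align them with `reindex` is given below. The toy frames and the `_demo` names are hypothetical, and the single fraud label in `train_raw` is made up.

```python
import pandas as pd

# Toy train/test frames; the test set contains a category unseen in training.
train_raw = pd.DataFrame({
    "amt": [4.97, 107.23, 45.0],
    "category": ["misc_net", "grocery_pos", "gas_transport"],
    "is_fraud": [0, 0, 1],  # made-up labels
})
test_raw = pd.DataFrame({
    "amt": [2.86, 3.19],
    "category": ["personal_care", "misc_net"],
    "is_fraud": [0, 0],
})

train_enc = pd.get_dummies(train_raw)
test_enc = pd.get_dummies(test_raw)

x_train_demo = train_enc.drop(columns=["is_fraud"])
# Force the training column layout onto the test features:
# columns unseen in training are dropped, missing ones are filled with 0.
x_test_demo = (
    test_enc.drop(columns=["is_fraud"])
    .reindex(columns=x_train_demo.columns, fill_value=0)
)
y_test_demo = test_enc["is_fraud"]
print(x_test_demo)
```

A model fitted on `x_train_demo` can then call `predict` on `x_test_demo` without a feature-mismatch error.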

# **Model Training**
Logistic Regression, Decision Tree Classifier, KNN algorithm

In [30]:
log_reg=LogisticRegression(solver='liblinear')
log_reg.fit(xtrain,ytrain)

In [33]:
# Refit the model and predict on x_test (defined in cell 29).
log_reg=LogisticRegression(solver='liblinear')
log_reg.fit(xtrain,ytrain)

pred=log_reg.predict(x_test)



In [35]:
pred_prob=log_reg.predict_proba(x_test)
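
`predict_proba` returns one column per class, and the fraud probability sits in column 1. The sketch below uses made-up probabilities to show how a lower decision threshold flags more transactions as fraud. The 0.2 threshold is a hypothetical value, and comparing against 0.5 only approximates what `predict` does for two classes.

```python
import numpy as np

# Made-up probabilities in the predict_proba layout:
# column 0 = legitimate, column 1 = fraud.
pred_prob_demo = np.array([
    [0.95, 0.05],
    [0.70, 0.30],
    [0.40, 0.60],
    [0.85, 0.15],
])
fraud_prob = pred_prob_demo[:, 1]

# Roughly the default behaviour: flag fraud when its probability is at least 0.5.
default_pred = (fraud_prob >= 0.5).astype(int)

# Hypothetical lower threshold: trades precision for recall.
sensitive_pred = (fraud_prob >= 0.2).astype(int)

print(default_pred, sensitive_pred)
```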

**Accuracy score for Logistic Regression**

In [36]:
accuracy_score(y_test,pred)

0.9925
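
An accuracy of 0.9925 needs context: with 19850 legitimate and 150 fraudulent rows (the counts from cell 9), a rule that labels every transaction as legitimate reaches the same figure. The arithmetic is sketched below.

```python
# Class counts taken from the value_counts output in cell 9.
n_legit, n_fraud = 19850, 150

# Accuracy of a trivial rule that always predicts "legitimate".
baseline_accuracy = n_legit / (n_legit + n_fraud)
print(baseline_accuracy)
```

Because the baseline matches the reported score, accuracy alone cannot show whether the model detects any fraud.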

In [37]:
print(classification_report(y_test,pred))

              precision    recall  f1-score   support

           0       0.99      1.00      1.00     19850
           1       0.00      0.00      0.00       150

    accuracy                           0.99     20000
   macro avg       0.50      0.50      0.50     20000
weighted avg       0.99      0.99      0.99     20000
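
A recall of 0.00 on class 1 means no transaction was predicted as fraud. Two adjustments that often help in this situation are scaling the features (the notebook imports `StandardScaler` but never applies it) and weighting the classes. The sketch below is an assumption-laden illustration on synthetic data, not the notebook's method: the data, the `big_id` column imitating the scale of `cc_num`, and the parameter choices are all made up.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in: 980 "legitimate" rows and 20 "fraud" rows with larger amounts.
amt = np.concatenate([rng.normal(50, 20, 980), rng.normal(600, 100, 20)])
big_id = rng.uniform(1e11, 5e18, 1000)  # imitates the scale of cc_num
X = np.column_stack([amt, big_id])
y = np.array([0] * 980 + [1] * 20)

# Scale every feature, then weight classes inversely to their frequency.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
model.fit(X, y)

# Recall on the fraud class (measured on the training data, for illustration only).
recall = recall_score(y, model.predict(X))
print(recall)
```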





**Decision Tree**

In [38]:
dt=DecisionTreeClassifier()
dt.fit(xtrain,ytrain)

In [39]:
pred1=dt.predict(x_test)

**Accuracy Score for Decision Tree Classifier**

In [40]:
accuracy_score(y_test,pred1)

1.0

In [41]:
f1_score(y_test,pred1)

1.0

**Confusion Matrix for Decision Tree Classifier**

In [42]:
confusion_matrix(y_test,pred1)

array([[19850,     0],
       [    0,   150]])
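
In scikit-learn's layout, rows of the confusion matrix are true classes and columns are predicted classes, so the four cells can be unpacked as shown in this short sketch, which reuses the matrix printed above.

```python
import numpy as np

# Confusion matrix copied from the output above.
cm = np.array([[19850, 0],
               [0, 150]])

# Row = true class, column = predicted class.
tn, fp, fn, tp = cm.ravel()

precision = tp / (tp + fp)  # share of predicted fraud that is real fraud
recall = tp / (tp + fn)     # share of real fraud that was caught
print(tn, fp, fn, tp, precision, recall)
```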

In [43]:
print(classification_report(y_test,pred1))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00     19850
           1       1.00      1.00      1.00       150

    accuracy                           1.00     20000
   macro avg       1.00      1.00      1.00     20000
weighted avg       1.00      1.00      1.00     20000
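
A perfect score from an unrestricted decision tree is expected when the evaluation rows are the rows it was trained on. The sketch below illustrates the effect on synthetic data whose labels are pure noise: the tree memorises the training split but cannot reproduce that on held-out rows. The data and split sizes are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic data: 5 random features, labels that carry no signal at all.
X = rng.normal(size=(500, 5))
y = rng.integers(0, 2, size=500)

# Hold out 20% of the rows that the tree never sees during fitting.
X_tr, X_ho, y_tr, y_ho = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

train_acc = tree.score(X_tr, y_tr)    # memorised training rows
holdout_acc = tree.score(X_ho, y_ho)  # unseen rows
print(train_acc, holdout_acc)
```

The gap between the two scores is what signals overfitting; a training score alone cannot reveal it.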



**K-Nearest Neighbour Algorithm**

In [44]:
knn=KNeighborsClassifier(n_neighbors=3)
knn.fit(xtrain,ytrain)
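
K-NN classifies by distance, so unscaled columns with very large values (such as `cc_num`, around 1e15) can dominate the result. The sketch below uses two rows from the preview, reduced to `cc_num` and `amt`, to show that their Euclidean distance is essentially the `cc_num` difference. It then builds a scaled K-NN pipeline as one possible remedy. The pipeline is an assumption, not what the notebook ran.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# [cc_num, amt] for the first two transactions in the preview.
a = np.array([2.703190e15, 4.97])
b = np.array([6.304230e11, 107.23])

full_distance = np.linalg.norm(a - b)  # Euclidean distance on both features
cc_num_gap = abs(a[0] - b[0])          # distance if amt were ignored
print(full_distance, cc_num_gap)

# Possible remedy: standardise each feature before the neighbour search.
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
```

`scaled_knn` exposes the same `fit` and `predict` methods as the bare classifier, so it can be swapped in without changing the surrounding cells.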

In [46]:
pred1=knn.predict(x_test)

**Accuracy Score for K-NN algorithm**

In [47]:
accuracy_score(y_test,pred1)

0.99335

In [48]:
print(classification_report(y_test,pred1))

              precision    recall  f1-score   support

           0       0.99      1.00      1.00     19850
           1       0.66      0.23      0.34       150

    accuracy                           0.99     20000
   macro avg       0.83      0.62      0.67     20000
weighted avg       0.99      0.99      0.99     20000
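
The F1 score in the report is the harmonic mean of precision and recall. As a quick check, the fraud-class row above can be reproduced from its rounded precision and recall:

```python
# Fraud-class precision and recall as printed in the report (rounded values).
precision, recall = 0.66, 0.23

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))
```

The low recall pulls F1 well below the precision, which is why F1 is a stricter summary than accuracy for the rare class.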



### ***CONCLUSION:***
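This notebook compared Logistic Regression, a Decision Tree Classifier, and the K-NN algorithm for credit card fraud detection. Note that all three were scored on the same 20,000 rows used for training. Logistic Regression reached an accuracy of 0.9925 but detected none of the 150 fraudulent transactions (recall 0.00). The Decision Tree Classifier scored 1.00 on every metric, which indicates overfitting rather than genuine predictive power. K-NN reached an accuracy of 0.99335 with a precision of 0.66 and a recall of 0.23 on the fraud class, so among the three models it gives the most credible result and is the preferred model here.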