# Fraud Detection

Exploratory data analysis (EDA) of a fraud detection dataset, with statistical validation and actionable business insights.

#### 1) Importing Necessary Libraries:

In [11]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

#### 2) Data Overview:

##### 1) Data Loading:

In [15]:
df = pd.read_csv('FraudDetection_final.csv')

In [16]:
df.head()

| | transaction_id | customer_id | transaction_amount | transaction_type | transaction_time | device_type | location | merchant_category | account_age_days | num_prev_transactions | avg_transaction_amount | is_international | is_high_risk_country | failed_login_attempts | card_present | fraud |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 4174 | 180.35 | Online Purchase | 2023-01-01 00:00:00 | Mobile | Rural | Travel | 1753.0 | 441 | 239.66 | 0 | 0 | 3.0 | 0 | 0 |
| 1 | 2 | 4507 | 105.99 | ATM Withdrawal | 2023-01-01 01:00:00 | Desktop | Urban | Electronics | 1654.0 | 260 | 75.8 | 0 | 0 | 2.0 | 0 | 0 |
| 2 | 3 | 1860 | NaN | ATM Withdrawal | 2023-01-01 02:00:00 | Mobile | Suburban | Clothing | 445.0 | 332 | 215.77 | 0 | 0 | 4.0 | 0 | 0 |
| 3 | 4 | 2294 | NaN | Online Purchase | 2023-01-01 03:00:00 | Desktop | Urban | Electronics | 348.0 | 231 | 180.2 | 0 | 0 | NaN | 0 | 0 |
| 4 | 5 | 2130 | 194.96 | Online Purchase | 2023-01-01 04:00:00 | Mobile | Urban | Clothing | NaN | 247 | 57.9 | 0 | 0 | NaN | 0 | 0 |


In [18]:
df.tail()

| | transaction_id | customer_id | transaction_amount | transaction_type | transaction_time | device_type | location | merchant_category | account_age_days | num_prev_transactions | avg_transaction_amount | is_international | is_high_risk_country | failed_login_attempts | card_present | fraud |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7995 | 7996 | 2576 | 116.15 | Online Purchase | 2023-11-30 03:00:00 | Tablet | Urban | Travel | 308.0 | 473 | 34.12 | 0 | 0 | 2.0 | 0 | 0 |
| 7996 | 7997 | 1335 | 58.22 | Online Purchase | 2023-11-30 04:00:00 | Mobile | Urban | NaN | 803.0 | 195 | 176.79 | 0 | 0 | 4.0 | 0 | 0 |
| 7997 | 7998 | 3209 | 273.65 | POS Purchase | 2023-11-30 05:00:00 | Mobile | Rural | Clothing | 882.0 | 271 | 88.02 | 0 | 0 | 2.0 | 0 | 1 |
| 7998 | 7999 | 4231 | 201.13 | Transfer | 2023-11-30 06:00:00 | Mobile | Urban | Clothing | 1563.0 | 472 | 64.67 | 0 | 0 | 2.0 | 0 | 0 |
| 7999 | 8000 | 2497 | 296.15 | Transfer | 2023-11-30 07:00:00 | Mobile | Urban | Electronics | 460.0 | 466 | 43.79 | 0 | 0 | NaN | 0 | 0 |


##### 2) Data Summary:

In [17]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8000 entries, 0 to 7999
Data columns (total 16 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   transaction_id          8000 non-null   int64  
 1   customer_id             8000 non-null   int64  
 2   transaction_amount      7600 non-null   float64
 3   transaction_type        8000 non-null   object 
 4   transaction_time        8000 non-null   object 
 5   device_type             7840 non-null   object 
 6   location                8000 non-null   object 
 7   merchant_category       7760 non-null   object 
 8   account_age_days        7360 non-null   float64
 9   num_prev_transactions   8000 non-null   int64  
 10  avg_transaction_amount  8000 non-null   float64
 11  is_international        8000 non-null   int64  
 12  is_high_risk_country    8000 non-null   int64  
 13  failed_login_attempts   7200 non-null   float64
 14  card_present            8000 non-null   int64  
 15  fraud                   8000 non-null   int64  

The `df.info()` output shows two issues:
- Five columns have missing values: 'transaction_amount' (400 of 8,000 rows), 'device_type' (160), 'merchant_category' (240), 'account_age_days' (640) and 'failed_login_attempts' (800).
- The 'transaction_time' column is stored as the object dtype rather than as a datetime dtype.
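
The missing-value counts implied by `df.info()` can also be computed directly. The following is a minimal sketch: the helper name `missing_summary` and the toy frame are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return missing-value counts and percentages for columns with gaps."""
    counts = frame.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(2),
    })
    # Keep only columns with at least one missing value, worst first.
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Toy data standing in for the real dataset (hypothetical values).
toy = pd.DataFrame({
    "transaction_amount": [180.35, 105.99, np.nan, np.nan],
    "device_type": ["Mobile", None, "Mobile", "Desktop"],
    "fraud": [0, 0, 0, 1],
})
result = missing_summary(toy)
print(result)
```

Calling `missing_summary(df)` on the real dataset should list the five columns with gaps, with 'failed_login_attempts' first (800 missing of 8,000 rows, going by the non-null counts in `df.info()`).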

###### Converting Transaction Time Column to Date-Time Data Type:

In [19]:
df['transaction_time'] = pd.to_datetime(df['transaction_time'])
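
By default, `pd.to_datetime` raises an error if a value cannot be parsed. As a more defensive alternative, the sketch below passes an explicit format (assumed from the `YYYY-MM-DD HH:MM:SS` values shown in `df.head()`) together with `errors='coerce'`, so unparseable entries become `NaT` and can be counted before proceeding. The sample series is hypothetical.

```python
import pandas as pd

# Hypothetical sample: two valid timestamps and one malformed entry.
raw = pd.Series(["2023-01-01 00:00:00", "2023-01-01 01:00:00", "not a timestamp"])

# An explicit format avoids ambiguous inference; errors="coerce" turns
# values that do not match into NaT instead of raising.
parsed = pd.to_datetime(raw, format="%Y-%m-%d %H:%M:%S", errors="coerce")

# Count the entries that failed to parse.
n_bad = int(parsed.isna().sum())
print(parsed.dtype, n_bad)
```

If the count of failed parses is zero on the real column, the coerced result matches the plain conversion used in this notebook.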

In [20]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8000 entries, 0 to 7999
Data columns (total 16 columns):
 #   Column                  Non-Null Count  Dtype         
---  ------                  --------------  -----         
 0   transaction_id          8000 non-null   int64         
 1   customer_id             8000 non-null   int64         
 2   transaction_amount      7600 non-null   float64       
 3   transaction_type        8000 non-null   object        
 4   transaction_time        8000 non-null   datetime64[ns]
 5   device_type             7840 non-null   object        
 6   location                8000 non-null   object        
 7   merchant_category       7760 non-null   object        
 8   account_age_days        7360 non-null   float64       
 9   num_prev_transactions   8000 non-null   int64         
 10  avg_transaction_amount  8000 non-null   float64       
 11  is_international        8000 non-null   int64         
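 12  is_high_risk_country    8000 non-null   int64         
 13  failed_login_attempts   7200 non-null   float64       
 14  card_present            8000 non-null   int64         
 15  fraud                   8000 non-null   int64         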

In [21]:
df.head()

| | transaction_id | customer_id | transaction_amount | transaction_type | transaction_time | device_type | location | merchant_category | account_age_days | num_prev_transactions | avg_transaction_amount | is_international | is_high_risk_country | failed_login_attempts | card_present | fraud |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 4174 | 180.35 | Online Purchase | 2023-01-01 00:00:00 | Mobile | Rural | Travel | 1753.0 | 441 | 239.66 | 0 | 0 | 3.0 | 0 | 0 |
| 1 | 2 | 4507 | 105.99 | ATM Withdrawal | 2023-01-01 01:00:00 | Desktop | Urban | Electronics | 1654.0 | 260 | 75.8 | 0 | 0 | 2.0 | 0 | 0 |
| 2 | 3 | 1860 | NaN | ATM Withdrawal | 2023-01-01 02:00:00 | Mobile | Suburban | Clothing | 445.0 | 332 | 215.77 | 0 | 0 | 4.0 | 0 | 0 |
| 3 | 4 | 2294 | NaN | Online Purchase | 2023-01-01 03:00:00 | Desktop | Urban | Electronics | 348.0 | 231 | 180.2 | 0 | 0 | NaN | 0 | 0 |
| 4 | 5 | 2130 | 194.96 | Online Purchase | 2023-01-01 04:00:00 | Mobile | Urban | Clothing | NaN | 247 | 57.9 | 0 | 0 | NaN | 0 | 0 |
