# **Fraud Transaction Detection**

# **1. Importing Libraries**

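### 1. numpy: A library for efficient numerical computation.
### 2. pandas: A library for data manipulation and analysis.
### 3. seaborn: A statistical visualization library built on matplotlib.
### 4. matplotlib.pyplot: A plotting library.
### 5. RandomOverSampler (imblearn): Balances imbalanced data by randomly duplicating minority-class samples.
### 6. StandardScaler: Standardizes each feature to zero mean and unit variance.
### 7. RandomForestClassifier: An ensemble that averages the predictions of many decision trees.
### 8. train_test_split, cross_val_score: Split the data into training and testing sets and run k-fold cross-validation.
### 9. sklearn.metrics: Accuracy, precision, recall, F1-score, classification report, and confusion matrix.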
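imblearn's `RandomOverSampler` (with its default strategy) balances the classes by drawing minority-class rows at random, with replacement, until each class is as large as the majority class. The hand-rolled NumPy sketch below illustrates that idea; it is not imblearn's implementation, and the function name `random_oversample` is hypothetical.

```python
import numpy as np

def random_oversample(X, y, seed=42):
    """Duplicate randomly chosen rows of each smaller class until all classes match the largest one."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx_parts = []
    for cls, count in zip(classes, counts):
        cls_idx = np.flatnonzero(y == cls)
        # Keep every original row, then add random duplicates (drawn with replacement)
        extra = rng.choice(cls_idx, size=target - count, replace=True)
        idx_parts.append(np.concatenate([cls_idx, extra]))
    idx = np.concatenate(idx_parts)
    return X[idx], y[idx]

# Toy data: 8 legitimate transactions (0) and 2 fraudulent ones (1)
X = np.arange(20).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)
X_res, y_res = random_oversample(X, y)
print(np.bincount(y_res))  # [8 8]
```

Every fraud row in `X_res` is an exact copy of one of the two original fraud rows. No new information is created, which matters when the data is later split into training and testing sets.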

In [1]:
# Import Libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report, confusion_matrix
from sklearn.preprocessing import StandardScaler
from imblearn.over_sampling import RandomOverSampler

# **2. Data Loading and Preprocessing**

### 1. Loading the Dataset: Loads the credit card transaction dataset from a CSV file.

In [3]:
# Load the dataset
df = pd.read_csv('/content/creditcard.csv')

### 2. Data Shape: Displays the number of rows and columns in the loaded data.

In [4]:
# Explore the dataset
print(df.shape)

(1986, 31)


### 3. Data Information: Displays the column names, then each column's data type and non-null count.

In [5]:
print(df.columns)

Index(['Time', 'V1', 'V2', 'V3', 'V4', 'V5', 'V6', 'V7', 'V8', 'V9', 'V10',
       'V11', 'V12', 'V13', 'V14', 'V15', 'V16', 'V17', 'V18', 'V19', 'V20',
       'V21', 'V22', 'V23', 'V24', 'V25', 'V26', 'V27', 'V28', 'Amount',
       'Class'],
      dtype='object')


In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1986 entries, 0 to 1985
Data columns (total 31 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Time    1986 non-null   int64  
 1   V1      1986 non-null   float64
 2   V2      1986 non-null   float64
 3   V3      1986 non-null   float64
 4   V4      1986 non-null   float64
 5   V5      1986 non-null   float64
 6   V6      1986 non-null   float64
 7   V7      1986 non-null   float64
 8   V8      1986 non-null   float64
 9   V9      1986 non-null   float64
 10  V10     1986 non-null   float64
 11  V11     1986 non-null   float64
 12  V12     1986 non-null   float64
 13  V13     1986 non-null   float64
 14  V14     1985 non-null   float64
 15  V15     1985 non-null   float64
 16  V16     1985 non-null   float64
 17  V17     1985 non-null   float64
 18  V18     1985 non-null   float64
 19  V19     1985 non-null   float64
 20  V20     1985 non-null   float64
 21  V21     1985 non-null   float64
 22  

### 4. Data Head and Tail: Displays the first and last five rows of the data.

In [7]:
df.head()

|  | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | ... | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | Amount | Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | -1.359807 | -0.072781 | 2.536347 | 1.378155 | -0.338321 | 0.462388 | 0.239599 | 0.098698 | 0.363787 | ... | -0.018307 | 0.277838 | -0.110474 | 0.066928 | 0.128539 | -0.189115 | 0.133558 | -0.021053 | 149.62 | 0.0 |
| 1 | 0 | 1.191857 | 0.266151 | 0.16648 | 0.448154 | 0.060018 | -0.082361 | -0.078803 | 0.085102 | -0.255425 | ... | -0.225775 | -0.638672 | 0.101288 | -0.339846 | 0.16717 | 0.125895 | -0.008983 | 0.014724 | 2.69 | 0.0 |
| 2 | 1 | -1.358354 | -1.340163 | 1.773209 | 0.37978 | -0.503198 | 1.800499 | 0.791461 | 0.247676 | -1.514654 | ... | 0.247998 | 0.771679 | 0.909412 | -0.689281 | -0.327642 | -0.139097 | -0.055353 | -0.059752 | 378.66 | 0.0 |
| 3 | 1 | -0.966272 | -0.185226 | 1.792993 | -0.863291 | -0.010309 | 1.247203 | 0.237609 | 0.377436 | -1.387024 | ... | -0.1083 | 0.005274 | -0.190321 | -1.175575 | 0.647376 | -0.221929 | 0.062723 | 0.061458 | 123.5 | 0.0 |
| 4 | 2 | -1.158233 | 0.877737 | 1.548718 | 0.403034 | -0.407193 | 0.095921 | 0.592941 | -0.270533 | 0.817739 | ... | -0.009431 | 0.798278 | -0.137458 | 0.141267 | -0.20601 | 0.502292 | 0.219422 | 0.215153 | 69.99 | 0.0 |


In [9]:
df.tail()

|  | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | ... | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | Amount | Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1981 | 1524 | -0.340622 | 1.132232 | 1.291494 | 0.062313 | 0.016387 | -0.97707 | 0.723755 | -0.07463 | -0.396655 | ... | -0.262948 | -0.688785 | -0.010937 | 0.334061 | -0.160025 | 0.071779 | 0.245128 | 0.098336 | 5.35 | 0.0 |
| 1982 | 1525 | -1.842696 | 1.740641 | 0.861526 | -0.856315 | -0.655376 | -0.842786 | 0.198563 | 0.602764 | 0.455595 | ... | -0.213609 | -0.400617 | 0.030013 | 0.512611 | -0.077087 | 0.286218 | 0.586012 | 0.35261 | 1.0 | 0.0 |
| 1983 | 1525 | -0.480693 | 0.646091 | 1.577264 | -0.084411 | -0.305958 | -0.534739 | 0.860346 | -0.028569 | -0.800705 | ... | 0.121681 | 0.17519 | 0.035986 | 0.557665 | -0.112301 | 0.337154 | -0.015602 | 0.051504 | 80.7 | 0.0 |
| 1984 | 1525 | -0.342132 | 1.091125 | 1.282729 | 0.068076 | -0.022498 | -0.996727 | 0.676304 | -0.04225 | -0.312036 | ... | -0.26985 | -0.734148 | -0.007354 | 0.319161 | -0.179146 | 0.073683 | 0.241932 | 0.097139 | 3.59 | 0.0 |
| 1985 | 1526 | -0.854343 | 1.382948 | 1.278665 | 2.914727 | -0.183139 | -0.349329 | 0.274566 | 0.435277 | -1.576521 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |


### 5. Data Null Values: Returns a Boolean mask with the same shape as the data, True wherever a value is missing.

In [10]:
df.isnull()

|  | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | ... | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | Amount | Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 1 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 2 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 3 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 4 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1981 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 1982 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 1983 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 1984 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |


### 6. Data Null Values Sum: Counts the missing values in each column, then sums them into a single total.

In [11]:
df.isnull().sum()

Time      0
V1        0
V2        0
V3        0
V4        0
V5        0
V6        0
V7        0
V8        0
V9        0
V10       0
V11       0
V12       0
V13       0
V14       1
V15       1
V16       1
V17       1
V18       1
V19       1
V20       1
V21       1
V22       1
V23       1
V24       1
V25       1
V26       1
V27       1
V28       1
Amount    1
Class     1
dtype: int64

In [12]:
df.isnull().sum().sum()

17
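The per-column counts show that V14 through Class are each missing exactly one value. That pattern suggests a single incomplete row rather than scattered gaps. One way to confirm it is to select every row that contains at least one null. The sketch below uses a small made-up frame in place of `creditcard.csv`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the transaction data: the last row is cut off partway through
df = pd.DataFrame({
    "Time": [0, 0, 1],
    "V1": [-1.36, 1.19, -0.85],
    "Amount": [149.62, 2.69, np.nan],
    "Class": [0.0, 0.0, np.nan],
})

incomplete = df[df.isnull().any(axis=1)]   # rows with at least one missing value
print(incomplete.index.tolist())           # [2]
print(df.isnull().sum(axis=1).max())       # most missing values in any single row: 2
```

On the notebook's data, the same selection would be expected to return only the final row (index 1985). This is consistent with the tail output, where that row shows NaN from V21 onward.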

### 7. Data Description: Displays summary statistics (count, mean, standard deviation, minimum, quartiles, maximum) for each numeric column.

In [13]:
df.describe()

|  | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | ... | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | Amount | Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 1986.0 | 1986.0 | 1986.0 | 1986.0 | 1986.0 | 1986.0 | 1986.0 | 1986.0 | 1986.0 | 1986.0 | ... | 1985.0 | 1985.0 | 1985.0 | 1985.0 | 1985.0 | 1985.0 | 1985.0 | 1985.0 | 1985.0 | 1985.0 |
| mean | 761.03575 | -0.284195 | 0.266886 | 0.848005 | 0.151216 | -0.077457 | 0.050205 | 0.138347 | -0.058795 | 0.012145 | ... | -0.011611 | -0.144319 | -0.043045 | 0.013864 | 0.108372 | 0.049408 | 0.027197 | -0.002018 | 68.602469 | 0.001008 |
| std | 451.034025 | 1.353508 | 1.142026 | 1.012645 | 1.264932 | 1.272512 | 1.274204 | 1.14075 | 0.966493 | 0.900828 | ... | 0.6532 | 0.588201 | 0.35289 | 0.60137 | 0.407874 | 0.454251 | 0.369485 | 0.272864 | 241.677019 | 0.031734 |
| min | 0.0 | -11.140706 | -12.114213 | -12.389545 | -4.657545 | -32.092129 | -3.498447 | -4.925568 | -12.258158 | -3.110515 | ... | -4.709977 | -2.776923 | -4.0203 | -2.162523 | -1.577384 | -1.243924 | -5.336289 | -2.738566 | 0.0 | 0.0 |
| 25% | 366.0 | -1.045512 | -0.204111 | 0.280517 | -0.670513 | -0.576269 | -0.691393 | -0.286991 | -0.172322 | -0.47931 | ... | -0.226941 | -0.547474 | -0.181176 | -0.350802 | -0.151028 | -0.281097 | -0.049467 | -0.021053 | 4.95 | 0.0 |
| 50% | 750.0 | -0.437621 | 0.314294 | 0.864505 | 0.190698 | -0.154843 | -0.198063 | 0.117535 | 0.037598 | -0.034097 | ... | -0.087329 | -0.152603 | -0.057041 | 0.093137 | 0.131713 | 0.036992 | 0.023011 | 0.022722 | 15.09 | 0.0 |
| 75% | 1161.0 | 1.095047 | 0.926126 | 1.486942 | 1.002546 | 0.376901 | 0.389714 | 0.569262 | 0.279513 | 0.449706 | ... | 0.08353 | 0.252698 | 0.064859 | 0.428755 | 0.383339 | 0.303731 | 0.140481 | 0.09092 | 63.65 | 0.0 |
| max | 1526.0 | 1.685314 | 6.11894 | 4.017561 | 6.013346 | 7.672544 | 21.393069 | 34.303177 | 3.877662 | 6.450992 | ... | 6.765928 | 1.957759 | 4.095021 | 1.215279 | 1.629684 | 3.463246 | 3.852046 | 4.157934 | 7712.43 | 1.0 |


In [14]:
df.describe().transpose()

|  | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Time | 1986.0 | 761.03575 | 451.034025 | 0.0 | 366.0 | 750.0 | 1161.0 | 1526.0 |
| V1 | 1986.0 | -0.284195 | 1.353508 | -11.140706 | -1.045512 | -0.437621 | 1.095047 | 1.685314 |
| V2 | 1986.0 | 0.266886 | 1.142026 | -12.114213 | -0.204111 | 0.314294 | 0.926126 | 6.11894 |
| V3 | 1986.0 | 0.848005 | 1.012645 | -12.389545 | 0.280517 | 0.864505 | 1.486942 | 4.017561 |
| V4 | 1986.0 | 0.151216 | 1.264932 | -4.657545 | -0.670513 | 0.190698 | 1.002546 | 6.013346 |
| V5 | 1986.0 | -0.077457 | 1.272512 | -32.092129 | -0.576269 | -0.154843 | 0.376901 | 7.672544 |
| V6 | 1986.0 | 0.050205 | 1.274204 | -3.498447 | -0.691393 | -0.198063 | 0.389714 | 21.393069 |
| V7 | 1986.0 | 0.138347 | 1.14075 | -4.925568 | -0.286991 | 0.117535 | 0.569262 | 34.303177 |
| V8 | 1986.0 | -0.058795 | 0.966493 | -12.258158 | -0.172322 | 0.037598 | 0.279513 | 3.877662 |
| V9 | 1986.0 | 0.012145 | 0.900828 | -3.110515 | -0.47931 | -0.034097 | 0.449706 | 6.450992 |


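### 8. Handling Missing Values: Fills each missing value with the mean of its column. This also fills the one missing Class label with the column mean (0.001008), which the integer cast in the next section truncates to 0.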

In [15]:
# Handle missing values by filling them with the mean of the column
df.fillna(df.mean(), inplace=True)

In [16]:
df.isnull().sum().sum()

0
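Mean imputation gives an incomplete row a fractional `Class` label. An alternative is to drop rows whose label is missing and impute only the feature columns. The sketch below shows this on a small made-up frame. Whether dropping a single row matters for this dataset is a judgment call.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "V1": [0.5, np.nan, -1.2, 0.3],
    "Amount": [10.0, 20.0, np.nan, 5.0],
    "Class": [0.0, 1.0, 0.0, np.nan],
})

# Drop rows without a label: an imputed label is not a real observation
df = df.dropna(subset=["Class"]).copy()

# Impute only the feature columns, using the means of the remaining rows
feature_cols = df.columns.drop("Class")
df[feature_cols] = df[feature_cols].fillna(df[feature_cols].mean())
df["Class"] = df["Class"].astype(int)

print(df)
```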

# **3. Model Training and Evaluation**

In [17]:
# Ensure the 'Class' column is of integer type
df['Class'] = df['Class'].astype(int)

In [18]:
# Split the data into features (X) and target (y)
X = df.drop(['Class'], axis=1)  # features
y = df['Class']  # target variable (fraudulent or not)

In [19]:
# Handle Imbalanced Data
ros = RandomOverSampler(random_state=42)
X_resampled, y_resampled = ros.fit_resample(X, y)

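### 1. Splitting Data: Splits the data into training (80%) and testing (20%) sets. Because oversampling runs before this split, the test set is not independent of the training set. The summary statistics imply only two fraudulent rows in the data (a Class mean of 0.001008 over 1,985 labels). Every fraud row in the test set (374 in the confusion matrix below) is therefore a duplicate of one of those two transactions, and copies of them also make up the fraud class in the training set.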

In [21]:
# Split the resampled data into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X_resampled, y_resampled, test_size=0.2, random_state=42)
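To keep the test set free of rows the model has already seen, one option is to split first and then oversample only the training portion. The split should be stratified so that both sets keep some fraud. The sketch below uses synthetic data from `make_classification`. It also substitutes `sklearn.utils.resample` for `RandomOverSampler`; neither choice comes from the notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Synthetic, heavily imbalanced stand-in for the transaction data (about 2% positives)
X, y = make_classification(n_samples=1000, n_features=8, weights=[0.98], random_state=42)

# 1. Split first; stratify keeps the fraud ratio similar in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# 2. Oversample the minority class in the training set only
minority = y_train == 1
X_min_up, y_min_up = resample(
    X_train[minority], y_train[minority],
    replace=True, n_samples=int((~minority).sum()), random_state=42,
)
X_train_bal = np.vstack([X_train[~minority], X_min_up])
y_train_bal = np.concatenate([y_train[~minority], y_min_up])

print(np.bincount(y_train_bal), np.bincount(y_test))
```

The test set keeps its natural class ratio, so its metrics reflect how the model would behave on new transactions. With very few fraudulent rows, as in this extract, a stratified split may leave the test set with one fraud case or none. A larger sample of the data would then be needed for a meaningful evaluation.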

In [22]:
# Standardize the features: fit the scaler on the training set only, then
# apply the same mean/std to the test set
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
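`StandardScaler` learns each column's mean and standard deviation from the data passed to `fit` and applies z = (x - mean) / std. The test set is transformed with the training statistics. The small check below uses made-up numbers to confirm that equivalence. Random forests split on per-feature thresholds, so they are largely insensitive to this kind of scaling. The step is harmless here but not required.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 600.0]])
X_test = np.array([[4.0, 300.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Same result by hand: StandardScaler uses the population std (ddof=0)
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
manual_test = (X_test - mean) / std

print(np.allclose(X_test_scaled, manual_test))  # True
```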

In [23]:
# Create a Random Forest Classifier model
rfc = RandomForestClassifier(n_estimators=100, random_state=42)

### 2. Training the Model: Fits the Random Forest Classifier on the scaled training data.

In [24]:
# Train the model on the training data
rfc.fit(X_train_scaled, y_train)

### 3. Making Predictions: Predicts a class (0 = legitimate, 1 = fraud) for each row of the scaled test data.

In [25]:
# Make predictions on the testing data
y_pred = rfc.predict(X_test_scaled)

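### 4. Model Evaluation Metrics: Calculates accuracy, precision, recall, and F1-score on the test set; precision, recall, and F1 refer to the fraud class (label 1).
### 5. Confusion Matrix: Prints the confusion matrix, with true classes as rows and predicted classes as columns.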

In [26]:
# Evaluate the model
print('*'*50)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
print('*'*50)
precision = precision_score(y_test, y_pred)
print(f"Precision: {precision:.2f}")
print('*'*50)
recall = recall_score(y_test, y_pred)
print(f"Recall: {recall:.2f}")
print('*'*50)
f1 = f1_score(y_test, y_pred)
print(f"F1-score: {f1:.2f}")
print('*'*50)
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
print('*'*50)

**************************************************
Accuracy: 1.00
**************************************************
Precision: 1.00
**************************************************
Recall: 1.00
**************************************************
F1-score: 1.00
**************************************************
Confusion Matrix:
[[420   0]
 [  0 374]]
**************************************************
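All four metrics can be read off the confusion matrix. For binary labels, scikit-learn's `confusion_matrix` is laid out as [[TN, FP], [FN, TP]], with rows as true labels and columns as predictions. The sketch below computes the metrics by hand. The helper name `metrics_from_confusion` is hypothetical.

```python
def metrics_from_confusion(cm):
    """Accuracy, precision, recall and F1 for the positive class from [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = cm
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

# The matrix reported above: every prediction correct
print(metrics_from_confusion([[420, 0], [0, 374]]))  # (1.0, 1.0, 1.0, 1.0)

# A hypothetical imbalanced test set: 95 legitimate, 5 fraud, only 2 fraud caught
print(metrics_from_confusion([[93, 2], [3, 2]]))
```

In the hypothetical case, accuracy is 0.95 while recall is only 0.4. On imbalanced data, recall and precision for the fraud class say more than accuracy.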


In [27]:
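# Perform 5-fold cross-validation. Note: the folds are drawn from the already
# oversampled data, so copies of the same fraud rows can land in both the
# training and validation folds, which inflates this score.
scores = cross_val_score(rfc, X_resampled, y_resampled, cv=5)
print(f"Cross-Validation Accuracy: {scores.mean():.2f} ± {scores.std():.2f}")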

Cross-Validation Accuracy: 1.00 ± 0.00
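The sketch below runs cross-validation with oversampling applied inside each training fold only, so validation folds contain only original rows. It also reports recall for the fraud class rather than accuracy. It uses synthetic data and a hand-rolled oversampling step instead of `RandomOverSampler`. Alternatively, imblearn's `Pipeline` (from `imblearn.pipeline`) applies samplers only during fitting, so passing such a pipeline to `cross_val_score` has the same effect.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import StratifiedKFold

# Synthetic, imbalanced stand-in for the transaction data
X, y = make_classification(n_samples=600, n_features=8, weights=[0.95], random_state=0)
rng = np.random.default_rng(0)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

recalls = []
for train_idx, val_idx in skf.split(X, y):
    X_tr, y_tr = X[train_idx], y[train_idx]
    # Oversample the fraud class within this training fold only
    pos = np.flatnonzero(y_tr == 1)
    extra = rng.choice(pos, size=(y_tr == 0).sum() - pos.size, replace=True)
    X_tr = np.vstack([X_tr, X_tr[extra]])
    y_tr = np.concatenate([y_tr, y_tr[extra]])

    model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)
    # Score on the untouched validation fold
    recalls.append(recall_score(y[val_idx], model.predict(X[val_idx])))

print(f"Cross-validated recall: {np.mean(recalls):.2f} ± {np.std(recalls):.2f}")
```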


# **4. Using the Model to Detect Fraudulent Transactions**

### 1. Function to Detect Fraud: Defines a function that standardizes the input rows with the fitted scaler and returns "Yes" (fraud) or "No" for each one.

In [28]:
# Use the model to detect fraudulent transactions
def detect_fraud(transaction_data):
    # Keep only the training feature columns (this drops 'Class' if present)
    # and standardize them with the scaler fitted on the training data
    transaction_data = pd.DataFrame(transaction_data, columns=X.columns)
    transaction_data_scaled = scaler.transform(transaction_data)
    # Make predictions
    predictions = rfc.predict(transaction_data_scaled)
    return ["Yes" if pred else "No" for pred in predictions.tolist()]
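`detect_fraud` depends on the global variables `X`, `scaler`, and `rfc`. A variant that receives the fitted objects as arguments is easier to reuse and test. The sketch below is a hypothetical refactor, demonstrated with a tiny model trained on made-up data. Like the original, it keeps only the training feature columns, so a frame that still contains `Class` can be passed in.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

def detect_fraud(transactions, model, scaler, feature_columns):
    """Return 'Yes'/'No' per row; columns not used in training (e.g. 'Class') are ignored."""
    features = pd.DataFrame(transactions)[list(feature_columns)]
    scaled = scaler.transform(features)
    return ["Yes" if pred == 1 else "No" for pred in model.predict(scaled)]

# Tiny made-up training set in which large amounts are labelled as fraud
train = pd.DataFrame({"V1": [0.1, -0.2, 0.3, 5.0, 4.8],
                      "Amount": [10.0, 12.0, 9.0, 900.0, 950.0],
                      "Class": [0, 0, 0, 1, 1]})
feature_columns = ["V1", "Amount"]
scaler = StandardScaler().fit(train[feature_columns])
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(scaler.transform(train[feature_columns]), train["Class"])

new = pd.DataFrame({"V1": [0.0, 5.1], "Amount": [11.0, 920.0], "Class": [0, 1]})
print(detect_fraud(new, model, scaler, feature_columns))
```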

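### 2. Example Usage: Runs the function on every row of the dataset. Most of these rows were also used for training, so the output does not measure how well the model generalizes to new transactions.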

In [29]:
print(detect_fraud(df))

['No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No
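A list with one label per row is hard to read for 1,986 rows, and the output above is cut off. Counting the labels and listing the positions of any "Yes" predictions gives a compact summary. The sketch below uses `collections.Counter` on a short stand-in list instead of the real output.

```python
from collections import Counter

predictions = ["No", "No", "Yes", "No"]  # stand-in for detect_fraud(df)
counts = Counter(predictions)
print(counts)  # Counter({'No': 3, 'Yes': 1})

# Row positions predicted as fraud
fraud_rows = [i for i, label in enumerate(predictions) if label == "Yes"]
print(fraud_rows)  # [2]
```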

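### 3. Multiple Examples: Prints the prediction for the first row of the full dataset, of its first five rows, and of its last five rows. Because of the `[0]` index, each call shows only one prediction, and the first two calls check the same transaction (row 0).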

In [30]:
print("Is there fraud? " + detect_fraud(df)[0])

Is there fraud? No


In [31]:
print("Is there fraud? " + detect_fraud(df.head())[0])

Is there fraud? No


In [32]:
print("Is there fraud? " + detect_fraud(df.tail())[0])

Is there fraud? No


# **5. Summary**

This notebook builds a Random Forest Classifier to detect fraudulent transactions in a credit card dataset. It loads and inspects the data, fills missing values with column means, balances the classes by random oversampling, and standardizes the features. It then trains the model and evaluates it with accuracy, precision, recall, F1-score, a confusion matrix, and 5-fold cross-validation. Finally, the trained model is applied to the full dataset and to its first and last five rows. The perfect scores should be read with caution. Oversampling happens before both the train/test split and cross-validation, so the evaluation data contains duplicates of the few fraudulent transactions the model was trained on.