# **CodSoft | Data Science | Internship | Task 05 | CREDIT CARD FRAUD DETECTION**

---

## **Problem Statement | Task 05 | CREDIT CARD FRAUD DETECTION**

 - Build a machine learning model to identify fraudulent credit card transactions.
 - Preprocess and normalize the transaction data, handle class imbalance issues, and split the dataset into training and testing sets.
 - Train a classification algorithm, such as logistic regression or random forests, to classify transactions as fraudulent or genuine.
 - Evaluate the model's performance using metrics like precision, recall, and F1-score, and consider techniques like oversampling or undersampling to improve results.

### **Importing Necessary Libraries**

In [None]:
# Importing Necessary Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

### **Load the Credit Card Fraud Detection dataset**

In [None]:
# Load the Credit Card Fraud Detection dataset
cfd = pd.read_csv("creditcard.csv")

In [None]:
cfd.head()

### **EDA (Exploratory Data Analysis)**
#### **Display basic information about the dataset**

In [None]:
print("Dataset Information: \n")
cfd.info()  # info() prints its summary directly and returns None

In [None]:
print("Shape of Dataset:", cfd.shape)
print("Number of Rows in Dataset:", cfd.shape[0])
print("Number of Columns in Dataset:", cfd.shape[1])

In [None]:
print("\nStatistical Summary:\n")
cfd.describe()

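Fraud data is usually heavily imbalanced, so it is worth quantifying the class split before modelling. The sketch below assumes `Class` holds 0 for genuine and 1 for fraudulent transactions; `class_balance` is a hypothetical helper and the toy labels are made up for illustration. On the real data you would call `class_balance(cfd['Class'])`.

```python
import pandas as pd

def class_balance(labels: pd.Series) -> pd.DataFrame:
    """Return the count and percentage share of each class label."""
    counts = labels.value_counts().sort_index()
    share = labels.value_counts(normalize=True).sort_index() * 100
    return pd.DataFrame({"count": counts, "percent": share.round(3)})

# Hypothetical toy labels: 995 genuine (0) and 5 fraudulent (1) transactions
toy_labels = pd.Series([0] * 995 + [1] * 5, name="Class")
summary = class_balance(toy_labels)
print(summary)

# How many genuine transactions there are for every fraudulent one
imbalance_ratio = summary.loc[0, "count"] / summary.loc[1, "count"]
print(f"Genuine-to-fraud ratio: {imbalance_ratio:.0f}:1")
```

A large ratio here is the reason the later cells need stratified splitting and metrics beyond accuracy.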
### **Missing Values**

In [None]:
# Check for missing values
print("\nMissing Values:")
print(cfd.isnull().sum())

In [None]:
# Missing Value - Heatmap
sns.heatmap(cfd.isnull(), yticklabels=False, cbar=False, cmap='tab20c_r')
plt.title('Missing Data: Full Dataset')
plt.show()

In [None]:
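# Transaction amount against time, coloured by class (0 = genuine, 1 = fraud)
plt.figure(figsize=(8,8))
sns.scatterplot(x="Time", y="Amount", hue="Class", data=cfd)
plt.title('Transaction Amount vs. Time by Class')
plt.show()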

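When genuine transactions vastly outnumber fraudulent ones, the fraud points in a scatter plot can be hard to see. A per-class numeric summary of `Amount` complements the plot. This is a minimal sketch on hypothetical toy data; on the notebook data the equivalent call would be `cfd.groupby("Class")["Amount"].describe()`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy transactions; the amounts are random, not real data
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "Amount": np.concatenate([rng.exponential(80, 950), rng.exponential(120, 50)]),
    "Class": [0] * 950 + [1] * 50,
})

# Per-class summary statistics of the transaction amount
amount_by_class = toy.groupby("Class")["Amount"].describe()
print(amount_by_class[["count", "mean", "50%", "max"]])
```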
### **Now let's plot the correlation matrix of our data with a heatmap.**

In [None]:
# Now let's plot the correlation matrix of our data with a heatmap.
plt.subplots(figsize=(14, 10))
sns.heatmap(cfd.corr(), cmap = "YlGnBu", annot=True, fmt=".2f")
plt.show()

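With many columns, an annotated full correlation matrix gets crowded. One option is to rank only each feature's correlation with the target. The sketch below uses Pearson correlation, as `DataFrame.corr()` does by default; `rank_correlations` is a hypothetical helper and the toy columns `signal` and `noise` are invented for illustration. On the notebook data it would be called as `rank_correlations(cfd)`.

```python
import numpy as np
import pandas as pd

def rank_correlations(df: pd.DataFrame, target: str = "Class") -> pd.Series:
    """Absolute Pearson correlation of every other column with `target`, strongest first."""
    corr = df.corr()[target].drop(target)
    return corr.abs().sort_values(ascending=False)

# Hypothetical toy data: "signal" is built to track the label, "noise" is not
rng = np.random.default_rng(42)
label = rng.integers(0, 2, 500)
toy = pd.DataFrame({
    "signal": label + rng.normal(0, 0.3, 500),
    "noise": rng.normal(0, 1, 500),
    "Class": label,
})
ranked = rank_correlations(toy)
print(ranked)
```

Linear correlation with a rare binary label is only a rough screen. A low value does not prove that a feature is useless to a classifier.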
### **Data Preprocessing**

In [None]:
# Standardize 'Amount' and 'Time' to zero mean and unit variance
scaler = StandardScaler()
cfd[['Amount', 'Time']] = scaler.fit_transform(cfd[['Amount', 'Time']])

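The cell above fits the scaler on the whole dataset before the train/test split, so the test rows influence the mean and standard deviation used on the training rows. A common way to avoid this leakage is to split first and fit the scaler on the training rows only. Below is a minimal sketch under that assumption. `split_then_scale` is a hypothetical helper, and the toy frame only reuses the notebook's column names. On the notebook it would be applied to the *unscaled* `cfd` in place of the scaling and split cells. A scikit-learn `Pipeline` that combines the scaler and the model achieves the same effect.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

def split_then_scale(df, target="Class", cols=("Amount", "Time"), test_size=0.2, random_state=42):
    """Split first, then fit the scaler on the training rows only (hypothetical helper)."""
    X = df.drop(target, axis=1)
    y = df[target]
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y)
    X_tr, X_te = X_tr.copy(), X_te.copy()
    cols = list(cols)
    scaler = StandardScaler()
    X_tr[cols] = scaler.fit_transform(X_tr[cols])  # learn mean/std from training rows
    X_te[cols] = scaler.transform(X_te[cols])      # reuse the same mean/std on test rows
    return X_tr, X_te, y_tr, y_te, scaler

# Hypothetical toy frame with the same column names the notebook scales
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "Time": rng.uniform(0, 1000, 200),
    "Amount": rng.exponential(50, 200),
    "Class": [0] * 190 + [1] * 10,
})
X_tr, X_te, y_tr, y_te, toy_scaler = split_then_scale(toy)
print(X_tr[["Amount", "Time"]].mean().round(6))
```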
In [None]:
cfd.head()

### **Split the data into features (X) and target variable (y)**

In [None]:
# Split the data into features (X) and target variable (y)
X = cfd.drop('Class', axis=1)
y = cfd['Class']

In [None]:
# Split the data into training and testing sets (80% training, 20% testing);
# stratify=y keeps the fraud ratio the same in both sets
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2, random_state=42,
                                                    stratify=y)

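With very few positive cases, an unstratified random split can place noticeably more or fewer fraud cases in the test set than the overall rate would suggest. Passing `stratify` to `train_test_split` preserves the class proportions. This minimal sketch compares the two approaches on hypothetical labels with a 1% fraud rate.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labels with a 1% fraud rate and a dummy one-column feature matrix
y_toy = np.array([0] * 990 + [1] * 10)
X_toy = np.arange(len(y_toy)).reshape(-1, 1)

# Same split settings as the notebook, without and with stratification
_, _, _, y_te_plain = train_test_split(X_toy, y_toy, test_size=0.2, random_state=42)
_, _, y_tr_strat, y_te_strat = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy)

for name, part in [("unstratified test", y_te_plain), ("stratified test", y_te_strat)]:
    print(f"{name}: {part.sum()} fraud of {len(part)} rows ({part.mean():.2%})")
```

The stratified test set contains exactly 2 of the 10 fraud cases, which matches the 20% test size.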
In [None]:
print("Shape of X_train:", X_train.shape)
print("Shape of y_train:", y_train.shape)
print("Shape of X_test:", X_test.shape)
print("Shape of y_test:", y_test.shape)

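The problem statement suggests oversampling or undersampling. The notebook does not apply either, so here is a minimal sketch of both, using random resampling with `sklearn.utils.resample`. It assumes a binary target where 1 is the minority (fraud) class. `rebalance` is a hypothetical helper and the toy data is illustrative. Only the training split should be resampled; the test split keeps its real class ratio. The separate imbalanced-learn library provides ready-made alternatives, such as `RandomOverSampler`, `RandomUnderSampler` and `SMOTE`.

```python
import numpy as np
import pandas as pd
from sklearn.utils import resample

def rebalance(X_train: pd.DataFrame, y_train: pd.Series, method: str = "over", random_state: int = 42):
    """Randomly oversample class 1 or undersample class 0 until both classes are equal in size."""
    train = pd.concat([X_train, y_train], axis=1)
    target = y_train.name
    majority = train[train[target] == 0]
    minority = train[train[target] == 1]
    if method == "over":
        # Draw minority rows with replacement until they match the majority count
        minority = resample(minority, replace=True, n_samples=len(majority), random_state=random_state)
    elif method == "under":
        # Keep a random subset of majority rows equal in size to the minority class
        majority = resample(majority, replace=False, n_samples=len(minority), random_state=random_state)
    else:
        raise ValueError("method must be 'over' or 'under'")
    # Shuffle so the two classes are interleaved
    balanced = pd.concat([majority, minority]).sample(frac=1, random_state=random_state)
    return balanced.drop(columns=target), balanced[target]

# Hypothetical toy training split: 95 genuine and 5 fraudulent rows
X_toy = pd.DataFrame({"V1": np.arange(100, dtype=float)})
y_toy = pd.Series([0] * 95 + [1] * 5, name="Class")
X_over, y_over = rebalance(X_toy, y_toy, "over")     # 95 rows of each class
X_under, y_under = rebalance(X_toy, y_toy, "under")  # 5 rows of each class
print(y_over.value_counts())
print(y_under.value_counts())
```

In the notebook, this would be called as `rebalance(X_train, y_train, "over")` before fitting. Oversampling repeats the same few fraud rows, and undersampling discards most genuine rows, so both approaches are worth comparing on the untouched test set.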
### **Train a Logistic Regression Model**

In [None]:
# Train a Logistic Regression model (max_iter raised from the default 100 to help the solver converge)
model = LogisticRegression(max_iter=1000, random_state=42)
model.fit(X_train, y_train)

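Instead of resampling, `LogisticRegression` accepts `class_weight="balanced"`, which weights each class by `n_samples / (n_classes * count_of_that_class)` in the loss. This is a minimal sketch on hypothetical labels that checks the formula against `compute_class_weight`. It does not make any claim about how the weighted model would score on this dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical labels: 990 genuine, 10 fraudulent
y_toy = np.array([0] * 990 + [1] * 10)
classes = np.array([0, 1])

# "balanced" weight for class c = n_samples / (n_classes * count_c)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_toy)
manual = len(y_toy) / (len(classes) * np.bincount(y_toy))
print({int(c): round(float(w), 3) for c, w in zip(classes, weights)})  # {0: 0.505, 1: 50.0}

# Passing the same option to the model applies these weights during fitting
weighted_model = LogisticRegression(class_weight="balanced", max_iter=1000, random_state=42)
```

On the notebook data, you would fit this with `weighted_model.fit(X_train, y_train)` and evaluate it with the same metric cells. Weighting usually trades some precision for recall on the minority class.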
### **Make predictions**

In [None]:
# Make predictions
y_pred = model.predict(X_test)

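`model.predict` labels a row as fraud when the predicted probability exceeds 0.5. For rare classes, a lower cut-off on `predict_proba` can catch more fraud at the cost of more false alarms. This is a minimal sketch on hypothetical data from `make_classification`. To keep it short, it evaluates on the same rows it was trained on; in practice, the threshold would be chosen on a validation split, not on the test set.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score

# Hypothetical imbalanced toy data (roughly 2% positives)
X_toy, y_toy = make_classification(n_samples=2000, n_features=5, weights=[0.98], random_state=0)
toy_model = LogisticRegression(max_iter=1000).fit(X_toy, y_toy)

# Probability of the positive (fraud) class for each row
proba = toy_model.predict_proba(X_toy)[:, 1]

for threshold in (0.5, 0.3, 0.1):
    preds = (proba >= threshold).astype(int)
    print(f"threshold={threshold}: "
          f"precision={precision_score(y_toy, preds, zero_division=0):.2f}, "
          f"recall={recall_score(y_toy, preds):.2f}")
```

Lowering the threshold can only add predicted positives, so recall never decreases as the threshold drops, while precision typically falls.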
In [None]:
# Compare actual and predicted labels side by side
pd.DataFrame({'Actual': y_test, 'Predicted': y_pred}).head(10)

### **Evaluate the model's performance**

In [None]:
# Evaluate the model's performance
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

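On imbalanced data, accuracy alone can be misleading. A "model" that never flags fraud still scores close to 100% accuracy while catching nothing. This minimal sketch illustrates the effect with hypothetical labels.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical test labels: 998 genuine, 2 fraudulent
y_true = np.array([0] * 998 + [1] * 2)
# A useless baseline that labels every transaction as genuine
y_all_genuine = np.zeros_like(y_true)

acc = accuracy_score(y_true, y_all_genuine)
rec = recall_score(y_true, y_all_genuine)
print(f"Accuracy: {acc:.3f}")  # Accuracy: 0.998
print(f"Recall:   {rec:.3f}")  # Recall:   0.000
```

This is why the following cells report precision, recall and F1-score for the fraud class.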
In [None]:
precision = precision_score(y_test, y_pred)
print("Precision:", precision)

In [None]:
recall = recall_score(y_test, y_pred)
print("Recall:", recall)

In [None]:
f1 = f1_score(y_test, y_pred)
print("F1 Score:", f1)

In [None]:
conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:\n", conf_matrix)

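For binary labels 0 and 1, the confusion matrix can be unpacked into its four counts. The precision, recall and F1 values above then follow directly from those counts. This minimal sketch uses hypothetical labels and checks the hand-computed formulas against scikit-learn.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Hypothetical labels and predictions
y_true_toy = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_pred_toy = np.array([0, 0, 0, 0, 1, 0, 1, 1, 0, 1])

# For labels [0, 1], ravel() yields the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true_toy, y_pred_toy).ravel()
print(tn, fp, fn, tp)  # 5 1 1 3

precision_manual = tp / (tp + fp)  # share of flagged transactions that really are fraud
recall_manual = tp / (tp + fn)     # share of fraud cases that were flagged
f1_manual = 2 * precision_manual * recall_manual / (precision_manual + recall_manual)
```

In the notebook, `tn, fp, fn, tp = conf_matrix.ravel()` gives the same breakdown for the real predictions.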
In [None]:
print("\nClassification Report:\n", 
      classification_report(y_test, y_pred))