<a href="https://www.kaggle.com/code/ishkag26/project-1-credit-card-fraud-detection?scriptVersionId=142460652" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

**CREDIT CARD FRAUD DETECTION**

> **Credit card fraud is the act of using another person’s credit card to make purchases or request cash advances without the cardholder’s knowledge or consent.**
 
> **Credit card fraud detection is the collective term for the policies, tools, methodologies, and practices that credit card companies and financial institutions take to combat identity fraud and stop fraudulent transactions.**

> **In recent years, as the amount of data has exploded and the number of payment card transactions has skyrocketed, credit card fraud detection has become largely digitized and automated. Most modern solutions use artificial intelligence (AI) and machine learning (ML) for data analysis, predictive modeling, decision-making, fraud alerts, and the remediation activity that follows when individual instances of credit card fraud are detected.**

**IMPORTING THE LIBRARIES**

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split  # train_test_split function allows us to split into training and test data
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import seaborn as sns  # for visualization 


**LOADING THE DATASET**

In [None]:
credit_card_data = pd.read_csv("/kaggle/input/credit-card-fraud-detection/creditcard.csv")
credit_card_data

**EXPLORING ROWS AND COLUMNS**

In [None]:
credit_card_data.info()

In [None]:
credit_card_data.describe().style.background_gradient()

In [None]:
credit_card_data.head()  # first 5 rows of dataset

In [None]:
credit_card_data.tail()   # last 5 rows of dataset

In [None]:
type(credit_card_data)   # type of the dataset

In [None]:
type(credit_card_data["Amount"])    # selecting a single column returns a pandas Series

In [None]:
credit_card_data.index

In [None]:
credit_card_data.values

**DATA CLEANING**

In [None]:
credit_card_data.isnull()    # to check whether NULL values are there or not

In [None]:
credit_card_data.isnull().sum()   # count missing values in each column

In [None]:
credit_card_data.isnull().sum().sum()     # total number of missing values in the dataset

**> Hence, the dataset contains no NULL values.**

**MATHEMATICAL OPERATIONS**

In [None]:
credit_card_data["Amount"].sum()

In [None]:
credit_card_data["V5"].mean()

In [None]:
credit_card_data["V5"].median()

In [None]:
credit_card_data["Amount"].std()    # standard deviation (the parentheses call the method)

**EXPLORING THE DATASET**

In [None]:
# distribution of legit and fraudulent transactions
# 0 = normal (legit) transaction
# 1 = fraudulent transaction

credit_card_data["Class"].value_counts()
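The counts above show how skewed the classes are, which is why accuracy alone can mislead. Below is a minimal sketch with made-up labels. The 990/10 split is a hypothetical stand-in, not the real dataset's counts. A "model" that always predicts legit scores 99% accuracy while catching no fraud at all.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical labels: 990 legit (0) and 10 fraud (1) transactions
y_true = np.array([0] * 990 + [1] * 10)

# A "model" that always predicts legit and never flags fraud
y_pred = np.zeros_like(y_true)

baseline_accuracy = accuracy_score(y_true, y_pred)  # 0.99
fraud_recall = recall_score(y_true, y_pred)         # 0.0: no fraud caught
print(baseline_accuracy, fraud_recall)
```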

**Separate data for analysis**

In [None]:
legit = credit_card_data[credit_card_data.Class == 0]
fraud = credit_card_data[credit_card_data.Class == 1]

print(legit.shape)
print(fraud.shape)

**Statistical measures for our data**

In [None]:
legit.Amount.describe()

In [None]:
fraud.Amount.describe()

**Compare values for both transactions**

In [None]:
credit_card_data.groupby("Class").mean()

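**Under-Sampling: Building a Sample Dataset**

To balance the classes, randomly draw as many legit transactions as there are fraudulent ones (492) and combine them into a new dataset.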

In [None]:
legit_sample = legit.sample(n=len(fraud), random_state=2)  # randomly pick as many legit rows as there are fraud rows (492)

In [None]:
# concatenating two DataFrames

new_dataset = pd.concat([legit_sample,fraud],axis = 0)

In [None]:
new_dataset.head()

In [None]:
new_dataset["Class"].value_counts()

In [None]:
new_dataset.groupby("Class").mean()     
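The under-sampling steps above can be wrapped in a small reusable helper. This is a sketch that assumes a DataFrame with a binary target column. The function name `undersample` and the toy data are hypothetical. The helper takes the sample size from the smallest class rather than hard-coding 492, and it fixes `random_state` so the sample is reproducible.

```python
import pandas as pd

def undersample(df: pd.DataFrame, target: str = "Class", random_state: int = 2) -> pd.DataFrame:
    """Randomly drop majority-class rows so every class has the minority-class count."""
    minority_size = df[target].value_counts().min()
    # Draw minority_size rows (without replacement) from each class
    parts = [
        group.sample(n=minority_size, random_state=random_state)
        for _, group in df.groupby(target)
    ]
    # Shuffle so the classes are interleaved rather than stacked
    return pd.concat(parts).sample(frac=1, random_state=random_state)

# Hypothetical toy data: 8 legit rows and 2 fraud rows
toy = pd.DataFrame({"Amount": range(10), "Class": [0] * 8 + [1] * 2})
balanced = undersample(toy)
print(balanced["Class"].value_counts())
```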

In [None]:
X = new_dataset.drop(columns="Class")   # features: every column except the label
Y = new_dataset["Class"]                # target label

In [None]:
print(X)

In [None]:
print(Y)

In [None]:
# Split the data into training and test data

X_train, X_test, Y_train, Y_test = train_test_split(X,Y,test_size = 0.2,stratify = Y,random_state = 2)
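`stratify=Y` tells `train_test_split` to keep the class proportions of `Y` in both splits. Here is a quick check on hypothetical labels (50 legit, 50 fraud, one dummy feature):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical balanced labels: 50 legit (0) and 50 fraud (1)
y = np.array([0] * 50 + [1] * 50)
X = np.arange(100).reshape(-1, 1)  # one dummy feature

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=2
)

# Both splits keep the 50/50 class ratio
print(y_train.mean(), y_test.mean())
```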

In [None]:
print(X.shape,X_train.shape,X_test.shape)

**Model Training**

In [None]:
# Logistic Regression 

datamodel = LogisticRegression(max_iter=1000)   # more iterations than the default 100 help the solver converge on unscaled features

In [None]:
# training the logistic regression model with Training data

datamodel.fit(X_train,Y_train)
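Feature scale affects both the logistic regression solver and its default L2 penalty. `Time` and `Amount` are on a different scale from the `V` columns. One common option, not used in this notebook, is to standardize the features inside a scikit-learn pipeline so the scaler is learned from the training split only. The minimal sketch below uses synthetic data: `make_classification` stands in for the balanced dataset, and the sizes are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the balanced dataset (assumed: 984 rows, 30 features)
X, y = make_classification(n_samples=984, n_features=30, random_state=2)
X[:, 0] *= 1000  # give one feature a much larger scale, like a raw amount column

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=2
)

# fit() learns the scaler's mean/std from X_train only, then trains the classifier
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.3f}")
```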

**Model Evaluation**

In [None]:
# accuracy on training data

X_train_prediction = datamodel.predict(X_train)
training_data_accuracy = accuracy_score(Y_train, X_train_prediction)   # accuracy_score(y_true, y_pred)
print("Training accuracy:", training_data_accuracy)

In [None]:
# accuracy on testing data

X_test_prediction = datamodel.predict(X_test)
testing_data_accuracy = accuracy_score(Y_test, X_test_prediction)   # accuracy_score(y_true, y_pred)
print("Test accuracy:", testing_data_accuracy)
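Accuracy does not show which kind of mistake the model makes. In fraud detection, missed fraud (false negatives) and false alarms (false positives) usually carry different costs, so a confusion matrix with precision and recall is often reported alongside accuracy. The sketch below uses hypothetical labels. In the notebook, the same calls would take `Y_test` and `X_test_prediction`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical test labels and predictions (0 = legit, 1 = fraud)
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 1, 1, 0])

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]]
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

precision = precision_score(y_true, y_pred)  # share of flagged transactions that are fraud
recall = recall_score(y_true, y_pred)        # share of fraud transactions that were flagged
print(cm)
print(f"precision={precision:.2f} recall={recall:.2f}")
```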

In [None]:
# Updating a single value (row label 284802, column "V2")

credit_card_data.loc[284802, "V2"] = 280000.0
credit_card_data

In [None]:
# removing rows having null values
# dropna() returns a new DataFrame; assign the result (or pass inplace=True) to keep it

credit_card_data.dropna()
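`dropna()` does not modify the DataFrame it is called on. Here is a small demonstration on a hypothetical three-row frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing V2 value
df = pd.DataFrame({"V2": [1.0, np.nan, 3.0], "Class": [0, 1, 0]})

cleaned = df.dropna()        # new DataFrame without the NaN row
original_rows = len(df)      # df itself still has all 3 rows
print(original_rows, len(cleaned))

df = df.dropna()             # reassign to keep the cleaned result
print(len(df))
```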

In [None]:
# deleting column 

del credit_card_data["V5"]
credit_card_data

In [None]:
# sort rows by V4 (ascending order by default)

credit_card_data.sort_values("V4")

In [None]:
# select rows where V4 lies between 1 and 100 (both endpoints included)

mask = credit_card_data["V4"].between(1, 100)
credit_card_data[mask]
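`Series.between` includes both endpoints by default. The `inclusive` parameter (pandas 1.3 or later) changes that behavior. Here is a sketch on a hypothetical `V4`-like Series:

```python
import pandas as pd

v4 = pd.Series([0.5, 1.0, 50.0, 100.0, 150.0])

# inclusive="both" is the default, so the endpoints 1 and 100 are kept
mask_both = v4.between(1, 100)
# inclusive="neither" excludes both endpoints
mask_open = v4.between(1, 100, inclusive="neither")

print(v4[mask_both].tolist())   # [1.0, 50.0, 100.0]
print(v4[mask_open].tolist())   # [50.0]
```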

In [None]:
credit_card_data["V4"].duplicated()    # True for each V4 value that already appeared in an earlier row

In [None]:
credit_card_data["V4"].nunique()    # number of distinct values in the V4 column
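`duplicated()` and `nunique()` answer related but different questions. `duplicated()` flags which values are repeats, and `nunique()` counts how many distinct values there are. Here is a sketch on a hypothetical Series:

```python
import pandas as pd

v4 = pd.Series([2.0, 3.5, 2.0, 7.1, 3.5])

dup_flags = v4.duplicated()   # True for every repeat after the first occurrence
n_repeats = dup_flags.sum()   # how many values are repeats
n_distinct = v4.nunique()     # how many distinct values there are
print(dup_flags.tolist(), n_repeats, n_distinct)
```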

In [None]:
credit_card_data.iloc[0]      # accessing value using numeric index

In [None]:
credit_card_data.iloc[[0,100,1000]]    # a list of positions returns a DataFrame with those rows

In [None]:
credit_card_data.sample(n = 4,axis = 1)   # 4 randomly chosen columns

In [None]:
credit_card_data.sample(frac = 0.3)    # a random 30 percent of the rows

In [None]:
credit_card_data.nsmallest(1,"V4")    # row with the smallest V4 value

In [None]:
credit_card_data.index.get_level_values(0)   # fetching index

In [None]:
credit_card_data["V6"] = credit_card_data["V6"].mul(2)   # double every value in V6
credit_card_data

In [None]:
# Adding new column 

credit_card_data["Payment"] = "1000"
credit_card_data

In [None]:
# insert a new column "Mode_" at position 4, filled with the value "online"

credit_card_data.insert(4,"Mode_","online")
credit_card_data

**DATA VISUALIZATION**

In [None]:
def countplot_data(data, feature):
    plt.figure(figsize=(5, 5))
    sns.countplot(x=feature, data=data, palette="Set1")   # countplot displays the count of each category
    plt.title("Transaction Class Distribution")
    plt.show()

def pairplot_data_grid(data, feature1, feature2, target):
    # scatter feature1 against feature2, one color per target class
    sns.FacetGrid(data, hue=target, height=6).map(plt.scatter, feature1, feature2).add_legend()
    plt.show()

countplot_data(credit_card_data, "Class")

**> The count plot shows that legit transactions vastly outnumber fraudulent ones.**

In [None]:
class_pct = credit_card_data["Class"].value_counts(normalize=True).sort_index() * 100   # percentage of each class
print(class_pct)
class_pct.plot.pie()

**> The pie chart confirms the imbalance: fraudulent transactions make up only a tiny fraction of the data.**

In [None]:
fig = plt.figure(figsize=(6, 5))
plt.title("Credit card transaction correlation plot")
corr = credit_card_data.corr(numeric_only=True)   # skips string columns such as "Payment" and "Mode_"
sns.heatmap(corr, cmap="Purples")
plt.show()
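A full heatmap is hard to read with this many columns. Ranking each feature by the strength of its correlation with `Class` gives a more direct summary. The sketch below uses synthetic data, and the columns and their relationship to `Class` are hypothetical. `numeric_only=True` (pandas 1.5 or later) skips string columns.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
cls = rng.integers(0, 2, size=n)
df = pd.DataFrame({
    "V1": cls * 2.0 + rng.normal(size=n),  # hypothetical feature tied to Class
    "V2": rng.normal(size=n),              # hypothetical unrelated feature
    "Mode_": "online",                     # non-numeric column
    "Class": cls,
})

# Correlation of every numeric column with Class, strongest first
corr_with_class = df.corr(numeric_only=True)["Class"].drop("Class")
ranked = corr_with_class.abs().sort_values(ascending=False)
print(ranked)
```

On the notebook's data, the equivalent would be `credit_card_data.corr(numeric_only=True)["Class"].drop("Class").abs().sort_values(ascending=False)`.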

In [None]:
from pandas.plotting import scatter_matrix
attributes = ["Amount","Class","V4","V7"]
scatter_matrix(credit_card_data[attributes],figsize = (7,7))

In [None]:
from mpl_toolkits.mplot3d import Axes3D   # registers the 3D projection on older Matplotlib versions
x = credit_card_data["Time"]
y = credit_card_data["Class"]
z = credit_card_data["V4"]
f = plt.figure()
ax = f.add_subplot(111, projection="3d")   # separate name so the DataFrame is not overwritten
ax.scatter(x, y, z, color="indigo")
ax.set_xlabel("Time")
ax.set_ylabel("Class")
ax.set_zlabel("V4")
plt.show()

**Summary**

**> In summary, this notebook explored the credit card transaction data, balanced the two classes by under-sampling, and trained a logistic regression model in Python to classify transactions as legit or fraudulent.**