# Credit Card Fraud Detection with Random Forest Classifier: A Machine Learning Approach

We'll build a predictive model with a Random Forest Classifier that uses an applicant's details to predict whether their credit card application is approved (the `Approved` column of the dataset). Source code is included.

## So let’s start

The first step is to get the dataset so we can train the model on it. You can download or copy the data from this [URL](https://raw.githubusercontent.com/aviralb13/git-codes/main/datas/credit.csv).

# Importing the libraries

We'll import the pandas and NumPy libraries as shown below. If they aren't installed on your system, you can install them with pip (`pip install pandas numpy`). The later steps also need scikit-learn (`pip install scikit-learn`).

```python
import pandas as pd
import numpy as np
```

# Data preparation

We'll use pandas to read the data and store it in a variable named `data`, so we don't have to fetch the file again every time we use it. The `head()` method shows the first 5 rows; to see a different number of rows, pass it as an argument, for example `data.head(10)`.

```python
URL = 'https://raw.githubusercontent.com/aviralb13/git-codes/main/datas/credit.csv'
data = pd.read_csv(URL)
data.head()
```


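| Unnamed: 0 | Gender | Age | Debt | Married | BankCustomer | Industry | Ethnicity | YearsEmployed | PriorDefault | Employed | CreditScore | DriversLicense | Citizen | ZipCode | Income | Approved |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 30.83 | 0.0 | 1 | 1 | Industrials | White | 1.25 | 1 | 1 | 1 | 0 | ByBirth | 202 | 0 | 1 |
| 1 | 0 | 58.67 | 4.46 | 1 | 1 | Materials | Black | 3.04 | 1 | 1 | 6 | 0 | ByBirth | 43 | 560 | 1 |
| 2 | 0 | 24.5 | 0.5 | 1 | 1 | Materials | Black | 1.5 | 1 | 0 | 0 | 0 | ByBirth | 280 | 824 | 1 |
| 3 | 1 | 27.83 | 1.54 | 1 | 1 | Industrials | White | 3.75 | 1 | 1 | 5 | 1 | ByBirth | 100 | 3 | 1 |
| 4 | 1 | 20.17 | 5.625 | 1 | 1 | Industrials | White | 1.71 | 1 | 0 | 0 | 0 | ByOtherMeans | 120 | 0 | 1 |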
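If you want to try `head()` and a few related checks without downloading the file, the sketch below loads the five rows shown above from a string (the leading row-number column is dropped). `SAMPLE_CSV` and `sample` are illustrative names, not part of the original notebook. `select_dtypes` is a quick way to spot which columns hold text rather than numbers.

```python
import io

import pandas as pd

# The five rows from the head() output above, pasted in as CSV so this runs
# offline. The leading row-number column is left out.
SAMPLE_CSV = """Gender,Age,Debt,Married,BankCustomer,Industry,Ethnicity,YearsEmployed,PriorDefault,Employed,CreditScore,DriversLicense,Citizen,ZipCode,Income,Approved
1,30.83,0.0,1,1,Industrials,White,1.25,1,1,1,0,ByBirth,202,0,1
0,58.67,4.46,1,1,Materials,Black,3.04,1,1,6,0,ByBirth,43,560,1
0,24.5,0.5,1,1,Materials,Black,1.5,1,0,0,0,ByBirth,280,824,1
1,27.83,1.54,1,1,Industrials,White,3.75,1,1,5,1,ByBirth,100,3,1
1,20.17,5.625,1,1,Industrials,White,1.71,1,0,0,0,ByOtherMeans,120,0,1
"""

sample = pd.read_csv(io.StringIO(SAMPLE_CSV))

print(sample.head(3))        # head(n) returns the first n rows
print(sample.shape)          # (rows, columns) -> (5, 16)
print(sample.isna().sum())   # number of missing values per column

# Columns whose dtype is not numeric, i.e. the text columns
text_cols = sample.select_dtypes(exclude="number").columns.tolist()
print(text_cols)             # ['Industry', 'Ethnicity', 'Citizen']
```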


# Defining X and Y

I have picked the columns I believe will be most useful for predicting whether an application is approved, such as gender, age and debt, and stored them in a list named `features`. Note that the text columns `Industry`, `Ethnicity` and `Citizen` are not in the list; scikit-learn's Random Forest needs numeric input, so they could not be passed in as-is. Using this list, `x` holds the selected feature columns and `y` holds the target we want to predict, the `Approved` column.

```python
features = ['Gender', 'Age', 'Debt', 'Married', 'BankCustomer', 'YearsEmployed',
            'PriorDefault', 'Employed', 'CreditScore', 'DriversLicense', 'ZipCode', 'Income']
x = data[features]
y = data['Approved']
```


These are the features I consider most relevant. If you think other columns matter, feel free to change the list.
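If you'd like to include text columns such as `Industry` or `Citizen`, one common option (not used in the original notebook) is one-hot encoding with `pd.get_dummies`, which turns each category into its own 0/1 column. This minimal sketch uses a few columns from the five sample rows; with the full dataset you would encode before calling `train_test_split` so the training and test sets end up with the same columns.

```python
import pandas as pd

# A few columns from the five sample rows shown earlier.
data = pd.DataFrame({
    "Age": [30.83, 58.67, 24.5, 27.83, 20.17],
    "Debt": [0.0, 4.46, 0.5, 1.54, 5.625],
    "Industry": ["Industrials", "Materials", "Materials", "Industrials", "Industrials"],
    "Citizen": ["ByBirth", "ByBirth", "ByBirth", "ByBirth", "ByOtherMeans"],
    "Approved": [1, 1, 1, 1, 1],
})

features = ["Age", "Debt", "Industry", "Citizen"]

# get_dummies keeps the numeric columns and replaces each listed text column
# with one 0/1 column per category, so the model only sees numbers.
x = pd.get_dummies(data[features], columns=["Industry", "Citizen"], dtype=int)
y = data["Approved"]

print(x.columns.tolist())
# ['Age', 'Debt', 'Industry_Industrials', 'Industry_Materials', 'Citizen_ByBirth', 'Citizen_ByOtherMeans']
```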

# Splitting the Dataset

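To split the dataset into a training set and a test set, we import the `train_test_split` function from scikit-learn's `model_selection` module. By default it holds out 25% of the rows for testing. Because no `random_state` is given, the rows are shuffled differently on each run, so the accuracy reported below can vary slightly between runs.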

```python
from sklearn.model_selection import train_test_split

train_x, test_x, train_y, test_y = train_test_split(x, y)
```
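Here is a sketch of the same call with the options spelled out: `test_size=0.25` (the default), `random_state=0` so every run produces the same split, and `stratify=y` so both parts keep the same share of each class. The arrays are a synthetic stand-in with 60 rows of class 0 and 40 of class 1, not the credit dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows, 60 labelled 0 and 40 labelled 1.
x = np.arange(200).reshape(100, 2)
y = np.array([0] * 60 + [1] * 40)

# test_size=0.25 holds out a quarter of the rows, random_state fixes the
# shuffle, and stratify=y keeps the 60/40 class ratio in both parts.
train_x, test_x, train_y, test_y = train_test_split(
    x, y, test_size=0.25, random_state=0, stratify=y
)

print(len(train_x), len(test_x))  # 75 25
print(test_y.sum())               # 10 rows of class 1, i.e. 40% of the test set
```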

# Model

Since each application is either approved (1) or not approved (0), this is a binary classification problem, so we'll use a Random Forest Classifier: an ensemble of decision trees whose combined output gives the final prediction. We import `RandomForestClassifier` from the `sklearn.ensemble` module and then fit it on the training data.

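```python
from sklearn.ensemble import RandomForestClassifier

# A forest of 10 decision trees; random_state=0 makes the forest reproducible
# for a given training set. (Despite its name, regressor_model is a classifier.)
regressor_model = RandomForestClassifier(n_estimators=10, random_state=0)
regressor_model.fit(train_x, train_y)
```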


With that, the model is trained and ready to make predictions.
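To see how the forest combines its trees, this sketch fits the same kind of model on synthetic data from `make_classification` (a stand-in, not the credit data). It checks that the forest keeps its `n_estimators` fitted trees in `estimators_`, and that its `predict_proba` is the average of the individual trees' probabilities. Note that scikit-learn averages probabilities rather than counting one hard vote per tree; `predict` then returns the class with the highest average.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic binary data standing in for the credit table.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# Each of the 10 trees was trained on its own bootstrap sample of the rows.
print(len(forest.estimators_))  # 10

# The forest's class probabilities are the mean of the trees' probabilities...
tree_probas = np.stack([tree.predict_proba(X) for tree in forest.estimators_])
averaged = tree_probas.mean(axis=0)
print(np.allclose(forest.predict_proba(X), averaged))  # True

# ...and predict() picks the class with the highest averaged probability.
votes = forest.classes_[averaged.argmax(axis=1)]
print((forest.predict(X) == votes).all())  # True
```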

# Accuracy

Accuracy is the fraction of predictions that match the true labels: the number of correct predictions divided by the total number of predictions. We measure it on the test set, which the model did not see during training.

```python
from sklearn.metrics import accuracy_score

# Predict on the held-out test set and compare with the true labels
predictions = regressor_model.predict(test_x)
accuracy_score(test_y, predictions)
```

0.9075144508670521

Our model reaches an accuracy of about 0.91, which means it classified roughly 91 out of every 100 test applications correctly. That is a solid result for such a simple model.

# [Source code](https://github.com/arsalkhan75/Github)