# **Credit Card Fraud Detection**

**Abstract:**


Credit card fraud is a significant concern in financial transactions, posing substantial risks to both consumers and financial institutions. With the increasing sophistication of fraudulent activities, there is a pressing need for robust detection mechanisms. In this study, we explore the application of logistic regression, a classical statistical technique, for credit card fraud detection. Leveraging transactional data and features indicative of fraudulent behavior, logistic regression models offer a promising approach to identifying fraudulent transactions efficiently and accurately. Through this research, we aim to contribute to the ongoing efforts in enhancing fraud detection systems, thereby safeguarding the integrity of financial transactions and bolstering consumer trust in electronic payment systems.

**Introduction:**


Credit card fraud has emerged as a formidable challenge in electronic transactions, causing substantial financial losses and reputational damage to businesses and consumers alike. As technology advances, so do the methods employed by fraudsters, necessitating continuous innovation in fraud detection techniques. Traditional approaches, while effective to some extent, often struggle to keep pace with the evolving nature of fraudulent activities.

In recent years, machine learning algorithms have gained traction as powerful tools for detecting fraudulent transactions. Logistic regression, a fundamental supervised learning technique, offers a computationally efficient and interpretable solution for binary classification problems, making it well suited to fraud detection tasks.

In this study, we delve into the application of logistic regression models for credit card fraud detection. By leveraging historical transactional data and engineered features indicative of fraudulent behavior, logistic regression models can learn to discern between legitimate and fraudulent transactions. The simplicity and interpretability of logistic regression models render them accessible to practitioners and analysts, facilitating the integration of fraud detection mechanisms into existing financial systems.

Through empirical evaluation and analysis, we seek to elucidate the efficacy and performance characteristics of logistic regression models in identifying fraudulent transactions. By shedding light on the strengths and limitations of logistic regression-based approaches, this research endeavors to inform the development of more robust and resilient fraud detection systems. Ultimately, the goal is to bolster consumer confidence in electronic payment systems and mitigate the adverse impacts of credit card fraud on businesses and financial institutions.

**Step 1:** Import the dependencies

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.express as px


**Step 2:** Load the Dataset to Pandas Dataframe

In [3]:
df = pd.read_csv("creditcard.csv")

**Step 3:** Print the first 5 rows of the Dataset

In [4]:
df.head(5)

,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
0,0.0,-1.359807,-0.072781,2.536347,1.378155,-0.338321,0.462388,0.239599,0.098698,0.363787,...,-0.018307,0.277838,-0.110474,0.066928,0.128539,-0.189115,0.133558,-0.021053,149.62,0
1,0.0,1.191857,0.266151,0.16648,0.448154,0.060018,-0.082361,-0.078803,0.085102,-0.255425,...,-0.225775,-0.638672,0.101288,-0.339846,0.16717,0.125895,-0.008983,0.014724,2.69,0
2,1.0,-1.358354,-1.340163,1.773209,0.37978,-0.503198,1.800499,0.791461,0.247676,-1.514654,...,0.247998,0.771679,0.909412,-0.689281,-0.327642,-0.139097,-0.055353,-0.059752,378.66,0
3,1.0,-0.966272,-0.185226,1.792993,-0.863291,-0.010309,1.247203,0.237609,0.377436,-1.387024,...,-0.1083,0.005274,-0.190321,-1.175575,0.647376,-0.221929,0.062723,0.061458,123.5,0
4,2.0,-1.158233,0.877737,1.548718,0.403034,-0.407193,0.095921,0.592941,-0.270533,0.817739,...,-0.009431,0.798278,-0.137458,0.141267,-0.20601,0.502292,0.219422,0.215153,69.99,0


**Step 4:** Print the last 5 rows of the Dataset

In [5]:
df.tail(5)

,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
284802,172786.0,-11.881118,10.071785,-9.834783,-2.066656,-5.364473,-2.606837,-4.918215,7.305334,1.914428,...,0.213454,0.111864,1.01448,-0.509348,1.436807,0.250034,0.943651,0.823731,0.77,0
284803,172787.0,-0.732789,-0.05508,2.03503,-0.738589,0.868229,1.058415,0.02433,0.294869,0.5848,...,0.214205,0.924384,0.012463,-1.016226,-0.606624,-0.395255,0.068472,-0.053527,24.79,0
284804,172788.0,1.919565,-0.301254,-3.24964,-0.557828,2.630515,3.03126,-0.296827,0.708417,0.432454,...,0.232045,0.578229,-0.037501,0.640134,0.265745,-0.087371,0.004455,-0.026561,67.88,0
284805,172788.0,-0.24044,0.530483,0.70251,0.689799,-0.377961,0.623708,-0.68618,0.679145,0.392087,...,0.265245,0.800049,-0.163298,0.123205,-0.569159,0.546668,0.108821,0.104533,10.0,0
284806,172792.0,-0.533413,-0.189733,0.703337,-0.506271,-0.012546,-0.649617,1.577006,-0.41465,0.48618,...,0.261057,0.643078,0.376777,0.008797,-0.473649,-0.818267,-0.002415,0.013649,217.0,0


**Step 5:** Print the dataset information

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 284807 entries, 0 to 284806
Data columns (total 31 columns):
 #   Column  Non-Null Count   Dtype  
---  ------  --------------   -----  
 0   Time    284807 non-null  float64
 1   V1      284807 non-null  float64
 2   V2      284807 non-null  float64
 3   V3      284807 non-null  float64
 4   V4      284807 non-null  float64
 5   V5      284807 non-null  float64
 6   V6      284807 non-null  float64
 7   V7      284807 non-null  float64
 8   V8      284807 non-null  float64
 9   V9      284807 non-null  float64
 10  V10     284807 non-null  float64
 11  V11     284807 non-null  float64
 12  V12     284807 non-null  float64
 13  V13     284807 non-null  float64
 14  V14     284807 non-null  float64
 15  V15     284807 non-null  float64
 16  V16     284807 non-null  float64
 17  V17     284807 non-null  float64
 18  V18     284807 non-null  float64
 19  V19     284807 non-null  float64
 20  V20     284807 non-null  float64
 21  V21     28

**Step 6:** Check the number of missing values in each column

In [7]:
df.isnull().sum()

Time      0
V1        0
V2        0
V3        0
V4        0
V5        0
V6        0
V7        0
V8        0
V9        0
V10       0
V11       0
V12       0
V13       0
V14       0
V15       0
V16       0
V17       0
V18       0
V19       0
V20       0
V21       0
V22       0
V23       0
V24       0
V25       0
V26       0
V27       0
V28       0
Amount    0
Class     0
dtype: int64

**Step 7:** How many legitimate and fraudulent transactions does the dataset contain?

In [8]:
px.histogram(data_frame=df,x="Class",color="Class")

In [9]:
df.Class.value_counts()

Class
0    284315
1       492
Name: count, dtype: int64

The counts show 284315 legitimate transactions and only 492 fraudulent ones, so the dataset is highly imbalanced. To handle this, first separate the legitimate and fraudulent transactions.
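To make the imbalance concrete, here is a small sketch that uses only the two class counts printed above. The variable names are illustrative and not part of the notebook. It also shows why plain accuracy would be misleading on the full dataset: a model that labels every transaction as legit would already score very high.

```python
# Class counts copied from the value_counts() output above.
n_legit = 284315
n_fraud = 492
n_total = n_legit + n_fraud

# Share of all transactions that are fraudulent.
fraud_share = n_fraud / n_total

# Accuracy of a trivial model that labels every transaction as legit.
baseline_accuracy = n_legit / n_total

print(f"fraud share: {fraud_share:.4%}")
print(f"always-legit accuracy: {baseline_accuracy:.4%}")
```

Fraud makes up well under 1% of the rows, which is the motivation for balancing the classes before training.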

**Step 8:** Separate the legitimate transactions (legit variable) from the fraudulent ones (fraud variable)

In [10]:
legit = df[df["Class"]==0]
fraud = df[df["Class"]==1]

In [11]:
legit

,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
0,0.0,-1.359807,-0.072781,2.536347,1.378155,-0.338321,0.462388,0.239599,0.098698,0.363787,...,-0.018307,0.277838,-0.110474,0.066928,0.128539,-0.189115,0.133558,-0.021053,149.62,0
1,0.0,1.191857,0.266151,0.166480,0.448154,0.060018,-0.082361,-0.078803,0.085102,-0.255425,...,-0.225775,-0.638672,0.101288,-0.339846,0.167170,0.125895,-0.008983,0.014724,2.69,0
2,1.0,-1.358354,-1.340163,1.773209,0.379780,-0.503198,1.800499,0.791461,0.247676,-1.514654,...,0.247998,0.771679,0.909412,-0.689281,-0.327642,-0.139097,-0.055353,-0.059752,378.66,0
3,1.0,-0.966272,-0.185226,1.792993,-0.863291,-0.010309,1.247203,0.237609,0.377436,-1.387024,...,-0.108300,0.005274,-0.190321,-1.175575,0.647376,-0.221929,0.062723,0.061458,123.50,0
4,2.0,-1.158233,0.877737,1.548718,0.403034,-0.407193,0.095921,0.592941,-0.270533,0.817739,...,-0.009431,0.798278,-0.137458,0.141267,-0.206010,0.502292,0.219422,0.215153,69.99,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
284802,172786.0,-11.881118,10.071785,-9.834783,-2.066656,-5.364473,-2.606837,-4.918215,7.305334,1.914428,...,0.213454,0.111864,1.014480,-0.509348,1.436807,0.250034,0.943651,0.823731,0.77,0
284803,172787.0,-0.732789,-0.055080,2.035030,-0.738589,0.868229,1.058415,0.024330,0.294869,0.584800,...,0.214205,0.924384,0.012463,-1.016226,-0.606624,-0.395255,0.068472,-0.053527,24.79,0
284804,172788.0,1.919565,-0.301254,-3.249640,-0.557828,2.630515,3.031260,-0.296827,0.708417,0.432454,...,0.232045,0.578229,-0.037501,0.640134,0.265745,-0.087371,0.004455,-0.026561,67.88,0
284805,172788.0,-0.240440,0.530483,0.702510,0.689799,-0.377961,0.623708,-0.686180,0.679145,0.392087,...,0.265245,0.800049,-0.163298,0.123205,-0.569159,0.546668,0.108821,0.104533,10.00,0


**Step 9:** Print the shape of the above 2 variables

In [12]:
print(legit.shape)
print(fraud.shape)

(284315, 31)
(492, 31)


**Step 10:** Find the statistical measures of the legit and fraud data (describe the data)

In [13]:
legit.describe()

,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
count,284315.0,284315.0,284315.0,284315.0,284315.0,284315.0,284315.0,284315.0,284315.0,284315.0,...,284315.0,284315.0,284315.0,284315.0,284315.0,284315.0,284315.0,284315.0,284315.0,284315.0
mean,94838.202258,0.008258,-0.006271,0.012171,-0.00786,0.005453,0.002419,0.009637,-0.000987,0.004467,...,-0.001235,-2.4e-05,7e-05,0.000182,-7.2e-05,-8.9e-05,-0.000295,-0.000131,88.291022,0.0
std,47484.015786,1.929814,1.636146,1.459429,1.399333,1.356952,1.329913,1.178812,1.161283,1.089372,...,0.716743,0.723668,0.621541,0.605776,0.520673,0.482241,0.399847,0.32957,250.105092,0.0
min,0.0,-56.40751,-72.715728,-48.325589,-5.683171,-113.743307,-26.160506,-31.764946,-73.216718,-6.29073,...,-34.830382,-10.933144,-44.807735,-2.836627,-10.295397,-2.604551,-22.565679,-15.430084,0.0,0.0
25%,54230.0,-0.917544,-0.599473,-0.884541,-0.850077,-0.689398,-0.766847,-0.551442,-0.208633,-0.640412,...,-0.228509,-0.542403,-0.161702,-0.354425,-0.317145,-0.327074,-0.070852,-0.05295,5.65,0.0
50%,84711.0,0.020023,0.06407,0.182158,-0.022405,-0.053457,-0.273123,0.041138,0.022041,-0.049964,...,-0.029821,0.006736,-0.011147,0.041082,0.016417,-0.052227,0.00123,0.011199,22.0,0.0
75%,139333.0,1.316218,0.800446,1.028372,0.737624,0.612181,0.399619,0.571019,0.3262,0.59823,...,0.185626,0.528407,0.147522,0.439869,0.350594,0.240671,0.090573,0.077962,77.05,0.0
max,172792.0,2.45493,18.902453,9.382558,16.875344,34.801666,73.301626,120.589494,18.709255,15.594995,...,22.614889,10.50309,22.528412,4.584549,7.519589,3.517346,31.612198,33.847808,25691.16,0.0


In [14]:
print("statistical representation for legit")
print("mean",legit['Amount'].mean())
print("median",legit['Amount'].median())
print("standard deviation",legit['Amount'].std())


statistical representation for legit
mean 88.29102242231328
median 22.0
standard deviation 250.10509222589243


In [15]:
print("statistical representation for fraud")
print("mean",fraud['Amount'].mean())
print("median",fraud['Amount'].median())
print("standard deviation",fraud['Amount'].std())

statistical representation for fraud
mean 122.21132113821139
median 9.25
standard deviation 256.6832882977121


**Step 11:** Compare the values for the above 2 classes (legit and fraudulent transactions) -- use the groupby function here

In [16]:
grouped_Data = df.groupby("Class")["Amount"].describe()
grouped_Data

Class,count,mean,std,min,25%,50%,75%,max
0,284315.0,88.291022,250.105092,0.0,5.65,22.0,77.05,25691.16
1,492.0,122.211321,256.683288,0.0,1.0,9.25,105.89,2125.87


In [17]:
# for legit
grouped_Data.loc[0]

count    284315.000000
mean         88.291022
std         250.105092
min           0.000000
25%           5.650000
50%          22.000000
75%          77.050000
max       25691.160000
Name: 0, dtype: float64

In [18]:
# for fraud
grouped_Data.loc[1]

count     492.000000
mean      122.211321
std       256.683288
min         0.000000
25%         1.000000
50%         9.250000
75%       105.890000
max      2125.870000
Name: 1, dtype: float64

**Step 12:** Build a sample dataset containing a similar distribution of normal transactions and fraudulent transactions ---> basically make a dataset with 492 of each type.

In [49]:
legit_sample = legit.sample(n=492)
fraud_sample = fraud
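The sampling step above draws a different set of legit rows on every run, so later results will vary slightly between runs. One possible way to make the undersampling repeatable is to pass `random_state` to `sample`. The sketch below uses a small synthetic frame (`toy`, an assumption for illustration, not the real data) so it can run on its own.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data: 1000 legit rows, 20 fraud rows.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "Amount": rng.exponential(scale=80.0, size=1020),
    "Class": [0] * 1000 + [1] * 20,
})

toy_legit = toy[toy["Class"] == 0]
toy_fraud = toy[toy["Class"] == 1]

# Fixing random_state makes the sampled rows identical on every run.
toy_legit_sample = toy_legit.sample(n=len(toy_fraud), random_state=42)
balanced = pd.concat([toy_legit_sample, toy_fraud])

print(balanced["Class"].value_counts())
```

Using `len(toy_fraud)` instead of a hard-coded number keeps the two classes the same size even if the fraud count changes.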


In [50]:
fraud_sample

,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
541,406.0,-2.312227,1.951992,-1.609851,3.997906,-0.522188,-1.426545,-2.537387,1.391657,-2.770089,...,0.517232,-0.035049,-0.465211,0.320198,0.044519,0.177840,0.261145,-0.143276,0.00,1
623,472.0,-3.043541,-3.157307,1.088463,2.288644,1.359805,-1.064823,0.325574,-0.067794,-0.270953,...,0.661696,0.435477,1.375966,-0.293803,0.279798,-0.145362,-0.252773,0.035764,529.00,1
4920,4462.0,-2.303350,1.759247,-0.359745,2.330243,-0.821628,-0.075788,0.562320,-0.399147,-0.238253,...,-0.294166,-0.932391,0.172726,-0.087330,-0.156114,-0.542628,0.039566,-0.153029,239.93,1
6108,6986.0,-4.397974,1.358367,-2.592844,2.679787,-1.128131,-1.706536,-3.496197,-0.248778,-0.247768,...,0.573574,0.176968,-0.436207,-0.053502,0.252405,-0.657488,-0.827136,0.849573,59.00,1
6329,7519.0,1.234235,3.019740,-4.304597,4.732795,3.624201,-1.357746,1.713445,-0.496358,-1.282858,...,-0.379068,-0.704181,-0.656805,-1.632653,1.488901,0.566797,-0.010016,0.146793,1.00,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
279863,169142.0,-1.927883,1.125653,-4.518331,1.749293,-1.566487,-2.010494,-0.882850,0.697211,-2.064945,...,0.778584,-0.319189,0.639419,-0.294885,0.537503,0.788395,0.292680,0.147968,390.00,1
280143,169347.0,1.378559,1.289381,-5.004247,1.411850,0.442581,-1.326536,-1.413170,0.248525,-1.127396,...,0.370612,0.028234,-0.145640,-0.081049,0.521875,0.739467,0.389152,0.186637,0.76,1
280149,169351.0,-0.676143,1.126366,-2.213700,0.468308,-1.120541,-0.003346,-2.234739,1.210158,-0.652250,...,0.751826,0.834108,0.190944,0.032070,-0.739695,0.471111,0.385107,0.194361,77.89,1
281144,169966.0,-3.113832,0.585864,-5.399730,1.817092,-0.840618,-2.943548,-2.208002,1.058733,-1.632333,...,0.583276,-0.269209,-0.456108,-0.183659,-0.328168,0.606116,0.884876,-0.253700,245.00,1


**Step 13:** Join the legit and fraud 492 datasets ---> use concat

In [51]:
new_data = pd.concat([legit_sample,fraud_sample])

**Step 14:** Print the head and tail of new dataset

In [52]:
new_data.head()

,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
213182,139142.0,-6.354478,-3.066068,-3.120387,0.529458,-2.948709,3.091657,2.662565,-0.5279,1.004195,...,-0.403143,-1.192606,-2.356835,-1.333487,-1.531992,-0.144886,1.499569,-0.183228,907.9,0
7227,9519.0,-1.265655,1.757645,1.545024,0.878105,0.346507,-0.484966,1.045463,-0.836756,1.535453,...,0.016571,0.613666,-0.087586,0.575519,-0.41077,-0.505892,-0.793346,-0.106087,4.23,0
136649,81806.0,-0.598107,0.154937,1.991921,1.294947,-0.988058,0.132521,0.515741,0.113659,0.107403,...,0.355465,0.86415,0.269727,0.364941,-0.421092,-0.258295,0.167898,0.196077,174.6,0
211658,138505.0,0.0932,-0.949141,-0.295676,-3.64786,1.965121,3.637804,-0.777691,0.740385,-1.93662,...,-0.094429,0.231508,-0.025945,0.682738,-0.653426,-0.274389,0.061642,-0.033797,5.0,0
25690,33725.0,1.05307,-0.022043,1.239622,1.104271,-0.671484,0.389423,-0.663045,0.292735,0.151673,...,0.216407,0.64557,-0.018519,0.021997,0.248889,-0.338195,0.076193,0.031562,25.0,0


In [124]:
new_data.tail()

,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
279863,169142.0,-1.927883,1.125653,-4.518331,1.749293,-1.566487,-2.010494,-0.88285,0.697211,-2.064945,...,0.778584,-0.319189,0.639419,-0.294885,0.537503,0.788395,0.29268,0.147968,390.0,1
280143,169347.0,1.378559,1.289381,-5.004247,1.41185,0.442581,-1.326536,-1.41317,0.248525,-1.127396,...,0.370612,0.028234,-0.14564,-0.081049,0.521875,0.739467,0.389152,0.186637,0.76,1
280149,169351.0,-0.676143,1.126366,-2.2137,0.468308,-1.120541,-0.003346,-2.234739,1.210158,-0.65225,...,0.751826,0.834108,0.190944,0.03207,-0.739695,0.471111,0.385107,0.194361,77.89,1
281144,169966.0,-3.113832,0.585864,-5.39973,1.817092,-0.840618,-2.943548,-2.208002,1.058733,-1.632333,...,0.583276,-0.269209,-0.456108,-0.183659,-0.328168,0.606116,0.884876,-0.2537,245.0,1
281674,170348.0,1.991976,0.158476,-2.583441,0.40867,1.151147,-0.096695,0.22305,-0.068384,0.577829,...,-0.16435,-0.295135,-0.072173,-0.450261,0.313267,-0.289617,0.002988,-0.015309,42.53,1


**Step 15:** Split the data into X and y

In [64]:
X = new_data.drop(["Class","Time"],axis=1)
y = new_data.Class

In [65]:
y

213182    0
7227      0
136649    0
211658    0
25690     0
         ..
279863    1
280143    1
280149    1
281144    1
281674    1
Name: Class, Length: 984, dtype: int64

**Step 16:** Split into training and testing data using train_test_split

In [66]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2)
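The split above is random and unseeded, so the class mix in the test set and the reported accuracy can change from run to run. A possible variant, sketched here on synthetic arrays (`X_toy` and `y_toy` are placeholders, not the notebook's data), passes `stratify` to keep the class ratio equal in both parts and `random_state` to make the split repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic balanced data: 100 rows, 3 features, 50 of each class.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(100, 3))
y_toy = np.array([0] * 50 + [1] * 50)

# stratify keeps the 50/50 class ratio in both the train and test parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, stratify=y_toy, random_state=2
)

print("train class counts:", np.bincount(y_tr))
print("test class counts:", np.bincount(y_te))
```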

**Step 17:** Print the shape of each of the variables mentioned in step 16.

In [67]:
print("X_train shape:",X_train.shape)
print("y_train shape:",y_train.shape)

print("X_test shape:",X_test.shape)
print("y_test shape:",y_test.shape)



X_train shape: (787, 29)
y_train shape: (787,)
X_test shape: (197, 29)
y_test shape: (197,)


**Step 18:** Create a Logistic Regression Model

In [68]:
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()

**Step 19:** Train the Logistic Regression Model

In [91]:
model.fit(X_train,y_train)


lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
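The warning above suggests two remedies: scale the features or raise `max_iter`. A minimal sketch of both, assuming a scikit-learn pipeline is acceptable here, is shown below on synthetic data in which one column has a much larger scale than the others, roughly as Amount does next to the V columns. The data and the name `scaled_model` are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with one large-scale column, mimicking an unscaled Amount.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 4))
X_toy[:, 3] = X_toy[:, 3] * 250.0 + 100.0
y_toy = (X_toy[:, 0] + X_toy[:, 1] > 0).astype(int)

# Scale features first, then fit with a higher iteration cap.
scaled_model = make_pipeline(
    StandardScaler(),
    LogisticRegression(max_iter=1000),
)
scaled_model.fit(X_toy, y_toy)

print("training accuracy:", scaled_model.score(X_toy, y_toy))
```

Because the scaler sits inside the pipeline, it is fitted on the training data only and then applied automatically whenever `predict` is called.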



**Step 20:** Make predictions using the Model

In [73]:

y_train_pred = model.predict(X_train)
y_test_pred = model.predict(X_test)

In [86]:
print(y_test,y_test_pred)


191583    0
141260    1
79525     1
270300    0
6899      1
         ..
138471    0
15451     1
192588    0
218912    0
224199    0
Name: Class, Length: 197, dtype: int64 [0 1 1 0 1 1 0 1 0 1 0 0 0 1 0 0 0 0 1 1 0 1 0 1 0 1 1 1 0 1 1 1 0 0 0 1 0
 0 0 1 1 1 0 0 0 1 0 1 1 1 1 0 0 0 0 0 1 0 0 0 0 1 0 1 0 0 1 1 0 0 1 1 0 0
 1 0 1 1 0 1 0 0 0 0 1 0 1 0 1 0 1 0 1 0 1 1 1 0 1 1 0 1 0 0 0 1 1 1 0 1 1
 1 0 1 0 1 0 1 1 1 0 1 1 0 0 1 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 1 1 0 1 0
 0 1 1 0 0 1 1 0 0 0 1 0 1 1 0 0 1 0 1 1 0 0 1 1 0 0 1 0 0 1 0 0 1 0 1 0 1
 0 1 1 0 0 1 0 0 1 0 1 0]


**Step 21:** Find the accuracy of the Model on the training data

In [79]:
from sklearn.metrics import accuracy_score
train_accuracy = accuracy_score(y_train,y_train_pred)
train_accuracy

0.9440914866581956

**Step 22:** Find the accuracy of the Model on the testing data

In [81]:
test_accuracy = accuracy_score(y_test,y_test_pred)
test_accuracy

0.9390862944162437
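Accuracy alone does not say which kind of mistake the model makes. For fraud detection it is often useful to look at the confusion matrix, precision, and recall as well. The sketch below uses ten hypothetical labels (not the notebook's predictions) to show the calls; the same functions could be applied to `y_test` and `y_test_pred`.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical labels for ten transactions (1 = fraud, 0 = legit).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)

# Precision: of the transactions flagged as fraud, how many really were fraud.
precision = precision_score(y_true, y_pred)
# Recall: of the real fraud cases, how many the model caught.
recall = recall_score(y_true, y_pred)

print(cm)
print("precision:", precision)
print("recall:", recall)
```

In this toy case one legit transaction is flagged as fraud and one fraud is missed, so precision and recall are both 0.75.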

In [119]:
X_train.tail(2)

,V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,...,V20,V21,V22,V23,V24,V25,V26,V27,V28,Amount
192584,-2.434004,3.225947,-6.596282,3.593161,-1.079452,-1.739741,-0.04742,0.301424,-1.779434,-5.836453,...,-0.280533,-0.035491,-0.419178,0.157436,-0.714849,0.468859,-0.348522,0.420036,-0.327643,362.55
112915,-1.358142,0.62331,2.19977,1.308244,-0.684154,0.361441,-0.063951,0.639456,0.312221,-1.061337,...,-0.156463,-0.204972,-0.339457,0.035416,0.631324,-0.120482,-0.604806,-0.053873,0.057133,49.99


**Step 23:** Create a dataset on your own for prediction and predict the values using your trained model.

In [120]:
x_new = pd.DataFrame({'V1':-1.739741, 'V2':4.658, 'V3':8.658, 'V4':-3.965, 'V5':-9.365, 'V6':-9.95, 'V7':6.336, 'V8':-5.66, 'V9':-6.66, 'V10':-9.856, 'V11':3.256,
       'V12':8.96, 'V13':8.33, 'V14':0.9656, 'V15':0.2599, 'V16':-0.2659, 'V17':0.2569, 'V18':-0.264864, 'V19':-0.269568, 'V20':0.864521, 'V21':0.02187,
       'V22':0.25489, 'V23':0.2648, 'V24':-0.9656, 'V25':1.00564, 'V26':1.5568, 'V27':0.2215, 'V28':-2.0326,'Amount':362.55},index=[1])
    

In [121]:
x_new

,V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,...,V20,V21,V22,V23,V24,V25,V26,V27,V28,Amount
1,-1.739741,4.658,8.658,-3.965,-9.365,-9.95,6.336,-5.66,-6.66,-9.856,...,0.864521,0.02187,0.25489,0.2648,-0.9656,1.00564,1.5568,0.2215,-2.0326,362.55


In [122]:
y_new_pred = model.predict(x_new)

In [123]:
y_new_pred

array([0], dtype=int64)

In [126]:
x_new1 = pd.DataFrame({'V1':1.991976, 'V2':0.158476, 'V3':-2.583441, 'V4':-0.408670, 'V5':0.408670, 'V6':-0.096695, 'V7':0.223050, 'V8':-0.068384, 'V9':0.577829, 'V10':-9.856, 'V11':3.256,
       'V12':8.96, 'V13':8.33, 'V14':0.9656, 'V15':0.2599, 'V16':-0.2659, 'V17':0.002988, 'V18':-0.015309, 'V19':-0.269568, 'V20':0.864521, 'V21':0.02187,
       'V22':-0.164350, 'V23':-0.295135, 'V24':-0.072173, 'V25':1.00564, 'V26':-0.450261, 'V27':0.313267, 'V28':-0.289617,'Amount':242.53},index=[1])

In [128]:
model.predict(x_new1)

array([0], dtype=int64)
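`predict` returns only the class label. To see how confident the model is about a hand-made row, `predict_proba` can be used instead. The sketch below trains a small model on synthetic two-column data (`toy_model` and its columns are assumptions for illustration) and scores a single new row.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Synthetic training data: the class depends only on the sign of V1.
rng = np.random.default_rng(0)
X_toy = pd.DataFrame(rng.normal(size=(100, 2)), columns=["V1", "V2"])
y_toy = (X_toy["V1"] > 0).astype(int)

toy_model = LogisticRegression().fit(X_toy, y_toy)

# One new row; columns must match the names and order used in training.
x_one = pd.DataFrame({"V1": [2.0], "V2": [0.0]})

# Shape (1, 2): [P(class 0), P(class 1)] for the single row.
proba = toy_model.predict_proba(x_one)

print("predicted class:", toy_model.predict(x_one)[0])
print("probability of class 1:", proba[0, 1])
```

A probability close to 0.5 would indicate that the prediction for that row is uncertain, which a bare 0 or 1 does not reveal.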

Congratulations, you have completed 75% of your project. Let ma'am know once you are done up to here. Now it's time to visualize your insights in Power BI.