# Credit Card Fraud Detection Project Notebook

## By Eng. Ramy Gendy

<a id='Intro'></a>

## Introduction

> Credit card companies must be able to recognize fraudulent transactions so that customers are not charged for items they did not purchase. I will use several predictive models and compare how accurately each one detects whether a transaction is a normal payment or fraud.

## Table of Contents:
 * <a href="#Intro">Introduction.</a>
 * <a href="#Investigation Overview">Investigation Overview.</a>
 * <a href="#Dataset Overview & Understanding">Dataset Overview & Understanding.</a>
 * <a href="#Data Preprocessing">Data Preprocessing:</a>
     * Apply Feature Engineering and Extraction:
       - Domain knowledge features.
       - Apply string operations.
       - Work with Text.
     * Apply Feature Transformations: 
       - Data Cleaning.
       - Work with Missing data.
       - Work with Categorical data.
 * <a href="#Exploratory Data Analysis">Exploratory Data Analysis</a>
 * <a href="#Conclusion">Conclusion</a>
 * <a href="#References">References</a>

<a id='Investigation Overview'></a>
## Investigation Overview

> In this project, I will analyze the `Credit Card Dataset`. I will select the variables related to the target (`Class`) for analysis. I will start with data wrangling, then move on to EDA, using different types of charts to explore relationships among variables and to pose and answer the questions below.

 **Questions:**

<a href="#01">01. ?</a>

<a href="#02">02. ?</a>

<a href="#03">03. ?</a>

<a href="#04">04. ?</a>

<a href="#05">05. ?</a>

<a href="#06">06. ?</a>

<a href="#0708">07. ?</a>

<a href="#0708">08. ?</a>

<a href="#09">09. ?</a>

<a href="#10">10. ?</a>

<a href="#11">11. ?</a>

<a href="#12">12. ?</a>

<a href="#13">13. ?</a>


<a id='Dataset Overview & Understanding'></a>
## Dataset Overview & Understanding

> The best way to understand the challenges underlying the design of a credit card fraud detection (CCFD) system is to design one. This notebook shows an implementation of a CCFD model and covers the main steps that need to be considered, starting with downloading the data from Kaggle and exploring it as follows.

### Dataset & Libraries preparation

> * The dataset can be downloaded directly from [kaggle](https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud/download?datasetVersionNumber=3) or through the Kaggle API, which requires installing the `kaggle` package.
* **_You can skip this part if you are working on a local machine or have already uploaded the dataset manually._**

In [None]:
# Install the Kaggle package
# `&> /dev/null` discards both stdout and stderr, so error messages are hidden too;
# remove it if the installation fails and you need to see why.
!pip install kaggle &> /dev/null

In [None]:
# create the .kaggle directory and an empty kaggle.json file
!mkdir -p /root/.kaggle
!touch /root/.kaggle/kaggle.json
!chmod 600 /root/.kaggle/kaggle.json

In [None]:
# Fill in the username and key from your Kaggle account's API token file
import json
kaggle_username = "Your_Username"
kaggle_key = "Your_Kaggle_API_Token"

# Save the API token to the kaggle.json file
with open("/root/.kaggle/kaggle.json", "w") as f:
    json.dump({"username": kaggle_username, "key": kaggle_key}, f)
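
The cells above hard-code `/root/.kaggle`, which suits a hosted environment running as root. On a local machine the Kaggle client normally reads `~/.kaggle/kaggle.json` instead. A minimal sketch of the same steps for an arbitrary directory follows; `write_kaggle_credentials` is a hypothetical helper written for illustration, not part of the `kaggle` package.

```python
import json
import os
import stat
from pathlib import Path


def write_kaggle_credentials(username, key, config_dir=None):
    """Write kaggle.json into config_dir (default: ~/.kaggle) and return its path.

    Hypothetical helper for illustration; not part of the kaggle package.
    """
    config_dir = Path(config_dir) if config_dir is not None else Path.home() / ".kaggle"
    config_dir.mkdir(parents=True, exist_ok=True)
    token_path = config_dir / "kaggle.json"
    with open(token_path, "w") as f:
        json.dump({"username": username, "key": key}, f)
    # Equivalent of `chmod 600` on POSIX systems: owner read/write only
    os.chmod(token_path, stat.S_IRUSR | stat.S_IWUSR)
    return token_path
```

Calling `write_kaggle_credentials(kaggle_username, kaggle_key)` with no directory argument would target the current user's home directory.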

In [None]:
# Download the dataset. It arrives as a .zip file, so it must be unzipped as well.
!kaggle datasets download -d mlg-ulb/creditcardfraud

Downloading creditcardfraud.zip to /content
 97% 64.0M/66.0M [00:00<00:00, 84.6MB/s]
100% 66.0M/66.0M [00:00<00:00, 80.5MB/s]


In [None]:
# If the archive was already extracted, the -o option overwrites the existing files without prompting
!unzip -o creditcardfraud.zip

Archive:  creditcardfraud.zip
  inflating: creditcard.csv          
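
Where the `unzip` command is not available, the extraction step can be approximated with Python's standard `zipfile` module. This is a sketch under that assumption; `extract_archive` is a hypothetical helper name, and like `unzip -o` it overwrites files that already exist.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir="."):
    """Extract every member of zip_path into dest_dir, overwriting existing files.

    Returns the list of member names stored in the archive.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()
```

For the archive downloaded above, `extract_archive("creditcardfraud.zip")` would be the corresponding call.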


### Notebook Settings

> Load the required libraries and configure warnings and display settings.

In [None]:
# Import libraries
# NumPy: array operations and numerical calculations
import numpy as np
# pandas: loading the dataset and manipulating tabular data
import pandas as pd
# Matplotlib: plotting graphs
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
from IPython.core.pylabtools import figsize
from matplotlib import rcParams
# Default figure size in inches (width, height)
rcParams['figure.figsize'] = 12,5
# seaborn: statistical plots
import seaborn as sns

In [None]:
# Suppress all warnings so they are never printed
import warnings
warnings.filterwarnings('ignore')
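
`warnings.filterwarnings('ignore')` hides every warning for the rest of the session, including ones that may point to a real problem. A narrower alternative, sketched here with the standard library only, silences one warning category inside one block. `noisy_step` is a hypothetical stand-in for a library call that warns.

```python
import warnings


def noisy_step():
    """Hypothetical stand-in for a library call that emits a FutureWarning."""
    warnings.warn("this behaviour will change", FutureWarning)
    return 42


# Suppress only FutureWarning, and only inside this block;
# the previous filter state is restored when the block exits.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=FutureWarning)
    result = noisy_step()
```

Outside the `with` block, the same call would report its warning again as usual.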

In [None]:
# Display floats rounded to two decimal places (uncomment to enable)
#pd.options.display.float_format = "{:,.2f}".format
# Display settings: show all columns, full column width, and all rows (uncomment to enable)
#pd.set_option("display.max_columns", None)
#pd.set_option('display.max_colwidth', None)
#pd.set_option('display.max_rows', None)
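
The commented-out options above would change the display for the whole notebook. As a sketch of a more local approach, `pd.option_context` applies the same settings to a single block and resets them afterwards. The small frame below uses made-up values, not rows from the dataset.

```python
import pandas as pd

# Illustrative values only
sample = pd.DataFrame({"Amount": [1234.5678, 2.69], "Class": [0, 0]})

# Options apply only inside the `with` block and are reset afterwards
with pd.option_context("display.float_format", "{:,.2f}".format,
                       "display.max_columns", None):
    rendered = sample.to_string()

print(rendered)
```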

### Data Exploration

> Read the data, display it with `head()` or `tail()`, and explore it with `describe()`, `info()`, `unique()`, and `value_counts()`.
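
The calls named above can be tried on a small synthetic frame before running them on the full dataset. This is only a sketch: the frame `toy` and its values are made up and merely mimic the `Time`, `Amount`, and `Class` columns.

```python
import pandas as pd

# Tiny synthetic frame with the same kind of columns (values are made up)
toy = pd.DataFrame({
    "Time": [0.0, 1.0, 2.0, 3.0, 4.0],
    "Amount": [10.0, 250.0, 3.5, 99.9, 42.0],
    "Class": [0, 0, 1, 0, 0],
})

print(toy.head(2))              # first two rows
print(toy.tail(2))              # last two rows
toy.info()                      # dtypes and non-null counts (prints directly)
print(toy.describe())           # count, mean, std, min, quartiles, max
print(toy["Class"].unique())    # distinct labels in the target column

class_counts = toy["Class"].value_counts()  # number of rows per label
print(class_counts)
```

For a fraud dataset, `value_counts()` on the target column is the quickest way to see how many rows carry each label.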

In [None]:
# Read dataset
df = pd.read_csv('creditcard.csv')

In [None]:
# view dataset
df.head()

| | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | ... | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | Amount | Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | -1.359807 | -0.072781 | 2.536347 | 1.378155 | -0.338321 | 0.462388 | 0.239599 | 0.098698 | 0.363787 | ... | -0.018307 | 0.277838 | -0.110474 | 0.066928 | 0.128539 | -0.189115 | 0.133558 | -0.021053 | 149.62 | 0 |
| 1 | 0.0 | 1.191857 | 0.266151 | 0.16648 | 0.448154 | 0.060018 | -0.082361 | -0.078803 | 0.085102 | -0.255425 | ... | -0.225775 | -0.638672 | 0.101288 | -0.339846 | 0.16717 | 0.125895 | -0.008983 | 0.014724 | 2.69 | 0 |
| 2 | 1.0 | -1.358354 | -1.340163 | 1.773209 | 0.37978 | -0.503198 | 1.800499 | 0.791461 | 0.247676 | -1.514654 | ... | 0.247998 | 0.771679 | 0.909412 | -0.689281 | -0.327642 | -0.139097 | -0.055353 | -0.059752 | 378.66 | 0 |
| 3 | 1.0 | -0.966272 | -0.185226 | 1.792993 | -0.863291 | -0.010309 | 1.247203 | 0.237609 | 0.377436 | -1.387024 | ... | -0.1083 | 0.005274 | -0.190321 | -1.175575 | 0.647376 | -0.221929 | 0.062723 | 0.061458 | 123.5 | 0 |
| 4 | 2.0 | -1.158233 | 0.877737 | 1.548718 | 0.403034 | -0.407193 | 0.095921 | 0.592941 | -0.270533 | 0.817739 | ... | -0.009431 | 0.798278 | -0.137458 | 0.141267 | -0.20601 | 0.502292 | 0.219422 | 0.215153 | 69.99 | 0 |
