# <a id='toc1_'></a>[Credit Card Fraud](#toc0_)

### <a id='toc1_1_1_'></a>[Introduction](#toc0_)

This dataset consists of credit card transactions in the western United States. It includes information about each transaction, including customer details, the merchant and category of purchase, and whether or not the transaction was fraudulent.

This analysis uses a combination of SQL and Tableau. Python will be used in the future to predict whether a transaction will be fraudulent.


**Table of contents**<a id='toc0_'></a>    
- [Credit Card Fraud](#toc1_)    
    - [Introduction](#toc1_1_1_)    
    - [Convert CSV to .db file](#toc1_1_2_)    
    - [Load the SQL Extension and Database](#toc1_1_3_)    
    - [Data Table](#toc1_1_4_)    
      - [Data Dictionary](#toc1_1_4_1_)    
    - [Q1: Class Distribution](#toc1_1_5_)    
    - [Q2: Fraud Frequency by Merchant](#toc1_1_6_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

### <a id='toc1_1_2_'></a>[Convert CSV to .db file](#toc0_)

In [None]:
## Code block to convert csv file to .db using pandas

import pandas as pd
import sqlite3

# Load the CSV into a Pandas DataFrame
df = pd.read_csv('credit_card_fraud.csv')

# Create an SQLite database (or connect to an existing one)
conn = sqlite3.connect('credit_card_fraud.db')

# Write the data from the DataFrame to the `transactions` table
# (the table name used by every query below)
df.to_sql('transactions', conn, if_exists='replace', index=False)

# Close the connection
conn.close()
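
The queries later in this notebook read from a table named `transactions`, so it helps to confirm that the table exists and is not empty after the conversion. The following is a minimal sketch using only the standard library. The helper name `table_row_count` and the toy in-memory table are assumptions made for illustration; in practice the connection would point at `credit_card_fraud.db`.

```python
import sqlite3


def table_row_count(conn: sqlite3.Connection, table: str) -> int:
    """Return the number of rows in `table`; raise ValueError if it is missing."""
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    if exists is None:
        raise ValueError(f"table {table!r} not found")
    # Table names cannot be bound as parameters, so quote the identifier instead.
    quoted = '"' + table.replace('"', '""') + '"'
    return conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]


# Toy stand-in for credit_card_fraud.db (hypothetical rows, three columns only)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (trans_num TEXT, amt REAL, is_fraud INTEGER)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [("a1", 107.23, 0), ("b2", 7.77, 1)],
)
row_count = table_row_count(conn, "transactions")
print(row_count)
conn.close()
```

Against the real database, the same call should report the full number of rows loaded from the CSV.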

### <a id='toc1_1_3_'></a>[Load the SQL Extension and Database](#toc0_)

In [2]:
# Load the SQL extension
# %load_ext sql
%reload_ext sql

# Connect to your SQLite database
%sql sqlite:///C:/Users/Laxman/OneDrive/Documents/Portfolio/SQL/Projects/CreditCardFraud/credit_card_fraud.db

### <a id='toc1_1_4_'></a>[Data Table](#toc0_)

In [3]:
%%sql
SELECT * FROM transactions LIMIT 5;

 * sqlite:///C:/Users/Laxman/OneDrive/Documents/Portfolio/SQL/Projects/CreditCardFraud/credit_card_fraud.db
Done.


trans_date_trans_time,merchant,category,amt,city,state,lat,long,city_pop,job,dob,trans_num,merch_lat,merch_long,is_fraud
2019-01-01 00:00:44,"Heller, Gutmann and Zieme",grocery_pos,107.23,Orient,WA,48.8878,-118.2105,149,Special educational needs teacher,1978-06-21,1f76529f8574734946361c461b024d99,49.159047,-118.186462,0
2019-01-01 00:00:51,Lind-Buckridge,entertainment,220.11,Malad City,ID,42.1808,-112.262,4154,Nature conservation officer,1962-01-19,a1a22d70485983eac12b5b88dad1cf95,43.150704,-112.154481,0
2019-01-01 00:07:27,Kiehn Inc,grocery_pos,96.29,Grenada,CA,41.6125,-122.5258,589,Systems analyst,1945-12-21,413636e759663f264aae1819a4d4f231,41.65752,-122.230347,0
2019-01-01 00:09:03,Beier-Hyatt,shopping_pos,7.77,High Rolls Mountain Park,NM,32.9396,-105.8189,899,Naval architect,1967-08-30,8a6293af5ed278dea14448ded2685fea,32.863258,-106.520205,0
2019-01-01 00:21:32,Bruen-Yost,misc_pos,6.85,Freedom,WY,43.0172,-111.0292,471,"Education officer, museum",1967-08-02,f3c43d336e92a44fc2fb67058d5949e3,43.753735,-111.454923,0


#### <a id='toc1_1_4_1_'></a>[Data Dictionary](#toc0_)

| Column                | Description                                 |
|-----------------------|---------------------------------------------|
| trans_date_trans_time | Transaction DateTime                        |
| merchant              | Merchant Name                               |
| category              | Category of Merchant                        |
| amt                   | Amount of Transaction                       |
| city                  | City of Credit Card Holder                  |
| state                 | State of Credit Card Holder                 |
| lat                   | Latitude Location of Purchase               |
| long                  | Longitude Location of Purchase              |
| city_pop              | Credit Card Holder's City Population        |
| job                   | Job of Credit Card Holder                   |
| dob                   | Date of Birth of Credit Card Holder         |
| trans_num             | Transaction Number                          |
| merch_lat             | Latitude Location of Merchant               |
| merch_long            | Longitude Location of Merchant              |
| is_fraud              | Whether Transaction is Fraud (1) or Not (0) |

### <a id='toc1_1_5_'></a>[Q1: Class Distribution](#toc0_)

In [4]:
%%sql
SELECT 
    SUM(CASE WHEN is_fraud = 0 THEN 1 ELSE 0 END) AS not_fraud,
    SUM(CASE WHEN is_fraud = 1 THEN 1 ELSE 0 END) AS is_fraud
FROM transactions;

 * sqlite:///C:/Users/Laxman/OneDrive/Documents/Portfolio/SQL/Projects/CreditCardFraud/credit_card_fraud.db
Done.


not_fraud,is_fraud
337825,1782


| not_fraud | is_fraud |
|-----------|----------|
| 337825    | 1782     |

This SQL query counts the non-fraudulent and fraudulent transactions in the `transactions` table. The ratio of non-fraudulent to fraudulent transactions is roughly 190:1 (337,825 to 1,782), so fraud makes up only about 0.5% of all transactions. An imbalance this large can bias a classifier toward the majority class, so applying SMOTE (Synthetic Minority Over-sampling Technique) may improve performance when a model is built to predict fraudulent transactions. More research on SMOTE is required before using it here.
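
The imbalance can be recomputed directly from the two counts in the query output. This short check uses only those counts; the variable names are chosen here for illustration.

```python
# Counts taken from the class distribution query output
not_fraud = 337_825
fraud = 1_782

total = not_fraud + fraud
imbalance_ratio = not_fraud / fraud   # non-fraudulent transactions per fraudulent one
fraud_share = fraud / total           # fraction of all transactions that are fraud

print(f"{imbalance_ratio:.1f}:1")     # 189.6:1
print(f"{fraud_share:.2%}")           # 0.52%
```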

<img src="../../../Tableau/Project/CreditCardFraud/Q1 - Fraudulent vs  Non-Fraudulent.png" alt="" width="500" height="400">
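
To make the SMOTE idea from the Q1 discussion more concrete, here is a simplified sketch of its core step: a synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. This is an illustration under stated assumptions, not the reference algorithm or the approach this project will necessarily take. The function name `smote_sketch`, the two-feature toy points, and the choice of `k` are all hypothetical.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding the point itself)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical minority-class (fraud) points with two numeric features
fraud_points = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.4), (5.0, 5.0)]
new_points = smote_sketch(fraud_points, n_new=3)
print(new_points)
```

For real modelling work, the `SMOTE` class in the imbalanced-learn package is the more likely choice, and oversampling would normally be applied to the training split only so that the test set keeps the true class distribution.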


### <a id='toc1_1_6_'></a>[Q2: Fraud Frequency by Merchant](#toc0_)

In [46]:
%%sql
SELECT 
    merchant AS Merchant,
    COUNT(CASE WHEN is_fraud = 1 THEN 1 END) AS is_fraudulent,
    COUNT(CASE WHEN is_fraud = 0 THEN 1 END) AS not_fraudulent,
    ROUND(CAST(COUNT(CASE WHEN is_fraud = 1 THEN 1 END) AS FLOAT) / CAST(COUNT(CASE WHEN is_fraud = 0 THEN 1 END) AS FLOAT), 4) AS Ratio
FROM transactions
GROUP BY merchant
ORDER BY ratio DESC
LIMIT 10;



 * sqlite:///C:/Users/Laxman/OneDrive/Documents/Portfolio/SQL/Projects/CreditCardFraud/credit_card_fraud.db
Done.


Merchant,is_fraudulent,not_fraudulent,Ratio
"Romaguera, Cruickshank and Greenholt",18,503,0.0358
Kerluke-Abshire,17,498,0.0341
Kiehn-Emmerich,19,639,0.0297
Terry-Huel,13,506,0.0257
"Tillman, Fritsch and Schmitt",9,355,0.0254
Kunze Inc,16,629,0.0254
"Moore, Dibbert and Koepp",8,329,0.0243
Welch Inc,8,334,0.024
Lebsack and Sons,8,350,0.0229
Kerluke Inc,7,319,0.0219


In [48]:
%%sql
SELECT 
    merchant AS Merchant,
    COUNT(CASE WHEN is_fraud = 1 THEN 1 END) AS is_fraudulent,
    COUNT(CASE WHEN is_fraud = 0 THEN 1 END) AS not_fraudulent
FROM transactions
GROUP BY merchant
ORDER BY is_fraudulent DESC
LIMIT 10;


 * sqlite:///C:/Users/Laxman/OneDrive/Documents/Portfolio/SQL/Projects/CreditCardFraud/credit_card_fraud.db
Done.


Merchant,is_fraudulent,not_fraudulent
Kiehn-Emmerich,19,639
"Romaguera, Cruickshank and Greenholt",18,503
Kerluke-Abshire,17,498
Kunze Inc,16,629
Kilback LLC,15,1134
Strosin-Cruickshank,14,657
Terry-Huel,13,506
"Schultz, Simonis and Little",13,609
Murray-Smitham,13,651
McDermott-Weimann,13,626


The first query lists each merchant with its counts of fraudulent and non-fraudulent transactions and the ratio of the two, then returns the 10 merchants with the highest fraudulent-to-non-fraudulent ratio. A future model could use this ratio as a feature when predicting whether a transaction is fraudulent.

The second query returns the 10 merchants with the largest number of fraudulent transactions.
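
One detail to watch with the ratio is the denominator: a merchant with no non-fraudulent transactions makes it zero. The sketch below runs a variant of the merchant query against a toy in-memory table. It guards the division with `NULLIF` and also reports a fraud rate (fraudulent / total), which stays defined in that case. The merchant names, the row counts, and the `fraud_rate` column are assumptions added for illustration and are not part of the original analysis.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (merchant TEXT, is_fraud INTEGER)")

# Hypothetical toy data: (merchant, is_fraud)
rows = (
    [("Merchant A", 1)] * 2 + [("Merchant A", 0)] * 8
    + [("Merchant B", 1)] * 1 + [("Merchant B", 0)] * 19
    + [("Merchant C", 1)] * 3  # fraud only, so the ratio's denominator is zero
)
conn.executemany("INSERT INTO transactions VALUES (?, ?)", rows)

query = """
SELECT
    merchant,
    SUM(is_fraud) AS fraud_count,
    COUNT(*)      AS total_count,
    -- NULLIF turns a zero denominator into NULL instead of dividing by zero
    ROUND(1.0 * SUM(is_fraud) / NULLIF(SUM(is_fraud = 0), 0), 4) AS fraud_to_legit_ratio,
    ROUND(AVG(is_fraud), 4) AS fraud_rate
FROM transactions
GROUP BY merchant
ORDER BY fraud_rate DESC;
"""
results = conn.execute(query).fetchall()
for row in results:
    print(row)
conn.close()
```

In this toy data, Merchant C has a `NULL` ratio but a fraud rate of 1.0, so it still sorts to the top. With real data, merchants with very few transactions can dominate either ranking, so a minimum transaction count (for example through a `HAVING` clause) may be worth considering.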


### <a id='toc1_1_6_'></a>[Q3: Top Categories Involved in Fraudulent Transactions](#toc0_)