# Credit Card Fraud Example

### Objective:

In this scenario, a credit card company wants to determine which of its customers are being targeted by fraud and scams. In this exercise, we explore the data for potential patterns and try to identify which customers are most likely to be targeted, which kinds of customers account for the largest and most frequent fraudulent amounts, and any other pertinent information. The ultimate goal is to use these findings to take precautions and build solutions that reduce the likelihood of customers becoming victims of fraud in the future.


#### Background:

A new credit card company has just entered the U.S. market and is trying to identify instances of fraud. In the internal data, we have raw transaction data with instances of fraud that have been flagged internally, by customers who have called in to customer support.

The executive wants to know how accurately you can predict fraud using this data. She has stressed that the model should err on the side of caution: it is not a big problem to flag transactions as fraudulent when they aren't just to be safe. In your report, you will need to describe how well your model functions and how it adheres to these criteria.
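
The preference for erring on the side of caution suggests weighting recall (the share of actual fraud that is caught) above precision (the share of flagged transactions that really are fraud). The sketch below illustrates that trade-off by lowering a decision threshold. The labels, the scores, and the `precision_recall` helper are made up for illustration; they are not taken from the dataset or from any model built in this notebook.

```python
def precision_recall(y_true, y_score, threshold):
    """Return (precision, recall) when scores >= threshold are flagged as fraud."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical labels (1 = fraud) and model scores, for illustration only.
y_true = [0, 0, 1, 0, 1, 0, 0, 1]
y_score = [0.10, 0.35, 0.40, 0.20, 0.80, 0.45, 0.05, 0.60]

strict = precision_recall(y_true, y_score, threshold=0.5)
cautious = precision_recall(y_true, y_score, threshold=0.3)
print("threshold 0.5:", strict)
print("threshold 0.3:", cautious)
```

In this toy data the lower threshold catches every fraudulent transaction at the cost of two false alarms, which is the kind of trade the executive has said is acceptable.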

You will need to prepare a report that is accessible to a broad audience. It will need to outline your 


### Motivation: 






#### Business Understanding


Questions I developed early on, after briefly looking over the data and applying common knowledge to the task at hand:

What types of purchases are most likely to be instances of fraud?

What types of customers are most likely to be victims of fraud?

What are the fraud rates across the US? What are they across different states?

Are the targets predominantly in rural areas? Urban? Suburban?

Are older customers more likely to be victims of credit card fraud? Does it make a difference?

Are there consistencies among the merchants where these fraudulent transactions occur?
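
Several of these questions come down to comparing the fraud rate across groups (category, state, merchant). A minimal sketch of that comparison with pandas follows. The `fraud_rate_by` helper is a hypothetical name, and the sample rows are invented to mimic the dataset's columns, so the numbers are not findings.

```python
import pandas as pd

def fraud_rate_by(df, column):
    """Fraud count, transaction count, and fraud rate for each value of `column`."""
    summary = (
        df.groupby(column)["is_fraud"]
          .agg(fraud_count="sum", transactions="count", fraud_rate="mean")
          .sort_values("fraud_rate", ascending=False)
    )
    return summary

# Made-up rows that mimic the dataset's columns; not real results.
sample = pd.DataFrame({
    "category": ["grocery_pos", "grocery_pos", "shopping_pos", "shopping_pos", "misc_pos"],
    "state": ["WA", "ID", "CA", "CA", "WY"],
    "is_fraud": [1, 0, 0, 0, 0],
})

by_category = fraud_rate_by(sample, "category")
print(by_category)
```

The same helper could be called with `"state"` or `"merchant"` on the full DataFrame to address the geographic and merchant questions.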

#### Data Understanding

Data Dictionary

| Column                | Description                           |
|-----------------------|---------------------------------------|
| trans_date_trans_time | Transaction date and time             |
| merchant              | Merchant name                         |
| category              | Category of the transaction           |
| amt                   | Amount of the transaction             |
| city                  | City of the transaction               |
| state                 | State of the transaction              |
| lat                   | Latitude of the customer              |
| long                  | Longitude of the customer             |
| city_pop              | Population of the city                |
| job                   | Job title of the cardholder           |
| dob                   | Date of birth of the cardholder       |
| trans_num             | Transaction number                    |
| merch_lat             | Merchant's latitude                   |
| merch_long            | Merchant's longitude                  |
| is_fraud              | Indicator of fraud (1 for fraud, 0 for non-fraud) |


Data Sources: DataCamp

#### Data Preparation / System Design

- I noticed that the data includes latitude and longitude for the merchant, but not the merchant's city or state. That makes it a little difficult to see whether proximity to the physical store had any relation to whether the customer was defrauded. Perhaps a customer who lived closer to the store, or who was familiar with it or drove by it often, could be tricked more easily?

- After going through the data more granularly, it made sense to look for general U.S. population data, broken down by town, to help map out each merchant's location and see whether it helped explain why particular individuals or particular merchants were being targeted. This step was to be conducted <b>after</b> an initial analysis based solely on the initial dataset.
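
One way to explore the proximity idea with the columns already available is to compute the great-circle distance between the customer coordinates (`lat`, `long`) and the merchant coordinates (`merch_lat`, `merch_long`). The sketch below uses the haversine formula. `haversine_km` is a hypothetical helper, and the Earth radius of 6371 km is an assumed mean value, so distances are approximate.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

# Customer and merchant coordinates from the first row of the dataset (Orient, WA).
distance = haversine_km(48.8878, -118.2105, 49.159047, -118.186462)
print(round(float(distance), 1))
```

Because the function relies only on NumPy ufuncs, it should also accept whole columns, for example `haversine_km(df["lat"], df["long"], df["merch_lat"], df["merch_long"])`, to produce a distance for every transaction.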



#### Exploratory Data Analysis (EDA)

### Analysis:

#### Steps


### Findings:


### Conclusion:



In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

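# Import the classes directly. A separate `import datetime` would be shadowed
# by the `from datetime import datetime` line, so it is dropped.
from datetime import datetime, timedelta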
from dateutil import rrule
import time

In [2]:
df = pd.read_csv('./Datasets/datacamp_credit_card_fraud_data.csv')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 339607 entries, 0 to 339606
Data columns (total 15 columns):
 #   Column                 Non-Null Count   Dtype  
---  ------                 --------------   -----  
 0   trans_date_trans_time  339607 non-null  object 
 1   merchant               339607 non-null  object 
 2   category               339607 non-null  object 
 3   amt                    339607 non-null  float64
 4   city                   339607 non-null  object 
 5   state                  339607 non-null  object 
 6   lat                    339607 non-null  float64
 7   long                   339607 non-null  float64
 8   city_pop               339607 non-null  int64  
 9   job                    339607 non-null  object 
 10  dob                    339607 non-null  object 
 11  trans_num              339607 non-null  object 
 12  merch_lat              339607 non-null  float64
 13  merch_long             339607 non-null  float64
 14  is_fraud               339607 non-nu

In [3]:
df.head(5)

Unnamed: 0,trans_date_trans_time,merchant,category,amt,city,state,lat,long,city_pop,job,dob,trans_num,merch_lat,merch_long,is_fraud
0,2019-01-01 00:00:44,"Heller, Gutmann and Zieme",grocery_pos,107.23,Orient,WA,48.8878,-118.2105,149,Special educational needs teacher,1978-06-21,1f76529f8574734946361c461b024d99,49.159047,-118.186462,0
1,2019-01-01 00:00:51,Lind-Buckridge,entertainment,220.11,Malad City,ID,42.1808,-112.262,4154,Nature conservation officer,1962-01-19,a1a22d70485983eac12b5b88dad1cf95,43.150704,-112.154481,0
2,2019-01-01 00:07:27,Kiehn Inc,grocery_pos,96.29,Grenada,CA,41.6125,-122.5258,589,Systems analyst,1945-12-21,413636e759663f264aae1819a4d4f231,41.65752,-122.230347,0
3,2019-01-01 00:09:03,Beier-Hyatt,shopping_pos,7.77,High Rolls Mountain Park,NM,32.9396,-105.8189,899,Naval architect,1967-08-30,8a6293af5ed278dea14448ded2685fea,32.863258,-106.520205,0
4,2019-01-01 00:21:32,Bruen-Yost,misc_pos,6.85,Freedom,WY,43.0172,-111.0292,471,"Education officer, museum",1967-08-02,f3c43d336e92a44fc2fb67058d5949e3,43.753735,-111.454923,0
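
`df.info()` reports `trans_date_trans_time` and `dob` as `object` columns, so they would need to be parsed before any time-based or age-based question can be answered. A possible next step is sketched below on two rows copied from the preview. The `age` and `hour` column names are assumptions introduced here, and dividing by 365.25 is a simplification that may be off by a day around birthdays.

```python
import pandas as pd

# Two rows copied from the preview above, limited to the date columns.
sample = pd.DataFrame({
    "trans_date_trans_time": ["2019-01-01 00:00:44", "2019-01-01 00:00:51"],
    "dob": ["1978-06-21", "1962-01-19"],
})

# Parse the string columns into datetime64 values.
sample["trans_date_trans_time"] = pd.to_datetime(sample["trans_date_trans_time"])
sample["dob"] = pd.to_datetime(sample["dob"])

# Approximate age in whole years at the time of the transaction
# (365.25 days per year is a simplification, not an exact birthday calculation).
age_days = (sample["trans_date_trans_time"] - sample["dob"]).dt.days
sample["age"] = (age_days / 365.25).astype(int)

# Hour of day, useful for checking whether fraud clusters at certain times.
sample["hour"] = sample["trans_date_trans_time"].dt.hour
print(sample[["age", "hour"]])
```

The same transformations could be applied to `df` directly once the approach has been checked on a small sample.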
