# Challenge

Another way to identify fraudulent transactions is to look for outliers in the data. Two common tests flag values that lie several standard deviations from the mean, or that fall outside a range derived from the quartiles. Using this starter notebook, write two Python functions:

* One that uses standard deviation to identify anomalies for any cardholder.

* Another that uses interquartile range to identify anomalies for any cardholder.

## Identifying Outliers Using Standard Deviation

In [1]:
# Initial imports
import pandas as pd
import numpy as np
import random
from sqlalchemy import create_engine
from dotenv import load_dotenv
import os
load_dotenv()


True

In [2]:
# Create a connection to the database
user = os.getenv("DB_USER")
password = os.getenv("DB_PASSWORD")
engine = create_engine(f"postgresql://{user}:{password}@localhost:5432/fraud_db")

In [3]:
# Get some data with Raw SQL
query = """
    SELECT
        t.date,
        t.amount,
        ch.id as customer_id,
        ch.name as customer_name,
        m.name as merchant_name,
        mc.name as merchant_type
    FROM transaction t
    INNER JOIN merchant m on m.id = t.id_merchant
    INNER JOIN merchant_category mc on mc.id = m.id_merchant_category
    INNER JOIN credit_card cc on cc.card = t.card
    INNER JOIN card_holder ch on ch.id = cc.cardholder_id
    ORDER BY DATE ASC
"""
fraud_df = pd.read_sql(query, engine)
fraud_df.set_index('date', inplace=True)

fraud_df.head()

date,amount,customer_id,customer_name,merchant_name,merchant_type
2018-01-01 21:35:10,6.22,13,John Martin,Dominguez PLC,food truck
2018-01-01 21:43:12,3.83,13,John Martin,Patton-Rivera,bar
2018-01-01 22:41:21,9.61,10,Matthew Gutierrez,Day-Murray,food truck
2018-01-01 23:13:30,19.03,4,Danielle Green,Miller-Blevins,pub
2018-01-01 23:15:10,2.95,18,Malik Carlson,"Cline, Myers and Strong",restaurant


In [4]:
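# Locate outliers using the standard deviation of all transaction amounts
std_dev = fraud_df['amount'].std()

# Keep amounts above 3 standard deviations (or below -3 standard deviations).
# Note: this threshold is measured from zero, not from the mean amount,
# and it is computed over all cardholders rather than per cardholder.
outliers = fraud_df[(fraud_df['amount'] > 3 * std_dev) | (fraud_df['amount'] < -3 * std_dev)]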
outliers.head()

date,amount,customer_id,customer_name,merchant_name,merchant_type
2018-01-02 23:27:46,1031.0,12,Megan Price,Baxter-Smith,restaurant
2018-01-04 03:05:18,1685.0,7,Sean Taylor,"Kelly, Dyer and Schmitt",food truck
2018-01-08 02:34:32,1029.0,6,Beth Hernandez,Hood-Phillips,bar
2018-01-22 08:07:03,1131.0,16,Crystal Clark,"Walker, Deleon and Wolf",restaurant
2018-01-23 06:29:37,1678.0,12,Megan Price,Garcia-White,pub
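
The cell above filters on a single standard deviation computed across every transaction, whereas the challenge asks for a function that works for any cardholder. A minimal sketch of such a function follows, assuming the `customer_id` and `amount` columns returned by the query. The name `find_outliers_std`, the `n_std` parameter, and `toy_df` are illustrative and not part of the starter notebook. Unlike the cell above, this sketch measures the distance from the cardholder's own mean.

```python
import pandas as pd


def find_outliers_std(df, customer_id, n_std=3.0):
    """Return one cardholder's transactions whose amount lies more than
    n_std standard deviations from that cardholder's own mean amount."""
    card_df = df[df["customer_id"] == customer_id]
    mean = card_df["amount"].mean()
    std = card_df["amount"].std()  # sample standard deviation (ddof=1)
    # Distance from the mean, compared against the n_std threshold
    return card_df[(card_df["amount"] - mean).abs() > n_std * std]


# Hypothetical toy data standing in for fraud_df
toy_df = pd.DataFrame({
    "customer_id": [1] * 21 + [2] * 3,
    "amount": [10.0] * 20 + [1000.0] + [5.0, 6.0, 7.0],
})
std_outliers = find_outliers_std(toy_df, customer_id=1)
print(std_outliers)
```

In the notebook, the same call could be made as `find_outliers_std(fraud_df, customer_id=18)`. A cardholder with a single transaction has an undefined standard deviation, so the function returns an empty frame in that case.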


In [5]:
# Find anomalous transactions for 3 random card holders

# Cast series to a set for distinct values and then back to list
customer_ids = list(set(outliers.customer_id))

# Sample 3 distinct ids without replacement
for random_id in random.sample(customer_ids, 3):
    print(outliers[outliers['customer_id'] == random_id])

                     amount  customer_id  customer_name  \
date                                                      
2018-02-19 22:48:25  1839.0           18  Malik Carlson   
2018-04-03 03:23:37  1077.0           18  Malik Carlson   
2018-06-03 20:02:28  1814.0           18  Malik Carlson   
2018-07-18 09:19:08   974.0           18  Malik Carlson   
2018-09-10 22:49:41  1176.0           18  Malik Carlson   
2018-11-17 05:30:43  1769.0           18  Malik Carlson   
2018-12-13 12:09:58  1154.0           18  Malik Carlson   

                                 merchant_name merchant_type  
date                                                          
2018-02-19 22:48:25               Baxter-Smith    restaurant  
2018-04-03 03:23:37          Townsend-Anderson    restaurant  
2018-06-03 20:02:28  Boone, Davis and Townsend           pub  
2018-07-18 09:19:08          Santos-Fitzgerald           pub  
2018-09-10 22:49:41                Lopez-Kelly    restaurant  
2018-11-17 05:30:43        

## Identifying Outliers Using Interquartile Range

In [6]:
# Write a function that locates outliers using interquartile range
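
The starter cell leaves this function unwritten. One possible sketch is shown here, assuming the same `customer_id` and `amount` columns as the query above and the conventional 1.5 × IQR fences. The name `find_outliers_iqr`, the `k` parameter, and `toy_df` are illustrative and do not come from the notebook.

```python
import pandas as pd


def find_outliers_iqr(df, customer_id, k=1.5):
    """Return one cardholder's transactions whose amount falls outside
    [Q1 - k * IQR, Q3 + k * IQR] for that cardholder's amounts."""
    card_df = df[df["customer_id"] == customer_id]
    q1 = card_df["amount"].quantile(0.25)
    q3 = card_df["amount"].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Keep amounts below the lower fence or above the upper fence
    return card_df[(card_df["amount"] < lower) | (card_df["amount"] > upper)]


# Hypothetical toy data standing in for fraud_df
toy_df = pd.DataFrame({
    "customer_id": [1] * 6 + [2] * 4,
    "amount": [5.0, 6.0, 7.0, 8.0, 9.0, 500.0, 10.0, 11.0, 12.0, 13.0],
})
iqr_outliers = find_outliers_iqr(toy_df, customer_id=1)
print(iqr_outliers)
```

Because quartiles are less sensitive to extreme values than the mean and standard deviation, this test can still flag a large transaction when several other large transactions inflate the cardholder's standard deviation.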
