# Challenge

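Another approach to identifying fraudulent transactions is to look for outliers in each cardholder's transaction amounts. Outliers are commonly detected either by their distance from the mean, measured in standard deviations, or by their distance outside the interquartile range (IQR). Using this starter notebook, write two Python functions: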

* One that uses standard deviation to identify anomalies for any cardholder.

* Another that uses interquartile range to identify anomalies for any cardholder.
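Stated as formulas, let $\mu$ be the mean and $\sigma$ the population standard deviation of one cardholder's transaction amounts, and let $Q_1$ and $Q_3$ be their first and third quartiles. An amount $x$ is flagged when:

$$
|x - \mu| > 3\sigma \qquad \text{(standard deviation rule)}
$$

$$
x < Q_1 - 1.5\,\mathrm{IQR} \quad\text{or}\quad x > Q_3 + 1.5\,\mathrm{IQR}, \qquad \mathrm{IQR} = Q_3 - Q_1 \qquad \text{(interquartile range rule)}
$$

The multipliers 3 and 1.5 are conventional choices (they are the ones used by the functions in this notebook), not fixed requirements.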

## Identifying Outliers Using Standard Deviation

In [1]:
# Initial imports
import pandas as pd
import numpy as np
from sqlalchemy import create_engine



In [2]:
# Create a connection to the database
engine = create_engine("postgresql://postgres:postgres@localhost:5432/fraud_detection")



In [8]:
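# Locate outliers using standard deviation: flag any amount more than
# 3 standard deviations above or below the cardholder's mean
def outlier_detector(amounts):
    """Return the amounts that lie outside mean ± 3 * standard deviation."""
    outlier_list = []
    mean = np.mean(amounts)
    stddev = np.std(amounts)  # population standard deviation (ddof=0)
    for num in amounts:
        if num > mean + 3 * stddev or num < mean - 3 * stddev:
            outlier_list.append(num)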
    return outlier_list
            

In [12]:
# Find anomalous transactions for three sample cardholders (IDs 5, 7 and 10)
query = """
select cardholder_id, amount from join_transaction
where cardholder_id in (5, 7, 10)
"""

rand_cardholder = pd.read_sql(query, engine)
# .tolist() returns each cardholder's amounts as plain Python values
rand_cardholder_5 = rand_cardholder.loc[rand_cardholder['cardholder_id'] == 5, 'amount'].tolist()
rand_cardholder_7 = rand_cardholder.loc[rand_cardholder['cardholder_id'] == 7, 'amount'].tolist()
rand_cardholder_10 = rand_cardholder.loc[rand_cardholder['cardholder_id'] == 10, 'amount'].tolist()

print(f"Outliers for cardholder 5: {outlier_detector(rand_cardholder_5)}")
print(f"Outliers for cardholder 7: {outlier_detector(rand_cardholder_7)}")
print(f"Outliers for cardholder 10: {outlier_detector(rand_cardholder_10)}")

Outliers for cardholder 5: []
Outliers for cardholder 7: [1685.0000000000002, 1072.0, 1086.0, 1449.0, 2249.0, 1296.0]
Outliers for cardholder 10: []
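The cardholder IDs in this query are fixed. A minimal sketch of drawing them at random instead, assuming the full table has been loaded into a DataFrame with `cardholder_id` and `amount` columns (for example with `pd.read_sql("select cardholder_id, amount from join_transaction", engine)`). The helper `outliers_for_random_cardholders` is hypothetical, `std_outliers` is a stand-in for `outlier_detector` so the block runs on its own, and made-up data replaces the database.

```python
import random

import numpy as np
import pandas as pd


def std_outliers(amounts, k=3.0):
    """Stand-in for outlier_detector: values more than k population std from the mean."""
    values = np.asarray(amounts, dtype=float)
    return values[np.abs(values - values.mean()) > k * values.std()].tolist()


def outliers_for_random_cardholders(df, detector, n=3, seed=None):
    """Apply `detector` to the amounts of up to `n` randomly chosen cardholders.

    Returns a dict mapping cardholder_id -> list of flagged amounts.
    """
    ids = sorted(df["cardholder_id"].unique().tolist())
    chosen = random.Random(seed).sample(ids, k=min(n, len(ids)))
    amounts_by_id = df.groupby("cardholder_id")["amount"].apply(list)
    return {cid: detector(amounts_by_id[cid]) for cid in sorted(chosen)}


# Made-up data standing in for the join_transaction query result
df = pd.DataFrame({
    "cardholder_id": [1] * 30 + [2] * 3 + [3] * 4,
    "amount": [10.0] * 29 + [1000.0] + [5.0, 6.0, 7.0] + [2.0, 2.5, 3.0, 2.0],
})
for cid, flagged in outliers_for_random_cardholders(df, std_outliers, n=3, seed=42).items():
    print(f"Outliers for cardholder {cid}: {flagged}")
```

Passing the detector as an argument lets the same helper run either the standard deviation or the interquartile range function; fixing `seed` makes a particular random draw reproducible.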


## Identifying Outliers Using Interquartile Range

In [17]:
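# Locate outliers using the interquartile range (IQR): flag any amount below
# Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR
def interquartile_outlier(amounts):
    """Return the amounts that fall outside the 1.5 * IQR fences."""
    median = np.median(amounts)
    # Q1 and Q3 are the medians of the values strictly below and strictly above
    # the overall median (values equal to the median belong to neither half)
    q3 = np.median([num for num in amounts if num > median])
    q1 = np.median([num for num in amounts if num < median])
    iqr = q3 - q1

    outlier = [num for num in amounts if num < q1 - 1.5 * iqr or num > q3 + 1.5 * iqr]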
    return outlier
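Quartiles have several competing definitions. `interquartile_outlier` takes Q1 and Q3 as the medians of the values strictly below and strictly above the median, whereas NumPy's `np.percentile` interpolates linearly by default, so the fences (and occasionally the flagged amounts) can differ. On the list 1 through 9 plus 100, for instance, the median-of-halves approach gives Q1 = 3 and Q3 = 8, while `np.percentile` gives 3.25 and 7.75; both flag only 100. A sketch of the percentile-based variant, where the name `iqr_outliers` is hypothetical:

```python
import numpy as np


def iqr_outliers(amounts, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR], with quartiles from np.percentile."""
    values = np.asarray(amounts, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])  # default method: linear interpolation
    iqr = q3 - q1
    mask = (values < q1 - k * iqr) | (values > q3 + k * iqr)
    return values[mask].tolist()


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
q1, q3 = np.percentile(data, [25, 75])
print(q1, q3)              # 3.25 7.75
print(iqr_outliers(data))  # [100.0]
```

This variant swaps in a different quartile convention, so its output is not guaranteed to match the results printed below for the same cardholders.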
   
    
    

In [18]:
# Find anomalous transactions for the same three cardholders (IDs 5, 7 and 10)
print(f"Outliers for cardholder 5: {interquartile_outlier(rand_cardholder_5)}")
print(f"Outliers for cardholder 7: {interquartile_outlier(rand_cardholder_7)}")
print(f"Outliers for cardholder 10: {interquartile_outlier(rand_cardholder_10)}")

Outliers for cardholder 5: []
Outliers for cardholder 7: [1685.0000000000002, 445.0, 1072.0, 543.0, 1086.0, 160.0, 233.0, 1449.0, 2249.0, 1296.0]
Outliers for cardholder 10: []
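Both functions return bare amounts, so a flagged value cannot be traced back to its transaction, and cardholders are handled one at a time. Below is a vectorized sketch that scores every cardholder at once and keeps each row. It assumes a DataFrame with `cardholder_id` and `amount` columns like the query result above. The function name `flag_outliers` and the `std_outlier`/`iqr_outlier` column names are hypothetical. The quartiles come from pandas' linear interpolation rather than the median-of-halves approach in `interquartile_outlier`, so the IQR flags may not match it exactly.

```python
import pandas as pd


def flag_outliers(df, k_std=3.0, k_iqr=1.5):
    """Return a copy of df with boolean std_outlier and iqr_outlier columns, computed per cardholder."""
    amount = df["amount"]
    grouped = amount.groupby(df["cardholder_id"])

    # Per-row group statistics: transform broadcasts each group's scalar back to its rows
    mean = grouped.transform("mean")
    std = grouped.transform(lambda s: s.std(ddof=0))  # population std, matching np.std
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1

    out = df.copy()
    out["std_outlier"] = (amount - mean).abs() > k_std * std
    out["iqr_outlier"] = (amount < q1 - k_iqr * iqr) | (amount > q3 + k_iqr * iqr)
    return out


# Made-up transactions for two cardholders
df = pd.DataFrame({
    "cardholder_id": [1] * 30 + [2] * 3,
    "amount": [10.0] * 29 + [1000.0] + [5.0, 6.0, 7.0],
})
flagged = flag_outliers(df)
print(flagged[flagged["std_outlier"] | flagged["iqr_outlier"]])
```

Keeping the original rows (and any transaction ID or timestamp columns they carry) means a flagged amount can be looked up directly, and the two boolean columns make it easy to see where the methods agree.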
