# Challenge

Another approach to identifying fraudulent transactions is to look for outliers in the data. Standard deviation and quartiles are two common tools for detecting them. Using this starter notebook, write two Python functions:

* One that uses standard deviation to identify anomalies for any cardholder.

* Another that uses interquartile range to identify anomalies for any cardholder.

## Identifying Outliers Using Standard Deviation

In [1]:
# Initial imports
import pandas as pd
import numpy as np
import random
from sqlalchemy import create_engine

In [2]:
# Create a connection to the database
engine = create_engine("postgresql://postgres:postgres@localhost:5432/fraud_detection")

In [3]:
# Select the transaction amounts for cardholders 1, 2, and 3
query = """
    SELECT id, amount
    FROM complete_table
    WHERE id IN (1, 2, 3)
"""
    
# Create a DataFrame from the query result. HINT: Use pd.read_sql(query, engine)
cardholders = pd.read_sql(query, engine)
cardholders

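|     | id  | amount |
| --- | --- | ------ |
| 0   | 1   | 3.12   |
| 1   | 1   | 11.50  |
| 2   | 1   | 10.94  |
| 3   | 1   | 19.93  |
| 4   | 1   | 10.24  |
| ... | ... | ...    |
| 285 | 3   | 5.05   |
| 286 | 3   | 11.29  |
| 287 | 3   | 313.00 |
| 288 | 3   | 7.61   |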


In [4]:
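# Write a function that locates outliers using standard deviation
def outliers_std(data):
    """Return the values in data that lie more than 3 standard deviations from the mean."""
    # Calculate summary statistics
    # (np.std defaults to the population standard deviation, ddof=0)
    data_mean = np.mean(data)
    data_std = np.std(data)

    # Set the cutoffs at 3 standard deviations on either side of the mean
    cutoff = data_std * 3
    lower = data_mean - cutoff
    upper = data_mean + cutoff

    # Keep only the values that fall outside the cutoffs
    outliers = [x for x in data if x < lower or x > upper]
    return outliers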


In [5]:
# Find anomalous transactions for cardholders 1, 2, and 3
cardholder_1 = cardholders[cardholders['id']==1]['amount']
print(f"The outliers for cardholder 1: {outliers_std(cardholder_1)}")
cardholder_2 = cardholders[cardholders['id']==2]['amount']
print(f"The outliers for cardholder 2: {outliers_std(cardholder_2)}")
cardholder_3 = cardholders[cardholders['id']==3]['amount']
print(f"The outliers for cardholder 3: {outliers_std(cardholder_3)}")

The outliers for cardholder 1: [1691.0, 1302.0, 1789.9999999999998, 1660.0000000000002, 1894.0000000000002]
The outliers for cardholder 2: []
The outliers for cardholder 3: [1119.0, 1159.0, 1160.0]
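One detail worth knowing about the three-sigma rule: `np.std` computes the population standard deviation (`ddof=0`) by default, while `pandas.Series.std` defaults to the sample version (`ddof=1`), so the two give slightly different cutoffs. The sketch below makes that choice explicit. It uses made-up amounts rather than rows from `complete_table`, and `std_bounds` is a hypothetical helper introduced only for illustration.

```python
import numpy as np
import pandas as pd


def std_bounds(amounts, n_std=3, ddof=0):
    """Return (lower, upper) cutoffs at n_std standard deviations from the mean.

    Hypothetical helper for illustration; ddof=0 matches np.std's default.
    """
    values = np.asarray(amounts, dtype=float)
    mean = values.mean()
    spread = values.std(ddof=ddof)
    return mean - n_std * spread, mean + n_std * spread


# Made-up amounts: twenty small purchases and one large one
amounts = pd.Series([10.0] * 10 + [12.0] * 10 + [500.0])

lower, upper = std_bounds(amounts)

# A boolean mask keeps the original index, unlike a list comprehension
flagged = amounts[(amounts < lower) | (amounts > upper)]
print(flagged.tolist())  # [500.0]
```

Because a large value also inflates the mean and the standard deviation it is measured against, very short transaction histories may never produce a three-sigma outlier at all.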


## Identifying Outliers Using Interquartile Range

In [6]:
# Write a function that locates outliers using interquartile range
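def outliers_iqr(data):
    """Return the values in data that lie more than 1.5 * IQR outside the quartiles."""
    # Calculate the interquartile range
    q25 = np.percentile(data, 25)
    q75 = np.percentile(data, 75)
    iqr = q75 - q25

    # Set the cutoffs at 1.5 * IQR below the first quartile and above the third
    cutoff = iqr * 1.5
    lower = q25 - cutoff
    upper = q75 + cutoff

    # Keep only the values that fall outside the cutoffs
    outliers = [x for x in data if x < lower or x > upper]
    return outliers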

In [7]:
# Find anomalous transactions for cardholders 1, 2, and 3
cardholder_1 = cardholders[cardholders['id']==1]['amount']
print(f"The outliers for cardholder 1: {outliers_iqr(cardholder_1)}")
cardholder_2 = cardholders[cardholders['id']==2]['amount']
print(f"The outliers for cardholder 2: {outliers_iqr(cardholder_2)}")
cardholder_3 = cardholders[cardholders['id']==3]['amount']
print(f"The outliers for cardholder 3: {outliers_iqr(cardholder_3)}")

The outliers for cardholder 1: [1691.0, 283.0, 1302.0, 1789.9999999999998, 1017.0, 1056.0, 1060.0, 484.0, 267.0, 1660.0000000000002, 1894.0000000000002, 1033.0]
The outliers for cardholder 2: []
The outliers for cardholder 3: [1119.0, 1159.0, 1160.0, 188.0, 626.0, 757.0, 206.0, 1053.0, 1054.0, 313.0]
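The cells above filter one cardholder at a time. As a rough sketch of how the same interquartile-range rule could be applied to every cardholder in one pass, the example below groups a made-up DataFrame by `id` and builds a boolean mask per group. The data is invented, and `iqr_outlier_mask` is a hypothetical helper, not part of the notebook; it relies on `Series.quantile`, whose default linear interpolation matches `np.percentile`.

```python
import pandas as pd


def iqr_outlier_mask(amounts, k=1.5):
    """Boolean mask marking values outside [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical helper for illustration; k=1.5 is the conventional multiplier.
    """
    q25 = amounts.quantile(0.25)
    q75 = amounts.quantile(0.75)
    cutoff = k * (q75 - q25)
    return (amounts < q25 - cutoff) | (amounts > q75 + cutoff)


# Made-up transactions with the same columns as the query result
transactions = pd.DataFrame({
    "id": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "amount": [10.0, 12.0, 11.0, 13.0, 900.0, 5.0, 6.0, 5.5, 6.5, 5.8],
})

# transform evaluates the rule within each cardholder's group and returns
# a result aligned with the original rows
mask = transactions.groupby("id")["amount"].transform(iqr_outlier_mask).astype(bool)

# Only the 900.0 transaction for cardholder 1 falls outside its group's cutoffs
print(transactions[mask])
```

Returning rows instead of a bare list of amounts keeps the cardholder `id` next to each flagged value, which helps when the results are reviewed later.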
