In [19]:
pip install -U altair 

Note: you may need to restart the kernel to use updated packages.


# Fraudulent Transaction Predictions

### Introduction

The data set used in this proposal, Transaction Data for Fraud Analysis, contains both fraudulent and non-fraudulent transactions. It was taken from Kaggle and consists of synthetic transaction data intended for data analytics practice. Each row includes the IDs of the transaction, customer, and merchant, the transaction amount, the time of the transaction, the type of card used, the location, the purchase category, the customer's age, a description of the transaction, and whether the transaction was fraudulent. A study published in 2014 found that older adults are more susceptible to scams and that susceptibility to scams increases with age (James et al., 2014). The researchers measured susceptibility to scams using a 5-point measuring scale, along with other potentially correlated factors, in a sample of 639 older adults. Building on this evidence, we intend to use customer age and transaction amount as predictors of whether a charge is fraudulent or non-fraudulent.



### Preliminary exploratory data analysis









In [20]:
import pandas as pd
import altair as alt

In [21]:
financial_data = pd.read_csv("data/synthetic_financial_data.csv")
financial_data

Unnamed: 0,transaction_id,customer_id,merchant_id,amount,transaction_time,is_fraudulent,card_type,location,purchase_category,customer_age,transaction_description
0,1,1082,2027,5758.59,2023-01-01 00:00:00,0,MasterCard,City-30,Gas Station,43,Purchase at Merchant-2027
1,2,1015,2053,1901.56,2023-01-01 00:00:01,1,Visa,City-47,Online Shopping,61,Purchase at Merchant-2053
2,3,1004,2035,1248.86,2023-01-01 00:00:02,1,MasterCard,City-6,Gas Station,57,Purchase at Merchant-2035
3,4,1095,2037,7619.05,2023-01-01 00:00:03,1,Discover,City-6,Travel,59,Purchase at Merchant-2037
4,5,1036,2083,1890.10,2023-01-01 00:00:04,1,MasterCard,City-34,Retail,36,Purchase at Merchant-2083
...,...,...,...,...,...,...,...,...,...,...,...
9995,9996,1056,2023,8935.28,2023-01-01 02:46:35,1,MasterCard,City-39,Restaurant,36,Purchase at Merchant-2023
9996,9997,1053,2026,30.15,2023-01-01 02:46:36,0,MasterCard,City-32,Retail,42,Purchase at Merchant-2026
9997,9998,1041,2034,6333.64,2023-01-01 02:46:37,0,American Express,City-1,Online Shopping,36,Purchase at Merchant-2034
9998,9999,1009,2019,2837.13,2023-01-01 02:46:38,1,Visa,City-11,Retail,57,Purchase at Merchant-2019


In [22]:
financial_data.columns

Index(['transaction_id', 'customer_id', 'merchant_id', 'amount',
       'transaction_time', 'is_fraudulent', 'card_type', 'location',
       'purchase_category', 'customer_age', 'transaction_description'],
      dtype='object')

In [23]:
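def categorize(value):
    """Map the 0/1 fraud flag to a readable 'no'/'yes' label."""
    return 'no' if value == 0 else 'yes'

# Add a labelled copy of the is_fraudulent flag for filtering and plotting
financial_data['Fraudulent'] = financial_data['is_fraudulent'].apply(categorize)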

financial_data

Unnamed: 0,transaction_id,customer_id,merchant_id,amount,transaction_time,is_fraudulent,card_type,location,purchase_category,customer_age,transaction_description,Fraudulent
0,1,1082,2027,5758.59,2023-01-01 00:00:00,0,MasterCard,City-30,Gas Station,43,Purchase at Merchant-2027,no
1,2,1015,2053,1901.56,2023-01-01 00:00:01,1,Visa,City-47,Online Shopping,61,Purchase at Merchant-2053,yes
2,3,1004,2035,1248.86,2023-01-01 00:00:02,1,MasterCard,City-6,Gas Station,57,Purchase at Merchant-2035,yes
3,4,1095,2037,7619.05,2023-01-01 00:00:03,1,Discover,City-6,Travel,59,Purchase at Merchant-2037,yes
4,5,1036,2083,1890.10,2023-01-01 00:00:04,1,MasterCard,City-34,Retail,36,Purchase at Merchant-2083,yes
...,...,...,...,...,...,...,...,...,...,...,...,...
9995,9996,1056,2023,8935.28,2023-01-01 02:46:35,1,MasterCard,City-39,Restaurant,36,Purchase at Merchant-2023,yes
9996,9997,1053,2026,30.15,2023-01-01 02:46:36,0,MasterCard,City-32,Retail,42,Purchase at Merchant-2026,no
9997,9998,1041,2034,6333.64,2023-01-01 02:46:37,0,American Express,City-1,Online Shopping,36,Purchase at Merchant-2034,no
9998,9999,1009,2019,2837.13,2023-01-01 02:46:38,1,Visa,City-11,Retail,57,Purchase at Merchant-2019,yes


In [24]:
# Keep only the two predictors (amount, customer_age) and the fraud label
financial_data = financial_data[['amount', 'Fraudulent', 'customer_age']]
financial_data

Unnamed: 0,amount,Fraudulent,customer_age
0,5758.59,no,43
1,1901.56,yes,61
2,1248.86,yes,57
3,7619.05,yes,59
4,1890.10,yes,36
...,...,...,...
9995,8935.28,yes,36
9996,30.15,no,42
9997,6333.64,no,36
9998,2837.13,yes,57


In [26]:
# Keep only the fraudulent transactions for the exploratory plot
financial_data = financial_data[financial_data['Fraudulent'] == "yes"]
financial_data

Unnamed: 0,amount,Fraudulent,customer_age
1,1901.56,yes,61
2,1248.86,yes,57
3,7619.05,yes,59
4,1890.10,yes,36
5,8487.68,yes,43
...,...,...,...
9990,3377.98,yes,43
9993,6563.32,yes,20
9995,8935.28,yes,36
9998,2837.13,yes,57


In [29]:
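# Bar chart of customer ages for the first 1,000 fraudulent transactions
finance_plot = alt.Chart(financial_data[0:1000]).mark_bar().encode(
    x=alt.X('customer_age', title='Customer age'),
    y=alt.Y('count()', title='Number of fraudulent transactions')
)
finance_plot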

### Methods
The model will use the amount, customer_age, and is_fraudulent columns to examine whether credit card fraud is associated with particular age demographics and transaction amounts. We will visualize the findings with two bar graphs, one for age and one for transaction amount. In the first, customer ages are grouped into brackets to show whether certain demographics are more susceptible to credit card fraud than others; the x-axis shows the age groups and the y-axis shows the count. In the second, transaction amounts are grouped into dollar brackets to show whether larger transactions are indicative of fraud; the x-axis shows the amount brackets and the y-axis shows the count per bracket.
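The bracketing step described above could be sketched roughly as follows. This is a minimal illustration, not the project's final code: the toy rows stand in for the fraudulent subset of `financial_data`, and the bracket edges and labels (`age_bins`, `amount_bins`, and so on) are assumptions, since the proposal does not specify them.

```python
import pandas as pd

# Hypothetical toy rows standing in for the fraudulent subset of financial_data.
fraud = pd.DataFrame({
    "amount": [1901.56, 1248.86, 7619.05, 1890.10, 8487.68],
    "customer_age": [61, 57, 59, 36, 43],
})

# Assumed bracket edges and labels; the proposal does not fix these.
# pd.cut uses right-closed intervals by default, e.g. (29, 44] for "30-44".
age_bins = [17, 29, 44, 59, 120]
age_labels = ["18-29", "30-44", "45-59", "60+"]
amount_bins = [0, 2500, 5000, 7500, 10000]
amount_labels = ["0-2500", "2500-5000", "5000-7500", "7500-10000"]

# Assign each transaction to an age bracket and an amount bracket.
fraud["age_group"] = pd.cut(fraud["customer_age"], bins=age_bins, labels=age_labels)
fraud["amount_group"] = pd.cut(fraud["amount"], bins=amount_bins, labels=amount_labels)

# Count fraudulent transactions per bracket, keeping empty brackets as zero.
age_counts = fraud.groupby("age_group", observed=False).size()
amount_counts = fraud.groupby("amount_group", observed=False).size()

print(age_counts)
print(amount_counts)
```

Values outside the assumed edges (for example, an amount above 10,000) would fall into no bracket and be dropped from the counts, so the edges should be checked against the real data before use. Each count series can then be turned into a two-column table with `reset_index(name="count")` and plotted as a bar chart.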



### Expected outcomes and significance









In this project, which uses customer age and transaction amount to detect credit card fraud, we expect to find that certain age groups or transaction amounts are more strongly associated with fraud. These findings could inform fraud detection algorithms, potentially leading to more accurate and efficient models. They could also raise the question of whether age or transaction amount alone is sufficient for fraud detection or whether other features should be considered for a more robust model.
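One way to check whether a group is "more strongly associated with fraud" is to compare the share of fraudulent transactions in each bracket, since raw counts of fraudulent rows also reflect how many customers fall into that bracket. The sketch below is only an illustration under stated assumptions: the toy rows mimic `financial_data` before the fraud-only filter, and the two age brackets and the helper column `is_fraud` are hypothetical.

```python
import pandas as pd

# Hypothetical toy rows with the same columns as financial_data
# before it is filtered down to fraudulent transactions only.
transactions = pd.DataFrame({
    "amount": [5758.59, 1901.56, 1248.86, 7619.05, 1890.10, 30.15],
    "Fraudulent": ["no", "yes", "yes", "yes", "yes", "no"],
    "customer_age": [43, 61, 57, 59, 36, 42],
})

# Assumed, deliberately coarse age brackets for illustration.
age_bins = [17, 44, 120]
age_labels = ["18-44", "45+"]
transactions["age_group"] = pd.cut(
    transactions["customer_age"], bins=age_bins, labels=age_labels
)

# Boolean flag so that the group mean equals the fraction of fraudulent rows.
transactions["is_fraud"] = transactions["Fraudulent"] == "yes"

# Share of fraudulent transactions within each age bracket.
fraud_rate = transactions.groupby("age_group", observed=False)["is_fraud"].mean()
print(fraud_rate)
```

The same grouping could be applied to amount brackets. With only six toy rows the rates themselves mean nothing; the point is the shape of the calculation.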

### References

James, B. D., Boyle, P. A., & Bennett, D. A. (2014). Correlates of susceptibility to scams in older adults without dementia. Journal of Elder Abuse & Neglect, 26(2), 107–122. https://doi.org/10.1080/08946566.2013.821809

Kaggle dataset: https://www.kaggle.com/datasets/isabbaggin/transaction-fraudulent-financial-syntheticdata