Welcome to Are You a Fraud! This project uses machine learning models to predict whether a credit card transaction is fraudulent or authentic. Below, you'll find an overview of the analysis, the dataset, the models used, and the key results. We also examined feature importance for our K-Nearest Neighbors and Gradient Boosting models to better understand which variables weigh most heavily when determining fraud.
The primary objective of this analysis is to read in credit card transactions and train machine learning models to predict credit card fraud more accurately. We aim to predict when a charge is fraudulent based on the financial information in the provided dataset.
The dataset contains eight variables: distance from home, distance from last transaction, ratio to median purchase price, repeat retailer, used chip, used PIN number, online order, and fraud. These variables can provide valuable insight into the validity of the credit card transactions in the provided dataset.
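As a rough illustration of that schema, the sketch below builds a small synthetic table with the same eight variables and inspects it. The snake_case column names, the `make_synthetic_transactions` helper, and the rule used to assign the `fraud` label are all assumptions made for this example; they are not taken from the project's data or code.

```python
import numpy as np
import pandas as pd


def make_synthetic_transactions(n=2000, seed=0):
    """Build a toy table shaped like the eight variables described above.

    Hypothetical helper: the values and the labeling rule are invented
    for illustration and do not reflect the real dataset.
    """
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "distance_from_home": rng.lognormal(2.0, 1.0, n),
        "distance_from_last_transaction": rng.lognormal(0.5, 1.0, n),
        "ratio_to_median_purchase_price": rng.lognormal(0.0, 0.8, n),
        "repeat_retailer": rng.integers(0, 2, n),
        "used_chip": rng.integers(0, 2, n),
        "used_pin_number": rng.integers(0, 2, n),
        "online_order": rng.integers(0, 2, n),
    })
    # Toy rule (assumption): expensive online purchases without a PIN
    # are more likely to be labeled as fraud.
    score = (
        1.5 * np.log(df["ratio_to_median_purchase_price"])
        + 0.8 * df["online_order"]
        - 1.0 * df["used_pin_number"]
        + rng.normal(0.0, 0.5, n)
    )
    df["fraud"] = (score > 1.5).astype(int)
    return df


df = make_synthetic_transactions()

print(df.dtypes)                                  # seven features plus the target
print(df.isna().sum())                            # missing values per column
print(df["fraud"].value_counts(normalize=True))   # class balance
```

Checking the class balance early is worthwhile because fraud labels are usually a minority class, which affects how accuracy should be read later on.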
The machine learning process encompassed data preprocessing, feature engineering, model selection, and evaluation. Each stage was carefully executed to ensure optimal model performance. We used the XGBoost model as a check that the Gradient Boosting model did not overfit. We visualized the feature importances of the K-Nearest Neighbors model as well as the Gradient Boosting model to see which variables had the greatest impact on identifying fraudulent charges.
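A minimal sketch of the preprocessing and K-Nearest Neighbors importance steps might look like the following. K-Nearest Neighbors has no built-in importance attribute, so this example swaps in scikit-learn's permutation importance as one possible way to get a ranking; the project may have used a different method. The synthetic data, column names, split ratio, and `n_neighbors=5` are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption, not the project's dataset).
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "distance_from_home": rng.lognormal(2.0, 1.0, n),
    "distance_from_last_transaction": rng.lognormal(0.5, 1.0, n),
    "ratio_to_median_purchase_price": rng.lognormal(0.0, 0.8, n),
    "repeat_retailer": rng.integers(0, 2, n),
    "used_chip": rng.integers(0, 2, n),
    "used_pin_number": rng.integers(0, 2, n),
    "online_order": rng.integers(0, 2, n),
})
score = (1.5 * np.log(df["ratio_to_median_purchase_price"])
         + 0.8 * df["online_order"] - 1.0 * df["used_pin_number"]
         + rng.normal(0.0, 0.5, n))
df["fraud"] = (score > 1.5).astype(int)  # toy labeling rule

X = df.drop(columns="fraud")
y = df["fraud"]

# Stratify so the train and test sets keep the same fraud rate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# KNN is distance based, so features are standardized first.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn.fit(X_train, y_train)

# Permutation importance: the drop in test score when one column is shuffled.
result = permutation_importance(
    knn, X_test, y_test, n_repeats=5, random_state=42
)
knn_importance = pd.Series(
    result.importances_mean, index=X.columns
).sort_values(ascending=False)
print(knn_importance)
```

Putting the scaler inside the pipeline means it is fit on the training split only, which avoids leaking test-set statistics into the model.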
We employed various machine learning models, including:
- K-Nearest Neighbors
- Gradient Boosting
- XGBoost
We achieved the following classification report results for each model:
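Reports of this kind can be generated with scikit-learn's `classification_report`. The sketch below is not the project's training code: it runs on synthetic stand-in data with mostly default hyperparameters, so its numbers say nothing about the results of the actual analysis. XGBoost is treated as optional and skipped if the `xgboost` package is not installed.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption, not the project's dataset).
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "distance_from_home": rng.lognormal(2.0, 1.0, n),
    "distance_from_last_transaction": rng.lognormal(0.5, 1.0, n),
    "ratio_to_median_purchase_price": rng.lognormal(0.0, 0.8, n),
    "repeat_retailer": rng.integers(0, 2, n),
    "used_chip": rng.integers(0, 2, n),
    "used_pin_number": rng.integers(0, 2, n),
    "online_order": rng.integers(0, 2, n),
})
score = (1.5 * np.log(df["ratio_to_median_purchase_price"])
         + 0.8 * df["online_order"] - 1.0 * df["used_pin_number"]
         + rng.normal(0.0, 0.5, n))
df["fraud"] = (score > 1.5).astype(int)  # toy labeling rule

X = df.drop(columns="fraud")
y = df["fraud"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

models = {
    "K-Nearest Neighbors": make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=5)
    ),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
}

# XGBoost is a separate package; include it only when it is available.
try:
    from xgboost import XGBClassifier
    models["XGBoost"] = XGBClassifier(n_estimators=100, random_state=42)
except ImportError:
    pass

reports = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    reports[name] = classification_report(
        y_test, model.predict(X_test), target_names=["authentic", "fraud"]
    )
    print(name)
    print(reports[name])
```

For imbalanced fraud data, the precision and recall on the `fraud` row are generally more informative than overall accuracy.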
We created a correlation matrix heatmap to visually explore the relationships between selected features and the target variable.
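One way to build such a heatmap is to compute the Pearson correlation matrix with pandas and draw it with matplotlib, as sketched below. The data is synthetic, and the column names and the output file name `correlation_heatmap.png` are assumptions for this example, so the correlations it shows are a product of the toy labeling rule rather than findings.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so no display is required
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in data (assumption, not the project's dataset).
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "distance_from_home": rng.lognormal(2.0, 1.0, n),
    "distance_from_last_transaction": rng.lognormal(0.5, 1.0, n),
    "ratio_to_median_purchase_price": rng.lognormal(0.0, 0.8, n),
    "repeat_retailer": rng.integers(0, 2, n),
    "used_chip": rng.integers(0, 2, n),
    "used_pin_number": rng.integers(0, 2, n),
    "online_order": rng.integers(0, 2, n),
})
score = (1.5 * np.log(df["ratio_to_median_purchase_price"])
         + 0.8 * df["online_order"] - 1.0 * df["used_pin_number"]
         + rng.normal(0.0, 0.5, n))
df["fraud"] = (score > 1.5).astype(int)  # toy labeling rule

# Pairwise Pearson correlations between all features and the target.
corr = df.corr()

fig, ax = plt.subplots(figsize=(7, 6))
im = ax.imshow(corr.values, cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=90)
ax.set_yticks(range(len(corr.index)))
ax.set_yticklabels(corr.index)
fig.colorbar(im, ax=ax, label="Pearson correlation")
fig.tight_layout()
fig.savefig("correlation_heatmap.png")  # hypothetical output file name

# Correlation of each feature with the target, strongest first.
print(corr["fraud"].drop("fraud").sort_values(ascending=False))
```

Pearson correlation only captures linear relationships, so a feature with a weak correlation can still matter to a tree-based model.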
From our Gradient Boosting model we were able to visualize which variables were most important when deciding whether a transaction was fraudulent.
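With scikit-learn, a fitted `GradientBoostingClassifier` exposes impurity-based importances through its `feature_importances_` attribute. The sketch below shows how a ranking like this could be extracted; it assumes scikit-learn's implementation and uses synthetic stand-in data, so the ordering it prints reflects the toy labeling rule and not the project's results.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption, not the project's dataset).
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "distance_from_home": rng.lognormal(2.0, 1.0, n),
    "distance_from_last_transaction": rng.lognormal(0.5, 1.0, n),
    "ratio_to_median_purchase_price": rng.lognormal(0.0, 0.8, n),
    "repeat_retailer": rng.integers(0, 2, n),
    "used_chip": rng.integers(0, 2, n),
    "used_pin_number": rng.integers(0, 2, n),
    "online_order": rng.integers(0, 2, n),
})
score = (1.5 * np.log(df["ratio_to_median_purchase_price"])
         + 0.8 * df["online_order"] - 1.0 * df["used_pin_number"]
         + rng.normal(0.0, 0.5, n))
df["fraud"] = (score > 1.5).astype(int)  # toy labeling rule

X = df.drop(columns="fraud")
y = df["fraud"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

gb = GradientBoostingClassifier(random_state=42)
gb.fit(X_train, y_train)

# Impurity-based importances, normalized by scikit-learn to sum to 1.
gb_importance = pd.Series(
    gb.feature_importances_, index=X.columns
).sort_values(ascending=False)
print(gb_importance)
```

The resulting series can be passed to `gb_importance.plot.barh()` to produce a bar chart when matplotlib is installed.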
In conclusion, the analysis provided valuable insights into predicting credit card fraud with machine learning models. Although our results may not have been exactly what we hoped for, we learned of valuable resources and practices that guided us along the way, including feature importance. Overall, Gradient Boosting was the most accurate at predicting fraudulent and non-fraudulent charges. We still made sure to use XGBoost to check that the Gradient Boosting model did not overfit. As we noticed in our heatmap, the ratio to median purchase price had the highest correlation with whether a charge was fraudulent.







