
dinbav/Predictive-Modeling-TA--Fraud-case-study

Predictive Modeling TA- Fraud IEEE case study

Predictive modeling on a fraud dataset. These notebooks were created as part of my role as a predictive modeling teaching assistant for master's students. They complement the course theory on the data science pipeline.

Putting data science theory into practice on unbalanced fraud data in Colab Python notebooks. (The full notebooks that I shared with my students can be found here.)

  • Performing exploratory data analysis (EDA) using NumPy, pandas, Matplotlib, seaborn, SciPy, and Plotly in Python.

  • Exploring the pros and cons of different methods to handle missing data, outliers, and transformations.

    • Handling missing data: dropping missing data, filling with ‘NaN’ and ‘0’, forward- and back-filling, filling with the mode and mean, filling nulls by distribution, and handling nulls with interpolation.
    • Transforming the data according to whether the distribution’s skew is positive or negative.
    • Removing outliers according to quantiles and kurtosis.
  • Feature selection using correlation and mutual information.

  • Handling categorical features using pandas get_dummies.

  • Handling unbalanced data by using SMOTE nested within K-fold cross-validation. The selection of positive and negative target samples for the cross-validation is balanced by divided sampling.

  • Applying a Logistic Regression machine learning model (intentionally, for the purpose of exploring the consequences of data handling; a Decision Tree is a better-fitting model for this type of data).

  • Evaluating accuracy, the confusion matrix (precision and recall), AUC (area under the ROC curve), and F1-score.
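
The last three steps above (oversampling nested within cross-validation, Logistic Regression, and evaluation) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the notebooks: the data is synthetic (`make_classification`), `StratifiedKFold` stands in for the balanced fold selection, and `smote_like_oversample` is a hypothetical, simplified SMOTE-style helper written by hand so the sketch depends only on NumPy and scikit-learn. In practice, a library implementation such as `SMOTE` from imbalanced-learn would be the more usual choice. The point it tries to show is that oversampling is applied to the training fold only, so the held-out fold keeps its original class balance.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import NearestNeighbors


def smote_like_oversample(X, y, k=5, seed=0):
    """Simplified SMOTE-style oversampling of the positive class (label 1).

    Hypothetical helper: each synthetic row is a random point on the segment
    between a minority row and one of its k nearest minority neighbours.
    """
    rng = np.random.default_rng(seed)
    X_min = X[y == 1]
    n_new = int((y == 0).sum() - (y == 1).sum())
    if n_new <= 0 or len(X_min) < 2:
        return X, y
    k = min(k, len(X_min) - 1)
    # Ask for k + 1 neighbours because each row is its own nearest neighbour.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    neighbours = nn.kneighbors(X_min, return_distance=False)[:, 1:]
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    synthetic = X_min[base] + gap * (X_min[picked] - X_min[base])
    X_out = np.vstack([X, synthetic])
    y_out = np.concatenate([y, np.ones(n_new, dtype=y.dtype)])
    return X_out, y_out


def cross_validate_with_smote(X, y, n_splits=5, seed=0):
    """Oversample inside each training fold, then score on the untouched fold."""
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = []
    for train_idx, test_idx in folds.split(X, y):
        # Oversampling sees the training rows only, never the held-out rows.
        X_train, y_train = smote_like_oversample(X[train_idx], y[train_idx], seed=seed)
        model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
        proba = model.predict_proba(X[test_idx])[:, 1]
        pred = (proba >= 0.5).astype(int)
        results.append({
            "auc": roc_auc_score(y[test_idx], proba),
            "f1": f1_score(y[test_idx], pred),
            "confusion": confusion_matrix(y[test_idx], pred, labels=[0, 1]),
            "train_pos_rate": float(y_train.mean()),
        })
    return results


# Synthetic stand-in for the fraud data: roughly 5% positive labels.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0
)
results = cross_validate_with_smote(X, y)
for fold, r in enumerate(results):
    print(f"fold {fold}: AUC={r['auc']:.3f}  F1={r['f1']:.3f}")
```

Running SMOTE before splitting would let synthetic rows derived from held-out samples leak into training, which tends to inflate the reported scores; nesting it inside the fold loop avoids that.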

About

Full implementation of data science pipeline on unbalanced data.
