
Worked on a classification analysis (including a class imbalance problem) for a business problem. The analysis was done using Anaconda Python.


JalalQ/4045-Classification-Anaconda


Introduction

In this predictive analysis problem, I worked on the classification of electronic components to determine which components an electronic components supplier should procure. Because of the non-disclosure agreement (NDA) I signed with the company, I am not authorized to share the original dataset.

As part of this project, I held meetings with the company's business managers to better understand the business requirements and the value they expected me to extract from the data shared with me. Based on these meetings, I proposed applying classification analysis to the data. Some of the datasets had a class imbalance problem.

  • For the dataset with balanced classes, I first audited the features and evaluated their impact on the classes. Based on this analysis, features with missing values and features that did not help predict the classes were removed.
  • Various classifiers were then trained, and a majority-voting ensemble classifier was chosen for the final classification. It achieved an accuracy of 80% on the validation dataset.
  • For the dataset with the class imbalance problem, oversampling was used. Although the resulting model's accuracy was only 67%, it achieved a high recall of 86%.
  • An executive summary of the key findings of the analysis is provided in "Final Presentation.pdf" file.
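The README does not say how feature impact was measured, so the sketch below is a hypothetical stand-in rather than the author's method. It drops any feature with a missing value, then drops features whose mutual information with the class label falls below a threshold (`min_mi`, an arbitrary assumed value). The helper names, feature names, and rows are all made up for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def audit_features(rows, label_key, min_mi=0.01):
    """Split feature names into (kept, dropped).

    rows: list of dicts, one per component; None marks a missing value.
    A feature is dropped if any value is missing or if its mutual
    information with the label is below min_mi (hypothetical threshold).
    """
    labels = [row[label_key] for row in rows]
    kept, dropped = [], []
    for feature in (k for k in rows[0] if k != label_key):
        values = [row.get(feature) for row in rows]
        # Missing values are checked first, so MI is never computed on None.
        if any(v is None for v in values) or mutual_information(values, labels) < min_mi:
            dropped.append(feature)
        else:
            kept.append(feature)
    return kept, dropped


# Toy rows with hypothetical feature names; "procure" is the class label.
rows = [
    {"package": "SMD", "color": "blue", "note": None, "procure": 1},
    {"package": "SMD", "color": "blue", "note": "a", "procure": 1},
    {"package": "THT", "color": "blue", "note": "b", "procure": 0},
    {"package": "THT", "color": "blue", "note": "c", "procure": 0},
]
kept, dropped = audit_features(rows, "procure")
print(kept, dropped)  # ['package'] ['color', 'note']
```

Here `color` is constant, so it carries no information about the label, and `note` has a missing value. Other impact measures (model-based feature importances, chi-squared tests) would slot into the same place.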
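Majority (hard) voting combines several trained classifiers by giving each sample the label that most of them predict. The minimal sketch below assumes each base classifier has already produced one label per validation sample; the three prediction lists are made up, and breaking ties by the smallest label is an assumption of this sketch. If scikit-learn was used (the README does not say), `sklearn.ensemble.VotingClassifier` with `voting="hard"` applies the same rule to fitted estimators.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Hard (majority) voting over per-model label predictions.

    predictions_per_model: one equal-length list of labels per model.
    Ties are broken by picking the smallest label.
    """
    combined = []
    for votes in zip(*predictions_per_model):  # all models' votes for one sample
        counts = Counter(votes)
        top = max(counts.values())
        combined.append(min(label for label, c in counts.items() if c == top))
    return combined


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up validation labels and predictions from three hypothetical models.
y_val = [1, 0, 1, 1, 0]
model_a = [1, 0, 0, 1, 0]
model_b = [1, 1, 1, 0, 0]
model_c = [0, 0, 1, 1, 1]

ensemble = majority_vote([model_a, model_b, model_c])
for name, pred in [("a", model_a), ("b", model_b), ("c", model_c), ("vote", ensemble)]:
    print(name, accuracy(y_val, pred))  # a 0.8, b 0.6, c 0.6, vote 1.0
```

On this toy data the vote beats every individual model because each model's mistakes fall on different samples. That is not guaranteed in general; voting helps most when the base classifiers make uncorrelated errors.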
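The README does not name the oversampling technique. The sketch below assumes plain random oversampling (duplicating rows of the smaller class, with replacement, until the classes are balanced); synthetic methods such as SMOTE are an alternative, and the `imbalanced-learn` package provides `RandomOverSampler` for the random variant. It also shows why accuracy and recall can diverge: recall only measures how many actual positives the model catches. Treating label `1` as the positive class (for example, "should be procured") and all data values are assumptions for illustration.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Randomly duplicate rows of smaller classes (with replacement)
    until every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        members = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(members)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


def accuracy(y_true, y_pred):
    """Fraction of all samples classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that were predicted positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp / (tp + fn)


# Toy imbalanced training set: five class-0 rows, one class-1 row.
X_train = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.9]]
y_train = [0, 0, 0, 0, 0, 1]
X_bal, y_bal = random_oversample(X_train, y_train)
print(Counter(y_bal))  # Counter({0: 5, 1: 5})

# A model that catches every true positive but raises two false alarms.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 0]
print(accuracy(y_true, y_pred), recall(y_true, y_pred))  # 0.75 1.0
```

Oversampling should be applied only to the training split; duplicating rows before the train/validation split lets copies of the same row appear on both sides and inflates validation scores.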
