SECOM_class_imbalance

Approaches for the class imbalance problem (in semiconductor manufacturing process line data)

Data Description

The SECOM dataset in the UCI Machine Learning Repository is semiconductor manufacturing data with 1567 records, 590 anonymized features, and 104 fails. The process yield is a simple pass/fail response (encoded -1/1).

The dataset has the following characteristics:

two-class problem
a class imbalance with a 14:1 ratio of passes to fails
large number of features -- 590
missing data
features/columns which do not have sufficient information
4% of the columns/features have more than 50% of their records missing
some columns have constant values
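
The two column-level issues listed above (mostly-missing columns and constant columns) can be screened for with a few lines of pandas. The following is a minimal sketch, not the preprocessing used in the notebooks: the helper name `drop_uninformative_columns`, the `max_missing_frac` parameter, and the toy frame are all hypothetical, and the 50% threshold is simply borrowed from the characteristic quoted above.

```python
import numpy as np
import pandas as pd


def drop_uninformative_columns(df, max_missing_frac=0.5):
    """Drop columns that are mostly missing or that hold a single value.

    Hypothetical helper for illustration; the threshold is an assumption.
    """
    missing_frac = df.isnull().mean()            # fraction of NaN per column
    mostly_missing = missing_frac > max_missing_frac
    constant = df.nunique(dropna=True) <= 1      # at most one distinct non-NaN value
    keep = ~(mostly_missing | constant)
    return df.loc[:, keep]


# Tiny synthetic stand-in for the sensor table (not the real SECOM data).
toy = pd.DataFrame({
    "f0": [1.0, 2.0, 3.0, 4.0],            # informative, kept
    "f1": [np.nan, np.nan, np.nan, 0.5],   # 75% missing, dropped
    "f2": [7.0, 7.0, 7.0, 7.0],            # constant, dropped
    "f3": [0.1, np.nan, 0.3, 0.2],         # 25% missing, kept
})
cleaned = drop_uninformative_columns(toy)
print(list(cleaned.columns))
```

Columns that survive this screen may still contain missing values, so an imputation step would normally follow before fitting a model.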

Objective

The SECOM dataset presents us with two problems: (i) working with skewed data and (ii) feature selection. The main focus of this analysis is the class imbalance issue and the ability to predict fails successfully. Strategies used in fraud detection, anomaly detection, and rare disease diagnosis will be useful here. A secondary objective is feature reduction. (In some of the literature pertaining to the SECOM dataset, this was the primary goal [1].) A streamlined feature set can not only lead to better prediction accuracy and data understanding but also save manufacturing resources.
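
To see why plain accuracy is a poor yardstick for predicting fails on skewed data, consider a trivial model that always predicts "pass". The sketch below uses only the record and fail counts quoted in the data description; the label array is reconstructed from those counts for illustration and is not loaded from the dataset.

```python
import numpy as np

n_records, n_fails = 1567, 104            # counts quoted in the data description
n_passes = n_records - n_fails

# Labels follow the -1 (pass) / 1 (fail) encoding; order is irrelevant here.
y_true = np.array([1] * n_fails + [-1] * n_passes)
y_pred = np.full(n_records, -1)           # baseline: always predict "pass"

accuracy = np.mean(y_pred == y_true)
fail_recall = np.sum((y_pred == 1) & (y_true == 1)) / float(n_fails)
pass_to_fail = n_passes / float(n_fails)

print(round(accuracy, 3), fail_recall, round(pass_to_fail, 1))
```

This baseline is right about 93% of the time yet detects none of the fails, so metrics that focus on the minority class (for example recall on fails) are likely to be more informative than accuracy.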

Software

Python 2.7
scikit-learn packages for algorithms
pandas for data wrangling
Matplotlib and Seaborn for plotting and visualization

Methods

We will look at some of the approaches that deal with class imbalance. These are either cost-sensitive learning approaches or sampling-based approaches. We will also work with feature selection methods. This is a list of methods we use:

secomdata_gbm.ipynb
secomdata_ocsvm.ipynb
secomdata_rf.ipynb
secomdata_svm_smote.ipynb
secomdata_svm_undersampling.ipynb
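
The difference between the cost-sensitive and the sampling-based families can be sketched with scikit-learn on synthetic data. This is an illustrative example under stated assumptions rather than the code in the notebooks: the data come from `make_classification` (with the minority class labelled 1 and the majority 0, unlike the -1/1 SECOM encoding), the SVM settings are defaults, and the undersampling is a plain random 1:1 draw.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic imbalanced data (roughly 14:1), standing in for the real dataset.
X, y = make_classification(n_samples=1500, n_features=20, n_informative=5,
                           weights=[0.933, 0.067], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, stratify=y, test_size=0.3, random_state=0)

# (a) Cost-sensitive: class_weight="balanced" penalises errors on the
#     rare class more heavily, in inverse proportion to class frequency.
weighted = SVC(kernel="rbf", class_weight="balanced").fit(X_tr, y_tr)

# (b) Sampling-based: randomly undersample the majority class to 1:1.
rng = np.random.RandomState(0)
minority = np.flatnonzero(y_tr == 1)
majority = np.flatnonzero(y_tr == 0)
kept = rng.choice(majority, size=len(minority), replace=False)
idx = np.concatenate([minority, kept])
undersampled = SVC(kernel="rbf").fit(X_tr[idx], y_tr[idx])

# Compare recall on the minority class for the two strategies.
recalls = {}
for name, model in [("class_weight", weighted), ("undersampling", undersampled)]:
    recalls[name] = recall_score(y_te, model.predict(X_te), pos_label=1)
    print(name, recalls[name])
```

Resampling is applied to the training split only, so the test split keeps the original class ratio and the reported recall reflects the skew the model would face in practice.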
Meena-Mani/SECOM_class_imbalance