Exploratory analysis on imbalanced Medical Datasets

Abstract

In this project I evaluate and compare several semi-supervised learning methods on three datasets. For each algorithm I record the accuracy, F1-score, running time and ROC curve at several percentages of unlabelled data. Since the datasets are fully labelled, I use the performance on the fully labelled data as a baseline for the semi-supervised algorithms. I then remove the labels from a given percentage of the data and evaluate the algorithms again. I do this for 10%, 20%, 50%, 90% and 95% unlabelled data and record the performance of each algorithm at every level.
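The evaluation protocol above can be sketched in plain Python as follows. This is a minimal illustration, not the code in Project.ipynb: it assumes labels are Python lists and marks unlabelled samples with -1 (the convention scikit-learn's semi_supervised module uses). The helper names mask_labels, accuracy and f1_binary are hypothetical. The masked copy is what a semi-supervised algorithm would be trained on, while accuracy and F1 would be computed against the true labels.

```python
import random

FRACTIONS = (0.10, 0.20, 0.50, 0.90, 0.95)  # unlabelled percentages from the abstract


def mask_labels(y, fraction, seed=0):
    """Return a copy of y with a random `fraction` of labels replaced by -1 (unlabelled)."""
    rng = random.Random(seed)  # fixed seed so every run hides the same samples
    n_hidden = round(fraction * len(y))
    hidden = set(rng.sample(range(len(y)), n_hidden))
    return [-1 if i in hidden else label for i, label in enumerate(y)]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_binary(y_true, y_pred, positive=1):
    """F1 = 2TP / (2TP + FP + FN) for the positive class (0.0 if there are no true positives)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


y = [0, 1] * 50  # 100 toy labels
for fraction in FRACTIONS:
    y_masked = mask_labels(y, fraction)
    print(fraction, y_masked.count(-1))  # number of labels hidden at this level
```

Fixing the seed keeps the hidden subset identical across algorithms, so each one is compared on the same partially labelled data.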

Datasets

Three datasets were used in this experiment:

1. Online Shoppers Purchasing Intention Dataset Data Set

Source: Sakar, C.O., Polat, S.O., Katircioglu, M. et al., Neural Computing and Applications (2018).
Dataset Location -> https://archive.ics.uci.edu/ml/datasets/Online+Shoppers+Purchasing+Intention+Dataset

2. Customer Personality Analysis

Dataset Location -> https://www.kaggle.com/imakash3011/customer-personality-analysis/version/1

3. Heart Disease UCI

Dataset Location -> https://www.kaggle.com/ronitf/heart-disease-uci

Semi-supervised Algorithms Used

Self-Training
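Self-training repeatedly fits a base classifier on the labelled samples, assigns pseudo-labels to the unlabelled samples it predicts with high confidence, and refits on the enlarged labelled set. The sketch below is a minimal pure-Python version of that loop. The nearest-centroid base learner, the softmax-over-distances confidence, the 0.8 threshold and all function names are assumptions for illustration and are not necessarily what SelfTraining.py does.

```python
import math


def fit_centroids(X, y):
    """Nearest-centroid base learner (illustrative assumption): mean vector per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict_proba(centroids, x):
    """Softmax over negative Euclidean distances to each class centroid."""
    scores = {label: -math.dist(x, c) for label, c in centroids.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exp_scores = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exp_scores.values())
    return {label: e / total for label, e in exp_scores.items()}


def self_train(X, y, threshold=0.8, max_iter=10):
    """Return (model, labels) where confident unlabelled samples (-1) receive pseudo-labels."""
    y = list(y)  # work on a copy so the caller's labels are untouched
    for _ in range(max_iter):
        labelled = [i for i, label in enumerate(y) if label != -1]
        unlabelled = [i for i, label in enumerate(y) if label == -1]
        if not unlabelled:
            break
        model = fit_centroids([X[i] for i in labelled], [y[i] for i in labelled])
        newly_labelled = 0
        for i in unlabelled:
            proba = predict_proba(model, X[i])
            label, confidence = max(proba.items(), key=lambda kv: kv[1])
            if confidence >= threshold:  # only trust confident predictions
                y[i] = label
                newly_labelled += 1
        if newly_labelled == 0:  # nothing confident enough: stop early
            break
    labelled = [i for i, label in enumerate(y) if label != -1]
    final_model = fit_centroids([X[i] for i in labelled], [y[i] for i in labelled])
    return final_model, y


X = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0], [0.5, 0.5], [9.5, 10.0]]
y = [0, 0, 1, 1, -1, -1]
model, y_filled = self_train(X, y)
print(y_filled)  # [0, 0, 1, 1, 0, 1]
```

Any base learner that outputs class probabilities could replace the nearest-centroid model; the threshold trades the number of pseudo-labels against their quality.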

Semi-Boost

SemiBoost: Boosting for Semi-supervised Learning: Pavan Kumar Mallapragada, Rong Jin, Anil K. Jain and Yi Liu [Reference -> https://github.com/papabloblo/semi_boost/tree/master/src]
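In each boosting round, SemiBoost computes two confidences p_i and q_i for every unlabelled sample, measuring how strongly the similarity structure and the current ensemble score H push it towards the positive or negative class. The sample is then pseudo-labelled with sign(p_i - q_i) and sampled with weight |p_i - q_i|. The sketch below covers only these per-round quantities and the weak-learner weight alpha, written from my reading of the Mallapragada et al. paper. The RBF similarity, the sigma and C defaults and all function names are assumptions, and this is not the code in SemiBoost.py or the linked reference implementation.

```python
import math


def rbf_similarity(a, b, sigma=1.0):
    """Gaussian (RBF) similarity; the kernel choice and sigma are assumptions."""
    return math.exp(-math.dist(a, b) ** 2 / (2 * sigma ** 2))


def semiboost_confidences(X_l, y_l, X_u, H_u, C=1.0, sigma=1.0):
    """Return lists (p, q) of confidences for each unlabelled sample.

    y_l holds labels in {+1, -1}; H_u holds the current ensemble score of each
    unlabelled sample (all zeros before the first boosting round).
    """
    p, q = [], []
    for i, x_i in enumerate(X_u):
        p_i = q_i = 0.0
        # Labelled neighbours pull x_i towards their own class.
        for x_j, y_j in zip(X_l, y_l):
            s = rbf_similarity(x_i, x_j, sigma)
            if y_j == 1:
                p_i += s * math.exp(-2 * H_u[i])
            else:
                q_i += s * math.exp(2 * H_u[i])
        # Unlabelled neighbours push x_i towards agreement with their scores.
        # The j == i term adds the same amount to p_i and q_i, so it is skipped.
        for j, x_j in enumerate(X_u):
            if j == i:
                continue
            s = rbf_similarity(x_i, x_j, sigma)
            p_i += C / 2 * s * math.exp(H_u[j] - H_u[i])
            q_i += C / 2 * s * math.exp(H_u[i] - H_u[j])
        p.append(p_i)
        q.append(q_i)
    return p, q


def pseudo_labels(p, q):
    """Pseudo-label sign(p - q) and sampling weight |p - q| for each unlabelled sample."""
    labels = [1 if p_i >= q_i else -1 for p_i, q_i in zip(p, q)]
    weights = [abs(p_i - q_i) for p_i, q_i in zip(p, q)]
    return labels, weights


def step_size(p, q, h):
    """Weight alpha of a weak learner whose predictions (+1/-1) on the unlabelled samples are h."""
    agree = sum(p_i for p_i, h_i in zip(p, h) if h_i == 1)
    agree += sum(q_i for q_i, h_i in zip(q, h) if h_i == -1)
    disagree = sum(p_i for p_i, h_i in zip(p, h) if h_i == -1)
    disagree += sum(q_i for q_i, h_i in zip(q, h) if h_i == 1)
    return 0.25 * math.log(agree / disagree)  # assumes disagree > 0


X_l = [[0.0], [4.0]]  # two labelled points
y_l = [1, -1]
X_u = [[0.5], [3.5]]  # two unlabelled points
p, q = semiboost_confidences(X_l, y_l, X_u, H_u=[0.0, 0.0])
labels, weights = pseudo_labels(p, q)
print(labels)  # [1, -1]
print(step_size(p, q, labels))  # > 0: a learner matching the pseudo-labels gets positive weight
```

A full round would then sample unlabelled points according to the weights, train a weak learner on them together with the labelled data, compute alpha for its predictions h, and update H_u[i] += alpha * h[i] before the next round.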

Majority Voting Classifier

Ensemble Learning with Voting Aggregation for Semi-supervised Classification Tasks: Matheus Alves; Ana L. C. Bazzan; Mariana Recamonde-Mendoza - Unsupervised Preprocessing Using Autoencoders
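The voting idea can be sketched as follows: several base classifiers each predict a label for every unlabelled sample, and a sample receives the most common label only when enough classifiers agree; otherwise it stays unlabelled. This is a minimal pure-Python sketch. The function name majority_vote and the strict-majority threshold are assumptions, not necessarily how Voting.py or Alves et al. aggregate the votes.

```python
from collections import Counter


def majority_vote(predictions, threshold=0.5):
    """Combine per-classifier predictions into one label per sample.

    predictions: one list of predicted labels per base classifier, all the same length.
    A sample gets the most common label only if its vote share is strictly above
    `threshold`; otherwise it stays unlabelled (-1).
    """
    n_models = len(predictions)
    combined = []
    for votes in zip(*predictions):  # votes = every classifier's label for one sample
        label, count = Counter(votes).most_common(1)[0]
        combined.append(label if count / n_models > threshold else -1)
    return combined


preds = [
    [0, 1, 1],  # classifier A
    [0, 1, 0],  # classifier B
    [1, 1, 0],  # classifier C
]
print(majority_vote(preds))                 # [0, 1, 0]
print(majority_vote(preds, threshold=0.9))  # [-1, 1, -1]
```

Raising the threshold keeps only samples the ensemble is nearly unanimous about, which is one way to limit noisy pseudo-labels.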

Requirements

The requirements for this project are listed in the requirements.txt file. To install them, run the command below:

pip install -r requirements.txt

Repository: Nihal987/Exploratory-analysis-on-imbalanced-Medical-Datasets