MarcoXM/Readmittedrate

 
 


Readmitted-Rate

Introduction

This project predicts patient readmission, which is a supervised classification problem.

It combines several models (KNN, SVM, Decision Tree, Perceptron, and Naïve Bayes) and applies them to real-world data from the UCI Machine Learning Repository.

The workflow includes:

EDA
Data cleaning
Feature selection
Model tuning
Ensembling

The goal is to get the best accuracy when predicting readmission.
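As an illustration of the ensembling step, here is a minimal sketch that combines the label predictions of several models by majority vote. The README does not say how the ensemble is built, so majority voting is an assumption, and the model names and predictions below are made-up toy values.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample.

    predictions is a list of equal-length label lists, one per model.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Hypothetical outputs of three tuned models on four patients
# (1 = readmitted, 0 = not readmitted).
model_outputs = {
    "knn": [1, 0, 1, 1],
    "svm": [1, 0, 0, 1],
    "tree": [0, 0, 1, 1],
}
ensemble = majority_vote(list(model_outputs.values()))
```

Using an odd number of voters on a binary label avoids ties, which is why three models are shown here.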

Implementation:

Put the following files in the same directory:

Smote_BackwardWrapper_Pipeline.py
ForwardWrapper_Smote.py
Results for the test set.py
GridSearch Main Version Updated.py
Base_Scoring_Algorithm (Corr, Chi2, Mutual_Info, C4_5 Importance).py
New_train_set.csv
New_test_set.csv

Filter Method:

Base_Scoring_Algorithm (Corr, Chi2, Mutual_Info, C4_5 Importance).py:

Scores features for selection by computing the correlation, chi-square statistic, and mutual information between each variable and the target label. The file name also lists C4.5 importance.
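To make the scoring idea concrete, the following dependency-free sketch computes two of the scores named above (Pearson correlation and mutual information) and ranks features by them. It is not the code from the script: the function names, the feature names, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def pearson_corr(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def mutual_info(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def rank_features(columns, target, score_fn):
    """Return feature names sorted by descending absolute score."""
    scores = {name: abs(score_fn(col, target)) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data with hypothetical feature names (not taken from the dataset).
columns = {
    "num_visits": [1, 2, 3, 4, 5, 6],
    "noise": [1, 0, 1, 0, 1, 0],
}
readmitted = [0, 0, 0, 1, 1, 1]
ranking = rank_features(columns, readmitted, pearson_corr)
```

A filter method then keeps the top-ranked features and discards the rest before any model is trained.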

GridSearch Main Version Updated.py:

Uses the scoring results to select features with the filter method, then tunes each model on the selected features using grid search.
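Grid search simply tries every combination of candidate hyperparameters and keeps the best-scoring one. The sketch below shows this for a tiny hand-written KNN classifier. It assumes a single held-out validation split (the script may well use cross-validation instead), and the parameter grid and data points are made-up toy values.

```python
from collections import Counter
from itertools import product


def knn_predict(train_X, train_y, x, k, p):
    """Predict the label of x by majority vote among its k nearest neighbours.

    p is the Minkowski exponent (1 = Manhattan, 2 = Euclidean). The root is
    skipped because it does not change the ordering of the distances.
    """
    dists = sorted(
        (sum(abs(a - b) ** p for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, val_X, val_y, k, p):
    """Fraction of validation samples that KNN labels correctly."""
    hits = sum(
        knn_predict(train_X, train_y, x, k, p) == y for x, y in zip(val_X, val_y)
    )
    return hits / len(val_y)


def grid_search(train_X, train_y, val_X, val_y, grid):
    """Try every combination in grid and return (best_params, best_score)."""
    best_params, best_score = None, -1.0
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = accuracy(train_X, train_y, val_X, val_y, **params)
        if score > best_score:  # keep the first combination on ties
            best_params, best_score = params, score
    return best_params, best_score


train_X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]]
train_y = [0, 0, 0, 1, 1, 1]
val_X = [[0.05, 0.1], [1.0, 0.95]]
val_y = [0, 1]
best_params, best_score = grid_search(
    train_X, train_y, val_X, val_y, {"k": [1, 3, 5], "p": [1, 2]}
)
```

The same loop works for any model: only the scoring function and the grid change.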

Results for the test set.py:

Computes several accuracy scores on the test set using the best tuned parameters.
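The README does not list which scores are reported. As an assumption, the sketch below computes four common ones (accuracy, precision, recall, and F1) from true and predicted labels; the label vectors are toy values, not results from this project.

```python
def classification_scores(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)

    # Guard against division by zero when a class is never predicted.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels: 1 = readmitted, 0 = not readmitted.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
scores = classification_scores(y_true, y_pred)
```

On imbalanced data such as readmission records, precision and recall are usually more informative than plain accuracy.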

Wrapper Method:

A forward wrapper is used to search for the best overall collection of features.

ForwardWrapper_Smote.py:

Takes the training set as input and returns the selected features as a CSV file.
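A forward wrapper can be sketched as a greedy loop: start with no features, add the one that improves the model score most, and stop when nothing helps. The code below is a minimal illustration under stated assumptions, not the script itself. The scoring function is a hypothetical stand-in for "train a model on this subset and return its validation accuracy", and the feature names are invented.

```python
def forward_select(features, score_fn):
    """Greedy forward selection.

    Starting from an empty set, repeatedly add the feature that raises
    score_fn(selected) the most, and stop when no addition improves it.
    """
    selected, best_score = [], float("-inf")
    remaining = list(features)
    while remaining:
        trials = [(score_fn(selected + [f]), f) for f in remaining]
        trial_score, trial_feature = max(trials)
        if trial_score <= best_score:
            break  # no candidate improves the current subset
        selected.append(trial_feature)
        remaining.remove(trial_feature)
        best_score = trial_score
    return selected, best_score


# Hypothetical per-feature gains used by the stand-in scorer below.
USEFUL = {"age": 0.10, "num_visits": 0.15, "num_medications": 0.05}


def toy_score(subset):
    """Stand-in for model accuracy: informative features add to the score,
    and every feature costs a small penalty."""
    gain = sum(USEFUL.get(f, 0.0) for f in subset)
    penalty = 0.01 * len(subset)
    return 0.5 + gain - penalty


selected, score = forward_select(
    ["age", "noise", "num_visits", "num_medications"], toy_score
)
```

Because the search is greedy, it evaluates far fewer subsets than an exhaustive search, at the cost of possibly missing a better combination.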

Smote_BackwardWrapper_Pipeline.py:

Uses a pipeline to combine SMOTE, a backward wrapper, and tuning.

Computes several accuracy scores on the test set using the best tuned parameters.

Saves the best parameters, test scores, and selected features as CSV.
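SMOTE balances the classes by creating synthetic minority samples, each interpolated between a real minority sample and one of its nearest minority neighbours. The sketch below shows only that oversampling step in plain Python; the backward wrapper and tuning stages are not shown. The function name, its parameters, and the data points are hypothetical, and the script presumably relies on a library implementation instead.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points from the minority class.

    Each new point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest neighbours by squared Euclidean distance, excluding base.
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: sum((a - b) ** 2 for a, b in zip(row, base)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


# Toy minority-class samples with two numeric features.
minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [1.1, 1.2]]
new_points = smote(minority, n_new=4)
```

Oversampling should be fitted on training data only, which is a likely reason for wrapping it in a pipeline: the resampling is then applied inside each training fold and never touches the test set.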
