LucDemortier/HeartDiseaseStudy

HeartDiseaseStudy

For this project I applied a logistic regression model to the Cleveland Heart Disease data set.

This was my "Project McNulty" in the Spring 2015 Metis Data Science Boot Camp.

See the blog post at lucdemortier.github.io for a description of the results.

The iPython notebook and other files used to generate the results and plots for the McNulty project:

  1. convert_ssv_to_csv.py: Converts a file with space-separated values into a file with comma-separated values.

  2. join_files.py: Joins files downloaded from the UC Irvine Machine Learning Repository into a single file for processing by the iPython notebook below.

  3. KNearestNeighbors.py: Short program to select features by maximizing the accuracy of a K-nearest neighbors classifier.

  4. HeartDiseaseProject.ipynb: iPython notebook that reads in the data, stores them in a Pandas dataframe for initial processing and plots, and analyzes them with a logistic regression model. Cells at the end of the notebook use naive Bayes, support vector machine, decision tree, and random forest classifiers to select features that maximize accuracy; these methods were not pursued further.
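To illustrate the idea behind item 3, here is a minimal standard-library sketch of feature selection by maximizing K-nearest neighbors accuracy. It is not the code in KNearestNeighbors.py: the exhaustive subset search, the leave-one-out accuracy estimate, the unscaled Euclidean distance, and all function names are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training rows closest to x (squared Euclidean distance)."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def loo_accuracy(X, y, cols, k):
    """Leave-one-out accuracy of k-NN using only the feature columns in cols."""
    proj = [[row[c] for c in cols] for row in X]
    hits = 0
    for i in range(len(proj)):
        # Hold out row i and classify it from all the other rows.
        train_X = proj[:i] + proj[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += knn_predict(train_X, train_y, proj[i], k) == y[i]
    return hits / len(proj)


def best_feature_subset(X, y, k=3, max_size=None):
    """Try every feature subset up to max_size; return (best_cols, best_accuracy)."""
    n_features = len(X[0])
    max_size = max_size or n_features
    best = ((), 0.0)
    for size in range(1, max_size + 1):
        for cols in combinations(range(n_features), size):
            acc = loo_accuracy(X, y, cols, k)
            if acc > best[1]:  # keep the first subset that reaches each new maximum
                best = (cols, acc)
    return best
```

Calling `best_feature_subset(X, y, k=3)` on a list of numeric rows `X` and a list of labels `y` returns the column indices of the highest-scoring subset together with its accuracy. An exhaustive search grows exponentially with the number of features, so for more than a handful of features a greedy search or a small `max_size` is more practical.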

The initial processing steps of this study are as follows:

curl -o data/cleveland14.csv https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/processed.cleveland.data

curl -o data/hungarian14r.ssv https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/reprocessed.hungarian.data

curl -o data/switzerland14.csv https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/processed.switzerland.data

curl -o data/long_beach_va14.csv https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/processed.va.data

python convert_ssv_to_csv.py hungarian14r

python join_files.py

join_files.py writes the file data/heart_disease_all14.csv, which is ready for processing by HeartDiseaseProject.ipynb.
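For readers who want to see what the conversion and joining steps amount to, the following is a rough standard-library sketch. It is not the code in convert_ssv_to_csv.py or join_files.py: the function names are hypothetical, and it assumes the input files have no header row and that joining means simple row-by-row concatenation.

```python
import csv


def ssv_to_csv(src_path, dst_path):
    """Rewrite a whitespace-separated text file as comma-separated values."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for line in src:
            fields = line.split()  # splits on any run of whitespace
            if fields:             # skip blank lines
                writer.writerow(fields)


def join_csv(src_paths, dst_path):
    """Concatenate header-less CSV files row by row; return the number of rows written."""
    rows_written = 0
    with open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for path in src_paths:
            with open(path, newline="") as src:
                for row in csv.reader(src):
                    if row:  # skip empty lines
                        writer.writerow(row)
                        rows_written += 1
    return rows_written
```

Under these assumptions, `ssv_to_csv("data/hungarian14r.ssv", "data/hungarian14r.csv")` followed by `join_csv` on the four per-site CSV files with `"data/heart_disease_all14.csv"` as the destination would mirror the two python commands above. The output name `data/hungarian14r.csv` is a guess based on the script's argument.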
