Statistical analysis of heart disease data project completed during my enrollment in the Data Science program through Thinkful.

coreycoole/heart_disease_analysis

Capstone-Projects

Statistical analysis of heart disease data.

The dataset used in this project comes from four sources:

  1. Cleveland Clinic Foundation
  2. Hungarian Institute of Cardiology, Budapest
  3. V.A. Medical Center, Long Beach, CA
  4. University Hospital, Zurich, Switzerland

The raw dataset contains 76 attributes; however, all published experiments use a subset of 14 of them. These 14 attributes are age, sex, chest pain type, resting blood pressure, cholesterol level, fasting blood sugar, resting electrocardiographic results, maximum heart rate achieved, exercise-induced angina (true/false), ST depression induced by exercise relative to rest, the slope of the peak exercise ST segment, number of major vessels colored by fluoroscopy, thallium test result, and heart disease risk value. The attributes were gathered to study and try to predict the presence of heart disease in a patient. The risk value refers to the presence of heart disease in the patient. It is integer valued from 0 (no presence) to 4. This report does not investigate how these risk values were assigned.
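To make the 14-attribute layout concrete, here is a minimal sketch of reading such rows into named fields. The short column names, the comma-separated layout, and the use of `?` for missing values are assumptions about how the 14-attribute files are commonly distributed, not something this report specifies. The sample row is made up for illustration.

```python
import csv
import io

# Assumed short names for the 14 attributes, in the order listed above.
COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num",
]

def parse_rows(text):
    """Parse comma-separated rows into dicts of floats.

    A field equal to '?' (assumed missing-value marker) becomes None.
    """
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue  # skip blank lines
        values = [None if f.strip() == "?" else float(f) for f in fields]
        rows.append(dict(zip(COLUMNS, values)))
    return rows

# Made-up row for illustration only; it is not taken from the dataset.
sample = "54,1,4,130,240,0,0,150,0,1.2,2,?,3,1\n"
records = parse_rows(sample)
```

Here `records[0]["num"]` holds the risk value (0 to 4) and `records[0]["ca"]` is `None` because that field was marked missing.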

It is unclear, however, whether the patients were admitted into the hospitals' care because of suspected heart disease risk or whether they come from a more general selection pool. As a result, these findings may not generalize to the wider population. This dataset was also curated no later than July 1988, so it does not reflect present-day trends in health and culture with respect to heart disease.

With that said, this report addresses three separate questions about the statistical relationships between the distinct locations of the data, as well as between some of the attributes therein. The questions are as follows:

Question 1 - Is the average heart disease risk value similar across the four locations, or does one location stand out from the rest?

Question 2 - Does the data reflect an increased risk for heart disease in older patients?

Question 3 - How likely is a patient in the dataset to have a higher-than-average risk of heart disease if their cholesterol level is above 200 mg/dL?
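One way the three questions above could be approached is sketched below, using only the Python standard library. This is an illustration under stated assumptions, not necessarily the method used in the project. The records are made-up toy values, and the function names are hypothetical.

```python
from collections import defaultdict
from math import sqrt

# Made-up records for illustration only:
# (location, age, cholesterol in mg/dL, risk value 0-4).
records = [
    ("Cleveland", 45, 180, 0),
    ("Cleveland", 62, 250, 2),
    ("Budapest", 50, 210, 1),
    ("Budapest", 38, 190, 0),
    ("Long Beach", 66, 230, 3),
    ("Long Beach", 58, 205, 1),
    ("Zurich", 55, 260, 2),
    ("Zurich", 70, 195, 3),
]

def mean(values):
    return sum(values) / len(values)

def mean_risk_by_location(rows):
    """Question 1: average risk value per location."""
    groups = defaultdict(list)
    for location, _age, _chol, risk in rows:
        groups[location].append(risk)
    return {loc: mean(risks) for loc, risks in groups.items()}

def pearson(xs, ys):
    """Question 2: Pearson correlation, e.g. between age and risk.

    Assumes neither sequence is constant (otherwise this divides by zero).
    """
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def prob_above_average_given_high_chol(rows, threshold=200):
    """Question 3: share of patients with cholesterol above the threshold
    whose risk value exceeds the overall mean risk."""
    overall = mean([r[3] for r in rows])
    high = [r for r in rows if r[2] > threshold]
    if not high:
        return None  # no patients above the threshold
    return sum(1 for r in high if r[3] > overall) / len(high)

by_location = mean_risk_by_location(records)
age_risk_r = pearson([r[1] for r in records], [r[3] for r in records])
p_high = prob_above_average_given_high_chol(records)
```

Comparing raw group means only hints at whether a location stands out. A formal comparison would need a significance test, such as a one-way ANOVA or pairwise t-tests, which this sketch leaves out.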
