Diabetes is one of the most prevalent chronic diseases in the US, impacting millions of Americans each year. The objective of this investigation is to identify what risk factors are most predictive or prevalent in the diabetes population, in order to halt the persistent increase.

SindiAI/DiabetesIndicators

Diabetes Indicators Analysis

Diabetes is a chronic disease that affects millions of Americans every year. It is a condition where the body is unable to properly process glucose (sugar), resulting in high blood sugar levels that can lead to a variety of complications. The objective of this investigation is to identify the risk factors that are most predictive or prevalent in the diabetes population, in order to help prevent the persistent increase of this disease.

Data

The datasets for this diabetes health indicators analysis cover 253,680 individuals. Individuals were asked a series of questions about common health conditions, lifestyle choices, and demographics. Once merged, the data was analyzed to find commonalities and differences between the diabetic and nondiabetic populations in the sample.

Goal

The goal of this project is to explore some of the following research questions:

  • Can survey questions from the data in this project provide accurate predictions of whether an individual has diabetes?
  • What risk factors are most predictive of diabetes risk?
  • Can we use a subset of the risk factors to accurately predict whether an individual has diabetes?
  • Can we use feature selection to create a short form of questions from the datasets that accurately predicts whether someone has diabetes or is at high risk of diabetes?

In this repository, we focused on which risk factors are most predictive of diabetes risk.
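One simple way to approach the question of which risk factors are most predictive is to rank each survey column by the strength of its correlation with the diabetes label. The sketch below is illustrative only and is not the method used in this repository: the column names (`HighBP`, `HighChol`, `Smoker`, `Diabetes`), the helper `rank_risk_factors`, and the toy rows are all assumptions made up for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / sqrt(var_x * var_y)


def rank_risk_factors(rows, label, top_k=None):
    """Rank every non-label column by |correlation| with the label (hypothetical helper)."""
    target = [row[label] for row in rows]
    scores = {
        col: pearson([row[col] for row in rows], target)
        for col in rows[0]
        if col != label
    }
    ranked = sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked if top_k is None else ranked[:top_k]


# Made-up toy rows for illustration, NOT taken from the survey data.
toy_rows = [
    {"HighBP": 1, "HighChol": 1, "Smoker": 0, "Diabetes": 1},
    {"HighBP": 1, "HighChol": 0, "Smoker": 1, "Diabetes": 1},
    {"HighBP": 1, "HighChol": 1, "Smoker": 1, "Diabetes": 1},
    {"HighBP": 0, "HighChol": 0, "Smoker": 1, "Diabetes": 0},
    {"HighBP": 0, "HighChol": 1, "Smoker": 0, "Diabetes": 0},
    {"HighBP": 0, "HighChol": 0, "Smoker": 0, "Diabetes": 0},
]

for name, score in rank_risk_factors(toy_rows, "Diabetes"):
    print(f"{name}: {score:+.2f}")
```

Passing `top_k` keeps only the strongest columns, which is one possible starting point for a short form of questions. Correlation measures association only; it does not show that a factor causes diabetes.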

SQL Portion

During the SQL portion, we answered key questions about general physical health (difficulty walking, diabetes, physical illness, BMI, etc.), lifestyle choices (smoking), and demographic information (education, income, sex) in our dataset. (SQL image)
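A question of this kind can be expressed as a single grouped query. The following is a minimal sketch using Python's built-in `sqlite3` module with an in-memory table. The table name `survey`, its columns, and the inserted rows are assumptions made up for illustration; they are not the repository's actual schema, data, or queries.

```python
import sqlite3

# In-memory database with a hypothetical schema (not the repository's).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE survey (diabetes INTEGER, diff_walk INTEGER, smoker INTEGER, bmi REAL)"
)

# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO survey VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, 33.0),
        (1, 1, 0, 31.0),
        (1, 0, 1, 29.0),
        (0, 0, 0, 24.0),
        (0, 0, 1, 27.0),
        (0, 1, 0, 26.0),
        (0, 0, 0, 23.0),
        (0, 0, 0, 25.0),
    ],
)

# Share of diabetic respondents and average BMI, split by difficulty walking.
query = """
    SELECT diff_walk,
           COUNT(*) AS respondents,
           ROUND(100.0 * AVG(diabetes), 1) AS pct_diabetic,
           ROUND(AVG(bmi), 1) AS avg_bmi
    FROM survey
    GROUP BY diff_walk
    ORDER BY diff_walk
"""
rows = conn.execute(query).fetchall()

for diff_walk, respondents, pct_diabetic, avg_bmi in rows:
    print(diff_walk, respondents, pct_diabetic, avg_bmi)
```

Because `diabetes` is stored as 0 or 1, `AVG(diabetes)` gives the proportion of diabetic respondents in each group. The same pattern works for other grouping columns such as smoking status, education, or income.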

Tableau Portion

Over the past four decades, the global incidence of diabetes has increased significantly. In the United States, diabetes is a prevalent chronic condition that affects a large portion of the population. The dashboard presents an analysis of diabetes indicators, including age range and gender, to give an overview of the disease. Click here

Deep Analysis

The deep-dive analysis gave a more detailed understanding of the dataset. Summary statistics provided useful information about the continuous data, including means, quartiles, and possible outliers. Data cleaning organized the dataset into a more intuitive format, which made it easier to compare the distributions of the two groups (diabetic vs. nondiabetic). Cross-correlation analysis identified relationships between columns and the significance of each. High blood pressure, high cholesterol, and difficulty walking appear to be the columns that differ the most between diabetics and nondiabetics.
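The summary-statistics step can be sketched with Python's standard `statistics` module. This is an illustration under stated assumptions, not the repository's code: the BMI values are made up, the helper `summarize` is hypothetical, and flagging outliers with the 1.5 × IQR rule is a common convention chosen for the example, since the text does not say how outliers were identified.

```python
from statistics import mean, quantiles


def summarize(values):
    """Mean, quartiles, and IQR-rule outliers for one continuous column (hypothetical helper)."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumption: values beyond 1.5 * IQR from the quartiles count as possible outliers.
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": mean(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "outliers": [v for v in values if v < low or v > high],
    }


# Made-up BMI values per group, NOT taken from the survey data.
toy_bmi = {
    "diabetic": [28, 30, 31, 32, 33, 34, 36, 58],
    "nondiabetic": [21, 23, 24, 25, 26, 27, 29, 30],
}

for group, values in toy_bmi.items():
    print(group, summarize(values))
```

Running the same summary on each group side by side makes it easy to see where the distributions of the two groups diverge.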
