# What do your blood sugars tell you?

## 📖 Background

Diabetes mellitus remains a major global health problem, with several thousand deaths attributed to it every day. Detecting diabetes and intervening in its early stages can help reduce the risk of serious complications such as circulatory system diseases, kidney disease, and vision loss. This competition involves developing a predictive model that detects potential diabetes cases early, ideally in time for preventive treatment to begin.


## 🔎 Key Findings
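- The dataset contains 768 records (one per patient) and 9 columns: 8 numerical features and the binary `Outcome` label.
- 268 of the 768 patients (about 34.9%) are diagnosed with diabetes (`Outcome` = 1), so positive cases are the minority class.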

## 💾 The data

The dataset contains diagnostic measurements that are associated with diabetes, which were collected from a population of Pima Indian women. The data includes various medical and demographic attributes, making it a well-rounded resource for predictive modeling.

The columns and their data types are as follows:


| Column Name              | Data Type            | Description |
| :----------------------- | :------------------- | :---------- |
| Pregnancies              | Numerical            | Number of times the patient has been pregnant. |
| Glucose                  | Numerical            | Plasma glucose concentration at 2 hours in an oral glucose tolerance test. |
| BloodPressure            | Numerical            | Diastolic blood pressure (mm Hg). |
| SkinThickness            | Numerical            | Triceps skinfold thickness (mm). |
| Insulin                  | Numerical            | 2-hour serum insulin (μU/mL). |
| BMI                      | Numerical            | Body mass index (weight in kg / (height in m)²). |
| DiabetesPedigreeFunction | Numerical            | A function that scores the likelihood of diabetes based on family history. |
| Age                      | Numerical            | Age of the patient in years. |
| Outcome                  | Binary (categorical) | Class label indicating whether the patient is diagnosed with diabetes: 1 = yes, 0 = no. |

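Before any analysis, it can help to confirm that the loaded CSV actually matches the documented schema. The sketch below is a hypothetical helper, `check_schema`, that assumes the column names in the table above are exact; it checks for missing or unexpected columns, non-numeric dtypes, and `Outcome` values other than 0 and 1.

```python
import pandas as pd

# Column names as documented in the table above (Outcome is the label).
EXPECTED_COLUMNS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin",
    "BMI", "DiabetesPedigreeFunction", "Age", "Outcome",
]


def check_schema(df: pd.DataFrame) -> list:
    """Return a list of problems found; an empty list means the schema matches.

    Hypothetical helper: checks column names, numeric dtypes and that
    Outcome only takes the values 0 and 1.
    """
    problems = []
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    extra = [c for c in df.columns if c not in EXPECTED_COLUMNS]
    if missing:
        problems.append(f"missing columns: {missing}")
    if extra:
        problems.append(f"unexpected columns: {extra}")
    for col in EXPECTED_COLUMNS:
        if col in df.columns and not pd.api.types.is_numeric_dtype(df[col]):
            problems.append(f"{col} is not numeric")
    if "Outcome" in df.columns and not set(df["Outcome"].unique()) <= {0, 1}:
        problems.append("Outcome contains values other than 0 and 1")
    return problems


# Usage with a one-row example built from the first record of the dataset:
sample = pd.DataFrame(
    [[6, 148, 72, 35, 0, 33.6, 0.627, 50, 1]], columns=EXPECTED_COLUMNS
)
print(check_schema(sample))  # []
```

In the notebook itself, `check_schema(data)` would be called right after loading the CSV.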

```python
# Import the required packages
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from scipy.stats import linregress
```

```python
# Load the data into a DataFrame and display the first five rows
data = pd.read_csv('data/diabetes.csv')
data.head()
```

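|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
| --: | --: | --: | --: | --: | --: | --: | --: | --: | --: |
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |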


```python
# Print the data information (Columns, Non-Null Count, Data Types)
data.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Pregnancies               768 non-null    int64  
 1   Glucose                   768 non-null    int64  
 2   BloodPressure             768 non-null    int64  
 3   SkinThickness             768 non-null    int64  
 4   Insulin                   768 non-null    int64  
 5   BMI                       768 non-null    float64
 6   DiabetesPedigreeFunction  768 non-null    float64
 7   Age                       768 non-null    int64  
 8   Outcome                   768 non-null    int64  
dtypes: float64(2), int64(7)
memory usage: 54.1 KB
```


```python
# Summary statistics for each numerical column
data.describe()
```

|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
| :-- | --: | --: | --: | --: | --: | --: | --: | --: | --: |
| count | 768.0 | 768.0 | 768.0 | 768.0 | 768.0 | 768.0 | 768.0 | 768.0 | 768.0 |
| mean | 3.845052 | 120.894531 | 69.105469 | 20.536458 | 79.799479 | 31.992578 | 0.471876 | 33.240885 | 0.348958 |
| std | 3.369578 | 31.972618 | 19.355807 | 15.952218 | 115.244002 | 7.88416 | 0.331329 | 11.760232 | 0.476951 |
| min | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.078 | 21.0 | 0.0 |
| 25% | 1.0 | 99.0 | 62.0 | 0.0 | 0.0 | 27.3 | 0.24375 | 24.0 | 0.0 |
| 50% | 3.0 | 117.0 | 72.0 | 23.0 | 30.5 | 32.0 | 0.3725 | 29.0 | 0.0 |
| 75% | 6.0 | 140.25 | 80.0 | 32.0 | 127.25 | 36.6 | 0.62625 | 41.0 | 1.0 |
| max | 17.0 | 199.0 | 122.0 | 99.0 | 846.0 | 67.1 | 2.42 | 81.0 | 1.0 |

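`data.info()` reports no null values, yet `describe()` shows a minimum of 0 for Glucose, BloodPressure, SkinThickness, Insulin and BMI, which is not physiologically plausible for a living patient. A common approach, assumed here rather than taken from this analysis, is to treat those zeros as missing measurements. The sketch below counts them and converts them to `NaN`; the column list `ZERO_AS_MISSING` and both helper functions are hypothetical.

```python
import numpy as np
import pandas as pd

# Columns where a value of 0 most likely encodes a missing measurement
# (assumption; Pregnancies is excluded because 0 is a legitimate value there).
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def count_zeros(df: pd.DataFrame, columns=ZERO_AS_MISSING) -> pd.Series:
    """Number of zero entries per column."""
    return (df[columns] == 0).sum()


def zeros_to_nan(df: pd.DataFrame, columns=ZERO_AS_MISSING) -> pd.DataFrame:
    """Return a copy of df with zeros in `columns` replaced by NaN."""
    cleaned = df.copy()
    cleaned[columns] = cleaned[columns].replace(0, np.nan)
    return cleaned


# Usage on the five rows shown by data.head():
head = pd.DataFrame(
    {
        "Pregnancies": [6, 1, 8, 1, 0],
        "Glucose": [148, 85, 183, 89, 137],
        "BloodPressure": [72, 66, 64, 66, 40],
        "SkinThickness": [35, 29, 0, 23, 35],
        "Insulin": [0, 0, 0, 94, 168],
        "BMI": [33.6, 26.6, 23.3, 28.1, 43.1],
        "DiabetesPedigreeFunction": [0.627, 0.351, 0.672, 0.167, 2.288],
        "Age": [50, 31, 32, 21, 33],
        "Outcome": [1, 0, 1, 0, 1],
    }
)
print(count_zeros(head))
cleaned = zeros_to_nan(head)
print(cleaned["Insulin"].isna().sum())  # 3
```

On the full dataset, `count_zeros(data)` gives the per-column totals. The 25th percentile of 0 for SkinThickness and Insulin already indicates that at least a quarter of those entries are zeros, so whether to impute or drop them is a modeling decision worth making explicitly.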

##### 1. How many patients are diagnosed with diabetes?

```python
# Calculate the total number of people diagnosed with diabetes (Outcome == 1)
diagnosed_with_diabetes = data['Outcome'].value_counts().loc[1]
print("Total number of people diagnosed with diabetes:", diagnosed_with_diabetes)
```

```
Total number of people diagnosed with diabetes: 268
```

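Counting only the positive class hides how the labels are balanced overall. A minimal sketch that reports both classes as counts and percentages using `value_counts(normalize=True)`; the function name `class_balance` is hypothetical. Because `Outcome` is coded 0/1, its mean equals the share of positive cases, which is consistent with the mean of 0.348958 in the `describe()` output (0.348958 × 768 ≈ 268).

```python
import pandas as pd


def class_balance(outcome: pd.Series) -> pd.DataFrame:
    """Count and percentage of each class in a binary label column."""
    counts = outcome.value_counts().sort_index()
    shares = outcome.value_counts(normalize=True).sort_index() * 100
    return pd.DataFrame({"count": counts, "percent": shares.round(1)})


# Toy label vector (hypothetical) with the same totals as the dataset:
# 268 positives and 500 negatives out of 768 records.
outcome = pd.Series([1] * 268 + [0] * 500, name="Outcome")
print(class_balance(outcome))
```

In the notebook, the same summary would come from `class_balance(data['Outcome'])`; knowing that roughly 35% of records are positive matters later when choosing evaluation metrics or a stratified train/test split.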