# What do your blood sugars tell you?

## 📖 Background

Diabetes mellitus remains a global health issue, causing several thousand people to die each day from this single condition. Finding and avoiding diabetes in the earlier stages can help reduce the risk of serious health issues such as circulatory system diseases, kidney malfunction, and vision loss. This competition involves developing a predictive model for effectively detecting potential Diabetes cases, ideally, before commencing preventive treatment.


## 🔎 Key Findings
- The dataset contains

## 💾 The data

The dataset contains diagnostic measurements that are associated with diabetes, which were collected from a population of Pima Indian women. The data includes various medical and demographic attributes, making it a well-rounded resource for predictive modeling.

The columns and Data Types are as follows:


| Column Name                  | Data Type   | Description |
| :----------------            | :------     | :---- |
| Pregnancies                  | Numerical   | Number of times the patient has been pregnant. |
| Glucose                      | Numerical   | Plasma glucose concentration a 2 hours in an oral glucose tolerance test. |
| BloodPressure                | Numerical   | Diastolic blood pressure (mm Hg). |
| SkinThickness                | Numerical   | Triceps skinfold thickness (mm).  |
| Insulin                      | Numerical   | 2-Hour serum insulin (mu U/ml). |
| BMI                          | Numerical   | Body mass index (weight in kg/(height in m)^2).. |
| DiabetesPedigreeFunction     | Numerical   | A function that represents the likelihood of diabetes based on family history. |
| SkinThickness                | Numerical   | Triceps skinfold thickness (mm).  |
| Age                          | Numerical   | Age of the patient in years. |
| Outcome           | Binary (Categorical)   | Class variable (0 or 1) indicating whether the patient is diagnosed with diabetes. 1 = Yes, 0 = No.  |


In [63]:
# Import Python's packages
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from scipy.stats import linregress

In [64]:
# Load and display the data into a DataFrame
data = pd.read_csv('data/diabetes.csv')
data.head()

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


In [65]:
# Print the data information (Columns, Non-Null Count, Data Types)
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Pregnancies               768 non-null    int64  
 1   Glucose                   768 non-null    int64  
 2   BloodPressure             768 non-null    int64  
 3   SkinThickness             768 non-null    int64  
 4   Insulin                   768 non-null    int64  
 5   BMI                       768 non-null    float64
 6   DiabetesPedigreeFunction  768 non-null    float64
 7   Age                       768 non-null    int64  
 8   Outcome                   768 non-null    int64  
dtypes: float64(2), int64(7)
memory usage: 54.1 KB


In [66]:
data.describe()

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
count,768.0,768.0,768.0,768.0,768.0,768.0,768.0,768.0,768.0
mean,3.845052,120.894531,69.105469,20.536458,79.799479,31.992578,0.471876,33.240885,0.348958
std,3.369578,31.972618,19.355807,15.952218,115.244002,7.88416,0.331329,11.760232,0.476951
min,0.0,0.0,0.0,0.0,0.0,0.0,0.078,21.0,0.0
25%,1.0,99.0,62.0,0.0,0.0,27.3,0.24375,24.0,0.0
50%,3.0,117.0,72.0,23.0,30.5,32.0,0.3725,29.0,0.0
75%,6.0,140.25,80.0,32.0,127.25,36.6,0.62625,41.0,1.0
max,17.0,199.0,122.0,99.0,846.0,67.1,2.42,81.0,1.0


##### 1. How many are diagnosed with diabetes?

In [67]:
# Calculate the total number of people diagnosed with diabetes
diagnosed_with_diabetes = data['Outcome'].value_counts()[1]
print("Total number of people diagnosed with diabetes:", diagnosed_with_diabetes)

print(data['Outcome'].value_counts())

Total number of people diagnosed with diabetes: 268
Outcome
0    500
1    268
Name: count, dtype: int64


##### 2. BMI Classifications
- Underweight: BMI < 18.5
- Normal weight: 18.5 ≤ BMI < 24.9
- Overweight: 25 ≤ BMI < 29.9
- Obesity (Class 1): 30 ≤ BMI < 34.9
- Obesity (Class 2): 35 ≤ BMI < 39.9
- Extreme Obesity (Class 3): BMI ≥ 40

In [68]:
# Define the BMI classification function
def bmi_classification(value):
    if value < 18.5:
        return "Underweight"
    
    elif 18.5 <= value < 24.9:
        return "Normal Weight"
    
    elif 25 <= value < 29.9:
        return "Overweight"
    
    elif 30 <= value <34.9:
        return "Obesity (Class 1)"
    
    elif 35 <= value < 39.9:
        return "Obesity (Class 2)"
    else:
        return "Extreme Obesity"
    
# Apply the BMI classification to the 'BMI' column
data['BMI_Class'] = data['BMI'].apply(bmi_classification)

# Tally the BMI classifications
tally_bmi = data['BMI_Class'].value_counts()

# Display the tally
print(tally_bmi)

BMI_Class
Obesity (Class 1)    218
Overweight           174
Obesity (Class 2)    147
Extreme Obesity      113
Normal Weight        101
Underweight           15
Name: count, dtype: int64


##### 2. Blood Pressure Classifications
- Normal Blood Pressure: Less than 80
- High Blood Pressure Stage 1: 80 - 89
- High Blood Pressure Stage 2: 90 above
- Hypertensive Crisis: Higher than 120

In [69]:
# Define the Blood Pressure classification function
def bp_classification(value):
    if value < 80:
        return "Normal"
    
    elif 80 <= value <= 89:
        return "High Blood Pressure Stage 1"
    
    elif value >= 90 :
        return "High Blood Pressure Stage 2"
    
    else:
        return "Hypertensive Crisis"
    
# Apply the Blood Pressure classification to the 'BloodPressure' column
data['BP_Class'] = data['BloodPressure'].apply(bmi_classification)

# Tally the counts of each classification
tally_bp = data['BP_Class'].value_counts()
print(tally_bp)


BP_Class
Extreme Obesity      729
Underweight           35
Obesity (Class 1)      2
Normal Weight          1
Obesity (Class 2)      1
Name: count, dtype: int64


In [70]:

bmi_classification_list = ["Underweight", "Normal Weight", "Overweight", "Obesity (Class 1)", "Obesity (Class 2)", "Extreme Obesity"]

for item in list:
    overweight_data = data[data['BMI_Class'] == {item}]

    # Count the number of outcomes equal to 1 in the filtered data
    overweight_diabetes = overweight_data['Outcome'].sum()

    # Display the count
    print(f"Number of diabetes and {item}: {outcome_count}")





# Filter data for 'Overweight' classification
overweight_data = data[data['BMI_Class'] == 'Overweight']


# Count the number of outcomes equal to 1 in the filtered data
overweight_diabetes = overweight_data['Outcome'].sum()

# Display the count
print(f"Number of diabetes and overweight: {outcome_count}")

TypeError: 'type' object is not iterable