In [None]:
!pip install bubbly 

In [None]:
# Required Python libraries
#   Input/output: os
#   Data manipulation: pandas, numpy
#   Visualization: matplotlib, seaborn, plotly, bubbly
#   Statistical analysis: scipy, statsmodels
import os

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as sps
import statsmodels.api as sm
import statsmodels.formula.api as smf
import plotly.express as px
import plotly.offline as py
import bubbly

In [None]:
# Define the file path: replace with the location of your copy of the dataset
file_path = r'diabetes_prediction_dataset.csv'

# Import & Read Dataset
df = pd.read_csv(file_path)

# Display Dataset Information
df.head()

In [None]:
# Animated scatter of BMI vs. blood glucose; rows are sorted by age so the frames run in order
fig = px.scatter(df.sort_values("age"), x="bmi", y="blood_glucose_level",
                 animation_frame="age", animation_group="gender",
                 size="age", color="smoking_history", hover_name="smoking_history",
                 facet_col="smoking_history", log_x=True, size_max=60,
                 range_x=[df["bmi"].min(), df["bmi"].max()],
                 range_y=[df["blood_glucose_level"].min(), df["blood_glucose_level"].max()])
fig.show()

**OBJECTIVE:**
1. Create an animated scatter plot with Plotly Express that shows the relationship between BMI (Body Mass Index) and blood glucose level in the diabetes prediction dataset.
2. The animation steps through the "age" variable, with "gender" as the animation group. Points are colored by "smoking_history," the plot is split into facet columns by "smoking_history," and point size is proportional to "age."
3. The x-axis uses logarithmic scaling, and both axes are fixed to the observed minimum and maximum of their variables so that frames stay comparable. The aim is a dynamic view of how BMI, blood glucose level, age, gender, and smoking history interact in the dataset.


**ANALYSIS:**

1. **BMI:** The BMI values range from 9.0 to 80.0, with a majority of the values falling between 20 and 60. There are some extreme values, such as 80.0, which might be considered outliers. The distribution of BMI values appears to be right-skewed, with a higher concentration of lower BMI values.
     

2. **Age:** The age of the patients ranges from 1.48 to 88 years. The distribution of ages is relatively even, with no apparent skewness.
     
     
3. **Blood Glucose Level:** The blood glucose level values range from 0.08 to 52.0. The distribution of blood glucose level values appears to be right-skewed, with a higher concentration of lower values.
      
      
4. **Smoking History:** The smoking history variable has five categories: "never," "No Info," "current," "former," and "not current." The "never" category has the highest number of patients, followed by "No Info" and "former." The "current" and "not current" categories have the lowest number of patients.

**Conclusion:** The data span a wide range of BMI, age, and blood glucose level values. BMI and blood glucose level both appear right-skewed, with most values concentrated at the lower end. The smoking history variable has five categories, of which "never" is the most common. Together, these variables can help describe the relationship between BMI, age, blood glucose level, and smoking history among patients in the diabetes prediction dataset.
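The ranges, skew, and category counts described above can be checked directly against the data rather than read off the plot. The following is a minimal sketch: the helper name `summarize_columns` and the five-row `sample` frame are illustrative assumptions, not part of the original notebook. In practice the notebook's `df` would be passed instead.

```python
import pandas as pd


def summarize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return min, max, median, and skewness for the numeric columns discussed above."""
    cols = ["bmi", "age", "blood_glucose_level"]
    # One row per column, one column per statistic
    return df[cols].agg(["min", "max", "median", "skew"]).T


# Made-up sample with the same column names as the dataset (values are NOT real data)
sample = pd.DataFrame({
    "bmi": [18.5, 22.0, 27.3, 31.0, 45.2],
    "age": [12.0, 35.0, 48.0, 61.0, 80.0],
    "blood_glucose_level": [80, 100, 140, 160, 300],
    "smoking_history": ["never", "never", "No Info", "former", "current"],
})

summary = summarize_columns(sample)
category_counts = sample["smoking_history"].value_counts()

print(summary)
print(category_counts)
```

A positive value in the `skew` column indicates a right-skewed distribution, and `category_counts` lists how many patients fall into each smoking history category.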


**MANAGERIAL IMPLICATION:**

1. **Targeted screening and interventions:** Prioritize screening for diabetes and metabolic risks in patients with higher BMIs (especially above 25), regardless of age or gender. Implement proactive interventions, such as lifestyle modifications and dietary counseling, for at-risk individuals to prevent or delay the onset of diabetes.

2. **Smoking cessation initiatives:** Emphasize the importance of smoking cessation programs for patients, considering the higher prevalence of diabetes among current and former smokers. Offer comprehensive tobacco cessation support and resources to encourage behavioral change.

3. **Age-specific care strategies:** Develop care plans that address the specific needs of patients across different age groups. Elderly patients with high BMIs and blood glucose levels might require closer monitoring and medication adjustments for effective diabetes management.

4. **Community-based outreach programs:** Invest in programs that promote healthy lifestyle choices and raise awareness about diabetes prevention, particularly targeting communities with high percentages of individuals with overweight or obesity.


5. **Policy and environmental changes:** Advocate for policies that promote healthy eating and physical activity in public spaces, such as schools and workplaces. Encourage healthier food options and discourage sugary drinks to create a more supportive environment for diabetes prevention.

In [None]:
import plotly.express as px

# Sort by age so the animation frames run from youngest to oldest
df_sorted = df.sort_values(by='age')

# Animated scatter plot: one frame per distinct age value (no size or color encoding)
fig = px.scatter(df_sorted, x='bmi', y='blood_glucose_level',
                 animation_frame='age', title='BMI vs Blood Glucose Level vs Age',
                 labels={'bmi': 'BMI', 'blood_glucose_level': 'Blood Glucose Level'})

# Updating layout with play/pause buttons
fig.update_layout(updatemenus=[dict(type='buttons', showactive=False,
                                    buttons=[dict(label='Play',
                                                  method='animate',
                                                  args=[None, dict(frame=dict(duration=1000, redraw=True),
                                                                  fromcurrent=True)]),
                                            dict(label='Pause',
                                                  method='animate',
                                                  args=[[None], dict(frame=dict(duration=300, redraw=True),
                                                                    mode='immediate',
                                                                    transition=dict(duration=100))])])],
                 xaxis_title="BMI", yaxis_title="Blood Glucose Level",
                 xaxis=dict(type='linear'), height=600)

# Display the plot
fig.show()


**OBJECTIVE:**
Create an animated scatter plot using Plotly Express that shows the relationship between BMI (Body Mass Index) and blood glucose level across ages. The data is sorted by age, and each animation frame shows the patients at one distinct age value, so stepping through the frames shows how the distribution of BMI and blood glucose level shifts with age.


**ANALYSIS:**

1. There appears to be a positive correlation between BMI and blood glucose level. This means that as BMI increases, blood glucose level also tends to increase.

2. The data points are spread out across the graph, indicating that there is variability in the relationship between BMI and blood glucose level. This means that not everyone with a high BMI will have a high blood glucose level, and vice versa.

3. There is a steeper increase in blood glucose level at higher BMI values. This suggests that the relationship between BMI and blood glucose level may not be linear.
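The visual impression of a positive correlation can be quantified. The sketch below computes a Pearson coefficient (linear association) and a rank-based coefficient (monotonic association, equivalent to Spearman's rho), which is relevant because the third observation suggests the relationship may not be linear. The helper name `bmi_glucose_correlation` and the sample values are assumptions made for illustration only.

```python
import pandas as pd


def bmi_glucose_correlation(df: pd.DataFrame) -> dict:
    """Return Pearson and rank-based (Spearman) correlation between BMI and blood glucose."""
    bmi = df["bmi"]
    glucose = df["blood_glucose_level"]
    pearson = bmi.corr(glucose)                 # linear association
    spearman = bmi.rank().corr(glucose.rank())  # Pearson on ranks = Spearman's rho
    return {"pearson": pearson, "spearman": spearman}


# Made-up, strictly increasing sample (values are NOT real data)
sample = pd.DataFrame({
    "bmi": [18.0, 22.0, 26.0, 30.0, 40.0],
    "blood_glucose_level": [85, 90, 100, 130, 200],
})

result = bmi_glucose_correlation(sample)
print(result)
```

A rank-based coefficient noticeably larger than the Pearson coefficient would be consistent with a monotonic but non-linear relationship.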


**MANAGERIAL IMPLICATION:**

1. **Incentivize preventative care:** Develop insurance plans that incentivize routine health check-ups, BMI and blood glucose level monitoring, and participation in preventative programs for at-risk individuals. This can help identify and manage risk factors early, potentially reducing future healthcare costs associated with diabetes complications.

2. **Targeted coverage for high-risk groups:** Consider offering insurance plans with lower premiums or co-pays for patients with higher BMIs or identified risk factors for diabetes, encouraging them to seek regular care and manage their health proactively.

3. **Individual-level awareness:** Increase public awareness about the relationship between BMI, smoking, and diabetes risk through informative campaigns and educational resources. Encourage individuals to maintain a healthy weight, adopt healthy lifestyle habits, and seek regular health check-ups to monitor their risk factors.

4. **Personal responsibility and healthy choices:** Promote individual responsibility and empower people to make informed choices about their health through access to clear information, support systems, and resources for maintaining a healthy weight, diet, and physical activity levels.

In [None]:
import plotly.express as px

# Sort by BMI (frame order) and HbA1c (x order) so each line is drawn left to right
fig = px.line(df.sort_values(["bmi", "HbA1c_level"]), x="HbA1c_level", y="blood_glucose_level",
              color="smoking_history", line_group="gender",
              animation_frame="bmi", facet_col="smoking_history")
fig.show()

**OBJECTIVE:**
1. Visualize the relationship between HbA1c levels, blood glucose levels, BMI (Body Mass Index), and smoking history within the context of the diabetes prediction dataset.
2. The line chart is designed to provide insights into how variations in HbA1c levels, blood glucose levels, and BMI impact each other, while also considering the influence of smoking history. 
3. The animation through different BMI values allows for a dynamic exploration of these relationships, while faceting by smoking history provides a clearer comparison across smoking categories. Overall, the visualization aims to uncover patterns and trends that may contribute to a better understanding of the factors influencing diabetes risk within the dataset.
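Because `bmi` is continuous, using it as `animation_frame` produces one frame per distinct BMI value, and most of those frames contain only a few rows. One possible alternative, sketched below, is to animate over BMI bands instead. The band edges follow the commonly used WHO adult cut-offs (18.5, 25, 30); the column name `bmi_category` and the helper `add_bmi_category` are assumptions introduced here for illustration.

```python
import pandas as pd


def add_bmi_category(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a categorical 'bmi_category' column."""
    bins = [0, 18.5, 25, 30, float("inf")]
    labels = ["underweight", "normal", "overweight", "obese"]
    out = df.copy()
    # right=False makes each band include its lower edge: [18.5, 25), [25, 30), ...
    out["bmi_category"] = pd.cut(out["bmi"], bins=bins, labels=labels, right=False)
    return out


# Made-up sample (values are NOT real data)
sample = pd.DataFrame({"bmi": [17.0, 18.5, 24.9, 25.0, 31.2]})
binned = add_bmi_category(sample)
print(binned)
```

The resulting column could then be passed as `animation_frame="bmi_category"` in the `px.line` call, giving four frames with many rows each.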



**ANALYSIS:**

1. Blood glucose levels are generally higher for individuals with higher HbA1c_level values. This is expected, since HbA1c reflects average blood glucose over the preceding two to three months.

2. The line chart is further faceted by smoking history categories, which means that there is a separate chart for each smoking history category. By comparing these charts, we can see that blood glucose levels tend to be higher for individuals who smoke or have smoked in the past.

3. There appears to be a steeper slope for the data points in the higher HbA1c range (>7%), suggesting that small changes in HbA1c in this range may be associated with larger changes in blood glucose level.
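One way to check the steeper-slope observation is to fit a separate straight line below and above the 7% threshold and compare the two slopes. The sketch below uses `numpy.polyfit` with degree 1; the helper name `slope_by_hba1c_range` and the sample values are made up for illustration, and a simple linear fit is only one of several reasonable ways to compare the two ranges.

```python
import numpy as np
import pandas as pd


def slope_by_hba1c_range(df: pd.DataFrame, threshold: float = 7.0) -> dict:
    """Fit blood glucose ~ HbA1c separately below and at/above the threshold."""
    parts = {
        "below": df[df["HbA1c_level"] < threshold],
        "at_or_above": df[df["HbA1c_level"] >= threshold],
    }
    slopes = {}
    for name, part in parts.items():
        # polyfit with degree 1 returns [slope, intercept]
        slope, _intercept = np.polyfit(part["HbA1c_level"], part["blood_glucose_level"], 1)
        slopes[name] = float(slope)
    return slopes


# Made-up sample: slope 10 below the threshold, slope 40 above it (NOT real data)
sample = pd.DataFrame({
    "HbA1c_level": [5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 9.0],
    "blood_glucose_level": [90, 95, 100, 105, 130, 150, 170, 210],
})

slopes = slope_by_hba1c_range(sample)
print(slopes)
```

Each range needs at least two distinct HbA1c values for the fit to be defined.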


**MANAGERIAL IMPLICATION:**

1. **Targeted HbA1c monitoring:** Prioritize regular HbA1c monitoring for patients with known diabetes or risk factors, especially those with HbA1c levels above 7%. This allows for early detection of blood glucose imbalances and prompt intervention.

2. **Patient education on HbA1c testing:** Help patients understand the importance of regular HbA1c testing, especially if they have risk factors for diabetes or elevated blood glucose levels. Encourage them to stay informed about their HbA1c status and discuss it with their healthcare provider.

3. **Tiered coverage based on HbA1c:** Consider offering tiered coverage plans with lower premiums or co-pays for patients with lower HbA1c levels, encouraging them to maintain good glycemic control. This can incentivize healthy habits and reduce healthcare costs.

In [None]:
import plotly.express as px

# Assuming your DataFrame is named df
fig = px.area(df, x="age", y="blood_glucose_level", color="smoking_history",
              animation_frame="bmi", facet_col="smoking_history")
fig.show()

**OBJECTIVE:**
1. Identification of age-related patterns in blood glucose levels for different smoking history categories.
2. Understanding how BMI contributes to variations in blood glucose across age groups and smoking histories.
3. Visualizing potential differences in blood glucose trajectories among never smokers, former smokers, current smokers, and those with incomplete smoking history information.
4. Observing the impact of BMI variations on the dynamics of blood glucose levels within each smoking history subgroup.
5. The overall objective is to gain comprehensive insight into the interplay between age, blood glucose level, BMI, and smoking history, with a focus on how the picture changes across BMI values.

**ANALYSIS:**

The visualization shows five area graphs, one per smoking history category: never smoked, former smoker, current smoker, not current smoker, and no information available.

For all smoking history categories, blood glucose levels generally tend to increase with age. However, the rate of increase appears steeper for current and former smokers than for never smokers and those with no smoking information.

There is some variability in blood glucose levels within each smoking history category, suggesting that factors other than smoking history also influence blood glucose levels.

Specific observations by smoking history category:

**Never smokers:** The area graph for never smokers shows a relatively steady increase in blood glucose levels with age, starting from around 25 mg/dL at age 20 and reaching around 65 mg/dL at age 80.

**Former smokers:** The area graph for former smokers shows a similar pattern to never smokers, but with slightly higher blood glucose levels at most ages. For example, at age 50, former smokers have an average blood glucose level of around 45 mg/dL, while never smokers have an average of around 40 mg/dL.

**Current smokers:** The area graph for current smokers shows the steepest increase in blood glucose levels with age. At age 50, current smokers have an average blood glucose level of around 55 mg/dL, which is significantly higher than never smokers and former smokers at the same age.

**Not current smokers:** The area graph for not current smokers shows a pattern that is intermediate between current smokers and former smokers. The blood glucose levels start out similar to current smokers but then begin to plateau or even decrease slightly in later ages.

**No information:** The area graph for participants with no information on smoking history shows a similar pattern to never smokers, but with a wider range of blood glucose levels. This could be due to the fact that this group is more heterogeneous and may include people with a variety of smoking histories.
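The per-category averages quoted above can be computed from the data instead of being estimated from the chart. The following sketch groups patients into age bands and averages blood glucose level per smoking history category. The 20-year band edges, the band labels, and the helper name `mean_glucose_by_age_and_smoking` are assumptions chosen for illustration.

```python
import pandas as pd


def mean_glucose_by_age_and_smoking(df: pd.DataFrame) -> pd.DataFrame:
    """Mean blood glucose per (age band, smoking history); rows are bands, columns are categories."""
    out = df.copy()
    out["age_band"] = pd.cut(
        out["age"],
        bins=[0, 20, 40, 60, 80, 120],
        labels=["0-19", "20-39", "40-59", "60-79", "80+"],
        right=False,  # each band includes its lower edge
    )
    grouped = out.groupby(["age_band", "smoking_history"], observed=True)["blood_glucose_level"].mean()
    return grouped.unstack("smoking_history")


# Made-up sample (values are NOT real data)
sample = pd.DataFrame({
    "age": [45, 50, 55, 52, 70, 75],
    "smoking_history": ["never", "never", "current", "current", "never", "former"],
    "blood_glucose_level": [100, 120, 150, 170, 140, 160],
})

table = mean_glucose_by_age_and_smoking(sample)
print(table)
```

Cells with no patients in a given band and category appear as `NaN`, which makes sparse combinations easy to spot before drawing conclusions from them.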

**Conclusion:**
The BMI value shown at the bottom of the plot is the animation slider's current frame, not an average: because the chart is animated by "bmi," each frame shows only the patients with that BMI value.
Overall, the plot provides an overview of the relationship between smoking history, age, and blood glucose levels.
The data suggests that smoking, particularly current smoking, is associated with higher blood glucose levels.
However, this is an observational analysis and does not establish cause and effect.
More research is needed to understand the full range of factors that influence blood glucose levels.

**MANAGERIAL IMPLICATIONS:**

1. **Occupational Health Programs:** Organizations can consider integrating insights from this analysis into occupational health programs. This may involve routine health screenings, health risk assessments, and lifestyle interventions targeted at employees with specific risk factors identified in the analysis.

2. **Research and Further Investigations:** Managers in research institutions or healthcare organizations may see the need for further investigations. This analysis serves as an observational study, and causal relationships are not established. Allocating resources for in-depth research exploring the interplay of various factors influencing blood glucose levels, beyond smoking history, can contribute to a more comprehensive understanding of diabetes risk factors.

3. **Targeted Interventions for Smokers:** Organizations focusing on smoking cessation programs may find this analysis valuable. It suggests that addressing smoking habits, especially among current smokers, could potentially contribute to better blood glucose management. Employers or insurers might consider offering support programs for smoking cessation as part of employee wellness initiatives, potentially impacting long-term health outcomes.

4. **Educational Campaigns:** The findings can inform public health campaigns and educational initiatives. Raising awareness about the link between smoking, aging, and blood glucose levels may encourage behavioral changes and preventive measures. Campaigns can target specific demographics, such as current smokers, with messages emphasizing the importance of regular health check-ups and lifestyle modifications to manage blood glucose.


In [None]:
import plotly.express as px

# Assuming your DataFrame is named df
fig = px.area(df, x="age", y="blood_glucose_level", color="gender",
              animation_frame="bmi", facet_col="gender")
fig.show()

**OBJECTIVE:**
Visualize how blood glucose level varies with age and BMI for each gender in the diabetes prediction dataset. The area plot, animated by BMI, illustrates the distribution and variation of blood glucose levels across ages and genders. Stepping through the BMI frames allows an exploration of how these variables interact, offering insight into potential patterns or trends within the dataset.


**ANALYSIS:**
1. The area plot shows a positive correlation between age and blood glucose level. This means that as age increases, blood glucose level also tends to increase.

2. The data points are spread out across the graph, indicating variability in the relationship between age and blood glucose. Not everyone at a given age has the same blood glucose level.

3. There appears to be a steeper slope for the data points in the higher age range (>60 years old), suggesting that age may have a more pronounced effect on blood glucose levels in older adults.

4. Overall, the data suggests that age is a risk factor for higher blood glucose levels, especially in older adults. Additionally, gender might play a role in blood glucose levels at similar ages.
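To see whether the age effect differs by gender, the age and blood glucose correlation can be computed separately for each gender. This is a minimal sketch; the helper name `age_glucose_corr_by_gender` and the sample rows are illustrative assumptions, and a correlation coefficient is only a rough summary of the pattern in the plot.

```python
import pandas as pd


def age_glucose_corr_by_gender(df: pd.DataFrame) -> dict:
    """Return the Pearson correlation between age and blood glucose for each gender."""
    return {
        gender: part["age"].corr(part["blood_glucose_level"])
        for gender, part in df.groupby("gender")
    }


# Made-up sample (values are NOT real data)
sample = pd.DataFrame({
    "gender": ["Female", "Female", "Female", "Male", "Male", "Male"],
    "age": [20, 40, 60, 30, 50, 70],
    "blood_glucose_level": [90, 100, 110, 100, 95, 130],
})

corr_by_gender = age_glucose_corr_by_gender(sample)
print(corr_by_gender)
```

Similar coefficients across genders would suggest that the age trend is broadly shared, while a clear gap would support the observation that gender plays a role.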


**MANAGERIAL IMPLICATION:**
1. **Holistic Health Assessments:** To enhance the understanding of factors influencing blood glucose levels, managers should consider implementing holistic health assessments. Beyond age and gender, incorporating information on variables such as BMI categories, smoking history, medication use, and family history of diabetes can provide a more comprehensive view. This approach can guide personalized healthcare strategies and interventions.

2. **Collaborative Care Models:** Understanding the limitations of the analysis, managers can promote collaborative care models that involve multidisciplinary teams. Integrating data from various health dimensions and leveraging the expertise of healthcare professionals can enhance the accuracy of risk assessments and enable more informed decision-making for patient care.

3. **Health Education Programs:** Organizations can implement health education programs to raise awareness about the multifaceted nature of blood glucose regulation. Educating individuals about lifestyle factors, the importance of regular health screenings, and the role of genetics can empower them to make informed choices for better health outcomes.

4. **Technology Integration:** Leveraging technology for continuous monitoring and data collection can provide real-time insights into individual health profiles. Managers can explore the integration of wearable devices and health apps to track and manage blood glucose levels, fostering a proactive approach to health management.

5. **Research and Continuous Improvement:** Encouraging ongoing research and continuous improvement initiatives is vital. Managers can support initiatives that investigate the nuanced factors influencing blood glucose levels, facilitating the development of evidence-based strategies and interventions.

Dataset: diabetes_prediction_dataset.csv, from Kaggle (https://www.kaggle.com/)

Code correction and assistance: ChatGPT (https://chat.openai.com/)