Python 3.10.4 and the NumPy, pandas, Matplotlib, Seaborn, and scikit-learn libraries are sufficient to run the code in the notebook.
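To confirm the environment before opening the notebook, a quick check along these lines may help. It is only a sketch: the package names are the PyPI distribution names assumed for the libraries listed above, and `installed_versions` is a hypothetical helper, not part of the project.

```python
import sys
from importlib import metadata

# PyPI distribution names assumed for the libraries listed above.
REQUIRED = ["numpy", "pandas", "matplotlib", "seaborn", "scikit-learn"]


def installed_versions(packages=REQUIRED):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Print the interpreter version, then one line per required package.
    print("Python", sys.version.split()[0])
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED needs to be installed before the notebook will run.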
I wanted to build a model that predicts the likelihood that a person has Diabetes Mellitus (DM). From a public health perspective, such a model can help identify populations at high risk of developing DM, so that preventive measures can be taken to reduce the incidence of the disease in those populations.
The dataset used for this project is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The data was sourced from the 'Diabetes Dataset' by Mehmet Akturk, which can be accessed on the Kaggle platform at the following link.
There is one notebook file, which contains the exploratory data analysis and the model building. Markdown cells walk through each step of the process.
The main findings and the answers to the questions can be found in a blog post here.
Credit goes to Kaggle for the data. The licensing for the data and other descriptive information can be found at the Kaggle link available here.