This repository contains a detailed analysis of the Pima Indians Diabetes Database found on Kaggle. Both predictive and descriptive analyses were performed, using various algorithms and information about diabetes found in papers online. The repository will be updated frequently to implement new algorithms or ideas; it can therefore be viewed as a proof of principle of sorts.
- `diabetes.csv` contains:
  - 8 medical predictor variables: pregnancies, glucose, blood pressure, skin thickness, insulin, BMI, diabetes pedigree function and age
  - One target variable: outcome
  - Data from 768 female patients
- `*.ipynb` files are Jupyter notebooks that document the research
- `utils.py` contains all functions used for analysis
- `environment.yml` is used to create a conda environment
- Report: main analysis and discussion
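
The dataset layout described above (eight predictors plus one target) can be sketched as a small loader. This is only an illustration, not the code in `utils.py`: the function name `load_diabetes` is hypothetical, and the column names are assumed to match the header of the Kaggle CSV.

```python
import csv

# Column names assumed to match the header row of the Kaggle CSV.
PREDICTORS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]
TARGET = "Outcome"


def load_diabetes(source):
    """Hypothetical helper: read the CSV and split it into predictors and target.

    `source` is either a file path (str) or an already open text file.
    Returns (X, y), where X is a list of 8-element float rows and y is a
    list of integer outcomes.
    """
    if isinstance(source, str):
        with open(source, newline="") as handle:
            rows = list(csv.DictReader(handle))
    else:
        rows = list(csv.DictReader(source))

    X = [[float(row[name]) for name in PREDICTORS] for row in rows]
    y = [int(row[TARGET]) for row in rows]
    return X, y
```

Calling `load_diabetes("diabetes.csv")` from the project root should return 768 rows if the file matches the description above.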
To view the notebooks, run `jupyter notebook` from the root directory of the project.
Special thanks to the Takeda Data Challenge, which took place in June 2018; it inspired me to work on this dataset extensively and helped me greatly in finding my strengths and weaknesses.