Final project for Udacity's Intro to Data Analysis course, part of the Data Analyst Nanodegree.
Research question: Which features (columns) are important for predicting whether a patient will fail to show up for their scheduled medical appointment?
This analysis covers exploratory data analysis (EDA), data wrangling, and hyperparameter tuning of a gradient boosting machine (GBM) model. It is all done in Python using sklearn, matplotlib, pandas, and numpy.
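
For readers unfamiliar with the tuning step, here is a minimal sketch of a grid search over a GBM in sklearn, followed by a look at feature importances. It is not the notebook's code. The data is a synthetic stand-in generated with `make_classification`, and the parameter grid, the `roc_auc` scoring choice, and all variable names are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (assumption): the notebook works on the
# appointments dataset instead.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Hypothetical search grid; the values tuned in the notebook may differ.
param_grid = {
    "n_estimators": [50, 100],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
}

# Try every combination in the grid with 3-fold cross-validation.
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="roc_auc",
)
search.fit(X_train, y_train)

# The refitted best model exposes one importance score per feature.
best_model = search.best_estimator_
importances = best_model.feature_importances_

print("Best parameters:", search.best_params_)
print("Test accuracy:", best_model.score(X_test, y_test))
print("Feature importances:", importances.round(3))
```

Ranking the columns by `feature_importances_` is one common way to approach the research question above, though impurity-based importances can favor high-cardinality features, so they are best read as a rough guide.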
Open the Jupyter notebook by clicking on "project_submission.ipynb" or by following this link: https://github.com/AmmarJawad/No-show-Medical-Appointments/blob/master/project_submission.ipynb
The original dataset can be downloaded here: https://www.kaggle.com/joniarroba/noshowappointments/data
Udacity's version of the dataset (on which this analysis was conducted) can be downloaded here: https://d17h27t6h515a5.cloudfront.net/topher/2017/October/59dd2e9a_noshowappointments-kagglev2-may-2016/noshowappointments-kagglev2-may-2016.csv
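
Once the CSV is downloaded, it can be loaded with pandas. The sketch below is an illustration, not the notebook's wrangling code. It assumes the file has `ScheduledDay`, `AppointmentDay`, and `No-show` columns, with `No-show` holding "Yes"/"No"; check the header of your copy. The helper `load_appointments`, the derived columns `no_show` and `days_waiting`, and the two sample rows are all made up for the example.

```python
import io

import pandas as pd


def load_appointments(source):
    """Load the appointments CSV from a file path or file-like object.

    Column names are assumptions about the dataset's header.
    """
    df = pd.read_csv(source, parse_dates=["ScheduledDay", "AppointmentDay"])
    # Encode the target as 1 = patient did not show up, 0 = patient attended.
    df["no_show"] = (df["No-show"] == "Yes").astype(int)
    # Whole days between booking the appointment and the appointment itself.
    scheduled = df["ScheduledDay"].dt.normalize()
    appointment = df["AppointmentDay"].dt.normalize()
    df["days_waiting"] = (appointment - scheduled).dt.days
    return df


# Two made-up rows stand in for the real file so the sketch runs offline.
sample_csv = io.StringIO(
    "ScheduledDay,AppointmentDay,Age,No-show\n"
    "2016-04-27T08:00:00Z,2016-04-29T00:00:00Z,30,No\n"
    "2016-04-29T16:00:00Z,2016-04-29T00:00:00Z,62,Yes\n"
)
appointments = load_appointments(sample_csv)
print(appointments[["Age", "no_show", "days_waiting"]])
```

To use the real data, pass the path of the downloaded file instead of `sample_csv`, for example `load_appointments("noshowappointments-kagglev2-may-2016.csv")`.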