# Medical Appointment No Shows Capstone

# 1. Problem Identification
- Identify the correct problem to solve.

___

<img src="img/doc.jpg" alt="pic" style="width: 650px;"/>

___

##  <font color='blue'>Problem Statement</font>
<strong>What are the top 5 attributes among Highlands Hospital’s patients that cause them to fail to show up for their scheduled appointments?</strong>

## Summary
<p>Did you know that, despite recent technological advancements and medical innovations, studies show that the US healthcare system loses \$150 billion annually due to patient no-shows? A 'no-show' is when a patient fails to appear for a scheduled appointment or cancels at the last minute. Private clinics are also taking a hit: research shows that physicians can lose on average \$200 per unused time slot. Patients are harmed as well, since their illnesses go untreated, their preventative services are delayed, and their medication intake isn’t monitored.</p>

## Overview
<p>As a new data analyst intern in the business intelligence department at Highlands Hospital, I’ve been assigned to work with a team of data scientists to identify why patients aren’t showing up for their appointments. We will first practice on a dataset from Brazilian hospitals, analyzing factors that may indicate which types of patients are likely to miss their appointments at Highlands Hospital.
The features in the dataset that we expect to be most helpful in understanding why patients miss their appointments are the time of day the appointment was scheduled, the text reminders, and the patient’s health conditions (diabetes, alcoholism, handicap, hypertension). The neighborhood and scholarship features may offer some insight, but they may not generalize to all hospitals and clinics.</p>


## Context
Highlands Hospital, a reputable hospital in the city, has been losing revenue because patients do not show up for their scheduled appointments. The CEO is well aware of the problem and wants to reduce no-shows by determining the most and least common reasons patients miss their appointments.


## Criteria for Success
Build an ML model that can accurately predict which types of patients are likely to miss their scheduled appointments. The model will be applied to the data from Brazilian hospitals, and its results will provide insights for Highlands Hospital.

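Before any model is trained, a quick first look at which columns relate to the target can come from simple correlations. The sketch below is not the ML model these criteria call for; it is a rough baseline that ranks numeric columns by absolute Pearson correlation with a 0/1 target. The function name `rank_by_correlation`, the `Missed` column, and all values in the toy frame are assumptions made up for illustration.

```python
import pandas as pd


def rank_by_correlation(df: pd.DataFrame, target: str, top_n: int = 5) -> pd.Series:
    """Rank numeric columns by absolute Pearson correlation with `target`."""
    numeric = df.select_dtypes(include="number")
    # Correlation of every numeric column with the target, minus the target itself
    corr_with_target = numeric.corr()[target].drop(labels=target)
    return corr_with_target.abs().sort_values(ascending=False).head(top_n)


# Toy data (invented values, not taken from the real dataset).
# `Missed` is a hypothetical 0/1 flag: 1 means the patient did not show up.
toy = pd.DataFrame({
    "Age":          [33, 53, 67, 59, 29, 41],
    "Hipertension": [0, 0, 1, 1, 0, 0],
    "SMS_received": [1, 0, 0, 1, 0, 1],
    "Missed":       [0, 1, 0, 0, 1, 0],
})

ranking = rank_by_correlation(toy, target="Missed")
print(ranking)
```

Correlation only captures linear, one-column-at-a-time relationships, so a ranking like this is best treated as a sanity check to compare against the feature importances of whatever model is eventually fit.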

## Scope of Solution Space
Determine the top 5 features that are most useful for predicting no-shows. Determine whether the text message service helps prevent no-shows and whether to remove that service from the Electronic Medical Record (EMR) software. Build additional features from the existing data (feature engineering) and perform exploratory data analysis.

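As a minimal sketch of the feature engineering and the text message check described above, the code below derives a few columns from the two date fields and compares no-show rates with and without an SMS. The derived column names (`WaitingDays`, `AppointmentWeekday`, `ScheduledHour`, `MissedAppointment`) are hypothetical, and the toy outcomes are invented for illustration, so the printed rates say nothing about the real data.

```python
import pandas as pd

# Toy frame using the raw CSV column names. The No-show outcomes are made up.
toy = pd.DataFrame({
    "ScheduledDay":   ["2016-04-15T08:39:47Z", "2016-05-05T07:21:31Z",
                       "2016-05-16T17:23:19Z", "2016-06-03T07:22:19Z"],
    "AppointmentDay": ["2016-05-12T00:00:00Z", "2016-05-16T00:00:00Z",
                       "2016-05-30T00:00:00Z", "2016-06-07T00:00:00Z"],
    "SMS_received":   [1, 0, 0, 1],
    "No-show":        ["No", "Yes", "No", "No"],
})


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `df` with a few engineered columns (names are assumptions)."""
    out = df.copy()
    scheduled = pd.to_datetime(out["ScheduledDay"])
    appointment = pd.to_datetime(out["AppointmentDay"])
    # Whole days between booking and the appointment, ignoring time of day
    out["WaitingDays"] = (appointment.dt.normalize() - scheduled.dt.normalize()).dt.days
    out["AppointmentWeekday"] = appointment.dt.day_name()
    # Hour at which the booking was made
    out["ScheduledHour"] = scheduled.dt.hour
    # Assumes "Yes" in No-show means the patient did not show up
    out["MissedAppointment"] = (out["No-show"] == "Yes").astype(int)
    return out


features = add_features(toy)

# Share of missed appointments, split by whether an SMS was received
sms_rates = features.groupby("SMS_received")["MissedAppointment"].mean()
print(sms_rates)
```

A raw comparison like `sms_rates` does not show that the messages cause better attendance. Reminders may only be sent for appointments booked well in advance, so a fair comparison would at least control for `WaitingDays`.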

## Constraints within Solution Space
Because we are working with only 14 features, other factors that affect no-shows may go unobserved. Data on other patient outreach methods, such as email, voice messages, and portal notifications, is not available, although it would have been valuable to have. We don’t currently have the resources to gather new data in a timely manner.


## Stakeholders to Provide Key Insight
- Russell Manning - CEO
- Stephanie Clay - CTO
- Patrick Manson - Tech Lead 

## Data Acquisition and Key Data Sources
Joni Hoppen, a data scientist from Aquarela Advanced Analytics, provided us with one CSV file, data2016.csv. It is the only dataset available.

https://www.kaggle.com/joniarroba/noshowappointments

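The following table lists all 14 features of the dataset with their descriptions. The raw CSV headers differ slightly in spelling from the names used here (for example, Hipertension, Handcap, SMS_received, and No-show).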

| | <strong>Features</strong> | <strong>Description</strong> |
|------|------:|------|
| 1 | PatientId | Identification of a patient|
| 2 | AppointmentId | Identification of each appointment |
| 3 | Gender | Male or Female|
| 4 | ScheduledDay | The day someone called or registered the appointment. |
| 5 | AppointmentDay | The day of the actual appointment. |
| 6 | Age | How old the patient is. | 
| 7 | Neighborhood | Where the appointment takes place. |
| 8 | Scholarship | (True or False) Whether the patient is enrolled in the Bolsa Família welfare program: https://en.wikipedia.org/wiki/Bolsa_Fam%C3%ADlia |
| 9 | Hypertension| (True or False)|
| 10 | Diabetes | (True or False)|
| 11 | Alcoholism | (True or False)|
| 12 | Handicap | (True or False)|
| 13 | SMSSent | Whether 1 or more messages were sent to the patient (1 or 0) |
| 14 | NoShow | Whether the patient missed the appointment (Yes = did not show up, No = showed up) |

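Since the feature names in the table differ from the raw CSV headers, one way to reconcile them is a rename step right after loading. The sketch below assumes the table's names are the intended ones; the mapping `RENAME_MAP` and the helper `tidy_columns` are hypothetical and not part of the original notebook.

```python
import io

import pandas as pd

# Raw CSV header -> name used in the feature table (the mapping is an assumption)
RENAME_MAP = {
    "AppointmentID": "AppointmentId",
    "Neighbourhood": "Neighborhood",
    "Hipertension": "Hypertension",
    "Handcap": "Handicap",
    "SMS_received": "SMSSent",
    "No-show": "NoShow",
}


def tidy_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a renamed copy with both date columns parsed as timestamps."""
    out = df.rename(columns=RENAME_MAP)
    for col in ("ScheduledDay", "AppointmentDay"):
        out[col] = pd.to_datetime(out[col])
    return out


# Two rows in the same layout as the dataset, used in place of data2016.csv
sample_csv = io.StringIO(
    "PatientId,AppointmentID,Gender,ScheduledDay,AppointmentDay,Age,Neighbourhood,"
    "Scholarship,Hipertension,Diabetes,Alcoholism,Handcap,SMS_received,No-show\n"
    "5496198000000.0,5587429,M,2016-04-15T08:39:47Z,2016-05-12T00:00:00Z,33,"
    "RESISTÊNCIA,0,0,0,0,0,1,No\n"
    "6228594000.0,5661294,F,2016-05-05T07:21:31Z,2016-05-16T00:00:00Z,53,"
    "ESTRELINHA,0,0,0,0,0,0,No\n"
)

tidy = tidy_columns(pd.read_csv(sample_csv))
print(list(tidy.columns))
```

Keeping the mapping in one dictionary makes it easy to apply the same renaming to any later extract with the same raw headers.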



#### Below is a snippet of the dataset.

In [22]:
import pandas as pd
df = pd.read_csv('data2016.csv')
df.sample(5)

| Unnamed: 0 | PatientId | AppointmentID | Gender | ScheduledDay | AppointmentDay | Age | Neighbourhood | Scholarship | Hipertension | Diabetes | Alcoholism | Handcap | SMS_received | No-show |
|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|
| 17146 | 5496198000000.0 | 5587429 | M | 2016-04-15T08:39:47Z | 2016-05-12T00:00:00Z | 33 | RESISTÊNCIA | 0 | 0 | 0 | 0 | 0 | 1 | No |
| 25859 | 6228594000.0 | 5661294 | F | 2016-05-05T07:21:31Z | 2016-05-16T00:00:00Z | 53 | ESTRELINHA | 0 | 0 | 0 | 0 | 0 | 0 | No |
| 46931 | 58916360000000.0 | 5705010 | M | 2016-05-16T17:23:19Z | 2016-05-30T00:00:00Z | 67 | MARUÍPE | 0 | 1 | 0 | 0 | 0 | 0 | No |
| 99618 | 3653988000000.0 | 5768153 | F | 2016-06-03T07:22:19Z | 2016-06-07T00:00:00Z | 59 | GOIABEIRAS | 0 | 1 | 0 | 0 | 0 | 1 | No |
| 70039 | 843841900000.0 | 5695687 | F | 2016-05-13T10:43:47Z | 2016-05-16T00:00:00Z | 29 | SÃO PEDRO | 1 | 0 | 0 | 0 | 0 | 0 | No |


In [23]:
print("There are {}(Rows) Appointments scheduled in this dataset".format(df.shape[0]))

There are 110527(Rows) Appointments scheduled in this dataset


___