# Reducing hospital readmissions

## 📖 Background
You work for a consulting company helping a hospital group better understand patient readmissions. The hospital gave you access to ten years of information on patients readmitted to the hospital after being discharged. The doctors want you to assess if initial diagnoses, number of procedures, or other variables could help them better understand the probability of readmission. 

They want to focus follow-up calls and attention on those patients with a higher probability of readmission.

In [2]:
import os

git_author_email = os.environ["GIT_AUTHOR_EMAIL"]
git_author_name = os.environ["GIT_AUTHOR_NAME"]
git_committer_email = os.environ["GIT_COMMITTER_EMAIL"]
git_committer_name = os.environ["GIT_COMMITTER_NAME"]
git_username = os.environ["GIT_USERNAME"]
git_password = os.environ["GIT_PASSWORD"]

## 💾 The data
You have access to ten years of patient information ([source](https://archive.ics.uci.edu/ml/datasets/Diabetes+130-US+hospitals+for+years+1999-2008)):

#### Information in the file
- "age" - age bracket of the patient
- "time_in_hospital" - days (from 1 to 14)
- "n_procedures" - number of procedures performed during the hospital stay
- "n_lab_procedures" - number of laboratory procedures performed during the hospital stay
- "n_medications" - number of medications administered during the hospital stay
- "n_outpatient" - number of outpatient visits in the year before a hospital stay
- "n_inpatient" - number of inpatient visits in the year before the hospital stay
- "n_emergency" - number of visits to the emergency room in the year before the hospital stay
- "medical_specialty" - the specialty of the admitting physician
- "diag_1" - primary diagnosis (Circulatory, Respiratory, Digestive, etc.)
- "diag_2" - secondary diagnosis
- "diag_3" - additional secondary diagnosis
- "glucose_test" - whether the glucose serum came out as high (> 200), normal, or not performed
- "A1Ctest" - whether the A1C level of the patient came out as high (> 7%), normal, or not performed
- "change" - whether there was a change in the diabetes medication ('yes' or 'no')
- "diabetes_med" - whether a diabetes medication was prescribed ('yes' or 'no')
- "readmitted" - whether the patient was readmitted to the hospital ('yes' or 'no')

***Acknowledgments**: Beata Strack, Jonathan P. DeShazo, Chris Gennings, Juan L. Olmo, Sebastian Ventura, Krzysztof J. Cios, and John N. Clore, "Impact of HbA1c Measurement on Hospital Readmission Rates: Analysis of 70,000 Clinical Database Patient Records," BioMed Research International, vol. 2014, Article ID 781670, 11 pages, 2014.*

In [9]:
import pandas as pd
df = pd.read_csv("~/workspace/reducing-hospital-readmissions/data/hospital_readmissions.csv")
df.head()

Unnamed: 0,age,time_in_hospital,n_lab_procedures,n_procedures,n_medications,n_outpatient,n_inpatient,n_emergency,medical_specialty,diag_1,diag_2,diag_3,glucose_test,A1Ctest,change,diabetes_med,readmitted
0,[70-80),8,72,1,18,2,0,0,Missing,Circulatory,Respiratory,Other,no,no,no,yes,no
1,[70-80),3,34,2,13,0,0,0,Other,Other,Other,Other,no,no,no,yes,no
2,[50-60),5,45,0,18,0,0,0,Missing,Circulatory,Circulatory,Circulatory,no,no,yes,yes,yes
3,[70-80),2,36,0,12,1,0,0,Missing,Circulatory,Other,Diabetes,no,no,yes,yes,yes
4,[60-70),1,42,0,7,0,0,0,InternalMedicine,Other,Circulatory,Respiratory,no,no,no,yes,no
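The `age` column holds bracket labels such as `[70-80)` rather than numbers. A minimal sketch, assuming a toy series in place of the real column, of turning those labels into an ordered categorical so that later group-bys and plots come out in age order. The helper `lower_bound` and the sample values are illustrative and not part of the original notebook.

```python
import pandas as pd

# Toy stand-in for df["age"]; the labels follow the bracket format shown above.
ages = pd.Series(["[70-80)", "[50-60)", "[40-50)", "[70-80)", "[90-100)"], name="age")


def lower_bound(bracket: str) -> int:
    """Return the numeric lower bound of a bracket label, e.g. "[70-80)" -> 70."""
    return int(bracket.strip("[)").split("-")[0])


# Sort the distinct brackets by their lower bound.
ordered_brackets = sorted(ages.unique(), key=lower_bound)

# An ordered categorical keeps age groups in ascending order when grouping or plotting.
age_cat = pd.Categorical(ages, categories=ordered_brackets, ordered=True)
print(list(age_cat.categories))
```

On the real data the same idea would be applied to `df["age"]`, for example by assigning the result to a new column instead of overwriting the original.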


## 💪 Competition challenge
Create a report that covers the following:

1. What is the most common primary diagnosis by age group? 
2. Some doctors believe diabetes might play a central role in readmission. Explore the effect of a diabetes diagnosis on readmission rates. 
3. On what groups of patients should the hospital focus their follow-up efforts to better monitor patients with a high probability of readmission?

## 🧑‍⚖️ Judging criteria

| CATEGORY | WEIGHTING | DETAILS                                                              |
|:---------|:----------|:---------------------------------------------------------------------|
| **Recommendations** | 35%       | <ul><li>Clarity of recommendations - how clear and well presented the recommendation is.</li><li>Quality of recommendations - are appropriate analytical techniques used & are the conclusions valid?</li><li>Number of relevant insights found for the target audience.</li></ul>       |
| **Storytelling**  | 35%       | <ul><li>How well the data and insights are connected to the recommendation.</li><li>How the narrative and whole report connects together.</li><li>Balancing making the report in-depth enough but also concise.</li></ul> |
| **Visualizations** | 20% | <ul><li>Appropriateness of visualization used.</li><li>Clarity of insight from visualization.</li></ul> |
| **Votes** | 10% | <ul><li>Up voting - most upvoted entries get the most points.</li></ul> |

## ✅ Checklist before publishing into the competition
- Rename your workspace to make it descriptive of your work. N.B. you should leave the notebook name as notebook.ipynb.
- **Remove redundant cells** like the judging criteria, so the workbook is focused on your story.
- Make sure the workbook reads well and explains how you found your insights. 
- Try to include an **executive summary** of your recommendations at the beginning.
- Check that all the cells run without error.

## ⌛️ Time is ticking. Good luck!

In [41]:
df_group_diagnosis_by_age = (
    df.groupby(["age", "diag_1"], sort=False)
    .size()
    .reset_index(name="counts_diag_1_per_age")
)

df_group_diagnosis_by_age.head(10)

Unnamed: 0,age,diag_1,counts_diag_1_per_age
0,[70-80),Circulatory,2392
1,[70-80),Other,1693
2,[50-60),Circulatory,1256
3,[60-70),Other,1402
4,[40-50),Other,750
5,[50-60),Injury,273
6,[60-70),Circulatory,1962
7,[80-90),Digestive,402
8,[70-80),Respiratory,964
9,[70-80),Injury,444
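Raw counts like the ones above are hard to compare across age groups of different sizes. One possible alternative, sketched here on made-up rows rather than the real data, is a row-normalised cross-tabulation, which gives the share of each primary diagnosis within an age group and lets `idxmax` pick the most common one. The frame `toy` and the names `shares` and `top` are assumptions for illustration.

```python
import pandas as pd

# Toy rows standing in for the `age` and `diag_1` columns (not the real data).
toy = pd.DataFrame({
    "age": ["[40-50)", "[40-50)", "[40-50)", "[70-80)", "[70-80)", "[70-80)", "[70-80)"],
    "diag_1": ["Other", "Other", "Circulatory", "Circulatory", "Circulatory", "Respiratory", "Other"],
})

# Rows: age groups; columns: diagnoses; values: share of each diagnosis within the age group.
shares = pd.crosstab(toy["age"], toy["diag_1"], normalize="index")

top = pd.DataFrame({
    "top_diag_1": shares.idxmax(axis=1),  # most common primary diagnosis per age group
    "share": shares.max(axis=1),          # its share within that age group
})
print(top)
```

Note that `idxmax` returns a single label per row, so a tie between two diagnoses would be resolved silently in favour of the first column.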


In [65]:
# Rank within each age group; ranking within (age, diag_1) makes every row rank 1.
ranks = df_group_diagnosis_by_age.groupby("age")["counts_diag_1_per_age"].rank(
    method="min", ascending=False)
df_group_diagnosis_by_age_first = df_group_diagnosis_by_age[ranks == 1]

df_group_diagnosis_by_age_first.head()


In [2]:
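def flag_diabetes_rates(diag_1, readmitted):
    # Flag patients whose primary diagnosis is diabetes AND who were readmitted.
    return 'True' if diag_1 == 'Diabetes' and readmitted == 'yes' else 'False'

# Work on a copy so the new column is not added to df itself.
df_diabetes = df.copy()

df_diabetes['diabetes_rates'] = df_diabetes.apply(lambda x: flag_diabetes_rates(x.diag_1, x.readmitted), axis=1)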

df_diabetes.head(5)

Unnamed: 0,age,time_in_hospital,n_lab_procedures,n_procedures,n_medications,n_outpatient,n_inpatient,n_emergency,medical_specialty,diag_1,diag_2,diag_3,glucose_test,A1Ctest,change,diabetes_med,readmitted,diabetes_rates
0,[70-80),8,72,1,18,2,0,0,Missing,Circulatory,Respiratory,Other,no,no,no,yes,no,False
1,[70-80),3,34,2,13,0,0,0,Other,Other,Other,Other,no,no,no,yes,no,False
2,[50-60),5,45,0,18,0,0,0,Missing,Circulatory,Circulatory,Circulatory,no,no,yes,yes,yes,False
3,[70-80),2,36,0,12,1,0,0,Missing,Circulatory,Other,Diabetes,no,no,yes,yes,yes,False
4,[60-70),1,42,0,7,0,0,0,InternalMedicine,Other,Circulatory,Respiratory,no,no,no,yes,no,False


In [3]:
def get_diabetes_rates_percentage(df):
    # Percentage of ALL patients (not only diabetes patients) who have
    # diabetes as the primary diagnosis and were readmitted.
    count_true = int((df['diabetes_rates'] == 'True').sum())
    count_all = len(df)

    return (count_true / count_all) * 100

print(get_diabetes_rates_percentage(df_diabetes))

3.7479999999999998
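The 3.748% above is the share of all patients who both had diabetes as their primary diagnosis and were readmitted. To explore whether a diabetes diagnosis affects readmission, it may be more informative to compare the readmission rate among diabetes patients with the rate among everyone else. A minimal sketch on made-up rows follows; the frame `toy` and the group labels `"diabetes"` and `"other"` are assumptions for illustration, and the numbers say nothing about the real data.

```python
import pandas as pd

# Toy rows standing in for the `diag_1` and `readmitted` columns (not the real data).
toy = pd.DataFrame({
    "diag_1": ["Diabetes", "Diabetes", "Diabetes", "Diabetes", "Other",
               "Other", "Circulatory", "Circulatory", "Circulatory", "Circulatory"],
    "readmitted": ["yes", "yes", "no", "no", "no", "yes", "no", "no", "no", "yes"],
})

is_diabetes = toy["diag_1"].eq("Diabetes")
is_readmitted = toy["readmitted"].eq("yes")

# Joint share: diabetes-primary AND readmitted, out of all patients.
joint_share = (is_diabetes & is_readmitted).mean() * 100

# Conditional rates: share of readmitted patients within each group.
group = is_diabetes.map({True: "diabetes", False: "other"})
rate_by_group = is_readmitted.groupby(group).mean() * 100

print(f"joint share: {joint_share:.1f}%")
print(rate_by_group.round(1))
```

The joint share depends on how many diabetes patients there are in the first place, while the conditional rates can be compared directly between the two groups.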


In [18]:
# On what groups of patients should the hospital focus their follow-up efforts to better monitor patients with a high probability of readmission?

# Which groups of patients have the most impact on readmission rates, and what proportion do they represent?

df.loc[(df["readmitted"] == "yes")].head(30)

# 1 df.loc[(df["readmitted"] == "yes") & (df["n_lab_procedures"] == 0)].head()
# first criterion for patients

# 1 df.loc[(df["readmitted"] == "yes") & (df["diabetes_med"] == 'yes')].head()
# first criterion for patients

# 2 df.loc[(df["readmitted"] == "yes") & (df["diabetes_med"] == 'no') & (df["medical_specialty"].isin(['Missing', 'InternalMedicine', 'Emergency/Trauma', 'Cardiology', 'Family/GeneralPractice', 'Surgery', 'Other']))].head()
# second criterion for patients



Unnamed: 0,age,time_in_hospital,n_lab_procedures,n_procedures,n_medications,n_outpatient,n_inpatient,n_emergency,medical_specialty,diag_1,diag_2,diag_3,glucose_test,A1Ctest,change,diabetes_med,readmitted
2,[50-60),5,45,0,18,0,0,0,Missing,Circulatory,Circulatory,Circulatory,no,no,yes,yes,yes
3,[70-80),2,36,0,12,1,0,0,Missing,Circulatory,Other,Diabetes,no,no,yes,yes,yes
5,[40-50),2,51,0,10,0,0,0,Missing,Other,Other,Other,no,no,no,no,yes
7,[60-70),1,19,6,16,0,0,1,Other,Circulatory,Other,Other,no,no,no,yes,yes
8,[80-90),4,67,3,13,0,0,0,InternalMedicine,Digestive,Other,Other,no,no,no,no,yes
15,[80-90),8,52,0,20,0,0,0,Missing,Other,Circulatory,Other,no,no,yes,yes,yes
16,[70-80),3,52,0,10,0,0,0,Other,Circulatory,Other,Diabetes,no,no,no,yes,yes
18,[40-50),7,72,0,13,0,0,0,InternalMedicine,Diabetes,Other,Other,no,high,no,yes,yes
22,[70-80),3,69,1,13,0,0,0,Missing,Circulatory,Circulatory,Diabetes,no,no,no,yes,yes
26,[80-90),4,40,0,8,0,0,0,Missing,Digestive,Other,Diabetes,no,no,no,no,yes


In [17]:
df.loc[(df["readmitted"] == "yes") & (df["n_lab_procedures"] >= 100)].head()

Unnamed: 0,age,time_in_hospital,n_lab_procedures,n_procedures,n_medications,n_outpatient,n_inpatient,n_emergency,medical_specialty,diag_1,diag_2,diag_3,glucose_test,A1Ctest,change,diabetes_med,readmitted
675,[70-80),10,108,3,26,0,0,1,Missing,Other,Circulatory,Respiratory,no,high,yes,yes,yes
1120,[60-70),13,100,1,26,2,1,0,Missing,Circulatory,Other,Circulatory,no,no,no,no,yes
5892,[60-70),9,113,2,17,0,1,0,Family/GeneralPractice,Other,Circulatory,Other,no,no,yes,yes,yes
7316,[50-60),11,101,6,38,1,0,1,Missing,Other,Respiratory,Other,no,no,yes,yes,yes
19993,[80-90),7,105,3,16,0,0,0,Missing,Circulatory,Diabetes,Other,high,high,no,yes,yes


In [22]:
df.loc[(df["readmitted"] == "yes") & (df["diabetes_med"] == 'no')].head(5)

Unnamed: 0,age,time_in_hospital,n_lab_procedures,n_procedures,n_medications,n_outpatient,n_inpatient,n_emergency,medical_specialty,diag_1,diag_2,diag_3,glucose_test,A1Ctest,change,diabetes_med,readmitted
5,[40-50),2,51,0,10,0,0,0,Missing,Other,Other,Other,no,no,no,no,yes
8,[80-90),4,67,3,13,0,0,0,InternalMedicine,Digestive,Other,Other,no,no,no,no,yes
26,[80-90),4,40,0,8,0,0,0,Missing,Digestive,Other,Diabetes,no,no,no,no,yes
38,[60-70),1,1,0,5,0,0,0,Emergency/Trauma,Diabetes,Digestive,Other,no,no,no,no,yes
44,[50-60),4,42,0,9,0,0,0,Missing,Digestive,Digestive,Digestive,no,no,no,no,yes
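Filtering to readmitted patients, as in the cells above, shows who was readmitted but not how likely each group is to be readmitted. One way to rank candidate groups for follow-up is to compute the readmission rate and the patient count per group. The sketch below uses made-up rows and splits on prior inpatient visits purely as an example; the column `prior_inpatient` and the choice of `n_inpatient` as the grouping variable are assumptions, not findings from the data.

```python
import pandas as pd

# Toy rows standing in for the `n_inpatient` and `readmitted` columns (not the real data).
toy = pd.DataFrame({
    "n_inpatient": [0, 0, 0, 0, 1, 1, 2, 3, 0, 2],
    "readmitted": ["no", "no", "yes", "no", "yes", "no", "yes", "yes", "no", "yes"],
})

toy["readmitted_flag"] = toy["readmitted"].eq("yes")
# Hypothetical grouping: any inpatient visit in the prior year versus none.
toy["prior_inpatient"] = toy["n_inpatient"].gt(0).map({True: "1+", False: "0"})

summary = (
    toy.groupby("prior_inpatient")["readmitted_flag"]
    .agg(patients="count", readmission_rate="mean")
    .sort_values("readmission_rate", ascending=False)
)
print(summary)
```

Keeping the `patients` count next to the rate helps avoid prioritising a group whose high rate rests on only a handful of patients. The same pattern should work for other columns such as `diabetes_med`, `medical_specialty`, or a binned `n_lab_procedures`.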
