## **Hospitals Satisfaction Survey - 2016 🏥**

#### This dataset contains the results of a patient satisfaction survey conducted at hospitals in Israel in 2016. It includes the responses of around 11,100 patients who were hospitalized in the Surgical, Internal and other wards, excluding the Female, Rehab and ICU wards.
#### *The dataset can be found at this link* : https://data.gov.il/dataset/satisfaction-hosp-general2016

##### Steps for our project :
- Open the original Excel file and make some changes before working on it here.
- Get our data loaded in and take a look to see what we are working with.
- Clean the data and transform it into the format we see fit for use later.
- Run some EDA and gain some insights.
- Visualize our data and then build a dashboard using Tableau.

#### **Step I : Make some changes to our Excel file.**

##### *The Excel file has 3 sheets: one with the data itself, another that explains what each column is about, and a last one that interprets the answers the patients submitted. Feel free to take a look.*
##### *The only thing I did was delete a bunch of columns and rename some others. Those changes live in a separate sheet that I called 'modified', so I don't mess up the raw data. I will cover more on that in a separate Medium post.*

#### **Step II : Load our data and take a look.**

##### Go to the link provided above and download the dataset. Then, after making the changes to the Excel file, add it to the project directory.

In [None]:
# Let's check the raw data first: 

import pandas as pd

raw = pd.read_excel('satisfaction-hosp-general2016.xls', sheet_name='raw')
raw

##### Yeah I know, too many columns, but rest assured, I took care of that. Those who don't speak Hebrew shouldn't lose hope yet: I will translate our data to English as we go along.

In [None]:
# Now let's see the modified version after deleting some columns.

hosp_sat = pd.read_excel('satisfaction-hosp-general2016.xls', sheet_name='modified')
hosp_sat

##### Yes, what you see is correct. We went from 143 columns to 48 columns. Pretty neat.
##### As we can see, some columns describe the patients and hospitals themselves, like gender, hospital, ward, language and age, and the rest hold the responses they gave to around 37 questions. Looks straightforward to me. Now that we know what kind of data we are working with, let's start cleaning!

#### **Step III: Clean and Transform the data.**

In [None]:
hosp_sat.info()

##### Looking at the info we got above, we need to work on the following:
- Change the datatypes of code_hospital, Miyun_or_Electiv, CHRONIC_2, HEALTH_STATUS, Q3_G, q31_G and KUPAT_HOLIM to 'object'. According to the values sheet in the Excel file, the numbers in these columns are codes that each stand for a certain answer, not quantities. Having them as object will also make it easier to work with Tableau and plot the values later.

- Deal with the null values in the following columns: Miyun_or_Electiv, CHRONIC_2, HEALTH_STATUS, Q3_G, q31_G, KUPAT_HOLIM.

In [None]:
# Firstly, let's deal with the null values. How many null values does each of the columns above contain?

null_columns = ['Miyun_or_Electiv', 'CHRONIC_2', 'HEALTH_STATUS', 'Q3_G', 'q31_G', 'KUPAT_HOLIM']

for null_column in null_columns:
    print(f"{null_column} column has {hosp_sat[null_column].isnull().sum()} null values.")
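
##### The same counts can also be obtained without a loop. Below is a minimal sketch on a made-up toy frame: the values are invented for illustration and only the column names come from our data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for hosp_sat; the values are invented for illustration.
toy = pd.DataFrame({
    'CHRONIC_2': [1, np.nan, 0, np.nan],
    'HEALTH_STATUS': [3, 2, np.nan, 5],
    'KUPAT_HOLIM': [1, 4, 2, 3],
})

# isnull() marks every missing cell; sum() then counts them per column.
null_counts = toy.isnull().sum()
print(null_counts)
```

##### On the real data, selecting the columns first (for example with the null_columns list) would limit the output to the columns we care about.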

In [None]:
# Before we start dealing with the NaN's, let's see the unique values each of these columns has:

for null_column in null_columns:
    print(f"unique values in {null_column} are : {hosp_sat[null_column].unique()}")

In [None]:
# Let's start filling the NaN's.
# We assign the result back to the column instead of calling fillna(..., inplace=True) on it, because newer pandas versions warn about in-place calls on a selected column.

# Miyun_or_Electiv : we will fill them with 0's.
hosp_sat['Miyun_or_Electiv'] = hosp_sat['Miyun_or_Electiv'].fillna(0)

# CHRONIC_2 : here we will fill with 0's as well.
hosp_sat['CHRONIC_2'] = hosp_sat['CHRONIC_2'].fillna(0)

# HEALTH_STATUS : the values in this column are ints from 1 to 5 describing the patient's general health status. We will fill with the most frequent value.
hosp_sat['HEALTH_STATUS'] = hosp_sat['HEALTH_STATUS'].fillna(hosp_sat['HEALTH_STATUS'].mode())

# Q3_G : whether they were satisfied in general with the treatment in the hospital. 1 means they were and 0 means 'else'. We will fill with 0's.
hosp_sat['Q3_G'] = hosp_sat['Q3_G'].fillna(0)

# q31_G : whether they are willing to recommend the hospital as a good place to be hospitalized. 1 means yes and 0 means 'else'. We will fill with 0's.
hosp_sat['q31_G'] = hosp_sat['q31_G'].fillna(0)

# KUPAT_HOLIM : the patient's HMO. 1-4 are the 4 HMOs in Israel and 5 is 'else'. We will fill with 5.
hosp_sat['KUPAT_HOLIM'] = hosp_sat['KUPAT_HOLIM'].fillna(5)


print(f"Number of NaN values after filling: {hosp_sat.isnull().sum().sum()}.")

##### Hmm, what did we do wrong ? The number 228 is also the number of NaN's in the HEALTH_STATUS column so let's see:

In [None]:
hosp_sat['HEALTH_STATUS'].isnull().sum()

In [None]:
# Sounds like we found the culprit. Why didn't it work ? 
hosp_sat['HEALTH_STATUS'].mode()


In [None]:
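# The mode is correct, so why didn't it fill?
# After a 5-second search on Stack Overflow I found the problem: mode() returns a Series (there can be more than one mode),
# and fillna aligns a Series by index instead of using it as a single fill value, so we need to pick the first element with [0].
# Link to it: https://stackoverflow.com/questions/42789324/how-to-pandas-fillna-with-mode-of-column

hosp_sat['HEALTH_STATUS'] = hosp_sat['HEALTH_STATUS'].fillna(hosp_sat['HEALTH_STATUS'].mode()[0])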

print(f"Number of NaN values after filling: {hosp_sat.isnull().sum().sum()}.")
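
##### To see why the first attempt left every NaN in place, here is a small sketch on an invented Series (not our real data). It compares passing the whole result of mode() with passing only its first element.

```python
import numpy as np
import pandas as pd

# Invented values; the first row is not missing, just like in our column.
s = pd.Series([2.0, np.nan, 2.0, np.nan, 3.0])

# mode() returns a Series with one row per mode, indexed from 0.
mode = s.mode()

# Passing the Series: fillna only fills rows whose index label also exists in `mode`.
# The only label in `mode` is 0, and row 0 is not missing, so nothing is filled.
wrong = s.fillna(mode)

# Passing the scalar: every missing row gets the most frequent value.
right = s.fillna(mode[0])

print(wrong.isnull().sum())
print(right.isnull().sum())
```

##### That matches what we saw above: the count of NaN's did not move until we switched to mode()[0].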

##### Now that the NaN's are dealt with, and before moving on, let's make sure that the values in all the columns are valid:

In [None]:
for column in hosp_sat.columns:
    print(f"Unique Values in {column} column are : {hosp_sat[column].unique()}")

In [None]:
# After double checking against the values sheet in the Excel file, all of them are fine except the columns 'CHOICE' and 'code_hospital'.
# 'CHOICE' should be either 0 or 1. What is that annoying 3 we see there? How many of them are there?
# And 'code_hospital' has a duplicate value. Let's start with the CHOICE column:

hosp_sat['CHOICE'].value_counts()

In [None]:
# Well, since that is a lot of 3's I can't just fill them all with either 0 or 1. But we also don't have any context for them. What we will do is use a fill technique that replaces each 3 with the valid value preceding it.
import numpy as np
hosp_sat['CHOICE'] = hosp_sat['CHOICE'].mask(hosp_sat['CHOICE'].isin([3]),hosp_sat['CHOICE'].replace(3,np.nan).ffill())

hosp_sat['CHOICE'].value_counts()
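
##### For anyone who wants to see the forward fill in isolation, here is a minimal sketch on an invented Series. It uses a slightly shorter spelling (mask followed by ffill) that should give the same result as the line above, assuming the column has no NaN's of its own.

```python
import pandas as pd

# Invented answers: 0 and 1 are valid, 3 is the invalid code.
choice = pd.Series([1, 3, 0, 3, 3, 1])

# mask() blanks out the 3's (they become NaN), then ffill() copies
# the last valid value forward into each blank.
filled = choice.mask(choice.eq(3)).ffill()
print(filled.tolist())
```

##### Two caveats: a 3 in the very first row has nothing before it and would stay NaN, and a forward fill assumes the row order carries some meaning, which we cannot confirm for this survey.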

In [None]:
# If you pay close attention to the possible values for the code_hospital the values 2.7 and 27 point to the same hospital:

%matplotlib inline
from IPython.display import Image
Image(r'C:\Users\armon\OneDrive\Desktop\Capture.png')

In [None]:
# Let's change the 2.7 to 27:

hosp_sat.loc[hosp_sat['code_hospital'] == 2.70, 'code_hospital'] = 27
hosp_sat['code_hospital'].unique()
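
##### A typo like 2.7 can also be spotted programmatically, since hospital codes should be whole numbers. This is only a sketch on invented codes, and it uses replace instead of the .loc assignment above:

```python
import pandas as pd

# Invented codes; 2.7 plays the role of the mistyped 27.
codes = pd.Series([1.0, 2.7, 27.0, 5.0, 2.7])

# Anything with a fractional part cannot be a valid code.
suspect = codes[codes % 1 != 0]
print(suspect.unique())

# Swap the mistyped value for the intended one.
fixed = codes.replace(2.7, 27)
print(fixed.unique())
```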

##### Now it's time to populate the dataframe with the actual responses instead of the numbers, while also translating the ones in Hebrew.

In [None]:
# First off, the code_hospital column:

# One condition per hospital code (1 to 28), in the same order as the names below.
conditions = [hosp_sat['code_hospital'].eq(code) for code in range(1, 29)]

choices = ['Sheba', 'Rambam', 'Wolfson', 'Ziv', 'Hillel Yaffe', 'Galilee', 'Barzilai', 'Baruch Padeh', 'Ichilov', 'Bnai Zion', 'Beilinson', 'Soroka', 'Meir', 'Kaplan', 'Emek',
            'Carmel','Hasharon', 'Yoseftal', 'Hadassah-Ein Karem', 'Hadassah-Mount Scopus', 'Nazareth Hospital EMMS', 'Holy Family', 'Shaare Zedek', 'Laniado',
            'Augusta Victoria', 'Mayanei HaYeshua', 'Shamir', 'Saint Vincent De Paul']

# The default is the string '0' (not the integer 0) so that it shares a dtype with the string choices; newer NumPy versions may raise a TypeError otherwise.
hosp_sat['code_hospital'] = np.select(conditions, choices, default='0')
hosp_sat.head(20)
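
##### As a side note, the same kind of recoding can be written with a plain dictionary and Series.map. The sketch below is an alternative, not what this notebook uses. It lists only three of the 28 hospitals, and the code 99 and the 'Unknown' label are invented to show what happens to unmatched values.

```python
import pandas as pd

# Only three of the 28 hospitals, to keep the sketch short.
hospital_names = {1: 'Sheba', 2: 'Rambam', 27: 'Shamir'}

# Invented codes; 99 has no entry in the dictionary.
codes = pd.Series([1, 27, 2, 99])

# map() looks every code up in the dictionary and returns NaN when there is no match,
# so fillna() supplies an explicit fallback label.
names = codes.map(hospital_names).fillna('Unknown')
print(names.tolist())
```

##### The dictionary keeps each code next to its label, which makes it harder to shift the two lists out of step by accident.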

In [None]:
# Let's tackle the Gender column as well:

conditions = [hosp_sat['Gender'].eq('זכר'), hosp_sat['Gender'].eq('נקבה')]
choices = ['M', 'F']
hosp_sat['Gender'] = np.select(conditions, choices, default='0')  # string default, so it shares a dtype with the string choices
hosp_sat.sample(10)

In [None]:
# Now that you get the hang of it, I will speedrun through a few more columns, leaving the questions for later.
# The default is the string '0' (not the integer 0) so that it shares a dtype with the string choices; newer NumPy versions may raise a TypeError otherwise.

conditions = [hosp_sat['Code_ward'].eq(code) for code in (1, 2, 3)]
choices = ['Internal', 'Surgical', 'Other']
hosp_sat['Code_ward'] = np.select(conditions, choices, default='0')
hosp_sat.rename({'Code_ward': 'Ward'}, axis=1, inplace=True)


conditions = [hosp_sat['SIZE_new'].eq(code) for code in (1, 2, 3)]
choices = ['Small', 'Medium', 'Big']
hosp_sat['SIZE_new'] = np.select(conditions, choices, default='0')
hosp_sat.rename({'SIZE_new': 'Hospital_size'}, axis=1, inplace=True)


conditions = [hosp_sat['Miyun_or_Electiv'].eq(code) for code in (1, 0)]
choices = ['Emergency hospitalization', 'Elective hospitalization']
hosp_sat['Miyun_or_Electiv'] = np.select(conditions, choices, default='0')
hosp_sat.rename({'Miyun_or_Electiv': 'Emergency_Or_Elective'}, axis=1, inplace=True)


conditions = [hosp_sat['CHOICE'].eq(code) for code in (1, 0)]
choices = ['Yes', 'No']
hosp_sat['CHOICE'] = np.select(conditions, choices, default='0')
hosp_sat.rename({'CHOICE': 'Can_Choose_Hosp'}, axis=1, inplace=True)


conditions = [hosp_sat['corridor1'].eq(code) for code in (1, 0)]
choices = ['No', 'Yes']
hosp_sat['corridor1'] = np.select(conditions, choices, default='0')
hosp_sat.rename({'corridor1': 'Lay_Corridor'}, axis=1, inplace=True)


conditions = [hosp_sat['CHRONIC_2'].eq(code) for code in (1, 0)]
choices = ['Yes', 'No']
hosp_sat['CHRONIC_2'] = np.select(conditions, choices, default='0')
hosp_sat.rename({'CHRONIC_2': 'Chronic'}, axis=1, inplace=True)


conditions = [hosp_sat['HEALTH_STATUS'].eq(code) for code in (1, 2, 3, 4, 5)]
choices = ['Excellent', 'Very Good', 'Good', 'Reasonable', 'Deficient']
hosp_sat['HEALTH_STATUS'] = np.select(conditions, choices, default='0')


conditions = [hosp_sat['KUPAT_HOLIM'].eq(code) for code in (1, 2, 3, 4, 5)]
choices = ['Clalit', 'Leumit', 'Meuhedet', 'Maccabi', 'Other']
hosp_sat['KUPAT_HOLIM'] = np.select(conditions, choices, default='0')


conditions = [hosp_sat['baalut'].eq(code) for code in (1, 2, 3, 5)]
choices = ['Government', 'Clalit', 'Hadassah', 'Public']
hosp_sat['baalut'] = np.select(conditions, choices, default='0')
hosp_sat.rename({'baalut': 'Hosp_Ownership'}, axis=1, inplace=True)

hosp_sat.sample(10)

##### The only things left are the questions. The plan was to replace each Q-something column title with the actual question and to replace the values in each of those columns with their interpretation. After a little thinking, we will only do the second part: the questions themselves are very long and would take too much space, so we will change the values in each column but keep the column titles as they are. I will translate the questions later and store them in a different dataframe. Let's get started:

In [None]:
# Q3 is basically the satisfaction score from 1 to 10. No need for masking here. We will only change the column name:
hosp_sat.rename({'Q3': 'sat_score'}, axis=1, inplace=True)


# Q31 is whether they would recommend the hospital to others. No need for masking here. We will only change the column name:
hosp_sat.rename({'Q31': 'would_recommend'}, axis=1, inplace=True)


# Since a lot of questions share the same answer values, I'd like to do them all at once with a loop. Let's group the columns that have similar values.
# P.S : The columns left out of the following group either have completely different values or have extra values
# whose interpretation differs from one question to another, so we will do them separately afterwards.

group = ['Q5', 'Q6', 'Q7','Q9', 'Q10', 'Q11','Q14', 'Q15','Q17','Q21_2016', 'Q22', 'Q23', 'Q24','Q27', 'Q28']

for column in group:
    conditions = [hosp_sat[column].eq(code) for code in (1, 2, 3, 4, 5, 99)]
    choices = ['Very Satisfied', 'Satisfied', 'Neutral', 'Dissatisfied', 'Very Dissatisfied', 'Do not know / irrelevant']
    # String default '0' (not the integer 0) so that it shares a dtype with the string choices.
    hosp_sat[column] = np.select(conditions, choices, default='0')



# In every block below the default is the string '0' (not the integer 0) so that it shares a dtype with the string choices.
conditions = [hosp_sat['Q4'].eq(code) for code in (1, 2, 3, 4, 5, 98, 99)]
choices = ['Very Satisfied', 'Satisfied', 'Neutral', 'Dissatisfied', 'Very Dissatisfied', 'Was not emergency hospitalization', 'Do not know / irrelevant']
hosp_sat['Q4'] = np.select(conditions, choices, default='0')

conditions = [hosp_sat['Q8'].eq(code) for code in (1, 2, 3, 4, 5, 6, 99)]
choices = ['Very Satisfied', 'Satisfied', 'Neutral', 'Dissatisfied', 'Very Dissatisfied','Did not receive explanation', 'Do not know / irrelevant']
hosp_sat['Q8'] = np.select(conditions, choices, default='0')

conditions = [hosp_sat['Q12'].eq(code) for code in (1, 2, 3, 4, 5, 6, 99)]
choices = ['Very Satisfied', 'Satisfied', 'Neutral', 'Dissatisfied', 'Very Dissatisfied','Did not receive explanation', 'Do not know / irrelevant']
hosp_sat['Q12'] = np.select(conditions, choices, default='0')

conditions = [hosp_sat['Q13'].eq(code) for code in (1, 2, 3, 4, 5, 6, 98, 99)]
choices = ['Very Satisfied', 'Satisfied', 'Neutral', 'Dissatisfied', 'Very Dissatisfied', "don't know/couldn't know", "don't know/couldn't know", 'irrelevant']
hosp_sat['Q13'] = np.select(conditions, choices, default='0')

conditions = [hosp_sat['Q16'].eq(code) for code in (1, 2, 3, 4, 5, 98, 99)]
choices = ['Very Satisfied', 'Satisfied', 'Neutral', 'Dissatisfied', 'Very Dissatisfied', "didn't suffer / didn't want to get treatment for pain", "don't know"]
hosp_sat['Q16'] = np.select(conditions, choices, default='0')

conditions = [hosp_sat['Q18'].eq(code) for code in (1, 2, 3, 4, 5, 97, 98, 99)]
choices = ['Very Satisfied', 'Satisfied', 'Neutral', 'Dissatisfied', 'Very Dissatisfied', 'Not interested in being shared with the information', 'My medical condition did not allow for sharing', "Don't know"]
hosp_sat['Q18'] = np.select(conditions, choices, default='0')

conditions = [hosp_sat['Q19'].eq(code) for code in (1, 2, 3, 4, 5, 6, 99)]
choices = ['Very Satisfied', 'Satisfied', 'Neutral', 'Dissatisfied', 'Very Dissatisfied', 'There were no alternatives', "Don't know"]
hosp_sat['Q19'] = np.select(conditions, choices, default='0')

conditions = [hosp_sat['Q20'].eq(code) for code in (1, 2, 3, 4, 99)]
choices = ['Always', 'Usually Yes', 'Usually No', 'Never', "Don't know"]
hosp_sat['Q20'] = np.select(conditions, choices, default='0')

conditions = [hosp_sat['Q25'].eq(code) for code in (1, 2, 3, 4, 5, 6, 99)]
choices = ['Very Satisfied', 'Satisfied', 'Neutral', 'Dissatisfied', 'Very Dissatisfied','Did not receive explanation', 'Do not know']
hosp_sat['Q25'] = np.select(conditions, choices, default='0')

conditions = [hosp_sat['Q26'].eq(code) for code in (1, 2, 3, 4, 99)]
choices = ['Always', 'Usually Yes', 'Usually No', 'Never', "Don't know"]
hosp_sat['Q26'] = np.select(conditions, choices, default='0')

conditions = [hosp_sat['Q29'].eq(code) for code in (1, 2, 3, 4, 5, 98, 99)]
choices = ['Very Satisfied', 'Satisfied', 'Neutral', 'Dissatisfied', 'Very Dissatisfied', "Didn't eat the hospital food", "Don't know / irrelevant"]
hosp_sat['Q29'] = np.select(conditions, choices, default='0')
hosp_sat.rename({'Q29': 'Hospital_food'}, axis=1, inplace=True)


conditions = [hosp_sat['Q30'].eq(code) for code in (1, 2, 3, 4, 5, 98, 99)]
choices = ['Very Satisfied', 'Satisfied', 'Neutral', 'Dissatisfied', 'Very Dissatisfied', 'Had no companions', "Don't know"]
hosp_sat['Q30'] = np.select(conditions, choices, default='0')

conditions = [hosp_sat['Q33'].eq(code) for code in (1, 2, 3, 4, 5, 6, 99)]
choices = ['Hebrew', 'English', 'Arabic', 'Russian', 'Amharic', 'Other', 'Refused to answer']
hosp_sat['Q33'] = np.select(conditions, choices, default='0')
hosp_sat.rename({'Q33': 'Language'}, axis=1, inplace=True)

conditions = [hosp_sat['Q34'].eq(code) for code in (1, 2, 3)]
choices = ['Entirety', 'Partly', 'At all']
hosp_sat['Q34'] = np.select(conditions, choices, default='0')
hosp_sat.rename({'Q34': 'corridor_stay'}, axis=1, inplace=True) # and since we have another column called Lay_Corridor let's drop that one
hosp_sat.drop('Lay_Corridor', axis=1, inplace=True)

conditions = [hosp_sat['Q36'].eq(code) for code in (1, 2, 3, 4, 5, 6, 7)]
choices = ['Alone', 'With Family Member', 'Home with a caregiver', 'At Family member', 'Assisted living/nursing home', 'Nursing facility/rehabilitation center', 'Refused to answer']
hosp_sat['Q36'] = np.select(conditions, choices, default='0')
hosp_sat.rename({'Q36': 'Recently_lived_with'}, axis=1, inplace=True)

conditions = [hosp_sat['Q37'].eq(code) for code in (1, 2, 3, 4, 5, 6)]
choices = ['Jewish', 'Muslim', 'Christian', 'Druze', 'Other', 'Refused to answer']
hosp_sat['Q37'] = np.select(conditions, choices, default='0')
hosp_sat.rename({'Q37': 'Religion'}, axis=1, inplace=True)


#############################################
# Let's see if all our hard work paid off :
hosp_sat.sample(20)
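
##### Eyeballing a sample is a good start, but a quick count can confirm that every code was matched. np.select writes its default wherever no condition matched, which shows up as the text '0' in the recoded columns. A minimal sketch on an invented toy frame:

```python
import pandas as pd

# Invented answers; the '0' in Q20 stands for a code that matched no condition.
toy = pd.DataFrame({
    'Q20': ['Always', '0', 'Never', 'Usually Yes'],
    'Q26': ['Always', 'Never', "Don't know", 'Usually No'],
})

# Count, per column, how many cells still hold the fallback value '0'.
unmapped = toy.eq('0').sum()

# Show only the columns that still need attention.
print(unmapped[unmapped > 0])
```

##### If the same check on the real dataframe comes back empty, every value was covered by one of the conditions.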

In [None]:
# Columns Q3_G and q31_G are redundant , let's drop them:
hosp_sat.drop('Q3_G', axis=1, inplace=True)
hosp_sat.drop('q31_G', axis=1, inplace=True)

hosp_sat.shape

##### By now we have dealt with the NaN's, made sure all our entries are valid, translated what we needed to, replaced each number with the answer it stands for, and dropped the irrelevant columns. As a last step, I will check the datatypes of the columns again to make sure everything is OK before moving to Tableau.

In [None]:
hosp_sat.info()

In [None]:
# Final touch-ups 

# rename code_hospital to HOSPITAL:
hosp_sat.rename({'code_hospital': 'HOSPITAL'}, axis=1, inplace=True)

# convert sat_score and would_recommend because they are categorical data: 
hosp_sat['sat_score'] = hosp_sat['sat_score'].astype('object')
hosp_sat['would_recommend'] = hosp_sat['would_recommend'].astype('object')

print(hosp_sat['sat_score'].dtype)
print(hosp_sat['would_recommend'].dtype)

##### Now that we have cleaned our data and transformed it into the shape we want, let's save it as an xlsx file to use later in Tableau.

In [None]:
hosp_sat.to_excel('hosp_sat.xlsx')

#### **Step IV + V : Exploratory Data Analysis & Visualization**

In [None]:
# Before continuing, let's translate the questions and put them in a dataframe so it is easier to follow along:

pd.options.display.max_colwidth = 300


questions = {'Question':
            ['Q4', 'Q5', 'Q6', 'Q7', 'Q8', 'Q9', 'Q10', 'Q11', 'Q12', 'Q13', 'Q14', 'Q15','Q16','Q17', 'Q18', 'Q19','Q20', 'Q21_2016', 'Q22', 'Q23', 'Q24', 'Q25', 'Q26', 'Q27', 'Q28', 'Q30'],

            'Translation': [
            "If you were hospitalized through the emergency room, to what extent were you satisfied with the care you received?",
            "From the moment you arrived at the ward, to what extent was the admission process conducted efficiently?",
            "During your last hospitalization, to what extent did you feel that the nurses treated you with kindness and respect?",
            "To what extent did the nurses listen to you and address your questions and concerns?",
            "To what extent were the explanations you received during hospitalization from the nurses clear and understandable to you?",
            "During your last hospitalization, to what extent did you feel that the doctors treated you with kindness and respect?",
            "To what extent during the doctors visit did you feel that you were treated personally?",
            "To what extent did the doctors listen to you and address your questions and concerns?",
            "To what extent were the explanations you received during hospitalization from the doctors clear and understandable to you?", 
            "To what extent did you feel that the staff treating you at the hospital knew your medical condition before hospitalization?",
            "To what extent were the explanations given to you during the hospitalization initiated by the ward staff?",
            "To what extent did you feel that the department staff worked in coordination and cooperation (among themselves) in everything related to your care? (For example, transferring information from one to another, implementing the doctors' recommendations)",
            "To what extent did you feel that the staff addressed your pain or other symptoms such as nausea or dizziness, and helped you deal with them?",
            "To what extent did you feel that the care team works to maintain your safety to prevent medical errors in cases such as identifying a patient sensitivity to medications, preventing falls, etc.?",
            "To what extent did you feel that you were shared with the therapeutic options, to the extent that you were interested? That is, you were involved in the decisions, and your preferences were taken into account.",
            "To what extent did you feel that additional treatment methods / therapeutic alternatives were presented to you?",
            "During the last hospitalization, did you feel that you knew what the next step in hospital treatment was?",
            "To what extent did you feel that you received an answer to your requests and needs easily and without the need to make an effort?",
            "To what extent did you feel during the hospitalization that you were treated in good hands?",
            "To what extent was the discharge process from the hospital conducted efficiently?",
            "At the time of discharge from the hospital, to what extent did you receive an explanation summarizing your medical problem and the treatment you were given?",
            "To what extent were the explanations and instructions for further treatment clear and understandable to you? This refers to explanations regarding the medical problem for which you were hospitalized, the treatment given to you, unusual symptoms to be aware of and medications you must take.",
            "Were the room and bathroom clean?",
            "To what extent are you satisfied with the conditions in the room where you were hospitalized? (air conditioning, bed, mattress...)",
            "During the hospitalization, to what extent was it quiet at night in your room and in your surroundings?",
            "To what extent were the conditions available to your companions and visitors comfortable and adequate?"]
            }

questions_df = pd.DataFrame(questions)
questions_df
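
##### With the questions in a dataframe, the text of any single question can be looked up by its code. A small sketch, using a two-row stand-in for questions_df (the name toy_questions is made up for this example):

```python
import pandas as pd

# Two rows copied from the table above, standing in for the full questions_df.
toy_questions = pd.DataFrame({
    'Question': ['Q26', 'Q23'],
    'Translation': [
        'Were the room and bathroom clean?',
        'To what extent was the discharge process from the hospital conducted efficiently?',
    ],
})

# Index the translations by question code so they can be fetched like a dictionary.
lookup = toy_questions.set_index('Question')['Translation']
print(lookup['Q26'])
```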

##### Let's start by seeing which hospital had the most patients:

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(13,8))
ax = sns.countplot(y=hosp_sat["HOSPITAL"], hue=hosp_sat['Hospital_size'], order = hosp_sat['HOSPITAL'].value_counts().index, dodge=False, palette='GnBu_r')
# Label every bar with its count (one container per hospital size).
for container in ax.containers:
    ax.bar_label(container)

##### How many male and female patients did we have ?

In [None]:
hosp_sat.Gender.value_counts()

##### The average age among patients :

In [None]:
hosp_sat.AGE_TODAY.mean().round(2)

##### Is the average age similar when separated by gender ?

In [None]:
hosp_sat.groupby('Gender', as_index=False)['AGE_TODAY'].mean().round(2)
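
##### An average is easier to trust when the group size is shown next to it. A minimal sketch on invented ages (only the column names come from our data):

```python
import pandas as pd

# Invented patients, for illustration only.
toy = pd.DataFrame({
    'Gender': ['M', 'F', 'F', 'M', 'F'],
    'AGE_TODAY': [70, 55, 65, 60, 45],
})

# agg() computes several statistics per group in one pass.
summary = toy.groupby('Gender')['AGE_TODAY'].agg(['mean', 'count']).round(2)
print(summary)
```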

##### Which ward had the most patients :

In [None]:
ax = sns.countplot(x=hosp_sat['Ward'], palette='GnBu_r');
ax.bar_label(ax.containers[0]);
ax.get_yaxis().set_visible(False)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)

##### Health status & Chronic conditions:

In [None]:
sns.violinplot(x=hosp_sat["Chronic"], y=hosp_sat["AGE_TODAY"]);

##### Kupat Holim :

In [None]:
hosp_sat['KUPAT_HOLIM'].value_counts()

In [None]:
# Data
names = hosp_sat['KUPAT_HOLIM'].value_counts().index
size = hosp_sat['KUPAT_HOLIM'].value_counts()
 
# create a figure and set different background
fig = plt.figure(figsize=(8,8))
fig.patch.set_facecolor('white')
 
# Change color of text
plt.rcParams['text.color'] = 'black'
 
# Create a circle at the center of the plot
my_circle=plt.Circle( (0,0), 0.7, color='white')
 
# Pieplot + circle on it
plt.pie(size, labels=names)
p=plt.gcf()
p.gca().add_artist(my_circle)
plt.title('Kupat Holim')
plt.show()

##### Language and its relation to the satisfaction score:

In [None]:
plt.figure(figsize=(15,8))
sns.countplot(x=hosp_sat['Language'], hue=hosp_sat['sat_score']);
plt.legend(loc='upper left');

In [None]:
hosp_sat['Language'].value_counts()
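
##### Because the language groups differ a lot in size, raw counts are hard to compare across them. One option is to look at the share of each score within a language instead. A sketch on invented responses:

```python
import pandas as pd

# Invented responses; the real data has many more rows and languages.
toy = pd.DataFrame({
    'Language': ['Hebrew', 'Hebrew', 'Hebrew', 'Hebrew', 'Arabic', 'Arabic'],
    'sat_score': [10, 10, 8, 5, 10, 8],
})

# normalize='index' turns each row into shares that add up to 1,
# so languages with few respondents can be compared with large ones.
shares = pd.crosstab(toy['Language'], toy['sat_score'], normalize='index')
print(shares.round(2))
```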

##### Let's check a few questions as well, starting with question Q13 : *To what extent did you feel that the staff treating you at the hospital knew your medical condition before hospitalization?*


In [None]:
plt.figure(figsize=(8,8))
sns.countplot(y=hosp_sat['Q13']);

##### Q16 : *To what extent did you feel that the staff addressed your pain or other symptoms such as nausea or dizziness, and helped you deal with them?*

In [None]:
plt.figure(figsize=(10,8))
ax = sns.countplot(y=hosp_sat['Q16']);
ax.bar_label(ax.containers[0]);
ax.get_xaxis().set_visible(False)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax;

##### Q24: *At the time of discharge from the hospital, to what extent did you receive an explanation summarizing your medical problem and the treatment you were given?*

In [None]:
sns.countplot(y=hosp_sat['Q24']);
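
##### The same question can also be summarized as percentages, which may be easier to quote than bar heights. A minimal sketch on invented answers:

```python
import pandas as pd

# Invented answers to a single question.
answers = pd.Series(['Very Satisfied', 'Satisfied', 'Very Satisfied', 'Neutral', 'Very Satisfied'])

# normalize=True returns fractions; multiplying by 100 gives percentages.
percentages = answers.value_counts(normalize=True).mul(100).round(1)
print(percentages)
```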

#### **That's about it for this notebook. Now it's time to grab the Excel file we saved earlier after cleaning the data and go to Tableau to build a dashboard.**