# Health Care Survey

### Title: Health Care Patient Survey Analysis

### Objective: Analyze patient survey data to extract insights, clean the data, and prepare it for further modeling.



# Introduction


###### This project analyzes a healthcare patient survey dataset. The goal is to clean the dataset, handle missing values, remove duplicates, and prepare it for future machine learning tasks.
###### The data contains information about healthcare providers, their services, and patients' ratings.

# Goal of the Project

##### To explore and clean the data.
##### To handle issues like missing values, duplicates, and irrelevant columns.
##### To identify trends in the survey data for actionable insights.

# Importing necessary libraries

 #### 1. pandas and numpy
 ##### pandas: Used for data manipulation and analysis. It helps you load, explore, and preprocess the dataset.
 ##### numpy: Provides support for numerical operations, which will be used for handling arrays and mathematical operations.
 #### 2. matplotlib.pyplot and seaborn
 ##### matplotlib: Used for plotting and visualizing data.
 ##### seaborn: Built on top of matplotlib, it provides an easier and more aesthetically pleasing way to create plots.
 #### 3. OneHotEncoder, StandardScaler, and train_test_split from scikit-learn
 ##### OneHotEncoder: Converts categorical variables into numeric format (one-hot encoding), which is required for many machine learning algorithms. The import cell in this notebook loads LabelEncoder instead, which maps each category to an integer code.
 ##### StandardScaler: Rescales each feature to have a mean of 0 and a standard deviation of 1.
 ##### train_test_split: Splits the dataset into a training set and a test set. Evaluating on the held-out test set gives a more honest estimate of how the model will perform on unseen data.
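
As a minimal sketch of how the last two helpers are typically combined, the example below splits a small made-up array and then scales it. The arrays `X` and `y` and the 80/20 split are assumptions chosen for illustration; they are not taken from the survey data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Toy feature matrix and target (made-up values, not from the survey data)
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.arange(10, dtype=float)

# Hold out 20% of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training rows only, then apply it to both splits
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0))  # approximately 0 for each column
print(X_train_scaled.std(axis=0))   # approximately 1 for each column
```

Fitting the scaler on the training split alone keeps information about the test rows out of the preprocessing step.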


##### The dataset contains 12,159 rows and 39 columns. It includes provider details, survey star ratings, patient recommendations, and more.
#### Key features include:

###### Star Rating Columns: Numerical ratings for various healthcare aspects.
###### Footnote Columns: Supplementary text about the ratings and responses.
###### Categorical Features: Ownership types and services offered.

In [35]:
# Import libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns


from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import GridSearchCV

In [36]:
#1. Load the Dataset
df = pd.read_csv('Health_Care_Patient_survey.csv')
df

Unnamed: 0,State,CMS Certification Number (CCN)*,Provider Name,Address,City,Zip,Phone,Type of Ownership,Offers Nursing Care Services,Offers Physical Therapy Services,...,Star Rating for how patients rated overall care from agency,Footnote for Star Rating for overall care from agency,Percent of patients who gave their home health agency a rating of 9 or 10 on a scale from 0 (lowest) to 10 (highest),Footnote for percent of patients who gave their home health agency a rating of 9 or 10 on a scale from 0 (lowest) to 10 (highest),"Percent of patients who reported YES, they would definitely recommend the home health agency to friends and family","Footnote for percent of patients who reported YES, they would definitely recommend the home health agency to friends and family",Number of completed Surveys,Footnote for number of completed surveys,Response rate,Footnote for response rate
0,AL,17000,BUREAU OF HOME & COMMUNITY SERVICES ...,"201 MONROE STREET, THE RSA TOWER, SUITE 1200 ...",MONTGOMERY,36104,3342065341,Official Health Agency,True,True,...,,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.
1,AL,17008,JEFFERSON COUNTY HOME CARE ...,2201 ARLINGTON AVENUE ...,BESSEMER,35020,2059169500,Official Health Agency,True,True,...,3.0,Fewer than 100 patients completed the survey. ...,86,Fewer than 100 patients completed the survey. ...,68,Fewer than 100 patients completed the survey. ...,58,Fewer than 100 patients completed the survey. ...,24,Fewer than 100 patients completed the survey. ...
2,AL,17009,ALACARE HOME HEALTH & HOSPICE ...,2970 LORNA ROAD ...,BIRMINGHAM,35216,2058242680,Local,True,True,...,5.0,,91,,86,,348,,34,
3,AL,17013,GENTIVA HEALTH SERVICES ...,"557 GLOVER STREET, SUITE 5 ...",ENTERPRISE,36330,3343470234,Official Health Agency,True,True,...,4.0,,89,,90,,248,,38,
4,AL,17014,AMEDISYS HOME HEALTH ...,68278 MAIN STREET ...,BLOUNTSVILLE,35031,2054294919,Local,True,True,...,4.0,,85,,86,,159,,31,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12154,TX,747970,"ROSE OF SHARON HOME HEALTH, INC. ...",109 EAST HOUSTON AVENUE ...,CROCKETT,75835,7133677275,Local,True,True,...,,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.
12155,TX,747971,"SACRED CARE HOME HEALTH, LLC ...",222 W BURLESON ...,WHARTON,77488,9795313068,Local,True,True,...,,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.
12156,TX,747972,"ALL DAY HEALTHCARE, INC. ...","330 MAIN STREET, SUITE #1B ...",SEALY,77474,9795894084,Local,True,True,...,,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.
12157,TX,747973,BRUSHY CREEK HOME HEALTH AGENCY INC ...,608 MORROW ST STE 105 ...,AUSTIN,78752,5123236175,Local,True,True,...,,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.


In [37]:
df = df.rename(columns={'HHCAHPS Survey Summary Star Rating': 'Summary Rating', 
                        'Footnote HHCAHPS Survey Summary Star Rating':'footnote1',
                        'Star Rating for health team gave care in a professional way':'professional rating',
                        'Footnote for Star Rating for gave care in a professional way':'footnote2',
                        'CMS Certification Number (CCN)*':'CNN',
                        'Percent of patients who reported that their home health team gave care in a professional way':'% health care professional way',
                        'Footnote for Star Rating for overall care from agenc':'footnote3',
                        'Percent of patients who gave their home health agency a rating of 9 or 10 on a scale from 0 (lowest) to 10 (highest)':'% health rating 9 or10',
                        'Footnote for percent of patients who gave their home health agency a rating of 9 or 10 on a scale from 0 (lowest) to 10 (highest)':'footnote4',
                        'Percent of patients who reported YES, they would definitely recommend the home health agency to friends and family':'reported yes',
                        'Footnote for percent of patients who reported YES, they would definitely recommend the home health agency to friends and family':'footnote5',
                        'Footnote for Star Rating for overall care from agency':'footnote6',
                        'Footnote for response rate':'footnote7',
                        'Footnote for number of completed surveys':'footnote8',
                        'Footnote for percent of patients who reported that their home health team gave care in a professional way':'footnote9',
                        'Star Rating for health team communicated well with them':'* communicated well',
                        'Footnote for Star Rating for communicated well with them':'footnote10',
                        'Percent of patients who reported that their home health team communicated well with them':'% communicated well',
                        'Footnote for percent of patients who reported that their home health team communicated well with them':'footnote11',
                        'Star Rating team discussed medicines, pain, and home safety':'* medicine',
                        'Footnote Star Rating discussed medicines, pain, home safety':'footnote12',
                        'Percent of patients who reported that their home health team discussed medicines, pain, and home safety with them':'% medicine',
                        'Footnote for percent of patients who reported that their home health team discussed medicines, pain, and home safety with them':'footnote13',
                        'Star Rating for how patients rated overall care from agency': 'Overall Rating'})
df

Unnamed: 0,State,CNN,Provider Name,Address,City,Zip,Phone,Type of Ownership,Offers Nursing Care Services,Offers Physical Therapy Services,...,Overall Rating,footnote6,% health rating 9 or10,footnote4,reported yes,footnote5,Number of completed Surveys,footnote8,Response rate,footnote7
0,AL,17000,BUREAU OF HOME & COMMUNITY SERVICES ...,"201 MONROE STREET, THE RSA TOWER, SUITE 1200 ...",MONTGOMERY,36104,3342065341,Official Health Agency,True,True,...,,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.
1,AL,17008,JEFFERSON COUNTY HOME CARE ...,2201 ARLINGTON AVENUE ...,BESSEMER,35020,2059169500,Official Health Agency,True,True,...,3.0,Fewer than 100 patients completed the survey. ...,86,Fewer than 100 patients completed the survey. ...,68,Fewer than 100 patients completed the survey. ...,58,Fewer than 100 patients completed the survey. ...,24,Fewer than 100 patients completed the survey. ...
2,AL,17009,ALACARE HOME HEALTH & HOSPICE ...,2970 LORNA ROAD ...,BIRMINGHAM,35216,2058242680,Local,True,True,...,5.0,,91,,86,,348,,34,
3,AL,17013,GENTIVA HEALTH SERVICES ...,"557 GLOVER STREET, SUITE 5 ...",ENTERPRISE,36330,3343470234,Official Health Agency,True,True,...,4.0,,89,,90,,248,,38,
4,AL,17014,AMEDISYS HOME HEALTH ...,68278 MAIN STREET ...,BLOUNTSVILLE,35031,2054294919,Local,True,True,...,4.0,,85,,86,,159,,31,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12154,TX,747970,"ROSE OF SHARON HOME HEALTH, INC. ...",109 EAST HOUSTON AVENUE ...,CROCKETT,75835,7133677275,Local,True,True,...,,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.
12155,TX,747971,"SACRED CARE HOME HEALTH, LLC ...",222 W BURLESON ...,WHARTON,77488,9795313068,Local,True,True,...,,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.
12156,TX,747972,"ALL DAY HEALTHCARE, INC. ...","330 MAIN STREET, SUITE #1B ...",SEALY,77474,9795894084,Local,True,True,...,,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.
12157,TX,747973,BRUSHY CREEK HOME HEALTH AGENCY INC ...,608 MORROW ST STE 105 ...,AUSTIN,78752,5123236175,Local,True,True,...,,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.
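
By default, `DataFrame.rename` ignores mapping keys that match no column, so a misspelled key goes unnoticed. The sketch below shows one way to check a mapping before applying it. The helper `unmatched_rename_keys` and the two-column stand-in frame are hypothetical, and the misspelled key is deliberate.

```python
import pandas as pd

def unmatched_rename_keys(frame, mapping):
    """Return the keys of `mapping` that are not column names of `frame`."""
    return [old for old in mapping if old not in frame.columns]

# Tiny stand-in frame (hypothetical; the real frame has 39 columns)
demo = pd.DataFrame({
    "CMS Certification Number (CCN)*": [17000],
    "Footnote for Star Rating for overall care from agency": [None],
})

mapping = {
    "CMS Certification Number (CCN)*": "CNN",
    # Deliberately misspelled key: it matches no column
    "Footnote for Star Rating for overall care from agenc": "footnote3",
    "Footnote for Star Rating for overall care from agency": "footnote6",
}

missing = unmatched_rename_keys(demo, mapping)
print(missing)

# rename() applies the matching keys and silently skips the rest
renamed = demo.rename(columns=mapping)
print(list(renamed.columns))
```

Passing `errors="raise"` to `rename` is an alternative that turns an unmatched key into a `KeyError`.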


In [38]:
#2. Display first few rows
df.head()

Unnamed: 0,State,CNN,Provider Name,Address,City,Zip,Phone,Type of Ownership,Offers Nursing Care Services,Offers Physical Therapy Services,...,Overall Rating,footnote6,% health rating 9 or10,footnote4,reported yes,footnote5,Number of completed Surveys,footnote8,Response rate,footnote7
0,AL,17000,BUREAU OF HOME & COMMUNITY SERVICES ...,"201 MONROE STREET, THE RSA TOWER, SUITE 1200 ...",MONTGOMERY,36104,3342065341,Official Health Agency,True,True,...,,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.
1,AL,17008,JEFFERSON COUNTY HOME CARE ...,2201 ARLINGTON AVENUE ...,BESSEMER,35020,2059169500,Official Health Agency,True,True,...,3.0,Fewer than 100 patients completed the survey. ...,86,Fewer than 100 patients completed the survey. ...,68,Fewer than 100 patients completed the survey. ...,58,Fewer than 100 patients completed the survey. ...,24,Fewer than 100 patients completed the survey. ...
2,AL,17009,ALACARE HOME HEALTH & HOSPICE ...,2970 LORNA ROAD ...,BIRMINGHAM,35216,2058242680,Local,True,True,...,5.0,,91,,86,,348,,34,
3,AL,17013,GENTIVA HEALTH SERVICES ...,"557 GLOVER STREET, SUITE 5 ...",ENTERPRISE,36330,3343470234,Official Health Agency,True,True,...,4.0,,89,,90,,248,,38,
4,AL,17014,AMEDISYS HOME HEALTH ...,68278 MAIN STREET ...,BLOUNTSVILLE,35031,2054294919,Local,True,True,...,4.0,,85,,86,,159,,31,


# Insights Explanation

##### The dataset contains missing values in critical columns such as the star ratings, and several percentage and count columns use the text "Not Available" in place of a number. The next steps handle these issues to produce a cleaner dataset.
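
A small sketch of why the placeholder text matters: `isnull()` counts only real NaN values, so a cell holding "Not Available" is not reported as missing. The three-row frame below is a stand-in with illustrative values, and converting the placeholder to NaN is an assumption about how one might treat it, not a step taken in this notebook.

```python
import numpy as np
import pandas as pd

# Small stand-in frame mimicking two of the columns shown above
demo = pd.DataFrame({
    "Overall Rating": [np.nan, 3.0, 5.0],
    "Response rate": ["Not Available", "24", "34"],
})

# isnull() only sees real NaN values
before = demo.isnull().sum()
print(before)

# Treat the placeholder text as missing, then count again
cleaned = demo.replace("Not Available", np.nan)
after = cleaned.isnull().sum()
print(after)
```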


In [29]:
# Summary of the dataset
df.info()

# Display basic statistics
df.describe()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12159 entries, 0 to 12158
Data columns (total 39 columns):
 #   Column                                Non-Null Count  Dtype  
---  ------                                --------------  -----  
 0   State                                 12159 non-null  object 
 1   CNN                                   12159 non-null  int64  
 2   Provider Name                         12159 non-null  object 
 3   Address                               12159 non-null  object 
 4   City                                  12159 non-null  object 
 5   Zip                                   12159 non-null  int64  
 6   Phone                                 12159 non-null  int64  
 7   Type of Ownership                     12159 non-null  object 
 8   Offers Nursing Care Services          12159 non-null  bool   
 9   Offers Physical Therapy Services      12159 non-null  bool   
 10  Offers Occupational Therapy Services  12159 non-null  bool   
 11  Offers Speech P

| | CNN | Zip | Phone | Summary Rating | professional rating | * communicated well | * medicine | Overall Rating |
|---|---|---|---|---|---|---|---|---|
| count | 12159.0 | 12159.0 | 12159.0 | 12159.0 | 12159.0 | 12159.0 | 12159.0 | 12159.0 |
| mean | 326171.615182 | 57901.661732 | 6028284000.0 | 3.668641 | 3.891289 | 3.979617 | 3.075958 | 3.151045 |
| std | 226214.421878 | 25335.435766 | 2498248000.0 | 0.652841 | 0.695801 | 0.687862 | 0.734038 | 0.759003 |
| min | 17000.0 | 601.0 | 2012916000.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 25% | 117094.0 | 35963.5 | 3308363000.0 | 3.668641 | 3.891289 | 3.979617 | 3.0 | 3.0 |
| 50% | 267655.0 | 60607.0 | 6182779000.0 | 3.668641 | 3.891289 | 3.979617 | 3.075958 | 3.151045 |
| 75% | 459232.5 | 77478.0 | 8181149000.0 | 4.0 | 4.0 | 4.0 | 3.075958 | 3.151045 |
| max | 747974.0 | 99901.0 | 9898949000.0 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 |
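
The summary above includes CNN, Zip, and Phone, which are identifiers rather than measurements, so their mean and standard deviation carry no meaning. One possible approach, sketched below on a three-row stand-in frame, is to store identifiers as strings so that `describe()` skips them. The choice of which columns count as identifiers is an assumption.

```python
import pandas as pd

# Three rows copied from the preview above, limited to three columns
demo = pd.DataFrame({
    "CNN": [17008, 17009, 17013],
    "Zip": [35020, 35216, 36330],
    "Overall Rating": [3.0, 5.0, 4.0],
})

# Identifier columns (assumed list) are cast to strings
demo = demo.astype({"CNN": str, "Zip": str})

# describe() summarizes numeric columns only by default
summary = demo.describe()
print(summary)
```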


In [34]:
# 2. Handle Missing Data
missing_data = df.isnull().sum()  # Identify missing values
print("\nMissing Values:\n", missing_data[missing_data > 0])


Missing Values:
 Series([], dtype: int64)


# Check for Null Values

In [31]:
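# Impute missing numerical data with the mean, and all other columns with the mode
for column in df.columns:
    if df[column].dtype in ['float64', 'int64']:
        # Assign the result back: fillna(inplace=True) on df[column] may act on
        # a temporary copy and leave df unchanged in recent pandas versions
        df[column] = df[column].fillna(df[column].mean())
    else:
        # Caution: for the footnote columns the mode is a footnote sentence,
        # so agencies that had no footnote receive one
        df[column] = df[column].fillna(df[column].mode()[0])
null_counts = df.isnull().sum()  # remaining nulls per column
df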

Unnamed: 0,State,CNN,Provider Name,Address,City,Zip,Phone,Type of Ownership,Offers Nursing Care Services,Offers Physical Therapy Services,...,Overall Rating,footnote6,% health rating 9 or10,footnote4,reported yes,footnote5,Number of completed Surveys,footnote8,Response rate,footnote7
0,AL,17000,BUREAU OF HOME & COMMUNITY SERVICES ...,"201 MONROE STREET, THE RSA TOWER, SUITE 1200 ...",MONTGOMERY,36104,3342065341,Official Health Agency,True,True,...,3.151045,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.
1,AL,17008,JEFFERSON COUNTY HOME CARE ...,2201 ARLINGTON AVENUE ...,BESSEMER,35020,2059169500,Official Health Agency,True,True,...,3.000000,Fewer than 100 patients completed the survey. ...,86,Fewer than 100 patients completed the survey. ...,68,Fewer than 100 patients completed the survey. ...,58,Fewer than 100 patients completed the survey. ...,24,Fewer than 100 patients completed the survey. ...
2,AL,17009,ALACARE HOME HEALTH & HOSPICE ...,2970 LORNA ROAD ...,BIRMINGHAM,35216,2058242680,Local,True,True,...,5.000000,No survey results are available for this period.,91,No survey results are available for this period.,86,No survey results are available for this period.,348,No survey results are available for this period.,34,No survey results are available for this period.
3,AL,17013,GENTIVA HEALTH SERVICES ...,"557 GLOVER STREET, SUITE 5 ...",ENTERPRISE,36330,3343470234,Official Health Agency,True,True,...,4.000000,No survey results are available for this period.,89,No survey results are available for this period.,90,No survey results are available for this period.,248,No survey results are available for this period.,38,No survey results are available for this period.
4,AL,17014,AMEDISYS HOME HEALTH ...,68278 MAIN STREET ...,BLOUNTSVILLE,35031,2054294919,Local,True,True,...,4.000000,No survey results are available for this period.,85,No survey results are available for this period.,86,No survey results are available for this period.,159,No survey results are available for this period.,31,No survey results are available for this period.
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12154,TX,747970,"ROSE OF SHARON HOME HEALTH, INC. ...",109 EAST HOUSTON AVENUE ...,CROCKETT,75835,7133677275,Local,True,True,...,3.151045,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.
12155,TX,747971,"SACRED CARE HOME HEALTH, LLC ...",222 W BURLESON ...,WHARTON,77488,9795313068,Local,True,True,...,3.151045,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.
12156,TX,747972,"ALL DAY HEALTHCARE, INC. ...","330 MAIN STREET, SUITE #1B ...",SEALY,77474,9795894084,Local,True,True,...,3.151045,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.
12157,TX,747973,BRUSHY CREEK HOME HEALTH AGENCY INC ...,608 MORROW ST STE 105 ...,AUSTIN,78752,5123236175,Local,True,True,...,3.151045,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.,Not Available,No survey results are available for this period.
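
The output above shows two side effects of blanket imputation: agencies with valid results now carry the most common footnote, and the "Not Available" text is untouched because it is not NaN. The sketch below is one possible variation, not the method used in this notebook. It parses the numeric-looking columns, flags imputed rows, fills with the median, and leaves empty footnotes empty. The stand-in frame, the `numeric_like` list, and the "(imputed)" column names are all assumptions.

```python
import numpy as np
import pandas as pd

# Stand-in frame with illustrative values in the style of the real columns
demo = pd.DataFrame({
    "Overall Rating": [np.nan, 3.0, 5.0, 4.0],
    "Response rate": ["Not Available", "24", "34", "38"],
    "footnote7": [
        "No survey results are available for this period.",
        "Fewer than 100 patients completed the survey.",
        np.nan,
        np.nan,
    ],
})

# Columns assumed to hold numbers
numeric_like = ["Overall Rating", "Response rate"]

for column in numeric_like:
    # Unparseable entries such as "Not Available" become NaN
    demo[column] = pd.to_numeric(demo[column], errors="coerce")
    # Record which rows are about to be imputed
    demo[column + " (imputed)"] = demo[column].isna()
    # The median is less sensitive to extreme values than the mean
    demo[column] = demo[column].fillna(demo[column].median())

# A missing footnote means there is no caveat, so fill with an empty string
demo["footnote7"] = demo["footnote7"].fillna("")
print(demo)
```

Keeping an explicit flag column lets later modeling steps distinguish observed ratings from filled-in ones.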


In [32]:
null_counts = df.isnull().sum()
null_counts

State                                   0
CNN                                     0
Provider Name                           0
Address                                 0
City                                    0
Zip                                     0
Phone                                   0
Type of Ownership                       0
Offers Nursing Care Services            0
Offers Physical Therapy Services        0
Offers Occupational Therapy Services    0
Offers Speech Pathology Services        0
Offers Medical Social Services          0
Offers Home Health Aide Services        0
Date Certified                          0
Summary Rating                          0
footnote1                               0
professional rating                     0
footnote2                               0
% health care professional way          0
footnote9                               0
* communicated well                     0
footnote10                              0
% communicated well               

In [25]:
# 3. Handle Duplicates
duplicates = df.duplicated().sum()  # Check for duplicate rows
print("\nNumber of duplicate rows:", duplicates)
#df = df.drop_duplicates()  # Remove duplicate rows


Number of duplicate rows: 0
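
`df.duplicated()` compares entire rows, so two records for the same agency that differ in any column are not counted. A hedged sketch of a stricter check follows. It assumes the certification number (named CNN after the rename) should be unique per agency, and the repeated row in the stand-in frame is invented for illustration.

```python
import pandas as pd

# Stand-in frame; the second row is an invented near-duplicate of the first
demo = pd.DataFrame({
    "CNN": [17008, 17008, 17009],
    "Provider Name": [
        "JEFFERSON COUNTY HOME CARE",
        "Jefferson County Home Care",
        "ALACARE HOME HEALTH & HOSPICE",
    ],
})

# Whole-row comparison: the differing capitalization hides the repeat
full_dupes = demo.duplicated().sum()

# Comparison on the identifier column only
id_dupes = demo.duplicated(subset=["CNN"]).sum()

print("Whole-row duplicates:", full_dupes)
print("Repeated identifiers:", id_dupes)
```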


In [33]:
# Summary of the dataset
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12159 entries, 0 to 12158
Data columns (total 39 columns):
 #   Column                                Non-Null Count  Dtype  
---  ------                                --------------  -----  
 0   State                                 12159 non-null  object 
 1   CNN                                   12159 non-null  int64  
 2   Provider Name                         12159 non-null  object 
 3   Address                               12159 non-null  object 
 4   City                                  12159 non-null  object 
 5   Zip                                   12159 non-null  int64  
 6   Phone                                 12159 non-null  int64  
 7   Type of Ownership                     12159 non-null  object 
 8   Offers Nursing Care Services          12159 non-null  bool   
 9   Offers Physical Therapy Services      12159 non-null  bool   
 10  Offers Occupational Therapy Services  12159 non-null  bool   
 11  Offers Speech P