# CLASSIFICATION

I am using Hospital Review data published in March 2021 by the US Centers for Medicare & Medicaid Services (CMS). CMS publishes the findings for hospitals it surveyed in response to complaints, as random checks, or as part of routine recertification.

In [1]:
import pandas as pd

pd.set_option('display.max_colwidth', 200)
pd.set_option('display.max_columns', 100)
pd.set_option('display.max_rows', 300)

In [2]:
df = pd.read_excel("last6monthstoMarch.xlsx")
df.head(1)

Unnamed: 0,CCN,Facility,Address,City,State,Zip Code,date_survey_completed,survey_reason,Survey Visit Tag Cited On,Tag,violation,CFR,memo_tag_type,date_the_citation_was_corrected,immediate_jeopardy,provider_deemed_at_time_of_survey,deficiencies,Statement of Deficiencies text (cont.),deficiencies_continuation
0,20001,PROVIDENCE ALASKA MEDICAL CENTER,3200 PROVIDENCE DRIVE,ANCHORAGE,AK,99508,09/18/2020,Complaint Investigation,1,A0115,Patient Rights,482.13,Condition of Participation,01/08/2021,N,Not Deemed,"30614 . Based on observations, interviews and record reviews the facility failed to ensure patients' rights were protected and promoted in accordance's to the Condition of Participation: CFR 482....",,


In [3]:
df.survey_reason.value_counts()

Complaint Investigation        667
Recertification                 77
Full Survey After Complaint     22
Name: survey_reason, dtype: int64
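
The counts above are heavily skewed toward complaint investigations, which matters when reading any accuracy figure. A quick sketch of the arithmetic, using only the three counts printed above (the variable names are illustrative):

```python
# Counts copied from the value_counts() output above
counts = {
    "Complaint Investigation": 667,
    "Recertification": 77,
    "Full Survey After Complaint": 22,
}

total = sum(counts.values())
majority_share = counts["Complaint Investigation"] / total

print(f"{total} rows, majority share = {majority_share:.3f}")
```

A model that always answered "Complaint Investigation" would already be right on about 87% of the raw rows (before any de-duplication), so accuracy alone says little here.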

# I will train a classifier to predict, from the deficiency text, whether a hospital was surveyed because of a complaint

In [4]:
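# Binary target: 1 for 'Complaint Investigation', 0 for every other survey reason
# (note that 'Full Survey After Complaint' is therefore labeled 0)
df['complaint'] = (df['survey_reason'] == 'Complaint Investigation').astype(int)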
df.head(1)

Unnamed: 0,CCN,Facility,Address,City,State,Zip Code,date_survey_completed,survey_reason,Survey Visit Tag Cited On,Tag,violation,CFR,memo_tag_type,date_the_citation_was_corrected,immediate_jeopardy,provider_deemed_at_time_of_survey,deficiencies,Statement of Deficiencies text (cont.),deficiencies_continuation,complaint
0,20001,PROVIDENCE ALASKA MEDICAL CENTER,3200 PROVIDENCE DRIVE,ANCHORAGE,AK,99508,09/18/2020,Complaint Investigation,1,A0115,Patient Rights,482.13,Condition of Participation,01/08/2021,N,Not Deemed,"30614 . Based on observations, interviews and record reviews the facility failed to ensure patients' rights were protected and promoted in accordance's to the Condition of Participation: CFR 482....",,,1


In [5]:
df.shape

(766, 20)

In [6]:
df = df.drop_duplicates()
df

Unnamed: 0,CCN,Facility,Address,City,State,Zip Code,date_survey_completed,survey_reason,Survey Visit Tag Cited On,Tag,violation,CFR,memo_tag_type,date_the_citation_was_corrected,immediate_jeopardy,provider_deemed_at_time_of_survey,deficiencies,Statement of Deficiencies text (cont.),deficiencies_continuation,complaint
0,20001,PROVIDENCE ALASKA MEDICAL CENTER,3200 PROVIDENCE DRIVE,ANCHORAGE,AK,99508,09/18/2020,Complaint Investigation,1,A0115,Patient Rights,482.13,Condition of Participation,01/08/2021,N,Not Deemed,"30614 . Based on observations, interviews and record reviews the facility failed to ensure patients' rights were protected and promoted in accordance's to the Condition of Participation: CFR 482....",,,1
1,20001,PROVIDENCE ALASKA MEDICAL CENTER,3200 PROVIDENCE DRIVE,ANCHORAGE,AK,99508,09/18/2020,Complaint Investigation,1,A0164,Patient Rights: Restraint Or Seclusion,482.13(e)(2),Standard of Participation,01/08/2021,N,Not Deemed,"30614 . Based on record review, interview, and document review, the facility failed to ensure that less restrictive intervention was not effective before the use of manual restraint for 1 patien...",,,1
2,20001,PROVIDENCE ALASKA MEDICAL CENTER,3200 PROVIDENCE DRIVE,ANCHORAGE,AK,99508,09/18/2020,Complaint Investigation,1,A0165,Patient Rights: Restraint Or Seclusion,482.13(e)(3),Standard of Participation,01/08/2021,N,Not Deemed,"30614 . Based on record review, interview, and document review the facility failed to ensure that less restrictive interventions were used before restraint implementation for 1 patient (#1) out ...",,,1
3,20001,PROVIDENCE ALASKA MEDICAL CENTER,3200 PROVIDENCE DRIVE,ANCHORAGE,AK,99508,09/18/2020,Complaint Investigation,1,A0166,Patient Rights: Restraint Or Seclusion,482.13(e)(4)(i),Standard of Participation,01/08/2021,N,Not Deemed,"30614 . Based on record review, interview, and document review, the facility failed to ensure the use of restraint for 1 patient (#1) out of 11 sampled patients was in accordance with a written ...",,,1
4,20001,PROVIDENCE ALASKA MEDICAL CENTER,3200 PROVIDENCE DRIVE,ANCHORAGE,AK,99508,09/18/2020,Complaint Investigation,1,A0167,Patient Rights: Restraint Or Seclusion,482.13(e)(4)(ii),Standard of Participation,01/08/2021,N,Not Deemed,"30614 . Based on record review, interview, and document review, the facility failed to ensure the use of restraint for 1 patient (#1) out of 11 sampled patients was conducted in accordance to ho...",,,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
754,522006,SELECT SPECIALTY HOSPITAL MILWAUKEE,8901 W LINCOLN AVE 2ND FLOOR,WEST ALLIS,WI,53227,09/09/2020,Complaint Investigation,1,A0450,Medical Record Services,482.24(c)(1),Standard of Participation,10/28/2020,N,Joint Commission,"37419 Based on record review and interview, the facility failed to ensure the medical staff follow their policies and procedures for authentication of their history and physicals in 3 of 10 hist...",,,1
755,522006,SELECT SPECIALTY HOSPITAL MILWAUKEE,8901 W LINCOLN AVE 2ND FLOOR,WEST ALLIS,WI,53227,09/09/2020,Complaint Investigation,1,A0454,Content Of Record: Orders Dated & Signed,482.24(c)(2),Standard of Participation,10/28/2020,N,Joint Commission,"37419 Based on record review and interview, the facility failed to ensure verbal orders are promptly signed, dated and authenticated in 8 of 9 medical records with restraint orders (Patient #1,...",,,1
756,522006,SELECT SPECIALTY HOSPITAL MILWAUKEE,8901 W LINCOLN AVE 2ND FLOOR,WEST ALLIS,WI,53227,09/09/2020,Complaint Investigation,1,A0468,Content Of Record: Discharge Summary,482.24(c)(4)(vii),Standard of Participation,10/28/2020,N,Joint Commission,"37419 Based on record review and interview the facility failed to ensure the medical staff followed their medical staff bylaws, rules and regulations to provide complete discharge summaries to a...",,,1
764,524002,WINNEBAGO MENTAL HEALTH INSTITUTE,4100 TREFFERT DR,WINNEBAGO,WI,54985,10/27/2020,Complaint Investigation,1,A0115,Patient Rights,482.13,Condition of Participation,12/29/2020,N,Joint Commission,"38763 Based on interviews and record review, the facility staff failed to ensure the safety of 1 of 7 discharged minor patients (Patient #1) out of a total of 10 patient records reviewed. Find...",,,1


In [7]:
df.shape

(649, 20)
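
`drop_duplicates()` keeps the original row labels, which is why the frame above still shows index 764 even though only 649 rows remain. A toy sketch of the effect and of `reset_index(drop=True)` as one way to renumber the rows (the `toy` frame is made up for illustration):

```python
import pandas as pd

# Made-up frame with one exact duplicate row (rows 0 and 1)
toy = pd.DataFrame({"Tag": ["A0115", "A0115", "A0164", "A0165"],
                    "complaint": [1, 1, 1, 0]})

deduped = toy.drop_duplicates()
print(list(deduped.index))   # original labels are kept, leaving a gap

deduped = deduped.reset_index(drop=True)
print(list(deduped.index))   # labels renumbered from 0
```

scikit-learn pairs rows by position, so the gaps do not affect model fitting, but they can misalign rows if the frame is later joined or concatenated with another frame by index.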

In [8]:
df.isnull().sum()

CCN                                         0
Facility                                    0
Address                                     0
City                                        0
State                                       0
Zip Code                                    0
date_survey_completed                       0
survey_reason                               0
Survey Visit Tag Cited On                   0
Tag                                         0
violation                                   0
CFR                                       156
memo_tag_type                               0
date_the_citation_was_corrected           266
immediate_jeopardy                          0
provider_deemed_at_time_of_survey           0
deficiencies                                0
Statement of Deficiencies text (cont.)    640
deficiencies_continuation                 648
complaint                                   0
dtype: int64

In [10]:
from sklearn.feature_extraction.text import TfidfVectorizer

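# TF-IDF vectorizer: drop English stop words, keep terms that appear in at least
# 100 documents but in no more than half of them, capped at the 10 most frequent terms
vectorizer = TfidfVectorizer(stop_words='english', min_df=100, max_features=10, max_df=0.5)

# Learn the vocabulary and compute a TF-IDF weight for each term in each deficiency text
matrix = vectorizer.fit_transform(df.deficiencies)

# Convert the sparse TF-IDF matrix to a dataframe, one column per term
# (get_feature_names() was removed in scikit-learn 1.2; on versions before 1.0 use it instead)
words_df = pd.DataFrame(matrix.toarray(),
                        columns=vectorizer.get_feature_names_out())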

In [11]:

words_df.head()

Unnamed: 0,12,19,covid,dated,did,nurse,nursing,pm,revealed,room
0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
1,0.066134,0.0,0.0,0.450447,0.14316,0.431999,0.025919,0.140424,0.715799,0.229887
2,0.03943,0.0,0.0,0.447605,0.085354,0.515128,0.0,0.083723,0.682832,0.228437
3,0.078234,0.0,0.0,0.399643,0.127014,0.383276,0.022995,0.124586,0.78325,0.20396
4,0.066774,0.0,0.0,0.454803,0.168635,0.436177,0.026169,0.165413,0.698631,0.232111
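
To make the effect of `min_df` and `max_df` concrete, here is a small sketch on a made-up four-document corpus. The documents and thresholds are illustrative only and much smaller than the settings used above:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up corpus for illustration
docs = [
    "nurse revealed record",
    "nurse revealed room",
    "nurse dated record",
    "nurse covid room",
]

# min_df=2: keep terms found in at least 2 documents (an integer is a count)
# max_df=0.75: drop terms found in more than 75% of documents (a float is a share)
vectorizer = TfidfVectorizer(min_df=2, max_df=0.75)
weights = vectorizer.fit_transform(docs).toarray()

# vocabulary_ maps each kept term to its column; sorting the keys gives column order
terms = sorted(vectorizer.vocabulary_)
print(terms)
print(weights.round(3))
```

Here "nurse" is dropped because it occurs in every document, and "dated" and "covid" because they occur only once. Each row is scaled to unit length by default, so a document with a single surviving term gets a weight of exactly 1.0 for it, which is the pattern in row 0 of `words_df` above.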


In [12]:
# Features: TF-IDF weights; target: the binary complaint flag
X = words_df
y = df.complaint

In [13]:
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(n_estimators=10)
clf.fit(X, y)

RandomForestClassifier(n_estimators=10)

In [14]:
# Accuracy on the same rows the model was trained on (not a held-out estimate)
clf.score(X, y)

0.9044684129429892
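
This score is computed on the rows the forest was fitted on, so it likely overstates how well the model generalizes. A minimal sketch of a held-out check, assuming a hypothetical helper `holdout_scores` and a synthetic, imbalanced stand-in for `words_df` and `df.complaint` (neither the helper nor the synthetic data comes from the notebook):

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def holdout_scores(X, y, test_size=0.25, seed=0):
    """Return (train accuracy, test accuracy, majority-class baseline on the test split)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed)
    clf = RandomForestClassifier(n_estimators=10, random_state=seed).fit(X_train, y_train)
    # Baseline that always predicts the most frequent training class
    baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
    return (clf.score(X_train, y_train),
            clf.score(X_test, y_test),
            baseline.score(X_test, y_test))


# Synthetic stand-in: 649 rows, 10 features, roughly 85% positives
X_demo, y_demo = make_classification(n_samples=649, n_features=10,
                                     weights=[0.15, 0.85], random_state=0)
train_acc, test_acc, base_acc = holdout_scores(X_demo, y_demo)
print(f"train={train_acc:.3f} test={test_acc:.3f} baseline={base_acc:.3f}")
```

With the notebook's data the call would be `holdout_scores(words_df, df.complaint)`. Comparing the test accuracy with the baseline shows how much the text features add beyond always guessing the majority class.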

In [16]:
from sklearn.metrics import confusion_matrix

y_true = y
y_pred = clf.predict(X)
matrix = confusion_matrix(y_true, y_pred)

# confusion_matrix sorts the labels, so row/column 0 is complaint == 0 ('no complaint')
# and row/column 1 is complaint == 1 ('complaint')
label_names = pd.Series(['no complaint', 'complaint'])
pd.DataFrame(matrix,
     columns='Predicted ' + label_names,
     index='Is ' + label_names)


Unnamed: 0,Predicted no complaint,Predicted complaint
Is no complaint,53,46
Is complaint,16,534
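
`confusion_matrix` orders rows and columns by sorted label value, so the first row holds the 99 rows with `complaint == 0` and the second the 550 rows with `complaint == 1`. A short sketch of the per-class arithmetic, with the four counts copied from the table above (the variable names are illustrative):

```python
import numpy as np

# Rows: true class 0, true class 1; columns: predicted class 0, predicted class 1
cm = np.array([[53, 46],
               [16, 534]])

recall = cm.diagonal() / cm.sum(axis=1)      # share of each true class that was found
precision = cm.diagonal() / cm.sum(axis=0)   # share of each predicted class that was right
accuracy = cm.trace() / cm.sum()
baseline = cm.sum(axis=1).max() / cm.sum()   # always predicting the larger class

print("recall   :", recall.round(3))
print("precision:", precision.round(3))
print(f"accuracy = {accuracy:.3f}, majority baseline = {baseline:.3f}")
```

The overall accuracy matches the 0.904 score, but recall on the smaller class is only about 0.54 (53 of 99) against about 0.97 on the larger one, and always predicting the larger class would already score about 0.85.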


In [17]:
import eli5

feature_names = list(words_df.columns)
eli5.show_weights(clf, feature_names=feature_names)

Weight,Feature
0.2385  ± 0.1531,revealed
0.1212  ± 0.0878,pm
0.1199  ± 0.1010,12
0.1012  ± 0.0585,19
0.0976  ± 0.0958,dated
0.0845  ± 0.0752,nursing
0.0767  ± 0.0499,nurse
0.0725  ± 0.0773,did
0.0593  ± 0.0534,room
0.0286  ± 0.0742,covid
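
For a random forest, the weights eli5 reports appear to be the forest's impurity-based `feature_importances_`, with the ± value reflecting the spread across individual trees (the ten weights above sum to 1.0, as those importances do). A sketch of building a similar table with scikit-learn alone, assuming synthetic data and made-up feature names in place of `words_df`:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the TF-IDF features; names are placeholders
X_demo, y_demo = make_classification(n_samples=300, n_features=5, random_state=0)
names = [f"term_{i}" for i in range(5)]

forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X_demo, y_demo)

# Importance of each feature in each individual tree, used for the spread
per_tree = np.array([tree.feature_importances_ for tree in forest.estimators_])

table = pd.DataFrame(
    {"weight": forest.feature_importances_, "std": per_tree.std(axis=0)},
    index=names,
).sort_values("weight", ascending=False)
print(table.round(4))
```

These importances indicate which terms the trees split on most, not whether a term pushes a prediction toward or away from a complaint.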
