In [2]:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

import re

### Meeting notes

Outcomes of interest

- medication use
- time to normal diet
- harms

A study-specific random effect can be used to account for differences in baseline dosages among study sites.
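The notes do not say how the random effect would be fitted. One standard option for pooling per-study summary estimates is the DerSimonian-Laird random-effects estimator, which adds a between-study variance (tau squared) to each study's sampling variance. The sketch below is that swapped-in technique, not a method confirmed by the notes. The function name and the input values are hypothetical placeholders, not data from this spreadsheet.

```python
import numpy as np

def dersimonian_laird(effects, variances):
    """Random-effects pooled estimate (DerSimonian-Laird); needs at least two studies.

    effects:   per-study effect estimates (e.g. log odds ratios)
    variances: their within-study sampling variances
    Returns (pooled estimate, its standard error, between-study variance tau^2).
    """
    y = np.asarray(effects, dtype=float)
    v = np.asarray(variances, dtype=float)
    w = 1.0 / v                                   # fixed-effect (inverse-variance) weights
    theta_fe = np.sum(w * y) / np.sum(w)          # fixed-effect pooled estimate
    q = np.sum(w * (y - theta_fe) ** 2)           # Cochran's Q heterogeneity statistic
    df = len(y) - 1
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)                 # method-of-moments tau^2, truncated at 0
    w_re = 1.0 / (v + tau2)                       # random-effects weights
    theta_re = np.sum(w_re * y) / np.sum(w_re)
    se_re = np.sqrt(1.0 / np.sum(w_re))
    return theta_re, se_re, tau2

# Hypothetical example: two studies with very different effects
est, se, tau2 = dersimonian_laird([0.0, 10.0], [1.0, 1.0])
```

With identical effects across studies tau squared collapses to 0 and the result equals the fixed-effect estimate; a mixed model on arm-level data with a study random intercept is an alternative if arm-level data are used instead of summaries.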


Import outcome data and rename columns

In [4]:
raw_data = (pd.read_excel('data/Tonsillectomy_OUTCOME_Data_KQ5_Master.xlsx',
                          sheet_name='Outcome data',  # `sheetname` was removed from pandas
                          na_values=['null', 'ND'])
            .drop(['Comments', 'Other stats \nName','Other Stats','Results'], 
                  axis=1)
            .rename(columns={'OUTC_Main_\nCATG':'outcome_cat',
                            "Outc_SUB_\nCATG":'outcome_subcat',
                            'Outcome\nN': 'N',
                            'Outcome  \n%': 'outcome_pct',
                            "Outcome\n Mean": 'outcome_mean',
                            "Outcome \nSD": 'outcome_sd',
                            "Outcome \n Median": 'outcome_med',
                            "Outcome \n 95% L": 'outcome_lo_95',
                            "Outcome \n 95% H": 'outcome_hi_95'}))
raw_data.shape

(404, 44)
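The rename dictionary above has to match each header's embedded newlines and double spaces exactly. A minimal sketch of an alternative, assuming the headers differ only in whitespace: collapse every whitespace run to a single space first, so the rename keys become plain strings such as 'Outcome N'. The toy frame below is hypothetical.

```python
import pandas as pd

# Hypothetical frame whose headers contain embedded newlines and double spaces,
# like the headers in the KQ5 spreadsheet
df = pd.DataFrame(columns=['Outcome\nN', 'Outcome  \n%', 'Outcome \n 95% L'])

# Collapse any run of whitespace (including '\n') to a single space and trim the ends
df.columns = df.columns.str.replace(r'\s+', ' ', regex=True).str.strip()
print(list(df.columns))  # ['Outcome N', 'Outcome %', 'Outcome 95% L']
```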

In [6]:
raw_data.Refid.unique()

array([ 192,  253,  319,  382,  470,  493,  676,  800,  876, 1039, 1085,
       1108, 1202, 1898, 1991, 2171, 2213, 2550, 3326, 3583, 3857, 3865,
       4033, 2629, 2853, 3031, 3086, 3155, 3213, 3218, 3243, 3287, 3498,
       3558, 3669, 3686, 3836, 3997, 6217, 6295, 6432, 6439, 6452, 6529,
       6586, 6728, 7078, 7097, 7170, 7241])

Import sample size information for each study

In [8]:
baseline_data = pd.read_excel('data/Tonsillectomy_OUTCOME_Data_KQ5_Master.xlsx',
                              sheet_name='Basic_N', na_values=['null', 'ND', 'NA'])

Attempt to aggregate treatment groups by normalizing their descriptions

In [9]:
# Lower case
groups = raw_data.Group_Desc.str.lower()
# Strip information after the first comma
groups = groups.str.split(',').str[0]
# Remove tokens containing digits, parentheses or slashes, as these are dosages (e.g. '0.5', 'mg/kg')
dosage_token = re.compile(r'[\d()/]')
groups = groups.str.split(' ').apply(lambda tokens:
                        ' '.join(t for t in tokens if not dosage_token.search(t)))
# Combine saline, control and placebo; assume groups starting with 'no ' are placebo
groups = (groups.replace({'saline': 'placebo', 'control': 'placebo'})
                .apply(lambda s: 'placebo' if s.startswith('no ') else s))
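To check the cleaning rules on individual labels, the same steps can be restated as a plain function. This is a sketch for testing only; `normalize_group` and `DOSAGE_TOKEN` are hypothetical names, and the example labels are made up rather than taken from Group_Desc.

```python
import re

# Tokens containing a digit, parenthesis or slash are treated as dosage information
DOSAGE_TOKEN = re.compile(r'[\d()/]')

def normalize_group(desc: str) -> str:
    """Pure-Python restatement of the pandas cleaning steps, for a single label."""
    s = desc.lower()
    s = s.split(',')[0]                        # drop everything after the first comma
    s = ' '.join(t for t in s.split(' ') if not DOSAGE_TOKEN.search(t))
    if s in ('saline', 'control'):             # exact-match synonyms for placebo
        s = 'placebo'
    return 'placebo' if s.startswith('no ') else s
```

Note that a bare unit token such as `mg`, separated from its number by a space, contains none of the filtered characters and is therefore kept: `'Ramosetron 0.3 mg'` becomes `'ramosetron mg'`.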

Groupings of interventions

- all -setrons (ondansetron, granisetron, ramosetron, etc.)
- dexamethasone
- ibuprofen, paracetamol, etc.
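The three groupings could be applied to the cleaned labels with simple substring rules. A minimal sketch, assuming labels are already normalized as above; the function name and the analgesic list are illustrative assumptions to be extended to the actual study arms.

```python
# Assumed, illustrative list of analgesics; extend to match the study arms
ANALGESICS = ('ibuprofen', 'paracetamol', 'acetaminophen', 'ketoprofen')

def intervention_class(group: str) -> str:
    """Map a cleaned group label to one of the three proposed intervention groupings."""
    g = group.lower()
    if 'setron' in g:                          # ondansetron, granisetron, ramosetron, ...
        return 'setron'
    if 'dexamethasone' in g:
        return 'dexamethasone'
    if any(drug in g for drug in ANALGESICS):
        return 'analgesic'
    return 'other'
```

The first matching rule wins, so a combination arm such as `'ondansetron + dexamethasone'` is classed as a setron; combination arms may need an explicit rule of their own.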

In [10]:
groups.value_counts()

placebo                                                                     91
dexamethasone                                                               87
ramosetron                                                                  27
granisetron                                                                  9
ibuprofen                                                                    9
ondansetron                                                                  9
tropisetron                                                                  7
metoclopramide                                                               7
dolasetron                                                                   6
preoperative ketoprofen + saline                                             6
intravenous dexamethasone                                                    6
dexamethasone sodium phosphate                                               6
levobupivacaine with epinephrine                    

Merge the outcome and sample-size tables on Refid, keeping only rows with a drug class

In [44]:
data_merged = raw_data.merge(baseline_data, on='Refid').dropna(subset=['Drug Class'])
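The merge above uses the default inner join on Refid alone, which silently drops outcome rows whose study is missing from Basic_N and, if Basic_N has several rows per study, duplicates outcome rows. A sketch of a checked merge, assuming Basic_N should hold one row per study (if it holds one row per arm, `validate='many_to_one'` will raise and the keys need an arm identifier). The toy tables are hypothetical.

```python
import pandas as pd

# Hypothetical toy tables: several outcome rows per study, one row per study in the baseline table
outcomes = pd.DataFrame({'Refid': [192, 192, 253, 999], 'N': [3, 5, 1, 0]})
baseline = pd.DataFrame({'Refid': [192, 253],
                         'Drug Class': ['perioperative steroid', 'control']})

merged = outcomes.merge(baseline, on='Refid', how='left',
                        validate='many_to_one',  # raise MergeError if a Refid repeats in `baseline`
                        indicator=True)          # adds a `_merge` column recording where each row matched
print(merged['_merge'].value_counts())
unmatched = merged.loc[merged['_merge'] == 'left_only', 'Refid'].unique()
print(unmatched)  # [999]
```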

Strip whitespace

In [45]:
data_merged['Drug Class'] = data_merged['Drug Class'].str.strip()

In [46]:
# Preview the harmonized drug classes; the result is not assigned back to data_merged
data_merged.replace({'Drug Class':{'Control (no dexamethasone)': 'control',
                                  'No Rx': 'control',
                                  'no dexamethasone': 'control',
                                  'saline': 'control',
                                  'salvia officinalis oral rinse': 'control',
                                  'paerioperative analgesic': 'perioperative analgesic'}})['Drug Class'].unique()
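The replace call only previews the result, and the output still lists 'placebo', 'perioperative saline' and the misspelling 'postperative'. A sketch of a harmonizing function whose result can be assigned back, e.g. `data_merged['Drug Class'] = harmonize_drug_class(data_merged['Drug Class'])`. Treating 'placebo' and 'perioperative saline' as controls is an assumption based on the earlier comment "Combine saline, control and placebo"; the function and set names are hypothetical.

```python
import pandas as pd

# Lower-case synonyms collapsed to 'control'; 'placebo' and 'perioperative saline'
# are included on the assumption that saline/placebo arms count as controls
CONTROL_LABELS = {
    'control (no dexamethasone)', 'no rx', 'no dexamethasone', 'saline',
    'salvia officinalis oral rinse', 'placebo', 'perioperative saline',
}

def harmonize_drug_class(s: pd.Series) -> pd.Series:
    s = s.str.strip()
    # Fix spelling variants seen in the unique values
    s = s.str.replace('paerioperative', 'perioperative', regex=False)
    s = s.str.replace('postperative', 'postoperative', regex=False)
    # Case-insensitive lookup: keep labels that are not control synonyms, replace the rest
    return s.where(~s.str.lower().isin(CONTROL_LABELS), 'control')

example = harmonize_drug_class(pd.Series([' No Rx ', 'paerioperative analgesic',
                                          'postperative NSAID + saline', 'Placebo',
                                          'perioperative steroid']))
```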

array(['perioperative steroid', 'control', 'perioperative NSAID',
       'perioperative saline', 'perioperative steroid + placebo',
       'perioperative analgesic',
       'perioperative analgesic + perioperative steroid',
       'perioperative local anesthetic', 'local anesthetic', 'placebo',
       'perioperative anesthetic', 'perioperative steroid and anesthetic',
       'perioperative antiemetic',
       'perioperative antiemetic and anesthetic',
       'postoperative antiemetic', 'antifibrinolytic',
       'perioperative antiemetic and steroid', 'postoperative analgesic',
       'perioperative opiate analgesic', 'postoperative NSAID',
       'preoperative NSAID + saline', 'postperative NSAID + saline',
       'perioperative NSAID + placebo', 'preoperative NSAID',
       'preoperative analgesic', 'perioperative analgesic + analgesic',
       'Orogastric suction'], dtype=object)

In [54]:
data_merged.loc[data_merged['Drug Class'].str.startswith('post'), ['Citation', 'Refid', 'Drug Class']].drop_duplicates()

     Citation                                           Refid  Drug Class
153  Y. Fujii and H. Tanaka. Results of a prospect...   3326   postoperative antiemetic
290  S. Lalicevic and I. Djordjevic. Comparison of...   3243   postoperative analgesic
300  S. Oztekin, H. Hepaguslar, A. A. Kar, D. Ozzey...  3558   postoperative NSAID
319  H. Kokki and A. Salonen. Comparison of pre- a...   3686   postperative NSAID + saline
389  I. H. Lee, C. Y. Sung, J. I. Han, C. H. Kim an...  6728   postoperative NSAID
390  I. H. Lee, C. Y. Sung, J. I. Han, C. H. Kim an...  6728   postoperative analgesic


In [30]:
data_merged.columns

Index(['Citation', 'Family', 'Refid', 'Number of \nArms', 'Rx Grouping',
       'Group_Desc', 'Drug Class', 'Dose', 'Route', 'Rx_Durn',
       'Last Assesment tmpt for the study', 'Followup duration category',
       'outcome_cat', 'outcome_subcat', 'Outcome_specify',
       'Outcome sample size', 'Presentation \nlocation', 'Outc_Unit',
       'Outc_Tool', 'BL_N', 'BL %', 'BL Mean', 'BL SD', 'BL SE', 'BL_Median',
       'BL_Q1', 'BL_Q3', 'BL Min', 'BL Max', 'BL 95% L', 'BL 95% H',
       'Outcome timepoint (when was this outcome measured, e.g., in PACU, 12 months post-op, immediately post-op--would need a row for each outcome at each timepoint of interest)',
       'Outcome\ncount', 'outcome_pct', 'outcome_mean', 'outcome_sd',
       'Outcome \n SE', 'outcome_med', 'Outcome\n _Q1', 'Outcome \n_Q3',
       'Outcome\n  Min', 'Outcome \n Max', 'outcome_lo_95', 'outcome_hi_95',
       'Population_Catg', 'Diagnostic Method', 'Population\n specify',
       'Trial name', 'Study Design', 'Rx s

In [33]:
data_merged.outcome_cat.value_counts()

Pain management            181
Harms                      130
Emesis Management           64
Health Care Utilization     17
Return to normal diet       12
Name: outcome_cat, dtype: int64

In [38]:
data_merged[data_merged.outcome_cat=='Harms'].Outcome_specify.value_counts()

post-op bleeding                       24
Serious adverse events                 14
re-operation for bleeding              14
readmission for bleeding               12
readmission for PONV                    9
readmission for dehydration             6
secondary bleeding                      4
readmission for post-op bleeding        3
post-op hemorrhage                      3
ER visit for Vomiting/hydration         3
hospital admission for vomiting         3
adverse events                          3
side effects reported                   3
postoperative hemorrhage                2
post-op tonsillar fossa hemorrhage      2
readmission for pain or bleeding        2
serious adverse events                  2
ER visit -  unspecified                 2
re-operation for post-op hemorrhage     2
primary post-op bleeding                2
death                                   2
ER visit for post-op pain               2
readmission for pain management         2
errhysis                          

Extract the three most frequent intervention categories

In [32]:
interventions_of_interest = data_merged.Maj_catg.value_counts()[:3].index.values
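The top-3 categories can then be used to subset the rows. A minimal sketch on hypothetical toy data; only the column name `Maj_catg` comes from the cell above.

```python
import pandas as pd

# Hypothetical toy data; `Maj_catg` mirrors the column used above
df = pd.DataFrame({'Maj_catg': ['steroid'] * 4 + ['setron'] * 3 + ['NSAID'] * 2 + ['other']})

top3 = df['Maj_catg'].value_counts().head(3).index   # categories sorted by frequency
subset = df[df['Maj_catg'].isin(top3)]               # rows belonging to those categories
```

If two categories tie for third place, the one `value_counts` happens to list first is kept, so it is worth inspecting the full counts before fixing the cutoff.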

In [33]:
data_merged.outcome_cat.value_counts()

Health Care Utilization    216
Harms                      132
Pain management             41
Return to normal diet       12
Name: outcome_cat, dtype: int64

In [23]:
data_merged.outcome_subcat.value_counts()

HC utilization-Need for Rescue meds         105
HC utilization-Additional meds use/ dose     65
Pain management                              41
Harms: Post-op bleeding                      39
HC utilization-Number of Rescue meds         31
Harms-other                                  22
Harms-Re-operation for bleeding              17
HC utilization-# of antibiotics              17
Harms- readmission for bleeding              15
Time to Return to normal diet                12
Harms- readmission for PONV                   9
Harms- readmission for dehydration            6
Harms- readmission for post-op pain           4
Harms-readmission-unspecified                 4
Harms- ER visit for PONV                      3
Harms-Hospital admission                      3
Harms- ER visit-Unspecified                   2
Harms- ER visit for dehydration               2
Harms- ER visit for post-op pain              2
Harms-Death-30 day                            2
Name: outcome_subcat, dtype: int64

Groupings of outcomes

- Post-op nausea, vomiting, dehydration (everything but bleeding)
    - assume "unspecified" is NOT bleeding
- Post-op bleeding
- Meds/pain management
- Antibiotics

Re-operation is only for bleeding.
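These groupings could be applied to outcome_subcat with keyword rules, e.g. `data_merged['outcome_group'] = data_merged.outcome_subcat.map(outcome_group)`. A sketch, assuming the keywords below cover the subcategory labels listed in the value counts; the function name and group labels are hypothetical. Following the notes, every re-operation counts as bleeding and any other harm (including "unspecified") is placed in the non-bleeding group.

```python
def outcome_group(subcat) -> str:
    """Assign an outcome subcategory label to one of the proposed outcome groupings."""
    if not isinstance(subcat, str):            # missing labels
        return 'other'
    s = subcat.lower()
    if 'bleeding' in s or 'hemorrhage' in s or 're-operation' in s:
        return 'post-op bleeding'
    if 'antibiotic' in s:
        return 'antibiotics'
    if 'meds' in s or 'pain management' in s:
        return 'meds/pain management'
    if s.startswith('harms'):                  # PONV, dehydration, unspecified, other harms
        return 'non-bleeding harms'
    return 'other'
```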

Filter readmission outcomes by selecting subcategories whose names contain "admission" or "visit"

In [24]:
# na=False treats missing subcategories as non-matches so the boolean mask has no NaN
readmission_outcomes = data_merged[data_merged.outcome_subcat.str.contains(r'admission|visit', na=False)]
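`str.contains` is case-sensitive by default and returns NaN for missing values, which breaks boolean indexing. A sketch on hypothetical labels showing the `case` and `na` options. Note also that the pattern matches 'Harms-Hospital admission', which may or may not be meant to count as a readmission.

```python
import pandas as pd

# Hypothetical labels; 'Visit' is capitalised here to show the effect of case=False
subcats = pd.Series(['Harms- readmission for PONV', 'Harms- ER Visit-Unspecified',
                     'Harms: Post-op bleeding', None])

# case=False also catches capitalised variants; na=False treats missing
# subcategories as non-matches instead of propagating NaN into the mask
mask = subcats.str.contains(r'admission|visit', case=False, na=False)
print(mask.tolist())  # [True, True, False, False]
```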

In [26]:
readmission_outcomes[['N', 'outcome_pct']]

     N  outcome_pct
12   0          0.0
13   0          0.0
69   0          0.0
70   0          0.0
185  1          2.0
186  0          0.0
187  1          2.0
188  1          2.0
189  0          0.0
190  0          0.0
