This notebook addresses the following reviewer comment by first counting how many patients meet the requirements for the analysis (we expect the number to be too small):

> If feasible, it would be very nice to see if using those participants who did not receive a VFSS, if they can be categorized into aspirators/non-aspirators based on their microbiome. The authors do an AUC for site prediction (gastric vs. lung, etc.), but have their looked at the lung microbiome and used their methods to categorize participants without VFSS as aspirators/non-aspirators and go back into their data/chart to see if they had symptoms of aspiration? This may satisfy the clinical significance and give use a powerful tool (also look at predicting the development of bronchitis, bronchiolitis, bronchiectasis, or pneumonia). Of course, this is likely outside of the data available

In [1]:
import pandas as pd

In [2]:
fmeta = '../../data/clean/rosen.metadata.symptoms_meds_only.clean'
meta = pd.read_csv(fmeta, sep='\t')
meta.head()

Unnamed: 0,sample_id,subject_id,site,mbs_consolidated,gender_all,abx_all,ppi_all,h2blockers_all,inhaled_steroids_all,oral_steroids_all,...,wtloss_all,foodup_all,chest_pain_all,asthma_all,chronic_cough_all,pneum_all,ear_inf_all,sinus_inf_all,subject_id.1,metadata_id
0,04-006-9B,04-006-9,bal,Aspiration/Penetration,1.0,0.0,,,,,...,,0.0,,1.0,1.0,,,,04-006-9,k23_v2
1,04-009-2B,04-009-2,bal,Aspiration/Penetration,0.0,0.0,,,,,...,0.0,0.0,,1.0,1.0,1.0,0.0,0.0,04-009-2,k23_v2
2,04-009-2G,04-009-2,gastric_fluid,Aspiration/Penetration,0.0,0.0,,,,,...,0.0,0.0,,1.0,1.0,1.0,0.0,0.0,04-009-2,k23_v2
3,04-011-3B,04-011-3,bal,Normal,1.0,1.0,1.0,,,,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,04-011-3,k23_v2
4,04-011-3G,04-011-3,gastric_fluid,Normal,1.0,1.0,1.0,,,,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,04-011-3,k23_v2


In [9]:
samples = meta['sample_id'].tolist()

print(meta.shape)
# Don't include samples from second time point or lung transplants
exclude = ['2', 'F', 'sick', 'F2T']
for s in exclude:
    samples = [i for i in samples if not i.endswith(s)]
samples = [i for i in samples if not i.startswith('05')]

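# Copy so that later column assignments act on an independent frame
# (avoids pandas' SettingWithCopyWarning)
meta = meta.query('sample_id == @samples').copy()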
print(meta.shape)

(586, 25)
(461, 25)
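
The same exclusion can also be written as a boolean mask on the data frame instead of filtering a list of IDs. This is a minimal sketch, assuming sample IDs are plain strings; the IDs in `toy` are made up for illustration and do not come from the dataset.

```python
import pandas as pd

# Hypothetical sample IDs, invented for illustration only
toy = pd.DataFrame({
    'sample_id': ['04-006-9B', '04-009-2G', '04-011-3F', '04-011-3B2',
                  '05-001-1B', '04-020-1Bsick', '04-021-5F2T'],
})

# Same suffixes and prefix as in the cell above
exclude_suffixes = ('2', 'F', 'sick', 'F2T')
ids = toy['sample_id']

# Python's str.endswith accepts a tuple of suffixes
bad_suffix = ids.apply(lambda s: s.endswith(exclude_suffixes))
bad_prefix = ids.str.startswith('05')

toy_kept = toy[~(bad_suffix | bad_prefix)].copy()
kept_ids = toy_kept['sample_id'].tolist()
print(kept_ids)
```

Keeping the filter as a mask makes it easy to inspect which rows were dropped by which rule.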


In [10]:
meta['mbs_consolidated'] = meta['mbs_consolidated'].fillna('nan')
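
The `'nan'` sentinel string lets the later `query` calls select untested subjects. An alternative, sketched here on a toy frame with invented values, is to leave the missing values as they are and select them with `isna()`:

```python
import numpy as np
import pandas as pd

# Toy metadata; subject IDs are made up for illustration
toy = pd.DataFrame({
    'subject_id': ['s1', 's2', 's3'],
    'mbs_consolidated': ['Normal', np.nan, 'Aspiration/Penetration'],
})

# Select subjects with no swallow-study result directly from the missing values
untested = toy[toy['mbs_consolidated'].isna()]
untested_ids = untested['subject_id'].tolist()
print(untested_ids)
```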

In [13]:
keep_sites = ['bal', 'throat_swab']

# Number of untested subjects with exactly two samples across the kept sites
(
    meta
        .query('mbs_consolidated == "nan"')
        .query('site == @keep_sites')
        .groupby(['subject_id'])
        .size()
    == 2
).sum()

25

So 25 untested patients (no `mbs_consolidated` result) have both a lung (`bal`) and a throat swab sample. Next, let's see how many of them also have chronic cough metadata.

In [18]:
bal_throat_subjs = (
    meta
        .query('mbs_consolidated == "nan"')
        .query('site == @keep_sites')
        .groupby(['subject_id'])
        .size()
        .reset_index(name='n')
)
bal_throat_subjs.head()
    

Unnamed: 0,subject_id,n
0,01-112-7,1
1,01-164-7,1
2,01-173-4,1
3,01-200-1,1
4,01-247-3,1


In [20]:
subjs = bal_throat_subjs.query('n == 2')['subject_id'].tolist()
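
The `n == 2` filter equates "two rows" with "one sample from each site", which holds only if no subject has two samples from the same site. Whether that is true here is not checked in this notebook. A sketch of a check that counts distinct sites instead of rows, on a toy frame with invented subjects:

```python
import pandas as pd

# Toy metadata; subjects and sites are made up for illustration
toy = pd.DataFrame({
    'subject_id': ['s1', 's1', 's2', 's3', 's3', 's4', 's4'],
    'site': ['bal', 'throat_swab', 'bal', 'bal', 'bal', 'bal', 'throat_swab'],
})
keep_sites = ['bal', 'throat_swab']

# Count distinct sites per subject rather than rows per subject
n_sites = (
    toy[toy['site'].isin(keep_sites)]
        .groupby('subject_id')['site']
        .nunique()
)

# Keep only subjects that have every site in keep_sites
both = n_sites[n_sites == len(keep_sites)].index.tolist()
print(both)
```

In the toy data, `s3` has two `bal` rows and would pass a row-count filter, but it is excluded here because it has only one distinct site.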

In [22]:
meta.columns

Index([u'sample_id', u'subject_id', u'site', u'mbs_consolidated',
       u'gender_all', u'abx_all', u'ppi_all', u'h2blockers_all',
       u'inhaled_steroids_all', u'oral_steroids_all', u'prob_swall_all',
       u'food_stuck_all', u'diff_swall_all', u'abd_pain_all', u'const_all',
       u'wtloss_all', u'foodup_all', u'chest_pain_all', u'asthma_all',
       u'chronic_cough_all', u'pneum_all', u'ear_inf_all', u'sinus_inf_all',
       u'subject_id.1', u'metadata_id'],
      dtype='object')

In [26]:
keepcols = ['mbs_consolidated', 'prob_swall_all', 'food_stuck_all', 
            'diff_swall_all', 'abd_pain_all', 'const_all',
            'wtloss_all', 'foodup_all', 'chest_pain_all', 'asthma_all',
            'chronic_cough_all', 'pneum_all', 'ear_inf_all', 'sinus_inf_all',
            'subject_id']
subjs_meta = meta.query('subject_id == @subjs')[keepcols].drop_duplicates()
print(subjs_meta.shape)
subjs_meta.head()

(25, 15)


Unnamed: 0,mbs_consolidated,prob_swall_all,food_stuck_all,diff_swall_all,abd_pain_all,const_all,wtloss_all,foodup_all,chest_pain_all,asthma_all,chronic_cough_all,pneum_all,ear_inf_all,sinus_inf_all,subject_id
30,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,04-062-8
42,,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,,04-067-2
55,,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,,1.0,04-075-8
67,,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,04-080-7
70,,0.0,1.0,1.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,04-081-3


In [27]:
subjs_meta.groupby(['chronic_cough_all']).size()

chronic_cough_all
0.0     6
1.0    17
dtype: int64
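
The two groups above sum to 23 rather than 25, presumably because `groupby` drops rows whose key is missing. A minimal sketch for making the missing group visible, using a toy column with invented values; `value_counts(dropna=False)` is the standard pandas option for this:

```python
import numpy as np
import pandas as pd

# Toy stand-in for subjs_meta['chronic_cough_all']; values are made up
toy_cough = pd.Series([1.0, 0.0, 1.0, np.nan, 1.0, np.nan],
                      name='chronic_cough_all')

# dropna=False keeps NaN as its own category instead of dropping it
counts = toy_cough.value_counts(dropna=False)
n_missing = int(toy_cough.isna().sum())

print(counts)
print(n_missing)
```

Applied to `subjs_meta['chronic_cough_all']`, the same call would report how many of the 25 subjects lack chronic cough metadata.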