# RadReportAnnotator Demo

We demonstrate RadReportAnnotator on data from the [Indiana University Chest X-ray Dataset (Demner-Fushman et al.)](https://www.ncbi.nlm.nih.gov/pubmed/26133894).

This example can be adapted to your own collection of radiology reports exported from Montage and to your own manually generated set of classification labels.

Import the library:

In [1]:
import RadReportAnnotator as ra
import os.path

Instantiate a RadReportAnnotator object with paths to the demo `reports` and `labels`:

`Reports` contains 3,666 deidentified chest x-ray radiology reports. 

`Labels` contains binary labels for `Normal`, `Opacity`, `Cardiomegaly`, `Nodule`, and `Fibrosis` for 1,500 of these reports.

In [2]:
CXRAnnotator = ra.RadReportAnnotator(report_dir_path=os.path.join("pseudodata","reports"), 
                                     validation_file_path=os.path.join("pseudodata","labels","labeled_reports.xlsx"))

Set arguments for RadReportAnnotator in `define_config`; see the documentation in RadReportAnnotator for more information.

In our experience, models that use only bag-of-words features (`DO_BOW=True, DO_WORD2VEC=False`) have been competitive with those that use both bag of words and word embeddings (`DO_BOW=True, DO_WORD2VEC=True`). Word embeddings can take considerable time to train on larger datasets.

In the demo below, we use bag-of-words features (`DO_BOW=True`) with 1-, 2-, and 3-grams (`N_GRAM_SIZES=[1,2,3]`).

In [3]:
CXRAnnotator.define_config(DO_BOW=True,
                           DO_WORD2VEC=False,
                           DO_PARAGRAPH_VECTOR=False,
                           N_GRAM_SIZES=[1, 2, 3],
                           SILVER_THRESHOLD="fiftypct",
                           NAME_UNID_REPORTS="ACCID",
                           NAME_TEXT_REPORTS="REPORT",
                           N_THRESH_CORPUS=10,
                           N_THRESH_OUTCOMES=50)

Build the corpus from the reports:

In [4]:
CXRAnnotator.build_corpus()

building pre-corpus
pre-corpus built
preprocessing reports


100%|█████████████████████████████████████████████████████████████████████████████| 3666/3666 [00:07<00:00, 473.86it/s]


creating n-grams


100%|████████████████████████████████████████████████████████████████████████████| 3666/3666 [00:00<00:00, 6268.53it/s]


number of unique n-grams: 33865
number of unique n-grams after filtering out low frequency tokens: 2425


We can examine how the preprocessing works. Let's look at the original input text for the report at index 500:

In [5]:
CXRAnnotator.df_data['Report Text'].iloc[500]

'  Comparison:  None   Indication:  Central line placement   Findings:  The heart is borderline in size. The aorta is mildly tortuous. XXXX right IJ catheter is in XXXX with tip in proximal right atrium/cavoatrial junction. There is no pneumothorax. Lungs are grossly clear. There is no large effusion.   Impression:  Right IJ catheter tip in proximal right atrium. No pneumothorax. '

Let's look at this report after preprocessing:

In [6]:
print(CXRAnnotator.processed_reports[500])

['comparison', 'none', 'indic', 'central', 'line', 'placement', 'find', 'the', 'heart', 'is', 'borderlin', 'in', 'size', 'sentenceend', 'the', 'aorta', 'is', 'mildli', 'tortuou', 'sentenceend', 'xxxx', 'right', 'ij', 'cathet', 'is', 'in', 'xxxx', 'with', 'tip', 'in', 'proxim', 'right', 'atrium', 'cavoatri', 'junction', 'sentenceend', 'there', 'is', 'no', 'pneumothorax', 'sentenceend', 'lung', 'are', 'grossli', 'clear', 'sentenceend', 'there', 'is', 'no', 'larg', 'effus', 'sentenceend', 'impress', 'right', 'ij', 'cathet', 'tip', 'in', 'proxim', 'right', 'atrium', 'sentenceend', 'no', 'pneumothorax', 'sentenceend', 'sentenceend', 'sentenceend', 'sentenceend']


Words were lowercased and stemmed ("indication" → "indic"), extra punctuation was removed, and periods were replaced with the special end-of-sentence token `sentenceend`. Word2vec takes input in a format like this to learn word embeddings.
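A minimal standard-library sketch of this kind of preprocessing is below. It is not RadReportAnnotator's implementation: it lowercases the text, turns periods into the `sentenceend` token, and replaces other punctuation with spaces, but it does no stemming. The stems in the output above ('mildli', 'tortuou', 'larg') look consistent with the Porter stemming algorithm, so applying a Porter stemmer to each token would likely bring the sketch closer; that is an assumption about the library, not something this demo confirms.

```python
import re

SENTENCE_END = "sentenceend"  # special token seen in the demo output


def preprocess(report: str) -> list[str]:
    """Rough, stemming-free approximation of the preprocessing shown above."""
    text = report.lower()
    # Mark sentence boundaries before the other punctuation is stripped.
    text = text.replace(".", f" {SENTENCE_END} ")
    # Collapse every remaining run of non-alphanumeric characters into one space.
    text = re.sub(r"[^a-z0-9]+", " ", text)
    return text.split()


print(preprocess("Findings:  The heart is borderline in size. No pneumothorax."))
# ['findings', 'the', 'heart', 'is', 'borderline', 'in', 'size', 'sentenceend', 'no', 'pneumothorax', 'sentenceend']
```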

Let's look at the n-gram features for this report, which will be used for bag of words modeling:

In [7]:
print(CXRAnnotator.ngram_reports[500])

['find_the_heart', 'the_heart_is', 'the_aorta_is', 'lung_are_grossli', 'are_grossli_clear', 'no_larg_effus', 'comparison_none', 'find_the', 'the_heart', 'heart_is', 'in_size', 'the_aorta', 'aorta_is', 'is_mildli', 'xxxx_right', 'is_in', 'in_xxxx', 'xxxx_with', 'with_tip', 'tip_in', 'right_atrium', 'there_is', 'no_pneumothorax', 'lung_are', 'are_grossli', 'grossli_clear', 'there_is', 'no_larg', 'larg_effus', 'impress_right', 'cathet_tip', 'tip_in', 'right_atrium', 'no_pneumothorax', 'comparison', 'none', 'indic', 'central', 'line', 'placement', 'find', 'the', 'heart', 'is', 'borderlin', 'in', 'size', 'the', 'aorta', 'is', 'mildli', 'tortuou', 'xxxx', 'right', 'cathet', 'is', 'in', 'xxxx', 'with', 'tip', 'in', 'right', 'atrium', 'junction', 'there', 'is', 'pneumothorax', 'lung', 'are', 'grossli', 'clear', 'there', 'is', 'larg', 'effus', 'impress', 'right', 'cathet', 'tip', 'in', 'right', 'atrium', 'pneumothorax']


Since `N_GRAM_SIZES=[1,2,3]` in this demo, individual words (1-grams), pairs of consecutive words (2-grams, e.g., 'comparison_none'), and triples of consecutive words (3-grams, e.g., 'no_larg_effus') are all available as features. Some of these 2- and 3-grams are uninformative ('comparison_none'), while others may be useful ('no_pneumothorax'). Note that only n-grams meeting the `N_THRESH_CORPUS` frequency threshold in the corpus (10 in this demo) are included; this filtering is what reduced the 33,865 unique n-grams reported above to 2,425.
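The n-gram and filtering steps can be sketched as follows. This is an illustrative approximation, not the library's code. The output above is consistent with n-grams that never span a `sentenceend` token, and the sketch follows that pattern. Several details are assumptions: the threshold is inclusive (`>= min_count`), it counts total occurrences rather than the number of reports, and features are binary presence indicators rather than counts. The resulting matrix has one row per report and one column per retained n-gram, which matches the `(1500, 2425)` predictor shape printed during inference.

```python
from collections import Counter


def sentence_ngrams(tokens, sizes=(1, 2, 3), boundary="sentenceend"):
    """Underscore-joined n-grams that never span a sentence boundary."""
    sentences, current = [], []
    for tok in tokens:
        if tok == boundary:
            if current:
                sentences.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        sentences.append(current)
    grams = []
    for n in sizes:
        for sent in sentences:
            grams.extend("_".join(sent[i:i + n]) for i in range(len(sent) - n + 1))
    return grams


def filter_rare(ngram_reports, min_count):
    """Keep n-grams occurring at least min_count times across all reports (assumed rule)."""
    counts = Counter(g for report in ngram_reports for g in report)
    vocab = sorted(g for g, c in counts.items() if c >= min_count)
    kept = set(vocab)
    return [[g for g in report if g in kept] for report in ngram_reports], vocab


def bag_of_words(ngram_reports, vocab):
    """Binary report-by-n-gram matrix: one row per report, one column per vocab entry."""
    index = {g: j for j, g in enumerate(vocab)}
    rows = []
    for report in ngram_reports:
        row = [0] * len(vocab)
        for g in report:
            if g in index:
                row[index[g]] = 1
        rows.append(row)
    return rows


print(sentence_ngrams(["no", "larg", "effus", "sentenceend", "lung", "clear", "sentenceend"]))
```

The order of features within a report does not matter for a bag-of-words model, so the sketch simply emits 1-grams before 2- and 3-grams.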

Train Lasso (L1-penalized) logistic regression models using features from 60% of the labeled reports, then infer labels for the remaining 40% of labeled reports (for performance evaluation) and for the unlabeled reports (for the ultimate application):

In [8]:
binary_labels, proba_labels = CXRAnnotator.infer_labels()

generating features


100%|████████████████████████████████████████████████████████████████████████████| 1500/1500 [00:00<00:00, 4099.24it/s]


total labels:6
labels eligible for inference:4
dimensionality of predictor matrix:(1500, 2425)
n_train in modeling=900
n_test in modeling=600
i=0


100%|███████████████████████████████████████████████████████████████████████████| 2000/2000 [00:00<00:00, 26965.13it/s]
100%|███████████████████████████████████████████████████████████████████████████| 1666/1666 [00:00<00:00, 19683.19it/s]
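To give intuition for what a Lasso logistic regression does, here is a from-scratch sketch that fits one L1-penalized logistic model by proximal gradient descent (ISTA). Each iteration takes a gradient step on the average log-loss, then soft-thresholds the weights, which drives uninformative weights toward exactly zero. This is not RadReportAnnotator's code. The penalty `lam`, step size `lr`, and iteration count are arbitrary illustrative values, and the demo presumably fits one such model per eligible label. For real data, a library implementation such as scikit-learn's `LogisticRegression(penalty="l1", solver="liblinear")` would be the usual choice.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def soft_threshold(x, t):
    """Proximal operator of t * |x|: shrink x toward zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def fit_l1_logistic(X, y, lam=0.05, lr=0.5, n_iter=2000):
    """ISTA for mean log-loss + lam * ||w||_1 (the intercept b is not penalized)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        b -= lr * grad_b / n
        # Gradient step on the smooth loss, then soft-threshold for the L1 penalty.
        w = [soft_threshold(wj - lr * gj / n, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b


def predict_proba(w, b, x):
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Toy data: feature 0 marks positive reports, feature 1 is uninformative.
X = [[1, 0], [1, 1], [0, 0], [0, 1]] * 5
y = [1, 1, 0, 0] * 5
w, b = fit_l1_logistic(X, y)
print(w, b, predict_proba(w, b, [1, 0]), predict_proba(w, b, [0, 0]))
```

On this toy data the informative weight grows while the uninformative one stays near zero, which is the sparsity that makes an L1 penalty attractive for thousands of n-gram features.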


Examine the quality of predictions on the held-out 40% of labeled data:

In [9]:
CXRAnnotator.accuracy

| Label (calculated on held-out 40%) | AUC | True + | False + | True - | False - |
|---|---|---|---|---|---|
| Normal | 0.956679 | 208 | 53 | 324 | 15 |
| Opacity | 0.981869 | 62 | 17 | 517 | 4 |
| Cardiomegaly | 0.993979 | 41 | 18 | 541 | 0 |
| Nodule | 0.991759 | 16 | 36 | 548 | 0 |


Notice that `Fibrosis` was filtered out, despite appearing in the input data, because it had very few positive observations. It is important to ensure that your labeled data contain sufficient positive and negative cases for each label.
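Before training on your own labels, a quick count of positive and negative cases per label can flag labels likely to be dropped. The sketch below is hypothetical. `label_columns` is an illustrative plain-Python stand-in for your labels spreadsheet, and applying a single `min_cases` cutoff to both classes is an assumption. The demo sets `N_THRESH_OUTCOMES=50`, but the exact eligibility rule RadReportAnnotator applies is not shown here.

```python
def label_case_counts(label_columns, min_cases):
    """Return {label: (n_positive, n_negative, eligible)} for 0/1 label columns.

    Values other than 0 or 1 (e.g. missing entries) are ignored.
    `min_cases` is a hypothetical cutoff applied to both classes.
    """
    summary = {}
    for name, values in label_columns.items():
        n_pos = sum(1 for v in values if v == 1)
        n_neg = sum(1 for v in values if v == 0)
        summary[name] = (n_pos, n_neg, n_pos >= min_cases and n_neg >= min_cases)
    return summary


# Hypothetical toy labels: plenty of Normal cases, almost no Fibrosis positives.
labels = {
    "Normal": [1] * 60 + [0] * 40,
    "Fibrosis": [1] * 3 + [0] * 97,
}
for name, (n_pos, n_neg, ok) in label_case_counts(labels, min_cases=30).items():
    print(f"{name}: {n_pos} positive, {n_neg} negative, eligible={ok}")
```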

Rare labels with high AUC may still have a significant number of false positives. For example, `Nodule` has an AUC of 0.99 but 36 false positives against 16 true positives on the held-out data, a positive predictive value (PPV) of only 16/52 ≈ 0.31. Be aware of noise introduced by your labeling process before using inferred labels to train convolutional neural networks or other algorithms, and consider the PPV of a positive label. Additional labeled examples, particularly of rare pathology, may help improve accuracy.
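PPV, sensitivity, and specificity for every label follow directly from the confusion counts in the accuracy table. The short sketch below copies those counts from the table. The `summarize` helper is illustrative, not part of RadReportAnnotator.

```python
# Confusion counts (TP, FP, TN, FN) copied from the held-out accuracy table above.
CONFUSION = {
    "Normal": (208, 53, 324, 15),
    "Opacity": (62, 17, 517, 4),
    "Cardiomegaly": (41, 18, 541, 0),
    "Nodule": (16, 36, 548, 0),
}


def summarize(tp, fp, tn, fn):
    """PPV, sensitivity and specificity; None when a denominator is zero."""
    def ratio(num, den):
        return num / den if den else None

    return {
        "ppv": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


for label, counts in CONFUSION.items():
    m = summarize(*counts)
    print(f"{label:<13} PPV={m['ppv']:.2f} sens={m['sensitivity']:.2f} spec={m['specificity']:.2f}")
```

AUC summarizes ranking across all thresholds, whereas PPV depends on the chosen cutoff and on how rare the label is. This is why a rare label can combine a near-perfect AUC with a low PPV.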

Recent work ([Ghafoorian et al.](https://arxiv.org/abs/1801.05040); [Rajpurkar et al.](https://arxiv.org/abs/1711.05225)) demonstrates that deep learning can achieve impressive performance when trained on large, noisily labeled radiological imaging datasets.

Examine a few probabilistic predictions:

In [10]:
proba_labels.tail()

| Accession Number | Normal | Opacity | Cardiomegaly | Nodule |
|---|---|---|---|---|
| 103661 | 0.113953 | 0.007305 | 0.022156 | 0.009483 |
| 103662 | 0.283203 | 0.007305 | 0.022156 | 0.009483 |
| 103663 | 0.283203 | 0.007305 | 0.022156 | 0.009483 |
| 103664 | 0.000129 | 0.060547 | 0.058807 | 0.037109 |
| 103665 | 0.020233 | 0.999512 | 0.011406 | 0.019058 |


Examine a few binary predictions; manual labels override these predictions when available:

In [11]:
binary_labels.tail()

| Accession Number | Normal | Opacity | Cardiomegaly | Nodule |
|---|---|---|---|---|
| 103661 | 0 | 0 | 0 | 0 |
| 103662 | 0 | 0 | 0 | 0 |
| 103663 | 0 | 0 | 0 | 0 |
| 103664 | 0 | 0 | 0 | 0 |
| 103665 | 0 | 1 | 0 | 0 |
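The hypothetical sketch below shows one way binary labels could be derived from probabilities and then overridden by manual labels. The 0.5 cutoff (and using `>=`) is an assumption. It is one plausible reading of `SILVER_THRESHOLD="fiftypct"`, and the library's actual rule is not shown, though the cutoff is consistent with the rows above. The dictionaries keyed by accession number are illustrative stand-ins for `proba_labels` and the manual labels file, and the accession numbers in the usage example are made up.

```python
def binarize(proba, manual, threshold=0.5):
    """Threshold probabilities, then let manual labels take precedence.

    proba:  {accession: {label: probability}}
    manual: {accession: {label: 0 or 1}} for manually reviewed reports only
    """
    binary = {}
    for acc, scores in proba.items():
        row = {label: int(p >= threshold) for label, p in scores.items()}
        # Manual labels, when present, override the model's prediction.
        row.update(manual.get(acc, {}))
        binary[acc] = row
    return binary


# Made-up accession numbers: report 1 is unlabeled, report 2 was manually reviewed.
proba = {1: {"Normal": 0.02, "Opacity": 0.99}, 2: {"Normal": 0.40, "Opacity": 0.10}}
manual = {2: {"Normal": 1, "Opacity": 0}}
print(binarize(proba, manual))
```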


You can examine predictions for individual reports. Here are the text and predictions for a report that manual reviewers coded as `Normal`:

In [12]:
#normal report
print(CXRAnnotator.df_data['Report Text'].iloc[1700])
print("\n")
print(proba_labels.iloc[1700])
print("\n")
print(binary_labels.iloc[1700])

  Comparison:  None.   Indication:  XXXX, chest pain and XXXX x2 weeks.   Findings:  The cardiomediastinal silhouette and pulmonary vasculature are within normal limits in size. The lungs are clear of focal airspace disease, pneumothorax, or pleural effusion. There are no acute bony findings.   Impression:  No acute cardiopulmonary findings. 


Normal          0.969727
Opacity         0.001776
Cardiomegaly    0.000642
Nodule          0.000948
Name: 101700, dtype: float64


Normal          1
Opacity         0
Cardiomegaly    0
Nodule          0
Name: 101700, dtype: int32


Here are the text and predictions for a report that manual reviewers coded as positive for `Cardiomegaly`:

In [13]:
print(CXRAnnotator.df_data['Report Text'].iloc[2100])
print("\n")
print(proba_labels.iloc[2100])
print("\n")
print(binary_labels.iloc[2100])

  Comparison:  PA and lateral chest x-XXXX dated XXXX.   Indication:  XXXX-year-old female with chest pain.   Findings:  The heart size is enlarged. Tortuous aorta. Otherwise the mediastinal contour is within normal limits. The lungs are free of any focal infiltrates. There are no nodules or masses. No visible pneumothorax. No visible pleural fluid. The XXXX are grossly normal. There is no visible free intraperitoneal air under the diaphragm.   Impression:  1. Cardiomegaly without lung infiltrates. 


Normal          0.008018
Opacity         0.001008
Cardiomegaly    0.981445
Nodule          0.056152
Name: 102100, dtype: float64


Normal          0
Opacity         0
Cardiomegaly    1
Nodule          0
Name: 102100, dtype: int32


Here are the text and predictions for a report that manual reviewers coded as positive for `Opacity`:

In [14]:
#opacity
print(CXRAnnotator.df_data['Report Text'].iloc[2770])
print("\n")
print(proba_labels.iloc[2770])
print("\n")
print(binary_labels.iloc[2770])

  Comparison:  XXXX, XXXX   Indication:  XXXX-year-old XXXX with chest pain.   Findings:  The heart size is stable. The aorta is ectatic and atherosclerotic but stable. XXXX sternotomy XXXX are again noted. The scarring in the left lower lobe is again noted and unchanged from prior exam. There are mild bilateral prominent lung interstitial opacities consistent with emphysematous disease. The calcified granulomas are stable.   Impression:  1. Changes of emphysema and left lower lobe scarring, both stable. 2. Unchanged degenerative and atherosclerotic changes of the thoracic aorta. 


Normal          0.000000
Opacity         0.981445
Cardiomegaly    0.125977
Nodule          0.234497
Name: 102770, dtype: float64


Normal          0
Opacity         1
Cardiomegaly    0
Nodule          0
Name: 102770, dtype: int32
