%config InlineBackend.figure_format='retina'

## Abstract

- contains 224,316 chest radiographs of 65,240 patients
- designed a labeler to automatically detect the presence of 14 observations in radiology reports
    - investigated different approaches to using the uncertainty labels for training convolutional neural networks that output the probability of these observations given the available frontal and lateral radiographs
- Validation set = 200 chest radiographic studies
    - manually annotated by 3 board-certified radiologists
        - found different uncertainty approaches were useful for different pathologies
- Results: on Cardiomegaly, Edema, and Pleural Effusion, the model ROC and PR curves lie above all 3 radiologist operating points

## Introduction

- Automated chest radiograph interpretation at level of practicing radiologists
    - benefit in many medical settings
        - improved workflow prioritization and clinical decision support
        - large-scale screening and global population health initiatives
- Designed labeler that extracted observations from free-text radiology reports
    - captured uncertainties present in reports by using an uncertainty label
- Pay particular attention to uncertainty labels

### Table 1 From CheXpert Paper
![Table 1 From Chexpert Paper](images/chexpert_table_1.png)

## Dataset

### _Data Collection and Label Selection_

- collected chest radiographic studies from Stanford Hospital, performed between October 2002 and July 2017
    - from both inpatient and outpatient centers, along with associated radiology reports
    - from these, sampled 1000 reports for manual review by a board-certified radiologist
- determined 14 observations (i.e. pathologies) based on prevalence in reports and clinical relevance
    - `Pneumonia` was included as a label to represent images that suggest primary infection as diagnosis
    - `No Finding` observation captured absence of all pathologies
    
### _Label Extraction from Radiology Reports_

- team developed automated rule-based labeler to extract observations from free text radiology reports
    - set up in three distinct stages:
        - __mention extraction__
        - __mention classification__
        - __mention aggregation__
        
#### Mention Extraction

- extracts mentions of the listed observations from the _Impression_ section of the report
    - the _Impression_ section summarizes the key findings in the study
    - team also put together a manually curated list of phrases to match alternative names for pathologies in reports
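
A minimal sketch of phrase-list mention extraction, assuming a plain case-insensitive phrase match. The phrase list, the `extract_mentions` helper, and the sample impression text are hypothetical illustrations, not the curated list or code from the paper.

```python
import re

# Hypothetical phrase list for illustration; the curated list in the paper
# is far larger and was assembled by radiologists.
OBSERVATION_PHRASES = {
    "Pneumothorax": ["pneumothorax", "pneumothoraces"],
    "Pleural Effusion": ["pleural effusion", "pleural fluid"],
    "Cardiomegaly": ["cardiomegaly", "enlarged heart"],
}


def extract_mentions(impression):
    """Return (observation, matched phrase, start offset) for every phrase hit."""
    mentions = []
    for observation, phrases in OBSERVATION_PHRASES.items():
        for phrase in phrases:
            # Word boundaries avoid matching inside longer words.
            pattern = r"\b" + re.escape(phrase) + r"\b"
            for match in re.finditer(pattern, impression, flags=re.IGNORECASE):
                mentions.append((observation, match.group(0).lower(), match.start()))
    # Sort by position so mentions come back in reading order.
    return sorted(mentions, key=lambda m: m[2])


impression = "Enlarged heart. Small left pleural effusion. No pneumothorax."
mentions = extract_mentions(impression)
for observation, phrase, start in mentions:
    print(observation, repr(phrase), start)
```

Note that extraction only finds where an observation is mentioned. `No pneumothorax` is still returned as a `Pneumothorax` mention, and deciding that it is negative is left to the classification stage.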
    
#### Mention Classification

- after extraction, the aim is to classify each mention as negative, uncertain, or positive
- `uncertain` label can catch both uncertainty of radiologist in diagnosis as well as ambiguity inherent in report (__HOW?__)
- 3-phase pipeline consisting of:
    - pre-negation uncertainty
    - negation
    - post-negation uncertainty
- if a match is found in a phase, the mention is classified accordingly (uncertain or negative)
- if the mention is not matched in any of the phases, it is classified as positive
- Rules for mention classification are designed on the universal dependency parse of the report
    - first, split and tokenize sentences using `NLTK`
    - then, sentences parsed using Bllip parser trained using __David McClosky's__ biomedical model [see here](https://nlp.stanford.edu/~mcclosky/papers/dmcc-thesis-2010.pdf)
    - finally, universal dependency graph of each sentence is computed using Stanford CoreNLP [see here](https://nlp.stanford.edu/pubs/USD_LREC14_paper_camera_ready.pdf)
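
The control flow of the 3-phase pipeline can be sketched as below. This is a simplification under stated assumptions: the rule lists are hypothetical, and they are regexes over raw sentence text, whereas the labeler described above matches rules on the dependency parse.

```python
import re

# Hypothetical rules for illustration only; the real rules operate on a
# universal dependency graph, not on raw strings.
PRE_NEGATION_UNCERTAINTY = [r"\bcannot rule out\b"]
NEGATION = [r"\bno\b", r"\bwithout\b", r"\brule out\b"]
POST_NEGATION_UNCERTAINTY = [r"\bpossible\b", r"\bmay represent\b"]


def matches_any(patterns, sentence):
    """True if any pattern in the list matches the sentence."""
    return any(re.search(p, sentence, flags=re.IGNORECASE) for p in patterns)


def classify_mention(sentence):
    """Run the three phases in order; fall through to positive."""
    if matches_any(PRE_NEGATION_UNCERTAINTY, sentence):
        return "uncertain"
    if matches_any(NEGATION, sentence):
        return "negative"
    if matches_any(POST_NEGATION_UNCERTAINTY, sentence):
        return "uncertain"
    return "positive"


for sentence in [
    "No pleural effusion.",
    "Possible pneumonia.",
    "Cannot rule out pneumonia.",
    "Small right pneumothorax.",
]:
    print(sentence, "->", classify_mention(sentence))
```

The early returns encode the phase order: a mention that falls through all three phases unmatched ends up positive.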
    
#### Mention Aggregation

- use the classification of each mention to determine a label for each of the 12 pathologies as well as `Support Devices` and `No Finding`
    - observation with at least one positively classified mention is assigned a positive (1) label
    - observation assigned uncertain (u) label if it has no positively classified mentions and at least one uncertain mention
    - otherwise, observation assigned negative (0) label if there is at least one negatively classified mention
    - assign _blank_ if there is no mention of an observation
    - `No Finding` observation assigned a positive label (1) if no pathology classified as positive or uncertain
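
The aggregation rules above can be sketched as a small function. This is a sketch, not the paper's code: the `aggregate` and `label_report` names, the string class names, and the use of `None` for a blank label are assumptions. The source does not say what `No Finding` receives when the condition fails, so leaving it blank in that case is also an assumption.

```python
def aggregate(mention_labels):
    """Collapse per-mention classes for one observation into a report label.

    Returns 1 (positive), "u" (uncertain), 0 (negative) or None (blank).
    Assumed precedence: positive, then uncertain, then negative.
    """
    if "positive" in mention_labels:
        return 1
    if "uncertain" in mention_labels:
        return "u"
    if "negative" in mention_labels:
        return 0
    return None  # observation never mentioned


def label_report(mentions_by_observation, pathologies):
    """Label every pathology, then derive `No Finding` from those labels."""
    labels = {
        obs: aggregate(mentions_by_observation.get(obs, []))
        for obs in pathologies
    }
    # `No Finding` is positive only if no pathology is positive or uncertain.
    no_finding = all(labels[obs] not in (1, "u") for obs in pathologies)
    labels["No Finding"] = 1 if no_finding else None  # blank otherwise (assumption)
    return labels


labels = label_report(
    {"Edema": ["negative", "uncertain"], "Pneumonia": ["negative"]},
    ["Edema", "Pneumonia", "Atelectasis"],
)
print(labels)
```

Here `Edema` comes out uncertain because the uncertain mention takes precedence over the negative one, which in turn keeps `No Finding` blank.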
    
### Table 2 From CheXpert Paper
![Table 2](images/chexpert_table_2.png)

## Labeler Results

### Report Evaluation Set
- report evaluation set = 1000 radiology reports from 1000 distinct randomly sampled patients
    - do not overlap with patients whose studies were used to develop the labeler
- two board-certified radiologists (w/o access to additional info) label each observation
    - confidently present (1)
    - confidently absent (0)
    - uncertainly present (u)
    - not mentioned (blank)
- resulting annotations serve as ground truth on the report evaluation set

### Comparison to NIH labeler
- compared labeler against method used in NIH medical image dataset
- Table 2 (see above) shows the performance of the CheXpert labeler vs. the NIH labeler
    - across all observations, the CheXpert labeler achieved a higher F1 score
    - The F1 score: harmonic mean of precision and recall, with the best value at 1 and the worst at 0.
        - Precision and recall contribute equally to the F1 score.
        - The formula for the F1 score is:
`F1 = 2 * (precision * recall) / (precision + recall)`
- Three key differences between CheXpert method and NIH method
    - did __not__ use automatic mention extractors like MetaMap or DNorm
    - incorporated several additional rules to capture large variation in ways negation and uncertainty are conveyed
    - split uncertainty classification of mentions into pre-negation and post-negation
        - allowed them to resolve cases of uncertainty rules double matching with negation rules in the reports
        - example: the phrase `cannot exclude pneumothorax` conveys uncertainty about the presence of pneumothorax
        - without the pre-negation stage, the `pneumothorax` match is classified as negative due to the `exclude XXX` rule
        - by applying the `cannot exclude` rule in pre-negation, this observation can be correctly classified as uncertain
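
The `cannot exclude pneumothorax` example can be reproduced with a toy ablation. This is only a sketch: the two regex rules and the `classify` helper are hypothetical stand-ins for the labeler's parse-based rules, chosen to show why the order of phases matters.

```python
import re

# Two hypothetical rules, keyed by the phase they belong to.
RULES = {
    "pre_negation_uncertainty": (r"cannot exclude", "uncertain"),
    "negation": (r"exclude", "negative"),
}


def classify(phrase, phases):
    """Apply the given phases in order; the first matching rule wins."""
    for phase in phases:
        pattern, label = RULES[phase]
        if re.search(pattern, phrase, flags=re.IGNORECASE):
            return label
    return "positive"


phrase = "cannot exclude pneumothorax"
with_pre = classify(phrase, ["pre_negation_uncertainty", "negation"])
without_pre = classify(phrase, ["negation"])
print("with pre-negation stage:   ", with_pre)
print("without pre-negation stage:", without_pre)
```

Both rules match the phrase, so the result depends entirely on which phase runs first.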

## Model