# Report Generation Metrics from CheXagent Paper

![CheXagent Report Generation Metrics](../assets/extra/Text_metrics_CheXagent.png)


- Dataset: CheXpert_findings
  - Metric: CheXbert_F1
    - Target Minimum: 0.35
  - Metric: BERTScore
    - Target Minimum: 0.48
  - Metric: RadGraph_F1
    - Target Minimum: 0.27

- Dataset: MIMIC-CXR_findings
  - Metric: CheXbert_F1
    - Target Minimum: 0.33
  - Metric: BERTScore
    - Target Minimum: 0.47
  - Metric: RadGraph_F1
    - Target Minimum: 0.28

- Dataset: MIMIC-CXR_proprietary_comparison
  - Metric: CheXbert_F1_Macro_14
    - Target Minimum: 0.5
  - Metric: CheXbert_F1_Macro_5
    - Target Minimum: 0.53

- Dataset: MIMIC-CXR_summarization
  - Metric: ROUGE_L
    - Target Minimum: 0.35
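
The targets above can also be kept in a small lookup table so that computed scores can be checked against them programmatically. The following is only a minimal sketch: `TARGET_MINIMUMS`, `check_targets`, and `example_scores` are hypothetical names that are not part of the repo, and the example scores are made up for illustration.

```python
# Target minimums copied from the list above, keyed by dataset and then by metric.
TARGET_MINIMUMS = {
    "CheXpert_findings": {"CheXbert_F1": 0.35, "BERTScore": 0.48, "RadGraph_F1": 0.27},
    "MIMIC-CXR_findings": {"CheXbert_F1": 0.33, "BERTScore": 0.47, "RadGraph_F1": 0.28},
    "MIMIC-CXR_proprietary_comparison": {"CheXbert_F1_Macro_14": 0.5, "CheXbert_F1_Macro_5": 0.53},
    "MIMIC-CXR_summarization": {"ROUGE_L": 0.35},
}


def check_targets(dataset, scores):
    """Return {metric: bool} for every target metric of `dataset` present in `scores`."""
    targets = TARGET_MINIMUMS[dataset]
    return {
        metric: scores[metric] >= minimum
        for metric, minimum in targets.items()
        if metric in scores
    }


# Hypothetical scores, for illustration only (not results from the paper).
example_scores = {"CheXbert_F1": 0.36, "BERTScore": 0.45, "RadGraph_F1": 0.27}
print(check_targets("CheXpert_findings", example_scores))
```

A score equal to the target counts as meeting it, since the values are minimums.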

In [None]:
import sys
import os

# Absolute path of the current working directory
# (the 'tutorials' directory when the notebook is run from there)
notebook_dir = os.path.abspath('')

# Parent of the notebook directory (the project root)
project_root = os.path.dirname(notebook_dir)

# Add the project root to sys.path so that utils can be imported from the repo
if project_root not in sys.path:
    sys.path.append(project_root)

from utils.text_metrics import evaluate_all_metrics

Example usage with two generated reports

In [3]:
generated_reports = [
    "1. supine frontal view of the chest demonstrates interval removal of the enteric tube. the remaining right ij catheter and surgical materials are stable. 2. the heart is moderately enlarged and mitral annular calcification as well as",
    "1. increased left lower lung zone opacity. 2. blunting of the right costophrenic angle with a small right pleural effusion. 3. stable mitral valve replacement and annular ring in the tricuspid"]
original_reports = [
    "1. stable right internal jugular central venous catheter with tip in the superior vena cava. mitral and tricuspid valves are again seen. median sternotomy wires. nasogastric tube extends into the stomach. 2. loculated right pleural effusion is unchanged. 3. prominent interstitial lung markings bilaterally likely related to pulmonary edema versus infection unchanged. 4. cardiomediastinal silhouette stable in size and appearance.",
    "1. findings consistent with removal of a significant amount of right pleural fluid. no pneumothorax. 2. stable small left pleural effusion with retrocardiac opacity which may represent atelectasis or consolidation."]
all_metrics = evaluate_all_metrics(generated_reports, original_reports, evaluation_mode="CheXagent")
for metric, scores in all_metrics.items():
    print(f"{metric}: {scores}")

Using device: cuda:0
chexbert_f1_weighted: 0.3333333333333333
chexbert_f1_micro: 0.4
chexbert_f1_macro: 0.14285714285714285
chexbert_f1_micro_5: 0.2857142857142857
chexbert_f1_macro_5: 0.13333333333333333
bertscore_f1: [0.6276065111160278, 0.6828216314315796]
radgraph_f1_RG_E: 0.24080267558528426
radgraph_f1_RG_ER: 0.2176759410801964
rouge_l: [0.16494845360824745, 0.19672131147540983]
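
In the output above, `bertscore_f1` and `rouge_l` are printed as one value per report, while the other metrics are single numbers. To compare them with a single target value, the per-report lists can be averaged. The following is a minimal sketch, assuming the values behave like plain Python floats; `summarize_metrics` is a hypothetical helper and not part of `utils.text_metrics`.

```python
from statistics import mean


def summarize_metrics(metrics):
    """Replace every per-report list of scores with its mean; keep scalars unchanged."""
    summary = {}
    for name, value in metrics.items():
        if isinstance(value, (list, tuple)):
            summary[name] = mean(float(v) for v in value)
        else:
            summary[name] = float(value)
    return summary


# Values copied from the printed output above.
printed_metrics = {
    "chexbert_f1_micro": 0.4,
    "bertscore_f1": [0.6276065111160278, 0.6828216314315796],
    "rouge_l": [0.16494845360824745, 0.19672131147540983],
}

for metric, score in summarize_metrics(printed_metrics).items():
    print(f"{metric}: {score:.4f}")
```

With only two reports these averages are illustrative and should not be read as dataset-level results.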


In [4]:
from f1chexbert import F1CheXbert
f1chexbert = F1CheXbert()
accuracy, accuracy_not_averaged, class_report, class_report_5 = f1chexbert(
    hyps=generated_reports,
    refs=original_reports)
print("Accuracy:", accuracy)
print("Accuracy (not averaged):", accuracy_not_averaged)
print("\nClassification Report:")
for key, value in class_report.items():
    print(f"{key}: {value}")
print("\nClassification Report (top 5):")
for key, value in class_report_5.items():
    print(f"{key}: {value}")

Accuracy: 0.0
Accuracy (not averaged): [0. 0.]

Classification Report:
Enlarged Cardiomediastinum: {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 1.0}
Cardiomegaly: {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 0.0}
Lung Opacity: {'precision': 1.0, 'recall': 0.5, 'f1-score': 0.6666666666666666, 'support': 2.0}
Lung Lesion: {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 0.0}
Edema: {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 1.0}
Consolidation: {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 1.0}
Pneumonia: {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 1.0}
Atelectasis: {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 1.0}
Pneumothorax: {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 0.0}
Pleural Effusion: {'precision': 1.0, 'recall': 0.5, 'f1-score': 0.6666666666666666, 'support': 2.0}
Pleural Other: {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 0.0}
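
Each `f1-score` in the classification report is the harmonic mean of that class's precision and recall. As a small worked sketch, the `Lung Opacity` row (precision 1.0, recall 0.5) can be reproduced by hand; `f1_from_precision_recall` is a hypothetical helper, and returning 0.0 when both inputs are zero is an assumption that matches the zero rows shown above.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Lung Opacity row from the report above: precision 1.0, recall 0.5.
print(f1_from_precision_recall(1.0, 0.5))

# A class with no correct predictions, such as Edema above.
print(f1_from_precision_recall(0.0, 0.0))
```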


In [5]:
from radgraph import F1RadGraph
f1radgraph = F1RadGraph(reward_level="all", model_type="radgraph-xl")
hyps = generated_reports
refs = original_reports
mean_reward, reward_list, hypothesis_annotation_lists, reference_annotation_lists = f1radgraph(hyps=hyps, refs=refs)

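# With reward_level="all", mean_reward holds three F1 values. The first two
# match radgraph_f1_RG_E and radgraph_f1_RG_ER from evaluate_all_metrics above.
rg_e, rg_er, _ = mean_reward

print([float(val) for val in mean_reward])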

Using device: cuda:0
[0.24080267558528426, 0.2176759410801964, 0.05357142857142857]
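
The three printed values are easier to read when labelled. The sketch below pairs them with names: the first two labels reuse the keys printed by `evaluate_all_metrics` earlier, while `radgraph_f1_third_level` is a placeholder label assumed here for the third value, which the earlier output does not name.

```python
# Values copied from the printed output above.
mean_reward = (0.24080267558528426, 0.2176759410801964, 0.05357142857142857)

# The first two labels come from the evaluate_all_metrics output;
# the third label is a hypothetical placeholder.
labels = ("radgraph_f1_RG_E", "radgraph_f1_RG_ER", "radgraph_f1_third_level")

named_rewards = {label: float(value) for label, value in zip(labels, mean_reward)}

for label, value in named_rewards.items():
    print(f"{label}: {value:.4f}")
```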
