### CXR-BERT-specialized
CXR-BERT is a chest X-ray (CXR) domain-specific language model that makes use of an improved vocabulary, novel pretraining procedure, weight regularization, and text augmentations. The resulting model demonstrates improved performance on radiology natural language inference, radiology masked language model token prediction, and downstream vision-language processing tasks such as zero-shot phrase grounding and image classification.

First, we pretrain CXR-BERT-general from a randomly initialized BERT model via Masked Language Modeling (MLM) on PubMed abstracts and clinical notes from the publicly available MIMIC-III and MIMIC-CXR datasets. The general model is therefore expected to be applicable to research in clinical domains other than chest radiology through domain-specific fine-tuning.
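
To make the MLM objective concrete, here is a minimal sketch of how a training example can be corrupted. It assumes the masking recipe from the original BERT paper (select about 15% of tokens; replace 80% of those with `[MASK]`, 10% with a random token, and leave 10% unchanged). The text above does not state which rates CXR-BERT uses, so the rates, the helper `mask_tokens`, and the whitespace tokenization are illustrative assumptions, not the authors' pipeline.

```python
import random

MASK_TOKEN = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Corrupt a token list for MLM (hypothetical helper, standard BERT recipe).

    Returns (corrupted, labels). labels[i] is the original token at positions
    the model must predict and None at positions that are ignored by the loss.
    """
    rng = rng or random.Random()
    corrupted, labels = [], []
    for token in tokens:
        if rng.random() >= mask_prob:
            # Position not selected: keep the token, no prediction target.
            corrupted.append(token)
            labels.append(None)
            continue
        labels.append(token)  # the model must recover the original token
        roll = rng.random()
        if roll < 0.8:
            corrupted.append(MASK_TOKEN)         # 80%: replace with [MASK]
        elif roll < 0.9:
            corrupted.append(rng.choice(vocab))  # 10%: replace with a random token
        else:
            corrupted.append(token)              # 10%: keep the token unchanged
    return corrupted, labels


# Whitespace splitting stands in for the real wordpiece tokenizer.
tokens = "no pleural effusion or pneumothorax is seen".split()
vocab = sorted(set(tokens))
corrupted, labels = mask_tokens(tokens, vocab, rng=random.Random(0))
print(corrupted)
print(labels)
```

During pretraining, the loss is computed only at positions where `labels` is not `None`, so the model learns to reconstruct the selected tokens from their context.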

CXR-BERT-specialized is continually pretrained from CXR-BERT-general to further specialize in the chest X-ray domain. In the final stage, CXR-BERT is trained in a multi-modal contrastive learning framework, similar to CLIP. The latent representation of the [CLS] token is used to align text and image embeddings.
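
As a rough illustration of what a CLIP-style objective computes, the sketch below implements a symmetric contrastive (InfoNCE) loss over a batch of paired text and image embeddings. It is written in plain Python so it runs without a deep learning framework. The function names, the toy 2-D embeddings, and the temperature of 0.07 are assumptions for illustration; the exact loss used to train CXR-BERT-specialized may differ in its details.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy(logits, target_index):
    """Softmax cross-entropy for one row of logits (numerically stable)."""
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum_exp - logits[target_index]


def symmetric_contrastive_loss(text_embs, image_embs, temperature=0.07):
    """Average of text-to-image and image-to-text cross-entropy.

    Row i of text_embs is assumed to be paired with row i of image_embs;
    every other row in the batch acts as a negative.
    """
    text = [l2_normalize(v) for v in text_embs]
    image = [l2_normalize(v) for v in image_embs]
    n = len(text)
    # logits[i][j]: cosine similarity of text i and image j, divided by temperature.
    logits = [[sum(a * b for a, b in zip(t, v)) / temperature for v in image]
              for t in text]
    text_to_image = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    image_to_text = sum(
        cross_entropy([logits[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return 0.5 * (text_to_image + image_to_text)


# Toy embeddings standing in for [CLS]-derived text and image projections.
text_embs = [[1.0, 0.0], [0.0, 1.0]]
matched_images = [[0.9, 0.1], [0.2, 0.8]]
swapped_images = [matched_images[1], matched_images[0]]

aligned_loss = symmetric_contrastive_loss(text_embs, matched_images)
swapped_loss = symmetric_contrastive_loss(text_embs, swapped_images)
print(aligned_loss, swapped_loss)
```

The loss is lower when each text embedding points in the same direction as its paired image embedding than when the pairs are swapped, which is the pressure that pulls matching text and image representations together.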

In [1]:
import torch
from transformers import AutoModel, AutoTokenizer

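# Load the model and tokenizer from the Hugging Face Hub.
# trust_remote_code=True is needed because the model ships its own modeling code.
model_id = "microsoft/BiomedVLP-CXR-BERT-specialized"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)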

# Input text prompts (e.g., reference, synonym, contradiction)
text_prompts = ["There is no pneumothorax or pleural effusion",
                "No pleural effusion or pneumothorax is seen",
                "The extent of the pleural effusion is constant."]

# Tokenize and compute the sentence embeddings
tokenizer_output = tokenizer.batch_encode_plus(batch_text_or_text_pairs=text_prompts,
                                               add_special_tokens=True,
                                               padding='longest',
                                               return_tensors='pt')
embeddings = model.get_projected_text_embeddings(input_ids=tokenizer_output.input_ids,
                                                 attention_mask=tokenizer_output.attention_mask)

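# Compute the pairwise cosine similarity of the sentence embeddings.
# Normalizing explicitly ensures the dot product is a cosine similarity
# whether or not the model already returns unit-length embeddings.
embeddings = torch.nn.functional.normalize(embeddings, dim=1)
sim = torch.mm(embeddings, embeddings.t())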


In [None]:
from transformers import AutoConfig

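# The config class is defined in the model repository, so remote code must be trusted here too.
config = AutoConfig.from_pretrained("microsoft/BiomedVLP-CXR-BERT-specialized", trust_remote_code=True)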
print(config)