# Interpret and deploy MedCLIP pipeline

## Method
1. Tokenize the input text(s) into token IDs, before any attention is applied -- tokenizer
2. Run the tokens through the neural network, which applies the attention mechanism -- model
3. Pool the per-token outputs into a single embedding per text


## How to proceed
- With SentenceTransformer we could simply run SentenceTransformer("...").encode(text) and skip the pooling step.
- SentenceTransformer is not available for this model, so we use Transformers, which requires more manual work.
- Transformers needs an AutoTokenizer to preprocess the text into token IDs, with padding and truncation parameters.
- AutoModel, the actual neural network, then produces the output.
- The output holds one embedding per token, so we pool over the tokens to get a single embedding for the whole sentence.
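
To make the padding and truncation step concrete, here is a minimal pure-Python sketch of what happens to a batch of token-ID lists. The helper `pad_and_mask` and the token IDs are made up for illustration. The real AutoTokenizer also maps text to vocabulary IDs and adds special tokens, which this sketch leaves out.

```python
def pad_and_mask(batch, max_length, pad_id=0):
    """Truncate each sequence to max_length, then pad to the longest one.

    Hypothetical helper for illustration; not part of the Transformers API.
    """
    truncated = [ids[:max_length] for ids in batch]
    width = max(len(ids) for ids in truncated)
    input_ids = [ids + [pad_id] * (width - len(ids)) for ids in truncated]
    # 1 marks a real token, 0 marks padding
    attention_mask = [[1] * len(ids) + [0] * (width - len(ids)) for ids in truncated]
    return input_ids, attention_mask


# Made-up token IDs for three texts of different lengths
batch = [[11, 12], [21, 22, 23, 24], [31, 32, 33, 34, 35, 36]]
input_ids, attention_mask = pad_and_mask(batch, max_length=5)
print(input_ids)       # [[11, 12, 0, 0, 0], [21, 22, 23, 24, 0], [31, 32, 33, 34, 35]]
print(attention_mask)  # [[1, 1, 0, 0, 0], [1, 1, 1, 1, 0], [1, 1, 1, 1, 1]]
```

The attention mask is what later lets the pooling step ignore the padded positions.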

In [None]:
%pip install transformers torch ipywidgets

In [None]:
from transformers import AutoTokenizer, AutoModel
import torch
tokeniser = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")  # Load the tokenizer that turns text into token IDs
texts = [
    "Patient has a cold",
    "Patient has a sore throat and no other symptoms",
    "Patient is vomiting blood and has a collapsed lung",
]

In [None]:
inputs = tokeniser(texts, padding=True, truncation=True, return_tensors="pt")  # Token IDs plus attention mask, padded to equal length
print(inputs)

In [None]:
model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT") # Load Neural Network
with torch.no_grad(): # Apply forward pass without calculating gradients to speed up computation
    outputs = model(**inputs) # Apply Attention Mechanism to each token to generate embeddings

hidden_states = outputs.last_hidden_state  # Final-layer output: one embedding per token
print(hidden_states.shape)  # (batch, tokens, hidden_size)

In [None]:
# Mean-pool over real tokens only: the attention mask zeroes out padding (similar to filtering a pandas DataFrame)
mask = inputs["attention_mask"].unsqueeze(-1)
pooled = (hidden_states * mask).sum(dim=1) / mask.sum(dim=1)
print(pooled)
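
To see why the mask matters, the pooling above can be checked on a toy batch. This is a minimal sketch with made-up numbers, where the padded position deliberately holds a large vector so its effect on a plain mean is obvious.

```python
import torch

# Toy batch: 2 sentences, 3 token positions, hidden size 2 (values are made up)
hidden_states = torch.tensor([
    [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]],  # third position is padding
    [[2.0, 2.0], [4.0, 4.0], [6.0, 6.0]],      # no padding
])
attention_mask = torch.tensor([[1, 1, 0], [1, 1, 1]])

mask = attention_mask.unsqueeze(-1)  # (2, 3, 1), broadcasts over the hidden dimension
pooled = (hidden_states * mask).sum(dim=1) / mask.sum(dim=1)
naive = hidden_states.mean(dim=1)    # plain mean, averages the padding vector too

print(pooled)    # rows are [2., 3.] and [4., 4.]
print(naive[0])  # first sentence is skewed by the padding vector
```

Only the first sentence differs between the two results, because it is the only one with a padded position.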

# Full text encoder class

In [None]:
from transformers import AutoTokenizer, AutoModel
import torch

class Model:
    """Text encoder built from AutoTokenizer and AutoModel. Defaults to Bio_ClinicalBERT."""

    def __init__(self, link="emilyalsentzer/Bio_ClinicalBERT"):
        # `link` is a Hugging Face model ID
        self.tokenizer = AutoTokenizer.from_pretrained(link)
        self.model = AutoModel.from_pretrained(link)  # Load the neural network

    def embeddings(self, texts):
        """Return one mean-pooled embedding per input text."""
        # Token IDs plus attention mask, padded to equal length
        inputs = self.tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

        with torch.no_grad():  # Skip gradient tracking to speed up the forward pass
            outputs = self.model(**inputs)

        hidden_states = outputs.last_hidden_state  # One embedding per token
        # Mean-pool over real tokens only; the attention mask zeroes out padding
        mask = inputs["attention_mask"].unsqueeze(-1)
        pooled = (hidden_states * mask).sum(dim=1) / mask.sum(dim=1)
        return pooled

In [None]:
tmp = Model().embeddings(["Patient has a cold", "Patient is vomiting blood"])
print(tmp)
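
Once texts are embedded, a CLIP-style pipeline compares embeddings with cosine similarity. Below is a minimal sketch, assuming inputs shaped like the output of `Model().embeddings` (one row per text). The helper `cosine_similarity_matrix` and the toy 2-D vectors are illustrative and not part of the notebook.

```python
import torch
import torch.nn.functional as F


def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between rows of a (n, d) and rows of b (m, d)."""
    a = F.normalize(a, dim=-1)  # scale each row to unit length
    b = F.normalize(b, dim=-1)
    return a @ b.T              # (n, m) matrix of dot products


# Toy 2-D "embeddings" standing in for the pooled model output
emb = torch.tensor([[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
sim = cosine_similarity_matrix(emb, emb)
print(sim)  # diagonal is 1; orthogonal rows 0 and 1 score 0
```

With real embeddings, `cosine_similarity_matrix(tmp, tmp)` would give a similarity score for every pair of input sentences.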

# Image Encoder
We use the Swin Transformer V2 from Microsoft: https://huggingface.co/microsoft/swinv2-tiny-patch4-window16-256

In [None]:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-classification", model="microsoft/swinv2-tiny-patch4-window16-256", use_fast=True)
pipe("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png")
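
The image-classification pipeline returns class labels, whereas pairing images with text embeddings needs one feature vector per image. Below is a possible sketch, assuming the pooled output of the Swin backbone serves as the image embedding (an assumption, not something this notebook confirms). To stay runnable offline it builds a tiny, randomly initialised `Swinv2Model` from a made-up config and feeds it random pixels. In practice one would presumably load the checkpoint with `AutoModel.from_pretrained("microsoft/swinv2-tiny-patch4-window16-256")` and preprocess real images with `AutoImageProcessor`.

```python
import torch
from transformers import Swinv2Config, Swinv2Model

# Tiny made-up config so the example runs without downloading weights
config = Swinv2Config(
    image_size=64, patch_size=4, embed_dim=16,
    depths=[1, 1], num_heads=[2, 2], window_size=4,
)
model = Swinv2Model(config).eval()  # random weights: values are meaningless, shapes are not

pixel_values = torch.randn(2, 3, 64, 64)  # stand-in for 2 preprocessed RGB images

with torch.no_grad():  # inference only, no gradients needed
    outputs = model(pixel_values=pixel_values)

image_embeddings = outputs.pooler_output  # one vector per image
print(image_embeddings.shape)  # torch.Size([2, 32]): embed_dim * 2 ** (len(depths) - 1)
```

The embedding size of the image encoder will generally differ from that of the text encoder, so a projection layer would be needed before the two can be compared.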

- Test component