<a href="https://colab.research.google.com/github/Alfred9/Natural-Language-Processing/blob/main/Named%20Entity%20Recognition/Deberta_med_ner_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [4]:
# Use a pipeline as a high-level helper
from transformers import pipeline

# Initialize the pipeline
pipe = pipeline("token-classification", model="Clinical-AI-Apollo/Medical-NER", aggregation_strategy='simple')


Device set to use cpu
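With `aggregation_strategy='simple'`, the pipeline groups consecutive tokens that share an entity type into one span (a `B-` tag always starts a new span), reports the average of their token scores, and drops `O` tokens. The function below is a simplified pure-Python sketch of that grouping rule, not the library's own implementation. The tokens and tags mirror what the model assigns to the first sample sentence. The per-token scores are hypothetical.

```python
from statistics import mean


def group_entities(tokens, tags, scores):
    """Sketch of 'simple'-style aggregation over BIO tags (not the library code)."""
    groups = []  # each item: {"type": str, "tokens": [...], "scores": [...]}
    for token, tag, score in zip(tokens, tags, scores):
        prefix, etype = ("O", "O") if tag == "O" else tag.split("-", 1)
        last = groups[-1] if groups else None
        # Continue the current group only for a same-type tag that is not B-
        if last is not None and last["type"] == etype and prefix != "B":
            last["tokens"].append(token)
            last["scores"].append(score)
        else:
            groups.append({"type": etype, "tokens": [token], "scores": [score]})

    results = []
    for g in groups:
        if g["type"] == "O":
            continue  # outside tokens are not reported as entities
        # SentencePiece marks word starts with U+2581; turn them back into spaces
        word = "".join(g["tokens"]).replace("\u2581", " ").strip()
        results.append({"entity_group": g["type"], "word": word,
                        "score": mean(g["scores"])})
    return results


cad_tokens = ["\u258145", "\u2581year", "\u2581old", "\u2581woman",
              "\u2581diagnosed", "\u2581with", "\u2581CAD"]
cad_tags = ["B-AGE", "I-AGE", "I-AGE", "B-SEX", "O", "O", "B-DISEASE_DISORDER"]
cad_scores = [0.6, 0.5, 0.5, 0.4, 0.9, 0.9, 0.35]  # hypothetical per-token scores

for ent in group_entities(cad_tokens, cad_tags, cad_scores):
    print(ent)
```

The three `AGE` tokens collapse into the single span "45 year old", which matches the grouping the pipeline reports for this sentence.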


In [5]:
# List of sample texts
sample_texts = [
    "45 year old woman diagnosed with CAD",
    "Patient reports severe chest pain for 3 days",
    "MRI confirmed the presence of a brain tumor",
    "35-year-old man with a history of diabetes and hypertension",
    "She was admitted due to acute respiratory distress syndrome (ARDS)",
    "Family history includes coronary artery disease and stroke"
]

In [6]:
# Process each text through the pipeline
for text in sample_texts:
    result = pipe(text)
    print(f"Input: {text}")
    print(f"Result: {result}")
    print("-" * 50)

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


Input: 45 year old woman diagnosed with CAD
Result: [{'entity_group': 'AGE', 'score': 0.5433549, 'word': '45 year old', 'start': 0, 'end': 11}, {'entity_group': 'SEX', 'score': 0.40775427, 'word': 'woman', 'start': 11, 'end': 17}, {'entity_group': 'DISEASE_DISORDER', 'score': 0.34644428, 'word': 'CAD', 'start': 32, 'end': 36}]
--------------------------------------------------
Input: Patient reports severe chest pain for 3 days
Result: [{'entity_group': 'SEVERITY', 'score': 0.49363744, 'word': 'severe', 'start': 15, 'end': 22}, {'entity_group': 'BIOLOGICAL_STRUCTURE', 'score': 0.21146007, 'word': 'chest', 'start': 22, 'end': 28}, {'entity_group': 'SIGN_SYMPTOM', 'score': 0.16951033, 'word': 'pain', 'start': 28, 'end': 33}, {'entity_group': 'DURATION', 'score': 0.5693923, 'word': '3 days', 'start': 37, 'end': 44}]
--------------------------------------------------
Input: MRI confirmed the presence of a brain tumor
Result: [{'entity_group': 'DIAGNOSTIC_PROCEDURE', 'score': 0.23901568, 'w
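Several scores above are well below 0.5 (for example, "pain" at about 0.17). If only higher-confidence spans are wanted, one simple post-processing step is to filter the pipeline output by score. The threshold of 0.3 below is an arbitrary illustration, not a value recommended for this model, and `filter_by_score` is a hypothetical helper. The sample dictionaries are copied from the second result printed above.

```python
def filter_by_score(entities, min_score=0.3):
    """Keep only entity groups whose pipeline score meets the threshold."""
    return [e for e in entities if e["score"] >= min_score]


# Copied from the pipeline output for "Patient reports severe chest pain for 3 days"
sample_result = [
    {"entity_group": "SEVERITY", "score": 0.49363744, "word": "severe", "start": 15, "end": 22},
    {"entity_group": "BIOLOGICAL_STRUCTURE", "score": 0.21146007, "word": "chest", "start": 22, "end": 28},
    {"entity_group": "SIGN_SYMPTOM", "score": 0.16951033, "word": "pain", "start": 28, "end": 33},
    {"entity_group": "DURATION", "score": 0.5693923, "word": "3 days", "start": 37, "end": 44},
]

kept = filter_by_score(sample_result, min_score=0.3)
print([(e["word"], e["entity_group"]) for e in kept])
# → [('severe', 'SEVERITY'), ('3 days', 'DURATION')]
```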

In [9]:
# Process each text and display tokens with NER labels
print("NER Results Using Pipeline:\n")
for text in sample_texts:
    result = pipe(text)
    print(f"Input: {text}")
    for entity in result:
        print(f"Text: {entity['word']} | Label: {entity['entity_group']} | Score: {entity['score']:.2f}")
    print("-" * 50)

NER Results Using Pipeline:

Input: 45 year old woman diagnosed with CAD
Text: 45 year old | Label: AGE | Score: 0.54
Text: woman | Label: SEX | Score: 0.41
Text: CAD | Label: DISEASE_DISORDER | Score: 0.35
--------------------------------------------------
Input: Patient reports severe chest pain for 3 days
Text: severe | Label: SEVERITY | Score: 0.49
Text: chest | Label: BIOLOGICAL_STRUCTURE | Score: 0.21
Text: pain | Label: SIGN_SYMPTOM | Score: 0.17
Text: 3 days | Label: DURATION | Score: 0.57
--------------------------------------------------
Input: MRI confirmed the presence of a brain tumor
Text: MRI | Label: DIAGNOSTIC_PROCEDURE | Score: 0.24
Text: brain tumor | Label: DISEASE_DISORDER | Score: 0.27
--------------------------------------------------
Input: 35-year-old man with a history of diabetes and hypertension
Text: 35-year-old | Label: AGE | Score: 0.65
Text: man | Label: SEX | Score: 0.53
Text: diabetes | Label: DISEASE_DISORDER | Score: 0.24
Text: hypertension | Label: 
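To make the per-entity lines easier to use downstream, the entities can be collected by label across sentences. A minimal sketch, where `entities_by_label` is a hypothetical helper and the input entities are the ones printed above for the first and third sample sentences:

```python
from collections import defaultdict


def entities_by_label(entities):
    """Map each entity_group label to the list of surface strings tagged with it."""
    grouped = defaultdict(list)
    for ent in entities:
        grouped[ent["entity_group"]].append(ent["word"].strip())
    return dict(grouped)


# Entities printed above for the first and third sample sentences
sample_results = [
    [{"entity_group": "AGE", "word": "45 year old"},
     {"entity_group": "SEX", "word": "woman"},
     {"entity_group": "DISEASE_DISORDER", "word": "CAD"}],
    [{"entity_group": "DIAGNOSTIC_PROCEDURE", "word": "MRI"},
     {"entity_group": "DISEASE_DISORDER", "word": "brain tumor"}],
]
all_entities = [ent for sentence in sample_results for ent in sentence]
print(entities_by_label(all_entities))
# → {'AGE': ['45 year old'], 'SEX': ['woman'], 'DISEASE_DISORDER': ['CAD', 'brain tumor'], 'DIAGNOSTIC_PROCEDURE': ['MRI']}
```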

In [12]:
# Load model directly
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch


# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("Clinical-AI-Apollo/Medical-NER")
model = AutoModelForTokenClassification.from_pretrained("Clinical-AI-Apollo/Medical-NER")


In [13]:
# Process each text using the model directly
print("Processing using the tokenizer and model directly:\n")
for text in sample_texts:
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model(**inputs)
    print(f"Input: {text}")
    print(f"Raw Outputs: {outputs}")
    print("-" * 50)


Processing using the tokenizer and model directly:

Input: 45 year old woman diagnosed with CAD
Raw Outputs: TokenClassifierOutput(loss=None, logits=tensor([[[-8.5375e-01, -1.0362e-02,  1.8961e-01,  7.0059e-01,  8.2117e-01,
          -3.0354e-01,  2.9691e-01,  4.5949e-02, -2.9505e-01, -8.3273e-01,
           6.8068e-01,  2.4214e-01, -2.5524e-02, -2.1197e-01,  3.8403e-01,
          -1.4249e-01,  3.9579e-01, -9.9153e-02, -5.2300e-02, -2.0429e-01,
           5.9592e-01, -2.3436e-01,  2.2224e-01, -1.9267e-01,  4.3034e-02,
          -3.0417e-01, -1.9081e-01, -7.2881e-01,  5.6102e-01, -1.8302e-01,
           3.2192e-01, -5.3647e-01, -4.9908e-01,  6.9888e-01, -1.1278e-01,
          -1.9376e-01, -1.0863e-01,  6.4272e-01,  3.6245e-01, -5.7336e-01,
          -4.0215e-01,  5.2347e-01, -3.1513e-02,  6.5801e-01,  3.5849e-01,
          -1.9354e-01,  4.6761e-01, -2.4857e-01, -1.6946e-02,  2.3624e-01,
          -1.3665e-01,  1.8609e-01, -4.8788e-01, -3.2033e-01,  3.8771e-01,
           2.2669e-02,  1.
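The raw `logits` tensor has shape `(batch, sequence_length, num_labels)`: one unnormalised score per label for every token. A softmax over the last dimension turns each token's row into probabilities. The argmax picks the label, and its probability is the kind of per-token confidence the pipeline turns into its `score` field. The sketch below does this for a single token in plain Python. The four logits and the label mapping are hypothetical, not values or labels taken from this model's config.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for one token over four hypothetical labels
id2label = {0: "O", 1: "B-AGE", 2: "I-AGE", 3: "B-SEX"}
token_logits = [0.2, 2.0, 0.5, -1.0]

probs = softmax(token_logits)
pred_id = max(range(len(probs)), key=probs.__getitem__)  # argmax
print(id2label[pred_id], round(probs[pred_id], 3))
# → B-AGE 0.695
```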

In [14]:
# Function to extract text and NER labels using the raw model output
def extract_ner_labels(text):
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():  # inference only, so skip gradient tracking
        outputs = model(**inputs)
    logits = outputs.logits  # shape: (batch, sequence_length, num_labels)
    predictions = torch.argmax(logits, dim=-1)  # Get the predicted class indices
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])  # Convert IDs to sub-word tokens
    labels = [model.config.id2label[pred.item()] for pred in predictions[0]]  # Convert indices to labels

    # Print tokens with labels
    for token, label in zip(tokens, labels):
        if token not in tokenizer.all_special_tokens:  # Exclude special tokens like [CLS], [SEP]
            print(f"Text: {token} | Label: {label}")
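`extract_ner_labels` prints one label per SentencePiece sub-token, and a token that begins with `▁` (U+2581) starts a new word. When word-level labels are wanted, a common convention is to keep the label of each word's first sub-token. A minimal sketch of that convention, where `words_with_first_subtoken_labels` is a hypothetical helper and the split of "hypertension" into two pieces is invented purely to show the merge:

```python
def words_with_first_subtoken_labels(tokens, labels, marker="\u2581"):
    """Merge SentencePiece sub-tokens into words, keeping each word's first label."""
    words = []
    for token, label in zip(tokens, labels):
        if token.startswith(marker) or not words:
            # The marker begins a new word; keep this sub-token's label
            words.append([token.lstrip(marker), label])
        else:
            # Continuation piece: extend the current word, ignore its label
            words[-1][0] += token
    return [(word, label) for word, label in words]


# Hypothetical sub-token split, for illustration only
demo_tokens = ["\u2581history", "\u2581of", "\u2581hyper", "tension"]
demo_labels = ["O", "O", "B-DISEASE_DISORDER", "I-DISEASE_DISORDER"]
print(words_with_first_subtoken_labels(demo_tokens, demo_labels))
# → [('history', 'O'), ('of', 'O'), ('hypertension', 'B-DISEASE_DISORDER')]
```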

In [15]:
# Process each text using the model directly
print("\nNER Results Using Model Directly:\n")
for text in sample_texts:
    print(f"Input: {text}")
    extract_ner_labels(text)
    print("-" * 50)



NER Results Using Model Directly:

Input: 45 year old woman diagnosed with CAD
Text: ▁45 | Label: B-AGE
Text: ▁year | Label: I-AGE
Text: ▁old | Label: I-AGE
Text: ▁woman | Label: B-SEX
Text: ▁diagnosed | Label: O
Text: ▁with | Label: O
Text: ▁CAD | Label: B-DISEASE_DISORDER
--------------------------------------------------
Input: Patient reports severe chest pain for 3 days
Text: ▁Patient | Label: O
Text: ▁reports | Label: O
Text: ▁severe | Label: B-SEVERITY
Text: ▁chest | Label: B-BIOLOGICAL_STRUCTURE
Text: ▁pain | Label: I-SIGN_SYMPTOM
Text: ▁for | Label: O
Text: ▁3 | Label: I-DURATION
Text: ▁days | Label: I-DURATION
--------------------------------------------------
Input: MRI confirmed the presence of a brain tumor
Text: ▁MRI | Label: B-DIAGNOSTIC_PROCEDURE
Text: ▁confirmed | Label: O
Text: ▁the | Label: O
Text: ▁presence | Label: O
Text: ▁of | Label: O
Text: ▁a | Label: O
Text: ▁brain | Label: B-DISEASE_DISORDER
Text: ▁tumor | Label: I-DISEASE_DISORDER
--------------------------
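The token-level output includes tags that break strict BIO rules: `▁pain` is tagged `I-SIGN_SYMPTOM` and `▁3` is tagged `I-DURATION`, although neither follows a token of the same type. The aggregated pipeline output still reports "pain" and "3 days" as spans, but downstream code that expects valid BIO sequences may need to normalise such tags first. One common fix rewrites an `I-X` tag that does not continue an `X` span into `B-X`. A sketch, where `repair_bio` is a hypothetical helper and not part of `transformers`:

```python
def repair_bio(tags):
    """Rewrite I-X tags that do not continue an X span into B-X."""
    repaired = []
    prev_type = None  # entity type of the previous tag; None after "O"
    for tag in tags:
        if tag == "O":
            repaired.append(tag)
            prev_type = None
            continue
        prefix, etype = tag.split("-", 1)
        if prefix == "I" and etype != prev_type:
            tag = "B-" + etype  # an I- tag cannot open a span in strict BIO
        repaired.append(tag)
        prev_type = etype
    return repaired


# Tags printed above for "Patient reports severe chest pain for 3 days"
observed_tags = ["O", "O", "B-SEVERITY", "B-BIOLOGICAL_STRUCTURE",
                 "I-SIGN_SYMPTOM", "O", "I-DURATION", "I-DURATION"]
print(repair_bio(observed_tags))
# → ['O', 'O', 'B-SEVERITY', 'B-BIOLOGICAL_STRUCTURE', 'B-SIGN_SYMPTOM', 'O', 'B-DURATION', 'I-DURATION']
```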

### GatorTron-Medium overview

In [None]:
from transformers import AutoModel, AutoTokenizer, AutoConfig

# Separate names so the Medical-NER `tokenizer` and `model` above are not overwritten
gatortron_tokenizer = AutoTokenizer.from_pretrained('UFNLP/gatortron-medium')
gatortron_config = AutoConfig.from_pretrained('UFNLP/gatortron-medium')
gatortron_model = AutoModel.from_pretrained('UFNLP/gatortron-medium')

encoded_input = gatortron_tokenizer("Bone scan:  Negative for distant metastasis.", return_tensors="pt")
encoded_output = gatortron_model(**encoded_input)  # base encoder outputs, no NER head


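`AutoModel` loads GatorTron without a task-specific head, so `encoded_output.last_hidden_state` holds one contextual vector per token rather than entity labels. One common way to get a single sentence vector from it is mean pooling over the tokens that `attention_mask` marks as real (not padding). This is a swapped-in technique for illustration, not something this notebook prescribes. The plain-Python sketch below uses tiny hypothetical 3-dimensional vectors. With real tensors the same idea is usually written with torch operations.

```python
def mean_pool(hidden_states, attention_mask):
    """Average the token vectors whose attention_mask entry is 1."""
    dim = len(hidden_states[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(hidden_states, attention_mask):
        if keep:
            summed = [s + v for s, v in zip(summed, vector)]
            count += 1
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [s / count for s in summed]


# Hypothetical hidden states for 4 tokens; the last one is padding
hidden = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [2.0, 2.0, 2.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 1, 0]
print(mean_pool(hidden, mask))
# → [2.0, 2.0, 2.0]
```

Masking matters once sentences are batched with padding: without it, the padded position (the `9.0` vector here) would pull the average away from the real tokens.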