## 2. NER Exploration

This section explores pretrained models for medical named entity recognition (NER) using Hugging Face Transformers.

### Hugging Face NLP Pipelines
Hugging Face Transformers provides ready-made NLP pipelines for common tasks, each selected by passing its task name to `pipeline()`.
Here are some of the main ones:

- **text-classification**: Sentiment analysis, topic classification, etc.
- **token-classification**: Named Entity Recognition (NER), part-of-speech tagging.
- **question-answering**: Extract answers from context given a question.
- **text-generation**: Generate text (e.g., with GPT, Llama).
- **text2text-generation**: Sequence-to-sequence tasks (summarization, translation).
- **summarization**: Generate summaries of text.
- **translation**: Translate text between languages.
- **zero-shot-classification**: Classify text with labels not seen during training.
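- **conversational**: Build chatbots and dialogue agents (deprecated and removed in recent Transformers releases, where chat-style use goes through **text-generation**).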
- **fill-mask**: Predict masked words in a sentence (BERT-style).
- **feature-extraction**: Get vector embeddings for text.
- **sentence-similarity**: Compute similarity between sentences (a Hub task usually served by the `sentence-transformers` library rather than by `pipeline()`).
- **table-question-answering**: QA over tabular data.

In [1]:
# Use a pipeline as a high-level helper
from transformers import pipeline
# aggregation_strategy='simple' groups adjacent tokens with the same label into one entity span
pipe = pipeline("token-classification", model="Clinical-AI-Apollo/Medical-NER", aggregation_strategy='simple')
result = pipe('45 year old woman diagnosed with CAD')

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
Device set to use cuda:0
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
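
With `aggregation_strategy='simple'`, the pipeline returns a list of dictionaries with keys such as `entity_group`, `score`, `word`, `start`, and `end`. The following is a minimal sketch of how that list could be filtered and tabulated. The helper name `format_entities`, the `min_score` threshold, and the sample values are assumptions made for illustration; they are not actual model output.

```python
def format_entities(entities, min_score=0.5):
    """Turn aggregated token-classification output into (group, word, score) rows."""
    rows = []
    for ent in entities:
        score = float(ent["score"])  # the pipeline returns np.float32; convert to a plain float
        if score >= min_score:
            rows.append((ent["entity_group"], ent["word"], round(score, 3)))
    return rows


# Hand-written stand-in with the same keys the pipeline returns.
# Labels and scores here are invented for illustration only.
fake_result = [
    {"entity_group": "AGE", "score": 0.99, "word": "45 year old", "start": 0, "end": 11},
    {"entity_group": "SEX", "score": 0.40, "word": "woman", "start": 12, "end": 17},
]

rows = format_entities(fake_result)
for group, word, score in rows:
    print(f"{group:<12} {word:<15} {score:.3f}")
```

In the notebook, the same helper could be called as `format_entities(result)` on the output of the cell above.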


In [3]:
# Load the JSON dataset
import json
with open('../data/medical-ner/data.json', 'r', encoding='utf-8') as f:
    data = json.load(f)
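
Before running the model, it can help to check the shape of the loaded data. The sketch below assumes the structure used later in this notebook (a top-level `examples` list whose items carry a `content` string); the helper name `summarize_examples` and the inline sample are hypothetical stand-ins for the real file.

```python
import json


def summarize_examples(data, text_key="content"):
    """Return (number of examples, average text length in characters)."""
    examples = data.get("examples", [])
    lengths = [len(ex[text_key]) for ex in examples if text_key in ex]
    avg = sum(lengths) / len(lengths) if lengths else 0.0
    return len(examples), avg


# Tiny stand-in for the real file, using the same assumed structure
sample = json.loads('{"examples": [{"content": "45 year old woman"}, {"content": "CAD"}]}')
n_examples, avg_len = summarize_examples(sample)
print(n_examples, avg_len)
```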

In [None]:
# Get all possible entity groups from the Hugging Face model
entity_groups = set()
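
One way this cell could be completed is to read the model's label map, which Transformers exposes as `pipe.model.config.id2label`. The sketch below assumes the labels follow the common BIO scheme (`B-X`, `I-X`, `O`), which has not been verified for this particular model; the helper name and the sample label map are hypothetical.

```python
def entity_groups_from_labels(id2label):
    """Strip BIO prefixes (B-, I-) and drop the outside tag 'O'."""
    groups = set()
    for label in id2label.values():
        if label == "O":
            continue
        prefix, sep, rest = label.partition("-")
        # Only strip the prefix when it really is a BIO marker
        groups.add(rest if sep and prefix in {"B", "I"} else label)
    return groups


# Hypothetical label map; in the notebook it would come from pipe.model.config.id2label
example_id2label = {0: "O", 1: "B-MEDICATION", 2: "I-MEDICATION", 3: "B-AGE"}
entity_groups = entity_groups_from_labels(example_id2label)
print(sorted(entity_groups))
```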

In [6]:
# Test the Medical-NER model on the first five examples of the dataset
for example in data['examples'][:5]:
    print(f"Text: {example['content']}")
    entities = pipe(example['content'])
    print(f"Entities: {entities}\n")

Text: While bismuth compounds (Pepto-Bismol) decreased the number of bowel movements in those with travelers' diarrhea, they do not decrease the length of illness.[91] Anti-motility agents like loperamide are also effective at reducing the number of stools but not the duration of disease.[8] These agents should be used only if bloody diarrhea is not present.[92]

Diosmectite, a natural aluminomagnesium silicate clay, is effective in alleviating symptoms of acute diarrhea in children,[93] and also has some effects in chronic functional diarrhea, radiation-induced diarrhea, and chemotherapy-induced diarrhea.[45] Another absorbent agent used for the treatment of mild diarrhea is kaopectate.

Racecadotril an antisecretory medication may be used to treat diarrhea in children and adults.[86] It has better tolerability than loperamide, as it causes less constipation and flatulence.[94]
Entities: [{'entity_group': 'MEDICATION', 'score': np.float32(0.62879586), 'word': 'bismuth compounds', 'sta