### An Application of LLMs to Ontology Learning


Ontology Learning (OL), which automates the extraction of structured knowledge from unstructured data, is crucial for building the dynamic ontologies that underpin the Semantic Web.

<img src="images/ol.png"/>


**One OL task is `Term Typing`, which aims to identify the types of terms.**

**We assume the terms have already been extracted from text; in this demo we aim to find their types.**

In [1]:
import json
import codecs
from transformers import AutoModelForCausalLM, AutoTokenizer
from tqdm import tqdm
from sklearn.metrics import accuracy_score, f1_score

## Loading Demo Data for Term Typing

Let's load a snapshot of the curated NCI ontology dataset from the LLMs4OL Challenge @ ISWC 2024 (https://github.com/HamedBabaei/LLMs4OL-Challenge-ISWC2024).

In [2]:
def load_json(path: str) -> list:
    """
        Loading a json file
    :param path: path to the json file
    :return: the parsed json content (here, a list of sample dictionaries)
    """
    with codecs.open(path, "r", encoding="utf-8") as json_file:
        json_data = json.load(json_file)
    return json_data

In [3]:
demo_data = load_json(path="demo-data.json")

demo_data

[{'ID': 'C0000869-6', 'term': 'Thorntrees', 'type': ['plant']},
 {'ID': 'C0034246-807',
  'term': 'Tanacetum cinerariifolium Flower',
  'type': ['plant']},
 {'ID': 'C0242767-3091', 'term': 'Caulis', 'type': ['plant']},
 {'ID': 'C0330382-5170', 'term': 'Sweet William', 'type': ['plant']},
 {'ID': 'C0034246-807',
  'term': 'Tanacetum cinerariifolium Flower',
  'type': ['plant']},
 {'ID': 'C0001577-28',
  'term': 'Uterine Adnexitis',
  'type': ['disease or syndrome']},
 {'ID': 'C0002173-46',
  'term': 'Alopecia Mucinosis',
  'type': ['disease or syndrome']},
 {'ID': 'C0002992-68',
  'term': 'Diffuse Hemangioma',
  'type': ['disease or syndrome']},
 {'ID': 'C0003850-91',
  'term': 'Arterial Sclerosis',
  'type': ['disease or syndrome']},
 {'ID': 'C0035920-854',
  'term': 'Rubella Infection',
  'type': ['disease or syndrome']},
 {'ID': 'C0001699-31', 'term': 'K. pneumoniae', 'type': ['bacterium']},
 {'ID': 'C0085494-2237',
  'term': 'Presumptive enterococcus',
  'type': ['bacterium']},
 {'I

## Building Answer Sets

Creating possible answers that LLMs can generate for types.

In [5]:
term_types = []

for sample in demo_data:
    term_types.append(sample['type'][0])
    
term_types = list(set(term_types))

term_types

['fungus', 'food', 'disease or syndrome', 'bacterium', 'plant']

Next, we build an answer set for each type: the surface forms that count as a correct generation. Answer sets can be written manually or generated with an LLM (GPT-3, GPT-4, ...).

In [6]:
answer_set = {
    "plant": ["plant"],
    "bacterium": ["bacterium", "bacteria"],
    "fungus": ["fungus", "fung"],
    "food": ["food"],
    "disease or syndrome": ["disease or syndrome", "disease", "syndrome"]
}
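
A minimal sketch of how such an answer set can map a free-text generation to a type, assuming plain case-insensitive substring matching. `match_type` is a hypothetical helper introduced here for illustration, not part of the notebook. Unlike the evaluation used later, which only checks the surface forms of the gold type, this version scans all types and returns the first hit.

```python
from typing import Dict, List, Optional

# Same structure as the notebook's answer_set: canonical type -> accepted surface forms.
answer_set: Dict[str, List[str]] = {
    "plant": ["plant"],
    "bacterium": ["bacterium", "bacteria"],
    "fungus": ["fungus", "fung"],
    "food": ["food"],
    "disease or syndrome": ["disease or syndrome", "disease", "syndrome"],
}


def match_type(generated_text: str, answer_set: Dict[str, List[str]]) -> Optional[str]:
    """Return the first canonical type whose surface form occurs in the text, else None.

    Hypothetical helper: substring matching is an assumption made for this sketch.
    """
    lowered = generated_text.lower()
    for canonical, surface_forms in answer_set.items():
        if any(form in lowered for form in surface_forms):
            return canonical
    return None


print(match_type(" disease of the arteries.", answer_set))
print(match_type(" ______________.\n\n1. fung", answer_set))
print(match_type(" ________.\n\n1. A.", answer_set))
```

The short form `"fung"` lets a truncated generation such as `fung` (cut off by `max_new_tokens`) still count as `fungus`.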

## Loading LLM

Loading the Mistral-7B (https://huggingface.co/mistralai/Mistral-7B-v0.1) model on GPU.

In [7]:
model_path = "mistralai/Mistral-7B-v0.1"
device = 'cuda'

llm = AutoModelForCausalLM.from_pretrained(model_path, device_map='balanced')

tokenizer = AutoTokenizer.from_pretrained(model_path)


## 1. Standard Prompting

**Standard Prompting** refers to the conventional or widely accepted methods of providing input to an LLM during fine-tuning or inference. This typically involves presenting the model with a prompt or a **starting point,** usually in the form of a text snippet, from which it generates a continuation or response.

**Prompt Template** for the term `Arterial Sclerosis`, whose type is `disease or syndrome`:

In [8]:
standard_pt = """Perform a sentence completion on the following sentence:
Sentence: {term} in biomedicine is a"""

# Arterial sclerosis is a type of vascular disease!

sample = standard_pt.replace("{term}", "Arterial Sclerosis".lower()) 

print(sample)

Perform a sentence completion on the following sentence:
Sentence: arterial sclerosis in biomedicine is a


Prompting LLMs with standard prompt:

In [9]:
inputs = tokenizer(sample, return_tensors="pt").to(device)

generated_ids = llm.generate(**inputs, max_new_tokens=10, pad_token_id=tokenizer.eos_token_id)

decoded = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)

print(decoded[0])

Perform a sentence completion on the following sentence:
Sentence: arterial sclerosis in biomedicine is a disease of the arteries.

1


In [23]:
# Inference method: fills the prompt template for each term and collects the model's continuations.
def inference(prompt_template, data, tokenizer, llm, print_generations=True, max_new_tokens=10):
    generated_types = []
    for sample in tqdm(data):
        # Fill the template with the lowercased term to build the prompt.
        prompt = prompt_template.replace("{term}", sample['term'].lower())
        # `device` is the global defined when the model was loaded.
        inputs = tokenizer(prompt, return_tensors="pt").to(device)

        generated_ids = llm.generate(**inputs, max_new_tokens=max_new_tokens, pad_token_id=tokenizer.eos_token_id)
        # num_return_sequences=5

        decoded = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
        if print_generations:
            print(decoded)
        # Strip the prompt so only the newly generated continuation remains.
        generated_text = decoded[0].replace(prompt, "")
        generated_types.append(generated_text)
    return generated_types

def evaluate_generated_types(generated_types, answer_set):
    # Gold labels are read from the global `demo_data`, in the same order as `generated_types`.
    gen = []
    gt = []
    for index, generated_text in enumerate(generated_types):
        true_answers = answer_set[demo_data[index]['type'][0]]
        # A generation counts as correct if it contains any accepted form of the gold type.
        true_gen = False
        for true_answer in true_answers:
            if true_answer in generated_text:
                true_gen = True
                break
        if true_gen:
            gen.append(demo_data[index]['type'][0])
        else:
            # Otherwise the raw generation is kept as the (wrong) predicted label.
            gen.append(generated_text)
        gt.append(demo_data[index]['type'][0])

    print("Accuracy:", accuracy_score(gt, gen))
    print("F1-Score (macro):", f1_score(gt, gen, average='macro'))

In [14]:
standard_pt = """Perform a sentence completion on the following sentence:
Sentence: {term} in biomedicine is a"""

standard_pt_inference = inference(prompt_template=standard_pt, tokenizer=tokenizer, llm=llm, data=demo_data)

  5%|▌         | 1/20 [00:05<01:45,  5.54s/it]

['Perform a sentence completion on the following sentence:\nSentence: thorntrees in biomedicine is a ________________.\n\n1. new']


 10%|█         | 2/20 [00:10<01:37,  5.40s/it]

['Perform a sentence completion on the following sentence:\nSentence: tanacetum cinerariifolium flower in biomedicine is a ______________.\n\n1. A.']


 15%|█▌        | 3/20 [00:16<01:31,  5.36s/it]

['Perform a sentence completion on the following sentence:\nSentence: caulis in biomedicine is a ______________.\n\n1. A.']


 20%|██        | 4/20 [00:21<01:25,  5.36s/it]

['Perform a sentence completion on the following sentence:\nSentence: sweet william in biomedicine is a ________.\n\n1. A.']


 25%|██▌       | 5/20 [00:26<01:19,  5.29s/it]

['Perform a sentence completion on the following sentence:\nSentence: tanacetum cinerariifolium flower in biomedicine is a ______________.\n\n1. A.']


 30%|███       | 6/20 [00:32<01:14,  5.36s/it]

['Perform a sentence completion on the following sentence:\nSentence: uterine adnexitis in biomedicine is a disease of the female reproductive system.\n']


 35%|███▌      | 7/20 [00:37<01:08,  5.25s/it]

['Perform a sentence completion on the following sentence:\nSentence: alopecia mucinosis in biomedicine is a rare disease that is characterized by the presence of hair']


 40%|████      | 8/20 [00:42<01:03,  5.29s/it]

['Perform a sentence completion on the following sentence:\nSentence: diffuse hemangioma in biomedicine is a _______\n\n1. benign tum']


 45%|████▌     | 9/20 [00:47<00:57,  5.25s/it]

['Perform a sentence completion on the following sentence:\nSentence: arterial sclerosis in biomedicine is a disease of the arteries.\n\n1']


 50%|█████     | 10/20 [00:52<00:52,  5.22s/it]

['Perform a sentence completion on the following sentence:\nSentence: rubella infection in biomedicine is a disease that is caused by a virus.\n\n']


 55%|█████▌    | 11/20 [00:58<00:47,  5.25s/it]

['Perform a sentence completion on the following sentence:\nSentence: k. pneumoniae in biomedicine is a gram-negative, rod-shaped, facult']


 60%|██████    | 12/20 [01:03<00:41,  5.17s/it]

['Perform a sentence completion on the following sentence:\nSentence: presumptive enterococcus in biomedicine is a ________.\n\n1. Gram-']


 65%|██████▌   | 13/20 [01:08<00:36,  5.17s/it]

['Perform a sentence completion on the following sentence:\nSentence: gaffkya species in biomedicine is a ______________.\n\n1. A.']


 70%|███████   | 14/20 [01:13<00:31,  5.20s/it]

['Perform a sentence completion on the following sentence:\nSentence: sterigmatocystis ochracea in biomedicine is a ________________.\n\n1. fun']


 75%|███████▌  | 15/20 [01:18<00:25,  5.15s/it]

['Perform a sentence completion on the following sentence:\nSentence: hyalopus acremonium in biomedicine is a ______________.\n\n1. fung']


 80%|████████  | 16/20 [01:23<00:20,  5.10s/it]

['Perform a sentence completion on the following sentence:\nSentence: sterigmatocystis flavipes in biomedicine is a ________________.\n\n1. fun']


 85%|████████▌ | 17/20 [01:28<00:15,  5.14s/it]

['Perform a sentence completion on the following sentence:\nSentence: bacca in biomedicine is a ______\n\n1. a. a']


 90%|█████████ | 18/20 [01:34<00:10,  5.16s/it]

['Perform a sentence completion on the following sentence:\nSentence: green olive in biomedicine is a _______________.\n\n1. a.']


 95%|█████████▌| 19/20 [01:39<00:05,  5.09s/it]

['Perform a sentence completion on the following sentence:\nSentence: cruciform vegetables in biomedicine is a new and promising field of research.\n\n1']


100%|██████████| 20/20 [01:43<00:00,  5.20s/it]

['Perform a sentence completion on the following sentence:\nSentence: fruit or vegetable in biomedicine is a plant or plant part that is used for its taste']





In [15]:
standard_pt_inference

[' ________________.\n\n1. new',
 ' ______________.\n\n1. A.',
 ' ______________.\n\n1. A.',
 ' ________.\n\n1. A.',
 ' ______________.\n\n1. A.',
 ' disease of the female reproductive system.\n',
 ' rare disease that is characterized by the presence of hair',
 ' _______\n\n1. benign tum',
 ' disease of the arteries.\n\n1',
 ' disease that is caused by a virus.\n\n',
 ' gram-negative, rod-shaped, facult',
 ' ________.\n\n1. Gram-',
 ' ______________.\n\n1. A.',
 ' ________________.\n\n1. fun',
 ' ______________.\n\n1. fung',
 ' ________________.\n\n1. fun',
 ' ______\n\n1. a. a',
 ' _______________.\n\n1. a.',
 ' new and promising field of research.\n\n1',
 ' plant or plant part that is used for its taste']

In [16]:
evaluate_generated_types(generated_types=standard_pt_inference, answer_set=answer_set)

Accuracy: 0.25
F1-Score (macro): 0.08680555555555555
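
The macro F1 is much lower than the accuracy here. One likely contributor: the evaluation keeps each unmatched generation as its own predicted label, and scikit-learn's `f1_score` by default averages over every label that appears in either list, so each distinct wrong string adds a zero-scoring class. The sketch below reproduces that effect in plain Python on toy data. `macro_f1` is a hypothetical helper written for illustration, not the notebook's code, and the toy labels are made up.

```python
from typing import Iterable, List, Optional


def macro_f1(gt: List[str], pred: List[str], labels: Optional[Iterable[str]] = None) -> float:
    """Unweighted mean of per-label F1 scores.

    If `labels` is None, every label seen in gt or pred is averaged over,
    so junk predictions become extra classes with F1 = 0.
    """
    label_list = sorted(set(gt) | set(pred)) if labels is None else list(labels)
    scores = []
    for label in label_list:
        tp = sum(1 for g, p in zip(gt, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gt, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gt, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data: one prediction is an unmatched raw generation.
gt = ["plant", "plant", "food"]
pred = ["plant", " ________.\n\n1. A.", "food"]

print("all observed labels:", round(macro_f1(gt, pred), 3))
print("gold types only:    ", round(macro_f1(gt, pred, labels=["plant", "food"]), 3))
```

With scikit-learn, passing `labels=term_types` to `f1_score` should restrict the average to the gold types in the same way; the numbers reported in this notebook use the default.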


## 2. Different Prompt

Adding an instruction that states the task: `Given a term in biomedicine domain identify the term type.`


In [28]:
standard_pt2 = """Given a term in biomedicine domain identify the term type.
### Term: {term}
### Type: """

standard_pt2_inference = inference(prompt_template=standard_pt2, tokenizer=tokenizer, llm=llm, data=demo_data)

  5%|▌         | 1/20 [00:05<01:36,  5.07s/it]

['Given a term in biomedicine domain identify the term type.\n### Term: thorntrees\n### Type: 1\n### Definition:\n\n### Synony']


 10%|█         | 2/20 [00:10<01:31,  5.09s/it]

['Given a term in biomedicine domain identify the term type.\n### Term: tanacetum cinerariifolium flower\n### Type: 1\n\n### Term: tanacetum']


 15%|█▌        | 3/20 [00:15<01:26,  5.07s/it]

['Given a term in biomedicine domain identify the term type.\n### Term: caulis\n### Type: 1\n### Definition:\n\nThe stem of']


 20%|██        | 4/20 [00:20<01:20,  5.02s/it]

['Given a term in biomedicine domain identify the term type.\n### Term: sweet william\n### Type: 1\n### Definition:\n\nA plant,']


 25%|██▌       | 5/20 [00:25<01:14,  5.00s/it]

['Given a term in biomedicine domain identify the term type.\n### Term: tanacetum cinerariifolium flower\n### Type: 1\n\n### Term: tanacetum']


 30%|███       | 6/20 [00:30<01:09,  4.97s/it]

['Given a term in biomedicine domain identify the term type.\n### Term: uterine adnexitis\n### Type: 1\n\nGiven a term in biomed']


 35%|███▌      | 7/20 [00:34<01:04,  4.96s/it]

['Given a term in biomedicine domain identify the term type.\n### Term: alopecia mucinosis\n### Type: 1\n\nGiven a term in biomed']


 40%|████      | 8/20 [00:40<00:59,  4.99s/it]

['Given a term in biomedicine domain identify the term type.\n### Term: diffuse hemangioma\n### Type: 1\n\nGiven a term in biomed']


 45%|████▌     | 9/20 [00:45<00:55,  5.00s/it]

['Given a term in biomedicine domain identify the term type.\n### Term: arterial sclerosis\n### Type: 1\n\nGiven a term in biomed']


 50%|█████     | 10/20 [00:50<00:50,  5.03s/it]

['Given a term in biomedicine domain identify the term type.\n### Term: rubella infection\n### Type: 1\n\nGiven a term in biomed']


 55%|█████▌    | 11/20 [00:55<00:45,  5.03s/it]

['Given a term in biomedicine domain identify the term type.\n### Term: k. pneumoniae\n### Type: 1\n\nGiven a term in biomed']


 60%|██████    | 12/20 [01:00<00:40,  5.04s/it]

['Given a term in biomedicine domain identify the term type.\n### Term: presumptive enterococcus\n### Type: 1\n\nGiven a term in biomed']


 65%|██████▌   | 13/20 [01:05<00:35,  5.01s/it]

['Given a term in biomedicine domain identify the term type.\n### Term: gaffkya species\n### Type: 1\n### Type: 2\n### Type']


 70%|███████   | 14/20 [01:10<00:29,  4.99s/it]

['Given a term in biomedicine domain identify the term type.\n### Term: sterigmatocystis ochracea\n### Type: 1\n\n### Term: sterigmatoc']


 75%|███████▌  | 15/20 [01:15<00:24,  4.97s/it]

['Given a term in biomedicine domain identify the term type.\n### Term: hyalopus acremonium\n### Type: 1\n\nGiven a term in biomed']


 80%|████████  | 16/20 [01:19<00:19,  4.96s/it]

['Given a term in biomedicine domain identify the term type.\n### Term: sterigmatocystis flavipes\n### Type: 1\n\n### Term: sterigmatoc']


 85%|████████▌ | 17/20 [01:24<00:14,  4.94s/it]

['Given a term in biomedicine domain identify the term type.\n### Term: bacca\n### Type: 1\n### Definition:\n\nA fruit that']


 90%|█████████ | 18/20 [01:29<00:09,  4.96s/it]

['Given a term in biomedicine domain identify the term type.\n### Term: green olive\n### Type: 1\n### Definition:\n\n### Term:']


 95%|█████████▌| 19/20 [01:34<00:04,  4.98s/it]

['Given a term in biomedicine domain identify the term type.\n### Term: cruciform vegetables\n### Type: 1\n### Type: 2\n### Type']


100%|██████████| 20/20 [01:39<00:00,  5.00s/it]

['Given a term in biomedicine domain identify the term type.\n### Term: fruit or vegetable\n### Type: 1\n### Term: fruit or vegetable\n']





In [31]:
standard_pt2_inference

['1\n### Definition:\n\n### Synony',
 '1\n\n### Term: tanacetum',
 '1\n### Definition:\n\nThe stem of',
 '1\n### Definition:\n\nA plant,',
 '1\n\n### Term: tanacetum',
 '1\n\nGiven a term in biomed',
 '1\n\nGiven a term in biomed',
 '1\n\nGiven a term in biomed',
 '1\n\nGiven a term in biomed',
 '1\n\nGiven a term in biomed',
 '1\n\nGiven a term in biomed',
 '1\n\nGiven a term in biomed',
 '1\n### Type: 2\n### Type',
 '1\n\n### Term: sterigmatoc',
 '1\n\nGiven a term in biomed',
 '1\n\n### Term: sterigmatoc',
 '1\n### Definition:\n\nA fruit that',
 '1\n### Definition:\n\n### Term:',
 '1\n### Type: 2\n### Type',
 '1\n### Term: fruit or vegetable\n']

In [32]:
evaluate_generated_types(generated_types=standard_pt2_inference, answer_set=answer_set)

Accuracy: 0.05
F1-Score (macro): 0.02380952380952381


## 3. Providing types within the prompt

Now let's add the possible types alongside the instruction by adding: `List of possible types for term: 'food', 'bacterium', 'disease or syndrome', 'fungus', 'plant'`

In [33]:
standard_pt3 = """Given a term in biomedicine domain identify the term type.
List of possible types for term: 'food', 'bacterium', 'disease or syndrome', 'fungus', 'plant'

### Term: {term}
### Type: """

standard_pt3_inference = inference(prompt_template=standard_pt3, tokenizer=tokenizer, llm=llm, data=demo_data)

  5%|▌         | 1/20 [00:05<01:36,  5.08s/it]

["Given a term in biomedicine domain identify the term type.\nList of possible types for term: 'food', 'bacterium', 'disease or syndrome', 'fungus', 'plant'\n\n### Term: thorntrees\n### Type:  plant\n\n\n\n\n\n\n\n\n"]


 10%|█         | 2/20 [00:10<01:31,  5.10s/it]

["Given a term in biomedicine domain identify the term type.\nList of possible types for term: 'food', 'bacterium', 'disease or syndrome', 'fungus', 'plant'\n\n### Term: tanacetum cinerariifolium flower\n### Type: 1\n\n### Term: tanacetum"]


 15%|█▌        | 3/20 [00:15<01:26,  5.08s/it]

["Given a term in biomedicine domain identify the term type.\nList of possible types for term: 'food', 'bacterium', 'disease or syndrome', 'fungus', 'plant'\n\n### Term: caulis\n### Type:  food\n\n\n### Term: caulif"]


 20%|██        | 4/20 [00:20<01:20,  5.05s/it]

["Given a term in biomedicine domain identify the term type.\nList of possible types for term: 'food', 'bacterium', 'disease or syndrome', 'fungus', 'plant'\n\n### Term: sweet william\n### Type:  plant\n\n\n\n\n\n\n\n\n"]


 25%|██▌       | 5/20 [00:25<01:15,  5.04s/it]

["Given a term in biomedicine domain identify the term type.\nList of possible types for term: 'food', 'bacterium', 'disease or syndrome', 'fungus', 'plant'\n\n### Term: tanacetum cinerariifolium flower\n### Type: 1\n\n### Term: tanacetum"]


 30%|███       | 6/20 [00:30<01:10,  5.01s/it]

["Given a term in biomedicine domain identify the term type.\nList of possible types for term: 'food', 'bacterium', 'disease or syndrome', 'fungus', 'plant'\n\n### Term: uterine adnexitis\n### Type:  disease or syndrome\n\n\n\n\n\n\n"]


 35%|███▌      | 7/20 [00:35<01:04,  4.98s/it]

["Given a term in biomedicine domain identify the term type.\nList of possible types for term: 'food', 'bacterium', 'disease or syndrome', 'fungus', 'plant'\n\n### Term: alopecia mucinosis\n### Type:  disease or syndrome\n\n\n\n\n\n\n"]


 40%|████      | 8/20 [00:40<00:59,  4.97s/it]

["Given a term in biomedicine domain identify the term type.\nList of possible types for term: 'food', 'bacterium', 'disease or syndrome', 'fungus', 'plant'\n\n### Term: diffuse hemangioma\n### Type:  disease or syndrome\n\n\n\n\n\n\n"]


 45%|████▌     | 9/20 [00:45<00:54,  4.97s/it]

["Given a term in biomedicine domain identify the term type.\nList of possible types for term: 'food', 'bacterium', 'disease or syndrome', 'fungus', 'plant'\n\n### Term: arterial sclerosis\n### Type:  disease or syndrome\n\n\n\n\n\n\n"]


 50%|█████     | 10/20 [00:50<00:49,  4.99s/it]

["Given a term in biomedicine domain identify the term type.\nList of possible types for term: 'food', 'bacterium', 'disease or syndrome', 'fungus', 'plant'\n\n### Term: rubella infection\n### Type:  disease or syndrome\n\n\n\n\n\n\n"]


 55%|█████▌    | 11/20 [00:55<00:45,  5.01s/it]

["Given a term in biomedicine domain identify the term type.\nList of possible types for term: 'food', 'bacterium', 'disease or syndrome', 'fungus', 'plant'\n\n### Term: k. pneumoniae\n### Type: 1\n\n\n\n\n\n\n\n\n"]


 60%|██████    | 12/20 [01:00<00:40,  5.02s/it]

["Given a term in biomedicine domain identify the term type.\nList of possible types for term: 'food', 'bacterium', 'disease or syndrome', 'fungus', 'plant'\n\n### Term: presumptive enterococcus\n### Type:  bacterium\n\n\n\n\n\n\n\n"]


 65%|██████▌   | 13/20 [01:05<00:35,  5.01s/it]

["Given a term in biomedicine domain identify the term type.\nList of possible types for term: 'food', 'bacterium', 'disease or syndrome', 'fungus', 'plant'\n\n### Term: gaffkya species\n### Type: 1\n\n\n\n\n\n\n\n\n"]


 70%|███████   | 14/20 [01:10<00:30,  5.04s/it]

["Given a term in biomedicine domain identify the term type.\nList of possible types for term: 'food', 'bacterium', 'disease or syndrome', 'fungus', 'plant'\n\n### Term: sterigmatocystis ochracea\n### Type:  fungus\n\n\n\n\n\n\n"]


 75%|███████▌  | 15/20 [01:15<00:25,  5.04s/it]

["Given a term in biomedicine domain identify the term type.\nList of possible types for term: 'food', 'bacterium', 'disease or syndrome', 'fungus', 'plant'\n\n### Term: hyalopus acremonium\n### Type:  fungus\n\n\n\n\n\n\n"]


 80%|████████  | 16/20 [01:20<00:20,  5.02s/it]

["Given a term in biomedicine domain identify the term type.\nList of possible types for term: 'food', 'bacterium', 'disease or syndrome', 'fungus', 'plant'\n\n### Term: sterigmatocystis flavipes\n### Type:  fungus\n\n\n\n\n\n\n"]


 85%|████████▌ | 17/20 [01:25<00:15,  5.00s/it]

["Given a term in biomedicine domain identify the term type.\nList of possible types for term: 'food', 'bacterium', 'disease or syndrome', 'fungus', 'plant'\n\n### Term: bacca\n### Type:  food\n\n\n\n\n\n\n\n\n"]


 90%|█████████ | 18/20 [01:30<00:09,  4.99s/it]

["Given a term in biomedicine domain identify the term type.\nList of possible types for term: 'food', 'bacterium', 'disease or syndrome', 'fungus', 'plant'\n\n### Term: green olive\n### Type:  food\n\n\n\n\n\n\n\n\n"]


 95%|█████████▌| 19/20 [01:35<00:04,  4.98s/it]

["Given a term in biomedicine domain identify the term type.\nList of possible types for term: 'food', 'bacterium', 'disease or syndrome', 'fungus', 'plant'\n\n### Term: cruciform vegetables\n### Type:  food\n\n\n### Term: eryth"]


100%|██████████| 20/20 [01:40<00:00,  5.01s/it]

["Given a term in biomedicine domain identify the term type.\nList of possible types for term: 'food', 'bacterium', 'disease or syndrome', 'fungus', 'plant'\n\n### Term: fruit or vegetable\n### Type:  food\n\n### Term: Escherich"]





In [37]:
standard_pt3_inference

[' plant\n\n\n\n\n\n\n\n\n',
 '1\n\n### Term: tanacetum',
 ' food\n\n\n### Term: caulif',
 ' plant\n\n\n\n\n\n\n\n\n',
 '1\n\n### Term: tanacetum',
 ' disease or syndrome\n\n\n\n\n\n\n',
 ' disease or syndrome\n\n\n\n\n\n\n',
 ' disease or syndrome\n\n\n\n\n\n\n',
 ' disease or syndrome\n\n\n\n\n\n\n',
 ' disease or syndrome\n\n\n\n\n\n\n',
 '1\n\n\n\n\n\n\n\n\n',
 ' bacterium\n\n\n\n\n\n\n\n',
 '1\n\n\n\n\n\n\n\n\n',
 ' fungus\n\n\n\n\n\n\n',
 ' fungus\n\n\n\n\n\n\n',
 ' fungus\n\n\n\n\n\n\n',
 ' food\n\n\n\n\n\n\n\n\n',
 ' food\n\n\n\n\n\n\n\n\n',
 ' food\n\n\n### Term: eryth',
 ' food\n\n### Term: Escherich']

In [38]:
evaluate_generated_types(generated_types=standard_pt3_inference, answer_set=answer_set)

Accuracy: 0.75
F1-Score (macro): 0.5089285714285714
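
Several continuations above run past the answer and begin a new `### Term:` block. A possible post-processing step, not used in the notebook's evaluation, is to keep only the first non-empty line of the continuation before matching it against the answer set. `first_answer_line` is a hypothetical helper name chosen for this sketch.

```python
def first_answer_line(generated_text: str) -> str:
    """Return the first non-empty line of a continuation, stripped of whitespace.

    Hypothetical helper: assumes the answer sits on the first non-empty line,
    which holds for the `### Type: ` prompts in this notebook.
    """
    for line in generated_text.splitlines():
        stripped = line.strip()
        if stripped:
            return stripped
    return ""


# Continuations copied from the outputs above.
print(repr(first_answer_line(" food\n\n\n### Term: caulif")))
print(repr(first_answer_line("1\n\n### Term: tanacetum")))
print(repr(first_answer_line(" disease or syndrome\n\n\n\n\n\n\n")))
```

Trimming this way keeps text from a hallucinated follow-up term out of the substring match.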


## 4. Few-Shot Prompting

In [36]:
fewshot_pt = """Given a term in biomedicine domain identify the term type.
List of possible types for term: 'food', 'bacterium', 'disease or syndrome', 'fungus', 'plant'

### Term: life-mel
### Type: food

### Term: golden bifido
### Type: bacterium

### Term: uterine adenomyosis
### Type: disease or syndrome

### Term: mycobiome profile
### Type: fungus

### Term: asana leaf
### Type: plant

### Term: {term}
### Type: """

fewshot_pt_inference = inference(prompt_template=fewshot_pt, tokenizer=tokenizer, llm=llm, data=demo_data)

  5%|▌         | 1/20 [00:05<01:36,  5.09s/it]

["Given a term in biomedicine domain identify the term type.\nList of possible types for term: 'food', 'bacterium', 'disease or syndrome', 'fungus', 'plant'\n\n### Term: life-mel\n### Type: food\n\n### Term: golden bifido\n### Type: bacterium\n\n### Term: uterine adenomyosis\n### Type: disease or syndrome\n\n### Term: mycobiome profile\n### Type: fungus\n\n### Term: asana leaf\n### Type: plant\n\n### Term: thorntrees\n### Type:  plant\n\n### Term: 100"]


 10%|█         | 2/20 [00:10<01:32,  5.14s/it]

["Given a term in biomedicine domain identify the term type.\nList of possible types for term: 'food', 'bacterium', 'disease or syndrome', 'fungus', 'plant'\n\n### Term: life-mel\n### Type: food\n\n### Term: golden bifido\n### Type: bacterium\n\n### Term: uterine adenomyosis\n### Type: disease or syndrome\n\n### Term: mycobiome profile\n### Type: fungus\n\n### Term: asana leaf\n### Type: plant\n\n### Term: tanacetum cinerariifolium flower\n### Type:  plant\n\n### Term: erythro"]


 15%|█▌        | 3/20 [00:15<01:27,  5.16s/it]

["Given a term in biomedicine domain identify the term type.\nList of possible types for term: 'food', 'bacterium', 'disease or syndrome', 'fungus', 'plant'\n\n### Term: life-mel\n### Type: food\n\n### Term: golden bifido\n### Type: bacterium\n\n### Term: uterine adenomyosis\n### Type: disease or syndrome\n\n### Term: mycobiome profile\n### Type: fungus\n\n### Term: asana leaf\n### Type: plant\n\n### Term: caulis\n### Type:  plant\n\n### Term: caulis\n"]


 20%|██        | 4/20 [00:20<01:22,  5.16s/it]

["Given a term in biomedicine domain identify the term type.\nList of possible types for term: 'food', 'bacterium', 'disease or syndrome', 'fungus', 'plant'\n\n### Term: life-mel\n### Type: food\n\n### Term: golden bifido\n### Type: bacterium\n\n### Term: uterine adenomyosis\n### Type: disease or syndrome\n\n### Term: mycobiome profile\n### Type: fungus\n\n### Term: asana leaf\n### Type: plant\n\n### Term: sweet william\n### Type:  plant\n\n### Term: erythro"]


 25%|██▌       | 5/20 [00:25<01:17,  5.15s/it]

["Given a term in biomedicine domain identify the term type.\nList of possible types for term: 'food', 'bacterium', 'disease or syndrome', 'fungus', 'plant'\n\n### Term: life-mel\n### Type: food\n\n### Term: golden bifido\n### Type: bacterium\n\n### Term: uterine adenomyosis\n### Type: disease or syndrome\n\n### Term: mycobiome profile\n### Type: fungus\n\n### Term: asana leaf\n### Type: plant\n\n### Term: tanacetum cinerariifolium flower\n### Type:  plant\n\n### Term: erythro"]


 30%|███       | 6/20 [00:30<01:12,  5.15s/it]

["Given a term in biomedicine domain identify the term type.\nList of possible types for term: 'food', 'bacterium', 'disease or syndrome', 'fungus', 'plant'\n\n### Term: life-mel\n### Type: food\n\n### Term: golden bifido\n### Type: bacterium\n\n### Term: uterine adenomyosis\n### Type: disease or syndrome\n\n### Term: mycobiome profile\n### Type: fungus\n\n### Term: asana leaf\n### Type: plant\n\n### Term: uterine adnexitis\n### Type:  disease or syndrome\n\n### Term: ery"]


 35%|███▌      | 7/20 [00:36<01:07,  5.16s/it]

["Given a term in biomedicine domain identify the term type.\nList of possible types for term: 'food', 'bacterium', 'disease or syndrome', 'fungus', 'plant'\n\n### Term: life-mel\n### Type: food\n\n### Term: golden bifido\n### Type: bacterium\n\n### Term: uterine adenomyosis\n### Type: disease or syndrome\n\n### Term: mycobiome profile\n### Type: fungus\n\n### Term: asana leaf\n### Type: plant\n\n### Term: alopecia mucinosis\n### Type:  disease or syndrome\n\n### Term: ery"]


 40%|████      | 8/20 [00:41<01:01,  5.14s/it]

["Given a term in biomedicine domain identify the term type.\nList of possible types for term: 'food', 'bacterium', 'disease or syndrome', 'fungus', 'plant'\n\n### Term: life-mel\n### Type: food\n\n### Term: golden bifido\n### Type: bacterium\n\n### Term: uterine adenomyosis\n### Type: disease or syndrome\n\n### Term: mycobiome profile\n### Type: fungus\n\n### Term: asana leaf\n### Type: plant\n\n### Term: diffuse hemangioma\n### Type:  disease or syndrome\n\n### Term: ery"]


 45%|████▌     | 9/20 [00:46<00:56,  5.12s/it]

["Given a term in biomedicine domain identify the term type.\nList of possible types for term: 'food', 'bacterium', 'disease or syndrome', 'fungus', 'plant'\n\n### Term: life-mel\n### Type: food\n\n### Term: golden bifido\n### Type: bacterium\n\n### Term: uterine adenomyosis\n### Type: disease or syndrome\n\n### Term: mycobiome profile\n### Type: fungus\n\n### Term: asana leaf\n### Type: plant\n\n### Term: arterial sclerosis\n### Type:  disease or syndrome\n\n### Term: ery"]


 50%|█████     | 10/20 [00:51<00:50,  5.10s/it]

["Given a term in biomedicine domain identify the term type.\nList of possible types for term: 'food', 'bacterium', 'disease or syndrome', 'fungus', 'plant'\n\n### Term: life-mel\n### Type: food\n\n### Term: golden bifido\n### Type: bacterium\n\n### Term: uterine adenomyosis\n### Type: disease or syndrome\n\n### Term: mycobiome profile\n### Type: fungus\n\n### Term: asana leaf\n### Type: plant\n\n### Term: rubella infection\n### Type:  disease or syndrome\n\n### Term: ery"]


 55%|█████▌    | 11/20 [00:56<00:45,  5.09s/it]

["Given a term in biomedicine domain identify the term type.\nList of possible types for term: 'food', 'bacterium', 'disease or syndrome', 'fungus', 'plant'\n\n### Term: life-mel\n### Type: food\n\n### Term: golden bifido\n### Type: bacterium\n\n### Term: uterine adenomyosis\n### Type: disease or syndrome\n\n### Term: mycobiome profile\n### Type: fungus\n\n### Term: asana leaf\n### Type: plant\n\n### Term: k. pneumoniae\n### Type:  bacterium\n\n### Term: erythe"]


 60%|██████    | 12/20 [01:01<00:40,  5.08s/it]

["Given a term in biomedicine domain identify the term type.\nList of possible types for term: 'food', 'bacterium', 'disease or syndrome', 'fungus', 'plant'\n\n### Term: life-mel\n### Type: food\n\n### Term: golden bifido\n### Type: bacterium\n\n### Term: uterine adenomyosis\n### Type: disease or syndrome\n\n### Term: mycobiome profile\n### Type: fungus\n\n### Term: asana leaf\n### Type: plant\n\n### Term: presumptive enterococcus\n### Type:  bacterium\n\n### Term: erythe"]


 65%|██████▌   | 13/20 [01:06<00:35,  5.08s/it]

["Given a term in biomedicine domain identify the term type.\nList of possible types for term: 'food', 'bacterium', 'disease or syndrome', 'fungus', 'plant'\n\n### Term: life-mel\n### Type: food\n\n### Term: golden bifido\n### Type: bacterium\n\n### Term: uterine adenomyosis\n### Type: disease or syndrome\n\n### Term: mycobiome profile\n### Type: fungus\n\n### Term: asana leaf\n### Type: plant\n\n### Term: gaffkya species\n### Type:  bacterium\n\n### Term: eryth"]


 70%|███████   | 14/20 [01:11<00:30,  5.11s/it]

["Given a term in biomedicine domain identify the term type.\nList of possible types for term: 'food', 'bacterium', 'disease or syndrome', 'fungus', 'plant'\n\n### Term: life-mel\n### Type: food\n\n### Term: golden bifido\n### Type: bacterium\n\n### Term: uterine adenomyosis\n### Type: disease or syndrome\n\n### Term: mycobiome profile\n### Type: fungus\n\n### Term: asana leaf\n### Type: plant\n\n### Term: sterigmatocystis ochracea\n### Type:  fungus\n\n### Term: ery"]


 75%|███████▌  | 15/20 [01:16<00:25,  5.12s/it]

["Given a term in biomedicine domain identify the term type.\nList of possible types for term: 'food', 'bacterium', 'disease or syndrome', 'fungus', 'plant'\n\n### Term: life-mel\n### Type: food\n\n### Term: golden bifido\n### Type: bacterium\n\n### Term: uterine adenomyosis\n### Type: disease or syndrome\n\n### Term: mycobiome profile\n### Type: fungus\n\n### Term: asana leaf\n### Type: plant\n\n### Term: hyalopus acremonium\n### Type:  bacterium\n\n### Term: erythe"]


 80%|████████  | 16/20 [01:21<00:20,  5.13s/it]

["Given a term in biomedicine domain identify the term type.\nList of possible types for term: 'food', 'bacterium', 'disease or syndrome', 'fungus', 'plant'\n\n### Term: life-mel\n### Type: food\n\n### Term: golden bifido\n### Type: bacterium\n\n### Term: uterine adenomyosis\n### Type: disease or syndrome\n\n### Term: mycobiome profile\n### Type: fungus\n\n### Term: asana leaf\n### Type: plant\n\n### Term: sterigmatocystis flavipes\n### Type:  fungus\n\n### Term: ery"]


 85%|████████▌ | 17/20 [01:27<00:15,  5.14s/it]

["Given a term in biomedicine domain identify the term type.\nList of possible types for term: 'food', 'bacterium', 'disease or syndrome', 'fungus', 'plant'\n\n### Term: life-mel\n### Type: food\n\n### Term: golden bifido\n### Type: bacterium\n\n### Term: uterine adenomyosis\n### Type: disease or syndrome\n\n### Term: mycobiome profile\n### Type: fungus\n\n### Term: asana leaf\n### Type: plant\n\n### Term: bacca\n### Type:  food\n\n### Term: bacca\n"]


 90%|█████████ | 18/20 [01:32<00:10,  5.14s/it]

["Given a term in biomedicine domain identify the term type.\nList of possible types for term: 'food', 'bacterium', 'disease or syndrome', 'fungus', 'plant'\n\n### Term: life-mel\n### Type: food\n\n### Term: golden bifido\n### Type: bacterium\n\n### Term: uterine adenomyosis\n### Type: disease or syndrome\n\n### Term: mycobiome profile\n### Type: fungus\n\n### Term: asana leaf\n### Type: plant\n\n### Term: green olive\n### Type:  food\n\n### Term: erythro"]


 95%|█████████▌| 19/20 [01:37<00:05,  5.14s/it]

["Given a term in biomedicine domain identify the term type.\nList of possible types for term: 'food', 'bacterium', 'disease or syndrome', 'fungus', 'plant'\n\n### Term: life-mel\n### Type: food\n\n### Term: golden bifido\n### Type: bacterium\n\n### Term: uterine adenomyosis\n### Type: disease or syndrome\n\n### Term: mycobiome profile\n### Type: fungus\n\n### Term: asana leaf\n### Type: plant\n\n### Term: cruciform vegetables\n### Type:  food\n\n### Term: erythro"]


100%|██████████| 20/20 [01:42<00:00,  5.13s/it]

["Given a term in biomedicine domain identify the term type.\nList of possible types for term: 'food', 'bacterium', 'disease or syndrome', 'fungus', 'plant'\n\n### Term: life-mel\n### Type: food\n\n### Term: golden bifido\n### Type: bacterium\n\n### Term: uterine adenomyosis\n### Type: disease or syndrome\n\n### Term: mycobiome profile\n### Type: fungus\n\n### Term: asana leaf\n### Type: plant\n\n### Term: fruit or vegetable\n### Type:  food\n\n### Term: bifidob"]





In [39]:
fewshot_pt_inference

[' plant\n\n### Term: 100',
 ' plant\n\n### Term: erythro',
 ' plant\n\n### Term: caulis\n',
 ' plant\n\n### Term: erythro',
 ' plant\n\n### Term: erythro',
 ' disease or syndrome\n\n### Term: ery',
 ' disease or syndrome\n\n### Term: ery',
 ' disease or syndrome\n\n### Term: ery',
 ' disease or syndrome\n\n### Term: ery',
 ' disease or syndrome\n\n### Term: ery',
 ' bacterium\n\n### Term: erythe',
 ' bacterium\n\n### Term: erythe',
 ' bacterium\n\n### Term: eryth',
 ' fungus\n\n### Term: ery',
 ' bacterium\n\n### Term: erythe',
 ' fungus\n\n### Term: ery',
 ' food\n\n### Term: bacca\n',
 ' food\n\n### Term: erythro',
 ' food\n\n### Term: erythro',
 ' food\n\n### Term: bifidob']

In [40]:
evaluate_generated_types(generated_types=fewshot_pt_inference, answer_set=answer_set)

Accuracy: 0.95
F1-Score (macro): 0.7999999999999999


## Conclusion

* A small dataset of 20 samples from 5 types.
* Model: Mistral-7B
* Summary of results

|Prompt Template| Accuracy | F1-Score (macro)|
|:---:|:---:|:---:|
|`Perform a sentence completion on the following sentence:\nSentence: {term} in biomedicine is a`|0.25|0.087|
|`Given a term in biomedicine domain identify the term type.\n### Term: {term}\n### Type:`| 0.05 | 0.024|
|`Given a term in biomedicine domain identify the term type.\nList of possible types for term: 'food', 'bacterium', 'disease or syndrome', 'fungus', 'plant'\n\n### Term: {term}\n### Type:`| 0.75 | 0.509 |
|`Given a term in biomedicine domain identify the term type.\nList of possible types for term: 'food', 'bacterium', 'disease or syndrome', 'fungus', 'plant'\n\n### Term: life-mel\n### Type: food\n\n### Term: golden bifido\n### Type: bacterium\n\n### Term: uterine adenomyosis\n### Type: disease or syndrome\n\n### Term: mycobiome profile\n### Type: fungus\n\n### Term: asana leaf\n### Type: plant\n\n### Term: {term}\n### Type: `|0.95|0.800|

* Remarks
   - What happens if we have many more than 5 types, for example 100 or 700?
   - Few-shot prompting may stop being practical with more than 50 types: a prompt holding at least one example per type can exceed the LLM's input limit.
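
One way to make the input-limit concern concrete is to build the few-shot prompt programmatically and stop adding examples once a token budget is reached. The sketch below is an illustration under stated assumptions: `build_fewshot_prompt` and `max_tokens` are hypothetical names, and the default token counter simply splits on whitespace, which only approximates a real tokenizer.

```python
from typing import Callable, List, Tuple


def build_fewshot_prompt(
    types: List[str],
    examples: List[Tuple[str, str]],
    max_tokens: int,
    count_tokens: Callable[[str], int] = lambda text: len(text.split()),
) -> str:
    """Build a few-shot template in the notebook's format, within a token budget.

    Examples are added in order until the next one would push the prompt
    (header + shots + query slot) over `max_tokens`.
    """
    header = (
        "Given a term in biomedicine domain identify the term type.\n"
        "List of possible types for term: "
        + ", ".join(f"'{t}'" for t in types)
        + "\n\n"
    )
    query_slot = "### Term: {term}\n### Type: "
    shots = ""
    for term, term_type in examples:
        candidate = shots + f"### Term: {term}\n### Type: {term_type}\n\n"
        if count_tokens(header + candidate + query_slot) > max_tokens:
            break  # budget reached: drop this and all remaining examples
        shots = candidate
    return header + shots + query_slot


types = ["food", "bacterium", "disease or syndrome", "fungus", "plant"]
examples = [
    ("life-mel", "food"),
    ("golden bifido", "bacterium"),
    ("uterine adenomyosis", "disease or syndrome"),
    ("mycobiome profile", "fungus"),
    ("asana leaf", "plant"),
]

full_prompt = build_fewshot_prompt(types, examples, max_tokens=1000)
tight_prompt = build_fewshot_prompt(types, examples, max_tokens=35)
print(full_prompt.count("### Term:") - 1, "examples fit in the large budget")
print(tight_prompt.count("### Term:") - 1, "examples fit in the tight budget")
```

For a real budget, the counter could be swapped for the model's tokenizer, for example `lambda text: len(tokenizer(text)["input_ids"])`. With hundreds of types, even the type list in the header may need to be shortened, which this sketch does not attempt.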


# LLMs4OL Challenge @ ISWC-2024

<img src="images/llms4ol.png" width="1000"/>