---

### Jupyter notebook driver for LogMap-LLM

This notebook driver gives LogMap-LLM an interactive user experience.
It is aimed at technical users of LogMap-LLM who are comfortable with
Python and prefer working in a notebook to working at the command line.

A Python script driver for LogMap-LLM is also available. It provides a
command-line user experience with no interactivity: you launch LogMap-LLM,
it runs the whole pipeline, and it writes all of its output to the console.

We develop LogMap-LLM in this notebook driver. We then use automated means
to convert this Jupyter notebook to the counterpart Python script driver
for LogMap-LLM. That script then needs a few follow-up manual edits to
comment out a few cells and to revise some textual comments (such as these
right here).

Apart from those minor manual tweaks, the two drivers (notebook and script) 
are meant to be exact counterparts of one another: functionally and 
behaviourally identical. Design choices made here in the notebook driver are
made with the command-line experience of LogMap-LLM in mind. In terms of
what is printed to stdout (cell output or console output), what the user
sees should be identical across the notebook and script drivers.

---

Basic imports

In [1]:
import os.path
import tomllib
import pandas as pd
import json
import oracle_prompt_building as opb
import oracle_consultation as oc 

In [2]:
# arrange for modules to be reloaded automatically, so changes are
# recognised seamlessly
%load_ext autoreload
%autoreload 2

JPype imports

In [3]:
# Import the module
import jpype

# Allow Java modules to be imported
import jpype.imports

# Import all standard Java types into the global scope
from jpype.types import *

Load the LogMap-LLM configuration file

In [4]:
with open("logmap-llm-config-basic.toml", mode="rb") as fp:
    config = tomllib.load(fp)

Display the configuration parameter settings

**NOTE: this is for dev and test only, not the user; we remove this cell from the Python script driver**

In [5]:
# Outer double quotes keep these f-strings valid on Python 3.11 as well as 3.12+
print(f"task name: {config['alignmentTask']['task_name']}")
print(f"onto source: {config['alignmentTask']['onto_source_filepath']}")
print(f"onto target: {config['alignmentTask']['onto_target_filepath']}")
print(f"extended mappings_to_ask: {config['alignmentTask']['generate_extended_mappings_to_ask_oracle']}")
print(f"logmap_parameters_dirpath: {config['alignmentTask']['logmap_parameters_dirpath']}")
print()
#print(f"openrouter apikey: {config['oracle']['openrouter_apikey']}")
print(f"openrouter LLM model name: {config['oracle']['openrouter_model_name']}")
print(f"oracle dev prompt template: {config['oracle']['oracle_dev_prompt_template_name']}")
print(f"oracle user prompt template: {config['oracle']['oracle_user_prompt_template_name']}")
print()
print(f"logmapllm output dirpath: {config['outputs']['logmapllm_output_dirpath']}")
print(f"logmap initial alignment output dirpath: {config['outputs']['logmap_initial_alignment_output_dirpath']}")
print(f"logmap refined alignment output dirpath: {config['outputs']['logmap_refined_alignment_output_dirpath']}")
print()
print(f"align ontologies: {config['pipeline']['align_ontologies']}")
print(f"build oracle prompts: {config['pipeline']['build_oracle_prompts']}")
print(f"consult oracle: {config['pipeline']['consult_oracle']}")
print(f"refine alignment: {config['pipeline']['refine_alignment']}")


task name: anatomy
onto source: /Users/dave/research/logmap-usage/ontologies/mouse.owl
onto target: /Users/dave/research/logmap-usage/ontologies/human.owl
extended mappings_to_ask: False
logmap_parameters_dirpath: /Users/dave/research/logmap-20251230/

openrouter LLM model name: arcee-ai/trinity-mini:free
oracle dev prompt template: oracle developer prompt template name
oracle user prompt template: one_level_of_parents_and_synonyms

logmapllm output dirpath: /Users/dave/research/logmap-usage/logmapllm-outputs/
logmap initial alignment output dirpath: /Users/dave/research/logmap-usage/logmap-initial-alignment/
logmap refined alignment output dirpath: /Users/dave/research/logmap-usage/logmap-refined-alignment/

align ontologies: reuse
build oracle prompts: build
consult oracle: consult
refine alignment: refine
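
Each pipeline cell further down raises a `ValueError` when its setting is not recognised, which means a typo in a later step only surfaces after the earlier steps have run. One possible way to catch such typos up front is sketched below. The helper `check_pipeline_settings` and the table `ALLOWED_PIPELINE_VALUES` are hypothetical names of ours; the allowed values are read off the `if`/`elif` branches of the pipeline cells in this notebook.

```python
# Allowed values per pipeline step, as read off the if/elif branches
# in this notebook's pipeline cells.
ALLOWED_PIPELINE_VALUES = {
    "align_ontologies": {"align", "reuse"},
    "build_oracle_prompts": {"build", "reuse", "bypass"},
    "consult_oracle": {"consult", "reuse", "bypass"},
    "refine_alignment": {"refine", "bypass"},
}


def check_pipeline_settings(pipeline):
    """Return a list of problems found in the [pipeline] table (hypothetical helper)."""
    problems = []
    for step, allowed in ALLOWED_PIPELINE_VALUES.items():
        value = pipeline.get(step)  # None when the key is missing
        if value not in allowed:
            problems.append(f"{step}: {value!r} not in {sorted(allowed)}")
    return problems


example_pipeline = {
    "align_ontologies": "skip",  # deliberate mistake for illustration
    "build_oracle_prompts": "build",
    "consult_oracle": "consult",
    "refine_alignment": "refine",
}
for problem in check_pipeline_settings(example_pipeline):
    print(problem)
```

An empty list means every step has a recognised value; in the notebook this check could be run on `config['pipeline']`.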


Build JVM classpath and JVM options

TODO: when LogMap-LLM is a package, we'll want to discover and set the LogMap dirpath automatically somehow

In [9]:
# TODO: set this dirpath automatically, relative to something fixed in the eventual
# distribution of the LogMap-LLM Python package
logmap_dirpath = '/Users/dave/research/logmap-20251230/'

# path to main LogMap jar file
logmap_jar = os.path.join(logmap_dirpath, 'logmap-matcher-4.0.jar')
jpype.addClassPath(logmap_jar)

# path to LogMap dependency jar files
logmap_dep = os.path.join(logmap_dirpath, 'java-dependencies/*')
jpype.addClassPath(logmap_dep)

# LogMap jvm options
jvmOptions = [
    "-Xms500M", 
    "-Xmx25G",
    "-DentityExpansionLimit=10000000",
    "--add-opens=java.base/java.lang=ALL-UNNAMED"
]
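
A wrong `logmap_dirpath` tends to show up only later, as a failed Java import. The sketch below shows one way to check the expected files before touching JPype. The jar and directory names are the ones used in the cell above; the helper names `logmap_classpath_entries` and `missing_logmap_files` are hypothetical.

```python
import os.path


def logmap_classpath_entries(logmap_dirpath):
    """Return the two classpath entries this notebook registers with JPype."""
    main_jar = os.path.join(logmap_dirpath, "logmap-matcher-4.0.jar")
    dependency_jars = os.path.join(logmap_dirpath, "java-dependencies/*")
    return [main_jar, dependency_jars]


def missing_logmap_files(logmap_dirpath):
    """List the expected LogMap jar/directory paths that do not exist on disk."""
    main_jar, dependency_jars = logmap_classpath_entries(logmap_dirpath)
    missing = []
    if not os.path.isfile(main_jar):
        missing.append(main_jar)
    # strip the trailing '*' wildcard to get the dependency directory itself
    dependency_dir = os.path.dirname(dependency_jars)
    if not os.path.isdir(dependency_dir):
        missing.append(dependency_dir)
    return missing


for path in missing_logmap_files("/path/that/does/not/exist/"):
    print("Missing:", path)
```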

Check if a JVM (Java Virtual Machine) is running

In [10]:
if jpype.isJVMStarted():
    print("JVM running, version:", jpype.getJVMVersion())
else:
    print("JVM is not running!")

JVM running, version: (21, 0, 6)


Start a JVM

In [8]:
if not jpype.isJVMStarted():
    jpype.startJVM(*jvmOptions)

Confirm a JVM is running

In [9]:
if not jpype.isJVMStarted():
    print("JVM is not running!")
else:
    print("JVM version:", jpype.getJVMVersion())

JVM version: (21, 0, 6)


---

Now that we have imported JPype and started a JVM, we can import and call Java classes.

---

Java imports for basic LogMap usage

In [11]:
from uk.ac.ox.krr.logmap2 import LogMapLLM_Interface

Python imports that contain Java imports

In [12]:
import bridging as br

Prepare the filepaths of the source and target ontologies the way LogMap expects 

In [13]:
task_name = config['alignmentTask']['task_name']
onto_src_filepath = config['alignmentTask']['onto_source_filepath']
onto_tgt_filepath = config['alignmentTask']['onto_target_filepath']
onto_src_filepath_logmap = "file:" + config['alignmentTask']['onto_source_filepath']
onto_tgt_filepath_logmap = "file:" + config['alignmentTask']['onto_target_filepath']
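
The `"file:"` prefixing above could be wrapped in a small helper so that a path which already carries the prefix is not prefixed twice. This is only a sketch: `as_logmap_location` is a hypothetical name, and it assumes LogMap wants exactly the `"file:" + path` form used in this cell.

```python
def as_logmap_location(filepath):
    """Prefix a filesystem path with 'file:' unless it already has that prefix."""
    if filepath.startswith("file:"):
        return filepath
    return "file:" + filepath


print(as_logmap_location("/path/to/ontologies/mouse.owl"))
```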

Instantiate a LogMapLLM interface to LogMap for the specified alignment task

In [14]:
logmap2_LogMapLLM_Interface = LogMapLLM_Interface(onto_src_filepath_logmap, 
                                                  onto_tgt_filepath_logmap, 
                                                  task_name)

Configure the LogMapLLM interface to LogMap for the initial alignment task

In [15]:
# boolean: True = generate extended m_ask, False = generate standard m_ask
#generate_extended_m_ask = False
generate_extended_m_ask = config['alignmentTask']['generate_extended_mappings_to_ask_oracle']
logmap2_LogMapLLM_Interface.setExtendedQuestions4LLM(generate_extended_m_ask)

In [16]:
# blank for default; or just ignore this setter for default case
#path_to_logmap_parameters = '/Users/dave/research/logmap-usage/logmap-params/'
#path_to_logmap_parameters = ""  
path_to_logmap_parameters = config['alignmentTask']['logmap_parameters_dirpath']
logmap2_LogMapLLM_Interface.setPathToLogMapParameters(path_to_logmap_parameters)

In [17]:
# set dirpath into which LogMap will save its outputs
# For LogMap, an empty string here means 'do not save any output files'!

#logmap_outputs_dir_path = '/Users/dave/research/logmap-usage/mappings1'
#logmap_outputs_dir_path = ""
logmap_outputs_dir_path = config['outputs']['logmap_initial_alignment_output_dirpath']
logmap2_LogMapLLM_Interface.setPathForOutputMappings(logmap_outputs_dir_path)

---

## Begin LogMap-LLM session dialog with the user

---

In [18]:
print()
print('LogMap-LLM session beginning')
print()
print(f'Alignment task name: {task_name}')
print()
print('Source ontology:')
print(onto_src_filepath)
print()
print('Target ontology:')
print(onto_tgt_filepath)


LogMap-LLM session beginning

Alignment task name: anatomy

Source ontology:
/Users/dave/research/logmap-usage/ontologies/mouse.owl

Target ontology:
/Users/dave/research/logmap-usage/ontologies/human.owl


---

## Pipeline step 1: Align Ontologies

---

In [19]:
print()
print('- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -')
print('Step 1: Align ontologies and obtain mappings to ask an Oracle')
print('- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -')
print()

if config['pipeline']['align_ontologies'] == 'align':
    # perform an initial alignment so we can get a fresh m_ask
    print("Performing fresh initial LogMap alignment ...")
    print()
    logmap2_LogMapLLM_Interface.performAlignment()
    print("Initial alignment complete")
    mappings = logmap2_LogMapLLM_Interface.getLogMapMappings()
    print()
    print(f'Number of mappings in initial alignment: {len(mappings)}')
    m_ask_java = logmap2_LogMapLLM_Interface.getLogMapMappingsForLLM()
    m_ask_df = br.java_mappings_2_python(m_ask_java)
    # Note: we don't need to save m_ask_df to a file because LogMap
    # does that automatically, as part of performing an alignment
elif config['pipeline']['align_ontologies'] == 'reuse':
    # bypass an initial alignment and reuse an existing m_ask
    # saved to a file in an alignment conducted previously
    print("Reusing existing initial LogMap alignment ...")
    print()
    filename = task_name + '-logmap_mappings.txt'
    filepath = os.path.join(logmap_outputs_dir_path, filename)
    mappings = pd.read_csv(filepath, sep='|', header=None)
    print(f'Number of mappings in initial alignment: {len(mappings)}')
    print()
    filename = task_name + '-logmap_mappings_to_ask_oracle_user_llm.txt'
    print('Loading mappings to ask an Oracle from file:')
    print(filename)
    filepath = os.path.join(logmap_outputs_dir_path, filename)
    m_ask_df = pd.read_csv(filepath, sep='|', header=None)
    m_ask_df.columns = br.get_m_ask_column_names()
else:
    raise ValueError(f"Value for align_ontologies not recognised: {config['pipeline']['align_ontologies']}")

print()
print(f"Number of mappings to ask an Oracle: {len(m_ask_df)}")


- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Step 1: Align ontologies and obtain mappings to ask an Oracle
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Reusing existing initial LogMap alignment ...

Number of mappings in initial alignment: 1405

Loading mappings to ask an Oracle from file:
anatomy-logmap_mappings_to_ask_oracle_user_llm.txt

Number of mappings to ask an Oracle: 5
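
The `reuse` branch above reads headerless, pipe-separated files and then assigns column names. The sketch below reproduces that pattern on two in-memory rows. Both the row layout and the column names are assumptions: they are inferred from the prediction table shown later in this notebook, not taken from LogMap's documentation or from `br.get_m_ask_column_names()`.

```python
import io

import pandas as pd

# Two illustrative rows in the assumed pipe-separated layout
SAMPLE_M_ASK = (
    "http://mouse.owl#MA_0000810|http://human.owl#NCI_C49333|=|0.56|CLS\n"
    "http://mouse.owl#MA_0002062|http://human.owl#NCI_C33661|=|0.6|CLS\n"
)

# Assumed column names (see the lead-in above)
M_ASK_COLUMNS = [
    "source_entity_uri",
    "target_entity_uri",
    "relation",
    "confidence",
    "entityType",
]

# header=None tells pandas the first line is data, not column names
sample_m_ask_df = pd.read_csv(
    io.StringIO(SAMPLE_M_ASK), sep="|", header=None, names=M_ASK_COLUMNS
)
print(len(sample_m_ask_df))
print(sample_m_ask_df.loc[0, "target_entity_uri"])
```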


---

## Pipeline step 2: Build Oracle Prompts

---

In [21]:
print()
print('- - - - - - - - - - - - - - - - - - - - - - - - - - - - -')
print('Step 2: Build user prompts for mappings to ask an Oracle')
print('- - - - - - - - - - - - - - - - - - - - - - - - - - - - -')
print()

oupt_name = config['oracle']['oracle_user_prompt_template_name']

if config['pipeline']['build_oracle_prompts'] == 'build':
    print('Building fresh Oracle user prompts ...')
    print()
    m_ask_oracle_user_prompts = opb.build_oracle_user_prompts(oupt_name,
                                                              onto_src_filepath,
                                                              onto_tgt_filepath, 
                                                              m_ask_df)
elif config['pipeline']['build_oracle_prompts'] == 'reuse':
    print('Reusing existing Oracle user prompts ...')
    print()
    # reuse oracle user prompts created previously and saved in a file on disk
    dirpath = config['outputs']['logmapllm_output_dirpath']
    filename = task_name + '-' + oupt_name + '-mappings_to_ask_oracle_user_prompts.json'
    print('Loading oracle user prompts from file:')
    print(filename)
    filepath = os.path.join(dirpath, filename)
    with open(filepath, 'r') as fp:
        m_ask_oracle_user_prompts = json.load(fp)
elif config['pipeline']['build_oracle_prompts'] == 'bypass':
    print('Bypassing use of Oracle user prompts')
    m_ask_oracle_user_prompts = None
else:
    raise ValueError(f"Value for build_oracle_prompts not recognised: {config['pipeline']['build_oracle_prompts']}")

if m_ask_oracle_user_prompts is not None:
    print()
    print(f"Number of Oracle user prompts obtained: {len(m_ask_oracle_user_prompts)}")
    print()

if config['pipeline']['build_oracle_prompts'] == 'build':
    # save the newly built oracle user prompts to a .json file so they can be reused
    dirpath = config['outputs']['logmapllm_output_dirpath']
    filename = task_name + '-' + oupt_name + '-mappings_to_ask_oracle_user_prompts.json'
    print('Oracle user prompts saved to file:')
    print(filename)
    filepath = os.path.join(dirpath, filename)
    with open(filepath, 'w') as fp:
        json.dump(m_ask_oracle_user_prompts, fp)



- - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Step 2: Build user prompts for mappings to ask an Oracle
- - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Building fresh Oracle user prompts ...



Preparing source ontology:   0%|          | 0/1 [00:00<?, ?it/s]

There are 51313 triples in the ontology


Preparing source ontology: 100%|██████████| 1/1 [00:14<00:00, 14.48s/it]





Preparing target ontology:   0%|          | 0/1 [00:00<?, ?it/s]

There are 51313 triples in the ontology


Preparing target ontology: 100%|██████████| 1/1 [00:14<00:00, 14.21s/it]



Prompt template function obtained: one_level_of_parents_and_synonyms



Building the prompts: 100%|██████████| 5/5 [00:00<00:00, 160.44it/s]


Number of Oracle user prompts obtained: 5

Oracle user prompts saved to file:
anatomy-one_level_of_parents_and_synonyms-mappings_to_ask_oracle_user_prompts.json
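
The `build` and `reuse` modes of Step 2 depend on the same file name convention and on a JSON round trip. The sketch below exercises both in a temporary directory. The helper `prompts_filename` is a hypothetical name; the file name pattern is the one used in the cell above, and the prompt texts are placeholders.

```python
import json
import os.path
import tempfile


def prompts_filename(task_name, template_name):
    """Build the prompts file name the way the Step 2 cell does."""
    return task_name + "-" + template_name + "-mappings_to_ask_oracle_user_prompts.json"


# Keys follow the 'source URI|target URI' form seen in the notebook output;
# the prompt texts are placeholders.
example_prompts = {
    "http://mouse.owl#MA_0000810|http://human.owl#NCI_C49333": "placeholder prompt 1",
    "http://mouse.owl#MA_0002062|http://human.owl#NCI_C33661": "placeholder prompt 2",
}

with tempfile.TemporaryDirectory() as dirpath:
    filename = prompts_filename("anatomy", "one_level_of_parents_and_synonyms")
    filepath = os.path.join(dirpath, filename)
    # 'build' mode: save the prompts
    with open(filepath, "w") as fp:
        json.dump(example_prompts, fp)
    # 'reuse' mode: load them back
    with open(filepath, "r") as fp:
        reloaded = json.load(fp)

print(filename)
print(reloaded == example_prompts)
```

Because the template name is part of the file name, prompts built with different templates for the same task do not overwrite one another.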





OPTIONAL: Inspect some of the generated Oracle user prompts

**NOTE: This cell is intended for development and testing. It may also interest LogMap-LLM users who prefer the interactive notebook experience. In particular, users who design their own Oracle 'user' prompt templates can run the cell below to review the prompts generated from their template design, and so use it as a test tool while designing those templates.**

**NOTE: In the Python script driver for LogMap-LLM (the counterpart of this Jupyter notebook), we comment out the code cell below so that it does not disturb the command-line user experience.**

In [22]:
cnt = 0
for key, val in m_ask_oracle_user_prompts.items():
    cnt += 1
    if cnt > 2:
        break
    print(key)
    print(val)
    print()

http://mouse.owl#MA_0000810|http://human.owl#NCI_C49333
We have two entities from different biomedical ontologies.
The first one is "brain grey matter", also known as "MA_0000810", which falls under the category "grey matter".
The second one is "Cerebral_Gray_Matter", also known as "Brain Gray Matter", which falls under the category "Brain_Part".

Do they mean the same thing? Respond with "True" or "False".

http://mouse.owl#MA_0002062|http://human.owl#NCI_C33661
We have two entities from different biomedical ontologies.
The first one is "superficial temporal artery", also known as "MA_0002062", which falls under the category "temporal artery".
The second one is "Superficial_Femoral_Artery", also known as "Superficial Femoral Artery", which falls under the category "Femoral_Artery_Branch".

Do they mean the same thing? Respond with "True" or "False".
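
For readers designing their own templates, the sketch below reproduces the wording of the prompts printed above from six plain strings. The function name `render_parent_synonym_prompt` and its signature are hypothetical: the real template functions live in `oracle_prompt_building`, whose interface is not shown in this notebook.

```python
def render_parent_synonym_prompt(src_label, src_synonym, src_parent,
                                 tgt_label, tgt_synonym, tgt_parent):
    """Render a prompt with one synonym and one parent per entity (illustrative only)."""
    return (
        "We have two entities from different biomedical ontologies.\n"
        f'The first one is "{src_label}", also known as "{src_synonym}", '
        f'which falls under the category "{src_parent}".\n'
        f'The second one is "{tgt_label}", also known as "{tgt_synonym}", '
        f'which falls under the category "{tgt_parent}".\n'
        "\n"
        'Do they mean the same thing? Respond with "True" or "False".'
    )


example_prompt = render_parent_synonym_prompt(
    "brain grey matter", "MA_0000810", "grey matter",
    "Cerebral_Gray_Matter", "Brain Gray Matter", "Brain_Part",
)
print(example_prompt)
```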



---

## Pipeline step 3: Consult Oracle

---

In [23]:
print()
print('- - - - - - - - - - - - - - - - - - - - - - - - - - -')
print("Step 3: Consult LLM Oracle for each 'mapping to ask'")
print('- - - - - - - - - - - - - - - - - - - - - - - - - - -')
print()

api_key = config['oracle']['openrouter_apikey']
model_name = config['oracle']['openrouter_model_name']

# TODO: externalise max_workers in the config.toml file, so the user
# has control without having to modify Python code
max_workers = 2

if config['pipeline']['consult_oracle'] == 'consult':
    print(f'Consulting LLM Oracle {model_name}')
    print('with user prompts for mappings to ask ...')
    print()
    m_ask_df_ext = oc.consult_oracle_for_mappings_to_ask(m_ask_oracle_user_prompts,
                                                         api_key,
                                                         model_name,
                                                         max_workers,
                                                         m_ask_df)
elif config['pipeline']['consult_oracle'] == 'reuse':
    print('Reusing existing Oracle consultation outcomes (predictions)')
    # reuse Oracle predictions created previously and saved in a file on disk
    dirpath = config['outputs']['logmapllm_output_dirpath']
    filename = task_name + '-' + oupt_name + '-mappings_to_ask_with_oracle_predictions.csv'
    print('Loading Oracle predictions for the mappings_to_ask from file:')
    print(filename)
    filepath = os.path.join(dirpath, filename)
    m_ask_df_ext = pd.read_csv(filepath)
elif config['pipeline']['consult_oracle'] == 'bypass':
    print('Bypassing Oracle consultations')
    m_ask_df_ext = None
else:
    raise ValueError(f"Value for consult_oracle not recognised: {config['pipeline']['consult_oracle']}")


if m_ask_df_ext is not None:
    preds = m_ask_df_ext['Oracle_prediction']
    nr_mappings = len(preds)
    nr_errors = sum(preds == 'error')
    nr_completions = nr_mappings - nr_errors
    nr_true = sum(preds == True)
    nr_false = sum(preds == False)
    width = len(str(nr_mappings))
    nr_true = str(nr_true).rjust(width)
    nr_false = str(nr_false).rjust(width)
    nr_errors = str(nr_errors).rjust(width)
    print()
    print(f"Number of mappings to ask an Oracle: {nr_mappings}")
    print(f"Number of LLM Oracle consultations : {nr_completions}")
    print(f"Number of mappings predicted True  : {nr_true}")
    print(f"Number of mappings predicted False : {nr_false}")
    print(f"Number of consultation failures    : {nr_errors}")
    print()

if config['pipeline']['consult_oracle'] == 'consult' and m_ask_df_ext is not None:
    # save the extended m_ask dataframe (that contains the LLM Oracle predictions)
    dirpath = config['outputs']['logmapllm_output_dirpath']
    filename = task_name + '-' + oupt_name + '-mappings_to_ask_with_oracle_predictions.csv'
    print("Oracle predictions for 'mappings to ask' saved to file:")
    print(filename)
    filepath = os.path.join(dirpath, filename)
    m_ask_df_ext.to_csv(filepath)



- - - - - - - - - - - - - - - - - - - - - - - - - - -
Step 3: Consult LLM Oracle for each 'mapping to ask'
- - - - - - - - - - - - - - - - - - - - - - - - - - -

Consulting LLM Oracle arcee-ai/trinity-mini:free
with user prompts for mappings to ask ...



Oracle consultations: 100%|██████████| 5/5 [00:05<00:00,  1.15s/it]


Number of mappings to ask an Oracle: 5
Number of LLM Oracle consultations : 5
Number of mappings predicted True  : 2
Number of mappings predicted False : 3
Number of consultation failures    : 0

Oracle predictions for 'mappings to ask' saved to file:
anatomy-one_level_of_parents_and_synonyms-mappings_to_ask_with_oracle_predictions.csv
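
How `oc.consult_oracle_for_mappings_to_ask` uses `max_workers` is not shown in this notebook. One plausible reading is a thread pool that consults the Oracle for several prompts at once and records `'error'` for a failed call, which is the marker the summary code above counts. The sketch below illustrates that idea with a stub in place of a real LLM call; `consult_all` and `stub_oracle` are hypothetical names, and this is not the module's actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def consult_all(prompts, ask_oracle, max_workers=2):
    """Ask the oracle about every prompt, recording 'error' when a call fails."""

    def safe_ask(prompt):
        try:
            return ask_oracle(prompt)
        except Exception:
            # one failed consultation should not abort the others
            return "error"

    keys = list(prompts)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so answers line up with keys
        answers = list(pool.map(safe_ask, (prompts[key] for key in keys)))
    return dict(zip(keys, answers))


def stub_oracle(prompt):
    """Stand-in for a real LLM call (illustration only)."""
    if "broken" in prompt:
        raise RuntimeError("simulated API failure")
    return "same" in prompt


example_answers = consult_all(
    {"a": "the same thing", "b": "different things", "c": "broken request"},
    stub_oracle,
)
print(example_answers)
```

Threads suit this kind of work because each consultation spends most of its time waiting on the network.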





OPTIONAL: Display some of the Oracle predictions regarding the 'mappings to ask' an Oracle

**NOTE: In the Python script driver for LogMap-LLM (the counterpart of this Jupyter notebook), we comment out the code cell below so that it does not disturb the command-line user experience.**

In [24]:
m_ask_df_ext.head()

| | source_entity_uri | target_entity_uri | relation | confidence | entityType | Oracle_prediction | Oracle_confidence | Oracle_input_tokens | Oracle_output_tokens |
|---|---|---|---|---|---|---|---|---|---|
| 0 | http://mouse.owl#MA_0000810 | http://human.owl#NCI_C49333 | = | 0.56 | CLS | True | 1.0 | 192 | 228 |
| 1 | http://mouse.owl#MA_0002062 | http://human.owl#NCI_C33661 | = | 0.6 | CLS | False | 1.0 | 202 | 233 |
| 2 | http://mouse.owl#MA_0001130 | http://human.owl#NCI_C12350 | = | 0.65 | CLS | False | 1.0 | 225 | 229 |
| 3 | http://mouse.owl#MA_0000874 | http://human.owl#NCI_C33260 | = | 0.56 | CLS | False | 1.0 | 206 | 475 |
| 4 | http://mouse.owl#MA_0002497 | http://human.owl#NCI_C33309 | = | 0.55 | CLS | True | 1.0 | 262 | 335 |
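
The counts reported at the end of Step 3 can be reproduced from the `Oracle_prediction` column alone. The sketch below does this for a plain Python list, using the same right-justified layout as the Step 3 cell. `summarise_predictions` is a hypothetical helper, and it assumes the values are Python `True`, `False`, or the string `'error'`.

```python
def summarise_predictions(preds):
    """Return the Step 3 summary lines for a list of True / False / 'error' values."""
    preds = list(preds)
    nr_mappings = len(preds)
    nr_errors = sum(1 for p in preds if isinstance(p, str) and p == "error")
    nr_true = sum(1 for p in preds if p is True)
    nr_false = sum(1 for p in preds if p is False)
    # right-justify the last three counts to the width of the total
    width = len(str(nr_mappings))
    return [
        f"Number of mappings to ask an Oracle: {nr_mappings}",
        f"Number of LLM Oracle consultations : {nr_mappings - nr_errors}",
        f"Number of mappings predicted True  : {str(nr_true).rjust(width)}",
        f"Number of mappings predicted False : {str(nr_false).rjust(width)}",
        f"Number of consultation failures    : {str(nr_errors).rjust(width)}",
    ]


# The five predictions shown in the table above
for line in summarise_predictions([True, False, False, False, True]):
    print(line)
```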


---

## Pipeline step 4: Refine Alignment

---

In [25]:
print()
print('- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -')
print('Step 4: Refine alignment using LLM Oracle mapping predictions')
print('- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -')
print()

logmap_outputs_dir_path = config['outputs']['logmap_refined_alignment_output_dirpath']
logmap2_LogMapLLM_Interface.setPathForOutputMappings(logmap_outputs_dir_path)

if config['pipeline']['refine_alignment'] == 'refine':
    if m_ask_df_ext is not None:
        # refine the initial alignment using the m_ask Oracle predictions
        print("Refining initial LogMap alignment ...")
        print()
        m_ask_oracle_preds_java = br.python_oracle_mapping_predictions_2_java(m_ask_df_ext)
        print(f'Number of mappings predicted True by Oracle given to LogMap: {len(m_ask_oracle_preds_java)} ')
        print()
        logmap2_LogMapLLM_Interface.performAlignmentWithLocalOracle(m_ask_oracle_preds_java)
        print("Alignment refinement complete")
        mappings_java = logmap2_LogMapLLM_Interface.getLogMapMappings()
        print()
        print(f'Number of mappings in LogMap refined alignment: {len(mappings_java)}')
    else:
        print('Step 4 bypassed due to Oracle consultation failures in Step 3')
elif config['pipeline']['refine_alignment'] == 'bypass':
    print('Bypassing alignment refinement')
else:
    raise ValueError(f"Value for refine_alignment not recognised: {config['pipeline']['refine_alignment']}")


- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Step 4: Refine alignment using LLM Oracle mapping predictions
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Refining initial LogMap alignment ...

Number of mappings predicted True by Oracle given to LogMap: 2 

Alignment refinement complete

Number of mappings in LogMap refined alignment: 1148
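
The message "Number of mappings predicted True by Oracle given to LogMap: 2" suggests that only the mappings the Oracle predicted `True` are passed back to LogMap. The `bridging` module is not shown here, so the sketch below only illustrates that selection on plain Python dictionaries; `mappings_predicted_true` is a hypothetical helper, not the code behind `br.python_oracle_mapping_predictions_2_java`.

```python
def mappings_predicted_true(rows):
    """Keep only the rows whose Oracle prediction is exactly True."""
    return [row for row in rows if row["Oracle_prediction"] is True]


# URIs and predictions copied from the Step 3 prediction table
example_rows = [
    {"source_entity_uri": "http://mouse.owl#MA_0000810",
     "target_entity_uri": "http://human.owl#NCI_C49333", "Oracle_prediction": True},
    {"source_entity_uri": "http://mouse.owl#MA_0002062",
     "target_entity_uri": "http://human.owl#NCI_C33661", "Oracle_prediction": False},
    {"source_entity_uri": "http://mouse.owl#MA_0001130",
     "target_entity_uri": "http://human.owl#NCI_C12350", "Oracle_prediction": False},
    {"source_entity_uri": "http://mouse.owl#MA_0000874",
     "target_entity_uri": "http://human.owl#NCI_C33260", "Oracle_prediction": False},
    {"source_entity_uri": "http://mouse.owl#MA_0002497",
     "target_entity_uri": "http://human.owl#NCI_C33309", "Oracle_prediction": True},
]

accepted = mappings_predicted_true(example_rows)
print(len(accepted))
```

Using `is True` means a failed consultation marked `'error'` is never treated as an accepted mapping.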


Confirm processing is complete

In [26]:
print()
print('LogMap-LLM session ending')
print()


LogMap-LLM session ending

