# Inferencing with InstructLab

<ul>
<li>Contributors: InstructLab team and IBM Research Technology Education team
<li>Contact for questions and technical support: IBM.Research.JupyterLab@ibm.com
<li>Provenance: IBM Research
<li>Version: 1.0.8
<li>Release date: 2024-08-30
<li>Compute requirements: GPU: estimated 6 minutes
<li>Memory requirements: 16 GB
<li>Notebook set: InstructLab
</ul>

# Select Viewing Option 

This notebook was optimized for viewing the output in a separate panel:
- If you would like to see the separate panel output, set `dual_screen` to *True* in the first line of the next cell and follow the steps below. 
- If you want the output inline with the notebook cells, set `dual_screen` to *False*. If you are running with the output inline with the notebook, please run the notebook cell by cell so that options can be selected.

If you set `dual_screen` to *True*, perform the following:
1. Right-click the next code cell and select **Create New View for Output**.
1. Drag the new **Output View** panel to the right side of the JupyterLab window.
1. Hide the File Browser by toggling the File Browser icon at the top left of the JupyterLab window.
1. To run the notebook, click on a notebook code cell, then from the top menu select *Kernel->Restart Kernel and Run All Cells*.
1. Select options and click the *Continue* button to progress through the notebook.

**Note:** This notebook must be run with a GPU. If you are not running with a GPU, select *File->Hub Control Panel->Stop My Server*, then *Start My Server*, and then select *GPU Session*.
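
If you are unsure whether the current session has a GPU, a quick check along the following lines may help. This is a minimal sketch, not part of the notebook: it assumes an NVIDIA GPU and that the `nvidia-smi` tool is on the `PATH`, and the helper name `gpu_visible` is hypothetical.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if nvidia-smi is installed and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # The tool is not installed, so no NVIDIA GPU can be confirmed.
        return False
    try:
        # `nvidia-smi -L` prints one line per detected GPU.
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


if gpu_visible():
    print("GPU detected")
else:
    print("No GPU detected; switch to a GPU session")
```

A negative result only means that `nvidia-smi` could not confirm a GPU; it does not cover other accelerator types.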

In [None]:
dual_screen = False

from IPython.display import Image, display
import ipywidgets as widgets
from ipynb_pause import flow

H1 = "<p style='font-family:IBM Plex Sans;font-size:28px'>"
H2 = "<p style='font-family:IBM Plex Sans;font-size:24px'>"
Norm = "<p style='font-family:IBM Plex Sans;font-size:20px'>"
Small = "<p style='font-family:IBM Plex Sans;font-size:17px'>"
Ex = "<p style='font-family:IBM Plex Sans;font-size:20px;font-style:italic'>"

out = widgets.Output(layout={'border': '1px solid black'})
run=flow.display_mode(mode=dual_screen, output=out, color='darkblue')
if dual_screen:
    display(out)

# Summary

This notebook is part of a sequential notebook set. Before using this notebook, please ensure that you have reviewed the first and second notebooks in the set:
- [Configuring InstructLab](./00_configuring_InstructLab.ipynb)
- [Training with InstructLab](./01_training_with_InstructLab.ipynb)

The second notebook in this set generates synthetic data with InstructLab and then trains a large language model (LLM) on that synthetic data set. In this notebook, both the pre-trained (base) LLM and the LLM trained on the generated synthetic data answer the same set of questions so that you can compare their performance.
 

# Table of Contents

* <a href="#I2_1">Step 1. Import Libraries </a>
* <a href="#I2_2">Step 2. Load the Base Model and the InstructLab Trained Model</a>
* <a href="#I2_3">Step 3. Define a Function to Perform Inference on Base and Trained Models </a>
* <a href="#I2_4">Step 4. Run Interactive Q&A Session with Base and Trained Models to Evaluate Performance</a>
* <a href="#I2_conclusion">Conclusion</a>
* <a href="#I2_learn">Learn More</a>

<a id="I2_1"></a>
# Step 1. Import Libraries

In [None]:
if dual_screen:
    with out:
        out.clear_output()
        display(widgets.HTML(H1+"Step 1. Import Libraries"))
        display(widgets.HTML(Norm+"We import the libraries required to run a Q&A session with the LLMs"))

import shutil
import os
import json
from datasets import load_dataset
from langchain.llms import LlamaCpp
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

with open('instructlab.json', 'r') as f:
    jsonState = json.load(f)

if dual_screen:
    with out:
        display(widgets.HTML(Norm+"Imports completed"))
else:
    print("Imports completed")
run.pause()

<a id="I2_2"></a>
# Step 2. Load the Base Model and the InstructLab Trained Model
The data set that was used in the last run of the [Training with InstructLab](./01_training_with_InstructLab.ipynb) notebook is preselected for inferencing. You may select a different data set if you wish.
You can select the trained model to use for inferencing from:
* **Fine Tuned Model** - A previously trained, fine-tuned model that is provided to demonstrate inferencing.
* **Newly Tuned Model** - Your trained model from prior runs of the [Training with InstructLab](./01_training_with_InstructLab.ipynb) notebook. You can optionally place a model you wish to compare against in the */data/data_set/new_model* directory (the notebook expects the file name *ggml-model-f16.gguf*).
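
The two selections together determine which model files are used. The sketch below restates that lookup as a single function, using the labels, directory names, and file names that appear in the code cells of this step. The function name `resolve_model_paths` and the example notebook directory are hypothetical and serve only as an illustration.

```python
import os

# Dataset labels mapped to the directory names used by this notebook.
USE_CASES = {
    "2024 Oscars": "oscars",
    "Quantum": "quantum",
    "Agentic AI": "agentic_ai",
    "Your Content 1": "your_content_1",
    "Your Content 2": "your_content_2",
}
BASE_MODEL_FILE = "granite-7b-lab-Q4_K_M.gguf"
TRAINED_MODEL_FILE = "ggml-model-f16.gguf"


def resolve_model_paths(notebook_dir, dataset_label, model_label):
    """Return (base_model_path, trained_model_path) for the given selections."""
    use_case = USE_CASES[dataset_label]  # raises KeyError for an unknown label
    if model_label == "Fine Tuned Model":
        trained_dir = os.path.join(notebook_dir, "fine_tuned_models", use_case)
    else:
        trained_dir = os.path.join(notebook_dir, "data", use_case, "new_model")
    base_path = os.path.join(notebook_dir, "models", BASE_MODEL_FILE)
    trained_path = os.path.join(trained_dir, TRAINED_MODEL_FILE)
    return base_path, trained_path


# "/home/jovyan/notebooks" is a placeholder for the notebook directory.
base, trained = resolve_model_paths(
    "/home/jovyan/notebooks", "Quantum", "Newly Tuned Model"
)
print(base)
print(trained)
```

Calling `os.path.exists` on both returned paths before loading is a cheap way to catch a missing model file early.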


In [None]:
data_set = widgets.ToggleButtons(
    options=['2024 Oscars', 'Quantum', 'Agentic AI', 'Your Content 1', 'Your Content 2'],
    tooltips=['2024 Oscar Awards Ceremony', 'Quantum Roadmap and Patterns', 'Artificial Intelligence Agents', 'Your own uploaded content dataset 1', 'Your own uploaded content dataset 2'],
    description='Dataset:', disabled=False, button_style='', style={"button_width": "auto"}
)
model = widgets.ToggleButtons(
    options=['Fine Tuned Model', 'Newly Tuned Model'],
    tooltips=['Tested fine tuned model', 'Newly tuned model'],
    description='Model:', disabled=False, button_style='', style={"button_width": "auto"}
)

print("\nPlease select the document that was used in the 01_training_with_InstructLab notebook")
print("\nSelect the content (Last used in training is preselected):")
data_set.value=jsonState["last_use_case"]
display(data_set)
display(model)
if dual_screen:
    with out:
        out.clear_output()
        display(widgets.HTML(H1+"Step 2. Load the Base Model and the InstructLab Trained Model"))
        display(widgets.HTML(Norm+"<br>Select the content (Last used in training is preselected):"))
        display(data_set)
        display(model)
        display(widgets.HTML(Small+"Select 'Your Content 1' or 'Your Content 2' if you provided your own QNA file"))
else:
    print("After choosing your dataset for inferencing, select the following cell and continue running the notebook")         
run.pause()

In [None]:
if dual_screen:
    with out:
        display(widgets.HTML(H1+"Step 2. Load the Base Model and the InstructLab Trained Model"))
        display(widgets.HTML(Norm+"Using Data Set: " + data_set.value))
else:
    print("Using Data Set: " + data_set.value)
# Map each dataset label to the directory name used on disk.
use_cases = {
    '2024 Oscars': 'oscars',
    'Quantum': 'quantum',
    'Agentic AI': 'agentic_ai',
    'Your Content 1': 'your_content_1',
    'Your Content 2': 'your_content_2',
}
use_case = use_cases.get(data_set.value)
if use_case is None:
    message = "ERROR: Please select the document that was used in the 01_training_with_InstructLab notebook"
    print(message)
    if dual_screen:
        with out:
            display(widgets.HTML(Norm+message))
    # Stop here: the model paths below cannot be built without a valid use case.
    raise ValueError(message)
            
if model.value == 'Fine Tuned Model':
    directory="/fine_tuned_models/"+use_case+"/"
else:
    directory="/data/"+ use_case+"/new_model/"
    
notebook_dir=os.getcwd()
os.chdir('/home/jovyan/')
pwd= os.getcwd()

base_model_path = notebook_dir +"/models/granite-7b-lab-Q4_K_M.gguf"
trained_model_path = notebook_dir + directory+ "ggml-model-f16.gguf"

if dual_screen:
    with out:
        display(widgets.HTML(Norm+"Base model path: "+base_model_path))
        display(widgets.HTML(Norm+"Trained model path: "+trained_model_path))
        display(widgets.HTML(Norm+"Model paths are set; the models are loaded when the Q&A session starts"))
print("Base model path: "+base_model_path)
print("Trained model path: "+trained_model_path)
run.pause()

<a id="I2_3"></a>
# Step 3. Define a Function to Perform Inference on Base and Trained Models
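
The function in the next cell wraps each question in a conversation-style prompt and trims each model output at the first occurrence of `Human`, since a completion model may keep writing and invent a follow-up turn. The sketch below illustrates those two steps with plain Python string handling only. The template is shortened, and `fake_completion` is a made-up stand-in for model output, not a real result.

```python
# Shortened version of the conversation template used in the notebook.
TEMPLATE = (
    "The following is a friendly conversation between a human and an AI.\n\n"
    "Current conversation:\n"
    "Human: {input}\n"
    "AI:"
)


def build_prompt(question: str) -> str:
    """Fill the single {input} placeholder with the user's question."""
    return TEMPLATE.format(input=question)


def trim_answer(raw: str) -> str:
    """Keep only the text before the first 'Human', as the notebook does."""
    return raw.split("Human", 1)[0]


prompt = build_prompt("Who won Best Picture?")
# Hypothetical raw completion that runs on into an invented next turn.
fake_completion = " I do not know.\nHuman: And Best Actor?\nAI: ..."

print(prompt)
print(trim_answer(fake_completion).strip())
```

Note that this rule also cuts an answer that legitimately contains the word "Human"; splitting on a stricter marker such as `"\nHuman:"` would be one way to reduce that risk.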

In [None]:
def model_inference(base_model_path, trained_model_path):
    _DEFAULT_TEMPLATE = """The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

    Current conversation:
    Human: {input}
    AI:"""
    
    base_llm = LlamaCpp(model_path=base_model_path,
                   verbose=False,
                   n_gpu_layers=25,
                   max_tokens=90,
                   temperature=0,
                   top_k=1
                  )
    trained_llm = LlamaCpp(model_path=trained_model_path,
                   verbose=False,
                   n_gpu_layers=25,
                   max_tokens=90,
                   temperature=0,
                   top_k=1
                  )   

    PROMPT = PromptTemplate( input_variables=["input"], 
                            template=_DEFAULT_TEMPLATE
                            )
    
    chain1 = PROMPT | base_llm | StrOutputParser()
    chain2 = PROMPT | trained_llm | StrOutputParser()
    if dual_screen:
        with out:
            display(widgets.HTML(Norm+"Ready to ask questions in the code window"))
   
    def show(text):
        # Send text to the output panel in dual-screen mode, otherwise print it inline.
        if dual_screen:
            with out:
                display(widgets.HTML(Norm+text))
        else:
            print(text)

    while True:
        question = input("Ask me a question (type 'exit' to end): ")
        if question.strip().lower() == 'exit':
            show("Exiting this Q&A session.")
            break
        show("You asked: " + question)
        # The model may continue the dialogue with an invented "Human:" turn,
        # so keep only the text before the first occurrence of 'Human'.
        answer1 = chain1.invoke(question).split('Human',1)[0]
        show("Base Model Answer: " + answer1)
        answer2 = chain2.invoke(question).split('Human',1)[0]
        show("Trained Model Answer: " + answer2)
if dual_screen:
    with out:
        out.clear_output()
        display(widgets.HTML(H1+"Step 3. Define a Function to Perform Inference on Base and Trained Models"))
        display(widgets.HTML(Norm+"Function defined"))
else:
    print("Function to perform inference on LLMs defined")
    
run.pause()

<a id="I2_4"></a>
# Step 4. Run Interactive Q&A Session with Base and Trained Models to Evaluate Performance

## 4.1 Sample Questions to Ask the LLMs

The following sample questions are derived from the source data that was used to generate the synthetic data on which the language model was trained.
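
The session in this step is interactive, but the same side-by-side comparison can be run over a list of questions. The sketch below shows one way to do that, under the assumption that each model is exposed as a callable that takes a question and returns an answer. The names `compare_answers`, `stub_base`, and `stub_trained` are hypothetical; the stubs stand in for real model calls so the example runs without a GPU.

```python
from typing import Callable, Iterable, List, Tuple


def compare_answers(
    questions: Iterable[str],
    ask_base: Callable[[str], str],
    ask_trained: Callable[[str], str],
) -> List[Tuple[str, str, str]]:
    """Ask both models every non-blank question and collect the answers."""
    rows = []
    for question in questions:
        question = question.strip()
        if not question:
            continue  # skip blank lines, e.g. from a questions file
        rows.append((question, ask_base(question), ask_trained(question)))
    return rows


# Stand-ins for the two models; real code would call the LLMs here.
def stub_base(question: str) -> str:
    return "I do not know."


def stub_trained(question: str) -> str:
    return f"Answer to: {question}"


sample = ["What is InstructLab?\n", "\n"]
for question, base_answer, trained_answer in compare_answers(
    sample, stub_base, stub_trained
):
    print("Question:", question)
    print("  Base model:   ", base_answer)
    print("  Trained model:", trained_answer)
```

Collecting the rows first, rather than printing inside the loop, makes it straightforward to save the comparison or review it later.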

In [None]:
#Display Sample Questions
with open(notebook_dir+'/data/' + use_case + '/questions.txt') as f:
    for line in f.readlines():
        display(widgets.HTML(Norm+line))

if dual_screen:
    with out:
        display(widgets.HTML(H1+"Step 4. Run Interactive Q&A Session"))
        display(widgets.HTML(Norm+"Processing may take several minutes on the first run..."))
        model_inference(base_model_path, trained_model_path)
else:
    print("Processing may take several minutes on the first run...")
    model_inference(base_model_path, trained_model_path)
run.resume()    

<a id="I2_conclusion"></a>
# Conclusion

This notebook demonstrated inferencing with models produced using InstructLab.

<a id="I2_learn"></a>
# Learn More

Proceed to the [Training with Red Hat AI InstructLab Service](./03_training_with_RH_AI_InstructLab_Service.ipynb) notebook to use the IBM Cloud-based Red Hat AI InstructLab service.

This notebook is based on the InstructLab CLI repository available [here](https://github.com/instructlab/instructlab).

InstructLab uses a novel synthetic data-based alignment tuning method for Large Language Models introduced in this [paper](https://arxiv.org/abs/2403.01081).

Contact us by email to ask questions, discuss potential use cases, or schedule a technical deep dive. The contact email is IBM.Research.JupyterLab@ibm.com.

© 2025 IBM Corporation