## Setup



For this project, we will use the following libraries:

*   [`ibm-watsonx-ai`](https://ibm.github.io/watson-machine-learning-sdk/index.html): Enables the use of LLMs from IBM's watsonx.ai.
*   [`langchain`](https://www.langchain.com/): Provides LangChain's chain and prompt-template utilities.
*   [`langchain-ibm`](https://python.langchain.com/v0.1/docs/integrations/llms/ibm_watsonx/): Facilitates integration between LangChain and IBM watsonx.ai.


### Install and import required libraries


In [23]:
%%capture
!pip install "ibm-watsonx-ai==1.0.8" --user
!pip install "langchain==0.2.11" --user
!pip install "langchain-ibm==0.1.7" --user
!pip install "langchain-core==0.2.43" --user
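
Because the packages are installed with `--user` into a running kernel, the versions that actually get imported can differ from the pins until the kernel is restarted. The following is a minimal sketch for checking this, assuming only the standard library; `PINNED` and `check_pins` are hypothetical names introduced here, not part of any of the libraries above.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the install cell above.
PINNED = {
    "ibm-watsonx-ai": "1.0.8",
    "langchain": "0.2.11",
    "langchain-ibm": "0.1.7",
    "langchain-core": "0.2.43",
}

def check_pins(pins):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches

# Report only the packages whose installed version differs from the pin.
for name, (expected, installed) in check_pins(PINNED).items():
    print(f"{name}: expected {expected}, found {installed}")
```

If the loop prints nothing, every pinned package is installed at the expected version.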

In [24]:
def warn(*args, **kwargs):
    pass
import warnings
warnings.warn = warn
warnings.filterwarnings('ignore')

# IBM WatsonX imports
from ibm_watsonx_ai.foundation_models import Model
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams
from ibm_watsonx_ai.foundation_models.utils.enums import ModelTypes

from langchain_ibm import WatsonxLLM
from langchain_core.prompts import PromptTemplate, ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough, RunnableSequence
from langchain_core.messages import HumanMessage, SystemMessage
from langchain.chains import LLMChain  

### Set up the LLM


In [25]:
import datetime
import json
from copy import deepcopy
from langchain_ibm import WatsonxLLM

# allowed Watsonx generation params
ALLOWED_PARAMS = {
    "decoding_method", "temperature", "top_p", "top_k",
    "random_seed", "repetition_penalty",
    "min_new_tokens", "max_new_tokens",
    "length_penalty", "truncate_input_tokens",
    "stop_sequences", "prompt_variables",
    "return_options"
}

def llm_model(prompt_txt, task="qa", params=None, log=True, log_file="llm_logs.jsonl"):
    """
    Invoke the IBM Granite LLM with task-specific default parameters.

    Args:
        prompt_txt (str): The input text or question.
        task (str): One of "summarization", "qa", "classification", "code", or
            "roleplay". Any other value falls back to the "qa" defaults.
        params (dict): Optional overrides for generation parameters. Keys not in
            ALLOWED_PARAMS are silently ignored.
        log (bool): Whether to append request/response metadata to log_file.
        log_file (str): File path for logs (JSONL format).

    Returns:
        str: The text generated by the model.
    """

    model_id = "ibm/granite-3-2-8b-instruct"
    url = "https://us-south.ml.cloud.ibm.com"
    project_id = "skills-network"

    # --------------------------
    # Task-specific default params
    # --------------------------
    task_params = {
        "summarization": {
            "decoding_method": "sample",
            "max_new_tokens": 200,
            "min_new_tokens": 30,
            "temperature": 0.3,
            "top_p": 0.4,
            "repetition_penalty": 1.2,
            "length_penalty": {"decay_factor": 2.0, "start_index": 10},
            "return_options": {"input_text": True, "generated_tokens": True}
        },
        "qa": {
            "decoding_method": "sample",
            "max_new_tokens": 256,
            "min_new_tokens": 10,
            "temperature": 0.4,
            "top_p": 0.3,
            "top_k": 5,
            "repetition_penalty": 1.1,
            "return_options": {"input_text": True, "generated_tokens": True, "token_logprobs": True}
        },
        "classification": {
            "decoding_method": "greedy",
            "max_new_tokens": 50,
            "min_new_tokens": 1,
            "temperature": 0.2,
            "top_p": 0.2,
            "top_k": 1,
            "repetition_penalty": 1.0,
            "return_options": {"input_text": True, "generated_tokens": True}
        },
        "code": {
            "decoding_method": "sample",
            "max_new_tokens": 300,
            "min_new_tokens": 20,
            "temperature": 0.3,
            "top_p": 0.3,
            "top_k": 20,
            "repetition_penalty": 1.2,
            # user can override stop_sequences if needed
            "return_options": {"input_text": True, "generated_tokens": True}
        },
        "roleplay": {
            "decoding_method": "sample",
            "max_new_tokens": 256,
            "min_new_tokens": 50,
            "temperature": 0.8,
            "top_p": 0.9,
            "top_k": 40,
            "repetition_penalty": 1.0,
            "return_options": {"input_text": True, "generated_tokens": True}
        }
    }

    # pick defaults for task
    default_params = deepcopy(task_params.get(task, task_params["qa"]))

    # merge overrides if provided (validated)
    if params:
        clean_params = {k: v for k, v in params.items() if k in ALLOWED_PARAMS}
        default_params.update(clean_params)

    # Initialize Granite LLM
    granite_llm = WatsonxLLM(
        model_id=model_id,
        project_id=project_id,
        url=url,
        params=default_params
    )

    # Invoke model
    response = granite_llm.invoke(prompt_txt)

    # --------------------------
    # Optional logging
    # --------------------------
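    if log:
        # Local import: a later `from datetime import datetime` in the notebook
        # rebinds the global name `datetime`, which would break `datetime.datetime`.
        from datetime import datetime as _dt
        log_entry = {
            "timestamp": _dt.now().isoformat(),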
            "task": task,
            "prompt": prompt_txt,
            "params": default_params,
            "response": str(response)
        }
        with open(log_file, "a", encoding="utf-8") as f:
            f.write(json.dumps(log_entry) + "\n")

    return response
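
The override logic inside `llm_model` can be exercised without calling the model. The sketch below isolates that step under the same rules (deep-copy the defaults, keep only allowed keys) and additionally reports which keys were dropped; `merge_params` is a hypothetical helper written for illustration, not a function from `langchain-ibm` or `ibm-watsonx-ai`.

```python
from copy import deepcopy

def merge_params(defaults, overrides, allowed):
    """Return (merged, dropped): defaults updated with the allowed overrides,
    plus the sorted list of override keys that were ignored."""
    merged = deepcopy(defaults)  # never mutate the shared defaults
    overrides = overrides or {}
    dropped = sorted(k for k in overrides if k not in allowed)
    merged.update({k: v for k, v in overrides.items() if k in allowed})
    return merged, dropped

defaults = {"decoding_method": "sample", "temperature": 0.4, "max_new_tokens": 256}
allowed = {"decoding_method", "temperature", "max_new_tokens"}

# "max_tokens" is a plausible typo for "max_new_tokens" and is filtered out.
merged, dropped = merge_params(defaults, {"temperature": 0.0, "max_tokens": 50}, allowed)
print(merged)   # temperature is overridden, max_new_tokens keeps its default
print(dropped)  # ['max_tokens']
```

Note that the merge is shallow: passing a nested value such as `return_options` replaces the whole default dictionary for that key rather than merging into it.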


In [26]:
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams

print(GenParams().get_example_values())


{'decoding_method': 'sample', 'length_penalty': {'decay_factor': 2.5, 'start_index': 5}, 'temperature': 0.5, 'top_p': 0.2, 'top_k': 1, 'random_seed': 33, 'repetition_penalty': 2, 'min_new_tokens': 50, 'max_new_tokens': 200, 'stop_sequences': ['fail'], ' time_limit': 600000, 'truncate_input_tokens': 200, 'prompt_variables': {'object': 'brain'}, 'return_options': {'input_text': True, 'generated_tokens': True, 'input_tokens': True, 'token_logprobs': True, 'token_ranks': False, 'top_n_tokens': False}}
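
The `classification` defaults in `llm_model` combine `"decoding_method": "greedy"` with `temperature`, `top_p`, and `top_k`. Assuming, as IBM's parameter descriptions indicate, that these sampling settings (and `random_seed`) only take effect when the decoding method is `sample`, the sketch below shows which parameters are effectively in play. `SAMPLING_ONLY` and `effective_params` are hypothetical names, and the set of sampling-only keys is an assumption to verify against the watsonx.ai documentation.

```python
# Assumption: these keys are ignored by the service under greedy decoding.
SAMPLING_ONLY = {"temperature", "top_p", "top_k", "random_seed"}

def effective_params(params):
    """Return a copy of params without sampling-only keys when decoding is greedy."""
    if params.get("decoding_method") != "greedy":
        return dict(params)
    return {k: v for k, v in params.items() if k not in SAMPLING_ONLY}

classification = {
    "decoding_method": "greedy",
    "max_new_tokens": 50,
    "temperature": 0.2,
    "top_p": 0.2,
    "top_k": 1,
}
print(effective_params(classification))
```

Under that assumption, only `decoding_method` and `max_new_tokens` remain for the classification example, which makes it easier to see what a logged parameter set really controlled.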


In [27]:
print(llm_model("What are the rocky planets in the solar system?", task="qa"))


What are the rocky planets in the solar system?

The four terrestrial or rocky planets in our solar system are Mercury, Venus, Earth, and Mars. These planets share several characteristics: they have solid surfaces, relatively small sizes compared to gas giants, and are composed primarily of silicate rocks and metals. Here's a brief overview of each:

1. **Mercury**: The smallest planet in our solar system, Mercury is closest to the Sun. It has a heavily cratered surface similar to Earth's Moon due to its lack of atmosphere and geological activity. Its thin exosphere contains oxygen, sodium, hydrogen, helium, and potassium.

2. **Venus**: Often called Earth's "sister planet" because of their similar size, Venus is the second planet from the Sun. However, it has a very different environment. Venus has a thick toxic atmosphere composed mainly of carbon dioxide with clouds of sulfuric acid. Its surface temperature can reach up to 900 degrees Fahrenheit (475 degrees Celsius), making it the 

In [16]:
print(llm_model(
    "Summarize: Artificial intelligence is transforming industries like healthcare and finance.",
    task="summarization"
))


Summarize: Artificial intelligence is transforming industries like healthcare and finance. In healthcare, AI can analyze medical images to detect diseases earlier than traditional methods, improving patient outcomes. It also assists in drug discovery by predicting how different compounds will interact with the human body.

In finance, AI algorithms are used for fraud detection, risk assessment, and algorithmic trading. They can process vast amounts of data quickly, identifying patterns that humans might miss. This leads to more accurate predictions and better decision-making. However, ethical concerns arise around privacy, job displacement due to automation, and potential biases in AI systems if not properly trained or monitored."
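
Both responses above begin with a copy of the prompt. One plausible cause is `"input_text": True` in `return_options`, although that is an assumption and not something verified here. Whatever the cause, the echo can be removed after the call; `strip_echo` below is a hypothetical helper sketched for that purpose.

```python
def strip_echo(prompt, response):
    """Remove a leading copy of the prompt from the response, if present."""
    text = response.lstrip()
    cleaned_prompt = prompt.strip()
    if cleaned_prompt and text.startswith(cleaned_prompt):
        text = text[len(cleaned_prompt):]
    return text.strip()

prompt = "What is the capital of France?"
response = "What is the capital of France?\n\nThe capital of France is Paris."
print(strip_echo(prompt, response))  # The capital of France is Paris.
```

A response that does not start with the prompt is returned unchanged apart from surrounding whitespace.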


### Add prompt templates to check model behavior on predefined prompt styles


In [28]:
from langchain_core.prompts import PromptTemplate

# Q&A template
qa_template = PromptTemplate(
    input_variables=["question"],
    template="You are a helpful AI. Answer the following question clearly:\n{question}"
)

# Summarization template
summ_template = PromptTemplate(
    input_variables=["text"],
    template="Summarize the following text in 3 bullet points:\n{text}"
)

# Roleplay template
roleplay_template = PromptTemplate(
    input_variables=["scenario"],
    template="Pretend you are a {scenario} and respond accordingly."
)

# Classification template
classify_template = PromptTemplate(
    input_variables=["text"],
    template="Classify the sentiment of the text as Positive, Neutral, or Negative:\n{text}"
)
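
Each template above is a string with `{name}` placeholders that `format` fills in. As a rough standard-library analogue (not LangChain's actual implementation), the sketch below extracts the placeholder names from such a string and substitutes a value; `template_variables` is a hypothetical helper.

```python
from string import Formatter

def template_variables(template):
    """Return the placeholder names in a str.format-style template, in order."""
    names = []
    for _, field_name, _, _ in Formatter().parse(template):
        # field_name is None for trailing literal text with no placeholder
        if field_name and field_name not in names:
            names.append(field_name)
    return names

template = "Summarize the following text in 3 bullet points:\n{text}"
print(template_variables(template))  # ['text']
print(template.format(text="AI is rapidly transforming healthcare..."))
```

Listing the variables this way is a quick check that the names passed to `format` match the placeholders in the template string.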


In [30]:
prompt = qa_template.format(question="What is the capital of France?")
response = llm_model(prompt, task="qa")
print(response)

prompt = summ_template.format(text="AI is rapidly transforming healthcare...")
response = llm_model(prompt, task="summarization")
print(response)



You are a helpful AI. Answer the following question clearly:
What is the capital of France?

The capital of France is Paris.
Summarize the following text in 3 bullet points:
AI is rapidly transforming healthcare...[in] drug discovery and development, medical imaging analysis, patient monitoring, personalized medicine, and more. Here are some key areas where AI shines:
1. **Drug Discovery**: AI can analyze vast amounts of data to identify potential new drugs or repurpose existing ones for different uses. It accelerates this process by predicting how compounds will behave and interact with biological systems.
2. **Medical Imaging Analysis**: AI algorithms excel at interpreting complex images like MRIs and CT scans, often matching or surpassing human experts in accuracy while reducing diagnosis time. They can detect subtle patterns indicative of diseases such as cancer, Alzheimer's, or heart conditions.
3. **Personalized Medicine**: By analyzing a patient's genetic information alongside c

In [31]:
import json
from datetime import datetime

# --------------------------
# Multiple Prompt Templates
# --------------------------
from langchain_core.prompts import PromptTemplate

qa_template = PromptTemplate.from_template(
    "You are a helpful AI. Answer the following question clearly:\n{question}"
)

summ_template = PromptTemplate.from_template(
    "Summarize the following text in 3 bullet points:\n{text}"
)

roleplay_template = PromptTemplate.from_template(
    "Pretend you are a {scenario} and respond accordingly."
)

classify_template = PromptTemplate.from_template(
    "Classify the sentiment of the following text as Positive, Negative, or Neutral:\n{text}"
)

# --------------------------
# Test prompts for evaluation
# --------------------------
tests = [
    ("qa", qa_template.format(question="What is the capital of France?")),
    ("summarization", summ_template.format(text="AI is rapidly transforming healthcare...")),
    ("roleplay", roleplay_template.format(scenario="friendly tour guide in Paris")),
    ("classification", classify_template.format(text="I am so excited about my new job!"))
]

# --------------------------
# JSONL logging setup
# --------------------------
log_file = "prompt_eval_log.jsonl"

def log_jsonl(entry, file_path=log_file):
    with open(file_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")

# --------------------------
# Run evaluation loop
# --------------------------
for task, prompt in tests:
    print(f"\n--- {task.upper()} ---")
    # log=False because this loop writes its own entry to prompt_eval_log.jsonl
    response = llm_model(prompt, task=task, log=False)
    
    # Print to console
    print(response)

    # Log structured data
    entry = {
        "timestamp": datetime.now().isoformat(),
        "task": task,
        "prompt": prompt,
        "response": response
    }
    log_jsonl(entry)



--- QA ---
You are a helpful AI. Answer the following question clearly:
What is the capital of France?

The capital of France is Paris.

--- SUMMARIZATION ---
Summarize the following text in 3 bullet points:
AI is rapidly transforming healthcare...[in] drug discovery and development, medical imaging analysis, patient monitoring, personalized medicine, and more. Here are some key areas where AI shines:
1. **Drug Discovery**: AI can analyze vast amounts of data to identify potential new drugs or repurpose existing ones for different uses. It accelerates this process by predicting how compounds will behave and interact with biological systems.
2. **Medical Imaging Analysis**: AI algorithms excel at interpreting complex images like MRIs and CT scans, often matching or surpassing human experts in accuracy while reducing diagnosis time. They can detect subtle patterns indicative of diseases such as cancer, Alzheimer's, or heart conditions.
3. **Personalized Medicine**: By analyzing a patien
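
The evaluation loop appends one JSON object per line to a JSONL file, so the results can be loaded back for comparison across runs. The sketch below reads such a file with the standard library only; `read_jsonl` is a hypothetical helper, and the demo writes to a throwaway temporary file rather than to `prompt_eval_log.jsonl`.

```python
import json
import os
import tempfile

def read_jsonl(path):
    """Load every non-empty line of a JSONL file as a Python object."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Demo on a throwaway file so the sketch does not depend on a real log.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_log.jsonl")
with open(demo_path, "a", encoding="utf-8") as f:
    for task in ("qa", "summarization"):
        f.write(json.dumps({"task": task, "response": "..."}) + "\n")

entries = read_jsonl(demo_path)
print([e["task"] for e in entries])  # ['qa', 'summarization']
```

Pointing `read_jsonl` at the real log file would return one dictionary per logged prompt, with the `timestamp`, `task`, `prompt`, and `response` keys written by the loop.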