# Prompt engineering with LangChain - an NER example

*Written by Jaya Chaturvedi and Angus Roberts, May 2024*

---

### Generative models
In previous practicals, we have used the BERT transformer model. To recap, this model is:

* the encoder layer of an encoder-decoder
* trained on a masked word prediction task, and
* trained on a next sentence prediction task
* trained on 3.3 billion words
* 110 million parameters

We are now going to look at generative large language models, such as the GPT models. Such models are:

* the decoder layer of an encoder-decoder
* trained on a text generation task (predict the next word), and
* conditioned on other tasks, such as following instructions
* trained on terabytes or petabytes of text
* Billions of parameters

This practical has been written for use with an 8 billion parameter version of Llama 3, so it is more than 70 times larger than BERT. It has been instruction tuned and optimized for dialogue use cases. There is also a 70 billion parameter version available.

### Prompting
These models generate text. You can play with them, asking them to write songs, poems and so on, by typing text at a **prompt**; the model completes the text you have started by generating the next N words. What you get depends partly on the model, partly on the model parameters you set, and partly on the way in which you ask the question - i.e. your **prompt engineering**.

We will look at how we might design prompts to generate output in a consistent format for a task, specifically NER.

### Hosting the model - Hugging Face

These models are large! You could put them on your laptop, but (a) they will take time to download and (b) they might dwarf its memory. So, we will use a remotely hosted model, on Hugging Face. Hugging Face has several publicly available models. For Llama 3, we are using a paid-for model hosting service. This gives us more control over availability, scaling etc.

### Interacting with the model - LangChain

All models have different ways of interacting with them, whether via a prompt or an API. We will be using a widely-used Python library that hides these differences behind a common API, LangChain. LangChain will work with many different models, both remotely hosted and locally hosted. So you should be able to re-purpose this code for other models and situations.

LangChain is big - we will only look at a few of LangChain's features, and some basic prompting strategies.

There are often other ways, but we hope that this practical will show you the basics.

### Further information and resources

* We would like to recommend Chapter 12 in the 3rd edition of Jurafsky and Martin's "Speech and Language Processing" - but it hasn't yet been written!



---

## Imports

In [None]:
# LangChain needs to have the Hugging Face transformers package installed,
# and we need several LangChain packages
%pip install --upgrade --quiet transformers
%pip install --upgrade --quiet langchain langchain_community
%pip install --upgrade --quiet langchain-huggingface

In [None]:
# The most important imports are here, though we will import
# a small number of other packages later, for specific pieces of code
import langchain_community
from langchain_huggingface.llms import HuggingFaceEndpoint
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

## Using Hugging Face endpoints

Adapted from https://python.langchain.com/docs/integrations/llms/huggingface_pipelines


From https://python.langchain.com/docs/integrations/llms/huggingface_endpoint

Hugging Face serverless inference is described here:

https://huggingface.co/docs/api-inference/index

This explains how to get an API token from your account, and how to construct the model URL. It also gives code for using a model, but we will instead use LangChain to interface to the Hugging Face API.

You can set up your own dedicated endpoint to which you deploy a model, giving better availability than the public endpoints. There is a cost:

https://huggingface.co/inference-endpoints/dedicated

The UI for starting, stopping and configuring endpoints is here:

https://ui.endpoints.huggingface.co/angusroberts/endpoints


* Protected endpoint: seems to need your own token
* Public endpoint: still needs a token, but it can be any token




In [None]:

# You need to get your API token from huggingface, run this cell and paste
# it in to the resulting prompt, for use in later sections
# How to get a token is described here:
# https://huggingface.co/docs/api-inference/quicktour#get-your-api-token

# getpass provides an obscured password prompt.
# os gives access to operating system functionality, which
# we need to set an environment-wide variable to hold our token
from getpass import getpass
import os

HUGGINGFACEHUB_API_TOKEN = getpass()

# We put the token in an environment variable, from where LangChain will access it when needed
os.environ["HUGGINGFACEHUB_API_TOKEN"] = HUGGINGFACEHUB_API_TOKEN

## Querying a Hugging Face LLM server directly

In [None]:
# This example uses the Hugging Face service API directly
# to access a freely available GPT2 model hosted
# on Hugging Face

# requests is a package to send requests to web servers
import requests

# the Hugging Face API server for GPT2
API_URL = "https://api-inference.huggingface.co/models/gpt2"

# Headers for our request - the token
headers = {"Authorization": f"Bearer {HUGGINGFACEHUB_API_TOKEN}"}

# function to post the request and return the response
# takes a json query as the payload
def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

# Send a query and print the results
data = query({"inputs": "Can you please let us know more details about your "})
print(data)
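
The value returned by `query` is the server's JSON reply converted to Python objects. The sketch below assumes the shape commonly returned for text generation models, a list of dictionaries with a `generated_text` key, or a dictionary with an `error` key when the request fails. Both shapes are assumptions, `extract_generated_text` is a hypothetical helper, and the payloads are made up so that the code runs without a network connection.

```python
def extract_generated_text(data):
    """Return the generated strings from a text-generation response.

    Assumes the response is either a list of {"generated_text": ...} dicts,
    or a dict with an "error" key when the request failed.
    """
    if isinstance(data, dict) and "error" in data:
        raise RuntimeError(f"Inference API error: {data['error']}")
    return [item["generated_text"] for item in data]


# Made-up payloads in the assumed shapes, so the sketch runs offline
ok = [{"generated_text": "Can you please let us know more details about your order?"}]
print(extract_generated_text(ok))

failed = {"error": "Model is currently loading"}
try:
    extract_generated_text(failed)
except RuntimeError as err:
    print(err)
```

Checking for the error case first gives a readable message instead of a `KeyError` or `TypeError` further down the notebook.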

## A simple LangChain LLM, using a Hugging Face free endpoint

Next we will see how to create a LangChain LLM that wraps a Hugging Face endpoint

What is the advantage of this approach, over querying directly as in the above example?

In [None]:
# You can also use these free model endpoints via LangChain, an API that wraps up
# many different ways of accessing models in a common interface.
# Here we use it to access a public Hugging Face endpoint which has limited models,
# and no guarantee on performance or availability

repo_id = "openai-community/gpt2"
llm = HuggingFaceEndpoint(
   endpoint_url="https://api-inference.huggingface.co/models/" + repo_id,
   task="text-generation",
   temperature = 0.1,
   model_kwargs={"max_length": 128}
)
llm.invoke("When I went to Paris, I ")


## Using a paid-for endpoint

We can use exactly the same code for a paid-for endpoint, or for a local model. Here, we connect to a paid-for endpoint, which we will use for the rest of the practical.

What are the pros and cons of using these different methods of model delivery?

In [None]:
# Using the paid for model endpoint, which can host a wider range of models
# This is the url of a paid for endpoint - replace with whichever you are using
# You need to enter the URL provided for the practical!
endpoint_url = "PUT THE PROVIDED ENDPOINT URL HERE"



In [None]:
llm = HuggingFaceEndpoint(
    endpoint_url=endpoint_url,
    max_new_tokens=256,
    top_k=10,
    top_p=0.95,
    typical_p=0.95,
    temperature=0.01,
    repetition_penalty=1.03,
)
llm.invoke("My favourite joke is ")

## Writing prompts

A prompt template lets us reuse the same prompt text with different inputs.

In [None]:
# Creating a prompt / LLM chain and running it

template = """Question: {question}

Answer: """

prompt = PromptTemplate.from_template(template)
chain = prompt | llm





In [None]:
question = "Why is the sun so hot?"
print(chain.invoke({"question": question}))
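
To see what text is actually sent to the model, it can help to fill in the template by hand. This sketch uses plain `str.format` as a stand-in for `PromptTemplate`, on the assumption that the default template format fills `{question}` in the same way; it does not call LangChain or a model.

```python
# Same template text as in the cell above
template = """Question: {question}

Answer: """

questions = ["Why is the sun so hot?", "What is 2+2 ?"]

# str.format stands in for PromptTemplate here, so this runs without a model
filled_prompts = [template.format(question=q) for q in questions]

for p in filled_prompts:
    # repr() makes the line breaks and trailing space visible
    print(repr(p))
```

The model sees only this filled-in string, so small details such as the trailing space after "Answer:" are part of the prompt.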

Try a few other questions:

* What is the French for cheese?
* What is 2+2 ?
* How many dollars to the euro?
* Why is the sun so hot?

What do you notice about the answers? Could we use this as a translation app, calculator, or currency converter?

How does varying the parameters vary the response? What does each of the parameters do?

What about varying the prompt? Can you get it to format your answers in different ways?
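
Two of the parameters set on the endpoint, `temperature` and `top_k`, act on the probabilities from which the next token is sampled. The sketch below shows the standard softmax-with-temperature calculation on made-up scores; `next_token_probs` and `toy_logits` are hypothetical, and this is an illustration of the idea rather than the hosting server's implementation.

```python
import math

def next_token_probs(logits, temperature=1.0, top_k=None):
    """Softmax over logits after temperature scaling and optional top-k filtering."""
    items = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)
    if top_k is not None:
        items = items[:top_k]          # keep only the k highest-scoring tokens
    scaled = [score / temperature for _, score in items]
    biggest = max(scaled)              # subtract the max for numerical stability
    exps = [math.exp(s - biggest) for s in scaled]
    total = sum(exps)
    return {tok: e / total for (tok, _), e in zip(items, exps)}

# Hypothetical scores for the word following "When I went to Paris, I "
toy_logits = {"saw": 2.0, "ate": 1.5, "visited": 1.0, "slept": -1.0}

for t in (1.0, 0.1):
    probs = next_token_probs(toy_logits, temperature=t, top_k=3)
    print(t, {tok: round(p, 3) for tok, p in probs.items()})
```

A low temperature concentrates almost all of the probability on the highest-scoring token, which is why the endpoint settings used in this practical give fairly repeatable output.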

## A simple few shot example

Building on the above, we can put a few worked examples in the prompt itself.

In [None]:
# Simple few shot learning

template = """Translate English to French:

        sea otter => loutre de mer
        peppermint => menthe poivrée
        plush giraffe => girafe peluche
        {word} =>"""

prompt = PromptTemplate.from_template(template)
chain = prompt | llm

english_word = "cheese"
print(chain.invoke({"word": english_word}))


Can you get it to give us just one answer?
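
Besides changing the prompt, one option is to trim the completion in code. The model may carry on the pattern and invent further "word => translation" lines, so this sketch keeps only the first non-empty line. `first_line` is a hypothetical helper and the sample completion is made up.

```python
def first_line(completion):
    """Return the first non-empty line of a completion, stripped of whitespace."""
    for line in completion.splitlines():
        if line.strip():
            return line.strip()
    return ""

# Made-up completion of the kind a model might return for "cheese =>"
sample = " fromage\n        bread => pain\n        apple => pomme"
print(first_line(sample))
```

Many text generation servers also accept stop sequences, which end generation on the server side; check the documentation of the endpoint you are using before relying on that.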

## Medication Extraction using zero-shot and few-shot learning

In [None]:
# Example document
clinical_document = """
The patient was initially prescribed Metoprolol 50 mg twice daily for hypertension.
During the first follow-up visit, the dosage of Metoprolol was increased to 100 mg twice daily.
Later, the patient developed side effects and Metoprolol was switched to Atenolol 50 mg once daily.
In the subsequent visit, Amlodipine 5 mg was added to the treatment plan.
At the final follow-up, the Atenolol dosage was increased to 100 mg once daily, and Amlodipine was continued.
"""

In [None]:
# simple prompt
def get_meds_response(text):

    template = """
    Extract all medications and their doses from the following clinical document:

    {text}
    Answer:"""

    prompt = PromptTemplate.from_template(template)
    chain = prompt | llm
    response = chain.invoke({"text": text})
    return response

In [None]:
extraction_response = get_meds_response(clinical_document)
print("Response:\n", extraction_response)

## Llama 3 prompt format

We can improve our interaction by using a pre-specified prompt format with which Llama has been trained. This constrains the model's output and gives much cleaner results.

In [None]:
# using Llama 3 prompt format
def get_llama3_meds_response(text):

  # Create a template using the Llama 3 prompting format
  template = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system_prompt}<|eot_id|>
<|start_header_id|>user<|end_header_id|>

{user_prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>"""

  system_prompt = """You are an AI assistant that extracts medications and their doses from health record text.
  When you are given a piece of text, you will list all of the medications that are in the text."""

  prompt = PromptTemplate(
      input_variables=["system_prompt", "user_prompt"],
      template=template
  )

  # Invoke the model once, through a prompt | LLM chain
  chain = prompt | llm
  response = chain.invoke({"system_prompt":system_prompt, "user_prompt":text})
  return response

In [None]:
extraction_response = get_llama3_meds_response(clinical_document)
print("Response:\n", extraction_response)
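
The special tokens in the template mark where each message starts and ends, and which role it belongs to. As a rough sketch, the same layout can be assembled from a list of (role, content) pairs, which may be convenient when you want to add more turns. `build_llama3_prompt` is a hypothetical helper; the special tokens are those used in the template above, while the exact line breaks between turns are an assumption of this sketch and differ slightly from that template.

```python
def build_llama3_prompt(messages):
    """Assemble a Llama 3 style prompt from (role, content) pairs.

    The prompt ends with an open assistant header, so the model's
    completion becomes the assistant's reply.
    """
    parts = ["<|begin_of_text|>"]
    for role, content in messages:
        parts.append(f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>")
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

prompt_text = build_llama3_prompt([
    ("system", "You list the medications mentioned in health record text."),
    ("user", "Amlodipine 5 mg was added to the treatment plan."),
])
print(prompt_text)
```

The resulting string could be passed to the model in place of a formatted template, but check it against the model card for the version of Llama you are using.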

Try changing your prompt in different ways, to get:

* A list with one item for each time a medication is mentioned, instead of one item for each unique medication
* A list with both medications and doses?
* A response without any leading or trailing commentary?
* Try to get Llama to structure the output, so that medications and doses are marked in different ways (e.g. put brackets around the doses)
* Can you get the output as JSON?
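
If you do get JSON back, it can be loaded with Python's `json` module. Models often wrap the JSON in commentary, so this sketch takes the text between the first `[` and the last `]` before parsing. `parse_json_list` is a hypothetical helper, and the sample response and its field names are made up.

```python
import json

def parse_json_list(response):
    """Parse the JSON array embedded in a model response.

    Takes the text from the first '[' to the last ']' and hands it to
    json.loads; raises ValueError if no array is found or it is malformed.
    """
    start = response.find("[")
    end = response.rfind("]")
    if start == -1 or end < start:
        raise ValueError("no JSON array found in response")
    return json.loads(response[start:end + 1])

# Made-up response with leading and trailing commentary
sample = """Here are the medications:
[{"medication": "Metoprolol", "dose": "50 mg"},
 {"medication": "Atenolol", "dose": "50 mg"}]
Let me know if you need anything else."""

meds = parse_json_list(sample)
print(meds[0]["medication"], meds[0]["dose"])
```

This simple slicing would fail if the commentary itself contained square brackets, which is one reason to ask the model for a response with no commentary at all.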

## Few shot prompting with output examples

In [None]:
# Few-shot examples for medication extraction
med_examples = [
    {
        "text": "The patient was prescribed Metformin 500 mg twice daily for diabetes. After two months, the dose was increased to 1000 mg twice daily.",
        "extracted_medications": "[Metformin 500 mg twice daily], [Metformin 1000 mg twice daily]"
    },
    {
        "text": "Initially, Atorvastatin 20 mg was prescribed. During the first follow-up, the dosage was increased to 40 mg. Later, Ezetimibe 10 mg was added.",
        "extracted_medications": "[Atorvastatin 20 mg], [Atorvastatin 40 mg], [Ezetimibe 10 mg]"
    }
]

In [None]:
# few-shot
def few_shot_medication_extraction(text, examples):
    # Build the few-shot block from the examples passed in to the function
    example_prompts = "\n\n".join([f"Text: {ex['text']}\nExtracted Medications: {ex['extracted_medications']}" for ex in examples])
    prompt = f"Use the following examples to guide you on how to extract medications and their doses from the given clinical document.\n\nExamples:\n{example_prompts}\n\nClinical document: {text}\nExtracted Medications:"
    response = llm.invoke(prompt)
    return response

In [None]:
# Few-shot medication extraction
few_shot_extraction_result = few_shot_medication_extraction(clinical_document, med_examples)
print("Response:\n", few_shot_extraction_result)

## Few shot prompting with Llama's prompt format

In [None]:
# using Llama 3 prompt format
def get_llama3_few_shot_meds_response(text):

  # Create a template using the Llama 3 prompting format
  template = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system_prompt}<|eot_id|>
<|start_header_id|>user<|end_header_id|>

{user_prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>"""

  system_prompt = """You are an AI assistant that extracts medications and their doses from health record text.
  When you are given a piece of text, you will list all of the medications that are in the text.
  Here are some examples, showing how you should format the list:\n\n"""

  example_prompts = "\n\n".join([f"Text: {ex['text']}\nExtracted Medications: {ex['extracted_medications']}" for ex in med_examples])

  system_prompt = system_prompt + example_prompts


  prompt = PromptTemplate(
      input_variables=["system_prompt", "user_prompt"],
      template=template
  )

  # Invoke the model once, through a prompt | LLM chain
  chain = prompt | llm
  response = chain.invoke({"system_prompt":system_prompt, "user_prompt":text})
  return response

In [None]:
# Few-shot medication extraction
few_shot_extraction_result = get_llama3_few_shot_meds_response(clinical_document)
print("Response:\n", few_shot_extraction_result)

Can you elaborate the prompt further, to separate out the drug, dose and frequency into separate features in the output that might be easily parsed into, e.g., a dictionary?
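
To judge whether an output format is easy to parse, it can help to write the parser first. The sketch below assumes a hypothetical line format, `drug=...; dose=...; frequency=...`, which is just one possibility you could ask the model for; `parse_medication_line` and the sample output are made up for illustration.

```python
def parse_medication_line(line):
    """Turn 'drug=...; dose=...; frequency=...' into a dictionary."""
    record = {}
    for field in line.split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            record[key.strip()] = value.strip()
    return record

# Made-up model output in the hypothetical format, one medication per line
sample = """drug=Metoprolol; dose=50 mg; frequency=twice daily
drug=Amlodipine; dose=5 mg; frequency=not stated"""

records = [parse_medication_line(l) for l in sample.splitlines() if l.strip()]
print(records)
```

If the parser turns out to be awkward to write, that is a hint that the format requested in the prompt could be simpler.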

## Identifying and annotating symptoms

Write and test prompts to mark mentions of symptoms in an input text

In [None]:
# An input document to annotate
clinical_document = """
The patient was admitted with severe headache, nausea, and dizziness.
They also reported experiencing fatigue and joint pain.
Additionally, there was mention of occasional shortness of breath and a persistent cough.
"""

In [None]:
# Some examples of how we would like the output to be formatted
examples = [
    {
        "text": "The patient was given acetaminophen and complained of headaches and fatigue.",
        "annotated": "The patient was given acetaminophen and complained of [headaches] and [fatigue]."
    },
    {
        "text": "She was treated with amoxicillin and experienced a rash and itching.",
        "annotated": "She was treated with amoxicillin and experienced a [rash] and [itching]."
    }
]
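
When testing your prompts, one simple check is that the model has only added brackets and has not reworded the text. A minimal sketch of that check follows; `is_faithful_annotation` is a hypothetical helper, and it assumes the input text contains no square brackets of its own.

```python
def is_faithful_annotation(text, annotated):
    """True if removing the square brackets from `annotated` gives back `text`."""
    return annotated.replace("[", "").replace("]", "") == text

text = "She was treated with amoxicillin and experienced a rash and itching."
good = "She was treated with amoxicillin and experienced a [rash] and [itching]."
bad = "She was treated with amoxicillin and had a [rash] and [itching]."  # reworded

print(is_faithful_annotation(text, good))
print(is_faithful_annotation(text, bad))
```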

## Extending
Can you get your symptom annotation example to tell you the start and finish character offsets of the symptoms, like in these examples?



In [None]:
examples = [
    {
        "text": "The patient was given acetaminophen and complained of headaches and fatigue.",
        "annotations": "[symptom: headaches, start_offset: 54, end_offset: 63], [symptom: fatigue, start_offset: 68, end_offset: 75]"

    },
    {
        "text": "She was treated with amoxicillin and experienced a rash and itching.",
        "annotations": "[symptom: rash, start_offset: 51, end_offset: 55], [symptom: itching, start_offset: 60, end_offset: 67]"
    }
]
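
Language models may be unreliable at counting characters, so it is worth having a way to check any offsets they return. One approach, sketched here, is to compute the offsets in code from the bracketed annotation format used earlier. `bracketed_to_offsets` is a hypothetical helper; it assumes annotations are not nested, and that end offsets are exclusive, as in the examples above.

```python
import re

def bracketed_to_offsets(annotated):
    """Convert '[symptom]' markup into (symptom, start, end) tuples.

    Offsets index into the text with the brackets removed; end is exclusive.
    """
    spans = []
    removed = 0  # number of bracket characters dropped so far
    for match in re.finditer(r"\[([^\[\]]+)\]", annotated):
        start = match.start() - removed
        end = start + len(match.group(1))
        spans.append((match.group(1), start, end))
        removed += 2  # one '[' and one ']' per annotation
    return spans

annotated = "The patient was given acetaminophen and complained of [headaches] and [fatigue]."
print(bracketed_to_offsets(annotated))
```

Comparing these computed offsets with the ones the model reports gives a quick measure of how far the model's own counting can be trusted.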

## Next steps
This notebook has shown you some of the basics of prompt engineering. Recent models and APIs are much more sophisticated than the examples we have given, and often provide facilities for more detailed interaction:

* output parsing
* tool and function calls
* direct support for e.g. JSON

## The assignment

Can you develop prompts to provide an answer for the assignment?

You will probably have to use a free online LLM, such as ChatGPT, or a free Hugging Face or Perplexity online model. This means you are unlikely to be able to evaluate against a large dataset. But, you might be able to show some results with minimal training, and you might be able to do a very small evaluation - a limitation. What other limitations might there be? Will the test data be blind?
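
For a very small evaluation, precision, recall and F1 can be computed by hand or with a few lines of code. This is a minimal sketch using exact string matching over sets, which is only one of several ways to score extraction output; `precision_recall_f1` is a hypothetical helper and the gold and predicted lists are made up.

```python
def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall and F1 between two collections of strings."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

# Made-up example: gold annotations vs. what a model returned
gold = ["headache", "nausea", "dizziness", "fatigue"]
predicted = ["headache", "nausea", "cough"]

p, r, f1 = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Exact matching is strict: "headaches" and "headache" would count as different, so you may want to state how you normalised the strings when reporting results.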