# Prompt engineering with LangChain - an NER example

*Written by Jaya Chaturvedi and Angus Roberts, May 2024*

---

### Generative models
In previous practicals, we have used the BERT transformer model. To recap, this model is:

* the encoder part of a transformer encoder-decoder
* trained on a masked word prediction task
* trained on a next sentence prediction task
* trained on 3.3 billion words
* 110 million parameters (for BERT-base)

We are now going to look at generative large language models, such as the GPT models. Such models are:

* the decoder part of a transformer encoder-decoder
* trained on a text generation task (predict the next word)
* conditioned on other tasks, such as following instructions
* trained on terabytes or petabytes of text
* billions of parameters

This practical has been written for use with an 8 billion parameter version of Llama 3, so more than 70 times larger than BERT. It has been instruction tuned and optimized for dialogue use cases. There is also a 70 billion parameter version available.

### Prompting
These models generate text. You can play with them, asking them to write songs, poems and so on, by typing text at a **prompt**; the model completes the text you have started by generating the next N words. What you get depends partly on the model, partly on the model parameters you set, and partly on the way in which you ask the question, i.e. your **prompt engineering**.

We will look at how we might design prompts that generate output in a consistent format for a specific task: NER.

### Hosting the model - Hugging Face

These models are large! You could run them on your laptop, but (a) they will take time to download and (b) they might exceed its memory. So, we will use a remotely hosted model, on Hugging Face. Hugging Face has several publicly available models. For Llama 3, we are using a paid-for model hosting service, which gives us more control over availability, scaling, etc.

### Interacting with the model - LangChain

Different models have different ways of interacting with them, whether via a prompt or an API. We will be using LangChain, a widely-used Python library that hides these differences behind a common API. LangChain works with many different models, both remotely hosted and locally hosted, so you should be able to re-purpose this code for other models and situations.

LangChain is big - we will only look at a few of LangChain's features, and some basic prompting strategies.

There are often other ways of doing things, but we hope that this practical will show you the basics.

### Further information and resources

We would like to recommend Chapter 12 in the 3rd edition of Jurafsky and Martin's "Speech and Language Processing" - but it hasn't yet been written! Try these instead:

* A
* B



---

## Imports

In [None]:
# LangChain needs to have the Hugging Face transformers package installed,
# and we need several LangChain packages
%pip install --upgrade --quiet transformers
%pip install --upgrade --quiet langchain langchain_community
%pip install --upgrade --quiet langchain-huggingface

In [None]:
# The most important imports are here, though we will import
# a small number of other packages later, for specific pieces of code
import langchain_community
from langchain_huggingface.llms import HuggingFaceEndpoint
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

## Using Hugging Face endpoints - getting an access token

We will use a model hosted at a Hugging Face endpoint. To use this, you will need to have a Hugging Face account, and to generate a Hugging Face access token. This can all be done from the [Hugging Face website](https://huggingface.co/) and is fairly self-explanatory, but for details see the previous practical and this link:

[Hugging Face security tokens](https://huggingface.co/docs/hub/en/security-tokens)

Once you have a token, run the code cell below and, when asked, paste your token into the prompt. It will remain hidden.

<details>
Details on setting up endpoints

(You don't need to know this. These are some links on the different types of Hugging Face endpoint and how to set them up, useful when creating new endpoints for this practical.)


* [LangChain integration with Hugging Face endpoints](https://python.langchain.com/docs/integrations/llms/huggingface_endpoint)
* [Hugging Face serverless inference](https://huggingface.co/docs/api-inference/index)
* [Setting up Hugging Face dedicated endpoints](https://huggingface.co/inference-endpoints/dedicated)
* The UI for starting and stopping and configuring endpoints is here: `https://ui.endpoints.huggingface.co/[YOUR USER NAME]/endpoints`
* Types of endpoint:
  * Protected - needs the owner's token
  * Public - needs any user's token
</details>

In [None]:

# You need to get your access token from huggingface, run this cell and paste
# it in to the resulting prompt, for use in later sections
# How to get a token is described here:
# https://huggingface.co/docs/api-inference/quicktour#get-your-api-token

# getpass provides an obscured password prompt.
# os gives access to operating system functionality, which
# we need to set an environment-wide variable to hold our token
from getpass import getpass
import os

HUGGINGFACEHUB_API_TOKEN = getpass()

# We put the token in an environment variable, from where LangChain will access it when needed
os.environ["HUGGINGFACEHUB_API_TOKEN"] = HUGGINGFACEHUB_API_TOKEN

## Querying a Hugging Face LLM server directly

Now that you have an access token, you can use a model endpoint hosted by Hugging Face in your code. There are many APIs that allow you to do this. The first method we will use is to call the endpoint directly as a web server, sending a standard HTTP(S) POST request, just as your web browser does.

In [None]:
# This example uses the Hugging Face service API directly
# to access a freely available GPT2 model hosted
# on Hugging Face

# requests is a package for sending requests to web servers
import requests

# the Hugging Face API server for GPT2
API_URL = "https://api-inference.huggingface.co/models/gpt2"

# Headers for our request - the token
headers = {"Authorization": f"Bearer {HUGGINGFACEHUB_API_TOKEN}"}

# function to post the request and return the response
# takes a json query as the payload
def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

# Send a query and print the results
data = query({"inputs": "Can you please let us know more details about your "})
print(data)

You can of course try different pieces of text, getting the model to generate the sequence of words it predicts will come next.
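A minimal sketch of how generation settings can be passed when calling the server directly. It assumes the Hugging Face text-generation API accepts an optional `parameters` object alongside `inputs`, with names such as `max_new_tokens`, `temperature` and `return_full_text`; check the Inference API documentation for the model you are using. `build_generation_payload` is a hypothetical helper, and the sketch only builds the request body, so it makes no network call.

```python
import json

def build_generation_payload(text, max_new_tokens=50, temperature=0.7,
                             return_full_text=False):
    """Build a JSON-serialisable request body for a text-generation endpoint.

    The parameter names are assumed from the Hugging Face Inference API
    text-generation task; other servers may use different names.
    """
    return {
        "inputs": text,
        "parameters": {
            "max_new_tokens": max_new_tokens,      # how many tokens to generate
            "temperature": temperature,            # lower = more deterministic
            "return_full_text": return_full_text,  # False = omit the prompt from the output
        },
    }

payload = build_generation_payload(
    "Can you please let us know more details about your ",
    max_new_tokens=20,
    temperature=0.1,
)
print(json.dumps(payload, indent=2))
```

With the `query()` function above, you could then send this with `query(payload)` and compare the output with the parameter-free request.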

## A simple LangChain LLM, using a Hugging Face free endpoint

Instead of using HTTP directly to communicate with our server, we will now use the LangChain API, which wraps many different types of remote and local model. The abstraction we will use is the LangChain LLM class, which you can learn more about in the [LangChain documentation](https://python.langchain.com/v0.1/docs/modules/model_io/llms/quick_start/).

We will create a LangChain LLM that wraps a Hugging Face endpoint, using the LangChain [HuggingFaceEndpoint class](https://api.python.langchain.com/en/v0.1/llms/langchain_community.llms.huggingface_endpoint.HuggingFaceEndpoint.html#langchain-community-llms-huggingface-endpoint-huggingfaceendpoint), which implements the LangChain LLM interface.

What is the advantage of this approach, over querying directly as in the above example?

We will start by querying a free public model with a simple prompt, as before.

In [None]:
# You can also use these free model endpoints via LangChain, an API that wraps up
# many different ways of accessing models in a common interface.
# Here we use it to access a public Hugging Face endpoint, which has a limited range
# of models and no guarantee of performance or availability

# You could find out about other models hosted on Hugging Face, and replace
# the next line with the repository ID of another
repo_id = "openai-community/gpt2"

# Make the LLM
llm = HuggingFaceEndpoint(
    endpoint_url="https://api-inference.huggingface.co/models/" + repo_id,
    task="text-generation",
    temperature=0.1,
    # HuggingFaceEndpoint limits output length with max_new_tokens
    max_new_tokens=128,
)

# Invoke the model directly (calling llm(...) is deprecated; use invoke)
llm.invoke("When I went to Paris, I ")


## Using a paid-for endpoint - Llama 3

We can use exactly the same code for other free models, or for a paid-for endpoint hosted on Hugging Face. We could also use the same approach to create LLMs from local models, by using a different LangChain subclass of LLM.

Here, we connect to a paid-for endpoint, which we will use for the rest of the practical.

What are the pros and cons of using these different methods of model delivery?

In [None]:
# Using the paid-for model endpoint, which can host a wider range of models.
# This is the URL of a paid-for endpoint - replace it with whichever you are using
# You need to enter the URL provided for the practical!
endpoint_url = "PUT THE PROVIDED ENDPOINT URL HERE"

In [None]:
# Make the model - we are starting to use more parameters
llm = HuggingFaceEndpoint(
    endpoint_url=endpoint_url,
    max_new_tokens=256,
    top_k=10,
    top_p=0.95,
    typical_p=0.95,
    temperature=0.01,
    repetition_penalty=1.03,
)

# Invoke the model directly
llm.invoke("My favourite joke is ")

How does the output of Llama 3 compare to that of the GPT-2 model used in the previous exercise?

## Writing prompts

So far we have written very simple, throwaway prompts for our models. But many prompting tasks follow repeated patterns. LangChain has abstractions for writing reusable prompt templates. You can read more about this in the documentation for the base class, [PromptTemplate](https://python.langchain.com/v0.1/docs/modules/model_io/prompts/quick_start/).
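A prompt template of this kind behaves much like a Python format string with named placeholders. The standard-library sketch below illustrates the idea: find the `{placeholders}`, then fill them in. It is not LangChain's implementation, and `template_variables` and `fill_template` are hypothetical helper names.

```python
from string import Formatter

def template_variables(template):
    """Return the sorted names of the {placeholders} in a format-string template."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for trailing literal text, so it is filtered out
    names = {field for _, field, _, _ in Formatter().parse(template) if field}
    return sorted(names)

def fill_template(template, **values):
    """Fill the template, failing loudly if any placeholder has no value."""
    missing = set(template_variables(template)) - set(values)
    if missing:
        raise KeyError(f"missing template variables: {sorted(missing)}")
    return template.format(**values)

template = "Question: {question}\n\nAnswer: "
print(template_variables(template))
print(fill_template(template, question="Why is the sun so hot?"))
```

The first `print` shows `['question']`; the second shows the complete prompt string that would be sent to the model.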

Once we have a PromptTemplate, we can use it with our model. There are various ways of doing this, perhaps the most useful being [LangChain Expression Language (LCEL)](https://python.langchain.com/v0.1/docs/expression_language/). This is a simple language that allows you to chain together different components that can be run by LangChain, such as LLMs and PromptTemplates (because they implement a specific interface, Runnable). Here's a simple example, creating a chain from a prompt template called `question` and a model called `llm`:

`chain = question | llm`

We can then invoke our chain, which will run the prompt and pass it to the model:

`chain.invoke({dictionary of parameters})`

There are lots of other LangChain interfaces and classes that can be used in these chains, such as output parsers, which take the model output and parse it into something useful, and retrievers, which fetch sets of documents to pass to the model for use when answering (retrieval-augmented generation). See the [LCEL documentation](https://python.langchain.com/v0.1/docs/expression_language/) for lots more examples.
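To make the `|` notation less magical, here is a toy sketch of the composition idea in plain Python: each component exposes `invoke`, and `a | b` builds a new component that runs `a` and passes its output to `b`. The `Step` class and the fake prompt, model and parser are hypothetical stand-ins, not LangChain classes; the real `Runnable` interface does much more (batching, streaming, async).

```python
class Step:
    """A toy stand-in for a Runnable: wraps a function and supports |."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # a | b returns a new Step that runs a, then feeds its output to b
        return Step(lambda value: other.invoke(self.invoke(value)))

# Hypothetical components: a "prompt" that fills a template from a dict,
# a fake "model" that echoes its input, and an "output parser" that tidies it
toy_prompt = Step(lambda params: "Where is {city}".format(**params))
toy_llm = Step(lambda text: f"  [model reply to: {text}]  ")
toy_parser = Step(str.strip)

toy_chain = toy_prompt | toy_llm | toy_parser
print(toy_chain.invoke({"city": "Paris"}))
```

Because `|` is evaluated left to right, `toy_prompt | toy_llm | toy_parser` is built as `(toy_prompt | toy_llm) | toy_parser`, and the dictionary flows through each step in turn.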

Try these two prompt examples, varying your input, to see how prompts might be reused:

In [None]:
#-------------------------------------------
# EXAMPLE 1
#-------------------------------------------

# Creating a string template for our prompt,
# with a placeholder for a variable that we
# will change each time we use it
template = """Where is {city}"""

# Now we can instantiate this prompt
prompt = PromptTemplate.from_template(template)

# And chain it with the LLM we created before
chain = prompt | llm

In [None]:
# Let's try it - have a go with a few city names
city_name = "Paris"
print(chain.invoke({"city": city_name}))

In [None]:
#-------------------------------------------
# EXAMPLE 2
#-------------------------------------------

# The prompt string
template = """Question: {question}

Answer: """

# Make the prompt
prompt = PromptTemplate.from_template(template)

# Chain it to the LLM
chain = prompt | llm

In [None]:
# Try it out
question = "Why is the sun so hot?"
print(chain.invoke({"question": question}))

Try this last prompt with a few other questions:

* What is the French for cheese?
* What is 2+2 ?
* How many dollars to the euro?
* Why is the sun so hot?

**Things to think about and try**

* What do you notice about the answers? Could we use this as a translation app, calculator, or currency converter?

* How does varying the parameters vary the response? What do each of the parameters do?

* What about varying the prompt? Can you get it to format your answers in different ways?
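To build some intuition for the `temperature`, `top_k` and `top_p` parameters used when creating `llm` above, here is a toy, pure-Python sketch of how they shape the choice of the next token. It is a simplification: the logits are invented, `sample_next_token` is a hypothetical helper, and real inference servers apply these settings (together with others, such as `repetition_penalty` and `typical_p`) inside the model's decoding loop rather than like this.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Sample one token from a {token: logit} dict (toy illustration only)."""
    if temperature <= 0:
        raise ValueError("temperature must be greater than 0")
    # Temperature divides the logits: values below 1 sharpen the distribution,
    # values above 1 flatten it
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    # Softmax, subtracting the maximum for numerical stability
    top = max(scaled.values())
    exps = {tok: math.exp(v - top) for tok, v in scaled.items()}
    total = sum(exps.values())
    # (probability, token) pairs, most probable first
    ranked = sorted(((e / total, tok) for tok, e in exps.items()), reverse=True)
    # top-k: keep only the k most probable tokens
    if top_k is not None:
        ranked = ranked[:top_k]
    # top-p (nucleus): keep the most probable tokens until their mass reaches top_p
    if top_p is not None:
        kept, mass = [], 0.0
        for prob, tok in ranked:
            kept.append((prob, tok))
            mass += prob
            if mass >= top_p:
                break
        ranked = kept
    # random.choices normalises the remaining weights itself
    tokens = [tok for _, tok in ranked]
    weights = [prob for prob, _ in ranked]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Invented logits for the word after "The capital of France is"
toy_logits = {"Paris": 5.0, "London": 3.0, "Rome": 2.5, "banana": 0.1}
rng = random.Random(0)
cold = [sample_next_token(toy_logits, temperature=0.01, rng=rng) for _ in range(5)]
hot = [sample_next_token(toy_logits, temperature=2.0, rng=rng) for _ in range(5)]
print(cold)
print(hot)
```

At a very low temperature (like the `0.01` used for Llama 3 above) the most likely token is chosen almost every time; at higher temperatures less likely tokens start to appear.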

## A simple few shot example

The above examples are all cases of **zero-shot learning**: we did not train the model in any way for our specific task (though of course it might have had lots of relevant training and conditioning when it was built).

For the above examples, although we did get reasonable answers, we also got lots of text that we didn't necessarily want. We can steer the model towards more specific answers by providing examples in the prompt. This is one case of **few-shot learning**. No weights are updated: the model simply conditions on the examples it sees in the prompt.

In [None]:
# Simple few shot template, with three examples
template = """Translate English to French:

        sea otter => loutre de mer
        peppermint => menthe poivrée
        plush giraffe => girafe peluche
        {word} =>"""

# Make the prompt and chain
prompt = PromptTemplate.from_template(template)
chain = prompt | llm

# Invoke it
english_word = "cheese"
print(chain.invoke({"word": english_word}))


## Can you improve the prompt?
This works to some extent, but our prompt is not ideal. Can you change the prompt, so it gives us just one answer, the translation for the word we provided?
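One approach, alongside rewording the prompt, is to cut the output off at the end of the first answer. A minimal sketch, assuming the model puts its translation on the first line and then carries on inventing more examples; `first_answer` is a hypothetical helper and `raw` is an invented string, not real model output. LangChain can also pass stop sequences to the model, for example `prompt | llm.bind(stop=["\n"])`, though it is worth checking that your endpoint honours them.

```python
def first_answer(response, stop="\n"):
    """Keep only the text before the first stop sequence, without surrounding spaces."""
    return response.split(stop, 1)[0].strip()

# An invented response of the kind a few-shot prompt can produce:
# the answer, followed by extra made-up examples
raw = " fromage\n        bread => pain\n        wine => vin"
print(first_answer(raw))
```

This prints `fromage`. Post-processing like this is simple and robust, but it is no substitute for a prompt that asks clearly for a single answer.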

## Medication Extraction using zero-shot learning

We will now look at how we might use a generative model for medical Named Entity Recognition (NER). We will start with a simple zero-shot medication extraction experiment. Is it any use? How might we improve it?

In [None]:
# Example document
clinical_document = """
The patient was initially prescribed Metoprolol 50 mg twice daily for hypertension.
During the first follow-up visit, the dosage of Metoprolol was increased to 100 mg twice daily.
Later, the patient developed side effects and Metoprolol was switched to Atenolol 50 mg once daily.
In the subsequent visit, Amlodipine 5 mg was added to the treatment plan.
At the final follow-up, the Atenolol dosage was increased to 100 mg once daily, and Amlodipine was continued.
"""

In [None]:
# simple zero shot prompt
def get_meds_response(text):

    template = """
    Extract all medications and their doses from the following clinical document:

    {text}
    Answer:"""

    prompt = PromptTemplate.from_template(template)
    chain = prompt | llm
    response = chain.invoke({"text": text})
    return response

In [None]:
extraction_response = get_meds_response(clinical_document)
print("Response:\n", extraction_response)

## Using Llama 3's prompt format

You can greatly improve the results of prompting if you know a little about the way in which a model was trained. We are using an **instruction-tuned** version of Llama 3. In addition to the next word prediction task, it has been conditioned on a large set of instructions and answers, in which the model has been rewarded for generating good output for instructions provided in a pre-specified format. If we follow the same format in our prompting, we can take advantage of this conditioning. Such instruction tuning is a common feature of LLMs, each having its own specific instruction format. Llama 3's looks like this:

`<|begin_of_text|><|start_header_id|>system<|end_header_id|>`

`{system message}<|eot_id|>`
`<|start_header_id|>user<|end_header_id|>`

`{user message}<|eot_id|>`
`<|start_header_id|>assistant<|end_header_id|>`

The prompt is split into three parts, each relating to a particular role in the "conversation":

* **system** - here we can provide a message to tell the model how to behave, how to format its output, etc.
* **user** - the prompt that we want answering
* **assistant** - the "reply" generated by Llama. This is the last role provided, and after seeing this, Llama will generate output.


You can read more about this in the [Llama 3 prompt format documentation](https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/).
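The format above can also be assembled in code. This is a minimal sketch based on the format as described in Meta's documentation, which puts a blank line (`\n\n`) after each header; `llama3_prompt` is a hypothetical helper, not part of LangChain. Passing earlier `user`/`assistant` turns is one way to supply few-shot examples as a mock conversation, a variation on the approach used in the cells below. If you have the model's tokenizer locally, the Hugging Face `transformers` method `apply_chat_template` can build such strings for you.

```python
def llama3_prompt(system_message, turns):
    """Build a Llama 3 instruct prompt string.

    turns is a list of (role, text) pairs, where role is "user" or "assistant".
    The prompt ends with an open assistant header, so the model writes the next reply.
    """
    parts = ["<|begin_of_text|>"]
    for role, text in [("system", system_message)] + list(turns):
        if role not in ("system", "user", "assistant"):
            raise ValueError(f"unknown role: {role!r}")
        parts.append(f"<|start_header_id|>{role}<|end_header_id|>\n\n{text}<|eot_id|>")
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

p = llama3_prompt(
    "You extract medications and their doses from health record text.",
    [("user", "Metformin 500 mg twice daily was started.")],
)
print(p)
```

A few-shot version would pass, for example, `[("user", example_text), ("assistant", example_answer), ("user", new_text)]`.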

In [None]:
# A function that will use the Llama 3 prompt format and zero-shot
# learning to extract medications from the provided text.
def get_llama3_meds_response(text):

  # Create a template using the Llama 3 prompting format
  template = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system_prompt}<|eot_id|>
<|start_header_id|>user<|end_header_id|>

{user_prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>"""

  system_prompt = """You are an AI assistant that extracts medications and their doses from health record text.
  When you are given a piece of text, you will list all of the medications that are in the text."""

  prompt = PromptTemplate(
      input_variables=["system_prompt", "user_prompt"],
      template=template
  )

  # Chain the prompt with the model and invoke it (once)
  chain = prompt | llm
  response = chain.invoke({"system_prompt": system_prompt, "user_prompt": text})
  return response

In [None]:
# Now let's try the prompt with our example document
extraction_response = get_llama3_meds_response(clinical_document)
print("Response:\n", extraction_response)

Try changing your prompt in different ways, to get:

* A list with one item for each time a medication is mentioned, instead of one item for each unique medication
* A list with both medications and doses?
* A response without any leading or trailing commentary?
* Try to get Llama to structure the output, so that medications and doses are marked in different ways (e.g. put brackets around the doses)
* Can you get the output as JSON?
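If you do get JSON, models often wrap it in commentary such as "Here is the list:". A minimal, defensive sketch using only the standard library: scan for the first `[` or `{` from which a complete JSON value can be decoded. `parse_first_json` is a hypothetical helper and `reply` is an invented response, not real model output.

```python
import json

def parse_first_json(response):
    """Return the first JSON array or object that can be decoded from a model response."""
    decoder = json.JSONDecoder()
    for i, ch in enumerate(response):
        if ch in "[{":
            try:
                # raw_decode parses one JSON value starting at index i
                value, _ = decoder.raw_decode(response, i)
                return value
            except json.JSONDecodeError:
                continue  # not valid JSON from here; keep scanning
    raise ValueError("no JSON found in response")

reply = ('Sure! Here are the medications:\n'
         '[{"drug": "Metoprolol", "dose": "50 mg"}, {"drug": "Amlodipine", "dose": "5 mg"}]\n'
         'Let me know if you need anything else.')
meds = parse_first_json(reply)
print(meds[0]["drug"], meds[0]["dose"])
```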

## Medication extraction with few-shot learning

Let's try to improve our medications extraction by providing some examples. We will first use a basic, unformatted prompt.

In [None]:
# Few-shot examples for medication extraction. Each example is a dictionary with a piece of text, and
# then what we would expect the model to generate for this example.
med_examples = [
    {
        "text": "The patient was prescribed Metformin 500 mg twice daily for diabetes. After two months, the dose was increased to 1000 mg twice daily.",
        "extracted_medications": "[Metformin 500 mg twice daily], [Metformin 1000 mg twice daily]"
    },
    {
        "text": "Initially, Atorvastatin 20 mg was prescribed. During the first follow-up, the dosage was increased to 40 mg. Later, Ezetimibe 10 mg was added.",
        "extracted_medications": "[Atorvastatin 20 mg], [Atorvastatin 40 mg], [Ezetimibe 10 mg]"
    }
]

In [None]:
# A function that defines our few-shot prompt from some examples
# and then runs the model with that prompt over the provided text.
def few_shot_medication_extraction(text, examples):

    # Build the examples part of the prompt: an f-string inside a list comprehension,
    # joined with blank lines (uses the examples passed in, not a global)
    example_prompts = "\n\n".join([f"Text: {ex['text']}\nExtracted Medications: {ex['extracted_medications']}" for ex in examples])

    # Add the examples into our full prompt
    prompt = f"Use the following examples to guide you on how to extract medications and their doses from the given clinical document.\n\nExamples:\n{example_prompts}\n\nClinical document: {text}\nExtracted Medications:"

    # Invoke the model with the prompt - we are not using chaining here, just plain invocation
    response = llm.invoke(prompt)
    return response

In [None]:
# Now let's run the function on our clinical document
few_shot_extraction_result = few_shot_medication_extraction(clinical_document, med_examples)
print("Response:\n", few_shot_extraction_result)

## Few shot prompting with Llama's prompt format

The results in the last exercise were quite impressive. Can we improve on them by using Llama's prompt format? Let's have a go.

In [None]:
# Function using the Llama 3 prompt format to extract medications
# from the provided text, using the provided examples
def get_llama3_few_shot_meds_response(text, examples):

  # Create a template using the Llama 3 prompting format
  template = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system_prompt}<|eot_id|>
<|start_header_id|>user<|end_header_id|>

{user_prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>"""

  # The system prompt - how the LLM should respond
  system_prompt = """You are an AI assistant that extracts medications and their doses from health record text.
  When you are given a piece of text, you will list all of the medications that are in the text.
  Here are some examples, showing how you should format the list:\n\n"""

  # Few shot examples
  example_prompts = "\n\n".join([f"Text: {ex['text']}\nExtracted Medications: {ex['extracted_medications']}" for ex in examples])

  # Make the full prompt
  system_prompt = system_prompt + example_prompts
  prompt = PromptTemplate(
      input_variables=["system_prompt", "user_prompt"],
      template=template
  )

  # Chain the prompt with the model and invoke it (once)
  chain = prompt | llm
  response = chain.invoke({"system_prompt": system_prompt, "user_prompt": text})
  return response

In [None]:
# Few-shot medication extraction with our Llama prompt format
few_shot_extraction_result = get_llama3_few_shot_meds_response(clinical_document, med_examples)
print("Response:\n", few_shot_extraction_result)

## Elaborating the prompt
Can you elaborate the prompt further, to separate out the drug, dose and frequency into separate fields in the output that could easily be parsed into, e.g., a Python dictionary?
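As a starting point, the bracketed format used in `med_examples` can already be split into fields with a regular expression. This is a sketch under assumptions: the pattern, the short list of dose units, and the `parse_medications` helper are illustrative only, and asking the model for JSON is often more robust than parsing free text.

```python
import re

# One [bracketed] item at a time
ITEM = re.compile(r"\[([^\]]+)\]")
# Drug name, then a numeric dose with an (assumed) unit, then any frequency text
FIELDS = re.compile(
    r"^(?P<drug>.+?)\s+(?P<dose>\d+(?:\.\d+)?\s*(?:mg|mcg|g|ml))\b\s*(?P<frequency>.*)$",
    re.IGNORECASE,
)

def parse_medications(extracted):
    """Turn '[Drug dose frequency], ...' text into a list of dictionaries."""
    results = []
    for item in ITEM.findall(extracted):
        match = FIELDS.match(item.strip())
        if match:
            results.append({
                "drug": match.group("drug"),
                "dose": match.group("dose"),
                "frequency": match.group("frequency") or None,
            })
        else:
            # Keep unparseable items rather than silently dropping them
            results.append({"drug": item.strip(), "dose": None, "frequency": None})
    return results

print(parse_medications("[Metformin 500 mg twice daily], [Metformin 1000 mg twice daily]"))
```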

## Exercise: identifying and annotating symptoms

Write and test prompts to mark mentions of symptoms in an input text.

In [None]:
# An input document to annotate
clinical_document = """
The patient was admitted with severe headache, nausea, and dizziness.
They also reported experiencing fatigue and joint pain.
Additionally, there was mention of occasional shortness of breath and a persistent cough.
"""

In [None]:
# Some examples of how we would like the output to be formatted
examples = [
    {
        "text": "The patient was given acetaminophen and complained of headaches and fatigue.",
        "annotated": "The patient was given acetaminophen and complained of [headaches] and [fatigue]."
    },
    {
        "text": "She was treated with amoxicillin and experienced a rash and itching.",
        "annotated": "She was treated with amoxicillin and experienced a [rash] and [itching]."
    }
]
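When the model returns annotated text, it is worth checking that it has only added brackets and has not silently rewritten the sentence. A minimal sketch, assuming the input text contains no square brackets of its own; `check_annotation` is a hypothetical helper.

```python
import re

def check_annotation(original, annotated):
    """Return the [bracketed] spans in annotated, after checking that the
    model has not altered the underlying text.

    Assumes the original text itself contains no square brackets.
    """
    stripped = annotated.replace("[", "").replace("]", "")
    if stripped.strip() != original.strip():
        raise ValueError("annotated text differs from the original text")
    return re.findall(r"\[([^\[\]]+)\]", annotated)

spans = check_annotation(
    "The patient was given acetaminophen and complained of headaches and fatigue.",
    "The patient was given acetaminophen and complained of [headaches] and [fatigue].",
)
print(spans)
```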

## Exercise: extending the symptom extractor to include character offsets
Can you get your symptom annotation example to tell you the start and finish character offsets of the symptoms, like in these examples?



In [None]:
examples = [
    {
        "text": "The patient was given acetaminophen and complained of headaches and fatigue.",
        "annotations": "[symptom: headaches, start_offset: 54, end_offset: 63], [symptom: fatigue, start_offset: 68, end_offset: 75]"

    },
    {
        "text": "She was treated with amoxicillin and experienced a rash and itching.",
        "annotations": "[symptom: rash, start_offset: 51, end_offset: 55], [symptom: itching, start_offset: 60, end_offset: 67]"
    }
]
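Language models work on tokens rather than characters, so they tend to be unreliable at counting offsets. One workaround is to ask the model only for the symptom mentions, in order, and compute the offsets in code. A minimal sketch; `find_offsets` is a hypothetical helper, and its end offsets are exclusive (as in Python slicing), which matches the examples above.

```python
def find_offsets(text, mentions):
    """Locate each mention in text, in order, returning (mention, start, end) triples.

    Searching from the end of the previous match means a repeated mention
    is matched to its next occurrence rather than to the first one again.
    end is exclusive, so text[start:end] == mention.
    """
    spans, cursor = [], 0
    for mention in mentions:
        start = text.find(mention, cursor)
        if start == -1:
            raise ValueError(f"{mention!r} not found after offset {cursor}")
        end = start + len(mention)
        spans.append((mention, start, end))
        cursor = end
    return spans

text = "The patient was given acetaminophen and complained of headaches and fatigue."
print(find_offsets(text, ["headaches", "fatigue"]))
```

This reproduces the offsets in the first example above (54 to 63 and 68 to 75), and the same check can be used to test offsets that the model reports itself.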

## The assignment

Can you develop prompts to provide an answer for the assignment?

You will probably have to use a free online LLM, such as ChatGPT, a free Hugging Face model or a [Perplexity online model](https://www.perplexity.ai/hub/blog/introducing-pplx-online-llms?utm_source=labs&utm_medium=labs&utm_campaign=online-llms). This means you are unlikely to be able to evaluate against a large dataset. But you might be able to show some results with minimal training data, and you might be able to do a very small evaluation, which would be a limitation. What other limitations might there be? Will the test data be blind?
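Even a very small evaluation needs a scoring rule. One common convention for NER is exact span matching: a prediction counts only if its start, end and label all match a gold annotation. A minimal sketch of precision, recall and F1 under that convention; `span_prf` is a hypothetical helper, and the predicted spans are invented. Lenient (overlap-based) matching is a reasonable alternative worth reporting alongside it.

```python
def span_prf(gold, predicted):
    """Exact-match precision, recall and F1 over sets of (start, end, label) spans."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Gold spans from the offsets example; the predictions are invented:
# one correct symptom and one wrong span (the drug name)
gold = {(54, 63, "symptom"), (68, 75, "symptom")}
pred = {(54, 63, "symptom"), (22, 35, "symptom")}
print(span_prf(gold, pred))
```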

## Next steps
This notebook has shown you some of the basics of prompt engineering. Recent models and APIs are much more sophisticated than the examples we have given, and often provide facilities for more detailed interaction. There is much more to learn if you are interested:

* output parsing
* tool and function calls
* direct support for e.g. JSON
* different capabilities of different models