In [None]:
%%capture
!pip install -Uqqq pip --progress-bar off
!pip install -qqq torch==2.0.1 --progress-bar off
!pip install -qqq transformers==4.33.2 --progress-bar off
!pip install -qqq langchain==0.0.299 --progress-bar off
!pip install -qqq chromadb==0.4.10 --progress-bar off
!pip install -qqq xformers==0.0.21 --progress-bar off
!pip install -qqq sentence_transformers==2.2.2 --progress-bar off
!pip install -qqq tokenizers==0.14.0 --progress-bar off
!pip install -qqq optimum==1.13.1 --progress-bar off
!pip install -qqq auto-gptq==0.4.2 --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/ --progress-bar off
!pip install -qqq unstructured==0.10.16 --progress-bar off

In [None]:
!pip install langchain pypdf --q


Using Llama 2 is as easy as using any other Hugging Face model. We'll use the HuggingFacePipeline wrapper (from LangChain) to make it even easier to work with. To keep memory requirements low, we'll load a GPTQ-quantized build of the 7B chat model (GPTQ is a post-training quantization method for generative pre-trained transformers):

In [None]:
# We import the relevant libraries
import torch
from langchain import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig, pipeline
##########################################################################################

# We set the model name, create a tokenizer, and load the pre-trained language model for text generation.
MODEL_NAME = "TheBloke/Llama-2-7B-Chat-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=True)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.float16, trust_remote_code=True, device_map="auto"
)
##########################################################################################

# Configure the text generation settings using GenerationConfig.
# These settings include the maximum number of new tokens in the generated text, temperature (controls randomness), top-p sampling, whether to sample at all, and the repetition penalty.

# Create a configuration for text generation based on the specified model name
generation_config = GenerationConfig.from_pretrained(MODEL_NAME)
# Set the maximum number of new tokens in the generated text to 1024.
# This limits the length of the generated output to 1024 tokens.
generation_config.max_new_tokens = 1024

# Set the temperature for text generation. Lower values (e.g., 0.0001) make output more deterministic, following likely predictions.
# Higher values make the output more random.
generation_config.temperature = 0.0001

# Set the top-p sampling value. A value of 0.95 means focusing on the most likely words that make up 95% of the probability distribution.
generation_config.top_p = 0.95

# Enable text sampling. When set to True, the model randomly selects words based on their probabilities, introducing randomness.
generation_config.do_sample = True

# Set the repetition penalty. A value of 1.15 discourages the model from repeating the same words or phrases too frequently in the output.
generation_config.repetition_penalty = 1.15

##########################################################################################

# We create a text generation pipeline using the configured model, tokenizer, and generation configuration.

text_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    generation_config=generation_config,
)

##########################################################################################
# We create a LangChain pipeline that wraps the Hugging Face text generation pipeline.
# This LangChain pipeline is configured with a specific temperature setting (temperature: 0), which affects the randomness of the generated text.

llm = HuggingFacePipeline(pipeline=text_pipeline, model_kwargs={"temperature": 0})
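
To build some intuition for the temperature and top-p settings above, here is a minimal, standard-library sketch on a toy distribution. The four logits are hypothetical values chosen for illustration, and the two helper functions are simplified stand-ins, not the Transformers implementation (which handles the top-p boundary and batching differently).

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Return indices of the most likely tokens whose cumulative probability reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept

# Hypothetical scores for four candidate tokens (not taken from a real model).
toy_logits = [2.0, 1.0, 0.5, -1.0]

warm = softmax_with_temperature(toy_logits, temperature=1.0)
cold = softmax_with_temperature(toy_logits, temperature=0.0001)

print("temperature=1.0   :", [round(p, 3) for p in warm])
print("temperature=0.0001:", [round(p, 3) for p in cold])
print("top-p survivors (warm):", top_p_filter(warm, top_p=0.95))
print("top-p survivors (cold):", top_p_filter(cold, top_p=0.95))
```

At a temperature of 0.0001 practically all probability mass lands on the single most likely token, so even with do_sample = True the output behaves almost like greedy decoding.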


To work with external files, LangChain provides data loaders that can be used to load documents from various sources. Combining LLMs with external data is generally referred to as Retrieval Augmented Generation (RAG).

Let's see how we can use the PyPDFLoader to load a document from a PDF file hosted at a URL:

In [None]:
# We import relevant packages
from langchain.document_loaders import UnstructuredMarkdownLoader
from langchain.document_loaders import PyPDFLoader

# We create an instance of PyPDFLoader by providing the URL of the PDF document we want to use.
# This loader is specialized for extracting text from PDF files and returns one document per page.
loader = PyPDFLoader("https://www.tlv.se/download/18.12c69789187230f29b822802/1680069871440/report_international_price_comparison_2022_130-2023.pdf")

# We load the document PDF
docs = loader.load()
# And check the length of the result, i.e. the number of pages in the document.
len(docs)

72

The PDF we're loading is a report on international price comparison from 2022. It was published by Tandvårds- och läkemedelsförmånsverket (TLV), the Swedish Dental and Pharmaceutical Benefits Agency.

Let's see how we can use the RecursiveCharacterTextSplitter to split the document into smaller chunks:

In [None]:
# We import relevant packages
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Create an instance of RecursiveCharacterTextSplitter with specific configuration parameters.
# In this case, it is configured to split text into chunks of 1024 characters with an overlap of 64 characters between consecutive chunks.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1024, chunk_overlap=64)

# We apply the text splitter to split the input document into smaller chunks, as specified in the above line of code.
# The split_documents method takes the list of documents and returns a list of text chunks.
texts = text_splitter.split_documents(docs)

# We get the length of the resulting list of text chunks.
len(texts)

185
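
To make chunk_size and chunk_overlap concrete, here is a minimal sketch of fixed-size splitting with overlap. It is a deliberate simplification: the hypothetical helper `split_with_overlap` cuts at fixed character offsets, whereas RecursiveCharacterTextSplitter first tries to break on separators such as paragraph and line breaks, so its chunks will not match these exactly.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks

# A 50-character toy string so the overlap is easy to see.
sample = "".join(str(i % 10) for i in range(50))
chunks = split_with_overlap(sample, chunk_size=20, chunk_overlap=5)

for chunk in chunks:
    print(len(chunk), chunk)
```

The last 5 characters of each chunk reappear as the first 5 of the next one, which reduces the chance that a sentence relevant to a query is cut in half at a chunk boundary.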

Splitting the document into chunks is required because an LLM can only look at a limited number of tokens at once (4096 for Llama 2). Next, we'll use the HuggingFaceEmbeddings class to create embeddings for the chunks:

In [None]:
# We import the relevant packages
from langchain.embeddings import HuggingFaceEmbeddings
# Create an instance of HuggingFaceEmbeddings with specific configuration parameters.

  # * model_name: Specifies the name of the pre-trained Hugging Face model to be used for embeddings (in this case, "thenlper/gte-large").

  # * model_kwargs: Specifies additional keyword arguments for configuring the underlying model. Here, it sets the device to "cuda" for GPU acceleration.

  # * encode_kwargs: Specifies additional keyword arguments for the embedding process. Here, it requests normalized embeddings.

embeddings = HuggingFaceEmbeddings(
    model_name="thenlper/gte-large",
    model_kwargs={"device": "cuda"},
    encode_kwargs={"normalize_embeddings": True},
)

# Use the embed_query method of the embeddings instance to embed the content of the first text chunk (texts[0].page_content).
# query_result is the embedding of that text: a list of floats.
query_result = embeddings.embed_query(texts[0].page_content)

# And we print its length, i.e. the dimensionality of the embedding.
print(len(query_result))
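
The normalize_embeddings option is worth a short illustration. The sketch below uses two hand-made 3-dimensional vectors (hypothetical values, far smaller than real embeddings) to show that once vectors are scaled to unit length, a plain dot product gives the same number as cosine similarity.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(vector):
    """Scale a vector so that its Euclidean length is 1."""
    norm = math.sqrt(dot(vector, vector))
    return [x / norm for x in vector]

def cosine_similarity(a, b):
    """Cosine of the angle between a and b, computed on the raw vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Toy vectors standing in for embeddings.
a = [3.0, 4.0, 0.0]
b = [1.0, 2.0, 2.0]

a_unit = l2_normalize(a)
b_unit = l2_normalize(b)

print("length of a_unit squared:", dot(a_unit, a_unit))
print("cosine on raw vectors   :", cosine_similarity(a, b))
print("dot on unit vectors     :", dot(a_unit, b_unit))
```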

In the spirit of using free tools, we're also using a free embedding model hosted on the Hugging Face Hub. We'll use the Chroma vector database to store the embeddings and make them easy to search. Once the database is in place, we'll combine it with the LLM using the RetrievalQA chain.

In [None]:
# We import relevant packages
from langchain.vectorstores import Chroma

# Create an instance of the Chroma vector store using the from_documents method. This method takes three parameters:

  # * texts: The list of text chunks that you want to store in the vector store.

  # * embeddings: The embeddings associated with the text chunks.

  # * persist_directory: The directory where the vector store will be stored for future use or persistence.

db = Chroma.from_documents(texts, embeddings, persist_directory="db")

# Use the similarity_search method of the db instance to perform a similarity search. This method takes two parameters:

  # * "falling prices": The query string for which similarity search is performed.

  # * k=2: The number of nearest neighbors (results) to retrieve.

# The results of the similarity search are stored in the results variable.

results = db.similarity_search("falling prices", k=2)

# We print the content of the first result from the similarity search.
# results[0] represents the most similar item found based on the query, and .page_content retrieves the content associated with that result.
print(results[0].page_content)

the downward trend came to a halt, with relat ive prices remaining 11.2 percent 
below the average. Using a fixed exchange rate, Sweden's relative prices fell by 
around ten percentage points up to Q1 2021, and then stabilised at a relative price of 
2.1 percent above the average. The change over time is  largely driven by the 
exchange rate change.21 However, from 2014 to 2016, the relative price decrease was 
mainly driven by reassessments and the introduction of rule -based price reductions 
for pharmaceuticals 15 years and older. This change can be seen in  Figure 13. As 
previously mentioned, Sweden's relative prices would likely rise if the Swedish 
krona regained its value. The following section provides a more detailed description 
of the reasons behind this price development.  
 
21 A version of the figure with fixed exchange rate is found in Section 1.1 of Appendix 1.
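
Conceptually, the similarity search above embeds the query and returns the k stored chunks whose vectors are closest to it. Here is a brute-force sketch of that idea. The chunk labels and 3-dimensional unit vectors are hypothetical, and the `top_k` helper simply scores every vector; a real vector store such as Chroma relies on an index rather than an exhaustive scan.

```python
def top_k(query_vec, doc_vecs, k):
    """Return the indices of the k vectors with the highest dot product against the query."""
    scored = [
        (sum(q * d for q, d in zip(query_vec, vec)), idx)
        for idx, vec in enumerate(doc_vecs)
    ]
    # Highest score first; sorted() is stable, so ties keep their original order.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in ranked[:k]]

# Hypothetical chunk labels with hand-made unit-length vectors.
toy_chunks = ["chunk A", "chunk B", "chunk C", "chunk D"]
toy_vectors = [
    [1.0, 0.0, 0.0],
    [0.8, 0.6, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

query_vector = [1.0, 0.0, 0.0]  # stands in for the embedded query string

best = top_k(query_vector, toy_vectors, k=2)
for idx in best:
    print(toy_chunks[idx])
```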


In [None]:
# We (again) import the relevant packages
from langchain.chains import RetrievalQA
from langchain import PromptTemplate

# Create a prompt template using the PromptTemplate class.
# The template includes placeholders {context} and {question} for dynamic input.
# The template also provides some instructions and context for answering the question.
template = """
<s>[INST] <<SYS>>
Act as a medical expert. Use the following information to answer the question at the end.
<</SYS>>

{context}

{question} [/INST]
"""
# Create an instance of PromptTemplate using the defined template.
# We specify the input variables as "context" and "question."
prompt = PromptTemplate(template=template, input_variables=["context", "question"])

# Create an instance of RetrievalQA using the from_chain_type method. This method initializes the QA chain with the following parameters:

  # * llm: The language model (llm) to be used in the QA process, as specified in the block of code at the beginning of the notebook.

  # * chain_type: The type of QA chain, specified as "stuff."

  # * retriever: The retriever to be used for document retrieval (using the previously created Chroma vector store).

  # * return_source_documents: Boolean indicating whether to return the source documents in the result.

  # * chain_type_kwargs: Additional keyword arguments specific to the QA chain, including the defined prompt.
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=db.as_retriever(search_kwargs={"k": 2}),
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt},
)

# Use the qa_chain to perform question-answering.
# The input is a question about medicine prices in different countries.
# The result contains the answer generated by the QA system.
result = qa_chain(
    "Explain how some of the different countries compare when it comes to medicine prices."
)

# We print the results
print(result["result"].strip())

As a medical expert, I can provide you with an analysis of how different countries compare when it comes to medicine prices based on the provided information.
Firstly, it is important to note that the global pharmaceutical market is highly consolidated, with a few major players holding significant market share. This means that pricing strategies and policies can vary greatly between countries, depending on their economic and regulatory frameworks.
In terms of medicine prices, the data suggests that North America dominates the pharmaceutical market, accounting for 49.1% of total sales in the global market. This is likely due to several factors, including the presence of large pharmaceutical companies in the region, a well-developed healthcare system, and a relatively high average income. In contrast, Europe accounts for 23.4% of global pharmaceutical sales, while Africa, Asia (excluding Japan and China), and Australia combine for 8.4%. China accounts for 9.4%, Japan for 6.1%, and Latin 

This will pass our prompt to the LLM along with the top 2 results from the database. The LLM will then use the prompt to generate an answer. The answer will be returned along with the source documents. Let's try another prompt:

In [None]:
# We (again again) import relevant packages
from textwrap import fill

# Use the qa_chain to perform question-answering with a new question.
# The input is a question related to summarizing how countries determine the costs of medicine.
# The result contains the generated answer by the QA system.
result = qa_chain(
    "Summarize the different ways countries decide on costs of medicin"
)
# Print the generated answer after applying text wrapping.
# The fill function from the textwrap module is used to format the text, ensuring that lines do not exceed a specified width of 80 characters.
print(fill(result["result"].strip(), width=80))

As a medical expert, I can provide you with an analysis of the different ways
countries decide on the costs of medicines based on the provided text. Here are
some key points: 1. Price comparison: Countries compare the prices of
pharmaceuticals across different markets to determine the relative cost of
drugs. However, there are challenges in grouping pharmaceuticals according to
therapy groups, and treatment traditions may vary between countries. 2.
Exclusion of low-volume pharmaceuticals: Countries exclude pharmaceuticals with
very low volumes in their domestic market to prevent disproportionate weight in
the price comparison. 3. Volume data: Sales volumes for the previous 12 months
up to March 2022 are used in calculating bilateral indices. 4. Transparency in
list prices: Different countries have varying levels of transparency in list
prices, including whether or not discount systems are institutionalized and
reflected in list prices. 5. Global market share: Globally, North America
do
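
Both answers above were produced with the "stuff" chain type, which places the retrieved chunks into the {context} slot of the prompt before calling the model. The sketch below mimics that assembly step with plain string formatting. The helper `build_stuffed_prompt` and the placeholder chunks are hypothetical, and joining chunks with a blank line is an assumption about the separator; the template is the one defined earlier in the notebook.

```python
TEMPLATE = """
<s>[INST] <<SYS>>
Act as a medical expert. Use the following information to answer the question at the end.
<</SYS>>

{context}

{question} [/INST]
"""

def build_stuffed_prompt(chunks, question):
    """Join the retrieved chunks into one context block and fill in the template."""
    context = "\n\n".join(chunks)  # assumed separator between chunks
    return TEMPLATE.format(context=context, question=question)

# Placeholder text standing in for the two chunks returned by the retriever (k=2).
retrieved_chunks = [
    "First retrieved chunk (placeholder text).",
    "Second retrieved chunk (placeholder text).",
]

stuffed_prompt = build_stuffed_prompt(
    retrieved_chunks,
    "Explain how some of the different countries compare when it comes to medicine prices.",
)
print(stuffed_prompt)
```

Because every retrieved chunk is pasted into a single prompt, the chunk size, the value of k, and max_new_tokens together have to fit within the model's context window.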

In this assignment, we implemented an NLP pipeline using the LangChain library and Hugging Face Transformers. We set up a question-answering (QA) system built on a RetrievalQA chain, with a prompt template that frames the model as a medical expert and grounds its answers in a document explaining pharmaceutical pricing mechanisms across a range of countries. The system used a language model (llm) and a document vector store (Chroma) for retrieval and processing. We then demonstrated the system's capability by answering questions related to medicine prices.

Additionally, we integrated text splitting to handle large documents efficiently and embedded the text content using a pre-trained language model.

Furthermore, we performed similarity search to find relevant information based on user queries. Finally, we summarized the results and presented them with proper text wrapping for readability. The overall system allows for querying, retrieval, and summarization of information.