# Learning LangChain V0.3

In [1]:
import os
from dotenv import load_dotenv

load_dotenv()
os.environ.get('HUGGINGFACEHUB_API_TOKEN');

## Building a simple chatbot

Here we use LangChain's chat models. These are language models that take a sequence of messages as input and return a chat message. This differs from plain LLMs, which take in and return plain strings.
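
To make the difference in interface concrete, here is a minimal library-free sketch. The names `Message`, `toy_llm` and `toy_chat_model` are hypothetical stand-ins invented for illustration; they are not LangChain classes and no real model is called.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str      # "system", "human" or "ai"
    content: str


def toy_llm(prompt: str) -> str:
    """String in, string out: the plain-LLM style of interface."""
    return f"(completion of: {prompt})"


def toy_chat_model(messages: list[Message]) -> Message:
    """List of messages in, one AI message out: the chat-model style."""
    last_human = next(m for m in reversed(messages) if m.role == "human")
    return Message(role="ai", content=f"(reply to: {last_human.content})")


print(toy_llm("What is Deep Learning?"))

reply = toy_chat_model([
    Message("system", "You are a helpful assistant."),
    Message("human", "What is Deep Learning?"),
])
print(reply.role, "->", reply.content)
```

The chat-style function receives roles along with the text, which is what lets a system instruction and earlier turns travel with the latest question.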

So let's define our LLM and then interact with it as a chat model.

### Using HuggingFace directly

In [2]:
from huggingface_hub import InferenceClient
client = InferenceClient(model="meta-llama/Meta-Llama-3-8B-Instruct")

question = "You are a medical expert and deeply knowledgeable with the skin condition rosacea. When prompted, respond to questions with sound medical advice. How long does it take for soolantra to reduce symptoms?"
print(client.text_generation(prompt=question, max_new_tokens=100))

...
Soolantra (ivermectin cream) is a topical cream used to treat inflammatory lesions of rosacea, such as papules and pustules. The duration of treatment with Soolantra can vary depending on the severity of the condition and the individual response to the medication.

Typically, Soolantra is applied once daily for 12 weeks. During this time, patients may start to notice an improvement in their symptoms, such as a reduction in the number and size


### Using LangChain

Strangely, LangChain's ChatHuggingFace and HuggingFaceEndpoint classes don't work properly together. Specifically, the max_new_tokens parameter defined on the endpoint class is somehow overwritten when the LLM is piped into the chat model class. I found a workaround at https://github.com/langchain-ai/langchain/issues/23586: the hyperparameters must be bound to the chat model each time it is called.
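
As a rough mental model of what binding does, the sketch below attaches keyword arguments to a callable so that they are supplied on every call. `ToyRunnable` and `fake_generate` are hypothetical names made up for this illustration; this is not LangChain's implementation, only the general idea of pre-attaching call-time parameters.

```python
from typing import Callable


class ToyRunnable:
    """Hypothetical stand-in for a runnable; not LangChain's implementation."""

    def __init__(self, fn: Callable[..., str], **bound_kwargs):
        self.fn = fn
        self.bound_kwargs = bound_kwargs

    def bind(self, **kwargs) -> "ToyRunnable":
        # Return a new object that carries the merged keyword arguments.
        return ToyRunnable(self.fn, **{**self.bound_kwargs, **kwargs})

    def invoke(self, prompt: str, **kwargs) -> str:
        # Keyword arguments given at call time override the bound ones.
        return self.fn(prompt, **{**self.bound_kwargs, **kwargs})


def fake_generate(prompt: str, max_tokens: int = 20, temperature: float = 0.0) -> str:
    # Pretend model: just reports which settings it was called with.
    return f"{prompt!r} with max_tokens={max_tokens}, temperature={temperature}"


model = ToyRunnable(fake_generate)
bound = model.bind(max_tokens=1000, temperature=1)

print(model.invoke("Hi"))  # uses the defaults of fake_generate
print(bound.invoke("Hi"))  # uses the bound settings
```

Because `bind` returns a new object, the original `model` keeps its defaults, which is why the binding has to be applied wherever the bound behaviour is wanted.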

In [3]:
from langchain_huggingface import HuggingFaceEndpoint, ChatHuggingFace

# define LLM
llm = HuggingFaceEndpoint(
    repo_id="meta-llama/Meta-Llama-3-8B-Instruct",
    max_new_tokens=1000,
    top_k=10,
    top_p=0.95,
    typical_p=0.95,
    temperature=0.01,
    repetition_penalty=1.03,
)
# print(llm.invoke("What is Deep Learning?"))

# pipe LLM into chat model class
chat_model = ChatHuggingFace(llm=llm)

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /Users/danielsuarez-mash/.cache/huggingface/token
Login successful


In [4]:
from langchain_core.messages import HumanMessage, SystemMessage

messages = [
    HumanMessage(content="What is Deep Learning?")
]

print(chat_model.bind(max_tokens=1000, temperature=1).invoke(messages).content)

Deep learning is a subfield of machine learning that involves the use of artificial neural networks with multiple layers to analyze and interpret data. The term "deep" refers to the large number of layers in these networks, as opposed to traditional machine learning models which typically have only one layer.

In a deep neural network, each layer processes the output from the previous layer and uses it to calculate the output for the next layer. This processing is typically done using a series of neural connections, called "neurons," which are inspired by the way neurons in the human brain work.

Deep learning models are particularly effective in handling complex data sets that have multiple features and are high-dimensional, such as images, speech audio, and text. They are able to learn and represent complex patterns in this data, and can be used for a wide range of applications, including:

1. Image recognition: Deep learning models can be used to identify objects in images, such as 

In [5]:
# adding chat history
from langchain_core.chat_history import (
    BaseChatMessageHistory,
    InMemoryChatMessageHistory,
)
from langchain_core.runnables.history import RunnableWithMessageHistory

store = {} # this is where we will keep chat history for different sessions (conversations)


def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store: # check if we already have a conversation with this session ID
        store[session_id] = InMemoryChatMessageHistory() # create a new chat history for fresh session ID
    return store[session_id] # return chat history


with_message_history = RunnableWithMessageHistory(chat_model.bind(max_tokens=1000, temperature=1), get_session_history) # create chat bot 

In [6]:
messages = [
    SystemMessage(content="You are an medical expert on Rosacea. Your job is to provide sound medical advice."),
    HumanMessage(content="Hi, my name is Daniel. I think I have rosacea. Can you explain the symptoms and best prescriptions to reduce them please.")
]

config = {'configurable': {'session_id':'s1'}}
print(with_message_history.invoke(input=messages, config=config).content)

Hello Daniel! As a medical expert on Rosacea, I'd be happy to help you understand your symptoms and offer guidance on treatment options.

Rosacea is a chronic skin condition characterized by frequent flushing, redness, and inflammatory lesions on the face. The exact cause is still unknown, but it's believed to be linked to factors such as:

1. Overactive immune response
2. Genetic predisposition
3. Environmental triggers (sun, wind, heat, humidity, stress)
4. Demodex mites (tiny insects that live in follicles)

Common symptoms of rosacea include:

1. Redness and flushing: Hot, itchy, or burning sensations on the face, often accompanied by a noticeable pink or red glow.
2. Acne-like lesions: Small, pus-filled bumps, papules, or pustules, particularly on the nose, cheeks, forehead, and chin.
3. Telangiectasias: Visible blood vessels or fine lines on the skin.
4. Eye problems: Burning, itching, or dryness of the eyes, eyelids, or surrounding skin.

For mild to moderate rosacea, topical cr

In [7]:
messages = [
    HumanMessage(content="What is my name?")
]

print(with_message_history.invoke(input=messages, config=config).content)

Your name is Daniel.
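
The model can answer this only because the earlier turns are sent along with the new question. A library-free sketch of that idea follows; `toy_store`, `with_history` and `echo_name_model` are hypothetical names for illustration, and the real wrapper may differ in its details.

```python
from typing import Callable

Message = tuple[str, str]  # (role, content)
toy_store: dict[str, list[Message]] = {}


def get_toy_history(session_id: str) -> list[Message]:
    # Create an empty history the first time a session ID is seen.
    return toy_store.setdefault(session_id, [])


def with_history(model: Callable[[list[Message]], str]):
    def invoke(new_messages: list[Message], session_id: str) -> str:
        history = get_toy_history(session_id)
        reply = model(history + new_messages)  # model sees past + new messages
        history.extend(new_messages)           # record the user's turn
        history.append(("ai", reply))          # record the model's reply
        return reply
    return invoke


def echo_name_model(messages: list[Message]) -> str:
    """Fake model: knows a name only if it appears in the messages it is given."""
    for role, content in messages:
        if role == "human" and "my name is " in content:
            name = content.split("my name is ")[1].split(".")[0]
            return f"Your name is {name}."
    return "I don't know your name."


chat = with_history(echo_name_model)
chat([("human", "Hi, my name is Daniel.")], session_id="s1")
print(chat([("human", "What is my name?")], session_id="s1"))
print(chat([("human", "What is my name?")], session_id="other"))
```

The second call uses a fresh session ID, so its history is empty and the fake model has nothing to recall.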


In [8]:
print(store['s1'])

System: You are an medical expert on Rosacea. Your job is to provide sound medical advice.
Human: Hi, my name is Daniel. I think I have rosacea. Can you explain the symptoms and best prescriptions to reduce them please.
AI: Hello Daniel! As a medical expert on Rosacea, I'd be happy to help you understand your symptoms and offer guidance on treatment options.

Rosacea is a chronic skin condition characterized by frequent flushing, redness, and inflammatory lesions on the face. The exact cause is still unknown, but it's believed to be linked to factors such as:

1. Overactive immune response
2. Genetic predisposition
3. Environmental triggers (sun, wind, heat, humidity, stress)
4. Demodex mites (tiny insects that live in follicles)

Common symptoms of rosacea include:

1. Redness and flushing: Hot, itchy, or burning sensations on the face, often accompanied by a noticeable pink or red glow.
2. Acne-like lesions: Small, pus-filled bumps, papules, or pustules, particularly on the nose, che

In [9]:
messages = [
    HumanMessage(content="I'm currently using soolantra. Does this help to tackle rosacea?")
]

print(with_message_history.invoke(input=messages, config=config).content)

Soolantra is a great choice! Soolantra (;;

) is the brand name for metronidazole 1% gel, a topical antibiotic cream that's commonly used to treat rosacea.

Metronidazole targets the bacteria that can contribute to rosacea, such as Demodex mites, and helps reduce inflammation, redness, and acne-like lesions. It's often used to treat papules, pustules, and acne-like lesions on the face.

Soolantra can help in various ways:

1. Reduces inflammation: Metronidazole decreases inflammation by killing bacteria and reducing the immune response.
2. Improves skin clarity: By tackling bacteria and reducing inflammation, Soolantra can help improve the appearance of your skin, making it look clearer and smoother.
3. Reduces acne-like lesions: Soolantra can help reduce the number and severity of acne-like lesions, including papules and pustules.
4. Relieves itching and burning: Some people with rosacea experience itching or burning sensations on their skin, which may be alleviated with Soolantra.

I

### Prompt template

Now let's use a prompt template. This lets us define input variables whose values we supply when invoking the chain. For now, our only input is still a list of messages.
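
Conceptually, a template with a messages slot is a fixed part plus a gap that is filled at invocation time. The sketch below shows that with plain tuples; `build_prompt` and `SYSTEM_TEXT` are hypothetical names, not part of LangChain.

```python
def build_prompt(system_text: str, messages: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Fixed system message first, then whatever fills the 'messages' slot."""
    return [("system", system_text)] + list(messages)


SYSTEM_TEXT = "You are a medical expert on rosacea."

# The same template yields different prompts depending on the messages supplied.
first = build_prompt(SYSTEM_TEXT, [("human", "What causes rosacea?")])
second = build_prompt(SYSTEM_TEXT, [
    ("human", "What causes rosacea?"),
    ("ai", "Several factors are thought to contribute."),
    ("human", "Is it contagious?"),
])

for role, content in second:
    print(f"{role}: {content}")
```

Keeping the system message inside the template means callers only ever pass the conversation itself.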

In [11]:
# using prompt templates
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# create a prompt
prompt = ChatPromptTemplate(
    [
        SystemMessage(content="You are an medical expert on Rosacea. Your job is to provide sound medical advice."),
        MessagesPlaceholder(variable_name="messages") # this is where our messages input will go
    ]
)

# del store["s2"] # reset conversation history

chain = prompt | chat_model.bind(max_tokens=1000, temperature=1) # define a simple chain

with_message_history1 = RunnableWithMessageHistory(chain, get_session_history, input_messages_key="messages") # attach conversation history functionality

# print(prompt.invoke({'messages': [HumanMessage(content="Hi, my name is Daniel. I think I have rosacea. Can you explain the symptoms and best prescriptions to reduce them please.")]}).to_string())

In [12]:
config = {"configurable": {"session_id": "s2"}}

messages = [HumanMessage(content=(
    "Hi, my name is Daniel. I'm 26 years old and currently live in Solihull. "
    "I believe I have symptoms of rosacea, I've never had it before. "
    "How could it have come about?"
))]

print(with_message_history1.invoke(input={"messages": messages}, config=config).content)

Hello Daniel! Thank you for reaching out. It's great that you're taking the first step in understanding and addressing your concerns. As an expert in rosacea, I'd be happy to help you explore the possibilities.

Rosacea can occur at any age, but it typically starts between the ages of 30 and 50. However, it's not uncommon for younger individuals, like yourself, to experience symptoms early on.

There are several theories about how rosacea develops, but it's believed that a combination of factors contributes to its onset. Some of these factors include:

1. Genetic predisposition: If your parents or family members have rosacea, you may be more likely to develop it.
2. Vascular issues: Rosacea is often associated with abnormal blood vessel growth, which can lead to flushing and redness.
3. Demodex mites: These tiny mites live on the skin and can cause inflammation and irritation, which may contribute to rosacea.
4. Bacteria: Some research suggests that certain bacteria, such as Helicobact

In [13]:
messages = [HumanMessage(content="How old am I and where do I currently live")]

print(with_message_history1.invoke(input={"messages": messages}, config=config).content)

You're 26 years old, Daniel, and you currently live in Solihull!


In [14]:
print(store["s2"])

Human: Hi, my name is Daniel. I'm 26 years old and currently live in Solihull. I believe I have symptoms of rosacea, I've never had it before. How could it have come about?
AI: Hello Daniel! Thank you for reaching out. It's great that you're taking the first step in understanding and addressing your concerns. As an expert in rosacea, I'd be happy to help you explore the possibilities.

Rosacea can occur at any age, but it typically starts between the ages of 30 and 50. However, it's not uncommon for younger individuals, like yourself, to experience symptoms early on.

There are several theories about how rosacea develops, but it's believed that a combination of factors contributes to its onset. Some of these factors include:

1. Genetic predisposition: If your parents or family members have rosacea, you may be more likely to develop it.
2. Vascular issues: Rosacea is often associated with abnormal blood vessel growth, which can lead to flushing and redness.
3. Demodex mites: These tiny

What if we wanted more inputs to control other aspects of behaviour, such as the response language? For this we need to write our system message slightly differently, as a ("system", "...") tuple rather than a SystemMessage object, so that it can contain a {language} placeholder.
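
The distinction can be pictured as "ready-made message, used verbatim" versus "template string, filled in at invocation". The sketch below imitates that behaviour in plain Python; `LiteralMessage` and `render` are hypothetical names and this is not how the library is implemented internally.

```python
from dataclasses import dataclass


@dataclass
class LiteralMessage:
    """Stand-in for a ready-made message object: its text is used verbatim."""
    role: str
    content: str


def render(parts: list, **variables) -> list[tuple[str, str]]:
    rendered = []
    for part in parts:
        if isinstance(part, LiteralMessage):
            # Already a finished message: no substitution happens.
            rendered.append((part.role, part.content))
        else:
            # A (role, template) tuple: fill in the {placeholders}.
            role, template = part
            rendered.append((role, template.format(**variables)))
    return rendered


parts = [
    LiteralMessage("system", "Respond in {language}"),
    ("system", "Respond in {language}"),
]
for role, content in render(parts, language="Spanish"):
    print(f"{role}: {content}")
```

Only the tuple entry has its placeholder replaced, which is why the system message needs to be written in that form when it carries an input variable.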

In [16]:
# create a prompt
language_prompt = ChatPromptTemplate(
    [
        ("system", "You are an medical expert on Rosacea. Your job is to provide sound medical advice. Respond in {language}"),
        MessagesPlaceholder(variable_name="messages") # this is where our messages input will go
    ]
)

# del store["s3"] # delete conversation history if needed

chain = language_prompt | chat_model.bind(max_tokens=1000, temperature=1)

with_chat_history2 = RunnableWithMessageHistory(chain, get_session_history, input_messages_key="messages")

In [17]:
config = {"configurable":{"session_id":"s3"}}

messages = [HumanMessage(content=(
    "Hi, my name is Daniel. I'm 26 years old and currently live in Solihull. "
    "I believe I have symptoms of rosacea, I've never had it before. "
    "How could it have come about?"
))]

print(with_chat_history2.invoke(input={"messages":messages, "language":"spanish"}, config=config).content)

Hola Daniel, gracias por consultarme sobre tus síntomas. Es importante mencionar que el roaselga no es una condición contagiosa, por lo que no se transmite de persona a persona a través del contacto.

La causa exacta del roaselga no está completamente clara, pero se cree que se debe a una combinación de factores, incluyendo:

* La herencia: Si hay antecedentes de roaselga en tu familia, tienes un mayor riesgo de desarrollarla.
* El estrés: El estrés crónico puede aumentar la sensibilidad de la piel y contribuir al desarrollo de roaselga.
* La exposición al sol: La radiación ultravioleta (UV) del sol puede irritar la piel y desencadenar síntomas de roaselga.
* La humble de la piel: La carencia de sebohidroóxidos en la piel puede llevar a la inflamación y el enrojecimiento, características comunes del roaselga.
* El consumo de ciertos alimentos: Algunos alimentos y bebidas pueden desencadenar o exacerbaron tu roaselga. Ejemplos comunes incluyen el vino, el queso, los frutos secos y los a

In [18]:
messages = [HumanMessage(content=(
    "Hola AI, puedes contarme donde vivo yo? ¿Y que es mi nombre?"
))]

print(with_chat_history2.invoke(input={"messages":messages, "language":"spanish"}, config=config).content)

Hola Daniel! Vivo en Solihull, Reino Unido, y tu nombre es Daniel. ¡Recuerdo perfectamente que nos hablamos anteriormente sobre tu posible diagnóstico de roaselga!


In [19]:
print(store["s3"])

Human: Hi, my name is Daniel. I'm 26 years old and currently live in Solihull. I believe I have symptoms of rosacea, I've never had it before. How could it have come about?
AI: Hola Daniel, gracias por consultarme sobre tus síntomas. Es importante mencionar que el roaselga no es una condición contagiosa, por lo que no se transmite de persona a persona a través del contacto.

La causa exacta del roaselga no está completamente clara, pero se cree que se debe a una combinación de factores, incluyendo:

* La herencia: Si hay antecedentes de roaselga en tu familia, tienes un mayor riesgo de desarrollarla.
* El estrés: El estrés crónico puede aumentar la sensibilidad de la piel y contribuir al desarrollo de roaselga.
* La exposición al sol: La radiación ultravioleta (UV) del sol puede irritar la piel y desencadenar síntomas de roaselga.
* La humble de la piel: La carencia de sebohidroóxidos en la piel puede llevar a la inflamación y el enrojecimiento, características comunes del roaselga.
* 

In [20]:
with_chat_history2.get_prompts(config)

[ChatPromptTemplate(input_variables=['language', 'messages'], input_types={'messages': list[typing.Annotated[typing.Union[typing.Annotated[langchain_core.messages.ai.AIMessage, Tag(tag='ai')], typing.Annotated[langchain_core.messages.human.HumanMessage, Tag(tag='human')], typing.Annotated[langchain_core.messages.chat.ChatMessage, Tag(tag='chat')], typing.Annotated[langchain_core.messages.system.SystemMessage, Tag(tag='system')], typing.Annotated[langchain_core.messages.function.FunctionMessage, Tag(tag='function')], typing.Annotated[langchain_core.messages.tool.ToolMessage, Tag(tag='tool')], typing.Annotated[langchain_core.messages.ai.AIMessageChunk, Tag(tag='AIMessageChunk')], typing.Annotated[langchain_core.messages.human.HumanMessageChunk, Tag(tag='HumanMessageChunk')], typing.Annotated[langchain_core.messages.chat.ChatMessageChunk, Tag(tag='ChatMessageChunk')], typing.Annotated[langchain_core.messages.system.SystemMessageChunk, Tag(tag='SystemMessageChunk')], typing.Annotated[langc

## Quick RAG

Recalling the llm_handbook, also in this repo, RAG involves the following steps:

- Indexing
    - Load
    - Split
    - Store
- Retrieval
- Generation

### Load

Let's load a PDF using pypdf.

In [21]:
# Load
from langchain_community.document_loaders import PyPDFLoader

file_path = 'example_documents/resume.pdf'
loader = PyPDFLoader(file_path)
pages = list(loader.lazy_load())  # one Document per PDF page

print(pages[1].page_content[:400])

Page 2 of 2 
INTERESTS 
Football 
Piano 
EDUCATION 
MSc Electrical Automotive Engineering - Distinction & Best Student 
Coventry University 
09/2019 - 08/2020
, 
 
Thesis - Classifying Symbolic Road Markings
using deep Convolutional Neural Networks
(77/100), published in the European Union
Digital Library. 
Power Semiconductor Devices and Converters
(97/100) - highest out of cohort. 
BSc Mathemati


### Split

Our PDF document is already split by page. But we'd like to split it further into chunks based on the amount of text in each one. LangChain lets us do this; here each chunk holds up to 1000 characters and consecutive chunks overlap by 200.
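
To see what chunk size, overlap and start index mean, here is a deliberately simplified sketch that cuts fixed-size character windows. This is an assumption-laden stand-in: the recursive splitter prefers to break on separators such as paragraphs and lines, which this toy `split_with_overlap` function (a hypothetical name) does not attempt.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[dict]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append({"start_index": start, "text": text[start:start + chunk_size]})
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(25))  # 25 characters of repeating digits
for chunk in split_with_overlap(sample, chunk_size=10, chunk_overlap=4):
    print(chunk)
```

The overlap means the end of one chunk is repeated at the start of the next, so a sentence cut at a boundary still appears whole in at least one chunk.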

In [22]:
# split
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200, add_start_index=True
)
all_splits = text_splitter.split_documents(pages)

In [23]:
print(all_splits[1].page_content[:192]) # see page content

retrieval and engineering for immigration data processing. I earned Home Oﬃce's Performance
Excellence Award for this work. 
Advanced to Senior Data Scientist within 12 months, assuming full r


In [24]:
all_splits[1].metadata # see metadata associated with text chunk

{'source': 'example_documents/resume.pdf', 'page': 0, 'start_index': 813}

### Store

We will embed our text chunks using an embedding model from HuggingFace and we will store these embeddings using FAISS.
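
For intuition about what "embedding" produces, the sketch below turns text into a fixed-length vector with a hashed bag-of-words. This is only a toy: a real embedding model captures meaning, whereas `toy_embed` (a hypothetical function) only counts words. The dimension of 16 is an arbitrary choice for the example.

```python
import math
import re
import zlib


def toy_embed(text: str, dim: int = 16) -> list[float]:
    """Hashed bag-of-words vector, L2-normalised. Not a real embedding model."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


# "Storing" then amounts to keeping each chunk next to its vector.
embedded_chunks = [
    (chunk, toy_embed(chunk))
    for chunk in ["Python and SQL skills", "Plays football and piano"]
]
print(len(embedded_chunks), "vectors of length", len(embedded_chunks[0][1]))
```

Every chunk maps to a vector of the same length regardless of how long the text is, which is what makes the vectors comparable later.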

In [25]:
from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings

# the embeddings model wants the API key passed explicitly for some reason
load_dotenv()
hf_key = os.getenv('HUGGINGFACEHUB_API_TOKEN')

embeddings_model = HuggingFaceInferenceAPIEmbeddings(
    api_key=hf_key, model_name="sentence-transformers/all-MiniLM-l6-v2"
)

from langchain_community.vectorstores import FAISS

vector_store = FAISS.from_documents(documents=all_splits, embedding=embeddings_model)

### Retrieval 

This can be achieved simply using the vector store's similarity search method, which returns the k chunks closest to the query.
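
Under the hood, a similarity search compares the query vector with every stored vector and keeps the closest ones. The sketch below does this by brute force, assuming squared Euclidean distance as the metric; the three-dimensional vectors are hand-made for illustration and `similarity_search` here is a hypothetical free function, not the vector store method.

```python
import heapq


def squared_l2(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def similarity_search(query_vec: list[float], index: list[tuple[str, list[float]]], k: int = 1) -> list[str]:
    """Return the k texts whose vectors are closest to query_vec, nearest first."""
    nearest = heapq.nsmallest(k, index, key=lambda item: squared_l2(query_vec, item[1]))
    return [text for text, _ in nearest]


# Hand-made 3-dimensional "embeddings", purely illustrative.
toy_index = [
    ("Skills: Python, SQL, TensorFlow", [0.9, 0.1, 0.0]),
    ("Interests: football, piano", [0.0, 0.2, 0.9]),
    ("Education: MSc engineering", [0.1, 0.9, 0.1]),
]
query_vec = [0.8, 0.2, 0.1]  # pretend this encodes a question about Python experience

print(similarity_search(query_vec, toy_index, k=2))
```

Raising `k` returns more chunks, which gives the generation step more context at the cost of a longer prompt.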

In [26]:
print(vector_store.similarity_search("What Python experience does Daniel have?", k=1)[0].page_content)

Page 1 of 2 
Daniel Suarez-Mash 
Senior Data Scientist at UK Home Oﬃce 
daniel.suarez.mash@gmail.co
m 
07930262794 
Solihull, United Kingdom 
huggingface.co/amateurish-
coder 
linkedin.com/in/daniel-
suarez-mash-05356511b 
github.com/amateurish-
coder 
SKILLS 
Python 
TensorFlow 
Pytorch 
PowerBI 
SQL 
Jupyter 
PyCharm 
Git 
Command Line Interface 
AWS 
LANGUAGES 
Spanish 
Native or Bilingual Proﬁciency 
German 
Elementary Proﬁciency 
INTERESTS 
Data Science 
Cars 
Squash 
Tennis 
WORK EXPERIENCE 
Senior Data Scientist 
UK Home Oﬃce 
12/2021 - Present
, 
 
Enhanced core data science skillset by completing the ONS Data Science Graduate Programme
(2021-2023), focusing on advanced machine learning techniques. 
Spearheaded a 6-month project to develop a reproducible analytical pipeline, optimising feature
retrieval and engineering for immigration data processing. I earned Home Oﬃce's Performance
Excellence Award for this work.


### Generation

Let's define a prompt for RAG and chain it together with the retriever and the LLM.

In [27]:
from langchain_huggingface import HuggingFaceEndpoint, ChatHuggingFace
from langchain_core.prompts import PromptTemplate, ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables import RunnablePassthrough

# ----- PROMPT
template = (
    "You are a helpful AI assistant specialised in reading CVs. Your job is to assist the user in understanding the CV in question. "
    "When asked to, you may also provide advice on how to improve the CV in any way you see fit. "
    "Below are chunks of the CV which are relevant to the question at the end.\n\n"
    "Context: \n\n{context} \n\n"
    "Question: {question}"
)
prompt = PromptTemplate(template=template, input_variables=["context", "question"])

# ----- RETRIEVER
retriever = vector_store.as_retriever()

# ----- LLM
llm = HuggingFaceEndpoint(
    repo_id="meta-llama/Meta-Llama-3-8B-Instruct",
    max_new_tokens=1000
)

def format_docs(docs):
    # Join the page_content of the retrieved docs, separated by blank lines
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = ({"context" : retriever | format_docs, "question" : RunnablePassthrough()}
            | prompt
            | llm
)

print(rag_chain.invoke("What are Daniel's strengths in this CV?"))


 

Answer: 

Based on the CV, Daniel's strengths are: 

1. **Data Science skills**: Daniel has a strong background in data science, with experience in machine learning, natural language processing, and data visualization. He has also worked on several projects, showcasing his ability to develop reproducible analytical pipelines and optimize feature retrieval and engineering.
2. **Leadership skills**: Daniel has demonstrated leadership skills, having led a team to explore the use of large language models within the Home Oﬃce and overseeing learning and development initiatives within the data science team.
3. **Communication skills**: D
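
The chain above can be read as three steps: the question fans out into a context branch and a pass-through branch, the resulting dict fills the prompt, and the prompt goes to the model. The sketch below spells those steps out as ordinary function calls. All names (`toy_retriever`, `format_chunks`, `fill_prompt`, `fake_llm`, `toy_rag_chain`) are hypothetical, the retriever returns fixed text, and no real model is called.

```python
def toy_retriever(question: str) -> list[str]:
    """Pretend retrieval: returns fixed chunks regardless of the question."""
    return ["SKILLS: Python, SQL", "WORK EXPERIENCE: Senior Data Scientist"]


def format_chunks(chunks: list[str]) -> str:
    # Same idea as format_docs: join chunk texts with blank lines between them.
    return "\n\n".join(chunks)


def fill_prompt(inputs: dict) -> str:
    return "Context:\n\n{context}\n\nQuestion: {question}".format(**inputs)


def fake_llm(prompt_text: str) -> str:
    # Stand-in for the model call: reports the prompt size instead of answering.
    return f"[model received {len(prompt_text)} characters]"


def toy_rag_chain(question: str) -> tuple[str, str]:
    # Step 1: the same question feeds both entries of the dict.
    inputs = {
        "context": format_chunks(toy_retriever(question)),
        "question": question,  # passed through unchanged
    }
    # Step 2: the dict fills the prompt template.
    prompt_text = fill_prompt(inputs)
    # Step 3: the finished prompt goes to the model.
    return prompt_text, fake_llm(prompt_text)


prompt_text, answer = toy_rag_chain("What are Daniel's strengths in this CV?")
print(prompt_text)
print(answer)
```

Printing the filled prompt like this is a handy way to check what the model actually sees before judging its answer.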