# Learning LangChain V0.3

In [1]:
import os
from dotenv import load_dotenv

load_dotenv()
os.environ.get('HUGGINGFACEHUB_API_TOKEN');

## Building a simple chatbot

Here we use LangChain's chat models. These are language models that take in a sequence of messages and return a chat message. This differs from plain LLMs, which take in and return plain strings.
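The difference between the two interfaces can be sketched in plain Python. The functions and the `Message` class below are hypothetical stand-ins, not LangChain classes: one maps a string to a string, the other maps a list of role-tagged messages to a single reply message.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str      # "system", "human" or "ai"
    content: str


def toy_llm(prompt: str) -> str:
    """LLM-style interface: plain string in, plain string out."""
    return f"(completion of: {prompt})"


def toy_chat_model(messages: list[Message]) -> Message:
    """Chat-style interface: list of messages in, one AI message out."""
    last_human = [m for m in messages if m.role == "human"][-1]
    return Message(role="ai", content=f"(reply to: {last_human.content})")


text_out = toy_llm("What is Deep Learning?")
chat_out = toy_chat_model([
    Message("system", "You are a helpful assistant."),
    Message("human", "What is Deep Learning?"),
])
print(type(text_out).__name__, "|", chat_out.role)
```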

So let's define our LLM and then interact with it as a chat model.

### Using HuggingFace directly

In [3]:
from huggingface_hub import InferenceClient
client = InferenceClient(model="meta-llama/Meta-Llama-3-8B-Instruct")

question = "You are a medical expert and deeply knowledgeable with the skin condition rosacea. When prompted, respond to questions with sound medical advice. How long does it take for soolantra to reduce symptoms?"
print(client.text_generation(prompt=question, max_new_tokens=100))

...
Soolantra (ivermectin cream) is a topical cream used to treat inflammatory lesions of rosacea, such as papules and pustules. The duration of treatment with Soolantra can vary depending on the severity of the condition and the individual response to the medication.

Typically, Soolantra is applied once daily for 12 weeks. During this time, patients may start to notice an improvement in their symptoms, such as a reduction in the number and size


### Using Langchain

Strangely, LangChain's ChatHuggingFace and HuggingFaceEndpoint classes don't work properly together. Specifically, the max_new_tokens parameter defined on the endpoint class is somehow overwritten when the LLM is piped into the chat model class. I found a workaround here https://github.com/langchain-ai/langchain/issues/23586 where hyperparameters must be bound to the chat model each time it is called.
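Binding here means fixing some call-time keyword arguments in advance so that every later call uses them. As a rough analogy only (this is not how LangChain implements `bind`), the same idea in plain Python resembles `functools.partial`. `toy_generate` is a hypothetical stand-in for the model call.

```python
from functools import partial


def toy_generate(prompt: str, max_tokens: int = 100, temperature: float = 0.7) -> dict:
    """Hypothetical model call that simply reports the settings it received."""
    return {"prompt": prompt, "max_tokens": max_tokens, "temperature": temperature}


# Fix the generation settings once, in the spirit of chat_model.bind(...)
bound_generate = partial(toy_generate, max_tokens=1000, temperature=1)

result = bound_generate("What is Deep Learning?")
print(result["max_tokens"], result["temperature"])
```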

In [4]:
from langchain_huggingface import HuggingFaceEndpoint, ChatHuggingFace

# define LLM
llm = HuggingFaceEndpoint(
    repo_id="meta-llama/Meta-Llama-3-8B-Instruct",
    max_new_tokens=1000,
    top_k=10,
    top_p=0.95,
    typical_p=0.95,
    temperature=0.01,
    repetition_penalty=1.03,
)
# print(llm.invoke("What is Deep Learning?"))

# pipe LLM into chat model class
chat_model = ChatHuggingFace(llm=llm)

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /Users/danielsuarez-mash/.cache/huggingface/token
Login successful


In [5]:
from langchain_core.messages import HumanMessage, SystemMessage

messages = [
    HumanMessage(content="What is Deep Learning?")
]

print(chat_model.bind(max_tokens=1000, temperature=1).invoke(messages).content)

Deep learning is a subset of machine learning that involves the use of artificial neural networks to analyze and interpret data. These artificial neural networks are composed of multiple layers of interconnected nodes or "neurons," which process and transmit information.

The term "deep" refers to the fact that these neural networks can have multiple layers, which allows them to learn complex patterns and relationships in data. In contrast, shallow neural networks typically have only two or three layers and are less effective at learning complex patterns.

Deep learning is particularly well-suited to tasks that involve:

1. Computer vision: Deep learning networks can be trained to identify objects in images, detect faces, and recognize handwriting, among other tasks.
2. Natural language processing: Deep learning networks can be used to analyze and understand natural language, such as text, speech, and dialogue.
3. Speech recognition: Deep learning networks can be trained to recognize s

In [9]:
# adding chat history
from langchain_core.chat_history import (
    BaseChatMessageHistory,
    InMemoryChatMessageHistory,
)
from langchain_core.runnables.history import RunnableWithMessageHistory

store = {} # this is where we will keep chat history for different sessions (conversations)


def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store: # check if we already have a conversation with this session ID
        store[session_id] = InMemoryChatMessageHistory() # create a new chat history for fresh session ID
    return store[session_id] # return chat history


with_message_history = RunnableWithMessageHistory(chat_model.bind(max_tokens=1000, temperature=1), get_session_history) # create chat bot 
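Conceptually, a history wrapper of this kind looks up the stored messages for the session, shows the model the old and new messages together, and then records both turns. The following is a minimal plain-Python sketch of that idea, not LangChain's implementation. `toy_model` and `invoke_with_history` are hypothetical names.

```python
store = {}  # session_id -> list of (role, content) tuples


def get_history(session_id):
    """Return the history for a session, creating it on first use."""
    return store.setdefault(session_id, [])


def toy_model(messages):
    """Hypothetical model: reports how many messages it was shown."""
    return ("ai", f"I can see {len(messages)} message(s).")


def invoke_with_history(new_messages, session_id):
    history = get_history(session_id)
    reply = toy_model(history + new_messages)   # model sees old + new messages
    history.extend(new_messages)                # remember the user's turn
    history.append(reply)                       # remember the model's turn
    return reply


first = invoke_with_history([("human", "Hi, my name is Daniel.")], "s1")
second = invoke_with_history([("human", "What is my name?")], "s1")
other = invoke_with_history([("human", "What is my name?")], "s2")
print(first[1], second[1], other[1], sep="\n")
```

The second call in session `s1` sees the earlier exchange, whereas the call in session `s2` starts from an empty history.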

In [7]:
messages = [
    SystemMessage(content="You are an medical expert on Rosacea. Your job is to provide sound medical advice."),
    HumanMessage(content="Hi, my name is Daniel. I think I have rosacea. Can you explain the symptoms and best prescriptions to reduce them please.")
]

config = {'configurable': {'session_id':'s1'}}
print(with_message_history.invoke(input=messages, config=config).content)

Hello Daniel! I'm happy to help you with your concern.

Rosacea is a chronic skin condition characterized by frequent and persistent symptoms on the face, mainly on the cheeks, nose, forehead, and chin. The symptoms can vary from person to person, but common complaints include:

1. Redness: Flushing, or sudden and intense redness, which can last for hours or days.
2. Acne-like lesions: Pimples, papules, pustules, and cysts, which can be inflammatory or non-inflammatory.
3. Bumpy skin: Coarse, thickened skin, often with large pores, which can occur on the nose, cheeks, and forehead.
4. Visible blood vessels: Small, visible blood vessels under the skin, which can give the appearance of a flushed or irritated complexion.
5. Eye symptoms: Dryness, itchiness, or gritty sensations in the eyes, which can be accompanied by redness and swelling.
6. Sensitivity: Increased sensitivity to temperature, sunlight, wind, and other triggers.

As a medical expert on rosacea, I can assure you that a comb

In [50]:
messages = [
    HumanMessage(content="What is my name?")
]

print(with_message_history.invoke(input=messages, config=config).content)

Your name is Daniel.


In [53]:
print(store['s1'])

System: You are an medical expert on Rosacea. Your job is to provide sound medical advice.
Human: Hi, my name is Daniel. I think I have rosacea. Can you explain the symptoms and best prescriptions to reduce them please.
AI: Nice to meet you, Daniel! I'd be happy to help you understand rosacea and explore treatment options.

Rosacea is a chronic and inflammatory skin condition that affects approximately 16 million people in the United States. It typically appears on the central part of the face, including the nose, cheeks, forehead, and chin. The symptoms can vary, but common signs of rosacea include:

1. Chronically red skin
2. Flushing (unsolicited blushing)
3. Acne-like papules, pustules, or comedones
4. Enlarged blood vessels, often appearing as fine lines or spider veins
5. Dryness and sensitivity
6. Irritation and burning

There is no cure for rosacea, but we can work together to manage and reduce your symptoms. Here are some prescription options that may help:

1. Azelaic acid (A

In [54]:
messages = [
    HumanMessage(content="I'm currently using soolantra. Does this help to tackle rosacea?")
]

print(with_message_history.invoke(input=messages, config=config).content)

Soolantra (ivermectin cream) is a prescription medication specifically approved for the treatment of inflammatory lesions of rosacea. It's a topical cream that helps to reduce the appearance of papules and pustules, and can also help to reduce inflammation and redness.

Ivermectin, the active ingredient in Soolantra, works by inhibiting the production of inflammatory chemical mediators, which helps to reduce inflammation and improve the clinical appearance of rosacea symptoms.

Soolantra is usually applied once daily, morning and evening, to the affected area, typically for 12-20 weeks. It's important to use the medication as directed by your dermatologist, and to complete the full treatment course to achieve optimal results.

In clinical trials, Soolantra has been shown to:

* Significantly reduce the number of inflammatory lesions of rosacea
* Improve the clinical appearance of rosacea symptoms
* Reduce erythema (redness) and inflammation

It's great that you're already using Soolant

### Prompt template

Now let's use a prompt template. This lets us define input variables whose values are supplied when the chain is invoked. For now, our only input is a list of messages.
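The role of a messages placeholder can be sketched in plain Python: fixed messages are copied as they are, and the placeholder is replaced by whatever list of messages the caller supplies. This is an illustrative sketch only. `build_prompt` and the `"placeholder"` marker are hypothetical, not LangChain's API.

```python
def build_prompt(template, inputs):
    """Expand a template into a flat message list.

    Template items are either fixed (role, content) tuples or a
    ("placeholder", variable_name) marker that is replaced by the
    list of messages supplied under that name.
    """
    messages = []
    for role, value in template:
        if role == "placeholder":
            messages.extend(inputs[value])   # splice in the caller's messages
        else:
            messages.append((role, value))   # fixed message, copied as-is
    return messages


template = [
    ("system", "You are a medical expert on rosacea."),
    ("placeholder", "messages"),
]
final = build_prompt(template, {"messages": [("human", "What causes rosacea?")]})
for role, content in final:
    print(f"{role}: {content}")
```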

In [19]:
# using prompt templates
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# create a prompt
prompt = ChatPromptTemplate(
    [
        SystemMessage(content="You are an medical expert on Rosacea. Your job is to provide sound medical advice."),
        MessagesPlaceholder(variable_name="messages") # this is where our messages input will go
    ]
)

store.pop("s2", None) # reset conversation history (no error if the session does not exist yet)

chain = prompt | chat_model.bind(max_tokens=1000, temperature=1) # define a simple chain

with_message_history1 = RunnableWithMessageHistory(chain, get_session_history, input_messages_key="messages") # attach conversation history functionality

# print(prompt.invoke({'messages': [HumanMessage(content="Hi, my name is Daniel. I think I have rosacea. Can you explain the symptoms and best prescriptions to reduce them please.")]}).to_string())

In [106]:
config = {"configurable": {"session_id": "s2"}}

messages = [HumanMessage(content=(
    "Hi, my name is Daniel. I'm 26 years old and currently live in Solihull. "
    "I believe I have symptoms of rosacea, I've never had it before. "
    "How could it have come about?"
))]

print(with_message_history1.invoke(input={"messages": messages}, config=config).content)

Hello Daniel! Thank you for reaching out. I'm happy to help you understand your symptoms and provide guidance on rosacea.

Rosacea is a chronic skin condition that affects approximately 16 million people in the United States alone. It's often characterized by redness, flushing, acne-like lesions, and visible blood vessels on the face. Rosacea can appear at any age, but it commonly begins between the ages of 30 and 50.

As you're experiencing symptoms at 26, it's possible that your rosacea may have developed gradually over time. There are several factors that may contribute to the onset of rosacea, including:

1. Genetics: If you have a family history of rosacea, you may be more likely to develop it. Research suggests that genetics play a significant role in rosacea.
2. Environmental triggers: Exposure to temperature changes, humidity, sun exposure, and certain irritants can trigger rosacea symptoms.
3. Skin sensitivity: If you have sensitive skin, you may be more prone to rosacea.
4. H

In [107]:
messages = [HumanMessage(content="How old am I and where do I currently live")]

print(with_message_history1.invoke(input={"messages": messages}, config=config).content)

Daniel! According to our conversation, you're 26 years old and currently live in Solihull.


In [108]:
print(store["s2"])

Human: Hi, my name is Daniel. I'm 26 years old and currently live in Solihull. I believe I have symptoms of rosacea, I've never had it before. How could it have come about?
AI: Hello Daniel! Thank you for reaching out. I'm happy to help you understand your symptoms and provide guidance on rosacea.

Rosacea is a chronic skin condition that affects approximately 16 million people in the United States alone. It's often characterized by redness, flushing, acne-like lesions, and visible blood vessels on the face. Rosacea can appear at any age, but it commonly begins between the ages of 30 and 50.

As you're experiencing symptoms at 26, it's possible that your rosacea may have developed gradually over time. There are several factors that may contribute to the onset of rosacea, including:

1. Genetics: If you have a family history of rosacea, you may be more likely to develop it. Research suggests that genetics play a significant role in rosacea.
2. Environmental triggers: Exposure to tempera

What if we wanted more inputs to control other aspects of behaviour, such as the response language? For this we need to write our system message slightly differently, as shown below, so that it can contain a {language} placeholder.
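Filling a named placeholder such as {language} behaves much like Python's `str.format`. The sketch below is plain Python under that assumption and does not call LangChain. `render_system` is a hypothetical helper.

```python
system_template = "You are a medical expert on rosacea. Respond in {language}."


def render_system(template: str, **variables: str) -> str:
    """Fill the named placeholders; a missing variable raises KeyError."""
    return template.format(**variables)


spanish = render_system(system_template, language="Spanish")
print(spanish)

try:
    render_system(system_template)   # no language supplied
except KeyError as err:
    missing = err.args[0]
    print("missing variable:", missing)
```

This is also why every invocation of a chain built on such a template has to supply a value for each variable.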

In [114]:
# create a prompt
language_prompt = ChatPromptTemplate(
    [
        ("system", "You are an medical expert on Rosacea. Your job is to provide sound medical advice. Respond in {language}"),
        MessagesPlaceholder(variable_name="messages") # this is where our messages input will go
    ]
)

store.pop("s3", None) # delete conversation history if it exists

chain = language_prompt | chat_model.bind(max_tokens=1000, temperature=1)

with_chat_history2 = RunnableWithMessageHistory(chain, get_session_history, input_messages_key="messages")

In [115]:
config = {"configurable":{"session_id":"s3"}}

messages = [HumanMessage(content=(
    "Hi, my name is Daniel. I'm 26 years old and currently live in Solihull. "
    "I believe I have symptoms of rosacea, I've never had it before. "
    "How could it have come about?"
))]

print(with_chat_history2.invoke(input={"messages":messages, "language":"spanish"}, config=config).content)

Hola Daniel. Me alegra que hayas buscado ayuda médica. La rosacea es un trastorno común que afecta a personas de todas las edades, aunque es más común en mujeres y personas con antecedentes familiares. Aparentemente, no hay una causa única que lo produzca, pero existen varios factores que pueden desencadenarla.

Algunos posibles desencadenantes son:

* La exposición solar prolongada: la luz UV puede irritar la piel y hacer que los vasos sanguíneos se expandan, lo que puede llevar a la aparición detestingas.
* Los alimentos picantes o alérgicos: algunos alimentos pueden causar reacciones alérgicas o inflamación en la piel.
* El estrés: el estrés puede llevar a la liberación de hormonas que afectan la piel.
* Genética: si tienes familiares con rosacea, es más probable que también la desarrollen.
* Hormonal: los cambios hormonales pueden influir en la piel y aumentar la susceptibilidad a la rosacea.

Es importante mencionar que no hay un método para "contractar" la rosacea. Sin embargo, s

In [116]:
messages = [HumanMessage(content=(
    "Hola AI, puedes contarme donde vivo yo? ¿Y que es mi nombre?"
))]

print(with_chat_history2.invoke(input={"messages":messages, "language":"spanish"}, config=config).content)

Hola Daniel! reside actualmente en Solihull, Reino Unido. Y, sí, tu nombre es Daniel, ¿ correcto?


In [118]:
print(store["s3"])

Human: Hi, my name is Daniel. I'm 26 years old and currently live in Solihull. I believe I have symptoms of rosacea, I've never had it before. How could it have come about?
AI: Hola Daniel. Me alegra que hayas buscado ayuda médica. La rosacea es un trastorno común que afecta a personas de todas las edades, aunque es más común en mujeres y personas con antecedentes familiares. Aparentemente, no hay una causa única que lo produzca, pero existen varios factores que pueden desencadenarla.

Algunos posibles desencadenantes son:

* La exposición solar prolongada: la luz UV puede irritar la piel y hacer que los vasos sanguíneos se expandan, lo que puede llevar a la aparición detestingas.
* Los alimentos picantes o alérgicos: algunos alimentos pueden causar reacciones alérgicas o inflamación en la piel.
* El estrés: el estrés puede llevar a la liberación de hormonas que afectan la piel.
* Genética: si tienes familiares con rosacea, es más probable que también la desarrollen.
* Hormonal: los ca

In [123]:
with_chat_history2.get_prompts(config)

[ChatPromptTemplate(input_variables=['language', 'messages'], input_types={'messages': typing.List[typing.Annotated[typing.Union[typing.Annotated[langchain_core.messages.ai.AIMessage, Tag(tag='ai')], typing.Annotated[langchain_core.messages.human.HumanMessage, Tag(tag='human')], typing.Annotated[langchain_core.messages.chat.ChatMessage, Tag(tag='chat')], typing.Annotated[langchain_core.messages.system.SystemMessage, Tag(tag='system')], typing.Annotated[langchain_core.messages.function.FunctionMessage, Tag(tag='function')], typing.Annotated[langchain_core.messages.tool.ToolMessage, Tag(tag='tool')], typing.Annotated[langchain_core.messages.ai.AIMessageChunk, Tag(tag='AIMessageChunk')], typing.Annotated[langchain_core.messages.human.HumanMessageChunk, Tag(tag='HumanMessageChunk')], typing.Annotated[langchain_core.messages.chat.ChatMessageChunk, Tag(tag='ChatMessageChunk')], typing.Annotated[langchain_core.messages.system.SystemMessageChunk, Tag(tag='SystemMessageChunk')], typing.Annotate

## Quick RAG

Recalling the llm_handbook (also in this repo), RAG involves the following steps:

- Indexing
    - Load
    - Split
    - Store
- Retrieval
- Generation

### Load

Let's load a PDF using pypdf.

In [10]:
# Load
from langchain_community.document_loaders import PyPDFLoader

file_path = 'example_documents/resume.pdf'
loader = PyPDFLoader(file_path)
pages = list(loader.lazy_load()) # one Document per PDF page

print(pages[1].page_content[:400])

Page 2 of 2 
INTERESTS 
Football 
Piano 
EDUCATION 
MSc Electrical Automotive Engineering - Distinction & Best Student 
Coventry University 
09/2019 - 08/2020
, 
 
Thesis - Classifying Symbolic Road Markings
using deep Convolutional Neural Networks
(77/100), published in the European Union
Digital Library. 
Power Semiconductor Devices and Converters
(97/100) - highest out of cohort. 
BSc Mathemati


### Split

The loader has already split the PDF into one document per page, but we'd like to split it further so that each chunk holds a bounded amount of text. LangChain's text splitters let us do this.
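The effect of a chunk size, a chunk overlap and a recorded start index can be sketched with a fixed sliding window. This is a simplification: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraph breaks, newlines and spaces, which this hypothetical `split_text` helper ignores.

```python
def split_text(text: str, chunk_size: int, chunk_overlap: int) -> list[dict]:
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters, and each chunk
    records the index in the original text where it starts.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append({"start_index": start, "text": text[start:start + chunk_size]})
        if start + chunk_size >= len(text):
            break   # the last window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_text(sample, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk["start_index"], chunk["text"])
```

The overlap means the end of one chunk is repeated at the start of the next, so a sentence cut at a boundary still appears whole in at least one chunk when it is shorter than the overlap.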

In [32]:
# split
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200, add_start_index=True
)
all_splits = text_splitter.split_documents(pages)

In [33]:
print(all_splits[1].page_content[:192]) # see page content

retrieval and engineering for immigration data processing. I earned Home Oﬃce's Performance
Excellence Award for this work. 
Advanced to Senior Data Scientist within 12 months, assuming full r


In [34]:
all_splits[1].metadata # see metadata associated with text chunk

{'source': 'example_documents/resume.pdf', 'page': 0, 'start_index': 813}

### Store

We will embed our text chunks using an embedding model from HuggingFace and we will store these embeddings using FAISS.

In [35]:
from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings

# this embeddings class needs the API key passed explicitly
load_dotenv()
hf_key = os.getenv('HUGGINGFACEHUB_API_TOKEN')

embeddings_model = HuggingFaceInferenceAPIEmbeddings(
    api_key=hf_key, model_name="sentence-transformers/all-MiniLM-l6-v2"
)

from langchain_community.vectorstores import FAISS

vector_store = FAISS.from_documents(documents=all_splits, embedding=embeddings_model)

### Retrieval 

Retrieval can be done directly with the vector store's similarity search method.
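The idea behind a similarity search is to embed the query, compare it with every stored vector and return the closest chunks. The sketch below uses a toy word-count embedding and cosine similarity. Both are assumptions made for illustration: the real pipeline uses a learned embedding model and a FAISS index, whose distance metric may differ.

```python
import math


def embed(text: str, vocabulary: list[str]) -> list[float]:
    """Toy embedding: count how often each vocabulary word occurs."""
    words = text.lower().split()
    return [float(words.count(term)) for term in vocabulary]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_search(query, docs, vocabulary, k=1):
    """Return the k documents whose vectors are most similar to the query vector."""
    q = embed(query, vocabulary)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d, vocabulary)), reverse=True)
    return ranked[:k]


vocabulary = ["python", "piano", "football", "sql"]
docs = ["skills python sql git", "interests football piano"]
top = similarity_search("what python experience is listed", docs, vocabulary, k=1)
print(top[0])
```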

In [36]:
print(vector_store.similarity_search("What Python experience does Daniel have?", k=1)[0].page_content)

Page 1 of 2 
Daniel Suarez-Mash 
Senior Data Scientist at UK Home Oﬃce 
daniel.suarez.mash@gmail.co
m 
07930262794 
Solihull, United Kingdom 
huggingface.co/amateurish-
coder 
linkedin.com/in/daniel-
suarez-mash-05356511b 
github.com/amateurish-
coder 
SKILLS 
Python 
TensorFlow 
Pytorch 
PowerBI 
SQL 
Jupyter 
PyCharm 
Git 
Command Line Interface 
AWS 
LANGUAGES 
Spanish 
Native or Bilingual Proﬁciency 
German 
Elementary Proﬁciency 
INTERESTS 
Data Science 
Cars 
Squash 
Tennis 
WORK EXPERIENCE 
Senior Data Scientist 
UK Home Oﬃce 
12/2021 - Present
, 
 
Enhanced core data science skillset by completing the ONS Data Science Graduate Programme
(2021-2023), focusing on advanced machine learning techniques. 
Spearheaded a 6-month project to develop a reproducible analytical pipeline, optimising feature
retrieval and engineering for immigration data processing. I earned Home Oﬃce's Performance
Excellence Award for this work.


### Generation

Let's define a prompt for RAG and chain it with the retriever and the LLM.
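As a rough picture of the data flow in the generation step, the question is used twice: once to retrieve context and once, unchanged, to fill the prompt. The plain-Python sketch below mimics that flow with hypothetical stand-ins (`toy_retriever`, `fill_prompt`, `toy_llm`) and does not use LangChain.

```python
def toy_retriever(question: str) -> list[str]:
    """Hypothetical retriever: returns fixed chunks regardless of the question."""
    return ["Senior Data Scientist at UK Home Office", "Skills: Python, SQL"]


def format_docs(chunks: list[str]) -> str:
    """Join the retrieved chunks into one context string."""
    return "\n\n".join(chunks)


def fill_prompt(context: str, question: str) -> str:
    """Insert the context and the question into a fixed prompt layout."""
    return f"Context:\n\n{context}\n\nQuestion: {question}"


def toy_llm(prompt: str) -> str:
    """Hypothetical model: reports the size of the prompt it received."""
    return f"(answer based on a {len(prompt)}-character prompt)"


def rag(question: str) -> str:
    # The question goes down two branches: one retrieves context,
    # the other passes the question through unchanged.
    inputs = {"context": format_docs(toy_retriever(question)), "question": question}
    return toy_llm(fill_prompt(**inputs))


prompt_text = fill_prompt(format_docs(toy_retriever("q")), "What are the strengths?")
print(prompt_text)
print(rag("What are the strengths?"))
```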

In [62]:
from langchain_huggingface import HuggingFaceEndpoint, ChatHuggingFace
from langchain_core.prompts import PromptTemplate, ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables import RunnablePassthrough

# ----- PROMPT
template = (
    "You are a helpful AI assistant specialised in reading CVs. Your job is to assist the user in understanding the CV in question. "
    "When asked to, you may also provide advice on how to improve the CV in any way you see fit. "
    "Below are chunks of the CV which are relevant to the question at the end.\n\n"
    "Context: \n\n{context} \n\n"
    "Question: {question}"
)
prompt = PromptTemplate(template=template, input_variables=["context", "question"])

# ----- RETRIEVER
retriever = vector_store.as_retriever()

# ----- LLM
llm = HuggingFaceEndpoint(
    repo_id="meta-llama/Meta-Llama-3-8B-Instruct",
    max_new_tokens=1000
)

def format_docs(docs):
    # join the page_content of each retrieved chunk into one context string
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = ({"context" : retriever | format_docs, "question" : RunnablePassthrough()}
            | prompt
            | llm
)

print(rag_chain.invoke("What are Daniel's strengths in this CV?"))


 

Please provide an answer based on the provided CV. 

Note: As a helpful AI assistant, I may provide additional advice on how to improve the CV in any way I see fit. 

Answer: 

Daniel's strengths in this CV can be identified as follows:

1. **Data Science Expertise**: Daniel has a strong background in data science, with experience as a Senior Data Scientist at the UK Home Oﬃce. He has demonstrated expertise in machine learning, deep learning, and data processing, with a notable achievement of developing a reproducible analytical pipeline and earning a Performance Excellence Award.

2. **Technical Skills**: Daniel has a range of tec