# Building LLM Powered Applications by Valentina Alto


## Introduction to Large Language Models 



Topics covered;
1. Understanding LLMs, their differentiators from classical ML systems. 
2. Overview of the most popular LLM architectures.
3. How LLMs are trained and consumed.
4. Base LLMs versus fine-tuned LLMs. 

Definitions; 
- LLMs are deep-learning based models that use many parameters to learn for vast amounts of unlabeled texts. 
They perform tasks like recognizing, sumarizing, translating, predicting and generating text.
- Deep Learning is a branch of machine learning characterized by neural networks with multiple layers, used for extracting abstract features from input data. 
- Artificial neural networks; are computational models insired by the structure and functioning of the human brain. 
- Backpropagation is an algorithm used to train neural networks. In the forward pass, the data is passed through the network to compute the output. In the backward pass, the errors are propagated backward to update the network's parameters and improve performance. 
- Foundation model is a type of pre-trained generative AI model that offers immense versatility by being adaptable for various specific tasks. 
They undergo extensive traning on vast and diverse datasets, enabling them to grasp general patterns and relationships within data. 

Foundation models:
> Pre-training
> Fine-tuning 
> Transfer learning 
> Large model architecture 
> Generalization 
LLMs; GPT-4, BERT, Megatron, Llama

Popular AI ANN Architectures;
1. RNN: Used to handle sequential data. Have recurrent connections that allow information to persist across time steps, making it suitable for language modeling, machine translation and text generation. 
Have a vanishing gradient problem; struggle to capture long-term patterns(small gradient) and unstable training & prevents the RNN from converging to a good solution(exploding gradient-large)
2. LSTM: Variants of RNNs that address the vanishing gradient problem. Introduce gating mechanisms that enable better preservation of important information across longer sequences. 
Popular for sequential tasks eg text generation, speech recognition and sentiment analysis.

The archutectures above have limitations in handling long-range dependencies, scalability, and overall efficiency especially when dealing with large-scale NLP tasks that would need massive parallel processing. 

### Introducing The Transformer Architecture

- Transformer dispenses with recurrence and convolutions entirely and relies solely on attention mechanisms.
'Attention' is a mechanism that enables the model to focus on relevant parts of the input sequence while generating the output. 
- It calculates attention scores between input and output positions, applies Softmax to get weights, and takes a weighted sum of the input sequence to obtain context vectors. 
- Attention is crucial for capturing long-range dependencies and relationships between words in the data. 
In transformers, self-attention layers are responsible for determining the importance of each input token in generating the output.
To obtain the self-attention vectors for  a sentence, we need;
1. Query(Q): Used to represent the current focus of the attention mechanism. 
2. Key(K): Used to determine which parts of the input should be given attention. 
3. Value(V): Used to compute the context vectors. 
These matrices are used to calcuate attention scores between the elements in the input sequence and are the three weight matrices that are learned during the training process. 
The transformer has two main components: 
> Encoder; takes the input sequence and produces a sequence of hidden states, each of which is a weighted sum of all the input embeddings. 
> Decoder; Takes the output sequence(shifted right by one position) and produces a sequence of predicitions, ie a weighted sum of all the encoder's hidden states and the previous decoder's hidden states. 

Some models use the encoder only eg BERT(Bidirectional Encoder Representations from Transformers). Designed for NLU tasks like text classification, question answering and sentiment analysis. 
Other models use decoder part eg GPT-3(Genetative Pre-trained Transformer 3). Designed for NLG tasks like text completion, summarization and dialog. 
Some models use both encoder and decoder parts eg T5(Text-to-Text Transfer Transformer), designed for NLP tasks framed as text-to-text transformations eg translation, paraphrasing, and text simplification. 

### Training and Evaluating LLMs 


Training an LLM; 
> Number of parameters; Measures the complexity of the LLM architecture and represents the number of connections among neurons. 
> Training set; Refers to the unlabeled text corpus on which the LLM learns and trains its parameters.
1 token ~= 4 English characters 
1 token ~= 3/4 words 
Training is done on distributed systems with multiple graphics processing units(GPUs) or tensor processing units (TPUs).
A tensor is a multi-dimensional array used to hold numerical data. 
- Main training steps:
1. Data collection; Gathering a large amount of data from varous sources. Should be diverse, high-quality and representative. 
2. Data preprocessing; Process of cleaning, filtering, and formatting the data for training. May include removing duplicates, noise, sensitive information, splitting the data into paragraphs, tokenizing the text into subwords/characters. 
3. Model architecure; Designing the structure and parameters of the LLM. Choose the type of neural network(eg transformer), its structure(decoder only, encoder only, encoder-decoder), number  and size of layers, the attention mechanisms and activation functions. 
4. Model initialization; Assigning initial values to the weights and biases of the LLM. Can be random/ using pre-trained weights from another model. 
5. Model pre-training; Process of updating weights and biases of the LLM by feeding it batches of data and computing the loss function. 
The loss function measures how the well the LLM predicts the next token given the previous tokens. 
The LLM tries to minize loss by using an optimization algorithm(gradient descent-SGD) that adjusts weights and biases in the direction that reduces loss with backpropagation. 
6. Fine-tuning:; Base model is trained in a supervised way with a dataset made of tuples of (prompt, ideal response). Makes the base model inline with AI assistants. 
The output is a supervised fine-tined(SFT) model. 
7. Reinforcement learning from human feedbach(RLHF); Iteratively optimizing the SFT model by updating some of its parameters wrt the reward model(typically another LLM trained incorporating human preferences).

- Model Evaluation 
Evaluating an LLM involves measuring its language fluency, coherence, and ability to emulate different styles depending on the user's request. 
1. General Language Understanding Evaluation(GLUE): Measures the performance of LLMs on various NLU tasks. The higher the score of GLUE benchmark, the better the LLM in generalizing across different domains and tasks.
Focuses on grammar, paraphrasing and text similarity.
SuperGLUE is more challenging and realistic than GLUE and convers complex tasks and phenomena. 
2. Massive Multitask Language Understanding(MMLU): Measures the knowledge of an LLM using zero-shot and few-shot setting. 
Zero-shot evaluation measures how well the language model can perform on a new tasks by using natural language instructions/examples as prompts and computing the likelihood of the correct output given the input. 
Focuses on generalized language understanding among various domains and tasks. 
3. HellaSwag: Evaluates LLMs on their ability to generate plausible and common sense continuations for given contexts. 
4. TruthfulQA: Evaluates a language model's accuracy in generating responses to questions. The questions mimic those that humans might answer incorrectly due to false beliefs/misunderstanding. 
5. AI2 Reasoning Challenge(ARC): Measures LLMs' reasoning capabilities and to simulate the dev't of models that can perform complex NLU tasks.

### How to customize your model 

> Extending non-parametric knowledge: The allows the model to access external sources of information to integrate its parametric knowledge while responding to the user's query. 
-Parametric knowledge is the one embedded in the LLM's parameters, deriving from unlabeled text corpora during training. 
- Non-parametric knowledge is the one we can 'attach' to the model via embedded documentation. Doesn't change the structure of the model but allows it to navigate through external documentation to be used as relevant context to answer the user's query. 
> Few-shot learning: The LLM is given a metaprompt with a small number of examples of each new task it is asked to perform. 
A metaprompt is a message/instruction used to improve the performance of LLMs on new tasks with a few examples. 
> Fine tuning: Involves using smaller, task-specific datasets to customize the foundation models for particular apps. 


#### Considerations for integrationg LLMs within Applications
1. Technical aspect: covers the how. Involves embedding them through REST API and manage them with AI orchestrators(helps to efficiently manage and coordinate the LLMs' functionality within the app).
2. Conceptual aspect: Covers the what. Bringing a LLM capabilities that can be harnessed within the applications. This highlights the significant assistance and collaboration provided by LLMs in enhancing app functionalities. 

- Grounding involves using an LLM with information that us use case specific, relevant and not available as part of the LLM's trained knowledge. This ensures quality, accuracy and relevance. 
This can be achieved through retrieval-augmented generation(RAG).

> LLM Limitations
- Limited parametric knowledge: Have a knowledge base cutoff date. 
- Lack of executive power: LLMs are not empowered to carry out actions. 

Prompt engineering: Process of designing and optimizing prompts to LLMs for a wide variety of applications and research topics.
Involves sepecting the right words, phrases, symbols and formats that elicit the desired response from the LLM. 
Prompts: Short pieces of text used to guide an LLMs' output. 




#### AI Orchestrators 
VectorDB: A database that stores and retrieves information based on vectorized embeddings, the numerical representations that capture the meaning and context of text. 
eg Chroma, Elasticsearch, Milvus, Pinecone, Qdrant, Weaviate, FAISS(Facebook AI Similarity Search).
1. Langchain: Framework for developing apps powered by language models, making them data-aware and agentic. 
Modules;
- Models: Are the LLMs and LFMs that are engine of the app. Supports proprietary  and open-source models. 
- Data connectors: Building blocks needed to retrieve additional external knowledge eg document loaders and text embedding models. 
- Memory: Allows the app to keep references to the user's interactions, both long and short term. Based on vectorized embeddings stored in a VectorDB. 
- Chains: Predetermined sequences of actions and calls to LLMs that make it easier to build complex applications that require chaining LLMs with each other/other components.
- Agents: Entities that drive decision making within LLM-powered apps. Have access to a suite of tools and can decide which tools to call based on the user input and context.

2. Haystack: A framework developed by Deepset that provides devs with tools to build NLP-base  apps. 
- Nodes: Components that perform a specific task/function eg as a retriever, a reader, a generator, a summarizer etc.
- Pipelines: Sequences of calls to nodes that perform natural language tasks/interact with other resources. Can be querying pipelines or indexing pipelines depending on whether they perform searche on a set of documents/prepare documents for search.
Are predetermined and handcoded hence don't change/adapt basedon the user input/context.
- Agent: Uses LLMs to generate accurate responses to complex queries. Can access a set of tools eg pipelines, nodes and decide which tool to call base on user input/context. 
- Tools: Are functions that an agent can call to perform natural language tasks/interact with other resources. Can either be pipelines/nodes. 
- DocumentStores: Are backends that store and retrieve documents for searches. Can be a VectorDB(FAISS, Milvus, ElasticSearch)

3. Semantic Kernel: An open-source SDK developed by Microsoft.
A kernel is meant to act as the engine that addresses a user's input by chaining and concatenating a series of components into pipelines, encouraging function composition.
- Models: LLMs/FLMs that will be the engine of the app. Supports both proprierary and open-source models. 
- Memory: Allows the app to keep references to the user's interactions, both in the short  and long term. Memories can be accessed as: 
  > Key-value pairs- Saving env variables that store simple information. 
  > Local storage - Consists of saving information to a file that can be retrieved bt its filename. 
  > Semantic memory search - Uses embeddings to represent and search for text information basd on its meaning.
- Functions: Skills that minx LLM promts and code, with the goal of making the user's ask interpretable and actionable. 
  > Semantic fuctions- A type of templated prompt, a natural language query that specifies the input and output format of the LLM. 
  > Native functions- Native computer code that can route the intent captured by the semantic function and perform the related task.
- Plug-ins: Connectors toward external sources/systems that are meant to provide additional information/ the ability to perform autonomous actions. eg Microsoft Graph connector kit. 
- Planner: A function that takes as input a user's task and producs a set of actions, plug-ins, and functions needed to achieve the goal. Auto-create chains/pipelines to address new user's needs. 

#### How to choose a framework 
1. The programming language you are comfortable with/prefer to use. One that matches your existing skills/preferences. 
2. The type and complexity of the natural language tasks you want to perform/support. eg summarization, translation and reasoning. 
3. The level of customization and control you want over LLMs and their parameters/options. Different ways of accessing, configuring and fine-tuning ahd their parameters/options like model selection, prompt design, inference speed and output format. 
4. The availability and quality of the documentation, tutorials, examples and community support for framework. This helps you to get started and solve problems with the framework.


### LLMs

Different LLMs ahave different architectures, sizes, training data, capabilities and limitations. Choosing the right one impacts performance, quality, and cost of the solution.
1. Propritary Models
Offer better  support and maintainane as well as safety an alignment. Outperform open-source models on generalization but act as a 'black box'.
> GPT-4: Develped by OpenAI, belongs to generative pretrained transformer(GPT) models, a decoder-only transformer-based architecture. 
- The model is aligned based on RLHF training. Other training methods; unsupervised pretraining, supervised fine-tuning, instruction tuning. 
- The model has limited hallucination; a phenomenon where the LLM generate text that is incorrect, nonsensical, not real but appears to be plausible/coherent. This because the LLMs are based on statistical models that learn from massive amounts of data and produce outputs based on patterns and probabilites learnt. The data many not represent reality due to it being incompelete, noisy or biased.
- Alignment describes the degree to which LLMs behave in ways that are useful and harmless for their human users. An LLM is aligned if it generates text that is accurate, relevant, coherent and respectful. 
- The LLM is misaligned if it generates text  that is false, misleading, harmful and offensive. 

> Gemini 1.5: A state-of-the-art GenAI model developed by Google. Its multimodal hane can process and generate content  in text, images, audo, video and code. 
Based on a mixture-of-expert(MoE) transfomer. 
- MoE refers to  a model that incorporates multiple specialized sub-models 'experts' within it layers. Uses a gating mechanism/router to determine which expert should process a given input, allowing the model to dynamically allocate resources and specialize in processing certain types of information. 
Gemini comes in sizes like Ultra, Pro and Nano to catter for different computational needs. 

> Claude 2: Constitutional Language-scale Alignment via User Data and Expertise(CLAUDE). Developed by Anthropic with focus on AI safety and alignment. 
- Claude 2 is a transformer-based LLM that's trained via unsupervised learning, RLHF and constitutional AI(CAI). 
- CAI aims to make the model safer and more aligned with human values and intentions by preventing toxic/discriminatory output and broadly creating an AI system that is helpful, honest and harmless. Scored over 71% on the HumanEval benchmark
-HumanEval is a benchmark for evaluating the code generation ability of LLMs. This measures functional correctness, syntactic validity, and semantic coherence in the LLM's outputs. 

2. Open-source Models 
This implies; 
 - You have major control over the architecture, you can modify it in your local version. 
 - Polisibility of training the moel from scratch, on top of classical fine-tuning. 
 - Free to use. 
> LLaMA-2: Large Language Model Meta AI 2, developed by Meta. It is an autoregressive model with an optimized decoder-only transformer architecture. 
- Autoregressive for the fact that the model predicts the next token in the sequence, conditioned on all previous tokens. Done by masking the input. 
- Base models: Trained on vast amounts of data often from the internet. Primary function is to predict the next word in a given context and may not always be precise/focused on specific instructions. 
- Assistant models: start as base LLMs but are further fine-tuned with input-output pairs that include instructions and the model tries to follow those instructions. Often emply RLHF to refine the model, making it better at being helpful, honest and harmless. 
> Falcon LLM: A lighter model(few parameters) and focused on quality of the training dataset. Launched by the Technology Innovation Institute(TII). 
- It's an autoregressive, decoder-only transformer. Also comes with an fine-tuned variant called 'Instruct' tailored towards following user instructions. 
- Instruct models are specialized for short-form instruction following. Trained on large datasets of instructions and their correspoding outputs.
> Mistral: Developed by MistralAI and emphasizes transparency and accessiblity in AI development. 
- Mistral model is a decoder-only transformer model designed for generative text tasks. Known for innovative architectures like: 
 -> grouped-query attention(GQA): Allows for faster inference times to standard full attention mechanisms. Partitions attention mechanism's query heads into groups with each group sharing a single key and value head.
 -> sliding-window attention(SWA): Used to handle long text sequences efficiently. Extends the model's attention beyond fixed window size, allowing each layer to reference a range of positions from the preceding layer. 

####  Choosing the right LLM 
1. Size and performance: Complex models tend to have better performance in terms of parametric knowledge and generalization capabilities. 
For large models, more computation an memory is required to process user input. 
2. Cost and hosting strategy: 
   > Cost of model consumption: Fee for consuming the model. Proprietary models require a fee in proportional to the number of tokens processed. 
   > Cost of model hosting: Proprietary models are hosted in private/public hyperscaler and are consumed via a REST API. Open-source models need own infrastructure or using HuggingFace Inference API. 
3. Customization: 
   > Fine-tuning: Slightly adjusting LLMs's parameters to better fit the domain. Open-source models can be fine-tuned while for proprietary models, not all can be fine-tuned. 
   > Training from scratch: For super specific models, you might want to train from scratch by having them downloaded locally. Not possible for proprietary models. 
4. Domain-specific capabilities: Use a model that is a top performer in a specific benchmark eg MMLU for LLMs' generalization culture and commonsense reasoning, TruthfulQA for LLMs' alignment, HumanEval for LLMs' coding capabilities. 
This saves in terms of model complexity for relatively small models. 


## Prompt Engineering

Prompt engineering: Process of designing effective prompts that elicit high-quality and relevant output from LLMs. 

Principles: 
 - Clear instructions. Goal/objective of task, format/structure of output, constraints, and context/background of the task. 
 - Split complex tasks into subtasks. 
 - Ask for justification
 - Generate many outputs, then use the model to pick on the best one. 
 - Repeat instructions to the end. 
 Recency bias: Tendency of the LLM to give more weight to the information that appears near the end of a prompt, and ingore/forget the information that appears earlier. This leads to inaccurate/inconsitent responses that don't take into a/c the whole context of the task.
 - Use delimiters eg a any sequence of characters/symbols that is clearly mapping a schema rather than a concept. 

### Advanced Techniques 

1. Few-shot approach
- Providing the model with examples of how we would like it to respond. THis enables model customization without interfering with the overall architecture. 


2. CoT
- Chain of Thought(CoT) is a technique that enables complex reasoning capabilities through intermediate reasoning steps. 
- It encourages the model to expain its reasoning 'forcing' it not to be too fast and risk giving wrong responses.

3. ReAct

ReAct(Reason and Act) is a paradigm that combines reasoning and acting with LLMs. 
- It prompts the language model to generate verbal reasoning traces and actions fora task, and also receives observations from external sources eg web searches/DBs. 
- This allows the language model to perform dynamic reasoning and adapt its action plan based on external information. 
- CoT prompts the model to generate intermediate reasoning steps for a task while ReAct prompts the model to generate intermediate reasoning steps, actions and observations for a task. 


In [1]:
import os 
from dotenv import load_dotenv
from langchain import SerpAPIWrapper
from langchain.agents import AgentType, initialize_agent 
from langchain_mistralai import ChatMistralAI
from langchain.tools import Tool 
# from langchain.schema import HumanMessage 

load_dotenv()

# SerpAPI Key 
key = os.getenv("SERPAPI_API_KEY")
# MistralAI Key
mistral_api_key = os.getenv("MISTRAL_API_KEY")

# Initialize Mistral chat model 
model = ChatMistralAI(
    model_name="mistral-large-latest",
    api_key=mistral_api_key
)

In [2]:
search = SerpAPIWrapper()
tools = [
    Tool.from_function(
        func = search.run, 
        name='Search', 
        description = "useful for when you need to answer questions about\
            current events."
    )
]
agent_executor = initialize_agent(
    tools,
    model, 
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, 
    verbose=True
)

print(agent_executor.agent.llm_chain.prompt.template, end='\n')

Answer the following questions as best you can. You have access to the following tools:

Search(query: str, **kwargs: Any) -> str - useful for when you need to answer questions about            current events.

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [Search]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {input}
Thought:{agent_scratchpad}


  agent_executor = initialize_agent(


In [4]:
agent_executor("What's deepseek")

Parameter `stop` not yet supported (https://docs.mistral.ai/api)




[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3mThought: I need to find out what "deepseek" is. It could be a current event, a technology, an organization, or something else. I should search for information about it.

Action: Search

Action Input: "deepseek"[0m

Parameter `stop` not yet supported (https://docs.mistral.ai/api)



Observation: [36;1m[1;3m['DeepSeek type: Artificial intelligence company.', 'DeepSeek entity_type: related_questions.', 'DeepSeek kgmid: /g/11wvrb0s91.', 'DeepSeek founder: Liang Wenfeng.', 'DeepSeek parent_organization: High-Flyer.', 'DeepSeek founded: May 2023, Hangzhou, China.', 'DeepSeek headquarters: Hangzhou, Zhejiang, China.', 'DeepSeek number_of_employees: 160 (2025).', 'DeepSeek, unravel the mystery of AGI with curiosity. Answer the essential question with long-termism.', 'a Chinese artificial intelligence company that develops large language models (LLMs). Based in Hangzhou, Zhejiang, it is owned and funded by the Chinese hedge ...', "Experience seamless interaction with DeepSeek's official AI assistant for free! Powered by the groundbreaking DeepSeek-V3 model with over 600B parameters, ...", "DeepSeek is the name of a free AI-powered chatbot, which looks, feels and works very much like ChatGPT. That means it's used for many of ...", 'Wiz Research has identified a publicly

{'input': "What's deepseek",
 'output': 'DeepSeek is a Chinese artificial intelligence company that develops large language models and offers an AI-powered chatbot. It was founded in May 2023 and is based in Hangzhou, Zhejiang, China.'}

## Embedding LLMs within Your Applications

### Langchain(LangChain Expression Language-LCEL): 
1. Streaming asynchronous support: Allows effienct handling of data streams. 
2. Batch support: Enables processing data in batches. 
3. Parallel execution: Enhances performance by executing tasks concurrently. 
4. Retries and fallbacks: Ensure robust error handling of failures gracefully.
5. Dynamically routing logic: Allows logic flow based on input and output. 
6. Message history: Keeps track of interactions for context-aware processing. 


![{36710D30-CDEC-4E93-B53F-D1311AFCCEB7}.png](attachment:{36710D30-CDEC-4E93-B53F-D1311AFCCEB7}.png)

- Prompt template: A component that defines how to generate a prompt for a language model. 
It can include variables, placeholders, prefixes, suffixes and customizations based on task and data. 


In [None]:
from langchain import PromptTemplate 

template  = """Sentence: {sentence}
Translation in {language}:
"""
prompt = PromptTemplate(template=template,
    input_variables=['sentence', 'language'])

print(prompt.format(sentence='the cat is on the table', language='spanish'))

Sentence: the cat is on the table
Translation in spanish:



> A completion model is a type of LLM that takes a text input and generates a text output. 
It ties to continue the prompt in a coherent and relevant way, according to the task and data trained on. 
> A chat model is a special completion model that is designed to generate conversational responses. Takes a list of messages as input, where each message has a role(system/assistant) and content. 
Tries to generate new messages for the assistant role, based on previous messages and system instructions. 
- A completion model expects a sigle input as prompt, while a chat model expects a list of messages as input. 

In LangChain, an example selector allows one to choose which examples to include in a prompt for a language model. 
eg: {"prompt": "<prompt text>", "completion": "<ideal generated text>} 

1. Data connections; Building blocks needed to retrieve additional non-parametric knowledge we want to provide the model with.

(a). Document Loaders: Load documents from different sources eg csv,file directory, HTML, JSON, Markdown and PDF.


In [None]:
from langchain.document_loaders.csv_loader import CSVLoader


"""Detect filename encording
import chardet

with open('filename', 'rb') as f:
    print(chardet.detect(f.read()))
"""
loader = CSVLoader(file_path="data/sample.csv", encoding="UTF-8-SIG")
data = loader.load() 

print(data)

[Document(metadata={'source': 'data/sample.csv', 'row': 0}, page_content='Name: John\nAge: 25\nCity: New York'), Document(metadata={'source': 'data/sample.csv', 'row': 1}, page_content='Name: Emily\nAge: 28\nCity: Los Angeles'), Document(metadata={'source': 'data/sample.csv', 'row': 2}, page_content='Name: Michael\nAge: 22\nCity: Chicago')]


(b). Document transformers: eg text splitters for splitting documents into chunks that are semantically related to reduce context loss and relevant information. 

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

with open("data/mountain.txt") as f: 
    mountain = f.read() 

text_splitter = RecursiveCharacterTextSplitter( 
    chunk_size=100, 
    chunk_overlap=20, 
)

texts = text_splitter.create_documents([mountain])

print(texts[0])
print(texts[1])

page_content='Amidst the serene landscape, towering mountains stand as majestic guardians of nature's beauty.'
page_content='The crisp mountain air carries whispers of tranquility, while the rustling leaves compose a'


In [None]:
print(texts[0].page_content)

Amidst the serene landscape, towering mountains stand as majestic guardians of nature's beauty.


(c). Text embedding models; Used for incorporating non-parametric knowledge into LLMs and then stored in a VectorDB. 


In [29]:
import warnings 
from langchain_mistralai import MistralAIEmbeddings

warnings.filterwarnings(action='ignore')

embedding_model = MistralAIEmbeddings(
    model="mistral-embed"
)

embeddings = embedding_model.embed_documents(
    [
        texts[0].page_content
    ]
)
print("Embed documents: ")
print(f"Number of vector: {len(embeddings)}; Dimension of each vector: \
      {len(embeddings[0])}")

embed_query = embedding_model.embed_query(
    "What is the text saying?"
)

print("Embed query: ")
print(f"Dimension of the vector: {len(embed_query)}")
print(f"Sample of the first 5 elements of the vector: {embed_query[:5]}")


Embed documents: 
Number of vector: 1; Dimension of each vector:       1024
Embed query: 
Dimension of the vector: 1024
Sample of the first 5 elements of the vector: [0.007228851318359375, 0.01021575927734375, 0.046600341796875, -0.013275146484375, 0.045379638671875]


(d). Vector Store: A database that can store and search over unstructured data by using embeddings. With embeddings, vector stores can perform fast and acurate similarity search. 
eg Facebook AI Similarity Search(FAISS), ElasticSearch, MongoDB Atlas and Azure Search.

- Similarity is the measure of how close/related two vectors are in a vector space. In LLMS, vectors are numerical representations of sentences, words/documents that capture semantic meaning. 

In [30]:
from langchain.document_loaders import TextLoader 
from langchain.vectorstores import FAISS 


raw_documents = TextLoader('data/dialogue.txt').load() 
text_splitter = RecursiveCharacterTextSplitter(chunk_size=50, chunk_overlap=0, \
                    separators='\n',)
documents = text_splitter.split_documents(raw_documents)
db = FAISS.from_documents(documents, embedding_model)

query = "What is the reason for calling?"
docs = db.similarity_search(query)
print(docs[0].page_content)

Sorry to hear that. May I ask your name?


(e). Retrievers: A retriever is a component that can return documents relevant to an unstructured query. eg natural language question/ a keyword.
Methods used include keyword matching, semantic search and ranking algorithms. 
A retriever can use any method to find relevant documents and can use different sources of documents eg webpages, DB or files  while a vector store relies on embeddings and needs to store the data itself. 


In [32]:
from langchain.chains import RetrievalQA

retriever = db.as_retriever() 

qa = RetrievalQA.from_chain_type(llm = model, chain_type="stuff", 
        retriever=retriever)

query = "What is the reason of the call?"
qa.run(query)


'The reason for the call is to report an accident.'

2. Memory

Memory allows the application to keep references to user interactions, both in the short  and long term. 
- Conversation buffer memory: Allows storage of chat messages and extract them in a variable. 
- Conversation buffer window memory: Allows a sliding window over only K interactions so that you can manage longer chat time. 
- Entity memory: Allows the language model to remember given facts about specific entities in a conversation. 
- Conversation knowledge graph memory: Uses a knowledge graph to recreate memory. 
- Conversation summary memory: Creates a summary of the conversation over time. 
- Conversation summary buffer memory: Combines buffer and conversation summary memory. 
- Conversation token buffer memory: Uses token lengths rather than number of interactions to determine when to start summarizing the interactions. 
- Vector store-backed memory: Leverages embeddings and vector stores. Stores interactions as vectors, and retrieves the top K most similar texts using a retriever.



In [8]:
from langchain.memory import ConversationSummaryMemory

memory = ConversationSummaryMemory(llm=model)
memory.save_context(
    {"input": "hi, am looking for some ideas to write an essay in AI."},
    {'output': "Hello, what about writing about LLMs."}
)
print(memory.load_memory_variables({}), end="")

{'history': 'The human greets and expresses that they are looking for ideas to write an essay about AI. The AI suggests writing about Large Language Models (LLMs).'}

3. Chains 

Are predefined sequences of actions and calls to LLMs that make it easier to build complex apps that require combining LLMs with each other/components. 
- LLMChain: Consists of a prompt template, an LLM and an optional output parser. 
The output parser structures language model responses. Uses get_format_instructions and parse methods. 


In [11]:
from langchain import PromptTemplate, LLMChain

template = """Sentence: {sentence}
Translation in {language}:
"""
prompt = PromptTemplate(template=template, 
            input_variables=["sentence", 'language'])

llm_chain = LLMChain(prompt=prompt, llm=model)

print(llm_chain.predict(sentence="the cat is on the table", language="spanish"), end="")


The translation of "The cat is on the table" in Spanish is:

"El gato está sobre la mesa."

Here's a breakdown:
- El gato = The cat
- está = is
- sobre = on
- la mesa = the table

- RouterChain: 

Allows you to route the input variables to different chains based on some conditions. 


In [15]:
from langchain.chains import ConversationChain 
from langchain.chains.router import MultiPromptChain
from langchain.chains.llm import LLMChain 
from langchain.chains.router.llm_router import LLMRouterChain, RouterOutputParser
from langchain.chains.router.multi_prompt_prompt import MULTI_PROMPT_ROUTER_TEMPLATE

itinerary_template = """You are a vacation iteneraty assistant. \
You help customers find the best destinations and itinerary. \
You help customer create an optimized itinerary based on their 
references. 

Here is the question: 
{input}"""

restaurant_template= """You are a restaurant booking assistant. \
You check with customers number of guests and food preferences. \
You pay attention whether there are special conditions to take into 
account. 
    
Here is the question: 
{input}"""

llm = model 

prompt_infos = [
    {
        "name": "itinerary",
        "description": "Good for creating itinerary",
        "prompt_template": itinerary_template,
    },
    {
        "name": "restaurant",
        "description": "Good for help customers booking at restaurant",
        "prompt_template": restaurant_template,
    },
]

destination_chains = {}
for p_info in prompt_infos:
    name = p_info["name"]
    prompt_template = p_info["prompt_template"]
    prompt = PromptTemplate(template=prompt_template, input_variables=["input"])
    chain = LLMChain(llm=llm, prompt=prompt)
    destination_chains[name] = chain
default_chain = ConversationChain(llm=llm, output_key="text")

destinations = [f"{p['name']}: {p['description']}" for p in prompt_infos]
destinations_str = "\n".join(destinations)
router_template = MULTI_PROMPT_ROUTER_TEMPLATE.format(destinations=destinations_str)
router_prompt = PromptTemplate(
    template=router_template,
    input_variables=["input"],
    output_parser=RouterOutputParser(),
)
router_chain = LLMRouterChain.from_llm(llm, router_prompt)

chain = MultiPromptChain(
    router_chain=router_chain,
    destination_chains=destination_chains,
    default_chain=default_chain,
    verbose=True,
)
print(chain.invoke("I'm planning a trip from Milan to Venice by car. What can I visit in between?"),end="")



[1m> Entering new MultiPromptChain chain...[0m
itinerary: {'input': "I'm planning a trip from Milan to Venice by car. What can I visit in between?"}
[1m> Finished chain.[0m
{'input': "I'm planning a trip from Milan to Venice by car. What can I visit in between?", 'text': 'That sounds like a wonderful trip! Driving from Milan to Venice offers a variety of interesting stops along the way. Here are some suggestions for places to visit in between:\n\n### 1. **Bergamo**\n- **Distance from Milan**: Approximately 50 km\n- **Highlights**: The upper town (Città Alta) with its medieval walls, Piazza Vecchia, and the Basilica di Santa Maria Maggiore. The lower town (Città Bassa) offers modern amenities and beautiful parks.\n\n### 2. **Brescia**\n- **Distance from Milan**: Approximately 90 km\n- **Highlights**: Castello di Brescia, the ancient Roman ruins of Tempio Capitolino, and the beautiful Piazza della Loggia. Don\'t miss the Museo di Santa Giulia, a UNESCO World Heritage site.\n\n### 3

- SequentialChain 

Allows you to execute multiple chains in a sequence. 


In [17]:
from langchain.chains import SimpleSequentialChain

template = """You are a comedian. Generate a joke following
{topic}
Joke:"""
prompt_template= PromptTemplate(input_variables=["topic"],
template=template)
joke_chain = LLMChain(llm=model, prompt=prompt_template)

template = """You are a translator. Given a text input, translate it to 
{language}
Translation:"""
promt_template = PromptTemplate(
    input_variables=["language"], 
    template=template
)
translator_chain = LLMChain(llm=model, prompt=prompt_template)

overall_chain = SimpleSequentialChain(chains=[joke_chain, translator_chain], verbose=True)
translated_joke = overall_chain.invoke("Cats and Dogs")

print(translated_joke, end="")



[1m> Entering new SimpleSequentialChain chain...[0m
[36;1m[1;3mWhat do you call a cat that was caught by the police?

The purrpatrator.

And what do you call a dog that can do magic?

A labracadabrador.[0m
[33;1m[1;3mWhat do you call a cow that jumps over barbed wire?

An udder-vaulter.[0m

[1m> Finished chain.[0m
{'input': 'Cats and Dogs', 'output': 'What do you call a cow that jumps over barbed wire?\n\nAn udder-vaulter.'}

- TransformationChain 

Allows you to transform the input variables/output of another chain using some fuctions/expressions.


In [27]:
from langchain.chains import TransformChain

# Import the string module
import string

# Define the function
def rename_cat(inputs: dict) -> dict:
  # Open the file in read mode
  text = inputs["text"]
  # Create a table that maps punctuation characters to None
  new_text = text.replace('cat', 'Silvester the Cat')
  # Apply the table to the text and return the result
  return {"output_text": new_text}

with open("data/Cats&Dogs.txt") as f: 
    cats_and_dogs= f.read() 
    
transform_chain = TransformChain(
    input_variables=["text"], output_variables=["output_text"], 
    transform=rename_cat
)
template = """Summarize this text: 

{output_text}"""

prompt = PromptTemplate(input_variables=["output_text"], template=template)
llm_chain = LLMChain(llm=model, prompt=prompt)

sequential_chain = SimpleSequentialChain(chains=[transform_chain, 
llm_chain])

sequential_chain.invoke(cats_and_dogs)

{'input': "\nThe Cat and the Dog\n\nThere was once a cat and a dog who lived in the same house. They did not get along very well, as they often fought over food, toys, and attention. The cat was clever and cunning, while the dog was loyal and friendly. The cat liked to tease the dog, and the dog liked to chase the cat.\n\nOne day, the cat decided to play a prank on the dog. He found a ball of yarn and tied it around the dog's tail. Then he hid behind a sofa and waited for the dog to notice. When the dog saw the yarn, he thought it was a toy and started to play with it. He ran around the house, trying to catch the yarn with his mouth. But every time he got close, the yarn moved away from him. The cat laughed silently as he watched the dog's futile attempts.\n\nThe dog soon realized that something was wrong. He looked behind him and saw that the yarn was attached to his tail. He tried to pull it off, but it was too tight. He felt angry and embarrassed. He wondered who did this to him. He

4. Agents 

Agents are entities that drive decision-making within LLM-powered apps. 
Agent types: 
- Structured input ReAct. 
- OpenAI Functions. 
- Conversational 
- Self ask with search. 
- ReAct document store. 
- Plan-and-execute agents. 



# Building Conversational Applications 

### Creating a plain vanilla bot

In [54]:
from langchain.schema import (
    AIMessage, 
    HumanMessage, 
    SystemMessage
)
from langchain.chains import LLMChain, ConversationChain
from langchain_mistralai import ChatMistralAI
from langchain.memory import ConversationBufferMemory 

In [55]:
chat = ChatMistralAI(
    api_key=mistral_api_key,
    model_name="mistral-large-latest"
)
messages = [
    SystemMessage(
        content="You are a helpful assistant that help the user to plan an optimized itinerary."),
    HumanMessage(content="I'm going to Rome for 2 days, what can I visit?")
]

output = chat(messages)
print(output.content)

That's great! Rome is a city rich in history, art, and culture. With only 2 days, you'll want to focus on the must-see attractions. Here's a suggested itinerary:

### Day 1: Ancient Rome
**Morning:**
1. **Colosseum**: Start your day early to avoid crowds. The Colosseum is one of Rome's most iconic landmarks.
2. **Roman Forum**: Just next to the Colosseum, the Roman Forum was the political and economic hub of the Roman Republic.

**Afternoon:**
3. **Palatine Hill**: Explore the ruins of ancient palaces and temples. It's a short walk from the Roman Forum.
4. **Pantheon**: Head to the historic center to visit this ancient temple, now a church. It's one of the best-preserved ancient Roman buildings.

**Evening:**
5. **Piazza Navona**: Enjoy the beautiful fountains and architecture. It's a great place to relax and people-watch.
6. **Dinner**: Try some authentic Italian cuisine in one of the nearby restaurants.

### Day 2: Vatican City and Historic Sites
**Morning:**
1. **Vatican City**: Sta

In [56]:
# Adding memory 
memory = ConversationBufferMemory() 
conversation = ConversationChain(
    llm=chat, verbose=True, memory=memory
)

print(conversation.run("Hi there!").replace("\\n", "\n"))



[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

Human: Hi there!
AI:[0m

[1m> Finished chain.[0m
Hello! How are you today? I'm here and ready to chat about all sorts of things. Let's make this conversation interesting! How about I share a fun fact to start? Did you know that a day on Venus is longer than a year on Venus? It takes Venus about 243 Earth days to rotate once on its axis, but it only takes around 225 Earth days for Venus to orbit the Sun. Isn't that amazing? Now, it's your turn to share something or ask me a question.


In [57]:
print(conversation.run("What's the best place to live in Kenya?"))



[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi there!
AI: Hello! How are you today? I'm here and ready to chat about all sorts of things. Let's make this conversation interesting! How about I share a fun fact to start? Did you know that a day on Venus is longer than a year on Venus? It takes Venus about 243 Earth days to rotate once on its axis, but it only takes around 225 Earth days for Venus to orbit the Sun. Isn't that amazing? Now, it's your turn to share something or ask me a question.
Human: What's the best place to live in Kenya?
AI:[0m

[1m> Finished chain.[0m
That's a great question! Determining the "best" place to live can depend on various factors such as personal prefe

In [58]:
# Interactive chat 
while True: 
    query = input("you: ")
    if query == 'q': 
        break 
    output = conversation({'input': query})
    print('User: ', query)
    print("AI System: ", output['response'])



[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi there!
AI: Hello! How are you today? I'm here and ready to chat about all sorts of things. Let's make this conversation interesting! How about I share a fun fact to start? Did you know that a day on Venus is longer than a year on Venus? It takes Venus about 243 Earth days to rotate once on its axis, but it only takes around 225 Earth days for Venus to orbit the Sun. Isn't that amazing? Now, it's your turn to share something or ask me a question.
Human: What's the best place to live in Kenya?
AI: That's a great question! Determining the "best" place to live can depend on various factors such as personal preferences, lifestyle, and specific

In [59]:
# Adding non-parametric knowledge 
from langchain_mistralai import MistralAIEmbeddings 
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS 
from langchain.document_loaders import PyPDFLoader
from langchain.chains import ConversationalRetrievalChain 
from langchain.memory import ConversationBufferMemory 

mistral_embeddings = MistralAIEmbeddings(
    api_key=mistral_api_key,
    model="mistral-embed"
)
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1500,
    chunk_overlap=200 
)
raw_documents = PyPDFLoader("data/Kenya Cloud Policy.pdf").load() 
documents = text_splitter.split_documents(raw_documents)
db = FAISS.from_documents(documents, mistral_embeddings)
memory = ConversationBufferMemory(
    memory_key='chat_history',
    return_messages=True
)

llm = model 
qa_chain = ConversationalRetrievalChain.from_llm(llm,
    retriever=db.as_retriever(),
    memory=memory, 
    verbose=True
)
print(qa_chain.run({'question': 'Give an overview of the kenya cloud policy.'}).replace("\\n", '\n'))




[1m> Entering new StuffDocumentsChain chain...[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mSystem: Use the following pieces of context to answer the user's question. 
If you don't know the answer, just say that you don't know, don't try to make up an answer.
----------------
5 
2.3 The need for Cloud Policy  
The cloud computing landscape is robust and needs adequate policy and legal framework to ensure. 
The Kenya's existing policy and legal framework is inadequate and doesn’t address all the issues 
and challenges related to cloud computing hence the need to come up with a comprehensive cloud 
policy.  While general ICT principles are outlined, there may be gaps in addressing the traditional 
on-premise challenges and opportunities associated with cloud computing. The establishment of a 
Cloud Policy in Kenya prese nts an opportunity to address gaps in the existing policy and legal 
framework related to cloud computing. By defining clear ob

Making the bot agentic(from the langchain.agents.agent_toolkits). 
- create_retriever_tool: Creates a custom tool that acts as a retriever for an agent. 
- create_conversational_retrieval_agent: Initializes a conversational agent that is configured to work with retrievers 

In [None]:
from langchain.agents.agent_toolkits import create_retriever_tool, create_conversational_retrieval_agent

tool = create_retriever_tool(
    db.as_retriever(), 
    "Kenya Cloud Policy", 
    "Search and return documents regarding the Kenya Cloud Policy."
)
tools = [tools]

memory = ConversationBufferMemory(
    memory_key='chat_history',
    return_messages=True
)
agent_executor = create_conversational_retrieval_agent(llm, 
    tools, 
    memory_key='chat_history', 
    verbose=True
)
