<h1 style="text-align: center; color: Blue; font-family: verdana; font-size: 40px;">First LLM-based Solution: LangChain & LLaMa</h1>

**Table of contents**<a id='toc0_'></a>    
- [Pre-requisite (Libraries)](#toc1_)    
- [Core Concepts of LangChain:](#toc2_)    
  - [Components and Modules](#toc2_1_)    
  - [Integration with LLMs](#toc2_2_)    
  - [Workflow Management](#toc2_3_)    
- [Key LLM-based applications using LangChain](#toc3_)    
  - [Chatbots and Virtual Assistants](#toc3_1_)    
  - [Retrieval-Augmented Generation (RAG)](#toc3_2_)    
  - [Language Translation Tools](#toc3_3_)    
  - [Sentiment Analysis Tools](#toc3_4_)    
  - [Content Generation](#toc3_5_)    
  - [Code Generation and Assistance](#toc3_6_)    
  - [Personalized Learning Assistants](#toc3_7_)    
  - [Data Analysis and Visualization](#toc3_8_)    
- [Building a Chatbot](#toc4_)    
  - [Initialize the OpenAI Chat Model](#toc4_1_)    
  - [Initialize the LLaMa (Large Language Model Meta AI) Chat Model](#toc4_2_)    
  - [Create a Function for Chatting](#toc4_3_)    
  - [Run the Chatbot](#toc4_4_)    
- [Retrieval-Augmented Generation (RAG)](#toc5_)    
  - [Document Store](#toc5_1_)    
  - [Embedding Model](#toc5_2_)    
  - [Retriever](#toc5_3_)    
  - [Language Model (LLM)](#toc5_4_)    
  - [RAG Chain](#toc5_5_)    
  - [Query Interface](#toc5_6_)    
  - [Example Workflow](#toc5_7_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

# <a id='toc1_'></a>[Pre-requisite (Libraries)](#toc0_)

1. Set up your virtual environment:

`python3 -m venv .venv`

2. Activate your virtual environment:

`source .venv/bin/activate`

3. Install LangChain:

`pip install langchain`

4. Install the libraries needed for your LLM provider. LangChain integrates with several LLM backends. For OpenAI, an API key is required: generate one from your OpenAI account and pass it as a variable when instantiating the OpenAI LLM class. Alternatively, use Ollama to run models locally.

If you are using OpenAI, make your API key available:

In [1]:
import os
from dotenv import load_dotenv

# load environment variables
load_dotenv()

# access variable

api_key = os.environ.get("my_openai_key")

# print(f'API Key: {api_key}')
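
Because `os.environ.get` returns `None` when the variable is missing, a typo in the `.env` file only surfaces later as an authentication error. A minimal sketch of failing early instead is shown below; `require_env` and the variable name `DEMO_KEY_FOR_SKETCH` are hypothetical names introduced for illustration, not part of LangChain or python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; add it to your .env file."
        )
    return value


# Demo with a throwaway variable so the sketch runs without a real key
os.environ["DEMO_KEY_FOR_SKETCH"] = "sk-demo"
print(require_env("DEMO_KEY_FOR_SKETCH"))
```

In the notebook, the same helper could be called as `require_env("my_openai_key")` right after `load_dotenv()`.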

In [None]:
from langchain.llms import OpenAI

llm = OpenAI(openai_api_key=api_key)



# <a id='toc2_'></a>[Core Concepts of LangChain:](#toc0_)

LangChain is built on the concepts of components and chains.
1. **<u>Components</u>** are reusable modules that perform specific tasks, such as processing input data, generating formatted text, accessing external information, or managing workflows.

2. **<u>Chains</u>** are sequences of components that work together to achieve a goal, such as generating formatted text, summarizing a document, or providing tailored recommendations.


## <a id='toc2_1_'></a>[Components and Modules](#toc0_)

In LangChain, the terms `components` and `modules` are sometimes used interchangeably, but there is an essential difference between them.

1. **<u>Components</u>** are the core building blocks of LangChain. They represent specific tasks or functionalities. Typically small and specific, they are reused across different applications and workflows.

2. **<u>Modules</u>** combine a set of components to achieve more complex functionality. LangChain provides standard interfaces for some of its main modules, such as the memory and agents modules.

Both components and modules are reusable and can be combined in workflows. Such a workflow is called a chain: a sequence of components or modules put together to perform a task.
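
The idea of a chain as a sequence of small reusable pieces can be sketched in plain Python. This is only an illustration of the concept, not LangChain's API: `clean_input`, `build_prompt`, `fake_llm`, and `make_chain` are hypothetical names, and `fake_llm` simply upper-cases its input instead of calling a model.

```python
from functools import reduce
from typing import Callable, List

# A component here is any function that maps a string to a string
Component = Callable[[str], str]


def clean_input(text: str) -> str:
    """Collapse repeated whitespace in the user input."""
    return " ".join(text.split())


def build_prompt(text: str) -> str:
    """Wrap the cleaned input in a task instruction."""
    return f"Summarize in one sentence: {text}"


def fake_llm(prompt: str) -> str:
    """Stand-in for a model call; it just returns the prompt in upper case."""
    return prompt.upper()


def make_chain(components: List[Component]) -> Component:
    """Return a function that feeds the output of each component into the next."""
    def run(text: str) -> str:
        return reduce(lambda value, component: component(value), components, text)
    return run


chain = make_chain([clean_input, build_prompt, fake_llm])
print(chain("  LangChain   composes   components  "))
```

Each function can be reused in other chains, which is the property the component/chain distinction is meant to capture.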

## <a id='toc2_2_'></a>[Integration with LLMs](#toc0_)

LangChain integrates with different LLMs through a standardized interface. Besides this standard connection mechanism, LangChain offers several features that help optimize the use of LLMs when building language-based applications:

1. **<u>Prompt Management:</u>** LangChain helps you craft effective prompts so that the LLM understands the task and generates useful responses.

2. **<u>Dynamic LLM Selection:</u>** It lets you select the most suitable model for a given task (based on complexity, accuracy criteria, available compute, etc.).

3. **<u>Memory Management:</u>** LangChain integrates with memory modules, allowing LLMs to access and process information from outside the current prompt.

4. **<u>Agent-based Management:</u>** LangChain enables the orchestration of complex LLM-based workflows that can adapt to changing conditions (such as user needs).
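
Prompt management and dynamic model selection can be illustrated with a small plain-Python sketch. Everything here is an assumption made for illustration: the template text, the `select_model` word-count heuristic, and the routing table are hypothetical, and LangChain's own prompt and routing utilities work differently.

```python
# A reusable prompt template with named placeholders
PROMPT = "You are a helpful assistant.\nTask: {task}\nInput: {text}\nAnswer:"

# Hypothetical routing table: a small local model for short inputs,
# a hosted model for long ones
MODEL_BY_COMPLEXITY = {"low": "llama2", "high": "gpt-3.5-turbo"}


def render_prompt(task: str, text: str) -> str:
    """Fill the template so every request is phrased the same way."""
    return PROMPT.format(task=task, text=text)


def select_model(text: str, threshold: int = 50) -> str:
    """Pick a model name from a crude complexity proxy: the word count of the input."""
    complexity = "high" if len(text.split()) > threshold else "low"
    return MODEL_BY_COMPLEXITY[complexity]


question = "What is the capital of Tunisia?"
print(select_model(question))
print(render_prompt("Answer the question", question))
```

A real selection rule would more likely weigh cost, latency, and accuracy requirements than word count alone.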

## <a id='toc2_3_'></a>[Workflow Management](#toc0_)

Workflow management is the process of orchestrating and controlling the execution of chains and agents to solve a specific problem. It deals essentially with managing the data flow, coordinating the execution of components and modules, and making sure that the solution responds effectively to user interactions and changing circumstances.

1. **<u>Chain orchestration:</u>** LangChain coordinates the execution of chains to ensure tasks are performed in the correct order and data is correctly passed between components.

2. **<u>Agent-based management:</u>** The use of agents is simplified with predefined templates and a user-friendly interface.

3. **<u>State management:</u>** LangChain automatically tracks the state of the application, providing developers with a unified interface for accessing and modifying state information.

4. **<u>Concurrency management:</u>** LangChain handles the complexities of concurrent execution, enabling developers to focus on the tasks and interactions without worrying about threading or synchronization issues.

# <a id='toc3_'></a>[Key LLM-based applications using LangChain](#toc0_)

The following are among the most popular use cases for LLM-based applications built with LangChain, particularly in open-source contexts:

## <a id='toc3_1_'></a>[Chatbots and Virtual Assistants](#toc0_)
Description: These applications leverage LLMs to engage users in natural language conversations, providing support, answering questions, or assisting with tasks.
Example: Customer service bots that can handle inquiries and provide information based on user input.

## <a id='toc3_2_'></a>[Retrieval-Augmented Generation (RAG)](#toc0_)
Description: This approach combines document retrieval with LLM capabilities, allowing users to “chat” with their documents. It retrieves relevant information from a database or document store and generates responses based on that data.
Example: Applications that allow users to ask questions about specific documents, such as PDFs or web pages, and receive contextually relevant answers.

## <a id='toc3_3_'></a>[Language Translation Tools](#toc0_)
Description: LLMs can be used to translate text between languages, providing real-time translation services.
Example: Applications that facilitate communication between speakers of different languages in chat or messaging platforms.

## <a id='toc3_4_'></a>[Sentiment Analysis Tools](#toc0_)
Description: These applications analyze text to determine the sentiment (positive, negative, neutral) expressed within it, useful for businesses to gauge customer feedback.
Example: Tools that analyze social media posts or customer reviews to provide insights into public sentiment about a brand or product.

## <a id='toc3_5_'></a>[Content Generation](#toc0_)
Description: LLMs can generate various types of content, including articles, marketing copy, and creative writing.
Example: Automated blog post generators that create content based on specified topics or keywords.

## <a id='toc3_6_'></a>[Code Generation and Assistance](#toc0_)
Description: LLMs can assist developers by generating code snippets or providing explanations for programming concepts.
Example: Tools that help with debugging or suggest code improvements based on user queries.

## <a id='toc3_7_'></a>[Personalized Learning Assistants](#toc0_)
Description: These applications provide tailored educational content and support based on individual learning needs and preferences.
Example: Tutoring systems that adapt to a student’s progress and provide resources accordingly.

## <a id='toc3_8_'></a>[Data Analysis and Visualization](#toc0_)
Description: LLMs can help analyze data and generate visualizations based on user queries.
Example: Applications that allow users to ask questions about datasets and receive visual representations of the data.

# <a id='toc4_'></a>[Building a Chatbot](#toc0_)

Using LangChain and OpenAI/Llama (Ollama).

Two key components from LangChain are used to facilitate interactions between users and the language model:

1. `HumanMessage`
   - Purpose: This class represents a message sent by a human user in the conversation. It encapsulates the content of the message and any relevant metadata.
   - Usage: When you create an instance of `HumanMessage`, you typically pass the user’s input (the text they typed) to it. This helps the model understand that this message is coming from the user and allows it to process the input accordingly.
2. `AIMessage`
   - Purpose: This class represents a message generated by the AI (the language model). Similar to `HumanMessage`, it contains the content of the message along with any associated metadata.
   - Usage: When the model generates a response, you create an instance of `AIMessage` with the output text. This distinguishes the AI’s responses from the user’s inputs, maintaining clarity in the conversation flow.
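
To make the role distinction concrete, here is a minimal stand-in written with a dataclass. It is not LangChain's implementation: `Message`, `human`, and `ai` are hypothetical names that only mirror the idea of content plus metadata tagged with who produced it.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Message:
    """A conversation turn: who said it, what was said, and optional metadata."""
    role: str
    content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def human(content: str) -> Message:
    """Build a user turn (plays the part of HumanMessage in this sketch)."""
    return Message(role="human", content=content)


def ai(content: str) -> Message:
    """Build a model turn (plays the part of AIMessage in this sketch)."""
    return Message(role="ai", content=content)


history: List[Message] = [
    human("What's the capital of Tunisia?"),
    ai("The capital of Tunisia is Tunis."),
]

for message in history:
    print(f"{message.role}: {message.content}")
```

Keeping the role next to the content is what lets a chat model see a dialogue rather than a flat list of strings.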

Use OpenAI if you have credit, or use an open-source Llama model through [Ollama](https://ollama.com/download), a free and open-source tool for running large language models (LLMs) locally on your computer.

* Make sure the Ollama server is running: `ollama serve`
* Pull the model that you will be using: `ollama pull <model-name>`
* To remove one of the models: `ollama rm <model-name>`
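
Before running the cells that depend on Ollama, it can help to check from Python that the server is reachable. The sketch below assumes Ollama's default local address `http://localhost:11434`; adjust it if your server listens elsewhere. `is_ollama_running` is a hypothetical helper, not part of Ollama or LangChain.

```python
import urllib.error
import urllib.request


def is_ollama_running(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers with status 200 at base_url, False otherwise."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or DNS failure: treat as "not running"
        return False


if __name__ == "__main__":
    if is_ollama_running():
        print("Ollama server is reachable.")
    else:
        print("Ollama server not reachable; start it with `ollama serve`.")
```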

In [None]:
# import required libraries

from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage, AIMessage

## <a id='toc4_1_'></a>[Initialize the OpenAI Chat Model](#toc0_)

In [None]:
llm = ChatOpenAI(model="gpt-3.5-turbo", openai_api_key=api_key)



## <a id='toc4_2_'></a>[Initialize the LLaMa (Large Language Model Meta AI) Chat Model](#toc0_)

In [103]:
from langchain.llms import OpenAI

#initialize the OpenAI model
llm = OpenAI(openai_api_key=api_key)



In [1]:
from langchain_ollama import ChatOllama  # Updated import statement
from langchain.schema import HumanMessage, AIMessage

# Initialize the LLaMa model
model = ChatOllama(model="llama2")  # Specify the model you want to use

If this import raises `ModuleNotFoundError: No module named 'langchain_ollama'`, install the integration package first: `pip install langchain-ollama`

## <a id='toc4_3_'></a>[Create a Function for Chatting](#toc0_)

In [3]:
conversation_history = []

def chat_with_bot(user_input):
    global conversation_history
    human_message = HumanMessage(content=user_input)
    # Append the user message to the conversation history
    conversation_history.append(human_message)
    
    # Invoke the model with the conversation history
    response = model.invoke(conversation_history)
    
    # Append the model response to the conversation history
    conversation_history.append(AIMessage(content=response.content))
    
    return response.content
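
The history mechanics of the function above can be checked offline, without a running model. The sketch below swaps in a stub that exposes the same `invoke(messages)` shape; `Msg`, `StubModel`, and `chat` are hypothetical stand-ins, and the stub only reports how many messages it received and the latest user text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Msg:
    """Minimal role-tagged message used only in this sketch."""
    role: str
    content: str


class StubModel:
    """Offline stand-in for a chat model; it does not generate real answers."""

    def invoke(self, messages: List[Msg]) -> Msg:
        last_user = messages[-1].content
        return Msg("ai", f"(turn {len(messages)}) You said: {last_user}")


history: List[Msg] = []
stub = StubModel()


def chat(user_input: str) -> str:
    """Append the user turn, call the model with the full history, store the reply."""
    history.append(Msg("human", user_input))
    reply = stub.invoke(history)
    history.append(reply)
    return reply.content


print(chat("hello"))
print(chat("again"))
print(len(history))
```

The turn counter grows by two per exchange, which shows that every call sends the whole conversation so far, not just the latest input.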

## <a id='toc4_4_'></a>[Run the Chatbot](#toc0_)

In [4]:
# Step 6: Run the Chatbot
import time

# Inside your chat loop
while True:
    user_input = input("You: ")
    if user_input.lower() == 'exit':
        break
    response = chat_with_bot(user_input)
    print(f"Bot: {response}")
    time.sleep(1)  # Wait for 1 second before the next request


Bot: The capital of Tunisia is Tunis.


In [5]:
def chat_with_bot_user(user_input):
    global conversation_history
    # Append the user's message to the conversation history
    conversation_history.append(HumanMessage(content=user_input))
    
    # Pass the message objects themselves (not just their text) so the model
    # can tell user turns from AI turns, then generate a response
    response = model.invoke(conversation_history)
    
    # Append the model's response to the conversation history
    conversation_history.append(AIMessage(content=response.content))
    
    # Return formatted output
    return f"You: {user_input}\nBot: {response.content}"


In [6]:
# Step 6: Run the Chatbot
import time

# Inside your chat loop
while True:
    user_input = input("You: ")
    if user_input.lower() == 'exit':
        break
    response = chat_with_bot_user(user_input)
    print(f"{response}")
    time.sleep(1)  # Wait for 1 second before the next request


You: what's the capital of tunisia?
Bot: 
The answer to your question is "Tunis".


# <a id='toc5_'></a>[Retrieval-Augmented Generation (RAG)](#toc0_)

RAG combines retrieval and generation capabilities to improve the quality of a language model's responses. The key building blocks are:

## <a id='toc5_1_'></a>[Document Store](#toc0_)

1. Purpose: A repository of documents that the model can retrieve information from.
2. Implementation: You can use various document loaders (like `TextLoader` in LangChain) to load and manage your documents.

## <a id='toc5_2_'></a>[Embedding Model](#toc0_)

1. Purpose: Converts documents and queries into vector representations, allowing for efficient similarity searches.
2. Implementation: Use models like OpenAIEmbeddings to create embeddings for your documents.
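
Similarity search over embeddings usually comes down to comparing vectors, and cosine similarity is one common choice. The sketch below uses hand-written three-dimensional toy vectors as an assumption; real embedding models return vectors with hundreds or thousands of dimensions, and a vector store may use a different distance metric.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy "embeddings": the query points in nearly the same direction as doc_close
query = [1.0, 0.0, 1.0]
doc_close = [0.9, 0.1, 0.8]
doc_far = [0.0, 1.0, 0.0]

print(cosine_similarity(query, doc_close) > cosine_similarity(query, doc_far))
```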

## <a id='toc5_3_'></a>[Retriever](#toc0_)

1. Purpose: Retrieves relevant documents based on the user’s query.
2. Implementation: LangChain vector stores expose a retriever through `as_retriever()`, which uses the embeddings to find the most relevant documents.
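
The retrieval step itself is a score-and-rank operation. The sketch below illustrates it with a word-overlap (Jaccard) score in place of embeddings, which is a deliberate simplification; `tokenize`, `jaccard`, and `retrieve` are hypothetical helpers and do not reflect how a vector store scores documents internally.

```python
from typing import List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lower-case the text and split it into a set of words."""
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query: str, documents: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Score every document against the query and return the k best (score, document) pairs."""
    query_tokens = tokenize(query)
    scored = [(jaccard(query_tokens, tokenize(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]  # slicing tolerates k larger than the number of documents


docs = [
    "RAG models combine retrieval and generation",
    "Tunis is the capital of Tunisia",
    "Chroma is an open-source vector store",
]

top = retrieve("what is the capital of tunisia", docs, k=1)
print(top[0][1])
```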

## <a id='toc5_4_'></a>[Language Model (LLM)](#toc0_)

1. Purpose: Generates responses based on the retrieved documents.
2. Implementation: A Llama model served through Ollama (e.g., `Ollama(model="llama2")`) serves as the generative component, taking context from the retrieved documents to produce coherent answers.

## <a id='toc5_5_'></a>[RAG Chain](#toc0_)

1. Purpose: Integrates the retriever and the language model into a single workflow.
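2. Implementation: Use `LLMChain` in LangChain to pair the LLM with a prompt template, then wrap it with a document-combining chain and a retrieval chain (here `StuffDocumentsChain` and `RetrievalQAWithSourcesChain`) to connect the retriever and the LLM.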
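
The "stuff" strategy for combining documents is conceptually simple: join the retrieved texts and place them in the prompt's context slot. The plain-Python sketch below shows that idea only; `stuff_documents` is a hypothetical function, and the blank-line separator is an assumption for this example.

```python
from typing import List

# Same shape as the prompt template used later in this notebook
TEMPLATE = "Context: {context}\nQuestion: {question}\nAnswer:"


def stuff_documents(documents: List[str], question: str, separator: str = "\n\n") -> str:
    """Join all retrieved documents into one context string and fill the template."""
    context = separator.join(documents)
    return TEMPLATE.format(context=context, question=question)


prompt = stuff_documents(
    ["RAG combines retrieval and generation.", "Chroma stores embeddings."],
    "What is RAG?",
)
print(prompt)
```

Because every document is placed in a single prompt, this approach works best when the retrieved texts are short enough to fit in the model's context window.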

## <a id='toc5_6_'></a>[Query Interface](#toc0_)

1. Purpose: Allows users to input their queries and receive responses.
2. Implementation: This can be a simple command-line interface or a more complex web interface, depending on your application.

## <a id='toc5_7_'></a>[Example Workflow](#toc0_)

1. User Input: The user submits a query.
2. Retrieval: The retriever fetches relevant documents based on the query.
3. Generation: The LLM generates a response using the retrieved documents as context.
4. Output: The response is presented to the user.
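
The four steps above can be sketched end to end with stand-ins for the retriever and the model. This is an offline illustration under stated assumptions: `KNOWLEDGE`, `keyword_retriever`, `stub_llm`, and `answer_query` are hypothetical, the retriever matches keywords instead of embeddings, and the stub "model" just returns the first line of context.

```python
from typing import Callable, List

KNOWLEDGE = [
    "Tunis is the capital of Tunisia.",
    "Chroma is an open-source vector store.",
]


def keyword_retriever(query: str) -> List[str]:
    """Return documents containing any query word longer than three characters."""
    words = [word.strip("?.,!") for word in query.lower().split()]
    keywords = [word for word in words if len(word) > 3]
    return [doc for doc in KNOWLEDGE if any(key in doc.lower() for key in keywords)]


def stub_llm(prompt: str) -> str:
    """Stand-in for generation: return the first line of context from the prompt."""
    context_line = prompt.splitlines()[0]
    return context_line[len("Context: "):]


def answer_query(
    query: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str], str],
) -> str:
    docs = retrieve(query)                                   # 2. Retrieval
    context = "\n".join(docs)
    prompt = f"Context: {context}\nQuestion: {query}\nAnswer:"
    return generate(prompt)                                  # 3. Generation


user_query = "What is the capital of Tunisia?"               # 1. User input
print(answer_query(user_query, keyword_retriever, stub_llm))  # 4. Output
```

Passing the retriever and generator as arguments keeps the workflow independent of any particular vector store or model.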

In [84]:
# Import necessary libraries (LLMChain is imported once, from langchain.chains)
from langchain.llms import Ollama
from langchain.document_loaders import TextLoader
from langchain_ollama import OllamaEmbeddings
from langchain.vectorstores import Chroma  # Example vector store
from langchain.chains import StuffDocumentsChain, RetrievalQAWithSourcesChain, LLMChain
from langchain.prompts import PromptTemplate

In [85]:
# Load documents
loader = TextLoader('rag_docs.txt')
documents = loader.load()

In [86]:
# Create an embedding model using Ollama
embedding_model = OllamaEmbeddings(model="mxbai-embed-large")

Do not forget to install ChromaDB, an open-source vector store:

`pip install chromadb`

In [87]:
# Create a vector store (ChromaDB is an open-source vector store)
# Pull the embedding model first by running `ollama pull mxbai-embed-large` in a terminal
vector_store = Chroma.from_documents(documents, embedding_model)

In [117]:
# Create a retriever from the vector store
retriever = vector_store.as_retriever()

In [118]:
# Initialize the Ollama model for response generation
ollama_model = Ollama(model="llama2")

In [119]:
# Define a prompt template
prompt_template = PromptTemplate(input_variables=["context", "question"], template="Context: {context}\nQuestion: {question}\nAnswer:")

# Create an LLMChain with the Ollama model and the prompt template
llm_chain = LLMChain(llm=ollama_model, prompt=prompt_template)

# Create a chain to combine documents
combine_documents_chain = StuffDocumentsChain(llm_chain=llm_chain, document_variable_name="context")

# Create the RAG chain using RetrievalQAWithSourcesChain
rag_chain = RetrievalQAWithSourcesChain(combine_documents_chain=combine_documents_chain, retriever=retriever)

In [120]:
# Example query
context = "RAG models combine retrieval and generation for better responses."
question = "What is the significance of RAG models?"

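# Format the prompt only to display what the template produces; the chain fills
# {context} itself from the retrieved documents, so query it with the question alone
formatted_prompt = prompt_template.format(context=context, question=question)
response = rag_chain.invoke({"question": question})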

# print prompt
print(f"Prompt:\n{formatted_prompt}")

# Print the answer
print(f"answer: {response['answer']}")  # Access the answer directly
print(f"source: {response['sources']}")  # Optionally print the sources

Number of requested results 4 is greater than number of elements in index 2, updating n_results = 2


Prompt:
Context: RAG models combine retrieval and generation for better responses.
Question: What is the significance of RAG models?
Answer:
answer: The significance of RAG models lies in their ability to combine the strengths of both retrieval systems and generative models, providing more accurate and contextually relevant responses compared to using either approach alone. By retrieving relevant documents from a knowledge base and augmenting them with generative capabilities, RAG models can provide faster and more personalized responses in applications such as chatbots, where understanding user intent and providing precise information is crucial. Additionally, the integration of RAG models into everyday applications will likely enhance user experiences across multiple domains.
source: 


In [101]:
# Retrieve documents manually for debugging
retrieved_docs = retriever.get_relevant_documents(formatted_prompt)
print(f"Retrieved Documents: {retrieved_docs}")

Number of requested results 4 is greater than number of elements in index 2, updating n_results = 2


Retrieved Documents: [Document(metadata={'source': 'rag_docs.txt'}, page_content='# The Significance of RAG Models\n\nRetrieval-Augmented Generation (RAG) models combine the strengths of retrieval systems and generative models. By retrieving relevant documents from a knowledge base, RAG models can provide more accurate and contextually relevant responses. This approach is particularly useful in applications like chatbots, where understanding user intent and providing precise information is crucial.\n\n# Applications of RAG Models\n\nRAG models are widely used in various fields, including customer support, education, and content creation. In customer support, they can quickly retrieve information from FAQs and manuals, allowing for faster response times. In education, RAG models can assist students by providing relevant resources and explanations based on their queries.\n\n# Future of RAG Technology\n\nAs AI technology continues to evolve, RAG models are expected to become more sophisti

<p style="text-align: center;  font-family: verdana; font-size: 16px;">Happy Python Coding & AI Development!</p>

<p style="text-align: right; color: Blue; font-family: verdana; font-size: 20px;">© 2024 Drs. Rafik Borji & Egidio Marotta.</p>