# AutoAssistAI

In [1]:
# Imports
import os
import json
import textwrap
import chromadb
import langchain
import sqlalchemy
import langchain_openai
from langchain_openai import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.chains import SimpleSequentialChain
from langchain.chains import SequentialChain
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferWindowMemory
from langchain.document_loaders import WebBaseLoader
from langchain.indexes import VectorstoreIndexCreator
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
import warnings
warnings.filterwarnings('ignore')

USER_AGENT environment variable not set, consider setting it to identify your requests.


## **Getting to Know LangChain**

[Oficial documentation](https://python.langchain.com/docs/get_started/introduction)

LangChain is a powerful framework designed to simplify the development of applications that use language models (LLMs). Its modular structure and versatility allow developers to build a wide range of solutions, from simple automation tasks to complex systems like chatbots, question-and-answer platforms, and more.

---

**What is LangChain?**  
LangChain is an open-source library that bridges the gap between LLMs and real-world applications by enabling seamless integration with various tools, data sources, and workflows. Its goal is to simplify the development process while offering robust capabilities for building intelligent applications.

---

**Core Features of LangChain**

1. **Modularity and Customization**  
   LangChain's modular design allows developers to integrate components like LLMs, prompt templates, memory, and agents. Each component can be customized to meet specific requirements, making the framework flexible and versatile.

2. **Integration with External Data**  
   One of LangChain's key features is Retrieval-Augmented Generation (RAG), enabling applications to retrieve and use external data sources like web content, documents, or APIs to provide more accurate and context-aware responses.

3. **Memory Management**  
   LangChain provides various types of memory, such as buffer memory and conversation memory, allowing applications to maintain context and improve user interactions over time.

4. **Agent Framework**  
   LangChain supports agents capable of dynamically deciding which tools or APIs to use based on user inputs, adding a layer of intelligence to your applications.

5. **Wide Compatibility**  
   It works seamlessly with a variety of LLMs, such as OpenAI's GPT models, Hugging Face transformers, and custom fine-tuned models, ensuring flexibility in choosing the best model for your use case.

---

**Applications of LangChain**

- **Chatbots**: Create intelligent and context-aware conversational agents.  
- **Question-Answer Systems**: Build systems capable of answering domain-specific questions using RAG and external data.  
- **Automated Processes**: Develop tools for summarizing, translating, or analyzing text data.  
- **Custom LLM Solutions**: Fine-tune language models with LangChain to address unique business problems.

---

**Why Use LangChain?**

LangChain simplifies the integration of language models with external tools and data sources, accelerating the development of sophisticated AI-driven applications. Whether you're building a chatbot, a data-powered assistant, or a customized LLM, LangChain offers the tools and flexibility to bring your ideas to life.


---

### **Diving Deeper into LangChain Components**

LangChain’s architecture is built around several core components, each designed to perform a specific function that simplifies the integration and application of large language models (LLMs). Below, we’ll explore these components in detail:

---

**1. Models**  
The **model** is the heart of LangChain. It interacts with the language model (LLM) to generate predictions, completions, or responses.

- **Supported Models:**  
  LangChain supports a wide range of LLMs, including:
  - OpenAI's GPT (e.g., GPT-3.5, GPT-4).
  - Hugging Face Transformers.
  - Open-source models (e.g., Llama, BLOOM, Falcon).
  
- **Customization:**  
  Developers can fine-tune models, adjust hyperparameters, and incorporate specialized pre-trained models for domain-specific tasks.

---

**2. Prompts**  
Prompts define how input is structured and presented to the LLM. Crafting effective prompts is crucial for achieving accurate and relevant responses.

- **Prompt Templates:**  
  LangChain provides tools for creating reusable templates with placeholders for dynamic inputs.  
  Example:  
  ```python
  from langchain.prompts import PromptTemplate

  prompt = PromptTemplate(
      input_variables=["context", "question"],
      template="Use the following context to answer the question:\n\n{context}\n\nQuestion: {question}"
  )
  ```
  
- **Prompt Optimization:**  
  LangChain facilitates testing and iteration of prompts to maximize model performance.

---

**3. Memory**  
Memory allows the system to retain information between interactions, making applications context-aware.

- **Types of Memory:**  
  - **ConversationBufferMemory:** Stores the entire conversation history.  
  - **ConversationSummaryMemory:** Summarizes past interactions to maintain context efficiently.  
  - **VectorStoreRetrieverMemory:** Uses embeddings to retrieve relevant context dynamically.

- **Use Case:**  
  For chatbots, memory ensures that the bot understands and maintains context throughout a conversation.

---

**4. Chains**  
Chains are sequences of operations that transform inputs into outputs. LangChain allows developers to build complex workflows by chaining multiple components together.

- **LLMChain:**  
  The simplest type of chain, consisting of a prompt and an LLM.  
  Example:  
  ```python
  from langchain.chains import LLMChain
  from langchain.llms import OpenAI

  llm = OpenAI(model="gpt-4")
  chain = LLMChain(llm=llm, prompt=prompt)
  response = chain.run({"context": "AI is transforming industries.", "question": "How is it used in healthcare?"})
  ```
  
- **Sequential Chains:**  
  Combine multiple chains to perform more complex tasks, such as summarization followed by question-answering.

---

**5. Tools and Agents**  
Agents are decision-makers that dynamically decide which tools to use based on user input. Tools provide external capabilities, such as searching the web or accessing APIs.

- **Tools:**  
  Common tools include:
  - **Web Search:** Retrieve real-time information.
  - **Calculators:** Perform mathematical computations.
  - **Databases:** Query structured or unstructured data.

- **Agents:**  
  Agents use prompts to decide which tool to invoke and how to handle responses.  
  Example: An agent might search the web for information if a question cannot be answered using the LLM alone.

---

**6. Data Connectors**  
LangChain supports **Retrieval-Augmented Generation (RAG)** by integrating with external data sources. This makes LLMs more powerful and capable of providing accurate, context-specific answers.

- **Data Sources:**  
  - **Vector Databases:** Pinecone, Weaviate, FAISS.  
  - **Document Loaders:** PDFs, Excel files, web scraping.  
  - **APIs:** Integrate third-party APIs for live data retrieval.

- **Embedding Models:**  
  LangChain allows embeddings to be generated for indexing and searching data. This ensures relevant information is retrieved efficiently.

---

**7. Evaluation**  
LangChain includes tools for evaluating and debugging applications to ensure they meet performance requirements.

- **Human-in-the-Loop (HITL):**  
  Involve human evaluators to assess the quality of responses.  
- **Automated Evaluation:**  
  Use metrics like BLEU, ROUGE, or accuracy to measure performance.

---

**8. Deployment**  
LangChain applications can be deployed on various platforms, making them scalable and production-ready.

- **Cloud Platforms:** AWS, GCP, Azure.  
- **Dockerization:** Containerize LangChain apps for easy deployment.  
- **Integration with APIs:** Expose the functionality as RESTful APIs for external use.

---

**9. Advanced Features**  
- **Streaming:** LangChain supports streaming responses for real-time applications like live chat interfaces.  
- **Callbacks:** Monitor and log the internal workflow of chains and agents for debugging or tracking.

---

**Why These Components Matter**  
Each component is modular and can be independently configured, allowing developers to:
- Customize solutions for specific use cases.
- Scale applications without overhauling existing structures.
- Ensure high performance and efficiency by leveraging the best tools and integrations.

---

Would you like me to focus on a specific component, or provide an example project that ties these components together?

---

## Defining the LLM

In [2]:
# Adding the API key
with open('../ignore/secret_key.json') as f:
    os.environ['OPENAI_API_KEY'] = json.load(f)['secret_key']
    

# Defines the LLM
# Creates an instance of a Large Language Model (LLM), specifically one provided by OpenAI
llm = OpenAI(temperature=0.9)

Temperature is a hyperparameter that influences the randomness of the responses generated by the model. A higher temperature value (usually ranging from 0 to 1) promotes more creative and varied responses. On the other hand, a lower temperature tends to cause the model to produce more deterministic and possibly more predictable responses.

In [3]:
# Send the prompt to LLM and capture the response
nome = llm.invoke("I want to open a Japanese food restaurant. Suggest a fancy name for it.")
print(nome)



"Sakura Delights"


In this context, the string “I want to open a Japanese food restaurant. Suggest a fancy name for it.” serves as the prompt or input to the language model. It describes the task the user wants the model to perform: creatively generating a name for a new Japanese food restaurant. The model will use its natural language training and prior knowledge to generate a response that meets this request.

---

## Using Prompt Templates

Prompt Templates in the context of LangChain refer to structured ways of formatting input to large language models (LLMs) to improve their performance and adherence to desired behaviors.

A prompt template defines a template sequence with placeholder variables that can be populated dynamically. This allows you to construct prompts in a consistent and programmatic manner, rather than hard-coding full prompts.

Prompt templates in LangChain provide a structured and extensible way to interface with LLMs, making it easy to explore and optimize prompting strategies to improve language model performance on specific tasks or domains.

In [4]:
# Set the prompt template
prompt_template_name = PromptTemplate(
    input_variables = ['cuisine'],
    template = "I want to open a {cuisine} restaurant. Suggest a fancy name for it."
)

The above line of code defines a PromptTemplate, a framework that allows you to create dynamic prompts for use with Large Language Models (LLMs). This approach is particularly useful when you want to generate custom prompts based on specific variables or when you want to reuse a prompt format with different data sets.

**input_variables = ['cuisine']**: Defines a list of variables that can be used to populate the template. In this case, there is a single variable called 'cuisine'. This variable acts as a placeholder that will be replaced with a specific value when the template is used.

In [5]:
# Use the previously defined template to generate a specific prompt,
# inserting the value "Italiana" in place of the variable culinary
p = prompt_template_name.format(cuisine = "Mexican")
print(p)

I want to open a Mexican restaurant. Suggest a fancy name for it.


## Operation Sequences with LLMChain

Chains in LangChain are sequences of operations that can process inputs and generate outputs by combining multiple components, including large language models (LLMs), other chains, and specialized tools or utilities.

An LLMChain is a type of chain that allows you to interact with a large language model (LLM) in a structured way. It provides a simple interface for passing inputs to the LLM and retrieving its outputs.

The LLMChain serves as a building block for many other constructs in LangChain, such as agents, tools, and more advanced chain types. By encapsulating the LLM interaction logic in a reusable and extensible component, LLMChain simplifies the process of building applications that leverage large language models.

In [6]:
# Create the chain and activate verbose
chain = LLMChain(llm = llm, prompt = prompt_template_name, verbose = True)

# Invoke the chain by passing a parameter to the prompt
chain.invoke("Brazilian")



[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mI want to open a Brazilian restaurant. Suggest a fancy name for it.[0m

[1m> Finished chain.[0m


{'cuisine': 'Brazilian', 'text': '\n"Sabor do Brasil" (Taste of Brazil)'}

The above line of code creates an instance of LLMChain, a class designed to chain or sequence operations using an LLM. This instance is configured to use a specific language model and a predefined prompt template.

In [7]:
# Create the chain and activate verbose
chain = LLMChain(llm = llm, prompt = prompt_template_name, verbose = True)

# Invoke the chain by passing a parameter to the prompt
chain.invoke("Thai")



[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mI want to open a Thai restaurant. Suggest a fancy name for it.[0m

[1m> Finished chain.[0m


{'cuisine': 'Thai', 'text': '\n\n"Lotus Blossom Thai Bistro"'}

## Simple Sequential Chain

A SimpleSequentialChain in LangChain is a chain type that executes a sequence of components (e.g. LLMs, tools, other chains) in a predefined order. It is one of the most basic and commonly used chain types in LangChain.

A sample use case for SimpleSequentialChain could be a question answering system where:

- The first component is an LLM that analyzes the input question.
- The second component is a tool that retrieves relevant documents from a database.
- The third component is another LLM that generates an answer based on the question and the retrieved documents.

By chaining these components together into a SimpleSequentialChain, you can create a more complex and capable system while maintaining a modular and extensible architecture.

While SimpleSequentialChain is useful for linear workflows, LangChain also provides other chain types such as ConditionalChain and SequentialChain for more complex control flows and branching logic.

In [8]:
# Sets LLM with lower temperature
llm = OpenAI(temperature = 0.6)

# Create the prompt template
prompt_template_name = PromptTemplate(
    input_variables =['cuisine'],
    template = "I want to open a {cuisine} restaurant. Suggest a fancy name for it.")


# Create the chain
chain_1 = LLMChain(llm = llm, prompt = prompt_template_name)


# Create another prompt template
prompt_template_items = PromptTemplate( input_variables = ['restaurant name'], template = """Suggest some menu items for {restaurant_name}""")


# Create the chain
chain_2 = LLMChain(llm = llm, prompt = prompt_template_items, verbose=True)


# Concatenates the two chains
chain_final = SimpleSequentialChain(chains = [chain_1, chain_2])


# Invoca a chain
print(chain_final.invoke("Indiana"))



[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mSuggest some menu items for 

"Hoosier's Hearth"[0m

[1m> Finished chain.[0m
{'input': 'Indiana', 'output': '\n\n1. Fried Pork Tenderloin Sandwich with Homemade Potato Chips\n2. Indiana Corn Chowder served in a Bread Bowl\n3. Hoosier Fried Chicken with Mashed Potatoes and Gravy\n4. Biscuits and Gravy with Sausage Patties\n5. Indiana Pork BBQ Ribs with Corn on the Cob\n6. Sweet Corn Fritters with Maple Syrup\n7. Indiana Farmhouse Salad with Local Greens and Veggies\n8. Hoosier Meatloaf with Roasted Root Vegetables\n9. Indiana Apple Pie with Vanilla Ice Cream\n10. Hoosier Pork and Beans Casserole\n11. Indiana Fried Catfish with Hushpuppies\n12. Hoosier Beef Stew with Cornbread\n13. Indiana Sweet Potato Casserole with Pecan Streusel Topping\n14. Hoosier Hoagie Sandwich with Ham, Turkey, and Provolone Cheese\n15. Indiana Maple Syrup Glazed Ham with Scalloped Potatoes.'}


## Sequential Chain

SequentialChain is a more advanced version of SimpleSequentialChain. While SimpleSequentialChain executes a fixed sequence of components, SequentialChain allows dynamic and conditional execution of components based on the outputs of previous components.

A sample use case for SequentialChain could be a conversational agent that:

- Uses an LLM to understand user input and determine the appropriate action.

- Conditionally executes different components (e.g., database lookup, API call, calculation) based on the output of the LLM.

- Optionally prompts the user for additional information if needed.

- Generates a final response using another LLM, based on the outputs of previous components.

By leveraging SequentialChain, you can build more intelligent and adaptive applications that can dynamically adjust their behavior based on intermediate results and states.

In [9]:
# Define the LLM
llm = OpenAI(temperature = 0.7)


# Creating the first chain

# Define the prompt template
prompt_template_name = PromptTemplate(
    input_variables = ['cuisine'],
    template = "I want to open a {cuisine} restaurant. Suggest a fancy name for it.")

# Define the chain with an output parameter
chain_1 = LLMChain(llm = llm, prompt = prompt_template_name, output_key = "restaurant_name")

In [10]:
# Creating the second chain

# Define the prompt template
prompt_template_items = PromptTemplate(
    input_variables = ['restaurant_name'],
    template = "Suggest some menu items for {restaurant_name}."
)

# Define the chain with an output parameter
chain_2 = LLMChain(llm = llm, prompt = prompt_template_items, output_key = "menu_items")

In [11]:
# Create the sequence of chains
chain = SequentialChain(chains = [chain_1, chain_2],
                        input_variables = ['cuisine'],
                        output_variables = ['restaurant_name', "menu_items"])

In [12]:
chain.invoke({"cuisine": "Italian"})

{'cuisine': 'Italian',
 'restaurant_name': '\n\n"La Dolce Vita Trattoria" ',
 'menu_items': '\n\n1. Antipasto platter with a variety of cured meats, cheeses, olives, and roasted vegetables\n2. Caprese salad with fresh mozzarella, tomatoes, and basil\n3. Homemade gnocchi with a choice of marinara, pesto, or creamy alfredo sauce\n4. Chicken or veal piccata served with lemon-caper sauce\n5. Seafood risotto with shrimp, scallops, and clams\n6. Eggplant Parmesan with marinara sauce and melted mozzarella cheese\n7. Wood-fired pizza with a choice of toppings such as prosciutto, arugula, and truffle oil\n8. Linguine alle vongole (linguine with clams) in a white wine and garlic sauce\n9. Osso buco (braised veal shank) with gremolata and creamy polenta\n10. Tiramisu or cannoli for dessert.'}

In [13]:
# Invoking the method and capturing the response
response = chain.invoke({"cuisine": "Italian"})

# Preparing the formatted string
formatted_output = f"Cuisine: {response['cuisine']}\nRestaurant Name: {response['restaurant_name'].strip()}\nMenu Items:"

# Adding each menu item to the formatted string
menu_items = response['menu_items'].strip().split('\n')
for item in menu_items:
    formatted_output += f"\n{item}"

# Displaying the formatted output
print(formatted_output)

Cuisine: Italian
Restaurant Name: "Bella Vita Trattoria"
Menu Items:
1. Antipasto platter: a selection of cured meats, cheeses, olives, and grilled vegetables
2. Caprese salad: fresh mozzarella, tomatoes, and basil drizzled with balsamic glaze
3. Risotto al Funghi: creamy risotto with wild mushrooms
4. Pollo Marsala: chicken breast sautéed with mushrooms and Marsala wine sauce
5. Gnocchi al Pesto: potato dumplings in a homemade pesto sauce
6. Lasagna Bolognese: layers of pasta, Bolognese sauce, and béchamel cheese
7. Linguine Frutti di Mare: linguine pasta with shrimp, scallops, mussels, and clams in a white wine sauce
8. Eggplant Parmigiana: breaded eggplant topped with marinara sauce and mozzarella cheese
9. Saltimbocca alla Romana: thin sliced veal, prosciutto, and sage in a white wine sauce
10. Tiramisu: traditional Italian dessert made with ladyfingers, espresso, and mascarpone cheese.


## Building Memory for LLM

In LangChain, “Memory” refers to components that allow chains, agents, and other constructs to store and retrieve information about previous inputs, outputs, and intermediate states. This allows them to maintain context and make use of relevant information from the history of previous conversations or computations.

In [14]:
# Define the chain
chain = LLMChain(llm = llm, prompt = prompt_template_name)

In [15]:
# Invoke the chain
name = chain.invoke("Mexican")
print(name)

{'cuisine': 'Mexican', 'text': '\n\n"El Sabor del Sol" (The Flavor of the Sun)'}


In [16]:
# Invoke the chain
name = chain.invoke("Argentina")
print(name)

{'cuisine': 'Argentina', 'text': '\n\n"Tango Gastronomía"'}


In [17]:
# Creating the memory object
memory = ConversationBufferMemory()

In [18]:
chain = LLMChain(llm = llm, prompt = prompt_template_name, memory = memory)

name = chain.run("Mexican")
print(name)



"Casa de Sabor" 


In [19]:
name = chain.run("Argentina")
print(name)



"Tango Gastronomía"


In [20]:
print(chain.memory.buffer)

Human: Mexican
AI: 

"Casa de Sabor" 
Human: Argentina
AI: 

"Tango Gastronomía"


## Conversation Chain

A ConversationChain is a specialized type of chain designed to handle multi-turn conversations or dialogs with an LLM.

A ConversationChain is particularly useful for building conversational agents, chatbots, or any application that requires maintaining context across multiple turns of interaction with a user. By abstracting away the complexities of managing conversation history and formatting prompts, a ConversationChain simplifies the process of building multi-turn dialog systems with LLMs.

In [21]:
# Creates the conversation object
conv = ConversationChain(llm = OpenAI(temperature = 0.7))

print(conv.prompt.template)

The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
{history}
Human: {input}
AI:


In [22]:
conv.invoke("Which country has won the Football World Cup the most times?")

{'input': 'Which country has won the Football World Cup the most times?',
 'history': '',
 'response': ' Brazil has won the Football World Cup the most times, with a total of five wins. They won in 1958, 1962, 1970, 1994, and 2002. They also hold the record for the most consecutive wins, with three victories in a row in 1958, 1962, and 1970. Germany and Italy are tied for second place with four wins each. Germany won in 1954, 1974, 1990, and 2014, while Italy won in 1934, 1938, 1982, and 2006.'}

In [23]:
conv.invoke("What is 30 + 12?")

{'input': 'What is 30 + 12?',
 'history': 'Human: Which country has won the Football World Cup the most times?\nAI:  Brazil has won the Football World Cup the most times, with a total of five wins. They won in 1958, 1962, 1970, 1994, and 2002. They also hold the record for the most consecutive wins, with three victories in a row in 1958, 1962, and 1970. Germany and Italy are tied for second place with four wins each. Germany won in 1954, 1974, 1990, and 2014, while Italy won in 1934, 1938, 1982, and 2006.',
 'response': '  30 + 12 is equal to 42.'}

In [24]:
conv.invoke("Who is the greatest scorer in the history of the Football World Cup?")

{'input': 'Who is the greatest scorer in the history of the Football World Cup?',
 'history': 'Human: Which country has won the Football World Cup the most times?\nAI:  Brazil has won the Football World Cup the most times, with a total of five wins. They won in 1958, 1962, 1970, 1994, and 2002. They also hold the record for the most consecutive wins, with three victories in a row in 1958, 1962, and 1970. Germany and Italy are tied for second place with four wins each. Germany won in 1954, 1974, 1990, and 2014, while Italy won in 1934, 1938, 1982, and 2006.\nHuman: What is 30 + 12?\nAI:   30 + 12 is equal to 42.',
 'response': ' The greatest scorer in the history of the Football World Cup is Miroslav Klose from Germany. He has scored a total of 16 goals in four World Cup tournaments (2002, 2006, 2010, and 2014). He surpassed the previous record holder, Brazilian player Ronaldo, who had 15 goals. Klose also holds the record for the most goals scored in a single World Cup tournament, with

In [25]:
conv.invoke("What was the first question I asked?")

{'input': 'What was the first question I asked?',
 'history': 'Human: Which country has won the Football World Cup the most times?\nAI:  Brazil has won the Football World Cup the most times, with a total of five wins. They won in 1958, 1962, 1970, 1994, and 2002. They also hold the record for the most consecutive wins, with three victories in a row in 1958, 1962, and 1970. Germany and Italy are tied for second place with four wins each. Germany won in 1954, 1974, 1990, and 2014, while Italy won in 1934, 1938, 1982, and 2006.\nHuman: What is 30 + 12?\nAI:   30 + 12 is equal to 42.\nHuman: Who is the greatest scorer in the history of the Football World Cup?\nAI:  The greatest scorer in the history of the Football World Cup is Miroslav Klose from Germany. He has scored a total of 16 goals in four World Cup tournaments (2002, 2006, 2010, and 2014). He surpassed the previous record holder, Brazilian player Ronaldo, who had 15 goals. Klose also holds the record for the most goals scored in a

In [26]:
print(conv.memory.buffer)

Human: Which country has won the Football World Cup the most times?
AI:  Brazil has won the Football World Cup the most times, with a total of five wins. They won in 1958, 1962, 1970, 1994, and 2002. They also hold the record for the most consecutive wins, with three victories in a row in 1958, 1962, and 1970. Germany and Italy are tied for second place with four wins each. Germany won in 1954, 1974, 1990, and 2014, while Italy won in 1934, 1938, 1982, and 2006.
Human: What is 30 + 12?
AI:   30 + 12 is equal to 42.
Human: Who is the greatest scorer in the history of the Football World Cup?
AI:  The greatest scorer in the history of the Football World Cup is Miroslav Klose from Germany. He has scored a total of 16 goals in four World Cup tournaments (2002, 2006, 2010, and 2014). He surpassed the previous record holder, Brazilian player Ronaldo, who had 15 goals. Klose also holds the record for the most goals scored in a single World Cup tournament, with 5 goals in 2006. He retired from 

---

## Conversation Buffer Window Memory

ConversationBufferWindowMemory is a type of LangChain memory component designed specifically for use with ConversationChain. It provides a way to store and retrieve conversation history while limiting the amount of context retained based on a specified window size.

The main advantage of ConversationBufferWindowMemory is its ability to limit the amount of context provided to the LLM, which can be important for performance and to prevent the model from becoming overwhelmed with too much irrelevant information. By adjusting the window size, you can control the tradeoff between providing enough context and avoiding excessive computational overhead.

This type of memory is particularly useful for building conversational agents, chatbots, or any application that requires maintaining a continuous window of relevant recent conversation history for context.

In [27]:
# Set the memory window
memory = ConversationBufferWindowMemory(k = 1)

In [28]:
# Create the conversation chain
conv = ConversationChain(llm = OpenAI(temperature = 0.7), memory = memory)

In [29]:
# Invoke LLM
conv.run("Who won the first Football World Cup?")

" The first Football World Cup was won by Uruguay in 1930. They defeated Argentina in the final match with a score of 4-2. The tournament was held in Uruguay and was organized by FIFA. It was a 13-team competition, with France, Belgium, Yugoslavia, Romania, Brazil, Bolivia, Peru, Chile, Paraguay, and Mexico also participating. Uruguay's victory was hailed as a triumph for the host nation, and the tournament was seen as a success overall. Are there any other questions you have about the history of the Football World Cup?"

In [30]:
# Invoke LLM
conv.invoke("What is 10 + 19?")

{'input': 'What is 10 + 19?',
 'history': "Human: Who won the first Football World Cup?\nAI:  The first Football World Cup was won by Uruguay in 1930. They defeated Argentina in the final match with a score of 4-2. The tournament was held in Uruguay and was organized by FIFA. It was a 13-team competition, with France, Belgium, Yugoslavia, Romania, Brazil, Bolivia, Peru, Chile, Paraguay, and Mexico also participating. Uruguay's victory was hailed as a triumph for the host nation, and the tournament was seen as a success overall. Are there any other questions you have about the history of the Football World Cup?",
 'response': '  10 + 19 is equal to 29. Is there anything else you would like to know?'}

In [31]:
# Invoke LLM
conv.invoke("Who was the captain of the winning team of the first Football World Cup?")

{'input': 'Who was the captain of the winning team of the first Football World Cup?',
 'history': 'Human: What is 10 + 19?\nAI:   10 + 19 is equal to 29. Is there anything else you would like to know?',
 'response': ' The first Football World Cup was held in 1930 in Uruguay. The winning team was Uruguay, and their captain was José Nasazzi. He was a defender and played for the Uruguayan club Nacional. Is there anything else you would like to know?'}

In [32]:
print(conv.memory.buffer)

Human: Who was the captain of the winning team of the first Football World Cup?
AI:  The first Football World Cup was held in 1930 in Uruguay. The winning team was Uruguay, and their captain was José Nasazzi. He was a defender and played for the Uruguayan club Nacional. Is there anything else you would like to know?


## LangChain and VectorDB for Web Scraping

ChromaDB is a vector storage database library that integrates with LangChain. It provides functionality for efficiently storing, retrieving, and searching large amounts of text data using vector embeddings and semantic similarity.

https://www.trychroma.com/

https://pypi.org/project/chromadb/

In [80]:
# Web data extraction
data = WebBaseLoader(
    # "https://www.nzherald.co.nz/world/donald-trumps-tariff-threat-forces-colombia-to-accept-deportees-on-military-planes/7WA36M4OQBFYVHM4XLAUAN2BE4/"
    "https://www.bbc.com/future/article/20250124-how-to-carry-more-than-your-own-bodyweight"
)

Note: Always check a website's robots.txt before scraping data. Don't scrape if it's not allowed!

In [81]:
# Load the documents
documents = data.load()

In [82]:
len(documents)

1

In [83]:
# Extract the first document (in this case there is only one document)
document = documents[0]

In [84]:
# Dictionary keys
document.__dict__.keys()

dict_keys(['id', 'metadata', 'page_content', 'type'])

In [85]:
# Display the first 100 characters
document.page_content[:100]

'Springy poles and forehead straps: How to carry more than your own bodyweightSkip to contentBritish '

In [86]:
# Metadata
document.metadata

{'source': 'https://www.bbc.com/future/article/20250124-how-to-carry-more-than-your-own-bodyweight',
 'title': 'Springy poles and forehead straps: How to carry more than your own bodyweight',
 'description': "Some communities have developed techniques to help them carry heavier loads. Here's what we can learn from them.",
 'language': 'en-GB'}

In [87]:
# Function to print formatted result
def print_response(response: str):
    print("\n".join(textwrap.wrap(response, width = 100)))

In [88]:
# Create the index using VectorstoreIndexCreator
index_creator = VectorstoreIndexCreator(
    embedding=OpenAIEmbeddings(),  # Define the embeddings
)
vectorstore = index_creator.from_loaders([data])  # Create the index from the loader

# Define the LLM for querying
llm = OpenAI(temperature=0)

# Create a RetrievalQA chain
retrieval_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectorstore.vectorstore.as_retriever(),  # Use the retriever from the vectorstore
    return_source_documents=True,
)

# Query the index
query = "How to carry more than your own bodyweight?"
response = retrieval_chain({"query": query})

# Print the result and sources
print("Answer:", response["result"])
print("\nSource Documents:")

print_response(response=response['result'])


Answer:  Some communities, such as the rural farm workers of Vietnam, have developed techniques to help them carry heavier loads. These techniques include using springy poles and forehead straps to distribute the weight and make it easier to carry. Additionally, some engineers have developed spring-loaded or "floating" backpacks to ease the force of loads on the back and shoulders. Military personnel also often carry loads that exceed their own bodyweight, and have developed techniques and equipment to make it easier, such as using straps and distributing the weight evenly. The sherpa method of carrying heavy loads has also been studied and found to be a combination of weight training and cardio, which helps build endurance and strength for carrying heavy loads.

Source Documents:
 Some communities, such as the rural farm workers of Vietnam, have developed techniques to help them
carry heavier loads. These techniques include using springy poles and forehead straps to distribute
the wei

---

### Let's work with VectorDB

In [98]:
# Create a template
template = """
{context}

Please answer considering the most modern methods you know.

Question: {question}
Answer:"""

In [99]:
# Create the prompt
prompt = PromptTemplate(template = template, input_variables = ["context", "question"])

In [100]:
# Print the prompt to view the format
print(prompt.format(
    context = "You're a specialist",
    question = "How can I become stronger?",)
)


You're a specialist

Please answer considering the most modern methods you know.

Question: How can I become stronger?
Answer:


In [101]:
# Create the embeddings object
embeddings = OpenAIEmbeddings()

The above line of code refers to initializing an instance of the OpenAIEmbeddings class, which is an interface for generating embeddings (vector representations) using models provided by OpenAI, such as the GPT language models. Embeddings are transformations of raw data, such as text, into vectors of fixed numbers, capturing semantic and contextual aspects of the original content in a way that can be processed by machine learning algorithms.

In [102]:
# Creates VectorDB by converting text documents into numeric representations (embeddings)
db = Chroma.from_documents(documents, embeddings)

The above line of code is for creating a vector database using Chroma. This operation involves preparing a data structure optimized for searches and analysis based on documents and their respective embeddings.

In [103]:
type(db)

# Arguments
chain_type_kwargs = {"prompt": prompt}

# Chain de RetrievalQA
chain = RetrievalQA.from_chain_type(llm = ChatOpenAI(temperature = 0),
                                    chain_type = "stuff",
                                    retriever = db.as_retriever(search_kwargs = {"k": 1}),
                                    chain_type_kwargs = chain_type_kwargs)

The above line of code is creating an instance called chain, using the RetrievalQA class to configure a process chain focused on performing Question Answering (QA) tasks based on information retrieval. The from_chain_type method is used to specify the type of process chain and configure its main components, such as the language model and the retrieval engine.

In [105]:
# Query
query = "What makes a person stronger, according to the text?"

# Response
response = chain.invoke(query)
print(response)

{'query': 'What makes a person stronger, according to the text?', 'result': 'According to the text, a person can become stronger by engaging in strength training, building muscle mass, and incorporating progressive weight training into their exercise routine. Additionally, combining cardio and strength training, as well as focusing on technique and gradually increasing loads, can help improve strength and carry heavier loads.'}


---

## Building Sales Expert Chatbot with LangChain and LLM

In [49]:
# Defining LLM
gpt = ChatOpenAI(temperature = 0)

In [50]:
# Template
template = """This is a conversation between a customer and a sports car sales specialist.

You are the car specialist, you know sports cars well and you should always answer
as accurately as possible.

Current conversation:
{history}
Human: {input}
CarSpecialist:"""

In [51]:
# Create the prompt template
prompt = PromptTemplate(input_variables = ["history", "input"], template = template)

In [52]:
# Create the conversation chain
conversation = ConversationChain(prompt = prompt,
                                 llm = gpt,
                                 verbose = False,
                                 memory = ConversationBufferMemory(ai_prefix = "CarSpecialist"))

In [53]:
# Conversation loop limited to 5 interactions (increase the number of interactions or remove the if block)

# Initialize the counter
counter = 0

# Loop
while True:

    prompt = input(prompt = "Customer: ")
    print()
    result = conversation(prompt)
    print_response("Expert: " + result["response"])
    print()

    counter += 1

    if counter >= 5:
        print('\nThanks for Using the AI-Based Customer Service System!')
        break


Expert: Hello! How can I assist you today?


Expert: I'm looking to purchase a sports car. Can you recommend any models that are known for their
performance and handling?


Expert: Absolutely! Some popular sports car models known for their performance and handling are the
Porsche 911, Chevrolet Corvette, BMW M3, and Nissan GT-R. Each of these cars offers a unique driving
experience and top-notch performance capabilities. Do you have a specific budget or preference in
mind?


Expert: I'm looking for something with a budget of around $50,000. Do you have any recommendations
within that price range?


Expert: For a budget of around $50,000, I would recommend looking into the Ford Mustang GT, Subaru
WRX STI, or the Chevrolet Camaro SS. These models offer great performance and handling at a more
affordable price point. Would you like to schedule a test drive or learn more about any of these
options?


Thanks for Using the AI-Based Customer Service System!
