# Introduction

This notebook contains all the code you need to create your own chatbot with a custom knowledge base using LlamaIndex and OpenAI's GPT models.

Follow the instructions for each step and run the code samples by pressing the "play" button near each code cell.

# Prepare the data for your custom knowledge base

Place the documents for your custom knowledge base in a local folder (e.g., `custom_data/`). If you don't have your own data yet, you can clone a sample repository into that folder instead. If you use a different folder, replace `custom_data` with its path when you construct the index later in this notebook.
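If you want to check that the pipeline runs before adding real documents, a minimal sketch like the following creates the data folder and, only if it contains no files, writes one placeholder text file. The helper name `prepare_sample_data` and the file `sample_notes.txt` are hypothetical. Any documents that `SimpleDirectoryReader` can read will work in their place.

```python
from pathlib import Path

def prepare_sample_data(directory="custom_data"):
    """Create `directory` and, only if it has no files, add one placeholder file.

    The file name and text are hypothetical placeholders; replace them with
    your real documents before building the index.
    """
    data_dir = Path(directory)
    data_dir.mkdir(parents=True, exist_ok=True)

    # Never overwrite or add to a folder that already contains files
    if not any(p.is_file() for p in data_dir.iterdir()):
        (data_dir / "sample_notes.txt").write_text(
            "This is a placeholder document. Replace it with your own data.\n",
            encoding="utf-8",
        )

    # Return the names of the files now in the folder
    return sorted(p.name for p in data_dir.iterdir() if p.is_file())

print(prepare_sample_data())
```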

# Install the dependencies

Run the code below to install the required dependencies for our chatbot.

In [1]:
!pip install llama-index
!pip install langchain
!pip install openai
!pip install pandas
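Because the comparison table at the end of this notebook refers to specific package versions, it can help to check which versions pip actually installed. This sketch uses the standard library's `importlib.metadata`; the helper name `installed_versions` is our own and is not part of any of these packages.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_versions(packages=("llama-index", "langchain", "openai", "pandas")):
    """Return {distribution name: installed version, or None if missing}."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            # Not installed in this environment (or installed under another name)
            result[name] = None
    return result

for pkg, ver in installed_versions().items():
    print(f"{pkg}: {ver or 'not installed'}")
```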



# Define the functions

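The following code defines two functions: `construct_index()`, which builds an index from your documents with LlamaIndex and OpenAI and saves it to the `storage/` folder, and `ask_ai()`, which loads that index and answers your questions in a loop.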

In [2]:
import os

from IPython.display import Markdown, display
from llama_index.core import (
    Settings,
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)
from llama_index.llms.openai import OpenAI

def configure_settings():
    # Global LlamaIndex configuration. Called inside both functions (after the
    # API key is set) so it also applies if you only run ask_ai() after a restart.
    Settings.llm = OpenAI(model="gpt-3.5-turbo", temperature=0.5)
    Settings.chunk_size = 600
    Settings.chunk_overlap = 20
    # No embedding model is set, so LlamaIndex uses its default OpenAI embeddings.

def construct_index(directory_path):
    configure_settings()

    # Load the documents from the directory
    documents = SimpleDirectoryReader(directory_path).load_data()

    # Chunk and embed the documents, then save the index to disk
    index = VectorStoreIndex.from_documents(documents)
    index.storage_context.persist(persist_dir="storage")

    return index

def ask_ai():
    configure_settings()

    # Load the index saved by construct_index()
    storage_context = StorageContext.from_defaults(persist_dir="storage")
    index = load_index_from_storage(storage_context)

    # Create a query engine: it retrieves relevant chunks, then asks the LLM
    query_engine = index.as_query_engine()

    while True:
        query = input("What do you want to ask? (Type 'exit' to quit) ")
        if query.lower() == 'exit':
            break
        response = query_engine.query(query)
        display(Markdown(f"Response: <b>{response}</b>"))
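The function cell sets `Settings.chunk_size = 600` and `Settings.chunk_overlap = 20`, which control how documents are split before embedding. LlamaIndex's default splitter measures chunks in tokens and prefers sentence boundaries; the sketch below is a simplification that counts words instead, only to show how the overlap makes each chunk repeat the end of the previous one. `chunk_words` is a hypothetical helper, not a LlamaIndex function.

```python
def chunk_words(text, chunk_size=600, chunk_overlap=20):
    """Split text into windows of `chunk_size` words, each sharing
    `chunk_overlap` words with the previous window.

    Simplification: LlamaIndex's splitters count tokens, not words.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this window reaches the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks

sample = " ".join(f"w{i}" for i in range(10))
print(chunk_words(sample, chunk_size=4, chunk_overlap=1))
# → ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

With an overlap of 1, the last word of each chunk reappears as the first word of the next, so text near a boundary is never seen only out of context.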

# Set OpenAI API Key

You need an OpenAI API key to run this code. If you don't have one, sign up at [OpenAI](https://platform.openai.com/overview), open the **API keys** page of your account dashboard, and create a new secret key.

Run the code below and paste your API key into the text input.

In [None]:
from getpass import getpass

# getpass hides the key as you type, unlike input()
os.environ["OPENAI_API_KEY"] = getpass("Paste your OpenAI key here and hit enter: ")
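If you rerun the notebook often, you may prefer not to paste the key every time. The sketch below reuses `OPENAI_API_KEY` when it is already set in the environment and only prompts otherwise, using `getpass` so the key is not echoed. The OpenAI Python client and LlamaIndex's OpenAI integrations read this environment variable when no key is passed explicitly. The helper `ensure_api_key` is hypothetical.

```python
import os
from getpass import getpass

def ensure_api_key(env_var="OPENAI_API_KEY", prompt=getpass):
    """Return the API key, prompting only if the environment variable is unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # getpass (the default prompt) hides the typed text
        key = prompt("Paste your OpenAI key here and hit enter: ").strip()
        if not key:
            raise ValueError(f"{env_var} is empty; an API key is required.")
        os.environ[env_var] = key
    return key
```

Call `ensure_api_key()` in place of the key cell above; the `prompt` parameter exists only so the function can be tested without interactive input.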

# Construct an index

Now we are ready to construct the index. This will take every file in your custom data folder, split it into chunks, embed the chunks using OpenAI's embeddings API, and save the result to the `storage/` folder so it can be reloaded later.

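**Notice**: Running this code incurs costs on your OpenAI account. Embeddings are billed per token of your data, and every question you ask later also incurs chat-completion charges for `gpt-3.5-turbo`. Rates change over time, so check OpenAI's pricing page for current prices and ensure you have sufficient credits in your OpenAI account.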
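To get a rough idea of the embedding cost before running the next cell, you can estimate the token count of your data. This sketch assumes plain-text files (`.txt`, `.md`) in the top level of the folder, matching `SimpleDirectoryReader`'s default of not recursing into subfolders, and uses OpenAI's rule of thumb of about 4 characters per token for English text. It ignores PDFs and other formats, and you supply the price yourself. `estimate_tokens` and `estimate_cost` are hypothetical helpers.

```python
from pathlib import Path

def estimate_tokens(directory="custom_data", suffixes=(".txt", ".md"), chars_per_token=4.0):
    """Roughly estimate the tokens in top-level text files of `directory`.

    Assumption: ~4 characters per token (a rule of thumb for English text);
    the real count depends on the tokenizer and on how files are parsed.
    """
    total_chars = 0
    for path in Path(directory).iterdir():
        # Skip subfolders, hidden files, and formats this sketch cannot read as text
        if path.is_file() and not path.name.startswith(".") and path.suffix.lower() in suffixes:
            total_chars += len(path.read_text(encoding="utf-8", errors="ignore"))
    return int(total_chars / chars_per_token)

def estimate_cost(num_tokens, usd_per_million_tokens):
    """Cost in USD at a per-million-token price taken from OpenAI's pricing page."""
    return num_tokens / 1_000_000 * usd_per_million_tokens

# Example usage (fill in the current price yourself):
# tokens = estimate_tokens("custom_data")
# print(tokens, estimate_cost(tokens, usd_per_million_tokens=...))
```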

In [None]:
construct_index("custom_data")  # Replace 'custom_data' with your data directory path
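`ask_ai()` loads the index from the `storage/` folder, so it fails if `construct_index()` has not finished successfully. A quick check like the hypothetical `index_is_persisted` helper below looks for the JSON files that LlamaIndex's default stores write; it only confirms that such files exist, not that they are valid.

```python
from pathlib import Path

def index_is_persisted(persist_dir="storage"):
    """Return True if `persist_dir` exists and contains at least one JSON file."""
    path = Path(persist_dir)
    return path.is_dir() and any(path.glob("*.json"))

if index_is_persisted():
    print("Index files found:", sorted(p.name for p in Path("storage").glob("*.json")))
else:
    print("No persisted index found; run construct_index() first.")
```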

# Ask questions

Run the function below to query your chatbot. Type your question into the input, and type 'exit' to stop.

If your custom data is similar to the original example (about cooking), you can try these questions:
1. Why do people cook at home? Make a classification.
2. What frustrates people about cooking? Make a classification.
3. Brainstorm marketing campaign ideas for an air fryer targeting home cooks.
4. Which kitchen appliances do people use most often?
5. What do people like about cooking at home?

Otherwise, ask questions relevant to your custom knowledge base.
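To compare answers across several questions without typing each one into `ask_ai()`, you can run a list of questions in one go. This sketch takes the querying function as a parameter so it works with any callable; `ask_batch` is a hypothetical helper, and the commented lines show one way to get a LlamaIndex query engine from the saved index.

```python
def ask_batch(query_fn, questions):
    """Return {question: answer_text} by calling query_fn on each question.

    query_fn can be any callable taking a string, for example the .query
    method of a LlamaIndex query engine; str() converts its response to text.
    """
    return {question: str(query_fn(question)) for question in questions}

sample_questions = [
    "Why do people cook at home? Make a classification.",
    "Which kitchen appliances do people use most often?",
]

# Example with the index saved by construct_index():
# from llama_index.core import StorageContext, load_index_from_storage
# index = load_index_from_storage(StorageContext.from_defaults(persist_dir="storage"))
# answers = ask_batch(index.as_query_engine().query, sample_questions)
# for question, answer in answers.items():
#     print(f"Q: {question}\nA: {answer}\n")
```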

In [None]:
ask_ai()

# Comparison of Original and Updated Code

The following table compares the original notebook code with this updated version, highlighting deprecated components and changes made to modernize the chatbot.

In [None]:
import pandas as pd
from IPython.display import display

# Define the data for the comparison chart
data = [
    {
        "Component": "LlamaIndex Version",
        "Original Code": "llama-index==0.5.6",
        "Updated Code": "Latest llama-index (e.g., 0.10.x)",
        "Reason for Change": "Older version deprecated; newer versions have updated APIs and better performance."
    },
    {
        "Component": "LangChain Version",
        "Original Code": "langchain==0.0.148",
        "Updated Code": "Latest langchain (e.g., 0.2.x)",
        "Reason for Change": "Outdated version; newer versions support modern LLM integrations."
    },
    {
        "Component": "OpenAI Library",
        "Original Code": "openai>=0.26.4 (via llama-index dependencies)",
        "Updated Code": "Latest openai (e.g., 1.x.x)",
        "Reason for Change": "Updated for compatibility with current OpenAI APIs."
    },
    {
        "Component": "Index Class",
        "Original Code": "GPTSimpleVectorIndex",
        "Updated Code": "VectorStoreIndex",
        "Reason for Change": "GPTSimpleVectorIndex deprecated; VectorStoreIndex is the modern equivalent."
    },
    {
        "Component": "LLM Configuration",
        "Original Code": "LLMPredictor(llm=OpenAI(model=\"text-davinci-003\"))",
        "Updated Code": "Settings.llm = OpenAI(model=\"gpt-3.5-turbo\")",
        "Reason for Change": "LLMPredictor deprecated; Settings simplifies LLM setup; text-davinci-003 retired."
    },
    {
        "Component": "Prompt Helper",
        "Original Code": "PromptHelper(max_input_size, num_outputs, max_chunk_overlap, chunk_size_limit)",
        "Updated Code": "Settings.chunk_size, Settings.chunk_overlap",
        "Reason for Change": "PromptHelper no longer needs manual setup; chunking is configured via Settings."
    },
    {
        "Component": "Service Context",
        "Original Code": "ServiceContext.from_defaults(llm_predictor, prompt_helper)",
        "Updated Code": "Removed; uses Settings",
        "Reason for Change": "ServiceContext deprecated; Settings is the new configuration mechanism."
    },
    {
        "Component": "Index Persistence",
        "Original Code": "index.save_to_disk('index.json')",
        "Updated Code": "index.storage_context.persist(persist_dir=\"storage\")",
        "Reason for Change": "save_to_disk removed; StorageContext persists the index's stores to a directory."
    },
    {
        "Component": "Index Loading",
        "Original Code": "GPTSimpleVectorIndex.load_from_disk('index.json')",
        "Updated Code": "load_index_from_storage(StorageContext.from_defaults(persist_dir=\"storage\"))",
        "Reason for Change": "Updated API for loading indexes from storage."
    },
    {
        "Component": "Query Method",
        "Original Code": "index.query(query)",
        "Updated Code": "index.as_query_engine().query(query)",
        "Reason for Change": "New query engine API for more flexible and robust querying."
    },
    {
        "Component": "OpenAI Model",
        "Original Code": "text-davinci-003",
        "Updated Code": "gpt-3.5-turbo",
        "Reason for Change": "text-davinci-003 retired by OpenAI in January 2024; gpt-3.5-turbo is a supported, lower-cost chat model."
    },
    {
        "Component": "Data Directory",
        "Original Code": "context_data/data (cloned from GitHub)",
        "Updated Code": "custom_data (user-specified)",
        "Reason for Change": "Allows flexibility for custom user data; original repo may not be relevant."
    },
    {
        "Component": "Exit Condition",
        "Original Code": "None (infinite loop in ask_ai)",
        "Updated Code": "Added 'if query.lower() == 'exit': break'",
        "Reason for Change": "Improves usability by allowing users to exit the query loop."
    }
]

# Create a DataFrame
df = pd.DataFrame(data)

# Style the DataFrame for better display
styled_df = df.style.set_properties(**{
    'text-align': 'left',
    'white-space': 'pre-wrap',
}).set_table_styles([
    {'selector': 'th', 'props': [('text-align', 'center'), ('font-weight', 'bold')]},
    {'selector': 'td', 'props': [('padding', '10px'), ('border', '1px solid #ddd')]}
]).set_caption("Comparison of Original and Updated Chatbot Code")

# Display the styled table
display(styled_df)