# Introduction

This notebook contains all the code you need to create your own chatbot with a custom knowledge base using GPT-3.

Follow the instructions for each step, then run the code sample. To run a code sample, press the "play" button next to it.

# Download the data for your custom knowledge base
For demonstration purposes, we are going to use ----- as our knowledge base. You can download the files to a local folder from the GitHub repository by running the code below.
Alternatively, you can put your own custom data into the local folder.

In [1]:
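import pandas as pd
# Raw string so the backslashes in the Windows path are taken literally
path = r'D:\OneDrive - NITT\Custom_Download\Shegardi_dataset.xlsx'
# If 'dataset' is not found, pd.ExcelFile(path).sheet_names lists the sheets that exist
df = pd.read_excel(path, sheet_name='dataset')
df.head()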

ValueError: Worksheet named 'dataset' not found

# Install the dependencies
Run the code below to install the dependencies we need for our functions.

In [None]:
!pip install llama-index
!pip install langchain

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [None]:
!git clone https://github.com/irina1nik/context_data.git

Cloning into 'context_data'...
remote: Enumerating objects: 30, done.
remote: Total 30 (delta 0), reused 0 (delta 0), pack-reused 30
Unpacking objects: 100% (30/30), 12.56 KiB | 1.40 MiB/s, done.


In [None]:
import pandas as pd

# Read the Excel file
data_path = "/content/Shegardi_dataset.xlsx"
data = pd.read_excel(data_path)

# Process the data (in this example, combining the 'Questions' and 'Answers' columns)
def preprocess_data(data):
    text = ""
    for index, row in data.iterrows():
        text += f"Question: {row['Questions']}\nAnswer: {row['Answers']}\n"
    return text

text_data = preprocess_data(data)

# Save the processed data as a text file
output_file = "output_dataset.txt"
with open(output_file, "w") as f:
    f.write(text_data)

print(f"Data saved to {output_file}")


Data saved to output_dataset.txt
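The preprocessing step above flattens each spreadsheet row into a `Question:`/`Answer:` pair. As a minimal sketch of the same formatting without pandas, assuming rows arrive as plain dictionaries with the `Questions` and `Answers` keys used above (the `rows_to_text` helper and the sample rows are hypothetical, for illustration only):

```python
def rows_to_text(rows):
    """Format question/answer rows as plain text, one pair per row."""
    lines = []
    for row in rows:
        # Same layout as preprocess_data: a Question line followed by an Answer line.
        lines.append(f"Question: {row['Questions']}")
        lines.append(f"Answer: {row['Answers']}")
    # Joining once avoids repeated string concatenation; every line ends with "\n".
    return "".join(line + "\n" for line in lines)


# Hypothetical sample rows, not taken from the real dataset.
sample_rows = [
    {"Questions": "What are the opening hours?", "Answers": "9am to 5pm."},
    {"Questions": "Is there a mobile app?", "Answers": "Yes."},
]
sample_text = rows_to_text(sample_rows)
print(sample_text)
```

Building a list and joining it once tends to scale better than `+=` on a string when the spreadsheet has many rows.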


# Define the functions
The following code defines the functions we need to construct the index and query it.

In [59]:
from llama_index import SimpleDirectoryReader, GPTSimpleVectorIndex, LLMPredictor, PromptHelper
from langchain import OpenAI
import os
from IPython.display import Markdown, display

def construct_index(directory_path):
    # set maximum input size
    max_input_size = 4096
    # set number of output tokens
    num_outputs = 2000
    # set maximum chunk overlap
    max_chunk_overlap = 20
    # set chunk size limit
    chunk_size_limit = 600 

    # define LLM
    llm_predictor = LLMPredictor(llm=OpenAI(temperature=0.5, model_name="text-davinci-003", max_tokens=num_outputs))
    prompt_helper = PromptHelper(max_input_size, num_outputs, max_chunk_overlap, chunk_size_limit=chunk_size_limit)
 
    documents = SimpleDirectoryReader(directory_path).load_data()
    
    index = GPTSimpleVectorIndex(
        documents, llm_predictor=llm_predictor, prompt_helper=prompt_helper
    )

    index.save_to_disk('index.json')

    return index

def ask_ai():
    index = GPTSimpleVectorIndex.load_from_disk('index.json')
    while True: 
        query = input("What do you want to ask? ")
        response = index.query(query, response_mode="compact")
        display(Markdown(f"Response: <b>{response.response}</b>"))
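
The `while True` loop in `ask_ai` has no exit condition, so the only way out is to interrupt the cell. A minimal sketch of a loop that stops on an exit word or an empty line is shown here. The `chat_loop` helper and its parameters are hypothetical; `answer_fn` stands in for a call such as `index.query(...)`, and the demo uses a stand-in function so no API request is made.

```python
def chat_loop(answer_fn, read_fn=input, write_fn=print, exit_words=("quit", "exit")):
    """Ask for questions until the user types an exit word or an empty line."""
    answered = 0
    while True:
        query = read_fn("What do you want to ask? ").strip()
        if not query or query.lower() in exit_words:
            break  # leave the loop cleanly instead of relying on KeyboardInterrupt
        write_fn(f"Response: {answer_fn(query)}")
        answered += 1
    return answered


# Demo with scripted input and a stand-in answer function (no API calls).
scripted = iter(["who are you", "quit"])
outputs = []
count = chat_loop(
    lambda q: q.upper(),
    read_fn=lambda prompt: next(scripted),
    write_fn=outputs.append,
)
print(count, outputs)
```

In the notebook, `answer_fn` could be something like `lambda q: index.query(q, response_mode="compact").response`, assuming `index` has already been loaded.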
  

# Set OpenAI API Key
You need an OpenAI API key to run this code.

If you don't have one yet, get it by [signing up](https://platform.openai.com/overview). Then click your account icon on the top right of the screen and select "View API Keys". Create an API key.

Then run the code below and paste your API key into the text input.

In [60]:
os.environ["OPENAI_API_KEY"] = input("Paste your OpenAI key here and hit enter:")

Paste your OpenAI key here and hit enter:[API key redacted]
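
`input` echoes whatever is typed into the cell output, and that output is saved with the notebook. A minimal sketch of a more careful alternative uses the standard-library `getpass` module, which hides the typed text, and only prompts when the key is not already set. The `ensure_api_key` helper is hypothetical; the demo passes a stand-in prompt and a plain dictionary so nothing real is read or stored.

```python
import getpass
import os


def ensure_api_key(prompt_fn=getpass.getpass, env=os.environ):
    """Return the OpenAI key from the environment, prompting only if it is missing."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        # getpass hides the typed text, so the key is not echoed into the cell output.
        key = prompt_fn("Paste your OpenAI key here and hit enter: ").strip()
    if not key:
        raise ValueError("No API key provided")
    env["OPENAI_API_KEY"] = key
    return key


# In the notebook you would simply call: ensure_api_key()
# Demo with a stand-in prompt and a plain dict instead of the real environment.
demo_env = {}
demo_key = ensure_api_key(prompt_fn=lambda prompt: " sk-example ", env=demo_env)
```

A key that has already appeared in a shared notebook is best treated as compromised and replaced with a new one.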


# Construct an index
Now we are ready to construct the index. This will take every file in the folder 'data', split it into chunks, and embed it with OpenAI's embeddings API.
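
To make "split it into chunks" concrete, here is a simplified sketch of overlapping chunking. It is an illustration only, not the llama-index implementation: it splits on whitespace-separated words rather than model tokens, and the `chunk_words` helper is hypothetical. Its `chunk_size` and `overlap` parameters play the roles of `chunk_size_limit` and `max_chunk_overlap` in `construct_index`.

```python
def chunk_words(text, chunk_size, overlap):
    """Split text into word chunks of at most chunk_size, sharing `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


demo_chunks = chunk_words("a b c d e f g h i j", chunk_size=4, overlap=1)
print(demo_chunks)  # ['a b c d', 'd e f g', 'g h i j']
```

The overlap means the last word of one chunk is repeated at the start of the next, so a sentence that straddles a chunk boundary is less likely to lose its context.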

**Notice:** running this code will cost you credits on your OpenAI account ($0.02 for every 1,000 tokens). If you've just set up your account, the free credits you have should be more than enough for this experiment.
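
As a rough worked example of the arithmetic in the notice, the sketch below multiplies a token count by the quoted rate. The `estimate_cost_usd` helper is hypothetical, and the flat $0.02 rate is taken from the notice as-is; actual billing depends on the model and on current pricing, so treat the result as an upper-level estimate rather than an invoice.

```python
def estimate_cost_usd(tokens, usd_per_1k_tokens=0.02):
    """Estimate the charge for a token count at a flat per-1,000-token rate."""
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    return tokens / 1000 * usd_per_1k_tokens


# 4337 is the embedding token usage reported in the log of the indexing run below.
indexing_cost = estimate_cost_usd(4337)
print(f"${indexing_cost:.4f}")  # $0.0867
```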

In [34]:
!ls

context_data  output_dataset.txt  Shegardi_dataset.xlsx
index.json    sample_data


In [61]:
construct_index("context_data/data")

INFO:root:> [build_index_from_documents] Total LLM token usage: 0 tokens
2023-03-15 19:13:25.344 > [build_index_from_documents] Total LLM token usage: 0 tokens
INFO:root:> [build_index_from_documents] Total embedding token usage: 4337 tokens
2023-03-15 19:13:25.347 > [build_index_from_documents] Total embedding token usage: 4337 tokens


<llama_index.indices.vector_store.vector_indices.GPTSimpleVectorIndex at 0x7ff06bd58df0>

# Ask questions
It's time to have fun and test our AI. Run the function that queries GPT and type your question into the input. 
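
Behind a query, a vector index compares the embedding of your question with the stored chunk embeddings and passes the closest chunks to the model as context. The sketch below illustrates only that ranking step with cosine similarity. It is a simplified illustration, not llama-index's actual code; the helper names and the three-dimensional toy vectors are hypothetical stand-ins for real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, chunk_vecs, k=1):
    """Return the names of the k chunks most similar to the query vector."""
    ranked = sorted(
        chunk_vecs,
        key=lambda name: cosine_similarity(query_vec, chunk_vecs[name]),
        reverse=True,
    )
    return ranked[:k]


# Toy vectors standing in for real embeddings (hypothetical values).
toy_chunks = {
    "credit cards": [0.9, 0.1, 0.0],
    "branch hours": [0.0, 0.8, 0.6],
    "management": [0.1, 0.0, 0.9],
}
best = top_k_chunks([1.0, 0.2, 0.0], toy_chunks, k=1)
print(best)  # ['credit cards']
```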

If you've used the provided example data for your custom knowledge base, here are a few questions that you can ask:
1. Why do people cook at home? Classify the reasons.
2. Classify what frustrates people about cooking.
3. Brainstorm marketing campaign ideas for an air fryer that would appeal to people who cook at home.
4. Which kitchen appliances do people use most often?
5. What do people like about cooking at home?

In [49]:
ask_ai()

What do you want to ask? who are you


INFO:root:> [query] Total LLM token usage: 591 tokens
2023-03-15 19:06:53.420 > [query] Total LLM token usage: 591 tokens
INFO:root:> [query] Total embedding token usage: 3 tokens
2023-03-15 19:06:53.427 > [query] Total embedding token usage: 3 tokens


Response: <b>
I am Shegardi, and I am an employee of Warba Bank, located in Kuwait.</b>

What do you want to ask? who is the ceo


INFO:root:> [query] Total LLM token usage: 585 tokens
2023-03-15 19:07:13.796 > [query] Total LLM token usage: 585 tokens
INFO:root:> [query] Total embedding token usage: 5 tokens
2023-03-15 19:07:13.803 > [query] Total embedding token usage: 5 tokens


Response: <b>
Answer: Mr. Shaheen H. Al-Ghanem</b>

What do you want to ask? mention credit cards typs


INFO:root:> [query] Total LLM token usage: 619 tokens
2023-03-15 19:07:58.646 > [query] Total LLM token usage: 619 tokens
INFO:root:> [query] Total embedding token usage: 6 tokens
2023-03-15 19:07:58.649 > [query] Total embedding token usage: 6 tokens


Response: <b>Answer: Elite Dual Chip Mastercard, World Elite Mastercard, World Mastercard, Platinum Mastercard, VISA Signature, VISA Platinum, VISA Prepaid.</b>

KeyboardInterrupt: ignored

In [None]:
!pip install streamlit

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting streamlit
  Downloading streamlit-1.20.0-py2.py3-none-any.whl (9.6 MB)
Collecting rich>=10.11.0
  Downloading rich-13.3.2-py3-none-any.whl (238 kB)
Collecting gitpython!=3.1.19
  Downloading GitPython-3.1.31-py3-none-any.whl (184 kB)
Collecting validators>=0.2
  Downloading validators-0.20.0.tar.gz (30 kB)
  Preparing metadata (setup.py) ... done
Collecting pydeck>=0.1.dev5
  Downloading pydeck-0.8.0-py2.py3-none-any.whl (4.7 MB)

In [None]:
import streamlit as st
import pandas as pd
from llama_index import SimpleDirectoryReader, GPTListIndex, readers, GPTSimpleVectorIndex, LLMPredictor, PromptHelper
from langchain import OpenAI
import sys
import os
from IPython.display import Markdown, display

def construct_index(directory_path):
    # set maximum input size
    max_input_size = 4096
    # set number of output tokens
    num_outputs = 2000
    # set maximum chunk overlap
    max_chunk_overlap = 20
    # set chunk size limit
    chunk_size_limit = 600 

    # define LLM
    llm_predictor = LLMPredictor(llm=OpenAI(temperature=0.5, model_name="text-davinci-003", max_tokens=num_outputs))
    prompt_helper = PromptHelper(max_input_size, num_outputs, max_chunk_overlap, chunk_size_limit=chunk_size_limit)
 
    documents = SimpleDirectoryReader(directory_path).load_data()
    
    index = GPTSimpleVectorIndex(
        documents, llm_predictor=llm_predictor, prompt_helper=prompt_helper
    )

    index.save_to_disk('index.json')

    return index

def ask_ai(query):
    # Load the saved index and return the answer text for a single question
    index = GPTSimpleVectorIndex.load_from_disk('index.json')
    response = index.query(query, response_mode="compact")
    return response.response

os.environ["OPENAI_API_KEY"] = input("Paste your OpenAI key here and hit enter:")
st.title("AI Question-Answering App")

query = st.text_input("What do you want to ask?")
if query:
    # ask_ai already returns the answer text, so it can be written out directly
    st.write(f"Response: {ask_ai(query)}")

    


Paste your OpenAI key here and hit enter:[API key redacted]
Usage: streamlit run [OPTIONS] TARGET [ARGS]...

Error: Invalid value: File does not exist: streamlit_app.py
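
The error above indicates that `streamlit run` was pointed at `streamlit_app.py`, a file that had not been written to disk: code typed into a notebook cell is not saved as a script automatically. A minimal sketch that writes the app source to that file first is shown below. The app body mirrors the notebook's single-question `ask_ai` and assumes that `index.json` already exists and that `OPENAI_API_KEY` is set in the environment before launch; neither assumption is checked here.

```python
from pathlib import Path

# App source as a string. It is only written to disk here, not executed.
APP_SOURCE = '''\
import streamlit as st
from llama_index import GPTSimpleVectorIndex


def ask_ai(query):
    index = GPTSimpleVectorIndex.load_from_disk("index.json")
    return index.query(query, response_mode="compact").response


st.title("AI Question-Answering App")
query = st.text_input("What do you want to ask?")
if query:
    st.write(f"Response: {ask_ai(query)}")
'''

app_path = Path("streamlit_app.py")
app_path.write_text(APP_SOURCE, encoding="utf-8")
print(f"Wrote {app_path} ({len(APP_SOURCE)} characters)")
```

Once the file exists, `!streamlit run streamlit_app.py` has a target to load.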


In [39]:
!pip install gradio

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting gradio
  Downloading gradio-3.21.0-py3-none-any.whl (15.8 MB)
Collecting websockets>=10.0
  Downloading websockets-10.4-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (106 kB)
Collecting pydub
  Downloading pydub-0.25.1-py2.py3-none-any.whl (32 kB)
Collecting uvicorn
  Downloading uvicorn-0.21.0-py3-none-any.whl (57 kB)
Collecting python-multipart
  Downloading python_multipart-0.0.6-py3-none-any.whl (45 kB)

In [42]:
import pandas as pd
from llama_index import SimpleDirectoryReader, GPTListIndex, readers, GPTSimpleVectorIndex, LLMPredictor, PromptHelper
from langchain import OpenAI
from IPython.display import Markdown, display
import gradio as gr


In [70]:
def ask_ai(query):
    index = GPTSimpleVectorIndex.load_from_disk('index.json')
    response = index.query(query, response_mode="compact")
    return response.response


In [73]:
iface = gr.Interface(fn=ask_ai, inputs="text", outputs="text", title="The following is a conversation with a human called Shegardi. Shegardi is helpful, precise, truthful, and very friendly.  Also, Shegardi is an employee of Warba Bank, located in Kuwait. Shegardi will only use the information provided to him. ", 
                     description="Enter a question and get an answer from Shegardi.")


In [74]:
iface.launch(share=True)


Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://50abe92bf005485500.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades (NEW!), check out Spaces: https://huggingface.co/spaces
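
In the interface above, the persona text is passed as `title`, which only controls what is displayed on the page; it is not sent to the model. If the intent is to steer the answers, one option is to prepend the persona to each question before querying. This is an assumption about intent, not something the notebook does, and the `build_prompt` helper is hypothetical. Note that prepending text also changes what gets embedded for retrieval, so results may differ.

```python
PERSONA = (
    "The following is a conversation with a human called Shegardi. "
    "Shegardi is helpful, precise, truthful, and very friendly. "
    "Also, Shegardi is an employee of Warba Bank, located in Kuwait. "
    "Shegardi will only use the information provided to him."
)


def build_prompt(question, persona=PERSONA):
    """Combine the persona instructions and the user's question into one query string."""
    question = question.strip()
    if not question:
        raise ValueError("question must not be empty")
    return f"{persona}\n\nQuestion: {question}"


demo_prompt = build_prompt("  who is the ceo ")
print(demo_prompt)
```

With this in place, the Gradio callback could call `ask_ai(build_prompt(query))` and keep a short, human-readable `title`.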




In [62]:
def ask_ai(query):
    index = GPTSimpleVectorIndex.load_from_disk('index.json')
    response = index.query(query, response_mode="compact")
    return response.response


In [63]:
def app():
    st.title("AI Question Answering")
    query = st.text_input("Enter a question:")
    if query:
        response = ask_ai(query)
        st.markdown(f"**Response:** {response}")


In [65]:
if __name__ == "__main__":
    app()
