# Building a Chatbot that Doesn't Suck

In this notebook we'll build a RAG-based chatbot for a small furniture manufacturer in Oahu, Hawaii.

## Set auth tokens

In this notebook we'll use:

- [Jina Embeddings v2]()
- [Hugging Face Inference API]()
- ngrok (only needed for the external API section at the end)

You'll need to get tokens for each of the above and enter them below.

In [1]:
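from getpass import getpass

jinaai_api_key = getpass(prompt="Your Jina Embeddings API key: ")
hf_inference_api_key = getpass(prompt="Your Hugging Face Inference API key: ")
# used later when opening the ngrok tunnel for external access
ngrok_token = getpass(prompt="Ngrok auth token: ")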

Your Jina Embeddings API key:  ········
Your Hugging Face Inference API key:  ········
Ngrok auth token:  ········


In [2]:
# RAG dependencies
!pip install -q llama-index llama-index-llms-openai llama-index-embeddings-jinaai llama-index-llms-huggingface "huggingface_hub[inference]"


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m24.1.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


## Process data

We used GPT to generate some sample data for a fictitious small furniture maker in Oahu, Hawaii. This consists of four simple HTML pages:

- FAQ page
- Front page
- Contact page
- Product listings page

### Download data

In [3]:
from glob import glob
import os
import subprocess

In [4]:
# cleanup from last run
!rm -rf data
!mkdir data

In [5]:
# download html files
!wget -q https://github.com/alexcg1/rag-chatbot/raw/main/notebook/data/faq.html --directory-prefix data/
!wget -q https://github.com/alexcg1/rag-chatbot/raw/main/notebook/data/front.html --directory-prefix data/
!wget -q https://github.com/alexcg1/rag-chatbot/raw/main/notebook/data/contact.html --directory-prefix data/
!wget -q https://github.com/alexcg1/rag-chatbot/raw/main/notebook/data/products.html --directory-prefix data/

In [6]:
# store html files in list
data_dir = "./data"
html_files = glob(f'{data_dir}/*.html')

### Convert to Markdown

HTML is a pain to break into chunks and unreliable for LLMs to parse. We'll convert it to [markdown]() to make things easier:

In [44]:
# convert html files to markdown for easier chunking
for filename in html_files:
    base_name = os.path.splitext(filename)[0]
    md_file = base_name + ".md"

    # Colab uses ancient pandoc, with a different argument for markdown header style
    try:
        # colab pandoc
        subprocess.run(["pandoc", "--atx-headers", filename, "-o", md_file], check=True)
    except subprocess.CalledProcessError:
        # newer pandoc
        subprocess.run(["pandoc", "--markdown-headings=atx", filename, "-o", md_file], check=True)

md_files = glob(f'{data_dir}/*.md')


### Break Pages into Chunks

We'll make the data more digestible to our chatbot by breaking it into chunks:

In [8]:
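# break markdown files into chunks
docs = []

for md_file in md_files:
    with open(md_file, 'r') as f:
        content = f.read()

    # append (not extend) so the page is added as one string, not one entry per character
    docs.append(content)  # add full page

    # split on headings; note that this drops the leading "#" of each later section
    content_chunks = content.split("\n#")
    docs.extend(content_chunks)  # add individual sections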
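
If you would rather keep the heading markers in each chunk, one option is to split with a regular expression instead. The sketch below is only an illustration: `split_markdown_sections` and the sample text are hypothetical names and data, not part of the original notebook, and it assumes ATX-style headings (`#`, `##`, ...) as produced by the pandoc step above. It would also split on `#` lines inside code fences, which is fine for these simple pages.

```python
import re

def split_markdown_sections(markdown_text):
    """Split Markdown into one chunk per ATX heading, keeping the heading line."""
    # Zero-width split just before any line that starts with 1-6 '#' and whitespace
    parts = re.split(r"(?m)^(?=#{1,6}\s)", markdown_text)
    # Drop empty pieces (e.g. the one before the first heading) and trim whitespace
    return [part.strip() for part in parts if part.strip()]

# Made-up sample page for illustration
sample = "# FAQ\n\nIntro text.\n\n## Shipping\n\nWe ship.\n\n## Returns\n\n30 days.\n"
sections = split_markdown_sections(sample)

for section in sections:
    print(repr(section))
```

Each resulting chunk starts with its own heading, which gives the embedding model a little more context about what the section covers.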

## Build RAG system

### Access Jina Embeddings v2 via the LlamaIndex interface

This code creates the LlamaIndex object that manages your connection to the Jina Embeddings v2 API.

The resulting object is held in the variable `jina_embedding_model`.


In [9]:
from llama_index.embeddings.jinaai import JinaEmbedding

jina_embedding_model = JinaEmbedding(
    api_key=jinaai_api_key,
    model="jina-embeddings-v2-base-en",
)

### Access the Mixtral Model via the HuggingFace Inference API

This code creates a holder for accessing the `mistralai/Mixtral-8x7B-Instruct-v0.1` model via the Hugging Face Inference API. The resulting object is held in the variable `mixtral_llm`.

In [10]:
from llama_index.llms.huggingface import HuggingFaceInferenceAPI

mixtral_llm = HuggingFaceInferenceAPI(
    model_name="mistralai/Mixtral-8x7B-Instruct-v0.1", token=hf_inference_api_key
)



### Convert chunks to be suitable for LlamaIndex

In [11]:
from llama_index.core.readers import StringIterableReader
from llama_index.core.schema import Document

chunks = StringIterableReader().load_data(docs)

### Create a Service

The code creates a RAG service that has access to Jina Embeddings and Mixtral Instruct and stores it in the variable `service_context`.

In [12]:
from llama_index.core import ServiceContext

service_context = ServiceContext.from_defaults(
    llm=mixtral_llm, embed_model=jina_embedding_model
)



### Build the document index

Next, we store the documents in LlamaIndex's `VectorStoreIndex`, generating embeddings with the Jina Embeddings v2 model and using them as keys for retrieval.

In [13]:
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(
    documents=chunks, service_context=service_context
)

### Prepare a Prompt Template

This is the prompt template that will be presented to Mixtral Instruct, with `{context_str}` and `{query_str}` replaced with the retrieved documents and your query respectively.

In [14]:
from llama_index.core import PromptTemplate

qa_prompt_tmpl = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the query. Please be brief, concise, and complete.\n"
    "If the context information does not contain an answer to the query, "
    "respond with \"I'm sorry, but we don't have any information about that. Please contact us on info@oahufurniture.com for more information.\".\n"
    "Query: {query_str}\n"
    "Answer: "
)
qa_prompt = PromptTemplate(qa_prompt_tmpl)
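
To see what the placeholder substitution amounts to, here is a minimal sketch using plain `str.format` on a shortened copy of the template. LlamaIndex performs the substitution itself at query time; the `retrieved_chunks` strings below are made-up sample data, and `template` is a hypothetical abbreviated version of the prompt above.

```python
# Shortened, hypothetical copy of the prompt template, for illustration only
template = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Query: {query_str}\n"
    "Answer: "
)

# Made-up stand-ins for the chunks a retriever might return
retrieved_chunks = [
    "We accept Visa and PayPal.",
    "Bank transfers are also accepted.",
]

# Join the chunks into one context string and fill both placeholders
prompt = template.format(
    context_str="\n\n".join(retrieved_chunks),
    query_str="What payment methods do you accept?",
)
print(prompt)
```

The LLM only ever sees the filled-in string, so the wording of the template and the quality of the retrieved chunks together determine what it can answer.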

### Assemble the Full Query Engine

The query engine has three parts:

* `retriever` is the search engine that takes user requests and retrieves relevant documents from the vector store.
* `response_synthesizer` uses the prompt created above to join the retrieved documents and user request and passes them to the LLM, getting back its response.
* `query_engine` is a container object that holds the two together.

In [15]:
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core import get_response_synthesizer

# configure retriever
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=2,
)

# configure response synthesizer
response_synthesizer = get_response_synthesizer(
    service_context=service_context,
    text_qa_template=qa_prompt,
    response_mode="compact",
)

# assemble query engine
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
)
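
To make the retriever's role concrete, the sketch below ranks a few toy vectors against a query vector and keeps the top two, which is roughly what `similarity_top_k=2` asks for. It assumes cosine similarity as the scoring function; the helper names and the three-dimensional vectors are hypothetical, and real embeddings have many more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return the indices of the k document vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(reverse=True)  # highest similarity first
    return [idx for _, idx in scored[:k]]

# Toy "embeddings" for three chunks and one query
toy_docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
toy_query = [0.2, 1.0, 0.0]

best = top_k(toy_query, toy_docs, k=2)
print(best)
```

Only the chunks at the returned indices would be passed on to the response synthesizer as context.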

## Run some queries

Let's run some queries to see our chatbot in action:

In [16]:
def get_answer(question):
    result = query_engine.query(question)
    return result.response.strip()

In [17]:
get_answer("How is a computer useful on a farm?")

'A computer can be useful on a farm for various tasks such as managing financial records, tracking crop yields, monitoring weather patterns, and accessing online resources for farming tips and techniques. It can also be used for communication and collaboration purposes, such as coordinating with other farmers or suppliers. Additionally, computers can be used to operate and monitor automated farming equipment, making farm operations more efficient and precise.'

In [18]:
get_answer("What kind of furniture do you make?")


We make sustainable furniture using locally sourced native timbers that are harvested responsibly. We can tailor designs to match specific themes or decor styles for residential and commercial spaces.


In [19]:
get_answer("How much does your furniture cost?")


I'm sorry, but we don't have any information about that. Please contact us on info@oahufurniture.com for more information.


In [20]:
get_answer("Can I see your furniture in person?")


We do not have a public showroom, but we can arrange viewings of specific furniture pieces by appointment at our workshop in Honolulu.


In [21]:
get_answer("What payment methods do you accept?")


We accept major credit cards (Visa, MasterCard, American Express), PayPal, and bank transfers.


In [23]:
get_answer("What is your furniture made from?")


Our furniture is made from locally sourced native timbers such as Koa, Milo, and Kamani.


### Testing in different languages

In [24]:
get_answer("你的家具是用什么材料制成的？")

我们的家具是用高质量的木材制成的。


In [25]:
get_answer("Welche Zahlungsmethoden werden akzeptiert?")


Die Zahlungsmethoden, die von Oahu Furniture akzeptiert werden, sind Kreditkarten (Visa, Mastercard, American Express), PayPal, Apple Pay, Google Pay und Banküberweisung.


### Run your own queries

In [None]:
while True:
    question = input("Please enter your question (leave blank to stop): ")
    if not question.strip():
        break  # exit the loop on empty input
    answer = get_answer(question)
    print(answer)

## Set up API for external access

If you're running this locally in a Jupyter notebook (i.e. *not* Google Colab), you can test the chatbot via a RESTful API and a simple web interface:

In [27]:
port = 5000

In [28]:
!pip install -q fastapi uvicorn requests pyngrok


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m24.1.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m24.1.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [31]:
import threading
import uvicorn
from fastapi import FastAPI, Request
from fastapi.middleware.cors import CORSMiddleware
from pyngrok import ngrok, conf

app = FastAPI()

# Authenticate with ngrok using the token entered at the top of the notebook
conf.get_default().auth_token = ngrok_token

# Enable CORS for all origins, methods, and headers
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Open an ngrok tunnel to the HTTP server
public_url = ngrok.connect(port).public_url
print(f' * ngrok tunnel "{public_url}" -> "http://127.0.0.1:{port}"')

# Update any base URLs to use the public ngrok URL
app.state.BASE_URL = public_url

# Define FastAPI routes

@app.post('/')
async def chat_endpoint(request: Request):
    data = await request.json()  # Get JSON data from the request
    return {
        "question": data["question"],
        "answer": get_answer(data["question"]),
    }

@app.post('/shutdown')
async def shutdown():
    # Ask uvicorn to exit; it stops after this response has been sent
    uvicorn_server.should_exit = True
    return {"message": "Server shutting down..."}

# Keep a handle on the server object so the /shutdown route can stop it
uvicorn_server = uvicorn.Server(uvicorn.Config(app, host="0.0.0.0", port=port))

# Start the FastAPI server in a new thread
threading.Thread(target=uvicorn_server.run).start()

 * ngrok tunnel "https://0c6b-212-20-115-56.ngrok-free.app" -> "http://127.0.0.1:5000"


INFO:     Started server process [1237768]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:5000 (Press CTRL+C to quit)
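
The root endpoint defined above expects a JSON body with a `question` field and returns the question together with the answer. A minimal client sketch using only the standard library might look like the following; `build_chat_request` and `ask` are hypothetical helper names, and the default base URL assumes the server is listening on port 5000 as configured earlier.

```python
import json
import urllib.request

def build_chat_request(question, base_url="http://localhost:5000"):
    """Build (but do not send) a POST request for the chatbot's root endpoint."""
    payload = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask(question, base_url="http://localhost:5000"):
    """Send the question to a running server and return the 'answer' field."""
    with urllib.request.urlopen(build_chat_request(question, base_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))["answer"]

# Building the request does not need a running server
req = build_chat_request("What payment methods do you accept?")
print(req.get_method(), req.full_url)
```

With the server running, `ask("What payment methods do you accept?")` should return the same text that `get_answer` produces in the notebook. To go through the tunnel instead, pass the ngrok `public_url` as `base_url`.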




### Test in browser

The repository includes a simple HTML chat page that you can use to test the chatbot:

In [32]:
!git clone https://github.com/alexcg1/rag-chatbot

Cloning into 'rag-chatbot'...
remote: Enumerating objects: 18, done.
remote: Counting objects: 100% (18/18), done.
remote: Compressing objects: 100% (13/13), done.
remote: Total 18 (delta 1), reused 18 (delta 1), pack-reused 0
Receiving objects: 100% (18/18), 9.08 KiB | 9.08 MiB/s, done.
Resolving deltas: 100% (1/1), done.


In [None]:
os.chdir("./rag-chatbot/web")

In [36]:
import http.server
import socketserver
import os

web_port = 8000

handler = http.server.SimpleHTTPRequestHandler

with socketserver.TCPServer(("", web_port), handler) as httpd:
    print("Serving at port", web_port)
    httpd.serve_forever()

Serving at port 8000


127.0.0.1 - - [09/Jul/2024 15:13:52] "GET / HTTP/1.1" 304 -
127.0.0.1 - - [09/Jul/2024 15:13:52] "GET /styles.css HTTP/1.1" 304 -
127.0.0.1 - - [09/Jul/2024 15:13:52] "GET /script.js HTTP/1.1" 304 -
127.0.0.1 - - [09/Jul/2024 15:14:31] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [09/Jul/2024 15:14:31] "GET /styles.css HTTP/1.1" 200 -
127.0.0.1 - - [09/Jul/2024 15:14:31] "GET /script.js HTTP/1.1" 200 -
127.0.0.1 - - [09/Jul/2024 15:14:31] code 404, message File not found
127.0.0.1 - - [09/Jul/2024 15:14:31] "GET /favicon.ico HTTP/1.1" 404 -


KeyboardInterrupt: 

While the cell above is running, open [http://localhost:8000](http://localhost:8000) in your web browser to try out the chatbot. Interrupt the cell when you're finished.

### Stop server when done

In [42]:
import requests

def stop_server():
    response = requests.post(f"http://localhost:{port}/shutdown")
    print(response.content)
    ngrok.disconnect(public_url)

In [43]:
stop_server()

b'{"message":"Server shutting down..."}'
