# Documentation

## Dataset

The dataset was generated using ChatGPT, following the format provided in the email. I have decided to include an additional column, which is the menu, for more context.

---

## General Message

For this project, I decided to go with local models. Cloud models could be swapped in easily, since the logic would remain the same, and they would likely perform better than what my hardware can run.

I built this project on my laptop, which has the following specs.

Processor: i5 12500H

RAM: 16GB

GPU: RTX 3060 Laptop GPU

VRAM: 6GB

---

## External things needed to be installed

This project requires Ollama, which I used to abstract the interaction with the models.

[Ollama Download Page](https://ollama.com/download)

For the LLM, I used `gemma3:1b-it-qat` as it is the most lightweight model from the Gemma 3 family.

[Gemma 3 Ollama Page](https://ollama.com/library/gemma3)

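To install, run `ollama run gemma3:1b-it-qat` in CMD. This downloads the model on first use and then opens an interactive chat session, which can be closed with `/bye`.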

For the embedder, I used `nomic-embed-text:v1.5`. It is the top embedder by download numbers in Ollama.

[Nomic Embed Ollama Page](https://ollama.com/library/nomic-embed-text:v1.5)

To install, run `ollama pull nomic-embed-text:v1.5` in CMD.

Larger models can be used if your hardware can support them.

## Python related things needed to be installed

I have provided a `requirements.txt` file. To use it, first create a virtual environment for the project and activate it. Once it is activated, run `pip install -r requirements.txt`.

---

## Setting up the server

The main functionality is in the `fast_api_side.py` file. However, it is not meant to be run directly like a normal script. Instead, start the server by running the following command in CMD with the environment activated.

`uvicorn fast_api_side:app --reload`

Once the server is running, the API is ready to be interacted with.

---

## Interacting with the API

One way to interact with the API is through the interactive docs page that FastAPI generates. This can be accessed through

`http://127.0.0.1:8000/docs`

It should show the two POST endpoints. I have included a screenshot in the screenshot folder.

Another way is through the `user_side.py` file I have included. It should only be run while the Uvicorn server is up, as it works by making requests to the server.

---

## user_side

Below is the documentation for the `user_side.py` file.

---

This part of the file declares the base URL and the two endpoints. The first request sends the user query to the submit endpoint.

In this endpoint, the user query is used to find the best match in the database, and the LLM crafts a response.

```
import requests

# API endpoint
base_url = "http://127.0.0.1:8000"

submit_query_url = f"{base_url}/submit/"
booking_url = f"{base_url}/book/"

user_input = input("Input: ")

# Data to send
payload = {"query": user_input}

# Make the POST request
response = requests.post(submit_query_url, json=payload)
```

---

This part of the file handles the next step. The second request sends the reply of the user to the book endpoint.

In this endpoint, the LLM decides whether the user wants to book, and if so, the booking JSON file is updated accordingly.

```
if response.status_code == 200:
    submit_data = response.json()
    print("Response:", submit_data["response"])
    print("Found:", submit_data["found"])
    print("Shop:", submit_data["shop_tuple"])

    if submit_data["found"]:
        print("Do you want to book a table?")

        user_input = input("Input: ")

        # Data to send
        payload = {
            "user_input": user_input,
            "shop_tuple": submit_data["shop_tuple"],
        }

        # Make the POST request
        response = requests.post(booking_url, json=payload)

        if response.status_code == 200:

            booking_data = response.json()

            print("Here is what we got:")
            print("Booking Status:", booking_data["booked"])
            print("Details:", booking_data["entry"])

        else:
            print("Error:", response.status_code, response.text)

else:
    print("Error:", response.status_code, response.text)
```

---
<br><br><br><br>

## embed_data

Below is the documentation for the `embed_data.py` file.

---

This function splits the data into one chunk per row, so that each piece of context is limited to a single entry of the dataset.

```
def line_chunk(data_df):
    line_chunk_list = []
    for index, row in data_df.iterrows():
        name = row["name"]
        category = row["category"]
        location = row["location"]
        description = row["description"]
        menu = row["menu"]

        chunk = f"Name: {name}\nCategory: {category}\nLocation: {location}\nDescription: {description}\nMenu: {menu}"
        print(chunk)

        line_chunk_list.append(chunk)

    return line_chunk_list
```

It returns a list of chunks.

---

This function gets the embedding for each chunk using the embedding model chosen for this project.

```
def get_embed(line_chunk_list, embed_model):
    embedding_list = []
    for chunk in line_chunk_list:
        embed_response = ollama.embeddings(model=embed_model, prompt=chunk)
        embedding_values = embed_response["embedding"]
        embedding_list.append(embedding_values)

    return embedding_list
```

The function returns a list which contains the embedding values of each chunk.

---

This function stores the values from the embedding process so that embedding does not have to be performed every time the API is called.

```
def store_embed_json(embedding_list, json_embed_path):
    if json_embed_path.exists():
        os.remove(json_embed_path)

    with open(json_embed_path, "w") as embed_json:
        json.dump(embedding_list, embed_json)
```

The values are stored in a JSON file.
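
As a small sketch of the storage step, the snippet below writes a list of embedding vectors to JSON and reads it back. The `save_embeddings` and `read_embeddings` names and the sample vectors are made up for illustration and are not part of the project. It also shows that opening a file in `"w"` mode already replaces the old contents, so deleting the file first is optional.

```python
import json
import tempfile
from pathlib import Path


def save_embeddings(embedding_list, json_path):
    # "w" mode truncates an existing file, so no explicit delete is needed
    with open(json_path, "w") as embed_json:
        json.dump(embedding_list, embed_json)


def read_embeddings(json_path):
    # Return an empty list when the file has not been created yet
    if not json_path.exists():
        return []
    with open(json_path, "r") as embed_json:
        return json.load(embed_json)


# Hypothetical 3-dimensional vectors; real embeddings are much longer
sample_embeddings = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "embedding.json"
    save_embeddings([[9.9]], path)  # first write
    save_embeddings(sample_embeddings, path)  # second write replaces the first
    restored = read_embeddings(path)
    missing = read_embeddings(Path(tmp_dir) / "missing.json")

print(restored == sample_embeddings)
```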


---
<br><br><br><br>

## embed_retrieve

Below is the documentation for the `embed_retrieve.py` file.

---

This code block loads the embedding list from the JSON file.

```
def load_embed_json(json_embed_path):
    embedding_list = []

    if json_embed_path.exists():
        with open(json_embed_path, "r") as embed_json:
            embedding_list = json.load(embed_json)

    else:
        print("Not Found")

    return embedding_list
```

---

This code block computes cosine similarity. It takes in the embedded form of the query and compares it with every entry in the embedding list.

The list is then sorted so that the best match will be the first element in the list being returned by the function.

```
def cosine_similarity_sort(query_vector, embedding_list):

    embed_cos_list = []

    count = 0

    for embedding_vector in embedding_list:

        a = np.array(query_vector)
        b = np.array(embedding_vector)

        dot_product = np.dot(a, b)
        norm_a = np.linalg.norm(a)
        norm_b = np.linalg.norm(b)

        cos_similar_value = dot_product / (norm_a * norm_b)

        embed_cos_tuple = (cos_similar_value, count)

        embed_cos_list.append(embed_cos_tuple)

        count += 1

    # Sort to get the best result at the top
    sorted_embed_cos_list = sorted(embed_cos_list, key=lambda x: x[0], reverse=True)

    return sorted_embed_cos_list
```

I opted to do it this way since the dataset is small, so comparing the query against every entry does not take long.

Cosine similarity compares the directions of two vectors. It is the dot product of the vectors divided by the product of their lengths, so it measures how closely they point in the same direction regardless of their magnitude. Perfectly aligned vectors return a value of 1, perpendicular vectors return 0, and vectors pointing in opposite directions return -1. Direction matters because it carries the "context" of the chunk: chunks pointing in a similar direction are likely to have similar meanings.
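
To make the description concrete, here is a minimal standard-library sketch of the same calculation on tiny 2-D vectors. The function name `cosine_similarity` and the sample vectors are illustrative only; the project code above uses NumPy on the real embedding vectors.

```python
import math


def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths
    dot_product = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot_product / (norm_a * norm_b)


# Same direction, different lengths
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
# At a right angle to each other
perpendicular = cosine_similarity([1.0, 0.0], [0.0, 1.0])
# Pointing in opposite directions
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])

print(same_direction, perpendicular, opposite)
```

Note that a zero vector would cause a division by zero here, just as in the NumPy version, so this sketch assumes non-zero inputs.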

---

This code block returns up to 3 entries based on the result of cosine similarity.

```
def get_top_list(dataset_df, user_input, embedding_list):
    # Embed user input
    user_input_embedding_values = ollama.embeddings(
        model="nomic-embed-text:v1.5", prompt=user_input
    )["embedding"]

    embed_cos_list = cosine_similarity_sort(user_input_embedding_values, embedding_list)

    count = 0
    max_count = min(3, len(embed_cos_list))  # avoid an IndexError on datasets with fewer than 3 rows

    print(f"Top {max_count} results based on similarity to query")

    top_list = []

    while count < max_count:

        current_row = dataset_df.iloc[embed_cos_list[count][1]]

        cos_score = embed_cos_list[count][0]

        name = current_row["name"]
        category = current_row["category"]
        location = current_row["location"]
        description = current_row["description"]
        menu = current_row["menu"]

        print(
            f"Name: {name}\nCategory: {category}\nLocation: {location}\nDescription: {description}\nMenu: {menu}\nScore: {cos_score}\n"
        )

        count += 1

        if cos_score > 0.5:
            top_list.append((name, category, location, description, menu))

    return top_list
```

It looks at the 3 highest-scoring entries and only keeps those whose cosine score is greater than 0.5, so the returned list may contain fewer than 3 entries or be empty. The returned data is then provided to the LLM as context for answering the query of the user.
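
The selection step can be sketched on its own with plain Python. The `select_top_matches` helper, its parameters, and the scores below are hypothetical; the 0.5 threshold and the limit of 3 mirror the values used above.

```python
def select_top_matches(scores, max_count=3, threshold=0.5):
    # Pair each score with its row index, best score first
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)

    # Slicing is safe even when there are fewer than max_count rows
    candidates = ranked[:max_count]

    # Keep only the row indices whose score clears the threshold
    return [index for index, score in candidates if score > threshold]


# Made-up similarity scores for five dataset rows
example_scores = [0.42, 0.81, 0.55, 0.12, 0.49]
print(select_top_matches(example_scores))
```

In this sketch the third-best row is dropped because its score does not exceed the threshold, which is the same reason the project function can return fewer than 3 entries.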

---
<br><br><br><br>

## augment_generate

Below is the documentation for the `augment_generate.py` file.

---

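This code block creates the ID used when logging a booking. It combines the current timestamp with a random 8-digit number, which makes collisions very unlikely.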

```
def get_formatted_time():
    now = datetime.now()
    timestamp = now.strftime("%Y%m%d_%H%M%S")

    random_id = f"{random.randint(0, 99999999):08d}"
    return f"{timestamp}_{random_id}"
```
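
As a quick check of the ID format (8 date digits, 6 time digits, and 8 random digits), the sketch below regenerates an ID and validates it with a regular expression. It also shows `uuid.uuid4()` from the standard library as one possible alternative when a stronger uniqueness guarantee is wanted; the `make_uuid_id` helper is hypothetical and not part of the project.

```python
import random
import re
import uuid
from datetime import datetime


def get_formatted_time():
    # Same logic as above: timestamp plus a zero-padded random number
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    random_id = f"{random.randint(0, 99999999):08d}"
    return f"{timestamp}_{random_id}"


def make_uuid_id():
    # Alternative (assumption, not in the project): a random 128-bit UUID
    return uuid.uuid4().hex


generated_id = get_formatted_time()

# 8 date digits, 6 time digits, 8 random digits, separated by underscores
format_ok = re.fullmatch(r"\d{8}_\d{6}_\d{8}", generated_id) is not None

print(generated_id, format_ok, make_uuid_id())
```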

---

This function takes in the top matches from the retrieval step, the user input, and the name of the model that will be used.

The LLM is given a system prompt that sets it up as a business lookup assistant. The function then checks whether the top list is empty. If it is, there is no match, and the LLM is instructed to inform the user of that.

If there is a match, the first (best) entry in the top list is used as context for the LLM to generate a response to the input of the user.

The code looks long, but it is mainly composed of messages for the LLM. The logic is simply: check whether the top list is empty, then craft the appropriate response.

```
def generate_response(top_list, user_input, model_name):

    message_list = []

    # System prompt
    system_prompt = """
        You are a Business Lookup Assistant. You will help the user look for business that closely aligns with their requests.
    """

    # Appends the system prompt
    message_list.append(
        {
            "role": "system",
            "content": system_prompt,
        }
    )

    # Appends the user query
    message_list.append(
        {
            "role": "user",
            "content": f"This is the user query: {user_input}.",
        }
    )

    # If no match
    if len(top_list) == 0:

        message_list.append(
            {
                "role": "user",
                "content": f"After searching our database, there is no relevant result.",
            }
        )

        message_list.append(
            {
                "role": "user",
                "content": f"Create a response informing the user that we did not find a good match in our database. Do not offer any other help or extra information.",
            }
        )

        message_list.append(
            {
                "role": "user",
                "content": f"Just reply with the response of having no match.",
            }
        )

        message_list.append(
            {
                "role": "user",
                "content": f"Also keep it simple.",
            }
        )

        shop_tuple = ("No Match", "No Match", "No Match")

        result_found = False

    # If match
    else:

        message_list.append(
            {
                "role": "user",
                "content": f"This is the most relevant result:",
            }
        )

        name = top_list[0][0]
        category = top_list[0][1]
        location = top_list[0][2]
        description = top_list[0][3]
        menu = top_list[0][4]

        message_list.append(
            {
                "role": "user",
                "content": f"Name: {name}\nCategory: {category}\nLocation: {location}\nDescription: {description}\nMenu: {menu}\n",
            }
        )

        message_list.append(
            {
                "role": "user",
                "content": f"Synthesize a response to answer the query based on the most relevant result. Make it engaging.",
            }
        )

        message_list.append(
            {
                "role": "user",
                "content": f"Make sure to tell the name of the shop, the location, the type, the description, and the menu. Do not offer any other help or extra information.",
            }
        )

        message_list.append(
            {
                "role": "user",
                "content": f"Also keep it simple.",
            }
        )

        shop_tuple = (name, category, location)

        result_found = True

    # Uses the specified model to generate response
    response: ChatResponse = chat(
        model=model_name,
        messages=message_list,
    )

    response_message = response.message.content

    return response_message, result_found, shop_tuple
```

In the end, the function returns the LLM response, a boolean indicating whether a match was found, and a tuple containing the name, category, and location of the match. These are used in the next step.

---

This code block parses the reply of the user when a match is found. The user is asked whether they want to book a table. While a simple yes-or-no check would work, I decided to use an LLM to interpret the reply, on the assumption that users will type free-form text. Again, the code block is long because of the messages appended as instructions for the LLM.

For inputs, it takes the reply of the user, the details of the shop, and the name of the LLM being used.


```
def book_table_command(user_input, shop_tuple, model_name):

    message_list = []

    # System prompt
    system_prompt = f"""
        You are a booking table assistant. You will check if the user expressed their desire to book a table at {shop_tuple[0]}.
    """

    # Appends the system prompt
    message_list.append(
        {
            "role": "system",
            "content": system_prompt,
        }
    )

    message_list.append(
        {
            "role": "system",
            "content": f"Note that they have been asked already if they want to book a table at {shop_tuple[0]}.",
        }
    )

    message_list.append(
        {
            "role": "system",
            "content": f"This is their response.",
        }
    )

    message_list.append(
        {
            "role": "user",
            "content": user_input,
        }
    )

    message_list.append(
        {
            "role": "system",
            "content": f"If the user input is gibberish or not related to booking a table, NO must be the response.",
        }
    )

    message_list.append(
        {
            "role": "system",
            "content": f"Respond with either YES or NO only. No punctuations in the response. Either YES or NO only. I repeat, YES or NO only.",
        }
    )

    response: ChatResponse = chat(
        model=model_name,
        messages=message_list,
    )

    response_message = response.message.content
    print(response_message)

    if response_message.lower().strip() == "yes":

        user_id = get_formatted_time()
        shop_name = shop_tuple[0]
        shop_category = shop_tuple[1]
        shop_location = shop_tuple[2]
        user_book_message = user_input

        booking_entry_dict = {
            "user_id": user_id,
            "shop_name": shop_name,
            "shop_category": shop_category,
            "shop_location": shop_location,
            "message": user_book_message,
        }

        book_status = True

    else:
        booking_entry_dict = {}

        book_status = False

    return book_status, booking_entry_dict
```

At the end, it returns `book_status`, which is a boolean, and a dictionary containing the details of the booking.

The details stored are the user_id, the name, category, and location of the shop, and the message sent by the user.
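
The comparison in the function above only succeeds when the reply is exactly `yes` after lowercasing and stripping whitespace. A small model may not always follow the formatting instruction, so one possible hardening step is to drop punctuation before comparing. This is a sketch under that assumption; `is_affirmative` is a hypothetical helper, not part of the project.

```python
import string


def is_affirmative(llm_reply):
    # Lowercase, trim whitespace, then remove punctuation such as "." or "!"
    cleaned = llm_reply.lower().strip()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    # Anything other than a bare "yes" is treated as a refusal
    return cleaned.strip() == "yes"


# Made-up replies to show which ones count as a booking confirmation
for reply in ["YES", " Yes.\n", "NO", "yes please", ""]:
    print(repr(reply), is_affirmative(reply))
```

Treating every unexpected reply as a "no" keeps the behavior conservative: a booking is only written when the model clearly confirms it.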

---

This code block appends the new entry to the booking.json file once a booking is confirmed, creating the file first if it does not exist.

```
def modify_book_json(json_book_path, booking_entry_dict):

    # Create the JSON file
    if not json_book_path.exists():
        with open(json_book_path, "w") as file:
            json.dump([], file)

    with open(json_book_path, "r") as file:
        try:
            booking_data = json.load(file)
        except json.JSONDecodeError:
            booking_data = []

    booking_data.append(booking_entry_dict)

    # Write new data
    with open(json_book_path, "w") as file:
        json.dump(booking_data, file, indent=4)
```

---
<br><br><br><br>

## fast_api_side

Below is the documentation for the `fast_api_side.py` file.

---

This loads the dataframe and the embedding list once at startup so that the API does not have to reload them on every request.

```
@asynccontextmanager
async def lifespan(app: FastAPI):
    try:
        df = pd.read_csv("expanded_dataset.csv")
        json_path = Path("embedding.json")
        embedding_list = embed_retrieve.load_embed_json(json_path)

        # Store in app state
        app.state.dataset = df
        app.state.embeddings = embedding_list

        print("Data loaded successfully.")
    except Exception as e:
        print(f"Error loading data: {e}")

    yield

```

---

This is the part of the API that takes in the user query. The query is sent to the submit endpoint, which retrieves the relevant context and passes it to the LLM together with the query.

```
class QueryInput(BaseModel):
    query: str


# Endpoint that handles user queries
@app.post("/submit/")
async def submit_query(input_data: QueryInput, request: Request):
    df = request.app.state.dataset
    embedding_list = request.app.state.embeddings
    llm_name = "gemma3:1b-it-qat"

    user_input = input_data.query

    top_list = embed_retrieve.get_top_list(df, user_input, embedding_list)
    response_message, result_found, shop_tuple = augment_generate.generate_response(
        top_list, user_input, llm_name
    )

    return {
        "response": response_message,
        "found": result_found,
        "shop_tuple": shop_tuple,
    }
```

After processing, the API returns a dictionary containing the response, a flag indicating whether a match was found, and the details of the matched shop.
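
One detail worth keeping in mind on the client side: JSON has no tuple type, so `shop_tuple` travels as a JSON array and decodes to a Python list. The snippet below imitates that round trip with the standard `json` module; the shop values are invented for illustration.

```python
import json

# Shape of the dictionary returned by the submit endpoint (sample values)
server_result = {
    "response": "Here is a place you might like.",
    "found": True,
    "shop_tuple": ("Sample Cafe", "Cafe", "Sample Street"),
}

# Serializing turns the tuple into a JSON array
wire_text = json.dumps(server_result)

# Decoding on the client yields a list, not a tuple
client_result = json.loads(wire_text)
shop_name, shop_category, shop_location = client_result["shop_tuple"]

print(type(client_result["shop_tuple"]).__name__, shop_name)
```

Indexing and unpacking work the same on a list, so code that reads `shop_tuple[0]` behaves identically on both sides.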

---

This is the part of the API that handles booking. The reply of the user is sent back again to the LLM. The LLM decides whether the user has shown desire to book.

If yes, the "user_id", the name, category, and location of the store, and the message of the user are collected.

```
class BookingInput(BaseModel):
    user_input: str
    shop_tuple: tuple

# Endpoint that handles booking
@app.post("/book/")
async def book_table(input_data: BookingInput):
    from pathlib import Path

    json_book_path = Path("booking.json")  # or whatever your booking file is
    llm_name = "gemma3:1b-it-qat"

    book_status, booking_entry_dict = augment_generate.book_table_command(
        input_data.user_input, input_data.shop_tuple, llm_name
    )

    if book_status:
        augment_generate.modify_book_json(json_book_path, booking_entry_dict)

    return {"booked": book_status, "entry": booking_entry_dict if book_status else None}
```

The collected data is then appended to the `booking.json` file.