# Introduction
Nowadays, the world revolves around data. We have built elaborate infrastructure, namely databases, to interact with data efficiently. Yet accessing that data requires more sophisticated skills such as SQL, which locks information away from people who lack that knowledge. Large Language Models (LLMs) are becoming increasingly popular as an interface to databases, enabling everyone to gain insights from a database's content. This short project looks at how LLMs can call functions with precise arguments and how this helps to interact with databases.
This project has two parts:
* First, we'll let an LLM interact with a dataset and generate features such as the sentiment of a dialogue and the names of the people involved. It uses the LLM's function-calling capability to build a dialogue processing system.
* The second part explores the ability of an LLM to interact with databases by posing increasingly complex questions and analyzing typical hallucinations.
 
I used Nexus Raven for this project. It has 13B parameters, yet it is specifically trained on Python code to enhance its performance in calling functions. If you want to find out more, check out their [course](https://www.deeplearning.ai/short-courses/function-calling-and-data-extraction-with-llms/) on DeepLearning.AI.





# Dataset
The Customer Service dataset used in this project is freely available from [HuggingFace](https://huggingface.co/datasets/SantiagoPG/customer_service_chatbot). It contains 1000 records of interactions between customers and customer service agents, including the type of issue, the product category and more. I will focus on the conversation between both parties, which allows for interesting applications of LLMs to extract features from text.

The other database used in this project is the Chinook database. It contains several tables, which allows us to build more elaborate SQL queries including joins.

# Part 1: Using an LLM to interact with the data to create a new dataset

The first part consists of the following steps:
* Loading the dataset and getting an overview of the data
* Creating a data class to define the desired outputs from the dialogues
* Initializing a test database to store the extracted data
* Creating prompts for the LLM to fill instances of the data class and upload them to the database
* Testing the function using the test database
* Creating a database for the second part of the project

## Loading the data
Let's have a quick look at the customer service dataset and see how we can use an LLM to call functions to interact with the data.

First, we print a sample and look at the conversation that we'll use to create a sentiment prediction.

In [3]:
from datasets import load_dataset
import os
import warnings
warnings.filterwarnings("ignore")

cwd = os.getcwd()
# load the customer service chatbot dataset
dialogue_data = load_dataset(cwd + "/data/customer_service_chatbot", cache_dir="./cache")["train"]
sample = dialogue_data[6] # load a sample 
dialogue_string = sample["conversation"].replace("\n\n", "\n") # remove double newlines
print (dialogue_string)


Agent: Hello, thank you for contacting BrownBox customer support. My name is Alex, how can I assist you today?
Customer: Hi, I'm calling about my order for a water purifier. I received it yesterday, but it's not working correctly. I want to return it and get a refund.
Agent: I'm sorry to hear that. I'll be happy to help you with that. Can you please provide me with your order number?
Customer: Sure, it's 12345.
Agent: Thank you for the information. May I know the reason for the return?
Customer: As I mentioned earlier, the product is not working correctly. I want to return it and get a refund.
Agent: I'm sorry for the inconvenience. We would be happy to process your return and refund. However, since you have opted for Cash on Delivery, it will take some time to process your refund. Our refund timelines for Cash on Delivery returns are usually within 7-14 business days from the date of pickup. 
Customer: What? That's too long. Why does it take so much time?
Agent: I understand your frus

## Creating a data class
For the LLM to extract features from these dialogues, we create a data class "Record" which describes the desired format and features that should be extracted.

In [4]:
from dataclasses import dataclass, fields
dataclass_schema_representation = '''
@dataclass
class Record:
    agent_name : str # The agent name
    customer_email : str # customer email if provided, else ''
    customer_order : str # The customer order number if provided, else ''
    customer_phone : str # the customer phone number if provided, else ''
    customer_sentiment : str # Overall customer sentiment, either 'frustrated', or 'happy'. Always MUST have a value.
'''

# Let's call exec to insert the dataclass into our python interpreter so it understands this. 
exec(dataclass_schema_representation)

## Creating a test database
The data class makes it easy for the LLM to add records to a database. We will now initialize a test database where the extracted features will be stored.

In [5]:
def initialize_db():
    import sqlite3

    # Connect to SQLite database (or create it if it doesn't exist)
    conn = sqlite3.connect('extracted_test.db')
    cursor = conn.cursor()

    # Fixed table name
    table_name = "customer_information"

    # Fixed schema
    columns = """
    id INTEGER PRIMARY KEY, 
    agent_name TEXT, 
    customer_email TEXT, 
    customer_order TEXT, 
    customer_phone TEXT, 
    customer_sentiment TEXT
    """

    # Ensure the table name is enclosed in quotes if it contains special characters
    quoted_table_name = f'"{table_name}"'

    # Check if a table with the exact name already exists (the name is bound as a parameter)
    cursor.execute("SELECT name FROM sqlite_master WHERE type='table' AND name=?", (table_name,))
    if cursor.fetchone():
        print(f"Table {table_name} already exists.")
    else:
        # Create the new table with the fixed schema
        cursor.execute(f'''CREATE TABLE {quoted_table_name} ({columns})''')
        print(f"Table {table_name} created successfully.")

    # Commit the transaction and close the connection
    conn.commit()
    conn.close()

In [6]:
!rm extracted_test.db
initialize_db()

Table customer_information created successfully.


## Creating prompts for the LLM

We need a function that allows us to populate the database. The function's structure is included in the prompt to ensure that the LLM uses it properly.

In [7]:
from typing import List
def update_knowledge(results_list : List[Record]):
    """
    Registers the information necessary
    """
    import sqlite3
    from sqlite3 import ProgrammingError

    # Reconnect to the existing SQLite database
    conn = sqlite3.connect('extracted_test.db')
    cursor = conn.cursor()

    # Fixed table name
    table_name = "customer_information"

    # Prepare SQL for inserting data with fixed column names
    column_names = "agent_name, customer_email, customer_order, customer_phone, customer_sentiment"
    placeholders = ", ".join(["?"] * 5) 
    sql = f"INSERT INTO {table_name} ({column_names}) VALUES ({placeholders})"

    # Insert each record
    for record in results_list:
        try:
            record_values = tuple(getattr(record, f.name) for f in fields(record))
            cursor.execute(sql, record_values)
        except ProgrammingError as e:
            print(f"Error with record. {e}")
            continue

    # Commit the changes and close the connection
    conn.commit()
    conn.close()
    print("Records inserted successfully.")
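
The insert relies on `fields()` returning the attributes in the order they were declared, so that the value tuple lines up with `column_names`. A minimal sketch of that step in isolation, with the `Record` class repeated so the snippet runs on its own (the sample values are illustrative only):

```python
from dataclasses import dataclass, fields


@dataclass
class Record:
    agent_name: str
    customer_email: str
    customer_order: str
    customer_phone: str
    customer_sentiment: str


# Illustrative values only
example = Record(
    agent_name="Alex",
    customer_email="",
    customer_order="12345",
    customer_phone="",
    customer_sentiment="frustrated",
)

# fields() yields the attributes in declaration order,
# which matches the column order used in the INSERT statement.
column_names = ", ".join(f.name for f in fields(example))
record_values = tuple(getattr(example, f.name) for f in fields(example))

print(column_names)
print(record_values)
```

If the column list and the data class ever drift apart, deriving `column_names` from `fields()` as shown here would keep the two in sync.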

Now, this is where it becomes interesting. We'll define a prompt that enables the LLM to extract the information. For this, we take the signature and the docstring of the update_knowledge function as well as the data class "Record", which gives the LLM more information on how to call update_knowledge.


In [8]:
import inspect
prompt = "\n" + dialogue_string

signature = inspect.signature(update_knowledge)
signature = str(signature).replace("__main__.Record", "Record")
docstring = update_knowledge.__doc__

raven_prompt = f'''{dataclass_schema_representation}\nFunction:\n{update_knowledge.__name__}{signature}\n    """{docstring}"""\n\n\nUser Query:{prompt}<human_end>'''
print (raven_prompt)


@dataclass
class Record:
    agent_name : str # The agent name
    customer_email : str # customer email if provided, else ''
    customer_order : str # The customer order number if provided, else ''
    customer_phone : str # the customer phone number if provided, else ''
    customer_sentiment : str # Overall customer sentiment, either 'frustrated', or 'happy'. Always MUST have a value.

Function:
update_knowledge(results_list: List[Record])
    """
    Registers the information necessary
    """


User Query:
Agent: Hello, thank you for contacting BrownBox customer support. My name is Alex, how can I assist you today?
Customer: Hi, I'm calling about my order for a water purifier. I received it yesterday, but it's not working correctly. I want to return it and get a refund.
Agent: I'm sorry to hear that. I'll be happy to help you with that. Can you please provide me with your order number?
Customer: Sure, it's 12345.
Agent: Thank you for the information. May I know the reason for th
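
The prompt layout printed above can be wrapped in a small helper so that it is reusable for other functions. This is only a sketch: `build_function_prompt` and the toy function `add_note` are hypothetical names introduced for illustration, and the layout simply mirrors the f-string used in the cell above.

```python
import inspect


def build_function_prompt(func, schema_text, user_query):
    """Assemble a prompt from a schema description, a function stub and a query."""
    signature = str(inspect.signature(func))
    docstring = func.__doc__ or ""
    return (
        f"{schema_text}\n"
        f"Function:\n{func.__name__}{signature}\n"
        f'    """{docstring}"""\n\n\n'
        f"User Query:{user_query}<human_end>"
    )


# Toy function used only for this illustration
def add_note(text: str, priority: int = 0):
    """
    Stores a note with a priority.
    """


demo_prompt = build_function_prompt(
    add_note, "", "\nRemember to call Alex, it is urgent."
)
print(demo_prompt)
```

The model only ever sees the signature and the docstring, never the function body, so a descriptive docstring is the main lever for steering how the function gets called.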

We can now query the LLM to extract the features for the data class and the database. The LLM correctly populates the record and passes it to the update_knowledge function. We use the eval() function to run the function and to add the record to the database.

In [9]:
from utils import query_raven
raven_call = query_raven(raven_prompt)
print (raven_call)

update_knowledge(results_list=[Record(agent_name='Alex', customer_email='', customer_order='12345', customer_phone='', customer_sentiment='frustrated')])


In [10]:
eval(raven_call)

Records inserted successfully.
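
Since `eval()` runs whatever string the model returns, one possible alternative is to parse the call with Python's `ast` module and only execute functions from an allow-list. The sketch below is not what this project does; `run_call` and `ALLOWED` are hypothetical names, and `update_knowledge` is replaced by a stand-in that returns its input so the snippet runs without a database.

```python
import ast
from dataclasses import dataclass


@dataclass
class Record:
    agent_name: str
    customer_email: str
    customer_order: str
    customer_phone: str
    customer_sentiment: str


def update_knowledge(results_list):
    # Stand-in for the real function: simply hands the records back
    return results_list


# Only these names may be called from model output
ALLOWED = {"update_knowledge": update_knowledge, "Record": Record}


def _build(node):
    """Evaluate constants, lists and calls to allow-listed names; reject the rest."""
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.List):
        return [_build(element) for element in node.elts]
    if (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id in ALLOWED
    ):
        args = [_build(arg) for arg in node.args]
        kwargs = {}
        for keyword in node.keywords:
            if keyword.arg is None:  # "**kwargs" style unpacking
                raise ValueError("Keyword unpacking is not allowed")
            kwargs[keyword.arg] = _build(keyword.value)
        return ALLOWED[node.func.id](*args, **kwargs)
    raise ValueError(f"Disallowed expression: {ast.dump(node)}")


def run_call(call_string):
    tree = ast.parse(call_string.strip(), mode="eval")
    return _build(tree.body)


call = (
    "update_knowledge(results_list=[Record(agent_name='Alex', customer_email='', "
    "customer_order='12345', customer_phone='', customer_sentiment='frustrated')])"
)
records = run_call(call)
print(records)

try:
    run_call("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
print("Unexpected call rejected:", rejected)
```

The trade-off is flexibility: anything beyond literals, lists and allow-listed calls is refused, which suits function calling but would need extending for richer argument types.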


## Testing

To check the new entry, we need a function to interact with the database:

In [11]:
import sqlite3
def execute_sql(sql: str):
    """ Runs SQL code for the given schema. Make sure to properly leverage the schema to answer the user's question in the best way possible. """
    # Fixed table name, assuming it's not dynamically generated anymore
    table_name = "customer_information"

    # Establish a connection to the database
    conn = sqlite3.connect('extracted_test.db')
    cursor = conn.cursor()

    # Execute the SQL statement
    cursor.execute(sql)

    # Fetch all rows returned by the statement
    results = cursor.fetchall()
    print("Query operation executed successfully. Number of rows returned:", len(results))

    # Close the connection to the database
    conn.close()

    # Return the results for SELECT operations; otherwise, return an empty list
    return results


In [12]:
sql = '''
    SELECT agent_name 
        FROM customer_information
        WHERE customer_sentiment = "frustrated"
    '''
# Print the final SQL command for debugging
print("Executing SQL:", sql)

execute_sql(sql)

Executing SQL: 
    SELECT agent_name 
        FROM customer_information
        WHERE customer_sentiment = "frustrated"
    
Query operation executed successfully. Number of rows returned: 1


[('Alex',)]

## Creating a database for Part 2
I have created a Python script that creates a database called extracted.db. It can be called from the terminal using

```
python3 createDatabase.py --n=10
``` 
with n being the number of records created from the original dataset, which contains 1000 records.

**Note:** Everything so far was done using a test database called extracted_test.db. From now on, we'll use the database that is created using the script (specifically with n=10).

# Part 2
The second part of the project contains the following steps:
* Exploring the newly created database populated by the LLM
* Exploring the Chinook database
* Creating more complex queries to test the performance of the LLM

For the database exploration, we'll use the functions execute_sql and get_query, which are defined in utils.py. These functions are more versatile, letting the user choose the database. For more information, check out my [repo](https://github.com/SebastianGhafafian/NexusLLM) on GitHub.


## Exploring the Customer Service database


Let's check out the newly created database extracted.db and at the same time evaluate the LLM's performance on creating SQL queries.


In [13]:
from utils import execute_sql, get_query

The cells below contain three questions. The first one produces an overview of the database. The LLM performed well in extracting the data from the dialogues. For entry 4, however, it does not identify the agent name correctly: it returns "BrownBox", although the agent's name was never mentioned. This could potentially be fixed with a more specific prompt.

In [29]:
question = "Give me all the entries from the database?"

raven_call, prompt = get_query(database_path = "data/extracted.db", question = question)
print(raven_call)
eval(raven_call)


execute_sql(sql='SELECT * FROM customer_information', database_path='data/extracted.db')
Query operation executed successfully. Number of rows returned: 10


[(1, 'Tom', 'johndoe@email.com', '', '+1 123-456-7890', 'happy'),
 (2, 'Alex', '', '789101', '', 'happy'),
 (3, 'Sarah', 'jane.doe@email.com', '987654', '', 'happy'),
 (4, 'BrownBox', 'john.doe@gmail.com', 'BB98765432', '123-456-7890', 'happy'),
 (5, 'Sarah', '', 'BB123456', '', 'frustrated'),
 (6, 'Alex', 'johnsmith@email.com', '', '123-456-7890', 'happy'),
 (7, 'Alex', '', '12345', '', 'frustrated'),
 (8, 'Rachel', '', '', '', 'happy'),
 (9, 'Sarah', '', '#98765', '', 'happy'),
 (10, 'Sarah', 'jane@email.com', '', '9876543210', 'frustrated')]

Before checking the next two questions, let's check out the prompt passed to the LLM:

In [30]:
print(prompt)

Schema:

    
CREATE TABLE customer_information (
	id INTEGER, 
	agent_name TEXT, 
	customer_email TEXT, 
	customer_order TEXT, 
	customer_phone TEXT, 
	customer_sentiment TEXT, 
	PRIMARY KEY (id)
)

/*
3 rows from customer_information table:
id	agent_name	customer_email	customer_order	customer_phone	customer_sentiment
1	Tom	johndoe@email.com		+1 123-456-7890	happy
2	Alex		789101		happy
3	Sarah	jane.doe@email.com	987654		happy
*/
    
Function:
execute_sql(sql: str, database_path: str)
    """
    Runs SQL code for the given schema and database. Make sure to properly leverage the schema to answer the user's question in the best way possible. 
    Pay attention to use only the the column names of that belong to a table which are described in the schema description.
    Be careful to not query for columns that do not exist. Also, pay attention to which column is in which table.

    
    Args:
        sql: SQL code to run
        database_name: name of the database to run the SQL code
  

This prompt gives the LLM insight into the database it is querying. However, it also creates limitations for large databases with numerous tables, since LLMs only accept a limited number of input tokens.
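
A prompt section of this shape (CREATE TABLE statement plus a few sample rows per table) could be assembled along the following lines. How `get_query` in utils.py actually builds it is not shown here, so `build_schema_prompt` is a hypothetical helper and the demo table is a reduced, in-memory stand-in.

```python
import sqlite3


def build_schema_prompt(conn, sample_rows=3):
    """Return CREATE TABLE statements plus a few sample rows per table."""
    tables = conn.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    parts = []
    for name, create_sql in tables:
        cursor = conn.execute(f'SELECT * FROM "{name}" LIMIT ?', (sample_rows,))
        header = "\t".join(column[0] for column in cursor.description)
        lines = [
            "\t".join("" if value is None else str(value) for value in row)
            for row in cursor
        ]
        parts.append(
            f"{create_sql}\n\n/*\n{sample_rows} rows from {name} table:\n"
            + "\n".join([header] + lines)
            + "\n*/"
        )
    return "Schema:\n\n" + "\n\n".join(parts)


# Reduced in-memory stand-in for extracted.db
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_information "
    "(id INTEGER PRIMARY KEY, agent_name TEXT, customer_sentiment TEXT)"
)
conn.executemany(
    "INSERT INTO customer_information (agent_name, customer_sentiment) VALUES (?, ?)",
    [("Tom", "happy"), ("Alex", "happy"), ("Sarah", "happy"), ("Rachel", "happy")],
)
schema_prompt = build_schema_prompt(conn)
print(schema_prompt)
conn.close()
```

Because every table contributes its full definition and sample rows, the prompt grows with the number of tables, which is where the token limit mentioned above starts to matter.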

Let's continue to see how the LLM performs. The second question is more challenging. The LLM correctly creates a WHERE clause: it understands that a Gmail address is one that ends in gmail.com and checks this with the LIKE operator. Since the table has no customer name column, the query returns the email address instead.

In [15]:
question = "What are the names of customers with a gmail email adress?"
raven_call, prompt = get_query(database_path = "data/extracted.db", question = question)
print(raven_call)
eval(raven_call)

execute_sql(sql='SELECT customer_email FROM customer_information WHERE customer_email LIKE "%gmail.com"', database_path='data/extracted.db')
Query operation executed successfully. Number of rows returned: 1


[('john.doe@gmail.com',)]

For the third question, the LLM returns an empty list, although there is a phone number that starts with +1. The generated pattern "%-011-%" does not match any entry.

In [16]:
question = 'Give me all the entries from the database with an American phone number?' # outputs wrong results
raven_call, prompt = get_query(database_path = "data/extracted.db", question = question)
print(raven_call)
eval(raven_call)

execute_sql(sql='SELECT * FROM customer_information WHERE customer_phone LIKE "%-011-%"', database_path='data/extracted.db')
Query operation executed successfully. Number of rows returned: 0


[]

By providing a more detailed question, the LLM is able to solve the problem.

In [31]:
question = 'Give me all the entries from the database with an American phone number, which starts with "+1"?'
raven_call, prompt = get_query(database_path = "data/extracted.db", question = question)
print(raven_call)
eval(raven_call)

execute_sql(sql='SELECT * FROM customer_information WHERE customer_phone LIKE "+1%"', database_path='data/extracted.db')
Query operation executed successfully. Number of rows returned: 1


[(1, 'Tom', 'johndoe@email.com', '', '+1 123-456-7890', 'happy')]
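
The difference between the two generated patterns can be reproduced without the LLM. The sketch below uses an in-memory table filled with the four distinct phone values from the overview above; the helper `matching_phones` is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_information (id INTEGER PRIMARY KEY, customer_phone TEXT)"
)
conn.executemany(
    "INSERT INTO customer_information (customer_phone) VALUES (?)",
    [("+1 123-456-7890",), ("",), ("123-456-7890",), ("9876543210",)],
)


def matching_phones(pattern):
    """Return all phone numbers that match the given LIKE pattern."""
    rows = conn.execute(
        "SELECT customer_phone FROM customer_information WHERE customer_phone LIKE ?",
        (pattern,),
    ).fetchall()
    return [phone for (phone,) in rows]


# Pattern from the first attempt: requires the literal substring "-011-"
no_match = matching_phones("%-011-%")
# Pattern from the refined question: requires the prefix "+1"
plus_one = matching_phones("+1%")

print(no_match)   # []
print(plus_one)   # ['+1 123-456-7890']
```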

## Exploring the Chinook database

The Chinook database is a classic example database that represents a digital media store. It contains multiple tables, which allows us to really explore the abilities of LLMs on complex tasks. Let's create three queries of increasing complexity to explore the database. The LLM does a good job of translating the questions into SQL queries, allowing for a quick overview.
![Chinook](img/chinook_db.png)


In [27]:
question = "What the tables in the database?"
raven_call, prompt = get_query(database_path = "data/Chinook.db", question = question)
print(raven_call)
eval(raven_call)

execute_sql(sql='SELECT name FROM sqlite_master WHERE type = "table"', database_path='data/Chinook.db')
Query operation executed successfully. Number of rows returned: 11


[('Album',),
 ('Artist',),
 ('Customer',),
 ('Employee',),
 ('Genre',),
 ('Invoice',),
 ('InvoiceLine',),
 ('MediaType',),
 ('Playlist',),
 ('PlaylistTrack',),
 ('Track',)]

In [19]:
question = "Give me 15 the entries from the artist table? Sort the names alphabetically."
raven_call, prompt = get_query(database_path = "data/Chinook.db", question = question)
print(raven_call)
eval(raven_call)

execute_sql(sql='SELECT * FROM Artist ORDER BY Name ASC LIMIT 15', database_path='data/Chinook.db')
Query operation executed successfully. Number of rows returned: 15


[(43, 'A Cor Do Som'),
 (1, 'AC/DC'),
 (230, 'Aaron Copland & London Symphony Orchestra'),
 (202, 'Aaron Goldberg'),
 (214, 'Academy of St. Martin in the Fields & Sir Neville Marriner'),
 (215,
  'Academy of St. Martin in the Fields Chamber Ensemble & Sir Neville Marriner'),
 (222,
  'Academy of St. Martin in the Fields, John Birch, Sir Neville Marriner & Sylvia McNair'),
 (257,
  'Academy of St. Martin in the Fields, Sir Neville Marriner & Thurston Dart'),
 (239,
  'Academy of St. Martin in the Fields, Sir Neville Marriner & William Bennett'),
 (2, 'Accept'),
 (260, 'Adrian Leaper & Doreen de Feis'),
 (3, 'Aerosmith'),
 (161, "Aerosmith & Sierra Leone's Refugee Allstars"),
 (197, 'Aisha Duo'),
 (4, 'Alanis Morissette')]

In [20]:
question = "How many tracks did each composer have? Only show the 5 most common composers."
raven_call, prompt = get_query(database_path = "data/Chinook.db", question = question)
print(raven_call)
eval(raven_call)

execute_sql(sql='SELECT Composer, COUNT(TrackId) AS NumberOfTracks FROM Track GROUP BY Composer ORDER BY NumberOfTracks DESC LIMIT 5', database_path='data/Chinook.db')
Query operation executed successfully. Number of rows returned: 5


[(None, 977),
 ('Steve Harris', 80),
 ('U2', 44),
 ('Jagger/Richards', 35),
 ('Billy Corgan', 31)]

# Pushing the LLM - Hallucinations

We have now seen that the LLM is capable of performing simple tasks to extract data from a database. One major problem with LLMs is hallucinations, meaning wrong outputs, which in this case can be divided into two types:
* Wrong syntax leading to an error, e.g. using functions of a different SQL dialect
* Correct syntax, yet the output is semantically wrong, e.g. the query does not answer the question.
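
The two types differ in how easily they can be detected. A syntax problem surfaces as an exception that code can catch, whereas a semantically wrong query runs without complaint. A minimal sketch on a toy in-memory table; `run_checked` is a hypothetical helper and both queries are hand-written for illustration, not model output:

```python
import sqlite3


def run_checked(conn, sql):
    """Run a query and return (rows, error_message); exactly one of the two is None."""
    try:
        return conn.execute(sql).fetchall(), None
    except sqlite3.Error as exc:
        return None, str(exc)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Track (TrackId INTEGER PRIMARY KEY, Name TEXT, Milliseconds INTEGER)"
)
conn.executemany(
    "INSERT INTO Track (Name, Milliseconds) VALUES (?, ?)",
    [("Short", 1000), ("Long", 9000)],
)

# Type 1: TOP belongs to a different SQL dialect, so SQLite raises an error
rows_1, error_1 = run_checked(
    conn, "SELECT TOP 1 Name FROM Track ORDER BY Milliseconds DESC"
)

# Type 2: valid SQL, but it returns the shortest track although the longest was wanted
rows_2, error_2 = run_checked(
    conn, "SELECT Name FROM Track ORDER BY Milliseconds ASC LIMIT 1"
)

print("Type 1:", rows_1, "|", error_1)
print("Type 2:", rows_2, "|", error_2)
```

Only the first kind can be flagged automatically this way; the second kind needs a human check or a reference answer.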

Let's see what it can do with more complicated questions. We'll be using the Chinook database as it contains multiple tables.

## Simple Joins


In [21]:
question = "What is the genre of the track with the longest duration?" 

raven_call, prompt = get_query(database_path = "data/Chinook.db", question = question)
print(raven_call)
eval(raven_call)

execute_sql(sql='SELECT Genre.Name, Track.Name, Track.Milliseconds FROM Track JOIN Genre ON Track.GenreId = Genre.GenreId ORDER BY Track.Milliseconds DESC LIMIT 1', database_path='data/Chinook.db')
Query operation executed successfully. Number of rows returned: 1


[('TV Shows', 'Occupation / Precipice', 5286953)]

In [22]:
question = "List the tracks and their genre? List 10 entries"

raven_call, prompt = get_query(database_path = "data/Chinook.db", question = question)
print(raven_call)
eval(raven_call)

execute_sql(sql='SELECT Track.Name, Genre.Name FROM Track JOIN Genre ON Track.GenreId = Genre.GenreId LIMIT 10', database_path='data/Chinook.db')
Query operation executed successfully. Number of rows returned: 10


[('For Those About To Rock (We Salute You)', 'Rock'),
 ('Balls to the Wall', 'Rock'),
 ('Fast As a Shark', 'Rock'),
 ('Restless and Wild', 'Rock'),
 ('Princess of the Dawn', 'Rock'),
 ('Put The Finger On You', 'Rock'),
 ("Let's Get It Up", 'Rock'),
 ('Inject The Venom', 'Rock'),
 ('Snowballed', 'Rock'),
 ('Evil Walks', 'Rock')]

In [23]:
question = "Which album has the most tracks?"
raven_call, prompt = get_query(database_path = "data/Chinook.db", question = question)
print(raven_call)
eval(raven_call)

execute_sql(sql='SELECT Album.Title, COUNT(Track.TrackId) AS TrackCount FROM Album JOIN Track ON Album.AlbumId = Track.AlbumId GROUP BY Album.Title ORDER BY TrackCount DESC LIMIT 1;', database_path='data/Chinook.db')
Query operation executed successfully. Number of rows returned: 1


[('Greatest Hits', 57)]

In [24]:
question = "Which album has shortest total playtime of milliseconds?"
raven_call, prompt = get_query(database_path = "data/Chinook.db", question = question)
print(raven_call)
eval(raven_call)

execute_sql(sql='SELECT Album.Title, SUM(Track.Milliseconds) AS TotalPlaytime FROM Album JOIN Track ON Album.AlbumId = Track.AlbumId GROUP BY Album.Title ORDER BY TotalPlaytime ASC LIMIT 1', database_path='data/Chinook.db')
Query operation executed successfully. Number of rows returned: 1


[("Liszt - 12 Études D'Execution Transcendante", 51780)]

Great. It seems to be performing quite well on simple joins. Let's look at queries with multiple joins.

## Multiple joins
How about the following question: Get the album name and genre of each track? Limit the output to 10 entries.

It produces an output, yet on closer inspection, part of the requested output is missing, namely the album name.

In [25]:
question = "Get the album name and genre of the each track? Limit the output to 10 entries" #semantically incorrect

raven_call, prompt = get_query(database_path = "data/Chinook.db", question = question)
print(raven_call)
eval(raven_call)


execute_sql(sql='SELECT Track.Name, Genre.Name FROM Track JOIN Genre ON Track.GenreId = Genre.GenreId LIMIT 10', database_path='data/Chinook.db')
Query operation executed successfully. Number of rows returned: 10


[('For Those About To Rock (We Salute You)', 'Rock'),
 ('Balls to the Wall', 'Rock'),
 ('Fast As a Shark', 'Rock'),
 ('Restless and Wild', 'Rock'),
 ('Princess of the Dawn', 'Rock'),
 ('Put The Finger On You', 'Rock'),
 ("Let's Get It Up", 'Rock'),
 ('Inject The Venom', 'Rock'),
 ('Snowballed', 'Rock'),
 ('Evil Walks', 'Rock')]

Maybe it helps to rewrite the question and to provide more information:

In [26]:
question = "Get the album name of each track and its genre name? This requires a join of the Album table, the Genre table and the Track table" #syntax error

raven_call, prompt = get_query(database_path = "data/Chinook.db", question = question)
print(raven_call)
eval(raven_call)

execute_sql(sql='SELECT Album.Title, Genre.Name FROM Album JOIN Genre ON Album.GenreId = Genre.GenreId JOIN Track ON Album.AlbumId = Track.AlbumId', database_path='data/Chinook.db')


OperationalError: no such column: Album.GenreId

No, this time it tries to query a column that doesn't exist: Album.GenreId. In the queries above, GenreId is a column of the Track table.
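
For comparison, a hand-written query for this question would join both Album and Genre through the Track table. The sketch below runs on a toy in-memory schema that copies only the columns referenced in the queries above; the rows are made up, and the query has not been run against Chinook here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Genre (GenreId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Album (AlbumId INTEGER PRIMARY KEY, Title TEXT);
    CREATE TABLE Track (
        TrackId INTEGER PRIMARY KEY, Name TEXT, AlbumId INTEGER, GenreId INTEGER
    );
    INSERT INTO Genre VALUES (1, 'Rock');
    INSERT INTO Album VALUES (1, 'Example Album');
    INSERT INTO Track VALUES (1, 'Example Track', 1, 1);
    """
)

# Track holds both foreign keys, so it sits in the middle of the join
sql = """
SELECT Track.Name, Album.Title, Genre.Name
FROM Track
JOIN Album ON Track.AlbumId = Album.AlbumId
JOIN Genre ON Track.GenreId = Genre.GenreId
LIMIT 10
"""
joined_rows = conn.execute(sql).fetchall()
print(joined_rows)  # [('Example Track', 'Example Album', 'Rock')]
```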

The following question is answered correctly:

In [None]:
question = "What is the name of the playlist with the longest playtime in milliseconds" #lets check this complex query

raven_call, prompt = get_query(database_path = "data/Chinook.db", question = question)
print(raven_call)
eval(raven_call)

execute_sql(sql='SELECT Playlist.Name, SUM(Track.Milliseconds) AS TotalMilliseconds FROM Playlist JOIN PlaylistTrack ON Playlist.PlaylistId = PlaylistTrack.PlaylistId JOIN Track ON PlaylistTrack.TrackId = Track.TrackId GROUP BY Playlist.Name ORDER BY TotalMilliseconds DESC LIMIT 1;', database_path='data/Chinook.db')
Query operation executed successfully. Number of rows returned: 1


[('Music', 1755366166)]
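
One detail worth checking in queries like this one is the grouping key. Grouping by `Playlist.Name` sums all playlists that share a name, while grouping by `Playlist.PlaylistId` keeps them apart. Whether this affects the result above depends on whether Chinook contains playlists with identical names, which is not checked here. The toy data below is made up to show the effect:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Playlist (PlaylistId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Track (TrackId INTEGER PRIMARY KEY, Milliseconds INTEGER);
    CREATE TABLE PlaylistTrack (PlaylistId INTEGER, TrackId INTEGER);
    INSERT INTO Playlist VALUES (1, 'Mix'), (2, 'Mix'), (3, 'Solo');
    INSERT INTO Track VALUES (1, 100), (2, 150);
    INSERT INTO PlaylistTrack VALUES (1, 1), (2, 1), (3, 2);
    """
)

# Same query as above, with the grouping key left as a placeholder
base = """
SELECT Playlist.Name, SUM(Track.Milliseconds) AS TotalMilliseconds
FROM Playlist
JOIN PlaylistTrack ON Playlist.PlaylistId = PlaylistTrack.PlaylistId
JOIN Track ON PlaylistTrack.TrackId = Track.TrackId
GROUP BY {key}
ORDER BY TotalMilliseconds DESC
LIMIT 1
"""
by_name = conn.execute(base.format(key="Playlist.Name")).fetchall()
by_id = conn.execute(base.format(key="Playlist.PlaylistId")).fetchall()

print(by_name)  # [('Mix', 200)]: both playlists called 'Mix' are summed together
print(by_id)    # [('Solo', 150)]: each playlist is counted on its own
```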

# Summary

This project explores the capabilities of the Nexus Raven LLM to call functions and to extract data from text. The LLM was then used to create SQL queries. Its performance is very robust for simpler queries. I am sure that more elaborate prompts will help the LLM to get more accurate results on more complicated questions, although the performance gain to be expected from prompt engineering is limited. The occurrence of hallucinations seems to increase rapidly for more complex queries involving multiple joins. A common mistake is that the LLM uses a column name with the wrong table, e.g. Album.GenreId, or that it simply ignores part of the question.

The functions created in this project come in very handy for a first interaction with an unknown database using easier queries. Yet the prompt should always be closely inspected in order to gauge the received answers. Only the following code block is needed, which can be copied into a Jupyter notebook to get a first overview of a new database.


In [None]:
from utils import execute_sql, get_query

question = "What is the name and email of each employee?"
raven_call, prompt = get_query("./data/Chinook.db", question)   
print(raven_call)
print(eval(raven_call))

execute_sql(sql='SELECT LastName, FirstName, Email FROM Employee', database_path='./data/Chinook.db')
Query operation executed successfully. Number of rows returned: 8
[('Adams', 'Andrew', 'andrew@chinookcorp.com'), ('Edwards', 'Nancy', 'nancy@chinookcorp.com'), ('Peacock', 'Jane', 'jane@chinookcorp.com'), ('Park', 'Margaret', 'margaret@chinookcorp.com'), ('Johnson', 'Steve', 'steve@chinookcorp.com'), ('Mitchell', 'Michael', 'michael@chinookcorp.com'), ('King', 'Robert', 'robert@chinookcorp.com'), ('Callahan', 'Laura', 'laura@chinookcorp.com')]


To assess the performance quantitatively, we would need an evaluation set of question/query pairs, which I want to explore in my next project. That would make it possible to properly evaluate measures for increasing the performance of the LLM. That project will include iteratively improving the model on a specific database by creating evaluation sets, and memory tuning, which reduces hallucinations by embedding facts directly into the weights of the model. This process will allow for the introduction of some sort of determinism into the probabilistic LLM without sacrificing generalization.
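
One common way to score such question/query pairs is execution accuracy: run the reference query and the generated query and compare the returned rows. The following is only a sketch of that idea; the evaluation set, the "predicted" queries and the helper `execution_match` are made up for illustration, and the table is a reduced in-memory stand-in.

```python
import sqlite3


def execution_match(conn, predicted_sql, reference_sql):
    """True if both queries run and return the same rows, ignoring row order."""
    try:
        predicted = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # a query that fails to run counts as a miss
    reference = conn.execute(reference_sql).fetchall()
    return sorted(predicted, key=repr) == sorted(reference, key=repr)


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Employee (EmployeeId INTEGER PRIMARY KEY, LastName TEXT, Email TEXT);
    INSERT INTO Employee VALUES
        (1, 'Adams', 'andrew@chinookcorp.com'),
        (2, 'Edwards', 'nancy@chinookcorp.com');
    """
)

# Hypothetical evaluation set: (question, reference SQL, SQL produced by the model)
evaluation_set = [
    (
        "List all employee emails",
        "SELECT Email FROM Employee",
        "SELECT Email FROM Employee ORDER BY Email",
    ),
    (
        "How many employees are there?",
        "SELECT COUNT(*) FROM Employee",
        "SELECT COUNT(EmployeeId) FROM Employe",  # misspelled table name
    ),
]

results = [
    execution_match(conn, predicted, reference)
    for _, reference, predicted in evaluation_set
]
accuracy = sum(results) / len(results)
print(results, accuracy)  # [True, False] 0.5
```

Comparing result sets rather than SQL strings means that differently written but equivalent queries still count as correct.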
