Arsonor is a blog dedicated to home-studio audio production, aimed at creators of music, podcasts, videos, films and video games.
It covers essential topics such as sound design, post-production and home-studio optimization. You will find practical advice on improving sound quality, managing audio levels and using tools such as VST synths and mixing software. The goal is to make home sound creation accessible, by providing techniques and tips for a professional result.
Important note: this blog is in French, as it was first built to help the French-speaking community.
But don't worry!
All the information you need to run the application is written in English. Just don't be surprised if I use test queries in French ;)
This project introduces an innovative LLM RAG (Retrieval-Augmented Generation) system designed to answer readers' questions on music production, sound design, post-production, and more.
This system enhances the user experience by delivering accurate, context-aware responses to queries, drawing from the blog's extensive content library, making it easier for creators to find the information they need.
This project was implemented for LLM Zoomcamp - a free course about LLMs and RAG.
The Arsonor-LLM-RAG system integrates an AI-powered model capable of understanding user queries and retrieving relevant content from Arsonor’s knowledge base. This approach combines the flexibility of language models with the accuracy of focused content retrieval, ensuring that users get precise, helpful responses.
Examples of Use:
- User Question: "What is a VCO and how does it work in a synthesizer?"
- LLM RAG Response: Detailed explanation of VCOs with relevant information on their modulation and sound generation.
- User Question: "How do I reduce background noise in my podcast recordings?"
- LLM RAG Response: Techniques for noise reduction using EQ, gating, and software solutions, linking directly to Arsonor tutorials.
- User Question: "What equipment do I need for a home studio?"
- LLM RAG Response: Tailored suggestions based on the user’s goals, from budget setups to professional configurations, with in-depth guides from the blog.
This system changes how readers interact with the blog's content, providing immediate, accurate answers and supporting their journey in audio production.
The dataset used in this project was generated by scraping information about every article on the Arsonor blog. All articles can be found and retrieved from the sitemap page.
You can read the code for this scraping stage in the notebook arsonor_parse.ipynb.
The resulting dataset is a JSON file, data/arsonor_data.json, that contains records in this form:
{
'title': "main title of the article",
'category': "the category from where it belongs ('home-studio', 'sound design' or 'post-production')",
'text': "the text content of the whole article",
'tags': "a list of keywords based on the article content"
}
This file serves as the foundation of the assistant's knowledge base for answering audio production queries.
It contains 58 records, one for each article currently on the site. Each article is a fairly long text, so each record needs to go through a chunking step before it can be indexed properly.
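One way to picture the chunking step is a sliding window over the words of each article. The sketch below is an assumption, not the code used in this project: the function names, the `chunk_size` and `overlap` defaults, and the `chunk_id` field are all hypothetical and only illustrate the idea.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into word-based chunks that overlap by `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


def chunk_record(record, chunk_size=200, overlap=50):
    """Turn one article record into several smaller records, one per chunk."""
    return [
        {
            "title": record["title"],
            "category": record["category"],
            "tags": record["tags"],
            "chunk_id": i,  # hypothetical field, not part of the original dataset
            "text": chunk,
        }
        for i, chunk in enumerate(chunk_text(record["text"], chunk_size, overlap))
    ]
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the price of indexing some words twice.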
- Python 3.12
- Docker and Docker Compose for containerization
- Minsearch for full-text search
- Flask as the API interface (see Background for more information on Flask)
- Grafana for monitoring and PostgreSQL as the backend for it
- OpenAI as an LLM
Since we use OpenAI, you need to provide the API key:
- Install direnv. If you use Ubuntu, run sudo apt install direnv and then direnv hook bash >> ~/.bashrc.
- Copy .envrc_template into .envrc and insert your key there.
- For OpenAI, it's recommended to create a new project and use a separate key.
- Run direnv allow to load the key into your environment.
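Once the key is loaded, Python code can read it from the environment. The helper below is a minimal sketch, not code from this repository: the function name `get_openai_api_key` is hypothetical, and it assumes the variable is called `OPENAI_API_KEY`, the name used in the Docker command later in this document.

```python
import os


def get_openai_api_key(environ=None):
    """Return the OpenAI key from the environment, or fail with a clear message."""
    env = os.environ if environ is None else environ
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set. Add it to .envrc and run `direnv allow`."
        )
    return key
```

Failing early with an explicit message is usually easier to debug than an authentication error raised on the first LLM call.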
For dependency management, we use pipenv, so you need to install it:
pip install pipenv
Once installed, you can install the app dependencies:
pipenv install --dev
Before the application starts for the first time, the database needs to be initialized.
First, run postgres:
docker-compose up postgres
Then run the db_prep.py script:
pipenv shell
cd arsonor_assistant
export POSTGRES_HOST=localhost
python db_prep.py
To check the content of the database, use pgcli (already installed with pipenv):
pipenv run pgcli -h localhost -U your_username -d course_assistant -W
You can view the schema using the \d command:
\d conversations;
And select from this table:
select * from conversations;
The easiest way to run the application is with docker-compose:
docker-compose up
If you want to run the application locally, start only postgres and grafana:
docker-compose up postgres grafana
If you previously started all applications with docker-compose up, you need to stop the app:
docker-compose stop app
Now run the app on your host machine:
pipenv shell
cd arsonor_assistant
export POSTGRES_HOST=localhost
python app.py
Sometimes you might want to run the application in Docker without Docker Compose, e.g., for debugging purposes.
First, prepare the environment by running Docker Compose as in the previous section.
Next, build the image:
docker build -t fitness-assistant .
And run it:
docker run -it --rm \
--network="fitness-assistant_default" \
--env-file=".env" \
-e OPENAI_API_KEY=${OPENAI_API_KEY} \
-e DATA_PATH="data/data.csv" \
-p 5000:5000 \
fitness-assistant
When inserting logs into the database, ensure the timestamps are correct. Otherwise, they won't be displayed accurately in Grafana.
When you start the application, you will see the following in your logs:
Database timezone: Etc/UTC
Database current time (UTC): 2024-08-24 06:43:12.169624+00:00
Database current time (Europe/Berlin): 2024-08-24 08:43:12.169624+02:00
Python current time: 2024-08-24 08:43:12.170246+02:00
Inserted time (UTC): 2024-08-24 06:43:12.170246+00:00
Inserted time (Europe/Berlin): 2024-08-24 08:43:12.170246+02:00
Selected time (UTC): 2024-08-24 06:43:12.170246+00:00
Selected time (Europe/Berlin): 2024-08-24 08:43:12.170246+02:00
Make sure the time is correct.
You can change the timezone by replacing TZ in .env.
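The log lines above show the same instant rendered in two time zones. A common way to keep inserted timestamps consistent is to work with timezone-aware datetimes and convert them to UTC before storing them. The sketch below only illustrates that idea with the standard library; the function names are hypothetical and this is not the code from db.py.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def to_utc(ts):
    """Convert a timezone-aware datetime to UTC; reject naive values."""
    if ts.tzinfo is None:
        raise ValueError("naive datetime: attach a timezone before storing it")
    return ts.astimezone(timezone.utc)


def in_zone(ts, tz_name="Europe/Berlin"):
    """Render an aware datetime in the named IANA time zone."""
    return ts.astimezone(ZoneInfo(tz_name))


def now_utc():
    """Current time as an aware UTC datetime, ready to be inserted."""
    return datetime.now(timezone.utc)
```

Rejecting naive datetimes is a deliberate choice here: a value without a time zone is silently interpreted as local time, which is one way the offset between the application and Grafana can appear.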
On some systems, specifically WSL, the clock in Docker may get out of sync with the host system. You can check that by running:
docker run ubuntu date
If the time doesn't match yours, you need to sync the clock:
wsl
sudo apt install ntpdate
sudo ntpdate time.windows.com
Note that the time is in UTC.
After that, start the application (and the database) again.
When the application is running, we can start using it.
We built an interactive CLI application using questionary.
To start it, run:
pipenv run python cli.py
You can also make it randomly select a question from our ground truth dataset:
pipenv run python cli.py --random
When the application is running, you can use requests to send questions. Use test.py to try it:
pipenv run python test.py
It will pick a random question from the ground truth dataset and send it to the app.
You can also use curl for interacting with the API:
URL=http://localhost:5000
QUESTION="Is the Lat Pulldown considered a strength training activity, and if so, why?"
DATA='{
"question": "'${QUESTION}'"
}'
curl -X POST \
-H "Content-Type: application/json" \
-d "${DATA}" \
${URL}/question
You will see something like the following in the response:
{
"answer": "Yes, the Lat Pulldown is considered a strength training activity. This classification is due to it targeting specific muscle groups, specifically the Latissimus Dorsi and Biceps, which are essential for building upper body strength. The exercise utilizes a machine, allowing for controlled resistance during the pulling action, which is a hallmark of strength training.",
"conversation_id": "4e1cef04-bfd9-4a2c-9cdd-2771d8f70e4d",
"question": "Is the Lat Pulldown considered a strength training activity, and if so, why?"
}
Sending feedback:
ID="4e1cef04-bfd9-4a2c-9cdd-2771d8f70e4d"
URL=http://localhost:5000
FEEDBACK_DATA='{
"conversation_id": "'${ID}'",
"feedback": 1
}'
curl -X POST \
-H "Content-Type: application/json" \
-d "${FEEDBACK_DATA}" \
${URL}/feedback
After sending it, you'll receive the acknowledgement:
{
"message": "Feedback received for conversation 4e1cef04-bfd9-4a2c-9cdd-2771d8f70e4d: 1"
}
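The same two calls can be made from Python. The sketch below uses only the standard library and mirrors the endpoints and payload fields of the curl commands above; the helper names (`build_post`, `ask`, `send_feedback`) are hypothetical and are not part of the project's test.py.

```python
import json
from urllib import request

BASE_URL = "http://localhost:5000"


def build_post(path, payload, base_url=BASE_URL):
    """Build a JSON POST request without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return request.Request(
        base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, base_url=BASE_URL):
    """Send a question to /question and return the decoded JSON response."""
    req = build_post("/question", {"question": question}, base_url)
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def send_feedback(conversation_id, feedback, base_url=BASE_URL):
    """Send +1 or -1 feedback for a conversation to /feedback."""
    payload = {"conversation_id": conversation_id, "feedback": feedback}
    req = build_post("/feedback", payload, base_url)
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Letting `json.dumps` build the body also escapes double quotes and backslashes inside the question, which the shell string concatenation does not do.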
The code for the application is in the arsonor_assistant folder:
- app.py - the Flask API, the main entrypoint to the application
- rag.py - the main RAG logic for retrieving the data and building the prompt
- ingest.py - loading the data into the knowledge base
- minsearch.py - an in-memory search engine
- db.py - the logic for logging the requests and responses to postgres
- db_prep.py - the script for initializing the database
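We also have some code in the project root directory:
- test.py - sends a random question from the ground truth dataset to the app
- cli.py - the interactive CLI for the app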
We use Flask for serving the application as an API.
Refer to the "Using the Application" section for examples on how to interact with the application.
The ingestion script is in ingest.py.
Since we use an in-memory database, minsearch, as our knowledge base, we run the ingestion script at the startup of the application.
It's executed inside rag.py when we import it.
For experiments, we use Jupyter notebooks.
They are in the notebooks folder.
To start Jupyter, run:
cd notebooks
pipenv run jupyter notebook
We have the following notebooks:
- rag-test.ipynb: The RAG flow and evaluating the system.
- evaluation-data-generation.ipynb: Generating the ground truth dataset for retrieval evaluation.
The basic approach - using minsearch without any boosting - gave the following metrics:
- Hit rate: %
- MRR: %
The improved version (with tuned boosting):
- Hit rate: %
- MRR: %
The best boosting parameters:
boost = {
}
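For reference, hit rate and MRR are commonly computed from a list of per-query relevance flags, where each inner list marks whether the result at that rank is the expected document. The functions below sketch that usual definition; they are an illustration and not necessarily the exact code in rag-test.ipynb.

```python
def hit_rate(relevance_total):
    """Share of queries for which the expected document appears in the results."""
    if not relevance_total:
        return 0.0
    hits = sum(1 for row in relevance_total if any(row))
    return hits / len(relevance_total)


def mrr(relevance_total):
    """Mean reciprocal rank of the first relevant result (0 when there is none)."""
    if not relevance_total:
        return 0.0
    total = 0.0
    for row in relevance_total:
        for rank, is_relevant in enumerate(row, start=1):
            if is_relevant:
                total += 1 / rank
                break  # only the first relevant result counts
    return total / len(relevance_total)
```

Hit rate only asks whether the right document was retrieved at all, while MRR also rewards ranking it near the top, which is why boosting can move one metric more than the other.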
We used the LLM-as-a-Judge metric to evaluate the quality of our RAG flow.
For gpt-4o-mini, in a sample with 200 records, we had:
- (%) RELEVANT
- (%) PARTLY_RELEVANT
- (%) NON_RELEVANT
We also tested gpt-4o:
- (%) RELEVANT
- (%) PARTLY_RELEVANT
- (%) NON_RELEVANT
The difference is minimal, so we opted for gpt-4o-mini.
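The counts and percentages for each label can be derived from the list of judge verdicts. The helper below is a small sketch under the assumption that each evaluated record ends up with exactly one of the three labels; the function name `relevance_share` is hypothetical.

```python
from collections import Counter

LABELS = ("RELEVANT", "PARTLY_RELEVANT", "NON_RELEVANT")


def relevance_share(labels):
    """Return {label: (count, fraction)} for the three judge labels."""
    counts = Counter(labels)
    total = len(labels)
    return {
        label: (counts[label], counts[label] / total if total else 0.0)
        for label in LABELS
    }
```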
We use Grafana for monitoring the application.
It's accessible at localhost:3000:
- Login: "admin"
- Password: "admin"
The monitoring dashboard contains several panels:
- Last 5 Conversations (Table): Displays a table showing the five most recent conversations, including details such as the question, answer, relevance, and timestamp. This panel helps monitor recent interactions with users.
- +1/-1 (Pie Chart): A pie chart that visualizes the feedback from users, showing the count of positive (thumbs up) and negative (thumbs down) feedback received. This panel helps track user satisfaction.
- Relevancy (Gauge): A gauge chart representing the relevance of the responses provided during conversations. The chart categorizes relevance and indicates thresholds using different colors to highlight varying levels of response quality.
- OpenAI Cost (Time Series): A time series line chart depicting the cost associated with OpenAI usage over time. This panel helps monitor and analyze the expenditure linked to the AI model's usage.
- Tokens (Time Series): Another time series chart that tracks the number of tokens used in conversations over time. This helps to understand the usage patterns and the volume of data processed.
- Model Used (Bar Chart): A bar chart displaying the count of conversations based on the different models used. This panel provides insights into which AI models are most frequently used.
- Response Time (Time Series): A time series chart showing the response time of conversations over time. This panel is useful for identifying performance issues and ensuring the system's responsiveness.
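The cost panel depends on a cost value stored with each conversation. Assuming that value is derived from token counts, it could be computed along these lines. The function name and the price parameters are hypothetical placeholders: actual prices depend on the model and change over time, so they are passed in rather than hard-coded.

```python
def openai_cost(prompt_tokens, completion_tokens, price_in_per_1k, price_out_per_1k):
    """Estimate the cost of one call from token counts and per-1000-token prices."""
    return (
        prompt_tokens * price_in_per_1k + completion_tokens * price_out_per_1k
    ) / 1000
```

For example, `openai_cost(1000, 500, 0.002, 0.004)` uses made-up prices to show the arithmetic: 1000 prompt tokens and 500 completion tokens each contribute the same amount to the total.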
All Grafana configurations are in the grafana folder:
- init.py - for initializing the datasource and the dashboard.
- dashboard.json - the actual dashboard (taken from LLM Zoomcamp without changes).
To initialize the dashboard, first ensure Grafana is running (it starts automatically when you do docker-compose up).
Then run:
pipenv shell
cd grafana
# make sure the POSTGRES_HOST variable is not overwritten
env | grep POSTGRES_HOST
python init.py
Then go to localhost:3000:
- Login: "admin"
- Password: "admin"
When prompted, keep "admin" as the new password.
Here we provide background on some tech not used in the course and links for further reading.
We use Flask for creating the API interface for our application.
It's a web application framework for Python: we can easily create an endpoint for asking questions and use web clients (like curl or requests) for communicating with it.
In our case, we can send questions to http://localhost:5000/question.
For more information, visit the official Flask documentation.
Many thanks to Alexey Grigorev for creating and supervising LLM Zoomcamp, without which this project would not have been possible. I would also like to thank him for his valuable teaching and support.
I am also grateful to the entire Slack community for answering all our questions. Last but not least, thanks to my peers for reviewing this project and helping me improve it.