# Vanna.AI

[Vanna.AI](https://vanna.ai/) is an open-source library to create and execute SQL on your database using LLMs.

It implements a RAG approach for providing ("training") and retrieving context (schema + examples + domain knowledge) to create SQL queries that can be executed in the database. It also analyzes the query output and creates figures using Plotly.


This notebook follows the tutorial named [Quickstart With Your Own Data](https://vanna.ai/docs/postgres-openai-standard-chromadb/) from Vanna's documentation.
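Before touching Vanna itself, the retrieve-then-prompt idea described above can be sketched in a few lines of plain Python. This is only a toy illustration under our own assumptions: `ToyContextStore`, `build_prompt`, the word-overlap similarity, and the example table definitions are all hypothetical, and Vanna's real implementation uses embeddings stored in a vector database instead.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyContextStore:
    """Hypothetical stand-in for a vector store: keeps texts and word counts."""

    def __init__(self):
        self.items = []

    def train(self, text):
        # "Training" here only means storing the context for later retrieval
        self.items.append((text, Counter(tokenize(text))))

    def retrieve(self, question, k=2):
        # Return the k stored texts most similar to the question
        q = Counter(tokenize(question))
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def build_prompt(store, question, k=2):
    # The retrieved context is placed in front of the question sent to the LLM
    context = "\n".join(store.retrieve(question, k))
    return f"Context:\n{context}\n\nWrite a SQL query that answers: {question}"


store = ToyContextStore()
store.train("CREATE TABLE products (product_id TEXT, product_category_name TEXT)")
store.train("CREATE TABLE customers (customer_id TEXT, customer_city TEXT)")
store.train("Popularity of a product category is measured by the number of order items sold")

question = "What are the 10 most popular product categories?"
print(build_prompt(store, question))
```

The point of the sketch is that no model weights change during "training": the stored context is simply looked up at question time and added to the prompt.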

In [1]:
import os

from vanna.openai import OpenAI_Chat
from vanna.chromadb import ChromaDB_VectorStore

Let's first define a Vanna class, instantiate it, and connect it to our database:

In [None]:
# Create a custom class as per the Vanna documentation
class MyVanna(ChromaDB_VectorStore, OpenAI_Chat):
    def __init__(self, config=None):
        ChromaDB_VectorStore.__init__(self, config=config)
        OpenAI_Chat.__init__(self, config=config)

# Initialize the Vanna class
vn = MyVanna(config={"api_key": os.getenv("OPENAI_API_KEY"), "model": "gpt-4.1-mini"}) # default temperature is 0.7

# Connect to the database
vn.connect_to_postgres(host="localhost", dbname="olist_ecommerce", user="postgres", password="postgres", port=5432)

Once connected, we can run SQL code directly:

In [None]:
vn.run_sql("SELECT 1")

So far, no "training" has happened, meaning that the model knows nothing about our database.  
If we ask a question now, it will hallucinate:

In [None]:
# Ask the LLM to generate a SQL query
vn.ask("What are the 10 most popular product categories?")

Vanna can draft a "training" plan by inspecting the database directly:

In [None]:
# The information schema query may need some tweaking depending on your database. This is a good starting point.
df_information_schema = vn.run_sql("SELECT * FROM INFORMATION_SCHEMA.COLUMNS")
display(df_information_schema)

# Filter target schemas
target_schemas = ["ecommerce", "marketing"]
df_information_schema = df_information_schema.query("table_schema in @target_schemas")

# This will break up the information schema into bite-sized chunks that can be referenced by the LLM
plan = vn.get_training_plan_generic(df_information_schema)
plan

We can then train on the plan, which creates and populates the vector store used for RAG:

In [None]:
# If you like the plan, run this cell to train
vn.train(plan=plan)

In [None]:
# At any time you can inspect what training data the package is able to reference
training_data = vn.get_training_data()
training_data
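The plan only covers the schema. The other kinds of context mentioned in the introduction (examples and domain knowledge) can be added by hand through the `ddl`, `documentation`, and `question`/`sql` arguments of `vn.train`. The sketch below wraps three such calls in a helper; the helper name, the table definition, the documentation sentence, and the example query are all hypothetical and should be replaced with content that matches your own database.

```python
def add_manual_training(vn):
    """Add hand-written context to the training store of a Vanna instance."""
    # Schema: a DDL statement (hypothetical table, adjust to your database)
    vn.train(
        ddl="CREATE TABLE ecommerce.order_items "
            "(order_id TEXT, product_id TEXT, price NUMERIC)"
    )

    # Domain knowledge: free-text documentation (hypothetical business rule)
    vn.train(
        documentation="A product category is 'popular' when many order items "
                      "belong to products of that category."
    )

    # Example: a question paired with SQL that we know is correct (hypothetical)
    vn.train(
        question="How many order items were sold?",
        sql="SELECT COUNT(*) FROM ecommerce.order_items",
    )
```

Calling `add_manual_training(vn)` and then re-running `vn.get_training_data()` should list the new entries alongside those created from the plan.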

Now, let's re-ask the earlier question:

In [None]:
# Ask the LLM to generate a SQL query
result = vn.ask("What are the 10 most popular product categories?")

In [None]:
# Output is a tuple with the generated query, the query output, and the plotly figure
generated_query, query_output, figure = result

# Visualize the outputs
print(generated_query)
display(query_output)
display(figure)
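If we would rather inspect the SQL before it touches the database, the generation and execution steps can be separated with `vn.generate_sql` and `vn.run_sql`. The helper below is a minimal sketch: `ask_step_by_step` is a hypothetical name, and the read-only check is our own coarse guard, not a Vanna feature or a security boundary.

```python
def ask_step_by_step(vn, question):
    """Generate SQL, check that it looks read-only, then run it."""
    # Step 1: let the LLM write the query using the retrieved context
    sql = vn.generate_sql(question=question)

    # Step 2: coarse guard (our assumption): only run SELECT or WITH statements
    if not sql.strip().lower().startswith(("select", "with")):
        raise ValueError(f"Refusing to run non-SELECT statement: {sql!r}")

    # Step 3: execute the query on the connected database
    df = vn.run_sql(sql)
    return sql, df
```

With the connected instance from above, `sql, df = ask_step_by_step(vn, "What are the 10 most popular product categories?")` returns the query text and its result without generating a figure.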

## Launch the User Interface

In [None]:
from vanna.flask import VannaFlaskApp
app = VannaFlaskApp(vn, allow_llm_to_see_data=True)
app.run()

## Troubleshooting

- Models like gpt-5 don't work because Vanna passes `temperature` as a parameter, which these models don't support.
- The API key must be passed in the config of the `MyVanna` class; otherwise we get the error `'MyVanna' object has no attribute 'client'`.
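
Since `os.getenv` silently returns `None` when a variable is missing, one way to catch the second problem earlier is to fail fast before building the config. This is a small sketch; `require_env` is a hypothetical helper, not part of Vanna.

```python
import os


def require_env(name):
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Usage (assumes the MyVanna class defined earlier in this notebook):
# vn = MyVanna(config={"api_key": require_env("OPENAI_API_KEY"), "model": "gpt-4.1-mini"})
```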