Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How can I add examples of questions + expected SQL over my own schema (zero-shot style)? #48

Closed
orcaman opened this issue Nov 16, 2023 · 5 comments
Labels

Comments

@orcaman
Copy link

orcaman commented Nov 16, 2023

Some of our user questions are a bit tricky - for example, Geo Spatial queries.

I would like to provide the model with zero-shot examples of user questions and the expected SQL.

I've tried adding this to the prompt (an examples section in the prompt.md). For example this is my prompt markdown:

    # Task
    Generate a SQL query to answer the following question:
    `{user_question}`

    # Database Schema
    The query will run on a database with the following schema:
    {table_metadata_string}
    
    # Examples
    ## Example 1
    Question: What place has the most cars?
    SQL: SELECT places.name, COUNT(*) as cnt FROM cars JOIN places ON ST_Contains(ST_GeometryFromText(places.wkt_polygon), ST_Point(cars.latitude, cars.longitude)) GROUP BY places.name ORDER BY cnt DESC LIMIT 1;
    
    ## Example 2
    Question: What is the sum of bookings for each month in 2023?
    SQL: SELECT SUM(total_cost) AS total_cost, EXTRACT(MONTH FROM start_date) as month FROM car_bookings WHERE YEAR(start_date) = 2023 GROUP BY EXTRACT(MONTH FROM start_date) ORDER BY month;
    
    # SQL

But when I do this, the model always returns the SQL for the first example, regardless of the user question.

Any hints on how to approach this?

Thanks!

@NPap0
Copy link

NPap0 commented Nov 27, 2023

Some of our user questions are a bit tricky - for example, Geo Spatial queries.

I would like to provide the model with zero-shot examples of user questions and the expected SQL.

I've tried adding this to the prompt (an examples section in the prompt.md). For example this is my prompt markdown:

    # Task
    Generate a SQL query to answer the following question:
    `{user_question}`

    # Database Schema
    The query will run on a database with the following schema:
    {table_metadata_string}
    
    # Examples
    ## Example 1
    Question: What place has the most cars?
    SQL: SELECT places.name, COUNT(*) as cnt FROM cars JOIN places ON ST_Contains(ST_GeometryFromText(places.wkt_polygon), ST_Point(cars.latitude, cars.longitude)) GROUP BY places.name ORDER BY cnt DESC LIMIT 1;
    
    ## Example 2
    Question: What is the sum of bookings for each month in 2023?
    SQL: SELECT SUM(total_cost) AS total_cost, EXTRACT(MONTH FROM start_date) as month FROM car_bookings WHERE YEAR(start_date) = 2023 GROUP BY EXTRACT(MONTH FROM start_date) ORDER BY month;
    
    # SQL

But when I do this, the model always returns the SQL for the first example, regardless of the user question.

Any hints on how to approach this?

Thanks!

Maybe special instructions now can help with this by providing the examples in the special instructions tab? (Talking about the browser demo application)

@buddyroo30
Copy link

Yes, I'd like to know this too. In general, it seems the open source release of their SQL generating LLM is not well supported: there really isn't much documentation for it, and they don't really answer issues that are submitted here at GitHub. I did find this which is a "Schema Cookbook" for their API (presumably a paid product): https://defog.notion.site/Cookbook-for-schema-definitions-1650a6855ea447fdb0be75d39975571b and they cover this info, but it isn't clear how the API calls could be translated into a format you would send to SQLcoder. I guess they are more focused on monetizing this, which makes sense and is their right, but they did release this open source and it would be nice if they supported its use as an open source product too.

Copy link

This issue is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale label Feb 29, 2024
Copy link

github-actions bot commented Mar 1, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

@github-actions github-actions bot closed this as completed Mar 1, 2024
@rishsriv rishsriv reopened this Mar 1, 2024
@rishsriv
Copy link
Member

rishsriv commented Mar 1, 2024

Hi there, apologies – I had totally compeletely missed this earlier.

If still relevant (or for future visitors that come in from Google): you can prompt the SQLCoder-7b-2 model with reference queries, like this:

Generate a SQL query to answer [QUESTION]{user_question}[/QUESTION]

### Instructions
{glossary}

### Database Schema
The query will run on a database with the following schema:
{table_metadata_string}

### Reference Queries
If relevant to the question asked, you can use the following queries as references for answering the question:
EXAMPLE QUESTION 1: YOUR QUESTION
EXAMPLE QUERY 1: YOUR_QUERY (ideally as a single line)

EXAMPLE QUESTION 2: YOUR QUESTION
EXAMPLE QUERY 2: YOUR_QUERY (ideally as a single line)

### Answer
Given the database schema, here is the SQL query that answers [QUESTION]{user_question}[/QUESTION]
[SQL]```

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants