# Text-to-SQL Query Generation with Amazon Bedrock and Llama Models

## Overview
This notebook demonstrates how to implement a text-to-SQL solution using Llama models through Amazon Bedrock to transform natural language questions into executable SQL queries. We'll explore prompt engineering techniques designed for text-to-SQL generation that improve query accuracy and reliability.

### Key Features
- Convert natural language questions into SQL queries using Llama models on Amazon Bedrock
- Execute queries on a SQLite database for testing and validation
- Demonstrate multiple prompt engineering techniques optimized for SQL generation
- Provide best practices for text-to-SQL applications
- Analyze performance across different prompting strategies

### Use Cases
- Interactive database querying through natural language
- Automated SQL query generation for business intelligence
- Database exploration for non-technical users
- Prototype development for enterprise database solutions

## Contents

1. [Llama Model Selection for Text-to-SQL Generation](#llama-model-selection-for-text-to-SQL-generation)
    + [Overview of Latest Llama Models in Amazon Bedrock](#overview-of-latest-llama-models-in-amazon-bedrock)
    + [Recommended Model Selection](#recommended-model-selection)
1. [Approach to Text-to-SQL Generation](#approach-to-text-to-sql-generation)
    + [LLM-Powered Query Generation](#llm-powered-query-generation)
    + [Implementation Benefits](#implementation-benefits)
1. [Objectives](#objectives)
1. [Tools](#tools)
1. [Pre-requisites](#pre-requisites)
1. [Getting Started](#getting-started)
    + [Step 0: Install Dependencies](#step-0:-install-dependencies)
    + [Step 1: Bedrock Setup](#step-1:-bedrock-setup)
    + [Step 2: Generate local database](#step-2:-generate-local-database)
    + [Step 3: Bedrock Inference](#step-3:-bedrock-inference)
        + [Analysing a Single Table with Few-Shot Learning](#analysing-a-single-table-with-few-shot-learning)
1. [Prompt Engineering Techniques](#prompt-engineering-techniques)
    + [Technique 1: Few-Shot Learning](#technique-1:-few-shot-learning)
    + [Technique 2: Chain-of-Thought Prompting](#technique-2:-chain-of-thought-prompting)
    + [Technique 3: Progressive Prompting](#technique-3:-progressive-prompting)
1. [Conclusion](#conclusion)
1. [Thank you!](#thank-you)

---

## Introduction to Text-to-SQL with Llama Models
Text-to-SQL is a powerful natural language processing application that allows users to query databases using everyday language instead of SQL syntax. This capability democratizes data access by enabling non-technical users to extract insights from databases without SQL expertise.

Amazon Bedrock provides access to powerful foundation models, including Meta's Llama models, which excel at understanding complex instructions and generating structured outputs like SQL queries. These models can interpret user intent from natural language and translate it into functional SQL.

In this notebook, we'll focus on:

- How to construct effective prompts for text-to-SQL generation
- Techniques to improve accuracy and reliability of generated queries
- Best practices for implementing text-to-SQL solutions with Amazon Bedrock

---

## Llama Models in Amazon Bedrock
Amazon Bedrock offers access to Meta's latest Llama models, which are well suited to text-to-SQL tasks:

### Available Llama Models
**Llama 4 Models (Primary Focus)**
- **Llama 4 Maverick 17B**: Optimized for complex reasoning and structured output generation
- **Llama 4 Scout 17B**: Balanced performance for general-purpose tasks

**Llama 3.3 Models**
- **Llama 3.3 70B**: Advanced reasoning capabilities for complex workloads

**Llama 3.2 Models**
- **Llama 3.2 11B**: Mid-size model with strong contextual understanding
- **Llama 3.2 1B**: Lightweight model for less complex applications

**Llama 3.1 Models**
- **Llama 3.1 70B**: High-capability model for complex reasoning tasks
- **Llama 3.1 8B**: Efficient model with good performance-to-cost ratio

## Model Selection Guidance 
- For production text-to-SQL applications: Llama 4 Maverick 17B
- For development/testing: Llama 3.1 8B or Llama 3.2 1B
- For balanced performance/cost: Llama 4 Scout 17B

In this notebook, we'll primarily use Llama 4 Scout 17B for our examples, as it provides a good balance of capability and efficiency for text-to-SQL tasks.
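
Which of these models is available depends on the Region and on the access granted to the account. As a minimal sketch, the Bedrock control-plane client's `list_foundation_models` call can be used to check. The helper name `list_meta_model_ids` is hypothetical, and filtering on the `meta.` ID prefix is an assumption about how Meta model IDs are named. The client is passed in as an argument so the function can be tried without AWS credentials.

```python
def list_meta_model_ids(bedrock_client) -> list:
    """Return the sorted IDs of Meta models reported by the Bedrock control plane."""
    response = bedrock_client.list_foundation_models()
    return sorted(
        summary["modelId"]
        for summary in response["modelSummaries"]
        # Assumption: Meta model IDs start with the "meta." prefix
        if summary["modelId"].startswith("meta.")
    )
```

With credentials configured, a call might look like `list_meta_model_ids(boto3.client("bedrock", region_name="us-west-2"))`.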

### Approach to Text-to-SQL Generation

This notebook demonstrates an efficient and scalable approach to text-to-SQL generation using Amazon Bedrock's Llama models and SQLite. Our method focuses on:

1. **Dynamic Schema Retrieval**: Utilizing SQLite's built-in functionality to efficiently extract database schema information.
2. **LLM-Powered Query Generation**: Leveraging Llama models to translate natural language into SQL queries.
3. **Single-Table and Multi-Table Support**: Handling both simple and complex queries involving multiple tables.

This approach offers several advantages:

- Efficiency: Direct schema extraction without additional dependencies.
- Accuracy: Up-to-date schema information reflecting the current database state.
- Scalability: Easily adaptable to larger databases or different SQL dialects.

#### LLM-Powered Query Generation
We use Llama models from Amazon Bedrock to generate SQL queries based on:

- The retrieved schema information
- The user's natural language query
- Few-shot examples for context

This method allows for:

- Contextual Understanding: The model considers both the schema and the query intent.
- Adaptability: Easily adapted to specific domains or query patterns by changing the prompt or its examples.
- Complex Query Support: Capability to generate multi-table joins and nested queries.

#### Implementation Benefits
- Simplified Architecture: Eliminates the need for external vector databases or complex retrieval systems.
- Improved Maintainability: Direct integration with the database keeps the system up-to-date with schema changes.
- Enhanced Performance: Reduced latency by minimizing external API calls and data transfers.
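
To make the dynamic schema retrieval step concrete, here is a minimal sketch that reads the `CREATE TABLE` statements straight from SQLite's built-in `sqlite_master` catalog using only the standard library. The notebook itself relies on LangChain's `SQLDatabase` for this. The function name `get_sqlite_schema` and the trimmed-down demo tables are illustrative assumptions.

```python
import sqlite3

def get_sqlite_schema(conn: sqlite3.Connection) -> str:
    """Return the CREATE TABLE statements stored in SQLite's catalog."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    ).fetchall()
    return "\n\n".join(sql for (sql,) in rows if sql)

# Demo on an in-memory database with simplified versions of the two tables
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE airplanes (Airplane_id INTEGER PRIMARY KEY, Producer TEXT)")
conn.execute("CREATE TABLE flights (Flight_number TEXT PRIMARY KEY, Airplane_id INTEGER)")
schema_text = get_sqlite_schema(conn)
print(schema_text)
```

Because the catalog is queried at call time, the returned text reflects the current state of the database, which is the property the approach above depends on.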

---

## Environment Setup
### Install Dependencies
We'll start by installing the required libraries for this notebook:

In [1]:
%pip install boto3 numpy langchain langchain-community --upgrade --quiet

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
sagemaker 2.232.0 requires numpy<2.0,>=1.9.0, but you have numpy 2.2.6 which is incompatible.
Note: you may need to restart the kernel to use updated packages.


### Import Required Libraries

In [1]:
import boto3
import json
import os
import botocore
from botocore.config import Config
from langchain_community.utilities import SQLDatabase
from json_to_sqlite import JsonToSqlite


### Set Up the Amazon Bedrock Client
Now we'll configure the Amazon Bedrock clients with explicit retry, timeout, and connection-pool settings:

In [2]:
# Set default configurations
config = Config(
    retries={
        'max_attempts': 3,   # Up to 3 retries after the initial request
        'mode': 'adaptive',  # Adaptive retry mode with client-side rate limiting
    },
    connect_timeout=5,        # Lower the connection timeout from the 60s default
    read_timeout=30,          # Lower the read timeout from the 60s default
    max_pool_connections=50,  # Raise the pool size from the default of 10
    tcp_keepalive=True        # Enable TCP keepalive
)

# Setup Bedrock Client
bedrock_client = boto3.client('bedrock', region_name='us-west-2',
    config=config)
# Setup Bedrock Runtime
brt = boto3.client(service_name='bedrock-runtime',region_name='us-west-2',
    config=config)




### Database Preparation
#### Create SQLite Database
We'll use the provided JsonToSqlite class to create and populate our database:

In [3]:
# First, let's drop the existing database and recreate it
if os.path.exists('flights.db'):
    os.remove('flights.db')

# Initialize converter
converter = JsonToSqlite('flights.db')

# Process both JSON files
success = converter.process_json_files(
    flights_json='sample_data/flights.json',
    airplanes_json='sample_data/airplanes.json'
)

if success:
    print("Data successfully imported to SQLite database")
else:
    print("Error occurred during import")

Data successfully imported to SQLite database


#### Database Schema Analysis
Now, let's connect to the database and extract the schema to understand its structure:

In [4]:

# Initialize SQLDatabase from langchain for easy schema extraction
db = SQLDatabase.from_uri("sqlite:///flights.db", sample_rows_in_table_info=0)

# Function to get schema information
def get_schema():
    return db.get_table_info()

# Display the schema
db_schema = f"""
Schema:
{get_schema()}
"""
print(db_schema)


Schema:

CREATE TABLE airplanes (
	"Airplane_id" INTEGER, 
	"Producer" TEXT, 
	"Type" TEXT, 
	PRIMARY KEY ("Airplane_id")
)


CREATE TABLE flights (
	"Flight_number" TEXT, 
	"Arrival_time" DATETIME, 
	"Arrival_date" DATE, 
	"Departure_time" DATETIME, 
	"Departure_date" DATE, 
	"Destination" TEXT, 
	"Airplane_id" INTEGER, 
	PRIMARY KEY ("Flight_number"), 
	FOREIGN KEY("Airplane_id") REFERENCES airplanes ("Airplane_id")
)



#### Create Query Helper Function
We'll create a helper function to nicely format and display query results:

In [5]:
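import ast

def pretty_print_query(db, query: str):
    """Execute a SQL query and print the results in a formatted table"""
    try:
        # Get the raw result
        result = db.run(query)
        
        # Convert string representation of list to actual list of tuples
        if isinstance(result, str):
            # An empty string means the query returned no rows
            if not result.strip():
                print("No results found")
                return
            # literal_eval only parses Python literals, so it cannot run arbitrary code
            result = ast.literal_eval(result)
            if not isinstance(result, (list, tuple)):
                print("Invalid result format")
                return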
        
        # Handle empty results
        if not result:
            print("No results found")
            return

        # Ensure we have a list of tuples
        if isinstance(result[0], (str, int, float)):
            result = [result]

        # Get number of columns from first result
        num_cols = len(result[0])
        columns = [f'Column_{i}' for i in range(num_cols)]
        
        # Calculate column widths
        widths = [len(str(col)) for col in columns]
        for row in result:
            for i, val in enumerate(row):
                widths[i] = max(widths[i], len(str(val)))
        
        # Create format template
        row_format = "| " + " | ".join(["{:<" + str(width) + "}" for width in widths]) + " |"
        separator = "+" + "+".join(["-" * (width + 2) for width in widths]) + "+"
        
        # Print table
        print(separator)
        print(row_format.format(*columns))
        print(separator)
        for row in result:
            print(row_format.format(*[str(val) for val in row]))
        print(separator)
    
    except Exception as e:
        print(f"Error executing query: {str(e)}")
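
The helper above labels columns `Column_0`, `Column_1`, and so on, because the stringified result carries no header information. If real column names are wanted, one option is to query through the standard `sqlite3` module, whose cursors expose names via `cursor.description`. This is a sketch under that assumption, not a change to the notebook's helper. `run_with_headers` and the demo table are hypothetical.

```python
import sqlite3

def run_with_headers(conn: sqlite3.Connection, query: str):
    """Execute a query and return (column_names, rows)."""
    cursor = conn.execute(query)
    # description is None for statements that return no columns
    columns = [entry[0] for entry in (cursor.description or [])]
    return columns, cursor.fetchall()

# Demo on an in-memory table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE airplanes (Airplane_id INTEGER PRIMARY KEY, Producer TEXT)")
conn.execute("INSERT INTO airplanes VALUES (1, 'Boeing'), (2, 'Airbus')")
columns, rows = run_with_headers(
    conn,
    "SELECT Producer, COUNT(*) AS n FROM airplanes GROUP BY Producer ORDER BY Producer",
)
print(columns, rows)
```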



---

## Basic Text-to-SQL Generation

Now that we have created the local database and can extract its SQLite schema, we need to set up the calls to the Bedrock API.

#### Analyzing a Single Table
First, we create a prompt containing two parts:

1. `table_schema`. This is a description of the structure of the database, including the name of each table, the names of its columns, and the data type of each column. This information helps the model understand how the data is organized.

2. `question`. This is the specific request or information that the user wants to obtain from the database.

By including both the schema and the user's question in the prompt, we give the model what it needs to understand the table structure and the desired output.

In [6]:
# Bedrock cross-region inference profile ID
model_id='us.meta.llama4-scout-17b-instruct-v1:0'

#### Create Model Invocation Function
First, let's create a helper function to invoke the Bedrock model:

In [7]:

def invoke_model(body, model_id, accept, content_type):
    try:
        response = brt.invoke_model(
            body=json.dumps(body), 
            modelId=model_id, 
            accept=accept, 
            contentType=content_type
        )

        return response

    except Exception as e:
        print(f"Couldn't invoke {model_id}")
        raise e
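
As an alternative to `invoke_model` with a hand-built prompt string, the Bedrock Runtime client also has a `converse` method that takes structured messages, so the special `<|begin_of_text|>` and header tokens do not have to be written by hand. The following is a sketch only, assuming the selected model is enabled for the Converse API. `invoke_with_converse` is a hypothetical helper, and the inference settings mirror the values used elsewhere in this notebook.

```python
def invoke_with_converse(client, model_id: str, user_text: str) -> str:
    """Send one user message through the Converse API and return the reply text."""
    response = client.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": user_text}]}],
        inferenceConfig={"temperature": 0.1, "topP": 0.9, "maxTokens": 512},
    )
    # The assistant reply is a list of content blocks; take the first text block
    return response["output"]["message"]["content"][0]["text"].strip()
```

A call would pass the runtime client and model ID defined earlier, for example `invoke_with_converse(brt, model_id, "Return only a SQL query that ...")`.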

#### Simple Query Generation
Let's create a basic function to generate SQL from natural language:

In [8]:
# Create the prompt template for basic SQL generation
def create_sql_prompt(question: str, schema: str) -> str:
    return f"""<|begin_of_text|><|start_header_id|>user<|end_header_id|>

You are a SQL expert. Based on the following database schema, write a SQL query that would answer the question. Return only the SQL query, no explanations.

Database Schema:
{schema}

Question: {question}

<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""


def create_and_execute_sql_query(prompt: str, db: SQLDatabase, model_id: str) -> None:
    """Send the prompt to the model, extract the SQL query from the reply, and execute it"""
    # Build the request body for the Llama model
    body = {
        "prompt": prompt,
        "temperature": 0.1,
        "top_p": 0.9,
        "max_gen_len": 512
    }
    
    # Get SQL query from Llama model
    response = invoke_model(
        body=body,
        model_id=model_id,
        accept='application/json',
        content_type='application/json'
    )
    
    # Parse the response and extract the SQL query
    response_json = json.loads(response['body'].read())
    full_response = response_json['generation'].strip()
    
    # Clean up the response - extract only valid SQL
    # First remove code blocks markers if present
    sql_query = full_response.replace('```sql', '').replace('```', '')
    
    # Handle multi-line statements and comments
    sql_lines = sql_query.split('\n')
    cleaned_lines = []
    for line in sql_lines:
        # Skip comment lines
        if line.strip().startswith('--'):
            continue
        # Keep only code lines
        if line.strip():
            cleaned_lines.append(line)
    
    # If the output ends with ';', keep the last statement; otherwise fall back to the first one
    sql_blocks = ' '.join(cleaned_lines).split(';')
    final_sql = sql_blocks[-2] if len(sql_blocks) > 1 and sql_blocks[-1].strip() == '' else sql_blocks[0]
    final_sql = final_sql.strip() + ';'
    
    # Print the processed SQL query
    print("\nGenerated SQL Query:")
    print("-" * 40)
    print(final_sql)
    print("-" * 40)
    
    # Execute and display the results
    try:
        pretty_print_query(db, final_sql)
    except Exception as e:
        print(f"Error executing query: {str(e)}")
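
The clean-up logic inside the function above is easier to test when it is pulled out on its own. The sketch below is one possible standalone version, not the notebook's exact behavior: it prefers the contents of a fenced code block, drops full-line `--` comments, and keeps the first statement rather than the last. `extract_sql` is a hypothetical name, and splitting on `;` is a simplification that would break on semicolons inside string literals.

```python
import re

FENCE = "`" * 3  # code-fence marker, built this way so it does not appear literally here
_FENCED_BLOCK = re.compile(FENCE + r"(?:sql)?\s*(.*?)" + FENCE, re.DOTALL | re.IGNORECASE)

def extract_sql(model_output: str) -> str:
    """Pull a single SQL statement out of a model response."""
    # Prefer the contents of a fenced code block if one is present
    match = _FENCED_BLOCK.search(model_output)
    text = match.group(1) if match else model_output
    # Drop blank lines and full-line SQL comments
    lines = [
        line for line in text.splitlines()
        if line.strip() and not line.strip().startswith("--")
    ]
    # Keep the first statement only and collapse runs of whitespace
    statement = " ".join(" ".join(lines).split(";")[0].split())
    return statement + ";" if statement else ""
```

Collapsing whitespace also tidies queries the model spreads over several indented lines.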

#### Query Execution and Validation
Let's test our basic implementation with a simple query:

In [9]:
# Example usage
question = "What are the different airplane producers represented in the database?"
prompt=create_sql_prompt(question, get_schema())
create_and_execute_sql_query(prompt, db,model_id=model_id)

# Try another example
question = "How many flights are scheduled for each destination?"
prompt=create_sql_prompt(question, get_schema())
create_and_execute_sql_query(prompt, db,model_id=model_id)


Generated SQL Query:
----------------------------------------
SELECT DISTINCT Producer FROM airplanes;
----------------------------------------
+------------+
| Column_0   |
+------------+
| Boeing     |
| Airbus     |
| Embraer    |
| Bombardier |
+------------+

Generated SQL Query:
----------------------------------------
SELECT      "Destination",      COUNT("Flight_number") AS "Number_of_flights" FROM      flights GROUP BY      "Destination";
----------------------------------------
+----------------+----------+
| Column_0       | Column_1 |
+----------------+----------+
| Atlanta        | 1        |
| Boston         | 1        |
| Chicago        | 1        |
| Dallas         | 1        |
| Denver         | 1        |
| Detroit        | 1        |
| Houston        | 1        |
| Las Vegas      | 1        |
| Los Angeles    | 1        |
| Miami          | 1        |
| Minneapolis    | 1        |
| New York       | 1        |
| Orlando        | 1        |
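| Philadelphia   | 1        |
| Phoenix        | 1        |
| Salt Lake City | 1        |
| San Diego      | 1        |
| San Francisco  | 1        |
| Seattle        | 1        |
| Tampa          | 1        |
+----------------+----------+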

---

## Prompt Engineering Techniques
Now we'll explore various prompt engineering techniques to improve text-to-SQL generation quality.

### Technique 1: Few-Shot Learning
Few-shot learning improves performance by providing examples of how to convert natural language to SQL queries:

In [10]:
def create_few_shot_sql_prompt(question: str, schema: str) -> str:
    return f"""<|begin_of_text|><|start_header_id|>user<|end_header_id|>

You are a SQL expert. Use these examples to understand how to convert questions to SQL queries. Return only the SQL query, no explanations:

Example 1:
Question: How many flights does each airline operate?
SQL: SELECT substr(Flight_number, 1, 2) AS Airline, COUNT(*) AS flight_count
FROM flights
GROUP BY substr(Flight_number, 1, 2);

Example 2:
Question: List all destinations served by Boeing aircraft ordered by frequency.
SQL: SELECT f.Destination, COUNT(*) as frequency
FROM flights f
JOIN airplanes a ON f.Airplane_id = a.Airplane_id
WHERE a.Producer = 'Boeing'
GROUP BY f.Destination
ORDER BY frequency DESC;

Example 3:
Question: Which aircraft types have never been used for flights to Chicago?
SQL: SELECT a.Type
FROM airplanes a
WHERE a.Airplane_id NOT IN (
    SELECT Airplane_id 
    FROM flights 
    WHERE Destination = 'Chicago'
);

Database Schema:
{schema}

Question: {question}
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""
question="What is the total number of flights per destination?"
prompt= create_few_shot_sql_prompt(question,get_schema())
create_and_execute_sql_query(prompt, db,model_id)


Generated SQL Query:
----------------------------------------
SELECT Destination, COUNT(*) as total_flights  FROM flights  GROUP BY Destination;
----------------------------------------
+----------------+----------+
| Column_0       | Column_1 |
+----------------+----------+
| Atlanta        | 1        |
| Boston         | 1        |
| Chicago        | 1        |
| Dallas         | 1        |
| Denver         | 1        |
| Detroit        | 1        |
| Houston        | 1        |
| Las Vegas      | 1        |
| Los Angeles    | 1        |
| Miami          | 1        |
| Minneapolis    | 1        |
| New York       | 1        |
| Orlando        | 1        |
| Philadelphia   | 1        |
| Phoenix        | 1        |
| Salt Lake City | 1        |
| San Diego      | 1        |
| San Francisco  | 1        |
| Seattle        | 1        |
| Tampa          | 1        |
+----------------+----------+


**Benefits of Few-Shot Learning for Text-to-SQL**
1. Pattern Recognition: Model learns from similar examples
2. Query Structure: Examples demonstrate proper SQL syntax and structure
3. Diverse Solutions: Multiple examples show different query patterns
4. Context Understanding: Examples help map natural language to SQL concepts
5. Error Prevention: Examples demonstrate proper handling of joins, conditions, and aggregations
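
When the set of examples grows or changes per domain, it can help to keep them as data and render them into the prompt, rather than hard-coding them in the template. The following is a minimal sketch of that idea. `FEW_SHOT_EXAMPLES` and `format_few_shot_examples` are hypothetical names, and the two pairs are taken from the prompt above.

```python
# (question, SQL) pairs kept as data so they can be swapped per domain
FEW_SHOT_EXAMPLES = [
    (
        "List all destinations served by Boeing aircraft ordered by frequency.",
        "SELECT f.Destination, COUNT(*) AS frequency FROM flights f "
        "JOIN airplanes a ON f.Airplane_id = a.Airplane_id "
        "WHERE a.Producer = 'Boeing' GROUP BY f.Destination ORDER BY frequency DESC;",
    ),
    (
        "Which aircraft types have never been used for flights to Chicago?",
        "SELECT a.Type FROM airplanes a WHERE a.Airplane_id NOT IN "
        "(SELECT Airplane_id FROM flights WHERE Destination = 'Chicago');",
    ),
]

def format_few_shot_examples(examples) -> str:
    """Render (question, sql) pairs in the 'Example N' layout used by the prompt."""
    blocks = [
        f"Example {number}:\nQuestion: {question}\nSQL: {sql}"
        for number, (question, sql) in enumerate(examples, start=1)
    ]
    return "\n\n".join(blocks)

examples_text = format_few_shot_examples(FEW_SHOT_EXAMPLES)
print(examples_text)
```

The rendered text could then be placed into the prompt template where the hard-coded examples currently sit.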

### Technique 2: Chain-of-Thought Prompting
Chain-of-Thought prompting breaks down the query generation process into logical steps:

In [11]:
def create_cot_sql_prompt(question: str, schema: str) -> str:
    return f"""<|begin_of_text|><|start_header_id|>user<|end_header_id|>

You are a SQL expert. Follow this reasoning chain to create the SQL query:

1. IDENTIFY REQUIREMENTS
- Understand what data we need from the question
- Map requirements to available tables and columns
- IMPORTANT: Only use columns that exist in the schema

2. DETERMINE RELATIONSHIPS
- Identify which tables contain the required data
- Establish necessary table relationships
- Plan required JOIN operations

3. HANDLE MISSING CONCEPTS
- If the question asks for data not directly in schema, determine how to calculate it
- Use available columns to derive needed information
- For time comparisons, use SQLite date/time functions like strftime()

4. SPECIFY CONDITIONS
- Define WHERE clauses for filtering
- Determine if aggregations are needed
- Plan GROUP BY requirements

5. FINALIZE STRUCTURE
- Organize the SELECT statement
- Order results if needed
- Ensure proper syntax
- Verify all referenced columns exist in the schema

Database Schema:
{schema}

Question: {question}

Generate ONLY the executable SQL query based on this reasoning chain, without explanations.
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""

**Example Implementation**

In [12]:
# Example with reasoning steps
question = "Find all flights where Boeing airplanes arrived later than their scheduled arrival time in June 2023."
prompt= create_cot_sql_prompt(question,get_schema())
create_and_execute_sql_query(prompt, db,model_id)


Generated SQL Query:
----------------------------------------
SELECT f.Flight_number, f.Arrival_time, f.Arrival_date FROM flights f JOIN airplanes a ON f.Airplane_id = a.Airplane_id WHERE a.Producer = 'Boeing' AND f.Arrival_date BETWEEN '2023-06-01' AND '2023-06-30' AND f.Arrival_time > f.Departure_time;
----------------------------------------
+----------+---------------------+------------+
| Column_0 | Column_1            | Column_2   |
+----------+---------------------+------------+
| AA123    | 2023-06-15T10:30:00 | 2023-06-15 |
| UA567    | 2023-06-23T09:15:00 | 2023-06-23 |
| AA890    | 2023-06-24T18:40:00 | 2023-06-24 |
+----------+---------------------+------------+


**Benefits of CoT for Text-to-SQL**

1. **Better Understanding**: Breaking down the problem helps the model understand complex query requirements
2. **Improved Accuracy**: Step-by-step reasoning reduces errors in query construction
3. **Maintainable Structure**: Generated queries follow a logical, organized pattern
4. **Debugging Support**: Each step can be verified independently
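5. **Complex Query Handling**: Better handles multi-table joins and nested queries

This approach can improve the quality of generated SQL queries, especially for complex database operations that involve multiple tables and conditions. Generated queries still need review, though: the schema has no scheduled-arrival column, so in the example above the model fell back to comparing `Arrival_time` with `Departure_time`, which does not identify late arrivals.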

### Technique 3: Progressive Prompting
Progressive prompting builds SQL queries incrementally by breaking down the process into clear, sequential steps:

In [13]:
def create_progressive_sql_prompt(question: str, schema: str) -> str:
    return f"""<|begin_of_text|><|start_header_id|>user<|end_header_id|>

You are a SQL expert. Generate a query for SQLite (without using WITH clause) following these steps:

1. BASE QUERY
Start with basic SELECT and table structures:
- Use simple JOIN syntax
- Avoid Common Table Expressions (WITH clause)
- Use direct calculations in SELECT

2. JOIN OPERATIONS
Add necessary table joins:
- Use INNER JOIN syntax
- Keep joins simple and direct

3. FILTERING CONDITIONS
Apply WHERE clauses:
- Use proper date formatting
- Handle time calculations inline

4. AGGREGATION LOGIC
Include grouping and aggregation:
- Use SQLite's date/time functions
- Calculate averages directly

5. RESULT ORGANIZATION
Finalize with ordering:
- Sort by multiple columns
- Apply proper GROUP BY

Database Schema:
{schema}

Question: {question}

Return ONLY the SQL query with no explanations or introduction text.
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""

**Example Implementation**

In [14]:
# Example with progressive prompting
question = """Show me the top 5 destinations with the most flights by Boeing aircraft, 
including the number of flights for each destination."""
prompt= create_progressive_sql_prompt(question,get_schema())
create_and_execute_sql_query(prompt, db,model_id)


Generated SQL Query:
----------------------------------------
SELECT      f.Destination,      COUNT(f.Flight_number) as Number_of_Flights FROM      flights f INNER JOIN      airplanes a ON f.Airplane_id = a.Airplane_id WHERE      a.Producer = 'Boeing' GROUP BY      f.Destination ORDER BY      Number_of_Flights DESC,      f.Destination LIMIT 5;
----------------------------------------
+-------------+----------+
| Column_0    | Column_1 |
+-------------+----------+
| Atlanta     | 1        |
| Houston     | 1        |
| Los Angeles | 1        |
| Minneapolis | 1        |
+-------------+----------+


**Benefits of Progressive Prompting**
1. **Clear Structure**: Each step builds on the previous one
2. **Error Prevention**: Easier to catch issues at each stage
3. **Complexity Management**: Breaks down complex queries into manageable parts
4. **Learning Process**: Shows how queries evolve from simple to complex
5. **Maintainability**: Easier to modify or debug queries
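
The prompt above asks the model for one final query, so the intermediate stages are never seen. If the stages were generated as separate queries instead (an assumption, not what this notebook does), each one could be checked before moving on. The sketch below uses SQLite's `EXPLAIN` prefix, which compiles a statement and so surfaces syntax errors and unknown tables or columns without running the query itself. `first_invalid_stage` and the demo schema are hypothetical.

```python
import sqlite3

def first_invalid_stage(conn: sqlite3.Connection, stages):
    """Return (index, error message) for the first stage SQLite cannot compile, else None."""
    for index, sql in enumerate(stages):
        try:
            # EXPLAIN compiles the statement and returns its bytecode listing
            conn.execute("EXPLAIN " + sql)
        except sqlite3.Error as exc:
            return index, str(exc)
    return None

# Demo: the third stage references a column that does not exist
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE airplanes (Airplane_id INTEGER PRIMARY KEY, Producer TEXT, Type TEXT);
CREATE TABLE flights (Flight_number TEXT PRIMARY KEY, Destination TEXT, Airplane_id INTEGER);
""")
stages = [
    "SELECT f.Destination FROM flights f",
    "SELECT f.Destination FROM flights f "
    "INNER JOIN airplanes a ON f.Airplane_id = a.Airplane_id",
    "SELECT f.Destination, COUNT(*) FROM flights f "
    "INNER JOIN airplanes a ON f.Airplane_id = a.Airplane_id "
    "WHERE a.Maker = 'Boeing' GROUP BY f.Destination",
]
problem = first_invalid_stage(conn, stages)
print(problem)
```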

### Testing Our Techniques
Now let's test all our techniques with a moderately complex query:

In [15]:
# Test question for comparison
complex_question = """Show me the number of flights for each airplane producer, sorted from highest to lowest count"""

print("\n=== BASIC APPROACH ===")
prompt=create_sql_prompt(complex_question, get_schema())
create_and_execute_sql_query(prompt, db,model_id)

print("\n=== FEW-SHOT LEARNING ===")
prompt=create_few_shot_sql_prompt(complex_question, get_schema())
create_and_execute_sql_query(prompt, db,model_id)

print("\n=== CHAIN-OF-THOUGHT ===")
prompt=create_cot_sql_prompt(complex_question, get_schema())
create_and_execute_sql_query(prompt, db,model_id)

print("\n=== PROGRESSIVE PROMPTING ===")
prompt=create_progressive_sql_prompt(complex_question, get_schema())
create_and_execute_sql_query(prompt, db,model_id)




=== BASIC APPROACH ===

Generated SQL Query:
----------------------------------------
SELECT T2.Producer, COUNT(T1.Flight_number) FROM flights AS T1 INNER JOIN airplanes AS T2 ON T1.Airplane_id = T2.Airplane_id GROUP BY T2.Producer ORDER BY COUNT(T1.Flight_number) DESC;
----------------------------------------
+------------+----------+
| Column_0   | Column_1 |
+------------+----------+
| Airbus     | 8        |
| Embraer    | 5        |
| Boeing     | 4        |
| Bombardier | 3        |
+------------+----------+

=== FEW-SHOT LEARNING ===

Generated SQL Query:
----------------------------------------
SELECT T1.Producer, COUNT(T2.Flight_number) as flight_count  FROM airplanes T1  JOIN flights T2 ON T1.Airplane_id = T2.Airplane_id  GROUP BY T1.Producer  ORDER BY flight_count DESC;
----------------------------------------
+------------+----------+
| Column_0   | Column_1 |
+------------+----------+
| Airbus     | 8        |
| Embraer    | 5        |
| Boeing     | 4        |
| Bombardier | 3        |
+------------+----------+
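
Comparing the printed tables by eye works for four techniques and one question, but it does not scale. One way to automate the comparison is to run the generated queries and compare their result sets. This is a sketch, assuming direct `sqlite3` access to the database. `same_result` is a hypothetical helper, and the in-memory data is a toy stand-in for `flights.db`.

```python
import sqlite3

def same_result(conn: sqlite3.Connection, sql_a: str, sql_b: str, ordered: bool = False) -> bool:
    """Return True when both queries produce the same rows."""
    rows_a = conn.execute(sql_a).fetchall()
    rows_b = conn.execute(sql_b).fetchall()
    if not ordered:
        # Sort on repr so rows containing NULLs can still be ordered
        rows_a, rows_b = sorted(rows_a, key=repr), sorted(rows_b, key=repr)
    return rows_a == rows_b

# Toy data standing in for the real database
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE airplanes (Airplane_id INTEGER PRIMARY KEY, Producer TEXT);
CREATE TABLE flights (Flight_number TEXT PRIMARY KEY, Airplane_id INTEGER);
INSERT INTO airplanes VALUES (1, 'Boeing'), (2, 'Airbus');
INSERT INTO flights VALUES ('AA1', 1), ('AA2', 2), ('AA3', 2);
""")
basic = (
    "SELECT T2.Producer, COUNT(T1.Flight_number) FROM flights AS T1 "
    "INNER JOIN airplanes AS T2 ON T1.Airplane_id = T2.Airplane_id "
    "GROUP BY T2.Producer ORDER BY COUNT(T1.Flight_number) DESC;"
)
few_shot = (
    "SELECT T1.Producer, COUNT(T2.Flight_number) as flight_count FROM airplanes T1 "
    "JOIN flights T2 ON T1.Airplane_id = T2.Airplane_id "
    "GROUP BY T1.Producer ORDER BY flight_count DESC;"
)
queries_agree = same_result(conn, basic, few_shot, ordered=True)
print(queries_agree)
```

Pass `ordered=True` only when the question asks for a specific sort order, as it does here.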

## Best Practices for Text-to-SQL Prompting

### Schema Information
- Include complete schema information including table relationships
- Specify data types to help model understand column constraints
- Consider providing sample data for complex schemas

### Prompt Engineering
- Use consistent SQL style across examples (formatting, capitalization)
- Specify SQL dialect explicitly (SQLite, MySQL, PostgreSQL)
- Include join conditions and constraints in examples
- Specify output format requirements (column names, sorting)

### Error Handling
- Implement query validation before execution
- Handle NULL values and edge cases in your prompts
- Consider post-processing LLM outputs to correct common errors

### Performance Considerations
- Guide the model to avoid SELECT * for production use
- Encourage appropriate indexing hints where beneficial
- For complex queries, use progressive generation with intermediate validation

### Security
- Implement strict validation of generated SQL before execution
- Never directly execute user inputs without validation
- Use parameterized queries when implementing in production

## Conclusion
Throughout this notebook, we've explored several advanced prompt engineering techniques for text-to-SQL generation using Llama models on Amazon Bedrock:

1. **Basic Prompting** provides a solid foundation but may struggle with complex queries
2. **Few-Shot Learning** significantly improves query accuracy by providing relevant examples
3. **Chain-of-Thought Prompting** helps with complex reasoning by breaking down the query generation process
4. **Progressive Prompting** builds queries step-by-step for better structure and organization

Query validation, listed under the best practices above, is not a prompting technique in itself, but it remains necessary before generated SQL is run in production.

These examples suggest that the choice of prompting technique should be guided by:
- Query complexity requirements
- Performance and latency constraints
- Accuracy requirements
- Database complexity

For production applications, a combination of techniques often yields the best results:
- Few-shot examples for common query patterns
- Progressive building for complex queries
- Validation steps for all generated queries

Amazon Bedrock combined with Meta's Llama models provides a powerful and flexible solution for text-to-SQL applications, offering significant potential to democratize database access across organizations.

## Thank you!