This project demonstrates how to build a robust SQL generation agent using the Google Gen AI Agent Development Kit (ADK) and MCP Toolbox.
It features a reflection loop where generated SQL queries are validated using BigQuery's `dry_run` capability before being executed, ensuring high reliability and safety.
This project also includes an optional Dataplex integration that enriches the agent's understanding of the data by fetching semantic context from a data catalog.
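The dry-run check itself is a standard BigQuery feature. The agent invokes it through an MCP Toolbox tool, but conceptually it is equivalent to this standalone sketch using the `google-cloud-bigquery` client (the query shown is illustrative):

```python
from google.api_core.exceptions import BadRequest
from google.cloud import bigquery

client = bigquery.Client()  # uses Application Default Credentials
# dry_run=True compiles and validates the query without running it
# (and without incurring query cost).
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)

sql = "SELECT term FROM `bigquery-public-data.google_trends.top_terms` LIMIT 5"
try:
    job = client.query(sql, job_config=job_config)
    print(f"Valid SQL; would process {job.total_bytes_processed} bytes.")
except BadRequest as exc:  # syntax and semantic errors surface here
    print(f"Dry run failed: {exc}")
```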
The agent is composed of several sub-agents orchestrated sequentially:
- **Semantic Enricher** (optional, runs when `DATAPLEX_ENABLED=true`):
  - **Term Extractor**: Extracts key business terms from the user's query.
  - **Dataplex Searcher**: Searches Dataplex for tables and metadata related to the extracted terms.
- **Schema Inspector**: Queries BigQuery `INFORMATION_SCHEMA` via MCP to understand the dataset. If Dataplex is enabled, it uses the filtered table list from the Semantic Enricher.
- **SQL Generator Loop** (`LoopAgent`):
  - **Generator**: Drafts SQL based on the user question, the schema, and optional semantic context from Dataplex.
  - **Validator**: Performs a dry run of the SQL via MCP to check for syntax and semantic errors.
  - **Reviewer**: Analyzes the dry-run result. If it fails, it provides guidance back to the Generator for the next iteration.
- **Final Responder**: Executes the validated SQL and answers the user's question with the data.
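For orientation, here is a minimal sketch of how this orchestration might look with ADK's agent classes. The names, model ID, and instructions are placeholders, not the project's actual values; the real wiring lives in `agents/sql_agent/agent.py`.

```python
from google.adk.agents import LlmAgent, LoopAgent, SequentialAgent

MODEL = "gemini-2.0-flash"  # placeholder: any ADK-supported model ID

generator = LlmAgent(
    name="generator", model=MODEL, output_key="draft_sql",
    instruction="Draft a BigQuery SQL query for the user's question.",
)
validator = LlmAgent(
    name="validator", model=MODEL, output_key="dry_run_result",
    instruction="Dry-run this SQL via the toolbox and report errors: {draft_sql}",
)
reviewer = LlmAgent(
    name="reviewer", model=MODEL, output_key="review",
    instruction="If {dry_run_result} shows a failure, explain how to fix it.",
)

# The loop re-runs generate -> validate -> review until the SQL passes
# or the iteration budget is exhausted.
sql_generator_loop = LoopAgent(
    name="sql_generator_loop",
    sub_agents=[generator, validator, reviewer],
    max_iterations=3,
)

root_agent = SequentialAgent(
    name="sql_agent",
    # semantic_enricher, schema_inspector, and final_responder omitted here
    sub_agents=[sql_generator_loop],
)
```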
- Python 3.11+
- `uv` for dependency management.
- Google Cloud SDK (`gcloud`) installed and configured.
- `mcp-toolbox` binary installed and on your system PATH.
- Terraform installed.
1. **Clone the repository**:

   ```bash
   git clone https://github.com/cwest/adk-sql-reflection-pattern.git
   cd adk-sql-reflection-pattern
   ```

2. **Install dependencies**:

   This project uses `uv` to manage dependencies. To ensure you have all packages for development and for running both the standard and Dataplex-enabled modes, install the project with all optional dependencies:

   ```bash
   uv pip install ".[dev,dataplex]"
   ```

   This will install the base requirements plus packages for testing and Dataplex integration.

3. **Authenticate with Google Cloud**: Ensure you have Application Default Credentials (ADC) set up.

   ```bash
   gcloud auth application-default login
   ```

4. **Configure Environment**: Copy `.env.example` to `.env` and set your `GOOGLE_CLOUD_PROJECT`. The other variables depend on the mode you wish to run.

   ```bash
   cp .env.example .env
   ```
This project has two distinct modes.

**Google Trends mode (default)**

This mode uses the public `bigquery-public-data.google_trends` dataset and does not require any Terraform setup.
1. **Configure Environment**: Ensure `DATAPLEX_ENABLED` is set to `false` or commented out in your `.env` file. You only need to set your project ID.

   ```bash
   GOOGLE_CLOUD_PROJECT=your-project-id
   # DATAPLEX_ENABLED=false
   ```

2. **Run the Agent**:

   ```bash
   uv run honcho start
   ```

3. **Test the Agent**: Open the ADK Web UI (usually `http://localhost:8000`) and use the sample queries from `tests/test_cases_google_trends.md`.
**Dataplex mode**

This mode demonstrates how the agent can use semantic metadata from Dataplex to answer more complex questions. It requires provisioning a sample e-commerce dataset and Dataplex resources using Terraform.
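For intuition, the Dataplex Searcher's catalog lookup is conceptually similar to a search with the `google-cloud-dataplex` client library. This is a hedged sketch, not the agent's actual code path (the agent calls a Dataplex tool exposed by MCP Toolbox), and the project ID and query term are illustrative:

```python
from google.cloud import dataplex_v1

client = dataplex_v1.CatalogServiceClient()
request = dataplex_v1.SearchEntriesRequest(
    name="projects/your-project-id/locations/global",  # search scope
    query="orders",  # a business term extracted from the user's question
)
# Each result describes a catalog entry (e.g., a BigQuery table) whose
# metadata matched the term.
for result in client.search_entries(request=request):
    print(result)
```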
1. **Configure Environment**: Update your `.env` file to enable Dataplex mode and configure the necessary resource IDs.

   ```bash
   GOOGLE_CLOUD_PROJECT=your-project-id
   DATAPLEX_ENABLED=true
   DATAPLEX_LAKE_ID=e-commerce-lake
   DATAPLEX_ZONE_ID=analytics-curated-zone
   DATAPLEX_LOCATION=us-central1
   ```
2. **Provision Infrastructure with Terraform**: The Terraform configuration will automatically:

   - Create the BigQuery datasets and tables for the e-commerce example.
   - Create the Dataplex Lake, Zone, and Assets.
   - Run a script to generate sample data and load it into BigQuery.
   - Run a script to attach semantic metadata from Dataplex to the BigQuery tables.

   First, initialize Terraform:

   ```bash
   terraform -chdir=terraform init
   ```

   Then, apply the configuration. You must provide variables for your project and the user you want to grant Dataplex admin permissions to.

   ```bash
   export GCP_PROJECT="your-project-id"
   export GCP_REGION="us-central1"
   export DATAPLEX_ADMIN="your-email@example.com"

   terraform -chdir=terraform apply -auto-approve \
     -var="project_id=$GCP_PROJECT" \
     -var="google_cloud_project=$GCP_PROJECT" \
     -var="region=$GCP_REGION" \
     -var="dataplex_admin_user=$DATAPLEX_ADMIN" \
     -var="dataplex_lake_id=e-commerce-lake" \
     -var="dataplex_zone_id=analytics-curated-zone"
   ```
3. **Run the Agent**:

   ```bash
   uv run honcho start
   ```

4. **Test the Agent**: Open the ADK Web UI and use the sample queries from `tests/test_cases_dataplex.md`. These queries are specifically designed to test the agent's use of business rules and semantic understanding from Dataplex.
In either mode, `uv run honcho start` launches two services:

- **Toolbox**: An MCP server exposing the BigQuery and Dataplex tools on port 5000.
- **ADK Web UI**: The agent interface, typically accessible at `http://localhost:8000` (check the console output for the exact URL).
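Both services are defined in the repository's `Procfile`. An illustrative version is below; the flags shown are the Toolbox and ADK defaults and may differ from the actual file:

```
toolbox: toolbox --tools-file tools.yaml --port 5000
web: uv run adk web agents --port 8000
```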
- `agents/sql_agent/`: Contains the agent implementation.
  - `agent.py`: The root `SequentialAgent` definition.
  - `semantic_enricher.py`: Optional agent for enriching queries with Dataplex context.
  - `schema_inspector.py`: Agent for retrieving the database schema.
  - `sql_generator_loop.py`: The core reflection loop (Generator, Validator, Reviewer).
  - `final_responder.py`: Agent for executing the final query and answering.
  - `prompts.py`: Detailed system instructions for SQL generation.
  - `config.py`: Shared configuration (MCP connection parameters).
- `tools.yaml`: Configuration for MCP Toolbox, defining the BigQuery and Dataplex tools.
- `Procfile`: Defines the services for `honcho`.
- `pyproject.toml`: Project dependencies.
- `terraform/`: Contains Terraform configuration files for provisioning Google Cloud resources.
  - `main.tf`: Main Terraform configuration, including provider setup.
  - `variables.tf`: Input variables for the Terraform scripts.
  - `bigquery.tf`: BigQuery dataset and table definitions.
  - `dataplex.tf`: Dataplex Lake, Zone, Asset, and Aspect Type definitions.
  - `iam.tf`: IAM policy definitions (placeholder).
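As a reference for the `tools.yaml` format, here is an abbreviated sketch of an MCP Toolbox configuration. The source and tool names are illustrative, and the repository's actual file defines more tools (including the Dataplex ones):

```yaml
sources:
  bq-source:
    kind: bigquery
    project: ${GOOGLE_CLOUD_PROJECT}  # resolved from the environment

tools:
  execute_sql:
    kind: bigquery-execute-sql
    source: bq-source
    description: Executes a BigQuery SQL statement and returns the results.
```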