This repository provides an advanced, AI agent-powered T-SQL assistant for Microsoft SQL Server 2019, leveraging LangChain, LangGraph, Azure OpenAI, and Python data visualization. The assistant can interpret natural language business intelligence (BI) questions, generate optimized SQL queries, execute them, explain results, and produce charts when possible.
- Natural Language to SQL: Converts business questions into correct, efficient T-SQL queries for SQL Server 2019.
- Schema-aware: Uses schema metadata to ensure queries only reference valid tables/columns.
- Error Handling & Retries: Attempts to correct failed queries using error feedback.
- Result Explanation: Provides clear, human-readable answers based on query results.
- Automated Chart Generation: Attempts to visualize results using Matplotlib and Pandas, with chart type chosen via LLM.
- Workflow Orchestration: Modular agent workflow powered by LangGraph.
The core logic resides in app.ipynb, which orchestrates the following workflow:
- Environment Setup
  - Loads connection strings and Azure OpenAI credentials from environment variables.
  - Establishes SQL Server connection using SQLAlchemy.
- Schema Extraction
  - Retrieves schema metadata from [ExecutiveDashboard].[dbo].[schema_metadata].
  - Formats schema info for LLM prompting.
- Agent State Modeling
  - Defines AgentState (using Pydantic) to track question, schema, SQL code, result, answers, chart code, and more.
- LangGraph Workflow Nodes
  - Relevance Check: Determines if the question relates to warehouse data.
  - Schema Fetch: Loads schema info into state.
  - SQL Query Generation: Uses LLM to translate the question + schema into a T-SQL query (with NOLOCK hints).
  - Query Execution: Runs SQL and records result/errors.
  - Validation: Checks for execution errors.
  - Retry: If errors, prompts the LLM for corrections (up to MAX_RETRIES).
  - Result Explanation: Uses LLM to summarize the result.
  - Visualization: If possible, LLM generates Python code for a chart and the notebook executes it, embedding the image.
- Orchestration
  - State transitions are managed via LangGraph's StateGraph, with conditional edges for retries and error handling.
- Usage Example
  - Example queries show how to run the assistant and display answers, SQL, and charts.
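To make the schema-formatting and state-modeling steps above more concrete, here is a minimal sketch. It uses a standard-library dataclass as a stand-in for the Pydantic `AgentState`, and it assumes each schema row is a `(table_name, column_name, data_type)` tuple. The field names, the row layout, the `format_schema` helper, and the sample rows are all illustrative assumptions; the notebook's actual model and the columns of `schema_metadata` may differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

# Hypothetical row layout for schema_metadata: (table_name, column_name, data_type).
SchemaRow = tuple[str, str, str]


@dataclass
class AgentState:
    """Stand-in for the notebook's Pydantic model; field names are illustrative."""
    question: str
    schema: str = ""
    sql_code: str = ""
    result: Optional[list] = None
    error: Optional[str] = None
    answer: str = ""
    chart_code: str = ""
    retries: int = 0


def format_schema(rows: list[SchemaRow]) -> str:
    """Group columns by table and render one line per table for the LLM prompt."""
    tables: dict[str, list[str]] = defaultdict(list)
    for table, column, dtype in rows:
        tables[table].append(f"{column} ({dtype})")
    return "\n".join(f"{t}: {', '.join(cols)}" for t, cols in tables.items())


# Made-up sample rows, not taken from the real database.
rows = [
    ("dbo.Sales", "DistributorId", "int"),
    ("dbo.Sales", "Amount", "decimal"),
    ("dbo.Distributor", "DistributorId", "int"),
]
state = AgentState(question="Top 10 distributor by sales in May 2025")
state.schema = format_schema(rows)
print(state.schema)
```

Keeping the schema as a compact one-line-per-table string is one way to limit prompt length; a real deployment might also include keys or column descriptions if the metadata table stores them.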
- Clone the repository
  git clone https://github.com/UtkPatAI25/AI_For_BI_SQL_Server_using_Langgraph.git
  cd AI_For_BI_SQL_Server_using_Langgraph
- Install dependencies
  - Recommended: Use a virtual environment.
    pip install -r requirements.txt
  - Required packages include:
    - python-dotenv
    - sqlalchemy
    - pandas
    - matplotlib
    - langchain
    - langgraph
    - langchain-openai
    - pydantic
    - azure-openai (if needed, for Azure LLM access)
- Set up environment variables
  - Create a .env file with:
    py-connectionString=your_sqlalchemy_connection_string
    AZURE_OPENAI_DEPLOYMENT_NAME=your_azure_deployment
    AZURE_OPENAI_ENDPOINT=your_azure_endpoint
    AZURE_OPENAI_API_KEY=your_api_key
    AZURE_OPENAI_API_VERSION=2024-12-01-preview
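Before running the notebook, it can help to confirm that every variable from the listing above is actually set. The sketch below is one way to do that with the standard library only. `load_settings` is a hypothetical helper, not a function from the notebook, and the variable names are copied from the listing as written. In practice, python-dotenv's `load_dotenv()` would typically be called first so that the values from `.env` are present in `os.environ`.

```python
import os

# Variable names copied from the .env listing above.
REQUIRED_VARS = [
    "py-connectionString",
    "AZURE_OPENAI_DEPLOYMENT_NAME",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_API_VERSION",
]


def load_settings(env=os.environ) -> dict:
    """Return the required settings, or raise an error naming every missing variable."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


# Demo with placeholder values rather than real credentials.
example_env = {name: "placeholder" for name in REQUIRED_VARS}
settings = load_settings(example_env)
print(sorted(settings))
```

Reporting all missing names at once, rather than failing on the first, saves a round trip when several entries are absent from `.env`.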
Open app.ipynb in Jupyter Notebook or VS Code, and run the cells. You can interact with the assistant using the provided examples:
from IPython.display import Image, display
import base64

answer, sql_code, final_state = run_sql_assistant("Top 10 distributor by sales in May 2025")
print("LLM Answer:\n", answer)
print("SQL Query Executed:\n", sql_code)

# Show the chart if one was generated; otherwise report why not.
if final_state.get("viz_image"):
    display(Image(data=base64.b64decode(final_state["viz_image"])))
elif final_state.get("viz_reason"):
    print("No chart generated:", final_state["viz_reason"])
- check_relevance_node: Filters out unrelated questions.
- get_schema_node: Loads schema info for context.
- build_query_node: Prompts LLM to generate SQL (schema-constrained, NOLOCK, T-SQL only).
- execute_query_node: Runs SQL against database.
- validate_node: Looks for errors in execution.
- retry_query_node: Attempts to fix failed queries using error feedback.
- explain_result_node: LLM generates a human-readable answer from result.
- analyze_and_generate_visualization_node: Asks LLM to write Python code for charts, executes code, and embeds image.
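The control flow through the nodes listed above (relevance check, query build, execute, validate, retry, explain) can be sketched in plain Python. This is not the LangGraph implementation from the notebook: it replaces the graph and its conditional edges with an ordinary loop, and every callable passed in is a stub. `run_workflow`, the value of `MAX_RETRIES`, and the fake table names are assumptions made for illustration.

```python
MAX_RETRIES = 3  # assumed value; the notebook defines its own limit


def run_workflow(question, is_relevant, build_query, execute_query, fix_query, explain):
    """Plain-Python rendering of the node sequence; each argument is a callable stub."""
    if not is_relevant(question):
        return {"answer": "Question is not related to the warehouse data.", "sql_code": ""}

    sql = build_query(question)
    for attempt in range(MAX_RETRIES + 1):
        try:
            result = execute_query(sql)
        except Exception as exc:  # validation step: an execution error triggers a retry
            if attempt == MAX_RETRIES:
                return {"answer": f"Query failed after {MAX_RETRIES} retries: {exc}",
                        "sql_code": sql}
            # Retry step: hand the failing SQL and the error text back for correction.
            sql = fix_query(sql, str(exc))
            continue
        return {"answer": explain(question, result), "sql_code": sql, "retries": attempt}


def fake_execute(sql):
    """Stub executor: fails on a misspelled table, succeeds otherwise."""
    if "Salez" in sql:
        raise ValueError("Invalid object name 'Salez'")
    return [("Acme", 1200)]


final = run_workflow(
    "Top distributor by sales",
    is_relevant=lambda q: True,
    build_query=lambda q: "SELECT TOP 1 * FROM Salez WITH (NOLOCK)",
    execute_query=fake_execute,
    fix_query=lambda sql, err: sql.replace("Salez", "Sales"),
    explain=lambda q, rows: f"{rows[0][0]} leads with {rows[0][1]}",
)
print(final)
```

In the stubbed run, the first query fails, the correction step rewrites it, and the second attempt succeeds, which mirrors the retry path that the conditional edges handle in the graph version.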
- SQL queries are strictly limited to schema metadata.
- NOLOCK hints are enforced so that reads do not take shared locks or wait on writers, at the cost of possibly reading uncommitted data.
- LLM-generated Python code for charts is executed in a controlled environment.
- Error handling, retries, and clear explanations are integrated.
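How the notebook enforces the schema and NOLOCK constraints is not shown here; the node descriptions suggest it is done through the prompt. As a complementary idea, a post-generation check could reject queries that modify data, reference tables outside the schema metadata, or omit the hint. The sketch below is a regex-based heuristic under those assumptions. `check_query` and the sample table names are hypothetical, and a regex cannot handle every T-SQL construct (string literals containing keywords, CTEs, comma joins), so a real parser would be more reliable.

```python
import re

# Object name following FROM or JOIN, with optional [brackets] and schema prefix.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+((?:\[?\w+\]?\.)*\[?\w+\]?)", re.IGNORECASE)
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|TRUNCATE|EXEC|MERGE)\b", re.IGNORECASE
)
NOLOCK = re.compile(r"\(\s*NOLOCK\s*\)", re.IGNORECASE)


def normalize(name: str) -> str:
    """Strip brackets and keep only the final name part, lower-cased."""
    return name.replace("[", "").replace("]", "").split(".")[-1].lower()


def check_query(sql: str, allowed_tables: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the heuristic checks passed."""
    problems = []
    if FORBIDDEN.search(sql):
        problems.append("statement is not read-only")
    allowed = {normalize(t) for t in allowed_tables}
    tables = TABLE_REF.findall(sql)
    for table in tables:
        if normalize(table) not in allowed:
            problems.append(f"unknown table: {table}")
    # Expect at least one NOLOCK hint per table reference.
    if len(NOLOCK.findall(sql)) < len(tables):
        problems.append("missing NOLOCK hint")
    return problems


good = (
    "SELECT TOP 10 d.Name, SUM(s.Amount) FROM [dbo].[Sales] s WITH (NOLOCK) "
    "JOIN dbo.Distributor d WITH (NOLOCK) ON s.DistributorId = d.DistributorId "
    "GROUP BY d.Name"
)
print(check_query(good, {"Sales", "Distributor"}))
print(check_query("DELETE FROM Orders", {"Sales"}))
```

A check like this could run between query generation and execution, feeding its problem list into the same retry path that handles execution errors.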
- Schema Source: Adjust the schema extraction query as needed.
- LLM Prompting: Modify system prompts for different SQL dialects or business rules.
- Visualization Code: Extend chart support by customizing the LLM prompt and chart execution logic.
- Requires up-to-date schema metadata in [schema_metadata] table.
- LLM accuracy depends on prompt quality and model configuration.
- Visualization is limited to what the LLM can infer from sample data and columns.
MIT License (see LICENSE file).
For questions, issues, or contributions, open an issue or reach out via GitHub.