# SQL Agent & Data Visualisation with CrewAI

- https://github.com/imanoop7/SQL-Agent-using-CrewAI-and-Groq/blob/main/crewai_agent.ipynb
- https://www.crewai.com/

## Imports

In [12]:
from langchain_community.tools.sql_database.tool import (
    InfoSQLDatabaseTool,
    ListSQLDatabaseTool,
    QuerySQLCheckerTool,
    QuerySQLDataBaseTool,
)
from langchain_community.utilities.sql_database import SQLDatabase

from textwrap import dedent

from langchain_openai import ChatOpenAI

from crewai import Agent, Crew, Process, Task
from crewai_tools import tool
from crewai.crews.crew_output import CrewOutput

from dotenv import load_dotenv
import os
import re

In [13]:
# Change the default LLM
os.environ["OPENAI_MODEL_NAME"] = "gpt-4o-mini"
llm = ChatOpenAI(model="gpt-4o-mini")

## Agents

Create some agents to work together in a crew:
- `sql_dev`: creates and executes SQL queries
- `data_visualiser`: receives data from the `sql_dev` agents and creates a visualisation in Plotly
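
The agent definitions live in the `agents` package and are not shown in this notebook. As a rough idea of the kind of function a `sql_dev` agent's query tool might wrap, here is a minimal sketch using the standard-library `sqlite3` module. Everything in it is an assumption for illustration: the `run_query` helper, the in-memory database, and the sample rows are hypothetical, and only the `users` table name, the `registration_date` column and the query text are taken from the query used later in this notebook.

```python
import sqlite3


def run_query(conn: sqlite3.Connection, sql_query: str) -> str:
    """Run a SQL query and return the rows as a string (hypothetical helper)."""
    rows = conn.execute(sql_query).fetchall()
    return str(rows)


# Throwaway in-memory database with made-up sample rows (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, registration_date TEXT)"
)
conn.executemany(
    "INSERT INTO users (registration_date) VALUES (?)",
    [("2023-09-28",), ("2023-10-01",), ("2023-10-15",), ("2023-11-03",)],
)

# Count registrations per calendar month.
sql_query = (
    "SELECT strftime('%Y-%m', registration_date) AS registration_month, "
    "COUNT(*) AS registrations_count "
    "FROM users "
    "GROUP BY registration_month "
    "ORDER BY registration_month"
)

result = run_query(conn, sql_query)
print(result)
conn.close()
```

Returning the rows as a string mirrors the shape of the tool output shown at the end of this notebook, which is what the visualisation agent would receive as context.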

In [16]:
from agents.sql_agent_policies import sql_dev as sql_dev_policies
from agents.sql_agent_users import sql_dev as sql_dev_users

In [19]:
from agents.data_visualiser import data_visualiser

## Tasks

In [20]:
extract_data_users = Task(
    description="Extract data that is required for the query {query}.",
    expected_output="Database result for the query",
    agent=sql_dev_users,
)

In [21]:
extract_data_policies = Task(
    description="Extract data that is required for the query {query}.",
    expected_output="Database result for the query",
    agent=sql_dev_policies,
)

In [22]:
data_visulaisation = Task(
    description="Plot the data if required by the query {query}.",
    expected_output="Plotly figure",
    agent=data_visualiser,
    context=[extract_data_users, extract_data_policies],
)
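
The `{query}` placeholder in the task descriptions is filled in from the `inputs` dictionary passed to `crew.kickoff`. The snippet below only illustrates that substitution with Python's built-in `str.format`; it is not CrewAI's internal code, and `fill_placeholders` is a hypothetical helper name.

```python
def fill_placeholders(template: str, inputs: dict) -> str:
    """Substitute {name} placeholders in a template with values from inputs."""
    return template.format(**inputs)


description = "Extract data that is required for the query {query}."
inputs = {"query": "How many of each policy type are there?"}

filled = fill_placeholders(description, inputs)
print(filled)
```

The filled-in text is what appears after `Starting Task:` in the crew log.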

## Crew

In [24]:
crew = Crew(
    agents=[sql_dev_users, sql_dev_policies, data_visualiser],
    tasks=[extract_data_users, extract_data_policies, data_visulaisation],
    process=Process.hierarchical,
    manager_llm=llm,
    verbose=True,
    full_output=True,
    memory=False,
    output_log_file="crew.log",
)

In [25]:
inputs = {
    "query": "How many of each policy type are there?"
}

result = crew.kickoff(inputs=inputs)

[2024-09-30 22:13:41][DEBUG]: == Working Agent: Crew Manager
[2024-09-30 22:13:41][INFO]: == Starting Task: Extract data that is required for the query How many of each policy type are there?.

> Entering new CrewAgentExecutor chain...
To extract the data required for the query "How many of each policy type are there?", I need to gather specific details about the policy types stored in our database. This involves understanding the database structure, including the table where policy information is stored, the relevant columns, and the method for counting the occurrences of each policy type.

I will delegate the task to the Senior Database Developer - Users, as they will have the expertise to extract this information. I need to provide them with comprehensive context regarding the database schema and what is required for the query.

Action: Delegate work to coworker
Action Input: {
    "task": "Extract data for the query on how many of

In [14]:
# Custom parser
def extract_python(crew_output: CrewOutput) -> str:
    """Extract the first Python code block from a crew's raw output.

    The code is expected to be embedded between ```python and ``` tags.

    Parameters:
        crew_output (CrewOutput): The crew result whose `raw` text contains the Python code.

    Returns:
        str: The contents of the first Python block, stripped of surrounding whitespace.

    Raises:
        ValueError: If no Python block is found in the output.
    """
    text = crew_output.raw
    # Define the regular expression pattern to match Python blocks
    pattern = r"```python(.*?)```"

    # Find all non-overlapping matches of the pattern in the string
    matches = re.findall(pattern, text, re.DOTALL)

    # Return the first match, stripping any leading or trailing whitespace
    if not matches:
        raise ValueError(f"Failed to parse: {text}")
    return matches[0].strip()
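
The parsing logic can be tried without running the crew. The sketch below is a standalone variant that works on a plain string, with a `SimpleNamespace` standing in for the crew output; `extract_first_python_block` and `fake_output` are hypothetical names used only for this illustration.

```python
import re
from types import SimpleNamespace

# Build the fence marker programmatically to keep this example easy to embed.
FENCE = "`" * 3
PATTERN = re.compile(FENCE + r"python(.*?)" + FENCE, re.DOTALL)


def extract_first_python_block(raw_text: str) -> str:
    """Return the first fenced Python block in raw_text, stripped."""
    matches = PATTERN.findall(raw_text)
    if not matches:
        raise ValueError("No fenced Python block found")
    return matches[0].strip()


# Stand-in for a crew result: only the `raw` attribute is used here.
fake_output = SimpleNamespace(
    raw=f"Here is the chart:\n{FENCE}python\nprint('hello')\n{FENCE}\nDone."
)

code = extract_first_python_block(fake_output.raw)
print(code)
```

Raising `ValueError` when nothing matches makes a missing code block fail loudly, before anything reaches `exec`.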

In [15]:
python_output_code = extract_python(result)

In [16]:
python_output_code

"import plotly.graph_objects as go\n\n# Data provided\ndata = [\n    ('2023-09', 166), \n    ('2023-10', 862), \n    ('2023-11', 799), \n    ('2023-12', 842), \n    ('2024-01', 856), \n    ('2024-02', 808), \n    ('2024-03', 867), \n    ('2024-04', 799), \n    ('2024-05', 914), \n    ('2024-06', 801), \n    ('2024-07', 810), \n    ('2024-08', 826), \n    ('2024-09', 650)\n]\n\n# Extracting the months and user counts\nmonths, user_counts = zip(*data)\n\n# Creating the Plotly figure\nfig = go.Figure()\n\n# Adding a bar chart for user registrations\nfig.add_trace(go.Bar(\n    x=months,\n    y=user_counts,\n    marker_color='royalblue'\n))\n\n# Updating layout for better readability\nfig.update_layout(\n    title='Monthly User Registrations',\n    xaxis_title='Month',\n    yaxis_title='Number of Users Registered',\n    xaxis_tickangle=-45,\n    yaxis=dict(\n        title='Number of Users',\n        gridcolor='lightgrey'\n    ),\n    plot_bgcolor='white'\n)\n\n# Show the figure\nfig.show()"

<h2 style="color: red;">Warning! Running the cell below will execute the (arbitrary) Python code generated by the LLM crew. Make sure you have checked the code before running it.</h2>

In [17]:
exec(python_output_code)
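
The manual check before `exec` can be supplemented with a quick static look at the generated code. The sketch below uses the standard-library `ast` module to list the top-level modules the code imports and the bare function names it calls. It is an aid to review, not a sandbox: the allow-list, the blocked names, and the helper names `summarise_generated_code` and `looks_reasonable` are all assumptions made for this example.

```python
import ast

ALLOWED_IMPORTS = {"plotly"}  # assumption: only Plotly is expected
BLOCKED_CALLS = {"exec", "eval", "open", "__import__"}  # assumption


def summarise_generated_code(source: str) -> dict:
    """Parse source and list imported top-level modules and bare-name calls."""
    tree = ast.parse(source)  # raises SyntaxError if the code is invalid
    imports, calls = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            imports.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # Relative imports have no module name; record them as "."
            imports.add(node.module.split(".")[0] if node.module else ".")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            calls.add(node.func.id)
    return {"imports": sorted(imports), "calls": sorted(calls)}


def looks_reasonable(source: str) -> bool:
    """Return True if imports are allow-listed and no blocked call is used."""
    summary = summarise_generated_code(source)
    return (
        set(summary["imports"]) <= ALLOWED_IMPORTS
        and not (set(summary["calls"]) & BLOCKED_CALLS)
    )


sample = "import plotly.graph_objects as go\nfig = go.Figure()\n"
print(summarise_generated_code(sample))
print(looks_reasonable(sample))
```

A check like this only catches obvious problems, so reading the generated code is still necessary.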

In [20]:
sql_query = (
    "SELECT strftime('%Y-%m', registration_date) AS registration_month, "
    "COUNT(*) AS registrations_count "
    "FROM users "
    "GROUP BY registration_month "
    "ORDER BY registration_month"
)
execute_sql.run(sql_query)

Using Tool: execute_sql


"[('2023-09', 166), ('2023-10', 862), ('2023-11', 799), ('2023-12', 842), ('2024-01', 856), ('2024-02', 808), ('2024-03', 867), ('2024-04', 799), ('2024-05', 914), ('2024-06', 801), ('2024-07', 810), ('2024-08', 826), ('2024-09', 650)]"