### Insight Tool 

In [1]:
import logging
import os
import sys
from pathlib import Path

# Add the project root to the Python path (this notebook runs from <project_root>/notebooks)
project_root = Path.cwd().parent
sys.path.append(str(project_root))

logging.basicConfig(
    level=logging.INFO, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
)
logger = logging.getLogger(__name__)

print(f"Project root: {project_root}")
print(f"Current working directory: {os.getcwd()}")

Project root: /home/vlad/dev/data-analyser
Current working directory: /home/vlad/dev/data-analyser/notebooks


In [2]:
from dotenv import load_dotenv

from src.agent.agent import DataAnalysisAgent
from src.clients.db_client import DatabaseClient
from src.tools.insight_tool import InsightTool
from src.tools.sql_tool import SQLTool
from src.tools.validator_tool import ValidatorTool

load_dotenv()

True

## Initialize Agent, Clients, and Tools

In [3]:
# agent
agent = DataAnalysisAgent(config_path=str(project_root / "config" / "config.yaml"))

# clients
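# os.path.expanduser() only expands "~", so it had no effect on a "../" path;
# build the path from project_root instead, as is done for the config file above
DB_PATH = project_root / "data" / "porsche_analytics.db"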
sqlite_connection_string = f"sqlite:///{DB_PATH}"
sqlite_client = DatabaseClient(sqlite_connection_string)

# tools
sql_tool = SQLTool(llm=agent.llm)
sql_insight_tool = InsightTool(llm=agent.llm)
sql_validation_tool = ValidatorTool(llm=agent.llm, schema_dict=agent.schema)

In [4]:
# tickets to test
JIRA_PROJECT_KEY = "KAN"

tickets = [
    {
        "project": JIRA_PROJECT_KEY,
        "summary": "Car Models Analysis",
        "description": "How many unique car models do we have per car category? Sort the results in descending order!",
        "issuetype": "Task",
    },
    {
        "project": JIRA_PROJECT_KEY,
        "summary": "Dealership Performance by Region Analysis",
        "description": "Analyze the average dealership rating and sales capacity by region. Which regions have the highest performing dealerships? Sort the results by average rating in descending order.",
        "issuetype": "Task",
    },
    {
        "project": JIRA_PROJECT_KEY,
        "summary": "Service Cost Analysis by Model and Service Type",
        "description": "Analyze the average service costs by model and service type. Identify which models have higher maintenance costs and which service types contribute most to overall service revenue.",
        "issuetype": "Task",
    },
    # negative test: an irrelevant task that the database schema cannot answer
    {
        "id": "DA-101",
        "summary": "Total Sales Overview",
        "description": "What is the average user basket size?",
    },
]

In [13]:
# select ticket (testing)
ticket = tickets[2]

# generate SQL query
sql_from_llm = sql_tool.generate_query(
    task_description=ticket["description"], schema_dict=agent.schema
)

print("\nGenerated SQL query:\n", sql_from_llm)

2025-07-02 15:12:39,366 - src.tools.sql_tool - INFO - Generating SQL query for task: Analyze the average service costs by model and service type. Identify which models have higher maintenance costs and which service types contribute most to overall service revenue.
2025-07-02 15:12:44,659 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"



Generated SQL query:
 SELECT 
    models.model_name, 
    service_records.service_type, 
    AVG(service_records.cost) AS average_service_cost
FROM 
    service_records
JOIN 
    sales ON service_records.vin = sales.vin
JOIN 
    models ON sales.model_id = models.model_id
GROUP BY 
    models.model_name, 
    service_records.service_type
ORDER BY 
    average_service_cost DESC;


In [14]:
# validate SQL query
validation_result = sql_validation_tool.validate_sql(sql_query=sql_from_llm)
validation_result
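
The LLM-based validator can be complemented by a cheap structural check. The sketch below asks SQLite to plan the query with `EXPLAIN` against an empty in-memory database, which catches syntax errors and unknown tables or columns without touching real data. The helper name `compiles_against_schema` and the three-table schema are assumptions: the schema is inferred only from the columns that the generated query above references, not from the actual database.

```python
import sqlite3

# Assumed minimal schema, inferred only from the columns used by the generated query.
ASSUMED_SCHEMA = """
CREATE TABLE models (model_id INTEGER PRIMARY KEY, model_name TEXT);
CREATE TABLE sales (vin TEXT PRIMARY KEY, model_id INTEGER);
CREATE TABLE service_records (vin TEXT, service_type TEXT, cost REAL);
"""


def compiles_against_schema(sql: str, schema_ddl: str = ASSUMED_SCHEMA) -> tuple[bool, str]:
    """Return (ok, message) after asking SQLite to plan the query on empty tables."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_ddl)
        # EXPLAIN compiles the statement and returns its plan; the query itself is not run.
        conn.execute(f"EXPLAIN {sql}")
        return True, "ok"
    except (sqlite3.Error, sqlite3.Warning) as exc:
        # sqlite3.Warning covers multi-statement input on older Python versions.
        return False, str(exc)
    finally:
        conn.close()


good_sql = """
SELECT models.model_name, service_records.service_type, AVG(service_records.cost)
FROM service_records
JOIN sales ON service_records.vin = sales.vin
JOIN models ON sales.model_id = models.model_id
GROUP BY models.model_name, service_records.service_type;
"""
print(compiles_against_schema(good_sql))
print(compiles_against_schema("SELECT basket_size FROM baskets"))
```

A check like this only proves that the query compiles; whether it answers the ticket is still a question for the LLM validator or a human reviewer.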



In [15]:
# execute SQL query
query_result = sqlite_client.execute_query(sql_from_llm)
query_result

2025-07-02 15:12:47,877 - src.clients.db_client - INFO - Query executed successfully. Returned 10 rows.


QueryResult(data=[{'model_name': '718 Boxster', 'service_type': 'Performance Upgrade', 'average_service_cost': 3800.0}, {'model_name': 'Taycan Cross Turismo', 'service_type': 'Tire Replacement', 'average_service_cost': 2800.0}, {'model_name': 'Cayenne', 'service_type': 'Regular Maintenance', 'average_service_cost': 1450.0}, {'model_name': 'Cayenne Coupe', 'service_type': 'Regular Maintenance', 'average_service_cost': 1350.0}, {'model_name': 'Macan', 'service_type': 'Interior Repair', 'average_service_cost': 1200.0}, {'model_name': 'Taycan', 'service_type': 'Brake Service', 'average_service_cost': 1200.0}, {'model_name': '911 GT3', 'service_type': 'Regular Maintenance', 'average_service_cost': 1100.0}, {'model_name': '911 Turbo S', 'service_type': 'Regular Maintenance', 'average_service_cost': 950.0}, {'model_name': 'Panamera', 'service_type': 'Electrical System', 'average_service_cost': 950.0}, {'model_name': '911 Carrera', 'service_type': 'Regular Maintenance', 'average_service_cost':

In [16]:
# insight generation
insight_from_llm = sql_insight_tool.generate_insights(
    task_description=ticket["description"],
    query_result=query_result,
)

print("Insight from LLM:\n", insight_from_llm)

2025-07-02 15:12:49,933 - src.tools.insight_tool - INFO - Generating insights for task: Analyze the average service costs by model and service type. Identify which models have higher maintenance costs and which service types contribute most to overall service revenue.
2025-07-02 15:13:00,405 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


Insight from LLM:
 Summary: The model with the highest average service cost is the '718 Boxster' for 'Performance Upgrade' at $3800. 'Regular Maintenance' is the most common service type across different models. 

Key Points:
- The '718 Boxster' model incurs the highest average service cost, particularly for 'Performance Upgrade' services.
- 'Regular Maintenance' is the most common service type, required by models such as 'Cayenne', 'Cayenne Coupe', '911 GT3', '911 Turbo S', and '911 Carrera'.
- The 'Taycan Cross Turismo' model has a high average service cost for 'Tire Replacement' at $2800.

Recommendations:
- Consider offering promotional discounts or service packages for 'Performance Upgrade' services for the '718 Boxster' model to attract more customers.
- As 'Regular Maintenance' is a common service, consider creating a loyalty program or offering discounts for regular maintenance services to retain customers.
- For the 'Taycan Cross Turismo' model, consider partnering with tire m
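
Claims in the generated insight can be spot-checked against the rows directly. The sketch below uses the nine complete rows from the saved query output above (the tenth row is truncated there, so it is omitted). In a live session the rows would presumably come from `query_result.data`, which is an assumption based on the printed `QueryResult(data=[...])` representation.

```python
from collections import Counter

# (model_name, service_type, average_service_cost) copied from the query output above.
raw_rows = [
    ("718 Boxster", "Performance Upgrade", 3800.0),
    ("Taycan Cross Turismo", "Tire Replacement", 2800.0),
    ("Cayenne", "Regular Maintenance", 1450.0),
    ("Cayenne Coupe", "Regular Maintenance", 1350.0),
    ("Macan", "Interior Repair", 1200.0),
    ("Taycan", "Brake Service", 1200.0),
    ("911 GT3", "Regular Maintenance", 1100.0),
    ("911 Turbo S", "Regular Maintenance", 950.0),
    ("Panamera", "Electrical System", 950.0),
]
keys = ("model_name", "service_type", "average_service_cost")
rows = [dict(zip(keys, values)) for values in raw_rows]

# Number of (model, service type) groups in which each service type appears.
service_type_counts = Counter(row["service_type"] for row in rows)

# The group with the highest average cost.
top = max(rows, key=lambda row: row["average_service_cost"])

print(service_type_counts.most_common(1))
print(top["model_name"], top["service_type"], top["average_service_cost"])
```

Two caveats follow from the shape of the query. The counts are numbers of result groups, not numbers of service visits, so "most common" here means "appears for the most models". The query also returns only `AVG(cost)`, so the ticket's question about which service types contribute most to overall revenue cannot be answered from these rows; that would need a sum of costs per service type.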

### Debugging

In [31]:
# Rebuild the prompt by hand to inspect exactly what the LLM receives
result_summary = sql_insight_tool.format_result_summary(query_result)

prompt_value = sql_insight_tool.prompt.format(
    task_description=ticket["description"], result_summary=result_summary
)
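
For readers without access to the project source, the sketch below shows one way a result summary could be built from a list of row dictionaries before it is placed in a prompt. It is not the implementation of `format_result_summary`; the function name `summarize_rows`, the pipe-delimited layout, and the `max_rows` cap are all hypothetical choices for illustration.

```python
def summarize_rows(rows: list[dict], max_rows: int = 5) -> str:
    """Render rows as a compact, pipe-delimited text table suitable for a prompt."""
    if not rows:
        return "Query returned no rows."
    columns = list(rows[0].keys())
    lines = [f"Rows returned: {len(rows)}", " | ".join(columns)]
    for row in rows[:max_rows]:
        lines.append(" | ".join(str(row.get(col, "")) for col in columns))
    if len(rows) > max_rows:
        # Tell the model that the table was cut off so it does not over-generalize.
        lines.append(f"({len(rows) - max_rows} more rows omitted)")
    return "\n".join(lines)


# Two rows copied from the saved query output earlier in this notebook.
sample_rows = [
    {"model_name": "718 Boxster", "service_type": "Performance Upgrade", "average_service_cost": 3800.0},
    {"model_name": "Taycan Cross Turismo", "service_type": "Tire Replacement", "average_service_cost": 2800.0},
]
print(summarize_rows(sample_rows, max_rows=1))
```

Capping the number of rows keeps the prompt short for large results, at the cost of hiding detail from the model, so the cap is worth stating in the summary itself.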

In [35]:
insight_from_llm_raw = sql_insight_tool.llm.invoke(prompt_value)

2025-07-02 14:58:58,390 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


In [38]:
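# Note: the saved output below is from an earlier run (14:58) with tickets[0],
# the unique-car-models ticket, not the tickets[2] run shown above
print(insight_from_llm_raw.content)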

Summary: The data analysis reveals the number of unique car models per car category. The 'Sports Car' category has the highest number of unique models with 5, followed by 'SUV' with 3. All other categories have only one unique model each.

Key Points:
- The 'Sports Car' category has the most diverse range of unique models.
- The 'SUV' category is the second most diverse with 3 unique models.
- The 'Wagon', 'Supercar', 'Sedan', 'Luxury', and 'Hypercar' categories have the least diversity with only one unique model each.

Recommendations:
- Consider expanding the range of models in the 'Wagon', 'Supercar', 'Sedan', 'Luxury', and 'Hypercar' categories to increase diversity and potentially attract a wider customer base.
- Investigate why the 'Sports Car' and 'SUV' categories have more unique models. This could provide insights into consumer preferences and inform future product development.
- Conduct further analysis to understand if the number of unique models in each category is meeting 