# GenAI Quoting Agent for AI & Geospatial Projects

## Introduction

This notebook demonstrates a GenAI-powered quoting agent for AI-infused geospatial projects. The agent automates requirements extraction, services recommendation, quoting, and QA review using multi-agent orchestration, vector search, and structured output.

**Use Case:**

- Automate the process of generating accurate, transparent, and auditable quotes for complex geospatial/AI projects.
- Reduce manual effort, improve consistency, and enable rapid iteration for both technical and non-technical stakeholders.

**GenAI Capabilities Demonstrated:**

- Structured output/JSON mode (Pydantic schemas, controlled generation)
- Agents (CrewAI multi-agent orchestration)
- Embeddings & Vector Store (Google embeddings + Chroma for RAG)
- Retrieval Augmented Generation (RAG) for internal service catalog
- Function Calling (custom tools for quote calculation and Google search)
- GenAI Evaluation (QA agent for quote review)

**Workflow Overview:**

1. Load and preprocess service standards and client persona data.
2. Create a vector store of internal service offerings for semantic search.
3. Simulate a client conversation and extract requirements using an agent.
4. Research best practices and technologies using GenAI-powered web search.
5. Retrieve and recommend relevant internal services using vector search (RAG).
6. Calculate a detailed quote, including custom work and overhead, using a function-calling agent.
7. Run a QA agent to review the quote for accuracy, consistency, and alignment with requirements.
8. Generate a Statement of Work (SOW) document from the structured quote and analysis.

---

**Author:** Jeremie Bitsch

**Date:** [2025-04-17]

---


## Setting the environment

The following first cells are dedicated to setting up the environment in a way that ensures compatibility both on Kaggle and in local development. We start by defining a utility function `is_kaggle_environment()` which detects whether the code is running inside a Kaggle notebook.

Depending on the environment, the notebook then handles package installation, API key loading, and dataset access differently.

Additionally, a utility function `delay_before_task()` is defined to introduce a short pause between tasks execution in the crews, in addition to setting `max_rpm`. This is particularly useful for avoiding API rate limits when using Google Gemini on the free tier.

By choosing to structure the environment setup in this way, the notebook ensures reproducibility and smooth execution regardless of where it is run.


#### `is_kaggle_environment()` utility function


In [1]:
import os


# Utility function to detect if the code is running in a Kaggle notebook environment
def is_kaggle_environment():
    """Detect if running in a Kaggle notebook environment."""
    # Check for Kaggle-specific paths
    if os.path.exists("/kaggle/input") or os.path.exists("/kaggle/working"):
        return True

    # In case the first check fails, second check for Kaggle environment variables
    if os.environ.get("KAGGLE_URL_BASE") or os.environ.get("KAGGLE_KERNEL_RUN_TYPE"):
        return True

#### Package installation


In [2]:
%%capture
if is_kaggle_environment():
    !pip uninstall -qqy kfp jupyterlab libpysal thinc spacy fastai ydata-profiling google-cloud-bigquery google-generativeai
    !pip install -qU crewai chromadb langchain langchain-chroma langchain-google-genai langgraph-prebuilt google-genai google-generativeai

    import kagglehub
    from kaggle_secrets import UserSecretsClient

#### Imports


In [3]:
# If not in Kaggle, load environment variables from a .env file
if not is_kaggle_environment():
    from dotenv import load_dotenv

    load_dotenv()

In [4]:
import json
from typing import List, Dict, Any, Type, Optional, Literal
from pathlib import Path
import uuid
from datetime import date
import time
from IPython.display import display, HTML
import markdown as md

from langchain_chroma import Chroma
from langchain_core.documents import Document
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter

from textwrap import dedent
from crewai import Agent, Task, Crew, Process, LLM
from crewai.tools import BaseTool
from pydantic import BaseModel, Field, ConfigDict

from google import genai
from google.genai import types

  from .autonotebook import tqdm as notebook_tqdm


#### API key


In [5]:
# Loading the Google API key depending on the environment
if is_kaggle_environment():
    # from kaggle secrets
    GOOGLE_API_KEY = UserSecretsClient().get_secret("GOOGLE_API_KEY")
else:
    # from .env file
    GOOGLE_API_KEY = os.getenv("GOOGLE_GEMINI_API_KEY")

#### `delay_before_task()` utility function


In [6]:
# Create a utility function to use as callback in the crews, introduce delays between tasks
# This is useful to avoid hitting the API rate limits when using Gemini free tier
def delay_before_task() -> None:
    delay_seconds = 10
    print(
        f"\\n--- Task Completed. Waiting {delay_seconds} seconds before next task --- \\n"
    )
    time.sleep(delay_seconds)

---

## Step 1 - Data Preparation

### 1.1 - Load the internal service standards and client persona data.


In [7]:
# Loading the datasets depending on the environment
if is_kaggle_environment():
    # Download supporting dataset from Kaggle
    json_data = kagglehub.dataset_download(
        "jeremiebitsch/capstone-2025-quote-standards-v2"
    )
    standards_dataset_path = Path(json_data) / "quote-standards.json"
    client_personas_path = Path(json_data) / "client-personas.json"
else:
    # Use data files directly for local development
    standards_dataset_path = Path("capstone-2025-quote-standards-v2.json")
    client_personas_path = Path("capstone-2025-client-personas-v2.json")

### 1.2 - Flatten the service catalog for vector search.


In [8]:
# Load the JSON dataset
with standards_dataset_path.open("r") as f:
    standards_dataset = json.load(f)


# Utility function to flatten and transform the json dataset
def _flatten_json_data(data: Dict[str, Any]) -> List[Document]:
    """Convert nested JSON service data into a list of Document objects."""
    documents = []

    for service_category, services in data["services"].items():
        for service in services:
            # Create metadata keeping relevant fields only
            metadata = {
                "uid": str(uuid.uuid4()),
                "name": service.get("name", ""),
                "label": service.get("label", ""),
                "price": service.get("price", 0),
                "parentCategory": service_category,
                "subCategory": service.get("category", ""),
                "developmentTime": service.get("developmentTime", 0),
                "description": service.get("description", ""),
            }

            # Combine fields to create the document page content
            page_content = f"""
            Service: {service.get("label", "")}
            Parent Category: {service_category}
            Sub Category: {service.get("category", "")}
            Price: ${service.get("price", 0)}
            Description: {service.get("description", "")}
            Development Time: {service.get("developmentTime", 0)} hours
            Deliverables: {", ".join(service.get("deliverables", []))}
            Use Cases: {", ".join(service.get("useCases", []))}
            """

            # Create a Document object
            doc = Document(page_content=page_content, metadata=metadata)
            documents.append(doc)

    return documents


# Create the document objects from json input
docs = _flatten_json_data(standards_dataset)

# Inspect the number of documents created
len(docs)

32

### 1.3 - Apply text splitting only if necessary.

This step is mainly included to illustrate the usual steps before adding documents to the vectorstore. In our case, each are below 1000 characters and text splitting can be ignored.
Still, we keep this step to ensure consistency with the rest of the code and to allow for future changes in the dataset.


In [9]:
# Collect all documents lenghts
docs_length = [len(doc.page_content) for doc in docs]

# Split documents only if they are not all under 100 chars
if not all(i < 1000 for i in docs_length):
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200,
        length_function=len,
    )
    docs = text_splitter.split_documents(docs)

# Sample doc for sanity check
print(docs[0])

page_content='
            Service: Satellite Image Analysis with AI
            Parent Category: geospatialAI
            Sub Category: ai
            Price: $1200
            Description: Custom deep learning solutions for satellite imagery classification, segmentation, and anomaly detection
            Development Time: 36 hours
            Deliverables: Custom trained model on your imagery, Web dashboard for uploading and analyzing new images, API for automated processing pipelines, Technical documentation and training
            Use Cases: Agricultural crop health monitoring, Urban development tracking, Disaster damage assessment, Environmental change monitoring
            ' metadata={'uid': 'c16a8992-4a04-4e1b-b989-5fde3a679ead', 'name': 'SatelliteImageAnalysis', 'label': 'Satellite Image Analysis with AI', 'price': 1200, 'parentCategory': 'geospatialAI', 'subCategory': 'ai', 'developmentTime': 36, 'description': 'Custom deep learning solutions for satellite imagery classificat

## Step 2 - Vector Store Creation

Create a Chroma vector store using Google Generative AI embeddings for semantic search over the service catalog.


In [10]:
# Initialize the embedding function
embedding_function = GoogleGenerativeAIEmbeddings(
    model="models/text-embedding-004", google_api_key=GOOGLE_API_KEY
)

# Initialize Chroma vector store
chroma_store = Chroma(
    collection_name="kaggle_capstone",
    embedding_function=embedding_function,
)

# Collect the ids from metadata
ids = [doc.metadata["uid"] for doc in docs]

# Add documents to the store
chroma_store.add_documents(documents=docs, ids=ids)

# For sanity, check the collection is not empty
chroma_store._collection.count()

32

## Step 3 - Load the simulated conversation between the user and the agent

For testing and demonstration purposes, 3 differents personas are available:

- `wildfire_watch_na`: a persona for a wildfire watch service in North America
- `urban_traffic_ai`: a persona for an urban traffic monitoring and prediction service
- `agri_insights`: a persona for crop health monitoring and yield prediction


In [11]:
# Utility function to load the conversation from the dataset
def load_conversation(
    persona_id: Literal["wildfire_watch_na", "urban_traffic_ai", "agri_insights"],
    dataset_path: str | Path,
) -> List[Dict[str, str]]:
    if not isinstance(dataset_path, Path):
        dataset_path = Path(dataset_path)

    with dataset_path.open("r") as f:
        data = json.load(f)

    for persona in data["personas"]:
        if persona["id"] == persona_id:
            return persona["conversation"]


conversation = load_conversation(
    persona_id="wildfire_watch_na", dataset_path=client_personas_path
)

# Display the conversation
conversation

[{'speaker': 'agent',
  'message': 'Hello! Thank you for contacting us. Could you please tell me your name and the company you represent?'},
 {'speaker': 'customer',
  'message': "Hi, my name is Jane Smith, and I'm from Wildfire Watch North America."},
 {'speaker': 'agent',
  'message': 'Nice to meet you, Jane. How can we help Wildfire Watch North America today?'},
 {'speaker': 'customer',
  'message': "We're looking for a quote for a geospatial project focused on wildfire monitoring across Canada and the US. We need to build a comprehensive database of past wildfire events and gather data suitable for fine-tuning an AI model for monitoring and predictive risk management."},
 {'speaker': 'agent',
  'message': "Okay, that's a critical Geospatial AI project. What specific data sources are you considering for both the historical database and the ongoing monitoring?"},
 {'speaker': 'customer',
  'message': 'We plan to use satellite imagery like MODIS, VIIRS, Landsat, and Sentinel, along wi

## Step 4 - Define the data models for the crews' tasks


In [12]:
class TechnicalRequirements(BaseModel):
    accuracy: Optional[str] = Field(
        None, description="Required accuracy or performance metrics."
    )
    outputs: Optional[List[str]] = Field(
        default_factory=list,
        description="List of desired output formats (e.g., 'Web dashboard', 'API', 'Report').",
    )


class RequirementsOutput(BaseModel):
    client_name: Optional[str] = Field(None, description="Name of the client contact.")
    client_company: Optional[str] = Field(
        None, description="Name of the client's company."
    )
    project_type: Optional[str] = Field(
        None, description="Type of project (e.g., Geospatial AI, Data Automation)."
    )
    main_objective: Optional[str] = Field(
        None, description="The primary goal of the project."
    )
    data_sources: List[str] = Field(
        default_factory=list, description="List of data sources mentioned."
    )
    features_needed: List[str] = Field(
        default_factory=list,
        description="List of required features or functionalities.",
    )
    technical_requirements: Optional[TechnicalRequirements] = Field(
        None, description="Specific technical needs."
    )
    budget: Optional[str] = Field(None, description="Client's stated budget or range.")
    timeline_expectation: Optional[str] = Field(
        None, description="Any mention of desired timeline or deadlines."
    )
    potential_risks: List[str] = Field(
        default_factory=list,
        description="Potential challenges or risks identified from the conversation.",
    )
    summary: str = Field(
        ...,
        description="A concise Markdown summary (1-2 paragraphs) highlighting the core project request and key constraints.",
    )


class RecommendedService(BaseModel):
    name: str = Field(..., description="Internal name/ID of the service.")
    label: str = Field(..., description="Customer-facing label of the service.")
    price: float = Field(..., description="Price of the service.")
    development_time_hours: int = Field(
        ..., description="Estimated development time in hours."
    )


class ServicePackage(BaseModel):
    name: str = Field(..., description="A descriptive name for the proposed package.")
    services_included: List[str] = Field(
        ..., description="List of labels of the services included in the package."
    )
    total_estimated_price: float = Field(
        ..., description="Total calculated price for the package."
    )
    total_estimated_time_hours: int = Field(
        ..., description="Total calculated development time in hours for the package."
    )
    notes: Optional[str] = Field(
        None, description="Brief notes about the package coverage."
    )


class BudgetAnalysis(BaseModel):
    client_budget: Optional[str] = Field(
        ..., description="The client's budget extracted from requirements."
    )
    estimated_cost: float = Field(
        ..., description="The total estimated cost of the proposed package."
    )
    within_budget: bool = Field(
        ..., description="Whether the estimated cost is within the client's budget."
    )
    comments: Optional[str] = Field(None, description="Comments on the budget fit.")


class ServiceAnalysisOutput(BaseModel):
    recommended_services: List[RecommendedService] = Field(
        ..., description="List of relevant services found in the vector store."
    )
    service_package: ServicePackage = Field(
        ..., description="Details of the proposed service package."
    )
    customization_notes: List[str] = Field(
        default_factory=list,
        description="List of requirements potentially needing custom work.",
    )
    explanation: str = Field(
        ...,
        description="Markdown explanation justifying the service selection, cost/time, budget comparison, and customization needs.",
    )


class WebResearchOutput(BaseModel):
    summary: str = Field(
        ...,
        description="Brief overview of key findings regarding best practices and technologies.",
    )
    best_practices_insights: str = Field(
        ...,
        description="Actionable recommendations and notes on relevant tools/libraries found.",
    )
    potential_challenges: Optional[str] = Field(
        None, description="Potential challenges identified from search results, if any."
    )


class QuoteCalculationDetails(BaseModel):
    """Detailed breakdown of the quote price calculation."""

    sum_base_service_prices: float = Field(
        ..., description="Sum of prices for all recommended standard services."
    )
    total_estimated_hours: int = Field(
        ...,
        description="Total estimated hours including base, custom work, and lifecycle.",
    )
    base_hourly_rate: int = Field(
        ..., description="The hourly rate used for development cost calculation."
    )
    additional_hours: int = Field(
        ...,
        description="Hours allocated beyond standard service hours (for custom work, lifecycle, etc.).",
    )
    additional_development_cost: float = Field(
        ..., description="Calculated cost for the additional hours."
    )
    subtotal_before_overhead: float = Field(
        ...,
        description="Total cost before applying the overhead buffer (Base Services + Additional Development).",
    )
    overhead_buffer_percentage: float = Field(
        ..., description="The overhead buffer percentage applied."
    )
    overhead_amount: float = Field(
        ..., description="The calculated monetary amount of the overhead buffer."
    )
    final_total_price: float = Field(
        ..., description="The final calculated total price for the quote."
    )


class QuoteSummary(BaseModel):
    project_name: str
    client_name: Optional[str]
    client_company: Optional[str]
    quote_id: str
    quote_date: str
    total_price: float
    total_estimated_hours: int
    estimated_start_date: str
    estimated_completion_date: str
    estimation_justification: Optional[str] = Field(
        None,
        description="Brief justification for the estimated hours and overhead buffer applied.",
    )
    price_calculation_details: Optional[QuoteCalculationDetails] = Field(
        None,
        description="Detailed breakdown of the price calculation from the QuoteCalculatorTool.",
    )


class QuoteLineItem(BaseModel):
    service_name: str
    description: str
    price: float
    estimated_hours: int


class PaymentMilestone(BaseModel):
    milestone: str
    percentage: int
    amount: float
    due_date: str


class QuoteDetailsOutput(BaseModel):
    quote_summary: QuoteSummary
    line_items: List[QuoteLineItem]
    payment_schedule: List[PaymentMilestone]


class QAIssue(BaseModel):
    type: str = Field(
        ...,
        description="Category of the issue (e.g., pricing, timeline, scope, requirements_mismatch, clarity, consistency).",
    )
    severity: str = Field(..., description="Severity level (e.g., high, medium, low).")
    description: str = Field(
        ..., description="Specific description of the issue found."
    )
    recommendation: str = Field(..., description="Specific instruction for correction.")


class QAImprovementSuggestion(BaseModel):
    area: str = Field(
        ...,
        description="Area for improvement (e.g., Executive Summary, Timeline Detail).",
    )
    suggestion: str = Field(..., description="Specific suggestion.")


class QAReviewOutput(BaseModel):
    overall_status: str = Field(
        ..., description="Overall assessment status: 'approved' or 'needs_revision'."
    )
    issues: List[QAIssue] = Field(
        default_factory=list,
        description="List of identified issues if status is 'needs_revision'.",
    )
    improvement_suggestions: List[QAImprovementSuggestion] = Field(
        default_factory=list, description="Optional suggestions for improvement."
    )
    summary: str = Field(
        ..., description="A brief Markdown summary of the review findings."
    )


class AnalysisOutput(BaseModel):
    """Consolidated output from the Data Analysis Crew."""

    requirements: RequirementsOutput = Field(
        ..., description="Extracted client requirements and project context."
    )
    research: WebResearchOutput = Field(
        ..., description="Web research findings on technologies and best practices."
    )
    service_analysis: ServiceAnalysisOutput = Field(
        ..., description="Analysis of internal services, pricing, and budget fit."
    )


## Step 5 - Defining custom tools, and input schemas models, to be used by the agents


In [13]:
class VectorStoreSearchToolInput(BaseModel):
    """Input schema for the Vector Store Search tool."""

    query: str = Field(
        ...,
        description="The search query to find relevant services in the vector store.",
    )


class GoogleSearhToolInput(BaseModel):
    """Input schema for the Google Search tool."""

    query: str = Field(
        ..., description="The search query to be used for Google Search."
    )


class QuoteCalculatorToolInput(BaseModel):
    """Input schema for the Quote Calculator tool."""

    estimated_total_hours: int = Field(
        ...,
        description="The final estimated total development hours for the project, adjusted for complexity and custom work.",
    )
    recommended_services: List[RecommendedService] = Field(
        ..., description="List of recommended services with their base prices."
    )
    overhead_buffer_percentage: float = Field(
        ...,
        description="Overhead buffer percentage (e.g., 0.15 for 15%) estimated based on project risk and complexity.",
    )
    base_hourly_rate: int = Field(
        100, description="The base hourly rate for development time."
    )


# Defining the Vector Store Search tool, allowing to search services and return consistant results
class VectorStoreSearchTool(BaseTool):
    name: str = "VectorStoreSearch"
    description: str = "A tool to search for relevant services in the vector store."
    args_schema: Type[BaseModel] = VectorStoreSearchToolInput
    vector_store: Chroma
    model_config = ConfigDict(arbitrary_types_allowed=True)

    def _run(self, query: str) -> str:
        """Use the vector store to find relevant documents."""

        try:
            print(f"Querying Vector Store with: {query}")
            results = self.vector_store.similarity_search_with_score(query, k=10)

            # Format results for the agent
            formatted_results = []
            for doc, score in results:
                metadata = doc.metadata
                formatted_results.append(
                    f"Result (Score: {score:.4f}):\n"
                    f"  Name: {metadata.get('name', 'N/A')}\n"
                    f"  Label: {metadata.get('label', 'N/A')}\n"
                    f"  Price: ${metadata.get('price', 'N/A')}\n"
                    f"  Development Time: {metadata.get('developmentTime', 'N/A')} hours\n"
                    f"  Category: {metadata.get('parentCategory', 'N/A')} / {metadata.get('subCategory', 'N/A')}\n"
                    f"  Description: {metadata.get('description', 'N/A')}\n"
                )
            if not formatted_results:
                return "No relevant services found in the vector store for the query."

            return "\n-----\n".join(formatted_results)
        except Exception as e:
            raise ValueError(f"Error searching vector store: {e}")


# Defining the Google search tool
class GoogleSearchTool(BaseTool):
    name: str = "GoogleSearch"
    description: str = "A tool to perform Google searches."
    args_schema: Type[BaseModel] = GoogleSearhToolInput

    def _run(self, query: str) -> str:
        """Use Google Search to find relevant information."""

        try:
            client = genai.Client(api_key=GOOGLE_API_KEY)
            config_with_search = types.GenerateContentConfig(
                tools=[types.Tool(google_search=types.GoogleSearch())],
            )
            response = client.models.generate_content(
                model="gemini-2.0-flash",
                contents=query,
                config=config_with_search,
            )

            return [
                ("\n-----\n").join([i.text for i in res.content.parts if i.text])
                for res in response.candidates
            ][0]
        except Exception as e:
            return f"Error performing Google search: {e}"


# Defining the Quote Calculator tool used to ensure consistency in the quote
class QuoteCalculatorTool(BaseTool):
    name: str = "QuoteCalculator"
    description: str = (
        "Calculates the final project quote price and provides a detailed breakdown. "
        "Input: estimated total hours, list of recommended services (with base prices), overhead buffer percentage, and base hourly rate. "
        "Output: A structured object containing base service costs, additional development costs, overhead amount, and the final total price."
    )
    args_schema: Type[BaseModel] = QuoteCalculatorToolInput

    def _run(
        self,
        estimated_total_hours: int,
        recommended_services: List[Any],
        overhead_buffer_percentage: float,
        base_hourly_rate: int = 100,
    ) -> QuoteCalculationDetails:
        """Calculate the final quote price and return a detailed breakdown."""
        try:
            if not 0.0 <= overhead_buffer_percentage <= 0.3:
                print(
                    f"Warning: Clamping overhead buffer {overhead_buffer_percentage}."
                )
                overhead_buffer_percentage = max(
                    0.0, min(0.3, overhead_buffer_percentage)
                )

            # Calculate sum_base_service_prices and sum_base_service_hours
            sum_base_service_prices = 0.0
            sum_base_service_hours = 0

            for service in recommended_services:
                price = 0.0
                hours = 0
                if isinstance(service, dict):
                    price = service.get("price", 0.0)
                elif hasattr(service, "price") and hasattr(
                    service, "development_time_hours"
                ):
                    price = getattr(service, "price", 0.0)
                    hours = getattr(service, "development_time_hours", 0)
                else:
                    print(f"Warning: Skipping unknown service type: {type(service)}")
                    continue
                sum_base_service_prices += price
                sum_base_service_hours += hours

            # Calculate additional hours and cost
            additional_hours = max(0, estimated_total_hours - sum_base_service_hours)
            additional_development_cost = round(additional_hours * base_hourly_rate, 2)

            # Calculate subtotal and overhead amount
            subtotal_before_overhead = round(
                sum_base_service_prices + additional_development_cost, 2
            )
            overhead_amount = round(
                subtotal_before_overhead * overhead_buffer_percentage, 2
            )

            # Calculate final price
            final_total_price = round(subtotal_before_overhead + overhead_amount, 2)

            details = QuoteCalculationDetails(
                sum_base_service_prices=sum_base_service_prices,
                total_estimated_hours=estimated_total_hours,
                base_hourly_rate=base_hourly_rate,
                additional_hours=additional_hours,
                additional_development_cost=additional_development_cost,
                subtotal_before_overhead=subtotal_before_overhead,
                overhead_buffer_percentage=overhead_buffer_percentage,
                overhead_amount=overhead_amount,
                final_total_price=final_total_price,
            )

            # Prints for verbose output
            print(
                f"Calculator Input: Hours={estimated_total_hours}, Services Base={sum_base_service_prices}, Buffer={overhead_buffer_percentage}, Rate={base_hourly_rate}"
            )
            print(f"Calculated Details: {details.model_dump_json(indent=2)}")

            return details

        except Exception as e:
            print(f"Error in QuoteCalculatorTool: {e}")
            print(
                f"Input causing error: hours={estimated_total_hours}, services={recommended_services}, buffer={overhead_buffer_percentage}, rate={base_hourly_rate}"
            )
            raise ValueError(f"Error calculating quote details: {e}")


## Step 6 - Requirements Extraction & Analysis (Crew 1)

The first multi-agent crew pipeline `DataAnalysisCrew`, using CrewAI, processes the client conversation to:

- Extract the specific requirements of the project, from the conversation
- Research best practices, new technologies and challenges that would better define the project
- Query the company vector store for most relevant services matching the requirements
- Finally, consolidation of all previous researchs and analysis, crafting a structured output as the foundation for the subsequent quoting


In [14]:
# Crew 1: Data Analysis & Service Recommendation
class DataAnalysisCrew:
    def __init__(
        self, vector_store: Chroma, conversation: List[Dict[str, str]]
    ) -> None:
        self.llm = LLM(
            model="gemini/gemini-2.0-flash",
            temperature=0,
            api_key=GOOGLE_API_KEY,
            num_retries=5,
            request_timeout=80,
            seed=42,
        )
        self.conversation = conversation
        self.search_tool = GoogleSearchTool()
        self.vector_store_tool = VectorStoreSearchTool(vector_store=vector_store)
        self._tasks = {}

    # Agent #1 - Context Extraction Agent
    def context_extraction_agent(self) -> Agent:
        return Agent(
            role="Project Context and Requirements Extraction Specialist",
            goal=dedent("""\
                Analyze the provided client conversation transcript. Identify and extract key information including:
                - Client Name and Company
                - Project Type (e.g., geospatial analysis, data automation, AI model development)
                - Main Objectives and Goals
                - Specific Features or Functionalities required
                - Mentioned Data Sources (e.g., satellite imagery types, specific databases)
                - Technical Requirements (e.g., accuracy targets, performance needs, output formats like dashboards, APIs)
                - Budget constraints or indications
                - Timeline expectations (if mentioned)
                - Any potential challenges or risks implied
                Structure this information clearly into the required Pydantic format.
            """),
            backstory=dedent("""\
            You are a meticulous analyst specializing in understanding client needs from initial conversations.
            Your strength lies in parsing dialogue to capture the core requirements, constraints, and context
            of a potential project, organizing it into a structured Pydantic object that other team members can easily use.
            """),
            verbose=True,
            llm=self.llm,
            allow_delegation=False,
        )

    # Agent #2 - Web Search Agent
    def web_search_agent(self) -> Agent:
        return Agent(
            role="Geospatial Technology & Market Researcher",
            goal=dedent("""\
                Based on the extracted project requirements (like project type, objectives, data sources, technical needs),
                research current best practices, relevant technologies, potential challenges, and industry benchmarks
                for similar geospatial projects using the GoogleSearch tool. Provide concise summaries and actionable insights
                structured into the required Pydantic format.
            """),
            backstory=dedent("""\
            You are an expert researcher skilled in finding up-to-date information on geospatial technologies,
            AI applications in geography, data processing techniques, and market trends. You know how to
            efficiently gather relevant case studies, technical documentation, and best practice guides
            to inform project planning and estimation, outputting structured findings.
            """),
            verbose=True,
            llm=self.llm,
            tools=[self.search_tool],
            allow_delegation=False,
        )

    # Agent #3 - Vector Store Retrieval Agent
    def service_retrieval_agent(self) -> Agent:
        return Agent(
            role="Internal Service Catalog & Pricing Analyst",
            goal=dedent("""\
                Based on the extracted project requirements, use the VectorStoreSearch tool to find matching
                internal services and standards. Identify a potential package of services, calculate the
                total estimated price and development time, compare it against the client's budget,
                note any requirement gaps needing custom work, and structure all findings into the required Pydantic format.
            """),
            backstory=dedent("""\
            You have deep knowledge of the company's service offerings, standard components, and pricing structure,
            stored within the vector database. You excel at translating client requirements into a combination
            of existing services, estimating costs and timelines accurately, identifying customization needs,
            and outputting a structured analysis in the specified Pydantic format.
            """),
            verbose=True,
            llm=self.llm,
            tools=[self.vector_store_tool],
            allow_delegation=False,
        )

    # Task 1: Extract project context and requirements
    def extract_requirements_task(self) -> Task:
        formatted_conversation = "\n".join(
            [f"{item['speaker']}: {item['message']}" for item in self.conversation]
        )
        task = Task(
            description=f"""\
            Analyze the following client conversation transcript and extract key project details.
            Focus on understanding the client's needs, goals, constraints, and any specific mentions
            of technology, data, or deliverables.

            CONVERSATION:
            ---
            {formatted_conversation}
            ---

            Extract the following: Client Name/Company, Project Type, Objectives, Required Features,
            Data Sources, Technical Requirements (Accuracy, Outputs), Budget, Timeline hints, Potential Risks.
            Also provide a concise Markdown summary of the project request and constraints.
            """,
            expected_output="""\
            A Pydantic object conforming to the `RequirementsOutput` schema.
            This object should contain all the extracted details like client info, project type, objectives,
            data sources, features, technical requirements, budget, timeline, risks, and a markdown summary.
            Ensure all fields in the Pydantic model are populated accurately based on the conversation.
            The final output MUST be only the JSON representation of the populated `RequirementsOutput` object.
            """,
            agent=self.context_extraction_agent(),
            output_pydantic=RequirementsOutput,
        )
        self._tasks["extract_requirements"] = task
        return task

    # Task 2: Research relevant information on the web
    def web_research_task(self) -> Task:
        task = Task(
            description="""\
            Based on the project requirements extracted in Task 1 (referencing the `project_type`, `main_objective`, `data_sources`, `technical_requirements` fields from the previous task's output),
            conduct web research using the GoogleSearch tool. **Perform ONE focused search** combining the most critical aspects:
            - Best practices for implementing similar geospatial AI projects (e.g., wildfire monitoring AI).
            - Relevant technologies (specific libraries like TensorFlow/PyTorch/GeoPandas, platforms like Google Earth Engine/ArcGIS, algorithms for geospatial AI, satellite image processing).

            Synthesize the findings from your search into the structured format required.
            """,
            expected_output="""\
            A Pydantic object conforming to the `WebResearchOutput` schema.
            This object must contain:
            - `summary`: Brief overview of key findings regarding best practices and technologies.
            - `best_practices_insights`: Actionable recommendations and notes on relevant tools/libraries/platforms found (mention specific names if prominent in search results).
            - `potential_challenges`: Potential challenges identified from search results, if any.

            Focus on information directly applicable to building the quote and SOW. Keep the output concise but informative.
            The final output MUST be only the JSON representation of the populated `WebResearchOutput` object.
            """,
            agent=self.web_search_agent(),
            context=[self._tasks["extract_requirements"]],
            output_pydantic=WebResearchOutput,
        )
        self._tasks["web_research"] = task
        return task

    # Task 3: Retrieve relevant services and calculate estimates
    def service_retrieval_task(self) -> Task:
        task = Task(
            description="""\
            Analyze the structured project requirements (output object from Task 1).
            Use the 'VectorStoreSearch' tool to query the internal service catalog based on keywords derived from the requirements
            (e.g., `project_type`, `features_needed`, `data_sources`, `technical_requirements`). Use a query like "geospatial AI wildfire monitoring database model dashboard API".

            Identify the most relevant existing services from the tool's output. For each relevant service, extract its name, label, price, and development time.

            Based on the findings:
            1.  Select a combination of services that best address the client's needs to propose as a package.
            2.  Calculate the total estimated *base* price and total estimated *base* development time (in hours) for this package by summing the values from the selected services.
            3.  List any client requirements (from Task 1 output, e.g., `features_needed`, `technical_requirements.outputs`) that are NOT clearly covered by the selected standard services. These might require custom work.
            4.  Write a brief Markdown explanation justifying the service choices, base calculations, and customization needs.
            """,
            expected_output="""\
            A Pydantic object conforming to the `ServiceAnalysisOutput` schema.
            This object must contain:
            - `recommended_services`: A list of `RecommendedService` objects for services found and deemed relevant, with their details (name, label, price, development_time_hours).
            - `service_package`: A `ServicePackage` object detailing the proposed package including a descriptive name (e.g., "Wildfire Monitoring AI - Phase 1"), included service labels, calculated total *base* price, and calculated total *base* time.
            - `customization_notes`: A list of strings describing requirements potentially needing custom work.
            - `explanation`: A Markdown string summarizing the analysis and justifications.

            Ensure all calculations (total base price, total base time) are performed correctly.
            The final output MUST be only the JSON representation of the populated `ServiceAnalysisOutput` object.
            """,
            agent=self.service_retrieval_agent(),
            context=[self._tasks["extract_requirements"]],
            output_pydantic=ServiceAnalysisOutput,
        )
        self._tasks["service_retrieval"] = task
        return task

    # Task 4: Consolidate Analysis Output
    def consolidate_analysis_task(self) -> Task:
        return Task(
            description="""\
            Consolidate the outputs from the previous three tasks (Requirements Extraction, Web Research, Service Analysis)
            into a single structured `AnalysisOutput` object. You have access to the outputs of these tasks via context.
            Retrieve the `RequirementsOutput` object from Task 1 context.
            Retrieve the `WebResearchOutput` object from Task 2 context.
            Retrieve the `ServiceAnalysisOutput` object (which no longer contains budget analysis) from Task 3 context (your own previous output).
            Combine these three objects into the fields (`requirements`, `research`, `service_analysis`) of the `AnalysisOutput` Pydantic model.
            """,
            expected_output="""\
            A Pydantic object conforming to the `AnalysisOutput` schema.
            This object MUST contain the full outputs from Task 1, Task 2, and Task 3 nested within the appropriate fields.
            The `service_analysis` field will reflect the structure without the preliminary budget check.
            The final output MUST be only the JSON representation of the populated `AnalysisOutput` object.
            """,
            agent=self.service_retrieval_agent(),
            context=[
                self._tasks["extract_requirements"],
                self._tasks["web_research"],
                self._tasks["service_retrieval"],
            ],
            output_pydantic=AnalysisOutput,
        )

    def crew(self) -> Crew:
        """Assemble the crew with all agents and tasks."""
        return Crew(
            agents=[
                self.context_extraction_agent(),
                self.web_search_agent(),
                self.service_retrieval_agent(),
            ],
            tasks=[
                self.extract_requirements_task(),
                self.web_research_task(),
                self.service_retrieval_task(),
                self.consolidate_analysis_task(),
            ],
            process=Process.sequential,
            verbose=False,
            step_callback=delay_before_task,
            max_rpm=15,
            memory=True,
            embedder={
                "provider": "google",
                "config": {
                    "api_key": GOOGLE_API_KEY,
                    "model": "models/text-embedding-004",
                },
            },
        )


In [15]:
# Kickoff Crew 1
analysis_crew = DataAnalysisCrew(vector_store=chroma_store, conversation=conversation)
analysis_result = analysis_crew.crew().kickoff()

[1m[95m# Agent:[00m [1m[92mProject Context and Requirements Extraction Specialist[00m
[95m## Task:[00m [92m            Analyze the following client conversation transcript and extract key project details.
            Focus on understanding the client's needs, goals, constraints, and any specific mentions
            of technology, data, or deliverables.

            CONVERSATION:
            ---
            agent: Hello! Thank you for contacting us. Could you please tell me your name and the company you represent?
customer: Hi, my name is Jane Smith, and I'm from Wildfire Watch North America.
agent: Nice to meet you, Jane. How can we help Wildfire Watch North America today?
customer: We're looking for a quote for a geospatial project focused on wildfire monitoring across Canada and the US. We need to build a comprehensive database of past wildfire events and gather data suitable for fine-tuning an AI model for monitoring and predictive risk management.
agent: Okay, that's a cr

## Step 7 - Quote Structuring & QA (Crew 2)

The second multi-agent crew pipeline `QuoteGenerationCrew`, using CrewAI, takes the structured analysis and:

- Builds a detailed project quote using function-calling and structured output
- Reviews the quote for accuracy, consistency, and alignment with requirements, outputting a structured QA review

This ensures the quote is both comprehensive and quality-checked before SOW generation.


In [16]:
# Crew 2: Quote Structuring & QA
class QuoteGenerationCrew:
    def __init__(self, analysis_output: AnalysisOutput) -> None:
        self.llm = LLM(
            model="gemini/gemini-2.0-flash",
            temperature=0,
            api_key=GOOGLE_API_KEY,
            num_retries=5,
            request_timeout=80,
            seed=42,
        )
        self.analysis_output = analysis_output
        self.today_date_obj = date.today()
        self.today_date_str = self.today_date_obj.strftime("%Y-%m-%d")
        self._tasks = {}
        self.quote_calculator_tool = QuoteCalculatorTool()

    # Agent #4 - Quote Structuring Agent
    def quote_structuring_agent(self) -> Agent:
        return Agent(
            role="Quote Data Structuring Specialist",
            goal=dedent("""\
                Synthesize information from the provided `AnalysisOutput` object (containing requirements, research, and service analysis)
                to structure the detailed project quote data into the `QuoteDetailsOutput` Pydantic format.
                Estimate adjusted development hours and overhead buffer based on project specifics.
                Use the QuoteCalculator tool to calculate the final price.
                Perform all necessary calculations for dates and milestones accurately.
            """),
            backstory=dedent("""\
            You are an expert in translating project analysis into structured quote data.
            You take the consolidated analysis (requirements, research, service details, pricing, budget analysis)
            and meticulously populate a detailed quote structure.
            First, you estimate the realistic total development hours by starting from the sum of base service hours, and only moderately increasing it (typically by 10-40%) to account for project-specific customizations, integration, and full lifecycle (planning, testing, deployment, PM).
            Avoid drastic upscaling unless the project is highly custom or risky. Justify any increase clearly.
            Second, you estimate an appropriate overhead buffer percentage (between 0.10 and 0.20) based on project risks and complexity.
            Third, you use the QuoteCalculator tool with these estimates and the base service prices to get the final project price.
            Finally, you populate the quote summary, line items, and payment schedule, performing date/milestone calculations precisely. Your output is a clean Pydantic object.
            """),
            verbose=True,
            llm=self.llm,
            tools=[self.quote_calculator_tool],
            allow_delegation=False,
        )

    # Agent #5 - Quality Assurance Agent
    def qa_agent(self) -> Agent:
        return Agent(
            role="Quote Quality Assurance Engineer",
            goal=dedent("""\
                Review the structured quote data (`QuoteDetailsOutput`) against the initial analysis (`AnalysisOutput`).
                Check for accuracy in scope, deliverables, pricing, timeline, budget alignment, and consistency, based *only* on the provided inputs.
                Output a structured QA review (`QAReviewOutput`) indicating 'approved' or 'needs_revision' with specific issues if any.
            """),
            backstory=dedent("""\
            You are a meticulous QA specialist focusing on data integrity in technical proposals.
            You cross-reference the structured quote data against the original analysis data provided to you.
            You ensure the quote accurately reflects the client's needs, aligns with the proposed services,
            stays within budget, and is free of calculation errors or inconsistencies based *only* on the provided context.
            You output a structured review Pydantic object. You do NOT have delegation capabilities in this setup.
            """),
            allow_delegation=False,
            verbose=True,
            llm=self.llm,
        )

    # Task 4: Structure the quote data
    def quote_structuring_task(self) -> Task:
        analysis_json = self.analysis_output.model_dump_json(indent=2)
        base_hours_sum = sum(
            s.development_time_hours
            for s in self.analysis_output.service_analysis.recommended_services
        )

        task = Task(
            description=f"""\
            Synthesize information from the provided `AnalysisOutput` JSON below to structure the project quote data.
            Use the data within this JSON as the sole source for requirements, research, and service analysis.
            **Assume today's date is {self.today_date_str} for all calculations.**

            ANALYSIS OUTPUT:
            ```json
            {analysis_json}
            ```

            **Actions:**
            1.  **Estimate Adjusted Hours & Justify:** Start with base hours ({base_hours_sum}).
                - If the project is mostly standard, increase by 10-25% for lifecycle (planning, testing, deployment, PM).
                - If there are moderate customizations or integrations, increase by 25-40%.
                - Only increase more if the project is highly custom or risky, and justify clearly.
                - Arrive at `estimated_total_hours`. Justify the increase (e.g., "Increased base hours by X% for custom work and lifecycle activities...").
            2.  **Estimate Overhead Buffer & Justify:** Estimate `overhead_buffer_percentage` (0.10-0.20) based on risks/complexity. Justify briefly.
            3.  **Calculate Price Breakdown:** Use `QuoteCalculator` tool with `estimated_total_hours`, `recommended_services` (from analysis output), `overhead_buffer_percentage`, `base_hourly_rate` (100). The tool will return a `QuoteCalculationDetails` object.
            4.  **Extract Final Price:** Get the `final_total_price` from the `QuoteCalculationDetails` object returned by the tool. Use this for milestone calculations.
            5.  **Generate Quote ID:** Create Quote ID: Q-{self.today_date_str}-001.
            6.  **Extract Client Info:** Get `client_name`, `client_company`.
            7.  **Determine Project Name:** Create a suitable `project_name`.
            8.  **Calculate Dates:**
                *   `today_obj = date.today()` 
                *   `quote_date_str = today_obj.strftime('%Y-%m-%d')`
                *   `start_date_obj = today_obj + timedelta(days=7)`
                *   `estimated_start_date = start_date_obj.strftime('%Y-%m-%d')`
                *   `weeks_needed = math.ceil(estimated_total_hours / 40)` (Use `estimated_total_hours` from step 1)
                *   `completion_date_obj = start_date_obj + timedelta(weeks=weeks_needed)`
                *   `estimated_completion_date = completion_date_obj.strftime('%Y-%m-%d')`
                *   `midpoint1_weeks = math.ceil(weeks_needed / 3)`
                *   `midpoint1_date_obj = start_date_obj + timedelta(weeks=midpoint1_weeks)`
                *   `midpoint1_date_str = midpoint1_date_obj.strftime('%Y-%m-%d')`
                *   `midpoint2_weeks = math.ceil(weeks_needed * 2 / 3)`
                *   `midpoint2_date_obj = start_date_obj + timedelta(weeks=midpoint2_weeks)`
                *   `midpoint2_date_str = midpoint2_date_obj.strftime('%Y-%m-%d')`
            9.  **Create Line Items:** Create `QuoteLineItem` objects from `recommended_services` (base details).
            10.  **Create Payment Milestones (4 Milestones):**
                *   Milestone 1: "Project Kick-off", 30%, amount = `total_price` * 0.30, due_date = `estimated_start_date`
                *   Milestone 2: "Mid-Project Review 1", 30%, amount = `total_price` * 0.30, due_date = `midpoint1_date_str`
                *   Milestone 3: "Mid-Project Review 2", 30%, amount = `total_price` * 0.30, due_date = `midpoint2_date_str`
                *   Milestone 4: "Project Completion & Handover", 10%, amount = `total_price` * 0.10, due_date = `estimated_completion_date`
                *   Create `PaymentMilestone` objects for each. Ensure amounts are calculated correctly (handle potential rounding if needed, though direct multiplication is usually fine).
            11. **Populate Summary:** Populate `QuoteSummary` using extracted info, project name, quote ID, `quote_date_str` (from step 7), final `total_price`, `estimated_total_hours`, calculated dates (`estimated_start_date`, `estimated_completion_date`), the combined `estimation_justification`, **and the `overhead_buffer_percentage` estimated in step 2.**
            12. **Assemble Output:** Combine `QuoteSummary`, `line_items`, and `payment_schedule` into `QuoteDetailsOutput`.

            Ensure all estimations are justified, tool is called correctly, all dates use the current year ({self.today_date_obj.year}), milestones sum to 100%, and data mapping is accurate.
            """,
            expected_output=f"""\
            A Pydantic object conforming to the `QuoteDetailsOutput` schema.
            - `quote_summary.quote_date` must be today's date ({self.today_date_str}).
            - `quote_summary.estimated_start_date` and `quote_summary.estimated_completion_date` must be calculated correctly based on today's date and `total_estimated_hours`.
            - `quote_summary.estimation_justification` must be populated.
            - **`quote_summary.price_calculation_details` must be populated with the full `QuoteCalculationDetails` object returned by the QuoteCalculator tool.**
            - **`quote_summary.total_price` must match `quote_summary.price_calculation_details.final_total_price`.**
            - **`quote_summary.total_estimated_hours` must match `quote_summary.price_calculation_details.total_estimated_hours`.**
            - `payment_schedule` must contain 4 milestones (30%, 30%, 30%, 10%) with correctly calculated amounts based on the final total price.
            - All dates must reflect the current year ({self.today_date_obj.year}).

            The final output MUST be only the JSON representation of this populated object.
            """,
            agent=self.quote_structuring_agent(),
            output_pydantic=QuoteDetailsOutput,
        )
        self._tasks["quote_structuring"] = task
        return task

    # Task 5: Review and QA the quote data
    def qa_review_task(self) -> Task:
        analysis_json = self.analysis_output.model_dump_json(indent=2)
        task = Task(
            description=f"""\
            Review the structured quote data (`QuoteDetailsOutput` object from Task 4 context) for quality and accuracy against the provided `AnalysisOutput` JSON below.
            **You MUST use the provided `AnalysisOutput` JSON and the `QuoteDetailsOutput` from Task 4 context for your review.**

            ANALYSIS OUTPUT (Source of Truth):
            ```json
            {analysis_json}
            ```

            QUOTE DETAILS (To Be Reviewed - from Task 4 context):
            *(The `QuoteDetailsOutput` JSON from the previous task will be available here in the actual execution context)*

            **Verify the following based *only* on comparing Task 4 output (`QuoteDetailsOutput`) against the `AnalysisOutput` JSON and the logic described for Task 4:**
                - **Completeness:**
                    - Does `line_items` reflect `recommended_services`?
                    - **Is `quote_summary.price_calculation_details` present and fully populated?**
                - **Plausibility & Justification:**
                    - Is `quote_summary.total_estimated_hours` significantly higher than base hours (sum of `estimated_hours` in `line_items`)?
                    - Is `estimation_justification` present and reasonable?
                    - **Does `price_calculation_details.overhead_buffer_percentage` fall within the expected range (e.g., 0.10 to 0.20)?**
                - **Internal Consistency:**
                    - **Does `quote_summary.total_price` exactly match `quote_summary.price_calculation_details.final_total_price`?**
                    - **Does `quote_summary.total_estimated_hours` exactly match `quote_summary.price_calculation_details.total_estimated_hours`?**
                    - **Verify the calculation within `price_calculation_details`: Does `(details.subtotal_before_overhead * (1 + details.overhead_buffer_percentage))` approximately equal `details.final_total_price` (allowing for minor rounding)?**
                    - **Verify subtotal: Does `details.sum_base_service_prices + details.additional_development_cost` approximately equal `details.subtotal_before_overhead`?**
                - **Accuracy (Milestones/Dates):**
                    - Does `payment_schedule` contain exactly 4 milestones?
                    - Do percentages sum to 100?
                    - Are amounts calculated correctly based on `quote_summary.total_price`?
                    - Does `quote_summary.quote_date` match today's date ({self.today_date_str})?
                    - Is `estimated_completion_date` calculated correctly based on `quote_summary.total_estimated_hours`? (Verify logic as before). Are dates in the correct year?
                - **Budget Alignment:** (Check remains the same - compare `quote_summary.total_price` to parsed `budget`)
                - **Consistency:** Are `client_name`, `client_company`, `project_name` consistent?
                - **Clarity:** Are `line_items` descriptions reasonable? Is `payment_schedule` logical?

            **Output Format:**
            - If issues found (budget exceeded, date mismatch, milestone error, missing/poor justification, inconsistency), create `QAIssue` objects. Set `overall_status` to 'needs_revision'.
            - If satisfactory, set `overall_status` to 'approved'.
            - Optional `QAImprovementSuggestion`s.
            - Brief Markdown `summary`.
            - Base findings strictly on comparison.
            """,
            expected_output="""\
            A Pydantic object conforming to the `QAReviewOutput` schema.
            - `overall_status`: 'approved' or 'needs_revision'.
                        - `issues`: List of `QAIssue` objects if status is 'needs_revision'. Issues must detail problems based *only* on the context (e.g., budget comparison failure, date calculation mismatch, milestone errors, **inconsistent price breakdown**, missing justification).
            - `improvement_suggestions`: Optional list.
            - `summary`: Brief Markdown summary.

            The final output MUST be only the JSON representation of the populated `QAReviewOutput` object.
            """,
            agent=self.qa_agent(),
            context=[self._tasks["quote_structuring"]],
            output_pydantic=QAReviewOutput,
        )
        return task

    def crew(self) -> Crew:
        """Assemble the crew with all agents and tasks."""
        return Crew(
            agents=[
                self.quote_structuring_agent(),
                self.qa_agent(),
            ],
            tasks=[self.quote_structuring_task(), self.qa_review_task()],
            process=Process.sequential,
            verbose=False,
            step_callback=delay_before_task,
            max_rpm=15,
        )

In [17]:
# Get Crew 1 output
analysis_data: AnalysisOutput = analysis_result.pydantic

# Kickoff Crew 2
quote_crew = QuoteGenerationCrew(analysis_output=analysis_data)
quote_result = quote_crew.crew().kickoff()

[1m[95m# Agent:[00m [1m[92mQuote Data Structuring Specialist[00m
[95m## Task:[00m [92m            Synthesize information from the provided `AnalysisOutput` JSON below to structure the project quote data.
            Use the data within this JSON as the sole source for requirements, research, and service analysis.
            **Assume today's date is 2025-04-16 for all calculations.**

            ANALYSIS OUTPUT:
            ```json
            {
  "requirements": {
    "client_name": "Jane Smith",
    "client_company": "Wildfire Watch North America",
    "project_type": "Geospatial AI project",
    "main_objective": "Build a comprehensive database of past wildfire events and gather data suitable for fine-tuning an AI model for monitoring and predictive risk management of wildfires across Canada and the US.",
    "data_sources": [
      "MODIS satellite imagery",
      "VIIRS satellite imagery",
      "Landsat satellite imagery",
      "Sentinel satellite imagery",
      "Hist

## Step 8 - Statement of Work (SOW) Generation

Generate a professional SOW document from the structured quote and analysis. The SOW covers:

- Project overview, objectives, and summary
- In-scope and out-of-scope services and deliverables
- Key features, technical specifications, and data sources
- Acceptance criteria for each deliverable
- Timeline with estimated start, completion, and milestones
- Detailed price breakdown and payment schedule
- Roles and responsibilities for provider and client
- Assumptions, change management process, reporting, and disclaimer

The SOW is auto-filled using structured data, ensuring clarity and consistency for client approval.


### 8.1 - Create the SOW template

The template is formatted as markdown, for consistent output, but also include HTML tags to bypass formatting issues within Kaggle notebooks.


In [18]:
SOW_TEMPLATE = dedent("""\
    # Statement of Work

    ## 1. Introduction
    This Statement of Work (SOW) is made and entered into as of {today_date}, by and between {provider_name} ("Provider"), located at {provider_address}, and {client_company} ("Client"). This document outlines the scope, deliverables, timeline, and costs for the {project_name} project.

    ## 2. Project Overview & Objectives
    **Project Name:** {project_name} <br>
    **Main Objective:** {main_objective} <br>
    **Project Summary:**
    *{project_summary}*

    ## 3. Scope of Work
    ### 3.1. In-Scope Services & Deliverables
    The following services, based on the approved quote (ID: {quote_id}), are included:
    {line_items_markdown}

    Key features to be delivered include:
    {features_markdown}

    {customization_notes_section}

    ### 3.2. Out-of-Scope
    The following are explicitly excluded:
    *   Ongoing maintenance beyond a standard 30-day warranty period post-completion.
    *   Hardware procurement or setup.
    *   Data acquisition or licensing costs unless specified in line items.
    *   Training beyond basic handover documentation.
    *   Work not explicitly listed in section 3.1.

    ## 4. Technical Specifications
    *   **Key Technologies:**
        {key_technologies}
    *   **Data Sources:**
        {data_sources_list}
    *   **Performance Requirements:** Accuracy target: {accuracy_target}.
    *   **Outputs:**
        {outputs_list}

    ## 5. Deliverables & Acceptance Criteria
    Major deliverables correspond to the services listed in 3.1.
    {acceptance_criteria_markdown}

    ## 6. Project Timeline
    *   **Estimated Start Date:** {start_date}
    *   **Estimated Completion Date:** {completion_date}
    *   **Total Estimated Effort:** Approximately {total_hours} hours.

    ## 7. Roles & Responsibilities
    *   **Provider ({provider_name}):** Project management, development, testing, deployment, documentation.
    *   **Client ({client_company}):** Provide timely access to data sources ({data_sources_list}), provide subject matter expertise, assign a primary point of contact ({client_name}), participate in reviews, provide timely feedback/approvals.

    ## 8. Pricing & Payment Schedule
    *   **Total Project Price:** ${total_price:.2f}

    ### 8.1. Price Breakdown
    {price_breakdown_markdown}

    ### 8.2. Payment Schedule
    {payment_schedule_markdown}

    ## 9. Assumptions
    *   Client will provide access to specified data sources within 1 week of project start.
    *   Client point of contact will be available for weekly check-ins and provide feedback within 2 business days.
    *   Scope is fixed as defined in Section 3.1.
    *   Data quality from provided sources is sufficient for model training.
    *   {add_assumptions_from_research}

    ## 10. Change Management
    Any requested changes to scope, deliverables, or timeline must be submitted via a formal Change Request process. Approved changes may impact cost and schedule.

    ## 11. Reporting
    Provider will provide brief weekly status updates via email and hold bi-weekly review meetings with the Client.

    ## 12. Disclaimer
    Disclaimer: This Statement of Work was generated with AI assistance based on the approved quote (Quote ID: {quote_id}). It requires final human review and approval by both Provider and Client before execution.
    """)

### Step 8.2 - Definie the function that generate the SOW


In [19]:
def generate_sow(
    quote_details: QuoteDetailsOutput,
    analysis: AnalysisOutput,
    provider_info: Dict[str, str],
) -> str:
    """Generates the SOW Markdown document from structured data."""

    # Prepare template data
    qs = quote_details.quote_summary
    req = analysis.requirements
    sa = analysis.service_analysis
    res = analysis.research
    details = qs.price_calculation_details
    additional_development_cost = details.additional_development_cost
    overhead_amount = details.overhead_amount
    buffer_percentage_used = details.overhead_buffer_percentage
    total_estimated_hours = details.total_estimated_hours

    # Format Price Breakdown using HTML table
    price_breakdown_lines = ["<table>"]
    price_breakdown_lines.append(
        "<tr><th>Item</th><th>Estimated Cost</th><th>Notes</th></tr>"
    )

    # Add rows for services
    for item in quote_details.line_items:
        price_to_display = item.price
        price_breakdown_lines.append(
            f"<tr><td>Standard Service: {item.service_name}</td><td>${price_to_display:.2f}</td><td>Base price for standard offering</td></tr>"
        )

    # Add additional development cost row if necessary
    if additional_development_cost > 0:
        price_breakdown_lines.append(
            f"<tr><td>Additional Development & Lifecycle</td><td>${additional_development_cost:.2f}</td><td>Custom work, testing, PM, etc. ({total_estimated_hours} hrs)</td></tr>"
        )

    # Add overhead row if necessary
    if overhead_amount > 0:
        price_breakdown_lines.append(
            f"<tr><td>Overhead & Contingency ({buffer_percentage_used * 100:.0f}%)</td><td>${overhead_amount:.2f}</td><td>Project complexity & risk buffer</td></tr>"
        )

    # Add price breakdown by service
    price_breakdown_lines.append(
        f"<tr><th>Total Estimated Project Price</th><th>${qs.total_price:.2f}</th><td></td></tr>"
    )
    price_breakdown_lines.append("</table>")
    price_breakdown_md = "\n".join(price_breakdown_lines)

    line_items = "\n".join(
        [
            f"<li><b>{item.service_name}</b>: {item.description} (Base: ${item.price:.2f}, {item.estimated_hours} hrs)</li>"
            for item in quote_details.line_items
        ]
    )
    line_items_md = f"\n<ul>\n{line_items}\n</ul>\n"

    # Features as HTML list
    features_html = "<ul>\n"
    features_html += "\n".join(
        [f"<li>{feature}</li>" for feature in req.features_needed]
    )
    features_html += "\n</ul>"

    # Payment schedule as HTML table
    payment_schedule_lines = ["<table>"]
    payment_schedule_lines.append(
        "<tr><th>Milestone</th><th>Percentage</th><th>Amount</th><th>Due Date</th></tr>"
    )

    # Add rows for each milestone
    for milestone in quote_details.payment_schedule:
        payment_schedule_lines.append(
            f"<tr><td>{milestone.milestone}</td><td>{milestone.percentage}%</td><td>${milestone.amount:.2f}</td><td>{milestone.due_date}</td></tr>"
        )

    payment_schedule_lines.append("</table>")
    payment_schedule_md = "\n".join(payment_schedule_lines)

    # Acceptance criteria as HTML list
    acceptance_criteria_lines = ["<ul>"]

    for i, item in enumerate(quote_details.line_items):
        feature_text = req.features_needed[i] if i < len(req.features_needed) else "N/A"
        acceptance_criteria_lines.append(
            f"<li><b>{item.service_name} Deliverable</b>: Corresponds to service {item.service_name}.<br>"
            f"<ul><li><b>Acceptance Criteria:</b> Successfully delivers functionality related to '{feature_text}' as described.</li></ul></li>"
        )

    acceptance_criteria_lines.append("</ul>")
    acceptance_criteria_html = "\n".join(acceptance_criteria_lines)

    # Key technologies
    key_tech = (
        res.best_practices_insights
        if res.best_practices_insights
        else "Standard web/cloud technologies as appropriate"
    )

    # Assumptions from research
    add_assumptions = (
        f"Potential challenges identified during research include: {res.potential_challenges}"
        if res.potential_challenges
        else ""
    )

    # Populate Template
    sow_content = SOW_TEMPLATE.format(
        today_date=date.today().strftime("%Y-%m-%d"),
        provider_name=provider_info.get("name", "Acme Inc."),
        provider_address=provider_info.get(
            "address", "123 Business Street, City, Country"
        ),
        client_company=qs.client_company or "N/A",
        project_name=qs.project_name,
        main_objective=req.main_objective or "N/A",
        project_summary=req.summary or "",
        quote_id=qs.quote_id,
        line_items_markdown=line_items_md,
        features_markdown=features_html,
        customization_notes_section=(
            f"<p><em>Includes custom development considerations for: {', '.join(sa.customization_notes)}</em></p>"
            if sa.customization_notes
            else ""
        ),
        key_technologies=key_tech.strip(),
        data_sources_list=", ".join(req.data_sources) if req.data_sources else "N/A",
        accuracy_target=req.technical_requirements.accuracy
        if req.technical_requirements
        else "N/A",
        outputs_list=", ".join(req.technical_requirements.outputs)
        if req.technical_requirements and req.technical_requirements.outputs
        else "N/A",
        acceptance_criteria_markdown=acceptance_criteria_html,
        start_date=qs.estimated_start_date,
        completion_date=qs.estimated_completion_date,
        total_hours=qs.total_estimated_hours,
        client_name=qs.client_name or "N/A",
        total_price=qs.total_price,
        price_breakdown_markdown=price_breakdown_md,
        payment_schedule_markdown=payment_schedule_md,
        add_assumptions_from_research=add_assumptions.strip(),
    )

    return sow_content

### 8.3 - SOW Generation Execution

The function `create_final_sow` orchestrates the final SOW generation process:

- Takes the outputs from both crews
- Validates that outputs are present and of the correct types
- Calls `generate_sow` to produce the final SOW markdown document


In [20]:
def create_final_sow(quote_result: Any, analysis_data: AnalysisOutput) -> Optional[str]:
    """
    Generates the final Statement of Work (SOW) if the quote passes QA.

    Args:
        quote_result: The output object from the QuoteGenerationCrew.
        analysis_data: The output object from the DataAnalysisCrew.

    Returns:
        The generated SOW as a markdown string, or None if QA failed or an error occurred.
    """
    provider_details = {
        "name": "Acme Inc.",
        "address": "123 Your Street, ABCity, EDCountry",
    }

    try:
        # Ensure quote_result and its tasks_output are valid
        if not (
            quote_result
            and hasattr(quote_result, "tasks_output")
            and len(quote_result.tasks_output) == 2
        ):
            raise ValueError("Invalid quote_result structure.")

        quote_details_task = quote_result.tasks_output[0]
        qa_output_task = quote_result.tasks_output[-1]

        # Check task outputs against pydantic models
        if not (
            hasattr(qa_output_task, "pydantic")
            and isinstance(qa_output_task.pydantic, QAReviewOutput)
            and hasattr(quote_details_task, "pydantic")
            and isinstance(quote_details_task.pydantic, QuoteDetailsOutput)
        ):
            raise ValueError("Unexpected output types from quote generation tasks.")

        # qa_data: QAReviewOutput = qa_output_task.pydantic
        quote_data: QuoteDetailsOutput = quote_details_task.pydantic

        # if qa_data.overall_status == "approved":
        final_sow = generate_sow(
            quote_details=quote_data,
            analysis=analysis_data,
            provider_info=provider_details,
        )
        return final_sow

    except Exception as e:
        return f"Error during SOW generation: {e}"

In [21]:
# Generate the SOW
final_sow = create_final_sow(quote_result, analysis_data)

# Display as markdown, using display and HTML to bypass Kaggle's markdown limitations
display(HTML(md.markdown(final_sow)))

Item,Estimated Cost,Notes
Standard Service: AI-Assisted Geospatial Data Wrangling,$950.00,Base price for standard offering
Standard Service: Satellite Image Analysis with AI,$1200.00,Base price for standard offering
Standard Service: Spatial Predictive Modeling,$1500.00,Base price for standard offering
Standard Service: Geospatial React Dashboard,$1400.00,Base price for standard offering
Standard Service: AI Integration for Geospatial,$900.00,Base price for standard offering
Additional Development & Lifecycle,$23000.00,"Custom work, testing, PM, etc. (230 hrs)"
Overhead & Contingency (15%),$4342.50,Project complexity & risk buffer
Total Estimated Project Price,$33292.50,

Milestone,Percentage,Amount,Due Date
Project Kick-off,30%,$9987.75,2025-04-23
Mid-Project Review 1,30%,$9987.75,2025-06-04
Mid-Project Review 2,30%,$9987.75,2025-08-06
Project Completion & Handover,10%,$3329.25,2025-10-08


---
## Findings, Limitations, and Next Steps

**Key Findings:**

- The GenAI quoting agent can automate much of the requirements extraction, service recommendation, and quote generation process for AI-powered geospatial projects.
- The use of multi-agent orchestration, vector search, and structured output enables modular, auditable, and extensible workflows.
- Automated SOW generation from structured quote and analysis ensures professional, consistent client documentation with minimal manual effort.

**Limitations:**

- The QA agent currently only reviews and recommends changes. It does not trigger an automatic update or re-run of the quote if issues are found.
- There is no feedback loop to update the quote based on QA recommendations. This could be improved by adding a third agent to revise the quote or by making the QA output actionable for a human reviewer.
- The QA output could be oriented toward the company team, highlighting what to review before finalizing the quote with the client.
- The SOW is generated based on available outputs, even if the QA agent flags issues. A stricter gating mechanism could be considered.

**Next Steps / Future Improvements:**

- Add a third agent or workflow step to revise the quote based on QA recommendations, enabling a closed feedback loop.
- Enhance the QA output to provide actionable checklists for human reviewers.
- Expand the system to handle more complex negotiation scenarios or multi-round quoting.
- Improve error handling and user experience for non-technical users.
- Implement the final solution as fully featured cloud solution using Mastra.ai, within an existing project, a new dedicated project/backend, or even as MCP server. 

---


## References & Acknowledgements

- [Kaggle Gen AI Intensive Course Capstone 2025Q1](https://www.kaggle.com/competitions/gen-ai-intensive-course-capstone-2025q1)
- [CrewAI](https://docs.crewai.com/introduction)
- [LangChain](https://python.langchain.com/docs/introduction/)
- [Chroma Vector DB](https://docs.trychroma.com/docs/overview/introduction)
- [Google Generative AI](https://ai.google.dev/)
