# Hugging_Face_smol_agent_contract_search

Taken from the lab agent notebook "HuggingFace smolagents: Lightweight and Focused Agents".

**Objective:** This notebook introduces `smolagents`, a lightweight and easy-to-understand agent framework from HuggingFace. You will learn its core philosophy and how to build a simple, tool-using agent based on the official documentation.

**Target Audience:** Software engineers attending the AI-Driven Software Engineering Program.

**Core Philosophy:** Simplicity and clarity. `smolagents` is not designed to be a sprawling, all-encompassing framework. Instead, it provides a minimal, clean, and effective implementation of a tool-using agent. It's an excellent choice for learning the fundamentals of the ReAct (Reason+Act) loop or for projects that need a simple, focused agent without a lot of overhead.
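Here is a toy ReAct loop in plain Python that makes the idea concrete. It is a simplified illustration, not how smolagents is implemented. `scripted_model` is a hypothetical stand-in for an LLM. `react_loop` repeats three phases: it asks the model for the next step (Reason), runs the chosen tool (Act), and appends the tool's result to the history (Observe). It stops when the model gives a final answer or when a step budget runs out.

```python
from typing import Callable

# Toy ReAct loop: a simplified illustration, not smolagents' implementation.
# The "model" is a hypothetical scripted stand-in for an LLM.

def scripted_model(history: list[str]) -> dict:
    """Request one tool call, then answer with the last observation."""
    if not any(line.startswith("Observation:") for line in history):
        return {"thought": "I need the contract record.",
                "tool": "lookup", "args": {"contract_id": "ABC-1"}}
    return {"thought": "I have what I need.",
            "final_answer": history[-1].removeprefix("Observation: ")}

def react_loop(model: Callable[[list[str]], dict],
               tools: dict[str, Callable[..., str]],
               task: str, max_steps: int = 5) -> str:
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        step = model(history)                          # Reason: pick the next action
        history.append(f"Thought: {step['thought']}")
        if "final_answer" in step:
            return step["final_answer"]
        result = tools[step["tool"]](**step["args"])   # Act: run the chosen tool
        history.append(f"Observation: {result}")       # Observe: feed the result back
    raise RuntimeError("No final answer within max_steps")

tools = {"lookup": lambda contract_id: f"record for {contract_id}"}
answer = react_loop(scripted_model, tools, "Find contract ABC-1")
print(answer)  # record for ABC-1
```

In smolagents' `CodeAgent`, the "Act" phase works differently. The model writes a Python snippet, and that snippet can call several tools in one step. It does not emit a single structured tool call.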

## 1. Setup

This notebook assumes `smolagents` and its dependencies are already installed. As per the official documentation, install the `[litellm]` extra, which is required to use OpenAI models like GPT-4o through `LiteLLMModel`: `pip install "smolagents[litellm]" python-dotenv`.
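You can confirm that the required packages are importable before running the rest of the notebook. This sketch uses only the standard library. The module list is an assumption based on this notebook's imports; note that `python-dotenv` installs a module named `dotenv`.

```python
import importlib.util

# Top-level modules this notebook imports. Assumption: they are provided by
# `pip install "smolagents[litellm]" python-dotenv`.
REQUIRED_MODULES = ["smolagents", "litellm", "dotenv"]

def missing_modules(names: list[str]) -> list[str]:
    """Return the module names that cannot be located on the current sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_modules(REQUIRED_MODULES)
if missing:
    print(f"Missing modules: {missing}")
    print('Install with: pip install "smolagents[litellm]" python-dotenv')
else:
    print("All required modules are importable.")
```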

In [1]:
import os
import json
import sys
from dotenv import load_dotenv

load_dotenv()

# Add the project's root directory to the Python path to ensure 'utils' can be imported.
# Get the current working directory and navigate to project root
current_dir = os.getcwd()
print(f"Current working directory: {current_dir}")

# Assumes the notebook runs from the backend/ folder: go up one level to reach the project root
project_root = os.path.abspath(os.path.join(current_dir, '..'))
print(f"Project root: {project_root}")

if project_root not in sys.path:
    sys.path.insert(0, project_root)
    print(f"Added {project_root} to sys.path")

# Verify the utils directory exists
utils_path = os.path.join(project_root, 'utils')
print(f"Utils path exists: {os.path.exists(utils_path)}")

try:
    from utils import setup_llm_client, get_completion, save_artifact, load_artifact, clean_llm_output
    print("Successfully imported utils functions!")
except ImportError as e:
    print(f"Import error: {e}")
    print(f"sys.path: {sys.path}")

if not os.getenv("OPENAI_API_KEY"):
    print("ERROR: OPENAI_API_KEY not found. Please check your .env file.")
else:
    print("OPENAI_API_KEY found!")

Current working directory: c:\Users\labadmin\Documents\repo\KMSH_contracting\backend
Project root: c:\Users\labadmin\Documents\repo\KMSH_contracting
Added c:\Users\labadmin\Documents\repo\KMSH_contracting to sys.path
Utils path exists: True
Successfully imported utils functions!
OPENAI_API_KEY found!
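The setup cell assumes the notebook runs exactly one level below the project root, in `backend/`. A sturdier option is to walk upward from the starting directory until you find the `utils` directory. `find_project_root` is a hypothetical helper, not part of the lab's `utils` package. It assumes the project root is the nearest ancestor that contains `utils/`.

```python
import tempfile
from pathlib import Path

def find_project_root(start: Path, marker: str = "utils") -> Path:
    """Return the nearest ancestor of `start` (or `start` itself) that contains
    a `marker` directory. Assumption: the project root is the first directory
    on the way up that holds `utils/`."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"No directory above {start} contains '{marker}/'")

# Demo on a throwaway directory tree: <tmp>/utils and <tmp>/backend
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp).resolve()
    (demo_root / "utils").mkdir()
    (demo_root / "backend").mkdir()
    found = find_project_root(demo_root / "backend")
    print(found == demo_root)  # True

# In the setup cell, this could replace the "parent of cwd" assumption:
# project_root = str(find_project_root(Path.cwd()))
```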


## 2. Foundational Agent with a Custom Tool

The primary use case for `smolagents` is creating agents that use tools. One way to define a custom tool is to write a class that inherits from the library's `Tool` base class. The class declares `name`, `description`, `inputs`, and `output_type`, and implements its execution logic in a `forward` method. For simple functions, `smolagents` also provides a `@tool` decorator.

The process involves:
1.  Defining a custom tool class that inherits from `smolagents.tools.Tool`.
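2.  Initializing the `CodeAgent` with a `LiteLLMModel` wrapper and a list of tools. In the cell below, the agent receives the built-in `DuckDuckGoSearchTool`. The `ScrapeContractData` class is a placeholder: it is defined there but not passed to the agent.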
3.  Running the agent with a prompt.
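`smolagents` expects the parameters of a `Tool` subclass's `forward` method (after `self`) to match the keys of its `inputs` dict. As we understand it, the library raises an error when you create a tool with a mismatch. The stdlib-only sketch below mimics that kind of check so the rule is visible without importing `smolagents`. `ToyTool`, `BadTool`, and `check_forward_matches_inputs` are hypothetical names, and the library's own validation covers more than this.

```python
import inspect

class ToyTool:
    """Hypothetical stand-in declaring the same attributes as ScrapeContractData."""
    name = "scrape_contract_data"
    description = "Scrapes data on awarded government contracts."
    inputs = {"contract_id": {"type": "string",
                              "description": "The ID of the government contract."}}
    output_type = "string"

    def forward(self, contract_id: str) -> str:
        return f"Scraped data for contract ID {contract_id}"

class BadTool(ToyTool):
    # Parameter name no longer matches the "contract_id" key declared in inputs.
    def forward(self, contract: str) -> str:
        return contract

def check_forward_matches_inputs(tool_cls: type) -> list[str]:
    """Return a list of mismatches; an empty list means forward() and inputs agree."""
    params = list(inspect.signature(tool_cls.forward).parameters)
    if not params or params[0] != "self":
        return ["forward() must take self as its first argument"]
    declared, accepted = set(tool_cls.inputs), set(params[1:])
    problems = [f"declared but not accepted: {n}" for n in sorted(declared - accepted)]
    problems += [f"accepted but not declared: {n}" for n in sorted(accepted - declared)]
    return problems

print(check_forward_matches_inputs(ToyTool))  # []
print(check_forward_matches_inputs(BadTool))
```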

In [3]:
from smolagents import CodeAgent, LiteLLMModel, DuckDuckGoSearchTool
from smolagents.tools import Tool
from typing import ClassVar

# 1. Define a custom tool class inheriting from smolagents' Tool
class ScrapeContractData(Tool):
    """Tool to webscrape data on awarded government contracts."""
    name: ClassVar[str] = "scrape_contract_data"
    description: ClassVar[str] = "Scrapes data on awarded government contracts."
    inputs: ClassVar[dict] = {
        "contract_id": {
            "type": "string",
            "description": "The ID of the government contract."
        }
    }
    output_type: ClassVar[str] = "string"

    # forward() holds the tool's execution logic.
    def forward(self, contract_id: str) -> str:
        # Placeholder: no real scraping yet, just echo the requested ID
        return f"Scraped data for contract ID {contract_id}"

# 2. Initialize the Model using the LiteLLMModel wrapper for OpenAI models
model = LiteLLMModel(model_id="gpt-4o")

# 3. Initialize the agent. It is given the built-in DuckDuckGoSearchTool;
#    the placeholder ScrapeContractData tool above is not passed in.
smol_agent = CodeAgent(
    model=model,
    tools=[DuckDuckGoSearchTool()]
)

seed_data_format = """
[
  {
    "ContractName": "Example Contract 1",
    "CompanyName": "Acme Corp",
    "TotalContractValue": "$1,000,000",
    "DateAwarded": "2024-05-10",
    "Location": "Washington, DC",
    "LengthOfContract": "3 years",
    "NAICS": "541330",
    "PSC": "R499"
  }
]
"""


# 4. Run the agent
print("--- Running smol-agent ---")
# The .run() method will print the verbose thought process by default.
response = smol_agent.run(
    f"""For the top 10 U.S. government contracts awarded over the past two years, find the following information:
                          
    - Contract Name, 
    - CompanyName, 
    - TotalContractValue, 
    - Date Awarded, 
    - Location, 
    - Length of Contract and NAICS or PSC categories if available.

    If you cannot find all the information for a contract, include as much as you can and use "N/A" for missing fields.

    Extract the data and format it as a JSON array with the following format:
    {seed_data_format}

    The final JSON array must be wrapped in <code> ... </code> tags.
    
    """)

print("\n--- Final Answer ---")

# Save the response and cleaned code to artifacts
if response:    
    print(f"Response type: {type(response)}")
    print(response)
    
    # Convert response to string if it's not already
    response_str = str(response) if not isinstance(response, str) else response
    
    try:
        # Only call clean_llm_output if we have a string
        cleaned_code = clean_llm_output(response_str, language='json')
        print(f"Cleaned code type: {type(cleaned_code)}")
        
        # Convert to string for saving if it's not already
        if isinstance(cleaned_code, list):
            cleaned_code_str = json.dumps(cleaned_code, indent=2)
        elif not isinstance(cleaned_code, str):
            cleaned_code_str = str(cleaned_code)
        else:
            cleaned_code_str = cleaned_code
            
        print("Cleaned output:")
        print(cleaned_code_str)
        
        # Save the cleaned output
        save_artifact(cleaned_code_str, "app/contract_search_cleaned.json", overwrite=True)
        print("Saved final answer to contract_search_cleaned.json")
        
    except Exception as e:
        print(f"Error cleaning output: {e}")
        print("Saving raw response instead...")
        save_artifact(response_str, "app/contract_search_raw.txt", overwrite=True)
        print("Saved raw response to contract_search_raw.txt")
        
else:
    print("Unable to save artifacts. Response is empty.")

--- Running smol-agent ---


10:59:36 - LiteLLM:INFO: utils.py:3389 -
LiteLLM completion() model= gpt-4o; provider = openai
10:59:40 - LiteLLM:INFO: utils.py:1282 - Wrapper: Completed Call, calling success_handler


10:59:41 - LiteLLM:INFO: utils.py:3389 -
LiteLLM completion() model= gpt-4o; provider = openai
10:59:45 - LiteLLM:INFO: utils.py:1282 - Wrapper: Completed Call, calling success_handler


10:59:46 - LiteLLM:INFO: utils.py:3389 -
LiteLLM completion() model= gpt-4o; provider = openai
10:59:53 - LiteLLM:INFO: utils.py:1282 - Wrapper: Completed Call, calling success_handler


10:59:54 - LiteLLM:INFO: utils.py:3389 -
LiteLLM completion() model= gpt-4o; provider = openai
11:00:04 - LiteLLM:INFO: utils.py:1282 - Wrapper: Completed Call, calling success_handler


11:00:04 - LiteLLM:INFO: utils.py:3389 -
LiteLLM completion() model= gpt-4o; provider = openai
11:00:19 - LiteLLM:INFO: utils.py:1282 - Wrapper: Completed Call, calling success_handler


11:00:19 - LiteLLM:INFO: utils.py:3389 -
LiteLLM completion() model= gpt-4o; provider = openai
11:00:27 - LiteLLM:INFO: utils.py:1282 - Wrapper: Completed Call, calling success_handler


11:00:27 - LiteLLM:INFO: utils.py:3389 -
LiteLLM completion() model= gpt-4o; provider = openai
11:00:34 - LiteLLM:INFO: utils.py:1282 - Wrapper: Completed Call, calling success_handler


11:00:34 - LiteLLM:INFO: utils.py:3389 -
LiteLLM completion() model= gpt-4o; provider = openai
11:00:37 - LiteLLM:INFO: utils.py:1282 - Wrapper: Completed Call, calling success_handler


11:00:37 - LiteLLM:INFO: utils.py:3389 -
LiteLLM completion() model= gpt-4o; provider = openai
11:00:44 - LiteLLM:INFO: utils.py:1282 - Wrapper: Completed Call, calling success_handler


11:00:44 - LiteLLM:INFO: utils.py:3389 -
LiteLLM completion() model= gpt-4o; provider = openai
11:00:48 - LiteLLM:INFO: utils.py:1282 - Wrapper: Completed Call, calling success_handler


11:00:49 - LiteLLM:INFO: utils.py:3389 -
LiteLLM completion() model= gpt-4o; provider = openai
11:00:55 - LiteLLM:INFO: utils.py:1282 - Wrapper: Completed Call, calling success_handler


11:00:55 - LiteLLM:INFO: utils.py:3389 -
LiteLLM completion() model= gpt-4o; provider = openai
11:00:59 - LiteLLM:INFO: utils.py:1282 - Wrapper: Completed Call, calling success_handler


11:01:00 - LiteLLM:INFO: utils.py:3389 -
LiteLLM completion() model= gpt-4o; provider = openai
11:01:04 - LiteLLM:INFO: utils.py:1282 - Wrapper: Completed Call, calling success_handler


11:01:04 - LiteLLM:INFO: utils.py:3389 -
LiteLLM completion() model= gpt-4o; provider = openai
11:01:10 - LiteLLM:INFO: utils.py:1282 - Wrapper: Completed Call, calling success_handler


11:01:10 - LiteLLM:INFO: utils.py:3389 -
LiteLLM completion() model= gpt-4o; provider = openai
11:01:17 - LiteLLM:INFO: utils.py:1282 - Wrapper: Completed Call, calling success_handler


11:01:17 - LiteLLM:INFO: utils.py:3389 -
LiteLLM completion() model= gpt-4o; provider = openai
11:01:38 - LiteLLM:INFO: utils.py:1282 - Wrapper: Completed Call, calling success_handler


11:01:38 - LiteLLM:INFO: utils.py:3389 -
LiteLLM completion() model= gpt-4o; provider = openai
11:01:43 - LiteLLM:INFO: utils.py:1282 - Wrapper: Completed Call, calling success_handler


11:01:43 - LiteLLM:INFO: utils.py:3389 -
LiteLLM completion() model= gpt-4o; provider = openai
11:01:48 - LiteLLM:INFO: utils.py:1282 - Wrapper: Completed Call, calling success_handler


11:01:48 - LiteLLM:INFO: utils.py:3389 -
LiteLLM completion() model= gpt-4o; provider = openai
11:01:54 - LiteLLM:INFO: utils.py:1282 - Wrapper: Completed Call, calling success_handler


11:01:54 - LiteLLM:INFO: utils.py:3389 -
LiteLLM completion() model= gpt-4o; provider = openai
11:01:58 - LiteLLM:INFO: utils.py:1282 - Wrapper: Completed Call, calling success_handler


11:01:58 - LiteLLM:INFO: utils.py:3389 -
LiteLLM completion() model= gpt-4o; provider = openai
11:02:44 - LiteLLM:INFO: utils.py:1282 - Wrapper: Completed Call, calling success_handler



--- Final Answer ---
Response type: <class 'smolagents.agent_types.AgentText'>
Certainly! Here is the JSON array for the top U.S. government contracts formatted as required:

<code>
[
  {
    "ContractName": "F-35 Production Contract Modification",
    "CompanyName": "Lockheed Martin",
    "TotalContractValue": "$30 billion",
    "DateAwarded": "January 2023",
    "Location": "N/A",
    "LengthOfContract": "N/A",
    "NAICS": "336413",
    "PSC": "K015"
  },
  {
    "ContractName": "CH-53K King Stallion Helicopters",
    "CompanyName": "Sikorsky (Lockheed Martin)",
    "TotalContractValue": "$10.86 billion",
    "DateAwarded": "2023",
    "Location": "N/A",
    "LengthOfContract": "N/A",
    "NAICS": "336411",
    "PSC": "1510"
  },
  {
    "ContractName": "Coyote Missile System",
    "CompanyName": "Raytheon Technologies",
    "TotalContractValue": "$5.04 billion",
    "DateAwarded": "2023",
    "Location": "N/A",
    "LengthOfContract": "N/A",
    "NAICS": "332994",
    "PSC": "NA"


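The prompt asks for "N/A" in missing fields, but the answer above also uses "NA". Models may also add or drop keys. Before saving or importing the answer, it helps to coerce each record to exactly the requested fields. This is a minimal sketch that assumes the eight keys from `seed_data_format`. `normalize_records` is a hypothetical helper, not part of the lab's `utils`.

```python
REQUIRED_FIELDS = ["ContractName", "CompanyName", "TotalContractValue", "DateAwarded",
                   "Location", "LengthOfContract", "NAICS", "PSC"]
MISSING_MARKERS = (None, "", "NA", "N/A")

def normalize_records(records: list[dict]) -> list[dict]:
    """Return copies of the records with exactly REQUIRED_FIELDS, in order.

    Missing or placeholder values become "N/A"; everything else is stringified.
    Extra keys the model may have added are dropped.
    """
    normalized = []
    for record in records:
        row = {}
        for field in REQUIRED_FIELDS:
            value = record.get(field)
            row[field] = "N/A" if value in MISSING_MARKERS else str(value)
        normalized.append(row)
    return normalized

# Hypothetical input resembling one record from the web-search run above.
sample = [{"ContractName": "Coyote Missile System", "CompanyName": "Raytheon Technologies",
           "NAICS": 332994, "PSC": "NA", "Notes": "extra key"}]
rows = normalize_records(sample)
print(rows[0]["PSC"], rows[0]["NAICS"], rows[0]["Location"])  # N/A 332994 N/A
```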
In [4]:
# Alternative: Create a more focused agent with mock data for demonstration
# This simulates what a real government contracts API would return

class GovernmentContractsAPI(Tool):
    """Tool that simulates access to government contracts database."""
    name: ClassVar[str] = "government_contracts_search"
    description: ClassVar[str] = "Searches government contracts database for awarded contracts with detailed information."
    inputs: ClassVar[dict] = {
        "query": {
            "type": "string", 
            "description": "Search query for government contracts (e.g., 'top 10 largest contracts 2023-2024')"
        }
    }
    output_type: ClassVar[str] = "string"

    def forward(self, query: str) -> str:
        # Mock data for demonstration only: the query argument is ignored and the
        # same ten records are always returned. They are not verified award records.
        mock_contracts = [
            {
                "ContractName": "Joint Enterprise Defense Infrastructure (JEDI) Cloud",
                "CompanyName": "Microsoft Corporation",
                "TotalContractValue": "$10,000,000,000",
                "DateAwarded": "2019-10-25",
                "Location": "Arlington, VA",
                "LengthOfContract": "10 years",
                "NAICS": "518210",
                "PSC": "D307"
            },
            {
                "ContractName": "Enterprise Infrastructure Solutions (EIS)",
                "CompanyName": "General Services Administration",
                "TotalContractValue": "$50,000,000,000",
                "DateAwarded": "2017-09-29",
                "Location": "Washington, DC", 
                "LengthOfContract": "15 years",
                "NAICS": "517311",
                "PSC": "D316"
            },
            {
                "ContractName": "Alliant 2 Government-wide Acquisition Contract",
                "CompanyName": "Multiple Vendors",
                "TotalContractValue": "$65,000,000,000",
                "DateAwarded": "2017-12-15",
                "Location": "Washington, DC",
                "LengthOfContract": "10 years",
                "NAICS": "541511",
                "PSC": "R425"
            },
            {
                "ContractName": "NASA SEWP V (Solutions for Enterprise-Wide Procurement)",
                "CompanyName": "Multiple Vendors",
                "TotalContractValue": "$20,000,000,000",
                "DateAwarded": "2015-04-01",
                "Location": "Greenbelt, MD",
                "LengthOfContract": "10 years",
                "NAICS": "423430",
                "PSC": "7030"
            },
            {
                "ContractName": "Department of Veterans Affairs Electronic Health Record",
                "CompanyName": "Cerner Corporation",
                "TotalContractValue": "$16,100,000,000",
                "DateAwarded": "2018-05-17",
                "Location": "Kansas City, MO",
                "LengthOfContract": "10 years",
                "NAICS": "541511",
                "PSC": "D302"
            },
            {
                "ContractName": "Air Force Enterprise IT-as-a-Service (EITaaS)",
                "CompanyName": "General Dynamics Information Technology",
                "TotalContractValue": "$7,200,000,000",
                "DateAwarded": "2020-03-26",
                "Location": "San Antonio, TX",
                "LengthOfContract": "8 years",
                "NAICS": "541511",
                "PSC": "D316"
            },
            {
                "ContractName": "GSA Multiple Award Schedule (MAS)",
                "CompanyName": "IBM Corporation",
                "TotalContractValue": "$12,000,000,000",
                "DateAwarded": "2019-07-15",
                "Location": "Armonk, NY",
                "LengthOfContract": "20 years",
                "NAICS": "541511",
                "PSC": "D307"
            },
            {
                "ContractName": "Defense Enterprise Office Solution (DEOS)",
                "CompanyName": "Dell Technologies",
                "TotalContractValue": "$7,600,000,000",
                "DateAwarded": "2020-02-07",
                "Location": "Round Rock, TX",
                "LengthOfContract": "7 years",
                "NAICS": "334111",
                "PSC": "7025"
            },
            {
                "ContractName": "Homeland Security Enterprise Network (HSEN)",
                "CompanyName": "Verizon Communications",
                "TotalContractValue": "$3,100,000,000",
                "DateAwarded": "2021-09-30",
                "Location": "New York, NY",
                "LengthOfContract": "5 years",
                "NAICS": "517311",
                "PSC": "D316"
            },
            {
                "ContractName": "Navy SeaPort Next Generation (SeaPort NxG)",
                "CompanyName": "Multiple Vendors",
                "TotalContractValue": "$5,300,000,000",
                "DateAwarded": "2018-06-12",
                "Location": "Washington, DC",
                "LengthOfContract": "10 years",
                "NAICS": "541330",
                "PSC": "R425"
            }
        ]
        
        import json
        return f"Found {len(mock_contracts)} major government contracts:\n{json.dumps(mock_contracts, indent=2)}"

# Create an agent backed by the mock contracts tool instead of web search
print("Creating more efficient agent with government contracts API...")
efficient_agent = CodeAgent(
    model=model,
    tools=[GovernmentContractsAPI()]
)

print("--- Running Efficient Agent ---")
efficient_response = efficient_agent.run(
    f"""Use the government_contracts_search tool to find the top 10 U.S. government awarded contracts. 
    
    Format the results as a JSON array matching this structure:
    {seed_data_format}
    
    Output the final JSON array wrapped in <code> ... </code> tags.
    """
)

print("\n--- Efficient Agent Results ---")
print(efficient_response)

Creating more efficient agent with government contracts API...
--- Running Efficient Agent ---


11:05:56 - LiteLLM:INFO: utils.py:3389 -
LiteLLM completion() model= gpt-4o; provider = openai
11:06:01 - LiteLLM:INFO: utils.py:1282 - Wrapper: Completed Call, calling success_handler


11:06:01 - LiteLLM:INFO: utils.py:3389 -
LiteLLM completion() model= gpt-4o; provider = openai
11:06:11 - LiteLLM:INFO: utils.py:1282 - Wrapper: Completed Call, calling success_handler


11:06:11 - LiteLLM:INFO: utils.py:3389 -
LiteLLM completion() model= gpt-4o; provider = openai
11:07:14 - LiteLLM:INFO: utils.py:1282 - Wrapper: Completed Call, calling success_handler


11:07:14 - LiteLLM:INFO: utils.py:3389 -
LiteLLM completion() model= gpt-4o; provider = openai
11:08:11 - LiteLLM:INFO: utils.py:1282 - Wrapper: Completed Call, calling success_handler


11:08:11 - LiteLLM:INFO: utils.py:3389 -
LiteLLM completion() model= gpt-4o; provider = openai
11:09:03 - LiteLLM:INFO: utils.py:1282 - Wrapper: Completed Call, calling success_handler


11:09:03 - LiteLLM:INFO: utils.py:3389 -
LiteLLM completion() model= gpt-4o; provider = openai
11:09:52 - LiteLLM:INFO: utils.py:1282 - Wrapper: Completed Call, calling success_handler


11:09:52 - LiteLLM:INFO: utils.py:3389 -
LiteLLM completion() model= gpt-4o; provider = openai
11:09:55 - LiteLLM:INFO: utils.py:1282 - Wrapper: Completed Call, calling success_handler


11:09:55 - LiteLLM:INFO: utils.py:3389 -
LiteLLM completion() model= gpt-4o; provider = openai
11:10:05 - LiteLLM:INFO: utils.py:1282 - Wrapper: Completed Call, calling success_handler


11:10:05 - LiteLLM:INFO: utils.py:3389 -
LiteLLM completion() model= gpt-4o; provider = openai
11:10:17 - LiteLLM:INFO: utils.py:1282 - Wrapper: Completed Call, calling success_handler


11:10:18 - LiteLLM:INFO: utils.py:3389 -
LiteLLM completion() model= gpt-4o; provider = openai
11:10:28 - LiteLLM:INFO: utils.py:1282 - Wrapper: Completed Call, calling success_handler


11:10:28 - LiteLLM:INFO: utils.py:3389 -
LiteLLM completion() model= gpt-4o; provider = openai
11:10:41 - LiteLLM:INFO: utils.py:1282 - Wrapper: Completed Call, calling success_handler


11:10:41 - LiteLLM:INFO: utils.py:3389 -
LiteLLM completion() model= gpt-4o; provider = openai
11:10:52 - LiteLLM:INFO: utils.py:1282 - Wrapper: Completed Call, calling success_handler


11:10:52 - LiteLLM:INFO: utils.py:3389 -
LiteLLM completion() model= gpt-4o; provider = openai
11:10:54 - LiteLLM:INFO: utils.py:1282 - Wrapper: Completed Call, calling success_handler


11:10:54 - LiteLLM:INFO: utils.py:3389 -
LiteLLM completion() model= gpt-4o; provider = openai
11:11:08 - LiteLLM:INFO: utils.py:1282 - Wrapper: Completed Call, calling success_handler


11:11:08 - LiteLLM:INFO: utils.py:3389 -
LiteLLM completion() model= gpt-4o; provider = openai
11:11:53 - LiteLLM:INFO: utils.py:1282 - Wrapper: Completed Call, calling success_handler


11:11:53 - LiteLLM:INFO: utils.py:3389 -
LiteLLM completion() model= gpt-4o; provider = openai
11:12:41 - LiteLLM:INFO: utils.py:1282 - Wrapper: Completed Call, calling success_handler


11:12:41 - LiteLLM:INFO: utils.py:3389 -
LiteLLM completion() model= gpt-4o; provider = openai
11:13:33 - LiteLLM:INFO: utils.py:1282 - Wrapper: Completed Call, calling success_handler


11:13:33 - LiteLLM:INFO: utils.py:3389 -
LiteLLM completion() model= gpt-4o; provider = openai
11:14:28 - LiteLLM:INFO: utils.py:1282 - Wrapper: Completed Call, calling success_handler


11:14:28 - LiteLLM:INFO: utils.py:3389 -
LiteLLM completion() model= gpt-4o; provider = openai
11:14:30 - LiteLLM:INFO: utils.py:1282 - Wrapper: Completed Call, calling success_handler


11:14:30 - LiteLLM:INFO: utils.py:3389 -
LiteLLM completion() model= gpt-4o; provider = openai
11:14:40 - LiteLLM:INFO: utils.py:1282 - Wrapper: Completed Call, calling success_handler


11:14:40 - LiteLLM:INFO: utils.py:3389 -
LiteLLM completion() model= gpt-4o; provider = openai
11:15:45 - LiteLLM:INFO: utils.py:1282 - Wrapper: Completed Call, calling success_handler



--- Efficient Agent Results ---
Apologies for the earlier issues. Here is the JSON array as specified, encapsulated without additional commentary:

<code>
[
  {
    "ContractName": "Joint Enterprise Defense Infrastructure (JEDI) Cloud",
    "CompanyName": "Microsoft Corporation",
    "TotalContractValue": "$10,000,000,000",
    "DateAwarded": "2019-10-25",
    "Location": "Arlington, VA",
    "LengthOfContract": "10 years",
    "NAICS": "518210",
    "PSC": "D307"
  },
  {
    "ContractName": "Enterprise Infrastructure Solutions (EIS)",
    "CompanyName": "General Services Administration",
    "TotalContractValue": "$50,000,000,000",
    "DateAwarded": "2017-09-29",
    "Location": "Washington, DC",
    "LengthOfContract": "15 years",
    "NAICS": "517311",
    "PSC": "D316"
  },
  {
    "ContractName": "Alliant 2 Government-wide Acquisition Contract",
    "CompanyName": "Multiple Vendors",
    "TotalContractValue": "$65,000,000,000",
    "DateAwarded": "2017-12-15",
    "Location": "

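The mock `government_contracts_search` tool returns the same ten records whatever `query` it receives. Every one of those records was awarded before 2022, so the tool cannot actually answer a "past two years" request. Values also arrive in two formats across the runs: "$30 billion" versus "$10,000,000,000".

The sketch below shows one thing a query-aware tool could do before handing data to the agent. It parses both value formats into whole dollars and keeps the largest awards within a date window. `parse_dollars` and `top_contracts` are hypothetical helpers, and the parser only understands the formats shown here.

```python
import re
from datetime import date
from decimal import Decimal
from typing import Optional

_SCALE = {"thousand": 10**3, "million": 10**6, "billion": 10**9, "trillion": 10**12}
_DOLLARS = re.compile(r"\$?\s*(\d[\d,]*(?:\.\d+)?)\s*(thousand|million|billion|trillion)?",
                      re.IGNORECASE)

def parse_dollars(text) -> Optional[int]:
    """Parse "$10,000,000,000" or "$30 billion" into whole dollars; None if unparseable."""
    match = _DOLLARS.fullmatch(str(text or "").strip())
    if not match:
        return None
    amount = Decimal(match.group(1).replace(",", ""))  # Decimal avoids float rounding
    scale = _SCALE[match.group(2).lower()] if match.group(2) else 1
    return int(amount * scale)

def top_contracts(records: list[dict], start: date, end: date, n: int = 10) -> list[dict]:
    """Rank records by parsed value, keeping only ISO dates within [start, end]."""
    ranked = []
    for record in records:
        try:
            awarded = date.fromisoformat(str(record.get("DateAwarded", "")))
        except ValueError:
            continue  # skip non-ISO dates such as "January 2023" or "2023"
        value = parse_dollars(record.get("TotalContractValue"))
        if value is not None and start <= awarded <= end:
            ranked.append((value, record))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in ranked[:n]]

# Hypothetical records mixing the value and date formats seen in the two runs above.
sample = [
    {"ContractName": "A", "TotalContractValue": "$30 billion", "DateAwarded": "2023-01-15"},
    {"ContractName": "B", "TotalContractValue": "$10,000,000,000", "DateAwarded": "2019-10-25"},
    {"ContractName": "C", "TotalContractValue": "$5.04 billion", "DateAwarded": "2024-03-01"},
    {"ContractName": "D", "TotalContractValue": "N/A", "DateAwarded": "2023-06-30"},
]
picked = top_contracts(sample, start=date(2023, 1, 1), end=date(2024, 12, 31))
print([r["ContractName"] for r in picked])  # ['A', 'C']
```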
## Why Web Search Agents Return Limited Results

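The first agent's answer above is typical of web search-based agents. It gives approximate dates such as "2023", leaves several fields as "N/A", and formats values inconsistently. The main causes, and possible fixes, are: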

### Problems with Web Search for Government Contracts:
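1. **Limited Search Results**: `DuckDuckGoSearchTool` returns only a handful of results per query (10 by default). These are titles, links, and short snippets rather than full records.
2. **Data Lives in Databases, Not Web Pages**: Authoritative award data is published in federal systems such as USAspending.gov, SAM.gov, and FPDS. These are publicly accessible, but search snippets rarely expose complete, structured records.
3. **Search Inefficiency**: The agent spends steps on irrelevant or repeated searches.
4. **Context Window Limits**: Over a long run, accumulated search output crowds the model's context, so earlier findings can be lost or overlooked.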

### Solutions:
1. **Use Real APIs**: Query structured sources such as the USAspending.gov API, the SAM.gov API, or FPDS.
2. **Mock Data**: For demos, use realistic mock data, as in the `GovernmentContractsAPI` tool above.
3. **Focused Queries**: Limit search scope and be more specific
4. **Better Tools**: Create domain-specific tools instead of generic web search

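The mock tool above shows the shape of data a real government contracts API returns: complete, structured records instead of scattered search snippets. Its records are mock data, though, not verified awards. Structured data also did not shorten the agent loop. Both runs above made 21 model calls. That is consistent with each agent using up smolagents' default step budget (`max_steps`, 20) and then being asked for a final answer.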
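The sketch below shows what the "Use Real APIs" option could look like with USAspending.gov's search API, which as far as we know does not require an API key. Several details are assumptions to check against the official API documentation: the endpoint URL, the contract award type codes `A` to `D`, and the field names (`"Award ID"`, `"Recipient Name"`, `"Award Amount"`, `"Start Date"`). The mapping also makes two stand-in choices: it uses the award ID as the contract name and the start date as the award date. All function names are hypothetical. Only `fetch_top_awards` touches the network, and the demo at the bottom runs offline.

```python
import json
import urllib.request
from datetime import date

# Assumptions (verify against the USAspending.gov API documentation):
# the endpoint URL, the contract award type codes, and the field names.
USASPENDING_URL = "https://api.usaspending.gov/api/v2/search/spending_by_award/"
CONTRACT_AWARD_TYPES = ["A", "B", "C", "D"]
FIELDS = ["Award ID", "Recipient Name", "Award Amount", "Start Date", "End Date",
          "Awarding Agency"]

def build_top_awards_payload(start: date, end: date, limit: int = 10) -> dict:
    """Request body asking for the largest contract awards in a date window."""
    return {
        "filters": {
            "award_type_codes": CONTRACT_AWARD_TYPES,
            "time_period": [{"start_date": start.isoformat(), "end_date": end.isoformat()}],
        },
        "fields": FIELDS,
        "sort": "Award Amount",
        "order": "desc",
        "limit": limit,
        "page": 1,
    }

def to_seed_format(row: dict) -> dict:
    """Map one result row onto the notebook's seed_data_format keys."""
    amount = row.get("Award Amount")
    return {
        "ContractName": row.get("Award ID") or "N/A",   # award ID used as a stand-in name
        "CompanyName": row.get("Recipient Name") or "N/A",
        "TotalContractValue": f"${amount:,.0f}" if isinstance(amount, (int, float)) else "N/A",
        "DateAwarded": row.get("Start Date") or "N/A",  # start date used as a stand-in
        "Location": "N/A",
        "LengthOfContract": "N/A",
        "NAICS": "N/A",
        "PSC": "N/A",
    }

def fetch_top_awards(start: date, end: date, limit: int = 10) -> list[dict]:
    """POST the search and map each result row (makes a network call)."""
    body = json.dumps(build_top_awards_payload(start, end, limit)).encode("utf-8")
    request = urllib.request.Request(USASPENDING_URL, data=body,
                                     headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=30) as response:
        results = json.load(response).get("results", [])
    return [to_seed_format(row) for row in results]

# Offline check of the request body and the mapping (no network access needed):
payload = build_top_awards_payload(date(2023, 1, 1), date(2024, 12, 31))
example = to_seed_format({"Award ID": "EXAMPLE-123", "Recipient Name": "Example Corp",
                          "Award Amount": 1234567.0, "Start Date": "2023-05-01"})
print(example["TotalContractValue"])  # $1,234,567
```

To expose this to the agent, you could call `fetch_top_awards` from the `forward` method of a `Tool` subclass such as `GovernmentContractsAPI`. The agent would then receive real, date-filtered records instead of a fixed mock list.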

In [None]:
# Import Contract Data into Database
import os
import sqlite3
import json

def import_contracts_to_database(contracts_data, db_path="artifacts/convisoft.db"):
    """
    Import contract data into a SQLite database, replacing any existing rows.
    """
    # Create the parent directory if needed, then connect to the database
    os.makedirs(os.path.dirname(db_path) or ".", exist_ok=True)
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()
    
    # Create contracts table if it doesn't exist
    cursor.execute('''
        CREATE TABLE IF NOT EXISTS contracts (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            contract_name TEXT NOT NULL,
            company_name TEXT NOT NULL,
            total_contract_value TEXT,
            date_awarded DATE,
            location TEXT,
            length_of_contract TEXT,
            naics TEXT,
            psc TEXT,
            created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        )
    ''')
    
    # Clear existing data (optional - remove if you want to keep existing records)
    cursor.execute('DELETE FROM contracts')
    
    # Insert contract data
    contracts_inserted = 0
    for contract in contracts_data:
        try:
            cursor.execute('''
                INSERT INTO contracts (
                    contract_name, company_name, total_contract_value, 
                    date_awarded, location, length_of_contract, naics, psc
                ) VALUES (?, ?, ?, ?, ?, ?, ?, ?)
            ''', (
                contract.get('ContractName', ''),
                contract.get('CompanyName', ''),
                contract.get('TotalContractValue', ''),
                contract.get('DateAwarded', ''),
                contract.get('Location', ''),
                contract.get('LengthOfContract', ''),
                contract.get('NAICS', ''),
                contract.get('PSC', '')
            ))
            contracts_inserted += 1
        except Exception as e:
            print(f"Error inserting contract {contract.get('ContractName', 'Unknown')}: {e}")
    
    # Commit changes
    conn.commit()
    
    # Verify the data was inserted
    cursor.execute('SELECT COUNT(*) FROM contracts')
    total_records = cursor.fetchone()[0]
    
    conn.close()
    
    print(f"✅ Successfully imported {contracts_inserted} contracts into database")
    print(f"📊 Total records in database: {total_records}")
    print(f"🗄️ Database location: {db_path}")
    
    return contracts_inserted

# The same mock records that GovernmentContractsAPI returns, copied here so this cell runs on its own
mock_contracts = [
    {
        "ContractName": "Joint Enterprise Defense Infrastructure (JEDI) Cloud",
        "CompanyName": "Microsoft Corporation",
        "TotalContractValue": "$10,000,000,000",
        "DateAwarded": "2019-10-25",
        "Location": "Arlington, VA",
        "LengthOfContract": "10 years",
        "NAICS": "518210",
        "PSC": "D307"
    },
    {
        "ContractName": "Enterprise Infrastructure Solutions (EIS)",
        "CompanyName": "General Services Administration",
        "TotalContractValue": "$50,000,000,000",
        "DateAwarded": "2017-09-29",
        "Location": "Washington, DC", 
        "LengthOfContract": "15 years",
        "NAICS": "517311",
        "PSC": "D316"
    },
    {
        "ContractName": "Alliant 2 Government-wide Acquisition Contract",
        "CompanyName": "Multiple Vendors",
        "TotalContractValue": "$65,000,000,000",
        "DateAwarded": "2017-12-15",
        "Location": "Washington, DC",
        "LengthOfContract": "10 years",
        "NAICS": "541511",
        "PSC": "R425"
    },
    {
        "ContractName": "NASA SEWP V (Solutions for Enterprise-Wide Procurement)",
        "CompanyName": "Multiple Vendors",
        "TotalContractValue": "$20,000,000,000",
        "DateAwarded": "2015-04-01",
        "Location": "Greenbelt, MD",
        "LengthOfContract": "10 years",
        "NAICS": "423430",
        "PSC": "7030"
    },
    {
        "ContractName": "Department of Veterans Affairs Electronic Health Record",
        "CompanyName": "Cerner Corporation",
        "TotalContractValue": "$16,100,000,000",
        "DateAwarded": "2018-05-17",
        "Location": "Kansas City, MO",
        "LengthOfContract": "10 years",
        "NAICS": "541511",
        "PSC": "D302"
    },
    {
        "ContractName": "Air Force Enterprise IT-as-a-Service (EITaaS)",
        "CompanyName": "General Dynamics Information Technology",
        "TotalContractValue": "$7,200,000,000",
        "DateAwarded": "2020-03-26",
        "Location": "San Antonio, TX",
        "LengthOfContract": "8 years",
        "NAICS": "541511",
        "PSC": "D316"
    },
    {
        "ContractName": "GSA Multiple Award Schedule (MAS)",
        "CompanyName": "IBM Corporation",
        "TotalContractValue": "$12,000,000,000",
        "DateAwarded": "2019-07-15",
        "Location": "Armonk, NY",
        "LengthOfContract": "20 years",
        "NAICS": "541511",
        "PSC": "D307"
    },
    {
        "ContractName": "Defense Enterprise Office Solution (DEOS)",
        "CompanyName": "Dell Technologies",
        "TotalContractValue": "$7,600,000,000",
        "DateAwarded": "2020-02-07",
        "Location": "Round Rock, TX",
        "LengthOfContract": "7 years",
        "NAICS": "334111",
        "PSC": "7025"
    },
    {
        "ContractName": "Homeland Security Enterprise Network (HSEN)",
        "CompanyName": "Verizon Communications",
        "TotalContractValue": "$3,100,000,000",
        "DateAwarded": "2021-09-30",
        "Location": "New York, NY",
        "LengthOfContract": "5 years",
        "NAICS": "517311",
        "PSC": "D316"
    },
    {
        "ContractName": "Navy SeaPort Next Generation (SeaPort NxG)",
        "CompanyName": "Multiple Vendors",
        "TotalContractValue": "$5,300,000,000",
        "DateAwarded": "2018-06-12",
        "Location": "Washington, DC",
        "LengthOfContract": "10 years",
        "NAICS": "541330",
        "PSC": "R425"
    }
]

# Import the data into the database
print("🔄 Importing contract data into database...")
imported_count = import_contracts_to_database(mock_contracts, db_path=os.path.join(project_root, "artifacts/convisoft.db"))

# Verify the import by querying the database
print("\n📋 Verifying database contents:")
conn = sqlite3.connect(os.path.join(project_root, "artifacts/convisoft.db"))
cursor = conn.cursor()

cursor.execute('''
    SELECT contract_name, company_name, total_contract_value, date_awarded 
    FROM contracts 
    ORDER BY CAST(REPLACE(REPLACE(total_contract_value, '$', ''), ',', '') AS INTEGER) DESC
    LIMIT 5
''')

top_contracts = cursor.fetchall()
print("\n🏆 Top 5 Contracts by Value:")
for i, (name, company, value, date) in enumerate(top_contracts, 1):
    print(f"{i}. {name}")
    print(f"   Company: {company}")
    print(f"   Value: {value}")
    print(f"   Date: {date}")
    print()

conn.close()

🔄 Importing contract data into database...
✅ Successfully imported 10 contracts into database
📊 Total records in database: 10
🗄️ Database location: c:\Users\labadmin\Documents\repo\KMSH_contracting\artifacts/app/convisoft.db

📋 Verifying database contents:

🏆 Top 5 Contracts by Value:
1. Alliant 2 Government-wide Acquisition Contract
   Company: Multiple Vendors
   Value: $65,000,000,000
   Date: 2017-12-15

2. Enterprise Infrastructure Solutions (EIS)
   Company: General Services Administration
   Value: $50,000,000,000
   Date: 2017-09-29

3. NASA SEWP V (Solutions for Enterprise-Wide Procurement)
   Company: Multiple Vendors
   Value: $20,000,000,000
   Date: 2015-04-01

4. Department of Veterans Affairs Electronic Health Record
   Company: Cerner Corporation
   Value: $16,100,000,000
   Date: 2018-05-17

5. GSA Multiple Award Schedule (MAS)
   Company: IBM Corporation
   Value: $12,000,000,000
   Date: 2019-07-15
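The import cell above has two limitations. First, it clears the table on every run, so data imported from any other source is lost. Second, it stores the contract value as text. Its `ORDER BY CAST(REPLACE(...))` query therefore only ranks plain strings like "$65,000,000,000" correctly: a value such as "$30 billion" from the web-search run would be cast to 30.

Below is a minimal alternative sketch. It upserts records so reruns don't duplicate rows, and it sorts on a numeric column. It assumes a `UNIQUE (contract_name, company_name)` key and an extra `total_value_usd` column, neither of which is in the original schema. `dollars_to_int` and `upsert_contracts` are hypothetical helpers. The `ON CONFLICT ... DO UPDATE` syntax requires SQLite 3.24 or newer.

```python
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS contracts (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    contract_name TEXT NOT NULL,
    company_name TEXT NOT NULL,
    total_contract_value TEXT,
    total_value_usd INTEGER,  -- numeric copy used for sorting
    date_awarded TEXT,
    UNIQUE (contract_name, company_name)
)
"""

# Upsert: insert new contracts, update existing ones in place (needs SQLite >= 3.24).
UPSERT = """
INSERT INTO contracts (contract_name, company_name, total_contract_value,
                       total_value_usd, date_awarded)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT (contract_name, company_name) DO UPDATE SET
    total_contract_value = excluded.total_contract_value,
    total_value_usd = excluded.total_value_usd,
    date_awarded = excluded.date_awarded
"""

def dollars_to_int(text) -> Optional[int]:
    """Convert plain strings like "$65,000,000,000"; anything else becomes None."""
    digits = str(text or "").replace("$", "").replace(",", "").strip()
    return int(digits) if digits.isdigit() else None

def upsert_contracts(conn: sqlite3.Connection, contracts: list[dict]) -> None:
    rows = [(c.get("ContractName", "N/A"), c.get("CompanyName", "N/A"),
             c.get("TotalContractValue"), dollars_to_int(c.get("TotalContractValue")),
             c.get("DateAwarded"))
            for c in contracts]
    with conn:  # commits on success, rolls back if any row fails
        conn.executemany(UPSERT, rows)

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
records = [
    {"ContractName": "Alliant 2 Government-wide Acquisition Contract",
     "CompanyName": "Multiple Vendors", "TotalContractValue": "$65,000,000,000",
     "DateAwarded": "2017-12-15"},
    {"ContractName": "F-35 Production Contract Modification", "CompanyName": "Lockheed Martin",
     "TotalContractValue": "$30 billion", "DateAwarded": "January 2023"},
]
upsert_contracts(conn, records)
upsert_contracts(conn, records)  # re-running does not duplicate rows
count = conn.execute("SELECT COUNT(*) FROM contracts").fetchone()[0]
order = [name for (name,) in conn.execute(
    "SELECT contract_name FROM contracts ORDER BY total_value_usd DESC")]
print(count, order)
```

SQLite treats NULL as smaller than any value, so with `DESC` the unparseable "$30 billion" row sorts last instead of being ranked as 30 dollars.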



In [8]:
# Alternative: Import data from agent response or JSON file

def import_from_agent_response(agent_response_str, db_path="contracts.db"):
    """
    Extract and import contract data from agent response string
    """
    import re
    
    # Try to extract JSON from agent response (between <code> tags)
    code_pattern = r'<code>(.*?)</code>'
    json_match = re.search(code_pattern, agent_response_str, re.DOTALL)
    
    if json_match:
        json_str = json_match.group(1).strip()
        try:
            contracts = json.loads(json_str)
            return import_contracts_to_database(contracts, db_path)
        except json.JSONDecodeError as e:
            print(f"Error parsing JSON from agent response: {e}")
            return 0
    else:
        print("No JSON found in agent response")
        return 0

def import_from_json_file(json_file_path, db_path="contracts.db"):
    """
    Import contract data from a JSON file
    """
    try:
        with open(json_file_path, 'r') as f:
            contracts = json.load(f)
        return import_contracts_to_database(contracts, db_path)
    except FileNotFoundError:
        print(f"File not found: {json_file_path}")
        return 0
    except json.JSONDecodeError as e:
        print(f"Error parsing JSON file: {e}")
        return 0

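# Example: Import from the cleaned artifact if it exists.
# Assumption: save_artifact() wrote "app/contract_search_cleaned.json" relative to the
# project root; adjust this path if your utils version saves under another directory.
artifact_path = os.path.join(project_root, "app", "contract_search_cleaned.json")
if os.path.exists(artifact_path):
    print(f"📁 Found artifact file: {artifact_path}")
    print("🔄 Importing from saved artifact...")
    imported = import_from_json_file(artifact_path, db_path=os.path.join(project_root, "artifacts", "convisoft.db"))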
    if imported > 0:
        print(f"✅ Successfully imported {imported} contracts from artifact file")
else:
    print("📁 No artifact file found - using mock data instead")

# You can also import directly from an agent response if you have one
# if 'response' in globals() and response:
#     print("🔄 Importing from agent response...")
#     imported = import_from_agent_response(str(response), db_path=os.path.join(project_root, "artifacts", "convisoft.db"))
#     print(f"✅ Imported {imported} contracts from agent response")

📁 No artifact file found - using mock data instead
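`import_from_agent_response` only recognizes JSON wrapped in `<code>` tags. Models don't always follow that instruction: they may use a Markdown code fence or no wrapper at all, and an answer that is cut off never closes its tag. The sketch below is a more tolerant extractor written under those assumptions. It tries each of those shapes in turn and returns `None` rather than guessing when nothing parses. It does not attempt to repair truncated JSON. `extract_json_array` is a hypothetical helper.

```python
import json
import re

def extract_json_array(text: str):
    """Return the first JSON array found in an agent answer, or None.

    Tries <code>...</code> blocks, then Markdown code fences, then the span
    from the first "[" to the last "]". Truncated output yields None.
    """
    candidates = re.findall(r"<code>(.*?)</code>", text, re.DOTALL)
    candidates += re.findall(r"```(?:json)?\s*(.*?)```", text, re.DOTALL)
    start, end = text.find("["), text.rfind("]")
    if start != -1 and end > start:
        candidates.append(text[start:end + 1])
    for candidate in candidates:
        try:
            data = json.loads(candidate.strip())
        except json.JSONDecodeError:
            continue  # not valid JSON; try the next candidate
        if isinstance(data, list):
            return data
    return None

answer = 'Certainly! Here is the data:\n<code>\n[{"ContractName": "Example"}]\n</code>'
print(extract_json_array(answer))  # [{'ContractName': 'Example'}]
print(extract_json_array('<code>\n[\n  {"ContractName": "cut off'))  # None
```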
