# 🧰 Week 07-08 · Notebook 13 · Custom Tools & Multi-Tool Agents

Extend agents with async, concurrent tools and robust error handling tailored to manufacturing ops.

## 🎯 Learning Objectives
- Build custom LangChain tools with validation and retries.
- Handle async tool execution for parallel data pulls.
- Log tool performance metrics (latency, success rate).
- Record an incident whenever a tool fails, so the compliance audit trail stays complete.

## 🧩 Scenario
The agent needs to pull sensor readings and maintenance logs simultaneously. Each tool must degrade gracefully if its data source is offline, so that the other tool's result is still returned.

In [None]:
import asyncio
import random
import time
from langchain.tools import BaseTool
from pydantic import Field, BaseModel
from typing import Type

# --- 1. Define Tool with Pydantic for structured inputs ---
class SensorInput(BaseModel):
    equipment_id: str = Field(description="The unique identifier for the equipment, e.g., 'Press-14'.")

class SensorTool(BaseTool):
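    # Type annotations are required when overriding these fields on Pydantic v2-based LangChain releases.
    name: str = "SensorReader"
    description: str = "Reads the latest sensor data (vibration, temperature) for a specific piece of equipment."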
    args_schema: Type[BaseModel] = SensorInput

    def _run(self, equipment_id: str) -> dict:
        """Synchronous execution of the tool."""
        print(f"Reading sensor data for {equipment_id}...")
        # Simulate API call
        time.sleep(0.5)
        return {
            "vibration_mm_s": round(random.uniform(5.0, 15.0), 2),
            "temperature_c": round(random.uniform(60.0, 95.0), 2)
        }

    async def _arun(self, equipment_id: str) -> dict:
        """Asynchronous execution for parallel runs."""
        print(f"Asynchronously reading sensor data for {equipment_id}...")
        # Simulate async API call
        await asyncio.sleep(random.uniform(0.5, 1.5))
        if random.random() < 0.1: # Simulate a 10% failure rate
            raise ConnectionError(f"Failed to connect to sensor API for {equipment_id}")
        return {
            "vibration_mm_s": round(random.uniform(5.0, 15.0), 2),
            "temperature_c": round(random.uniform(60.0, 95.0), 2)
        }

# --- 2. Define another async tool for maintenance logs ---
class MaintenanceLogInput(BaseModel):
    ticket_id: str = Field(description="The ID of the maintenance ticket, e.g., 'TICKET-552'.")

class MaintenanceLogTool(BaseTool):
    name: str = "MaintenanceLogFetcher"
    description: str = "Fetches details from a maintenance log ticket."
    args_schema: Type[BaseModel] = MaintenanceLogInput

    def _run(self, ticket_id: str) -> str:
        raise NotImplementedError("This tool only supports async execution.")

    async def _arun(self, ticket_id: str) -> str:
        """Asynchronously fetches maintenance log details."""
        print(f"Asynchronously fetching log for {ticket_id}...")
        await asyncio.sleep(random.uniform(0.2, 0.8))
        return f"Log for {ticket_id}: Bearing was replaced by Technician [REDACTED]. Root cause was insufficient lubrication."

# Instantiate the tools
sensor_tool = SensorTool()
log_tool = MaintenanceLogTool()

### 🧵 Concurrent Tool Execution
Execute tools in parallel using `asyncio.gather` and capture failures.

In [None]:
async def gather_data(equipment_id: str, ticket_id: str):
    """
    Gathers sensor data and maintenance logs concurrently and handles failures gracefully.
    """
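    start_time = time.perf_counter()  # monotonic clock, better suited to measuring elapsed time

    # Use asyncio.gather to run both tools in parallel.
    # `return_exceptions=True` returns a failure as a value instead of raising it,
    # so one failed tool does not discard the other tool's result.
    # Note: `_arun` is called directly here for brevity; inside an agent the tool
    # is invoked through its public interface, which validates input against `args_schema`.
    results = await asyncio.gather(
        sensor_tool._arun(equipment_id=equipment_id),
        log_tool._arun(ticket_id=ticket_id),
        return_exceptions=True
    )

    latency_ms = (time.perf_counter() - start_time) * 1000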
    
    # Process results, checking for exceptions
    sensor_data = results[0] if not isinstance(results[0], Exception) else {"error": str(results[0])}
    log_data = results[1] if not isinstance(results[1], Exception) else {"error": str(results[1])}
    
    print(f"\n--- Concurrent Data Gathering Complete ({latency_ms:.0f} ms) ---")
    print(f"Sensor Data: {sensor_data}")
    print(f"Log Data: {log_data}")
    
    return sensor_data, log_data

# --- Run the concurrent gathering ---
async def main():
    await gather_data(equipment_id="Press-14", ticket_id="TICKET-552")
    print("\n--- Second run (may hit the simulated sensor failure) ---")
    # Every SensorTool call has a 10% chance of raising ConnectionError,
    # so either run may fail; rerun the cell a few times to observe the error path.
    await gather_data(equipment_id="CNC-03", ticket_id="TICKET-553")

# Notebooks support top-level await, so main() can be awaited directly.
# In a .py script, use asyncio.run(main()) instead.
await main()
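
A data source that is offline often hangs rather than raising an error, and in that case `return_exceptions=True` alone does not help because the call never returns. One possible way to cover this is to wrap each call in `asyncio.wait_for`. The sketch below is only an illustration: `slow_source`, `call_with_timeout`, the source names, and the timeout values are hypothetical and are not part of the tools defined above.

```python
import asyncio


async def slow_source(delay_s: float) -> dict:
    """Hypothetical stand-in for a data source that takes delay_s seconds to answer."""
    await asyncio.sleep(delay_s)
    return {"status": "ok"}


async def call_with_timeout(coro, timeout_s: float, source_name: str) -> dict:
    """Return the coroutine's result, or an error payload if it exceeds timeout_s."""
    try:
        return await asyncio.wait_for(coro, timeout=timeout_s)
    except asyncio.TimeoutError:
        # wait_for cancels the pending call before raising, so nothing keeps running.
        return {"error": f"{source_name} timed out after {timeout_s}s"}


async def demo() -> list:
    # The first source answers in time; the second exceeds its (assumed) budget.
    return await asyncio.gather(
        call_with_timeout(slow_source(0.01), 0.5, "sensor_api"),
        call_with_timeout(slow_source(1.0), 0.05, "maintenance_db"),
    )

# In a notebook: results = await demo()
# In a script:   results = asyncio.run(demo())
```

With this wrapper the slow source yields an `{"error": ...}` payload in the same shape `gather_data` already uses, while the healthy source's result is returned unchanged.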

## 🛡️ Error Handling Policy
- Retry transient errors (for example, connection failures) up to 2 times with exponential backoff.
- Record each failure in the governance log with a timestamp, the tool name, and the exception detail.
- Notify the reliability engineer if a tool failure persists for more than 15 minutes.
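
The first two policy items can be sketched with a small custom wrapper. This is a minimal illustration under stated assumptions, not a reference implementation: `run_with_retry`, `governance_log` (an in-memory list standing in for the real governance log), `TRANSIENT_ERRORS`, `FlakySensor`, and the base delay are all hypothetical names and values chosen for the example.

```python
import asyncio
from datetime import datetime, timezone

# Hypothetical in-memory stand-in for the governance log.
governance_log: list = []

# Assumption: these exception types are the ones treated as transient.
TRANSIENT_ERRORS = (ConnectionError, TimeoutError)


async def run_with_retry(tool_name, func, *args, max_retries=2, base_delay_s=0.5, **kwargs):
    """Await func(*args, **kwargs), retrying transient errors with exponential backoff."""
    for attempt in range(max_retries + 1):  # 1 initial try + max_retries retries
        try:
            return await func(*args, **kwargs)
        except TRANSIENT_ERRORS as exc:
            # Record every failed attempt: timestamp, tool name, exception detail.
            governance_log.append({
                "timestamp": datetime.now(timezone.utc).isoformat(),
                "tool": tool_name,
                "attempt": attempt + 1,
                "exception": f"{type(exc).__name__}: {exc}",
            })
            if attempt == max_retries:
                raise  # retries exhausted; let the caller handle the failure
            # Backoff doubles each time: base, 2 * base, 4 * base, ...
            await asyncio.sleep(base_delay_s * 2 ** attempt)


class FlakySensor:
    """Hypothetical sensor client that fails a fixed number of times, then succeeds."""

    def __init__(self, failures_before_success: int):
        self.remaining = failures_before_success

    async def read(self, equipment_id: str) -> dict:
        if self.remaining > 0:
            self.remaining -= 1
            raise ConnectionError(f"Failed to connect to sensor API for {equipment_id}")
        return {"equipment_id": equipment_id, "vibration_mm_s": 7.5}

# In a notebook:
#   await run_with_retry("SensorReader", FlakySensor(2).read, "Press-14")
# In a script, wrap the same call in asyncio.run(...).
```

The 15-minute notification rule is not covered here; it would need state that persists across calls, such as the timestamp of the first failure in the current streak.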

## 🧪 Lab Assignment
1. Add retry logic with `tenacity` or custom backoff.
2. Implement latency tracking and push metrics to Prometheus.
3. Integrate tools into the agent from Notebook 12 and verify concurrency performance.
4. Draft an incident response checklist for tool failures.
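
As a possible starting point for task 2, latency and success rate can be collected in memory before any exporter is wired up. The sketch below is an assumption-laden illustration: `ToolMetrics`, `fake_ok`, `fake_fail`, and `demo` are hypothetical names, and pushing the numbers to Prometheus is deliberately left out.

```python
import asyncio
import time
from collections import defaultdict


class ToolMetrics:
    """Hypothetical in-memory collector for per-tool latency and success rate."""

    def __init__(self):
        self._latencies_ms = defaultdict(list)
        self._successes = defaultdict(int)
        self._failures = defaultdict(int)

    async def track(self, tool_name: str, coro):
        """Await coro, recording its latency and whether it succeeded."""
        start = time.perf_counter()
        try:
            result = await coro
        except Exception:
            self._failures[tool_name] += 1
            raise  # metrics are a side effect; the error still propagates
        else:
            self._successes[tool_name] += 1
            return result
        finally:
            # Runs on both success and failure, so every call has a latency sample.
            self._latencies_ms[tool_name].append((time.perf_counter() - start) * 1000)

    def summary(self, tool_name: str) -> dict:
        calls = self._successes[tool_name] + self._failures[tool_name]
        latencies = self._latencies_ms[tool_name]
        return {
            "calls": calls,
            "success_rate": self._successes[tool_name] / calls if calls else None,
            "avg_latency_ms": sum(latencies) / len(latencies) if latencies else None,
        }


async def fake_ok() -> dict:
    await asyncio.sleep(0.01)
    return {"temperature_c": 71.2}


async def fake_fail() -> dict:
    await asyncio.sleep(0.01)
    raise ConnectionError("sensor API unreachable")


async def demo(metrics: ToolMetrics) -> dict:
    await metrics.track("SensorReader", fake_ok())
    try:
        await metrics.track("SensorReader", fake_fail())
    except ConnectionError:
        pass  # the failure is already counted
    return metrics.summary("SensorReader")

# In a notebook: summary = await demo(ToolMetrics())
# In a script:   summary = asyncio.run(demo(ToolMetrics()))
```

Keeping the collector separate from the tools means the same `track` call can wrap the sensor and maintenance-log tools without changing either class.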

## ✅ Checklist
- [ ] Async tools created
- [ ] Concurrency tested
- [ ] Error policy documented
- [ ] Lab tasks complete

## 📚 References
- LangChain Tool Development
- Tenacity Retry Library
- Week 09 Monitoring Notebook