## Section 1: Introduction & Problem Statement

### 1.1 The Challenge: Inefficiencies and Errors in 911 Call Processing

Emergency response relies on the swift and accurate transmission of critical information from 911 callers to dispatchers. Currently, a significant portion of this information is captured through manual data entry by call takers. This process is inherently **slow and prone to human error**. Even minor inaccuracies or delays in capturing details like location, nature of the emergency, and number of people involved can have **critical consequences**, potentially delaying life-saving assistance. The manual nature of this task also places a substantial cognitive load on call takers, especially during high-volume incidents. This can further exacerbate errors and reduce overall efficiency.

Specifically, the limitations of manual data entry include:

*   **Time Consumption:** Manually typing information takes valuable seconds, delaying the dispatch of emergency services.
*   **Error Rate:** Human error is inevitable, leading to inaccurate data that can misdirect responders.
*   **Scalability Issues:**  During peak demand, manual processing becomes a bottleneck, impacting response times.
*   **Inconsistent Data:**  Different call takers may interpret and record information differently, leading to data inconsistencies.

### 1.2 Introducing the AI-Powered Solution: Automated Information Extraction

To address these critical challenges, I have developed an AI assistant designed to **automatically extract key information from 911 call transcripts**. The solution uses Generative AI, specifically Large Language Models (LLMs), to process the transcribed speech and identify crucial details.

**Our AI assistant performs the following key functions:**

*   **Automatic Speech Recognition (ASR):** Transcribes the audio from 911 calls into text. (Note: For this demonstration, I will use pre-existing transcriptions to focus on the AI’s extraction capabilities).
*   **Information Extraction:** Uses an LLM to identify key details in the transcript, such as the location, the nature of the emergency, and the number of people involved.
*   **Structured Data Output:** Presents the extracted information in a standardized, machine-readable format (JSON), enabling seamless integration with dispatch systems and other downstream applications.

**By automating this process, our solution aims to:**

*   **Reduce Response Times:**  Deliver critical information to dispatchers more quickly.
*   **Improve Data Accuracy:** Minimize errors associated with manual data entry.
*   **Enhance Efficiency:**  Free up call takers to focus on critical communication and support.
*   **Enable Data-Driven Insights:**  Provide valuable data for analysis and improvement of emergency response systems.

This notebook demonstrates the core functionality of our AI assistant, focusing on the extraction and structuring of information from call transcripts. I believe this technology has the potential to significantly improve the effectiveness and efficiency of emergency response services.

## System Overview
![System Overview](assets/swift911.jpg)

The diagram above illustrates the overall approach of our AI-powered emergency response system. Audio from the caller is first transcribed into text using Speech-to-Text (STT) technology. This text is then processed by the "AI Observer," which extracts key information and structures it into predefined fields. The structured data is then available to the 911 operator and can be integrated with other emergency response systems.
  

## Section 2: Data Simulation

### 2.1 Focused Demonstration: Fire Emergency

To keep this notebook concise and focused, I will demonstrate the AI assistant’s functionality using a single incident type: a **Fire Emergency**. While the core AI logic is applicable to other scenarios, focusing on one allows for a streamlined presentation.  Additional incident types and transcripts are available in the project repository.


### 2.2 Sample Chat


In [1]:
fire_chat = [
    ("911 Operator","This is 911, emergency center. How may I assist you?"),
    ("Caller","Hello, we have a fire at the main warehouse!"),
    ("911 Operator","Okay, stay calm. Can you tell me the exact location of the fire?"),
    ("Caller","It's at 123 Industrial Drive, near the loading docks."),
    ("911 Operator","And what is the wind direction currently?"),
    ("Caller","It's blowing from south to west, pretty strong."),
    ("911 Operator","Are there any injuries reported?"),
    ("Caller","I think so, I saw one worker running out coughing."),
    ("911 Operator","Can you estimate how many?"),
    ("Caller","Just one that I saw."),
    ("911 Operator","What time did the fire start approximately?"),
    ("Caller","Maybe around 10:30 AM."),
    ("911 Operator","Can you describe what caused the fire, or what you see happening?"),
    ("Caller","It started near some stacked cardboard boxes, it's spreading quickly."),
    ("911 Operator","What is your name, please?"),
    ("Caller","My name is John Smith."),
    ("911 Operator","And your badge number?"),
    ("Caller","It's 789012."),
    ("911 Operator","Okay, John Smith, badge 789012.  So, we have a fire at 123 Industrial Drive, wind from south to west, one reported injury, started around 10:30 AM, near cardboard boxes. Is that all correct?"),
    ("Caller","Yes, that's correct."),
    ("911 Operator", "Thank you.  Fire services have been dispatched.  Please evacuate the area and stay safe.")
]

## Section 3: Data Schemas and Pydantic Models

This section defines the data schemas used to represent incident information, using Pydantic to create clear and structured models. These models structure the information extracted from the incident transcript and ensure that the data is consistent and valid. The `is_dispatch_ready()` method on the `FireIncident` model checks that the essential information needed for dispatching emergency services (location and wind direction) is present.

**Notes:** 
* The `Incident` base model acts as a fallback. If the AI is unable to determine a more specific incident type (e.g., Fire, Medical Assistance), it will default to the base `Incident` model to ensure a valid data structure. 
* The models for the `MEDICAL_ASSISTANCE` and `THEFT` incident types are available in the project repository.


In [2]:
from enum import Enum
from typing import Type
from pydantic import BaseModel, Field



# Represents the type of incident
class IncidentType(str, Enum):
    FIRE = "FIRE"
    MEDICAL_ASSISTANCE = "MEDICAL_ASSISTANCE"
    THEFT = "THEFT"
    GENERAL = "GENERAL"
# Represents the wind direction range
class WindDirectionType(str, Enum):
    FROM_SOUTH_TO_WEST = "from South to West"
    FROM_WEST_TO_SOUTH = "from West to South"
    FROM_SOUTH_TO_NORTH = "from South to North"
    FROM_NORTH_TO_SOUTH = "from North to South"
    FROM_EAST_TO_WEST = "from East to West"
    FROM_WEST_TO_EAST = "from West to East"
    FROM_NORTH_TO_EAST = "from North to East"
    FROM_EAST_TO_NORTH = "from East to North"

    
# Base Incident Model
class Incident(BaseModel):
    incident_type: IncidentType | None = Field(..., description="Type of incident")
    caller_name: str | None = Field(..., description="Name of the caller reporting the incident")
    caller_badge: int | None = Field(..., description="Badge number of the caller")



# A specialized model inheriting from Incident and adding attributes specific to fire incidents
class FireIncident(Incident):
    location: str | None = Field(..., description="Location where the incident occurred")
    wind_direction: WindDirectionType | None = Field(
        ...,
        description="Wind direction range (e.g., 'from south to west')"
    )
    injury_count: int | None = Field(..., description="Number of people injured")
    timestamp: str | None = Field(..., description="Time of the incident as XX:XX AM or XX:XX PM")
    description: str | None = Field(..., description="Detailed description of the incident")

    def is_dispatch_ready(self) -> bool:
        required_fields = {
            'location': self.location,
            'wind_direction': self.wind_direction
        }
        return all([v is not None for v in required_fields.values()])






## Section 4: Structured Generation with Gemini API

This section implements structured generation using the Google Gemini API. I chose a design similar to [outlines](https://github.com/dottxt-ai/outlines) to make a later transition to local deployment easier. I also created an OpenAI API generator, which is compatible with Ollama, in the repo[link].


The structured generation pipeline consists of two functions:

*   **`create_gemini_generator(model_name, model_class)`:** This function creates a generator function tailored to a specific Gemini model and Pydantic model class. It encapsulates the API client and configuration, allowing for flexible model selection.
*   **`generator(prompt, max_tokens, temperature)`:** This function takes a prompt (the incident transcript), and generates a structured output based on the provided Pydantic model. It utilizes the Gemini API to generate content, enforcing the desired output format through the `response_schema` parameter.
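
The factory-and-closure shape of these two functions can be illustrated without calling any API. The sketch below is only an illustration under stated assumptions: `create_fake_generator` and `fake_llm` are hypothetical stand-ins (the latter returns a canned JSON string in place of a Gemini response), and a dataclass replaces the Pydantic model.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional, Type


@dataclass
class MiniIncident:
    # Hypothetical stand-in for a Pydantic model class.
    incident_type: Optional[str]
    caller_name: Optional[str]


def fake_llm(prompt: str) -> str:
    # Hypothetical backend: returns a canned JSON string instead of calling Gemini.
    return '{"incident_type": "FIRE", "caller_name": null}'


def create_fake_generator(
    model_class: Type[MiniIncident],
) -> Callable[[str], Optional[MiniIncident]]:
    # The outer function fixes the output type once ...
    def generator(prompt: str) -> Optional[MiniIncident]:
        # ... and the inner function turns each prompt into an instance of it.
        try:
            return model_class(**json.loads(fake_llm(prompt)))
        except (json.JSONDecodeError, TypeError):
            return None  # Mirrors the "return None on failure" contract.

    return generator


generate_incident = create_fake_generator(MiniIncident)
result = generate_incident("Caller: My house is on fire!")
print(result)
```

Keeping the backend behind a closure like this is what allows one generator per incident type to be built up front and reused for every call.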


In [3]:
import json
import os
from typing import Callable

from dotenv import load_dotenv
from google import genai

load_dotenv()

# Maps incident type names to the Pydantic model used for extraction
INCIDENT_TYPES_MAP = {
    "FIRE": FireIncident,
    "GENERAL": Incident,  # Fallback
}


def create_gemini_generator(model_name: str, model_class: Type[BaseModel]) -> Callable[..., BaseModel | None]:
    """
    Creates a generator function that uses the Gemini API to generate structured output
    based on a Pydantic model.

    Args:
        model_name: The name of the Gemini model to use.
        model_class: The Pydantic model class defining the desired output structure.

    Returns:
        A generator function that takes a prompt and returns an instance of the model class.
    """
    # JSON schema of the target model, embedded in the system instruction
    schema_json = json.dumps(model_class.model_json_schema())

    def generator(prompt: str, max_tokens: int = 100, temperature: float = 0.0):
        """
        Generates structured output using the Gemini API.

        Args:
            prompt: The prompt to send to the Gemini model.
            max_tokens: The maximum number of tokens to generate (passed as `max_output_tokens`).
            temperature: The sampling temperature to use for generation.

        Returns:
            An instance of the model class, or None if generation or parsing fails.
        """
        try:
            client = genai.Client(api_key=os.environ['GOOGLE_API_KEY'])
            response = client.models.generate_content(
                model=model_name,
                contents=prompt,
                config={
                    'system_instruction': f"You must respond with a valid JSON object that matches this Pydantic model structure: {schema_json}",
                    'response_mime_type': 'application/json',
                    'response_schema': model_class,
                    'max_output_tokens': max_tokens,
                    'temperature': temperature,
                },
            )

            # Gemini may return an empty response, or one that cannot be parsed into the model
            if response.parsed is None:
                raise ValueError(f"Failed to parse response: {response.text}")
            return response.parsed

        except Exception as e:
            print(f"Error generating or parsing response: {e}")
            return None

    return generator

## Section 5: Orchestration with the Observer Class

The `Observer` class acts as the central controller for the information extraction pipeline. It takes a 911 call transcript and processes it to produce structured data ready for dispatch.

**Here's how it works:**

1. **Transcript Preparation:** Formats the raw messages into a single text with speaker identification.
2. **Feature Extraction:** Uses a Gemini LLM to extract key information from the transcript and structure it into a Pydantic model.
3. **Incident Type Determination:** Dynamically selects the most appropriate Pydantic model (e.g., `FireIncident`, `MedicalAssistance`) based on the transcript's content, starting with a general `Incident` model.
4. **Dispatch Check:** Verifies that all crucial information for dispatching emergency services is present.
5. **State Management:** Stores the extracted and refined information in the `current_incident` object.

**Key Methods:**

*   **`__init__()`:** Initializes the class with pre-configured Gemini generators.
*   **`extract_features()`:**  The main method – prepares the transcript, calls the LLM, updates the `current_incident`, and checks dispatch readiness.

The system prioritizes flexibility by dynamically adjusting the data structure based on the incident type and ensuring all necessary information is available before dispatch.
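
The state-management and dispatch-check steps amount to merging each new extraction into the running incident and then testing the dispatch-critical fields. The sketch below shows the idea with plain dictionaries. `merge_extraction` and `is_dispatch_ready` are hypothetical helpers written for this illustration, and treating the literal string `"null"` as missing is a simplification chosen here.

```python
REQUIRED_FOR_DISPATCH = ("location", "wind_direction")


def merge_extraction(current: dict, extracted: dict) -> dict:
    """Return a copy of `current` updated with the non-empty values of `extracted`."""
    merged = dict(current)
    for key, value in extracted.items():
        # Skip missing values and the literal string "null" so earlier data survives.
        if value is None or value == "null":
            continue
        merged[key] = value
    return merged


def is_dispatch_ready(incident: dict) -> bool:
    """True once every dispatch-critical field has a value."""
    return all(incident.get(field) is not None for field in REQUIRED_FOR_DISPATCH)


state = {"location": None, "wind_direction": None, "caller_name": None}

# First pass: only the location is known.
state = merge_extraction(state, {"location": "123 Industrial Drive", "wind_direction": None})
ready_after_first_turn = is_dispatch_ready(state)

# Second pass: the wind direction arrives; the stray "null" does not erase the location.
state = merge_extraction(state, {"location": "null", "wind_direction": "from South to West"})
ready_after_second_turn = is_dispatch_ready(state)

print(state, ready_after_first_turn, ready_after_second_turn)
```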


In [4]:

from langchain_core.messages import BaseMessage
from typing import Dict, Any,List

OBSERVER_MODEL_NAME ="gemini-2.0-flash"

OBSERVER_SYSTEM_PROMPT ="""
Extract the fields from the emergency call transcript:

Important:
- Only fill a field if it is explicitly mentioned; otherwise put null
- Be precise with the selection


Example Input:
911 Operator: 911, what's your emergency?

Caller: My name is Sarah Jennings. My house is on fire! Please help us!

911 Operator: Okay, Sarah, stay calm. Can you give me your address, please?

Example Output:
{{
"incident_type": "FIRE",
"caller_name": "Sarah Jennings",
"caller_badge": null
}}

Now extract from this transcript:
{}
"""



class Observer:
    def __init__(self):
        self.incident_generators ={k:create_gemini_generator(OBSERVER_MODEL_NAME,v) for k,v in INCIDENT_TYPES_MAP.items()}
        self.current_incident : Incident | None  = None
        self.is_dispatch_ready = False

    def _prepare_transcript(self, messages: List[BaseMessage | tuple | str]) -> str:
        """
        Prepares a coherent transcript string from a list of messages.

        This method takes a list of messages (representing the 911 call transcript)
        and formats it into a single string, identifying the speaker (Caller or 911 Operator)
        for each line.

        Args:
            messages (List[BaseMessage | tuple | str]): A list of messages representing the call transcript.
                Each message can be a LangChain BaseMessage object (with 'type' and 'content' attributes),
                a (speaker, text) tuple, or a plain string. Plain strings are attributed to the
                911 Operator at even positions and to the Caller at odd positions.

        Returns:
            str: A coherent transcript string with speaker identification.

        Example:
            >>> messages = [
            ...     ("911 Operator", "911, what's your emergency?"),
            ...     ("Caller", "My house is on fire!"),
            ...     "Okay, stay calm.",
            ... ]
            >>> print(observer._prepare_transcript(messages))
            911 Operator: 911, what's your emergency?
            Caller: My house is on fire!
            911 Operator: Okay, stay calm.
        """
        transcript_lines = []
        for i, message in enumerate(messages):
            if hasattr(message, 'type') and hasattr(message, 'content'):
                # LangChain messages: 'human' is the caller, 'ai' is the operator
                if message.type in ['human', 'ai']:
                    speaker = 'Caller' if message.type == 'human' else '911 Operator'
                    transcript_lines.append(f"{speaker}: {message.content}")
            elif isinstance(message, tuple):
                speaker, text = message
                transcript_lines.append(f"{speaker}: {text}")
            elif isinstance(message, str):
                # Plain strings carry no speaker, so alternate starting with the operator
                speaker = '911 Operator' if i % 2 == 0 else 'Caller'
                transcript_lines.append(f"{speaker}: {message}")
        return '\n'.join(transcript_lines)

    def extract_features(self, messages: List[BaseMessage | tuple | str]) -> tuple[Incident, bool]:
        """
        Extracts structured information from a 911 call transcript.

        This method orchestrates the entire information extraction pipeline, including:
        - Preparing the transcript.
        - Calling the Gemini LLM generator.
        - Updating the current incident object.
        - Checking for dispatch readiness.

        Args:
            messages (List[BaseMessage | tuple | str]): A list of messages representing the call transcript.

        Returns:
            tuple[Incident, bool]: A tuple containing:
                - The updated incident object (containing the extracted information).
                - A boolean indicating whether the incident is ready for dispatch.
            Errors are caught and printed; in that case a generic `Incident` is returned.

        Example (illustrative; the actual values depend on the model's response):
            >>> messages = [
            ...     ("Caller", "There's a fire at 123 Main Street!"),
            ...     ("911 Operator", "What is the wind direction?"),
            ...     ("Caller", "From south to west."),
            ... ]
            >>> incident, is_ready = observer.extract_features(messages)
            >>> print(incident.location)
            123 Main Street
            >>> print(is_ready)
            True
        """
        try:
            transcript = self._prepare_transcript(messages)

            # On the first call, run an extra pass with the general model so that the right
            # Pydantic model is selected before the main extraction. The extra pass is only
            # needed when the whole transcript is provided at once; it could be made more efficient.
            if self.current_incident is None:
                self.current_incident = Incident(incident_type='GENERAL', caller_name=None, caller_badge=None)
                self.extract_features(messages)

            current_incident_type = self.current_incident.incident_type.value

            generator = self.incident_generators.get(
                current_incident_type,
                self.incident_generators["GENERAL"]  # Fallback to general incident
            )

            # Generate structured information using the pre-configured generator
            extracted_info = generator(
                OBSERVER_SYSTEM_PROMPT.format(transcript),
                max_tokens=1000,
                temperature=0.0,
            )
            if extracted_info is None:
                raise ValueError("The generator returned no data")

            # Switch to the matching model when the detected incident type changes
            new_type = extracted_info.incident_type
            if new_type is not None and new_type != current_incident_type:
                # Types without a dedicated model fall back to the base Incident
                incident_class = INCIDENT_TYPES_MAP.get(new_type.value, Incident)
                # All fields are required, so initialise them to None and fill them in below
                args = {key: None for key in incident_class.model_json_schema()['properties'].keys()}
                self.current_incident = incident_class(**args)

            # Update the current incident with the new information. The LLM sometimes
            # writes the string 'null' instead of a real null, so convert it here.
            for key, value in extracted_info:
                if value is not None and hasattr(self.current_incident, key):
                    if value == 'null':
                        value = None
                    setattr(self.current_incident, key, value)

            # Check dispatch readiness for incident types that define it
            if hasattr(extracted_info, 'is_dispatch_ready'):
                self.is_dispatch_ready = extracted_info.is_dispatch_ready()

            return self.current_incident, self.is_dispatch_ready

        except Exception as e:
            print(f"Error extracting features: {e}")
            return Incident(incident_type='GENERAL', caller_name=None, caller_badge=None), self.is_dispatch_ready


In [5]:
# Test the transcript preparation
observer = Observer()
transcript = observer._prepare_transcript(fire_chat)
print(transcript)

911 Operator: This is 911, emergency center. How may I assist you?
Caller: Hello, we have a fire at the main warehouse!
911 Operator: Okay, stay calm. Can you tell me the exact location of the fire?
Caller: It's at 123 Industrial Drive, near the loading docks.
911 Operator: And what is the wind direction currently?
Caller: It's blowing from south to west, pretty strong.
911 Operator: Are there any injuries reported?
Caller: I think so, I saw one worker running out coughing.
911 Operator: Can you estimate how many?
Caller: Just one that I saw.
911 Operator: What time did the fire start approximately?
Caller: Maybe around 10:30 AM.
911 Operator: Can you describe what caused the fire, or what you see happening?
Caller: It started near some stacked cardboard boxes, it's spreading quickly.
911 Operator: What is your name, please?
Caller: My name is John Smith.
911 Operator: And your badge number?
Caller: It's 789012.
911 Operator: Okay, John Smith, badge 789012.  So, we have a fire at 123 I

In [6]:
resp = observer.extract_features(fire_chat)
resp

(FireIncident(incident_type=<IncidentType.FIRE: 'FIRE'>, caller_name='John Smith', caller_badge=789012, location='123 Industrial Drive', wind_direction=<WindDirectionType.FROM_SOUTH_TO_WEST: 'from South to West'>, injury_count=1, timestamp='10:30 AM', description="It started near some stacked cardboard boxes, it's spreading quickly."),
 True)
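
For downstream systems, a result like the one above would typically be serialized to JSON. The sketch below uses only the standard library, with the extracted values copied by hand into a plain dictionary. The envelope keys `incident` and `dispatch_ready` are hypothetical and not a format defined by the project. With the actual Pydantic object, `resp[0].model_dump_json()` should give an equivalent incident payload under Pydantic v2.

```python
import json

# Values copied from the extraction result shown above.
incident = {
    "incident_type": "FIRE",
    "caller_name": "John Smith",
    "caller_badge": 789012,
    "location": "123 Industrial Drive",
    "wind_direction": "from South to West",
    "injury_count": 1,
    "timestamp": "10:30 AM",
    "description": "It started near some stacked cardboard boxes, it's spreading quickly.",
}

# Hypothetical envelope pairing the incident with its dispatch-readiness flag.
payload = json.dumps({"incident": incident, "dispatch_ready": True}, indent=2)
print(payload)
```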

## Section 6: Simulation

In [None]:
import ipywidgets as widgets
from IPython.display import display

observer = Observer()

def simulate_chat(chat):
    # Output widget to display messages
    output = widgets.Output()

    # Button to go to next message
    next_button = widgets.Button(description="Next")

    # Message index tracker
    index = {'i': 0}

    def display_next_message(b):
        with output:
            if index['i'] < len(chat):
                role, text = chat[index['i']]
                color = "#DCF8C6" if role == "911 Operator" else "#E1F5FE"
                align = "left" if role == "911 Operator" else "right"

                if (index['i'] + 1) % 2 == 0 and index['i'] > 1:
                    incident_data = observer.extract_features(chat[:index['i'] + 1])[0]
                    observer_text = ""
                    for key, value in incident_data.model_dump().items():
                        if value is not None:
                            observer_text += f'<span style="background-color:#90EE90; padding:2px;"><b>{key}:</b> {value}</span><br>'
                        else:
                            observer_text += f'<b>{key}:</b> Not Available<br>'

                    observer_role = "Observer"
                    observer_color = "#AA4A44"
                    observer_align = "center"

                    html = f'''
                                <div style="background:{color}; padding:8px; margin:5px;
                                            border-radius:10px; max-width:70%; float:{align}; clear:both;">
                                    <b>{role}:</b> {text}
                                </div>
                                <div style="background:{observer_color}; padding:8px; margin:5px;
                                            border-radius:10px; max-width:70%; float:{observer_align}; clear:both;">
                                    <b>{observer_role}:\n</b> {observer_text}
                                </div>
                                '''

                else:
                    html = f'''
                                <div style="background:{color}; padding:8px; margin:5px;
                                            border-radius:10px; max-width:70%; float:{align}; clear:both;">
                                    <b>{role}:</b> {text}
                                </div>
                                '''

                display(widgets.HTML(html))

                index['i'] += 1
            else:
                next_button.disabled = True
                display(widgets.HTML("<b>End of conversation ✅</b>"))

    # Bind the button
    next_button.on_click(display_next_message)

    # Initial layout
    display(widgets.VBox([output, next_button]))

simulate_chat(fire_chat)


VBox(children=(Output(), Button(description='Next', style=ButtonStyle())))

## Conclusion

This notebook demonstrated a functional AI assistant capable of automatically extracting key information from 911 call transcripts. I’ve created a system that has the potential to significantly reduce response times, improve data accuracy, and enhance the efficiency of emergency response services.  The interactive demonstration highlighted how the system can process conversational data and present structured information in a clear and concise manner.

While this notebook focused on a fire emergency scenario, the underlying principles and architecture are readily adaptable to a wide range of incident types.

## Future Opportunities

Beyond automated information extraction, this AI assistant has the potential to significantly expand its capabilities to further improve emergency response effectiveness. Future enhancements could include:

*   **Real-time Translation:**  Providing instant translation of calls from non-English speakers, ensuring clear communication and accurate information gathering.
*   **Operator Guidance:**  Providing real-time instructions and prompts to call takers, guiding them through critical questioning and ensuring all necessary information is obtained. This includes suggesting follow-up questions based on the caller's responses.
*   **And more ...**
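
As an illustration of the operator-guidance idea, fields that are still empty could be mapped to suggested follow-up questions. The sketch below is hypothetical: the `FOLLOW_UP_QUESTIONS` table and the `suggest_questions` helper are not part of the project, and the question wording is borrowed from the sample transcript in Section 2.

```python
# Hypothetical mapping from incident fields to questions the operator could ask.
FOLLOW_UP_QUESTIONS = {
    "location": "Can you tell me the exact location of the fire?",
    "wind_direction": "And what is the wind direction currently?",
    "injury_count": "Are there any injuries reported?",
    "timestamp": "What time did the fire start approximately?",
}


def suggest_questions(incident: dict) -> list[str]:
    """Return one suggested question per tracked field that is still empty."""
    return [
        question
        for field, question in FOLLOW_UP_QUESTIONS.items()
        if incident.get(field) is None
    ]


suggestions = suggest_questions({"location": "123 Industrial Drive", "injury_count": 1})
print(suggestions)
```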


## Technical Considerations & Future Optimizations
*   **Asynchronous & Streaming Operations:** Implementing asynchronous programming and streaming data transfer could further optimize performance and responsiveness.
*   **Caching Mechanisms:** Utilizing caching strategies could reduce API costs and improve response times.

*   **Evaluation Metrics:** Defining and tracking key evaluation metrics (e.g., accuracy of information extraction, response time) would enable continuous improvement and optimization of the system.
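
As one possible starting point for the evaluation metrics, field-level accuracy can be computed by comparing an extracted record against a hand-labelled reference. The sketch below assumes a hypothetical `field_accuracy` helper and exact-match scoring. Free-text fields such as `description` would likely need a softer comparison.

```python
def field_accuracy(extracted: dict, reference: dict) -> float:
    """Fraction of reference fields whose extracted value matches exactly."""
    if not reference:
        raise ValueError("reference must contain at least one field")
    correct = sum(
        1 for key, expected in reference.items() if extracted.get(key) == expected
    )
    return correct / len(reference)


# Hand-labelled reference for the sample fire call.
reference = {
    "location": "123 Industrial Drive",
    "injury_count": 1,
    "timestamp": "10:30 AM",
    "caller_badge": 789012,
}
# Illustrative extraction that missed the badge number.
extracted = {
    "location": "123 Industrial Drive",
    "injury_count": 1,
    "timestamp": "10:30 AM",
    "caller_badge": None,
}

score = field_accuracy(extracted, reference)
print(score)
```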

These enhancements demonstrate the potential of this technology to transform emergency response systems and improve outcomes for both callers and responders. I believe this work represents a significant step towards creating a more efficient, accurate, and reliable emergency response infrastructure.
  