# **Insurance Claim Processing After a Natural Disaster**

**Imagine a major storm hits a city, causing extensive property damage. Insurance companies are flooded with thousands of claims: customers send photos, videos, documents, and voice messages describing the damage.**

**Processing these manually is slow, expensive, and frustrating for customers.**


# Process Stages - Functional Details

1. **Input Gathering**
    * Accepts user-uploaded images of damage, voice messages, videos, and text descriptions.

2. **Claim Assessment**
    * Checks policy coverage.
    * Estimates damage cost (using historical data and current prices).
    * Flags anomalies or suspicious activity.

3. **Communication**
    * Drafts emails or messages to the customer that:
        * Acknowledge the claim.
        * Explain the findings.
        * Request additional info if needed.
        * Give an estimated payout timeline.
4. **Human-in-the-loop**
    * Routes edge cases to human adjusters when:
        * Damage can't be confidently assessed.
        * Policy terms are complex.
        * Sentiment analysis detects potential dissatisfaction.
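
The Communication stage above can be sketched as a plain template function. This is only an illustration: `draft_customer_message` and its parameters are hypothetical names, and a real pipeline would probably have the LLM write the wording.

```python
from typing import List, Optional


def draft_customer_message(
    claim_id: int,
    findings: str,
    missing_info: Optional[List[str]] = None,
    payout_days: Optional[int] = None,
) -> str:
    """Hypothetical helper: build a plain-text message covering the four points above."""
    lines = [
        # Acknowledge the claim
        f"We have received your claim (reference {claim_id}).",
        # Explain the findings
        f"Findings so far: {findings}",
    ]
    if missing_info:
        # Request additional info only when something is missing
        lines.append("To continue, please send: " + ", ".join(missing_info) + ".")
    if payout_days is not None:
        # Give an estimated payout timeline when one is available
        lines.append(f"Estimated payout timeline: about {payout_days} days.")
    return "\n".join(lines)


print(draft_customer_message(1, "Garage roof collapse with water damage.", ["policy number"], 14))
```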
   

# Process Stages - Technical Details

1. **Input Gathering**
    * Inputs:
        * Images
        * Videos
        * Voice notes
        * Text descriptions / PDFs
    * Responsibilities:
        * Transcribe audio
        * Extract frames from video
        * Analyze images
        * Summarize policy text
2. **Claim Assessment**
    * Inputs:
        * Unified data from Input Agent
    * Responsibilities:
        * estimate_damage_cost(info_dict): LLM call with damage estimation logic
        * validate_policy_coverage(info_dict): LLM or rule-based match with summary
        * fraud_check(info_dict): LLM prompt or classifier
3. **Communication**
    * Inputs:
        * Output from Claim Assessment + raw data    
    * Responsibilities:
        * Generate human-readable response (LLM-generated message)
        * Generate email or chatbot-style response
        * Optionally notify adjuster via webhook/email
4. **Human-in-the-loop**
    * Trigger Conditions:  
        * Ambiguity in image or audio
        * Complex/unclear coverage
        * High fraud risk
        * Low model confidence (optional)
    * Responsibilities:
        * Route flagged cases to human dashboard
        * Send internal notification
        * Log reason for escalation
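
The trigger conditions above could be checked with a small rule function. The sketch below is an assumption, not part of the notebook: the `Assessment` fields, the 0 to 1 score scale, and both thresholds are hypothetical and would need to match whatever Claim Assessment actually returns.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Assessment:
    """Hypothetical container for the Claim Assessment output."""
    media_ambiguous: bool = False
    coverage_unclear: bool = False
    fraud_risk: float = 0.0                    # assumed scale: 0.0 (none) to 1.0 (certain)
    model_confidence: Optional[float] = None   # None when no confidence is reported


def escalation_reasons(
    a: Assessment, fraud_threshold: float = 0.7, min_confidence: float = 0.5
) -> List[str]:
    """Return the reasons a claim should go to a human; an empty list means no escalation."""
    reasons = []
    if a.media_ambiguous:
        reasons.append("Ambiguity in image or audio")
    if a.coverage_unclear:
        reasons.append("Complex/unclear coverage")
    if a.fraud_risk >= fraud_threshold:
        reasons.append("High fraud risk")
    # Confidence is optional, so only test it when a value is present
    if a.model_confidence is not None and a.model_confidence < min_confidence:
        reasons.append("Low model confidence")
    return reasons


print(escalation_reasons(Assessment(fraud_risk=0.9, model_confidence=0.2)))
```

Returning the list of reasons, rather than a bare boolean, gives the "log reason for escalation" step something to record.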


# Import Libraries

In [7]:
!pip install -qU google-generativeai

# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
import sqlite3
import cv2
import google.generativeai as genai
from PIL import Image
from typing import List
from pathlib import Path
from kaggle_secrets import UserSecretsClient
from google.genai import types  # provided by the separate google-genai package


# Kaggle mandatory code block

In [8]:

for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

/kaggle/input/claim-sample/file_example_MP3_700KB.mp3
/kaggle/input/claim-sample/house_damage.jpg
/kaggle/input/claim-sample/sample-10s.mp4


# Configure the API key and initialize the model

In [9]:
# Get API key
GOOGLE_API_KEY = UserSecretsClient().get_secret("GOOGLE_API_KEY") 
# Configure the Google Generative AI
genai.configure(api_key=GOOGLE_API_KEY)


# Initialize the models
model = genai.GenerativeModel(model_name='gemini-2.0-flash')
#GenerativeModel(model_name='gemini-1.5-flash-001')


# Common functions

# Extract frames at a fixed interval from the video submitted by the customer

In [10]:
def extract_video_frames(video_path: str, interval_sec: int = 3, out_dir="frames") -> List[str]:
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    vidcap = cv2.VideoCapture(video_path)
    fps = vidcap.get(cv2.CAP_PROP_FPS)
    # Guard against fps == 0 (unreadable metadata), which would make the modulo below fail
    interval = max(1, int(fps * interval_sec))
    frame_paths = []
    count, saved = 0, 0

    while vidcap.isOpened():
        success, frame = vidcap.read()
        if not success:
            break
        if count % interval == 0:
            path = f"{out_dir}/frame_{saved}.jpg"
            cv2.imwrite(path, frame)
            frame_paths.append(path)
            saved += 1
        count += 1
    vidcap.release()
    return frame_paths
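
To see which frames the loop above keeps without opening a video, the same arithmetic can be written as a pure function. `sampled_frame_indices` is a hypothetical helper, and the 10-second, 30 fps clip is an assumed example rather than a property of the sample file. The sketch also clamps the interval to at least 1 so that a zero frame rate cannot cause a division error.

```python
from typing import List


def sampled_frame_indices(total_frames: int, fps: float, interval_sec: float = 3) -> List[int]:
    """Indices of the frames that would be saved: every `interval`-th frame, starting at 0."""
    interval = max(1, int(fps * interval_sec))  # frames between two saved frames
    return list(range(0, total_frames, interval))


# Assumed example: a 10-second clip at 30 fps has 300 frames
print(sampled_frame_indices(300, 30, 3))
```

Each saved frame later becomes one model call, so `interval_sec` directly controls how many requests a video generates.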

# Transcribe audio and summarize message from audio inputs

In [11]:
def transcribe_audio_with_gemini(audio_path: str) -> str:
    with open(audio_path, "rb") as audio_file:
        audio_data = audio_file.read()
    prompt = "Please transcribe and summarize this audio message related to an insurance claim."

    response = model.generate_content([
        prompt,
        {"mime_type": "audio/mpeg", "data": audio_data}
    ])
    return response.text
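
The function above hard-codes `audio/mpeg`, which matches the MP3 sample. If customers may upload other audio formats, the MIME type could be guessed from the file extension with the standard library. `guess_audio_mime` is a hypothetical helper; falling back to `audio/mpeg` is an assumption, and whether the model accepts a given audio type still has to be checked separately.

```python
import mimetypes


def guess_audio_mime(audio_path: str, default: str = "audio/mpeg") -> str:
    """Guess an audio MIME type from the file extension, falling back to `default`."""
    mime, _ = mimetypes.guess_type(audio_path)
    # Only trust the guess when it is actually an audio type
    if mime and mime.startswith("audio/"):
        return mime
    return default


print(guess_audio_mime("/kaggle/input/claim-sample/file_example_MP3_700KB.mp3"))
```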

# Image, audio, video, and text processing using Gemini models

In [12]:
def analyze_with_gemini(input_data, input_type):
    if input_type == 'image':
        image = Image.open(input_data)
        prompt = "Analyze the attached image of damage."
        response = model.generate_content(contents=[prompt, image])
        return response.text
    elif input_type == 'audio':
        with open(input_data, "rb") as audio:
            audio_bytes = audio.read()
        prompt = "Transcribe and summarize the following audio message."
        # Pass the audio as an inline-data dict, as transcribe_audio_with_gemini does
        response = model.generate_content(
            contents=[prompt, {"mime_type": "audio/mpeg", "data": audio_bytes}]
        )
        return response.text
    elif input_type == 'video':
        # input_data is the path of one extracted frame (see extract_video_frames),
        # so it is opened as an image and sent along with the prompt
        frame = Image.open(input_data)
        prompt = "Analyze the following video frame for damage assessment."
        response = model.generate_content(contents=[prompt, frame])
        return response.text
    elif input_type == 'text':
        prompt = f"Analyze the following text description for damage assessment: {input_data}"
        response = model.generate_content(contents=[prompt])
        return response.text
    else:
        raise ValueError(f"Unsupported input_type: {input_type}")


# Build claim context: read each input and collect the model's analysis of it

In [13]:
def build_claim_context(inputs):
    context = {}
    context['audio_transcript'] = transcribe_audio_with_gemini(inputs['audio_path'])
    context['image_analyses'] = [analyze_with_gemini(img, 'image') for img in inputs['image_paths']]
    context['video_frame_summaries'] = [analyze_with_gemini(f, 'video') for f in extract_video_frames(inputs['video_path'])]
    context['text_description_analysis'] = analyze_with_gemini(inputs['text_description'], 'text')
    return context
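
`build_claim_context` assumes every input type is present. The variant below is a sketch for claims where some inputs are missing. It takes the three helpers as parameters so it can be tried without API calls; the name `build_claim_context_optional` and the `"N/A"` placeholders are assumptions, chosen so that later steps that read these keys still find a value.

```python
from typing import Callable, Dict, List


def build_claim_context_optional(
    inputs: Dict,
    transcribe: Callable[[str], str],
    analyze: Callable[[str, str], str],
    extract_frames: Callable[[str], List[str]],
) -> Dict:
    """Build the claim context, skipping any input type the customer did not provide."""
    context = {}
    audio = inputs.get("audio_path")
    context["audio_transcript"] = transcribe(audio) if audio else "N/A"
    context["image_analyses"] = [analyze(p, "image") for p in inputs.get("image_paths", [])]
    video = inputs.get("video_path")
    context["video_frame_summaries"] = (
        [analyze(f, "video") for f in extract_frames(video)] if video else []
    )
    text = inputs.get("text_description")
    context["text_description_analysis"] = analyze(text, "text") if text else "N/A"
    return context


# Usage with stand-in helpers instead of real model calls
ctx = build_claim_context_optional(
    {"text_description": "Garage roof collapsed."},
    transcribe=lambda path: f"transcript of {path}",
    analyze=lambda data, kind: f"{kind} analysis of {data}",
    extract_frames=lambda path: [],
)
print(ctx)
```

In the notebook, the three parameters would be `transcribe_audio_with_gemini`, `analyze_with_gemini`, and `extract_video_frames`.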

# Generate a claim summary from the collected context

In [14]:
def generate_claim_summary_with_gemini(context: dict) -> str:
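    # Join per-item analyses with blank lines so they stay separate in the prompt
    image_text = "\n\n".join(context.get('image_analyses', [])) or 'N/A'
    frame_text = "\n\n".join(context.get('video_frame_summaries', [])) or 'N/A'
    prompt = f"""
    You are an AI insurance analyst. Using the following data:

    Audio Transcript:
    {context.get('audio_transcript', 'N/A')}

    Image Analyses:
    {image_text}

    Video Frame Summaries:
    {frame_text}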

    Text Description Analysis:
    {context.get('text_description_analysis', 'N/A')}

    Based on all of the above, please:
    - Summarize the type of damage
    - Assess the severity
    - Recommend next steps
    - Estimate the potential repair cost in INR
    """

    response = model.generate_content(prompt)
    return response.text


# Save the claim summary to the database

In [15]:
def save_to_db(claim_summary, context, status="Pending"):
    conn = sqlite3.connect("claims.db")
    try:
        c = conn.cursor()
        c.execute('''
        CREATE TABLE IF NOT EXISTS claims (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            summary TEXT,
            audio TEXT,
            images TEXT,
            video TEXT,
            customer_input TEXT,
            status TEXT
        )
        ''')
        c.execute('''
        INSERT INTO claims (summary, audio, images, video, customer_input, status)
        VALUES (?, ?, ?, ?, ?, ?)
        ''', (
            claim_summary,
            context['audio_transcript'],
            "\n".join(context['image_analyses']),
            "\n".join(context['video_frame_summaries']),
            context['text_description_analysis'],
            status
        ))
        conn.commit()
    finally:
        # Close the connection even if the insert fails
        conn.close()
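
Claims are saved with status "Pending", but nothing in the notebook changes that status afterwards. A human-in-the-loop step would need something like the sketch below. `update_claim_status` and the set of allowed statuses are hypothetical; the sketch assumes only the `claims` table and its `id` and `status` columns.

```python
import sqlite3

# Hypothetical status values; only "Pending" appears in the notebook itself
ALLOWED_STATUSES = {"Pending", "Needs Review", "Approved", "Rejected"}


def update_claim_status(claim_id: int, status: str, db_path: str = "claims.db") -> bool:
    """Set the status of one claim. Returns True if a row was updated."""
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"Unknown status: {status}")
    conn = sqlite3.connect(db_path)
    try:
        cur = conn.execute(
            "UPDATE claims SET status = ? WHERE id = ?", (status, claim_id)
        )
        conn.commit()
        return cur.rowcount == 1
    finally:
        conn.close()
```

For example, `update_claim_status(1, "Needs Review")` would mark the first saved claim for an adjuster.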

# Validate inputs provided by the end customer

In [16]:
def validate_inputs(inputs):
    for path in inputs.get("image_paths", []):
        if not Path(path).exists():
            raise FileNotFoundError(f"Image not found: {path}")
    if not Path(inputs["video_path"]).exists():
        raise FileNotFoundError(f"Video not found: {inputs['video_path']}")
    if not Path(inputs["audio_path"]).exists():
        raise FileNotFoundError(f"Audio not found: {inputs['audio_path']}")
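
`validate_inputs` stops at the first missing file. When the goal is to tell a customer everything that needs re-uploading, it may be more useful to collect all missing paths in one pass. `find_missing_files` is a hypothetical helper that assumes the same `inputs` keys as above and treats absent keys as "not provided".

```python
from pathlib import Path
from typing import Dict, List


def find_missing_files(inputs: Dict) -> List[str]:
    """Return every referenced media path that does not exist on disk."""
    candidates = list(inputs.get("image_paths", []))
    for key in ("video_path", "audio_path"):
        # Skip keys that are absent or empty instead of raising KeyError
        if inputs.get(key):
            candidates.append(inputs[key])
    return [p for p in candidates if not Path(p).exists()]


print(find_missing_files({"image_paths": ["/no/such/photo.jpg"]}))
```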

# Process claim based on inputs

In [17]:
def process_claim(inputs):
    try:
        context = build_claim_context(inputs)
        summary = generate_claim_summary_with_gemini(context)
        save_to_db(summary, context)
        print("Claim processed successfully.")
    except Exception as e:
        print(f"An error occurred: {e}")



# Call process_claim with sample inputs

In [18]:
inputs = {
    "text_description": "The garage roof has collapsed and rainwater damaged furniture.",
    "image_paths": ["/kaggle/input/claim-sample/house_damage.jpg"],
    "video_path": "/kaggle/input/claim-sample/sample-10s.mp4",
    "audio_path": "/kaggle/input/claim-sample/file_example_MP3_700KB.mp3",
}
# Run the claim processing pipeline
validate_inputs(inputs)
process_claim(inputs)

Claim processed successfully.


# Get the details of the claim

In [19]:
def get_all_claims(db_path="claims.db"):
    conn = sqlite3.connect(db_path)
    c = conn.cursor()
    c.execute('SELECT id, summary, status FROM claims')
    rows = c.fetchall()
    conn.close()

    claims = []
    for row in rows:
        claim_id, summary, status = row
        claims.append({
            "id": claim_id,
            "summary": summary,
            "status": status
        })
    return claims
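
An adjuster dashboard would usually want only the claims in a given state rather than every row. The sketch below filters by status and uses `sqlite3.Row` so that columns can be read by name. `get_claims_by_status` is a hypothetical helper that assumes the same `claims` table.

```python
import sqlite3
from typing import Dict, List


def get_claims_by_status(status: str, db_path: str = "claims.db") -> List[Dict]:
    """Return id, summary, and status for every claim with the given status."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row  # rows become name-addressable
    try:
        rows = conn.execute(
            "SELECT id, summary, status FROM claims WHERE status = ? ORDER BY id",
            (status,),
        ).fetchall()
        # Convert to plain dicts before the connection is closed
        return [dict(row) for row in rows]
    finally:
        conn.close()
```

Calling `get_claims_by_status("Pending")` after the pipeline has run would list the claims that still await a decision.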

# Display the claim details by reading from database

In [20]:
claims = get_all_claims()
for claim in claims:
    print(f" Claim ID: {claim['id']}")
    print(f" Summary: {claim['summary']}")
    print(f" Status: {claim['status']}")    

 Claim ID: 1
 Summary: Okay, based on the provided data, here's a summary, severity assessment, recommended next steps, and estimated repair costs (in INR):

**Summary of Damage:**

A building (likely a house or apartment building) has sustained severe structural damage, evidenced by a partial roof and wall collapse.  The garage specifically mentioned suffered a roof collapse leading to rainwater damage to furniture stored inside. Possible causes include earthquakes, explosions, severe weather (tornadoes, hurricanes), landslides/mudslides, structural failure, or possibly a failed demolition. The building is currently unsafe.

**Severity Assessment:**

The damage is classified as **severe to catastrophic**.

*   **Structural Collapse:** This poses the most immediate risk and cost. The collapse indicates fundamental problems with the building's integrity.
*   **Water Damage:** Following the collapse, water damage complicates the repair and remediation process. It introduces risks of mold