
# 🔍 From Manual Reviews to AI-Powered Claims Summaries & Fraud Detection

---

The financial services industry faces mounting pressure to improve claims processing efficiency while combating fraud. Manual claim summarization leads to inconsistent documentation, delayed fraud detection, and inefficient archival of suspicious cases, resulting in operational bottlenecks and increased financial exposure.

<div style="background: #f7f7f7; border-left: 5px solid #ff5f46; padding: 20px; margin: 20px 0; font-size: 18px;">
"Industry benchmarks reveal <b>30-day average resolution times</b> for complex claims, with manual processes accounting for <b>40% of processing time</b>, far exceeding the <b>7-10 day ideal</b> for high-risk cases requiring detailed documentation."[^1]
</div>

<div style="background: #f7f7f7; border-left: 5px solid green; padding: 20px; margin: 20px 0; font-size: 18px;">
"Generative AI reduces claims processing time by <b>50%</b> through automated summarization while improving fraud detection accuracy by <b>35%</b> through pattern recognition in historical data."[^2]
</div>

<img src="https://raw.githubusercontent.com/databricks-demos/dbdemos-resources/refs/heads/main/images/liza.png" style="width: 100px; vertical-align: middle; margin-right: 10px; float: left;" />
<br>
Liza, a Generative AI Engineer, is implementing Databricks AI to transform claims management. Her system will:
<div style="background: #e0f7fa; border-left: 5px solid #00796b; padding-left: 10px; margin: 0 0; margin-left: 120px">
  <ol style="text-align: left;">
    <li>Retrieve all information for a given claim ID</li>
    <li>Automate structured claim summaries using GenAI</li>
    <li>Archive enriched case files to Unity Catalog for auditor review</li>
  </ol>
</div>

<!-- [^1]: MD Clarity Industry Benchmark Report 2024, Shipping Insurance Processing Metrics  
[^2]: Ricoh USA Claims Automation Study 2023, Ziffity AI Implementation Case Studies  
[^3]: Convin AI Fraud Detection Whitepaper 2024: Anomaly Detection in Financial Documents -->



## The Demo: Automate Suspicious Claims Summaries with Databricks GenAI

Databricks Generative AI helps transform claims processing by enabling organizations to automate manual workflows, detect fraud, and unlock insights from previously siloed claims data. Where traditional claims processing can be a time-intensive endeavor, Databricks allows insurers to automate much of that manual work.

In this demo, we showcase a practical use case for Databricks in claims management. We will create the functions our AI agent will use, defined in the subsequent cells:

1. **Claims Data Retriever**: This function allows our GenAI agent to fetch claims data for a given `claim_id`, pulling claim, policy, vehicle, and location fields from the `claim_policy_telematics_ml` table in our Databricks Lakehouse.
2. **Claims Summary Generator**: Using generative AI models, this function generates concise summaries of claims, including key details such as incident descriptions, policyholder information, and flagged anomalies for further review.

Finally, we will demonstrate how these functions can be applied in batch processing to unlock insights from archived claims data. By automating claims summarization and enabling real-time access to enriched data, insurers can improve operational efficiency and expedite fraud detection.

By leveraging Generative AI, insurers can move beyond manual processes to actively **streamline workflows, reduce costs, and enhance fraud detection capabilities**.


In [0]:
%run ../_resources/00-setup $reset_all_data=false

## 📊 Function 1: Get Claim Details  

<div style="display: flex; align-items: center; gap: 15px;">
  <img src="https://raw.githubusercontent.com/databricks-demos/dbdemos-resources/refs/heads/main/images/liza.png" 
       style="width: 100px; border-radius: 10px;" />
  <div>
    Liza's first step in automating claims processing is to retrieve all relevant information for a given claim.  
    This function, `get_claim_details`, takes a claim ID and returns the data needed to generate a comprehensive claim summary.  
    The retrieved data includes the policy number, vehicle details, claim amount, incident type, date, and severity, driver age, policy type and sum insured, location, and the suspicious-activity flag, providing the foundation for accurate and efficient summarization.
  </div>
</div>

### 🛠️ Next Step  
In the following section, we define the `get_claim_details` function, which retrieves claim, policy, vehicle, and location fields from the `claim_policy_telematics_ml` table in the Databricks Lakehouse to enable automated claims summarization.
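To make the lookup semantics concrete, here is a minimal pure-Python sketch of what `get_claim_details` does. It assumes the rows of `claim_policy_telematics_ml` are available as a list of dictionaries (a local stand-in for the table); the helper name `get_claim_details_sketch` is hypothetical. It filters on `claim_no`, exposes `chassis_no` as `chassis_number`, and keeps at most one row, mirroring the `WHERE` and `LIMIT 1` clauses.

```python
from typing import Any

# Columns returned by get_claim_details, in the order declared in RETURNS TABLE.
# chassis_no from the source table is exposed under the alias chassis_number.
RETURN_COLUMNS = [
    "claim_no", "policy_no", "chassis_number", "claim_amount_total",
    "incident_type", "incident_date", "incident_severity", "driver_age",
    "suspicious_activity", "make", "model", "model_year", "policytype",
    "sum_insured", "borough", "neighborhood",
]


def get_claim_details_sketch(rows: list[dict[str, Any]], claim_id: str) -> list[dict[str, Any]]:
    """Hypothetical local stand-in for the SQL table function."""
    for row in rows:
        # WHERE claim_no = claim_id
        if row.get("claim_no") == claim_id:
            renamed = {**row, "chassis_number": row.get("chassis_no")}
            # Project only the declared columns and stop at the first match (LIMIT 1).
            return [{col: renamed.get(col) for col in RETURN_COLUMNS}]
    # An unknown claim ID yields an empty result table.
    return []
```

Like the SQL version, the sketch returns a table (here, a list), so callers should handle the empty case for claim IDs that do not exist.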


In [0]:
CREATE OR REPLACE FUNCTION get_claim_details (
  claim_id STRING
)
RETURNS TABLE (
  claim_no STRING,
  policy_no STRING,
  chassis_number STRING,
  claim_amount_total BIGINT,
  incident_type STRING,
  incident_date DATE,
  incident_severity STRING,
  driver_age DOUBLE,
  suspicious_activity BOOLEAN,
  make STRING,
  model STRING,
  model_year BIGINT,
  policytype STRING,
  sum_insured DOUBLE,
  borough STRING,
  neighborhood STRING
)
LANGUAGE SQL
COMMENT 'Returns essential claim details for the given claim ID'
RETURN(
  SELECT
    claim_no,
    policy_no,
    chassis_no AS chassis_number,
    claim_amount_total,
    incident_type,
    incident_date,
    incident_severity,
    driver_age,
    suspicious_activity,
    make,
    model,
    model_year,
    policytype,
    sum_insured,
    borough,
    neighborhood
  FROM claim_policy_telematics_ml
  WHERE claim_no = claim_id
  LIMIT 1);

### Example:

Let's make sure the new function works. Here we look up a claim we know is suspicious:

In [0]:
SELECT * FROM get_claim_details("5627e63a-f54b-4316-8abe-a5c5c3cfc7e0")

### Example:

Next, let's look up a claim we know is not suspicious!

In [0]:
SELECT * FROM get_claim_details("6a365ec8-4def-4b4e-910f-0a9a033aab82")

## 📝 Function 2: Generate Claim Summary  

<div style="display: flex; align-items: center; gap: 15px;">
  <img src="https://raw.githubusercontent.com/databricks-demos/dbdemos-resources/refs/heads/main/images/liza.png" 
       style="width: 100px; border-radius: 10px;" />
  <div>
    After retrieving the necessary claim details using the `get_claim_details` function, the next step is to generate a concise and structured summary of the claim.  
    The `generate_claim_summary` function uses Generative AI to process the retrieved data and produce a human-readable summary, which includes:
    <ul>
      <li>Incident description</li>
      <li>Policyholder information</li>
      <li>Claim amount and status</li>
      <li>Relevant notes or flagged items for further review</li>
    </ul>
    This summary helps streamline claims processing and provides a clear overview for auditors or claims adjusters.
  </div>
</div>
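Because the summary is free-form model output, it can help to check that a response actually contains the sections you asked for. The prompt defined in Step 1 below requests five labeled sections; the sketch that follows is a hypothetical helper (not part of the demo) that lists which of those headings are missing from a generated summary, so incomplete outputs could be regenerated or routed for manual review. It is deliberately loose: a heading phrase that appears in ordinary prose also counts as present.

```python
import re

# Section headings requested by the Step 1 prompt.
REQUIRED_SECTIONS = [
    "CLAIM SUMMARY",
    "CLAIM INFORMATION",
    "INCIDENT DETAILS",
    "VEHICLE INFORMATION",
    "RISK ASSESSMENT",
]


def missing_sections(summary: str) -> list[str]:
    """Return the required headings that do not appear in the summary.

    Matching is case-insensitive and ignores surrounding Markdown such as
    '**' or '##', since models format headings inconsistently.
    """
    missing = []
    for section in REQUIRED_SECTIONS:
        # Word boundaries keep a heading from matching inside a longer word.
        pattern = r"\b" + re.escape(section) + r"\b"
        if not re.search(pattern, summary, flags=re.IGNORECASE):
            missing.append(section)
    return missing
```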

### 🛠️ Next Step  
In the following section, we define the `generate_claim_summary` function, which takes the fields returned by `get_claim_details` and applies Generative AI to create a structured summary for an individual claim. This enables faster decision-making and improved efficiency in claims management workflows.


### Step 1: Define the model prompt

This function uses the `databricks-mixtral-8x7b-instruct` model endpoint by default to create customized claim summaries.

Feel free to edit the prompt below to generate whatever kind of claim summary you think would be most useful!
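If you edit the prompt, keep in mind that Step 2 interpolates it into a double-quoted SQL string literal with a Python f-string (`"{prompt}"`). A `"` in your prompt would end the literal early, and a `\` would be read as the start of an escape sequence. A minimal sketch of an escaping helper, assuming Databricks SQL's default backslash-escape handling in string literals (`to_sql_string_literal` is a hypothetical name, not part of the demo):

```python
def to_sql_string_literal(text: str) -> str:
    """Wrap text in double quotes for Databricks SQL, escaping backslashes and quotes.

    Backslashes are doubled first so that the backslashes added in front of
    quotes are not themselves escaped a second time.
    """
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'
```

With such a helper, the f-string in Step 2 could use `{to_sql_string_literal(prompt)}` in place of `"{prompt}"`.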

In [0]:
%python
prompt = """
You are an expert insurance claim analyst with 25 years of experience in the auto insurance industry. Your task is to generate a comprehensive, professional claim summary report.

## Output Format:
Create a well-structured report with these clearly labeled sections:
1. **CLAIM SUMMARY**: A concise 3-4 sentence overview of the incident
2. **CLAIM INFORMATION**: Policy details and financial impact
3. **INCIDENT DETAILS**: Date, location, type, and severity
4. **VEHICLE INFORMATION**: Make, model, and specifications
5. **RISK ASSESSMENT**: Analysis of any suspicious activity and next steps

## Guidelines:
- Maintain professional, formal language suitable for insurance industry standards
- Be precise with facts, dates, and monetary values
- If suspicious activity is flagged, note potential investigation requirements
- The tone should be objective and authoritative
- Total report length should be approximately 250-300 words
- Format monetary values appropriately (e.g., $10,000.00)
- Use insurance industry terminology where appropriate

Analyze the following claim data and provide your expert assessment:
"""

### Step 2: Function definition

Next, we use `ai_query` to define the `generate_claim_summary` function. It combines the prompt from the previous cell with the claim fields returned by `get_claim_details`, serialized as JSON, and sends the result to the model.
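The string sent to the model is the prompt, then the literal `Claim details: `, then the claim fields serialized as a single JSON object by `TO_JSON(NAMED_STRUCT(...))`. The pure-Python sketch below approximates that assembly so you can preview roughly what the endpoint receives; `build_model_input` is a hypothetical helper. It assumes compact `json.dumps` output is a close stand-in for Spark's `TO_JSON`, which emits no whitespace, keeps struct field order, and by default omits null fields (mimicked here by dropping `None` values).

```python
import json
from typing import Optional

# Field order matches the NAMED_STRUCT in generate_claim_summary.
FIELD_ORDER = [
    "claim_no", "policy_no", "chassis_number", "claim_amount_total",
    "incident_type", "incident_date", "incident_severity", "driver_age",
    "suspicious_activity", "make", "model", "model_year", "policytype",
    "sum_insured", "borough", "neighborhood",
]


def build_model_input(prompt: str, claim: dict[str, Optional[str]]) -> str:
    """Approximate CONCAT(prompt, 'Claim details: ', TO_JSON(NAMED_STRUCT(...)))."""
    # TO_JSON skips null fields by default, so drop None values here too.
    struct = {k: claim.get(k) for k in FIELD_ORDER if claim.get(k) is not None}
    # Compact separators mimic TO_JSON, which emits no spaces.
    payload = json.dumps(struct, separators=(",", ":"), ensure_ascii=False)
    return prompt + "Claim details: " + payload
```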

In [0]:
%python
# The prompt is interpolated into a double-quoted SQL string literal below,
# so it must not contain unescaped double quotes or backslashes.
spark.sql(f"""
  CREATE OR REPLACE FUNCTION generate_claim_summary(
    claim_no STRING,
    policy_no STRING,
    chassis_number STRING,
    claim_amount_total STRING,
    incident_type STRING,
    incident_date STRING,
    incident_severity STRING,
    driver_age STRING,
    suspicious_activity STRING,
    make STRING,
    model STRING,
    model_year STRING,
    policytype STRING,
    sum_insured STRING,
    borough STRING,
    neighborhood STRING
  )
  RETURNS STRING
  LANGUAGE SQL
  COMMENT "Generates a professional insurance claim summary using AI based on the provided claim details. This function produces a well-structured report with sections for Claim Summary, Claim Information, Incident Details, Vehicle Information, and Risk Assessment."
  RETURN (
    SELECT ai_query(
      'databricks-mixtral-8x7b-instruct',
      CONCAT(
        "{prompt}",
        'Claim details: ',
        TO_JSON(NAMED_STRUCT(
          'claim_no', claim_no, 'policy_no', policy_no,
          'chassis_number', chassis_number, 'claim_amount_total', claim_amount_total,
          'incident_type', incident_type, 'incident_date', incident_date,
          'incident_severity', incident_severity, 'driver_age', driver_age,
          'suspicious_activity', suspicious_activity, 'make', make,
          'model', model, 'model_year', model_year,
          'policytype', policytype, 'sum_insured', sum_insured,
          'borough', borough, 'neighborhood', neighborhood
        ))
      )
    )
  )
  """)

###  Step 3a: Suspicious Claim Example:
Now, let's make sure that this works. Let's manually input claims data pertaining to a claim we know is suspicious, and see how the summary looks! Feel free to modify the prompt above if you prefer a different output!
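Typing sixteen values by hand is error-prone, and `generate_claim_summary` declares every parameter as `STRING` even though `get_claim_details` returns `BIGINT`, `DATE`, `DOUBLE`, and `BOOLEAN` columns. The sketch below shows one way to turn a typed claim row into the ordered string arguments. It assumes Spark-style renderings (`true`/`false` for booleans, ISO `YYYY-MM-DD` for dates) and uses Python's `str()` for numbers; the names `to_sql_arg` and `summary_args` are hypothetical.

```python
from datetime import date
from typing import Any, Optional

# Parameter order of generate_claim_summary.
PARAM_ORDER = [
    "claim_no", "policy_no", "chassis_number", "claim_amount_total",
    "incident_type", "incident_date", "incident_severity", "driver_age",
    "suspicious_activity", "make", "model", "model_year", "policytype",
    "sum_insured", "borough", "neighborhood",
]


def to_sql_arg(value: Any) -> Optional[str]:
    """Render one typed value as the string a STRING parameter would receive."""
    if value is None:
        return None  # Passed as NULL.
    if isinstance(value, bool):  # Check bool before numbers: bool subclasses int.
        return "true" if value else "false"
    if isinstance(value, date):
        return value.isoformat()
    return str(value)


def summary_args(row: dict[str, Any]) -> list[Optional[str]]:
    """Build the 16 positional arguments for generate_claim_summary from a claim row."""
    return [to_sql_arg(row.get(name)) for name in PARAM_ORDER]
```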

In [0]:
SELECT generate_claim_summary("5627e63a-f54b-4316-8abe-a5c5c3cfc7e0", "102085057", "JN1CC11C37T012187", "61500", "Single Vehicle Collision", "2016-04-20", "Minor Damage", "47", "true", "NISSAN", "TIIDA", "2007", "COMP", "10000", "QUEENS", "WEST CENTRAL QUEENS") AS claim_summary

### Step 3b: Non-Suspicious Claim Example:
Now, let's run the same check on a claim we know is not suspicious and compare the two summaries. Feel free to modify the prompt above if you prefer a different output!

In [0]:
SELECT generate_claim_summary("6a365ec8-4def-4b4e-910f-0a9a033aab82", "102085055", "VWPW11K78W108679", "59700", "Multi-vehicle Collision", "2016-10-20", "Major Damage", "35", "false", "VOLKSWAGEN", "GOLF", "2008", "COMP", "30000", "MANHATTAN", "LOWER MANHATTAN") AS claim_summary

## 🚀 Operation 3: Apply Our Functions in Batch to Suspicious Claims  

<div style="display: flex; align-items: center; gap: 15px;">
  <img src="https://raw.githubusercontent.com/databricks-demos/dbdemos-resources/refs/heads/main/images/liza.png" 
       style="width: 100px; border-radius: 10px;" />
  <div>
    Now that Liza has implemented the `get_claim_details` and `generate_claim_summary` functions, the next step is to scale these operations by applying them in batch to a dataset of suspicious claims.  
    This operation enables organizations to:
    <ul>
      <li>Process large volumes of archived claims efficiently</li>
      <li>Generate summaries for flagged claims based on predefined criteria (e.g., high claim amounts or anomalies)</li>
      <li>Democratize access to enriched claim data for auditors and fraud investigators</li>
      <li>Identify patterns across multiple claims for fraud detection and risk analysis</li>
    </ul>
    By leveraging Databricks' scalable compute capabilities, this batch processing operation ensures that even large datasets can be analyzed quickly and effectively.
  </div>
</div>

### 🛠️ Next Step  
In the following section, we demonstrate how to apply `generate_claim_summary` in batch mode to process suspicious claims. This includes:
1. Selecting claims flagged with `suspicious_activity = true` from the `claim_policy_telematics_ml` table in Unity Catalog.
2. Applying `generate_claim_summary` to every selected row in a single SQL query.
3. Storing the enriched summaries in the `suspicious_claim_report_table` table in Unity Catalog for downstream analysis.

<small>üí° Batch processing unlocks insights from historical claims data, enabling proactive fraud detection and improved operational efficiency.</small>


<div style="background: #ffe6e6; border-left: 5px solid #ff5f46; padding: 15px; margin: 20px 0; font-size: 16px;">
🚨 <b>Notice:</b> Run the following cell to generate claim summaries in batch. It summarizes 10 suspicious claims by default; remove the <code>LIMIT</code> clause to process all of them.  


⚠️ <i>Note:</i> This operation may take some time depending on the size of your dataset and the size of the Databricks compute you use.
</div>


In [0]:
CREATE OR REPLACE TABLE suspicious_claim_report_table AS
SELECT 
  claim_no, 
  suspicious_activity, 
  generate_claim_summary(
    claim_no, 
    policy_no, 
    chassis_no, 
    claim_amount_total, 
    incident_type, 
    incident_date, 
    incident_severity, 
    driver_age, 
    suspicious_activity, 
    make, 
    model, 
    model_year, 
    policytype, 
    sum_insured, 
    borough, 
    neighborhood
  ) AS claim_report 
FROM (
  SELECT * 
  FROM claim_policy_telematics_ml
  WHERE suspicious_activity = true
  -- REMOVE LIMIT if you would like to apply this summarization to more claims!
  LIMIT 10
)

In [0]:
%sql
SELECT * FROM suspicious_claim_report_table LIMIT 10;

## Next Steps

Proceed to notebook [04.2-Agent-Creation-Guide]($./04.2-Agent-Creation-Guide) to package the functions above into a deployable AI agent with Databricks!