In [1]:
import sys
import os

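# Add the parent directory to the import path so the project modules resolve.
# The absolute path already covers the relative "..", so one entry is enough.
sys.path.append(os.path.abspath(".."))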

In [2]:
from ml.predict import predict_claim
from fraud_agent import investigate_claim
import pandas as pd

In [3]:
high_risk_claim = {
    "Month": "Dec",
    "WeekOfMonth": "5",
    "DayOfWeek": "Friday",
    "Make": "BMW",
    "AccidentArea": "Urban",
    "DayOfWeekClaimed": "Monday",
    "MonthClaimed": "Jan",
    "WeekOfMonthClaimed": "1",
    "Sex": "Male",
    "MaritalStatus": "Single",
    "Age": 23,
    "Fault": "Policy Holder",
    "PolicyType": "Sport - Collision",
    "VehicleCategory": "Sport",
    "VehiclePrice": "more than 69000",
    "Deductible": 300,
    "DriverRating": 1,
    "Days_Policy_Accident": "more than 30",
    "Days_Policy_Claim": "more than 30",
    "PastNumberOfClaims": "4 to 8",
    "AgeOfVehicle": "7 years",
    "AgeOfPolicyHolder": "18 to 20",
    "PoliceReportFiled": "No",
    "WitnessPresent": "No",
    "AgentType": "External",
    "NumberOfSuppliments": "more than 5",
    "NumberOfCars": "3 to 4",
    "BasePolicy": "Collision"
}
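
A note on the hand-built claim above: it sets `Age` to 23 but `AgeOfPolicyHolder` to "18 to 20", so the two fields disagree. A small consistency check can catch this before a claim is scored. The sketch below is illustrative only: the helper name `age_matches_bucket` is hypothetical, and the "over N" bucket form is an assumption about how open-ended ranges are written in the data.

```python
import re

def age_matches_bucket(age, bucket):
    """Return True if `age` falls inside a bucket string such as "18 to 20".

    Also accepts an open-ended "over N" bucket (assumed format).
    """
    bounds = [int(n) for n in re.findall(r"\d+", bucket)]
    if len(bounds) == 2:
        low, high = bounds
        return low <= age <= high
    if len(bounds) == 1 and bucket.strip().lower().startswith("over"):
        return age > bounds[0]
    raise ValueError(f"Unrecognized age bucket: {bucket!r}")

# Only the two fields under test are copied from the claim above.
claim = {"Age": 23, "AgeOfPolicyHolder": "18 to 20"}
print(age_matches_bucket(claim["Age"], claim["AgeOfPolicyHolder"]))  # False
```

Whether the mismatch matters depends on which of the two fields the model actually consumes, which this notebook does not show.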

In [4]:
from rag.retriever import retrieve_rules
retrieve_rules()

'Rule 1: Claims with multiple past claims are higher fraud risk.\nRule 2: High vehicle price combined with no police report increases suspicion.\nRule 3: Long delay between accident and claim increases fraud probability.\nRule 4: Multiple supplements added to claim can indicate exaggeration.\nRule 5: Young drivers with expensive cars have elevated fraud patterns.'
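
The cell above shows the rules as a single newline-joined string, which is hard to read in its raw form. As a minimal sketch, assuming the "Rule N: text" layout seen in that output is stable, the string can be split into a dictionary keyed by rule number. The helper name `parse_rules` is hypothetical and not part of `rag.retriever`; the string is copied from the output so the example runs on its own.

```python
import re

# Copied verbatim from the retrieve_rules() output above.
rules_text = (
    "Rule 1: Claims with multiple past claims are higher fraud risk.\n"
    "Rule 2: High vehicle price combined with no police report increases suspicion.\n"
    "Rule 3: Long delay between accident and claim increases fraud probability.\n"
    "Rule 4: Multiple supplements added to claim can indicate exaggeration.\n"
    "Rule 5: Young drivers with expensive cars have elevated fraud patterns."
)

def parse_rules(text):
    """Map each rule number to its text, skipping lines that do not match."""
    rules = {}
    for line in text.splitlines():
        match = re.match(r"Rule (\d+):\s*(.+)", line.strip())
        if match:
            rules[int(match.group(1))] = match.group(2)
    return rules

for number, rule in parse_rules(rules_text).items():
    print(f"{number}. {rule}")
```

For a quick look without parsing, `print(retrieve_rules())` would also render the newlines instead of showing them as `\n`.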

In [5]:
response = investigate_claim(high_risk_claim)
print(response)

# Risk Assessment
The claim presents a high fraud probability of **70.17%**, indicating significant risk based on the combination of factors involved. 

# Key Suspicious Factors
1. **Multiple Past Claims**: The claimant has between 4 to 8 past claims, which raises suspicion according to Rule 1.
2. **High Vehicle Price**: The vehicle is categorized as a sport and priced at more than $69,000, which aligns with Rule 2 regarding high vehicle price and lack of a police report.
3. **Delay in Claim Filing**: The claim was filed more than 30 days after the accident, contradicting expected timelines and suggesting possible fraud as per Rule 3.
4. **Number of Supplements**: Claim includes more than 5 supplements, which suggests potential exaggeration of damages (Rule 4).
5. **Young Driver with Expensive Car**: The policyholder is 23 years old, which, combined with the expensive vehicle, fits Rule 5 regarding elevated fraud risks.

# Recommended Action
- **Investigate Further**: Conduct a deeper 
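
The responses in this notebook report the fraud probability inside free text, once as a percentage (**70.17%**) and once as a raw fraction (**0.000175**), so comparing claims means normalizing the two forms first. The sketch below is one possible approach, not part of `fraud_agent`: the helper name `extract_fraud_probability` is hypothetical, and it assumes the probability is the first standalone bolded number in the response.

```python
import re

def extract_fraud_probability(report):
    """Return the first bolded standalone number as a fraction, or None.

    A trailing "%" means the value is a percentage and is divided by 100;
    otherwise the number is taken to be a fraction already.
    """
    match = re.search(r"\*\*(\d+(?:\.\d+)?)(%?)\*\*", report)
    if match is None:
        return None
    value = float(match.group(1))
    return value / 100 if match.group(2) else value

# Snippets copied from the responses printed in this notebook.
samples = [
    "The claim presents a high fraud probability of **70.17%**, indicating significant risk",
    "the model prediction of a fraud probability of **0.000175**, the overall risk",
    "The model indicates a **Fraud Probability** of approximately **18.66%**.",
]
for text in samples:
    print(extract_fraud_probability(text))
```

Parsing free text this way is brittle; if `investigate_claim` could return the model score alongside the narrative, that would be the more reliable source.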

In [6]:
low_risk_claim = {
    "Month": "Jun",
    "WeekOfMonth": "2",
    "DayOfWeek": "Tuesday",
    "Make": "Honda",
    "AccidentArea": "Rural",
    "DayOfWeekClaimed": "Tuesday",
    "MonthClaimed": "Jun",
    "WeekOfMonthClaimed": "2",
    "Sex": "Female",
    "MaritalStatus": "Married",
    "Age": 45,
    "Fault": "Third Party",
    "PolicyType": "Sedan - Liability",
    "VehicleCategory": "Sedan",
    "VehiclePrice": "20000 to 29000",
    "Deductible": 500,
    "DriverRating": 4,
    "Days_Policy_Accident": "none",
    "Days_Policy_Claim": "none",
    "PastNumberOfClaims": "none",
    "AgeOfVehicle": "2 years",
    "AgeOfPolicyHolder": "41 to 50",
    "PoliceReportFiled": "Yes",
    "WitnessPresent": "Yes",
    "AgentType": "Internal",
    "NumberOfSuppliments": "none",
    "NumberOfCars": "1 vehicle",
    "BasePolicy": "Liability"
}

In [7]:
response = investigate_claim(low_risk_claim)
print(response)

# Risk Assessment

Based on the provided claim data and the model prediction of a fraud probability of **0.000175**, the overall risk for this claim appears to be low. The fraud knowledge base rules indicate potential risk factors; however, none of the factors from the provided data raise significant alarms.

---

# Key Suspicious Factors

1. **Past Claims**: The claimant has no past claims, which usually would lower the fraud risk; however, there is no data indicating a history that aligns with relevant rules.
2. **Vehicle Price**: The vehicle falls into the moderate price category ($20,000 to $29,000), but it is not exceptionally high, which minimizes suspicion.
3. **Police Report**: A police report was filed, which is generally a good sign and reduces suspicion.
4. **Supplements**: There are no supplements filed, which again lowers the chance of exaggeration.
5. **Age of Insured**: The insured is 45, which is not considered a high-risk age group for fraud.

---

# Recommended Action

In [8]:
medium_risk_claim = {
    "Month": "Sep",
    "WeekOfMonth": "3",
    "DayOfWeek": "Saturday",
    "Make": "Ford",
    "AccidentArea": "Urban",
    "DayOfWeekClaimed": "Sunday",
    "MonthClaimed": "Sep",
    "WeekOfMonthClaimed": "3",
    "Sex": "Male",
    "MaritalStatus": "Married",
    "Age": 34,
    "Fault": "Policy Holder",
    "PolicyType": "Utility - All Perils",
    "VehicleCategory": "Utility",
    "VehiclePrice": "30000 to 39000",
    "Deductible": 400,
    "DriverRating": 2,
    "Days_Policy_Accident": "15 to 30",
    "Days_Policy_Claim": "15 to 30",
    "PastNumberOfClaims": "1",
    "AgeOfVehicle": "3 years",
    "AgeOfPolicyHolder": "31 to 40",
    "PoliceReportFiled": "Yes",
    "WitnessPresent": "No",
    "AgentType": "External",
    "NumberOfSuppliments": "1 to 2",
    "NumberOfCars": "2 vehicles",
    "BasePolicy": "All Perils"
}

In [9]:
response = investigate_claim(medium_risk_claim)
print(response)

# Risk Assessment
The model indicates a **Fraud Probability** of approximately **18.66%**. While this value is below the commonly accepted threshold for high suspicion (often around 20% or higher), several factors warrant a closer look due to their potential implications.

# Key Suspicious Factors
1. **Past Claims**: The claimant has **1 past claim** on record. Although this is not multiple, it does contribute slightly to increased risk.
   
2. **Vehicle Price with Police Report**: The claim involves a **vehicle price range of $30,000 to $39,000** along with a **police report filed**. While the police report does reduce the suspicion, the higher vehicle price still maintains some level of risk.

3. **Delay Between Accident and Claim**: The claim shows a delay of only **15 to 30 days** between the accident and the claim, which does not significantly elevate suspicion based on the established rules.

4. **Number of Supplements**: The claim has **1 to 2 supplements** added, which can sugg