
# Health Insurance Premium Pricing – Toy Dataset

https://link.springer.com/article/10.1057/palgrave.jors.2601413

This toy dataset exercises the following capabilities of our package:

- discrete **premium options per segment** (binary decisions),
- **derived portfolio metrics** (expected profit, loss ratio, coverage),
- **fairness / access constraints** for a vulnerable group (no permanent income),
- regulatory-style constraints on loss ratios and affordability.

The structure follows the general approach of combining risk segmentation, demand/retention models, and mathematical programming to determine
premium levels under multiple objectives (e.g., profitability, market share, consumer protection).


In [None]:

import pandas as pd

# -------------------------------------------------
# 1. Customer segments (risk + income characteristics)
# -------------------------------------------------

segments = pd.DataFrame({
    "Segment": [
        "Informal_YoungLowRisk",
        "Formal_MidAgeMediumRisk",
        "Formal_SeniorHighRisk",
    ],
    # Broad type – used to mark the "no permanent income" group
    "Segment_Type": [
        "Informal",  # gig / informal workers
        "Formal",
        "Formal",
    ],
    # Risk classification (non-actuarial, just for intuition)
    "Risk_Band": [
        "Low",
        "Medium",
        "High",
    ],
    # 0 = no permanent income; 1 = permanent income
    "Has_Permanent_Income": [
        0,
        1,
        1,
    ],
    # Current (baseline) annual premium per policyholder in this segment
    "Baseline_Premium": [
        5000.0,
        11000.0,
        22000.0,
    ],
    # Estimated mean annual claim cost per policyholder (risk cost)
    "Mean_Annual_Claim": [
        4000.0,
        9000.0,
        18000.0,
    ],
    # Number of policyholders in this segment under the baseline portfolio
    "Baseline_Lives": [
        800,
        1200,
        600,
    ],
    # Proxy for annual income (for affordability / fairness constraints)
    "Annual_Income_Proxy": [
        180000.0,  # informal young workers
        600000.0,  # mid-age formal employees
        500000.0,  # senior formal group
    ],
    # Minimum fraction of the segment that should remain insured
    # (access / fairness requirement)
    "Min_Coverage_Prop": [
        0.50,   # at least 50% of informal, low-risk should stay covered
        0.70,
        0.70,
    ],
})

# -------------------------------------------------
# 2. Premium options per segment (discrete choices)
# -------------------------------------------------

price_options = pd.DataFrame({
    "Segment": [
        # Informal_YoungLowRisk
        "Informal_YoungLowRisk", "Informal_YoungLowRisk", "Informal_YoungLowRisk",
        # Formal_MidAgeMediumRisk
        "Formal_MidAgeMediumRisk", "Formal_MidAgeMediumRisk", "Formal_MidAgeMediumRisk",
        # Formal_SeniorHighRisk
        "Formal_SeniorHighRisk", "Formal_SeniorHighRisk", "Formal_SeniorHighRisk",
    ],
    "Option_ID": [
        "S1_disc10", "S1_base", "S1_plus10",
        "S2_disc5", "S2_base", "S2_plus10",
        "S3_disc5", "S3_base", "S3_plus15",
    ],
    # Multiplier applied to Baseline_Premium to get the offered premium
    "Premium_Multiplier": [
        0.90, 1.00, 1.10,   # S1
        0.95, 1.00, 1.10,   # S2
        0.95, 1.00, 1.15,   # S3
    ],
    # Expected retention rate at this premium (fraction of Baseline_Lives)
    "Retention_Rate": [
        0.95, 0.85, 0.70,   # Informal young, more price-sensitive
        0.94, 0.90, 0.80,   # Mid-age formal
        0.93, 0.88, 0.75,   # Senior high-risk
    ],
    # Flag for options that are explicitly cross-subsidising / 'social'
    "Is_Subsidised": [
        1, 0, 0,   # discounted informal product is considered subsidised
        0, 0, 0,
        0, 0, 0,
    ],
})

# -------------------------------------------------
# 3. Global parameters (targets / regulatory constraints)
# -------------------------------------------------

global_params = {
    # Minimum acceptable expected annual profit across all segments
    "Min_Expected_Profit": 3_000_000.0,
    # Target market share as a fraction of the baseline total lives
    "Target_Market_Share": 0.85,
    # Affordability constraint for informal (no permanent income) group:
    # annual premium / annual income <= this ratio (as a goal or hard cap)
    "Max_Premium_Income_Ratio_Informal": 0.25,
    # Regulatory-style cap on overall loss ratio:
    # (total expected claim cost / total premium) <= this value
    "Regulatory_Max_Loss_Ratio": 0.80,
}

segments, price_options, global_params



## Data Structures and Column Descriptions

### `segments` – customer segments

Columns:

- **`Segment`**  
  Label for the risk / socio-economic segment. Three segments are defined: an informal, young, low-risk group and two formal-income groups with medium and high risk.

- **`Segment_Type`**  
  High-level classification (`"Informal"` vs `"Formal"`) used to identify the consumer sub-group without permanent income.

- **`Risk_Band`**  
  Qualitative risk level (Low / Medium / High), a non-actuarial label used for interpretation and possible segment-specific constraints.

- **`Has_Permanent_Income`**  
  Binary flag (0/1) indicating presence of a stable income source. Segments with 0 form the vulnerable population where affordability constraints are most relevant.

- **`Baseline_Premium`**  
  Current annual premium per policyholder in the segment under the baseline portfolio.

- **`Mean_Annual_Claim`**  
  Estimated mean annual claim cost per policyholder (risk cost), used to compute expected claims at different portfolio sizes.

- **`Baseline_Lives`**  
  Number of policyholders in the segment in the baseline portfolio. Used to derive current scale and to compute market share and coverage levels.

- **`Annual_Income_Proxy`**  
  Proxy for yearly income for the segment. Supports affordability metrics such as premium-to-income ratios.

- **`Min_Coverage_Prop`**  
  Minimum fraction of baseline lives that should remain insured in the optimised portfolio.
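
The columns above are enough to derive a few baseline indicators per segment. The following is a minimal sketch, assuming the same values as in the `segments` table; the derived column names (`Baseline_Loss_Ratio`, `Premium_Income_Ratio`, `Min_Lives_Required`) are illustrative and not part of the dataset.

```python
import pandas as pd

# Compact copy of the relevant `segments` columns defined earlier.
segments = pd.DataFrame({
    "Segment": [
        "Informal_YoungLowRisk",
        "Formal_MidAgeMediumRisk",
        "Formal_SeniorHighRisk",
    ],
    "Baseline_Premium": [5000.0, 11000.0, 22000.0],
    "Mean_Annual_Claim": [4000.0, 9000.0, 18000.0],
    "Baseline_Lives": [800, 1200, 600],
    "Annual_Income_Proxy": [180000.0, 600000.0, 500000.0],
    "Min_Coverage_Prop": [0.50, 0.70, 0.70],
})

baseline = segments.set_index("Segment")

# Per-policyholder loss ratio at the baseline premium (illustrative name).
baseline["Baseline_Loss_Ratio"] = (
    baseline["Mean_Annual_Claim"] / baseline["Baseline_Premium"]
)
# Affordability indicator: annual premium as a share of annual income.
baseline["Premium_Income_Ratio"] = (
    baseline["Baseline_Premium"] / baseline["Annual_Income_Proxy"]
)
# Number of policyholders implied by the minimum coverage requirement.
baseline["Min_Lives_Required"] = (
    baseline["Min_Coverage_Prop"] * baseline["Baseline_Lives"]
)

print(
    baseline[
        ["Baseline_Loss_Ratio", "Premium_Income_Ratio", "Min_Lives_Required"]
    ].round(3)
)
```

With these numbers, each segment's baseline loss ratio is 0.80 or higher, which suggests that a portfolio-level cap of 0.80 cannot be met by discounts alone.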

---

### `price_options` – candidate premiums and retention behavior

Columns:

- **`Segment`**  
  Segment identifier, matching `segments["Segment"]`.

- **`Option_ID`**  
  Identifier for the premium option. Each option is intended to map one-to-one to a binary decision variable, with exactly one option selected per segment.

- **`Premium_Multiplier`**  
  Factor applied to the segment's `Baseline_Premium` to obtain the offered premium, i.e. offered premium = `Baseline_Premium` × `Premium_Multiplier`.

- **`Retention_Rate`**  
  Expected fraction of `Baseline_Lives` that remains insured under this premium option.

- **`Is_Subsidised`**  
  Indicator for options that are explicitly cross-subsidising or socially targeted (e.g., a discounted informal product). Can be used to bound total subsidised volume or link subsidy usage to profit goals.
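
The derived metrics per option (offered premium, retained lives, expected profit) follow from joining `price_options` to `segments`. The sketch below assumes that the mean claim per policyholder does not change with retention (no adverse selection), which is a simplification; the derived column names are illustrative.

```python
import pandas as pd

# Compact copies of the relevant columns from the tables defined earlier.
segments = pd.DataFrame({
    "Segment": [
        "Informal_YoungLowRisk",
        "Formal_MidAgeMediumRisk",
        "Formal_SeniorHighRisk",
    ],
    "Baseline_Premium": [5000.0, 11000.0, 22000.0],
    "Mean_Annual_Claim": [4000.0, 9000.0, 18000.0],
    "Baseline_Lives": [800, 1200, 600],
})

price_options = pd.DataFrame({
    "Segment": (
        ["Informal_YoungLowRisk"] * 3
        + ["Formal_MidAgeMediumRisk"] * 3
        + ["Formal_SeniorHighRisk"] * 3
    ),
    "Option_ID": [
        "S1_disc10", "S1_base", "S1_plus10",
        "S2_disc5", "S2_base", "S2_plus10",
        "S3_disc5", "S3_base", "S3_plus15",
    ],
    "Premium_Multiplier": [0.90, 1.00, 1.10, 0.95, 1.00, 1.10, 0.95, 1.00, 1.15],
    "Retention_Rate": [0.95, 0.85, 0.70, 0.94, 0.90, 0.80, 0.93, 0.88, 0.75],
})

# Attach segment attributes to every option (each option has one segment).
options = price_options.merge(
    segments, on="Segment", how="left", validate="many_to_one"
)

options["Offered_Premium"] = options["Baseline_Premium"] * options["Premium_Multiplier"]
options["Retained_Lives"] = options["Baseline_Lives"] * options["Retention_Rate"]
options["Expected_Revenue"] = options["Retained_Lives"] * options["Offered_Premium"]
# Assumption: claim cost per retained policyholder stays at Mean_Annual_Claim.
options["Expected_Claims"] = options["Retained_Lives"] * options["Mean_Annual_Claim"]
options["Expected_Profit"] = options["Expected_Revenue"] - options["Expected_Claims"]

print(
    options[
        ["Option_ID", "Offered_Premium", "Retained_Lives", "Expected_Profit"]
    ].round(1)
)
```

Because exactly one option is selected per segment, portfolio totals are sums of these per-option quantities over the chosen rows.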

---

### `global_params` – portfolio-level targets and constraints

Keys:

- **`Min_Expected_Profit`**  
  Minimum acceptable total expected annual profit (premium revenue minus expected claim cost) across all segments. Implemented either as a hard constraint or as a GLP goal.

- **`Target_Market_Share`**  
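  Target fraction of total baseline lives that should remain insured in aggregate.

- **`Max_Premium_Income_Ratio_Informal`**  
  Affordability cap for segments without permanent income: annual premium divided by annual income should not exceed this ratio, applied either as a goal or as a hard cap.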

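- **`Regulatory_Max_Loss_Ratio`**  
  Maximum allowed portfolio loss ratio, defined as total expected claim cost divided by total premium:

  $$\frac{\text{total expected claim cost}}{\text{total premium}} \le \text{LR}_{\max}$$

  where $\text{LR}_{\max}$ is the value of this parameter (0.80 here).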

Together, these structures support multi-goal premium optimisation: choosing one premium option per segment, satisfying coverage and regulatory constraints, and balancing profit, market share, and affordability for a vulnerable consumer sub-group.
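
With three options for each of three segments there are only 27 possible selections, so the problem can be checked by brute force. The sketch below is not the package's solver. It is a plain enumeration that assumes coverage, affordability, loss ratio and minimum profit are hard constraints, while `Target_Market_Share` is only reported as a goal. It also assumes claims per retained policyholder stay at `Mean_Annual_Claim`.

```python
from itertools import product

import pandas as pd

seg = pd.DataFrame({
    "Segment": ["Informal_YoungLowRisk", "Formal_MidAgeMediumRisk", "Formal_SeniorHighRisk"],
    "Has_Permanent_Income": [0, 1, 1],
    "Baseline_Premium": [5000.0, 11000.0, 22000.0],
    "Mean_Annual_Claim": [4000.0, 9000.0, 18000.0],
    "Baseline_Lives": [800, 1200, 600],
    "Annual_Income_Proxy": [180000.0, 600000.0, 500000.0],
    "Min_Coverage_Prop": [0.50, 0.70, 0.70],
}).set_index("Segment")

opts = pd.DataFrame({
    "Segment": [s for s in seg.index for _ in range(3)],
    "Option_ID": ["S1_disc10", "S1_base", "S1_plus10",
                  "S2_disc5", "S2_base", "S2_plus10",
                  "S3_disc5", "S3_base", "S3_plus15"],
    "Premium_Multiplier": [0.90, 1.00, 1.10, 0.95, 1.00, 1.10, 0.95, 1.00, 1.15],
    "Retention_Rate": [0.95, 0.85, 0.70, 0.94, 0.90, 0.80, 0.93, 0.88, 0.75],
})

params = {
    "Min_Expected_Profit": 3_000_000.0,
    "Target_Market_Share": 0.85,
    "Max_Premium_Income_Ratio_Informal": 0.25,
    "Regulatory_Max_Loss_Ratio": 0.80,
}

# Per-option quantities.
opts = opts.join(seg, on="Segment")
opts["Offered_Premium"] = opts["Baseline_Premium"] * opts["Premium_Multiplier"]
opts["Lives"] = opts["Baseline_Lives"] * opts["Retention_Rate"]
opts["Revenue"] = opts["Lives"] * opts["Offered_Premium"]
opts["Claims"] = opts["Lives"] * opts["Mean_Annual_Claim"]

# Per-option hard checks: segment coverage and affordability (informal only).
opts["Coverage_OK"] = opts["Retention_Rate"] >= opts["Min_Coverage_Prop"]
income_ratio = opts["Offered_Premium"] / opts["Annual_Income_Proxy"]
opts["Affordable"] = (opts["Has_Permanent_Income"] == 1) | (
    income_ratio <= params["Max_Premium_Income_Ratio_Informal"]
)

total_baseline_lives = seg["Baseline_Lives"].sum()
groups = [g.to_dict("records") for _, g in opts.groupby("Segment", sort=False)]

rows = []
for combo in product(*groups):  # exactly one option per segment
    revenue = sum(o["Revenue"] for o in combo)
    claims = sum(o["Claims"] for o in combo)
    profit = revenue - claims
    loss_ratio = claims / revenue
    feasible = (
        all(o["Coverage_OK"] and o["Affordable"] for o in combo)
        and loss_ratio <= params["Regulatory_Max_Loss_Ratio"]
        and profit >= params["Min_Expected_Profit"]
    )
    rows.append({
        "Options": tuple(o["Option_ID"] for o in combo),
        "Profit": profit,
        "Loss_Ratio": loss_ratio,
        "Market_Share": sum(o["Lives"] for o in combo) / total_baseline_lives,
        "Feasible": bool(feasible),
    })

results = pd.DataFrame(rows)
feasible = results[results["Feasible"]].sort_values("Profit", ascending=False)
best = feasible.iloc[0]
print(best)
print("Market share shortfall:", params["Target_Market_Share"] - best["Market_Share"])
```

Under these assumptions, the profit-maximising feasible selection takes the highest premium in every segment and leaves market share at roughly 0.76, below the 0.85 target. This illustrates why market share is better treated as a goal to trade off against profit than as a by-product.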


In [None]:

print("Segments:")
display(segments)
print("\nPrice options:")
display(price_options)
print("\nGlobal parameters:")
display(global_params)
