# Optimizing Insurance Prices

Insurance companies must optimize premiums to balance profitability against market growth. Premiums have to be high enough to cover claim costs, but not so high that they drive away too many customers.

Suppose we are an insurance company and can divide our customer base into $N$ clusters. We have $r$ candidate "termination rates" $\alpha_j$, each representing the percentage of policyholders in a cluster who will terminate their policy when charged a given premium. For each cluster $i$, we know the number of policies $P_i$ and the average claim cost $C_i$. For each cluster $i$ and each termination rate $\alpha_j$, we have a premium $\rho_{ij}$ that we can charge that cluster to achieve that termination rate. We also have an overall target termination rate $\alpha_t$ that we do not wish to exceed, in order to maintain market growth.

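Let the binary variable $x_{ij}$ equal 1 if cluster $i$ is assigned termination rate $\alpha_j$ (and is therefore charged premium $\rho_{ij}$), and 0 otherwise. Our objective function represents a maximization of expected revenue, net of claim costs:

$$\max_{x} \sum_{i=1}^{N} \left(1 - \sum_{j=1}^{r} \alpha_j x_{ij}\right) P_i \left(\sum_{j=1}^{r} \rho_{ij} x_{ij} - C_i\right)$$

$$\text{s.t.}$$
$$\sum_{j=1}^{r} x_{ij} = 1 \quad \forall i$$
$$\sum_{i=1}^{N} P_i \left(1 - \sum_{j=1}^{r} \alpha_j x_{ij}\right) \ge (1 - \alpha_t) \sum_{i=1}^{N} P_i$$
$$x_{ij} \in \{0, 1\} \quad \forall i, j$$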

The objective function can be stated as: select premiums such that we maximize the sum, over all clusters, of (retention rate) $\times$ (number of policies) $\times$ (premium charged minus average claim cost). The first constraint ensures that we pick exactly one termination rate, and therefore one premium, for each cluster. The second ensures that the overall termination rate, weighted by the number of policies in each cluster, does not exceed the target.
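To make the objective concrete, here is a minimal sketch that evaluates it for a given assignment. It uses clusters 0 and 1 from the sample data shown later in this notebook. The function name `expected_revenue` and the `choice` list (one column index per cluster) are assumptions made for illustration and are not part of `insurance_helper`.

```python
# Sample data: clusters 0 and 1 from the tables shown later in this notebook.
alpha = [5, 7, 9, 11, 13, 15, 17, 25, 30]           # termination rates (%)
rho = [
    [422, 431, 493, 566, 688, 700, 919, 926, 932],  # premiums for cluster 0
    [350, 356, 379, 463, 496, 620, 688, 724, 940],  # premiums for cluster 1
]
P = [17127, 20071]  # number of policies per cluster
C = [258, 477]      # average claim cost per cluster


def expected_revenue(choice, alpha, rho, P, C):
    """Objective value when cluster i is assigned rate alpha[choice[i]].

    Hypothetical helper: `choice[i]` is the column index j chosen for cluster i.
    """
    total = 0.0
    for i, j in enumerate(choice):
        retention = 1 - alpha[j] / 100  # fraction of policies retained
        total += retention * P[i] * (rho[i][j] - C[i])
    return total


# Arbitrary assignment for illustration: cluster 0 at 9%, cluster 1 at 17%.
value = expected_revenue([2, 6], alpha, rho, P, C)
print(f"${value:,.2f}")
```

Representing the assignment as one index per cluster satisfies the one-rate-per-cluster constraint by construction. The binary variables $x_{ij}$ are only needed once the problem is handed to a QUBO solver.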

## Why this problem is hard

If there were no constraint that we meet a target termination rate, this would be an easy problem to solve. We could simply pick the most profitable termination rate within each cluster independently, which requires evaluating only the $r$ termination rates for each of the $N$ clusters, or $N \cdot r$ options in total.
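The unconstrained case can be sketched in a few lines. This is an illustration only, using clusters 0 and 1 from the sample data shown later in this notebook. `best_rate_per_cluster` is a hypothetical name, not a function from `insurance_helper`.

```python
# Sample data: clusters 0 and 1 from the tables shown later in this notebook.
alpha = [5, 7, 9, 11, 13, 15, 17, 25, 30]           # termination rates (%)
rho = [
    [422, 431, 493, 566, 688, 700, 919, 926, 932],  # premiums for cluster 0
    [350, 356, 379, 463, 496, 620, 688, 724, 940],  # premiums for cluster 1
]
P = [17127, 20071]  # number of policies per cluster
C = [258, 477]      # average claim cost per cluster


def best_rate_per_cluster(alpha, rho, P, C):
    """For each cluster, return the index j that maximizes
    (1 - alpha_j) * P_i * (rho_ij - C_i), ignoring the overall target."""
    picks = []
    for i in range(len(P)):
        profits = [
            (1 - a / 100) * P[i] * (rho[i][j] - C[i])
            for j, a in enumerate(alpha)
        ]
        picks.append(max(range(len(alpha)), key=profits.__getitem__))
    return picks


picks = best_rate_per_cluster(alpha, rho, P, C)
print([alpha[j] for j in picks])  # chosen termination rate (%) per cluster
```

For these two clusters the independent picks are the 17% and 30% termination rates, both well above an 11% overall target, which is why the coupling constraint matters.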

However, this problem requires coordination across clusters to stay within an overall target termination rate, so the clusters can no longer be optimized independently. Now there are $r^N$ different combinations to consider. For the example below ($r = 9$, $N = 15$), that is already $9^{15} = 205{,}891{,}132{,}094{,}649$ combinations, far too many to enumerate one by one on a classical computer, and the count grows exponentially with the number of clusters.
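As a rough illustration of the combinatorial growth, the following sketch enumerates every combination for just three clusters (clusters 0 to 2 of the sample data shown later in this notebook). `brute_force` is a hypothetical helper written for this example. It is exhaustive search, not the annealing approach used in the rest of the notebook.

```python
from itertools import product

# Sample data: clusters 0 to 2 from the tables shown later in this notebook.
alpha = [5, 7, 9, 11, 13, 15, 17, 25, 30]           # termination rates (%)
rho = [
    [422, 431, 493, 566, 688, 700, 919, 926, 932],
    [350, 356, 379, 463, 496, 620, 688, 724, 940],
    [317, 339, 631, 655, 711, 848, 856, 908, 999],
]
P = [17127, 20071, 27361]  # number of policies per cluster
C = [258, 477, 245]        # average claim cost per cluster
target_alpha = 11          # overall termination rate target (%)


def brute_force(alpha, rho, P, C, target_alpha):
    """Check all r**N assignments; return (best choice, best value, number checked)."""
    n, r = len(P), len(alpha)
    total_policies = sum(P)
    best_choice, best_value, checked = None, float("-inf"), 0
    for choice in product(range(r), repeat=n):
        checked += 1
        # Overall termination constraint, kept in integer percent arithmetic.
        terminated = sum(P[i] * alpha[j] for i, j in enumerate(choice))
        if terminated > target_alpha * total_policies:
            continue
        value = sum(
            (1 - alpha[j] / 100) * P[i] * (rho[i][j] - C[i])
            for i, j in enumerate(choice)
        )
        if value > best_value:
            best_choice, best_value = choice, value
    return best_choice, best_value, checked


choice, value, checked = brute_force(alpha, rho, P, C, target_alpha)
print(checked)           # 9**3 combinations for 3 clusters
print(len(alpha) ** 15)  # number of combinations for 15 clusters
```

Three clusters are trivial to enumerate, but every additional cluster multiplies the work by $r$.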


## Formulating and solving the problem

Let's generate some sample data for the problem.

In [1]:
import qubovert as qv
import sympy
import pandas as pd
from copy import deepcopy
from insurance_helper import *  # Helper functions (not publicly available)

In [2]:
N = 15  # Number of clusters in our customer base
alpha = [5, 7, 9, 11, 13, 15, 17, 25, 30]  # Candidate termination rates (%)
clusters = list(range(N))  # List of clusters

# Generate premiums for each cluster/termination rate (rho), the number of
# policies in each cluster (P), and the average claim cost for each cluster (C)
rho, P, C = generate_random_data(N, alpha)

target_alpha = 11  # Overall termination rate desired (%)

In [3]:
x = create_binary_variables(N, alpha)

In [4]:
show_data(rho, P, C, alpha)

Premiums that we can charge each cluster to achieve a specific termination rate. Rows are clusters (numbered from 0), columns are termination rates (%), and entries are the premium in dollars for that cluster and termination rate:


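| Cluster | 5 | 7 | 9 | 11 | 13 | 15 | 17 | 25 | 30 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 422 | 431 | 493 | 566 | 688 | 700 | 919 | 926 | 932 |
| 1 | 350 | 356 | 379 | 463 | 496 | 620 | 688 | 724 | 940 |
| 2 | 317 | 339 | 631 | 655 | 711 | 848 | 856 | 908 | 999 |
| 3 | 327 | 452 | 469 | 685 | 723 | 755 | 801 | 876 | 966 |
| 4 | 482 | 490 | 530 | 694 | 836 | 933 | 973 | 989 | 992 |
| 5 | 319 | 339 | 404 | 432 | 518 | 572 | 714 | 797 | 973 |
| 6 | 404 | 548 | 585 | 764 | 841 | 879 | 913 | 965 | 971 |
| 7 | 609 | 685 | 698 | 705 | 764 | 799 | 821 | 896 | 928 |
| 8 | 344 | 372 | 419 | 504 | 509 | 559 | 695 | 884 | 924 |
| 9 | 330 | 388 | 448 | 452 | 472 | 575 | 710 | 876 | 993 |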


Data for each cluster (number of policies and average claim cost in dollars):


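| Cluster | # of Policies | Avg Claim Cost ($) |
|---|---|---|
| 0 | 17127 | 258 |
| 1 | 20071 | 477 |
| 2 | 27361 | 245 |
| 3 | 18066 | 483 |
| 4 | 13392 | 363 |
| 5 | 20259 | 190 |
| 6 | 16602 | 199 |
| 7 | 37747 | 487 |
| 8 | 30295 | 156 |
| 9 | 27370 | 273 |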


In [5]:
# Create the objective function, as detailed in the introduction
obj = create_objective_function(N, alpha, rho, P, C, x)

In [6]:
M = max(obj.values())  # Penalty factor
# Placeholder penalty weight for the single termination rate constraint of each cluster
lam_one = sympy.Symbol("lam_one")
# Placeholder penalty weight for the overall termination rate constraint
lam_two = sympy.Symbol("lam_two")

for i in range(N):
    const = create_one_termination_rate_constraint(i, x)
    obj.add_constraint_eq_zero(const, lam=lam_one)  # Add single termination rate constraint

const = create_overall_termniation_rate_constraint(N, alpha, target_alpha, P, x)
obj.add_constraint_ge_zero(const, lam=lam_two)  # Add overall termination rate constraint
print("Constraints added")
print("Constraints added")

Constraints added
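One way to read the two `add_constraint_*` calls is as penalty terms added to the objective. The following is a sketch under assumptions, not a statement of what `insurance_helper` or qubovert does internally. It assumes the helper builds `obj` as the negated revenue $-f(x)$ (since an annealer minimizes), that each equality constraint expression is $\sum_j x_{ij} - 1$, and that the inequality expression $g(x)$ is enforced through a non-negative integer slack $s$ encoded with extra binary variables:

$$\min_{x,\, s} \; -f(x) + \lambda_1 \sum_{i=1}^{N} \left(\sum_{j=1}^{r} x_{ij} - 1\right)^2 + \lambda_2 \left(g(x) - s\right)^2$$

$$g(x) = \sum_{i=1}^{N} P_i \left(1 - \sum_{j=1}^{r} \alpha_j x_{ij}\right) - (1 - \alpha_t) \sum_{i=1}^{N} P_i, \qquad s \ge 0$$

Under this reading, both penalty terms vanish exactly when the constraints of the original problem hold. Any violation adds a positive cost scaled by $\lambda_1$ or $\lambda_2$, which is why these weights must be large relative to the objective coefficients.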


In [7]:
M_one = M**2
M_two = M

# Substitute large number M to enforce constraints
obj = obj.subs({lam_one: M_one})
obj = obj.subs({lam_two: M_two})

print(f"Number of variables in QUBO: {len(obj.variables)}")

Number of variables in QUBO: 151


In [8]:
from qubovert.sim import anneal_qubo

%time res = anneal_qubo(obj, num_anneals=1000)
solution = res.best.state

Wall time: 13.5 s


In [9]:
if obj.is_solution_valid(solution):
    print("Solution is valid.")
else:
    print("Invalid solution found. Try adjusting penalty factor.")

Solution is valid.
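The same feasibility conditions can also be checked by hand, independently of `obj.is_solution_valid`. The sketch below assumes the solution has already been arranged as a 0/1 matrix with one row per cluster and one column per termination rate. `is_feasible` and `one_hot` are hypothetical helpers, and the data are clusters 0 to 2 from the tables above.

```python
alpha = [5, 7, 9, 11, 13, 15, 17, 25, 30]  # termination rates (%)
P = [17127, 20071, 27361]                  # policies in clusters 0 to 2
target_alpha = 11                          # overall termination rate target (%)


def one_hot(j, r):
    """Row of length r with a single 1 in column j."""
    return [1 if k == j else 0 for k in range(r)]


def is_feasible(selection, alpha, P, target_alpha):
    """Check both constraints for a 0/1 selection matrix (rows are clusters)."""
    # Constraint 1: exactly one termination rate per cluster.
    if any(sum(row) != 1 for row in selection):
        return False
    # Constraint 2: policy-weighted termination rate within the target,
    # compared in integer percent arithmetic to avoid rounding issues.
    terminated = sum(
        P[i] * alpha[j]
        for i, row in enumerate(selection)
        for j, bit in enumerate(row)
        if bit
    )
    return terminated <= target_alpha * sum(P)


r = len(alpha)
low_rates = [one_hot(0, r), one_hot(1, r), one_hot(2, r)]   # 5%, 7%, 9%
high_rates = [one_hot(2, r), one_hot(6, r), one_hot(5, r)]  # 9%, 17%, 15%
print(is_feasible(low_rates, alpha, P, target_alpha))
print(is_feasible(high_rates, alpha, P, target_alpha))
```

A check like this is useful when tuning the penalty factors: if the annealer returns an infeasible state, it shows which of the two constraints was violated.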


In [10]:
# Use a separate name so the obj_value helper is not overwritten by its result
expected_revenue = obj_value(N, alpha, rho, P, C, x, solution)
print(f"Expected Revenue ${expected_revenue:,.2f}")

Expected Revenue $100,178,893.21


In [11]:
display_solution(rho, alpha, solution, P)

Overall termination rate: 10.98%
Here's the solution for which premiums to charge each cluster:


Unnamed: 0,5,7,9,11,13,15,17,25,30
0,0,0,1,0,0,0,0,0,0
1,0,0,0,0,0,0,1,0,0
2,0,0,0,0,0,1,0,0,0
3,0,0,0,0,1,0,0,0,0
4,0,0,1,0,0,0,0,0,0
5,0,0,0,1,0,0,0,0,0
6,0,0,0,0,1,0,0,0,0
7,0,1,0,0,0,0,0,0,0
8,0,0,0,0,0,0,1,0,0
9,1,0,0,0,0,0,0,0,0
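Each row of the solution table contains a single 1 marking the chosen termination rate, and the premium to charge is the entry at the same position in the premium table. A minimal sketch of that lookup, using the first three rows of both tables (`decode` is a hypothetical helper, not part of `insurance_helper`):

```python
alpha = [5, 7, 9, 11, 13, 15, 17, 25, 30]  # termination rates (%)

# First three rows of the premium table (clusters 0 to 2).
rho = [
    [422, 431, 493, 566, 688, 700, 919, 926, 932],
    [350, 356, 379, 463, 496, 620, 688, 724, 940],
    [317, 339, 631, 655, 711, 848, 856, 908, 999],
]

# First three rows of the solution table (clusters 0 to 2).
selection = [
    [0, 0, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0, 0],
]


def decode(selection, alpha, rho):
    """Map each one-hot row to (cluster, termination rate %, premium $)."""
    decoded = []
    for i, row in enumerate(selection):
        j = row.index(1)  # column of the selected termination rate
        decoded.append((i, alpha[j], rho[i][j]))
    return decoded


for cluster, rate, premium in decode(selection, alpha, rho):
    print(f"Cluster {cluster}: charge ${premium} (termination rate {rate}%)")
```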


Reference: [A Mathematical Programming Approach to Optimise Insurance Premium Pricing within a Data Mining Framework](https://www.jstor.org/stable/pdf/822805.pdf?refreqid=excelsior%3A839bdbe2d4f70b8342350e03112dc8ef&ab_segments=&origin=&acceptTC=1)