## Target Variables

- Modeling rating plan
    - Frequency
    - Severity
    - Pure Premium<br>
    
- Identify deficiencies in the existing rating plan
    - Loss ratio<br>
    
- Evaluating UW restrictions
    - Prob of large loss

## Frequency/Severity vs PP

- Advantages of modeling freq/sev over PP
    - Provides more insight into variables driven by freq vs sev
        - Properly addresses situations where freq and sev counteract
    - Freq/Sev is more stable (i.e. less noise) than PP model
        - If a variable only impacts freq, it would look less significant in a PP model (sev noise masks it) and could be omitted (underfitting).
            - If the variable is included in the PP model anyway, it would be overfit since it picks up noise from sev.
    - Tweedie (used for PP models) assumes freq and sev move in the same direction, which may not hold in practice.
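
A minimal sketch of the counteracting case mentioned above. The relativities are hypothetical, chosen only so that the two effects cancel; a PP model would see almost no signal for this variable even though both component models would.

```python
# Hypothetical relativities for one rating variable (illustrative only).
freq_relativity = 1.25  # 25% more claims
sev_relativity = 0.80   # 20% smaller claims

# Pure premium = frequency x severity, so the relativities multiply.
pp_relativity = freq_relativity * sev_relativity
print(f"PP relativity: {pp_relativity:.3f}")
```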

## Combining Models and Perils

- Combining Freq and Sev model
    - Multiply the factors (relativities) from the freq and sev models for each record.
    - Or, equivalently when both models use a log link, add the coefficients together.<br><br>



- Combining perils
    - Add by-peril expected loss cost predictions together for each exposure.
    - Run a model using the all-peril expected loss cost as the target and the union of the by-peril model predictors as predictors.
    - Since this target variable is very stable, the model doesn't require a lot of data. Focus on data that reflects the expected business mix instead.
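
A minimal sketch of both combinations, assuming log-link models. All coefficient values, variable names, and peril names below are made up for illustration.

```python
import math

# Hypothetical log-link coefficients (not from any real model).
freq_coefs = {"intercept": -2.30, "young_driver": 0.25}
sev_coefs = {"intercept": 8.50, "young_driver": -0.10}

def predict(coefs, record):
    """Log-link prediction: exp(intercept + sum of coefficient * value)."""
    linear = coefs["intercept"] + sum(
        coefs[name] * value for name, value in record.items()
    )
    return math.exp(linear)

record = {"young_driver": 1}

# Option 1: multiply the frequency and severity predictions.
pp_by_product = predict(freq_coefs, record) * predict(sev_coefs, record)

# Option 2: add the coefficients, then predict once.
pp_coefs = {name: freq_coefs[name] + sev_coefs[name] for name in freq_coefs}
pp_by_sum = predict(pp_coefs, record)

# Combining perils: add the by-peril expected loss costs for the exposure.
by_peril_loss_cost = {"fire": 120.0, "wind": 80.0, "theft": 35.0}
all_peril_loss_cost = sum(by_peril_loss_cost.values())
```

The two freq/sev options agree because $e^{a} \cdot e^{b} = e^{a+b}$. The summed by-peril loss cost is what would serve as the target of the all-peril model.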

## Adjustments to Target variable

- Cap large losses
- Model non-CAT losses only
- Develop losses$^1$
    - For sev model, dev factor should reflect development on known claims only.
    - For PP or loss ratio model, the dev factor should include IBNR as well.
- For LR model, might need to on-level premium$^1$
- For multi-year data, losses and exposures may need to be trended$^1$<br><br>


$^1$ Could also include a temporal variable such as year in the model to adjust for these

## Detecting Non-Linearity in Continuous Predictor Variables

- If the GLM coefficient for log of age is -0.314, each unit increase in log of age results in a 0.314 decrease in the log of expected severity.
    - The GLM assumes this relationship is linear.
        - To detect non-linearity we use a <b>Partial Residual Plot</b>.

#### Partial Residual Plot

- Let $r_i$ be the partial residual of observation $i$ for predictor $x_j$.<br>

$$r_i = (y_i-\mu_i) g^\prime(\mu_i) + \beta_j x_{ij}$$<br>
$$ \text{For a log link, } g(\mu_i) = \ln \mu_i \text{, so } g^{\prime}(\mu_i) = \frac{1}{\mu_i}$$<br>
$$ \therefore r_i = \frac{y_i-\mu_i}{\mu_i} + \beta_j x_{ij} = \frac{\text{actual - predicted}}{\text{predicted}} + \text{contribution to predicted}$$<br>

- The partial residual isolates the effect of predictor $x_j$: the scaled residual term should be unbiased and distributed around 0, so what remains is the contribution $\beta_j x_{ij}$.

- Plot $r_i$ against $x_{ij}$ and draw the line $y = \beta_j x_j$.<br><br>

<center><img src='images/Partial_Resid.JPG'></center><br>

- Model over-predicts where log(age) is below 2.5 and under-predicts where log(age) is between 2.5 and 3.25.
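
A minimal sketch of the partial residual calculation for a log link. The observed values, fitted means, and log(age) values are hypothetical; only the coefficient -0.314 comes from the example above.

```python
def partial_residuals(y, mu, beta_j, x_j):
    """Partial residuals for predictor x_j under a log link, where g'(mu) = 1/mu."""
    return [
        (y_i - mu_i) / mu_i + beta_j * x_ij
        for y_i, mu_i, x_ij in zip(y, mu, x_j)
    ]

# Hypothetical data: observed severities, fitted means, and log(age) values.
y = [1200.0, 900.0, 1500.0]
mu = [1000.0, 1000.0, 1250.0]
log_age = [2.0, 2.5, 3.0]
beta = -0.314  # coefficient for log(age) quoted above

r = partial_residuals(y, mu, beta, log_age)
line = [beta * x for x in log_age]  # reference line y = beta_j * x_j
```

Plotting `r` against `log_age` together with `line` gives the partial residual plot. Points that sit systematically above or below the line over a range of `log_age` suggest non-linearity.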

## Ways to transform continuous predictor variables

- Binning
- Polynomial terms
- Piecewise linear functions
- Natural cubic splines

#### Binning

- Drawbacks
    - Adds a lot more parameters to the model.
    - Could end up with reversals in factors due to volatility in data.
        - Could combine buckets to avoid reversals.
        - Or, use manual smoothing.
    - Variation within intervals is ignored.
        - Could make buckets smaller
            - Results in less credible estimates.

#### Polynomial Terms

- Add $x^2, x^3, \ldots$ terms.
    - If x = log(age) then $x^2$ = log(age)$^2$.
    
- <b>Drawback</b>
    - Leads to loss of interpretability
        - Can't tell the relationship based on terms and coefficients alone.
        - Have to graph to understand.
    - Polynomial functions tend to behave erratically at the edges of the data.

#### Piecewise Linear Functions

- Can add a hinge function at the break-point.

- <b>Drawback</b>
    - Break-points need to be selected by the user.
    - Fitted curve changes abruptly around breakpoints.<br><br>

- Assume the pattern reversal happens at log(age) = 2.75; then we add the following variable:<br><br>

$$ \max(0,\ln(\text{age})-2.75) = 
\begin{cases}
0, & \text{if } \ln(\text{age}) \le 2.75 \\[2ex]
\ln(\text{age}) - 2.75, & \text{if } \ln(\text{age}) > 2.75
\end{cases}$$


<table><tr><td><img src='images/Hinge_1.JPG'></td><td><img src='images/Hinge_2.JPG'></td></tr></table><br><br>

$$ \text{Slope} = 
\begin{cases}
1.225, & \text{if } \ln(\text{age}) \le 2.75 \\[2ex]
-1.044, & \text{if } \ln(\text{age}) > 2.75
\end{cases}$$
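
A minimal sketch of how the hinge term produces the two slopes above. The hinge coefficient is not quoted in these notes; it is derived here as the difference between the two slopes, on the assumption that 1.225 is the coefficient on log(age) itself.

```python
BREAKPOINT = 2.75      # log(age) at which the pattern reverses
SLOPE_BELOW = 1.225    # assumed to be the coefficient on log(age)
SLOPE_ABOVE = -1.044   # slope to the right of the breakpoint

# Derived, not quoted: the hinge coefficient is the change in slope.
HINGE_COEF = SLOPE_ABOVE - SLOPE_BELOW  # about -2.269

def hinge(log_age, breakpoint=BREAKPOINT):
    """Hinge variable max(0, log(age) - breakpoint)."""
    return max(0.0, log_age - breakpoint)

def age_effect(log_age):
    """Contribution of log(age) and its hinge term to the linear predictor."""
    return SLOPE_BELOW * log_age + HINGE_COEF * hinge(log_age)
```

Because the hinge variable is 0 at the breakpoint, the fitted curve is continuous there, while the slope changes abruptly.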


#### Natural Cubic Splines

- Combines polynomial and piecewise functions.<br><br>

- <b>Advantages</b>
    - Curves are smooth around breakpoints. (1st and 2nd deriv. are continuous)
    - Fits at the edges are restricted to be linear.
    - Use of breakpoints makes it more suitable than regular polynomial functions for complex effect responses.<br><br>

- <b>Disadvantages</b>
    - Need to graph to understand modeled effect.
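
A minimal sketch of one common way to build a natural cubic spline basis, the truncated-power form found in many textbooks. This is an assumption about the form of the basis; GLM software typically constructs it for you and may use a different but equivalent parameterization. The knot locations are hypothetical.

```python
def natural_cubic_basis(x, knots):
    """Natural cubic spline basis at x (truncated-power form).

    Returns [x, N_1(x), ..., N_{K-2}(x)] for K knots; the intercept is omitted.
    """
    def pos_cubed(u):
        return max(0.0, u) ** 3

    last = knots[-1]

    def d(k):
        return (pos_cubed(x - knots[k]) - pos_cubed(x - last)) / (last - knots[k])

    d_penultimate = d(len(knots) - 2)
    return [x] + [d(k) - d_penultimate for k in range(len(knots) - 2)]

# Hypothetical knots on the log(age) scale.
knots = [2.0, 2.5, 3.0, 3.5]
basis_at_2_8 = natural_cubic_basis(2.8, knots)
```

Each basis column enters the GLM as an ordinary predictor. Outside the first and last knots every column is linear in `x`, which is what restricts the fit at the edges to be linear.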


## Interactions

- <b>Interaction effect</b>: When two or more variables have a combined effect on the target variable beyond their individual main effects.<br><br>
- <b>Main effect:</b> Effect of a variable by itself.

#### Interacting categorical variables

- Software adds a column for each combination of non-base levels of the two variables.<br><br>



<center><img src = 'images/Cat_x_Cat.JPG'></center><br>
Discount for occupancy class 2 with sprinklers compared to class 2 with no sprinklers is:<br><br>
$$1 - e^{-0.2895-0.2847} = 43.7\% \text{ discount}$$
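
A minimal sketch of the arithmetic. It assumes that -0.2895 is the sprinkler main effect and -0.2847 is the class 2 x sprinkler interaction coefficient; the variable names are illustrative.

```python
import math

# Coefficients quoted in the example above (their roles are assumed).
sprinkler_main = -0.2895      # sprinklered: yes (main effect)
class2_x_sprinkler = -0.2847  # occupancy class 2 x sprinklered (interaction)

# Within class 2, the class 2 main effect cancels, leaving these two terms.
relativity = math.exp(sprinkler_main + class2_x_sprinkler)
discount = 1 - relativity
print(f"relativity = {relativity:.4f}, discount = {discount:.1%}")
```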

#### Interacting categorical with continuous variable

- Software adds a column for the interaction of each <b>non-base level of the categorical variable x log(continuous variable)</b>.
    - For a (Yes/No) categorical variable, the new predictor is 0 where the categorical variable is 'No' and log(continuous variable) otherwise.
        - If the categorical variable has several levels, the log(continuous variable) values are split across one column per non-base level.
            - Causes credibility issues since the data is sliced more thinly.

<table><tr><td><img src='images/Cat_x_Cont_2.JPG'></td><td><img src='images/Cat_x_Cont_1.JPG'></td><td><img src='images/Cat_x_Cont.JPG'></td></tr></table><br><br>

- For better interpretability, it is best to divide the continuous variable by its base value before taking the log.
    - Base Classes
        - In Table 11, the base is AOI = 1, since log(1) = 0.
        - In Table 12, the base is AOI = 200k, since log(200k/200k) = 0.
    - Predicted values don't differ between the two.
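
A minimal sketch of why the predictions are unchanged. It assumes a log-link model with AOI and a Yes/No sprinkler indicator; all coefficient values are hypothetical.

```python
import math

BASE_AOI = 200_000

# Hypothetical coefficients for:
# log(mu) = b0 + b1*log(aoi) + b2*spr + b3*spr*log(aoi)
b0, b1, b2, b3 = -8.0, 0.60, 0.90, -0.10

def predict_raw(aoi, sprinklered):
    x = math.log(aoi)
    return math.exp(b0 + b1 * x + b2 * sprinklered + b3 * sprinklered * x)

# Using log(aoi / BASE_AOI) instead: the slopes are unchanged, while the
# intercept and the sprinkler main effect absorb the shift of log(BASE_AOI).
shift = math.log(BASE_AOI)
c0, c1, c2, c3 = b0 + b1 * shift, b1, b2 + b3 * shift, b3

def predict_rebased(aoi, sprinklered):
    x = math.log(aoi / BASE_AOI)
    return math.exp(c0 + c1 * x + c2 * sprinklered + c3 * sprinklered * x)

# exp(c2) is now the sprinkler relativity at AOI = 200k rather than at AOI = 1.
sprinkler_relativity_at_base = math.exp(c2)
```

Only the interpretation changes: after rebasing, the sprinkler main effect describes a risk with a realistic AOI instead of an AOI of 1.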