## PSTAT 160A Fall 2021 Python Homework 2

**Due date:** Friday, October 15, 11:59 p.m. via GauchoSpace

**Instructions:** Please upload your PDF or HTML file on Gradescope with filename "PythonHW2_YOURPERMNUMBER".


## Problem 1 (10 pts total)
__Background__:
A stochastic model for a car insurance company's total cost of damages from traffic accidents goes back to the work of van der Laan and Louter, "A statistical model for the costs of passenger car traffic accidents", Journal of the Royal Statistical Society (1986).

For every $k = 1, 2, 3, \ldots$, let the random variable $X_k$ denote the US dollar amount of damage from the $k$-th traffic accident of a policyholder during the year 2020.

We assume that $X_1$, $X_2$, ... is an i.i.d. sequence of exponentially distributed random variables with an average claim size of \$1,500.

The (random) total number of accidents $N$ in 2020 is assumed to be Poisson distributed with a mean of 25 claims.

It is assumed that the number of accidents is independent of the US dollar amount of damages for each accident. That is, the random variable $N$ is independent of the random variables $X_1$, $X_2$,...

The total costs for the insurance company by the end of 2020 will thus be given by the __random sum__ $S_N$ defined as

$$S_N = X_1 + X_2 + \dots + X_N = \sum_{k = 1}^{N} X_k.$$

Note again that the total number $N$ of accidents is random.

The goal of the current exercise is to approximate the expected total costs $\mathbb{E}[S_N]$ for the insurance company in 2020 via simulations.

As usual, we start with loading some packages:

In [1]:
import numpy as np
import math

### Step 1: (5 Points)

Write a function called <tt>randomSum(...)</tt> which simulates the random variable $S_N$. 

Input:
* <tt>averageClaimSize</tt>: Average USD amount per claim
* <tt>averageNumberOfClaims</tt>: Average number of claims/accidents in 2020

Output:
* <tt>sampleRandomSum</tt>: A single scalar being one sample from the random variable $S_N$

<i>Hint:</i> Use built-in functions from the <i>NumPy</i> package to sample from a Poisson distribution and from an exponential distribution!

In [2]:
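def randomSum(averageClaimSize, averageNumberOfClaims):
    # Draw the random number of claims N ~ Poisson(averageNumberOfClaims)
    N = np.random.poisson(averageNumberOfClaims)
    # Draw N i.i.d. claim sizes; NumPy's exponential takes the mean (scale) as its first argument
    X = np.random.exponential(averageClaimSize, N)
    # Sum the claims (0.0 if N = 0)
    sampleRandomSum = np.sum(X)
    return sampleRandomSum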

In [3]:
## TEST YOUR FUNCTION HERE
randomSum(1500,25)

35966.969749132804

### Step 2: (3 Points)

Write a simulator function called <tt>simulator()</tt> which uses the function <tt>randomSum()</tt> from Step 1 to simulate $M \in \mathbb{N}$ samples from the random variable $S_N$. 

Input: 
* <tt>averageClaimSize</tt>: Average USD amount per claim
* <tt>averageNumberOfClaims</tt>: Average number of claims/accidents in 2020
* <tt>M</tt>: Number of Simulations

Output:
* <tt>samples</tt>: An array of length $M$ with samples from the random variable $S_N$.

In [4]:
def simulator(averageClaimSize, averageNumberOfClaims, M):
    S_array = []
    for i in range(M):
        S_array.append(randomSum(averageClaimSize, averageNumberOfClaims))
    S_array = np.array(S_array)
    
    return S_array

In [5]:
## TEST YOUR FUNCTION HERE
simulator(1500,25,10)

array([17215.75864706, 31415.85387878, 42190.93266363, 27714.02261768,
       40073.6879048 , 24397.74853553, 26998.45631304, 20521.41012746,
       18669.10352238, 62216.6654641 ])
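
The loop over <tt>randomSum()</tt> can become slow for large $M$. As an optional sketch, not part of the assignment, all $M$ samples can be drawn in one pass by sampling every claim count first and then summing the claims per simulated year. The function name <tt>simulatorVectorized</tt> and the <tt>seed</tt> parameter are hypothetical and only used for illustration.

```python
import numpy as np

def simulatorVectorized(averageClaimSize, averageNumberOfClaims, M, seed=None):
    # Hypothetical helper: vectorized alternative to the loop-based simulator
    rng = np.random.default_rng(seed)
    # Number of claims in each of the M simulated years
    counts = rng.poisson(averageNumberOfClaims, size=M)
    # All claim sizes for all years, drawn in a single call
    claims = rng.exponential(averageClaimSize, size=counts.sum())
    # Index of the simulated year that each claim belongs to
    year = np.repeat(np.arange(M), counts)
    # Sum the claims per year; years with zero claims get a total of 0
    return np.bincount(year, weights=claims, minlength=M)

samples = simulatorVectorized(1500, 25, 10000, seed=0)
print(samples.mean())
```

The grouping step relies on <tt>np.bincount</tt> with <tt>weights</tt>, which adds up the claim sizes that share the same year index.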

### Step 3: (2 Points)

As we have shown in class, it holds via __Wald's Identity__ that the expectation of the random sum $S_N$ is given by the formula

\begin{equation}
\mathbb{E}[S_N] = \mathbb{E}[N] \cdot \mathbb{E}[X_1] = 25 \cdot \$1,500 = \$37,500.
\end{equation}
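
For reference, a short sketch of why the identity holds: condition on the value of $N$ and use that $N$ is independent of the i.i.d. claim sizes $X_1, X_2, \ldots$

$$\mathbb{E}[S_N] = \sum_{n=0}^{\infty} \mathbb{E}\Big[\sum_{k=1}^{n} X_k \,\Big|\, N = n\Big] \, \mathbb{P}(N = n) = \sum_{n=0}^{\infty} n \, \mathbb{E}[X_1] \, \mathbb{P}(N = n) = \mathbb{E}[N] \cdot \mathbb{E}[X_1].$$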

Check via the empirical mean that

$$\frac{1}{M} \sum_{m=1}^M s^{(m)}_N \approx \mathbb{E}[S_N] = \$37,500$$

where $s^{(1)}_N, s^{(2)}_N, \ldots, s^{(M)}_N$ denote $M$ independent realizations (samples) from the random variable $S_N$. 

Use $M = 10, 100, 1000, 10000, 50000$ simulations.  

That is, write a function <tt>MCsimulation(...)</tt> which uses the function <tt>simulator(...)</tt> from Step 2 to compute the empirical mean. 


Input: 
* <tt>averageClaimSize</tt>: Average USD amount per claim
* <tt>averageNumberOfClaims</tt>: Average number of claims/accidents in 2020
* <tt>M</tt>: Number of Simulations

Output:
* <tt>empricialMean</tt>: A real number in $\mathbb{R}_+$.

In [6]:
def MCsimulation(averageClaimSize, averageNumberOfClaims, M):
    # Average M independent samples of S_N to estimate E[S_N]
    empricialMean = np.mean(simulator(averageClaimSize, averageNumberOfClaims, M))
    return empricialMean

In [7]:
## TEST YOUR FUNCTION HERE
MCsimulation(1500, 25, 1)

41528.352109796244

In [8]:
## Compute the absolute error
print(np.absolute(MCsimulation(1500, 25, 10)-37500))
print(np.absolute(MCsimulation(1500, 25, 100)-37500))
print(np.absolute(MCsimulation(1500, 25, 1000)-37500))
print(np.absolute(MCsimulation(1500, 25, 10000)-37500))
print(np.absolute(MCsimulation(1500, 25, 50000)-37500))

3891.6230694052356
769.9493631897712
41.61604138536495
66.12872649600467
81.55770470710559
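
The errors above need not decrease monotonically in $M$: each one is a single random draw whose typical size is the standard error $\sigma/\sqrt{M}$, where $\sigma$ is the standard deviation of $S_N$. As a rough sketch, $\sigma$ can be computed from the compound Poisson variance formula $\mathrm{Var}(S_N) = \mathbb{E}[N] \cdot \mathbb{E}[X_1^2]$, which is a standard result assumed here rather than derived in this notebook. The helper name <tt>standardError</tt> is hypothetical.

```python
import math

def standardError(averageClaimSize, averageNumberOfClaims, M):
    # Second moment of an exponential random variable with mean m is 2 * m**2
    secondMoment = 2 * averageClaimSize**2
    # Assumed standard result for a compound Poisson sum: Var(S_N) = E[N] * E[X^2]
    variance = averageNumberOfClaims * secondMoment
    # Standard deviation of the empirical mean of M independent samples
    return math.sqrt(variance / M)

for M in [10, 100, 1000, 10000, 50000]:
    print(M, standardError(1500, 25, M))
```

An absolute error within a few standard errors of zero is consistent with a correct simulator.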


## Problem 2 (5 Points)

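A health insurance policy pays for a medical expense subject to a USD 100 deductible, that is, the insurer pays only the part of the expense that exceeds USD 100 and nothing if the expense is below USD 100. Assume that the amount of the expense is __Gamma__ distributed with scale parameter 100 and shape parameter 2 (so the mean is $100 \cdot 2 = 200$ dollars). This can be simulated using <tt>np.random.gamma(shape, scale, n)</tt>.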

Compute the empirical _mean_ and empirical _standard deviation_ of the payout by the insurance company by using 100,000 samples.
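
As a reference for the simulation, and assuming the usual reading of a deductible (the insurer pays $Y = \max(X - 100, 0)$ for an expense $X$), the exact moments can be worked out from the Gamma density $f(x) = x e^{-x/100} / 100^2$ with the substitution $u = x/100$:

$$\mathbb{E}[Y] = 100 \int_1^\infty (u - 1) \, u \, e^{-u} \, du = \frac{300}{e} \approx 110.36, \qquad \mathbb{E}[Y^2] = 100^2 \int_1^\infty (u - 1)^2 \, u \, e^{-u} \, du = \frac{80000}{e} \approx 29430.4.$$

The standard deviation is therefore $\sqrt{80000/e - (300/e)^2} \approx 131.34$. The empirical values from 100,000 samples should land close to these numbers.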

In [28]:
# WRITE YOUR OWN CODE HERE! FEEL FREE TO INSERT MORE CELLS!
# ADD SOME COMMENTS TO YOUR CODE!
n = 100000  # number of simulated expenses
# Expense amounts: shape parameter 2, scale parameter 100 (mean 200)
x = np.random.gamma(2, 100, n)
# Payout after the USD 100 deductible; the insurer pays nothing if the expense is below 100
payout = np.maximum(x - 100, 0)
mean = np.mean(payout)
std = np.std(payout)
print("empirical mean:", mean)
print("empirical standard deviation:", std)


## Problem 3 (5 Points)

Every day since the beginning of fall quarter, Adam goes to Woodstock's Pizza, orders a slice of pizza, and picks a topping (pepper, mushrooms, pineapple, or onions) uniformly at random.

1. Implement a simulator which samples one topping uniformly at random:

In [44]:
# WRITE YOUR OWN CODE HERE! FEEL FREE TO INSERT MORE CELLS!
# ADD SOME COMMENTS TO YOUR CODE!
import random
def simu_topping():
    toppings = ['pepper','mushroom','pineapple','onion']
    topping = random.choice(toppings)
    return topping
simu_topping()

'onion'

2. On the day that Adam first picks pineapple, find the empirical _mean_ and empirical _standard deviation_ of the number of prior days on which he picked mushroom by running 100,000 simulations. [As you might realize, this is very similar to the question about rolling 5's before the first 6 appears that we did in class; now we verify the answer by simulation.]
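
For comparison with the simulation, here is a sketch of the exact answer. Let $K$ be the number of mushroom days before the first pineapple day. Days with pepper or onions do not affect $K$, and each remaining day is pineapple or mushroom with probability $1/2$ each, so $K$ is geometric on $\{0, 1, 2, \ldots\}$:

$$\mathbb{P}(K = k) = \left(\tfrac{1}{2}\right)^{k+1}, \qquad \mathbb{E}[K] = \frac{1 - 1/2}{1/2} = 1, \qquad \mathrm{Var}(K) = \frac{1 - 1/2}{(1/2)^2} = 2.$$

The empirical mean should therefore be close to $1$ and the empirical standard deviation close to $\sqrt{2} \approx 1.414$.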



In [73]:
# WRITE YOUR OWN CODE HERE! FEEL FREE TO INSERT MORE CELLS!
# ADD SOME COMMENTS TO YOUR CODE!
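n_sim = 100000  # number of simulated "first pineapple" experiments
counts = np.zeros(n_sim)
for i in range(n_sim):
    # Draw one topping per day until pineapple is picked for the first time
    topping = simu_topping()
    while topping != 'pineapple':
        if topping == 'mushroom':
            counts[i] += 1  # a mushroom day before the first pineapple
        topping = simu_topping()
print("empirical mean:", np.mean(counts))
print("empirical standard deviation:", np.std(counts))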
