# Insurance Risk Assessment with the Monte Carlo Method Using Apache Spark

In this notebook, the ruin probability of an insurance company is estimated with a parallel Monte Carlo simulation implemented in Apache Spark.


> Monte Carlo Method

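The main idea of this method is that to estimate the probability of a random event, you repeat the experiment many times and use the fraction of experiments in which the event occurs. More experiments give more accuracy: the statistical error shrinks roughly in proportion to $1/\sqrt{n}$ for $n$ experiments.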
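
As a minimal illustration (not part of the original notebook), the sketch below estimates a probability whose exact value is known: the chance that two fair dice sum to 7, which is $6/36 = 1/6$. The helper names `estimate_probability` and `two_dice_sum_to_seven` are hypothetical, and a seeded `random.Random` is used only to make runs reproducible.

```python
import random


def estimate_probability(event, trials, seed=0):
    """Monte Carlo estimate of P(event): the fraction of trials in which event(rng) is True."""
    rng = random.Random(seed)  # seeded generator so the run is reproducible
    hits = sum(1 for _ in range(trials) if event(rng))
    return hits / trials


def two_dice_sum_to_seven(rng):
    """One experiment: roll two fair dice and check whether they sum to 7."""
    return rng.randint(1, 6) + rng.randint(1, 6) == 7


# The exact answer is 1/6 (about 0.1667); the estimate tightens as the number of trials grows
for trials in (100, 10_000, 100_000):
    print(trials, estimate_probability(two_dice_sum_to_seven, trials))
```

Running it with increasing `trials` shows the estimate settling closer to 1/6, which is exactly the "more experiments, more accuracy" behaviour described above.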

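> Example 1: Estimating the value of π with the Monte Carlo method
The idea is simple: draw a unit square with an inscribed circle of radius 0.5, then generate a large number of random points uniformly distributed within the square. The fraction of points that land inside the circle approximates the ratio of the circle's area to the square's area.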

<left>
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/insurance-risk-assessment-with-montecarlo-method-using-apache-spark/images/circle.jpg" width="500" alt="MC Pi sampling"  />
</left>

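$\frac{\text{Circle area}}{\text{Square area}} = \frac{\pi r^2}{(2r)^2} = \frac{\pi}{4} \approx \frac{\text{Dots in circle}}{\text{Dots in square}} \quad\Rightarrow\quad \pi \approx 4 \cdot \frac{\text{Dots in circle}}{\text{Dots in square}}$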


```python
import random

# Number of random points to generate
N = 1000000
```

```python
# Number of dots that fall inside the circle
bingo = 0

for _ in range(N):
    # Generate a random dot uniformly distributed within the square [-0.5, 0.5] x [-0.5, 0.5]
    x = random.uniform(-0.5, 0.5)
    y = random.uniform(-0.5, 0.5)
    # Check whether the dot lies inside the circle of radius 0.5 (x^2 + y^2 <= 0.5^2)
    if x**2 + y**2 <= 0.25:
        # Bingo! One more dot inside the circle
        bingo += 1

# Estimate pi once, after all dots are generated: pi ≈ 4 * (dots in circle / dots in square)
pi = 4 * bingo / N

print("The estimation of value of pi is ", pi)
```

The estimation of value of pi is  3.141344
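
The accuracy of this estimate can be quantified. Each point is an independent Bernoulli trial with success probability $p = \pi/4$, so the estimate $4\hat{p}$ has standard error $4\sqrt{p(1-p)/N}$. For $N = 10^6$ this is about 0.0016, so the result above (about 0.00025 away from π) lies well within one standard error. The hypothetical helper `estimate_pi` below is a standard-library sketch that returns the estimate together with its standard error.

```python
import math
import random


def estimate_pi(n, seed=0):
    """Estimate pi with n uniform points in the unit square; return (estimate, standard_error)."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n):
        x = rng.uniform(-0.5, 0.5)
        y = rng.uniform(-0.5, 0.5)
        if x * x + y * y <= 0.25:
            inside += 1
    p_hat = inside / n  # estimated fraction of points inside the circle (close to pi/4)
    estimate = 4 * p_hat
    # Each point is a Bernoulli trial, so Var(p_hat) = p(1 - p) / n; the estimate scales it by 4
    standard_error = 4 * math.sqrt(p_hat * (1 - p_hat) / n)
    return estimate, standard_error


est, se = estimate_pi(200_000)
print(f"pi ≈ {est:.4f} ± {se:.4f} (true value {math.pi:.4f})")
```

Because the standard error scales like $1/\sqrt{N}$, quadrupling the number of points only halves the error.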


# Ruin Probability calculation 
> Let's calculate the ruin probability of an insurance company, i.e. the probability that its capital falls below zero, using the classical risk process.

> # The Classical Risk Process

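The process describes how the capital of a company evolves over time: it starts at an initial value $u$, increases with incoming premiums, and decreases with outgoing claims. Premiums arrive at a constant rate $c>0$, while the claim arrival times and claim amounts are random.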
    

<left>
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/insurance-risk-assessment-with-montecarlo-method-using-apache-spark/images/RiskProcess.JPG" width="500" alt="Classical Risk Process"  />
</left>


${F(t)} = u + ct - \sum_{i=1}^{N_t}{\xi_i}$

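where $F(t)$ is the capital of the company at time $t$, $N_t$ is the number of claims that have arrived by time $t$ (a Poisson process), and $\xi_i$ is the size of the $i$-th claim. Ruin occurs if $F(t) < 0$ for some $t$; over a finite horizon $T$ the ruin probability is $\psi(u, T) = P\left(\exists\, t \le T : F(t) < 0\right)$.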
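
To make the formula concrete, the sketch below evaluates $F(t)$ for a fixed list of claim arrival times and sizes. The function names `capital_at` and `is_ruined` are hypothetical, and the claims are hard-coded numbers rather than random draws so that the arithmetic can be checked by hand.

```python
from bisect import bisect_right


def capital_at(t, u, c, claim_times, claim_sizes):
    """Evaluate F(t) = u + c*t - (sum of the claims that arrived by time t).

    claim_times must be sorted in increasing order; claim_sizes[i] is the size of the
    claim arriving at claim_times[i].
    """
    n_t = bisect_right(claim_times, t)  # N_t: number of claims with arrival time <= t
    return u + c * t - sum(claim_sizes[:n_t])


def is_ruined(u, c, claim_times, claim_sizes, horizon):
    """Capital only drops at claim instants, so checking F right at each claim suffices."""
    return any(
        capital_at(t, u, c, claim_times, claim_sizes) < 0
        for t in claim_times
        if t <= horizon
    )


# Toy path: u = 10, c = 2, claims of size 5 and 20 arriving at times 1 and 3
times, sizes = [1.0, 3.0], [5.0, 20.0]
print(capital_at(2.0, 10, 2, times, sizes))        # 10 + 4 - 5 = 9.0
print(capital_at(3.0, 10, 2, times, sizes))        # 10 + 6 - 25 = -9.0
print(is_ruined(10, 2, times, sizes, horizon=24))  # True
```

Between claims $F(t)$ only grows, which is why a simulation needs to check the capital only at claim instants.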

# Initializing Spark

```
!pip install pyspark
```

Successfully built pyspark (pyspark 3.4.0, with dependency py4j 0.10.9.7)

```python
try:
    from pyspark import SparkContext, SparkConf
    from pyspark.sql import SparkSession
except ImportError:
    # pyspark is not importable yet: the kernel must be restarted after pip install
    print('<<<!!!!! Please restart your kernel after installing Apache Spark !!!!!>>>')
```

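```python
# Create (or reuse) a local SparkContext that uses all available CPU cores
sc = SparkContext.getOrCreate(SparkConf().setMaster("local[*]"))

# The SparkSession reuses the SparkContext created above
spark = SparkSession \
    .builder \
    .getOrCreate()
```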

# CALCULATIONS FOR THE CLASSICAL RISK PROCESS
> # Model parameters
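The model parameters are INITIAL_CAPITAL (the initial capital $u$), INCOME_INTENSITY (the premium rate $c$), CLAIM_INTENSITY (the claim arrival rate $\lambda$) and CLAIM_MEAN (the mean claim size $\mu$); MAXTIME sets the simulation horizon $T$ and TRAJEC_NUM the number of simulated trajectories.
Claims are independent and identically distributed non-negative random variables (exponentially distributed with mean CLAIM_MEAN > 0), and they arrive according to a Poisson process with rate CLAIM_INTENSITY, so the times between consecutive claims are exponentially distributed with mean 1/CLAIM_INTENSITY.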

```python
INITIAL_CAPITAL = 1000  # Initial capital u
MAXTIME = 24            # Simulation horizon T (time units)
INCOME_INTENSITY = 50   # Premium income c per time unit
CLAIM_INTENSITY = 1     # Claim arrival rate: times between claims are exponential with mean 1/CLAIM_INTENSITY
CLAIM_MEAN = 45         # Mean claim size (> 0): claim sizes are exponentially distributed
TRAJEC_NUM = 1000       # Number of simulated trajectories
```
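
A useful sanity check, not part of the original notebook: for this model (Poisson claim arrivals, exponential claim sizes) the *infinite-horizon* ruin probability has the closed form $\psi(u) = \frac{\lambda\mu}{c}\, e^{-\left(\frac{1}{\mu} - \frac{\lambda}{c}\right) u}$, valid under the net-profit condition $c > \lambda\mu$, which holds here since $50 > 1 \cdot 45$. A trajectory ruined before $T$ is also ruined eventually, so the finite-horizon probability estimated by the simulation should not exceed this value. The function name `exponential_ruin_probability` is hypothetical.

```python
import math


def exponential_ruin_probability(u, c, lam, mu):
    """Infinite-horizon ruin probability of the classical risk process with
    Poisson(lam) claim arrivals and exponentially distributed claims of mean mu."""
    if c <= lam * mu:
        # Without the net-profit condition, ruin is certain in the long run
        return 1.0
    return (lam * mu / c) * math.exp(-(1 / mu - lam / c) * u)


# Parameters from the cell above: u = 1000, c = 50, lambda = 1, mu = 45
print(exponential_ruin_probability(1000, 50, 1, 45))
```

With these parameters the formula gives roughly 0.0975, an upper bound for the 24-time-unit ruin probability that the simulation estimates.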

# The Model 

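```python
import random
import time
from operator import add


def bankrupcy(seed):
    """Simulate one trajectory of the classical risk process up to MAXTIME.

    Returns 1 if the capital drops below zero before MAXTIME (ruin), otherwise 0.
    """
    rng = random.Random(seed)  # independent, reproducible generator for this trajectory
    capital = INITIAL_CAPITAL
    t = 0.0  # current time (named t so that it does not shadow the time module)
    while True:
        # Waiting time until the next claim (Poisson arrivals => exponential gaps)
        time_step = rng.expovariate(CLAIM_INTENSITY)
        t += time_step
        if t > MAXTIME:
            # The next claim falls outside the simulation period, so no ruin occurred
            return 0
        # Premiums earned since the previous claim minus the new exponential claim
        capital += INCOME_INTENSITY * time_step - rng.expovariate(1 / CLAIM_MEAN)
        if capital < 0:
            # Capital only decreases at claim instants, so ruin is detected here
            return 1
```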
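
To inspect individual trajectories, like the path sketched in the Classical Risk Process figure, it can help to record the capital after each claim instead of returning only a ruin indicator. The helper `capital_path` below is a hypothetical sketch that follows the same dynamics but takes the model parameters as arguments; the returned list of `(time, capital)` pairs could be plotted with any plotting library.

```python
import random


def capital_path(seed, u, c, lam, mu, horizon):
    """Record (time, capital) right after each claim up to `horizon`, stopping at ruin."""
    rng = random.Random(seed)
    t, capital = 0.0, float(u)
    path = [(t, capital)]
    while True:
        gap = rng.expovariate(lam)  # exponential time until the next claim
        if t + gap > horizon:
            break  # the next claim would arrive after the horizon
        t += gap
        capital += c * gap - rng.expovariate(1 / mu)  # premiums earned minus claim size
        path.append((t, capital))
        if capital < 0:
            break  # ruin: the trajectory stops here
    return path


path = capital_path(seed=42, u=1000, c=50, lam=1, mu=45, horizon=24)
print(len(path) - 1, "claims; final capital", round(path[-1][1], 2))
```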


# Calculations

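> Here Spark parallelizes the simulation: each element of the distributed collection is a seed, `map(bankrupcy)` turns every seed into a 0/1 ruin indicator, and `reduce(add)` sums the indicators. Dividing the number of ruined trajectories by TRAJEC_NUM gives the estimated ruin probability.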
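
Because every trajectory is independent, the Spark pipeline `sc.parallelize(seeds).map(bankrupcy).reduce(add)` can be reproduced on a single machine with the built-in `map` and `functools.reduce`, which is handy for debugging the model before running it on a cluster. The sketch below is a hypothetical, self-contained version: `simulate_ruin` and `local_ruin_probability` take the model parameters explicitly instead of reading the notebook's global constants.

```python
import random
from functools import reduce
from operator import add


def simulate_ruin(seed, u, c, lam, mu, horizon):
    """Return 1 if one simulated trajectory is ruined before `horizon`, else 0."""
    rng = random.Random(seed)
    t, capital = 0.0, u
    while True:
        gap = rng.expovariate(lam)  # time until the next claim
        t += gap
        if t > horizon:
            return 0
        capital += c * gap - rng.expovariate(1 / mu)  # premiums minus claim size
        if capital < 0:
            return 1


def local_ruin_probability(num_trajectories, u, c, lam, mu, horizon):
    """Single-machine analogue of parallelize(seeds).map(simulate).reduce(add) / n."""
    seeds = range(num_trajectories)
    indicators = map(lambda s: simulate_ruin(s, u, c, lam, mu, horizon), seeds)
    return reduce(add, indicators) / num_trajectories


print(local_ruin_probability(1000, u=1000, c=50, lam=1, mu=45, horizon=24))
```

Using integer seeds from `range(n)` makes the run reproducible, whereas seeds derived from `time.time()` change on every execution.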

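```python
# One seed per independent trajectory
seeds = [time.time() + i for i in range(TRAJEC_NUM)]

# Distribute the seeds, simulate each trajectory (1 = ruin, 0 = survival),
# count the ruined trajectories and divide by the total number of trajectories
ruin_probability = sc.parallelize(seeds).map(bankrupcy).reduce(add) / TRAJEC_NUM
print("The company will bankrupt with", ruin_probability, "probability")
```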

The company will bankrupt with 0.002 probability
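
With 1000 trajectories, an estimate of 0.002 corresponds to only 2 ruined trajectories, so its relative uncertainty is large. A binomial confidence interval makes this visible. The sketch below uses the Wilson score interval, a standard choice for proportions close to 0 (the helper name `wilson_interval` is hypothetical). For 2 ruins out of 1000 it gives roughly 0.0005 to 0.0073 at the 95% level, which suggests increasing TRAJEC_NUM when a tighter estimate is needed.

```python
import math


def wilson_interval(ruined, trials, z=1.96):
    """Wilson score confidence interval for a binomial proportion (z = 1.96 for about 95%)."""
    p_hat = ruined / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)


# 0.002 * 1000 = 2 ruined trajectories out of 1000
low, high = wilson_interval(2, 1000)
print(f"95% CI for the ruin probability: [{low:.4f}, {high:.4f}]")
```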
