<p style="text-align:center">
    <a href="https://skills.network/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkGuidedProjectsinsuranceriskassessmentwithmontecarlomethodusingapachespark521-2023-01-01">
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="200" alt="Skills Network Logo"  />
    </a>
</p>


# **Insurance Risk assessment with Monte Carlo method using Apache Spark**


Estimated time needed: **45** minutes


In this lab, you will learn about the Monte Carlo method and use it, parallelized with Apache Spark, to calculate the ruin probability of an insurance company.


## Prerequisites 
* Basic Apache Spark knowledge.
* Basic probability theory knowledge.
* Basic Python knowledge. 


## Objectives


After completing this lab, you will be able to:


* Understand and apply the Monte Carlo method.
  * Get a basic idea of the method's accuracy
  * Estimate the number pi with the method
  * Parallelize the method using Apache Spark
* Understand the ruin probability of an insurance company.
* Calculate the ruin probability using the Monte Carlo method.



## MONTE CARLO METHOD


The idea of the method is simple: to estimate the probability of some random event, repeat the experiment many times and count how often the event occurs. For example, to estimate the probability that a coin lands heads, toss it many times and divide the number of heads by the total number of tosses. The more experiments you run, the more accurate the estimate, thanks to the law of large numbers.
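A minimal sketch of the coin experiment in plain Python, assuming a fair coin (heads with probability 0.5). The function name `estimate_heads_probability` is hypothetical and not part of the lab; it only shows "repeat and count" in code. A local `random.Random` generator is used so that runs are reproducible with a seed.

```python
import random

def estimate_heads_probability(num_tosses, seed=None):
    """Estimate P(heads) for a fair coin by simulating num_tosses tosses."""
    rng = random.Random(seed)  # local generator, reproducible when seeded
    heads = 0
    for _ in range(num_tosses):
        if rng.random() < 0.5:  # treat values below 0.5 as "heads"
            heads += 1
    return heads / num_tosses

# More tosses -> estimates closer to the true value 0.5 (law of large numbers)
for n in (10, 1_000, 100_000):
    print(n, estimate_heads_probability(n, seed=42))
```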

The Monte Carlo method was developed in the 1930s and 1940s by Enrico Fermi, Stanislaw Ulam, John von Neumann, and others, largely in connection with nuclear weapons projects at Los Alamos National Laboratory. Its power is that it lets us estimate probabilities that are difficult or impossible to calculate by other methods.

**Useful Links**

**Monte-Carlo Method**:  https://en.wikipedia.org/wiki/Monte_Carlo_method

**Law of large numbers**:  https://en.wikipedia.org/wiki/Law_of_large_numbers


## Example 1. Estimating the value of pi with the Monte Carlo method


Let's do a simple experiment and estimate the value of the constant pi using the Monte Carlo method. The idea is simple: draw a unit square (side 1) with an inscribed circle of radius 0.5, generate a large number of random points uniformly distributed within the square, and count how many of them fall inside the circle.




<left>
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/insurance-risk-assessment-with-montecarlo-method-using-apache-spark/images/circle.jpg" width="500" alt="MC Pi sampling">
</left>


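Now let's see how it works. Because the points are uniformly distributed over the square, the fraction of points that land inside the circle approximates the ratio of the two areas: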



 $\frac{\text{Area of the circle}}{\text{Area of the square}} \approx \frac{\text{Number of points in the circle}}{\text{Total number of points}}$


Since the area of the circle is $\pi R^2$ with $R = 0.5$, i.e. $\pi/4$, and the area of the unit square is $1$, we get $\pi \approx 4\,\frac{\text{Number of points in the circle}}{\text{Total number of points}}$


By Hoeffding's inequality, with high probability the estimation error is of order $\frac{1}{\sqrt{\text{Total number of points}}}$, so gaining one more correct decimal digit takes roughly 100 times as many points.
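To see the $1/\sqrt{N}$ behaviour, here is a sketch that reruns the pi experiment for several sample sizes and compares the actual error with a Hoeffding bound. Each point is a Bernoulli trial (inside or outside the circle), so for the hit fraction $\hat p$ Hoeffding's inequality gives $P(|\hat p - p| \ge t) \le 2e^{-2Nt^2}$; the pi estimate is $4\hat p$, so the bound is scaled by 4. The function names and the default 95% confidence level are our own choices, not part of the lab.

```python
import math
import random

def estimate_pi(n, seed=None):
    """Crude Monte Carlo estimate of pi with n points in the square [-0.5, 0.5]^2."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n):
        x = rng.uniform(-0.5, 0.5)
        y = rng.uniform(-0.5, 0.5)
        if x * x + y * y <= 0.25:  # inside the circle of radius R = 0.5
            inside += 1
    return 4 * inside / n

def hoeffding_halfwidth(n, delta=0.05):
    """eps such that |estimate - pi| <= eps with probability at least 1 - delta.

    Solves 2 * exp(-2 * n * t**2) = delta for t, then multiplies by 4.
    """
    return 4 * math.sqrt(math.log(2 / delta) / (2 * n))

for n in (1_000, 10_000, 100_000):
    est = estimate_pi(n, seed=0)
    print(f"n={n:>7}  estimate={est:.5f}  error={abs(est - math.pi):.5f}  "
          f"95% bound={hoeffding_halfwidth(n):.5f}")
```

Multiplying the number of points by 100 shrinks the bound by a factor of 10, which is the $1/\sqrt{N}$ rate stated above.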


Importing libraries, setting constants.

**Library documentation** https://docs.python.org/3/library/random.html


In [1]:
import random   
N = 1000000 #Total number of points generated

Now let's start the simulation. Your goal is to check whether a randomly generated point with coordinates (x, y) lies inside the circle with radius R = 0.5 and center at (0, 0). Hint: use the Pythagorean theorem; the point is inside the circle if $x^2 + y^2 \le R^2 = 0.25$.

**Pythagorean Theorem** https://en.wikipedia.org/wiki/Pythagorean_theorem



In [2]:
bingo = 0  # Number of points that land inside the circle

for i in range(N):
    # Generate the coordinates of a point uniformly distributed within the square
    x = random.uniform(-0.5, 0.5)
    y = random.uniform(-0.5, 0.5)

    # Check whether we've hit the circle (Pythagorean theorem):
    # the point is inside if x^2 + y^2 <= R^2 = 0.5^2 = 0.25
    if x**2 + y**2 <= 0.25:
        bingo += 1  # Bingo! We did it!

# Estimate the value of pi once, after all points have been generated
pi = 4 * bingo / N

print("Approximate value of Pi is ", pi)

Approximate value of Pi is  3.14084


## RUIN PROBABILITY CALCULATION


Now let's tackle a more practical problem: calculating the ruin probability of an insurance company modeled by the Classical Risk Process.


## The Classical Risk Process


The Classical Risk Process describes how the capital of an insurance company with initial capital $u$ evolves under incoming cash premiums and outgoing claims. Premiums arrive at a constant rate $c>0$, while claims occur at random times and have random sizes.


<left>
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/insurance-risk-assessment-with-montecarlo-method-using-apache-spark/images/RiskProcess.JPG" width="500" alt="Classical Risk Process">
</left>


 $\xi_t = u + ct - \sum_{k=1}^{N_t} z_k$


where $\xi_t$ is the capital of the insurance company at time $t$, and the sum is the total amount of the insurance claims $z_k$ that have arrived up to time $t$.

The number of claims $N_t$ is a Poisson process with intensity $\lambda$, and the claim sizes $z_k$ are independent and identically distributed non-negative random variables (for simplicity, we consider exponentially distributed claim sizes with a positive mean).
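The formula can be made concrete with a short sketch that simulates claim arrivals and sizes on a finite interval $[0, T]$ and then evaluates $\xi_t$ directly from the definition. The helper names and parameters (`lam` for $\lambda$, `claim_mean`, `horizon` for $T$) are hypothetical; the sketch relies on the standard fact that exponential inter-arrival times with rate $\lambda$ make $N_t$ a Poisson process with intensity $\lambda$.

```python
import random

def simulate_claims(lam, claim_mean, horizon, seed=None):
    """Simulate the claims (arrival time, size) that occur in [0, horizon].

    Inter-arrival times are exponential with rate lam, so the claim count N_t is a
    Poisson process with intensity lam; claim sizes z_k are exponential with mean
    claim_mean.
    """
    rng = random.Random(seed)
    t, claims = 0.0, []
    while True:
        t += rng.expovariate(lam)  # arrival time of the next claim
        if t > horizon:
            return claims
        claims.append((t, rng.expovariate(1 / claim_mean)))  # (arrival time, z_k)

def capital_at(t, u, c, claims):
    """xi_t = u + c*t - (sum of the claim sizes z_k that arrived by time t)."""
    return u + c * t - sum(size for arrival, size in claims if arrival <= t)

claims = simulate_claims(lam=1.0, claim_mean=1.0, horizon=10.0, seed=7)
for t in (0.0, 5.0, 10.0):
    print(f"xi_{t:g} = {capital_at(t, u=10.0, c=1.0, claims=claims):.3f}")
```

Between claims $\xi_t$ grows linearly at rate $c$; it can only jump down at claim instants.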


Now we will calculate the ruin probability of the company on a finite time interval with the crude Monte Carlo method. To do this, we simulate many random trajectories and calculate the fraction of them that lead the company to ruin, i.e. in which the capital becomes negative before the end of the interval.

**Ruin Theory** https://en.wikipedia.org/wiki/Ruin_theory
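Before moving to Spark, the crude Monte Carlo idea can be sketched in plain, sequential Python. This is a hedged illustration rather than the lab's code: the function names are hypothetical, and the 95% confidence half-width uses the usual normal approximation $1.96\sqrt{\hat p(1-\hat p)/n}$ for an estimated proportion. It exploits the fact that capital only grows between claims, so checking for ruin right after each claim is exact.

```python
import math
import random

def is_ruined(u, c, lam, claim_mean, horizon, rng):
    """Simulate one trajectory; return True if the capital drops below 0 by `horizon`.

    Between claims the capital only grows (premiums arrive at rate c), so ruin
    can only happen right after a claim; checking at claim instants is exact.
    """
    t, capital = 0.0, float(u)
    while True:
        dt = rng.expovariate(lam)  # waiting time until the next claim
        t += dt
        if t > horizon:            # next claim falls outside the time window
            return False
        capital += c * dt - rng.expovariate(1 / claim_mean)
        if capital < 0:
            return True

def crude_mc_ruin(u, c, lam, claim_mean, horizon, n, seed=None):
    """Fraction of n simulated trajectories that end in ruin, plus a 95% CI half-width."""
    rng = random.Random(seed)
    ruined = sum(is_ruined(u, c, lam, claim_mean, horizon, rng) for _ in range(n))
    p_hat = ruined / n
    half_width = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)  # normal approximation
    return p_hat, half_width

p, hw = crude_mc_ruin(u=10, c=1, lam=1, claim_mean=1, horizon=10, n=20_000, seed=1)
print(f"ruin probability ~ {p:.4f} +/- {hw:.4f}")
```

The half-width shrinks like $1/\sqrt{n}$, the same rate as in the pi example, which is why simulating many trajectories (and parallelizing them) pays off.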


## Initialization/loading Spark


In [3]:
%conda install pyspark

Retrieving notices: ...working... done
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/jupyterlab/conda/envs/python

  added / updated specs:
    - pyspark


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ca-certificates-2023.12.12 |       h06a4308_0         126 KB
    certifi-2022.12.7          |   py37h06a4308_0         150 KB
    openssl-1.1.1w             |       h7f8727e_0         3.7 MB
    py4j-0.10.7                |           py37_0         241 KB
    pyspark-2.4.5              |             py_0       198.8 MB
    ---------------------------------------

In [4]:
try:
    from pyspark import SparkContext, SparkConf
    from pyspark.sql import SparkSession
except ImportError:
    # Spark was just installed: the kernel must be restarted before it can be imported
    print('<<<<<!!!!! Please restart your kernel after installing Apache Spark !!!!!>>>>>')

In [5]:
sc = SparkContext.getOrCreate(SparkConf().setMaster("local[*]"))

spark = SparkSession \
    .builder \
    .getOrCreate()

23/12/26 02:46:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).


## CALCULATIONS FOR CLASSICAL RISK PROCESS




**MODEL PARAMETERS**


We consider a risk process with initial capital INITIAL_CAPITAL, in which premiums arrive at a constant rate INCOME_INTENSITY. Claims are independent and identically distributed non-negative random variables (here, exponentially distributed with a positive mean CLAIM_MEAN) and arrive according to a Poisson process with rate CLAIM_INTENSITY. The process is simulated up to time MAXTIME, and TRAJEC_NUM trajectories are generated.


In [6]:
INITIAL_CAPITAL = 10  # initial capital u
MAXTIME = 10          # length of the simulation period (time horizon)
INCOME_INTENSITY = 1  # premium income per time unit (c)
CLAIM_INTENSITY = 1   # claim arrival rate (lambda); time between claims is exponentially distributed
CLAIM_MEAN = 1        # mean claim size; claims are exponentially distributed, must be > 0
TRAJEC_NUM = 1000     # number of trajectories simulated
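With these values the premium rate equals the expected claim outflow per time unit (INCOME_INTENSITY = CLAIM_INTENSITY × CLAIM_MEAN = 1), so the classical net profit condition $c > \lambda \mu$ (with $\mu$ the mean claim size) does not hold. The sketch below checks this and evaluates the well-known closed-form infinite-horizon ruin probability for exponential claims, $\psi(u) = \frac{\lambda\mu}{c}\, e^{-(1/\mu - \lambda/c)\,u}$. This goes slightly beyond the lab and is only context: the formula is for an infinite horizon, whereas the simulation estimates ruin before MAXTIME, which can never exceed $\psi(u)$. The function names are hypothetical.

```python
import math

def safety_loading(c, lam, claim_mean):
    """theta = c / (lam * claim_mean) - 1; the net profit condition is theta > 0."""
    return c / (lam * claim_mean) - 1

def infinite_horizon_ruin(u, c, lam, claim_mean):
    """Closed-form ruin probability for exponential claims on an infinite horizon.

    psi(u) = (lam * mu / c) * exp(-(1/mu - lam/c) * u), valid when theta > 0.
    Without a positive safety loading, ruin is certain on an infinite horizon.
    """
    if safety_loading(c, lam, claim_mean) <= 0:
        return 1.0
    mu = claim_mean
    return (lam * mu / c) * math.exp(-(1 / mu - lam / c) * u)

# Parameter values copied from the cell above
print(safety_loading(c=1, lam=1, claim_mean=1))                 # 0.0 -> no safety loading
print(infinite_horizon_ruin(u=10, c=1, lam=1, claim_mean=1))    # 1.0
# Hypothetical premium rate c = 1.2 (20% safety loading), for comparison
print(infinite_horizon_ruin(u=10, c=1.2, lam=1, claim_mean=1))
```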

**THE MODEL**


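Now let's set up our model. The key step is the capital variation between two consecutive claims: during the waiting time the company earns INCOME_INTENSITY times that time in premiums, and then it pays the new, exponentially distributed claim.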


In [10]:
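import random
import time
from operator import add

def bankrupcy(seed):
    """Simulate one trajectory of the risk process.

    Returns 1 if the company is ruined before MAXTIME, otherwise 0.
    """
    rng = random.Random(seed)  # per-trajectory generator, seeded independently
    capital = INITIAL_CAPITAL
    t = 0.0  # current time (named t so it does not shadow the time module)
    while True:
        # Waiting time until the next claim: exponential with rate CLAIM_INTENSITY
        time_step = rng.expovariate(CLAIM_INTENSITY)
        t += time_step
        if t > MAXTIME:
            return 0  # the next claim falls outside the simulation period: no ruin
        # Capital variation: premiums earned since the last claim minus the new claim
        capital += INCOME_INTENSITY * time_step - rng.expovariate(1 / CLAIM_MEAN)
        if capital < 0:
            return 1  # ruin; it can only occur at a claim instant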

**CALCULATIONS**


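We use Apache Spark to parallelize the simulation of a large number of trajectories: each seed becomes one element of an RDD, `map(bankrupcy)` turns every element into 0 (no ruin) or 1 (ruin), and the results are summed with `reduce(add)` and divided by TRAJEC_NUM.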

**Parallelize method:** https://sparkbyexamples.com/pyspark/pyspark-parallelize-create-rdd/


In [11]:
ruin_probability = sc.parallelize([time.time() + i for i in range(TRAJEC_NUM)]).map(bankrupcy).reduce(add) / TRAJEC_NUM
print("Our company will go bankrupt with probability", ruin_probability)

Our company will go bankrupt with probability 0.039

**CONCLUSIONS**


Now that you are familiar with the Monte Carlo method, keep in mind its main strength: it applies to any generalized risk process you need, as long as you can simulate its trajectories. Examples include non-linear income, non-Poisson claim arrivals, dividend withdrawal, discrete time, and so on.


**Example of Generalized Risk Process**


<left>
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/insurance-risk-assessment-with-montecarlo-method-using-apache-spark/images/GRiskProcess.JPG" width="500" alt="Generalized Risk Process">
</left>
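To illustrate this flexibility, here is a hedged sketch of one such generalized process: a hypothetical discrete-time model with dividend withdrawal above a barrier. The model, its parameter names, and the values used are invented for illustration only. Note that only the trajectory function changes; the Monte Carlo estimator (the fraction of ruined trajectories) stays exactly the same.

```python
import random

def ruined_discrete_with_dividends(u, premium, claim_mean, barrier, periods, rng):
    """One trajectory of a hypothetical discrete-time risk process with a dividend barrier.

    Each period: collect `premium`, pay one exponential claim with mean `claim_mean`,
    then pay out any capital above `barrier` as dividends. Returns True on ruin.
    """
    capital = float(u)
    for _ in range(periods):
        capital += premium - rng.expovariate(1 / claim_mean)
        if capital < 0:
            return True
        capital = min(capital, barrier)  # dividend withdrawal above the barrier
    return False

def estimate_ruin_probability(n, seed=None, **params):
    """Crude Monte Carlo: same estimator as before, only the trajectory model differs."""
    rng = random.Random(seed)
    return sum(ruined_discrete_with_dividends(rng=rng, **params) for _ in range(n)) / n

print(estimate_ruin_probability(10_000, seed=3, u=10, premium=1.1, claim_mean=1,
                                barrier=15, periods=10))
```

In a Spark version, the same trajectory function could be passed to `map` in place of `bankrupcy`.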


## Authors


[Bogdan Norkin](https://www.researchgate.net/profile/Bogdan-Norkin?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkGuidedProjectsinsuranceriskassessmentwithmontecarlomethodusingapachespark521-2023-01-01)


Copyright &copy; 2021 IBM Corporation. This notebook and its source code are released under the terms of the [MIT License](https://cognitiveclass.ai/mit-license/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkGuidedProjectsinsuranceriskassessmentwithmontecarlomethodusingapachespark521-2023-01-01).
