Preparing the notebook for using matplotlib and numpy.

In [1]:
from __future__ import print_function, division
# notice two underscores _ either side of future
# Let printing and division work the same in Python 2 and 3.
# A __future__ import must be the first statement in the cell, otherwise Python raises a SyntaxError.

#%matplotlib inline # use this line instead for static plots that appear in the Jupyter cells, rather than launching the matplotlib GUI
%matplotlib widget
# this gives an interactive view (it needs the ipympl package), but you need to be in the classic Jupyter notebook rather than the CoCalc Jupyter notebook for this to work

import matplotlib

import numpy as np

import matplotlib.pyplot as plt



###### PH2150 - Scientific Computing and Employability Skills

### Python-Random Numbers (Week 7)

### Dr. Andrew Casey  (a.casey@rhul.ac.uk, W054)


## * Random Numbers and Simulations
## * Monte Carlo Simulations
<img src="dice.jpg" style="max-width:50%">


Random numbers have many applications in science and computer programming, especially when there are significant uncertainties in a phenomenon of interest.

The purpose of this week's problem sheet is to look at some practical problems involving random numbers and to learn how to program with such numbers.


The key idea in computer simulations with random numbers is:

* First, formulate an algorithmic description of the phenomenon we want to study. (This description hopefully maps directly onto a simple and short Python program.)

* Use random numbers to mimic the uncertain features of the phenomenon.

* The program needs to perform a large number of repeated calculations.

* The final answers are “only” approximate, but the accuracy can usually be made good enough for practical purposes.


### A random physical quantity can be described as the result of a measurement on a system that is very difficult to describe precisely, which makes the outcome of such complex systems hard to predict. Examples include:

* The toss of a coin
* The motion of a turbulent fluid
* Thermal noise from electrons in a metal, Brownian motion etc.

<table><tr>
<td> <img src="tenor.gif" alt="Drawing" style="width: 250px;"/> </td>
<td> <img src="weather.gif" alt="Drawing" style="width: 250px;"/> </td>
</tr></table>


### The connection between randomness and the complexity of a system is exploited in the modern definition of a random number.

####  “A sequence of numbers $n_1, n_2, n_3, \ldots$ is said to be random if the shortest computer program that can produce the sequence is print($n_1$), print($n_2$), print($n_3$), ...”

* In other words there is no algorithm that can generate the sequence that is more succinct than the sequence itself.

### A good random number generator should have the following properties:

* Good uniform distribution. 
* No correlations between successive values.
* Long period before the numbers repeat. 
* Reasonably fast.
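The first two properties can be checked empirically. The sketch below (sample size and seed are arbitrary illustrative choices) draws uniform numbers with NumPy's `default_rng()` generator and compares the sample mean and variance with the exact values for a uniform distribution on $[0, 1)$, then measures the correlation between each value and the next one.

```python
import numpy as np

# Empirical check of uniformity and of correlations between successive values,
# using NumPy's default generator with a fixed (arbitrary) seed.
rng = np.random.default_rng(seed=1)
u = rng.random(100_000)          # uniform floats in [0.0, 1.0)

# Uniformity: for U(0, 1) the mean is 1/2 and the variance is 1/12.
print("mean     =", u.mean())
print("variance =", u.var(), "(expected", 1 / 12, ")")

# Successive values: the lag-1 correlation coefficient should be close to zero.
lag1 = np.corrcoef(u[:-1], u[1:])[0, 1]
print("lag-1 correlation =", lag1)
```

Passing these checks is necessary but not sufficient; serious generator testing uses large batteries of statistical tests.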

### It is also convenient if

* A sequence can be repeated if required.
* It can generate a sequence guaranteed to be different. 
* It is portable to other machines.
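Repeating a sequence is done by fixing the seed. A minimal sketch, with an arbitrary seed value, for both NumPy's newer `default_rng()` interface and the legacy global functions used in this notebook:

```python
import numpy as np

# Two generators created with the same seed produce the same sequence,
# so a simulation can be repeated exactly.
a = np.random.default_rng(seed=2024).random(3)
b = np.random.default_rng(seed=2024).random(3)
print(np.array_equal(a, b))      # True

# The legacy global functions (np.random.random, ...) are seeded with np.random.seed.
np.random.seed(2024)
x = np.random.random(3)
np.random.seed(2024)
y = np.random.random(3)
print(np.array_equal(x, y))      # True

# With no seed, fresh entropy is taken from the operating system,
# so each new generator gives a different sequence (with overwhelming probability).
c = np.random.default_rng().random(3)
```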


### Python uses the <font color='red'>*Mersenne Twister*</font> as the core generator, both in the standard-library module *random* and in the legacy functions of *numpy.random* (NumPy's newer `default_rng()` interface uses a different default generator, PCG64).

* The Mersenne Twister is a pseudorandom number generator (PRNG).
* It produces 53-bit precision floats and has a period of $2^{19937}-1$.
* The underlying implementation in C is fast.

The <font color='red'>*Mersenne Twister*</font> is one of the most extensively tested random number generators in existence. However, being completely deterministic, it is not suitable for all purposes, and is completely unsuitable for cryptographic purposes.
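Which generator sits behind each interface can be checked directly. In this sketch the seed 42 is arbitrary; note that the two Mersenne Twister generators are seeded differently, so they do not produce the same numbers from the same seed.

```python
import random
import numpy as np

# Python's built-in random module: a Mersenne Twister generator.
py_value = random.Random(42).random()

# NumPy's legacy functions (np.random.random, np.random.randn, ...) are backed by a
# RandomState instance, which also uses the Mersenne Twister (MT19937).
legacy_value = np.random.RandomState(42).random_sample()

# NumPy's Generator API can be given MT19937 explicitly ...
mt = np.random.Generator(np.random.MT19937(42))
print(type(mt.bit_generator).__name__)                         # MT19937

# ... whereas default_rng() uses PCG64 by default.
print(type(np.random.default_rng(42).bit_generator).__name__)  # PCG64
```

Neither generator is suitable for cryptography; only the bit generator changes here, while the sampling methods on `Generator` stay the same.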


### Let's take a look at some of the most useful functions in the *numpy.random* library

In [2]:
from numpy import random
print(dir(random))

['BitGenerator', 'Generator', 'MT19937', 'PCG64', 'Philox', 'RandomState', 'SFC64', 'SeedSequence', '__RandomState_ctor', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '_bit_generator', '_bounded_integers', '_common', '_generator', '_mt19937', '_pcg64', '_philox', '_pickle', '_sfc64', 'absolute_import', 'beta', 'binomial', 'bytes', 'chisquare', 'choice', 'default_rng', 'dirichlet', 'division', 'exponential', 'f', 'gamma', 'geometric', 'get_state', 'gumbel', 'hypergeometric', 'laplace', 'logistic', 'lognormal', 'logseries', 'mtrand', 'multinomial', 'multivariate_normal', 'negative_binomial', 'noncentral_chisquare', 'noncentral_f', 'normal', 'pareto', 'permutation', 'poisson', 'power', 'print_function', 'rand', 'randint', 'randn', 'random', 'random_integers', 'random_sample', 'ranf', 'rayleigh', 'sample', 'seed', 'set_state', 'shuffle', 'standard_cauchy', 'standard_exponential', 'standard_gamma', 'standard

In [3]:
help(random.random) # the function random in the library numpy.random

for i in range(10):
    print(random.random())
    
print(random.random(size=5)) # returns a NumPy array of 5 random floating point numbers in the half-open interval [0.0, 1.0)

Help on built-in function random:

random(...) method of numpy.random.mtrand.RandomState instance
    random(size=None)
    
    Return random floats in the half-open interval [0.0, 1.0). Alias for
    `random_sample` to ease forward-porting to the new random API.

0.2938201099118306
0.5779283651520093
0.48693158805701087
0.3464010155898074
0.9153640157317657
0.39924608390677985
0.7376340774050798
0.499816348991316
0.8772129355502017
0.15357465153535577
[0.4821537  0.90434776 0.23293767 0.33404963 0.1126283 ]


In [19]:
help(random.randn)
print(random.randn(4,4)) # returns a (4,4) array of samples from the standard normal distribution (mean 0, variance 1); the values are not restricted to [0.0, 1.0)

Help on built-in function randn:

randn(...) method of numpy.random.mtrand.RandomState instance
    randn(d0, d1, ..., dn)
    
    Return a sample (or samples) from the "standard normal" distribution.
    
    .. note::
        This is a convenience function for users porting code from Matlab,
        and wraps `standard_normal`. That function takes a
        tuple to specify the size of the output, which is consistent with
        other NumPy functions like `numpy.zeros` and `numpy.ones`.
    
    .. note::
        New code should use the ``standard_normal`` method of a ``default_rng()``
        instance instead; see `random-quick-start`.
    
    If positive int_like arguments are provided, `randn` generates an array
    of shape ``(d0, d1, ..., dn)``, filled
    with random floats sampled from a univariate "normal" (Gaussian)
    distribution of mean 0 and variance 1. A single float randomly sampled
    from the distribution is returned if no argument is provided.
    
    Paramete

In [4]:
help(random.randint)

print('Result of randint(low, high)=', random.randint(3,21)) # random integer in the half-open range [low, high), here 3 to 20 inclusive
print('Result of randint(high, size=number)', random.randint(5,size=10)) # array of 10 random integers in [0, high), here 0 to 4 inclusive


Help on built-in function randint:

randint(...) method of numpy.random.mtrand.RandomState instance
    randint(low, high=None, size=None, dtype=int)
    
    Return random integers from `low` (inclusive) to `high` (exclusive).
    
    Return random integers from the "discrete uniform" distribution of
    the specified dtype in the "half-open" interval [`low`, `high`). If
    `high` is None (the default), then results are from [0, `low`).
    
    .. note::
        New code should use the ``integers`` method of a ``default_rng()``
        instance instead; see `random-quick-start`.
    
    Parameters
    ----------
    low : int or array-like of ints
        Lowest (signed) integers to be drawn from the distribution (unless
        ``high=None``, in which case this parameter is one above the
        *highest* such integer).
    high : int or array-like of ints, optional
        If provided, one above the largest (signed) integer to be drawn
        from the distribution (see above fo
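The help text above recommends the methods of a `default_rng()` instance for new code. A sketch of the Generator equivalents of the legacy calls used in the cells above (the seed is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(seed=7)

u = rng.random(5)                 # like random.random(size=5): floats in [0.0, 1.0)
z = rng.standard_normal((4, 4))   # like random.randn(4, 4): N(0, 1) samples, shape (4, 4)
k = rng.integers(3, 21)           # like random.randint(3, 21): an int in [3, 21)
ks = rng.integers(5, size=10)     # like random.randint(5, size=10): ints in [0, 5)

print(u)
print(z)
print(k, ks)
```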

In [5]:
import random as rdn # Python's standard library also has a module called random, separate from numpy.random
help(rdn.gauss)
print('random number with a Gaussian distribution around 10 with a standard deviation of 1', rdn.gauss(10,1.0))

nums = []
mu = 100
sigma = 50

for i in range(10000):
    temp = rdn.gauss(mu, sigma)
    nums.append(temp)

# plotting a graph
fig1 = plt.figure(figsize=(10,10))
ax1 = fig1.add_subplot(1, 1, 1)
ax1.hist(nums, bins=200)
ax1.set_ylabel('Counts')
ax1.set_xlabel('value')


Help on method gauss in module random:

gauss(mu, sigma) method of random.Random instance
    Gaussian distribution.
    
    mu is the mean, and sigma is the standard deviation.  This is
    slightly faster than the normalvariate() function.
    
    Not thread-safe without a lock around calls.

random number with a Gaussian distribution around 10 with a standard deviation of 1 9.530249500959288



Text(0.5, 0, 'value')
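The loop above draws the 10,000 Gaussian samples one at a time with the standard-library `rdn.gauss`. As a sketch, NumPy can draw them in a single vectorised call with the same mean and standard deviation (the seed is an arbitrary choice); the histogram could then be plotted exactly as before with `ax1.hist(nums, bins=200)`.

```python
import numpy as np

mu, sigma = 100, 50
rng = np.random.default_rng(seed=3)
nums = rng.normal(loc=mu, scale=sigma, size=10_000)   # all 10,000 samples at once

print(nums.mean(), nums.std())   # close to 100 and 50
```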

Almost all module functions depend on the basic function *random()*, which generates a random float uniformly in the semi-open range $[0.0, 1.0)$. 
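To illustrate how other distributions can be built from uniform `random()` values, here is inverse-transform sampling, one standard technique (not necessarily the one each library function uses internally). The rate `lam` and the seed are arbitrary illustrative values.

```python
import numpy as np

# Inverse-transform sampling: if U is uniform on [0, 1), then
# X = -ln(1 - U) / lam follows an exponential distribution with rate lam.
# (1 - U lies in (0, 1], so the logarithm is always finite.)
lam = 2.0
rng = np.random.default_rng(seed=5)
u = rng.random(100_000)
x = -np.log(1.0 - u) / lam

print(x.mean())    # close to the exact mean 1 / lam = 0.5
```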

* Differential equations with analytic solutions are the exception rather than the rule, and many functions have no closed-form integral; next week we will look at how we can tackle these with numerical techniques.

* The classical way to solve a problem in numerical analysis relies on a rigorous algorithm which, for a given input, provides a well-defined solution in a predetermined number of steps. Such an approach is essentially deterministic.

* For numerous topical problems in applied sciences and engineering, the complexity of deterministic algorithms renders the computations intractable within a reasonable amount of time. 

Certain areas of modern science are concerned with systems composed of a huge number of coupled components, which can be subject to fluctuations. Their characterisation is often accomplished by means of high-dimensional integrals. A typical example is the classical canonical partition function of a system of N interacting particles,

![image-3.png](attachment:image-3.png)

where $E(r_1, \ldots, r_N)$ is the total energy of the system,
$r_i$ is the position of particle $i$,
$T$ is the system temperature and
$k_{\mathrm{B}}$ is the Boltzmann constant.
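Written out with the symbols defined above, the configurational form of this integral is sketched below. Normalisation prefactors (such as $1/N!$ or the momentum integrals) may be written differently in the original image.

$$
Z = \int \mathrm{d}^3 r_1 \int \mathrm{d}^3 r_2 \cdots \int \mathrm{d}^3 r_N \;
\exp\!\left[-\frac{E(r_1, \ldots, r_N)}{k_{\mathrm{B}} T}\right]
$$

Each position $r_i$ has three components, which is why this is a $3N$-dimensional integral.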

The evaluation of this 3N-dimensional integral by any of the classical quadrature methods is ruled out even for small numbers of particles.

For $N = 20$ and 10 integration points per dimension, the number of required operations is on the order of $10^{3N} = 10^{60}$.

Even using the latest petascale supercomputers, which are capable of more than $10^{16}$ floating point operations per second, the computation would require approximately $3 \times 10^{36}$ years! (The age of the universe is 13.77 billion years.)
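The estimate above can be reproduced in a few lines, counting one operation per grid point as the text does:

```python
N = 20                          # particles
dims = 3 * N                    # a 3N-dimensional integral
points_per_dim = 10             # integration points along each dimension
operations = points_per_dim ** dims          # 10**60 grid points

flops = 1e16                    # floating point operations per second
seconds = operations / flops
years = seconds / (365.25 * 24 * 3600)
age_of_universe = 13.77e9       # years

print(f"{operations:.0e} operations, {years:.1e} years")
print(f"{years / age_of_universe:.1e} times the age of the universe")
```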




### Monte Carlo methods

* Alternatives to the deterministic approaches for complex, high-dimensional problems are the so-called stochastic methods, based on the law of large numbers from probability theory.
 
* Here, the quantities of interest are defined as expectation values of random variables; in other words, under certain assumptions, the average of a large sequence of random variables is taken as a probabilistic estimate of the sought-for quantity.

* Such techniques, generically referred to as *Monte Carlo* methods, have an intrinsically non-deterministic character: they use the outcome of stochastic experiments and, within statistical errors, they give different results on different runs.

Essentially, instead of deterministically covering the domains of the involved functions, Monte Carlo methods sample them randomly. In general, stochastic techniques do not require genuine random numbers; pseudorandom sequences suffice, provided they have a high degree of uniformity and low sequential correlations.
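A minimal sketch of the idea: for $U$ uniform on $[0, 1)$, the expectation value $E[f(U)]$ equals $\int_0^1 f(x)\,\mathrm{d}x$, so the average of many samples $f(U_i)$ estimates the integral. The test function $f(x) = x^2$ (exact integral $1/3$) and the seed are illustrative choices. The statistical error shrinks like $1/\sqrt{n}$, independently of the number of dimensions, which is what makes the approach attractive for high-dimensional integrals.

```python
import numpy as np

def f(x):
    return x ** 2          # exact integral over [0, 1] is 1/3

rng = np.random.default_rng(seed=11)
for n in (100, 10_000, 1_000_000):
    samples = f(rng.random(n))
    estimate = samples.mean()                      # law of large numbers
    error = samples.std(ddof=1) / np.sqrt(n)       # standard error of the mean
    print(f"n = {n:>9}: estimate = {estimate:.5f} +/- {error:.5f}")
```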

As you have seen in the quantum mechanics course, many properties are intrinsically probabilistic, such as the position of an electron, which is described by a probability density function (pdf).
![image-3.png](attachment:image-3.png)



### A Monte Carlo method:
A technique that uses random numbers and probability to solve problems. The term Monte Carlo method was coined by *Stanislaw Ulam and Nicholas Metropolis* in reference to the games of chance played in the casinos of Monte Carlo, Monaco (Metropolis and Ulam, 1949).

### Monte Carlo simulation
A method for iteratively evaluating a deterministic model using sets of random numbers as inputs. A simulation often involves more than 10,000 evaluations of the model, and the method became widespread after the development of fast computers.

![image-2.png](attachment:image-2.png)

**Figure:** Stanislaw Ulam and FERMIAC, mechanical device designed by Enrico Fermi to run Monte Carlo simulations of neutron transport during the Manhattan project.


### For PS6: A Monte Carlo method to approximate pi

![image.png](attachment:image.png)


![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

* The ratio between the area of a circle of radius $r$ inscribed in a square (side $2r$) and the area of the square is:

$$\frac{\pi r^2}{(2r)^2} = \frac{\pi}{4}$$

* Use random numbers to select a coordinate (throw a dart at the board).

* Use a rule to decide whether the dart is inside or outside the circle, and add it to the appropriate count.

* Repeat many times.
* Use the fraction of darts that land inside the circle to approximate pi: $\pi \approx 4 \times N_{\rm inside} / N_{\rm total}$.
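A minimal vectorised sketch of these steps, assuming a circle of radius $r = 1$ centred in the square $[-1, 1] \times [-1, 1]$ (the function name `estimate_pi` and the seed are illustrative choices):

```python
import numpy as np

def estimate_pi(n_darts, rng):
    """Throw n_darts uniformly at the square [-1, 1] x [-1, 1] (circle radius r = 1)."""
    x = rng.uniform(-1.0, 1.0, n_darts)
    y = rng.uniform(-1.0, 1.0, n_darts)
    inside = np.count_nonzero(x**2 + y**2 <= 1.0)   # darts inside the circle
    # inside / total approximates (pi r^2) / (2r)^2 = pi / 4
    return 4.0 * inside / n_darts

rng = np.random.default_rng(seed=6)
for n in (100, 10_000, 1_000_000):
    print(n, estimate_pi(n, rng))
```

The error falls roughly like $1/\sqrt{n}$, so each extra correct digit costs about 100 times more darts.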


### A Monte Carlo method to integrate

We have just worked out the area inside the circle, so we have already used a Monte Carlo simulation to integrate.

* Given an arbitrary function, *f(x)*, you can determine its integral numerically using the “Accept/Reject Method” (this assumes $f(x) \ge 0$ over the interval of integration).
* First, find the maximum value of the function, *Fmax*.
* Second, enclose the function in a box, *h(x)*, whose height is *Fmax* and whose length spans the interval of integration.
* Third, compute the area of the box.
* Finally, sample random points uniformly inside the box and record the fraction that fall under the function; the integral is approximately the area of the box multiplied by this fraction.
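A sketch of the Accept/Reject method, assuming $0 \le f(x) \le F_{\max}$ on $[a, b]$ (the function name `hit_or_miss_integral` is illustrative). As a check it integrates $\sin x$ from $0$ to $\pi$, whose exact value is 2 and whose maximum is $F_{\max} = 1$.

```python
import numpy as np

def hit_or_miss_integral(f, a, b, f_max, n, rng):
    """Accept/reject estimate of the integral of f over [a, b], assuming 0 <= f(x) <= f_max."""
    x = rng.uniform(a, b, n)            # random positions along the box length
    y = rng.uniform(0.0, f_max, n)      # random heights within the box
    fraction_under = np.mean(y < f(x))  # fraction of points under the curve (accepted)
    box_area = (b - a) * f_max
    return box_area * fraction_under

rng = np.random.default_rng(seed=8)
print(hit_or_miss_integral(np.sin, 0.0, np.pi, 1.0, 1_000_000, rng))   # close to 2
```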


![image.png](attachment:image.png)

## An example of a pandas datareader.

* Monte Carlo in finance, used to predict risk.
* There are many web-based repositories of historical financial data. 
* These repositories can be accessed via data readers such as Pandas-datareader.

* Functions from pandas_datareader.data and pandas_datareader.wb extract data from various Internet sources into a pandas DataFrame. 

In this example I use *Quandl*; registering a free account at https://www.quandl.com/ provides the API key needed below.



In [6]:
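# import packages
import numpy as np
import math
import matplotlib.pyplot as plt
import os
#from scipy.stats import norm
from datetime import datetime, date

import pandas as pd
# workaround for older pandas_datareader versions, which expect is_list_like in pandas.core.common
pd.core.common.is_list_like = pd.api.types.is_list_like
from pandas_datareader import data

start = None  # None lets the data reader choose its default date range
end = None

# download Apple price data into a DataFrame from Quandl
# Quandl requires an access key (free to get from the website). Rather than hard-coding it
# in the notebook, read it from the QUANDL_API_KEY environment variable.
apple = data.DataReader('AAPL', 'quandl', start, end, api_key=os.environ.get('QUANDL_API_KEY'))
apple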

| Date | Open | High | Low | Close | Volume | ExDividend | SplitRatio | AdjOpen | AdjHigh | AdjLow | AdjClose | AdjVolume |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2018-03-27 | 173.68 | 175.15 | 166.920 | 168.340 | 38962839.0 | 0.00 | 1.0 | 173.680000 | 175.150000 | 166.920000 | 168.340000 | 38962839.0 |
| 2018-03-26 | 168.07 | 173.10 | 166.440 | 172.770 | 36272617.0 | 0.00 | 1.0 | 168.070000 | 173.100000 | 166.440000 | 172.770000 | 36272617.0 |
| 2018-03-23 | 168.39 | 169.92 | 164.940 | 164.940 | 40248954.0 | 0.00 | 1.0 | 168.390000 | 169.920000 | 164.940000 | 164.940000 | 40248954.0 |
| 2018-03-22 | 170.00 | 172.68 | 168.600 | 168.845 | 41051076.0 | 0.00 | 1.0 | 170.000000 | 172.680000 | 168.600000 | 168.845000 | 41051076.0 |
| 2018-03-21 | 175.04 | 175.09 | 171.260 | 171.270 | 35247358.0 | 0.00 | 1.0 | 175.040000 | 175.090000 | 171.260000 | 171.270000 | 35247358.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2015-11-10 | 116.90 | 118.07 | 116.061 | 116.770 | 59127931.0 | 0.00 | 1.0 | 112.942781 | 114.073175 | 112.132182 | 112.817181 | 59127931.0 |
| 2015-11-09 | 120.96 | 121.81 | 120.050 | 120.570 | 33871405.0 | 0.00 | 1.0 | 116.865344 | 117.686571 | 115.986149 | 116.488546 | 33871405.0 |
| 2015-11-06 | 121.11 | 121.81 | 120.620 | 121.060 | 33042283.0 | 0.00 | 1.0 | 117.010267 | 117.686571 | 116.536854 | 116.961959 | 33042283.0 |
| 2015-11-05 | 121.85 | 122.69 | 120.180 | 120.920 | 39552680.0 | 0.52 | 1.0 | 117.725217 | 118.536781 | 116.111748 | 116.826698 | 39552680.0 |


In [7]:
# make a figure to display the data
fig3 = plt.figure(figsize=(10,10))
ax5 = fig3.add_subplot(1, 1, 1)

prices = apple['Close']
ax5.plot(prices.index, prices) # prices.index contains the dates (a DatetimeIndex)
ax5.set_title('Apple historical closing share price')
ax5.set_ylabel('Share Price Close')
ax5.set_xlabel('Date')


plt.show()



In [8]:
print((apple.index[0] - apple.index[-1]))
print((apple.index[0] - apple.index[-1]).days)

874 days 00:00:00
874
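The DataFrame above is ordered newest first (2018-03-27 in the top row), which is why the elapsed time is `index[0] - index[-1]`. The sketch below uses a small hypothetical price series, stored in the same order, to show the compound annual growth rate $\mathrm{CAGR} = (P_{\rm end}/P_{\rm start})^{365/\mathrm{days}} - 1$ used in the next cell, and why row order matters for `pct_change`, which compares each row with the one above it.

```python
import pandas as pd

# Hypothetical prices, stored newest first like the Quandl DataFrame above.
prices = pd.Series(
    [200.0, 150.0, 100.0],
    index=pd.to_datetime(["2022-01-01", "2021-07-01", "2021-01-01"]),
)

# Elapsed calendar days between the newest and the oldest row.
days = (prices.index[0] - prices.index[-1]).days        # 365
# Compound annual growth rate: (P_end / P_start) ** (365 / days) - 1
cagr = (prices.iloc[0] / prices.iloc[-1]) ** (365.0 / days) - 1
print("CAGR =", cagr)                                   # 1.0: the price doubled in one year

# pct_change compares each row with the row above it, so row order matters.
reverse_returns = prices.pct_change()                   # NaN, -0.25, -0.333...
forward_returns = prices.sort_index().pct_change()      # NaN, 0.5, 0.333...
print(reverse_returns.tolist())
print(forward_returns.tolist())
```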


In [9]:
fig4 = plt.figure(figsize=(10,10))
ax6 = fig4.add_subplot(2, 1, 1)
ax7 = fig4.add_subplot(2, 1, 2)

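# calculate the compound annual growth rate (CAGR), which will give us our mean return input (mu)
# the DataFrame is ordered newest first, so index[0] is the most recent date and index[-1] the oldest
days = (apple.index[0] - apple.index[-1]).days
cagr = ((apple['Close'].iloc[0] / apple['Close'].iloc[-1]) ** (365.0/days)) - 1
print('CAGR =', str(round(cagr*100, 2))+"%")
mu = cagr
 
# create a series of percentage returns and calculate the annual volatility of returns
# sort into chronological order first so each return is measured from the previous trading day;
# the assignment aligns on the date index, so each value lands in the right row
apple['Returns'] = apple['Close'].sort_index().pct_change() # this creates a new column in the pandas dataframe "apple"
vol = apple['Returns'].std()*math.sqrt(252) # annualise using the typical number of trading days in a year (about 252)
print("Annual Volatility =", str(round(vol*100, 2))+"%")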

 
#set up empty list to hold our ending values for each simulated price series
result = []
 
#Define Variables
S = apple['Close'].iloc[0] # starting stock price (i.e. last available real stock price)
T = 252 #Number of trading days
#mu = 0.2309 #Return
#vol = 0.4259 #Volatility
 
#choose number of runs to simulate - I have chosen 10,000
for i in range(10000):
    #create list of daily returns using random normal distribution
    daily_returns=np.random.normal(mu/T,vol/math.sqrt(T),T)+1
 
    #set starting price and create price series generated by above random daily returns
    price_list = [S]
 
    for x in daily_returns:
        price_list.append(price_list[-1]*x)
 
    #plot data from each individual run which we will plot at the end
    ax6.plot(price_list)
 
    #append the ending value of each simulated run to the empty list we created at the beginning
    result.append(price_list[-1])
 
#show the plot of multiple price series created above
ax6.set_title('10,000 predictions of share price based on \n daily returns using historical data to set parameters')
ax6.set_ylabel('Share Price Close')
ax6.set_xlabel('Trading Days')

 
# create histogram of ending stock values for our multiple simulations

# use numpy mean function to calculate the mean of the result
print(round(np.mean(result),2))
p5, p95 = np.percentile(result, [5, 95])
print("5% quantile =", p5)
print("95% quantile =", p95)

ax7.hist(result, bins=100)
ax7.set_title('Histogram of predicted share price after 252 trading days')
ax7.set_xlabel('Share price')
ax7.set_ylabel('Counts (out of 10,000)')
# place the labels in axes-fraction coordinates so they stay inside the plot
# (data coordinates such as (500, 400) can lie outside the axes, and the annotation is then not drawn)
ax7.annotate('5% percentile={:.2f}'.format(p5), xy=(0.65, 0.90), xycoords='axes fraction')
ax7.annotate('95% percentile={:.2f}'.format(p95), xy=(0.65, 0.82), xycoords='axes fraction')
ax7.axvline(p5, color='r', linestyle='dashed', linewidth=2)
ax7.axvline(p95, color='r', linestyle='dashed', linewidth=2)


CAGR = 14.39%
Annual Volatility = 21.7%
194.01
5% quantile = 131.37948198201698
95% quantile = 269.97750877374295


<matplotlib.lines.Line2D at 0x7f6d154238e0>
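The simulation loop above builds each price path element by element. As a sketch, the same model can be vectorised with NumPy: draw all daily returns as one `(n_runs, T)` array and take a cumulative product along each row. The parameters are copied from the printed output and the table above (last close 168.34 on 2018-03-27); the seed is arbitrary, and unlike `price_list` the paths here do not include the starting price as their first column.

```python
import numpy as np

S0 = 168.34      # starting share price: last close in the table above
mu = 0.1439      # CAGR used as the mean annual return (printed above)
vol = 0.217      # annual volatility (printed above)
T = 252          # trading days simulated
n_runs = 10_000

rng = np.random.default_rng(seed=9)
# One row per run: daily growth factors 1 + r, with r ~ N(mu/T, vol/sqrt(T))
growth = 1.0 + rng.normal(mu / T, vol / np.sqrt(T), size=(n_runs, T))
# Cumulative product along each row gives the price path after each trading day
paths = S0 * np.cumprod(growth, axis=1)
final = paths[:, -1]

p5, p95 = np.percentile(final, [5, 95])
print("mean final price =", round(final.mean(), 2))
print("5% / 95% percentiles =", p5, p95)
```

Avoiding the Python-level loop makes the 10,000 runs much faster; plotting all 10,000 paths remains the slow part.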

In [7]:
days



874