**Timing processes**

It is always a good idea to know where the inefficiencies in your code are. Careful reading of the code often reveals them, but timing the code is important too, because what the interpreter does behind the scenes can make the actual cost of a piece of code hard to predict from its source.

In Python there are a couple of options available.
- use the **time.perf_counter** function from the time package
- use the **timeit** package

**time.perf_counter**

The time.perf_counter function is the easier of the two to work with. A drawback is that it measures elapsed wall-clock time, so it is not ideal for isolating the work actually carried out in Python from other processes running on the machine.
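
A small sketch of that drawback: `time.perf_counter` counts all elapsed time, including time during which the process is idle, whereas `time.process_time` counts only CPU time used by the current process. Here `time.sleep` stands in for time spent waiting on something else; the helper name `measure_sleep` and the 0.2 second pause are arbitrary choices for illustration.

```python
import time

def measure_sleep(pause=0.2):
    """Time a sleep with both clocks; returns (wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    time.sleep(pause)  # the process is idle here, not computing
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu

wall, cpu = measure_sleep()
print("perf_counter:", wall)   # roughly the length of the pause
print("process_time:", cpu)    # close to zero
```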

**timeit**

The timeit package allows for timing snippets of code and can also run the same code repeatedly.
Averaging over repeated runs is generally considered more accurate than a single measurement.

Here is a reference: https://www.analyticsvidhya.com/blog/2020/09/timeit-in-python/

We illustrate these approaches by comparing

- the process of generating individual numpy random numbers
- the process of generating an array of numpy random numbers

In [1]:
import time
import numpy as np

start_time=time.perf_counter()
n=100000
for i in range(n):
    x=np.random.uniform(0.,1.)
end_time=time.perf_counter()
print(end_time-start_time)

0.2612936999939848


In [2]:
import time
import numpy as np

start_time=time.perf_counter()
n=100000
L=list(np.random.uniform(0,1,n))
end_time=time.perf_counter()
print(end_time-start_time)

0.008243799995398149


In [3]:
216/4

54.0

We see here how important it can be to generate our random numbers all at once instead of one at a time.

Of course, in an application that would require knowing *a priori* how many random numbers we would need.
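
When the count is not known in advance, one possible compromise is to draw the numbers in blocks and hand them out one at a time. The sketch below is an assumption about how this could be done, not something taken from the cells above; the name `buffered_uniform` and the block size of 10000 are hypothetical choices.

```python
import numpy as np

def buffered_uniform(block_size=10000):
    """Yield Uniform(0,1) numbers one at a time, drawing them in blocks."""
    while True:
        block = np.random.uniform(0., 1., block_size)  # one vectorized call
        for u in block:
            yield u

gen = buffered_uniform()
total = 0.
count = 0
# stop at a point that is not known ahead of time
while total < 100.:
    total += next(gen)
    count += 1
print(count)
```

Iterating over the block in Python still costs time per element, so this only removes the overhead of the individual numpy calls.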

**timeit**

The timeit package can be used to time a snippet of code.

The code snippet needs to be a quoted string.

Here we enclose the code between lines containing ''' (a triple-quoted string). This makes it easy to create a string that spans multiple lines.

In [4]:
import timeit

setup = '''
import numpy as np
'''
# code snippet whose execution time is to be measured
mycode = '''
n=100000
for i in range(n):
    x=np.random.uniform(0.,1.)
'''
 

niters=10
print(timeit.timeit(setup = setup,
                     stmt = mycode,
                     number = niters)/niters)

setup = '''
import numpy as np
'''
# code snippet whose execution time is to be measured
mycode = '''
n=100000
x=np.random.uniform(0.,1.,n)
'''

niters=10
print(timeit.timeit(setup = setup,
                     stmt = mycode,
                     number = niters)/niters)

0.3368696999998065
0.001066920001176186


In [5]:
.219/.00093

235.48387096774192
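
Because a single timing can be disturbed by whatever else the machine is doing, it can help to repeat the whole measurement and look at the smallest value. A minimal sketch using `timeit.repeat`, which performs the `timeit` measurement several times and returns a list of totals; the choices `repeat=5` and `number=10` are arbitrary.

```python
import timeit

setup = '''
import numpy as np
'''
# code snippet whose execution time is to be measured
mycode = '''
n=100000
x=np.random.uniform(0.,1.,n)
'''

niters=10
# each entry is the total time for niters executions of mycode
totals = timeit.repeat(setup=setup, stmt=mycode, repeat=5, number=niters)
per_run = [t/niters for t in totals]
print(min(per_run))   # the least disturbed measurement
print(max(per_run))
```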

**Comparing two methods for generating normal random variables**

**The Polar Method**

Here, we use the following basic fact. If $X$ and $Y$ are independent $N(0,1)$ random variables and $(R,\Theta)$ are the polar coordinates of $(X,Y)$, then $R^2 \sim \chi^2_2$, $\Theta \sim \mbox{Uniform}(0,2\pi)$, and $R$ and $\Theta$ are independent. The $\chi^2_2$ distribution has cdf $F(x) = 1-\exp(-x/2)$ for $x>0$, so it can be sampled using the inversion method with $F^{-1}(u) = -2 \log(1-u)$. Once we know $R^2$ we can get $R$ and then take $X = R \cos(\Theta)$ and $Y = R \sin(\Theta)$, where $\Theta$ is independent of $R$ and uniformly distributed in the interval $(0,2\pi)$.
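
The inversion step can be written out as follows. Setting $u = F(x)$ and solving for $x$ gives

$$ u = 1 - e^{-x/2} \iff e^{-x/2} = 1-u \iff x = -2\log(1-u), $$

so if $U \sim \mbox{Uniform}(0,1)$ then $-2\log(1-U) \sim \chi^2_2$. Since $1-U$ is also uniform on $(0,1)$, the variable $-2\log(U)$ has the same distribution.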

In [6]:
import numpy as np
#
# returns two independent standard normals
#
def polar_method():
    u=np.random.uniform()
    rsq=-2*np.log(1-u)
    r=np.sqrt(rsq)
    theta=2*np.pi*np.random.uniform()
    x=r*np.cos(theta)
    y=r*np.sin(theta)
    return((x,y))


To create a list of N (N even) values, we can use the polar method N/2 times and the flattening idea.

In [7]:
L=[polar_method() for n in range(5)]
print(L)
M=[x for y in L for x in y]
print(M)

[(0.025810833682290982, -0.6268940676438689), (-0.7025879739128639, -0.7675883931674927), (1.5587063450578684, 0.2793744301370364), (0.16689169922720287, 0.3593798307191136), (-1.0943467854042754, -0.7924030833694682)]
[0.025810833682290982, -0.6268940676438689, -0.7025879739128639, -0.7675883931674927, 1.5587063450578684, 0.2793744301370364, 0.16689169922720287, 0.3593798307191136, -1.0943467854042754, -0.7924030833694682]


Or more compactly

In [8]:
L=[x for n in range(5) for x in polar_method()]
print(L)

[0.8920178097422555, 0.8248172242500842, -0.15627650541964555, 1.1966123008809753, 0.13100574008663082, -0.14746727031521145, 1.493218869464816, -0.7775453749055458, -1.113880500720823, -0.8104054539138779]


Quick check that this worked. Do we get approximately the right proportion of values whose absolute value is less than 1 (about 0.6827 for a standard normal)?

In [9]:
N=50000
len([x for n in range(N) for x in polar_method() if np.abs(x)<1])/(2*N)

0.68373
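
Given the earlier timing results, it is natural to ask whether the same recipe can be applied to whole arrays at once. The following is a sketch rather than part of the cells above: the name `polar_method_array` is hypothetical, and it assumes an even sample size `N`.

```python
import numpy as np

def polar_method_array(N):
    """Return an array of N (N even) standard normals using the polar recipe."""
    half = N // 2
    u = np.random.uniform(size=half)
    r = np.sqrt(-2*np.log(1-u))                     # R from inverting the chi-square(2) cdf
    theta = 2*np.pi*np.random.uniform(size=half)    # Theta uniform on (0, 2*pi)
    return np.concatenate((r*np.cos(theta), r*np.sin(theta)))

z = polar_method_array(100000)
print(np.mean(np.abs(z) < 1))   # should be near 0.68
```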

For the rejection method, the normal pdf 

$$ f(x) = {1\over \sqrt{2\pi}} e^{-\frac{1}{2}x^2}$$

is bounded above by the scaled double-exponential pdf i.e.

$$ f(x) \leq c g(x)$$

where

$$ g(x) = \frac{1}{2} e^{-\vert x \vert} $$

is the double-exponential pdf, and

$$ c = 2 \sqrt{\frac{e}{2 \pi}} \approx 1.315 $$
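
The constant can be checked directly. For any $x$,

$$ \frac{f(x)}{g(x)} = \sqrt{\frac{2}{\pi}}\, e^{\vert x \vert - \frac{1}{2}x^2}, $$

and the exponent $\vert x \vert - \frac{1}{2}x^2$ is largest at $\vert x \vert = 1$, where it equals $\frac{1}{2}$. Hence

$$ c = \max_x \frac{f(x)}{g(x)} = \sqrt{\frac{2e}{\pi}} = 2 \sqrt{\frac{e}{2 \pi}}, \qquad \frac{f(x)}{c\, g(x)} = e^{-\frac{1}{2}(\vert x \vert - 1)^2}. $$

A proposal drawn from $g$ is therefore accepted with probability $1/c \approx 0.76$.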

This leads to the following algorithm. 

> Repeat:

> > Generate X having an Exponential(1) distribution

> > Generate Y uniform in the interval [0,c*g(X)]

> Until Y < f(X)

> Change the sign of X with probability 1/2

> Return(X)

In [10]:
import plotly.graph_objects as go
import numpy as np
np.random.seed(1)

T=np.linspace(-3,3,1000)

# std normal pdf
def f(x):
    return((1/np.sqrt(2*np.pi))*np.exp(-.5*x*x))
# f is built from numpy functions, so it can be applied to the whole array directly
y=f(T)

# scaled double exponential pdf
def cg(x):
    c=2*np.sqrt(.5*np.e/np.pi)
    g=.5*np.exp(-np.abs(x))
    return(c*g)
z=cg(T)

# Create traces
fig = go.Figure()
fig.add_trace(go.Scatter(x=T, y=y,
                    line=dict(color='red', width=.75),
                    mode='lines',
                    name='std normal pdf'))
fig.add_trace(go.Scatter(x=T, y=z,
                    line=dict(color='blue', width=.75),
                    mode='lines',
                    name='scaled double exponential pdf'))

fig.update_layout(title=dict(text="Bounding the Normal PDF using a Double Exponential PDF",x=.5,font_size=20))
fig.update_layout(xaxis=dict(title="t",title_font_size=15))
fig.update_layout(yaxis=dict(title="",title_font_size=15))
fig.update_layout(showlegend=True)
fig.show()

In [11]:
#
# rejection method 
#
def rejection_method():
    c=2*np.sqrt(np.e/(2.*np.pi))       
    d=1./np.sqrt(2*np.pi)
    while True:
        u=np.random.uniform()
        x=-np.log(u)
        y=c*.5*np.exp(-np.abs(x))
        v=np.random.uniform()
        if y*v<d*np.exp(-.5*x*x): # acceptance test
            w=np.random.uniform()
            if w<.5:
                return(x)
            else:
                return(-x)

In [12]:
N=100000
L=[rejection_method() for n in range(N)]
len([x for x in L if np.abs(x)<1])/N

0.68277
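
One more check is the acceptance rate. In rejection sampling with a bound $f \leq c g$, each proposal is accepted with probability $1/c$, which is about 0.76 here. The sketch below counts proposals; the helper name `count_proposals` is hypothetical, and its loop repeats the acceptance test of `rejection_method` (using `1-u` inside the logarithm, an assumption made here to avoid taking the log of zero).

```python
import numpy as np

def count_proposals():
    """Run the rejection loop once; return the number of proposals it needed."""
    c = 2*np.sqrt(np.e/(2.*np.pi))
    d = 1./np.sqrt(2*np.pi)
    trials = 0
    while True:
        trials += 1
        x = -np.log(1-np.random.uniform())   # Exponential(1) proposal
        y = c*.5*np.exp(-x)                  # c*g(x) for x >= 0
        v = np.random.uniform()
        if y*v < d*np.exp(-.5*x*x):          # same acceptance test as rejection_method
            return trials

N = 20000
total = sum(count_proposals() for n in range(N))
print(N/total)   # estimated acceptance rate, expected near 1/c
```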

**Comparing the methods**

We now have two methods for sampling from the standard normal distribution. Which is more efficient?

In [8]:
import time
import numpy as np

N=100000
M=50000
process_time_start = time.process_time()
L=[x for n in range(M) for x in polar_method()]
process_time_end=time.process_time()
timediff1=process_time_end-process_time_start
print("process_time time difference = "+str(timediff1))

process_time_start = time.process_time()
L=[rejection_method() for n in range(N)]
process_time_end=time.process_time()
timediff2=process_time_end-process_time_start
print("process_time time difference = "+str(timediff2))

process_time time difference = 0.09375
process_time time difference = 0.125
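
Both values printed above are multiples of 1/64 of a second, which suggests that the `time.process_time` clock on that machine advances in coarse steps. As a cross-check, the same comparison can be sketched with `timeit`, which accepts a callable as well as a string. The two samplers are repeated in condensed form so the block is self-contained; the wrapper names `run_polar` and `run_rejection`, the sample size, and the repeat count are arbitrary choices.

```python
import timeit
import numpy as np

def polar_method():
    u=np.random.uniform()
    r=np.sqrt(-2*np.log(1-u))
    theta=2*np.pi*np.random.uniform()
    return (r*np.cos(theta), r*np.sin(theta))

def rejection_method():
    c=2*np.sqrt(np.e/(2.*np.pi))
    d=1./np.sqrt(2*np.pi)
    while True:
        x=-np.log(1-np.random.uniform())
        y=c*.5*np.exp(-x)
        v=np.random.uniform()
        if y*v<d*np.exp(-.5*x*x):
            return x if np.random.uniform()<.5 else -x

N=20000   # number of normals produced by each method

def run_polar():
    return [x for n in range(N//2) for x in polar_method()]

def run_rejection():
    return [rejection_method() for n in range(N)]

# best of 3 repeats, each a single call producing N values
t_polar=min(timeit.repeat(run_polar, repeat=3, number=1))
t_rejection=min(timeit.repeat(run_rejection, repeat=3, number=1))
print("polar method:     "+str(t_polar))
print("rejection method: "+str(t_rejection))
```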


In [None]:
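#
# revised rejection method 
#
def rejection_method():
    # The test c*g(x)*v < f(x) with g(x)=.5*exp(-x) simplifies to
    # exp(-x)*v < k*exp(-.5*x*x), where k = d/(c/2) = 1/sqrt(e)
    k=1./np.sqrt(np.e)
    while True:
        u=np.random.uniform()
        x=-np.log(u)
        y=np.exp(-x)   # x >= 0, so no absolute value is needed
        v=np.random.uniform()
        if y*v<k*np.exp(-.5*x*x): # acceptance test
            w=np.random.uniform()
            if w<.5:
                return(x)
            else:
                return(-x)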

In [None]:
N=100000
L=[rejection_method() for n in range(N)]
len([x for x in L if np.abs(x)<1])/N

In [None]:
import time
import numpy as np

N=100000
M=50000
process_time_start = time.process_time()
L=[x for n in range(M) for x in polar_method()]
process_time_end=time.process_time()
timediff1=process_time_end-process_time_start
print("process_time time difference = "+str(timediff1))

process_time_start = time.process_time()
L=[rejection_method() for n in range(N)]
process_time_end=time.process_time()
timediff2=process_time_end-process_time_start
print("process_time time difference = "+str(timediff2))