### NumPy Exercise Solutions


#### 1. Random walk simulation

Simulate a 1000-step random walk in which a person starting from 0 takes one step to the right (+1) or to the left (-1) with equal probability.

Find the maximum distance from the origin reached during the walk. Carry out 100 trials of this process and find the mean of the maximum distances over all trials.

Useful functions in numpy: random.randint, cumsum, where, abs

In [1]:
import numpy as np
steps = 1000
num_trials = 100

def trial(steps):
    # Generate the sequence of steps, each -1 or +1 with equal probability
    step_sequence = np.random.choice([-1,1],steps)
    # The position after each step is the cumulative sum of the steps so far
    positions = step_sequence.cumsum()
    # Maximum distance from the origin reached at any point in the walk
    max_dist = np.max(np.abs(positions))
    return max_dist

trial_result = [trial(steps) for _ in range(num_trials)] #max distance in each trial

In [2]:
print("Mean of the max distance of all trials",np.mean(trial_result))

Mean of the max distance of all trials 40.48
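
The same simulation can also be run without a Python-level loop by generating all trials at once as a 2-D array. The following is a minimal sketch, not part of the original solution; the helper name `mean_max_distance` and its `seed` parameter are assumptions introduced here for illustration.

```python
import numpy as np

# Hypothetical helper (not in the original solution): simulates all trials at once.
def mean_max_distance(steps=1000, num_trials=100, seed=None):
    rng = np.random.default_rng(seed)
    # One row per trial, one column per step; each entry is -1 or +1.
    step_matrix = rng.choice([-1, 1], size=(num_trials, steps))
    # Position after each step, accumulated along each row independently.
    positions = step_matrix.cumsum(axis=1)
    # Largest distance from the origin reached in each trial.
    max_dist = np.abs(positions).max(axis=1)
    return max_dist.mean()

print("Mean of the max distance of all trials", mean_max_distance())
```

Passing a fixed `seed` makes the result reproducible; without it, the mean varies from run to run just as in the loop-based version.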


#### 2. Sides of a triangle

Consider a stick of length 10 units.
Write a program with two functions: one that randomly selects two points on the stick at which to break it, and another that determines whether the resulting pieces can form the sides of a triangle. Carry out 100 trials and report the estimated probability that the pieces can form a triangle.

In [3]:
import numpy as np

# This problem is scale invariant.
# Hence, without loss of generality, we can assume a unit-length stick.
def stick_pieces():
    # Pick two break points uniformly at random and order them so that x <= y
    x,y=np.random.rand(2)
    if x>y:x,y=y,x
    return x,y-x,1-y

def triangle_check(x,y,z):
    # Each side must be shorter than the sum of the other two
    return (x+y>z and y+z>x and z+x>y)

trials = 100
result = [triangle_check(*stick_pieces()) for _ in range(trials)]
print("Probability:",sum(result)/trials)

Probability: 0.15
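
With only 100 trials the estimate is quite noisy. For two break points chosen uniformly at random, the exact probability is 1/4, and a larger number of trials should land close to that value. The sketch below is one possible vectorized check, not part of the original solution; the function name `triangle_probability` and its parameters are assumptions made for illustration. It relies on the fact that, for a unit stick, the three triangle inequalities hold exactly when every piece is shorter than 1/2.

```python
import numpy as np

# Hypothetical helper (not in the original solution): vectorized estimate.
def triangle_probability(trials=100_000, seed=None):
    rng = np.random.default_rng(seed)
    # Two break points per trial on a unit stick, sorted within each row.
    cuts = np.sort(rng.random((trials, 2)), axis=1)
    # Lengths of the three pieces: left, middle, right.
    pieces = np.column_stack((cuts[:, 0],
                              cuts[:, 1] - cuts[:, 0],
                              1 - cuts[:, 1]))
    # The pieces sum to 1, so x + y > z is equivalent to z < 1/2 (and likewise
    # for the other two inequalities): the longest piece must be under 1/2.
    return np.mean(pieces.max(axis=1) < 0.5)

print("Probability:", triangle_probability())
```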


#### 3. Random number generation

Generate 10 random numbers in the interval [0,1] and obtain an array $X$. Generate another array $Y$ such that $Y[i]$ is 1 if $X[i]\ge 0.5$ and 0 otherwise. Compare the two arrays by stacking them. In other words, create a 2x10 matrix and print it.

In [4]:
import numpy as np
X = np.random.rand(10)

# Y[i] is 1 where X[i] >= 0.5 and 0 otherwise
Y = np.where(X>=0.5,1,0)
# Stack X and Y as the rows of a 2x10 matrix
print(np.vstack((X,Y)))

[[0.0557821  0.8875212  0.96943179 0.12478569 0.23387767 0.54571418
  0.42247957 0.46936522 0.08743738 0.90397887]
 [0.         1.         1.         0.         0.         1.
  0.         0.         0.         1.        ]]
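
The second row prints as `0.` and `1.` rather than `0` and `1` because stacking a float array with an integer array upcasts the result to a common float type. The short sketch below illustrates this, along with an alternative way of building `Y` by casting the boolean comparison directly; it is an illustration only, and the name `stacked` is introduced here.

```python
import numpy as np

X = np.random.rand(10)

# The comparison yields a boolean array; casting to int maps True -> 1, False -> 0.
Y = (X >= 0.5).astype(int)

# vstack needs a single dtype for the result, so the integer row becomes float.
stacked = np.vstack((X, Y))
print(stacked.shape)
print(stacked.dtype)
```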


#### 4. Pearson's correlation coefficient

Given two arrays $X,Y$, correlation measures the strength of the linear relationship between their values. For example, the heights and weights of a group of people are correlated. Implement the formula below for Pearson's correlation coefficient and verify the result using numpy's corrcoef function.

The coefficient is given by $r = \frac{\sum(X-X_{mean})(Y-Y_{mean})}{\sqrt{\sum(X-X_{mean})^2\sum(Y-Y_{mean})^2}}$.


In [5]:
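X = np.random.randint(0,10,20)
Y = X + np.random.rand(20)

X_mean = np.mean(X)
Y_mean = np.mean(Y)

# Numerator: sum of the products of the deviations from the means
numerator = np.sum((X-X_mean)*(Y-Y_mean))
# Denominator: square root of the product of the summed squared deviations
denominator = np.sqrt(np.sum((X-X_mean)**2) * np.sum((Y-Y_mean)**2))
pcc = numerator/denominator
print(pcc)
# corrcoef returns the 2x2 correlation matrix; its off-diagonal entries should match pcc
print(np.corrcoef(X,Y))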

0.9936573433093854
[[1.         0.99365734]
 [0.99365734 1.        ]]
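
The formula can be wrapped in a small reusable function and compared numerically against the off-diagonal entry of the matrix returned by `np.corrcoef`. This is a minimal sketch rather than the original solution; the function name `pearson` is an assumption, and it does not guard against constant inputs, for which the denominator is zero.

```python
import numpy as np

# Hypothetical helper (not in the original solution).
def pearson(x, y):
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Deviations of each value from its array's mean.
    dx = x - x.mean()
    dy = y - y.mean()
    # Sum of products over the square root of the product of summed squares.
    return np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

rng = np.random.default_rng(0)
X = rng.integers(0, 10, 20)
Y = X + rng.random(20)

# corrcoef returns a 2x2 matrix; entry [0, 1] is the correlation between X and Y.
print(np.isclose(pearson(X, Y), np.corrcoef(X, Y)[0, 1]))
```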


#### 5. Mean & standard deviation of crude oil prices
Read the prices of the two varieties of crude oil from the file <code>spot_prices_crude</code>. Find the mean & the standard deviation of the price of each variety.

Examine the contents of the csv file: there are some dates for which the price is not known, and the mean calculation should handle such cases.

Useful functions in numpy: isnan, genfromtxt, nan_to_num, mean, std

In [6]:
import numpy as np
data = np.genfromtxt('./misc/spot_prices_crude.csv',delimiter=";")
# Drop the first row and the first column, keeping only the two price columns
data=data[1:,1:]

# Missing prices are read as NaN; keep only the known prices in each column
crude1_data = data[~np.isnan(data[:,0]),0]
crude2_data = data[~np.isnan(data[:,1]),1]

print("Crude1_mean:",np.mean(crude1_data),",Crude2_mean:",np.mean(crude2_data))
print("Crude1_SD:",np.std(crude1_data),",Crude2_SD:",np.std(crude2_data))

Crude1_mean: 45.91709681528662 ,Crude2_mean: 43.68011373260738
Crude1_SD: 33.14636993225907 ,Crude2_SD: 29.61836508539748
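
NumPy also provides `nanmean` and `nanstd`, which skip NaN entries directly and so avoid the explicit masking step. The sketch below uses a small made-up array rather than the actual CSV file, so the numbers are purely illustrative.

```python
import numpy as np

# Made-up prices for illustration only; NaN marks a missing value.
prices = np.array([[10.0, 20.0],
                   [np.nan, 22.0],
                   [14.0, np.nan],
                   [12.0, 24.0]])

# Column-wise statistics that ignore NaN entries.
means = np.nanmean(prices, axis=0)
stds = np.nanstd(prices, axis=0)

# Same result as masking out the NaN values of a column by hand.
col0 = prices[~np.isnan(prices[:, 0]), 0]
print(means, stds)
print(np.isclose(means[0], col0.mean()), np.isclose(stds[0], col0.std()))
```

Like `np.std`, `np.nanstd` computes the population standard deviation by default (`ddof=0`).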
