
---

### Repeated operations


<div class="alert alert-block alert-info">
    
1. Create two vectors called ``vec1`` and ``vec2`` that each contain 50,000 random draws from the integer numbers between 1 and 100. Make sure the "random" operation produces the same results when the code is re-evaluated.
</div>

In [9]:
# Load all important packages for this assignment:
import pandas as pd
import numpy as np
import time

In [10]:
# I use np.random.seed() to produce the same results when we rerun the code:
np.random.seed(123)

# Now I create two vectors, each containing 50,000 random integers.
# Note: the upper bound of np.random.randint is exclusive, so this call draws
# from 1 to 99; an upper bound of 101 would be needed to include 100.
vec1 = np.random.randint(1, 100, size=50000)
vec2 = np.random.randint(1, 100, size=50000)

# The first nine elements of each vector look like this:
print(vec1[0:9])
print(vec2[0:9])

[67 93 99 18 84 58 87 98 97]
[87 40 51 78 16 52 74 78 40]
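
As a side note on reproducibility, the same idea can be sketched with NumPy's newer `Generator` interface. This is not the approach used in the cell above, only an alternative under the assumption that NumPy 1.17 or later is installed; the names `rng`, `vec1_alt`, `vec2_alt`, `rng_check` and `vec1_check` are made up for this illustration. Because a different generator is used, the drawn numbers will differ from the output shown above.

```python
import numpy as np

# Assumption: NumPy >= 1.17, which provides np.random.default_rng.
# A seeded generator object keeps the random state local instead of global.
rng = np.random.default_rng(123)

# endpoint=True makes the upper bound inclusive, so the value 100 can be drawn.
vec1_alt = rng.integers(1, 100, size=50_000, endpoint=True)
vec2_alt = rng.integers(1, 100, size=50_000, endpoint=True)

# Re-creating the generator with the same seed reproduces the same first draw.
rng_check = np.random.default_rng(123)
vec1_check = rng_check.integers(1, 100, size=50_000, endpoint=True)

print(np.array_equal(vec1_alt, vec1_check))
```

The final comparison is one simple way to check that re-running the code with the same seed yields identical vectors.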


Consider the following ``for`` loop in R, which computes the element-wise difference between ``vec1`` and ``vec2``, squares it, and then takes the square root, i.e. it computes the absolute difference. The results are stored in the vector ``absoluteDifferences``.

```{r}
# Initialize an empty object to store the absolute differences:
absoluteDifferences <- NULL

# Iterate over the elements, find the absolute difference, and append the
# value to the object 'absoluteDifferences':
for (i in 1:length(vec1)){
  absoluteDifferences <- c(absoluteDifferences, sqrt((vec1[i] - vec2[i])^2))
}
```

This operation is inefficient in several respects.


<div class="alert alert-block alert-info">
    
3. Try and find one solution in Python that speeds up the operation. Explain the intuition behind this solution and why it leads to an efficiency gain. Track speed and compare with your timings from task 2 in R!
</div>

In [16]:
# I first time the loop to show how slowly an explicit loop computes the task
# compared with an apply function or a vectorized function.

# Preallocate a list to store the absolute differences:
absoluteDifferences = [None] * len(vec1)

start_time1 = time.time()

# Iterate over the elements, find the absolute difference, and store the value
# at position i of the object 'absoluteDifferences':

for i in range(len(vec1)):
    absoluteDifferences[i] = np.sqrt((vec1[i] - vec2[i])**2)

print(round(time.time() - start_time1, 5))


# Optimization approach using vectorized functions:

start_time2 = time.time()

vec_abs_diff = np.sqrt((vec1 - vec2)**2)

print(round(time.time() - start_time2, 5))


# Vectorized functions have the advantage that Python does not have to figure out
# the data type of every single element. In the loop approach, Python has to check
# on each iteration which type the i-th element has and dispatch the matching
# operation. A NumPy array, in contrast, holds elements of one data type, so the
# vectorized call resolves the type once and processes the whole array in compiled
# code, which saves a lot of processing.

# Compared to R, there is no real time difference between the two vectorized
# approaches. For the loop approach, however, Python has a time advantage. Note
# that the Python loop writes into a preallocated list, whereas the R loop grows
# the vector with c() on every iteration, so part of this advantage may come from
# the preallocation rather than from Python running loops faster than R.

0.09097
0.001
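
Two possible refinements of the vectorized approach can be sketched as follows. This is only an illustration under stated assumptions, not part of the timings reported above: it regenerates stand-in vectors as in the earlier cell, uses `np.abs` in place of squaring and taking the square root, and uses the standard-library `timeit` module because a single `time.time()` measurement of such a short operation is close to the timer's resolution. The names `abs_diff`, `same`, `runs` and `per_call` are made up for this example.

```python
import timeit

import numpy as np

# Stand-in data, generated as in the earlier cell.
np.random.seed(123)
vec1 = np.random.randint(1, 100, size=50000)
vec2 = np.random.randint(1, 100, size=50000)

# np.abs gives the absolute difference directly, without the detour
# through squaring and taking the square root.
abs_diff = np.abs(vec1 - vec2)

# Check that both formulations yield the same values.
same = np.array_equal(abs_diff, np.sqrt((vec1 - vec2) ** 2))
print(same)

# Run the operation 100 times per measurement, repeat the measurement
# 5 times, and keep the fastest one to reduce noise.
runs = timeit.repeat(lambda: np.abs(vec1 - vec2), number=100, repeat=5)
per_call = min(runs) / 100
print(f"{per_call:.6f} seconds per call")
```

Taking the minimum over several repetitions is a common convention for micro-benchmarks, since slower runs mostly reflect interference from other processes.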
