# Pre-requisites quiz

The course assumes that you are comfortable using Python and familiar with basic mathematics, statistics and the use of `numpy`.

For example, you should be able to complete the following quiz.

## Basic Python

1. Generate two collections of 1,000 random 3-character words (ASCII lowercase, only use a-z). 

In [None]:
import numpy as np

In [None]:
import string

In [None]:
s1 = [''.join(s) for s in np.random.choice(list(string.ascii_lowercase), (1000, 3))]
s2 = [''.join(s) for s in np.random.choice(list(string.ascii_lowercase), (1000, 3))]
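
A minimal sketch of the same step using NumPy's newer `Generator` interface, assuming NumPy 1.17 or later where `np.random.default_rng` is available. The helper name `random_words` is hypothetical. Seeding the generator makes both collections reproducible.

```python
import string

import numpy as np

def random_words(rng, n=1000, length=3):
    """Return n random lowercase words of the given length (hypothetical helper)."""
    letters = np.array(list(string.ascii_lowercase))
    # Draw an (n, length) array of letters, then join each row into one word
    draws = rng.choice(letters, size=(n, length))
    return [''.join(row) for row in draws]

rng = np.random.default_rng(42)
w1 = random_words(rng)
w2 = random_words(rng)
```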

2. Count the number of unique words found in both collections.

In [None]:
len(set(s1).intersection(set(s2)))
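
Set intersection can also be written with the `&` operator. A small sketch on toy lists, where a word repeated within one list is counted once because sets discard duplicates:

```python
a = ['abc', 'abc', 'xyz', 'pqr']
b = ['xyz', 'abc', 'zzz']

# Converting to sets removes duplicates; & keeps words present in both
common = set(a) & set(b)
print(len(common))  # 2
```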

3. Find the most frequently occurring word(s) in the combined collection.

In [None]:
from collections import Counter

In [None]:
c = Counter(s1 + s2)

In [None]:
n = c.most_common(1)[0][1]
n

In [None]:
[k for k, v in c.items() if v == n]  # all words tied for the highest count
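
To return every word tied for the highest count without hard-coding a number, the maximum can be read off the `Counter` first. A minimal sketch; the function name `most_frequent` is illustrative only:

```python
from collections import Counter

def most_frequent(words):
    """Return all words that share the highest count, in sorted order."""
    counts = Counter(words)
    if not counts:
        return []
    top = max(counts.values())
    return sorted(w for w, c in counts.items() if c == top)

print(most_frequent(['abc', 'xyz', 'abc', 'qqq', 'xyz']))  # ['abc', 'xyz']
```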

4. Count the number of words in the first collection that consist entirely of vowels.

In [None]:
import re

In [None]:
# Words are 3 letters and separated by newlines, so each match is exactly one all-vowel word
len(re.findall(r'[aeiou]{3}', '\n'.join(s1)))
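
The same count can be computed without regular expressions by checking that the set of letters in each word is a subset of the vowels. A minimal sketch on a toy list; `count_all_vowel_words` is a hypothetical name:

```python
VOWELS = set('aeiou')

def count_all_vowel_words(words):
    """Count words whose characters are all vowels."""
    # True counts as 1 when summed
    return sum(set(w) <= VOWELS for w in words)

print(count_all_vowel_words(['aei', 'abc', 'uuu', 'oia', 'xyz']))  # 3
```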

5. Write a function that takes as input a list of words and builds a new string, starting from the first word, according to the following rule:

If the next word begins with the same character that the current string ends with, form a new "first" word by appending the next word to it; otherwise discard the next word. Return the final string. Test this on collections 1 and 2.

For example, given the input
```python
['abc', 'def', 'cde', 'def', 'efg']
```

the function returns `abccdeefg`.

In [None]:
def make_long_string(words):
    """
    This function concatenates words that end and begin with the same character.
    
    Input: Collection of words
    Output: A single word formed by concatenating words using the same character rule.
    """
    s = words[0]
    for word in words[1:]:
        if s[-1] == word[0]:
            s += word
    return s

In [None]:
make_long_string(['abc', 'def', 'cde', 'def', 'efg'])

In [None]:
make_long_string(s1)

In [None]:
make_long_string(s2)
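
The same rule can also be expressed as a left fold with `functools.reduce`, carrying the growing string as the accumulator. A sketch that should match the loop-based `make_long_string` for non-empty input; the name `make_long_string_reduce` is hypothetical:

```python
from functools import reduce

def make_long_string_reduce(words):
    """Fold over words, appending each one whose first letter matches the current last letter."""
    if not words:
        return ''
    return reduce(lambda s, w: s + w if s[-1] == w[0] else s, words[1:], words[0])

print(make_long_string_reduce(['abc', 'def', 'cde', 'def', 'efg']))  # abccdeefg
```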

## Using `numpy`

Only use `numpy` to complete the following exercises.

1. Set the random seed in `numpy` to 123

In [None]:
np.random.seed(123)
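
Seeding fixes the sequence of draws from the global generator, so re-seeding replays the same numbers. A small check, with the `Generator` equivalent shown alongside (assuming NumPy 1.17 or later). The two interfaces use different bit generators, so the same seed does not give the same numbers across them.

```python
import numpy as np

# Legacy global state: re-seeding replays the same draws
np.random.seed(123)
a = np.random.normal(10, 5, 3)
np.random.seed(123)
b = np.random.normal(10, 5, 3)

# Generator interface: each generator carries its own state
c = np.random.default_rng(123).normal(10, 5, 3)
d = np.random.default_rng(123).normal(10, 5, 3)

print(np.array_equal(a, b), np.array_equal(c, d))  # True True
```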

2. Create $X_1$, a 10 $\times$ 5 matrix of numbers from a $N(μ=10, σ=5)$ distribution

In [None]:
X1 = np.random.normal(10, 5, (10, 5))
X1

3. Create $X_2$ by scaling the rows of $X_1$ so that they have zero mean and unit standard deviation

In [None]:
X2 = (X1 - X1.mean(axis=1)[:, np.newaxis])/X1.std(axis=1)[:, np.newaxis]
X2
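
Passing `keepdims=True` keeps the reduced axis, so the row statistics broadcast without `np.newaxis`. A sketch that also verifies the result, using a freshly drawn matrix (not the quiz's $X_1$) so it runs on its own and assuming NumPy 1.17 or later for `default_rng`:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(10, 5, (10, 5))

# Row means and standard deviations have shape (10, 1) and broadcast across columns
Z = (A - A.mean(axis=1, keepdims=True)) / A.std(axis=1, keepdims=True)

print(np.allclose(Z.mean(axis=1), 0), np.allclose(Z.std(axis=1), 1))  # True True
```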

4. Create $X_3$ by extracting the odd-indexed rows of $X_1$ (zero-based indices 1, 3, 5, 7, 9)

In [None]:
X3 = X1[1::2, :]  # every second row, starting from index 1
X3
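
Basic slicing with a step, `A[1::2]`, selects the same rows as indexing with `range(1, n, 2)`, but slicing returns a view onto the original array while integer-array indexing returns a copy. A small demonstration:

```python
import numpy as np

A = np.arange(20).reshape(10, 2)

sliced = A[1::2]               # basic slicing: a view onto A
fancy = A[range(1, 10, 2), :]  # integer-array indexing: a copy

print(np.array_equal(sliced, fancy))                            # True
print(np.shares_memory(A, sliced), np.shares_memory(A, fancy))  # True False
```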

5. Create $X_4$ by scaling the columns of $X_3$ so that each column sums to 1

In [None]:
X4 = X3 / X3.sum(axis=0)
X4
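
Dividing by `X3.sum(axis=0)` relies on broadcasting: the column sums form a 1-D array with one entry per column, which NumPy stretches across every row. A quick check on a small hand-written matrix:

```python
import numpy as np

B = np.array([[1.0, 2.0, 3.0],
              [3.0, 2.0, 1.0]])

col_sums = B.sum(axis=0)   # shape (3,): one sum per column
C = B / col_sums           # each column is divided by its own sum

print(col_sums.shape)      # (3,)
print(C.sum(axis=0))       # [1. 1. 1.]
```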

6. What is the eigenvector of $X_4$ with eigenvalue 1? Hint: be careful when checking equality for floats.

In [None]:
λ, v = np.linalg.eig(X4)  # X4 is already square (5 x 5)
λ

In [None]:
v[:, np.isclose(λ, 1)]
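
Why an eigenvalue of 1 must exist: if every column of $X_4$ sums to 1, then $\mathbf{1}^T X_4 = \mathbf{1}^T$, so 1 is an eigenvalue of $X_4^T$, and a matrix shares its eigenvalues with its transpose. A sketch that checks this on a freshly drawn column-normalized matrix with positive entries (not the quiz's $X_4$). It picks the eigenvalue nearest 1 with `argmin` instead of filtering with `np.isclose`, which guarantees exactly one column is returned.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.uniform(1, 2, (5, 5))
M = M / M.sum(axis=0)                # every column now sums to 1

ones = np.ones(5)
print(np.allclose(ones @ M, ones))   # True: ones is a left eigenvector

lam, vecs = np.linalg.eig(M)
k = np.argmin(np.abs(lam - 1))       # index of the eigenvalue closest to 1
v = vecs[:, k]
print(np.isclose(lam[k], 1), np.allclose(M @ v, v))  # True True
```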

7. Create $X_5$ by replacing all negative values in $X_2$ with 0

In [None]:
X5 = np.where(X2 < 0, 0, X2)
X5
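
`np.maximum` and `np.clip` give the same result as `np.where` for this task. A small comparison on a hand-written matrix:

```python
import numpy as np

Z = np.array([[-1.5, 0.0, 2.0],
              [0.3, -0.2, -4.0]])

a = np.where(Z < 0, 0, Z)
b = np.maximum(Z, 0)        # elementwise maximum against the scalar 0
c = np.clip(Z, 0, None)     # lower bound 0, no upper bound

print(np.array_equal(a, b) and np.array_equal(b, c))  # True
```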

8. Print the matrix $X_1$ such that each value has only 3 significant digits

In [None]:
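# precision=3 would fix 3 decimal places; the 'g' format gives 3 significant digits
with np.printoptions(formatter={'float_kind': '{:.3g}'.format}):
    print(X1)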
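
The `precision` print option counts digits after the decimal point, whereas significant digits depend on the magnitude of each value. Python's `g` format specifier counts significant digits, as these scalar examples show:

```python
x, y = 12.3456, 0.012345

print(f'{x:.3f}', f'{y:.3f}')  # 12.346 0.012  (3 decimal places)
print(f'{x:.3g}', f'{y:.3g}')  # 12.3 0.0123   (3 significant digits)
```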

9. Suppose you are given observations $y = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)^T$. Find the least squares solution to $X_1 \beta = y$.

In [None]:
y = np.arange(1, 11).reshape(-1,1)

In [None]:
β, r, rk, s = np.linalg.lstsq(X1, y, rcond=None)  # solution, residuals, rank, singular values
β
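
When $X_1$ has full column rank, the least squares solution also satisfies the normal equations $X_1^T X_1 \beta = X_1^T y$. A sketch that cross-checks `lstsq` against them on freshly drawn data (not the quiz's $X_1$). Solving the normal equations directly squares the condition number, so it is shown only as a check, not as a replacement for `lstsq`.

```python
import numpy as np

rng = np.random.default_rng(123)
X = rng.normal(10, 5, (10, 5))
y = np.arange(1, 11).reshape(-1, 1)

beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
# Normal equations: (X^T X) beta = X^T y
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

print(np.allclose(beta_lstsq, beta_normal))  # True
```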

10. What is the vector in the column space of $X_1$ closest to $y$?

In [None]:
X1 @ β
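
The fitted vector $X_1\beta$ is the orthogonal projection of $y$ onto the column space of $X_1$, so the residual $y - X_1\beta$ should be orthogonal to every column of $X_1$. A sketch of that check on freshly drawn data (not the quiz's $X_1$):

```python
import numpy as np

rng = np.random.default_rng(123)
X = rng.normal(10, 5, (10, 5))
y = np.arange(1, 11).reshape(-1, 1)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta              # projection of y onto the column space of X
residual = y - y_hat

# X^T r should vanish: the residual is orthogonal to every column of X
print(np.allclose(X.T @ residual, 0))  # True
```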