<div class="clearfix" style="padding: 10px; padding-left: 0px">
<a href="http://bombora.com"><img src="https://app.box.com/shared/static/e0j9v1xjmubit0inthhgv3llwnoansjp.png" width="200px" class="pull-right" style="display: inline-block; margin: 5px; vertical-align: middle;"></a>
<h1> Bombora Data Science: <br> *Interview Exam* </h1>
</div>

<img width="200px" src="https://app.box.com/shared/static/15slg1mvjd1zldbg3xkj9picjkmhzpa5.png">

---
# Welcome

Welcome! This notebook contains interview exam questions referenced in the *Instructions* section in the `README.md`—please read that first, *before* attempting to answer questions here.

<div class="alert alert-info" role="alert" style="margin: 10px">
<p style="font-weight:bold">ADVICE</p>
<p>*Do not* read these questions, and panic, *before* reading the instructions in `README.md`.</p>
</div>

<div class="alert alert-warning" role="alert" style="margin: 10px">
<p style="font-weight:bold">WARNING</p>

<p>If you are using <a href="https://try.jupyter.org">try.jupyter.org</a>, do not rely on the server for anything you want to last: your server will be <span style="font-weight:bold">deleted after 10 minutes of inactivity</span>. Save often and remember to download the notebook when you step away (you can always re-upload it and start again)!</p>
</div>


## Have fun!

Regardless of outcome, getting to know you is important. Give it your best shot and we'll look forward to following up!

# Exam Questions

## 1. Algo + Data Structures

### Q 1.1: Fibonacci
![fib image](https://upload.wikimedia.org/wikipedia/commons/thumb/9/93/Fibonacci_spiral_34.svg/200px-Fibonacci_spiral_34.svg.png)

#### Q 1.1.1
Given $n$ where $n \in \mathbb{N}$ (i.e., $n$ is an integer and $n > 0$), write a function `fibonacci(n)` that computes the Fibonacci number $F_n$, where $F_n$ is defined by the recurrence relation:

$$ F_n = F_{n-1} + F_{n-2}$$

with initial conditions of:

$$ F_1 = 1,  F_2 = 1$$

In [1]:
def fibonacci(n):
    if n == 1:
        return 1
    elif n == 2:
        return 1
    else:
        return fibonacci(n-1)+fibonacci(n-2)

In [2]:
# Some try-outs:
print('F2:', fibonacci(2))
print('F5:', fibonacci(5))
print('F8:', fibonacci(8))

F2: 1
F5: 5
F8: 21


#### Q 1.1.2
What's the complexity of your implementation?

The running time satisfies $T(n) = T(n-1) + T(n-2) + O(1)$. This recurrence has the same form as the Fibonacci recurrence itself, $f(n) = f(n-1) + f(n-2)$, so $T(n)$ grows at the same rate as $F_n$: $T(n) = O(\varphi^n)$, where $\varphi = (1+\sqrt{5})/2 \approx 1.618$ is the golden ratio.

(The growth rate comes from guess-and-verify: assume a solution of the form $c^n$ and substitute it into the recurrence, which gives $c^n = c^{n-1} + c^{n-2}$, i.e. $c^2 = c + 1$. The positive root of this equation is the golden ratio.)

Space complexity is linear. The function stores no intermediate results, but the recursion stack grows to a depth proportional to $n$ along the chain $F_n, F_{n-1}, \dots, F_2$, so the space used is $O(n)$.
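
To make the exponential growth concrete, the sketch below counts how many calls the naive recursion makes. `count_calls` is a hypothetical helper introduced only for illustration; it mirrors the structure of `fibonacci` above rather than instrumenting it.

```python
def count_calls(n):
    """Number of calls the naive recursion makes to compute F_n (hypothetical helper)."""
    if n <= 2:
        return 1  # base case: a single call with no further recursion
    # one call for this frame, plus every call made by the two sub-problems
    return 1 + count_calls(n - 1) + count_calls(n - 2)


for n in (5, 10, 20):
    print(n, count_calls(n))
```

By induction the count equals $2F_n - 1$, so it grows at the same $\varphi^n$ rate as $F_n$ itself.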

#### Q 1.1.3
Consider an alternative implementation to compute Fibonacci number $F_n$ and write a new function, `fibonacci2(n)`.

We can recast the recurrence as a matrix multiplication:

$$
\left(\begin{array}{l}
F_{n+1} \\
F_{n}
\end{array}\right)=A \cdot\left(\begin{array}{l}
F_{n} \\
F_{n-1}
\end{array}\right)
$$

where $$A = \left(\begin{array}{ll} 1 & 1 \\ 1 & 0 \end{array}\right)$$

In this way, we can express $F_n$, $F_{n-1}$ as

$$
\left(\begin{array}{l}
F_{n} \\
F_{n-1}
\end{array}\right)=A^{n-2} \cdot\left(\begin{array}{l}
F_{2} \\
F_{1}
\end{array}\right)=A^{n-2} \cdot\left(\begin{array}{l}
1 \\
1
\end{array}\right)
$$

In [3]:
import numpy as np
from numpy.linalg import matrix_power

def fibonacci2(n):
    if n == 1:
        return 1
    elif n == 2:
        return 1
    else:
        A = np.array([[1, 1], [1, 0]])
        A_n = matrix_power(A, n - 2)  # A^(n-2)
        init_vec = np.array([1, 1])  # (F2, F1)
        out_vec = np.matmul(A_n, init_vec)  # (F_n, F_{n-1})
        return int(out_vec[0])  # plain Python int rather than a NumPy scalar

In [4]:
#Some try-outs to check
print('F2:', fibonacci2(2))
print('F5:', fibonacci2(5))
print('F8:', fibonacci2(8))

F2: 1
F5: 5
F8: 21


#### Q 1.1.4
What's the complexity of your implementation?

The time complexity depends on how NumPy computes the matrix power $A^{n-2}$. Here we analyse one general approach, divide-and-conquer (repeated squaring):

$$
\left(\begin{array}{l}
F_{n} \\
F_{n-1}
\end{array}\right)=\left\{\begin{array}{l}
A^{\frac{n-2}{2}} \cdot A^{\frac{n-2}{2}} \cdot(1,1)^{\top} \quad \text{($n$ even)} \\
A^{\frac{n-3}{2}} \cdot A^{\frac{n-3}{2}} \cdot A \cdot(1,1)^{\top} \quad \text{($n$ odd)}
\end{array}\right.
$$

In this case $T(n) = T(\frac{n}{2}) + O(1)$, treating each $2 \times 2$ matrix product as a constant-time operation, so $T(n) = O(\log n)$.

Space complexity is also $O(\log n)$ for a recursive implementation, by a similar argument: each level of the recursion holds one intermediate power ($A^{n-2}, A^{\frac{n-2}{2}}, A^{\frac{n-2}{4}}$, etc.), and the initial values take $O(1)$.
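
The divide-and-conquer idea can also be written as a loop. The sketch below is a pure-Python illustration, not a description of NumPy's internals; `mat_mul` and `fibonacci_squaring` are hypothetical names. It uses Python's arbitrary-precision integers, which avoids the overflow that fixed-width 64-bit integers hit once $F_n$ exceeds $2^{63}-1$ (around $n > 92$).

```python
def mat_mul(X, Y):
    """Multiply two 2x2 matrices given as nested tuples."""
    return (
        (X[0][0] * Y[0][0] + X[0][1] * Y[1][0], X[0][0] * Y[0][1] + X[0][1] * Y[1][1]),
        (X[1][0] * Y[0][0] + X[1][1] * Y[1][0], X[1][0] * Y[0][1] + X[1][1] * Y[1][1]),
    )


def fibonacci_squaring(n):
    """F_n via iterative repeated squaring of A = [[1, 1], [1, 0]]."""
    if n <= 2:
        return 1
    result = ((1, 0), (0, 1))  # identity matrix
    base = ((1, 1), (1, 0))    # A
    power = n - 2
    while power > 0:
        if power & 1:                # current binary digit of the exponent is 1
            result = mat_mul(result, base)
        base = mat_mul(base, base)   # square for the next binary digit
        power >>= 1
    # (F_n, F_{n-1})^T = A^(n-2) (1, 1)^T, so F_n is the sum of the first row
    return result[0][0] + result[0][1]


print(fibonacci_squaring(8), fibonacci_squaring(100))
```

The loop runs once per binary digit of $n - 2$, so it performs $O(\log n)$ matrix products while keeping only two matrices in memory.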

#### Q 1.1.5
What are some examples of optimizations that could improve computational performance?


In machine learning, examples include SGD (stochastic gradient descent), which is typically much cheaper per iteration than full-batch gradient descent; coordinate descent, which is efficient for sparse problems such as Lasso regression; and subgradient descent, which handles objectives whose gradient does not exist everywhere.

### Q 1.2: Linked List
![ll img](https://upload.wikimedia.org/wikipedia/commons/thumb/6/6d/Singly-linked-list.svg/500px-Singly-linked-list.svg.png)

#### Q 1.2.1
Consider a [singly linked list](https://en.wikipedia.org/wiki/Linked_list), $L$. Write a function `is_palindrome(L)` that detects if $L$ is a [palindrome](https://en.wikipedia.org/wiki/Palindrome), by returning a bool, `True` or `False`.


#### Q 1.2.2
What is the complexity of your implementation?

#### Q 1.2.3
Consider an alternative implementation to detect if L is a palindrome and write a new function, `is_palindrome2(L)`.

#### Q 1.2.4
What's the complexity of this implementation?


#### Q 1.2.5 
What are some examples of optimizations that could improve computational performance?


## 2. Prob + Stats

### Q 2.1: Finding $\pi$ in a random uniform?
<img src=https://www.epicurus.com/food/recipes/wp-content/uploads/2015/03/Pi-Day.jpg width="480">

Given a uniform random generator on $[0,1)$ (e.g., use your language's standard library to generate random values), write a function `compute_pi` to compute [$\pi$](https://en.wikipedia.org/wiki/Pi).

### Q 2.2: Making a 6-side die roll a 7?

Using a single 6-sided die, how can you generate a random number between 1 and 7?

### Q 2.3: Is normality uniform?

<img src=https://rednaxela1618.files.wordpress.com/2014/06/uniformnormal.png width="480">


Given draws from a normal distribution with known parameters, how can you simulate draws from a uniform distribution?

**Answer:** <br>
The key idea is to set the CDFs of the two random variables equal to each other. We can set the problem up as follows: <br>

Let $Z$ be a random variable from the normal distribution $N(\mu, \sigma^2)$, with drawn value $z$, and let $U$ be the random variable we want to simulate, which follows Uniform$[a, b]$. Let $\Phi$ denote the CDF of the standard normal (it can be obtained from a Z table: given a Z score, it returns the percentile).

Then $\frac{Z - \mu}{\sigma}$ follows a standard normal, so we set $\Phi(\frac{z - \mu}{\sigma}) = F_U(u) = \frac{u-a}{b-a}$, for $a<u<b$.

Solving for $u$ gives $u = (b-a) \cdot \Phi(\frac{z-\mu}{\sigma}) + a$, and that is the value for our simulation.
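
The transformation above can be sketched with the standard library alone, writing $\Phi$ in terms of the error function. The names `normal_cdf` and `normal_to_uniform` and the example parameters ($\mu = 5$, $\sigma = 2$) are assumptions chosen for illustration.

```python
import math
import random


def normal_cdf(x):
    """Standard normal CDF, expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_to_uniform(z, mu, sigma, a=0.0, b=1.0):
    """Map a draw z from N(mu, sigma^2) to a draw from Uniform[a, b]."""
    return a + (b - a) * normal_cdf((z - mu) / sigma)


random.seed(0)  # fixed seed so the run is repeatable
draws = [normal_to_uniform(random.gauss(5.0, 2.0), 5.0, 2.0) for _ in range(10_000)]
print(min(draws), max(draws), sum(draws) / len(draws))
```

If the mapping is right, the draws stay inside $[0, 1]$ and their sample mean sits close to $0.5$.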

### Q 2.4: Should you pay or should you go?

![coin flip](https://lh5.ggpht.com/iwD6MnHeHVAXNBgrO7r4N9MQxxYi6wT9vb0Mqu905zTnNlBciONAA98BqafyjzC06Q=w300)

Let’s say we play a game where I keep flipping a coin until I get heads. If the first time I get heads is on the nth coin, then I pay you $2^{(n-1)}$ US dollars. How much would you pay me to play this game? Explain.

### Q 2.5: Uber vs. Lyft

![uber vs lyft](http://usiaffinity.typepad.com/.a/6a01347fc1cb08970c01bb0876bcbe970d-pi)

You request 2 UberX’s and 3 Lyfts. If the time that each takes to reach you is IID, what is the probability that all the Lyfts arrive first? What is the probability that all the UberX’s arrive first?

### Q 2.6: Pick your prize
<img src=https://miro.medium.com/max/1100/1*m5b3O9sE68UCXjLw5oxy2g.png width="480">

A prize is placed at random behind one of three doors and you are asked to pick a door. To be concrete, say you always pick door 1. Now the game host chooses one of door 2 or 3, opens it and shows you that it is empty. They then give you the option to keep your picked door or switch to the unopened door. Should you stay or switch if you want to maximize your probability of winning the prize?

## 3 Conceptual ML

### Q 3.1 Why study gradient boosting or neural networks?

Consider a regression setting where $X \in \mathbb{R}^p$ and $Y \in \mathbb{R}$. The goal is to come up with a function $f(X): \mathbb{R}^p \rightarrow \mathbb{R}$ that minimizes the squared-error loss $(Y - f(X))^2$. Since X, Y are random variables, we seek to minimize the expectation of the squared error loss as follows
\begin{equation}
EPE(f) = \mathbb{E}\left[(Y-f(X))^2\right]
\end{equation}
where EPE stands for expected prediction error. One can show that minimizing the expected prediction error leads to the following _regression function_
\begin{equation}
f(x) = \mathbb{E}\left[Y|X=x\right]
\end{equation}
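
A short sketch of why this holds, assuming the relevant expectations exist: condition on $X$ and minimize pointwise.
\begin{equation}
EPE(f) = \mathbb{E}_X \, \mathbb{E}_{Y|X}\left[(Y-f(X))^2 \mid X\right]
\end{equation}
For each fixed $x$ it is enough to pick the constant $c = f(x)$ that minimizes $\mathbb{E}\left[(Y-c)^2 \mid X=x\right]$. Setting the derivative with respect to $c$ to zero gives
\begin{equation}
-2\,\mathbb{E}\left[Y-c \mid X=x\right] = 0 \quad \Longrightarrow \quad c = \mathbb{E}\left[Y \mid X=x\right]
\end{equation}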

The goal of any method is to approximate the regression function above, which we denote as $\hat{f}(x)$. For example, linear regression explicitly assumes that the regression function is approximately linear in its arguments, i.e. $\hat{f}(x) = x^T\beta$ while a neural network provides a nonlinear approximation of the regression function. 

The simplest of all these methods is [k-nearest neighbors](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm). Given $x$ and some neighbourhood of $k$ points $N_k(x)$, $\hat{f}(x)$ is simply the average of all $y_i|x_i \in N_k(x)$.  Let $N$ denote the training sample size. Under mild regularity conditions on the joint probability distribution $Pr(X, Y)$, one can show that as $N \rightarrow \infty$, $k \rightarrow \infty$ such that $k/N \rightarrow 0$, then $\hat{f}(x) \rightarrow f(x)$ where $\rightarrow$ means approaches or goes to. In other words, the k-nearest neighbors algorithm converges to the ideal solution as both the training sample size and number of neighbors increase to infinity.
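
As a minimal sketch of the estimator just described, the function below averages the targets of the $k$ closest training points. The name `knn_regress`, the use of Euclidean distance, and the toy data are assumptions for illustration; the text above does not fix a distance metric.

```python
import numpy as np


def knn_regress(x_query, X_train, y_train, k):
    """Average y over the k training points nearest to x_query (Euclidean distance)."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    # distance from the query to every training point
    dists = np.linalg.norm(X_train - np.asarray(x_query, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]  # indices of the k smallest distances
    return y_train[nearest].mean()


# Toy data: y = x^2 sampled at four points
X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [0.0, 1.0, 4.0, 9.0]
print(knn_regress([1.1], X_train, y_train, k=2))
```

With $k = 2$ the query at $x = 1.1$ averages the targets at $x = 1$ and $x = 2$, which shows how coarse the estimate can be when nearby data are sparse.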

Now given this _universal approximator_, why look any further and research other methods? Please share your thoughts.


**Answer:** <br>

KNN is the best only in theory. It converges to the optimal solution as k and N go to infinity, but the convergence may be slow. In practice we only have finite (and sometimes sparse) data, so we may not have any training points near our test point.

Other methods do not require as much data to perform well.

At a high level, the more assumptions we make about the relationship between X and Y, the less data we can get away with using. KNN makes very few assumptions but requires lots of data; other methods (e.g. linear regression) make lots of assumptions and require less data.

### Q 3.2 Model Selection and Assessment

Consider a multiclass classification problem with a large number of features, $p \gg N$, e.g. $p=10000, N=100$. The task is threefold:
1. Find a "good" subset of features that show strong _univariate_ correlation with class labels
2. Using the "good" subset, build a multiclass classifier
3. Estimate the generalization error of the final model

Given this dataset, outline your approach and please be sure to cover the following
- Data splitting
- Model Selection: either estimating the performance of different classifiers or the same classifier with different hyperparameters
- Model Assessment: having chosen a classifier, estimating the generalization error

Assume all features are numerical, the dataset contains no NULLS, outliers, etc. and doesn't require any preprocessing.

