## Quant Analyst Interview

#### 1. Given the “Employee” table below:

| id | Name | Salary | manager_id |
|----|------|--------|------------|
| 1 | John  | 300    | 3 |
| 2 | Mike  | 200    | 3 |
| 3 | Sally | 550    | 4 |
| 4 | Jane  | 500    | 7 |
| 5 | Joe   | 600    | 7 |
| 6 | Dan   | 600    | 3 |
| 7 | Phil  | 550    | NULL |
|...|  ...  |  ...   |...|

  - Give the names of employees whose salaries are greater than their immediate manager’s
  - What is the average salary of employees who do not manage anyone? In the sample above, that would be John, Mike, Joe and Dan, since they do not have anyone reporting to them

We can answer both of these using SQL queries.

> Give the names of employees whose salaries are greater than their immediate manager’s

```
SELECT e2.name
FROM employee e1
JOIN employee e2
ON e1.id = e2.manager_id
WHERE e2.salary > e1.salary;
```

_Reasoning_: We self-join the table, matching each manager's `id` (alias `e1`) to the `manager_id` of the employees who report to them (alias `e2`). The `e2` side holds the employees we are interested in, and the `e1` side holds the details of their managers. The `WHERE` clause then keeps only the employees whose salary is higher than their manager's. For the sample rows, the result is `Dan | Sally | Joe`.

> What is the average salary of employees who do not manage anyone?

```
SELECT AVG(salary)
FROM employee
WHERE id NOT IN (
	SELECT e1.id
	FROM employee e1
	JOIN employee e2
	ON e1.id = e2.manager_id);
```

_Reasoning_: We use a similar approach as before: self-join the table on `id` and `manager_id`, but this time select from the "first" table (`e1`), i.e. the people who are managers. We use that as a subquery to exclude the managers, which leaves the employees who aren't anyone's manager. Averaging their salaries gives `425.00` for the sample rows, i.e. (300 + 200 + 600 + 600) / 4.
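Both answers can be checked against the sample rows with Python's built-in `sqlite3` module. This is a minimal sketch, assuming only the seven rows shown in the table (the `...` row is ignored) and an in-memory database; the second query is written with `NOT EXISTS`, which is an alternative formulation of the same filter rather than the query above.

```python
import sqlite3

# The seven sample rows from the table above: (id, name, salary, manager_id)
rows = [
    (1, "John", 300, 3),
    (2, "Mike", 200, 3),
    (3, "Sally", 550, 4),
    (4, "Jane", 500, 7),
    (5, "Joe", 600, 7),
    (6, "Dan", 600, 3),
    (7, "Phil", 550, None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "id INTEGER PRIMARY KEY, name TEXT, salary INTEGER, manager_id INTEGER)"
)
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)", rows)

# Employees earning more than their immediate manager
higher_paid = [
    r[0]
    for r in conn.execute(
        """
        SELECT e2.name
        FROM employee e1
        JOIN employee e2 ON e1.id = e2.manager_id
        WHERE e2.salary > e1.salary
        ORDER BY e2.id
        """
    )
]

# Average salary of employees that nobody reports to
avg_non_manager = conn.execute(
    """
    SELECT AVG(e.salary)
    FROM employee e
    WHERE NOT EXISTS (SELECT 1 FROM employee r WHERE r.manager_id = e.id)
    """
).fetchone()[0]

print(higher_paid)      # ['Sally', 'Joe', 'Dan']
print(avg_non_manager)  # 425.0
```

Filtering on `id` rather than `name` keeps the result correct even if two employees share a name.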

#### 2. Write a function ‘exists’ which takes a variable symbol v and returns whether v is defined.

In Python, it is quite straightforward to check if a variable exists using an `if` statement. However, when we use a function to do the check, we need to specify whether we are looking for a _local_ or a _global_ variable. From the context of this question, we can assume that we are looking for a global variable. In that case, the `exists()` function can be defined as below. The function returns a boolean `True`/`False` as the result.

_Note_: Since we are checking for a global variable, we need to pass the name of the variable as a string, because the function `globals()` returns a dictionary of all global variables keyed by name. Passing an undefined variable itself would raise a `NameError` before the function is even called.

```
# function to check if a variable exists
def exists(v):
    # globals() is a dict keyed by variable name, so v must be a string;
    # the membership test already evaluates to True or False
    return v in globals()
```

```
exists('v')
```

    False

```
v = 5
exists('v')
```

    True
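If the check should also cover local variables of the calling function, one possible variant looks at the caller's stack frame. This is only a sketch: `exists_in_caller` is a hypothetical name, it relies on `inspect.currentframe()`, which may return `None` on interpreters without stack frame support, and it does not see variables of enclosing functions unless the caller actually references them.

```python
import builtins
import inspect


def exists_in_caller(name):
    """Return True if `name` is a local, global or builtin name at the call site."""
    frame = inspect.currentframe()
    if frame is None or frame.f_back is None:
        raise RuntimeError("stack frames are not available in this interpreter")
    caller = frame.f_back
    try:
        return (
            name in caller.f_locals        # locals of the calling function
            or name in caller.f_globals    # globals of the calling module
            or hasattr(builtins, name)     # builtins such as len or print
        )
    finally:
        # drop the frame references to avoid reference cycles
        del frame, caller


def demo():
    local_value = 1
    return exists_in_caller("local_value"), exists_in_caller("not_defined_anywhere")


print(demo())  # (True, False)
```

Unlike the `globals()` version, this one reports names from the module of whoever calls it, not from the module where the helper is defined.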

#### 3. Create a function to compute N layers of a Pascal Triangle. The first 4 layers will look like:
1

1 1

1 2 1

1 3 3 1

We will write a Python function to create the Pascal Triangle, which takes `N` as input and prints the Pascal Triangle of `N` layers.

```
# function to print Pascal's triangle, one layer per line
def print_pascal_tri(lst):
    for row in lst:
        print(row)
```

```
# function to create Pascal's triangle of N layers
def pascal_triangle(N):
    # list that accumulates the layers
    pascal_tri_list = []

    for i in range(N):
        # numbers in the current layer
        layer = []

        if not pascal_tri_list:  # the first layer is just [1]
            layer.append(1)
        else:
            prev_layer = pascal_tri_list[i-1]

            # the current layer has one more entry than the previous one
            for j in range(len(prev_layer) + 1):
                # the first two passes place the two edge 1s
                if j == 0 or j == 1:
                    layer.append(1)
                else:
                    # insert the sum of the two numbers above, just before the trailing 1
                    layer.insert(j-1, prev_layer[j-2] + prev_layer[j-1])

        # append the layer to the final list
        pascal_tri_list.append(layer)

    # print Pascal's triangle
    print_pascal_tri(pascal_tri_list)
```

```
pascal_triangle(5)
```

    [1]
    [1, 1]
    [1, 2, 1]
    [1, 3, 3, 1]
    [1, 4, 6, 4, 1]


_Reasoning_: The comments should be enough to follow along. Basically, we run a for loop and, at every step, use the previous layer to build the current one. The result is a list of lists, which we print at the end. This is a brute-force way of getting the result; a more efficient approach would take advantage of the fact that Pascal's Triangle is an arrangement of binomial coefficients. The nested for loops and repeated list insertions are neither memory- nor compute-efficient.
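The binomial-coefficient idea can be sketched as follows: entry `j` of layer `i` (both counted from 0) is "i choose j". This assumes Python 3.8 or later for `math.comb`; `pascal_rows` is a hypothetical name, and the function returns the layers instead of printing them.

```python
from math import comb


def pascal_rows(n):
    """Return the first n layers of Pascal's triangle as a list of lists."""
    # layer i has i + 1 entries, and entry j is the binomial coefficient C(i, j)
    return [[comb(i, j) for j in range(i + 1)] for i in range(n)]


for row in pascal_rows(5):
    print(row)
# [1]
# [1, 1]
# [1, 2, 1]
# [1, 3, 3, 1]
# [1, 4, 6, 4, 1]
```

Each layer is computed independently here, so a single layer can be produced without building the ones before it.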

#### 4. Assume you have the following portfolio as of 2016/01/01:

| Ticker | Weight |
|------|---|
|AAPL.O|15%|
|IBM.N |20%|
|GOOG.O|20%|
|BP.N  |15%|
|XOM.N |10%|
|COST.O|15%|
|GS.N  |5% |

  - Using historical daily returns (Yahoo/Google Finance or any other market data source), calculate VaR95% and CVaR95% of the portfolio as of 2016/12/31
  - Using expected mean, covariance matrix and parametric method, calculate VaR95% and CVaR95%
  - Assume you can change weights, allow shorting but no leverage (i.e. sum of weights equal 100%), and rebalance monthly. What is the optimal portfolio holding by the end of each month till the end of 2016?

_Notes_: If you have other assumption(s), please state them clearly
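No answer is given for this question, so the following is only a sketch of the first two calculations under stated assumptions: a 1-day horizon, losses reported as positive numbers, a simple empirical quantile without interpolation for the historical method, and normally distributed returns for the parametric method. The function names are hypothetical, and the daily returns, mean vector and covariance matrix must be supplied from a market data source; none are included here. The monthly rebalancing part is not covered.

```python
from statistics import NormalDist


def historical_var_cvar(portfolio_returns, alpha=0.95):
    """Historical VaR and CVaR from a list of daily portfolio returns."""
    losses = sorted(-r for r in portfolio_returns)  # ascending, loss = -return
    k = min(int(alpha * len(losses)), len(losses) - 1)
    var = losses[k]                   # loss at the alpha quantile
    tail = losses[k:]                 # losses at or beyond VaR
    cvar = sum(tail) / len(tail)      # average loss in the tail
    return var, cvar


def parametric_var_cvar(weights, mean_returns, cov, alpha=0.95):
    """Normal (variance-covariance) VaR and CVaR for the given weights."""
    n = len(weights)
    # portfolio mean: w . mu
    mu_p = sum(w * m for w, m in zip(weights, mean_returns))
    # portfolio variance: w^T Sigma w
    var_p = sum(weights[i] * cov[i][j] * weights[j]
                for i in range(n) for j in range(n))
    sigma_p = var_p ** 0.5
    z = NormalDist().inv_cdf(alpha)   # about 1.645 for alpha = 0.95
    var = z * sigma_p - mu_p
    cvar = sigma_p * NormalDist().pdf(z) / (1 - alpha) - mu_p
    return var, cvar
```

For the portfolio above, `weights` would be `[0.15, 0.20, 0.20, 0.15, 0.10, 0.15, 0.05]` in the same asset order as the mean vector and covariance matrix.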

#### 5. Assume you have a Python project whose source code is under a git repo folder “my-python-project”. Write a program/script to produce the following statistics of this folder:

  - How many Python files there are
  - How many lines of code there are in total, and how many comment lines (empty lines don’t count)
  - How many functions are defined in total
  - How many lines changed between the current version and HEAD~3
  - Total folder size (in MB) for each subfolder (down to 2 levels deep)
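No answer is given for this question, so the following is a minimal sketch using only the standard library. The function names are hypothetical, and several choices are assumptions: a comment line is one whose first non-blank character is `#` (docstrings count as code), functions include methods and nested functions, "current version" is taken to mean `HEAD`, and `git` must be available on the `PATH`.

```python
import ast
import os
import subprocess
from pathlib import Path


def python_stats(root):
    """Count .py files, code lines, comment lines and function definitions."""
    files = code = comments = functions = 0
    for path in Path(root).rglob("*.py"):
        files += 1
        source = path.read_text(encoding="utf-8", errors="ignore")
        for line in source.splitlines():
            stripped = line.strip()
            if not stripped:
                continue  # empty lines are not counted
            if stripped.startswith("#"):
                comments += 1
            else:
                code += 1
        try:
            tree = ast.parse(source)
        except SyntaxError:
            continue  # skip files that do not parse
        functions += sum(
            isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
            for node in ast.walk(tree)
        )
    return {"files": files, "code": code,
            "comments": comments, "functions": functions}


def lines_changed(root, rev="HEAD~3"):
    """Added plus deleted lines between rev and HEAD."""
    out = subprocess.run(
        ["git", "-C", str(root), "diff", "--numstat", rev, "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    total = 0
    for row in out.splitlines():
        added, deleted, _ = row.split("\t", 2)
        if added != "-":  # binary files are reported as "-"
            total += int(added) + int(deleted)
    return total


def folder_sizes_mb(root, max_depth=2):
    """Size in MB of each subfolder of root, down to max_depth levels."""
    root = Path(root)
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        full_paths = (os.path.join(dirpath, f) for f in filenames)
        size = sum(os.path.getsize(p) for p in full_paths if os.path.isfile(p))
        parts = Path(dirpath).relative_to(root).parts
        # credit this folder's files to each ancestor within max_depth
        for depth in range(1, min(len(parts), max_depth) + 1):
            key = os.path.join(*parts[:depth])
            sizes[key] = sizes.get(key, 0) + size
    return {k: v / (1024 * 1024) for k, v in sizes.items()}
```

Calling `python_stats("my-python-project")`, `lines_changed("my-python-project")` and `folder_sizes_mb("my-python-project")` would then produce the five statistics. Note that `folder_sizes_mb` also reports the `.git` folder.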

#### 6. In a text file, give me the total number of appearances of a “date” within the text file. Dates can appear in one or more of the formats shown below:

  - YYYY/MM/DD
  - MM/DD/YYYY
  - DD/MM/YYYY
  - DD (Jan/Feb/Mar/Apr/May/Jun/Jul/Aug/Sept/Oct/Nov/Dec) YYYY
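No answer is given for this question, so the following is one possible sketch using a regular expression. It assumes two-digit days and months, a four-digit year, and a single space around the month name; it validates only the ranges of day and month (not the calendar), and it counts an ambiguous string such as `01/02/2016` once, since MM/DD/YYYY and DD/MM/YYYY cannot be told apart. `count_dates` is a hypothetical name.

```python
import re

# Month abbreviations exactly as listed in the question
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sept|Oct|Nov|Dec"

DATE_RE = re.compile(
    r"(?<![\d/])(?:"                                # not glued to a digit or slash
    r"(?P<y1>\d{4})/(?P<m1>\d{2})/(?P<d1>\d{2})"    # YYYY/MM/DD
    r"|(?P<a>\d{2})/(?P<b>\d{2})/\d{4}"             # MM/DD/YYYY or DD/MM/YYYY
    r"|(?P<d3>\d{2}) (?:" + MONTHS + r") \d{4}"     # DD Mon YYYY
    r")(?![\d/])"
)


def count_dates(text):
    """Count date-like strings in text whose day and month are in range."""
    count = 0
    for m in DATE_RE.finditer(text):
        if m.group("y1"):
            ok = 1 <= int(m.group("m1")) <= 12 and 1 <= int(m.group("d1")) <= 31
        elif m.group("a"):
            a, b = int(m.group("a")), int(m.group("b"))
            # accept either ordering of day and month
            ok = (1 <= a <= 12 and 1 <= b <= 31) or (1 <= a <= 31 and 1 <= b <= 12)
        else:
            ok = 1 <= int(m.group("d3")) <= 31
        count += ok
    return count


sample = (
    "Start 2016/01/31, then 12/31/2016 and 31/12/2016; also 05 Sept 2016. "
    "Not dates: 2016/13/01 or 99/99/2016."
)
print(count_dates(sample))  # 4
```

For a file, the text can be read with `Path(filename).read_text()` from `pathlib` and passed to `count_dates`.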