Various interview questions with answers, grouped by topic.

## Math

### This question was asked by: Postmates

There are four people on the ground floor of a building that has five levels not including the ground floor. They all get into the same elevator.

If each person is equally likely to get off at any floor and they leave independently of each other, what is the probability that no two passengers will get off at the same floor?

Solution:

First, count the total sample space: the number of ways to assign five floors to four different people. Each person can choose any one of five floors, and this happens four times for four people, so there are 5 * 5 * 5 * 5 = 5^4 = 625 possible outcomes.

The number of ways to assign floors to the four people without repeating a floor is 5 * 4 * 3 * 2 = 120, because the first passenger has five options, the second has four, and so on. Note that this count also distinguishes between orderings of the passengers, just as the sample space does.

The probability is then 120/625 = 5/5 * 4/5 * 3/5 * 2/5 = 0.192
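As a sanity check, the small sketch below enumerates all 5^4 floor assignments by brute force and compares the result with the counting argument above. The function name `prob_all_distinct` is illustrative; exact fractions avoid any floating-point doubt.

```python
import math
from fractions import Fraction
from itertools import product

def prob_all_distinct(people=4, floors=5):
    """Enumerate every floor assignment and count those where no floor is repeated."""
    outcomes = list(product(range(floors), repeat=people))
    distinct = sum(1 for chosen in outcomes if len(set(chosen)) == people)
    return Fraction(distinct, len(outcomes))

brute_force = prob_all_distinct()
closed_form = Fraction(math.perm(5, 4), 5 ** 4)  # 5 * 4 * 3 * 2 / 5^4
print(brute_force, float(brute_force), brute_force == closed_form)  # 24/125 0.192 True
```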

### This question was asked by: Google

Given that X and Y are independent, normally distributed random variables with X ~ N(3, 2²) and Y ~ N(1, 2²), what are the mean and variance of the distribution of 2X - Y?

Solution:

Because a linear combination of independent normal random variables is itself a normal random variable, 2X - Y is normal, so we only need its mean and variance. The mean follows from linearity of expectation: substitute the two given means into the expression.

For X and Y, the mean is calculated simply by:

$$E[2X - Y] = 2E[X] - E[Y] = 2(3) - 1 = 5$$

The variance, however, is calculated differently. The variance of aX - bY is:

$$\operatorname{Var}(aX - bY) = a^2 \operatorname{Var}(X) + b^2 \operatorname{Var}(Y) - 2ab \operatorname{Cov}(X, Y)$$

where Cov(X, Y) is the covariance between X and Y. Because X and Y are independent, their covariance is zero. With a = 2, b = 1, and Var(X) = Var(Y) = 2² = 4, we can calculate this out:

$$\operatorname{Var}(2X - Y) = 4\operatorname{Var}(X) + \operatorname{Var}(Y) - 0 = 4 \cdot 4 + 4 = 20$$
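A quick Monte Carlo simulation is one way to check the derivation empirically. This is a sketch, not part of the original answer: the sample size and seed are arbitrary choices, and the estimates should land close to, not exactly on, the theoretical mean 5 and variance 20.

```python
import math
import random

def simulate_2x_minus_y(n=100_000, seed=0):
    """Monte Carlo estimate of the mean and variance of 2X - Y."""
    rng = random.Random(seed)
    # random.gauss(mu, sigma) takes the standard deviation:
    # X ~ N(3, 2^2) and Y ~ N(1, 2^2) both have sigma = 2
    samples = [2 * rng.gauss(3, 2) - rng.gauss(1, 2) for _ in range(n)]
    mean = math.fsum(samples) / n
    variance = math.fsum((s - mean) ** 2 for s in samples) / n
    return mean, variance

mean, variance = simulate_2x_minus_y()
print(f"sample mean {mean:.3f} (theory 5), sample variance {variance:.3f} (theory 20)")
```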

## Programming

### This question was asked by: Microsoft (Hard)

Given an array of words and a maxWidth parameter, format the text such that each line has exactly maxWidth characters.

Pad extra spaces ' ' when necessary so that each line has exactly maxWidth characters.

Extra spaces between words should be distributed as evenly as possible.

If the number of spaces on a line does not divide evenly between words, the empty slots on the left are assigned more spaces than the slots on the right. The last line is left-justified and padded on the right, as in the example.

Example:

```python
words = ["This", "is", "an", "example", "of", "text", "justification."]
max_width = 16

output = [
    "This    is    an",
    "example  of text",
    "justification.  "
]
```

Solution:

Since extra spaces between words should be distributed as evenly as possible, we can use round robin logic. First, pack words onto the current line greedily: iterate over the words and check whether adding the next one would push the line past maxWidth. Once it would, the current line is full, so we distribute its spaces round robin and start a new line.

The following lines implement the round robin logic:

```python
for i in range(maxWidth - num_of_letters):
    cur[i % (len(cur) - 1 or 1)] += ' '
```

Once you determine that only k words fit on a given line, you know their total length, num_of_letters. The rest of the line is spaces, and there are (maxWidth - num_of_letters) of them. Each space is appended to a word other than the last, cycling from left to right, so the leftmost gaps receive any extra spaces.

The "or 1" part handles the edge case len(cur) == 1: with a single word, len(cur) - 1 is 0, and `or 1` avoids a modulo by zero so that every space is appended after that one word.

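```python
def fullJustify(words, maxWidth):
    """Greedily pack words into lines and fully justify every line except the last."""
    res = []
    cur = []              # words on the line being built
    num_of_letters = 0    # total letters (no spaces) in cur

    for w in words:
        # Letters so far + the new word + at least one space per gap would exceed maxWidth
        if num_of_letters + len(w) + len(cur) > maxWidth:
            # Round robin: hand out the remaining spaces to the gaps, left to right
            for i in range(maxWidth - num_of_letters):
                cur[i % (len(cur) - 1 or 1)] += ' '
            res.append(''.join(cur))
            cur, num_of_letters = [], 0
        cur += [w]
        num_of_letters += len(w)
    # The last line is left-justified and padded on the right
    return res + [' '.join(cur).ljust(maxWidth)]
```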
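An equivalent way to size the gaps on a full line, offered as an alternative rather than the original solution, uses `divmod`: every gap gets `base` spaces and the leftmost `extra` gaps get one more. `justify_line` is a hypothetical helper that handles a single, already-packed line; it trades the per-space loop for one arithmetic step per line.

```python
def justify_line(line_words, max_width):
    """Fully justify one already-packed line (hypothetical helper, not the original solution)."""
    letters = sum(len(w) for w in line_words)
    gaps = len(line_words) - 1
    if gaps == 0:
        # One word: all padding goes after it, matching the round-robin version
        return line_words[0].ljust(max_width)
    # Every gap gets `base` spaces; the leftmost `extra` gaps get one more
    base, extra = divmod(max_width - letters, gaps)
    parts = []
    for i, word in enumerate(line_words[:-1]):
        parts.append(word + " " * (base + (1 if i < extra else 0)))
    parts.append(line_words[-1])
    return "".join(parts)

print(repr(justify_line(["example", "of", "text"], 16)))  # 'example  of text'
```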

### This question was asked by: Facebook
There are two lists, list X and list Y. Both lists contain integers from -1000 to 1000 and are identical except that one integer present in list X has been removed from list Y.

Write a function that takes in both lists and returns the integer that was removed in $O(n)$ time and $O(1)$ space without using the Python set function.

Solution:

This is a classic trick question. It's not really a Python or algorithms question but more of a brain teaser meant to be solved in a creative way.

The question asks how to find the number that is missing from list Y, which is identical to list X except that one number is missing. We could loop through one list, build a hashmap, and find the element that doesn't exist in the other list, but that would need O(n) extra space instead of O(1).

Before getting into the coding, think about it logically: how would you find the answer to this?

The quick and simple solution is to sum all the numbers in X, sum all the numbers in Y, and subtract the sum of Y from the sum of X; the difference is the missing number. Because the elements in the lists are integers, the problem rewards arithmetic creativity rather than the typical data structures and algorithms approach.

```python
def return_missing_integer(list_x, list_y):
    return sum(list_x) - sum(list_y)
```
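A common alternative, not part of the original answer, XORs every value in both lists: each value that appears in both lists cancels out (a ^ a == 0), leaving only the removed integer. It also runs in linear time with constant extra space, and it works with negative numbers and duplicates because Python's integer XOR behaves like infinite two's complement. The function name is illustrative.

```python
from functools import reduce
from operator import xor

def return_missing_integer_xor(list_x, list_y):
    """XOR every value in both lists; paired values cancel, leaving the removed one."""
    return reduce(xor, list_x, 0) ^ reduce(xor, list_y, 0)

list_x = list(range(-1000, 1001))
list_y = [v for v in list_x if v != -37]
print(return_missing_integer_xor(list_x, list_y))  # -37
```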
Always ask follow-up questions when given constraints. The interviewer could be holding back assumptions that you would never know without asking for clarification. Some examples:

- Is the list sorted?
- Is one of the lists the set of all integers from -1000 to 1000?
- Are any built-in functions allowed besides the set function?

## Data science

### This question was asked by: Airbnb

Pretend you have to analyze the results of an AB test. One variant of the AB test has a sample size of 50K users and the other has a sample size of 200K users.

Given the unbalanced size between the two groups, can you determine if the test will result in bias towards the smaller group?

Solution:


There are a couple of ways to test for bias, but let's look at the size of each population again. The interviewer is trying to assess how you would approach the problem given unbalanced groups. We're not given any context for the situation, so we have to either:
1. State assumptions or
2. Ask clarifying questions.

How long has the AB test been running? Have both variants been running over the same time period? If the data for the two groups was collected during different time periods, then bias certainly exists, because each group reflects a different date range.

Let's assume that the unequal sample sizes alone do not introduce bias, because the smaller sample is already very large. Even for very small effects, 50K observations may yield quite a powerful test. The power of the test depends heavily on the smaller sample size, but since that sample is large enough, power is not a concern here.

If the test is run inappropriately, for example with a pooled variance estimate that is weighted more heavily toward the larger group, and the variances of the two samples differ substantially relative to their means, then we might see bias in the estimated effects. Otherwise, given that 50K already yields a powerful enough test, we might not see bias, and we could downsample the larger variant to 50K to compare the groups at equal size.
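To illustrate why the smaller group drives power, here is a rough sketch of the power of a two-sided two-proportion z-test under the normal approximation. The 10% baseline rate, the lift to 10.5%, and alpha = 0.05 are hypothetical assumptions, not numbers from the question. With these inputs, the 50K vs 200K split beats 50K vs 50K but loses to a balanced 125K vs 125K split of the same 250K users.

```python
from statistics import NormalDist

def two_proportion_power(n_a, n_b, p_a=0.10, p_b=0.105, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test (normal approximation).

    p_a, p_b, and alpha are illustrative assumptions, not values from the question.
    """
    se = ((p_a * (1 - p_a)) / n_a + (p_b * (1 - p_b)) / n_b) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    effect = abs(p_b - p_a) / se
    # Ignores the negligible chance of rejecting in the wrong direction
    return NormalDist().cdf(effect - z_crit)

for n_a, n_b in [(50_000, 50_000), (50_000, 200_000), (125_000, 125_000)]:
    print(f"{n_a:>7} vs {n_b:>7}: power ~ {two_proportion_power(n_a, n_b):.3f}")
```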

### This question was asked by: Ampush

Let's say that you work for a software as a service (SaaS) company that has existed for just over a year. The chief revenue officer wants to know the average lifetime value.

We know that the product costs 100 dollars per month, averages 10% in monthly churn, and the average customer sticks around for about 3.5 months.

Calculate the formula for the average lifetime value.

Solution:

This is a trick question because the candidate is given several pieces of seemingly relevant information. Let's break it down by focusing on the important pieces and identifying which data we need for the calculation.

The chief revenue officer wants to know the average lifetime value, also known as LTV: the predicted net revenue attributed to the entire future relationship with a customer, averaged across all customers. Since we don't know future net revenue, one estimate is the total revenue generated divided by the total number of customers acquired over the same period. Because this is a subscription business, we can instead use the product price and churn.

In this case we are already given the product price and churn: the product costs 100 dollars a month and averages 10% monthly churn. Therefore we can calculate the expected value of a customer in each month as the expected retention multiplied by the monthly price.

Let's look at this example.

In the first month, the expected value is 100 dollars with 100% of customers retained. Every customer pays for the first month, and we assume they pay upfront. In the second month, since the company averages 10% churn, it retains only 90% of customers, so the expected value drops in the second month to:

Second month EV = 100 dollars * 90% = 90 dollars

Each later month multiplies retention by another 0.9. If N is the total number of months the company has been and will be in business, the calculation becomes:

$$\text{LTV} = 100 \cdot 0.9^{0} + 100 \cdot 0.9^{1} + 100 \cdot 0.9^{2} + \cdots + 100 \cdot 0.9^{N-1} = \sum_{k=0}^{N-1} 100 \cdot 0.9^{k}$$
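The sum is a geometric series, so over the first N months (k = 0 to N - 1) it also has the closed form 100 · (1 - 0.9^N) / (1 - 0.9), which approaches 100 / 0.1 = 1000 dollars as N grows. The sketch below checks the month-by-month loop against that closed form; the horizons of 12, 24, and 120 months are hypothetical choices.

```python
def ltv_sum(price=100.0, retention=0.9, months=12):
    """Expected revenue over `months` months: price * retention**k for k = 0 .. months - 1."""
    return sum(price * retention ** k for k in range(months))

def ltv_closed_form(price=100.0, retention=0.9, months=12):
    """Same partial sum via the geometric-series formula."""
    return price * (1 - retention ** months) / (1 - retention)

for months in (12, 24, 120):
    print(months, round(ltv_sum(months=months), 2), round(ltv_closed_form(months=months), 2))
```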

## Supervised learning

### This question was asked by: Lyft
    
Let's say we have 1 million Lyft rider journey trips in the city of Seattle. We want to build a model to predict ETA after a rider makes a Lyft request.

How would we know if we have enough data to create an accurate enough model?

Solution:

Collecting data can be costly. This question assesses the candidate’s skill in being able to practically figure out if a model needs more data. There are a couple of factors to look into.

- Look at the ratio of feature set size to training data size. If we have an extremely high number of features compared to training examples, the model will be prone to overfitting.

- Train a baseline model on a portion of the data (the training set) and measure its performance on a validation set, otherwise known as a holdout set. We hold back a subset of the data from training and then use it to check model performance and establish a baseline level.

- Learning curves. Learning curves show how model accuracy changes as we train on successively larger subsets of data. If we fit our model on 20%, 40%, 60%, and 80% of our data and cross-validate to determine accuracy at each size, we can estimate how much more data we need to reach a certain accuracy level.

For example, if we reach 75% accuracy with 500K datapoints but only 77% accuracy with 1 million datapoints, we'd realize that our model is not predicting well enough with its existing features, since doubling the training data size did not significantly increase accuracy.
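A learning curve can be sketched without any ML library. Because ETA is a regression target, this hypothetical example tracks holdout mean absolute error (MAE) instead of accuracy, fits a one-feature least-squares line (ETA against trip distance) on synthetic trips, and trains on growing slices of the training set. The data, the linear model, and the noise level are all assumptions for illustration; a flat curve like the one this produces suggests that more data alone will not help much.

```python
import random

def fit_line(points):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    return mean_y - b * mean_x, b

def mean_abs_error(model, points):
    """Average absolute prediction error of a fitted line on (x, y) pairs."""
    a, b = model
    return sum(abs(y - (a + b * x)) for x, y in points) / len(points)

def learning_curve(train, holdout, fractions=(0.2, 0.4, 0.6, 0.8, 1.0)):
    """Fit on growing slices of the training data and score every fit on the same holdout set."""
    curve = []
    for frac in fractions:
        subset = train[: max(2, round(len(train) * frac))]
        curve.append((len(subset), mean_abs_error(fit_line(subset), holdout)))
    return curve

# Hypothetical trips: ETA in minutes = 3 + 2 * distance in km + Gaussian noise
rng = random.Random(0)
distances = [rng.uniform(1, 20) for _ in range(5000)]
trips = [(d, 3 + 2 * d + rng.gauss(0, 2)) for d in distances]
train, holdout = trips[:4000], trips[4000:]

for n, mae in learning_curve(train, holdout):
    print(f"{n:>5} training trips -> holdout MAE {mae:.3f} minutes")
```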

## Classification

### This question was asked by: Square

Let's say that you work at a bank that wants to build a model to detect fraud on the platform. The bank also wants to implement a text messaging service that texts customers when the model detects a fraudulent transaction, so the customer can approve or deny the transaction with a text response.

1. What kind of model would need to be built?
2. Given the scenario, if you were building the model, which model metrics would you be optimizing for?

Solution:

1. A binary classifier, because fraud is binary: a transaction either is fraudulent or it isn't.

2. There are a lot of different ways to analyze model performance but let's take into account what's specified. We know that in binary classification problems there are precision versus recall trade-offs.

Precision is defined as the number of true positives divided by the number of predicted positives. In our example, this is the percentage of truly fraudulent transactions among the transactions the model flagged as fraudulent.

Precision = (True Positive / (True Positive + False Positive))

Recall is defined as the number of true positives divided by the number of actual positives. In our example, this is the percentage of actual fraudulent transactions that the model correctly flagged.

Recall = (True Positive / (True Positive + False Negative))

Given these two metrics for evaluating a binary classifier, which one would a bank prefer to be higher? Low recall in a fraud scenario would be a disaster: many fraudulent transactions would be missed as false negatives and go unnoticed, with consumers not even knowing they were being defrauded.

Meanwhile, with low precision, customers would frequently be told their accounts were under fraud when they weren't. But since the bank plans a text messaging service, this is acceptable: the customer just approves or denies the transactions that were falsely flagged. So the bank should optimize for recall, accepting lower precision.
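To see the trade-off concretely, this sketch computes precision and recall at a few decision thresholds on hypothetical fraud scores for eight transactions. Lowering the threshold flags more transactions, which raises recall and lowers precision, the direction a bank with text-message confirmation can afford. The scores, labels, and the choice to report 0.0 when nothing is flagged are illustrative assumptions.

```python
def precision_recall(labels, scores, threshold):
    """Precision and recall when every score >= threshold is flagged as fraud (label 1)."""
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0  # convention: nothing flagged -> 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical model scores for eight transactions (1 = actually fraudulent)
labels = [1, 1, 0, 1, 0, 1, 0, 0]
scores = [0.95, 0.85, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

for threshold in (0.8, 0.5, 0.25):
    p, r = precision_recall(labels, scores, threshold)
    print(f"threshold {threshold}: precision {p:.2f}, recall {r:.2f}")
```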

### This question was asked by: Facebook

You are about to get on a plane to Seattle. You want to know if you should bring an umbrella. You call 3 random friends of yours who live there and ask each independently if it's raining. Each of your friends has a 2/3 chance of telling you the truth and a 1/3 chance of messing with you by lying. All 3 friends tell you that "Yes" it is raining.

What is the probability that it's actually raining in Seattle?

Solution:

This question can be solved with either of two schools of thought: Bayesian or frequentist. The frequentist method is probably the easier one.

The question prompt states that each friend has a 2/3 chance of telling the truth. Since all of the friends told you that it is raining, the question "what is the probability that it is not raining?" becomes "what is the probability that all of your friends are lying?"

P(Not Raining) = P(Friend 1 Lying AND Friend 2 Lying AND Friend 3 Lying)

Given this expression, the probability of rain is the complement of all three friends lying:

P(Raining) = 1 - P(All 3 Friends Lying)

Because the friends answer independently, multiply their individual probabilities:

P(All 3 Friends Lying) = 1/3 * 1/3 * 1/3 = 1/27

P(Raining) = 1 - 1/27 = 26/27
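The Bayesian approach mentioned above conditions on the friends' answers and needs a prior probability of rain, which the question does not give. This sketch treats the prior as an input; the priors used below are hypothetical. With a 50% prior the posterior is 8/9, because the 1/27 chance of all three lying is computed before hearing their answers, while Bayes' rule weighs it against the 8/27 chance that all three tell the truth.

```python
from fractions import Fraction

def posterior_rain(prior, p_truth=Fraction(2, 3), n_friends=3):
    """P(raining | all n_friends independently say "yes") by Bayes' rule."""
    likelihood_rain = p_truth ** n_friends        # everyone tells the truth
    likelihood_dry = (1 - p_truth) ** n_friends   # everyone lies
    evidence = prior * likelihood_rain + (1 - prior) * likelihood_dry
    return prior * likelihood_rain / evidence

print(posterior_rain(Fraction(1, 2)))  # 8/9
print(posterior_rain(Fraction(1, 4)))  # 8/11
```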

## NLP

### This question was asked by: Adobe

In data science, there exists the concept of stemming, which is the heuristic of chopping off the end of a word to clean it and bucket it into a simpler feature set.

Given a dictionary consisting of many roots and a sentence, stem all the words in the sentence with the root forming them. If a word can be formed from multiple roots, replace it with the shortest one.

Example:

```python
# input
roots = ["cat", "bat", "rat"]
sentence = "the cattle was rattled by the battery"

output = "the cat was rat by the bat"
```

Solution:

At first it looks like we can simply loop through each word, check whether a root appears in the word, and if so replace the word with the root. But since we are stemming the words, we have to make sure that each root matches the word at its prefix rather than appearing anywhere within the word.

We're given a dictionary of roots and a sentence string. Since we have to check each word, let's write a function that takes a word and returns the root if one forms the word's prefix, or the original word otherwise.

```python
def replace(word, rootset):
    # Check each prefix of the word, from shortest to longest
    for i in range(1, len(word) + 1):
        # If this prefix is one of the roots, return it (the shortest match wins)
        if word[:i] in rootset:
            return word[:i]
    # No root is a prefix of the word, so keep the word unchanged
    return word
```

Here we check prefixes of the word starting from the first letter and extending one letter at a time until a prefix matches a root in the rootset. Because shorter prefixes are checked first, the shortest matching root wins. We can create the rootset by converting the list of roots into a set.

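```python
def replaceWords(roots, sentence):
    rootset = set(roots)  # O(1) average-time membership checks

    def replace(word):
        # Return the shortest root that is a prefix of word, else word itself
        for i in range(1, len(word) + 1):
            if word[:i] in rootset:
                return word[:i]
        return word

    return " ".join(map(replace, sentence.split()))
```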

With the replace function in place, we can map it over the words of the split sentence and join the list back into a sentence.
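For long root lists, a common alternative, not the solution above, stores the roots in a trie so each word is walked once character by character instead of slicing and hashing every prefix. This dict-of-dicts version is a minimal sketch; `replace_words_trie` is a hypothetical name.

```python
def replace_words_trie(roots, sentence):
    """Replace each word with its shortest root prefix, using a dict-of-dicts trie."""
    END = None  # marker key; None can never collide with a single-character key
    trie = {}
    for root in roots:
        node = trie
        for ch in root:
            node = node.setdefault(ch, {})
        node[END] = root  # a root ends at this node

    def stem(word):
        node = trie
        for ch in word:
            if ch not in node:
                return word  # no root continues along this prefix
            node = node[ch]
            if END in node:
                return node[END]  # first (shortest) root found on the path
        return word

    return " ".join(stem(w) for w in sentence.split())

print(replace_words_trie(["cat", "bat", "rat"], "the cattle was rattled by the battery"))
```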

## SQL

### This question was asked by: Uber

`transactions` table

| column | type |
|---|---|
| id | int |
| user_id | int |
| item | varchar |
| created_at | datetime |
| revenue | float |

Given the revenue transaction table above that contains a user_id, created_at timestamp, and transaction revenue, write a query that finds the third purchase of every user.

Solution:

This problem is relatively straightforward. We can find the order of purchases for every user by ordering by user_id and the created_at column. However, we still need an indicator of which purchase was the third one.

In this case, we need to apply the RANK function to the transactions table. The RANK function is a window function that assigns a rank to each row in the partition of the result set.

```sql
RANK() OVER (PARTITION BY user_id ORDER BY created_at ASC) AS rank_value
```

First, the PARTITION BY clause distributes the rows in the result set into partitions by one or more criteria.

Second, the ORDER BY clause sorts the rows in each partition by the column we indicated, in this case, created_at.

Finally, the RANK() function is operated on the rows of each partition and re-initialized when crossing each partition boundary. The end result is a column with the rank of each purchase partitioned by user_id.

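All we have to do is wrap the query in a subquery and keep the rows where the new column equals 3, which selects each user's third purchase. One caveat: if a user has two purchases with the same created_at, RANK assigns them the same rank and skips the next value, so that user might have no row with rank 3. ROW_NUMBER() assigns consecutive numbers even on ties, which guarantees one third purchase for every user with at least three purchases.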

```sql
SELECT *
FROM (
    SELECT 
        user_id
        , created_at
        , revenue
        , RANK() OVER (PARTITION BY user_id ORDER BY created_at ASC) AS rank_value 
    FROM transactions
) AS t
WHERE rank_value = 3;
```
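To try the query end to end, this sketch loads a few hypothetical transactions into an in-memory SQLite database with Python's built-in sqlite3 module and runs the query from this section, filtering on the rank_value alias it defines; an ORDER BY is added only to make the output order deterministic. It assumes the bundled SQLite is version 3.25 or newer, when window functions were added, and stores created_at as ISO-8601 text so that it sorts chronologically.

```python
import sqlite3

# Hypothetical rows: (id, user_id, item, created_at, revenue)
ROWS = [
    (1, 1, "a", "2020-01-01 09:00:00", 10.0),
    (2, 1, "b", "2020-01-02 09:00:00", 20.0),
    (3, 1, "c", "2020-01-03 09:00:00", 30.0),
    (4, 1, "d", "2020-01-04 09:00:00", 40.0),
    (5, 2, "a", "2020-01-01 10:00:00", 15.0),
    (6, 2, "b", "2020-01-05 10:00:00", 25.0),
    (7, 3, "c", "2020-01-02 11:00:00", 5.0),
    (8, 3, "d", "2020-01-03 11:00:00", 6.0),
    (9, 3, "e", "2020-01-07 11:00:00", 7.0),
]

THIRD_PURCHASE_SQL = """
SELECT *
FROM (
    SELECT
        user_id
        , created_at
        , revenue
        , RANK() OVER (PARTITION BY user_id ORDER BY created_at ASC) AS rank_value
    FROM transactions
) AS t
WHERE rank_value = 3
ORDER BY user_id
"""

def third_purchases(rows):
    """Load rows into an in-memory transactions table and return each user's third purchase."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions "
                 "(id INTEGER, user_id INTEGER, item TEXT, created_at TEXT, revenue REAL)")
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(THIRD_PURCHASE_SQL).fetchall()
    conn.close()
    return result

print(third_purchases(ROWS))
```

User 2 has only two purchases, so only users 1 and 3 appear in the result.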

### This question was asked by: Tinder

There are two tables. The first, called `swipes`, holds a row for every Tinder swipe and contains a boolean column called `is_right_swipe` that indicates whether the swipe was a right or left swipe. The second, called `variants`, records which user has which variant of an AB test.

Write a SQL query to output the average number of right swipes for two different variants of a feed ranking algorithm, comparing users over their first 10, 50, and 100 swipes on their feed.

Tip: Users have to have swiped at least 10 times to be included in the subset of users to analyze the mean number of right swipes.

Example Input:

`variants`

| id | experiment | variant | user_id |
|---|---|---|---|
| 1 | feed_change | control | 123 |
| 2 | feed_change | test | 567 |
| 3 | feed_change | control | 996 |

`swipes`

| id | user_id | swiped_user_id | created_at | is_right_swipe |
|---|---|---|---|---|
| 1 | 123 | 893 | 2018-01-01 | 0 |
| 2 | 123 | 825 | 2018-01-02 | 1 |
| 3 | 567 | 946 | 2018-01-04 | 0 |
| 4 | 123 | 823 | 2018-01-05 | 0 |
| 5 | 567 | 952 | 2018-01-05 | 1 |
| 6 | 567 | 234 | 2018-01-06 | 1 |
| 7 | 996 | 333 | 2018-01-06 | 1 |
| 8 | 996 | 563 | 2018-01-07 | 0 |

Note: created_at doesn't show timestamps, but assume it is a datetime column.

Output:

| mean_right_swipes | variant | swipe_threshold | num_users |
|---|---|---|---|
| 5.3 | control | 10 | 9560 |
| 5.6 | test | 10 | 9450 |
| 20.1 | control | 50 | 2001 |
| 22.0 | test | 50 | 2019 |
| 33.0 | control | 100 | 590 |
| 34.0 | test | 100 | 568 |

Solution:

If you're a data scientist in charge of improving recommendations at a company and you develop an algorithm, how do you know if it performs better than the existing one?

One metric to measure performance is precision (also called positive predictive value), which has applications in machine learning as well as information retrieval. It is defined as the fraction of relevant instances among the retrieved instances. Given the problem of measuring two feed ranking algorithms, we can frame it as comparing mean precision between the two algorithms by comparing average right swipes for two different populations over the users' first 10, 50, and 100 swipes.

We're given two tables. One, called `variants`, breaks down which test variant each user received. It has a column named `experiment` that we have to filter to the `feed_change` experiment. We have to join this table to the `swipes` table in order to tell the two variants apart.

The other table, `swipes`, is a transaction-type table, meaning that it logs each user's activity in the app; in this case, left and right swipes on other users.

Given the problem, the first step is to average the right swipes for each user who has swiped at least 10, 50, and 100 times. To do that, we first add a rank column to the swipe table so we can look at each user's first X swipes.

```sql
WITH swipe_ranks AS (
    SELECT 
        swipes.user_id
        , variant
        , RANK() OVER (
            PARTITION BY swipes.user_id ORDER BY swipes.created_at ASC
        ) AS rank
        , is_right_swipe
    FROM swipes
    INNER JOIN variants 
        ON swipes.user_id = variants.user_id
    WHERE experiment = 'feed_change'
)
```
Observe how we implement a RANK function by partitioning by user_id and ordering by the created_at field. This gives us a rank of 1 for the user's first swipe, 2 for the second, and so on.

Now our swipe_ranks table looks like this:

| user_id | variant | created_at | rank | is_right_swipe |
|---|---|---|---|---|
| 123 | control | 2018-01-01 | 1 | 0 |
| 123 | control | 2018-01-02 | 2 | 1 |
| 567 | test | 2018-01-04 | 1 | 0 |
| 123 | control | 2018-01-05 | 3 | 0 |
| 567 | test | 2018-01-05 | 2 | 1 |
| 567 | test | 2018-01-06 | 3 | 1 |
| 996 | control | 2018-01-06 | 1 | 1 |
| 996 | control | 2018-01-07 | 2 | 0 |
Notice how the rank value does not go above 3 in our sample data. Since each user needs to swipe on at least 10 users to reach the minimum swipe threshold, all of these users would be filtered out of the analysis.

To build the query, we can create a subquery that gets all the users who swiped at least 10 times. We can do that with a COUNT(*) function or by keeping the users whose rank column reaches at least 10.

```sql
SELECT user_id
FROM swipe_ranks
WHERE rank >= 10
GROUP BY 1
```

Then we can rejoin these users to the original swipe_ranks table, group by the experiment variant, and average the number of right swipes each user made where the rank was at most 10. Remember that we have to filter on the rank column because we cannot analyze swipe data beyond the threshold we are setting, since the recommendation algorithm is intended to move more relevant matches to the top of the feed.

```sql
SELECT 
    variant
    -- right swipes within each user's first 10 swipes, averaged over users
    , CAST(SUM(is_right_swipe) AS DECIMAL) / COUNT(DISTINCT sr.user_id) AS mean_right_swipes
    , 10 AS swipe_threshold
    , COUNT(DISTINCT sr.user_id) AS num_users
FROM swipe_ranks AS sr
INNER JOIN (
    SELECT user_id
    FROM swipe_ranks
    WHERE rank >= 10
    GROUP BY 1
) AS subset 
    ON subset.user_id = sr.user_id
WHERE rank <= 10
GROUP BY 1
```

Awesome! This should work. Notice that this query only covers the threshold of 10 swipes. We can copy most of the code and reuse it for 50 and 100 by unioning the results together. Putting it all together now:

```sql
WITH swipe_ranks AS (
    SELECT 
        swipes.user_id
        , variant
        , RANK() OVER (
            PARTITION BY swipes.user_id ORDER BY swipes.created_at ASC
        ) AS rank
        , is_right_swipe
    FROM swipes
    INNER JOIN variants 
        ON swipes.user_id = variants.user_id
    WHERE experiment = 'feed_change'
)

SELECT 
    variant
    , CAST(SUM(is_right_swipe) AS DECIMAL) / COUNT(DISTINCT sr.user_id) AS mean_right_swipes
    , 10 AS swipe_threshold
    , COUNT(DISTINCT sr.user_id) AS num_users
FROM swipe_ranks AS sr
INNER JOIN (
    SELECT user_id
    FROM swipe_ranks
    WHERE rank >= 10
    GROUP BY 1
) AS subset 
    ON subset.user_id = sr.user_id
WHERE rank <= 10
GROUP BY 1

UNION ALL

SELECT 
    variant
    , CAST(SUM(is_right_swipe) AS DECIMAL) / COUNT(DISTINCT sr.user_id) AS mean_right_swipes
    , 50 AS swipe_threshold
    , COUNT(DISTINCT sr.user_id) AS num_users
FROM swipe_ranks AS sr
INNER JOIN (
    SELECT user_id
    FROM swipe_ranks
    WHERE rank >= 50
    GROUP BY 1
) AS subset 
    ON subset.user_id = sr.user_id
WHERE rank <= 50
GROUP BY 1

UNION ALL

SELECT 
    variant
    , CAST(SUM(is_right_swipe) AS DECIMAL) / COUNT(DISTINCT sr.user_id) AS mean_right_swipes
    , 100 AS swipe_threshold
    , COUNT(DISTINCT sr.user_id) AS num_users
FROM swipe_ranks AS sr
INNER JOIN (
    SELECT user_id
    FROM swipe_ranks
    WHERE rank >= 100
    GROUP BY 1
) AS subset 
    ON subset.user_id = sr.user_id
WHERE rank <= 100
GROUP BY 1
```
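This sketch runs one threshold bucket of the query against the sample rows from the example tables, using Python's sqlite3 module with the threshold passed as a named parameter, and computes the average number of right swipes per user. Because no sample user reaches 10 swipes, it uses hypothetical thresholds of 2 and 3. It assumes SQLite 3.25 or newer for window functions, and it uses REAL instead of DECIMAL because SQLite's CAST to DECIMAL leaves whole numbers as integers, which would turn the division into integer division.

```python
import sqlite3

# Sample rows copied from the example tables above
VARIANTS = [
    (1, "feed_change", "control", 123),
    (2, "feed_change", "test", 567),
    (3, "feed_change", "control", 996),
]
SWIPES = [
    (1, 123, 893, "2018-01-01", 0),
    (2, 123, 825, "2018-01-02", 1),
    (3, 567, 946, "2018-01-04", 0),
    (4, 123, 823, "2018-01-05", 0),
    (5, 567, 952, "2018-01-05", 1),
    (6, 567, 234, "2018-01-06", 1),
    (7, 996, 333, "2018-01-06", 1),
    (8, 996, 563, "2018-01-07", 0),
]

# One threshold bucket, with :threshold as a named parameter
THRESHOLD_SQL = """
WITH swipe_ranks AS (
    SELECT
        swipes.user_id AS user_id
        , variant
        , RANK() OVER (
            PARTITION BY swipes.user_id ORDER BY swipes.created_at ASC
        ) AS rank
        , is_right_swipe
    FROM swipes
    INNER JOIN variants
        ON swipes.user_id = variants.user_id
    WHERE experiment = 'feed_change'
)
SELECT
    variant
    , CAST(SUM(is_right_swipe) AS REAL) / COUNT(DISTINCT sr.user_id) AS mean_right_swipes
    , :threshold AS swipe_threshold
    , COUNT(DISTINCT sr.user_id) AS num_users
FROM swipe_ranks AS sr
INNER JOIN (
    SELECT user_id FROM swipe_ranks WHERE rank >= :threshold GROUP BY 1
) AS subset
    ON subset.user_id = sr.user_id
WHERE rank <= :threshold
GROUP BY 1
ORDER BY 1
"""

def build_db():
    """Create an in-memory database holding the sample variants and swipes tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE variants (id INTEGER, experiment TEXT, variant TEXT, user_id INTEGER)")
    conn.execute("CREATE TABLE swipes (id INTEGER, user_id INTEGER, swiped_user_id INTEGER, "
                 "created_at TEXT, is_right_swipe INTEGER)")
    conn.executemany("INSERT INTO variants VALUES (?, ?, ?, ?)", VARIANTS)
    conn.executemany("INSERT INTO swipes VALUES (?, ?, ?, ?, ?)", SWIPES)
    return conn

def mean_right_swipes(conn, threshold):
    """Rows of (variant, mean_right_swipes, swipe_threshold, num_users) for one threshold."""
    return conn.execute(THRESHOLD_SQL, {"threshold": threshold}).fetchall()

conn = build_db()
for threshold in (2, 3):
    print(mean_right_swipes(conn, threshold))
```

Running the three real thresholds would mean calling the same query with 10, 50, and 100, which avoids repeating the SELECT three times in application code; whether that beats a single UNION ALL query depends on where the results are consumed.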