# Question 1
## Prompt: Given a non-empty message string containing only digits, determine the total number of ways to decode it.

A message containing letters from A-Z is encoded to numbers using the following mapping:

'A' -> 1, 'B' -> 2, ..., 'Z' -> 26

#### Example 1:
Input: "12"
Output: 2
Explanation: It could be decoded as "AB" (1 2) or "L" (12).
 
#### Example 2:
Input: "226"
Output: 3
Explanation: It could be decoded as "BZ" (2 26), "VF" (22 6), or "BBF" (2 2 6).



In [15]:
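class Solution:
    def solve(self, digits: str) -> int:
        # A leading '0' maps to no letter, so nothing can be decoded.
        if not digits or digits[0] == '0':
            return 0

        # prev2 and prev1 hold the number of decodings of the prefixes
        # that end two characters back and one character back.
        prev2, prev1 = 1, 1
        for i in range(1, len(digits)):
            current = 0
            # Decode digits[i] on its own ('1'..'9').
            if digits[i] != '0':
                current += prev1
            # Decode digits[i-1] and digits[i] as a pair ('10'..'26').
            if 10 <= int(digits[i - 1:i + 1]) <= 26:
                current += prev2
            prev2, prev1 = prev1, current

        return prev1

In [16]:
s = Solution()
print(s.solve('12')) # 2

s = Solution()
print(s.solve('226')) # 3

s = Solution()
print(s.solve('1212')) # 5

s = Solution()
print(s.solve('123456')) # 3

s = Solution()
print(s.solve('1')) # 1

s = Solution()
print(s.solve('227')) # 2

2
3
5
3
1
2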
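
The explanations in the examples list the decoded strings themselves, not just the count. A brute-force sketch like the one below enumerates them, which is handy for cross-checking a counting solution on short inputs. The function name `enumerate_decodings` is my own and not part of the prompt, and the recursion is exponential in the worst case, so treat it as a checking aid only.

```python
from typing import List


def enumerate_decodings(digits: str) -> List[str]:
    """Return every letter string that `digits` can decode to."""
    if not digits:
        return [""]  # exactly one way to decode nothing: the empty string

    results = []
    # Try a one-digit code ('1'..'9'), then a two-digit code ('10'..'26').
    for width in (1, 2):
        chunk = digits[:width]
        if len(chunk) < width or chunk[0] == "0":
            continue
        value = int(chunk)
        if 1 <= value <= 26:
            letter = chr(ord("A") + value - 1)
            for rest in enumerate_decodings(digits[width:]):
                results.append(letter + rest)
    return results


print(sorted(enumerate_decodings("12")))   # ['AB', 'L']
print(sorted(enumerate_decodings("226")))  # ['BBF', 'BZ', 'VF']
```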


# * * * * * Machine Learning - Design Questions * * * * *
The following questions are for machine learning engineer roles. I have gone through a few during these last 6 months and figured it would be good to share and document. They are system design questions in nature.

# Question 2: Machine Learning for Credit Risk Evaluation
## Prompt: How do you test whether a new credit risk scoring model works? What data would you look at?

#### Initial Thoughts
First, I would like to take a step back and ask what kind of model we are using.

It seems like this is a supervised learning problem so we would need some kind of supervised learning model. Let's first focus on the second half of the question: "What data would you look at?"

#### The Data
The data could consist of a set of attributes describing each person. The label for each record would be whether or not they defaulted. If we train a model on this data, we are trying to learn a predictive relationship (not necessarily a causal one) between those attributes and the likelihood that the person will default.

#### Model
Model-wise, I don't see much reason up front to pick, say, XGBoost over Random Forests. We can try both and compare their metrics. Depending on how much we know about the data and the assumptions we can make, a non-parametric model may be the better pick, since it does not force a fixed functional form onto the data. If the cross-validation score is much worse than the training score, we are overfitting, and we can always apply regularization constraints to the non-parametric model, such as limiting the max depth for Random Forests.

Another thing to consider is whether this is a classification task or a regression task. If the decision is to classify a user profile as "risky" vs "not risky", then it is clearly a classification task. However, we probably want to work with the probabilities and trade off precision against recall. In scikit-learn, most classifiers provide a handy `predict_proba` method that outputs the probability of each class. With it, you can move the decision threshold and adjust the precision and recall accordingly.
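
To make the threshold trade-off concrete, here is a minimal sketch in plain Python. The labels and scores are made up for illustration; in practice the scores would be the positive-class column returned by a classifier's `predict_proba`. The helper name `precision_recall` is my own.

```python
def precision_recall(y_true, scores, threshold):
    """Precision and recall when scores >= threshold are flagged as risky."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, y_true) if p and y == 1)
    fp = sum(1 for p, y in zip(predicted, y_true) if p and y == 0)
    fn = sum(1 for p, y in zip(predicted, y_true) if not p and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels (1 = defaulted) and made-up predicted default probabilities.
y_true = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.95, 0.80, 0.60, 0.55, 0.40, 0.30, 0.20, 0.05]

for threshold in (0.3, 0.5, 0.7):
    p, r = precision_recall(y_true, scores, threshold)
    print(f"threshold={threshold:.1f} precision={p:.2f} recall={r:.2f}")
# threshold=0.3 precision=0.67 recall=1.00
# threshold=0.5 precision=0.75 recall=0.75
# threshold=0.7 precision=1.00 recall=0.50
```

On this toy data, raising the threshold flags fewer people, so precision goes up while recall goes down. Which point to pick depends on how costly a missed default is compared with a wrongly rejected applicant.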

Stepping back to the original question, we can evaluate how well our model works by comparing it against a benchmark that's already established in the industry. Perhaps we want to aim for a 0.8 F1 score. In production, we can monitor the frequency of each output (for example, the share of applicants flagged as risky) to verify the model is behaving as expected. We can also randomly sample the model's production outputs and manually check whether each decision was correct.


# Question 3: Machine Learning for Customer Support
## Prompt: Automate some portion of the customer support experience

#### Initial Thoughts
This also feels like a supervised problem. It seems like we would want some mechanism that outputs a support article given some text input.

#### Training Data:
I'd imagine the training data to be the exchanges between a customer and a customer support representative that were resolved by a single article. If we wanted, we could add in-app behavioral features to supplement the text input data.

```
Customer: How do I buy Bitcoin?
CS representative: Here's a link to a support article on how to buy Bitcoin.
Customer: Thanks!
```

Here we can assume the link provided was enough and the exchange ends. Thus our training data would consist of these mappings between a text input and an article selection output.

"How do I buy Bitcoin?" → article 102 about buying Bitcoin on the app
"Where can I do X" → article 59 about X
...

#### Approach
Let's convert the text input into word vectors. Pretrained `word2vec` embeddings, a widely used choice, will do.
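
One simple way to turn a whole message into a single feature vector is to average the vectors of its words. That averaging step is my assumption, not something the approach above commits to. The sketch below uses a tiny hypothetical lookup table, `TOY_EMBEDDINGS`, in place of real `word2vec` vectors, and `embed_message` is a made-up helper name.

```python
# Hypothetical 3-dimensional embeddings for illustration only; real word2vec
# vectors usually have hundreds of dimensions and come from a trained model.
TOY_EMBEDDINGS = {
    "how": [0.1, 0.3, 0.0],
    "buy": [0.9, 0.1, 0.2],
    "bitcoin": [0.8, 0.0, 0.7],
}


def embed_message(text, embeddings):
    """Average the vectors of the words in `text` that have an embedding."""
    tokens = [t.strip("?!.,").lower() for t in text.split()]
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        # No known words: fall back to an all-zero vector of the right size.
        dim = len(next(iter(embeddings.values())))
        return [0.0] * dim
    return [sum(component) / len(vectors) for component in zip(*vectors)]


message_vector = embed_message("How do I buy Bitcoin?", TOY_EMBEDDINGS)
print(message_vector)  # one 3-dimensional vector for the whole message
```

Averaging throws away word order, which is often acceptable for short support questions but is worth revisiting if two differently worded requests keep landing on the same article.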

#### Models
We can use a traditional Random Forest or we can use deep learning by building a large deep neural network.
Either way, we would want to regularize because both tend to overfit if we are not careful.

#### Output
We can approach this as multiclass (multinomial) classification, with one class per support article.
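
As a rough sketch of what the output step could look like: the model produces one raw score per article, a softmax turns the scores into probabilities, and we return the most probable article. The scores below are made up, and `softmax` and `pick_article` are illustrative helper names rather than part of any particular library.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pick_article(article_ids, logits):
    """Return the most probable article id together with its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return article_ids[best], probs[best]


article_ids = [102, 59]  # the two example articles from the training data
logits = [2.0, 0.5]      # made-up model scores for one customer message
article, confidence = pick_article(article_ids, logits)
print(article, round(confidence, 2))
```

Keeping the probability around is useful: when it is low, we could hand the conversation to a human representative instead of sending an article.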

#### Evaluation
For a given model, we can track how often a customer responds after being sent an article, and in particular how often they respond with negative feedback. Alternatively, we can show a yes button and a no button under the prompt "Was the article useful?"


# Question 4: Machine Learning for Fulfillment Center Optimization
## Prompt: Given some order fulfillment centers, how would you go about improving the efficiency of the fulfillment centers?

#### Initial Thoughts
I'd imagine we want to see how different features of a fulfillment center would affect the fulfillment time.

#### Training Data:
The input data would consist of the time of day the order fulfillment request was made, 


#### Evaluation