For each of the questions below, answer as if you were in an interview, explaining and justifying your answer with two to three paragraphs as you see fit. For coding answers, explain the relevant choices you made writing the code.

\section*{Question 1}

We A/B tested two styles for a sign-up button on our company's product page. 100 visitors viewed page A, out of which 20 clicked on the button, whereas 70 visitors viewed page B, and only 15 of them clicked on the button. Can you confidently say that page A is a better choice, or page B? Why?

#### Answer:
We can use hypothesis testing to tell us whether there is a statistically significant difference between A and B, or whether the variation we see is simply due to random chance. Our null hypothesis in this case is that the conversion rates of A and B are the same. Mathematically:

$$H_0:A_c-B_c=0$$

\noindent Where $A_c$ and $B_c$ are the conversion rates of pages A and B, and $N_a$ and $N_b$ are the numbers of visitors to each page. Before we go any further, let's fill in our variables with the information provided.

$$A_c=20/100=0.20$$
$$B_c=15/70\approx0.2143$$
$$N_a=100$$
$$N_b=70$$

\noindent Let's also pick a confidence level of 95% for our test. Our z-score is given by the formula:

$$z = \frac{A_c-B_c}{\sqrt{\frac{A_c(1-A_c)}{N_a} + \frac{B_c(1-B_c)}{N_b}}}$$

\noindent Plugging in our values and solving, we get a z-score of approximately $-0.226$. From memory, the critical z-values for a two-tailed test at a 95% confidence level are $\pm1.96$. Since $|z| \approx 0.226$ is well below 1.96, our test statistic falls inside the acceptance region, and we fail to reject the null hypothesis that the difference between the conversion rates of A and B is 0. \bigskip
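A minimal Python sketch of the same calculation, using only the standard library, is a useful sanity check on the arithmetic. The function name `two_proportion_z_test` is my own. The test uses the unpooled standard error from the formula above, and the two-sided p-value comes from writing the standard normal CDF in terms of `math.erf`.

```python
import math

def two_proportion_z_test(clicks_a, n_a, clicks_b, n_b):
    """Unpooled two-proportion z-test, using the same formula as above."""
    a_c = clicks_a / n_a
    b_c = clicks_b / n_b
    standard_error = math.sqrt(a_c * (1 - a_c) / n_a + b_c * (1 - b_c) / n_b)
    z = (a_c - b_c) / standard_error
    # Standard normal CDF: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - phi)  # two-tailed
    return z, p_value

z, p_value = two_proportion_z_test(20, 100, 15, 70)
print(f"z = {z:.3f}, two-tailed p-value = {p_value:.2f}")
print("reject H0" if abs(z) > 1.96 else "fail to reject H0")
```

With these counts the p-value comes out at roughly 0.82. That is far above the 0.05 threshold implied by a 95% confidence level.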

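\noindent In other words, we cannot confidently say that either page A or page B is the better choice. B's observed conversion rate (21.4%) is actually slightly higher than A's (20%), even though fewer visitors clicked on it. However, a gap of about 1.4 percentage points is well within what random chance produces at these sample sizes. Failing to reject $H_0$ does not prove that the rates are equal. It means we would need considerably more traffic before the test could detect a difference this small.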

\section*{Question 2}

\noindent Can you devise a scheme to group Twitter users by looking only at their tweets? No demographic, geographic or other identifying information is available to you, just the messages they’ve posted, in plain text, and a timestamp for each message. \bigskip

\noindent In JSON format, they look like this:

```
{
    "user_id": 3,
    "timestamp": "2016-03-22_11-31-20",
    "tweet": "It's #dinner-time!"
}
```

\noindent Assuming you have a stream of these tweets coming in, describe the process of collecting and analyzing them, what transformations/algorithms you would apply, how you would train and test your model, and present the results.

* Candidate shows the ability to design a machine learning pipeline that can process streaming data and present desired results.
* Answer states any assumptions made and frames the problem as a machine learning task; clearly specifies any data-processing, cleaning and transformation steps; picks a suitable machine learning model to use (with justification); describes the training and testing procedure including a suitable evaluation metric; and finally, defines an interface through which results are presented and how it is updated as more streaming data comes in. The space required to store the model and time required to train/update it should also be discussed.
* Answer frames the problem as a machine learning task; specifies any data-processing and transformation steps; picks a suitable machine learning model to use; describes the training and testing procedure; and finally, defines an interface through which results are presented.

#### Answer:

Off the top of my head, this sounds like an unsupervised clustering problem. We have no labels, only tweet text and timestamps, and we want to group users whose tweets look alike. It also sounds like we want to group Twitter users as the tweets stream in, so the speed of our algorithm will be important. The first thing that comes to my mind is the Mini-Batch K-Means algorithm. Instead of using the full dataset on every iteration, it updates the cluster centroids from small, randomly sampled batches of data. We lose some accuracy with the mini-batch approach, but the savings in time should make near-real-time clustering possible.
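Here is a minimal, self-contained sketch of that idea. It assumes that each user in a batch is represented by the average of hashed bag-of-words vectors built from their tweets. Everything in it is hypothetical: `featurize`, `MiniBatchKMeansSketch`, `process_batch`, the 4096-dimension hash space, k=2 and the sample tweets. In practice, scikit-learn's `HashingVectorizer` and `MiniBatchKMeans.partial_fit` cover the same ground. The update is written by hand here to show the per-centre learning rate of 1/count, which is what makes mini-batch k-means cheap enough to run on a stream.

```python
import re
import zlib
from collections import defaultdict

import numpy as np

N_FEATURES = 2 ** 12  # hashing-trick dimension (assumed; tune for the real stream)

def featurize(text):
    """Hash lower-cased words and #hashtags into a fixed-size, L2-normalized vector."""
    vec = np.zeros(N_FEATURES)
    for token in re.findall(r"#?\w+", text.lower()):
        # crc32 is deterministic across runs, unlike Python's salted hash()
        vec[zlib.crc32(token.encode("utf-8")) % N_FEATURES] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

class MiniBatchKMeansSketch:
    """Mini-batch k-means: each centre moves toward its assigned points with step 1/count."""

    def __init__(self, k, seed=0):
        self.k = k
        self.rng = np.random.default_rng(seed)
        self.centers = None
        self.counts = np.zeros(k)

    def predict(self, X):
        # squared Euclidean distances via ||x||^2 - 2 x.c + ||c||^2
        d = (X ** 2).sum(axis=1)[:, None] - 2 * X @ self.centers.T + (self.centers ** 2).sum(axis=1)
        return d.argmin(axis=1)

    def partial_fit(self, X):
        if self.centers is None:
            # initialize from k distinct rows of the first batch (needs at least k rows)
            self.centers = X[self.rng.choice(len(X), size=self.k, replace=False)].copy()
        for x, c in zip(X, self.predict(X)):
            self.counts[c] += 1
            eta = 1.0 / self.counts[c]
            self.centers[c] = (1 - eta) * self.centers[c] + eta * x
        return self

def process_batch(model, batch, user_groups):
    """Average each user's tweet vectors in the batch, update the model, record group labels."""
    per_user = defaultdict(list)
    for record in batch:
        per_user[record["user_id"]].append(featurize(record["tweet"]))
    users = sorted(per_user)
    X = np.array([np.mean(per_user[u], axis=0) for u in users])
    for user, label in zip(users, model.partial_fit(X).predict(X)):
        user_groups[user] = int(label)

# hypothetical mini-batch pulled from the stream
batch = [
    {"user_id": 1, "timestamp": "2016-03-22_11-31-20", "tweet": "It's #dinner-time!"},
    {"user_id": 2, "timestamp": "2016-03-22_11-32-05", "tweet": "Pasta for #dinner tonight"},
    {"user_id": 3, "timestamp": "2016-03-22_11-33-40", "tweet": "New #python release is out"},
    {"user_id": 4, "timestamp": "2016-03-22_11-35-12", "tweet": "New #python release is out"},
]
model = MiniBatchKMeansSketch(k=2, seed=0)
user_groups = {}
process_batch(model, batch, user_groups)
print(user_groups)
```

Each new batch from the stream calls `process_batch` again. The update therefore costs time proportional to the batch size, and the model's memory is just `k` centres of `N_FEATURES` floats plus `k` counts, regardless of how many tweets have been seen. Users 3 and 4 post identical text, so they always land in the same group.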

Data-processing:

Training and Testing:

Evaluation:

Interface and update:

\section*{Question 3}

\noindent In a classification setting, given a dataset of labeled examples and a machine learning model you're trying to fit, describe a strategy to detect and prevent overfitting.

#### Answer:

\noindent Overfitting occurs when our model performs very well on the data used to fit it but generalizes poorly to new data. For example, a model could effectively memorize the following data. The blue line would be a reasonable model, but the red line is an example of overfitting. \bigskip

[Figure placeholder: the same data fit by a reasonable model (blue line) and an overfit model (red line)]

\noindent To detect overfitting, we can compare the model's performance on the training data with its performance on data it has not seen. Suppose training accuracy is very high but accuracy on held-out data is much lower. In that case, the model has most likely memorized noise in the training set rather than learned a pattern that generalizes. In the extreme, held-out performance can be no better than random guessing. \bigskip
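To make the "training versus new data" comparison concrete, here is a small pure-Python sketch. It uses hypothetical 1-D data in which 20% of the labels are flipped. A 1-nearest-neighbour classifier memorizes its training set, so its training accuracy is perfect while its held-out accuracy drops. A 15-neighbour vote is smoother and should show a much smaller gap. The data generator and the choice of k-NN are illustrative assumptions, not part of the original answer.

```python
import random

def make_data(n, noise, seed):
    """Hypothetical 1-D data: label is 1 when x > 0.5, with a `noise` fraction of labels flipped."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data

def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (k odd, so no ties)."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return int(2 * sum(label for _, label in nearest) > k)

def accuracy(train, data, k):
    return sum(knn_predict(train, x, k) == y for x, y in data) / len(data)

train = make_data(200, noise=0.2, seed=1)
held_out = make_data(200, noise=0.2, seed=2)
for k in (1, 15):
    train_acc, held_out_acc = accuracy(train, train, k), accuracy(train, held_out, k)
    print(f"k={k:2d}  train={train_acc:.2f}  held-out={held_out_acc:.2f}  gap={train_acc - held_out_acc:.2f}")
```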

\noindent Preventing overfitting involves two main steps. The first is to split our data into three sets: one for training, one for validation, and one for final evaluation (the test set). The validation data can be a single hold-out split. Alternatively, it can come from k-fold cross-validation, where the non-test data is divided into k folds and each fold takes a turn as the validation set. Once we have our splits, we fit the model on the training data and use the validation data to tune hyperparameters and choose between candidate models. Finally, we measure performance once on the test data. Because each model is judged only on data it was not fit to, we can pick the setting that generalizes best rather than one that merely memorizes the training set. The untouched test set then gives an honest estimate of real-world performance. The second way to prevent overfitting is to use simpler models. For example, if a linear classifier works well, there is no need to use a more complex, higher-dimensional model. \bigskip
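The three-way split with k-fold cross-validation can be sketched in plain Python as follows. The helper names `split_off_test` and `k_fold`, the 20% test fraction and k=5 are illustrative assumptions. The sketch only produces index lists; the model fitting and scoring happen where the comments indicate.

```python
import random

def split_off_test(n, test_frac=0.2, seed=0):
    """Shuffle indices 0..n-1 and set aside a test set that is only scored once, at the end."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(n * test_frac)
    return idx[n_test:], idx[:n_test]  # (development indices, test indices)

def k_fold(indices, k=5):
    """Yield (train, validation) index lists; every index is used for validation exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]

dev, test = split_off_test(100, test_frac=0.2, seed=42)
for fold_no, (train, val) in enumerate(k_fold(dev, k=5)):
    # fit on `train`, score on `val`; average the k validation scores per hyperparameter setting
    print(f"fold {fold_no}: {len(train)} train, {len(val)} validation")
# after choosing hyperparameters, refit on all of `dev` and score once on `test`
```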

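\noindent Learning curves and complexity curves are another great way to visualize how our choices affect overfitting. A learning curve plots training and validation scores against the number of training examples. A complexity curve plots them against a model parameter, such as tree depth or polynomial degree. A wide or widening gap between the training and validation curves is the signature of overfitting.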

\section*{Question 4}

\noindent Your team is designing the next generation user experience for your flagship 3D modeling tool. Specifically, you have been tasked with implementing a smart context menu that learns from a modeler’s usage of menu options and shows the ones that would be most beneficial. E.g. I often use Edit > Surface > Smooth Surface, and wish I could just right click and there would be a Smooth Surface option just like Cut, Copy and Paste. Note that not all commands make sense in all contexts, for instance I need to have a surface selected to smooth it. How would you go about designing a learning system/agent to enable this behavior?

#### Answer:

\noindent The first thing that comes to mind for this problem is a reinforcement learning approach. For example, a Q-learning agent could be trained to display the best context menu for a given state. Q-learning is a model-free reinforcement learning technique that learns an action-value function, $Q(s, a)$, which estimates the expected future reward of taking action $a$ in state $s$. In this case, the actions would be which menu options to show in the context menu. For example, three smart context options could appear above the default context menu items. \bigskip

\noindent The action space could be populated from the commands each user actually invokes. For example, the first time a user goes through the menus to find Smooth Surface, Smooth Surface would be added to the action space. This way, if a user only ever uses two menu options, the agent only needs to learn when to show those two. If a user uses every menu option there is, the action space grows accordingly. \bigskip

\noindent The reward would be positive when a user opens the smart context menu and finds what they are looking for, and larger the higher the item appears in the list. For example, if the user is looking for Smooth Surface and it is the first item in the smart context menu, the reward might be 10. If it is the third item down, the reward might only be 5, and so on. \bigskip

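\noindent The states would describe the user's current context, such as whether anything is selected, what type of object is selected, and which layer the user is viewing the project in. Encoding the selection in the state also handles the fact that some commands only make sense in certain contexts. For example, the agent can learn that Smooth Surface is useful only when a surface is selected. I am not that familiar with 3D modeling tools, but I assume these would be useful state representations. \bigskip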
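Putting the state, action and reward ideas together, here is a hedged, pure-Python sketch of a tabular Q-learning agent for the smart menu. The following are all assumptions for illustration: the class and method names, the two context states, the 7.5 reward for the second slot (interpolated between the 10 and 5 mentioned above), the learning and exploration rates, and the simulated user. Commands enter the action space the first time they are used, and every shown option receives a Q-learning update after the user makes a choice.

```python
import random
from collections import defaultdict

SLOTS = 3  # smart options shown above the default Cut/Copy/Paste items

def position_reward(position):
    """Assumed schedule: 10 for the top slot, 5 for the third, linear in between (0-based)."""
    return 10.0 - 2.5 * position

class SmartMenuAgent:
    def __init__(self, alpha=0.1, gamma=0.0, epsilon=0.1, seed=0):
        self.q = defaultdict(float)  # (state, command) -> estimated value
        self.actions = set()         # grows as the user invokes new commands
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def suggest(self, state, explore=True):
        """Show the highest-valued commands; with probability epsilon, show a random ordering."""
        commands = sorted(self.actions)
        if explore and self.rng.random() < self.epsilon:
            self.rng.shuffle(commands)
        else:
            commands.sort(key=lambda c: self.q[(state, c)], reverse=True)
        return commands[:SLOTS]

    def observe(self, state, shown, used, next_state):
        """Q-learning update for every shown command once the user picks `used`."""
        self.actions.add(used)  # first use via the full menu adds it to the action space
        best_next = max((self.q[(next_state, c)] for c in self.actions), default=0.0)
        for position, command in enumerate(shown):
            reward = position_reward(position) if command == used else 0.0
            target = reward + self.gamma * best_next
            self.q[(state, command)] += self.alpha * (target - self.q[(state, command)])

# hypothetical user habits: smooth when a surface is selected, import otherwise
habits = {"surface_selected": "Smooth Surface", "nothing_selected": "Import Mesh"}
agent = SmartMenuAgent(seed=0)
user_rng = random.Random(1)
state = "nothing_selected"
for _ in range(500):
    shown = agent.suggest(state)
    next_state = user_rng.choice(sorted(habits))
    agent.observe(state, shown, habits[state], next_state)
    state = next_state

print(agent.suggest("surface_selected", explore=False))
```

`gamma` defaults to 0 because each right-click is close to an independent, one-step decision, which reduces the update to a contextual bandit. A positive `gamma` gives full Q-learning if menu choices affect later states. A real version would also mask commands that are invalid in the current context, which this sketch leaves to the learned values.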

\noindent A problem with tabular Q-learning is that it stores a separate Q-value for every state-action pair. As the state and action spaces grow, the table quickly becomes impractical to store and too sparse to learn from. Instead, we can use a neural network to approximate the Q-function, which lets the agent generalize across similar states. This solves the space problem but adds the cost of training the network. To keep that cost manageable, we could train on minibatches, in the style of experience replay. Each interaction would be stored, and after every 10 uses of the smart context menu, the network would be trained on a randomly sampled batch of past interactions.
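Here is a minimal pure-Python sketch of that experience-replay scheme. To keep the example dependency-free, it uses a linear function approximator, $Q(s, a) = w_a \cdot \phi(s)$, in place of the neural network. The one-hot context features, the batch size of 8, the learning rate and the simulated interactions are hypothetical. As in the text, training is triggered after every 10 recorded interactions.

```python
import random
from collections import deque

class ReplayTrainer:
    """Experience replay: store interactions, train on a random minibatch every `train_every` steps.
    A linear model Q(s, a) = w[a] . phi(s) stands in for the neural network."""

    def __init__(self, n_features, n_commands, train_every=10, batch_size=8,
                 lr=0.05, capacity=1000, seed=0):
        self.w = [[0.0] * n_features for _ in range(n_commands)]
        self.buffer = deque(maxlen=capacity)  # oldest interactions are dropped first
        self.train_every, self.batch_size, self.lr = train_every, batch_size, lr
        self.rng = random.Random(seed)
        self.steps = 0
        self.updates = 0

    def q(self, phi, a):
        return sum(wi * xi for wi, xi in zip(self.w[a], phi))

    def record(self, phi, a, reward):
        self.buffer.append((phi, a, reward))
        self.steps += 1
        if self.steps % self.train_every == 0:
            self._train()

    def _train(self):
        batch = self.rng.sample(list(self.buffer), min(self.batch_size, len(self.buffer)))
        for phi, a, reward in batch:
            # SGD step on (reward - Q)^2 with a one-step target (gamma = 0)
            error = reward - self.q(phi, a)
            self.w[a] = [wi + self.lr * error * xi for wi, xi in zip(self.w[a], phi)]
        self.updates += 1

SURFACE, NOTHING = [1.0, 0.0], [0.0, 1.0]  # hypothetical one-hot context features
SMOOTH, IMPORT = 0, 1                       # hypothetical command indices
trainer = ReplayTrainer(n_features=2, n_commands=2)
for _ in range(50):
    trainer.record(SURFACE, SMOOTH, 10.0)  # picked from the top slot
    trainer.record(SURFACE, IMPORT, 0.0)   # shown but not used
    trainer.record(NOTHING, IMPORT, 10.0)
    trainer.record(NOTHING, SMOOTH, 0.0)

print(round(trainer.q(SURFACE, SMOOTH), 2), round(trainer.q(SURFACE, IMPORT), 2))
```

Swapping the linear model for a small neural network changes only `q` and the gradient step in `_train`. The buffer and the every-10-interactions trigger stay the same.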

\section*{Question 5}

\noindent Give an example of a situation where regularization is necessary for learning a good model. How about one where regularization doesn't make sense?

#### Answer:

\noindent Regularization is a technique for avoiding overfitting when training a model. For example, imagine we want to predict house prices with a regression. We could start with one feature, such as size in square feet, and use that to predict the price of a house. Chances are our one-feature model will be too simple. In other words, it will underfit the data and have high bias. \bigskip

\noindent What if we add two more features, such as the quality of the school district and a walkability score? The model will probably do better. Why not take this to the extreme and add 100 features? Unless we have a great many houses to learn from, we would begin to overfit the data and have high variance. Our model would start to pick up on the noise in our data and fail to generalize. \bigskip

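\noindent Regularization addresses this by adding a penalty on the size of the model's coefficients to the loss. It trades a little extra bias for a large reduction in variance. For example, in regression we could use lasso (L1) regularization. With lasso, instead of minimizing only the squared errors, we minimize the squared errors plus $\lambda\sum_j |w_j|$, a penalty proportional to the sum of the absolute values of the coefficients. This penalty drives the coefficients of unhelpful features to exactly zero, so lasso acts as an automatic form of feature selection. In other words, a feature keeps a nonzero coefficient only if the reduction in error it buys is larger than the penalty for using it. The strength $\lambda$ controls where we land between high bias and high variance. \bigskip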
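Here is a minimal NumPy sketch of lasso fitted by cyclic coordinate descent. It uses hypothetical house-price-style data: three informative features standing in for size, school quality and walkability, plus 97 pure-noise features. The soft-thresholding step is where a coefficient is set exactly to zero unless its correlation with the residual exceeds the penalty λ (`lam`). The data, `lam=0.5` and the iteration count are assumptions for illustration. scikit-learn's `Lasso` minimizes the same objective.

```python
import numpy as np

def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; anything inside [-lam, lam] becomes exactly 0."""
    return np.sign(rho) * max(abs(rho) - lam, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iters=100):
    """Minimize (1/2n)||y - Xw||^2 + lam * ||w||_1 by cyclic coordinate descent.
    Assumes the columns of X are standardized and y is centered, so no intercept is fitted."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iters):
        for j in range(p):
            partial_residual = y - X @ w + X[:, j] * w[j]  # residual with feature j left out
            rho = X[:, j] @ partial_residual / n
            w[j] = soft_threshold(rho, lam) / col_sq[j]
    return w

rng = np.random.default_rng(0)
n_samples, n_features = 200, 100
X = rng.standard_normal((n_samples, n_features))
true_w = np.zeros(n_features)
true_w[:3] = [3.0, 2.0, 1.0]  # only the first three features matter
y = X @ true_w + rng.standard_normal(n_samples)

X = (X - X.mean(axis=0)) / X.std(axis=0)
y = y - y.mean()
w = lasso_coordinate_descent(X, y, lam=0.5)
print("non-zero coefficients at indices:", np.flatnonzero(w))
```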

\noindent In our house price example, lasso would let us keep the features that genuinely improve the fit and zero out the rest. Adding more candidate features would then not automatically lead to overfitting. \bigskip

\noindent In general, the more training samples we have and the less complex our model is, the less need there is for regularization. For example, suppose we wanted to predict income levels with a regression, our only two features were age and whether a person had a college degree, and we had millions of samples to work with. There would be almost no variance for regularization to control. A penalty would mostly just bias the coefficients toward zero, so regularization would not make sense.

\section*{Question 6}

\noindent Your neighborhood grocery store would like to give targeted coupons to its customers, ones that are likely to be useful to them. Given that you can access the purchase history of each customer and catalog of store items, how would you design a system that suggests which coupons they should be given? Can you measure how well the system is performing?

* Candidate must be able to situate a machine learning solution within a larger ecosystem of software components.
* Answer clearly describes different software components in the system, including any applications, databases, APIs, etc., as well as how they interact with each other. It captures what data needs to be stored/retrieved for each customer, where the intended machine learning solution should be deployed, what events/triggers need to be considered, what model to use and how to train it, and how the results are communicated back in order to generate and present/send coupons. A scheme for gathering some sort of feedback to measure performance, and incorporating that into the learning algorithm, must also be laid out.
* Answer describes the different software components in the system, including any applications, databases, APIs, etc. It captures what data needs to be stored/retrieved for each customer, what model to use and how to train it, and how the results are communicated back in order to generate and present/send coupons. A scheme for gathering some sort of feedback to measure performance must also be laid out.

#### Answer:

\section*{Question 7}

Pick a company of your choice and briefly describe a hypothetical Machine Learning Engineer role at that company you would like to apply for. \bigskip

\noindent Now, if you were hired for that position starting today, how do you see your role evolving over the next year? What are your long-term career goals, and how does this position help you achieve them?

#### Answer:

\noindent As an analyst and machine learning engineer at XXXXXX Financial Technologies, I would be working to identify the best methods for distinguishing between companies that are likely to be good investments and those that are likely to disappoint. For example, I might build a reinforcement learning platform that reads in financial statements and returns whether or not a company is likely to outperform. This is no easy problem to solve, and the development of multiple models working in tandem would likely be necessary. \bigskip

\noindent As my role matured over the next year, I would research new advancements in deep learning and how they could be applied towards our goal. For example, as text processing techniques advance, it might be possible to input raw conference call transcripts into the learner. The Holy Grail would be a platform that could take in raw annual report filings and learn what is valuable information and what is not. \bigskip

\noindent My long-term goal is to start my own asset management firm. However, this goal is still decades away. In order to get from where I am professionally today to where I want to be, my focus will be on gaining experience and furthering my education in the field. My role at your firm would be ideal for accomplishing this.