
## Grokking ML interview

> Based on the course by educative.io: https://www.educative.io/courses/grokking-the-machine-learning-interview/

---

### What to expect in a machine learning interview?

Companies hiring for machine learning roles conduct interviews to assess individual abilities in various areas. You can expect the following topics to be covered in these interviews:

![](https://i.imgur.com/l1oW0M1.png)

**Problem-solving/coding**

This portion of the interview is similar to other software engineering coding interviews: the interviewer gives a coding problem, such as performing an in-order tree traversal, and the candidate is expected to solve it in about half an hour. There is ample material available on how to prepare for such questions.

**Machine learning understanding**

This area generally focuses on a candidate's understanding of basic ML concepts such as supervised vs. unsupervised learning, reinforcement learning, classification vs. regression, deep learning, optimization functions, and the learning process of various ML algorithms. Many courses and books cover these fundamentals and can help candidates prepare for this part of the interview.

**Career discussion**

Career discussion tends to focus on an individual’s resume (previous projects) and behavioral aspects, such as the ability to work in teams (conflict resolution) and career motivation. Understanding the path you want to take in your career and having the ability to discuss previous experiences and projects is required for this portion.

**Machine learning system design discussion**

This discussion focuses on the interviewee’s ability to solve an __end-to-end__ machine learning problem and consists of open-ended questions. It is an integral part of the interview, yet little preparation material is available for it. This course therefore aims to develop the thought process needed to approach ML system design questions.

In the ML system design portion, candidates are given open-ended machine learning problems and are expected to design an end-to-end machine learning system that solves them. A few examples of such problems:

- Build a recommendation system that shows relevant products to users.
- Build a visual understanding system for a self-driving car.
- Build a search-ranking system.

![](https://i.imgur.com/XEboyvw.png)



### Setting up a Machine Learning System

The first step is to narrow down the problem’s scope by asking the interviewer clarifying questions. For instance, if you are asked to design a search engine that displays the most relevant results in response to user queries, you could ask:

- Is it a general search engine like Google or Bing, or a specialized search engine like Amazon’s product search?

- What kind of queries is it expected to answer?

This will allow you to precisely define your ML problem statement as follows:

> Build a generic search engine that returns relevant results for queries like "Richard Nixon", "Programming languages" etc.

Or, you may be asked to build a system to display a Twitter feed for a user. In this case, you can discuss how the feed is currently displayed and how it can be improved to provide a better experience for the users.

After inspecting the problem from all aspects, you can narrow it down to a precise machine learning problem statement as follows:

> "Given a list of tweets, train an ML model that predicts the probability of engagement of tweets and orders them based on that score."
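
The statement above reduces to two steps: score each tweet with a model, then sort by that score. The following is a minimal Python sketch assuming a toy logistic model; the feature names (`author_followed`, `num_likes`, `age_hours`) and the weights are hypothetical and chosen only for illustration, whereas a real system would learn its weights from engagement data.

```python
import math
from dataclasses import dataclass


@dataclass
class Tweet:
    tweet_id: str
    author_followed: bool  # hypothetical feature: does the user follow the author?
    num_likes: int         # hypothetical feature: likes received so far
    age_hours: float       # hypothetical feature: hours since the tweet was posted


# Hypothetical weights; a trained model would learn these from engagement logs.
WEIGHTS = {"bias": -2.0, "author_followed": 1.5, "log_likes": 0.4, "age_hours": -0.1}


def engagement_probability(tweet: Tweet) -> float:
    """Toy logistic model: sigmoid of a weighted sum of the features."""
    z = (
        WEIGHTS["bias"]
        + WEIGHTS["author_followed"] * float(tweet.author_followed)
        + WEIGHTS["log_likes"] * math.log1p(tweet.num_likes)
        + WEIGHTS["age_hours"] * tweet.age_hours
    )
    return 1.0 / (1.0 + math.exp(-z))


def rank_feed(tweets: list[Tweet]) -> list[Tweet]:
    """Order tweets by predicted engagement probability, highest first."""
    return sorted(tweets, key=engagement_probability, reverse=True)


# Usage with made-up tweets.
feed = rank_feed([
    Tweet("a", author_followed=False, num_likes=10, age_hours=1.0),
    Tweet("b", author_followed=True, num_likes=50, age_hours=2.0),
    Tweet("c", author_followed=False, num_likes=0, age_hours=30.0),
])
print([t.tweet_id for t in feed])
```

Because the sigmoid is monotonic, sorting by probability gives the same order as sorting by the raw score; computing the probability is still useful when the score must be calibrated, for example to compare it against a threshold.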

NOTE: Some problems may require you to think about hardware components that could provide input for the machine learning models, such as the cameras feeding a self-driving car’s visual understanding system.

**Understanding scale and latency requirements**

Another very important part of the problem setup is the discussion about performance and capacity considerations of the system. This conversation will allow you to clearly understand the scale of the system and its requirements.

Let’s look at some examples of the questions you need to ask.

**Latency requirements**

If you were given the search engine problem, you would ask:

    Do we want to return the search result in 100 milliseconds or 500 milliseconds?

Similarly, if you were given the Twitter feed problem, you would ask:

    Do we want to return the list of relevant tweets in 300 milliseconds or 400 milliseconds?

**Scale of the data**

Again, for the search engine problem, you would ask:

    How many requests per second do we anticipate handling?

    How many websites do we want to make searchable through this search engine?

    If a query has 10 billion matching documents, how many of these would be ranked by our model?

And, for the Twitter feed problem, you would ask:

    How many tweets would we have to rank according to relevance for a user at a time?


The answers to these questions will guide you when you come up with the architecture of the system. **Knowing that you need to return results quickly will influence the depth and complexity of your models. If you have huge amounts of data to process, you will design the system with scalability in mind. You can find more on this in the architecture discussion section.**
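
To see how the latency and scale answers constrain the models, here is a back-of-the-envelope calculation in Python. Every number in it is a hypothetical assumption for illustration (a 50 ms slice of the latency budget reserved for ranking, 1 ms per document for a complex model, 1 µs per document for a lightweight one, 1,000 queries per second), not a figure from the course.

```python
# Back-of-the-envelope latency/scale check.
# All constants are hypothetical assumptions chosen for illustration.

RANKING_BUDGET_MS = 50            # assumed share of a 100 ms end-to-end budget
MATCHING_DOCS = 10_000_000_000    # documents matching the query
HEAVY_MODEL_US_PER_DOC = 1_000    # assumed cost of a complex model: 1 ms per doc
CHEAP_MODEL_US_PER_DOC = 1        # assumed cost of a lightweight model: 1 us per doc
QUERIES_PER_SECOND = 1_000        # assumed traffic

ranking_budget_us = RANKING_BUDGET_MS * 1_000

# How many documents can each model score, single-threaded, within the budget?
heavy_capacity = ranking_budget_us // HEAVY_MODEL_US_PER_DOC
cheap_capacity = ranking_budget_us // CHEAP_MODEL_US_PER_DOC

# How long would scoring every match with the complex model take?
full_scoring_seconds = MATCHING_DOCS * HEAVY_MODEL_US_PER_DOC // 1_000_000
full_scoring_days = full_scoring_seconds / 86_400

# If each query keeps one core busy for the whole ranking budget,
# this many cores are busy on ranking at any moment (Little's law).
cores_busy = QUERIES_PER_SECOND * RANKING_BUDGET_MS // 1_000

print(f"Complex model can score ~{heavy_capacity:,} docs in {RANKING_BUDGET_MS} ms")
print(f"Lightweight model can score ~{cheap_capacity:,} docs in {RANKING_BUDGET_MS} ms")
print(f"Scoring all matches with the complex model: ~{full_scoring_days:.0f} days")
print(f"Cores busy on ranking at {QUERIES_PER_SECOND:,} QPS: ~{cores_busy}")
```

Under these assumptions, the complex model can only afford to score a few dozen of the 10 billion matches. That gap is why a cheaper model, or a simpler retrieval step, typically narrows the candidates before the expensive model ranks the top few.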

**Defining metrics**

Now that you have figured out what machine learning problem you want to solve, the next step is to come up with metrics. Metrics will help you to see if your system is performing well.

Knowing our success criteria helps in understanding the problem and in selecting key architectural components. __This is why it’s important to discuss metrics early in our design discussions.__

**Metrics for offline testing**

You will use offline metrics to quickly test the models’ performance during the development phase. You may have generic metrics; for example, if you are performing binary classification, you might use AUC, log loss, precision, recall, and F1-score. In other cases, you might have to come up with metrics specific to the problem. For instance, for the search ranking problem, you would use __NDCG__ (normalized discounted cumulative gain) as a metric.
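
The offline metrics named above can be computed by hand. Below is a minimal, dependency-free Python sketch. The labels, scores, 0.5 decision threshold, and relevance grades are made-up toy data. The NDCG uses the linear-gain form, which sums rel_i / log2(i + 1) over 1-indexed positions i; this is one of two common variants, the other using 2^rel_i - 1 as the gain. In practice you would more likely use scikit-learn's `precision_score`, `recall_score`, `f1_score`, `log_loss`, `roc_auc_score`, and `ndcg_score`.

```python
import math


def precision_recall_f1(y_true, y_pred):
    """Binary precision, recall and F1 from 0/1 labels and 0/1 predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


def roc_auc(y_true, y_prob):
    """AUC as the chance a random positive outscores a random negative (ties count 1/2)."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def ndcg_at_k(relevances, k):
    """NDCG@k for one query; `relevances` are graded labels in the model's ranked order."""
    def dcg(rels):
        # 0-indexed position i corresponds to rank i + 1, hence log2(i + 2).
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Toy binary-classification data (made up).
y_true = [1, 0, 1, 1, 0]
y_prob = [0.9, 0.4, 0.35, 0.8, 0.1]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # assumed 0.5 threshold

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
loss = log_loss(y_true, y_prob)
auc = roc_auc(y_true, y_prob)

# Toy search-ranking data: relevance grades of results in the order they were shown.
ndcg = ndcg_at_k([3, 2, 3, 0, 1], k=5)

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
print(f"log_loss={loss:.3f} auc={auc:.3f} ndcg@5={ndcg:.3f}")
```

Note that precision, recall, and F1 depend on the chosen threshold, while log loss and AUC are computed directly from the predicted probabilities. NDCG rewards placing highly relevant results near the top and equals 1.0 only when the ranking matches the ideal order.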



