## Interview question 3-13: What’s the difference between content-based recommender systems and collaborative filtering recommender systems? When would you use one over the other?

### Example answer
- Content-based recommender systems require knowledge of the categorization or traits of the products being recommended in order to determine product similarity. Collaborative filtering relies on user behavior and user preferences to recommend products that users with similar tastes enjoyed, so it needs little or no knowledge of the products themselves.

- Hence, content-based recommenders work well when there are too few users, items, or interactions to populate the user-item matrix that collaborative filtering depends on. In other words, content-based recommenders can still make recommendations under the “cold-start” problem, as long as there is information on the features of the items/products and some information on the traits or preferences of the users. They do not need the additional data on user behavior and interactions that collaborative filtering requires.

- On the other hand, collaborative filtering is suited for scenarios in which a lot of data on user behavior is available. At times, it can be hard to gather sufficient, meaningful features that describe the products, which makes content-based recommender systems ineffective. In these cases, collaborative filtering can be more suitable.
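To make the contrast concrete, here is a minimal sketch using NumPy. The item features and the interaction matrix are made-up toy data, and cosine similarity is just one common choice of similarity measure, not the only one. The content-based view compares items by their features, while the collaborative view compares items by which users interacted with them.

```python
import numpy as np

# Hypothetical item features: columns = [action, comedy, documentary]
item_features = np.array([
    [1.0, 0.0, 0.0],  # item 0
    [1.0, 1.0, 0.0],  # item 1
    [0.0, 0.0, 1.0],  # item 2
])

# Hypothetical user-item matrix: rows = users, columns = items, 1 = interacted
interactions = np.array([
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
], dtype=float)


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for all-zero rows
    unit = vectors / norms
    return unit @ unit.T


def most_similar(sim, item):
    """Index of the item most similar to `item`, excluding itself."""
    scores = sim[item].copy()
    scores[item] = -np.inf
    return int(np.argmax(scores))


# Content-based: items are similar when their features overlap.
content_sim = cosine_similarity_matrix(item_features)

# Collaborative (item-item): items are similar when the same users engaged with both.
collab_sim = cosine_similarity_matrix(interactions.T)

print(most_similar(content_sim, 0))
print(most_similar(collab_sim, 0))
```

In this toy data the two views disagree about item 0: by features it is closest to item 1 (both are action titles), while by user behavior it is closest to item 2 (the same users interacted with both).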

Anecdotally, I once worked on a project where a collaborative filtering algorithm (ALS) worked better for users who had used the web platform for a while but poorly for new users. Using content-based filtering with XGBoost worked better for new users, and we deployed different models depending on what type of user they were. Of course, this is only one example, and it may differ for your case.
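Routing users to different models by how much history they have can be sketched as follows. This is only an illustration: the function name `pick_model`, the cutoff of five interactions, and the user labels are all hypothetical, and in practice the threshold would be chosen by evaluating both models on users with varying amounts of history.

```python
MIN_INTERACTIONS = 5  # hypothetical cutoff, to be tuned offline


def pick_model(num_interactions: int, min_interactions: int = MIN_INTERACTIONS) -> str:
    """Choose which recommender to serve, based on how much history a user has."""
    if num_interactions >= min_interactions:
        return "collaborative"  # enough behavior data for collaborative filtering
    return "content_based"      # fall back to item/user features for newer users


# Hypothetical interaction counts per user
interaction_counts = {"new_user": 0, "casual_user": 3, "veteran_user": 42}
routes = {user: pick_model(count) for user, count in interaction_counts.items()}
print(routes)
```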

## Interview question 3-14: What are some common problems encountered in recommender systems, and how would you resolve them?

### Example answer
- The cold-start problem: this is when there aren’t a lot of past data points available for an ML model to train on. Therefore, the model won’t be able to learn enough patterns from the past to accurately predict the correct results for new data points. In recommender systems, one remedy is to use content-based systems, which require less user-behavior data but do still require sufficient product-feature data. This can help with the cold-start problem and still provide recommendations to newer website users.
- Recommender systems can also encounter challenges with data quality, a problem that isn’t exclusive to recommender systems. This can include errors in the source data—for example, due to a bug while ingesting the data. This issue can be addressed by analyzing where the source data has issues and then fixing it with the teams that handle data quality (at times, data engineers or platform engineers, or the MLEs and data scientists themselves). However, identifying that there is a data quality issue is important in the first place, and some preventative measures include using data quality monitoring tools like Great Expectations to alert the team when there are shifts in the data distribution or many missing values, for example.
- When there are many missing values in an ML dataset, this is called sparsity. For example, users who sign up for a web platform with a questionnaire that asks for user preferences might not input several signup fields correctly or might skip them altogether. As an example, when someone signs up for a new Reddit account, there is a prompt that shows them common subreddits (subforums) that they may be interested in, but the user can skip this step. This is by design to make the web signup as frictionless as possible, but scenarios like this could cause data sparsity when you are trying to build a feature set for a RecSys. Possible solutions include imputation (e.g., filling in missing values with the mean or using a tree-based method to fill in the data), using collaborative filtering or matrix factorization techniques, feature engineering, and more.
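As a rough illustration of sparsity and the simplest imputation option mentioned above, the sketch below measures the fraction of missing entries in a small ratings matrix and fills the gaps with per-item means. The ratings are toy data, and falling back to the global mean for an item with no ratings at all is an assumption made here for illustration, not a recommendation for every dataset.

```python
import numpy as np

# Hypothetical ratings matrix: rows = users, columns = items, NaN = missing
ratings = np.array([
    [5.0, np.nan, np.nan, 1.0],
    [np.nan, 4.0, np.nan, np.nan],
    [np.nan, np.nan, np.nan, 2.0],
])

observed = ~np.isnan(ratings)

# Sparsity: fraction of entries that are missing
sparsity = 1.0 - observed.mean()

# Per-item mean over observed ratings only
counts = observed.sum(axis=0)
sums = np.where(observed, ratings, 0.0).sum(axis=0)
global_mean = sums.sum() / counts.sum()

# Items with no ratings at all fall back to the global mean (an assumption)
item_means = np.where(counts > 0, sums / np.maximum(counts, 1), global_mean)

# Mean imputation: keep observed values, fill the rest with the item mean
imputed = np.where(observed, ratings, item_means)

print(f"sparsity: {sparsity:.2f}")
print(imputed)
```

Mean imputation is easy to implement but flattens differences between users, which is one reason matrix factorization, which learns only from the observed entries, is often preferred for very sparse interaction data.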

## Interview question 3-15: What is the difference between explicit and implicit feedback in recommender systems? What are the trade-offs with using each type, respectively?

### Example answer
- Explicit feedback includes user ratings or reviews, while implicit feedback has to be derived from available user behavior, such as the time spent on a web page or clickstream behavior. The benefit of explicit feedback is that it gives a clearly quantified rating to use in machine learning, and its meaning is less ambiguous than that of implicit feedback. However, explicit feedback can be harder to gather since not all users will leave a review after every interaction (most don’t).

- Thus, implicit feedback, such as video watch time or time spent reading on a website, might be used to measure the user’s engagement or enjoyment instead. Of course, this can lead to imperfect measures: is the user spending a long time on the webpage because they enjoyed the content or because they were confused about the text on it? Overall, it is important to consider the trade-offs, but in practice, you can often combine both feedback signals in your ML models.
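One simple way to combine the two signals is a weighted blend that falls back to the implicit signal when no rating exists. The sketch below assumes a 1 to 5 star rating scale and a watch fraction between 0 and 1; the function name `preference_score` and the weight of 0.7 are hypothetical choices for illustration, not values from any particular system.

```python
from typing import Optional


def preference_score(
    rating: Optional[float],
    watch_fraction: float,
    explicit_weight: float = 0.7,  # hypothetical weight, to be tuned
) -> float:
    """Blend an optional 1-5 star rating with an implicit watch fraction into a 0-1 score."""
    # Clamp the implicit signal to [0, 1] to guard against noisy logging
    implicit = min(max(watch_fraction, 0.0), 1.0)
    if rating is None:
        # Most interactions have no rating, so rely on the implicit signal alone
        return implicit
    explicit = (rating - 1.0) / 4.0  # map 1-5 stars onto 0-1
    return explicit_weight * explicit + (1.0 - explicit_weight) * implicit


print(preference_score(None, 0.5))  # no rating: implicit signal only
print(preference_score(1.0, 1.0))   # watched everything but rated it 1 star
```

The second call shows why blending helps: a long watch time alone would suggest enjoyment, but the explicit one-star rating pulls the score down.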

## Interview question 3-16: How would you address imbalanced data in recommender systems?

### Example answer
- This is a common problem in ML: a few classes or categories have many more observations or data points than others, while many classes/categories have so few observations that they form a long tail.<sup>41</sup>

- To handle this issue, oversampling techniques, such as creating more data points for categories that have fewer observations, may be helpful in simpler cases. However, when there are many classes/categories of observations, simple oversampling techniques won’t be able to alleviate this issue. Additional techniques such as feature engineering and ensemble methods can be used instead, or in conjunction with oversampling. An example of ensemble methods could be creating a separate recommender for popular items versus low-engagement items.
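For the simple case, random oversampling can be sketched with NumPy as below. The labels are toy data and `random_oversample` is a hypothetical helper written for this example. It duplicates rows from smaller classes, sampling with replacement, until every class matches the size of the largest one.

```python
import numpy as np


def random_oversample(labels, rng):
    """Return row indices that balance classes by resampling smaller ones with replacement."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    indices = []
    for cls, count in zip(classes, counts):
        cls_idx = np.flatnonzero(labels == cls)
        # Draw extra copies only for classes below the target size
        extra = rng.choice(cls_idx, size=target - count, replace=True)
        indices.append(np.concatenate([cls_idx, extra]))
    return np.concatenate(indices)


# Hypothetical labels: interactions with popular items far outnumber niche ones
labels = np.array(["popular"] * 6 + ["niche"] * 2)
rng = np.random.default_rng(0)

idx = random_oversample(labels, rng)
balanced = labels[idx]
print(np.unique(balanced, return_counts=True))
```

The returned indices can be applied to the feature matrix as well as the labels. Because the minority rows are exact duplicates, this adds no new information, which is part of why it stops helping when the long tail contains many sparse categories.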

- In companies like Amazon and Spotify, combining RecSys with other families of ML techniques, such as reinforcement learning, helps ensure that long-tail products, artists, or items are shown to users at least some of the time.<sup>42</sup>
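The idea of showing long-tail items at least some of the time can be illustrated with epsilon-greedy selection, one of the simplest exploration strategies from the bandit and reinforcement learning literature. This is a stand-in chosen for brevity, not a description of what Amazon or Spotify actually run. The scores, the epsilon of 0.2, and the function name are all hypothetical.

```python
import numpy as np


def epsilon_greedy_pick(scores, epsilon, rng):
    """With probability epsilon show a random item; otherwise show the top-scored item."""
    if rng.random() < epsilon:
        return int(rng.integers(len(scores)))  # explore: any item, including the long tail
    return int(np.argmax(scores))              # exploit: the current best-scoring item


# Hypothetical predicted scores: item 0 is popular, items 2 and 3 are long-tail
scores = np.array([0.9, 0.5, 0.1, 0.05])
rng = np.random.default_rng(0)

picks = [epsilon_greedy_pick(scores, epsilon=0.2, rng=rng) for _ in range(1000)]
shown = np.bincount(picks, minlength=len(scores))
print(shown)  # how often each item was shown
```

A purely greedy policy (epsilon of 0) would show only item 0, so the long-tail items would never collect the feedback needed to improve their scores.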