# Laymanz Notebooks: ML System Design
Author: Amrbose Ling

**Our goal is to get rid of abstractions and black boxes when learning about ML**

**What is this notebook about?**

In this notebook, we will go over some of the most common fundamental ideas in ML system design. We want these notebooks to be dense and concise. There will be interactive examples, coding exercises and a final challenge made just for you to set you up for success in the later notebooks!

**What do I need to set up my environment?**

All of our notebooks use only numpy, pytorch, and matplotlib (for visualizations). If you are very eager to learn about what PyTorch is and how it works, check out this super detailed notebook on PyTorch! If you are running this on Colab, you can just import the packages. If you are running this notebook locally, remember to `pip install numpy torch matplotlib`. Check [here](https://pytorch.org/get-started/locally/) to see which torch build matches the hardware you have.

**How is this notebook structured?**

Each notebook will have

[**How to use matplotlib for plotting**](https://colab.research.google.com/github/amanchadha/aman-ai/blob/master/matplotlib.ipynb#scrollTo=1-AcMM6NSmP-)


## Breakdown
*   Overview of ML systems
*   Intro to ML system design
*   Data Engineering
*   Feature Engineering
*   Model development & Evaluation
*   Model Deployment
*   Monitoring & Continual Learning
*   Case Study 1: 
*   Case Study 2:
*   Case Study 3: 
*   Case Study 4:

## Interview Preparation:
* Linear Algebra / Calculus questions:
* Coding Questions:
* Stats questions:
* ML algorithms:
* ML workflows:


https://huyenchip.com/ml-interviews-book/contents/5.2.1.2-questions.html



# Chapter 1: Overview of ML systems

ML systems learn complex patterns from existing data and use these patterns to make predictions on unseen data.

When to use ML:
1) When the system has the capacity to learn and there is data for it to learn from (e.g. predicting rental prices of Airbnb listings)
2) When there are complex patterns to learn (stock prices, crypto prices, object detection, speech recognition)
    - Traditional software: Inputs, Pattern -> output
    - ML: Inputs, Outputs -> pattern
3) There is data and you can collect it
    - you need ML algos to learn from data
    - zero-shot learning: use an ML system trained on related tasks to make predictions without task-specific training data
    - continual learning: deploy a model and learn from incoming data in production
4) It's a prediction problem
    - make approximations with ML
5) Unseen data share patterns with training data
    - temporal relevancy of the data is very important (data from 10 years ago vs. today)
    - you can make assumptions about user behaviour
6) It's repetitive
    - if it is repetitive there should be a pattern to be learnt from it
7) Cost of wrong predictions is cheap
    - wrong predictions won't have catastrophic consequences
8) Can be scaled
    - can make a lot of predictions (inferences) at the same time
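
The contrast in point 2 (traditional software: inputs + pattern -> output; ML: inputs + outputs -> pattern) can be sketched with numpy. This is a minimal illustration, not a real pricing model: the pricing rule, the synthetic data, and the names `rule_based_price` and `learned_price` are all assumptions made up for this example.

```python
import numpy as np

# Traditional software: inputs + a hand-written pattern -> output.
def rule_based_price(size_sqm):
    # Hypothetical rule picked by a developer: $50 base + $2 per square meter.
    return 50.0 + 2.0 * size_sqm

# ML: inputs + observed outputs -> pattern (here, a fitted line).
rng = np.random.default_rng(0)
sizes = rng.uniform(20, 120, size=200)                # inputs
prices = 30.0 + 3.0 * sizes + rng.normal(0, 5, 200)   # observed outputs (synthetic)

# Least squares recovers the slope and intercept from the data alone.
X = np.column_stack([sizes, np.ones_like(sizes)])
(slope, intercept), *_ = np.linalg.lstsq(X, prices, rcond=None)

def learned_price(size_sqm):
    return intercept + slope * size_sqm

print(f"learned pattern: price = {intercept:.1f} + {slope:.2f} * size")
```

The hand-written rule never changes unless someone edits it, while the fitted line moves whenever the data does, which is also why point 5 (unseen data must share patterns with the training data) matters.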
    

**ML in research**
- SOTA performance
- fast training, high throughput
- data is static
- don't care about fairness & interpretability


**ML in production**
- different stakeholders have different requirements
- fast inference and low latency
- data is constantly changing
- data in production is very unstructured, messy, noisy, imbalanced
- must consider fairness and interpretability 

**NOTE**:
* latency: how long it takes from receiving a query to returning its result
* throughput: how many queries are processed in a given time frame
* batched queries: can mean higher latency AND higher throughput
* latency matters a lot in production
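
To see why batching can raise both latency and throughput, here is a small sketch using a toy cost model. The model is an assumption for illustration only: each batch pays a fixed overhead plus a per-query cost, and the numbers (`overhead_ms`, `per_query_ms`) are made up. It also ignores the time a query spends waiting for its batch to fill.

```python
def batch_stats(batch_size, overhead_ms=10.0, per_query_ms=1.0):
    """Toy cost model (assumed): a fixed overhead per batch plus a cost per query."""
    # Every query in the batch waits for the whole batch to finish.
    latency_ms = overhead_ms + per_query_ms * batch_size
    # Queries completed per second when batches run back to back.
    throughput_qps = batch_size / (latency_ms / 1000.0)
    return latency_ms, throughput_qps

for bs in (1, 8, 64):
    latency, throughput = batch_stats(bs)
    print(f"batch={bs:>2}  latency={latency:.0f} ms  throughput={throughput:.0f} queries/s")
```

Under this model, larger batches spread the fixed overhead over more queries, so throughput goes up, but each individual query waits longer for its answer.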


# Chapter 2: Intro to ML System Design


Business objective & ML objective
- you want ML objective to align with the business objective
- some use cases of ML in business are very common (ad click-through rate, fraud detection), so it's easy to make that ML -> business mapping


ML system requirements:
- Reliability: 
    - system needs to perform the correct function despite faults, human error
    - failing in production:
        - operational failure / violation: timeout, 404 errors, OOM, seg fault
        - performance failure: not making the right predictions
        - software system failures:
        - ML specific failures:
- Scalability: 
    - scaling of compute resources (can be super costly)
- Maintainability:
    - structuring workloads
    - set up infrastructure 
    - versioning of code, data, models

- Adaptability:
    - adapt to data distribution shifts
    - change in business requirements
    - allowing updates without service interruption
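
Adapting to data distribution shifts starts with noticing them. Below is a minimal sketch of one simple check: compare the mean of a feature in production against its mean at training time, measured in training standard deviations. The function name `mean_shift_score`, the synthetic data, and the `THRESHOLD` value are assumptions for illustration; real monitoring usually relies on more robust tests.

```python
import numpy as np

def mean_shift_score(train_values, prod_values):
    """Absolute difference in means, in units of the training standard deviation."""
    train_values = np.asarray(train_values, dtype=float)
    prod_values = np.asarray(prod_values, dtype=float)
    return abs(prod_values.mean() - train_values.mean()) / train_values.std()

rng = np.random.default_rng(0)
train = rng.normal(loc=0.0, scale=1.0, size=5000)         # feature at training time
prod_same = rng.normal(loc=0.0, scale=1.0, size=5000)     # production, no shift
prod_shifted = rng.normal(loc=1.0, scale=1.0, size=5000)  # production, mean moved by 1

THRESHOLD = 0.25  # hypothetical alert threshold, tune per feature

score_same = mean_shift_score(train, prod_same)
score_shifted = mean_shift_score(train, prod_shifted)
print("no shift flagged:", score_same > THRESHOLD)
print("shift flagged:   ", score_shifted > THRESHOLD)
```

A check like this only catches changes in the mean of a single feature, so it is a starting point rather than a full drift detector.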


Iterative process:
- Choose a metric to optimize (e.g. impressions: the number of times an ad is shown)
- collect data
- engineer features
- train models
- realize there are errors in the labels, relabel the data
- train the model again (repeat)
- model performs well, but need to update on newer data
- train the model again
- deploy
- repeat...

1. Project scoping
2. Data engineering
3. ML model development
4. Deployment
5. Monitoring and continual learning
6. Business analysis

**NOTE**: you always go back and forth between these steps

Types of ML tasks:
- Regression
- Classification
    - Binary
    - Multiclass
        - Low cardinality
        - High cardinality (lots of classes, need lots of data)
        
    - Multilabel
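
The task types above differ mainly in how raw model scores are turned into predictions. The sketch below shows this with numpy. The logits are made-up values and the 0.5 threshold is a common default, not something prescribed by these notes.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Raw scores (logits) for 2 examples over 3 classes; values are made up.
logits = np.array([[2.0, -1.0, 0.5],
                   [-0.5, 1.5, 1.0]])

# Multiclass: classes compete, probabilities sum to 1, pick exactly one.
multiclass_pred = softmax(logits).argmax(axis=1)

# Multilabel: each class gets its own yes/no decision, so zero or several can fire.
multilabel_pred = (sigmoid(logits) > 0.5).astype(int)

# Binary: the single-score special case of the sigmoid decision.
binary_pred = (sigmoid(logits[:, 0]) > 0.5).astype(int)

# Regression: the raw score itself is the prediction, with no squashing.
regression_pred = logits[:, 0]

print("multiclass:", multiclass_pred)
print("multilabel:", multilabel_pred.tolist())
print("binary:    ", binary_pred)
print("regression:", regression_pred)
```

Note how the same second row of logits gives one class in the multiclass setting but two active labels in the multilabel setting.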
