# Notes

- Production Machine Learning can be viewed as the combination of Machine Learning itself and the knowledge and skills required in modern software development.
- If you're working on a Machine Learning team in industry, you need expertise in both Machine Learning and software to be successful. This is because your team will not just be producing a single result; you'll be developing a product or service that operates continuously and may be a mission-critical part of your company's work.
- Often the most challenging aspects of building machine learning systems turn out to be the ones you least expected, such as deployment. Being able to build a model is one thing, but getting it into people's hands and seeing how they use it can be very eye-opening.

<img src='img/2.png' width="600" height="300" align="center"/>


### The Machine Learning Project Lifecycle
- **Edge device:** A device that lives inside the factory manufacturing the smartphones; the edge device runs a piece of inspection software.
- **Inspection software:** Software whose job is to take a picture of each phone, check whether it has a scratch, and then decide whether the phone is acceptable or not.
- Together, this setup is called **Automated Visual Defect Inspection**.
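The inspection software's decision step can be sketched as: score an image with a model, then compare the score against a threshold to accept or reject the phone. This is a minimal sketch under stated assumptions: the image is a flat list of pixel intensities in [0, 1], and `inspect_phone`, `toy_model`, and the 0.5 threshold are hypothetical placeholders for a real trained model and a tuned cut-off.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class InspectionResult:
    defect_probability: float
    accepted: bool


def inspect_phone(
    image: Sequence[float],
    predict_defect_probability: Callable[[Sequence[float]], float],
    threshold: float = 0.5,
) -> InspectionResult:
    """Score one phone image and decide whether the phone passes inspection."""
    p = predict_defect_probability(image)
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"model returned an invalid probability: {p}")
    # Reject the phone once the predicted defect probability reaches the threshold.
    return InspectionResult(defect_probability=p, accepted=p < threshold)


def toy_model(image: Sequence[float]) -> float:
    # Hypothetical stand-in for a trained model: uses the brightest pixel as a
    # crude "scratch score". A real system would run a trained classifier here.
    return max(image)


print(inspect_phone([0.1, 0.2, 0.05], toy_model).accepted)  # True
print(inspect_phone([0.1, 0.9, 0.05], toy_model).accepted)  # False
```

Keeping the model behind a plain callable means the same decision logic works whether predictions come from a model on the edge device or from a prediction server in the cloud.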

<img src='img/3.png' width="600" height="300" align="center"/>

- The model maps input images X of phones to predictions Y (defective or not?)
- The prediction server is sometimes in the cloud, and sometimes at the edge device as well (in manufacturing, edge deployment is common because you can't have your factory go down every time your internet access goes down)
- **For many ML projects, maybe only 5-10% (or less) of the code is actually ML model code.**
- This is one of the reasons why, even once you have a **proof-of-concept (POC)** model working, it can still take a lot of work to get to production deployment.

<img src='img/4.png' width="600" height="300" align="center"/>

- In this course we learn all of the "other" pieces of software needed for a valuable production deployment

<img src='img/5.png' width="600" height="300" align="center"/>

- The "MLOps Amoeba":

<img src='img/6.png' width="900" height="450" align="center"/>

## Steps of an ML Project

<img src='img/1.png' width="900" height="450" align="center"/>

- Potential key metrics in an NLP project:
    - Accuracy
    - Latency
    - Throughput

#### Data Definition Questions:
- Is the data labeled consistently?
- For an audio clip: how much silence before/after each clip?
- How to perform volume normalization?
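Two of these data definition questions can be made concrete in code: how much silence to keep around each clip, and how to normalize volume. This is a minimal sketch, assuming each clip is a list of float samples in [-1, 1]; RMS normalization is one common choice (peak normalization is another), and the `threshold`, `pad`, and `target_rms` defaults are hypothetical values that a real team would have to agree on.

```python
import math
from typing import List


def trim_silence(samples: List[float], threshold: float = 0.01, pad: int = 0) -> List[float]:
    """Remove leading/trailing silence, keeping `pad` samples of it on each side.

    A sample counts as silence when its absolute amplitude is below `threshold`.
    """
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []  # the whole clip is silence
    start = max(0, loud[0] - pad)
    end = min(len(samples), loud[-1] + 1 + pad)
    return samples[start:end]


def normalize_rms(samples: List[float], target_rms: float = 0.1) -> List[float]:
    """Rescale a clip so its root-mean-square (RMS) amplitude equals `target_rms`."""
    if not samples:
        return []
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return list(samples)  # all-zero clip: nothing to scale
    gain = target_rms / rms
    return [s * gain for s in samples]


clip = [0.0, 0.001, 0.5, -0.5, 0.002, 0.0]
print(trim_silence(clip))         # [0.5, -0.5]
print(trim_silence(clip, pad=1))  # [0.001, 0.5, -0.5, 0.002]
print(normalize_rms(trim_silence(clip)))
```

The specific numbers matter less than applying the same choice to every clip; inconsistent trimming or normalization across the dataset is exactly the kind of labeling inconsistency the first question asks about.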


- A lot of progress in ML has been driven by research aimed at improving performance on benchmark datasets.
- What are some systematic frameworks to ensure you have high-quality data in a live production environment?

#### Three Key Inputs to Training an ML Model
- Code (algorithm/model architecture)
- Hyperparameters
- Data


- In a lot of **research/academia**, the data is held fixed, while the code and hyperparameters may vary in order to try to get good performance
- In contrast, on a lot of **product teams**, where the main goal is to build and deploy a working, valuable machine learning system, it can be more effective to hold the code fixed and instead focus on optimizing the hyperparameters and the data.


An ML System = Code + Data (+ Hyperparameters)
- Rather than taking a model-centric view of trying to optimize the code for a fixed dataset, for many problems you can use an open-source model implementation and instead focus on optimizing your data.
- Part of the trick is not to assume you always need to collect more data. Collecting more and more data helps, but it is expensive and time-consuming; if error analysis can tell you exactly which data to collect, you can build an accurate model much more efficiently.
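The error-analysis idea above can be sketched as a simple tally: tag each misclassified dev-set example with its conditions, then rank tags by how many errors they account for. This is a minimal sketch, assuming each example is a `(tags, correct)` pair; the function name `error_analysis` and the tags and counts in `dev_set` are made up purely for illustration.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def error_analysis(
    examples: Iterable[Tuple[Sequence[str], bool]],
) -> List[Tuple[str, int, int, float]]:
    """For each tag, return (tag, n_errors, n_examples, error_rate),
    sorted so the tag with the most errors comes first."""
    totals = Counter()
    errors = Counter()
    for tags, correct in examples:
        for tag in set(tags):  # count each tag at most once per example
            totals[tag] += 1
            if not correct:
                errors[tag] += 1
    rows = [(tag, errors[tag], totals[tag], errors[tag] / totals[tag]) for tag in totals]
    rows.sort(key=lambda row: (-row[1], row[0]))  # most errors first, then by name
    return rows


# Hypothetical, hand-made dev set: (tags describing the example, model was correct?)
dev_set = [
    (["car noise"], False),
    (["car noise"], False),
    (["car noise"], True),
    (["people noise"], True),
    (["people noise", "low bandwidth"], False),
    (["low bandwidth"], True),
]

for tag, n_err, n, rate in error_analysis(dev_set):
    print(f"{tag:14s} {n_err}/{n} errors ({rate:.0%})")
```

A tag that accounts for many errors (here, "car noise") suggests collecting or augmenting more data of that kind, rather than collecting more data of every kind.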

<img src='img/7.png' width="900" height="450" align="center"/>

- Even after an ML system like the one outlined above is up and running, however, **you still need to monitor and maintain the system.**
- **Key challenges when it comes to deployment:**
    - Concept drift
    - Data drift
    
### Course Outline
- Deployment
- Modeling
- Data


- MLOps (Machine Learning Operations) is an emerging discipline, and comprises a set of tools and principles to support progress through the ML project lifecycle.
- A key idea in MLOps is that there are systematic ways to think about scoping, data, modeling, and deployment, and also software tools to support best practices

### Deployment: Key Challenges 
- Two major categories of challenges in deploying an ML model
    - 1) ML or statistical issues
    - 2) Software Engineering Issues
    
#### Concept Drift and Data Drift
- Loosely: what if your data changes after your system has already been deployed?
- Data shift types:
    - Gradual change
    - Sudden change/shock
- The terminology for different kinds of data change is not used completely consistently.
- **Data Drift:** When the distribution of the input X changes (in X $\Rightarrow$ Y)
- **Concept Drift:** When the desired mapping from X $\Rightarrow$ Y changes
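Data drift can be monitored without labels by comparing the distribution of a live input feature against the same feature in the training data. This is a minimal sketch of one common choice, the two-sample Kolmogorov-Smirnov (KS) statistic, written with the standard library only; `data_drift_alarm`, its 0.2 threshold, and the "volume" feature are hypothetical. In practice, `scipy.stats.ks_2samp` computes the same statistic along with a p-value.

```python
import bisect
from typing import Sequence


def ks_statistic(reference: Sequence[float], live: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap between
    the empirical CDFs of the two samples (0 for identical samples, 1 for
    completely separated ones)."""
    if not reference or not live:
        raise ValueError("both samples must be non-empty")
    ref = sorted(reference)
    cur = sorted(live)
    gap = 0.0
    # Both empirical CDFs are step functions, so the largest gap occurs at one
    # of the observed values; evaluate both CDFs at every one of them.
    for x in ref + cur:
        cdf_ref = bisect.bisect_right(ref, x) / len(ref)
        cdf_cur = bisect.bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap


def data_drift_alarm(reference: Sequence[float], live: Sequence[float], threshold: float = 0.2) -> bool:
    """Flag possible data drift when live inputs move far from the training inputs."""
    return ks_statistic(reference, live) > threshold


# Hypothetical input feature (e.g. clip volume) at training time vs. in production.
training_volume = [i / 100 for i in range(100)]
live_volume = [0.5 + i / 100 for i in range(100)]
print(data_drift_alarm(training_volume, training_volume))  # False
print(data_drift_alarm(training_volume, live_volume))      # True
```

This only covers data drift (a change in X). Concept drift (a change in the X $\Rightarrow$ Y mapping) can leave the input distribution untouched, so detecting it generally requires fresh labels, for example by tracking accuracy on a recently labeled sample.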

#### Software Engineering Issues
- Where the model runs is a key software decision: a lot of speech systems in cars run on edge devices, and some speech recognition systems on mobile devices run in the browser or on the device itself.
- Make sure you have similar CPU/GPU/memory capacity in deployment environment as you do in the testing environment 
- **QPS = Queries Per Second**
- In speech recognition applications, it's not unusual to aim to get a response back to the user within half a second, or 500 ms.
    - Of this 500 ms budget, you may be able to allocate 300 ms to your speech recognition task; this gives a latency requirement for your system
    - **Throughput** refers to "how many queries per second do you need to handle, given your compute resources?" (maybe given a certain number of cloud servers)
- **Logging:** When building your system, it may be useful to log as much of the data as possible, both for analysis and review and to provide more data for retraining your learning algorithm in the future.
- **Security and Privacy:** For different applications the required levels of security and privacy can be very different.
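Latency, throughput, and logging can be tied together in a small wrapper around the model call. This is a minimal sketch under stated assumptions: `predict_and_log` and `estimated_max_qps` are hypothetical helpers, the log is an in-memory list of JSON lines standing in for a real log file or logging service, inputs must be JSON-serializable, and the 300 ms budget comes from the speech recognition example above.

```python
import json
import time
from typing import Any, Callable, List

LATENCY_BUDGET_MS = 300.0  # speech recognition's share of a 500 ms response budget


def predict_and_log(
    predict: Callable[[Any], Any],
    x: Any,
    log: List[str],
    budget_ms: float = LATENCY_BUDGET_MS,
) -> Any:
    """Run one prediction, time it, and append a JSON record for later review/retraining."""
    start = time.perf_counter()
    y = predict(x)
    latency_ms = (time.perf_counter() - start) * 1000.0
    record = {
        "timestamp": time.time(),
        "input": x,  # raw inputs may be sensitive; check privacy requirements first
        "output": y,
        "latency_ms": round(latency_ms, 3),
        "over_budget": latency_ms > budget_ms,
    }
    log.append(json.dumps(record))
    return y


def estimated_max_qps(mean_latency_ms: float, n_workers: int) -> float:
    """Rough throughput ceiling if each worker serves one query at a time."""
    return n_workers * 1000.0 / mean_latency_ms


prediction_log: List[str] = []
# Hypothetical stand-in model that ignores the audio and returns a fixed transcript.
text = predict_and_log(lambda audio: "turn on the lights", [0.0, 0.1, -0.1], prediction_log)
print(text)
print(prediction_log[0])
print(estimated_max_qps(mean_latency_ms=250.0, n_workers=4))  # 16.0
```

The QPS estimate is deliberately crude: it ignores batching, queueing, and network overhead, but it shows how a per-query latency and a fixed number of servers together bound the throughput you can handle.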

<img src='img/7.png' width="600" height="300" align="center"/>

- **Deploying a system requires two broad sets of tasks:**
    - Writing the **software** to enable you to deploy the system into production.
    - What you need to do to **monitor the system performance and continue to maintain it**, especially in the case of:
        - **Concept Drift**
        - **Data Drift**
- One of the things you see when building machine learning systems is that the practices for the very first deployment will be quite different compared to when you are updating or maintaining a system that has already previously been deployed
- Some ML engineers view deploying the model as "getting to the finish line," but reaching the first deployment usually means you're only about halfway there.
    - Even after deployment, there's a lot of work to feed data back and possibly update the model, so that it keeps performing well as the data changes.
    
<img src='img/8.png' width="600" height="300" align="center"/>

### Deployment patterns
- When deploying systems, there are a number of common use cases, as well as different patterns for how you would deploy, depending on your use case.

#### Common Deployment Cases 

<img src='img/x.png' width="600" height="300" align="center"/>