# Model Serving Architecture

## Documentation on model servers
---
The video lecture covered some of the most popular model servers: TensorFlow Serving, TorchServe, KubeFlow Serving, and the NVIDIA Triton Inference Server. Here are links to the relevant documentation for each of these options:

- <a href = "https://www.tensorflow.org/tfx/serving/architecture">TensorFlow Serving </a>
- <a href = "https://github.com/pytorch/serve">TorchServe</a>
- <a href = "https://www.kubeflow.org/docs/external-add-ons/serving/">KubeFlow Serving</a>
- <a href = "https://developer.nvidia.com/nvidia-triton-inference-server">NVIDIA Triton</a>


## Ungraded Lab - Deploy an ML model with FastAPI and Docker
---
During this lab you will work with FastAPI and Docker to deploy a Dockerized version of your model while learning important concepts for container-based applications.

Follow this <a href = "https://github.com/https-deeplearning-ai/machine-learning-engineering-for-production-public/blob/main/course4/week2-ungraded-labs/C4_W2_Lab_1_FastAPI_Docker/README.md">link</a> to start the lab!

# Scaling Infrastructure

## Learn about scaling with boy bands
---
In the next few minutes you’ll learn about horizontal and vertical scaling. Before going into that, here’s a fun case study on managing scale. 

In this extreme case, the famous boy band One Direction hosted a 10-hour live stream on YouTube, during which they told fans every 10 minutes to visit a website with a quiz on it. This led to a really interesting scalability pattern: the application had almost no usage for the vast majority of the time, but every 10 minutes hundreds of thousands of people might hit it at once.

This is a hard scaling problem, and a system with this traffic pattern could be very expensive to operate. Using smart scaling strategies, Sony Music and Google solved it very inexpensively. Laurence isn’t allowed to share how much the cloud services cost, but when he and several of the other engineers went out for celebration drinks after the success of the project, the bar bill was more expensive than the cloud bill. (And they didn’t drink a lot!)
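
To get a feel for why this traffic pattern rewards scaling on demand, here is a rough back-of-the-envelope sketch. All numbers (burst size, per-replica capacity) are made up for illustration and are not taken from the case study, and the "autoscaled" figure assumes an idealized system that can scale to zero instantly between bursts.

```python
import math

def replicas_needed(requests_per_second: float, capacity_per_replica: float) -> int:
    """Smallest number of replicas that can absorb the load (0 when idle)."""
    if requests_per_second <= 0:
        return 0
    return math.ceil(requests_per_second / capacity_per_replica)

# Hypothetical numbers, chosen only for illustration.
STREAM_MINUTES = 600          # 10-hour live stream
BURST_EVERY_MINUTES = 10      # fans are sent to the quiz every 10 minutes
BURST_RPS = 5_000             # assumed requests/second during a burst minute
CAPACITY_PER_REPLICA = 100    # assumed requests/second one replica can serve

# Per-minute load: a spike on every 10th minute, idle otherwise.
load = [
    BURST_RPS if minute % BURST_EVERY_MINUTES == 0 else 0
    for minute in range(STREAM_MINUTES)
]

# Fixed capacity: provision for the peak and keep it running the whole time.
peak_replicas = max(replicas_needed(rps, CAPACITY_PER_REPLICA) for rps in load)
static_replica_minutes = peak_replicas * STREAM_MINUTES

# Idealized scaling on demand: pay only for the replicas each minute needs.
autoscaled_replica_minutes = sum(
    replicas_needed(rps, CAPACITY_PER_REPLICA) for rps in load
)

print(f"Peak replicas:              {peak_replicas}")
print(f"Static replica-minutes:     {static_replica_minutes}")
print(f"Autoscaled replica-minutes: {autoscaled_replica_minutes}")
```

With these assumed numbers, fixed peak provisioning uses ten times as many replica-minutes as the idealized on-demand case. Real autoscalers need warm-up time, so the actual savings would be smaller.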

Check out the talk about how scaling worked for this system here: https://www.youtube.com/watch?v=aIxNm5Eed_8

Learn about the event and the app here: https://www.computerweekly.com/news/2240228060/Sony-Music-Google-cloud-One-Directions-1D-Day-event-platform-services


## Ungraded Lab: Intro to Kubernetes
---
In this lab, you will get more hands-on practice with Kubernetes in preparation for this week's graded assignment. If you haven't already, please clone the public repo. You can do so with the following command:

git clone https://github.com/https-deeplearning-ai/machine-learning-engineering-for-production-public

If you've already cloned this before, please do a git pull to make sure that you have the latest version of the files.

After that, please navigate to course4/week2-ungraded-labs/C4_W2_Lab_2_Intro_to_Kubernetes/ and read the root README.md with your favorite Markdown reader. Alternatively, you can go here to use GitHub's built-in Markdown viewer. Either way, that README file contains the instructions on how to run the lab on your machine.

In case you run into any issues, remember to post them in Discourse so mentors and course staff can assist.

Happy learning!

# Online Inference

## Ungraded Lab - Latency testing with Docker Compose and Locust
---
During this lab you will work with Docker Compose and Locust to perform load testing on the servers you coded in the previous ungraded lab.

Follow this <a href = "https://github.com/https-deeplearning-ai/machine-learning-engineering-for-production-public/blob/main/course4/week2-ungraded-labs/C4_W2_Lab_3_Latency_Test_Compose/README.md">link</a> to start the lab!
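
The lab itself uses Locust to generate load. As a rough idea of what a latency test measures, here is a minimal standard-library sketch (not Locust) that times repeated calls and reports percentiles. The `fake_predict` function is a hypothetical stand-in for a request to your model server.

```python
import statistics
import time
from typing import Callable, Dict, List

def measure_latencies_ms(call: Callable[[], object], n_requests: int) -> List[float]:
    """Call `call` sequentially n_requests times and record each duration in ms."""
    latencies = []
    for _ in range(n_requests):
        start = time.perf_counter()
        call()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies

def summarize(latencies_ms: List[float]) -> Dict[str, float]:
    """Return median and tail latencies; needs at least two samples."""
    # quantiles(n=100) returns 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
        "max": max(latencies_ms),
    }

def fake_predict() -> int:
    """Hypothetical stand-in for a prediction request (just burns a little CPU)."""
    return sum(range(10_000))

latencies = measure_latencies_ms(fake_predict, n_requests=200)
for name, value in summarize(latencies).items():
    print(f"{name}: {value:.3f} ms")
```

Tail percentiles such as p95 and p99 are usually more informative than the average, because a few slow requests can hide behind a good mean. Note that this sketch issues requests one at a time, whereas a load-testing tool simulates many concurrent users.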

# Data Preprocessing

## Data preprocessing
---
Apache Beam gives you a unified programming model for implementing batch and streaming data processing jobs that can run on any supported execution engine. It’s well suited to data preprocessing!

Go to https://beam.apache.org/get-started/try-apache-beam/ to try Apache Beam in a Colab so you can get a handle on how the APIs work. Make sure you try it in Python as well as Java by using the tabs at the top. 

Note: You can click the Run in Colab button below the code snippet to launch Colab. In the Colab menu bar, click Runtime > Change Runtime type then select Python 3 before running the code cells. You can get more explanations on the WordCount example here and you can use the Beam Programming Guide as well to look up any of the concepts.
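
For orientation before opening the Colab, here is a minimal word-count sketch in the Beam Python SDK. It is a simplified illustration rather than the official WordCount example, and it assumes `apache-beam` is installed (for example with `pip install apache-beam`). The names `extract_words` and `run_word_count` are made up for this sketch.

```python
from typing import Iterable, List

def extract_words(line: str) -> List[str]:
    """Lower-case a line and split it on whitespace."""
    return line.lower().split()

def run_word_count(lines: Iterable[str]) -> None:
    """Count words with a Beam pipeline on the default local runner."""
    # Imported here so extract_words stays usable without Beam installed.
    import apache_beam as beam

    # Leaving the `with` block runs the pipeline and waits for it to finish.
    with beam.Pipeline() as pipeline:
        (
            pipeline
            | "CreateLines" >> beam.Create(list(lines))
            | "ExtractWords" >> beam.FlatMap(extract_words)
            | "CountPerWord" >> beam.combiners.Count.PerElement()
            | "PrintCounts" >> beam.Map(print)
        )
```

Calling `run_word_count(["to be or not to be"])` prints one `(word, count)` tuple per distinct word, in no guaranteed order. The same pipeline definition can in principle be handed to a different runner through pipeline options, which is the sense in which the programming model is unified.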

To learn about TensorFlow Transform, see https://www.tensorflow.org/tfx/transform/get_started. It also uses Beam-style pipelines but has modules optimized for preprocessing TensorFlow datasets.

# Batch Processing with ETL

## Ungraded Lab (Optional): Machine Learning with Apache Beam and TensorFlow
---
This optional lab will show you how to preprocess data, train a machine learning model, and make batch predictions with it using Apache Beam and TensorFlow Transform. To avoid the cost of using Cloud resources, you will run the entire pipeline in Colab. We linked the original article, which gives the option to run on GCP in case you want to give it a shot afterward.

Click <a href = "https://colab.research.google.com/github/https-deeplearning-ai/machine-learning-engineering-for-production-public/blob/main/course4/week2-ungraded-labs/C4_W2_Lab_4_ETL_Beam/C4_W2_Lab_4_Apache_Beam_and_Tensorflow.ipynb">here</a> to launch Colab!

