Our MLOps Zoomcamp course
- Sign up here: https://airtable.com/shrCb8y6eTbPKwSTL (it's not automated, so you will not receive an email immediately after filling in the form)
- Register in DataTalks.Club's Slack
- Join the #course-mlops-zoomcamp channel
- Tweet about the course!
Teach practical aspects of productionizing ML services — from collecting requirements to model deployment and monitoring.
Data scientists and ML engineers, as well as software engineers and data engineers interested in learning about putting ML in production.
- Python
- Docker
- Being comfortable with the command line
- Prior exposure to machine learning (at work or from other courses, e.g. from ML Zoomcamp)
- Prior programming experience (1+ years of professional experience)
Course start: May 16
There are five modules in the course and one project at the end. Each module consists of 1-2 lessons and a homework assignment. Each lesson is 60-90 minutes long.
This is a draft and will change.
- What is MLOps
- Running example: NY Taxi trips dataset
- Why do we need MLOps
- Course overview
- Environment preparation
- CRISP-DM, CRISP-ML
- ML Canvas
- Data Landscape canvas
- (optional) MLOps Stack Canvas
- Documentation practices in ML projects (Model Cards Toolkit)
Instructors: Larysa Visengeriyeva
2 hours
- Tracking experiments
- MLflow
- Model registry
- ML pipelines, TFX, Kubeflow Pipelines
- Scheduling pipelines (Airflow?)
- Model testing
Instructors: Cristian Martinez, Theofilos Papapanagiotou
Homework:
- ? Something with MLflow, perhaps, as it's easier to run locally
- Batch vs online
- For online: web services vs streaming
- Serving models with Kubeflow+Kubernetes (refer to ML Zoomcamp)
- Serving models in Batch mode (AWS Batch, Spark)
- Streaming (Kinesis/SQS + AWS Lambda)
Instructors: Alexey Grigorev
Homework:
- Deploy a model with Spark (local mode)
- ML monitoring vs software monitoring
- Data quality monitoring
- Data drift / concept drift
- Batch vs real-time monitoring
- Tools: Evidently
- Tools: Prometheus/Grafana
Instructors: Emeli Dral
Homework:
- ?
Other things:
- Data quality issues
- Alerts
- DevOps
- Virtual environments and Docker
- Python: logging, linting
- Testing: unit, integration, regression
- CI/CD (GitHub Actions)
- Infrastructure as code (Terraform, CloudFormation)
- Cookiecutter
- Makefiles
Instructors: Sejal Vaidya
Homework:
- ?
- End-to-end project with all the things above
To make it easier to connect the modules, we'd like to use the same running example throughout the course.
Possible candidates:
- https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page - predict the ride duration or whether the driver will be tipped
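
To make the two candidate targets for the running example concrete, here is a minimal sketch of how they might be derived from a single trip record. The record, its field names (`pickup_datetime`, `dropoff_datetime`, `tip_amount`), and the helper functions are hypothetical and chosen for illustration; the actual dataset schema and the course's approach may differ.

```python
from datetime import datetime

# Hypothetical trip record: the field names and values are assumptions
# for illustration, not the dataset's actual schema.
trip = {
    "pickup_datetime": "2021-01-01 00:15:56",
    "dropoff_datetime": "2021-01-01 00:19:52",
    "tip_amount": 0.0,
}

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def ride_duration_minutes(record):
    """Regression target: trip duration in minutes."""
    pickup = datetime.strptime(record["pickup_datetime"], TIMESTAMP_FORMAT)
    dropoff = datetime.strptime(record["dropoff_datetime"], TIMESTAMP_FORMAT)
    return (dropoff - pickup).total_seconds() / 60


def was_tipped(record):
    """Classification target: True if the recorded tip is greater than zero."""
    return record["tip_amount"] > 0


duration = ride_duration_minutes(trip)
tipped = was_tipped(trip)
print(f"duration: {duration:.2f} min, tipped: {tipped}")
```

The first target leads to a regression problem and the second to a binary classification problem, so either choice would give the modules a single model to track, deploy, and monitor.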
- Larysa Visengeriyeva
- Cristian Martinez
- Theofilos Papapanagiotou
- Alexey Grigorev
- Emeli Dral
- Sejal Vaidya
- Machine Learning Zoomcamp - free 4-month course about ML Engineering
- Data Engineering Zoomcamp - free 9-week course about Data Engineering
I want to start preparing for the course. What can I do?
If you haven't used Flask or Docker
- Check Module 5 from ML Zoomcamp
- The section about Docker from Data Engineering Zoomcamp could also be useful
If you have no previous experience with ML