Python-based workflows - from notebook to production
This material is for intermediate-level data scientists, developers, data engineers, and researchers. Specifically, it is for those who have some interest in or experience with developing ML/AI models on sample data sets (perhaps in Jupyter), but who might struggle to understand the full ML/AI workflow and to scale, deploy, and productionize their work. They need to understand which Python tools to use as they scale workflows beyond the notebook, and they need to understand how to manage and distribute work on large data.
- Slides from the class
- Instructor - Daniel Whitenack
- Prerequisites/getting started:
- You will need to ssh into a cloud instance. Remind yourself of how to do that and install a client if needed:
- You will also need to work a bit at the command line. If you are new to the command line or need a refresher, look through this quick tutorial.
- If you need further help productionizing ML/AI workflows, want to bring this class to your company, or just have ML/AI related questions, Ardan Labs is here to help! Reach out to the instructor using the links above or via the Ardan Labs website.
Note: This material has been designed to be taught in a classroom environment. The code is well commented but missing some of the contextual concepts and ideas that will be covered in class.
Introduction to Python tooling and ML/AI workflows
This material introduces some of the commonly used Python tooling for data science and ML/AI. It also introduces the ML/AI model development workflow. Once you are done with this material, you will understand what sets of tools are used in producing AI models, and how data scientists often interact with those tools.
This material introduces some of the pain points and pitfalls that people encounter when trying to productionize data science work. Once you are done with this material, you will understand what the common pain points are and the guiding principles that will help us overcome them.
Using frameworks that scale
This material introduces some methods and frameworks that will help our workflow scale beyond local sample data. Once you are done with this material, you will be exposed to some of the more scalable Python frameworks in the ecosystem (e.g., PyTorch) and have some experience refactoring modeling code for production.
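As a minimal sketch of what "refactoring for production" can mean, the hypothetical functions below pull notebook-style cell code into small, importable, testable units with explicit inputs and outputs (all names here are illustrative, not from the class material):

```python
# Refactoring notebook cells into explicit, reusable functions
# (hypothetical example; not the course's actual modeling code).

def normalize(values):
    """Scale a list of numbers into the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # guard against constant input
    return [(v - lo) / span for v in values]

def train(features):
    """Stand-in for model fitting: return the mean of the
    normalized features as a trivial 'model' parameter."""
    normed = normalize(features)
    return sum(normed) / len(normed)

if __name__ == "__main__":
    # In a notebook these steps are often scattered across cells;
    # as functions they can be imported, tested, and scaled.
    model = train([2.0, 4.0, 6.0, 8.0])
    print(model)
```

The point is not the toy math but the shape: once logic lives in named functions rather than cells, it can move into a package, a job, or a framework like PyTorch without rewriting the workflow from scratch.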
Breaking our workflow up into pipeline stages
This material walks you through breaking up a workflow, contained in a Jupyter notebook, into separate, scalable pipeline stages. Once you are done with this material, you will understand which portions of an ML/AI pipeline might benefit from being managed in isolation. You will also get some experience writing code for specific stages of a data pipeline (pre-processing, training, inference).
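The idea of stage isolation can be sketched as three functions that communicate only via explicit artifacts (the stage names and data here are hypothetical placeholders, not the class's actual pipeline):

```python
# A toy pipeline split into isolated stages. Each stage takes an
# explicit input artifact and returns an explicit output artifact,
# so any stage could be run, scaled, or replaced independently.

def preprocess(raw):
    """Pre-processing stage: drop missing records, coerce types."""
    return [float(r) for r in raw if r is not None]

def train(clean):
    """Training stage: produce a model artifact (here, a threshold)."""
    return {"threshold": sum(clean) / len(clean)}

def infer(model, x):
    """Inference stage: score a new observation against the model."""
    return x > model["threshold"]

raw_data = [1.0, None, 3.0, 5.0]
model_artifact = train(preprocess(raw_data))
print(infer(model_artifact, 4.0))  # prints True
```

Because each stage's contract is just its input and output, the stages can later be deployed as separate jobs and scaled or re-run independently, which is exactly what a single monolithic notebook makes difficult.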
Deploying, scaling, and managing our pipeline
This material introduces you to methods for orchestrating a multi-stage AI pipeline at scale. Once you are done with this material, you will understand various methods for deploying multi-stage pipelines along with their trade-offs. You will also get hands-on experience deploying a multi-stage AI pipeline on a remote cluster.
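At its core, orchestration means running named stages in order and tracking the artifact that flows between them; a real orchestrator adds scheduling, retries, and distribution on a cluster, but a minimal local sketch (with hypothetical stage names, not the tooling used in class) looks like this:

```python
# A toy orchestration driver: run stages in order, passing each
# stage the previous stage's output artifact. A cluster orchestrator
# does this across machines, with retries and data versioning.

def run_pipeline(stages):
    """Execute (name, stage_fn) pairs sequentially, chaining artifacts."""
    artifact = None
    for name, stage in stages:
        artifact = stage(artifact)
        print(f"stage '{name}' complete")
    return artifact

pipeline = [
    ("preprocess", lambda _: [1.0, 2.0, 3.0]),
    ("train", lambda data: sum(data) / len(data)),
    ("report", lambda model: f"model={model}"),
]
print(run_pipeline(pipeline))
```

This sequential driver is the simplest of the deployment methods the material compares; the trade-off of moving to a full orchestrator is added operational complexity in exchange for scaling, fault tolerance, and reproducibility.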
All material is licensed under the Apache License Version 2.0, January 2004.