Python-based workflows - from notebook to production

This material is for intermediate-level data scientists, developers, data engineers, and researchers. Specifically, it is for those who have some interest in or experience with developing ML/AI models on sample data sets (perhaps in Jupyter), but who might struggle to understand the full ML/AI workflow or to scale, deploy, and productionize their work. They need to know which Python tools to use as they move workflows beyond the notebook, and how to manage and distribute work on large data sets.

Note: This material has been designed to be taught in a classroom environment. The code is well commented but missing some of the contextual concepts and ideas that will be covered in class.

Introduction to Python tooling and ML/AI workflows

This material introduces some of the commonly used Python tooling for data science and ML/AI. It also introduces the ML/AI model development workflow. Once you are done with this material, you will understand what sets of tools are used in producing AI models, and how data scientists often interact with those tools.
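
To make this concrete, below is a minimal sketch of the kind of notebook-style workflow the material starts from: pandas for tabular data and scikit-learn for a quick baseline model. The dataset and model choices are illustrative only and are not part of the course code.

```python
# A minimal, illustrative notebook-style workflow: load sample data with pandas,
# fit a quick baseline model with scikit-learn, and check held-out accuracy.
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = load_iris(as_frame=True)
df = iris.frame  # features plus the "target" column

X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns="target"), df["target"], test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```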

Productionizing ML/AI

This material introduces some pain points and pitfalls that people fall into when trying to productionize data science work. Once you are done with this material, you will understand what the common pain points are and the guiding principles that will help us overcome them.

Using frameworks that scale

This material introduces some methods and frameworks that will help our workflow scale beyond local sample data. Once you are done with this material, you will be exposed to some of the more scalable Python frameworks in the ecosystem (e.g., PyTorch) and have some experience refactoring modeling code for production.
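
As a point of reference, here is a minimal sketch of a PyTorch training loop, the style of modeling code this module refactors for production. The synthetic data, model architecture, and hyperparameters are placeholders, not the course's actual examples.

```python
# A minimal, illustrative PyTorch training loop on synthetic data.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for real training data.
X = torch.randn(1000, 10)
y = torch.randint(0, 2, (1000,))
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):
    for batch_X, batch_y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(batch_X), batch_y)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```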

Breaking our workflow up into pipeline stages

This material walks you through breaking up a workflow, contained in a Jupyter notebook, into separate, scalable pipeline stages. Once you are done with this material, you will understand which portions of an ML/AI pipeline might benefit from being managed in isolation. You will also get some experience writing code for specific stages of a data pipeline (pre-processing, training, inference).
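
For illustration, below is a minimal sketch of one such stage (pre-processing) written as a standalone script. The file paths and column names are hypothetical; the point is that each stage reads its inputs and writes its outputs explicitly, so it can be run, tested, and scaled independently of the notebook.

```python
# An illustrative standalone pre-processing stage with explicit inputs/outputs.
import argparse

import pandas as pd


def pre_process(in_path: str, out_path: str) -> None:
    df = pd.read_csv(in_path)
    df = df.dropna()  # simple cleaning step
    # Scale a (hypothetical) numeric feature column.
    df["value"] = (df["value"] - df["value"].mean()) / df["value"].std()
    df.to_csv(out_path, index=False)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="pre-processing stage")
    parser.add_argument("--in-path", required=True)
    parser.add_argument("--out-path", required=True)
    args = parser.parse_args()
    pre_process(args.in_path, args.out_path)
```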

Deploying, scaling, and managing our pipeline

This material introduces you to methods for orchestrating a multi-stage AI pipeline at scale. Once you are done with this material, you will understand various methods for deploying multi-stage pipelines, along with their trade-offs. You will also get hands-on experience deploying a multi-stage AI pipeline on a remote cluster.
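
As a simplified illustration of the orchestration idea, the sketch below chains hypothetical stage scripts in order with subprocess. A real deployment on a remote cluster would hand this ordering to an orchestrator rather than a local script; the script names and paths here are made up.

```python
# An illustrative local runner that executes pipeline stages in order,
# failing fast if any stage errors. Script names and paths are hypothetical.
import subprocess

stages = [
    ["python", "pre_process.py", "--in-path", "raw.csv", "--out-path", "clean.csv"],
    ["python", "train.py", "--in-path", "clean.csv", "--model-path", "model.pt"],
    ["python", "infer.py", "--model-path", "model.pt", "--out-path", "predictions.csv"],
]

for cmd in stages:
    subprocess.run(cmd, check=True)
```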

All material is licensed under the Apache License Version 2.0, January 2004.