## Upcoming Events
Join our Meetup group for more events & subscribe to our YouTube channel!
https://www.meetup.com/data-umbrella

- Event: Ploomber: Maintainable and Collaborative Pipelines in Jupyter
- Speaker:  Eduardo Blancas
- Transcript:  https://github.com/data-umbrella/event-transcripts/blob/main/2020/
- Meetup Event:  https://www.meetup.com/data-umbrella/events/282572465/
- Video:  https://youtu.be/OI8TTH8EsDI
- Slides: 
- GitHub repo: https://github.com/ploomber/ploomber
- Documentation: https://ploomber.io/

## Agenda
00:00:00 Introduction to Data Umbrella 
00:04:00 -- Eduardo begins Ploomber presentation --
00:04:21 Identifying problems with current practice and pipeline
00:05:58 Jupyter notebooks maintenance problems
00:10:13 Ploomber fixes the problems with notebooks
00:14:19 The ecosystem of the tools
00:16:40 Links to resources
00:17:15 -- Ploomber demo begins --
00:17:37 Notebook content walkthrough
00:19:23 Soorgeon refactor command
00:21:08 Ploomber pipeline (command, plot, and status)
00:22:50 Create a pipeline.yaml file
00:25:10 Building the pipeline
00:26:29 Declaring dependencies
00:29:01 Ploomber scaffold
00:34:07 Incremental Builds
00:35:40 Execution in the cloud (check out the video at the GitHub link)
00:36:00 -- Q&A --
00:36:15 What is a .yaml file?
00:37:26 Is it possible to do the Ploomber pipeline with files other than .pkl files?
00:38:38 Plans for updates?
00:39:51 Where does scikit-learn fit into the Ploomber pipeline?
00:41:03 How does Ploomber fit with or compare against Prefect?
00:42:46 The role of Ploomber in reproducible data science
00:44:17 Any enterprise support for Ploomber?
00:45:05 -- Ploomber's History and Vision --
 
## Event
Jupyter notebooks are a prevalent tool for Data Science work; however, notebooks can get out of control quickly, making them hard to maintain and collaborate on. Ploomber is an open-source framework that brings software engineering best practices to the Jupyter world to create more maintainable projects. This talk will describe the project's motivations and showcase its main features.

## About the Speaker
Eduardo is interested in developing tools to deliver reliable Machine Learning products. Toward that end, he created Ploomber, an open-source Python library for composing production-ready data workflows. Eduardo holds an M.S. in Data Science from Columbia University, where he took part in Computational Neuroscience research. Eduardo started his Data Science career in 2015 at the Center for Data Science and Public Policy at The University of Chicago.

LinkedIn: https://www.linkedin.com/in/edublancas
Twitter: https://twitter.com/edublancas
GitHub: https://github.com/edublancas