pandas ecosystem 2019
Slides of the pandas ecosystem 2019 talk.
pandas is more than 10 years old now. In that time, it has become almost a standard for building data pipelines and performing data analysis in Python. As the project's popularity grows, so does the number of projects that depend on or interact with pandas.
This talk will cover the ecosystem of projects around pandas, mainly from the perspective of scalability and performance. It discusses, for example, how projects like Arrow are key to the future of pandas, and how Dask is overcoming pandas' limitations.
The first part of the talk focuses on pandas itself: its components and its architecture. This gives the context needed for the second part, which explains related projects, how they interact with pandas, and what the whole ecosystem can offer users.
The talk is being delivered at, or has been proposed to, the following conferences:
Setup
You can run the slides online using Binder:
Or you can install it locally:
- Install Miniconda 3.7
- Open an Anaconda/UNIX terminal
- Run the following commands:

    git clone https://github.com/datapythonista/pandas_ecosystem.git
    cd pandas_ecosystem
    conda env create
    source activate pandas_ecosystem
    jupyter notebook

  (On Windows, run conda activate pandas_ecosystem instead of source activate pandas_ecosystem.)
- Open the pandas_ecosystem.ipynb notebook
- Click the icon with the bar plot to show it as slides with RISE
Speaker
Marc Garcia is a pandas core developer and Python fellow.
He has been working with Python for more than 12 years, and has worked as a data scientist and data engineer at companies such as Bank of America, Tesco and Badoo.
He is a regular speaker at PyData and PyCon conferences, and a regular organizer of sprints.
