
The pandas of the future

Material of my talk at SciPy LatAm 2019.

Abstract

Since the project started 10 years ago, pandas has grown in popularity to become almost a standard for data wrangling and analysis in Python.

While pandas has served the needs of many of its users well, several new projects have been started in recent years to respond to needs that pandas is not able to address. For example, Dask provides a pandas-like API to distribute jobs over a cluster. Vaex provides a pandas-like API to perform out-of-core computation. cuDF is reimplementing a pandas-like dataframe for GPUs. Koalas implements a pandas-like API for Apache Spark. And there are even more projects, like Modin or static-frame.

At the same time, pandas itself has been trying to address new needs from its users, such as the ability to use third-party data types (besides the original numeric and datetime ones from NumPy). For example, CyberPandas extends pandas with an efficient IP address type, and GeoPandas does the same with geolocations. Other work has been done to break pandas into parts, so that it can be extended more easily and used to solve new problems. For example, pandas 0.25 decoupled the plotting code from the rest of pandas, to allow the use of third-party plotting libraries. This makes it possible, for example, to generate the same plots pandas is able to generate, but interactive, using Bokeh, HoloViews, Altair or others.
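As a rough illustration of the two extension points mentioned above, the sketch below uses only types that ship with pandas (the nullable `Int64` and `category` dtypes), not CyberPandas or GeoPandas, and only reads the plotting backend option rather than switching it. The variable names are made up for the example, and `"some_backend_module"` in the comment is a placeholder, not a real package.

```python
import pandas as pd
from pandas.api.extensions import ExtensionDtype

# A nullable integer column: backed by a pandas extension array
# rather than a plain NumPy dtype, so it can hold missing values.
ints = pd.Series([1, None, 3], dtype="Int64")

# Categorical is another extension type bundled with pandas.
cats = pd.Series(["a", "b", "a"], dtype="category")

# Both dtypes implement the same ExtensionDtype interface that
# third-party libraries use to register their own types.
int_is_extension = isinstance(ints.dtype, ExtensionDtype)
cat_is_extension = isinstance(cats.dtype, ExtensionDtype)

# Missing values are tracked by the extension array itself.
missing_count = int(ints.isna().sum())
total = int(ints.sum())  # missing values are skipped by default

# Since pandas 0.25 the plotting backend is a configurable option.
# Switching it requires the third-party backend to be installed, e.g.
# pd.set_option("plotting.backend", "some_backend_module")  # placeholder name
current_backend = pd.get_option("plotting.backend")
```

Third-party types such as an IP address type plug into the same `ExtensionDtype` / `ExtensionArray` interface; writing one is considerably longer than this sketch.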

The future of pandas and its ecosystem is uncertain. In this talk I'll give an insider's point of view on what may be broken in pandas, given that so many projects are being implemented to address the same needs; how pandas can be broken up even further to cover more user needs; and what the current and planned developments are, and what users can expect from pandas in the future.

Speaker

Marc Garcia is a pandas core developer and Python fellow. Marc is also a co-organizer of EuroSciPy and the London Python sprints group.

He has been working with Python for more than 12 years, and has worked as a data scientist and data engineer for companies such as Bank of America, Tesco and Badoo.

He is a regular speaker at PyData and PyCon conferences, and a regular organizer of sprints.

Website | Twitter | LinkedIn

Set up

You can run the slides online using Binder:

Binder

Or you can install and run them locally:

  • Install Miniconda with Python 3.7
  • Open an Anaconda/UNIX terminal
  • git clone https://github.com/datapythonista/pandas_future.git
  • cd pandas_future
  • conda env create
  • source activate pandas_future (in Windows: conda activate pandas_future)
  • jupyter notebook
  • Open the the_pandas_of_the_future.ipynb notebook
  • Click the icon with the bar plot to show the notebook as slides with RISE