# Polars' craziest* feature

*debatable

![](logo.png)

## With: Marco Gorelli
## (Quansight Labs, Polars & pandas maintainer)

```python
import polars as pl
```

## Wait, what's Polars?

- DataFrame library
- Written in Rust
- `pip install polars` is all you need!
- Blazingly fast!

![](duckdb_benchmark.png)

## Right then, let's try it!

I have a DataFrame with 50 million rows, each of which has an integer `id` between 0 and 3.

I'd like to map:
- 0 -> 'Regina'
- 1 -> 'Karen'
- 2 -> 'Gretchen'
- 3 -> 'Cady'

```python
import numpy as np
import polars as pl

# Lookup table from integer id to name
id_to_name = {
    0: 'Regina',
    1: 'Karen',
    2: 'Gretchen',
    3: 'Cady',
}

# 50 million random ids in the range [0, 4)
rng = np.random.default_rng()
df = pl.DataFrame(
    {'id': rng.integers(0, 4, size=50_000_000)}
)
df.head()
```

```python
%%time

# Slow path: Polars calls the Python lambda once per row
result = df.with_columns(
    name=pl.col("id").map_elements(lambda x: id_to_name[x])
)
result.head()
```

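Polars responds with a `PolarsInefficientMapWarning` suggesting a native expression, which runs more than 10x faster here. A performance increase for free!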

## How does this work?

### 1. Disassemble the function into bytecode

```python
import dis

list(dis.get_instructions(lambda x: id_to_name[x]))
```
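The raw `dis` output also contains bookkeeping instructions (such as `RESUME` on CPython 3.11+) whose names vary between Python versions. Here is a minimal sketch, assuming we only care about each opcode's name and argument, that strips those out. `bytecode_ops` and `IGNORED_OPS` are hypothetical names for illustration, not Polars internals.

```python
import dis

# Hypothetical set of bookkeeping opcodes that carry no user logic.
# Names differ across CPython versions; a name that never appears is never matched.
IGNORED_OPS = {"RESUME", "CACHE", "NOP", "PRECALL", "PUSH_NULL", "COPY_FREE_VARS"}


def bytecode_ops(func):
    """Return (opname, argval) pairs for func, minus bookkeeping opcodes."""
    return [
        (instr.opname, instr.argval)
        for instr in dis.get_instructions(func)
        if instr.opname not in IGNORED_OPS
    ]


id_to_name = {0: "Regina", 1: "Karen", 2: "Gretchen", 3: "Cady"}

# Print what the lambda boils down to: load the dict, load the argument, subscript, return
for opname, argval in bytecode_ops(lambda x: id_to_name[x]):
    print(opname, argval)
```

The exact opcode names depend on the interpreter version, which is one reason a bytecode parser has to be conservative about what it recognises.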

### 2. Parse the bytecode to figure out what the user wrote

`id_to_name`, `x`, `BINARY_SUBSCR` ==> `id_to_name[x]`
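A minimal sketch of step 2, assuming the only pattern we want to recognise is a one-argument lambda that indexes a mapping with its own argument. `match_dict_lookup` and `suggest` are hypothetical helpers, not Polars' actual parser. Because opcode names differ across CPython versions, the matcher accepts a few variants; the `BINARY_OP` with `[]` case for CPython 3.14 is an assumption. The suggested `replace_strict` is the Polars 1.x spelling of what older versions called `map_dict`.

```python
import dis

# Bookkeeping opcodes that carry no user logic (names vary across CPython versions).
IGNORED_OPS = {"RESUME", "CACHE", "NOP", "PRECALL", "PUSH_NULL", "COPY_FREE_VARS"}


def is_subscript(instr):
    """True if instr performs `container[key]`."""
    # CPython <= 3.13 emits BINARY_SUBSCR; CPython 3.14 is assumed to emit
    # BINARY_OP with argrepr "[]" instead.
    return instr.opname == "BINARY_SUBSCR" or (
        instr.opname == "BINARY_OP" and instr.argrepr == "[]"
    )


def match_dict_lookup(func):
    """Return (mapping_name, arg_name) if func is `lambda x: mapping[x]`, else None."""
    code = func.__code__
    if code.co_argcount != 1:
        return None
    instrs = [i for i in dis.get_instructions(func) if i.opname not in IGNORED_OPS]
    # Expected shape: load mapping, load argument, subscript, return.
    if len(instrs) != 4:
        return None
    load_map, load_arg, subscr, ret = instrs
    if load_map.opname not in {"LOAD_GLOBAL", "LOAD_DEREF", "LOAD_NAME"}:
        return None
    # The key must be the lambda's own argument (newer CPython may use LOAD_FAST_* variants).
    if not load_arg.opname.startswith("LOAD_FAST") or load_arg.argval != code.co_varnames[0]:
        return None
    if not is_subscript(subscr) or ret.opname != "RETURN_VALUE":
        return None
    return load_map.argval, load_arg.argval


def suggest(func, column):
    """Build a hint string for the native expression, or None if nothing matched."""
    match = match_dict_lookup(func)
    if match is None:
        return None
    mapping_name, _ = match
    return f'pl.col("{column}").replace_strict({mapping_name})'


id_to_name = {0: "Regina", 1: "Karen", 2: "Gretchen", 3: "Cady"}

print(suggest(lambda x: id_to_name[x], "id"))
print(suggest(lambda x: id_to_name[x] + "!", "id"))  # extra logic, so no match
```

Rejecting anything that does not match the exact instruction sequence keeps the sketch safe: a false negative only means no hint, whereas a false positive would suggest non-equivalent code.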

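### 3. Educate the user on how they could have written their code more efficiently!

```diff
- pl.col("id").map_elements(lambda x: ...)
+ pl.col("id").map_dict(id_to_name)
```

`map_dict` has since been deprecated and removed; in Polars 1.x the equivalent expression is `pl.col("id").replace_strict(id_to_name)`.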

## But...why? Why not just do it the fast way for users?

Answer: To teach you a lesson!

# That's Polars' craziest* feature! Thanks all!

*debatable

## What more can Polars do for me?

Reach out to me on LinkedIn: https://www.linkedin.com/in/marcogorelli/

I post Polars tips once every whenever I feel like it

I also offer **Polars corporate training**

![](learning.png)