# Polars' craziest* feature

*debatable

![](logo.png)

## With: Marco Gorelli (Quansight Labs, volunteer Polars maintainer)

## Wait, what's Polars?

- DataFrame library
- Written in Rust
- `pip install polars` is all you need!
- Blazingly fast!

![](duckdb_benchmark.png)

## Right then, let's try it!

```python
import polars as pl
import numpy as np

df = pl.DataFrame({'x': np.random.randn(50_000_000)})
df.head()
```

| x (f64) |
| --- |
| 1.850064 |
| 1.55629 |
| -0.740405 |
| 0.079336 |
| -1.07925 |


```python
%%time
result = df.with_columns(
    x_squared = pl.col.x.map_elements(lambda x: x**2),
)
result.head()
```

Expr.map_elements is significantly slower than the native expressions API.
Only use if you absolutely CANNOT implement your logic otherwise.
In this case, you can replace your `map_elements` with the following:
  - pl.col("x").map_elements(lambda x: ...)
  + pl.col("x") ** 2

PanicException: python function failed KeyboardInterrupt: 


KeyboardInterrupt



```python
%%time
result = df.with_columns(
    x_squared = pl.col("x") ** 2
)
result.head()
```

CPU times: user 74.2 ms, sys: 21.4 ms, total: 95.6 ms
Wall time: 92.6 ms


| x (f64) | x_squared (f64) |
| --- | --- |
| 0.214714 | 0.046102 |
| 0.534835 | 0.286048 |
| 0.212716 | 0.045248 |
| -1.176031 | 1.383048 |
| -1.412341 | 1.994708 |


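We got a huge performance increase for free: the native expression finished in under 100 ms, while the `map_elements` version was still running when we interrupted it.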

## How does this work?

### 1. disassemble the function into bytecode

```python
import dis

list(dis.get_instructions(lambda x: x**2))
```

[Instruction(opname='LOAD_FAST', opcode=124, arg=0, argval='x', argrepr='x', offset=0, starts_line=3, is_jump_target=False),
 Instruction(opname='LOAD_CONST', opcode=100, arg=1, argval=2, argrepr='2', offset=2, starts_line=None, is_jump_target=False),
 Instruction(opname='BINARY_POWER', opcode=19, arg=None, argval=None, argrepr='', offset=4, starts_line=None, is_jump_target=False),
 Instruction(opname='RETURN_VALUE', opcode=83, arg=None, argval=None, argrepr='', offset=6, starts_line=None, is_jump_target=False)]

### 2. parse the bytecode to figure out what the user wrote

`x`, `2`, `"binary power"` ==> `x**2`
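To make this step concrete, here is a minimal sketch of the idea using only the standard-library `dis` module. It is not Polars' actual implementation: the function name `suggest_expression` is hypothetical, and it only recognises the single pattern "argument raised to a constant power". Opcode names vary across Python versions (for example `BINARY_POWER` on older versions, `BINARY_OP` on newer ones), so the sketch accepts the variants it knows about and gives up on anything else.

```python
import dis

# Opcode names that load the lambda's argument / a constant.
# These differ between Python versions, so we accept several spellings.
LOAD_ARG = {"LOAD_FAST", "LOAD_FAST_BORROW", "LOAD_FAST_CHECK"}
LOAD_CONSTANT = {"LOAD_CONST", "LOAD_SMALL_INT"}


def suggest_expression(func, col_name):
    """Hypothetical helper: return a suggested expression string, or None.

    Walks the bytecode with a tiny symbolic stack. Any instruction it does
    not recognise makes it bail out and return None.
    """
    stack = []
    for ins in dis.get_instructions(func):
        name = ins.opname
        if name == "RESUME":  # bookkeeping instruction on newer Pythons
            continue
        if name in LOAD_ARG:
            stack.append(f'pl.col("{col_name}")')
        elif name in LOAD_CONSTANT:
            stack.append(repr(ins.argval))
        elif name == "BINARY_POWER" or (name == "BINARY_OP" and ins.argrepr == "**"):
            if len(stack) < 2:
                return None
            right = stack.pop()
            left = stack.pop()
            stack.append(f"{left} ** {right}")
        elif name == "RETURN_VALUE":
            return stack.pop() if len(stack) == 1 else None
        else:
            return None  # unknown instruction: no suggestion
    return None


print(suggest_expression(lambda x: x**2, "x"))
```

For the lambda from the earlier cell this prints `pl.col("x") ** 2`. A lambda that calls a global function, such as `lambda x: str(x)`, hits an unrecognised instruction and yields `None`, so no suggestion would be made.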

### 3. educate the user on how they could have written their code more efficiently!

```diff
- pl.col("x").map_elements(lambda x: ...)
+ pl.col("x") ** 2
```

## But...why? Why not just do it the fast way for users?

A: Because then, users wouldn't learn!

Unfortunately, the warning above can only be emitted for relatively simple cases.
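One way to get a feel for what "simple" means is to look at the bytecode again. The sketch below uses a hypothetical helper, `has_branch`, to check whether a lambda compiles to any jump instruction. It is only an illustration of why straight-line bytecode is easier to map back to an expression than bytecode with control flow, and says nothing about exactly which lambdas Polars can translate.

```python
import dis


def has_branch(func):
    """Hypothetical helper: True if the bytecode contains a jump instruction."""
    return any("JUMP" in ins.opname for ins in dis.get_instructions(func))


simple = lambda x: x ** 2                  # straight-line: load, load, power, return
branching = lambda x: x if x > 0 else -x   # conditional: compiles to a jump

print(has_branch(simple))     # False
print(has_branch(branching))  # True
```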

Educating users > doing things for them, even though it is only possible in some cases.

## What more can Polars do for me?

Reach out to me on LinkedIn: https://www.linkedin.com/in/marcogorelli/

I post Polars tips once every whenever I feel like it

And also offer **Polars corporate training**