Polars Gotchas

#1 Sorting a column independently of the other columns

I wanted to understand why the following two commands seemed to treat the “salary” column differently, since it is an easy point of confusion:

employees.with_columns(pl.col("salary").sort())
employees.with_columns(pl.col("salary").sort(descending=True))

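Neither command sorts the data frame. pl.col("salary").sort() is an expression that sorts the salary column on its own, and with_columns then writes the sorted values back over the original column. The other columns keep their original order, so each salary is detached from the person it belonged to. The descending version can look correct at first glance because the first row (the CEO) happens to hold the highest salary, but the remaining rows are just as mismatched: in the output further down, Michael Fletcher is listed with 199503, although employees.head(3) shows his actual salary as 96540. To reorder whole rows by salary, sort the data frame itself with employees.sort("salary", descending=True).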

In [5]:
import polars as pl

employees = pl.read_csv("employees.csv", try_parse_dates=True)
employees.head(3)

name,department,email,salary,years_at_company,start_date
str,str,str,i64,i64,date
"""Nicholas Maldonado""","""CEO""","""nicholas.maldonado@polars.io""",250000,9,2016-07-14
"""Michael Fletcher""","""Operations""","""michael.fletcher@polars.io""",96540,9,2016-02-13
"""Jeffrey Tanner""",,"""jeffrey.tanner@polars.io""",126489,10,2015-03-01


In [4]:
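# Both calls sort only the "salary" column; the other columns keep their order.
# Only the last expression in a cell is displayed, so the output is the descending sort.
employees.with_columns(pl.col("salary").sort())
employees.with_columns(pl.col("salary").sort(descending=True))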

name,department,email,salary,years_at_company,start_date
str,str,str,i64,i64,date
"""Nicholas Maldonado""","""CEO""","""nicholas.maldonado@polars.io""",250000,9,2016-07-14
"""Michael Fletcher""","""Operations""","""michael.fletcher@polars.io""",199503,9,2016-02-13
"""Jeffrey Tanner""",,"""jeffrey.tanner@polars.io""",199381,10,2015-03-01
"""Diana Weaver""","""HR""","""diana.weaver@polars.io""",199260,5,2019-11-25
"""Sierra Ross""",,"""sierra.ross@polars.io""",199257,7,2018-02-14
…,…,…,…,…,…
"""James Bryant""",,"""james.bryant@polars.io""",55304,9,2016-05-09
"""Patricia Vazquez""","""Operations""","""patricia.vazquez@polars.io""",55242,6,2019-02-20
"""Katie Clay""",,"""katie.clay@polars.io""",55078,0,2025-02-12
"""Monique Swanson""","""Finance""","""monique.swanson@polars.io""",55012,4,2020-11-07


#2 Eager vs. Lazy Evaluation

https://docs.pola.rs/user-guide/concepts/lazy-api/

df.lazy().filter(...)  # Returns LazyFrame, doesn't execute
df.lazy().filter(...).collect()  # Actually executes

In [1]:
import polars as pl

# Sample data
df = pl.DataFrame(
    {
        "name": ["Alice", "Bob", "Charlie", "David"],
        "age": [25, 30, 35, 40],
        "city": ["NYC", "LA", "NYC", "Chicago"],
    }
)

# ❌ GOTCHA: This doesn't actually filter anything!
lazy_result = df.lazy().filter(pl.col("age") > 30)
print("Lazy result (no execution):")
print(lazy_result)  # Prints the query plan, not the filtered rows
print()

Lazy result (no execution):
naive plan: (run LazyFrame.explain(optimized=True) to see the optimized plan)

FILTER [(col("age")) > (30)]
FROM
  DF ["name", "age", "city"]; PROJECT */3 COLUMNS



In [2]:
# ✅ CORRECT: You must call .collect() to execute
actual_result = df.lazy().filter(pl.col("age") > 30).collect()
print("After .collect():")
print(actual_result)
print()

After .collect():
shape: (2, 3)
┌─────────┬─────┬─────────┐
│ name    ┆ age ┆ city    │
│ ---     ┆ --- ┆ ---     │
│ str     ┆ i64 ┆ str     │
╞═════════╪═════╪═════════╡
│ Charlie ┆ 35  ┆ NYC     │
│ David   ┆ 40  ┆ Chicago │
└─────────┴─────┴─────────┘



In [3]:
# Another common mistake: chaining operations
# ❌ This builds a query plan but doesn't run it
query = df.lazy().filter(pl.col("city") == "NYC").select(["name", "age"]).sort("age")
print("Query plan (not executed):")
print(query.explain())  # Shows what WOULD be executed
print()

# ✅ Execute with .collect()
result = query.collect()
print("Executed result:")
print(result)
print()

# Pro tip: Eager operations work immediately
eager_result = df.filter(pl.col("age") > 30)  # No .lazy(), executes instantly
print("Eager evaluation (immediate):")
print(eager_result)

Query plan (not executed):
SORT BY [col("age")]
  simple π 2/2 ["name", "age"]
    FILTER [(col("city")) == ("NYC")]
    FROM
      DF ["name", "age", "city"]; PROJECT["name", "age", "city"] 3/3 COLUMNS

Executed result:
shape: (2, 2)
┌─────────┬─────┐
│ name    ┆ age │
│ ---     ┆ --- │
│ str     ┆ i64 │
╞═════════╪═════╡
│ Alice   ┆ 25  │
│ Charlie ┆ 35  │
└─────────┴─────┘

Eager evaluation (immediate):
shape: (2, 3)
┌─────────┬─────┬─────────┐
│ name    ┆ age ┆ city    │
│ ---     ┆ --- ┆ ---     │
│ str     ┆ i64 ┆ str     │
╞═════════╪═════╪═════════╡
│ Charlie ┆ 35  ┆ NYC     │
│ David   ┆ 40  ┆ Chicago │
└─────────┴─────┴─────────┘
