# [Expressions and contexts](https://docs.pola.rs/user-guide/concepts/expressions-and-contexts/)

In [3]:
import polars as pl
from datetime import date

## Expressions

An expression is a lazy representation of a data transformation: it describes a computation but does not run it until a context evaluates it.

In [2]:
bmi_expr = pl.col("weight") / (pl.col("height") ** 2)
bmi_expr

## Context

An expression needs a context to be evaluated. Polars has several contexts; the four most common are `select`, `with_columns`, `filter`, and `group_by`.

### `select`

In [4]:
df = pl.DataFrame(
    {
        "name": ["Alice Archer", "Ben Brown", "Chloe Cooper", "Daniel Donovan"],
        "birthdate": [
            date(1997, 1, 10),
            date(1985, 2, 15),
            date(1983, 3, 22),
            date(1981, 4, 30),
        ],
        "weight": [57.9, 72.5, 53.6, 83.1],  # (kg)
        "height": [1.56, 1.77, 1.65, 1.75],  # (m)
    }
)

print(df)

shape: (4, 4)
┌────────────────┬────────────┬────────┬────────┐
│ name           ┆ birthdate  ┆ weight ┆ height │
│ ---            ┆ ---        ┆ ---    ┆ ---    │
│ str            ┆ date       ┆ f64    ┆ f64    │
╞════════════════╪════════════╪════════╪════════╡
│ Alice Archer   ┆ 1997-01-10 ┆ 57.9   ┆ 1.56   │
│ Ben Brown      ┆ 1985-02-15 ┆ 72.5   ┆ 1.77   │
│ Chloe Cooper   ┆ 1983-03-22 ┆ 53.6   ┆ 1.65   │
│ Daniel Donovan ┆ 1981-04-30 ┆ 83.1   ┆ 1.75   │
└────────────────┴────────────┴────────┴────────┘


In [8]:
df.select(
    bmi = bmi_expr.round(2),
    avg_bmi = bmi_expr.mean().round(2),
    ideal_max_bmi = 25,
)

bmi,avg_bmi,ideal_max_bmi
f64,f64,i32
23.79,23.44,25
23.14,23.44,25
19.69,23.44,25
27.13,23.44,25


In [9]:
df.select(deviation = (bmi_expr - bmi_expr.mean()) / bmi_expr.std())

deviation
f64
0.115645
-0.097471
-1.22912
1.210946


### `with_columns`

`with_columns` adds new columns to the dataframe and keeps the existing ones, whereas `select` returns only the columns it is given.

In [10]:
df.with_columns(
    bmi=bmi_expr,
    avg_bmi=bmi_expr.mean(),
    ideal_max_bmi=25,
)

name,birthdate,weight,height,bmi,avg_bmi,ideal_max_bmi
str,date,f64,f64,f64,f64,i32
"""Alice Archer""",1997-01-10,57.9,1.56,23.791913,23.438973,25
"""Ben Brown""",1985-02-15,72.5,1.77,23.141498,23.438973,25
"""Chloe Cooper""",1983-03-22,53.6,1.65,19.687787,23.438973,25
"""Daniel Donovan""",1981-04-30,83.1,1.75,27.134694,23.438973,25


### `filter`

In [11]:
df.filter(
    pl.col("birthdate").is_between(date(1982, 12, 31), date(1996, 1, 1)),
    pl.col("height") > 1.7
)

name,birthdate,weight,height
str,date,f64,f64
"""Ben Brown""",1985-02-15,72.5,1.77


### `group_by`

In [12]:
df.group_by(
    (pl.col("birthdate").dt.year() // 10 * 10).alias("decade")
).agg(pl.col("name"))

decade,name
i32,list[str]
1990,"[""Alice Archer""]"
1980,"[""Ben Brown"", ""Chloe Cooper"", ""Daniel Donovan""]"


It's easy to make sub-groups: just pass additional expressions to `group_by`.

In [16]:
df.group_by(
    (pl.col("birthdate").dt.year() // 10 * 10).alias("decade"),
    (pl.col("height") < 1.7).alias("short?")
).agg(pl.col("name"))

decade,short?,name
i32,bool,list[str]
1990,True,"[""Alice Archer""]"
1980,True,"[""Chloe Cooper""]"
1980,False,"[""Ben Brown"", ""Daniel Donovan""]"


In [19]:
df.group_by(
    (pl.col("birthdate").dt.year() // 10 * 10).alias("decade"),
    (pl.col("height") < 1.7).alias("short?"),
).agg(
    pl.len(),
    pl.col("height").max().alias("tallest"),
    pl.col("weight", "height").mean().name.prefix("avg_"),
)

decade,short?,len,tallest,avg_weight,avg_height
i32,bool,u32,f64,f64,f64
1980,True,1,1.65,53.6,1.65
1990,True,1,1.56,57.9,1.56
1980,False,2,1.77,77.8,1.76


## Expression expansion

Expression expansion is an easy way to apply the same transformation to multiple columns: one expression expands into one expression per matching column.

In [21]:
expr = (pl.col(pl.Float64) * 1.1).name.suffix("*1.1")
df.select(expr)

weight*1.1,height*1.1
f64,f64
63.69,1.716
79.75,1.947
58.96,1.815
91.41,1.925


`expr` only matches `Float64` columns, so on a dataframe without any floats it has nothing to expand to:

In [22]:
df2 = pl.DataFrame(
    {
        "ints": [1,2,3,4],
        "letters": list("ABCD")
    }
)
df2.select(expr)