# [Expressions: Folds](https://docs.pola.rs/user-guide/expressions/folds/)

```python
import operator
import polars as pl

df = pl.DataFrame(
    {
        "label": ["foo", "bar", "spam"],
        "a": [1, 2, 3],
        "b": [10, 20, 30],
    }
)
```

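Polars has expressions that compute across columns, row by row, instead of down a single column.
The built-in horizontal functions below are examples, and they are special cases of a general operation called a fold.
You can define your own fold with `pl.fold`, or use the built-in functions like these: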

```python
df.with_columns(
    pl.mean_horizontal(pl.col("a", "b")).alias("hor mean"),
    pl.sum_horizontal(pl.col("a", "b")).alias("hor sum"),
    pl.min_horizontal(pl.col("a", "b")).alias("hor min"),
    pl.max_horizontal(pl.col("a", "b")).alias("hor max"),
)
```

| label | a | b | hor mean | hor sum | hor min | hor max |
|---|---|---|---|---|---|---|
| str | i64 | i64 | f64 | i64 | i64 | i64 |
| "foo" | 1 | 10 | 5.5 | 11 | 1 | 10 |
| "bar" | 2 | 20 | 11.0 | 22 | 2 | 20 |
| "spam" | 3 | 30 | 16.5 | 33 | 3 | 30 |
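
To make the link between these functions and folds concrete, here is a plain-Python sketch (a model of the semantics, not how polars implements it; the helper `fold_columns` is hypothetical). Sum, min and max across columns each fold the columns into a running result that starts from a suitable value. `mean_horizontal` is not a single fold of this kind, but it can be derived by dividing the folded sum by the number of columns.

```python
import math
import operator


# Hypothetical helper: a plain-Python model of a column-wise fold.
# `acc` is a scalar start value broadcast to every row; `function` is applied
# element-wise, combining the running result with one column at a time.
def fold_columns(acc, function, columns):
    n_rows = len(columns[0])
    result = [acc] * n_rows
    for column in columns:
        result = [function(r, x) for r, x in zip(result, column)]
    return result


a = [1, 2, 3]
b = [10, 20, 30]

hor_sum = fold_columns(0, operator.add, [a, b])
hor_min = fold_columns(math.inf, min, [a, b])
hor_max = fold_columns(-math.inf, max, [a, b])
hor_mean = [s / 2 for s in hor_sum]  # folded sum divided by the number of columns

print(hor_sum)   # [11, 22, 33]
print(hor_min)   # [1, 2, 3]
print(hor_max)   # [10, 20, 30]
print(hor_mean)  # [5.5, 11.0, 16.5]
```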


## Basic example

Here we use a fold to reproduce `pl.sum_horizontal`: starting from 0, each column is added to the running total in turn.

```python
df.select(
    pl.fold(
        acc=pl.lit(0),  # start value of the accumulator
        function=operator.add,
        exprs=pl.col("a", "b"),
    ).alias("sum_fold"),
    pl.sum_horizontal(pl.col("a", "b")).alias("sum_horz"),
)
```

| sum_fold | sum_horz |
|---|---|
| i64 | i64 |
| 11 | 11 |
| 22 | 22 |
| 33 | 33 |
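
A plain-Python trace of the same computation (a model of the fold's semantics, not polars code) records the accumulator after each column is folded in: it starts at 0, becomes `a` after the first step, and reaches `a + b` after the second.

```python
import operator

a = [1, 2, 3]
b = [10, 20, 30]

acc = [0, 0, 0]          # models pl.lit(0), broadcast to every row
history = [acc]
for column in (a, b):    # columns are folded in left to right
    acc = [operator.add(r, x) for r, x in zip(acc, column)]
    history.append(acc)

for step in history:
    print(step)
# [0, 0, 0]
# [1, 2, 3]
# [11, 22, 33]
```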


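Another way to see what the fold does: it is equivalent to calling the function by hand, first on the start value and `a`, then on that result and `b`.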

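```python
acc = pl.lit(0)
f = operator.add

df.select(
    # Nested by hand; the output is named "literal" after the leftmost input.
    f(f(acc, pl.col("a")), pl.col("b")),
    pl.fold(acc=acc, function=f, exprs=pl.col("a", "b")).alias("sum_fold"),
    # Plain expression arithmetic, the simplest option for two columns.
    pl.col("a") + pl.col("b"),
)
```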

| literal | sum_fold | a |
|---|---|---|
| i64 | i64 | i64 |
| 11 | 11 | 11 |
| 22 | 22 | 22 |
| 33 | 33 | 33 |
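
Nesting calls by hand only scales to a few columns. In general, folding columns `c1, c2, c3` computes `f(f(f(acc, c1), c2), c3)`, which is what Python's `functools.reduce` does. The sketch below models columns as lists and uses a hypothetical element-wise helper `add_columns` in place of `operator.add` on polars expressions; the third column `c` is made up to show the general case.

```python
from functools import reduce


# Hypothetical element-wise addition of two "columns" (equal-length lists),
# standing in for what operator.add does on polars expressions.
def add_columns(left, right):
    return [x + y for x, y in zip(left, right)]


acc = [0, 0, 0]
a = [1, 2, 3]
b = [10, 20, 30]
c = [100, 200, 300]  # made-up extra column

nested = add_columns(add_columns(add_columns(acc, a), b), c)
folded = reduce(add_columns, [a, b, c], acc)

print(nested)            # [111, 222, 333]
print(folded == nested)  # True
```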


## The initial value `acc`

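The accumulator `acc` holds the start value of the fold; every column is combined with it in turn.
It should normally be the identity element of the function, the value that leaves the other operand unchanged: 0 for addition, 1 for multiplication.
If you start a product at 0, every result is 0; starting at 1 gives the expected products.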

```python
df.select(
    pl.fold(
        acc=pl.lit(0),
        function=operator.mul,
        exprs=pl.col("a", "b"),
    ).alias("prod")
)
```

| prod |
|---|
| i64 |
| 0 |
| 0 |
| 0 |


```python
df.select(
    pl.fold(
        acc=pl.lit(1),
        function=operator.mul,
        exprs=pl.col("a", "b"),
    ).alias("prod")
)
```

| prod |
|---|
| i64 |
| 10 |
| 40 |
| 90 |
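
The identity property can be checked directly: a start value `e` is an identity for `f` when `f(e, x) == x` for every `x`. This plain-Python sketch (independent of polars) checks it on a few sample values for some common combining functions, then reproduces both product results above with a row-wise `reduce`.

```python
import operator
from functools import reduce

# Identity elements for some common combining functions.
identities = {
    operator.add: 0,
    operator.mul: 1,
    operator.and_: True,
    operator.or_: False,
}

# f(e, x) == x: the start value leaves the first column unchanged.
for function, e in identities.items():
    samples = (True, False) if isinstance(e, bool) else (0, 1, 7, -3)
    for x in samples:
        assert function(e, x) == x

a = [1, 2, 3]
b = [10, 20, 30]


def row_product(start):
    # Fold each row (a_i, b_i) with multiplication, starting from `start`.
    return [reduce(operator.mul, row, start) for row in zip(a, b)]


print(row_product(0))  # [0, 0, 0]     0 absorbs every factor
print(row_product(1))  # [10, 40, 90]  1 is the identity
```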


## Conditional

A fold can also combine a condition across columns. The following keeps only the rows where every column value is greater than 1.

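```python
df = pl.DataFrame(
    {
        "a": [1, 2, 3],
        "b": [0, 1, 2],
    }
)

# `pl.all() > 1` yields one boolean column per input column;
# the fold ANDs them row by row, starting from True.
df.filter(
    pl.fold(
        acc=pl.lit(True),
        function=lambda acc, x: acc & x,
        exprs=pl.all() > 1,
    )
)
```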

| a | b |
|---|---|
| i64 | i64 |
| 3 | 2 |
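
The `&` fold with a `True` start is a row-wise "all"; swapping in `|` with a `False` start gives a row-wise "any". Polars also provides `pl.all_horizontal` and `pl.any_horizontal` for these two cases. The plain-Python sketch below (a model, not polars code) mirrors the filter: it first builds one boolean mask per column, as `pl.all() > 1` does, then folds the masks.

```python
import operator

a = [1, 2, 3]
b = [0, 1, 2]

# Mirror `pl.all() > 1`: one boolean column per input column.
masks = [[x > 1 for x in column] for column in (a, b)]


def fold_bool(start, function):
    acc = [start] * len(a)
    for mask in masks:
        acc = [function(r, m) for r, m in zip(acc, mask)]
    return acc


keep_all = fold_bool(True, operator.and_)   # every value > 1
keep_any = fold_bool(False, operator.or_)   # at least one value > 1

rows = list(zip(a, b))
print([row for row, k in zip(rows, keep_all) if k])  # [(3, 2)]
print([row for row, k in zip(rows, keep_any) if k])  # [(2, 1), (3, 2)]
```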


## Folds and string data

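```python
df = pl.DataFrame(
    {
        "a": ["a", "b", "c"],
        "b": [1, 2, 3],
    }
)

# Strings in the list are parsed as column names, so this is the same as
# pl.concat_str([pl.col("a"), pl.col("b")]). The integer column is cast to str.
df.select(pl.concat_str(["a", "b"]))
```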

| a |
|---|
| str |
| "a1" |
| "b2" |
| "c3" |
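
A fold could build the same strings, using an empty-string start value and `+` as the function. The polars user guide recommends `concat_str` instead, noting that a fold materializes an intermediate column at every step, which makes string concatenation quadratic in cost. A plain-Python model of that string fold, with `str()` standing in for the cast of the integer column:

```python
a = ["a", "b", "c"]
b = [1, 2, 3]

acc = [""] * len(a)  # empty string: the identity for concatenation
for column in (a, b):
    # Each step builds a brand-new list of strings, like the intermediate
    # column a polars fold would materialize.
    acc = [r + str(x) for r, x in zip(acc, column)]

print(acc)  # ['a1', 'b2', 'c3']
```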
