## Numerical dtypes and precision to improve performance

In [1]:
import polars as pl
import numpy as np

In [2]:
df = pl.DataFrame(
    {
        "ints":[0,1,2],
        "floats":[0.0,1,2]
    }
)
df

ints,floats
i64,f64
0,0.0
1,1.0
2,2.0


Polars defaults to `64-bit` representations for both integers and floats. 

## Integers

Polars has the following integer types:

| dtype | Precision (bits) | Signed |
|-----------|------------------|--------|
| Int8      | 8                | Yes    |
| Int16     | 16               | Yes    |
| Int32     | 32               | Yes    |
| Int64     | 64               | Yes    |
| UInt8     | 8                | No     |
| UInt16    | 16               | No     |
| UInt32    | 32               | No     |
| UInt64    | 64               | No     |


The unsigned integers cannot represent negative numbers.

Polars uses them for things like row indexes.

## Constraints of lower precision

The `upper_bound` and `lower_bound` expressions show the maximum and minimum values that can be represented at a given precision.

In [5]:
pl.Config.set_fmt_str_lengths(100)
df_ints = pl.DataFrame({"ints": [1, 2, 3]})
(
    df_ints
    .select(
        [
            pl.col("ints").upper_bound().alias("pl.Int64_upper"),
            pl.col("ints").cast(pl.Int32).upper_bound().alias("pl.Int32_upper"),
            pl.col("ints").cast(pl.Int16).upper_bound().alias("pl.Int16_upper"),
            pl.col("ints").cast(pl.Int8).upper_bound().alias("pl.Int8_upper"),
            
            pl.col("ints").lower_bound().alias("pl.Int64_lower"),
            pl.col("ints").cast(pl.Int32).lower_bound().alias("pl.Int32_lower"),
            pl.col("ints").cast(pl.Int16).lower_bound().alias("pl.Int16_lower"),
            pl.col("ints").cast(pl.Int8).lower_bound().alias("pl.Int8_lower"),
        ]
    )
    .unpivot()
    .sort("variable")
)

variable,value
str,i64
"""pl.Int16_lower""",-32768
"""pl.Int16_upper""",32767
"""pl.Int32_lower""",-2147483648
"""pl.Int32_upper""",2147483647
"""pl.Int64_lower""",-9223372036854775808
"""pl.Int64_upper""",9223372036854775807
"""pl.Int8_lower""",-128
"""pl.Int8_upper""",127


## Floats
Polars has the following floating point types:

`Float32`: 32-bit floating point

`Float64`: 64-bit floating point

Polars also has a fixed-point decimal type:

`Decimal`: 128-bit fixed-point decimal

The `pl.Decimal` dtype is stored in 128 bits and takes an optional precision and a scale:
- `precision` is the maximum number of digits
- `scale` is the number of digits to the right of the decimal point

In [6]:
pl.Decimal(precision=12, scale=3)

Decimal(precision=12, scale=3)

## Effects of moving to lower precision

### Size in memory
The `estimated_size` method returns the estimated size of a `DataFrame` in bytes.

The `unit` argument changes the unit, e.g. from bytes to kilobytes.

In [13]:
df = pl.DataFrame(
    {
        "ints":[0,1,2],
        "floats":[0.0,1,2]
    }
)

df.estimated_size(unit="b")

48

In [15]:
df.with_columns(
    pl.col("ints").cast(pl.Int32),
    pl.col("floats").cast(pl.Float32),
).estimated_size(unit="b")

24
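
These numbers follow from the width of each dtype: 3 rows of one 8-byte integer and one 8-byte float give 48 bytes, and halving the width halves the size. The sketch below reproduces the arithmetic with NumPy item sizes. `expected_bytes` is a hypothetical helper, and it ignores any overhead beyond the raw values.

```python
import numpy as np


def expected_bytes(n_rows: int, np_dtypes) -> int:
    """Hypothetical helper: rows times the summed byte width of the columns."""
    return n_rows * sum(np.dtype(d).itemsize for d in np_dtypes)


size_64 = expected_bytes(3, [np.int64, np.float64])
size_32 = expected_bytes(3, [np.int32, np.float32])

print(size_64)  # 3 * (8 + 8) = 48
print(size_32)  # 3 * (4 + 4) = 24
```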

### Computational speed
The effect of lower precision on computational speed is less clear-cut than the effect on memory use.

In [16]:
N_rows = 1_000_000
N_columns = 10
df_num = pl.DataFrame(np.random.standard_normal((N_rows,N_columns)))
df_num.head(2)

column_0,column_1,column_2,column_3,column_4,column_5,column_6,column_7,column_8,column_9
f64,f64,f64,f64,f64,f64,f64,f64,f64,f64
0.998553,-0.62898,-2.266764,-0.457566,-0.074031,1.398116,-0.656038,-0.378223,-0.53996,1.774266
-0.967067,0.119973,-0.990401,1.521334,0.492414,0.566557,0.594269,-0.570608,0.589781,0.527651


In [17]:
df_num_32 = df_num.select(
    pl.all().cast(pl.Float32)
)

df_num_32.head(2)

column_0,column_1,column_2,column_3,column_4,column_5,column_6,column_7,column_8,column_9
f32,f32,f32,f32,f32,f32,f32,f32,f32,f32
0.998553,-0.62898,-2.266764,-0.457566,-0.074031,1.398116,-0.656038,-0.378223,-0.53996,1.774266
-0.967067,0.119973,-0.990401,1.521334,0.492414,0.566557,0.594269,-0.570608,0.589781,0.527651


In [18]:
print(f"64-bit DataFrame: {round(df_num.estimated_size(unit='mb'))} MB")
print(f"32-bit DataFrame: {round(df_num_32.estimated_size(unit='mb'))} MB")

64-bit DataFrame: 76 MB
32-bit DataFrame: 38 MB


### Computational speed at lower precision

Some calculations are faster with 32-bit data, although the difference measured here is within the run-to-run spread of the timings.

In [22]:
%%timeit -n1 -r3

2 + 4

867 ns ± 519 ns per loop (mean ± std. dev. of 3 runs, 1 loop each)


In [23]:
%%timeit -n1

df_num.select(
    (pl.all()-pl.all().mean()) / (pl.all().std())
)

45.7 ms ± 17 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [24]:
%%timeit -n1

df_num_32.select(
    (pl.all()-pl.all().mean()) / (pl.all().std())
)

33.7 ms ± 9.32 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [25]:
output64 = df_num.select((pl.all() - pl.all().mean()) / (pl.all().std()))

output32 = df_num_32.select((pl.all() - pl.all().mean()) / (pl.all().std()))

In [26]:
(output64 - output32).head(2)

column_0,column_1,column_2,column_3,column_4,column_5,column_6,column_7,column_8,column_9
f64,f64,f64,f64,f64,f64,f64,f64,f64,f64
3.415e-08,-3.8179e-08,-1.0938e-07,-2.1092e-08,-3.7753e-09,-3.3752e-08,-2.8975e-08,7.8739e-09,-6.8437e-09,1.1848e-08
4.0047e-08,6.1907e-09,-2.8534e-08,9.0309e-08,2.0606e-08,-2.1451e-08,-5.6955e-09,3.6112e-08,-1.7611e-08,3.3944e-08


In [27]:
(output64 - output32).max_horizontal().max()

8.511954012746514e-07

## Exercises

### Exercise 1

In [28]:
N_rows = 1_000_000
N_columns = 10
df_ints_64 = (
    pl.DataFrame(
        np.random.randint(1,10,(N_rows,N_columns),dtype=np.int64)
    )
)
df_ints_64.head(2)

column_0,column_1,column_2,column_3,column_4,column_5,column_6,column_7,column_8,column_9
i64,i64,i64,i64,i64,i64,i64,i64,i64,i64
8,1,8,1,1,6,5,9,9,7
2,9,5,9,5,3,4,8,8,2


Create a `DataFrame` called `df_ints_8` where all the values in `df_ints_64` are cast to `pl.Int8`.

In [31]:
df_ints_8 = df_ints_64.with_columns(
    pl.all().cast(pl.Int8)
)

df_ints_8.head()

column_0,column_1,column_2,column_3,column_4,column_5,column_6,column_7,column_8,column_9
i8,i8,i8,i8,i8,i8,i8,i8,i8,i8
8,1,8,1,1,6,5,9,9,7
2,9,5,9,5,3,4,8,8,2
9,4,9,5,2,2,8,4,9,2
7,9,4,8,6,8,7,6,4,6
8,4,6,8,2,8,3,7,5,5


Compare the size of these `DataFrames` in memory in megabytes.

In [32]:
print(f"64-bit DataFrame: {round(df_ints_64.estimated_size(unit='mb'))} MB")
print(f"8-bit DataFrame: {round(df_ints_8.estimated_size(unit='mb'))} MB")

64-bit DataFrame: 76 MB
8-bit DataFrame: 10 MB


Compare how long it takes to do a cumulative sum on all the columns of the `DataFrames`.

In [33]:
%%timeit -n1

df_ints_64.select(
    pl.all().cum_sum()
)

46.9 ms ± 19.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [34]:
%%timeit -n1

df_ints_8.select(
    pl.all().cum_sum()
)

54.5 ms ± 10.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


Compare how long it takes at 16- and 32-bit precision.

Which precision is fastest?

In [35]:
df_ints_16 = df_ints_64.select(
    pl.all().cast(pl.Int16)
)

df_ints_32 = df_ints_64.select(
    pl.all().cast(pl.Int32)
)

In [36]:
%%timeit -n1

df_ints_16.select(
    pl.all().cum_sum()
)

51.3 ms ± 8.61 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [38]:
%%timeit -n1

df_ints_32.select(
    pl.all().cum_sum()
)

35.7 ms ± 13.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
