## Numerical dtypes and precision

By the end of this lecture you will be able to:
- get the upper and lower bounds you can represent at a given precision
- estimate the size of a `DataFrame` in memory
- compare the effect of working with 32-bit and 64-bit representations

In this lecture we examine the effect of varying the numerical precision on computational speed, memory usage and accuracy. In some use cases, lowering the precision is a simple way of improving performance and reducing memory usage.

In [None]:
import polars as pl
import numpy as np

We create a simple `DataFrame` to see the default dtypes for integers and floats

In [None]:
df = pl.DataFrame(
    {
        "ints":[0,1,2],
        "floats":[0.0,1,2]
    }
)
df

Polars defaults to 64-bit representations for both integers and floats. In this notebook we examine the effect of varying the numerical precision.

## Integers

Polars has the following integer types:

`Int8`: 8-bit signed integer

`Int16`: 16-bit signed integer

`Int32`: 32-bit signed integer

`Int64`: 64-bit signed integer

`UInt8`: 8-bit unsigned integer

`UInt16`: 16-bit unsigned integer

`UInt32`: 32-bit unsigned integer

`UInt64`: 64-bit unsigned integer

The unsigned integer dtypes can only hold `0` and positive values. Polars uses them for things like row indexes.

Polars raises an exception if we try to cast a negative integer to an unsigned integer dtype.

## Floats
Polars has the following floating point types:

`Float32`: 32-bit floating point

`Float64`: 64-bit floating point
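
As a rough illustration of what the narrower float type gives up, the sketch below uses only Python's standard library rather than Polars. The helper name `to_float32` is made up for this example. It round-trips a value through a 32-bit float, which has a 24-bit significand, so whole numbers above `2**24` can no longer all be stored exactly.

```python
import struct


def to_float32(value: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float and back."""
    # "f" is the struct format code for a 32-bit float
    return struct.unpack("f", struct.pack("f", value))[0]


exact = 2.0**24  # 16_777_216.0, exactly representable at 32-bit

print(to_float32(exact) == exact)        # True
print(to_float32(exact + 1.0) == exact)  # True: 2**24 + 1 is rounded back to 2**24
print(to_float32(0.1) - 0.1)             # a small, non-zero rounding error
```

A Python float is itself a 64-bit value, so the subtraction in the last line shows the extra rounding introduced by the 32-bit representation.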

## Constraints of lower precision
With a lower precision the range of values we can represent is smaller.

The `upper_bound` and `lower_bound` expressions show the maximum and minimum values that can be represented at a given precision.

In [None]:
pl.Config.set_fmt_str_lengths(100)
dfInts = pl.DataFrame({"ints": [1, 2, 3]})
(
    dfInts
    .select(
        [
            pl.col("ints").upper_bound().alias("pl.Int64_upper"),
            pl.col("ints").cast(pl.Int32).upper_bound().alias("pl.Int32_upper"),
            pl.col("ints").cast(pl.Int16).upper_bound().alias("pl.Int16_upper"),
            pl.col("ints").cast(pl.Int8).upper_bound().alias("pl.Int8_upper"),
            
            pl.col("ints").lower_bound().alias("pl.Int64_lower"),
            pl.col("ints").cast(pl.Int32).lower_bound().alias("pl.Int32_lower"),
            pl.col("ints").cast(pl.Int16).lower_bound().alias("pl.Int16_lower"),
            pl.col("ints").cast(pl.Int8).lower_bound().alias("pl.Int8_lower"),
        ]
    )
    .melt()
    .sort("variable")
)
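
The bounds above follow directly from the bit width. A signed integer with `n` bits covers `-2**(n-1)` to `2**(n-1) - 1`, and an unsigned one covers `0` to `2**n - 1`. The following is a plain-Python sketch of that arithmetic. `int_bounds` is a hypothetical helper written for this example, not a Polars function.

```python
def int_bounds(bits: int, signed: bool = True) -> tuple[int, int]:
    """Return (lower, upper) for a two's-complement integer of the given width."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2**bits - 1


for bits in (8, 16, 32, 64):
    signed_range = int_bounds(bits)
    unsigned_range = int_bounds(bits, signed=False)
    print(f"Int{bits}: {signed_range}  UInt{bits}: {unsigned_range}")
```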

If we try to cast a value outside the valid range, Polars raises an exception. Uncomment the following code to test this

In [None]:
# (
#     pl.DataFrame(
#         {'values':[126,127,128]}
#     )
#     .with_columns(
#         pl.col("values").cast(pl.Int8).alias("values_Int8")
#     )
# )
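
The same kind of range check can be seen outside Polars. The sketch below uses the standard library's `struct` module, which refuses to pack a value that does not fit in a signed 8-bit integer. `fits_int8` is a hypothetical helper for illustration only and says nothing about how Polars implements its own check.

```python
import struct


def fits_int8(value: int) -> bool:
    """Return True if value can be stored as a signed 8-bit integer."""
    try:
        struct.pack("b", value)  # "b" is the format code for a signed 8-bit integer
        return True
    except struct.error:
        return False


results = {value: fits_int8(value) for value in (126, 127, 128)}
print(results)  # {126: True, 127: True, 128: False}
```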

## A dtype diet
Polars creates integer and float columns as 64-bit by default. Polars can detect if the actual data in a column can fit in a lower precision dtype and cast the column to that dtype with `shrink_dtype`

In [None]:
(
    pl.DataFrame(
         {
             "a": [1, 2, 3],
             "b": [1, 2, 2 << 32],
             "c": [-1, 2, 1 << 30],
             "d": [-112, 2, 112],
             "e": [-112, 2, 129],
             "f": [0.1, 1.32, 0.12],
         }
     )
    .select(
        pl.all().shrink_dtype()
    )
)
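
To see why each integer column ends up with the dtype it does, we can pick the narrowest signed width whose range contains the column's minimum and maximum. The sketch below is a plain-Python illustration of that idea, applied to the integer columns above. `smallest_signed_bits` is a hypothetical helper, and this is not a description of how `shrink_dtype` is implemented internally.

```python
def smallest_signed_bits(values: list[int]) -> int:
    """Return the narrowest signed width (8, 16, 32 or 64) that holds all values."""
    lo, hi = min(values), max(values)
    for bits in (8, 16, 32, 64):
        if -(2 ** (bits - 1)) <= lo and hi <= 2 ** (bits - 1) - 1:
            return bits
    raise OverflowError("values do not fit in a 64-bit signed integer")


columns = {
    "a": [1, 2, 3],
    "b": [1, 2, 2 << 32],
    "c": [-1, 2, 1 << 30],
    "d": [-112, 2, 112],
    "e": [-112, 2, 129],
}
widths = {name: smallest_signed_bits(values) for name, values in columns.items()}
print(widths)  # {'a': 8, 'b': 64, 'c': 32, 'd': 8, 'e': 16}
```

Column `e` differs from column `d` only in its largest value, `129`, which is just above the 8-bit upper bound of `127` and so needs 16 bits.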

## Effect of a lower precision

Working at a lower precision may be more efficient for some analyses.

### Size in memory
We get the estimated size in bytes of the small `DataFrame` we created above with `estimated_size`

In [None]:
df = pl.DataFrame(
    {
        "ints":[0,1,2],
        "floats":[0.0,1,2]
    }
)
df.estimated_size(unit="b")

We can compare this with a `DataFrame` with both columns cast to 32-bit representations

In [None]:
(
    df
    .with_columns(
        [
            pl.col("ints").cast(pl.Int32),
            pl.col("floats").cast(pl.Float32),
        ]
    )
    .estimated_size(unit="b")
)

Memory usage is halved by moving to 32-bit representations.
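
A back-of-the-envelope estimate gives the same ratio: the data buffers need roughly rows × columns × bytes per value. The sketch below is plain arithmetic with a hypothetical helper `approx_bytes`. It assumes numeric columns and ignores any overhead Polars may add, so treat it as an approximation rather than a reimplementation of `estimated_size`.

```python
def approx_bytes(n_rows: int, n_columns: int, bits: int) -> int:
    """Approximate buffer size in bytes for a numeric frame of one bit width."""
    return n_rows * n_columns * bits // 8


small64 = approx_bytes(3, 2, 64)
small32 = approx_bytes(3, 2, 32)
print(small64, small32)  # 48 24

# A larger, hypothetical frame: one million rows and ten columns
large64 = approx_bytes(1_000_000, 10, 64)
large32 = approx_bytes(1_000_000, 10, 32)
print(f"{large64 / 1024**2:.1f} vs {large32 / 1024**2:.1f} (units of 1024**2 bytes)")
```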

### Performance
We explore the effect of reduced precision by creating a larger `DataFrame` of random values

In [None]:
NRows = 1_000_000
NColumns = 10
dfNum = pl.DataFrame(np.random.standard_normal((NRows,NColumns)))
dfNum.head(2)

These columns all have dtype `pl.Float64`

In [None]:
dfNum.dtypes[0]

We create a new `DataFrame` where we cast values to 32-bit

In [None]:
dfNum32 = (
        dfNum
        .select(
            pl.all().cast(pl.Float32)
        )
)
dfNum32.dtypes[0]

### Memory usage at lower precision
The 32-bit `DataFrame` uses half as much memory

In [None]:
print(f"64-bit DataFrame: {round(dfNum.estimated_size(unit='mb'))} Mb")
print(f"32-bit DataFrame: {round(dfNum32.estimated_size(unit='mb'))} Mb")

### Computational speed at lower precision

Some calculations are faster with 32-bit data.


In this example we:
- subtract the mean of each column and 
- divide by the standard deviation

In [None]:
%%timeit -n1
(
    dfNum
    .select( 
        (pl.all()-pl.all().mean())/(pl.all().std())
    )
)

In [None]:
%%timeit -n1 
(
    dfNum32
    .select( 
        (pl.all()-pl.all().mean())/(pl.all().std())
    )
)

In this case the operation on 32-bit data is almost twice as fast. Operations at 32-bit are not always twice as fast: the difference depends on the transformations applied.

## Effect on outputs
We can check the size of the differences between the outputs

In [None]:
output64 = (
    dfNum
    .select( 
        (pl.all()-pl.all().mean())/(pl.all().std())
    )
)
output32 = (
    dfNum32
    .select( 
        (pl.all()-pl.all().mean())/(pl.all().std())
    )
)

We can see the size of the differences in the first two rows

In [None]:
(output64 - output32).head(2)

The overall maximum difference in this case is of order `10^-5` or smaller

In [None]:
(output64 - output32).max(axis=1).max()

Always **check that the size of the difference between outputs is negligible** for your analysis before moving to a lower precision!
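
One way to frame such a check is to take the largest absolute difference and compare it with a tolerance chosen for the analysis. The sketch below uses only the standard library: `array("f", ...)` stores each value as a 32-bit float, which stands in for the lower-precision output. The sample values and the `tolerance` are made up for illustration.

```python
from array import array

# Hypothetical 64-bit results
values64 = [0.1, -1.2345678901, 2.718281828459045, -0.000123456789]

# "f" stores each value as a 32-bit float, rounding it in the process
values32 = array("f", values64)

# Absolute values make sure large negative differences are not missed
max_abs_diff = max(abs(a - b) for a, b in zip(values64, values32))

tolerance = 1e-6  # hypothetical threshold; choose one that suits your analysis
print(max_abs_diff, max_abs_diff <= tolerance)
```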

Moving to a lower precision than 32-bit does not always lead to faster performance. Many CPUs do not have native support for 8-bit and 16-bit operations and so they emulate them with 32-bit operations. See the exercises for an example of lowering precision below 32-bit.

## Exercises

In the exercises you will develop your understanding of:
- getting the upper and lower bounds for a dtype
- getting the estimated size of a `DataFrame`
- comparing performance between different precisions 

### Exercise 1
We create a `DataFrame` with 10 columns of random integers from 1 to 9 (the upper limit passed to `np.random.randint` is exclusive)

In [None]:
NRows = 1_000_000
NColumns = 10
dfInts64 = pl.DataFrame(np.random.randint(1,10,(NRows,NColumns)))
dfInts64.head(2)

Create a `DataFrame` called `dfInts8` where all the values in `dfInts64` are cast to `pl.Int8`

In [None]:
dfInts8 = (
    <blank>
)


Compare the size of these `DataFrames` in memory in Mb

In [None]:
print(f"64-bit DataFrame: {} Mb")
print(f"8-bit DataFrame: {} Mb")

Compare how long it takes to do a cumulative sum on all the columns of the `DataFrames`

In [None]:
%%timeit -n1
(
    dfInts64
)

In [None]:
%%timeit -n1
(
    dfInts8
)

Compare how long it takes at 16- and 32-bit precision.

Which precision is fastest?

## Solutions

### Solution to exercise 1
We create a `DataFrame` with 10 columns of random integers from 1 to 9 (the upper limit passed to `np.random.randint` is exclusive)

In [None]:
NRows = 1_000_000
NColumns = 10
dfInts64 = pl.DataFrame(np.random.randint(1,10,(NRows,NColumns)))
dfInts64.head(2)

Create a `DataFrame` called `dfInts8` where all the values in `dfInts64` are cast to `pl.Int8`

In [None]:
dfInts8 = (
    dfInts64
    .select(
        pl.all().cast(pl.Int8)
    )
)


Compare the size of these `DataFrames` in memory in Mb

In [None]:
print(f"64-bit DataFrame: {round(dfInts64.estimated_size(unit='mb'))} Mb")
print(f"8-bit DataFrame: {round(dfInts8.estimated_size(unit='mb'))} Mb")

Compare how long it takes to do a cumulative sum on all the columns of the `DataFrames`

In [None]:
%%timeit -n1
(
    dfInts64
    .select( 
        pl.all().cumsum()
    )
)

In [None]:
%%timeit -n1
(
    dfInts8
    .select( 
        pl.all().cumsum()
    )
)

Compare how long it takes at 16- and 32-bit precision.

Which precision is fastest?

In [None]:
dfInts16 = (
    dfInts64
    .select(
        pl.all().cast(pl.Int16)
    )
)
dfInts32 = (
    dfInts64
    .select(
        pl.all().cast(pl.Int32)
    )
)


In [None]:
%%timeit -n1
(
    dfInts16
    .select( 
        pl.all().cumsum()
    )
)

In [None]:
%%timeit -n1
(
    dfInts32
    .select( 
        pl.all().cumsum()
    )
)

Many CPUs do not have native support for 8-bit and 16-bit calculations and so calculations at these precisions may not be faster than at 32-bit.