In [7]:
import polars as pl
import json

# General Performance Testing

This notebook collects general performance experiments for the codebase, for example how efficiently polars reads data.
We document each test and link to the relevant docs so that the results can be peer reviewed.

## Speed tests for reading files and frames with polars

References:
* [pandas vs. polars speed test, Apr 2023](https://medium.com/cuenex/pandas-2-0-vs-polars-the-ultimate-battle-a378eb75d6d1)
* [input/output in polars](https://docs.pola.rs/api/python/stable/reference/io.html)


## Test 1: reading a newline-delimited JSON file to check efficiency


In [8]:
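%%timeit
# Note: scan_ndjson is lazy, so this times building the query plan,
# not reading or parsing the file (that happens on .collect()).
energy_use_lf = pl.scan_ndjson(
    "data/PP/energy_use_test1.ndjson",
    schema={"timestamp": pl.Datetime(time_zone="Europe/Brussels"), "total": pl.Float64},
)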

9.57 μs ± 218 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


In [9]:
energy_use_lf_1 = pl.scan_ndjson(
    "data/PP/energy_use_test1.ndjson",
    schema={"timestamp": pl.Datetime(time_zone="Europe/Brussels"), "total": pl.Float64},
)
energy_use_lf_1.collect().head()

| timestamp | total |
| --- | --- |
| datetime[μs, Europe/Brussels] | f64 |
| 2023-01-01 00:00:00 CET | 0.025 |
| 2023-01-01 00:15:00 CET | 0.017 |
| 2023-01-01 00:30:00 CET | 0.023 |
| 2023-01-01 00:45:00 CET | 0.024 |
| 2023-01-01 01:00:00 CET | 0.023 |


## Test 2: reading the "smaller version" of the JSON and transforming it into a polars DataFrame

In [10]:
%%timeit
# Read the JSON file
with open("data/PP/energy_use.json", "r") as file:
    data = json.load(file)

# Convert the data into a list of dictionaries
data_list = [{"timestamp": int(k), "value": v} for k, v in data.items()]

# Create a DataFrame from the list
df = pl.DataFrame(
    data_list, schema={"timestamp": pl.Datetime(time_zone="Europe/Brussels"), "value": pl.Float64}
)

34.5 ms ± 1.31 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
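
The list comprehension in Test 2 suggests that the compact file is a single JSON object mapping integer-like string keys to values. As a rough sketch under that assumption, the same mapping could be turned into newline-delimited records with the standard library alone. The helper name `dict_json_to_ndjson_lines` and the sample keys and values are hypothetical, and the unit of the integer timestamps is left untouched because the source does not state it.

```python
import json


def dict_json_to_ndjson_lines(data):
    """Turn {"<int-like key>": value, ...} into one JSON record per line."""
    return [
        json.dumps({"timestamp": int(key), "value": value})
        for key, value in data.items()
    ]


# Hypothetical sample standing in for the contents of the compact JSON file.
sample = {"1672527600": 0.025, "1672528500": 0.017}

lines = dict_json_to_ndjson_lines(sample)
print("\n".join(lines))
```

Written to a file, records of this shape could then be read with `pl.scan_ndjson` as in Test 1, which would make the two tests operate on the same layout.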
