# Create Low-Precision Copies of Test Dataset
This notebook creates many low-precision copies of a dataset and re-converts each to double precision before saving it in a binary format so that no further precision loss occurs.
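The short sketch below shows why the copies go into a binary format. It uses a made-up series, not the test data. The values are cast to `float16` and back to `float64`, and a pickle round trip then returns exactly the same values. The precision lost in the cast stays lost, but saving does not lose any more.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Made-up example values; not taken from the test dataset
original = pd.Series([3.14159265, 2.71828183, 4.2, -1.5], name='Voltage_V')

# Reduce to half precision, then convert back to double
reduced = original.astype(np.float16).astype(np.float64)

# Pickle stores the float64 bits directly, so reading back loses nothing further
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'reduced.pkl'
    reduced.to_pickle(path)
    restored = pd.read_pickle(path)

print((restored == reduced).all())   # True: the pickle round trip is exact
print((reduced == original).all())   # False: the float16 cast lost precision
```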

Later notebooks use these binary files as the basis for their analysis.

In [1]:
from datetime import datetime, timedelta
from pathlib import Path
import pandas as pd
import numpy as np

Configuration

In [2]:
key_columns = ['Current_A', 'Voltage_V', 'Cell_Temperature_C']  # Columns whose precision will be reduced

## Load the Example Data
The data are stored in a CSV file, so they are already at lower precision than double: each value carries only as many decimal digits as were written to the file.
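You can gauge how much precision a CSV column already carries from the smallest gap between its distinct values. If readings were logged with a fixed step, every gap is a multiple of that step, so the smallest gap is an upper bound on the step and usually a good estimate of it. The helper `smallest_step` and the sample readings are hypothetical, for illustration only. The sketch assumes the column has at least two distinct non-missing values.

```python
import numpy as np
import pandas as pd


def smallest_step(values: pd.Series) -> float:
    """Smallest positive gap between distinct, non-missing values (hypothetical helper)."""
    unique = np.unique(values.dropna().to_numpy())  # sorted distinct values
    return float(np.diff(unique).min())


# Made-up readings written with 3 decimal places, as a data logger might record them
voltage = pd.Series([3.701, 3.702, 3.702, 3.705, 3.710])
print(smallest_step(voltage))  # about 0.001, up to floating-point rounding
```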

In [3]:
xcel = pd.read_csv('../example-data/xcel.csv')
xcel.query('Cycle_Label != "EIS"', inplace=True)  # Skip EIS cycles; they are not needed for this study

Compute a datetime column, then keep only it and the key columns.

In [4]:
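# Treats Datenum_d as days elapsed since 0001-01-01. A MATLAB datenum counts from year 0 instead,
# so it would need an extra offset of -367 days to give calendar dates.
xcel['Datetime'] = xcel['Datenum_d'].apply(lambda x: datetime(year=1, month=1, day=1) + timedelta(days=x))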
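The `apply` call builds one Python `datetime` per row. Below is a rough sketch of a vectorized NumPy alternative, not a drop-in replacement. It keeps the same convention as the cell above, counting days from 0001-01-01, and it rounds to whole microseconds as `timedelta` does. The function name `days_to_datetime64` is hypothetical. The integer and fractional days are converted separately so that large day counts keep microsecond accuracy.

```python
from datetime import datetime, timedelta

import numpy as np


def days_to_datetime64(days) -> np.ndarray:
    """Convert fractional days since 0001-01-01 to datetime64[us] (hypothetical helper)."""
    days = np.asarray(days, dtype=np.float64)
    whole = np.floor(days)
    # Fractional day -> microseconds, rounded to the nearest microsecond
    micros = np.round((days - whole) * 86_400_000_000)
    epoch = np.datetime64('0001-01-01T00:00:00', 'us')  # datetime64[us] can represent year 1
    return (epoch
            + whole.astype(np.int64).astype('timedelta64[D]')
            + micros.astype(np.int64).astype('timedelta64[us]'))


sample = np.array([0.5, 737000.25, 737000.75])
vectorized = days_to_datetime64(sample)
looped = [datetime(1, 1, 1) + timedelta(days=float(x)) for x in sample]
print(all(v == np.datetime64(d, 'us') for v, d in zip(vectorized, looped)))  # True
```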

In [5]:
xcel = xcel[['Datetime'] + key_columns]

In [6]:
xcel.dtypes

Datetime              datetime64[ns]
Current_A                    float64
Voltage_V                    float64
Cell_Temperature_C           float64
dtype: object

## Run the Conversions
Cast each key column to a lower-precision type, convert it back to double, and save the result in pickle format.

In [7]:
out_dir = Path('reduced-datasets')
out_dir.mkdir(exist_ok=True)

Start with the floating-point types: half (`float16`) and single (`float32`) precision.

In [8]:
for low_type in [np.float16, np.float32]:
    lowfi = xcel.copy()
    for c in key_columns:
        lowfi[c] = xcel[c].astype(low_type).astype(np.float64)
    
    lowfi.to_pickle(out_dir / f'fulltime-{np.dtype(low_type).name}.pkl')
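Casting to a narrower float rounds each value to the nearest representable number. For values in the normal range, the relative error is therefore at most half the machine epsilon of the narrow type, about 4.9e-4 for `float16` and 6.0e-8 for `float32`. The check below confirms that bound using `np.finfo` on made-up values. It deliberately avoids values very close to zero, where `float16` becomes subnormal and loses relative accuracy.

```python
import numpy as np

# Made-up values spanning a plausible range; all are normal (not subnormal) in float16
values = np.linspace(0.1, 50.0, 1001)

results = {}
for low_type in [np.float16, np.float32]:
    roundtrip = values.astype(low_type).astype(np.float64)
    rel_err = np.abs(roundtrip - values) / np.abs(values)
    bound = float(np.finfo(low_type).eps) / 2  # unit roundoff of the narrow type
    results[np.dtype(low_type).name] = (rel_err.max(), bound)
    print(np.dtype(low_type).name, rel_err.max(), rel_err.max() <= bound)
```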

Then try some "fixed-point" data. We represent fixed point using equally spaced values on the range $[ \lfloor \min f \rfloor, \lceil \max f \rceil ]$. I use the floor and ceiling functions so the endpoints of the range are whole numbers, which keeps the grid values decimals that we can represent nearly exactly.
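Written out, the conversion in the next cell maps each value $f$ to an integer code $k$ and back. Here $a = \lfloor \min f \rfloor$, $b = \lceil \max f \rceil$, and $N = 2^{8B} - 1$ is the largest value of a $B$-byte unsigned integer; these symbol names are introduced only for this sketch. Because `astype` truncates toward zero and $f - a \ge 0$, the code is a floor:

$$
k = \left\lfloor \frac{f - a}{b - a}\, N \right\rfloor, \qquad
\hat f = a + \frac{k}{N}\,(b - a), \qquad
0 \le f - \hat f < \Delta = \frac{b - a}{N}
$$

For example, a hypothetical column spanning $[3, 5]$ stored as `uint8` ($N = 255$) has $\Delta = 2/255 \approx 0.0078$. Rounding to the nearest code instead of truncating would cut the worst-case error to $\Delta/2$.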

In [9]:
for low_type in [np.uint8, np.uint16, np.uint32]:
    lowfi = xcel.copy()
    max_int = 2 ** (np.dtype(low_type).itemsize * 8) - 1
    for c in key_columns:
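        # Convert to an integer code on an evenly spaced grid over [min_f, max_f].
        # astype truncates toward zero, so each value maps to the grid point at or below it.
        min_f = np.floor(xcel[c].min())
        max_f = np.ceil(xcel[c].max())
        as_int = ((xcel[c] - min_f) / (max_f - min_f) * max_int).astype(low_type)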

        # Convert back to floating point
        as_float = (max_f - min_f) * (as_int.astype(np.float64) / max_int) + min_f
        lowfi[c] = as_float

    lowfi.to_pickle(out_dir / f'fulltime-{np.dtype(low_type).name}.pkl')
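The sketch below checks the fixed-point error numerically. It reimplements the conversion from the cell above as a hypothetical helper, `fixed_point_roundtrip`, and adds an option to round to the nearest code instead of truncating. The data are made up, and rounding is an alternative shown for comparison, not what the notebook does. With truncation the error $f - \hat f$ should lie in $[0, \Delta)$, where $\Delta = (\lceil \max f \rceil - \lfloor \min f \rfloor)/(2^{8B}-1)$ for a $B$-byte type. With rounding the error magnitude should be at most $\Delta/2$. A small tolerance absorbs double-precision rounding.

```python
import numpy as np


def fixed_point_roundtrip(values, low_type, round_nearest=False):
    """Quantize to evenly spaced integer codes and back (hypothetical helper).

    round_nearest=False mirrors the notebook, where astype truncates toward zero.
    """
    values = np.asarray(values, dtype=np.float64)
    max_int = np.iinfo(low_type).max  # same as 2 ** (itemsize * 8) - 1 for unsigned types
    min_f, max_f = np.floor(values.min()), np.ceil(values.max())
    scaled = (values - min_f) / (max_f - min_f) * max_int
    if round_nearest:
        scaled = np.round(scaled)
    as_int = scaled.astype(low_type)
    restored = (max_f - min_f) * (as_int.astype(np.float64) / max_int) + min_f
    step = (max_f - min_f) / max_int  # grid spacing (Delta)
    return restored, step


rng = np.random.default_rng(0)
values = rng.uniform(3.0, 4.2, size=10_000)  # made-up voltage-like data

errors = {}
for low_type in [np.uint8, np.uint16, np.uint32]:
    for round_nearest in (False, True):
        restored, step = fixed_point_roundtrip(values, low_type, round_nearest)
        err = values - restored
        errors[(np.dtype(low_type).name, round_nearest)] = (err.min(), err.max(), step)
        print(np.dtype(low_type).name, 'round' if round_nearest else 'truncate',
              f'max |error| = {np.abs(err).max():.3g}, step = {step:.3g}')
```

Rounding halves the worst-case error at no extra storage cost. However, switching to it would change the datasets that later notebooks read.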