# Li-ion Battery Aging Datasets (NASA): Data conversion

In this notebook, we load the raw data files (`.mat`) in the Li-ion Battery Aging Datasets, and convert them into `.csv` format.

1. Load a `.mat` data file into a structured NumPy array
2. Inspect the structure of the NumPy array
3. Convert the array into pandas DataFrame(s) and export them to `.csv` files

Reference: B. Saha and K. Goebel (2007). "Battery Data Set", NASA Prognostics Data Repository, NASA Ames Research Center, Moffett Field, CA

See also https://www.kaggle.com/code/rajeevsharma993/battery-health-nasa-dataset/notebook for data ingestion.

```
Author: Cedric Yu
Last modified: 20230104
```

The raw `.mat` data files all have the following data structure:

```
Data Structure:
cycle:	top level structure array containing the charge, discharge and impedance operations
	type: 	operation  type, can be charge, discharge or impedance
	ambient_temperature:	ambient temperature (degree C)
	time: 	the date and time of the start of the cycle, in MATLAB  date vector format
	data:	data structure containing the measurements
	   for charge the fields are:
		Voltage_measured: 	Battery terminal voltage (Volts)
		Current_measured:	Battery output current (Amps)
		Temperature_measured: 	Battery temperature (degree C)
		Current_charge:		Current measured at charger (Amps)
		Voltage_charge:		Voltage measured at charger (Volts)
		Time:			Time vector for the cycle (secs)
	   for discharge the fields are:
		Voltage_measured: 	Battery terminal voltage (Volts)
		Current_measured:	Battery output current (Amps)
		Temperature_measured: 	Battery temperature (degree C)
		Current_charge:		Current measured at load (Amps)
		Voltage_charge:		Voltage measured at load (Volts)
		Time:			Time vector for the cycle (secs)
		Capacity:		Battery capacity (Ahr) for discharge till 2.7V 
	   for impedance the fields are:
		Sense_current:		Current in sense branch (Amps)
		Battery_current:	Current in battery branch (Amps)
		Current_ratio:		Ratio of the above currents 
		Battery_impedance:	Battery impedance (Ohms) computed from raw data
		Rectified_impedance:	Calibrated and smoothed battery impedance (Ohms) 
		Re:			Estimated electrolyte resistance (Ohms)
		Rct:			Estimated charge transfer resistance (Ohms)
```



In [None]:
# Import all relevant libraries
import datetime
import os

import numpy as np
import pandas as pd
from scipy.io import loadmat

# Load the "autoreload" extension so that code can change
%load_ext autoreload
# Always reload modules so that as you change code in src, it gets loaded
%autoreload 2


## 1. Load a data file

Set the battery ID and the path to its raw `.mat` file

In [None]:
battery_num = 'B0025'
filepath = f'../../li_ion_battery_aging_nasa/data/raw/3. BatteryAgingARC_25-44/{battery_num}.mat'


In [None]:
os.path.basename(filepath).split('.')[0]
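
The battery ID can also be recovered from the path with `pathlib`, which avoids splitting on dots by hand. A minimal sketch, using the same relative path as above; `battery_id_from_path` is a hypothetical helper name, not part of the notebook:

```python
from pathlib import PurePath


def battery_id_from_path(filepath: str) -> str:
    """Return the file name without its extension, e.g. 'B0025'."""
    # .stem drops only the final suffix of the last path component,
    # so dots in parent directory names are ignored.
    return PurePath(filepath).stem


filepath = '../../li_ion_battery_aging_nasa/data/raw/3. BatteryAgingARC_25-44/B0025.mat'
battery_id = battery_id_from_path(filepath)
```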


Load the `.mat` file. `loadmat` returns a dictionary whose data entry is a structured NumPy array

In [None]:
mat = loadmat(filepath)


## 2. Look at the structured numpy arrays

The key that holds the data matches the file name (here `B0025`). The remaining keys (`__header__`, `__version__`, `__globals__`) are metadata and can be dropped.

In [None]:
mat.keys()


In [None]:
print(mat['__header__'])
print(mat['__version__'])
print(mat['__globals__'])
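
Dropping the metadata keys could look like the following. This is a sketch only: `data_keys` is a hypothetical helper, and the dictionary is a stand-in with placeholder values rather than real `loadmat` output.

```python
def data_keys(mat: dict) -> list:
    """Return the keys of a loadmat-style dict that are not metadata."""
    # The metadata entries shown above all start with a double underscore
    return [key for key in mat if not key.startswith('__')]


# Stand-in for the dictionary returned by loadmat (values are placeholders)
mat_example = {
    '__header__': b'placeholder header',
    '__version__': '1.0',
    '__globals__': [],
    'B0025': 'structured array would be here',
}
keys = data_keys(mat_example)
```

With the real file, `data_keys(mat)` is expected to leave only the battery ID.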


In [None]:
type(mat[battery_num])


In [None]:
mat[battery_num].shape


As stated in the file description, the top-level field is `cycle`

In [None]:
mat[battery_num].dtype


In [None]:
print(str(mat[battery_num]['cycle'][0, 0])[:1000])
print('...')
print(str(mat[battery_num]['cycle'][0, 0])[-1000:])


This file has 80 cycles, each of type charge, discharge, or impedance

In [None]:
mat[battery_num]['cycle'][0, 0][0].shape


A look at a discharge cycle

In [None]:
print(str(mat[battery_num]['cycle'][0, 0][0, 3])[:1000])
print('...')
print(str(mat[battery_num]['cycle'][0, 0][0, 3])[-1000:])


In [None]:
mat[battery_num]['cycle'][0, 0][0, 3].dtype


In [None]:
mat[battery_num]['cycle'][0, 0][0, 3]['type'][0]


In [None]:
mat[battery_num]['cycle'][0, 0][0, 3]['ambient_temperature'][0, 0]


In [None]:
mat[battery_num]['cycle'][0, 0][0].shape


In [None]:
# Print the index of every discharge cycle and count them
num_discharge_cycles = 0
for i, cycle in enumerate(mat[battery_num]['cycle'][0, 0][0]):
    if cycle['type'][0] == 'discharge':
        print(i)
        num_discharge_cycles += 1

num_discharge_cycles
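
To count all three cycle types in one pass, `collections.Counter` is one option. A minimal sketch: `count_cycle_types` is a hypothetical helper, and the list of dicts below only mimics the `cycle['type'][0]` access pattern of the real records.

```python
from collections import Counter


def count_cycle_types(cycles) -> Counter:
    """Count cycles by their 'type' field."""
    # str() converts the NumPy string stored in each record to a plain str
    return Counter(str(cycle['type'][0]) for cycle in cycles)


# Stand-in records; with real data, pass mat[battery_num]['cycle'][0, 0][0]
example_cycles = [
    {'type': ['charge']},
    {'type': ['discharge']},
    {'type': ['impedance']},
    {'type': ['discharge']},
]
counts = count_cycle_types(example_cycles)
```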


Convert the MATLAB date vector (`[year, month, day, hour, minute, second]`) to a Python `datetime`

In [None]:
dt_mat = mat[battery_num]['cycle'][0, 0][0, 3]['time'][0]
print(dt_mat)


In [None]:
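# Include the seconds field, as in the conversion function in Section 3
datetime.datetime(
    int(dt_mat[0]),
    int(dt_mat[1]),
    int(dt_mat[2]),
    int(dt_mat[3]),
    int(dt_mat[4])) + datetime.timedelta(seconds=int(dt_mat[5]))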
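
The conversion can be wrapped in a small function. The sketch below assumes the six-element date vector layout described in the file description; `datevec_to_datetime` is a hypothetical name, and the input is a made-up vector. Unlike truncating with `int`, it keeps any fractional part of the seconds field.

```python
import datetime


def datevec_to_datetime(datevec) -> datetime.datetime:
    """Convert [year, month, day, hour, minute, second] to a datetime."""
    year, month, day, hour, minute = (int(v) for v in datevec[:5])
    # Add seconds as a timedelta so a fractional part is preserved
    return (datetime.datetime(year, month, day, hour, minute)
            + datetime.timedelta(seconds=float(datevec[5])))


# Made-up date vector for illustration
start = datevec_to_datetime([2008.0, 4.0, 2.0, 13.0, 8.0, 17.5])
```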


In [None]:
mat[battery_num]['cycle'][0, 0][0, 3]['data'].dtype


In [None]:
mat[battery_num]['cycle'][0, 0][0, 3]['data'].dtype.names


In [None]:
mat[battery_num]['cycle'][0, 0][0, 3]['data']['Voltage_measured'][0, 0].shape


In [None]:
mat[battery_num]['cycle'][0, 0][0, 3]['data']['Voltage_measured'][0, 0].T.squeeze()[:10]


In [None]:
mat[battery_num]['cycle'][0, 0][0, 3]['data']['Time'][0, 0].shape


In [None]:
mat[battery_num]['cycle'][0, 0][0, 3]['data']['Time'][0, 0].T.squeeze()[:10]


Capacity is stored as a single value per discharge cycle

In [None]:
mat[battery_num]['cycle'][0, 0][0, 5]['data']['Capacity'][0, 0][0, 0]


A look at an impedance measurement cycle

In [None]:
mat[battery_num]['cycle'][0, 0][0, 2]['type']


In [None]:
mat[battery_num]['cycle'][0, 0][0, 2]['data'].dtype.names


`Rectified_Impedance` has fewer data points than the other impedance fields. It is unclear how to align it with them, so it is dropped in the conversion in Section 3.

In [None]:
for name in mat[battery_num]['cycle'][0, 0][0, 2]['data'].dtype.names:
    print(mat[battery_num]['cycle'][0, 0][0, 2]['data'][name][0, 0].shape)
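
Fields with a mismatched length can also be flagged programmatically. The sketch below assumes a mapping from field name to element count; `mismatched_fields` is a hypothetical helper and the sizes are illustrative, not read from the file.

```python
def mismatched_fields(sizes: dict, reference: str) -> list:
    """Return the array-valued fields whose length differs from the reference."""
    expected = sizes[reference]
    # Fields of size 1 are per-cycle scalars (such as Re and Rct), not series
    return [name for name, size in sizes.items()
            if size != 1 and size != expected]


# Illustrative sizes only; with real data, build the mapping with
# {name: data[name][0, 0].size for name in data.dtype.names}
example_sizes = {
    'Sense_current': 48,
    'Battery_current': 48,
    'Current_ratio': 48,
    'Battery_impedance': 48,
    'Rectified_Impedance': 39,
    'Re': 1,
    'Rct': 1,
}
to_drop = mismatched_fields(example_sizes, reference='Sense_current')
```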


In [None]:
mat[battery_num]['cycle'][0, 0][0, 0]['data']['Sense_current'][0, 0].squeeze().shape


In [None]:
mat[battery_num]['cycle'][0, 0][0, 0]['data']['Battery_impedance'][0, 0].squeeze().shape


## 3. Convert data into pandas DataFrames

Convert the charge, discharge, and impedance data in the structured NumPy array into three separate pandas DataFrames, with one row per data point

In [None]:
cycles = mat[battery_num]['cycle'][0, 0][0]
cycles.shape


In [None]:
def load_cycles_to_df(cycles):
    """Return (charge, discharge, impedance) DataFrames, one row per data point."""

    def load_charge(cycles):
        cycles_dict = []
        cycle_num = 0
        for n in range(len(cycles)):
            cycle = cycles[n]
            if cycle['type'][0] != 'charge':
                continue
            cycle_num = cycle_num + 1
            ambient_temperature = cycle['ambient_temperature'][0, 0]
            time_start_mat = cycle['time'][0]
            time_start = datetime.datetime(
                int(time_start_mat[0]),
                int(time_start_mat[1]),
                int(time_start_mat[2]),
                int(time_start_mat[3]),
                int(time_start_mat[4])) + \
                datetime.timedelta(seconds=int(time_start_mat[5]))
            data = cycle['data']
            data_len = len(data['Time'][0, 0].T.squeeze())
            for i in range(data_len):
                data_dict = {}
                data_dict['cycle_num'] = cycle_num
                data_dict['ambient_temperature'] = ambient_temperature
                data_dict['cycle_time_start'] = time_start
                for name in data.dtype.names:
                    data_dict[name] = data[name][0, 0].T.squeeze()[i]
                # Append once per data point, after all fields are filled in
                cycles_dict.append(data_dict)

        return cycles_dict

    def load_discharge(cycles):
        cycles_dict = []
        cycle_num = 0
        for n in range(len(cycles)):
            cycle = cycles[n]
            if cycle['type'][0] != 'discharge':
                continue
            cycle_num = cycle_num + 1
            ambient_temperature = cycle['ambient_temperature'][0, 0]
            time_start_mat = cycle['time'][0]
            time_start = datetime.datetime(
                int(time_start_mat[0]),
                int(time_start_mat[1]),
                int(time_start_mat[2]),
                int(time_start_mat[3]),
                int(time_start_mat[4])) + \
                datetime.timedelta(seconds=int(time_start_mat[5]))
            data = cycle['data']
            data_len = len(data['Time'][0, 0].T.squeeze())
            for i in range(data_len):
                data_dict = {}
                data_dict['cycle_num'] = cycle_num
                data_dict['ambient_temperature'] = ambient_temperature
                data_dict['cycle_time_start'] = time_start
                data_dict['Capacity'] = data['Capacity'][0, 0][0, 0]
                for name in data.dtype.names:
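                    # Capacity is a per-cycle scalar and was set above
                    if name != 'Capacity':
                        data_dict[name] = data[name][0, 0].T.squeeze()[i]
                # Append once per data point, after all fields are filled in
                cycles_dict.append(data_dict)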

        return cycles_dict

    def load_impedance(cycles):
        cycles_dict = []
        cycle_num = 0
        for n in range(len(cycles)):
            cycle = cycles[n]
            if cycle['type'][0] != 'impedance':
                continue
            cycle_num = cycle_num + 1
            ambient_temperature = cycle['ambient_temperature'][0, 0]
            time_start_mat = cycle['time'][0]
            time_start = datetime.datetime(
                int(time_start_mat[0]),
                int(time_start_mat[1]),
                int(time_start_mat[2]),
                int(time_start_mat[3]),
                int(time_start_mat[4])) + \
                datetime.timedelta(seconds=int(time_start_mat[5]))
            data = cycle['data']
            data_len = len(data['Sense_current'][0, 0].T.squeeze())
            for i in range(data_len):
                data_dict = {}
                data_dict['cycle_num'] = cycle_num
                data_dict['ambient_temperature'] = ambient_temperature
                data_dict['cycle_time_start'] = time_start
                data_dict['Re'] = data['Re'][0, 0][0, 0]
                data_dict['Rct'] = data['Rct'][0, 0][0, 0]
                for name in data.dtype.names:
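                    # Re and Rct are per-cycle scalars; Rectified_Impedance is dropped
                    if name not in ['Re', 'Rct', 'Rectified_Impedance']:
                        data_dict[name] = data[name][0, 0].T.squeeze()[i]
                # Append once per data point, after all fields are filled in
                cycles_dict.append(data_dict)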

        return cycles_dict

    df_charge_cycles = pd.DataFrame.from_dict(load_charge(cycles))
    df_discharge_cycles = pd.DataFrame.from_dict(load_discharge(cycles))
    df_impedance_cycles = pd.DataFrame.from_dict(load_impedance(cycles))

    return df_charge_cycles, df_discharge_cycles, df_impedance_cycles


In [None]:
df_charge_cycles, df_discharge_cycles, df_impedance_cycles = load_cycles_to_df(
    cycles)
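
The function above builds one dictionary per data point, which can be slow for long cycles. A possible alternative is to build each cycle's DataFrame from whole columns at once. The following is a sketch, not the notebook's method: `cycle_data_to_frame` and `scalar_fields` are hypothetical names, and the structured array is a small stand-in that mimics the `data[name][0, 0]` layout seen in Section 2.

```python
import numpy as np
import pandas as pd


def cycle_data_to_frame(data, scalar_fields=()):
    """Build a DataFrame from one cycle's `data` record, one column per field."""
    columns = {}
    for name in data.dtype.names:
        values = np.ravel(data[name][0, 0])
        if name in scalar_fields:
            # Keep per-cycle scalars as scalars; pandas broadcasts them
            columns[name] = values[0]
        else:
            columns[name] = values
    return pd.DataFrame(columns)


# Stand-in for cycle['data']: a (1, 1) structured array of object fields
example = np.empty((1, 1), dtype=[('Voltage_measured', object),
                                  ('Time', object),
                                  ('Capacity', object)])
example['Voltage_measured'][0, 0] = np.array([[4.0, 3.9, 3.8]])
example['Time'][0, 0] = np.array([[0.0, 10.0, 20.0]])
example['Capacity'][0, 0] = np.array([[1.8]])

df = cycle_data_to_frame(example, scalar_fields=('Capacity',))
```

Per-cycle frames built this way could then be tagged with `cycle_num`, `ambient_temperature`, and `cycle_time_start` and combined with `pd.concat`.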


In [None]:
df_discharge_cycles.head()


In [None]:
df_discharge_cycles.tail()


In [None]:
df_impedance_cycles.tail()
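
The final step listed in the introduction, exporting to `.csv`, can be sketched as follows. The output directory and the `<battery_num>_<kind>.csv` naming scheme are assumptions made for illustration, and `export_frames` is a hypothetical helper.

```python
from pathlib import Path

import pandas as pd


def export_frames(frames: dict, out_dir, battery_num: str) -> list:
    """Write each DataFrame to '<out_dir>/<battery_num>_<kind>.csv'."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for kind, df in frames.items():
        path = out_dir / f'{battery_num}_{kind}.csv'
        # index=False keeps the default integer index out of the file
        df.to_csv(path, index=False)
        written.append(path)
    return written
```

It could be called as `export_frames({'charge': df_charge_cycles, 'discharge': df_discharge_cycles, 'impedance': df_impedance_cycles}, out_dir, battery_num)`, where `out_dir` is whatever folder the project uses for converted data.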
