# Li-ion Battery Aging Datasets (NASA): Data conversion

In this notebook, we load the raw data files (`.mat`) in the Li-ion Battery Aging Datasets, and convert them into `.csv` format.

1. Load a `.mat` data file into structured numpy array
2. Look at the numpy array structure
3. Convert the array into pd DataFrame(s) and export to `.csv` files

Reference: B. Saha and K. Goebel (2007). "Battery Data Set", NASA Prognostics Data Repository, NASA Ames Research Center, Moffett Field, CA

See also https://www.kaggle.com/code/rajeevsharma993/battery-health-nasa-dataset/notebook for data ingestion.

```
Author: Cedric Yu
Last modified: 20230104
```

The raw `.mat` data files all have the following data structure:

```
Data Structure:
cycle:	top level structure array containing the charge, discharge and impedance operations
	type: 	operation  type, can be charge, discharge or impedance
	ambient_temperature:	ambient temperature (degree C)
	time: 	the date and time of the start of the cycle, in MATLAB  date vector format
	data:	data structure containing the measurements
	   for charge the fields are:
		Voltage_measured: 	Battery terminal voltage (Volts)
		Current_measured:	Battery output current (Amps)
		Temperature_measured: 	Battery temperature (degree C)
		Current_charge:		Current measured at charger (Amps)
		Voltage_charge:		Voltage measured at charger (Volts)
		Time:			Time vector for the cycle (secs)
	   for discharge the fields are:
		Voltage_measured: 	Battery terminal voltage (Volts)
		Current_measured:	Battery output current (Amps)
		Temperature_measured: 	Battery temperature (degree C)
		Current_charge:		Current measured at load (Amps)
		Voltage_charge:		Voltage measured at load (Volts)
		Time:			Time vector for the cycle (secs)
		Capacity:		Battery capacity (Ahr) for discharge till 2.7V 
	   for impedance the fields are:
		Sense_current:		Current in sense branch (Amps)
		Battery_current:	Current in battery branch (Amps)
		Current_ratio:		Ratio of the above currents 
		Battery_impedance:	Battery impedance (Ohms) computed from raw data
		Rectified_impedance:	Calibrated and smoothed battery impedance (Ohms) 
		Re:			Estimated electrolyte resistance (Ohms)
		Rct:			Estimated charge transfer resistance (Ohms)
```



In [45]:
# Load the "autoreload" extension so that code can change
import datetime
from scipy.io import loadmat
import numpy as np
import os
import pandas as pd
%load_ext autoreload
# Always reload modules so that as you change code in src, it gets loaded
%autoreload 2

# Import all relevant libraries


The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


### 1. Loading data file

Load a data file

In [26]:
battery_num = 'B0025'
filepath = f'../../li_ion_battery_aging_nasa/data/raw/3. BatteryAgingARC_25-44/{battery_num}.mat'


In [168]:
os.path.basename(filepath).split('.')[0]


'B0025'

Load mat file into structure np array

In [4]:
mat = loadmat(filepath)


## 2. Look at the structured numpy arrays

The key containing data is the same as the filename. The other keys can be dropped

In [5]:
mat.keys()


dict_keys(['__header__', '__version__', '__globals__', 'B0025'])

In [9]:
print(mat['__header__'])
print(mat['__version__'])
print(mat['__globals__'])


b'MATLAB 5.0 MAT-file, Platform: PCWIN, Created on: Fri Apr 17 00:48:28 2009'
1.0
[]


In [27]:
type(mat[battery_num])


numpy.ndarray

In [28]:
mat[battery_num].shape


(1, 1)

As given in file description, the top field is `cycle`

In [29]:
mat[battery_num].dtype


dtype([('cycle', 'O')])

In [30]:
print(str(mat[battery_num]['cycle'][0, 0])[:1000])
print('...')
print(str(mat[battery_num]['cycle'][0, 0])[-1000:])


[[(array(['impedance'], dtype='<U9'), array([[24]], dtype=uint8), array([[2.0090e+03, 2.0000e+00, 1.3000e+01, 1.9000e+01, 3.0000e+00,
          5.2109e+01]]), array([[(array([[864.84991455-3.33048973e+01j, 858.38482666-4.21315041e+01j,
                  856.72253418-4.97899857e+01j, 854.00585938-5.65011444e+01j,
                  853.94824219-5.53254852e+01j, 852.77819824-6.09274063e+01j,
                  851.30688477-6.22546387e+01j, 849.77905273-6.25042915e+01j,
                  848.9119873 -6.39757385e+01j, 847.16442871-6.62871246e+01j,
                  845.5536499 -5.81814041e+01j, 842.61297607-5.61905594e+01j,
                  846.62054443-5.62489433e+01j, 841.32269287-5.77769089e+01j,
                  839.11419678-5.58213501e+01j, 841.26696777-3.58519096e+01j,
                  840.00018311-3.50512276e+01j, 832.97357178-3.58107758e+01j,
                  831.01055908-3.70488548e+01j, 829.92889404-3.69748688e+01j,
                  827.62994385-3.89213562e+01j, 828.1619873 +4

This file has 80 cycles (each being one of discharge, charge or impedance)

In [132]:
mat[battery_num]['cycle'][0, 0][0].shape


(80,)

A look at a discharge cycle

In [32]:
print(str(mat[battery_num]['cycle'][0, 0][0, 5])[:1000])
print('...')
print(str(mat[battery_num]['cycle'][0, 0][0, 5])[-1000:])


(array(['discharge'], dtype='<U9'), array([[24]], dtype=uint8), array([[2.0090e+03, 2.0000e+00, 1.4000e+01, 4.0000e+00, 3.0000e+00,
        3.4578e+01]]), array([[(array([[4.20248467, 4.20255643, 3.73906929, 4.17038292, 3.71143005,
                4.1514465 , 3.69086689, 4.13670602, 3.67426536, 4.12473153,
                3.66061036, 4.11388937, 3.64915338, 4.10440291, 3.63949767,
                4.09541758, 3.63057497, 4.08713943, 3.62383299, 4.07921092,
                3.61682344, 4.07155881, 3.61012204, 4.06416308, 3.60481376,
                4.05734776, 3.5991097 , 4.05031646, 3.59345821, 4.04339425,
                3.58814548, 4.03739664, 3.58352296, 4.03087315, 3.5789718 ,
                4.02484199, 3.57466136, 4.0187364 , 3.57026712, 4.01328228,
                3.56593104, 4.0074457 , 3.56174272, 4.00194326, 3.55761864,
                3.99631488, 3.55348237, 3.99118651, 3.54940335, 3.98545397,
                3.54549331, 3.98026689, 3.54102408, 3.97465191, 3.53680671,
        

In [38]:
mat[battery_num]['cycle'][0, 0][0, 5].dtype


dtype([('type', 'O'), ('ambient_temperature', 'O'), ('time', 'O'), ('data', 'O')])

In [36]:
mat[battery_num]['cycle'][0, 0][0, 5]['type'][0]


'discharge'

In [41]:
mat[battery_num]['cycle'][0, 0][0, 5]['ambient_temperature'][0, 0]


24

In [116]:
mat[battery_num]['cycle'][0, 0][0].shape


(80,)

In [None]:
num_discharge_cycles = 0
i = 0
for cycle in mat[battery_num]['cycle'][0, 0][0]:
    if cycle['type'][0] == 'discharge':
        print(i)
        num_discharge_cycles += 1
    i += 1

num_discharge_cycles


Convert mat datetime to Python `datetime`

In [44]:
dt_mat = mat[battery_num]['cycle'][0, 0][0, 5]['time'][0]
print(dt_mat)


[2.0090e+03 2.0000e+00 1.4000e+01 4.0000e+00 3.0000e+00 3.4578e+01]


In [46]:
datetime.datetime(
    int(dt_mat[0]),
    int(dt_mat[1]),
    int(dt_mat[2]),
    int(dt_mat[3]),
    int(dt_mat[4]))


datetime.datetime(2009, 2, 14, 4, 3)

In [48]:
mat[battery_num]['cycle'][0, 0][0, 5]['data'].dtype


dtype([('Voltage_measured', 'O'), ('Current_measured', 'O'), ('Temperature_measured', 'O'), ('Current_load', 'O'), ('Voltage_load', 'O'), ('Time', 'O'), ('Capacity', 'O')])

In [50]:
mat[battery_num]['cycle'][0, 0][0, 5]['data'].dtype.names


('Voltage_measured',
 'Current_measured',
 'Temperature_measured',
 'Current_load',
 'Voltage_load',
 'Time',
 'Capacity')

In [150]:
mat[battery_num]['cycle'][0, 0][0, 5]['data']['Voltage_measured'][0, 0].shape


(1, 637)

In [79]:
mat[battery_num]['cycle'][0, 0][0, 5]['data']['Voltage_measured'][0, 0].T.squeeze()[
    2]


3.7390692946249398

In [104]:
mat[battery_num]['cycle'][0, 0][0, 5]['data']['Time'][0, 0].T.squeeze()[2]


19.702999999999996

Capacity

In [94]:
mat[battery_num]['cycle'][0, 0][0, 5]['data']['Capacity'][0, 0][0, 0]


1.8485654344528386

A look at an impedance measurement cycle

In [156]:
mat[battery_num]['cycle'][0, 0][0, 2]['type']


array(['impedance'], dtype='<U9')

In [154]:
mat[battery_num]['cycle'][0, 0][0, 2]['data'].dtype.names


('Sense_current',
 'Battery_current',
 'Current_ratio',
 'Battery_impedance',
 'Rectified_Impedance',
 'Re',
 'Rct')

There are less data points in `Rectified_Impedance`. Not sure what to do; drop it.

In [155]:
for name in mat[battery_num]['cycle'][0, 0][0, 2]['data'].dtype.names:
    print(mat[battery_num]['cycle'][0, 0][0, 2]['data'][name][0, 0].shape)


(1, 48)
(1, 48)
(1, 48)
(48, 1)
(39, 1)
(1, 1)
(1, 1)


In [153]:
mat[battery_num]['cycle'][0, 0][0, 0]['data']['Sense_current'][0, 0].squeeze().shape


(48,)

In [152]:
mat[battery_num]['cycle'][0, 0][0,
                                0]['data']['Battery_impedance'][0, 0].squeeze().shape


(48,)

## 3. Convert data into pd DataFrames

Convert discharge, charge, and impedance data in the structured numpy array into separate pd DataFrames

In [118]:
cycles = mat[battery_num]['cycle'][0, 0][0]
cycles.shape


(80,)

In [162]:
def load_cycles_to_df(cycles):

    def load_charge(cycles):
        cycles_dict = []
        cycle_num = 0
        for n in range(len(cycles)):
            cycle = cycles[n]
            if cycle['type'][0] != 'charge':
                continue
            cycle_num = cycle_num + 1
            ambient_temperature = cycle['ambient_temperature'][0, 0]
            time_start_mat = cycle['time'][0]
            time_start = datetime.datetime(
                int(time_start_mat[0]),
                int(time_start_mat[1]),
                int(time_start_mat[2]),
                int(time_start_mat[3]),
                int(time_start_mat[4])) + \
                datetime.timedelta(seconds=int(time_start_mat[5]))
            data = cycle['data']
            data_len = len(data['Time'][0, 0].T.squeeze())
            for i in range(data_len):
                data_dict = {}
                data_dict['cycle_num'] = cycle_num
                data_dict['ambient_temperature'] = ambient_temperature
                data_dict['cycle_time_start'] = time_start
                for name in data.dtype.names:
                    data_dict[name] = data[name][0, 0].T.squeeze()[i]

                    cycles_dict.append(data_dict)

        return cycles_dict

    def load_discharge(cycles):
        cycles_dict = []
        cycle_num = 0
        for n in range(len(cycles)):
            cycle = cycles[n]
            if cycle['type'][0] != 'discharge':
                continue
            cycle_num = cycle_num + 1
            ambient_temperature = cycle['ambient_temperature'][0, 0]
            time_start_mat = cycle['time'][0]
            time_start = datetime.datetime(
                int(time_start_mat[0]),
                int(time_start_mat[1]),
                int(time_start_mat[2]),
                int(time_start_mat[3]),
                int(time_start_mat[4])) + \
                datetime.timedelta(seconds=int(time_start_mat[5]))
            data = cycle['data']
            data_len = len(data['Time'][0, 0].T.squeeze())
            for i in range(data_len):
                data_dict = {}
                data_dict['cycle_num'] = cycle_num
                data_dict['ambient_temperature'] = ambient_temperature
                data_dict['cycle_time_start'] = time_start
                data_dict['Capacity'] = data['Capacity'][0, 0][0, 0]
                for name in data.dtype.names:
                    if name != 'Capacity':
                        data_dict[name] = data[name][0, 0].T.squeeze()[i]
                    else:
                        pass

                    cycles_dict.append(data_dict)

        return cycles_dict

    def load_impedance(cycles):
        cycles_dict = []
        cycle_num = 0
        for n in range(len(cycles)):
            cycle = cycles[n]
            if cycle['type'][0] != 'impedance':
                continue
            cycle_num = cycle_num + 1
            ambient_temperature = cycle['ambient_temperature'][0, 0]
            time_start_mat = cycle['time'][0]
            time_start = datetime.datetime(
                int(time_start_mat[0]),
                int(time_start_mat[1]),
                int(time_start_mat[2]),
                int(time_start_mat[3]),
                int(time_start_mat[4])) + \
                datetime.timedelta(seconds=int(time_start_mat[5]))
            data = cycle['data']
            data_len = len(data['Sense_current'][0, 0].T.squeeze())
            for i in range(data_len):
                data_dict = {}
                data_dict['cycle_num'] = cycle_num
                data_dict['ambient_temperature'] = ambient_temperature
                data_dict['cycle_time_start'] = time_start
                data_dict['Re'] = data['Re'][0, 0][0, 0]
                data_dict['Rct'] = data['Rct'][0, 0][0, 0]
                for name in data.dtype.names:
                    if name not in ['Re', 'Rct', 'Rectified_Impedance']:
                        data_dict[name] = data[name][0, 0].T.squeeze()[i]

                    cycles_dict.append(data_dict)

        return cycles_dict

    df_charge_cycles = pd.DataFrame.from_dict(load_charge(cycles))
    df_discharge_cycles = pd.DataFrame.from_dict(load_discharge(cycles))
    df_impedance_cycles = pd.DataFrame.from_dict(load_impedance(cycles))

    return df_charge_cycles, df_discharge_cycles, df_impedance_cycles


In [163]:
df_charge_cycles, df_discharge_cycles, df_impedance_cycles = load_cycles_to_df(
    cycles)


In [172]:
df_charge_cycles.to_csv('abc.csv', index=False)


In [164]:
df_charge_cycles.head()


Unnamed: 0,cycle_num,ambient_temperature,cycle_time_start,Voltage_measured,Current_measured,Temperature_measured,Current_charge,Voltage_charge,Time
0,1,24,2009-02-13 19:35:35,3.210795,-0.001231,26.935337,-0.002,0.003,0.0
1,1,24,2009-02-13 19:35:35,3.210795,-0.001231,26.935337,-0.002,0.003,0.0
2,1,24,2009-02-13 19:35:35,3.210795,-0.001231,26.935337,-0.002,0.003,0.0
3,1,24,2009-02-13 19:35:35,3.210795,-0.001231,26.935337,-0.002,0.003,0.0
4,1,24,2009-02-13 19:35:35,3.210795,-0.001231,26.935337,-0.002,0.003,0.0


In [165]:
df_discharge_cycles.head()


Unnamed: 0,cycle_num,ambient_temperature,cycle_time_start,Capacity,Voltage_measured,Current_measured,Temperature_measured,Current_load,Voltage_load,Time
0,1,24,2009-02-13 23:12:28,1.847011,4.196831,0.000296,26.515141,0.0004,0.0,0.0
1,1,24,2009-02-13 23:12:28,1.847011,4.196831,0.000296,26.515141,0.0004,0.0,0.0
2,1,24,2009-02-13 23:12:28,1.847011,4.196831,0.000296,26.515141,0.0004,0.0,0.0
3,1,24,2009-02-13 23:12:28,1.847011,4.196831,0.000296,26.515141,0.0004,0.0,0.0
4,1,24,2009-02-13 23:12:28,1.847011,4.196831,0.000296,26.515141,0.0004,0.0,0.0


In [170]:
df_discharge_cycles.tail()


Unnamed: 0,cycle_num,ambient_temperature,cycle_time_start,Capacity,Voltage_measured,Current_measured,Temperature_measured,Current_load,Voltage_load,Time
114361,28,24,2009-03-19 10:34:54,1.767789,3.265477,-0.002141,27.286738,0.0006,0.0,6198.75
114362,28,24,2009-03-19 10:34:54,1.767789,3.265477,-0.002141,27.286738,0.0006,0.0,6198.75
114363,28,24,2009-03-19 10:34:54,1.767789,3.265477,-0.002141,27.286738,0.0006,0.0,6198.75
114364,28,24,2009-03-19 10:34:54,1.767789,3.265477,-0.002141,27.286738,0.0006,0.0,6198.75
114365,28,24,2009-03-19 10:34:54,1.767789,3.265477,-0.002141,27.286738,0.0006,0.0,6198.75


In [169]:
df_impedance_cycles.tail()


Unnamed: 0,cycle_num,ambient_temperature,cycle_time_start,Re,Rct,Sense_current,Battery_current,Current_ratio,Battery_impedance
7051,21,24,2009-03-19 12:23:39,0.049661,0.082445,795.782654+465.666412j,87.721397+151.827713j,4.569864-2.601026j,0.358917-0.066477j
7052,21,24,2009-03-19 12:23:39,0.049661,0.082445,795.782654+465.666412j,87.721397+151.827713j,4.569864-2.601026j,0.358917-0.066477j
7053,21,24,2009-03-19 12:23:39,0.049661,0.082445,795.782654+465.666412j,87.721397+151.827713j,4.569864-2.601026j,0.358917-0.066477j
7054,21,24,2009-03-19 12:23:39,0.049661,0.082445,795.782654+465.666412j,87.721397+151.827713j,4.569864-2.601026j,0.358917-0.066477j
7055,21,24,2009-03-19 12:23:39,0.049661,0.082445,795.782654+465.666412j,87.721397+151.827713j,4.569864-2.601026j,0.358917-0.066477j
