# Converting from BatteryData.Energy.Gov Data

[BatteryData.Energy.Gov](https://BatteryData.Energy.Gov) stores the data for each cell in two CSV files: one with the time-series data and another with cycle-level summaries.
Here, we show how to convert them into HDF5 format.

```python
from pathlib import Path

import pandas as pd

from batdata.extractors.batterydata import BDExtractor
```

## Configuration

```python
test_file_path = Path('../tests/files/batterydata/')
```

## Load Example Data
The test directory in BatData contains two example files from the [XCEL project](https://batterydata.energy.gov/project/xcel).

```python
raw_data = pd.read_csv(test_file_path / 'p492-13-raw.csv')
raw_data.head()
```

|   | Cycle_Index | Step | Time_s | Amphr | Watthr | Current_A | Voltage_V | Cell_Temperature_C | Temp2 | Datenum_d | ... | Z_Phase_Degree | Z_Real_Ohm | Absolute_Charge_Throughput_Ah | Charge_Throughput_Ah | Absolute_Energy_Throughput_Wh | Energy_Throughput_Wh | Cycle_Label | Segment_Label | Differential_Capacity_Ah_V | Differential_Voltage_V_Ah |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 0.0 | 0.0 | 0.0 | 0.0 | 3.276036 | 29.7005 | 29.5556 | 737792.442373 | ... |  |  | 0.0 | 0.0 | 0.0 | 0.0 |  |  |  |  |
| 1 | 1 | 1 | 0.0167 | 0.0 | 0.0 | 0.0 | 3.271916 | 29.7005 | 29.5556 | 737792.442373 | ... |  |  | 0.0 | 0.0 | 0.0 | 0.0 |  |  |  |  |
| 2 | 1 | 1 | 0.0833 | 0.0 | 0.0 | 0.0 | 3.272068 | 29.7005 | 29.5556 | 737792.442419 | ... |  |  | 0.0 | 0.0 | 0.0 | 0.0 |  |  |  |  |
| 3 | 1 | 2 | 0.0835 | 0.0 | 0.0 | 0.0 | 3.271458 | 29.7005 | 29.5556 | 737792.442419 | ... |  |  | 0.0 | 0.0 | 0.0 | 0.0 |  |  |  |  |
| 4 | 1 | 2 | 0.1823 | 0.0 | 0.0 | 0.0 | 3.271916 | 29.7166 | 29.5233 | 737792.4425 | ... |  |  | 0.0 | 0.0 | 0.0 | 0.0 |  |  |  |  |


```python
sum_data = pd.read_csv(test_file_path / 'p492-13-summary.csv')
sum_data.head()
```

|   | Cycle_Index | Cycle_Label | Time_s | Time_d | Datenum_d | datenum_d | Absolute_Charge_Throughput_Ah | Absolute_Energy_Throughput_Wh | Equivalent_Full_Cycles | Charge_Throughput_Ah | ... | V_avg | I_min | I_max | I_avg | P_min | P_max | P_avg | T_min | T_max | T_avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 |  | 0.0 | 0.0 | 737792.442373 | 0.0 | 0.00072 | 0.002264 | 0.01655 | -0.00072 | ... | 3.087111 | -0.00095 | 0.0 | -0.000406 | 0 | 0.0031 | 0.001277 | 29.7005 | 29.942 | 29.85291 |
| 1 | 2 | Capacity check | 105.999 | 0.001227 | 737792.515984 | 0.073611 | 0.044328 | 0.162758 | 1.019585 | -0.000588 | ... | 3.689191 | -0.00095 | 0.00095 | 3e-06 | 0 | 0.003895 | 0.003421 | 29.5877 | 29.9903 | 29.853699 |
| 2 | 3 |  | 2920.243 | 0.033799 | 737794.470324 | 2.027951 | 0.054974 | 0.200878 | 1.264457 | 0.010059 | ... | 3.399835 | 0.0 | 0.019007 | 0.004138 | 0 | 0.070289 | 0.014826 | 29.7166 | 29.9581 | 29.845681 |
| 3 | 4 |  | 3073.8708 | 0.035577 | 737795.356597 | 2.914225 | 0.105357 | 0.389864 | 2.423301 | 0.0212 | ... | 3.827341 | -0.019009 | 0.019071 | 0.001882 | 0 | 0.078182 | 0.031994 | 29.6522 | 29.9742 | 29.856074 |
| 4 | 5 | HPPC | 3426.6666 | 0.03966 | 737795.601609 | 3.159236 | 0.129115 | 0.476588 | 2.969756 | 0.001947 | ... | 3.688296 | -0.095132 | 0.071259 | -0.001706 | 0 | 0.380704 | 0.007371 | 29.7005 | 29.9581 | 29.847417 |


Our example will rename some of these columns to match batdata's schema.
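As a rough sketch of what renaming involves, the snippet below maps a few BatteryData.Energy.Gov headers onto batdata column names using only the standard library. The `COLUMN_MAP` dictionary and the `rename_columns` helper are hypothetical: the mapping was inferred by matching values between the raw table above and the converted output later in this notebook, not taken from `BDExtractor`. It leaves out `Datenum_d`, whose conversion to `time` is not shown here.

```python
import csv
import io

# Hypothetical mapping inferred from the example tables; not batdata's source.
# Datenum_d -> time is omitted because its conversion is not shown.
COLUMN_MAP = {
    "Cycle_Index": "cycle_index",
    "Step": "step_index",
    "Time_s": "test_time",
    "Current_A": "current",
    "Voltage_V": "voltage",
    "Cell_Temperature_C": "temperature",
}


def rename_columns(text: str) -> list[dict]:
    """Parse CSV text, keeping only mapped columns under their new names."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({new: row[old] for old, new in COLUMN_MAP.items() if old in row})
    return rows


# Two rows copied from the raw example, with most columns left out
sample = (
    "Cycle_Index,Step,Time_s,Amphr,Current_A,Voltage_V,Cell_Temperature_C\n"
    "1,1,0.0,0.0,0.0,3.276036,29.7005\n"
    "1,2,0.0835,0.0,0.0,3.271458,29.7005\n"
)
print(rename_columns(sample))
```

Columns without a mapping, such as `Amphr`, are simply dropped, which mirrors the default behavior described below.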

## Detecting File Groupings
The extractors in batdata serve two functions: separating a directory of files into groups that describe the same battery, then extracting the data from each group into our standard format.

We start with detection.

```python
extractor = BDExtractor()
```

```python
groups = list(extractor.identify_files(test_file_path))
groups
```

```
[['../tests/files/batterydata/p492-13-raw.csv',
  '../tests/files/batterydata/p492-13-summary.csv']]
```

Note that the extractor finds one group of two files: the time-series data and the summary.
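One plausible way to find such groups is to strip a known suffix from each file name and collect the files that share the remaining prefix (here, `p492-13`). This is a minimal sketch assuming the `-raw.csv` / `-summary.csv` naming of the example files; `group_by_cell` is a hypothetical helper, and the actual logic in `BDExtractor.identify_files` may differ. Unlike a stricter implementation, it does not require both files of a pair to be present.

```python
from collections import defaultdict
from pathlib import PurePath

# Hypothetical suffixes based on the example file names
SUFFIXES = ("-raw.csv", "-summary.csv")


def group_by_cell(paths: list[str]) -> list[list[str]]:
    """Group files that share a cell prefix such as 'p492-13'."""
    groups: dict[str, list[str]] = defaultdict(list)
    for path in paths:
        name = PurePath(path).name
        for suffix in SUFFIXES:
            if name.endswith(suffix):
                # Everything before the suffix identifies the cell
                groups[name[: -len(suffix)]].append(path)
                break
    # Stable output: groups sorted by prefix, files sorted within each group
    return [sorted(files) for _, files in sorted(groups.items())]


files = [
    "../tests/files/batterydata/p492-13-summary.csv",
    "../tests/files/batterydata/p492-13-raw.csv",
]
print(group_by_cell(files))
```

Applied to the two example paths, it returns the same single two-file group listed above; files that match neither suffix are ignored.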

## Extract into Standard Format
Given these groups, converting one of them into the batdata format takes one more call.

```python
data = extractor.parse_to_dataframe(groups[0])
```

  warn('We do not yet support parsing cycle summary statistics')


By default, the extractor reads only the columns defined in the batdata schema. It also converts those columns to the units and conventions the schema specifies (e.g., `current` is in amps and negative for discharge).
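The conversions visible in the example output can be sketched as follows: cycle and step indices shift from starting at 1 to starting at 0, and the current changes sign (`0.0` becomes `-0.0`). These rules are assumptions inferred from the sample rows, not batdata's documented behavior. `to_schema_conventions` is a hypothetical helper that takes a row whose columns already use schema names, and the 0.5 A current in the usage row is made up for illustration.

```python
# Hypothetical helper; the rules below are inferred from the example output
# (indices start at 0, current sign flipped), not from batdata's documentation.


def to_schema_conventions(row: dict) -> dict:
    """Convert one row that already uses schema column names."""
    return {
        # Source cycles and steps start at 1; the converted data starts at 0
        "cycle_index": int(row["cycle_index"]) - 1,
        "step_index": int(row["step_index"]) - 1,
        "test_time": float(row["test_time"]),
        # Flip the sign so that discharge is negative (assumed source convention)
        "current": -float(row["current"]),
        "voltage": float(row["voltage"]),
        "temperature": float(row["temperature"]),
    }


# Illustrative row; the 0.5 A current is made up for the example
row = {"cycle_index": "1", "step_index": "2", "test_time": "0.0835",
       "current": "0.5", "voltage": "3.271458", "temperature": "29.7005"}
print(to_schema_conventions(row))
```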

```python
data.raw_data
```

|   | cycle_index | step_index | test_time | current | voltage | temperature | time |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0.0000 | -0.0 | 3.276036 | 29.7005 | 1.578066e+09 |
| 1 | 0 | 0 | 0.0167 | -0.0 | 3.271916 | 29.7005 | 1.578066e+09 |
| 2 | 0 | 0 | 0.0833 | -0.0 | 3.272068 | 29.7005 | 1.578066e+09 |
| 3 | 0 | 1 | 0.0835 | -0.0 | 3.271458 | 29.7005 | 1.578066e+09 |
| 4 | 0 | 1 | 0.1823 | -0.0 | 3.271916 | 29.7166 | 1.578066e+09 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1534 | 7 | 207 | 4873.8056 | -0.0 | 3.298314 | 29.8776 | 1.579108e+09 |
| 1535 | 7 | 207 | 4877.1390 | -0.0 | 3.310979 | 29.9098 | 1.579108e+09 |
| 1536 | 7 | 207 | 4880.4723 | -0.0 | 3.319066 | 29.9259 | 1.579108e+09 |
| 1537 | 7 | 207 | 4883.8056 | -0.0 | 3.323949 | 29.8937 | 1.579108e+09 |


To store the extra columns as well, set the extractor's `store_all` option:

```python
extractor.store_all = True
```

```python
extractor.parse_to_dataframe(groups[0]).raw_data
```

  warn('We do not yet support parsing cycle summary statistics')


|   | cycle_index | step_index | test_time | current | voltage | temperature | time | Amphr | Watthr | Temp2 | ... | Z_Phase_Degree | Z_Real_Ohm | Absolute_Charge_Throughput_Ah | Charge_Throughput_Ah | Absolute_Energy_Throughput_Wh | Energy_Throughput_Wh | Cycle_Label | Segment_Label | Differential_Capacity_Ah_V | Differential_Voltage_V_Ah |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0.0000 | -0.0 | 3.276036 | 29.7005 | 1.578066e+09 | 0.0 | 0.0 | 29.5556 | ... |  |  | 0.000000 | 0.000000 | 0.000000 | 0.000000 |  |  |  |  |
| 1 | 0 | 0 | 0.0167 | -0.0 | 3.271916 | 29.7005 | 1.578066e+09 | 0.0 | 0.0 | 29.5556 | ... |  |  | 0.000000 | 0.000000 | 0.000000 | 0.000000 |  |  |  |  |
| 2 | 0 | 0 | 0.0833 | -0.0 | 3.272068 | 29.7005 | 1.578066e+09 | 0.0 | 0.0 | 29.5556 | ... |  |  | 0.000000 | 0.000000 | 0.000000 | 0.000000 |  |  |  |  |
| 3 | 0 | 1 | 0.0835 | -0.0 | 3.271458 | 29.7005 | 1.578066e+09 | 0.0 | 0.0 | 29.5556 | ... |  |  | 0.000000 | 0.000000 | 0.000000 | 0.000000 |  |  |  |  |
| 4 | 0 | 1 | 0.1823 | -0.0 | 3.271916 | 29.7166 | 1.578066e+09 | 0.0 | 0.0 | 29.5233 | ... |  |  | 0.000000 | 0.000000 | 0.000000 | 0.000000 |  |  |  |  |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1534 | 7 | 207 | 4873.8056 | -0.0 | 3.298314 | 29.8776 | 1.579108e+09 | 0.0 | 0.0 | 29.6040 | ... |  |  | 0.148657 | 0.000471 | 0.545862 | 0.007873 | Aging cycle |  |  |  |
| 1535 | 7 | 207 | 4877.1390 | -0.0 | 3.310979 | 29.9098 | 1.579108e+09 | 0.0 | 0.0 | 29.6363 | ... |  |  | 0.148657 | 0.000471 | 0.545862 | 0.007873 | Aging cycle |  |  |  |
| 1536 | 7 | 207 | 4880.4723 | -0.0 | 3.319066 | 29.9259 | 1.579108e+09 | 0.0 | 0.0 | 29.7008 | ... |  |  | 0.148657 | 0.000471 | 0.545862 | 0.007873 | Aging cycle |  |  |  |
| 1537 | 7 | 207 | 4883.8056 | -0.0 | 3.323949 | 29.8937 | 1.579108e+09 | 0.0 | 0.0 | 29.5879 | ... |  |  | 0.148657 | 0.000471 | 0.545862 | 0.007873 | Aging cycle |  |  |  |
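Comparing the two outputs suggests how `store_all` changes the columns: the schema columns come first, followed by every source column that was not consumed to build them, in the source file's order (`Amphr`, `Watthr`, `Temp2`, ...). The sketch below reproduces that ordering under stated assumptions: the `CONSUMED` set is inferred from the tables (treating `Datenum_d` as the source of `time` is a guess), and `output_columns` is a hypothetical helper rather than part of batdata.

```python
# Schema columns in the order shown by the default conversion above
SCHEMA_COLUMNS = ["cycle_index", "step_index", "test_time", "current",
                  "voltage", "temperature", "time"]

# Source columns assumed to be consumed when building the schema columns
# (Datenum_d -> time is an assumption; the others are inferred from the tables)
CONSUMED = {"Cycle_Index", "Step", "Time_s", "Current_A", "Voltage_V",
            "Cell_Temperature_C", "Datenum_d"}


def output_columns(source_columns: list[str], store_all: bool) -> list[str]:
    """Hypothetical output column order for a given store_all setting."""
    if not store_all:
        return list(SCHEMA_COLUMNS)
    # Keep every unconsumed source column, in the order it appears in the file
    extras = [c for c in source_columns if c not in CONSUMED]
    return SCHEMA_COLUMNS + extras


# First ten columns of the raw example file
header = ["Cycle_Index", "Step", "Time_s", "Amphr", "Watthr", "Current_A",
          "Voltage_V", "Cell_Temperature_C", "Temp2", "Datenum_d"]
print(output_columns(header, store_all=True))
```

With `store_all=False` the same call returns only the seven schema columns, matching the default output.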
