<img src="imgs/GeorgiaTech_RGB.png" alt="GeorgiaTech_RGB" width="200" style="float: left;"/>
<br><br><br>

# <span style='color:#B3A369'> <b>Create Pickle Files for Battery Batch Data</b> </span>

> <b> Notebook Author:</b> Brian Keith (bkeith9@gatech.edu) <br>
> 
> **Based on code provided by the paper's authors:**
> - https://github.com/rdbraatz/data-driven-prediction-of-battery-cycle-life-before-capacity-degradation/blob/master/BuildPkl_Batch1.ipynb
> - https://github.com/rdbraatz/data-driven-prediction-of-battery-cycle-life-before-capacity-degradation/blob/master/BuildPkl_Batch2.ipynb
> - https://github.com/rdbraatz/data-driven-prediction-of-battery-cycle-life-before-capacity-degradation/blob/master/BuildPkl_Batch3.ipynb
> 
> **Data Citation:**
> - Severson et al. Data-driven prediction of battery cycle life before capacity degradation. Nature Energy volume 4, pages 383–391 (2019)
> - Downloaded from: https://data.matr.io/1/projects/5c48dd2bc625d700019f3204
>   - Batch 1 (`2017-05-12_batchdata_updated_struct_errorcorrect.mat`): https://data.matr.io/1/projects/5c48dd2bc625d700019f3204/batches/5c86c0b5fa2ede00015ddf67
>   - Batch 2 (`2017-06-30_batchdata_updated_struct_errorcorrect.mat`): https://data.matr.io/1/projects/5c48dd2bc625d700019f3204/batches/5c86bf14fa2ede00015ddd83
>   - Batch 3 (`2018-04-12_batchdata_updated_struct_errorcorrect.mat`): https://data.matr.io/1/projects/5c48dd2bc625d700019f3204/batches/5c86bd64fa2ede00015ddbb3

## <span style='color:#003057'> Batch Notes from Authors </span>

### <span style='color:#54585A'> Batch 1: 2017-05-12 </span>

**Experimental Design:**
- All cells were cycled with one-step or two-step charging policies. The charging time varies from ~8 to 13.3 minutes (0-80% SOC). There are generally two cells tested per policy, with the exception of 3.6C(80%).
- 1 minute and 1 second rests were placed after reaching 80% SOC during charging and after discharging, respectively.
- We cycle to 80% of nominal capacity (0.88 Ah).
- An initial C/10 cycle was performed in the beginning of each test.
- The cutoff currents for the constant-voltage steps were C/50 for both charge and discharge.
- The pulse width of the IR test is 30 ms.
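
The quoted charging-time range can be checked with simple arithmetic. The snippet below is a minimal sketch, assuming an idealized single-step charge at a constant C-rate over the 0-80% SOC window. The function name is hypothetical, and the 6C entry is included only because it reproduces the ~8 minute lower bound; the notes above do not list it as a tested policy.

```python
def one_step_charge_minutes(c_rate: float, soc_window: float = 0.80) -> float:
    """Minutes needed to charge across `soc_window` at a constant C-rate.

    A rate of 1C moves the full nominal capacity in one hour, so covering
    a fraction `soc_window` of it takes soc_window / c_rate hours.
    """
    return soc_window / c_rate * 60.0


for c_rate in (3.6, 4.8, 6.0):
    print(f'{c_rate}C(80%): {one_step_charge_minutes(c_rate):.1f} min')
```

Under this idealization, 3.6C(80%) gives the 13.3 minute upper bound quoted for this batch.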

**Experimental Notes**
- The computer automatically restarted twice. As such, there are some time gaps in the data.
- The temperature control is somewhat inconsistent, leading to variability in the baseline chamber temperature.
- The tests in channels 4 and 8 did not successfully start and thus do not have data.
- The thermocouples for channels 15 and 16 were accidentally switched.

**Data notes**
- Cycle 1 data is not available in the struct. The sampling rate for this cycle was initially too high, so we excluded it from the data set to create more manageable file sizes.
- The cells in Channels 1, 2, 3, 5, and 6 (3.6C(80%) and 4C(80%) policies) were stopped at the end of this batch and resumed in the 2017-06-30 batch. This pause in cycling led to a rise in capacity upon resuming the tests.
- The tests in channels 13, 19, 21, 22, and 31 were terminated before the cells reached 80% of nominal capacity.

### <span style='color:#54585A'> Batch 2: 2017-06-30 </span>

**Experimental Design:**
- All cells were cycled with one-step or two-step charging policies. The charging time is fixed at 10 minutes (0-80% SOC). There is generally only one cell tested per policy, with the exception of 4.8C(80%) (three cells).
- We resumed 5 cells from the 2017-05-12 batch that had not yet completed (the 3.6C and 4.0C policies).
- We cycle to 80% of nominal capacity (0.88 Ah).
- 5 minute rests were placed both after reaching 80% SOC during charging and after discharging.
- An initial C/10 cycle was performed in the beginning of each test.
- The cutoff currents for the constant-voltage steps were C/50 for both charge and discharge.
- The pulse width of the IR test is 30 ms.

**Experimental Notes**
- The computer automatically restarted, affecting all tests (around cycle 250 for most policies). The restart effectively led to a ‘rest’ of around 8 hours, and there are some ‘spikes’ in capacity.
- The computer also restarted near the end of the tests, with one cell (Channel 3, EL150800460623) affected (expect a ‘spike’ near the end of life).
- Upon unloading the cells, we noticed the thermocouples from channels 7 and 21 had fallen off the cell.
- The thermocouples for channels 15 and 16 were accidentally switched.

**Data notes**
- 3.6C(80%) and 4C(80%) policies have data that carry over from batch 1 (link on barcode). This applies to the cells in channels 1, 2, 3, 5, and 6. Please note that these are NOT new experiments, but a continuation of experiments from the 2017-05-12 batch.
- Channel 10 (EL150800460605) dies quite quickly. This cell is possibly defective, although we detected no obvious signs of this.

### <span style='color:#54585A'> Batch 3: 2018-04-12 </span>

**Experimental Design:**
- All cells were cycled with two-step charging policies. The charging time is fixed at 10 minutes (0-80% SOC). We test multiple cells per policy (3-8x per policy).
- We cycle to 80% of nominal capacity (0.88 Ah).
- Four 5-second rests were placed after reaching 80% SOC during charging, after the IR test, before discharging, and after discharging.
- A final C/10 cycle was performed at 80% of nominal capacity.
- The cutoff currents for the constant-voltage steps were C/20 for both charge and discharge.
- The pulse width of the IR test is 33 ms.

**Experimental Notes**
- Some cells had OCV errors (caused by the internal resistance test) that led to temporary pauses in cycling.

**Data notes**
- The tests in channels 33 and 41 were terminated before the cells reached 80% of nominal capacity.
- The cell in channel 46 has noisy voltage profiles, likely due to an electronic connection error.
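
The data notes above can be kept next to the data as a small lookup, so later analysis can decide whether to exclude a cell. This is only a sketch: the names `EARLY_TERMINATION` and `is_flagged` are hypothetical, the empty set for batch 2 means only that the notes list no such channels, and the channel numbers are copied from the notes. They are not necessarily the same as the positional cell index used when building the pickle files, so a mapping through `channel_id` would be needed before filtering.

```python
# Channels the authors flag as terminated before reaching 80% of nominal capacity.
# Channel numbers come from the batch notes, NOT from the positional cell index.
EARLY_TERMINATION = {
    'batch1': {13, 19, 21, 22, 31},
    'batch2': set(),  # none listed in the notes
    'batch3': {33, 41},
}


def is_flagged(batch_alias: str, channel: int) -> bool:
    """Return True if the notes flag this channel as terminated early."""
    return channel in EARLY_TERMINATION.get(batch_alias, set())


print(is_flagged('batch1', 13), is_flagged('batch2', 13), is_flagged('batch3', 41))
```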

## <span style='color:#003057'> Initial Tasks </span>

In [1]:
import h5py
import numpy as np
import pickle
import pathlib
import os
from IPython.display import display, Markdown
import datetime as dt
curr_time = dt.datetime.today().strftime('%Y-%m-%d %H%M')

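def printmd(string):
    """Render a string as Markdown, coloring level 1-3 headings with the header palette."""
    header_map = {1:'#B3A369',2:'#003057',3:'#54585A'}
    # Count only the leading '#' characters so a '#' inside the text is left alone
    nh = len(string) - len(string.lstrip('#'))
    if nh in header_map:
        text = string[nh:]
        display(Markdown('#'*nh + f' <span style="color:{header_map[nh]}">{text}</span>'))
    else:
        display(Markdown(string))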

## <span style='color:#003057'> Specify Directories </span>

In [2]:

DATA_DIR = 'data/'

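# os.listdir returns names in arbitrary order; sort so the date-prefixed files line up with the aliases
files = sorted(f for f in os.listdir(DATA_DIR) if f.endswith('.mat'))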
aliases = ['batch1', 'batch2', 'batch3']


print('Files:')
for f in files:
    print('\t'+f)
print('Aliases:')
for a in aliases:
    print('\t'+a)

Files:
	2017-05-12_batchdata_updated_struct_errorcorrect.mat
	2017-06-30_batchdata_updated_struct_errorcorrect.mat
	2018-04-12_batchdata_updated_struct_errorcorrect.mat
Aliases:
	batch1
	batch2
	batch3
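
Pairing files with aliases by position is only safe when the file order is deterministic, because `os.listdir` makes no ordering guarantee. The following is a minimal sketch, assuming the date-prefixed naming shown above so that ascending name order equals chronological order. The helper name `pair_files_with_aliases` is hypothetical.

```python
def pair_files_with_aliases(filenames):
    """Map 'batch1', 'batch2', ... to the .mat files in ascending name order."""
    mat_files = sorted(name for name in filenames if name.endswith('.mat'))
    return {f'batch{k}': name for k, name in enumerate(mat_files, start=1)}


# A deliberately shuffled listing that also contains a non-.mat file
listing = [
    '2018-04-12_batchdata_updated_struct_errorcorrect.mat',
    'batch1.pkl',
    '2017-05-12_batchdata_updated_struct_errorcorrect.mat',
    '2017-06-30_batchdata_updated_struct_errorcorrect.mat',
]
pairs = pair_files_with_aliases(listing)
for alias, name in pairs.items():
    print(alias, '->', name)
```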


## <span style='color:#003057'> Pickle Data Files </span>
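
The conversion cell in this section reads each policy string with `.tobytes()[::2].decode()`. The sketch below illustrates why that works, assuming the text is stored as one little-endian 16-bit code unit per character. The array `raw` is a hypothetical stand-in for the dataset read from the file, not real data.

```python
import numpy as np

# Hypothetical stand-in for f[batch['policy_readable'][i, 0]][()]:
# one little-endian uint16 code unit per character, stored as a column.
policy_text = '3.6C(80%)'
raw = np.array([[ord(ch)] for ch in policy_text], dtype='<u2')

# Approach used in the conversion cell: keep the low byte of each
# 2-byte code unit. This is only correct for ASCII characters.
low_bytes = raw.tobytes()[::2].decode()

# Alternative: decode the full code units as UTF-16 (little-endian).
utf16 = raw.tobytes().decode('utf-16-le')

print(low_bytes, utf16)
```

Both approaches agree for ASCII policy names; the UTF-16 variant would also handle characters outside that range.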

In [3]:
for file, alias in zip(files, aliases):

    with h5py.File(DATA_DIR+file, 'r') as f:
        printmd(f'## {alias} - {file}')
        printmd('### Info')
        print(f'Keys: {list(f.keys())}')
        batch = f['batch']
        print(f'Batch Keys: {list(batch.keys())}')
        num_cells = batch['summary'].shape[0]
        print(f'Number of cells: {num_cells}')
        bat_dict = {}
        for i in range(num_cells):
            #? NOTE: Dataset.value was deprecated and then removed in h5py 3.0, so every .value was replaced with [()]
            #? using .value now gives an error: AttributeError: 'Dataset' object has no attribute 'value'
            #? https://stackoverflow.com/questions/67409919/attributeerror-dataset-object-has-no-attribute-value
            cl = f[batch['cycle_life'][i,0]][()]
            # The policy text is stored as 16-bit characters; keeping every other byte recovers the ASCII string
            policy = f[batch['policy_readable'][i,0]][()].tobytes()[::2].decode()
            summary_IR = np.hstack(f[batch['summary'][i,0]]['IR'][0,:].tolist())
            summary_QC = np.hstack(f[batch['summary'][i,0]]['QCharge'][0,:].tolist())
            summary_QD = np.hstack(f[batch['summary'][i,0]]['QDischarge'][0,:].tolist())
            summary_TA = np.hstack(f[batch['summary'][i,0]]['Tavg'][0,:].tolist())
            summary_TM = np.hstack(f[batch['summary'][i,0]]['Tmin'][0,:].tolist())
            summary_TX = np.hstack(f[batch['summary'][i,0]]['Tmax'][0,:].tolist())
            summary_CT = np.hstack(f[batch['summary'][i,0]]['chargetime'][0,:].tolist())
            summary_CY = np.hstack(f[batch['summary'][i,0]]['cycle'][0,:].tolist())
            summary = {'IR': summary_IR, 'QC': summary_QC, 'QD': summary_QD,
                       'Tavg': summary_TA, 'Tmin': summary_TM, 'Tmax': summary_TX,
                       'chargetime': summary_CT, 'cycle': summary_CY}
            cycles = f[batch['cycles'][i,0]]
            cycle_dict = {}
            for j in range(cycles['I'].shape[0]):
                I = np.hstack((f[cycles['I'][j,0]][()]))
                Qc = np.hstack((f[cycles['Qc'][j,0]][()]))
                Qd = np.hstack((f[cycles['Qd'][j,0]][()]))
                Qdlin = np.hstack((f[cycles['Qdlin'][j,0]][()]))
                T = np.hstack((f[cycles['T'][j,0]][()]))
                Tdlin = np.hstack((f[cycles['Tdlin'][j,0]][()]))
                V = np.hstack((f[cycles['V'][j,0]][()]))
                dQdV = np.hstack((f[cycles['discharge_dQdV'][j,0]][()]))
                t = np.hstack((f[cycles['t'][j,0]][()]))
                cd = {'I': I, 'Qc': Qc, 'Qd': Qd, 'Qdlin': Qdlin, 'T': T, 'Tdlin': Tdlin, 'V':V, 'dQdV': dQdV, 't':t}
                cycle_dict[str(j)] = cd
                
            cell_dict = {'cycle_life': cl, 'charge_policy':policy, 'summary': summary, 'cycles': cycle_dict}
            key = f'b{alias[-1]}c' + str(i)
            bat_dict[key] = cell_dict
        
        print(f'Batch Dictionary Keys:{list(bat_dict.keys())}')
        
        pkl_name = f'{DATA_DIR+alias}.pkl'
        with open(pkl_name,'wb') as fp:
            pickle.dump(bat_dict,fp)
        printmd('### Pickle File Saved as ' + pkl_name)

    printmd('-'*50)

## <span style="color:#003057"> batch1 - 2017-05-12_batchdata_updated_struct_errorcorrect.mat</span>

### <span style="color:#54585A"> Info</span>

Keys: ['#refs#', '#subsystem#', 'batch', 'batch_date']
Batch Keys: ['Vdlin', 'barcode', 'channel_id', 'cycle_life', 'cycles', 'policy', 'policy_readable', 'summary']
Number of cells: 46
Batch Dictionary Keys:['b1c0', 'b1c1', 'b1c2', 'b1c3', 'b1c4', 'b1c5', 'b1c6', 'b1c7', 'b1c8', 'b1c9', 'b1c10', 'b1c11', 'b1c12', 'b1c13', 'b1c14', 'b1c15', 'b1c16', 'b1c17', 'b1c18', 'b1c19', 'b1c20', 'b1c21', 'b1c22', 'b1c23', 'b1c24', 'b1c25', 'b1c26', 'b1c27', 'b1c28', 'b1c29', 'b1c30', 'b1c31', 'b1c32', 'b1c33', 'b1c34', 'b1c35', 'b1c36', 'b1c37', 'b1c38', 'b1c39', 'b1c40', 'b1c41', 'b1c42', 'b1c43', 'b1c44', 'b1c45']


### <span style="color:#54585A"> Pickle File Saved as data/batch1.pkl</span>

--------------------------------------------------

## <span style="color:#003057"> batch2 - 2017-06-30_batchdata_updated_struct_errorcorrect.mat</span>

### <span style="color:#54585A"> Info</span>

Keys: ['#refs#', '#subsystem#', 'batch', 'batch_date']
Batch Keys: ['Vdlin', 'barcode', 'channel_id', 'cycle_life', 'cycles', 'policy', 'policy_readable', 'summary']
Number of cells: 48
Batch Dictionary Keys:['b2c0', 'b2c1', 'b2c2', 'b2c3', 'b2c4', 'b2c5', 'b2c6', 'b2c7', 'b2c8', 'b2c9', 'b2c10', 'b2c11', 'b2c12', 'b2c13', 'b2c14', 'b2c15', 'b2c16', 'b2c17', 'b2c18', 'b2c19', 'b2c20', 'b2c21', 'b2c22', 'b2c23', 'b2c24', 'b2c25', 'b2c26', 'b2c27', 'b2c28', 'b2c29', 'b2c30', 'b2c31', 'b2c32', 'b2c33', 'b2c34', 'b2c35', 'b2c36', 'b2c37', 'b2c38', 'b2c39', 'b2c40', 'b2c41', 'b2c42', 'b2c43', 'b2c44', 'b2c45', 'b2c46', 'b2c47']


### <span style="color:#54585A"> Pickle File Saved as data/batch2.pkl</span>

--------------------------------------------------

## <span style="color:#003057"> batch3 - 2018-04-12_batchdata_updated_struct_errorcorrect.mat</span>

### <span style="color:#54585A"> Info</span>

Keys: ['#refs#', '#subsystem#', 'batch', 'batch_date']
Batch Keys: ['Vdlin', 'barcode', 'channel_id', 'cycle_life', 'cycles', 'policy', 'policy_readable', 'summary']
Number of cells: 46
Batch Dictionary Keys:['b3c0', 'b3c1', 'b3c2', 'b3c3', 'b3c4', 'b3c5', 'b3c6', 'b3c7', 'b3c8', 'b3c9', 'b3c10', 'b3c11', 'b3c12', 'b3c13', 'b3c14', 'b3c15', 'b3c16', 'b3c17', 'b3c18', 'b3c19', 'b3c20', 'b3c21', 'b3c22', 'b3c23', 'b3c24', 'b3c25', 'b3c26', 'b3c27', 'b3c28', 'b3c29', 'b3c30', 'b3c31', 'b3c32', 'b3c33', 'b3c34', 'b3c35', 'b3c36', 'b3c37', 'b3c38', 'b3c39', 'b3c40', 'b3c41', 'b3c42', 'b3c43', 'b3c44', 'b3c45']


### <span style="color:#54585A"> Pickle File Saved as data/batch3.pkl</span>

--------------------------------------------------
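
Once saved, each pickle holds one dictionary keyed by cell (`b1c0`, `b1c1`, ...), and each cell holds `cycle_life`, `charge_policy`, `summary`, and `cycles`. The sketch below shows how such a file might be read back. It uses a toy dictionary with the same nesting and made-up numbers, written to a temporary directory, so it runs without the real data. The shape of `cycle_life` and all values are assumptions.

```python
import os
import pickle
import tempfile

import numpy as np

# Toy stand-in with the same nesting as bat_dict; every number is made up.
toy_batch = {
    'b1c0': {
        'cycle_life': np.array([[100.0]]),
        'charge_policy': '3.6C(80%)',
        'summary': {'QD': np.array([1.07, 1.06, 1.05]),
                    'cycle': np.array([1.0, 2.0, 3.0])},
        'cycles': {'0': {'V': np.array([3.3, 3.2]),
                         't': np.array([0.0, 1.0])}},
    }
}

# Round-trip through a pickle file, as the conversion cell does for each batch
with tempfile.TemporaryDirectory() as tmp_dir:
    pkl_path = os.path.join(tmp_dir, 'toy_batch.pkl')
    with open(pkl_path, 'wb') as fp:
        pickle.dump(toy_batch, fp)
    with open(pkl_path, 'rb') as fp:
        loaded = pickle.load(fp)

cell = loaded['b1c0']
last_qd = cell['summary']['QD'][-1]   # last per-cycle discharge capacity
n_cycles = len(cell['cycles'])        # per-cycle traces are keyed '0', '1', ...
print(cell['charge_policy'], last_qd, n_cycles)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.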

## <span style='color:#003057'> Export Source Code </span>

In [4]:
user = pathlib.Path.home().name  # last component of the home directory, regardless of path separator
export_flag = False
if user not in ('bkeith', 'Brian'):
    raise Exception('User running code is not the student. No need to run below.')
else:
    export_flag = True
    print('User running code is the student. Continue to file Export.')

def export_code(cur_file: str, output_dir: str = '', output_name: str = '', cell_tags_exist: bool = False, template:str = 'lab'):
    """Export Jupyter Notebook as HTML file

    Args:
        cur_file (str, required): Full path of the notebook (.ipynb) file to export.
        output_dir (str, optional): Directory to output the file to. Defaults to local directory of Jupyter Notebook.
        output_name (str, optional): Name of the file that will be exported. Defaults to the name of the ipynb file.
        cell_tags_exist (bool, optional): Whether the notebook uses the remove_cell / remove_input cell tags that should be applied during export. Defaults to False.
        template (str, optional): Template to use for export. Defaults to 'lab'. Options are 'lab' or 'classic'. 'classic' should be used if you're planning to convert the HTML to PDF. 'lab' is better for viewing in browser.
    """
    
    from subprocess import run
    from os import getcwd
    
    if output_dir == '':
        output_dir = getcwd().replace('\\','/')

    if output_name == '':
        cur_file = cur_file.replace('\\', '/')
        output_name = cur_file.split('/')[-1].split('.')[0] + '.html'

    # Build the nbconvert argument list once; tag-removal options are added only when requested
    cmd = [
        'jupyter',
        'nbconvert',
        "--output-dir={}".format(output_dir),
        '--to', 'html',
        '--template', template,
    ]
    if cell_tags_exist:
        cmd += [
            '--TagRemovePreprocessor.enabled=True',
            '--TagRemovePreprocessor.remove_cell_tags={"remove_cell"}',
            '--TagRemovePreprocessor.remove_input_tags={"remove_input"}',
            '--no-prompt',
        ]
    cmd += [cur_file, '--output', output_name]

    # An argument list combined with shell=True only works on Windows;
    # without the shell the arguments are passed through intact on every platform
    process = run(cmd, capture_output=True)
        
    if process.returncode == 0:
        display(Markdown(f'<h3>Code saved to {output_name}</h3>'))
    else:
        display(Markdown('<h1>REPORT ERROR:</h1>'))
        # stderr is captured as bytes; decode it so newlines and tabs print normally
        print(process.stderr.decode(errors='replace'))

cur_file = __vsc_ipynb_file__
output_dir = 'zlogs'
output_name = os.path.basename(cur_file).split('.')[0] + '.html'
cell_tags_exist = False
template = 'classic'

if export_flag:
    export_code(cur_file,output_dir, output_name, cell_tags_exist, template=template)

if export_flag:
    from bs4 import BeautifulSoup as soup
    import base64

    html_path = os.path.join(output_dir, output_name)
    # Name the parser explicitly and close the file once it has been read
    with open(html_path, encoding='utf-8') as f:
        soup_html = soup(f.read(), 'html.parser')
    img_tags = soup_html.find_all('img')
    img_path = os.path.join(os.path.dirname(output_dir), 'imgs')

    for tag in img_tags:
        # skip any images that are already embedded as base64
        if 'base64' in tag['src']:
            continue

        img_src = tag['src'].split('/')[-1]
        print(f'Replacing {img_src}')

        # read the local image and embed it as a data URI (assumes PNG images)
        with open(os.path.join(img_path, img_src), 'rb') as img_file:
            base64_str = base64.b64encode(img_file.read()).decode('utf-8')
        tag['src'] = 'data:image/png;base64,' + base64_str

    with open(html_path, 'w', encoding='utf-8') as f:
        f.write(str(soup_html))

User running code is the student. Continue to file Export.


<h3>Code saved to 00 Build Pickle Files for Data.html</h3>

Replacing GeorgiaTech_RGB.png
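
The image-embedding step hard-codes `image/png` in the data URI. If other image types were ever referenced, the MIME type could be guessed from the file name instead. The following is a minimal sketch using only the standard library. The helper name `to_data_uri` and the placeholder file contents are assumptions for illustration.

```python
import base64
import mimetypes
import os
import tempfile


def to_data_uri(path: str) -> str:
    """Return the file at `path` as a base64 data URI with a guessed MIME type."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        mime = 'application/octet-stream'  # fallback when the extension is unknown
    with open(path, 'rb') as fh:
        payload = base64.b64encode(fh.read()).decode('ascii')
    return f'data:{mime};base64,{payload}'


# Demonstrate with a throwaway file; the bytes are a placeholder, not a real image
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'demo.png')
    with open(demo_path, 'wb') as fh:
        fh.write(b'placeholder-bytes')
    uri = to_data_uri(demo_path)

print(uri[:22])
```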
