# Cycle Life Prediction: Generalized Severson Analysis

This notebook generalizes the cycle life prediction analysis of Severson et al. (Nature Energy, 2019), which predicts cycle life from the first 100 cycles of test data.

Using a series of widgets, users can select any number of Train and Test datasets from the Voltaiq server, featurize those datasets using generalizations of the features in Severson et al., train and test the Severson Variance and/or Discharge models on them, evaluate model performance, and predict cycle life for any number of Prediction datasets of interest.

### Inputs and Outputs:
- **Train dataset:** Data used to train the ML model(s) you choose. Choose from a number of curated, publicly available datasets, and/or choose custom data using a test-name search criterion.
    - If custom datasets are added to the analysis, every test must include the `start` and `end` cycles of the analysis range and should also reach the expected end capacity retention %. If a test does not contain the `start` and `end` cycles, the code will throw an error. Tests that do not reach the expected capacity retention % are excluded from Train and Test featurization and model evaluation.
- **Test dataset:** Data used to test (i.e., evaluate the performance of) the ML model(s) you choose. You can either perform a train-test split (with a configurable split ratio) on the Train dataset, or manually choose test data the same way you chose the Train dataset.
- **Featurization inputs:** The Train and Test data are featurized following Severson et al., with these differences:
    - Rather than using a hard-coded reference capacity for 80% capacity retention, the capacity retention threshold is configurable and is applied relative to each test's initial capacity. Instead of taking the first cycle that drops below the threshold, we take the first cycle of a run of 5 consecutive cycles below it; this provides some robustness against noise and fluctuations.
    - The user also inputs the `start` and `end` cycles whose voltage-vs-capacity curves are differenced. The defaults, cycles 9 and 99, correspond to cycles 10 and 100 in the Severson analysis, because Voltaiq numbers cycles from zero by default (unless cycle numbers are explicitly specified in an input file). All tests in the Train and Test datasets must include these two cycles.
    - Voltage-vs-capacity data is still interpolated between the minimum and maximum values of each test record; note that features based on a specific voltage cutoff no longer apply.
- **Model(s):** Users can select which models to evaluate. We currently offer the Severson Variance, Severson Discharge, and Dummy models, and will continue to add models from the literature. We also plan to let users specify their own models in the future.
    - Once models are selected, users can train and test them and view parity plots and RMSE and MAPE performance plots for the selected model(s).
- **Prediction dataset:** After a model is trained and evaluated on the Test dataset, users can select a Prediction dataset and use the ML model(s) of their choice to predict the cycle life of this new data. As with the Train dataset, users can select from a curated list or choose a custom dataset using a test-name search criterion.
    - The prediction generates a bar chart comparing, for each test record in the dataset, the cycle life predicted by each model with the record's current (last) cycle. For tests that have not yet reached the capacity retention threshold, a second bar chart shows the predicted time remaining in hours.
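
The end-of-life rule described under **Featurization inputs** can be sketched as follows. This is a minimal illustration, not the code in `severson_featurization`: it assumes the initial capacity is the first cycle's capacity and reads "5 consecutive cycles" as "the end-of-life cycle is the first cycle of the first run of five cycles below the threshold". The function name `find_eol_cycle` is hypothetical.

```python
import numpy as np

def find_eol_cycle(capacity, retention=0.85, run_length=5):
    """Return the index of the first cycle that starts a run of `run_length`
    consecutive cycles below retention * initial capacity, or None if no such run."""
    capacity = np.asarray(capacity, dtype=float)
    threshold = retention * capacity[0]  # assumption: initial capacity = first cycle
    run = 0
    for i, value in enumerate(capacity):
        run = run + 1 if value < threshold else 0
        if run == run_length:
            return i - run_length + 1  # first cycle of the qualifying run
    return None

caps = [1.00, 0.99, 0.90, 0.84, 0.86, 0.84, 0.83, 0.83, 0.82, 0.81]
print(find_eol_cycle(caps))  # 5 (the single-cycle dip at index 3 is ignored)
```

A naive first-crossing rule would return cycle 3 here because of the one-cycle dip; requiring a sustained run pushes the end-of-life estimate to cycle 5.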

Reference: Severson et al. Data-driven prediction of battery cycle life before capacity degradation. *Nature Energy* 4, 383–391 (2019).


Run all of the following cells and provide inputs where needed:

#### Imports and set-up

In [125]:
import voltaiq_studio as vs
from voltaiq_studio import TraceFilterOperation

import severson_featurization
import CL_prediction, ML_models
import cl_pred_widgets as cpw
import importlib
# reload the local helper modules so edits to them take effect without restarting the kernel
importlib.reload(CL_prediction)
importlib.reload(ML_models)
importlib.reload(severson_featurization)
importlib.reload(cpw)
from CL_prediction import CLPrediction
from severson_featurization import calc_X_and_y, drop_unfinished_tests

import ipywidgets as widgets
from ipywidgets import interactive, interact, fixed

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
%matplotlib inline

In [3]:
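# retrieve the test records from the Voltaiq server; they are passed to the selection widgets below
trs = vs.get_test_records()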

### User inputs: Select Train data, Test data, and input criteria

In [126]:
# we will start by instantiating a cycle life prediction object 
# which will store all relevant information for the datasets and models you will choose
prediction1 = CL_prediction.CLPrediction()

#### Select Training Dataset

In [132]:
select_train = interactive(cpw.select_widget, 
                           train_sets = widgets.SelectMultiple(value=[], options=cpw.std_train_datasets, description=f'Training Datasets:',style={'description_width': 'initial'}, ensure_option=True),
                          train_or_test=fixed('train'), pred_obj = fixed(prediction1), trs = fixed(trs), predict_button = fixed(None))
display(select_train)


#### Select Testing Dataset (or train-test-split ratio)

In [128]:
test_select_dropdown = interactive(cpw.test_select_method,method = widgets.Dropdown(options = ['Use train_test_split on training dataset','Select test dataset manually'], 
                                                                                value = None, description = 'Test dataset selection method', style={'description_width': 'initial'},
                                                                                layout = widgets.Layout(width='500px')), prediction1 = fixed(prediction1), trs = fixed(trs))
display(test_select_dropdown)

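If you choose `Use train_test_split on training dataset`, the Train records are divided according to the split ratio. The sketch below shows one way such a split could work, assuming it is done per test record (so no cell contributes cycles to both sets) with a fixed seed for reproducibility. `split_records` is a hypothetical helper; the notebook's widget may instead call scikit-learn's `train_test_split`.

```python
import numpy as np

def split_records(record_names, test_ratio=0.2, seed=0):
    """Randomly split whole test records into (train, test) lists."""
    names = list(record_names)
    rng = np.random.default_rng(seed)   # fixed seed -> reproducible split
    order = rng.permutation(len(names))
    n_test = round(test_ratio * len(names))
    test = [names[i] for i in order[:n_test]]
    train = [names[i] for i in order[n_test:]]
    return train, test

train, test = split_records([f"cell_{i}" for i in range(10)], test_ratio=0.2, seed=1)
print(len(train), len(test))  # 8 2
```

Splitting by record rather than by cycle matters here because all features of a cell come from the same early cycles; mixing one cell's data across both sets would leak information into the test score.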

#### Select Featurization Inputs

In [129]:
# featurize the data based on user inputs
# required inputs: start cycle, end cycle, and capacity retention (entered as a fraction, e.g. 0.85)
# only the Severson featurization method is currently available
featurize = interactive(cpw.featurize_inputs_widget, start_cycle = widgets.IntText(value = 9, description = 'Initial cycle: ', disabled=False,continuous_update = False),
                        end_cycle = widgets.IntText(value = 99, description = 'End cycle: ', disabled=False,continuous_update = False),
                        per_cap_ret = widgets.BoundedFloatText(value = 0.85,min = 0, max = 1, step = 0.01, description = '% Capacity Retention:',style={'description_width': 'initial'}, disabled=False,continuous_update = False),
                       prediction1 = fixed(prediction1))
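# this cell's own Output widget; a shared global `output` would be rebound by later cells,
# redirecting this button's messages to the wrong place
featurize_output = widgets.Output()
perform_featurize = widgets.Button(description = 'Featurize data', button_style = 'danger', style={"button_color": "#38adad"})

display(featurize, perform_featurize, featurize_output)

# named featurize_click so it does not overwrite the `featurize` widget defined above
def featurize_click(b):
    '''Featurize the Train and Test data when the "Featurize data" button is clicked.'''
    with featurize_output:
        cpw.populate_test_train_data(prediction1, trs)
        print("Starting featurization...")
        prediction1.featurize(trs)
        print("Featurization complete!")

perform_featurize.on_click(featurize_click)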

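The central Severson feature compares the discharge capacity-vs-voltage curves of the `start` and `end` cycles: ΔQ(V) = Q_end(V) - Q_start(V), summarized as log10 of its variance. The sketch below is not the `severson_featurization` implementation. It assumes each cycle is given as arrays of voltage and discharge capacity, interpolates both curves onto a shared 1000-point voltage grid spanning the overlap of their voltage ranges (an assumption about how the per-record min/max range is used), and the function names are hypothetical.

```python
import numpy as np

def _capacity_on_grid(voltage, capacity, grid):
    """Interpolate Q(V) onto `grid`; np.interp needs increasing x, so sort by voltage."""
    order = np.argsort(voltage)
    return np.interp(grid, voltage[order], capacity[order])

def delta_q_variance_feature(v_start, q_start, v_end, q_end, n_points=1000):
    """log10 of the variance of Q_end(V) - Q_start(V) on a shared voltage grid."""
    v_start, q_start = np.asarray(v_start, float), np.asarray(q_start, float)
    v_end, q_end = np.asarray(v_end, float), np.asarray(q_end, float)
    # shared grid over the voltage range covered by both cycles (assumption)
    v_lo = max(v_start.min(), v_end.min())
    v_hi = min(v_start.max(), v_end.max())
    grid = np.linspace(v_lo, v_hi, n_points)
    delta_q = _capacity_on_grid(v_end, q_end, grid) - _capacity_on_grid(v_start, q_start, grid)
    # identical curves give zero variance, i.e. log10(0) = -inf
    return float(np.log10(np.var(delta_q)))

v = np.linspace(3.5, 2.0, 200)       # discharge: voltage falls as capacity grows
q_cycle10 = 1.1 * (3.5 - v) / 1.5    # toy linear Q(V) curve
q_cycle100 = 0.98 * q_cycle10        # toy 2% uniform capacity fade
print(delta_q_variance_feature(v, q_cycle10, v, q_cycle100))
```

Faster-fading cells produce larger, more voltage-dependent ΔQ(V) curves and therefore a larger feature value, which is why this single number correlates with cycle life.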

In [130]:
model_options = ['All','Dummy','Severson variance','Severson discharge']
def set_model(model_choice):
    '''Set the selected model(s) on prediction1; 'All' expands to every individual model.'''
    set_models = list(model_choice)
    if 'All' in model_choice:
        set_models = list(model_options)
        set_models.remove('All')
    prediction1.set_model(set_models)
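
For orientation, here is a minimal sketch of what the Dummy and Severson variance models might look like. These are assumptions, not taken from `ML_models`: the Dummy model predicts the mean training cycle life (like the default strategy of scikit-learn's `DummyRegressor`), and the variance model is a single-feature linear fit of log10(cycle life) on x = log10 var(ΔQ(V)), solved by ordinary least squares rather than the regularized (elastic net) regression used in the paper. The class names are hypothetical.

```python
import numpy as np

class DummyModel:
    """Baseline: predicts the mean training cycle life for every cell (assumed behavior)."""
    def fit(self, x, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, x):
        return np.full(len(x), self.mean_)

class VarianceModel:
    """log10(cycle life) = a + b * x, with x = log10 var(dQ(V)); plain least squares."""
    def fit(self, x, y):
        # np.polyfit with deg=1 returns [slope, intercept]
        self.b_, self.a_ = np.polyfit(np.asarray(x, float), np.log10(y), deg=1)
        return self

    def predict(self, x):
        return 10 ** (self.a_ + self.b_ * np.asarray(x, float))

x_train = np.array([-5.0, -4.0, -3.0, -2.0])  # toy log10 var(dQ) values
y_train = 10 ** (1.0 - 0.5 * x_train)          # toy cycle lives
model = VarianceModel().fit(x_train, y_train)
print(model.predict([-4.0]))                   # close to 1000 cycles
```

The Dummy baseline gives a floor for the RMSE and MAPE plots: a useful model should beat "predict the average" on the Test data.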

#### Choose ML model(s)

In [131]:
# now that data has been featurized, we need to train the model, then test the model
choose_model = interactive(set_model, model_choice = widgets.SelectMultiple(options = model_options, value=['Dummy'], description='Choose ML model(s)',style={'description_width': 'initial'},disabled=False))
# TODO: add the "Featurize data" button here, showing the featurization input boxes only when at least one Severson model is selected

# action buttons; all except "Train model" stay disabled until the preceding step has run
train_model_button = widgets.Button(description = 'Train model', button_style = 'danger', style={"button_color": "#38adad"})
test_button = widgets.Button(description = 'Test model', button_style = 'danger', style={"button_color": "#38adad"}, disabled=True)
parity_button = widgets.Button(description = 'Generate Parity Plots', button_style = 'danger', style={"button_color": "#38adad"}, disabled=True)
MAPE_button = widgets.Button(description = 'Plot MAPE results', button_style = 'danger', style={"button_color": "#38adad"}, disabled=True)
RMSE_button = widgets.Button(description = 'Plot RMSE results', button_style = 'danger', style={"button_color": "#38adad"}, disabled=True)

# this cell's own Output widget, so its callbacks keep writing here after later cells run
model_output = widgets.Output()
display(choose_model, train_model_button, test_button, parity_button, MAPE_button, RMSE_button, model_output)

def train_click(b):
    '''Train the selected model(s), then enable testing.'''
    with model_output:
        prediction1.train_model()
        test_button.disabled = False

def test_click(b):
    '''Evaluate the trained model(s) on the Test data, then enable the result plots.'''
    with model_output:
        prediction1.test_predict()
        parity_button.disabled = False
        MAPE_button.disabled = False
        RMSE_button.disabled = False

def parity_click(b):
    with model_output:
        prediction1.create_parity_plots()

def mape_click(b):
    with model_output:
        prediction1.plot_model_stats('MAPE')

def rmse_click(b):
    with model_output:
        prediction1.plot_model_stats('RMSE')

train_model_button.on_click(train_click)
test_button.on_click(test_click)
parity_button.on_click(parity_click)
MAPE_button.on_click(mape_click)
RMSE_button.on_click(rmse_click)

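The Test step summarizes each model with RMSE and MAPE. For reference, here are the standard definitions, RMSE = sqrt(mean((ŷ - y)²)) in cycles and MAPE = 100 · mean(|ŷ - y| / y) in percent. Whether `plot_model_stats` computes them on raw cycle counts in exactly this way is an assumption; the function names are illustrative.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error, in the units of y (here: cycles)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; y_true must be nonzero."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(100.0 * np.mean(np.abs((y_pred - y_true) / y_true)))

observed = [100, 200, 400]   # toy cycle lives
predicted = [110, 180, 400]
print(round(rmse(observed, predicted), 2), round(mape(observed, predicted), 2))  # 12.91 6.67
```

RMSE weights errors on long-lived cells heavily, while MAPE scales each error by the cell's true life, so the two plots can rank models differently.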

#### Select Prediction dataset and predict cycle lives

In [14]:
# Let users select data for prediction. It is featurized like the Train/Test data (but has no cycle-life labels),
# then each trained model predicts its cycle life and the results are plotted.

predict_model_predict = widgets.Button(description = 'Predict Lifetime', button_style = 'danger', style={"button_color": "#38adad"}, disabled = True)

select_test = interactive(cpw.select_widget, 
                           train_sets = widgets.SelectMultiple(value=[], options=cpw.std_train_datasets, description=f'Prediction Datasets:',style={'description_width': 'initial'}, ensure_option=True),
                          train_or_test=fixed('predict'),pred_obj = fixed(prediction1), trs = fixed(trs),predict_button = fixed(predict_model_predict))
# this cell's own Output widget, so the callback never writes into another cell's output area
predict_output = widgets.Output()

display(select_test, predict_model_predict, predict_output)

def pred_model_predict(b):
    '''Featurize the Prediction data, predict cycle life with each trained model, and plot the results.'''
    with predict_output:
        cpw.populate_test_train_data(prediction1, trs, predict = True)
        print("Starting featurization...")
        prediction1.featurize_predict(trs)
        print("Featurization complete!")
        prediction1.predict()
        prediction1.calc_predicted_cyclelife()
        prediction_df, time_pred_df = prediction1.return_predicted_cyclelife()
        # use a log x-axis when the largest prediction exceeds 10x the largest current cycle
        model_cols = prediction_df.drop(columns = ['Name', 'Current cycle'])
        log_scale = model_cols.max().max() > 10 * prediction_df['Current cycle'].max()
        prediction_df.set_index('Name').plot.barh(figsize=(10, len(prediction_df)/1.2), width = .8, logx = log_scale)
        plt.xlabel('Cycles')
        plt.title("Predicted cycle life by ML model for each test")
        plt.show()

        # TODO: show predicted time to failure only for tests that have not already failed,
        # by filtering time_pred_df on prediction_df
        if len(time_pred_df) > 0:
            print("Predicted time remaining (hours) based on each ML model for tests which have not yet reached the capacity retention threshold")
            hours = time_pred_df.drop(columns = ['Name'])
            # use a log x-axis when the largest value exceeds 10x the smallest positive value
            log_scale_time = hours.max().max() > 10 * hours[hours > 0].min().min()
            time_pred_df.set_index('Name').plot.barh(figsize=(10, len(time_pred_df)/1.2), width = .8, logx = log_scale_time)
            plt.xlabel('Predicted hours until failure')
            plt.title("Predicted time remaining by ML model for each test")
            plt.show()
        else:
            print("All tests in the Prediction dataset have already reached the set capacity retention threshold")

predict_model_predict.on_click(pred_model_predict)


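The prediction callback switches the bar charts to a log x-axis using two simple rules: for cycle life, when the largest prediction exceeds 10× the largest current cycle; for remaining time, when the largest value exceeds 10× the smallest positive value. Restated as plain NumPy functions (hypothetical names, and assuming each input can be flattened to one array of numbers), the thresholds can be checked in isolation:

```python
import numpy as np

def use_log_cycles_axis(predicted_cycles, current_cycles, factor=10):
    """Log axis if the largest predicted cycle life exceeds factor x the largest current cycle."""
    return bool(np.nanmax(predicted_cycles) > factor * np.nanmax(current_cycles))

def use_log_hours_axis(hours_remaining, factor=10):
    """Log axis if the largest remaining time exceeds factor x the smallest positive one."""
    hours = np.asarray(hours_remaining, dtype=float).ravel()
    positive = hours[hours > 0]  # NaN and non-positive entries are dropped
    if positive.size == 0:
        return False
    return bool(np.nanmax(hours) > factor * positive.min())

# rows = tests, columns = models (toy values)
print(use_log_cycles_axis([[500, 900], [1200, 20000]], [300, 450]))  # True
print(use_log_hours_axis([[5.0, 200.0], [0.0, 12.0]]))               # True
```

Ignoring non-positive times matters for the second rule: a test at or past the threshold would otherwise drive the minimum to zero and force a log axis on every chart.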

In [113]:
import seaborn as sns
print(sns.color_palette('Set2')[0])

(0.4, 0.7607843137254902, 0.6470588235294118)
