# Cycle Life Prediction: Generalized Severson Analysis

This notebook generalizes the analysis presented in Severson's 2019 Nature Energy paper for cycle life prediction from the first 100 cycles of test data.

Through a series of widgets, users can select any number of Train and Test datasets from the Voltaiq Community server, featurize those datasets based on generalizations of the features in Severson et al., train and test the Severson Variance and/or Discharge models on the Train/Test datasets, evaluate model performance, and predict cycle life for any number of Prediction datasets of interest.

### Inputs:
- **Model(s):** Users can select which models they would like to evaluate. We currently offer comparisons between the Severson Variance, Severson Discharge and Dummy models, but will continue to add in additional models from the literature. We also plan to allow users to specify their own models in the future.
    - Once a model is selected, users can Train and Test their models, including showing parity plots, RMSE and MAPE performance plots for their selected model(s).
- **Featurization inputs:** The train and test data will be featurized based on the Severson models, with the following nuances:
    - Rather than using a hard-coded reference capacity for 80% capacity retention, we allow for a flexible capacity retention threshold based on the initial capacity of a test. Instead of choosing the first cycle to drop below the capacity retention threshold, we choose the first cycle to do so within a sequence of 5 consecutive cycles; this provides some robustness against noise and fluctuations.
    - A user also inputs the `start` and `end` cycles for which to perform the differencing for the voltage vs capacity data. The default is cycles 9 and 99 to correspond with the Severson analysis (note that Voltaiq uses zero-indexing on cycles as a default, unless they are explicitly specified in an input file). Note that all tests within the Train and Test datasets must include these two cycles.
    - A cycle number (ordinal) must also be given from which to calculate the reference capacity. The Severson model used the cell's nominal capacity as the reference capacity; however, this is not known for every dataset on Voltaiq Community, so a reference capacity based on the cycling data is used instead. Currently this cycle number must be the same for all datasets used for the model, which requires the user to have some information about which cycle to choose. The default ordinal is cycle 20, as that works for the curated datasets provided with this script.
    - Voltage vs capacity data is still interpolated between the min and max values for each test record; note that the features based on a specific voltage cutoff are no longer applicable.
- **Train dataset:** Data which will be used to train the ML model(s) you choose. Choose from a number of curated publicly-available datasets, and/or choose custom data based on a test name search criteria
    - Tests must include the same `start` and `end` cycles for the analysis range, and tests should also reach the expected end capacity retention %. If tests do not contain the `start` and `end` cycles, the code will throw an error. If tests do not reach the expected capacity retention %, those tests will be excluded from the Train and Test featurization and model evaluation.
    - We provide an option for filtering data based on a minimum cycle count, as well as the set capacity retention threshold. Filtering by capacity retention threshold can be slow, so should be used in conjunction with a cycle number and/or test name filter.
- **Test dataset:** Data which will be used to test (evaluate the performance) of the ML model(s) you choose. You may either choose to perform a train-test split (with a configurable split ratio) on the Train dataset, or manually choose data in a manner similar to how you chose the Train dataset.
- **Prediction dataset:** After a model is trained and evaluated on the test dataset, users can select a Prediction dataset, and use the ML model(s) of their choice to predict the cycle life of this new dataset. Again, users can select from a curated list or choose a custom dataset based on a test name search criteria.
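
The capacity-retention rule described under *Featurization inputs* can be sketched as follows. This is a minimal illustration, not the code in `severson_featurization`: the function name `cycle_life_from_capacity`, its parameters, and the toy capacity list are hypothetical. It assumes zero-indexed cycles and interprets "first cycle within a sequence of 5 consecutive cycles" as the first cycle of the first run of 5 below-threshold cycles.

```python
def cycle_life_from_capacity(capacities, ref_cycle=20, retention_pct=85.0, run_length=5):
    """Return the first cycle that starts `run_length` consecutive cycles
    below the retention threshold, or None if the test never gets there.

    `capacities` is a list of discharge capacities indexed by cycle ordinal
    (zero-indexed). All names here are illustrative assumptions.
    """
    # Threshold is a percentage of the capacity measured at the reference cycle.
    threshold = capacities[ref_cycle] * retention_pct / 100.0
    run_start = None  # first cycle of the current below-threshold run
    for cycle, capacity in enumerate(capacities):
        if capacity < threshold:
            if run_start is None:
                run_start = cycle
            if cycle - run_start + 1 >= run_length:
                return run_start
        else:
            run_start = None  # run broken by a cycle at or above the threshold
    return None


# Toy fade curve: flat at 1.0 Ah for 40 cycles, then 0.79 Ah for 10 cycles.
capacities = [1.0] * 40 + [0.79] * 10
capacities[30] = 0.75  # isolated noisy dip; ignored because the run is 1 cycle long

print(cycle_life_from_capacity(capacities))  # 40
```

Requiring a run of consecutive cycles means the single dip at cycle 30 does not end the test early, which is the robustness the bullet above refers to.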
    

### Outputs:
- Train/Test parity plots, RMSE, MAPE performance plots
- The prediction step will generate a bar chart comparing the predicted cycle life for each model for each test record within the dataset, as well as the current (last) cycle of that test record
- All train/test results can be accessed through methods and attributes of the CL_prediction class. Further exploration of the data results is possible using the resulting dataframes.

### Recommended datasets:
The Severson models were developed on fast-charge LFP cycling data. The Variance model contains a single feature based on the variance of the difference between voltage vs capacity curves of two cycles (`start` and `end`). It is likely that these ML models are degradation mode specific. Since the expected degradation mode of the original dataset focused on loss of active material of the negative electrode, cells which have that degradation mode might show better fits. Additionally, it is recommended that the datasets being compared use similar discharge protocols and cut-off voltages, because the features are calculated from discharge steps and the data is interpolated between the upper and lower cutoff voltages. Significantly different cycling protocols might not allow the ML model to capture the appropriate feature signatures.
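
As a rough illustration of the Variance feature described above, the sketch below interpolates two discharge curves onto a shared voltage grid, takes the difference ΔQ(V), and returns the log10 of its variance. It is a sketch under stated assumptions, not the code in `severson_featurization`: the helper names, the grid size, the use of the population variance, and the synthetic linear curves are all hypothetical.

```python
import math
import statistics


def interp(x, xs, ys):
    """Linear interpolation of y at x; `xs` must be sorted ascending."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            frac = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + frac * (ys[i] - ys[i - 1])


def log_var_delta_q(v_start, q_start, v_end, q_end, n_points=100):
    """log10 of the variance of Q_end(V) - Q_start(V) on a shared voltage grid."""
    # Interpolate only over the voltage window covered by both cycles.
    v_lo = max(min(v_start), min(v_end))
    v_hi = min(max(v_start), max(v_end))
    grid = [v_lo + (v_hi - v_lo) * k / (n_points - 1) for k in range(n_points)]
    delta_q = [interp(v, v_end, q_end) - interp(v, v_start, q_start) for v in grid]
    return math.log10(statistics.pvariance(delta_q))


# Synthetic discharge curves: capacity (Ah) as a function of voltage (V).
voltages = [2.0 + 0.016 * k for k in range(101)]      # 2.0 V to 3.6 V
q_start = [1.1 * (3.6 - v) / 1.6 for v in voltages]   # `start` cycle
q_end = [1.0 * (3.6 - v) / 1.6 for v in voltages]     # `end` cycle, less capacity

feature = log_var_delta_q(voltages, q_start, voltages, q_end)
print(round(feature, 2))
```

Restricting the grid to the voltage window shared by both cycles is one reason tests with very different cut-off voltages are hard to compare: the window, and therefore the feature, changes with the protocol.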

#### References: 
[Schauser Nicole S., Lininger Christianna N., Leland Eli S., Sholklapper Tal Z. An open access tool for exploring machine learning model choice for battery life cycle prediction. Frontiers in Energy Research, 10 (2022) DOI: 10.3389/fenrg.2022.1058999](https://www.frontiersin.org/articles/10.3389/fenrg.2022.1058999)

[Severson et al. Data-driven prediction of battery cycle life before capacity degradation. Nature Energy volume 4, pages 383–391 (2019)](https://www.nature.com/articles/s41560-019-0356-8)


### Model uncertainty / prediction intervals
It is important to not only be able to look at a model's error during the training process, but also be able to estimate or identify the model uncertainty for a new prediction. This can be identified for every new prediction using a model's prediction interval.

There are a few ways to obtain prediction intervals. For linear regression, they can be computed analytically. Some models provide them as part of the model framework, e.g. Bayesian approaches such as [Bayesian Ridge Regression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.BayesianRidge.html) and [Gaussian Process Regression](https://scikit-learn.org/stable/modules/generated/sklearn.gaussian_process.GaussianProcessRegressor.html), or quantile-loss models such as [Gradient Boosting Regression](https://scikit-learn.org/stable/auto_examples/ensemble/plot_gradient_boosting_quantile.html). Lastly, there are options for computing the prediction uncertainty using Python packages such as [MAPIE](https://github.com/scikit-learn-contrib/MAPIE), which stands for "Model Agnostic Prediction Interval Estimator". A good tutorial on using MAPIE for tabular regression (as is the case for cycle life prediction) can be found in their [documentation](https://mapie.readthedocs.io/en/latest/examples_regression/4-tutorials/plot_main-tutorial-regression.html). The benefit of this approach is that it can be used for any sklearn-compatible regressor, making it a powerful option when comparing multiple models!
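
For reference, the analytic case mentioned above has a closed form. The expression below is the textbook prediction interval for ordinary least-squares regression with a single feature, stated for illustration rather than taken from this notebook's code. Here $\alpha$ is written as a fraction (0.05 for a 95% interval), $n$ is the number of training points, and $x_0$ is the feature value of the new cell:

$$
\hat{y}_0 \;\pm\; t_{1-\alpha/2,\,n-2}\; s\,\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}},
\qquad
s^2 = \frac{1}{n-2}\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2
$$

The interval widens as $x_0$ moves away from the training mean $\bar{x}$, which is why predictions for cells unlike the training data carry more uncertainty.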

The MAPIE output provides both the prediction value obtained by a model (i.e. the cycle life predicted by a model for a specific input cell), as well as a prediction interval given by a lower and upper bound. This prediction interval provides the bounds for the expected error or noise on the prediction, such that the 'true' value is expected to fall within the prediction interval with `(100-alpha)`% coverage (that is, in `(100-alpha)`% of the cases, we expect the 'true' value to fall within this interval). Note that `alpha` is a tunable parameter but is most commonly taken as `5`, such that we have a 95% prediction interval (MAPIE itself expects `alpha` as a fraction, e.g. `0.05`). Additional documentation on the implementation (in our case we have chosen the CV+ method) can be found in the [documentation](https://mapie.readthedocs.io/en/latest/theoretical_description_regression.html), which also references the original publications.
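
To make the idea of coverage concrete, the sketch below builds a split-conformal interval by hand. This is a deliberately simpler technique than the CV+ method used in this notebook, and it does not call MAPIE. The function name, the `alpha_pct` parameter (a percentage, matching the convention in the text above), and the toy calibration data are all assumptions made for illustration.

```python
import math


def split_conformal_interval(calib_true, calib_pred, new_pred, alpha_pct=5.0):
    """Symmetric interval around `new_pred` from held-out calibration residuals.

    The half-width is the ceil((n + 1) * (1 - alpha))-th smallest absolute
    residual, which is the standard split-conformal quantile.
    """
    residuals = sorted(abs(t - p) for t, p in zip(calib_true, calib_pred))
    n = len(residuals)
    rank = math.ceil((n + 1) * (100.0 - alpha_pct) / 100.0)
    if rank > n:
        raise ValueError("too few calibration points for this alpha")
    half_width = residuals[rank - 1]
    return new_pred - half_width, new_pred + half_width


# Toy calibration set: 19 cells, all predicted at 500 cycles, with true cycle
# lives that miss the prediction by 10, 20, ..., 190 cycles.
calib_pred = [500] * 19
calib_true = [500 + 10 * k for k in range(1, 20)]

print(split_conformal_interval(calib_true, calib_pred, new_pred=800))  # (610, 990)
```

With 19 calibration points and a 95% target, the rank is 19, so the interval half-width is the largest calibration residual. Fewer calibration points cannot support a 95% interval with this method, which is one motivation for cross-validation based variants such as CV+.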

We have augmented this script (after the Frontiers in Energy Research 2022 publication) to include prediction intervals for both the Train/Test data (which can be accessed from the parity plots as well as from dataframes), and for prediction data. Initial inspiration was taken from this [blog post](https://www.valencekjell.com/posts/2022-09-14-prediction-intervals/).

#### Imports and set-up

In [3]:
pip install xgboost

Collecting xgboost
  Using cached xgboost-1.7.3-py3-none-manylinux2014_x86_64.whl (193.6 MB)
Installing collected packages: xgboost
Successfully installed xgboost-1.7.3
Note: you may need to restart the kernel to use updated packages.


In [4]:
pip install mapie

Collecting mapie
  Using cached MAPIE-0.6.0-py3-none-any.whl (98 kB)
Installing collected packages: mapie
Successfully installed mapie-0.6.0
Note: you may need to restart the kernel to use updated packages.


In [1]:
import voltaiq_studio as vs
from voltaiq_studio import TraceFilterOperation

import severson_featurization
import ML_models
import cl_widgets as cpw
import importlib
# importlib.reload(CL_prediction)
importlib.reload(ML_models)
importlib.reload(severson_featurization)
importlib.reload(cpw)
from CL_prediction import CLPrediction
from severson_featurization import calc_X_and_y, drop_unfinished_tests

import ipywidgets as widgets
from ipywidgets import interactive, interact, fixed

from IPython.display import display, Markdown

import pickle
from datetime import datetime

import numpy as np
import pandas as pd
from scipy import stats
import math

import matplotlib.pyplot as plt
import matplotlib as mpl
from cycler import cycler
import seaborn as sns

# set a few default figure parameters
mpl.rcParams['figure.figsize'] = (3,3)
colors = ['#332288','#882255','#117733','#AA4499','#44AA99','#CC6677','#88CCEE','#DDCC77','#A3E8E7']

mpl.rcParams['axes.prop_cycle'] = cycler(color=colors)
fontsize = 6
titlesize = 8
mpl.rcParams['font.size'] = fontsize
mpl.rcParams['legend.fontsize'] = fontsize
mpl.rcParams['figure.titlesize'] = titlesize
mpl.rcParams['axes.labelsize']=fontsize
mpl.rcParams['lines.markersize'] = fontsize
mpl.rcParams['figure.dpi'] = 150

In [2]:
trs = vs.get_test_records()

### User inputs: Select Model(s), Model Inputs, Train data and Test data

In [3]:
# we will start by instantiating a cycle life prediction object 
# which will store all relevant information for the datasets and models you will choose
prediction1 = CLPrediction()

#### Select Models

In [4]:
model_options = ['All','Dummy','Severson variance','Severson discharge','Severson discharge XGBoost']

In [5]:
choose_model = interactive(cpw.set_model, model_choice = widgets.SelectMultiple(options = model_options, value=['All'], description='Choose ML model(s)',style={'description_width': 'initial'},disabled=False), prediction_object = fixed(prediction1), model_options = fixed(model_options))
display(choose_model)

interactive(children=(SelectMultiple(description='Choose ML model(s)', index=(0,), options=('All', 'Dummy', 'S…

#### Select Featurization Criteria

In [6]:
featurize = interactive(cpw.featurize_inputs_widget, start_cycle = widgets.IntText(value = 20, description = 'Initial cycle: ', disabled=False,continuous_update = False),
                        end_cycle = widgets.IntText(value = 99, description = 'End cycle: ', disabled=False,continuous_update = False),
                        per_cap_ret = widgets.BoundedFloatText(value = 85,min = 0, max = 100, step = 1, description = '% Capacity Retention:',style={'description_width': 'initial'}, disabled=False,continuous_update = False),
                       prediction1 = fixed(prediction1),ref_cyc = widgets.BoundedFloatText(value = 20, min = 0, step = 1, description = "Reference cycle for capacity normalization",style={'description_width': 'initial'} ))
display(featurize)

interactive(children=(IntText(value=20, description='Initial cycle: '), IntText(value=99, description='End cyc…

#### Select Training Dataset

In [7]:
display(Markdown("#### Search for test records to add to Train dataset"))
display(Markdown("Filtering tests by capacity retention is slow; check kernel status for update on completion."))
# search_type = widgets.RadioButtons(options=['Test Name','Min Cycle Number','Both'],
#                                     disabled=False)
# Want to add in some search criteria here: min cycle number (if blank, ignore), min capacity retention

select_train = interactive(cpw.select_widget, 
                           train_sets = widgets.SelectMultiple(value=[], options=cpw.std_train_datasets, description=f'Training Datasets:',style={'description_width': 'initial'}, ensure_option=True),
                          train_or_test=fixed('train'), pred_obj = fixed(prediction1), trs = fixed(trs), predict_button = fixed(None))

display(select_train)

#### Search for test records to add to Train dataset

Filtering tests by capacity retention is slow; check kernel status for update on completion.

interactive(children=(SelectMultiple(description='Training Datasets:', options=('Severson2019 - All (LFP)', 'S…

#### Select Testing Dataset (or train-test-split ratio)

In [8]:
test_select_dropdown = interactive(cpw.test_select_method,method = widgets.Dropdown(options = ['Use train_test_split on training dataset','Select test dataset manually'], 
                                                                                value = None, description = 'Test dataset selection method', style={'description_width': 'initial'},
                                                                                layout = widgets.Layout(width='500px')), prediction1 = fixed(prediction1), trs = fixed(trs),predict_button=fixed(None))
display(test_select_dropdown)

interactive(children=(Dropdown(description='Test dataset selection method', layout=Layout(width='500px'), opti…

#### Featurize the data

In [9]:
output = widgets.Output()
perform_featurize = widgets.Button(description = 'Featurize data', button_style = 'danger', style={"button_color": "#38adad"})

display(perform_featurize, output)

def featurize(b):
    ''' function that will featurize the data'''
    with output:
        cpw.populate_test_train_data(prediction1, trs)
        print("Starting featurization...")
        prediction1.featurize(trs)
        print("Featurization complete!")

perform_featurize.on_click(featurize)

Button(button_style='danger', description='Featurize data', style=ButtonStyle(button_color='#38adad'))

Output()

#### Train & Test ML model(s)

In [10]:
train_model_button = widgets.Button(description = 'Train model', button_style = 'danger', style={"button_color": "#38adad"})
test_button = widgets.Button(description = 'Test model', button_style = 'danger', style={"button_color": "#38adad"},disabled=True,)
parity_button = widgets.Button(description = 'Generate Parity Plots', button_style = 'danger', style={"button_color": "#38adad"},disabled=True)
MAPE_button = widgets.Button(description = 'Plot MAPE results', button_style = 'danger', style={"button_color": "#38adad"},disabled=True)
RMSE_button = widgets.Button(description = 'Plot RMSE results', button_style = 'danger', style={"button_color": "#38adad"},disabled=True)


output = widgets.Output()
display(train_model_button, test_button,parity_button,MAPE_button,RMSE_button, output)

def train_button(b):
    with output:
        prediction1.train_model()
        test_button.disabled = False

def test_click(b):
    with output:
        prediction1.test_predict()
        parity_button.disabled = False
        MAPE_button.disabled = False
        RMSE_button.disabled = False
        
def parity_click(b):
    with output:
        prediction1.create_parity_plots()

def mape_click(b):
    with output:
        prediction1.plot_model_stats('MAPE')
        prediction1.plot_grouped_model_stats('MAPE')
        
def rmse_click(b):
    with output:
        prediction1.plot_model_stats('RMSE')
        
train_model_button.on_click(train_button)
test_button.on_click(test_click)
parity_button.on_click(parity_click)
MAPE_button.on_click(mape_click)
RMSE_button.on_click(rmse_click)

Button(button_style='danger', description='Train model', style=ButtonStyle(button_color='#38adad'))

Button(button_style='danger', description='Test model', disabled=True, style=ButtonStyle(button_color='#38adad…

Button(button_style='danger', description='Generate Parity Plots', disabled=True, style=ButtonStyle(button_col…

Button(button_style='danger', description='Plot MAPE results', disabled=True, style=ButtonStyle(button_color='…

Button(button_style='danger', description='Plot RMSE results', disabled=True, style=ButtonStyle(button_color='…

Output()

To do:
1. Add a print-out or tabular view of MAPE/RMSE and the width of the 95% prediction interval, as well as % coverage
1. Add error bars to the prediction bar charts
1. Add error bar information to the printout dataframes (e.g. prediction1.return_prediction_dataframes(train_vs_test))


Next step: figure out how to store and plot the error bars, then compare how a MAPIE model performs relative to the non-MAPIE model. Then figure out how to add the error estimate for every predicted datapoint.

#### Saving a model for future use
The next section allows a user to save a trained model for use on predictions in the future. Users must select the model(s) they would like to save, as well as names for those models. Models will be saved as pickle files which can be loaded back into Python for further use.
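
A minimal round-trip sketch of the save/load pattern, assuming only the standard library. The `timestamped_name` helper and the toy dictionary are hypothetical stand-ins; formatting the timestamp with `strftime` avoids the spaces and colons that `str(datetime.now())` puts in a file name, which some file systems reject.

```python
import os
import pickle
import tempfile
from datetime import datetime


def timestamped_name(prefix):
    """Build a file name such as 'prediction_example_20230101-120000.pkl'."""
    return f"{prefix}_{datetime.now().strftime('%Y%m%d-%H%M%S')}.pkl"


# Stand-in for a trained model or prediction object (any picklable object works).
toy_object = {"model": "Severson variance", "coef": [-0.4]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, timestamped_name("prediction_example"))
    with open(path, "wb") as f:
        pickle.dump(toy_object, f)   # save
    with open(path, "rb") as f:
        restored = pickle.load(f)    # load

print(restored == toy_object)  # True
```

Pickle files can execute arbitrary code when loaded, so only load files that you created or otherwise trust.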

In [None]:
# save the entire prediction object. This includes all models and formatted data
name = "prediction_example" + str(datetime.now())
with open(name,'wb') as files:
    pickle.dump(prediction1, files)

In [None]:
# list the available models (to choose which one to save in the next cell)
for model in prediction1.ml_model:
    print(model)

In [None]:
# choose a model from the list and edit the model_to_save variable
model_to_save = 'Severson variance'
model_name = model_to_save + str(datetime.now())
with open(model_name,'wb') as files:
    pickle.dump(prediction1.trained_models[model_to_save], files)

In [None]:
# just save the featurized data:
X_train, X_test, y_train, y_test = prediction1.get_featurized_data()
data_dict = {'X_train':X_train, 'X_test':X_test, 'y_train':y_train, 'y_test':y_test}
data_name = "prediction_data" + str(datetime.now())
with open(data_name,'wb') as files:
    pickle.dump(data_dict, files)

To load in a prediction object, use the following code block:

In [None]:
load_name = name
with open(load_name, "rb") as f:
    prediction_load = pickle.load(f)

In [None]:
# we can examine the prediction object by creating the parity plots, for example
prediction_load.create_parity_plots()

Similar code can be used for models or data:

In [None]:
load_name = model_name
with open(load_name, "rb") as f:
    model_load = pickle.load(f)

In [None]:
load_name = data_name
with open(load_name, "rb") as f:
    data_load = pickle.load(f)
data_load['X_train'].head()

With the next block of code, a user can set the loaded prediction object to be the one used for analysis moving forward:

In [None]:
prediction1 = prediction_load

#### Exploring data feature distributions

First, identify the most important features for the Severson Discharge model (skip this step if the model was not chosen).

Next, plot the feature distributions of the three most important features in terms of model weighting.

In [None]:
eNet_dchg_coef = pd.DataFrame()
# 'coef' holds the fitted elastic-net coefficients, 'feature' the matching feature names
eNet_dchg_coef['coef'] = prediction1.trained_models['Severson discharge'].pipeline.named_steps['enet'].coef_
eNet_dchg_coef['feature'] = prediction1.trained_models['Severson discharge'].X_train.columns
# rank features by coefficient magnitude
eNet_dchg_coef['abs_coef'] = abs(eNet_dchg_coef['coef'])
eNet_dchg_coef_sorted = eNet_dchg_coef.sort_values('abs_coef',ascending=False)
eNet_dchg_coef_sorted.reset_index(inplace=True,drop=True)
eNet_dchg_coef_sorted.drop(columns=['abs_coef'],inplace=True)

eNet_dchg_coef_sorted

In [None]:
# plot the distributions of the three highest-weighted features
for feature in eNet_dchg_coef_sorted['feature'][0:3]:
    prediction1.grouped_feature_distribution(feature)

#### Pearson correlation coefficient plots and analysis

In [None]:
train_test_variance_grp = pd.concat([prediction1.X_train[['Dataset_group','var_deltaQ']],prediction1.X_test[['Dataset_group','var_deltaQ']]],ignore_index=True)
train_test_log_cyc = pd.concat([prediction1.y_train[['log_cyc_life']],prediction1.y_test[['log_cyc_life']]],ignore_index=True)

In [None]:
unique_grps = pd.unique(train_test_variance_grp.Dataset_group)

for grp in unique_grps:
    train_idx = train_test_variance_grp[train_test_variance_grp.Dataset_group == grp].index
    plt.scatter(x=train_test_variance_grp.var_deltaQ[train_idx],y=train_test_log_cyc.log_cyc_life[train_idx],label=grp,alpha=0.6)
# plt.yscale('log')
plt.legend(loc='best',bbox_to_anchor=(1,1))
# plt.axis('square')
plt.ylabel('Log cycles to 85% capacity retention')
plt.xlabel('Log Variance feature')
plt.show()

In [None]:
pearson_correlation = pd.DataFrame()
names = []
correlation = []
for grp in unique_grps:
    train_idx = train_test_variance_grp[train_test_variance_grp.Dataset_group == grp].index
    var = train_test_variance_grp.var_deltaQ[train_idx]
    lftm = train_test_log_cyc.log_cyc_life[train_idx]
    names.append(grp)
    correlation.append(stats.pearsonr(var, lftm)[0])
pearson_correlation['Dataset']=names
pearson_correlation['Pearson Correlation Coefficient'] = correlation
pearson_correlation

In [None]:
# repeat the color list if there are more dataset groups than colors
colors = colors*math.ceil(len(unique_grps)/len(colors))
# one scatter plot per dataset group
for i, grp in enumerate(unique_grps):
    train_idx = train_test_variance_grp[train_test_variance_grp.Dataset_group == grp].index
    plt.scatter(x=train_test_variance_grp.var_deltaQ[train_idx],y=train_test_log_cyc.log_cyc_life[train_idx],label=grp,alpha=0.6,c=colors[i])
    plt.legend()
    plt.ylabel('Log cycles to 85% capacity retention')
    plt.xlabel('Log Variance feature')
    plt.show()

#### Tabular data
The following code sections allow users to look at tabular data of the Test and Train dataset performance. Users will have to change the model and train_vs_test to update the dataframe that is returned. Note that the error columns provide information for error bars, and thus provide the [lower, upper] error (rather than lower, upper values) for each model type.

In [11]:
train_vs_test = "train"

prediction1.return_prediction_dataframes(train_vs_test)

Unnamed: 0,Name,Dummy Predicted cycle life,Dummy Predicted CL error,Severson variance Predicted cycle life,Severson variance Predicted CL error,Severson discharge Predicted cycle life,Severson discharge Predicted CL error,Severson discharge XGBoost Predicted cycle life,Severson discharge XGBoost Predicted CL error,Actual cycle life
0,2017-05-12_5_4C-50per_3C_CH14_VDF,645.461828,"[370.46182773674116, 1053.7754615620356]",864.250392,"[174.13452666651506, 200.8597507820856]",780.715751,"[196.60511792955447, 184.21763162980346]",765.689636,"[347.32284329722546, 637.627380421251]",766.0
1,2017-05-12_7C-30per_3_6C_CH39_VDF,645.461828,"[370.46182773674116, 1053.7754615620356]",756.722178,"[147.00504202211425, 175.18545482750073]",748.235901,"[166.4644925446877, 212.83315201951177]",711.729858,"[322.3254135724249, 594.4394958967414]",711.0
2,2017-05-12_6C-40per_3_6C_CH33_VDF,645.461828,"[370.46182773674116, 1053.7754615620356]",730.944862,"[140.61282699689684, 169.3717672945237]",755.537399,"[150.71167876016716, 243.6166914254602]",822.727051,"[372.84017726998974, 686.3169647348911]",823.0
3,2017-06-30_5_6C-38per_4_25C_CH38_VDF,645.461828,"[370.46182773674116, 1053.7754615620356]",531.867103,"[100.9778506923227, 124.27047552669183]",539.388272,"[125.71741174261928, 143.9836784788606]",457.822083,"[207.52566342459053, 381.74078445726354]",458.0
4,2017-06-30_4C-13per_5C_CH27_VDF,645.461828,"[370.46182773674116, 1053.7754615620356]",552.246346,"[104.59368907405877, 128.90572609444973]",549.358691,"[134.54217003877795, 133.2937970129301]",456.748383,"[206.81267279699472, 381.60456636429467]",457.0
5,2017-06-30_4_4C-24per_5C_CH17_VDF,645.461828,"[370.46182773674116, 1053.7754615620356]",407.892429,"[78.75793116698452, 98.56568757397935]",416.346574,"[87.13054413270766, 142.89623270982906]",471.973145,"[214.10360192203217, 392.9920566563436]",472.0
6,2017-06-30_5_2C-71per_3C_CH36_VDF,645.461828,"[370.46182773674116, 1053.7754615620356]",539.862249,"[102.39754186577352, 126.08952201126158]",533.546654,"[130.86834660637783, 131.66592633390712]",446.680023,"[179.98927100737723, 440.31465139087277]",446.0
7,2017-06-30_1C-4per_6C_CH9_VDF,645.461828,"[370.46182773674116, 1053.7754615620356]",272.234651,"[53.897255296375846, 75.17247941642938]",282.43557,"[63.440643436865, 79.33753109336726]",275.389191,"[95.84809608557535, 276.18555952656527]",275.0
8,2017-05-12_8C-25per_3_6C_CH45_VDF,645.461828,"[370.46182773674116, 1053.7754615620356]",653.108681,"[122.35832233278381, 151.78278041378599]",661.933794,"[150.51016813869353, 182.92281618201548]",658.146606,"[287.7297455022204, 584.3331816173575]",657.0
9,2017-06-30_4C-40per_6C_CH29_VDF,645.461828,"[370.46182773674116, 1053.7754615620356]",421.697959,"[81.25305434988786, 100.72225729924492]",441.818258,"[96.47171630561672, 124.00880582413339]",451.106415,"[204.55382455430782, 375.89862295772934]",451.0
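
Because the error columns hold [lower, upper] offsets, the absolute interval is the prediction minus the lower offset and plus the upper offset. The sketch below applies that to three `Severson variance` rows copied (rounded) from the table above and reports empirical coverage and mean interval width. The function name and the tuple layout are assumptions for illustration, not part of the `CLPrediction` class.

```python
def interval_stats(rows):
    """rows: list of (predicted, (err_lower, err_upper), actual).

    Returns (coverage fraction, mean interval width in cycles).
    """
    covered = 0
    widths = []
    for predicted, (err_lower, err_upper), actual in rows:
        # Convert offsets into absolute interval bounds.
        lower, upper = predicted - err_lower, predicted + err_upper
        covered += lower <= actual <= upper
        widths.append(upper - lower)
    return covered / len(widths), sum(widths) / len(widths)


# Severson variance rows 0, 3 and 7 from the table above (values rounded).
rows = [
    (864.25, (174.13, 200.86), 766.0),
    (531.87, (100.98, 124.27), 458.0),
    (272.23, (53.90, 75.17), 275.0),
]
coverage, mean_width = interval_stats(rows)
print(f"coverage = {coverage:.0%}, mean width = {mean_width:.1f} cycles")
```

Three rows are far too few to judge a 95% interval; the same calculation over the full Train or Test dataframe would give a more meaningful coverage estimate.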


#### Select Prediction dataset, and predict cycle lives

In [12]:
# next step is to allow users to select data for prediction. That needs to be featurized (but no y-values) and then the CL values can be predicted and shared (how to visualize...?)

predict_model_predict = widgets.Button(description = 'Predict Lifetime', button_style = 'danger', style={"button_color": "#38adad"}, disabled = True)

display(Markdown("#### Search for test records to add to Prediction dataset"))
display(Markdown("Filtering tests by capacity retention is slow; check kernel status for update on completion."))
# search_type = widgets.RadioButtons(options=['Test Name','Min Cycle Number','Both'],
#                                     disabled=False)
# Want to add in some search criteria here: min cycle number (if blank, ignore), min capacity retention

select_predict = interactive(cpw.select_widget, 
                           train_sets = widgets.SelectMultiple(value=[], options=cpw.std_train_datasets, description=f'Prediction Datasets:',style={'description_width': 'initial'}, ensure_option=True),
                          train_or_test=fixed('predict'), pred_obj = fixed(prediction1), trs = fixed(trs), predict_button = fixed(predict_model_predict))

# interactive(cpw.custom_select, filter_by_cap_retention = widgets.Checkbox(value=False,description='Filter tests by capacity retention threshold',
#                                                                                          style={'description_width': 'initial'}),
#                            min_cyc_num = widgets.IntText(description = 'Minimum # of cycles:',style={'description_width': 'initial'}, value = prediction1.get_end_cycle()+1),
#                            other_search_text = widgets.Text(
#                 value = prediction1.get_last_custom_search(),description='Test name search:', 
#                 style={'description_width': 'initial'},continuous_update=False),
#                                      train_or_test=fixed('predict'), prediction1 = fixed(prediction1),
#                                      trs = fixed(trs),predict_button = fixed(predict_model_predict))

output = widgets.Output()

display(select_predict,predict_model_predict, output)
    
        
def pred_model_predict(b):
    ''' function that will predict CL on prediction data. returns a plot of predicted cycle life'''
    with output:
        cpw.populate_test_train_data(prediction1, trs, predict = True)
        print("Starting featurization...")
        prediction1.featurize_predict(trs)
        print("Featurization complete!")
        prediction1.predict()
        prediction1.calc_predicted_cyclelife()
        prediction_df, predictiondf_errors, time_pred_df = prediction1.return_predicted_cyclelife()
        # append zeros for the errors associated with current cycle and cycle to 85% capacity
        predictiondf_errors.append([np.array([0]*len(prediction_df))]*2)
        predictiondf_errors.append([np.array([0]*len(prediction_df))]*2)
        log_scale = max(prediction_df.drop(columns = ['Name','Current cycle']).max()) > 10*(max(prediction_df['Current cycle']))
        prediction_df.set_index('Name').plot.barh(xerr = predictiondf_errors, figsize=(10, len(prediction_df)/1.2), width = .8, logx = log_scale)
        plt.xlabel('Cycles')
        plt.title("Predicted cycle life by ML model for each test")
        plt.show()
        
        # want to only show predicted time to failure for tests that have not already 'failed'
        # so I want to add a filter criteria based on prediction_df
        if len(time_pred_df) >0:
            print("Predicted time remaining (hours) based on each ML model for tests which have not yet reached the capacity retention threshold")
            log_scale_time = max(time_pred_df.drop(columns = ['Name']).max()) > 10*(min(time_pred_df[time_pred_df.drop(columns = ['Name']) > 0].drop(columns = ['Name']).min()))

            time_pred_df.set_index('Name').plot.barh(figsize=(10, len(time_pred_df)/1.2),width = .8, logx = log_scale_time)
            plt.xlabel('Predicted Hours until Failure')
    #         plt.title
            plt.show()
        else:
            print("All tests in the Prediction dataset have already reached the set capacity retention threshold")

predict_model_predict.on_click(pred_model_predict)


#### Search for test records to add to Prediction dataset

Filtering tests by capacity retention is slow; check kernel status for update on completion.

interactive(children=(SelectMultiple(description='Prediction Datasets:', options=('Severson2019 - All (LFP)', …

Button(button_style='danger', description='Predict Lifetime', disabled=True, style=ButtonStyle(button_color='#…

Output()

To show the prediction dataframe including errors, run the following command:

In [13]:
pred_df_with_error = prediction1.get_predicted_cyclelife_with_error()
pred_df_with_error

Name,Current cycle,Cycle to 85.0% capacity retention,Dummy Predicted Cycle Life,Dummy Predicted Cycle Life error,Severson discharge Predicted Cycle Life,Severson discharge Predicted Cycle Life error,Severson discharge XGBoost Predicted Cycle Life,Severson discharge XGBoost Predicted Cycle Life error,Severson variance Predicted Cycle Life,Severson variance Predicted Cycle Life error
2017-05-12_3_6C-80per_3_6C_CH1_VDF,1191,1759,645.461828,"[370.46182773674116, 1053.7754615620356]",1463.223757,"[311.72375613793383, 448.2269856376365]",1214.794189,"[690.0599831967922, 1097.9500647550885]",1596.978946,"[320.65026723461233, 397.8397650422087]"
2017-05-12_3_6C-80per_3_6C_CH3_VDF,1178,2150,645.461828,"[370.46182773674116, 1053.7754615620356]",1738.733672,"[305.73473475030255, 628.5379261241496]",1288.681396,"[768.694837478683, 907.4090480840027]",1843.361907,"[360.80789635406813, 466.4769461039141]"
2017-05-12_4C-80per_4C_CH6_VDF,1228,1612,645.461828,"[370.46182773674116, 1053.7754615620356]",1368.098124,"[294.3910810658049, 421.49082352681944]",1312.785767,"[608.3746312980137, 1403.296307720393]",1448.692077,"[295.8211534968914, 357.0343319470237]"
2017-05-12_4_8C-80per_4_8C_CH10_VDF,637,609,645.461828,"[370.46182773674116, 1053.7754615620356]",745.809254,"[158.2849807320339, 224.76336091375288]",745.499756,"[359.8413248230073, 548.1044403390918]",750.421172,"[145.43833925793717, 173.7648546088609]"
2017-05-12_5_4C-40per_3_6C_CH20_VDF,1055,1006,645.461828,"[370.46182773674116, 1053.7754615620356]",888.360635,"[174.53986836637307, 290.85002813124584]",826.607361,"[380.44002444431607, 494.83747848126814]",885.07775,"[179.47028365490905, 206.27013604115348]"
2017-05-12_5_4C-60per_3C_CH15_VDF,881,839,645.461828,"[370.46182773674116, 1053.7754615620356]",858.316668,"[196.36044640119337, 235.21531019927943]",719.029419,"[306.5482235205232, 595.018992457374]",897.660331,"[182.70581165944725, 209.54533550466556]"
2017-05-12_5_4C-60per_3_6C_CH23_VDF,863,820,645.461828,"[370.46182773674116, 1053.7754615620356]",927.144297,"[201.71858439993753, 271.23739171183206]",833.861511,"[405.53879008498205, 557.2903943220476]",934.104622,"[192.1268253457647, 219.05893468724992]"
2017-05-12_5_4C-70per_3C_CH17_VDF,692,661,645.461828,"[370.46182773674116, 1053.7754615620356]",601.178492,"[162.3950452937146, 127.86443855885614]",607.282837,"[243.06613904598163, 614.3999029694819]",625.258487,"[117.4738267055805, 145.47604342508487]"
2017-05-12_5_4C-80per_5_4C_CH11_VDF,535,502,645.461828,"[370.46182773674116, 1053.7754615620356]",490.134179,"[117.78153250582915, 124.98131173801988]",496.3526,"[233.7289065342395, 384.55932785327036]",491.014571,"[93.6999917202408, 114.96414946506661]"
2017-05-12_6C-30per_3_6C_CH32_VDF,1015,971,645.461828,"[370.46182773674116, 1053.7754615620356]",847.783923,"[187.6563324734916, 242.72721099141756]",868.335022,"[428.1083959282411, 541.7399120533905]",858.569112,"[172.68340789316983, 199.38630105629545]"
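
One way to read this table is in terms of cycles remaining: subtract the current cycle from the predicted cycle life and carry the [lower, upper] offsets along. This is an illustrative calculation rather than something the notebook does (it reports remaining time in hours instead). The function name is hypothetical, and negative values are clamped to zero on the assumption that a test cannot have negative life left.

```python
def remaining_cycles(predicted, error, current_cycle):
    """Return (lower, central, upper) estimates of cycles remaining."""
    err_lower, err_upper = error
    bounds = (predicted - err_lower, predicted, predicted + err_upper)
    # Clamp at zero: a test past its predicted life has no cycles remaining.
    return tuple(max(0.0, b - current_cycle) for b in bounds)


# Severson variance prediction for 2017-05-12_3_6C-80per_3_6C_CH1_VDF (rounded).
low, mid, high = remaining_cycles(1596.98, (320.65, 397.84), current_cycle=1191)
print(f"{low:.0f} to {high:.0f} cycles remaining (central estimate {mid:.0f})")
```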


To show the dataframes and error arrays that were used to generate the above plots for prediction data, run the following command:

In [14]:
prediction_df,predictiondf_errors, time_pred_df = prediction1.return_predicted_cyclelife()
prediction_df

Unnamed: 0,Name,Dummy Predicted Cycle Life,Severson variance Predicted Cycle Life,Severson discharge Predicted Cycle Life,Severson discharge XGBoost Predicted Cycle Life,Current cycle,Cycle to 85.0% capacity retention
0,2017-05-12_3_6C-80per_3_6C_CH1_VDF,645.461828,1596.978946,1463.223757,1214.794189,1191,1759
1,2017-05-12_3_6C-80per_3_6C_CH3_VDF,645.461828,1843.361907,1738.733672,1288.681396,1178,2150
2,2017-05-12_4C-80per_4C_CH6_VDF,645.461828,1448.692077,1368.098124,1312.785767,1228,1612
3,2017-05-12_4_8C-80per_4_8C_CH10_VDF,645.461828,750.421172,745.809254,745.499756,637,609
4,2017-05-12_5_4C-40per_3_6C_CH20_VDF,645.461828,885.07775,888.360635,826.607361,1055,1006
5,2017-05-12_5_4C-60per_3C_CH15_VDF,645.461828,897.660331,858.316668,719.029419,881,839
6,2017-05-12_5_4C-60per_3_6C_CH23_VDF,645.461828,934.104622,927.144297,833.861511,863,820
7,2017-05-12_5_4C-70per_3C_CH17_VDF,645.461828,625.258487,601.178492,607.282837,692,661
8,2017-05-12_5_4C-80per_5_4C_CH11_VDF,645.461828,491.014571,490.134179,496.3526,535,502
9,2017-05-12_6C-30per_3_6C_CH32_VDF,645.461828,858.569112,847.783923,868.335022,1015,971
