Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP - Cyclefeatures #568

Draft
wants to merge 16 commits into
base: master
Choose a base branch
from
Draft

WIP - Cyclefeatures #568

wants to merge 16 commits into from

Conversation

pasinger
Copy link
Collaborator

Merging code I have written offline to the main branch. These updates are to enable creation of beep feature matrix with diagnostic features calculated across cycles. Before this update, features are only calculated at the first two diagnostics and a difference of the two is reported.

Updates to intracell_analysis work towards using multiple rate data in the electrodes. Capability for fitting on discharge data has been added and is now the default mode.

Tests need to be written still. I am seeking some guidance on how to do this.

Some class names have changed, so I imagine there may be some changes to files that I did not capture here, such as files that are used to generate the pipeline data and call on the effected classes. I updated dependent files for using the build_feature_matrix notebook.

Issues fixed:

Addresses issue #567

add class BEEPCycleFeatureMatrix to combine features which were generated at every diagnostic cycle
Add CycleFeatures classes and change existing class names to EarlyFeatures where appropriate.

HPPCResistanceVoltageCycleFeatures returns hppc resistance features at every diagnostic.

DiagnosticCycleFeatures returns some standard quantities, like discharge capacity of various RPTs across all diagnostic cycles.

CyclingProtocol returns cycling protocol information for the cell. This was integrated from the ChargingProtocol class written by Amalie Trewartha.
Update DEFAULT_HYPERPARAMETERS and dependent calls to match updates to intracell_analysis.py.

Modify some variable names for easier understanding and consistency across files (e.g. using "diag_pos" in for loop instead of "i").

Update dataframe construction to match updates in intracell_analysis.py.
Modify reference electrode data to be a dictionary of rates and rate file names, instead of file names for one rate; these changes are necessary for other updates which seek to correct for effective rate in the electrodes and utilize multiple rate data of the electrodes.

Incorporate changes for using alternative error metrics, such as dVdQ instead of the V-Q curve.
@ardunn
Copy link
Collaborator

ardunn commented Mar 17, 2022

Quoting @pasinger from #567 :

Currently, diagnostic features are calculated at the first and second diagnostic and the difference between the two is reported. This format is amenable to early prediction, but analysis of the battery health across lifetime would be better served by having the features reported across every cycle.

Some diagnostic information is available in the structured data in the diagnostic_summary field, but what's available there is not comprehensive.

I propose making a distinction between 1) "EarlyFeatures", which are a difference on the first two diagnostics (or between the first and some other diagnostic number specified in the featurizer hyperparameters) and 2) "CycleFeatures", which are the diagnostic features reported across all diagnostic cycles.

@ardunn ardunn marked this pull request as draft March 17, 2022 22:58
@ardunn
Copy link
Collaborator

ardunn commented Mar 17, 2022

@pasinger

I think this is a great idea, thanks for the contribution!

The specific implementation can be improved upon. In fact, it might be worth replacing or overhauling the BEEPFeatureMatrix class to accommodate this kind of feature matrix (e.g., n_total_cycles_across_all_files x m_features rather than n_files x m_features). This is something I can do using your fork; this will be faster than going back/forth on changes in a PR. I'll include unit tests for this.

To do this though, I need two things from @pasinger

  1. At least some expected outputs/features for some of the featurizers on known files. So for DiagnosticCycleFeatures, CyclingProtocol, and the intracell changes, I'll need the input file and the expected features (as matrix or vector)
  2. Commit access to this PR branch. See how to do so here: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/allowing-changes-to-a-pull-request-branch-created-from-a-fork

@ardunn ardunn changed the title Cyclefeatures WIP - Cyclefeatures Mar 17, 2022
@pasinger
Copy link
Collaborator Author

Hey @ardunn, sounds good. I should have updated the documentation - the BEEPCycleFeatureMatrix is already outputting a feature matrix of dimensions n_total_cycles_across_all_files x m_features. I am certainly open to reworking the class in other ways if there are benefits to it, though! I'll work on getting you permission now, then I'll send you a csv on slack of expected values for a particular cell. Does that work?

@ardunn
Copy link
Collaborator

ardunn commented Mar 24, 2022

Hey @pasinger, so after looking at this in depth, these are the things we should do to bring this PR to completion.

1. Merging BEEPCycleFeatureMatrix into BEEPFeatureMatrix (@ardunn )

Seems like 99% of what BEEPFeatureMatrix does is in BEEPCycleFeatureMatrix. I think we can actually merge that functionality into the BEEPFeatureMatrix class such that it works with both paradigms of feature generation: (a) a feature vector for all cycles and (b) what you have implemented here, a feature vector per cycle.

I will do this and push changes to your branch in the coming days.

2. Intracell analysis needs much more documentation (@pasinger )

So I actually have no idea what is going on in either of the intracell analysis files. If possible, we should add more documentation to both the modules and the classes to explain (scientifically) what the generated features are. The function level documentation is good, but it is really difficult to determine from a birds eye view what is going on in those modules.

I will leave this one up to @pasinger

3. Clarify why we need V1 and V2 of intracell analyses/losses (@pasinger )

I'm not against keeping them both in, but what are the key differences between them? Something to maybe explain in documentation, or remove the v1 if v2 supercedes it.

I will also leave this up to @pasinger

4. Refactoring existing uses of featurizers (@ardunn )

There are quite a few mentions of the existing feature classes which were not renamed (e.g., in the cmd line interface). I can correct these

5. Write tests for new feature classes and improve documentation for existing feature classes (@ardunn )

Based on the files @pasinger sent and completing (1.), I'll be able to do this for the new feature classes

@ardunn
Copy link
Collaborator

ardunn commented Mar 30, 2022

Done

  • Split BEEPFeaturizer into two child ABCs: BEEPPerCycleFeaturizer, which returns a feature vector for every cycle, and BEEPAllCyclesFeaturizer, which returns a feature vector for every cycler file
  • Refactor and reorganize all current featurizers into new featurizer submodules all_cycles and per_cycle, making it clear what classes can be used for what context
  • BEEPFeatureMatrix now accepts both sets of all-cycle and sets of per-cycle featurizers and can successfully collate them into a single dataframes/matrices

Still to do (@ardunn)

  • Refactor other modules (e.g., cmd) which use featurizers so that tests will pass
  • Update cmd to allow for per-cycler featurizers to be run (or add this as github issue)
  • Add tests for new per-cycle featurizers
  • Add documentation for new and existing feature classes (or push this to new issue)
  • Clean and refactor intracell analysis boilerplate based on @pasinger opinions
  • Clean and refactor featurizer_helpers into their respective correct modules

Still to do (@pasinger )

  • Confirm where intracell feature classes should go (in per cycle or all cycles featurizers)? it seems like various classes in these modules fit into different paradigms. For example, IntracellCyclesv2 are per-cycle featurizers, and IntracellFeaturesv2 are all-cycle (one-feature-vector-per-cycler-file) files based on the difference on the 1st vs. 2nd diag cycles
  • Confirm that CyclingProtocol class should indeed be a per-cycle featurizer considering it only utilizes a single cycle? Should this class be allowed for both per-cycle and all-cycles uses?
  • Clarify this where this get_hppc_resistance_cycle_features comes from?
# In HPPCResistanceVoltagePerCycle
       hppc_resistance_features = featurizer_helpers.get_hppc_resistance_cycle_features(
            self.datapath,
        )
  • Should we keep both Intracellv1 and v2?
  • Help write docstrings for the feature classes (i.e., what is the scientific relevance of each of the feature sets, and what kinds of features can users expect?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants