ENH: Performance improvements for generating candidate models #254

Conversation

bocklund (Member) commented:

This set of changes improves the performance of parameter selection in two primary ways:

  1. When we build candidate models (`build_feature_sets` has been renamed to `build_candidate_models`), we take all combinations of the product of composition-independent features with interaction features. The implication is that models with many features, for example heat capacity temperature features combined with four binary interaction features, become very expensive to generate candidates for, because the current implementation has geometric complexity in the number of temperature and interaction features (as documented). Here we add an optimization for cases where the general implementation would generate more than `complex_algorithm_candidate_limit` (default: 1000) candidate models: the simplified version uses the same number of composition-independent features for every interaction feature. Instead of the geometric complexity $N(1-N^M)/(1-N)$, the simplified version has complexity $NM$, where $N$ and $M$ are the number of composition-independent features and interaction features, respectively. A sketch contrasting the two schemes follows this list.

  2. A profiling-guided optimization in `espei.paramselect._build_feature_matrix`. The feature matrix is a concrete matrix of reals (rows: observations, columns: feature coefficients). We use a symengine vector (`ImmutableDenseMatrix`) to fill the feature matrix row-wise, moving an inner loop from slow Python into fast SymEngine. This change gives roughly a 3x speedup of this function. A sketch of this row-wise fill also follows the list.
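To make the complexity difference in (1) concrete: with $N = 4$ composition-independent features and $M = 4$ interaction features, the general scheme generates $4 + 4^2 + 4^3 + 4^4 = 340$ candidate models, while the simplified scheme generates $4 \times 4 = 16$. The sketch below contrasts the two schemes; the feature strings, the `make_successive` stand-in, and the function names are illustrative only and are not copied from ESPEI.

```python
from itertools import product

# Illustrative stand-ins for feature lists (not ESPEI's real feature objects):
temperature_features = ["1", "T", "T*ln(T)", "T**2"]        # composition-independent, N = 4
interaction_features = ["YS", "YS*Z", "YS*Z**2", "YS*Z**3"]  # interaction features, M = 4

def make_successive(xs):
    """Successive prefixes of a list: [x0], [x0, x1], [x0, x1, x2], ..."""
    return [xs[:i] for i in range(1, len(xs) + 1)]

def candidates_general(temp_feats, interaction_feats):
    """Every interaction feature independently picks a temperature-feature prefix.

    Candidate count grows like N + N**2 + ... + N**M = N*(1 - N**M)/(1 - N).
    """
    temp_prefixes = make_successive(temp_feats)
    models = []
    for order in range(1, len(interaction_feats) + 1):
        for combo in product(temp_prefixes, repeat=order):
            models.append([f"({t})*({ix})"
                           for ix, temps in zip(interaction_feats, combo)
                           for t in temps])
    return models

def candidates_simplified(temp_feats, interaction_feats):
    """Every interaction feature shares the same temperature-feature prefix.

    Candidate count grows like N*M.
    """
    models = []
    for order in range(1, len(interaction_feats) + 1):
        for temps in make_successive(temp_feats):
            models.append([f"({t})*({ix})"
                           for ix in interaction_feats[:order]
                           for t in temps])
    return models

print(len(candidates_general(temperature_features, interaction_features)))     # 340
print(len(candidates_simplified(temperature_features, interaction_features)))  # 16
```

For change (2), the sketch below fills a feature matrix row-wise through a symengine `ImmutableDenseMatrix`. It is not ESPEI's implementation; in particular, it assumes that symengine matrices support `xreplace` with a `{Symbol: value}` dict and element access by `(row, column)` index.

```python
import numpy as np
import symengine
from symengine import Symbol

T, YS = Symbol("T"), Symbol("YS")

# Illustrative feature columns and sampled conditions (rows):
features = [YS, YS * T, YS * T * symengine.log(T)]
condition_dicts = [{T: 300.0, YS: 0.25}, {T: 600.0, YS: 0.25}, {T: 900.0, YS: 0.10}]

def build_feature_matrix(sample_condition_dicts, feature_exprs):
    """Fill an (observations x features) matrix of reals.

    The symbolic substitution for an entire row happens in a single SymEngine
    call on the row vector, instead of one Python-level call per feature.
    """
    row_vector = symengine.ImmutableDenseMatrix([feature_exprs])  # 1 x N
    A = np.empty((len(sample_condition_dicts), len(feature_exprs)))
    for i, conds in enumerate(sample_condition_dicts):
        # Assumption: xreplace takes a {Symbol: value} dict and returns a
        # matrix of numerically evaluated entries.
        substituted = row_vector.xreplace(conds)
        A[i, :] = [float(substituted[0, j]) for j in range(len(feature_exprs))]
    return A

print(build_feature_matrix(condition_dicts, features))
```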

Commits

…s successive

The goal here is to allow for models that don't want to use
`make_successive` for their feature sets.
Modifies ESPEI code and definitely breaks Gibbs energies; using this commit as a backup.
Factor out TDB analysis from notebook 2
Start moving get_data_quantities to a FittingStep staticmethod
Binary VM (VA) fitting looks like it works now!
The changes are mostly in adding parameters and not really the fitting itself - that was working.

This may be throwaway code (see the comment added), so it's not too complex.
The idea of the modified version is that we also compute the actual site
fractions, because individual site fractions are not currently handled by
ESPEI but can slip in from existing models when not using a reference
state where those contributions cancel (e.g. reference states other than
_MIX or _FORM keep the unary extrapolation). To do this, we'll use the
config tuple and create site fractions from the points dict (see the
hypothetical sketch below).

tests currently pass locally
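
As a purely hypothetical illustration of the "config tuple plus points dict" construction described in the commit above (the helper name, key format, and data layout are invented for this sketch and are not ESPEI's API):

```python
def site_fractions_from_config(phase_name, config, occupancies):
    """Turn a sublattice configuration and its occupancies into a flat
    site-fraction mapping, e.g. (("AL", "CR"), "VA") with ((0.5, 0.5), 1.0).

    Hypothetical helper: the key format and shorthand handling are illustrative only.
    """
    site_fracs = {}
    for subl_idx, (constituents, fracs) in enumerate(zip(config, occupancies)):
        # Datasets often use a bare string / scalar as shorthand for a pure sublattice.
        if isinstance(constituents, str):
            constituents = (constituents,)
            fracs = (fracs if isinstance(fracs, (int, float)) else 1.0,)
        for species, frac in zip(constituents, fracs):
            site_fracs[(phase_name, subl_idx, species)] = frac
    return site_fracs

print(site_fractions_from_config("BCC_A2", (("AL", "CR"), "VA"), ((0.5, 0.5), 1.0)))
# {('BCC_A2', 0, 'AL'): 0.5, ('BCC_A2', 0, 'CR'): 0.5, ('BCC_A2', 1, 'VA'): 1.0}
```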
fit_formation_energy -> fit_parameters
Pass through all the function indirection.
Add it to AbstractRKMPropertyStep
…tion.utils

from espei.parameter_selection.utils import _get_sample_condition_dicts

to

from espei.error_functions.non_equilibrium_thermochemical_error import get_sample_condition_dicts
VA is normalized per atom
This is useful for organizing datasets for different runs while having one single source of truth for the data
This algorithm has N*M complexity, which is an enormous simplification of
the more complex algorithm that converges to N^M complexity as N -> inf.
Before this change, even moderate N would cause _build_feature_matrix to
become the dominant time-limiting function in profiling.
@bocklund bocklund changed the title ENH: Performance improvements for generating candidates ENH: Performance improvements for generating candidate models Jan 17, 2024
@bocklund (Member Author) commented:

Note that the performance issues resolved here are mostly due to a combination of the number of candidate models and the amount of data.

Generating the candidate models has an up-front cost, but the result is cached, so it's not overly expensive. The main contributor is that, with many candidate models and a lot of data, most of the time ends up being spent in _build_feature_matrix. It's also not entirely clear that the existing, more general approach actually generates better models, i.e. ones that are actually selected by the mAICc: in my experience, I haven't often seen selected models with different features chosen across different interaction parameter orders for the same interaction.
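
For reference, the corrected AIC that the mAICc mentioned above is based on, for a model with $k$ parameters, $n$ data points, and maximized likelihood $\hat{L}$ (ESPEI's modification of the penalty is documented in ESPEI itself and not reproduced here):

$$\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad \mathrm{AICc} = \mathrm{AIC} + \frac{2k(k+1)}{n - k - 1}$$

The penalty grows with the number of features, which is consistent with the observation above that the larger, more general candidate pool does not obviously lead to different selected models.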

@bocklund bocklund merged commit 5f6ff36 into PhasesResearchLab:master Jan 17, 2024
7 of 11 checks passed
@bocklund bocklund deleted the performance-improvements-generating-candidates branch January 17, 2024 22:00