
Current performance evaluation objects, recently added to TunedModel histories, are too big #1105

Closed
8 of 11 tasks
ablaom opened this issue Apr 17, 2024 · 2 comments · Fixed by JuliaAI/MLJTuning.jl#215

ablaom commented Apr 17, 2024

There's evidence that the recent addition of full PerformanceEvaluation objects to TunedModel histories is blowing up memory requirements in real use cases.

I propose that we create two PerformanceEvaluation objects: a detailed one (as we have now) and a new CompactPerformanceEvaluation object. The evaluate method gets a new keyword argument compact=false, and TunedModel gets a new hyperparameter compact_history=true. (This default would technically break MLJTuning, but I doubt it would affect more than one or two users, and the recent change is not actually documented anywhere yet.)
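For concreteness, here is a minimal sketch of how the proposed interface might look from the user's side. Note that compact, compact_history and CompactPerformanceEvaluation do not exist yet - they are only the proposal above - and the model/data below are arbitrary placeholders:

```julia
using MLJ

# placeholder model and data, for illustration only
Tree = @load DecisionTreeClassifier pkg=DecisionTree
X, y = @load_iris

# Proposed: `compact=true` would return a CompactPerformanceEvaluation,
# dropping heavyweight fields such as fitted_params_per_fold and report_per_fold.
e = evaluate(Tree(), X, y;
             resampling=CV(nfolds=6),
             measure=accuracy,
             compact=true)          # proposed keyword; default would be `false`

# Proposed: `compact_history=true` (the suggested default) would store only
# compact evaluation objects in the TunedModel history.
r = range(Tree(), :max_depth, lower=1, upper=5)
tuned_tree = TunedModel(
    model=Tree(),
    tuning=Grid(),
    range=r,
    measure=accuracy,
    compact_history=true,           # proposed hyperparameter
)
```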

This would also allow us to ultimately address #575, which was shelved for fear of making evaluation objects too big.

Further thoughts anyone?

cc @CameronBieganek, @OkonSamuel

Below are the fields of the current struct. I've ticked off suggested fields for the compact case. I suppose the only one that might be controversial is observations_per_fold. This was always included in TunedModel histories previously, so it seems less disruptive to include it.

Fields

These fields are part of the public API of the PerformanceEvaluation struct.

  • model: model used to create the performance evaluation. In the case of a
    tuning model, this is the best model found.

  • measure: vector of measures (metrics) used to evaluate performance.

  • measurement: vector of measurements - one for each element of measure - aggregating
    the performance measurements over all train/test pairs (folds). The aggregation method
    applied for a given measure m is
    StatisticalMeasuresBase.external_aggregation_mode(m) (commonly Mean() or Sum())

  • operation (e.g., predict_mode): the operations applied for each measure to generate
    predictions to be evaluated. Possibilities are: $PREDICT_OPERATIONS_STRING.

  • per_fold: a vector of vectors of individual test fold evaluations (one vector per
    measure). Useful for obtaining a rough estimate of the variance of the performance
    estimate.

  • per_observation: a vector of vectors of vectors containing individual per-observation
    measurements: for an evaluation e, e.per_observation[m][f][i] is the measurement for
    the ith observation in the fth test fold, evaluated using the mth measure (see the
    sketch after this list). Useful for some forms of hyper-parameter optimization. Note
    that an aggregated measurement for a given measure is repeated across all observations
    in a fold if StatisticalMeasures.can_report_unaggregated(measure) == false. If e has
    been computed with the per_observation=false option, then e.per_observation is a
    vector of missings.

  • fitted_params_per_fold: a vector containing fitted_params(mach) for each machine
    mach trained during resampling - one machine per train/test pair. Use this to extract
    the learned parameters for each individual training event.

  • report_per_fold: a vector containing report(mach) for each machine mach trained
    during resampling - one machine per train/test pair.

  • train_test_rows: a vector of tuples, each of the form (train, test), where train
    and test are vectors of row (observation) indices for training and evaluation
    respectively.

  • resampling: the resampling strategy used to generate the train/test pairs.

  • repeats: the number of times the resampling strategy was repeated.
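
For readers less familiar with these objects, here is a minimal sketch of the indexing conventions described above, using the existing (non-compact) evaluation object; the model and data are placeholders, as in the earlier sketch:

```julia
using MLJ

Tree = @load DecisionTreeClassifier pkg=DecisionTree
X, y = @load_iris

e = evaluate(Tree(), X, y;
             resampling=CV(nfolds=3),
             measures=[log_loss, accuracy])

e.measure                   # the two measures, in order
e.measurement               # one aggregated value per measure
e.per_fold[2]               # accuracy on each of the 3 test folds
e.per_observation[1][2][5]  # log_loss for the 5th observation in the 2nd fold
e.fitted_params_per_fold    # one fitted_params(mach) per train/test pair
e.train_test_rows[1]        # (train, test) row indices for the first fold
```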


ablaom commented Apr 17, 2024

Also relevant: #1025
