
this was easier than expected #40

Conversation

@pat-alt (Member) commented Dec 2, 2022

Just had to define performance measures and then tap into MLJ's evaluate.

@pat-alt linked an issue Dec 2, 2022 that may be closed by this pull request: Evaluation: add support for evaluation of CP and interface to MLJ
@pat-alt (Member, Author) commented Dec 2, 2022

cc @ablaom I've had a go at implementing evaluation metrics for conformal predictions. This was fairly straightforward thanks to MLJ's existing infrastructure: I essentially only had to add custom performance measures, and this seems to be working.

I have two questions though that you might be able to help me with 🙏🏽

Q1: Firstly, should I extend MMI.evaluate to assert that users only use one of the two applicable custom measures? Something like this:

function MMI.evaluate(model, data...; measure, cache=true, kw_options...)
    @assert measure in available_measures "Performance measure not applicable to `ConformalModel`."
    MMI.evaluate(model, data...; cache=cache, measure=measure, kw_options...)
end

Q2: Secondly, while evaluation runs smoothly, the output it prints for my custom measures looks odd. Below is lifted from the example in the README:

> _eval = evaluate!(mach; measure=[emp_coverage, ssc], verbosity=0)
PerformanceEvaluation object with these fields:
  measure, operation, measurement, per_fold,
  per_observation, fitted_params_per_fold,
  report_per_fold, train_test_rows
Extract:
┌────────────────────────────────────────────────────────────────────────────────────
│ measure
├────────────────────────────────────────────────────────────────────────────────────
│    \e[38;2;155;179;224m╭──── \e[38;2;227;172;141mFunction: \e[1m\e[38;5;12memp_coverage\e[22m\e[39m … (long run of raw ANSI escape codes) …
│    Computes the empirical coverage for conformal predictions … \e[38;2;227;172;141mFunction: \e[1m\e[38;5;12msize_stratified_coverage\e[22m\e[39m … (more raw escape codes) …
└────────────────────────────────────────────────────────────────────────────────────

When I access the fields of _eval, the produced measurements all check out, but the report looks strange. Any idea what's happening here?

@codecov-commenter

Codecov Report

Merging #40 (11e5c2e) into main (1f101ec) will increase coverage by 0.27%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main      #40      +/-   ##
==========================================
+ Coverage   97.59%   97.86%   +0.27%     
==========================================
  Files           8        9       +1     
  Lines         374      422      +48     
==========================================
+ Hits          365      413      +48     
  Misses          9        9              
Impacted Files                                          Coverage Δ
src/conformal_models/conformal_models.jl                92.30% <ø> (ø)
src/conformal_models/inductive_regression.jl            100.00% <ø> (ø)
src/conformal_models/model_traits.jl                    100.00% <ø> (ø)
src/conformal_models/plotting.jl                        88.52% <ø> (ø)
src/conformal_models/inductive_classification.jl        98.41% <100.00%> (ø)
...rc/conformal_models/transductive_classification.jl   100.00% <100.00%> (ø)
src/conformal_models/transductive_regression.jl         100.00% <100.00%> (ø)
src/conformal_models/utils.jl                           100.00% <100.00%> (ø)
src/evaluation/evaluation.jl                            100.00% <100.00%> (ø)


@pat-alt merged commit 2b2433e into main Dec 2, 2022
@pat-alt (Member, Author) commented Dec 2, 2022

Keeping the branch open until the questions above are sorted out.


@ablaom commented Dec 4, 2022

@pat-alt Great to hear about your progress!

Q1: Firstly, should I extend MMI.evaluate to assert that users only use one of the two applicable custom measures?

Generally, the kind of target proxy the measure is used for is articulated with the prediction_type trait. (Measures have traits, just like models. The manual mentions this, but you'll also want to look here if you're contributing new measures.) So you would do something like:

StatisticalTraits.prediction_type(::Type{<:YourMeasureType}) = :probabilistic_set
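
For concreteness, a rough sketch of how a custom set-valued measure might declare this (the EmpiricalCoverage type, its coverage computation, the assumption that each ŷ[i] is a collection of labels, and the use of StatisticalTraits' orientation trait are all illustrative assumptions, not the package's actual implementation):

import StatisticalTraits

struct EmpiricalCoverage end  # hypothetical callable measure type

# fraction of observations whose true label falls inside the prediction set;
# assumes each ŷ[i] is a collection of labels (a set-valued prediction)
(::EmpiricalCoverage)(ŷ, y) = sum(y[i] in ŷ[i] for i in eachindex(y)) / length(y)

# declare the target proxy the measure expects and whether it is a score or a loss
StatisticalTraits.prediction_type(::Type{<:EmpiricalCoverage}) = :probabilistic_set
StatisticalTraits.orientation(::Type{<:EmpiricalCoverage}) = :score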

edited: The model version of this trait is already suitably overloaded here:

https://github.com/JuliaAI/MLJModelInterface.jl/blob/d9e9703947fc04b0a5e63680289e41d0ba0d65bd/src/model_traits.jl#L27

The evaluate apparatus in MLJBase should check that the model matches the measure and throw an error if it doesn't. Possibly, as this is a new target proxy type, the behaviour in MLJBase may need to be adjusted. The relevant logic lives approximately here:

https://github.com/JuliaAI/MLJBase.jl/blob/d79f29b78c5068377e25363884e2ea1c4b4a149a/src/resampling.jl#L600
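
Roughly, the kind of check I mean would look something like this sketch (not MLJBase's actual code; _check_measure and its error message are just placeholders):

import MLJModelInterface
import StatisticalTraits

# compare the measure's declared target proxy with the model's prediction type
# and throw an informative error on mismatch
function _check_measure(model, measure)
    expected = StatisticalTraits.prediction_type(typeof(measure))
    observed = MLJModelInterface.prediction_type(typeof(model))
    observed === expected || error(
        "Measure `$measure` expects `$expected` predictions, " *
        "but `$(typeof(model))` makes `$observed` predictions.")
    return true
end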

Q2:

Do you always see this rubbish, or just for your custom measures? Where are you viewing this: an ordinary terminal, VSCode, a notebook, or something else? Could you please try MLJ.color_off() and see if that helps?

@pat-alt (Member, Author) commented Dec 6, 2022

Thanks! I'll implement the trait with the goal of contributing it upstream once this is sorted.

As for how this is displayed: I'm working in the VSCode REPL (with Term.jl) and only get this issue for my custom measures. MLJ.color_off() hasn't helped, I'm afraid. Perhaps it has to do with the fact that I haven't yet properly implemented the measures as outlined in the manual you linked. I'll have a go at that in #44.

@pat-alt deleted the 30-evaluation-add-support-for-evaluation-of-cp-and-interface-to-mlj branch December 6, 2022 08:34
@ablaom commented Dec 6, 2022

Mmm. Not sure about the display issue. I doubt it's anything you are doing wrong. I don't have the problem in an emacs term REPL:

julia> evaluate!(mach; measure=[emp_coverage, ssc], verbosity=0)
PerformanceEvaluation object with these fields:
  measure, operation, measurement, per_fold,
  per_observation, fitted_params_per_fold,
  report_per_fold, train_test_rows
Extract:
┌───────────────────────────────────────────────────────────┬───────────┬───────
│ measure                                                   │ operation │ meas 
├───────────────────────────────────────────────────────────┼───────────┼───────
│ emp_coverage (generic function with 1 method)             │ predict   │ 0.95 
│ size_stratified_coverage (generic function with 1 method) │ predict   │ 0.75 
└───────────────────────────────────────────────────────────┴───────────┴───────
