
this was easier than expected #40

Conversation

@pat-alt (Member) commented Dec 2, 2022

Just had to define performance measures and then tap into MLJ's evaluate.

@pat-alt linked an issue Dec 2, 2022 that may be closed by this pull request: Evaluation: add support for evaluation of CP and interface to MLJ
@pat-alt (Member, Author) commented Dec 2, 2022

cc @ablaom I've had a go at implementing evaluation metrics for conformal predictions. This was fairly straightforward thanks to MLJ's existing infrastructure: I essentially only had to add custom performance measures, and this seems to be working.

I have two questions though that you might be able to help me with 🙏🏽

Q1: Firstly, should I extend MMI.evaluate to assert that users only use one of the two applicable custom measures? Something like this:

function MMI.evaluate(model, data...; measure, cache=true, kw_options...)
    @assert measure in available_measures "Performance measure not applicable to `ConformalModel`."
    MMI.evaluate(model, data...; cache=cache, measure=measure, kw_options...)
end

Q2: Secondly, while evaluation runs smoothly, the output it prints for my custom measures looks odd. Below is lifted from the example in the README:

> _eval = evaluate!(mach; measure=[emp_coverage, ssc], verbosity=0)
PerformanceEvaluation object with these fields:
  measure, operation, measurement, per_fold,
  per_observation, fitted_params_per_fold,
  report_per_fold, train_test_rows
Extract:
┌────────────────────────────────────────────────────────────────────────────────────
│ measure
├────────────────────────────────────────────────────────────────────────────────────
│    \e[38;2;155;179;224m╭──── \e[38;2;227;172;141mFunction: \e[1m\e[38;5;12memp_coverage\e[22m\e[39m … (long run of raw ANSI escape codes) …
│    Computes the empirical coverage for conformal predictions … \e[38;2;227;172;141mFunction: \e[1m\e[38;5;12msize_stratified_coverage\e[22m\e[39m … (more raw escape codes) …
└────────────────────────────────────────────────────────────────────────────────────

When I access the fields of _eval, the produced measurements all check out, but the report looks strange. Any idea what's happening here?

@codecov-commenter

Codecov Report

Merging #40 (11e5c2e) into main (1f101ec) will increase coverage by 0.27%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main      #40      +/-   ##
==========================================
+ Coverage   97.59%   97.86%   +0.27%     
==========================================
  Files           8        9       +1     
  Lines         374      422      +48     
==========================================
+ Hits          365      413      +48     
  Misses          9        9              
Impacted Files                                          Coverage Δ
src/conformal_models/conformal_models.jl                92.30% <ø> (ø)
src/conformal_models/inductive_regression.jl            100.00% <ø> (ø)
src/conformal_models/model_traits.jl                    100.00% <ø> (ø)
src/conformal_models/plotting.jl                        88.52% <ø> (ø)
src/conformal_models/inductive_classification.jl        98.41% <100.00%> (ø)
...rc/conformal_models/transductive_classification.jl   100.00% <100.00%> (ø)
src/conformal_models/transductive_regression.jl         100.00% <100.00%> (ø)
src/conformal_models/utils.jl                           100.00% <100.00%> (ø)
src/evaluation/evaluation.jl                            100.00% <100.00%> (ø)


@pat-alt merged commit 2b2433e into main Dec 2, 2022
@pat-alt (Member, Author) commented Dec 2, 2022

Keeping the branch open until the questions above are sorted out.


@ablaom commented Dec 4, 2022

@pat-alt Great to hear about your progress!

Q1: Firstly, should I extend MMI.evaluate to assert that users only use one of the two applicable custom measures?

Generally, the kind of target proxy the measure is used for is articulated with the prediction_type trait. (Measures have traits, just like models. The manual mentions this, but you'll also want to look here if you're contributing new measures.) So you would do something like:

StatisticalTraits.prediction_type(::Type{<:YourMeasureType}) = :probabilistic_set
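
For concreteness, a rough sketch of how a custom set-valued measure might declare this (the EmpiricalCoverage type, its coverage computation, the assumption that each ŷ[i] is a collection of labels, and the use of StatisticalTraits' orientation trait are all illustrative assumptions, not the package's actual implementation):

import StatisticalTraits

struct EmpiricalCoverage end  # hypothetical callable measure type

# fraction of observations whose true label falls inside the prediction set;
# assumes each ŷ[i] is a collection of labels (a set-valued prediction)
(::EmpiricalCoverage)(ŷ, y) = sum(y[i] in ŷ[i] for i in eachindex(y)) / length(y)

# declare the target proxy the measure expects and whether it is a score or a loss
StatisticalTraits.prediction_type(::Type{<:EmpiricalCoverage}) = :probabilistic_set
StatisticalTraits.orientation(::Type{<:EmpiricalCoverage}) = :score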

edited: The model version of this trait is already suitably overloaded here:

https://github.com/JuliaAI/MLJModelInterface.jl/blob/d9e9703947fc04b0a5e63680289e41d0ba0d65bd/src/model_traits.jl#L27

The evaluate apparatus in MLJBase should check that the model matches the measure and throw an error if it doesn't. Possibly, as this is a new target proxy type, the behaviour in MLJBase may need to be adjusted. The relevant logic lives approximately here:

https://github.com/JuliaAI/MLJBase.jl/blob/d79f29b78c5068377e25363884e2ea1c4b4a149a/src/resampling.jl#L600
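
Roughly, the kind of check I mean would look something like this sketch (not MLJBase's actual code; _check_measure and its error message are just placeholders):

import MLJModelInterface
import StatisticalTraits

# compare the measure's declared target proxy with the model's prediction type
# and throw an informative error on mismatch
function _check_measure(model, measure)
    expected = StatisticalTraits.prediction_type(typeof(measure))
    observed = MLJModelInterface.prediction_type(typeof(model))
    observed === expected || error(
        "Measure `$measure` expects `$expected` predictions, " *
        "but `$(typeof(model))` makes `$observed` predictions.")
    return true
end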

Q2:

Do you always see this rubbish, or just for your custom measures? Where are you viewing this: an ordinary terminal, VSCode, a notebook, or something else? Could you please try MLJ.color_off() and see if that helps?

@pat-alt (Member, Author) commented Dec 6, 2022

Thanks! I'll implement the trait with the goal of contributing it upstream once this is sorted.

As for how this is displayed: I'm working in the VSCode REPL (with Term.jl) and only get this issue for my custom measures. MLJ.color_off() hasn't helped, I'm afraid. Perhaps it has to do with the fact that I haven't yet properly implemented the measures as outlined in the manual you linked. I'll have a go at that in #44.

@pat-alt deleted the 30-evaluation-add-support-for-evaluation-of-cp-and-interface-to-mlj branch December 6, 2022 08:34
@ablaom commented Dec 6, 2022

Mmm. Not sure about the display issue. I doubt it's anything you are doing wrong. I don't have the problem in an emacs term REPL:

julia> evaluate!(mach; measure=[emp_coverage, ssc], verbosity=0)
PerformanceEvaluation object with these fields:
  measure, operation, measurement, per_fold,
  per_observation, fitted_params_per_fold,
  report_per_fold, train_test_rows
Extract:
┌───────────────────────────────────────────────────────────┬───────────┬───────
│ measure                                                   │ operation │ meas 
├───────────────────────────────────────────────────────────┼───────────┼───────
│ emp_coverage (generic function with 1 method)             │ predict   │ 0.95 
│ size_stratified_coverage (generic function with 1 method) │ predict   │ 0.75 
└───────────────────────────────────────────────────────────┴───────────┴───────
