
Feature validation profiles (closes #94) #658

Merged
merged 9 commits into develop on May 12, 2021

Conversation

jvanhoefer (Member)

Provides basic functionality to compute validation significances, as in Kreutz et al. 2012.

Basically, one has to provide a pypesto.Problem encoding the fitting of the full data set, as well as pypesto.Result objects for the training and the full data set. The functionality then computes the significance of the validation data set.

For a whole "validation profile" (i.e. a profile of a new data point in the "data space"...), one would need to specify the concept of "data" in more detail. It is quite possible to also do this via one more layer around the functionality provided in this PR (i.e. write a loop that gradually increases/decreases the new measurement value and computes the validation_profile_significance...).
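
For illustration, a minimal usage sketch (the toy objectives and bounds are made up; only the validation_profile_significance call mirrors the test added in this PR):

```python
import pypesto
import pypesto.optimize as optimize
import pypesto.profile as profile

# Toy 1-parameter example: quadratic negative log-likelihoods standing in
# for the training data and for the full (training + validation) data.
objective_training = pypesto.Objective(
    fun=lambda x: (x[0] - 1.0) ** 2, grad=lambda x: 2 * (x - 1.0))
objective_full = pypesto.Objective(
    fun=lambda x: (x[0] - 1.0) ** 2 + (x[0] - 1.5) ** 2,
    grad=lambda x: 2 * (x - 1.0) + 2 * (x - 1.5))

problem_training = pypesto.Problem(
    objective=objective_training, lb=[-10], ub=[10])
problem_full = pypesto.Problem(objective=objective_full, lb=[-10], ub=[10])

# Fit the training data set.
result_training = optimize.minimize(problem_training, n_starts=5)

# Significance of the validation data set; the full problem is refitted
# inside the function if no result for the full data set is passed.
significance = profile.validation_profile_significance(
    problem_full, result_training)
```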

@codecov-commenter commented May 7, 2021

Codecov Report

Merging #658 (a3560b5) into develop (69e834d) will decrease coverage by 0.10%.
The diff coverage is 71.05%.

❗ Current head a3560b5 differs from pull request most recent head bca4092. Consider uploading reports for the commit bca4092 to get more accurate results

@@             Coverage Diff             @@
##           develop     #658      +/-   ##
===========================================
- Coverage    89.72%   89.61%   -0.11%     
===========================================
  Files           94       95       +1     
  Lines         6083     6117      +34     
===========================================
+ Hits          5458     5482      +24     
- Misses         625      635      +10     
Impacted Files Coverage Δ
pypesto/problem.py 91.57% <44.44%> (-0.34%) ⬇️
pypesto/profile/validation_intervals.py 78.57% <78.57%> (ø)
pypesto/profile/__init__.py 100.00% <100.00%> (ø)
pypesto/engine/multi_process.py 92.59% <0.00%> (-7.41%) ⬇️
pypesto/objective/amici_util.py 84.21% <0.00%> (-0.88%) ⬇️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 69e834d...bca4092.

pypesto/profile/validation_intervals.py (resolved)
lsq_objective: float = False
) -> float:
"""
Computes the significance of an validation experiment as described in
Member:

Suggested change:
- Computes the significance of an validation experiment as described in
+ Computes the significance of a validation experiment as described in

) -> float:
"""
Computes the significance of an validation experiment as described in
Kreutz et al. BMC Systems Biology 2012.
Member:

an actual reference would be good to make it easier to find in the future. See https://github.com/ICB-DCM/pyABC/blob/main/pyabc/inference/smc.py#L61 (and maybe use an actual url 🙈 )

pypesto/profile/validation_intervals.py (outdated, resolved)
----------
problem_full_data:
pypesto.problem, such that the objective is the
negative-log-likelihood of the training and validation data set.
Member:

what would happen if there is a Bayesian prior defined?

Contributor:

I'm wondering whether it's better to define the "training problem" and the "full problem", or rather the "training problem" and the "validation problem". While I see that it's handy to just call an objective function of the full problem, it seems 1. hard to "subtract" the training problem from the full problem if at some point it becomes necessary to use the validation problem itself... Moreover, it might 2. be less intuitive for the user to provide a "training" and a "full" problem rather than a "training" and a "validation" problem...
The 2nd is just a guess; the actual argument would be 1. What do you think about this?
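
(As a sketch of that alternative, assuming pypesto's AggregatedObjective, which sums objective values; this is not something implemented in this PR:)

```python
import pypesto
from pypesto.objective import AggregatedObjective

# Hypothetical: separate objectives for the training and the validation
# data; their sum plays the role of the "full data" objective, so a user
# could pass a training and a validation problem and the full problem
# would be assembled internally.
objective_training = pypesto.Objective(fun=lambda x: (x[0] - 1.0) ** 2)
objective_validation = pypesto.Objective(fun=lambda x: (x[0] - 1.5) ** 2)

objective_full = AggregatedObjective(
    [objective_training, objective_validation])
problem_full = pypesto.Problem(objective=objective_full, lb=[-10], ub=[10])
```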

result_training_data.optimize_result.get_for_key('x')[0]))

if nllh_new > nllh_old:
raise RuntimeError("Fitting of the full data set provided a worse fit "
Member:

I would raise errors only in the case of programmatic errors, e.g. wrong inputs, unexpected values, ..., which should not be possible with correct usage of the tool. As this is "only" a bad situation having nothing to do with the code, but with the application, I would logger.warning(...).
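
(A sketch of that suggestion, reusing the nllh_new/nllh_old values from the function under review; illustration only, not the PR's code:)

```python
import logging

logger = logging.getLogger(__name__)

# Warn instead of raising, since a worse full-data fit is an application /
# optimization outcome rather than a programmatic error.
if nllh_new > nllh_old:
    logger.warning(
        "Fitting of the full data set provided a worse fit than the "
        "training data alone. Consider rerunning the optimization for "
        "result_full_data or starting from the best parameters found "
        "on the training data.")
```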

Contributor:

Just to understand this correctly: are you raising an error if the optimized nllh value is worse than the value for the vector that was used to initialize the optimization? Can this happen at all? 🤔

"result_full_data or running the fitting from the "
"best parameters found from the training data.")

if lsq_objective:
Member:

comment on what happens here, i.e. sf = survival function, and how that should be interpreted.
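
(For context, a sketch of the kind of explanation requested here, reusing nllh_old/nllh_new from the function under review; the exact statistic and degrees of freedom are assumptions, not necessarily what this PR implements:)

```python
from scipy.stats import chi2

# sf = survival function = 1 - CDF. For a likelihood-ratio-type statistic
# compared against a chi-squared distribution, chi2.sf gives the
# probability of observing a statistic at least this large under the null
# hypothesis, i.e. a p-value: a small value means the validation data are
# poorly described by the model calibrated on the training data alone.
statistic = 2 * (nllh_old - nllh_new)  # assumed form of the statistic
significance = chi2.sf(statistic, df=1)
```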

cls.result_all_data = optimize.minimize(cls.problem_all_data,
n_starts=5)

def test_validation_intervals(self):
Member:

docstring

"""
Computes the significance of an validation experiment as described in
Kreutz et al. BMC Systems Biology 2012.

Member:

I would appreciate a longer / more informative docstring here explaining what the "significance of a validation experiment" is, as e.g. I have no idea and would need to look up the publication to see whether this function is what I'm looking for ...
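
(A sketch of what a more informative docstring could say; the signature and wording below are a hedged paraphrase of Kreutz et al. 2012, not the PR's text:)

```python
def validation_profile_significance(
        problem_full_data, result_training_data,
        result_full_data=None, lsq_objective=False):
    """
    Compute the significance of a validation experiment.

    Roughly: a p-value-like measure of how plausible the validation data
    are under the model calibrated on the training data alone. The
    objective of the full (training + validation) problem is evaluated at
    the training optimum and compared, via a likelihood-ratio-type
    statistic, to the optimum of the full problem. A small significance
    indicates that the validation data conflict with the trained model
    (Kreutz et al., BMC Systems Biology 2012).
    """
```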


# fit with refitting inside function
profile.validation_profile_significance(self.problem_all_data,
self.result_training_data)
Member:

is it somehow possible to test whether the output makes sense? I.e. that the returned values are as expected when fitting on full data from the same model, vs. that they point to problems when that's not the case?

pypesto/profile/validation_intervals.py (outdated, resolved)
pypesto/profile/validation_intervals.py (outdated, resolved)
"""

import numpy as np
import unittest
Member:

Would avoid mixing unittest and pytest to keep things simpler, but not sure what the pypesto policy is there.

Member:

Haha, my policy would always be to use pytest instead of the class syntax, which I find hard to read. Would be possible here for sure, but either would be fine for me.
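
(A hedged sketch of a pytest-style version; the toy fixture is made up, only the validation_profile_significance call mirrors the existing test:)

```python
import numpy as np
import pytest

import pypesto
import pypesto.optimize as optimize
import pypesto.profile as profile


@pytest.fixture(scope="module")
def fitted_problems():
    """Toy training / full-data problems, fitted once per module."""
    def make_problem(d):
        objective = pypesto.Objective(
            fun=lambda x: (x[0] - d) ** 2, grad=lambda x: 2 * (x - d))
        return pypesto.Problem(objective=objective, lb=[-10], ub=[10])

    problem_all_data = make_problem(1.5)
    result_training_data = optimize.minimize(make_problem(1.0), n_starts=5)
    return problem_all_data, result_training_data


def test_validation_intervals(fitted_problems):
    """Significance computation runs and returns a finite value."""
    problem_all_data, result_training_data = fitted_problems
    significance = profile.validation_profile_significance(
        problem_all_data, result_training_data)
    assert np.isfinite(significance)
```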

Comment on lines 58 to 61
return (x[0]-d)**2

def grad(x):
return 2 * (x-d)
Member:

x vs x[0]

assert x.size == 1?

Contributor:

hm... agreeing: either ensure x is an np.ndarray, so that this also goes through for vectors, or enforce x to be a number... Implicitly, this code assumes x to be an iterable object, while it strictly (and implicitly) assumes that d is a number... I find that weird.
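
(One consistent variant as a sketch, treating x as a length-1 np.ndarray throughout; only illustrating the point above:)

```python
import numpy as np

d = 1.0  # hypothetical data value

def fun(x: np.ndarray) -> float:
    assert x.size == 1
    return (x[0] - d) ** 2

def grad(x: np.ndarray) -> np.ndarray:
    assert x.size == 1
    return 2 * (x - d)  # remains an ndarray with the same shape as x
```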

@paulstapor paulstapor left a comment (Contributor):

I just added comments where there weren't any so far, and @yannikschaelte made this job easy... :D Overall, I think that's fine, I'm just wondering because this is somewhat orthogonal to the prediction functionality. This is okay per se, I'm simply wondering how to best combine things if one wants to actually compute prediction profiles from a pyPESTO prediction...

Comment on lines 66 to 79
problem = Problem(
objective=problem_full_data.objective,
lb=problem_full_data.lb,
ub=problem_full_data.ub,
dim_full=problem_full_data.dim_full,
x_fixed_indices=problem_full_data.x_fixed_indices,
x_fixed_vals=problem_full_data.x_fixed_vals,
x_guesses=x_0,
startpoint_method=problem_full_data.startpoint_method,
x_names=problem_full_data.x_names,
x_scales=problem_full_data.x_scales,
x_priors_defs=problem_full_data.x_priors,
lb_init=problem_full_data.lb_init,
ub_init=problem_full_data.ub_init)
Contributor:

Wouldn't another possible solution just be to use deepcopy and then overwrite the x_guesses data member with x0? To me, this feels much more handy (and just as stable as what Yannik suggested) than adding a lot of interface to the problem class...
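
(A sketch of the suggested deepcopy alternative; whether x_guesses can be assigned directly depends on the Problem class internals:)

```python
from copy import deepcopy

# Copy the full-data problem and only replace the start guesses, instead
# of re-assembling the Problem from all of its attributes.
problem = deepcopy(problem_full_data)
problem.x_guesses = x_0  # assumes x_guesses is a writable attribute
```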

@yannikschaelte yannikschaelte left a comment (Member):

looks good to me

@jvanhoefer jvanhoefer merged commit 6acc530 into develop May 12, 2021
@jvanhoefer jvanhoefer deleted the feature_validation_profiles branch May 12, 2021 17:35
@yannikschaelte yannikschaelte mentioned this pull request May 17, 2021