
added option for weighted mean in ensemble #702

Merged: 23 commits from feature_ensemble_weights into develop, Sep 30, 2021

Conversation

PaulJonasJost (Collaborator):

This is only a draft for now.
It assumes that the post processor returns either a np.ndarray or a dict/dict-like object.

We perhaps want to add another default post processor that would only need to receive AMICI_outputs to create a return dict of the things that should be returned (like 'x', 'y', 'llh', etc.).

Sigmas also still need to be added.

Currently the weighting is done in terms of the llh of each condition separately. Should I go with the llh for one parameter across all conditions instead?
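A minimal standalone sketch of the LLH-based weighting idea being discussed (hypothetical NumPy code, not the PR's actual implementation): per-member log-likelihoods are turned into normalized weights, which then go into `np.average`.

```python
import numpy as np

def llh_weights(llhs: np.ndarray) -> np.ndarray:
    """Turn per-member log-likelihoods into normalized weights.

    Subtracting the max before exponentiating avoids overflow;
    the shift cancels in the normalization.
    """
    shifted = llhs - np.max(llhs)
    w = np.exp(shifted)
    return w / w.sum()

# toy ensemble: 3 members, 4 output timepoints
trajectories = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [1.2, 2.1, 3.3, 4.1],
    [0.8, 1.9, 2.9, 3.8],
])
llhs = np.array([-10.0, -12.0, -30.0])  # third member fits poorly

weights = llh_weights(llhs)
# weighted ensemble mean: poorly fitting members contribute little
weighted_mean = np.average(trajectories, axis=0, weights=weights)
```

Whether the llh enters per condition or per parameter vector across all conditions only changes which axis the weights are computed over; the averaging step stays the same.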

@codecov-commenter commented Jul 12, 2021

Codecov Report

Merging #702 (afc6aa6) into develop (160c2a8) will increase coverage by 1.36%.
The diff coverage is 84.85%.


@@             Coverage Diff             @@
##           develop     #702      +/-   ##
===========================================
+ Coverage    88.16%   89.53%   +1.36%     
===========================================
  Files           79       97      +18     
  Lines         5257     6613    +1356     
===========================================
+ Hits          4635     5921    +1286     
- Misses         622      692      +70     
Impacted Files Coverage Δ
pypesto/engine/mpi_pool.py 0.00% <0.00%> (ø)
pypesto/ensemble/covariance_analysis.py 18.36% <0.00%> (-0.39%) ⬇️
pypesto/optimize/__init__.py 100.00% <ø> (ø)
pypesto/profile/util.py 95.74% <ø> (ø)
pypesto/sample/diagnostics.py 100.00% <ø> (+30.76%) ⬆️
pypesto/sample/geweke_test.py 94.36% <ø> (ø)
pypesto/store/hdf5.py 79.16% <0.00%> (-3.45%) ⬇️
pypesto/visualize/clust_color.py 91.25% <42.85%> (ø)
pypesto/predict/task.py 43.75% <43.75%> (ø)
pypesto/petab/__init__.py 63.63% <50.00%> (ø)
... and 72 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c013b9a...afc6aa6.

@dilpath (Member) left a comment:

I think a test would be good.

@@ -138,7 +140,8 @@ def condense_to_arrays(self):
         }

     def compute_summary(self,
-                        percentiles_list: Sequence[int] = (5, 20, 80, 95)
+                        percentiles_list: Sequence[int] = (5, 20, 80, 95),
+                        weighting: bool = True
Member:
Suggested change
- weighting: bool = True
+ weigh: bool = True

Option can be weigh and values can be weights

Member:
Perhaps this should be False by default?

PaulJonasJost (Collaborator, Author):
It's just a matter of taste, I think. As LLH is not included in a predictor by default, this gets set to False in line 168 anyway. The idea behind it, for me, was that if you do include llh in a predictor for ensembles, the default you probably want is True.

Member:
So in the default setting, weighting==True but no weights are used? Might be confusing for users. Could call it something more descriptive, e.g. weights_if_llh_provided, or make it False by default?

Comment on lines 166 to 167
# check if weightings shall be used or not depending on whether llh
# is part of the prediciton output
Member:
Suggested change
- # check if weightings shall be used or not depending on whether llh
- # is part of the prediciton output
+ # Check whether LLH-based weights can be used.

Comment on lines 168 to 169
if AMICI_LLH not in self.prediction_results[0].conditions[0].output:
weighting = False
Member:
A user might explicitly want weights but not realize they're not used because of this. A warning might help?

Contributor:
That warning would then be problematic if the weighting flag is True by default, because the ensemble prediction would issue a warning by default. But default settings should not be chosen such that they issue warnings, I think.
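A hedged sketch of how both concerns could be reconciled (function and parameter names are hypothetical, not pyPESTO code): warn only when the user explicitly set the weighting flag, so the silent default is preserved.

```python
import warnings

def resolve_weighting(weighting: bool, llh_available: bool,
                      explicit: bool = False) -> bool:
    """Decide whether LLH-based weighting can actually be applied.

    `explicit` marks whether the user set `weighting` themselves rather
    than relying on the default, so the default never triggers a warning.
    """
    if weighting and not llh_available:
        if explicit:
            warnings.warn(
                'Weighting was requested, but llh is not part of the '
                'prediction output; computing an unweighted summary.')
        return False
    return weighting
```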

Comment on lines 230 to 232
summary[STANDARD_DEVIATION] = np.average(
(tmp_array.T-summary[MEAN].T).T**2,
axis=-1, weights=weights)
Member:
Variance?

PaulJonasJost (Collaborator, Author):
oh yes, there needs to be an additional np.sqrt() 👍

Member:
Then equivalent to np.average(np.abs(tmp_array - summary[MEAN]), axis=-1, weights=weights) I guess

PaulJonasJost (Collaborator, Author):
No, at least mathematically that should not be the case...
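The author is right to push back: the weighted standard deviation (square root of the weighted mean squared deviation) is generally not the weighted mean absolute deviation; by Jensen's inequality the former is at least as large. A quick numeric check:

```python
import numpy as np

values = np.array([0.0, 1.0, 10.0])
weights = np.array([0.5, 0.3, 0.2])

mean = np.average(values, weights=weights)
# weighted standard deviation: sqrt of the weighted mean squared deviation
std = np.sqrt(np.average((values - mean) ** 2, weights=weights))
# weighted mean absolute deviation, as proposed in the comment above
mad = np.average(np.abs(values - mean), weights=weights)
# std ≈ 3.87, mad = 3.08 -- different quantities
```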


output_list = [prediction.conditions[ic].output
for prediction in self.prediction_results]
else:
output_list = [prediction.conditions[ic].output[AMICI_Y]
Contributor:

I just saw below that you seem to intend to change the type of prediction.conditions[ic].output. However, this is a data member of the class PredictionConditionResult. I'd strongly argue that data members of classes should not switch type if possible, and if it's inevitable, only during major refactorings, as this will typically break backwards compatibility. Rather, add a new data member for the weights...

Comment on lines 218 to 219
output_list = [prediction.conditions[ic].output[AMICI_LLH]
for prediction in self.prediction_results]
Contributor:

Hm... I have the impression that this will be problematic. By default, a PredictionConditionResult has a member "output", which is supposed to be an ndarray. You're making a dict out of it, which might break quite some dependencies...
I would rather suggest you add a new member "output_weight" to the PredictionConditionResult. This way, you can just use this new member here, and an additional field will not affect the functioning of old scripts, hence not destroy backwards compatibility.
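A minimal sketch of the backwards-compatible alternative suggested here: keep `output` an ndarray and put the weight in a new optional member. The class below is a simplified stand-in, not the real `PredictionConditionResult`.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np

@dataclass
class PredictionConditionResultSketch:
    """Simplified stand-in for PredictionConditionResult: `output` keeps
    its ndarray type, and the weight lives in a new, optional member."""
    timepoints: np.ndarray
    output: np.ndarray                      # stays an ndarray, as before
    output_weight: Optional[float] = None   # new member, e.g. the llh

result = PredictionConditionResultSketch(
    timepoints=np.array([0.0, 1.0]),
    output=np.array([[0.5], [0.7]]),
    output_weight=-12.3,
)
```

Old code that only ever reads `output` keeps working unchanged, since the new field defaults to `None`.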

@PaulJonasJost PaulJonasJost marked this pull request as ready for review September 13, 2021 10:37
@jvanhoefer (Member) left a comment:
Since I am not really into that part of the code, I only reviewed code style (otherwise I would have to get an overview of the rest of the ensemble code first). So please have someone else review the functionality/structure, but from a style point of view: looks fine to me :)

@dilpath (Member) left a comment:
Looks good 👍

Maybe an example in a notebook?

compute_weighted_sigma=True)
except TypeError:
raise ValueError('Computing a summary failed.')
n_conditions = len(self.prediction_results[0].conditions)
Member:
Can conditions differ between predictions? If so, can simply raise a NotImplementedError if there are different conditions between the prediction results.

PaulJonasJost (Collaborator, Author):
They should not be able to, as the ensembles are currently created from the same amici_model, which generates predictions the same way no matter what the actual parameter values are.

Might be interesting way down the line, when we allow different model structures to contribute to the same ensemble.

Member:
Conditions are provided via amici.ExpData (a single amici.Model can simulate different sets of conditions).

Comment on lines +349 to +351
if y_meas.shape != mean_traj.shape:
raise ValueError('Shape of trajectory and shape '
'of measurements does not match.')
Member:
I guess this assumes equal number of timepoints -> same timepoints. However, given the pypesto.objective.AmiciObjective.set_custom_timepoints method, maybe this can't be assumed. Could check timepoints explicitly.
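A sketch of the explicit timepoint check suggested above (hypothetical helper, not pyPESTO code): with `set_custom_timepoints`, matching shapes do not guarantee matching timepoints, so both are checked.

```python
import numpy as np

def check_trajectory_compatibility(t_meas: np.ndarray,
                                   t_pred: np.ndarray,
                                   y_meas: np.ndarray,
                                   mean_traj: np.ndarray) -> None:
    """Validate measurements against a predicted mean trajectory.

    Checks the timepoints explicitly instead of relying on shapes alone.
    """
    if y_meas.shape != mean_traj.shape:
        raise ValueError('Shape of trajectory and shape of measurements '
                         'do not match.')
    if not np.array_equal(t_meas, t_pred):
        raise ValueError('Measurement and prediction timepoints differ, '
                         'even though the shapes match.')
```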

PaulJonasJost (Collaborator, Author):
Not sure, but shouldn't the whole optimization fail if we set custom timepoints, since we could not compute a likelihood anymore? 🤔 Correct me if I am wrong.

Member:
Ah, often used for predictions rather than methods that require a likelihood.

PaulJonasJost (Collaborator, Author):
In that case you wouldn't compute the chi^2 value either, I'd say.

@PaulJonasJost PaulJonasJost merged commit 363c091 into develop Sep 30, 2021
@PaulJonasJost PaulJonasJost deleted the feature_ensemble_weights branch September 30, 2021 10:58
@yannikschaelte yannikschaelte mentioned this pull request Oct 28, 2021