You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
@Janno and I did a quick implementation of this in the copy of cClosure being used for mTac and it reduced reduction time by about 40% for their test case. Apparently this is because when dealing with heavily universe polymorphic code, by default Coq produces different universes for each constant invocation; therefore, the constant cache has a high miss rate.
What we did doesn't solve the high miss rate problem, but it does make adding universe polymorphic constants to the cache much less costly, by keeping a delayed universe substitution in every fterm and stack_member that currently has a regular delayed substitution (as opposed to the current implementation, which substitutes the universe instance through the whole constant body the first time it's encountered). Actually substituting is deferred until mk_clos is called on something that needs it (i.e. a Const, Ind, Construct, or Sort), and otherwise we just propagate the universe substitution down.
This might lead to a bit of slowdown or increased memory usage in cases where there is no universe polymorphism, but in other cases I suspect it's usually faster, since we avoid substituting into subterms of the constant body that are never converted to fterms.
I can work with @Janno to provide a PR that applies to the real kernel if people are interested.
The text was updated successfully, but these errors were encountered:
Note that there is probably a variant of this that would solve the cache miss problem (by allowing reductions in constant bodies to be shared even if they needed different universe substitutions) but that would be much more complex to implement.
For the record, the category-theory library seems to be passing its time performing universe substitution from within conversion. Fixing this bug would likely make it faster by a large margin.
Do we already have smaller test cases for this problem? I came up with this here but I am not sure if it captures the problem:
Module Poly.
UnsetUniverse Minimization ToSet.
#[universes(polymorphic)] Definition mynat@{U} := 1.
Ltac make_term n :=
match n with
| O => constr:(O)
| S ?n => let r := make_term n inconstr:(mynat + r)
end.
Goal ltac:(let t := make_term 1000 in exact t) = 1000.
exact eq_refl.
TimeQed. (* 25ms *)End Poly.
Module Mono.
#[universes(monomorphic)] Definition mynat@{U} := 1.
Ltac make_term n :=
match n with
| O => constr:(O)
| S ?n => let r := make_term n inconstr:(mynat + r)
end.
Goal ltac:(let t := make_term 1000 in exact t) = 1000.
exact eq_refl.
TimeQed. (* 5ms *)End Mono.
EDIT: bigger numbers (such as 5000) lead to a bigger factor between the polymorphic and monomorphic example. At 1000 the factor seems to be 5x, at 5000 it is already 10x on my machine with Coq 8.13.1.
EDIT 2: looking at the profile I don't think this is a useful example for the problem of universe substitution in conversion. :(
@Janno and I did a quick implementation of this in the copy of cClosure being used for mTac and it reduced reduction time by about 40% for their test case. Apparently this is because when dealing with heavily universe polymorphic code, by default Coq produces different universes for each constant invocation; therefore, the constant cache has a high miss rate.
What we did doesn't solve the high miss rate problem, but it does make adding universe polymorphic constants to the cache much less costly, by keeping a delayed universe substitution in every fterm and stack_member that currently has a regular delayed substitution (as opposed to the current implementation, which substitutes the universe instance through the whole constant body the first time it's encountered). Actually substituting is deferred until mk_clos is called on something that needs it (i.e. a Const, Ind, Construct, or Sort), and otherwise we just propagate the universe substitution down.
This might lead to a bit of slowdown or increased memory usage in cases where there is no universe polymorphism, but in other cases I suspect it's usually faster, since we avoid substituting into subterms of the constant body that are never converted to fterms.
I can work with @Janno to provide a PR that applies to the real kernel if people are interested.
The text was updated successfully, but these errors were encountered: