Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Integrate count and sum feat sq into the interaction generation routine #2987

Merged
merged 25 commits into from May 12, 2021

Conversation

jackgerrits
Copy link
Member

@jackgerrits jackgerrits commented May 7, 2021

This change updates the way in which features are counted and sum of features squared is calculated. For some multi_ex reductions, ccb and slates in particular, the number of features reported to the logger is incorrect. They were incorrect before this change and this doesn't fix it. The way in which features are counted needs to be updated for the multi_ex abstraction.

@jackgerrits jackgerrits marked this pull request as draft May 7, 2021 16:33
@jackgerrits jackgerrits marked this pull request as ready for review May 7, 2021 17:31
ec.num_features -=
ec.feature_space[static_cast<int>(i[0])].size() * ec.feature_space[static_cast<int>(i[1])].size();
ec.num_features += ec.feature_space[static_cast<int>(i[0])].size() * d.rank;
ec.num_features += ec.feature_space[static_cast<int>(i[1])].size() * d.rank;
ec.num_features_from_interactions +=
ec.feature_space[static_cast<size_t>(i[0])].size() * ec.feature_space[static_cast<size_t>(i[1])].size();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is calculated on line 107 too

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, gd_mf does it's interaction manually using quadratics only and does some weird stuff with feature counting. It's not clear to me why it subtracts that value. I can calc once though and reuse the value though.


finished run
number of examples = 4
weighted example sum = 4.000000
weighted label sum = 0.000000
average loss = 0.000000
total feature number = 32
total feature number = 48
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the explanation behind these 2 counters, for example how does one arrive at 8 or 12 features in this one example?

ccb shared |User b
ccb action |Action d
ccb action |Action e
ccb slot 0:0:0.2 |Slot h

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For multi_ex, most of the time it is wrong at the moment. It is just an approximation. This is because the multi_ex abstraction and output example didn't really solve how to count features. The feature count reporting should happen at the base learner.

For that example: The slot is merged into shared, which is in turn merged into each action when is then sent down the stack. There is additionally the constant feature and the slot feature which are hidden as well, but do contribute to the count.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding --noconstant to this cmd halves the total features (this case 4) of that line so its probably just counting d, e, b, h

vowpalwabbit/example.h Outdated Show resolved Hide resolved
vowpalwabbit/csoaa.cc Show resolved Hide resolved
total_sum_feat_sq_calculated = false;
}

friend void VW::copy_example_data(example* dst, const example* src);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of making these friends could we add copy methods/setters for the private members?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I toyed with copy methods for example a while ago but there are multiple different versions, with and without labels, metadata only etc. So I opted to just let the existing one have access rather than try and encapsulate it

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I agree on the copy_example makes sense for it to be a friend, and if we keep using the permutation we can think about making it public in future

}
ec.num_features_from_interactions = num_features_from_interactions;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this assignment of num_feature_from_interactions be happening at the spot where they are calculated (since the example is also passed down)? I am worried that we are going to forget that this is needed at some future point when we call foreach or gd::inline_predict

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid concern, I thought the same. But I think there are situations where foreach_feature is called and you don't want to set the num_feature_from_interactions. Additionally it prevents examples from being const where they otherwise can be

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you think it would be worth to wrap those calls?

(a) where examples can be const there is a non-const overload that calls the const version and then sets the number of features on the example and
(b) if a foreach call doesn't want to set the number of features we could again overload and just not set the example with the result

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The feature count is a very non-critical operation and it's only used as a measure of work. If it's missed really all that happens is the progressive validation log will show fewer features. There are already quite a few overloads of foreach_feature and this PR is adding more overloads there. I don't think it would be worth it

vowpalwabbit/interactions_predict.h Show resolved Hide resolved
float partial_prediction = 0.f; // shared data for prediction.
float updated_prediction = 0.f; // estimated post-update prediction.
float loss = 0.f;
float total_sum_feat_sq = 0.f; // precomputed, cause it's kind of fast & easy.

float total_sum_feat_sq = 0.f;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment here about how this value is used and what will invalidate it? I see it getting reset a lot in this PR and I don't understand why places that used to update the value now reset it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anything that used to change it now just becomes a cache invalidation

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comment

@@ -149,6 +150,7 @@ inline void generate_interactions(namespace_interactions& interactions, bool per
auto begin = second.audit_cbegin();
if (same_namespace) { begin += (PROCESS_SELF_INTERACTIONS(ft_value)) ? i : i + 1; }
auto end = second.audit_cend();
num_features += std::distance(begin, end);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like num_features isn't used by the callers all the time. Is it better to compute the distance every time than to have num_features be an optional input?
I'm assuming the distance calls are relatively expensive since the iterators are more complex than raw pointers

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The std::distance call here is constant time (and very cheap) since the iterators are random access iterators. The addition of this extra logic hasn't affected the benchmarks in a meaningful way. Making it optional would require a branch instead of simple arithmetic.

The other option is an overload without this out parameter. But that results in a lot of code duplication with no noticeable difference in perf.

@@ -225,7 +223,7 @@ void predict_or_learn_multi(nn& n, single_learner& base, example& ec)

CONVERSE: // That's right, I'm using goto. So sue me.

n.output_layer.total_sum_feat_sq = 1;
n.output_layer.reset_total_sum_feat_sq();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original code sets the sum_feat_sq to 1 while the function sets it to 0. Is that ok?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, since it will get calculated from scratch when needed and that 0 wont make it out

Copy link
Collaborator

@olgavrou olgavrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM: my understanding is that the calculation of interaction features is still evolving (especially for ccb) and that it isn't always exact right now and that is ok since we are moving towards correct calculations (especially compared to before)

@jackgerrits jackgerrits merged commit b250d85 into VowpalWabbit:master May 12, 2021
@jackgerrits jackgerrits deleted the jagerrit/stats_take2 branch May 12, 2021 15:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants