gtools version of merge #76

Open
NilsJPWerner opened this issue Jul 26, 2021 · 4 comments

Comments

NilsJPWerner commented Jul 26, 2021

What would you like gtools to add or change (and why)?
It would be fantastic if gtools had a gmerge command. ftools has join/fmerge, which seems to give a 2x speedup over merge, but since it is implemented in Mata it can't support mixed types.

Please include a specific suggestion
Add a gmerge command that implements the standard merge functionality.
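To give a rough idea of what "standard merge functionality" involves, here is a minimal Python sketch of a hash-based many-to-one (m:1) merge that reproduces Stata's _merge codes (1 = master only, 2 = using only, 3 = matched). It is not how gtools or ftools work; the function name hash_merge_m1 and the list-of-dicts data layout are hypothetical, chosen only to show the hashing idea.

```python
from typing import Any

Row = dict[str, Any]


def hash_merge_m1(master: list[Row], using: list[Row], keys: list[str]) -> list[Row]:
    """Hypothetical m:1 merge of two lists of rows on the variables in `keys`."""
    # Hash the using data on its key tuple; m:1 requires these keys to be unique.
    lookup: dict[tuple, Row] = {}
    for row in using:
        k = tuple(row[v] for v in keys)
        if k in lookup:
            raise ValueError(f"key {k!r} does not uniquely identify using observations")
        lookup[k] = row

    out: list[Row] = []
    matched: set[tuple] = set()
    # One pass over master: each row is a hash lookup, with no sorting of either dataset.
    for row in master:
        k = tuple(row[v] for v in keys)
        hit = lookup.get(k)
        if hit is None:
            # Master only; using variables are simply absent (missing, in Stata terms).
            out.append({**row, "_merge": 1})
        else:
            matched.add(k)
            # Master values win for shared non-key variables, as in merge without update.
            out.append({**hit, **row, "_merge": 3})

    # Unmatched using rows are appended at the end, in their original order.
    for row in using:
        if tuple(row[v] for v in keys) not in matched:
            out.append({**row, "_merge": 2})
    return out
```

Because keys are hashed as tuples, mixing string and numeric key variables needs no special handling in this sketch. It says nothing about whether a compiled version would be meaningfully faster than merge.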

NilsJPWerner changed the title from "gtools version of gmerge" to "gtools version of merge" on Jul 26, 2021
mcaceresb (Owner) commented

@NilsJPWerner In theory I'd like to implement this, but in practice I've looked into it a bit: it's very complicated, and it's not at all clear that I'd get a large speed improvement. I'd like to look into this again in the future, but it won't be any time soon. Sorry!

fpet19 commented Dec 1, 2021

Unrelated to merge but also a suggestion: it would be great if you could provide a gtools enhancement for carryforward. This is an essential (to me) but often overlooked command, and currently extremely slow. Thanks!

mcaceresb (Owner) commented

@fpet19 I'm curious: what is a specific scenario/example where carryforward is very slow? I have not used it, but isn't it a wrapper for replace var = var[_n-1] if mi(var)?

It's surprising that this is especially slow. Or is the issue that, when you call it with by, you have to sort the data first? Since it's sensitive to sort order, I would have assumed sorting was unavoidable.
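To make the sort-order point concrete, here is a minimal Python sketch of a by-group carryforward, assuming it behaves like replace var = var[_n-1] if mi(var) run within groups: stably sort on the group variable, then fill each missing value from the previous row of the same group. The function name is hypothetical and None stands in for Stata's missing value; this is not carryforward's actual code.

```python
from typing import Any, Optional, Sequence


def carryforward_by_sorted(groups: Sequence[Any],
                           values: Sequence[Optional[float]]) -> list[Optional[float]]:
    """Hypothetical by-group carryforward via an explicit stable sort."""
    # Stable sort on the group variable (groups must be mutually sortable, e.g. all
    # str or all numbers): rows keep their current relative order within each group,
    # like `bysort group:` on data that is already ordered by time.
    order = sorted(range(len(values)), key=lambda i: groups[i])
    filled = list(values)
    for pos in range(1, len(order)):
        i, prev = order[pos], order[pos - 1]
        # Like replace var = var[_n-1] if mi(var): a value filled here can be
        # carried again to the next row, so runs of missings are filled too.
        if filled[i] is None and groups[i] == groups[prev]:
            filled[i] = filled[prev]
    # Results sit at each row's original position; only `order` was sorted.
    return filled
```

The O(n log n) sort is the step that dominates on long datasets; the fill itself is a single linear pass.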

fpet19 commented Dec 1, 2021

Yes, that seems to be the case: I always call it with by. I use gegen to create a group variable for a subset of the group, and then I populate it for the whole group using carryforward. The second command is over 5 times slower than the first.

I assume that whatever magic gtools uses in gegen to avoid sorting and re-sorting should be useful here. In very long datasets, just avoiding having to xtset after gegen-related commands would be worth it.
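A minimal Python sketch of the sort-free idea, assuming the rows are already in the order that defines "previous" within each group (e.g. by time): one pass keeps a hash table from each group to its last non-missing value, so the group variable never has to be sorted and the row order is left untouched. This is only an illustration of why hashing can replace sorting here, not how gegen is implemented; the function name is hypothetical and None stands in for Stata's missing value.

```python
from typing import Hashable, Optional, Sequence


def carryforward_by_hashed(groups: Sequence[Hashable],
                           values: Sequence[Optional[float]]) -> list[Optional[float]]:
    """Hypothetical sort-free by-group carryforward in a single pass."""
    last: dict[Hashable, float] = {}  # group -> most recent non-missing value
    out: list[Optional[float]] = []
    for g, v in zip(groups, values):
        if v is None:
            # Missing: take the group's last seen value (None if there is none yet).
            out.append(last.get(g))
        else:
            last[g] = v
            out.append(v)
    return out
```

The trade-off is memory: the table holds one entry per group, but the work is one O(n) pass instead of a sort plus a re-sort. If the data are not already ordered within groups, some sorting is still required; hashing only removes the need to sort on the group variable itself.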


3 participants