Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

class-aware C coercion function, no-alloc coerce where possible #3913

Closed
jangorecki opened this issue Sep 26, 2019 · 2 comments · Fixed by #4491
Closed

class-aware C coercion function, no-alloc coerce where possible #3913

jangorecki opened this issue Sep 26, 2019 · 2 comments · Fixed by #4491
Assignees

Comments

@jangorecki
Copy link
Member

jangorecki commented Sep 26, 2019

We should have a function for coercing an object to another type and class, in a efficient way, and class aware (for double type, different classes uses different NA representation so we need to handle that).
It is important because coercion is a very common operation, especially num-int or int-num for user convenience so they don't have to type L suffix, but there are tons of other applications.

Initial work on that can be found in #3765 (generic coerceClass)

No-alloc coerce written for C assign function #3909 is good foundation, that potentially could be extracted out to an standalone function.

@mattdowle
Copy link
Member

mattdowle commented Jan 10, 2021

If this is just about integer64, then I get that. The title could be about integer64, and memrecycle could be improved to handle no-alloc zero-copy integer64 coercion (from a comment by Michael, it sounds like it doesn't right now) as it already does for int-num and num-int. But general class coercion, to me, refers to as.<class>; i.e. the R level method which would involve calling eval(). Is that what you mean by class-aware?

for double type, different classes uses different NA representation so we need to handle that

you've used plural here, but it's just one class that does that: integer64. Enhancing memrecycle to do no-alloc integer64 coercion would be much easier for me to understand and merge.

it is important because coercion is a very common operation, especially num-int or int-num for user convenience so they don't have to type L suffix, but there are tons of other applications.

This line could maybe be removed now, since "No-alloc coerce written for C assign function #3909 is good foundation, that potentially could be extracted out to an standalone function." is at the end. Or could you provide one example from the tons of applications (that isn't integer64) to help me understand.

Extracting the no-alloc coercion parts of memrecycle into a separate internal function makes sense to me. That would be much easier to merge a PR that did that. However, have you seen the internals of memrecycle with the big macro I did? The idea was that the no-alloc coerce was only possible for the very reason it is part of memrecycle. To take the no-alloc part of it out would mean returning the coerced type which means allocating the coerced type. IIUC.

@jangorecki
Copy link
Member Author

jangorecki commented Jan 10, 2021

If this is just about integer64, then I get that.

Yes it is about integer64 (or nanotime). Eventually other classes if we will decide to handle them here.

general class coercion, to me, refers to as.; i.e. the R level method which would involve calling eval(). Is that what you mean by class-aware?

Yes, we don't want to go via R's eval for coercion. That was already solved by coerceFill before but it was not very efficient because it was always coercing input to int, double and int64. Then nafill (which called coerceFill) was using only the type it needed. There is a need to use this kind of coercion in multiple place so I filled this issue.

memrecycle could be improved to handle no-alloc zero-copy integer64 coercion (from a comment by Michael, it sounds like it doesn't right now) as it already does for int-num and num-int.

Unless I am missing something, looking at the following lines it seems that it does handle zero-copy assignment: https://github.com/Rdatatable/data.table/pull/4491/files#diff-22b103646a1efab9bbfc374791ccfc3fd1422eefc48918a3e126fc2f30d1f572R857

Extracting the no-alloc coercion parts of memrecycle into a separate internal function makes sense to me.

To me no-alloc coercion is a secondary feature. The more important is a helper to deal with int64 <-> num / int. Separate internal function makes perfect sense, and that was proposed in #3765, but it ended up duplicating a lot of functionality that was in memrecycle, where it was written much better. So this new PR re-used memrecycle instead. coerceAs allocates anyway so it is not designed to be called from parallel region, so no-alloc is not really crucial for coerceAs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants