Add Apply<TResult>method to PrimitiveDataFrameColumn #2807
Conversation
{ | ||
int[] values = { 1, 2, 3, 4, 5 }; | ||
var col = new PrimitiveDataFrameColumn<int>("Ints", values); | ||
var df = new DataFrame(col); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think df
is used. Can it be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, definitely.
@@ -487,6 +487,13 @@ public override GroupBy GroupBy(int columnIndex, DataFrame parent) | |||
|
|||
public void ApplyElementwise(Func<T?, long, T?> func) => _columnContainer.ApplyElementwise(func); | |||
|
|||
public PrimitiveDataFrameColumn<TResult> Apply<TResult>(Func<T?, TResult?> func) where TResult : unmanaged |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The existing method is named ApplyElementwise
, but this is just Apply
. Is there a reason for the discrepancy?
/cc @pgovind - thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I find ApplyElementwise
wouldve been better as Apply
too since the func
parameter explains that it applies the function to each element.
Also, what Apply
does is also similar to what pandas' apply
method does, so that should feel good for developers with a python background.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zHaytam : I agree with @eerhardt that this should be named ApplyElementwise
. We wouldn't need to go through API review then.
Also, how do you(and @eerhardt) feel about renaming to ApplyElementwise
and merging the 2 overloads so we end up with 1 ApplyElementwise
method that looks like this:
public PrimitiveDataFrameColumn<TResult> ApplyElementwise<TResult>(Func<T?, long, TResult?> func, bool inPlace = false) where TResult : unmanaged
2 reasons:
- Adding the
inPlace
parameter makes it more consistent with APIs likeClip
,FillNulls
etc that already take in aninPlace
parameter. - Having just 1 API is cleaner IMO.
The minor problem I see is a statement such asintColumn.ApplyElementwise((value, index) => return 0.0f), inPlace: true)
. We would have to throw here because an int column cannot holds floats. Arguably it is ugly that we allow users to call this API and throw immediately? Thoughts?
This will only need minor changes I think. You'd just have to check if inPlace
is set and set resultColumn
to this
or a new PrimitiveDataFrameColumn
and everything else should just work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are separate discussions here, so I'll try to separate them:
The name Apply
or ApplyElementwise
.
We don't have Elementwise
in every method name that does something element-wise. For example, Clip
and FillNulls
don't have Elementwise
in their name. So does Apply
need to? My personal opinion is that it is obvious the method acts on each element, so we should be able to just name it Apply
. The reason we added Elementwise
to some method names was so Equals
and GreaterThan
could return a collection of bools instead of a single bool.
Merging this into a single method
I think it makes sense to keep them separate overloads:
public PrimitiveDataFrameColumn<T> Apply(Func<T?, long, T?> func, bool inPlace = false)
public PrimitiveDataFrameColumn<TResult> Apply<TResult>(Func<T?, TResult?> func) where TResult : unmanaged
This solves the issue that if the T
's don't match up, you can't do an inPlace
operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, that sounds fine to me. I'll rename ApplyElementwise
to Apply
the next time I go through this file or @zHaytam, you can do it in this PR too if you want :). I'm approving the PR. Thanks for the work!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me too! One question though, the current ApplyElementwise
doesn't have an inPlace
argument, should that be added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zHaytam, since we're not touching the ApplyElementwise
method, maybe not in this PR. I think this is good to go in as it is. We should add the inPlace
option later.
Squash and merging this. Thank you @zHaytam for the patch! |
For issue #2805.
This PR adds an
Apply<TResult>
method toPrimitiveDataFrameColumn
that takes aFunc<T?, TResult?>
and returns a new column with the new type.Example usage (taken from the written unit test):