Skip to content
This repository has been archived by the owner on Aug 2, 2023. It is now read-only.

Add Apply<TResult>method to PrimitiveDataFrameColumn #2807

Merged
merged 4 commits into from Jan 13, 2020

Conversation

zHaytam
Copy link
Contributor

@zHaytam zHaytam commented Dec 19, 2019

For issue #2805.

This PR adds an Apply<TResult> method to PrimitiveDataFrameColumn that takes a Func<T?, TResult?> and returns a new column with the new type.

Example usage (taken from the written unit test):

int[] values = { 1, 2, 3, 4, 5 };
var col = new PrimitiveDataFrameColumn<int>("Ints", values);
PrimitiveDataFrameColumn<double> newCol = col.Apply(i => i + 0.5d);

@eerhardt eerhardt requested a review from pgovind January 2, 2020 20:19
{
int[] values = { 1, 2, 3, 4, 5 };
var col = new PrimitiveDataFrameColumn<int>("Ints", values);
var df = new DataFrame(col);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think df is used. Can it be removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, definitely.

@@ -487,6 +487,13 @@ public override GroupBy GroupBy(int columnIndex, DataFrame parent)

public void ApplyElementwise(Func<T?, long, T?> func) => _columnContainer.ApplyElementwise(func);

public PrimitiveDataFrameColumn<TResult> Apply<TResult>(Func<T?, TResult?> func) where TResult : unmanaged
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing method is named ApplyElementwise, but this is just Apply. Is there a reason for the discrepancy?

/cc @pgovind - thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I find ApplyElementwise wouldve been better as Apply too since the func parameter explains that it applies the function to each element.
Also, what Apply does is also similar to what pandas' apply method does, so that should feel good for developers with a python background.

Copy link
Contributor

@pgovind pgovind Jan 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zHaytam : I agree with @eerhardt that this should be named ApplyElementwise. We wouldn't need to go through API review then.

Also, how do you(and @eerhardt) feel about renaming to ApplyElementwise and merging the 2 overloads so we end up with 1 ApplyElementwise method that looks like this:

public PrimitiveDataFrameColumn<TResult> ApplyElementwise<TResult>(Func<T?, long, TResult?> func, bool inPlace = false) where TResult : unmanaged

2 reasons:

  1. Adding the inPlace parameter makes it more consistent with APIs like Clip, FillNulls etc that already take in an inPlace parameter.
  2. Having just 1 API is cleaner IMO.
    The minor problem I see is a statement such as intColumn.ApplyElementwise((value, index) => return 0.0f), inPlace: true). We would have to throw here because an int column cannot holds floats. Arguably it is ugly that we allow users to call this API and throw immediately? Thoughts?

This will only need minor changes I think. You'd just have to check if inPlace is set and set resultColumn to this or a new PrimitiveDataFrameColumn and everything else should just work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are separate discussions here, so I'll try to separate them:

The name Apply or ApplyElementwise.

We don't have Elementwise in every method name that does something element-wise. For example, Clip and FillNulls don't have Elementwise in their name. So does Apply need to? My personal opinion is that it is obvious the method acts on each element, so we should be able to just name it Apply. The reason we added Elementwise to some method names was so Equals and GreaterThan could return a collection of bools instead of a single bool.

Merging this into a single method

I think it makes sense to keep them separate overloads:

public PrimitiveDataFrameColumn<T> Apply(Func<T?, long, T?> func, bool inPlace = false)
public PrimitiveDataFrameColumn<TResult> Apply<TResult>(Func<T?, TResult?> func) where TResult : unmanaged

This solves the issue that if the T's don't match up, you can't do an inPlace operation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that sounds fine to me. I'll rename ApplyElementwise to Apply the next time I go through this file or @zHaytam, you can do it in this PR too if you want :). I'm approving the PR. Thanks for the work!!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me too! One question though, the current ApplyElementwise doesn't have an inPlace argument, should that be added?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zHaytam, since we're not touching the ApplyElementwise method, maybe not in this PR. I think this is good to go in as it is. We should add the inPlace option later.

@pgovind
Copy link
Contributor

pgovind commented Jan 13, 2020

Squash and merging this. Thank you @zHaytam for the patch!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants