An ApplyElementWise method that returns a column #2805

Closed
zHaytam opened this issue Dec 17, 2019 · 9 comments

@zHaytam zHaytam commented Dec 17, 2019

I'm trying out Microsoft.Data.Analysis and was looking for a method that applies a Func<,> and returns the result as a new column (or, perhaps less efficiently, as an array of values), just like in pandas.

Here's what I'm currently doing:

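public static PrimitiveDataFrameColumn<TResult> Apply<T, TResult>(this PrimitiveDataFrameColumn<T> column, 
    Func<T, TResult> func) 
    where T : unmanaged
    where TResult : unmanaged
{
    var resultColumn = new PrimitiveDataFrameColumn<TResult>(string.Empty, 0);

    // Enumerating the column yields T?; calling .Value on a null entry would throw,
    // so nulls are passed through to the result instead.
    foreach (T? value in column)
        resultColumn.Append(value.HasValue ? func(value.Value) : (TResult?)null);

    return resultColumn;
}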

Example usage:

var birthdayColumn = df["Birthday"] as PrimitiveDataFrameColumn<DateTime>;
var currentYear = DateTime.Now.Year;
df["Age"] = birthdayColumn.Apply(d => currentYear - d.Year);
df.PrettyPrint();

Thank you!

@pgovind pgovind commented Dec 18, 2019

Hey @zHaytam, have you looked at https://github.com/dotnet/corefxlab/blob/master/src/Microsoft.Data.Analysis/PrimitiveDataFrameColumn.cs#L488? The ApplyElementwise API does something very similar to what you want. The only difference is that it works in place. To do what you want, you can do:

df["Age"] = df["Birthday"].Clone().ApplyElementwise((d, index) => currentYear - d.Year);

@zHaytam zHaytam commented Dec 18, 2019

Hello @pgovind, the problem with ApplyElementwise is that the output type has to be the same as the input type. My Birthday column is of type DateTime, while I want the Age column to be int :/
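
To make the type constraint concrete, here is a minimal, library-free sketch. `InPlaceMap` and `Map` are hypothetical helpers that only mimic the shape of the two delegate types discussed in this thread; they are not part of Microsoft.Data.Analysis, and plain arrays of nullable values stand in for a column.

```csharp
using System;

static class ApplyShapes
{
    // Mimics an in-place element-wise API: the delegate must return T?,
    // so the element type cannot change (DateTime stays DateTime).
    public static void InPlaceMap<T>(T?[] values, Func<T?, long, T?> func)
        where T : unmanaged
    {
        for (long i = 0; i < values.Length; i++)
            values[i] = func(values[i], i);
    }

    // Mimics the requested API: a second type parameter lets the
    // result have a different element type (DateTime -> int).
    public static TResult?[] Map<T, TResult>(T?[] values, Func<T?, TResult?> func)
        where T : unmanaged
        where TResult : unmanaged
    {
        var result = new TResult?[values.Length];
        for (int i = 0; i < values.Length; i++)
            result[i] = func(values[i]);
        return result;
    }
}

static class Program
{
    static void Main()
    {
        DateTime?[] birthdays = { new DateTime(1990, 5, 1), null, new DateTime(2000, 1, 15) };
        int currentYear = 2019; // fixed year so the output is reproducible

        int?[] ages = ApplyShapes.Map<DateTime, int>(
            birthdays,
            d => d.HasValue ? currentYear - d.Value.Year : (int?)null);

        foreach (int? age in ages)
            Console.WriteLine(age?.ToString() ?? "null"); // prints 29, null, 19
    }
}
```

Passing the same DateTime-to-int lambda to `InPlaceMap<DateTime>` would not compile, because the delegate is required to return `DateTime?`.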

@pgovind pgovind commented Dec 18, 2019

Ah, my bad. I didn't see the return type. Yup, you're right. This is on my list of new APIs to add to PrimitiveDataFrameColumn. Would you be interested in putting up a PR perhaps? We could get this API in for the next preview :)

@zHaytam zHaytam commented Dec 18, 2019

I would gladly try. I'm pretty sure the example I gave is far from the best way to do it. I'm assuming it needs to be implemented directly in the container?

@pgovind pgovind commented Dec 18, 2019

Yup. I would define the API in PrimitiveDataFrameColumn, maybe create the resultColumn in the API, and then implement the loop inside PrimitiveColumnContainer.

@zHaytam zHaytam commented Dec 18, 2019

The same loop, using the IEnumerable? I thought I would need to use the buffers in the container directly. I understand how that works for setting a value in place (like ApplyElementwise), but not for creating a new column out of it :/

@pgovind pgovind commented Dec 18, 2019

Right, that's why we'd define the resultColumn outside. Something like this ought to work:

// Define this in PrimitiveDataFrameColumn.cs

        public PrimitiveDataFrameColumn<TResult> Apply<TResult>(Func<T?, TResult?> func)
            where TResult : unmanaged
        {
            // Create the result column up front with the same length as this column.
            PrimitiveDataFrameColumn<TResult> resultColumn = new PrimitiveDataFrameColumn<TResult>("Result", Length);
            _columnContainer.Apply(func, resultColumn._columnContainer);
            return resultColumn;
        }

// Define this in PrimitiveColumnContainer.cs
        public void Apply<TResult>(Func<T?, TResult?> func, PrimitiveColumnContainer<TResult> resultContainer)
            where TResult : unmanaged
        {
            for (int b = 0; b < Buffers.Count; b++)
            {
                // Source values and null bitmap for this buffer.
                ReadOnlyDataFrameBuffer<T> buffer = Buffers[b];
                DataFrameBuffer<T> mutableBuffer = DataFrameBuffer<T>.GetMutableBuffer(buffer);
                Buffers[b] = mutableBuffer;
                Span<T> span = mutableBuffer.Span;
                DataFrameBuffer<byte> mutableNullBitMapBuffer = DataFrameBuffer<byte>.GetMutableBuffer(NullBitMapBuffers[b]);
                NullBitMapBuffers[b] = mutableNullBitMapBuffer;
                Span<byte> nullBitMapSpan = mutableNullBitMapBuffer.Span;

                // Result values and null bitmap, obtained the same way as above.
                // We've assumed equal column lengths, so this should be straightforward.
                DataFrameBuffer<TResult> resultMutableBuffer = DataFrameBuffer<TResult>.GetMutableBuffer(resultContainer.Buffers[b]);
                resultContainer.Buffers[b] = resultMutableBuffer;
                Span<TResult> resultSpan = resultMutableBuffer.Span;
                DataFrameBuffer<byte> resultMutableNullBitMapBuffer = DataFrameBuffer<byte>.GetMutableBuffer(resultContainer.NullBitMapBuffers[b]);
                resultContainer.NullBitMapBuffers[b] = resultMutableNullBitMapBuffer;
                Span<byte> resultNullBitMapSpan = resultMutableNullBitMapBuffer.Span;

                for (int i = 0; i < span.Length; i++)
                {
                    // Pass null to func when the source slot is marked invalid.
                    bool isValid = IsValid(nullBitMapSpan, i);
                    TResult? value = func(isValid ? span[i] : default(T?));
                    // Store the value (or default) and record whether the result is null.
                    resultSpan[i] = value.GetValueOrDefault();
                    SetValidityBit(resultNullBitMapSpan, i, value != null);
                }
            }
        }
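
The inner loop above can be tried outside the library with a simplified, self-contained model. This is only a sketch under stated assumptions: the `NullableApply` class and its members are hypothetical stand-ins written for illustration, a single flat buffer replaces the container's list of buffers, and the bitmap uses the Arrow-style convention (least-significant bit first, 1 means non-null). It needs a runtime where `Span<T>` is available.

```csharp
using System;

static class NullableApply
{
    // Bit i of the bitmap is 1 when element i is non-null.
    public static bool IsValid(ReadOnlySpan<byte> bitmap, int index) =>
        (bitmap[index / 8] & (1 << (index % 8))) != 0;

    public static void SetValidityBit(Span<byte> bitmap, int index, bool valid)
    {
        byte mask = (byte)(1 << (index % 8));
        if (valid)
            bitmap[index / 8] |= mask;
        else
            bitmap[index / 8] &= unchecked((byte)~mask);
    }

    // Maps source values to a result of a different element type,
    // reading and writing nulls through the two bitmaps.
    public static void Apply<T, TResult>(
        ReadOnlySpan<T> source, ReadOnlySpan<byte> sourceBitmap,
        Span<TResult> result, Span<byte> resultBitmap,
        Func<T?, TResult?> func)
        where T : unmanaged
        where TResult : unmanaged
    {
        for (int i = 0; i < source.Length; i++)
        {
            T? input = IsValid(sourceBitmap, i) ? source[i] : (T?)null;
            TResult? value = func(input);
            result[i] = value.GetValueOrDefault();
            SetValidityBit(resultBitmap, i, value.HasValue);
        }
    }
}

static class Program
{
    static void Main()
    {
        DateTime[] birthdays = { new DateTime(1990, 5, 1), default, new DateTime(2000, 1, 15) };
        byte[] birthdayValidity = { 0b0000_0101 }; // element 1 is null
        int[] ages = new int[birthdays.Length];
        byte[] ageValidity = new byte[1];

        NullableApply.Apply<DateTime, int>(
            birthdays, birthdayValidity, ages, ageValidity,
            d => d.HasValue ? 2019 - d.Value.Year : (int?)null);

        for (int i = 0; i < ages.Length; i++)
            Console.WriteLine(NullableApply.IsValid(ageValidity, i) ? ages[i].ToString() : "null");
        // prints 29, null, 19
    }
}
```

Keeping the values and the validity bits in separate buffers is what lets the result hold `default` in a null slot while still reporting it as null.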

@zHaytam zHaytam commented Dec 23, 2019

I have created the PR if you have the time to look at it.

@pgovind pgovind commented Jan 24, 2020

Closed with #2807

@pgovind pgovind closed this Jan 24, 2020