Skip to content
This repository was archived by the owner on Aug 2, 2023. It is now read-only.

Add support for window operations on columns#2827

Closed
pgovind wants to merge 10 commits intodotnet:masterfrom
pgovind:rolling
Closed

Add support for window operations on columns#2827
pgovind wants to merge 10 commits intodotnet:masterfrom
pgovind:rolling

Conversation

@pgovind
Copy link
Copy Markdown

@pgovind pgovind commented Feb 4, 2020

This patch also adds support for Rolling operations on columns.

Fixes 1 part of https://github.com/dotnet/corefxlab/issues/2815. We should add Expanding APIs too, but that can come in another PR.

Comment thread src/Microsoft.Data.Analysis/DataFrameColumnCollection.cs Outdated
Comment thread src/Microsoft.Data.Analysis/DataFrameWindow.Rolling.PrimitiveDataFrameColumn.cs Outdated
Comment thread src/Microsoft.Data.Analysis/DataFrameWindow.Rolling.PrimitiveDataFrameColumn.cs Outdated
Comment thread src/Microsoft.Data.Analysis/DataFrameWindow.Rolling.PrimitiveDataFrameColumn.cs Outdated
Comment thread src/Microsoft.Data.Analysis/PrimitiveColumnContainer.cs Outdated
Comment thread src/Microsoft.Data.Analysis/StringDataFrameColumn.cs Outdated
Comment thread src/Microsoft.Data.Analysis/DataFrameColumn.cs Outdated
Comment thread tests/Microsoft.Data.Analysis.Tests/DataFrameTests.cs Outdated
Comment thread src/Microsoft.Data.Analysis/DataFrameColumnCollection.cs Outdated
else if (typeof(T) == typeof(decimal))
{
throw new NotImplementedException();
return (IDoubleConverter<T>)new DecimalDoubleConverter();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a test for this change?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added one now. We didn't have 1 before because there was no way to reach this code.

@eerhardt
Copy link
Copy Markdown
Member

eerhardt commented Feb 7, 2020

using System;

copyright on all code files


Refers to: src/Microsoft.Data.Analysis/DataFrameWindow.Rolling.PrimitiveDataFrameColumn.cs:1 in 83b9e30. [](commit_id = 83b9e30, deletion_comment = False)

Comment thread src/Microsoft.Data.Analysis/StringDataFrameColumn.cs
Comment thread src/Microsoft.Data.Analysis/StringDataFrameColumn.cs
Comment thread src/Microsoft.Data.Analysis/PrimitiveDataFrameColumn.cs
Comment thread tests/Microsoft.Data.Analysis.Tests/DataFrameTests.cs Outdated
Comment thread src/Microsoft.Data.Analysis/DataFrameColumnWindow.cs Outdated
Comment thread src/Microsoft.Data.Analysis/ArrowStringDataFrameColumnRollingWindow.cs Outdated
@eerhardt
Copy link
Copy Markdown
Member

using System;

copyright on all code files


Refers to: src/Microsoft.Data.Analysis/ArrowStringDataFrameColumnRollingWindow.cs:1 in f3ab53b. [](commit_id = f3ab53b, deletion_comment = False)

Comment thread src/Microsoft.Data.Analysis/DataFrameColumnCollection.cs Outdated
Comment thread src/Microsoft.Data.Analysis/PrimitiveDataFrameColumnRollingWindow.cs Outdated
Comment thread tests/Microsoft.Data.Analysis.Tests/DataFrameTests.cs Outdated
TResult? value = func(isValid ? span[i] : default(T?));
resultSpan[i] = value.GetValueOrDefault();
SetValidityBit(resultNullBitMapSpan, i, value != null);
resultContainer.SetValidityBit(resultNullBitMapSpan, i, value != null);
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a bug. Fixed it and added a unit test for it now

return $"{Name}: {_columnContainer.ToString()}";
}

public new PrimitiveDataFrameColumnRollingWindow<T> Rolling(int windowSize)
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if inherit doc works here

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It won't work on new members. Only override and interface implementations.


In reply to: 379294117 [](ancestors = 379294117)

buffer.Add(default);
}
}
_nullCount = length;
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another subtle bug that I found and added unit tests for

@pgovind pgovind requested a review from eerhardt February 20, 2020 17:43
public class ArrowStringDataFrameColumnRollingWindow : DataFrameColumnWindow
{
private int _windowSize;
private ArrowStringDataFrameColumn _currentColumn;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) private readonly ArrowStringDataFrameColumn _currentColumn;

}
}

private void RollingColumnWindowVerifyElementwiseEquals<T>(PrimitiveDataFrameColumn<T> verify, PrimitiveDataFrameColumn<T> values)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't really "RollingColumnWindow" specific, is it? Can't this just be "VerifyElementwiseEquals(PrimitiveDataFrameColumn expected, PrimitiveDataFrameColumn actual)`?

public PrimitiveDataFrameColumn<T> GetPrimitiveColumn<T>(string name)
where T : unmanaged
{
int columnIndex = IndexOf(name);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have an indexer by column name, so we don't need to duplicate this logic every time:

int columnIndex = IndexOf(name);
            if (columnIndex == -1)
            {
                throw new ArgumentException(Strings.InvalidColumnName, nameof(name));
            }
            DataFrameColumn column = this[columnIndex];

ReadOnlyDataFrameBuffer<T> buffer = Buffers[b];
long prevLength = checked(Buffers[0].Length * b);
DataFrameBuffer<T> mutableBuffer = DataFrameBuffer<T>.GetMutableBuffer(buffer);
Buffers[b] = mutableBuffer;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we setting anything here? Or making a mutableBuffer on the current column container? This shouldn't be modifying the current container at all - it should just be modifying the resultContainer.

where T : unmanaged
{
private int _windowSize;
private PrimitiveDataFrameColumn<T> _currentColumn;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) readonly here too.

public partial class PrimitiveDataFrameColumn<T> : DataFrameColumn
where T : unmanaged
{
internal PrimitiveDataFrameColumn<U> ApplyRollingFunc<U>(Func<LinkedList<T?>, long, U?> func, int windowSize)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not put this in the file that contains the rest of the class? It is a bit confusing when a class is spread all over the place - and with other classes in the same file.

{
list.RemoveFirst();
}
list.AddLast(isValid ? span[i] : default(T?));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure a LinkedList is the best option here. Every time you call AddLast it will allocate a new node:

https://source.dot.net/#System.Collections/System/Collections/Generic/LinkedList.cs,146

Instead, maybe we could have a class that holds a ColumnContainer, a long index and an int length - and it implements IReadOnlyList<T>. We give it to the func as an IReadOnlyList. This would mean we only need to allocate 1 object for the whole Apply, and we just update the index and length as needed. When the user tries to get the value - we look it up on the column container.

@pgovind
Copy link
Copy Markdown
Author

pgovind commented Mar 4, 2020

Not sure if this will make it into 0.3.0. Closing this for now. We can re-open if we have time

@pgovind pgovind closed this Mar 4, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants