Add support for window operations on columns #2827
pgovind wants to merge 10 commits into dotnet:master
| else if (typeof(T) == typeof(decimal)) | ||
| { | ||
| throw new NotImplementedException(); | ||
| return (IDoubleConverter<T>)new DecimalDoubleConverter(); |
Is there a test for this change?
Added one now. We didn't have one before because there was no way to reach this code.
| TResult? value = func(isValid ? span[i] : default(T?)); | ||
| resultSpan[i] = value.GetValueOrDefault(); | ||
| SetValidityBit(resultNullBitMapSpan, i, value != null); | ||
| resultContainer.SetValidityBit(resultNullBitMapSpan, i, value != null); |
This was a bug. I fixed it and added a unit test for it.
| return $"{Name}: {_columnContainer.ToString()}"; | ||
| } | ||
| public new PrimitiveDataFrameColumnRollingWindow<T> Rolling(int windowSize) |
I wonder if inheritdoc works here.
It won't work on new members, only on overrides and interface implementations.
| buffer.Add(default); | ||
| } | ||
| } | ||
| _nullCount = length; |
This is another subtle bug that I found and added unit tests for.
| public class ArrowStringDataFrameColumnRollingWindow : DataFrameColumnWindow | ||
| { | ||
| private int _windowSize; | ||
| private ArrowStringDataFrameColumn _currentColumn; |
(nit) private readonly ArrowStringDataFrameColumn _currentColumn;
| } | ||
| } | ||
| private void RollingColumnWindowVerifyElementwiseEquals<T>(PrimitiveDataFrameColumn<T> verify, PrimitiveDataFrameColumn<T> values) |
This isn't really "RollingColumnWindow" specific, is it? Can't this just be "VerifyElementwiseEquals<T>(PrimitiveDataFrameColumn<T> expected, PrimitiveDataFrameColumn<T> actual)"?
| public PrimitiveDataFrameColumn<T> GetPrimitiveColumn<T>(string name) | ||
| where T : unmanaged | ||
| { | ||
| int columnIndex = IndexOf(name); |
We should have an indexer by column name, so we don't need to duplicate this logic every time:
    int columnIndex = IndexOf(name);
    if (columnIndex == -1)
    {
        throw new ArgumentException(Strings.InvalidColumnName, nameof(name));
    }
    DataFrameColumn column = this[columnIndex];
| ReadOnlyDataFrameBuffer<T> buffer = Buffers[b]; | ||
| long prevLength = checked(Buffers[0].Length * b); | ||
| DataFrameBuffer<T> mutableBuffer = DataFrameBuffer<T>.GetMutableBuffer(buffer); | ||
| Buffers[b] = mutableBuffer; |
Why are we setting anything here? Or making a mutableBuffer on the current column container? This shouldn't be modifying the current container at all - it should just be modifying the resultContainer.
| where T : unmanaged | ||
| { | ||
| private int _windowSize; | ||
| private PrimitiveDataFrameColumn<T> _currentColumn; |
| public partial class PrimitiveDataFrameColumn<T> : DataFrameColumn | ||
| where T : unmanaged | ||
| { | ||
| internal PrimitiveDataFrameColumn<U> ApplyRollingFunc<U>(Func<LinkedList<T?>, long, U?> func, int windowSize) |
Why not put this in the file that contains the rest of the class? It is a bit confusing when a class is spread all over the place - and with other classes in the same file.
| { | ||
| list.RemoveFirst(); | ||
| } | ||
| list.AddLast(isValid ? span[i] : default(T?)); |
I'm not sure a LinkedList is the best option here. Every time you call AddLast it will allocate a new node:
https://source.dot.net/#System.Collections/System/Collections/Generic/LinkedList.cs,146
Instead, maybe we could have a class that holds a ColumnContainer, a long index, and an int length, and that implements IReadOnlyList<T>. We give it to the func as an IReadOnlyList<T>. That way we only need to allocate one object for the whole Apply, and we just update the index and length as the window moves. When the user reads a value, we look it up in the column container.
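A minimal sketch of that idea, under stated assumptions: WindowView<T>, MoveTo, and RollingExample.ApplyRolling are hypothetical names, a plain T?[] stands in for the ColumnContainer so the example compiles on its own, and the func takes only the window (not the row index). Partial windows at the start of the column are passed to the func as-is, which may not match the behavior this PR intends.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical type: a reusable read-only view over a slice of the column.
// A T?[] stands in for the real ColumnContainer (assumption for illustration).
public sealed class WindowView<T> : IReadOnlyList<T?>
    where T : unmanaged
{
    private readonly T?[] _values;
    private long _start;
    private int _length;

    public WindowView(T?[] values) => _values = values;

    // Re-point the view instead of allocating a new collection per row.
    public void MoveTo(long start, int length)
    {
        if (start < 0 || length < 0 || start + length > _values.LongLength)
        {
            throw new ArgumentOutOfRangeException(nameof(start));
        }
        _start = start;
        _length = length;
    }

    public int Count => _length;

    public T? this[int index]
    {
        get
        {
            if (index < 0 || index >= _length)
            {
                throw new ArgumentOutOfRangeException(nameof(index));
            }
            return _values[_start + index];
        }
    }

    // Note: foreach over the interface still allocates an enumerator;
    // indexing with Count avoids that.
    public IEnumerator<T?> GetEnumerator()
    {
        for (int i = 0; i < _length; i++)
        {
            yield return _values[_start + i];
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class RollingExample
{
    // Calls func on the trailing window (at most windowSize values) ending at each row.
    public static U?[] ApplyRolling<T, U>(T?[] values, int windowSize, Func<IReadOnlyList<T?>, U?> func)
        where T : unmanaged
        where U : unmanaged
    {
        var view = new WindowView<T>(values); // the only window object allocated
        var result = new U?[values.Length];
        for (int i = 0; i < values.Length; i++)
        {
            int length = Math.Min(windowSize, i + 1);
            view.MoveTo(i - length + 1, length);
            result[i] = func(view);
        }
        return result;
    }
}
```

Compared with the LinkedList approach, the window here is never copied: moving it is two field writes, and reads go straight to the backing storage.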
Not sure if this will make it into 0.3.0. Closing this for now. We can re-open if we have time.
This patch also adds support for Rolling operations on columns. Fixes one part of https://github.com/dotnet/corefxlab/issues/2815. We should add Expanding APIs too, but that can come in another PR.