-
Notifications
You must be signed in to change notification settings - Fork 225
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AppendToSlice - Issue #43 #46
Conversation
14fd4b4
to
6795e99
Compare
We had #45 to fix this already. I am not sure we should be replacing underlying slices. |
I suppose that replacing slices may result in an unexpected behaviour. |
@ahmetalpbalkan As a ".NET LINQ-like" library, I think that the behavior of ToSlice should be the same as the original LINQ ToList. In LINQ ToList implementation the return is always a new instance. using C#: options = options.OrderBy(o => o.Position).ToList(); using Go: From(options).OrderBy(func(o interface{}) interface{}{
return o.(Option).Position
}).ToSlice(&options) Currently, the code in Go will not have the same result of the C# code. My Pull Request tries to fix that. |
@cleitonmarx No arguing, you are absolutely right about the .net implementation. But, Go is a different story. Due to strong typing and generics result := make([]Option, 10)
From(options).OrderBy(func(o interface{}) interface{}{
return o.(Option).Position
}).ToSlice(&result)
@ahmetalpbalkan What do you think? |
265e252
to
33db641
Compare
Hi I have a question, if we have a Could we just create a new slice with |
@aisk The idea is that if you know what result to expect, you allocate the exact amount of memory that you need and call ToSlice() result := make([]int, 3)
From([]int{1,2,3,4,5}).Where(func(i interface{}) bool {
return i.(int) >= 3
}).ToSlice(&result) if you don't, you append result := []bool{}
From(data).Where(func(i interface{}) bool {
return i.(bool)
}).AppendToSlice(&result) For @cleitonmarx example with sorting From(options).OrderBy(func(o interface{}) interface{}{
return o.(Option).Position
}).ToSlice(&options) will make an in-place ordering without allocating a new slice inside |
@kalaninja Ok I know. But I think this method name So I think if we should keep the two "unexpected" method (they have the behavior that is not so strait to most people, because the performance case), I think we should use an also "unexpected" method name, like |
I think we should keep the
|
@kalaninja, I think that will be hard, in some scenarios, know the size of the resulting slice in func (q Query) ToSlice(result interface{}) {
res := reflect.ValueOf(result)
typeResult := res.Type().Elem()
count := q.Count()
slice := reflect.MakeSlice(typeResult, count, count)
count = 0
next := q.Iterate()
for item, ok := next(); ok; item, ok = next() {
slice.Index(count).Set(reflect.ValueOf(item))
count++
}
res.Elem().Set(slice)
} But in this case, I think is not a good idea Iterate over the Query twice using Count and q.Iterate(). @ahmetalpbalkan, I think this pattern |
@kalaninja Actually, calling
|
@ahmetalpbalkan I like your point about naming pattern. Using conventions makes you code more understandable and easier to use.
@cleitonmarx what I offer is that you use
It's my fault, I didn't think about it while writing the code.
Absolutely agree. Iterating a query two times is something we cannot afford since no iterator is guaranteed to be idempotent.
I assume that you have a very simple query. So that the second iteration is less consuming than slice appending. Add some sorting and the situation will change. |
Yeah, You are totally right. I could see your point putting more complexity on the benchmark. |
The runtime performance assumption in |
@ahmetalpbalkan That is actually a good idea. I can reimplement |
@kalaninja isn't that already what happens today, if I give you a slice of len=0 cap=100 ( |
@ahmetalpbalkan @kalaninja I pushed the optimization described by both of you using the lg(N,2) algorithm in memory allocation. Please review it. |
@@ -409,11 +409,13 @@ func TestToMapBy(t *testing.T) { | |||
|
|||
func TestToSlice(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we have tests for edge cases:
- output is nil slice.
- cap(out)>len(result): we get the same slice, just resized.
- cap(out)==len(result): we get the same slice, just resized.
- cap(out)<len(result): we get a new slice with len(out)=len(result) and cap(out)==??
- len(out)>0 and appending/growing doesn't lose the original elements in out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
len(out)>0 and appending/growing doesn't lose the original elements in out.
@ahmetalpbalkan the behavior in this scenario will be: copy the result reusing the len(out) and appending if len(result) > len(out). In this case output slice will be overwritten(and appended if necessary) with the result elements. If len(out) > len(result) the elements of result will be copied to the output and resized to len(result). What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm the godoc for ToSlice
doesn't really tell me that it starts overwriting the given slice from index=0. Is that the behavior? Perhaps we should rewrite ToSlice
godoc?
//output is nil slice | ||
{[]int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, []int{}, []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, 16}, | ||
//cap(out)>len(result): we get the same slice, just resized. | ||
{[]int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, make([]int, 0, 11), []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, 11}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cleitonmarx you're not testing the we get the same slice
part. :) that'd be a pointer comparison.
@@ -381,17 +381,42 @@ func (q Query) ToMapBy( | |||
res.Elem().Set(m) | |||
} | |||
|
|||
// ToSlice iterates over a collection and populates the result slice with elements. | |||
// ToSlice iterates over a collection, copy to result slice the collection elements, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's fix this godoc. It's confusing to me, as it says:
- "iterates over a collection" ––do we mean "slice"? what are some other possible collections here?
- "copy to result slice the collection elements" ––sounds like a grammatical mistake here?
- "appends items if initial capacity is not enough" ––probably should sound more like: "If it has sufficient capacity, the destination is resliced to accommodate the new elements. If it does not, a new underlying array will be allocated." [source]
- we should still indicate that the slice is resliced to reflect the size of the result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"iterates over a collection" ––do we mean "slice"? what are some other possible collections here?
@ahmetalpbalkan I don't know, I'm just following the current documentation terminology like here , here and here
"copy to result slice the collection elements" ––sounds like a grammatical mistake here?
Sorry man, English is not my first language. Will be amazing have more help with documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cleitonmarx no problem! None of us are native English speakers here. :) If you could check the box for "Allow edits from maintainers" on the right menu, I can help you with this patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cleitonmarx I now addreesed this. Please take a look at the new ToSlice()
doc.
cc: @kalaninja as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great for me @ahmetalpbalkan 👍 , thanks for helping me with documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol forgot to push the commit here. see e354c24.
BTW I'll edit this PR to change the base branch to |
if cap(test.output) != test.wantedOutputCap { | ||
t.Errorf("cap(output)=%d expected %d", cap(test.output), test.wantedOutputCap) | ||
} | ||
isNewSlice := initialOutputValue.Pointer() != modifiedOutputValue.Pointer() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can just do this to get uintptr: (*reflect.SliceHeader)(unsafe.Pointer(&v)).Data
@cleitonmarx just pushed 75d6687 reorganizing the test a little bit. While I'm at it, I updated some test case comments and added new test cases. I simplified the pointer equality check as well. PTAL |
@ahmetalpbalkan I liked the new ToSlice tests scenarios. Looking forward to merging this changes and start to upgrade go-linq v3 in my application. 🎉 |
Changed base branch to |
e354c24
to
702310f
Compare
for item, ok := next(); ok; item, ok = next() { | ||
slice = reflect.Append(slice, reflect.ValueOf(item)) | ||
if index >= slice.Len() { | ||
slice = grow(slice, 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cleitonmarx Sorry it's taking too many iterations but I think I found a critical perf issue:
I know that we don't know the result side in advance, but it looks like this code is not super optimal.
slice.Len()
(which contains a type switch == slow) heregrow(slice,1)
every time we add an element (N
times). I think this algorithm is inefficient. How about the following one:
- we read the cap of slice
s
, and reslice thes
to its full capacity (cap(s)
). so nowlen(s)==cap(s)
. - store this
len:=s.Len()
into a variable. - we go into the for loop and to add elements
- if
index >= len
, we grow the slice by a factor of 2 (meaning, double the slice)- if we do this, then reslice the
s
to the news.Cap()
and updatelen = s.Cap()
as well
- if we do this, then reslice the
- else
- just set the element at
index
- just set the element at
- if
- before returning, reslice s to
0:index
This way at the end, we'll still keep doubling the cap oft he given slice --however, we're not increasing len one at a time. in this algorithm, it will be always len(s) == cap(s)
before the last step which resizes. By doing that:
- you actually call
grow()
only a few times (logarithmic) - no more
N
calls tos.Len()
as you storelen
/cap
in variables as you grow - Perhaps
grow()
can also return the new capacity, this way you don't actually ever need to call.Cap()
in the algorithm above.
Can we implement ToSlice()
like this?
That said, I'd ❤️ to see how this current algorithm vs the one I proposed above behaves in a benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, we can do this in another follow-up PR. Let me know if you want to keep working on the current PR to implement the proposed algorithm, or if you want this to just get merged and you can open a PR later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ahmetalpbalkan Let's do this!
This is my first attempt:
type reslicer struct {
slice reflect.Value
cap int
len int
}
func (r *reslicer) Set(index int, value reflect.Value) {
if index >= r.len {
r.grow()
}
r.slice.Index(index).Set(value)
}
func (r *reslicer) grow() {
if r.cap == 0 {
r.cap = 1
} else {
r.cap *= 2
}
newSlice := reflect.MakeSlice(r.slice.Type(), r.cap, r.cap)
reflect.Copy(newSlice, r.slice)
r.len = r.cap
r.slice = newSlice
}
func (r *reslicer) Reslice(size int) reflect.Value {
return r.slice.Slice(0, size)
}
func newReslicer(slice reflect.Value) *reslicer {
cap := slice.Cap()
len := slice.Len()
if cap > len {
slice = slice.Slice(0, cap)
len = cap
}
return &reslicer{
slice: slice,
cap: cap,
len: len,
}
}
func (q Query) ToSliceNew(v interface{}) {
res := reflect.ValueOf(v)
slice := newReslicer(reflect.Indirect(res))
index := 0
next := q.Iterate()
for item, ok := next(); ok; item, ok = next() {
slice.Set(index, reflect.ValueOf(item))
index++
}
res.Elem().Set(slice.Reslice(index))
}
I used this code to benchmark:
x := make([]int, 0, 0)
Range(1, 1000).ToSlice(&x)
Scenario1: x := make([]int, 0, 0)
BenchmarkToSlice-4 10000 140259 ns/op 56520 B/op 2017 allocs/op
BenchmarkToSliceNew-4 20000 72395 ns/op 24920 B/op 1029 allocs/op
Scenario2: x := make([]int, 0, 1000)
BenchmarkToSlice-4 10000 137851 ns/op 48336 B/op 2007 allocs/op
BenchmarkToSliceNew-4 20000 67148 ns/op 16416 B/op 1009 allocs/op
Scenario3: x := make([]int, 1000, 1000)
BenchmarkToSlice-4 20000 67890 ns/op 16336 B/op 1007 allocs/op
BenchmarkToSliceNew-4 20000 67034 ns/op 16384 B/op 1008 allocs/op
Apparently @ahmetalpbalkan, this new algorithm improved the memory allocation time. What do you think about the implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cleitonmarx that's great! although we can probably do this without a reslicer
type? I used to like when all code was in ToSlice
and grow
, now it's too many extra methods and types. Do you mind making it look like the old implementation (just 2 methods) and keep track of len/cap in just local variables? We don't need an extra struct to store those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, It makes sense to me. I will try to send a commit with this changes tonight.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cleitonmarx I just pushed the changes I suggested above (1d3f5b0). (you can easily tell I'm enjoying free time between jobs 🤣) It passes the tests, let me know if it looks good to you.
This code is simpler to understand and performant as you also verified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ahmetalpbalkan Looks fine to me. I think we are ready to merge this PR.
@kalaninja It will be nice have your thoughts about this change. :)
I'll simplify the commit history and merge. |
Prior to this, ToSlice(&v) method used to just appends to the underlying slice. With this patch, we now overwrite the underlying slice, starting from index 0. Fixes ahmetb#46.
Signed-off-by: Ahmet Alp Balkan <ahmetalpbalkan@gmail.com>
Signed-off-by: Ahmet Alp Balkan <ahmetalpbalkan@gmail.com>
1d3f5b0
to
d385348
Compare
Closes #43