
AppendToSlice - Issue #43 #46

Merged · 3 commits merged into ahmetb:v3-dev on Dec 17, 2016

Conversation

cleitonmarx
Collaborator

Closes #43

@ahmetb
Owner

ahmetb commented Dec 5, 2016

We had #45 to fix this already. I am not sure we should be replacing underlying slices.

@kalaninja
Collaborator

I suppose that replacing slices may result in unexpected behaviour.
And look at the method's signature: it expects an existing slice, so it seems really strange to me that someone would need to allocate a slice just to call this method and get a new slice back.

@cleitonmarx
Collaborator Author

cleitonmarx commented Dec 5, 2016

@ahmetalpbalkan As a ".NET LINQ-like" library, I think the behavior of ToSlice should be the same as the original LINQ ToList. In the LINQ ToList implementation, the return value is always a new instance.
A good use case is ordering a slice using LINQ (example found here):

using C#:

options = options.OrderBy(o => o.Position).ToList();

using Go:

From(options).OrderBy(func(o interface{}) interface{} {
	return o.(Option).Position
}).ToSlice(&options)

Currently, the Go code will not produce the same result as the C# code. My pull request tries to fix that.

@kalaninja
Collaborator

@cleitonmarx No argument, you are absolutely right about the .NET implementation. But Go is a different story: thanks to strong typing and generics, ToList() in C# can allocate memory efficiently, which is something we cannot achieve in Go.
So what I propose is to rename the current ToSlice method to AppendToSlice and introduce a new ToSlice that works with a fully prepared slice. So if we call it like this:

result := make([]Option, 10)
From(options).OrderBy(func(o interface{}) interface{} {
	return o.(Option).Position
}).ToSlice(&result)

ToSlice will iterate over the query and put each item at the corresponding index in the slice, so the first item goes to result[0], the second to result[1], etc.

@ahmetalpbalkan What do you think?

@cleitonmarx cleitonmarx force-pushed the AppendToSlice branch 4 times, most recently from 265e252 to 33db641 on December 5, 2016 09:56
@aisk

aisk commented Dec 5, 2016

ToSlice will iterate over the query and put each item at the corresponding index in the slice, so the first item goes to result[0], the second to result[1], etc.

Hi, I have a question: if we have a Where call before ToSlice, the final result count will be less than the original count, and I think this behavior will be surprising to most people.

Could we just create a new slice with a make call, using the result's actual count as make's second parameter? I think that would be efficient enough for most situations.

@kalaninja
Collaborator

@aisk The idea is that if you know what result to expect, you allocate the exact amount of memory that you need and call ToSlice():

result := make([]int, 3)
From([]int{1, 2, 3, 4, 5}).Where(func(i interface{}) bool {
	return i.(int) >= 3
}).ToSlice(&result)

If you don't, you append:

result := []bool{}
From(data).Where(func(i interface{}) bool {
	return i.(bool)
}).AppendToSlice(&result)

For @cleitonmarx's example with sorting,

From(options).OrderBy(func(o interface{}) interface{} {
	return o.(Option).Position
}).ToSlice(&options)

this will order the slice in place without allocating a new slice inside ToSlice.

@aisk

aisk commented Dec 5, 2016

@kalaninja OK, I see. But I think the method name ToSlice is misleading to some people (like me). ToSlice is a generic and common name, so I expect it to do all the magic for me, like allocating memory efficiently or replacing the current slice's elements and resizing it to the expected size.

So if we keep the two "unexpected" methods (their behavior is not so obvious to most people, because of the performance concerns), I think we should also use "unexpected" method names, like ReplaceToSlice or ReplaceInSlice or ReplaceElementsInSlice and AppendToSlice, and leave the ToSlice name unused until Go has some way to resolve the performance problem.

@ahmetb
Owner

ahmetb commented Dec 5, 2016

I think we should keep ToSlice() the same (it keeps appending), with some heavy documentation saying that it doesn't replace the slice. There might be people who have been using it since pre-v1.0.

AppendToSlice sounds a bit long to me and doesn't follow the ToXxx pattern. How about:

  • ToSlice(&v): appends
  • ToNewSlice(): []interface{} returns a new slice

@cleitonmarx
Collaborator Author

cleitonmarx commented Dec 5, 2016

@kalaninja, I think it will be hard, in some scenarios, to know the size of the resulting slice for the ToSlice() method. One question: if appending to a slice is so bad for memory allocation, why are we using the same pattern for ToSlice() currently?
Another implementation proposal could be something like this:

func (q Query) ToSlice(result interface{}) {
	res := reflect.ValueOf(result) // result must be a pointer to a slice
	typeResult := res.Type().Elem()

	// first pass: count the results so the slice can be sized exactly
	count := q.Count()
	slice := reflect.MakeSlice(typeResult, count, count)
	count = 0

	// second pass: fill the slice
	next := q.Iterate()
	for item, ok := next(); ok; item, ok = next() {
		slice.Index(count).Set(reflect.ValueOf(item))
		count++
	}

	res.Elem().Set(slice)
}

But in this case, I think it is not a good idea to iterate over the Query twice, once with Count() and once with q.Iterate().

@ahmetalpbalkan, I think the ToNewSlice() []interface{} pattern is an antipattern, because I would always need to convert the result to a typed slice. I think the pattern used in V2 with ToSlice() solved this existing issue (Results() []interface{}), and we need to keep using the same pattern.
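For illustration, a rough sketch of what a hypothetical ToNewSlice() would force on every caller (ToNewSlice is not an existing method; Option is the type from the example above):

// hypothetical: ToNewSlice() would return []interface{}
raw := From(options).OrderBy(func(o interface{}) interface{} {
	return o.(Option).Position
}).ToNewSlice()

// every caller then needs a manual conversion loop to get a typed slice back
ordered := make([]Option, len(raw))
for i, v := range raw {
	ordered[i] = v.(Option)
}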

@cleitonmarx
Collaborator Author

@kalaninja Actually, calling MakeSlice() with Query.Count() improved performance and memory allocation; check it out:

BenchmarkToSlice_UsingMakeSlice-4         	      20	  96989273 ns/op	24003928 B/op	 2000010 allocs/op
BenchmarkToSlice_UsingAppend-4            	      10	 155321097 ns/op	81418236 B/op	 2000047 allocs/op
BenchmarkAppendToSlice_UsingMakeSlice-4   	      20	  95919171 ns/op	24003841 B/op	 2000010 allocs/op

@kalaninja
Collaborator

kalaninja commented Dec 5, 2016

@ahmetalpbalkan I like your point about the naming pattern. Using conventions makes your code more understandable and easier to use.

I think it will be hard, in some scenarios, to know the size of the resulting slice for the ToSlice() method

@cleitonmarx What I propose is that you use ToSlice only if you know the size; if you don't, you allocate an empty slice and use AppendToSlice (or whatever its name might be).

One question: if appending to a slice is so bad for memory allocation, why are we using the same pattern for ToSlice() currently?

It's my fault, I didn't think about it while writing the code.

I think it is not a good idea to iterate over the Query twice, once with Count() and once with q.Iterate().

Absolutely agree. Iterating a query twice is something we cannot afford, since no iterator is guaranteed to be idempotent.

calling MakeSlice() with Query.Count() improved performance and memory allocation

I assume you have a very simple query, so the second iteration costs less than appending to the slice. Add some sorting and the situation will change.
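To illustrate the point, a rough benchmark sketch (not one from this PR; the data and predicate are made up, and the usual go-linq and testing imports are assumed):

func BenchmarkToSlice_WithSort(b *testing.B) {
	data := make([]int, 100000)
	for i := range data {
		data[i] = len(data) - i
	}
	var result []int
	for n := 0; n < b.N; n++ {
		// with a Count()-based ToSlice, the OrderBy work runs twice per call:
		// once inside Count() and again while filling the slice
		From(data).OrderBy(func(i interface{}) interface{} {
			return i.(int)
		}).ToSlice(&result)
	}
}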

@cleitonmarx
Collaborator Author

I assume you have a very simple query, so the second iteration costs less than appending to the slice. Add some sorting and the situation will change.

Yeah, you are totally right. I could see your point after putting more complexity into the benchmark.

@ahmetb
Owner

ahmetb commented Dec 5, 2016

The runtime performance assumption in ToSlice() is basically this: if I have an input of, say, 100,000 elements and I know that my query will match ~2,000 results, I can allocate a slice with cap=2000 and pass that to ToSlice(). If I get more results, append will double the capacity and copy N=cap elements over to the new slice. However, if you give it a small slice (e.g. cap=1) and have N=2000 results, it'll keep doubling (2, 4, 8, 16, ...) and will take log2(N) steps to reach cap=2048.
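A small sketch of that usage pattern (input and the predicate are made up for illustration):

// we expect roughly 2,000 of the 100,000 inputs to match, so we hand
// ToSlice() a slice with that capacity up front; append only has to
// double-and-copy if the estimate turns out to be too low
matched := make([]int, 0, 2000)
From(input).Where(func(i interface{}) bool {
	return i.(int)%50 == 0
}).ToSlice(&matched)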

@kalaninja
Collaborator

@ahmetalpbalkan That is actually a good idea. I can reimplement ToSlice with this logic: you allocate a slice of 2000 elements and call ToSlice; the method uses this memory, appends items if the initial capacity is not enough, and returns a subslice if the initial capacity is excessive. How about this?

@ahmetb
Owner

ahmetb commented Dec 7, 2016

@kalaninja isn't that already what happens today, if I give you a slice of len=0 cap=100 (make([]T, 0, 100))?

@cleitonmarx
Collaborator Author

@ahmetalpbalkan @kalaninja I pushed the optimization described by both of you, using the log2(N) doubling approach for memory allocation. Please review it.

@@ -409,11 +409,13 @@ func TestToMapBy(t *testing.T) {

func TestToSlice(t *testing.T) {
Owner

can we have tests for edge cases:

  1. output is nil slice.
  2. cap(out)>len(result): we get the same slice, just resized.
  3. cap(out)==len(result): we get the same slice, just resized.
  4. cap(out)<len(result): we get a new slice with len(out)=len(result) and cap(out)==??
  5. len(out)>0 and appending/growing doesn't lose the original elements in out.

Collaborator Author

@cleitonmarx cleitonmarx Dec 13, 2016

len(out)>0 and appending/growing doesn't lose the original elements in out.

@ahmetalpbalkan The behavior in this scenario will be: copy the result reusing len(out), and append if len(result) > len(out). In this case the output slice will be overwritten (and appended to, if necessary) with the result elements. If len(out) > len(result), the elements of result will be copied to the output and the output resliced to len(result). What do you think?
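A small illustration of the behavior described above (values made up):

out := []int{10, 20, 30} // len(out) == 3
From([]int{1, 2, 3, 4, 5}).ToSlice(&out)
// out == [1 2 3 4 5]: the three existing elements were overwritten
// and the remaining two results appended

out2 := []int{10, 20, 30, 40, 50} // len(out2) > len(result)
From([]int{1, 2}).ToSlice(&out2)
// out2 == [1 2]: results copied from index 0, then resliced to len(result)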

Owner

Hmm, the godoc for ToSlice doesn't really tell me that it starts overwriting the given slice from index=0. Is that the behavior? Perhaps we should rewrite the ToSlice godoc?

//output is nil slice
{[]int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, []int{}, []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, 16},
//cap(out)>len(result): we get the same slice, just resized.
{[]int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, make([]int, 0, 11), []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, 11},
Owner

@cleitonmarx you're not testing the "we get the same slice" part. :) That'd be a pointer comparison.

@@ -381,17 +381,42 @@ func (q Query) ToMapBy(
res.Elem().Set(m)
}

// ToSlice iterates over a collection and populates the result slice with elements.
// ToSlice iterates over a collection, copy to result slice the collection elements,
Owner

Let's fix this godoc. It's confusing to me, as it says:

  1. "iterates over a collection" ––do we mean "slice"? what are some other possible collections here?
  2. "copy to result slice the collection elements" ––sounds like a grammatical mistake here?
  3. "appends items if initial capacity is not enough" ––probably should sound more like: "If it has sufficient capacity, the destination is resliced to accommodate the new elements. If it does not, a new underlying array will be allocated." [source]
  4. we should still indicate that the slice is resliced to reflect the size of the result.
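One possible rewording along those lines (a sketch, not the text that was eventually committed):

// ToSlice iterates over the query and populates the given result slice,
// overwriting its existing elements starting from index 0. If the slice has
// sufficient capacity, it is reused and resliced to fit the result; otherwise
// a new underlying array is allocated. In either case the slice is resliced
// so that its length matches the number of elements produced by the query.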

Collaborator Author

"iterates over a collection" ––do we mean "slice"? what are some other possible collections here?

@ahmetalpbalkan I don't know, I'm just following the current documentation terminology, like here, here and here.

"copy to result slice the collection elements" ––sounds like a grammatical mistake here?

Sorry man, English is not my first language. It would be amazing to have more help with the documentation.

Owner

@ahmetb ahmetb Dec 14, 2016

@cleitonmarx no problem! None of us are native English speakers here. :) If you could check the box for "Allow edits from maintainers" on the right menu, I can help you with this patch.

Owner

@cleitonmarx I now addressed this. Please take a look at the new ToSlice() doc.
cc: @kalaninja as well.

Collaborator Author

Looks great to me @ahmetalpbalkan 👍, thanks for helping me with the documentation.

Owner

lol forgot to push the commit here. see e354c24.

@ahmetb
Owner

ahmetb commented Dec 13, 2016

BTW I'll edit this PR to change the base branch to v3-dev as well. Since it's a behavior change, we should bring this in as part of v3.

if cap(test.output) != test.wantedOutputCap {
t.Errorf("cap(output)=%d expected %d", cap(test.output), test.wantedOutputCap)
}
isNewSlice := initialOutputValue.Pointer() != modifiedOutputValue.Pointer()
Owner

you can just do this to get uintptr: (*reflect.SliceHeader)(unsafe.Pointer(&v)).Data
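For example (a sketch; the field names are assumed from the surrounding test, and the reflect and unsafe imports are assumed):

before := (*reflect.SliceHeader)(unsafe.Pointer(&test.output)).Data
From(test.input).ToSlice(&test.output)
after := (*reflect.SliceHeader)(unsafe.Pointer(&test.output)).Data
// a different Data pointer means ToSlice allocated a new backing array
isNewSlice := before != after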

@ahmetb
Owner

ahmetb commented Dec 14, 2016

@cleitonmarx just pushed 75d6687 reorganizing the test a little bit. While I'm at it, I updated some test case comments and added new test cases. I simplified the pointer equality check as well. PTAL

@cleitonmarx
Collaborator Author

@ahmetalpbalkan I like the new ToSlice test scenarios. Looking forward to merging these changes and starting to upgrade to go-linq v3 in my application. 🎉

@ahmetb ahmetb changed the base branch from master to v3-dev December 15, 2016 00:13
@ahmetb
Owner

ahmetb commented Dec 15, 2016

Changed base branch to v3-dev and rebased.

for item, ok := next(); ok; item, ok = next() {
slice = reflect.Append(slice, reflect.ValueOf(item))
if index >= slice.Len() {
slice = grow(slice, 1)
Owner

@cleitonmarx Sorry it's taking so many iterations, but I think I found a critical perf issue:

I know that we don't know the result size in advance, but it looks like this code is not super optimal.

  1. slice.Len() (which contains a type switch == slow) here
  2. grow(slice,1)

every time we add an element (N times). I think this algorithm is inefficient. How about the following one:

  • we read the cap of slice s and reslice s to its full capacity (cap(s)), so now len(s)==cap(s).
  • store this len := s.Len() in a variable.
  • we go into the for loop to add elements:
    • if index >= len, we grow the slice by a factor of 2 (meaning, double the slice)
      • if we do this, then reslice s to the new s.Cap() and update len = s.Cap() as well
    • else
      • just set the element at index
  • before returning, reslice s to 0:index

This way, at the end we'll still keep doubling the cap of the given slice; however, we're not increasing len one element at a time. In this algorithm, len(s) == cap(s) always holds before the last step, which reslices. By doing that:

  • you actually call grow() only a few times (logarithmic)
  • no more N calls to s.Len() as you store len/cap in variables as you grow
  • Perhaps grow() can also return the new capacity; that way you never actually need to call .Cap() in the algorithm above.

Can we implement ToSlice() like this?

That said, I'd ❤️ to see how the current algorithm and the one I proposed above compare in a benchmark.

Owner

Actually, we can do this in another follow-up PR. Let me know if you want to keep working on the current PR to implement the proposed algorithm, or if you want this to just get merged and you can open a PR later.

Collaborator Author

@cleitonmarx cleitonmarx Dec 15, 2016

@ahmetalpbalkan Let's do this!
This is my first attempt:

type reslicer struct {
	slice reflect.Value
	cap   int
	len   int
}

func (r *reslicer) Set(index int, value reflect.Value) {
	if index >= r.len {
		r.grow()
	}
	r.slice.Index(index).Set(value)
}

func (r *reslicer) grow() {
	if r.cap == 0 {
		r.cap = 1
	} else {
		r.cap *= 2
	}
	newSlice := reflect.MakeSlice(r.slice.Type(), r.cap, r.cap)
	reflect.Copy(newSlice, r.slice)
	r.len = r.cap
	r.slice = newSlice
}

func (r *reslicer) Reslice(size int) reflect.Value {
	return r.slice.Slice(0, size)
}

func newReslicer(slice reflect.Value) *reslicer {
	cap := slice.Cap()
	len := slice.Len()
	if cap > len {
		slice = slice.Slice(0, cap)
		len = cap
	}
	return &reslicer{
		slice: slice,
		cap:   cap,
		len:   len,
	}
}

func (q Query) ToSliceNew(v interface{}) {
	res := reflect.ValueOf(v)
	slice := newReslicer(reflect.Indirect(res))
	index := 0
	next := q.Iterate()
	for item, ok := next(); ok; item, ok = next() {
		slice.Set(index, reflect.ValueOf(item))
		index++
	}

	res.Elem().Set(slice.Reslice(index))
}

I used this code to benchmark:

x := make([]int, 0, 0)
Range(1, 1000).ToSlice(&x)

Scenario1: x := make([]int, 0, 0)

BenchmarkToSlice-4      	   10000	    140259 ns/op	   56520 B/op	    2017 allocs/op
BenchmarkToSliceNew-4   	   20000	     72395 ns/op	   24920 B/op	    1029 allocs/op

Scenario2: x := make([]int, 0, 1000)

BenchmarkToSlice-4      	   10000	    137851 ns/op	   48336 B/op	    2007 allocs/op
BenchmarkToSliceNew-4   	   20000	     67148 ns/op	   16416 B/op	    1009 allocs/op

Scenario3: x := make([]int, 1000, 1000)

BenchmarkToSlice-4      	   20000	     67890 ns/op	   16336 B/op	    1007 allocs/op
BenchmarkToSliceNew-4   	   20000	     67034 ns/op	   16384 B/op	    1008 allocs/op

Apparently, @ahmetalpbalkan, this new algorithm improves both the timing and the memory allocation. What do you think about the implementation?

Owner

@cleitonmarx that's great! Although we can probably do this without a reslicer type? I liked it better when all the code was in ToSlice and grow; now there are too many extra methods and types. Do you mind making it look like the old implementation (just two functions) and keeping track of len/cap in local variables? We don't need an extra struct to store those.
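A minimal sketch of that shape (an assumption of what it might look like, not the exact code that was pushed):

func (q Query) ToSlice(v interface{}) {
	res := reflect.ValueOf(v)
	slice := reflect.Indirect(res)

	// reslice up to the full capacity so elements can be written by index
	length := slice.Cap()
	slice = slice.Slice(0, length)

	index := 0
	next := q.Iterate()
	for item, ok := next(); ok; item, ok = next() {
		if index >= length {
			// grow by doubling (starting at 1), tracking len/cap in locals
			newCap := length * 2
			if newCap == 0 {
				newCap = 1
			}
			newSlice := reflect.MakeSlice(slice.Type(), newCap, newCap)
			reflect.Copy(newSlice, slice)
			slice, length = newSlice, newCap
		}
		slice.Index(index).Set(reflect.ValueOf(item))
		index++
	}

	// reslice down to the number of elements actually produced
	res.Elem().Set(slice.Slice(0, index))
}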

Collaborator Author

OK, it makes sense to me. I will try to send a commit with these changes tonight.

Owner

@ahmetb ahmetb Dec 16, 2016

@cleitonmarx I just pushed the changes I suggested above (1d3f5b0). (you can easily tell I'm enjoying free time between jobs 🤣) It passes the tests, let me know if it looks good to you.

This code is simpler to understand and performs well, as you also verified.

Collaborator Author

@ahmetalpbalkan Looks fine to me. I think we are ready to merge this PR.
@kalaninja It would be nice to have your thoughts about this change. :)

@ahmetb
Owner

ahmetb commented Dec 17, 2016

I'll simplify the commit history and merge.

cleitonmarx and others added 3 commits December 17, 2016 12:44
Prior to this, the ToSlice(&v) method used to just append to the
underlying slice. With this patch, we now overwrite the underlying
slice, starting from index 0.

Fixes ahmetb#46.
Signed-off-by: Ahmet Alp Balkan <ahmetalpbalkan@gmail.com>
Signed-off-by: Ahmet Alp Balkan <ahmetalpbalkan@gmail.com>
@ahmetb ahmetb merged commit 397245e into ahmetb:v3-dev Dec 17, 2016
@cleitonmarx cleitonmarx deleted the AppendToSlice branch December 18, 2016 00:57