Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a static Shuffle method to Array and an instance method to List<T> #13993

Closed
Eirenarch opened this issue Jan 17, 2015 · 86 comments
Closed

Add a static Shuffle method to Array and an instance method to List<T> #13993

Eirenarch opened this issue Jan 17, 2015 · 86 comments
Assignees
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Collections
Milestone

Comments

@Eirenarch
Copy link

Consider adding a static Shuffle method to Array and an instance method to List

Motivation

Shuffling data is often required in various applications. .NET does not contain a method to shuffle an array which results in many people implementing it themselves. Implementing a Shuffle method is not really hard but it does require like 10 lines of code and people often wonder where to put these so they just prefer quick and dirty solutions like abusing sort methods with random sort key or even worse random comparison. These methods are far from ideal both because they depend on implementation details of the sorting methods and because they are less effective than the ideal O(N) implementation.

Proposed API

Methods on Array:

public static void Shuffle<T>(T[] array)
public static void Shuffle<T>(T[] array, int index, int length)
public static void Shuffle<T>(T[] array, Random rng)
public static void Shuffle<T>(T[] array, int index, int length, Random rng)

Methods on List

public void Shuffle()
public void Shuffle(Random rng)

Details

The Shuffle method on Array has an overload that can be used to shuffle just part of the array. This is in line with existing Array methods like Sort and Copy and can be useful when implementing Shuffle on List where only the elements of the List will be shuffled without shuffling the full capacity of the list. I also suggest adding an instance Shuffle method on List since this is the most common usage of such a method.

@sharwell
Copy link
Member

I'm not a particular fan of this for two reasons:

  1. "Shuffle" is not a well-defined operation.
  2. The proposed methods do not include overloads which allow the user to specify their own "Shuffle" behavior. This differs from other algorithm-type operations which leverage IComparer<T> and IEqualityComparer<T> to provide additional functionality to the user.

I see this as an application-specific operation which is far less commonly used than the other operations on Array and List<T>.

@terrajobst
Copy link
Member

@Eirenarch, can you explain what shuffle is supposed to do? I assume you want to randomly sort the contents of the array, i.e. shuffling it up?

@Eirenarch
Copy link
Author

In my opinion Shuffle is quite well defined as "uniformly randomize the array". There is no comparison in this case but it may make sense to add an overload accepting a Random object.

@Eirenarch
Copy link
Author

More precisely "randomly permute the array"

@AdamSpeight2008
Copy link
Contributor

Easily implementable via Extension method. Eg Exts.Random

@Eirenarch
Copy link
Author

@AdamSpeight2008 it is certainly true but as I said people prefer to simply use OrderBy or Sort than implement it or take a dependency on a package.

@AdamSpeight2008
Copy link
Contributor

@Eirenarch
Eg. shuffle = theArray.OrderBy(Function(x) Guid.NewGuid() ).ToArray
or

private System.Random rnd = new System.Random();

public static T[] Shuffle<T>( this IEnumerable<T> xs ) 
{
  return xs.OrderBy( x => rnd.NextDouble(); );
}

If it that easy to implement and also independent of type. Why should Array provide one?

@pdelvo
Copy link

pdelvo commented Jan 18, 2015

@AdamSpeight2008 This does not work. Have a look at this: http://bost.ocks.org/mike/algorithms/#shuffling

@AdamSpeight2008
Copy link
Contributor

@pdelvo I know that it is biased. That wasn't my point.

@Eirenarch
Copy link
Author

@AdamSpeight2008 no one wraps it in extension methods. People just do it when they need to shuffle. No one goes looking for a good place to put this extension method.

@jakesays-old
Copy link

I think 'no one' is a little too broad. This is definitely something I would add to my application extension class if I needed to shuffle an array. Not something that should be added to the framework.

@Eirenarch
Copy link
Author

I stand corrected. A lot of people do, but as evident by upvotes on various related questions on SO like this one http://stackoverflow.com/questions/108819/best-way-to-randomize-a-string-array-with-net and this one http://stackoverflow.com/questions/273313/randomize-a-listt-in-c-sharp a lot of people are OK with OrderBy based solution. I have used it too and I knew fully well that it is far from ideal. I don't think the fact that it is easy to add such a method precludes it from being added to the standard library. After all String.IsNullOrEmpty is even easier but it is still included.

I know it is not an argument in itself but I'd also like to note many languages choose to include a shuffle function in their standard libraries - C++, Java, Python. Honestly I cannot see a downside.

@AlexArchive
Copy link

Granted, this isn't absolutely pressing but I do think it would be a nice addition to the Api.

Personally I have written code like this on more than one occasion (reference):

Random rnd = new Random();
string[] MyRandomArray = MyArray.OrderBy(x => rnd.Next()).ToArray();

I would much prefer to leverage an Api like the one described here.

@AArnott
Copy link
Contributor

AArnott commented Jan 18, 2015

I'm not sure this will meet the bar of broad applicability sufficient for
inclusion into the core framework. I have written many apps and even more
libraries, and I have never had the need for such a method as this.
Specific types of apps such as media I could see it be quite popular. But I
don't think that's sufficient to put it into the core library.

On Sun, Jan 18, 2015, 2:39 AM ByteBlast notifications@github.com wrote:

This isn't absolutely pressing but I do think it would be a nice
addition to the Api.


Reply to this email directly or view it on GitHub
https://github.com/dotnet/corefx/issues/461#issuecomment-70403314.

@AlexArchive
Copy link

@AArnott Sincerely, what is the downside?

@davkean
Copy link
Member

davkean commented Jan 18, 2015

Even with media applications, this wouldn't be applicable. If you look at what modern media players do when they shuffle, is they weight the shuffle by number of plays, so that content that you play more often, is more likely to play again.

@Eirenarch
Copy link
Author

Media apps are not a good example. I would assume that they do not shuffle an array and play the songs in the array but rather choose a new song at random from the list every time. More likely applications include various games (like card games) and cases where something should be displayed differently every time like answers to questions in a test or the Windows Browser Choice screen. I do think there are a good number of applications.

@AArnott
Copy link
Contributor

AArnott commented Jan 18, 2015

All the more reason then that it shouldn't go into the core framework. When Shuffle is defined in different ways to different apps, it further reduces the applicability of the method.
What is the downside? The downside is that it's another method, and methods aren't free. Every API has a cost and must pay for itself. An API can usually pay for itself both because it's very useful and because it is broadly applicable (it multiplies its usefulness when you have both). But when you have just one of those dimensions (no matter how useful, if it isn't applicable to a great number of apps or libraries) it's not nearly as useful as other APIs you find in corefx and thus often shouldn't be added.

And there is a fully supported story for these useful but less applicable methods: other libraries.

@jakesays-old
Copy link

@Eirenarch I would argue that string.IsNullOrEmpty() is probably one of the most useful and used extensions around. Not sure it helps your argument much. And @AArnott is correct - every API does indeed have a cost, no matter how simple.

@Eirenarch
Copy link
Author

Well, I do believe that the cost (at least for the static versions) is small enough and the method is indeed useful enough (not String.IsNullOrEmpty useful but still useful) that it is worth the cost.

@colombod
Copy link
Member

Code will always have impact in load time and workspace size. To shuffle is a very personal choice when it comes to the definition of "random position" so whatever is offered will very likely never fulfil the need of the user. I don't see much relevance is a shuffle function on collections.

Fat classes with unused methods are just small devices and speed unfriendly.

If you had a class library with cool random generators that I could argue that such a library could also offer utilities to shuffle collections according to statistic distributions.

@terrajobst
Copy link
Member

I fully agree with the sentiment that @AArnott stated: an API starts with negative points.

How negative depends on how broadly used and complex the API already is. Adding another API to a WPF class is less of an issue than adding another API to a very common and broadly used type like Array. In particular, we need to be careful that we don't turn core types (such as Array and String) into a grab bag of potentially useful APIs. There are all sorts of considerations from code size, to complexity as to the exposed concept count in IntelliSense. On that note, it's already considered a problem that all Linq extensions show up on String.

As a rule of thumb, the more utilitarian an API is the more usage is required to justify it's existence (see this related API review discussion). A shuffle method seems to be quite specialized with at best one in a thousand apps using it. Truthfully, I don't think that makes the bar for Array.

What do others think? @KrzysztofCwalina @stephentoub?

@stephentoub
Copy link
Member

I've had a need here and there for such functionality in recent memory (e.g. in a Sudoku generator determining the order in which cells should attempt to be removed from a solved board, in a photomosaic app determining the order in which regions of tiles should be greedily matched, etc.), and the few times I've needed it, I've just written the five or six lines of code to accomplish it (half of which was a swap).

I don't think it's something that should be added to Array or List; it's too specialized and policy-specific for such core, broadly-applicable, and largely policy-free types. I'm not against having such functionality somewhere in the libraries, but not as part of those types. If, for example, there were a numerical library with a focus on randomization, probabilities, statistics, etc., it could very well make sense there (though if all I needed from the library was this one function, I'd probably still copy-and-paste the code in or re-write the five or six lines of code, rather than take a dependency on another library).

If we wanted to do something to help this case, I'd prefer to see us add a super simple Swap method, which would save half of the lines necessary for this, bringing it down to something like three lines, and also helping to eliminate boilerplate in lots of other code.

@Eirenarch
Copy link
Author

The arguments against adding the method in Array do make sense, maybe Array is not the right place to put these methods. Is there any other place that is still in Core where such method can go. Some place like the Enumerable class maybe?

@AdamSpeight2008
Copy link
Contributor

@Eirenarch Wrapping them as extension methods make them simple to use with LINQ

 // Pick 4 differently positioned items out of the list.
Enumerable.Range(1,100).RandomPick(4).ToArray;
 // Choice 4 items out of the list, same positioned item allowed..
Enumerable.Range(1,100).RandomTake(4).ToArray();  

SwapWith<T>

LeftHand.Contents.SwapWith( RightHand.Contents );

Also should there be two methods, one that mutates the source, and one that doesn't?

Source Mutating Sub Shuffled(Of T)(ByRef source As ICollection(Of T))
Source Non-Mutating ICollection Shuffled<T>(ICollection source)

var deck = new DeckOfCards()
deck.Shuffled();

scrabblebag = scrabbleBag.Shuffle();
tileRack = tileRack.Shuffle();

How would you shuffle an Immutable Collection?

@Eirenarch
Copy link
Author

RandomTake is actually a cool idea. Of course if the method is put in Enumerable then it will not mutate the collection. On the other hand I was not literally suggesting Enumerable. I was thinking of a similar class that could be used. For example Java has a static Collections class ( http://docs.oracle.com/javase/7/docs/api/java/util/Collections.html ) that contains this method. There is no equivalent in .NET but maybe some kind of utility class can accommodate this method.

@AdamSpeight2008
Copy link
Contributor

I already have see nuget Exts.Random. To get access to it all you need is.
using Exts.Random;

@Eirenarch
Copy link
Author

I understand. My point is that people would just go for the quick and dirty solution OrderBy(random) rather than get a NuGet package or writing a method. Empirical evidence points that this is what happens in practice although theoretically people should either implement it or get it from somewhere.

@AdamSpeight2008
Copy link
Contributor

You'll generally only have to write it once, in a library and import it when you need it.

@colombod
Copy link
Member

I suggest to have that method as extension instead of static and have also support of random generators or other facilities to drive the shuffling. This way it could also be used in scenarios that requires specific behaviours like weighted shuffling like has been pointed out before. What do you think?

On 19 Jan 2015, at 22:20, Adam Speight notifications@github.com wrote:

You'll generally only have to write it once, in a library and import it when you need it.


Reply to this email directly or view it on GitHub.

@leppie
Copy link
Contributor

leppie commented Mar 6, 2015

@nvivo: Given your usage examples, it should be provided in ASP.NET, not a framework library. Also, your examples just want random results, which you can do by ordering a list by random numbers (which is a side-effect of shuffling).

The fact that other languages has it, is also not a reason.

For something to be added to the BCL it either needs to be a core functionality used within the framework (more than twice) or be used frequently by users of the framework. Shuffle is neither.

I still dont see why this NEEDS to be in the BCL when it can be perfectly provided by a library.

@Eirenarch
Copy link
Author

@nvivo OK but where does an extension method to IList sit?

@nvivo
Copy link

nvivo commented Mar 6, 2015

@Eirenarch, that's the implementation part we need to discuss. =) I gave an idea here

@Eirenarch
Copy link
Author

@leppie it has been said and demonstrated (via Stack Overflow questions) that if people do not have this particular method ready in the standard library they do not implement it but do OrderBy random guid which is wrong. Of course the method can be part of ASP.NET but why create the same problem for WPF or WinRT devs?

@nvivo in my opinion this class would need more reasons to exist than the shuffle method. Otherwise I agree that it would be a fine place to put it.

@nvivo
Copy link

nvivo commented Mar 6, 2015

@leppie, your reasoning is funny. Shuffling should be part of ASP.NET framework, but not .NET framework. Because lists are only useful in ASP.NET...? Because all types of applications are done in ASP.NET only...? Because this is such a huge implementation that would cause a headache to most people doing other things... ?

I think this guy put it perfectly

The next version of the framework will almost certainly include at least 100k lines of code you will never use. Why this small algorithm that is already included everywhere bothers you so much?

There are a dozen tickets open to do things with reflection that I would never, ever, ever use, but I'm not there saying no one should implement them.

@nvivo
Copy link

nvivo commented Mar 6, 2015

@Eirenarch, I agree. It shouldn't be only for this. But there are other things that could be put there, as binary search, an stable sort (the only stable sort available today is on IEnumerable with Linq).

I know some work is being done to better support array segments in a way they would be interchangeable with arrays. Currenty, ArraySegment doesn't implement anything. Creating extensions like this would give support to every operation in these types.

I do some stuff with DNS caching where it would be useful to sort/shuffle only part of an array in place, and that would be handy.

Of course, this goes beyond this issue, but it's just a discussion. I don't think its the only way either.

@Eirenarch
Copy link
Author

If framework designers can justify the existence of this class then it is certainly the right place to put this method. Arrays implement IList if I recall correctly so the method can be used on arrays as well as lists.

@KrzysztofCwalina
Copy link
Member

Since it's not clear what the API should be like, what the RNG algorithm should be used, etc., why don't people who feel strongly about this create a community project, prototype some APIs, and try to make them successful. If they become successful being an out-of-band library, then the problem is solved. If the library won't be successful and you will be able to argue that the reason is that it's not directly in CoreFx, we will reconsider add some (or all of these APIs) to CoreFx.

@KrzysztofCwalina
Copy link
Member

I can imagine such library to have not only various shuffle extension methods, but also various random number generator algorithms. I know people have asked about these in the past.

@KrzysztofCwalina
Copy link
Member

Also, if you prefer, I would be more than happy for you to do this library in corefxlab repo. Such library of RNGs and helpers is core enough to be interesting for corefxlab.

@cdrnet
Copy link

cdrnet commented Mar 6, 2015

Regarding RNGs, there are a couple existing .Net pseudo-RNG projects/libraries out there, e.g. Torschuetz and Math.NET - but none of them currently offering shuffling routines.

@nvivo
Copy link

nvivo commented Mar 6, 2015

@KrzysztofCwalina, Most people here (except you) in general agree how the API should look like.

It's a general purpose method, no other API is .NET is as detailed as you are saying. For example, you call Enumerable.OrderBy and don't pass which sort algorithm it should use. Shuffling is no different.

BCL stuff should be generic enough that they are useful to most people, not be the most complete implementation ever. That is the job of specific libraries.

If you wanna create a library, I think its a good idea, go for it. But it won't affect my desire to have in the BCL. Anyone can implement this today, the issue was opened because it was not included in the BCL yet.

I'm sure you don't use 100% of the framework. So, you can just add this method it to the list of the 90% you never touch already.

@KrzysztofCwalina
Copy link
Member

Nobody uses 100% of the framework, but many people use close to 100% of low level primitives (like ints, arrays, etc.). The bar for adding APIs to these low level primitives is super high.

@nvivo
Copy link

nvivo commented Mar 6, 2015

That's why I'm suggesing not adding it to the core types, but as extensions methods somewhere else. We already have a ton of those for Enumerables, I don't see why that would be unacceptable for shuffling.

@KrzysztofCwalina
Copy link
Member

If it's an extension somewhere else, then why do you want it in CoreFx repo? It seems like it can be an out of band library (at least initially). If you publish it on nugget, the marginal value of having it in CoreFx seems to be small. And to repeat, if your library is very successful, many people use it, and you still want to add it to CoreFx for some reason, we can discuss the reason and possibly add it. This is how our own team (CoreFx) works. We try to do as much as possible out of band, make the APIs pass the test of time, and only then we consider a tighter integration. It seems like a good model that lets libraries evolve and mature over time (without compatibility constrains of the CoreFx). Can you articulate reasons why you think it needs to be in CoreFx? If there are good reasons, I would like to hear them and possibly fix them.

@ghost
Copy link

ghost commented Mar 6, 2015

The fact that other languages has it, is also not a reason

Since all use-case examples above are rendered dubious and non-pertinent, what other motivation factors you suggest which qualify as a strong reason?

Trick question: Hey! there is a language feature which every widespread/reasonable language offers in standard library except C#. Do you think C# guys should add it too?

BTW, PHP also made this terrible mistake by adding shuffle in their standard library: http://php.net/manual/en/function.shuffle.php.

Going by your rational of "taking inspiration from other languages is folly", .NET framework wouldn't had came into existence in first place (hint: Java).

@nvivo
Copy link

nvivo commented Mar 6, 2015

@KrzysztofCwalina,

If it's an extension somewhere else, then why do you want it in CoreFx repo? [...] We try to do as much as possible out of band, make the APIs pass the test of time, and only then we consider a tighter integration.

This is an algorithm from the 30's that has been implemented basically everywhere since then! If that doesn't pass the test of time, I don't know what would. =)

Now, seriously. I know the original proposal was to modify the Array type, and that may be unacceptable, that's fine. But then, do you think its unacceptable to have this simple method anywhere else?

I know everything can be an user library, but that's not like someone is asking to add json parsing to System.String. Shuffle is a basic data manipulation thing, it's simple, it's used in a lot of places like music players, games, lots of commercial apps, even C++ has it as part of the standard library.

I'm not saying to put it on mscorlib. But corefx is quite big. I would like for list.Shuffle() it to be as easily accessible as a list.OrderBy(...) is. I would be ok with adding some System.Collections.Extensions library that is part of the framework and that provides more than that. What I don't want to do is add a RandomUserLibrary.ShuffeMethodExtension just for that. It makes no sense, if that is the option, we can just do what we are doing today and add a 6 line method somewhere in our code.

As for creating a library just to provide 10 ways of shuffling, I think this is over-engineering. This is a simple request, there's no need to solve every problem in the world.

@KrzysztofCwalina
Copy link
Member

One more thing I would add: if I were doing such library, I would not make it an extension method over IEnumerable, nor I would use yield, and probably not Lazy. All these things have too high overhead for some shuffling scenarios (like shuffling a small array of ints in a game engine in a place where you don't want to trigger a GC), and a low level CoreFx APIs need to consider many such exotic scenarios. Having a helper API that is Linq based is fine.
Also, the implementation should allow to shuffle arrays in place (without allocating) would probably be a hard requirement for many scenarios.

@nvivo
Copy link

nvivo commented Mar 7, 2015

A quick point in favor of adding this to .NET, instead of another library. This question has 313 votes for the question and 485 for the answer.

This shows this is not a problem few people have. In fact, it's more voted than 99.4% of all c# tagged questions.

@Eirenarch
Copy link
Author

Even more important is that many people preferred the second most upvoted answer despite being told that it is wrong. They did it knowingly because they didn't feel like adding the provided method to their code. I have done this too. This is why I claim that this method should be in the standard library. Otherwise people will just not bother. They will not bother downloading a new get package or looking for a place to put the Shuffle method.

@nvivo
Copy link

nvivo commented Mar 7, 2015

I agree. Proof of that is that there are already packages providing this, but many people won't use them. Because it's simpler to make OrderBy(new Guid()).

It seems to me @KrzysztofCwalina agreed on some LINQ style helper, but I'm not sure what this means. I'd still like to see this as a more unified change, to merge with dotnet/corefx#494 as I proposed in dotnet/corefx#1098.

@colombod
Copy link
Member

colombod commented Mar 7, 2015

Agree,
the point of the initiative is to make it easier to have a lightweight framework if the application is small or not using lots of stuff. That will also make it easier to make sure is as cross platform as possible without having to rewrite massive amount of stuff.

I agree that such extension should be separate nuggets but still belonging to a repository controlled, or supervised, by CLR team as they do with the CoreFX repo.

@terrajobst
Copy link
Member

Thanks for filing this request!

We reviewed the proposal and concluded that's not a feature that's a good fit for the BCL as:

  1. The scenario isn't important for enough people
  2. The implementation would have to be super opinionated, meaning has to implement a lot of policy where it's not clear whether that's the behavior the consumer would need/expect.

You can, however, submit concrete implementations in http://github.com/dotnet/corefxlabs. If enough people chime in, we may reconsider.

@VitaliiIsaenko
Copy link

VitaliiIsaenko commented Jul 16, 2019

Wow, I am very surprised that it came down to the discussion about the NEED of the method. Declaring that the method is not important because someone "worked on many projects and implemented many libraries and didn't have a need" can be considered as a valid argument :D
I'm surprised, but okay, let everyone reinvent the wheel themselves! So much fun!

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 1.0.0-rtm milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Jan 7, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Collections
Projects
None yet
Development

No branches or pull requests