Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Issue 10131 - To remove duplicates and keep order #5774

Closed
wants to merge 1 commit into from

Conversation

RazvanN7
Copy link
Collaborator

I'm not sure if this function should be added as an overload to uniq.

@dlang-bot
Copy link
Contributor

Thanks for your pull request, @RazvanN7! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.

Some tips to help speed things up:

  • smaller, focused PRs are easier to review than big ones

  • try not to mix up refactoring or style changes with bug fixes or feature enhancements

  • provide helpful commit messages explaining the rationale behind each change

Bear in mind that large or tricky changes may require multiple rounds of review and revision.

Please see CONTRIBUTING.md for more information.

Bugzilla references

Auto-close Bugzilla Description
10131 To remove duplicates and keep order

@ghost
Copy link

ghost commented Oct 10, 2017

deduplicate() ?

@RazvanN7
Copy link
Collaborator Author

@bbasile maybe it's just me, but deduplicate sounds very confusing.

@RazvanN7
Copy link
Collaborator Author

RazvanN7 commented Oct 10, 2017

@wilzbach why is dscanner failing? The code in the unittest is valid user code.

{
import std.algorithm : canFind;
T[] result;
foreach (T elem; s)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

first element of s can be inserted, always, unconditionally. (in the other else part of the static branch too)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would imply checking manually for an empty array, instead the foreach does that for you, plus the code is more readable this way

@ghost
Copy link

ghost commented Oct 10, 2017

@RazvanN7: about deduplicate(), no it's not confusing it's kinda of a valid english word that tells well what the function does and that's used in programming (1, 2 and probably much more)

}

///
@system unittest
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not safe to deduplicate an array of int ?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way the content here is quite horrid for a documentation unittest

if the $(D opEquals) operator is not overriden for classes, $(D noDupes) considers the class
pointer for duplication removal.
*/
T[] noDupes(T)(T[] s)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not auto ref ?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aw and inout ? Since deduplication doesn't happend in place ?

@@ -5185,3 +5185,100 @@ if (isRandomAccessRange!Range && hasLength!Range)
[1, 2, 0],
[2, 1, 0]]));
}
/*
Returns the array without any duplicates, keeping the initial order of appearance.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Params and Returns standard DDOC sections ?

else
{
bool[T] set;
T[] result;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't an Appender be possible here ? we know how many to reserve (top).

foreach (T elem; s)
if (elem !in set)
{
set[elem] = true;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this strategy is something really clever with struct that are postblitable. The postblit will run once in the AA and then a second time in the result.

@trikko
Copy link
Contributor

trikko commented Oct 10, 2017

I have a custom function for this:

auto distinct(Range)(Range r) if (isInputRange!Range)
{
   bool[ForeachType!Range] justSeen;
   return r.filter!(
      (k) {
         if (k in justSeen) return false;
         return justSeen[k] = true;
      }
   );
}

It works in a lazy way, so it works for infinite ranges too like:

roundRobin(sequence!"n"(0), sequence!"n"(0)).distinct.take(10).writeln;

and doesn't read the whole array if not needed.

@wilzbach
Copy link
Member

@wilzbach why is dscanner failing? The code in the unittest is valid user code.

Have a look at the error messages:

std/algorithm/iteration.d(5198:5)[warn]: Public declaration 'noDupes' is undocumented.
std/algorithm/iteration.d(5233:25)[warn]: Methods 'opCmp', 'toHash', 'opEquals', 'opCast', and/or 'toString' are non-const.
std/algorithm/iteration.d(5234:23)[warn]: Methods 'opCmp', 'toHash', 'opEquals', 'opCast', and/or 'toString' are non-const.
std/algorithm/iteration.d(5244:11)[warn]: 'B' has method 'toHash', but not 'opEquals'.
std/algorithm/iteration.d(5249:25)[warn]: Methods 'opCmp', 'toHash', 'opEquals', 'opCast', and/or 'toString' are non-const.
std/algorithm/iteration.d(5257:11)[warn]: 'C' has method 'opEquals', but not 'toHash'.
std/algorithm/iteration.d(5262:23)[warn]: Methods 'opCmp', 'toHash', 'opEquals', 'opCast', and/or 'toString' are non-const.

Regarding distinct

Btw I have something similar to @trikko's solution as well. However, as of now the paradigm is that std.algorithm should be @nogc in theory (not all functions are this in practice, but they could), so I have never considered suggesting it.
I would be in favor of considering to add such a utility function like distinct, but I think we should go for std.experimental.algorithm (or similar), so that we are flexible and can experiment with adding an allocator API to it.

Regarding this PR

FWIW there are at least these valid approaches:
A) sort.uniq ("Phobos out of the box")
B) filter + AA (e.g. by @trikko)

(A) is good if you want to be @nogc or don't have many duplicates, but (B) is the lazy and usually faster approach. Imho this PR combines both disadvantages of (A) and (B)

@wilzbach
Copy link
Member

@RazvanN7 please don't forget to add a link to your PR to the regarding Bugzilla issue, s.t. people in the future know about all work done and duplicate the work (I have added it for this one).

See also: dlang/dmd#7205

@RazvanN7
Copy link
Collaborator Author

@wilzbach

Regarding circleCi : since the compiler allows having to hash without opEquals why does dscanner complain about that? I admit that having a non-const toHash, opCmp etc. is most likely a mistake.

Other then that, I haven't thought about the @nogc aspect. I will put @trikko 's code into std/algorithm/experimental and hopefully that will close the bug.

Thanks for the extensive explanation. I'm pretty noobish when it comes to function annotations and your explanations are extremely helpful.

@RazvanN7 RazvanN7 reopened this Oct 11, 2017
@RazvanN7 RazvanN7 closed this Oct 11, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants