-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support KPermutation #10
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello Charlie,
Thank you for submitting this pull request implementing the KPermutationGenerator and unit tests that covers all the cases!
I added my comments there. Let me know what do you think?
- Dmytro
.simple(length) | ||
.stream() | ||
.flatMap(combination -> Generator.permutation(combination).simple().stream() ) | ||
.distinct() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line #83
I wonder if we should call this .distinct() method here? I think the KPermutationGenerator should return the permutations regardless of the duplicates in the original combinations. Users of the KPermutationGenerator can call .distinct() explicitly in their code only if it is really needed. The library will not always perform unnecessary distinct() operation and let users decide what to do with the duplicates. What do you think?
List<List<String>> permutations =
Generator.permutation("a", "a", "b")
.k(2)
.stream()
.distinct() // <- explicitly removes duplicates
.collect(toList());
If no duplicates present:
List<List<String>> permutations =
Generator.permutation("a", "b", "c")
.k(2)
.stream()
.collect(toList());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the input vector do not contains duplicated element, k-permutation return same result, with or without "distinct()" line
When the input vector contains duplicated element, I give 4 examples that return different result.
The result of Example 1 looks weird, it return 5 elements.
Example 1
Generator.permutation("a", "a", "b").k(2)
With distinct, return
["a", "a"],["a", "b"],["b", "a"]
Without distinct, return
['a', 'a'], ['a', 'b'], ['b', 'a'], ['a', 'b'], ['b', 'a']
Example 2
Generator.permutation("a", "a", "b").k(1)
With distinct, return
['a'], ['b']
Without distinct, return
['a'], ['a'], ['b']
Example 3
Generator.permutation("a", "a", "a").k(2)
With distinct, return
['a', 'a']
Without distinct, return
['a', 'a'], ['a', 'a'], ['a', 'a']
Example 4
Generator.permutation(Arrays.asList(1, 2, 2, 3)).k(3)
With distinct(), return
[1, 2, 2],[2, 1, 2],[2, 2, 1],[1, 2, 3],[1, 3, 2],[3, 1, 2],[3, 2, 1],[2, 3, 1],[2, 1, 3],[2, 2, 3],[2, 3, 2],[3, 2, 2],
Without distinct(), return
[1, 2, 2], [2, 1, 2], [2, 2, 1], [1, 2, 3], [1, 3, 2], [3, 1, 2], [3, 2, 1], [2, 3, 1], [2, 1, 3], [1, 2, 3], [1, 3, 2], [3, 1, 2], [3, 2, 1], [2, 3, 1], [2, 1, 3], [2, 2, 3], [2, 3, 2], [3, 2, 2]
Then I tried changed from
.flatMap(combination -> Generator.permutation(combination).simple().stream())
.distinct()
into
.flatMap(combination -> Generator.permutation(combination).simple(PermutationGenerator.TreatDuplicatesAs.IDENTICAL).stream())
I get ['a', 'a'], ['a', 'a'], ['a', 'a'],['a', 'a'], ['a', 'a'], ['a', 'a'] in example 3
So which style do you recommended?
Style 1:
.flatMap(combination -> Generator.permutation(combination).simple().stream())
.distinct()
Style 2:
.flatMap(combination -> Generator.permutation(combination).simple().stream())
Style 3:
.flatMap(combination -> Generator.permutation(combination).simple(PermutationGenerator.TreatDuplicatesAs.IDENTICAL).stream())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see two options here:
- Removing the
distinct()
from the generator (line #83). It will produce "normal" permutations if an input vector does not have duplicates (this is expected) and it will generate the "weird" permutations for input vectors that contain duplicates like the examples above (1-4). I think these "weird" results are expected too. This is because the input vectors don't contain unique values and the combinations are not unique as the result of that. Moreover I think the generator of the k-permutations should always return expected number of elements based on the formula: https://en.wikipedia.org/wiki/Permutation#k-permutations_of_n. The formula does not include any duplicates. So I think having duplicates in the input vector should be considered as a special case and treated accordingly by calling thedistinct()
only if it is really needed. For example, the first example will be "fixed" by adding.distinct()
before.collect()
somewhere in a client's code (not in the generator's code). Other examples can be "fixed" similarly. But user should explicitly do that rather than we silently include that.distinct()
into our generator and will always do it even if it is not needed (no duplicates in the input vector) potentially reducing the performance.
If we choose this option we will not need this test as it doesn't test the generator, but tests the .distinct()
call.
@Test
public void test_identical_k2_permutation() {
List<List<String>> permutations =
Generator.permutation("a", "a", "b")
.k(2)
.stream()
.distinct() // <- explicitly removes duplicates
.collect(toList());
assertThat(permutations).hasSize(3);
System.out.println("Identical k2-permutations of (a, a, b):");
permutations.stream().forEach(System.out::println);
assertThat(permutations.get(0)).containsExactly("a", "a");
assertThat(permutations.get(1)).containsExactly("a", "b");
assertThat(permutations.get(2)).containsExactly("b", "a");
}
- Keeping the
distinct()
as it is there right now. In this case we will perform unnecessary actions for the vectors of the unique elements (I expect these are the majority of the cases). If we choose that option I think we should highlight it somehow, so users are not surprised when our generator in the example 4 returns 12 permutations instead of the calculated 24 according to the formula https://en.wikipedia.org/wiki/Permutation#k-permutations_of_n. We have 12 because of the duplicates. We can add word 'Distinct' somewhere to the class name and the corresponding methods javadocs (for example,KPermutationDistinctGenerator
). Also as the real source of the duplicates is the combinations, we should call.distinct()
before.flatMap()
:
@Override
public Iterator<List<T>> iterator() {
return Generator.combination(originalVector)
.simple(length)
.stream()
.distinct()
.flatMap(combination -> Generator.permutation(combination).simple().stream() )
.iterator();
}
Personally I prefer the first option of removing the distinct()
from the generator, but I am ok if you think we should keep it there. In that case we should follow case 2 and rename the class to something like KPermutationDistinctGenerator. What do you prefer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I've forgotten that the Generator.permutation(combination).simple()
can automatically use the DuplicatedPermutationIterator
if a combination has duplicates. So I think we should follow that pattern and implement the option 2 as you originally suggested, but put .distinct()
before .flatMap()
. Users can still use a combination of both the Combination and Permutation generators if they need to ignore the duplicates https://github.com/dpaukov/combinatoricslib3#5-k-permutations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we provide an optional parameter into k() method:
-
without optional parameter:
Generator.permutation("a", "b", "c")
.k(2)
The implementation will check whether input parameter contains duplicated element. if duplication found, throw exception something like "Found duplicated element xxx in input element"
And the implemantion will not call "distinct()", because the useless distinct() call may consume lots of CPU and memory if the input vector is large. -
with optional parameter:
Generator.permutation("a", "a", "c")
.k(2, ALLOW_IDENTICAL_INPUT) //you may suggest the name of enum, I'm not good at English.
The implementation will ALLOW input parameter contains duplicated element. and distinct() will be called to remove duplicated result. -
with optional parameter:
Generator.permutation("a", "a", "c")
.k(2, ALLOW_IDENTICAL_OUTPUT) //you may suggest the name of enum, I'm not good at English.
The implementation will ALLOW input parameter contains duplicated element. and distinct() will NOT be called, duplicated result could be exist.
duplicated result could make sense under some context. for example
Generator.permutation("Boy", "Boy", "Boy", "Girl", "Girl")
.k(2, ALLOW_IDENTICAL_OUTPUT)
.filer( MUST_HAVE_BOY)
.count()
will answer the question: "3 boys and 2 girls pick up a team leader and a vice team leader, at least one boy should be pick, how many possible permutation?"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code updated, please review changed code.
@Test | ||
public void test_duplicated_k_permutation_iterator_remove_operation() { | ||
Iterator<List<Integer>> permutations = | ||
Generator.permutation(Arrays.asList(1, 2, 2)) | ||
.k(2) | ||
.iterator(); | ||
|
||
assertThat(permutations).isNotNull(); | ||
assertThat(permutations.hasNext()).isTrue(); | ||
|
||
// this method should throw a UnsupportedOperationException | ||
assertThrows(UnsupportedOperationException.class, permutations::remove); | ||
} | ||
|
||
@Test | ||
public void test_duplicated_k_permutations_iterator_toString() { | ||
Iterator<List<Integer>> permutations = | ||
Generator.permutation(Arrays.asList(1, 2, 2)) | ||
.k(2) | ||
.iterator(); | ||
|
||
assertThat(permutations).isNotNull(); | ||
assertThat(permutations.hasNext()).isTrue(); | ||
assertThat(permutations.next()).containsExactly(1, 2); | ||
assertThat(permutations.next()).containsExactly(2, 1); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test_duplicated_k_permutation_iterator_remove_operation()
test_duplicated_k_permutations_iterator_toString()
These tests should be also removed, they do not test the iterator methods here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@Test | ||
public void test_simple_k_permutations_iterator_toString() { | ||
Iterator<List<Integer>> permutations = | ||
Generator.permutation(Arrays.asList(1, 2, 3)) | ||
.k(2) | ||
.iterator(); | ||
|
||
assertThat(permutations).isNotNull(); | ||
assertThat(permutations.hasNext()).isTrue(); | ||
assertThat(permutations.next()).containsExactly(1, 2); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test_simple_k_permutations_iterator_toString()
This test should be removed, it does not test the toString() method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
@Test | ||
public void test_identical_k2_permutation() { | ||
//Todo more case |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line can be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
039fef9
to
520e939
Compare
Hi, Paukov: Any comment to the changed PR? |
Hello Charlie, I am sorry for missing the comments. It's been very busy month on my side. I agree with your idea to use a flag to let users decide whether they need to support duplicates or not. However I think the generator should produce all the k-permutations (without calling |
Agree with you the default behavior is allow duplicate, and some optional flag can remove duplicate. |
No description provided.