Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support KPermutation #10

Merged
merged 1 commit into from
Apr 11, 2021
Merged

Support KPermutation #10

merged 1 commit into from
Apr 11, 2021

Conversation

fengertao
Copy link
Contributor

No description provided.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.008%) to 99.744% when pulling f1eca99 on fengertao:develop into dfc2c23 on dpaukov:master.

@coveralls
Copy link

coveralls commented Feb 12, 2021

Coverage Status

Coverage decreased (-0.2%) to 99.513% when pulling 6ea8210 on fengertao:develop into dfc2c23 on dpaukov:master.

@dpaukov dpaukov linked an issue Feb 13, 2021 that may be closed by this pull request
Copy link
Owner

@dpaukov dpaukov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello Charlie,

Thank you for submitting this pull request implementing the KPermutationGenerator and unit tests that covers all the cases!

I added my comments there. Let me know what do you think?

  • Dmytro

.simple(length)
.stream()
.flatMap(combination -> Generator.permutation(combination).simple().stream() )
.distinct()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #83

I wonder if we should call this .distinct() method here? I think the KPermutationGenerator should return the permutations regardless of the duplicates in the original combinations. Users of the KPermutationGenerator can call .distinct() explicitly in their code only if it is really needed. The library will not always perform unnecessary distinct() operation and let users decide what to do with the duplicates. What do you think?

List<List<String>> permutations =
            Generator.permutation("a", "a", "b")
                    .k(2)
                    .stream()
                    .distinct()                    // <- explicitly removes duplicates
                    .collect(toList());

If no duplicates present:

List<List<String>> permutations =
            Generator.permutation("a", "b", "c")
                    .k(2)
                    .stream()
                    .collect(toList());

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the input vector do not contains duplicated element, k-permutation return same result, with or without "distinct()" line

When the input vector contains duplicated element, I give 4 examples that return different result.

The result of Example 1 looks weird, it return 5 elements.

Example 1
Generator.permutation("a", "a", "b").k(2)
With distinct, return
["a", "a"],["a", "b"],["b", "a"]
Without distinct, return
['a', 'a'], ['a', 'b'], ['b', 'a'], ['a', 'b'], ['b', 'a']

Example 2
Generator.permutation("a", "a", "b").k(1)
With distinct, return
['a'], ['b']
Without distinct, return
['a'], ['a'], ['b']

Example 3
Generator.permutation("a", "a", "a").k(2)
With distinct, return
['a', 'a']
Without distinct, return
['a', 'a'], ['a', 'a'], ['a', 'a']

Example 4
Generator.permutation(Arrays.asList(1, 2, 2, 3)).k(3)
With distinct(), return
[1, 2, 2],[2, 1, 2],[2, 2, 1],[1, 2, 3],[1, 3, 2],[3, 1, 2],[3, 2, 1],[2, 3, 1],[2, 1, 3],[2, 2, 3],[2, 3, 2],[3, 2, 2],
Without distinct(), return
[1, 2, 2], [2, 1, 2], [2, 2, 1], [1, 2, 3], [1, 3, 2], [3, 1, 2], [3, 2, 1], [2, 3, 1], [2, 1, 3], [1, 2, 3], [1, 3, 2], [3, 1, 2], [3, 2, 1], [2, 3, 1], [2, 1, 3], [2, 2, 3], [2, 3, 2], [3, 2, 2]

Then I tried changed from
.flatMap(combination -> Generator.permutation(combination).simple().stream())
.distinct()
into
.flatMap(combination -> Generator.permutation(combination).simple(PermutationGenerator.TreatDuplicatesAs.IDENTICAL).stream())

I get ['a', 'a'], ['a', 'a'], ['a', 'a'],['a', 'a'], ['a', 'a'], ['a', 'a'] in example 3

So which style do you recommended?
Style 1:
.flatMap(combination -> Generator.permutation(combination).simple().stream())
.distinct()
Style 2:
.flatMap(combination -> Generator.permutation(combination).simple().stream())
Style 3:
.flatMap(combination -> Generator.permutation(combination).simple(PermutationGenerator.TreatDuplicatesAs.IDENTICAL).stream())

Copy link
Owner

@dpaukov dpaukov Feb 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see two options here:

  1. Removing the distinct() from the generator (line #83). It will produce "normal" permutations if an input vector does not have duplicates (this is expected) and it will generate the "weird" permutations for input vectors that contain duplicates like the examples above (1-4). I think these "weird" results are expected too. This is because the input vectors don't contain unique values and the combinations are not unique as the result of that. Moreover I think the generator of the k-permutations should always return expected number of elements based on the formula: https://en.wikipedia.org/wiki/Permutation#k-permutations_of_n. The formula does not include any duplicates. So I think having duplicates in the input vector should be considered as a special case and treated accordingly by calling the distinct() only if it is really needed. For example, the first example will be "fixed" by adding .distinct() before .collect() somewhere in a client's code (not in the generator's code). Other examples can be "fixed" similarly. But user should explicitly do that rather than we silently include that .distinct() into our generator and will always do it even if it is not needed (no duplicates in the input vector) potentially reducing the performance.

If we choose this option we will not need this test as it doesn't test the generator, but tests the .distinct() call.

@Test
public void test_identical_k2_permutation() {
  List<List<String>> permutations =
          Generator.permutation("a", "a", "b")
                  .k(2)
                  .stream()
                  .distinct()              // <- explicitly removes duplicates
                  .collect(toList());

  assertThat(permutations).hasSize(3);
  System.out.println("Identical k2-permutations of (a, a, b):");
  permutations.stream().forEach(System.out::println);
  assertThat(permutations.get(0)).containsExactly("a", "a");
  assertThat(permutations.get(1)).containsExactly("a", "b");
  assertThat(permutations.get(2)).containsExactly("b", "a");
}
  1. Keeping the distinct() as it is there right now. In this case we will perform unnecessary actions for the vectors of the unique elements (I expect these are the majority of the cases). If we choose that option I think we should highlight it somehow, so users are not surprised when our generator in the example 4 returns 12 permutations instead of the calculated 24 according to the formula https://en.wikipedia.org/wiki/Permutation#k-permutations_of_n. We have 12 because of the duplicates. We can add word 'Distinct' somewhere to the class name and the corresponding methods javadocs (for example, KPermutationDistinctGenerator). Also as the real source of the duplicates is the combinations, we should call .distinct() before .flatMap():
  @Override
  public Iterator<List<T>> iterator() {
    return  Generator.combination(originalVector)
            .simple(length)
            .stream()
            .distinct()
            .flatMap(combination -> Generator.permutation(combination).simple().stream() )
            .iterator();
  }

Personally I prefer the first option of removing the distinct() from the generator, but I am ok if you think we should keep it there. In that case we should follow case 2 and rename the class to something like KPermutationDistinctGenerator. What do you prefer?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I've forgotten that the Generator.permutation(combination).simple() can automatically use the DuplicatedPermutationIterator if a combination has duplicates. So I think we should follow that pattern and implement the option 2 as you originally suggested, but put .distinct() before .flatMap(). Users can still use a combination of both the Combination and Permutation generators if they need to ignore the duplicates https://github.com/dpaukov/combinatoricslib3#5-k-permutations

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we provide an optional parameter into k() method:

  • without optional parameter:
    Generator.permutation("a", "b", "c")
    .k(2)
    The implementation will check whether input parameter contains duplicated element. if duplication found, throw exception something like "Found duplicated element xxx in input element"
    And the implemantion will not call "distinct()", because the useless distinct() call may consume lots of CPU and memory if the input vector is large.

  • with optional parameter:
    Generator.permutation("a", "a", "c")
    .k(2, ALLOW_IDENTICAL_INPUT) //you may suggest the name of enum, I'm not good at English.
    The implementation will ALLOW input parameter contains duplicated element. and distinct() will be called to remove duplicated result.

  • with optional parameter:
    Generator.permutation("a", "a", "c")
    .k(2, ALLOW_IDENTICAL_OUTPUT) //you may suggest the name of enum, I'm not good at English.
    The implementation will ALLOW input parameter contains duplicated element. and distinct() will NOT be called, duplicated result could be exist.
    duplicated result could make sense under some context. for example

Generator.permutation("Boy", "Boy", "Boy", "Girl", "Girl")
.k(2, ALLOW_IDENTICAL_OUTPUT)
.filer( MUST_HAVE_BOY)
.count()
will answer the question: "3 boys and 2 girls pick up a team leader and a vice team leader, at least one boy should be pick, how many possible permutation?"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code updated, please review changed code.

Comment on lines 275 to 300
@Test
public void test_duplicated_k_permutation_iterator_remove_operation() {
Iterator<List<Integer>> permutations =
Generator.permutation(Arrays.asList(1, 2, 2))
.k(2)
.iterator();

assertThat(permutations).isNotNull();
assertThat(permutations.hasNext()).isTrue();

// this method should throw a UnsupportedOperationException
assertThrows(UnsupportedOperationException.class, permutations::remove);
}

@Test
public void test_duplicated_k_permutations_iterator_toString() {
Iterator<List<Integer>> permutations =
Generator.permutation(Arrays.asList(1, 2, 2))
.k(2)
.iterator();

assertThat(permutations).isNotNull();
assertThat(permutations.hasNext()).isTrue();
assertThat(permutations.next()).containsExactly(1, 2);
assertThat(permutations.next()).containsExactly(2, 1);
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_duplicated_k_permutation_iterator_remove_operation()
test_duplicated_k_permutations_iterator_toString()

These tests should be also removed, they do not test the iterator methods here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 263 to 352
@Test
public void test_simple_k_permutations_iterator_toString() {
Iterator<List<Integer>> permutations =
Generator.permutation(Arrays.asList(1, 2, 3))
.k(2)
.iterator();

assertThat(permutations).isNotNull();
assertThat(permutations.hasNext()).isTrue();
assertThat(permutations.next()).containsExactly(1, 2);
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_simple_k_permutations_iterator_toString()

This test should be removed, it does not test the toString() method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


@Test
public void test_identical_k2_permutation() {
//Todo more case
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line can be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@fengertao fengertao force-pushed the develop branch 2 times, most recently from 039fef9 to 520e939 Compare February 13, 2021 13:12
@fengertao
Copy link
Contributor Author

Hi, Paukov:

Any comment to the changed PR?

@dpaukov
Copy link
Owner

dpaukov commented Apr 11, 2021

Hello Charlie,

I am sorry for missing the comments. It's been very busy month on my side.

I agree with your idea to use a flag to let users decide whether they need to support duplicates or not. However I think the generator should produce all the k-permutations (without calling distinct()) by default (if no flags are provided) even if an input vector contains duplicates. Also I believe we can reuse the existing enum TreatDuplicatesAs without introducing a new one that can confuse users. What do you think if we merge this PR and I will add another PR that removes your enum kTreatDuplicatesAs and changes the default implementation?

@fengertao
Copy link
Contributor Author

Agree with you the default behavior is allow duplicate, and some optional flag can remove duplicate.
Please merge and change default implementation as you purosed.

@dpaukov dpaukov merged commit 5efcaed into dpaukov:master Apr 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make k-permutation API more easy to use
3 participants