Disjoint Sets primitives #9145

Merged (9 commits) on Apr 24, 2024
52 changes: 52 additions & 0 deletions analyzers/src/SonarAnalyzer.Common/Common/DisjointSets.cs
@@ -0,0 +1,52 @@
/*
 * SonarAnalyzer for .NET
 * Copyright (C) 2015-2024 SonarSource SA
 * mailto: contact AT sonarsource DOT com
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 3 of the License, or (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public License
 * along with this program; if not, write to the Free Software Foundation,
 * Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
 */

namespace SonarAnalyzer.Common;

/// <summary>
/// Data structure for working with disjoint sets of strings, to perform union-find operations with equality semantics:
/// i.e. reflexivity, symmetry and transitivity.
///
/// See https://en.wikipedia.org/wiki/Disjoint-set_data_structure for an introduction to the data structure and
/// https://www.geeksforgeeks.org/introduction-to-disjoint-set-data-structure-or-union-find-algorithm/ for examples of
/// its use.
///
/// An example of use is to build undirected connected components of dependencies, where each node is the identifier.
///
/// Uses a dictionary of strings as a backing store. The dictionary represents a forest of trees, where each node is
/// a string and each tree is a set of nodes.
/// </summary>
public class DisjointSets
{
    private readonly Dictionary<string, string> parents;

    public DisjointSets(IEnumerable<string> elements) =>
        parents = elements.ToDictionary(x => x, x => x);

    public void Union(string from, string to) =>
        parents[FindRoot(from)] = FindRoot(to);

    public string FindRoot(string element) =>
        parents[element] is var root && root != element ? FindRoot(root) : root;

    // Set elements are sorted in ascending order. Sets are sorted in ascending order by their first element.
    public List<List<string>> GetAllSets() =>
Contributor

I also like the idea of making this a property named "DisjointSets".
@gregory-paidis-sonarsource wdyt?

I prefer it as a method, since "property" semantically means light-weight execution, and this method has a lot of logic.

On a more general note, what about not exposing any of the Union, FindRoot methods?
Isn't what we want from this class always the same?

We could make the class static and expose only an Execute, and return the list of lists. E.g.:

var groups = DisjointSets.Execute(myList);

What is the reason that the caller needs to care about the algorithm's implementation?
This is even more true now that we renamed it from DisjointSetPrimitives to DisjointSets and made it non-static.
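
To make the proposal above concrete, here is a minimal sketch of what such a static entry point could look like. The class name `DisjointSetsFacade`, the `unions` parameter and its tuple shape are assumptions made for illustration: the comment only mentions `Execute(myList)`, so this is not the API that was discussed or merged.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical static facade: name, signature and the "unions" parameter are assumptions.
public static class DisjointSetsFacade
{
    public static List<List<string>> Execute(
        IEnumerable<string> elements,
        IEnumerable<(string From, string To)> unions)
    {
        // Every element starts as the root of its own single-element tree.
        var parents = elements.ToDictionary(x => x, x => x);

        // Union: attach the root of "from" to the root of "to".
        foreach (var (from, to) in unions)
        {
            parents[FindRoot(parents, from)] = FindRoot(parents, to);
        }

        // Same ordering contract as GetAllSets in this PR:
        // elements sorted inside each set, sets sorted by their first element.
        return parents.Keys
            .GroupBy(x => FindRoot(parents, x))
            .Select(x => x.OrderBy(y => y).ToList())
            .OrderBy(x => x[0])
            .ToList();
    }

    // Kept private, so callers never see the underlying forest.
    private static string FindRoot(Dictionary<string, string> parents, string element)
    {
        var root = parents[element];
        return root == element ? root : FindRoot(parents, root);
    }
}
```

Under these assumptions, `DisjointSetsFacade.Execute(new[] { "1", "2", "3", "4" }, new[] { ("1", "2"), ("2", "3") })` would group "1", "2" and "3" together and leave "4" on its own. The trade-off is that the caller has to know every union up front, whereas the instance-based class lets the rule call `Union` incrementally.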

Contributor

I like @gregory-paidis-sonarsource's suggestion even more.

We discussed it offline, here is the wrap-up:

  • It was mentioned that maybe we should have left this functionality in the rule, as extracting it prematurely as a stand-alone data structure does not have that much value.
  • For now, to be efficient, let's keep this as it is and focus on merging the rule so that it can be validated on peach.

(Antonio, feel free to resolve this when you read it.)

Contributor Author

I would expect the person who opened the discussion thread to resolve it.
@mary-georgiou-sonarsource As a reviewer and initiator of the proposal to extract the three methods into a dedicated PR, please resolve the discussion if you agree with the conclusion above.

        [.. parents.GroupBy(x => FindRoot(x.Key), x => x.Key).Select(x => x.OrderBy(x => x).ToList()).OrderBy(x => x[0])];
Contributor

Suggested change
- [.. parents.GroupBy(x => FindRoot(x.Key), x => x.Key).Select(x => x.OrderBy(x => x).ToList()).OrderBy(x => x[0])];
+ [..parents.GroupBy(x => FindRoot(x.Key), x => x.Key).Select(x => x.OrderBy(x => x).ToList())];

I think the last OrderBy can be removed.

I think it is there to get deterministic ordering of the groups.
If this is something pertinent specifically to the too-many-responsibilities PR, we might want to do it after this method returns the value, in the analyzer.

If you think it's supposed to always happen, we should leave it here.

Contributor

That was my understanding as well, but then if you remove it, tests don't fail.
We should add a test that fails if it's removed :/
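
A possible explanation for why the existing tests keep passing: they all build the sets from elements that are already in ascending order, and `GroupBy` yields groups in the order in which each key first appears. The sketch below isolates that effect. The `GroupOrderingDemo` class and its `sortGroups` flag are hypothetical and only mirror the grouping expression in `GetAllSets`. It also relies on `Dictionary` enumerating in insertion order, which is commonly observed when nothing is removed but is documented as unspecified.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupOrderingDemo
{
    // Mirrors the grouping in GetAllSets; sortGroups toggles the final OrderBy under discussion.
    public static List<List<string>> Groups(IEnumerable<string> elements, bool sortGroups)
    {
        // No unions are performed: every element is its own root,
        // so each resulting group contains exactly one element.
        var parents = elements.ToDictionary(x => x, x => x);

        var groups = parents.Keys
            .GroupBy(x => parents[x])
            .Select(x => x.OrderBy(y => y).ToList());

        // With sortGroups == false the result follows the dictionary's enumeration order,
        // which is an implementation detail rather than a documented guarantee.
        return (sortGroups ? groups.OrderBy(x => x[0]) : groups).ToList();
    }
}
```

Calling `Groups(new[] { "3", "2", "1" }, sortGroups: true)` always starts with the group containing "1", while `sortGroups: false` leaves the order to the dictionary. A test that fails when the `OrderBy` is removed therefore needs input that is not already sorted.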

Contributor Author

The method returns a list of lists. Lists have an order, so if an order is present, it should either carry meaning or at least be deterministic. Returning a set of sets, only for the caller to go through them again and sort them, adds a performance cost and brings no benefit.

I think that prematurely generalizing this data structure to cover cases that don't exist yet is unfruitful.

> We should add a test that fails if it's removed :/

Yes, that's a good point. I have added a test that specifically tests for ordering.

Contributor

Please also add a comment on the last OrderBy to explain that it keeps the order of the lists deterministic.

}
82 changes: 82 additions & 0 deletions analyzers/tests/SonarAnalyzer.Test/Common/DisjointSetsTest.cs
@@ -0,0 +1,82 @@
namespace SonarAnalyzer.Test.Common;

[TestClass]
public class DisjointSetsTest
{
    private static readonly string[] FirstSixPositiveInts = Enumerable.Range(1, 6).Select(x => x.ToString()).ToArray();

    [TestMethod]
    public void FindRootAndUnion_AreConsistent()
    {
        var sets = new DisjointSets(FirstSixPositiveInts);
        foreach (var element in FirstSixPositiveInts)
        {
            sets.FindRoot(element).Should().Be(element);
        }

        sets.Union("1", "1");
        sets.FindRoot("1").Should().Be("1");                    // Reflexivity
        sets.Union("1", "2");
        sets.FindRoot("1").Should().Be(sets.FindRoot("2"));     // Correctness
        sets.Union("1", "2");
        sets.FindRoot("1").Should().Be(sets.FindRoot("2"));     // Idempotency
        sets.Union("2", "1");
        sets.FindRoot("1").Should().Be(sets.FindRoot("2"));     // Symmetry

        sets.FindRoot("3").Should().Be("3");
        sets.Union("2", "3");
        sets.FindRoot("2").Should().Be(sets.FindRoot("3"));
        sets.FindRoot("1").Should().Be(sets.FindRoot("3"));     // Transitivity
        sets.Union("3", "4");
        sets.FindRoot("1").Should().Be(sets.FindRoot("4"));     // Double transitivity
        sets.Union("4", "1");
        sets.FindRoot("4").Should().Be(sets.FindRoot("1"));     // Idempotency after transitivity
    }

    [TestMethod]
    public void GetAllSetsAndUnion_AreConsistent()
    {
        var sets = new DisjointSets(FirstSixPositiveInts);
        AssertSets([["1"], ["2"], ["3"], ["4"], ["5"], ["6"]], sets);   // Initial state
        sets.Union("1", "2");
        AssertSets([["1", "2"], ["3"], ["4"], ["5"], ["6"]], sets);     // Correctness
        sets.Union("1", "2");
        AssertSets([["1", "2"], ["3"], ["4"], ["5"], ["6"]], sets);     // Idempotency

        sets.Union("2", "3");
        AssertSets([["1", "2", "3"], ["4"], ["5"], ["6"]], sets);       // Transitivity
        sets.Union("1", "3");
        AssertSets([["1", "2", "3"], ["4"], ["5"], ["6"]], sets);       // Idempotency after transitivity

        sets.Union("4", "5");
        AssertSets([["1", "2", "3"], ["4", "5"], ["6"]], sets);         // Separated trees
        sets.Union("3", "4");
        AssertSets([["1", "2", "3", "4", "5"], ["6"]], sets);           // Merged trees
    }

    [TestMethod]
    public void GetAllSetsAndUnion_OfNestedTrees()
    {
        var sets = new DisjointSets(FirstSixPositiveInts);
        sets.Union("1", "2");
        sets.Union("3", "4");
        sets.Union("5", "6");
        AssertSets([["1", "2"], ["3", "4"], ["5", "6"]], sets);     // Merge of 1-height trees
        sets.Union("2", "3");
        AssertSets([["1", "2", "3", "4"], ["5", "6"]], sets);       // Merge of 2-height trees
        sets.Union("4", "5");
        AssertSets([["1", "2", "3", "4", "5", "6"]], sets);         // Merge of 1-height tree and 2-height tree
    }

    [TestMethod]
    public void GetAllSets_ReturnsSortedSets()
    {
        var sets = new DisjointSets(["3", "2", "1"]);
        AssertSets([["1"], ["2"], ["3"]], sets);
        sets.Union("3", "1");
        AssertSets([["1", "3"], ["2"]], sets);
    }

    private static void AssertSets(List<List<string>> expected, DisjointSets sets) =>
        sets.GetAllSets().Should().BeEquivalentTo(expected, options => options.WithStrictOrdering());
}