bh/set-algebra #416

Merged
merged 5 commits into from
Feb 25, 2020

bheni
Contributor

@bheni bheni commented Feb 23, 2020

Package setalgebra provides the ability to perform algebraic set operations on mathematical sets built directly on noms
types. Unlike standard sets in computer science, which define a finitely sized collection of unordered unique values,
sets in mathematics are defined as a well-defined collection of distinct objects. This can include infinitely sized
groupings such as the set of all real numbers greater than 0.

See https://en.wikipedia.org/wiki/Set_(mathematics)

There are 3 types of sets defined in this package: FiniteSet, Interval, and CompositeSet.

FiniteSet is your typical computer science set representing a finite number of unique objects stored in a map. An
example would be the set of strings {"red","blue","green"}, or the set of numbers {5, 73, 127}.

Interval is a set which can be written as an inequality such as {n | n > 0} (set of all numbers n such that n > 0) or a
chained comparison {n | 0.0 <= n <= 1.0} (set of all floating point values between 0.0 and 1.0, inclusive).

CompositeSet is a set which is made up of a FiniteSet and one or more non-overlapping intervals. For example,
{n | n < 0 or n > 100} (set of all numbers n below 0 or greater than 100) contains 2 non-overlapping intervals
and an empty finite set. Alternatively, {n | n < 0 or n in {5,10,15}} (set of all numbers n below 0 or equal to 5, 10
or 15) would be represented by one Interval and a FiniteSet containing 5, 10, and 15.
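
As a rough illustration of how the three set types relate, here is a minimal membership-only sketch. It assumes float64 values instead of noms types, and the names Endpoint, Set, Contains, and belowZeroOrListed are hypothetical and not taken from the package.

```go
package main

import "fmt"

// Endpoint is a hypothetical interval bound; a nil *Endpoint means unbounded.
type Endpoint struct {
	Val       float64
	Inclusive bool
}

// Set is a hypothetical membership-only interface (assumption, not the package's API).
type Set interface {
	Contains(v float64) bool
}

// FiniteSet stores a finite number of unique values in a map.
type FiniteSet map[float64]struct{}

func (fs FiniteSet) Contains(v float64) bool {
	_, ok := fs[v]
	return ok
}

// Interval is a range with optional start and end bounds.
type Interval struct {
	Start, End *Endpoint
}

func (in Interval) Contains(v float64) bool {
	if s := in.Start; s != nil && (v < s.Val || (v == s.Val && !s.Inclusive)) {
		return false
	}
	if e := in.End; e != nil && (v > e.Val || (v == e.Val && !e.Inclusive)) {
		return false
	}
	return true
}

// CompositeSet is a FiniteSet plus one or more non-overlapping intervals.
type CompositeSet struct {
	Finite    FiniteSet
	Intervals []Interval
}

func (cs CompositeSet) Contains(v float64) bool {
	if cs.Finite.Contains(v) {
		return true
	}
	for _, in := range cs.Intervals {
		if in.Contains(v) {
			return true
		}
	}
	return false
}

// belowZeroOrListed builds {n | n < 0 or n in {5, 10, 15}}.
func belowZeroOrListed() CompositeSet {
	return CompositeSet{
		Finite:    FiniteSet{5: {}, 10: {}, 15: {}},
		Intervals: []Interval{{End: &Endpoint{Val: 0}}},
	}
}

func main() {
	s := belowZeroOrListed()
	fmt.Println(s.Contains(-3), s.Contains(10), s.Contains(7)) // true true false
}
```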

There are also 2 special sets defined in this package: EmptySet and UniversalSet.

The EmptySet is a set that has no values in it. It has the property that when unioned with any set X, X will be the
result, and if intersected with any set X, EmptySet will be returned.

The UniversalSet is the set containing all values. It has the property that when unioned with any set X, UniversalSet is
returned and when intersected with any set X, X will be returned.
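
The identities for the two special sets can be sketched as short-circuit checks that run before any real union or intersection. This is only an illustration: the marker interface, the closed float64 Interval standing in for "any set X", and the helper names unionSpecial and intersectSpecial are assumptions, not the package's API.

```go
package main

import "fmt"

// Set is a hypothetical marker interface for this sketch.
type Set interface{ isSet() }

type EmptySet struct{}
type UniversalSet struct{}

// Interval stands in for "any set X"; closed bounds only, for brevity.
type Interval struct{ Start, End float64 }

func (EmptySet) isSet()     {}
func (UniversalSet) isSet() {}
func (Interval) isSet()     {}

// classify reports whether s is one of the two special sets.
func classify(s Set) (empty, universal bool) {
	_, empty = s.(EmptySet)
	_, universal = s.(UniversalSet)
	return empty, universal
}

// unionSpecial applies the union identities. ok is false when neither
// operand is special and a real union still has to be computed.
func unionSpecial(a, b Set) (result Set, ok bool) {
	aEmpty, aUniv := classify(a)
	bEmpty, bUniv := classify(b)
	switch {
	case aUniv || bUniv:
		return UniversalSet{}, true // Universal ∪ X = Universal
	case aEmpty:
		return b, true // Empty ∪ X = X
	case bEmpty:
		return a, true
	}
	return nil, false
}

// intersectSpecial applies the intersection identities in the same way.
func intersectSpecial(a, b Set) (result Set, ok bool) {
	aEmpty, aUniv := classify(a)
	bEmpty, bUniv := classify(b)
	switch {
	case aEmpty || bEmpty:
		return EmptySet{}, true // Empty ∩ X = Empty
	case aUniv:
		return b, true // Universal ∩ X = X
	case bUniv:
		return a, true
	}
	return nil, false
}

func main() {
	x := Interval{Start: 0, End: 1}
	u, _ := unionSpecial(EmptySet{}, x)
	i, _ := intersectSpecial(UniversalSet{}, x)
	fmt.Printf("%T %T\n", u, i)
}
```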

@bheni bheni requested a review from zachmu February 23, 2020 17:44
Member

@zachmu zachmu left a comment


Looks great, but I think a lot of what you have here could be significantly simplified by writing the functions as composites of more primitive intersection and union operations. You already have a type that represents the union of a FiniteSet and N IntervalSets, and you could be using it a lot more liberally to simplify these functions. You have lots of special-case logic for each of these types that probably isn't necessary and seems like premature optimization. For example, I would replace most of the interval squashing code you have here with a single Canonicalize() method on the Set interface, which does nothing for the simple types but squashes the intervals in a CompositeSet. Even that might be overkill, though. None of these optimizations are necessary for correctness as far as I can tell, and the cases where any of them would give significantly better performance are probably rare. (Although Canonicalize() will probably make it easier to write tests, so maybe it's worth it just for that.)

go/libraries/doltcore/sqle/setalgebra/finite_set.go (outdated, resolved)
go/libraries/doltcore/sqle/setalgebra/finite_set.go (outdated, resolved)
go/libraries/doltcore/sqle/setalgebra/intersection.go (outdated, resolved)
go/libraries/doltcore/sqle/setalgebra/intersection.go (outdated, resolved)
// * EmptySet for an interval defined as: N < X < N
// * EmptySet for an interval where end < start
// * an unchanged interval will be returned for all other conditions
func simplifyInterval(in Interval) (Set, error) {

I would consider removing this, not sure it's pulling its weight. Seems like a premature optimization.
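
For context, the rules visible in the comment lines quoted above could look roughly like the sketch below. It assumes float64 bounds that are always present and omits the error return. The names Bound and simplifyIntervalSketch are hypothetical, and any rules from comment lines not shown in this excerpt are not attempted.

```go
package main

import "fmt"

// Set is a hypothetical marker interface for this sketch.
type Set interface{ isSet() }

type EmptySet struct{}

// Bound is a hypothetical endpoint with an inclusive/exclusive flag.
type Bound struct {
	Val       float64
	Inclusive bool
}

// Interval here always has both bounds (assumption made for brevity).
type Interval struct{ Start, End Bound }

func (EmptySet) isSet() {}
func (Interval) isSet() {}

// simplifyIntervalSketch applies only the rules quoted in the excerpt:
//   - EmptySet for N < X < N
//   - EmptySet when end < start
//   - the unchanged interval otherwise
func simplifyIntervalSketch(in Interval) Set {
	if in.End.Val < in.Start.Val {
		return EmptySet{}
	}
	if in.Start.Val == in.End.Val && !in.Start.Inclusive && !in.End.Inclusive {
		return EmptySet{} // N < X < N can never be satisfied
	}
	return in
}

func main() {
	degenerate := Interval{Start: Bound{Val: 5}, End: Bound{Val: 5}}
	normal := Interval{Start: Bound{Val: 0, Inclusive: true}, End: Bound{Val: 1, Inclusive: true}}
	fmt.Printf("%T %T\n", simplifyIntervalSketch(degenerate), simplifyIntervalSketch(normal))
}
```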

// unionWithMultipleIntervals takes an interval and a slice of intervals and returns a slice of intervals containing
// the minimum number of intervals required to represent the union. The src []Interval argument must be in sorted
// order and only contain non-overlapping intervals.
func unionWithMultipleIntervals(in Interval, src []Interval) ([]Interval, error) {

If this returned a Set (which would be a CompositeSet in practice) then you could use it as a primitive to compose other union operations
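
A minimal sketch of that suggestion, under simplifying assumptions: closed float64 intervals, no error return, and hypothetical names (unionIntervalSketch, plus a CompositeSet reduced to its interval slice). The single interval is folded into a sorted, non-overlapping slice and the result is returned as a Set.

```go
package main

import "fmt"

// Set is a hypothetical marker interface for this sketch.
type Set interface{ isSet() }

// Interval is a closed range [Start, End] (assumption made for brevity).
type Interval struct{ Start, End float64 }

// CompositeSet is reduced to its interval slice here; the FiniteSet part is omitted.
type CompositeSet struct{ Intervals []Interval }

func (Interval) isSet()     {}
func (CompositeSet) isSet() {}

// unionIntervalSketch merges in into src, which must be sorted by Start and
// contain only non-overlapping intervals. The result uses the minimum number
// of intervals and is wrapped in a CompositeSet so callers can compose it.
func unionIntervalSketch(in Interval, src []Interval) Set {
	out := make([]Interval, 0, len(src)+1)
	i := 0
	// Copy intervals that end strictly before in starts.
	for ; i < len(src) && src[i].End < in.Start; i++ {
		out = append(out, src[i])
	}
	// Absorb every interval that overlaps or touches in.
	for ; i < len(src) && src[i].Start <= in.End; i++ {
		if src[i].Start < in.Start {
			in.Start = src[i].Start
		}
		if src[i].End > in.End {
			in.End = src[i].End
		}
	}
	out = append(out, in)
	// Everything left starts after the merged interval ends.
	out = append(out, src[i:]...)
	return CompositeSet{Intervals: out}
}

func main() {
	src := []Interval{{0, 1}, {3, 4}, {10, 12}}
	res := unionIntervalSketch(Interval{0.5, 3.5}, src).(CompositeSet)
	fmt.Println(res.Intervals) // [{0 4} {10 12}]
}
```

Because the return type is Set, a union of two composite sets could be written as a loop that calls this function once per interval, which is the kind of composition the comment describes.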

@bheni bheni merged commit aef16f8 into bh/filter-commits2 Feb 25, 2020
@bheni bheni deleted the bh/set-algebra branch February 25, 2020 06:39
bheni pushed a commit that referenced this pull request Feb 25, 2020
* filter commits used by history table (#418)
* setalgebra package (#416)
* history table iterator optimizations (#417)