Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Branch Control Pt. 1 #4507

Closed
wants to merge 2 commits into from
Closed

Branch Control Pt. 1 #4507

wants to merge 2 commits into from

Conversation

Hydrocharged
Copy link
Contributor

This PR goes over the general branch_control.go primary file, along with the expression matcher.

Copy link
Member

@zachmu zachmu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks pretty decent, going to submit comments on all 4 PRs simultaneously. Only writing overall thoughts on this one.

First about the process for these PRs: it was a nice attempt to make this easier to review, but failed. The last one was actually separable, but the first 3 are unavoidably coupled. A better process for next time would be to do one of two approaches: 1) thin vertical slice with lots of TODOs, expanded in subsequent PRs to be wider, 2) start at the top or the bottom and sketch out interfaces, add layers as you. Or some combination of these two approaches can work too. Just give me less to look at and think about at a time.

Overall architecture: you're missing a separation of concerns between the sqle package / subpackages, (which ideally is limited to user-visible SQL schema objects like tables, indexes, etc.) and the logic / storage for these control tables. Needs to be separated. The fact that we store this information is something resembling tabular form is relevant when selecting from the user table, but otherwise that implementation detail should be hidden in a lower layer of doltcore.

MatchExpression: I found this abstraction difficult to understand, although it eventually made sense. Some more comments are in order. Also the way that various interfaces require or return slices of indexes into slices of these objects, rather than the slices of objects themselves, is needlessly confusing. All of the logic around matching should live in the library layer, separate from the SQL layer.

Access / namespace implementations: a lot of functionality that shouldn't be exported is. Think about the overall abstractions you want to vend here, what's an implementation detail as opposed to an interface. CheckAccess() / CheckCreate() etc. is the heart of the public interface for these things; the rest should be hidden away from other library code and the SQL layer.

This is good enough to get checked in and start iterating on so long as you are confident the checking code isn't actually running anywhere.

// Match takes the match expression collection, and returns a slice of which indices matched against the given string.
// The given indices may be used to further reduce the match expression collection, which will also reduce the total
// number of comparisons as they're narrowed down.
func Match(matchExprCollection []MatchExpression, str string, collation sql.CollationID) []uint32 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call this Filter and have it return []MatchExpression instead of indexes into the same

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is a compelling performance reason to return indexes instead of actual elements from the input slice, you should spell it out.

I don't see one here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you need the index version for the use in access.go, make a separate method called FilterIndexes or something

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There actually is a performance reason! To test, I completely stripped out []uint32 slices altogether, and reimplemented the functions returning []MatchExpression instead. Performance was dramatically worse. Maybe there are some optimizations that Go uses when using a pure int slice that don't apply to a MatchExpression slice, but I reverted it back to []uint32. Even then, I only need the indexes anyway, which is exactly what I'm returning. Returning []MatchExpression just means I'm discarding most of it to begin with. I extended the comment to reflect this.

Also returning []MatchExpression is slower with pooling than returning []uint32 without pooling. Pooling []uint32 just makes the delta even larger.

if branchAwareSession == nil {
return nil
}
StaticController.Access.RWMutex.RLock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Access to this types mutexes should be handled by methods of that type. Should be possible to handle inside access.Match, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite. I'm also using Match inside of the table methods (Insert, Delete, etc.) which acquire a write lock, so acquiring a read lock in Match would cause a deadlock. Same goes for GetIndex. I moved some stuff around to make it work, but it ended up being far less straightforward (the removal of all defer Unlock() calls), so I reverted the changes.

// Preallocate a slice that we will append matches to. The larger the initial testing set, the higher the
// likelihood that there will be more matches, hence the larger the preallocated slice.
matchSubset := matchExprPool.Get().([]MatchExpression)[:0]
// We do a pass using the first rune over all expressions to get the subset that we'll be testing against
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? Why not let a test expression bail out early if there's no match?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used to have different code there, and forgot to update the comment. That's been fixed!


// ParseExpression parses the given string expression into a slice of sort ints. Returns nil if the string is too long.
// Assumes that the given string expression has already been folded.
func ParseExpression(str string, collation sql.CollationID) []int32 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why doesn't this return a MatchExpression?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A MatchExpression is two things: the list of sort orders (which are what runes become when passed through a collation), along with the MatchExpression's index in its parent collection. ParseExpression can't know the index that it's result is going to go to, so it doesn't make sense to return a full MatchExpression. It just converts an expression to sort orders.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also added a bit more to the comment.

// obtained from a collation. Also contains its index in the table. MatchExpression contents are not meant to be
// comparable to one another, therefore please use the index to compare equivalence.
type MatchExpression struct {
CollectionIndex uint32
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this field represent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's index in the parent slice. In the comment it's the sentence Also contains its index in the table. I added another comment to further clarify it. Basically, if you have a slice of 5 MatchExpressions, the last one will have a CollectionIndex of 4, and the first one will have a CollectionIndex of 0, reflecting their actual index. The MatchExpressions are modified as they're passed through (being value types, I'm using the slice to the start location which saves space), so they need to refer to their parent at the end.


// IsValid returns whether the match expression has any potential matches
func (matchExpr MatchExpression) IsValid() bool {
return matchExpr.CollectionIndex == math.MaxUint32
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic seems backwards given the definition of invalidMatchExpression

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You were right! Another case of "logic was swapped, comment/name wasn't updated"

@Hydrocharged
Copy link
Contributor Author

Replaced by #4528

@Hydrocharged Hydrocharged deleted the daylon/branch-control-1 branch October 13, 2022 12:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants