-
-
Notifications
You must be signed in to change notification settings - Fork 480
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Branch Control Pt. 1 #4507
Branch Control Pt. 1 #4507
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks pretty decent, going to submit comments on all 4 PRs simultaneously. Only writing overall thoughts on this one.
First about the process for these PRs: it was a nice attempt to make this easier to review, but failed. The last one was actually separable, but the first 3 are unavoidably coupled. A better process for next time would be to do one of two approaches: 1) thin vertical slice with lots of TODOs, expanded in subsequent PRs to be wider, 2) start at the top or the bottom and sketch out interfaces, add layers as you. Or some combination of these two approaches can work too. Just give me less to look at and think about at a time.
Overall architecture: you're missing a separation of concerns between the sqle package / subpackages, (which ideally is limited to user-visible SQL schema objects like tables, indexes, etc.) and the logic / storage for these control tables. Needs to be separated. The fact that we store this information is something resembling tabular form is relevant when selecting from the user table, but otherwise that implementation detail should be hidden in a lower layer of doltcore.
MatchExpression: I found this abstraction difficult to understand, although it eventually made sense. Some more comments are in order. Also the way that various interfaces require or return slices of indexes into slices of these objects, rather than the slices of objects themselves, is needlessly confusing. All of the logic around matching should live in the library layer, separate from the SQL layer.
Access / namespace implementations: a lot of functionality that shouldn't be exported is. Think about the overall abstractions you want to vend here, what's an implementation detail as opposed to an interface. CheckAccess() / CheckCreate() etc. is the heart of the public interface for these things; the rest should be hidden away from other library code and the SQL layer.
This is good enough to get checked in and start iterating on so long as you are confident the checking code isn't actually running anywhere.
// Match takes the match expression collection, and returns a slice of which indices matched against the given string. | ||
// The given indices may be used to further reduce the match expression collection, which will also reduce the total | ||
// number of comparisons as they're narrowed down. | ||
func Match(matchExprCollection []MatchExpression, str string, collation sql.CollationID) []uint32 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Call this Filter and have it return []MatchExpression instead of indexes into the same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is a compelling performance reason to return indexes instead of actual elements from the input slice, you should spell it out.
I don't see one here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you need the index version for the use in access.go, make a separate method called FilterIndexes or something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There actually is a performance reason! To test, I completely stripped out []uint32
slices altogether, and reimplemented the functions returning []MatchExpression
instead. Performance was dramatically worse. Maybe there are some optimizations that Go uses when using a pure int slice that don't apply to a MatchExpression
slice, but I reverted it back to []uint32
. Even then, I only need the indexes anyway, which is exactly what I'm returning. Returning []MatchExpression
just means I'm discarding most of it to begin with. I extended the comment to reflect this.
Also returning []MatchExpression
is slower with pooling than returning []uint32
without pooling. Pooling []uint32
just makes the delta even larger.
if branchAwareSession == nil { | ||
return nil | ||
} | ||
StaticController.Access.RWMutex.RLock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Access to this types mutexes should be handled by methods of that type. Should be possible to handle inside access.Match, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite. I'm also using Match
inside of the table methods (Insert
, Delete
, etc.) which acquire a write lock, so acquiring a read lock in Match
would cause a deadlock. Same goes for GetIndex
. I moved some stuff around to make it work, but it ended up being far less straightforward (the removal of all defer Unlock()
calls), so I reverted the changes.
// Preallocate a slice that we will append matches to. The larger the initial testing set, the higher the | ||
// likelihood that there will be more matches, hence the larger the preallocated slice. | ||
matchSubset := matchExprPool.Get().([]MatchExpression)[:0] | ||
// We do a pass using the first rune over all expressions to get the subset that we'll be testing against |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? Why not let a test expression bail out early if there's no match?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Used to have different code there, and forgot to update the comment. That's been fixed!
|
||
// ParseExpression parses the given string expression into a slice of sort ints. Returns nil if the string is too long. | ||
// Assumes that the given string expression has already been folded. | ||
func ParseExpression(str string, collation sql.CollationID) []int32 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why doesn't this return a MatchExpression?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A MatchExpression
is two things: the list of sort orders (which are what runes become when passed through a collation), along with the MatchExpression
's index in its parent collection. ParseExpression
can't know the index that it's result is going to go to, so it doesn't make sense to return a full MatchExpression
. It just converts an expression to sort orders.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also added a bit more to the comment.
// obtained from a collation. Also contains its index in the table. MatchExpression contents are not meant to be | ||
// comparable to one another, therefore please use the index to compare equivalence. | ||
type MatchExpression struct { | ||
CollectionIndex uint32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this field represent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's index in the parent slice. In the comment it's the sentence Also contains its index in the table.
I added another comment to further clarify it. Basically, if you have a slice of 5 MatchExpression
s, the last one will have a CollectionIndex
of 4
, and the first one will have a CollectionIndex
of 0
, reflecting their actual index. The MatchExpressions
are modified as they're passed through (being value types, I'm using the slice to the start location which saves space), so they need to refer to their parent at the end.
|
||
// IsValid returns whether the match expression has any potential matches | ||
func (matchExpr MatchExpression) IsValid() bool { | ||
return matchExpr.CollectionIndex == math.MaxUint32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic seems backwards given the definition of invalidMatchExpression
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You were right! Another case of "logic was swapped, comment/name wasn't updated"
Replaced by #4528 |
This PR goes over the general
branch_control.go
primary file, along with the expression matcher.