-
-
Notifications
You must be signed in to change notification settings - Fork 513
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cache table and schema indexes on schema address #7859
Conversation
#benchmark |
@max-hoffman DOLT
|
@max-hoffman DOLT
|
@max-hoffman DOLT
|
@max-hoffman DOLT
|
@max-hoffman DOLT
|
@max-hoffman DOLT
|
@max-hoffman DOLT
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General idea is sound, but I have some strong objections about coupling construction of a writer object to session cache management in this way. See comments.
Also it looks like benchmark results are kind of mixed? Do they need to be re-run?
} | ||
} | ||
|
||
type writeSessFunc func(nbf *types.NomsBinFormat, ws *doltdb.WorkingSet, aiTracker globalstate.AutoIncrementTracker, opts editor.Options) WriteSession |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be an exported type since it's in constructors
Also needs comment
) | ||
|
||
// SessionCache caches various pieces of expensive to compute information to speed up future lookups in the session. | ||
type SessionCache struct { | ||
indexes map[doltdb.DataCacheKey]map[string][]sql.Index | ||
tables map[doltdb.DataCacheKey]map[TableCacheKey]sql.Table | ||
// indexes is keyed by table schema |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Table schema's hash?
func (c *SessionCache) GetCachedWriterState(key doltdb.DataCacheKey) (*WriterState, bool) { | ||
c.mu.RLock() | ||
defer c.mu.RUnlock() | ||
if c.writers == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think you can safely omit this, reads from a nil map work fine
@@ -33,6 +33,7 @@ import ( | |||
"github.com/dolthub/dolt/go/libraries/doltcore/schema" | |||
"github.com/dolthub/dolt/go/libraries/doltcore/sqle/dsess" | |||
"github.com/dolthub/dolt/go/libraries/doltcore/sqle/sqlutil" | |||
"github.com/dolthub/dolt/go/libraries/doltcore/sqle/writer" | |||
"github.com/dolthub/dolt/go/libraries/doltcore/table/editor" | |||
config2 "github.com/dolthub/dolt/go/libraries/utils/config" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change this while you're at it
sqlSch, err := sqlutil.FromDoltSchema("", w.tableName.Name, sch) | ||
if err != nil { | ||
return err | ||
func (w *prollyTableWriter) Reset(ctx *sql.Context, sess *prollyWriteSession, tbl *doltdb.Table, sch schema.Schema) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really bad, we really don't want to couple session state management to correct writer behavior like this. If this needs to happen for correctness, then the control of information flow needs to be one way only. Either the session owns this information and the caches of it and is responsible for clearing them, or this writer only manages its own internal state on reset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I follow this. The writer schema state needs to live in the session to be cached between transactions. It's only used for writing, so it's going to be coupled to writers. It's exclusively a property of the table, so anywhere we try to write the table seems valid to check for the cached write state as long as the table is well-formed (its schema is consistent with its data). If we passed the latest table to GetTableWriter
would that alleviate your concerns? Then there's no ambiguity maybe, we're disregarding whatever is in the existing write session, replacing it with the schema values for the given table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So follow up on this, the preexisting coupling is probably worse than we originally thought. I can't add table parameter to these methods without breaking correctness. Certain writers rely on using the tables in prollyTableWriter.workingSet
rather than whatever is available from the calling scope. I don't really like the indirection, but I would at least need to rewrite all of the fulltext logic to make this not the case. It's possible it extends to other areas. All of the action happens in GetTableWriter
/Reset
and package cycles prevent me from moving the logic out of dsess
. The places where we have the context to create writers is the only place to manage the caching lifecycle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Punt I guess. This is at least a bit more self-contained now.
|
||
// GetTableWriter implemented WriteSession. | ||
func (s *prollyWriteSession) GetTableWriter(ctx *sql.Context, table doltdb.TableName, db string, setter SessionRootSetter) (TableWriter, error) { | ||
func (s *prollyWriteSession) GetTableWriter(ctx *sql.Context, table doltdb.TableName, db string, setter dsess.SessionRootSetter) (dsess.TableWriter, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above. This is really bad separation of concerns.
Looking around, it appears that this method is called in exactly one place where caching is critical: (t *WritableDoltTable) getTableEditor
. That's where this caching logic should be applied.
@@ -106,6 +123,44 @@ func (s *prollyWriteSession) GetTableWriter(ctx *sql.Context, table doltdb.Table | |||
return twr, nil | |||
} | |||
|
|||
func getWriterSchemas(ctx *sql.Context, table *doltdb.Table, tableName string) (*dsess.WriterState, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic belongs in a layer above as well (either tables.go, or in the SessionCache, either would be fine). Then pass this object into GetTableWriter
#benchmark |
@max-hoffman DOLT
|
@max-hoffman DOLT
|
@coffeegoddd DOLT
|
@max-hoffman DOLT
|
@max-hoffman DOLT
|
@max-hoffman DOLT
|
@max-hoffman DOLT
|
@coffeegoddd DOLT
|
sqlSch, err := sqlutil.FromDoltSchema("", w.tableName.Name, sch) | ||
if err != nil { | ||
return err | ||
func (w *prollyTableWriter) Reset(ctx *sql.Context, sess *prollyWriteSession, tbl *doltdb.Table, sch schema.Schema) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Punt I guess. This is at least a bit more self-contained now.
@coffeegoddd DOLT
|
@coffeegoddd DOLT
|
@coffeegoddd DOLT
|
The bulk of ~1ms read and write TPC-C queries benefit from caching table and index schemas, which have a lifecycle between schema migrations/alter statements/new table additions. This is in contrast to how we've typically cached objects using the root value hash, which is great for read-only workflows, but has a much shorter half-life.