Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vinai/refactor docs #1210

Merged
merged 30 commits into from Jan 25, 2021
Merged

Vinai/refactor docs #1210

merged 30 commits into from Jan 25, 2021

Conversation

VinaiRachakonda
Copy link
Contributor

@VinaiRachakonda VinaiRachakonda commented Jan 12, 2021

This pr refactors DocsReadWriter to simplify the interface. It then removes many of the methods in environment.go that are related to the drw interface.

@VinaiRachakonda VinaiRachakonda changed the title [WIP] Vinai/refactor docs Vinai/refactor docs Jan 12, 2021
Copy link
Member

@zachmu zachmu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good start, but has a long way to go. Keep on trucking.

if err != nil {
return nil, err
}
func resetDocs(ctx context.Context, dbData env.DbData, headRoot *doltdb.RootValue, staged *doltdb.RootValue, docDetails env.Docs) (newStgRoot *doltdb.RootValue, err error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a comment

PutDocsToStaged(ctx context.Context, docDetails []doltdb.DocDetails) (*doltdb.RootValue, error)
ResetWorkingDocsToStagedDocs(ctx context.Context) error
GetDocDetail(docName string) (doc doltdb.DocDetails, err error)
GetDocDetailOnDisk(docName string) (doc doltdb.DocDetails, err error)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just get rid of this and add a ...string param to GetDocsOnDisk

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't the return value still be an array then?

@@ -76,6 +76,16 @@ type RepoState struct {
Branches map[string]BranchConfig `json:"branches"`
}

type RootType int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RepoStateRef?

@reltuk have a better idea for a name or an alternative?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I bet you don't need this anymore once you fully commit to separating the logic into methods that update roots, vs. methods that update repo state

// appended to a docDetails slice. We return a tuple of tables, docDetails and error.
func GetTblsAndDocDetails(drw env.DocsReadWriter, tbls []string) (tables []string, docDetails []doltdb.DocDetails, err error) {
func GetTablesAndDocDetails(drw env.DocsReadWriter, tbls []string) (tables []string, docDetails []doltdb.DocDetails, err error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bad method. I had to read this three times and look at usages everywhere to figure out what it's doing and why.

Rename this: GetTablesOrDocs

Rename second param: tablesOrFiles

Have it return ([]string, Docs, err)

Change comment to:

Takes a slice of table or file names. Table names are returned as given. Valid doc names are read from disk and their name replace with the names of the dolt_docs system table in the input slice. Valid Docs are returned in the second return param.

Finally move this out of staged.go, it has nothing to do with staged values. A good home would be docs.go.

}

for i, doc := range docs {
doc, err = doltdb.AddNewerTextToDocFromTbl(ctx, docTbl, &sch, doc)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be a method on doltdb

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NewerText is a bad name. Get rid of NewerText from the DocDetails object all together. If we want to compare to versions of a doc, we should use two different DocDetails objects.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move DocDetails out of the doltdb package in general.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of the methods that manipulate docs in general should be in the root_val.go file

@@ -100,3 +103,145 @@ func hasDocFile(fs filesys.ReadWriteFS, file string) bool {
exists, isDir := fs.Exists(getDocFile(file))
return exists && !isDir
}

// WorkingRootWithDocs returns a copy of the working root that has been updated with the Dolt docs from the file system.
func WorkingRootWithDocs(ctx context.Context, dbData DbData) (*doltdb.RootValue, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've moved these out of environment.go (good) but there are still waaaay too many of them (bad).

There should be only one: UpdateRootWithDocs. It should take a root value and update it with whatever docs are on the file system, then return it.

Copy link
Member

@zachmu zachmu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already looking way, way better.

Another batch of comments. Getting closer now, over halfway to cleaning up this mess.

if docDetails == nil {
docDetails, _ = dEnv.GetAllValidDocDetails()
if currentDocDetails == nil {
currentDocDetails, _ = drw.GetDocsOnDisk()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to inspect and handle error here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually why do we need this entire block? Can't we enforce that we get the current docs before calling this? Then we don't need to pass a drw at all

if toTbl != nil {
sch1, _ := toTbl.GetSchema(ctx)
doc, _ = doltdb.AddNewerTextToDocFromTbl(ctx, toTbl, &sch1, doc)
for _, doc := range currentDocDetails {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error handling in this entire block is missing

LicenseFile = "../LICENSE.md"
)

type DocDetails struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably time to rename this to Doc.

And params of this type to doc / docs

{DocPk: doltdb.LicensePk, File: LicenseFile},
}

func GetDocFile(filename string) string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetDocFilePath

{DocPk: doltdb.LicensePk, File: LicenseFile},
}

func GetDocFile(filename string) string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need method doc

@@ -123,17 +125,17 @@ func getUpdatedWorkingAndStagedWithDocs(ctx context.Context, dEnv *env.DoltEnv,
root = staged
}

docs, err := dEnv.GetDocsWithNewerTextFromRoot(ctx, root, docDetails)
docs, err := env.GetDocsWithTextFromRoot(ctx, root, docDetails)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a method in the doltdocs package, takes a root and returns docs.

DocsInRoot

// UpdateRootWithDocs takes in a root value, a drw, and some docs and writes those docs to the dolt_docs table
// (perhaps creating it in the process). The table might not necessarily need to be created if there are no docs in the
// repo yet.
func UpdateRootWithDocs(ctx context.Context, dbData DbData, root *doltdb.RootValue, rootType RootType, docDetails doltdocs.Docs) (*doltdb.RootValue, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As in other comments: this method goes in doltdocs package

}
// GetDocsWithTextFromRoot returns Docs with the Text value(s) from the provided root. If docs are provided,
// only those docs will be retrieved and returned. Otherwise, all valid doc details are returned with the updated Text.
func GetDocsWithTextFromRoot(ctx context.Context, root *doltdb.RootValue, docs doltdocs.Docs) (doltdocs.Docs, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doltdocs package

func (dEnv *DoltEnv) GetAllValidDocDetails() (docs []doltdb.DocDetails, err error) {
docs = []doltdb.DocDetails{}
for _, doc := range *AllValidDocDetails {
func (dEnv *DoltEnv) GetAllValidDocDetails() (docs doltdocs.Docs, err error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method and the one below belong in doltdocs package

// See the License for the specific language governing permissions and
// limitations under the License.

package docsTable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put this in dtables with other persisted system tables (dolt_query_catalog), not in a new package

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocked on this because of import loop stemming from status table :/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with it being separate for now, not a huge deal

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But you should try again at the end once you do everything else, it might unknot itself.

Copy link
Member

@zachmu zachmu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking better with every revision. You're getting close, stay with me!

}

func printDocDiffs(ctx context.Context, dEnv *env.DoltEnv, fromTbl, toTbl *doltdb.Table, docDetails []doltdb.DocDetails) {
func printDocDiffs(ctx context.Context, fromTbl, toTbl *doltdb.Table, currentDocDetails doltdocs.Docs) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currentDocs


// DocDiff returns the added, modified and removed docs when comparing a root value with an other (newer) value. If the other value,
// is not provided, then we compare the docs on the root value to the docDetails provided.
func DocDiff(ctx context.Context, root *doltdb.RootValue, other *doltdb.RootValue, docDetails doltdocs.Docs) (added, modified, removed []string, err error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs


type docComparison struct {
CurrentText []byte
DocPk string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be the first field

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be DocName, not Pk

return docComparisons, nil
}

func getDocComparisonObjFromDocDetail(ctx context.Context, tbl *doltdb.Table, sch *schema.Schema, doc doltdocs.Doc) (docComparison, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newDocComparison

@@ -88,21 +88,21 @@ func (rvu RootValueUnreadable) Error() string {
}

// NewDocDiffs returns DocDiffs for Dolt Docs between two roots.
func NewDocDiffs(ctx context.Context, older *doltdb.RootValue, newer *doltdb.RootValue, docDetails []doltdb.DocDetails) (*DocDiffs, error) {
func NewDocDiffs(ctx context.Context, older *doltdb.RootValue, newer *doltdb.RootValue, docDetails doltdocs.Docs) (*DocDiffs, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs

// UpdateRootWithDocs takes in a root value, a drw, and some docs and writes those docs to the dolt_docs table
// (perhaps creating it in the process). The table might not necessarily need to be created if there are no docs in the
// repo yet.
func UpdateRootWithDocs(ctx context.Context, dbData DbData, root *doltdb.RootValue, rootType RootType, docDetails doltdocs.Docs) (*doltdb.RootValue, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong abstraction. You're doing too much here. We need two things:

  1. methods that update root values with new tables or doc contents
  2. methods that update the repo state of [working,staged,head] to root value hashes.

You're combining them here.

You don't need, and shouldn't have, methods that update repo state specific to docs contents. If you want to update docs persisted in the repo state, then

  1. modify a root value to have the dolt_docs table you want,
  2. write that thing to staged or head

This method should do 1) only, and it should go in the dolt_docs package.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method should take (ctx, root, docs) and return (root, error)

@@ -76,6 +76,16 @@ type RepoState struct {
Branches map[string]BranchConfig `json:"branches"`
}

type RootType int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I bet you don't need this anymore once you fully commit to separating the logic into methods that update roots, vs. methods that update repo state

// See the License for the specific language governing permissions and
// limitations under the License.

package docsTable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with it being separate for now, not a huge deal

// See the License for the specific language governing permissions and
// limitations under the License.

package docsTable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But you should try again at the end once you do everything else, it might unknot itself.

var DoltDocsSchema = schema.MustSchemaFromCols(doltDocsColumns)

// updateDocsTable takes in docTbl param and updates it with the value in docDetails. It returns the updated table.
func UpdateDocsTable(ctx context.Context, docTbl *doltdb.Table, docDetails doltdocs.Docs) (*doltdb.Table, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs

if err != nil {
return nil, nil, err
}
doc := docAr[0]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a better way to do this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, but you should do a bound check

@VinaiRachakonda VinaiRachakonda linked an issue Jan 22, 2021 that may be closed by this pull request
Copy link
Member

@zachmu zachmu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good, nice work.

The last big thing I want you to do before checking this in is to move the docs table / row iteration code out of the diff package and into the doltdocs package, and write a new differ algorithm based on two doltdocs.Docs. Then there will be no reason for anyone outside that package to care about how docs are stored in the dolt_docs table.

return nil, nil, nil, err
}

workingRoot, err = env.UpdateRootWithDocs(ctx, workingRoot, docs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be in the env package. Put it in the doltdocs or similar package.

@@ -0,0 +1,174 @@
// Copyright 2020 Dolthub, Inc.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well update this now

GoLand can automate for you

// DocsDiff returns the added, modified and removed docs when comparing a root value with an other (newer) value. If the other value,
// is not provided, then we compare the docs on the root value to the docs provided.
func DocsDiff(ctx context.Context, root *doltdb.RootValue, other *doltdb.RootValue, docs doltdocs.Docs) (added, modified, removed []string, err error) {
oldTbl, oldTblFound, err := root.GetTable(ctx, doltdb.DocTableName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic to get the docs table should maybe go into the doltdocs package, but not a big deal

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments on getDocComparisonsBtwnRoots, you shouldn't need to get any tables or schemas in this method.


type docComparison struct {
CurrentText []byte
DocPk string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be DocName, not Pk

if err != nil {
return nil, err
}
err = newRows.IterAll(ctx, func(key, val types.Value) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This entire loop should take place in doltdocs.

doltdocs.GetAllDocs(*rootValue) Docs

No need for this layer of the business logic to iterate over rows in the storage layer like this. Ask an abstraction to give you all the docs in a root value.

}

// GetDocTextFromTbl returns the Text field of a doc using the provided table and schema and primary key.
func GetDocTextFromTbl(ctx context.Context, tbl *doltdb.Table, sch *schema.Schema, docPk string) ([]byte, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

if err != nil {
return nil, nil, err
}
doc := docAr[0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, but you should do a bound check

if err != nil {
return err
}

if len(docs) > 0 {
working, err = env.UpdateRootWithDocs(ctx, working, docs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what i'm talking about, so much cleaner

if doc.DocPk == docName {
return true
}
// UpdateRootWithDocs takes in a root value, and some docs and writes those docs to the dolt_docs table
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely belongs in the doltdocs package

PutDocsToStaged(ctx context.Context, docDetails []doltdb.DocDetails) (*doltdb.RootValue, error)
ResetWorkingDocsToStagedDocs(ctx context.Context) error
GetDocDetail(docName string) (doc doltdb.DocDetails, err error)
GetDocsOnDisk(docNames ...string) (doltdocs.Docs, error)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Method doc for both these, describe params

@VinaiRachakonda VinaiRachakonda merged commit 3342b8d into master Jan 25, 2021
@Hydrocharged Hydrocharged deleted the vinai/refactor-docs branch February 19, 2021 11:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Refactor Docs methods out of environment.go
2 participants