Fixed race when multiple clients attempt to connect to the same branch the first time a replica fetches it #5330
Conversation
…ranch on a read replica (maybe should be an error to create a local branch in the first place but we don't enforce that)
…n newly pulled branch)
…lt into zachmu/remote-ref-replication
This breaks a couple of the async push bats tests. I didn't change anything directly related to that, and adding sleeps makes it work again, which makes me think that clean shutdown logic has been broken for some time and we just passed some threshold where we are now losing the race.
It looks like graceful server shutdown on kill is just broken... there's no signal handling to invoke the shutdown logic, it just relies on a defer:

```go
sqlEngine, err := engine.NewSqlEngine(
	ctx,
	mrEnv,
	engine.FormatTabular,
	config,
)
if err != nil {
	return err, nil
}
defer sqlEngine.Close()
```

Edit: no, this works fine, confirmed with logging. Background threads are allegedly being canceled.
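For reference, a minimal sketch of what explicit signal handling invoking the shutdown path could look like. This is purely illustrative, not dolt's server code; `engineHandle` is a placeholder standing in for the engine above:

```go
// Illustrative sketch only: wire SIGINT/SIGTERM to the same cleanup a defer
// would perform, so a kill still runs the shutdown logic.
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
)

type engineHandle struct{} // placeholder for the sql engine in the snippet above

func (e *engineHandle) Close() error {
	fmt.Println("closing engine")
	return nil
}

func main() {
	eng := &engineHandle{}
	defer eng.Close() // only runs on a normal return, not when the process is signaled

	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, os.Interrupt, syscall.SIGTERM)

	<-sigs      // block until SIGINT/SIGTERM arrives
	eng.Close() // run the cleanup explicitly before exiting
}
```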
I think the changes are not incorrect, but I don't understand how the edge cases are triggered, and I'm not seeing tests that try to trigger optimistic lock failures.
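A minimal, self-contained sketch of the kind of test being asked for: race several writers against a single compare-and-swap point and check that the retry path runs. The `casStore`, `tryUpdate`, and `update` names are hypothetical stand-ins that only mirror the shape of the manifest CAS and the OPTIMISTIC_RETRY loop; this is not dolt code.

```go
package retry

import (
	"errors"
	"sync"
	"sync/atomic"
	"testing"
)

var errOptimisticLockFailed = errors.New("optimistic lock failed")

// casStore mimics a manifest-like value updated via compare-and-swap.
type casStore struct {
	mu  sync.Mutex
	gen int
}

// tryUpdate fails when the caller's snapshot is stale, like a manifest CAS.
func (s *casStore) tryUpdate(seen int) (int, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if seen != s.gen {
		return s.gen, errOptimisticLockFailed
	}
	s.gen++
	return s.gen, nil
}

// update mirrors the retry loop: refresh the snapshot and try again whenever
// the CAS reports an optimistic lock failure.
func update(s *casStore, retries *int64) {
	seen := 0
	for {
		latest, err := s.tryUpdate(seen)
		if errors.Is(err, errOptimisticLockFailed) {
			atomic.AddInt64(retries, 1)
			seen = latest
			continue
		}
		return
	}
}

func TestConcurrentUpdatesHitRetryPath(t *testing.T) {
	s := &casStore{}
	var retries int64
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			update(s, &retries)
		}()
	}
	wg.Wait()
	// Every writer after the first must retry at least once, since the
	// generation advances under it, so the retry path is always exercised.
	if retries == 0 {
		t.Fatal("expected at least one optimistic lock retry")
	}
}
```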
```go
// loop on optimistic lock failures
OPTIMISTIC_RETRY:
	for {
```
Would it be equivalent to ditch the gotos and just `break` out of the for when we don't have a lock failure, and otherwise `continue`?
The problem is that `break` breaks out of the switch, not the loop. I'll see if I can eliminate the switch and make this easier to read.
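To make the tradeoff concrete, here is a small self-contained sketch (not the PR's code) showing why a bare `break` inside a switch only exits the switch, and how the labeled form compares to restructuring with if/else so plain `break`/`continue` apply to the loop:

```go
package main

import (
	"errors"
	"fmt"
)

var errLockFailed = errors.New("optimistic lock failed")

// withLabel mirrors the PR's shape: a bare break here would only exit the
// switch, so the loop is labeled and retried via continue RETRY.
func withLabel(attempts *int) {
RETRY:
	for {
		switch tryOnce(attempts) {
		case nil:
			break RETRY // without the label this would only leave the switch
		default:
			continue RETRY
		}
	}
}

// withoutLabel is the suggested alternative: drop the switch so a plain
// break/continue applies to the loop directly.
func withoutLabel(attempts *int) {
	for {
		if err := tryOnce(attempts); errors.Is(err, errLockFailed) {
			continue
		}
		break
	}
}

// tryOnce fails the first time to force exactly one retry.
func tryOnce(attempts *int) error {
	*attempts++
	if *attempts == 1 {
		return errLockFailed
	}
	return nil
}

func main() {
	a, b := 0, 0
	withLabel(&a)
	withoutLabel(&b)
	fmt.Println(a, b) // both variants take 2 attempts
}
```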
```go
if err != nil {
	return "", false, err
}

if branchExists {
	err = createLocalBranchFromRemoteTrackingBranch(ctx, srcDB.DbData(), ddb, branchName, remoteRef)
```
oof
```go
case ref.BranchRefType:
	err := rrd.createNewBranchFromRemote(ctx, remoteRef, trackingRef)
	if errors.Is(err, datas.ErrOptimisticLockFailed) {
		continue OPTIMISTIC_RETRY
```
Is there a different rrd for every client? I'm confused how two clients could trigger this create retry, since the limiter is supposed to collapse the parallel calls. Same for pullLocalBranch, but that one seems more reasonable. What are the concurrent events that trigger retries? A write overlapping a pull?
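For context on the "collapse the parallel calls" expectation, the standard analogue is golang.org/x/sync/singleflight, which dedupes concurrent calls for the same key. This is only an analogy for how a limiter of that kind behaves, not dolt's actual limiter implementation; the branch key and "pulled" result below are made up:

```go
package main

import (
	"fmt"
	"sync"

	"golang.org/x/sync/singleflight"
)

func main() {
	var g singleflight.Group
	var mu sync.Mutex
	var fetches int
	var wg sync.WaitGroup

	// Many clients ask for the same branch at once; overlapping calls share
	// one execution of the fetch function instead of each running it.
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_, _, _ = g.Do("refs/heads/feature", func() (interface{}, error) {
				mu.Lock()
				fetches++
				mu.Unlock()
				return "pulled", nil
			})
		}()
	}
	wg.Wait()
	fmt.Println("fetches:", fetches) // far fewer than 10 when the calls overlap
}
```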
Any manifest update can cause an optimistic lock write failure. There are writes that take place outside a limiter run (specifically, during pullBranchesAndUpdateWorkingSet).
Actually the work in CreateLocalBranch needs similar protection for this reason, but I'll follow up on that.
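A compact illustration of that point, under the assumption that every writer goes through one manifest-style compare-and-swap: even if the limiter collapses all the create calls into one, a write that happens outside the limited path between the create's read and its CAS still forces a retry. The `manifest` type and names here are hypothetical, not dolt's types:

```go
package main

import (
	"errors"
	"fmt"
)

var errOptimisticLockFailed = errors.New("optimistic lock failed")

// manifest stands in for the single CAS point every writer goes through.
type manifest struct{ gen int }

func (m *manifest) compareAndSet(seen int) error {
	if seen != m.gen {
		return errOptimisticLockFailed
	}
	m.gen++
	return nil
}

func main() {
	m := &manifest{}

	// The collapsed/limited create path takes its snapshot of the manifest...
	seen := m.gen

	// ...but an unrelated write outside the limiter (e.g. a working-set
	// update during a pull) lands first and advances it.
	_ = m.compareAndSet(m.gen)

	// The create path's CAS now fails, which is what sends the real code
	// back around the OPTIMISTIC_RETRY loop.
	if err := m.compareAndSet(seen); errors.Is(err, errOptimisticLockFailed) {
		fmt.Println("create path must retry:", err)
	}
}
```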