[store] Change the Restore action on objects to update instead of delete/create #2281

Conversation

@cyli (Contributor) commented Jun 21, 2017

If the object already exists, the event produced should be an update rather than a delete/create.

This is probably the cause of moby/moby#33541, where an unlock key change is sometimes not picked up by a snapshot restore from another node.

cc @aaronlehmann

Also, :( generics

@cyli cyli force-pushed the ensure-cluster-updates-when-restoring-from-snapshot branch from 2fdfc46 to e866b7e on June 21, 2017 23:11
codecov bot commented Jun 21, 2017

Codecov Report

Merging #2281 into master will increase coverage by 0.58%.
The diff coverage is 82.6%.

@@            Coverage Diff             @@
##           master    #2281      +/-   ##
==========================================
+ Coverage   60.38%   60.96%   +0.58%     
==========================================
  Files         125      126       +1     
  Lines       20412    20373      -39     
==========================================
+ Hits        12326    12421      +95     
+ Misses       6699     6581     -118     
+ Partials     1387     1371      -16

@cyli cyli force-pushed the ensure-cluster-updates-when-restoring-from-snapshot branch from e866b7e to a8e0adf on June 21, 2017 23:40
@aaronlehmann (Collaborator) commented:

I tried to simplify the Restore function. This is what I came up with:

                Restore: func(tx Tx, snapshot *api.StoreSnapshot) error {
                        clusters, err := FindClusters(tx, All)
                        if err != nil {
                                return err
                        }
                        updated := make(map[string]struct{})
                        for _, n := range snapshot.Clusters {
                                if err := UpdateCluster(tx, n); err == ErrNotExist {
                                        if err := CreateCluster(tx, n); err != nil {
                                                return err
                                        }
                                } else if err != nil {
                                        return err
                                } else {
                                        updated[n.ID] = struct{}{}
                                }
                        }
                        for _, n := range clusters {
                                if _, ok := updated[n.ID]; !ok {
                                        if err := DeleteCluster(tx, n.ID); err != nil {
                                                return err
                                        }
                                }
                        }
                        return nil
                },

I'm not sure it's really better, to be honest. What do you think?

Another idea I have is to have the type-specific Restore functions convert the slice to []api.StoreObject and call a type-independent helper function that uses tx methods:

func RestoreTable(tx Tx, table string, newObjects []api.StoreObject) error {
        // The helper only queries with All, so the selector check is a no-op.
        checkType := func(by By) error {
                return nil
        }
        var oldObjects []api.StoreObject
        appendResult := func(o api.StoreObject) {
                oldObjects = append(oldObjects, o)
        }

        err := tx.find(table, All, checkType, appendResult)
        if err != nil {
                return err
        }

        updated := make(map[string]struct{})

        for _, o := range newObjects {
                if existing := tx.lookup(table, indexID, o.GetID()); existing != nil {
                        if err := tx.update(table, o); err != nil {
                                return err
                        }
                        updated[o.GetID()] = struct{}{}
                } else {
                        if err := tx.create(table, o); err != nil {
                                return err
                        }
                }
        }
        for _, o := range oldObjects {
                if _, ok := updated[o.GetID()]; !ok {
                        if err := tx.delete(table, o.GetID()); err != nil {
                                return err
                        }
                }
        }
        return nil
}

I used my algorithm here as a proof of concept, but I'd be fine with using yours instead as long as it's only implemented in one place. Yours might be a bit more memory-efficient.

The downsides of this approach are that it needs an extra slice copy, and that it skips the validation in the type-specific Create* / Update* wrappers. But any objects inside a snapshot should already have passed validation. Ultimately I like this solution because it removes the bulk of the duplicated code.
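
For illustration only, a type-specific Restore could then shrink to a thin wrapper that copies its typed slice into a []api.StoreObject and delegates to the helper. This is just a sketch; tableCluster stands in for whatever table-name constant the store package already defines for clusters:

                Restore: func(tx Tx, snapshot *api.StoreSnapshot) error {
                        // Copy the typed slice into the generic interface slice;
                        // this is the extra slice copy mentioned above.
                        objects := make([]api.StoreObject, len(snapshot.Clusters))
                        for i, c := range snapshot.Clusters {
                                objects[i] = c
                        }
                        // All of the update-or-create / delete logic now lives
                        // in the shared helper.
                        return RestoreTable(tx, tableCluster, objects)
                },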

@cyli (Contributor, Author) commented Jun 22, 2017

@aaronlehmann I really like the idea of de-duplication: if we get more objects (such as generic resources), it will make the store code much easier to maintain. I like your algorithm better; I think it's actually the more memory-efficient one, since it only creates one map, and it's a bit easier to read. Thanks!

@cyli cyli force-pushed the ensure-cluster-updates-when-restoring-from-snapshot branch 2 times, most recently from 5488fbb to a2ce9dd, on June 22, 2017 21:55
        }
        for _, o := range oldObjects {
                if _, ok := updated[o.GetID()]; !ok {
                        if err := tx.delete(table, o.GetID()); err != nil {
@aaronlehmann (Collaborator):

Let's call GetID a single time for each loop iteration. I was sloppy when I wrote the PoC.
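
A minimal sketch of the suggested change in the delete loop, with the ID hoisted into a local:

        for _, o := range oldObjects {
                id := o.GetID()
                if _, ok := updated[id]; !ok {
                        if err := tx.delete(table, id); err != nil {
                                return err
                        }
                }
        }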

@cyli (Contributor, Author):

Happy to make this change, but out of curiosity, is the function call particularly expensive? All the objects just seem to return the ID field, and there doesn't seem to be much indirection?

@aaronlehmann (Collaborator):

No, it's not expensive.

@cyli (Contributor, Author):

Why is it better practice to only make the function call once? I don't particularly have an opinion either way, I was just wondering if it was a style reason or some other reason?

@aaronlehmann (Collaborator):

I think it makes the code a little easier to read and it has a very marginal performance impact because it avoids a redundant function call, but that's so insignificant I hesitate to bring it up.

I guess I don't care either. Just disagreeing with myself, reviewing my own code.

                        if err := tx.update(table, o); err != nil {
                                return err
                        }
                        updated[o.GetID()] = struct{}{}
@aaronlehmann (Collaborator):

Let's call GetID a single time for each loop iteration. I was sloppy when I wrote the PoC.

@aaronlehmann (Collaborator) commented:

Almost LGTM. extensions.go and resources.go also need these changes.

@cyli (Contributor, Author) commented Jun 23, 2017

@aaronlehmann Question about adding this to extensions: currently extensions are immutable, so if we restore, we'd emit an EventUpdateExtension. Technically I guess we aren't updating anything, we are just restoring from a snapshot, but I just wanted to check that this is ok?

…eady

exist are updated, rather than everything being deleted and re-created.

Signed-off-by: Ying Li <ying.li@docker.com>
@cyli cyli force-pushed the ensure-cluster-updates-when-restoring-from-snapshot branch from a2ce9dd to cfff7d1 on June 23, 2017 00:38
@aaronlehmann (Collaborator) commented:

LGTM

@aaronlehmann aaronlehmann merged commit 7d0a128 into moby:master Jun 23, 2017
@cyli cyli deleted the ensure-cluster-updates-when-restoring-from-snapshot branch June 23, 2017 07:08
andrewhsu pushed a commit to docker-archive/docker-ce that referenced this pull request Jul 14, 2017
- moby/swarmkit#2266 (support for templating Node.Hostname in docker executor)
- moby/swarmkit#2281 (change restore action on objects to be update, not delete/create)
- moby/swarmkit#2285 (extend watch queue with timeout and size limit)
- moby/swarmkit#2253 (version-aware failure tracking in the scheduler)
- moby/swarmkit#2275 (update containerd and port executor to container client library)
- moby/swarmkit#2292 (rename some generic resources)
- moby/swarmkit#2300 (limit the size of the external CA response)
- moby/swarmkit#2301 (delete global tasks when the node running them is deleted)

Minor cleanups, dependency bumps, and vendoring:
- moby/swarmkit#2271
- moby/swarmkit#2279
- moby/swarmkit#2283
- moby/swarmkit#2282
- moby/swarmkit#2274
- moby/swarmkit#2296 (dependency bump of etcd, go-winio)

Signed-off-by: Ying Li <ying.li@docker.com>
Upstream-commit: 4509a00
Component: engine
silvin-lubecki pushed a commit to silvin-lubecki/docker-ce that referenced this pull request Feb 3, 2020
- moby/swarmkit#2281 - fixes an issue where some cluster updates
  could be missed if a manager receives a catch-up snapshot from another manager
- moby/swarmkit#2300 - fixes a possible memory issue if an
  external CA sends an overlarge response

Signed-off-by: Ying <ying.li@docker.com>