Write a basic Dotmesh Operator that replicates our current DaemonSet setup #344

alaric-dotmesh · 2018-03-22T13:22:44Z

This is part of epic #385.

With #343 done, we can write a simple Dotmesh Operator that runs Dotmesh on every node in the cluster.

This can become the canonical way of running DM in k8s once documented!

lukemarsden · 2018-03-29T07:58:34Z

Per #343 (comment), we don't have a StatefulSet template per se, but we do have a design for an operator which will set up PVCs, node labels etc appropriately!

lukemarsden · 2018-04-05T15:55:37Z

As part of this issue, we need a testing strategy.

We are going to develop simple dind-flexvolume and dind-dynamic-provisioner modules which can be used to simulate cloud provider volumes (insofar as they provide writeable filesystems which dotmesh can initialize zfs pools-in-files on) in the dind tests.

Then we can write dind tests for the dotmesh operator creating pods which consume PVCs from a dind storageclass and provide dotmesh PVs.

Later, we'll be able to simulate killing a dotmesh node and having the PV failover and "reattach" to a different dind.


lukemarsden · 2018-04-06T17:23:54Z

We now have dind flexvolume and dynamic provisioners, which work according to this test.

Next steps, as I see them:

write a test that cordoning a node results in the fake "block device" for a pod being reattached by Kubernetes to the pod on the new node

LOCAL

write a test showing that the dotmesh operator (which is partially implemented here) works just as well as the current daemonset. make it pass with the following config by implementing enough of the algorithm:

storageMode: local
localMode:
  poolSizePerNode: 10G
  poolLocation: /var/lib/dotmesh

PV PER NODE

write a test that providing the dotmesh operator with the following config provisions PVs from the dind provisioner. make it pass by implementing more of the algorithm:

storageMode: pvPerNode
pvPerNodeMode:
  pvSizePerNode: 100G
  storageClass: fast

write a test which deletes a dind node and demonstrates that the dotmesh operator fails over the PV to another node. make it pass by implementing more of the algorithm. you'll need to support multiple zpools on a single node (by running two dotmesh storage instances on one node).

POOL OF DOTMESHES

implement NFS support... and then write a test for the following config where storage nodes are only a subset of the nodes in the cluster, demonstrating accessing a dot from a different node to where the storage is hosted:

storageMode: storageNodes
storageNodesMode:
  storageNodes: 3
  sizePerNode: 100G
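To make the three config shapes above concrete, here is a minimal Go sketch of how they might be modelled and sanity-checked. The YAML keys come from the plan above; the Go type names, field names and the `Validate` method are hypothetical and are not taken from the operator's source.

```go
package main

import (
	"errors"
	"fmt"
)

// Config mirrors the three config shapes from the plan above.
// All Go names here are hypothetical; only the YAML keys noted in
// the comments come from the issue.
type Config struct {
	StorageMode  string // storageMode: local | pvPerNode | storageNodes
	Local        *LocalMode
	PVPerNode    *PVPerNodeMode
	StorageNodes *StorageNodesMode
}

type LocalMode struct {
	PoolSizePerNode string // poolSizePerNode
	PoolLocation    string // poolLocation
}

type PVPerNodeMode struct {
	PVSizePerNode string // pvSizePerNode
	StorageClass  string // storageClass
}

type StorageNodesMode struct {
	StorageNodes int    // storageNodes
	SizePerNode  string // sizePerNode
}

// Validate checks that the section matching StorageMode is present
// and that its fields are filled in.
func (c Config) Validate() error {
	switch c.StorageMode {
	case "local":
		if c.Local == nil || c.Local.PoolSizePerNode == "" || c.Local.PoolLocation == "" {
			return errors.New("local mode needs poolSizePerNode and poolLocation")
		}
	case "pvPerNode":
		if c.PVPerNode == nil || c.PVPerNode.PVSizePerNode == "" || c.PVPerNode.StorageClass == "" {
			return errors.New("pvPerNode mode needs pvSizePerNode and storageClass")
		}
	case "storageNodes":
		if c.StorageNodes == nil || c.StorageNodes.StorageNodes < 1 || c.StorageNodes.SizePerNode == "" {
			return errors.New("storageNodes mode needs storageNodes >= 1 and sizePerNode")
		}
	default:
		return fmt.Errorf("unknown storageMode %q", c.StorageMode)
	}
	return nil
}

func main() {
	cfg := Config{
		StorageMode: "local",
		Local:       &LocalMode{PoolSizePerNode: "10G", PoolLocation: "/var/lib/dotmesh"},
	}
	if err := cfg.Validate(); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println("config ok")
}
```

Keeping one optional section per mode means a test can assert that a config naming a mode without its section is rejected before the operator acts on it.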

lukemarsden · 2018-04-06T17:42:24Z

Note that some of the above plan spans different github issues!

lukemarsden · 2018-04-06T20:54:54Z

In particular, "pool of dotmeshes" depends on #341, #345 and #346, and on the unissued final bullet in #100.

lukemarsden · 2018-04-09T08:57:36Z

See https://kubernetes.io/blog/2018/01/introducing-client-go-version-6 and "Updating dependencies" (golang/dep).

We now have a thing we can compile and run in a test cluster locally, which prints out messages when nodes come/go/change, and the start of a code structure that will run The Algorithm whenever something interesting happens.
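As a rough illustration of what one pass of such an algorithm could look like, here is a minimal Go sketch. It is not the operator's actual code: `Node`, `Pod`, `Plan` and `reconcile` are hypothetical stand-ins for the Kubernetes objects and logic. The rules it applies are assumptions pieced together from commit messages in this thread (only consider `app=dotmesh` pods, don't start dotmesh on unschedulable nodes); the handling of duplicate and orphaned pods is a guess.

```go
package main

import (
	"fmt"
	"sort"
)

// Node and Pod are pared-down, hypothetical stand-ins for the
// Kubernetes objects the operator watches.
type Node struct {
	Name          string
	Unschedulable bool
}

type Pod struct {
	Name     string
	NodeName string
	Labels   map[string]string
}

// Plan lists what one reconciliation pass would change.
type Plan struct {
	StartOn []string // schedulable nodes with no dotmesh pod
	Delete  []string // dotmesh pods on vanished nodes, or duplicates
}

// reconcile compares the current nodes and pods and returns the
// changes needed so every schedulable node runs one dotmesh pod.
func reconcile(nodes []Node, pods []Pod) Plan {
	exists := map[string]bool{}   // every known node
	eligible := map[string]bool{} // nodes we may start a pod on
	for _, n := range nodes {
		exists[n.Name] = true
		if !n.Unschedulable {
			eligible[n.Name] = true
		}
	}

	covered := map[string]bool{} // nodes that already have a dotmesh pod
	var plan Plan
	for _, p := range pods {
		if p.Labels["app"] != "dotmesh" {
			continue // ignore pods that are not ours
		}
		if !exists[p.NodeName] || covered[p.NodeName] {
			plan.Delete = append(plan.Delete, p.Name)
			continue
		}
		// Pods on cordoned nodes are left alone; we just don't add new ones.
		covered[p.NodeName] = true
	}
	for name := range eligible {
		if !covered[name] {
			plan.StartOn = append(plan.StartOn, name)
		}
	}
	// Sort so the plan is deterministic despite map iteration order.
	sort.Strings(plan.StartOn)
	sort.Strings(plan.Delete)
	return plan
}

func main() {
	nodes := []Node{{Name: "a"}, {Name: "b", Unschedulable: true}, {Name: "c"}}
	pods := []Pod{
		{Name: "server-a", NodeName: "a", Labels: map[string]string{"app": "dotmesh"}},
		{Name: "etcd-c", NodeName: "c", Labels: map[string]string{"app": "etcd"}},
		{Name: "server-gone", NodeName: "d", Labels: map[string]string{"app": "dotmesh"}},
	}
	fmt.Printf("%+v\n", reconcile(nodes, pods))
}
```

Writing the pass as a pure function of the observed nodes and pods means it can be re-run on every node or pod event, and unit-tested without a cluster.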

…ve no pod bound to them! Downside: the pods crash and burn on startup. But I think that should be just a matter of tweaking the pod template.


alaric-dotmesh · 2018-04-17T16:59:59Z

I'm moving the "extra work" out into a new epic which this is just the first part of. This is now part of epic #385, which is a sub-epic of #100!


…works. `kubectl drain` fails if the node has pods controlled by operators on it. This makes it intermittent already because sometimes the etcd pod is on that node, and downright failsome with the dotmesh operator in play.


alaric-dotmesh · 2018-05-04T08:34:33Z

This is in production, so I'm calling it done.

* master:
  NFC: More logging
  dotscience#3 make subdot roots writeable by all, for containers which run as non-root
  FIX: Missed space :-( Testing stuff in CI is tedious.
  FIX: Missed the `-c` option to the `dm dot delete...`
  FIX: Typo...
  #17: Pull the right image, use a dedicated config, and test `dm dot delete` on the remote
  NFC: Test adding sleep to ensure replication.
  #17: Avoid echoing the API key, and run the smoke tests on Linux (it's easier for me to debug them there)
  #17: Made the smoke test push to a remote cluster (if credentials are passed into SMOKE_TEST_REMOTE and SMOKE_TEST_APIKEY).
  NFC: Fix logging on error messages
  #352: Attempt to reduce flakiness by checking replication status on both nodes in a cluster
  NFC: Comments concerning pod health checking
  NFC: Re-enable flaky test for debugging
  #344: We no longer need the GKE yamls (that's handled in the ConfigMap), and they're not referenced from the docs any more.
  NFC: fix typo sneaked into yaml
  NFC: Comment out test until we can work out how to fix it

alaric-dotmesh added the task label Mar 22, 2018

alaric-dotmesh mentioned this issue Mar 22, 2018

Extend the Dotmesh Operator to support NFS #347

Open

This was referenced Mar 22, 2018

Support consuming cloud PVs, multi-host subdots and failover on cloud volumes (aka CVv2) #100

Closed

Develop a StatefulSet template to run Dotmesh for a given cluster size #343

Closed

lukemarsden self-assigned this Mar 29, 2018

lukemarsden added a commit that referenced this issue Mar 31, 2018

#344: example and dep configuration for a Kubernetes operator

f8e4007

lukemarsden added a commit that referenced this issue Mar 31, 2018

#344: tidy up line widths

cec5dcf

lukemarsden added a commit that referenced this issue Mar 31, 2018

#344: what to do next

8487911

lukemarsden added a commit that referenced this issue Mar 31, 2018

#344: tweak

a7fdd52

lukemarsden added a commit that referenced this issue Apr 5, 2018

#344: a failing test which isn't run yet in CI, tee hee.|

cf35cae

binocarlos added a commit that referenced this issue Apr 5, 2018

#344: template dynamic provisioner and flexvolume for dind PV using f…

ef2a78d

…olders

binocarlos added a commit that referenced this issue Apr 5, 2018

#344: move folders up a level

2ed444a

binocarlos added a commit that referenced this issue Apr 5, 2018

#344: implementation for dind dynamic provisioner and dind flexvolume

a733b11

binocarlos added a commit that referenced this issue Apr 5, 2018

Merge branch 'master' of https://github.com/dotmesh-io/dotmesh

5207af5

* 'master' of https://github.com/dotmesh-io/dotmesh: #344: a failing test which isn't run yet in CI, tee hee.|

lukemarsden added a commit that referenced this issue Apr 6, 2018

#344: WIP on dind-provisioner.go

702c56d

lukemarsden added a commit that referenced this issue Apr 6, 2018

#344: bump citools

c7bf7e0

lukemarsden added a commit that referenced this issue Apr 6, 2018

#344: get dind fv + dp tests working and running in CI

c43ee4f

prisamuel self-assigned this Apr 9, 2018

prisamuel pushed a commit that referenced this issue Apr 10, 2018

WIP: #344 Extend test to cordon off a node

2f5df80

alaric-dotmesh added a commit that referenced this issue Apr 11, 2018

#344: Re-enable cordoning test

285406c

alaric-dotmesh added a commit that referenced this issue Apr 11, 2018

#344: Enable CI for k8s tooling tests

a01da1c

alaric-dotmesh added a commit that referenced this issue Apr 12, 2018

#344 WIP: Operator pod template is in development...

cc408d9

alaric-dotmesh added a commit that referenced this issue Apr 12, 2018

#344: Restrict pod listing to only app=dotmesh pods

5688038

prisamuel pushed a commit that referenced this issue Apr 12, 2018

#344 Added pod spec template

596903b

alaric-dotmesh added a commit that referenced this issue Apr 12, 2018

#344: Pod spec template is now compiling!

93ad3f1

alaric-dotmesh added a commit that referenced this issue Apr 13, 2018

#344: Tidied up the labelling, cleared the updatesNeeded flag when we…

0195cc3

… fix stuff, and improved the template

alaric-dotmesh added a commit that referenced this issue Apr 13, 2018

#344: Dotmesh pod cleanup, much better logging.

41c8693

alaric-dotmesh added a commit that referenced this issue Apr 16, 2018

#344: Rate-limiting restarting when an upgrade happens, and don't sta…

103d496

…rt new dotmesh pods while old ones are dying.

alaric-dotmesh added a commit that referenced this issue Apr 16, 2018

#344: Dockerising the operator, so it can be run inside Kubernetes.

5678d73

alaric-dotmesh added a commit that referenced this issue Apr 17, 2018

#344: Configuration via ConfigMap and passing the correct image versi…

9578b18

…on through at build time!

alaric-dotmesh mentioned this issue Apr 17, 2018

Dotmesh Operator #385

Closed

8 tasks

alaric-dotmesh changed the title ~~Write a basic Dotmesh Operator to instantiate our StatefulSet template~~ Write a basic Dotmesh Operator that replicates our current DaemonSet setup Apr 17, 2018

alaric-dotmesh added a commit that referenced this issue Apr 17, 2018

#344: Make the dotmesh YAML fire up the Operator rather than a DaemonSet

78d271e

alaric-dotmesh added a commit that referenced this issue Apr 20, 2018

#344: Typo fix, plus additional configurables for the operator, requi…

75a50b4

…red for testing

alaric-dotmesh added a commit that referenced this issue Apr 23, 2018

#344: Change pod name to "server" rather than "dotmesh" (it's in the …

a31f940

…DM namespace already), make node labelling two-stage.

alaric-dotmesh added a commit that referenced this issue Apr 23, 2018

#344: Actually push the image to the registry, and use the right Serv…

2bebe65

…iceAccount

alaric-dotmesh added a commit that referenced this issue Apr 23, 2018

#344: Correct Service selector, and create an OwnerReference to the p…

6e61257

…od deployment, so GC works correctly (and kubectl drain?)

alaric-dotmesh added a commit that referenced this issue Apr 30, 2018

#344: Sneakily vendor uncommitted citools work (HACK, dep will not be…

d0decee

… happy)

alaric-dotmesh added a commit that referenced this issue Apr 30, 2018

#344 HACK: Disable image pre-pulling as an experiment to see if it fi…

6555982

…xes CI

alaric-dotmesh added a commit that referenced this issue Apr 30, 2018

#344 FIX: I did the last commit wrong...

18b7f2f

alaric-dotmesh added a commit that referenced this issue Apr 30, 2018

#344: Don't try to start dotmesh on unschedulable nodes

1372bd1

alaric-dotmesh added a commit that referenced this issue May 2, 2018

#344: ConfigMap in YAML, tailored for GKE/AKS

fa18beb

alaric-dotmesh added a commit that referenced this issue May 3, 2018

#344: We no longer need the GKE yamls (that's handled in the ConfigMa…

8af6e53

…p), and they're not referenced from the docs any more.

alaric-dotmesh closed this as completed May 4, 2018
