k8s 1.8 flexvolume driver is not recognized #263

Closed
binocarlos opened this issue Feb 6, 2018 · 10 comments

@binocarlos
Contributor

When running a k8s v1.8.7-gke.1 cluster on GKE - with all of the dotmesh pods running:

$ kubectl get po -n dotmesh
NAME                                           READY     STATUS    RESTARTS   AGE
dotmesh-5z7t7                                  1/1       Running   0          4h
dotmesh-dynamic-provisioner-5b8548c84b-hw2f8   1/1       Running   0          1h
dotmesh-etcd-cluster-0000                      1/1       Running   0          4h
dotmesh-etcd-cluster-0001                      1/1       Running   0          4h
dotmesh-etcd-cluster-0002                      1/1       Running   0          4h
etcd-operator-56b49b7ffd-n42tf                 1/1       Running   0          4h

And the flexvolume binary present in /usr/libexec/kubernetes/kubelet-plugins/volume/exec/dotmesh.io~dm

and having run sudo systemctl restart kubelet on the host, the kubelet reports this error:

Feb 06 17:52:18 gke-dotmesh-gke-cluster-default-pool-bfb89fc5-d1bn kubelet[20547]: E0206 17:52:18.236820   20547 desired_state_of_world_populator.go:288] Failed to add volume "test-storage" (specName: "pvc-db880397-0b65-11e8-9e61-42010a9a0fd9") for pod "dbca7409-0b65-11e8-9e61-42010a9a0fd9" to desiredStateOfWorld. err=failed to get Plugin from volumeSpec for volume "pvc-db880397-0b65-11e8-9e61-42010a9a0fd9" err=no volume plugin matched

Judging by this part of the release notes, the plugin discovery mechanism seems to have changed. We need to update the flexvolume driver to work with k8s >= 1.8.
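For anyone reproducing this, a quick sanity check on the node can rule out the obvious causes before digging into discovery. This is only a sketch: check_flexvolume_driver is a hypothetical helper, it assumes the usual layout of <plugin dir>/<vendor~driver>/<driver>, and it assumes the driver's init call is safe to run by hand.

```bash
# Hypothetical helper: check that a flexvolume driver is where kubelet expects it.
# $1 = plugin directory, $2 = vendor~driver directory name (e.g. dotmesh.io~dm)
check_flexvolume_driver() {
  plugin_dir="$1"
  vendor_driver="$2"
  # The executable is named after the part following the "~".
  driver_name="${vendor_driver#*"~"}"
  driver="$plugin_dir/$vendor_driver/$driver_name"

  if [ ! -x "$driver" ]; then
    echo "missing or not executable: $driver"
    return 1
  fi
  # Flexvolume drivers answer `init` with a JSON status object.
  if "$driver" init | grep -q '"status"[[:space:]]*:[[:space:]]*"Success"'; then
    echo "ok: $driver"
  else
    echo "init did not report Success: $driver"
    return 1
  fi
}
```

On the host this would be run as check_flexvolume_driver /usr/libexec/kubernetes/kubelet-plugins/volume/exec dotmesh.io~dm.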

@lukemarsden
Collaborator

lukemarsden commented Feb 7, 2018

I don't think our flex plugin installer is broken; we already do the trick of atomically renaming from a file starting with . in http://gitlab.dotmesh.io:9999/dotmesh/dotmesh/blob/master/cmd/dotmesh-server/pkg/main/kubernetes.go#L25

@lukemarsden
Collaborator

lukemarsden commented Feb 7, 2018

so there must be some other reason why you got that error...

@lukemarsden
Collaborator

lukemarsden commented Feb 7, 2018

when this is fixed and we start working with Kubernetes 1.8, hopefully we also start working with 1.9 (which is what ships with latest Docker for Mac)

@alaric-dotmesh
Contributor

We do almost exactly what the docs @binocarlos refers to say:

https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/flexvolume-deployment.md#recommended-driver-deployment-method

I shall attempt to work out what's wrong.

@alaric-dotmesh
Contributor

Ok, I've tried this on k8s 1.9 and everything works flawlessly (with no kubelet restart!) in the "hello dotmesh on k8s" tutorial.

On k8s 1.8 (1.8.8 to be precise) I'm getting a problem, but it's not with the flexvolume driver being picked up. The driver is being invoked correctly (with no kubelet restart), but it is logging this to /var/log/dotmesh-flexvolume.log:

22239: 2018/02/21 10:34:19.840566 RPC FAIL: {Namespace:admin Name:moby_counter2@newbranch Subdot:__default__} -> DotmeshRPC.Procure -> A volume called admin/moby_counter2 already exists with id b50d97a7-78ff-4fae-7ed1-8305e42d452a
22239: 2018/02/21 10:34:19.840789 MOUNT: Procure of admin/moby_counter2@newbranch.__default__ failed: Couldn't decode response '{"jsonrpc":"2.0","error":{"code":-32000,"message":"A volume called admin/moby_counter2 already exists with id b50d97a7-78ff-4fae-7ed1-8305e42d452a","data":null},"id":5577006791947779410}
': A volume called admin/moby_counter2 already exists with id b50d97a7-78ff-4fae-7ed1-8305e42d452a

As you might guess from the name moby_counter2, this originally happened with moby_counter; I tried again with a new name in case it was just broken state left over from an earlier experiment, but it fails immediately.

So: FV driver is working fine, but DM is giving errors back from the Procure call on k8s 1.8. Investigating...

@alaric-dotmesh
Contributor

False alarm, that was because I'd left my redis-pvc.yaml pinned to a branch, and was getting:

Error procuring filesystem: Cannot use branch-pinning syntax (docker run -v volume@branch:/path) to create a non-existent volume with a non-master branch, pausing and trying again...

...but that was in docker logs dotmesh-server-inner, it wasn't getting passed back up to the FV driver!

#300 raised to improve flexvolume error reporting.

alaric-dotmesh added a commit that referenced this issue Feb 21, 2018
Getting the etcd-browser up and running in an arbitrary k8s cluster might not
be worth the hassle, so this works at a pinch, with a slightly worse UI:

```bash
APIKEY=...
NODE=...
curl --user admin:$APIKEY  -H 'Content-Type: application/json' http://$NODE:6969/rpc --data-binary "{\"jsonrpc\":\"2.0\",\"method\":\"DotmeshRPC.DumpEtcd\",\"params\":{\"Prefix\":\"\"},\"id\":6129484611666146000}"
```

You can change the prefix in the params to narrow down to a specific subtree, too.
@alaric-dotmesh
Contributor

After a bit of fiddling with stuff purely local to my setup, it works fine for kubernetes 1.8 with the stock dotmesh-k8s-1.8.yaml YAML!

So: This may be a GKE-specific problem. Now investigating a GKE cluster...

@alaric-dotmesh
Contributor

Well, one problem is that GKE uses a different place for FV plugins (see https://github.com/rancher/longhorn#troubleshooting): they go in /home/kubernetes/flexvolume, but I'm struggling to get it working nonetheless.
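To make the two locations concrete, here is a small sketch of picking the install directory per platform. The two base paths are the ones mentioned in this thread; the "gke" label and the function names are made up for illustration, and whether a given GKE version really uses the non-default path should be checked on the node.

```bash
DEFAULT_FV_DIR=/usr/libexec/kubernetes/kubelet-plugins/volume/exec
GKE_FV_DIR=/home/kubernetes/flexvolume

# Hypothetical helper: print the plugin base directory for a platform label.
flexvolume_dir_for() {
  case "$1" in
    gke) echo "$GKE_FV_DIR" ;;
    *)   echo "$DEFAULT_FV_DIR" ;;
  esac
}

# Hypothetical helper: print the full directory the dm driver belongs in.
# The vendor~driver component is part of the path in both cases.
driver_install_dir() {
  echo "$(flexvolume_dir_for "$1")/dotmesh.io~dm"
}
```

For example, driver_install_dir gke prints /home/kubernetes/flexvolume/dotmesh.io~dm.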

@alaric-dotmesh
Contributor

Ok, it needs the socket moved as well, and I can't hack that with mv as it's on a different filesystem, so I need to rebuild the server to install it differently.
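As an aside, a rename only works within a single filesystem, so it can help to check that up front. A minimal sketch, assuming GNU stat (the -c %d format prints the device number) and a hypothetical helper name:

```bash
# Hypothetical helper: succeed only if both paths live on the same filesystem.
# Assumes GNU coreutils stat; returns 2 if either path cannot be examined.
same_filesystem() {
  dev_a=$(stat -c %d "$1") || return 2
  dev_b=$(stat -c %d "$2") || return 2
  [ "$dev_a" = "$dev_b" ]
}
```

If this fails for the socket's current directory and the new plugin directory, the socket has to be created in the new location rather than moved there.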

alaric-dotmesh added a commit that referenced this issue Feb 22, 2018
Also, rejigged YAML generation to run from a single master YAML and generate
all the others via sed, so I can make a GKE-specific YAML that sets
the flexvolume driver path.
@alaric-dotmesh
Contributor

Ok, fixed it! We now have a YAML parameter to choose where the FV plugins go, and separate .gke.yaml files packaged that have it set for GKE.

$ kubectl apply -f https://get.dotmesh.io/unstable/263-gke-flexvolume/yaml/dotmesh-k8s-1.8.gke.yaml
serviceaccount "dotmesh" created
clusterrole "dotmesh" created
clusterrolebinding "dotmesh" created
service "dotmesh" created
daemonset "dotmesh" created
serviceaccount "dotmesh-provisioner" created
clusterrole "dotmesh-provisioner-runner" created
clusterrolebinding "dotmesh-provisioner" created
deployment "dotmesh-dynamic-provisioner" created
storageclass "dotmesh" created
$ kubectl get po -n dotmesh
NAME                                          READY     STATUS    RESTARTS   AGE
dotmesh-7pl9w                                 1/1       Running   1          1m
dotmesh-dynamic-provisioner-f796dd6df-88m7k   1/1       Running   0          1m
dotmesh-g9j22                                 1/1       Running   1          1m
dotmesh-lx687                                 1/1       Running   1          1m
etcd-operator-56b49b7ffd-rltzp                1/1       Running   0          1d
$ kubectl apply -f . 
deployment "redis" created
persistentvolumeclaim "redis-pvc" created
service "redis" created
deployment "web" created
service "web" created
$ kubectl get po
NAME                     READY     STATUS    RESTARTS   AGE
redis-7d85fc658d-krl8b   1/1       Running   0          15s
web-57c7d94cb9-b8jf6     1/1       Running   0          15s

Tada!
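For the curious, generating the per-platform YAML from one master file with sed could look roughly like this. It is a sketch only: the @FLEXVOLUME_DIR@ placeholder and render_yaml are hypothetical, since the real template's token is not shown in this thread.

```bash
# Hypothetical helper: read a YAML template on stdin and fill in the plugin dir.
# $1 = flexvolume plugin directory to substitute for the placeholder.
render_yaml() {
  # "|" is used as the sed delimiter because the value contains slashes.
  sed "s|@FLEXVOLUME_DIR@|$1|g"
}
```

A GKE variant would then be produced with something like render_yaml /home/kubernetes/flexvolume < master.yaml > dotmesh-k8s-1.8.gke.yaml, where master.yaml is again an assumed file name.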

binocarlos added a commit that referenced this issue Feb 23, 2018
* 'master' of https://github.com/dotmesh-io/dotmesh:
  263: Google still used the default FV dir up to 1.8
  263: Made a typo! That's why it wasn't working!
  263: Log flexvolume installation
  263: Make 1.7 versions of YAML, deprecate old `dotmesh.yaml`
  #263: Ability to pass the flexvolume driver path in via k8s yaml.
  NFC: Print out the actual URL base in the unstable build job, so it's easier to find it.
  263: Forgot the `dotmesh.io~dm` in the pathname!
  263: Temporarily kill kubernetes tests, as they won't pass in this GKE-only configuration.
  263: Use /home/kubernetes/flexvolume as the FV plugin dir, for GKE.