Consolidate and improve Nomad-related documentation #168

Closed
scaleoutsean opened this issue Mar 27, 2022 · 10 comments

@scaleoutsean

It looks like the Nomad-related information is almost correct, so I propose improving some details after Nomad v1.3.0 comes out (by then things should have largely settled).

General points:

The Nomad info in the Wiki and in the repo (examples) is inconsistent. It may be better to keep this content in just one location (the repo) to save people the time of trying examples that can't work.

There are also some inaccuracies in both the Wiki and the repo examples for Nomad, but looking at the progress being made in other Nomad-related issues, it's likely there will soon be known-good configs that can be used to update the repo examples.

Some specific issues with the repo examples:

This syntax is confusing. If config is the exact string that should be there, then that's fine, but it looks more like a placeholder (i.e. "insert your config here") that we're supposed to replace with actual content.

Here it looks like filesystem access is suggested instead of block device access (Nomad docs).

For democratic-csi users who want to use node-manual (such as myself), the repo's main README.md is short on details even for K8s, and the Nomad examples don't show how K8s PVs translate into Nomad-style configuration and volume claims. As an example, I got to the point where the controller & node are running (node-manual mode) and an iSCSI volume is visible to the Nomad client, but even though I ran volume register for this existing volume in Nomad as dcsi-267, it's not clear to me how Nomad or democratic-csi can link that volume to an iSCSI target visible to the client/server (I use a singleton Nomad cluster) with a node-manual setup. I figure I need this iSCSI target info inserted into either the controller or the node (Nomad) job file, but I'm not sure where and how.

I may submit a PR with some examples if I figure this out, but I'm new to this CSI driver.

@travisghansen
Member

Any help with Nomad docs is extremely appreciated! There are definitely some items that need improvement.

@scaleoutsean
Author

Unfortunately, node-manual and local-hostpath are the two least documented modes, so it's a steep learning curve for now. local-hostpath seems easier since device discovery, mount and format can be done manually, but there are no examples.

The node-manual example is basically empty, which makes sense considering the driver doesn't do much. But node-common.yaml has something in it, presumably because mount/format options may apply to both NFS and iSCSI devices. Should the latter be concatenated to the former to form a single config file?

And does this "final" config file need to be identical on the node and the controller?

Next, for node-manual (iSCSI), what goes into this config part?

      template {
        destination = "${NOMAD_TASK_DIR}/driver-config-file.yaml"

        data = <<EOH
config
EOH
      }

I tried to infer that from other examples, and it seems that at a minimum we would also need an iscsi section with a portal IP, so that the Node host knows where to discover devices?

iscsi:
  targetPortal: "server[:port]"

And maybe also CHAP credentials, but here it gets hard because the related config entries differ between back-ends (FreeNAS, ZFS, Synology), so I temporarily opened access to all volumes hoping to skip that.
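
Putting those guesses together, I imagine the filled-in template would look something like the sketch below. The driver line is presumably what the node-manual example contains; the iscsi block and the commented-out CHAP keys are purely my guesses, not from any working config:

template {
  destination = "${NOMAD_TASK_DIR}/driver-config-file.yaml"

  data = <<EOH
driver: node-manual
# guess: discovery info so the Node host can find the portal
iscsi:
  targetPortal: "192.168.1.4:3260"
  # guess: CHAP credentials would go here, but the key names likely
  # differ per back-end, so I left them out for now
EOH
}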

In this node-manual scenario, how does a Nomad client (Node container) know which iSCSI disk is which?

In my example the Nomad client can access the volume I created, and I can also use it (mount, write). But it's visible as /dev/sde - there's no way for the Nomad node to know which of several volumes is dcsi (my volume name both on storage and in Nomad), so I imagine that some config entries (such as baseiqn + NomadVolName) would be used to form a complete iSCSI target IQN.
I saw the snippet below in node-manual-iscsi-pv.yaml, but I don't know whether it should be added to the Nomad Controller or Node (or both) job definition, or what the correct generic (iscsiadm) syntax would be:

  iscsi:
    baseiqn: ""
    targetPortals: [""]
    interface: eth1
    shareStrategyTargetCli:
      basename: ""

Or is baseiqn perhaps one per back-end, while basename calls out each iSCSI device exposed to clients using that back-end? In that case I would have to name Nomad disks the same as they're named on storage, so that the IQN becomes valid, but I'm not sure if that's how ID-to-IQN mapping works with node-manual.

I spent some time trying several different combinations, but node-manual is hard to troubleshoot because in Nomad all devices always show up as "healthy".

@travisghansen
Member

Wow you're digging in deep! Great stuff.

I'm not even sure if node-manual is relevant for Nomad. It essentially does no controller operations (you manually create NFS shares, iSCSI volumes, etc.) and you then just feed all the relevant data into the container orchestration system to attach. In the case of k8s this means manually creating a PV with all the gory details of what normally gets added/managed by the controller operations. Not being familiar enough with Nomad, I'm unclear whether any of the concepts map closely enough that the respective assets can be created in this manner.

There are 3 drivers (currently local-hostpath, zfs-local-dataset and zfs-local-zvol) which are unique in deployment pattern (figures 2 and 3 here: https://github.com/container-storage-interface/spec/blob/master/spec.md). Again in this case, I'm unclear if Nomad has everything that is required to support such a use-case. Essentially these drivers create storage local to the node on which they are provisioned, and therefore controller operations and node operations must be invoked on the same node for the same volume; from there on, any workloads which need to use the volume must be scheduled on that same node as well. Even in k8s this type of driver deployment isn't fully fleshed out: kubernetes-csi/external-resizer#195. It's a relatively obscure use-case but an important one. These drivers are incredibly helpful in single-node clusters (eliminating the need for centralized storage of any kind while reaping the benefits of auto-provisioned assets), or in scenarios where performance requirements are high (think a DB cluster, etc). The request for these is initially documented here: #148

zfs-local-ephemeral-inline is exclusively k8s (it's a non-conforming csi driver) and therefore not relevant to Nomad either.

The node-common.yaml file is not tied to a specific driver but rather has details that impact the behavior of the node services for all drivers (the node service is what handles mounts, formatting, etc.). So that file has common options that technically may apply to any of the driver configs (although in some cases they may be practically useless; for example, formatting options for an NFS-based driver would never come into play).
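
To make the concatenation question concrete: as a rough, untested sketch, the file you feed the task could be as small as the following, with whatever options you want from examples/node-common.yaml appended below the driver line. I'd expect the same merged file to work for both the node and controller tasks, since options that don't apply to a given service simply never come into play.

template {
  destination = "${NOMAD_TASK_DIR}/driver-config-file.yaml"

  data = <<EOH
driver: node-manual
# append any options you want from examples/node-common.yaml here
# (mount/format behavior of the node service); for node-manual nothing
# else should be needed, since volume details arrive via the volume itself
EOH
}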

Perhaps @tgross can comment on the ability (or perhaps future ability) of the above concerns.

@scaleoutsean
Author

Very interesting, thank you for the patience and information!

Not being familiar enough with Nomad, I'm unclear whether any of the concepts map closely enough that the respective assets can be created in this manner.

Glad that I asked (after I realized I wasn't getting any closer)!

Again in this case, I'm unclear if Nomad has everything that is required to support such a use-case.

This was what I had hoped would work:

  • if Target IQNs look like iqn.2010-01.com.vendor:array-id.uniqVolName1
  • then given configuration like this:
  iscsi:
    baseiqn: "iqn.2010-01.com.vendor:array-id."
    shareStrategyTargetCli:
      - basename: "uniqVolName1"
      - basename: "uniqVolName2"
  • a discovery scan would find iqn.2010-01.com.vendor:array-id.uniqVolName1 and iqn.2010-01.com.vendor:array-id.uniqVolName2 and let us use uniqVolName1 and uniqVolName2 as volume IDs in Nomad
  • the Node service just needs to scan, mount and format the volume. The advantage over a host volume with local (host path) volumes on a single node would be (based on the local-hostpath explanation from README.md) that a job (container) could be drained on one iSCSI client and rescheduled to another, which non-CSI local hostpath couldn't do with block devices formatted with a single-host filesystem.

For iSCSI storage I think that would be valuable compared to non-CSI hostpath, if different Nodes could rescan and provision the same named volume on another Nomad client. If node-common can't work that way but there's a chance of configuring the Node to provide CSI-style local hostpath, that would still be interesting.

Meanwhile I'll also look at other modes and check the spec document to gain a better understanding of how Democratic CSI works.

@travisghansen
Member

I think I understand where you're coming from. node-manual is not likely to be the answer you're looking for. If (CSI) volumes can be manually created with Nomad then node-manual may be helpful. The approach would likely be slightly different from what you've described though... instead, an independent process (neither Nomad nor democratic-csi) could be run which would scan your storage system (using whatever criteria/wildcards/etc make sense for you) and then create the volumes inside Nomad with CSI data that follows these datapoints exactly by name: https://github.com/democratic-csi/democratic-csi/blob/master/examples/node-manual-iscsi-pv.yaml#L34-L40

I'm not sure if such a thing is even possible with Nomad however. If so, currently it would likely be of limited use as the iscsi node process has the following assumptions:

  • the target has a single lun
  • democratic-csi will entirely manage the entry for the target in the node iscsi DB

Of particular concern: upon detach, democratic-csi logs out of the target entirely. If you use other LUNs on the node from the same target, this could be extremely detrimental, as democratic-csi effectively removes access to all those other LUNs as well. I've considered adding a parameter to control this behavior, but so far it hasn't been needed and so I haven't bothered.

Your understanding of the pros/cons of local storage seems correct.

Thanks for the help sorting this out, it's really appreciated and I'm excited to see better documentation around Nomad.

@scaleoutsean
Author

scaleoutsean commented Mar 28, 2022

Got it!
I didn't know that per-target logout (iscsiadm --mode node --targetname iqn.2010-01.com.vendor:array-id.uniqVolName2 -u) isn't available. All right, we're making progress... Back to the drawing board...

Then create the volumes inside Nomad with CSI data that follows these datapoints exactly by name: https://github.com/democratic-csi/democratic-csi/blob/master/examples/node-manual-iscsi-pv.yaml#L34-L40

Yes, that's exactly the desired end state once a PV is "registered" in Nomad. In Nomad we can only specify the volume name in the volume stanza, but volume register can't pass details like the ones below to Nomad.

id    = "dcsi.267"
name  = "dcsi.267"
type  = "csi"
plugin_id    = "org.democratic-csi.node-manual"
capacity_min = "4GiB"
capacity_max = "5GiB"
portal = "192.168.1.4:3260"
iqn    = "iqn.2010-01.com.vendor:array-id.uniqVolName2" 
lun    = 0
node_attach_driver = "iscsi"

That's why I thought that if this LUN info were in the Node configuration, I'd just need id and name in nomad volume register and everything would come together. But maybe there's another way to do this.
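
For reference, Nomad's volume specification does appear to have a context block that is passed to the CSI plugin, so maybe a registration like the sketch below would carry these details. I haven't verified that node-manual actually receives them as volume context, and the external_id and capability values are just my guesses:

id          = "dcsi-267"
name        = "dcsi-267"
type        = "csi"
plugin_id   = "org.democratic-csi.node-manual"
external_id = "dcsi-267"   # guess: whatever ID the plugin should see

capability {
  access_mode     = "single-node-writer"
  attachment_mode = "file-system"
}

# guess: keys mirroring the volumeAttributes from node-manual-iscsi-pv.yaml
context {
  node_attach_driver = "iscsi"
  portal             = "192.168.1.4:3260"
  iqn                = "iqn.2010-01.com.vendor:array-id.uniqVolName2"
  lun                = "0"
}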

@travisghansen
Member

Per-target logout is available, per-LUN is not. So if you have LUNs on the target that are otherwise in use, it's problematic.

@scaleoutsean
Author

In case I use local-hostpath (as per this example), I could update all the Node jobs and do a rolling restart (rescan and resize, if necessary, could be done externally with limited downtime).

Does /var/lib/csi-local-hostpath in that example go only into the Node config, or into both the Node and Controller configs? And is the assumption one volume per Node, with the same volume on all Nodes in the cluster? In that case we'd need multiple controllers and multiple node containers for each volume - not convenient, but better than nothing.

@travisghansen
Member

For that style of deployment it's actually a single democratic-csi container per node running both the controller and node services (figure 3 from the spec document). My guess is the csi stanza in such a scenario would also need to be altered to indicate it's providing both node and controller services (if this is even supported). To do this, simply deploy the container with args:

--csi-mode=node --csi-mode=controller ...

When deployed like that, it's a single config file since it's a single process (per node). The fact that the directory is in that config file twice is just a nuance of the abstract code I used to implement that driver :( The values should be the same as indicated in the example, but they can be any path you'd like on the node.
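
As an untested sketch (adjust the image tag, csi-name and config path to your environment), the relevant part of the Nomad task might look roughly like this, assuming Nomad accepts a plugin type covering both roles:

task "democratic-csi" {
  driver = "docker"

  config {
    image      = "democraticcsi/democratic-csi:latest"
    privileged = true

    args = [
      "--csi-version=1.5.0",
      "--csi-name=org.democratic-csi.local-hostpath",
      "--driver-config-file=${NOMAD_TASK_DIR}/driver-config-file.yaml",
      "--csi-mode=node",
      "--csi-mode=controller",
      "--server-socket=/csi/csi.sock",
    ]
  }

  csi_plugin {
    id        = "org.democratic-csi.local-hostpath"
    type      = "monolith"   # if Nomad supports it; otherwise separate node/controller tasks
    mount_dir = "/csi"
  }
}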

@scaleoutsean
Author

They do support all-in-one ("monolith") plugins, so I'll give it a try. From the Nomad docs:

Monolith Plugins are plugins that perform both the controller and node roles in the same instance. Not every plugin provider has or needs a controller; that's specific to the provider implementation.
