This repository has been archived by the owner on Jan 23, 2020. It is now read-only.

[Discussion] Swarm rebuild and best way to retain data #52

djeeg commented Mar 3, 2018

Hi,

Over the last 6 months I have encountered more than a few situations where the most reliable solution seems to be to "rebuild the swarm"

For example:

  • Loss of SSH after scale set deallocation/reallocation
  • Loss of sudo after scale set reboot
  • Unable to upgrade to the latest STABLE version 17.12.0 (which has plenty of networking fixes) because the upgrade container was delayed until 17.12.1
  • Loss of swarm when restarting dockerd (due to ghost containers)
  • Switching from EDGE to STABLE channel
  • Changing the VM t-shirt size

I don't mind rebuilding the swarm, as it means I can review/refactor/clean up my configuration.

Almost all of my configuration is scripted/documented, so it's not too much effort (rough command sketch after the list):

  1. Create a new swarm from the template
  2. Assign some node metadata
  3. Create networks/volumes/secrets
  4. Deploy the stacks
  5. Update DNS to the new swarm public IP
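
In rough command form this is something like the following (just a sketch; the resource group, DNS zone, stack and label names are placeholders, and the exact template URI/parameters come from the Docker for Azure docs):

# 1. Create a new swarm from the template (ARM deployment; the portal works too)
az group deployment create \
  --resource-group docker-swarm-new \
  --template-uri https://download.docker.com/azure/stable/Docker.tmpl \
  --parameters @swarm-parameters.json

# 2. Assign some node metadata (on a manager)
docker node update --label-add tier=frontend node1

# 3. Create networks/volumes/secrets
docker network create --driver overlay appnet
docker secret create app_api_key ./app_api_key.txt

# 4. Deploy the stacks
docker stack deploy -c docker-stack.yml mystack

# 5. Update DNS to the new swarm public IP
az network dns record-set a remove-record --resource-group dns-rg --zone-name example.com \
  --record-set-name www --ipv4-address <old-swarm-public-ip>
az network dns record-set a add-record --resource-group dns-rg --zone-name example.com \
  --record-set-name www --ipv4-address <new-swarm-public-ip>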

The sticking point with a rebuild is relinking the data from the first swarm to the second swarm.
I could not find any guidance on how best to configure Azure to handle a swarm rebuild (rather than a swarm upgrade).

My naive setup of the swarm was:

Swarm v17.09

  • node1
  • nodeN
  • (default)cloudstor:azure -> docker4x/cloudstor -> azure storage account -> RANDOMSTRING123

This used the defaults provided by the template, where all of the resources live in the same resource group.

When I rebuild, I would need to preserve the data contained within the storage account RANDOMSTRING123.
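
(To double-check which storage account the default plugin instance is actually pointing at, inspecting the plugin settings on a node seems to be enough; the exact setting names may vary between cloudstor versions.)

# On any node: show the settings of the default cloudstor instance,
# which include the backing storage account name
docker plugin inspect cloudstor:azure --format '{{ .Settings.Env }}'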

Azure Storage Explorer

My first thought would be to create the new swarm and copy the data across using Azure Storage Explorer.
Transfers should be free within the same region.
Storage requirements would be doubled for a short time.
This may only work while the data size is small.
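
If the copy needs to be scripted rather than clicked through the Storage Explorer UI, something like AzCopy (recent versions) should also work; a sketch only, with the share name, destination account and SAS tokens as placeholders:

# Copy one cloudstor file share from the old storage account to the new one
azcopy copy \
  "https://RANDOMSTRING123.file.core.windows.net/somename?<source-sas>" \
  "https://<new-storage-account>.file.core.windows.net/somename?<destination-sas>" \
  --recursive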

Override cloudstor:azure

My second thought would be to create the new swarm and override the default cloudstor:azure plugin with my own.
Using https://docs.docker.com/docker-for-azure/persistent-data-volumes/#use-a-different-storage-endpoint
(I have used a separate cloudstor:azure instance/storage for backups and that seems to work okay for short-lived commands.)
I am not sure if overriding the default plugin instance is possible/stable/recommended.
There are a few issues on the forums where users are unable to (re)create the plugin (the error message is something like "offer expired").
I am hesitant about overriding anything "default", e.g. what happens if the default plugin instance needs to be changed/reset/locked down as part of a future upgrade?
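
For reference, the override I have in mind is along the lines of that doc: disable the default instance, point it at the other storage account, and re-enable it. A sketch only; the exact CLOUDSTOR_* setting names should be taken from the linked page, and the key is a placeholder:

# Repoint the existing default cloudstor:azure instance at a different storage account
docker plugin disable cloudstor:azure
docker plugin set cloudstor:azure \
  CLOUDSTOR_AZURE_STORAGE_ACCOUNT=SITENAMEDOCKERDATA \
  CLOUDSTOR_AZURE_STORAGE_ACCOUNT_KEY=<storage-account-key>
docker plugin enable cloudstor:azure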

Separate cloudstor:azure

My third thought would be to store the swarm data in a separate 'named/aliased' cloudstor:azure instance, either in the same resource group or possibly in a completely separate one.
A separate resource group feels better from an isolation perspective: it would allow me to completely purge the swarm resource group without data loss, no matter what future deployment restrictions are made on the Docker swarm template/resource group, as long as the custom cloudstor:azure plugin instance can always reach into another storage account.
Considering how quickly the platform changes, this third option seems the best.

I would then configure the swarm like this:

Swarm v17.12

  • node1
  • nodeN
  • (default)cloudstor:azure -> docker4x/cloudstor -> azure storage account -> RANDOMSTRINGABC
  • cloudstor:azuresafe -> docker4x/cloudstor -> azure storage account -> SITENAMEDOCKERDATA
  • cloudstor:azurebackup -> docker4x/cloudstor -> azure storage account -> SITENAMEDOCKERBACKUP

However, I would not allocate any volumes on the (default) cloudstor:azure plugin instance.
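
Installing the extra instances would presumably be something like this, per the same doc (sketch only; the image tag should match the swarm version, the setting names should be confirmed against the doc, and the keys are placeholders):

# Install additional cloudstor instances under their own aliases,
# each backed by its own storage account
docker plugin install --alias cloudstor:azuresafe --grant-all-permissions \
  docker4x/cloudstor:17.12.0-ce-azure1 \
  CLOUDSTOR_AZURE_STORAGE_ACCOUNT=SITENAMEDOCKERDATA \
  CLOUDSTOR_AZURE_STORAGE_ACCOUNT_KEY=<data-account-key>

docker plugin install --alias cloudstor:azurebackup --grant-all-permissions \
  docker4x/cloudstor:17.12.0-ce-azure1 \
  CLOUDSTOR_AZURE_STORAGE_ACCOUNT=SITENAMEDOCKERBACKUP \
  CLOUDSTOR_AZURE_STORAGE_ACCOUNT_KEY=<backup-account-key>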

Upgrading all my stack templates to use the new plugin instance should be straightforward:

volumes:
  volsomename:
    name: 'somename'
    driver: 'cloudstor:azuresafe'

Some questions would be:

  1. Are there any online resources recommending the best approach?
  2. Are there downsides to the third approach?
  3. Would this make upgrading harder in the future?
  4. With the upcoming changes for virtual machine scale sets and attached storage, would a separate resource group be better or worse?
  5. Would there be an extra performance hit for using a storage account in a separate resource group?