WIP: first iteration of a unique daemon image #78

Merged: 1 commit into master on Jun 15, 2015

Conversation

@leseb (Member) commented Jun 10, 2015

The idea is pretty straightforward: we simply pass a new env var and boot a monitor like this:

`sudo docker run -d --net=host -v /etc/ceph:/etc/ceph -e CEPH_DAEMON=MON -e MON_IP=192.168.0.20 -e CEPH_NETWORK=192.168.0.0/24 ceph/daemon`

So far, I've been able to successfully bootstrap MON, MDS and RGW.
I had to fix an MDS issue. Because we use `set -e`, we cannot really
trap a command's return code in a variable: the script will exit before
that, since the command potentially returns something other than 0.

For the OSD, it's probably me...

I couldn't find a better name than "daemon" for now.
Let's first discuss the implementation.
I also added some meaningful log messages for the OSD.

Signed-off-by: Sébastien Han seb@redhat.com

@leseb (Member, Author) commented Jun 10, 2015

This probably needs a bit more documentation...

@Ulexus (Contributor) commented Jun 10, 2015

Thanks @leseb. "daemon" works for me.

A few ideas:

  • CEPH_DAEMON should accept a space-delimited list of daemons to run. Yes, recommended practice is to not combine them, but (among other things), this would make your demo package easier.
  • For the two machine-bound services (mon and osd), I wonder if we could build on the OSD's autodetection concept by checking to see if directories for mon and osd services exist in standard locations. In this way, we could have the daemons start automatically based on directory structure, allowing a fully generic host service definition (i.e., "start Ceph on this server").

The bootstrapping, to me, offers an interesting quandary. If we are not instructed to start a mon, but we are not bootstrapped (no ceph config or keys), should the container simply die? Should it attempt to create a mon and bootstrap? Perhaps if there is no daemon specified, it could bootstrap a mon?
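
A rough sketch of what that directory-based detection could look like in the entrypoint (illustrative only; the exact directory layout and daemon flags would need checking):

```bash
#!/bin/bash
set -e

# Illustrative sketch: start whatever daemons already have data directories
# under the standard /var/lib/ceph layout.
start_detected_daemons() {
  for mon_path in /var/lib/ceph/mon/ceph-*; do
    [ -d "${mon_path}" ] || continue
    mon_id="$(basename "${mon_path}" | sed 's/^ceph-//')"
    echo "Found monitor data in ${mon_path}, starting mon.${mon_id}"
    ceph-mon -d -i "${mon_id}" &
  done

  for osd_path in /var/lib/ceph/osd/ceph-*; do
    [ -d "${osd_path}" ] || continue
    osd_id="$(basename "${osd_path}" | sed 's/^ceph-//')"
    echo "Found OSD data in ${osd_path}, starting osd.${osd_id}"
    ceph-osd -d -i "${osd_id}" &
  done

  wait
}
```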

@leseb (Member, Author) commented Jun 10, 2015

I was initially thinking of allowing multiple daemons to run in a single container. Later I remembered that this is not what we recommend, and the implementation might be tricky as well and could lead to undesired behaviours.
Moreover, I don't think this will help the demo container, as the demo entrypoint is really its own thing. The way services are configured there is intended for a demo and nothing else (low pg/pgp counts, hardcoded pool name, etc.).
I don't really want to mix what we consider for production usage and what we recommend as a sandbox.
This is why I'd like to keep 'daemon' and 'demo' separate. This will probably avoid confusion too.

So yes, in the end I'm more inclined to force users to run micro-service containers instead of running multiple daemons. Given that containers are really lightweight, I don't see any reason why someone would want to run more than one daemon inside a container.

There is definitely room for improvement in the OSD part, as I believe the current state is too complex and not user-friendly at all. We should think of a new design, I guess.

Regarding the bootstrapping, we could let daemons die, but a check to verify that communication can be established with a monitor is probably better. This will avoid unpleasant debugging. I can work on something.
Finally, should we bootstrap a mon if none exists? I'd say why not, but generally I'd rather return an error if the user didn't follow the proper steps, so the error provides some guidance on how to do things properly. Doing things under the hood by working around user mistakes is not really a good idea. If it fails, the user will learn why and will set it up properly.
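
For the monitor-reachability check mentioned above, a minimal sketch could be (the timeout value and wording are arbitrary):

```bash
# Sketch: fail early with a clear message if no monitor can be reached
# with the mounted /etc/ceph/ceph.conf and keyring.
check_mon_reachable() {
  if ! timeout 10 ceph health > /dev/null 2>&1; then
    echo "ERROR: cannot reach any monitor."
    echo "Bootstrap a monitor first (CEPH_DAEMON=MON) or mount a valid /etc/ceph."
    exit 1
  fi
}
```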

@Ulexus (Contributor) commented Jun 10, 2015

> Doing things under the hood by working around user mistakes is not really a good idea

That's a good way of thinking about it. Yes, I would have to agree.

You're also correct about the multiple daemons (aside from the OSD and workarounds required there...but that doesn't have any bearing on the daemon selection).

Another thing to consider with the single-container thing: we don't have to worry about backward compatibility, so we should probably reorganize the osd entrypoint, since it's rather messy. Maybe the mon, too. I've been thinking of the various ways to integrate these with etcd/confd (in a flexible and not-mandatory way).

I'll definitely say that I am presently caught up in the concept of directory-described execution... and I think we could provide a means of pulling down configs and keys (from etcd, Consul, S3, a URL, etc.).

@Ulexus (Contributor) commented Jun 10, 2015

(perhaps?) Configuration and key extraction procedure:

  1. Check for local <file> in /etc/ceph/
  2. Check for CONFIG_METHOD (defaults to none)
  3. Attempt to pull <file> via CONFIG_METHOD handler
  4. (For <file> other than ceph.conf and ceph.client.admin.keyring) Call ceph auth get-or-create
  5. Otherwise fail or bootstrap, as appropriate
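
In entrypoint terms, that precedence might look roughly like this (a sketch; the CONFIG_METHOD values, key layout and CONFIG_URL are hypothetical):

```bash
# Hypothetical sketch of the precedence described above.
get_config_file() {
  local file="$1"

  # 1. Prefer a file already present in /etc/ceph/.
  if [ -f "/etc/ceph/${file}" ]; then
    return 0
  fi

  # 2./3. Otherwise try the configured store, if any.
  # NB: with `set -e`, the caller should test this function inside an `if`.
  case "${CONFIG_METHOD:-none}" in
    etcd) etcdctl get "/ceph/${file}" > "/etc/ceph/${file}" ;;          # hypothetical key layout
    url)  curl -fsSL "${CONFIG_URL}/${file}" -o "/etc/ceph/${file}" ;;  # hypothetical CONFIG_URL
    none) return 1 ;;                                                   # 5. caller fails or bootstraps
  esac
}

# 4. Keyrings other than ceph.conf and ceph.client.admin.keyring could
#    instead be generated with `ceph auth get-or-create`.
```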

@Ulexus (Contributor) commented Jun 10, 2015

I think the current OSD detection routine is actually fairly good; it should just be simplified and better documented. (referring to the directory detection).

Bootstrapping OSDs, though, is a bit painful at the moment. Specifically, the need to create the OSD outside of the container is counter-intuitive. If the client.admin keyring is available, though, we should be able to have the script fully bootstrap an OSD (including creation of that OSD). This would come close to @hookenz's addition concept: create and mount the directory, and the container takes care of everything else.

Even better if we had a small execution wrapper instead of the current startup script, which could continually monitor the osd directory structure to add (and maybe remove) OSDs as they appear.
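
A toy version of such a wrapper, just to illustrate the idea (polling; a real one would also want to reap dead processes):

```bash
# Illustrative polling loop: start a ceph-osd for every OSD directory that
# appears under /var/lib/ceph/osd, remembering which ones are already running.
declare -A running
while true; do
  for osd_path in /var/lib/ceph/osd/ceph-*; do
    [ -d "${osd_path}" ] || continue
    osd_id="$(basename "${osd_path}" | sed 's/^ceph-//')"
    if [ -z "${running[${osd_id}]:-}" ]; then
      echo "New OSD directory detected: ${osd_path}, starting osd.${osd_id}"
      ceph-osd -i "${osd_id}" &
      running[${osd_id}]=$!
    fi
  done
  sleep 10
done
```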

@leseb (Member, Author) commented Jun 10, 2015

Just did some cleanup and added some functions to check several things (ceph.conf and admin key).

If we focus on the CONFIG_METHOD, I agree that we should be compliant with several config stores.
It looks like you want to use these stores to 'store' keys and ceph.conf.
I also think that we could use this for configuration options too, though that is actually more difficult to do.

I can try to prototype a default push config, or maybe simply rely on the monitors' config store just like @hunter suggested in #34.
I think we can start with the 'default' backend, which I'd like to call `file`, and then try to implement `ceph-monitor`.
Or we could just use the monitor store as the default store. However, this will require access to the Ceph cluster (conf and key), which ends up being a chicken-and-egg problem :-(

@leseb (Member, Author) commented Jun 10, 2015

Can we maybe first merge the daemon prototype and then in another PR I'll work on the config store backends?
I think it's just too much to do in one shot.

@leseb force-pushed the single-container branch 2 times, most recently from 9f71a2d to a517cac on June 10, 2015 at 16:58
@Ulexus (Contributor) commented Jun 10, 2015

Oh, certainly.


@hookenz (Contributor) commented Jun 10, 2015

This looks interesting.

By the way, why do you pass CEPH_DAEMON= as an environment variable and not as a command to run?
And why are you using environment variables for CEPH_NETWORK when you could read this from ceph.conf?

@hunter (Contributor) commented Jun 11, 2015

Looks like a nice approach.

Is the assumption that multiple OSDs will be run from a single container to get around the issues with inter-host communication?

@leseb (Member, Author) commented Jun 11, 2015

@hookenz mainly because we need to configure the container upfront; if we simply run the command, we need to know the id of the daemon. Does that answer your question? I'm not sure I got it right.
We use the CEPH_NETWORK var only for the monitors, while building the first monitor and its ceph.conf. Since monitors can't run without the `--net=host` flag, we need to tell the container which host network to use.
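
To illustrate why those two variables are needed up front: the very first monitor has to generate a minimal ceph.conf along these lines (a simplified sketch, not the exact file the script writes):

```bash
# Simplified sketch of the ceph.conf the first monitor has to generate.
cat > /etc/ceph/ceph.conf <<EOF
[global]
fsid = $(uuidgen)
mon initial members = $(hostname -s)
mon host = ${MON_IP}
public network = ${CEPH_NETWORK}
cluster network = ${CEPH_NETWORK}
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
EOF
```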

@hunter to be honest I'm not sure what the plan is. What I am sure of is that I'd like to refactor the OSD part and restart from scratch. For example, I think we should start using ceph-disk to bootstrap them.

@hunter (Contributor) commented Jun 11, 2015

Agreed. Since the OSDs are such a critical part of running a scalable Ceph cluster it's important that we build something that's going to be flexible for mounting, hot swapping, journals, etc. ceph-disk will probably help with a number of those things. I'll test out a few ideas when I get a spare minute.

@leseb (Member, Author) commented Jun 11, 2015

I started to play with ceph-disk inside a container; the results seem rather random. From what I observed, privileged mode is needed.
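
For reference, that kind of run would look something like this, mirroring the OSD example in the diff but with privileged mode (the device name is just an example):

```bash
# Sketch: ceph-disk needs privileged mode to partition the device and see /dev.
sudo docker run -d --privileged=true \
  -v /etc/ceph:/etc/ceph \
  -v /var/lib/ceph/:/var/lib/ceph/ \
  -v /dev/:/dev/ \
  -e CEPH_DAEMON=OSD \
  -e OSD_DEVICE=/dev/vdd \
  ceph/daemon
```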

@leseb (Member, Author) commented Jun 11, 2015

@Ulexus I just added support for bootstrap keys, since it's a best practice to use them instead of always requiring the admin key.

@leseb force-pushed the single-container branch 6 times, most recently from 153e117 to 278ed44 on June 11, 2015 at 17:47
@leseb mentioned this pull request on Jun 12, 2015
@hookenz (Contributor) commented Jun 12, 2015

I don't think you should default to starting any daemon if none is specified. That might lead to rogue or orphaned mons if you're not careful. Better to just exit with an error and some help text.
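
i.e. something as simple as this in the entrypoint (a sketch; the listed values are just the daemons discussed in this PR):

```bash
# Sketch: refuse to guess which daemon to run.
if [ -z "${CEPH_DAEMON}" ]; then
  echo "ERROR: CEPH_DAEMON is not set. Valid values: MON, OSD, MDS, RGW."
  echo "Example: docker run -e CEPH_DAEMON=MON ... ceph/daemon"
  exit 1
fi
```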

@leseb (Member, Author) commented Jun 13, 2015

@hookenz no worries I don't :)

@leseb force-pushed the single-container branch 6 times, most recently from 84e444a to af6f846 on June 15, 2015 at 12:38
@leseb (Member, Author) commented Jun 15, 2015

@Ulexus can you do a last round on this one?
I think we are good to merge it.

* Run multiple OSDs within the same container

To run multiple OSDs within the same container, simply bind-mount each OSD datastore directory:
* `docker run -v /osds/1:/var/lib/ceph/osd/ceph-1 -v /osds/2:/var/lib/ceph/osd/ceph-2`
Review comment from a Contributor on the hunk above:

Or, since we are now providing the OSD directory option as a top-level feature, the example here should probably reflect that: export the osd directory

@Ulexus (Contributor) commented Jun 15, 2015

Architecturally, I don't know that we particularly need to separate the execution commands for OSD_DEVICE and OSD_DIRECTORY. We should be able to simply check the /var/lib/ceph/osd tree on start to see if there exist OSDs in that directory, then check for OSD_DEVICE, then fail. That also means that you can bootstrap an OSD using --privileged, but on subsequent runs, have the host mount the OSD and run without --privileged.
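
In pseudo-shell, that precedence would be roughly (a sketch; the two helper functions are illustrative names, not existing code):

```bash
# Sketch of the proposed order: existing OSD directories first, then a device, then fail.
if compgen -G "/var/lib/ceph/osd/ceph-*" > /dev/null; then
  start_osds_from_directories                    # run every OSD found under /var/lib/ceph/osd
elif [ -n "${OSD_DEVICE}" ]; then
  prepare_and_start_osd_device "${OSD_DEVICE}"   # needs --privileged on the first run
else
  echo "ERROR: no OSD directories found and OSD_DEVICE is not set."
  exit 1
fi
```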

```
-v /var/lib/ceph/:/var/lib/ceph/ \
-v /dev/:/dev/ \
-e CEPH_DAEMON=OSD \
-e OSD_DEVICE=/dev/vdd \
```
Review comment from a Contributor on the hunk above:

This doesn't support multiple devices?

What about the docker run `--device` option? It's redundant with mounting /dev, but is there any point in also using it?

Reply from the Member (Author):

I've tried the `--device` option without luck. Even doing `-v /dev/vdb` didn't work. The only way for me to get it working was to use `-v /dev:/dev`...

Reply from the Contributor:

I'm guessing ceph-disk needs access to the other parts of /dev? /dev/disk/by-*

Reply from the Member (Author):

Correct. sgdisk uses the disk's UUID.

@Ulexus (Contributor) commented Jun 15, 2015

One other thing: is there any particular reason to require the daemon selection to be an environment variable instead of a parameter? Since this is the entrypoint script, docker automatically passes the image arguments on to it. Hence, we could simply execute `docker run ceph/daemon osd` instead of `docker run -e CEPH_DAEMON=OSD ceph/daemon`. The former seems more natural to me.

@leseb (Member, Author) commented Jun 15, 2015

I agree it makes more sense to run `docker run ceph/daemon osd`; unfortunately I don't know how to do this :) Assistance required on this.
Is it simply a `$1`?

@Ulexus (Contributor) commented Jun 15, 2015

Ah, no problem. I say we go ahead and merge this but not publish it to Docker Hub yet. Then we can work on it... WIP, after all.
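
For what it's worth, the parameter approach discussed above would boil down to something like this at the top of the entrypoint (a sketch; the start_* helpers are illustrative names):

```bash
# Sketch: take the daemon type from the first argument, falling back to CEPH_DAEMON.
CEPH_DAEMON="${1:-${CEPH_DAEMON}}"

case "$(echo "${CEPH_DAEMON}" | tr '[:upper:]' '[:lower:]')" in
  mon) start_mon ;;
  osd) start_osd ;;
  mds) start_mds ;;
  rgw) start_rgw ;;
  *)
    echo "ERROR: unknown or missing daemon type '${CEPH_DAEMON}'"
    exit 1
    ;;
esac
```

With that in place, both `docker run ceph/daemon osd` and `docker run -e CEPH_DAEMON=OSD ceph/daemon` would work.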

leseb added a commit that referenced this pull request Jun 15, 2015
WIP: first iteration of a unique daemon image
@leseb merged commit 5bb877d into master on Jun 15, 2015
* use `OSD_DIRECTORY` where you specify an OSD mount point to your container


### Ceph disk ###
Review comment from a Contributor on the hunk above:

It seems the Ceph disk mode is for single use only? It looks like it checks the partition and exits (unless a zap is needed). Should it be documented that directory mode should be used on subsequent runs?

Reply from the Member (Author):

Yes, we should clarify that preparation steps are needed before running the container and exposing the OSD directory.

leseb added a commit that referenced this pull request Jun 17, 2015
Since #78, we introduced the daemon container that centralizes all the
Ceph daemons instead of having separate container images. We now only
have one.

Signed-off-by: Sébastien Han <seb@redhat.com>
leseb added commits with the same message that referenced this pull request on Jul 21, 2015, Oct 16, 2015 and Dec 3, 2015
@leseb deleted the single-container branch on December 17, 2015
mkkie pushed a commit to mkkie/ceph-container that referenced this pull request Nov 1, 2017
Docker under v1.12 can't build with symbolic link file