WIP: first iteration of a unique daemon image #78
This probably needs a bit more documentation...
Thanks @leseb. "daemon" works for me. A few ideas:
The bootstrapping, to me, offers an interesting quandary. If we are not instructed to start a |
I was initially thinking of allowing multiple daemons to run in a single container. Later I remembered that this is not what we recommend. The implementation might be tricky as well, and could lead to undesired behaviours. So yes, in the end I'm more inclined to force users to run microservice containers instead of running multiple daemons. Given that containers are really lightweight, I don't see any reason why someone would want to run more than one daemon inside a container. There is definitely room for improvement for the OSD part, as I believe the current state is too complex and not user-friendly at all. We should think of a new design, I guess. Regarding the bootstrapping, we could let daemons die, but a check to verify that communication can be established with a monitor is probably better. This will avoid unpleasant debugging. I can work on something.
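A minimal sketch of the kind of monitor check mentioned above, assuming the entrypoint is a shell script. The function name `wait_for_mon` and its arguments are hypothetical and not part of this PR; the probe command is passed in by the caller so that the retry logic stays independent of how reachability is actually tested.

```shell
# Hypothetical helper (not part of this PR): retry a probe command until it
# succeeds or the attempts run out.
# Usage: wait_for_mon <attempts> <delay_seconds> <probe command...>
wait_for_mon() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    # The probe runs as an `if` condition, so a failure does not trip `set -e`.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # No point in sleeping after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "ERROR: no monitor reachable after ${attempts} attempt(s)" >&2
  return 1
}
```

In an entrypoint this could be called as `wait_for_mon 5 2 timeout 10 ceph -s` before starting a non-monitor daemon, assuming the `ceph` CLI and a keyring it can use are available inside the container.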
That's a good way of thinking about it. Yes, I would have to agree. You're also correct about the multiple daemons (aside from the OSD and workarounds required there...but that doesn't have any bearing on the daemon selection). Another thing to consider with the single-container thing: we don't have to worry about backward compatibility, so we should probably reorganize the osd entrypoint, since it's rather messy. Maybe the mon, too. I've been thinking of the various ways to integrate these with etcd/confd (in a flexible and not-mandatory way). I'll definitely say that I am presently caught up with the concept of directory-described execution... and I think we could definitely provide means of pulling down configs and keys (from etcd, consul, S3, URL, etc.).
(perhaps?) Configuration and key extraction procedure:
|
I think the current OSD detection routine is actually fairly good; it should just be simplified and better documented. (referring to the directory detection). Bootstrapping OSDs, though, is a bit painful, at the moment. Specifically, the need to create the OSD outside of the container is counter-intuitive. If the client.admin keyring is available, though, we should be able to have the script fully bootstrap an OSD (including creation of that OSD). This would come close to @hookenz addition concept: create and mount the directory, and the container takes care of everything else. Even better if we had a small execution wrapper instead of the current startup script, which could continually monitor the osd directory structure to add (and maybe remove) OSDs as they appear. |
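To illustrate the directory detection idea, here is a rough sketch of a helper that lists the OSD ids found under a base path. It assumes the `<cluster>-<id>` directory naming shown elsewhere in this PR (for example `/var/lib/ceph/osd/ceph-1`); the function name `list_osd_ids` is made up for the example and is not the existing detection routine.

```shell
# Hypothetical helper: print the numeric ids of OSD directories under a base path.
# Assumes directories are named <cluster>-<id>, e.g. /var/lib/ceph/osd/ceph-1.
list_osd_ids() {
  local base="${1:-/var/lib/ceph/osd}" cluster="${2:-ceph}" dir id
  for dir in "$base/$cluster"-*; do
    # Skips plain files, and the unexpanded glob when nothing matches.
    [ -d "$dir" ] || continue
    id=${dir#"$base/$cluster-"}
    case "$id" in
      ''|*[!0-9]*) continue ;;   # ignore anything that is not a numeric id
    esac
    echo "$id"
  done
}
```

A small wrapper could call this periodically and compare the result with the OSDs it has already started, which is one way to approach the "add OSDs as they appear" idea.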
Just did some cleanup and added some functions to check several things ( If we focus on the I can try to prototype a default push config, or maybe simply rely on the monitors' store, just like @hunter suggested in #34.
Can we maybe first merge the daemon prototype and then in another PR I'll work on the config store backends?
9f71a2d to a517cac
Oh, certainly. On Wed, Jun 10, 2015, 12:48 Leseb notifications@github.com wrote:
|
This looks interesting. By the way, why do you pass CEPH_DAEMON= as an environment variable and not as a command to run?
Looks like a nice approach. Is the assumption that multiple OSDs will be run from a single container to get around the issues with inter-host communication?
@hookenz mainly because we need to configure the container upfront; if we simply run the command, we need to know the id of the daemon. Does that answer your question? I'm not sure I understood it correctly. @hunter to be honest, I'm not sure what the plan is. However, what I am sure of is that I'd like to refactor the OSD part and restart from scratch. For example, I think we should start using ceph-disk to bootstrap them.
Agreed. Since the OSDs are such a critical part of running a scalable Ceph cluster it's important that we build something that's going to be flexible for mounting, hot swapping, journals, etc. ceph-disk will probably help with a number of those things. I'll test out a few ideas when I get a spare minute.
I started to play with ceph-disk inside a container, results seem to be rather random. From what I observed privileged mode is needed.
@Ulexus I just added support for bootstrap keys, since it's part of the best practices to use them instead of always requiring the admin key.
153e117 to 278ed44
I don't think you should default to starting any daemon if none is specified. That might lead to rogue or orphaned mons if you're not careful. Better to just exit with an error and some help text.
@hookenz no worries I don't :) |
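For illustration, the "never default, exit with help text" behaviour could look roughly like the sketch below. It accepts the daemon either as the first argument or through `CEPH_DAEMON`, since both were discussed; the function name `select_daemon` and the exact messages are hypothetical and not taken from the PR's script.

```shell
# Hypothetical entrypoint fragment: pick the daemon from the first argument,
# falling back to the CEPH_DAEMON environment variable. Never default to one.
select_daemon() {
  local requested="${1:-${CEPH_DAEMON:-}}"
  # Accept MON, mon and Mon alike.
  requested=$(printf '%s' "$requested" | tr '[:upper:]' '[:lower:]')
  case "$requested" in
    mon|osd|mds|rgw)
      echo "$requested"
      ;;
    *)
      echo "ERROR: no valid daemon specified (got '${requested}')." >&2
      echo "Usage: pass one of mon, osd, mds, rgw as the first argument or via CEPH_DAEMON." >&2
      return 1
      ;;
  esac
}
```

With this shape, an unset or misspelled value makes the container exit immediately with a usage message instead of silently starting a monitor.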
84e444a to af6f846
@Ulexus can you do a last round on this one?
* Run multiple OSDs within the same container

To run multiple OSDs within the same container, simply bind-mount each OSD datastore directory:
* `docker run -v /osds/1:/var/lib/ceph/osd/ceph-1 -v /osds/2:/var/lib/ceph/osd/ceph-2`
Or, since we are now providing the OSD directory option as a top-level feature, the example here should probably reflect that: export the osd directory
Architecturally, I don't know that we particularly need to separate the execution commands for |
-v /var/lib/ceph/:/var/lib/ceph/ \
-v /dev/:/dev/ \
-e CEPH_DAEMON=OSD \
-e OSD_DEVICE=/dev/vdd \
This doesn't support multiple devices?
What about the docker run 'devices' option? Redundant with mounting /dev but any point in also using?
I've tried the `--device` option without luck. Even doing `-v /dev/vdb` didn't work. The only way for me to get it working was to use `-v /dev:/dev`...
I'm guessing ceph-disk needs access to the other parts of /dev? /dev/disk/by-*
Correct. `sgdisk` uses the disk's UUID.
The idea is pretty straightforward: we simply pass a new env var and boot a monitor like this: `sudo docker run -d --net=host -v /etc/ceph:/etc/ceph -e CEPH_DAEMON=MON -e MON_IP=192.168.0.20 -e CEPH_NETWORK=192.168.0.0/24 ceph/daemon` So far, I've been able to successfully bootstrap MON, MDS and RGW. I had to fix an MDS issue. Because we use `set -e`, we cannot really capture a command's return code in a variable: the script will exit before that, since the command potentially returns something other than 0. For the OSD, it's probably me... I couldn't find a better name than "daemon" for now. Let's first discuss the implementation. I also added some meaningful log messages for the OSD. Signed-off-by: Sébastien Han <seb@redhat.com>
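As an aside on the `set -e` point in the commit message: one common shell idiom for keeping a return code under `set -e` is to make the command part of an `||` list, which errexit does not act on. The sketch below shows that idiom; the helper name `run_and_capture` is hypothetical, and this is not necessarily how the MDS issue was fixed in this PR.

```shell
set -e

# Hypothetical helper: run a command and print its exit status without
# letting `set -e` abort the script. A failure on the left-hand side of
# `||` does not trigger errexit, so rc can record it.
run_and_capture() {
  local rc=0
  "$@" || rc=$?
  echo "$rc"
}

# Example use: branch on the status instead of dying on it.
status=$(run_and_capture sh -c 'exit 3')
if [ "$status" -ne 0 ]; then
  echo "command failed with status ${status}, continuing" >&2
fi
```

When only success or failure matters, testing the command directly with `if ! some_command; then ...; fi` has the same effect, because commands used as an `if` condition are also exempt from `set -e`.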
One other thing: is there any particular reason to require the execution be an environment variable instead of a parameter? If we are calling this the entrypoint script, docker automatically passes the arguments to its execution on. Hence, we could simply execute |
I agree it makes more sense to run |
Ah, no problem. I say we go ahead and merge this but not publish it to Docker Hub yet. Then we can work on it... WIP, after all.
* use `OSD_DIRECTORY` where you specify an OSD mount point to your container

### Ceph disk ###
It seems Ceph disk is for single use only? Looks like it checks the partition and exits (unless a zap is needed). Should it be documented that the directory mode should be used on following runs?
Yes we should clarify that preparation steps are needed before running the container and exposing the OSD directory.
Since #78, we introduced the daemon container that centralizes all the Ceph daemons instead of having separate container images. We now only have one. Signed-off-by: Sébastien Han <seb@redhat.com>
Docker under v1.12 can't build with symbolic link file