installer: added coreos.inst.callhome_url option #25

Closed
wants to merge 1 commit into from

@lzap

@lzap lzap commented May 3, 2019

Implements a simple callback mechanism for provisioning management software (like Foreman or Red Hat Satellite) to inform the system to exit build mode and reconfigure TFTP/PXE services.

Issue: #21
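
For illustration, here is a minimal sketch of how an option such as coreos.inst.callhome_url could be extracted from a kernel command line. The helper name parse_callhome_url and the example URL are assumptions made for this sketch; the patch itself may rely on dracut's own argument helpers instead.

```sh
# Hypothetical helper: print the value of coreos.inst.callhome_url found in
# the kernel command line string passed as $1. Exits non-zero if it is absent.
# The function body is a subshell so that `set -f` does not leak to the caller.
parse_callhome_url() (
    set -f  # disable globbing so '?' or '*' inside the URL stays literal
    for arg in $1; do
        case "$arg" in
            coreos.inst.callhome_url=*)
                printf '%s\n' "${arg#coreos.inst.callhome_url=}"
                exit 0
                ;;
        esac
    done
    exit 1
)
```

In the initramfs the input would typically be the contents of /proc/cmdline, for example `parse_callhome_url "$(cat /proc/cmdline)"`.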

CALLHOME_URL=$(cat /tmp/callhome_url)
if [ ! -z "$CALLHOME_URL" ]
echo "Calling home: $CALLHOME_URL" >> /tmp/debug
curl -skf -o /dev/null "$CALLHOME_URL" >/tmp/debug 2>&1

@cgwalters

cgwalters May 3, 2019
Member

Would it be helpful to add any other data here? And relatedly, make this a POST rather than a GET? Something like:

(for x in /sys/class/net/*; do
  # skip entries that do not expose a hardware address
  if [ ! -f "$x/address" ]; then continue; fi
  echo "$x $(cat "$x/address")"
done) > /tmp/net-devices.txt
curl -F coreos.inst=1 -F net-devices=@/tmp/net-devices.txt -skf "$CALLHOME_URL"

or so?

@lzap

lzap May 3, 2019
Author

Cheers Colin! The POST is a great point. We actually had GET for years and I only fixed this a few months ago, so our endpoint now accepts the more correct POST. Let me fix that.

For the data, it's actually too late to provide any info to a provisioning management system, because at this point everything was already fed in and the initial configuration was done. At least in all our workflows, we heavily rely on configuration management systems like Ansible or Puppet, which take over on the initial boot and talk to our system in a different way (gathering facts and stuff).

However, it's a good point. Our endpoint actually accepts two calls, "finished" and "failed", where the latter accepts arbitrary text (a log file) which is stored for later use. However, this would require providing two URLs: one for success and one for the failed case. This would probably be too complex. We were doing fine with just a simple "call home, we are done" call; errors can always be investigated later on.

I am going to change the call to POST then.

@lzap

lzap May 3, 2019
Author

Rebased, how about this? Sending the /tmp/debug file in the POST body. I can modify our "success" endpoint to store it in the database or at least log it.

@lzap lzap force-pushed the lzap:callhome branch from 53e2c6a to 75e554c May 3, 2019
if [ ! -z "$CALLHOME_URL" ]
then
echo "preset call home url to $CALLHOME_URL" >> /tmp/debug
echo $CALLHOME_URL >> /tmp/callhome_url

@ekohl

ekohl May 3, 2019

I'd quote this just to be sure. It can contain & which can break in interesting ways.

Suggested change
echo $CALLHOME_URL >> /tmp/callhome_url
echo "$CALLHOME_URL" >> /tmp/callhome_url
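
As a small illustration of why the quoted form is safer: an unquoted expansion goes through field splitting and pathname expansion before echo sees it, while a quoted one is passed through unchanged. The value below is a made-up string with two spaces, chosen only to make the difference visible.

```sh
# Hypothetical value with two consecutive spaces (not a real URL).
value='first  second'

unquoted=$(echo $value)     # split into two words, re-joined with one space
quoted=$(echo "$value")     # passed through as a single, unmodified word

echo "unquoted: [$unquoted]"
echo "quoted:   [$quoted]"
```

With the default IFS, the unquoted form loses one of the two spaces, while the quoted form preserves the value exactly.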

@lzap

lzap May 16, 2019
Author

Very nice catch.

############################################################
call_home() {
CALLHOME_URL=$(cat /tmp/callhome_url)
if [ ! -z "$CALLHOME_URL" ]

@ekohl

ekohl May 3, 2019

For my understanding: isn't this equivalent to if [ -n "$CALLHOME_URL" ]?

@lzap

lzap May 16, 2019
Author

I copy-pasted this, this is my very first dracut patch and I am really not sure if this is being executed by GNU bash or something else. So keeping.
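
For reference, both -z and -n are standard primaries of the POSIX test utility, so the two spellings should behave the same in any POSIX-style shell, not only GNU bash. A quick sketch to compare them side by side (check_both is a name invented for this example):

```sh
# Hypothetical helper: evaluate both spellings on the same argument and
# print the two results so they can be compared.
check_both() {
    if [ ! -z "$1" ]; then a=nonempty; else a=empty; fi
    if [ -n "$1" ]; then b=nonempty; else b=empty; fi
    echo "$a $b"
}

check_both ''
check_both 'http://foreman.example/built'
```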

CALLHOME_URL=$(cat /tmp/callhome_url)
if [ ! -z "$CALLHOME_URL" ]
echo "Calling home: $CALLHOME_URL" >> /tmp/debug
curl -skf -o /dev/null --data-binary @/tmp/debug "$CALLHOME_URL" >/tmp/debug 2>&1

@arithx

arithx May 3, 2019
Contributor

Is there a particular reason for this to be an insecure request?

As well, why the silent failure? If a user has specified a call home URL it seems like not being able to dial home is a potentially critical failure.

@lzap

lzap May 16, 2019
Author

Is there a particular reason for this to be an insecure request?

Yes, there is. Where would I get a server certificate from? Or is there a mechanism to inject/embed certificates in the initramdisk? If so, I will happily remove it; however, this would force users to rebuild the initramdisk every time. Pulling it from the network does not make any sense to me.

As well, why the silent failure?

This was a copy-paste error. No particular reason.
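
One possible shape for a non-silent variant is sketched below: curl's exit status is checked and a failure is reported to the caller. This is only a sketch, not the code from the patch. The CURL override variable is an assumption added so the function can be exercised without a network; /tmp/debug is the log file used elsewhere in this PR.

```sh
# Sketch only: POST the debug log to the callback URL and report failures.
# CURL can be overridden (hypothetical knob) to stub out the network call.
call_home() {
    url=$1
    if [ -z "$url" ]; then
        return 0  # no callback configured, nothing to do
    fi
    if ! "${CURL:-curl}" --fail --silent --show-error --output /dev/null \
            --data-binary @/tmp/debug "$url"; then
        echo "call home to $url failed" >&2
        return 1
    fi
}
```

Returning a non-zero status leaves it to the caller to decide whether a failed callback should abort the installation or only be logged.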

@cgwalters

cgwalters May 16, 2019
Member

Not easily. However, on a related topic for OpenShift 4 the installer sets things up so that the Ignition served to the device is just a (URL, cert) pair. See e.g. openshift/machine-config-operator#759
However, we can't quite fit a cert into a kernel command line.

Maybe what we should do is have e.g. coreos.inst.callhome_cert_fingerprint and trust the cert matching that fingerprint. But eh, don't need to do it in this patch.

@lzap

lzap May 16, 2019
Author

I'll do what you say, however I also lean towards just waiving this.

@arithx

arithx May 16, 2019
Contributor

I was more thinking along the lines of forcing the user to specify HTTP endpoints to convey that the call is insecure rather than accepting HTTPS endpoints and ignoring cert failures.
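
A check along those lines could look roughly like the following sketch, which accepts only URLs that start with http:// and rejects everything else, including https://. The function name is_plain_http_url is an assumption made for this example.

```sh
# Hypothetical check: succeed only for plain-HTTP URLs, so the lack of
# transport security is visible in the URL the user supplies.
is_plain_http_url() {
    case "$1" in
        http://?*) return 0 ;;
        *) return 1 ;;
    esac
}
```

The installer could then refuse other schemes up front instead of accepting HTTPS and ignoring certificate errors.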

@lzap lzap force-pushed the lzap:callhome branch from 75e554c to d353663 May 16, 2019
@lzap
Author

@lzap lzap commented May 16, 2019

I rebased, however I am keeping -k for now until we confirm.

@lzap lzap force-pushed the lzap:callhome branch from d353663 to f7164f9 May 16, 2019
@lzap
Author

@lzap lzap commented May 16, 2019

Resolved the two comments, thanks.

@cgwalters
Member

@cgwalters cgwalters commented May 23, 2019

Looks like this needs a rebase 🏄‍♂️

@lzap lzap force-pushed the lzap:callhome branch from f7164f9 to a3c0cc3 May 31, 2019
@lzap
Author

@lzap lzap commented May 31, 2019

Sure, done.

CALLHOME_URL=$(cat /tmp/callhome_url)
if [ ! -z "$CALLHOME_URL" ]
echo "Calling home: $CALLHOME_URL" >> /tmp/debug
curl -k -o /dev/null --data-binary @/tmp/debug "$CALLHOME_URL" >>/tmp/debug 2>&1

@ekohl

ekohl May 31, 2019

I always like using long versions in scripts to see what's going on. So --insecure instead of -k

@lzap
Author

@lzap lzap commented Jun 12, 2019

For the record, we agreed that adding more code to the initramfs is not the best approach, but we failed to figure out something better. Let's have another meeting and look at a different solution, probably using coreos-installer directly from a discovered node. I am keeping this open for now.

@dustymabe
Member

@dustymabe dustymabe commented Jun 12, 2019

Thanks @lzap for updating us here.

I am keeping this open for now.

+1

@bgilbert
Member

@bgilbert bgilbert commented Jun 21, 2019

coreos/fedora-coreos-tracker#203 proposes a solution that could be implemented in the medium term.

@fsperling

@fsperling fsperling commented Jul 31, 2019

We'd really like to see this feature as well.

@StykMartin

@StykMartin StykMartin commented Jul 31, 2019

Same for Beaker project

@miabbott
Contributor

@miabbott miabbott commented Aug 27, 2019

For the record, we agreed that adding more code to the initramfs is not the best approach, but we failed to figure out something better. Let's have another meeting and look at a different solution, probably using coreos-installer directly from a discovered node. I am keeping this open for now.

It's been over two months since we had the original meeting, but no follow-up yet. Additionally, there is growing interest in provisioning FCOS/RHCOS in Beaker and this patch would get us closer to that goal.

Do we want to stick with the original plan and meet again to find a better solution? Or could we merge this as is and iterate towards a more complete solution?

@bgilbert
Member

@bgilbert bgilbert commented Aug 28, 2019

For Fedora CoreOS we're going to prioritize live PXE (coreos/fedora-coreos-tracker#105), which will allow running the installer from a full system instead of the initramfs. This will enable arbitrary phone-home functionality via an Ignition config that specifies arbitrary systemd service units to run before/after the installer. That approach seems better than hardcoding a specific callback which runs in initramfs context. RHEL CoreOS should eventually pick up these changes as well.

@cgwalters
Member

@cgwalters cgwalters commented Aug 28, 2019

For Fedora CoreOS we're going to prioritize live PXE (coreos/fedora-coreos-tracker#105), which will allow running the installer from a full system instead of the initramfs.

Basically, this would be a new installer, and hence require its own different documentation, correct?

I agree with implementing it; although I wonder if for RHCOS we could just do a wholesale switch (probably? in practice maybe we ship both for a release?)

But why not ship the obvious trivial enhancement for the existing code now?

@bgilbert
Member

@bgilbert bgilbert commented Aug 28, 2019

Basically, this would be a new installer, and hence require its own different documentation, correct?

Nope. It'd be the same installer code, respecting the same kargs. It'd just run in a live system rather than the initramfs. For complex cases, users would be able to pass an Ignition config to the live system, and for existing cases, the installer would be backward-compatible. The only docs change should be the name of the boot image.

in practice maybe we ship both for a release?

Could do, if desired.

But why not ship the obvious trivial enhancement for the existing code now?

The problem is that 1. every user will likely want slightly different functionality (HTTP GET or POST? what metadata should be sent and how should it be formatted?), and 2. once we add the argument we'll have to support it forever.

@ashcrow
Member

@ashcrow ashcrow commented Jan 20, 2020

Bump

@bgilbert
Member

@bgilbert bgilbert commented Jan 20, 2020

Fedora CoreOS now runs the installer from the real root, in the live ISO/PXE image. To call home, the live image is itself booted with an Ignition config, which configures additional systemd services that run before/after the installer as desired. Those services can use all the facilities of the real root, and can perform arbitrary custom logic.
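
As a rough illustration of that approach, the sketch below prints a systemd unit that could be embedded in the live image's Ignition config to POST to a callback URL once the installer has finished. The unit names coreos-installer.service and coreos-installer.target, the unit layout, and the example URL are assumptions here and should be checked against the Fedora CoreOS documentation for the release in use. render_callhome_unit is a name invented for this example.

```sh
# Hypothetical generator: print a oneshot unit that calls home after the
# installer has run. $1 is the callback URL. Note that systemd treats '%'
# and '$' specially in ExecStart, so such characters would need escaping.
render_callhome_unit() {
    cat <<EOF
[Unit]
Description=Call home after installation
# Assumed unit names; verify them against the live image in use.
After=coreos-installer.service network-online.target
Before=coreos-installer.target
Wants=network-online.target

[Service]
Type=oneshot
ExecStart=/usr/bin/curl --fail --silent --show-error --output /dev/null --request POST $1

[Install]
RequiredBy=coreos-installer.target
EOF
}
```

Because the service runs in the real root rather than the initramfs, it can use ordinary CA-verified HTTPS and any payload format the provisioning system expects.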

RHEL CoreOS doesn't support that functionality yet, but we should get it ported over. Meanwhile, if you're feeling adventurous, it should be possible to use Fedora CoreOS to install RHEL CoreOS. 🙂

I'll close this. It has too many caveats to support in the long term.

@bgilbert bgilbert closed this Jan 20, 2020
@bgilbert
Member

@bgilbert bgilbert commented Jan 22, 2020

I've confirmed that the Fedora CoreOS live image + installer can successfully install RHCOS with:

qemu-system-x86_64 -machine accel=kvm -m 2048 \
    -netdev user,id=eth0,hostfwd=tcp::2222-:22,hostname="live" \
    -device virtio-net-pci,netdev=eth0 \
    -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0 \
    -kernel fedora-coreos-31.20200113.3.1-live-kernel-x86_64 \
    -initrd fedora-coreos-31.20200113.3.1-live-initramfs.x86_64.img \
    -hda rhcos-install.qcow2 \
    -append 'rd.neednet=1 ip=dhcp ignition.platform.id=qemu coreos.inst.install_dev=/dev/sda coreos.inst.image_url=http://10.0.2.2:8080/rhcos-44.81.202001220953.0-metal.x86_64.raw.gz coreos.inst.ignition_url=http://10.0.2.2:8080/sshkey.ign coreos.inst.insecure'