How is it all connected

Hippie Hacker edited this page Jun 23, 2016 · 6 revisions

Overview

You can install tshark and follow the DHCP, TFTP, and API (HTTP) traffic to Hanlon.

On the hanlon/tftp host:

tshark -Y 'bootp or tftp or http' -f 'port 67 or port 68 or port 69 or port 8026' -i eth0

Full tcpdump w/ Screenshots

  1   0.000000      0.0.0.0 -> 255.255.255.255 DHCP 590 DHCP Discover - Transaction ID 0x9097d478
  2   0.000291    1.1.1.250 -> 255.255.255.255 DHCP 342 DHCP Offer    - Transaction ID 0x9097d478
  3   0.000463      1.1.1.6 -> 255.255.255.255 DHCP 419 DHCP Offer    - Transaction ID 0x9097d478
  4   2.000472      0.0.0.0 -> 255.255.255.255 DHCP 590 DHCP Request  - Transaction ID 0x9097d478
  5   2.000631    1.1.1.250 -> 255.255.255.255 DHCP 342 DHCP ACK      - Transaction ID 0x9097d478

Intel PXE

Hanlon Boot Menu

  6   7.220480      1.1.1.1 -> 1.1.1.6      TFTP 69 Read Request, File: pxelinux.0, Transfer type: octet, tsize\000=0\000
  7   7.221381      1.1.1.1 -> 1.1.1.6      TFTP 74 Read Request, File: pxelinux.0, Transfer type: octet, blksize\000=1456\000
  8   7.278234      1.1.1.1 -> 1.1.1.6      TFTP 121 Read Request, File: pxelinux.cfg/00000000-0000-0000-0000-00259097d478, Transfer type: octet, tsize\000=0\000, blksize\000=1408\000
  9   7.279156      1.1.1.1 -> 1.1.1.6      TFTP 105 Read Request, File: pxelinux.cfg/01-00-25-90-97-d4-78, Transfer type: octet, tsize\000=0\000, blksize\000=1408\000
 10   7.280216      1.1.1.1 -> 1.1.1.6      TFTP 93 Read Request, File: pxelinux.cfg/01010101, Transfer type: octet, tsize\000=0\000, blksize\000=1408\000
 11   7.281290      1.1.1.1 -> 1.1.1.6      TFTP 92 Read Request, File: pxelinux.cfg/0101010, Transfer type: octet, tsize\000=0\000, blksize\000=1408\000
 12   7.282250      1.1.1.1 -> 1.1.1.6      TFTP 91 Read Request, File: pxelinux.cfg/010101, Transfer type: octet, tsize\000=0\000, blksize\000=1408\000
 13   7.283270      1.1.1.1 -> 1.1.1.6      TFTP 90 Read Request, File: pxelinux.cfg/01010, Transfer type: octet, tsize\000=0\000, blksize\000=1408\000
 14   7.284294      1.1.1.1 -> 1.1.1.6      TFTP 89 Read Request, File: pxelinux.cfg/0101, Transfer type: octet, tsize\000=0\000, blksize\000=1408\000
 15   7.285284      1.1.1.1 -> 1.1.1.6      TFTP 88 Read Request, File: pxelinux.cfg/010, Transfer type: octet, tsize\000=0\000, blksize\000=1408\000
 16   7.286219      1.1.1.1 -> 1.1.1.6      TFTP 87 Read Request, File: pxelinux.cfg/01, Transfer type: octet, tsize\000=0\000, blksize\000=1408\000
 17   7.287174      1.1.1.1 -> 1.1.1.6      TFTP 86 Read Request, File: pxelinux.cfg/0, Transfer type: octet, tsize\000=0\000, blksize\000=1408\000
 18   7.288151      1.1.1.1 -> 1.1.1.6      TFTP 92 Read Request, File: pxelinux.cfg/default, Transfer type: octet, tsize\000=0\000, blksize\000=1408\000
 19   7.289176      1.1.1.1 -> 1.1.1.6      TFTP 80 Read Request, File: menu.c32, Transfer type: octet, tsize\000=0\000, blksize\000=1408\000
 20   7.308579      1.1.1.1 -> 1.1.1.6      TFTP 92 Read Request, File: pxelinux.cfg/default, Transfer type: octet, tsize\000=0\000, blksize\000=1408\000
 21  12.502984      1.1.1.1 -> 1.1.1.6      TFTP 81 Read Request, File: ipxe.lkrn, Transfer type: octet, tsize\000=0\000, blksize\000=1408\000
 22  12.592271      1.1.1.1 -> 1.1.1.6      TFTP 83 Read Request, File: hanlon.ipxe, Transfer type: octet, tsize\000=0\000, blksize\000=1408\000

 23  13.719624      0.0.0.0 -> 255.255.255.255 DHCP 444 DHCP Discover - Transaction ID 0x14cbee33
 24  13.720643      1.1.1.6 -> 255.255.255.255 DHCP 419 DHCP Offer    - Transaction ID 0x14cbee33
 25  13.720872      0.0.0.0 -> 255.255.255.255 DHCP 456 DHCP Request  - Transaction ID 0x14cbee33
 28  21.676904      1.1.1.1 -> 1.1.1.6      HTTP 326 GET /hanlon/api/v1/boot?uuid=00000000-0000-0000-0000-00259097d478&mac_id=00%3A25%3A90%3A97%3Ad4%3A78_00%3A25%3A90%3A97%3Ad4%3A79______&dhcp_mac=00%3A25%3A90%3A97%3Ad4%3A78 HTTP/1.1 
 30  21.878985      1.1.1.6 -> 1.1.1.1      HTTP 586 HTTP/1.1 200 OK 
 32  21.885853      1.1.1.1 -> 1.1.1.6      HTTP 218 GET /hanlon/api/v1/image/mk/2YAL74TcnUjaO6k8fqr6tX/boot/vmlinuz HTTP/1.1 
2916  22.207271      1.1.1.6 -> 1.1.1.1      HTTP 2063 HTTP/1.1 200 OK  (application/octet-stream)
2950  22.217342      1.1.1.1 -> 1.1.1.6      HTTP 217 GET /hanlon/api/v1/image/mk/2YAL74TcnUjaO6k8fqr6tX/boot/initrd HTTP/1.1 
24636  24.728013      1.1.1.6 -> 1.1.1.1      HTTP 11481 HTTP/1.1 200 OK  (application/octet-stream)

cloud-init

RancherOS v0.4.4 started

24671  41.304284      0.0.0.0 -> 255.255.255.255 DHCP 402 DHCP Discover - Transaction ID 0x8d491d0
24672  41.321278      0.0.0.0 -> 255.255.255.255 DHCP 412 DHCP Request  - Transaction ID 0x8d491d0
24676  41.736168      1.1.1.1 -> 1.1.1.6      HTTP 217 GET /hanlon/api/v1/image/mk/2YAL74TcnUjaO6k8fqr6tX/cloud-config HTTP/1.1 
24679  41.753816      1.1.1.6 -> 1.1.1.1      HTTP 1330 HTTP/1.1 200 OK 
24682  41.754587      1.1.1.1 -> 1.1.1.6      HTTP 217 GET /hanlon/api/v1/image/mk/2YAL74TcnUjaO6k8fqr6tX/cloud-config HTTP/1.1 
24684  41.764903      1.1.1.6 -> 1.1.1.1      HTTP 1330 HTTP/1.1 200 OK 
24689  43.085386      0.0.0.0 -> 255.255.255.255 DHCP 402 DHCP Discover - Transaction ID 0x5f25396a
24690  43.085844      0.0.0.0 -> 255.255.255.255 DHCP 412 DHCP Request  - Transaction ID 0x5f25396a
24694  45.637673      1.1.1.1 -> 1.1.1.6      HTTP 206 GET /hanlon/api/v1/image/mk/2YAL74TcnUjaO6k8fqr6tX/hanlon-mk-image.tar HTTP/1.1 
58327  52.532322      1.1.1.6 -> 1.1.1.1      HTTP 1067 HTTP/1.1 200 OK  (application/octet-stream)
# ssh rancher@1.1.1.1
rancher@1.1.1.1's password: 
[rancher@rancher ~]$ cat /opt/rancher/bin/start-mk.sh 
#!/bin/bash

# download Microkernel image from Hanlon server
cd /tmp
wget http://1.1.1.6:8026/hanlon/api/v1/image/mk/2YAL74TcnUjaO6k8fqr6tX/hanlon-mk-image.tar
# wait until docker daemon is running
prev_time=0
sleep_time=1
while true; do
  # break out of loop if docker daemon is in process table
  ps aux | grep `cat /var/run/docker.pid` | grep -v grep 2>&1 > /dev/null && break
  tmp_val=$((prev_time+sleep_time))
  prev_time=$sleep_time
  sleep_time=$tmp_val
  sleep $sleep_time
done
# load Microkernel image and start the Microkernel
docker load -i hanlon-mk-image.tar
docker run --privileged=true --name=hnl_mk -v /proc:/host-proc:ro -v /dev:/host-dev:ro -v /sys:/host-sys:ro -v /container-tmp-files:/tmp -d --net host -t `docker images -q` /bin/bash -c '/usr/local/bin/hnl_mk_init.rb && read -p "waiting..."'
[rancher@rancher ~]$ docker exec -ti hnl_mk /bin/bash
bash-4.3# ps
PID   USER     TIME   COMMAND
    1 root       0:00 /bin/bash -c /usr/local/bin/hnl_mk_init.rb && read -p "waiting..."
   62 root       0:00 ruby /usr/local/bin/hnl_mk_web_server.rb
   70 root       0:01 {ruby} hnl_mk_control_server.rb
  798 root       0:00 /bin/bash
 1763 root       0:00 /bin/bash
 1778 root       0:00 ps
58346  63.168392      1.1.1.1 -> 1.1.1.6      HTTP 329 GET /hanlon/api/v1/node/checkin?uuid=00000000-0000-0000-0000-00259097D478&mac_id=00259097D478_00259097D479&last_state=idle&first_checkin=true HTTP/1.1 
58348  63.361393      1.1.1.6 -> 1.1.1.1      HTTP 637 HTTP/1.1 200 OK  (application/json)
58360  65.036761      1.1.1.1 -> 1.1.1.6      HTTP 2993 POST /hanlon/api/v1/node/register HTTP/1.1  (text/json)
58363  65.211285      1.1.1.6 -> 1.1.1.1      HTTP 877 HTTP/1.1 201 Created  (application/json)
58373 249.154615      1.1.1.1 -> 1.1.1.6      HTTP 258 GET /hanlon/api/v1/node/checkin?uuid=00000000-0000-0000-0000-00259097D478&mac_id=00259097D478_00259097D479&last_state=idle HTTP/1.1 
58375 249.316414      1.1.1.6 -> 1.1.1.1      HTTP 656 HTTP/1.1 200 OK  (application/json)
58383 255.223705      1.1.1.1 -> 1.1.1.6      HTTP 258 GET /hanlon/api/v1/node/checkin?uuid=00000000-0000-0000-0000-00259097D478&mac_id=00259097D478_00259097D479&last_state=idle HTTP/1.1 
58385 255.442329      1.1.1.6 -> 1.1.1.1      HTTP 659 HTTP/1.1 200 OK  (application/json)
58393 262.297106      1.1.1.1 -> 1.1.1.6      HTTP 258 GET /hanlon/api/v1/node/checkin?uuid=00000000-0000-0000-0000-00259097D478&mac_id=00259097D478_00259097D479&last_state=idle HTTP/1.1 
58395 262.579147      1.1.1.6 -> 1.1.1.1      HTTP 654 HTTP/1.1 200 OK  (application/json)
58403 322.256353      1.1.1.1 -> 1.1.1.6      HTTP 258 GET /hanlon/api/v1/node/checkin?uuid=00000000-0000-0000-0000-00259097D478&mac_id=00259097D478_00259097D479&last_state=idle HTTP/1.1 
58405 322.534767      1.1.1.6 -> 1.1.1.1      HTTP 654 HTTP/1.1 200 OK  (application/json)
58413 1291.709303      1.1.1.1 -> 1.1.1.6      HTTP 310 GET /hanlon/api/v1/node/checkin?uuid=00000000-0000-0000-0000-00259097D478&mac_id=00259097D478_00259097D479&last_state=idle HTTP/1.1 
58415 1291.884336      1.1.1.6 -> 1.1.1.1      HTTP 637 HTTP/1.1 200 OK  (application/json)
58427 1293.510302      1.1.1.1 -> 1.1.1.6      HTTP 2993 POST /hanlon/api/v1/node/register HTTP/1.1  (text/json)
58431 1293.685819      1.1.1.6 -> 1.1.1.1      HTTP 877 HTTP/1.1 201 Created  (application/json)
58441 1345.834145      1.1.1.1 -> 1.1.1.6      HTTP 258 GET /hanlon/api/v1/node/checkin?uuid=00000000-0000-0000-0000-00259097D478&mac_id=00259097D478_00259097D479&last_state=idle HTTP/1.1 
58443 1346.112950      1.1.1.6 -> 1.1.1.1      HTTP 654 HTTP/1.1 200 OK  (application/json)
58451 1351.717374      1.1.1.1 -> 1.1.1.6      HTTP 310 GET /hanlon/api/v1/node/checkin?uuid=00000000-0000-0000-0000-00259097D478&mac_id=00259097D478_00259097D479&last_state=idle HTTP/1.1 
58453 1352.028310      1.1.1.6 -> 1.1.1.1      HTTP 635 HTTP/1.1 200 OK  (application/json)
58465 1353.560088      1.1.1.1 -> 1.1.1.6      HTTP 3005 POST /hanlon/api/v1/node/register HTTP/1.1  (text/json)
58468 1353.729846      1.1.1.6 -> 1.1.1.1      HTTP 889 HTTP/1.1 201 Created  (application/json)

iPXE 1.0.0+ centos_7 model boot_call

58475 1391.151698      0.0.0.0 -> 255.255.255.255 DHCP 590 DHCP Discover - Transaction ID 0x9097d478
58476 1391.152005    1.1.1.250 -> 255.255.255.255 DHCP 342 DHCP Offer    - Transaction ID 0x9097d478
58477 1391.152124      1.1.1.6 -> 255.255.255.255 DHCP 419 DHCP Offer    - Transaction ID 0x9097d478
58478 1393.150006      0.0.0.0 -> 255.255.255.255 DHCP 590 DHCP Request  - Transaction ID 0x9097d478
58479 1393.150193    1.1.1.250 -> 255.255.255.255 DHCP 342 DHCP ACK      - Transaction ID 0x9097d478
58480 1398.369995      1.1.1.1 -> 1.1.1.6      TFTP 69 Read Request, File: pxelinux.0, Transfer type: octet, tsize\000=0\000
58481 1398.371004      1.1.1.1 -> 1.1.1.6      TFTP 74 Read Request, File: pxelinux.0, Transfer type: octet, blksize\000=1456\000
58482 1398.427764      1.1.1.1 -> 1.1.1.6      TFTP 121 Read Request, File: pxelinux.cfg/00000000-0000-0000-0000-00259097d478, Transfer type: octet, tsize\000=0\000, blksize\000=1408\000
58483 1398.428675      1.1.1.1 -> 1.1.1.6      TFTP 105 Read Request, File: pxelinux.cfg/01-00-25-90-97-d4-78, Transfer type: octet, tsize\000=0\000, blksize\000=1408\000
58484 1398.429701      1.1.1.1 -> 1.1.1.6      TFTP 93 Read Request, File: pxelinux.cfg/01010101, Transfer type: octet, tsize\000=0\000, blksize\000=1408\000
58485 1398.430792      1.1.1.1 -> 1.1.1.6      TFTP 92 Read Request, File: pxelinux.cfg/0101010, Transfer type: octet, tsize\000=0\000, blksize\000=1408\000
58486 1398.431760      1.1.1.1 -> 1.1.1.6      TFTP 91 Read Request, File: pxelinux.cfg/010101, Transfer type: octet, tsize\000=0\000, blksize\000=1408\000
58487 1398.432712      1.1.1.1 -> 1.1.1.6      TFTP 90 Read Request, File: pxelinux.cfg/01010, Transfer type: octet, tsize\000=0\000, blksize\000=1408\000
58488 1398.433655      1.1.1.1 -> 1.1.1.6      TFTP 89 Read Request, File: pxelinux.cfg/0101, Transfer type: octet, tsize\000=0\000, blksize\000=1408\000
58489 1398.434601      1.1.1.1 -> 1.1.1.6      TFTP 88 Read Request, File: pxelinux.cfg/010, Transfer type: octet, tsize\000=0\000, blksize\000=1408\000
58490 1398.435506      1.1.1.1 -> 1.1.1.6      TFTP 87 Read Request, File: pxelinux.cfg/01, Transfer type: octet, tsize\000=0\000, blksize\000=1408\000
58491 1398.436383      1.1.1.1 -> 1.1.1.6      TFTP 86 Read Request, File: pxelinux.cfg/0, Transfer type: octet, tsize\000=0\000, blksize\000=1408\000
58492 1398.437311      1.1.1.1 -> 1.1.1.6      TFTP 92 Read Request, File: pxelinux.cfg/default, Transfer type: octet, tsize\000=0\000, blksize\000=1408\000
58493 1398.438242      1.1.1.1 -> 1.1.1.6      TFTP 80 Read Request, File: menu.c32, Transfer type: octet, tsize\000=0\000, blksize\000=1408\000
58494 1398.457291      1.1.1.1 -> 1.1.1.6      TFTP 92 Read Request, File: pxelinux.cfg/default, Transfer type: octet, tsize\000=0\000, blksize\000=1408\000
58495 1403.652508      1.1.1.1 -> 1.1.1.6      TFTP 81 Read Request, File: ipxe.lkrn, Transfer type: octet, tsize\000=0\000, blksize\000=1408\000
58496 1403.743160      1.1.1.1 -> 1.1.1.6      TFTP 83 Read Request, File: hanlon.ipxe, Transfer type: octet, tsize\000=0\000, blksize\000=1408\000
58497 1404.869942      0.0.0.0 -> 255.255.255.255 DHCP 444 DHCP Discover - Transaction ID 0x9b342309
58498 1404.870791      1.1.1.6 -> 255.255.255.255 DHCP 419 DHCP Offer    - Transaction ID 0x9b342309
58499 1404.871020      0.0.0.0 -> 255.255.255.255 DHCP 456 DHCP Request  - Transaction ID 0x9b342309
58502 1412.826364      1.1.1.1 -> 1.1.1.6      HTTP 326 GET /hanlon/api/v1/boot?uuid=00000000-0000-0000-0000-00259097d478&mac_id=00%3A25%3A90%3A97%3Ad4%3A78_00%3A25%3A90%3A97%3Ad4%3A79______&dhcp_mac=00%3A25%3A90%3A97%3Ad4%3A78 HTTP/1.1 
58504 1413.091910      1.1.1.6 -> 1.1.1.1      HTTP 653 HTTP/1.1 200 OK 

Starting anaconda installer

Package installs

58506 1416.118060      1.1.1.1 -> 1.1.1.6      HTTP 222 GET /hanlon/api/v1/image/os/7X6g1WkOom8Bt4sK4DlEvb/isolinux/vmlinuz HTTP/1.1 
62437 1416.557559      1.1.1.6 -> 1.1.1.1      HTTP 4735 HTTP/1.1 200 OK  (application/octet-stream)
62475 1416.567551      1.1.1.1 -> 1.1.1.6      HTTP 225 GET /hanlon/api/v1/image/os/7X6g1WkOom8Bt4sK4DlEvb/isolinux/initrd.img HTTP/1.1 
92020 1420.102830      1.1.1.6 -> 1.1.1.1      HTTP 272 HTTP/1.1 200 OK  (application/octet-stream)
92053 1433.518687      0.0.0.0 -> 255.255.255.255 DHCP 343 DHCP Discover - Transaction ID 0x88b06210
92054 1433.518940    1.1.1.250 -> 1.1.1.1      DHCP 342 DHCP Offer    - Transaction ID 0x88b06210
92055 1433.519266      0.0.0.0 -> 255.255.255.255 DHCP 355 DHCP Request  - Transaction ID 0x88b06210
92056 1433.519501    1.1.1.250 -> 1.1.1.1      DHCP 342 DHCP ACK      - Transaction ID 0x88b06210
92060 1436.768534      1.1.1.1 -> 1.1.1.6      HTTP 277 GET /hanlon/api/v1/policy/callback/6yRDUuaTuGDGbdPIAx8CJ9/kickstart/file HTTP/1.1 
92062 1436.943975      1.1.1.6 -> 1.1.1.1      HTTP 1172 HTTP/1.1 200 OK 
92070 1437.610395      1.1.1.1 -> 1.1.1.6      HTTP 265 GET /hanlon/api/v1/image/os/7X6g1WkOom8Bt4sK4DlEvb/.treeinfo HTTP/1.1 
92072 1437.611322      1.1.1.6 -> 1.1.1.1      HTTP 1297 HTTP/1.1 200 OK  (application/octet-stream)
92080 1437.641833      1.1.1.1 -> 1.1.1.6      HTTP 275 GET /hanlon/api/v1/image/os/7X6g1WkOom8Bt4sK4DlEvb/LiveOS/squashfs.img HTTP/1.1 
215756 1462.594184      1.1.1.6 -> 1.1.1.1      HTTP 5817 HTTP/1.1 200 OK  (application/octet-stream)
215779 1462.599765      1.1.1.6 -> 1.1.1.1      HTTP 104 HTTP/1.1 500 Internal Server Error 

# There are a few of these.... I assume they are looking for updates
215786 1462.621645      1.1.1.1 -> 1.1.1.6      HTTP 274 GET /hanlon/api/v1/image/os/7X6g1WkOom8Bt4sK4DlEvb/images/updates.img HTTP/1.1 
215788 1462.624383      1.1.1.6 -> 1.1.1.1      HTTP 374 HTTP/1.1 500 Internal Server Error  (application/json)

/hanlon/api/v1/image/os/7X6g1WkOom8Bt4sK4DlEvb/.treeinfo HTTP/1.1 
215869 1501.059031      1.1.1.6 -> 1.1.1.1      HTTP 1297 HTTP/1.1 200 OK  (application/octet-stream)
215877 1501.256293      1.1.1.1 -> 1.1.1.6      HTTP 225 GET /hanlon/api/v1/image/os/7X6g1WkOom8Bt4sK4DlEvb/repodata/repomd.xml HTTP/1.1 
215880 1501.257276      1.1.1.6 -> 1.1.1.1      HTTP 1026 HTTP/1.1 200 OK  (application/octet-stream)
215889 1501.263949      1.1.1.1 -> 1.1.1.6      HTTP 201 GET /hanlon/api/v1/image/os/7X6g1WkOom8Bt4sK4DlEvb/.treeinfo HTTP/1.1 
215891 1501.264987      1.1.1.6 -> 1.1.1.1      HTTP 1297 HTTP/1.1 200 OK  (application/octet-stream)
215899 1501.278607      1.1.1.1 -> 1.1.1.6      HTTP 294 GET /hanlon/api/v1/image/os/7X6g1WkOom8Bt4sK4DlEvb/repodata/5e7aa50a6f6811cee0b55013e2b742ef0aec84cdfdd5ae84875117c283e6aad0-primary.xml.gz HTTP/1.1 
216028 1501.305101      1.1.1.6 -> 1.1.1.1      HTTP 1685 HTTP/1.1 200 OK  (application/octet-stream)
216042 1501.316195      1.1.1.1 -> 1.1.1.6      HTTP 310 GET /hanlon/api/v1/image/os/7X6g1WkOom8Bt4sK4DlEvb/repodata/4a04a16ba51071ab8942bc507087ed59cac82e9b22134b322f1452a45b87a1a2-c7-minimal-x86_64-comps.xml.gz HTTP/1.1 
216045 1501.317234      1.1.1.6 -> 1.1.1.1      HTTP 955 HTTP/1.1 200 OK  (application/octet-stream)
216054 1501.324666      1.1.1.1 -> 1.1.1.6      HTTP 298 GET /hanlon/api/v1/image/os/7X6g1WkOom8Bt4sK4DlEvb/repodata/90e86d06e1784b6846551fbfb92a21711a570eb3ce99ca4916a8a7344c916d58-primary.sqlite.bz2 HTTP/1.1 

active_model state: postinstall

centos7 boot

centos7 login

Breakdown

DHCP

The first 5 packets will likely be DHCP, and may include two offers. The booting PXE implementation starts with a Discover. The first offer is from your normal DHCP server for that network. The second offer may be from a secondary DHCP server that provides PXE boot options only. The PXE BIOS implementation then sends a Request which includes the combined Offer options. Finally, the authoritative DHCP server ACKs the combined options.

It's important to note that at this point it's the BIOS / network card firmware that has been assigned an address and the file/server options for TFTP/BOOTP.

Capturing on 'eth0'
  1   0.000000      0.0.0.0 -> 255.255.255.255 DHCP Discover
  2   0.000276    1.1.1.250 -> 255.255.255.255 DHCP Offer
  3   0.000437      1.1.1.6 -> 255.255.255.255 DHCP Offer
  4   1.988556      0.0.0.0 -> 255.255.255.255 DHCP Request
  5   1.988830    1.1.1.250 -> 255.255.255.255 DHCP ACK
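
To check a capture against the sequence described above, the summary lines can be parsed and reduced to who offered and who acknowledged. This is a minimal sketch: the regular expression is my own and only assumes the five-column summary format shown in this section, and the function names are made up for illustration.

```python
import re

# Summary lines copied from the capture above.
CAPTURE = """\
  1   0.000000      0.0.0.0 -> 255.255.255.255 DHCP Discover
  2   0.000276    1.1.1.250 -> 255.255.255.255 DHCP Offer
  3   0.000437      1.1.1.6 -> 255.255.255.255 DHCP Offer
  4   1.988556      0.0.0.0 -> 255.255.255.255 DHCP Request
  5   1.988830    1.1.1.250 -> 255.255.255.255 DHCP ACK
"""

# frame number, relative time, source -> destination, DHCP message kind
LINE_RE = re.compile(
    r"^\s*(?P<frame>\d+)\s+(?P<time>[\d.]+)\s+(?P<src>\S+)\s+->\s+(?P<dst>\S+)"
    r"\s+DHCP\s+(?P<kind>Discover|Offer|Request|ACK)\s*$"
)


def parse_dhcp_summary(text):
    """Return one dict per DHCP summary line, skipping anything else."""
    packets = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            packets.append(match.groupdict())
    return packets


def summarize_exchange(packets):
    """Collect the message order plus the offering and acknowledging servers."""
    return {
        "sequence": [p["kind"] for p in packets],
        "offers_from": [p["src"] for p in packets if p["kind"] == "Offer"],
        "ack_from": [p["src"] for p in packets if p["kind"] == "ACK"],
    }


summary = summarize_exchange(parse_dhcp_summary(CAPTURE))
print(summary)
```

Two entries in offers_from and a single entry in ack_from is the pattern to expect when a proxy sits next to the normal DHCP server.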

DHCP Discover

Let's take a closer look at the sequence of packets starting with the initial DHCP Discover.

  1   0.000000      0.0.0.0 -> 255.255.255.255 DHCP Discover

If we add -V to tshark it will be verbose and we can look at the specifics of that initial DHCP request.

There are various options that can be set on DHCP packets.

  • Note that this packet has a parameter request list that asks for TFTP and PXE options. Most DHCP servers do not respond with or provide these options by default, as you will see in the next packet (the first offer, a normal DHCP response without PXE/TFTP/BOOTP info).
Bootstrap Protocol (Discover)
    Option: (55) Parameter Request List
        Parameter Request List Item: (66) TFTP Server Name
        Parameter Request List Item: (67) Bootfile name
        Parameter Request List Item: (128) DOCSIS full security server IP [TODO]
        Parameter Request List Item: (129-135) PXE - undefined (vendor specific)
    Option: (57) Maximum DHCP Message Size
        Maximum DHCP Message Size: 1260
    Option: (97) UUID/GUID-based Client Identifier
        Client Identifier (UUID): 00000000-0000-0000-0000-00259097d478
    Option: (93) Client System Architecture
        Client System Architecture: IA x86 PC (0)
    Option: (94) Client Network Device Interface
        Major Version: 2
        Minor Version: 1
    Option: (60) Vendor class identifier
        Vendor class identifier: PXEClient:Arch:00000:UNDI:002001
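
The vendor class identifier packs the same information as Options 93 and 94 into one string. Here is a small sketch that splits it apart, assuming the usual PXEClient:Arch:xxxxx:UNDI:yyyzzz layout (five digits of architecture, then three digits each of UNDI major and minor version). The function name is made up for illustration.

```python
def parse_pxe_vendor_class(value):
    """Split a 'PXEClient:Arch:xxxxx:UNDI:yyyzzz' string into numbers."""
    parts = value.split(":")
    if (len(parts) != 5 or parts[0] != "PXEClient"
            or parts[1] != "Arch" or parts[3] != "UNDI"):
        raise ValueError("not a full PXEClient vendor class: %r" % value)
    undi = parts[4]
    if len(undi) != 6 or not undi.isdigit():
        raise ValueError("unexpected UNDI version field: %r" % undi)
    return {
        "arch": int(parts[2]),          # matches Option 93
        "undi_major": int(undi[:3]),    # matches Option 94 major version
        "undi_minor": int(undi[3:]),    # matches Option 94 minor version
    }


info = parse_pxe_vendor_class("PXEClient:Arch:00000:UNDI:002001")
print(info)
```

For the packet above this gives architecture 0 and UNDI 2.1, the same values tshark shows for Options 93 and 94.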

DHCP Offer (from normal dhcp server)

The first Offer here is a normal DHCP Offer.

  2   0.000276    1.1.1.250 -> 255.255.255.255 DHCP Offer

It includes the options for client IP, netmask, router, and dns.

  • Note that the Next server IP address and Boot file name aren't provided.
Bootstrap Protocol (Offer)
    Message type: Boot Reply (2)
    Your (client) IP address: 1.1.1.1 (1.1.1.1)
    Next server IP address: 0.0.0.0 (0.0.0.0)
    Boot file name not given
    Magic cookie: DHCP
    Option: (53) DHCP Message Type (Offer)
        DHCP: Offer (2)
    Option: (54) DHCP Server Identifier
        DHCP Server Identifier: 1.1.1.250 (1.1.1.250)
    Option: (51) IP Address Lease Time
        IP Address Lease Time: (86400s) 1 day
    Option: (1) Subnet Mask
        Subnet Mask: 255.255.255.0 (255.255.255.0)
    Option: (3) Router
        Router: 1.1.1.250 (1.1.1.250)
    Option: (6) Domain Name Server
        Domain Name Server: 10.0.0.19 (10.0.0.19)
    Option: (15) Domain Name
        Domain Name: int.my.org

DHCP Offer (proxy offer generated by dnsmasq)

This next packet is a proxy DHCP offer. It was generated by dnsmasq running as a DHCP proxy.

  3   0.000437      1.1.1.6 -> 255.255.255.255 DHCP Offer

This Offer packet includes the boot server IP, boot file name, and PXEClient options.

Bootstrap Protocol (Offer)
    Message type: Boot Reply (2)
    Your (client) IP address: 0.0.0.0 (0.0.0.0)
    Next server IP address: 1.1.1.6 (1.1.1.6)
    Boot file name: undionly.kpxe
    Magic cookie: DHCP
    Option: (53) DHCP Message Type (Offer)
        DHCP: Offer (2)
    Option: (54) DHCP Server Identifier
        DHCP Server Identifier: 1.1.1.6 (1.1.1.6)
    Option: (60) Vendor class identifier
        Vendor class identifier: PXEClient
    Option: (97) UUID/GUID-based Client Identifier
        Client Identifier (UUID): 00000000-0000-0000-0000-00259097d478
    Option: (43) Vendor-Specific Information (PXEClient)
        Option 43 Suboption: (6) PXE discovery control
            discovery control: 0x03
        Option 43 Suboption: (8) PXE boot servers
            boot servers: 8000010101010600000101010106
        Option 43 Suboption: (9) PXE boot menu
            boot menu: 800011426f6f742066726f6d206e6574776f726b00001942...
        Option 43 Suboption: (10) PXE menu prompt
            menu prompt: 03507265737320463820666f7220626f6f74206d656e75
        PXE Client End: 255
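
tshark leaves the Option 43 suboptions as hex, but two of them are short enough to decode by hand. The sketch below assumes the layout I understand from the PXE specification: each boot server entry is a 2-byte type, a 1-byte address count, then that many IPv4 addresses, and the menu prompt is a 1-byte timeout followed by the prompt text. The function names are made up for illustration.

```python
import ipaddress


def decode_boot_servers(hex_value):
    """Decode suboption 8 into (type, [ip, ...]) tuples."""
    data = bytes.fromhex(hex_value)
    servers = []
    offset = 0
    while offset < len(data):
        server_type = int.from_bytes(data[offset:offset + 2], "big")
        count = data[offset + 2]
        offset += 3
        addresses = []
        for _ in range(count):
            addresses.append(str(ipaddress.IPv4Address(data[offset:offset + 4])))
            offset += 4
        servers.append((server_type, addresses))
    return servers


def decode_menu_prompt(hex_value):
    """Decode suboption 10 into (timeout_seconds, prompt_text)."""
    data = bytes.fromhex(hex_value)
    return data[0], data[1:].decode("ascii")


servers = decode_boot_servers("8000010101010600000101010106")
prompt = decode_menu_prompt("03507265737320463820666f7220626f6f74206d656e75")
print(servers)
print(prompt)
```

The prompt decodes to a 3 second timeout and the text from the pxe-prompt line in the dnsmasq config, and both boot server entries point at 1.1.1.6. The two types (0x8000 and 0) presumably correspond to the two pxe-service lines.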

dnsmasq config

I use this config with dnsmasq -d -k -C /path/to/pxe-dhcp-proxy.conf to augment the existing DHCP server and its lack of PXE info.

#dnsmasq pxe-dhcp-proxy.conf
interface=eth0

# This option lets us proxy requests on IP
# to augment the existing DHCP Offers
# with PXE / Hanlon Options
dhcp-range=1.1.1.6,proxy,255.255.255.0

# No DNS
port=0

# Log the requests
log-dhcp

# Necessary for dhcp-range options
bind-dynamic

# These vendor specific options used to be processed
# by https://github.com/csc/Hanlon-Microkernel/commit/05b87262967ee33b3a50ea3d412e7812d8dcac9f#diff-0ba7ab68423f2fe93e585cf854323583
# until it was removed... 
 
dhcp-option-force=224,1.1.1.6
dhcp-option-force=225,8026
dhcp-option-force=226,http://1.1.1.6:8026/

# Sets Option 43 Suboption: (8) PXE boot servers
# and Boot file name
# if it's not ipxe, we offer undionly.kpxe
# if it's ipxe, we offer hanlon.ipxe
# A PXEClient-aware implementation will ignore
# these options and use Option 43 suboptions
dhcp-boot=tag:ipxe,hanlon.ipxe,1.1.1.6,1.1.1.6
dhcp-boot=tag:!ipxe,undionly.kpxe,1.1.1.6,1.1.1.6
dhcp-boot=undionly.kpxe,1.1.1.6,1.1.1.6

# Set the ipxe tag when the client's user class (option 77) contains iPXE
dhcp-userclass=set:ipxe,iPXE

# Sets Option 43 Suboption: (10) PXE menu prompt
pxe-prompt="Press F8 for boot menu", 3

# These two options craft Option 43 Suboption: (9) PXE boot menu
pxe-service=X86PC, "Boot from network", pxelinux, 1.1.1.6
pxe-service=X86PC, "Boot from local hard disk", 0, 1.1.1.6
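
The tag logic in this config is easier to see as a few lines of code. This is only a model of the three relevant directives (dhcp-userclass and the two tagged dhcp-boot lines), not how dnsmasq is implemented, and the function name is made up. It assumes dnsmasq's substring matching for user classes.

```python
def select_boot_file(user_class):
    """Return the boot file name the proxy would offer for a client."""
    tags = set()
    # dhcp-userclass=set:ipxe,iPXE
    if user_class is not None and "iPXE" in user_class:
        tags.add("ipxe")
    # dhcp-boot=tag:ipxe,hanlon.ipxe,1.1.1.6,1.1.1.6
    if "ipxe" in tags:
        return "hanlon.ipxe"
    # dhcp-boot=tag:!ipxe,undionly.kpxe,1.1.1.6,1.1.1.6
    return "undionly.kpxe"


firmware_choice = select_boot_file(None)   # plain PXE firmware, no user class
ipxe_choice = select_boot_file("iPXE")     # iPXE identifies itself
print(firmware_choice, ipxe_choice)
```

This matches the boot file names in the two proxy offers in the capture: undionly.kpxe in the offer to the firmware and hanlon.ipxe in the offer to iPXE. The firmware in this capture is PXEClient aware, so it follows the Option 43 boot menu and loads pxelinux.0 instead of the offered boot file.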

TFTP

PXEClient load

If the PXE implementation supports Option 43 PXEClient, it uses the pxe-service / PXE boot menu options to decide which file to load via TFTP from the PXE boot servers list.

The default is to load pxelinux.0.

  6   7.227375      1.1.1.2 -> 1.1.1.6      Read Request, File: pxelinux.0, Transfer type: octet, tsize\000=0\000
  7   7.228379      1.1.1.2 -> 1.1.1.6      Read Request, File: pxelinux.0, Transfer type: octet, blksize\000=1456\000

PXEClient config load

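pxelinux requests config files in a fixed order and uses the first one the server has: the client UUID, the MAC address (prefixed with 01- for Ethernet), the IP address in hex, shorter and shorter prefixes of that hex IP (backing off one digit at a time, which matches progressively larger networks), and finally the default file.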

  8   7.285098      1.1.1.2 -> 1.1.1.6      Read Request, File: pxelinux.cfg/00000000-0000-0000-0000-002590979f58
  9   7.286096      1.1.1.2 -> 1.1.1.6      Read Request, File: pxelinux.cfg/01-00-25-90-97-9f-58
 10   7.287152      1.1.1.2 -> 1.1.1.6      Read Request, File: pxelinux.cfg/01010102
 11   7.288216      1.1.1.2 -> 1.1.1.6      Read Request, File: pxelinux.cfg/0101010
 12   7.289153      1.1.1.2 -> 1.1.1.6      Read Request, File: pxelinux.cfg/010101
 13   7.290126      1.1.1.2 -> 1.1.1.6      Read Request, File: pxelinux.cfg/01010
 14   7.291094      1.1.1.2 -> 1.1.1.6      Read Request, File: pxelinux.cfg/0101
 15   7.292027      1.1.1.2 -> 1.1.1.6      Read Request, File: pxelinux.cfg/010
 16   7.292962      1.1.1.2 -> 1.1.1.6      Read Request, File: pxelinux.cfg/01
 17   7.293943      1.1.1.2 -> 1.1.1.6      Read Request, File: pxelinux.cfg/0
 18   7.294878      1.1.1.2 -> 1.1.1.6      Read Request, File: pxelinux.cfg/default
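
The search order seen in those requests can be reproduced with a short function. This is a sketch of the lookup order only, based on the file names in the capture. The function name is made up for illustration.

```python
import ipaddress


def pxelinux_config_candidates(uuid, mac, ip):
    """List the config files pxelinux asks for, in order."""
    candidates = [uuid.lower()]
    # ARP hardware type 01 (Ethernet) followed by the MAC with dashes
    candidates.append("01-" + mac.lower().replace(":", "-"))
    # Full IP as 8 uppercase hex digits, then drop one digit at a time
    hex_ip = "%08X" % int(ipaddress.IPv4Address(ip))
    for length in range(len(hex_ip), 0, -1):
        candidates.append(hex_ip[:length])
    candidates.append("default")
    return ["pxelinux.cfg/" + name for name in candidates]


names = pxelinux_config_candidates(
    "00000000-0000-0000-0000-002590979f58", "00:25:90:97:9f:58", "1.1.1.2"
)
for name in names:
    print(name)
```

For the client above this produces the same 11 file names, in the same order, as packets 8 to 18.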

The default file usually comes from csclabs/atftpd:

default menu.c32
prompt 0
menu title Hanlon Boot Menu
timeout 50
label hanlon-boot
  menu label Automatic hanlon Node Boot
  kernel ipxe.lkrn
  append initrd=hanlon.ipxe

Once pxelinux loads its config, it grabs menu.c32, which reloads the config file and displays it in a menu.

pxelinux.0 and menu.c32 are pulled from the Ubuntu syslinux package when you build csclabs/atftpd via atftp.yml

 19   7.296214      1.1.1.2 -> 1.1.1.6      Read Request, File: menu.c32
 20   7.321247      1.1.1.2 -> 1.1.1.6      Read Request, File: pxelinux.cfg/default

loading ipxe kernel and hanlon.ipxe script

The default menu option boots after 5 seconds (timeout 50 is in tenths of a second) and loads ipxe.lkrn and hanlon.ipxe.

 21  12.509860      1.1.1.2 -> 1.1.1.6      Read Request, File: ipxe.lkrn
 22  12.628043      1.1.1.2 -> 1.1.1.6      Read Request, File: hanlon.ipxe
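
The two files requested here, and the 5 second gap before them, can be read straight out of the pxelinux.cfg/default file shown earlier. A minimal sketch, assuming the syslinux convention that timeout is given in tenths of a second. The parser only looks at the three keywords it needs and the function name is made up.

```python
CONFIG = """\
default menu.c32
prompt 0
menu title Hanlon Boot Menu
timeout 50
label hanlon-boot
  menu label Automatic hanlon Node Boot
  kernel ipxe.lkrn
  append initrd=hanlon.ipxe
"""


def summarize_pxelinux_config(text):
    """Pull the timeout, kernel, and initrd out of a pxelinux config."""
    summary = {"timeout_seconds": None, "kernel": None, "initrd": None}
    for raw in text.splitlines():
        words = raw.split()
        if not words:
            continue
        keyword = words[0].lower()
        if keyword == "timeout":
            summary["timeout_seconds"] = int(words[1]) / 10
        elif keyword == "kernel":
            summary["kernel"] = words[1]
        elif keyword == "append":
            for arg in words[1:]:
                if arg.startswith("initrd="):
                    summary["initrd"] = arg[len("initrd="):]
    return summary


boot = summarize_pxelinux_config(CONFIG)
print(boot)
```

The kernel and initrd values are the two TFTP reads above, and the timeout lines up with the jump from about 7.3 to 12.5 seconds in the capture.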

ipxe.lkrn is built from http://git.ipxe.org/ipxe.git when you build cscdock/atftpd via build.yml

hanlon.ipxe is generated at run time by the tftpd container, which queries the /hanlon/api/v1/config/ipxe API endpoint. That endpoint renders the hanlon.ipxe ERB template.

running ipxe kernel w/ hanlon config

The config makes iPXE perform DHCP on each interface, which we pick up as a second set of DHCP Discover / Offer / Request packets.

 23  13.753767      0.0.0.0 -> 255.255.255.255 DHCP Discover
 24  13.754611      1.1.1.6 -> 255.255.255.255 DHCP Offer
 25  13.773036      0.0.0.0 -> 255.255.255.255 DHCP Request

ipxe dhcp discover

The Discover packet in this set will look slightly different than the one from the bios / firmware.

 23  13.753767      0.0.0.0 -> 255.255.255.255 DHCP Discover

It adds Option (175) Etherboot, and an Option (77) User Class Information (that I couldn't decode with tshark).

Bootstrap Protocol (Discover)
    Message type: Boot Request (1)
    Client MAC address: SuperMic_97:d4:78 (00:25:90:97:d4:78)
    Option: (53) DHCP Message Type (Discover)
        DHCP: Discover (1)
    Option: (93) Client System Architecture
        Client System Architecture: IA x86 PC (0)
    Option: (60) Vendor class identifier
        Vendor class identifier: PXEClient:Arch:00000:UNDI:002001
    Option: (77) User Class Information
        Length: 4 (not decoded, tshark broken?)
    Option: (55) Parameter Request List
        Parameter Request List Item: (1) Subnet Mask
        Parameter Request List Item: (3) Router
        Parameter Request List Item: (6) Domain Name Server
        Parameter Request List Item: (7) Log Server
        Parameter Request List Item: (12) Host Name
        Parameter Request List Item: (15) Domain Name
        Parameter Request List Item: (17) Root Path
        Parameter Request List Item: (43) Vendor-Specific Information
        Parameter Request List Item: (60) Vendor class identifier
        Parameter Request List Item: (66) TFTP Server Name
        Parameter Request List Item: (67) Bootfile name
        Parameter Request List Item: (119) Domain Search
        Parameter Request List Item: (128) DOCSIS full security server IP [TODO]
        Parameter Request List Item: (129) PXE - undefined (vendor specific)
        Parameter Request List Item: (130) PXE - undefined (vendor specific)
        Parameter Request List Item: (131) PXE - undefined (vendor specific)
        Parameter Request List Item: (132) PXE - undefined (vendor specific)
        Parameter Request List Item: (133) PXE - undefined (vendor specific)
        Parameter Request List Item: (134) PXE - undefined (vendor specific)
        Parameter Request List Item: (135) PXE - undefined (vendor specific)
        Parameter Request List Item: (175) Etherboot
        Parameter Request List Item: (203) Unassigned
    Option: (175) Etherboot
        Value: b1050180861521eb03010000170101220101130101110101...
    Option: (61) Client identifier
        Client MAC address: SuperMic_97:d4:78 (00:25:90:97:d4:78)
    Option: (97) UUID/GUID-based Client Identifier
        Client Identifier (UUID): 00000000-0000-0000-0000-00259097d478

hanlon dhcp offer to ipxe

 24  13.754611      1.1.1.6 -> 255.255.255.255 DHCP Offer

This dhcp offer did not wait for the original/primary dhcp server to respond.

Bootstrap Protocol (Offer)
    Message type: Boot Reply (2)
    Client IP address: 0.0.0.0 (0.0.0.0)
    Your (client) IP address: 0.0.0.0 (0.0.0.0)
    Next server IP address: 1.1.1.6 (1.1.1.6)
    Client MAC address: SuperMic_97:d4:78 (00:25:90:97:d4:78)
    Boot file name: hanlon.ipxe
    Magic cookie: DHCP
    Option: (53) DHCP Message Type (Offer)
        DHCP: Offer (2)
    Option: (54) DHCP Server Identifier
        DHCP Server Identifier: 1.1.1.6 (1.1.1.6)
    Option: (60) Vendor class identifier
        Vendor class identifier: PXEClient
    Option: (97) UUID/GUID-based Client Identifier
        Client Identifier (UUID): 00000000-0000-0000-0000-00259097d478
    Option: (43) Vendor-Specific Information (PXEClient)
        Option 43 Suboption: (6) PXE discovery control
            discovery control: 0x03
        Option 43 Suboption: (8) PXE boot servers
            boot servers: 8000010101010600000101010106
        Option 43 Suboption: (9) PXE boot menu
            boot menu: 800011426f6f742066726f6d206e6574776f726b00001942...
        Option 43 Suboption: (10) PXE menu prompt
            Length: 23
            menu prompt: 03507265737320463820666f7220626f6f74206d656e75

ipxe dhcp request

 25  13.773036      0.0.0.0 -> 255.255.255.255 DHCP Request

I'm not sure why this fires; ipxe doesn't seem to wait for an ACK (at least I don't see one on the network).

Bootstrap Protocol (Request)
    Message type: Boot Request (1)
    Client IP address: 0.0.0.0 (0.0.0.0)
    Your (client) IP address: 0.0.0.0 (0.0.0.0)
    Next server IP address: 0.0.0.0 (0.0.0.0)
    Relay agent IP address: 0.0.0.0 (0.0.0.0)
    Client MAC address: SuperMic_97:d4:78 (00:25:90:97:d4:78)
    Server host name not given
    Boot file name not given
    Option: (93) Client System Architecture
        Client System Architecture: IA x86 PC (0)
    Option: (94) Client Network Device Interface
        Major Version: 2
        Minor Version: 1
    Option: (60) Vendor class identifier
        Vendor class identifier: PXEClient:Arch:00000:UNDI:002001
    Option: (77) User Class Information
        (tshark couldn't decode this)
    Option: (55) Parameter Request List
        Parameter Request List Item: (1) Subnet Mask
        Parameter Request List Item: (3) Router
        Parameter Request List Item: (6) Domain Name Server
        Parameter Request List Item: (7) Log Server
        Parameter Request List Item: (12) Host Name
        Parameter Request List Item: (15) Domain Name
        Parameter Request List Item: (17) Root Path
        Parameter Request List Item: (43) Vendor-Specific Information
        Parameter Request List Item: (60) Vendor class identifier
        Parameter Request List Item: (66) TFTP Server Name
        Parameter Request List Item: (67) Bootfile name
        Parameter Request List Item: (119) Domain Search
        Parameter Request List Item: (128) DOCSIS full security server IP [TODO]
        Parameter Request List Item: (129) PXE - undefined (vendor specific)
        Parameter Request List Item: (130) PXE - undefined (vendor specific)
        Parameter Request List Item: (131) PXE - undefined (vendor specific)
        Parameter Request List Item: (132) PXE - undefined (vendor specific)
        Parameter Request List Item: (133) PXE - undefined (vendor specific)
        Parameter Request List Item: (134) PXE - undefined (vendor specific)
        Parameter Request List Item: (135) PXE - undefined (vendor specific)
        Parameter Request List Item: (175) Etherboot
        Parameter Request List Item: (203) Unassigned
    Option: (175) Etherboot
        Value: b1050180861521eb03010000170101220101130101110101...
    Option: (61) Client identifier
        Hardware type: Ethernet (0x01)
        Client MAC address: SuperMic_97:d4:78 (00:25:90:97:d4:78)
    Option: (97) UUID/GUID-based Client Identifier
        Client Identifier (UUID): 00000000-0000-0000-0000-00259097d478
    Option: (54) DHCP Server Identifier
        DHCP Server Identifier: 1.1.1.250 (1.1.1.250)
    Option: (50) Requested IP Address
        Requested IP Address: 1.1.1.1 (1.1.1.1)

ipxe chainload to dynamic hanlon config

At this point hanlon.ipxe is ready to chain load whatever iPXE-boot script /hanlon/api/v1/boot and ProjectHanlon::Engine.instance.boot_checkin respond with.

# this functionality is used during the iPXE boot process to retrieve the appropriate
# iPXE-boot script for a given node based on its hardware ID and the MAC address of the
# interface that it received its DHCP assignment from

dhcp_mac is required, and either uuid or mac_id must be passed in:

# GET /boot
# Query for the boot script for a node
#   parameters:
#         required:
#           :dhcp_mac | String | The MAC address of the DHCP NIC.
#         optional (although one of these two must be specified):
#           :uuid     | String | The UUID for the node (from the BIOS).
#           :mac_id   | String | The MAC addresses for the node's NICs.
#         allowed for backwards compatibility
#           (although will throw an error if used with 'mac_id')
#           :hw_id    | String | The MAC addresses for the node's NICs.
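
The parameter rules in the comment above can be sketched as a small check. This is only an illustration of the documented rules, not Hanlon's actual validation code: the method name boot_param_errors and the error strings are hypothetical, and treating hw_id as an acceptable stand-in for mac_id is an assumption based on the "backwards compatibility" note.

```ruby
# Hypothetical validator mirroring the documented GET /boot parameter rules.
# Returns a list of error strings; an empty list means the query is acceptable.
def boot_param_errors(params)
  errors = []
  # dhcp_mac is the only unconditionally required parameter
  errors << "dhcp_mac is required" if params[:dhcp_mac].to_s.empty?
  # hw_id is the legacy name for mac_id, so passing both is rejected
  if params[:hw_id] && params[:mac_id]
    errors << "hw_id cannot be combined with mac_id"
  end
  # assumption: hw_id counts as an alternative to mac_id here
  unless params[:uuid] || params[:mac_id] || params[:hw_id]
    errors << "one of uuid or mac_id must be specified"
  end
  errors
end

p boot_param_errors(dhcp_mac: "00:25:90:97:9f:58", uuid: "00000000-0000-0000-0000-002590979f58")
p boot_param_errors(uuid: "00000000-0000-0000-0000-002590979f58")
```

The first call returns an empty list; the second reports the missing dhcp_mac.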

api/v1/boot call

Here is hanlon.ipxe making the call to /hanlon/api/v1/boot with the necessary params. The request will be the same on every boot, but the iPXE config it returns (to be chainloaded) depends on the logic in ProjectHanlon::Engine.boot_checkin.

 28  21.739418      1.1.1.2 -> 1.1.1.6      GET /hanlon/api/v1/boot?uuid=00000000-0000-0000-0000-002590979f58&mac_id=00%3A25%3A90%3A97%3A9f%3A58_00%3A25%3A90%3A97%3A9f%3A59______&dhcp_mac=00%3A25%3A90%3A97%3A9f%3A58 HTTP/1.1
 30  22.073529      1.1.1.6 -> 1.1.1.2      HTTP/1.1 200 OK 
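
To replay this request by hand (for example while debugging boot_checkin), the query string can be rebuilt with Ruby's standard library. This is a sketch, not what iPXE itself runs: the helper name boot_url is hypothetical, the base URL is taken from the traces on this page, and the trailing underscores seen in the captured mac_id are left out.

```ruby
require "uri"

# Hypothetical helper: build the /hanlon/api/v1/boot URL for a node.
# URI.encode_www_form percent-encodes the colons in the MAC addresses.
def boot_url(base, uuid:, mac_id:, dhcp_mac:)
  query = URI.encode_www_form(uuid: uuid, mac_id: mac_id, dhcp_mac: dhcp_mac)
  "#{base}/hanlon/api/v1/boot?#{query}"
end

url = boot_url(
  "http://1.1.1.6:8026", # Hanlon server address as seen in the traces
  uuid: "00000000-0000-0000-0000-002590979f58",
  mac_id: "00:25:90:97:9f:58_00:25:90:97:9f:59",
  dhcp_mac: "00:25:90:97:9f:58"
)
puts url
```

Fetching the printed URL with curl should return the same iPXE script the node would chainload.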

api/v1/boot response (not yet registered)

If the node doesn't exist or isn't bound to a model, ProjectHanlon::Engine.instance.default_mk_boot is called, which uses ProjectHanlon::PolicyTemplate::BootMK.get_boot_script to generate the iPXE boot_script specific to RancherOS.

Some global configuration options used by the mk boot_script:

#hanlon/web/config/hanlon_server.conf
# if you want to append extra args to your mk_kernel command line
hnl_mk_boot_kernel_args: 'debug info args to /proc/cmdline'
# can only be 'debug' OR 'quiet'
hnl_mk_boot_debug_level: 'debug'

Some options are pulled from the default microkernel image: ProjectHanlon::Engine.default_mk is chosen based on ProjectHanlon::ImageService::MicroKernel.version_weight, and some mk configuration options are set when you add the image:

    -k, --ssh-keyfile /path/to/key
        The local path to public key file (optional; mk images only) 
    -m, --mk-password PASSWORD
        The microkernel password (optional; mk images only)

(It seems the weight doesn't take into account the version of the microkernel docker image, just the version of the rancher iso.)

ipxe script response (not yet registered)
#!ipxe
kernel http://1.1.1.6:8026/hanlon/api/v1/image/mk/2YAL74TcnUjaO6k8fqr6tX/boot/vmlinuz rancher.password=test1234 rancher.cloud_init.datasources=[url:http://1.1.1.6:8026/hanlon/api/v1/image/mk/2YAL74TcnUjaO6k8fqr6tX/cloud-config] smbios_uuid=00000000-0000-0000-0000-002590979F59 || goto error
initrd http://1.1.1.6:8026/hanlon/api/v1/image/mk/2YAL74TcnUjaO6k8fqr6tX/boot/initrd || goto error
boot || goto error

:error
echo ERROR, will reboot in 60 seconds
sleep 60
reboot

api/v1/image/mk/*/boot/{vmlinuz,initrd}

The mk kernel and initrd are loaded:

273681 20955.142149      1.1.1.3 -> 1.1.1.6      GET /hanlon/api/v1/image/mk/2YAL74TcnUjaO6k8fqr6tX/boot/vmlinuz
276564 20955.464735      1.1.1.6 -> 1.1.1.3      HTTP/1.1 200 OK  (application/octet-stream)
276599 20955.474452      1.1.1.3 -> 1.1.1.6      GET /hanlon/api/v1/image/mk/2YAL74TcnUjaO6k8fqr6tX/boot/initrd
298408 20958.001851      1.1.1.6 -> 1.1.1.3      HTTP/1.1 200 OK  (application/octet-stream)

DHCP fires for the mk/rancheros boot itself

298441 20975.236781      0.0.0.0 -> 255.255.255.255 DHCP Discover
298442 20975.285992      0.0.0.0 -> 255.255.255.255 DHCP Request

Now the rancher.cloud_init.datasources kernel argument is processed and calls api/v1/image/mk/IMAGE_UUID/cloud-config, which when rendered looks like:

write_files:
  - path: /container-tmp-files/first_checkin.yaml
    permissions: 644
    owner: root
    content: |
      --- true
  - path: /container-tmp-files/mk_conf.yaml
    permissions: 644
    owner: root
    content: |
      mk_register_path: /hanlon/api/v1/node/register
      mk_uri: http://1.1.1.6:8026
      mk_checkin_interval: 60
      mk_checkin_path: /hanlon/api/v1/node/checkin
      mk_checkin_skew: 5
      mk_fact_excl_pattern: (^facter.*$)|(^id$)|(^kernel.*$)|(^memoryfree$)|(^memoryfree_mb$)|(^operating.*$)|(^osfamily$)|(^path$)|(^ps$)|(^ruby.*$)|(^selinux$)|(^ssh.*$)|(^swap.*$)|(^timezone$)|(^uniqueid$)|(^.*uptime.*$)|(.*json_str$)
      mk_log_level: Logger::ERROR
  - path: /container-tmp-files/mk-version.yaml
    permissions: 644
    owner: root
    content: |
      --- 
      mk_version: 3.0.1_dirty
  - path: /opt/rancher/bin/listen-cmd-channel.sh
    permissions: 755
    owner: root
    content: |
      #!/bin/bash
      [ -d /container-tmp-files/cmd-channels ] || mkdir /container-tmp-files/cmd-channels
      [ -e /container-tmp-files/cmd-channels/node-state-channel ] || mkfifo /container-tmp-files/cmd-channels/node-state-channel
      while read msg < /container-tmp-files/cmd-channels/node-state-channel; do
        if [ "$msg" = "reboot" ]; then
          reboot
        elif [ "$msg" = "poweroff" ]; then
          poweroff
        else
          echo "message '$msg' unrecognized"
        fi
      done
  - path: /opt/rancher/bin/start-mk.sh
    permissions: 755
    owner: root
    content: |
      #!/bin/bash
      
      # download Microkernel image from Hanlon server
      cd /tmp
      wget http://1.1.1.6:8026/hanlon/api/v1/image/mk/2YAL74TcnUjaO6k8fqr6tX/hanlon-mk-image.tar
      # wait until docker daemon is running
      prev_time=0
      sleep_time=1
      while true; do
        # break out of loop if docker daemon is in process table
        ps aux | grep `cat /var/run/docker.pid` | grep -v grep 2>&1 > /dev/null && break
        tmp_val=$((prev_time+sleep_time))
        prev_time=$sleep_time
        sleep_time=$tmp_val
        sleep $sleep_time
      done
      # load Microkernel image and start the Microkernel
      docker load -i hanlon-mk-image.tar
      docker run --privileged=true --name=hnl_mk -v /proc:/host-proc:ro -v /dev:/host-dev:ro -v /sys:/host-sys:ro -v /container-tmp-files:/tmp -d --net host -t `docker images -q` /bin/bash -c '/usr/local/bin/hnl_mk_init.rb && read -p "waiting..."'
  - path: /opt/rancher/bin/start.sh
    permissions: 755
    owner: root
    content: |
      #!/bin/bash
      /opt/rancher/bin/listen-cmd-channel.sh &
      /opt/rancher/bin/start-mk.sh &

As a result of the cloud-config getting pulled, and the content being run, there are a few more calls to hanlon/api:

298446 20975.758119      1.1.1.3 -> 1.1.1.6      GET /hanlon/api/v1/image/mk/2YAL74TcnUjaO6k8fqr6tX/cloud-config 
298449 20975.775594      1.1.1.6 -> 1.1.1.3      HTTP/1.1 200 OK 
298452 20975.776415      1.1.1.3 -> 1.1.1.6      GET /hanlon/api/v1/image/mk/2YAL74TcnUjaO6k8fqr6tX/cloud-config 
298454 20975.787703      1.1.1.6 -> 1.1.1.3      HTTP/1.1 200 OK 
298459 20976.343729      0.0.0.0 -> 255.255.255.255 Discover
298460 20976.344203      0.0.0.0 -> 255.255.255.255 Request
298464 20979.779070      1.1.1.3 -> 1.1.1.6      GET /hanlon/api/v1/image/mk/2YAL74TcnUjaO6k8fqr6tX/hanlon-mk-image.tar
332603 20986.673012      1.1.1.6 -> 1.1.1.3      HTTP/1.1 200 OK  (application/octet-stream)
332625 20997.229760      1.1.1.3 -> 1.1.1.6      GET /hanlon/api/v1/node/checkin?uuid=00000000-0000-0000-0000-002590E24CC2&mac_id=002590E24CC2_002590E24CC3&last_state=idle&first_checkin=true
332627 20997.276116      1.1.1.6 -> 1.1.1.3      HTTP/1.1 200 OK  (application/json)
332639 20998.898263      1.1.1.3 -> 1.1.1.6      POST /hanlon/api/v1/node/register (text/json)
332642 20999.266620      1.1.1.6 -> 1.1.1.3      HTTP/1.1 201 Created  (application/json)
332652 21057.224632      1.1.1.3 -> 1.1.1.6      GET /hanlon/api/v1/node/checkin?uuid=00000000-0000-0000-0000-002590E24CC2&mac_id=002590E24CC2_002590E24CC3&last_state=idle
332654 21057.749997      1.1.1.6 -> 1.1.1.3      HTTP/1.1 200 OK  (application/json)

The cloud-config GETs api/v1/image/mk/IMAGE_UUID/hanlon-mk-image.tar, loads it into docker and runs it.

Creation of this tarball is documented at csc/Hanlon-Microkernel/Building-a-MK-Container.md

api/v1/node/checkin

# GET /node/checkin
# handle a node checkin (from a Hanlon Microkernel instance)
#   parameters:
#         required:
#           :last_state     | String | The "state" the node is currently in.
#         optional (although one of these two must be specified):
#           :uuid           | String | The UUID for the node (from the BIOS).
#           :mac_id         | String | The MAC addresses for the node's NICs.
#         optional
#           :first_checkin  | Boolean | Indicates if is first checkin (or not).
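
The checkin requests in the trace above can be reproduced with a few lines of Ruby. This is a sketch under assumptions: the helper names mac_id_for and checkin_url are hypothetical, and the mac_id format (colons stripped, upper-cased, joined with underscores) is inferred from the captured requests rather than taken from the microkernel source.

```ruby
require "uri"

# Hypothetical: derive the mac_id value from a list of NIC MAC addresses,
# matching the format seen in the captured checkin requests.
def mac_id_for(macs)
  macs.map { |mac| mac.delete(":").upcase }.join("_")
end

# Hypothetical: build a GET /node/checkin URL; first_checkin is only sent when true.
def checkin_url(base, uuid:, macs:, last_state:, first_checkin: false)
  params = { uuid: uuid, mac_id: mac_id_for(macs), last_state: last_state }
  params[:first_checkin] = true if first_checkin
  "#{base}/hanlon/api/v1/node/checkin?#{URI.encode_www_form(params)}"
end

first = checkin_url(
  "http://1.1.1.6:8026",
  uuid: "00000000-0000-0000-0000-002590E24CC2",
  macs: ["00:25:90:e2:4c:c2", "00:25:90:e2:4c:c3"],
  last_state: "idle",
  first_checkin: true
)
puts first
```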

The node will continue to GET api/v1/node/checkin based on the settings for:

mk_checkin_interval: microkernel checkin interval (in seconds).
mk_checkin_skew: maximum initial microkernel checkin splay time (in seconds).

Hanlon sends back a JSON document that contains some client configuration and includes a response/command_name dictating the next action the node should take. The initial response from checkin is:

{
  "resource": "ProjectHanlon::Slice::Node",
  "command": "checkin_node",
  "result": "Ok",
  "http_err_code": 200,
  "errcode": 0,
  "response": {
    "command_name": "register",
    "command_param": {}
  },
  "client_config": {
    "mk_checkin_interval": 60,
    "mk_checkin_skew": 5,
    "mk_log_level": "Logger::ERROR",
    "mk_fact_excl_pattern": "(^facter.*$)|(^id$)|(^kernel.*$)|(^memoryfree$)|(^memoryfree_mb$)|(^operating.*$)|(^osfamily$)|(^path$)|(^ps$)|(^ruby.*$)|(^selinux$)|(^ssh.*$)|(^swap.*$)|(^timezone$)|(^uniqueid$)|(^.*uptime.*$)|(.*json_str$)"
  }
}
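
A client could read the command and the checkin timing out of that response roughly as follows. This is a sketch, not the microkernel's implementation: next_checkin_plan is a hypothetical name, the fallback values 60 and 5 simply repeat the numbers shown above, and treating the skew as a random initial delay between 0 and mk_checkin_skew seconds is an assumption based on the word "splay".

```ruby
require "json"

# Trimmed version of the checkin response shown above.
RESPONSE = <<~JSON
  {
    "result": "Ok",
    "response": { "command_name": "register", "command_param": {} },
    "client_config": { "mk_checkin_interval": 60, "mk_checkin_skew": 5 }
  }
JSON

# Hypothetical: extract the next command plus a checkin schedule.
def next_checkin_plan(response_json, rng: Random.new)
  doc = JSON.parse(response_json)
  cfg = doc.fetch("client_config", {})
  skew = cfg.fetch("mk_checkin_skew", 5).to_f
  {
    command: doc.dig("response", "command_name"),
    initial_delay: rng.rand(0.0..skew), # assumed: random splay before first checkin
    interval: cfg.fetch("mk_checkin_interval", 60)
  }
end

plan = next_checkin_plan(RESPONSE)
puts "command=#{plan[:command]} interval=#{plan[:interval]}s"
```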

When response/command_name is register, HanlonMicroKernel::HnlMkRegistrationManager.register_node will POST a JSON hash to api/v1/node/register containing the attributes_hash (from facter) and last_state (the current state).

# POST /node/register
# register a node with Hanlon
#   parameters:
#     required:
#       last_state      | String | The "state" the node is currently in.
#       attributes_hash | Hash   | The attributes_hash of the node.
#     optional (although one of these two must be specified):
#       uuid            | String | The UUID for the node (from the BIOS).
#       mac_id          | String | The MAC addresses for the node's NICs.

Sitting at checkin

After initial checkin (and subsequent registration) the node will continue to GET api/v1/node/checkin, but the response/command_name could be one of a few:

checkin => acknowledge

When response/command_name is acknowledge, the node just notes that it is checked in.

{
  "resource": "ProjectHanlon::Slice::Node",
  "command": "checkin_node",
  "result": "Ok",
  "http_err_code": 200,
  "errcode": 0,
  "response": {
    "command_name": "acknowledge",
    "command_param": {}
  }
}

checkin => reboot / poweroff

When the node matches a policy (via a tag), we'll be asked to reboot; we might also be asked to poweroff. These actions are handled by /opt/rancher/bin/listen-cmd-channel.sh, which is created from the hnl_mk cloud-config and ends up running on the node as:

#!/bin/bash
[ -d /container-tmp-files/cmd-channels ] || mkdir /container-tmp-files/cmd-channels
[ -e /container-tmp-files/cmd-channels/node-state-channel ] || mkfifo /container-tmp-files/cmd-channels/node-state-channel
while read msg < /container-tmp-files/cmd-channels/node-state-channel; do
  if [ "$msg" = "reboot" ]; then
    reboot
  elif [ "$msg" = "poweroff" ]; then
    poweroff
  else
    echo "message '$msg' unrecognized"
  fi
done

For example, once the node has been given a policy, the next checkin returns:

{
  "resource": "ProjectHanlon::Slice::Node",
  "command": "checkin_node",
  "result": "Ok",
  "http_err_code": 200,
  "errcode": 0,
  "response": {
    "command_name": "reboot",
    "command_param": {}
  }
}
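
Taken together, the checkin responses shown so far suggest a dispatch along these lines. This is only a sketch of the idea: handle_checkin is a hypothetical name, and how the real microkernel writes to the node-state-channel FIFO is not shown on this page, so the channel here is any IO-like object (a StringIO in the example).

```ruby
require "json"
require "stringio"

# Hypothetical dispatcher for response/command_name.
# channel:  IO-like object standing in for the node-state-channel FIFO
# register: callable that performs the POST to api/v1/node/register
def handle_checkin(response_json, channel:, register:)
  command = JSON.parse(response_json).dig("response", "command_name")
  case command
  when "register"
    register.call
  when "acknowledge"
    nil # nothing to do, the node is checked in
  when "reboot", "poweroff"
    channel.puts(command) # listen-cmd-channel.sh reads this and acts on it
  else
    warn "unrecognized command: #{command.inspect}"
  end
  command
end

channel = StringIO.new
reboot = '{"response": {"command_name": "reboot", "command_param": {}}}'
handle_checkin(reboot, channel: channel, register: -> {})
puts channel.string
```

On a real node the channel would be opened on /container-tmp-files/cmd-channels/node-state-channel instead of a StringIO.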

api/v1/boot for node w/ active_model

The normal boot process applies up to and including the call to api/v1/boot.

Instead of the MicroKernel, an active_model is queried for its boot_call.

Hanlon manages an active_model's state via a finite state machine (think model instance?), stored as @current_state, which starts out as :init.

You can see the output of the finite state machine logs (stored for each active_model):

hanlon active_model 7lualum73Bv9IFb6w9RcOZ logs
Active Model Logs (7lualum73Bv9IFb6w9RcOZ):
      State              Action                   Result                Time     Last     Total             Node           
init               mk_call             n/a                            03:47:31  0 sec    0 sec     5ZPpLv8chMqWLoRXo4Dk83  
init               boot_call           Starting Redhat model install  03:48:33  1.0 min  1.0 min   5ZPpLv8chMqWLoRXo4Dk83  
init               kickstart_file      Replied with kickstart file    03:48:57  24 sec   1.4 min   5ZPpLv8chMqWLoRXo4Dk83  
init=>postinstall  kickstart_end       Acknowledged kickstart end     03:54:43  5.8 min  7.2 min   5ZPpLv8chMqWLoRXo4Dk83  
postinstall        postinstall_inject  n/a                            03:54:43  0 sec    7.2 min   5ZPpLv8chMqWLoRXo4Dk83  
postinstall        boot_call           Replied with os boot script    03:55:44  1.0 min  8.2 min   5ZPpLv8chMqWLoRXo4Dk83  
postinstall        boot_call           n/a                            03:57:10  1.4 min  9.7 min   5ZPpLv8chMqWLoRXo4Dk83  
postinstall        boot_call           n/a                            03:58:36  1.4 min  11.1 min  5ZPpLv8chMqWLoRXo4Dk83  

The mapping for each state and action (and the resulting state from that action) is stored in the fsm_tree for that model.

      def fsm_tree
        {
          :init => {
            :mk_call       => :init,
            :boot_call     => :init,
            :kickstart_start => :preinstall,
            :kickstart_file  => :init,
            :kickstart_end   => :postinstall,
            :timeout       => :timeout_error,
            :error         => :error_catch,
            :else          => :init
          },
          :preinstall => {
            :mk_call         => :preinstall,
            :boot_call       => :preinstall,
            :kickstart_start   => :preinstall,
            :kickstart_file    => :init,
            :kickstart_end     => :postinstall,
            :kickstart_timeout => :timeout_error,
            :error           => :error_catch,
            :else            => :preinstall
          },
          :postinstall => {
            :mk_call            => :postinstall,
            :boot_call          => :postinstall,
            :kickstart_end        => :postinstall,
            :postinstall_inject => :postinstall,
            :os_boot            => :postinstall,
            :os_final           => :os_complete,
            :post_error         => :error_catch,
            :post_timeout       => :timeout_error,
            :error              => :error_catch,
            :else               => :postinstall
          },
          :os_complete => {
            :mk_call   => :os_complete,
            :boot_call => :os_complete,
            :else      => :os_complete,
            :reset     => :init
          },
          :timeout_error => {
            :mk_call   => :timeout_error,
            :boot_call => :timeout_error,
            :else      => :timeout_error,
            :reset     => :init
          },
          :error_catch => {
            :mk_call   => :error_catch,
            :boot_call => :error_catch,
            :else      => :error_catch,
            :reset     => :init
          },
        }
      end
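
The transition lookup can be sketched by replaying the actions from the active_model log above against a trimmed copy of this tree. This is an illustration only: next_state is a hypothetical helper, and falling back to the :else entry for unknown actions is an assumption about how the real fsm_action treats that key.

```ruby
# Trimmed copy of the fsm_tree above: only the states and actions needed here.
FSM_TREE = {
  :init => {
    :mk_call => :init, :boot_call => :init, :kickstart_start => :preinstall,
    :kickstart_file => :init, :kickstart_end => :postinstall, :else => :init
  },
  :postinstall => {
    :mk_call => :postinstall, :boot_call => :postinstall,
    :postinstall_inject => :postinstall, :os_final => :os_complete,
    :else => :postinstall
  },
  :os_complete => { :else => :os_complete, :reset => :init }
}.freeze

# Hypothetical lookup: use the explicit transition, otherwise fall back to :else.
def next_state(tree, state, action)
  transitions = tree.fetch(state)
  transitions.fetch(action) { transitions.fetch(:else) }
end

# Replay the actions from the log, starting at :init.
actions = %i[mk_call boot_call kickstart_file kickstart_end postinstall_inject boot_call]
states = actions.reduce([:init]) do |seen, action|
  seen + [next_state(FSM_TREE, seen.last, action)]
end
puts states.join(" -> ")
```

The replay stays in :init until kickstart_end moves it to :postinstall, which matches the init=>postinstall row in the log.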

Look for boot_call in each core/model; the logic is usually something like:

      def boot_call(node, policy_uuid)
        super(node, policy_uuid)
        case @current_state
          when :init, :preinstall
            @result = "Starting THIS model install"
            ret = start_install(node, policy_uuid)
          when :postinstall, :os_complete, :broker_check, :broker_fail, :broker_success, :complete_no_broker
            ret = local_boot(node)
          when :timeout_error, :error_catch
            engine = ProjectHanlon::Engine.instance
            ret = engine.default_mk_boot(node.uuid)
          else
            engine = ProjectHanlon::Engine.instance
            ret = engine.default_mk_boot(node.uuid)
        end
        fsm_action(:boot_call, :boot_call)
        ret
      end

The iPXE script responses are generated based on the state (and logic) above, usually by rendering an ERB template.

core/model/redhat/7/boot_install.erb

#!ipxe
echo Hanlon <%= @label %> model boot_call
echo Installation node UUID : <%= node.uuid %>
echo Installation image UUID: <%= @image_uuid %>
echo Active Model node state: <%= @current_state %>

sleep 3
kernel <%= "#{image_svc_uri}/#{@image_uuid}/#{kernel_path} #{kernel_args(policy_uuid)}" %> || goto error
initrd <%= "#{image_svc_uri}/#{@image_uuid}/#{initrd_path}" %> || goto error
boot

When iPXE runs this, it will grab the image's kernel and initrd. The kernel_args are also an ERB template.

core/model/redhat/7/kernel_args.erb

<% if @node.dhcp_mac %>ksdevice=bootif BOOTIF=<%= @node.dhcp_mac %>  ks=<%= "#{api_svc_uri}/policy/callback/#{policy_uuid}/kickstart/file" %>
<% else %>ks=<%= "#{api_svc_uri}/policy/callback/#{policy_uuid}/kickstart/file" %>
<% end %>
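
To see what this template produces, it can be rendered outside Hanlon with a stand-in object. This is a sketch under assumptions: ModelSketch and Node are hypothetical stand-ins for the real model and node classes, the api_svc_uri value is guessed from the URLs in the traces, and POLICY_UUID is a placeholder.

```ruby
require "erb"

# The kernel_args.erb template from above, kept verbatim.
KERNEL_ARGS_TEMPLATE = <<~'TEMPLATE'
  <% if @node.dhcp_mac %>ksdevice=bootif BOOTIF=<%= @node.dhcp_mac %>  ks=<%= "#{api_svc_uri}/policy/callback/#{policy_uuid}/kickstart/file" %>
  <% else %>ks=<%= "#{api_svc_uri}/policy/callback/#{policy_uuid}/kickstart/file" %>
  <% end %>
TEMPLATE

Node = Struct.new(:dhcp_mac) # hypothetical stand-in for the Hanlon node

# Hypothetical stand-in for a model: just enough for the template's binding.
class ModelSketch
  def initialize(node)
    @node = node
  end

  def api_svc_uri
    "http://1.1.1.6:8026/hanlon/api/v1" # assumed from the traces
  end

  def kernel_args(policy_uuid)
    # binding exposes @node, api_svc_uri and the policy_uuid local to the template
    ERB.new(KERNEL_ARGS_TEMPLATE).result(binding).strip
  end
end

with_mac = ModelSketch.new(Node.new("00:25:90:97:9f:58")).kernel_args("POLICY_UUID")
without_mac = ModelSketch.new(Node.new(nil)).kernel_args("POLICY_UUID")
puts with_mac
puts without_mac
```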

The kernel args call back to the policy to load policy/callback/#{policy_uuid}/kickstart/file.

Policies have a callback map:

          "broker"      => :broker_agent_handoff,
          "kickstart"   => :kickstart_call,
          "postinstall" => :postinstall_call,

Callbacks to kickstart are looked up in this map and handled by core/model/redhat.rb:kickstart_call:

      def kickstart_call
        @arg = @args_array.shift
        case @arg
          when  "start"
            @result = "Acknowledged kickstart read"
            fsm_action(:kickstart_start, :kickstart)
            return "ok"
          when "end"
            @result = "Acknowledged kickstart end"
            fsm_action(:kickstart_end, :kickstart)
            return "ok"
          when "file"
            @result = "Replied with kickstart file"
            fsm_action(:kickstart_file, :kickstart)
            return generate_kickstart(@policy_uuid)
          else
            return "error"
        end
      end
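
Putting the callback map and kickstart_call together, a callback path breaks down into a policy UUID, a namespace that selects the handler, and the remaining arguments that the handler shifts off @args_array. The following is a sketch of that routing, not Hanlon's actual code: route_callback is a hypothetical name and abc123 is a placeholder policy UUID.

```ruby
# The callback map from above.
CALLBACKS = {
  "broker"      => :broker_agent_handoff,
  "kickstart"   => :kickstart_call,
  "postinstall" => :postinstall_call
}.freeze

# Hypothetical router: split a callback path into its parts.
# Returns nil when the path is not a known callback.
def route_callback(path)
  parts = path.split("/").reject(&:empty?)
  idx = parts.index("callback")
  return nil if idx.nil?

  policy_uuid, namespace, *args = parts.drop(idx + 1)
  handler = CALLBACKS[namespace]
  return nil if handler.nil?

  { policy_uuid: policy_uuid, handler: handler, args: args }
end

p route_callback("/hanlon/api/v1/policy/callback/abc123/kickstart/file")
p route_callback("/hanlon/api/v1/policy/callback/abc123/postinstall/send_ips/1.1.1.3")
```

In the first case kickstart_call would shift "file" off the arguments; in the second, postinstall_call would shift "send_ips" and then the IP string.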

The kickstart/file callback calls generate_kickstart:

      def generate_kickstart(policy_uuid)
        # TODO: Review hostname
        hostname = "#{@hostname_prefix}#{@counter.to_s}"
        filepath = template_filepath('kickstart')
        ERB.new(File.read(filepath)).result(binding)
      end

It's interesting to note that active_models keep a count of how many nodes have registered, and often set the hostname for the install based on that count. generate_kickstart then returns the rendered core/model/redhat/7/kickstart.erb:

#!/bin/bash
# Kickstart for RHEL 7
# see: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Installation_Guide/sect-kickstart-syntax.html

install
url --url=http://<%= config.hanlon_server %>:<%= config.api_port %><%= config.websvc_root %>/image/os/<%= @image_uuid %>
text
lang en_US.UTF-8
keyboard us
rootpw <%= @root_password %>
network --hostname <%= hostname %>
firewall --service=ssh
authconfig --enableshadow --passalgo=sha512 --enablefingerprint
selinux --disabled
timezone --utc America/Denver
bootloader --location=mbr --driveorder=sda --append=crashkernel=auto rhgb quiet
clearpart --all
zerombr
# Partitioning scheme
<%= partition_scheme %>
# reboot automatically
reboot

# All packages below @core are required for facter
#
%packages --nobase
@core
%end

%post --log=/root/hanlon-post.log
curl <%= api_svc_uri %>/policy/callback/<%= policy_uuid %>/kickstart/end
curl -o /tmp/hanlon_postinstall.sh <%= api_svc_uri %>/policy/callback/<%= policy_uuid %>/postinstall/inject
echo bash /tmp/hanlon_postinstall.sh >> /etc/rc.local
chmod +x /tmp/hanlon_postinstall.sh
# This line needs to be added otherwise rc.local will not execute on boot
chmod +x /etc/rc.d/rc.local
%end

The kickstart script sets the base url for the rest of the install to the base dir of the image.

In the %post section of the kickstart there are two callbacks to the policy.

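One just logs kickstart_end, which promotes our state to :postinstall. In the logs above this happens directly from :init, because kickstart_start was never called; from :preinstall the fsm_tree map gives the same result: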

          :preinstall => {
            :mk_call         => :preinstall,
            :boot_call       => :preinstall,
            :kickstart_start   => :preinstall,
            :kickstart_file    => :init,
            :kickstart_end     => :postinstall,
            :kickstart_timeout => :timeout_error,
            :error           => :error_catch,
            :else            => :preinstall
          },

The second one will render a script that should get run on the next boot (the installs are normally set to reboot automatically).

Now that we are in :postinstall, we should look at the postinstall_call callbacks.

      def postinstall_call
        @arg = @args_array.shift
        case @arg
          when "inject"
            fsm_action(:postinstall_inject, :postinstall)
            return os_boot_script(@policy_uuid)
          when "boot"
            fsm_action(:os_boot, :postinstall)
            return os_complete_script(@node)
          when "final"
            fsm_action(:os_final, :postinstall)
            return ""
          when "send_ips"
            # Grab IP string
            @ip_string = @args_array.shift
            logger.debug "Node IP String: #{@ip_string}"
            @node_ip = @ip_string if @ip_string =~ /\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/
            return
          else
            fsm_action(@arg.to_sym, :postinstall)
            return
        end
      end
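
One detail worth noting in the send_ips branch: the regular expression only checks the shape of the string, so it also accepts values like 999.999.999.999. If stricter checking were wanted, Ruby's ipaddr library could be used; the sketch below illustrates that difference and is not part of Hanlon (strict_ipv4? is a hypothetical helper).

```ruby
require "ipaddr"

# The pattern used by postinstall_call above.
LOOSE_IP = /\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/

# Hypothetical stricter check: let IPAddr parse the string and reject
# anything that is not a valid IPv4 address.
def strict_ipv4?(text)
  IPAddr.new(text).ipv4?
rescue IPAddr::Error
  false
end

["1.1.1.3", "999.999.999.999"].each do |candidate|
  puts "#{candidate}: loose=#{LOOSE_IP.match?(candidate)} strict=#{strict_ipv4?(candidate)}"
end
```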

Our second callback, postinstall/inject, calls os_boot_script to inject some commands that run during the machine's first OS boot after it has been successfully kickstarted.

      def os_boot_script(policy_uuid)
        @result = "Replied with os boot script"
        filepath = template_filepath('os_boot')
        ERB.new(File.read(filepath)).result(binding)
      end

Which renders the os_boot.erb ERB template.

#!/bin/bash

# Wait for network to come up when using NetworkManager.
if service NetworkManager status >/dev/null 2>&1 && type -P nm-online; then
    nm-online -q --timeout=10 || nm-online -q -x --timeout=30
    [ "$?" -eq 0 ] || exit 1
fi

# Configure hostname.
hostname <%= hostname %>
echo <%= hostname %> > /etc/hostname

# This set of commands should convert the first local (but non-loopback) IP
# address in the /etc/hosts file to an entry that has the fully-qualified
# hostname and local hostname as part of the entry (so that these names can
# be resolved properly). A backup of the original file will be left in place
# in the /etc/hosts- file
cp -p /etc/hosts /etc/hosts-

# Modified for RHEL7, I can't imagine there would be multiple interfaces and have multiple default gateways
 
default_gw_device=`ip route show | grep 'default' | awk '{print $5}'`
node_ip=`ip addr show $default_gw_device | grep 'inet ' | awk -F'[ /]' '{print $6}'`
echo "$node_ip<%="\t"%><%= hostname %>.<%= domainname %> <%= hostname %>" >> /etc/hosts

[ "$?" -eq 0 ] && curl <%= callback_url("postinstall", "set_hostname_ok") %> || curl <%= callback_url("postinstall", "set_hostname_fail") %>

# Send IP up
curl <%= callback_url("postinstall", "send_ips") %>/$node_ip
# get final script
curl <%= callback_url("postinstall", "boot") %> | sh
# Send final state
curl <%= callback_url("postinstall", "final") %> &
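
The callback_url helper used in this template is not shown on this page. Judging by the kickstart template, which spells the same URLs out by hand, it presumably expands to something like the sketch below. Everything here is an assumption: build_callback_url is a hypothetical four-argument stand-in for the real two-argument helper, the api_svc_uri value is guessed from the traces, and POLICY_UUID is a placeholder.

```ruby
# Hypothetical stand-in for callback_url(namespace, action), with the
# pieces that the real model would supply passed in explicitly.
def build_callback_url(api_svc_uri, policy_uuid, namespace, action)
  [api_svc_uri, "policy", "callback", policy_uuid, namespace, action].join("/")
end

API_SVC_URI = "http://1.1.1.6:8026/hanlon/api/v1" # assumed from the traces
POLICY_UUID = "POLICY_UUID"                       # placeholder

# The callbacks at the end of the os_boot script, in the order they are made.
sequence = [%w[postinstall send_ips], %w[postinstall boot], %w[postinstall final]]
urls = sequence.map do |namespace, action|
  build_callback_url(API_SVC_URI, POLICY_UUID, namespace, action)
end
puts urls
```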

The postinstall/send_ips callback sets the model's @node_ip so we can hand it off to a broker if needed.

The postinstall/boot callback runs os_complete_script:

      def os_complete_script(node)
        @result = "Replied with os complete script"
        filepath = template_filepath('os_complete')
        ERB.new(File.read(filepath)).result(binding)
      end

which renders os_complete.erb

#!/bin/bash
echo Hanlon policy successfully applied > /tmp/hanlon_complete.log
echo Model <%= @label %> - <%= @description %> >> /tmp/hanlon_complete.log
echo Image UUID <%= @image_uuid %> >> /tmp/hanlon_complete.log
echo Node UUID: <%= @node.uuid %> >> /tmp/hanlon_complete.log

sed -i --follow-symlinks '/hanlon_postinstall/d' /etc/rc.local

This basically logs that the machine came up from the host perspective, and removes the hanlon_postinstall line from /etc/rc.local.

The postinstall/final callback should move the active_model to :os_complete.
