Bug: (or docs issue): Games Example doesn't work on zarf 0.13.3 when using cert imported from Lets Encrypt Free CA #193

Closed
neoakris opened this issue Dec 8, 2021 · 11 comments · Fixed by #196

Comments

@neoakris

neoakris commented Dec 8, 2021

Edit -- TLDR summary of findings:
Zarf fails to populate its container registry when imported HTTPS certs are used that don't cover 127.0.0.1 (so certs signed by a public internet CA won't work), but zarf's self-generated HTTPS certs will work.

Summary:
So about the docs... I've experimented with zarf several times, but have never been able to get anything beyond zarf init to work correctly (which offers an empty git repo + empty registry).
(This is why I didn't realize that zarf can populate its registry + git repo when I talked over zoom with @jeff-mccoy.)
I tried the game example doc (with and without slight modifications) and I get ImagePullBackOff, which shows the registry never got populated.

Before I go into steps to reproduce the bug I think you'll find my desired use case valuable.



Background context info about my ultimate use case:

  1. I have a script to imperatively generate a declarative images.txt list of 20-100 images (depending on input parameters)
  2. I plan to write a script to convert images.txt to a config that can declaratively populate a registry. Basically script + images.txt ---> imperatively generated declarative zarf.yaml
  3. Said script will look similar to this (was investigating rancher's hauler but it doesn't work with authenticated registries, but this will give you a concrete idea of where my thinking is at / how I plan to imperatively generate a declarative zarf.yaml)
export TEMPLATIZED_IMAGES_LIST=$(cat imperatively-generated-bb-images.txt | sed 's/^/    - ref: /' )
echo "$TEMPLATIZED_IMAGES_LIST"

cat > declarative-hauler-config.yaml <<EOF
apiVersion: content.hauler.cattle.io/v1alpha1
kind: Images
metadata:
  name: bigbang-images
spec:
  images:
$TEMPLATIZED_IMAGES_LIST
EOF

hauler store sync -f declarative-hauler-config.yaml
  • Side Note: Can you bump registry:2's default PVC size from 10GB to 100GB? (It'll help my use case + won't hurt anything since local-path storage is thin provisioned)
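For the images.txt to zarf.yaml idea in the background list above, here is a minimal sketch of what such a conversion could look like. The function names (`images_to_yaml_items`, `write_zarf_yaml`) are hypothetical, and the zarf.yaml layout it writes (a single component with an `images:` list) is an assumption modeled on the game example, not a confirmed schema for 0.13.3; check it against the example's own zarf.yaml before relying on it.

```bash
#!/usr/bin/env bash

# Turn a plain image list on stdin (one reference per line) into YAML list items.
# Blank lines and lines starting with '#' are skipped; duplicates are dropped
# while keeping first-seen order. $1 is the indent prefix (default: six spaces).
images_to_yaml_items() {
  local indent="${1:-      }"
  awk -v indent="$indent" '
    { gsub(/^[ \t]+|[ \t]+$/, "") }        # trim surrounding whitespace
    $0 == "" || /^#/ { next }              # skip blank lines and comments
    !seen[$0]++ { print indent "- " $0 }   # emit each image only once
  '
}

# Write a zarf.yaml from an image list file.
# ASSUMPTION: the keys below mirror the layout of the game example and may not
# match the real schema of your zarf version.
write_zarf_yaml() {
  local images_file="$1" out_file="$2"
  {
    printf '%s\n' \
      'kind: ZarfPackageConfig' \
      'metadata:' \
      '  name: bigbang-images' \
      'components:' \
      '  - name: images' \
      '    required: true' \
      '    images:'
    images_to_yaml_items '      ' < "$images_file"
  } > "$out_file"
}
```

Calling `write_zarf_yaml imperatively-generated-bb-images.txt zarf.yaml` would then produce a file that `zarf package create` can be pointed at, provided the assumed layout holds.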



Copy Paste-able Reproducibility commands:

  • I use vagrant to provision a fresh centos7 VM
# provisioned fresh rhel7 for this and ssh'd in
# install git
sudo yum install git -y

# become root / per zarf docs
# I'm assuming all commands need to be run as root
sudo su - 

# install zarf 0.13.3 and zarf's dependencies artifact tar
cd ~
curl -L https://zarf-public.s3-us-gov-west-1.amazonaws.com/release/v0.13.3/zarf > zarf-cli
chmod +x zarf-cli
sudo mv zarf-cli /usr/bin/zarf
curl -L https://zarf-public.s3-us-gov-west-1.amazonaws.com/release/v0.13.3/zarf-init.tar.zst > zarf-init.tar.zst #538mb

# Install yq
export VERSION=v4.14.1
export BINARY=yq_linux_amd64
sudo wget https://github.com/mikefarah/yq/releases/download/${VERSION}/${BINARY} -O /usr/bin/yq && sudo chmod +x /usr/bin/yq

# install jq
sudo yum install jq -y

# verify yq / jq installed correctly
yq --version
jq --version

# pull dev cert/key pair for *.bigbang.dev
curl -L https://repo1.dso.mil/platform-one/big-bang/bigbang/-/raw/master/chart/ingress-certs.yaml | yq eval '.istio.gateways.public.tls.cert' - > bigbang.dev.crt
curl -L https://repo1.dso.mil/platform-one/big-bang/bigbang/-/raw/master/chart/ingress-certs.yaml | yq eval '.istio.gateways.public.tls.key' - > bigbang.dev.key

# Install zarf 0.13.3 & grab the zarf-init bootstrap artifact
cd ~
curl -L https://zarf-public.s3-us-gov-west-1.amazonaws.com/release/v0.13.3/zarf > zarf-cli
chmod +x zarf-cli
sudo mv zarf-cli /usr/bin/zarf

curl -L https://zarf-public.s3-us-gov-west-1.amazonaws.com/release/v0.13.3/zarf-init.tar.zst > zarf-init.tar.zst #538mb, takes about 1m20sec

# Pre-Login to registry1.dso.mil per zarf docs
export REGISTRY1_USERNAME=REPLACE_ME
export REGISTRY1_PASSWORD=REPLACE_ME
zarf tools registry login registry1.dso.mil -u $REGISTRY1_USERNAME -p $REGISTRY1_PASSWORD
# Lets you login w/o dependency on docker being installed, nice

# init zarf
cd ~
zarf init --server-key=$HOME/bigbang.dev.key --server-crt=$HOME/bigbang.dev.crt --host=bigbang.dev --components=management,gitops-service --confirm 
# ^-- takes about 20 secs

kubectl get pod -A  # zarf installs kubectl, just do this until it looks running

# Let's verify zarfs registry is empty b4 proceeding
# / Test Driven Development / helps to see what before and after look like
echo "127.0.0.1 registry.bigbang.dev" | tee -a /etc/hosts
# The auth entry is base64("user:password"); -f 2- keeps passwords that contain ':'
export DOCKER_USER=$(sudo cat /root/.docker/config.json | jq -r '.auths."127.0.0.1".auth' | base64 -d | cut -d ':' -f 1)
export DOCKER_PASS=$(sudo cat /root/.docker/config.json | jq -r '.auths."127.0.0.1".auth' | base64 -d | cut -d ':' -f 2-)
zarf tools registry login registry.bigbang.dev -u "$DOCKER_USER" -p "$DOCKER_PASS"
zarf tools registry catalog registry.bigbang.dev
# (shows zarf cli splash, but under zarf is blank --implies--> empty registry)


# Zarf deploy the Game Example
cd ~
git clone https://github.com/defenseunicorns/zarf.git
cd ~/zarf/examples/game
zarf package create --confirm
# ^-- 8 seconds
zarf package deploy zarf-package-appliance-demo-doom.tar.zst --confirm
# INFO[0000] Deploy Zarf package confirmed
# INFO[0000] Loading dynamic config                        path=/tmp/zarf-768846375/zarf.yaml
# INFO[0000] Deploying Zarf component                      name=baseline
# INFO[0000] Loading images for local install
# INFO[0000] Loading images for gitops service transfer
# INFO[0000] Loading images
# INFO[0000] Updating image                                image="registry.dso.mil/platform-one/big-bang/apps/product-tools/zarf/game:doom"
# INFO[0000] 127.0.0.1/platform-one/big-bang/apps/product-tools/zarf/game:doom
# WARN[0000] Unable to push the image to the registry      image="registry.dso.mil/platform-one/big-bang/apps/product-tools/zarf/game:doom"
# INFO[0000] Loading manifests for local install, this may take a minute or so to reflect in k3s
# INFO[0000] Processing manifest file                      path=/tmp/zarf-768846375/components/baseline/manifests/game.yaml
# INFO[0000] Processing manifest file                      path=/tmp/zarf-768846375/components/baseline/manifests/image-pull-secret.yaml
# INFO[0000] Copying file                                  Destination=/var/lib/rancher/k3s/server/manifests Source=/tmp/zarf-768846375/components/baseline/manifests
# INFO[0000] Cleaning up temp files
########################################

# The WARN mentions an Error
# The Error is that it tried to push to registry.dso.mil  (repo1's registry built into gitlab)
# Why is zarf trying to do that? Bug? or config error?


# zarf's registry is still blank
zarf tools registry catalog registry.bigbang.dev
# (shows zarf cli splash, but under zarf is blank --implies--> empty registry)


kubectl get pod -A
# NAMESPACE  NAME                    READY    STATUS             RESTARTS   AGE
# default    game-69f5486bff-xjl7x   0/1      ImagePullBackOff   0          4m31s
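
As a quick way to spot the symptom at the end of the script above, the pod listing can be filtered for image pull failures. This is only a sketch: `pods_failing_pull` is a hypothetical helper name, and it assumes the default column order of `kubectl get pod -A --no-headers` (NAMESPACE, NAME, READY, STATUS, ...).

```bash
#!/usr/bin/env bash

# Read `kubectl get pod -A --no-headers` output on stdin and print
# namespace/name for every pod whose STATUS column reports a pull failure.
pods_failing_pull() {
  awk '$4 == "ImagePullBackOff" || $4 == "ErrImagePull" { print $1 "/" $2 }'
}

# Usage (needs a running cluster):
#   kubectl get pod -A --no-headers | pods_failing_pull
```

An empty result after `zarf package deploy` suggests the images were found in the registry; any line printed points at a pod whose image never made it there.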




@RothAndrew
Member

RothAndrew commented Dec 8, 2021

One of our issues right now is Zarf is changing faster than the docs can keep up. Mostly expected given the early beta mode we are in, but still a pain.

Let's create a new issue to track to update the docs related to the doom game demo (with @neoakris as a user/stakeholder/guinea pig), and use this issue to track the outlined use case.

@edengebrezgi and @wadedesir (with @YrrepNoj 's help as needed), can you please:

  1. Create a new Zarf issue to track updating the docs related to the Doom game demo so that they make sense and they work on @neoakris 's machine
  2. Add it to the roadmap project, in the "Now" column
  3. Work the issue (put out a Draft Pull Request for transparent progress and early feedback)
  4. Engage with @neoakris and maybe 1-2 other internal team members to validate that the docs are well understood and able to be used successfully, with the yardstick being "I have Doom running in a Zarf cluster on my machine using nothing but the docs and without needing my hand held through the process"

@RothAndrew
Member

RothAndrew commented Dec 8, 2021

I suspect there's something else going on here than just the doom example not working. The use case laid out here does extra things that aren't part of the base demo.

Where is registry.bigbang.dev coming from? If you zarf init --host bigbang.dev the registry will be at https://bigbang.dev/v2/

Edit: Ah there it is, you're adding it to your hosts file

@RothAndrew
Member

I'm working on recreating the way you're doing it. It looks to be working fine when you don't import certs and use localhost as the host, so I'm guessing there's something up with that part of it.

nslookup bigbang.dev is returning 127.0.0.53 rather than what I'm expecting (127.0.0.1). That's of interest

Running zarf tools registry catalog 127.0.0.1 returns

Error: reading repos for 127.0.0.1: Get "https://127.0.0.1/v2/": x509: cannot validate certificate for 127.0.0.1 because it doesn't contain any IP SANs; Get "https://127.0.0.1/v2/": x509: cannot validate certificate for 127.0.0.1 because it doesn't contain any IP SANs

I thought we fixed that by adding the IP SAN it was looking for (127.0.0.1), so I'm not expecting to see an x509 error here. Will look into it.

@RothAndrew
Member

Addl note: We should create docs for doing the doom example with an imported cert like we do here, as a step for a more advanced user.

@RothAndrew
Member

If I add 127.0.0.1 bigbang.dev to /etc/hosts when I run zarf tools registry catalog bigbang.dev I get

Error: reading repos for bigbang.dev: Get "https://bigbang.dev/v2/": x509: certificate is valid for *.bigbang.dev, not bigbang.dev

@neoakris
Author

neoakris commented Dec 8, 2021

If I add 127.0.0.1 bigbang.dev to /etc/hosts when I run zarf tools registry catalog bigbang.dev I get

Error: reading repos for bigbang.dev: Get "https://bigbang.dev/v2/": x509: certificate is valid for *.bigbang.dev, not bigbang.dev

I added registry.bigbang.dev to /etc/hosts

It's why the following works

export DOCKER_USER=$(sudo cat /root/.docker/config.json | jq -r '.auths."127.0.0.1".auth' | base64 -d | cut -d ':' -f 1)
export DOCKER_PASS=$(sudo cat /root/.docker/config.json | jq -r '.auths."127.0.0.1".auth' | base64 -d | cut -d ':' -f 2-)
zarf tools registry login registry.bigbang.dev -u "$DOCKER_USER" -p "$DOCKER_PASS"
zarf tools registry catalog registry.bigbang.dev

Also, the cert / key pair I pull in my copy-pasteable example is from a public repo; it's a free Let's Encrypt wildcard cert signed by a public internet CA,
and registry.bigbang.dev matches *.bigbang.dev

The reason you got "cert invalid" is that you didn't copy-paste my reproducible example: you used bigbang.dev instead of registry.bigbang.dev

@RothAndrew
Member

ahh gotcha, I didn't realize the cert didn't include the base bigbang.dev. That makes sense.
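
To make the wildcard point concrete, here is a small sketch of the usual matching rule: a leading `*` in a certificate name stands for exactly one DNS label, so `*.bigbang.dev` covers `registry.bigbang.dev` but not the bare `bigbang.dev`. `cert_name_matches` is a hypothetical helper written for illustration; it is not how zarf or Go's x509 code is implemented, and it ignores case-insensitivity and other details of real hostname verification.

```bash
#!/usr/bin/env bash

# Return 0 if hostname $2 is covered by certificate name $1.
# A pattern of the form "*.suffix" matches exactly one extra label.
cert_name_matches() {
  local pattern="$1" host="$2" suffix label
  case "$pattern" in
    \*.*)
      suffix="${pattern#\*}"            # "*.bigbang.dev" -> ".bigbang.dev"
      case "$host" in
        *"$suffix")
          label="${host%"$suffix"}"     # the part the wildcard must cover
          case "$label" in
            ''|*.*) return 1 ;;         # empty or multi-label: not covered
            *)      return 0 ;;
          esac
          ;;
        *) return 1 ;;
      esac
      ;;
    *) [ "$pattern" = "$host" ] ;;      # no wildcard: exact match only
  esac
}

# Usage:
#   cert_name_matches '*.bigbang.dev' registry.bigbang.dev && echo covered
#   cert_name_matches '*.bigbang.dev' bigbang.dev || echo "not covered"
```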

So I think we're left with why zarf tools registry catalog 127.0.0.1 isn't working (it should). My belief is if we fix that we'll fix the issue with why the doom image couldn't be pushed to the registry (since Zarf uses 127.0.0.1 under the hood when doing the pushing regardless of what you set as the host)

@neoakris
Author

neoakris commented Dec 8, 2021

it looks to be working fine when you don't import certs and use localhost as the host

I mentioned in the original post "I tried the game example doc (with and without slight modifications)"

"Without slight modifications" meant following your directions with zarf's auto-generated cert, but I couldn't figure out how to get that to work either. Since you had success with that route, I'll refresh my environment and try it with the auto-generated certs again to see if I fare any better this time.

@jeff-mccoy
Member

That could be the problem, in that version the injected certs include 127.0.0.1, a generated cert wouldn't normally. We might need to update docs or wait for the native apply which makes that point irrelevant.

@RothAndrew
Member

oh! That's a good point. I didn't think of that. That's the reason for sure. When we generate the cert we include 127.0.0.1 but we aren't generating the cert in this case.
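
One way to check a cert for this before running zarf init is to look for an IP SAN in its text dump. The sketch below assumes the `IP Address:<ip>` form that `openssl x509 -text` prints inside the Subject Alternative Name section; `san_has_ip` is a hypothetical helper name, and it reads the dump from stdin so it can be tried without a cert at hand.

```bash
#!/usr/bin/env bash

# Read `openssl x509 -noout -text` output on stdin and return 0 if the
# SAN entries include the exact IP address given as $1.
san_has_ip() {
  local ip="$1"
  tr ',' '\n' \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
    | grep -Fxq "IP Address:$ip"        # exact match, so 127.0.0.10 does not count
}

# Usage (file name taken from the reproduction script in this issue):
#   openssl x509 -in bigbang.dev.crt -noout -text | san_has_ip 127.0.0.1 \
#     || echo "cert has no IP SAN for 127.0.0.1"
```

Going by the x509 error earlier in this thread, the imported *.bigbang.dev cert should fail this check, while a zarf-generated cert should pass it.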

I'm writing up an issue to add an E2E test for using an imported cert and a host other than localhost so this use case is automatically tested in the future. Sounds like we are gonna have to fix something though.

@neoakris
Author

neoakris commented Dec 8, 2021

I must have typo'd somewhere the first time I tried it using a zarf-generated cert, and couldn't get it to work.

I got it to work by retrying from scratch using the following slight changes.

zarf init
# Selected               ---> Generate TLS chain with an ephemeral CA
# host or IP for ingress ---> 127.0.0.1
# yes to all

zarf tools registry login 127.0.0.1 -u $USER -p $PASS
zarf tools registry catalog 127.0.0.1
# platform-one/big-bang/apps/product-tools/zarf/game
# working :)

I'll update the title to clarify imported cert

@neoakris neoakris changed the title Bug: (or docs issue): Games Example doesn't work on zarf 0.13.3 Bug: (or docs issue): Games Example doesn't work on zarf 0.13.3 when using cert imported from Lets Encrypt CA Dec 8, 2021
@neoakris neoakris changed the title Bug: (or docs issue): Games Example doesn't work on zarf 0.13.3 when using cert imported from Lets Encrypt CA Bug: (or docs issue): Games Example doesn't work on zarf 0.13.3 when using cert imported from Lets Encrypt Free CA Dec 8, 2021
jeff-mccoy added a commit that referenced this issue Dec 11, 2021
)

### Breaking Changes:
* `localhost` is no longer a valid option for cluster ingress when initializing a zarf cluster. Instead you have to use a `127.0.0.1` or some other local ip found via `ifconfig`

### Fixes:
* No longer depends on 127.0.0.1 local bindings for the registry / gitops service
    * should fix #193 
* Resolve outstanding issues with image hostname swapping and
    * fixes #18
    * fixes #44
    * fixes #194

### Features:
* Adds `before` and `after` script options when defining a `zarf.yaml` with an optional retry flag
* Add symlink to ZarfFile for creating links to places files
* Add template boolean to ZarfFile to allow injection of zarf variables into text files
* Adds a new `zarf tool` command to print out config schema and commit the output to the repo (will need to make a git hook or something later on)
* Changes `zarf destroy` command to run any script that starts with `zarf-clean` instead of only running the k3s-remove script
* Add new ZarfState and `.zarf-state.yaml` for persisting host information from `zarf init` to `zarf package deploy`
* Remove all hard-coded logic for k3s install, now uses only standard zarf component features like everything else
* Add user prompt with host/IP address suggestions for ingress

#### Misc:
* Upgrades k3s from v1.21.2 to v1.21.6
* Adds optional regex filter for when performing RecursiveFileList()
* Adds more description to the components in zarf.yaml
* Renames type ZarfConfig to ZarfPackage in the config pkg
* Handful of general code organizing changes (moving yaml related functions to the `...../utils/yaml.go`, etc.)
* Expose execCommand() with stdout control
* Move traefik to standalone component and drop the internal k3s install of traefik
* Use the airgap tarball of K3s instead of manually listing images
* Cleanup init prompt logic
jeff-mccoy added a commit that referenced this issue Feb 8, 2022
Signed-off-by: Jeff McCoy <code@jeffm.us>
Noxsios pushed a commit that referenced this issue Mar 8, 2023