Improve reliability of settings.network.hostname generator #2647

Merged 3 commits into bottlerocket-os:develop on Dec 14, 2022

@cbgbt cbgbt commented Dec 9, 2022

Issue number:

Closes #2585
Closes #2600

Description of Changes:
In addition to the change named in the title, this PR makes sundog always log the stderr of settings generators.


    netdog: try harder to determine the hostname
    
    Failure scenarios when resolving a host's hostname are common enough to
    cause issues. This adds retries to our DNS queries for the current
    hostname.
    
    On AWS variants, we also attempt to query IMDS as a fallback mechanism.

---

    imdsclient: add method for fetching hostname

---

    sundog: always log settings generators' stderr

Just as a note, adding tokio and imdsclient moves netdog's binary size from 1.4M -> 1.6M.
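
For readers skimming the thread, the retry-then-fallback flow described in the commit messages can be sketched roughly as follows. This is a synchronous, standard-library-only illustration under stated assumptions, not the code in this PR (which uses tokio and imdsclient). The names `retry` and `resolve_hostname`, the 10 ms delay, and the caller-supplied default are all hypothetical.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Calls `op` at least once and at most `attempts` times, sleeping `delay`
/// after each failed try. Returns the first success or the last error.
fn retry<T, E>(
    attempts: u32,
    delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut tries = 0;
    loop {
        tries += 1;
        match op() {
            Ok(value) => return Ok(value),
            Err(e) if tries >= attempts => return Err(e),
            Err(_) => sleep(delay),
        }
    }
}

/// Tries the primary lookup with retries, then an optional fallback source
/// (for example a metadata service), then a caller-supplied default.
fn resolve_hostname(
    primary: impl FnMut() -> Result<String, String>,
    fallback: impl FnOnce() -> Option<String>,
    default: String,
) -> String {
    match retry(3, Duration::from_millis(10), primary) {
        Ok(name) => name,
        Err(e) => {
            // Diagnostics go to stderr so stdout carries only the result.
            eprintln!("Reverse DNS lookup failed: {}", e);
            fallback().unwrap_or(default)
        }
    }
}
```

A caller would pass the DNS lookup as `primary` and the provider query as `fallback`. Keeping the diagnostic on stderr matches the test output below, where only the hostname appears on stdout.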

Testing done:
I blocked DNS resolution in my Bottlerocket VPC at the network level, then ran netdog generate-hostname on the aws-ecs-1 variant, noting that hostname generation succeeded despite DNS resolution failing.

Note that the first line is output to stderr, and the second to stdout.

bash-5.1# netdog generate-hostname
Reverse DNS lookup failed: failed to lookup address information: Temporary failure in name resolution
"ip-192-168-19-221.us-west-2.compute.internal"

I also tested sundog logging by attempting to boot with partial DNS blocking and checking the systemd journal:

Dec 09 23:38:50 localhost sundog[400]: 23:38:50 [INFO] Sundog started
Dec 09 23:38:50 localhost sundog[400]: 23:38:50 [INFO] Retrieving setting generators
Dec 09 23:38:50 localhost sundog[400]: 23:38:50 [INFO] Retrieving settings values
Dec 09 23:38:50 localhost sundog[400]: 23:38:50 [INFO] Setting generator command 'shibaken' stderr: 23:38:50 [INFO] shibaken started
Dec 09 23:38:50 localhost sundog[400]: 23:38:50 [INFO] Setting generator command 'shibaken' stderr: 23:38:50 [INFO] Received meta-data/services/partition
Dec 09 23:38:50 localhost sundog[400]: 23:38:50 [INFO] Setting generator command 'shibaken' stderr: 23:38:50 [INFO] shibaken started
Dec 09 23:38:50 localhost sundog[400]: 23:38:50 [INFO] Setting generator command 'shibaken' stderr: 23:38:50 [INFO] Connecting to IMDS
Dec 09 23:38:50 localhost sundog[400]: 23:38:50 [INFO] Setting generator command 'shibaken' stderr: 23:38:50 [INFO] Fetching list of available public keys from IMDS
Dec 09 23:38:50 localhost sundog[400]: 23:38:50 [INFO] Setting generator command 'shibaken' stderr: 23:38:50 [INFO] Received meta-data/public-keys
Dec 09 23:38:50 localhost sundog[400]: 23:38:50 [INFO] Setting generator command 'shibaken' stderr: 23:38:50 [INFO] Generating targets to fetch text of available public keys
Dec 09 23:38:50 localhost sundog[400]: 23:38:50 [INFO] Setting generator command 'shibaken' stderr: 23:38:50 [INFO] Fetching public key (1/1)
Dec 09 23:38:50 localhost sundog[400]: 23:38:50 [INFO] Setting generator command 'shibaken' stderr: 23:38:50 [INFO] Received meta-data/public-keys/0/openssh-key
Dec 09 23:38:50 localhost sundog[400]: 23:38:50 [INFO] Setting generator command 'shibaken' stderr: 23:38:50 [INFO] Generating user-data
Dec 09 23:38:50 localhost sundog[400]: 23:38:50 [INFO] Setting generator command 'shibaken' stderr: 23:38:50 [INFO] Encoding user-data
Dec 09 23:38:50 localhost sundog[400]: 23:38:50 [INFO] Setting generator command 'shibaken' stderr: 23:38:50 [INFO] Outputting base64-encoded user-data
Dec 09 23:38:50 localhost sundog[400]: 23:38:50 [INFO] Sending settings values to the API
Dec 09 23:38:50 localhost systemd[1]: Finished User-specified setting generators.

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

@cbgbt cbgbt marked this pull request as ready for review December 9, 2022 17:28
if !result.stderr.is_empty() {
    let cmd_stderr = String::from_utf8_lossy(&result.stderr);
    for line in cmd_stderr.lines() {
        info!("{}: {}", command, line);

Nit: perhaps the log could be a little bit more verbose so we know exactly what we're looking at in the logs. It could get a little confusing in the context of the whole journal.

info!("Setting generator command '{}' stderr: {}", command, line);
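
As a rough illustration of the suggested format, the per-line prefixing can be pulled into a small pure function. This is only a sketch: `stderr_log_lines` is a hypothetical helper, and it returns the messages instead of calling `info!` so it needs no logging crate.

```rust
/// Builds one log message per line of a generator's captured stderr.
/// Returns an empty Vec when the command wrote nothing to stderr.
fn stderr_log_lines(command: &str, stderr: &[u8]) -> Vec<String> {
    // Lossy conversion keeps logging from failing on invalid UTF-8.
    let text = String::from_utf8_lossy(stderr);
    text.lines()
        .map(|line| format!("Setting generator command '{}' stderr: {}", command, line))
        .collect()
}
```

Each returned string could then be passed to `info!`, which gives journal lines in the same shape as the sundog output in the PR description.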

cbgbt commented Dec 9, 2022

^ Addresses feedback from @zmrow

@zmrow zmrow left a comment

😶‍🌫️

.ok();

// On AWS variants, we fallback to inspecting IMDS.
#[cfg(variant_platform = "aws")]

This gives me pause. Elsewhere we have gone out of our way to segregate conditionally-compiled code from unconditionally-compiled code: https://github.com/bottlerocket-os/bottlerocket/blob/d20ea8efe911214ea635da81c2e8fed9c9703c77/sources/api/early-boot-config/src/provider.rs#L8..L21

I'm not sure that makes sense here, but I'm thinking about a case where we would want to query differently for a different provider (vmware? or a different cloud, for example), and we would end up with a string of code sections wrapped in compiler guards.

Perhaps a compromise would be something like this:
(maybe it's too much trouble, but it's an idea)

    let hostname: Option<String> =
        match Retry::spawn(retry_strategy(), || async { lookup_addr(&ip) }).await {
            Ok(hostname) => Some(hostname),
            Err(e) => {
                eprintln!("Reverse DNS lookup failed: {}", e);
                // Fall back to the platform's provider, if any.
                query_provider_hostname().await
            }
        };

elsewhere:

async fn query_provider_hostname() -> Option<String> {
    // Exactly one of these statements survives conditional compilation.
    #[cfg(variant_platform = "aws")]
    return query_imds_hostname().await;

    #[cfg(not(variant_platform = "aws"))]
    return None;
}

#[cfg(variant_platform = "aws")]
/// On AWS variants, we fallback to inspecting IMDS.
async fn query_imds_hostname() -> Option<String> {
    todo!()
}


I had similar worries but didn't give it enough thought to land at this strategy. I like it, I'll do this.

@gthao313 gthao313 left a comment

🎉

cbgbt commented Dec 13, 2022

  • Sequestered conditionally compiled code into its own module per a recommendation from @webern
  • Reduced the number of DNS lookups to 3 after noting that it takes ~5 seconds for glibc to time out these requests
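
One way to picture the "own module" approach from the first bullet is a pair of same-named modules, each behind a compiler guard, so the call site stays unconditional. This is only a sketch of the pattern and not the module layout in this PR: `unix` stands in for `variant_platform = "aws"` so the example compiles without a custom cfg, and the `provider` module, its return value, and `hostname_or` are hypothetical.

```rust
// `unix` is a stand-in here for the custom `variant_platform = "aws"` cfg.
#[cfg(unix)]
mod provider {
    /// Hypothetical stand-in for a platform-specific metadata query.
    pub fn query_provider_hostname() -> Option<String> {
        Some("from-provider".to_string())
    }
}

#[cfg(not(unix))]
mod provider {
    /// Platforms without a metadata service have nothing to offer.
    pub fn query_provider_hostname() -> Option<String> {
        None
    }
}

/// Unconditional caller: no compiler guards are needed at the call site.
fn hostname_or(default: &str) -> String {
    provider::query_provider_hostname().unwrap_or_else(|| default.to_string())
}
```

Adding another provider would then mean adding another guarded module, while `hostname_or` and its callers stay unchanged.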

cbgbt commented Dec 13, 2022

^ One more change to add a missing docstring.

cbgbt commented Dec 14, 2022

I forgot to run clippy 🤦

@cbgbt cbgbt merged commit 138d192 into bottlerocket-os:develop Dec 14, 2022
@cbgbt cbgbt deleted the hostname-resolution branch August 15, 2023 23:55
Successfully merging this pull request may close these issues.

  • sundog should log stderr even if the generator process returns 0
  • Hostname resolution