SRE-3737 ci: HOT FIX - remove proxy related settings for Fault Injection testing stage#508

Merged
phender merged 1 commit into master from grom72/SRE-3737-no-proxy-for-Fault-Injection-testing on Apr 23, 2026

@grom72 (Contributor) commented Apr 22, 2026

SRE-3737 ci: HOT FIX Fault Injection without proxy

Fault Injection testing does not access the internet, so neither the https_proxy nor the no_proxy variable is needed.

This PR is verified by daos-stack/daos#18024.

This PR must land before daos-stack/daos#18024.

Fault Injection testing shall not access internet so neither
https_proxy nor no_proxy variables are required.

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@hpe.com>

@janekmi (Contributor) left a comment

I understand this is only a temporary solution until we remove the proxy from our pipelines altogether, right?

@grom72 grom72 changed the title from "SRE-3734 ci: HOT FIX - remove proxy related settings for Fault Injection testing stage" to "SRE-3737 ci: HOT FIX - remove proxy related settings for Fault Injection testing stage" on Apr 22, 2026
@grom72 grom72 added the forced-landing The PR has known failures or intentionally reduced testing, but should still be landed label Apr 22, 2026
println "DAOS_NO_PROXY: $DAOS_NO_PROXY"
ret_str += ' --build-arg DAOS_NO_PROXY="' + env.DAOS_NO_PROXY + '"'
}
if (!(env.STAGE_NAME?.contains('Fault injection'))) {

Contributor commented:

We should pass an argument to dockerBuildArgs rather than trigger behavior based on stage names.
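
As a rough illustration of this suggestion, the stage-name check could be replaced by an explicit flag supplied by the caller. The sketch below is not the actual pipeline-lib code: the helper name `proxyBuildArgs`, the `use_proxy` key, and the `HTTPS_PROXY` variable name are all assumptions made for the example; only `DAOS_NO_PROXY` appears in the diff under review.

```groovy
// Hypothetical helper (not the real dockerBuildArgs): builds only the
// proxy-related part of the docker build arguments.
// 'use_proxy' is an assumed option name; it defaults to true so that
// existing callers would keep their current behaviour.
String proxyBuildArgs(Map config = [:], Map env = [:]) {
    boolean useProxy = config.getOrDefault('use_proxy', true)
    String args = ''
    if (!useProxy) {
        // The caller opted out, e.g. for a stage that needs no internet access.
        return args
    }
    // 'HTTPS_PROXY' is an assumed variable name used for illustration only.
    if (env.HTTPS_PROXY) {
        args += ' --build-arg HTTPS_PROXY="' + env.HTTPS_PROXY + '"'
    }
    if (env.DAOS_NO_PROXY) {
        args += ' --build-arg DAOS_NO_PROXY="' + env.DAOS_NO_PROXY + '"'
    }
    return args
}
```

A Fault Injection stage would then call something like `proxyBuildArgs([use_proxy: false], env)`, so the decision lives in the stage definition instead of depending on how the stage happens to be named.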

Collaborator commented:

I think this is going to be a phased approach: the first phase is just to get things working again, and the next is to work on a plan, starting with documenting how external resources should be accessed in a portable and maintainable manner.

The first option to look for is whether there is an artifact server to be used; we need to make the use of Artifactory or Nexus transparent, which means we also need to have it set up for proxy mirroring. This gives the best performance and reliability for our lab.

It needs to be optional so that the code will work outside of our lab, and the code exposed to the user should hide whether Artifactory or Nexus is used. This will take a bit of planning and refactoring to roll in correctly.

The next option is that the code should check whether a proxy is configured to be used. That gets more complicated because the "noproxy" environment variable is unreliable. This option is meant to support smaller-volume shops.

For both of these options we want to look for global configuration files that can be set up early in the script or node setup. Use of proxy environment variables should be avoided in production if at all possible because their scope is too broad, and "noproxy" may not be sufficient to work around that issue. On a desktop system we can have a script look up the proxy server to do this configuration portably, because that is how the proxy for a web browser is typically configured. It is unknown whether this discovery would work in the lab.

And then the final case: where there is no known specific proxy or artifact server in use, such as on a GitHub-hosted runner, we fall back to assuming direct access to the public Internet.
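
The lookup order in this comment could be sketched roughly as follows. This is only one reading of the proposal, assuming the final case means that neither an artifact server nor a proxy is known; the function name `resolveExternalAccess` and the keys `artifactServerUrl` and `proxyUrl` are hypothetical and stand in for values that would come from a global configuration file rather than from environment variables.

```groovy
// Hypothetical sketch of the fallback order; none of these names exist in
// the pipeline library. 'settings' represents values read from a global
// configuration file during script or node setup.
Map resolveExternalAccess(Map settings = [:]) {
    if (settings.artifactServerUrl) {
        // 1. Prefer an artifact server (Artifactory or Nexus) used as a mirror.
        return [mode: 'artifact-server', url: settings.artifactServerUrl]
    }
    if (settings.proxyUrl) {
        // 2. Otherwise use an explicitly configured proxy.
        return [mode: 'proxy', url: settings.proxyUrl]
    }
    // 3. Nothing configured: assume direct access to the public Internet.
    return [mode: 'direct', url: null]
}
```

Callers would only look at the returned `mode` and `url`, which keeps the choice between Artifactory and Nexus hidden from the code exposed to the user.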

@ryon-jensen (Contributor)

@grom72 (Contributor, Author) commented Apr 22, 2026

Looks like the downstream testing still fails fault injection: https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/ci-daos-stack-pipeline-lib-PR-508-master/1/pipeline-overview/?selected-node=854 ??

Yes, that is expected until daos-stack/daos#18024 lands on master.

I do not see any other way to fix the problem we have.
The run at https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-18024/11/pipeline-overview/ shows that it works with daos-stack/daos#18024.

@grom72 grom72 requested a review from a team April 23, 2026 13:38
@phender phender requested a review from a team April 23, 2026 15:28
@phender phender merged commit 19b72fd into master Apr 23, 2026
4 of 6 checks passed
@phender phender deleted the grom72/SRE-3737-no-proxy-for-Fault-Injection-testing branch April 23, 2026 15:29