Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

engine: remove ftp_proxy hack #7228

Merged
merged 5 commits into from
May 1, 2024
Merged

engine: remove ftp_proxy hack #7228

merged 5 commits into from
May 1, 2024

Conversation

sipsma
Copy link
Contributor

@sipsma sipsma commented May 1, 2024

This removes the ftp_proxy hack we've had for a while for passing uncached metadata to containers (previously a lot, but recently trimmed down to just ServerID).

Instead, we take advantage of the new custom executor added recently and use that to set the _DAGGER_SERVER_ID env var in containers. The plumbing passes a custom worker around via job values, which get read by the solver's ResolveOp callback and used to tell the Executor for ExecOps which ServerID to set in the container.

There's quite a bit more cleanup coming to everything involved here, but wanted to get this first step out since that hack, besides being annoying, was kind of dubious in nature and was a suspect in some strange behavior reported sometimes.


FTR, this is another spin-off of work on #6916. It was self-contained and beneficial enough that it seemed worth getting out for the next release.

@sipsma sipsma added this to the v0.11.3 milestone May 1, 2024
@sipsma sipsma requested review from vito and jedevc May 1, 2024 04:22
@sipsma

This comment was marked as outdated.

Copy link
Member

@jedevc jedevc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, once various SDK things are resolved 🎉 Nice to see our metadata hack go away!

sipsma added 5 commits May 1, 2024 14:21
This removes the ftp_proxy hack we've had for a while for passing
uncached metadata to containers (previously a lot, but recently trimmed
down to just ServerID).

Instead, we take advantage of the new custom executor added recently and
use that to set the _DAGGER_SERVER_ID env var in containers. The
plumbing passes the ServerID via job values, which get read by the
solver's ResolveOp callback and used to tell the Executor for ExecOps
which ServerID to set in the container.

There's quite a bit more cleanup coming to everything involved here, but
wanted to get this first step out since that hack, besides being
annoying, was kind of dubious in nature and was a suspect in some
strange behavior reported sometimes.

Signed-off-by: Erik Sipsma <erik@sipsma.dev>
I don't see that this is actually used or ends up modifying anything.
Suspect it's vestigial from pre-OTEL days so removing it.

Signed-off-by: Erik Sipsma <erik@sipsma.dev>
This fixes the problem in the previous commit where Dockerfiles with a
syntax pragma hit the executor directly before a ServerID could be set.

This wasn't the shortest fix, but it seems to be the best long term one.
The end result is that the Worker provided to LLBSolvers is always
scoped to the server, so we don't even need to be careful about ensuring
the right executor is being used everywhere. It just will be now because
it's the only executor available to buildkit internals we use.

This required a few adjustments from the previous code:
1. The custom executor has been expanded to also be a custom worker.
   Fortunately, it almost entirely reuses functionality from upstream's
   base.Worker, so this doesn't another big chunk of code to maintain.
2. llbSolver is now created in each buildkit.Client and setup with a
   worker.Controller with the Worker scoped to the current ServerID.
3. Rather than passing the ServerID through job keys, we just pass the
   whole Worker and use that to resolve ops.
4. Some more general cleanup of worker initialization made possible by
   these changes.

Signed-off-by: Erik Sipsma <erik@sipsma.dev>
Signed-off-by: Erik Sipsma <erik@sipsma.dev>
Signed-off-by: Erik Sipsma <erik@sipsma.dev>
@sipsma
Copy link
Contributor Author

sipsma commented May 1, 2024

Okay fixed that problem with Dockerfiles that use syntax pragmas in the not-fastest-but-best-long-term way. The high-level idea is the same as before but with some adjustments described in commit message here: 7baea5b

  • cc @jedevc worth another look if interested

Side note: I noticed during those changes that opTrackingGateway seemed to amount to a no-op but introduced some convolution in trying to follow code, so I deleted it in this commit: d0a00e5.

  • Would appreciate a quick double check I'm not missing something obvious
  • cc @vito (just if you are around, not sure of status, please ignore if vacationing 😄)

@sipsma sipsma requested a review from jedevc May 1, 2024 21:27
@sipsma
Copy link
Contributor Author

sipsma commented May 1, 2024

Also worth noting: the tests have been way less flaky so far (knock on wood, etc.). My educated guess is that ftp_proxy settings were resulting in different vtx digests that ended up edge merged, but now vtx digests end up the same more often and don't need to go through the (currently buggy, pending upstream fixes) edge merge logic?

I did still see the compute cache error in a publish workflow here once, but tests have not flaked out yet. Hopefully I'm not just getting lucky 🍀

@sipsma
Copy link
Contributor Author

sipsma commented May 1, 2024

I just pivoted to start on proxy support and immediately realized that I very much want the changes here because otherwise I'll just be recreating the same problems we had with ftp_proxy (even though we would be using those settings for their intended purpose now 😅), so I'm actually gonna go ahead and merge this but if you have any comments about the modifications since your last approval @jedevc please leave them and I'll make any follow-ups if needed!

@sipsma sipsma merged commit e967954 into dagger:main May 1, 2024
44 checks passed
kpenfound pushed a commit to kpenfound/dagger that referenced this pull request May 2, 2024
* engine: remove ftp_proxy hack

This removes the ftp_proxy hack we've had for a while for passing
uncached metadata to containers (previously a lot, but recently trimmed
down to just ServerID).

Instead, we take advantage of the new custom executor added recently and
use that to set the _DAGGER_SERVER_ID env var in containers. The
plumbing passes the ServerID via job values, which get read by the
solver's ResolveOp callback and used to tell the Executor for ExecOps
which ServerID to set in the container.

There's quite a bit more cleanup coming to everything involved here, but
wanted to get this first step out since that hack, besides being
annoying, was kind of dubious in nature and was a suspect in some
strange behavior reported sometimes.

Signed-off-by: Erik Sipsma <erik@sipsma.dev>

* engine: remove apparently unused opTrackingGateway

I don't see that this is actually used or ends up modifying anything.
Suspect it's vestigial from pre-OTEL days so removing it.

Signed-off-by: Erik Sipsma <erik@sipsma.dev>

* engine: pass worker around instead of custom executor

This fixes the problem in the previous commit where Dockerfiles with a
syntax pragma hit the executor directly before a ServerID could be set.

This wasn't the shortest fix, but it seems to be the best long term one.
The end result is that the Worker provided to LLBSolvers is always
scoped to the server, so we don't even need to be careful about ensuring
the right executor is being used everywhere. It just will be now because
it's the only executor available to buildkit internals we use.

This required a few adjustments from the previous code:
1. The custom executor has been expanded to also be a custom worker.
   Fortunately, it almost entirely reuses functionality from upstream's
   base.Worker, so this doesn't another big chunk of code to maintain.
2. llbSolver is now created in each buildkit.Client and setup with a
   worker.Controller with the Worker scoped to the current ServerID.
3. Rather than passing the ServerID through job keys, we just pass the
   whole Worker and use that to resolve ops.
4. Some more general cleanup of worker initialization made possible by
   these changes.

Signed-off-by: Erik Sipsma <erik@sipsma.dev>

* backfill integ test coverage for dockerfile w/ syntax pragma

Signed-off-by: Erik Sipsma <erik@sipsma.dev>

* add comment pointing to original runc executor implementation

Signed-off-by: Erik Sipsma <erik@sipsma.dev>

---------

Signed-off-by: Erik Sipsma <erik@sipsma.dev>
Signed-off-by: kpenfound <kyle@dagger.io>
vikram-dagger pushed a commit to vikram-dagger/dagger that referenced this pull request May 3, 2024
* engine: remove ftp_proxy hack

This removes the ftp_proxy hack we've had for a while for passing
uncached metadata to containers (previously a lot, but recently trimmed
down to just ServerID).

Instead, we take advantage of the new custom executor added recently and
use that to set the _DAGGER_SERVER_ID env var in containers. The
plumbing passes the ServerID via job values, which get read by the
solver's ResolveOp callback and used to tell the Executor for ExecOps
which ServerID to set in the container.

There's quite a bit more cleanup coming to everything involved here, but
wanted to get this first step out since that hack, besides being
annoying, was kind of dubious in nature and was a suspect in some
strange behavior reported sometimes.

Signed-off-by: Erik Sipsma <erik@sipsma.dev>

* engine: remove apparently unused opTrackingGateway

I don't see that this is actually used or ends up modifying anything.
Suspect it's vestigial from pre-OTEL days so removing it.

Signed-off-by: Erik Sipsma <erik@sipsma.dev>

* engine: pass worker around instead of custom executor

This fixes the problem in the previous commit where Dockerfiles with a
syntax pragma hit the executor directly before a ServerID could be set.

This wasn't the shortest fix, but it seems to be the best long term one.
The end result is that the Worker provided to LLBSolvers is always
scoped to the server, so we don't even need to be careful about ensuring
the right executor is being used everywhere. It just will be now because
it's the only executor available to buildkit internals we use.

This required a few adjustments from the previous code:
1. The custom executor has been expanded to also be a custom worker.
   Fortunately, it almost entirely reuses functionality from upstream's
   base.Worker, so this doesn't another big chunk of code to maintain.
2. llbSolver is now created in each buildkit.Client and setup with a
   worker.Controller with the Worker scoped to the current ServerID.
3. Rather than passing the ServerID through job keys, we just pass the
   whole Worker and use that to resolve ops.
4. Some more general cleanup of worker initialization made possible by
   these changes.

Signed-off-by: Erik Sipsma <erik@sipsma.dev>

* backfill integ test coverage for dockerfile w/ syntax pragma

Signed-off-by: Erik Sipsma <erik@sipsma.dev>

* add comment pointing to original runc executor implementation

Signed-off-by: Erik Sipsma <erik@sipsma.dev>

---------

Signed-off-by: Erik Sipsma <erik@sipsma.dev>
vikram-dagger pushed a commit to vikram-dagger/dagger that referenced this pull request May 3, 2024
* engine: remove ftp_proxy hack

This removes the ftp_proxy hack we've had for a while for passing
uncached metadata to containers (previously a lot, but recently trimmed
down to just ServerID).

Instead, we take advantage of the new custom executor added recently and
use that to set the _DAGGER_SERVER_ID env var in containers. The
plumbing passes the ServerID via job values, which get read by the
solver's ResolveOp callback and used to tell the Executor for ExecOps
which ServerID to set in the container.

There's quite a bit more cleanup coming to everything involved here, but
wanted to get this first step out since that hack, besides being
annoying, was kind of dubious in nature and was a suspect in some
strange behavior reported sometimes.

Signed-off-by: Erik Sipsma <erik@sipsma.dev>

* engine: remove apparently unused opTrackingGateway

I don't see that this is actually used or ends up modifying anything.
Suspect it's vestigial from pre-OTEL days so removing it.

Signed-off-by: Erik Sipsma <erik@sipsma.dev>

* engine: pass worker around instead of custom executor

This fixes the problem in the previous commit where Dockerfiles with a
syntax pragma hit the executor directly before a ServerID could be set.

This wasn't the shortest fix, but it seems to be the best long term one.
The end result is that the Worker provided to LLBSolvers is always
scoped to the server, so we don't even need to be careful about ensuring
the right executor is being used everywhere. It just will be now because
it's the only executor available to buildkit internals we use.

This required a few adjustments from the previous code:
1. The custom executor has been expanded to also be a custom worker.
   Fortunately, it almost entirely reuses functionality from upstream's
   base.Worker, so this doesn't another big chunk of code to maintain.
2. llbSolver is now created in each buildkit.Client and setup with a
   worker.Controller with the Worker scoped to the current ServerID.
3. Rather than passing the ServerID through job keys, we just pass the
   whole Worker and use that to resolve ops.
4. Some more general cleanup of worker initialization made possible by
   these changes.

Signed-off-by: Erik Sipsma <erik@sipsma.dev>

* backfill integ test coverage for dockerfile w/ syntax pragma

Signed-off-by: Erik Sipsma <erik@sipsma.dev>

* add comment pointing to original runc executor implementation

Signed-off-by: Erik Sipsma <erik@sipsma.dev>

---------

Signed-off-by: Erik Sipsma <erik@sipsma.dev>
vikram-dagger pushed a commit to vikram-dagger/dagger that referenced this pull request May 3, 2024
* engine: remove ftp_proxy hack

This removes the ftp_proxy hack we've had for a while for passing
uncached metadata to containers (previously a lot, but recently trimmed
down to just ServerID).

Instead, we take advantage of the new custom executor added recently and
use that to set the _DAGGER_SERVER_ID env var in containers. The
plumbing passes the ServerID via job values, which get read by the
solver's ResolveOp callback and used to tell the Executor for ExecOps
which ServerID to set in the container.

There's quite a bit more cleanup coming to everything involved here, but
wanted to get this first step out since that hack, besides being
annoying, was kind of dubious in nature and was a suspect in some
strange behavior reported sometimes.

Signed-off-by: Erik Sipsma <erik@sipsma.dev>

* engine: remove apparently unused opTrackingGateway

I don't see that this is actually used or ends up modifying anything.
Suspect it's vestigial from pre-OTEL days so removing it.

Signed-off-by: Erik Sipsma <erik@sipsma.dev>

* engine: pass worker around instead of custom executor

This fixes the problem in the previous commit where Dockerfiles with a
syntax pragma hit the executor directly before a ServerID could be set.

This wasn't the shortest fix, but it seems to be the best long term one.
The end result is that the Worker provided to LLBSolvers is always
scoped to the server, so we don't even need to be careful about ensuring
the right executor is being used everywhere. It just will be now because
it's the only executor available to buildkit internals we use.

This required a few adjustments from the previous code:
1. The custom executor has been expanded to also be a custom worker.
   Fortunately, it almost entirely reuses functionality from upstream's
   base.Worker, so this doesn't another big chunk of code to maintain.
2. llbSolver is now created in each buildkit.Client and setup with a
   worker.Controller with the Worker scoped to the current ServerID.
3. Rather than passing the ServerID through job keys, we just pass the
   whole Worker and use that to resolve ops.
4. Some more general cleanup of worker initialization made possible by
   these changes.

Signed-off-by: Erik Sipsma <erik@sipsma.dev>

* backfill integ test coverage for dockerfile w/ syntax pragma

Signed-off-by: Erik Sipsma <erik@sipsma.dev>

* add comment pointing to original runc executor implementation

Signed-off-by: Erik Sipsma <erik@sipsma.dev>

---------

Signed-off-by: Erik Sipsma <erik@sipsma.dev>
@jedevc jedevc mentioned this pull request May 17, 2024
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants