Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(docker): adding git context support #86

Merged
merged 23 commits into from Dec 7, 2023
Merged

feat(docker): adding git context support #86

merged 23 commits into from Dec 7, 2023

Conversation

miki725
Copy link
Contributor

@miki725 miki725 commented Oct 31, 2023

Issue

Support for git context is required for default docker push action as it uses git context by default

https://github.com/docker/build-push-action

Otherwise chalk github action cant run with default parameters

Description

Adding compatibility with docker git contexts. For example:

docker build -t chalk http://github.com/crashappsec/chalk.git

This is done by:

  1. Doing shallow fetch of git context into a bare git repo.
    git fetch --depth=1 origin <ref>
    
  2. From there we can extract Dockerfile with git show
  3. Adjust Dockerfile as usual and pass it to build via stdin
  4. If stdin is not supported, git context is checked out locally
    to disk and git context is replaced with path on disk.
    NOTE this is only necessary for non-buildx builds
    and it has edge cases as docker binary does more magic to checkout
    git context such as following all sub-modules recursively
    however for now chalk is kept simple by only checking out the
    main git context repo without any recursion.

In addition fixing providing files to docker build as buildx version
check was incorrect as it:

  • never called getBuildXVersion() so version comparison was against 0
  • did not compare versions semantically so 0.8 was always greater than
    0.11 which is incorrect as per semver

To handle semver comparison, instead of using 3rd party lib
such as https://github.com/euantorano/semver.nim
added very simple semver.nim with basic comparison operators.

refactor(semver): removing semver dep in favor of very simple local implementation

Testing

➜ make tests args="test_docker.py::test_git_context --logs -x --pdb"

tests:--slow


TODOs before merge:

  • update git metadata to use git context
  • fix segfault with bad TMPDIR permissions

@miki725 miki725 marked this pull request as ready for review November 2, 2023 13:21
@miki725 miki725 requested a review from viega as a code owner November 2, 2023 13:21
@miki725 miki725 force-pushed the ms/gitcontext branch 6 times, most recently from d426acd to 77cd56c Compare November 2, 2023 15:05
Copy link
Contributor

@viega viega left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should sort out the key situation, but otherwise it's "clean as a whistle." Well done again, sir.

src/util.nim Outdated Show resolved Hide resolved
Copy link
Contributor

@viega viega left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some things to consider; we can discuss whether / if some of my nits should be dealt with now or later.

But despite any comments (which are about both clarity and things where a better approach is possible), it's well considered and pretty clear on the whole, so nice job again.

requires "https://github.com/viega/zippy == 0.10.7"
requires "https://github.com/aruZeta/QRgen == 3.0.0"
requires "https://github.com/crashappsec/con4m#16f9597a4609df60c167085f8d309f69df463de9"
requires "https://github.com/viega/zippy == 0.10.7" # MIT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of annotating this with the license?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

initially was using another dep for the semver parsing/comparing but it was GPL so then checked the licenses of existing deps. no specific reason, just seems useful information if inspecting deps as chalk is a static binary ans so we need to be careful with included deps/libs

@@ -2561,6 +2561,28 @@ Note that, when chalk is invoked with 'docker' as its EXE name, the default IO c
"""
}

field git_exe {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for these it might make more sense to make this a list of places to look, then you can use the findAllExePaths call, which is in nimutils and has the signature:
proc findAllExePaths*(cmdName: string, extraPaths: seq[string] = @[], usePath = true): seq[string]
Basically, you are fine to take the first item in the list... we don't in some contexts because we do extra checking (to make sure we don't run another chalk executable for instance).

The docs for the above:

  ## The priority here is to the passed command name, but if and only
  ## if it is a path; we're assuming that they want to try to run
  ## something in a particular location.  Generally, we're disallowing
  ## this in config files, but it's here just in case.
  ##
  ## Our second priority is to the the extraPaths array, which is
  ## basically a programmer supplied PATH, in case the right place
  ## doesn't get picked up in our environment.
  ##
  ## If all else fails, we search the PATH environment variable.
  ##
  ## Note that we don't check for permissions problems (including
  ## not-executable), and we do not open the file, so there's the
  ## chance of the executable going away before we try to run it.
  ##
  ## The point is, the caller should anticipate failure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed these configs as eventually well want to use libgit anyway. for now it just searches PATH


dockerPathOpt = findExePath("docker", userPath, ignoreChalkExes = true)
dockerExeLocation = dockerPathOpt.get("")
dockerExeLocation = findExePath("docker",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style point, I don't love chaining calls like this when a single function is so large as to span multiple lines. It's much better clarity wise to have individual assignments, with "bite-sized" logic. That's why I often end up with a variable with "Opt" or ? in the name that is just an intermediate. It generates the same code, but the more discrete statements makes it much easier to follow. Here, the get() is so insignificant relative to the rest of the assignment that I missed it on my first read, for instance, and I already know the API...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

# Docker version 1.13.0, build 49bf474
# Docker version 23.0.0, build e92dd87
# Docker version 24.0.6, build ed223bc820
let (output, exitcode) = execCmdEx(dockerExeLocation & " --version")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Nim API for this stuff sucks. If docker changes its semantics about what gets printed on stdin and stdout this can silently break (since execCmdEx merges the two streams, OR refuses to give you stderr if you pass whatever the flag was to say not to combine them).

Additionally, remember that issue I fixed a few months back where a file was too big to fit into the OS buffer when passing over stdin? Yeah, execCmdEx does not deal with this problem and can easily hang if you pass too much into it.

All in all, the Nim team doesn't do a good job w/ posix subprocess stuff and I don't trust anything in their API at all. You really should use runDockerGetEverything() below this (which manages getting the exe location anyway) or go straight to the underlying runCommand() in nimutils if you need more customization of behavior.

I know you're just changing some code that is old and hasn't been touched for a long time, but if we're touching it, I think we should do it right :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

switched all usages I could find to use runCmd* which normalizes to runCommand from nimutils

once:
trace("Docker injection method: --build-context")

var chmodstr = ""

if chmod:
if chmod or hasUser:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This hasUser flag doesn't seem right. The original code only passes the chmod flag when we're copying in an executable that needs to be chalked, and just always makes it executable.

If we're copying non-executable bits in yet need to set permissions (and I'm not sure where we'd need to do that), then I would instead make the parameter take a mode, not a boolean, and get the mode right. I don't see why, when you might want to chmod, the user is even relevant... if chmod might be warranted, just do it, esp since we don't have full visibility into all versions of docker and associated semantics.

But I think that's probably overkill.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if user is set in the dockerfile, /chalk.json would not be accessible as it would be set with root permissions hence the need to change its permissions but Ill change to pass the permission mode

error(strip(result.stdOut & result.stdErr))
raise newException(ValueError, "Git failed")

proc getDefaultBranch(git: DockerGitContext): string =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I'm not really satisfied with requiring the presence of git. I guess it's fine for a first pass, but I'm thinking we should be wrapping and shipping libgit (and maybe libssh if we have to??)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, if we do go out with this, then we basically have to wait a long time to deprecate the config vars that ideally won't be needed for very long. Why not just say, if it's not in PATH, make sure it's in your path?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removing config option and only checking PATH now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also just fyi docker internally calls git command so requiring git be on PATH wont be a new requirement for people to add chalk to existing docker flows

src/semver.nim Outdated
# it only handles basic dot-separated version format

type Version* = ref object
parts: seq[int]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why isn't parts a three-tuple that has default values if you want to allow short version strings? It would make the comparison more efficient and more clear too (Many people aren't going to get the 'zip' idiom anyway).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to tuple

src/semver.nim Outdated
proc parseVersion*(version: string): Version =
new result
result.parts = @[]
for i in version.strip(chars={'v', ',', '.'}).split('.'):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also relies on parseInt to throw an error for malformed stuff, which makes this highly error prone as a parser. If people accidentally pass in the wrong thing, they could get spurious versions, etc.

OR it could blow up on things it probably should accept, like if people do semver but w/ release candidates... we've all seen v2.1.0-rc1 for instance. Personally I'd be much more precise and explicit when it comes to what is or is not allowed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for now this will support versions consisting of 1-3 version parts so only:

1
1.2
1.2.3

no other things will be allowed like rc1 or whatnot but eventually if we need that we can add support for that

src/semver.nim Outdated
return not (self == other)

proc `>`*(self: Version, other: Version): bool =
for (a, b) in zipLongest(self.parts, other.parts, 0):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, if you use a tuple, you can just rely on the default tuple comparison, which will be far faster and far more clear.

For instance, something like:

type Version = tuple[major: uint, minor: uint, patch: uint]

And then > and '!=' both work right without you even having to overload something. And it allows people to clearly ask for somevar.major.

Or if you wanted to track any dash and tag name after, you could do it in an object, then just compare the tuple fields.

(and, more specifically, declare Version

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed comparison to use tuple however as in some places it prints original version back keeping it as object so that it can keep the original version name so things like 1.0 dont print as 1.0.0 if the tuple always has 3 versions

src/util.nim Outdated Show resolved Hide resolved
@miki725 miki725 marked this pull request as draft November 27, 2023 17:10
@miki725 miki725 force-pushed the ms/gitcontext branch 3 times, most recently from 492c3d8 to c00b750 Compare November 29, 2023 18:16
@miki725 miki725 marked this pull request as ready for review November 29, 2023 18:48
@miki725 miki725 requested a review from viega December 5, 2023 01:44
Adding compatibility with docker git contexts. For example:

```
docker build -t chalk http://github.com/crashappsec/chalk.git
```

This is done by:

1. Doing shallow fetch of git context into a bare git repo.
    ```
    git fetch --depth=1 origin <ref>
    ```
2. From there we can extract `Dockerfile` with `git show`
3. Adjust Dockerfile as usual and pass it to build via `stdin`
4. If `stdin` is not supported, git context is checked out locally
   to disk and git context is replaced with path on disk.
   NOTE this is only necessary for non-buildx builds
   and it has edge cases as docker binary does more magic to checkout
   git context such as following all sub-modules recursively
   however for now chalk is kept simple by only checking out the
   main git context repo without any recursion.

---

In addition fixing providing files to docker build as buildx version
check was incorrect as it:

* never called getBuildXVersion() so version comparison was against 0
* did not compare versions semantically so 0.8 was always greater than
  0.11 which is incorrect as per semver

To handle semver comparison, instead of using 3rd party lib
such as https://github.com/euantorano/semver.nim
added very simple `semver.nim` with basic comparison operators.

refactor(semver): removing semver dep in favor of very simple local implementation
when using ssh scheme in git context, SSH needs to be customized
to parse known hosts correctly
…rmissions

looks like docker when installed via snap cant access tmp files directly inside /tmp
but it can access them when they are nested inside another folder in /tmp

this is important as chalk passes chalkmark and Dockerfile in some cases via tmp files
and so it needs to write it to a place which will be accessible by docker
moving exception handling to docker command logic vs cmd parsing
which allows to correctly handle exceptions without segfaulting
in order to facilitate that, git fetch logic was reworked
to correctly fetch the git ref including related tags/branches/etc
otherwise git honors local git repo configs (if any) and therefore
might be executing with duplicate/conflicting configs

unfortunately there does not seem to be a way to ignore all local
configs and only honor global git configs
for now it will look for a binary on PATH, and if not found
blows up in some cases. eventually we can switch to using libgit/etc
also

* renamed runProc* to runCmd* so prefix is common across commands
* refactored codecDocker to ensure
  * file streams are closed during extraction
  * reusing common logic to extract chalk.json from layers
this allows to non-obtrusively show chalk reports in github actions
@viega viega merged commit e038ca4 into main Dec 7, 2023
2 checks passed
@viega viega deleted the ms/gitcontext branch December 7, 2023 21:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants