Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: follow graphQL spec when encoding UTF-8 unicodes (Go SDK) #5169

Merged
merged 1 commit into from May 31, 2023

Conversation

grouville
Copy link
Member

@grouville grouville commented May 22, 2023

Context

In order to properly escape strings, especially binary data on Go SDK, we now follow graphQL's spec around strings escape.

Limitations

This fix only fixes the escape of UTF-8 unicodes, as the JSON Unmarshall of Go correctly escapes these unicodes, when unmarshalling as a String.

For the mount of binary files, we will need to follow an alternative. Discussion happening here: #5069

Thus, we expect to have, as inputs of type string, UTF-8 unicodes. Follow-ups will be implemented for other SDKs, if necessary.

Repro

This PR enables the mount / copy of exact bytes sequences inside container, for UTF-8 unicodes, for Go SDK

main.go

package main

import (
	"context"
	"fmt"
	"io/ioutil"
	"os"
	"path/filepath"
	"time"

	"dagger.io/dagger"
)

func main() {
	// initialize Dagger client
	ctx := context.Background()
	client, err := dagger.Connect(ctx, dagger.WithLogOutput(os.Stderr))
	if err != nil {
		panic(err)
	}
	defer client.Close()

	// GPG files needed for signing
	gpgFiles := []string{
		"./foo",
	}

	// Container setup
	container := client.Container().From("alpine:3.17")

	// Mount GPG files
	for _, filePath := range gpgFiles {
		content, err := ioutil.ReadFile(filePath)
		if err != nil {
			panic(err)
		}
		secret := client.SetSecret(filepath.Base(filePath), string(content))
		container = container.WithMountedSecret("/root/.gnupg/"+filepath.Base(filePath), secret)
	}

	currentTime := time.Now()
	timeString := currentTime.Format("15:04:05")

	// Sign the binary
	// binaryName := "ls"
	out, err := container.
		WithEnvVariable("toto", timeString).
		WithExec([]string{"apk", "add", "--no-cache", "coreutils"}).
		WithWorkdir("/root/.gnupg/").
		WithExec([]string{"md5sum", "foo"}).
		Stdout(ctx)

	if err != nil {
		panic(err)
	}

	fmt.Println(out)
}

foo
Please create a file named foo, with the content of the unicode_test added in this PR. Cannot properly copy-paste it here and on gist.github.com

Proving that foo is UTF-8

script.py

def is_file_utf8(filename):
    try:
        with open(filename, 'r', encoding='utf-8') as file:
            contents = file.read()
    except UnicodeDecodeError:
        return False
    return True

# Example usage
filename = 'foo'  # Replace with the actual filename
is_utf8 = is_file_utf8(filename)
print(f"The file '{filename}' is UTF-8 encoded: {is_utf8}")

Output:

 python3 script.py
The file 'foo' is UTF-8 encoded: True

Current issue

Given this UTF-8 only string of unicodes, our current Go SDK codegen breaks:

go run ./main.go
Connected to engine 1f480355bbc8
panic: input:1: Syntax Error GraphQL request (1:49) Invalid character escape sequence: \\a.

1: query{setSecret(name:"tata", plaintext:"∆?–∂∂√˛\av\x1ciÙ˜Ÿ¿\x11GÆÓ∂Ó˘◊ñ\n"){id}}
                                                   ^


Please visit https://dagger.io/help#go for troubleshooting guidance.

goroutine 1 [running]:
main.main()
        /private/tmp/test_encod/main.go:60 +0x6fc
exit status 2

With this PR, we properly escape it:

go run ./main.go
Connected to engine 1f480355bbc8
#1 resolve image config for docker.io/library/alpine:3.17
#1 DONE 1.5s

#2 from alpine:3.17
#2 resolve docker.io/library/alpine:3.17@sha256:124c7d2707904eea7431fffe91522a01e5a861a624ee31d03372cc1d138a3126 done
#2 CACHED

#3 
#3 0.150 fetch https://dl-cdn.alpinelinux.org/alpine/v3.17/main/aarch64/APKINDEX.tar.gz
#3 0.393 fetch https://dl-cdn.alpinelinux.org/alpine/v3.17/community/aarch64/APKINDEX.tar.gz
#3 0.539 (1/5) Installing libacl (2.3.1-r1)
#3 0.553 (2/5) Installing libattr (2.5.1-r2)
#3 0.567 (3/5) Installing skalibs (2.12.0.1-r0)
#3 0.584 (4/5) Installing utmps-libs (0.1.2.0-r1)
#3 0.597 (5/5) Installing coreutils (9.1-r0)
#3 0.626 Executing busybox-1.35.0-r29.trigger
#3 0.631 OK: 9 MiB in 20 packages
#3 0.143 e654fb6f415b2a954787d1695863b4c3  tata
#3 DONE 1.0s
e654fb6f415b2a954787d1695863b4c3  tata

➜  test_encod md5 tata
MD5 (tata) = e654fb6f415b2a954787d1695863b4c3

As you can see, the data is preserved: the md5 hash is the same in the container and locally

Warning
This introduces an issue with non UTF-8 unicodes (for Go SDK), that are not properly unmarshalled by json.Unmarshall.

It used to fail, now it might silently fail, as the encoding will properly escape all unicodes, but some data will be lost at unmarshall. Context here: #5069 (comment)

However, all SDKs now behave the same on this very particular case that will be solved in part 2) of #5069

Only Go SDK is concerned

@helderco, Node and Python SDK are properly managing UTF-8 unicodes. So no need to apply the changes to these SDKs:

Repro

The code is not perfect (not the aim), but it works:

Node SDK
import Client, { connect } from "@dagger.io/dagger"
import * as fs from "fs";
import * as path from "path";

type Container = ReturnType<Client["container"]>;

connect(async (client: Client) => {
  const gpgFiles = [
    "./foo",
  ];

  let container = client.container().from("alpine:3.17");

  // Mount GPG files
  for (const filePath of gpgFiles) {
    const content = await fs.promises.readFile(filePath, "utf-8");
    const secret = client.setSecret(filePath.split("/").pop()!, content);

    container = container.withMountedSecret(`/root/.gnupg/${filePath.split("/").pop()}`, secret);
  }

  const currentTime = new Date();
  const timeString = currentTime.toTimeString().split(' ')[0];

  // Sign the binary
  container = container
    .withEnvVariable("toto", timeString)
    .withExec(["apk", "add", "--no-cache", "coreutils"])
    .withWorkdir("/root/.gnupg/")
    .withExec(["md5sum", "foo"]);

  console.log(await container.stdout());
});

Output on console:

node --loader ts-node/esm ./main.mts
(node:22111) ExperimentalWarning: Custom ESM Loaders is an experimental feature. This feature could change at any time
(Use `node --trace-warnings ...` to show where the warning was created)
36f52da69485700d63e3b13d34a015bb  foo

➜  test_dag md5 foo
MD5 (foo) = 36f52da69485700d63e3b13d34a015bb
Python SDK
import sys
import anyio
import dagger
from datetime import datetime

async def main():
    async with dagger.Connection(dagger.Config(log_output=sys.stderr)) as client:
        # read file
        config = await anyio.Path("./foo").read_text()

        # set secret to file contents
        secret = client.set_secret("foo", config)

        # mount secret as file in container
        out = await (
            client.container()
            .from_("alpine:3.17")
            .with_env_variable("toto", datetime.now().strftime('%H:%M:%S'))
            .with_exec(["apk", "add", "--no-cache", "coreutils"])
            .with_mounted_secret("/root/.gnupg/foo", secret)
            .with_workdir("/root/.gnupg/")
            .with_exec(["md5sum", "foo"])
            .stdout()
        )

    # print result
    print(out)

anyio.run(main)

output:

python3 main.py
Connected to engine e777eb94d041
#1 resolve image config for docker.io/library/alpine:3.17
#1 DONE 0.4s

#2 from alpine:3.17
#2 resolve docker.io/library/alpine:3.17@sha256:124c7d2707904eea7431fffe91522a01e5a861a624ee31d03372cc1d138a3126 done
#2 CACHED

#3
#3 0.127 fetch https://dl-cdn.alpinelinux.org/alpine/v3.17/main/aarch64/APKINDEX.tar.gz
#3 0.303 fetch https://dl-cdn.alpinelinux.org/alpine/v3.17/community/aarch64/APKINDEX.tar.gz
#3 0.453 (1/5) Installing libacl (2.3.1-r1)
#3 0.467 (2/5) Installing libattr (2.5.1-r2)
#3 0.479 (3/5) Installing skalibs (2.12.0.1-r0)
#3 0.496 (4/5) Installing utmps-libs (0.1.2.0-r1)
#3 0.510 (5/5) Installing coreutils (9.1-r0)
#3 0.547 Executing busybox-1.35.0-r29.trigger
#3 0.551 OK: 9 MiB in 20 packages
#3 DONE 0.7s

#3
#3 0.133 9bdecf058c699f6d4a0f9fbb5122729c  foo
#3 DONE 0.9s
9bdecf058c699f6d4a0f9fbb5122729c  foo

(python_test_dag) ➜  python_test_dag md5 foo
MD5 (foo) = 9bdecf058c699f6d4a0f9fbb5122729c

@grouville grouville changed the title feat: follow graphQL spec in strings encoding feat: follow graphQL spec in strings encoding (Go SDK) May 22, 2023
@grouville grouville changed the title feat: follow graphQL spec in strings encoding (Go SDK) feat: follow graphQL spec in encoding of UTF-8 unicode(Go SDK) May 26, 2023
@grouville grouville changed the title feat: follow graphQL spec in encoding of UTF-8 unicode(Go SDK) feat: follow graphQL spec when encoding UTF-8 unicodes (Go SDK) May 26, 2023
@grouville grouville force-pushed the unicode_go_sdk branch 2 times, most recently from 2577fda to c2a8a75 Compare May 26, 2023 16:33
@grouville grouville marked this pull request as ready for review May 26, 2023 16:40
@grouville grouville mentioned this pull request May 30, 2023
Copy link
Contributor

@helderco helderco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Contributor

@vito vito left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for being so methodical and clear with all this, it's a tricky intersection of concerns!

I was a little hesitant to add a dependency on gqlgen, but it turns out we already depend on it through Khan/genqlient so that's fine:

❯ go mod graph | grep 99des
github.com/Khan/genqlient@v0.5.0 github.com/99designs/gqlgen@v0.17.2

In order to properly escape strings, especially binary data on Go SDK, we now marshal properly the strings. Follow-ups will be implemented for other SDKs, if necessary

Signed-off-by: grouville <guillaume@dagger.io>
@grouville grouville merged commit 4b38f73 into dagger:main May 31, 2023
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants