Track std.base64 non-ASCII string semantics vs official Jsonnet

## Motivation

During the stdlib audit, `std.base64` and `std.base64Decode` were found to handle non-ASCII strings differently in sjsonnet than in official C++ Jsonnet `v0.22.0`.

We are not changing this in the current stdlib correctness PR because the change is user-visible and jrsonnet currently makes the same UTF-8 choice as sjsonnet. This issue tracks the discrepancy separately.

## Evidence

Official `jsonnet v0.22.0`:

```jsonnet
std.base64("é")                         // "6Q=="
std.base64Decode("w6k=")                 // "Ã©"
std.base64Decode("6Q==")                 // "é"
std.base64DecodeBytes(std.base64("é"))   // [233]
std.base64("Ā")                          // runtime error: Can only base64 encode strings / arrays of single bytes.
```

Current sjsonnet:

```jsonnet
std.base64("é")                         // "w6k="
std.base64Decode("w6k=")                 // "é"
std.base64Decode("6Q==")                 // "�"
std.base64DecodeBytes(std.base64("é"))   // [195, 169]
```

Local jrsonnet check:

```text
jrsonnet 0.5.0-pre98
commit 80cd36abd868507312e2cc2c78cb0f55a684c620
```

jrsonnet uses the same UTF-8 byte semantics as sjsonnet, except that it rejects invalid UTF-8 on decode instead of substituting `"�"`:

```jsonnet
std.base64("é")                         // "w6k="
std.base64Decode("w6k=")                 // "é"
std.base64Decode("6Q==")                 // runtime error: bad utf8
std.base64DecodeBytes(std.base64("é"))   // [195, 169]
```

## Root Cause

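Official C++ Jsonnet's stdlib treats each string character as one byte: `std.base64` maps characters through `std.codepoint`, and `std.base64Decode` maps each decoded byte back through `std.char`: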

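```jsonnet
// Excerpt from official Jsonnet's std.jsonnet (surrounding code elided).

// In base64(input): each string character contributes its codepoint as one byte.
local bytes =
  if std.isString(input) then
    std.map(std.codepoint, input)
  else
    input;

// base64Decode maps each decoded byte to one character via std.char,
// with no UTF-8 decoding.
base64Decode(str)::
  local bytes = std.base64DecodeBytes(str);
  std.join('', std.map(std.char, bytes)),
```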
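Since the official semantics amount to a codepoint-per-byte conversion, a configuration that needs the official results today can do that conversion itself and hand `std.base64` a byte array. A minimal sketch, assuming the byte-array paths of `std.base64` and `std.base64DecodeBytes` already agree across implementations; `latin1Base64` and `latin1Base64Decode` are hypothetical helper names:

```jsonnet
// Hypothetical helpers: reproduce official C++ Jsonnet string semantics by
// converting to and from byte arrays explicitly.
local latin1Base64(s) =
  // One byte per character: the character's codepoint.
  local bytes = std.map(std.codepoint, std.stringChars(s));
  if std.length(std.filter(function(b) b > 255, bytes)) > 0 then
    error 'latin1Base64: codepoints above 255 cannot be encoded as single bytes'
  else
    std.base64(bytes);

// One character per decoded byte, with no UTF-8 decoding.
local latin1Base64Decode(str) =
  std.join('', std.map(std.char, std.base64DecodeBytes(str)));

{
  encoded: latin1Base64('é'),                    // "6Q=="
  decodedUtf8Bytes: latin1Base64Decode('w6k='),  // "Ã©"
  decodedSingleByte: latin1Base64Decode('6Q=='), // "é"
}
```

The explicit range check mirrors the official rejection of codepoints above 255, so `latin1Base64('Ā')` fails instead of producing an ambiguous encoding.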

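sjsonnet currently encodes string input as UTF-8 bytes and decodes the bytes produced by base64 decoding as a UTF-8 string. As a result, `"é"` becomes the two bytes `[195, 169]` rather than `[233]`, and the lone byte `0xE9` from `"6Q=="` (invalid UTF-8 on its own) decodes to the replacement character `"�"`.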
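Configurations that rely on the current sjsonnet/jrsonnet results can make the UTF-8 step explicit, so they keep working whichever string semantics an implementation adopts. A sketch, assuming the implementation provides `std.encodeUTF8` and `std.decodeUTF8` (both part of the official stdlib); `utf8Base64` and `utf8Base64Decode` are hypothetical names:

```jsonnet
// Explicit UTF-8 round trip: the string is turned into bytes before
// std.base64 sees it, so only the byte-array path is exercised.
local utf8Base64(s) = std.base64(std.encodeUTF8(s));
local utf8Base64Decode(str) = std.decodeUTF8(std.base64DecodeBytes(str));

{
  encoded: utf8Base64('é'),           // "w6k="
  decoded: utf8Base64Decode('w6k='),  // "é"
}
```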

## Proposed Direction

If sjsonnet decides to align strictly with official Jsonnet:

- Keep byte-array `std.base64` and `std.base64DecodeBytes` behavior unchanged.
- Change string `std.base64` to encode each character as a single byte, rejecting codepoints above `255`.
- Change `std.base64Decode` to map each decoded byte to a single character with `std.char(byte)` semantics instead of decoding the bytes as UTF-8.
- Add directional tests for `"é"`, `"w6k="`, `"6Q=="`, and a high-codepoint rejection case such as `"Ā"`.
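The value-returning cases above could be written as a plain Jsonnet file that evaluates to an object of `true` fields when the behavior matches. This is a sketch only: the file layout and field names are hypothetical rather than sjsonnet's actual test-suite conventions, and the expected values are the official `v0.22.0` results from the Evidence section.

```jsonnet
// Hypothetical golden test for the proposed semantics. Each std.assertEqual
// returns true on a match and raises an evaluation error otherwise.
{
  encode_e: std.assertEqual(std.base64('é'), '6Q=='),
  decode_utf8_bytes: std.assertEqual(std.base64Decode('w6k='), 'Ã©'),
  decode_single_byte: std.assertEqual(std.base64Decode('6Q=='), 'é'),
  encoded_bytes: std.assertEqual(std.base64DecodeBytes(std.base64('é')), [233]),
  // The rejection case cannot be expressed as a value: std.base64('Ā') must
  // fail with "Can only base64 encode strings / arrays of single bytes.",
  // so it belongs in an expected-error test instead.
}
```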

This would intentionally diverge from jrsonnet's current UTF-8 behavior but match official C++ Jsonnet `v0.22.0`.

Track std.base64 non-ASCII string semantics vs official Jsonnet #793