Skip to content

Replace YAJL with c-json for JSON parsing and generation#171

Open
giuseppe wants to merge 7 commits intocontainers:mainfrom
giuseppe:use-c-json
Open

Replace YAJL with c-json for JSON parsing and generation#171
giuseppe wants to merge 7 commits intocontainers:mainfrom
giuseppe:use-c-json

Conversation

@giuseppe
Copy link
Copy Markdown
Member

@giuseppe giuseppe commented May 7, 2026

Alternative to #170

Copy link
Copy Markdown

@jnovy jnovy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concern 1: json_object_new_string_len truncation:

In json_gen_string(), len is size_t but cast to (int) for json_object_new_string_len. For strings > 2GB this would silently truncate. Practically impossible in OCI spec contexts, but a SIZE_MAX > INT_MAX guard or comment would be defensive.

Concern 2: Residual handling semantic change

The old code stored residuals as a yajl_val tree (preserving exact in-memory structure). The new code serializes residuals to a JSON string (char *) and re-parses when generating output. This adds a serialize/parse round-trip. In practice json-c handles this well, but it's worth noting as a behavioral change - e.g., key ordering within residuals may change.

Concern 3: clang-format auto-formatting may cause inconsistency

generate.py now runs clang-format -i on generated files if clang-format is available. This means the generated output differs depending on whether clang-format is installed, which could cause confusion in CI or when comparing outputs across environments. Consider making this opt-in via a flag.

I'm happy we lean towards json-c here!

Comment thread src/ocispec/json_common.c
json_object_put (g->root);
for (i = 0; i <= g->depth; i++)
{
free (g->pending_key[i]);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Memory leak:

// Current code:
for (i = 0; i <= g->depth; i++)
  {
    free (g->pending_key[i]);
  }

// Should be:
for (i = 0; i <= g->depth; i++)
  {
    free (g->pending_key[i]);
    json_object_put (g->stack[i]);
  }

@giuseppe
Copy link
Copy Markdown
Member Author

giuseppe commented May 8, 2026

Concern 1: json_object_new_string_len truncation:

In json_gen_string(), len is size_t but cast to (int) for json_object_new_string_len. For strings > 2GB this would silently truncate. Practically impossible in OCI spec contexts, but a SIZE_MAX > INT_MAX guard or comment would be defensive.

Concern 2: Residual handling semantic change

The old code stored residuals as a yajl_val tree (preserving exact in-memory structure). The new code serializes residuals to a JSON string (char *) and re-parses when generating output. This adds a serialize/parse round-trip. In practice json-c handles this well, but it's worth noting as a behavioral change - e.g., key ordering within residuals may change.

Concern 3: clang-format auto-formatting may cause inconsistency

generate.py now runs clang-format -i on generated files if clang-format is available. This means the generated output differs depending on whether clang-format is installed, which could cause confusion in CI or when comparing outputs across environments. Consider making this opt-in via a flag.

I'm happy we lean towards json-c here!

thanks!

Addressed now

@jnovy
Copy link
Copy Markdown

jnovy commented May 8, 2026

LGTM

@giuseppe giuseppe force-pushed the use-c-json branch 2 times, most recently from 4fa111d to 989feb7 Compare May 8, 2026 08:57
giuseppe and others added 7 commits May 8, 2026 13:06
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
- actions/checkout v3.0.2/v2 -> v6
- uraimo/run-on-arch-action v3.0.1 -> v3.1.0
- Replace deprecated actions-rs/* with dtolnay/rust-toolchain and
  direct cargo commands

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Remove the honggfuzz HF_ITER code path that caused undefined reference
errors when building with -DFUZZER. Wrap main() in #ifndef FUZZER so
LibFuzzer can provide its own entry point.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Test parsing and round-trip generation of int64 and uint64 fields
with boundary values: INT64_MAX, INT64_MIN, UINT64_MAX, values
exceeding 32-bit range, negative values, and zero.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Expand the fuzzer from a single-mode runtime config parser to 7 modes
covering all OCI spec parsers with round-trip testing (parse, generate,
re-parse). Add honggfuzz-based CI fuzzing with Docker, matching the
pattern used by crun.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Switch the JSON backend from the bundled yajl/yyjson to json-c (>= 0.14),
which is widely available as a system package.

- Replace all yajl/yyjson API calls with json-c equivalents
- Implement a thin json_gen_ctx layer that builds json_object trees,
  matching the old yajl_gen streaming interface
- Store residuals as json_object* directly instead of serializing to
  string, avoiding a round-trip and preserving key ordering
- Add INT_MAX guard in json_gen_string() to prevent silent truncation
- Fix memory leak in json_gen_free() for unclosed maps/arrays
- Remove the embedded yajl/yyjson submodule and build options
- Add json-c-devel to fuzzing Dockerfile and remove --enable-embedded-yyjson

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants