
Add JSON support with compile-time type inference using Java JSON-P streaming API (JVM-only)#119

Merged
Quafadas merged 16 commits into main from
copilot/add-json-to-csv-iterator
Jan 8, 2026
Conversation

Contributor

Copilot AI commented Jan 5, 2026

Implements compile-time JSON parsing for flat arrays of objects, mirroring the existing CSV functionality. JSON structure and types are inferred at compile time, producing type-safe Iterator[NamedTuple[K, V]] with zero runtime overhead.

Implementation

  • New module scautable-json: JVM-only module using Java's native JSON-P (JSR 374) streaming API
  • JSON object: Provides fromString, resource, absolutePath, pwd, url methods matching CSV API
  • JsonIterator: Returns typed named tuples with automatic field access
  • JsonInferrerOps: Infers Int/Long/Double/Boolean/String types from JSON values at compile time
  • Type inference strategies: FromAllRows (default), FirstRow, FirstN(n), StringType, FromTuple[T]
  • Null/missing field handling: Automatic Option wrapping when null or absent
  • StreamingJsonParser: Uses jakarta.json.stream.JsonParser for true event-based streaming that reads JSON objects incrementally without loading entire arrays into memory
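
The widening rules behind JsonInferrerOps can be sketched at value level. The sketch below is illustrative only; `InferSketch`, `classify`, and `widen` are hypothetical names, not the module's actual API. Each scalar is classified, and classifications are merged across rows into the narrowest type that fits all of them.

```scala
// Illustrative sketch of the inference rules, not the real JsonInferrerOps.
object InferSketch:
  enum JsonType:
    case IntT, LongT, DoubleT, BooleanT, StringT
  import JsonType.*

  // Classify one raw scalar token.
  def classify(raw: String): JsonType =
    if raw == "true" || raw == "false" then BooleanT
    else if raw.toIntOption.isDefined then IntT
    else if raw.toLongOption.isDefined then LongT
    else if raw.toDoubleOption.isDefined then DoubleT
    else StringT

  // Narrowest type that can represent both observations.
  def widen(a: JsonType, b: JsonType): JsonType = (a, b) match
    case (x, y) if x == y                    => x
    case (IntT, LongT) | (LongT, IntT)       => LongT
    case (IntT, DoubleT) | (DoubleT, IntT)   => DoubleT
    case (LongT, DoubleT) | (DoubleT, LongT) => DoubleT
    case _                                   => StringT

  // FromAllRows folds over every value; FirstN(n) would take(n) first.
  def inferColumn(values: Seq[String]): JsonType =
    values.map(classify).reduce(widen)
```

The same fold underlies all of the row-sampling strategies; they differ only in how many rows feed into it.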

Usage

import io.github.quafadas.scautable.json.*

// Inline JSON with compile-time type inference
inline val json = """[
  {"name": "Alice", "age": 30, "active": true},
  {"name": "Bob", "age": 25}
]"""

val data = JSON.fromString(json)
data.foreach { row =>
  println(s"${row.name} is ${row.age}")        // String, Int inferred
  println(s"Active: ${row.active.getOrElse(false)}")  // Option[Boolean] for missing field
}

// From resources with different inference strategy
val data2 = JSON.resource("data.json", TypeInferrer.FirstRow)

Design notes

  • Headers extracted by merging keys from all objects: arr.flatMap(_.obj.keys).toSet
  • Missing fields across objects result in Option[T] types
  • Number formatting avoids scientific notation for large longs
  • Follows existing CSV patterns for API consistency
  • True streaming: Uses Java's jakarta.json.stream.JsonParser for event-driven parsing (START_OBJECT, KEY_NAME, VALUE_*, END_OBJECT events)
  • JVM-only: Leverages Java's mature, battle-tested JSON streaming API instead of cross-platform solutions
  • Memory efficient: At compile time, only reads as many objects as needed for type inference; at runtime, streams through files using InputStream without loading entirely into memory
  • Large file support: Successfully handles files with thousands of rows (tested with 1214-row dataset)
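
The header-merge and Option-wrapping notes above can be made concrete with plain Maps standing in for parsed JSON objects. `HeaderSketch` is a hypothetical name for illustration; in the library this happens inside the macro.

```scala
// Sketch only: immutable Maps stand in for parsed JSON objects.
object HeaderSketch:
  type JsonObj = Map[String, Any]

  // Union of keys across all objects, as in arr.flatMap(_.obj.keys).toSet.
  def headers(arr: Seq[JsonObj]): Set[String] =
    arr.flatMap(_.keys).toSet

  // A field that is null or absent in any row is surfaced as Option[T].
  def column(arr: Seq[JsonObj], key: String): Seq[Option[Any]] =
    arr.map(obj => obj.get(key).filter(_ != null))

val rows: Seq[HeaderSketch.JsonObj] = Seq(
  Map("name" -> "Alice", "age" -> 30, "active" -> true),
  Map("name" -> "Bob", "age" -> 25)
)
// headers(rows) yields Set("name", "age", "active");
// column(rows, "active") yields Seq(Some(true), None)
```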

Tests

13 test cases covering type inference, null handling, missing fields, all data source methods, and large file handling.

Original prompt

This section details the original issue you should resolve

<issue_title>From Json</issue_title>
<issue_description>It should be possible to construct a CSV iterator from JSON using similar techniques to those we used for CSV. We receive some untyped JSON that is a flat array of objects, with no nesting.

Something like

[
  {"a": 1, "b": 2},
  {"a": 5, "b": 3}
]

It is assumed that this information is available at compile time, in the same way that our CSV file is assumed to be available to the compiler. It should be possible to replicate the design of CSV.fromString etc.

Our goal is to pull the same trick but for JSON.

  1. Parse the untyped JSON.
  2. figure out the headers (parsed.arr.map(_.obj.keys.toSet).reduce(_ ++ _))
  3. Using the headers, figure out the inferred types.
    a. check parsing in the same way we inferred types for CSV files using parsed.arr.map(headers(0))
  4. At this point, we have the list of headers and the list of types, and we should be able to decode to a NamedTuple

You'll need to introduce a new module, scautable-json, with a dependency on ujson.

</issue_description>
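
Step 2 of the issue can be exercised against a minimal stand-in AST. The `Js` types below are hypothetical, mimicking ujson.Value's `.arr`/`.obj` accessors; as the thread below shows, ujson itself was eventually replaced by JSON-P.

```scala
// Minimal stand-in AST for a flat array of objects; sketch only.
sealed trait Js
final case class JsArr(arr: Vector[Js]) extends Js
final case class JsObj(obj: Map[String, Js]) extends Js
final case class JsNum(n: Double) extends Js

// Step 2 of the issue: parsed.arr.map(_.obj.keys.toSet).reduce(_ ++ _)
// (assumes a non-empty array; reduce throws on Vector.empty).
def headersOf(parsed: JsArr): Set[String] =
  parsed.arr.map {
    case JsObj(o) => o.keys.toSet
    case _        => Set.empty[String]
  }.reduce(_ ++ _)
```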

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 3 commits January 5, 2026 17:49
- Add scautable-json module with ujson dependency
- Implement JSON parsing for flat arrays of objects
- Support compile-time type inference (FromAllRows, FirstRow, FirstN, StringType, FromTuple)
- Handle null values as Option types
- Extract headers from all objects in array
- Add comprehensive tests for JSON functionality

Co-authored-by: Quafadas <24899792+Quafadas@users.noreply.github.com>
- Add README.md for scautable-json module
- Create JsonExample.scala with usage examples
- Document type inference strategies
- Show examples of handling missing fields and null values

Co-authored-by: Quafadas <24899792+Quafadas@users.noreply.github.com>
- Fix JsonExample to handle non-optional id field correctly
- Add scautable-json module dependency to examples
- Verify example compiles and runs successfully

Co-authored-by: Quafadas <24899792+Quafadas@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add functionality to construct CSV iterator from JSON" to "Add JSON support with compile-time type inference" on Jan 5, 2026
Copilot AI requested a review from Quafadas January 5, 2026 17:55
@Quafadas
Owner

Quafadas commented Jan 7, 2026

@copilot I added a test with a larger JSON file. Here, the file-based strategies don't work, I believe because they attempt to read the entire file as a string.

Could you re-work those strategies to work incrementally? In the first instance, see if this is possible with ujson.

If it is not, then it may be worth changing the JSON library to jsoniter, which I believe supports this kind of incremental reading of JSON.

@Quafadas
Owner

Quafadas commented Jan 7, 2026

@copilot truncating the number of rows is not an acceptable solution. The user can control this if they want via the typeInferrer

This macro will need to find a way to "stream" the JSON in a manner spiritually similar to the way the CSV works, line by line. Continue...

@Quafadas
Owner

Quafadas commented Jan 7, 2026

@copilot switch the JSON library to jsoniter, I believe that does offer streaming solutions.

- Replace ujson dependency with jsoniter-scala for better streaming support
- Implement custom streaming JSON parser that reads objects incrementally
- Parser manually identifies object boundaries and parses one at a time
- At compile time, only reads as many objects as needed for type inference
- At runtime, generates code that streams through the file without loading all into memory
- All tests pass including the larger 1214-row mini-movies.json file
- Maintains memory efficiency for large JSON files

Co-authored-by: Quafadas <24899792+Quafadas@users.noreply.github.com>
Contributor Author

Copilot AI commented Jan 7, 2026

@copilot switch the JSON library to jsoniter, I believe that does offer streaming solutions.

Switched to jsoniter-scala and implemented a custom streaming JSON parser. The parser now:

  • Reads JSON objects incrementally by manually identifying object boundaries
  • At compile time, only reads as many rows as needed for type inference
  • At runtime, streams through the file without loading it entirely into memory
  • Handles large files efficiently (tested with 1214-row mini-movies.json)

The approach is similar to CSV's line-by-line reading. Commit: 769b679
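
The "manually identifying object boundaries" step can be sketched as a brace-depth scanner. This is illustrative only (the PR's real implementation is StreamingJsonParser); it assumes the input is a flat JSON array of objects.

```scala
// Sketch: yield each top-level {...} of a JSON array as a raw substring by
// tracking brace depth and string/escape state. Not the PR's actual parser.
def objectSlices(json: String): Iterator[String] = new Iterator[String] {
  private var i = 0
  private def seekBrace(): Unit =
    while i < json.length && json(i) != '{' do i += 1
  def hasNext: Boolean = { seekBrace(); i < json.length }
  def next(): String = {
    seekBrace()
    val start = i
    var depth = 0
    var inString = false
    var escaped = false
    var done = false
    while !done do {
      val c = json(i)
      if inString then
        // Braces inside string literals must not affect the depth count.
        if escaped then escaped = false
        else if c == '\\' then escaped = true
        else if c == '"' then inString = false
      else
        c match {
          case '"' => inString = true
          case '{' => depth += 1
          case '}' =>
            depth -= 1
            if depth == 0 then done = true
          case _ => ()
        }
      i += 1
    }
    json.substring(start, i)
  }
}
```

Each slice can then be handed to an ordinary JSON parser one object at a time, which is what keeps memory use flat for large arrays.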

Copilot AI changed the title from "Add JSON support with compile-time type inference" to "Add JSON support with compile-time type inference and streaming parser" on Jan 7, 2026
@Quafadas
Owner

Quafadas commented Jan 7, 2026

@copilot unfortunately I still found this unsatisfactory. Could we have another look, but this time we'll go JVM-only, i.e. drop the JS cross-compilation, and use the Java JSON streaming API directly.

https://www.oracle.com/technical-resources/articles/java/json.html

It should pass the existing test suite.

- Remove JS cross-compilation support
- Replace jsoniter-scala with Java's native JSON-P (JSR 374) streaming API
- Use jakarta.json (Glassfish implementation) for true streaming parsing
- Implement proper streaming that reads JSON objects one at a time using JsonParser events
- At compile time: Only reads as many objects as needed for type inference
- At runtime: Streams through files using InputStream without loading into memory
- All existing tests pass with the new implementation

Co-authored-by: Quafadas <24899792+Quafadas@users.noreply.github.com>
Contributor Author

Copilot AI commented Jan 7, 2026

@copilot unfortunately I still found this unsatisfactory. Could we have another look, but this time we'll go JVM only, i.e. drop the JS cross compilation, but use the java JSON streaming API directly....

Switched to Java's native JSON-P (JSR 374) streaming API and made the module JVM-only. The implementation now:

  • Uses jakarta.json.stream.JsonParser for true event-based streaming
  • Reads JSON objects one at a time using parser events (START_OBJECT, KEY_NAME, VALUE_*, etc.)
  • At compile time: Only reads objects needed for type inference
  • At runtime: Streams directly from InputStream without loading file into memory
  • Uses battle-tested Jakarta JSON (Glassfish) implementation

All existing tests pass. Commit: 4b17bc1
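
The event-driven consumption loop can be modelled with a small stand-in event ADT. `Ev` and `readObject` below are hypothetical; the real code consumes `jakarta.json.stream.JsonParser.Event` values, which this sketch only mimics.

```scala
// Stand-in for jakarta.json.stream.JsonParser.Event; sketch only.
enum Ev:
  case StartObject, EndObject
  case KeyName(k: String)
  case ValueString(s: String)
  case ValueNumber(n: Double)
  case ValueTrue, ValueFalse, ValueNull

// Assemble one row from a START_OBJECT .. END_OBJECT window of events.
def readObject(events: Iterator[Ev]): Map[String, Any] =
  import Ev.*
  val row = scala.collection.mutable.LinkedHashMap.empty[String, Any]
  var key = ""
  var open = true
  while open && events.hasNext do
    events.next() match
      case StartObject    => ()               // entering the object
      case KeyName(k)     => key = k          // remember the pending key
      case ValueString(s) => row(key) = s
      case ValueNumber(n) => row(key) = n
      case ValueTrue      => row(key) = true
      case ValueFalse     => row(key) = false
      case ValueNull      => row(key) = null  // later surfaced as None
      case EndObject      => open = false     // row complete
  row.toMap
```

Because only one object's worth of events is held at a time, the iterator never needs the whole array in memory.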

Copilot AI changed the title from "Add JSON support with compile-time type inference and streaming parser" to "Add JSON support with compile-time type inference using Java JSON-P streaming API (JVM-only)" on Jan 7, 2026
@Quafadas Quafadas marked this pull request as ready for review January 8, 2026 13:15
Copilot AI review requested due to automatic review settings January 8, 2026 13:15
Contributor

Copilot AI left a comment


Pull request overview

This PR implements compile-time JSON parsing for flat arrays of objects, mirroring the existing CSV functionality. JSON structure and types are inferred at compile time, producing type-safe Iterator[NamedTuple[K, V]] using Java's native JSON-P streaming API.

Key Changes:

  • Adds JVM-only JSON support using Jakarta JSON-P (JSR 374) streaming API for memory-efficient parsing
  • Implements compile-time type inference with strategies matching CSV (FromAllRows, FirstRow, FirstN, StringType, FromTuple)
  • Provides API methods matching CSV pattern: fromString, resource, absolutePath, pwd, url, and fromTyped

Reviewed changes

Copilot reviewed 18 out of 22 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
scautable/src-jvm/StreamingJsonParser.scala Core streaming parser using Java JSON-P for event-driven parsing without loading entire arrays into memory
scautable/src-jvm/JsonTable.scala Main API entry point providing compile-time JSON parsing methods
scautable/src-jvm/JsonIterator.scala Iterator implementation converting JSON objects to typed NamedTuples
scautable/src-jvm/JsonInferrerOps.scala Type inference logic determining Int/Long/Double/Boolean/String types from JSON values
scautable/src-js/JsonStub.scala Stub for JS platform (JSON support is JVM-only)
scautable/src/package.scala Exports JsonTable object and removes CSV.* wildcard export
scautable/package.mill Adds jakarta.json dependency and BSP configuration
scautable/test/src-jvm/JsonSuite.scala Comprehensive test suite with 13 test cases covering type inference, null handling, missing fields, and nested structure detection
scautable/test/resources/*.json Test fixtures including simple cases and large file (1214 rows)
examples/src/JsonExample.scala Example demonstrating JSON parsing with Vega plotting integration
examples/resources/scatter.vg.json Modified Vega-Lite specification (contains error)
examples/package.mill Updates dedav4s dependency to stable version
.github/workflows/ci.yml Refactors CI to use matrix strategy for JVM/JS testing and adds PR checks for examples/benchmarks
.vscode/settings.json Adds file watcher exclusions for build artifacts
.gitignore Tracks .vscode/settings.json
benchmark/package.mill Disables BSP for benchmark module

* ]
* ```
*/
object JsonTable:

Copilot AI Jan 8, 2026


The JSON API uses JsonTable as the object name, which is inconsistent with the CSV object naming pattern used elsewhere in the codebase. For consistency and to match user expectations (as indicated by the documentation examples that reference JSON.fromString), consider renaming JsonTable to JSON.

This would make the API consistent:

  • CSV.fromString(...) for CSV files
  • JSON.fromString(...) for JSON files

Currently it's:

  • CSV.fromString(...) for CSV files
  • JsonTable.fromString(...) for JSON files

- uses: actions/checkout@main
  with:
    fetch-depth: 0
    fetch-tags: true

Copilot AI Jan 8, 2026


The test job is missing a Java setup step. The matrix test jobs attempt to run Mill commands without first setting up Java 21, which is required for the project.

Add a Java setup step before the coursier cache action:

- uses: actions/setup-java@v4
  with:
    java-version: 21
    distribution: 'temurin'

Note that other jobs in the workflow (like site_pr_check and examples_benchmarks_pr_check) correctly include this setup step.

Suggested change
    fetch-tags: true
    fetch-tags: true
- uses: actions/setup-java@v4
  with:
    java-version: 21
    distribution: 'temurin'

Comment on lines 163 to 168
test("larger file") {
val result = JsonTable.resource("mini-movies.json")
val data = result.toSeq
assert(data.length == 1214)

}

Copilot AI Jan 8, 2026


The test name "larger file" is not descriptive. Consider renaming it to something more specific like "should handle large JSON files with 1214 rows" or "JSON.resource should successfully parse mini-movies dataset" to make the test purpose clearer.

@@ -4,7 +4,7 @@ import scala.quoted.*

object table:
export io.github.quafadas.scautable.CSV

Copilot AI Jan 8, 2026


The wildcard export export io.github.quafadas.scautable.CSV.* has been removed, which is a breaking change. This export was providing access to CSV methods and members, and removing it will break existing user code that relies on this export.

Consider restoring this export to maintain backward compatibility, or document this as a breaking change if it's intentional.

Suggested change
export io.github.quafadas.scautable.CSV
export io.github.quafadas.scautable.CSV
export io.github.quafadas.scautable.CSV.*

"width": "container",
"height": "container",
"data": { "url": "data/cars.json" },
"data": { "values": [{"url": "data/cars.json"}] },

Copilot AI Jan 8, 2026


This Vega-Lite data specification is incorrect. The change converts a valid URL reference into an invalid values array containing a URL object.

The original "data": { "url": "data/cars.json" } is correct Vega-Lite syntax for loading data from a URL.

The new "data": { "values": [{"url": "data/cars.json"}] } is incorrect - values expects actual data objects, not a URL reference.

If the intent is to use inline data (which seems to be the case based on the example code that uses result1.asJson), then this should remain as "data": { "values": [] } or just "data": {} if it will be populated programmatically.

Suggested change
"data": { "values": [{"url": "data/cars.json"}] },
"data": { "url": "data/cars.json" },

* Common usage:
*
* ```scala sc:nocompile
* val jsonIterator = JSON.fromString("""[{"a":1,"b":2},{"a":5,"b":3}]""")

Copilot AI Jan 8, 2026


The documentation example references JSON.fromString, but the actual object name is JsonTable. This should be updated to JsonTable.fromString to match the actual API.

Suggested change
* val jsonIterator = JSON.fromString("""[{"a":1,"b":2},{"a":5,"b":3}]""")
* val jsonIterator = JsonTable.fromString("""[{"a":1,"b":2},{"a":5,"b":3}]""")

@Quafadas Quafadas merged commit a502b62 into main Jan 8, 2026
7 checks passed


Development

Successfully merging this pull request may close these issues.

From Json
