Conversation
- Add scautable-json module with ujson dependency
- Implement JSON parsing for flat arrays of objects
- Support compile-time type inference (FromAllRows, FirstRow, FirstN, StringType, FromTuple)
- Handle null values as Option types
- Extract headers from all objects in array
- Add comprehensive tests for JSON functionality

Co-authored-by: Quafadas <24899792+Quafadas@users.noreply.github.com>
- Add README.md for scautable-json module
- Create JsonExample.scala with usage examples
- Document type inference strategies
- Show examples of handling missing fields and null values

Co-authored-by: Quafadas <24899792+Quafadas@users.noreply.github.com>
- Fix JsonExample to handle non-optional id field correctly
- Add scautable-json module dependency to examples
- Verify example compiles and runs successfully

Co-authored-by: Quafadas <24899792+Quafadas@users.noreply.github.com>
@copilot I added a test with a larger JSON file. Here, the file-based strategies don't work, I believe because they attempt to read the entire file as a string. Could you rework those strategies to work incrementally? In the first instance, see if this is possible with ujson. If it is not, it may be worth changing the JSON library to jsoniter, which I believe supports this kind of incremental reading of JSON.
@copilot Truncating the number of rows is not an acceptable solution. The user can control this if they want via the typeInferrer. This macro will need to find a way to "stream" the JSON in a spiritually similar fashion to the way the CSV works, line by line. Continue...
@copilot Switch the JSON library to jsoniter; I believe that does offer streaming solutions.
- Replace ujson dependency with jsoniter-scala for better streaming support
- Implement custom streaming JSON parser that reads objects incrementally
- Parser manually identifies object boundaries and parses one at a time
- At compile time, only reads as many objects as needed for type inference
- At runtime, generates code that streams through the file without loading all into memory
- All tests pass including the larger 1214-row mini-movies.json file
- Maintains memory efficiency for large JSON files

Co-authored-by: Quafadas <24899792+Quafadas@users.noreply.github.com>
Switched to jsoniter-scala and implemented a custom streaming JSON parser. The parser now:
- Manually identifies object boundaries and parses one object at a time
- At compile time, reads only as many objects as needed for type inference
- At runtime, generates code that streams through the file without loading it all into memory

The approach is similar to CSV's line-by-line reading. Commit: 769b679
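As a rough illustration of that boundary-finding idea (a sketch, not the PR's actual parser), brace-depth tracking looks like the following; it assumes flat objects, scans an in-memory string for brevity, and ignores escaped-backslash edge cases:

```scala
// Sketch only: yield each top-level object of a JSON array by tracking
// brace depth and whether the scanner is inside a string literal.
def objectSlices(json: String): Iterator[String] = new Iterator[String]:
  private var i = json.indexOf('[') + 1

  private def skipToBrace(): Unit =
    while i < json.length && json(i) != '{' do i += 1

  def hasNext: Boolean =
    skipToBrace()
    i < json.length

  def next(): String =
    skipToBrace()
    val start = i
    var depth = 0
    var inString = false
    var j = start
    var done = false
    while !done do
      json(j) match
        case '"' if json(j - 1) != '\\' => inString = !inString
        case '{' if !inString => depth += 1
        case '}' if !inString =>
          depth -= 1
          if depth == 0 then done = true
        case _ => ()
      j += 1
    i = j
    json.substring(start, j)

// objectSlices("""[{"a":1},{"a":2}]""").toList == List("""{"a":1}""", """{"a":2}""")
```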
@copilot Unfortunately I still found this unsatisfactory. Could we have another look? This time we'll go JVM-only, i.e. drop the JS cross-compilation, and use the Java JSON streaming API directly: https://www.oracle.com/technical-resources/articles/java/json.html It should pass the existing test suite.
- Remove JS cross-compilation support
- Replace jsoniter-scala with Java's native JSON-P (JSR 374) streaming API
- Use jakarta.json (Glassfish implementation) for true streaming parsing
- Implement proper streaming that reads JSON objects one at a time using JsonParser events
- At compile time: only reads as many objects as needed for type inference
- At runtime: streams through files using InputStream without loading into memory
- All existing tests pass with the new implementation

Co-authored-by: Quafadas <24899792+Quafadas@users.noreply.github.com>
Switched to Java's native JSON-P (JSR 374) streaming API and made the module JVM-only. The implementation now:
- Removes the JS cross-compilation support (the module is JVM-only)
- Uses jakarta.json (the Glassfish implementation) for true streaming parsing via JsonParser events
- At compile time, reads only as many objects as needed for type inference
- At runtime, streams through files with an InputStream without loading them into memory

All existing tests pass. Commit: 4b17bc1
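For reference, here is a minimal sketch of that event loop (an assumed helper, not the PR's actual code) using jakarta.json's streaming parser on a flat array of objects:

```scala
// Minimal sketch, not the PR's implementation: read a flat JSON array of
// objects one at a time via jakarta.json streaming events.
import jakarta.json.Json
import jakarta.json.stream.JsonParser.Event
import java.io.FileInputStream

def streamObjects(path: String)(emit: Map[String, Option[String]] => Unit): Unit =
  val parser = Json.createParser(new FileInputStream(path))
  try
    var key = ""
    var current = Map.empty[String, Option[String]]
    while parser.hasNext do
      parser.next() match
        case Event.KEY_NAME => key = parser.getString
        case Event.VALUE_STRING | Event.VALUE_NUMBER =>
          current += key -> Some(parser.getString)
        case Event.VALUE_TRUE  => current += key -> Some("true")
        case Event.VALUE_FALSE => current += key -> Some("false")
        case Event.VALUE_NULL  => current += key -> None // nulls surface as Option
        case Event.END_OBJECT  =>
          emit(current) // one object fully parsed; hand it off without buffering the rest
          current = Map.empty
        case _ => () // START_ARRAY, START_OBJECT, END_ARRAY
  finally parser.close()
```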
Pull request overview
This PR implements compile-time JSON parsing for flat arrays of objects, mirroring the existing CSV functionality. JSON structure and types are inferred at compile time, producing a type-safe `Iterator[NamedTuple[K, V]]` using Java's native JSON-P streaming API.
Key Changes:
- Adds JVM-only JSON support using Jakarta JSON-P (JSR 374) streaming API for memory-efficient parsing
- Implements compile-time type inference with strategies matching CSV (`FromAllRows`, `FirstRow`, `FirstN`, `StringType`, `FromTuple`); a hypothetical invocation is sketched after this list
- Provides API methods matching the CSV pattern: `fromString`, `resource`, `absolutePath`, `pwd`, `url`, and `fromTyped`
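For illustration, strategy selection might look like this (the strategy names come from this PR; the exact parameter syntax and import path are assumptions):

```scala
// Hypothetical sketch: the TypeInferrer strategy names come from the PR text,
// but how the inferrer is passed to JsonTable is assumed, not confirmed.
// FirstN(100) would infer column types from the first 100 objects only,
// keeping compile-time inference bounded on large files.
import io.github.quafadas.scautable.JsonTable

val typed = JsonTable.resource("mini-movies.json", TypeInferrer.FirstN(100))
```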
Reviewed changes
Copilot reviewed 18 out of 22 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| scautable/src-jvm/StreamingJsonParser.scala | Core streaming parser using Java JSON-P for event-driven parsing without loading entire arrays into memory |
| scautable/src-jvm/JsonTable.scala | Main API entry point providing compile-time JSON parsing methods |
| scautable/src-jvm/JsonIterator.scala | Iterator implementation converting JSON objects to typed NamedTuples |
| scautable/src-jvm/JsonInferrerOps.scala | Type inference logic determining Int/Long/Double/Boolean/String types from JSON values |
| scautable/src-js/JsonStub.scala | Stub for JS platform (JSON support is JVM-only) |
| scautable/src/package.scala | Exports JsonTable object and removes CSV.* wildcard export |
| scautable/package.mill | Adds jakarta.json dependency and BSP configuration |
| scautable/test/src-jvm/JsonSuite.scala | Comprehensive test suite with 13 test cases covering type inference, null handling, missing fields, and nested structure detection |
| scautable/test/resources/*.json | Test fixtures including simple cases and large file (1214 rows) |
| examples/src/JsonExample.scala | Example demonstrating JSON parsing with Vega plotting integration |
| examples/resources/scatter.vg.json | Modified Vega-Lite specification (contains error) |
| examples/package.mill | Updates dedav4s dependency to stable version |
| .github/workflows/ci.yml | Refactors CI to use matrix strategy for JVM/JS testing and adds PR checks for examples/benchmarks |
| .vscode/settings.json | Adds file watcher exclusions for build artifacts |
| .gitignore | Tracks .vscode/settings.json |
| benchmark/package.mill | Disables BSP for benchmark module |
````scala
 * ]
 * ```
 */
object JsonTable:
````
The JSON API uses JsonTable as the object name, which is inconsistent with the CSV object naming pattern used elsewhere in the codebase. For consistency and to match user expectations (as indicated by the documentation examples that reference JSON.fromString), consider renaming JsonTable to JSON.
This would make the API consistent:
- `CSV.fromString(...)` for CSV files
- `JSON.fromString(...)` for JSON files

Currently it's:
- `CSV.fromString(...)` for CSV files
- `JsonTable.fromString(...)` for JSON files
```yaml
- uses: actions/checkout@main
  with:
    fetch-depth: 0
    fetch-tags: true
```
The test job is missing a Java setup step. The matrix test jobs attempt to run Mill commands without first setting up Java 21, which is required for the project.
Add a Java setup step before the coursier cache action:
```yaml
- uses: actions/setup-java@v4
  with:
    java-version: 21
    distribution: 'temurin'
```

Note that other jobs in the workflow (like site_pr_check and examples_benchmarks_pr_check) correctly include this setup step.
Suggested change:
```yaml
    fetch-tags: true
- uses: actions/setup-java@v4
  with:
    java-version: 21
    distribution: 'temurin'
```
| test("larger file") { | ||
| val result = JsonTable.resource("mini-movies.json") | ||
| val data = result.toSeq | ||
| assert(data.length == 1214) | ||
|
|
||
| } |
The test name "larger file" is not descriptive. Consider renaming it to something more specific like "should handle large JSON files with 1214 rows" or "JSON.resource should successfully parse mini-movies dataset" to make the test purpose clearer.
```diff
@@ -4,7 +4,7 @@ import scala.quoted.*

 object table:
   export io.github.quafadas.scautable.CSV
```
The wildcard export `export io.github.quafadas.scautable.CSV.*` has been removed, which is a breaking change. This export provided access to CSV methods and members, and removing it will break existing user code that relies on it.
Consider restoring this export to maintain backward compatibility, or document this as a breaking change if it's intentional.
Suggested change:
```scala
export io.github.quafadas.scautable.CSV
export io.github.quafadas.scautable.CSV.*
```
| "width": "container", | ||
| "height": "container", | ||
| "data": { "url": "data/cars.json" }, | ||
| "data": { "values": [{"url": "data/cars.json"}] }, |
This Vega-Lite data specification is incorrect. The change converts a valid URL reference into an invalid values array containing a URL object.
The original "data": { "url": "data/cars.json" } is correct Vega-Lite syntax for loading data from a URL.
The new "data": { "values": [{"url": "data/cars.json"}] } is incorrect - values expects actual data objects, not a URL reference.
If the intent is to use inline data (which seems to be the case based on the example code that uses result1.asJson), then this should remain as "data": { "values": [] } or just "data": {} if it will be populated programmatically.
| "data": { "values": [{"url": "data/cars.json"}] }, | |
| "data": { "url": "data/cars.json" }, |
````scala
 * Common usage:
 *
 * ```scala sc:nocompile
 * val jsonIterator = JSON.fromString("""[{"a":1,"b":2},{"a":5,"b":3}]""")
````
The documentation example references JSON.fromString, but the actual object name is JsonTable. This should be updated to JsonTable.fromString to match the actual API.
Suggested change:
```scala
 * val jsonIterator = JsonTable.fromString("""[{"a":1,"b":2},{"a":5,"b":3}]""")
```
Implements compile-time JSON parsing for flat arrays of objects, mirroring the existing CSV functionality. JSON structure and types are inferred at compile time, producing a type-safe `Iterator[NamedTuple[K, V]]` with zero runtime overhead.

Implementation
- `scautable-json`: JVM-only module using Java's native JSON-P (JSR 374) streaming API
- `fromString`, `resource`, `absolutePath`, `pwd`, `url` methods matching the CSV API
- `jakarta.json.stream.JsonParser` for true event-based streaming that reads JSON objects incrementally without loading entire arrays into memory

Usage
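A minimal sketch of the API in action (the call mirrors the docstring example earlier in this PR; the inferred tuple type shown is illustrative):

```scala
// Illustrative usage: structure and field types are inferred at compile time.
import io.github.quafadas.scautable.JsonTable

val rows = JsonTable.fromString("""[{"a":1,"b":2},{"a":5,"b":3}]""")
// rows is an Iterator of NamedTuples, e.g. (a: Int, b: Int)
rows.foreach(row => println(s"a=${row.a}, b=${row.b}"))
```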
Design notes
- Headers are extracted from all objects in the array (`arr.flatMap(_.obj.keys).toSet`)
- Null values are handled as `Option[T]` types
- `jakarta.json.stream.JsonParser` for event-driven parsing (START_OBJECT, KEY_NAME, VALUE_*, END_OBJECT events)

Tests
13 test cases covering type inference, null handling, missing fields, all data source methods, and large file handling.
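As one hypothetical example of the null-handling cases (munit-style, matching the suite's apparent conventions; the actual assertions in JsonSuite.scala may differ):

```scala
// Hypothetical sketch, not the suite's actual code: a field that is null in
// some objects should surface as an Option in the inferred row type.
test("null values become Option") {
  val rows = JsonTable.fromString("""[{"a":1,"b":null},{"a":2,"b":"x"}]""").toSeq
  assert(rows(0).b == None)
  assert(rows(1).b == Some("x"))
}
```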
Original prompt