Parse JSON incrementally as it streams in, e.g. from a network request or a language model. Gives you a sequence of increasingly complete values.
This is a Python port of the TypeScript jsonriver library.
- Incremental parsing: Get progressively complete JSON values as data arrives
- Zero dependencies: Uses only the Python standard library
- Fully typed: Complete type hints with mypy strict mode compliance
- Memory efficient: Reuses objects and arrays when possible
- Correct: Final result matches `json.loads()` exactly
- Fast: Optimized for performance with minimal overhead
Using uv:

```bash
uv add jsonriver
```

Using pip:

```bash
pip install jsonriver
```
From source, using uv:

```bash
git clone https://github.com/chrisschnabl/streamjson.git
cd streamjson
uv pip install -e .
```

Using pip:

```bash
git clone https://github.com/chrisschnabl/streamjson.git
cd streamjson
pip install -e .
```
```python
import asyncio
import json

from jsonriver import parse

async def make_stream(text: str, chunk_size: int):
    """Simulate a streaming source."""
    for i in range(0, len(text), chunk_size):
        yield text[i:i + chunk_size]

async def main():
    json_str = '{"name": "Alice", "age": 30}'
    stream = make_stream(json_str, chunk_size=3)
    async for value in parse(stream):
        print(json.dumps(value))
    # Output shows incremental results:
    # {}
    # {"name": "Al"}
    # {"name": "Alice"}
    # {"name": "Alice", "age": 30.0}

asyncio.run(main())
```
jsonriver yields a sequence of increasingly complete JSON values. Consider this JSON:
```json
{"name": "Alex", "keys": [1, 20, 300]}
```
If you parse it one byte at a time, it yields:

```
{}
{"name": ""}
{"name": "A"}
{"name": "Al"}
{"name": "Ale"}
{"name": "Alex"}
{"name": "Alex", "keys": []}
{"name": "Alex", "keys": [1]}
{"name": "Alex", "keys": [1, 20]}
{"name": "Alex", "keys": [1, 20, 300]}
```
The library maintains these guarantees:

- Type stability: Future versions of a value always have the same type (a string never becomes an array)
- Atomic values: `null`, `true`, `false`, and numbers are only yielded once they are complete
- String growth: Strings may be replaced with longer versions of themselves
- Array append-only: Arrays are only modified by appending an element or mutating the last element
- Object append-only: Objects are only modified by adding a property or mutating the most recently added one
- Complete keys: Object properties are only added once the key and the type of the value are known
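These guarantees mean a consumer only ever needs to look at the tail of each snapshot. A minimal sketch of that idea (plain Python, no jsonriver required; the `newest_part` helper is illustrative, not part of the library):

```python
def newest_part(prev, curr):
    """Return the (key-or-index, value) pair that may have changed between
    two successive snapshots, relying on the append-only guarantees: only
    the last array element or most recently added property can differ."""
    if isinstance(curr, dict) and isinstance(prev, dict) and curr:
        last = next(reversed(curr))  # most recently added property
        return last, curr[last]
    if isinstance(curr, list) and isinstance(prev, list) and curr:
        return len(curr) - 1, curr[-1]
    return None, curr

# Snapshots like those jsonriver would yield for the example above:
snapshots = [
    {},
    {"name": ""},
    {"name": "Alex"},
    {"name": "Alex", "keys": []},
    {"name": "Alex", "keys": [1]},
]
for prev, curr in zip(snapshots, snapshots[1:]):
    print(newest_part(prev, curr))
```

A UI driven this way can re-render just the growing field instead of diffing the whole document on every update.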
The parser raises errors for invalid JSON, matching `json.loads()` behavior:

```python
async def example_error():
    try:
        stream = make_stream('{"invalid": }', 1)
        async for value in parse(stream):
            print(value)
    except ValueError as e:
        print(f"Parse error: {e}")
```
```bash
# Create virtual environment and install dependencies
uv venv
uv pip install -e ".[dev]"

# Run all tests
python -m pytest tests/ -v

# Run specific test file
python -m pytest tests/test_parse.py -v

# Run with coverage
python -m pytest tests/ --cov=src/jsonriver

# Check types with mypy
mypy src/jsonriver --strict

# Run the example
python example_jsonriver.py
```
```
src/jsonriver/
    __init__.py             # Public API exports
    parse.py                # JSON parser implementation
    tokenize.py             # JSON tokenizer implementation
tests/
    test_parse.py           # Parser tests
    test_tokenize.py        # Tokenizer tests
    test_cross_validate.py  # Cross-validation with TypeScript
    utils.py                # Test utilities
bench/
    python-bench.py         # Full file parsing benchmarks
    streaming-bench.py      # Streaming parsing benchmarks
    README.md               # Benchmark results and analysis
```
Incrementally parse a single JSON value from the given iterable of string chunks.
Parameters:

- `stream`: An async iterator that yields string chunks containing JSON data

Yields:

- Increasingly complete JSON values as more input is parsed

Raises:

- `ValueError`: If the input is not valid JSON
- `RuntimeError`: For internal parsing errors
Example:

```python
async def parse_json():
    json_str = '{"a": 1, "b": 2}'

    async def stream():
        for char in json_str:
            yield char

    async for value in parse(stream()):
        print(value)
```
```python
JsonValue = Union[
    None,
    bool,
    float,
    str,
    list['JsonValue'],
    dict[str, 'JsonValue'],
]

JsonObject = dict[str, JsonValue]
```
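As a quick illustration of working with this recursive alias, here is a hypothetical helper (not part of the library) that walks a `JsonValue` tree and counts its atomic leaves:

```python
from typing import Union

# Local copy of the recursive alias, so this sketch is self-contained.
JsonValue = Union[None, bool, float, str, list['JsonValue'], dict[str, 'JsonValue']]

def count_leaves(value: JsonValue) -> int:
    """Count atomic values (null, bool, number, string) in a JsonValue tree."""
    if isinstance(value, list):
        return sum(count_leaves(v) for v in value)
    if isinstance(value, dict):
        return sum(count_leaves(v) for v in value.values())
    return 1

print(count_leaves({"name": "Alex", "keys": [1.0, 20.0, 300.0]}))  # 4
```

Because each snapshot from `parse` is a complete `JsonValue`, helpers like this work on partial results exactly as they do on the final one.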
jsonriver is optimized for streaming scenarios, not batch parsing:
- Time-to-first-value: 25x faster than json.loads when data arrives in chunks
- Progressive updates: Provides 300+ incremental updates for large files
- User responsiveness: Shows partial results immediately vs waiting for complete data
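The time-to-first-value advantage can be illustrated with a toy simulation (stdlib only, no jsonriver): a consumer that reacts to each chunk sees its first result long before one that waits for the full payload. The 25x figure comes from the benchmarks in `bench/`; the delays below are made up for illustration.

```python
import asyncio
import time

async def slow_stream(chunks, delay):
    """Yield chunks with a simulated network delay before each one."""
    for c in chunks:
        await asyncio.sleep(delay)
        yield c

async def measure():
    chunks = ['{"a": 1, ', '"b": 2, ', '"c": 3}']
    start = time.monotonic()
    first = None
    async for chunk in slow_stream(chunks, delay=0.05):
        if first is None:
            # An incremental parser can yield a partial value as soon
            # as the first chunk arrives.
            first = time.monotonic() - start
    # A batch consumer can only call json.loads() once everything is here.
    total = time.monotonic() - start
    return first, total

first, total = asyncio.run(measure())
print(f"first value possible after {first:.3f}s; full payload after {total:.3f}s")
```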
```bash
# Full file parsing comparison
python bench/python-bench.py

# Streaming scenario comparison
python bench/streaming-bench.py
```
- Full file parsing: `json.loads` is ~35x faster (expected, as it's C-based)
- Streaming parsing: jsonriver is ~25x faster to first value (the key advantage)

See `bench/README.md` for detailed benchmark results and analysis.
- Streaming APIs: Parse JSON from network requests as data arrives
- Large payloads: Start processing data before complete response
- Real-time UIs: Update UI as JSON parses
- LLM responses: Parse structured output from language models
- Progress indicators: Show parsing progress to users
- Server-sent events: Handle JSON in SSE streams
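For the network cases above, a small adapter is usually needed to turn a byte stream into the string chunks `parse` expects. A sketch under assumptions (the byte source is hypothetical; `codecs`' incremental decoder handles UTF-8 sequences split across chunk boundaries):

```python
import codecs
from typing import AsyncIterator

async def decode_chunks(byte_stream: AsyncIterator[bytes]) -> AsyncIterator[str]:
    """Decode a stream of bytes to str chunks, correctly handling
    multi-byte UTF-8 characters that straddle chunk boundaries."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    async for chunk in byte_stream:
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush any buffered trailing bytes.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail

# Hypothetical usage with an aiohttp-style byte source:
#     async for value in parse(decode_chunks(response_bytes)): ...
```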
| Feature             | jsonriver | json.loads | ijson |
|---------------------|-----------|------------|-------|
| Incremental parsing | ✅        | ❌         | ✅    |
| Complete values     | ✅        | ✅         | ❌    |
| No dependencies     | ✅        | ✅         | ❌    |
| Type hints          | ✅        | ✅         | ❌    |
| Memory efficient    | ✅        | ❌         | ✅    |
BSD-3-Clause License
- Original TypeScript implementation: Copyright (c) 2023 Google LLC
- Python port: Copyright (c) 2024 jsonriver-python contributors
See LICENSE file for full license text.
This is a Python port of the excellent jsonriver TypeScript library by Peter Burns (@rictic).
Contributions are welcome! Please ensure:

- All tests pass: `pytest tests/ -v`
- Type checking passes: `mypy src/jsonriver --strict`
- Code follows the existing style
- New features include tests
- Initial Python port from TypeScript
- Full type hints with mypy strict mode
- Comprehensive test suite (37 tests)
- Complete documentation
- Zero dependencies