"Line is too long" for JSON-lines from Kubernetes API #4453

Closed
@nolar

Description

Long story short

Iterating over long lines in a streaming response fails in StreamReader with ValueError: Line is too long.

It affects all APIs that return the JSON-lines format, such as the Kubernetes API, where a single JSON-serialised object can be much larger than the current limit of 128 KB (e.g. 2 MB).

Expected behaviour

Any buffer size is supported, or the buffer size is configurable per session/request/response.

Actual behaviour

The buffer size is limited to a low watermark of 64 KB and a high watermark of 128 KB, and it is not configurable except by hacking the protected properties (response.content._high_water in 2019; it was response.content._limit in 2017) or the global constants (DEFAULT_LIMIT).

Steps to reproduce

See zalando-incubator/kopf#275 and zalando-incubator/kopf#276 — a workaround with a self-made line iterator.

For example, take a Kubernetes cluster, create a secret containing a 2 MB encrypted and base64-encoded line, and try to watch the secrets: the content is a JSON-lines streaming response.

Any other artificial way of producing JSON-lines will show the same issue.

The JSON-lines format requires one object per line, so an object cannot be split across multiple lines, however large it is. That is a problem for aiohttp as a client.
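To make the size problem concrete, here is a minimal stdlib-only sketch that builds a JSON-lines payload and compares each line to the 128 KB high watermark mentioned above. The constant name HIGH_WATERMARK is made up for illustration and is not an aiohttp identifier.

```python
import json

# 128 KB high watermark as described in this issue; the name is hypothetical.
HIGH_WATERMARK = 128 * 1024

# Three objects with a 1 MiB string field each.
objs = [{'spec': {'field': c * 1024 * 1024}} for c in 'xyz']

# JSON-lines: every object is serialised onto exactly one line.
payload = ''.join(f'{json.dumps(obj)}\n' for obj in objs)

lines = payload.splitlines(keepends=True)
parsed = [json.loads(line) for line in lines]

# Each line is one whole object, so each line exceeds the watermark.
oversized = [len(line.encode()) > HIGH_WATERMARK for line in lines]

for line, too_long in zip(lines, oversized):
    print(len(line), 'exceeds watermark' if too_long else 'fits')
```

Because json.dumps() escapes newlines inside strings, the only raw newlines in the payload are the record separators, so a reader cannot find a shorter unit to stop at.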

UPD: Here is an isolated example to reproduce the issue:

# pip install pytest pytest-asyncio aresponses
# pytest -s -vv _thisfile.py

import aiohttp
import json
import pytest


async def client_fn():
    session = aiohttp.ClientSession()
    async with session:
        response = await session.get('http://xyz/path')
        async with response:
            async for line in response.content:
                print('===', len(line))
                assert len(line) >= 1 * 1024 * 1024


@pytest.mark.asyncio
async def test_long_line(event_loop, aresponses):
    big_objs = [
        {'spec': {'field': 'x' * 1 * 1024 * 1024}},
        {'spec': {'field': 'y' * 1 * 1024 * 1024}},
        {'spec': {'field': 'z' * 1 * 1024 * 1024}},
    ]
    content = ''.join([f'{json.dumps(obj)}\n' for obj in big_objs])
    aresponses.add('xyz', '/path', 'get', content)
    await client_fn()

Your environment

aiohttp==3.6.2 and newer, used as a client.

Links

There is already #2216 with exactly the same issue. However, it was closed and locked because the issue occurred in someone's internal tool.

I have decided to re-create the issue, since the problem affects all client-side usage of aiohttp with JSON-lines, and specifically with Kubernetes, and I could not comment on the locked issue.

Self-made line iterators, however, increase the memory footprint of the process: the buffer/accumulator has to be stored locally in the iterator in addition to the StreamReader's own buffer, and so does the long line that is about to be yielded. It would be better if this were done internally.
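For reference, a self-made line iterator of the kind described above could look roughly like the sketch below. It is not the kopf implementation; the name iter_lines and the fake chunk source are hypothetical. With aiohttp, the chunks would presumably come from something like response.content.iter_chunked(...), which reads fixed-size chunks and does not go through the line-length check.

```python
import asyncio
from typing import AsyncIterator, List


async def iter_lines(chunks: AsyncIterator[bytes]) -> AsyncIterator[bytes]:
    """Yield newline-terminated lines from an async stream of byte chunks."""
    parts: List[bytes] = []  # local accumulator: the extra memory cost
    async for chunk in chunks:
        # Everything before the last b'\n' completes one or more lines;
        # the remainder (possibly empty) starts the next line.
        *complete, rest = chunk.split(b'\n')
        for piece in complete:
            parts.append(piece)
            yield b''.join(parts) + b'\n'
            parts = []
        if rest:
            parts.append(rest)
    if parts:
        # Trailing data without a final newline.
        yield b''.join(parts)


async def demo() -> None:
    async def fake_chunks() -> AsyncIterator[bytes]:
        # Stand-in for a network stream: lines are cut at arbitrary points.
        for chunk in (b'{"a":', b' 1}\n{"b"', b': 2}\n'):
            yield chunk

    async for line in iter_lines(fake_chunks()):
        print(len(line), line)


if __name__ == '__main__':
    asyncio.run(demo())
```

The accumulator list and the joined line both live in the iterator while the reader still holds its own buffer, which is the memory overhead mentioned above.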
