curl -Z overlapping and truncated output on stdout #5175

Closed
Earnestly opened this issue Apr 2, 2020 · 8 comments

@Earnestly

@Earnestly Earnestly commented Apr 2, 2020

I've been using curl -sZ with multiple URLs to query twitch.tv, which returns several lines of JSON such as:

{
    "streams": [
        {
            "_id": 1234567890,
            "channel": {
                "name": "foobar",
            }
        },
        {
            ...
        }
    ]
}
{
    "streams": [
        ...
    ]
}

All of this is then piped through jq and filtered as needed. The trouble is that jq randomly reports errors such as:

jq: parse error: Invalid numeric literal at line 1, column 13708

If I look at that particular column, I can clearly see that the output has been truncated or interleaved somehow, e.g.

... banner":null,"{"streams": ...

I'm not sure how to prevent this, or how to provide an easy test case, but I couldn't find anything on Google or in the existing issues about output truncation or interleaving ("weaving") when using -Z.

Without -Z it works correctly, albeit much more slowly.

curl 7.70.0-DEV (x86_64-pc-linux-gnu) libcurl/7.70.0-DEV OpenSSL/1.1.1f zlib/1.2.11 libidn2/2.3.0 libpsl/0.21.0 (+libidn2/2.2.0) nghttp2/1.40.0 librtmp/2.3
Release-Date: [unreleased]
Protocols: dict file ftp ftps gopher http https imap imaps pop3 pop3s rtmp rtsp smb smbs smtp smtps telnet tftp 
Features: AsynchDNS GSS-API HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile libz NTLM NTLM_WB PSL SPNEGO SSL TLS-SRP UnixSockets
Linux teapot 5.5.13-arch1-1 #1 SMP PREEMPT Wed, 25 Mar 2020 16:04:40 +0000 x86_64 GNU/Linux
@bagder
Member

@bagder bagder commented Apr 2, 2020

-Z means you get the URLs in parallel, i.e. the transfers are done concurrently. Of course they will be intermixed if you send them all to stdout, as every transfer then writes its data to the same output stream as soon as possible. They will even be mixed very unpredictably.
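A minimal Python sketch of the effect described above; it is a model, not curl's code. Each transfer's body is assumed to arrive as a list of chunks, and a hypothetical `schedule` lists which transfer delivers its next chunk. Writing every chunk the moment it arrives reproduces the kind of splice seen in the report, and a reader that parses back-to-back JSON documents (roughly what jq does with its input) then fails. The function names and sample data are illustrative assumptions.

```python
import json


def write_as_received(transfers, schedule):
    """Model of parallel transfers sharing stdout: every chunk is written
    as soon as it arrives, with no per-transfer buffering."""
    cursors = [0] * len(transfers)  # next chunk index per transfer
    stream = []
    for t in schedule:
        stream.append(transfers[t][cursors[t]])
        cursors[t] += 1
    return "".join(stream)


def parse_concatenated_json(text):
    """Parse whitespace-separated JSON documents, roughly as jq reads input."""
    decoder = json.JSONDecoder()
    docs, pos = [], 0
    while True:
        # raw_decode() does not skip leading whitespace, so do it here
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            return docs
        doc, pos = decoder.raw_decode(text, pos)
        docs.append(doc)


# Two hypothetical responses, each split into two network chunks.
a = ['{"streams": [{"_id": 1, ', '"channel": {"name": "foo"}}]}\n']
b = ['{"streams": [{"_id": 2, ', '"channel": {"name": "bar"}}]}\n']

# One transfer after the other (as without -Z): both documents parse.
print(len(parse_concatenated_json(write_as_received([a, b], [0, 0, 1, 1]))))  # 2

# Chunks arrive alternately: the two bodies are spliced mid-object.
mixed = write_as_received([a, b], [0, 1, 0, 1])
try:
    parse_concatenated_json(mixed)
except json.JSONDecodeError as err:
    print("parse error at column", err.colno)
```

Which schedule actually occurs depends on network timing, so the break point moves from run to run, which matches the seemingly random jq errors in the report.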

@Earnestly
Author

@Earnestly Earnestly commented Apr 2, 2020

I don't think that necessarily has to be a given, as tools such as GNU make have strategies for arranging the output of parallel jobs.

I imagined curl would make some effort to keep the output of each request together rather than interspersing it with the output of other requests. This would be similar to make --output-sync=target (target being the default type when --output-sync is given without one).
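For comparison, a rough Python analogue of `--output-sync=target` applied to transfers. This is a hypothetical design, not something curl implements: each transfer's chunks are buffered, and its body is written in one piece when its last chunk arrives. Documents then come out whole, in completion order rather than command-line order. `write_synced_per_transfer` and the sample data are assumptions for illustration.

```python
import json


def write_synced_per_transfer(transfers, schedule):
    """Buffer each transfer and emit its whole body once it completes."""
    buffers = [[] for _ in transfers]  # chunks received so far, per transfer
    stream = []
    for t in schedule:
        # the number of buffered chunks doubles as the next chunk index
        buffers[t].append(transfers[t][len(buffers[t])])
        if len(buffers[t]) == len(transfers[t]):  # transfer t just finished
            stream.append("".join(buffers[t]))
    return "".join(stream)


a = ['{"streams": [{"_id": 1, ', '"channel": {"name": "foo"}}]}\n']
b = ['{"streams": [{"_id": 2, ', '"channel": {"name": "bar"}}]}\n']

# Chunks still arrive interleaved, and transfer b finishes first.
out = write_synced_per_transfer([a, b], [0, 1, 1, 0])
print([json.loads(line)["streams"][0]["channel"]["name"] for line in out.splitlines()])
# ['bar', 'foo']
```

The cost in this model is memory and latency: a large or slow response is held back entirely until it finishes, which may be part of why this is less straightforward for a transfer tool than for make.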

@bagder
Member

@bagder bagder commented Apr 2, 2020

I don't see how curl could do this in any effective or good way (I'm open to being proven wrong), but perhaps more importantly, there's no such thing now and no promise of it.

@jay
Member

@jay jay commented Apr 2, 2020

I think we should shut off -Z for stdout. What is the use case?

@dfandrich
Collaborator

@dfandrich dfandrich commented Apr 3, 2020

@jay
Member

@jay jay commented Apr 3, 2020

Sure, but that output can be mixed together. I guess it is a use case if that doesn't matter.

@Earnestly
Author

@Earnestly Earnestly commented Apr 3, 2020

Collecting line-based output from several servers at once, maybe?

This seems ideal. If the output consisted of complete lines, it would be usable in the Unix shell environment. The order of the output doesn't matter in my case, although if it did, curl would probably have to store everything until all requests are complete (as sort does).
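The line-granular variant requested here could look roughly like the following hypothetical Python sketch (a model, not curl code). Each transfer keeps its unfinished last line in a buffer and passes through everything up to its latest newline as soon as that arrives, so lines from different servers are never spliced, although their order across servers still depends on timing. `write_line_synced` and the sample data are assumptions.

```python
def write_line_synced(transfers, schedule):
    """Emit each transfer's data only in whole lines."""
    pending = [""] * len(transfers)  # unfinished trailing line per transfer
    cursors = [0] * len(transfers)   # next chunk index per transfer
    out = []
    for t in schedule:
        pending[t] += transfers[t][cursors[t]]
        cursors[t] += 1
        cut = pending[t].rfind("\n") + 1  # 0 when there is no newline yet
        if cut:
            out.append(pending[t][:cut])
            pending[t] = pending[t][cut:]
        if cursors[t] == len(transfers[t]) and pending[t]:
            # Assumption: a final line without a newline is terminated here.
            out.append(pending[t] + "\n")
            pending[t] = ""
    return "".join(out)


# Two hypothetical line-based responses whose chunks split lines in half.
a = ["alpha 1\nalp", "ha 2\n"]
b = ["beta 1\nbe", "ta 2\n"]

print(write_line_synced([a, b], [0, 1, 0, 1]).splitlines())
# ['alpha 1', 'beta 1', 'alpha 2', 'beta 2']

# Writing the same chunks as they arrive splices lines instead:
print("".join([a[0], b[0], a[1], b[1]]).splitlines())
# ['alpha 1', 'alpbeta 1', 'beha 2', 'ta 2']
```

Preserving command-line order on top of this would mean holding back later transfers' lines until earlier transfers finish, in the worst case until all requests are complete.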

@bagder
Member

@bagder bagder commented Apr 3, 2020

That's a feature request and not a bug. There might be use cases where such a feature would come in handy, even if JSON seems to be a data format that is not line oriented and that would easily break even in such a scenario. If someone wants to work on adding such a feature to curl I won't block it and I could assist, but I will personally not put it on my near-term TODO.
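To illustrate the point about JSON, a small Python sketch built on stated assumptions rather than curl behavior. Even if lines were never spliced, JSON that spans several lines (pretty-printed, as in the example at the top of this issue) still turns invalid when whole lines from two responses are mixed. Output with exactly one document per line (compact JSON, as `jq -c` produces) survives line-level mixing. `mix_lines` is a hypothetical helper standing in for a line-synced -Z.

```python
import json
from itertools import chain, zip_longest

docs = [
    {"streams": [{"_id": 1, "channel": {"name": "foo"}}]},
    {"streams": [{"_id": 2, "channel": {"name": "bar"}}]},
]


def mix_lines(texts):
    """Alternate whole lines from several outputs (each line stays intact)."""
    columns = zip_longest(*(t.splitlines() for t in texts))
    return [line for line in chain.from_iterable(columns) if line is not None]


# Multi-line (pretty-printed) documents: whole-line mixing still breaks them.
pretty = "\n".join(mix_lines([json.dumps(d, indent=2) for d in docs]))
try:
    json.JSONDecoder().raw_decode(pretty)
except json.JSONDecodeError as err:
    print("pretty-printed: parse error on line", err.lineno)

# One compact document per line: every line is still a complete document.
compact = mix_lines([json.dumps(d) for d in docs])
print([json.loads(line)["streams"][0]["_id"] for line in compact])
# [1, 2]
```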

4 participants