Fails to get the page from www.washingtonpost.com/wp-srv but wget works #12643
Comments
On Sat, Jan 06, 2024 at 09:01:43AM -0800, janko-js wrote:
I did this
curl -q --ipv4 -v https://www.washingtonpost.com/wp-srv/politics/special/clinton/stories/bctest092198_1.htm -o 10.html >e1_0 2>e_2_0
I expected the following
To receive the content of the web page at this address. Instead, the file 10.html has 0 (zero) bytes.
< HTTP/2 302
The server is sending a redirect to another URL. Add the -L option to have curl
follow it.
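For illustration, a sketch of the reported command with -L added as suggested (same URL and options as in the report; as the follow-up comments show, this alone was not enough for this particular site):

curl -q --ipv4 -v -L https://www.washingtonpost.com/wp-srv/politics/special/clinton/stories/bctest092198_1.htm -o 10.html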
See the first message: I've tried that too, and it doesn't help.
curl is acting correctly here, according to how the server tells it to act. You might get it to do what you want by also enabling cookies with -b "".
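A sketch combining the two suggestions so far, assuming the cookie engine is enabled with an empty -b value as described later in this thread:

curl -q --ipv4 -v -L -b "" https://www.washingtonpost.com/wp-srv/politics/special/clinton/stories/bctest092198_1.htm -o 10.html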
Thanks, I didn't know how to see that that was needed. I assume the current behavior of curl is "by design" and users are expected to understand that (I wasn't able to figure it out from the output). On that assumption I'm closing the issue.
One more question: what also confuses me now is that curl takes 0m 9.46s of real time to fetch the page while wget gets it in 0m 0.08s of real time, i.e. curl is still 118 times slower than wget. Can that slow fetch be avoided with some switch too?
That is because of the server; I checked in Wireshark to confirm. Probably it is some CDN server that does not cache the page for some agents, or delays them, etc. I see a 9s delay with both curl and wget, probably because mine has a different user agent string than yours. If I run the curl command with a different user agent string the server reacts differently.
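A sketch of overriding curl's user agent for this comparison; the exact wget version string is an assumption here, the point is only that the server keys its behavior on the User-Agent header:

curl -q --ipv4 -v -L -b "" -A "Wget/1.21.4" https://www.washingtonpost.com/wp-srv/politics/special/clinton/stories/bctest092198_1.htm -o 10.html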
Yes, that's it, the server changes its reactions, many thanks! I also see consistent behavior when I match the user agent strings. One additional cause of confusion for me during these experiments: an empty -b on the command line "activates the cookie engine" even without specifying an input file, but adding just -b in the .curlrc results in a warning and "the cookie engine" probably remains inactive? I had to see more failures and think more until I finally tried to put the explicit null file name in the .curlrc, and then the behavior matched the command line, where I simply had -b "" and it "worked", while the equivalent line in the .curlrc did not. Now I know that too: on Windows it has to be the explicit null file name (NUL) there, etc. Thanks!
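A sketch of what such a config line could look like, assuming the "explicit null file name" refers to the Windows NUL device (the file location and the need for NUL are taken from this comment, not from official curl guidance):

# in the user's .curlrc (_curlrc on Windows)
cookie = "NUL"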
The argument that comes after -b is the required option value: -b <data|filename>. You can enable the cookie engine with an empty string, like -b "".
I don't doubt that it's a "required option value", I'm just reporting what I believe I've observed. I haven't investigated why the behaviors differ, or whether it is platform specific (I've tried on Windows), experimenting with this site which, if I understand correctly, insists on the "cookie engine" being on for that redirection step to "work". Specifically, I believe that instead of -b "" one can, from the command line, just write a bare -b without noticing that a value is officially required. And I, personally, like the possibility of writing it that way.
The option value is required. That will open cookie filename "-L": the argument that follows -b is consumed as its value, whatever it looks like.
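A sketch of how such a command line gets parsed; example.com is a placeholder URL, the point is only the argument handling:

# here "-L" is read as the value of -b, i.e. a cookie file literally named "-L",
# so redirect following is NOT enabled by this command line
curl -b -L https://example.com/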
Thanks a lot Jay! That's a completely new concept to me: "-L is treated as an option value and not an option" - it never occurred to me. I naively thought that options are recognized before any attempt is made to read their parameters, and that therefore -L couldn't be considered a file. Now that I think about why, I guess the probable rationale was that one should be able to use file names starting with '-' for these parameters without needing some additional mechanism for that. I also have no idea how many other programs use the same approach, clearly because I've never even thought about it.
I've submitted #12646 to add an explanation of this to the documentation:
- Explain that --cookie "" can be used to enable the cookie engine without reading any initial cookies, as is documented in CURLOPT_COOKIEFILE.
Ref: https://curl.se/libcurl/c/CURLOPT_COOKIEFILE.html
Bug: curl#12643 (comment)
Reported-by: janko-js@users.noreply.github.com
Closes #xxxx
Thanks! Now that the man page is a topic: I also didn't know that the convention of the curl man page is that < > brackets mark a required option argument. Now that I think about it, maybe that indeed influenced me to somehow expect some "optionality", since I'm more used to reading option listings (as in man man) where the argument is spelled out after both the short and the long form, without < >, rather than -b, --cookie <data|filename> as in man curl, where the short version also doesn't show the argument. It could have contributed to the confusion. I also have no idea which other man pages use the curl-like conventions.
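For illustration, a sketch of the two styles being contrasted; the first line is an assumed example of the man(1)-like style rather than a quote from it, the second follows the curl man page convention discussed above:

-C file, --config-file=file     (argument repeated after both forms, no brackets)
-b, --cookie <data|filename>    (< > marks the required argument, shown once)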
- Explain that --cookie "" can be used to enable the cookie engine without reading any initial cookies, as is documented in CURLOPT_COOKIEFILE.
Ref: https://curl.se/libcurl/c/CURLOPT_COOKIEFILE.html
Bug: #12643 (comment)
Reported-by: janko-js@users.noreply.github.com
Closes #12646
I did this
curl -q --ipv4 -v https://www.washingtonpost.com/wp-srv/politics/special/clinton/stories/bctest092198_1.htm -o 10.html >e1_0 2>e_2_0
I expected the following
To receive the content of the web page at this address. Instead, the file 10.html has 0 (zero) bytes.
After the execution of the above command the file e_2_0 contains:
Using wget for the same URL (e.g. along the lines of the sketch below) stores the expected page in w.html, 29 KB in size.
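A sketch of such a wget invocation; the exact flags used in the report were not preserved, only the output file name w.html and the URL come from it:

wget -O w.html https://www.washingtonpost.com/wp-srv/politics/special/clinton/stories/bctest092198_1.htm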
Modifying the curl request to follow redirects with -L (writing to 13.html; a sketch of such a command follows after this passage) again does not produce the 13.html file, and the file e_2_3 contains:
Trying to add --http1.1 also doesn't change the outcome, even though it then uses HTTP/1.1, so it's not HTTP/2 related.
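A sketch of what that modified attempt may have looked like, assuming the change was adding -L and renaming the output and log files; the e1_3 name follows the pattern of the first attempt and is an assumption:

curl -q --ipv4 -v -L https://www.washingtonpost.com/wp-srv/politics/special/clinton/stories/bctest092198_1.htm -o 13.html >e1_3 2>e_2_3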
The original resulting files attached:
result_files.zip
curl/libcurl version
curl 8.5.0 (x86_64-w64-mingw32) libcurl/8.5.0 LibreSSL/3.8.2 (Schannel) zlib/1.3 brotli/1.1.0 zstd/1.5.5 WinIDN libssh2/1.11.0 nghttp2/1.58.0 ngtcp2/1.1.0 nghttp3/1.1.0
Release-Date: 2023-12-06
Protocols: dict file ftp ftps gopher gophers http https imap imaps ldap ldaps mqtt pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp ws wss
Features: alt-svc AsynchDNS brotli HSTS HTTP2 HTTP3 HTTPS-proxy IDN IPv6 Kerberos Largefile libz MultiSSL NTLM SPNEGO SSL SSPI threadsafe UnixSockets zstd
operating system
Windows 8.1