Fails to get the page from www.washingtonpost.com/wp-srv but wget works #12643
I did this
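```
curl -q --ipv4 -v https://www.washingtonpost.com/wp-srv/politics/special/clinton/stories/bctest092198_1.htm -o 10.html >e1_0 2>e_2_0
```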
I expected the following
To receive the content of the web page at this address. Instead, the file 10.html has 0 (zero) bytes.
After executing the above command, the file e_2_0 contains, among other verbose output:
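```
< HTTP/2 302
```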
Using wget, e.g.:
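```
# plausible equivalent (reconstruction; the exact original command is not preserved)
wget https://www.washingtonpost.com/wp-srv/politics/special/clinton/stories/bctest092198_1.htm -O w.html
```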
stores the expected page in w.html (29 KB).
Modifying the curl request to
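```
# reconstructed from the file names below and the -L mention later in the
# thread; the exact original command is not preserved
curl -q --ipv4 -v -L https://www.washingtonpost.com/wp-srv/politics/special/clinton/stories/bctest092198_1.htm -o 13.html >e1_3 2>e_2_3
```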
The above command again results in the 13.html file not being produced, and the file e_2_3 is:
Trying to add --http1.1 also doesn't change the outcome, even though it then uses HTTP/1.1, so it's not HTTP/2 related:
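```
# reconstructed variant on the same pattern; --http1.1 forces HTTP/1.1
curl -q --ipv4 -v -L --http1.1 https://www.washingtonpost.com/wp-srv/politics/special/clinton/stories/bctest092198_1.htm -o 13.html
```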
The original resulting files are attached.
curl 8.5.0 (x86_64-w64-mingw32) libcurl/8.5.0 LibreSSL/3.8.2 (Schannel) zlib/1.3 brotli/1.1.0 zstd/1.5.5 WinIDN libssh2/1.11.0 nghttp2/1.58.0 ngtcp2/1.1.0 nghttp3/1.1.0
On Sat, Jan 06, 2024 at 09:01:43AM -0800, janko-js wrote:

> I did this
>
> curl -q --ipv4 -v https://www.washingtonpost.com/wp-srv/politics/special/clinton/stories/bctest092198_1.htm -o 10.html >e1_0 2>e_2_0
>
> I expected the following
>
> To receive the content of the web page at this address. Instead, the file 10.html has 0 (zero) bytes.
>
> < HTTP/2 302

The server is sending a redirect to another URL. Add the -L option to have curl follow it.
See the first message: I've tried that too (the -L variant), and it doesn't help.
That is because of the server. I checked in Wireshark to confirm. Probably it is some CDN server that does not cache the page for some agents, or waits for some number of requests, etc. I see a 9-second delay with both curl and wget, probably because mine has a different user agent string. If I run the curl command with a user agent string matching wget's, the server responds differently.
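A sketch of such an invocation; the User-Agent value is an assumption, not the one from the original comment:

```
# hypothetical wget-like User-Agent string; substitute the real one
curl -q --ipv4 -v -L -A "Wget/1.21.4" https://www.washingtonpost.com/wp-srv/politics/special/clinton/stories/bctest092198_1.htm -o out.html
```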
Yes, that's it, the server changes its reactions, many thanks! I also see consistent behavior when I match the user agent strings.
One additional thing that confused me during these experiments:

- an empty -b on the command line "activates the cookie engine" even without specifying the input file
- adding just the bare -b in the .curlrc results in a warning, and "the cookie engine" probably remains inactive?
I had to see more failures and think more until I finally tried putting the explicit null file name in the .curlrc, and then the behavior matched the one on the command line, where I simply had the empty value and it "worked"; it did not work with the bare option name in the .curlrc. Now I know that too: on Windows it has to be the explicit empty value.
I don't doubt that it's a "required option value"; I'm just reporting what I believe I've observed. I haven't investigated why the behaviors differ, or whether it is platform specific (I've only tried on Windows), while experimenting with this site which, if I understand correctly, insists on "the cookie engine" being on for that redirection step to "work".
Specifically, I believe that instead of the explicit empty value one can, from the command line, just write the bare option without noticing that a value is officially required. And I, personally, like the possibility of writing it that way.
Thanks a lot Jay! That's a completely new concept to me: "-L is treated as an option value and not an option" - it never occurred to me. I naively thought that options are recognized before any attempt to read their parameters, and that therefore -L couldn't be considered a file name. Now that I think about why, I guess the probable rationale was that one should be able to use file names starting with '-' for these parameters, without needing some additional escaping mechanism. I also have no idea how many other programs use the same approach, clearly because I've never even thought about it.
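A minimal sketch of that parsing pitfall, using a placeholder URL:

```
# -b requires a value, so the parser consumes the next token as the
# cookie file name: here -L becomes the "file" and is NOT parsed as
# the follow-redirects option
curl -b -L https://example.com/

# with an explicit empty value, the cookie engine is activated without
# an input file and -L is parsed as --location, as intended
curl -b "" -L https://example.com/
```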
Now that the man page is the topic:
I also didn't know that the convention of the curl man page is that < > brackets mark a required option argument. Now that I think about it, maybe they indeed influenced me to somehow expect some "optionality", as I'm more used to reading the style of man man, which spells out the argument for both option forms without < >, than the style of man curl, where the short version also doesn't show the argument. It could have contributed to the confusion. I also have no idea which other man pages use the curl-like conventions.
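For illustration, the two styles side by side (sample lines of my own choosing, not the ones quoted in the original comment):

```
man man:   -L locale, --locale=locale
man curl:  -b, --cookie <data|filename>
```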