Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

http: add support to read and store the referrer header #6591

Closed
wants to merge 1 commit into from

Conversation

vszakats
Copy link
Member

@vszakats vszakats commented Feb 11, 2021

This patch implements:

  • CURLINFO_REFERER libcurl option
  • --write-out '%{referer}' command-line option
  • --xattr to fill user.xdg.referrer.url extended attribute with the referrer. This extended attribute is supported by wget and the referrer value is also stored by at least Safari (albeit in a platform-specific format.)

This can also be useful to retain a meaningful download URL where the initial URL is being redirected to a temporary one, e.g. with GitHub releases:

$ curl --xattr --location --referer ';auto' --remote-name \
  https://github.com/curl/curl/releases/download/curl-7_75_0/curl-7.75.0.tar.xz
$ xattr -l curl-7.75.0.tar.xz
[...]
user.xdg.referrer.url: https://github.com/curl/curl/releases/download/curl-7_75_0/curl-7.75.0.tar.xz

Implemented:

  • rename "referrer" to "referer" to match existing names
  • test examples
  • delete "URL"
  • possibly add a regression test

Any comments/suggestions are welcome.

@vszakats vszakats changed the title [WIP] add support to read and store the referer url [WIP] add support to read and store the referrer url Feb 11, 2021
@vszakats vszakats changed the title [WIP] add support to read and store the referrer url [WIP] http: add support to read and store the referrer url Feb 11, 2021
@vszakats
Copy link
Member Author

vszakats commented Feb 11, 2021

Any opinions if using "URL" in this context is correct or should we just use 'referer'? [YES] After all the referrer field can contain any string, not just a URL.

Next question, if stripcredentials() should in this case be applied to the referer string before saving it to the user.xdg.referrer.url file attribute (or should this be done by the ;auto logic for data->set.http_auto_referer?). [NO]

@vszakats vszakats changed the title [WIP] http: add support to read and store the referrer url [WIP] http: add support to read and store the referrer header Feb 11, 2021
@bagder
Copy link
Member

bagder commented Feb 11, 2021

Any opinions if using "URL" in this context is correct or should we just use 'referer'?

I vote for just referer.

Next question, if stripcredentials() should in this case be applied to the referer string before saving it to the user.xdg.referrer.url file attribute (or should this be done by the ;auto logic for data->set.http_auto_referer?).

Hm. That's a good point. I think we should make sure any credentials get stripped off already in the internal logic.

@vszakats vszakats changed the title [WIP] http: add support to read and store the referrer header http: add support to read and store the referrer header Feb 11, 2021
@vszakats vszakats force-pushed the xattreferer branch 3 times, most recently from 1dd7300 to 93141d1 Compare February 16, 2021 10:24
@vszakats
Copy link
Member Author

Still some failures here and there, but couldn't tell conclusively that they have something to do with the changes in this PR:
1206 (2x), 1503, 579, 1188, some torture tests.

@jay
Copy link
Member

jay commented Feb 16, 2021

i've noticed seemingly random failures in the torture tests recently i'm not sure what that's about i'll take a look

@bagder
Copy link
Member

bagder commented Feb 17, 2021

I've seen test 2100 fail in torture tests recently but when I've tried to reproduce locally I have not managed ... 😞

@jay
Copy link
Member

jay commented Feb 17, 2021

I found one problem in ASAN, exit on leak disrupts closing the memdebug file so it reports wrong results. Reported as google/sanitizers#1374. Could be solved in curl with an atexit to fclose the file. or with MEMDEBUG_LOG_SYNC but IMO we should close it explicitly. I'm not sure if valgrind does the same type of abort when a leak is detected.

Second problem (an actual leak) has something to do with the proxy, all the arbitrary failed torture tests are proxy I noticed. 264 335 552 try -t on any of those. If I had to guess it's 46620b9 since that is a close proxy one and also I randomly picked the CI of some earlier commits and don't see anything similar. 264 I can reproduce consistently, and so far I've traced it back to this call returning OOM, which causes something to be missed at cleanup:

curl/lib/url.c

Lines 2411 to 2412 in 94719e7

result = Curl_setstropt(&data->state.aptr.proxyuser,
proxyuser);

** MEMORY FAILURE
Leak detected: memory still allocated: 5 bytes
At 603000022a28, there's 5 bytes.
 allocated by escape.c:159
LIMIT setopt.c:66 strdup reached memlimit
 ==4694==ERROR: LeakSanitizer: detected memory leaks
 Direct leak of 21 byte(s) in 1 object(s) allocated from:
     #0 0x7fbcf6087602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
     #1 0x7fbcf590374a in curl_dbg_malloc /home/owner/curl/lib/memdebug.c:126
     #2 0x7fbcf58798d3 in Curl_urldecode /home/owner/curl/lib/escape.c:159
     #3 0x7fbcf59fd1c6 in curl_url_get /home/owner/curl/lib/urlapi.c:1198
     #4 0x7fbcf59e418c in parse_proxy /home/owner/curl/lib/url.c:2417
     #5 0x7fbcf59e5795 in create_conn_helper_init_proxy /home/owner/curl/lib/url.c:2612
     #6 0x7fbcf59eecf6 in create_conn /home/owner/curl/lib/url.c:3577
     #7 0x7fbcf59f41b3 in Curl_connect /home/owner/curl/lib/url.c:4063
     #8 0x7fbcf5928b83 in multi_runsingle /home/owner/curl/lib/multi.c:1672
     #9 0x7fbcf592d467 in curl_multi_perform /home/owner/curl/lib/multi.c:2414
     #10 0x7fbcf5874a12 in easy_transfer /home/owner/curl/lib/easy.c:607
     #11 0x7fbcf587516d in easy_perform /home/owner/curl/lib/easy.c:698
     #12 0x7fbcf587528c in curl_easy_perform /home/owner/curl/lib/easy.c:718
     #13 0x451447 in serial_transfers /home/owner/curl/src/tool_operate.c:2348
     #14 0x45267b in run_all_transfers /home/owner/curl/src/tool_operate.c:2523
     #15 0x452eef in operate /home/owner/curl/src/tool_operate.c:2640
     #16 0x436870 in main /home/owner/curl/src/tool_main.c:279
     #17 0x7fbcf40c983f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2083f)
 SUMMARY: AddressSanitizer: 21 byte(s) leaked in 1 allocation(s).

edit: ignore the line numbers I sprinkled in a lot of printfs..

@vszakats
Copy link
Member Author

@jay: Thank you for investigating!

Any thoughts if we should make this PR pending till these get resolved, or do you think this looks fine enough to merge now?

bagder added a commit that referenced this pull request Feb 17, 2021
@bagder
Copy link
Member

bagder commented Feb 17, 2021

The 264 problem should hopefully be fixed with #6614...

bagder added a commit that referenced this pull request Feb 17, 2021
bagder added a commit that referenced this pull request Feb 17, 2021
Reported-by: Jay Satiro
Reviewed-by: Jay Satiro
Reviewed-by: Emil Engler

Closes #6614
Bug: #6591 (comment)
jay added a commit to jay/curl that referenced this pull request Feb 17, 2021
- Use atexit to register a dbg cleanup function that closes the logfile.

ASAN/LSAN calls _exit() instead of exit() on error so the logfile must
be closed explicitly on exit or data could be lost.

Prior to this change the logfile was not explicitly closed so it was
possible that if LSAN detected a leak and called _exit (which does
not flush or close files like exit) then the logfile could be missing
data. That could then cause curl's memanalyze to report false leaks
(eg a malloc was recorded to the logfile but the corresponding free was
discarded from the buffer instead of written to the logfile, then
memanalyze reports that as a leak).

Ref: google/sanitizers#1374

Bug: curl#6591 (comment)

Closes #xxxx
@vszakats
Copy link
Member Author

vszakats commented Feb 18, 2021

One permanent-looking CI issue is that the Cirrus task for FreeBSD 12.1 fails at pkginstall:

pkg update -f
Updating FreeBSD repository catalogue...
Fetching meta.txz: . done
Fetching packagesite.txz: .......... done
Processing entries: 
Newer FreeBSD version for package py37-mlt:
To ignore this error set IGNORE_OSVERSION=yes
- package: 1202000
- running kernel: 1201000
Ignore the mismatch and continue? [Y/n]: pkg: repository FreeBSD contains packages for wrong OS version: FreeBSD:12:amd64
Processing entries... done
Unable to update repository FreeBSD
Error updating repositories!

Opened a PR that deletes CI builds with this image, following the suggestion of FreeBSD forum thread.

@@ -93,6 +93,9 @@ When an HTTP request was made without --location to follow redirects (or when
--max-redir is met), this variable will show the actual URL a redirect
\fIwould\fP have gone to. (Added in 7.18.2)
.TP
.B referer
The referrer header, if there was any. (Added in 7.76.0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd stick to referer spelling (one r) throughout so it's less confusing

Copy link
Member Author

@vszakats vszakats Feb 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My rule of thumb was to use the rr flavour in human sentences and the other one in code. (this also kind-of matches existing use). In this particular case we can work it around by going The Referer: header, if there was any. (Added in 7.76.0). This style can be found in CURLOPT_AUTOREFERER and CURLOPT_REFERER-related docs.

Does that sound OK?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both spellings seem to be used, see --referer using referrer for example. Personally I find it confusing but there seems to be some precedent for it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it isn't used fully consistently as it is. I've pushed an update for this specific line here, which I'm hoping makes it look better.

@vszakats
Copy link
Member Author

vszakats commented Feb 18, 2021

Much better CI results so far. There is one test consistently failing for c-ares builds:

test 1188...[--write-out with %{onerror} and %{urlnum} to stderr]

 1188: stderr FAILED:
--- log/check-expected	2021-02-18 12:53:57.308870137 +0000
+++ log/check-generated	2021-02-18 12:53:57.308870137 +0000
@@ -1 +1 @@
-0 says 6 Could not resolve host: non-existing-host.haxx.se[LF]
+0 says 6 Could not resolve: non-existing-host.haxx.se (Domain name not found)[LF]
== Contents of files in the log/ dir after test 1188

@bagder
Copy link
Member

bagder commented Feb 18, 2021

Yeah, my fix for 1188 was not good enough. I landed #6626 since then should fix that too.

- add CURLINFO_REFERER libcurl option
- add --write-out '%{referer}' command-line option
- extend --xattr command-line option to fill user.xdg.referrer.url extended
  attribute with the referrer (if there was any)

Closes #xxxx
@vszakats
Copy link
Member Author

vszakats commented Feb 18, 2021

Remaining failures:

  • msys1_mingw_debug_openssl: test595 (FTP)
  • msys2_mingw32_debug_schannel: timed out possibly?

It seems unlikely these have something to do with this patch, and they were happening quite randomly in the past few days for all CI passes. msys1_mingw_debug_openssl tended to fail on different, single tests in this period.

jay added a commit to jay/curl that referenced this pull request Feb 18, 2021
- Use atexit to register a dbg cleanup function that closes the logfile.

LeakSantizier (LSAN) calls _exit() instead of exit() when a leak is
detected on exit so the logfile must be closed explicitly or data could
be lost. Though _exit() does not call atexit handlers such as this,
LSAN's call to _exit() comes after the atexit handlers are called.

Prior to this change the logfile was not explicitly closed so it was
possible that if LSAN detected a leak and called _exit (which does
not flush or close files like exit) then the logfile could be missing
data. That could then cause curl's memanalyze to report false leaks
(eg a malloc was recorded to the logfile but the corresponding free was
discarded from the buffer instead of written to the logfile, then
memanalyze reports that as a leak).

Ref: google/sanitizers#1374

Bug: curl#6591 (comment)

Closes #xxxx
@jay
Copy link
Member

jay commented Feb 19, 2021

It seems unlikely these have something to do with this patch, and they were happening quite randomly in the past few days for all CI passes. msys1_mingw_debug_openssl tended to fail on different, single tests in this period.

random ci weirdness. i restarted them and they passed

@vszakats vszakats closed this in 44872ae Feb 19, 2021
@vszakats vszakats deleted the xattreferer branch February 19, 2021 13:58
vszakats added a commit that referenced this pull request Feb 19, 2021
jay added a commit that referenced this pull request Feb 20, 2021
- Use atexit to register a dbg cleanup function that closes the logfile.

LeakSantizier (LSAN) calls _exit() instead of exit() when a leak is
detected on exit so the logfile must be closed explicitly or data could
be lost. Though _exit() does not call atexit handlers such as this,
LSAN's call to _exit() comes after the atexit handlers are called.

Prior to this change the logfile was not explicitly closed so it was
possible that if LSAN detected a leak and called _exit (which does
not flush or close files like exit) then the logfile could be missing
data. That could then cause curl's memanalyze to report false leaks
(eg a malloc was recorded to the logfile but the corresponding free was
discarded from the buffer instead of written to the logfile, then
memanalyze reports that as a leak).

Ref: google/sanitizers#1374

Bug: #6591 (comment)

Closes #6620
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

3 participants