-
Notifications
You must be signed in to change notification settings - Fork 124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Resolve compilation errors and add CI task #224
Conversation
- avoids openssl compilation issues on some systems (e.g., ubuntu-22.04) - no measured impact to compilation times in my testing - same tests pass/fail as before
- Prevents compilation errors from hiding behind feature flags - Keeps main branch free of warnings - Note that a test fails to compile and will be fixed in the next commit
- Moved tests from 'hashing_test.rs' to 'network.rs' - Test hasn't compiled since 2019 - Everything should compile now - 'live.rs' tests could not be run due to BRAVE_SERVICE_KEY
it worked fine locally...
@antonok-edm, let me know if I can make the review easier for you or if I can clarify anything. FWIW, the hash collision test on domain names makes little sense after the 2019 commit, and the test could be removed rather than rewritten. That said, there's admittedly some value in checking the domain filter hashing logic, but the logic is not easy to access because of how The other test that came from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking a look at this! Lots of good changes here; I left a few comments
- build step can be run by CI or locally, denying warnings in both cases - fixed warnings in 'fuzz' package - incorporates feedback from PR
Converting this to a Draft PR to address two platform-specific issues:
(the failed GitHub Actions run) IDK yet how I'm going to fix those. I don't have a Mac or Windows Server machine to test on. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@denosaurtrain I'm alright with not building fuzz
on Windows for now, if it's just not possible (this is still much better than what we had before).
@antonok-edm lemme know if you want to chat on Slack/Discord/whatever to close the feedback loop on this one. The only Brave IM server I found is https://discord.gg/batbrigade, but that doesn't look too promising for dev discussions. |
I have a fix for the 'live' tests that's not overly invasive, so I'll just push that too. |
- changed brave-specific tests to opt-in only due to BraveServiceKey - removed --all-targets from 'cargo test' (flag causes benches to run)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@antonok-edm Phew! Thanks for your patience. All tests should be passing now. I added comments to explain some of my changes. I'm relatively confident in the changes, but I left the comments unresolved so you can see my reasoning. Let me know if anything is unclear.
#[test] | ||
fn check_live_deserialized_file() { | ||
#[ignore = "opt-in: requires BRAVE_SERVICE_KEY environment variable"] | ||
fn check_live_brave_deserialized_file() { // Note: CI relies on part of this function's name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ignoring this by default allows contributors to have 100% passing tests with cargo test --all-features
. In ci.yml
, a script bypasses the ignore and runs the two live_brave
tests. There's much more to say about this braveservicekey
header, and the tests could be un-ignored if we find that this key is something all Brave contributors should have.
FWIW, the tests pass for me when I manually set BRAVE_SERVICE_KEY
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that the key seems a bit contrived here; I had to add it a few months ago because the tests started failing due to unrelated serverside changes. I'll see if I can get a better answer for you from the devops team.
// 'easylist.to' is deployed via GitHub Pages. However, sometimes 'easylist.to' can | ||
// take minutes to respond despite 'easylist.github.io' having no delay. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we've noticed this too recently, I believe @ryanbr is looking into it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ryanbr, I'll share a few observations about the easylist.to
performance issue. Much of this is wild speculation on my part, so I won't be offended if you disregard all of it.
Observations/Assumptions
The domain appears to be hosted by Cloudflare and serves traffic through Cloudflare servers. (Running dig easylist.to
returns different IP addresses depending on location, all IP's owned by Cloudflare as far as I can tell.) By design, Cloudflare hides the upstream IP addresses, but based on my testing I'm pretty confident that it's pointing to GitHub Pages.
I'm going to assume that this arrangement is intentional. EasyList probably gets downloaded a ridiculous number of times in a given day, so it's plausible that whoever runs the domain wants to use Cloudflare for edge caching/delivery.
Suggestions/Musings
- Cloudflare has a LOT of knobs for bot/traffic control. Thus, Cloudflare may be throttling certain requests like those used by
curl
. - Since GitHub pages is served by Fastly, another CDN/edge service, it may not be necessary to also run the traffic through Cloudflare. To use Cloudflare for DNS and not for traffic control, the domain could point directly to the IP addresses for GitHub Pages or use a
CNAME
since GitHub Pages sometimes changes those IP's. - If sticking with Cloudflare, and considering that EasyList is mostly serving static content, the cache could be more aggressive than this:
< HTTP/2 200
< date: Mon, 19 Sep 2022 20:52:34 GMT
< content-type: text/plain; charset=utf-8
< last-modified: Mon, 19 Sep 2022 15:30:48 GMT
< access-control-allow-origin: *
< etag: W/"63288b28-3070c"
< expires: Mon, 19 Sep 2022 20:09:10 GMT
< cache-control: max-age=600
< x-proxy-cache: MISS
< x-github-request-id: SNIP
< via: 1.1 varnish
< age: 475
< x-served-by: cache-yyz4549-YYZ
< x-cache: HIT
< x-cache-hits: 1
< x-timer: S1663620709.224138,VS0,VE1
< vary: Accept-Encoding
< x-fastly-request-id: SNIP
< cf-cache-status: DYNAMIC
< server: cloudflare
< cf-ray: SNIP
(Those headers hint at some other possible issues, but I'm trying not to read too much into them.)
// The use of 'http' rather than 'https' below is intentional. reqwest only | ||
// respects the host header if the target url is http, not https. (Unclear | ||
// whether that's a bug or intentional behavior.) One way to confirm that is | ||
// to send requests to http://httpbin.org/headers instead of github.io. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ha, I've actually run into the exact same confusion before, for unrelated reasons: actix/actix-web#1050
basically Host
is forbidden for HTTP/2, the URL ends up being the source of truth.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TL;DR: TIL that the situation with the Host
header is deliciously complex.
@antonok-edm I'm skeptical about the response you received -- "host header is forbidden."
When mentioning the Host
header, the original HTTP/2 spec, RFC 7540 uses SHOULD
rather than MUST
:
Clients that generate HTTP/2 requests directly SHOULD use the ":authority" pseudo-header field instead of the Host header field. An intermediary that converts an HTTP/2 request to HTTP/1.1 MUST create a Host header field if one is not present in a request by copying the value of the ":authority" pseudo-header field.
I think that's the current standard because that spec has only been amended by RFC 8740, which primarily adds TLS 1.3 and doesn't mention the host header.
However, the recently proposed update for HTTP/2, RFC 9113, takes a stricter position on use of the Host
header and the :authority
pseudo-header.
Clients MUST NOT generate a request with a Host header field that differs from the ":authority" pseudo-header field. A server SHOULD treat a request as malformed if it contains a Host header field that identifies an entity that differs from the entity in the ":authority" pseudo-header field.
Either way, EZ-PZ, I'll just ask curl
/reqwest
to change the :authority
pseudo header instead. Except... you can't? Most tools can't even log pseudo-headers, let alone allow editing them.
Now I understand why I lost all those hours troubleshooting k8s proxies that inconsistently required Host
header, SNI, etc.:
- Given that the current, official standards for both HTTP/1.1 and HTTP/2 are unclear about how
Host
should be handled, different client and server vendors have made different assumptions and allowances for it. - Given that overriding a
Host
header is extremely useful, some devs continue to support it because it's not clear to everyone what the alternative should be. Expose directly editing "pseudo headers"?
The iOS test has been removed, and the latest CI run on my fork is ✔️ ✔️ ✔️. (Just for kicks I added a IMO, this could be merged as-is, although it could be cleaned up a bit if the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this all looks great! I can update the easylist part if that's sorted out later.
Thanks for all the outstanding work you've provided here @denosaurtrain!
On a fresh clone of
v0.5.6
I encountered a few compilation errors. After this PR, the project should compile without errors or warnings usingRUSTFLAGS="-D warnings" cargo check --workspace --all-targets --all-features
, and that command has been added to CI to prevent compilation errors from hiding behind feature flags in the future.All tests pass except for
live.rs
tests, which require aBRAVE_SERVICE_KEY
that isn't checked into the repo. I managed to find that key, which does not appear to be a secret. However, I want to understand the purpose of theBraveServiceKey
header before I fix those tests.The changes are organized into three commits:
native-tls
forrustls-tls
in thereqwest
dev dependency, which should be less sensitive to version changes in OS-specific openssl libraries. (I use Ubuntu 22.04, which uses libssl3 instead of libssl1.1, so I run into this problem frequently.)full-domain-matching
which since 2019 would cause the build to fail. I also moved & rewrote an old failing test fromtests/hashing_test.rs
tosrc/filters/network.rs
.resolve #225