Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve compilation errors and add CI task #224

Merged
merged 9 commits into from
Sep 20, 2022

Conversation

denosaurtrain
Copy link
Contributor

@denosaurtrain denosaurtrain commented Sep 8, 2022

On a fresh clone of v0.5.6 I encountered a few compilation errors. After this PR, the project should compile without errors or warnings using RUSTFLAGS="-D warnings" cargo check --workspace --all-targets --all-features, and that command has been added to CI to prevent compilation errors from hiding behind feature flags in the future.

All tests pass except for live.rs tests, which require a BRAVE_SERVICE_KEY that isn't checked into the repo. I managed to find that key, which does not appear to be a secret. However, I want to understand the purpose of the BraveServiceKey header before I fix those tests.

The changes are organized into three commits:

  1. Swapped native-tls for rustls-tls in the reqwest dev dependency, which should be less sensitive to version changes in OS-specific openssl libraries. (I use Ubuntu 22.04, which uses libssl3 instead of libssl1.1, so I run into this problem frequently.)
  2. Added CI step to keep main branch free of warnings & errors
  3. Removed feature flag full-domain-matching which since 2019 would cause the build to fail. I also moved & rewrote an old failing test from tests/hashing_test.rs to src/filters/network.rs.

resolve #225

denosaurtrain and others added 4 commits September 7, 2022 21:59
- avoids openssl compilation issues on some systems (e.g., ubuntu-22.04)
- no measured impact to compilation times in my testing
- same tests pass/fail as before
- Prevents compilation errors from hiding behind feature flags
- Keeps main branch free of warnings
- Note that a test fails to compile and will be fixed in the next commit
- Moved tests from 'hashing_test.rs' to 'network.rs'
- Test hasn't compiled since 2019
- Everything should compile now
- 'live.rs' tests could not be run due to BRAVE_SERVICE_KEY
it worked fine locally...
@denosaurtrain
Copy link
Contributor Author

@antonok-edm, let me know if I can make the review easier for you or if I can clarify anything.

FWIW, the hash collision test on domain names makes little sense after the 2019 commit, and the test could be removed rather than rewritten. That said, there's admittedly some value in checking the domain filter hashing logic, but the logic is not easy to access because of how network.rs is structured.

The other test that came from tests/hashing-test.rs checks for ID collisions; it's still valuable; and I moved that one with only trivial modification.

Copy link
Collaborator

@antonok-edm antonok-edm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking a look at this! Lots of good changes here; I left a few comments

src/filters/network.rs Outdated Show resolved Hide resolved
.github/workflows/ci.yml Outdated Show resolved Hide resolved
.github/workflows/ci.yml Outdated Show resolved Hide resolved
src/blocker.rs Show resolved Hide resolved
- build step can be run by CI or locally, denying warnings in both cases
- fixed warnings in 'fuzz' package
- incorporates feedback from PR
@denosaurtrain
Copy link
Contributor Author

Converting this to a Draft PR to address two platform-specific issues:

  1. fuzz fails to build on Windows (related: Include the fuzzer entrypoint on Windows rust-fuzz/cargo-fuzz#279)
  2. the new build-all.sh script fails on MacOS (possibly because "BSD sed" differs from "GNU sed")

(the failed GitHub Actions run)

IDK yet how I'm going to fix those. I don't have a Mac or Windows Server machine to test on.

@denosaurtrain denosaurtrain marked this pull request as draft September 16, 2022 18:25
Copy link
Collaborator

@antonok-edm antonok-edm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@denosaurtrain I'm alright with not building fuzz on Windows for now, if it's just not possible (this is still much better than what we had before).

.github/workflows/ci.yml Outdated Show resolved Hide resolved
@denosaurtrain denosaurtrain marked this pull request as ready for review September 17, 2022 20:55
rust-toolchain.toml Show resolved Hide resolved
fuzz/Cargo.toml Show resolved Hide resolved
.github/workflows/ci.yml Outdated Show resolved Hide resolved
@denosaurtrain
Copy link
Contributor Author

@antonok-edm lemme know if you want to chat on Slack/Discord/whatever to close the feedback loop on this one. The only Brave IM server I found is https://discord.gg/batbrigade, but that doesn't look too promising for dev discussions.

@denosaurtrain denosaurtrain marked this pull request as draft September 18, 2022 15:59
@denosaurtrain
Copy link
Contributor Author

I have a fix for the 'live' tests that's not overly invasive, so I'll just push that too.

- changed brave-specific tests to opt-in only due to BraveServiceKey
- removed --all-targets from 'cargo test' (flag causes benches to run)
@denosaurtrain denosaurtrain marked this pull request as ready for review September 19, 2022 15:20
Copy link
Contributor Author

@denosaurtrain denosaurtrain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@antonok-edm Phew! Thanks for your patience. All tests should be passing now. I added comments to explain some of my changes. I'm relatively confident in the changes, but I left the comments unresolved so you can see my reasoning. Let me know if anything is unclear.

tests/live.rs Show resolved Hide resolved
tests/live.rs Show resolved Hide resolved
tests/live.rs Show resolved Hide resolved
tests/live.rs Show resolved Hide resolved
tests/live.rs Show resolved Hide resolved
Comment on lines 285 to +287
#[test]
fn check_live_deserialized_file() {
#[ignore = "opt-in: requires BRAVE_SERVICE_KEY environment variable"]
fn check_live_brave_deserialized_file() { // Note: CI relies on part of this function's name
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignoring this by default allows contributors to have 100% passing tests with cargo test --all-features. In ci.yml, a script bypasses the ignore and runs the two live_brave tests. There's much more to say about this braveservicekey header, and the tests could be un-ignored if we find that this key is something all Brave contributors should have.

FWIW, the tests pass for me when I manually set BRAVE_SERVICE_KEY.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the key seems a bit contrived here; I had to add it a few months ago because the tests started failing due to unrelated serverside changes. I'll see if I can get a better answer for you from the devops team.

tests/live.rs Show resolved Hide resolved
tests/live.rs Show resolved Hide resolved
Comment on lines +79 to +80
// 'easylist.to' is deployed via GitHub Pages. However, sometimes 'easylist.to' can
// take minutes to respond despite 'easylist.github.io' having no delay.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we've noticed this too recently, I believe @ryanbr is looking into it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ryanbr, I'll share a few observations about the easylist.to performance issue. Much of this is wild speculation on my part, so I won't be offended if you disregard all of it.

Observations/Assumptions

The domain appears to be hosted by Cloudflare and serves traffic through Cloudflare servers. (Running dig easylist.to returns different IP addresses depending on location, all IP's owned by Cloudflare as far as I can tell.) By design, Cloudflare hides the upstream IP addresses, but based on my testing I'm pretty confident that it's pointing to GitHub Pages.

I'm going to assume that this arrangement is intentional. EasyList probably gets downloaded a ridiculous number of times in a given day, so it's plausible that whoever runs the domain wants to use Cloudflare for edge caching/delivery.

Suggestions/Musings

  1. Cloudflare has a LOT of knobs for bot/traffic control. Thus, Cloudflare may be throttling certain requests like those used by curl.
  2. Since GitHub pages is served by Fastly, another CDN/edge service, it may not be necessary to also run the traffic through Cloudflare. To use Cloudflare for DNS and not for traffic control, the domain could point directly to the IP addresses for GitHub Pages or use a CNAME since GitHub Pages sometimes changes those IP's.
  3. If sticking with Cloudflare, and considering that EasyList is mostly serving static content, the cache could be more aggressive than this:
< HTTP/2 200 
< date: Mon, 19 Sep 2022 20:52:34 GMT
< content-type: text/plain; charset=utf-8
< last-modified: Mon, 19 Sep 2022 15:30:48 GMT
< access-control-allow-origin: *
< etag: W/"63288b28-3070c"
< expires: Mon, 19 Sep 2022 20:09:10 GMT
< cache-control: max-age=600
< x-proxy-cache: MISS
< x-github-request-id: SNIP
< via: 1.1 varnish
< age: 475
< x-served-by: cache-yyz4549-YYZ
< x-cache: HIT
< x-cache-hits: 1
< x-timer: S1663620709.224138,VS0,VE1
< vary: Accept-Encoding
< x-fastly-request-id: SNIP
< cf-cache-status: DYNAMIC
< server: cloudflare
< cf-ray: SNIP

(Those headers hint at some other possible issues, but I'm trying not to read too much into them.)

Comment on lines +95 to +98
// The use of 'http' rather than 'https' below is intentional. reqwest only
// respects the host header if the target url is http, not https. (Unclear
// whether that's a bug or intentional behavior.) One way to confirm that is
// to send requests to http://httpbin.org/headers instead of github.io.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ha, I've actually run into the exact same confusion before, for unrelated reasons: actix/actix-web#1050

basically Host is forbidden for HTTP/2, the URL ends up being the source of truth.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TL;DR: TIL that the situation with the Host header is deliciously complex.

@antonok-edm I'm skeptical about the response you received -- "host header is forbidden."

When mentioning the Host header, the original HTTP/2 spec, RFC 7540 uses SHOULD rather than MUST:

Clients that generate HTTP/2 requests directly SHOULD use the ":authority" pseudo-header field instead of the Host header field. An intermediary that converts an HTTP/2 request to HTTP/1.1 MUST create a Host header field if one is not present in a request by copying the value of the ":authority" pseudo-header field.

I think that's the current standard because that spec has only been amended by RFC 8740, which primarily adds TLS 1.3 and doesn't mention the host header.

However, the recently proposed update for HTTP/2, RFC 9113, takes a stricter position on use of the Host header and the :authority pseudo-header.

Clients MUST NOT generate a request with a Host header field that differs from the ":authority" pseudo-header field. A server SHOULD treat a request as malformed if it contains a Host header field that identifies an entity that differs from the entity in the ":authority" pseudo-header field.

Either way, EZ-PZ, I'll just ask curl/reqwest to change the :authority pseudo header instead. Except... you can't? Most tools can't even log pseudo-headers, let alone allow editing them.

Now I understand why I lost all those hours troubleshooting k8s proxies that inconsistently required Host header, SNI, etc.:

  1. Given that the current, official standards for both HTTP/1.1 and HTTP/2 are unclear about how Host should be handled, different client and server vendors have made different assumptions and allowances for it.
  2. Given that overriding a Host header is extremely useful, some devs continue to support it because it's not clear to everyone what the alternative should be. Expose directly editing "pseudo headers"?

tests/live.rs Outdated Show resolved Hide resolved
@denosaurtrain
Copy link
Contributor Author

The iOS test has been removed, and the latest CI run on my fork is ✔️ ✔️ ✔️. (Just for kicks I added a BRAVE_SERVICE_KEY to the secrets on my fork since I know where to find it.)

IMO, this could be merged as-is, although it could be cleaned up a bit if the easylist.to problem is resolved. Up to you, @antonok-edm.

Copy link
Collaborator

@antonok-edm antonok-edm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this all looks great! I can update the easylist part if that's sorted out later.

Thanks for all the outstanding work you've provided here @denosaurtrain!

@antonok-edm antonok-edm merged commit 082cc51 into brave:master Sep 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Compilation and test failures with --all-features (PR available)
2 participants