Whistler Necko

Agenda

  • rust-url
  • HTTP stack

Action items

  • Fix Rust build in gecko to handle multiple files

rust-url in Gecko

  • valentin: Found it nice, though fighting with the build system was horrible. If you could help with that, it would be great. There's now single-source file support in-tree, but cargo does not work.
  • jack: How does that work? rustc only needs one entry file; just point it at lib.rs and it should pull in the rest of the crate correctly.
  • valentin: Not sure that works right now. If the .rs files are not in the SOURCES, they don't all get hooked up.
  • jack: Should work it out with the build team. Does the backend of the build system use make? If it has deps file support, we can spit that out from rustc and have it work for rust-url.
  • valentin: I have a separate project with the C API for rust-url right now. But if this takes off, we could maybe put the wrappers in the rust-url library? (A sketch of such a wrapper follows these notes.)
  • jack: We do that with html5ever, too.
  • valentin: That's where we took it.
  • simon: It's a separate crate/rust library in that case.
  • larsberg: How can we help more? The thread on dev-platform went crazy.
  • valentin: Mainly build system help. I think it's landable.
  • larsberg: Pref'd off in nightly, you mean?
  • valentin: Yes.
  • jack: Should be pretty easy for rust-url. We can do that next quarter.
  • simon: Is there anything in the rust-url design you'd like changed?
  • valentin: It's OK. Maybe needs perf optimization.
  • simon: The current gecko parser handles a single string with indices, whereas rust-url does separate allocations. I'm thinking of switching to be more like gecko.
  • valentin: For the path, I don't think we need an array for it. But apart from that, I don't know if it's necessary. It's a bit difficult to manage.
  • jason: URI parsing did show up in the profiles at some point in the past.
  • valentin: The URI parser is slower now.
  • jack: Could chop them up and return borrowed pointers into the string instead of indices.
  • simon: Yes, the indices would be masked behind the API. I haven't done benchmarking or performance yet.
  • valentin: You do fewer passes over the string than gecko does today.
  • jack: Are there perf benchmarks in Gecko now?
  • valentin: No.
  • jack: Any bugs for it?
  • jason: No, just have heard it anecdotally.
  • simon: Rust strings are UTF-8. Is that a problem?
  • valentin: No. We do the conversion internally and already work with UTF-8 in the parser.
  • jason: I know the current code is a mess. How did this get hooked in?
  • valentin: I just had our current parser intercept that.
  • jason: I mean, is the code cleaner now or copy/paste of the implementation?
  • valentin: It feels safer at the stub layer.
  • larsberg: Do you mean the parsing or stubs?
  • jason: I mean the parsing.
  • valentin: I'm considering removing the old implementation completely once everything is stabilized.
  • jack: Currently the biggest blocker is WinXP support.
  • larsberg: Is it much closer to the spec?
  • simon: Not entirely, but we broke up the state machine from the whatwg spec into more functions. We're working with annevk on pushing back the work we did to simplify the spec's description. There's also the issue that the whatwg spec doesn't completely match the IETF spec.
  • valentin: Also trying to figure out with annevk on how to merge more of our parsers for more types (e.g., data). Then, other components could also rely on rust-url to slice up the data for other parsers.
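
A minimal sketch of the kind of C-ABI wrapper valentin describes, written against the current rust-url API (which has changed since this discussion); the rusturl_* names are made up for illustration and are not the actual Gecko bindings:

```rust
// Hypothetical C-ABI wrapper over rust-url; illustrative names only.
extern crate url;

use std::ffi::{CStr, CString};
use std::os::raw::c_char;
use std::ptr;
use url::Url;

/// Parse a URL spec and return its host as a newly allocated C string,
/// or null on failure. The caller must free it with rusturl_free.
#[no_mangle]
pub extern "C" fn rusturl_parse_host(spec: *const c_char) -> *mut c_char {
    if spec.is_null() {
        return ptr::null_mut();
    }
    let spec = match unsafe { CStr::from_ptr(spec) }.to_str() {
        Ok(s) => s,
        Err(_) => return ptr::null_mut(),
    };
    match Url::parse(spec) {
        Ok(url) => match url.host_str() {
            Some(host) => CString::new(host).unwrap().into_raw(),
            None => ptr::null_mut(),
        },
        Err(_) => ptr::null_mut(),
    }
}

/// Free a string previously returned by rusturl_parse_host.
#[no_mangle]
pub extern "C" fn rusturl_free(s: *mut c_char) {
    if !s.is_null() {
        unsafe { drop(CString::from_raw(s)) };
    }
}
```

Keeping these wrappers in a separate crate matches the arrangement simon mentions above for html5ever.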

HTTP stack

  • valentin: Do you have h/2 support?
  • pcwalton: Not yet.
  • jason: A C library?
  • pcwalton: Nope! Fully Rust; named "hyper", driven by seanmonstar (from the Firefox Accounts team). It's used by most of the Rust community. It's maturing. No http/2 support yet; https goes through openssl. It does not have pipelining yet.
  • daniel: Yes, skip that.
  • pcwalton: It badly needs connection pooling; we're adding support for that in Servo (it just landed). Servo uses a threadpool-based network design with hyper under the hood.
  • jason: Caching layer?
  • pcwalton: Not for anything other than images.
  • jack: If so, just the most minimal thing right now.
  • pcwalton: Everything is centralized through a single resource manager using message-passing (see the sketch after this list). There are no non-blocking APIs right now.
  • jason: Right. All networking in Necko is single-threaded using non-blocking APIs. All I/O for TCP sockets is non-blocking and on a single thread. That way you avoid all the context switching, and it has worked out to be more efficient.
  • pcwalton: We have support in Rust... wait, do you use IOCP on Windows?
  • jason: We use NSPR which abstracts all of this on every platform. For TCP sockets, completion ports work well. But I think we just use select...
  • daniel: We use select under the hood, not i/o completion ports.
  • jack: I've asked the mio folks (the Rust async I/O library) to add select support, but the library is primarily used by people writing web servers, etc.
  • simonsapin: Also, I checked and there is some initial support for HTTP/2 in hyper now.
  • jack: What should we be doing? How do you measure performance?
  • jason: Raw network tests are best, but you can do things over Talos. Comparing page loads with blocking vs. non-blocking networking should give you a good place to look.
  • pcwalton: Let's look into getting Talos working w/ Rust.
  • jason: There are only specific cases where Talos works. Maybe XPCShell tests to avoid the Gecko noise?
  • jack: Does any of this use WebSockets?
  • jason: No. I think Talos is really old. WebSockets also aren't a huge percentage of the traffic yet.
  • pcwalton: We can also look at what Chromium is doing.
  • valentin: Just do microbenchmarking. Blocking vs. non-blocking APIs.
  • jason: We test most of it using XPCShell, which removes all the parsing / engine / etc. Just Rust code that does those requests in parallel would be great.
  • pcwalton: We should be able to do that via the ResourceTask.
  • larsberg: Do you do anything interesting on mobile?
  • jason: There are often constraints on the maximum number of connections; it's in the hundreds on mobile but there's no real limit on desktop. The main thing there is high latencies, but we also have some settings changes. KeepAlive is different; we use pipelining (probably not important given h/2). Pipelining is good on mobile but breaks things on bank sites.
  • mbrubeck: Probably don't copy the settings straight from mobile, but look at them.
  • jason: Skip pipelining. No desktop browser has it on. Just go with h/2.
  • pcwalton: What are we missing?
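
A rough sketch of the message-passing, threadpool-style resource manager pcwalton describes; the ResourceCommand type and the fetch stub are invented for illustration (Servo's actual ResourceTask differs), and a real implementation would use hyper and a bounded thread pool:

```rust
// Illustrative message-passing resource manager: one manager thread receives
// load requests over a channel and farms each fetch out to a worker thread.
use std::sync::mpsc::{channel, Sender};
use std::thread;

enum ResourceCommand {
    // Load a URL and send the body (or an error) back on the supplied channel.
    Load(String, Sender<Result<Vec<u8>, String>>),
    Exit,
}

fn spawn_resource_manager() -> Sender<ResourceCommand> {
    let (tx, rx) = channel();
    thread::spawn(move || {
        for command in rx {
            match command {
                ResourceCommand::Load(url, reply) => {
                    // One thread per request here; a real implementation would
                    // use a bounded pool and an HTTP client such as hyper.
                    thread::spawn(move || {
                        let _ = reply.send(fetch(&url));
                    });
                }
                ResourceCommand::Exit => break,
            }
        }
    });
    tx
}

// Stand-in for the actual network fetch.
fn fetch(url: &str) -> Result<Vec<u8>, String> {
    Err(format!("fetch of {} is not implemented in this sketch", url))
}

fn main() {
    let manager = spawn_resource_manager();
    let (reply_tx, reply_rx) = channel();
    manager
        .send(ResourceCommand::Load("https://example.com/".into(), reply_tx))
        .unwrap();
    println!("{:?}", reply_rx.recv().unwrap());
    manager.send(ResourceCommand::Exit).unwrap();
}
```

Here the blocking happens on worker threads, which is the contrast with Necko's single-threaded, non-blocking event loop that jason describes above.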

Caching

  • jason: Caching layer.
  • pcwalton: Separate?
  • jason: In our stack, it's intertwined with the http stack.
  • valentin: Not too tied together...
  • jack: Need the etags in the cache layer so you can at least validate nothing has changed.
  • jason: Lots of engineering to minimize the amount of I/O and disk. So it all gets kinda muddled together.
  • jack: Do any of this now?
  • jason: Not right now....
  • valentin: There's also a memory cache.
  • michal: If there's a file on disk then it's read from there, but it may come from the filesystem's memory cache.
  • jason: Lots of banks have no-cache/no-store items that we try to never put on the filesystem.
  • jack: So everything that's no-store only stays in memory; everything else to disk?
  • michal: Yes. We used to cache more things in-memory, but it bloats memory usage.
  • pcwalton: Implementation?
  • jason: The general-purpose databases were not optimized enough. We hash URIs to a filename we can look up (see the sketch after this list).
  • jack: So each entry on the file system is a cache entry?
  • jason: Yes. Block files didn't work.
  • pcwalton: Metadata?
  • michal: In the same file. Helps with recovering after a crash.
  • pcwalton: Do you like that design?
  • michal: Yes.
  • jack: Expiry?
  • michal: Frecency. Same as used in url bar. Whenever we need to make room, we just start evicting.
  • daniel: People keep storing all sorts of other things in the cache - bytecode, etc.
  • jack: Is this exposed?
  • michal: People can append things into the cache.
  • jason: There are production-level HTTP cache issues. It gets BIG. Things that work at 50MB don't work at 1GB.
  • jack: Compression?
  • jason: We tried it. If we receive it compressed, we store it that way, so unclear how much we'd gain.
  • valentin: The image folks have a different in-memory image cache, though.
  • jason: When it's big, you worry about how large the cache grows. There's also some jank you can get due to global locks; we kept accidentally grabbing them on the main thread and had to just remove the whole thing. Crash recovery is also non-trivial: we used to have something horrible like 25% recovery after a crash on Linux. We want corruption detection per-resource, not global; caching a corrupt resource and serving it to the user is horrible.
  • jack: Checksum?
  • michal: Yes. We check whenever we read from disk.
  • jack: In the index, too?
  • michal: No.
  • jason: In the index?
  • jason: What's the index?
  • michal: It just maps the URI to its hash and whether we have it or not; it's used for eviction and for answering whether we have an entry.
  • jack: Couldn’t you just check on-disk?
  • michal: Too slow.
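
A rough sketch of the disk-cache shape described above: hash the URI to get the on-disk entry name, keep data and metadata together in that one file, and keep a small in-memory index with just enough state (presence, size, a frecency-style score) to answer "do we have it?" and to pick eviction victims without touching the disk. This is an illustration under those assumptions, not Gecko's actual cache2 code:

```rust
use std::collections::HashMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

struct IndexEntry {
    frecency: f64, // same flavor of score the URL bar uses; placeholder here
    size: u64,
}

struct CacheIndex {
    entries: HashMap<u64, IndexEntry>,
}

impl CacheIndex {
    fn new() -> CacheIndex {
        CacheIndex { entries: HashMap::new() }
    }

    // Hash the URI; the hex form of this value would be the on-disk filename.
    fn key_for(uri: &str) -> u64 {
        let mut hasher = DefaultHasher::new();
        uri.hash(&mut hasher);
        hasher.finish()
    }

    fn insert(&mut self, uri: &str, frecency: f64, size: u64) {
        self.entries.insert(Self::key_for(uri), IndexEntry { frecency, size });
    }

    // "Do we have it or not?" without hitting the filesystem.
    fn contains(&self, uri: &str) -> bool {
        self.entries.contains_key(&Self::key_for(uri))
    }

    // When we need room, evict lowest-frecency entries until `needed` bytes free up.
    fn evict(&mut self, needed: u64) -> Vec<u64> {
        let mut keys: Vec<u64> = self.entries.keys().cloned().collect();
        keys.sort_by(|a, b| {
            self.entries[a]
                .frecency
                .partial_cmp(&self.entries[b].frecency)
                .unwrap()
        });
        let mut freed = 0;
        let mut evicted = Vec::new();
        for key in keys {
            if freed >= needed {
                break;
            }
            freed += self.entries[&key].size;
            self.entries.remove(&key);
            evicted.push(key);
        }
        evicted
    }
}
```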

Involvement with caching

  • larsberg: Want to review this? Or help hack on it?
  • jason: Review with
  • jack: Cache single-threaded in gecko?
  • michal: No.
  • jack: Then how do you handle the index?
  • jason: Only the main process handles it.

Cache pinning

  • jack: Do you have such a thing?
  • valentin: There have been proposals.
  • jason: Gotta trust them to pin and unpin always.
  • jack: jquery?
  • jason: Recency handles it.
  • daniel: It’s not recent stuff; it’s big things like games that you don’t want to re-download.
  • jason: We used to have a 50MB limit. Have had ideas for a separate area for big items so you always get your last video or last game but haven’t done it.
  • daniel: Fancy sites have worked around with NetworkStorage.
  • michal: media has its own cache.
  • jason: Single-session.
  • jack: Any hints from content about cache policies?
  • jason: There was noise at IETF, etc. but there’s nothing going on right now.
  • jack: Take size into account when doing recency?
  • michal: No.

WebSockets

  • jason: Do you have an impl?
  • jack: Send-only, unfortunately.
  • larsberg: Servo is SUPER secure!
  • jack: It’s coming this quarter. The first half was a student project from NCSU.

HTML parser

  • simonsapin: From the network perspective, should the network stack push bytes into the parser or have the parser do blocking reads? (A sketch of both shapes follows below.)
  • jason: We have notifications with onDataAvailable right now. Not sure if it’s inherently better; just how we do it now. We do AsyncOpen and then you’ll get callbacks.
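
To make the question concrete, here are two hypothetical shapes for that boundary; the names are invented, and Gecko's actual interface is the AsyncOpen/onDataAvailable callback style mentioned above:

```rust
use std::io::Read;

// Push style: the network stack drives, calling into the parser whenever a
// chunk arrives (roughly the shape of Gecko's onDataAvailable callbacks).
trait PushParser {
    fn on_data_available(&mut self, chunk: &[u8]);
    fn on_stop(&mut self);
}

// Pull style: the parser drives, doing (possibly blocking) reads from the
// network stream until it is exhausted.
fn parse_pull<R: Read>(mut input: R) -> std::io::Result<()> {
    let mut buf = [0u8; 8192];
    loop {
        let n = input.read(&mut buf)?;
        if n == 0 {
            break; // end of stream
        }
        // feed &buf[..n] to the tokenizer here
    }
    Ok(())
}
```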

Competitive implementation comparison

  • jack: What do other engines do? Similar design to Necko?
  • jason: Yes. The head Chrome network folks were formerly Moz employees who worked on Necko and did similar things there.

ServiceWorkers

  • jack: Anything special in Necko for them?
  • jason: Yes. We allow ServiceWorkers to intercept requests, but I don’t know where those hooks are.
  • valentin: There’s an intercept for the channel.
  • jack: So, the ServiceWorker registers a hook for certain calls?
  • jason: Yes, but I can’t remember if we call with all requests to ask if they want to intercept or if they give us a regexp.
  • jack: Just curious because in Servo none of these things are on the same thread, so not sure how it hooks up.
  • valentin: Just message-passing for the queries (see the sketch below).
  • jason: It would be nice to avoid a hop just to see if a ServiceWorker wants to filter.
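
A minimal sketch of that message-passing query, assuming the ServiceWorker lives on another thread and owns the receiving end of a channel; the type names are invented for illustration, not Servo's or Gecko's actual interception hooks:

```rust
use std::sync::mpsc::{channel, Sender};

enum InterceptDecision {
    Intercept(Vec<u8>), // the worker synthesizes the response body
    PassThrough,        // go to the network as usual
}

struct InterceptQuery {
    url: String,
    reply: Sender<InterceptDecision>,
}

// Called on the networking thread; `worker` is the registered worker's mailbox.
// This round trip is the hop jason would like to skip when no worker has
// registered for the request.
fn ask_worker(worker: &Sender<InterceptQuery>, url: &str) -> InterceptDecision {
    let (reply_tx, reply_rx) = channel();
    worker
        .send(InterceptQuery { url: url.to_string(), reply: reply_tx })
        .expect("service worker thread has gone away");
    reply_rx.recv().expect("service worker dropped the reply channel")
}
```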

Application cache

  • jason: Don’t do it. Supported by all browsers but ServiceWorkers are where you want to be.
  • valentin: Nobody uses it, basically.
  • jason: It’s dying. It was tweaked a bit for online Office, Twitter tried to use it, but everybody’s given up.
  • jason: ServiceWorkers provide nice primitives; AppCache tried to provide a full system but didn’t work out.

DNS caching

  • valentin: It’s a two minute expiration.
  • jason: Chrome has their own DNS resolver; we just use the system C API (a sketch of that approach, with a short-TTL cache, follows this list). Windows is super complicated; don't do your own. Daniel is adding support for flushing all the caches when you change networks.
  • daniel: Changing networks changes more than just caching, but also IPv4 vs. IPv6 preferences.
  • jack: Should we write our own resolver?
  • jason: Avoid as long as possible. DNSSEC doesn’t appear to actually be coming.
  • jack: Any trouble with SRV?
  • daniel: I don’t think we have any SRV support.
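
A sketch of the approach described: lean on the platform resolver (here via std::net::ToSocketAddrs, which goes through the system's getaddrinfo) and cache results with a short expiration like the two minutes mentioned above, flushing everything on a network change. Illustrative only; Gecko's resolver and cache are more involved:

```rust
use std::collections::HashMap;
use std::net::{SocketAddr, ToSocketAddrs};
use std::time::{Duration, Instant};

// Short expiration, like the two-minute TTL mentioned above.
const DNS_TTL: Duration = Duration::from_secs(120);

struct DnsCache {
    entries: HashMap<(String, u16), (Instant, Vec<SocketAddr>)>,
}

impl DnsCache {
    fn new() -> DnsCache {
        DnsCache { entries: HashMap::new() }
    }

    fn resolve(&mut self, host: &str, port: u16) -> std::io::Result<Vec<SocketAddr>> {
        if let Some((when, addrs)) = self.entries.get(&(host.to_string(), port)) {
            if when.elapsed() < DNS_TTL {
                return Ok(addrs.clone()); // still fresh, no system call
            }
        }
        // Miss or stale: ask the system resolver and remember the answer.
        let addrs: Vec<SocketAddr> = (host, port).to_socket_addrs()?.collect();
        self.entries
            .insert((host.to_string(), port), (Instant::now(), addrs.clone()));
        Ok(addrs)
    }

    // E.g. when the network changes: just drop everything, as described above.
    fn flush(&mut self) {
        self.entries.clear();
    }
}
```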

URL spec

  • simon: What about IPv4 in URLs? On some platforms you can have fewer than four components, or components in hex, and it might still work (see the sketch at the end of this section).
  • daniel: Sometimes just a huge integer!
  • simon: Could be a phishing issue. Do you have any ideas?
  • jack: Kill it with fire.
  • valentin: If it looks like a dotted IP we handle it, but we pass others to the DNS resolver.
  • daniel: But some OSes will try to figure out how to turn a component into an IP address. Don’t allow it.
  • jack: Problem is that the system is doing the bad thing.
  • valentin: We have a check beforehand.
  • simon: Maybe a parse error in the URL parser? Not clear what the spec should say.
  • jack: What about the awesome bar? How is that hooked up?
  • valentin: There’s a URIFixup thing that does it.
  • simon: Not worried about the URL bar. Worried about an href link in a page doing different things. So the spec should say something.
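
To make the concern concrete, here is a sketch of how the host forms mentioned above can all denote the same IPv4 address. It follows the general idea of the whatwg IPv4 host parser (components may be decimal, octal, or hex, and the last component fills the remaining bytes) but is an illustration, not the spec's exact algorithm; the helper names are invented:

```rust
// Why "http://0x7f.0.0.1/", "http://127.1/" and "http://2130706433/" can all
// mean 127.0.0.1.
fn parse_component(s: &str) -> Option<u64> {
    if let Some(hex) = s.strip_prefix("0x").or_else(|| s.strip_prefix("0X")) {
        u64::from_str_radix(hex, 16).ok()
    } else if s.len() > 1 && s.starts_with('0') {
        u64::from_str_radix(&s[1..], 8).ok()
    } else {
        s.parse().ok()
    }
}

fn parse_ipv4(host: &str) -> Option<[u8; 4]> {
    let parts: Vec<u64> = host
        .split('.')
        .map(parse_component)
        .collect::<Option<Vec<u64>>>()?;
    if parts.is_empty() || parts.len() > 4 {
        return None;
    }
    // All but the last component must fit in one byte; the last fills the rest.
    let (last, rest) = parts.split_last().unwrap();
    let mut value: u64 = 0;
    for part in rest {
        if *part > 255 {
            return None;
        }
        value = (value << 8) | part;
    }
    let remaining = 4 - rest.len() as u32;
    if *last >= 1u64 << (8 * remaining) {
        return None;
    }
    value = (value << (8 * remaining)) | last;
    Some([
        (value >> 24) as u8,
        (value >> 16) as u8,
        (value >> 8) as u8,
        value as u8,
    ])
}

fn main() {
    for host in &["127.0.0.1", "0x7f.0.0.1", "127.1", "2130706433"] {
        println!("{} -> {:?}", host, parse_ipv4(host));
    }
}
```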