This repository has been archived by the owner on Jan 4, 2023. It is now read-only.

Infrastructure and design reboot #81

Closed
rviscomi opened this issue Mar 28, 2017 · 6 comments

Comments

@rviscomi
Member

Been chatting with @igrigorik about rebooting HA's UI and backend. Making this issue to organize thoughts/plans and invite wider discussion.

Incomplete list of high level objectives in no particular order:

  • mobile friendly design
  • upgrade data viz
  • simplify data pipeline
  • add content vertical dimension
  • upgrade WPT agents
  • add Lighthouse support
  • improve test robustness
  • encourage exploration (BigQuery, Discuss)

Should use sub-issues for tracking each objective.

@rviscomi rviscomi added this to In Progress in HTTP Archive Reboot Mar 28, 2017
@rviscomi rviscomi moved this from In Progress to FYI in HTTP Archive Reboot Mar 28, 2017
@rviscomi rviscomi mentioned this issue Mar 30, 2017
Closed
@rviscomi
Member Author

rviscomi commented Apr 7, 2017

@pmeenan WDYT of the feasibility of adding first-class support for Google Storage to WPT, similar to S3? This would simplify the architecture of HA and would allow WPT to write directly to GS without the need for polling/transferring on HA's end.

@pmeenan
Member

pmeenan commented Apr 7, 2017

It already can, but that is for general archiving of tests. The GS writes that happen right now are for artifacts that WPT doesn't normally store or archive on its own (traces and HARs).

To eliminate the polling, you probably want to use a few pub/sub queues for pushing things through the various stages. This is largely how the queue already works, since it is built on top of beanstalk, but it doesn't use beanstalk's support for retrying failed tasks. We'd still need to plumb the output side to do the post-processing/extraction (or even just the state management if all processing is moved to BQ).

Instead of the current "callback" support, the ability to post HARs (or references to HARs) directly from WPT into a pub/sub queue is probably more along the lines of what you'd need.
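
As a rough illustration of the staged-queue idea above, here is a minimal in-memory sketch in Python. It uses the standard library's `queue.Queue` as a stand-in for a real pub/sub service, so it is not beanstalk's or Google Pub/Sub's API. The names `Message`, `run_stage`, `MAX_ATTEMPTS`, and the example HAR path are all hypothetical and chosen only for illustration.

```python
import queue
from dataclasses import dataclass

MAX_ATTEMPTS = 3  # assumed retry budget per message (hypothetical value)


@dataclass
class Message:
    har_ref: str       # reference to a HAR, e.g. a storage path (hypothetical)
    attempts: int = 0  # number of failed processing attempts so far


def run_stage(inbox, outbox, dead_letters, handler):
    """Drain `inbox`, forwarding successes to `outbox`.

    A message whose handler raises is put back on `inbox` until it has
    failed MAX_ATTEMPTS times, after which it is appended to `dead_letters`.
    """
    while True:
        try:
            msg = inbox.get_nowait()
        except queue.Empty:
            return  # nothing left to process in this stage
        try:
            handler(msg)
        except Exception:
            msg.attempts += 1
            if msg.attempts < MAX_ATTEMPTS:
                inbox.put(msg)            # retry later
            else:
                dead_letters.append(msg)  # give up, keep for inspection
        else:
            outbox.put(msg)               # hand off to the next stage


if __name__ == "__main__":
    uploaded, processed, dead = queue.Queue(), queue.Queue(), []
    uploaded.put(Message("har/example-1.har"))

    seen = set()

    def flaky_extract(msg):
        # Simulate a transient failure on the first attempt only.
        if msg.har_ref not in seen:
            seen.add(msg.har_ref)
            raise RuntimeError("transient failure")

    run_stage(uploaded, processed, dead, flaky_extract)
    print(processed.qsize(), len(dead))
```

In this sketch each stage only knows about its input and output queues, so WPT posting a HAR reference would amount to a `put` on the first queue, and the post-processing/extraction step would be one more `run_stage` call further down the chain.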

@igrigorik
Contributor

I don't think I ever dug deep enough into this part of the HA pipeline.. :)

What would be the externally visible net win from the above? HARs would show up earlier in the GS storage bucket? Load decrease on the agents?

@pmeenan
Member

pmeenan commented Apr 8, 2017

Not much. You may get HARs a few hours earlier (they are already uploaded as the crawls run) and it doesn't help the agents. It may lighten the load on the server and clean up some processing logic but that's about it.

@Themanwithoutaplan

Minor niggle: is it possible to use less ambiguous notation when referring to the dates of various crawls? "5/1" is 5th January (or even 5 to 1 odds) for most of the world. It's surprisingly confusing when reading some of the issues.
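
To make the ambiguity concrete, here is a small Python sketch showing the two possible readings of a slash date and the ISO 8601 form (`YYYY-MM-DD`) of each. The helper name `readings` and the year 2017 are assumptions for illustration only.

```python
from datetime import date


def readings(slash_date: str, year: int) -> dict:
    """Return both interpretations of an 'a/b' date as ISO 8601 strings.

    Only works when both numbers are valid month values (1-12), which is
    exactly the case where the notation is ambiguous.
    """
    a, b = (int(part) for part in slash_date.split("/"))
    return {
        "month/day": date(year, a, b).isoformat(),
        "day/month": date(year, b, a).isoformat(),
    }


if __name__ == "__main__":
    print(readings("5/1", 2017))
```

Writing the ISO form directly (for example `2017-05-01`) would remove the need to guess which reading was intended.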

@rviscomi
Member Author

Thanks @Themanwithoutaplan. Tracking this in #106.
