WebAssembly 2021 #2168

Closed
6 tasks done
rviscomi opened this issue Apr 27, 2021 · 44 comments · Fixed by #2605
Labels
2021 chapter Tracking issue for a 2021 chapter

@rviscomi
Member

rviscomi commented Apr 27, 2021

Part I Chapter 6: WebAssembly

If you're interested in contributing to the WebAssembly chapter of the 2021 Web Almanac, please reply to this issue and indicate which role or roles best fit your interest and availability: author, reviewer, analyst, and/or editor.

Content team

  • Lead: @RReverser
  • Authors: @RReverser
  • Reviewers: @jsoverson, @carlopi
  • Analysts: @RReverser
  • Editors: -
  • Coordinator: @rviscomi
More information about each role:
  • The content team lead is the chapter owner and responsible for setting the scope of the chapter and managing contributors' day-to-day progress.
  • Authors are subject matter experts and lead the content direction for each chapter. Chapters typically have one or two authors. Authors are responsible for planning the outline of the chapter, analyzing stats and trends, and writing the annual report.
  • Reviewers are also subject matter experts and assist authors with technical reviews during the planning, analyzing, and writing phases.
  • Analysts are responsible for researching the stats and trends used throughout the Almanac. Analysts work closely with authors and reviewers during the planning phase to give direction on the types of stats that are possible from the dataset, and during the analyzing/writing phases to ensure that the stats are used correctly.
  • Editors are technical writers who have a penchant for both technical and non-technical content correctness. Editors have a mastery of the English language and work closely with authors to help wordsmith content and ensure that everything fits together as a cohesive unit.
  • The section coordinator is the overall owner for all chapters within a section like "User Experience" or "Page Content" and helps to keep each chapter on schedule.

Note: The time commitment for each role varies by the chapter's scope and complexity as well as the number of contributors.

For an overview of how the roles work together at each phase of the project, see the Chapter Lifecycle doc.

Milestone checklist

0. Form the content team

  • May 31: The content team has at least one author, reviewer, and analyst

1. Plan content

  • June 15: The content team has completed the chapter outline in the draft doc

2. Gather data

  • June 30: Analysts have added all necessary custom metrics and drafted a PR (example) to track query progress
  • July 1 - 31: HTTP Archive runs the July crawl

3. Validate results

  • September 30: Analysts have queried all metrics and saved the output to the results sheet

4. Draft content

  • October 31: The content team has written, reviewed, and edited the chapter in the doc

5. Publication

  • November 15: The completed chapter and all required metadata and figures are converted to markdown and submitted to GitHub
  • December 1: Target launch date 🚀

Chapter resources

Refer to these 2021 WebAssembly resources throughout the content creation process:

📄 Google Docs for outlining and drafting content
🔍 SQL files for committing the queries used during analysis
📊 Google Sheets for saving the results of queries
📝 Markdown file for publishing content and managing public metadata

@rviscomi rviscomi added 2021 chapter Tracking issue for a 2021 chapter help wanted Extra attention is needed labels Apr 27, 2021
@RReverser
Member

I suspect I'll be an author :)

@rviscomi
Member Author

Safe to say so, I think! I've added you to the team. 😁

@rviscomi rviscomi added this to TODO in 2021 via automation Apr 28, 2021
@rviscomi
Member Author

rviscomi commented May 4, 2021

@RReverser thanks for your interest in authoring this chapter! As the content team lead, you'll be responsible for the scope and direction of the chapter and keeping it on schedule. We automatically monitor the staffing and progress of each chapter based on the state of the initial comment so please keep that updated as you add new contributors and meet each milestone.

We've created a Google Doc for this chapter, which you're encouraged to use to collaborate with the content team on the initial outline, metrics, and ultimately the final draft.

Next steps for this chapter are:

There's not currently a section coordinator for this chapter, so I'll be periodically checking in with you directly to make sure the chapter is staying on schedule. Reach out here in this issue if you have any questions about the process.

More information about the content team lead and author roles and responsibilities is available for reference in the wiki if needed.

To anyone else interested in contributing to this chapter, please comment below to join the team!

@rviscomi rviscomi added the help wanted: reviewers This chapter is looking for reviewers label May 4, 2021
@rviscomi rviscomi moved this from TODO to In Progress in 2021 May 4, 2021
@rviscomi
Member Author

rviscomi commented May 11, 2021

Hi @RReverser just checking in. Here are some tips to help keep the chapter on track:

  • Request edit access to the doc and start brainstorming an outline for the chapter
  • Consider announcing to your professional networks that you're looking for co-contributors knowledgeable in WebAssembly to join the chapter
  • Edit the top comment to keep the chapter metadata in sync with all coauthors/reviewers/analysts and also any completed milestones (this is helpful for us to monitor progress at a glance in 2021 Chapter Progress #2179)

@jsoverson

I'm following up on @RReverser's request for reviewers from this Twitter thread.

I'm coming from the WebAssembly developer side, not the web stats side. I'm unfamiliar with the dataset. If you're looking for reviewers who know the scope of a technology and can ramp-up on stats then I can be helpful. If you want the opposite then I'm probably not the best candidate.

@rviscomi
Member Author

Welcome @jsoverson! I'll defer to @RReverser as content team lead to bring you up to speed and do the onboarding.

@carlopi

carlopi commented May 12, 2021

Hi, I found @RReverser's call for reviewers on Twitter, and would like to help out.

I work on compilers to WebAssembly and am familiar with the standardization and toolchain side, and I would be very interested to learn more about the stats collection side.

@RReverser
Member

Thanks @jsoverson @carlopi! I've added you to the list. Let's see if more people want to join, and I'll start on a shared doc within the next two weeks.

@foxdavidj
Contributor

> Thanks @jsoverson @carlopi! I've added you to the list. Let's see if more people want to join, and I'll start on a shared doc within the next two weeks.

Can you add @jsoverson @carlopi to the corresponding roles in the top comment?

@RReverser
Member

RReverser commented May 17, 2021

Actually that's what I did when I left that comment, but I don't see them there now... so weird. Added again.

@rviscomi rviscomi removed help wanted Extra attention is needed help wanted: reviewers This chapter is looking for reviewers labels May 17, 2021
@rviscomi
Member Author

📟 attn content team (@RReverser @jsoverson @carlopi) reminder that the next milestone (complete the chapter outline) is due on June 15. So please request edit access to the WebAssembly chapter doc if you haven't already and brainstorm the contents you'd like to see added to the chapter outline. This doesn't have to be super detailed; you can think of it more like sketching out the table of contents. It's important to get this done on time so that we can make any necessary changes to the test runner before it starts on July 1, for example if we're currently unable to measure something you need for your chapter. WASM is new to the Web Almanac so I don't know what we can and can't support currently, making it all the more important to get this done sooner rather than later. Let me know if you have any questions.

@RReverser could you also tick the checkbox to mark the 0th milestone as completed in the top comment? This helps us track chapter progress at a glance in #2179.

@RReverser
Member

@jsoverson @carlopi Can you please request access to the doc as outlined above? I've added a few ideas I had to the outline, but review & more ideas are welcome!

@carlopi

carlopi commented Jun 2, 2021

@RReverser: I wrote down some possible ideas. It's unclear to me what the means of collecting the data are (do you have a reference?); knowing that would help in shaping which questions are worth exploring.
I am not sure how you want to coordinate all this. One idea would be to brainstorm over a call at some point; I would look forward to it. (I am based in CET; we can figure out a time that works for everyone.)

@RReverser
Member

@carlopi Generally we just have HTTPArchive + BigQuery data, this can be used for reference and instructions: https://github.com/HTTPArchive/httparchive.org/blob/main/docs/gettingstarted_bigquery.md. That is, we mostly have data about payloads - their compressed & uncompressed sizes, content types and so on. That's what we can extract info from most easily.

However, as mentioned on the doc, due to the relatively small number of Wasm resources among the top 1M websites (which are included in the dataset), we can do some simple binary analysis as well. That's why I included things like "how many modules use this feature" for SIMD and threads.

I think control flow analysis would be a bit too expensive to run, and I'm not sure how relevant it is for post-optimisation modules on the Web due to the effect of inlining and other passes that significantly change the structure. Instead, I think we should focus on data that shows adoption of Wasm and its new features and how it is actually used in the wild (so e.g. "means of delivery" is definitely an interesting addition).
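
To give a feel for what "simple binary analysis" can mean at the cheapest level, here is a minimal Python sketch that checks the Wasm header and lists the top-level sections of a module. It is only an illustration and not the tooling used for the chapter: the function names are made up for this example, and detecting features such as SIMD or threads would additionally require decoding instructions inside the code section, which this sketch does not attempt.

```python
WASM_MAGIC = b"\x00asm"  # first four bytes of every Wasm binary


def read_leb128_u32(data, pos):
    """Decode an unsigned LEB128 integer starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def list_sections(data):
    """Return (version, [(section_id, payload_size), ...]) for a Wasm binary."""
    if data[:4] != WASM_MAGIC:
        raise ValueError("not a WebAssembly binary")
    version = int.from_bytes(data[4:8], "little")
    pos = 8
    sections = []
    while pos < len(data):
        section_id = data[pos]
        size, pos = read_leb128_u32(data, pos + 1)
        sections.append((section_id, size))
        pos += size  # skip the payload without decoding it
    return version, sections


# Hand-written minimal module: header plus a type section (id 1)
# holding a single function type with no parameters and no results.
module = WASM_MAGIC + b"\x01\x00\x00\x00" + bytes([1, 4, 1, 0x60, 0, 0])
version, sections = list_sections(module)
print(version, sections)
```

Walking sections like this is cheap because payloads are skipped by their declared size, which is why it scales to a few thousand files without trouble.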

@RReverser
Member

> one idea could be also brainstorm over a call at some point, I would look forward to it. (I am based in CET, we can figure out a time that works for everyone)

I was thinking we were going to just sync over the doc to account for different timezones more easily, but we can certainly do a call as well.

@jsoverson

> @jsoverson @carlopi Can you please request access to the doc as outlined above? I've added a few ideas I had to the outline, but review & more ideas are welcome!

Requested!

@RReverser
Member

I guess let's do a meeting after all; it might be easier to talk through the main points. @jsoverson I think you're in a different time zone than us. What time range and dates work for you next week?

@RReverser
Member

FWIW I'm on vacation next week, but would appreciate any feedback meanwhile; next month will be busy as we'll start downloading Wasm files and analyzing all the data :)

@RReverser
Member

Hmm I'm guessing I'll just have to go ahead with my best judgment...

@jsoverson

I must not have submitted my comment before traveling, sorry.

I don't think the operand details are going to be valuable enough, and the stats around security settings are probably niche enough to ignore. I had some work done on the Rust project but didn't get to a useful stopping point before I left.

@RReverser
Member

Thanks for the response. Meanwhile, I wrote a small script and downloaded most of the Wasm files. It looks like out of ~2.7K in the dataset, only ~2.2K are unique, reachable URLs, so that's what I'll analyze next using the Rust repo above.

@RReverser
Member

After some retries I got to almost 2.3K unique URLs, which, interestingly, results in only 713 unique Wasm files (many are copies under different URLs).

That's fewer than I hoped but in itself it's also an interesting stat related to reusability of Wasm.
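
The gap between unique URLs and unique files can be measured by hashing the downloaded bodies. The following is a minimal sketch of that idea, not the actual script used here: the URLs, the `downloads` mapping, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def group_by_content(files):
    """Map SHA-256 digest -> list of URLs whose downloaded bytes are identical."""
    groups = defaultdict(list)
    for url, body in files.items():
        groups[hashlib.sha256(body).hexdigest()].append(url)
    return dict(groups)


# Hypothetical downloads: URL -> response body.
downloads = {
    "https://a.example/lib.wasm": b"\x00asm\x01\x00\x00\x00",
    "https://b.example/vendor/lib.wasm": b"\x00asm\x01\x00\x00\x00",
    "https://c.example/app.wasm": b"\x00asm\x01\x00\x00\x00\x01\x04\x01\x60\x00\x00",
}

groups = group_by_content(downloads)
print(len(downloads), "unique URLs ->", len(groups), "unique files")
```

Keeping the URL list per digest (rather than only counting digests) makes it possible to map per-file analysis results back to every URL that serves the same bytes.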

@RReverser
Member

One interesting question that arises from this level of reusability is: do we want to aggregate stats by pages, by websites (domains), or by unique Wasm modules?

E.g. is "5% of unique Wasm modules rely on SIMD" more or less valuable than "5% of all pages using Wasm rely on SIMD" or "5% of websites using Wasm rely on SIMD"?

It's tempting to do all of it, but multiplied by the number of stats it's just impractical.

@carlopi @jsoverson Thoughts welcome.
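
To make the trade-off concrete, here is a small Python sketch that computes the same "uses SIMD" share with the three denominators discussed above. The rows are invented for illustration, and treating the hostname as the "website" is an assumption of this sketch rather than a definition taken from the dataset.

```python
from urllib.parse import urlsplit

# Hypothetical joined rows: (page URL, module content hash, module uses SIMD?)
rows = [
    ("https://shop.example/", "h1", False),
    ("https://shop.example/cart", "h1", False),
    ("https://blog.example/", "h1", False),
    ("https://video.example/", "h2", True),
]


def share(units_with_feature, all_units):
    """Fraction of units (pages, sites, or modules) that have the feature."""
    return len(units_with_feature) / len(all_units)


def simd_shares(rows):
    pages = {page for page, _, _ in rows}
    pages_simd = {page for page, _, simd in rows if simd}
    # Assumption: one hostname counts as one website.
    sites = {urlsplit(page).hostname for page in pages}
    sites_simd = {urlsplit(page).hostname for page in pages_simd}
    modules = {digest for _, digest, _ in rows}
    modules_simd = {digest for _, digest, simd in rows if simd}
    return {
        "pages": share(pages_simd, pages),
        "sites": share(sites_simd, sites),
        "modules": share(modules_simd, modules),
    }


result = simd_shares(rows)
print(result)
```

With this toy data the SIMD share is a quarter of pages, a third of sites, and half of unique modules, which shows how one widely copied library can shift the page-level number.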

@rviscomi
Member Author

rviscomi commented Sep 9, 2021

I think stats in terms of # or % of pages are most easily understood by readers.

@RReverser
Member

> I think stats in terms of # or % of pages are most easily understood by readers.

Maybe, but then if the same library is included on lots of pages, it can "drown" stats from Wasm used on a single popular website. The balance seems tricky...

@rviscomi
Member Author

👋 Hey @RReverser, just checking in on each chapter's progress. It looks like you're all set but let me know if you run into any issues.

@RReverser
Member

Yeah, no new issues right now. We chatted a bit more, and we're going with a breakdown by pages.

@RReverser
Member

FWIW I've analyzed the downloaded Wasms using the current state of the wasm-stats repo above, saved the results to JSON and imported them into BigQuery, so now it's possible to join them with summary_requests and the list of Wasm URLs to do any kind of aggregation.

I'm happy to share access if anyone wants it (and if I figure out how to do that in BigQuery...)
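
The join described above runs in BigQuery, but its shape can be sketched in plain Python. Everything below is hypothetical: the field names (`uses_simd`, `size`), the URLs, and the in-memory structures stand in for the imported JSON results and the request rows, and they do not reflect the real table schemas.

```python
# Hypothetical per-module analysis results keyed by URL (as if loaded from JSON).
module_stats = {
    "https://cdn.example/a.wasm": {"uses_simd": True, "size": 1200},
    "https://cdn.example/b.wasm": {"uses_simd": False, "size": 300},
}

# Hypothetical request rows: (page, request URL).
requests = [
    ("https://one.example/", "https://cdn.example/a.wasm"),
    ("https://two.example/", "https://cdn.example/b.wasm"),
    ("https://two.example/", "https://cdn.example/missing.wasm"),
]


def join_requests(requests, module_stats):
    """Inner join: keep only requests whose URL has analysis results."""
    return [
        {"page": page, "url": url, **module_stats[url]}
        for page, url in requests
        if url in module_stats
    ]


def pages_with(joined, flag):
    """Sorted list of pages that load at least one module with the given flag set."""
    return sorted({row["page"] for row in joined if row[flag]})


joined = join_requests(requests, module_stats)
print(len(joined), pages_with(joined, "uses_simd"))
```

Requests without analysis results (for example, files that were unreachable at download time) drop out of the inner join, so any page-level percentage built on it only covers modules that could actually be analyzed.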


@rviscomi
Member Author

@RReverser let's coordinate to get this data imported into the public httparchive.almanac dataset, so the results can be backed by publicly runnable queries.

@rviscomi
Member Author

Note: I unchecked "Milestone 2" in the top comment as I'm not seeing the draft PR in the list of open PRs. @RReverser I know you're working on it so feel free to update it whenever available. Let me know if you run into any blockers.

@RReverser
Member

Oh, I misunderstood that milestone upon first read, I thought it was for adding custom metrics to the crawler.

@RReverser
Member

@carlopi @jsoverson FWIW I've added a bunch of metrics to the spreadsheet already, if you want to take a look before they're turned into graphs and into a post.

@rviscomi
Member Author

@RReverser @jsoverson @carlopi

🎉 This chapter is fully written, reviewed, edited, and ready to be launched on Wednesday! Thank you to all of the contributors who put in the time and effort to make this a great chapter.

When you get 5 minutes, I'd really appreciate if you could fill out our contributor survey to tell us (the project leads) about your experience. It's super helpful to hear what went well or what could be improved for next time. 🙏

Congratulations and thank you all again. I'm excited for this to launch soon!

Projects
2021
Done