[Discuss] Interactive data visualizations #896

rviscomi · 2020-06-23T03:18:30Z

Last year @bkardell built a Glitch app to help slice the Markup chapter data and it was really useful to deep link for specific subsets of the data. And from the data exploration side I thought it was a great way for users to find their own interesting insights. I'd like to take that idea a step further and discuss how we might be able to embed interactive experiences in the chapters themselves.

Our embedded Google Sheets visualizations have nice hover effects but you can't really explore the data at all. I would love to have "immersive" experiences with the data in which readers could dive into a specific area of interest. Contrived example, but in the JS chapter we could have an interactive figure that lets the reader select one or more frameworks and compare their adoption rates visually.

I see this as a bespoke build in close collaboration between the content (authors/reviewers), analysis, and development teams for only the most popular chapters. Last year we had thought about building some kind of pipeline to auto-generate data viz based on every JSON data file, but I don't think we need to get that sophisticated (though that would be cool).

Should we get into D3.js or are there better drop-in solutions, like an embedded Glitch app or Data Studio dashboard? How can we ensure that whatever we build is accessible?

cc @HTTPArchive/developers @HTTPArchive/data-analysts

rviscomi · 2020-07-11T18:12:15Z

According to the Web Almanac analytics, here are the latest page view counts for each 2019 chapter:

Chapter	Views
JavaScript	33,618
CSS	11,436
SEO	6,323
Performance	5,334
Markup	4,052
HTTP/2	3,996
Third Parties	3,717
Fonts	3,171
Accessibility	3,108
PWA	3,052
CMS	3,032
Media	2,445
Security	2,419
Mobile Web	2,250
CDN	1,988
Ecommerce	1,724
Page Weight	1,656
Compression	1,332
Caching	1,324
Resource Hints	961

So the top chapters were JavaScript, CSS, SEO, Performance, Markup, etc., in that order. Now that each of these 2020 chapters has a content team of authors and reviewers, we can coordinate with them on a bespoke interactive data visualization that will be engaging to readers.

There are some platform questions we can iron out, like the mechanics of the markdown file for embedding the interactive visualization. Would the build system find/replace a placeholder, or would it be an embedded web component or similar markup that a script would "hydrate" onto the page? How do we support static media like print/ebook?
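To make the find/replace option concrete, here is a minimal sketch of what a build step could do. Everything named here is an assumption for illustration only: the `{{ interactive_figure("...") }}` placeholder syntax, the `almanac-figure` custom element, and the `/static/...` paths are hypothetical, not anything that exists in the codebase. The static image sits inside the element as a fallback, and a `static=True` mode emits only the image for print/ebook.

```python
import re

# Hypothetical placeholder syntax an author would write in the chapter markdown:
#   {{ interactive_figure("framework-adoption") }}
PLACEHOLDER = re.compile(r'\{\{\s*interactive_figure\("([a-z0-9-]+)"\)\s*\}\}')


def render_figures(markdown: str, static: bool = False) -> str:
    """Replace each placeholder with markup for an interactive figure.

    With static=True (e.g. a print/ebook build) only the fallback image is
    emitted. Element name and paths are illustrative assumptions.
    """

    def replace(match):
        figure_id = match.group(1)
        fallback = (
            f'<img src="/static/images/{figure_id}.png" '
            f'alt="Figure {figure_id}">'
        )
        if static:
            return fallback
        # A script could later "hydrate" this element using the JSON file;
        # until then (or without JS) the nested image is what readers see.
        return (
            f'<almanac-figure data-src="/static/data/{figure_id}.json">'
            f"{fallback}</almanac-figure>"
        )

    return PLACEHOLDER.sub(replace, markdown)
```

Calling `render_figures(chapter_text)` would produce the web version and `render_figures(chapter_text, static=True)` the static one, so both outputs come from the same markdown source.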

I've also been thinking about visualization tools, and SVG with JSON+JS and media queries seems like a really lightweight and accessible approach worth considering. I'm inspired by what the NYT Graphics team does, for example their coronavirus map.

Next steps...

@OBTo let's start coordinating with the JS, CSS, and SEO leads to figure out what content would be the most engaging. If there's time/resources we can go further down the list of chapters.
@paulcalvano I'd love to hear your ideas for ways to leverage the data in engaging ways: dimensions to slice, fields to filter, metrics to sort by, etc. We'd need a process for writing a special kind of query to get all this data and saving it to the codebase so we can feed it into the visualization.
@bazzadp what are your thoughts on the data viz implementation side? Do you have a sense for how many chapters we would be able to support, seeing as how the bulk of this work seems to be JS?

bkardell · 2020-07-11T19:30:55Z

Hey, I'm sorry I'm just seeing this now. Things have been pretty hectic and I appear to have subscribed to the whole of GitHub at this point, so sometimes I don't notice... Just wanted to say I think this sounds awesome and I'd love to see it happen. In fact, I'd love to help make it happen. I'm just not sure I have the bandwidth to commit to it too much. How can I help though? Also, one quick question as my phone connection is being spotty and not letting me dig in... what do these analytics numbers represent? This month? The whole time?

tunetheweb · 2020-07-11T20:08:02Z

I’m kind of torn on this to be honest.

On the one hand I think some very cool stuff could come out of this. It could certainly help to expose the data for more digging in to gather insights, and possibly for future authors to play with to help write future chapters. I also think a lot of news media have done some very cool things with data recently (the NY Times example you gave).

On the other side, I’m concerned with the effort, the performance impact, and the accessibility and internationalisation considerations versus how much they would actually be used. As you know I already questioned the value of the embedded sheets for similar reasons, but they at least were basically free as that was how we got the screenshots anyway, so embedding the sheets was very little effort after that. Also we started late this year, so I’m concerned with getting the actual thing written and not getting too sidetracked on non-core items if that delays that.

I also wonder if they should be embedded in the chapter and even belong in the Web Almanac, or if they should be tools on the main HTTP Archive, updated each month, and just referenced/linked in the almanac? To me the Almanac is a report on the state of the web, and while we’re certainly keen to expose the queries and data we find for those to explore further, the real value IMHO is in the insights the authors bring and their opinion of the data.

Then again the use cases the news media has thrown up are certainly interesting, and they show how data-driven articles (which the Web Almanac definitely is!) can be enhanced. So I guess I’m being overly negative because I’m more worried about how to do this, which is clouding my judgement on the benefits. Hopefully with some more concrete examples it won’t seem such a big, scary prospect! We’re also looking pretty good on the tech stack so far this year, as we’re benefiting from the site build last year and it’s in pretty good shape for 2020 already, so the dev team should be able to do something different this year, and this seems like an obvious thing to look at to build further on what we did last year.

So I have some concerns, but I’m also excited to see what we come up with.

On a tech front, I personally have no experience in data visualisation and my JS is not great. Hoping some of the other developers will have this. I would say Glitch is really slow to boot up when a site isn’t used, but Data Studio Dashboards could be an easy win to at least prototype some things. Then we can look at more custom JS solutions after we have more of an idea what we want to build. How many chapters we support will depend on how many different components we want to build, how complex each is, and how much reuse there is between chapters. Really difficult to say at this point. But I say don’t hold back - can always tone it down later if it’s too much.

ibnesayeed · 2020-07-11T21:15:28Z

Being a web archive researcher, I would prefer if reports are self-contained. Interactive visualizations are great, but they should be portable and not be powered by a third-party service, because when those services go out of business (or change their APIs, rate limits, and whatnot) the report gets affected. Here is my rule of thumb: if a page is archived in a web archive and can faithfully replay independent of the origin servers' existence, I would consider it portable. It would not be nice if a few years down the line researchers try to explore historical records of the Web Almanac, but can't access its visualizations the way we can now.

Following are a few points to summarize how I think we should ideally prepare the report, if resources allow:

Report with all the basic ideas in the most accessible form
Provide structured summary data and raw data as downloadable resources
Add interactive charts and data visualizations as enhanced representations, but using portable solutions (e.g., using D3.js or other libraries, not an external service)
Make the whole site (every resource and all the data it needs) static and Web Bundle (Web Packaging) friendly

Distill is a good example where they publish fabulous interactive scientific articles.

rviscomi · 2020-07-16T23:14:13Z

> what do these analytics numbers represent? This month? The whole time?

Those are the all-time page views for the English versions.

@bazzadp your concerns are totally valid. I wouldn't want to build anything that was inaccessible or slow. I think this would bring a lot of value to the chapters so IMO it would be worth the effort. The way I see it is that there are different levels of engagement between the readers and the data:

Level 0 uses words to describe the data
Level 1 uses static images to help tell the story
Level 2 uses interactive images
Level 3 uses immersive experiences

Level 0 is the most basic form without any visualization and it's also the fallback for visually impaired readers. We're currently at Level 2. By unlocking Level 3, readers would be empowered to dig deeper into the data and make their own discoveries that are of interest/relevance to them. I only went to one "Data Storytelling" meetup but the one I went to taught me about the importance of "letting readers find themselves in the data". It's why I gravitated towards New York in the coronavirus map I shared earlier. The same way that @bkardell's Glitch app encourages exploration, an on-page app would let readers poke around the results in ways images can't support. At this level of engagement, the reader shifts from being a passive consumer of information to an active explorer of information, which I think would help drive all of the "conversions" we're looking for in visitors: longer time spent on the page, permalinking and sharing with the community, return visits to discover something new, etc.

> I also wonder if they should be embedded in the chapter and even belong in the Web Almanac, or if they should be tools on the main HTTP Archive, updated each month, and just referenced/linked in the almanac?

Ultimately, yes, I'd love to see the HA website be the home of the monthly data and the Almanac website be the home of the annual interpretations of the data. One thing I would want to keep is having all of the Level 0-3 data visualizations embedded inside of the chapters rather than linking to an external resource. For someone wanting to check on the latest trends in the months between Almanac publications, the HA website would be the destination for that.

And I think this addresses @ibnesayeed's point about being self-contained. We could embed an iframe in a Web Almanac chapter from the HA website and it's still the same 1st party data. I do think implementing it from scratch using native SVG/JS/CSS would be the most accessible/performant. We'd generate SVG from the JSON data, make it interactive using JS, and polish it with CSS styles/animations.
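As a rough illustration of the "generate SVG from the JSON data" step, here is a minimal sketch that turns a JSON array into a horizontal bar chart. The input shape (`[{"label": ..., "value": ...}]`), the function name, and the sizing defaults are all assumptions made up for this example, and it assumes a non-empty array with at least one positive value. Each bar carries a `<title>` so the value is exposed as text, and the `bar` class is a hook that JS and CSS could use for interactivity and styling.

```python
import json
from xml.sax.saxutils import escape


def bar_chart_svg(json_text: str, width: int = 400,
                  bar_height: int = 20, gap: int = 6) -> str:
    """Build a simple horizontal bar chart as an SVG string.

    Expects JSON like [{"label": "JavaScript", "value": 33618}, ...].
    This input shape is a hypothetical example, not the Almanac's format.
    """
    rows = json.loads(json_text)
    peak = max(row["value"] for row in rows)  # assumes non-empty, peak > 0
    label_width = 120  # horizontal space reserved for the text labels
    height = len(rows) * (bar_height + gap)

    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'viewBox="0 0 {width} {height}" role="img">'
    ]
    for index, row in enumerate(rows):
        y = index * (bar_height + gap)
        value = row["value"]
        # Scale every bar relative to the largest value in the data.
        bar_width = round((width - label_width) * value / peak, 1)
        label = escape(str(row["label"]))
        parts.append(f'<text x="0" y="{y + bar_height - 5}">{label}</text>')
        parts.append(
            f'<rect class="bar" x="{label_width}" y="{y}" '
            f'width="{bar_width}" height="{bar_height}">'
            f"<title>{label}: {value}</title></rect>"
        )
    parts.append("</svg>")
    return "\n".join(parts)
```

Because the output uses a `viewBox` rather than fixed pixel dimensions, the same markup can scale with the page, and media queries in the site CSS could restyle it for narrow screens or print.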

I think a prototype is in order so we can wrap our heads around the scope of the problem. We can start with the CSS and/or JS chapters to see what their one immersive data exploration use case would be and I can help mock something up.

rviscomi added the help wanted, development, and analysis labels Jun 23, 2020

rviscomi added this to TODO in 2020 via automation Jun 23, 2020

rviscomi added this to the 2020 Platform Development milestone Jun 23, 2020

tunetheweb mentioned this issue Jul 4, 2020

Clean up unused code and dependencies #953

Closed

rviscomi mentioned this issue Jul 17, 2020

Prototype an interactive data visualization for the 2020 CSS chapter #1052

Closed

rviscomi mentioned this issue Aug 8, 2020

Remove run_queries.sh as no longer used #1179

Merged

tunetheweb mentioned this issue Oct 1, 2020

Markup 2020 #899

Closed

10 tasks

tunetheweb mentioned this issue Oct 16, 2020

Join the 2020 Developers team #924

Closed

rviscomi closed this as completed Oct 29, 2020

2020 automation moved this from TODO to Done Oct 29, 2020
