[Discuss] Interactive data visualizations #896

Closed
rviscomi opened this issue Jun 23, 2020 · 5 comments
Labels
analysis (Querying the dataset), development (Building the Almanac tech stack), help wanted (Extra attention is needed)

@rviscomi
Member

rviscomi commented Jun 23, 2020

Last year @bkardell built a Glitch app to help slice the Markup chapter data and it was really useful to deep link to specific subsets of the data. And from the data exploration side I thought it was a great way for users to find their own interesting insights. I'd like to take that idea a step further and discuss how we might be able to embed interactive experiences in the chapters themselves.

Our embedded Google Sheets visualizations have nice hover effects but you can't really explore the data at all. I would love to have "immersive" experiences with the data in which readers could dive into a specific area of interest. Contrived example, but in the JS chapter we could have an interactive figure that lets the reader select one or more frameworks and compare their adoption rates visually.

I see this as a bespoke build in close collaboration between the content (authors/reviewers), analysis, and development teams for only the most popular chapters. Last year we had thought about building some kind of pipeline to auto-generate data viz based on every JSON data file, but I don't think we need to get that sophisticated. (though that would be cool)

Should we get into D3.js or are there better drop-in solutions, like an embedded Glitch app or Data Studio dashboard? How can we ensure that whatever we build is accessible?

cc @HTTPArchive/developers @HTTPArchive/data-analysts

@rviscomi added the help wanted, development, and analysis labels on Jun 23, 2020
@rviscomi added this to TODO in 2020 via automation on Jun 23, 2020
@rviscomi added this to the 2020 Platform Development milestone on Jun 23, 2020
@rviscomi
Member Author

According to the Web Almanac analytics, here are the latest page view counts for each 2019 chapter:

Chapter Views
JavaScript 33,618
CSS 11,436
SEO 6,323
Performance 5,334
Markup 4,052
HTTP/2 3,996
Third Parties 3,717
Fonts 3,171
Accessibility 3,108
PWA 3,052
CMS 3,032
Media 2,445
Security 2,419
Mobile Web 2,250
CDN 1,988
Ecommerce 1,724
Page Weight 1,656
Compression 1,332
Caching 1,324
Resource Hints 961

So the top chapters were JavaScript, CSS, SEO, Performance, Markup, etc., in that order. Now that each of these 2020 chapters has a content team of authors and reviewers, we can coordinate with them on a bespoke interactive data visualization that will be engaging to readers.

There are some platform questions we can iron out, like the mechanics of the markdown file for embedding the interactive visualization. Would the build system find/replace a placeholder, or would it be an embedded web component or similar markup that a script would "hydrate" onto the page? How do we support static media like print/ebook?
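
To make the find/replace option a bit more concrete, here is a rough sketch of what a build step could look like. Everything named here is hypothetical and only for illustration: the `{{ interactive_figure("...") }}` placeholder syntax, the `<almanac-figure>` custom element, and the `/static/...` paths are assumptions, not anything that exists in the codebase. The idea is that the web build emits an element a script could later hydrate, with a plain image inside as the fallback, while a print/ebook build emits only the image.

```python
import re
from html import escape

# Hypothetical placeholder syntax an author would write in the chapter markdown:
#   {{ interactive_figure("framework-adoption") }}
PLACEHOLDER = re.compile(r'\{\{\s*interactive_figure\("([a-z0-9-]+)"\)\s*\}\}')


def render_figure(figure_id: str, static: bool = False) -> str:
    """Return the markup for one figure.

    static=True is meant for print/ebook builds, where only the image is emitted.
    """
    fid = escape(figure_id, quote=True)
    # Assumed path layout; the static image doubles as the no-JS fallback.
    fallback = f'<img src="/static/images/{fid}.png" alt="Figure {fid}">'
    if static:
        return fallback
    # <almanac-figure> is a made-up custom element name. A script would hydrate
    # it from data-src; the <img> stays visible if that script never runs.
    return (
        f'<almanac-figure data-src="/static/data/{fid}.json">'
        f'{fallback}</almanac-figure>'
    )


def expand_placeholders(markdown: str, static: bool = False) -> str:
    """Replace every placeholder in the markdown source with figure markup."""
    return PLACEHOLDER.sub(lambda m: render_figure(m.group(1), static), markdown)


source = 'Some intro text.\n\n{{ interactive_figure("framework-adoption") }}\n'
web_html = expand_placeholders(source)
print_html = expand_placeholders(source, static=True)
```

Keeping the static image as the element's child means the same markdown could feed both the website and the ebook, which is one possible answer to the static media question.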

I've also been thinking about visualization tools, and SVG with JSON+JS and media queries seems like a really lightweight and accessible approach worth considering. I'm inspired by what the NYT Graphics team does, for example their coronavirus map.

Next steps...

@OBTo let's start coordinating with the JS, CSS, and SEO leads to figure out what content would be the most engaging. If there's time/resources we can go further down the list of chapters.
@paulcalvano I'd love to hear your ideas for ways to leverage the data in engaging ways: dimensions to slice, fields to filter, metrics to sort by, etc. We'd need a process for writing a special kind of query to get all this data and saving it to the codebase so we can feed it into the visualization.
@bazzadp what are your thoughts on the data viz implementation side? Do you have a sense for how many chapters we would be able to support, seeing as how the bulk of this work seems to be JS?

@bkardell
Contributor

bkardell commented Jul 11, 2020 via email

@tunetheweb
Member

I’m kind of torn on this to be honest.

On the one hand I think some very cool stuff could come out of this. And it could certainly help to expose the data for more digging in to gather insights, and possibly for future authors to play with to help write future chapters. I also think a lot of news media have done some very cool things with data recently (the NY Times example you gave).

On the other side, I’m concerned with the effort, the performance impact, and the accessibility and internationalisation considerations versus how much they would actually be used. As you know I already questioned the value of the embedded sheets for similar reasons, but they at least were basically free as that was how we got the screenshots anyway, so embedding the sheets was very little effort after that. Also we started late this year, so I’m concerned with getting the actual thing written and not getting too sidetracked on non-core items if that delays it.

I also wonder if they should be embedded in the chapter and even belong in the Web Almanac, or if they should be tools on the main HTTP Archive, updated each month, and just referenced/linked in the Almanac? To me the Almanac is a report on the state of the web, and while we’re certainly keen to expose the queries and data we find for those to explore further, the real value IMHO is in the insights the authors bring and their opinion of the data.

Then again the use cases the news media has thrown up are certainly interesting and so those show how data-driven articles (which the Web Almanac definitely is!) can be enhanced. So I guess I’m being overly negative because I’m more worried about how to do this, which is clouding my judgement on the benefits. Hopefully with some more concrete examples it won’t seem such a big, scary prospect! We’re also looking pretty good on the tech stack so far this year, as we’re benefiting from the site build last year and it’s in pretty good shape for 2020 already, so the dev team should be able to do something different this year, and this seems like an obvious thing to look at to build further on what we did last year.

So I have some concerns, but I’m also excited to see what we come up with.

On a tech front, I personally have no experience in data visualisation and my JS is not great. I’m hoping some of the other developers will have this. I would say Glitch is really slow to boot up when the site isn’t used, but Data Studio dashboards could be an easy win to at least prototype some things. Then we can look at more custom JS solutions after we have more of an idea what we want to build. How many chapters we support will depend on how many different components we want to build, how complex each is, and how much reuse there is between chapters. Really difficult to say at this point. But I say don’t hold back - we can always tone it down later if it’s too much.

@ibnesayeed
Contributor

Being a web archive researcher, I would prefer if reports are self-contained. Interactive visualizations are great, but they should be portable and not be powered by a third-party service, because when those services go out of business (or change their APIs, rate limits, and whatnot) the report gets affected. Here is my rule of thumb: if a page is archived in a web archive and can faithfully replay independent of the origin servers' existence, I would consider it portable. It would not be nice if a few years down the line researchers try to explore historical records of Web Almanac, but can't access its visualizations the way we can now.

Following are a few points to summarize how I think we should ideally prepare the report, if resources allow:

  • Report with all the basic ideas in the most accessible form
  • Provide structured summary data and raw data as downloadable resources
  • Add interactive charts and data visualizations as enhanced representations, but using portable solutions (e.g., using D3.js or other libraries, not an external service)
  • Make the whole site (every resource and piece of data it needs) static and Web Bundle (Web Packaging) friendly

Distill is a good example, where they publish fabulous interactive scientific articles.

@rviscomi
Member Author

> what do these analytics numbers represent? This month? The whole time?

Those are the all-time page views for the English versions.

@bazzadp your concerns are totally valid. I wouldn't want to build anything that was inaccessible or slow. I think this would bring a lot of value to the chapters so IMO it would be worth the effort. The way I see it is that there are different levels of engagement between the readers and the data:

  • Level 0 uses words to describe the data
  • Level 1 uses static images to help tell the story
  • Level 2 uses interactive images
  • Level 3 uses immersive experiences

Level 0 is the most basic form without any visualization and it's also the fallback for visually impaired readers. We're currently at Level 2. By unlocking Level 3, readers would be empowered to dig deeper into the data and make their own discoveries that are of interest/relevance to them. I only went to one "Data Storytelling" meetup but the one I went to taught me about the importance of "letting readers find themselves in the data". It's why I gravitated towards New York in the coronavirus map I shared earlier. The same way that @bkardell's Glitch app encourages exploration, an on-page app would let readers poke around the results in ways images can't support. At this level of engagement, the reader shifts from being a passive consumer of information to an active explorer of information, which I think would help drive all of the "conversions" we're looking for in visitors: longer time spent on the page, permalinking and sharing with the community, return visits to discover something new, etc.

> I also wonder if they should be embedded in the chapter and even belong in the Web Almanac, or if they should be tools on the main HTTP Archive, updated each month, and just referenced/linked in the Almanac?

Ultimately, yes, I'd love to see the HA website be the home of the monthly data and Almanac website be the home of the annual interpretations of the data. One thing I would want to keep are all of the Level 0-3 data visualizations embedded inside of the chapters rather than linking to an external resource. For someone wanting to check on the latest trends in the months between Almanac publications, the HA website would be the destination for that.

And I think this addresses @ibnesayeed's point about being self-contained. We could embed an iframe in a Web Almanac chapter from the HA website and it's still the same 1st party data. I do think implementing it from scratch using native SVG/JS/CSS would be the most accessible/performant. We'd generate SVG from the JSON data, make it interactive using JS, and polish it with CSS styles/animations.
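
As a very rough illustration of the "generate SVG from the JSON data" step only, here is a minimal sketch in Python, assuming the SVG is produced at build time. The function name `bar_chart_svg`, the JSON shape (`label`/`value`), the layout numbers, and the `bar` class name are all assumptions made up for this example; the three data points are the top chapter view counts quoted earlier in this thread. Each bar gets a `<title>` so the value is exposed without any JS, and a `data-value` attribute that a script could later use for interactivity.

```python
import json
from xml.sax.saxutils import escape


def bar_chart_svg(rows, title, width=400, bar_height=20, gap=6, label_width=110):
    """Render [{"label": str, "value": number}, ...] as a horizontal SVG bar chart."""
    max_value = max(row["value"] for row in rows)
    height = len(rows) * (bar_height + gap)
    parts = [
        # viewBox without fixed width/height lets CSS scale the chart responsively.
        f'<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 {width} {height}" role="img">',
        f'<title>{escape(title)}</title>',
    ]
    for i, row in enumerate(rows):
        y = i * (bar_height + gap)
        # Scale each bar against the largest value in the data set.
        bar = round((width - label_width) * row["value"] / max_value)
        label = escape(row["label"])
        parts.append(f'<text x="0" y="{y + 15}">{label}</text>')
        parts.append(
            f'<rect class="bar" x="{label_width}" y="{y}" width="{bar}" '
            f'height="{bar_height}" data-value="{row["value"]}">'
            f'<title>{label}: {row["value"]:,}</title></rect>'
        )
    parts.append('</svg>')
    return "\n".join(parts)


# Top three 2019 chapters by page views, as listed earlier in the thread.
data = json.loads(
    '[{"label": "JavaScript", "value": 33618},'
    ' {"label": "CSS", "value": 11436},'
    ' {"label": "SEO", "value": 6323}]'
)
svg = bar_chart_svg(data, "Page views by chapter")
```

The interactive and styling layers (JS event handlers, CSS hover states and animations) would sit on top of markup like this and are not shown here.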

I think a prototype is in order so we can wrap our heads around the scope of the problem. We can start with the CSS and/or JS chapters to see what their one immersive data exploration use case would be and I can help mock something up.

4 participants