Query metrics: Chapter 10. SEO #91

Closed
15 tasks done
rviscomi opened this issue Jul 23, 2019 · 2 comments · Fixed by #159
@rviscomi
Member

rviscomi commented Jul 23, 2019

| Part | Chapter | Authors | Reviewers | Tracking Issue |
| --- | --- | --- | --- | --- |
| II. User Experience | 10. SEO | @rachellcostello @ymschaap @AVGP | @clarkeclark @andylimn @voltek62 | #12 |

READ ME!

All of the metrics in the table below have been marked as Able To Query during the metrics triage. The analyst assigned to each metric is expected to write the corresponding query and submit a PR to have it reviewed and added to the repo.

In order to stay on schedule and have the data ready for authors, please have all metrics reviewed and merged by August 5.

Assignments

| ID | Metric description | Analyst | Notes |
| --- | --- | --- | --- |
| 10.01 | Structured data rich results eligibility (ratings, search, etc.) | @ymschaap | html -> regex |
| 10.02 | Lang attribute usage and mistakes (lang='en') | @ymschaap | lighthouse -> html-lang-valid, resource: See: https://discuss.httparchive.org/t/what-are-the-invalid-uses-of-the-lang-attribute/1022 |
| 10.03 | <link> rel="amphtml" (AMP) | @ymschaap | html -> regex |
| 10.04 | <link> hreflang="en-us" (localisation usage) | @ymschaap | html -> regex + lighthouse -> hreflang |
| 10.05 | Breakdown of type of structured data served (ld+json, microformatting, schema.org + what @type)? | @ymschaap | Custom Query. Can we have this data: https://search.google.com/structured-data/testing-tool |
| 10.06 | Indexability - looking at meta tags like <meta> noindex, <link> canonicals. | @ymschaap | lighthouse -> is-crawlable, lighthouse -> canonical |
| 10.07 | <meta> description + <title> (presence & length) | @ymschaap | html -> regex |
| 10.08 | Status codes and whether pages are accessible - 200, 3xx, 4xx, 5xx. | @ymschaap | request -> response |
| 10.09 | Content - looking at word count, thin pages, header usage, alt attributes images | @ymschaap | lighthouse -> image-alt, Custom Query |
| 10.10 | Linking - extract <a href> count per page (internal + external) | @ymschaap | Custom Query |
| 10.11 | Linking - fragment URLs (together with SPAs to navigate content) | @ymschaap | we have react/vue as application type + a href, Custom Query |
| 10.12 | robots.txt (It is mentioned in Lighthouse, can we parse the content or only confirm its existence? E.g. check if has a sitemap reference - seems it does list the potential issues) | @ymschaap | lighthouse -> robots-txt |
| 10.13 | If the desktop site is responsive/mobile-ready, or a specific mobile site (redirect, UA)? (Can we find if these are different sites?) | @ymschaap | compare mobile vs desktop crawl page -> _final_url + lighthouse -> seo-mobile |
| 10.14 | Descriptive link text usage (available in Lighthouse data) | @ymschaap | lighthouse -> link-text |
| 10.15 | speed metrics (FCP, server response time) would be nice for SEO as well given the recent focus on fast loading sites | @ymschaap | See: https://discuss.httparchive.org/t/measuring-cms-host-ttfb-in-crux/1676/1 |
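
Several rows above are marked "html -> regex". As a rough illustration of what that could mean for 10.03 and 10.07, here is a minimal Python sketch, not the query that was eventually merged. The function names and patterns are hypothetical, and the regexes are deliberately simple, so they will miss unusual markup such as unquoted attribute values.

```python
import re

# Hypothetical patterns for illustration; real-world HTML is messier than this.
LINK_TAG = re.compile(r"<link\b[^>]*>", re.IGNORECASE)
REL_AMPHTML = re.compile(r"""\brel\s*=\s*["']?amphtml\b""", re.IGNORECASE)
TITLE = re.compile(r"<title\b[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)
META_TAG = re.compile(r"<meta\b[^>]*>", re.IGNORECASE)
NAME_DESCRIPTION = re.compile(r"""\bname\s*=\s*["']description["']""", re.IGNORECASE)
CONTENT_ATTR = re.compile(r"""\bcontent\s*=\s*(?:"([^"]*)"|'([^']*)')""", re.IGNORECASE)


def has_amphtml_link(html):
    """10.03: True if any <link> tag carries rel="amphtml"."""
    return any(REL_AMPHTML.search(tag) for tag in LINK_TAG.findall(html))


def title_length(html):
    """10.07: length of the trimmed <title> text, or None if there is no title."""
    match = TITLE.search(html)
    return len(match.group(1).strip()) if match else None


def meta_description_length(html):
    """10.07: length of the trimmed meta description, or None if absent."""
    for tag in META_TAG.findall(html):
        if NAME_DESCRIPTION.search(tag):
            match = CONTENT_ATTR.search(tag)
            if match:
                value = match.group(1) or match.group(2) or ""
                return len(value.strip())
    return None
```

The same patterns could in principle be ported to regular-expression functions in SQL, but that translation is an assumption and is not shown here.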

Checklist of metrics to be merged

  • 10.01 Structured data rich results eligibility (ratings, search, etc.)
  • 10.02 Lang attribute usage and mistakes (lang='en')
  • 10.03 <link> rel="amphtml" (AMP)
  • 10.04 <link> hreflang="en-us" (localisation usage)
  • 10.05 Breakdown of type of structured data served (ld+json, microformatting, schema.org + what @type)?
  • 10.06 Indexability - looking at meta tags like <meta> noindex, <link> canonicals.
  • 10.07 <meta> description + <title> (presence & length)
  • 10.08 Status codes and whether pages are accessible - 200, 3xx, 4xx, 5xx.
  • 10.09 Content - looking at word count, thin pages, header usage, alt attributes images
  • 10.10 Linking - extract <a href> count per page (internal + external)
  • 10.11 Linking - fragment URLs (together with SPAs to navigate content)
  • 10.12 robots.txt (It is mentioned in Lighthouse, can we parse the content or only confirm its existence? E.g. check if has a sitemap reference - seems it does list the potential issues)
  • 10.13 If the desktop site is responsive/mobile-ready, or a specific mobile site (redirect, UA)? (Can we find if these are different sites?)
  • 10.14 Descriptive link text usage (available in Lighthouse data)
  • 10.15 speed metrics (FCP, server response time) would be nice for SEO as well given the recent focus on fast loading sites
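
For the linking metrics (10.10 and 10.11), one possible way to classify `<a href>` values as internal, external, or fragment-only is sketched below in Python. This is an illustration under assumptions, not the custom query referenced in the table: the `LinkCounter` class and `count_links` function are hypothetical names, "internal" is assumed to mean "same host as the page URL", and non-HTTP schemes such as `mailto:` are ignored.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCounter(HTMLParser):
    """Counts <a href> links by kind (hypothetical helper for illustration)."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.page_host = urlparse(page_url).netloc.lower()
        self.counts = {"internal": 0, "external": 0, "fragment": 0}

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return  # <a> without a usable href is not counted
        if href.startswith("#"):
            self.counts["fragment"] += 1
            return
        # Resolve relative URLs against the page URL before comparing hosts.
        target = urlparse(urljoin(self.page_url, href))
        if target.scheme not in ("http", "https"):
            return  # assumption: skip mailto:, tel:, javascript:, etc.
        if target.netloc.lower() == self.page_host:
            self.counts["internal"] += 1
        else:
            self.counts["external"] += 1


def count_links(html, page_url):
    parser = LinkCounter(page_url)
    parser.feed(html)
    parser.close()
    return parser.counts
```

Treating subdomains as external is a design choice of this sketch; a real analysis might compare registrable domains instead.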
@rviscomi rviscomi added the analysis Querying the dataset label Jul 23, 2019
@rviscomi rviscomi added this to the Content written milestone Jul 23, 2019
@rviscomi rviscomi added this to TODO in Web Almanac 2019 via automation Jul 23, 2019
@rviscomi
Member Author

rviscomi commented Aug 3, 2019

10.01 Structured data rich results eligibility

I had assumed GoogleChrome/lighthouse#4359 was already added to Lighthouse, but it doesn't seem like it. This may be tricky to get right using only SQL. Since I was the one to add this metric, I think it's ok to change this to "Not Feasible". @ymschaap WDYT?

@ymschaap
Contributor

ymschaap commented Aug 3, 2019

10.01 Structured data rich results eligibility

> I had assumed GoogleChrome/lighthouse#4359 was already added to Lighthouse, but it doesn't seem like it. This may be tricky to get right using only SQL. Since I was the one to add this metric, I think it's ok to change this to "Not Feasible". @ymschaap WDYT?

What I did now was use the 10.05 metric (which grabs any ld+json block and finds its @context and @type) and look at which @types trigger rich results.

We know the JSON is valid, and we know @context + @type are set, which is pretty close to what GoogleChrome/lighthouse#4359 does.

So imho we could keep it as long as we make clear what the limitations of this metric are in the Web Almanac. On the other hand, 10.05 might already touch on this, and 10.01 could be considered a duplicate.
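
To make the approach described above concrete, here is a minimal Python sketch of the idea: pull out ld+json blocks, keep only those that parse as JSON and declare @context, collect their @type values, and intersect them with a set of types assumed to be eligible for rich results. This is not the actual query. The function names are hypothetical, and `ASSUMED_RICH_RESULT_TYPES` is a placeholder, not an authoritative list.

```python
import json
import re

JSON_LD_SCRIPT = re.compile(
    r"""<script\b[^>]*type\s*=\s*["']application/ld\+json["'][^>]*>(.*?)</script>""",
    re.IGNORECASE | re.DOTALL,
)

# Placeholder set for illustration only; NOT an authoritative list.
ASSUMED_RICH_RESULT_TYPES = {"Recipe", "Product", "Event"}


def iter_types(node):
    """Yield every string @type found anywhere inside a parsed JSON-LD value."""
    if isinstance(node, dict):
        type_value = node.get("@type")
        if isinstance(type_value, str):
            yield type_value
        elif isinstance(type_value, list):
            for item in type_value:
                if isinstance(item, str):
                    yield item
        for value in node.values():
            yield from iter_types(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_types(item)


def json_ld_types(html):
    """Collect @type values from valid ld+json blocks that declare @context."""
    types = []
    for body in JSON_LD_SCRIPT.findall(html):
        try:
            data = json.loads(body)
        except json.JSONDecodeError:
            continue  # invalid JSON is skipped, so only valid blocks count
        top_level = data if isinstance(data, list) else [data]
        for node in top_level:
            if isinstance(node, dict) and "@context" in node:
                types.extend(iter_types(node))
    return types


def rich_result_candidates(html):
    """Types on the page that also appear in the assumed eligibility set."""
    return sorted(set(json_ld_types(html)) & ASSUMED_RICH_RESULT_TYPES)
```

As noted in the comment above, this only checks that the JSON is valid and that @context and @type are present. It does not validate the required properties for any given type.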

@rviscomi rviscomi moved this from TODO to In Progress in Web Almanac 2019 Aug 27, 2019
@rviscomi rviscomi added the ASAP This issue is blocking progress label Sep 4, 2019
Web Almanac 2019 automation moved this from In Progress to Done Sep 17, 2019
@rviscomi rviscomi removed the ASAP This issue is blocking progress label Sep 25, 2019