
SEO 2020 queries #1062

Merged
merged 71 commits into from
Sep 24, 2020

Conversation

@Tiggerito (Contributor) commented Jul 19, 2020

SEO 2020 Queries (#908)

Checklist of the metrics needed

Crawlability & Indexability

  • HTTP Status Code
    Not possible as all home pages are 200 OK
  • Robots.txt - Status Code
    pages_robots_txt_by_device_and_status.sql - status_code
  • Robots.txt - Validity
    lighthouse.sql - robots_txt_valid
  • Robots.txt - Syntax Compliance
    Not sure how this is different from Validity?
  • Meta robots tags inclusion
    pages_wpt_bodies_by_device.sql - pct_has_meta_robots
  • X-robots tags inclusion
    pages_wpt_bodies_by_device.sql - pct_robots_has_x_robots_tag
  • Robots directives (generic vs UA-specific)
    pages_robots_txt_by_device_and_useragent.sql
  • Robots values
    pages_robots_txt_by_device_and_useragent.sql
  • % of pages with a canonical tag
    pages_wpt_bodies_by_device.sql - has_canonical
  • % of pages with a self canonical tag
    pages_wpt_bodies_by_device.sql - pct_has_self_canonical
  • % of pages with a canonicalized tag
    pages_wpt_bodies_by_device.sql - pct_is_canonicalized
  • % of pages specifying canonical via http header
    pages_wpt_bodies_by_device.sql - pct_http_canonical
  • % of pages with different canonical values in HTTP header and html tag
    pages_wpt_bodies_by_device.sql - pct_has_conflict_http_header_changed_canonical
  • Usage of Rendering strategies
  • Where the meta robots is included
    pages_wpt_bodies_by_device.sql - pct_has_meta_robots
    pages_wpt_bodies_by_device.sql - pct_has_x_robots_tag
    pages_wpt_bodies_by_device.sql - pct_has_meta_robots_and_x_robots_tag
  • Conflict between raw and rendered meta robots
    pages_wpt_bodies_by_device.sql - pct_rendering_changed_robots_meta_tag
  • Where the meta canonical is included
    pages_wpt_bodies_by_device.sql - pct_http_canonical
    pages_wpt_bodies_by_device.sql - pct_has_raw_canonical
    pages_wpt_bodies_by_device.sql - pct_has_rendered_canonical
    pages_wpt_bodies_by_device.sql - pct_has_rendered_but_not_raw_canonical
  • Conflict between raw and rendered canonical
    pages_wpt_bodies_by_device.sql - pct_has_conflict_rendering_changed_canonical
    pages_wpt_bodies_by_device.sql - pct_has_conflict_http_header_changed_canonical
    pages_wpt_bodies_by_device.sql - pct_has_conflict_http_header_or_rendering_changed_canonical
    pages_wpt_bodies_by_device.sql - pct_has_conflict_raw_rendered_hreflang
  • Where the hreflang is included
    pages_wpt_bodies_by_device.sql - pct_has_hreflang
    pages_wpt_bodies_by_device.sql - pct_has_rendered_hreflang
    pages_wpt_bodies_by_device.sql - pct_has_raw_hreflang
    pages_wpt_bodies_by_device.sql - pct_has_rendered_but_not_raw_hreflang
  • Conflict between raw and rendered hreflang
    pages_wpt_bodies_by_device.sql - pct_has_conflict_raw_rendered_hreflang
  • Where the metadata is included
    Not Available
  • Conflict between raw and rendered metadata
    Not Available
  • Where the structured data is included
    pages_wpt_bodies_by_device.sql - pct_has_raw_jsonld_or_microdata
    pages_wpt_bodies_by_device.sql - pct_has_rendered_jsonld_or_microdata
  • Conflict between raw and rendered structured data
    not really conflict
    pages_wpt_bodies_by_device.sql - has_only_rendered_jsonld_or_microdata
    pages_wpt_bodies_by_device.sql - pct_rendering_changes_structured_data

Content

  • % of pages with <title>
    pages_wpt_bodies_by_device.sql - pct_has_title_tag
  • <title> character count
    pages_wpt_bodies_by_device_and_percentile.sql - title_characters
    pages_wpt_bodies_by_device_and_percentile.sql - title_words
    pages_wpt_bodies_by_device_and_title_char_count.sql
  • % of pages with a meta description
    pages_wpt_bodies_by_device.sql - pct_has_meta_description
  • Meta description character count
    pages_wpt_bodies_by_device_and_percentile.sql - meta_description_characters
    pages_wpt_bodies_by_device_and_percentile.sql - meta_description_words
  • % of pages with h1
    pages_wpt_bodies_by_device.sql - pct_has_h1
  • % of pages with h2
    pages_wpt_bodies_by_device.sql - pct_has_h2
  • % of pages with h3
    pages_wpt_bodies_by_device.sql - pct_has_h3
  • % of pages with h4
    pages_wpt_bodies_by_device.sql - pct_has_h4
  • % of non-empty h1
    pages_wpt_bodies_by_device.sql - pct_has_non_empty_h1
  • % of non-empty h2
    pages_wpt_bodies_by_device.sql - pct_has_non_empty_h2
  • % of non-empty h3
    pages_wpt_bodies_by_device.sql - pct_has_non_empty_h3
  • % of non-empty h4
    pages_wpt_bodies_by_device.sql - pct_has_non_empty_h4
  • % of cases where h1 and <title> are the same
    pages_wpt_bodies_by_device.sql - pct_has_same_h1_title
  • Page word count
    pages_wpt_bodies_by_device_and_percentile.sql - visible_words_rendered
    pages_wpt_bodies_by_device_and_percentile.sql - visible_words_raw
    pages_wpt_bodies_by_device_and_word_count.sql (rendered)
  • Structured data - format used
    pages_wpt_bodies_structured_data_by_device_and_format.sql
  • Structured data - types used
    pages_wpt_bodies_structured_data_by_device_and_type.sql
  • Structured data - Implementation policy
    Probably covered by other things
  • % of pages with an img
    pages_markup_by_device.sql - pct_has_img
  • % of images with an alt
    pages_markup_by_device.sql - pct_images_with_img_alt_blank_or_present
  • % of pages using browser and framework-based image optimization configuration
    pages_markup_by_device_and_img_loading.sql - loading
  • % of pages with a video
    pages_almanac_by_device.sql - pct_has_videos
    pages_almanac_by_device_and_percentile.sql - videos_count

Links

  • No of outgoing links to internal pages (include subdomains)
    pages_wpt_bodies_by_device_and_percentile.sql - outgoing_links_internal
    pages_wpt_bodies_anchors_by_device_and_number_outgoing_internal_links.sql
  • No of outgoing links to external pages
    pages_wpt_bodies_by_device_and_percentile.sql - outgoing_links_external
    pages_wpt_bodies_anchors_by_device_and_number_outgoing_external_links.sql
  • No of links in mobile and desktop version
    pages_wpt_bodies_by_device_and_percentile.sql - outgoing_links
    pages_wpt_bodies_anchors_by_device_and_number_links.sql
  • No of rel
    pages_wpt_bodies_by_device_and_number_rel_attribute.sql
  • Text vs image links
    pages_wpt_bodies_text_vs_images_links_by_device.sql - not sure if this is enough?

Advanced

  • Type of mobile web implementation
    ?
  • Changes to core measures
    ?
  • CWS pass / fail
    Asked Performance chapter if they have this
  • LCP, FID, CLS; TTFB, FCP
    Performance chapter - web_vitals_by_device.sql
    Performance 2020 queries #1091 (comment)
    "Any metrics that depend on Lighthouse performance audits/scores should wait for the 2020_09_01 dataset results"
  • HTML lang inclusion
    Markup chapter - pages_almanac_by_device.sql - pct_no_html_lang_m404
  • Most common lang attributes
    Markup chapter - pages_almanac_by_device_and_html_lang.sql
  • Hreflang inclusion in link tags
    pages_wpt_bodies_by_device.sql - has_hreflang
  • Most common hreflang attributes (link tag)
    pages_wpt_bodies_hreflang_by_device_and_link_tag_value.sql
  • Hreflang tag inclusion in HTTP header
    pages_wpt_bodies_by_device.sql - pct_has_http_hreflang
  • Hreflang most common attributes in HTTP header
    pages_wpt_bodies_hreflang_by_device_and_http_header_value.sql
  • HTTPS usage
    pages_wpt_bodies_by_device.sql - pct_https
  • Browser-based security changes to CORS
    Don't know how
  • Contains an amp or ⚡ (emoji) attribute
    pages_markup_by_device.sql - pct_has_html_amp_or_emoji_attribute
  • Contains a rel amphtml tag
    pages_markup_by_device.sql - pct_has_rel_amphtml_tag
  • No of links via hashed URLs
    This is from last year and I think it's an out-of-date or flawed idea. I've asked the authors

Pulls in LCP, FID, CLS (Core Web Vitals) and calculates pass rates based on at least 75% of experiences being in the "good" range. Also pulls in TTFB, FCP and a speed score from last year's query. Data is grouped by device.
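The pass criterion can be sketched like this; the function and field names are hypothetical, not the actual SQL:

```javascript
// Hypothetical sketch of the pass criterion, not the actual SQL:
// a metric "passes" when at least 75% of real-user experiences fall
// in the "good" bucket for that metric.
function passesMetric(goodCount, totalCount) {
  if (totalCount === 0) return null; // no data: neither pass nor fail
  return goodCount / totalCount >= 0.75;
}

// A page passes Core Web Vitals overall only if LCP, FID and CLS all pass.
function passesCoreWebVitals(page) {
  return ['lcp', 'fid', 'cls'].every(
    (m) => passesMetric(page[m].good, page[m].total) === true
  );
}
```

The real query applies the same 75% cutoff in SQL, grouped by device.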
@Tiggerito (Contributor Author):

I just noticed that I should be converting pct to 0-100 values.

@rviscomi (Member):

Thanks @Tiggerito! I'll give it a more thorough review later.

Meanwhile could you also edit the top comment of this PR to include a checklist to show what queries are in this PR and what the remaining metrics to query are?

10_01_structured_data_rich_results_rendered.sql

You can remove the 10_##_ prefixes from the file names, that was the old metric naming scheme that we're doing away with this year in favor of descriptive file names.

I just noticed that I should be converting pct to 0-100 values.

I'm also on the fence about this pattern. (context)

@rviscomi rviscomi added the analysis Querying the dataset label Jul 19, 2020
@rviscomi rviscomi added this to TODO in 2020 via automation Jul 19, 2020
@rviscomi rviscomi added this to the 2020 Analysis milestone Jul 19, 2020
@rviscomi rviscomi requested a review from a team July 19, 2020 17:43
@Tiggerito (Contributor Author):

Thanks @Tiggerito! I'll give it a more thorough review later.

Meanwhile could you also edit the top comment of this PR to include a checklist to show what queries are in this PR and what the remaining metrics to query are?

I'm learning. I didn't realise all commits automatically become part of an ongoing pull request. So I need to keep the first comment up to date on all changes made.

We're still working on the metrics and all the queries needed. I believe the other analysts can now see this and can work on it, which is where I wanted to be.

Maybe next time we don't do the draft pull so early, just do the branch? Or is doing the pull important? I guess it triggers a review from the smart guys ;-)

10_01_structured_data_rich_results_rendered.sql

You can remove the 10_##_ prefixes from the file names, that was the old metric naming scheme that we're doing away with this year in favor of descriptive file names.

I left them in so that I could identify references to them from last year. Will we be updating almanac.js to list the full names of the scripts that use each property? We'd probably want to include the chapter name so people can find them and know who's using them. e.g.

// Used by:
// 2020/07_SEO/core_web_vitals_by_device.sql

Or maybe we just say which chapter

// Used by : SEO

almanac.js could do with a restructure as some properties use the '10_1' type syntax. Maybe they should also stick with descriptive names?


I just noticed that I should be converting pct to 0-100 values.

I'm also on the fence about this pattern. (context)

Gave it a thumbs up.

@rviscomi rviscomi removed the request for review from a team July 19, 2020 18:44
@rviscomi (Member) commented Jul 19, 2020

I'll remove the analyst reviewers for now. I thought it'd be good for everyone to peek at each others' work, but this PR isn't necessarily ready for review yet.

Maybe next time we don't do the draft pull so early, just do the branch? Or is doing the pull important?

Early is good. And having the draft PR open gives you a place to track progress for each metric using the checklist. It's ok to continuously add to and modify the branch/checklist as the chapter planning evolves. Hopefully in about a week things will be settled and you'll know which metrics need to be queried.

almanac.js could do with a restructure as some properties use the '10_1' type syntax. Maybe they should also stick with descriptive names?

Yes, this is an unfortunate vestige of last year's system. Not sure if it's better to leave them so the 2019 queries make sense or update them for consistency with 2020. In any case, the file name is not the best place to reference the related 2019 query - for that you can add it as a comment in the SQL, # See related: sql/2019/10_SEO/10_01.sql.

@foxdavidj (Contributor):

I can help out with reviewing this when it's ready for review @rviscomi

@Tiggerito Can you edit the name of the PR to reflect what it's for as well? By default it just uses the name of the first commit you made haha. Maybe something like: SEO 2020 Queries

@Tiggerito Tiggerito mentioned this pull request Jul 20, 2020
10 tasks
@tunetheweb tunetheweb changed the title from "Create core_web_vitals_by_device.sql" to "2020 SEO queries" Jul 20, 2020
Outdated review threads (resolved):
sql/2020/07_SEO/10_02_lang_attribute_raw.sql
sql/2020/07_SEO/10_04a_has_hreflang_raw.sql
sql/2020/07_SEO/10_04b_top_hreflang_values.sql
sql/2020/07_SEO/10_16_h1_length_percentiles_raw.sql
@@ -0,0 +1,98 @@
# how many json-ld scripts on a page
SELECT
ARRAY_LENGTH(REGEXP_EXTRACT_ALL(body, '(?i)<script[^>]*type=[\'"]?application\\/ld\\+json[\'"]?[^>]*>(.*?)<\\/script>')) AS jsonld_scripts_count,
@rviscomi (Member):

Proposed custom metric:

document.querySelectorAll('script[type="application/ld+json" i]').length

@Tiggerito (Contributor Author):

This one probably needs to be on the raw body as it's designed to find out if the tag's content parses into JSON. But thinking about it, I think I can read the raw text of a script in the DOM and try to parse it there. Will test.
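That check could look something like this (hypothetical helper, not the actual custom metric): given the text content of each JSON-LD script tag, count how many actually parse.

```javascript
// Hypothetical helper, not the actual almanac.js custom metric:
// takes the text content of <script type="application/ld+json"> tags
// (e.g. gathered via document.querySelectorAll in the DOM) and counts
// how many parse as valid JSON.
function countValidJsonLd(scriptTexts) {
  let valid = 0;
  let invalid = 0;
  for (const text of scriptTexts) {
    try {
      JSON.parse(text);
      valid++;
    } catch (e) {
      invalid++; // tag present but content is not parseable JSON
    }
  }
  return { valid, invalid };
}
```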

@Tiggerito (Contributor Author):

What does the i in the selector signify?

@tunetheweb (Member) Jul 21, 2020:

Case insensitivity. Plug the regex in here: https://regexr.com/ and make sure you choose PCRE instead of JavaScript and it should explain each part to you.

@Tiggerito (Contributor Author):

Sorry, talking about the CSS selector from @rviscomi. I thought that, but I tested and it's case-insensitive without it.

@Tiggerito (Contributor Author):

I just discovered that some attributes are case sensitive, so my gathering for meta tags is missing a whole bunch of data. I opened an issue for it:

HTTPArchive/legacy.httparchive.org#188

What's the best way to resolve this? Do I need to fork and fix to get it into the next crawl?

Member:

It'd be nice to fix for the future but it won't be included until the October crawl at the soonest, the results of which would be available too late to be included in the Almanac. I don't think the case sensitivity would significantly affect the results, but if that's a concern we can include a side note about it in the chapter.

@Tiggerito (Contributor Author):

I mentioned to the team that we might be able to do a raw html crawl at the end to get anything we missed. If not, the side note will work.

My concern is that older websites tended to shout (upper case) their html, so they would be the ones we miss.

Member:

Can we submit the PR to fix this now while it’s fresh in our mind? Really want to avoid what we did last year which was not think about custom metrics for 11 months and then have a complete blind panic for the last month! Would be really nice to work on issues like this (or even custom metrics we wish we’d added in hindsight!) as we analyse the 2020 results so we’ve a head start on 2021.

Would also mean you can have a quick look during (or at end of) October and if the stats are radically different then can decide what to do.

Fix looks relatively simple since we’re using Chrome for the crawl: https://stackoverflow.com/questions/38399089/ignoring-case-sensitiveness-in-queryselectorall

@Tiggerito (Contributor Author):

Will do

sql/2020/07_SEO/json-ld-scripts-raw.sql (outdated review thread, resolved)
Where possible I have changed the queries to use data from almanac.js. I've identified almanac.js queries by rendered_ and body queries by raw_.

I've tried to merge as many queries as possible based on the data source and format of the results. This should make it a lot easier to manage, as well as save on the number of queries made ($$$).

lighthouse.sql - all data from lighthouse
rendered_by_device.sql - rendered data from almanac.js grouped by device
rendered_percentiles_by_device.txt - rendered data from almanac.js reported by percentiles and grouped by device
raw_by_device.sql - data from the raw html - very expensive query when done on the real data - does not gather much at the moment. This could become our raw/rendered comparison query.

Other scripts are ones that have a unique output so can't be merged. Typically because they group by the field they are analysing.

At the moment most are pointing to test tables. But be careful. Always double check the cost before making a query as some old ones are still pointing to live tables. We could do with a test table for httparchive.almanac.summary_response_bodies as the table I'm currently testing with is in a different format.

We're starting to tie down the exact metrics needed, and get the almanac.js updated to fit those needs. My next move here is to see if I can get the comparisons data they want. This would be making a query joining several of the tables.
Added json_ld_scripts to other rendered queries
Moved no longer needed raw scripts into a sandbox folder.
renamed them
set to use test tables
merged where possible
@rviscomi (Member) commented Sep 10, 2020

WebPageTest Results is basically data to help me test. Do you not keep that sort of thing? I feel like it will give next year's analyst a clue.

I think it's ok to remove these, the results spreadsheet will be the publicly available place for query results.

Edit: Oh nevermind, I see these are the sample inputs to the queries, not the outputs. Anyway, I do still think they can be removed. BigQuery table previews can give future analysts sample data. If there are any gotchas in the data, README would be the best place to document those.

@rviscomi (Member):

Most of my real queries contain a test data section, controlled by a simple if (false) statement. They can add a lot to the JavaScript but seem to have no effect on query performance. Should I remove my testing system?

The way I look at the queries that we check into source control is to help readers scrutinize our methodology. For example, answering "how did you calculate that metric?" I think our queries should be optimized for legibility, so it's ok to have lots of descriptive comments for any parts of the query that can be confusing. IMO the testing system woven into the queries gets in the way of the meat of the queries, which is how the metrics are implemented. Don't get me wrong, it's a brilliant idea to have a toggle to switch back to sample data, but I think it's only useful to us now while we debug the 2020 queries and future analysts won't need to debug queries that are already working, so they'd be mostly superfluous.

So I'd recommend stripping the testing code out of the queries, leaving only the essentials.

@rviscomi rviscomi requested a review from a team September 10, 2020 16:21
@rviscomi (Member) left a comment

Reviewed the first query. Let me know what you think of those suggestions, then I'll continue with the other queries.

@@ -0,0 +1,78 @@
#standardSQL
@rviscomi (Member) Sep 10, 2020:

I have a few suggestions that I'll propose by way of a rewritten query.

  • I agree with your comment about using percentiles. For this query it makes sense to represent the % of images with alt attributes per page as a distribution.
  • Do the division in the JS function while you're at it, which also simplifies the return type. Function renamed accordingly.
  • You could simplify some of the defensive if statements by returning null in the catch block. Null values are ignored in the percentile aggregation by default.
  • Various whitespace tidying.
  • Uppercase for all keywords, like AS.
  • Remove all "TEST" logic.
#standardSQL
# Distribution of the coverage of alt attributes over all images per page.
CREATE TEMPORARY FUNCTION get_percent_alt_images(markup_string STRING)
RETURNS FLOAT64 LANGUAGE js AS '''
try {
  var markup = JSON.parse(markup_string);
  return markup.images.img.alt.present / markup.images.img.total;
} catch (e) {
  return null;
}
''';

SELECT
  client,
  percentile,
  ROUND(APPROX_QUANTILES(pct_alt, 1000)[OFFSET(percentile * 10)], 4) AS pct_alt
FROM ( 
  SELECT 
    _TABLE_SUFFIX AS client,
    get_percent_alt_images(JSON_EXTRACT_SCALAR(payload, '$._markup')) AS pct_alt
  FROM
    `httparchive.pages.2020_08_01_*`),
  UNNEST([10, 25, 50, 75, 90]) AS percentile
GROUP BY
  client,
  percentile
ORDER BY
  percentile,
  client

Results:

client percentile pct_alt
desktop 10 0.00%
mobile 10 0.00%
desktop 25 10.53%
mobile 25 9.09%
desktop 50 50.00%
mobile 50 50.00%
desktop 75 89.74%
mobile 75 89.47%
desktop 90 100.00%
mobile 90 100.00%

(to be honest, I'm very surprised that the median is exactly 50% - please let me know if you spot anything wrong with my query!)

@Tiggerito (Contributor Author):

I should have made the first query the last one. It was one I did not think was too valuable, so did not focus on it much.

I think I'd already added it into my percentile query. Will check and fold in your comments there.

Do the division in the JS function while you're at it, which also simplifies the return type. Function renamed accordingly.

A few reasons I leave some maths to the SQL:

  • SQL has a safe divide, so no exception on zeros. Not an issue with one return value, but it could break all JS result data with a single mistake.
  • Returning more data could come in handy for the SQL writer. e.g. I often return a count and not a boolean indicating non-zero. Again, to minimise churn if the SQL writer decided they wanted something else. Or to try and keep the data provided consistent between queries.
  • I have the feeling that SQL runs far faster than JS.

You could simplify some of the defensive if statements by returning null in the catch block. Null values are ignored in the percentile aggregation by default.

That script does not go as defensive as I normally go!

This goes against my own beliefs and what I think is best practice. Exceptions should not be used to handle normal code flow.

It's more apparent when you see some of my more complex JS. If I was not defensive, then an exception would mean all subsequent data calculations would be lost. A bug in an early bit of code could cause the loss of much data.

The pattern I use where I create the result object at the start and always return it (even in an exception) is deliberate. That way an exception does not stop all data from being returned (null).

I've also tried to make my exception/error handling provide visible clues in the results that things were broken. Returning something like a null means an issue is silently ignored. Returning an error message where possible really stands out.

That reminds me, I built in exception handling to my custom metrics. I should check that to see if they did have issues. I did that after noticing that a reasonable percentage of results from last year had blank custom metrics, indicating an exception happened and was not handled.
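A minimal sketch of that result-object pattern (field names are illustrative, not the real almanac.js metrics):

```javascript
// Illustrative sketch of the defensive pattern described above:
// create the result object first, fill it in section by section,
// and return whatever was gathered even if a later section throws.
function gatherMetrics(doc) {
  const result = { titles: null, metas: null, error: null };
  try {
    result.titles = doc.titles.length; // section 1
    result.metas = doc.metas.length;   // section 2 (may throw)
  } catch (e) {
    // Record a visible clue instead of silently losing everything.
    result.error = e.message;
  }
  return result; // earlier sections survive a later failure
}
```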

Other comments are sound and I will work on them today.

I'll double check the custom metrics for this to make sure there is nothing strange causing the 50% median.

@Tiggerito (Contributor Author):

It's hard to get my head around using percents to feed into percentiles 🤯

I've updated pages_markup_by_device_and_percentile.sql to include the percentile version of that query and have it pulling out a few values to help me check validity and understand things. Here's the result from a live run:

Row percentile client total img_count images_with_alt_present_percent images_with_alt_blank_percent images_with_alt_missing_percent images_with_alt_present images_with_alt_blank images_with_alt_missing  
1 10 desktop 5593642 2 0.0 0.0 0.0 0 0 0  
2 10 mobile 6347640 2 0.0 0.0 0.0 0 0 0  
3 25 desktop 5593642 8 0.1429 0.0 0.0 1 0 0  
4 25 mobile 6347640 8 0.125 0.0 0.0 1 0 0  
5 50 desktop 5593642 21 0.5385 0.1 0.0299 7 2 1  
6 50 mobile 6347640 19 0.5122 0.1111 0.0244 6 2 1  
7 75 desktop 5593642 43 0.9091 0.4906 0.3333 22 10 5  
8 75 mobile 6347640 39 0.9063 0.5 0.3333 20 9 5  
9 90 desktop 5593642 82 1.0 0.8261 0.8037 48 24 19  
10 90 mobile 6347640 73 1.0 0.8333 0.8235 43 22 18

The first column and last three columns are just counts. The others are the percents.

So I'm guessing it's saying things like (mobile):

The median page has 19 images, 10% have 2 or less, and 10% have 73 or more.

10% of pages have alt attributes present on all their images, while 10% have no images with the alt present. The median is to have 51% of alts present?

10% have 82% of their images without any alt attribute. 50% do well with less than 3% missing.

50% use less than 10% blank attributes.

Time to rest my brain and clean out all that test code.
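For the percentiles-of-percents reading above, a rough stand-in for what the aggregation does (exact nearest-rank over per-page values; the real queries use BigQuery's APPROX_QUANTILES):

```javascript
// Rough stand-in for the query's percentile aggregation, illustrative
// only. Each input value is one page's per-page percentage (e.g. % of
// images with alt present); nulls are skipped, as in the aggregation.
function percentile(values, p) {
  const sorted = values.filter((v) => v !== null).sort((a, b) => a - b);
  if (sorted.length === 0) return null;
  const idx = Math.min(
    sorted.length - 1,
    Math.floor((p / 100) * sorted.length)
  );
  return sorted[idx];
}
```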

@Tiggerito (Contributor Author):

Most of my real queries contain a test data section, controlled by a simple if (false) statement. They can add a lot to the JavaScript but seem to have no effect on query performance. Should I remove my testing system?

The way I look at the queries that we check into source control is to help readers scrutinize our methodology. For example, answering "how did you calculate that metric?" I think our queries should be optimized for legibility, so it's ok to have lots of descriptive comments for any parts of the query that can be confusing. IMO the testing system woven into the queries gets in the way of the meat of the queries, which is how the metrics are implemented. Don't get me wrong, it's a brilliant idea to have a toggle to switch back to sample data, but I think it's only useful to us now while we debug the 2020 queries and future analysts won't need to debug queries that are already working, so they'd be mostly superfluous.

So I'd recommend stripping the testing code out of the queries, leaving only the essentials.

I agree, it does distract from the main code.

On my first real query I found a bug that meant I had to quickly revert back to test mode, so what I had was handy. At least now the test code is available in version control, so I can quickly get it back again.

I think I will remove all test stuff and document my testing process in the README, including an indication on how to see my test versions. Can I label commits so it would be easy to find the last version before switching to live values? e.g. "contains test code".

I can also document how I get the custom metrics result data to help me test, and then remove my results folder.

@Tiggerito Tiggerito mentioned this pull request Sep 10, 2020
75 tasks
@Tiggerito (Contributor Author):

I was going to reference a commit that contained all my test code, so people could refer back to it to see how I did things. But I suspect all my commits are lost when it merges? So no record of my test code is kept?

testing image percentile query
removed commented out code
minor cosmetic changes
@Tiggerito (Contributor Author):

@rviscomi @bazzadp Do I need to hold off on gathering real data until this pull request is merged?

@rviscomi (Member) left a comment

Sorry for the delay @Tiggerito! Great work. This PR is most of the way there.

Similar to my feedback on the Markup PR, I do think these queries need to be split up by metric, not by table or grouping. I understand if you don't have the bandwidth to do that now, and I don't want to hold this up any further.

I've made a couple of comments about being more descriptive in the file names about what the underlying metrics are. These comments are applicable now for the queries that only contain a single metric. When we have time to split up the monolithic queries, each of the individual files should be similarly named more descriptively for the metric, so this is a step in that direction.

Other than that, most of my comments apply to how the metrics are aggregated. Grouping by INT64 values would effectively generate a histogram of granularity 1, which could lead to an unlimited number of values. Instead, to summarize distributions of values like this, we typically calculate as percentiles.

Once these comments are addressed we should be ready to run the queries, save the results to the Sheet, and start drafting the chapter. Thanks again for all the time you've spent on this, it will show through on the final result!

sql/2020/07_SEO/pages_almanac_by_device.sql (outdated review thread, resolved)
COUNT(0) AS total,

# Pages with videos
COUNTIF(almanac_info.videos_total > 0) AS has_videos,
Member:

Is there a reason to prefer this over reqVideo or bytesVideo in summary_pages?

@Tiggerito (Contributor Author):

Didn't know about them. How do I find out how they are generated?

This metric is a bit dodgy anyhow as it only finds video tags, and many people use other ways to show videos, like iframes. They are aware of its limitations.

Member:

The requests.sql query emulates how videos are detected:

function prettyType(mimeType, ext) {

These fields are the count of requests and sum of requests' bytes per page. Could be a useful proxy for this metric.

@Tiggerito (Contributor Author):

I did a quick test. I don't think it picks up the embedding of videos unless they auto play.

I'd class this one as a low priority metric anyhow. Video is not a major SEO thing. I'll make it clear that the results are limited (already indicated that). If they want something better we can negotiate.

sql/2020/07_SEO/pages_almanac_by_device_and_percentile.sql
_TABLE_SUFFIX AS client,
percentile,
url,
get_almanac_info(JSON_EXTRACT_SCALAR(payload, '$._almanac')) AS almanac_info
Member

Ditto on reqVideo/bytesVideo

sql/2020/07_SEO/pages_robots_txt_by_device_and_status.sql
SELECT
client,
COUNT(0) AS total,
ROUND(AVG(wpt_bodies_info.image_links)) AS image_links_avg,
Member

Ditto on percentiles

Contributor Author

@antoineeripret I'm not sure how this one works?

Contributor

@Tiggerito: what I did there, since I was aware that a frequency table would end up with a lot of possible values, was to apply an AVG() function to the returned results. Maybe not the best idea I've had.

We can change it to return percentiles instead. I'll have a look at that this afternoon (Spanish time).

Contributor Author

I've already added percentiles for those values in the percentiles script. Just want to make sure we can remove these?

Contributor

Ah, my bad! Yes, for sure!

Contributor Author

I've added some image vs. text link percentiles and dropped all the queries that are now covered by percentiles.

The requirements for "image vs. text links" are vague; in fact, that is the entire requirement. I've created separate percentiles for each, as well as a percentile of the percentage of links that are images. Hopefully that will cover what they asked for.

I think all is done.
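The "percentage of links that are images" metric described above could be sketched like this (the source subquery and the text_links field name are assumptions; only image_links appears verbatim in this PR's snippets):

```sql
-- Sketch: distribution of the share of a page's links that are image links.
-- `pages_with_link_counts` is a placeholder for the wpt_bodies extraction
-- that yields per-page image_links and text_links counts.
SELECT
  percentile,
  client,
  APPROX_QUANTILES(
    SAFE_DIVIDE(image_links, image_links + text_links),  -- NULL when a page has no links
    1000
  )[OFFSET(percentile * 10)] AS pct_image_links
FROM
  pages_with_link_counts,
  UNNEST([10, 25, 50, 75, 90]) AS percentile
GROUP BY
  percentile,
  client
ORDER BY
  percentile,
  client
```

SAFE_DIVIDE keeps pages with zero links from erroring out; they simply drop out of the quantiles as NULLs.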

Tiggerito and others added 8 commits September 17, 2020 08:35
Co-authored-by: Rick Viscomi <rviscomi@users.noreply.github.com>
Co-authored-by: Rick Viscomi <rviscomi@users.noreply.github.com>
…ibute.sql

Co-authored-by: Rick Viscomi <rviscomi@users.noreply.github.com>
…_type.sql

Co-authored-by: Rick Viscomi <rviscomi@users.noreply.github.com>
…eader_value.sql

Co-authored-by: Rick Viscomi <rviscomi@users.noreply.github.com>
…ag_value.sql

Co-authored-by: Rick Viscomi <rviscomi@users.noreply.github.com>
clean up hreflang values
@Tiggerito
Contributor Author

@rviscomi

I think all that's left is:

Removing the number of links queries (@antoineeripret are you ok with that?) and using the percentile results instead.

Maybe adding reqVideo/bytesVideo if that data looks to be of any use.

Working out how to do pages_wpt_bodies_text_vs_images_links_by_device.sql (@antoineeripret ?)

@antoineeripret
Contributor

antoineeripret commented Sep 17, 2020


@Tiggerito : I will have time to work on what is needed this afternoon. If you figure out how to do pages_wpt_bodies_text_vs_images_links_by_device.sql before, just let me know, otherwise just commit what you have so far and I'll continue :)

added image v text link percentiles
@rviscomi
Member

This is approved, so I'll merge and any last minute changes can be applied to a new PR. Great work everyone!

@rviscomi rviscomi merged commit f1d86c0 into main Sep 24, 2020
2020 automation moved this from In progress to Done Sep 24, 2020
@rviscomi rviscomi deleted the seo-sql-2020 branch September 24, 2020 04:08
Labels: analysis (Querying the dataset)
Projects: 2020 (Done)
6 participants