Analyst SQL files chapter SEO #103

Merged
merged 19 commits into from
Sep 15, 2019

Conversation

ymschaap
Contributor

@ymschaap ymschaap commented Jul 25, 2019

Re: #12
Re: #91

See my comment below for latest state of queries:

@ymschaap ymschaap changed the title Analyst SQL files chapter SEO - #12 Analyst SQL files chapter SEO Jul 25, 2019
sql/2019/10_SEO/10_05.sql Outdated
sql/2019/10_SEO/10_09.sql Outdated

#standardSQL

SELECT
Contributor Author

@rviscomi This query is 5.2 TB to test. I tried with the sample data (query below), but it seems pages_desktop_1k contains different URLs than pages_mobile_1k.

What I'm trying to do is compare the response URL of the mobile request with the response URL of the desktop request, and flag when they differ (e.g. a custom mobile site).

Member

That's right, a website may only exist in one of the tables. The full mobile table actually has 1M more websites than desktop.

Contributor Author

I am stuck on this one. To see if mobile is different from desktop, I look at the response and the 'redirectURL' field. If they don't match up, mobile is served a different URL.

Member

One possible approach may be to compare the page weights of desktop vs mobile sites having the same URL:

#standardSQL
SELECT
  APPROX_QUANTILES(desktop_bytes - mobile_bytes, 10) AS bytes_diff
FROM
  (SELECT url, bytesTotal as desktop_bytes FROM `httparchive.summary_pages.2019_07_01_desktop`)
JOIN
  (SELECT url, bytesTotal as mobile_bytes FROM `httparchive.summary_pages.2019_07_01_mobile`)
USING (url)

Ideally we'll see fewer bytes on mobile.
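As a rough illustration of what that query computes, here is a minimal Python sketch on made-up data. The URLs and byte counts are hypothetical, and `byte_diffs` is a name invented for this example. It mirrors the `USING (url)` join by keeping only URLs present on both sides, then takes decile cut points of the differences. BigQuery's `APPROX_QUANTILES(x, 10)` is approximate and also reports the minimum and maximum, so the output shape differs.

```python
from statistics import quantiles


def byte_diffs(desktop: dict[str, int], mobile: dict[str, int]) -> list[int]:
    """Desktop minus mobile bytes for URLs present in both inputs."""
    shared = sorted(desktop.keys() & mobile.keys())
    return [desktop[url] - mobile[url] for url in shared]


# Hypothetical byte totals keyed by URL; not taken from HTTP Archive.
desktop_bytes = {
    "https://a.example/": 2000,
    "https://b.example/": 1500,
    "https://c.example/": 900,   # desktop only, dropped by the join
    "https://e.example/": 3000,
}
mobile_bytes = {
    "https://a.example/": 1800,
    "https://b.example/": 1600,
    "https://d.example/": 700,   # mobile only, dropped by the join
    "https://e.example/": 2500,
}

diffs = byte_diffs(desktop_bytes, mobile_bytes)
# Nine cut points that split the differences into ten groups.
# method="inclusive" keeps the cut points within the observed range.
decile_cuts = quantiles(diffs, n=10, method="inclusive")
```

A positive difference means the desktop page was heavier than the mobile page for that URL.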

Another approach could be to detect media query usage:

#standardSQL
SELECT
  client,
  num_urls,
  pct_urls
FROM
  `httparchive.blink_features.usage`
WHERE
  yyyymmdd = '20190701' AND
  feature = 'CSSAtRuleMedia'

Contributor Author

The metric is more about answering whether websites serve a 'custom mobile' website. Page weight or media queries don't answer that imho.

sql/2019/10_SEO/10_05.sql Outdated
@rviscomi rviscomi self-assigned this Jul 25, 2019
@rviscomi rviscomi added the analysis Querying the dataset label Jul 25, 2019
@rviscomi rviscomi added this to TODO in Web Almanac 2019 via automation Jul 25, 2019
@rviscomi rviscomi added this to the Content written milestone Jul 25, 2019
@ymschaap
Contributor Author

ymschaap commented Jul 31, 2019

Another round. More work than expected; they all have their own quirks.

sql/2019/10_SEO/10_01.sql Outdated
sql/2019/10_SEO/10_02.sql Outdated
sql/2019/10_SEO/10_03.sql Outdated
sql/2019/10_SEO/10_04.sql Outdated
sql/2019/10_SEO/10_05.sql Outdated
sql/2019/10_SEO/10_10.sql Outdated
sql/2019/10_SEO/10_11.sql Outdated

sql/2019/10_SEO/10_15.sql Outdated
sql/2019/10_SEO/10_15.sql Outdated
@rviscomi
Member

rviscomi commented Aug 9, 2019

Is this ready for another review?

@ymschaap
Contributor Author

ymschaap commented Aug 9, 2019

Yes.

  • 10.13 I still want to try comparing the responseUrl for desktop and mobile devices, but I can't get it to work. Your proposals (by page weight or media query) wouldn't answer much for SEO purposes. Whether the domain returns a custom mobile endpoint would (e.g. m.wikipedia.org vs www.wikipedia.org).
  • 10.15 I'm too unfamiliar with CrUX metrics, e.g. the 10% hurdle. But I tried.
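The check described in 10.13 could be sketched like this in Python, outside BigQuery. This is only an illustration under assumptions: the function names are invented for the example, the sample URLs are hypothetical rather than measured, and it presumes the final response URL for each device type has already been extracted for the same page.

```python
from urllib.parse import urlsplit


def host(url: str) -> str:
    """Lower-cased hostname of a URL, or an empty string if it has none."""
    return (urlsplit(url).hostname or "").lower()


def serves_custom_mobile_url(desktop_final_url: str, mobile_final_url: str) -> bool:
    """True when the mobile crawl ended on a different host than the desktop crawl."""
    return host(desktop_final_url) != host(mobile_final_url)


# Hypothetical (desktop final URL, mobile final URL) pairs keyed by requested page.
final_urls = {
    "https://www.wikipedia.org/": ("https://www.wikipedia.org/", "https://m.wikipedia.org/"),
    "https://example.com/": ("https://example.com/", "https://example.com/"),
}

flags = {
    page: serves_custom_mobile_url(desktop, mobile)
    for page, (desktop, mobile) in final_urls.items()
}
```

Comparing hosts only is a deliberate simplification; a mobile site served from a separate path on the same host would not be flagged.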

Contributor

@patrickhulce patrickhulce left a comment

apologies if these points were already discussed and not necessary :)

sql/2019/10_SEO/10_01.sql Outdated

SELECT
COUNTIF(hasAmpLink(payload)) AS score_sum,
COUNTIF(hasAmpLink(payload)) / COUNT(0) AS score_percentage
Contributor

I think we want all the percentage queries as ROUND(numerator * 100 / denominator, 2)
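For illustration, the arithmetic behind that suggestion, sketched in Python with hypothetical counts (the helper name `pct` and the numbers are made up; they stand in for `COUNTIF(...)` and `COUNT(0)`):

```python
def pct(numerator: int, denominator: int) -> float:
    """Share of numerator in denominator, as a percentage with two decimals."""
    return round(numerator * 100 / denominator, 2)


# Hypothetical counts standing in for COUNTIF(...) and COUNT(0).
pages_with_amp_link = 1
total_pages = 3

# What a plain `COUNTIF(...) / COUNT(0)` yields: an unrounded fraction.
fraction = pages_with_amp_link / total_pages
# What the suggested ROUND(numerator * 100 / denominator, 2) form yields.
percentage = pct(pages_with_amp_link, total_pages)
```

The rounded percentage form is easier to read in result tables than the raw fraction.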

sql/2019/10_SEO/10_04b.sql
COUNTIF(parseStructuredData(payload)) AS occurence,
ROUND(COUNTIF(parseStructuredData(payload)) * 100 / SUM(COUNT(0)) OVER (), 2) AS occurence_perc
FROM
`httparchive.pages.2019_07_01_*`
Contributor

Rick recommended grouping by _TABLE_SUFFIX AS client in all of my queries. I'm not sure if that's something you might be considering as well.
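To show what the per-client grouping changes, here is a small Python sketch on hypothetical rows. Each row is a `(client, matched)` pair, where `client` plays the role of `_TABLE_SUFFIX` and `matched` stands in for the result of a check such as `parseStructuredData(payload)`. Using the per-client page count as the denominator is an assumption about what is wanted, not something stated in the thread.

```python
from collections import defaultdict


def pct_by_client(rows: list[tuple[str, bool]]) -> dict[str, tuple[int, float]]:
    """Per client: (number of matching pages, percentage of that client's pages)."""
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for client, matched in rows:
        totals[client] += 1
        if matched:
            hits[client] += 1
    return {c: (hits[c], round(hits[c] * 100 / totals[c], 2)) for c in totals}


# Hypothetical rows; not taken from the HTTP Archive tables.
rows = [
    ("desktop", True),
    ("desktop", False),
    ("mobile", True),
    ("mobile", True),
    ("mobile", False),
]

result = pct_by_client(rows)
```

Without the grouping, desktop and mobile pages would be pooled into a single count and percentage.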

Member

+1

@ymschaap can you resolve this feedback ASAP?

sql/2019/10_SEO/10_15.sql Outdated
@rviscomi rviscomi moved this from TODO to In Progress in Web Almanac 2019 Aug 27, 2019
@rviscomi
Member

rviscomi commented Sep 3, 2019

@ymschaap have you resolved @patrickhulce's feedback? Let me know if this is ready for another review.

@rviscomi rviscomi added the ASAP This issue is blocking progress label Sep 4, 2019
@rviscomi rviscomi merged commit f0baaff into master Sep 15, 2019
Web Almanac 2019 automation moved this from In Progress to Done Sep 15, 2019
@rviscomi rviscomi deleted the analyst-ymschaap branch September 15, 2019 08:16
@rviscomi rviscomi removed the ASAP This issue is blocking progress label Sep 25, 2019
Labels
analysis Querying the dataset
3 participants