Ecommerce 2020 queries #1135

Merged (10 commits) on Oct 15, 2020

jrharalson (Contributor) commented Jul 31, 2020

Progress on #914

Initial check-in PR/placeholder

@jrharalson changed the title from "Update README.md" to "Ecommerce Queries 2020" (Jul 31, 2020)
@jrharalson changed the title from "Ecommerce Queries 2020" to "Ecommerce 2020 queries" (Jul 31, 2020)
@rviscomi marked this pull request as draft (August 1, 2020 18:08)
@rviscomi added the "analysis" (Querying the dataset) label (Aug 1, 2020)
@rviscomi added this to the "2020 Analysis" milestone (Aug 1, 2020)
rviscomi (Member) commented Sep 3, 2020

Hey @jrharalson. I know this isn't marked for review yet, but I wanted to give you a quick heads up about something. There's a new SQL naming convention this year where we'll be naming the queries more descriptively based on what they're analyzing as opposed to chapter/metric numbers. So for example 13_01.sql could be top_vendors.sql. This makes them more discoverable later.

rviscomi (Member) commented

Be sure to update this from "Draft" to "Ready for review" so we can get more eyes on it

@rviscomi marked this pull request as ready for review (September 19, 2020 23:44)
@rviscomi requested a review from a team (September 19, 2020 23:44)
SELECT
client,
ecomm,
AVG(ROUND(SAFE_DIVIDE(fast_lcp, fast_lcp + avg_lcp + slow_lcp) * 100, 2)) AS fast,
Member

Rather than averages, I'd propose using one of these alternatives:

  • use the 10th, 25th, 50th, 75th, and 90th percentiles to summarize the distribution of pages' fast/avg/slow LCP
  • calculate the percent of pages with good LCP (fast / (fast + avg + slow) >= 75%) and poor LCP (slow / (fast + avg + slow) >= 25%), with "needs improvement" being everything else
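
The second alternative could be sketched roughly as follows in BigQuery standard SQL. This is a minimal illustration, not the query that was merged: the `pages` CTE holds made-up per-page LCP densities (the URLs and numbers are hypothetical), and the column names `fast_lcp`, `avg_lcp`, and `slow_lcp` are taken from the snippet under review. It also assumes a page is only counted as poor if it is not already good.

```sql
-- Hypothetical per-page LCP densities standing in for the real dataset.
WITH pages AS (
  SELECT * FROM UNNEST([
    STRUCT('mobile' AS client, 'https://a.example/' AS url,
           0.80 AS fast_lcp, 0.15 AS avg_lcp, 0.05 AS slow_lcp),  -- good
    STRUCT('mobile', 'https://b.example/', 0.40, 0.30, 0.30),     -- poor
    STRUCT('mobile', 'https://c.example/', 0.60, 0.25, 0.15),     -- needs improvement
    STRUCT('desktop', 'https://a.example/', 0.90, 0.08, 0.02)     -- good
  ])
),

-- Classify each page once, so the thresholds live in a single place.
classified AS (
  SELECT
    client,
    SAFE_DIVIDE(fast_lcp, fast_lcp + avg_lcp + slow_lcp) >= 0.75 AS is_good,
    SAFE_DIVIDE(slow_lcp, fast_lcp + avg_lcp + slow_lcp) >= 0.25 AS is_slow
  FROM
    pages
)

SELECT
  client,
  COUNT(0) AS total,
  ROUND(COUNTIF(is_good) * 100 / COUNT(0), 2) AS pct_good,
  -- "Needs improvement" is everything that is neither good nor poor.
  ROUND(COUNTIF(NOT is_good AND NOT is_slow) * 100 / COUNT(0), 2) AS pct_ni,
  ROUND(COUNTIF(NOT is_good AND is_slow) * 100 / COUNT(0), 2) AS pct_poor
FROM
  classified
GROUP BY
  client
```

Unlike averaging per-page percentages, this reports how many pages fall into each bucket, which is what the review comment asks for.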

Member

The second suggestion can be easily adapted from the CMS chapter's query by replacing WHERE category = 'CMS' with 'Ecommerce'. This gives you LCP, FID, CLS, and the overall CWV assessment.

You can also adapt this query, which reports good/needs-improvement/poor speeds for each metric.

AND
IF(device = 'desktop', 'desktop', 'mobile') = client
WHERE
date = '2020-07-01' AND
Member

Suggested change
- date = '2020-07-01' AND
+ date = '2020-08-01' AND

ROUND(APPROX_QUANTILES(bytesHtml, 1000)[OFFSET(750)] / 1024, 2) AS p75,
ROUND(APPROX_QUANTILES(bytesHtml, 1000)[OFFSET(900)] / 1024, 2) AS p90
FROM
`httparchive.summary_pages.2020_07_01_*`
Member

(Throughout)

Suggested change
- `httparchive.summary_pages.2020_07_01_*`
+ `httparchive.summary_pages.2020_08_01_*`

USING
(client, page)
WHERE
NET.HOST(url) IN (SELECT domain FROM `httparchive.almanac.third_parties`)
Member

(Throughout)

Suggested change
- NET.HOST(url) IN (SELECT domain FROM `httparchive.almanac.third_parties`)
+ NET.HOST(url) IN (SELECT domain FROM `httparchive.almanac.third_parties` WHERE date = '2020-08-01')

_TABLE_SUFFIX AS client,
vendor,
app,
COUNTIF(category = 'AMP') AS AMPfromFreq,
Member

Is AMP a category or an app?

COUNT(0) AS total,
ROUND(COUNTIF(category = 'CDN') * 100 / COUNT(0), 2) AS pct
FROM
`httparchive.sample_data.technologies_*`
Member

And group by _TABLE_SUFFIX

Suggested change
- `httparchive.sample_data.technologies_*`
+ `httparchive.technologies.2020_08_01_*`
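
Taken together, the two requests in this comment (query the full August tables and group by `_TABLE_SUFFIX`) might look like the sketch below. Only the select expressions and table names come from the fragments quoted in this review. The rest of the original query is not shown, so the overall shape is an assumption. Note that this runs against the full dataset rather than the sample tables, so it scans more data.

```sql
SELECT
  -- With a wildcard table, _TABLE_SUFFIX holds the part matched by `*`,
  -- which the queries in this PR alias as `client`.
  _TABLE_SUFFIX AS client,
  COUNT(0) AS total,
  ROUND(COUNTIF(category = 'CDN') * 100 / COUNT(0), 2) AS pct
FROM
  `httparchive.technologies.2020_08_01_*`
GROUP BY
  client
```

Without the `GROUP BY`, selecting `_TABLE_SUFFIX` alongside the aggregates is rejected by BigQuery, since it is neither aggregated nor grouped.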

jrharalson (Contributor, Author) left a comment

Thanks for the feedback. I made a few updates and pushed a few others that were stuck pending a check-in. I still need to rework the LCP SQL later.

rviscomi (Member) commented

@jrharalson sounds good. Let me know when this is ready for another review.

rviscomi (Member) commented Oct 6, 2020

@jrharalson how's this coming along?

jrharalson (Contributor, Author) commented

Updated to the latest CrUX style.

rviscomi (Member) left a comment

Just one small correction, then this should be good to go

sql/2020/16_Ecommerce/pagestats_image_bydevice_format.sql (outdated, resolved)
@rviscomi merged commit 9298d9d into HTTPArchive:main on Oct 15, 2020