Merge the satellite base data into the merged table #236

Merged 4 commits on Sep 8, 2023

Conversation

@ohnorobo (Collaborator) commented May 19, 2023

dashboard using this table: https://lookerstudio.google.com/c/u/0/reporting/d5506fb3-c9f7-42c6-894a-55e4b0528c8e/page/p_011unrxq4c

Currently the number of satellite rows is much larger than for any other data source
[image]
probably because we're including the hostname, which splits rows that could otherwise be merged into many.
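
To make the suspected cause concrete, here is a minimal sketch of how a grouping key that includes the resolver hostname multiplies output rows. The sample measurements, the tuple layout, and the `grouped_counts` helper are all hypothetical and only illustrate the effect; they are not taken from the pipeline.

```python
from collections import Counter

# Hypothetical measurements: (date, country_code, resolver hostname, domain).
# Three resolvers under one registered domain probe the same site on one day.
measurements = [
    ("2023-05-01", "US", "ns1.example.com", "site.test"),
    ("2023-05-01", "US", "ns2.example.com", "site.test"),
    ("2023-05-01", "US", "ns3.example.com", "site.test"),
]


def grouped_counts(rows, key):
    """Count rows per grouping key, like SELECT key, COUNT(*) ... GROUP BY key."""
    return Counter(key(row) for row in rows)


# Grouping key that includes the resolver hostname: one output row per resolver.
with_hostname = grouped_counts(measurements, lambda r: (r[0], r[1], r[2], r[3]))

# Grouping key without the hostname: the three measurements merge into one row.
without_hostname = grouped_counts(measurements, lambda r: (r[0], r[1], r[3]))

print(len(with_hostname), len(without_hostname))
```

With this sample data the hostname-keyed grouping yields three rows and the other yields one, which is the kind of inflation described above.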

resolver_name AS server_name,
domain_is_control,
domain,
NULL as outcome,
Collaborator:
Calculate the outcome here, and remove received_rcode and answers

Collaborator Author:
done

answers
FROM `PROJECT_NAME.BASE_DATASET.satellite_scan` AS a
# Only include the last measurement in any set of retries
JOIN `PROJECT_NAME.DERIVED_DATASET.satellite_last_measurement_ids` AS b
Collaborator:
Move the filtering and the outcome calculation into an earlier WITH ... AS statement, so that every table is already pre-filtered and has an outcome by the time it reaches this UNION ALL query. Easier to reason about.

Collaborator Author:
done
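
The structure requested in this thread can be sketched outside SQL as follows: each source gets its own preparation step that filters rows and attaches an outcome, and the union step only concatenates rows of one shape. This is an illustration in Python, not the query itself. The `measurement_id` key, the placeholder outcome labels, the `"HTTPS"` source value, and every function name are assumptions made for the example.

```python
def satellite_outcome(row):
    """Placeholder outcome; the real CASE expression is more detailed."""
    return "has_answers" if row["answers"] else "no_answers"


def prepare_satellite(rows, last_measurement_ids):
    """Satellite-specific step: drop retried measurements, then attach an outcome."""
    return [
        {"source": "SATELLITE", "domain": row["domain"], "outcome": satellite_outcome(row)}
        for row in rows
        if row["measurement_id"] in last_measurement_ids
    ]


def prepare_hyperquack(rows):
    """Assumption: these rows arrive with an outcome already set."""
    return [
        {"source": row["source"], "domain": row["domain"], "outcome": row["outcome"]}
        for row in rows
    ]


def merge_sources(hyperquack_rows, satellite_rows, last_measurement_ids):
    """The union step only concatenates rows that already share one shape."""
    return prepare_hyperquack(hyperquack_rows) + prepare_satellite(
        satellite_rows, last_measurement_ids
    )


hyperquack_rows = [{"source": "HTTPS", "domain": "site.test", "outcome": "ok"}]
satellite_rows = [
    {"measurement_id": "a1", "domain": "site.test", "answers": []},
    {"measurement_id": "a2", "domain": "site.test", "answers": ["192.0.2.1"]},
]
merged = merge_sources(hyperquack_rows, satellite_rows, last_measurement_ids={"a2"})
```

Keeping the per-source logic in the preparation functions means anything written after the merge can assume the same three fields on every row.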

), Grouped AS (
SELECT
date,
source,
server_country AS country_code,
# As per https://docs.censoredplanet.org/dns.html#id2, some resolvers are named `special` instead of the real hostname.
IF(server_name="special","special",NET.REG_DOMAIN(server_name)) as reg_hostname,
Collaborator:
Move this to the Satellite-specific query. Let's have only code that applies to all sources after the union.

Collaborator Author:
done
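
For readers unfamiliar with the `reg_hostname` expression discussed in this thread, here is a rough Python rendering of its logic. The `naive_reg_domain` helper is a hypothetical stand-in: it just keeps the last two labels, whereas BigQuery's `NET.REG_DOMAIN` relies on public suffix data, so this sketch gives wrong answers for suffixes such as `co.uk`.

```python
def naive_reg_domain(hostname):
    """Crude stand-in for NET.REG_DOMAIN: keep the last two labels.

    Assumption for illustration only; it ignores multi-label public suffixes.
    """
    labels = hostname.rstrip(".").split(".")
    return ".".join(labels[-2:])


def reg_hostname(server_name):
    """Mirror IF(server_name = "special", "special", NET.REG_DOMAIN(server_name))."""
    if server_name == "special":
        # Some resolvers are named `special` instead of a real hostname.
        return "special"
    return naive_reg_domain(server_name)


print(reg_hostname("special"), reg_hostname("ns1.example.com"))
```

The special case has to come first, because `special` is not a hostname and has no registered domain to extract.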

@ohnorobo changed the title from "Merge the satellite base data into the marged table" to "Merge the satellite base data into the merged table" on Jun 2, 2023
@ohnorobo force-pushed the merge-queries branch 2 times, most recently from 99a6cf0 to af4f81e, on June 2, 2023 09:55
@ohnorobo marked this pull request as ready for review on June 2, 2023 15:59
@ohnorobo requested a review from fortuna on June 6, 2023 13:20
@@ -42,8 +42,7 @@
# The test table is written into the <project>:test dataset
BEAM_TEST_BASE_DATASET = 'test'
BEAM_TEST_BASE_TABLE_SUFFIX = '_scan'
-DERIVED_TABLE_NAME_HYPERQUACK = 'merged_reduced_scans_v2'
-DERIVED_TABLE_NAME_SATELLITE = 'reduced_satellite_scans_v1'
+DERIVED_TABLE_NAME = 'merged_reduced_scans_v3'
Collaborator:
Not blocking, but ideally we would call it DASHBOARD_TABLE_NAME and change the dataset name to dashboard.

WHEN (SELECT LOGICAL_OR(answer.http_analysis_is_known_blockpage)
FROM UNNEST(answers) answer)
THEN CONCAT("❗️page:http_blockpage:", answers[OFFSET(0)].http_analysis_page_signature)
WHEN (SELECT LOGICAL_OR(answer.https_analysis_is_known_blockpage)
Collaborator:
This can't possibly trigger, since we will trigger the cert cases first.
Remove?

Collaborator Author:
done
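
The reasoning in this thread rests on first-match semantics: a CASE expression returns the result of the first branch whose condition holds, so a later branch is dead code if an earlier one always fires alongside it. A small sketch of that argument follows. The branch labels and the rule that an HTTPS blockpage never appears without a cert case are assumptions taken from the review comment, not verified against the data.

```python
from itertools import product


def outcome(cert_case, https_blockpage):
    """First matching branch wins, as in a SQL CASE expression."""
    if cert_case:
        return "cert_case"
    if https_blockpage:
        return "https_blockpage"  # only reached when cert_case is False
    return "other"


# Assumption for illustration: an HTTPS blockpage never appears without a cert case.
possible_inputs = [
    (cert_case, https_blockpage)
    for cert_case, https_blockpage in product([False, True], repeat=2)
    if cert_case or not https_blockpage
]

# Collect every outcome that can actually be produced under that assumption.
reachable = {outcome(c, h) for c, h in possible_inputs}
print(sorted(reachable))
```

Under that assumption the `https_blockpage` label never shows up in `reachable`, which is why removing the branch does not change any result.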

WHEN (SELECT LOGICAL_OR(answer.matches_control.asn)
FROM UNNEST(answers) answer)
THEN "✅answer:matches_asn"
ELSE CONCAT("❓answer:not_validated:", AnswersSignature(answers))
Collaborator:
TODO: rename to "fetch failed" or something clearer. This indicates we were not able to establish a TLS connection which is clearly unexpected.

Collaborator Author:
This was worked on separately in #247.

@ohnorobo force-pushed the merge-queries branch 2 times, most recently from 2e78ebf to 3c781ed, on June 23, 2023 15:44
@ohnorobo force-pushed the merge-queries branch 2 times, most recently from ea13b45 to 15e914e, on September 6, 2023 10:26
@ohnorobo force-pushed the merge-queries branch 3 times, most recently from 4529b4b to 68e416b, on September 6, 2023 11:54
@ohnorobo merged commit 4fc780c into master on Sep 8, 2023
1 check passed
2 participants