diff --git a/docs/base_tables.md b/docs/base_tables.md
index 5df31f9..24123e2 100644
--- a/docs/base_tables.md
+++ b/docs/base_tables.md
@@ -11,12 +11,13 @@ There is one table for each scan type.
 - `firehook-censoredplanet:base.discard_scan`
 - `firehook-censoredplanet:base.http_scan`
 - `firehook-censoredplanet:base.https_scan`
+- `firehook-censoredplanet:base.satellite_scan`
 
 ## Partitioning and Clustering
 
 The tables are time-partitioned along the `date` field.
 
-The tables are clustered along the `country` and then `asn` fields.
+The tables are clustered along the `server_country` (`resolver_country` for Satellite) and then `server_asn` (`resolver_asn` for Satellite) fields.
 
 ## Table Format
 
@@ -82,6 +83,10 @@ The json data is processed into a flat table format which looks like this.
 
 We intend to add more columns in the future.
 
+
+
+
+
 ## Original Data Format
 
 The Censored Planet data is stored in .json files with one measurement per line.
@@ -139,4 +144,45 @@ Data from before 2021-04-25 is parsed from the [Hyperquack V1 format](https://gi
   "stateful_block": false,
   "tag": "2021-05-30T01:01:01"
 }
-```
\ No newline at end of file
+```
+
+### DNS Data
+
+The DNS (Satellite) data includes the following alternative set of columns, many of which are identical to the Hyperquack columns.
+
+- `domain` STRING NULLABLE
+- `domain_category` STRING NULLABLE
+- `domain_is_control` BOOLEAN NULLABLE
+- `date` DATE NULLABLE
+- `start_time` TIMESTAMP NULLABLE
+- `end_time` TIMESTAMP NULLABLE
+- `retry` INTEGER NULLABLE
+- `resolver_ip` STRING NULLABLE
+- `resolver_name` STRING NULLABLE
+- `resolver_is_trusted` BOOLEAN NULLABLE
+- `resolver_netblock` STRING NULLABLE
+- `resolver_asn` INTEGER NULLABLE
+- `resolver_as_name` STRING NULLABLE
+- `resolver_as_full_name` STRING NULLABLE
+- `resolver_as_class` STRING NULLABLE
+- `resolver_country` STRING NULLABLE
+- `resolver_organization` STRING NULLABLE
+- `resolver_non_zero_rcode_rate` FLOAT NULLABLE
+- `resolver_private_ip_rate` FLOAT NULLABLE
+- `resolver_zero_ip_rate` FLOAT NULLABLE
+- `resolver_connect_error_rate` FLOAT NULLABLE
+- `resolver_invalid_cert_rate` FLOAT NULLABLE
+- `received_error` STRING NULLABLE
+- `received_rcode` INTEGER NULLABLE
+- `answers` RECORD REPEATED
+- `success` BOOLEAN NULLABLE
+- `anomaly` BOOLEAN NULLABLE
+- `domain_controls_failed` BOOLEAN NULLABLE
+- `average_confidence` FLOAT NULLABLE
+- `untagged_controls` BOOLEAN NULLABLE
+- `untagged_response` BOOLEAN NULLABLE
+- `excluded` BOOLEAN NULLABLE
+- `exclude_reason` STRING NULLABLE
+- `has_type_a` BOOLEAN NULLABLE
+- `measurement_id` STRING NULLABLE
+- `source` STRING NULLABLE
\ No newline at end of file
diff --git a/docs/merged_reduced_scans_table.md b/docs/merged_reduced_scans_table.md
index 9aa456e..df626fc 100644
--- a/docs/merged_reduced_scans_table.md
+++ b/docs/merged_reduced_scans_table.md
@@ -38,4 +38,5 @@ Reduced Scans
 | outcome | STRING | What was the [outcome](outcome.md) of the individual measurement eg `read/timeout` |
 | count | INTEGER | How many measurements fit the exact pattern of this row? |
 | unexpected_count | INTEGER | Count of measurements with an unexpected outcome |
-
+| hostname | STRING | The domain name of the DNS resolver (DNS only) eg `ns1.uts.ae` |
+| reg_hostname | STRING | The domain name of the DNS resolver with subdomains removed (DNS only) eg `uts.ae` |
diff --git a/docs/outcome.md b/docs/outcome.md
index 5a40930..746596c 100644
--- a/docs/outcome.md
+++ b/docs/outcome.md
@@ -84,5 +84,32 @@ Mismatch Errors are used when the connection is successful, but the content rece
 
 ## DNS Outcomes
 
-The Satellite data uses its own unique set of outcomes, and does not use stages.
+The Satellite data uses its own unique set of outcomes, and does not use stages. The outcomes are based on
+| Outcome | Explanation |
+| ----------------------- | ----------- |
+| ✅ip.matchip | |
+| ✅ip.matchasn | |
+| ip.invalid | |
+| ip.empty | |
+| ✅tls.validcert | |
+| tls.connerror | |
+| tls.baddomain | |
+| tls.badca | |
+| blockpage | |
+| dns.connrefused | |
+| dns.error | |
+| dns.hostunreach | |
+| dns.msgsize | |
+| dns.timedout | |
+| | |
+| | |
+| | |
+| | |
+| | |
+| | |
+| | |
+| | |
+| | |
+| | |
+1