Revamping Censored Planet Measurements - Part 1 - Satellite #12

ramakrishnansr opened this issue Apr 5, 2021 · 0 comments
Over the past few months, we at Censored Planet have been busy working on improving and revamping our remote measurements, with an emphasis on performing faster, more accurate measurements that can be of value to the community. In a series of blog posts, we plan to provide an overview of the changes we’ve made so far to our measurements, so that the community is able to understand and use our data accurately, and provide feedback on how we can improve our data.

Today’s post is about Satellite/Iris, Censored Planet’s remote measurement technique that detects DNS interference using open DNS resolvers. Below, we provide an overview of Satellite and the major changes we’ve recently made to the technique and data format. Refer to our academic papers for in-depth details about Satellite.

Satellite-v1

Figure 1: Overview of Satellite-v1

Satellite-v1 is the first version of Satellite, which we operated from August 2018 to February 2021. The primary function of Satellite is to detect incorrect DNS resolutions from open DNS resolvers in many countries. The overall workflow of Satellite-v1 is shown in Figure 1.

  1. From a measurement machine at the University of Michigan, we send a DNS query for a website whose reachability we’re interested in to an open DNS resolver in a country of interest (1). The response from the DNS resolver is our test IP (2).
  2. We also send a DNS query for the same website to trusted control resolvers (3), and record their response as the control IP (4).
  3. We then compare the test and control responses using several heuristics, including a direct IP address comparison, and comparison of the AS number, AS names, HTTP content hashes, and TLS certificates associated with the test and control IP addresses (5). Satellite-v1 only labels a measurement as an anomaly when all of the heuristics mismatch.
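
The "all heuristics must mismatch" rule in step 3 can be illustrated with a minimal Python sketch. The `IPMetadata` class, its field names, and the treatment of missing metadata are assumptions made for illustration; they are not taken from Satellite's code or data format.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class IPMetadata:
    """Metadata attached to one resolved IP (hypothetical field names)."""
    ip: str
    asn: Optional[int] = None
    as_name: Optional[str] = None
    http_hash: Optional[str] = None
    cert_hash: Optional[str] = None


# One attribute per heuristic listed in step 3.
HEURISTICS = ("ip", "asn", "as_name", "http_hash", "cert_hash")


def is_anomaly(test: IPMetadata, controls: Sequence[IPMetadata]) -> bool:
    """Return True only when every heuristic fails to match any control answer."""
    for name in HEURISTICS:
        value = getattr(test, name)
        if value is None:
            continue  # assumption: missing metadata can never count as a match
        if any(value == getattr(control, name) for control in controls):
            return False  # one matching heuristic is enough to clear the response
    return True
```

Under this rule, a test IP that differs from the control IP but sits in the same AS is not labeled an anomaly, which is why a plain IP comparison alone would overcount.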

Our various publications and reports have used Satellite-v1 to detect many cases of DNS manipulation. For instance, in our recent investigation into the filtering of COVID-19 websites, Satellite-v1 found many networks using website filtering products to manipulate DNS responses of COVID-related websites.

Data Format

We have recently released documentation describing the data format of Satellite-v1 data, and how it can be interpreted. We hope that this helps Censored Planet data users understand the data collected and published with Satellite-v1.

Sample Analysis scripts

We have also released some example analysis scripts for processing Satellite-v1 data, adding metadata, and removing false positives. Please refer to our documentation and the sample analysis script when using Satellite data to make observations. A number of these steps are included by default in Satellite-v2, our new version of Satellite.

Limitations

Although Satellite-v1 was extremely useful in detecting DNS interference at large scale, it suffered from several limitations, which motivated the improvements in Satellite-v2.

  • Satellite-v1 could not detect DNS censorship where A records were not available. It focused primarily on detecting incorrect DNS resolutions through the resolved IP address, and did not contain heuristics to measure DNS manipulation that manifests as timeouts, NXDOMAIN responses, SERVFAIL responses, etc.

  • Satellite-v1 required post-processing to remove false positives and confirm the presence of anomalies, for example by using post-measurement heuristics and blockpage regexes. As mentioned above, we have published some analysis tools for this purpose. Satellite-v2 has the built-in capability to perform most of these post-processing steps.

  • Satellite-v1’s data format provided results in several files, which were hard to parse. With Satellite-v2, our aim is also to present an easier, more intuitive data format.

Satellite-v2

Figure 2: Overview of Satellite-v2

Satellite-v2 is our brand new version of Satellite, in which we’ve made several modifications to the measurement technique and data format to facilitate accurate and efficient remote DNS interference measurements. Below, we detail the major changes we’ve made in Satellite-v2:

Measuring DNS interference without A records: In Satellite-v2, we have added a sandwiched retry mechanism to our Satellite measurements in order to detect DNS interference that results in a response with a non-zero rcode. The method is shown in Figure 2. We first make a control query to the open DNS resolver for a domain name that we do not expect to be blocked (e.g. www.example.com). After the control query, we make up to 4 retries of the test DNS query for the test domain name. If an A record is returned, we stop the test measurement. At the end, we perform another control query identical to the first. The control queries check that the resolver is behaving correctly for an innocuous domain, and the retries account for temporary errors in the network.

With the help of the sandwiched retry mechanism, Satellite-v2 is able to detect DNS interference that manifests as timeouts, NXDOMAIN, SERVFAIL, etc. From our preliminary analysis of Satellite-v2 data, we’ve already found several cases of DNS interference that can be identified using this method. For example, in the Satellite-v2 scan performed on 2021-03-17, we identified 174,795 responses with non-zero rcodes from China, which make up 15.6% of the responses marked as interference. This kind of DNS interference was previously missed by Satellite-v1. Shown below is an example measurement that passed the sandwich control tests but received a server failure (SERVFAIL, rcode 2) response. This could be an indicator of censorship or geoblocking.

| vp | location | test_url | response | indication field |
| --- | --- | --- | --- | --- |
| 119.3.227.27 | country_name: China, Country_code: CN | transsexual.org | rcode: ['2', '2', '2', '2'] | passed_control: true, connect_error: true, in_control_group: true, anomaly: true |

An example of an NXDOMAIN response is shown below:

| vp | location | test_url | response | indication field |
| --- | --- | --- | --- | --- |
| 122.112.240.5 | country_name: China, Country_code: CN | www.eucom.mil | rcode: ['3', '3', '3', '3'] | passed_control: true, connect_error: true, in_control_group: true, anomaly: true |
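
The sandwiched retry mechanism described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `query` callable stands in for a real DNS client, the `-1` timeout marker and the returned dictionary keys are hypothetical, and the sketch assumes four test attempts in total, matching the four rcode entries in the examples above.

```python
from typing import Callable, List, Optional, Tuple

# A query function returns (rcode, a_records); rcode is None on timeout.
QueryFn = Callable[[str], Tuple[Optional[int], List[str]]]

TIMEOUT = -1  # hypothetical marker for a query that got no response


def sandwiched_measurement(query: QueryFn, test_domain: str,
                           control_domain: str = "www.example.com",
                           max_tries: int = 4) -> dict:
    """Control query, up to max_tries test queries, then a second control query."""

    def control_ok() -> bool:
        rcode, answers = query(control_domain)
        return rcode == 0 and len(answers) > 0

    first_control = control_ok()

    rcodes: List[int] = []
    resolved: List[str] = []
    for _ in range(max_tries):
        rcode, answers = query(test_domain)
        rcodes.append(TIMEOUT if rcode is None else rcode)
        if rcode == 0 and answers:
            resolved = answers
            break  # stop as soon as an A record is returned

    last_control = control_ok()
    passed_control = first_control and last_control

    return {
        "test_url": test_domain,
        "rcode": rcodes,
        "answers": resolved,
        "passed_control": passed_control,
        # The resolver answered the innocuous domain before and after,
        # yet never returned an A record for the test domain.
        "failed_with_healthy_resolver": passed_control and not resolved,
    }
```

Passing the resolver in as a callable keeps the control-test-control ordering separate from the networking details, so the logic can be exercised with a stub resolver before being pointed at a real one.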

Fetching HTML pages hosted at resolved IPs marked as an anomaly: Satellite-v2 has a built-in fetch feature that performs HTTP and HTTPS GET requests to resolved IPs that fail our heuristics, and stores the HTML responses in blockpages.json. In Satellite-v1, this was a separate post-processing step; Satellite-v2 data files available on our website now include this file for easier confirmation of DNS censorship. This addition helps in quickly identifying blockpages such as the example shown in Figure 3.

Figure 3: Example of blockpage collected by Satellite-v2.

Adding scan-level heuristics to exclude false positives: This is another step from the post-processing pipeline of Satellite-v1 that is built into Satellite-v2. We exclude potential false positives by using scan-level heuristics, such as the number of domains resolving to the anomalous IP address, or whether the anomalous IP address is part of a big CDN. Note that this step may lead to Satellite-v2 missing certain censorship. The output of this step can be found in results_verified.json.
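
To make the idea of a scan-level heuristic concrete, here is a minimal Python sketch that looks across all anomalies in one scan rather than at a single measurement. Everything specific in it is an assumption: the function name, the `(domain, ip)` input shape, the CDN prefix list, and in particular the threshold and its direction. The sketch assumes that an IP returned for several unrelated domains is more likely to be a blockpage or sinkhole than a legitimate host; the post does not state which threshold or direction Satellite-v2 uses.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Anomaly = Tuple[str, str]  # (test domain, anomalous resolved IP)


def verify_anomalies(anomalies: List[Anomaly],
                     cdn_prefixes: Iterable[str],
                     min_domains: int = 2) -> List[Anomaly]:
    """Keep anomalies whose IP is outside known CDN ranges and is shared
    by at least min_domains distinct domains within the scan."""
    networks = [ipaddress.ip_network(prefix) for prefix in cdn_prefixes]

    # Scan-level statistic: which domains resolved to each anomalous IP.
    domains_by_ip: Dict[str, Set[str]] = defaultdict(set)
    for domain, ip in anomalies:
        domains_by_ip[ip].add(domain)

    verified: List[Anomaly] = []
    for domain, ip in anomalies:
        address = ipaddress.ip_address(ip)
        if any(address in network for network in networks):
            continue  # IP belongs to a large CDN: likely a false positive
        if len(domains_by_ip[ip]) < min_domains:
            continue  # too little scan-level evidence (assumed direction)
        verified.append((domain, ip))
    return verified
```

Any filter of this kind trades recall for precision, which is consistent with the note above that this step may miss some censorship.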

Other changes: We updated the heuristics that determine whether a DNS response has been interfered with. Satellite-v2 now includes a new “confidence” field, which expresses the certainty of interference based on the comparison between responses from the test resolvers and the control resolvers. We also make sure that IPs with no metadata information from Censys are not marked as interference. For detailed information, please visit the documentation.

We have also reorganized our output files so that they are easier to read. The primary output files containing DNS interference data are results.json and results_verified.json. Satellite-v2 includes more information in the results.json file, such as the country name and country code of the target resolver, and the start time and end time of each measurement. We hope this modification makes processing Satellite data easier for our users.
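
As a rough illustration of how such a file might be processed, the Python sketch below counts, for one country, how many responses marked as anomalies never received a zero rcode. It assumes one JSON object per line and a record layout loosely modeled on the example tables in this post (`location.country_code`, `response.rcode`, `anomaly`); these key names and the nesting are assumptions, so please check the documentation for the actual schema before relying on them.

```python
import json
from typing import Iterable, Tuple


def nonzero_rcode_share(lines: Iterable[str], country_code: str) -> Tuple[int, int]:
    """Return (anomalies with only non-zero rcodes, all anomalies) for one country."""
    nonzero = 0
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        # Hypothetical layout: location and response are nested objects.
        if record.get("location", {}).get("country_code") != country_code:
            continue
        if not record.get("anomaly"):
            continue
        total += 1
        # rcodes appear as strings in the examples above, so convert them.
        rcodes = [int(code) for code in record.get("response", {}).get("rcode", [])]
        if rcodes and all(code != 0 for code in rcodes):
            nonzero += 1
    return nonzero, total
```

A possible call, assuming a local copy of the file, would be `with open("results.json") as f: nonzero, total = nonzero_rcode_share(f, "CN")`. Because the function accepts any iterable of lines, it can stream large scan files without loading them into memory.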
