Revamping Censored Planet Measurements - Part 2 - Hyperquack #16

ramakrishnansr opened this issue Apr 28, 2021 · 0 comments

Over the past few months, we at Censored Planet have been busy working on improving and revamping our remote measurements, with an emphasis on performing faster, more accurate measurements that can be of value to the community. In a series of blog posts, we plan to provide an overview of the changes we’ve made so far to our measurements, so that the community is able to understand and use our data accurately, and provide feedback on how we can improve our data.

Today’s post is about Quack and Hyperquack, Censored Planet’s measurement techniques that measure application-layer interference using the Echo, Discard, HTTP, and HTTPS protocols. Below we provide an overview of Quack-v1 and Hyperquack-v1, and the major changes we’ve recently made to the technique and data format. Refer to our academic papers for more information about Quack and Hyperquack.

Quack-v1 and Hyperquack-v1

Figure 1: Overview of Quack-v1

Quack-v1 and Hyperquack-v1 were operated from August 2018 to April 2021. Quack-v1 detects application-layer interference using the Echo and Discard protocols. Quack-v1’s workflow is pictured in Figure 1.

  • From a remote measurement machine, we send an HTTP GET look-alike request containing a non-sensitive control URL to a vantage point’s Echo or Discard port. Vantage points are selected from infrastructural servers such as ISP routers to minimize risk to their owners. We observe the result, and if the port responds incorrectly according to its protocol, we abort the test, mark the vantage point as broken, and remove it from our test list.
  • If the control test succeeds, we then send an HTTP GET look-alike request containing a potentially sensitive URL to the vantage point. If the vantage point responds correctly, we record that there is no anomaly. If the vantage point responds incorrectly, we repeat the request up to four more times. If any of these requests results in a correct response, we again record that there is no anomaly.
  • If all five requests result in incorrect responses, we then send another request containing a control keyword. If this request results in a correct response, we record the possibility of interference.
  • If this control request results in an incorrect response, we wait some time then resend the request, to account for stateful interference. If the second request fails, we mark the vantage point as broken and remove the vantage point from our test list. If the request results in a correct response, we mark both potential interference and stateful interference.
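
The four steps above can be read as a small decision procedure. The following Python sketch is only an illustration of that logic, not the Quack-v1 implementation: the `probe` callback, the `Outcome` names, and the `stateful_wait_s` parameter are hypothetical, and `probe` is assumed to return `True` when the vantage point answers correctly for its protocol.

```python
import time
from enum import Enum
from typing import Callable


class Outcome(Enum):
    # Hypothetical labels for the results described in the steps above.
    BROKEN_VANTAGE_POINT = "broken_vantage_point"
    NO_ANOMALY = "no_anomaly"
    POSSIBLE_INTERFERENCE = "possible_interference"
    POSSIBLE_STATEFUL_INTERFERENCE = "possible_stateful_interference"


def quack_v1_trial(
    probe: Callable[[str], bool],
    control: str,
    keyword: str,
    max_attempts: int = 5,
    stateful_wait_s: float = 0.0,
) -> Outcome:
    """Run one trial; probe(payload) is True when the response is correct."""
    # Step 1: the control probe must succeed, otherwise the vantage point is broken.
    if not probe(control):
        return Outcome.BROKEN_VANTAGE_POINT

    # Step 2: one probe with the sensitive keyword plus up to four retries.
    for _ in range(max_attempts):
        if probe(keyword):
            return Outcome.NO_ANOMALY

    # Step 3: all keyword probes failed; a working control suggests interference.
    if probe(control):
        return Outcome.POSSIBLE_INTERFERENCE

    # Step 4: wait, then retry the control to account for stateful interference.
    time.sleep(stateful_wait_s)
    if probe(control):
        return Outcome.POSSIBLE_STATEFUL_INTERFERENCE
    return Outcome.BROKEN_VANTAGE_POINT
```

Passing the network operation in as a callback keeps the decision logic testable without sending any traffic; a real scan would use a non-zero wait before the final control probe.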

Hyperquack-v1 builds on Quack-v1 to add support for the HTTP and HTTPS protocols. Before performing any tests, we send multiple HTTP GET requests containing non-sensitive control URLs to each of the vantage points we are testing. If the responses to all of the requests are consistent, the responses are stripped of dynamic content such as cookies and turned into a template for the vantage point. Then, when performing the tests with the sensitive keywords, we compare the vantage point’s response to its template.
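
To make the template idea concrete, here is a minimal Python sketch. It is an assumption-laden illustration rather than the Hyperquack-v1 code: the response representation (status, headers, body), the `DYNAMIC_HEADERS` set, and the function names are all hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

Response = Tuple[int, Dict[str, str], str]  # (status code, headers, body)

# Assumption: headers treated as dynamic content and ignored when comparing.
DYNAMIC_HEADERS = {"set-cookie", "date", "expires"}


def normalize(status: int, headers: Dict[str, str], body: str) -> tuple:
    """Strip dynamic headers so that repeated control responses compare equal."""
    stable = tuple(
        sorted(
            (name.lower(), value)
            for name, value in headers.items()
            if name.lower() not in DYNAMIC_HEADERS
        )
    )
    return (status, stable, body)


def build_template(control_responses: Iterable[Response]) -> Optional[tuple]:
    """Return a template if all control responses agree, otherwise None."""
    normalized = {normalize(*response) for response in control_responses}
    return normalized.pop() if len(normalized) == 1 else None


def matches_template(template: tuple, response: Response) -> bool:
    """A test response that differs from the template is a candidate anomaly."""
    return normalize(*response) == template
```

A vantage point whose control responses disagree gets no template (`None`) in this sketch, so it would not be usable for comparison.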

Our various publications and reports have used Quack-v1 and Hyperquack-v1 to detect many cases of application-layer interference. For instance, in our recent investigation into the filtering of COVID-19 websites, Quack-v1 was used alongside our other technique Satellite to detect censorship in unexpected places like Canada.

Data Format

We have recently released documentation describing the data formats for both Quack-v1 and Hyperquack-v1, as well as how the data can be interpreted. We hope this helps Censored Planet data users understand the data collected with Quack-v1 and Hyperquack-v1.

Sample Analysis Scripts

We have also released some example analysis scripts for Quack-v1 and Hyperquack-v1 data that help solve some of the limitations of these techniques. Please refer to our documentation and the sample analysis scripts when using Quack and Hyperquack data to make observations.

Limitations

Although Quack-v1 and Hyperquack-v1 were very useful for measuring application-layer interference, the two systems suffered from limitations.

  • Many Quack-v1 and Hyperquack-v1 scans were held up by slow vantage points, to the point where hours of scan time were spent waiting on them.
  • The Censored Planet Observatory scans a selection of sensitive test keywords on a worldwide selection of vantage points on a regular basis, but in certain cases, it is necessary to perform a more directed scan. For instance, in the above example of COVID-19 website filtering, we performed scans using a test URL list of all COVID-19 websites. To perform this smaller, more directed scan, a team member would have to manually install Quack and Hyperquack on a machine and run the scans.
  • Quack-v1 and Hyperquack-v1 are inflexible in their operation. All vantage points and input test URLs must be specified at scan start, and all input URLs must be tested with all vantage points. More granular control of the scans requires multiple scans being set up and run.
  • While the general structure of sending probes and comparing responses to a baseline is very versatile, the architecture of Quack-v1 and Hyperquack-v1 makes it very difficult to add support for scanning with protocols other than Echo, Discard, HTTP, or HTTPS.

Hyperquack-v2

Hyperquack-v2 is our new version of both the Quack and Hyperquack measurement techniques. We’ve restructured the system to work as a request-based measurement server rather than a single-use measurement program. A user will run the program on a machine that will act as a server, and then users can interact with the program using a JSON API. The implications of this restructure are as follows:

  • Flexibility in Scheduling: Unlike in Quack-v1 and Hyperquack-v1, when a scan is performed using Hyperquack-v2, a list of vantage points is first added to Hyperquack-v2, and test keywords are then added as work for the server to complete. When adding work, the user can specify which vantage points that work applies to, such as all the vantage points in a given country, all the vantage points in a given subnet, or simply a list of specific vantage points. This allows users to more easily schedule targeted scans. To make it easier to tell concurrent scans apart, we also added a tagging system that allows the output of Hyperquack-v2 to be redirected to custom files.
  • On-the-fly Changes to Scans: As a scan is running, the user can call endpoints to add work, add more vantage points, or remove vantage points. This further increases the flexibility of Hyperquack-v2, as scans can be updated in the middle of running as opposed to being re-run with updated parameters in Quack-v1 and Hyperquack-v1.
  • Stronger Vantage Point Evaluation: In Quack-v1 and Hyperquack-v1, if a vantage point responded incorrectly to control probes, it would be completely removed from the scan. Since Hyperquack-v2 is continuously running, we have made it so a vantage point that fails one of the intermittent ‘health checks’ that Hyperquack-v2 performs has the potential to come back after a user-defined period of time. This will allow for greater coverage in cases where a vantage point experiences momentary failure.
  • Ability for More Complex Scheduling: This paradigm allows for far more complex scheduling of work than the previous system. In future, our goal is to produce a system where users that want a scan performed can submit the scan parameters to a scheduler server, which will then send that work to any number of worker servers, each running an instance of Hyperquack-v2. This paradigm will allow for multiple workloads to be scheduled simultaneously alongside any rapid response scans that crop up.
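
Two of the ideas above, targeting work at a subset of vantage points and letting a failed vantage point return after a user-defined period, can be sketched in a few lines of Python. This is a hypothetical illustration only and does not reflect the Hyperquack-v2 API or its internals: `VantagePoint`, `select`, and `HealthTracker` are invented names, and the country of each vantage point is assumed to be known in advance.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set


@dataclass(frozen=True)
class VantagePoint:
    ip: str
    country: str  # assumed to be known ahead of time, e.g. from geolocation


def select(
    vantage_points: Iterable[VantagePoint],
    country: Optional[str] = None,
    subnet: Optional[str] = None,
    ips: Optional[Set[str]] = None,
) -> List[VantagePoint]:
    """Keep the vantage points that satisfy every filter that was supplied."""
    network = ipaddress.ip_network(subnet) if subnet else None
    chosen = []
    for vp in vantage_points:
        if country is not None and vp.country != country:
            continue
        if network is not None and ipaddress.ip_address(vp.ip) not in network:
            continue
        if ips is not None and vp.ip not in ips:
            continue
        chosen.append(vp)
    return chosen


class HealthTracker:
    """Sideline a vantage point after a failed health check, for cooldown_s seconds."""

    def __init__(self, cooldown_s: float) -> None:
        self.cooldown_s = cooldown_s
        self._failed_at: Dict[str, float] = {}

    def record_failure(self, ip: str, now: float) -> None:
        self._failed_at[ip] = now

    def record_success(self, ip: str) -> None:
        self._failed_at.pop(ip, None)

    def is_eligible(self, ip: str, now: float) -> bool:
        failed = self._failed_at.get(ip)
        return failed is None or now - failed >= self.cooldown_s
```

Timestamps are passed in explicitly rather than read from the clock, which keeps the cooldown behaviour easy to test.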

Below is a list of the other major changes we've made to Hyperquack-v2:

  • Combining Quack and Hyperquack: Hyperquack-v2 combines the Quack and Hyperquack measurement methods by creating a standard interface for how internet protocols can be used for internet censorship measurement. With this interface, new protocols can be easily added to Hyperquack-v2.
  • Changes to Output Format: In addition to the output from censorship trials, Hyperquack-v2 outputs the results of the previously mentioned ‘health checks’ from vantage points. This output is very similar to the trial output, except that if the ‘health check’ is passed, a template is included. All responses from the vantage point are compared to the template to detect interference. At the moment, the templates for the Echo and Discard protocols are pre-defined by the protocol, so only the HTTP and HTTPS protocols have these dynamically computed templates included.
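
As a rough illustration of what a standard protocol interface with pre-defined templates could look like, consider the Python sketch below. It is not the interface Hyperquack-v2 actually defines: the class and method names and the exact probe format are assumptions. The only protocol facts relied on are that an Echo server sends back exactly what it receives and a Discard server sends nothing back.

```python
from abc import ABC, abstractmethod
from typing import Dict


def http_lookalike(keyword: str) -> bytes:
    # Assumed probe format: an HTTP GET look-alike carrying the keyword as Host.
    return f"GET / HTTP/1.1\r\nHost: {keyword}\r\n\r\n".encode()


class MeasurementProtocol(ABC):
    """Hypothetical interface that each supported protocol would implement."""

    name: str

    @abstractmethod
    def build_probe(self, keyword: str) -> bytes:
        """Return the bytes to send to the vantage point."""

    @abstractmethod
    def is_expected(self, probe: bytes, response: bytes) -> bool:
        """Return True when the response matches the protocol's template."""


class Echo(MeasurementProtocol):
    name = "echo"

    def build_probe(self, keyword: str) -> bytes:
        return http_lookalike(keyword)

    def is_expected(self, probe: bytes, response: bytes) -> bool:
        return response == probe  # Echo returns the payload unchanged


class Discard(MeasurementProtocol):
    name = "discard"

    def build_probe(self, keyword: str) -> bytes:
        return http_lookalike(keyword)

    def is_expected(self, probe: bytes, response: bytes) -> bool:
        return response == b""  # Discard never sends data back


# Adding a protocol means adding one more entry that implements the interface.
REGISTRY: Dict[str, MeasurementProtocol] = {p.name: p for p in (Echo(), Discard())}
```

An HTTP or HTTPS implementation would differ mainly in `is_expected`, which would compare against a template computed from control responses instead of a fixed rule.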