


Network Working Group                                        A. Davidson
Internet-Draft                      Royal Holloway, University of London
Intended status: Informational                               N. Sullivan
Expires: March 17, 2017                                       Cloudflare
                                                           G. Tankersley
                                                              Cloudflare
                                                             F. Valsorda
                                                              Cloudflare
                                                      September 13, 2016


  Protocol for bypassing challenge pages using RSA blind signed tokens
                   draft-protocol-challenge-bypass-00

Abstract

   This document proposes a protocol for bypassing challenge pages (such
   as forms requiring CAPTCHA submissions) that are served by edge
   services in order to protect origin websites.  A client is required
   to complete an initial challenge and is then granted signed tokens
   that can be redeemed in the future to bypass challenges, so that
   honest users have less manual work to do.  The signed tokens are
   cryptographically unlinkable, which prevents future requests from
   being linked to the original signed set of tokens.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on March 17, 2017.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors.  All rights reserved.


Davidson, et al.         Expires March 17, 2017                 [Page 1]

Internet-Draft   Protocol for bypassing challenge pages   September 2016


   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   3
     1.1.  Terminology . . . . . . . . . . . . . . . . . . . . . . .   4
   2.  Protocol Overview . . . . . . . . . . . . . . . . . . . . . .   5
     2.1.  Acquiring Signed Tokens . . . . . . . . . . . . . . . . .   5
     2.2.  Redeeming Tokens  . . . . . . . . . . . . . . . . . . . .   7
   3.  Preliminaries . . . . . . . . . . . . . . . . . . . . . . . .   8
     3.1.  Protocol Communication  . . . . . . . . . . . . . . . . .   8
     3.2.  Design Formation  . . . . . . . . . . . . . . . . . . . .   8
       3.2.1.  Variables and Functions . . . . . . . . . . . . . . .   9
       3.2.2.  Structs . . . . . . . . . . . . . . . . . . . . . . .   9
       3.2.3.  JSON Objects  . . . . . . . . . . . . . . . . . . . .  10
     3.3.  Data Formatting . . . . . . . . . . . . . . . . . . . . .  10
       3.3.1.  Tokens  . . . . . . . . . . . . . . . . . . . . . . .  10
       3.3.2.  Client-Edge Message Format  . . . . . . . . . . . . .  11
       3.3.3.  Edge-Client Message Format  . . . . . . . . . . . . .  12
       3.3.4.  Signature Transport Format  . . . . . . . . . . . . .  12
       3.3.5.  Certificate Transport Format  . . . . . . . . . . . .  12
   4.  Cryptographic Tools . . . . . . . . . . . . . . . . . . . . .  12
     4.1.  Keys  . . . . . . . . . . . . . . . . . . . . . . . . . .  12
     4.2.  Signing/Verifying Algorithms  . . . . . . . . . . . . . .  13
     4.3.  Blinding/Unblinding Algorithms  . . . . . . . . . . . . .  13
     4.4.  Encryption/Decryption Algorithms  . . . . . . . . . . . .  13
     4.5.  MAC Algorithm . . . . . . . . . . . . . . . . . . . . . .  13
     4.6.  Instantiation of Cryptographic Tools  . . . . . . . . . .  14
     4.7.  Randomness Sampling . . . . . . . . . . . . . . . . . . .  14
   5.  Browser plugin  . . . . . . . . . . . . . . . . . . . . . . .  14
     5.1.  Pinned public keys  . . . . . . . . . . . . . . . . . . .  17
   6.  Token Acquisition Protocol  . . . . . . . . . . . . . . . . .  17
     6.1.  [OriginRequest] . . . . . . . . . . . . . . . . . . . . .  17
     6.2.  ChallengePage . . . . . . . . . . . . . . . . . . . . . .  18
     6.3.  VerifyCertificate . . . . . . . . . . . . . . . . . . . .  18
     6.4.  GenerateTokens + BlindTokens  . . . . . . . . . . . . . .  18
     6.5.  SolveChallenge  . . . . . . . . . . . . . . . . . . . . .  18
     6.6.  [SignRequest] . . . . . . . . . . . . . . . . . . . . . .  19
     6.7.  SignTokens  . . . . . . . . . . . . . . . . . . . . . . .  19
     6.8.  [Response]  . . . . . . . . . . . . . . . . . . . . . . .  19
     6.9.  VerifyingSignatures . . . . . . . . . . . . . . . . . . .  19
     6.10. UnblindTokens . . . . . . . . . . . . . . . . . . . . . .  20
     6.11. StoreTokens . . . . . . . . . . . . . . . . . . . . . . .  20
   7.  Challenge Bypass Protocol . . . . . . . . . . . . . . . . . .  20
     7.1.  ConstructTokenMessage . . . . . . . . . . . . . . . . . .  20
     7.2.  ProofOfWork . . . . . . . . . . . . . . . . . . . . . . .  21
     7.3.  SendToken . . . . . . . . . . . . . . . . . . . . . . . .  21
     7.4.  VerifyPoW . . . . . . . . . . . . . . . . . . . . . . . .  21
     7.5.  VerifyTokenMessage  . . . . . . . . . . . . . . . . . . .  22
     7.6.  GetOrigin + [Response]  . . . . . . . . . . . . . . . . .  22
   8.  References  . . . . . . . . . . . . . . . . . . . . . . . . .  22
     8.1.  Normative References  . . . . . . . . . . . . . . . . . .  22
     8.2.  Informative References  . . . . . . . . . . . . . . . . .  23
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  23

1.  Introduction

   Various challenge pages are used to distinguish human access to a
   website from automated access, with the intention of preventing
   malicious behaviour that could compromise the website that is being
   hosted.  CAPTCHAs ("Completely Automated Public Turing test to tell
   Computers and Humans Apart") [ABHL03] are one of the most widely used
   methods for distinguishing human access to a resource from automated
   access.  CAPTCHAs are regularly deployed as "interstitial" pages
   forcing a user to answer the CAPTCHA before access is given to a
   website that was requested by the user.  This is used to prevent
   malicious access by automated processes that can adversely affect the
   performance of the website itself.  While these 'challenges' succeed
   in their mission, they create noticeably more work for honest users
   who have to complete them.

   These challenge pages are commonly served by CDNs that offer security
   services to customers.  Companies like Cloudflare offer customers the
   ability to serve CAPTCHA pages (often using Google's reCAPTCHA
   service) to any IP addresses requesting a protected resource where
   the IP is deemed to have a "bad reputation".  IP reputation scoring
   comes from varied sources and is based on whether any malicious
   activity (such as spamming and/or abuse) is detected as originating
   from the IP in question.

   Services such as Tor suffer dramatically under such reputation-based
   systems.  Users are assigned to one of a small number of exit nodes
   when accessing webpages through Tor and appear to be browsing with
   the IP of that node.  The IP addresses of these nodes are frequently
   associated with malicious and abusive behaviour and are thus assigned
   poor reputation scores.  This problem is not specific to Tor; VPNs,
   I2P, and internet users behind large-scale NAT installations are
   affected similarly.


   The end result is that honest users of these services are forced to
   complete many challenge pages in order to access content protected by
   edge service providers such as Cloudflare, and the problem is
   exacerbated by the fact that these companies offer services to a wide
   range of popular websites.  This results in a huge increase in
   workload for average Tor users in spite of their non-malicious
   nature.

   Further problems arise for users who choose not to enable JavaScript
   in their browsers since they are served with challenges that are
   rapidly deteriorating to the point where a large proportion of
   challenges are too hard to be solved.

   Currently some edge providers (e.g., Cloudflare) attempt to solve
   this problem by providing cookies that enable access to protected
   resources once a CAPTCHA has been solved.  There are two problems
   with this method: first, when a new Tor circuit is constructed the
   cookie is rendered useless; and second, setting cookies across many
   domains controlled by the same CDN could lead to deanonymisation
   attacks outside the current Tor Browser threat model.  As such, a
   new solution to this problem is needed.

   In this document we detail a protocol that enables a user to complete
   a single edge-served challenge page in return for a finite number of
   signed tokens.  These tokens can then be used to bypass future
   challenge pages that are served by participating edge-providers.  The
   tokens are generated in such a way that signed tokens cannot be
   linked to future redeemed tokens for bypassing.  We achieve this
   using the RSA blind signature scheme first presented by David Chaum
   [Cha83].

1.1.  Terminology

   The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
   SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
   document, are to be interpreted as described in [RFC2119].

   The following terms are used:

   edge: A serving endpoint that provides access to a protected origin.

   client: The endpoint attempting to access an edge-protected service.

   origin: The endpoint where web content is stored.

   edge-protected: Term for origins that pay the edge to provide
   protection services for their domain.


   endpoint: Points where requests and responses are dealt with.

   browser: A program run by the client that provides access to
   webpages.

   plugin: An installed service that runs in the client's browser.

   tokens: JSON structures that are generated by the plugin in the
   client's browser for future redemption.

   blinding: An operation that "hides" the contents of the token while
   still allowing the underlying token to be cryptographically signed.

   unblinding: The reverse procedure of blinding.  Recovers a token and
   (if signed) a valid signature on the token.

   challenge answer: Generated when submitting a response to a given
   challenge.

   challenge page: A page generated by the edge for the client.  The
   client must answer a challenge on the page correctly and return it to
   gain access to a particular resource.

   nonce: A randomly sampled value that is used for generating unique
   tokens.

2.  Protocol Overview

   Our protocols are initiated when a client is presented with a
   challenge page that contains additional information indicating that
   the edge service accepts tokens for bypassing the challenge.  This
   can be indicated in the HTML of the page as a meta tag along with a
   certificate advising which public key the edge is currently using.
   Two separate protocols exist: one for when the client has no signed
   tokens available to it, and one for when the client already holds
   tokens.

   Both protocols require essentially four rounds of communication,
   counting the initial request and response in which the client
   attempts to visit an edge-protected origin and is served a challenge
   page instead.

2.1.  Acquiring Signed Tokens


            Client                                            Edge

            [OriginRequest]         ------->
                                                     ChallengePage  ^
                                                      + bypass_tag  |
                                                    + sig_key_cert  |
                                    <-------            [Response]  v
         ^  VerifyCertificate
         |  GenerateTokens
         v  BlindTokens
         ^  SolveChallenge
         v  [SignRequest]           ------->
                                                   VerifyChallenge  ^
                                                        SignTokens  |
                                    <-------            [Response]  v
         ^  VerifySigs
         |  UnblindTokens
         |  StoreTokens
         v  [Finished]


          Figure 1: Full message flow for acquiring signed tokens

   When a client attempts to visit an edge-protected origin, the edge
   can indicate that it accepts tokens for bypassing the challenge page
   and present a certificate corresponding to its current signing key.
   In this event the client does the following:

   o  checks that the certificate is valid and that the signature
      verifies correctly;

   o  checks that it is aware of the public key provided (e.g. that the
      key is pinned in the plugin);

   o  generates N tokens and blinds them;

   o  sends the tokens to the edge along with an answer to the
      challenge.

   In practice N <= 100 so as not to put too much work on the browser;
   limiting N in this way also mitigates DDoS potential.  After
   receiving the tokens and the answer to the challenge the edge does
   the following:

   o  checks that the answer is correct;

   o  if this is the case then it signs the tokens and returns the
      signatures to the client.


   The client does not immediately get access to the origin, though this
   can be achieved if the client immediately reloads the page and
   redeems a token using the process below.  The client participates in
   some final post-processing:

   o  they check that the signatures verify correctly with respect to
      the pinned public key and the blinded token;

   o  they unblind the token and signature pair to get a new pair of the
      original token and a valid signature;

   o  they finally store the pair in their browser plugin for future
      use.

2.2.  Redeeming Tokens


            Client                                            Edge

            [OriginRequest]          ------>
                                                     ChallengePage  ^
                                                      + bypass_tag  |
                                                    + sig_key_cert  |
                                     <------            [Response]  v
         ^  VerifyCertificate
         |  ConstructTokenMessage
         |  ProofOfWork*
         v  [SendToken]              ------>
                                                        VerifyPoW*  ^
                                                VerifyTokenMessage  |
                                                         GetOrigin  |
                                     <------            [Response]  v
            [Finished]


             Figure 2: Full message flow for redeeming tokens

   *  Indicates optional extensions to the protocol.

   As before the client attempts to visit an edge-protected website and
   is faced with a challenge page.  If the edge accepts tokens and
   provides a certificate that corresponds to a key that the client has
   pinned in their plugin and they have tokens signed by the counterpart
   private key then the client can attempt to bypass the prospective
   challenge page.  This process is as follows:

   o  client constructs and sends a message containing:


      *  an encrypted, unused token;

      *  a valid signature for the token;

      *  an HMAC, keyed by the token and computed over message
         identifying information;

   o  edge receives the message and performs the following checks:

      *  decrypts the token and checks that it resembles an agreed
         structure, else the protocol is aborted;

      *  checks if the token has already been used, if so the protocol
         is aborted;

      *  verifies the signature on the unencrypted token;

      *  validates the HMAC using the token as a key and unique message
         information as input;

   o  If all checks pass the edge allows the client access to the
      originally requested origin.
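
   The edge-side checks listed above can be sketched as follows.  This
   is only an illustration under stated assumptions: the function name
   redeem(), the in-memory set of spent nonces, the use of the decoded
   nonce as the HMAC key and the choice of SHA-256 are all hypothetical,
   and decryption and signature verification are passed in as callables
   rather than implemented.

```python
import base64
import hashlib
import hmac
import json


def redeem(enc_token, sig, mac, msg_info, spent, decrypt, verify_sig):
    """Return True if a redemption message passes every edge-side check.

    decrypt and verify_sig are hypothetical callables standing in for
    DECRYPT and VERIFY; spent is a set of nonces already redeemed.
    """
    # 1. Decrypt the token and check that it has the agreed structure.
    try:
        token = json.loads(decrypt(enc_token))
        nonce = token["nonce"]
        key = base64.b64decode(nonce, validate=True)
    except (ValueError, KeyError, TypeError):
        return False
    if set(token) != {"nonce"}:
        return False
    # 2. Abort if the token has already been used.
    if nonce in spent:
        return False
    # 3. Verify the signature on the decrypted token.
    if not verify_sig(token, sig):
        return False
    # 4. Validate the HMAC keyed by the token (assumed HMAC-SHA256).
    expected = hmac.new(key, msg_info, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, mac):
        return False
    # Only a fully validated token is recorded as spent.
    spent.add(nonce)
    return True
```

   Recording the nonce only after every check has passed means that a
   malformed or forged message cannot be used to burn an honest
   client's token.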

3.  Preliminaries

3.1.  Protocol Communication

   We assume that our protocol is carried out over HTTP.  This is a
   natural choice for the medium of communication given that the
   protocol is initiated by a client who is accessing a URI over the
   internet.  Due to this assumption we may refer to messages between
   the client and the edge as HTTP requests and responses respectively.
   This also helps us to elaborate on particulars of the protocol that
   are intrinsically linked to this method of communication.

   However, while the original intention of this protocol is for
   bypassing challenge pages over HTTP, we encourage applying the idea
   to any scenario where receiving unlinkable "currency" is an
   appropriate reward for completing some pre-defined challenge.  The
   message format of the protocol is not strictly required to be HTTP as
   long as no structural changes are made to the messages that are sent.

3.2.  Design Formation

   To explain the concepts in our design we will use a variety of
   structures that are most easily expressed using a readable code
   syntax.


3.2.1.  Variables and Functions

   Variables and functions follow the syntax of C-like languages, such
   as:

                              int apple = 7;

   where "int" is the type of the variable 'apple' and 7 is the value
   assigned to it.  All types are self-explanatory and follow convention
   apart from:

   o  "int_b": Used for large integers when undergoing cryptographic
      operations in large groups.

   o  All arrays will be denoted by a set of square brackets followed by
      the type of data that is contained.  For example an array of
      strings will be described as: "[]string"

   o  We call key-value stores 'maps' and define them as:
      "map[type_1](type_2)"

   where type_1 is the type of the keys for the map (e.g. string) and
   type_2 is the type of the stored values (e.g. int).

   We avoid type declaration when defining functions in favour of a
   textual explanation.

3.2.2.  Structs

   We use structs to define a closed ecosystem (similar to an object)
   with a list of variables and functions that define the struct.  We
   describe structs using the syntax:

                  struct Person {
                    var (
                      string name;
                      int age;
                      map[string](string) emailAddresses;
                    )

                    func (
                      setAge(int n);
                      changeName(string name);
                      addEmail(string email);
                    )
                  }


   This gives us an interface through which we can interact with the
   struct, allowing us to store and access data with respect to this
   definition.  Here, "var" defines a list of variables stored on the
   struct, while "func" defines a list of functions that require
   implementation.

3.2.3.  JSON Objects

   We use JSON objects for representing tokens and for constructing
   messages for sending from the client to the edge.  We define our JSON
   structures as:

                           {
                             "key_1":"[value_1]",
                             "key_2":"[value_2]",
                                   .
                                   .
                                   .
                             "key_n":"[value_n]"
                           }

   where each "key_i" is a key that is marshaled as a string but can be
   of any built-in type.  Likewise "[value_i]" represents the
   corresponding value for "key_i"; "value_i" is typically encoded as a
   string in either hexadecimal or base64 encoding.  We assume that all
   JSON is accessible using a map-interface where, if data is a JSON
   object, then "data[key_1]" returns "value_1".

3.3.  Data Formatting

   This section deals with the formatting of the different data types
   that are required in our protocol.  This will cover how tokens should
   be formatted and how messages between the client and the edge should
   be structured.

3.3.1.  Tokens

   Tokens are JSON-like structures containing a single nonce field, i.e.

                         {
                           "nonce":"[nonce_value]"
                         }

   where [nonce_value] is a base64 encoded 32-byte sequence of
   cryptographically random bytes.
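
   As an illustration, a token of this form could be created and
   checked as sketched below.  The function names new_token() and
   is_well_formed() are hypothetical, and the use of Python's "secrets"
   module as the source of randomness is an assumption rather than a
   requirement of this document.

```python
import base64
import json
import secrets


def new_token():
    """Build a token: a JSON object with one base64-encoded 32-byte nonce."""
    nonce = secrets.token_bytes(32)
    return json.dumps({"nonce": base64.b64encode(nonce).decode("ascii")})


def is_well_formed(token):
    """Check that a serialized token has exactly the structure above."""
    try:
        obj = json.loads(token)
        raw = base64.b64decode(obj["nonce"], validate=True)
    except (ValueError, KeyError, TypeError):
        return False
    # Exactly one field, and the nonce must decode to 32 bytes.
    return set(obj) == {"nonce"} and len(raw) == 32
```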


3.3.2.  Client-Edge Message Format

   The messages that the client sends to the edge after being served a
   challenge page (i.e. in the third round of communication) are written
   as JSON structures.  These messages designate the type of operation
   that is required of the edge.  All messages below are base64 encoded
   before they are sent.

   The messages are heavily defined by the HTTP protocol that the
   client-edge interaction takes place over.  For example, the signing
   messages detailed below are included in the body of an HTTP request
   due to artificial limits placed on HTTP header field sizes by web
   servers.  Likewise the redemption messages are significantly smaller
   and are thus included in a header.  This difference in transport
   architecture leads to differences in the message formats shown below.

3.3.2.1.  Signing

   In the first protocol, when the client sends an answer to the given
   challenge page they can also append a JSON object of the form:

                 {
                   "type":"Signing",
                   "contents":"[t'_1],[t'_2], ...,[t'_N]"
                 }

   where [t'_i] is a generated token that has been subsequently blinded.
   We call such a JSON object a 'JSON signing request' (JSR).

   After base64 encoding is done the final message is sent to the edge
   as:

                   blinded-tokens=[base64 encoded JSR]
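
   A minimal sketch of this encoding is given below.  It assumes that
   each blinded token has already been serialized as a string that
   contains no commas (for example as hexadecimal or base64); the
   function names encode_jsr() and decode_jsr() are hypothetical.

```python
import base64
import json


def encode_jsr(blinded_tokens):
    """Serialize blinded tokens as 'blinded-tokens=[base64 encoded JSR]'."""
    jsr = {"type": "Signing", "contents": ",".join(blinded_tokens)}
    body = base64.b64encode(json.dumps(jsr).encode("utf-8"))
    return "blinded-tokens=" + body.decode("ascii")


def decode_jsr(message):
    """Recover the list of blinded tokens from an encoded JSR."""
    # Split at the first "=" only, so base64 padding stays in the body.
    name, _, body = message.partition("=")
    if name != "blinded-tokens":
        raise ValueError("not a signing request")
    jsr = json.loads(base64.b64decode(body))
    if jsr.get("type") != "Signing":
        raise ValueError("unexpected message type")
    return jsr["contents"].split(",")
```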

3.3.2.2.  Redeeming

   In the second protocol when the client attempts to bypass a challenge
   they send a message containing a JSON object of the form:

           {
             "type":"Redeem",
             "contents":"[<encrypted_token>,<signature>,<HMAC>]"
           }

   where the encrypted token has previously been unblinded.  Such an
   object is known as a 'JSON redemption request' (JRR).


3.3.3.  Edge-Client Message Format

   The messages returned by the edge to the client are much more heavily
   defined by the messaging protocol being used to communicate.  For
   example, in the redemption protocol the server merely serves content
   from the origin in the event that the token that is redeemed verifies
   correctly.

   In the first protocol however the edge also returns a comma-separated
   list:

                    signatures=[s'_1],[s'_2],...,[s'_N]

   where [s'_i] is a signature computed by the edge over the blinded
   token [t'_i] that it received along with a response to the challenge
   page that was sent.
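
   On the client side, the returned list might be matched back to the
   blinded tokens as sketched below.  The sketch assumes that the
   signatures are returned in the same order as the blinded tokens were
   sent and that no signature contains a comma; the function name
   parse_signatures() is hypothetical.

```python
def parse_signatures(response, blinded_tokens):
    """Pair each blinded token with the signature returned for it."""
    name, _, body = response.partition("=")
    if name != "signatures":
        raise ValueError("not a signature response")
    sigs = body.split(",")
    # A mismatch in count means the pairing cannot be trusted.
    if len(sigs) != len(blinded_tokens):
        raise ValueError("signature count does not match token count")
    return dict(zip(blinded_tokens, sigs))
```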

3.3.4.  Signature Transport Format

   Signatures are sent between the client and the edge using the JWS
   format defined in [RFC7515].  The token that is signed is stored as
   the payload on the JWS object - thus when carrying out unblinding on
   the signature the payload must also be updated.

3.3.5.  Certificate Transport Format

   Certificates can be transported via any standardised method for
   encoding a certificate (e.g., X.509v3 [RFC5280]).

4.  Cryptographic Tools

   To instantiate the protocols above we require a set of tools that
   allows either participant to perform cryptographic operations over
   data.  In this section we detail the materials and the algorithms
   that are required in order to compute these operations.

4.1.  Keys

   Our protocol requires two key pairs and one symmetric key:

   o  edge identity keys (id-pub-key, id-priv-key): Used for performing
      the encryption and decryption required on the token that is sent
      for redemption;

   o  edge signing keys (sign-pub-key, sign-priv-key): Used for
      performing the signing and verification of signatures;

   o  A symmetric MAC key derived from the 'nonce' field on a token.


   The edge holds both key pairs and the plugin in the client's browser
   has the public keys.  The MAC key is derived at the time of messaging
   and is shared by both parties.

4.2.  Signing/Verifying Algorithms

   o  SIGN(sign-priv-key, data) --> sig : takes a private signing key
      and some 'data' and returns a valid signature 'sig' on 'data'.

   o  VERIFY(sign-pub-key, data, sig) --> 'good'/'bad' : takes a public
      verification key, some 'data' and a signature 'sig' and outputs
      'good' if 'sig' is a valid signature on 'data'.  Otherwise it
      outputs 'bad'.

4.3.  Blinding/Unblinding Algorithms

   o  BLIND(blinding-factor, data) --> blind-data : takes a randomly
      sampled 'blinding-factor' and some 'data' and outputs 'blind-data'
      that is computationally unlinkable from 'data'.

   o  UNBLIND(blinding-factor, blind-data, blind-sig) --> (data, sig) :
      takes 'blind-data' and the randomly sampled 'blinding-factor' used
      to generate it, along with an optional parameter for a valid
      signature 'blind-sig' computed over 'blind-data' as input.
      Outputs 'data' and 'sig' where 'data' is the unblinded counterpart
      to 'blind-data' and 'sig' is a valid signature on 'data'.

4.4.  Encryption/Decryption Algorithms

   o  ENCRYPT(id-pub-key, plaintext) --> ciphertext : takes a public
      encryption key and a 'plaintext' as input and outputs an encrypted
      'ciphertext'.

   o  DECRYPT(id-priv-key, ciphertext) --> plaintext : takes a private
      decryption key and an encrypted 'ciphertext' as input and outputs
      a 'plaintext'.

4.5.  MAC Algorithm

   Our MAC algorithm has the following specification:

   o  MAC(mac-key, data) --> mac : takes a symmetric mac-key and 'data'
      as input and outputs 'mac' as a valid authentication code on
      'data'.
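
   As a sketch of this interface, the MAC could be computed with an
   HMAC keyed by the token nonce.  The choice of SHA-256 and the use of
   the base64-decoded nonce directly as the key are assumptions made
   for the example; the names mac() and check_mac() are hypothetical.

```python
import base64
import hashlib
import hmac


def mac(nonce_b64, data):
    """MAC(mac-key, data): HMAC keyed by the decoded token nonce."""
    key = base64.b64decode(nonce_b64)
    return hmac.new(key, data, hashlib.sha256).digest()


def check_mac(nonce_b64, data, tag):
    """Recompute the MAC and compare it in constant time."""
    return hmac.compare_digest(mac(nonce_b64, data), tag)
```

   Comparing with hmac.compare_digest() rather than "==" avoids leaking
   how many leading bytes of a forged tag were correct.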


4.6.  Instantiation of Cryptographic Tools

   In theory any digital signature scheme that allows for blind signing
   and unblinding operations can be used to instantiate our
   requirements.  However, due to the simplicity of its design we have
   chosen to only support the RSA blind signing modification (RSA-blind)
   shown in [Cha83].  We may benefit by adding support for elliptic
   curve based designs in the future to decrease the size of messages in
   our protocol.

   By choosing RSA-blind we make the following parameter choices:

   o  both encryption and signing keys are 2048-bit RSA keys;

   o  the SIGN/VERIFY algorithm is FDH-RSA to support blinding;

   o  BLIND/UNBLIND follow naturally from the referenced work;

   o  ENCRYPT/DECRYPT are instantiated with RSA-OAEP;

   o  MAC is instantiated with HMAC.
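
   The arithmetic behind RSA-blind can be illustrated with the toy
   sketch below.  It is not a secure implementation: the modulus is
   built from two small, publicly known Mersenne primes instead of a
   2048-bit key, SHA-256 reduced modulo N stands in for a real
   full-domain hash, and all function and variable names are chosen for
   the example only.

```python
import hashlib

# Toy key built from two Mersenne primes; far too small for real use.
P, Q = 2**127 - 1, 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (Python 3.8+)


def fdh(data):
    """Stand-in for a full-domain hash: SHA-256 reduced modulo N."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def blind(r, m):
    """BLIND: hide the message representative m by multiplying by r^E."""
    return (m * pow(r, E, N)) % N


def sign(m):
    """SIGN: raw RSA private-key operation on a (possibly blinded) value."""
    return pow(m, D, N)


def unblind(r, blind_m, blind_sig):
    """UNBLIND: strip r^E from the value and r from the signature."""
    r_inv = pow(r, -1, N)
    return (blind_m * pow(r_inv, E, N)) % N, (blind_sig * r_inv) % N


def verify(m, sig):
    """VERIFY: check that sig^E recovers the message representative."""
    return pow(sig, E, N) == m


# Demo: the edge signs only the blinded value, yet the client ends up
# with a signature on the original message representative.
m = fdh(b"example token")
r = 123456789  # fixed here for repeatability; sampled at random in practice
blind_m = blind(r, m)
m_again, sig = unblind(r, blind_m, sign(blind_m))
```

   Because (m * r^E)^D = m^D * r (mod N), multiplying the blind
   signature by the inverse of r leaves m^D, which is an ordinary RSA
   signature on m.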


4.7.  Randomness Sampling

   Finally we require an ability for the browser plugin to sample random
   values for blinding tokens.  Our algorithm can be thought of as:

   o  SAMPLE(seed) --> rand : takes a random seed as input and generates
      a value 'rand'.

   We can instantiate this algorithm using any standard library for
   generating cryptographic randomness.  In future notation we may omit
   the seed for ease of exposition.

   Random numbers used as blinding factors must be sampled from the full
   domain allowed by the chosen RSA parameters [BNPS01].
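
   One way to sample such a blinding factor is sketched below.  The
   function name sample_blinding_factor() is hypothetical and n denotes
   the RSA modulus of the signing key.  The sketch draws uniformly from
   [1, n-1] and retries in the (overwhelmingly unlikely) case that the
   value is not invertible modulo n, since unblinding requires the
   inverse of the blinding factor.

```python
import math
import secrets


def sample_blinding_factor(n):
    """Sample r uniformly from [1, n-1], retrying until gcd(r, n) == 1."""
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            return r
```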

5.  Browser plugin

   To participate in the protocol, the client must be using a browser
   with an installed and validated browser plugin.  This plugin controls
   the generation, blinding, unblinding, storage and redemption of
   tokens for bypassing challenge pages.  The browser plugin can be
   thought of as a struct with the following attributes:


         struct Plugin {
           var (
             map[string]([]string) tokens;
             map[string](string) signatures;
             map[string](int_b) blindingFactors;
             []string publicKeys;
           )

           func (
             parse(string s, string p);
             verifyCert([]byte pk, []byte cert);
             verifySig([]byte pk, []byte s, []byte t);
             generate(int N);
             blind(string t);
             unblind(string t', string s', int_b r);
             store(string pubKey, []string tokens, []string sigs);
             encode(string type, []string data);
             mac([]byte nonce, string s);
             send(string msg);
             pow([]byte randNonce);
           )
         }

   We implement the struct functions in the following way.

   o  parse(s, p) --> b

   This function takes strings s, p as input and returns a boolean 'b'
   where "b == true" if p is a valid substring of s.  Otherwise "b ==
   false".

   o  verifyCert(pubKey, cert) --> b

   This function takes the bytes of a public verification key 'pubKey'
   and a certificate 'cert' as input and outputs a boolean value b,
   where "b == true" if the signature on 'cert' can be verified
   correctly and the public key on 'cert' is contained in
   "Plugin.publicKeys".  Otherwise "b == false".  The VERIFY() algorithm
   is used to ascertain whether the signature is valid over the inputs
   to this function.

   Other details on the certificate are also verified in this step (for
   example that the expiry date has not elapsed and that the provider is
   consistent with the protecting edge).

   o  verifySig(pubKey, s, t) --> b


   This function takes the bytes of a public verification key 'pubKey',
   a signature 's' and a token 't' as input.  It outputs a boolean value
   b, where "b == true" if 's' is a valid signature on 't' under
   'pubKey' and "b == false" otherwise.  The plugin runs VERIFY() on all
   three inputs and returns the result as b.

   o  generate(N) --> tokens

   This function takes an integer N as input and outputs an array
   'tokens' of length N containing freshly generated tokens.  The array
   is generated by sampling N 32-byte nonces randomly via SAMPLE() and
   constructing N JSON objects, each with its "nonce" field set to one
   of the sampled nonces.
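
   A minimal Python sketch of generate() is given below.  It assumes
   that the operating system's CSPRNG stands in for SAMPLE() and that
   nonces are base64 encoded inside the JSON object; neither choice is
   mandated by this document.

```python
import base64
import json
import secrets
from typing import List


def generate(n: int) -> List[str]:
    """Return n tokens, each a JSON object with a random 32-byte nonce."""
    tokens = []
    for _ in range(n):
        nonce = secrets.token_bytes(32)
        # JSON cannot carry raw bytes, so the nonce is base64 encoded
        # (an assumed encoding).
        encoded = base64.b64encode(nonce).decode("ascii")
        tokens.append(json.dumps({"nonce": encoded}))
    return tokens


tokens = generate(3)
```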

   o  blind(t) --> t'

   This function takes a token t as input and outputs a blinded token
   t'.  The function uses SAMPLE() to generate a 256-byte random "int_b"
   r from the full domain allowed by the RSA keys and then runs
   BLIND(r, t) --> t' and outputs t'.  After each use of blind(), the
   plugin should store a mapping from the blinded token to the blinding
   factor used, such as

                      Plugin.blindingFactors[t'] = r

   o  unblind(t', s', r) --> (t, s)

   Takes a blinded token t', a valid signature s' for t' and the
   blinding factor r as input and outputs a pair (t, s) where t is the
   unblinded token and s is a valid signature on t.  The function uses
   the algorithm UNBLIND(r, t', s') to retrieve (t, s).
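
   The relationship between blind(), the edge's signature and unblind()
   can be sketched in Python as follows.  The sketch assumes textbook
   RSA blind signatures in the style of [Cha83], a toy key and a SHA-256
   based integer encoding of the token.  These are illustrative
   assumptions and may differ from the BLIND(), SIGN(), UNBLIND() and
   VERIFY() algorithms specified in this document.  The function
   signatures are simplified accordingly.

```python
import hashlib

# Toy RSA key for illustration only (n = 61 * 53); never use in practice.
N, E, D = 3233, 17, 2753


def encode_token(token: str) -> int:
    """Map a token to an integer modulo N (assumed encoding)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % N


def blind(m: int, r: int) -> int:
    # t' = m * r^e mod N
    return (m * pow(r, E, N)) % N


def sign(blinded: int) -> int:
    # Performed by the edge with the private exponent: s' = t'^d mod N
    return pow(blinded, D, N)


def unblind(blinded_sig: int, r: int) -> int:
    # s = s' * r^-1 mod N (a negative exponent needs Python 3.8+)
    return (blinded_sig * pow(r, -1, N)) % N


def verify(m: int, sig: int) -> bool:
    # A signature is valid when s^e mod N equals the encoded token.
    return pow(sig, E, N) == m


m = encode_token('{"nonce": "example"}')
r = 7  # fixed for reproducibility; must be random and coprime to N
s = unblind(sign(blind(m, r)), r)
```

   Because (m * r^e)^d = m^d * r (mod N), multiplying by the inverse of
   r leaves m^d, which is an ordinary signature on the unblinded token.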

   o  store(pubKey, tokens, sigs)

   This function does not return anything.  It sets

                      Plugin.tokens[pubKey] = tokens

   and, for each index i,

                  Plugin.signatures[tokens[i]] = sigs[i]
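
   A small Python sketch of this storage step follows.  The class name
   "PluginState" and the length check are assumptions made for the
   example; only the two assignments come from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PluginState:
    """Hypothetical in-memory mirror of the Plugin storage attributes."""

    tokens: Dict[str, List[str]] = field(default_factory=dict)
    signatures: Dict[str, str] = field(default_factory=dict)

    def store(self, pub_key: str, tokens: List[str], sigs: List[str]) -> None:
        # Assumed sanity check: one signature per token.
        if len(tokens) != len(sigs):
            raise ValueError("each token needs exactly one signature")
        self.tokens[pub_key] = list(tokens)
        # Signatures are keyed by the token they cover.
        for token, sig in zip(tokens, sigs):
            self.signatures[token] = sig


state = PluginState()
state.store("pk-1", ["t1", "t2"], ["s1", "s2"])
```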

   o  encode(type, data)

   Takes a 'type' string and a base64 encoded string 'data' as input.
   The 'type' string names a JSON request (either "JSR" or "JRR").  The
   function creates a JSON object with the "type" field set to 'type'
   and the "contents" field set equal to the 'data' input, and returns
   this object.
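
   For instance, encode() could be sketched in Python as below.
   Returning the object as a serialised JSON string and rejecting
   unknown types are assumptions of this example.

```python
import json


def encode(req_type: str, data: str) -> str:
    """Wrap base64 'data' in a JSON request of the given type."""
    if req_type not in ("JSR", "JRR"):
        raise ValueError("type must be 'JSR' or 'JRR'")
    return json.dumps({"type": req_type, "contents": data})


jsr = encode("JSR", "YmxpbmRlZC10b2tlbnM=")
```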

   o  mac(nonce, s)


   Takes 'nonce' in byte form and a string 's' as input.  The 'nonce'
   value is used as the key and 's' is the contents to be computed over.
   This function runs the algorithm MAC(nonce, s) on the two inputs and
   outputs whatever this algorithm outputs.
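
   As a sketch, and assuming that MAC() is instantiated with HMAC-SHA256
   (an assumption of this example), mac() might look as follows in
   Python.

```python
import hashlib
import hmac


def mac(nonce: bytes, s: str) -> bytes:
    """Compute a MAC over 's' keyed by 'nonce' (HMAC-SHA256 assumed)."""
    return hmac.new(nonce, s.encode("utf-8"), hashlib.sha256).digest()


tag = mac(b"\x00" * 32, "example.com/index.html")
```

   When comparing tags, a constant-time comparison such as
   hmac.compare_digest() avoids leaking how many leading bytes match.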

   o  send(msg)

   Provides no output.  Takes a string 'msg' as input, where 'msg' is
   either a JSR or a JRR.  This function reloads the current page in the
   browser and includes 'msg' in the HTTP request that is created
   (either in a header or in the body).

   o  pow(randNonce)

   Optional method for the plugin.  Takes the bytes of a random nonce
   'randNonce' as input and computes some proof-of-work computation that
   is specified by the edge.  The output is given as 'out' and is used
   by the client in the following bypass request that is made.

5.1.  Pinned public keys

   The plugin has a list of pinned public keys stored as base64 strings
   in the string array "publicKeys".  Because browser plugins are signed
   and verifiable as part of a deterministic build process, this
   prevents a service from assigning unique public keys to each client
   as a way of linking requests and deanonymising users.  When an edge
   provides a certificate for a given public key, the plugin checks that
   the key contained in the certificate is one of its pinned keys before
   communicating further with the edge.

6.  Token Acquisition Protocol

   The token acquisition protocol allows a client to acquire signatures
   on client-generated tokens that can be redeemed in the future to
   bypass challenge pages.  We analyse the protocol with respect to the
   stages that we defined in Figure 1.

6.1.  [OriginRequest]

   The protocol is initiated by the OriginRequest, in which the client
   attempts to access a webpage (for example over HTTP).  For the
   purposes of our protocol this webpage is edge-protected.


6.2.  ChallengePage

   The edge decides that the origin request comes from a client that
   must be shown a challenge before being granted access to the
   protected website.  The challenge page displays some HTML conveying
   the explicit challenge to the client.

   To participate in accepting challenge bypass tokens, an edge must
   also append specific "<meta>" tags to the HTML of the page.  The tags
   that indicate participation are:

            <meta name="captcha-bypass" id="captcha-bypass" />
            <meta name="chl-cert" id="chl-cert" content="%s" />

   where '%s' is replaced with a valid certificate on some public key.

6.3.  VerifyCertificate

   When a client is delivered such a page, the installed plugin runs the
   "parse()" function on the HTML and the meta tags above.  If this
   function returns true, the plugin inputs the certificate from '%s'
   into "verifyCert()" and checks that this also returns true.
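
   The detection of the two meta tags can be sketched with Python's
   standard HTML parser.  The class name and the sample page are
   hypothetical, and this document only requires a substring check via
   parse(); extracting the certificate by attribute is one possible
   approach.

```python
from html.parser import HTMLParser
from typing import Optional


class BypassMetaParser(HTMLParser):
    """Record whether the bypass tags are present and keep the cert."""

    def __init__(self) -> None:
        super().__init__()
        self.supports_bypass = False
        self.certificate: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Also invoked for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if attributes.get("name") == "captcha-bypass":
            self.supports_bypass = True
        elif attributes.get("name") == "chl-cert":
            self.certificate = attributes.get("content")


page = (
    "<html><head>"
    '<meta name="captcha-bypass" id="captcha-bypass" />'
    '<meta name="chl-cert" id="chl-cert" content="BASE64CERT" />'
    "</head><body>challenge</body></html>"
)
parser = BypassMetaParser()
parser.feed(page)
```

   The extracted value would then be handed to verifyCert() together
   with the pinned public key.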

6.4.  GenerateTokens + BlindTokens

   The plugin retrieves the public key 'pubKey' from the verified
   certificate and then checks if "Plugin.tokens[pubKey]" is empty or
   not.

   If there are no tokens stored for 'pubKey' the plugin runs
   "generate(N)" to get an array of N 'toks'.  It then runs "blind(t)"
   on each token and constructs an array of blinded tokens,
   'blindedTokens'.  The array 'toks' is stored in the 'tokens' map as

                           tokens[pubKey] = toks

   where 'pubKey' is the public key from the certificate.

6.5.  SolveChallenge

   This step involves the client solving the presented challenge.  This
   step requires human intervention, for instance as in the way that
   CAPTCHAs are solved.


6.6.  [SignRequest]

   The plugin encodes the array 'blindedTokens' as a string 'content'
   and runs "encode("JSR", content)" to get a JSR request containing
   this data.  When the challenge solution is sent to the edge by the
   client, the plugin base64 encodes the JSR and appends it to the HTTP
   request body using the syntax:

                   blinded-tokens=<base64 encoded JSR>
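
   The construction of this body parameter can be sketched in Python as
   follows.  Serialising the array of blinded tokens as a JSON array is
   an assumption of the example, as is the helper name.  Depending on
   the content type of the request body, the value may additionally
   need percent-encoding, which is not shown.

```python
import base64
import json
from typing import List


def build_sign_request_body(blinded_tokens: List[str]) -> str:
    """Return 'blinded-tokens=<base64 encoded JSR>' for the request."""
    # Assumed serialisation of the array into the 'content' string.
    content = json.dumps(blinded_tokens)
    jsr = json.dumps({"type": "JSR", "contents": content})
    encoded = base64.b64encode(jsr.encode("utf-8")).decode("ascii")
    return "blinded-tokens=" + encoded


body = build_sign_request_body(["dCcx", "dCcy"])
```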

6.6.1.  VerifyChallenge

   When the edge receives the request with a challenge solution and a
   JSR it first checks that the solution provided is correct with
   respect to the initial challenge that was sent.

6.7.  SignTokens

   The edge receives the blinded tokens, checks that the challenge
   solution is valid and then runs SIGN() on each blinded token t'_i
   from the JSR using the private signing key "sig-priv-key" that it
   owns.  The edge constructs an array 'sigs' from the signatures that
   are produced by the SIGN() algorithm.

6.8.  [Response]

   The edge responds to the client with an array containing the pairs of
   blinded tokens with their respective signatures from the array 'sigs'
   using the syntax:

                      signatures=[<s'_1>,...,<s'_N>]

   where each "<s'_i>" is a base64 encoded JWS object containing the
   blinded token that is signed as the payload.

6.9.  VerifyingSignatures

   The client receives the comma-separated signatures from the edge.

   Firstly, the plugin runs "verifySig(pubKey, s'_i, t'_i)" for the ith
   received signature "s'_i", where "t'_i" is the blinded token stored
   in the payload and 'pubKey' is the public key stored on the original
   certificate.  If each invocation of "verifySig()" is successful then
   the plugin proceeds.


6.10.  UnblindTokens

   Secondly the plugin runs "unblind(t'_i, s'_i, r_i)" where r_i is the
   ith blinding factor stored in "Plugin.blindingFactors".  This
   function outputs the pair "(t_i, s_i)".

6.11.  StoreTokens

   Finally the plugin checks that:

                      Plugin.tokens[pubKey][i] == t_i

   If so, then the plugin runs "store(pubKey, t_i, s_i)" to store the
   token and signature for future use.

7.  Challenge Bypass Protocol

   The challenge bypass protocol starts in the same way as the token
   acquisition protocol, with the client attempting to visit an edge-
   protected origin.  The edge returns a challenge page as before and
   the client's browser verifies that the HTML "meta" tags sent by the
   edge indicate that bypassing a challenge page can happen.  The
   protocol deviates after the VerifyCertificate stage if the map
   "Plugin.tokens[pubKey]" is populated by one or more tokens (where
   'pubKey' is the certified public key as before).

   The steps that follow this stage describe how a client can bypass the
   challenge.

7.1.  ConstructTokenMessage

   When the client holds tokens that can be used to bypass challenges,
   the browser plugin does the following:

   o  picks the next available token and signature pair (t,sig) for
      'pubKey' where:

                           pubKey = sig-pub-key;

   o  encrypts t by computing

                     ENCRYPT(id-pub-key, t) --> t-enc;

   o  computes

                MAC(t["nonce"], unique-request-data) --> hm


   where the MAC algorithm is keyed by the "nonce" field on the token
   and 'unique-request-data' is some data that is unique to a request
   containing this token;

   o  creates a concatenated string:

                            t-enc || sig || hm

   and base64 encodes it to form a string 'data'.
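
   The steps above can be sketched in Python as follows.  The sketch
   assumes HMAC-SHA256 for MAC(), a base64 encoded nonce inside the
   token, and takes ENCRYPT() as a caller-supplied function; the
   placeholder used here provides no confidentiality and exists only so
   that the example runs.  All names are hypothetical.

```python
import base64
import hashlib
import hmac
import json
from typing import Callable


def construct_token_message(
    token: str,
    sig: bytes,
    request_data: bytes,
    encrypt: Callable[[bytes], bytes],
) -> str:
    """Build base64(t-enc || sig || hm) for one token."""
    t_enc = encrypt(token.encode("utf-8"))
    # The MAC key is the token's nonce (base64 encoding assumed).
    key = base64.b64decode(json.loads(token)["nonce"])
    hm = hmac.new(key, request_data, hashlib.sha256).digest()
    return base64.b64encode(t_enc + sig + hm).decode("ascii")


def placeholder_encrypt(plaintext: bytes) -> bytes:
    # Stand-in for ENCRYPT(id-pub-key, t); NOT real encryption.
    return plaintext[::-1]


nonce = base64.b64encode(b"\x01" * 32).decode("ascii")
token = json.dumps({"nonce": nonce})
data = construct_token_message(
    token, b"SIG", b"GET example.com/", placeholder_encrypt
)
```

   Since the three parts are concatenated without delimiters, the edge
   must know their lengths in order to split 'data' again.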

7.2.  ProofOfWork

   This is an optional extension to the protocol that enables the edge
   to specify some proof-of-work (PoW) computation to the client.  This
   is to prevent any client from being able to construct many viable-
   looking, but invalid, tokens that force the edge into computing a
   number of public-key operations before throwing away the invalid
   token.  If done often enough this could lead to a potential DDoS
   vector on the edge.  By establishing a PoW step this limits the
   client to only being able to redeem tokens when they can answer the
   PoW.

   If this step is to be used, the edge specifies an extra header in the
   initial response to the client with the attribute "bypass-proof-of-
   work" and a value "randNonce" that contains a random nonce that the
   client uses in answering the PoW.  The plugin then computes
   "pow(randNonce)" --> 'out' where 'out' represents the output of the
   computation.
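
   This document leaves the PoW computation to the edge.  Purely as an
   illustration, the Python sketch below assumes a hash-based puzzle in
   which the client searches for a counter such that SHA-256 over the
   nonce and counter has a number of leading zero bits.  The puzzle,
   the difficulty and all names are assumptions of this example.

```python
import hashlib
from itertools import count

DIFFICULTY_BITS = 12  # hypothetical difficulty chosen by the edge


def _meets_target(rand_nonce: bytes, counter: int) -> bool:
    data = rand_nonce + counter.to_bytes(8, "big")
    digest = hashlib.sha256(data).digest()
    # Require the top DIFFICULTY_BITS bits of the digest to be zero.
    return int.from_bytes(digest, "big") >> (256 - DIFFICULTY_BITS) == 0


def pow_solve(rand_nonce: bytes) -> int:
    """Client side: search for a counter that meets the target."""
    for counter in count():
        if _meets_target(rand_nonce, counter):
            return counter


def pow_verify(rand_nonce: bytes, out: int) -> bool:
    """Edge side: a single hash checks the claimed solution."""
    return _meets_target(rand_nonce, out)


out = pow_solve(b"edge-chosen-nonce")
```

   The asymmetry is the point of the design: the client performs many
   hash evaluations on average, while the edge verifies with one.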

7.3.  SendToken

   To send the token, the plugin:

   o  runs encode("JRR", data) to get a JRR with the "contents" field
      set equal to 'data';

   o  if ProofOfWork is done, appends an extra field to the JRR object
      named "pow" where the value is equal to 'out'.

   The plugin then reloads the page and sends this JRR as the value of
   the header "challenge-bypass-token".
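
   A Python sketch of the resulting header is shown below.  The helper
   name is hypothetical, and sending the JRR as plain JSON text in the
   header value is an assumption; the exact header encoding is not
   fixed by the text above.

```python
import json
from typing import Dict, Optional


def build_bypass_headers(
    data: str, pow_out: Optional[str] = None
) -> Dict[str, str]:
    """Return the header carrying the JRR for the reloaded request."""
    jrr = {"type": "JRR", "contents": data}
    # The "pow" field is only present when ProofOfWork was requested.
    if pow_out is not None:
        jrr["pow"] = pow_out
    return {"challenge-bypass-token": json.dumps(jrr)}


headers = build_bypass_headers("ZGF0YQ==", pow_out="1234")
```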

7.4.  VerifyPoW

   When the edge receives the JRR message that was sent above, if a PoW
   was stipulated then the edge first checks that the value stored in
   the "pow" field is correct for the random nonce that was sent.

   If not, then the protocol is aborted at this point.


7.5.  VerifyTokenMessage

   The edge decodes the "contents" field from the received JRR and it
   does the following:

   o  sets a "success" bool to true;

   o  computes

                     DECRYPT(id-priv-key, t-enc) --> t

      and checks that t is a JSON object with a "nonce" field; if either
      check fails then it sets "success" to false;

   o  checks that t has not been redeemed before, otherwise it sets
      "success" to false;

   o  computes

                     VERIFY(sig-pub-key, t, sig) --> b

      and, if 'b' is not true, sets "success" to false;

   o  retrieves 'edge-request-data' from the request it received,
      computes

              MAC(t["nonce"], edge-request-data) --> hm-edge

      and checks that hm == hm-edge; if not, it sets "success" to false.

   If "success" is still true, then the edge marks the bypass request as
   successful and continues.
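
   The checks performed by the edge can be sketched in Python as below.
   DECRYPT() and VERIFY() are passed in as caller-supplied functions,
   HMAC-SHA256 and a base64 encoded nonce are assumed, and recording
   the token as redeemed after a successful check is an assumption of
   the example.  The sketch returns early on the first failure instead
   of carrying a "success" flag through every step.

```python
import base64
import hashlib
import hmac
import json
from typing import Callable, Set


def verify_token_message(
    t_enc: bytes,
    sig: bytes,
    hm: bytes,
    edge_request_data: bytes,
    decrypt: Callable[[bytes], bytes],
    verify: Callable[[bytes, bytes], bool],
    redeemed: Set[bytes],
) -> bool:
    try:
        t = decrypt(t_enc)
        # t must be a JSON object carrying a "nonce" field.
        key = base64.b64decode(json.loads(t)["nonce"])
    except (ValueError, KeyError, TypeError):
        return False
    if t in redeemed:  # double-spend check
        return False
    if not verify(t, sig):
        return False
    hm_edge = hmac.new(key, edge_request_data, hashlib.sha256).digest()
    if not hmac.compare_digest(hm, hm_edge):
        return False
    redeemed.add(t)  # assumed bookkeeping for later double-spend checks
    return True


# Usage with stand-ins for DECRYPT() and VERIFY().
nonce = base64.b64encode(b"k" * 32).decode("ascii")
token = json.dumps({"nonce": nonce}).encode("utf-8")
request_data = b"GET example.com/"
hm = hmac.new(b"k" * 32, request_data, hashlib.sha256).digest()
redeemed = set()
args = (token, b"ok", hm, request_data,
        lambda c: c, lambda t, s: s == b"ok", redeemed)
first = verify_token_message(*args)
second = verify_token_message(*args)  # same token presented again
```

   In this usage example the second call is rejected because the token
   has already been recorded as redeemed.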

7.6.  GetOrigin + [Response]

   If the verification process was successful, the edge gets a response
   from the origin that corresponds to the original request in
   [OriginRequest] from the client.  The edge then sends this response
   directly back to the client.

   This allows the client to access the origin resource.

8.  References

8.1.  Normative References

   [ABHL03]   von Ahn, L., Blum, M., Hopper, N., and J. Langford,
              "CAPTCHA: Using Hard AI Problems For Security", 2003,
              <https://www.cs.cmu.edu/~mblum/research/pdf/captcha.pdf>.


8.2.  Informative References

   [BNPS01]   Bellare, M., Namprempre, C., Pointcheval, D., and M.
              Semanko, "The One-More-RSA-Inversion Problems and the
              Security of Chaum’s Blind Signature Scheme", 2001,
              <https://eprint.iacr.org/2001/002.pdf>.

   [Cha83]    Chaum, D., "Blind Signatures For Untraceable Payments",
              1983, <http://sceweb.sce.uhcl.edu/yang/teaching/
              csci5234WebSecurityFall2011/Chaum-blind-signatures.PDF>.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <http://www.rfc-editor.org/info/rfc2119>.

   [RFC5280]  Cooper, D., Santesson, S., Farrell, S., Boeyen, S.,
              Housley, R., and W. Polk, "Internet X.509 Public Key
              Infrastructure Certificate and Certificate Revocation List
              (CRL) Profile", RFC 5280, DOI 10.17487/RFC5280, May 2008,
              <http://www.rfc-editor.org/info/rfc5280>.

   [RFC7515]  Jones, M., Bradley, J., and N. Sakimura, "JSON Web
              Signature (JWS)", RFC 7515, DOI 10.17487/RFC7515, May
              2015, <http://www.rfc-editor.org/info/rfc7515>.

Authors' Addresses

   Alex Davidson
   Royal Holloway, University of London
   Egham Hill
   Egham  TW20 0EX

   Email: alex.davidson.2014@live.rhul.ac.uk


   Nick Sullivan
   Cloudflare
   101 Townsend St
   San Francisco  CA 94107

   Email: nick@cloudflare.com


   George Tankersley
   Cloudflare

   Email: george.tankersley@cloudflare.com


   Filippo Valsorda
   Cloudflare
   25 Lavington Street
   London  SE1 0NZ

   Email: filippo@cloudflare.com

