Design

ayumi  yu edited this page Feb 9, 2017 · 36 revisions
Welcome to Sync! pyramids contain stuff and are cool and mysterious

What is Brave Sync?

Brave Sync is a new way to automatically sync browsing data (bookmarks, preferences, history) between devices running the Brave browser. It uses client-side encryption so that Brave's servers cannot read your data, since they do not have access to the encryption keys. Sync is not designed for data backup; that is, if you delete the Brave browser from all your devices, you will not be able to recover old browsing data.

Brave Sync is disabled by default. Once you enable it on one device, you can add more devices by scanning a QR code or entering a secret phrase on the new device. Both the QR code and the secret phrase encode your Brave Sync encryption keys.

In the future, we plan to add support for syncing Brave Payments data as well as an option to use your own sync server on Amazon S3.

Technical Overview

  • Data to sync: bookmarks, history, site settings, list of devices. User chooses collections to sync.
  • Data is encrypted client-side with keys derived from a single random seed, generated on the first run of Sync in a browser client.
  • Server verifies Ed25519 signature over the request in order to authenticate clients.
  • Clients periodically send writes to our web service.
  • Web service async writes to S3.
  • Clients get reads directly from S3.
  • To add new devices, copy the private encryption key seed to the new device.
  • Clients resolve conflicts.
  • Server does not have access to any unencrypted sync data, nor does it know which devices are making the updates or how many devices there are.

Architecture

Client library

  • JS
  • To be shared among browser-laptop/Electron, iOS, Android apps.
  • Runs in an extension background page and webviews respectively.
  • Methods to:
    • Generate Ed25519 keypair and secretbox key from a random seed
    • Export random seed to QR code
    • Import QR code to random seed
    • Request temporary AWS credentials to access a user's S3 sync data
    • Write records (accept a record, encrypt/sign payload, format, write to S3), buffered
    • Read records (query S3, decrypt records) (by polling) and call browser callback
    • (?) Resolve conflicts
    • (?) Delete collections
    • (?) Specify email for recovery
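
The seed export and import methods in the list above could be sketched roughly as follows. The page does not say how the seed is encoded inside the QR code, so the hex encoding and the helper names `exportSeed` and `importSeed` are assumptions made for illustration; rendering and scanning the QR image itself would be left to a QR library.

```javascript
const crypto = require('crypto');

const SEED_BYTES = 32; // seed length given in the Crypto section

// Hypothetical helper: encode the seed as the text payload of a QR code.
// Hex is an assumption; the design does not fix an encoding.
function exportSeed(seed) {
  if (!Buffer.isBuffer(seed) || seed.length !== SEED_BYTES) {
    throw new TypeError(`seed must be a ${SEED_BYTES}-byte Buffer`);
  }
  return seed.toString('hex');
}

// Hypothetical helper: decode a scanned or typed payload back into the seed.
function importSeed(payload) {
  if (typeof payload !== 'string' || !/^[0-9a-f]{64}$/i.test(payload.trim())) {
    throw new TypeError('payload must be 64 hex characters');
  }
  return Buffer.from(payload.trim(), 'hex');
}

// Round trip: what one device exports, another can import.
const seed = crypto.randomBytes(SEED_BYTES);
const roundTripped = importSeed(exportSeed(seed));
console.log(roundTripped.equals(seed)); // true
```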

Web app

  • JS, Node
  • Heroku (tentative, pending a load test; a possible alternative is AWS API Gateway)
  • Create temporary AWS credentials to access user's S3 prefix
  • Register user

Crypto

To create a new userID, the client generates a 32 byte random seed. This is expanded with HKDF (SHA-512 HMAC) into:

  1. Ed25519 signing keypair
  2. NaCl secretbox authenticated encryption secret key

The hex or base64 encoding of the Ed25519 public key (32 bytes) is the userID which uniquely identifies each "persona". Users may associate multiple devices with the same userID by copying the 32-byte random seed to the new device.

Each sync update is end-to-end encrypted using secretbox by the client so that the server does not see the device ID or any of the sync data. API requests are signed using the Ed25519 keypair, which the web service verifies for client authentication using the userID / public key.

To add a new device, the user has to copy the 32-byte initial random seed to the new device via a QR code or similar. The new device uses the seed to reconstruct all the necessary keys.
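
A minimal sketch of the derivation described in this section, using only Node's built-in `crypto` module. The HKDF salt and info values, the split of the output into two 32-byte halves, and the name `deriveKeys` are assumptions; the design only states that HKDF with SHA-512 expands the seed into a signing keypair and a secretbox key. The secretbox encryption itself needs a NaCl library and is not shown.

```javascript
const crypto = require('crypto');

// DER prefix of a PKCS#8 structure wrapping a raw 32-byte Ed25519 private key (RFC 8410).
const ED25519_PKCS8_PREFIX = Buffer.from('302e020100300506032b657004220420', 'hex');

// Assumed HKDF parameters; the design does not specify salt or info.
const HKDF_SALT = Buffer.alloc(0);
const HKDF_INFO = Buffer.from('sync-example');

function deriveKeys(seed) {
  if (seed.length !== 32) throw new RangeError('seed must be 32 bytes');

  // Expand the seed into 64 bytes: 32 for signing, 32 for secretbox (assumed layout).
  const okm = Buffer.from(crypto.hkdfSync('sha512', seed, HKDF_SALT, HKDF_INFO, 64));
  const signingSeed = okm.subarray(0, 32);
  const secretboxKey = okm.subarray(32, 64);

  // Build the Ed25519 private key object from its 32-byte seed.
  const privateKey = crypto.createPrivateKey({
    key: Buffer.concat([ED25519_PKCS8_PREFIX, signingSeed]),
    format: 'der',
    type: 'pkcs8'
  });

  // The raw public key is the last 32 bytes of the SPKI encoding.
  const spki = crypto.createPublicKey(privateKey).export({ format: 'der', type: 'spki' });
  const publicKey = spki.subarray(spki.length - 32);

  return { privateKey, publicKey, secretboxKey, userId: publicKey.toString('hex') };
}

// Two devices holding the same seed reconstruct the same userID.
const seed = crypto.randomBytes(32);
const deviceA = deriveKeys(seed);
const deviceB = deriveKeys(seed); // a second device that imported the seed
console.log(deviceA.userId === deviceB.userId); // true
```

Because every key is a deterministic function of the seed, copying the seed is sufficient to add a device, and anyone who obtains the seed gains full access to the persona.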

Sync data

All sync records now contain a new objectId attribute, set by clients on create. It's sent with each write and can't be changed.

All sync PUTs include a 1-byte deviceId, which increments every time a new device is registered. Users can register up to 256 devices.

{
  objectId: bytes // UUID object ID
}

Bookmark (categoryId = 0x00)

{
    location: string,
    title: string,
    customTitle: string, // User provided title for bookmark; overrides title
    favicon: string, // URL of the favicon (?) Do we need this
    lastAccessedTime: number, // datetime.getTime(), null if folder
    creationTime: number, //creation time of bookmark
    isFolder: boolean,
    folderId: number,
    parentFolderId: number
}

HistorySite (categoryId = 0x01)

{
  location: string,
  title: string,
  customTitle: string, // User provided title for the site; overrides title
  lastAccessedTime: number, // datetime.getTime()
  creationTime: number, // creation time of the history entry
  favicon: string // URL of the favicon
}

Preferences (categoryId = 0x02)

SiteSetting

{
  hostPattern: hostPattern,
  zoomLevel: number,
  shieldsUp: boolean,
  adControl: enum, // (showBraveAds | blockAds | allowAdsAndTracking)
  cookieControl: enum, // (block3rdPartyCookie | allowAllCookies)
  safeBrowsing: boolean,
  noScript: boolean, // true = block scripts, false = allow
  httpsEverywhere: boolean,
  fingerprintingProtection: boolean,
  ledgerPayments: boolean, // False if site should not be paid by the ledger. Defaults to true.
  ledgerPaymentsShown: boolean, // False if site should not be paid by the ledger and should not be shown in the UI. Defaults to true.
}

Device

{
  name: string // optional user-chosen name
}

S3

  • We're not storing any object content; keys contain all the data.
  • Generally clients will request a temporary AWS secret which authorizes them to GET, PUT and LIST S3 data directly.
  • Stored data is serialized with protocol buffers.

Users

  • A user's Sync data is stored in S3 with prefix /{userId}/.
  • userId == user's pubkey matching the client generated private key.
  • Users register themselves implicitly with the first S3 PUT with their userId prefix.
  • Users can delete their Sync account by deleting all S3 keys with prefix /{userId}/.

Devices

  • A user has many devices.
  • Each write includes the client's device ID so clients can see which devices have been writing.
  • New clients read the devices list, determine the next device ID, and write the new device ID to S3. Devices store their own device ID in persistent local storage.
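
The device registration step could look something like the sketch below. It assumes device IDs start at 0 and that a new client takes the highest registered ID plus one; `nextDeviceId` is a hypothetical name, and reading or writing the devices list in S3 is not shown.

```javascript
const MAX_DEVICES = 256; // device IDs fit in one byte: 0..255

// Hypothetical helper: pick the next device ID given the IDs already registered.
function nextDeviceId(registeredIds) {
  const next = registeredIds.length === 0 ? 0 : Math.max(...registeredIds) + 1;
  if (next >= MAX_DEVICES) {
    throw new RangeError(`cannot register more than ${MAX_DEVICES} devices`);
  }
  return next;
}

console.log(nextDeviceId([]));        // 0
console.log(nextDeviceId([0, 1, 2])); // 3
```

Two clients that register at the same moment could both read the same list and choose the same ID; this sketch does not attempt to handle that case.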

Collections

  • Browser record collections are stored at /{userId}/{categoryId}/ where categoryId maps to a collection that is one of: bookmark, historySite, preferences.
  • A collection is a write journal with writes (create, update, delete) of records.
  • Note that record data is stored in S3 keys, which are limited to 1024 bytes of UTF-8, so large objects are split into multiple parts.
  • Clients will fetch writes directly with S3 LIST /{userId}/{categoryId}/?start-after={timestamp}
  • Clients will construct Sync state by replaying all journal writes.
  • They'll keep a client-side timestamp of how far they are synced up to, and get updates periodically (15 min?).
  • In case of conflict, clients will preserve the last performed action based on the records' S3 LastModified dates.
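
As an illustration of replaying the journal with last-write-wins conflict resolution, here is a minimal sketch. It assumes each fetched entry has already been decrypted into a `write` object and paired with its S3 LastModified time as a number; the entry shape, the string object IDs, and the name `replayJournal` are hypothetical.

```javascript
// Rebuild a collection's state from its journal; later writes win.
function replayJournal(entries) {
  const state = new Map(); // objectId -> merged objectData
  // Order by S3 LastModified so the most recent action is applied last.
  const ordered = [...entries].sort((a, b) => a.lastModified - b.lastModified);

  for (const { write } of ordered) {
    const { action, objectId, objectData = {} } = write;
    switch (action) {
      case 'create':
        state.set(objectId, { ...objectData });
        break;
      case 'update':
        // objectData carries only the changed attributes.
        state.set(objectId, { ...(state.get(objectId) || {}), ...objectData });
        break;
      case 'delete':
        state.delete(objectId);
        break;
      default:
        throw new Error(`unknown action: ${action}`);
    }
  }
  return state;
}

const state = replayJournal([
  { lastModified: 3, write: { action: 'update', objectId: 'a', objectData: { title: 'New' } } },
  { lastModified: 1, write: { action: 'create', objectId: 'a', objectData: { title: 'Old', location: 'https://example.com' } } },
  { lastModified: 2, write: { action: 'create', objectId: 'b', objectData: { title: 'Gone' } } },
  { lastModified: 4, write: { action: 'delete', objectId: 'b' } }
]);
console.log(state.get('a')); // { title: 'New', location: 'https://example.com' }
console.log(state.has('b')); // false
```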

Format

/{ version }/{ userId == user pubkey }/{ categoryId }/{ client timestamp }/({ CRC32(full payload) }/{ part })/{ encrypt(serialize([ one or more writes ])) }

write: { action, deviceId, objectId, objectData={key1: value1, ...} }

  • version: 1 byte, format version
  • userId: 32 byte, client public key, base64 (hex?) encoded.
  • categoryId: 1 byte, maps to collection types e.g. bookmarks
  • timestamp: 4 byte
  • part: 1 byte
  • delimiters: /'s - 5 bytes
  • multi-part overhead (for requests >933 bytes): checksum + part index + delimiters == 4 + 1 + 2 = 7 bytes ({ CRC32(full payload) }/{ part }/)
  • encryption overhead: ~40 bytes
  • remaining: ~933 bytes

The max size of a multipart write is around 256 * 933 bytes ~= 239 kB.
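
The size accounting above can be checked with a short sketch that also splits a payload into parts. It treats every figure as raw bytes, as the list does, and always reserves the multi-part overhead; how the bytes are encoded into key-safe characters is not covered. `PART_BUDGET`, `crc32` and `splitIntoParts` are hypothetical names, and whether the checksum covers plaintext or ciphertext is not stated in the design.

```javascript
const S3_KEY_LIMIT = 1024; // bytes
const OVERHEAD = {
  version: 1, userId: 32, categoryId: 1, timestamp: 4, part: 1,
  delimiters: 5, multipart: 7, encryption: 40
};
const PART_BUDGET = S3_KEY_LIMIT - Object.values(OVERHEAD).reduce((a, b) => a + b, 0);
const MAX_PARTS = 256; // the part index is one byte

// Bitwise CRC-32 (reflected polynomial 0xEDB88320).
function crc32(buf) {
  let crc = 0xFFFFFFFF;
  for (const byte of buf) {
    crc ^= byte;
    for (let i = 0; i < 8; i++) {
      crc = (crc >>> 1) ^ (0xEDB88320 & -(crc & 1));
    }
  }
  return (crc ^ 0xFFFFFFFF) >>> 0;
}

// Split a payload into chunks that each fit in one key, tagged with
// the checksum of the full payload and the part index.
function splitIntoParts(payload) {
  if (payload.length > PART_BUDGET * MAX_PARTS) {
    throw new RangeError('payload exceeds the maximum multipart size');
  }
  const checksum = crc32(payload);
  const count = Math.max(1, Math.ceil(payload.length / PART_BUDGET));
  return Array.from({ length: count }, (_, part) => ({
    checksum,
    part,
    data: payload.subarray(part * PART_BUDGET, (part + 1) * PART_BUDGET)
  }));
}

console.log(PART_BUDGET);                                 // 933
console.log(PART_BUDGET * MAX_PARTS);                     // 238848
console.log(splitIntoParts(Buffer.alloc(2000)).length);   // 3
```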

write

// Writes are encrypted as a collection.
  {
    action: enum, // "create", "update", "delete"
    deviceId: bytes, // device doing the write
    objectId: bytes, // object uuid
    objectData: { // (Optional / in case of delete). Changed object attributes
      // title: string // "Cool bookmark"
      // folderId: number
    }
  },
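
For illustration, a client might build write records of this shape as sketched below. `makeWrite` and `newObjectId` are hypothetical helpers, and representing the object ID as the 16 raw bytes of a random UUID is an assumption; serialization and encryption are not shown.

```javascript
const crypto = require('crypto');

const ACTIONS = ['create', 'update', 'delete'];

// Hypothetical helper: a fresh 16-byte UUID for a new object.
function newObjectId() {
  return Buffer.from(crypto.randomUUID().replace(/-/g, ''), 'hex');
}

// Hypothetical helper: build a write record in the shape described above.
function makeWrite(action, deviceId, objectId, objectData) {
  if (!ACTIONS.includes(action)) throw new TypeError(`invalid action: ${action}`);
  if (!Number.isInteger(deviceId) || deviceId < 0 || deviceId > 255) {
    throw new RangeError('deviceId must fit in one byte');
  }
  const write = { action, deviceId: Buffer.from([deviceId]), objectId };
  // objectData is optional and omitted for deletes.
  if (action !== 'delete') write.objectData = { ...objectData };
  return write;
}

const example = makeWrite('create', 0, newObjectId(), { title: 'Cool bookmark' });
console.log(example.action, example.objectId.length); // create 16
```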

API

  • All client requests are signed with the user's private key, and the result is specified as signedData in the request query or body.
  • Each endpoint contains the public key (user ID), which we use to verify the signed request and determine the user's S3 prefix.
  • If an endpoint takes no params the client should include a signature over the current client timestamp (epoch time in seconds). The server must reject the request as invalid if the timestamp is too old, to partially protect against signature replay attacks.
  • {version} is the same as the keys' format version
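
The signed-timestamp check for parameterless endpoints could be sketched as follows, using Node's built-in Ed25519 support. The 60-second tolerance, the base64 signature encoding, the `signedData` object shape, and the function names are assumptions; the design only requires that stale timestamps be rejected. This sketch also rejects timestamps too far in the future.

```javascript
const crypto = require('crypto');

const MAX_AGE_SECONDS = 60; // assumed tolerance; not specified in the design

// Client side: sign the current epoch time in seconds.
function signTimestamp(privateKey, nowSeconds = Math.floor(Date.now() / 1000)) {
  const timestamp = String(nowSeconds);
  const signature = crypto.sign(null, Buffer.from(timestamp), privateKey);
  return { timestamp, signature: signature.toString('base64') };
}

// Server side: verify the signature, then reject stale timestamps
// to limit the window for replaying a captured signature.
function verifyTimestamp(publicKey, signedData, nowSeconds = Math.floor(Date.now() / 1000)) {
  const { timestamp, signature } = signedData;
  const valid = crypto.verify(
    null, Buffer.from(timestamp), publicKey, Buffer.from(signature, 'base64')
  );
  if (!valid) return false;
  const age = nowSeconds - Number(timestamp);
  return Number.isFinite(age) && Math.abs(age) <= MAX_AGE_SECONDS;
}

const { publicKey, privateKey } = crypto.generateKeyPairSync('ed25519');
const signed = signTimestamp(privateKey, 1000);
console.log(verifyTimestamp(publicKey, signed, 1030)); // true
console.log(verifyTimestamp(publicKey, signed, 5000)); // false
```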

POST /{userId}/credentials

Returns AWS credentials valid for 36 hours allowing:

  • S3 ListBucket: /brave-sync/{version}/{userId}/*
  • S3 DeleteObject: /brave-sync/{version}/{userId}
  • S3 DeleteObject: /brave-sync/{version}/{userId}/*
  • For collections bookmarks, historySites, preferences:
    • S3 GetObject, PutObject: /brave-sync/{version}/{userId}/{collection}/*
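
One way the web service might express these permissions is as an IAM policy document scoped to the user's prefix. This is only a sketch: the bucket name is inferred from the paths above, `buildPolicy` is a hypothetical helper, and the call that actually issues the temporary credentials is not shown.

```javascript
const BUCKET = 'brave-sync'; // inferred from the paths listed above
const COLLECTIONS = ['bookmarks', 'historySites', 'preferences'];

// Hypothetical helper: build an IAM policy document for one user's prefix.
function buildPolicy(version, userId) {
  const prefix = `${version}/${userId}`;
  return {
    Version: '2012-10-17',
    Statement: [
      {
        // ListBucket is granted on the bucket and narrowed by key prefix.
        Effect: 'Allow',
        Action: ['s3:ListBucket'],
        Resource: [`arn:aws:s3:::${BUCKET}`],
        Condition: { StringLike: { 's3:prefix': [`${prefix}/*`] } }
      },
      {
        Effect: 'Allow',
        Action: ['s3:DeleteObject'],
        Resource: [
          `arn:aws:s3:::${BUCKET}/${prefix}`,
          `arn:aws:s3:::${BUCKET}/${prefix}/*`
        ]
      },
      {
        Effect: 'Allow',
        Action: ['s3:GetObject', 's3:PutObject'],
        Resource: COLLECTIONS.map((c) => `arn:aws:s3:::${BUCKET}/${prefix}/${c}/*`)
      }
    ]
  };
}

console.log(JSON.stringify(buildPolicy('0', 'USER_ID'), null, 2));
```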

Response:

{
  aws: {
    accessKeyId: string,
    secretAccessKey: string,
    sessionToken: string,
    expiration: string
  },
  s3Post: {
    // This is POST form data to be included with writes to S3.
    // For details see AWS docs:
    // https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-post-example.html
    postData: {
      AWSAccessKeyId: string,
      policy: string,
      signature: string,
      acl: string
    },
    bucket: string
  }
}