v2 Storage and Distribution System Specification #2224

Closed
bedeho opened this issue Mar 3, 2021 · 4 comments

bedeho commented Mar 3, 2021

Background

The current Joystream network, as of the Sumer release, has an extremely limited system for storing and distributing data, both in terms of functionality and in its ability to accommodate even a limited scale of utilisation. This specification is intended to substantially improve upon these limitations by settling the organisation and function of the system at a level suitable for mainnet purposes. Importantly, this specification should be read in the context of the Gateway specification #2089, which is complementary in that it outlines an incentive model for the vast majority of expected load on distributors.

Major Changes

The overall design philosophy of the system remains the same as before in the following respects

  • Permissioned entry of staked service providers.
  • Discretionary slashing, not bound by cryptographic evidence.
  • Chain holds an index of data and service provider obligations, and rations utilisation of the system.
  • Publishers are not charged for the distribution cost of data.
  • Single chain-selected upload host for each data object.

However, the following major changes are introduced

  • Distinct roles for storing and distributing data.
  • Storage system with redundancy and only partial replication in nodes.
  • Distribution system with flexible policy space, allowing for CDN like organisation.
  • Efficient deletion and ownership transfer of groups of data objects.
  • On-chain host resolution metadata.
  • Distributors are incentivised in the Gateway system: Gateway Specification #2089

Architecture

Working Groups and Roles

There are two working groups, the storage working group and the distribution working group, each with its own set of workers and its own lead, called the storage lead and the distribution lead respectively. The workers in the storage working group are called storage providers, and operate dedicated nodes for this purpose, called storage nodes. Likewise, the distribution working group has distribution providers, who operate distribution nodes.

Activity

There are two major service provider activities: storing data and distributing data. By storing we mean long-term archiving, which by implication includes accepting uploads of new data over time. Storing data also implies distributing data to other storage providers so that they can persist it, for example as new data or providers enter the system. The primary operational requirement of this activity is the ability to securely persist and recover very large datasets. By distributing data we mean streaming transmission of a small subset of all available data, with low latency and high throughput, to a large number of simultaneous users. Initial data copies are obtained from storage providers. All communication uses HTTP, specifically

  1. user data uploads,
  2. coordination and data transmission between storage nodes,
  3. data transmission between storage nodes and distribution nodes, and
  4. user downloads.

When new data is added to the system, the chain initially determines how it should be stored and distributed, but this can later be updated through manual intervention.

Importantly, the storage infrastructure is not public facing; it only serves itself and the distribution infrastructure. The distribution infrastructure is public facing; however, the specific policy around what kind of data is accessible to different audiences (everyone, members only, etc.) is a policy variable which falls outside this specification. It may also be subject to contextual information around the specific use case to which the data corresponds, so data from the content system, for example, may have explicit owner-specifiable access policies, like paywalls, subscriptions, etc.

Fees

The only fees facing users are associated with publishing new data into the system, namely

  1. Normal transaction fees due to block size and computational weight of processing.
  2. Recoverable storage state fees due to the cost of putting new objects into blockchain state. These are released when the objects are deleted at a later time.
  3. A data-size-sensitive surcharge which is burned when new data is uploaded (a rough cost sketch follows this list).
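
As a rough, non-normative sketch of how items 2 and 3 could combine for a single upload into a new dynamic bag, assuming the surcharge is charged per started megabyte using new_data_object_per_mega_byte_fee, and that the recoverable prizes are new_data_object_deletion_prize per object plus new_bag_deletion_prize (all defined in the State section below; Balance is treated as a plain unsigned integer, and the ordinary transaction fee of item 1 is ignored):

/// Illustrative only: estimate the user-facing cost of one upload.
fn estimate_upload_cost(
  object_sizes: Vec<u64>,              // object sizes in bytes
  per_mega_byte_fee: Balance,          // new_data_object_per_mega_byte_fee (burned)
  object_deletion_prize: Balance,      // new_data_object_deletion_prize (recoverable)
  bag_deletion_prize: Balance          // new_bag_deletion_prize (recoverable)
) -> Balance {
  let total_bytes: u64 = object_sizes.iter().sum();
  let mega_bytes = (total_bytes + (1 << 20) - 1) / (1 << 20); // round up to whole MB (assumption)
  let burned = mega_bytes * per_mega_byte_fee;
  let locked = (object_sizes.len() as u64) * object_deletion_prize + bag_deletion_prize;
  burned + locked
}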

Concepts

Data Object

The fundamental concept in the system is a data object, which represents a single static binary object in the system. The main goal of the system is to retain an index of all such objects, including who owns them, and information about which actors are currently tasked with storing and distributing them to end users. The system is unaware of the underlying content represented by such an object, as it is used by different parts of the Joystream system. It can represent assets as diverse as

  • Video media, with a particular resolution and encoding, living in the content directory.
  • Image media used for the avatar of a member, an election candidate or for the cover of a channel in the content directory.
  • A data attachment to a blog post, proposal or role application.

Bags

A data object bag, or bag for short, is a dynamic collection of data objects which can be treated as one subject in the system. Each bag has an owner, which is established when the bag is created. A data object lives in exactly one bag, but may be moved across bags by the owner of the bag. Only the owner can create new data objects in a bag, or opt into absorbing objects from another bag.

The purpose of the concept of bags is to limit the on-chain footprint of administrating multiple objects which should be treated the same way. This is achieved by establishing a small immutable identifier for these objects. The canonical example would be assets that will be consumed together, such as the cover photo and different video media encodings of a single piece of video content. Storage and distribution nodes have commitments to bags, not individual data objects.

There are two different kinds of bags, static bags and dynamic bags. The former are all created when the system goes live and cannot be deleted, while the latter come in a few different types, and new instances of each type can be created over time. Specifically, there is one static bag for the council and one for each working group, and there are member, channel and DAO dynamic bag types. When a new member, channel or DAO is created, a new instance of the corresponding bag type is created.

A dynamic bag creation policy holds parameter values impacting how exactly the creation of a new dynamic bag occurs, and there is one such policy for each type of dynamic bag. It describes how many storage buckets should store the bag, and from what subset of distribution bucket families (described below) to select a given number of distribution buckets (described below).
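
For illustration only, using the DynamicBagCreationPolicy struct sketched in the runtime Concepts section below, a policy for channel bags that replicates each new bag across five storage buckets and picks two distribution buckets from each of two hypothetical regional families (the family ids below are invented) might look like:

// Hypothetical example of a dynamic bag creation policy for channel bags.
let channel_bag_creation_policy = DynamicBagCreationPolicy {
  number_of_storage_buckets: 5,
  families: BTreeMap::from([
    (east_asia_family_id, 2), // pick 2 distribution buckets from this family
    (europe_family_id, 2),    // pick 2 distribution buckets from this family
  ]),
};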

Storage Buckets and Vouchers

A storage bucket is a commitment to hold some set of bags for long term storage. A bucket may have a bucket operator, which is a single worker in the storage working group. There is distinct bucket operator metadata associated with each bucket, which describes things such as how to resolve the host. The operator of a bucket may change over time. As previously described, when new dynamic bags are created, they are allocated to one or more such buckets, unless a bucket has been temporarily disabled from accepting new bags. A bucket also has a voucher, which describes limits on how much data and how many objects can be assigned to the bucket, across all bags, as well as current usage. This limitation is enforced when data objects are uploaded or moved. The voucher limits can be updated by the storage lead.
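
As an informal sketch of that enforcement, assuming the Voucher struct sketched in the runtime Concepts section below, with the check performed before objects are assigned to a bucket:

/// Illustrative only: reject an assignment that would push a bucket past its
/// voucher limits; on success the usage counters are updated.
fn try_absorb_into_voucher(voucher: &mut Voucher, added_size: u64, added_objects: u64) -> Result<(), &'static str> {
  if voucher.size_used + added_size > voucher.size_limit {
    return Err("voucher size limit exceeded");
  }
  if voucher.objects_used + added_objects > voucher.objects_limit {
    return Err("voucher object limit exceeded");
  }
  voucher.size_used += added_size;
  voucher.objects_used += added_objects;
  Ok(())
}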

Distribution Bucket Families and Distribution Buckets

A distribution bucket is a commitment to distribute a set of bags to end users. A bucket may have multiple bucket operators, each being a worker in the distribution working group. The same metadata concept applies here as well, and additionally covers whether the operator is live or not. Bags are assigned to buckets when being uploaded, or later by the lead through manual intervention. Buckets are partitioned into so-called distribution bucket families. These families group buckets with interchangeable semantics from a distributional point of view, and the purpose of the grouping is to allow sharding over the bag space for a given service level when creating new bags. An example makes this clearer: a subset of families could represent countries in East Asia, where each family corresponds to a specific country. The buckets in a family, say the family for Mongolia, would be operated by infrastructure which can provide sufficiently low latency guarantees w.r.t. the corresponding country. The bag for a channel known to be particularly popular in this area could be set up to use these buckets disproportionately.

Utilisation Model

See here #2359

Runtime: storage_and_distribution

The subsystem-specific functionality will be introduced in a single new module, called storage_and_distribution, sketched out below.

All user level actions must come from within the runtime, and each client module must be able to enforce its own authorisation of user transactions using local state about ownership and possible extraneous utilisation limits.

Unlike many other modules, there is quite frequent use of inlined storage and state representations in order to facilitate iteration without database accesses, which is unavoidable in order to achieve the computational efficiency required for key operations, in particular deletion and assigning new bags to buckets.
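
To make the division of responsibility concrete, here is a hypothetical sketch (add_video_assets and ensure_channel_owner are invented names, not part of this spec) of how a client module such as the content directory would authorise a user action locally and only then call into storage_and_distribution:

// Hypothetical client-module extrinsic, e.g. in the content pallet.
fn add_video_assets(
  origin,
  channel_id: ChannelId,
  assets: Vec<DataObjectCreationParameters>,
  authentication_key: Vec<u8>,
  prize_source: AccountId
) {
  // Client-module-local authorisation: only the channel owner may add assets.
  let owner = ensure_channel_owner(origin, channel_id)?; // hypothetical helper

  // Client-module-local rationing (quotas, per-channel limits, ...) would also go here.

  // Only now does the storage module get involved, targeting the channel's own bag
  // (origin handling elided in this sketch).
  upload(UploadParameters {
    authentication_key,
    bag: BagId::DynamicBag(DynamicBagId::Channel(channel_id)),
    object_creation: assets,
    deletion_prize_source_account: prize_source,
  })?;
}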

Concepts

/// Identifier for a bag.
enum BagId {
  DynamicBag(DynamicBagId),
  StaticBag(StaticBagId)
}

enum StaticBagId {
  Council,
  WorkingGroup(WorkingGroup)
}

enum DynamicBagId {
  Member(MemberId),
  Channel(ChannelId),
  DAO(DAOId),
}

struct DynamicBagCreationPolicy {

  /// The number of storage buckets which should replicate the new bag.
  number_of_storage_buckets: u32,

  /// The set of distribution bucket families which should be sampled
  /// to distribute bag, and for each the number of buckets in that family
  /// which should be used.
  families: BTreeMap<DistributionBucketFamilyId, u32>
}

struct DataObjectCreationParameters {
  size: u64,
  ipfs_content_id: Vec<u8>,
}

struct UploadParameters {

  /// Public key used for authentication in uploads to the liaison.
  authentication_key: Vec<u8>,
  bag: BagId,
  object_creation: Vec<DataObjectCreationParameters>,
  deletion_prize_source_account: AccountId
}

struct PendingDataObjectStatus {
  liaison: StorageBucketId
}

enum DataObjectStatus {
  Pending(PendingDataObjectStatus),
  AcceptedByLiaison
}

struct DataObject {
  status: DataObjectStatus,
  deletion_prize: Balance
}

struct StaticBag {
  objects: BTreeMap<DataObjectId, DataObject>,
  stored_by: BTreeSet<StorageBucketId>,
  distributed_by: BTreeSet<DistributionBucketId>,
}

struct DynamicBag {
  objects: BTreeMap<DataObjectId, DataObject>,
  stored_by: BTreeSet<StorageBucketId>,
  distributed_by: BTreeSet<DistributionBucketId>,
  deletion_prize: Balance
}

enum StorageBucketOperatorStatus {
  Missing,
  InvitedStorageWorker(WorkerId),
  StorageWorker(WorkerId)
}

struct StorageBucket {
  operator_status: StorageBucketOperatorStatus,
  accepting_new_bags: boolean,
  number_of_pending_data_objects: u32,
  voucher: Voucher
}

struct DistributionBucketFamily {

  next_distribution_bucket_id_in_family: DistributionBucketId,

  /// Next bucket in line
  /// only None if there are no buckets in family
  distribution_buckets: BTreeMap<DistributionBucketId, DistributionBucket>,
}

struct DistributionBucket {
  pending_invitations: BTreeSet<WorkerId>,
  number_of_pending_data_objects: u32,
  accepting_new_bags: boolean,
  distributing: boolean,
  number_of_operators: u32,
}

struct DistributionBucketOperator {
  distribution_working_group_worker: WorkerId,
}

struct Voucher {
  size_limit: u64,
  objects_limit: u64,
  size_used: u64,
  objects_used: u64,
}

Module Account

A dedicated module account should be introduced for holding the funds that incentivise cleaning up stale state objects.

Constants

  • Max size of blacklist.
  • Max number of storage buckets.
  • Max number of distribution bucket families
  • Max number of distribution buckets per family.
  • Max number of pending invitations per distribution bucket.
  • Max number of data objects per bag.

State

/// Id of next data object.
next_data_object_id: DataObjectId;

/// Dynamic bag map
bag_by_dynamic_bag_id: map DynamicBagId => DynamicBag;

/// Storage bucket (flat) map
storage_buckets_by_id: BTreeMap<StorageBucketId, StorageBucket>,

/// Id of next new storage bucket
id_of_next_new_storage_bucket: StorageBucketId,

/// Whether all new uploads blocked
uploading_blocked: Boolean

/// Static bags
council_bag: StaticBag;
forum_working_group_bag: StaticBag;
content_working_group_bag: StaticBag;
storage_working_group_bag: StaticBag;
distribution_working_group_bag: StaticBag;
builders_working_group_bag: StaticBag;

/// Default policy used for creation of bags.
member_bag_creation_policy: DynamicBagCreationPolicy;
channel_bag_creation_policy: DynamicBagCreationPolicy;
DAO_bag_creation_policy: DynamicBagCreationPolicy;

/// DistributionBucketFamily (flat) map
distribution_bucket_families_by_id: BTreeMap<DistributionBucketFamilyId, DistributionBucketFamily>;

/// Id of next DistributionBucketFamily
next_distribution_bucket_family_id: DistributionBucketFamilyId;

/// DistributionBucketOperator map
distribution_bucket_operator_by_id: map DistributionBucketOperatorId => DistributionBucketOperator;

/// Blacklisted data object hashes,
black_list: map Vec<u8> => ();

/// Prize contributed towards new data objects.
new_data_object_deletion_prize: Balance;

/// Prize contributed towards new bags.
new_bag_deletion_prize: Balance;

/// Size based pricing of new objects uploaded.
new_data_object_per_mega_byte_fee: Balance;

Extrinsics

// ===== Runtime (owner in client module, proposal system) (Origin == Root) =====

/// Uploads new objects, and does so atomically if more than one is provided.
/// NOTE:
/// - Must return rich information about bags & data objects created.
/// - a `can_upload` extrinsic is likely going to be needed
fn upload(origin, parameters: UploadParameters);

/// Reset auth key for objects in a bag.
/// NOTE: Must return rich information about bags & data objects created.
fn reset_auth_key_for_pending_objects_in_bag(origin, bag: BagId, objects: BTreeSet<DataObjectId>, new_key: Vec<u8>);

/// Retries upload of objects in pending status by selecting a new liaison.
/// Used to retry an upload that is partially or fully blocked by an outstanding transition from pending status.
fn retry_pending_objects(origin, bag: BagId, objects: BTreeSet<DataObjectId>, new_key: Vec<u8>);

/// Delete storage objects.
/// NOTE: Must return rich information about bags & data objects deleted.
fn delete_objects(origin, objects_in_bags: BTreeSet<(BagId,DataObjectId)>);

/// Move data objects to a new bag.
fn move_objects(origin, source_bag: BagId, objects: BTreeSet<DataObjectId>, destination_bag: BagId);

/// Delete bags.
fn delete_bags(origin, bags: BTreeSet<BagId>);

/// Update what storage buckets back dynamic bags.
fn update_storage_buckets_for_dynamic_bags(origin, update: BTreeMap<BagId, BTreeSet<StorageBucketId>>)

// ===== Storage Lead (Origin == Signed with storage WG role key) =====

/// Create storage bucket.
fn create_storage_bucket(origin, invite_worker: Option<WorkerId>, accepting_new_data_objects: boolean, voucher: Voucher);

/// Invite operator.
/// Must be missing.
fn invite_storage_bucket_operator(origin, storage_bucket_id: StorageBucketId);

/// Cancel pending invite.
/// Must be pending.
fn cancel_storage_bucket_operator_invite(origin, storage_bucket_id: StorageBucketId);

/// Delete storage bucket.
/// Must be empty.
fn delete_storage_bucket(origin, storage_bucket_id: StorageBucketId);

/// Add hashes to, and remove hashes from, the current blacklist.
fn update_blacklist(origin, remove_hashes: BTreeSet<Vec<u8>>, add: BTreeSet<Vec<u8>>)

/// Update whether uploading is globally blocked.
fn update_uploading_blocked_status(origin, new_status: boolean)

/// Update what storage buckets back dynamic bags.
fn update_storage_buckets_for_dynamic_bags(origin, update: BTreeMap<DynamicBagId, BTreeSet<StorageBucketId>>)

/// Update number of storage buckets used in given dynamic bag creation policy.
fn update_number_of_storage_buckets_in_dynamic_bag_creation_policy(origin, dynamic_bag_id: DynamicBagId, number_of_storage_buckets: u32)

// ===== Distribution Lead (Origin == Signed with distribution WG role key)  =====

///
fn create_distribution_bucket_family(origin, ...)

///
fn delete_distribution_bucket_family(origin, ...)

///
fn create_distribution_bucket(origin, ...)

/// ::accepting_new_bags
fn update_distribution_bucket_status(origin, ...)

///
fn delete_distribution_bucket(origin, ...)

/// Invite operator.
/// Must be missing.
fn invite_distribution_bucket_operator(origin, distribution_bucket_family_id: DistributionBucketFamilyId, distribution_bucket_id: DistributionBucketId, worker_id: WorkerId);

/// Cancel pending invite.
/// Must be pending.
fn cancel_distribution_bucket_operator_invite(origin, distribution_bucket_family_id: DistributionBucketFamilyId, distribution_bucket_id: DistributionBucketId, worker_id: WorkerId);

/// Update what distribution buckets back a static or dynamic bag.
fn update_distribution_buckets_for_bags(origin, update: BTreeMap<BagId, BTreeSet<DistributionBucketId>>)

/// Update the distribution bucket families used in a given dynamic bag creation policy.
fn update_families_in_dynamic_bag_creation_policy(origin, dynamic_bag_id: DynamicBagId, families: BTreeMap<DistributionBucketFamilyId, u32>)

// ===== Storage Provider (Origin == Signed with storage WG worker role key) =====

///
fn set_storage_operator_metadata(origin, storage_bucket_id: StorageBucketId, metadata: Vec<u8>);

///
fn accepted_pending_data_objects(origin, worker_id: WorkerId, objects: BTreeSet<(BagId, DataObjectId)>);

///
fn accept_storage_bucket_invitation(origin, storage_bucket_id:StorageBucketId);

/// Update whether new objects are being accepted for storage.
fn update_storage_bucket_status(origin, storage_bucket_id: StorageBucketId, accepting_new_data_objects: boolean);

// ===== Distribution Provider (Origin == Signed with distribution WG worker role key) =====

///
fn set_distribution_operator_metadata(origin, distribution_bucket_family_id: DistributionBucketFamilyId, distribution_bucket_id: DistributionBucketId, metadata: Vec<u8>);

///
fn accept_distribution_bucket_invitation(origin, distribution_bucket_family_id: DistributionBucketFamilyId, distribution_bucket_id: DistributionBucketId);

Storage Node: Colossus

Here is a list of requirements for the storage node reference implementation.

  • Worker driven: should simultaneously power all obligations of a given worker in the working group, so this could mean multiple storage buckets.
  • Storage medium agnostic: e.g. a local drive, S3, network storage, etc.
  • Fault tolerant: should gracefully recover from any kind of network or power interruption at any stage.
  • Standalone API specification with native docs.
  • Three APIs:
    • A local operator API for controlling and inspecting node state.
    • An authenticated infrastructure API for servicing storage providers and distributors.
    • An authenticated public API for accepting uploads.
  • ELK stack logging of key events, such as requests and errors.
  • Implemented in Typescript.
  • Notice that under the runtime spec above, not all important data is stored in runtime storage (for example metadata), hence this node may need to speak to a query node to sync this information in a timely and simple manner.

Distributor Node: Argus

Here is a list of requirements for the distributor node reference implementation.

  • Worker driven: should simultaneously power all obligations of a given worker in the working group, however there should be fine grained control over what exact distribution buckets to service on a given instance.
  • Storage medium agnostic: e.g. a local drive, S3, network storage, etc.
  • Fault tolerant: should gracefully recover from any kind of network or power interruption at any stage.
  • Standalone API specification with native docs.
  • Universal Access: in this first version everything can be downloaded for public consumption, and there is no integration with the Gateway system, nor complex policies for distinguishing different asset classes, such as member avatars, proposal attachments, etc.
  • Two APIs:
    • A local operator API for controlling and inspecting node state.
    • A public API for servicing downloads.
  • ELK stack logging of key events, such as requests and errors.
  • Implemented in Typescript.
  • Notice that under the runtime spec above, not all important data is stored in runtime storage (for example metadata), hence this node may need to speak to a query node to sync this information in a timely and simple manner. This will be even more the case when the node has to understand the context to which a data object corresponds, in order to support more sophisticated access policies.

Joystream CLI

New commands for

  • storage system provider and lead actions.
  • distribution system provider and lead actions.
  • Colossus node operator.
  • Argus node operator.
@shamil-gadelshin

Additional requirements:

We should update the storage system v2 to
- accept any liaison
- allow continuing an upload with the same liaison if it is interrupted


bedeho commented Apr 10, 2021

A requirement we should consider adding is some way of requiring a claimed liaison to confirm the upload, proving that the uploader at least contacted them for this purpose. But probably not needed right away.


bedeho commented Apr 25, 2021

An obvious point which is missing, but was pointed out by @mnaamani, is that distributors only store a subset of all videos in their corresponding bucket in their cache, which is managed using a time-aware least recently used (TLRU) policy. This will radically reduce the storage requirements of distributors, and will in practice work well. See the section on cache policies here. There should probably be a distinct API for doing the very fast fetching that is required when there is a cache miss, perhaps even over a persistent connection.
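
As a rough illustration only (written in the same Rust-style pseudocode as the runtime sketch above, although the distributor node itself is TypeScript), a time-aware LRU eviction step could look like this: expired entries are evicted first, and otherwise the least recently used entry goes when the cache is over capacity. The entry fields and timestamp representation are assumptions.

/// Illustrative only: pick which cached object to evict next under a TLRU policy.
struct CacheEntry {
  size: u64,
  last_used: u64,   // e.g. timestamp of the last download served from this entry
  expires_at: u64,  // time-to-use deadline assigned when the entry was cached
}

fn pick_eviction_victim(cache: &BTreeMap<DataObjectId, CacheEntry>, now: u64) -> Option<DataObjectId> {
  // 1. Prefer any entry whose time-to-use has elapsed.
  if let Some((id, _)) = cache.iter().find(|(_, entry)| entry.expires_at <= now) {
    return Some(id.clone());
  }
  // 2. Otherwise fall back to evicting the least recently used entry.
  cache.iter().min_by_key(|(_, entry)| entry.last_used).map(|(id, _)| id.clone())
}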


bedeho commented May 6, 2021

Question:

A question came up on a recent call about why uploading, at least when initiated through an off-chain signer (like a member or channel owner), should not be possible through a direct extrinsic on the storage module.

Answer:

  • How individual actors should be rationed depends not only on who they are in the subsystem to which the uploaded data corresponds, but also on the purpose of the data itself. Likewise, control over this should rest with the leads or workers in that subsystem. It is not practical to accommodate all of this in the storage module itself. In some sense the storage pallet is just a dumb filesystem that does not know anything about the semantics; the use-case-specific rules concerning the data live in the other modules, which have detailed awareness of this.

  • It would become more difficult to have operations that are atomic in their effects on the storage system and whatever other subsystem is in play. Say, for example, you want to add a new video. We would want to make it possible for the user to publish a video only in the case where all assets could be added to the storage system, and vice versa.

  • It is also just the most uniform approach, because this will have to be the case to integrate with DAOs & council, and it is easier for the query node to process compared to any other realistic alternative.
