
Specs for Acropolis storage module #45

Merged
merged 31 commits into Joystream:master on Jun 13, 2019

Conversation

@jfinkhaeuser
Contributor

commented May 16, 2019

  • Document storage module as-is.
  • Spec required changes to the storage module for cleaning up different concepts.
  • Spec for extensions to the module in Acropolis.

jfinkhaeuser added some commits May 16, 2019

@jfinkhaeuser jfinkhaeuser changed the title Specs for storage module as-is Specs for Acropolis storage module May 16, 2019

jfinkhaeuser added some commits May 16, 2019

- Introduction to DataDirectory.
- Also explain the connection between ContentId, ContentMetadata and
  DataObject a bit.
- Disambiguation between storage provider actor ID and actual machine
  contact information provided.
- High level introduction to DOSR added
@bedeho

Collaborator

commented May 21, 2019

Questions

Progress

Would it be accurate to say that what still remains to be specified is

  • the communication protocol between uploader and liaison
  • the communication protocol between the liaison and storage providers
  • the communication protocol between the downloader and the storage provider
  • the runtime protocol used by uploader to add new content
  • the runtime protocol for managing tranches: creating, updating, and assigning them to object types
  • the runtime protocol for introducing a new storage provider
  • the runtime protocol for penalizing/evicting a storage provider
  • the runtime protocol for an exiting storage provider

Scope

Perhaps we can limit the scope for Acropolis, yet still achieve much of what we want for our OKRs, by configuring a good number of things in the genesis state - without support for adding/removing/editing - such as

  • data object types
  • tranches
  • other?

Actors module

Should we even depend on the Actors module? I never understood it, and Mokhtar has questioned it multiple times.

Mime key map

The Data Object Type Registry module has a map, DataObjectTypeConstraintsRegistry, which is keyed by media type. Why are media types emphasized as a special property of constraints? Is this just a convenience for light clients that have to do client-side constraint matching on prospective uploads?
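If the map is indeed meant for client-side pre-checks, a minimal sketch of how a light client might use it could look like this (all names and fields here are hypothetical illustrations, not the actual runtime types):

```rust
use std::collections::HashMap;

// Hypothetical constraint record; the real DataObjectTypeConstraints
// presumably carries more fields than a size limit.
#[derive(Debug, Clone)]
struct DataObjectTypeConstraints {
    max_size_bytes: u64,
}

// A light client can reject a prospective upload before touching the chain.
fn accepts(
    registry: &HashMap<&str, DataObjectTypeConstraints>,
    media_type: &str,
    size: u64,
) -> bool {
    registry
        .get(media_type)
        .map_or(false, |c| size <= c.max_size_bytes)
}

fn main() {
    // Registry keyed by media type, as DataObjectTypeConstraintsRegistry appears to be.
    let mut registry = HashMap::new();
    registry.insert(
        "image/png",
        DataObjectTypeConstraints { max_size_bytes: 5 * 1024 * 1024 },
    );

    println!("{}", accepts(&registry, "image/png", 3 * 1024 * 1024)); // true
    println!("{}", accepts(&registry, "video/mp4", 1024)); // false: no entry
}
```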

Minor

Why both DataObjectTypeConstraints and DataObjectTypes?

What does DataObjectTypeConstraints add, or vice versa? Ultimately, you are trying to establish a mapping between an off-chain data filtering rule and an on-chain tranche. Prior to this, it all lived in DataObjectTypes. Is the point that DataObjectTypeConstraints don’t point to tranches, and only DataObjectTypes do?

Combine modules

The “submodules” 1-3 on the Structure list appear to mutually depend on each other, and none of them can sensibly be reused on its own. Therefore, organizing them as a single Substrate module seems more appropriate. Mind you, we don’t have any good convention on how to apply the Substrate module abstraction, so we are just going on a case-by-case basis for now. But re-usability in other runtimes seems like one plausible guiding concern.

Separate Content Directory from Storage Module

The Content Directory (CD) really should be in its own proper Substrate module, which would depend on the storage module. The CD is going to substantially grow in scope as more complicated high level business logic/state is introduced there, and none of it is relied upon by anything else in the storage module proper.

Major

Payload based model

The model chosen here makes the set of tranches available for a given type/constraint a function of the raw data payload. Moreover, it obliges the client to deduce this.

This has a number of limitations:

A) The functional role of a data object, that is, how it is actually going to be used in user applications, may have substantial bearing on how it should be stored and distributed, and this cannot be deduced from the raw payload. The appropriate storage tranche may be sensitive to such functional dimensions, and the distribution tranches/system certainly will be as well. For example, a certain type of image may be used in the system in a way that requires it to be stored with high redundancy by highly staked actors. But an equivalent image payload, identical in every way, may be used in applications where it is not critical at all, and it could be stored in tranches with unstaked newcomers and low redundancy.

B) The quota system, for example on uploads, becomes blind to the functional roles of objects, and can only operate on raw size and similar metrics, which may not be as flexible as one wants. For example, perhaps you want to make sure that no one stores more than one channel cover photo explicitly.

As is, if new members are given a 1 GB upload quota, for example (which is quite low), they may choose to use it by uploading one million 1 KB images, which is both abusive and makes no sense.

C) The distribution system will be very sensitive to the role of a data object, as this is a major determinant of its future bandwidth requirement profile, hence having this explicitly represented will be very useful. (A speculative point, as we have not thought much about distribution.)

D) It may be impractical in certain usage environments to even do this, e.g. a browser having to process the actual internal encoding information of a large payload.

The alternative model implied by these observations is to have a set of data object types explicitly defined by the functional role of the data, with exactly one type applying to a given data object. Client applications will have to be hard-coded to use the correct type for a given purpose. Likewise, the content directory (schemata) would, for example, require that the data objects pointed to have a specific type.
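A minimal sketch of that alternative, with purely illustrative role names and tranche labels (none of these are from the spec):

```rust
// Hypothetical functional roles; client applications hard-code the right one
// for each purpose instead of deducing a type from the payload.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DataObjectType {
    ChannelCoverPhoto,
    MemberAvatar,
    VideoMedia,
}

// Tranche selection keyed on the role, not the raw bytes: two identical image
// payloads can land in very different tranches depending on how they are used.
fn tranche_for(object_type: DataObjectType) -> &'static str {
    match object_type {
        // critical content: high redundancy, highly staked providers
        DataObjectType::VideoMedia => "staked-high-redundancy",
        // non-critical: unstaked newcomers, low redundancy
        DataObjectType::ChannelCoverPhoto | DataObjectType::MemberAvatar => {
            "newcomer-low-redundancy"
        }
    }
}

fn main() {
    println!("{}", tranche_for(DataObjectType::VideoMedia)); // staked-high-redundancy
}
```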

@jfinkhaeuser

Contributor Author

commented May 21, 2019

To summarize the discussion we had about this briefly:

  1. The progress is more or less captured.
  2. The payload-based model for data objects is a result of
    1. not having a clear notion of the purposes of data object types right now, and
    2. both the storage node needing to enforce the constraints, and the UI being able to provide better UX by also trying to enforce constraints
    3. therefore, the file type presented by the uploader made for a good starting point.
  3. The separation of data object types and constraints is to allow for more flexibility in specifying constraints.
  4. Actors module should be subsumed into the storage module because it's highly storage specific.
  5. Storage should be one module.
  6. Content directory should be a separate module.

We did discuss the payload-based model for a while, and came to two changes:

  1. Instead of the constraints as a separate map, create a versioned constraint payload field for data object type. We should not worry about UI having to discover anything related to data objects, and instead expect UI to hardcode data object types based on purpose. Therefore, a constraint payload field allows us to start with simple file types + size approach, and versioning will allow us to expand this into an appropriate DSL in the future.
  2. Starting from hard-coded data object types by purpose does not remove the need for the UI or the storage nodes to enforce constraints.
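Change 1 might be sketched roughly like this (the names and the v1 fields are assumptions for illustration, not the agreed spec):

```rust
// Hypothetical versioned constraint payload: v1 is simple file types + size;
// later versions can carry a richer constraint DSL without a storage migration
// of the surrounding type.
#[derive(Debug, Clone)]
enum ConstraintPayload {
    V1 {
        file_types: Vec<String>,
        max_size_bytes: u64,
    },
}

#[derive(Debug, Clone)]
struct DataObjectType {
    description: String,
    constraints: ConstraintPayload,
}

// Per change 2, both the UI and the storage nodes run the same check.
fn check(dot: &DataObjectType, file_type: &str, size: u64) -> bool {
    match &dot.constraints {
        ConstraintPayload::V1 { file_types, max_size_bytes } => {
            file_types.iter().any(|t| t == file_type) && size <= *max_size_bytes
        }
    }
}

fn main() {
    let avatar = DataObjectType {
        description: "member avatar".to_string(),
        constraints: ConstraintPayload::V1 {
            file_types: vec!["png".to_string(), "jpg".to_string()],
            max_size_bytes: 512 * 1024,
        },
    };
    println!("{}", check(&avatar, "png", 100 * 1024)); // true
}
```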

Does that seem to capture it?

@bedeho

Collaborator

commented May 22, 2019

Excellent, I will prepare a report PR.

Edit:
here #54

jfinkhaeuser added some commits May 22, 2019

@jfinkhaeuser

Contributor Author

commented May 22, 2019

Ok, I think this is more or less the state we should reach - with the caveat that instead of referring to the actors module as before, I'm referring to a currently non-existent "tranche staking" sub-module. If @mnaamani is up for that, great; otherwise I can put it together from the current actors module.

mnaamani and others added some commits May 24, 2019

@jfinkhaeuser jfinkhaeuser marked this pull request as ready for review May 28, 2019

@jfinkhaeuser

Contributor Author

commented May 28, 2019

Let me reference the staking doc in the main doc, then I guess this is ready to be merged.

@jfinkhaeuser

Contributor Author

commented May 28, 2019

Ok, @mnaamani - it would be good if you could skim the data directory specs wrt staking. I also made a modification to the staking spec that I didn't think about before: a tranche needs to be created with a matching data object type (but it should not be possible to modify that later).

One could argue for a state TrancheIdsByDataObjectTypeId - that's certainly the kind of thing the storage module would like to read. WDYT?
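For illustration, a plain-Rust sketch of that state and its invariant, using std collections instead of Substrate storage macros (all names and id types here are hypothetical):

```rust
use std::collections::HashMap;

type DataObjectTypeId = u64;
type TrancheId = u64;

// Hypothetical in-memory stand-in for a TrancheIdsByDataObjectTypeId state map.
#[derive(Default)]
struct TrancheRegistry {
    tranche_ids_by_data_object_type_id: HashMap<DataObjectTypeId, Vec<TrancheId>>,
}

impl TrancheRegistry {
    // A tranche is created with a matching data object type; the link is
    // recorded at creation time and never modified afterwards.
    fn create_tranche(&mut self, object_type: DataObjectTypeId, tranche: TrancheId) {
        self.tranche_ids_by_data_object_type_id
            .entry(object_type)
            .or_default()
            .push(tranche);
    }

    // What the storage module would read to find tranches for an upload.
    fn tranches_for(&self, object_type: DataObjectTypeId) -> &[TrancheId] {
        self.tranche_ids_by_data_object_type_id
            .get(&object_type)
            .map(|v| v.as_slice())
            .unwrap_or(&[])
    }
}

fn main() {
    let mut reg = TrancheRegistry::default();
    reg.create_tranche(1, 10);
    reg.create_tranche(1, 11);
    println!("{:?}", reg.tranches_for(1)); // [10, 11]
}
```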

After quick discussion with @mnaamani, add some missing detail about
data object types, and mention why limiting tranche size is not a goal
for right now.
@mnaamani

Contributor

commented Jun 3, 2019

Okay, the final changes look good. I would say the spec write-up is done and good to merge.

@jfinkhaeuser

Contributor Author

commented Jun 3, 2019

@bedeho should review :)

@bedeho

Collaborator

commented Jun 13, 2019

Merging blind for now, will fix later.

@bedeho bedeho merged commit 71bf9b5 into Joystream:master Jun 13, 2019

@bedeho

Collaborator

commented Jun 13, 2019

#64
