This repository has been archived by the owner on Feb 16, 2023. It is now read-only.

Extend Grid to take advantage of IPFS file sharding #181

Closed
jvmncs opened this issue Apr 30, 2018 · 1 comment
Labels
Type: New Feature ➕ Introduction of a completely new addition to the codebase

Comments

@jvmncs
Contributor

jvmncs commented Apr 30, 2018

IPFS has a max block size of 1 MB for security reasons. To store larger files and directories, they've implemented sharding (see ipfs/notes#76, ipfs/kubo#3042, and https://github.com/ipfs/js-ipfs-unixfs#usage for an example of how it's used in JS).

This is a problem for us, since we'll often want to send tensor objects that contain more than 1 MB of data. For example, a 50-dimensional word embedding over a vocabulary of 100,000 words would require sending an embedding matrix of at least 50 × 100,000 × 32 bits / 8 / 1,000,000 = 20 MB (assuming 32-bit floats). Training a matrix like this presents a range of challenges, but even freezing it and sending it once would be feasible and useful for users, so we definitely want to support this in order to allow a larger class of architectures to be trained on Grid.
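The 20 MB estimate can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the constant names are illustrative, 1 MB is taken as 10^6 bytes to match the arithmetic above, and the figure is for raw float32 values. A JSON text encoding of the same matrix would typically be larger.

```python
# Rough size of the embedding matrix from the example above.
VOCAB_SIZE = 100_000
EMBEDDING_DIM = 50
BITS_PER_VALUE = 32            # assumes float32 values
BLOCK_LIMIT_BYTES = 1_000_000  # 1 MB taken as 10**6 bytes, as in the estimate

size_bytes = VOCAB_SIZE * EMBEDDING_DIM * BITS_PER_VALUE // 8
size_mb = size_bytes / 1_000_000

# Ceiling division: smallest number of 1 MB blocks that can hold the raw data.
min_blocks = -(-size_bytes // BLOCK_LIMIT_BYTES)

print(f"{size_mb:.0f} MB raw, at least {min_blocks} blocks of 1 MB")
```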

The goal here would be to figure out a way to do JSON sharding with py-ipfs-api, and then to integrate those changes into Grid.
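As a starting point for discussion, here is a minimal sketch of what application-level JSON sharding could look like. It is not Grid's or py-ipfs-api's implementation: `shard_json`, `unshard_json`, `MemoryStore`, and the manifest layout are all hypothetical, and the in-memory store stands in for whatever calls would actually write and read blocks on IPFS.

```python
import hashlib
import json

CHUNK_SIZE = 1_000_000  # hypothetical per-chunk budget in bytes


def shard_json(obj, put, chunk_size=CHUNK_SIZE):
    """Serialize obj to JSON, store it in chunks via put(bytes) -> key,
    and return a manifest listing the chunk keys in order."""
    payload = json.dumps(obj).encode("utf-8")
    keys = [
        put(payload[i:i + chunk_size])
        for i in range(0, len(payload), chunk_size)
    ]
    return {"chunks": keys, "size": len(payload)}


def unshard_json(manifest, get):
    """Fetch every chunk via get(key) -> bytes, reassemble, and decode."""
    payload = b"".join(get(key) for key in manifest["chunks"])
    if len(payload) != manifest["size"]:
        raise ValueError("reassembled payload has unexpected size")
    return json.loads(payload.decode("utf-8"))


class MemoryStore:
    """In-memory, content-addressed stand-in for an IPFS client (assumption)."""

    def __init__(self):
        self.blocks = {}

    def put(self, data):
        key = hashlib.sha256(data).hexdigest()
        self.blocks[key] = data
        return key

    def get(self, key):
        return self.blocks[key]


store = MemoryStore()
# Small stand-in for a tensor: 1000 rows of 50 floats.
tensor = [[float(i + j) for j in range(50)] for i in range(1000)]

# A tiny chunk size is used here only to force multiple chunks.
manifest = shard_json(tensor, store.put, chunk_size=4096)
restored = unshard_json(manifest, store.get)
print(len(manifest["chunks"]), restored == tensor)
```

Passing `put` and `get` as callables keeps the chunking logic independent of the storage backend, so the same functions could be pointed at an IPFS client once the right calls are settled. Note that the manifest itself grows with the number of chunks and would need the same treatment for very large objects.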

@jvmncs jvmncs added Type: New Feature ➕ Introduction of a completely new addition to the codebase help wanted labels Apr 30, 2018
@jeamick
Member

jeamick commented May 22, 2018

ipfs/kubo#3885

3 participants