Nix and IPFS #859

vcunat opened this Issue Mar 24, 2016 · 78 comments

@vcunat
Member
vcunat commented Mar 24, 2016

(I wanted to split this thread from #296 (comment) .)

Let's discuss relations with IPFS here. As I see it, what would be most appreciated is a decentralized way to distribute nix-stored data.

What we might start with

The easiest usable step might be to allow distribution of fixed-output derivations over IPFS. Those are paths that are already content-addressed, typically by a (truncated) sha256 over either a flat file or a tar-like dump of a directory tree; more details are in the docs. These paths are mainly used for compressed tarballs of sources. This step by itself should avoid lots of problems with unstable upstream downloads, assuming we could convince enough nixers to serve their files over IPFS.
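
As an illustration: a fixed-output fetch pins only the content hash, not the transport, so the same bytes could come from IPFS or anywhere else. nix-prefetch-url prints the sha256 that such a fetchurl call would pin:

nix-prefetch-url https://ftp.gnu.org/gnu/hello/hello-2.10.tar.gz
# downloads the tarball into /nix/store and prints its (base32) sha256;
# a fetchurl { url = ...; sha256 = ...; } expression pins exactly this hash,
# regardless of where the bytes actually came from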

Converting hashes

One of the difficulties is that we use different kinds of hashing than IPFS does, and I don't think it would be good to require converting the many thousands of hashes in our expressions. (Note that it's infeasible to convert among those hashes unless you have the whole content.) IPFS people might best suggest how to work around this. I imagine we want to "serve" a mapping from the hashes we use to IPFS's hashes, perhaps realized through IPNS. (I don't know the details of IPFS's design, I'm afraid.) There's the advantage that one can easily verify the nix-style hash in the end, after obtaining the paths in any way.
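
One hypothetical way to serve such a mapping (the file name and layout here are assumptions, not an existing format):

# mapping.json: nix-style sha256 -> IPFS hash, maintained by some publisher (hypothetical)
ipfs add -q mapping.json             # prints the hash of the added file
ipfs name publish /ipfs/<that-hash>  # publish it under the node's IPNS identity
# a client resolves the publisher's IPNS name, looks up its nix hash in the
# mapping, fetches the content over IPFS, then verifies the nix-style hash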

Non-fixed content

If we get that far, it shouldn't be too hard to manage distributing everything via IPFS, as for all other derivations we use something we could call indirect content addressing. To explain that, let's look at how we distribute binaries now: our binary caches. We hash the build recipe, including all its recipe dependencies, and we inspect the corresponding narinfo URL on cache.nixos.org. If our build farm has built that recipe, that file contains various information, mainly the hashes of the contents of the resulting outputs of that build and crypto-signatures of them.

Note that this narinfo step just converts our problem to the previous fixed-output case, and the conversion itself seems very reminiscent of IPNS.
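
For concreteness, this is roughly what today's lookup looks like (fields abridged, values are placeholders):

curl https://cache.nixos.org/<hash-part-of-store-path>.narinfo
# StorePath: /nix/store/<hash>-hello-2.10
# URL: nar/<file-hash>.nar.xz
# NarHash: sha256:<hash of the unpacked NAR>
# References: <store paths of runtime dependencies>
# Sig: cache.nixos.org-1:<signature>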

Deduplication

Note that nix-built stuff has significantly greater than usual potential for chunk-level deduplication. Very often we rebuild a package only because something in a dependency has changed, so only very minor changes are expected in the results, mainly just exchanging the references to runtime dependencies as their paths have changed. (On rare occasions even the lengths of the paths would change.) There's great potential to save on that during distribution of binaries, which would be utilized by implementing the section above, and even potential to save disk space in comparison to our way of hardlinking equal files (the next paragraph).

Saving disk space

Another use might be to actually store the files in an FS similar to what IPFS uses. That seems a more complex and tricky thing to deploy; e.g., I'm not sure anyone yet trusts the implementation of the FS enough to have the whole OS running off it.

It's probably premature to speculate too much on this use ATM; I'll just say that I can imagine having symlinks from /nix/store/foo to /ipfs/*, representing the locally trusted version of that path. (That works around the problems related to making /nix/store/foo content-addressed.) Perhaps it could start as a per-path opt-in, so one could move only the less vital paths out of /nix/store itself.


I can help personally with bridging the two communities in my spare time. Not too long ago, I spent many months on researching various ways to handle "highly redundant" data, mainly from the point of view of theoretical computer science.

@ehmry
Member
ehmry commented Mar 24, 2016

I'm curious what the most minimal way would be to associate store paths with IPFS objects while interfering as little as possible with IPFS-unaware tools.

@vcunat
Member
vcunat commented Mar 24, 2016

I described such a way in the second paragraph from the bottom. It should work with IPFS and the nix store as they are, perhaps with some script that would move the data, create the symlink, and pin the path in IPFS to avoid losing it during GC. (It could be unpinned when nix deletes the symlink during GC.)
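
A rough sketch of what that script could do (paths and hashes are placeholders):

path=/nix/store/<hash>-example
hash=$(ipfs add -r -Q "$path")  # -Q prints only the root hash; add also pins it
mv "$path" "$path.orig"
ln -s "/ipfs/$hash" "$path"     # tools keep resolving the usual store path
# ...and 'ipfs pin rm "$hash"' once nix GC deletes the symlink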

@ehmry
Member
ehmry commented Mar 24, 2016

I was thinking about storing store objects in something that wouldn't require a daemon, but of course you can't have everything.

@Ericson2314
Member

@vcunat Great write-up! More thoughts on this later, but one thing that gets me is the tension between wanting incremental goals and avoiding work we don't need long-term. For example, it will take some heroics to use our current hashing schemes, but for things like dedup and the intensional store we'd want to switch to what IPFS already does (or something much closer to it) anyway.

Maybe the best first step is a new non-flat/non-NAR hashing strategy for fixed-output derivations? We can slowly convert nixpkgs to use that, and get IPFS mirroring and dedup in the fixed-output case. Another step is using git tree hashes for fetchgit. We already want to do that, and I suspect IPFS would want it too for other users. IPFS's multihash can certainly be heavily abused for such a thing :).
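
For reference, a git tree hash can be computed for any checked-out directory, independent of history; a sketch of what a fetcher could do (the directory name is a placeholder):

cd fetched-source
git init -q && git add -A   # stage the whole tree in a throwaway repo
git write-tree              # prints the hash of the resulting tree object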

@Ericson2314
Member

For me the end goal should be only using IPNS for the derivation -> build map. Any trust-based compatibility map between hashing schemes long term makes the perfectionist in me sad :).

@vcunat
Member
vcunat commented Mar 24, 2016

For example it will take some heroics to use our current hashing schemes, but for things like dedup and the intensional store we'd want to switch to what IPFS already does (or much closer to that) anyways.

I meant that we would "use" some IPFS hashes but also utilize a mapping from our current hashes, perhaps run over IPNS, so that it would still be possible to run our fetchurl { sha256 = "..." } without modification. Note that it's these flat tarball hashes that most upstreams release and sign, and that's not going to change anytime soon, moreover there's not much point in trying to deduplicate compressed tarballs anyway. (We might choose to use uncompressed sources instead, but that's just another partially independent decision I'm not sure about.)

@Ericson2314
Member

For single files / IPFS blobs, we should be able to hash the same way without modification.

@Ericson2314
Member

But for VCS fetches we currently do a recursive/NAR hash, right? That is what I was worried about.

@Ericson2314
Member

@ehmry I assume it would be pretty easy to make the nix store an immutable FUSE filesystem backed by IPFS (hopefully such a thing exists already). Down the road I'd like to have package references and the other things currently in the SQLite database also backed by IPFS: they would "appear" in the FUSE filesystem as specially-named symlinks/hard-links/duplicated sub-directories. "referrers" is the only field I'm aware of that'd be a cache on top. Nix would keep track of roots, but IPFS would do GC itself, in the obvious way.

@cleverca22

one idea i had was to keep all outputs in NAR format and have the fuse layer dynamically unpack things on demand; that can then be used with some other planned IPFS features to share a file without copying it into the block storage

then you get a compressed store and don't have to store 2 copies of everything (the NAR for sharing and the installed copy)

@nmikhailov

@cleverca22 yeah, I had the same thoughts about that; it's unclear how much this would impact performance though

@cleverca22

could keep a cache of recently used files in a normal tmpfs, and relay things over to that to boost performance back up

@davidar
davidar commented Apr 8, 2016

@cleverca22 another idea that was mentioned previously was to add support for NAR to ipfs, so that we can transparently unpack it as we do with TAR currently (ipfs tar --help)

@Ericson2314
Member

NAR sucks though: no file-level dedup we could otherwise get for free. The above might be fine as a temporary step, but Nix should learn about a better format.

@davidar
davidar commented Apr 9, 2016

@Ericson2314 another option that was mentioned was for Nix and IPFS (and perhaps others) to try to standardise on a common archive format

@Ericson2314
Member

@davidar Sure, that's always good. For the shortish term, I was leaning towards a stripped-down unixfs with just the attributes NAR cares about. As far as Nix is concerned, this is basically the same format but with a different hashing scheme.

@Ericson2314
Member

Yeah, looking at Car, it seems to be both an "IPFS schema" over the IPFS Merkle DAG (unless it just reuses unixfs) and an interchange format for packing the DAG into one binary blob.

The former is cool, but I don't think Nix even needs the latter (except perhaps as a new way to fall back on HTTP etc. if IPFS is not available while using a compatible format). For normal operation, I'd hope nix could just ask IPFS to populate the fuse filesystem that is the store given a hash, and everything else would be transparent.

@cleverca22

https://github.com/cleverca22/fusenar

i now have a nixos container booting with a fuse filesystem at /nix/store, which mmaps a bunch of .nar files and transparently reads the requested files

@knupfer
knupfer commented Jul 20, 2016

What is currently missing for using IPFS? How could I contribute? I really need this feature for work.

@knupfer
knupfer commented Jul 20, 2016

Pinging @jbenet and @whyrusleeping because they are only listed on the old issue.

@copumpkin
Member

@knupfer I think writing a fetchIPFS would be a pretty easy first step. Deeper integration will be more work and require touching Nix itself.

@knupfer
knupfer commented Jul 28, 2016 edited

Ok, I'm working on it, but there are some problems. Apparently ipfs doesn't save the executable flag, so stuff like stdenv doesn't work, because it expects an executable configure. The alternative would be to distribute tarballs instead of directories, but that would be clearly inferior because it would exclude deduplication at the file level. Any thoughts on that? I could make every file executable, but that would not be very nice...

@copumpkin
Member

@knupfer it's not great, but would it be possible to distribute a "permissions spec file" paired with a derivation, that specifies file modes out of band? Think of it like a JSON file or whatever format; your thing pulls from IPFS, then applies the file modes to the contents of the directory as specified in the spec. The spec could be identified uniquely by the folder it's a spec for.

@copumpkin
Member
copumpkin commented Jul 28, 2016 edited

In fact, the unit of distribution could be something like:

{
  "contents": "/ipfs/12345",
  "permissions": "/ipfs/647123"
}
@knupfer
knupfer commented Jul 28, 2016

Yep, that would work. Although it makes it more complicated for the user to add some sources to ipfs. But we could, for example, give fetchIPFS an additional URL that wouldn't be in ipfs, and if it fetches from the normal web, automatically generate the permissions file and add that to ipfs... I'll think a bit about it.
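
A sketch of that fallback path, reusing the permissions-spec idea from above (the find invocation is just one possible way to record the modes):

find hello-2.10 -type f -perm -u+x > executables  # record executable files
ipfs add -r -w executables hello-2.10             # wrap spec + source in one dir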

@davidak
Contributor
davidak commented Jul 28, 2016

ipfs doesn't save the executable flag

should it? @jbenet

how is ipfs-npm doing it? maybe it also just distributes tarballs; that is of course not the most elegant solution.

@Ericson2314
Member

I think @knupfer is talking about that thin-waist dag format. This is supposed to be a minimal building block for building more complex data structures. [This is one of the reasons why I am annoyed it supports keys at all: that's an unneeded feature that just causes confusion.]

@copumpkin
Member

additional url in the fetchIPFS which wouldn't be in ipfs

why would it have to be not in IPFS? it seems like you'd just need a single URL pointing at some sort of structure that points to another IPFS path

@knupfer
knupfer commented Jul 28, 2016

It can all be in ipfs; the question is how it enters there. So I thought there could be a conventional URL to fetch from when it isn't already in ipfs.

@knupfer
knupfer commented Jul 30, 2016

Ok, I've got a working draft of fetchipfs which reuses a lot of code from fetchzip:

http://lpaste.net/172901

And an example of hello with fetchipfs:

http://lpaste.net/172904

You'd have to add the following to all-packages.nix

fetchipfs = import ../build-support/fetchipfs {
  inherit fetchurl lib unzip;
};

The ipfs path now contains a file named executables, which lists all files that should be executable, separated by newlines, and a directory which contains the source (hello-2.10 in this case).

But I'm not sure; perhaps it would be better to list the executables as an optional argument to fetchipfs instead of storing that directly in ipfs. This would be much nicer for updates: someone just has to add the directory of the updated source, change the ipfs hash, and change the sha256. As it stands, you'd have to write a file listing the executables, even if these didn't change, and add it together with the source, like for example:
ipfs add -r -w executables hello-2.10

Any thoughts?

@Ericson2314
Member
Ericson2314 commented Jul 31, 2016 edited

This is a longer-term concern, but I just opened an issue for Nix to use the git tree object format as an alternative to NAR: #1006. Similarly, I'd like to hack multihash to support that format too (i.e. a "hashing algorithm" which only supports the subset of IPFS dags isomorphic to git trees wrt hashing and DAG shape).

@copumpkin
Member

Also note that it looks like @wkennington already added support for ipfs to fetchurl in triton.

@knupfer
knupfer commented Aug 4, 2016

Oh, I didn't know about triton (what's the point of it?). But there are some issues, if I'm not wrong. He uses tryDownload, so he's forced to serve tarballs via ipfs. My approach downloads directories from ipfs but asks the ipfs API to wrap them up as a tarball. This way it can just use curl but deduplicates on a file basis. Afterwards, you have to unpack the tarball and make a recursive hash, because the tarballs from the ipfs API aren't stable.
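
Roughly, that boils down to hitting the local daemon's HTTP API (the tar/cat endpoint mirrors the ipfs tar cat command; the hash is a placeholder):

curl "http://localhost:5001/api/v0/tar/cat?arg=/ipfs/<dir-hash>" -o src.tar
tar -xf src.tar   # unpack, then take the recursive hash of the tree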

At the moment I'm on vacation; in about a week I'll dump a branch here.

@copumpkin
Member

@knupfer my understanding is that triton is an attempt to "clean slate" nixpkgs/NixOS, and shed a lot of historical stuff we've accumulated over time. Also, it assumes linux and gets rid of the various hackery we need to support darwin and other platforms.

@wkennington can probably comment more sensibly on the actual approach to the IPFS stuff. I just wanted to make sure people were aware of related work 😄

@jbenet
jbenet commented Aug 4, 2016

will respond more thoughtfully soon.

cc @nicola @davidar @whyrusleeping @diasdavid

The work we're doing with IPLD will simplify a lot of this. More soon; maybe others can fill in too.

@knupfer
knupfer commented Aug 23, 2016

So, here is my first draft. It tries the ipfs path first and falls back to the given URL.

And I've changed the hello program to use fetchipfs.
Ipfs doesn't store exec flags, so I've added an optional list of execs to fetchipfs, which will be applied (this seems cleaner to me than a separate file describing the execs, as discussed in this thread).

The ipfs path expects a directory with the source code, not a tarball. This allows deduplication on a file basis and delta updates of big code bases (like LaTeX, for example).

Downsides:

  1. It doesn't depend on ipfs and therefore can't add files to ipfs
  2. Ipfs (the newer versions) is hard to install on NixOS, and the older version is nonfunctional
  3. It needs an ipfs path and a sha256, because it doesn't depend on ipfs
  4. When ipfs is running, it increases storage requirements

Comments:

  1. It would make a lot of things easier to bite the bullet and depend on ipfs (the executable is about 30 MB). Alternatively, we could offer an option in an ipfs module to add new store paths to ipfs (with opt-out).
  2. We need some script to convert a gx package to a normal go package.
  3. This is alleviated a bit by btrfs. The dangerous alternative would be to mount ipfs and replace nix store paths with symlinks to ipfs paths. This would also allow some curious features like lazy source code, where only the files used for compilation are downloaded. The itch with that idea is that we would have to symlink every file rather than the directory, to be able to download files with exec flags and change them to have those exec flags (ipfs doesn't store exec flags, and symlinks can't change the flag).

https://github.com/knupfer/nixpkgs/tree/fetchIPFS

Any thoughts on how to proceed?

@CMCDragonkai

This is assuming the input source is already in IPFS (such as the example with the hello program), right? @knupfer we (me and @plintX) are working on a similar solution, but approaching it from a different angle. Basically, to make the integration a bit more seamless, we will have a package archiver that supplies an ipfs store as a content-addressed "input" cache. When a build expression evaluates a fetchurl, it will first contact the input cache to check whether the input is already available there; if not, the input is fetched and archived. At some later point this can be integrated into hydra and combined with the "output" cache. Our project is here: https://github.com/MatrixAI/Forge-Package-Archiving

There are extra issues relating to deduplicating compressed inputs...

How can we collaborate?

@whyrusleeping
whyrusleeping commented Aug 23, 2016 edited

@knupfer awesome work so far, this is really exciting :)

One thing you might really like is the ipfs tar subcommand set. It's not advertised well because we aren't sure what will become of it once IPLD lands (I think it belongs inside unixfs). But it allows you to import and export tar files directly into ipfs. It will expand the tar's structure inside ipfs so files get deduped nicely (especially important when dealing with multiple tar files), and it retains all executable flags, symlinks and other fun filesystem stuff.
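
For example, round-tripping a tarball (the hash is whatever ipfs tar add prints):

ipfs tar add hello-2.10.tar              # expands the tar into ipfs, prints a root hash
ipfs tar cat <root-hash> > restored.tar  # reconstructs the tarball, flags intact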

If you want to avoid running a full ipfs node, you could take a look at ipget, but I'm not sure if it's functional (we haven't looked at it in quite a while).

  1. It would make a lot of things easier to bite the bullet and depend on ipfs (the exec is about 30mb).

With go1.7 and some other build flags, you can get this down into the 12-15 MB range, or even lower depending on what guides you're following.

  2. We need some script to convert a gx package to a normal go package

Could you elaborate here on why this is needed? and what it would entail?

  3. ...

If the exec-flags thing is a feature that would really help you, we can add it fairly easily, and maybe only expose it at first behind a feature flag. We haven't put it in yet because it presents interesting security concerns that need to be thoroughly thought through.

@Ericson2314
Member
Ericson2314 commented Aug 23, 2016 edited

N.B. ipfs/specs#130 contains some plans for IPFS to support foreign data, including git tree objects, which support exactly the FS metadata we care about. I previously opened an issue, #1006, for teaching Nix to use git trees in addition to NAR. If that IPFS spec is accepted, then we and IPFS can implement git trees in parallel.

Also, I wonder if @shlevy's new DaemonStore abstraction is a first step towards eventually teaching Nix to use an arbitrary content-addressable storage+networking layer.

All this is much more work and shouldn't impede progress on a simpler fetchIPFS, however.

@nicola referenced this issue in ipld/specs Aug 23, 2016: Captain.log - IPLD v1 spec #13
@jbenet
jbenet commented Aug 23, 2016 edited

ipfs doesn't save the executable flag

It should, yes. It doesn't yet; we've run into this problem. unixfs should store some unix flags, as much as git does.

@knupfer
knupfer commented Aug 28, 2016

@CMCDragonkai How do you convert the hashes of fetchurl to ipfs paths? Isn't that impossible? I've read your repo, but I don't understand whether you're trying to replace nix or to improve it.
@whyrusleeping Thanks for the hint about ipfs tar add. I'd seen it but misused it (I tried to ipfs get the resulting path instead of ipfs tar cat); this actually solves my current issues with exec flags.
For me, it would be best to just depend on ipfs, but I can't expect an entire operating system to depend on it. So by not depending on it, and just checking whether there is something that looks like ipfs under the appropriate URL, I can expect to integrate this at least into some packages. For example into grsecurity, which has very wonky tarballs.
I've looked into ipget, but I dislike the fact that the user reaps the benefit without helping the community.

The problem with gx (don't misunderstand me, I think gx is a great project) is that it is a quite new package manager which hasn't got any infrastructure on the ipfs side. A fast hack would be to write something like fetchgx. Writing a normal package just for ipfs isn't possible, because gx needs network access, which doesn't exist in the install phase, to ensure reproducibility.

@cleverca22

my original idea to solve this was to use fuse to mount .nar files onto /nix/store, and it sounds like IPLD could maybe be used to add a nar in a de-dupable fashion?

but that still leaves the issue of how you access a nar from the initrd to mount the rootfs; my original plan was just a directory of bare nars for the entire store, and then ipfs was free to read/share them

@CMCDragonkai

@knupfer The package archiver acts like the binary cache. For a nix client, given a URL and other metadata such as the hashes, it should first check the package archiver to see if it has it. If the package archiver doesn't have the input package, it will download the package, multihash it, store it in ipfs, add an entry into a multikeyed data structure which acts like a multi-index into ipfs, and then serve it to the nix client. This should work for any client wishing to content-address its inputs, not just ipfs-enabled clients. The multikeyed index is what I think should be able to map the arbitrary hashes that people have specified to the ipfs hash. Later this can also be integrated into hydra, so build expressions in nixpkgs can be archived automatically.

@knupfer
knupfer commented Aug 29, 2016

@CMCDragonkai It seems to me that your project takes a lot more work, so in the meantime a fetchipfs makes sense (and helps future ipfs paths).

So, now it's using ipfs tar. It no longer requires specifying execs, and when the ipfs path doesn't produce anything, it now downloads from the given URL and afterwards uploads to ipfs (when the daemon is running). While uploading, it verifies the ipfs hash.

Any review would be appreciated!

https://github.com/knupfer/nixpkgs/tree/fetchIPFS

@CMCDragonkai

Currently the streaming (constant-memory) downloading works, and parallel multihashing is also working. We are currently working on concurrent threads for each download (supporting just the HTTP protocol for now), perhaps a parallel conduit for each step in the pipeline, and benchmarking for the best performance parameters. Afterwards we just have to construct the multikeyed index and see how well that integrates with lookups on ipfs.

The fetchipfs would only work for new build expressions, right? Also, who and where will be hosting the ipfs nodes?

@jbenet
jbenet commented Aug 29, 2016

Also who and where will be hosting the ipfs nodes?

We're happy to help run some nodes for you and pin graphs. Ideally we wouldn't run the only nodes, but we can help support bandwidth.

@knupfer
knupfer commented Aug 29, 2016

Yes, it would work only for new build expressions. Every person who has a daemon running will be hosting exactly the build expressions which that person installed. I think that's sensible, because there are a lot of people who would use it. And it's a strictly better solution than fetchurl, because it has fetchurl as a fallback.

@knupfer
knupfer commented Aug 29, 2016

Btw, I'm really interested in your project; perhaps I'll join forces. Haskell is my favorite language.

@CMCDragonkai
CMCDragonkai commented Aug 29, 2016 edited

So the package archiver could potentially serve ipfsed content to clients that don't have IPFS enabled. I was thinking there would have to be a node of last resort (sort of like a "reserve node") just for the NixOS community, which only archives nixpkgs stuff. Other people hosting their own packages will need to run their own node. I'm not sure, but if there were a way to merge different IPFS graphs together, that would make cross-graph querying easy.

But every nix client that wants to could join in a torrent-like network and help distribute bandwidth.

I think fetchipfs can also support this package archiver once it's ready.

@CMCDragonkai

@knupfer No problem, our repo is a bit messy atm; we should clean it up so it's clearer what's happening and what the progress is. @plintX

@jbenet That will be cool. Does anyone have an estimate for what the total input package size of all nixpkgs would be?

@knupfer
knupfer commented Aug 29, 2016

Well, the question is how much redundancy is needed. The somewhat-guaranteed node of last resort would be the plain URL, for example https://ftp.gnu.org/gnu/hello/hello-2.10.tar.gz. But considering that there are generous people with a lot of disk space and that there are more than 1000 nix users, I think there won't be any issue.

If we're talking only about sources, I'd guess a TB.

@arianvp
arianvp commented Dec 21, 2016

Are people still working on this? It sounds interesting

@vcunat
Member
vcunat commented Dec 21, 2016

I'm not aware. I've been a bit overloaded lately and thus neglecting "larger" issues.

@mguentner

Please have a look at #1167 and let me know what you think.
It adds IPFS support to the binary cache generation. If a binary cache is being generated (nix copy), each .nar will be added to IPFS and the resulting hash written into the corresponding .narinfo.
When the .narinfo is retrieved, a signed IPFSHash will be found, and instead of downloading the .nar from the same cache, IPFS can be used.
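
Sketched end to end (values are placeholders; IPFSHash is the field name from #1167):

curl https://cache.example.org/<store-hash>.narinfo
# ...
# NarHash: sha256:<...>
# IPFSHash: <ipfs hash of the .nar>
# Sig: <...>
ipfs get <ipfs-hash-of-the-nar> -o x.nar  # fetch the .nar via IPFS instead of HTTP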

@vcunat
Member
vcunat commented Jan 1, 2017 edited

@mguentner: I wondered why you decided to add *.nar files into IPFS. I would find it much more practical to add the /nix/store/xxx subtree as it is, because that would be (almost) directly usable when mounted at /ipfs/. (The only remaining step is to add a symlink /nix/store/xxx -> /ipfs/yyy.)

@mguentner

@vcunat: Currently the unixfs implementation of IPFS lacks an execute bit, which is quite useful for the store; that is why I opted for .nar distribution until the new unixfs implementation (using IPLD) is done.
Then, IPFS contents can be symlinked/bind-mounted to the store like you describe. However, that requires ipfs running on the system, while the .nar method also works using a gateway.
While the concept of an almost fully decentralized distribution is awesome, it requires that each instance of Nix(OS) also run an IPFS daemon, which not only impacts the memory footprint but is also a security concern, among other things.
Don't get me wrong, I really like the idea of using IPFS at the FS level, but for some use cases this might not be the ideal choice.

Basically there are two scenarios:
Scenario A:

[Machine 1] ----|
[Machine 2] ----|---------HTTP--------[ IPFS Gateway ] -------- IPFS
[Machine 3] ----|

Scenario B:

[Machine 1] ----|
[Machine 2] ----|---------IPFS
[Machine 3] ----|

In A, a local IPFS gateway fetches/distributes content, and local Nix(OS) machines fetch their content via this gateway using HTTP. This gateway is not necessarily a dedicated machine but can also be some form of container (e.g. nixos-container).
You just need to manage IPFS on the gateway: setting storage and networking quotas and limiting the resources IPFS uses (memory, CPU, IO). The distribution method should be uncompressed .nar files.

In B you need to manage IPFS on all machines, with the upside that IPFS can be used at the FS level, i.e. mounting /ipfs to /nix/store.

A is better suited for laptops and servers since your machine will not start distributing files when you don't want it to.
B is nice for desktops and/or machines where IO and bandwidth can be donated.

We should focus first on distributing uncompressed .nar files using IPFS, and later on directly symlinking/mounting IPFS contents to /nix/store.
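
In scenario A the client side needs nothing but HTTP; a sketch (gateway host and hash are placeholders):

curl "http://gateway.example:8080/ipfs/<nar-hash>" -o hello.nar
nix-store --restore ./hello-out < hello.nar  # unpack the NAR (illustration only)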

@vcunat
Member
vcunat commented Jan 2, 2017
  • I didn't realize/remember the +x problem. Thanks!
  • I meant that "moving to IPFS" would be per-path. In particular, I preferred to avoid having system-critical stuff on such an experimental FS.

Gateways

I really like the idea of gateways, and *.nar is a very good fit there. For now it truly seems better if most NixOS instances don't serve their /nix/store directly and instead upload custom-built stuff to some gateway(s). People could contribute by:

  • providing such gateways (perhaps each with some policy about what paths are accepted and from whom);
  • uploading new builds and/or verifying existing ones (signed by their key; perhaps even some Hydra-like SW for this could be created).

Together this ecosystem might (soon) offer some properties that we don't have with our current solution (centralized farm + standard CDN).

@mguentner

@vcunat Have a look:
https://github.com/mguentner/nix-ipfs/blob/master/ipfs-gateway.nix
This gateway currently accepts all requests to /ipfs, while this is the original config, which is whitelist-only:
https://github.com/mguentner/nix-ipfs-gateway/blob/master/containers.nix
The config still lacks the means to compile the whitelist in a sane way (i.e. checking for duplicates, including older hashes that are not in the latest binary cache, etc.).
This script could be extended for that:
https://github.com/mguentner/nix-ipfs/blob/master/ipfs-mirror-push.py

@mguentner

@vcunat And I really like your idea of distributing/decentralizing the actual build process. The most critical part here is the web of trust, which is currently missing in Nix(OS). Other package managers have integrated gpg, and each package is signed by the respective maintainer (Arch comes to mind).

All this could possibly be achieved using the IPFS ecosystem. Have you looked at IPLD yet?

@vcunat
Member
vcunat commented Jan 2, 2017 edited

Currently: our build farm signs the results and publishes that within the *.narinfo files; nix.conf then contains a list of trusted keys in binary-cache-public-keys.

With IPFS: I don't remember the details from studying IPFS anymore :-), but I remember IPNS seemed the very best fit for publishing the mapping: signing key + derivation hash -> output hashes (+ signature).

@cleverca22

my old idea for IPFS+Nix was to store whole nar files in IPFS, and to use https://github.com/taktoa/narfuse to turn a directory of nar files into a /nix/store

then the IPFS daemon can be started/stopped, and serve the raw nar files as-is

but you would need a mapping from store path (hash of build scripts) to IPFS path (multi-hash of nar)

the main downside to this plan was that it had to store the entire NAR uncompressed in the IPFS system, and on the end-users' systems, though normal users pay the same cost once it's unpacked to /nix/store

@CMCDragonkai

The mapping problem is also an issue for forge package archiving. In this case we would like to map arbitrary upstream source hashes to the ipfs path. We're hoping to do this without the need for a separate moving part, e.g. if there were a way to embed extra hashes into an ipfs object. But are there other ways?

@Ericson2314
Member
Ericson2314 commented Jan 3, 2017 edited

@CMCDragonkai https://github.com/ipld/cid would be exactly what you want, I think, but that spec sadly seems to be stalled. The basic idea is to allow IPFS links to point to more things than IPFS objects, as long as the "pointing" is via content-addressing.

@cleverca22

the original idea i had to solve the mapping problem was for hydra to multi-hash every nar and include that in the .narinfo file, but to leave the "ipfs add" as an optional step anybody can do to contribute bandwidth

the main downside is that you still need cache.nixos.org for the narinfo files; it just stops being a bandwidth issue

@CMCDragonkai

The haskell code we've got currently stream-multihashes http resources, so that could be integrated into hydra. The cid project looks interesting; we will check it out in depth soon.

@mguentner

@cleverca22 Nice idea with narfuse! That solves the problem of duplicate storage.
If you leave the ipfs add step optional, you still need some authority that does the mapping between .nar hashes and IPFS hashes. A user that does ipfs add still needs to inform other users that the .nar is now available under that IPFS hash.

Just an idea of how it could work (the code is already finished for that, see #1167):
(That is Scenario A in #859 (comment))

A Hydra will build a jobset (e.g. nixpkgs), create a binary cache afterwards, and publish the resulting IPFS hashes to a set of initial IPFS nodes (initial seeders in bittorrent language). These seeders will download everything from the Hydra, and once this is finished, the Hydra can (in theory) stop distributing that jobset, since from that moment the initial seeders and everyone else running IPFS on their Nix(OS) machine will start distributing.
Have a look at this script, which is a basic implementation of what I describe.
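
In essence, an initial seeder only has to pin whatever the Hydra publishes, something like this (the list format is an assumption, not that script's actual interface):

while read hash; do
  ipfs pin add "$hash"  # fetches the object and keeps it after Hydra stops seeding
done < published-hashes.txt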

How to distribute the .narinfo files is open for debate. Either use the traditional HTTP method (a .narinfo hardly generates any traffic) or also put the information inside some IPFS/IPLD structure.

The upside of distributing using HTTP is that there is a single authority that does the mapping between .nar files and IPFS hashes, and no IPFS daemon needs to be installed on the "client" side, since .nar files can also be fetched using a gateway (e.g. one of the initial seeders, some local machine, or the one running @ https://ipfs.io).

I am confident that IPFS could revolutionize the way we distribute things, but I don't consider it mature enough to be running on every machine out there. We need to find pragmatic solutions and come up with some sort of road map for Nix and IPFS.
Starting to distribute .nar files using IPFS could be the first step, mapping .nar files from a mounted IPFS to /nix/store the second, making all sources (fetchgit, fetchFromGithub) available through IPFS the third (what @knupfer started), and the utopia of building everything from IPFS to IPFS the last one. 🚀 ➡️ 🌑

@nbp
Member
nbp commented Jan 19, 2017

Apart from the fact that /nix/store can contain files which are not safe for sharing, there are other issues:
I want to raise security concerns about any P2P & Nix integration.

The biggest issue here is how to guarantee the anonymity of both peers. To highlight the issues, let's suppose we have 2 peers, Alice (A) and Bob (B) as usual, and that A requests a package P from B.

  • B sees which version of P is requested and knows the IP of A, and thus can deduce that A does not yet have P.
    If A does not have P, this means that either A is installing it for the first time or upgrading it.
    In either case, B can try to attack A with the issues fixed in the latest version of P.

  • A sees whether P is available and knows the IP of B. If newer versions of P are not available on B, then either P is no longer used in B's configuration or P has not yet been upgraded.
    In that case, A can try to attack B with the issues fixed in the latest version of P, which is not available on B.

In both cases we might think that both issues can be avoided by faking whether we have a package P, by forwarding the content of someone else. But this suffers from timing attacks and might increase the DoS surface.

What these examples highlight is that we need to either trust the peers, or provide anonymity between the peers, such that neither A nor B knows the IP of the other.

@mguentner

@nbp Thanks for mentioning this!

That is very true, and it will be something that needs to be addressed once it makes sense to run an IPFS daemon on the same system that requests the /nix/store paths using IPFS (as nar files or by directly pinning them).

For now, IPFS itself is the biggest security concern on a system, and after that, the information about the system it potentially leaks.

However, currently every NixOS user who uses https://cache.nixos.org leaks information about the installed versions to a central entity (Amazon CloudFront) and to all systems in between (via file sizes).

It depends on your scenario, but running a local IPFS gateway might even improve security by reducing the ability to fingerprint your system, since many Nix installations potentially share this gateway. But that's just guesswork, plus the security is partly based on obscurity :)

@cleverca22

@nbp
another factor to consider is that IPFS will advertise the multi-hash of every object you can serve

even if you never advertise locally built things with secrets like users-groups.json, you are still going to advertise that you have a copy of hello-2.10 built with a given nixpkgs, and an attacker could make use of that

@knupfer
knupfer commented Jan 21, 2017

You could serve only store paths which could be garbage collected. Then you'll only leak information when you download from ipfs, but not by serving.

@cleverca22

but then you will never contribute bandwidth towards current build products, only out-of-date things

@knupfer
knupfer commented Jan 21, 2017

Or build products which you've uninstalled, or brought into your system only via nix-shell

@cleverca22

yeah, that would limit its usefulness while giving security; feels more like something the end-user should decide on via a config option

@knupfer
knupfer commented Jan 21, 2017

Agreed. Don't forget that newer versions of sources normally have a lot of untouched files, so it would even help with old garbage (this is obviously less often the case with binaries).

@cleverca22

the main issue i can spot with adding raw uncompressed NARs to the IPFS network is the lack of compression and the lack of file-level dedup within the NAR, but the IPLD stuff i've heard about could add the NAR in file-sized chunks, inspecting the contents of the NAR as it goes, at the cost of having a different hash from a plain "ipfs add"

@vcunat
Member
vcunat commented Jan 21, 2017

I think you do get file-level dedup within and across NARs, as IPFS is supposed to do chunking based on content IIRC.
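
The chunker is selectable at add time; a content-defined chunker is what would give dedup across similar NARs, assuming a daemon whose add command supports the rabin chunker option:

ipfs add --chunker=rabin hello.nar  # rabin fingerprinting instead of fixed-size chunks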

@Ericson2314
Member

@copumpkin's comment #520 (comment) sketching a possible implementation of non-deterministic dependencies shares a lot of characteristics with IPNS.
