RFC: Add IPFS to Nix #1167

Closed · wants to merge 4 commits

@mguentner
mguentner commented Dec 29, 2016 edited

This adds IPFS support to Nix. 🚀
It adds .nar files to IPFS as well, writes the resulting hash to the .narinfo, and signs the result.
When the .narinfo is accessed, the .nar can be fetched from IPFS instead of over HTTP.

Please have a look at a NixOps Hydra setup where this is explained in more detail:
https://github.com/mguentner/nix-ipfs

This is a proof of concept. More code will follow once the design is approved and finished.

Ref: #859
Ref: ipfs/notes#51

mguentner added some commits Dec 3, 2016
@mguentner mguentner libstore: add sources depending on config
Signed-off-by: Maximilian Güntner <code@klandest.in>
9b06253
@mguentner mguentner libstore: add basic IPFS support
Adds support for 'cat' and 'add' of .nar files to/from IPFS.
If IPFS should be used to fetch .nar files without
using the API interface, a gateway can be used
as well. Adding files through a gateway is not possible.

Signed-off-by: Maximilian Güntner <code@klandest.in>
d2dbe8f
@mguentner mguentner referenced this pull request Jan 1, 2017
Open

Nix and IPFS #859

@wscott
wscott commented Jan 1, 2017

Is the .nar file compressed? If not, you might want to enable the Rabin-fingerprint chunker when writing the .nar files. This allows deduplication of identical files inside multiple archives. I don't think IPFS uses Rabin by default yet.

From the command line, that is done with 'ipfs add --chunker rabin FILE'.
You might also want to tweak the parameters to use a larger block size.

Even compressed archives can work if you use an 'rsync-able' compression algorithm.
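To illustrate why a content-defined chunker helps deduplication, here is a toy sketch (not IPFS's actual Rabin implementation; the hash and parameters are made up for illustration): boundaries are cut where a rolling hash value hits a mask, so identical byte runs in different archives tend to produce identical chunks that can be stored once.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Toy content-defined chunker. Real IPFS uses a Rabin fingerprint over a
// sliding window; this simplified version just accumulates a cheap hash and
// cuts a chunk whenever its low bits are zero (expected chunk size ~ mask+1
// bytes, with a minimum chunk size to avoid degenerate tiny chunks).
static std::vector<std::string> chunk(const std::string &data,
                                      uint32_t mask = 0x3F,
                                      size_t minSize = 16) {
    std::vector<std::string> chunks;
    std::string cur;
    uint32_t h = 0;
    for (unsigned char c : data) {
        cur.push_back(static_cast<char>(c));
        h = h * 31 + c; // cheap stand-in for a rolling fingerprint
        if ((h & mask) == 0 && cur.size() >= minSize) {
            chunks.push_back(cur); // boundary depends on content, not offset
            cur.clear();
            h = 0;
        }
    }
    if (!cur.empty()) chunks.push_back(cur);
    return chunks;
}
```

Because boundaries depend only on the bytes since the last cut, a shared region embedded in two different archives re-synchronizes to the same chunk boundaries after the first cut inside it, which is what makes block-level dedup across .nar files possible.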

@mguentner

@wscott : The .nar files are compressed according to the compression option, so for best results this should be set to compression=none when creating the binary cache / .nar files.
The --chunker option looks interesting, I will look into it.

@@ -4,14 +4,48 @@ libstore_NAME = libnixstore
libstore_DIR := $(d)
-libstore_SOURCES := $(wildcard $(d)/*.cc)
+libstore_SOURCES := \
@domenkozar
domenkozar Jan 2, 2017 Member

Why is wildcard not used?

@mguentner
mguentner Jan 2, 2017 edited

Since the additional IPFS sources are the second config-dependent input for libstore, I wanted a clean solution instead of following s3-binary-cache-store.cc (the first one) and adding a lot of #if ENABLE guards to the source files. That way, only the sources that the config requires are compiled and linked. It makes the build process a bit cleaner and easier to debug.

mguentner added some commits Jan 3, 2017
@mguentner mguentner libstore: make IPFS publishing optional
Signed-off-by: Maximilian Güntner <code@klandest.in>
5496413
@mguentner mguentner libstore: remove comment
Signed-off-by: Maximilian Güntner <code@klandest.in>
0880000
@mguentner

Publishing to IPFS is now optional (default: disabled). An example of how to generate an uncompressed binary cache looks like this:

nix copy --to file:///var/www/example.org/cache?secret-key=/etc/nix/hydra.example.org-1/secret\&compression=none\&publish-to-ipfs=1 -r /nix/store/wkhdf9jinag5750mqlax6z2zbwhqb76n-hello-2.10/
@mguentner

Added for future reference.

The .narinfo is fingerprinted and signed. The fingerprint currently includes
this information

std::string ValidPathInfo::fingerprint() const
{
    if (narSize == 0 || !narHash)
        throw Error(format("cannot calculate fingerprint of path ‘%s’ because its size/hash is not known")
            % path);
    return
        "1;" + path + ";"
        + printHashType(narHash.type) + ":" + printHash32(narHash) + ";"
        + std::to_string(narSize) + ";"
        + concatStringsSep(",", references);
}

[1]

So the IPFSHash is signed indirectly through the narHash, since the narHash is compared against the hash
of the result of the IPFS download (i.e. the downloaded data is not validated until it has been fetched
completely). However, the current design assumes that the .narinfo is fetched from a trusted source (i.e.
cache.nixos.org using TLS).

[1] From: https://github.com/NixOS/nix/blob/master/src/libstore/store-api.cc#L523
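The trust chain described above can be sketched as follows (a minimal illustration, not Nix's actual code: `toyHash` stands in for the real SHA-256 narHash, and `fetchAndVerify` is a hypothetical helper): data arriving from IPFS is untrusted until its hash matches the narHash covered by the signed .narinfo.

```cpp
#include <functional>
#include <stdexcept>
#include <string>

// Stand-in for the real hash (Nix uses e.g. sha256 from the signed narinfo).
static std::string toyHash(const std::string &data) {
    return std::to_string(std::hash<std::string>{}(data));
}

// Hypothetical verification step: the IPFS transport is untrusted, so the
// fully-downloaded NAR must match the narHash from the signed .narinfo
// before it may be unpacked into the store.
static std::string fetchAndVerify(const std::string &downloaded,
                                  const std::string &expectedNarHash) {
    if (toyHash(downloaded) != expectedNarHash)
        throw std::runtime_error("NAR hash mismatch: rejecting IPFS data");
    return downloaded; // safe to use
}
```

Note that this only validates after the complete download, exactly as the comment above says; streaming validation would require a different design.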

@shlevy
Member
shlevy commented Jan 10, 2017

Can you write up a quick summary of what IPFS is and why I should care?

@Mic92
Contributor
Mic92 commented Jan 18, 2017 edited

@shlevy I cannot speak for @mguentner, but @vcunat provided some motivation for IPFS

@veprbl
Contributor
veprbl commented Jan 18, 2017 edited

I haven't been carefully tracking IPFS threads, but since nobody else answered I will pile my thoughts here, and people will correct me.

As far as I understand, IPFS can provide a global storage for NARs, so people can choose to host their own builds or cache builds from others. This could potentially offload some bandwidth from Hydra's S3.

Some enthusiasts could build and share things that are not currently built by Hydra, like Python packages. This is already possible now, but it requires doing two things:

  1. Establishing a source (url of "nix-serve" instance)
  2. Establishing trust (NAR signing, SSL)

IPFS could eliminate the first step, since the namespace becomes global and NARs could probably be discovered through IPLD. The current implementation doesn't do that, because it requires the IPFS address to be served in the .narinfo. But distributed NAR hosting should work already.

There was also a discussion about implementing file- or chunk-level deduplication over IPFS, which could reduce sizes. Whether that is meant to help download size or on-disk size, I don't know.

Anything written above might be wrong; I don't claim extensive knowledge of the topic discussed. Please don't get angry :)

@mguentner
mguentner commented Jan 19, 2017 edited

I wrote an article that explains why IPFS could be useful for NixOS.
It gives some answers to @shlevy's question:
https://sourcediver.org/blog/2017/01/18/distributing-nixos-with-ipfs-part-1/

Also I think that you are quite right @veprbl 👍

@shlevy
Member
shlevy commented Jan 19, 2017

Cool, thanks! Awesome idea.

+ else if (cmd == "cat_gw")
+ return "/ipfs/" + arg;
+ else
+ throw "No such command";
@edolstra
edolstra Jan 19, 2017 Member

I don't think we catch strings anywhere, so this should be throw Error("...").

+ }
+ ipfsHash = IPFSAccessor::addFile(narPath, *narCompressed);
+
+ if (!ipfsHash.empty()) {
@edolstra
edolstra Jan 19, 2017 Member

Why would ipfsHash be empty here?

@rht
rht Jan 22, 2017

In any case, it shouldn't be -- an IpfsNode should emit the hash + name to the client when the process completes without an error.

@mguentner
mguentner Jan 23, 2017

The file is uploaded through the HTTP API and a lot can go wrong there.
As for the cpp part, this is the relevant code:
https://github.com/vasild/cpp-ipfs-api/blob/master/src/client.cc#L164

I have tested the code in this PR with more paths after posting this RFC, and quite a few requests failed silently, as the function is void and does not raise anything. As this is unacceptable, the next iteration of the implementation needs to include error handling when adding files, if this feature gets included at all.
Reason:
My research into IPFS revealed that one needs to pay attention to a lot of things (it's not FTP, after all). These include trivial things like selecting a chunker, and rather complex tasks like collecting garbage after n ipfs adds without throwing away unpinned content (race condition).
From a design perspective, adding NARs / Nix store paths to IPFS must be handled by a separate tool, as this is too much complexity to go into Nix (following Ken Thompson's philosophy here).
(Also part of the reason for #1167 (comment))

@@ -41,6 +41,8 @@ NarInfo::NarInfo(const Store & store, const std::string & s, const std::string &
compression = value;
else if (name == "FileHash")
fileHash = parseHashField(value);
+ else if (name == "IPFSHash")
@Ericson2314
Ericson2314 Jan 19, 2017 Member

Should we call this IpfsNarHash or something, for future compatibility with computing NARs on the fly from some nicer format? Or am I jumping the gun for a file called nar-info.cc after all :).

@@ -290,8 +310,23 @@ bool BinaryCacheStore::isValidPathUncached(const Path & storePath)
void BinaryCacheStore::narFromPath(const Path & storePath, Sink & sink)
{
auto info = queryPathInfo(storePath).cast<const NarInfo>();
+ std::shared_ptr<std::string> nar;
+
+#if ENABLE_IPFS
@nbp
nbp Jan 19, 2017 Member

As commented, downloading from IPFS can open the door to new attacks. I would recommend adding a download-from-ipfs flag as well, disabled by default.

@mguentner

Thank you for the reviews / comments. I am currently rewriting the implementation, as injecting IPFS directly into BinaryCacheStore no longer seems like the best way forward -
instead I am writing a dedicated IPFSCacheStore that resolves /ipfs/ paths.

Stay tuned.

@edolstra
Member

@mguentner How would that work, and how would it handle narinfo files?

The present approach seems reasonable to me. It just needs a flag to enable/disable IPFS (e.g. as part of the store URI, like https://cache.nixos.org/?ipfs=true).

Another possibility: rather than add an IPFSHash field to narinfo, we could turn the URL field into a list, allowing the binary cache to announce multiple ways to get the file:

URL: nar/1cvgji7mk3q68f257fmwlqvz8rhfdla6y0lxwqq8nwxagy3w34cx.nar.xz # i.e. HTTP, relative URI
URL: ipfs:/ipfs/QmPXME1oRtoT627YKaDPDQ3PwA8tdP9rWuAAweLzqSwAWT

This would have the advantage of not putting transport-specific info in the NarInfo data structure.
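The multi-URL proposal above could be sketched like this (a hypothetical parser for illustration, not Nix's actual NarInfo code): repeated URL: fields become an ordered list of candidate transports, and the client tries each scheme it supports.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch: collect every "URL: " field from a narinfo-style
// text into an ordered list, so a .narinfo can announce several ways
// (relative HTTP path, ipfs:/ipfs/..., etc.) to fetch the same NAR.
static std::vector<std::string> parseUrls(const std::string &narInfo) {
    std::vector<std::string> urls;
    std::istringstream in(narInfo);
    std::string line;
    const std::string key = "URL: ";
    while (std::getline(in, line)) {
        if (line.compare(0, key.size(), key) == 0)
            urls.push_back(line.substr(key.size()));
    }
    return urls;
}
```

Keeping the field name URL: and simply allowing repetition is what makes this backward compatible: old clients that read only the first occurrence still get a working HTTP path, while IPFS-aware clients can prefer the ipfs: entry.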

@Ericson2314
Member
Ericson2314 commented Jan 20, 2017 edited

@mguentner I like that proposed code structure. From just how you described it (I haven't read the interface you're implementing), I think having two separate implementations segues into a NAR-less world nicely.

@mguentner

The binary cache implementation is finished:
https://github.com/NixIPFS/nix/tree/ipfs_binary_cache

Have a look at the code - there are some small TODOs in there but

    binaryCaches = [
      "/ipns/nix.ipfs.sourcediver.org/binary_cache"
    ];

just works 🎉 and can be merged soonish.

If you want to test it (easy: nixops VirtualBox config), have a look at
https://github.com/NixIPFS/notes/tree/master/examples/binary_cache

Currently there is only release-16.09-small but release-16.09 has been tested as well and will follow once we have the infrastructure in place (have a look at https://github.com/NixIPFS/infrastructure/issues) 👍

@mguentner

@edolstra Addressing your question:
The .narinfo files are available just like using https://cache.nixos.org with their signature from cache.nixos.org-1.
Example:
https://ipfs.io/ipfs/QmR9NabMW7E3XLksJTdtpWYsro9EZpy1NxcbtEtw8Cr8Sq/binary_cache/00n5n3g1jlffq11d4mq7hy1d6yr3x91p.narinfo

I like your idea of adding multiple URL: fields, but .narinfo files are currently a map/dict and you are suggesting turning that into a multimap, which I think adds too much complexity.
I still think that adding &ipfs=true to any binary cache is a nice-to-have, but it currently only makes sense for the HTTP cache, right? So we would add code to BinaryCacheStore that is only usable by one implementation, namely HttpBinaryCacheStore. In my opinion, the right way forward is to use an IPFS binary cache until another form of binary distribution is found (see #1006 + #859), or we will create way too much technical debt.

@matthiasbeyer
Contributor
matthiasbeyer commented Jan 30, 2017 edited

This might be a stupid question, but I'll ask it anyways:

If I package a piece of software, I define the Nix expression for it. How do I add it to IPFS without changing the Nix expression? I mean... I write down the default.nix, build it... and now things get pushed to IPFS and I get a hash. Now I need to add this hash to the default.nix - so I need to change it...

or am I getting something wrong here?


Forget the above. I guess I get it...

@mguentner mguentner closed this Feb 4, 2017
@nmikhailov

@mguentner So what is the status of this? Why was it closed?

@mguentner

@nmikhailov

Nothing changed much since #1167 (comment)
I am trying to get everything organised over at https://github.com/NixIPFS - the core infrastructure for the initial distribution needs to be set up, and in parallel the binary cache needs to be merged into Nix.
If you have spare servers/storage for the initial distribution, have a look at NixIPFS/infrastructure#1

Again, you can try the current status using this nixops config:
https://github.com/NixIPFS/notes/tree/master/examples/binary_cache

(Once the machines are setup, do nixops ssh -d name bc_user and realise a path, e.g. nix-store --realise /nix/store/005zk8a10js00kbhgcbq48h4cv5im1qn-yelp-3.22.0)

@matthiasbeyer
Contributor

@mguentner Is there/will there be a way to partially mirror a channel? I have machines, but not enough storage for a complete channel...

@mguentner

You will be able to run a local gateway that serves content from the binary cache and then caches/redistributes it until it is garbage collected (LRU), depending on how much storage you allocate for this. If you want to warm your cache with a partial channel, you need to write a script / NixOS test that requests all the hashes you are interested in storing.
