RFC: Add IPFS to Nix #1167

Closed

wants to merge 4 commits
@mguentner

mguentner commented Dec 29, 2016

This adds IPFS support to Nix. 🚀
It also adds .nar files to IPFS, writes the resulting hash to the .narinfo, and signs the result.
When the .narinfo is accessed, the .nar can be fetched from IPFS instead of using the HTTP method.

Please have a look at a NixOps Hydra setup where this is explained in more detail:
https://github.com/mguentner/nix-ipfs

This is a proof of concept. More code will follow once the design is approved and finished.

Ref: #859
Ref: ipfs/notes#51

mguentner added some commits Dec 3, 2016

libstore: add sources depending on config
Signed-off-by: Maximilian Güntner <code@klandest.in>
libstore: add basic IPFS support
adds support for 'cat'-ing and 'add'-ing .nar files to IPFS.
If IPFS should be used to fetch .nar files without
using the API interface, a gateway can be used
as well. Adding files through a gateway is not possible.

Signed-off-by: Maximilian Güntner <code@klandest.in>

@mguentner mguentner referenced this pull request Jan 1, 2017

Open

Nix and IPFS #859

@wscott

wscott commented Jan 1, 2017

Is the .nar file compressed? If not, you might want to enable the rabin-fingerprint chunker when writing the .nar files. This will allow deduplication of identical files inside multiple archives. I don't think IPFS uses rabin by default yet.

From the command line that is done with 'ipfs add --chunker rabin FILE'.
You might also want to tweak the parameters to use a large block size.

Even compressed archives can work if you use an 'rsync-able' compression algorithm.

@mguentner

mguentner Jan 1, 2017

@wscott: The .nar files are compressed according to the compression option, so in order to get the best results this should be set to compression=none when creating the binary cache / .nar files.
The --chunker option looks interesting, I will look into it.

@@ -4,14 +4,48 @@ libstore_NAME = libnixstore
libstore_DIR := $(d)
libstore_SOURCES := $(wildcard $(d)/*.cc)
libstore_SOURCES := \

@domenkozar

domenkozar Jan 2, 2017

Member

Why is wildcard not used?

@mguentner

mguentner Jan 2, 2017

Since the additional IPFS sources are the second config-dependent input for libstore, I wanted a clean solution instead of following s3-binary-cache-store.cc (the first one) and adding a lot of #if ENABLE guards to the source files. That way only the sources that the configuration requires are compiled and linked. This makes the build process a bit cleaner and easier to debug.


mguentner added some commits Jan 3, 2017

libstore: make IPFS publishing optional
Signed-off-by: Maximilian Güntner <code@klandest.in>
libstore: remove comment
Signed-off-by: Maximilian Güntner <code@klandest.in>
@mguentner

mguentner Jan 3, 2017

Publishing to IPFS is now optional (default: off/disabled). An example of how to generate an uncompressed binary cache looks like this:

nix copy --to file:///var/www/example.org/cache?secret-key=/etc/nix/hydra.example.org-1/secret\&compression=none\&publish-to-ipfs=1 -r /nix/store/wkhdf9jinag5750mqlax6z2zbwhqb76n-hello-2.10/

@mguentner

mguentner Jan 8, 2017

Added for future reference.

The .narinfo is fingerprinted and signed. The fingerprint currently includes
this information

std::string ValidPathInfo::fingerprint() const
{
    if (narSize == 0 || !narHash)
        throw Error(format("cannot calculate fingerprint of path ‘%s’ because its size/hash is not known")
            % path);
    return
        "1;" + path + ";"
        + printHashType(narHash.type) + ":" + printHash32(narHash) + ";"
        + std::to_string(narSize) + ";"
        + concatStringsSep(",", references);
}

[1]

So the IPFSHash is signed indirectly through the narHash, since the narHash will be compared against the hash of the result of the IPFS download (i.e. the downloaded data is not validated until it has been fetched completely). However, the current design assumes that the .narinfo is fetched from a trusted source (i.e. cache.nixos.org using TLS).

[1] From: https://github.com/NixOS/nix/blob/master/src/libstore/store-api.cc#L523

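To illustrate the point: trust hinges on the signed narHash, so whichever transport delivers the NAR (HTTP or IPFS), the result can be re-hashed and compared against the signed value before it is used. A minimal sketch of that check, using the same helpers the fingerprint code above refers to (a sketch, not the exact code in this PR):

    // Sketch: validate a NAR fetched via IPFS against the signed narHash
    // from the .narinfo before handing it to the caller.
    void checkNar(const ValidPathInfo & info, const std::string & nar)
    {
        Hash got = hashString(info.narHash.type, nar);
        if (got != info.narHash)
            throw Error(format("hash mismatch for path ‘%s’: expected %s, got %s")
                % info.path % printHash32(info.narHash) % printHash32(got));
    }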

@copumpkin

Member

copumpkin commented Jan 9, 2017

@shlevy

Member

shlevy commented Jan 10, 2017

Can you write up a quick summary of what IPFS is and why I should care?

@Mic92

Contributor

Mic92 commented Jan 18, 2017

@shlevy I cannot speak for @mguentner, but @vcunat provided some motivation for IPFS

@veprbl

Contributor

veprbl commented Jan 18, 2017

I haven't been carefully tracking the IPFS threads, but since nobody else answered I will pile my thoughts here, and people will correct me.

As far as I understand, IPFS can provide a global storage for NARs, so people can choose to host their own builds or cache builds from others. This could potentially offload some bandwidth from the Hydra S3.

Some enthusiasts could build and share things that are not currently being built by Hydra, like Python packages. This is already possible now, but it requires doing two things:

  1. Establishing a source (URL of a "nix-serve" instance)
  2. Establishing trust (NAR signing, SSL)

IPFS could eliminate the first step, since the namespace becomes global and NARs could probably be discovered through IPLD. The current implementation doesn't do that because it requires the IPFS address to be served in the narinfo. But distributed NAR hosting should work already.

There was also a discussion about implementing file- or chunk-level deduplication over IPFS, which could reduce sizes. Whether that applies to the download size or the size on disk, I don't know.

Anything written above might be wrong. I don't claim to possess extensive knowledge on the topic. Please don't get angry :)

@mguentner

mguentner commented Jan 19, 2017

I wrote an article which explains why IPFS could be useful for NixOS.
That gives some answers to @shlevy's question:
https://sourcediver.org/blog/2017/01/18/distributing-nixos-with-ipfs-part-1/

Also, I think that you are quite right, @veprbl 👍

@shlevy

Member

shlevy commented Jan 19, 2017

Cool, thanks! Awesome idea.

else if (cmd == "cat_gw")
return "/ipfs/" + arg;
else
throw "No such command";

@edolstra

edolstra Jan 19, 2017

Member

I don't think we catch strings anywhere, so this should be throw Error("...").

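For what it's worth, the fingerprint() excerpt quoted above already shows the Error/format style used elsewhere in libstore, so the fix could look roughly like this sketch (the message text is an assumption based on the diff context, not code from this PR):

    // Sketch: throw a Nix Error (format(...) % ... style, as in fingerprint() above)
    // instead of a raw string, so callers can actually catch it.
    else if (cmd == "cat_gw")
        return "/ipfs/" + arg;
    else
        throw Error(format("unknown IPFS command ‘%s’") % cmd);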

}
ipfsHash = IPFSAccessor::addFile(narPath, *narCompressed);
if (!ipfsHash.empty()) {

@edolstra

edolstra Jan 19, 2017

Member

Why would ipfsHash be empty here?

@rht

rht Jan 22, 2017

In any case, it shouldn't -- an IpfsNode should emit the hash + name to the client when the process completes without an err.

@mguentner

mguentner Jan 23, 2017

The file is uploaded through the HTTP API and a lot can go wrong there.
As for the cpp part, this is the relevant code:
https://github.com/vasild/cpp-ipfs-api/blob/master/src/client.cc#L164

After posting this RFC I tested the code in this PR with more paths, and quite a few requests failed silently, as the function is void and does not raise anything. As this is unacceptable, the next iteration of the implementation needs to include error handling when adding files, if this feature is included at all.
Reason:
My research into IPFS revealed that one needs to pay attention to a lot of things (it's not FTP after all). These range from trivial things like selecting a chunker to rather complex tasks like collecting garbage after n ipfs adds without throwing away unpinned content (race condition).
From a design perspective, adding NARs / Nix store paths to IPFS should be handled by a separate tool, as this is too much complexity to go into Nix (following Ken Thompson's philosophy here).
(Also part of the reason for #1167 (comment))

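The shape of that error handling could be roughly the following, wrapping the IPFSAccessor::addFile call shown in this diff (the exception type surfaced by the HTTP client is an assumption; this is a sketch, not the planned implementation):

    // Sketch: fail loudly instead of silently when publishing a NAR to IPFS.
    std::string ipfsHash;
    try {
        ipfsHash = IPFSAccessor::addFile(narPath, *narCompressed);
    } catch (std::exception & e) {
        throw Error(format("adding ‘%s’ to IPFS failed: %s") % narPath % e.what());
    }
    if (ipfsHash.empty())
        throw Error(format("IPFS add for ‘%s’ returned no hash") % narPath);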

@@ -41,6 +41,8 @@ NarInfo::NarInfo(const Store & store, const std::string & s, const std::string &
compression = value;
else if (name == "FileHash")
fileHash = parseHashField(value);
else if (name == "IPFSHash")

@Ericson2314

Ericson2314 Jan 19, 2017

Member

Should we call this IpfsNarHash or something, for future compatibility with computing NARs on the fly from some nicer format? Or am I jumping the gun for a file called nar-info.cc after all :).


@@ -290,8 +310,23 @@ bool BinaryCacheStore::isValidPathUncached(const Path & storePath)
void BinaryCacheStore::narFromPath(const Path & storePath, Sink & sink)
{
auto info = queryPathInfo(storePath).cast<const NarInfo>();
std::shared_ptr<std::string> nar;
#if ENABLE_IPFS

@nbp

nbp Jan 19, 2017

Member

As commented, downloading from IPFS can open the door to new attacks. I would recommend also adding a download-from-ipfs flag, which is disabled by default.


@mguentner

mguentner Jan 20, 2017

Thank you for the reviews / comments. I am currently rewriting the implementation, as injecting IPFS directly into BinaryCacheStore no longer seems like the best way forward -
instead I am writing a dedicated IPFSCacheStore that resolves /ipfs/

Stay tuned.

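To make that idea a bit more concrete: such a store would map the usual binary-cache file lookups (.narinfo files, nar/*.nar) onto IPFS API or gateway requests under a given /ipfs/ or /ipns/ root. The skeleton below is only an illustration of that shape, under the assumption that BinaryCacheStore exposes fileExists/getFile/upsertFile hooks; the class and helper names are made up here and are not the code from the linked branch:

    // Hypothetical skeleton of an IPFS-backed, read-only binary cache store.
    class IPFSCacheStore : public BinaryCacheStore
    {
        std::string cacheUri; // e.g. "/ipns/example.org/binary_cache" (assumed)

    public:
        IPFSCacheStore(const Params & params, const std::string & uri)
            : BinaryCacheStore(params), cacheUri(uri) { }

        bool fileExists(const std::string & path) override;

        std::shared_ptr<std::string> getFile(const std::string & path) override
        {
            // Resolve e.g. "<cacheUri>/<hash>.narinfo" or "<cacheUri>/nar/<hash>.nar"
            // through the local IPFS daemon or a gateway (hypothetical helper).
            return ipfsFetch(cacheUri + "/" + path);
        }

        void upsertFile(const std::string & path, const std::string & data) override
        {
            throw Error("an IPFS binary cache is read-only");
        }
    };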

@edolstra

Member

edolstra commented Jan 20, 2017

@mguentner How would that work, and how would it handle narinfo files?

The present approach seems reasonable to me. It just needs a flag to enable/disable IPFS (e.g. as part of the store URI, like https://cache.nixos.org/?ipfs=true).

Another possibility: rather than add an IPFSHash field to narinfo, we could turn the URL field into a list, allowing the binary cache to announce multiple ways to get the file:

URL: nar/1cvgji7mk3q68f257fmwlqvz8rhfdla6y0lxwqq8nwxagy3w34cx.nar.xz # i.e. HTTP, relative URI
URL: ipfs:/ipfs/QmPXME1oRtoT627YKaDPDQ3PwA8tdP9rWuAAweLzqSwAWT

This would have the advantage of not putting transport-specific info in the NarInfo data structure.
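
For illustration, the NarInfo parser touched earlier in this diff handles one field per line, so accepting repeated URL lines would mainly mean turning the single url field into a list; a rough sketch (the urls member is hypothetical, not part of this PR):

    // Sketch: collect every URL line instead of a single url field.
    // std::vector<std::string> urls;  // hypothetical member on NarInfo
    else if (name == "URL")
        urls.push_back(value);         // e.g. a relative nar/*.nar.xz path or an ipfs:/ipfs/ URI
    else if (name == "FileHash")
        fileHash = parseHashField(value);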

@Ericson2314

Member

Ericson2314 commented Jan 20, 2017

@mguentner I like that proposed code structure. From just how you described it (I haven't read the interface you're implementing), I think having two separate implementations segues into a NAR-less world nicely.

@mguentner

mguentner Jan 29, 2017

The binary cache implementation is finished:
https://github.com/NixIPFS/nix/tree/ipfs_binary_cache

Have a look at the code - there are some small TODOs in there but

    binaryCaches = [
      "/ipns/nix.ipfs.sourcediver.org/binary_cache"
    ];

just works 🎉 and can be merged soonish.

If you want to test it (easy: NixOps VirtualBox config), have a look at
https://github.com/NixIPFS/notes/tree/master/examples/binary_cache

Currently there is only release-16.09-small, but release-16.09 has been tested as well and will follow once we have the infrastructure in place (have a look at https://github.com/NixIPFS/infrastructure/issues) 👍


@mguentner

mguentner Jan 29, 2017

@edolstra Addressing your question:
The .narinfo files are available just as with https://cache.nixos.org, with their signature from cache.nixos.org-1.
Example:
https://ipfs.io/ipfs/QmR9NabMW7E3XLksJTdtpWYsro9EZpy1NxcbtEtw8Cr8Sq/binary_cache/00n5n3g1jlffq11d4mq7hy1d6yr3x91p.narinfo

I like your idea of adding multiple URL: lines, but NarInfo files are currently a map/dict and you are suggesting making it a multimap, which I think adds too much complexity.
I still think that adding &ipfs=true to any binary cache is a nice-to-have, but it currently only makes sense for the HTTP cache, right? So we would add code to BinaryCache that is only usable for one BinaryCache, namely the HTTPBinaryCache. In my opinion, the right way forward is to use an IPFS binary cache until another form of binary distribution is found (see #1006 + #859), or we will create way too much technical debt.


@matthiasbeyer

Contributor

matthiasbeyer commented Jan 30, 2017

This might be a stupid question, but I'll ask it anyway:

If I package a piece of software, I define the nix expression for it. How do I add it to IPFS without changing the nix expression? I mean... I write down the default.nix, build it... and now things get pushed to IPFS, I get a hash. Now I need to add this hash to the default.nix - so, I need to change it...

or am I getting something wrong here?


Forget the above. I guess I get it...

@mguentner mguentner closed this Feb 4, 2017

@nmikhailov

nmikhailov Feb 4, 2017

@mguentner So what is the status of this? Why was it closed?


@mguentner

mguentner Feb 4, 2017

@nmikhailov

Not much has changed since #1167 (comment).
I am trying to get everything organised over at https://github.com/NixIPFS - the core infrastructure for the initial distribution needs to be set up, and in parallel the binary cache needs to be merged into Nix.
If you have spare servers/storage for the initial distribution, have a look at NixIPFS/infrastructure#1

Again, you can try the current status using this nixops config:
https://github.com/NixIPFS/notes/tree/master/examples/binary_cache

(Once the machines are set up, do nixops ssh -d name bc_user and realise a path, e.g. nix-store --realise /nix/store/005zk8a10js00kbhgcbq48h4cv5im1qn-yelp-3.22.0)


@matthiasbeyer

Contributor

matthiasbeyer commented Feb 4, 2017

@mguentner Is there/will there be a way to partially mirror a channel? I have machines, but not enough storage for a complete channel...

@mguentner

mguentner Feb 4, 2017

You will be able to run a local gateway that serves content from the binary cache and then caches/redistributes the content until it is garbage collected (LRU) depending on how much storage you allocate for this. If you want to warm your cache with a partial channel, you need to write a script/nixos test that requests all the hashes you are interested in storing.


@vcunat vcunat referenced this pull request Sep 15, 2017

Closed

cache.nixos.org is down #29389

@davidak davidak referenced this pull request Feb 6, 2018

Open

Peer-to-Peer substitutes #8
