
Don't ship binaries and tarballs in repositories #6

Closed
bgruening opened this issue Aug 7, 2015 · 18 comments
Comments

@bgruening
Member

I would like to put the binaries/tarballs etc. into a public location, separate from the source (the Dockerfile), or use the original upstream source (if it is trustworthy).
This will improve usability for us developers and dramatically reduce download times. Moreover, a few packages, like PeptideShaker, are too big to store on GitHub.

Maybe the EBI can sponsor some storage, or we can try to get some Google Drive space running?
The Galaxy project is running this service: http://depot.galaxyproject.org/. Most of our target tools should already be there, or we can add them to the depot if you agree to work more closely with the Galaxy community.
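To make the proposal concrete, here is a minimal sketch of what "download instead of ship" could look like at image build time: fetch the tarball from a public mirror and verify it against a SHA-256 checksum pinned next to the Dockerfile. The URL and checksum in the usage comment are hypothetical placeholders, not real depot entries.

```python
"""Sketch: download a source tarball at build time and verify a pinned
checksum, instead of committing the tarball to the repository."""
import hashlib
import urllib.request


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a local file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_and_verify(url: str, dest: str, expected_sha256: str) -> None:
    """Download `url` to `dest` and fail loudly on a checksum mismatch."""
    urllib.request.urlretrieve(url, dest)
    actual = sha256_of(dest)
    if actual != expected_sha256:
        raise RuntimeError(f"checksum mismatch for {url}: got {actual}")


# Hypothetical usage inside an image build (placeholder URL and checksum):
# fetch_and_verify(
#     "http://depot.galaxyproject.org/example/tool-1.0.tar.gz",
#     "/tmp/tool-1.0.tar.gz",
#     "<sha256 recorded alongside the Dockerfile>",
# )
```

Pinning the checksum keeps the build reproducible even though the tarball itself lives outside the repository.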

@ypriverol
Member

@bgruening for now, if we can allocate space on http://depot.galaxyproject.org/ that would be fantastic. I will try to get some support from the EBI, but mainly once the project is mature enough. It is also important to keep the project a community effort, and I guess the Galaxy host is the best option for now. Opinions, @leprevost @BioDocker/contributors?
+1

@ypriverol ypriverol added this to the First Release of the project milestone Aug 7, 2015
@prvst
Member

prvst commented Aug 7, 2015

I totally agree with removing sources and binaries from the repositories. I did it that way at first because it was more convenient to get started, and also because some of the tools still depend on SourceForge, whose service has been intermittent for the past few weeks.

@ypriverol
Member

@bgruening @leprevost if we agree on issue #7, we can remove this from here.

@prvst
Member

prvst commented Aug 8, 2015

agreed


@ypriverol
Member

We will not support binaries or tarballs inside the containers for now; in the future we will try to provide this feature through other resources, such as the Galaxy FTP or another server. Shall we close the issue for now and agree that we will only support GitHub-hosted sources?

@sauloal
Member

sauloal commented Aug 19, 2015

"Only GitHub sources": would that exclude packages that are self-hosted by their creators?

@ypriverol
Member

@sauloal the idea is that, as a project, we should provide a way of checking the quality of the containers. If the source server is not available, we lose the reference to the package and some of the tools will fail to build. For the end user this process needs to be transparent: I don't want to download something that is already broken. As a starting point, and for the health of @BioDocker, I think it would be good to have only healthy containers until we find a way of checking this automatically and removing the broken ones. What do you think?
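The automatic check mentioned above could start as something very simple: periodically verify that every source URL a Dockerfile downloads from is still reachable, and flag containers whose sources have disappeared. A minimal sketch, with the URL list left to the caller:

```python
"""Sketch: flag containers whose upstream source URLs are no longer
reachable, so broken builds can be caught before an end user hits them."""
import urllib.error
import urllib.request


def url_is_reachable(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL can be opened, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except (urllib.error.URLError, ValueError, OSError):
        return False


def check_sources(urls):
    """Map each source URL to its current reachability."""
    return {url: url_is_reachable(url) for url in urls}
```

A cron job (or, today, a CI pipeline) could run this over all registered Dockerfiles and open an issue for each container whose sources have gone dark.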

@sauloal
Member

sauloal commented Aug 19, 2015

@ypriverol, I agree with the concept, but the implementation is a problem. For example, an assembler I created a Docker container for can only be downloaded from the university's website after a request to the author. This is actually disturbingly common, because authors use the number of requests as an argument for further funding from the university or funding agencies.

That said, SourceForge would have been considered a good and stable repository until recently. For all we know, GitHub might be six months away from becoming the next SourceForge. Then what? GitHub as a preferred repository? Sure. Exclusive? ...

@ypriverol
Member

@sauloal @BioDocker/contributors As I said before, there is no perfect solution at the moment. In any case, those examples where you need authentication to download the source will not work with any approach, because you always need to provide the URL and the users need to know how to subscribe, etc. We need to cover a set of containers and images that makes things easy for the end user. @bgruening and I were talking about getting support from public servers and other community projects, such as Galaxy, to host the sources. We can start with those containers we can support via GitHub and then move on.

@sauloal if you already have a use case where the source isn't on GitHub, then we should think about how to support that. Any public-server solution?

@sauloal
Member

sauloal commented Aug 20, 2015

@ypriverol, the MaSuRCA genome assembler comes to mind ( http://www.genome.umd.edu/masurca_compile.html ). This is a case where you can only download the code from their website after emailing the developer.

Another example is Quorum ( http://www.genome.umd.edu/quorum.html ), an error-correction tool for NGS data from the same group. Although you can download it directly (no need to ask for permission), the code is only available on the university's FTP site.

My point is, I don't think it should be set in stone that we will exclusively accept GitHub-hosted software.

@bgruening
Member Author

@sauloal I don't think we can include MaSuRCA or Quorum here, or in any other package management system, unless we get permission to redistribute their tools without restrictions. People will learn over time that this practice is bad, and it will change, I'm pretty sure.

I think what @ypriverol is pointing out is that we should advertise stable download repositories. These can and will change over time, but for reproducibility we need a sustainable archive. I'm on your side about not making it GitHub-only, but we should also keep an eye on the URLs we include and move tarballs to more stable places. See my humble attempt here: https://github.com/bgruening/download_store

In the end we need a replicable, distributable object store, maybe based on torrents, so that every university can contribute to making packages sustainable and make us independent of commercial hosting services.

@ypriverol
Member

@sauloal @bgruening My point is to do this in steps, because otherwise the project becomes difficult to control as it grows. We can start by supporting GitHub and other stable providers, and keep the issue open until we find a long-term solution. We are already evaluating options, and this should be the aim for now:
1. The source tarball should be provided by formal/stable/long-term-support servers.
2. This is a community-driven, open-source, free effort; if the providers require licenses or subscriptions, we need to think about the best way to support those use cases, but my guess is that it will be hard.
3. We should look for and propose solutions for this long-term server support. We talked about Galaxy and the EBI; any others?

If you agree, we can leave this issue open and look for options.

@sauloal
Member

sauloal commented Aug 20, 2015

@bgruening, I really liked your idea of download_store. Regarding Quorum, you don't need to request access to be able to download it; it is open and available on the university's FTP.

@ypriverol, I completely agree with the spirit of your idea. I just think that there are too many tools that are still not on stable servers; this could really limit our reach. Could we somehow estimate how many programs from Galaxy are privately hosted?

That said, I second your proposal to start with GitHub only, as long as we agree that it is a best practice and not a rule set in stone that we won't change.

Regarding your point 2, we should add a section describing how it is strictly forbidden to upload unlicensed programs.

Regarding your point 3, we could also try Google and Amazon, both of which have biology initiatives that could maybe host our data.

@prvst
Member

prvst commented Aug 21, 2015

What about using GitHub itself to host the binaries? We could have a separate account or repository containing only binaries. From inside the container we could simply fetch the files.


@sauloal
Member

sauloal commented Aug 21, 2015

@leprevost, that's exactly what @bgruening's repository does.

@prvst
Member

prvst commented Aug 26, 2015

OK, let's move on with this point. I will create a repository called binaries inside the biodocker group and start moving the binaries there. If everyone is in favor of using that solution for now, I will close this thread.
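For the record, a container could pull files from such a repository via GitHub's raw-content URLs. A small sketch of building those URLs; the organization, repository, branch, and file path below are assumptions based on this comment, not existing resources:

```python
"""Sketch: build raw.githubusercontent.com URLs for files hosted in a
dedicated binaries repository. Only the URL scheme is real; the example
organization/repo/path are hypothetical."""

RAW_BASE = "https://raw.githubusercontent.com"


def raw_url(org: str, repo: str, branch: str, path: str) -> str:
    """Return the raw-content URL GitHub serves for a file in a repository."""
    return f"{RAW_BASE}/{org}/{repo}/{branch}/{path}"


# A Dockerfile could then fetch the file in a single step, e.g.:
#   RUN curl -fsSL <raw_url> -o /opt/tool/tool.jar
url = raw_url("BioDocker", "binaries", "master", "peptide-shaker/PeptideShaker.jar")
```

Note that GitHub is itself a commercial host, so this only addresses convenience, not the long-term-archive concern raised earlier in the thread.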

@sauloal
Member

sauloal commented Aug 26, 2015

+1

@bgruening
Member Author

👍
