This repository has been archived by the owner. It is now read-only.

Migrate most notable formulae to core #6331

Closed
iMichka opened this Issue Sep 20, 2017 · 32 comments

7 participants
@iMichka
Contributor

iMichka commented Sep 20, 2017

Formulae with enough notability for core

  • matplotlib - 8500 do not migrate: use pip or vendor.
  • samtools - 3769
  • htslib - 2784 #6366
  • bedtools - 1682 #6343
  • mumps - 1610
  • lammps - 1554
  • bowtie2 - 1441 #6345
  • bwa - 1406
  • openalpr - 1239
  • vcftools - 1068 #6517
  • sratoolkit - 1065
  • openni - 1000
  • openimageio - 861
  • bcftools - 790
  • igv - 786 #6367
  • cantera - 763
  • picard-tools - 755
  • bamtools - 734
  • kallisto - 717
  • bowtie - 681
  • express - 577
  • fastx_toolkit - 519
  • rstudio-server - 517 (removed)
  • anvio - 481
  • raxml - 481 #6346
  • cd-hit - 447
  • bedops - 434
  • astrometry-net - 421
  • diamond - 390
  • trinity - 379
  • prodigal - 375
  • cgns - 375
  • butterflow - 352
  • g2o - 329
  • field3d - 327
  • nextflow - 325
  • seqtk - 325
  • igvtools - 315
  • freebayes - 299
  • prokka - 296
  • boost-compute - 268
  • trilinos - 266
  • jellyfish - 264
  • nest - 257
  • oce - 256
  • cmdstan - 239
  • libsigrokdecode - 236
  • gatk - 232
  • vt - 207
  • orocos-kdl - 205
  • salmon - 196
  • bioawk - 195
  • abyss - 193 #6332
  • cutadapt - 188
  • rna-star - 185
  • vigra - 180
  • pandaseq - 177
  • dealii - 175
  • samtools@0.1 - 173
  • shogun - 165
  • vcflib - 162
  • plink2 - 161
  • lmod - 159
  • canu - 159
  • arrayfire - 157
  • dynare - 153
  • alembic - 151
  • blasr - 151
  • liblbfgs - 146
  • simpleitk - 135
  • libccd - 134
  • genometools - 133
  • vsearch - 132
  • velvet - 131
  • nanopolish - 125
  • phyml - 124
  • prank - 121
  • poretools - 121
  • shark - 119
  • mothur - 115
  • littler - 112
  • stringtie - 110
  • dwgsim - 110
  • beast - 97 #6341
  • sambamba - 93
  • opencollada - 90
  • openbr - 90
  • kraken - 90
  • artemis - 90
  • delly - 88
  • sumo - 87
  • sickle - 86
  • seqan - 86
  • osgearth - 83
  • soapdenovo - 82
  • pilon - 78
  • arb - 76
  • beast2 - 74
  • megahit - 70
  • lastz - 69
  • p4est - 67
  • crfsuite - 66
  • mash - 62
  • alpscore - 60
  • galsim - 58
  • flint - 55
  • apophenia - 53
  • dgtal - 51
  • yices - 48
  • sga - 48
  • bam-readcount - 47
  • miniasm - 46
  • tamarin-prover - 46
  • vislcg3 - 44
  • minimap - 43
  • vcfanno - 43
  • bamutil - 41
  • symengine - 41
  • itensor - 40
  • xraylib - 40
  • ensembl-tools - 39
  • bali-phy - 38 (removed, #6428)
  • mallet - 34
  • t-coffee - 34
  • idba - 34
  • hyphy - 33
  • adam - 31
  • wiggletools - 30
  • data-science-toolbox - 30
  • samblaster - 29
  • kat - 27
  • harry - 26
  • bpipe - 26
  • orthofinder - 25
  • beetl - 25
  • sailfish - 24
  • lumpy-sv - 24
  • fplll - 24
  • pykep - 23
  • swarm - 23
  • madlib - 23
  • ascii_plots - 22
  • acado - 22 - has only beta versions, need to be stable first: acado/acado#233
  • ucsc-genome-browser - 22
  • blis - 21
  • velvetoptimiser - 20
  • galib - 20
  • lighter - 19
  • unicycler - 18
  • libsbol - 17
  • ray - 17
  • xbyak - 16
  • nixio - 14
  • arcs - 14
  • newick-utils - 13
  • sally - 13
  • centrifuge - 12
  • oswitch - 12
  • libminc - 10
  • statismo - 10
  • libdivsufsort - 10
  • daligner - 10
  • reapr - 9
  • omcompiler - 9
  • elemental - 8
  • snap-aligner - 7
  • k8 - 7
  • sdsl-lite - 6
  • dazz_db - 6
  • methpipe - 5
  • ome-files - 5
  • psmc - 5
  • grabix - 4
  • joinx - 4
  • ome-common - 4
  • biopieces - 4
  • ome-xml - 4
  • fermi-lite - 4
  • wopr - 4
  • fermi - 3
  • snid - 2
  • cusp - 2
  • fermikit - 2
  • mhap - 1
  • vague - 0
  • ogdraw - 0

Formulae which are not hosted on GitHub but worth migrating

@iMichka

Contributor

iMichka commented Sep 20, 2017

@sjackman @MikeMcQuaid @ilovezfs @fxcoudert @jonchang

This is the list compiled by @ilovezfs of all formulae hosted on GitHub that are notable enough to be migrated to core. I'm not saying we need to migrate everything; some of these no longer build. We should record the state of each formula here to keep track of the migration.

We can extend the list with a separate section for formulae that are not on GitHub, but notable enough to be migrated.

@iMichka iMichka added the help wanted label Sep 20, 2017

@sjackman

Contributor

sjackman commented Sep 20, 2017

@tmozgach and I are looking into the downloads stats for those formulae that are not on GitHub.

@iMichka

Contributor

iMichka commented Sep 22, 2017

I'm adding download stats for the last 90 days for each formula. Everything with >= 50 downloads is good to go; others may get rejected. Be careful with beta/alpha versions.

@ilovezfs

Contributor

ilovezfs commented Sep 22, 2017

Also, if it's packaged by Debian but falls short of the 50 installs in 90 days line, then it may still be OK. If you want to argue that something which falls short of the 50 installs in 90 days line, and is also not packaged by Debian, should still be migrated, I'm happy to hear the argument.

@iMichka once you have the stats, it might be worth sorting the list by them.
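
The acceptance rule discussed so far (at least 50 installs in 90 days, or packaged by Debian) can be sketched as a small shell helper. The function name `eligible` and its arguments are hypothetical and not part of brew; anything that fails both checks would still need a case-by-case argument.

```shell
# Hypothetical helper, not part of brew.
# $1: install events over the last 90 days
# $2: "yes" if Debian packages the software (defaults to "no")
eligible() {
  installs=$1
  in_debian=${2:-no}
  if [ "$installs" -ge 50 ] || [ "$in_debian" = "yes" ]; then
    echo "migrate"
  else
    echo "needs justification"
  fi
}
```

For example, `eligible 120` and `eligible 30 yes` both print `migrate`, while `eligible 30` prints `needs justification`.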

@iMichka

Contributor

iMichka commented Sep 22, 2017

Yes, I'll sort the list, so that we can focus on the most used ones first.

@ilovezfs

Contributor

ilovezfs commented Sep 22, 2017

And of course if something was added to science < 90 days ago, we'll have to prorate it :)
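
As a rough illustration of prorating, installs observed over a shorter window can be scaled up to the 90-day window before comparing against the threshold. The helper name `prorate` and the linear scaling are assumptions; the thread does not specify an exact formula.

```shell
# Hypothetical helper: scale installs observed over a shorter window
# up to the 90-day window used for the threshold (integer arithmetic).
# $1: install events observed
# $2: number of days the formula has been in the tap
prorate() {
  installs=$1
  days=$2
  if [ "$days" -le 0 ]; then
    echo "days must be positive" >&2
    return 1
  fi
  if [ "$days" -ge 90 ]; then
    # Already covers the full window; nothing to scale.
    echo "$installs"
  else
    echo $(( installs * 90 / days ))
  fi
}
```

With this sketch, `prorate 20 30` prints `60`, so a formula with 20 installs in its first 30 days would be treated as clearing the 50-install line.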

@ilovezfs

Contributor

ilovezfs commented Sep 22, 2017

Hmm I'm seeing lower numbers. For example,

bash-4.4$ brew formula-analytics --days-ago=90 jellyfish
install events in the last 90 days for jellyfish
=====================================================================================================
1 | homebrew/science/jellyfish                                                        | 198 | 100.00%
=====================================================================================================
Total                                                                                 | 198 |    100%
=====================================================================================================

But you have jellyfish - 264. That is odd.

@iMichka

Contributor

iMichka commented Sep 23, 2017

Not sure about the discrepancy. I used the web interface; maybe I did something wrong. Anyway, I used the same method for all the formulae here. Some may have slightly more or fewer installs, as I did not always count the installs done with options.

78 formulae won't make it. We should concentrate on the 108 first, going from the top to the bottom.

@ilovezfs

Contributor

ilovezfs commented Sep 24, 2017

Here are the ones with at least 50 install events in the last 90 days, with list sorted by install-on-request events. Table is (name, install-on-request events, install events)

matplotlib 3267 5231
samtools 1443 2270
lammps 914 1033
bedtools 858 907
openalpr 614 663
bwa 564 806
cantera 557 577
vcftools 429 636
sratoolkit 426 624
bowtie2 422 1073
openimageio 401 552
openni 400 539
kallisto 381 407
rstudio-server 375 407
bcftools 345 406
picard-tools 332 500
igv 310 462
anvio 292 289
fastx_toolkit 238 271
bowtie 233 383
astrometry-net 225 251
nest 206 210
cd-hit 188 273
mumps 182 1522
butterflow 178 195
bedops 164 272
g2o 164 186
trinity 163 207
bamtools 161 560
raxml 157 317
htslib 151 2567
seqtk 150 156
prokka 140 154
freebayes 137 185
gatk 126 126
igvtools 123 182
nextflow 113 215
boost-compute 105 184
trilinos 102 172
vt 98 98
cutadapt 95 99
blasr 95 106
salmon 94 99
express 94 487
abyss 93 101
bioawk 89 90
rna-star 87 94
samtools@0.1 86 108
dealii 85 96
canu 82 89
arrayfire 80 91
pandaseq 79 101
cgns 76 259
shogun 74 80
libsigrokdecode 73 175
dynare 72 78
cmdstan 71 123
plink2 68 90
alembic 65 86
phyml 64 83
diamond 62 327
lmod 61 73
velvet 59 71
simpleitk 58 76
nanopolish 58 65
prank 56 58
vsearch 55 75
stringtie 54 58
oce 54 201
jellyfish 54 198
kraken 53 57
shark 52 70
openbr 52 65
poretools 50 63
genometools 49 86
mothur 48 61
prodigal 48 329
littler 43 71
dwgsim 43 56
artemis 40 57
sumo 40 51
sambamba 38 57
liblbfgs 37 109
libccd 37 104
delly 36 54
vigra 34 128
opencollada 31 55
vcflib 30 147
field3d 28 295
orocos-kdl 16 198
@ilovezfs

Contributor

ilovezfs commented Sep 24, 2017

And here is the same for the ones with fewer than 50 install events in the last 90 days:

soapdenovo 40 42
sickle 39 41
seqan 37 49
crfsuite 37 38
osgearth 34 42
idba 34 36
beast2 33 41
arb 32 46
pilon 32 40
galsim 32 33
vislcg3 31 35
megahit 29 43
alpscore 28 36
mash 27 39
sga 26 39
lastz 26 39
yices 25 25
tamarin-prover 24 24
beast 24 24
flint 23 36
p4est 22 40
xraylib 21 25
itensor 21 24
bam-readcount 21 22
symengine 21 21
bamutil 20 20
apophenia 19 22
adam 18 19
miniasm 18 18
nixio 17 18
t-coffee 17 17
acado 17 17
orthofinder 16 16
kat 15 18
samblaster 15 17
hyphy 15 16
data-science-toolbox 15 15
bali-phy 14 23
vcfanno 14 15
ensembl-tools 14 14
mallet 13 19
harry 13 13
wiggletools 12 14
sailfish 12 12
pykep 11 11
ascii_plots 11 11
dgtal 10 39
minimap 10 22
bpipe 10 16
swarm 10 11
velvetoptimiser 10 10
ucsc-genome-browser 10 10
galib 10 10
blis 10 10
unicycler 9 9
lighter 9 16
lumpy-sv 9 13
ray 8 9
libsbol 8 8
madlib 8 13
beetl 8 10
sally 7 7
centrifuge 7 7
arcs 7 7
fplll 7 23
newick-utils 5 8
libdivsufsort 5 5
daligner 5 5
psmc 4 5
libminc 4 4
ome-common 3 5
ome-files 3 4
omcompiler 3 4
sdsl-lite 3 3
k8 3 3
elemental 3 3
dazz_db 3 3
oswitch 3 10
reapr 2 5
wopr 2 2
snap-aligner 2 2
methpipe 2 2
joinx 2 2
grabix 2 2
fermi-lite 2 2
biopieces 2 2
xbyak 2 15
ome-xml 1 3
snid 1 1
fermikit 1 1
fermi 1 1
cusp 1 1
mhap 1 0
vague 0 0
statismo 0 0
ogdraw 0 0
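
The two tables above differ only in which side of the 50-install line a formula falls on. Assuming the data is saved as whitespace-separated `name install-on-request installs` rows, a sketch like the following reproduces that split and ordering. The function name is made up for illustration, and awk plus sort is just one way to do it.

```shell
# stdin: rows of "name install-on-request-events install-events"
# $1: threshold applied to install events (column 3), default 50
# Output: each row tagged "keep" or "drop", keep rows first, sorted by
# install-on-request events in descending order within each group.
split_by_installs() {
  awk -v t="${1:-50}" '{
    tag = ($3 >= t) ? "keep" : "drop"
    print tag, $0
  }' | sort -k1,1r -k3,3nr
}
```

Feeding it the rows `samtools 1443 2270`, `mhap 1 0` and `jellyfish 54 198` yields samtools and jellyfish tagged `keep`, followed by mhap tagged `drop`.
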
@sjackman

Contributor

sjackman commented Sep 24, 2017

@tmozgach See above for the list of Homebrew/science formulae that are being moved to Homebrew/core and a table of the number of downloads in the last ninety days.

@HadrienG

Contributor

HadrienG commented Oct 4, 2017

I was in the process of moving mothur.rb ( Homebrew/homebrew-core#18906 ) and it doesn't go so smoothly as Core doesn't accept bin.install for new formulas

I cross-compared the list of notable formulas above with the list of formulas containing bin.install and here is the list:

bedops
bioawk
butterflow
bwa
canu
cmdstan
delly
dwgsim
freebayes
igv
kraken
lammps
littler
mothur
nanopolish
nextflow
openni
phyml
plink2
prank
prodigal
raxml
rna-star
rstudio-server
sambamba
samtools
samtools@0.1
seqtk
stringtie
vcflib
velvet
vt

These formulas will be tricky to migrate, as they will probably require changes to the upstream Makefile.

Commands used:

rg "bin.install" -l > bin_install.txt
comm -12 <(cut -f 1 -d ' ' notable.txt | sort) <(cat bin_install.txt | rev | cut -c 4- | rev | sort)
@MikeMcQuaid

Member

MikeMcQuaid commented Oct 4, 2017

I was in the process of moving mothur.rb ( Homebrew/homebrew-core#18906 ) and it doesn't go so smoothly as Core doesn't accept bin.install for new formulas

That's not quite the case; for formulae with Makefiles, we should submit changes upstream to add a make install target. This makes the software easier to package everywhere, not just in Homebrew.

@HadrienG

Contributor

HadrienG commented Oct 4, 2017

I don't dispute the fact that having a make install step is better practice :)
That doesn't change the fact that bin.install doesn't pass review in core and that the above formulas will require changes to their Makefiles if we want to migrate them

@ilovezfs

Contributor

ilovezfs commented Oct 4, 2017

Things without make install (or an install script that takes an installation destination as an argument) were typically not intended for distribution in package managers. If upstream is unwilling to add a make install target, or an install script that takes an installation location, or we can't even contact them to request it (or contact them with a patch adding it ourselves and get it merged) then it's likely we're going to have a hard time reporting any future bugs and getting them fixed.

In my experience, most upstreams are perfectly happy to add this when asked. And if we can't contact them to ask, that's an even bigger problem.
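
To make the "install script that takes an installation destination as an argument" idea concrete, here is a minimal sketch of what an upstream project could ship. The function name and argument layout are hypothetical; a real project would adapt it to its own build outputs.

```shell
# Hypothetical upstream install step: copy built executables into
# <prefix>/bin, where <prefix> is supplied by whoever packages the tool.
# $1: installation prefix; remaining arguments: built executables
install_tool() {
  prefix=$1
  shift
  mkdir -p "$prefix/bin" || return 1
  for exe in "$@"; do
    cp "$exe" "$prefix/bin/" &&
      chmod 755 "$prefix/bin/$(basename "$exe")" || return 1
  done
}

# Example (paths are placeholders): install_tool "$HOME/opt/mytool" ./mytool
```

A packager can then pass its own prefix to the script rather than copying files by hand, and a `make install` target that honours a prefix variable achieves the same thing from the Makefile.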

@charleshan5330 charleshan5330 referenced this issue Oct 16, 2017

Closed

matplotlib: 2.1.0 #6382

8 of 8 tasks complete

@ilovezfs ilovezfs referenced this issue Oct 26, 2017

Closed

fgsl 1.2.0 (new formula) #19881

4 of 4 tasks complete
@sjackman

Contributor

sjackman commented Nov 2, 2017

Just curious, who is planning to work on migrating the eligible formulae to Homebrew/core? Tanya and I can chip in for the bioinformatics formulae. Is anyone planning on tackling the others?

@sjackman sjackman referenced this issue Nov 5, 2017

Closed

rstudio-server v1.1.383 #6393

7 of 7 tasks complete
@McNoggins


McNoggins commented Nov 9, 2017

Hi everyone, I'd like to help with this, as I am a regular user of quite a few of the packages on this list. I don't have a lot of experience with Homebrew, but I could help migrate formulae such as mumps. Is there a standard way of doing this, or is it as simple as submitting a new formula to core?

@sjackman

Contributor

sjackman commented Nov 9, 2017

@ilovezfs Is there a brew command to move a formula from one repo to another and retain its git history? Do we want to retain its git history?

@sjackman

Contributor

sjackman commented Nov 9, 2017

@McNoggins Great! Thanks for your offer to help, Denis.

@ilovezfs

Contributor

ilovezfs commented Nov 10, 2017

@sjackman There's no brew command for that, and no, not if you mean into core. But you can do it with some git fu if you mean into a third-party tap. Personally I wouldn't worry about it in that case either, but it is possible.
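
For readers wondering what such "git fu" might look like, one commonly used recipe exports the commits touching a single file as mail-formatted patches and replays them in another repository. This is a sketch, not necessarily the approach meant above. It assumes a linear history without renames and that the file keeps the same relative path in the destination; the function name is hypothetical.

```shell
# Hypothetical helper: replay the history of one file from one git
# repository into another, oldest commit first.
# $1: source repository, $2: destination repository, $3: file path
move_with_history() {
  src_repo=$1
  dest_repo=$2
  file=$3
  git -C "$src_repo" log --pretty=email --patch-with-stat --reverse \
      --full-index --binary -- "$file" |
    git -C "$dest_repo" am
}

# Example (paths are placeholders):
#   move_with_history ~/homebrew-science ~/my-tap foo.rb
```

The replayed commits keep their original authors and messages but get new hashes in the destination repository.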

@sjackman

Contributor

sjackman commented Nov 10, 2017

@McNoggins There you are. Just submit a new formula PR to Homebrew/core. You'll also need to submit a PR to Homebrew/science to remove the formula and add a line to
https://github.com/Homebrew/homebrew-science/blob/master/tap_migrations.json
Please link the two PRs with a comment.
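
For illustration, an entry in tap_migrations.json maps the formula name to the tap it moved to. The fragment below uses mumps purely as an example; in the real file the entry would be one more key in the existing JSON object rather than a file of its own.

```json
{
  "mumps": "homebrew/core"
}
```

With such an entry in place, users who installed the formula from the old tap are pointed to its new location.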

@sjackman

Contributor

sjackman commented Dec 23, 2017

The following tables include the number of installations on macOS in the last 90 days as of 2017-12-13. I also have installation counts for Linux, not included here. The GitHub Notable column is as of 2017-10-11.

Bioinformatics

https://gist.github.com/sjackman/d1db0d73597b674bb23b51221a2914cd#file-metrics-bioinformatics-macos-tsv

All of Homebrew/science

https://gist.github.com/sjackman/d1db0d73597b674bb23b51221a2914cd#file-metrics-science-macos-tsv

@bblacey

Contributor

bblacey commented Dec 31, 2017

Just finished porting opencascade to core and considering additional Formulae upon which FreeCAD depends. What minimum requirements must be met for a science formula to be imported to core other than a volunteer to port it? I am considering the following:

orocos-kdl
med-file
nglib
matplotlib

Also, what does the deprecation of homebrew-science mean for Linuxbrew?

@bblacey

Contributor

bblacey commented Dec 31, 2017

I just found and read #6365 so it seems like volunteer plus brew audit --online as the initial gating requirements? The volunteer must also increase the Formula quality/maintainability to pass brew audit --ilovezfs before it is accepted to core? Is my understanding correct?

@MikeMcQuaid

Member

MikeMcQuaid commented Jan 1, 2018

I just found and read #6365 so it seems like volunteer plus brew audit --online as the initial gating requirements? The volunteer must also increase the Formula quality/maintainability to pass brew audit --ilovezfs before it is accepted to core? Is my understanding correct?

@bblacey Yup 👍, thanks for clarifying.

@MikeMcQuaid

Member

MikeMcQuaid commented Jan 1, 2018

This tap has been deprecated and will shortly be archived.

If you wish to migrate other widely used formulae listed here to Homebrew/homebrew-core please submit a pull request there.

@sjackman

Contributor

sjackman commented Jan 12, 2018

The formulae in Homebrew/science have been archived at https://github.com/brewsci/homebrew-science. I've created a new tap for bioinformatics formulae at https://github.com/brewsci/homebrew-bio.
🍺 brew tap brewsci/bio
🍺 brew tap brewsci/science
💀 brew untap homebrew/science

@MikeMcQuaid

Member

MikeMcQuaid commented Jan 12, 2018

Please note, though, that https://github.com/brewsci/homebrew-science is the same tap with the same issues that led to this one being deprecated/archived and is not endorsed in any way by the Homebrew project.
