D7.1: The flow of code and patches in open source projects #148

Closed
minrk opened this Issue Sep 8, 2015 · 35 comments

Contributor

minrk commented Sep 8, 2015

OpenDreamKit builds on top of a large ecosystem of (mostly) academic open-source systems, many of which are large-scale themselves: for example, our chosen test system SageMath is the outcome of a decade of work by hundreds of contributors; many others are decades old. The social engineering aspects involved in such large ecosystems are therefore both intricate and central to their long-run sustainability. This motivates OpenDreamKit's objective in WP7 of studying the collaborative processes of free open-source (mathematical) software development, so as to produce guidelines for best practice and to develop ideas for extending existing processes to an “ecosystem of systems”.

In this deliverable we survey the methodology, data, and tools needed to assess development models of large-scale academic open-source systems, for example to test the probable correlation between the size of an atomic contribution and the speed at which it makes it into the code, and we collect appropriate statistical data, to be published as a report (and possibly a conference publication). While the proposal assumed that the latter might require a non-trivial amount of programming work, even for our test system alone, great open-source tools addressing precisely these kinds of questions were released last year, and we used one of them instead.

Accomplishments:

  • a large number of publications and online sources was reviewed for applicability
  • various analytic tools were tried on a sample of SageMath components
  • results were summarised in a report, with conclusions and pointers to further possible developments
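
To make the size-versus-speed question above concrete, here is a minimal sketch of a rank correlation between patch size and time to merge. It is an illustration only, not the tool used for the report: the function names are hypothetical, Spearman's coefficient is just one reasonable choice of measure, and the sample data is made up.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of equal values that starts at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Plain Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (var_x * var_y) ** 0.5


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Made-up sample: (lines changed, days from submission to merge).
SAMPLE = [(3, 1), (10, 2), (45, 5), (120, 9), (800, 30)]
sizes = [size for size, _ in SAMPLE]
delays = [delay for _, delay in SAMPLE]
rho = spearman(sizes, delays)
print(f"Spearman rho = {rho:.2f}")
```

A rank-based measure is used here because patch sizes and merge delays tend to be heavily skewed; real data would come from the project's tracker or repository history.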

@minrk minrk added this to the D7.1 milestone Sep 8, 2015

@nthiery nthiery modified the milestones: Month 18: 2017-02-28, D7.1 Mar 22, 2016

Member

alex-konovalov commented Jun 29, 2016

This is interesting: The shape of open source, in which @arfon presents some GitHub analytics with nice visualisations. There is no tool to produce such visualisations for other repositories: it was a combination of D3 and MySQL queries against GitHub databases. One could try Cauldron (https://cauldron.io/), a platform for analysing GitHub repositories using the [Grimoire Lab](http://grimoirelab.github.io/) analytical tools.

npch commented Aug 11, 2016

It might be interesting to use the tidytext package to analyse some of this data once it has been extracted into tidy data form; see http://juliasilge.com/blog/Life-Changing-Magic/ and http://varianceexplained.org/r/trump-tweets/

Contributor

bpilorget commented Nov 21, 2016

@dimpase (WP leader and lead beneficiary)
This deliverable is due in February 2017.

Contributor

nthiery commented Feb 6, 2017

Dear M18 deliverable leaders,

Just a reminder that reports are due by mid-February, to buy us some time for proofreading, feedback, and final submission before February 28th. See our README for details on the process.

In practice, I'll be offline February 12-19, and the week right after will be pretty busy. Therefore, it would be helpful if a first draft could be available sometime this week, so that I can have a head start reviewing it.

Thanks in advance!

Contributor

bpilorget commented Feb 22, 2017

@dimpase How is everything going? A report must be delivered by 28 February.

Contributor

dimpase commented Feb 26, 2017

@bpilorget @nthiery I committed the 1st draft; I will do a bit of fiddling, like proofreading etc., but that's basically done, as far as I am concerned.

Contributor

nthiery commented Feb 26, 2017

Hi @dimpase,

Thanks for this first draft; that's an interesting survey of the literature.

I proofread the tex file, fixed some minor typos and pushed.

For the abstract (aka github issue description), please edit the first comment on this issue, rather than the markdown file in the repo. To automatically update the latter:

rm WP7/D7.1/report.pdf WP7/D7.1/github-issue-description.*
make WP7/D7.1/report.pdf

In terms of the content of the abstract, you may want to take e.g. #98 as an example, and check the notes in the README.

TODO:

  • In the abstract and/or the report itself: clarify what was achieved for this deliverable in addition to reviewing the literature: which data was or will be collected, which tool was or will be implemented, etc.
  • In the conclusion: "ought to be developped": does this just mean that this is required to derive significant conclusions, or that we actually are planning to do it as part of ODK?
  • @alex-konovalov: could you please have a look at the report and provide feedback?

Thanks in advance!

Contributor

dimpase commented Feb 26, 2017

@nthiery Thanks. I followed your instructions: I edited the 1st comment and removed github-issue-description.*; then make failed, complaining that it could not find github-issue-description.md, so I put the latter back. Then make did run, and created the pdf, but with an empty abstract.
Do you mean to say that github-issue-description.tex is meant to be generated by getting the 1st comment from GitHub (via its API)?

EDIT: I missed that I needed more stuff like yaml pip-installed... I get

$ make WP7/D7.1/report.pdf
(issue=`python3 bin/get_issue WP7/D7.1/report.tex`; echo "# Deliverable description, as taken from Github issue #$issue on `date -I` {.notoc}\n"; python3 bin/get_issue_body $issue) > WP7/D7.1/github-issue-description.md
Traceback (most recent call last):
  File "bin/get_issue", line 5, in <module>
    import yaml
ImportError: No module named 'yaml'
Traceback (most recent call last):
  File "bin/get_issue_body", line 4, in <module>
    from github import Github
ImportError: No module named 'github'
make: *** [Makefile:21: WP7/D7.1/github-issue-description.md] Error 1

OK, so far I got to the stage where I miss pandoc, having installed pyyaml and PyGithub...
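
The missing pieces reported above (the `yaml` and `github` Python modules, then pandoc) could be checked in one go before running make. The following is a minimal sketch of such a check; it is not part of the repository, and the function names are hypothetical. Only the module and program names are taken from the messages above.

```python
import importlib.util
import shutil
from typing import Iterable, List

# From the errors above: the scripts import `yaml` and `github`
# (installed as pyyaml and PyGithub), and the build also needs pandoc.
REQUIRED_MODULES = ("yaml", "github")
REQUIRED_PROGRAMS = ("pandoc",)


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def missing_programs(names: Iterable[str]) -> List[str]:
    """Return the program names that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


def preflight_report() -> str:
    """Summarise what is missing, or say that everything was found."""
    problems = missing_modules(REQUIRED_MODULES) + missing_programs(REQUIRED_PROGRAMS)
    if problems:
        return "missing build dependencies: " + ", ".join(problems)
    return "all build dependencies found"


if __name__ == "__main__":
    print(preflight_report())
```

Running this first would report all three gaps at once instead of one per make attempt.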

Member

alex-konovalov commented Feb 27, 2017

Thanks. From a quick glance what I can see is:

  • need to update issue description in the report
  • Figure 2 matches my impression
  • Figure 3 also matches my "if it's done, it's done" motto: most PRs are closed quickly.
  • Figure 3 is not easy to read initially; both figures would benefit from some text describing what's there.
  • can we have an equivalent of Figure 3 for open issues?
  • for GAP, you may wish to cite this blog post http://www.codima.ac.uk/2016/03/09/gap-on-github-one-year-on/ somewhere
  • desirable to have more pictures, not only about GAP. Can we analyse some other repositories?

Member

alex-konovalov commented Feb 27, 2017

Time permitting, I'd like better horizontal resolution with individual months on Fig. 3, and more clearly indicated month markers on the horizontal axis.

Contributor

nthiery commented Feb 27, 2017

Contributor

dimpase commented Feb 27, 2017

@nthiery it did work, yes---although there is some strange {.notoc}\n string that makes it all the way into the resulting pdf. Not sure where it comes from; it is already in the markdown.
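
One possible explanation, which this thread does not confirm: in the make recipe quoted earlier, the header line is written with echo and ends in "{.notoc}\n". Whether echo turns \n into a newline depends on the shell that make invokes (dash's built-in echo does, bash's does not by default). If the two characters stay in the file literally, {.notoc} is no longer at the end of the heading line, which may be why pandoc passes it through as text. A minimal sketch of building the same header in Python, where the newline is unambiguous; the function name is hypothetical, and the date is just an example:

```python
import datetime


def description_header(issue_number: int, today: datetime.date) -> str:
    """Build the heading line of github-issue-description.md plus a blank line.

    The wording mirrors the echo command in the make output quoted earlier;
    {.notoc} ends the heading line so pandoc can read it as a header attribute.
    """
    return (
        f"# Deliverable description, as taken from Github issue #{issue_number} "
        f"on {today.isoformat()} {{.notoc}}\n\n"
    )


# Issue number from this page; the date is an arbitrary example.
header = description_header(148, datetime.date(2017, 2, 27))
print(header, end="")
```

In a shell recipe, printf would be the portable way to get the same effect, since its handling of \n does not vary between shells.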

Contributor

nthiery commented Feb 27, 2017

Just to avoid conflicts: I am now about to make some minor edits to the issue description and report, and will check the .notoc thingy while I'm at it.

Contributor

nthiery commented Feb 27, 2017

Contributor

dimpase commented Feb 27, 2017

just for the record: I used

$ pandoc -v
pandoc 1.19.2.1
Compiled with pandoc-types 1.17.0.4, texmath 0.9.1, skylighting 0.3

Contributor

nthiery commented Feb 27, 2017

Ok; thanks for the info. My pandoc is older :-) Anyway; no big deal, to be investigated later.

Contributor

dimpase commented Feb 27, 2017

Contributor

nthiery commented Feb 27, 2017

@OpenDreamKit/wp7: feedback on the report for this deliverable is welcome! You can access the current pdf by clicking "Final report" above. Well, it's not quite final, but will be soon :-)
@dimpase: you may want to update report-final.pdf from time to time.

Contributor

nthiery commented Feb 27, 2017

Yes, I am done! (see also my private e-mail)

Contributor

dimpase commented Feb 27, 2017

I've edited the D7.1 description to indicate software availability, as discussed, and pushed the corresponding changes. Now to teaching, till the evening.

Member

alex-konovalov commented Feb 27, 2017

I've submitted PR #219, which @dimpase merged. I also think that the section "Openness, licensing, etc" is not necessary. There is no single point of view here: while @dimpase argues that GPL-style licenses are "better in sense of keeping the community together", many open science advocates advise using as permissive a license as possible to facilitate maximal reuse, and this may point to other licenses. From my point of view, the topic of the report is "The flow of code and patches in open source projects" and it is not required to cover licenses here.

Contributor

dimpase commented Feb 28, 2017

Well, the flow of code and patches depends on the license type, no doubt.

The only place where more permissive is better is commercialisation. Licenses are akin to locks.
If you do not lock your bicycle it will be used more, and indeed it's more convenient not to bother with locks; the problem is that very soon you might not see it again :-). As we are interested in building an open-source VRE, a long-term project, better locks might come in handy...

Contributor

dimpase commented Feb 28, 2017

as far as I am concerned it is ready for submission. But feel free to modify...

Contributor

nthiery commented Feb 28, 2017

Cool. @alex-konovalov or anyone from @OpenDreamKit/wp7: let me know if you still want to do some work on this deliverable.
By default, I am now planning to spend about one hour on D1.4, and submit this one after.
Cheers,

Member

alex-konovalov commented Feb 28, 2017

@nthiery I have fixed several typos and will push soon. But the question about section 6 still remains open...

Contributor

nthiery commented Feb 28, 2017

I agree with @dimpase that there is something to be said about licences; they certainly do influence the flow of code and patches. So often it's licensing issues that have prevented the flow of code from otherwise useful software. We could mention a couple of striking examples which hurt us badly, like gap3, Nauty, or graphviz, which we could not include as standard packages in Sage.
That being said, I agree with @alex-konovalov that I'd rather avoid getting into opinions and religious debates about the pros and cons of specific licences.
Anyone up for a quick rewrite of this section in the next hour or so?
Otherwise, I'll just strip out the section and submit.

Contributor

dimpase commented Feb 28, 2017

I can tone this down, but I don't think removing it completely makes sense. IMHO in the UK in particular it has become customary to dismiss open source as something that does not make £££ (impact-wise too: EPSRC until recently absolutely did not encourage any open-source projects whatsoever), and so the opinion in universities was always tilted toward easy commercialisation. But we are not aiming at selling ODK to the highest bidder, are we?

Further, I really do not see any harm in mentioning our point of view in the report.

And indeed we can mention how much time it cost me to force nauty to be released under a GPL-compatible license... Hell, count it towards the time spent on this deliverable. :-)

Member

alex-konovalov commented Feb 28, 2017

@dimpase it's basically about the 2nd paragraph.

Contributor

nthiery commented Feb 28, 2017

I agree about mentioning our point of view about open source licenses. But for GPL vs BSD this is not even something there is a consensus about in ODK. E.g. Jupyter uses revised BSD.

Contributor

nthiery commented Feb 28, 2017

I can just remove the paragraph about GPL vs BSD. But then section 6 becomes rather tiny, unless it gets fleshed out a bit, e.g. with some Nauty/... story.
Thus, let me ask again: is there someone ready to take the time to implement that now?
We need to submit, and I need to go to bed not too late :-)

Contributor

nthiery commented Feb 28, 2017

For the record: we resolved the Section 6 status by chatting on gitter.

Contributor

dimpase commented Feb 28, 2017

I propose to include in Sage a module written in this language and licenced under https://choosealicense.com/licenses/wtfpl/

Contributor

nthiery commented Feb 28, 2017

Submitted!
Thanks @dimpase for your work on this deliverable and report, both being borderline w.r.t. our usual comfort zone :-)
Thanks @alex-konovalov for the reviewing help!
