D7.1: The flow of code and patches in open source projects #148

Closed
minrk opened this issue Sep 8, 2015 · 35 comments

@minrk
Contributor

@minrk minrk commented Sep 8, 2015

OpenDreamKit builds on top of a large ecosystem of (mostly) academic open-source systems, many of which are large-scale themselves: for example, our chosen test system SageMath is the outcome of a decade of work by hundreds of contributors; many others are decades old. The social engineering aspects involved in such a large ecosystem are therefore both intricate and central to its long-run sustainability. This motivates OpenDreamKit's objective in WP7 of studying the collaborative processes of free open source (mathematical) software development so as to produce guidelines for best practice as well as to develop ideas for extending existing processes to an “ecosystem of systems”.

In this deliverable we survey the methodology, data, and tools needed to assess development models of large-scale academic open-source systems, for example to examine the probable correlation between the size of an atomic contribution and the speed at which that contribution makes it into the code. We also collect appropriate statistical data, to be published as a report (and possibly a conference publication). While the proposal assumed that the latter might require a non-trivial amount of programming work, even for our test system alone, great open-source tools addressing precisely these kinds of questions were released last year, and we used one of them instead.

Accomplishments:

  • a large number of publications and online sources was reviewed for applicability
  • various analytic tools were tried on a sample of SageMath components
  • results were summarised in a report, with conclusions and pointers to further possible developments
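
As an illustration of the size-versus-speed question mentioned above, here is a minimal sketch of a rank correlation between patch size and time to merge. The sample data and the function names `ranks` and `spearman` are hypothetical and are not taken from the report; Spearman's coefficient is just one reasonable choice of measure, and the report may well use a different one.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical sample: (lines changed, days from submission to merge).
patches = [(3, 0.5), (12, 1.0), (40, 2.0), (150, 9.0), (900, 30.0)]
sizes = [size for size, _ in patches]
days = [d for _, d in patches]
rho = spearman(sizes, days)
```

A value of `rho` near 1 would suggest that larger contributions tend to take longer to land; the sketch does not handle the degenerate case where all sizes or all delays are equal.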
@minrk minrk added this to the D7.1 milestone Sep 8, 2015
@nthiery nthiery modified the milestones: Month 18: 2017-02-28, D7.1 Mar 22, 2016
@alex-konovalov
Member

@alex-konovalov alex-konovalov commented Jun 29, 2016

This is interesting: The shape of open source, in which @arfon presents some GitHub analytics with nice visualisations. There is no tool to produce such visualisations for other repositories: it was a combination of D3 and MySQL queries against GitHub databases. One could try Cauldron (https://cauldron.io/), a platform for analysing GitHub repositories using the Grimoire Lab (http://grimoirelab.github.io/) analytical tools.

@npch

@npch npch commented Aug 11, 2016

It might be interesting to use the tidytext package to do analysis of some of this data extracted into tidy data form - see http://juliasilge.com/blog/Life-Changing-Magic/ http://varianceexplained.org/r/trump-tweets/

@bpilorget
Contributor

@bpilorget bpilorget commented Nov 21, 2016

@dimpase (WP leader and lead beneficiary)
This deliverable is due for February 2017

@nthiery
Contributor

@nthiery nthiery commented Feb 6, 2017

Dear M18 deliverable leaders,

Just a reminder that reports are due by mid-February, to buy us some time for proofreading, feedback, and final submission before February 28th. See our README for details on the process.

In practice, I'll be offline February 12-19, and the week right after will be pretty busy. Therefore, it would be helpful if a first draft could be available sometime this week, so that I can have a head start reviewing it.

Thanks in advance!

@bpilorget
Contributor

@bpilorget bpilorget commented Feb 22, 2017

@dimpase How is everything going? A report must be delivered by the 28th February

@dimpase
Contributor

@dimpase dimpase commented Feb 26, 2017

@bpilorget @nthiery I committed the 1st draft, will do a bit of fiddling, like proofreading etc, but that's basically done, as far as I am concerned.

@nthiery
Contributor

@nthiery nthiery commented Feb 26, 2017

Hi @dimpase,

Thanks for this first draft; that's an interesting survey of the literature.

I proofread the tex file, fixed some minor typos and pushed.

For the abstract (aka github issue description), please edit the first comment on this issue, rather than the markdown file in the repo. To automatically update the latter:

rm WP7/D7.1/report.pdf WP7/D7.1/github-issue-description.*
make WP7/D7.1/report.pdf

In terms of the content of the abstract, you may want to take e.g. #98 as an example, and check the notes in the README.

TODO:

  • In the abstract and/or the report itself: clarify what was achieved for this deliverable in addition to reviewing the literature: which data was or will be collected, which tool was or will be implemented, etc.
  • In the conclusion: "ought to be developped": does this just mean that this is required to derive significant conclusions, or that we actually are planning to do it as part of ODK?
  • @alex-konovalov: could you please have a look at the report and provide feedback?

Thanks in advance!

@dimpase
Contributor

@dimpase dimpase commented Feb 26, 2017

@nthiery Thanks. I followed your instructions: edited the 1st comment; removed github-issue-description.*, then make failed, as it complained about inability to find github-issue-description.md - and so I put the latter back. Then make did run, and created pdf - with an empty abstract.
Do you mean to say that github-issue-description.tex is meant to be generated by getting the 1st comment from github (via its API?) ?

EDIT: I missed that I needed more stuff like yaml pip-installed... I get

$ make WP7/D7.1/report.pdf
(issue=`python3 bin/get_issue WP7/D7.1/report.tex`; echo "# Deliverable description, as taken from Github issue #$issue on `date -I` {.notoc}\n"; python3 bin/get_issue_body $issue) > WP7/D7.1/github-issue-description.md
Traceback (most recent call last):
  File "bin/get_issue", line 5, in <module>
    import yaml
ImportError: No module named 'yaml'
Traceback (most recent call last):
  File "bin/get_issue_body", line 4, in <module>
    from github import Github
ImportError: No module named 'github'
make: *** [Makefile:21: WP7/D7.1/github-issue-description.md] Error 1

OK, so far I got to the stage where I miss pandoc, having installed pyyaml and PyGithub...
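
For anyone hitting the same sequence of failures, a small pre-flight check along the following lines might save a few round trips. This is only a sketch: the function `missing_dependencies` is hypothetical and not part of the repository; the import names `yaml` and `github` are the ones shown in the tracebacks above, and `pandoc` is the external tool mentioned.

```python
import importlib.util
import shutil


def missing_dependencies(modules, programs):
    """Return the names of Python modules and executables that cannot be found."""
    missing = []
    for name in modules:
        # find_spec returns None when a top-level module is not importable.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    for name in programs:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Import names from the tracebacks above, plus the external pandoc binary.
    print(missing_dependencies(["yaml", "github"], ["pandoc"]))
```

An empty list means everything was found; note that the import names (`yaml`, `github`) differ from the pip package names (pyyaml, PyGithub).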

@alex-konovalov
Member

@alex-konovalov alex-konovalov commented Feb 27, 2017

Thanks. From a quick glance what I can see is:

  • need to update issue description in the report
  • figure 2 matches my impression
  • Figure 3 also matches my "if it's done, it's done" motto: most PRs are closed quickly.
  • figure 3 is not easy to read initially; both figures would benefit from some text describing what's there.
  • can we have an equivalent of figure 3 for open issues?
  • for GAP, you may wish to cite this blog post http://www.codima.ac.uk/2016/03/09/gap-on-github-one-year-on/ somewhere
  • desirable to have more pictures, not only about GAP. Can we analyse some other repositories?
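
To make the "closed quickly" observation above concrete for other repositories, a per-month summary of time to close could be computed roughly as follows. This is a sketch under stated assumptions: the dates are invented, the function `median_days_to_close` is hypothetical, and it is not the tool used to produce the figures in the report.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def median_days_to_close(pull_requests):
    """Group closed PRs by opening month and return the median days to close."""
    by_month = defaultdict(list)
    for opened, closed in pull_requests:
        if closed is None:
            continue  # still open; would belong in a separate plot of open items
        by_month[opened.strftime("%Y-%m")].append((closed - opened).days)
    return {month: median(days) for month, days in sorted(by_month.items())}


# Hypothetical (opened, closed) dates; None marks a PR that is still open.
sample = [
    (date(2016, 1, 4), date(2016, 1, 5)),
    (date(2016, 1, 10), date(2016, 1, 20)),
    (date(2016, 1, 15), date(2016, 1, 17)),
    (date(2016, 2, 1), date(2016, 2, 2)),
    (date(2016, 2, 3), None),
]
summary = median_days_to_close(sample)
```

The median is used rather than the mean so that a handful of long-lived PRs does not dominate a month's figure.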
@alex-konovalov
Member

@alex-konovalov alex-konovalov commented Feb 27, 2017

Time permitting, I'd like a better horizontal resolution and individual months on Fig.3, and also more clearly indicated month markers on the horizontal axis.

@nthiery
Contributor

@nthiery nthiery commented Feb 27, 2017

@dimpase
Contributor

@dimpase dimpase commented Feb 27, 2017

@nthiery it did work, yes---although there is some strange {.notoc}\n string that makes it all the way into the resulting pdf. Not sure where it comes from; it is already in the markdown.

@nthiery
Contributor

@nthiery nthiery commented Feb 27, 2017

Just to avoid conflicts: I am now about to make some minor edits to the issue description and report, and will check the .notoc thingy while I am at it.

@nthiery
Contributor

@nthiery nthiery commented Feb 27, 2017

@dimpase
Contributor

@dimpase dimpase commented Feb 27, 2017

just for the record: I used

$ pandoc -v
pandoc 1.19.2.1
Compiled with pandoc-types 1.17.0.4, texmath 0.9.1, skylighting 0.3
@nthiery
Contributor

@nthiery nthiery commented Feb 27, 2017

Ok; thanks for the info. My pandoc is older :-) Anyway; no big deal, to be investigated later.

@dimpase
Contributor

@dimpase dimpase commented Feb 27, 2017

@nthiery
Contributor

@nthiery nthiery commented Feb 27, 2017

@OpenDreamKit/wp7: feedback on the report for this deliverable is welcome! You can access the current pdf by clicking "Final report" above. Well, it's not quite final, but will be soon :-)
@dimpase: you may want to update report-final.pdf from time to time.

@nthiery
Contributor

@nthiery nthiery commented Feb 27, 2017

Yes, I am done! (see also my private e-mail)

@dimpase
Contributor

@dimpase dimpase commented Feb 27, 2017

I've edited the D7.1 description to indicate software availability, as discussed, and pushed the corr. changes. Now to teaching, till the evening.

@alex-konovalov
Member

@alex-konovalov alex-konovalov commented Feb 27, 2017

I've submitted PR #219, merged by @dimpase. I also think that the section "Openness, licensing, etc" is not necessary. There is no unique point of view here: while @dimpase argues that GPL-style licenses are "better in sense of keeping the community together", many open science advocates will advise using as permissive a license as possible to facilitate maximal reuse, and this may point to other licenses. From my point of view, the topic of the report is "The flow of code and patches in open source projects" and it's not required to cover licenses here.

@dimpase
Contributor

@dimpase dimpase commented Feb 28, 2017

Well, the flow of code and patches depends on the license type, no doubt.

The only place where more permissive is better is commercialisation. Licenses are akin to locks.
If you do not lock your bicycle it will be used more, and indeed it's more convenient not to bother with locks; the problem is that very soon you might not see it again :-). As we are interested in building an open-source VRE, a long-term project, better locks might come in handy...

@dimpase
Contributor

@dimpase dimpase commented Feb 28, 2017

as far as I am concerned it is ready for submission. But feel free to modify...

@nthiery
Contributor

@nthiery nthiery commented Feb 28, 2017

Cool. @alex-konovalov or anyone from @OpenDreamKit/wp7: let me know if you still want to do some work on this deliverable.
By default, I am now planning to spend about one hour on D1.4, and submit this one after.
Cheers,

@alex-konovalov
Member

@alex-konovalov alex-konovalov commented Feb 28, 2017

@nthiery I have several typos fixed - will push soon. But the question about section 6 still remains open...

@nthiery
Contributor

@nthiery nthiery commented Feb 28, 2017

I agree with @dimpase that there is something to be said about licences; they certainly do influence the flow of code and patches. So often it's licensing issues that have prevented the flow of code from otherwise useful software. We could mention a couple of striking examples which hurt us badly, like gap3, Nauty, or graphviz, which we could not include as standard packages in Sage.
That being said, I agree with @alex-konovalov that I'd rather avoid getting into opinions and religious debates about the pros and cons of specific licences.
Anyone up for a quick rewrite of this section in the next hour or so?
Otherwise, I'll just strip out the section and submit.

@dimpase
Contributor

@dimpase dimpase commented Feb 28, 2017

I can tone this down, but I don't think removing it completely makes sense. IMHO in the UK in particular it has become so customary to ignore open source as something that does not make £££ (impact-wise, too: EPSRC until recently absolutely did not encourage any open-source projects whatsoever), and so the opinion in universities was always tilted toward easy commercialisation. But we are not aiming at selling ODK to the highest bidder, are we?

Further I really do not see a harm in mentioning our point of view in the report.

And indeed we can mention how much time it did cost me to force nauty to be released under a GPL-compatible license... Hell, count it towards the time spent on this deliverable. :-)

@alex-konovalov
Member

@alex-konovalov alex-konovalov commented Feb 28, 2017

@dimpase it's basically about the 2nd paragraph.

@nthiery
Contributor

@nthiery nthiery commented Feb 28, 2017

I agree about mentioning our point of view about open source licenses. But for GPL vs BSD this is not even something there is a consensus about in ODK. E.g. Jupyter uses revised BSD.

@nthiery
Contributor

@nthiery nthiery commented Feb 28, 2017

I can just remove the paragraph about GPL vs BSD. But then section 6 becomes rather tiny, unless it gets fleshed out a bit, e.g. with some Nauty/... story.
Thus, let me ask again: is there someone ready to take the time to implement that now?
We need to submit, and I need to go to bed not too late :-)

@alex-konovalov
Member

@alex-konovalov alex-konovalov commented Feb 28, 2017

@nthiery
Contributor

@nthiery nthiery commented Feb 28, 2017

For the record: we resolved the Section 6 status by chatting on gitter.

@dimpase
Contributor

@dimpase dimpase commented Feb 28, 2017

I propose to include in Sage a module written in this language and licenced under https://choosealicense.com/licenses/wtfpl/

@nthiery
Contributor

@nthiery nthiery commented Feb 28, 2017

Submitted!
Thanks @dimpase for your work on this deliverable and report, both being borderline w.r.t. our usual comfort zone :-)
Thanks @alex-konovalov for the reviewing help!
