Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

10 year reproducibility challenge: reproduction of J. Chem. Phys. 112, 4417 (2000) #11

Closed
mcbaneg opened this issue Nov 23, 2019 · 25 comments

Comments

@mcbaneg
Copy link

mcbaneg commented Nov 23, 2019

article.pdf

Original article: Gottfried and McBane, J. Chem. Phys. 112, 4417 (2000)

PDF URL: https://github.com/ReScience/submissions/files/3882888/article.pdf
Metadata URL: https://github.com/ReScience/submissions/files/3888185/metadata.zip
Code URL: https://github.com/mcbaneg/tenyears-submission (also: http://faculty.gvsu.edu/mcbaneg/virial6-resciencec.zip)

Scientific domain: Chemical Physics
Programming language: Fortran
Suggested editor:

@khinsen
Copy link

khinsen commented Nov 25, 2019

@mcbaneg Thanks for your submission! Before we start reviewing, could you please check the Metadata URL?

Note that I have edited your submission slightly. Our "PDF URL" is supposed to point to the reproduction article, not the original article.

Update: I'd better say that I tried to edit your submisson. GitHub lets me do it but what it shows is still the original text. Could you please update the URLs yourself?

@mcbaneg
Copy link
Author

mcbaneg commented Nov 25, 2019

Okay, I have modifed the PDF link to point to the .pdf of the draft that I attached to the posting (that is, uploaded to GitHub.)
I was unable to do the same with the metadata.yaml file because when I try to attach it I get a message "We don't support that file type, Try again with [list of file extensions it accepts, including .zip]" So, I made a .zip file out of metadata.yaml and uploaded that; I hope that works.

The original metadata.yaml is present at the site I originally listed, but either the web server doesn't automatically start a download when a link to a .yaml file is followed, or the browser doesn't recognize that as what it should do.

If there's a way to upload the metadata.yaml file to the GitHub submission, I'd be happy to do it to save the user the hassle of unzipping.

-G.

@khinsen
Copy link

khinsen commented Nov 26, 2019

Good question. I have never used the GitHub file upload feature for anything but images. Most ReScience authors create a GitHub repository for their submission, which contains all their code plus the article source code and the metadata. That makes it easy to put links to any file anywhere, and also allows reviewers and editors to propose modifications (which, as an editor, I do for the metadata in the end of the publication process). Another advantage is that GitHub repositories are automatically archived by Software Heritage, so they are safe in case of GitHub shutting down, being hackes, changing their rules, etc.

Anyway, the ZIP you uploaded for the metadata works well for me!

@khinsen
Copy link

khinsen commented Nov 26, 2019

@pdebuyl Would you willing to review this paper?

Note that it is a special case: the first submission to the Ten Years Challenge, written in the absence of any specific instructions from ReScience. With the consentment of the author, we plan to use this article and the review to prepare more detailed guidelines for other authors.

Also, since we expect to have many submissions to this special issue, we will try a revised reviewing procedure as well, with a single reviewer.

@pdebuyl
Copy link
Member

pdebuyl commented Nov 27, 2019

Hi @khinsen I will review. I'll take notes along the way :-)

@pdebuyl
Copy link
Member

pdebuyl commented Nov 27, 2019

@mcbaneg Are you familiar with git? At some point, I think that we will upload the old code and likely modifications to the old code that were needed to get it running. I can assist if necessary, by either uploading myself your contributions to github or by providing specific instructions to do so yourself.

@mcbaneg
Copy link
Author

mcbaneg commented Nov 27, 2019 via email

@mcbaneg
Copy link
Author

mcbaneg commented Nov 27, 2019 via email

@pdebuyl
Copy link
Member

pdebuyl commented Dec 3, 2019

Thank you @mcbaneg for the update.

I will follow mostly the regular review process that you can read at https://rescience.github.io/write/ and https://rescience.github.io/edit/ , adapting of course to the specifics of the challenge.

I provide a first check list here (the first four items come from ReScience's guidelines).

  • The actual replication of the research. The reviewer must evaluate the authors’ claims about a successful replication, applying the standards of the field.
  • Reproducibility of the replication. The reviewers must be able to run the proposed implementation on their computer, and obtain the same results as the authors with the limits of the state of of the art of the field.
  • Clarity of the code and the instructions for running it. Uncommented or obfuscated code is as bad as no code at all.
  • Clarity and completeness of the accompanying article, in which the authors should clearly state why they think they have replicated the paper (same figures, same graphics, same behavior, etc.) and explain any obstacles they have encountered during the replication work.
  • Past source code: availability and licensing
  • Modifications needed to reproduce the results.
  • Update the source code: availability and licensing.

I'll provide a first report this week so that we can move forward with the review.

Regarding BLAS, I installed it via the Debian package system and added -lblas to the gfortran link line :-)

@mcbaneg
Copy link
Author

mcbaneg commented Dec 3, 2019 via email

@pdebuyl
Copy link
Member

pdebuyl commented Dec 10, 2019

@mcbaneg I also believe that the requirements for the ten year challenge should be somehow
relaxed.

Please find below my first report. This is a first case as it is a success and my comments
revolve mostly around our publishing principles: licensing and environment information.

Replication: The scope of the results is well presented: the second virial coefficient
for a mixture of two gases. Even though it might seem redundant, I suggest to add the table
with the numerical values. In all ReScience papers, we include the final data in some form,
table or figure.

Clarity of the code and instructions for running it: The code is written in a procedural
style, which is expected for Fortran 77. The code is commented sufficiently to follow the
logic.

I could run the code rather easily. I ran gfortran -c file.f for all Fortran files except
for the common block file vecstorh2co.f, then gfortran -o virial6 *.o -lblas and ran the
code as indicated in the README file: virial6 < az009f.inp. The instructions should of the
README file are sufficient for Fortran programmers.

gfortran reports a warning when passing a character variable of the inappropriate
length. This only occurs for the label of the error and does not influence any of the
numerical routines.

Past source code: The source code is available. Licensing, as far as I could see from
web searches, seems ok for the external code: molscat has been released under the GPL since
the 90's and the IQPACK routines from netlib should be, unless otherwise stated, in the
public domain.

The license for the main program is not defined, this should be fixed.

I believe that you can also comment on the choice of technology, here Fortran. This is
consistent with the aims of the ten year challenge.

An extra request I have concerns the platform definition. Please state, for both the old
environment (if possible) and the new: CPU type, operating system, compiler (including the
version).

There was an example for Python on the previous website
https://github.com/ReScience/rescience.github.io/blob/sources/07-platform-instruction and
some papers provide the information
https://github.com/ReScience-Archives/Senden-Schuecker-Hahne-Diesmann-Goebel-2018/tree/master/code
https://github.com/ReScience-Archives/Detorakis-2017/tree/master/code

@khinsen
Copy link

khinsen commented Dec 12, 2019

Thanks @pdebuyl! Given your experience with this first review, could you please check if the author guidelines draft covers everything you consider important?

@mcbaneg
Copy link
Author

mcbaneg commented Dec 12, 2019

@pdebuyl: thank you for the review. I am working on addressing your comments.

I have one question: you said

I believe that you can also comment on the choice of technology, here Fortran. This is
consistent with the aims of the ten year challenge.

I had tried to do that at the end of Section 2, where I wrote

In molecular physics it was (and is) common practice for groups developing new PESs to provide Fortran subroutines implementing their surfaces as functions of the coordinates. Such a routine was available for the Jankowski and Szalewicz surface, deposited as supplementary information with their original publication. A revised version of this routine, more efficient for rapid multiple evaluations, was used for the project. The main Virial6 program was written in Fortran for compatibility with such PES routines. The IQPACK library was then selected as a convenient and well-established Fortran source of accurate numerical quadrature data.

Is that paragraph inadequately clear, or is some other information about the choice of Fortran missing?

Thank you!
G.

@pdebuyl
Copy link
Member

pdebuyl commented Dec 17, 2019

@mcbaneg the paragraph quoted informs the reader that Fortran is a common choice in your field of research. There are other criteria for picking a language:

  • Are there software resources in that language? You mention IQPACK as an afterthought but the existence of BLAS and of the spherical routines can also be important.
  • The ease of implementation, envisioned lasting character and the availability of compilers are other subjective motivations that you can comment on, for instance.

It does not have to be a long explanation and you can pick what you wish as a motivation. As we look at "old" software, I believe that it is important to add context about our technological choices.

Regards,

Pierre

@mcbaneg
Copy link
Author

mcbaneg commented Dec 20, 2019

@pdebuyl and @khinsen: I have modified the submission in response to the comments:

Even though it might seem redundant, I suggest to add the table
with the numerical values. In all ReScience papers, we include the final data in some form,
table or figure.

Done, now Table 1 in section 4.

Licensing, as far as I could see from
web searches, seems ok for the external code: molscat has been released under the GPL since
the 90's and the IQPACK routines from netlib should be, unless otherwise stated, in the
public domain.
The license for the main program is not defined, this should be fixed.

I have added a LICENSE.txt file. I did not want to assign new licenses to code produced by others so I have linked to the relevant licenses as best I could. The LICENSE.txt also includes a note explaining that the CO_H2 potential surface is the work of Jankowski and Szalewicz, and use of it in published work requires appropriate citation independent of the license that applies to the software used to implement it.

I believe that you can also comment on the choice of technology, here Fortran. This is
consistent with the aims of the ten year challenge.

I have added a modest amount to the material at the end of section 2 in response.

An extra request I have concerns the platform definition. Please state, for both the old
environment (if possible) and the new: CPU type, operating system, compiler (including the
version).

I have added this information to the README.md file.

@pdebuyl
Copy link
Member

pdebuyl commented Dec 20, 2019

@mcbaneg Thank you for the update, I am happy with the modifications.

Before finalizing the review, I only have two remarks:

  1. The comment about mandatory citation of the 1998 paper is, as far as I can tell, not compatible with the MIT license.
  2. You can still decide to archive your code either on zenodo or on software heritage.

@khinsen @rougier can you comment on the license issue here? Also, once the article is accepted, do you take the process over or do I need to do something?

@mcbaneg
Copy link
Author

mcbaneg commented Dec 20, 2019

@khinsen, @pdebuyl, @rougier ,

The comment about mandatory citation of the 1998 paper is, as far as I can tell, not compatible with the MIT license.

I'd like advice on how to do this correctly. I want to distinguish between the code I wrote and the original H2-CO potential surface of Jankowski and Szalewicz that they constructed and described in their paper. Ordinary scientific ethics requires that anyone who uses the J&S surface describing H2-CO and publishes results built with it must acknowledge J&S with a citation. This is true whether they use the routine that J&S originally supplied, my program derived from it, or any other evaluation mechanism including a new one written for the purpose.

My code provides a particular implementation of their surface. A permissive license applied to my code does not relieve the user of the obligation to acknowledge J&S in publications describing use of their work. I was trying to convey that idea with the statement

The files vectorh2co.f, vecsetuph2co.f, and vecevalh2co.f implement the H2-CO potential energy surface of P. Jankowski and K. Szalewicz, Journal of Chemical Physics 108, 3554 (1998). Any published work that uses their surface must cite that paper.

at the top of LICENSE.txt.

How do I do this right?

Thank you!

@khinsen
Copy link

khinsen commented Dec 21, 2019

My point of view on the licensing question is that licenses and citation requirement live in different universes. LIcenses are legally binding and apply to software as a tool. Citation requirements are moral (I haven't heard of any case going to court) and apply more to methods than to implementations. The idea of citing software itself is a recent invention and (as I see it!) just a workaround to make a broken system of credit a bit less negative for software developers.

I agree with @pdebuyl that a requirement to cite anything is incompatible with the MIT license, for two reasons: 1) the MIT license has no provision for such restrictions, 2) the MIT license is about using and copying software, whereas a citation request is about publishing results obtained with the software.

So I'd keep citation out of the license but add a separate CITATION file (see https://www.software.ac.uk/blog/2017-12-12-standard-format-citation-files) to establish the moral requirements for academic use.

@mcbaneg
Copy link
Author

mcbaneg commented Jan 26, 2020

@khinsen, @pdebuyl, @rougier ,

I have now modified my submission to try to address all the comments. In addition to the changes described in #11 (comment),

  1. I have removed comments about citing Jankowski and Szalewicz from the LICENSE file, and added them as a note in the README.md. My study of the CITATION.cff format indicated that it was specifically aimed at software/code citations, not to the ordinary publication/citation mechanism.
  2. I did add a CITATION.cff file applying to Virial6 itself.
  3. I added a PLATFORM information section to README.md.
  4. I have tried to extend the description of the choice of Fortran for the project.

I have not submitted the code explicitly to either Software Heritage or Zenodo. I did not want to duplicate the automatic acquisition of the GitHub project by Software Heritage, or any further processes done by ReScience C. Once again I will welcome recommendations about what action might be best.

Thank you!

@khinsen
Copy link

khinsen commented Jan 27, 2020

Submitting your GitHub repository to Software Heritage will only make it archived earlier, so it can do no harm. Software Heritage does automatic deduplication in the same way that git does: via content-addressable storage. In fact, we added the recommendation to submit to Software Heritage at the suggestion of the Software Heritage team!

@mcbaneg
Copy link
Author

mcbaneg commented Jan 28, 2020

Okay, I've now made a "save code now" request to Software Heritage.

-G.

@pdebuyl
Copy link
Member

pdebuyl commented Jan 28, 2020

@mcbaneg @khinsen @rougier

I have no further comments or questions on the manuscript and I recommend publication.

@khinsen @rougier What's next in the process?

@rougier
Copy link
Member

rougier commented Jan 28, 2020

@khinsen will need now to actually publish the paper which is not totally straightforward yet. @mcbaneg will need to update the metadata with the information provided by @khinsen and we may need to transform the template a bit to have software heritage identifier on code (instead of DOI).

@khinsen
Copy link

khinsen commented Jan 28, 2020

I'll try to do all of that tomorrow. Thanks @rougier for pointing out the SWH identifier, that's indeed a new feature but one that should also be useful for future regular articles.

@khinsen
Copy link

khinsen commented Jan 29, 2020

The article is published: https://zenodo.org/record/3630224

Congratulations to @mcbaneg and thanks to @pdebuyl for the review!

@mcbaneg: Please accept pull request mcbaneg/tenyears-submission#1 to your code/article repository to update the metadata and the final article PDF.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants