Associate Editor: Julien Chiquet
Reviewer 2 : Mathurin Massias (chose to lift his anonymity)
Reviewer 1: Reviewing history
Paper submitted March 30, 2022
Reviewer invited August 29, 2022
Review 1 received October 8, 2022
Paper revised December 15, 2022
Reviewer invited December 16, 2022
Review 2 received December 16, 2022
Paper conditionally accepted January 02, 2023
Paper published January 11, 2023
First Round
Recommendation
Revise and resubmit
Summary
The paper proposes coppy, a package to work with copulas in Python. Copulas are a powerful statistical tool to model dependency between random variables. They have been extensively used in both theoretical and practical work. Current implementations are in R, and having the same functionality in Python is a great asset for the community.
The code is well written and comes with a variety of known copulae (Archimedean, elliptical, extreme-value copulae) to sample from. The paper demonstrates use cases of sampling from copulae in a clear fashion.
Code remarks
My first set of remarks concerns the package itself. Implementing the following changes would really improve the practical impact of the released code, which is of wide interest to the statistical community. These are a few low-hanging fruits that require minor effort to implement:
Transform the code into a proper Python package: add a setup.py file containing useful information that makes the code installable with pip install . run at the root of the repo. (Currently, how is the code used? With a sys.path.append?) Using setuptools.setup() in setup.py will also make dependencies explicit (through the install_requires parameter of the setup function) and install them automatically.
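A minimal setup.py along these lines might look as follows; the metadata and dependency list below are illustrative assumptions, not the author's actual configuration:

```python
# setup.py -- illustrative sketch only; package name, version, and
# dependencies are placeholders, not the author's actual metadata.
from setuptools import setup, find_packages

setup(
    name="coppy",
    version="0.1.0",
    description="Sampling from copulas in Python",
    packages=find_packages(),
    # Dependencies become explicit and are installed automatically
    # when the user runs `pip install .` at the root of the repo.
    install_requires=[
        "numpy",
        "scipy",
    ],
)
```

With this file in place, `pip install .` installs the package and its dependencies in one step.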
Release the package on PyPI
author's answer: This is now done! The package clayton is now released on PyPI and can easily be installed on any device with pip install clayton. Here is the link to the project.
Add unit tests, and report test coverage. Run tests automatically at each PR using Github Actions.
author's answer: Thank you for suggesting the inclusion of unit tests. I have now created unit tests for the clayton package; they are located in the clayton folder. More details are available at this link. I have designed classical tests such as instantiating the object, trying some invalid values to raise error messages, and sampling from the copula. These tests have been performed for all the copulas included in the package. Additionally, I have set up GitHub Actions to run the tests automatically for each pull request. This ensures that the package maintains a high level of quality and reliability.
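Tests of this kind (instantiation, invalid parameters, sampling) could be sketched as below; the Clayton class here is a hypothetical stand-in for the package's API, not its actual code, and the Marshall-Olkin scheme is one standard way to sample a Clayton copula:

```python
# test_copula.py -- illustrative pytest-style sketch; the Clayton class and
# its theta / n_sample parameters are hypothetical, not the package's API.
import numpy as np


class Clayton:
    """Minimal stand-in for a Clayton copula sampler."""

    def __init__(self, theta):
        if theta <= 0:
            raise ValueError("theta must be positive")
        self.theta = theta

    def sample(self, n_sample):
        # Marshall-Olkin sampling: V ~ Gamma(1/theta, 1), then
        # U_j = (1 + E_j / V)^(-1/theta) with E_j standard exponential.
        rng = np.random.default_rng(0)
        v = rng.gamma(1.0 / self.theta, 1.0, size=(n_sample, 1))
        e = rng.exponential(size=(n_sample, 2))
        return (1.0 + e / v) ** (-1.0 / self.theta)


def test_invalid_theta_raises():
    # Invalid parameter values should raise an informative error.
    try:
        Clayton(theta=-1.0)
    except ValueError:
        pass
    else:
        raise AssertionError("negative theta should raise ValueError")


def test_sample_shape_and_range():
    # Samples must lie in the unit square with the requested shape.
    u = Clayton(theta=2.0).sample(n_sample=500)
    assert u.shape == (500, 2)
    assert np.all((u > 0) & (u < 1))
```

Running `pytest` discovers both `test_*` functions, and the same command can be wired into a GitHub Actions workflow so the suite runs on every pull request.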
flake8 . | wc -l currently returns 945 PEP8 violations. Most of them can be fixed automatically, for example with autoformat-on-save in VS Code.
author's answer: Thank you for pointing out the high number of PEP8 violations in the code. I apologize for not being aware of these basic rules. Most of the violations have been corrected using the autoformat feature in VS Code and pylint. However, some violations still remain because I need to include $\LaTeX$ in the documentation for proper rendering with Sphinx. I will continue to work on reducing the number of violations and improving the overall quality of the code. Thank you for bringing this to my attention.
Add PEP8 compliance testing with GitHub Actions.
author's answer: Thank you for suggesting the inclusion of PEP8 compliance testing with GitHub Actions. I have now set up the necessary workflows to run these tests automatically. More details are available at this link. This will help to ensure that the code follows the PEP8 style guide and maintains a high level of quality. Thank you for bringing this to my attention.
Add and publish documentation. The code, in particular the classes, is well documented; it would be nice to have this displayed online, e.g. with Sphinx.
author's answer: Thank you for suggesting the use of Sphinx to publish the documentation online. I have now followed your advice and published the documentation on my GitHub page. I appreciate your suggestion and I am glad that it was not time-consuming to set up. Thank you for helping me improve the documentation for my package.
Consider making the package name all lowercase (as in pytorch, numpy, scikit-learn, matplotlib, …)
Consider using CamelCase for class names (MonteCarlo instead of Monte_Carlo)
YMMV, but coppy is extremely close to copy, a fundamental method in Python. Googling "Python coppy" is obfuscated by "Python copy".
author's answer: Thank you for pointing out the potential confusion between the coppy package and the built-in Python method "copy". I agree that the similarity in the names could be problematic and I have now renamed the package to "clayton" to avoid this issue. The new name also hints at the famous Clayton copula, which is included in the package. Thank you for bringing this to my attention and helping me improve the package.
Remarks about the submission itself
There are a few typos in the paper, that could use a careful proofreading. I have highlighted a few here, but not all:
[...]
Code execution remarks
sns.displot(data = df_wmado, x = "scaled", color = seagreen(0.5)) > I am missing the seagreen import
author's answer: It has to be imported from seaborn, which is burdensome enough that I decided to replace it throughout the code with the classical colors #C5C5C5 and #6F6F6F.
The cell df_wmado = monte.finite_sample(inv_cdf = [norm.ppf, expon.ppf], corr = True) takes a lot of time to run on my machine (at least >5 minutes).
author's answer: The sample size and the number of iterations were both reduced. The cell now takes about 20 seconds to execute.
Could you reduce the sample size, or warn in the cell that the code takes a long time to execute?
Second Round
Recommendation
Accept with minor revisions
Comments to author
The revision has addressed all my concerns. Regarding the introduced changes, I have the following minor remarks that are all easy to take into account:
In Fig 4, can you consider adding color to points? The current black and white rendering does not convey the information in the easiest way possible.
The sentence " Consider the following equation:" is followed by the requirements of the package.
The noisy aspect of the curves in Fig 4, Archimedean panel, suggests that more vectors should be sampled to cancel out this variability. (It can be done easily since the current running time is around 0.2 s per value of $d$; you can also take values of $d$ further apart.)
The clayton package, whose Python code can be found in this GitHub repository > consider writing the URL explicitly
The package quality has been greatly improved. Nevertheless, some files should be removed from the repo and added to the .gitignore, i.e. the __pycache__ and .ipynb_checkpoints folders and the egg-info folder.
In the paper, replace: np.sqrt( 1 / (2*np.pi * sigma**2 ) ) * np.exp(- ( x - x0) ** 2 / (2 * sigma **2) )
by np.sqrt(1. / (2*np.pi * sigma**2)) * np.exp(-(x - x0) ** 2 / (2 * sigma**2))
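The expression above is the Gaussian density with mean x0 and standard deviation sigma; as a quick sanity check (a sketch assuming numpy and scipy are available), the reformatted version agrees with scipy.stats.norm.pdf:

```python
import numpy as np
from scipy.stats import norm

def gaussian_pdf(x, x0, sigma):
    # The paper's expression: normal density with mean x0 and std sigma.
    return np.sqrt(1. / (2 * np.pi * sigma**2)) * np.exp(-(x - x0) ** 2 / (2 * sigma**2))

# Cross-check against scipy's reference implementation.
x = np.linspace(-3.0, 3.0, 7)
assert np.allclose(gaussian_pdf(x, x0=0.5, sigma=1.2), norm.pdf(x, loc=0.5, scale=1.2))
```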
In the package, the author uses n_sample; the extremely popular scikit-learn uses n_samples, so following their convention should be considered.
All remarks taken into account in final submission