Associate Editor: Julien Chiquet
Reviewer 2 : Mathurin Massias (chose to lift his anonymity)
Reviewer 1: Reviewing history
Paper submitted March 30, 2022
Reviewer invited August 29, 2022
Review 1 received October 8, 2022
Paper revised December 15, 2022
Reviewer invited December 16, 2022
Review 2 received December 16, 2022
Paper conditionally accepted January 02, 2023
Paper published January 11, 2023
First Round
Recommendation
Revise and resubmit
Summary
The paper proposes coppy, a package to work with copulas in Python. Copulas are a powerful statistical tool to model dependency between random variables. They have been extensively used in both theoretical and practical work. Current implementations are in R, and having the same functionality in Python is a great asset for the community.
The code is well written and comes with a variety of known copulae (Archimedean, elliptical, extreme-value copulae) to sample from. The paper demonstrates use cases of sampling from copulae in a clear fashion.
Code remarks
My first set of remarks concerns the package itself. Implementing the following changes would really improve the practical impact of the released code, which is of wide interest to the statistical community. These are a few low-hanging fruits that require minor effort to implement:
Transform the code into a proper Python package: add a setup.py file containing useful information that makes the code installable with pip install . run at the root of the repo. (Currently, how is the code used? With a sys.path.append?) Using setuptools.setup() in setup.py will also make dependencies explicit (through the install_requires parameter of the setup function) and install them automatically.
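A minimal setup.py along these lines might look as follows; the metadata and dependency list below are illustrative assumptions, not the author's actual configuration:

```python
# setup.py -- illustrative sketch only; package name, version, and
# dependencies are placeholders, not the author's actual metadata.
from setuptools import setup, find_packages

setup(
    name="coppy",
    version="0.1.0",
    description="Sampling from copulas in Python",
    packages=find_packages(),
    # Dependencies become explicit and are installed automatically
    # when the user runs `pip install .` at the root of the repo.
    install_requires=[
        "numpy",
        "scipy",
    ],
)
```

With this file in place, `pip install .` installs the package and its dependencies in one step.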
Release the package on PyPI
author's answer: This is now done! The package clayton is now released on PyPI and can easily be installed on any device with pip install clayton. Here is the link to the project.
Add unit tests, and report test coverage. Run tests automatically at each PR using Github Actions.
author's answer: Thank you for suggesting the inclusion of unit tests. I have now created unit tests for the clayton package; they are located in the clayton folder. More details are available at this link. I have designed classical tests such as instantiating the object, trying some invalid values to raise error messages, and sampling from the copula. These tests have been performed for all the copulas included in the package. Additionally, I have set up GitHub Actions to run the tests automatically for each pull request. This ensures that the package maintains a high level of quality and reliability.
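Tests of this kind (instantiation, invalid parameters, sampling) could be sketched as below; the Clayton class here is a hypothetical stand-in for the package's API, not its actual code, and the Marshall-Olkin scheme is one standard way to sample a Clayton copula:

```python
# test_copula.py -- illustrative pytest-style sketch; the Clayton class and
# its theta / n_sample parameters are hypothetical, not the package's API.
import numpy as np


class Clayton:
    """Minimal stand-in for a Clayton copula sampler."""

    def __init__(self, theta):
        if theta <= 0:
            raise ValueError("theta must be positive")
        self.theta = theta

    def sample(self, n_sample):
        # Marshall-Olkin sampling: V ~ Gamma(1/theta, 1), then
        # U_j = (1 + E_j / V)^(-1/theta) with E_j standard exponential.
        rng = np.random.default_rng(0)
        v = rng.gamma(1.0 / self.theta, 1.0, size=(n_sample, 1))
        e = rng.exponential(size=(n_sample, 2))
        return (1.0 + e / v) ** (-1.0 / self.theta)


def test_invalid_theta_raises():
    # Invalid parameter values should raise an informative error.
    try:
        Clayton(theta=-1.0)
    except ValueError:
        pass
    else:
        raise AssertionError("negative theta should raise ValueError")


def test_sample_shape_and_range():
    # Samples must lie in the unit square with the requested shape.
    u = Clayton(theta=2.0).sample(n_sample=500)
    assert u.shape == (500, 2)
    assert np.all((u > 0) & (u < 1))
```

Running `pytest` discovers both `test_*` functions, and the same command can be wired into a GitHub Actions workflow so the suite runs on every pull request.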
flake8 . | wc -l currently returns 945 PEP8 violations. Most of them can be fixed automatically, for example with autoformat-on-save in VS Code.
author's answer: Thank you for pointing out the high number of PEP8 violations in the code. I apologize for not being aware of these basic rules. Most of the violations have been corrected using the autoformat feature in VS Code and pylint. However, some violations still remain because I need to include $\LaTeX$ in the documentation for proper rendering with Sphinx. I will continue to work on reducing the number of violations and improving the overall quality of the code. Thank you for bringing this to my attention.
Add PEP8 compliance testing with GitHub Actions.
author's answer: Thank you for suggesting the inclusion of PEP8 compliance testing with GitHub Actions. I have now set up the necessary workflows to run these tests automatically. More details are available at this link. This will help to ensure that the code follows the PEP8 style guide and maintains a high level of quality. Thank you for bringing this to my attention.
Add and publish documentation. The code, in particular the classes, is well documented; it would be nice to have this displayed online, e.g. with Sphinx.
author's answer: Thank you for suggesting the use of Sphinx to publish the documentation online. I have now followed your advice and published the documentation on my GitHub page. I appreciate your suggestion and I am glad that it was not time-consuming to set up. Thank you for helping me improve the documentation for my package.
Consider making the package name all lowercase (as in pytorch, numpy, scikit-learn, matplotlib, …)
Consider using CamelCase for class names (MonteCarlo instead of Monte_Carlo)
YMMV, but coppy is extremely close to copy, a fundamental method in Python. Googling "Python coppy" is obfuscated by "Python copy".
author's answer: Thank you for pointing out the potential confusion between the coppy package and the built-in Python method "copy". I agree that the similarity in the names could be problematic and I have now renamed the package to "clayton" to avoid this issue. The new name also hints at the famous Clayton copula, which is included in the package. Thank you for bringing this to my attention and helping me improve the package.
Remarks about the submission itself
There are a few typos in the paper, that could use a careful proofreading. I have highlighted a few here, but not all:
[...]
Code execution remarks
sns.displot(data = df_wmado, x = "scaled", color = seagreen(0.5)) > I am missing the seagreen import
author's answer: It has to be imported from seaborn, which is burdensome enough that I decided to replace it throughout the code with the classical colors #C5C5C5 and #6F6F6F.
The cell df_wmado = monte.finite_sample(inv_cdf = [norm.ppf, expon.ppf], corr = True) takes a lot of time to run on my machine (at least >5 minutes).
author's answer: The sample size and the number of iterations were both reduced. The cell now takes about 20 seconds to execute.
Could you reduce the sample size, or warn in the cell that the code takes a long time to execute?
Second Round
Recommendation
Accept with minor revisions
Comments to author
The revision has addressed all my concerns. Regarding the introduced changes, I have the following minor remarks that are all easy to take into account:
In Fig 4, can you consider adding color to points? The current black and white rendering does not convey the information in the easiest way possible.
The sentence " Consider the following equation:" is followed by the requirements of the package.
The noisy aspect of the curves in Fig 4, Archimedean panel, suggests that more vectors should be sampled to cancel out this variability. (It can be done easily since the current running time is around 0.2 s per value of $d$; you can also take values of $d$ further apart.)
The clayton package, whose Python code can be found in this GitHub repository > consider writing the URL explicitly
The package quality has been greatly improved. Nevertheless, some files should be removed from the repo and added to the .gitignore, i.e. the __pycache__ and .ipynb_checkpoints folders and the egg-info folder.
In the paper, replace: np.sqrt( 1 / (2*np.pi * sigma**2 ) ) * np.exp(- ( x - x0) ** 2 / (2 * sigma **2) )
by np.sqrt(1. / (2*np.pi * sigma**2)) * np.exp(-(x - x0) ** 2 / (2 * sigma**2))
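The expression above is the Gaussian density with mean x0 and standard deviation sigma; as a quick sanity check (a sketch assuming numpy and scipy are available), the reformatted version agrees with scipy.stats.norm.pdf:

```python
import numpy as np
from scipy.stats import norm

def gaussian_pdf(x, x0, sigma):
    # The paper's expression: normal density with mean x0 and std sigma.
    return np.sqrt(1. / (2 * np.pi * sigma**2)) * np.exp(-(x - x0) ** 2 / (2 * sigma**2))

# Cross-check against scipy's reference implementation.
x = np.linspace(-3.0, 3.0, 7)
assert np.allclose(gaussian_pdf(x, x0=0.5, sigma=1.2), norm.pdf(x, loc=0.5, scale=1.2))
```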
In the package, the author uses n_sample; the extremely popular scikit-learn uses n_samples, so following their convention should be considered.
All remarks taken into account in final submission