Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

beta test on other packages? #14

Open
tdhock opened this issue Aug 17, 2022 · 23 comments
Open

beta test on other packages? #14

tdhock opened this issue Aug 17, 2022 · 23 comments

Comments

@tdhock
Copy link
Collaborator

tdhock commented Aug 17, 2022

Hi @FabrizioSandri there is a list of other packages in which RcppDeepState has found problems, https://akhikolla.github.io./packages-folders/
Not sure how many are on github, but at least this one is, https://github.com/WinVector/RcppDynProg/ Could you please fork/PR to see if we can replicate Akhila's results? (not sure if the issues have been fixed since then)
Also maybe worth investigating how many other of these packages have github repos.

@tdhock
Copy link
Collaborator Author

tdhock commented Aug 17, 2022

So here is a list of all of the packages in the intersection (Akhila found issues via RcppDeepState last year, and the R package hosted in a repo on github)

  [1] "https://github.com/ironholds/humaniformat"           
  [2] "https://github.com/jMotif/jmotif-R"                  
  [3] "https://github.com/Ironholds/olctools"               
  [4] "https://github.com/WinVector/RcppDynProg"            
  [5] "https://github.com/shabbychef/BWStest"               
  [6] "https://github.com/CollinErickson/CGGP"              
  [7] "https://github.com/ekstroem/MESS"                    
  [8] "https://github.com/paulhibbing/PAutilities"          
  [9] "https://github.com/ms609/Quartet"                    
 [10] "https://github.com/MikeJaredS/hermiter"              
 [11] "https://github.com/dahtah/imager"                    
 [12] "https://github.com/ntthung/ldsr"                     
 [13] "https://github.com/saraswatmks/superml"              
 [14] "https://github.com/statistikat/surveysd"             
 [15] "https://github.com/Ironholds/urltools"               
 [16] "https://github.com/wbnicholson/BigVAR"               
 [17] "https://github.com/duncanplee/CARBayes"              
 [18] "https://github.com/alexanderrobitzsch/CDM"           
 [19] "https://github.com/emilsjoerup/DriftBurstHypothesis" 
 [20] "https://github.com/mdsteiner/EFAtools"               
 [21] "https://github.com/wush978/FeatureHashing"           
 [22] "https://github.com/BMasinde/FlyingR"                 
 [23] "https://github.com/Wenchao-Ma/GDINA"                 
 [24] "https://github.com/wadpac/GGIR"                      
 [25] "https://github.com/vpicheny/GPGame"                  
 [26] "https://github.com/mbinois/GPareto"                  
 [27] "https://github.com/thijsjanzen/GUILDS"               
 [28] "https://github.com/raymondtsr/KSgeneral"             
 [29] "https://github.com/Lionning/MuChPoint"               
 [30] "https://github.com/stla/OwenQ"                       
 [31] "https://github.com/jansteinfeld/PP"                  
 [32] "https://github.com/anastasiospanagiotelis/ProbReco"  
 [33] "https://github.com/jkrijthe/RSSL"                    
 [34] "https://github.com/bleutner/RStoolbox"               
 [35] "https://github.com/BZPaper/RTransferEntropy"         
 [36] "https://github.com/PolMine/RcppCWB"                  
 [37] "https://github.com/RcppCore/RcppEigen"               
 [38] "https://github.com/yixuan/RcppNumerical"             
 [39] "https://github.com/TReynkens/ReIns"                  
 [40] "https://github.com/rudeboybert/SpatialEpi"           
 [41] "https://github.com/bzhanglab/WebGestaltR"            
 [42] "https://github.com/btbeal/adheRenceRX"               
 [43] "https://github.com/thomasp85/ambient"                
 [44] "https://github.com/jmsigner/amt"                     
 [45] "https://github.com/rnuske/apcf"                      
 [46] "https://github.com/jchiquet/aricode"                 
 [47] "https://github.com/rorynolan/autothresholdr"         
 [48] "https://github.com/zpneal/backbone"                  
 [49] "https://github.com/bmihaljevic/bnclassify"           
 [50] "https://github.com/digEmAll/bsearchtools"            
 [51] "https://github.com/kasperwelbers/corpustools"        
 [52] "https://github.com/lrnv/cort"                        
 [53] "https://github.com/mhahsler/dbscan"                  
 [54] "https://github.com/feng-li/dng"                      
 [55] "https://github.com/eheinzen/elo"                     
 [56] "https://github.com/mariarizzo/energy"                
 [57] "https://github.com/bdsegal/exceedProb"               
 [58] "https://github.com/somakd/fad"                       
 [59] "https://github.com/souravc83/fastAdaboost"           
 [60] "https://github.com/lrberge/fixest"                   
 [61] "https://github.com/munterfi/flexpolyline"            
 [62] "https://github.com/aberHRML/forestControl"           
 [63] "https://github.com/tbalan/frailtyEM"                 
 [64] "https://github.com/thomasp85/ggraph"                 
 [65] "https://github.com/achubaty/grainscape"              
 [66] "https://github.com/schochastics/graphlayouts"        
 [67] "https://github.com/davidbuch/gretel"                 
 [68] "https://github.com/agisga/grpSLOPE"                  
 [69] "https://github.com/ipeaGIT/gtfs2gps"                 
 [70] "https://github.com/jonathancornelissen/highfrequency"
 [71] "https://github.com/hughparsonage/hutilscpp"          
 [72] "https://github.com/rezakj/iCellR"                    
 [73] "https://github.com/traets/idefix"                    
 [74] "https://github.com/ShotaOchi/imagerExtra"            
 [75] "https://github.com/alexanderrobitzsch/immer"         
 [76] "https://github.com/immunomind/immunarch"             
 [77] "https://github.com/qinwf/jiebaR"                     
 [78] "https://github.com/dariomasante/landscapeR"          
 [79] "https://github.com/spedygiorgio/lifecontingencies"   
 [80] "https://github.com/alexanderrobitzsch/mnlfa"         
 [81] "https://github.com/pascalkieslich/mousetrap"         
 [82] "https://github.com/rorynolan/nandb"                  
 [83] "https://github.com/jknape/nmixgof"                   
 [84] "https://github.com/ycphs/openxlsx"                   
 [85] "https://github.com/EdwinTh/padr"                     
 [86] "https://github.com/thomasp85/particles"              
 [87] "https://github.com/ropensci/parzer"                  
 [88] "https://github.com/christinehohensinn/pcIRT"         
 [89] "https://github.com/fmichonneau/phylobase"            
 [90] "https://github.com/olssol/psrwe"                     
 [91] "https://github.com/PhilippPro/quantregRanger"        
 [92] "https://github.com/alarm-redist/redist"              
 [93] "https://github.com/erlisR/robustBLME"                
 [94] "https://github.com/r-spatial/s2"                     
 [95] "https://github.com/raim/segmenTier"                  
 [96] "https://github.com/mooresm/serrsBayes"               
 [97] "https://github.com/michbur/signalhsmm"               
 [98] "https://github.com/schochastics/signnet"             
 [99] "https://github.com/statistikat/simPop"               
[100] "https://github.com/kgoldfeld/simstudy"               
[101] "https://github.com/sdparsons/splithalf"              
[102] "https://github.com/schnorr/starvz"                   
[103] "https://github.com/RaphaelS1/survivalmodels"         
[104] "https://github.com/rstub/swephR"                     
[105] "https://github.com/business-science/tibbletime"      
[106] "https://github.com/const-ae/tidygenomics"            
[107] "https://github.com/thomasp85/tidygraph"              
[108] "https://github.com/nacnudus/tidyxl"                  
[109] "https://github.com/marjoleinbruijning/trackdem"      
[110] "https://github.com/irkaal/triangulr"                 
[111] "https://github.com/thomasp85/tweenr"                 
[112] "https://github.com/jlmelville/uwot"                  
[113] "https://github.com/hypertidy/vapour"                 
[114] "https://github.com/paleolimbot/wk"                   
[115] "https://github.com/paleolimbot/wkutils"              

Source code here, https://tdhock.github.io/blog/2022/packages-on-github/

Since there are so many, it would be better to programmatically fork/PR these repos. It may be possible with https://cloud.r-project.org/web/packages/gh/ can you please investigate?

@tdhock
Copy link
Collaborator Author

tdhock commented Aug 18, 2022

you can also do the git commit/push programmatically with https://docs.ropensci.org/git2r/reference/push.html

@FabrizioSandri
Copy link
Owner

Thank you for providing me with all of this great documentation. I'll get to work on this.

@FabrizioSandri
Copy link
Owner

Hi, @tdhock, here is the link to my blog article in response to yours. In this new blog post I discuss how to automatically beta test the packages listed above.

To avoid a massive generation of repositories under my user profile, as stated in the blog article, I specified a variable batch_size that limits the amount of packages automatically tested. In the blog post's Future work section, I outline a possible solution to this problem that involves creating a GitHub organization to host all of these repositories.

Using a batch_size equal to 2, the script automatically tested two packages. The test results are available here:

@tdhock
Copy link
Collaborator Author

tdhock commented Aug 24, 2022

hi @FabrizioSandri these are really excellent results. It is great to see that this works, and the results are almost the same as the previus results from Akhila. A couple of comments.

  1. for deleting the github.com prefix, you used the following code, repos <- nc::capture_all_str(pkg.repos$repo.url, "https://github.com/", repo_full_name=".*") which works but maybe a simpler solution is pkg.repos[, repo_full_name := sub("https://github.com/", "", repo.url) ]. Another solution using nc would be nc::capture_first_df(pkg.repos, repo.url=list("https://github.com/", repo_full_name=".*")).
  2. For the PR it is requesting RcppDeepState branch of your fork to be merged into master branch of your fork. Would the GH action work if you request RcppDeepState branch of your fork to be merged into master/main branch of upstream repo? That would be better from the beta test perspective, so the upstream devs could actually start seeing the RcppDeepState report in every one of their PRs.

@tdhock
Copy link
Collaborator Author

tdhock commented Aug 24, 2022

from our experience with tdhock/binsegRcpp#16 the upstream maintainer will have to click an approve button as in the image below, before they can see the RcppDeepState report,
image
Therefore it may be good to do both PRs (from your fork to your fork, and from your fork to upstream), with a link from the upstream PR to the results in your fork PR.

@tdhock
Copy link
Collaborator Author

tdhock commented Aug 24, 2022

it seems that the RcppDeepState action runs, but it can not post a comment in the PR, because of limited read only permissions, https://docs.github.com/en/actions/security-guides/automatic-token-authentication#permissions-for-the-github_token

@FabrizioSandri
Copy link
Owner

Searching in the GithHub documentation I discovered some interesting concepts about running workflows from public forks. First and foremost, I found:

To help prevent this, workflows on pull requests to public repositories from some outside contributors will not run automatically, and might need to be approved first. By default, all first-time contributors require approval to run workflows.

Source: Approving workflow runs from public forks

According to what I can see here, the user need approval from one maintainer with write access in order to run a workflow inside a remote repository. This has been demonstrated by our experience with tdhock/binsegRcpp#16.

However, this is not the only issue: the person who opens a pull request is not permitted to use the remote repository's GITHUB_TOKEN to automatically add a comment in the pull request. The token is only valid inside the same repository.

When you enable GitHub Actions, GitHub installs a GitHub App on your repository. The GITHUB_TOKEN secret is a GitHub App installation access token. You can use the installation access token to authenticate on behalf of the GitHub App installed on your repository. The token's permissions are limited to the repository that contains your workflow.

Source: Automatic token authentication

Workaround

Based on your suggestions a possible workaround is to:

  1. fork the remote repository;
  2. Create a self-pull request(from fork to fork) in the forked repository to examine the package using RcppDeepState-action;
  3. If RcppDeepState finds at least one error, make a pull request on the remote repository(from fork to upstream);
  4. Copy and paste the RcppDeepState report from the forked repository's pull request into the pull request of the remote repository.

@FabrizioSandri
Copy link
Owner

  1. for deleting the github.com prefix, you used the following code, repos <- nc::capture_all_str(pkg.repos$repo.url, "https://github.com/", repo_full_name=".*") which works but maybe a simpler solution is pkg.repos[, repo_full_name := sub("https://github.com/", "", repo.url) ]. Another solution using nc would be nc::capture_first_df(pkg.repos, repo.url=list("https://github.com/", repo_full_name=".*")).

I agree with this; I followed your suggestions and updated the script in the blog post.

@tdhock
Copy link
Collaborator Author

tdhock commented Aug 25, 2022

sounds like a great plan!

@FabrizioSandri
Copy link
Owner

FabrizioSandri commented Aug 30, 2022

Workaround

Based on your suggestions a possible workaround is to:
1. fork the remote repository;
2. Create a self-pull request(from fork to fork) in the forked repository to examine the package using RcppDeepState-action;
3. If RcppDeepState finds at least one error, make a pull request on the remote repository(from fork to upstream);
4. Copy and paste the RcppDeepState report from the forked repository's pull request into the pull request of the remote repository.

@tdhock I've just published a new blog article that includes a detailed description of the script that implements the last two steps of the aforementioned plan. I increased the batch_size option to 10 this time, resulting in the analysis of 7 repositories:

  • RcppDeepState discovered no issues in one repository, thus it was not evaluated;
  • Instead, there was a compilation error that prevented the harness from being compiled for the other two. I determined through log analysis that the report for the hermiter package was not created because the RcppParallel library was not installed and was not listed in the package's requirements.

@tdhock
Copy link
Collaborator Author

tdhock commented Aug 30, 2022

looks like good progress. I see that the comment in the upstream PR is like the following,

This package contains problems, according to RcppDeepState. The report was generated by RcppDeepState-action in this repository's fork and is accessible RcppDeepState/RcppDynProg#1.

I think you should add a sentence to clarify what the PR does, something like:

This PR adds a new Github Action which runs RcppDeepState+valgrind on your package. That means the C++ functions of your package will be tested with random inputs, and there will be a comment like this one for each new PR (which reports if valgrind found any issues with random inputs).

@tdhock
Copy link
Collaborator Author

tdhock commented Aug 30, 2022

about the RcppParallel not found, I think we need to add some compilation flag like -I/path/to/RcppParallel
Maybe it would be more robust to just run R CMD build on the target package and then use whatever -I flags are displayed in the g++ command lines?

@tdhock
Copy link
Collaborator Author

tdhock commented Aug 30, 2022

in WinVector/RcppDynProg#1 the table truncation feature seems to be working.

@FabrizioSandri
Copy link
Owner

looks like good progress. I see that the comment in the upstream PR is like the following,

This package contains problems, according to RcppDeepState. The report was generated by RcppDeepState-action in this repository's fork and is accessible RcppDeepState/RcppDynProg#1.

I think you should add a sentence to clarify what the PR does, something like:

This PR adds a new Github Action which runs RcppDeepState+valgrind on your package. That means the C++ functions of your package will be tested with random inputs, and there will be a comment like this one for each new PR (which reports if valgrind found any issues with random inputs).

You are right; the comment should be improved. The new message has been added to the script on my blog post. On the seven tested pull requests, I will manually change the current comments to include that message

@FabrizioSandri
Copy link
Owner

about the RcppParallel not found, I think we need to add some compilation flag like -I/path/to/RcppParallel Maybe it would be more robust to just run R CMD build on the target package and then use whatever -I flags are displayed in the g++ command lines?

It seems that there is a linking issue. Same problem here: #19

@FabrizioSandri
Copy link
Owner

about the RcppParallel not found, I think we need to add some compilation flag like -I/path/to/RcppParallel Maybe it would be more robust to just run R CMD build on the target package and then use whatever -I flags are displayed in the g++ command lines?

I found this line at the beginning of the logs of the RcppDeepState workflow run within the hermiter repository:

ERROR: dependency 'BH' is not available for package 'hermiter'

The issue appears to be caused by the install.packages script, which is unable to install the BH prerequisite for hermiter.

@tdhock
Copy link
Collaborator Author

tdhock commented Aug 30, 2022

as discussed today the solution involves using devtools::install() to install target repo/pkg along with dependencies, before compiling test harnesses.
The RcppParallel not found was coming from some strange code which would try to compile all src/*.cpp files if the package shared object/so file was not found (we agreed that code should be deleted, and we should stop with an error if there is no so file).

@FabrizioSandri
Copy link
Owner

@tdhock I discovered why the dependencies had not been installed for the packages. The motivation is that the package is initially built and installed using R CMD install, without its dependencies. After that, executing the following command resulted in the dependencies not being installed because the package was already installed: setdiff returns an empty set.

  install.packages(setdiff(basename(package), rownames(installed.packages())),
                           repos = "http://cran.us.r-project.org")

@FabrizioSandri
Copy link
Owner

FabrizioSandri commented Sep 3, 2022

Hi, @tdhock. RcppDeepState-action discovered issues in the TreeSearch package that Akhila did not mention on the page dedicated to the Rcpp-based packages where RcppDeepState discovered issues. Link to the workflow run.

I believe that one final change for the action before submission is to improve the workflow logs. Some users, for example, will run the action outside of the pull request, as seen in the aforementioned TreeSearch package workflow run. Currently, the action outputs the logtable to standard output, and it would be preferable to format this table in a more readable manner, as well as provide a message that helps the user to immediately understand if RcppDeepState discovered issues. As a result, I feel that one pull request dedicated to log enhancements is required.

Here is the link to the pull request #21

@FabrizioSandri
Copy link
Owner

I tested another set of repositories as a part of beta testing. This is the list of repos where RcppDeepState discovered errors(Link to the pull requests):

@tdhock
Copy link
Collaborator Author

tdhock commented Sep 6, 2022

great thanks for the update! As for the PR comments, it looks like there are too many links/here at the end of the comment, "is accessible here.here.here.here."

@FabrizioSandri
Copy link
Owner

Good catch! I just realized I was reusing the pull_body variable in the for loop that opens a remote pull request automatically. I'll manually correct them and update the script in the blog article.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants