
[R-package] increase the code coverage of R-package #269

Closed
StrikerRUS opened this issue Nov 26, 2018 · 16 comments · Fixed by #313

@StrikerRUS (Member)

After #267 has been merged, it can be easily observed which pieces of code are uncovered:

[screenshot: coverage report showing the uncovered lines]

@mlampros @jameslamb Guys, may I ask you for your help?

@jameslamb (Contributor)

Yes, definitely! I host an open source hack night at my company every two weeks. The next one is next Wednesday... mind if I unleash the attendees on this?

It seems like a good learning opportunity and a good task for people who know R but are learning how open source works. And it would be valuable for the project too!

@mlampros (Collaborator)

@StrikerRUS I'm ok with @jameslamb 's idea.

@StrikerRUS (Member, Author)

Wow, that's cool! Many thanks!

@jameslamb (Contributor)

Great, thanks!

@StrikerRUS (Member, Author)

@jameslamb Hi!

Can't wait to see your R-ninjas here! 😄

jameslamb mentioned this issue Jan 16, 2019
@jameslamb (Contributor)

@StrikerRUS hey, my team and I did spend a few hours working on this. Unfortunately, we were never able to get past the "skip if Python modules don't exist" checks (see my comment on #276).

I am definitely interested in coming back and trying again, but I can say it was pretty difficult to get the tests to run :/

@jameslamb (Contributor)

It's something I'm curious about, maybe you can tell me... did CRAN ever complain about the use of reticulate for the R package? That's not a pattern I've seen before (e.g. in LightGBM or XGBoost), and I think the skip_if_module_not_found checks mean that most of the R-package test suite is effectively being skipped on CRAN.

Teach me!

@StrikerRUS (Member, Author)

StrikerRUS commented Jan 16, 2019

@jameslamb Thanks a lot for your tries! I really appreciate it!

Unfortunately, I'm completely unfamiliar with the R language :-( . Everything R-related was developed by @mlampros and is maintained by him too (many thanks to him for that, BTW!). The only thing I've done was the environment setup for R on the CI services: Travis and Appveyor.

I remember @mlampros told me that every test is indeed skipped on CRAN, because it's impossible to install the required Python package (rgf_python) there for a successful run (#208 (comment)). On the CI side, I personally look at the codecov value: if it's <20%, then everything was skipped: #230 (comment).

@jameslamb (Contributor)

jameslamb commented Jan 16, 2019

Hey, yeah, I was thinking about that! We should make codecov fail builds when coverage drops, as a sign that the R tests got skipped.

Yeah, CRAN (very reasonably) does not let you install stuff on their machines. My friends @bburns632 and @jayqi and I, working on pkgnet, had to do some WILD stuff to get around this (pkgnet#108 if you're curious).

I think to feel confident in the R tests here we'd need to run them on Travis AND explicitly put in a check on codecov that says "if coverage falls below threshold, fail build". @mlampros if you agree I can create an issue for this and try to address it.


@StrikerRUS to your comment "I'm completely unfamiliar with R", I can explain the root of the issue here. Basically, most machine learning libraries that follow the pattern of "implement in something low-level, then write APIs in Python and R" have native code in R or Python calling the low-level code. So e.g. for the LightGBM R-package, we're using Rcpp and built-ins like .Call() to directly run C++ code.

RGF is different in that it does that in Python and then uses an R package called reticulate to call the Python code. So instead of R --> C++ we're going R --> Python --> C++.

This is bad in the sense that you need to have a full Python setup to run the R stuff, but it's good in the sense that development can focus almost exclusively on the Python side and making the R package work only means ensuring it plays nicely with Python.

@StrikerRUS (Member, Author)

@jameslamb Totally agree with you! I can say even more: here, in the Python --> C++ part, Python calls an exe file, passing paths to previously saved data files on disk as arguments, instead of communicating efficiently through the C API of a dll. But better something than nothing...
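The file-based handoff described above can be sketched roughly like this in Python. Everything here is illustrative: `run_external_trainer` is a made-up name, and a tiny Python one-liner stands in for the real RGF executable; the actual rgf_python internals differ.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_external_trainer(X_rows, work_dir):
    """Illustrate the pattern: serialize data to disk, then invoke an
    external executable with the file paths as command-line arguments."""
    data_file = Path(work_dir) / "train.data"
    model_file = Path(work_dir) / "model.out"
    data_file.write_text("\n".join(" ".join(map(str, r)) for r in X_rows))
    # Stand-in for the RGF exe: a one-liner that "trains" by counting
    # the rows in the data file and writing the count as the "model".
    cmd = [
        sys.executable, "-c",
        "import sys, pathlib; "
        "rows = pathlib.Path(sys.argv[1]).read_text().splitlines(); "
        "pathlib.Path(sys.argv[2]).write_text(str(len(rows)))",
        str(data_file), str(model_file),
    ]
    subprocess.run(cmd, check=True)
    return model_file.read_text()


with tempfile.TemporaryDirectory() as d:
    result = run_external_trainer([[1, 2], [3, 4], [5, 6]], d)
    print(result)  # the stand-in "model" is just the row count
```

The key point is that every call crosses a process boundary and round-trips through the filesystem, which is exactly the overhead a C API binding would avoid.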

@mlampros (Collaborator)

@jameslamb,

"if you agree I can create an issue for this and try to address it."

I'm ok with this, you can proceed.

@StrikerRUS (Member, Author)

@jameslamb Hi!

Just letting you know that the code coverage of the R-package is now checked on the CI side with a 50% threshold.
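For reference, a minimal codecov.yml enforcing such a threshold could look like the fragment below. This is illustrative only; the repository's actual configuration may differ.

```yaml
# codecov.yml (illustrative)
coverage:
  status:
    project:
      default:
        target: 50%   # fail the codecov status check if coverage drops below 50%
```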

@jameslamb (Contributor)

@StrikerRUS hey awesome!

@StrikerRUS (Member, Author)

After #311 and #312 have been merged, it seems that all code lines related to sparse features are covered:

[screenshot: coverage report with sparse-feature lines covered]

@mlampros May I kindly ask you to help with writing tests for the rest of the features?

@mlampros (Collaborator)

@StrikerRUS,

  • For the package.R file we cannot add tests, because it exists mainly to load the Python modules into R.

  • I don't know if it would be feasible to add tests for the RGF_cleanup_temp_files.R file, because it requires that temporary files are created first and then deleted afterwards (the cleanup() Python function). I mean, we have to keep track of the temporary directories or files that are present before the cleanup, to find out whether the cleanup() actually took place. Do you have a corresponding test in Python?

  • I could add tests for the Internal_class.R file. Can I do this by the end of the week? Thanks.
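The before/after bookkeeping described in the second bullet could be sketched like this in Python. Note that `fake_fit`, `fake_cleanup`, and `TEMP_DIR` are all hypothetical stand-ins for illustration, not rgf_python's actual API:

```python
import os
import tempfile
from pathlib import Path

# A dedicated temp directory, standing in for where fit() leaves files.
TEMP_DIR = Path(tempfile.mkdtemp(prefix="rgf_demo_"))


def fake_fit():
    # A real fit() would leave model/data files behind on disk.
    (TEMP_DIR / "model_1").write_text("weights")
    (TEMP_DIR / "train.data").write_text("data")


def fake_cleanup():
    # Return the number of files removed, mirroring a cleanup()
    # that reports how much it deleted.
    removed = 0
    for f in list(TEMP_DIR.iterdir()):
        f.unlink()
        removed += 1
    return removed


# The before/after tracking: snapshot the directory, clean up, compare.
fake_fit()
before = set(os.listdir(TEMP_DIR))
n_removed = fake_cleanup()
after = set(os.listdir(TEMP_DIR))

assert before and not after      # files existed, then were all deleted
assert n_removed == len(before)  # cleanup accounted for every file
assert fake_cleanup() == 0       # a second cleanup finds nothing
```

The same idea translates to R by listing the temp directory's contents before and after calling the cleanup routine.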

@StrikerRUS (Member, Author)

@mlampros Ah, I see. I think in Python we only test the cleanup method:

```python
def test_cleanup(self):
    est1 = self.estimator_class(**self.kwargs)
    est1.fit(self.X_train, self.y_train)
    est2 = self.estimator_class(**self.kwargs)
    est2.fit(self.X_train, self.y_train)
    self.assertNotEqual(est1.cleanup(), 0)
    self.assertEqual(est1.cleanup(), 0)
```

I guess tests for Internal_class.R will be enough.

Sure, any time you can! Thank you very much!
