Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GSoC 8-9 #10

Merged
merged 12 commits into from
Aug 7, 2020
Merged

GSoC 8-9 #10

merged 12 commits into from
Aug 7, 2020

Conversation

Yashwants19
Copy link
Owner

Hi @coatless @eddelbuettel @rcurtin, As per the timeline this PR is regarding the automatically generation of markdown documentation.
I have almost completed the work. In the following weeks I will open a PR in mlpack/master for a fine review before publishing the package to CRAN.

Thank You.

@coatless
Copy link

@Yashwants19 looks great! I'll read through it in more detail tomorrow morning.

@Yashwants19
Copy link
Owner Author

@Yashwants19 looks great! I'll read through it in more detail tomorrow morning.

Thank you. This sounds good. :)

@Yashwants19 Yashwants19 changed the title Initial commit. GSoC 8-9 Jul 27, 2020
Copy link

@coatless coatless left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few quick comments on the documentation

README.md Outdated

gonum

If you would like to build the R bindings, make sure that R >= 3.5 is

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to bump the minimum requirement up to R 4.0.0 or R 3.6.0. There were a lot of internal changes from R 3.5 to R 3.6.

One change that might bite us on an earlier version is the data.matrix() conversion for factor() inside of data.frame(). Unfortunately, R 4.0.0 is when factors were handled by this method in the desired format.

data.matrix() now converts character columns to factors and from
this to integers.

https://stat.ethz.ch/pipermail/r-announce/2020/000653.html

If the goal is to support R 3.6.0, we'll probably need to backport the function change.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, I have updated the version to 4.0.0, may be further we can discussion it on main PR/issue(in mlpack/master).

Copy link

@eddelbuettel eddelbuettel Aug 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aiming for R 4.0.0 is entirely reasonable. It is the current release. Supporting older release is a nice-to-have but really not that important. If you can do it "for free" or near-free, sure. Otherwise ... move on and stick with R 4.0.0,

doc/guide/r_quickstart.hpp Outdated Show resolved Hide resolved
doc/guide/r_quickstart.hpp Outdated Show resolved Hide resolved
doc/guide/r_quickstart.hpp Show resolved Hide resolved
Building the R bindings from scratch is a little more in-depth, though. For
information on that, follow the instructions on the @ref build page, and be sure
to specify @c -DBUILD_R_BINDINGS=ON to CMake; you may need to also set the
location of the R program with @c -DR_EXECUTABLE=/path/to/R.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct me if I'm wrong, but we are already packaging the mlpack source inside of the package in the mlpack-r repository? I think there might be two "from source" categories here.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So actually I'm not sure it's needed to have an mlpack-r repository. Based on my understanding, an upload to CRAN happens not by, e.g., giving the location of a Github repository (like in Julia), but instead by directly uploading a .tar.gz. This means that instead of an mlpack-r repository, we could do one of the following:

(1) at every mlpack release, manually build with make r, package what's in build/src/mlpack/bindings/r/ and upload to CRAN
(2) whenever a new release is tagged, fire off a Travis or Azure job that does (1) automatically
(3) same as (2) except use Jenkins
(4) same as (2) except use Github Actions

Does that work? Or will there be people who... don't want to use CRAN, but install directly from a Github URL or something?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we may require to do some discussion here. As per my knowledge its good do a R CMD check on the package in the "proper" environment, before releasing/publishing the package. If we do R CMD check using ctest while building with mlpack, it may be possible that for building the RcppMLPACK R may use installed boost/cereal serialization files required to build mlpack or it may use some mlpack files which might not correctly travelled to RcppMLPACK while cmaking. Hence I thought it may be helpful if we go for mlpack-r with a R CMD check in proper environment, Here Github Actions/Azure might help us with the package release after a successful build onmlpack-r.
Or it may be possible that we may do these checks locally after every release in the proper environment and then go for a CRAN upload.
Just an idea, I am really sorry if I am missing something or making some blunder here.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 thanks for the explanation @Yashwants19. In that case, if we were using the mlpack-r approach you described, would the mlpack-r repository just contain a clone of the mlpack repository at the appropriate tag?

I do agree that having some kind of system set up to run R CMD check before submitting to CRAN would be great, although it seems like CRAN will automatically run R CMD check too---so maybe we can just depend on CRAN failing the uploaded package as our "CI check" for a new release?

Copy link

@eddelbuettel eddelbuettel Aug 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so maybe we can just depend on CRAN failing

Please don't. CRAN really is just a few overworked humans. It is not a test bed. We have e.g. Rhub for that.

Consider CRAN uploads as a manual, explicit step. Think Journal submission (which is exaggerating somewhat to make the point), not "random copy of a file to Dropbox" or another web service.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eek, I didn't realize that. Yes, in this case, let's not overwork the humans even more. :)

}
else
{
return "matrix()";

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return "matrix()";
return "matrix(c(...))";

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done without ... 😄

src/mlpack/bindings/R/print_type_doc_impl.hpp Outdated Show resolved Hide resolved
src/mlpack/bindings/R/print_type_doc_impl.hpp Outdated Show resolved Hide resolved
src/mlpack/bindings/R/print_type_doc_impl.hpp Outdated Show resolved Hide resolved
inline std::string ProgramCall(const std::string& programName)
{
std::ostringstream oss;
oss << "R> ";

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe set this as a command line prefix default variable?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function use to build markdown documentation for our website(For e.g. : https://www.mlpack.org/doc/mlpack-3.3.2/julia_documentation.html#adaboost ). This will be not used in our R package so I thought it may be a nice way to represent it , but I can change it as a command line prefix.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The prompt for an R workspace is just >, so personally I think that could be okay to use here.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(It's configurable; and some of us use R> do make it more visually distinct. But either form works.)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Yashwants19 sorry for the late reply on this one. But, when I said:

Maybe set this as a command line prefix default variable?

My thought was:

std::string command_prefix = "R> ";

As the prefix is listed in two places within the file.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coatless I am really sorry, I misunderstood this part, I will update as suggested.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated this in the last commit.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! Sorry for the initial feedback not being concise.

Copy link

@rcurtin rcurtin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work @Yashwants19; great to see the final pieces coming together. I left some comments---hopefully they're helpful. 👍

CMakeLists.txt Outdated
# Set minimum library version required by mlpack.
set(ARMADILLO_VERSION "8.400.0")
set(ENSMALLEN_VERSION "2.10.0")
set(BOOST_VERSION "1.58")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this is a good place to put all the dependencies! Do you want to move this to the top of the file so it's easier to find?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have moved this as suggested.

@@ -32,7 +32,7 @@ bindings to other languages. It is meant to be a machine learning analog to
LAPACK, and aims to implement a wide array of machine learning methods and
functions as a "swiss army knife" for machine learning researchers. In addition
to its powerful C++ interface, mlpack also provides command-line programs,
Python bindings, and Julia bindings.
Python bindings, Julia bindings, Go bindings and R bindings.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This list is getting longer and longer... 🎉

README.md Outdated Show resolved Hide resolved
doc/guide/r_quickstart.hpp Outdated Show resolved Hide resolved
Building the R bindings from scratch is a little more in-depth, though. For
information on that, follow the instructions on the @ref build page, and be sure
to specify @c -DBUILD_R_BINDINGS=ON to CMake; you may need to also set the
location of the R program with @c -DR_EXECUTABLE=/path/to/R.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So actually I'm not sure it's needed to have an mlpack-r repository. Based on my understanding, an upload to CRAN happens not by, e.g., giving the location of a Github repository (like in Julia), but instead by directly uploading a .tar.gz. This means that instead of an mlpack-r repository, we could do one of the following:

(1) at every mlpack release, manually build with make r, package what's in build/src/mlpack/bindings/r/ and upload to CRAN
(2) whenever a new release is tagged, fire off a Travis or Azure job that does (1) automatically
(3) same as (2) except use Jenkins
(4) same as (2) except use Github Actions

Does that work? Or will there be people who... don't want to use CRAN, but install directly from a Github URL or something?

# Get the names of the movies for user 1.
cat("Recommendations for user 1:\n")
for (i in 1:10) {
cat(" ", i, ":", as.character(movies[output$output[i], 3]), "\n")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point @coatless! @Yashwants19 I took a look through the code in this repository and in the RcppMLPACK repository and I think that we aren't correctly handling the difference between R's 1-indexing and C++'s 0-indexing. I believe that we will need to modify, e.g., SetParamUCol() and SetParamURow() to subtract 1 from the values when converting to C++, and add 1 to the values when converting back to R. That is because when we represent labels in C++, we expect them to take values between 0 and (the number of classes - 1), whereas in R it seems like we would expect a user to provide data that has classes between 1 and the number of classes. Let me know if I overlooked something and the code already does that. 👍

std::is_same<T, arma::Col<size_t>>::value ||
std::is_same<T, arma::Mat<size_t>>::value)
{
return "matrix(as.integer())";
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea here is to provide the R code for the default for an empty matrix of integer type... I think that c(...) won't work since the user would have to fill out what ... is. Let me know if I misunderstood---I'm certainly not an expert. But I did find that matrix(as.integer()) worked and gave an empty matrix (I think) but matrix(as.integer(c(...))) failed:

> matrix(as.integer(c(...)))
Error in matrix(as.integer(c(...))) : '...' used in an incorrect context

"'factor'. These values will be converted to numeric indices before "
"being passed to mlpack, and then inside mlpack they will be properly "
"treated as categorical variables, so there is no need to do one-hot "
"encoding for this matrix type.";
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice explanation! Always great when we can advertise that one-hot encoding isn't necessary, since it's a pain (both implementationally and computationally 😄)

@@ -21,4 +21,4 @@ add_python_binding(adaboost)
add_julia_binding(adaboost)
add_go_binding(adaboost)
add_r_binding(adaboost)
add_markdown_docs(adaboost "cli;python;julia;go" "classification")
add_markdown_docs(adaboost "cli;python;julia;go;r" "classification")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to see this too. :)

@@ -24,4 +24,4 @@ add_cli_executable(range_search)
#add_julia_binding(range_search)
#add_go_binding(range_search)
#add_r_binding(range_search)
add_markdown_docs(range_search "cli;go" "geometry")
add_markdown_docs(range_search "cli" "geometry")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, nice catch. 👍

Yashwants19 and others added 5 commits August 2, 2020 09:13
Co-authored-by: James J Balamuta <coatless@users.noreply.github.com>
Co-authored-by: Ryan Curtin <ryan@ratml.org>
Co-authored-by: James J Balamuta <coatless@users.noreply.github.com>
Copy link

@rcurtin rcurtin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few final comments. I think from my end everything pretty much looks good if you can handle these comments; we still have time to figure out the actual deployment / mlpack-r repository, so we don't have to resolve that issue now of course. 👍

src/mlpack/bindings/R/mlpack/src/r_util.cpp Outdated Show resolved Hide resolved
src/mlpack/bindings/R/mlpack/src/r_util.cpp Outdated Show resolved Hide resolved
src/mlpack/bindings/R/mlpack/src/r_util.cpp Outdated Show resolved Hide resolved
src/mlpack/bindings/R/mlpack/src/r_util.cpp Outdated Show resolved Hide resolved
Copy link

@rcurtin rcurtin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what's best for the matrix(as.integer(c())) discussion, but that's the only comment I have remaining. I don't have any opposition if you want to merge as-is now and then we can handle that part later. 👍

@eddelbuettel
Copy link

I just noticed that. I only fairly recently grok'ed this myself but a nicer way instead if as.XXX(c()), is to do this:

R> matrix(double(),0,0)
<0 x 0 matrix>
R> matrix(integer(),0,0)
<0 x 0 matrix>
R> 

It results in the desired type:

R> storage.mode(matrix(integer(),0,0))
[1] "integer"
R> storage.mode(matrix(double(),0,0))
[1] "double"
R> 

@Yashwants19
Copy link
Owner Author

Hi @eddelbuettel, I have updated the default value as suggested, Thank you for the suggestion you are the best. :)

@eddelbuettel
Copy link

You're too kind, and you're doing really good work. And @rcurtin has an awesome attention to detail and infinite patience too :)

@Yashwants19
Copy link
Owner Author

Okay, lets merge this. :)

@Yashwants19 Yashwants19 merged commit 568cee1 into R-bindings Aug 7, 2020
@Yashwants19 Yashwants19 deleted the R-markdown-documentation branch October 1, 2020 07:02
Yashwants19 pushed a commit that referenced this pull request Nov 15, 2020
updating my master branch
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants