-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GSoC 4-5 #7
GSoC 4-5 #7
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @Yashwants19, it's so much easier to review this a piece at a time instead of the whole thing at once like with the Go bindings! 😄
I hope that the comments I've left here are useful. Everything is looking good to me... somehow I get the idea that you have done this before... 😄
CMake/R/ConfigureRCPP.cmake
Outdated
|
||
string(REGEX REPLACE "PARAM_MODEL_OUT\\(" "" MODELS_OUT_STRIP1 "${MODELS_OUT}") | ||
string(REGEX REPLACE "," "" MODELS_OUT_STRIP2 "${MODELS_OUT_STRIP1}") | ||
string(REGEX REPLACE "[<>,]" "" MODELS_OUT_SAFE_STRIP2 "${MODELS_OUT_STRIP1}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure how feasible this is, but maybe it is worth considering refactoring this into a common CMake function which we can then call from both this file and AppendSerialization.cmake
(and other places where similar searching for model types is done)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merged both the functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you push the commit? Maybe I overlooked it but it seems like the code is still in both places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, It seems earlier I missed those files. But Now, I pushed the changes in the last commit(9522fd9).
src/mlpack/bindings/R/CMakeLists.txt
Outdated
add_custom_command(TARGET r_build POST_BUILD | ||
COMMAND rm ARGS -Rf "model.txt" | ||
WORKING_DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/mlpack" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way we can do this with, e.g., ${CMAKE_COMMAND} -E <remove>
or similar? The rm
won't necessarily be compatible with all systems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally missed this. I have updated this, as suggested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks good.
src/mlpack/bindings/R/CMakeLists.txt
Outdated
file(TOUCH | ||
"${CMAKE_CURRENT_BINARY_DIR}/mlpack/model.txt") | ||
|
||
# Generate configure file from configure.ac. | ||
execute_process(COMMAND autoreconf -i | ||
WORKING_DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/mlpack/") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I missed this last PR (or maybe I didn't and this is the second time I write this, so, sorry if so!). Can we expect that autoreconf
will be available based on checking for R and Rcpp like we did earlier? Will we know that it's available on the PATH? I worry about this somehow not being available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added FindAutotools.cmake
to find autoreconf before building R-bindings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, this looks good to me. The only question I still have (and I have no idea of the answer) is, does R ship with autoreconf
? Or is it expected to be available on the system? The only reason I write this comment is, if R does come with autoreconf
in its distribution, then we can try to use that one even if find_package(Autotools)
doesn't find one on the system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about this, that whether R does come with autoreconf
or not, May be @coatless or @eddelbuettel can enlighten us more about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#DOCF68:
R has embedded autoconf-2.69 from 2012 and automake-1.16.1 from 2018. Generally, these are invoked by R via R CMD {build,check}
. That said, I'm not sure they will be found on the PATH
for non-Linux systems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- You do not require autotools.
- autotools is never "required"
- It can optionally be used by an R package because as optional script
configure
can be present - When present, it will be used. It simple is not required to be present.
- The only requirement is the name
configure
and the '0755' mode to run it. - Some package has shell scripts named
configure
.
I don't have much more time now, but happy to continue on a quick call this eve, or another eve.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, that could be helpful. Let me know what time works for you. I have a few questions (that I have hopefully made coherent) about how the R package build process works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @eddelbuettel for the discussion this morning about the details of what's going on under the hood in the R side. I believe I have a far more correct understanding now. I can now ask a much clearer question than I was asking before.
What I am seeing is that the existing RcppMLPACK package---and other Rcpp packages I look at---ship with a configure
script already: https://github.com/Yashwants19/RcppMLPACK/blob/master/configure
That is, when a user downloads that package from CRAN, it then runs the pre-provided configure
script. R doesn't use autotools to generate the configure
from configure.ac
, or anything like that. However, that configure
script I linked to is far longer and more complex than the configure.ac
file. Thus, it makes sense from the mlpack side to only include configure.ac
, and then at the time of mlpack's build to produce the R bindings, we can run autotools (whether it is the one provided by R or not) in order to go from configure.ac
-> configure
, which we can then distribute as part of the CRAN package.
@Yashwants19 all of this to say, I think it was the right thing to add FindAutotools.cmake
since we aren't guaranteed to have that at the time we are building mlpack. Thank you for doing that. 👍 It would be great if we could also update the search location to try and find the embedded version that R ships too, based on this comment: #7 (comment)
However, I don't think that's necessary to do right now. Up to you. (Honestly, most systems that have CMake available will also have autotools available, so finding the embedded R version may not ever be necessary in practice.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[ If one uses autoconf
to generate configure
from configure.ac
(as is common) then yes, configure
becomes large--100kb or more. But it is also selfcontained, no other binary needed, And yes, the configure
will be pre-made by the author / packager and the user generally never ever builds it. And, just to re-iterate, with R we can have scripts called configure
that can be anything as long as they have this name, and are executable. And again, they are optional. We do not have to have one if we don;t need one. ]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed configure.ac
file.
@@ -12,7 +12,7 @@ SystemRequirements: A C++11 compiler. Versions 4.8.*, 4.9.* or later of GCC | |||
License: GPL (>= 2) | |||
Depends: R (>= 3.2.0) | |||
Imports: Rcpp (>= 0.12.12) | |||
LinkingTo: Rcpp, RcppArmadillo, BH | |||
LinkingTo: Rcpp, RcppArmadillo, BH, RcppEnsmallen |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super awesome that this is available; thanks @coatless!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot @coatless!
@@ -1,4 +1,4 @@ | |||
CXX_STD = CXX11 | |||
|
|||
PKG_CXXFLAGS = -I. @OPENMP_FLAG@ | |||
PKG_CXXFLAGS = -I. @OPENMP_FLAG@ -DNDEBUG |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to add -DNDEBUG
, should we also add -O3
and similar? I am not sure if R's build process will already do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Internally R's uses these build flags:
g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -I"/home/asus/R/x86_64-pc-linux-gnu-library/3.6/Rcpp/include" -I"/home/asus/R/x86_64-pc-linux-gnu-library/3.6/RcppArmadillo/include" -I"/home/asus/R/x86_64-pc-linux-gnu-library/3.6/BH/include" -I"/home/asus/R/x86_64-pc-linux-gnu-library/3.6/RcppEnsmallen/include" -I. -fopenmp -DNDEBUG -fpic -g -O2 -fdebug-prefix-map=/build/r-base-jbaK_j/r-base-3.6.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c RcppExports.cpp -o RcppExports.o
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(And I should add that some are "coming down on us" from the system -- I for the Debian package I inherit some from the default compiler settings.) And then we add a few layer by layer. Here of course we have the compund effect of
- the system
- R
- Rcpp
- RcppArmadillo
- RcppEnsmallen
so it grows :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Yashwants19 @rcurtin let's avoid explicitly adding flags, e.g. -DNDEBUG
and -O3
, since R should add these flags by default depending on the platform. I say this as we want to let R handle setting flags as much as possible due to the CRAN test suite easily being angered if any newly added flags lead to a non-portable R package.
For @OPENMP_FLAG@
, I think this is coming from the configure.ac
check? If so, this will correctly insert/suppress the SHLIB_OPENMP_CXXFLAGS
flag depending on whether OpenMP is present or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed these flags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 understood, thanks for the explanation!
src/mlpack/core/math/random.hpp
Outdated
@@ -41,7 +41,12 @@ inline void RandomSeed(const size_t seed) | |||
{ | |||
#if (!defined(BINDING_TYPE) || BINDING_TYPE != BINDING_TYPE_TEST) | |||
randGen.seed((uint32_t) seed); | |||
srand((unsigned int) seed); | |||
# if(BINDING_TYPE == BINDING_TYPE_R) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# if(BINDING_TYPE == BINDING_TYPE_R) | |
#if (BINDING_TYPE == BINDING_TYPE_R) |
src/mlpack/core/math/random.hpp
Outdated
srand((unsigned int) seed); | ||
# if(BINDING_TYPE == BINDING_TYPE_R) | ||
// To suppress Found ‘srand’, possibly from ‘srand’ (C). | ||
(void)((unsigned int) seed); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(void)((unsigned int) seed); | |
(void) seed; |
I think this will avoid the warning too. Correct me if I'm wrong. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works great.
@@ -332,6 +332,9 @@ PARAM_FLAG("verbose", "Display informational messages and the full list of " | |||
|
|||
#elif(BINDING_TYPE == BINDING_TYPE_R) // This is a R binding. | |||
|
|||
// This doesn't actually matter for this binding type. | |||
#define BINDING_MATRIX_TRANSPOSED true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want to think about this one again later. The binding it really matters for is nmf
. For NMF, given a matrix V, we want to decompose it into V ~= W * H
. So, we need to make sure that we can just take whatever W
and H
we get from the mlpack NMF binding in R and multiply them together. Anyway, there is no huge need to discuss now, but let's just try and remember that as we get closer to having everything working, we should hand-test the NMF binding from R to make sure that true
is the right value here. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, This sounds good.
* if (!identical(\<param_name\>, NA)) { | ||
* \<param_name\> = to_matrix_with_info(\<param_name\>) | ||
* CLI_SetParam\<type\>("\<param_name\>", \<param_name\>$info, | ||
* \<param_name\>$data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This maybe is a question for @coatless and @eddelbuettel too. I know that R does not specify types for function arguments (or, well, I think!), but we should probably have a discussion on what's expected to be passed, especially for the type the code here uses: the "categorical matrix".
In mlpack we support categorical features basically by pairing a matrix with a vector of bool
s that indicate which dimensions are categorical and which are numeric. We have some nice structures that make that a little bit easier to work with, but when we interface with other languages, what we have to pass between languages amounts to a bool
array and a matrix.
That part itself isn't hard (the code here already does it), but the more difficult part that involves a bit of "taste" is what the user should pass in to the R function. Should we expect them to pass in what this code expects? (I think that is a data frame with a column $info
and a column (columns?) $data
.) Does that feel "R native"? Or can we do better? I really have no idea, I'm just posing the question. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
R is dynamically typed so ... kinda hard to impose a type.
R has a specific type for categorical variables: factor
. Essentially an int
vector plus a string hash map. So {blue, green, blue, red}
could be decoded as {1,2,1,3}
with the associate vector {"blue", "green", "red"} for "labels" for (1:3). Sadly that factor
type does not travel too well to/from C++.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see---so, does it make sense to try and take a factor
as inputs to our bindings, then do a little bit of mapping? All we really need to pass to the mlpack side is that int
vector; we can leave the string hash map on the R side. I am not sure of the exact representation, but if we can wrangle whatever it is we get as input into an Armadillo matrix plus a boolean vector (which would take value false
for any "normal" numeric dimensions and true
for any factor
dimensions), then I think that is all we'd need to do.
<< "\")" << endl; | ||
MLPACK_COUT_STREAM << endl; | ||
|
||
// Handle each input argument's processing before calling mlpackMain(). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before we start handling each of the input parameters and passing them to mlpack's CLI
object, do we need to check that they are of the right type? What would happen if, e.g., I passed TRUE
to a matrix parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Matrix type parameter will be handled properly by to_matrix()
. All thanks to Rcpp
, It is handling "type checking" pretty nicely.
For Example:
> cf(algorithm = 1)
Error in CLI_SetParamString("algorithm", algorithm) :
Expecting a single string value: [type=double; extent=1].
But If you want I can implement a layer of type checking above this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, nice! This looks great to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fixes---everything is looking great. 👍
CMake/R/ConfigureRCPP.cmake
Outdated
|
||
string(REGEX REPLACE "PARAM_MODEL_OUT\\(" "" MODELS_OUT_STRIP1 "${MODELS_OUT}") | ||
string(REGEX REPLACE "," "" MODELS_OUT_STRIP2 "${MODELS_OUT_STRIP1}") | ||
string(REGEX REPLACE "[<>,]" "" MODELS_OUT_SAFE_STRIP2 "${MODELS_OUT_STRIP1}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you push the commit? Maybe I overlooked it but it seems like the code is still in both places.
src/mlpack/bindings/R/CMakeLists.txt
Outdated
file(TOUCH | ||
"${CMAKE_CURRENT_BINARY_DIR}/mlpack/model.txt") | ||
|
||
# Generate configure file from configure.ac. | ||
execute_process(COMMAND autoreconf -i | ||
WORKING_DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/mlpack/") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, this looks good to me. The only question I still have (and I have no idea of the answer) is, does R ship with autoreconf
? Or is it expected to be available on the system? The only reason I write this comment is, if R does come with autoreconf
in its distribution, then we can try to use that one even if find_package(Autotools)
doesn't find one on the system.
src/mlpack/bindings/R/CMakeLists.txt
Outdated
add_custom_command(TARGET r_build POST_BUILD | ||
COMMAND rm ARGS -Rf "model.txt" | ||
WORKING_DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/mlpack" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks good.
src/mlpack/bindings/R/mlpack/cleanup
Outdated
rm -f src/*.o src/*.so src/*.dylib src/*~ *~ | ||
|
||
## autoconf/configure leftovers | ||
rm -rf autom4te.cache/ config.log config.status src/Makevars confdefs.h |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where will this file be used? I have a concern that it isn't platform-independent, but I'm not sure what it's used for, so maybe my concern doesn't matter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A common (yet optional) part of R packages. If present, script cleanup
can be run at end. Maybe have gotten copied over into MLPACK proper by association :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aye, it was copied over when the shell of the R package was incorporated into the binding, c.f.
4702342
to
6e9ba2e
Compare
6e9ba2e
to
bdb6454
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Yashwants19 sorry, I forgot to approve this one---I think that everything here is in good shape.
From my perspective there is still one unresolved discussion (#7 (comment)) but maybe that makes sense to create a separate issue for? It certainly doesn't need to be handled for good in this PR, but it may be worth some discussion on what the best way is to map "categorical matrices" from R to mlpack.
Okay I am merging this now. Thank you for all the help. |
@Yashwants19 did you write up a blog post for Week 5? I don't think I saw anything come through on my end. |
I am really sorry, I didn't post the link here and post it in the issue #5 (Comment) only. BTW this is the link for Week 5 (GSoC 4-5 Part-2). |
Awesome, thanks @Yashwants19! 👍 |
* softmin activation function added * shift added * shift changed for inputmax to inputmin * Test added for softmin forward function * errors fixed * suggested changes * forward function fixed * added softmin to HISTORY.md * forward result calculation source changed * space added in comment * backward function added * backward test function added, values left * Added WeightSize() to linear.hpp atrous_convolution.hpp add.hpp. * tests made similar to softmax * conflict fix * fixed HISTORY.md conflict * tests made similar to softmax * Update activation_functions_test.cpp * Removed header iostream * Apply suggestions from code review Co-authored-by: Marcus Edel <marcus.edel@fu-berlin.de> Co-authored-by: Ryan Curtin <ryan@ratml.org> Include bias term in linear layer. * catch2 for mean_shift_test.cpp * fixed backward function * reverted changes to main/mean_shift_test.cpp * corrected the test cases * main/mean_shift_test.cpp from boost to catch2 * migrated mean_shift_test from boost to catch2 * tests changed * tests changed * Test for WeightSetVisitor and WeightSizeVisitor * Fix common failures by increasing threshold * Typo fix * Auto Cancel build on new push and enable cache for build. * Windows fix. * Install R-bindings dependencies separately. * rcmdcheck doesn't for building mlpack_r_tarball. * Install roxygen2. * Specify platform in windows build. * Stop github actions running on a forked repo. * softmin activation function added shift added shift changed for inputmax to inputmin Test added for softmin forward function errors fixed suggested changes forward function fixed added softmin to HISTORY.md forward result calculation source changed space added in comment backward function added backward test function added, values left tests made similar to softmax conflict fix fixed HISTORY.md conflict tests made similar to softmax Update activation_functions_test.cpp Removed header iostream fixed backward function reverted changes to main/mean_shift_test.cpp corrected the test cases tests changed tests changed * Fix static issues * All static issue fixed (hopefully) * Migrate det and distribution test to catch2 Co-authored-by: Utkarsh Rai <utkarshrai491@gmail.com> Co-authored-by: kartikdutt18 <kartikdutt@live.in> Co-authored-by: Yashwant <yashwantsingh.sngh@gmail.com> Co-authored-by: Ryan Curtin <ryan@ratml.org> Co-authored-by: kartikdutt18 <39593019+kartikdutt18@users.noreply.github.com> Co-authored-by: Ryan Birmingham <birm@gatech.edu> Co-authored-by: jeffin143 <jeffinsam@karunya.edu.in>
This reverts commit 403b11e.
Hi @coatless @eddelbuettel @rcurtin, As per this issue and the timeline this PR is regarding the automatically generation of .cpp and .r files.
I have almost completed the work, but still I have to add proper comments and may be complete some overlooked work. In the following weeks I will add the documentation in .R files.
Thank You.