Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GSoC 4-5 #7

Merged
merged 8 commits into from
Jul 13, 2020
Merged

GSoC 4-5 #7

merged 8 commits into from
Jul 13, 2020

Conversation

Yashwants19
Copy link
Owner

Hi @coatless @eddelbuettel @rcurtin, As per this issue and the timeline this PR is regarding the automatically generation of .cpp and .r files.
I have almost completed the work, but still I have to add proper comments and may be complete some overlooked work. In the following weeks I will add the documentation in .R files.

Thank You.

Copy link

@rcurtin rcurtin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @Yashwants19, it's so much easier to review this a piece at a time instead of the whole thing at once like with the Go bindings! 😄

I hope that the comments I've left here are useful. Everything is looking good to me... somehow I get the idea that you have done this before... 😄


string(REGEX REPLACE "PARAM_MODEL_OUT\\(" "" MODELS_OUT_STRIP1 "${MODELS_OUT}")
string(REGEX REPLACE "," "" MODELS_OUT_STRIP2 "${MODELS_OUT_STRIP1}")
string(REGEX REPLACE "[<>,]" "" MODELS_OUT_SAFE_STRIP2 "${MODELS_OUT_STRIP1}")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure how feasible this is, but maybe it is worth considering refactoring this into a common CMake function which we can then call from both this file and AppendSerialization.cmake (and other places where similar searching for model types is done)?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merged both the functions.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you push the commit? Maybe I overlooked it but it seems like the code is still in both places.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, It seems earlier I missed those files. But Now, I pushed the changes in the last commit(9522fd9).

add_custom_command(TARGET r_build POST_BUILD
COMMAND rm ARGS -Rf "model.txt"
WORKING_DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/mlpack"
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way we can do this with, e.g., ${CMAKE_COMMAND} -E <remove> or similar? The rm won't necessarily be compatible with all systems.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally missed this. I have updated this, as suggested.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good.

file(TOUCH
"${CMAKE_CURRENT_BINARY_DIR}/mlpack/model.txt")

# Generate configure file from configure.ac.
execute_process(COMMAND autoreconf -i
WORKING_DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/mlpack/")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I missed this last PR (or maybe I didn't and this is the second time I write this, so, sorry if so!). Can we expect that autoreconf will be available based on checking for R and Rcpp like we did earlier? Will we know that it's available on the PATH? I worry about this somehow not being available.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added FindAutotools.cmake to find autoreconf before building R-bindings.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this looks good to me. The only question I still have (and I have no idea of the answer) is, does R ship with autoreconf? Or is it expected to be available on the system? The only reason I write this comment is, if R does come with autoreconf in its distribution, then we can try to use that one even if find_package(Autotools) doesn't find one on the system.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about this, that whether R does come with autoreconf or not, May be @coatless or @eddelbuettel can enlighten us more about this.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#DOCF68:

R has embedded autoconf-2.69 from 2012 and automake-1.16.1 from 2018. Generally, these are invoked by R via R CMD {build,check}. That said, I'm not sure they will be found on the PATH for non-Linux systems.

Copy link

@eddelbuettel eddelbuettel Jul 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. You do not require autotools.
  2. autotools is never "required"
  3. It can optionally be used by an R package because as optional script configure can be present
  4. When present, it will be used. It simple is not required to be present.
  5. The only requirement is the name configure and the '0755' mode to run it.
  6. Some package has shell scripts named configure.

I don't have much more time now, but happy to continue on a quick call this eve, or another eve.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that could be helpful. Let me know what time works for you. I have a few questions (that I have hopefully made coherent) about how the R package build process works.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @eddelbuettel for the discussion this morning about the details of what's going on under the hood in the R side. I believe I have a far more correct understanding now. I can now ask a much clearer question than I was asking before.

What I am seeing is that the existing RcppMLPACK package---and other Rcpp packages I look at---ship with a configure script already: https://github.com/Yashwants19/RcppMLPACK/blob/master/configure

That is, when a user downloads that package from CRAN, it then runs the pre-provided configure script. R doesn't use autotools to generate the configure from configure.ac, or anything like that. However, that configure script I linked to is far longer and more complex than the configure.ac file. Thus, it makes sense from the mlpack side to only include configure.ac, and then at the time of mlpack's build to produce the R bindings, we can run autotools (whether it is the one provided by R or not) in order to go from configure.ac -> configure, which we can then distribute as part of the CRAN package.

@Yashwants19 all of this to say, I think it was the right thing to add FindAutotools.cmake since we aren't guaranteed to have that at the time we are building mlpack. Thank you for doing that. 👍 It would be great if we could also update the search location to try and find the embedded version that R ships too, based on this comment: #7 (comment)

However, I don't think that's necessary to do right now. Up to you. (Honestly, most systems that have CMake available will also have autotools available, so finding the embedded R version may not ever be necessary in practice.)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[ If one uses autoconf to generate configure from configure.ac (as is common) then yes, configure becomes large--100kb or more. But it is also selfcontained, no other binary needed, And yes, the configure will be pre-made by the author / packager and the user generally never ever builds it. And, just to re-iterate, with R we can have scripts called configure that can be anything as long as they have this name, and are executable. And again, they are optional. We do not have to have one if we don;t need one. ]

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed configure.ac file.

@@ -12,7 +12,7 @@ SystemRequirements: A C++11 compiler. Versions 4.8.*, 4.9.* or later of GCC
License: GPL (>= 2)
Depends: R (>= 3.2.0)
Imports: Rcpp (>= 0.12.12)
LinkingTo: Rcpp, RcppArmadillo, BH
LinkingTo: Rcpp, RcppArmadillo, BH, RcppEnsmallen
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super awesome that this is available; thanks @coatless!

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @coatless!

@@ -1,4 +1,4 @@
CXX_STD = CXX11

PKG_CXXFLAGS = -I. @OPENMP_FLAG@
PKG_CXXFLAGS = -I. @OPENMP_FLAG@ -DNDEBUG
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to add -DNDEBUG, should we also add -O3 and similar? I am not sure if R's build process will already do that.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Internally R's uses these build flags:

g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -I"/home/asus/R/x86_64-pc-linux-gnu-library/3.6/Rcpp/include" -I"/home/asus/R/x86_64-pc-linux-gnu-library/3.6/RcppArmadillo/include" -I"/home/asus/R/x86_64-pc-linux-gnu-library/3.6/BH/include" -I"/home/asus/R/x86_64-pc-linux-gnu-library/3.6/RcppEnsmallen/include" -I. -fopenmp -DNDEBUG -fpic -g -O2 -fdebug-prefix-map=/build/r-base-jbaK_j/r-base-3.6.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c RcppExports.cpp -o RcppExports.o

Copy link

@eddelbuettel eddelbuettel Jul 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(And I should add that some are "coming down on us" from the system -- I for the Debian package I inherit some from the default compiler settings.) And then we add a few layer by layer. Here of course we have the compund effect of

  • the system
  • R
  • Rcpp
  • RcppArmadillo
  • RcppEnsmallen

so it grows :)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Yashwants19 @rcurtin let's avoid explicitly adding flags, e.g. -DNDEBUG and -O3, since R should add these flags by default depending on the platform. I say this as we want to let R handle setting flags as much as possible due to the CRAN test suite easily being angered if any newly added flags lead to a non-portable R package.

For @OPENMP_FLAG@, I think this is coming from the configure.ac check? If so, this will correctly insert/suppress the SHLIB_OPENMP_CXXFLAGS flag depending on whether OpenMP is present or not.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed these flags.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 understood, thanks for the explanation!

@@ -41,7 +41,12 @@ inline void RandomSeed(const size_t seed)
{
#if (!defined(BINDING_TYPE) || BINDING_TYPE != BINDING_TYPE_TEST)
randGen.seed((uint32_t) seed);
srand((unsigned int) seed);
# if(BINDING_TYPE == BINDING_TYPE_R)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# if(BINDING_TYPE == BINDING_TYPE_R)
#if (BINDING_TYPE == BINDING_TYPE_R)

srand((unsigned int) seed);
# if(BINDING_TYPE == BINDING_TYPE_R)
// To suppress Found ‘srand’, possibly from ‘srand’ (C).
(void)((unsigned int) seed);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(void)((unsigned int) seed);
(void) seed;

I think this will avoid the warning too. Correct me if I'm wrong. :)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works great.

@@ -332,6 +332,9 @@ PARAM_FLAG("verbose", "Display informational messages and the full list of "

#elif(BINDING_TYPE == BINDING_TYPE_R) // This is a R binding.

// This doesn't actually matter for this binding type.
#define BINDING_MATRIX_TRANSPOSED true
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to think about this one again later. The binding it really matters for is nmf. For NMF, given a matrix V, we want to decompose it into V ~= W * H. So, we need to make sure that we can just take whatever W and H we get from the mlpack NMF binding in R and multiply them together. Anyway, there is no huge need to discuss now, but let's just try and remember that as we get closer to having everything working, we should hand-test the NMF binding from R to make sure that true is the right value here. 👍

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, This sounds good.

* if (!identical(\<param_name\>, NA)) {
* \<param_name\> = to_matrix_with_info(\<param_name\>)
* CLI_SetParam\<type\>("\<param_name\>", \<param_name\>$info,
* \<param_name\>$data)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This maybe is a question for @coatless and @eddelbuettel too. I know that R does not specify types for function arguments (or, well, I think!), but we should probably have a discussion on what's expected to be passed, especially for the type the code here uses: the "categorical matrix".

In mlpack we support categorical features basically by pairing a matrix with a vector of bools that indicate which dimensions are categorical and which are numeric. We have some nice structures that make that a little bit easier to work with, but when we interface with other languages, what we have to pass between languages amounts to a bool array and a matrix.

That part itself isn't hard (the code here already does it), but the more difficult part that involves a bit of "taste" is what the user should pass in to the R function. Should we expect them to pass in what this code expects? (I think that is a data frame with a column $info and a column (columns?) $data.) Does that feel "R native"? Or can we do better? I really have no idea, I'm just posing the question. :)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R is dynamically typed so ... kinda hard to impose a type.

R has a specific type for categorical variables: factor. Essentially an int vector plus a string hash map. So {blue, green, blue, red} could be decoded as {1,2,1,3} with the associate vector {"blue", "green", "red"} for "labels" for (1:3). Sadly that factor type does not travel too well to/from C++.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see---so, does it make sense to try and take a factor as inputs to our bindings, then do a little bit of mapping? All we really need to pass to the mlpack side is that int vector; we can leave the string hash map on the R side. I am not sure of the exact representation, but if we can wrangle whatever it is we get as input into an Armadillo matrix plus a boolean vector (which would take value false for any "normal" numeric dimensions and true for any factor dimensions), then I think that is all we'd need to do.

<< "\")" << endl;
MLPACK_COUT_STREAM << endl;

// Handle each input argument's processing before calling mlpackMain().
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before we start handling each of the input parameters and passing them to mlpack's CLI object, do we need to check that they are of the right type? What would happen if, e.g., I passed TRUE to a matrix parameter?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Matrix type parameter will be handled properly by to_matrix(). All thanks to Rcpp, It is handling "type checking" pretty nicely.
For Example:

> cf(algorithm = 1)
Error in CLI_SetParamString("algorithm", algorithm) : 
  Expecting a single string value: [type=double; extent=1].

But If you want I can implement a layer of type checking above this.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, nice! This looks great to me.

Copy link

@rcurtin rcurtin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fixes---everything is looking great. 👍


string(REGEX REPLACE "PARAM_MODEL_OUT\\(" "" MODELS_OUT_STRIP1 "${MODELS_OUT}")
string(REGEX REPLACE "," "" MODELS_OUT_STRIP2 "${MODELS_OUT_STRIP1}")
string(REGEX REPLACE "[<>,]" "" MODELS_OUT_SAFE_STRIP2 "${MODELS_OUT_STRIP1}")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you push the commit? Maybe I overlooked it but it seems like the code is still in both places.

file(TOUCH
"${CMAKE_CURRENT_BINARY_DIR}/mlpack/model.txt")

# Generate configure file from configure.ac.
execute_process(COMMAND autoreconf -i
WORKING_DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/mlpack/")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this looks good to me. The only question I still have (and I have no idea of the answer) is, does R ship with autoreconf? Or is it expected to be available on the system? The only reason I write this comment is, if R does come with autoreconf in its distribution, then we can try to use that one even if find_package(Autotools) doesn't find one on the system.

add_custom_command(TARGET r_build POST_BUILD
COMMAND rm ARGS -Rf "model.txt"
WORKING_DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/mlpack"
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good.

rm -f src/*.o src/*.so src/*.dylib src/*~ *~

## autoconf/configure leftovers
rm -rf autom4te.cache/ config.log config.status src/Makevars confdefs.h
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where will this file be used? I have a concern that it isn't platform-independent, but I'm not sure what it's used for, so maybe my concern doesn't matter.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A common (yet optional) part of R packages. If present, script cleanup can be run at end. Maybe have gotten copied over into MLPACK proper by association :)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aye, it was copied over when the shell of the R package was incorporated into the binding, c.f.

https://github.com/Yashwants19/mlpack/tree/8a46560c62575a5ec3eec5bedb5915ceb95f3041/src/mlpack/bindings/R/mlpack

Copy link

@rcurtin rcurtin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Yashwants19 sorry, I forgot to approve this one---I think that everything here is in good shape.

From my perspective there is still one unresolved discussion (#7 (comment)) but maybe that makes sense to create a separate issue for? It certainly doesn't need to be handled for good in this PR, but it may be worth some discussion on what the best way is to map "categorical matrices" from R to mlpack.

@Yashwants19
Copy link
Owner Author

Okay I am merging this now. Thank you for all the help.

@Yashwants19 Yashwants19 merged commit 08cc204 into R-bindings Jul 13, 2020
@coatless
Copy link

@Yashwants19 did you write up a blog post for Week 5? I don't think I saw anything come through on my end.

@Yashwants19
Copy link
Owner Author

I am really sorry, I didn't post the link here and post it in the issue #5 (Comment) only. BTW this is the link for Week 5 (GSoC 4-5 Part-2).
Thank You :)

@rcurtin
Copy link

rcurtin commented Jul 14, 2020

Awesome, thanks @Yashwants19! 👍

Yashwants19 added a commit that referenced this pull request Oct 19, 2020
* softmin activation function added

* shift added

* shift changed for inputmax to inputmin

* Test added for softmin forward function

* errors fixed

* suggested changes

* forward function fixed

* added softmin to HISTORY.md

* forward result calculation source changed

* space added in comment

* backward function added

* backward test function added, values left

* Added WeightSize() to

linear.hpp
atrous_convolution.hpp
add.hpp.

* tests made similar to softmax

* conflict fix

* fixed HISTORY.md conflict

* tests made similar to softmax

* Update activation_functions_test.cpp

* Removed header iostream

* Apply suggestions from code review

Co-authored-by: Marcus Edel <marcus.edel@fu-berlin.de>
Co-authored-by: Ryan Curtin <ryan@ratml.org>
Include bias term in linear layer.

* catch2 for mean_shift_test.cpp

* fixed backward function

* reverted changes to main/mean_shift_test.cpp

* corrected the test cases

* main/mean_shift_test.cpp from boost to catch2

* migrated mean_shift_test from boost to catch2

* tests changed

* tests changed

* Test for WeightSetVisitor and WeightSizeVisitor

* Fix common failures by increasing threshold

* Typo fix

* Auto Cancel build on new push and enable cache for build.

* Windows fix.

* Install R-bindings dependencies separately.

* rcmdcheck doesn't for building mlpack_r_tarball.

* Install roxygen2.

* Specify platform in windows build.

* Stop github actions running on a forked repo.

* softmin activation function added

shift added

shift changed for inputmax to inputmin

Test added for softmin forward function

errors fixed

suggested changes

forward function fixed

added softmin to HISTORY.md

forward result calculation source changed

space added in comment

backward function added

backward test function added, values left

tests made similar to softmax

conflict fix

fixed HISTORY.md conflict

tests made similar to softmax

Update activation_functions_test.cpp

Removed header iostream

fixed backward function

reverted changes to main/mean_shift_test.cpp

corrected the test cases

tests changed

tests changed

* Fix static issues

* All static issue fixed (hopefully)

* Migrate det and distribution test to catch2

Co-authored-by: Utkarsh Rai <utkarshrai491@gmail.com>
Co-authored-by: kartikdutt18 <kartikdutt@live.in>
Co-authored-by: Yashwant <yashwantsingh.sngh@gmail.com>
Co-authored-by: Ryan Curtin <ryan@ratml.org>
Co-authored-by: kartikdutt18 <39593019+kartikdutt18@users.noreply.github.com>
Co-authored-by: Ryan Birmingham <birm@gatech.edu>
Co-authored-by: jeffin143 <jeffinsam@karunya.edu.in>
Yashwants19 pushed a commit that referenced this pull request Oct 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants