-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GSoC 6-7 #8
GSoC 6-7 #8
Conversation
5637bc6
to
c128054
Compare
@Yashwants19 so, the main part that jumps out at me from https://gist.github.com/Yashwants19/b52606bce1412c1972b8bdab728d992e are interweaved examples with descriptions. We're often seeking to separate out the code from the documentation, e.g. #' @title AdaBoost
#'
#' @description
#' This program implements the AdaBoost (or Adaptive Boosting) algorithm. The
#' variant of AdaBoost implemented here is AdaBoost.MH. It uses a weak learner,
#' either decision stumps or perceptrons, and over many iterations, creates a
#' strong learner that is a weighted ensemble of weak learners. It runs these
#' iterations until a tolerance value is crossed for change in the value of the
#' weighted training error.
#'
#' @param input_model Input AdaBoost model.
#' @param iterations The maximum number of boosting iterations to be run (0 will run
#' until convergence.) Default value "1000".
#' @param labels Labels for the training set.
#' @param test Test dataset.
#' @param tolerance The tolerance for change in values of the weighted error during
#' training. Default value "1e-10".
#' @param training Dataset for training AdaBoost.
#' @param verbose Display informational messages and the full list of parameters and
#' timers at the end of execution. Default value "FALSE".
#' @param weak_learner The type of weak learner to use: 'decision_stump', or
#' 'perceptron'. Default value "decision_stump".
#'
#' @return A list with several components:
#' \item{output}{Predicted labels for the test set.}
#' \item{output_model}{Output trained AdaBoost model.}
#' \item{predictions}{Predicted labels for the test set.}
#' \item{probabilities}{Predicted class probabilities for each point in the test
#' set.}
#'
#' @details
#' This program allows training of an AdaBoost model, and then application of
#' that model to a test dataset. To train a model, a dataset must be passed
#' with the "training" option. Labels can be given with the "labels" option; if
#' no labels are specified, the labels will be assumed to be the last column of
#' the input dataset. Alternately, an AdaBoost model may be loaded with the
#' "input_model" option.
#'
#' Once a model is trained or loaded, it may be used to provide class
#' predictions for a given test dataset. A test dataset may be specified with
#' the "test" parameter. The predicted classes for each point in the test
#' dataset are output to the "predictions" output parameter. The AdaBoost model
#' itself is output to the "output_model" output parameter.
#'
#' Note: the following parameter is deprecated and will be removed in mlpack
#' 4.0.0: "output".
#' Use "predictions" instead of "output". additional
#'
#' @references
#' For more information about the algorithm, see the paper "Improved Boosting
#' Algorithms Using Confidence-Rated Predictions", by R.E. Schapire and Y.
#' Singer.
#'
#' @author
#' MLPACK Developers
#'
#' @export
#' @examples
#' # For example, to run AdaBoost on an input dataset "data" with labels
#' # "labels"and perceptrons as the weak learner type, storing the trained model
#' # in "model", one could use the following command:
#'
#' output <- adaboost(training=data, labels=labels,
#' weak_learner="perceptron")
#' model <- output$output_model
#'
#' # Similarly, an already-trained model in "model" can be used to provide class
#' # predictions from test data "test_data" and store the output in "predictions"
#' # with the following command:
#'
#' output <- adaboost(input_model=model, test=test_data)
#' predictions <- output$predictions Also, we'll need to make sure each example takes less than ~5 seconds to run. If the example takes more than 5 seconds, we're going to need to protect it with #' @examples
#' \donttest{
#' output <- adaboost(input_model=model, test=test_data)
#' predictions <- output$predictions
#' } The In terms of the package build stage, the other options to use would be:
There was a shift away from using |
Hi @coatless, Thank you for the great example, I will update the documentation as soon as possible. |
@Yashwants19 @coatless I'm not sure exactly how we can do this; the "input" from the C++ side looks like this: PROGRAM_INFO("AdaBoost",
// Short description.
"An implementation of the AdaBoost.MH (Adaptive Boosting) algorithm for "
"classification. This can be used to train an AdaBoost model on labeled "
"data or use an existing AdaBoost model to predict the classes of new "
"points.",
// Long description.
"This program implements the AdaBoost (or Adaptive "
"Boosting) algorithm. The variant of AdaBoost implemented here is "
"AdaBoost.MH. It uses a weak learner, either decision stumps or "
"perceptrons, and over many iterations, creates a strong learner that is a "
"weighted ensemble of weak learners. It runs these iterations until a "
"tolerance value is crossed for change in the value of the weighted "
"training error."
"\n\n"
"For more information about the algorithm, see the paper \"Improved "
"Boosting Algorithms Using Confidence-Rated Predictions\", by R.E. Schapire"
" and Y. Singer."
"\n\n"
"This program allows training of an AdaBoost model, and then application of"
" that model to a test dataset. To train a model, a dataset must be passed"
" with the " + PRINT_PARAM_STRING("training") + " option. Labels can be "
"given with the " + PRINT_PARAM_STRING("labels") + " option; if no labels "
"are specified, the labels will be assumed to be the last column of the "
"input dataset. Alternately, an AdaBoost model may be loaded with the " +
PRINT_PARAM_STRING("input_model") + " option."
"\n\n"
"Once a model is trained or loaded, it may be used to provide class "
"predictions for a given test dataset. A test dataset may be specified "
"with the " + PRINT_PARAM_STRING("test") + " parameter. The predicted "
"classes for each point in the test dataset are output to the " +
PRINT_PARAM_STRING("predictions") + " output parameter. The AdaBoost "
"model itself is output to the " + PRINT_PARAM_STRING("output_model") +
" output parameter."
"\n\n"
"Note: the following parameter is deprecated and "
"will be removed in mlpack 4.0.0: " + PRINT_PARAM_STRING("output") +
"."
"\n"
"Use " + PRINT_PARAM_STRING("predictions") + " instead of " +
PRINT_PARAM_STRING("output") + '.' +
"\n\n"
"For example, to run AdaBoost on an input dataset " +
PRINT_DATASET("data") + " with labels " + PRINT_DATASET("labels") +
"and perceptrons as the weak learner type, storing the trained model in " +
PRINT_MODEL("model") + ", one could use the following command: "
"\n\n" +
PRINT_CALL("adaboost", "training", "data", "labels", "labels",
"output_model", "model", "weak_learner", "perceptron") +
"\n\n"
"Similarly, an already-trained model in " + PRINT_MODEL("model") + " can"
" be used to provide class predictions from test data " +
PRINT_DATASET("test_data") + " and store the output in " +
PRINT_DATASET("predictions") + " with the following command: "
"\n\n" +
PRINT_CALL("adaboost", "input_model", "model", "test", "test_data",
"predictions", "predictions"),
// See also...
SEE_ALSO("AdaBoost on Wikipedia", "https://en.wikipedia.org/wiki/AdaBoost"),
SEE_ALSO("Improved boosting algorithms using confidence-rated predictions "
"(pdf)", "http://rob.schapire.net/papers/SchapireSi98.pdf"),
SEE_ALSO("Perceptron", "#perceptron"),
SEE_ALSO("Decision Stump", "#decision_stump"),
SEE_ALSO("mlpack::adaboost::AdaBoost C++ class documentation",
"@doxygen/classmlpack_1_1adaboost_1_1AdaBoost.html")); So, basically, even in our sources, the examples are interleaved with the "long description". Unless @Yashwants19 has a better idea it seems like maybe the only option here is to add some kind of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I think the implementation looks great. My only comments are really minor.
It seems to me like this would be the right time to make sure that the documentation looks "R-native"; so, @coatless and @eddelbuettel, if you have additional comments about the generated documentation @Yashwants19 showed, now would be a great time to work out those issues. :)
inline std::string ParamString(const std::string& paramName) | ||
{ | ||
// For a R binding we don't need to know the type. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this is an extra line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
src/mlpack/bindings/go/print_doc.hpp
Outdated
@@ -16,7 +16,7 @@ | |||
#include <mlpack/prereqs.hpp> | |||
#include <mlpack/core/util/hyphenate_string.hpp> | |||
#include "get_go_type.hpp" | |||
#include "camel_case.hpp" | |||
#include <mlpack/bindings/utils/camel_case.hpp> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, there also exists the directory src/mlpack/core/util/
which contains things in the namespace mlpack::core::util
. Maybe it is better here to match that name with mlpack::bindings::util
? (Instead of having util
there and utils
here.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
#ifndef MLPACK_BINDINGS_GO_CAMEL_CASE_HPP | ||
#define MLPACK_BINDINGS_GO_CAMEL_CASE_HPP | ||
#ifndef MLPACK_BINDINGS_UTILS_CAMEL_CASE_HPP | ||
#define MLPACK_BINDINGS_UTILS_CAMEL_CASE_HPP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nice cleanup---I appreciate that you've done this. It reduces the overall amount of code which is great. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank You.
if (prefix.size() >= 80) | ||
{ | ||
throw std::invalid_argument("Prefix size must be less than 80"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch! I imagine that was kind of irritating to debug... :)
I was also looking for the same. I will love to discussion that how this can done in the best way. |
I haven't had time to look in any detail but I was also hoping that maybe a two step process of first doing what is done at the source level here, and then letting R's roxygen2 generate the help pages and maybe reorder in the process may do the trick. Maybe that'll work. |
a788479
to
75fa69f
Compare
0e52b1d
to
5db856e
Compare
Hi @rcurtin I have done some rough work for "example" on the source side, Please take a look. |
@@ -2,7 +2,7 @@ | |||
* @file bindings/tests/clean_memory.cpp | |||
* @author Ryan Curtin | |||
* | |||
* Delete any pointers held by the CLI object. | |||
* Delete any pointers held by the IO object. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure, whether these CLI's are ignored or these are intentional. And If they are ignored should I open a different PR or we are good here with these changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it looks to me like there were some things that were accidentally missed. Do you want to open a PR to the main repository with those things? Alternately I can do it, just let me know---I can see that these were missed, and also the names of the utility functions in src/mlpack/bindings/julia/mlpack/cli.jl.in
. I'm not sure if the Python utility functions need to be adapted too; I didn't check. Anyway let me know what you'd like to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will open a PR soon for the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @Yashwants19, everything looks great to me. Many of the comments I left are perhaps larger scope than just this PR, so, let me know what you think of them. We can always open issues in the main mlpack repository to handle some of those other things, or, even, we could handle them some other time in this repo, or maybe not at all depending on what you think. 👍
@@ -30,17 +30,17 @@ if (NOT (MODEL_FILE_TYPE MATCHES "\"${MODEL_SAFE_TYPES}\"")) | |||
set(MODEL_PTR_IMPLS "${MODEL_PTR_IMPLS} | |||
// Get the pointer to a ${MODEL_TYPE} parameter. | |||
// [[Rcpp::export]] | |||
SEXP CLI_GetParam${MODEL_SAFE_TYPE}Ptr(const std::string& paramName) | |||
SEXP IO_GetParam${MODEL_SAFE_TYPE}Ptr(const std::string& paramName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking the time to do this---I know it is a bit of a tedious refactoring. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:)
@@ -108,7 +108,7 @@ The example above has only shown a little bit of the functionality of mlpack. | |||
Lots of other commands are available with different functionality. A full list | |||
of commands and full documentation for each can be found on the following page: | |||
|
|||
- <a href="https://mlpack.org/doc/mlpack-git/cli_documentation.html">IO documentation</a> | |||
- <a href="https://mlpack.org/doc/mlpack-git/cli_documentation.html">CLI documentation</a> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahhh, thank you. Actually do you want to submit this one as a patch batch to the mlpack repository?
@@ -2,7 +2,7 @@ | |||
* @file bindings/tests/clean_memory.cpp | |||
* @author Ryan Curtin | |||
* | |||
* Delete any pointers held by the CLI object. | |||
* Delete any pointers held by the IO object. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it looks to me like there were some things that were accidentally missed. Do you want to open a PR to the main repository with those things? Alternately I can do it, just let me know---I can see that these were missed, and also the names of the utility functions in src/mlpack/bindings/julia/mlpack/cli.jl.in
. I'm not sure if the Python utility functions need to be adapted too; I didn't check. Anyway let me know what you'd like to do.
mlpack::util::ProgramDoc \ | ||
io_programdoc_dummy_object = mlpack::util::ProgramDoc(NAME, SHORT_DESC, \ | ||
[]() { return std::string(DESC) + std::string(EXAMPLE); }, []() \ | ||
{ return ""; }, { __VA_ARGS__ }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, this is one option for what I was thinking. A couple further thoughts:
- It looks like this only allows one
EXAMPLE
, so we don't have a way to split into multiple examples. - The number of parameters to
PROGRAM_INFO
is getting large. What would you think about splittingPROGRAM_INFO
into a bunch of macros, like this:
BINDING_NAME("program name");
BINDING_SHORT_DESC("This is a short, two-sentence description of what the program does.");
BINDING_LONG_DESC("This is a long description of what the program does."
" It might be many lines long and have lots of details about different options.");
BINDING_EXAMPLE("This contains one example for this particular binding.\n" +
PROGRAM_CALL(...));
BINDING_EXAMPLE("This contains another example for this particular binding.\n" +
PROGRAM_CALL(...));
// There could be many of these "see alsos".
BINDING_SEE_ALSO("https://en.wikipedia.org/wiki/Machine_learning");
It would take a little bit of refactoring but I wonder if that is a cleaner approach overall. What you've done in this PR is fine for now, I think, but I do think maybe it's worth some discussion about a longer-term cleanup. If you think either of those ideas are useful, and we can work it into a quick proposal, we should open an issue back in the main mlpack repository and then we can do it (I can help with the refactoring of course! :)).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks awesome. We should definitely discuss on this, it will give a great help if you can open a issue in the main mlpack repository. May be it will be better if we handle this in some different PR in the main mlpack repository, as this translation is not directly related to R-bindings. It seems like according to the current progress, I might complete R-bindings before time and this translation may result into a great stretch goal to my timeline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with that---let me open an issue in the mlpack repository later today. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome thank you for opening the issue.
@@ -1129,55 +1130,55 @@ using DatasetInfo = DatasetMapper<IncrementPolicy, std::string>; | |||
#ifdef __COUNTER__ | |||
#define PARAM_IN(T, ID, DESC, ALIAS, DEF, REQ) \ | |||
static mlpack::util::Option<T> \ | |||
JOIN(cli_option_dummy_object_in_, __COUNTER__) \ | |||
JOIN(io_option_dummy_object_in_, __COUNTER__) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahhh, nice catch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you :)
src/mlpack/methods/cf/cf_main.cpp
Outdated
"\n\n", | ||
// Example. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"\n\n", | |
// Example. | |
// Example. |
Nice, it's really cool how simple this refactoring turned out to be. 👍 Still, I think maybe we can remove the extra newlines now, since the binding printing code should hopefully be handling that for us. Plus, as users add new bindings, they may not know that they need to add newlines to the end of the program documentation.
Anyway, my suggestion here doesn't totally work, because I can't put the comma with the previous line. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will soon push the changes regarding newline 👍.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have made changes in the last commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thanks! 👍 I would say whenever you're ready to merge this go ahead. Not sure if you were done here yet or if you wanted to do some more, but in any case, don't feel obligated to wait on another approval from me. :)
"--tolerance option. By default, the transition matrix is randomly " | ||
"initialized and the emission distributions are initialized to fit the " | ||
"extent of the data." | ||
+ PRINT_PARAM_STRING("tolerance") + "option. By default, the transition " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch! 💯
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:)
src/mlpack/bindings/R/print_R.cpp
Outdated
cout << util::HyphenateString(programInfo.documentation(), "#' ") << endl; | ||
cout << "#' @author" << endl; | ||
cout << "#'\n#' @author" << endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it would be easier to just add another endl
after HyphenateString
? i.e.
cout << util::HyphenateString(programInfo.documentation(), "#' ") << endl << endl;
or similar. Realistically both of these function just the same, but I think there's something nice about consistency of always using endl
or always using \n
, but not mixing both. Just my opinion. 😄
Co-authored-by: Ryan Curtin <ryan@ratml.org>
b130822
to
f622640
Compare
Okay, lets merge this. :) |
Hey everyone, here is the link for my weekly blog. Kindly share your views. Thank you. |
Awesome, thanks! If you like, it might be useful to include some example usages from R that now work, etc., for any reader who might not want to dig into the details. However, up to you. 👍 |
Sure I will update the blog post. |
Hi @coatless @eddelbuettel @rcurtin, As per the timeline this PR is regarding the automatically generation of documentation in .R files. Here is an example : https://gist.github.com/Yashwants19/b52606bce1412c1972b8bdab728d992e
I have almost completed the work. This PR also contain some commits regarding #7 which I will rebase after merging GSoC 4-5 in R-Bindings. In the following weeks I will add the markdown documentation.
Thank You.