-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial Quick start Example for R-bindings. #12
Comments
Fantastic stuff. Just ran it. (Minor: Any chance we can rename these to not clash? The following object is masked from ‘package:stats’:
kmeans
The following object is masked from ‘package:base’:
det Caused by |
Minor edits just to "standard style" (which for long-time R users is what ESS encoded, these days of course everybody is using the RStudio defaults :-/) I used Yuge congrats on getting all this done without any other R dependencies. I really like that you stuck to standard *Edit: I was sloppy with the code. Better version below now. suppressMessages({
library("RcppMLPACK")
library("data.table")
})
# Load the dataset from an online URL. Replace with 'covertype.csv.gz' if you
# want to use on the full dataset.
#df <- fread("https://www.mlpack.org/datasets/covertype-small.csv.gz")
df <- fread("https://www.mlpack.org/datasets/covertype.csv.gz")
# Split the labels.
labels <- df[, .(label)]
dataset <- df[, label:=NULL]
# Split the dataset using RcppMLPACK.
prepdata <- preprocess_split(input = dataset, input_labels = labels,
test_ratio = 0.3, verbose = TRUE)
# Train a random forest.
output <- random_forest(training = prepdata$training,
labels = prepdata$training_labels,
print_training_accuracy = TRUE,
num_trees = 10,
minimum_leaf_size = 3,
verbose = TRUE)
rf_model <- output$output_model
# Predict the labels of the test points.
output <- random_forest(input_model = rf_model, test = prepdata$test, verbose = TRUE)
# Now print the accuracy. The third return value ('probabilities'), which we
# ignored here, could also be used to generate an ROC curve.
correct <- sum(output$predictions == prepdata$test_labels)
cat(correct, "out of", length(prepdata$test_labels), "test points correct",
correct / length(prepdata$test_labels) * 100.0, "%\n") |
Thank You. :)
Here @rcurtin can tell us more about renaming these bindings. |
This looks awesome. Thank you for the great edit |
Regarding Food for thought: Maybe we keep |
This sounds valid. I was also thinking about the same. I am really bad at naming part, should I replace |
How about uppercasing |
Awesome! It works great. I'd be a bit hesitant to do manual things like uppercasing some of the letters---ideally, we don't need to do any modification of the original bindings as produced by mlpack. Perhaps we should just prefix each name with |
May I suggest we all take a step back, take a deep breath, and count to sixty. There are 16000 R packages. Collisions happen. There are things we can control, and others we cannot. We may well clash (as a package) which another package also providing But clashing with base R is another matter. Stomping on baseR There are a few things we can, or cannot do. The easiest may be to not export as There are other choices. I recommend discussing a few before baby-bathwater style jumping to broad conclusions. [1] For argument's sake cleaner load as in R> library(RcppMLPACK)
R> without side-effects can be had by not exporting: edd@rob:~/git/rcppmlpack-gsos2020(master)$ git diff NAMESPACE
diff --git a/NAMESPACE b/NAMESPACE
index c799fc3..c4f98f3 100644
--- a/NAMESPACE
+++ b/NAMESPACE
@@ -9,7 +9,7 @@ export(cf)
export(dbscan)
export(decision_stump)
export(decision_tree)
-export(det)
+#export(det)
export(emst)
export(fastmks)
export(gmm_generate)
@@ -24,7 +24,7 @@ export(image_converter)
export(kde)
export(kernel_pca)
export(kfn)
-export(kmeans)
+#export(kmeans)
export(knn)
export(krann)
export(lars)
edd@rob:~/git/rcppmlpack-gsos2020(master)$ |
I'm not really picky on what the best way to handle this issue is, but, from my end it is key that we don't manually change names of just a couple of packages if it's avoidable. All of those exported functions are automatically generated, and if we want to change just one or two names, we have to integrate that into our automatic binding generation system, and we should ideally avoid that---it's kind of a kludge, and it breaks consistency with all of the rest of the bindings in other languages. For instance, in Python it's I hope that what I'm writing here makes sense---consistency is key. So however @Yashwants19 wants to solve it is fine with me, so long as we are consistent in the way I described above. 👍 |
I hear you. One option is also to leave as is, and you get consistently created names from MLPACK. Which is a plus. Two of those clobber and hide base R function. Which is a minus. The rest, as always in life, is choices and trade-offs and reasonable people may disagree over best practices. |
There is even this (and I since reverted to an unchanged
That loads RcppMLPACK and all its exported functions except those listed in |
Which do you think is "best" from the R perspective? (I realize it is a tradeoff 😄) If |
Nobody knows about exclude. My personal recommendation is to rename, or not to export. (But I am very hard nosed on this aspect.) Let's take a break, maybe come back in a day. |
Can we use |
I've always thought camelcasing is a bit weird, because it's hard to get right---technically SVM is an acronym, so it should be |
@rcurtin we can prefix all functions with either
I'm in favor of I will strongly say that we should avoid conflicting with any base R package. So, prefixing or capitalization seems to be where we need to go. (This change should be backported to the |
@rcurtin for the love of $deity please do not make this a snake_case vs camelCase discussion. This. Is. A. Much. Narrower. Question. We have
My personal preference would be to
This is the last message I plan to write on this as I have too many other bike sheds to paint 😁 |
I really don't think it is in this case. I'm not sure how I can write this message in a way that can signal to you that I'm not in it for the sake of arguing but for the sake of finding the way forward, but please rest assured that I have attempted to write and rewrite this response about three times to figure it out. I hope I've done an okay job. We're on the same side and we have the same goals! I promise. My personal preference is also the same as yours:
But with a slight modification, which the reason that I brought camelcase and capitalization and all the other options onto the table: we have control over the mlpack side so we can actually avoid conflicts! Based on what @coatless suggested, if we prefix with
So that would be my vote. If we all agree that that's a reasonable way forward, then I think we are set here. |
Needlessly clutters the user's namespace. Stylistically-speaking worse than the status quo. "Cleaner" but less usable. Sometimes the only way out. I don't think we are in that situation "but opinions about the shape of the earth differ". Also, vi and emacs. |
👍 I see---since I have little R context, I wouldn't have understood those things. Thank you for clarifying. It is possible (although it implies some extra code to create) to maintain a "list" on the mlpack side of function names that don't get exported; we'd also have to make clear in the autogenerated documentation that (a) the name interferes with other R functionality and (b) it should be used as I don't feel like I have enough R context to choose what's best here (my preference that I wrote above remains, but it's a weak preference). I think all the pros and cons of each approach are well documented now. Maybe @Yashwants19 should decide what he thinks is best here since he is the one of us who lives in "both worlds"? :) |
We can also punt for now and revisit later. |
Re-focusing to the markdown docs, @Yashwants19 let's place the above example into a vignette that ships with the package. Let's ship it "pre-made" instead of rendering it with the package. Could you try adding it in a similar way to the Rcpp package? https://github.com/RcppCore/Rcpp/tree/master/vignettes/rmd |
Also, do we want to ship data with the package? I noticed we're downloading data from a remote source: temp <- tempfile()
download.file("https://www.mlpack.org/datasets/covertype-small.csv.gz", temp)
df <- read.csv(temp) We could probably use examples from https://www.mlpack.org/datasets/ that are less than 1 MB in size. Glancing at the included R data sets and the example directory, I think only data(package="datasets") |
Yes, I like pre-made vignettes as they are simpler. They rely on Sweave which comes with R, and basic latex/pdf processing. Simpler, in the sense of fewer files, examples are in the anytime package and others. And the basic "key" to how it works was written up by my pal Mark here in a short blog post -- in essence just a latex/pdf trick to point to an existing pdf file. Please poke us if you have questions. I am doing the same in a few packages. (Re the data set download; I think downloading ot perfectly fine for a static demo or even for a recurrent demo. In every build it would be a bit much. Ambivalent as to whether we need to ship that data set....) |
I don't think we need to ship the dataset. This quickstart example is kind of intended as a "copy-pastable" example that you can immediately run and then perhaps adapt to your needs. (Plus, as the documentation points out you can change |
I am really sorry for the late reply, I think so vignettes doesn't require for building the package(from mlpack repo) or vignettes doesn't require any auto-generation touch. Hence it might be useful if we add these files directly to Further, I will follow mentor's advice for the same. |
@Yashwants19 do you mean you'd like to ship By the way, if we can set up some scripts to directly build the R bindings and send them off to CRAN, a separate mlpack/mlpack-r repository might be unnecessary. But I think we can figure that out when we get there. 😄 |
Subsample it one more time and include a 200kb (compressed) data set ? |
|
@Yashwants19 @rcurtin let's avoid shipping data. Doing so will likely open up a new can of worms. We can revisit on a subsequent CRAN release/update or during the revision period. I think the key here would be to just ensure a pre-made vignette of the quickstart guide -- that sprouted from this issue -- is made available: https://github.com/Yashwants19/mlpack/pull/10/files#diff-808b284b15b5e4942bed147539b9665d Since it is pre-made, we're not having to worry about data sizes or including data inside the package. |
Hello everyone,
We are almost ready with R-bindings, but one fine review is still required. And one more important task is that we are left with is markdown documentation, which is not directly related to the package that we going distribute.
For building the package from source, you can go through these steps:
Currently you can also try to install R-bindings from github which is pretty straightforward; you can just use:
R -e 'devtools::install_github("Yashwants19/RcppMLPACK")'
Lets see a quickstart example. After installing the package, you can copy-paste this code directly into R to run it.
Thank You :).
The text was updated successfully, but these errors were encountered: