Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vignette update #64

Closed
eddelbuettel opened this issue Nov 16, 2020 · 9 comments
Closed

Vignette update #64

eddelbuettel opened this issue Nov 16, 2020 · 9 comments

Comments

@eddelbuettel
Copy link
Owner

While release 0.0.17 went out nicely and without issues thanks to the updates in uwot and BiocNeighbors, we did not get around to updating the vignette for the newly added header file.

@eddelbuettel
Copy link
Owner Author

A first stab at it is at 4c52fe5

@LTLA comments welcome. Your vignette after all ;-)

@LTLA
Copy link
Contributor

LTLA commented Nov 17, 2020

Oh yeah. Look at how much code got deleted! It's beautiful.

I wonder if there's a convenient way that we could locally double-check that the code in there actually compiles. Would be kind of embarrassing if we forgot a semicolon or something. Currently, I have to manually copy the code over and sourceCpp it:

#include "Rcpp.h"
#include "RcppAnnoy.h"
#include <vector>
#include <algorithm>

typedef float ANNOYTYPE;

typedef
AnnoyIndex<int, ANNOYTYPE, Euclidean, Kiss64Random,
           AnnoyIndexSingleThreadedBuildPolicy>
MyAnnoyIndex;

// [[Rcpp::depends(RcppAnnoy)]]
// [[Rcpp::export(rng=false)]] 
int thingy(Rcpp::NumericMatrix mat, int c, int K, Rcpp::NumericVector Q) {
    const size_t nsamples=mat.nrow();
    const size_t ndims=mat.ncol();

    MyAnnoyIndex obj(ndims);
    // from <vector>        
    std::vector<ANNOYTYPE> tmp(ndims); 
    for (size_t i=0; i<nsamples; ++i) {
        Rcpp::NumericMatrix::Row cr=mat.row(i);
        // from <algorithm>
        std::copy(cr.begin(), cr.end(), tmp.begin()); 
        obj.add_item(i, tmp.data());
    }
    obj.build(50);

    obj.save("annoy.index");
    MyAnnoyIndex obj2(ndims);
    obj2.load("annoy.index"); // same as 'obj'.

    std::vector<int> neighbor_index;
    std::vector<ANNOYTYPE> neighbor_dist;
    obj.get_nns_by_item(c, K + 1, -1, &neighbor_index,
                        &neighbor_dist); 

    std::vector<ANNOYTYPE> QUERY(Q.begin(), Q.end());
    ANNOYTYPE* query = QUERY.data();
    obj.get_nns_by_vector(query, K+1, -1, 
                          &neighbor_index,
                          &neighbor_dist);                                    
    return 1;
}

This exercise was already useful as I noticed that we never tell people where the AnnoyIndexThreadedBuildPolicy type comes from. Is there a reason why we couldn't just define this in RcppAnnoy.h as well? Then all downstream packages would respond easily and consistently to the setting of the ANNOYLIB_MULTITHREADED_BUILD macro in some site-wide Makevars.

@eddelbuettel
Copy link
Owner Author

Sorry, first morning only settling -- how would we check 'if it compiles' without compiling? Do you mean a package-level unit test?

As for defining the Policy typedef there: yeah, I briefly thought about that too. Could it clash with upstream though?

@LTLA
Copy link
Contributor

LTLA commented Nov 18, 2020

how would we check 'if it compiles' without compiling? Do you mean a package-level unit test?

What I did above was to copy and paste the C++ code from the vignette into its own .cpp file, add the extra frills (the #includes, function definition, return statement) and then run that through sourceCpp(). The idea is simply to check whether we made a mistake in our demonstration code in the vignette. This check doesn't have to be part of the package build process; we can just run it once every time we have a new release, just to satisfy ourselves that the vignette is not discussing uncompilable code. And to catch omissions like that Policy thing.

I use this strategy to check that my C++ code is correct in the beachmat vignette. In fact, I remember the first version of the RcppAnnoy vignette had similar behavior, but was removed to simplify passage through CRAN. We could restore that functionality but keep it disabled for regular build/checks, and only turn it on once a release to check that the vignette compiles and yields a runnable function. It would save future us from having to do the above copy-and-pasting exercise to check the vignette's code correctness.

Could it clash with upstream though?

I guess you could call it RcppAnnoyIndexSingleThreadedPolicy, and we'd be pretty confident that it won't clash with whatever Annoy decides to call things. From a downstream package's perspective, I have to change the #includes anyway, so slapping Rcpp onto the class name doesn't bother me.

@eddelbuettel
Copy link
Owner Author

Thanks for documenting where that file came from. We already have a test environment with tinytest, I may slap it in there. (I still prefer my default 'run' of builds to be light via --no-manual --no-vignettes.)

Prefixing is a good idea. I could take the current block from my source file in the package

#ifdef ANNOYLIB_MULTITHREADED_BUILD
  typedef AnnoyIndexMultiThreadedBuildPolicy AnnoyIndexThreadedBuildPolicy;
#else
  typedef AnnoyIndexSingleThreadedBuildPolicy AnnoyIndexThreadedBuildPolicy;
#endif

typedef Annoy<int32_t, float,    Angular,   Kiss64Random, AnnoyIndexThreadedBuildPolicy> AnnoyAngular;
typedef Annoy<int32_t, float,    Euclidean, Kiss64Random, AnnoyIndexThreadedBuildPolicy> AnnoyEuclidean;
typedef Annoy<int32_t, float,    Manhattan, Kiss64Random, AnnoyIndexThreadedBuildPolicy> AnnoyManhattan;
typedef Annoy<int32_t, uint64_t, Hamming,   Kiss64Random, AnnoyIndexThreadedBuildPolicy> AnnoyHamming;

and prefix all with Rcpp (and maybe shorten them, how about RcppAnnoyThreadingPolicy ?) and also rename the four typedefs to RcppAnnoyAngular etc. Seems indeed cleanest.

@LTLA
Copy link
Contributor

LTLA commented Nov 19, 2020

Re. the vignette: the cleanest solution may be to insert the "frills" as unevaluated and hidden code chunks in the vignette. This won't affect your build process and it won't affect the vignette's appearance. Then, when we want to check the vignette's accuracy, we just knitr::purl() the vignette to extract all the C++ code into a single file and compile that. This gives us the ability to dynamically collate the C++ source from the vignette without the tedious (and error-prone) copy-pasting that I had to do manually.

@eddelbuettel
Copy link
Owner Author

There are many possibilities to skin this cat but as stated, I generally do not even load texlive in CI and hence skip vignettes.

@eddelbuettel
Copy link
Owner Author

Re. the vignette: the cleanest solution may be to insert the "frills" as unevaluated and hidden code chunks in the vignette.

Or as an 'appendix' and actually compile it. We have plenty of 'space' as the vignette is currently down to three columns (see here for the current pdf -- and I did spend two minutes on the .bib file and updated that.)

Can you contribute a quick 'sample call' for these "frills"? A matrix and vector to pass in. Should we maybe return something?

@eddelbuettel
Copy link
Owner Author

Taken care of via #66 and earlier commits. Nice work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants