Skip to content
main
Switch branches/tags
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
 
 
 
 
R
 
 
 
 
 
 
 
 
 
 
man
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

BigKnn

Build Status codecov.io

BigKnn is part of HADES.

Introduction

An R package implementing a large scale k-nearest neighbor (KNN) classifier using the Lucene search engine.

Features

  • Build KNN classifiers of arbitrary scale (up to millions of rows, millions of features)
  • Fast classification performance due to use of highly optimized search engine (Lucene)
  • Supports both weighted and unweighted KNN

Examples

covariates <- data.frame(rowIds = c(1,1,1,2,2,3),
                         covariateIds = c(10,11,12,10,11,12),
                         covariateValues = c(1,1,1,1,1,1))

outcomes <- data.frame(rowIds = c(1,2,3),
                       y = c(1,0,0))
					   
dataForPrediction <- Andromeda::andromeda(covariates = covariates, 
                                          outcomes = outcomes)

indexFolder <- "s:/temp/lucene"

buildKnn(outcomes = dataForPrediction$outcomes,
         covariates = dataForPrediction$covariates,
         indexFolder = indexFolder)

prediction <- predictKnn(outcomes = dataForPrediction$outcomes,
                         covariates = dataForPrediction$covariates,
                         indexFolder = indexFolder,
                         k = 10,
                         weighted = TRUE)

Technology

BigKnn is an R package using the Java based Lucene search engine. The data for the KNN is stored in a folder on the local file system.

System Requirements

Running the package requires R with the package rJava installed. Also requires Java 1.8 or higher.

Installation

  1. See the instructions here for configuring your R environment, including Java.

  2. Use the following commands in R to install the BigKnn package:

install.packages("remotes")
remotes::install_github("ohdsi/BigKnn")

User Documentation

Documentation can be found on the package website.

PDF versions of the documentation are also available:

Support

Contributing

Read here how you can contribute to this package.

License

BigKnn is licensed under Apache License 2.0. Lucene fall under its own Apache License 2.0.

Development

BigKnn is being developed in R Studio and Eclipse

Development status

Stable.

About

An R package implementing a large scale k-nearest neighbor classifier using the Lucene search engine

Topics

Resources

Stars

Watchers

Forks