java-rapidminer-knn

Implementation of the KNN algorithm using RapidMiner

Usage

  docker run --rm --env [list of environment variables] hbpmip/java-rapidminer-knn:0.2.2 compute

where the environment variables are:

NODE: name of the node (machine) used for execution
JOB_ID: ID of the job.
IN_JDBC_DRIVER: org.postgresql.Driver
IN_JDBC_URL: URL to the input database, e.g. jdbc:postgresql://db:5432/features
IN_JDBC_USER: User for the input database
IN_JDBC_PASSWORD: Password for the input database
OUT_JDBC_DRIVER: org.postgresql.Driver
OUT_JDBC_URL: URL to the output database, e.g. jdbc:postgresql://db:5432/woken
OUT_JDBC_USER: User for the output database
OUT_JDBC_PASSWORD: Password for the output database
PARAM_variables: Name of the target variable (only one variable is supported for KNN)
PARAM_covariables: List of covariables
PARAM_query: Query selecting the variables and covariables to feed into the algorithm for training.
MODEL_PARAM_k (or PARAM_MODEL_k for compatibility): Number of nearest neighbours (k) used to classify a sample.
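
Passing fourteen variables on the command line gets unwieldy, so one option is to collect them in an env file and hand it to Docker with `--env-file`. The following is a minimal sketch: the file name `knn.env`, the user names, passwords, column names, table name, the value of k and the comma-separated form of PARAM_covariables are all placeholder assumptions, not values defined by this project.

```shell
#!/bin/sh
set -eu

# Env file consumed by `docker run --env-file`; one NAME=value pair per line.
ENV_FILE="${ENV_FILE:-knn.env}"

# All values below are placeholders (assumptions), adapt them to your setup.
cat > "$ENV_FILE" <<'EOF'
NODE=local
JOB_ID=001
IN_JDBC_DRIVER=org.postgresql.Driver
IN_JDBC_URL=jdbc:postgresql://db:5432/features
IN_JDBC_USER=features_user
IN_JDBC_PASSWORD=change-me
OUT_JDBC_DRIVER=org.postgresql.Driver
OUT_JDBC_URL=jdbc:postgresql://db:5432/woken
OUT_JDBC_USER=woken_user
OUT_JDBC_PASSWORD=change-me
PARAM_variables=diagnosis
PARAM_covariables=age,volume
PARAM_query=SELECT diagnosis, age, volume FROM sample_data
MODEL_PARAM_k=5
EOF

# Launch the algorithm with the variables from the env file.
# Defined as a function so that writing the file and running the
# container stay separate steps.
run_knn() {
  docker run --rm --env-file "$ENV_FILE" hbpmip/java-rapidminer-knn:0.2.2 compute
}
```

After sourcing the script, call `run_knn` once both databases are reachable from the container. Docker reads env-file values literally, so the query must not be wrapped in quotes.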

Development process

The goal of this project is to create a Docker image containing a full Java and RapidMiner environment capable of:

Reading parameters from the environment and connecting to a database
Querying the database and preparing the data
Running the algorithm (here, KNN)
Formatting the results so that they can be shared easily. This project uses the PFA format in its JSON form.
Saving the results into the result database.
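
The first step, reading parameters from the environment, can be illustrated with a small pre-flight check to run before launching the container. This is a shell sketch, not the project's own Java code. The helper names `missing_vars` and `resolve_k` are hypothetical, treating every listed variable as required is an assumption, and so is the precedence of MODEL_PARAM_k over PARAM_MODEL_k.

```shell
#!/bin/sh

# Variable names taken from the Usage section; k is resolved separately below.
REQUIRED_VARS="NODE JOB_ID IN_JDBC_DRIVER IN_JDBC_URL IN_JDBC_USER IN_JDBC_PASSWORD OUT_JDBC_DRIVER OUT_JDBC_URL OUT_JDBC_USER OUT_JDBC_PASSWORD PARAM_variables PARAM_covariables PARAM_query"

# Print the name of every required variable that is unset or empty,
# one per line. Prints nothing when all of them are set.
missing_vars() {
  for name in $REQUIRED_VARS; do
    # POSIX sh has no indirect expansion, so look the value up via eval.
    # The names come from the fixed list above, never from user input.
    eval "value=\${$name:-}"
    [ -n "$value" ] || echo "$name"
  done
}

# Print k, preferring MODEL_PARAM_k and falling back to PARAM_MODEL_k.
# Prints an empty line when neither is set.
resolve_k() {
  echo "${MODEL_PARAM_k:-${PARAM_MODEL_k:-}}"
}
```

A wrapper script could abort when `missing_vars` prints anything or `resolve_k` prints an empty value, which surfaces configuration mistakes before the container starts.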

The Docker image should contain the binaries and resources that this algorithm depends on.

The following scripts are provided to help you:

`./build.sh`

The main build script. It packages this project into a Docker image and runs the tests. It requires captain and Docker Engine to run. To build the image by hand, use captain, or plain Docker if you cannot install captain on your platform:

  captain build
  # or
  docker build -t hbpmip/java-rapidminer-knn .

`./tests/test.sh`

This script runs the tests. It assumes that the image has already been built with ./build.sh.

It executes the Docker image, starts an input database and a result database, then executes the algorithm using sample data for training.

You can run the tests with the command:

  ./tests/test.sh

Validation of the PFA output

Install Titus, which is part of the Open Data Group Hadrian project.

Titus provides a tool called pfainspector.

Check the validity of the PFA output of this algorithm with the following procedure:

Read the YAML document from the 'data' column of the result database
Convert the document from YAML to JSON, for example using yamltojson.com
Start pfainspector
Load the JSON file

  load lreg_output.json as lreg_output

Validate the JSON document

  pfa validate lreg_output
