kfold

kfold splits a data file into K partitions and folds, then helps train and test one model per fold. It is useful for cross-validation in supervised machine learning.

Command overview

help                 Display global or [command] help documentation.
split                Split a data file into K partitions.
test                 Apply trained models to a dataset previously split using kfold.
train                Train models on a dataset previously split using kfold.

Example usage

10-fold cross-validation of the standard MaltParser on a treebank named shuffled.c32.conll may be done as follows:

kfold split -f -i shuffled.c32.conll --fold -d '\n\n'
kfold train -f --base shuffled.c32.conll -- java -jar ~/Tools/malt-1.4.1/malt.jar -c %B.model_%N -i %F -m learn
kfold test -f --base shuffled.c32.conll -- java -jar ~/Tools/malt-1.4.1/malt.jar -c %B.model_%N -i %T -o %O -m parse
eval07.pl -q -g shuffled.c32.conll -s shuffled.c32.conll.output

MaltParser does not cope well with model files placed in a subdirectory, so instead of the default model filenames that kfold provides through %M, the commands above build flat, non-nested model filenames from %B.model_%N.

Command details

The following is simply the output of the built-in help commands.

Splitting data files

NAME:

  split

DESCRIPTION:

  Given the data file INPUT, the partitions are written to files named INPUT.parts/{01..K}

SYNOPSIS:

  kfold split -i INPUT [options]

EXAMPLES:

# Split the file sample.txt into 4 parts
kfold split -k4 -i sample.txt

# Split the double-newline-delimited file sample.conll into 10 parts
kfold split -d"\n\n" -i sample.conll

OPTIONS:

-i, --input FILE 
    Data file to split

-k, --parts N 
    The number of partitions desired

-d, --delimiter DELIM 
    String used to separate individual entries (newline per default)

-g, --granularity N 
    Ensure the number of entries in each partition is divisible by N (useful for block-structured data)

-f, --overwrite 
    Remove existing parts prior to executing

--fold 
    Additionally, create K folds of K-1 parts in another folder

--parts-name STRING 
    Use the given name as suffix for the partitions folder created

--folds-name STRING 
    Use the given name as suffix for the folds folder created
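The relationship between parts and folds can be illustrated with a short Python sketch. It is not kfold's own implementation: the function names are hypothetical, and it assumes entries are dealt into contiguous, near-equal parts, which the help text does not confirm. Fold i simply holds every part except part i.

```python
def split_entries(text, k, delimiter="\n"):
    """Split text into entries on `delimiter` and deal them into k contiguous parts."""
    entries = [e for e in text.split(delimiter) if e]
    # Assumption: the remainder is spread over the first parts,
    # so part sizes differ by at most one entry.
    base, extra = divmod(len(entries), k)
    parts, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        parts.append(entries[start:start + size])
        start += size
    return parts


def make_folds(parts):
    """Fold i holds every part except part i (K-1 parts per fold)."""
    return [
        [entry for j, part in enumerate(parts) if j != i for entry in part]
        for i in range(len(parts))
    ]


sample = "a\nb\nc\nd\ne\nf\ng\nh\n"
parts = split_entries(sample, k=4)
folds = make_folds(parts)
for number, (part, fold) in enumerate(zip(parts, folds), start=1):
    print(f"{number:02d} part={part} fold={fold}")
```

With eight entries and K=4, each part holds two entries and each fold holds the remaining six, so a model trained on fold 01 never sees the entries of part 01.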

Training on the folds

NAME:

  train

DESCRIPTION:

  Given training data previously split into K parts and folds, train K models on the K folds.

  Certain keywords in the training command and its arguments are interpolated at runtime:

   * %N  - fold number, e.g. '01'
   * %F  - fold filename, e.g. 'brown.train/01'
   * %I  - alias for %F
   * %M  - model filename, e.g. 'brown.models/01'
   * %B  - basename (as specified on the command line), e.g. 'brown'

SYNOPSIS:

  kfold train --base NAME [options] -- CMD [--CMD-OPTIONS] [CMD-ARGS]

EXAMPLES:

# Train MaltParser for cross-validation
kfold train -f --base shuffled.c32.conll -- java -jar ~/Tools/malt-1.4.1/malt.jar -c %B.model_%N -i %F -m learn

OPTIONS:

-f, --overwrite 
    Remove existing models prior to executing

--base NAME 
    Default prefix of training folds and model files

--folds-name SUFFIX 
    Look for folds {01..K} in the folder BASE.SUFFIX

--models-name SUFFIX 
    Make the model filenames BASE.SUFFIX/{01..K} available as interpolation pattern %M
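The keyword interpolation described above can be sketched in Python as follows. This is an illustration only, not kfold's actual code: the helper names are hypothetical, the folder suffixes "train" and "models" are assumptions taken from the examples in the keyword list, and the path to malt.jar is shortened for readability.

```python
def interpolate(template, values):
    """Replace each %X keyword in one command argument with its value."""
    for keyword, value in values.items():
        template = template.replace(keyword, value)
    return template


def training_commands(command, base, k, folds_suffix="train", models_suffix="models"):
    """Expand the command once per fold, numbering folds 01..K."""
    expanded = []
    for i in range(1, k + 1):
        number = f"{i:02d}"
        fold = f"{base}.{folds_suffix}/{number}"
        values = {
            "%N": number,
            "%F": fold,
            "%I": fold,  # alias for %F
            "%M": f"{base}.{models_suffix}/{number}",
            "%B": base,
        }
        expanded.append([interpolate(arg, values) for arg in command])
    return expanded


# Hypothetical command template; only the %-keywords matter here.
command = ["java", "-jar", "malt.jar", "-c", "%B.model_%N", "-i", "%F", "-m", "learn"]
for argv in training_commands(command, base="brown", k=2):
    print(" ".join(argv))
```

Each fold gets its own expanded command line, which is why a pattern such as %B.model_%N produces one distinct, flat model name per fold.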

Testing the models on their reciprocal data file parts

NAME:

  test

DESCRIPTION:

  Process K parts of a split datafile using K previously trained models.

  Certain keywords in the testing command and its arguments are interpolated at runtime:

   * %N  - part number, e.g. '01'
   * %T  - part filename, e.g. 'brown.test/01'
   * %I  - alias for %T
   * %O  - output filename, e.g. 'brown.outputs/01'
   * %M  - model filename, e.g. 'brown.models/01'
   * %B  - basename (as specified on the command line), e.g. 'brown'

SYNOPSIS:

  kfold test --base NAME [options] -- CMD [--CMD-OPTIONS] [CMD-ARGS]

EXAMPLES:

# Apply trained MaltParser models for cross-validation
kfold test -f --base shuffled.c32.conll -- java -jar ~/Tools/malt-1.4.1/malt.jar -c %B.model_%N -i %T -o %O -m parse

OPTIONS:

-f, --overwrite 
    Remove existing test output prior to executing

--base NAME 
    Default prefix of model files and test outputs

--parts-name SUFFIX 
    Look for parts {01..K} to be processed in the folder BASE.SUFFIX

--models-name SUFFIX 
    Make the model filenames BASE.SUFFIX/{01..K} available as interpolation pattern %M

--outputs-name SUFFIX 
    Make the output filenames BASE.SUFFIX/{01..K} available as interpolation pattern %O

--output-name SUFFIX 
    Put the concatenated output of all models in BASE.SUFFIX
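The final concatenation step can be pictured with the Python sketch below. It is a guess at the behaviour rather than kfold's implementation: the function name is hypothetical, the suffixes "outputs" and "output" are assumptions based on the examples in this document, and it assumes the per-part outputs are joined in part order so that the combined file lines up with the original data file.

```python
from pathlib import Path
import tempfile


def concatenate_outputs(base, k, outputs_suffix="outputs", output_suffix="output"):
    """Join BASE.outputs/01..K in part order into the single file BASE.output."""
    outputs_dir = Path(f"{base}.{outputs_suffix}")
    combined = Path(f"{base}.{output_suffix}")
    with combined.open("w") as out:
        for i in range(1, k + 1):
            out.write((outputs_dir / f"{i:02d}").read_text())
    return combined


# Demonstration with two fake per-part outputs in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    base = str(Path(tmp) / "brown")
    outputs_dir = Path(f"{base}.outputs")
    outputs_dir.mkdir()
    (outputs_dir / "01").write_text("first part\n")
    (outputs_dir / "02").write_text("second part\n")
    combined_text = concatenate_outputs(base, k=2).read_text()
    print(combined_text, end="")
```

If the parts are contiguous slices of the original file, a combined output built this way can be compared entry by entry against the unsplit gold data, as the eval07.pl call in the example usage does.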
