Reselection
You may end up with multiple solution sets, either by creating them manually or, more commonly, by running N-fold cross-validation or asking an ILP solver for multiple solutions. Harvestman can find the features common to all of these solution sets. In theory, this yields a set of essential features that any reasonable solution must contain.
This is a fairly easy mode of operation. For example, to do 5-fold cross-validation on the Thousand Genomes dataset, you could run:
./HarvestmanConsole train -l super_pop -m 500 -i 0.3 -X 5 -v <data-directory>
Then, in the same directory, run this command to find the features common to all folds:
./HarvestmanConsole reselect -v .
This outputs a new feature selection set containing the set intersection of the features selected in the previous runs, so only features that appear in every run are kept.
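To make the set intersection concrete, here is a minimal Python sketch of the idea. It is not Harvestman's implementation and does not read Harvestman's output files; the `common_features` helper and the feature names are hypothetical, chosen only for illustration.

```python
def common_features(solution_sets):
    """Return the features that appear in every solution set."""
    sets = [set(s) for s in solution_sets]
    if not sets:
        # No solution sets means no common features.
        return set()
    return set.intersection(*sets)


# Hypothetical feature names selected in three cross-validation folds.
folds = [
    {"rs100", "rs200", "rs300"},
    {"rs100", "rs300", "rs400"},
    {"rs100", "rs300", "rs500"},
]

print(sorted(common_features(folds)))
```

In this example only `rs100` and `rs300` survive, because every fold selected them. A feature missing from even one fold is dropped, so the result can only shrink as more solution sets are added.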