GA converges to too many features #3

GoogleCodeExporter · 2016-03-15T18:04:26Z

What steps will reproduce the problem?
1. Run the GA on any data set

What is the expected output? What do you see instead?
A subset of features is expected, but too many features appear to be selected.


Possible reasons:
1. Jackknifing is causing over-fitting.
2. Somewhere the training cost is used to assess a genome instead of the test
cost, perhaps because EvolutionBestPerf was renamed EvolutionBestCostTest
while the training cost was stored in EvolutionBestCost.

Original issue reported on code.google.com by alistair...@gmail.com on 28 Oct 2011 at 10:25


GoogleCodeExporter · 2016-03-15T18:04:26Z

The early-stopping criterion was using EvolutionBestCost instead of 
EvolutionBestCostTest; this may have been the source of the error.
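
To illustrate why the choice of series matters for early stopping, here is a minimal Python sketch. It is not the project's code: the function `should_stop`, its `patience` and `min_delta` parameters, and the cost values are all hypothetical, and it assumes (as the issue text suggests but does not confirm) that EvolutionBestCost holds training cost and EvolutionBestCostTest holds test cost per generation.

```python
def should_stop(best_cost_history, patience=3, min_delta=0.0):
    """Return True when the best cost has not improved by more than
    `min_delta` over the last `patience` generations (hypothetical rule)."""
    if len(best_cost_history) <= patience:
        return False  # not enough history to judge
    recent_best = min(best_cost_history[-patience:])
    earlier_best = min(best_cost_history[:-patience])
    return recent_best >= earlier_best - min_delta


# Made-up example: training cost keeps falling while test cost stalls.
evolution_best_cost = [0.50, 0.40, 0.30, 0.25, 0.20, 0.15, 0.10]       # assumed: training
evolution_best_cost_test = [0.55, 0.45, 0.40, 0.41, 0.42, 0.43, 0.44]  # assumed: test

stop_on_train = should_stop(evolution_best_cost)
stop_on_test = should_stop(evolution_best_cost_test)
print(stop_on_train, stop_on_test)
```

With these made-up numbers, a criterion driven by the training series never fires, so the GA would keep evolving (and possibly keep adding features) after the test cost has stopped improving.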

Original comment by alistair...@gmail.com on 28 Oct 2011 at 10:37

GoogleCodeExporter · 2016-03-15T18:04:26Z

The crsov_SP function, which computes the "children" genomes from the genomes 
chosen for crossover, pairs parents by maximal genetic distance. The 
implementation can be understood as:

Get genetic distance between all N parent genomes
for i = 1 to N
 - Get parent genome maximally distant from parent genome i
 - Cross over these two genomes to produce two new genomes
 - Place these two new genomes into the children genome population
end

For this issue specifically, this means that as the GA begins to converge, it 
becomes more and more likely to combine the good genomes with a single, 
maximally distant genome. If this maximally distant genome carries a useless 
feature, the feature may be incorporated into the "good" genome population, 
both because so many crossovers involve that one distant genome and because 
additional features do not necessarily worsen performance.
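
The effect can be reproduced with a small Python sketch. This is an illustration under stated assumptions, not the crsov_SP code: it assumes genomes are binary feature masks, uses Hamming distance as the genetic distance, and uses a single-point crossover at a fixed point. All function names and the population are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions at which two feature masks differ."""
    return sum(x != y for x, y in zip(a, b))


def most_distant_partner(parents, i):
    """Index of the parent with the largest distance from parent i."""
    candidates = [j for j in range(len(parents)) if j != i]
    return max(candidates, key=lambda j: hamming(parents[i], parents[j]))


def single_point_crossover(a, b, point):
    """Swap the tails of two genomes after `point`, giving two children."""
    return a[:point] + b[point:], b[:point] + a[point:]


# Nearly converged population: three similar "good" genomes plus one
# outlier that carries extra features, including the last one.
parents = [
    (1, 1, 0, 0, 0),
    (1, 1, 0, 0, 0),
    (1, 0, 0, 0, 0),
    (0, 0, 1, 1, 1),  # outlier
]

partners = [most_distant_partner(parents, i) for i in range(len(parents))]
partner_counts = Counter(partners)

children = []
for i, j in enumerate(partners):
    children.extend(single_point_crossover(parents[i], parents[j], point=2))

carrying_last_feature = sum(child[-1] for child in children)
print(partners, carrying_last_feature, len(children))
```

In this toy population every good genome is paired with the outlier, so the outlier's last feature, which none of the good genomes had, ends up in half of the children.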

A potential fix is to remove each maximally distant genome from the pool once 
it has been crossed over, so that each genome produces exactly two children, 
but this is just a theory.
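
One possible reading of this fix, sketched in Python: each parent is paired at most once, and both members of a pair leave the pool as soon as they are crossed over. This is an assumption about what the fix would look like, not the project's implementation; `pair_without_reuse` and the Hamming distance over binary feature masks are hypothetical choices.

```python
def hamming(a, b):
    """Number of positions at which two feature masks differ."""
    return sum(x != y for x, y in zip(a, b))


def pair_without_reuse(parents):
    """Pair each parent with its most distant remaining partner, removing
    both from the pool so that no genome takes part in two crossovers."""
    remaining = list(range(len(parents)))
    pairs = []
    while len(remaining) >= 2:
        i = remaining.pop(0)
        j = max(remaining, key=lambda k: hamming(parents[i], parents[k]))
        remaining.remove(j)
        pairs.append((i, j))
    return pairs  # with an odd population, one parent is left unpaired


parents = [
    (1, 1, 0, 0, 0),
    (1, 1, 0, 0, 0),
    (1, 0, 0, 0, 0),
    (0, 0, 1, 1, 1),  # outlier
]

pairs = pair_without_reuse(parents)
print(pairs)
```

Here the outlier is crossed over only once instead of with every good genome, which limits how often its extra features can spread. The trade-off is that N parents yield N children rather than 2N, so the population size would need to be handled elsewhere.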

Original comment by alistair...@gmail.com on 30 Nov 2011 at 4:04

GoogleCodeExporter added Type-Defect auto-migrated Priority-High Usability labels Mar 15, 2016
