GA converges to too many features #3

Open
GoogleCodeExporter opened this issue Mar 15, 2016 · 2 comments

Comments

@GoogleCodeExporter

What steps will reproduce the problem?
1. Run the GA on any data set

What is the expected output? What do you see instead?
A subset of features is expected, but too many features appear to be selected.


Possible reasons:
1. JackKnifing is causing over-fitting.
2. Somewhere the training cost is used to assess a genome instead of the test cost. Perhaps this happened when EvolutionBestPerf was renamed EvolutionBestCostTest and the training cost was stored in EvolutionBestCost?

Original issue reported on code.google.com by alistair...@gmail.com on 28 Oct 2011 at 10:25

@GoogleCodeExporter

The early stop criterion was using EvolutionBestCost instead of 
EvolutionBestCostTest - this may have been the source of the error.
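To illustrate why the choice of cost matters here, below is a minimal Python sketch of a patience-based early stop. It is not the project's code: `should_stop`, `patience`, `min_delta` and the cost histories are all hypothetical, and the numbers are made up for illustration. The point is only that a training cost that keeps falling never triggers the stop, while a test cost that has plateaued does.

```python
def should_stop(cost_history, patience=3, min_delta=0.0):
    """Return True if the best cost in the last `patience` generations
    is no better than the best cost seen before them.

    Hypothetical helper: names and defaults are assumptions, not the
    project's actual early-stop rule.
    """
    if len(cost_history) <= patience:
        return False  # not enough generations to judge
    best_before = min(cost_history[:-patience])
    best_recent = min(cost_history[-patience:])
    # Stop when the recent generations did not improve by more than min_delta.
    return best_recent >= best_before - min_delta


# Made-up histories: training cost keeps dropping, test cost plateaus.
train_cost_history = [0.50, 0.40, 0.30, 0.20, 0.10, 0.05]
test_cost_history = [0.50, 0.40, 0.35, 0.36, 0.37, 0.40]

stop_on_train = should_stop(train_cost_history)
stop_on_test = should_stop(test_cost_history)
```

With these histories the training-cost check keeps the GA running while the test-cost check stops it, which is the kind of difference that could let extra features accumulate.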

Original comment by alistair...@gmail.com on 28 Oct 2011 at 10:37

@GoogleCodeExporter

The crsov_SP function, which calculates the "children" genomes from the genomes 
chosen for crossover, pairs each parent with the parent at maximal genetic 
distance from it. The implementation can be understood as:

Get genetic distance between all N parent genomes
for i = 1 to N
 - Get parent genome maximally distant from parent genome i
 - Cross-over these two genomes to produce two new genomes
 - Place these two new genomes into children genome population
end
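The loop above can be sketched in Python as follows. This is only an approximation of the description, not a port of crsov_SP: it assumes genomes are binary feature masks, uses Hamming distance as the "genetic distance", and uses single-point crossover. All function names are hypothetical.

```python
import random


def hamming(a, b):
    """Number of positions at which two equal-length genomes differ."""
    return sum(x != y for x, y in zip(a, b))


def most_distant_partner(parents, i):
    """Index of the parent farthest from parents[i] (ties go to the lowest index)."""
    candidates = (j for j in range(len(parents)) if j != i)
    return max(candidates, key=lambda j: hamming(parents[i], parents[j]))


def single_point_crossover(a, b, point):
    """Swap the tails of two genomes at `point`, giving two children."""
    return a[:point] + b[point:], b[:point] + a[point:]


def crossover_max_distance(parents, rng):
    """Cross every parent with its maximally distant partner (2 children each)."""
    children = []
    for i in range(len(parents)):
        j = most_distant_partner(parents, i)
        point = rng.randrange(1, len(parents[i]))  # cut strictly inside the genome
        children.extend(single_point_crossover(parents[i], parents[j], point))
    return children


parents = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1]]
children = crossover_max_distance(parents, random.Random(0))
```

Note that nothing stops the same partner from being chosen on every iteration, so N parents yield 2N children that may share one partner's genes.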

For this issue specifically, this means that as the GA begins to converge, it 
becomes more and more likely to cross the good genomes with a single, maximally 
distant genome. If that genome carries a useless feature, the feature may be 
incorporated into the "good" genome population, both because the good genomes 
cross over with that one genome so often and because additional features do 
not necessarily worsen performance.
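A small Python sketch makes the effect concrete. It assumes binary feature masks and Hamming distance, neither of which is confirmed for crsov_SP, and the population is invented for illustration: four nearly identical genomes plus one outlier.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length genomes differ."""
    return sum(x != y for x, y in zip(a, b))


def partner_counts(parents):
    """Count how often each parent index is chosen as the most distant partner."""
    counts = Counter()
    for i, genome in enumerate(parents):
        others = (k for k in range(len(parents)) if k != i)
        partner = max(others, key=lambda k: hamming(genome, parents[k]))
        counts[partner] += 1
    return counts


# Invented, nearly converged population; index 4 is the outlier.
population = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],  # carries features the others do not use
]
counts = partner_counts(population)
```

Here the outlier is the chosen partner for all four of the converged genomes, so its features have four chances to enter the next generation.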

A potential fix is to remove maximally distant genomes from the candidate pool 
as they are crossed over, so that each genome produces exactly two children, 
but this is just a theory.
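One possible reading of that fix, sketched in Python under the same assumptions (binary feature masks, Hamming distance, hypothetical function names): each genome is removed from the pool once it has been paired, so no single outlier can be the partner more than once. This is untested against the real GA.

```python
def hamming(a, b):
    """Number of positions at which two equal-length genomes differ."""
    return sum(x != y for x, y in zip(a, b))


def pair_without_replacement(parents):
    """Pair each remaining genome with its most distant remaining genome.

    Both members of a pair leave the pool, so every genome takes part in
    at most one crossover. With an odd number of parents, one is left unpaired.
    """
    remaining = list(range(len(parents)))
    pairs = []
    while len(remaining) >= 2:
        i = remaining.pop(0)
        j = max(remaining, key=lambda k: hamming(parents[i], parents[k]))
        remaining.remove(j)
        pairs.append((i, j))
    return pairs


# Invented, nearly converged population; index 4 is the outlier.
population = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
]
pairs = pair_without_replacement(population)
```

With this pairing the outlier is used once and the other crossovers happen among the converged genomes. The trade-off is fewer children per generation, since each pair yields two children rather than each parent yielding two.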

Original comment by alistair...@gmail.com on 30 Nov 2011 at 4:04
