Change Genome.dna from List[int] to numpy.ndarray #158

Closed
jakobj opened this issue Jun 24, 2020 · 8 comments
Labels
wontfix This will not be worked on

Comments

@jakobj
Member

jakobj commented Jun 24, 2020

The dna list has homogeneous types (all entries are integers), so we might as well use a numpy array to store its contents. For example, this would speed up operations on the whole dna, such as those suggested in #157.
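
As a rough illustration of a whole-dna operation, the sketch below counts the positions at which two genomes differ, once with plain lists and once with arrays. The gene values and the names `dna_a`, `dna_b` are made up for this example and are not taken from the library.

```python
import numpy as np

# Hypothetical genomes; the values are invented for illustration only.
dna_a = [0, 1, 2, 1, 0, 0, 2, 1, 1]
dna_b = [0, 1, 2, 2, 0, 1, 2, 1, 1]

# List version: an explicit Python-level loop over every gene.
n_diff_list = sum(a != b for a, b in zip(dna_a, dna_b))

# Array version: one vectorized comparison over the whole dna.
arr_a = np.array(dna_a, dtype=int)
arr_b = np.array(dna_b, dtype=int)
n_diff_array = int(np.count_nonzero(arr_a != arr_b))

print(n_diff_list, n_diff_array)
```

Both versions give the same count; whether the array version is noticeably faster depends on the genome length.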

@jakobj jakobj added the enhancement New feature or request label Jun 24, 2020
@jakobj jakobj added this to the 0.2.0 milestone Jun 24, 2020
@jakobj
Member Author

jakobj commented Jun 29, 2020

i think as we make this change we can also change from a one-dimensional representation with our custom access functions to a three-dimensional structure with (rows, cols, genes) dimensions. i think this would simplify the implementation significantly. however, we'd need to think a bit about how to treat input/output nodes as their number might exceed the row count.
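
A minimal sketch of the difference, assuming the flat dna stores the regions in row-major order with a fixed number of genes per region. That layout, the helper `gene_flat`, and the grid sizes are assumptions made for this example; the actual ordering in the code base may differ, and input/output nodes are left out here.

```python
import numpy as np

n_rows, n_cols, n_genes = 2, 3, 3  # hypothetical grid of hidden nodes
flat = list(range(n_rows * n_cols * n_genes))  # stand-in gene values


def gene_flat(dna, row, col, gene):
    """Custom access into the 1-d list (assumes row-major region order)."""
    return dna[(row * n_cols + col) * n_genes + gene]


# The same data viewed as a (rows, cols, genes) array.
grid = np.array(flat, dtype=int).reshape(n_rows, n_cols, n_genes)

# Index arithmetic versus direct indexing.
print(gene_flat(flat, 1, 2, 0), grid[1, 2, 0])
```

With the array, a whole region is simply `grid[row, col]`, and no index arithmetic is needed.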

@HenrikMettler
Contributor

You can assign me, if you want. Imo there are 3 options:

  • Use a 2-dim array: n_unit x n_genes_per_unit. This has the disadvantage that levels-back is more complex to apply.
  • Split the dna into 3 arrays: one for the inputs (n_inputs x 1 x n_genes), one for the hidden nodes (n_rows x n_cols x n_genes), and one for the output (1 x 1 x n_genes). Apart from the obvious disadvantages, this would have the advantage that we could also change the arity of inputs and outputs, since they are separate.
  • Fill inputs and outputs with NaN values to match the dimensionality.
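
A rough sketch of the second and third options, with made-up sizes. One caveat for the third option: NumPy integer arrays cannot hold NaN, so padding needs either a float dtype or an integer sentinel. The sentinel `PAD = -1` and the padded layout (one input column, the hidden columns, one output column) are assumptions for illustration.

```python
import numpy as np

n_inputs, n_rows, n_cols, n_genes = 2, 3, 4, 3  # hypothetical sizes

# Option 2: three separate integer arrays with the shapes listed above.
inputs = np.zeros((n_inputs, 1, n_genes), dtype=int)
hidden = np.zeros((n_rows, n_cols, n_genes), dtype=int)
output = np.zeros((1, 1, n_genes), dtype=int)


def int_array_accepts_nan():
    """Check whether NaN can be written into an integer array."""
    arr = np.zeros(2, dtype=int)
    try:
        arr[0] = np.nan
    except ValueError:
        return False
    return True


# Option 3: one padded array, using an integer sentinel instead of NaN.
PAD = -1
height = max(n_inputs, n_rows, 1)
padded = np.full((height, n_cols + 2, n_genes), PAD, dtype=int)
padded[:n_inputs, 0] = 0   # input column
padded[:n_rows, 1:-1] = 0  # hidden columns
padded[:1, -1] = 0         # output column

n_unused = int((padded == PAD).sum())
print(int_array_accepts_nan(), n_unused)
```

The count of padded entries shows how much of the array is unused whenever the numbers of inputs, rows, and outputs differ.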

@jakobj
Member Author

jakobj commented Jul 21, 2020

good suggestions! this is not as easy as i thought, as the number of inputs, rows, and outputs can differ, so a 2d array would always have regions that are not used, as you also observe. maybe we should rethink the benefits of changing the representation of the dna. what would we gain over the current implementation from any of the three options?

@HenrikMettler
Contributor

I guess it is no news that the theoretical benefits are less memory and faster compute (see e.g. https://www.geeksforgeeks.org/python-lists-vs-numpy-arrays/).
But in practice, if I am not mistaken, only a few things happen with the dna per individual per generation: evaluation to a function (learning rule) and replacement of some values in mutation. I don't think that in any practical use case this is computationally expensive compared to the fitness evaluation. So in essence I don't think the benefits would be large.

@mschmidt87
Member

I think that one advantage of a representation as a multi-dimensional array would be that a lot of our code becomes simpler. In particular, iter_input_regions and the other iterators become easier to write because we don't have to do the complicated indexing of the 1-d data structure.
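
To make the point concrete, here is a hedged sketch of a region iterator written both ways. The functions `iter_regions_flat` and `iter_regions_array` are hypothetical stand-ins and do not reproduce the library's `iter_input_regions`; the sketch assumes every region has the same number of genes.

```python
import numpy as np


def iter_regions_flat(dna, n_genes):
    """Iterate over regions of a flat list using manual index arithmetic."""
    for region_idx in range(len(dna) // n_genes):
        start = region_idx * n_genes
        yield region_idx, dna[start:start + n_genes]


def iter_regions_array(dna, n_genes):
    """Iterate over regions of an ndarray by viewing it as (regions, genes)."""
    yield from enumerate(dna.reshape(-1, n_genes))


flat = [0, 1, 2, 3, 4, 5, 6, 7, 8]  # made-up gene values
arr = np.array(flat, dtype=int)

from_list = [(i, r) for i, r in iter_regions_flat(flat, 3)]
from_array = [(i, r.tolist()) for i, r in iter_regions_array(arr, 3)]
print(from_list == from_array)
```

The array version delegates the bookkeeping to `reshape`, which is where the reduction in code complexity would come from.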

@mschmidt87
Member

Any opinions here?

@jakobj
Member Author

jakobj commented Jul 31, 2020

in my opinion there's no need to change it from its current form, but i would be convinced by an implementation that leads to a decrease in code complexity ;)

@jakobj jakobj removed the help wanted Extra attention is needed label Aug 3, 2020
@jakobj jakobj removed this from the 0.2.0 milestone Aug 4, 2020
@jakobj jakobj added wontfix This will not be worked on and removed enhancement New feature or request labels Aug 4, 2020
@jakobj
Member Author

jakobj commented Aug 4, 2020

closing for now. feel free to reopen in combination with a concrete implementation.

@jakobj jakobj closed this as completed Aug 4, 2020
3 participants