
Artipixoids utility in Origin of Life research #1

Open
Arcprotorp opened this Issue Jan 14, 2019 · 8 comments


Arcprotorp commented Jan 14, 2019

Dear Andrey Zamaraev,

I will try to keep this short (or at least I tried). I am a biology PhD student working on topics in evolutionary and theoretical biology, and I have been thinking about a project to try out an "origin of life" related model. In particular, our department (ELTE, Hungary; Department of Plant Systematics, Ecology and Theoretical Biology) generally works on cellular automaton-like simulations, but most of them are C-based and require supercomputers to run at a satisfactory speed.

I have always wondered whether GPUs could be utilized for these tasks, but being a biologist, programming is already dark magic to me (except for basic Python, which my humble self prefers). Learning GPU programming might exceed my capabilities. However, I was delighted when I saw your Xentica project (including its predecessor, EvoLife); it seems to run pretty smoothly without any "SkyNet supercluster". In short, your work is simply awesome, and I really hope to promote it in my field.

So I was wondering:
-First of all: would you allow me to use Xentica for my own research?
-Is there a way to implement the state of a cell as an array like [2,3,4,3,3,3,1]? Each number stands for a given "gene", and this vector would be used during cell division, where the daughter cells inherit it as a kind of "random share", like a hypergeometric distribution. Would such array states slow down the simulation?
-Could two kinds of neighborhood be implemented: one used as intended, and another used only when a cell divides, so that one of the daughter cells occupies a random position no farther away than this "division distance"?

Best regards,
Adam Radvanyi

P.S.: Sorry for any blatantly naive questions; as I said, I am not a programmer, I only use Python environments.

a5kin commented Jan 15, 2019

@Arcprotorp
Welcome to the project discussion, Adam! Really glad to hear from someone interested in genetic CA models. I will give you a detailed answer later today.

a5kin commented Jan 16, 2019

@Arcprotorp
Dear Adam,

I am very impressed; I never thought it would be a whole department running evolutionary CA simulations. That's pretty cool indeed!

A GPU backend is what any simulation field needs now, in my opinion. If properly prepared, sims can run orders of magnitude faster than typical CPU implementations. Two GTX 1080 Ti cards in your home workstation could easily outperform a CPU cluster; the raw power of 7168 parallel cores really rocks! So, if you run a lot of simulations, you should consider it a serious option.

Regarding Xentica, of course you are free to use it any way you like, including commercially; the MIT license is very permissive. You can also fork the project and modify it as needed. But I have to admit the project is in a very early state right now; a lot of features still have to be implemented to bring it to a mature level.

In the next version (0.2) I plan to focus on evolutionary features: genome as a cell property, random crossover with neighbors, mutation, per-cell RNGs, etc. The main goal of 0.2 is to run the EvoLife model built with Xentica, but it should also yield many more tools for building a wide range of evolutionary models. And it's already multidimensional ;)

So, the scenario you described should be quite possible to implement in 0.2 using out-of-the-box features. The only exception is two neighborhoods at the same time; you must stick to just one. For example, if the first neighborhood is a subset of the second, you would choose the larger one. But there is a catch: a large neighborhood may slow the whole simulation significantly, and the model will also consume RAM in proportion to the number of neighbors. My advice here is to think about a possible workaround in the model logic and try to reach the same behavior using the closest neighborhood only. One possible solution is "delayed division": when ready to divide, the cell passes a "readiness" flag (along with its genome) to a random neighbor. If that neighbor is empty, a new cell is born there; otherwise, the flag is passed on to another random neighbor. Or you could use an incrementing counter instead of a flag and let the new cell be born only when it reaches a threshold. What you get, then, are randomly "wandering" seeds of new cells.
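Just to make the idea concrete, here is a rough Python sketch of such a "wandering seed" (the 1-D toy grid and all names are my own invention, purely for illustration; real Xentica code looks nothing like this):

```python
import random

def wander_seed(grid, start, genome, max_hops=10):
    """Pass a division 'seed' to random neighbors until an empty
    site is found; the seed is lost after max_hops failed hops."""
    pos = start
    for _ in range(max_hops):
        pos = (pos + random.choice((-1, 1))) % len(grid)  # random neighbor (wrapped)
        if grid[pos] is None:                             # empty site: new cell is born
            grid[pos] = {"genome": list(genome)}
            return pos
    return None  # seed wandered too long and died

# Toy 1-D grid: None = empty site, dict = living cell.
grid = [{"genome": [2, 1, 3]}, None, {"genome": [1, 1, 1]}, None]
born_at = wander_seed(grid, 0, grid[0]["genome"])
```

The counter variant would just carry an integer along with the genome and only plant the cell once the counter crosses a threshold.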

As for simulation speed, in Xentica it is influenced by two major factors:

  1. Single-cell in-memory size. The more info you encode into a cell's state, the slower it gets. We also buffer neighbor states for each cell, so the more neighbors a cell has, the slower the sim runs.
  2. The complexity of the model's logic. The more expressions you use, and the heavier they are, the slower it runs. Addition, multiplication, and bit operations are cheap; division and trigonometry are expensive. Conditionals may also cause slowdowns, due to GPU architecture traits.

Btw, do you have any deadline for your research? The closest date by which the evolutionary features could be completed is late April, I think. I will start working on 0.2 in February; you could watch (eye icon) the Xentica repo to stay updated on progress.

Stay tuned,
Andrey

Arcprotorp commented Jan 17, 2019

Dear Andrey,

Thanks for the reply! To be precise, it's not the whole department working on CAs, but it is still at least a dozen people (and there are other CA fanatics in the field). As for a deadline, I am not in a hurry; I have plenty of other work to do, and I also want to understand the basics of GPU simulations, your Xentica environment, and maybe how CUDA works. So I guess I will be able to keep myself busy until April, especially with those nice features coming :)

Regarding my "lots-to-refine" model, and your concerns:

  1. Yes, I suspected that the main bottleneck would be the information stored per cell and per buffered neighbor. However, the latter might be simplified somehow, as I intend to account for only some "general property" of the neighborhood.
  2. Operation-wise, geometric means have been used in similar models (as a general descriptor of state: how "fit" a cell is, or how "good" the neighborhood is). I do not know how messy this will make things, but I will think about it, and also about using as few conditionals as possible.
  3. Your proposed division mechanism might actually be more realistic than mine, which is great.
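For reference, the geometric mean over a gene-count vector (my guess at the kind of descriptor meant here; the exact formula in the model may differ) is a one-liner in NumPy, though the n-th root is exactly one of those "expensive" GPU operations you mentioned:

```python
import numpy as np

def geometric_mean(counts):
    """Geometric mean of a gene-count vector.

    Note: the result is zero as soon as any count is zero,
    which matters for the fitness discussion below.
    """
    counts = np.asarray(counts, dtype=float)
    return float(counts.prod() ** (1.0 / counts.size))
```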

What I currently have is a GTX 1050 Ti, but should Xentica prove itself for this, I see no reason not to buy a bunch of GPUs and just mindlessly explore the parameter space (or play computer games in 128K). Also, since CAs can be used in so many different ways, I don't get why GPU-based CA sims haven't become popular. There are a few articles, but no one has actually adapted them for research.

All in all, I will definitely stay tuned, and if I come up with a more refined idea (that you are interested in), I might post it here.
Adam

a5kin commented Feb 22, 2019

@Arcprotorp
Greetings, Adam!

I'm still here, just a bit busy with my music project =) At last, the EP is published and I'm ready to dive back into Xentica. I will focus on it over the next few months in my spare time. Hopefully, version 0.2 will be ready before Xentica's 2nd anniversary (May 5th).

How are things going on your side? Did you try to run Xentica/Moire in your environment? If you have any problems running it, please feel free to open an issue and we will try to figure it out. I should say, a GTX 1050 Ti should really be enough to try all your ideas and to run small to medium-sized simulations. More powerful cards are only required if you plan to simulate huge fields (that do not fit in GPU RAM), or if you need more speed (a single 1080 Ti will run 3-4x faster).

Did you refine your model's idea? If you don't mind sharing, it would be great to read a detailed description of it. Then I would keep it in mind while developing new features, just so I don't overlook something. You could also send it to samuraefff at gmail dot com if you'd rather not share it publicly.

Cheers,
Andrey

Arcprotorp commented Feb 23, 2019

Good to hear from you Andrey!

I'll check out your music; in fact, I was waiting for it :)

Regarding Xentica, I somehow managed to run it, although I had doubts I could sort everything out on Windows (I should really switch to Linux). The testing was successful (CUDA drops a UserWarning, but it runs anyway):
Conway's Life: ca. 2.29G cells/s
Shifting Sands: ca. 2.54G cells/s
The Moire GUI also works fine; it is so awesome to see a running simulation with your own eyes.

Regarding my model, I decided to try implementing it on a simpler scale using NumPy, where at least I can also practice vectorized solutions. It is still in progress; I just have to figure out cell divisions. I also have other projects to worry about, so I am stalling. But for the model itself, I managed to put together a simple "schematic". Originally I made it for my supervisor, so I hope it is not complete biologibberish:
https://docs.google.com/document/d/17uWV1fJ2lcpmT3111VYcLEf63KYNV_5zrrn2PeJDw7o/edit?usp=sharing

But despite being stuck in NumPy, I still hope that sooner or later I can switch to Xentica, possibly as the first evolutionary biology experiment running on a GPU.
Best,
Adam

a5kin commented Feb 26, 2019

@Arcprotorp
Whoa, those are pretty good results for a GTX 1050 Ti! Do you own a factory-boosted version of it? My friend's card performs 20% worse for some reason; maybe other hardware and drivers affect the speed. Anyway, I'm very glad you ran it. You are probably the first who did it on Windows =)

Thank you so much for sharing your model's idea. I finally get the basic concept of how it should work, and it seems very interesting. I'll try to implement something similar with Xentica while building the new features. I have some doubts, though:

  1. The fitness function drops to zero if any enzyme group is empty. This means the cell will inevitably decay with no further chance to grow. Is this desirable behaviour, or am I missing something? I would fix it with a hack like adding 1 to every value in the product, then subtracting 1 from the rooted result.

  2. The best performance W_max, as a globally calculated value, could cause a significant slowdown in a GPU simulation. I would exchange it for something purely local to each cell, like taking the maximum fitness over the strict neighbors only. Or, even better, have some "fitness field" powered by a separate parameter that spreads at the 'speed of light' over the board and decays over time. It would then produce smoothly blending areas of maximum fitness around cells. I have planned to implement field-like cell properties in core Xentica anyway. It was scheduled for version 0.3, which will deal more with physical simulations, but maybe it would be great to have it earlier.

  3. I'm a bit lost with your idea of cell division. First, "SCM division" is barely googlable; I found nothing I could use as a clue. Can you please give any refs for it? Second, I don't see why the enzyme distribution after division is hypergeometric. As I understand it, we draw enzymes one by one and put each into one of the daughter cells with constant probability, correct? Then it is at most a binomial distribution over each enzyme group. Can you please give me more detail on how enzymes are distributed during the division phase?
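To illustrate the hack from point 1, here is a sketch in NumPy terms (assuming fitness is a geometric mean of the enzyme counts, which is only my guess at your formula):

```python
import numpy as np

def fitness_plain(counts):
    """Geometric mean of enzyme counts: zero if any group is empty."""
    counts = np.asarray(counts, dtype=float)
    return float(counts.prod() ** (1.0 / counts.size))

def fitness_hacked(counts):
    """The '+1 hack': add 1 to every value before the product and
    subtract 1 from the rooted result, so an empty group lowers
    fitness without pinning it to zero."""
    counts = np.asarray(counts, dtype=float)
    return float((counts + 1.0).prod() ** (1.0 / counts.size) - 1.0)
```

With the hack, a cell like [0, 2, 3] keeps a small positive fitness instead of being doomed outright.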

I should say, I'm glad you chose NumPy over Xentica :) For small models, the GPU gives no significant performance boost. Also, we both get less pain and more time to see how your model adapts to parallel execution. I will update you as soon as I make some progress with your concept. Btw, could I include your model in the official Xentica examples if it succeeds? If so, what name would you prefer for it?

Warm regards,
Andrey

Arcprotorp commented Feb 27, 2019

@a5kin Well, in fact this GTX 1050 Ti is running in a Dell G5 15 notebook, and hardly any info can be found about it; I did not overclock it in any way. However, I noticed these results only while plugged in; running on battery was much worse.

About your remarks:

  1. Indeed, this is an intended feature: a cell missing any enzyme should not function, because some key reaction (provided by that enzyme) is missing; therefore the cell is meant to decay/fade away slowly.
    Note: A later extension is planned with a "horizontal gene transfer" mechanism, where neighboring cells can "donate" some genes permanently to their neighbors, giving decaying cells a chance to "reignite" themselves.

  2. Yes, I suspected this would emerge as a problem on GPUs. I was thinking of a purely local solution, where each cell's fitness is based on a sigmoid-like curve calculated somehow. The reason for sticking with the global max is that it is usually simple: my colleagues tend to use similar approaches, and it is also simple in NumPy. However, almost everyone agrees that it is not the only solution, so I am entirely open to ideas. Could you elaborate on this "fitness field"? How does it work in detail?

  3. You are totally right, and I also noticed this, mea culpa. It is definitely binomial. My brain is not built for parallel work; I was grading homework on discrete probability distributions at the time. SCM is the abbreviation of the Stochastic Corrector Model coined by Prof. Eörs Szathmáry, but you just described exactly what it is meant to do.
    Corresponding sources:
    http://originoflife.net/stochastic_corrector/
    https://www.ncbi.nlm.nih.gov/pubmed/2451771
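For what it's worth, the binomial "random share" at division is a one-liner in NumPy (my own sketch; the names are made up):

```python
import numpy as np

rng = np.random.default_rng()

def divide(genome):
    """Split a gene-count vector between two daughter cells: each
    enzyme copy independently goes to daughter A with probability 0.5,
    so each gene's share follows a binomial distribution."""
    daughter_a = rng.binomial(genome, 0.5)
    return daughter_a, genome - daughter_a

parent = np.array([2, 3, 4, 3, 3, 3, 1])
a, b = divide(parent)  # a + b always equals the parent vector
```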

"Btw, could I include your model to official Xentica examples if succeed? If so, what name would you prefer for it?"
Oh, hell yes, it would be an honor! Well, I am still thinking about the name; I guess "the Metabolically Coupled Replicator System of Stochastic Corrector Models" is a bit too long and cryptic.
Currently, the NumPy-based script goes by the name "MSE", which should be an abbreviation of something yet to be devised. All suggestions are welcome.

Best regards,
Adam

a5kin commented Mar 2, 2019

@Arcprotorp
Thanks for the clues to the SCM, Adam. And also, congratulations on opening a repo for MSE! I've already peeked into it (looks convincing), although I haven't had much time to dive into the details. "Stochastic Corrector" sounds super cool to me; it's menacing, like a metal band's name =) Let me name the example script/class that, and I'll put the full name and references to your model in its description.

The CA-powered "fields" are just an idea constantly rumbling in my head; I have never written it down before, nor tried to implement it. The idea itself is simple: each cell has a separate real-valued parameter called "field potential". Then, at each step, you 1) find the max potential in the neighborhood, including self; 2) calculate some momentary value for the cell (it could be mass, charge, or, in your case, fitness); 3) calculate some mean of the momentary value and the maxed potential, which becomes the new potential. Optionally, you could divide corner neighbors by sqrt(2) to let the value propagate in "circles".

What you get as a result (or at least as I imagine it) are field potentials asymptotically following the momentary values and pulling neighbors in the same direction. That's the theory; in practice, who knows. You'd need to experiment and check a bunch of modifications: a weighted sum instead of the max in step 1, a moving/weighted average in step 3, playing with the params, and so on. With a sum, values decay faster, but you get interesting behavior, as neighbors feed back in subsequent steps. I think it would produce wave-like patterns, which could be good for physical models.

In summary, for each cell you would 1) calculate fitness, 2) apply the field dynamics, 3) use the current field potential as W_max. The whole process is fully parallel. It could give your model completely different/undesirable behavior, though. And if you are using NumPy, it is indeed much simpler (and quicker) to calculate the global maximum.
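Steps 1-3 could be sketched in NumPy like this (my interpretation only: a Moore neighborhood via np.roll on a wrapped board, and a plain mean in step 3; the corner-neighbor sqrt(2) correction is left out for brevity):

```python
import numpy as np

def field_step(potential, momentary, blend=0.5):
    """One step of the hypothetical 'fitness field':
    1) max potential over the Moore neighborhood, including self;
    2) the cell's momentary value (e.g. fitness) is given;
    3) new potential = blend of momentary value and maxed potential."""
    neigh_max = potential.copy()
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            shifted = np.roll(np.roll(potential, dx, axis=0), dy, axis=1)
            np.maximum(neigh_max, shifted, out=neigh_max)
    return blend * momentary + (1.0 - blend) * neigh_max

# A single peak spreads one cell per step and decays toward
# the momentary values (here: all zeros).
potential = np.zeros((5, 5))
potential[2, 2] = 1.0
new = field_step(potential, np.zeros((5, 5)))
```

Each cell would then read its own `new` value as the local W_max instead of a global reduction.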

If you'll find something interesting with described "fields" approach, please let me know.

Best regards,
Andrey
