Make the algorithm actually produce smart behavior #3

Open
cool-RR opened this issue Oct 11, 2020 · 1 comment

Comments

cool-RR (Owner) commented Oct 11, 2020

No description provided.

@LunarEngineer

First, I'd like to say that this is interesting, but if this works it will turn into a societal issue.

If you claim that you can predict human behavior then you open the door to preferential treatment of distinct groups.

Second, I think you have a core issue which, if solved, would enable this.

Please understand that this is a modestly informed opinion, and there are likely better ones, but I thought this was intriguing and wanted to lay this out as a potential solution.

That core issue is building a notion of how to trust, and act on information from, another agent. This will allow you to have agents which can be rewarded as a function of "I see you and the way you think. You are more successful than me. This is my reward. This is yours." That would be connected directly to an action which moves their parameter weights through the parameter space. Basically: which of your weights do I want to take, and by how much?

In code, this would be reasonably easy.

Produce an array of agent weights, map along the actions as a float from zero to one, and then apply a "move x percent toward this number" update as a function of my net's weights.
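
For example, a minimal NumPy sketch of that interpolation step (the array shapes and the trust value are placeholders I'm assuming, not anything from this repo):

```python
import numpy as np

def pull_toward(my_weights: np.ndarray, their_weights: np.ndarray,
                trust: float) -> np.ndarray:
    """Move my weights a fraction `trust` (0..1) of the way toward theirs."""
    trust = float(np.clip(trust, 0.0, 1.0))
    return my_weights + trust * (their_weights - my_weights)

# With trust=0.25, each weight moves a quarter of the way toward the
# more successful agent's value.
mine = np.array([0.1, -0.3, 0.7])
theirs = np.array([0.5, 0.1, 0.2])
print(pull_toward(mine, theirs, 0.25))
```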

Ensure that you've got some samples of an agent doing "something" (an array of their internal hyperparameter values, all current net weights, and a unique identifier per agent) so that agents can learn that higher-rewarded agents are more trustworthy. I would recommend a sample construction strategy in which reward trends linearly upward with overall mean change but downward with age. This would provide a sample space for the agents to begin training with, which allows for learning to trust and adopt behavior patterns.
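
Here's a toy generator for that kind of sample space; the field names and coefficients are made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_sample(agent_id: int, n_weights: int = 8) -> dict:
    """One synthetic record: reward rises roughly linearly with overall
    mean weight change and falls with age, plus noise."""
    age = int(rng.integers(1, 100))
    weights = rng.normal(size=n_weights)
    mean_change = float(np.abs(rng.normal(scale=0.5)))
    reward = 2.0 * mean_change - 0.01 * age + float(rng.normal(scale=0.1))
    return {'agent_id': agent_id, 'age': age, 'weights': weights,
            'mean_change': mean_change, 'reward': reward}

samples = [make_sample(i) for i in range(1000)]
```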

I think this is very doable. Following the sampling strategy above as you train, your agents can see their "regret" ("if I changed, this would be my reward"), which allows them to differentiate trust on a per-agent basis within a society.

That allows you to associate trust in named agents.
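
In code, the per-agent trust bookkeeping could be as simple as this (the update rule and names are just illustrative assumptions):

```python
from collections import defaultdict

# Trust per named agent, built up from observed regret: how much better my
# reward would have been had I adopted that agent's weights.
trust_table = defaultdict(float)

def update_trust(other_id: int, my_reward: float,
                 counterfactual_reward: float,
                 learning_rate: float = 0.1) -> None:
    regret = counterfactual_reward - my_reward
    trust_table[other_id] += learning_rate * regret
```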

To short-circuit this and prevent unnecessary training, have all agents draw from a single pool of experience at training time to start, which then transitions to individual experience.
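
Something like this replay pool would do it; the mixing schedule is an assumption of mine:

```python
import random

class ExperiencePool:
    """Replay storage that starts communal and gradually becomes per-agent."""

    def __init__(self, n_agents: int, switch_over_steps: int = 10_000):
        self.shared = []
        self.individual = {i: [] for i in range(n_agents)}
        self.switch_over_steps = switch_over_steps
        self.step = 0

    def add(self, agent_id: int, transition) -> None:
        self.shared.append(transition)
        self.individual[agent_id].append(transition)

    def sample(self, agent_id: int, k: int) -> list:
        # The chance of drawing from the shared pool decays to zero as
        # training progresses, so agents shift to their own experience.
        self.step += 1
        p_shared = max(0.0, 1.0 - self.step / self.switch_over_steps)
        pool = self.shared if random.random() < p_shared else self.individual[agent_id]
        return random.sample(pool, min(k, len(pool)))
```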

That will allow you to first gradient climb to a point where you can associate trust across a society.

At that point you need to begin punishment as a random function of the number of times the agent has undergone a transition. This needs to enforce a terminal reward signal as the function's input grows.

This will provide the idea of 'sooner or later, you have to just be yourself', while still allowing for change to happen.

An agent which strongly believes it can improve by changing its weight sets will still be able to gradient climb by weighting, but a decision point will arrive where undergoing this change could produce less reward over an infinite timespan than not changing.
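
As a sketch of that pressure (the exponential schedule is my own assumption; any increasing function of the transition count would do):

```python
import math
import random

def transition_terminates(n_transitions: int, scale: float = 5.0) -> bool:
    """The more often an agent has adopted another agent's weights, the more
    likely this adoption ends the episode with a terminal punishment."""
    p_terminal = 1.0 - math.exp(-n_transitions / scale)
    return random.random() < p_terminal
```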

Now you've provided for a differentiable reward signal with respect to societal trust coupled to agent experience.

Ensure that you produce a terminal state as an increasing random function of experience during training only. I would highly recommend that the agent reward, at some point, include a term that rewards an optimal balance of individuality across the population. You want to allow cliques, but prevent a population of loners, and definitely prevent the hive mind.
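
One way to express that balance (the target spread and the Gaussian shape of the bonus are assumptions on my part):

```python
import numpy as np

def individuality_bonus(all_weights: np.ndarray, target_spread: float = 1.0,
                        width: float = 0.5) -> float:
    """all_weights: (n_agents, n_weights). Returns a bonus that peaks when the
    average pairwise distance between agents equals target_spread, so both the
    hive mind (spread near zero) and a population of loners (huge spread)
    score low."""
    n = len(all_weights)
    diffs = all_weights[:, None, :] - all_weights[None, :, :]
    distances = np.linalg.norm(diffs, axis=-1)
    # Average over distinct ordered pairs; the zero diagonal is excluded.
    mean_distance = float(distances.sum() / (n * (n - 1)))
    return float(np.exp(-((mean_distance - target_spread) / width) ** 2))
```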

Now you've got individualized experience and your agents have a personality, but their actions still feed the communal pool of experience.

This allows for a racial memory which you can keep manageable by randomly dropping items as a function of experience history.

Be careful when you're doing this and ensure that a drop never shifts the underlying model of the trust-to-reward relationship. This prevents your agents from unlearning trust.
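
A sketch of that guarded pruning; collapsing the "relationship of trust to reward" into a single least-squares slope is a simplification I'm assuming:

```python
import random
import numpy as np

def trust_reward_slope(pool: list) -> float:
    """Pool items are assumed to be dicts with 'trust' and 'reward' keys."""
    trust = np.array([item['trust'] for item in pool])
    reward = np.array([item['reward'] for item in pool])
    return float(np.polyfit(trust, reward, 1)[0])

def prune(pool: list, drop_fraction: float = 0.1,
          tolerance: float = 0.05) -> list:
    """Randomly drop a fraction of the memory, but reject the drop if it would
    shift the trust-to-reward slope by more than `tolerance`."""
    before = trust_reward_slope(pool)
    keep = [item for item in pool if random.random() > drop_fraction]
    if len(keep) >= 2 and abs(trust_reward_slope(keep) - before) <= tolerance:
        return keep
    return pool  # Drop rejected; the memory is left unchanged this round.
```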

I predict that at this point you will see societal structure evolve and you could monitor at this point.
