[View in Colaboratory](https://colab.research.google.com/github/act65/Notes/blob/master/evolution_of_collaboration.ipynb)

In [0]:
import numpy as np
import numpy.random as rnd
import matplotlib.pyplot as plt

import logging
logger = logging.getLogger()
logger.setLevel(logging.WARNING)

Trading makes no sense unless it provides an advantage. The advantage (for all) is the increased efficiency that comes from specialisation.

Goal: maximise individual wealth (measured by ?)

#### Reputation

- Necessary: (multi-agent, resource and info (gossip) exchange, private ownership, conserved resources, ...?)
- Trajectory: Steal -> Reputation
- Problem it solves: Disincentivise bad actors

#### Safe trade dances

- Necessary: (bounded memory for assessing trust)
- Trajectory: Reputation -> Safe trade dances
- Problem it solves: reputation does not scale with many agents (requires too much memory)

#### Debt

- Necessary: accurate memory
- Trajectory: Safe trade dances -> Debt/credit/lending/borrowing
- Problem it solves: ??


#### (Ac)counting

- Necessary: ?
- Trajectory: Debt -> Accounting
- Problem it solves: Keeping track of many debtors/credits (can get messy quickly)

#### Money

- Necessary: (resource and info (gossip) exchange?, the ability to count)
- Trajectory: Accounting -> Money
- Problem it solves: Factorises information in the item-item trade matrix (like a rank 1 decomposition)
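
One way to read the rank-1 claim, as a minimal sketch: assume every item has a single money price (the `prices` vector below is hypothetical, not something defined in these notes). The item-item exchange-rate matrix is then an outer product, so its `n * n` entries are described by only `n` numbers, and relative prices can be recovered from the barter rates alone.

```python
import numpy as np

# Hypothetical money prices for four items (illustrative values only).
prices = np.array([1.0, 2.0, 4.0, 8.0])

# R[i, j] = units of item j received for one unit of item i.
# With a shared unit of account this is an outer product, so it has rank 1.
R = np.outer(prices, 1.0 / prices)
rank = np.linalg.matrix_rank(R)

# Recover relative prices from the barter matrix via its leading singular vector.
u, s, vt = np.linalg.svd(R)
recovered = np.abs(u[:, 0])
recovered = recovered / recovered[0]  # fix the scale: item 0 is the numeraire

print(rank, recovered)
```

Real barter rates would be noisy and inconsistent, so the matrix would only be approximately rank 1; the leading singular vector would then give a best-fit set of prices.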

#### Market (aka collaboration???)
 
- Necessary: (Harvest, safe exchange (aka trust?), bounded memory for assessing value)
- Trajectory: ?
- Problem it solves: individuals' value estimates do not scale well with many resources (requires too much memory)

#### Arbitration

- Necessary: Bounded memory, partial info, 
- Trajectory: ? -> Arbitration
- Problem it solves: Inaccurate estimates of resource values (but how does that help individuals/the rest of the market?)

Other questions

- How is a market like a message passing algorithm?
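
On the message-passing question, one hedged illustration is randomised gossip averaging: pairs of agents repeatedly meet and average their local estimates of a resource's value, and the population converges on a shared number without any central authority. This is only an analogy sketch; the pairing rule, the number of agents and the initial estimates are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_gossipers = 8
# Each agent starts with its own local estimate of a resource's value (made-up numbers).
estimates = rng.uniform(0.0, 10.0, n_gossipers)
target = estimates.mean()  # pairwise averaging preserves the mean

for _ in range(500):
    i, j = rng.choice(n_gossipers, size=2, replace=False)  # a random pair meets
    estimates[i] = estimates[j] = 0.5 * (estimates[i] + estimates[j])

spread = estimates.max() - estimates.min()
print(target, spread)
```

Each meeting is a purely local exchange, yet the end result is a globally agreed value, which is the contrast drawn in these notes between reputation (local) and money (global).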


In [0]:
def trade(r1, r2, t):
  """
  Want a representation that;
  - can smoothly go from un-safe trades to safe ones to money based trade.
  - that captures the rules of the trade, and not necessarily what is being traded.
  - That can be learned (need a space of trading schemes).

  Where does the money come from!? 
  It is a decomposition of t into what a1 and a2 value?
  t = np.outer(a.resource_values, b.resource_values) 
  
  Other thoughts;
  - what about 3 party trades? How to facilitate with a two party system?
  - What about mediators? And trusted third parties?
  - are linear functions sufficient to capture meaningful trades?
  - Problem is that reputation is a local fn and money is a globally agreed upon standard. 
  (that sounds similar to the gossip algorithm!?)
  
  Args:
    r1 (np.array): resources of agent 1. shape = 
    r2 (np.array): resources of agent 2. shape = 
    t (np.array):
    
  Returns:
    
  """
  # this is where marcus' work would fit in?!
  # simulating how the actual resources are exchanged. 
  # a richer representation than just using transition matrices.
  # would allow for a finer trajectory? steal -> holding -> safe trade ->
  
  
  # represent as a graph, with t as the transition matrix.
  # each column of t must sum to 1, so that the total amount of resources is conserved.
  
  r = np.hstack([r1, r2])  # could zip instead? would make easier to !??!
  
  
  r_ = np.dot(t, r)
  return tuple(np.split(r_, 2))

In [4]:
a, b = rnd.randint(0, 10, 4), rnd.randint(0, 10, 4)

X = np.fliplr(np.eye(2))
S = np.array([[1,1], [0,0]])
I = np.eye(len(a))

# no trade
nt = np.eye(len(a)*2)

# steal
st = np.kron(S, I)

# swap all
sw = np.kron(X, I)

# share
sh = np.kron(X, I)/2 + np.eye(len(a)*2)/2


# if it is being traded, a non zero in an off diagonal block. else identity
# t = np.array([
#   [1, 0, 0, 0.75],
#   [0, 0.25, 0, 0],  # trade 0.75 x rid2 for 0.25 x rid1
#   [0, 0.75, 1, 0],
#   [0, 0, 0, 0.25]
# ])



print(a, b)
trade(a, b, st)

[7 6 9 9] [8 3 0 1]


(array([15.,  9.,  9., 10.]), array([0., 0., 0., 0.]))
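
The matrices above all have columns that sum to 1, which is what lets `np.dot(t, r)` move resources around without creating or destroying any. A minimal check, rebuilding the same four schemes for two agents with four resources each (the helper name `conserves` is just for illustration):

```python
import numpy as np

n = 4  # resources per agent
X = np.fliplr(np.eye(2))
S = np.array([[1, 1], [0, 0]])
I = np.eye(n)

schemes = {
    "no trade": np.eye(2 * n),
    "steal": np.kron(S, I),
    "swap all": np.kron(X, I),
    "share": np.kron(X, I) / 2 + np.eye(2 * n) / 2,
}

def conserves(t, r1, r2):
    """True if applying t leaves the per-resource totals r1 + r2 unchanged."""
    new1, new2 = np.split(np.dot(t, np.hstack([r1, r2])), 2)
    return np.allclose(new1 + new2, r1 + r2)

r1, r2 = np.array([7, 6, 9, 9]), np.array([8, 3, 0, 1])
for name, t in schemes.items():
    print(name, t.sum(axis=0), conserves(t, r1, r2))
```

Column sums of 1 only guarantee that the grand total is conserved. The per-resource check passes here because every scheme has the form `np.kron(M, I)`, which never converts one resource into another.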

In [11]:
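# problem: what you say you want to trade might differ from what you actually trade (you might cheat?)

def trade(r1, r2, t1, t2=None):
  # t1 selects what agent 1 hands over, t2 what agent 2 hands over
  if t2 is None:
    t2 = t1.T
  give1, give2 = np.dot(t1, r1), np.dot(t2, r2)
  # each agent loses what it gives and gains what the other gives, so totals are conserved
  return r1 - give1 + give2, r2 - give2 + give1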
  
  
a, b = rnd.randint(0, 10, 4), rnd.randint(0, 10, 4)
print(a, b)

# no trade: a zero matrix hands nothing over (np.eye(4) would hand over everything)
t = np.zeros((4, 4))



# 
trade(a, b, rnd.permutation(np.eye(4)), rnd.permutation(np.eye(4)))




[4 8 3 2] [9 5 5 7]


(array([8., 5., 4., 9.]), array([5., 8., 4., 0.]))

In [0]:
def expected_reward(inputs):
  """
  This is where trust and value are captured. 
  (some sort of decomposition would be nice!? potential value and likelihood)
  
  Since the inputs are who, how, what, ... whichever gives the 
  best expected reward should be chosen.
  If an agent always trades favourably, then we should learn to assign them a 
  high expected reward. Thus learning to pay attention to the `agent_id`, aka reputation.
  
  Args:
    inputs (np.array): [agent_id, resource_id_sell, resource_id_buy]
  
  """
  # could start with table Q values?
  # placeholder until something is learned: every option looks equally good
  return 0.0

def choose_action(agent):
  """
  For now, only consider one for one trades?
  Or use 0, 2, 4, 6, 8?
  Will need to look into cts RL at some point?!
  """
  Q = np.zeros((n_agents, n_resources, n_resources))
  for i in range(n_agents):  # for each potential buyer (who)
    for j in range(n_resources):  # for each item to sell
      for k in range(n_resources):  # for each item to buy
        Q[i, j, k] = expected_reward(np.array([i, j, k]))
  # exploration noise, scaled like the agent's harvest policy
  noise = agent.rnd_explore * rnd.standard_normal(Q.shape)
  # argmax gives a flat index; unravel it into (agent_id, sell_id, buy_id)
  return np.unravel_index(np.argmax(Q + noise), Q.shape)
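
The "table of Q values" idea can be sketched with a simple running-average update. Everything below is an assumption made for illustration: the payoff rule (partner 2 always cheats), the learning rate `alpha`, and the unit-variance exploration noise. The point is only that averaging a learned table over the `who` axis yields something that behaves like reputation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_partners, n_items = 3, 2
Q = np.zeros((n_partners, n_items, n_items))  # Q[who, sell, buy]
alpha = 0.1  # learning rate (assumed)

def payoff(who, sell, buy):
    # Made-up environment: partner 2 always cheats, everyone else trades fairly.
    return -1.0 if who == 2 else 1.0

for _ in range(2000):
    noise = rng.standard_normal(Q.shape)  # exploration noise, as in Q + noise
    who, sell, buy = np.unravel_index(np.argmax(Q + noise), Q.shape)
    # Move the estimate for the chosen action towards the observed payoff.
    Q[who, sell, buy] += alpha * (payoff(who, sell, buy) - Q[who, sell, buy])

reputation = Q.mean(axis=(1, 2))  # average expected reward per trading partner
print(reputation)
```

Because the payoff here depends only on `who`, the table could be factored into a per-partner term, which hints at the "potential value and likelihood" decomposition mentioned in the docstring.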

In [0]:
def evaluate_trading_scheme(t):
  """
  A and B trade with C, but C rips A and B off. A and B tell D that C is a bad actor. 
  Next time C wants to make a trade, C chooses to trade with the agents who have 
  the best estimated reward (a tradeoff between price and trust?).

  To maximise expected reward (via trust), E comes up with a trade system that ...

  
  Aka, we use our Q functions to provide gradients, or act as a fitness fn, to evaluate various trade dances!?
  """
  
  # How is t constructed/agreed upon?!? !!!
  # Want to have each agent give some input/make a choice about t.
  # But for starters maybe just generate randomly and select fittest?
  pass
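
For the "generate randomly and select fittest" starting point, one hedged sketch is to sample random column-stochastic transition matrices and score each with a placeholder fitness. The fitness used here (how evenly a trade leaves the two agents' total holdings) is purely an assumption, standing in for the Q-function-based score described in the docstring; `random_scheme` and `fitness` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3  # resources per agent (illustrative)

def random_scheme(rng, size):
    # Each column is a Dirichlet draw, so columns sum to 1 and the
    # total amount of resources is conserved.
    return rng.dirichlet(np.ones(size), size=size).T

def fitness(t, r1, r2):
    # Placeholder fitness (an assumption): prefer schemes that leave both
    # agents with similar total holdings.
    new1, new2 = np.split(t @ np.hstack([r1, r2]), 2)
    return -abs(new1.sum() - new2.sum())

r1, r2 = np.array([9.0, 9.0, 9.0]), np.array([1.0, 0.0, 2.0])
candidates = [random_scheme(rng, 2 * n) for _ in range(50)]
scores = [fitness(t, r1, r2) for t in candidates]
best = candidates[int(np.argmax(scores))]
print(max(scores))
```

Note that a general column-stochastic matrix can turn one resource into another, so a real search space would probably need more structure than this.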

In [0]:
def diminishing_rewards(x):
  """
  As x increases y increases in diminishing amounts.
  This function is being used to capture the increase in 
  skill of an agent as it harvests the same resource.
  """
  # this should be normalised and bounded. need to find a better fn
  # maybe discretise it to the natural numbers as well?
  n = 1.1
  return ((x+1)**(1-n) - 1)/(1-n)

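def softmax(x):
  # subtract the max before exponentiating for numerical stability
  e = np.exp(x - np.max(x))
  return e/np.sum(e)

def cross_entropy(x):
  # despite the name, this is the elementwise binary entropy of x (for x in (0, 1))
  x = np.clip(x, 1e-12, 1 - 1e-12)  # avoid log(0)
  return -x*np.log(x) - (1-x)*np.log(1-x) 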
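
On the "normalised and bounded" note in `diminishing_rewards`: with `n = 1.1` the function is already bounded, since `(x+1)**(1-n)` tends to 0 and the output approaches `1/(n-1) = 10` (slowly). A minimal sketch of a version rescaled to lie in `[0, 1)`, assuming `n > 1`; the name `bounded_rewards` is hypothetical:

```python
import numpy as np

def diminishing_rewards(x, n=1.1):
    # Same formula as in the notebook, with n exposed as a parameter.
    return ((x + 1.0) ** (1 - n) - 1) / (1 - n)

def bounded_rewards(x, n=1.1):
    # The original expression multiplied by (n - 1), its limiting value's
    # reciprocal, which simplifies to this form. Output lies in [0, 1).
    return 1.0 - (np.asarray(x, dtype=float) + 1.0) ** (1.0 - n)

x = np.arange(0, 6)
y = bounded_rewards(x)
print(y)
```

The shape is unchanged (increasing, with shrinking increments); only the scale differs, so swapping it in would change the magnitude of harvests but not which resource an agent gets better at.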

In [0]:
# Would be faster (but maybe less intuitive) if I could
# frame these calculations from a global perspective.
# Batch them up into larger operations.

class Agent():
  def __init__(self, name, n_resources):
    self.name = name
    
    self.n_resources = n_resources
    self.rnd_explore = 0.5
    
    # harvest counts: the number of times each resource has been harvested.
    self.harvest_counts = np.zeros(n_resources)
    # the agent's current resources
    self.resources = np.zeros(n_resources)
    
    # the agent's estimate of the value of resources
    self.resource_values = rnd.random(n_resources)
    # the agent's estimate of ?
    
    
  """
  HARVEST.
  
  Doesn't seem necessary for trading, but is necessary for specialisation?
  """
  def harvest_resource(self, resource_id):
    self.harvest_counts[resource_id] += 1
    logger.info("{}".format(self.harvest_counts))
    # the increase in efficiency with experience is the key!?
    resource = diminishing_rewards(self.harvest_counts[resource_id])
    self.resources[resource_id] += resource
    return resource
  
  def harvest_policy(self):
    # decision should be made based on;
    # who has what resources and what would they sell them for -- suppliers
    # who wants what resources (and how much) -- demanders
    # how much this agent values its own resources.
    
    # all of that information should be captured in the 
    # current estimate of resource_values?!?
    
    
    # this is a greedy policy. 
    # want more flexibility here? memory for capturing dynamics. a rnn?
    return np.argmax(self.resource_values + self.rnd_explore*rnd.standard_normal(self.n_resources))
  
  def harvest(self):
    # pick a resource and harvest it
    self.harvest_resource(self.harvest_policy())
  
  def random_harvest(self):
    # could use this to make things simpler?
    self.resources += rnd.standard_normal(self.n_resources)
    
  
  """
  TRADE.
  
  Necessary for collaboration? (but how resources are traded is still unconstrained)
  (want to say) is sufficient for specialisation!?!? 
  """  
  def action_policy(self, agent_id, resources):
    # actions: trade, cheat
    # want to use EWA? init at 1.
    a = rnd.randint(0,2)
    logging.info("{}".format(a))
    return a
  
  def trade_policy(self):
    return rnd.random(), rnd.random(), rnd.randint(0,self.n_resources), rnd.randint(0,self.n_resources)
    
  
  def trade_resources(self, agent_id, resource_ids, supply=False, demand=False):
    """
    Args:
      agent_id: who is the agent trading with?
      resource_ids (tuple): (int, int) what is being traded: a for b
    
    """
    # so this happens in two rounds? 
    # 1. gather info from market (or use info from last time?)
    # 2. 
    
    # need to investigate how actual markets work... 
    # no. want to evolve this... but how!?
    
    # this agent's value estimates for the two resources (not used yet)
    v = self.resource_values[list(resource_ids)]
    a = self.action_policy(agent_id, resource_ids)
    
    # ahh. there is another loop here!?
    # accept reject various offers?
    # v = 
    
    return 0
  
  def trade(self, agent):
    """
    This has a lot of details to be worked out!!
    """
    trade_bool = self.action_policy(agent.name, agent.resources)
    if trade_bool:
      amount_a, amount_b, idx_a, idx_b = self.trade_policy()
      # self hands over amount_a of resource idx_a ...
      self.resources[idx_a] -= amount_a
      agent.resources[idx_a] += amount_a
      # ... and receives amount_b of resource idx_b in return
      agent.resources[idx_b] -= amount_b
      self.resources[idx_b] += amount_b
      # i dont like the statefulness here. 
      # would prefer to do something more functional!?
      logging.info("Traded {} for {}".format(idx_a, idx_b))
  
  
  """
  VALUE
  """
  def reward(self, true_resource_values):
    """
    Args:
      true_resource_values: Where are these coming from? Expect all to be positive?
    """
    # is not additive?
    # having LOTs of wood far less valuable than
    # having some wood and a few nails
    # want a good distribution over resources.
    
    # seems like it might be too complicated,
    # what about building things out of resources? 
    # is that necessary to capture what I want to?
    # it would relieve the need for some measure of distribution. 
    # could just reward production of 'valuable' items
    
    # this might be differentiable? not sure what to do with that...
    
    # what is the right metric here!?!
    # return np.prod(v)
    
    
    
    # want a single local (per agent) reward. which generates trade and specialisation! 

    v = self.resources * true_resource_values
    h = cross_entropy(softmax(v))
    return np.sum(h) + np.sum(v)  # additive relation, or multiplicative or ???

  
  def update_resource_values(self, supply, demand):
    # could learn to do this?
    self.resource_values = np.dot(supply, demand.T)
  
  def value_policy(self):
    return self.resource_values
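
The wood-and-nails intuition in `reward` can be made concrete by comparing an additive reward with a multiplicative one. This is a sketch of the commented-out `np.prod(v)` idea, using a geometric mean so the scale stays comparable; the holdings and values are made up, and the function names are hypothetical.

```python
import numpy as np

true_values = np.array([1.0, 1.0])    # wood, nails (assumed equally valuable)
lots_of_wood = np.array([10.0, 0.0])
balanced = np.array([5.0, 5.0])

def additive_reward(resources, values):
    # Only the total matters; the mix of resources is ignored.
    return np.sum(resources * values)

def multiplicative_reward(resources, values):
    # Geometric mean of valued holdings: zero if any resource is missing.
    v = resources * values
    return np.prod(v) ** (1.0 / len(v))

for holdings in (lots_of_wood, balanced):
    print(additive_reward(holdings, true_values),
          multiplicative_reward(holdings, true_values))
```

The additive reward cannot tell the two holdings apart, while the multiplicative one gives nothing for wood alone. That difference is what would make trading away surplus wood worthwhile for a specialised agent.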


In [0]:
class TradeDance():
  def __init__(self):
    pass
  
  def dance(self, A, B):
    # the actual exchange protocol; to be supplied by each concrete dance
    raise NotImplementedError
  
  def trade(self, A, B):
    # problem is that A/B might need to learn how to use each different dance.
    # (that brings us to three learning loops... how to use a dance, the dances, which dances to prefer)
    self.dance(A, B)

In [0]:
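def simulate(dance, dance_iters=100):
  # agents play with the trade dance.
  # they figure out how to exploit their environment to maximise rewards
  agents = [Agent(i, n_resources) for i in range(n_agents)]
  # placeholder: where the true values come from is still an open question
  true_resource_values = np.ones(n_resources)
  
  # learn how to use the dance
  for _ in range(dance_iters):
    for agent in agents:
      agent.harvest()
    
    # a random pair of distinct agents trades using the dance
    i, j = rnd.choice(n_agents, 2, replace=False)
    dance.trade(agents[i], agents[j])
    # agent.learn() would go here once agents have a learning rule
    
  # estimate steady state behaviour of the dance.
  # how good is it at allowing the agents to get what they want?
  return np.mean([agent.reward(true_resource_values) for agent in agents])  # the average still works globally. want to make local!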


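def evolve_trade(ga, iters=10):
  # ga: a genetic-algorithm helper with generate() and reproduce(); not defined in this notebook
  dances = ga.generate()
  for _ in range(iters):
    sims = [simulate(dance) for dance in dances]
    dances = ga.reproduce(dances, fitness=sims)  # agents prefer to use dances that 
  return dances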
    
    

    
# simulations are nice. but how do they help us understand?
# show that it is possible to learn X with A, but not possible with B.
# calculate sensitivity to various parameters
# ?
# how can we bring some more math into this?

In [0]:
n_resources = 12
n_agents = 4