<a href="https://colab.research.google.com/github/Lenguist/insight-game-ai/blob/main/simple_sim_united.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [8]:
#@title rules
"""
The Simple Simulation

Rules: There is a Seller (Agent) who interacts with a Buyer (environment), trying to sell a car.
The car has an intrinsic value to the Seller, modelled by the variable value. Any price higher than value constitutes a profit for the Seller.
The Buyer has some max price they are willing to pay for the car. This value is drawn from a uniform random distribution over the range (range_min, range_max).
This range is known to the Seller. Once the Buyer's max price is set, it does not change over the course of the interaction.
The Seller makes repeated offers to the Buyer, and the Buyer either accepts the offer (if it is at or below their max price) or rejects it.
Every time the Seller makes an offer, the Buyer gets more impatient and is more likely to walk away from the deal entirely.
This is modelled by the Buyer having some probability of walking away (impatience), evaluated at the beginning of each round.
Impatience is incremented each round by a constant amount known to the Seller. The initial impatience is always 0.

The goal of the Seller agent is to maximize profit over many interactions with Buyers.
QUESTION: shouldn't the number of cars be fixed? I.e., there is an incentive to keep the car if the supply is limited.

Variables:
value - constant across interactions, known to the Agent
range_min, range_max - constant across interactions, known to the Agent
Buyer's max_price - drawn uniformly at random from range_min..range_max (inclusive) at the beginning of each interaction
imp_incr - constant across interactions, known to the Agent
imp_init - assumed to always be 0

Future complications to the model follow directly from the list of variables:
1. the Buyer's max_price can be generated from a different known distribution
2. the Buyer's max_price can be generated from an unknown distribution
3. imp_incr can be non-constant, or hidden from the Seller
4. imp_init can differ to account for Buyer variability
We will discover which of these complications are informative in the process of experimentation.

After we experiment with those, the next step would be to allow the Buyer to make bids, limit the supply of vehicles, introduce competition with another Seller,
and make Buyer behaviour increasingly complex (bluffs, urgency, etc.)
"""

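The impatience rule above can be made concrete with a small calculation. This is a minimal sketch, assuming that impatience in round k (0-based) equals imp_init + k*imp_incr and that the walk check runs at the start of every round; the helper name `reach_probability` is hypothetical and not part of the notebook.

```python
def reach_probability(round_index, imp_incr, imp_init=0.0):
    """Probability that the Buyer is still present to evaluate the offer of
    round `round_index` (0-based), given that all earlier offers were rejected."""
    prob = 1.0
    for k in range(round_index + 1):
        # impatience at the start of round k, capped at certainty
        imp = min(1.0, imp_init + k * imp_incr)
        prob *= 1.0 - imp  # the Buyer stays with probability 1 - imp
    return prob

# with imp_incr = 0.1 the Seller gets at most 10 offers that can be evaluated
for k in (0, 1, 2, 10):
    print(k, reach_probability(k, 0.1))
```

Under these assumptions the chance of a late offer even being heard falls quickly, which is why the Seller cannot simply step down through every price.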
In [None]:
"""
The code is structured as follows. First, we set the simulation parameters: value, range_min, range_max, imp_incr and imp_init.
After that, we create an Interaction, which corresponds to a single negotiation. Each Interaction involves a Buyer whose max_price is generated according to the simulation rules.
"""

In [124]:
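import random

class Buyer(object):
  def __init__(self, maxprice, imp, imp_incr):
    self.maxprice = maxprice  # highest price this Buyer will accept
    self.imp = imp            # current probability of walking away
    self.imp_incr = imp_incr  # impatience added after each rejected offer

  def check_offer(self, offer):
    # the walk-away check happens at the start of every round
    random_number = random.uniform(0, 1)
    if random_number <= self.imp:
      return "walk away"
    elif offer <= self.maxprice:
      return "accept offer"
    else:
      self.imp += self.imp_incr
      return "reject but continue"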

In [130]:
class Seller(object):
  def __init__(self, value):
    self.value = value
    self.history = []
  def make_offer(self):
    raise NotImplementedError("Subclasses should implement this method.")
  def save_round(self, offer, decision):
    self.history.append({"offer":offer, "decision":decision})

class DescentByOneSeller(Seller):
  def __init__(self, value, init_offer):
    super().__init__(value)  # Call to parent's __init__ method
    self.init_offer = init_offer
  def make_offer(self):
    if len(self.history)==0:
      offer = self.init_offer
    else:
      offer = self.history[-1]["offer"]-1
    return offer

class UserInputSeller(Seller):
  def make_offer(self):
    offer = int(input("You: "))
    return offer
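
The Seller interface above only requires make_offer, so other pricing rules can be plugged in. As an illustration, here is a sketch of a seller that walks through a fixed list of prices. `FixedScheduleSeller` is a hypothetical class that does not appear in the notebook, and the base class is repeated only so that the snippet runs on its own.

```python
class Seller(object):
    # minimal copy of the notebook's base class, repeated for self-containment
    def __init__(self, value):
        self.value = value
        self.history = []

    def make_offer(self):
        raise NotImplementedError("Subclasses should implement this method.")

    def save_round(self, offer, decision):
        self.history.append({"offer": offer, "decision": decision})


class FixedScheduleSeller(Seller):
    """Hypothetical seller that offers the prices in `prices` one after another."""

    def __init__(self, value, prices):
        super().__init__(value)
        self.prices = list(prices)

    def make_offer(self):
        # one history entry per past round; repeat the last price once the list runs out
        index = min(len(self.history), len(self.prices) - 1)
        return self.prices[index]


seller = FixedScheduleSeller(10, [15, 13, 11])
offers = []
for _ in range(4):
    offer = seller.make_offer()
    seller.save_round(offer, "reject but continue")
    offers.append(offer)
print(offers)
```

Repeating the final price is an assumption made here to keep the negotiation loop well defined; stopping the interaction instead would be an equally reasonable choice.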

In [131]:
class Interaction(object):
  def __init__(self, buyer, seller, verbose=True):
    self.buyer = buyer
    self.seller = seller
    self.verbose = verbose # whether to print info

  def negotiation_round(self):
    if self.verbose:
      print(f"The current buyer impatience is {self.buyer.imp}.")
    offer = self.seller.make_offer()
    decision = self.buyer.check_offer(offer)
    self.seller.save_round(offer, decision)
    if self.verbose:
      print(f"Seller made offer of {offer}. The buyer decided to {decision}")
    return decision

  def run_interaction(self):
    decision = ""
    while decision != "walk away" and decision != "accept offer":
      decision = self.negotiation_round()
    if decision == "accept offer":
      final_offer = self.seller.history[-1]["offer"]
      profit = final_offer - self.seller.value
      if self.verbose:
        print(f"Deal made at {final_offer}")
        print(f"Buyer's max_price was {self.buyer.maxprice}")
    else:
      if self.verbose:
        print("No deal made - no profit.")
      profit = 0
    return profit

In [443]:
# user interaction
value = 10
range_min = 11
range_max = 15
imp_incr = 0.2
imp_init = 0

rounds = 4
i = 0
total_profit = 0
while i < rounds:
  maxprice = random.randint(range_min, range_max) # uniform integer, both bounds included
  buyer = Buyer(maxprice, imp_init, imp_incr)
  seller = UserInputSeller(value)

  interaction = Interaction(buyer, seller, verbose=True)
  profit = interaction.run_interaction()
  total_profit += profit
  i+= 1
print(f"Total_profit is {total_profit} over {rounds} rounds for an average return of {round(total_profit/rounds,2)}")

The current buyer impatience is 0.



In [438]:
# simulation
value = 10
range_min = 11
range_max = 15
imp_incr = 0.1
imp_init = 0

rounds = 1000000
i = 0
total_profit = 0
while i < rounds:
  maxprice = random.randint(range_min, range_max) # uniform integer, both bounds included
  buyer = Buyer(maxprice, imp_init, imp_incr)
  seller = DescentByOneSeller(value, range_max)

  interaction = Interaction(buyer, seller, verbose=False)
  profit = interaction.run_interaction()
  total_profit += profit
  i+= 1
print(total_profit)

2411349
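
The total of 2411349 over 1,000,000 rounds corresponds to an average profit of about 2.41 per interaction. The sketch below is a compact, seeded re-implementation of the same descending price walk that reports this average directly. It assumes the same rules as the Buyer class (walk check first, then acceptance, then an impatience increment on rejection); the function name `simulate_price_list` and its parameters are hypothetical.

```python
import random

def simulate_price_list(prices, value, range_min, range_max, imp_incr, trials, seed=0):
    """Average profit per interaction when offering `prices` in order."""
    rng = random.Random(seed)  # private generator so the run is reproducible
    total = 0
    for _ in range(trials):
        max_price = rng.randint(range_min, range_max)  # Buyer's hidden max price
        imp = 0.0
        for offer in prices:
            if rng.random() <= imp:   # walk check at the start of the round
                break
            if offer <= max_price:    # accepted: book the profit
                total += offer - value
                break
            imp += imp_incr           # rejected: the Buyer grows more impatient
    return total / trials

avg = simulate_price_list([15, 14, 13, 12, 11], value=10, range_min=11,
                          range_max=15, imp_incr=0.1, trials=100_000)
print(round(avg, 2))
```

With a fixed seed the estimate is repeatable, which makes it easier to compare strategies without rerunning a million rounds each time.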


In [2]:
# to get optimal strategy
import math
from itertools import combinations

def all_possible_strategies(range_min, range_max, imp_incr, imp_init):
  # upper bound on the number of offers before the Buyer is certain to walk away
  max_moves = math.ceil((1-imp_init)/imp_incr + 1)

  # candidate prices, highest first, so every combination is a descending price list
  numbers = list(range(range_min, range_max + 1))[::-1]

  # all combinations of up to max_moves prices
  all_combinations = [list(comb) for i in range(1, max_moves + 1) for comb in combinations(numbers, i)]

  # keep combinations that end with range_min or that use all max_moves offers
  filtered_combinations = [comb for comb in all_combinations if (comb[-1] == range_min) or len(comb) == max_moves]

  return filtered_combinations

In [3]:
all_strategies = all_possible_strategies(range_min, range_max, imp_incr, imp_init)
all_strategies

[[11],
 [15, 11],
 [14, 11],
 [13, 11],
 [12, 11],
 [15, 14, 11],
 [15, 13, 11],
 [15, 12, 11],
 [14, 13, 11],
 [14, 12, 11],
 [13, 12, 11],
 [15, 14, 13, 11],
 [15, 14, 12, 11],
 [15, 13, 12, 11],
 [14, 13, 12, 11],
 [15, 14, 13, 12, 11]]
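
Every strategy listed above ends at range_min, because with these parameters the move limit (11) exceeds the number of distinct prices (5). In that case a strategy is fully determined by which of the higher prices are tried first, so there are 2**4 = 16 of them. A minimal sketch of that counting argument follows; `descending_strategies` is a hypothetical helper, and it assumes the move limit never binds.

```python
from itertools import combinations

def descending_strategies(range_min, range_max):
    """All descending price lists that end at range_min (assumes no move limit binds)."""
    higher = list(range(range_max, range_min, -1))  # prices above range_min, high to low
    return [list(chosen) + [range_min]
            for k in range(len(higher) + 1)
            for chosen in combinations(higher, k)]

strategies = descending_strategies(11, 15)
print(len(strategies))  # one strategy per subset of the 4 higher prices
```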

In [None]:
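def calculate_payoff_realprice(strategy, real_price, imp_incr, imp_init, value):
  # expected profit of a price list against a Buyer whose max price is real_price
  for i, bid in enumerate(strategy):
    if bid <= real_price:
      # first acceptable bid: weight the profit by the probability that the Buyer
      # did not walk away in any of the i rejected rounds before it
      # (the walk check of the accepting round itself is not applied here,
      # unlike in Buyer.check_offer, which rolls for a walk before every offer)
      discount_factor = 1
      for j in range(i):
        discount_factor = discount_factor*(1-(imp_init+j*imp_incr))
      return discount_factor*(bid-value)
  # no bid was acceptable: no sale, no profit
  return 0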
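
The discount in calculate_payoff_realprice covers only the rounds whose offers were rejected, while the simulated Buyer also rolls for a walk at the start of the round in which the offer would be accepted. The sketch below is a variant that includes that extra check, offered as an assumption about what the simulation computes rather than as the notebook's method. The name `expected_profit_with_walk_check` is hypothetical.

```python
def expected_profit_with_walk_check(strategy, range_min, range_max, imp_incr, value, imp_init=0.0):
    """Mean profit over uniformly distributed integer max prices,
    applying a walk check at the start of every round, including the accepting one."""
    total = 0.0
    for max_price in range(range_min, range_max + 1):
        stay = 1.0  # probability that the Buyer has not walked away so far
        for i, bid in enumerate(strategy):
            stay *= 1.0 - min(1.0, imp_init + i * imp_incr)  # walk check of round i
            if bid <= max_price:
                total += stay * (bid - value)  # sale happens only if the Buyer stayed
                break
    return total / (range_max - range_min + 1)

result = expected_profit_with_walk_check([15, 14, 13, 12, 11], 11, 15, 0.1, 10)
print(result)
```

For the descend-by-one strategy with imp_incr = 0.1 this gives roughly 2.414, which is close to the simulated total of 2411349 over 1,000,000 rounds.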

In [None]:
def calculate_expected_payoff(strategy, range_min, range_max, imp_incr, imp_init, value):
  # one payoff per possible Buyer max price, in increasing price order
  expected_payoffs = []
  for real_price in range(range_min, range_max+1):
    expected_payoffs.append(calculate_payoff_realprice(strategy, real_price, imp_incr, imp_init, value))
  # last entry: mean over the uniformly distributed max prices
  expected_payoffs.append(sum(expected_payoffs)/(range_max+1-range_min))
  return expected_payoffs

In [None]:
# calculate_expected_payoff expects a single strategy, not the whole list of strategies:
#calculate_expected_payoff(all_strategies[0], range_min, range_max, imp_incr, imp_init, value)

In [None]:
import pandas as pd

header = ["R"]  # R = the Buyer's real max price; the final "total" row holds the mean payoff
data = [list(range(range_min, range_max+1)) + ["total"]]
for strategy in all_strategies:
  header.append(str(strategy))
  data.append(calculate_expected_payoff(strategy, range_min, range_max, imp_incr, imp_init, value))

# Transpose the data
data = list(map(list, zip(*data)))
# Convert the data to a pandas DataFrame
df = pd.DataFrame(data, columns=header)

In [None]:
# last row = mean payoff per strategy; show the 15 lowest in ascending order
df.iloc[len(df)-1][1:].sort_values()[:15]

[11]                  1.0
[15, 11]              1.8
[12, 11]              1.8
[14, 11]              2.2
[13, 11]              2.2
[15, 14, 11]         2.34
[15, 12, 11]         2.38
[13, 12, 11]         2.38
[15, 13, 11]         2.56
[14, 13, 11]         2.56
[14, 12, 11]         2.58
[15, 14, 13, 11]    2.628
[15, 14, 12, 11]    2.664
[15, 13, 12, 11]    2.704
[14, 13, 12, 11]    2.704
Name: 5, dtype: object