Make our custom modules importable by Python


In [1]:
import sys
sys.path.append('.')

Load some generic libraries


In [2]:
import pandas as pd
import numpy as np
import warnings
# Silence all warnings: pandas' SettingWithCopy warnings are expected here, since the assignments are often intended
warnings.simplefilter("ignore")

Load the custom modules


In [3]:
from agents import Buyer, Seller
from environments import MarketEnvironment
from my_environment import MyMarketEnvironment

Let's meet our agents
The cost of a seller and the budget of a buyer are both generally referred to as the reservation price.
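A reservation price marks the worst price an agent can accept without losing money: a seller should not ask below its cost, and a buyer should not bid above its budget. A minimal sketch of that check; the function `is_individually_rational` is hypothetical and not part of the course's agents module:

```python
def is_individually_rational(role: str, reservation_price: float, offer: float) -> bool:
    """Hypothetical check: would a deal at `offer` leave the agent no worse off?"""
    if role == 'Seller':
        return offer >= reservation_price  # never sell below cost
    if role == 'Buyer':
        return offer <= reservation_price  # never pay above budget
    raise ValueError(f'unknown role: {role}')


print(is_individually_rational('Seller', 100, 110))  # True
print(is_individually_rational('Buyer', 110, 115))   # False
```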

Sellers first:

In [4]:
john = Seller('Seller John', 100)
nick = Seller('Seller Nick', 90)

sellers = [john, nick]

Then buyers:


In [5]:
alex = Buyer('Buyer Alex', 130)
kevin = Buyer('Buyer Kevin', 110)

buyers = [alex, kevin]

Now let's prepare our environment.
First we load an information setting,
then a matcher.


In [6]:
from info_settings import BlackBoxSetting
from matchers import RandomMatcher
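To get a feel for what a matcher does, here is a hypothetical sketch of a random matcher: it shuffles the agents, pairs each buyer with a seller whose ask does not exceed the bid, and draws the deal price uniformly between ask and bid. This is an assumption for illustration only; `random_match` is not the course's RandomMatcher, which may work differently.

```python
import random


def random_match(asks: dict, bids: dict, rng: random.Random) -> list:
    """Hypothetical random matcher.

    asks: seller name -> ask price; bids: buyer name -> bid price.
    Returns a list of (seller, buyer, deal_price) tuples.
    """
    sellers = list(asks)
    buyers = list(bids)
    rng.shuffle(sellers)
    rng.shuffle(buyers)
    deals = []
    for buyer in buyers:
        for seller in sellers:
            if asks[seller] <= bids[buyer]:
                # Assumption: deal price drawn uniformly between the ask and the bid.
                price = rng.uniform(asks[seller], bids[buyer])
                deals.append((seller, buyer, price))
                sellers.remove(seller)  # each seller trades at most once; safe, we break next
                break
    return deals


asks = {'Seller John': 110.0, 'Seller Nick': 105.0}
bids = {'Buyer Alex': 120.0, 'Buyer Kevin': 105.0}
print(random_match(asks, bids, random.Random(0)))
```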

Now let's create the environment. Be careful: this time I am using a reward scheme based on the reservation price (reward_on_reference=True). Is this scheme better for my research goal? If so, why?

In [7]:
market_env = MyMarketEnvironment(sellers=sellers, buyers=buyers, max_steps=30,  
                                 matcher=RandomMatcher(reward_on_reference=True), setting=BlackBoxSetting)

[100.  90.   0.   0.]
[ inf  inf 130. 110.]
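The reward scheme mentioned above can be made concrete with a small sketch. It assumes that reward_on_reference=True means each agent's reward is its gain measured against its own reservation price: the deal price minus the cost for a seller, the budget minus the deal price for a buyer, and 0 without a deal. `reference_reward` is a hypothetical name, and the actual scheme in MyMarketEnvironment may differ.

```python
from typing import Optional


def reference_reward(role: str, reservation_price: float,
                     deal_price: Optional[float] = None) -> float:
    """Hypothetical reward measured against the agent's reservation price."""
    if deal_price is None:
        return 0.0  # no deal, no gain
    if role == 'Seller':
        return deal_price - reservation_price  # price above cost
    return reservation_price - deal_price  # budget left over


# Two sellers both sell at 106: the cheaper producer earns more.
print(reference_reward('Seller', 100, 106.0))  # 6.0
print(reference_reward('Seller', 90, 106.0))   # 16.0
print(reference_reward('Buyer', 130, 106.0))   # 24.0
```

One possible argument for such a scheme: the reward is the agent's gain from trade, so it is zero without a deal and negative for a losing deal, and the seller's and buyer's rewards for one deal always sum to budget minus cost, whatever the price.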


In [32]:
market_env.action_space.contains([200,300,1,2])

True
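The two arrays printed when the environment is created look like per-agent lower and upper bounds on offers, in the order John, Nick, Alex, Kevin: sellers may ask anything from their cost upwards, buyers anything from 0 up to their budget. Assuming that reading, the contains check above can be reproduced with plain NumPy; `in_bounds` is a hypothetical stand-in, not the environment's own method.

```python
import numpy as np

# Assumed bounds, copied from the arrays printed when the environment was built.
low = np.array([100.0, 90.0, 0.0, 0.0])
high = np.array([np.inf, np.inf, 130.0, 110.0])


def in_bounds(action) -> bool:
    """Hypothetical Box-style contains check: right shape and low <= action <= high."""
    action = np.asarray(action, dtype=float)
    return action.shape == low.shape and bool(np.all((action >= low) & (action <= high)))


print(in_bounds([200, 300, 1, 2]))  # True
print(in_bounds([95, 300, 1, 2]))   # False: John would ask below his cost
```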

In [8]:
market_env.offers

            id  res_price    role  offer  time
0  Seller John        100  Seller      0     0
1  Seller Nick         90  Seller      0     0
2   Buyer Alex        130   Buyer      0     0
3  Buyer Kevin        110   Buyer      0     0


Now let's run a single step, deciding offers for all agents.
First we reset the environment, just in case;
afterwards every offer should be zero.

In [9]:
init_observation = market_env.reset()
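Conceptually, a reset clears every agent's offer and time stamp while keeping the reservation prices. A sketch on a hypothetical, simplified offers table (plain dicts instead of the real MarketEnvironment's DataFrame; `reset_offers` is an illustrative name):

```python
# Hypothetical, simplified offers table: one row per agent.
offers = {
    'Seller John': {'res_price': 100, 'offer': 115.0, 'time': 1},
    'Buyer Kevin': {'res_price': 110, 'offer': 95.0, 'time': 1},
}


def reset_offers(table: dict) -> dict:
    """Zero every agent's offer and time stamp, keeping reservation prices."""
    for row in table.values():
        row['offer'] = 0
        row['time'] = 0
    return table


reset_offers(offers)
print(offers['Seller John'])  # {'res_price': 100, 'offer': 0, 'time': 0}
```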

Now we run 2 steps. In each step, every agent that makes an offer asks a bit above its cost (sellers) or bids a bit below its budget (buyers).
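Such offers can be built from per-agent margins. A sketch, assuming each agent exposes a name, a role and a reservation price; the `Agent` dataclass and `make_offers` are hypothetical stand-ins for the course's Buyer and Seller classes:

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    role: str  # 'Seller' or 'Buyer'
    reservation_price: float


def make_offers(margins: dict, agents: list) -> dict:
    """Sellers ask cost + margin, buyers bid budget - margin."""
    by_name = {a.name: a for a in agents}
    offers = {}
    for name, margin in margins.items():
        agent = by_name[name]
        sign = 1 if agent.role == 'Seller' else -1
        offers[name] = agent.reservation_price + sign * margin
    return offers


agents = [Agent('Seller John', 'Seller', 100), Agent('Seller Nick', 'Seller', 90),
          Agent('Buyer Alex', 'Buyer', 130), Agent('Buyer Kevin', 'Buyer', 110)]
step1 = make_offers({'Buyer Alex': 10.0, 'Buyer Kevin': 5.0,
                     'Seller John': 10.0, 'Seller Nick': 15.0}, agents)
print(step1)
```

With the margins of step 1 this yields the same offers as the step1_offers dictionary.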

For step 1:

In [10]:
step1_offers = {
    'Buyer Alex': alex.reservation_price - 10.0,
    'Buyer Kevin': kevin.reservation_price - 5.0,
    'Seller John': john.reservation_price + 10.0,
    'Seller Nick': nick.reservation_price + 15.0,
}
display(step1_offers)
observations, rewards, done, _ = market_env.step(step1_offers)
print('observations:')
display(observations)

{'Buyer Alex': 120.0,
 'Buyer Kevin': 105.0,
 'Seller John': 110.0,
 'Seller Nick': 105.0}

observations:


{'Seller John': array([110.]),
 'Seller Nick': array([105.]),
 'Buyer Alex': array([120.]),
 'Buyer Kevin': array([105.])}

For step 2:

In [11]:
step2_offers = {
    'Buyer Kevin': kevin.reservation_price - 15.0,
    'Seller John': john.reservation_price + 15.0,
}
display(step2_offers)
print('observations:')
observations, rewards, done, _ = market_env.step(step2_offers)
display(observations)

{'Buyer Kevin': 95.0, 'Seller John': 115.0}

observations:


{'Seller John': array([115.]),
 'Seller Nick': array([105.]),
 'Buyer Alex': array([120.]),
 'Buyer Kevin': array([95.])}
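The observations above suggest how the black-box setting behaves: every agent observes only its own most recent offer, and an agent that did not bid in a step (Nick and Alex in step 2) keeps seeing its previous offer. A sketch of that behaviour under this assumption; `black_box_observations` is hypothetical:

```python
import numpy as np


def black_box_observations(last_offers: dict, new_offers: dict) -> dict:
    """Update each agent's last offer and return only its own offer as its observation."""
    last_offers.update(new_offers)
    return {name: np.array([offer]) for name, offer in last_offers.items()}


last = {'Seller John': 0.0, 'Seller Nick': 0.0, 'Buyer Alex': 0.0, 'Buyer Kevin': 0.0}
black_box_observations(last, {'Buyer Alex': 120.0, 'Buyer Kevin': 105.0,
                              'Seller John': 110.0, 'Seller Nick': 105.0})
obs = black_box_observations(last, {'Buyer Kevin': 95.0, 'Seller John': 115.0})
print(obs)
```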

Now let's check whether and when deals happened:

In [12]:
pd.DataFrame(market_env.deal_history)

        Seller       Buyer  time  deal_price
0  Seller Nick  Buyer Alex     0  106.241697
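Why Nick and Alex? Assuming a trade is possible whenever the seller's ask does not exceed the buyer's bid, enumerating the pairs for the step-1 offers shows which matches were available:

```python
from itertools import product

asks = {'Seller John': 110.0, 'Seller Nick': 105.0}
bids = {'Buyer Alex': 120.0, 'Buyer Kevin': 105.0}

# Keep every (seller, buyer) pair whose ask is at most the bid.
feasible = [(s, b) for s, b in product(asks, bids) if asks[s] <= bids[b]]
print(feasible)
# [('Seller John', 'Buyer Alex'), ('Seller Nick', 'Buyer Alex'), ('Seller Nick', 'Buyer Kevin')]
```

Under this rule, the recorded price of 106.24 lies between Nick's ask (105) and Alex's bid (120). Matching John with Alex and Nick with Kevin would have produced two deals in step 1; after the random match of Nick and Alex, the remaining John and Kevin did not overlap in step 2 either (ask 115, bid 95).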


And the offers table, which keeps each agent's latest offer and the step at which it was made:

In [13]:
market_env.offers

            id  res_price    role  offer  time
0  Seller John        100  Seller  115.0     1
1  Seller Nick         90  Seller  105.0     0
2   Buyer Alex        130   Buyer  120.0     0
3  Buyer Kevin        110   Buyer   95.0     1


In [14]:
market_env.action_space

Box(4,)

The above showcases a simple run of two steps.
After a whole round is finished, we call reset.
In the lectures on 14.10.2019 and 21.10.2019, it will be shown how the MarketEnvironment is converted to a gym environment, and a single agent will be trained with reinforcement learning.
Still, you are encouraged to extend the example yourselves and continue implementing the environment!
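A whole round (episode) follows the usual gym pattern: reset, then step until done or until the step limit is reached. A self-contained toy sketch of that loop; `ToyMarket`, its midpoint pricing and the random policy are all hypothetical and only mimic the (observations, rewards, done, info) return signature used above:

```python
import random


class ToyMarket:
    """Hypothetical toy market with a gym-style reset/step interface."""

    def __init__(self, costs: dict, budgets: dict, max_steps: int = 30):
        self.costs, self.budgets, self.max_steps = costs, budgets, max_steps

    def reset(self) -> dict:
        self.time = 0
        self.active = set(self.costs) | set(self.budgets)  # agents that have not traded yet
        return {name: 0.0 for name in self.active}

    def step(self, offers: dict):
        rewards = {name: 0.0 for name in offers}
        sellers = [n for n in offers if n in self.costs and n in self.active]
        buyers = [n for n in offers if n in self.budgets and n in self.active]
        for s in sellers:
            for b in buyers:
                if b in self.active and offers[s] <= offers[b]:
                    price = (offers[s] + offers[b]) / 2  # assumption: split the difference
                    rewards[s] = price - self.costs[s]
                    rewards[b] = self.budgets[b] - price
                    self.active -= {s, b}
                    break
        self.time += 1
        done = self.time >= self.max_steps or not self.active
        return dict(offers), rewards, done, {}


env = ToyMarket({'Seller John': 100, 'Seller Nick': 90},
                {'Buyer Alex': 130, 'Buyer Kevin': 110}, max_steps=30)
rng = random.Random(0)
obs, done, total = env.reset(), False, {}
while not done:
    # Random policy: sellers ask in [cost, cost + 30], buyers bid in [budget - 30, budget].
    offers = {n: rng.uniform(c, c + 30) for n, c in env.costs.items() if n in env.active}
    offers.update({n: rng.uniform(b - 30, b) for n, b in env.budgets.items() if n in env.active})
    obs, rewards, done, _ = env.step(offers)
    for n, r in rewards.items():
        total[n] = total.get(n, 0.0) + r
print(env.time, total)
```

Because the toy policy never asks below cost or bids above budget, every reward in this sketch is non-negative; a learning agent would replace the random policy.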

In [15]:
import gym


In [16]:
gym.spaces.Box(low=-1e-10, high=0.0, shape=(9,)).sample()

array([-6.3556584e-11, -6.5860824e-11, -1.3696158e-11, -1.7152771e-11,
       -5.9497567e-11, -8.0157707e-11, -5.5657094e-11, -7.1894053e-11,
       -9.9288763e-12], dtype=float32)
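For a Box bounded on both sides, gym's sample() draws each coordinate uniformly between low and high, which is why all nine values above fall between -1e-10 and 0. The same idea in plain NumPy, as a sketch; `sample_box` is a hypothetical helper (not gym's implementation) that also rejects bounds given in the wrong order:

```python
import numpy as np


def sample_box(low, high, shape, rng):
    """Hypothetical helper: draw uniformly inside a box bounded on both sides."""
    low = np.broadcast_to(np.asarray(low, dtype=float), shape)
    high = np.broadcast_to(np.asarray(high, dtype=float), shape)
    if np.any(low > high):
        raise ValueError('low must not exceed high')
    return rng.uniform(low, high)


rng = np.random.default_rng(0)
samples = sample_box(-1e-10, 0.0, (9,), rng)
print(samples.shape, bool(np.all((samples >= -1e-10) & (samples <= 0.0))))  # (9,) True
```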

In [29]:
market_env.action_space.sample()

array([101.44665 ,  -1.      ,  -1.      ,   4.515912], dtype=float32)