# Part 2: Environment

*(If something doesn't work, be sure to run Part 5: it has updated versions of every class from Parts 2 through 4 :)*

In this section we will create an environment in which to train our model to trade currencies on the *FOREX* market.  
But first, let's lay down some ground rules for our environment.  
  
1. The environment must behave just like the real *FOREX* market would. This means we must account for trading costs, time lag, commissions and changes in position prices over time;  
  
  
2. It would be nice if our `Environment` class worked the same way as the `gym` environments by the **OpenAI** team, since they are easy to work with and provide a familiar reference interface;  
  
  
3. The `Environment` class must be as fast as we can make it, because model training will take long enough as it is.    
   
  
While we're on the topic of our future DQNN model, let's define some parts of it as well. The model will be trained on historical market price data together with current information about the agent's state in the environment. For that reason we'll need a method that returns both the historical data and the agent's state at each timestep as the current `observation` of the environment. We will define a *single timestep* as *one minute* of real time.
  
That said, we can now define our `Environment` class more thoroughly. Let's break down the basic methods our class will have.   
  
1. **`step(action)`**  
The method takes an `action` as an argument and executes one of several actions: `Buy` a set amount of the traded currency; `Sell` a set amount of the traded currency; `Close` open positions; `Wait`, i.e. do nothing and wait; 
  

2. **`reset()`**   
The method resets all local variables and returns the initial `observation`;  
  

3. **`_update_state()`**   
The method advances one timestep through our data and updates the `observation`, `reward`, `info` and `done` flag that are returned to our agent.  
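For reference, the `gym`-style contract we're aiming for boils down to the loop below. `ToyEnv` is a hypothetical stand-in that simply walks through a list of prices and ignores the action; it only illustrates the shape of `reset()` and `step()`, not our trading logic.

```python
class ToyEnv:
    """Hypothetical stand-in following the gym-style contract:
    reset() -> observation, step(action) -> (observation, reward, done, info)."""

    def __init__(self, prices, starting_funds=100.0):
        self.prices = prices
        self.starting_funds = starting_funds

    def reset(self):
        # Reset all local variables and return the first observation
        self.t = 0
        self.funds = self.starting_funds
        return self.prices[self.t]

    def step(self, action):
        # A real environment would buy, sell, close or wait here; the toy ignores the action
        self.t += 1                                # advance one timestep
        done = self.t == len(self.prices) - 1      # stop at the end of the data
        info = {"Funds": self.funds}
        return self.prices[self.t], self.funds, done, info


env = ToyEnv([1.10, 1.11, 1.12, 1.13])
obs, done, steps = env.reset(), False, 0
while not done:
    obs, reward, done, info = env.step("wait")
    steps += 1
print(steps, obs, reward)  # 3 1.13 100.0
```

Because every environment exposes the same two calls, the agent's training loop does not need to know anything about FOREX specifics.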
  
  


*Let's get to coding!*  

But first, as always, we need to load the necessary libraries.

## Set up

In [208]:
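# Basic libraries 
import numpy as np
import pandas as pd
import datetime as dt
import random
from utils import *  # project helpers (data_path used below is expected to come from here)

# Seed both generators: reset() draws episode start days with Python's random module
seed = 17
np.random.seed(seed)
random.seed(seed)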

## Environment

Let's define our `Environment` class and set some of the important properties and methods.

In [3]:
class Environment:
    
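    def __init__(self, paths, timesteps, starting_funds=100, max_length=14, start_year=2017, end_year=2020,
                 from_start=False, end_coef=0):
        """
        In this method we add some properties to our class and set some parameters.
        
        :param paths: a tuple of paths to the 1h, 15min, 5min and 1min historical datasets
        :param timesteps: a tuple with the number of 1h, 15min, 5min and 1min bars to return at each step
        :param starting_funds: amount of starting funds
        :param max_length: maximum length of a training episode, in trading days
        :param start_year: first year from which episode start days are drawn
        :param end_year: last year (inclusive) from which episode start days are drawn
        :param from_start: if True, every episode starts on the first available Monday instead of a random one
        :param end_coef: the episode ends once balance or funds drop to starting_funds*end_coef or below
        """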
        self._starting_funds = starting_funds
        self._max_length = max_length 
        self._from_start = from_start
        self._end_coef = end_coef
        path_to_1h, path_to_15min, path_to_5min, path_to_1min = paths 
        self._h1_timesteps, self._m15_timesteps, self._m5_timesteps, self._m1_timesteps = timesteps 

        self.h1_data = self._get_dataset(path_to_1h)
        self.m15_data = self._get_dataset(path_to_15min)
        self.m5_data = self._get_dataset(path_to_5min)
        self.m1_data = self._get_dataset(path_to_1min)
        
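        # Trim the finer-grained datasets so they end before the last day covered by the hourly data
        last_h1_date = self.h1_data.Date.iloc[-1].date()  # compare dates with dates, not with a Timestamp
        self.m15_data = self.m15_data.loc[self.m15_data.Date.dt.date < last_h1_date]
        self.m5_data = self.m5_data.loc[self.m5_data.Date.dt.date < last_h1_date]
        self.m1_data = self.m1_data.loc[self.m1_data.Date.dt.date < last_h1_date]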
        
        self._max_positions = 1  # number of positions that can be opened at the same time         
        self._leverage = 500  # broker's leverage, a tool to provide small time traders more resources to trade with
        self._commision = 45/(10**6)  # broker's commission per trade
        self._amount_space = np.arange(1, 201)*1000  # range of tradeable currency amounts 
        
        # This is our action space. The first part is the general action; the second, if present, is the
        # direction of the trade (1 = buy/long, 0 = sell/short); the third is the trade size in percent of balance
        self.action_space = ["open.1.2", "open.1.5", "open.1.10", "open.0.2", "open.0.5", "open.0.10", "close", "hold"] 
        
        self._dict_keys = {"H":"Hour_LSTM", "M15":"M15_LSTM", "M5":"M5_LSTM", "M1":"M1_LSTM", "SI":"State_input"}
        self._start_time = pd.to_datetime("06:00", format='%H:%M').time() # Start time of each episode
        self._end_time = pd.to_datetime("20:00", format='%H:%M').time() # End time of each episode
        
        # Create a list of starting indexes in selected time period
        self._monday_indexes = self.m1_data.loc[(self.m1_data.Date.dt.time == self._start_time) &\
                                               (self.m1_data.Date.dt.weekday == 0) &\
                                              (self.m1_data.Date.dt.year.between(start_year,end_year)\
                                              )].index[int(self._h1_timesteps/24):]
        
        self._start_indexes = self.m1_data.loc[(self.m1_data.Date.dt.time == self._start_time) &\
                                               (self.m1_data.Date.dt.weekday.between(0,5)) &\
                                              (self.m1_data.Date.dt.year.between(start_year,end_year)\
                                              )].index[int(self._h1_timesteps/24):]
        

    
    def _get_dataset(self, path):
        """This method loads dataset from provided path"""
        
        data = pd.read_csv(path).reset_index(drop=True)
        data.Date = pd.to_datetime(data.Date, format='%Y-%m-%d %H:%M:%S')      
        return data
    
    
    def step(self, action):
        pass
    
    
    def _update_state(self):
        pass
    
    
    def reset(self):
        pass
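The starting-index selection in `__init__()` can be illustrated without pandas. The sketch below uses hypothetical hourly timestamps and a hypothetical helper `start_positions()`: a bar qualifies if it is stamped exactly at the start time, falls on an allowed weekday (Monday is 0) and lies within the year range, and the first few matches are skipped so that enough history precedes the first episode, mirroring `index[int(self._h1_timesteps/24):]`. Note that pandas `Series.between` is inclusive on both ends, so `weekday.between(0, 5)` admits Saturdays too; the sketch uses Monday to Friday, and FX data usually has no Saturday 06:00 bars anyway.

```python
from datetime import datetime, time, timedelta


def start_positions(timestamps, start_time=time(6, 0), weekdays=(0, 1, 2, 3, 4),
                    first_year=2017, last_year=2020, skip=0):
    """Hypothetical helper: positions of bars that open an episode day."""
    hits = [i for i, ts in enumerate(timestamps)
            if ts.time() == start_time            # bar stamped exactly at the start time
            and ts.weekday() in weekdays          # Monday == 0
            and first_year <= ts.year <= last_year]
    return hits[skip:]  # drop the first matches so enough history precedes them


# One week of hypothetical hourly bars starting on Monday, 1 January 2018
week = [datetime(2018, 1, 1) + timedelta(hours=h) for h in range(7 * 24)]
print(start_positions(week))                 # [6, 30, 54, 78, 102]: every weekday at 06:00
print(start_positions(week, weekdays=(0,)))  # [6]: Mondays only
print(start_positions(week, skip=48 // 24))  # [54, 78, 102]: skip two days, as with 48 hourly timesteps
```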

Next we'll define the **`reset()`** method.

In [4]:
def reset(self):
    """In this method we will perform actions to reset environment state"""
    # First we'll get a new starting day in our dataset
    if self._from_start:
        self._first_day_index = np.where(self._start_indexes == self._monday_indexes[0])[0][0]
    else:
        
        self._first_day_index = np.where(self._start_indexes ==\
                                         random.choice(self._monday_indexes[1:-2]))[0][0]
        
    self._day_index = 0
    self._initial_index = self._start_indexes[self._first_day_index]
    self._current_index = self._initial_index
    self._current_state = self.m1_data.iloc[self._current_index]
    
    # Next we'll reset funds, balance and open_positions variables
    self._funds = self._starting_funds
    self._balance = self._funds 
    self._open_positions = {}
    
    # Here we'll initialize time queue with historical market data
    self._init_time_deque()

    # Finally we'll reset variables which we send to the agent and return first state
    self._done = False
    self._reward = 0
    self._last_observation = self._get_observation()
    self._info = {"Balance": None, "Funds": None}

    return self._last_observation

Environment.reset = reset

We need to write two supporting methods for **`reset()`** to work: **`_init_time_deque()`** and **`_get_observation()`**.  
  
The first one initializes the historical market data queues.
The basic logic is this: earlier, in the **`reset()`** method, we set `current_state`, so we just take from each dataset the last `n_timesteps` records preceding the current state's date and time.  

We must be careful, though: each record in a dataset contains the Close, High and Low prices for its whole period, and we can't have our model looking into the future!  

For that reason we take the last `n_timesteps` records ending at the second-to-last record where `date_time < current_state.date_time`. The most recent completed record is appended right afterwards by **`_get_observation()`**, so a queue never contains the bar stamped at the current time.  
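Here is the same idea on a toy series, assuming integer timestamps (one per minute) and a hypothetical helper `seed_window()`: the window is seeded with the bars preceding the latest completed one, and appending that latest bar, as the first observation does, leaves a full window that stops right before the current timestamp. A `deque` with `maxlen` discards its oldest element automatically, which keeps each update cheap.

```python
from collections import deque


def seed_window(bars, current_time, n):
    """Hypothetical helper. bars: time-sorted (timestamp, close) pairs."""
    past = [bar for bar in bars if bar[0] < current_time]  # only bars stamped before now
    window = deque(past[-n - 1:-1], maxlen=n)              # skip the latest completed bar...
    return window, past[-1]                                # ...and hand it back separately


bars = [(t, 1.10 + t / 1000) for t in range(10)]  # ten one-minute bars, t = 0..9
window, latest = seed_window(bars, current_time=8, n=3)
print([t for t, _ in window])  # [4, 5, 6]
window.append(latest)          # what the first observation does
print([t for t, _ in window])  # [5, 6, 7]: bar 8 (the current one) is never included
```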

In [5]:
from collections import deque

def _init_time_deque(self):
    """
    Initializes the fixed-length queues of historical bars. Appending to a deque later is much cheaper than re-slicing the DataFrames at every step.
    """
    # Take the n bars preceding the latest completed one; that one is appended by _get_observation()
    h1 = self.h1_data.loc[self.h1_data.Date < self._current_state.Date][-self._h1_timesteps-1:-1]
    self._h1_queue = deque(h1.iloc[:, 1:].to_numpy(), maxlen=self._h1_timesteps)  # drop the Date column; same for every dataset

    m15 = self.m15_data.loc[self.m15_data.Date < self._current_state.Date][-self._m15_timesteps-1:-1]
    self._m15_queue = deque(m15.iloc[:, 1:].to_numpy(), maxlen=self._m15_timesteps)

    m5 = self.m5_data.loc[self.m5_data.Date < self._current_state.Date][-self._m5_timesteps-1:-1]
    self._m5_queue = deque(m5.iloc[:, 1:].to_numpy(), maxlen=self._m5_timesteps)

    m1 = self.m1_data.loc[self.m1_data.Date < self._current_state.Date][-self._m1_timesteps-1:-1]
    self._m1_queue = deque(m1.iloc[:, 1:].to_numpy(), maxlen=self._m1_timesteps) 
    
    
Environment._init_time_deque = _init_time_deque

The second method updates the queues and returns an observation based on the current state of the environment. We'll leave **`_get_current_agent_state()`** blank for now.
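The timeframe logic in `_get_observation()` boils down to a divisibility check on the current minute. The hypothetical helper below lists which queues receive a new bar at a given timestamp; over one hour of one-minute steps that gives 60 one-minute, 12 five-minute, 4 fifteen-minute and 1 hourly update.

```python
from collections import Counter
from datetime import datetime, timedelta


def frames_to_update(ts):
    """Hypothetical helper: timeframes whose queue gets a new bar at timestamp ts."""
    frames = ["M1"]          # the one-minute queue is updated at every step
    if ts.minute % 5 == 0:
        frames.append("M5")
    if ts.minute % 15 == 0:
        frames.append("M15")
    if ts.minute == 0:
        frames.append("H1")
    return frames


start = datetime(2018, 1, 1, 6, 0)
counts = Counter(f for m in range(60) for f in frames_to_update(start + timedelta(minutes=m)))
print(counts)  # Counter({'M1': 60, 'M5': 12, 'M15': 4, 'H1': 1})
```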

In [6]:
def _get_observation(self):
    """Appends the latest completed bar of each timeframe to its queue and returns the observation dictionary"""
    
    if self._current_state.Date.minute == 0:
        row = self.h1_data.iloc[self.h1_data.loc[self.h1_data.Date == self._current_state.Date].index - 1].to_numpy()[0][1:] 
        self._h1_queue.append(row)

    if self._current_state.Date.minute % 15 == 0:
        row = self.m15_data.iloc[self.m15_data.loc[self.m15_data.Date == self._current_state.Date].index - 1].to_numpy()[0][1:] 
        self._m15_queue.append(row)

    if self._current_state.Date.minute % 5 == 0:
        row = self.m5_data.iloc[self.m5_data.loc[self.m5_data.Date == self._current_state.Date].index - 1].to_numpy()[0][1:] 
        self._m5_queue.append(row)

    row = self.m1_data.iloc[self.m1_data.loc[self.m1_data.Date == self._current_state.Date].index - 1].to_numpy()[0][1:]   
    self._m1_queue.append(row)

    state = self._get_current_agent_state()

    return {self._dict_keys["H"]: np.expand_dims(np.array(self._h1_queue), axis=0).astype(np.float32),\
            self._dict_keys["M15"]: np.expand_dims(np.array(self._m15_queue), axis=0).astype(np.float32),\
            self._dict_keys["M5"]: np.expand_dims(np.array(self._m5_queue), axis=0).astype(np.float32),\
            self._dict_keys["M1"]: np.expand_dims(np.array(self._m1_queue), axis=0).astype(np.float32),\
            self._dict_keys["SI"]: np.expand_dims(np.array(state), axis=0).astype(np.float32)}

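def _get_current_agent_state(self):
    return ()  # placeholder: an empty state keeps np.array(...).astype(np.float32) valid until the real method is defined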

Environment._get_observation = _get_observation
Environment._get_current_agent_state = _get_current_agent_state

And now we're ready to define the **`step()`** method for our environment. But before we do, let's define its logic as well as some transaction formulas.  
  
When the `agent` tells the `environment` to `Open` a position, one of two things may happen: either a new position is opened, or nothing happens because the `agent` already has the maximum number of open positions.  
  
If a new position can be opened, the following steps take place:  
1. First we determine the `direction` of the new position and choose the appropriate `exchange_rate` (ask for a buy, bid for a sell) based on the `current state`;  
2. Next we pick the tradeable currency amount whose margin cost is closest to the selected percentage of the current balance;  
3. We calculate the margin cost of the leveraged trade `usd_cost`, the leveraged position size in USD `usd_amount` and in EUR `eur_amount`, and the `transaction_cost`;  
4. Finally we update the `funds` property of the environment and add a new `Position` object to the `open_positions` dictionary.  
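The opening arithmetic can be sketched as a standalone function. `open_position()` and the exchange rate of 1.09 are hypothetical; the leverage of 500, the commission of 45/10^6 and the grid of amounts from 1,000 to 200,000 match `__init__()`. With a balance of 100 and the 5% action, the margin closest to the 5 USD target is 4.36 USD (2,000 EUR), and the commission rounds to 0.10 USD.

```python
def open_position(balance, funds, percent, exchange_rate, leverage=500,
                  commission=45 / 10**6, amounts=range(1000, 200001, 1000)):
    """Hypothetical helper mirroring the opening steps; returns the trade figures and new funds."""
    target_cost = balance * (0.01 * percent)                 # margin we'd like to commit
    costs = [a * exchange_rate / leverage for a in amounts]  # margin needed for each amount
    i = min(range(len(costs)), key=lambda k: abs(costs[k] - target_cost))  # closest match
    usd_cost = costs[i]
    eur_amount = amounts[i]                                  # leveraged size in EUR
    usd_amount = round(usd_cost * leverage, 2)               # leveraged size in USD
    transaction_cost = round(commission * usd_amount, 2)
    new_funds = round(funds - usd_cost - transaction_cost, 2)
    return usd_cost, usd_amount, eur_amount, transaction_cost, new_funds


print(open_position(balance=100, funds=100, percent=5, exchange_rate=1.09))
# (4.36, 2180.0, 2000, 0.1, 95.54)
```

Choosing from a fixed grid of amounts means the committed margin only approximates the requested percentage; with small balances the nearest grid point can be noticeably below or above it.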
  
When the `agent` tells the `environment` to `Close` open positions, the environment executes the following steps for each open position:  
1. Calculate the position's current value: the margin it was opened with (`usd_cost`) plus its running profit (the difference between what the position is worth in USD now and the USD amount it was opened for), minus the commission for closing;
2. Add this value to the `funds` field of the environment, minus the transaction cost;
3. Finally, set the `open` flag of the position to `False`.  
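In formula form, a long position (direction 1) is marked to market at the Bid and a short one (direction 0) at the Ask, as the `Position` class below does. `running_profit()` and `position_value()` are hypothetical helpers and the prices are made up; the single `transaction_cost` subtracted in `position_value()` corresponds to the commission for closing.

```python
def running_profit(direction, eur_amount, usd_amount, bid, ask):
    """Hypothetical helper: unrealised profit in USD of an open position."""
    exit_rate = bid if direction == 1 else ask   # a long exits at Bid, a short at Ask
    current_usd = eur_amount * exit_rate         # what the EUR side is worth now
    return current_usd - usd_amount if direction == 1 else usd_amount - current_usd


def position_value(cost_usd, profit, transaction_cost):
    """Hypothetical helper: margin plus running profit, minus the closing commission."""
    return round(cost_usd + profit - transaction_cost, 2)


# Long 2,000 EUR opened for 2,180 USD; the Bid has since risen to 1.0905
long_profit = running_profit(1, 2000, 2180.0, bid=1.0905, ask=1.0907)
# Short 1,000 EUR opened for 1,090 USD; the Ask has since fallen to 1.0895
short_profit = running_profit(0, 1000, 1090.0, bid=1.0893, ask=1.0895)
print(round(long_profit, 2), round(short_profit, 2))  # 1.0 0.5
print(position_value(4.36, long_profit, 0.1))         # 5.26
```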

When the `agent` tells the `environment` to `Hold`, nothing really happens; it's just a way for the agent to do nothing and take its time.  
  
As mentioned above, we need a new class, `Position`, that holds all of the information about a trade in its properties.  
Let's start with that.

In [57]:
class Position:
    
    def __init__(self, state, direction, exchange_rate, transaction_cost, cost_usd, usd_amount, eur_amount, leverage):
        self.state = state  # State at which the position was created
        self.direction = direction  # Direction: 1 = long (bought at Ask), 0 = short (sold at Bid)
        
        self.open_date = state.Date
        self.open_exchange_rate = exchange_rate
        self.open = True
        
        self.close_date = None
        self.close_exchange_rate = 0
        
        self.transaction_cost = transaction_cost
        
        self.cost_usd = cost_usd  # Margin cost of the trade in USD
        self.usd_amount = usd_amount  # Leveraged position size in USD
        self.eur_amount = eur_amount  # Leveraged position size in EUR
        self._leverage = leverage
        self.current_profit = 0
        
        
    def _update_position(self, current_state):
        """Marks the position to market: a long position is valued at the Bid, a short one at the Ask"""
        exchange_rate = current_state.Bid if self.direction==1 else current_state.Ask
        cur_usd_amount = self.eur_amount*exchange_rate
        self.current_profit = (cur_usd_amount - self.usd_amount) if self.direction==1 else (self.usd_amount - cur_usd_amount)
    
    
    def get_value(self):
        """
        Returns the current value of an open trade: margin plus running profit, minus the closing commission
        """
        return round(self.cost_usd + self.current_profit - self.transaction_cost, 2)
    
    def get_profit(self):
        """
        Returns the current profit of an open trade, net of the opening and closing commissions
        """
        return round(self.current_profit - self.transaction_cost*2, 2)
    
    def close(self, current_state):
        """
        Closes an open position
        """
        self.close_exchange_rate = current_state.Bid if self.direction==1 else current_state.Ask
        self.close_date = current_state.Date
        self.open = False
          
            
    def get_info(self):
        return {"Date": self.open_date, "Type": "Ask" if self.direction==1 else "Bid",\
                "At": self.open_exchange_rate, "Open": self.open, "C_Date": self.close_date,\
                "C_At": self.close_exchange_rate, \
                "Profit": self.current_profit-self.transaction_cost*2}

And now we can finally define the **`step()`** method.

In [161]:
def step(self, action):
    """Executes an action, advances the environment one timestep and returns (observation, reward, done, info)"""
    action = self.action_space[action].split(".")

    # Taking action
    if action[0] == "open":
        num_open = sum(1 for pos in self._open_positions.values() if pos.open)
        if num_open < self._max_positions:
            direction = int(action[1])  # 1 = buy at Ask, 0 = sell at Bid
            target_cost = self._balance*(0.01*int(action[2]))  # margin we'd like to commit
            exchange_rate = self._current_state.Ask if direction == 1 else self._current_state.Bid
            # Margin required for every tradeable amount; pick the one closest to the target
            cost_space = self._amount_space*exchange_rate/self._leverage
            idx = np.abs(cost_space - target_cost).argmin()
            usd_cost = cost_space[idx]

            usd_amount = round(usd_cost*self._leverage, 2)
            eur_amount = self._amount_space[idx]
            transaction_cost = round(self._commision*usd_amount, 2)
            self._funds = round(self._funds - usd_cost - transaction_cost, 2)
            self._open_positions[len(self._open_positions)] = Position(self._current_state, direction, exchange_rate,
                                                                       transaction_cost, usd_cost, usd_amount,
                                                                       eur_amount, self._leverage)

    elif action[0] == "close":
        self._close_positions()

    # "hold": nothing to do

    self._update_state()

    return self._last_observation, self._reward, self._done, self._info
    
def _close_positions(self):
    """Closes every open position, returning its value minus the transaction cost to funds"""
    for position in self._open_positions.values():
        if position.open:
            self._funds = round(self._funds + position.get_value() - position.transaction_cost, 2)
            position.close(self._current_state)
                
def _update_state(self):
    pass

Environment._update_state = _update_state
Environment.step = step
Environment._close_positions = _close_positions

And now we are finally ready to construct the most important method: **`_update_state()`**. This method is the 'engine' of the class; it makes everything move and rattle. Let's define it.  
In this method we update the current state of the environment and check its `done` conditions. The episode is `done` if one of the following conditions is satisfied:  
1. The environment has reached its maximum duration (`max_length` trading days);
2. The `balance` or `funds` field drops to `starting_funds*end_coef` or below (zero with the default `end_coef=0`).  
  
Also, if the current trading day is over (the current time has reached `end_time`, 20:00), we close every open position and move on to the next day, carrying over the remaining `funds` and `balance`.  
  
In this method we also calculate the reward, which is simply the current `balance`: `funds` plus the total value of all open positions. For that we'll write a method that sums the value of all open positions.  
  
Finally, let's write a method that returns the agent's current state in the environment back to the agent.
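The two `done` conditions can be written as a small predicate. `is_done()` is a hypothetical helper; its arguments correspond to the environment's `_balance`, `_funds`, `_starting_funds`, `_end_coef`, `_day_index` and `_max_length`.

```python
def is_done(balance, funds, starting_funds, end_coef, day_index, max_length):
    """Hypothetical helper: True once the episode should end."""
    floor = starting_funds * end_coef          # 0 with the default end_coef=0
    out_of_money = balance <= floor or funds <= floor
    out_of_time = day_index >= max_length      # day_index counts completed trading days
    return out_of_money or out_of_time


print(is_done(98.9, 98.9, 100, 0, day_index=0, max_length=21))    # False
print(is_done(98.9, 98.9, 100, 0, day_index=21, max_length=21))   # True: out of time
print(is_done(45.0, 40.0, 100, 0.5, day_index=3, max_length=21))  # True: below half the starting funds
```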

In [188]:
def _update_state(self): 
    """
    Updates the state of the environment
    """
    if self._current_state.Date.time() >= self._end_time:  
        self._close_positions()
        self._balance = self._funds
        
        self._day_index +=1
        self._current_index = self._start_indexes[self._first_day_index + self._day_index]
        self._current_state = self.m1_data.iloc[self._current_index]
        self._init_time_deque()

    else:
        self._current_index += 1
        self._current_state = self.m1_data.iloc[self._current_index]
    
    self._update_open_positions()
    
    self._balance = round(self._funds + self._get_open_positions_value(), 2)
    
    if self._balance <= self._starting_funds*self._end_coef or self._funds <= self._starting_funds*self._end_coef:
        self._done = True
    if self._day_index >= self._max_length:
        self._done = True
    
    
    self._reward = self._balance 
    self._last_observation = self._get_observation()
    self._info["Balance"]  = round(self._balance, 2)
    self._info["Funds"] =  round(self._funds, 2)
    self._info["Open_positions"] = sum(1 for pos in self._open_positions.values() if pos.open)


def _update_open_positions(self):
    for position in self._open_positions.values():
        if position.open:
            position._update_position(self._current_state)
    
def _get_open_positions_value(self):
    """
    Calculates value of currently open positions
    """
    value = 0
    for position in self._open_positions.values():
        if position.open:
            value += position.get_value()   
    return value

def _get_open_positions_profit(self):
    """
    Calculates the total profit of currently open positions
    """
    profit = 0
    for position in self._open_positions.values():
        if position.open:
            profit += position.get_profit()   
            
    return profit

def _get_current_agent_state(self):
    """
    Returns information about agent's state in the environment
    """
    num_open = sum(1 for pos in self._open_positions.values() if pos.open) # Number of open positions
    total_profit = sum(pos.current_profit for pos in self._open_positions.values() if pos.open) # Profit of open positions
    
    return (num_open, total_profit/self._balance, self._funds/self._balance,\
           self._current_state.Ask, self._current_state.Bid, self._current_state.Spread)


Environment._update_state = _update_state
Environment._update_open_positions = _update_open_positions
Environment._get_open_positions_profit = _get_open_positions_profit
Environment._get_open_positions_value = _get_open_positions_value
Environment._get_current_agent_state = _get_current_agent_state

All done! Now let's run some tests in order to see if everything works.

In [189]:
path_to_1h = data_path + "exp-EURUSD-bars-1h-2016Jan-2020Jan.csv"
path_to_15min = data_path + "exp-EURUSD-bars-m15-2016Jan-2020Jan.csv"
path_to_5min = data_path + "exp-EURUSD-bars-m5-2016Jan-2020Jan.csv"
path_to_1min = data_path + "exp-EURUSD-bars-1m-2016Jan-2020Jan.csv"
paths = (path_to_1h, path_to_15min, path_to_5min, path_to_1min)
timesteps = (48, 64, 64, 128)
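To get a feel for these `timesteps`, here is how much market history each queue covers; the snippet only multiplies bar counts by bar lengths. The 48 hourly bars are also why `__init__()` skips the first `int(48/24) = 2` matching start rows.

```python
from datetime import timedelta

timesteps = (48, 64, 64, 128)
bar_lengths = (timedelta(hours=1), timedelta(minutes=15), timedelta(minutes=5), timedelta(minutes=1))
spans = [n * length for n, length in zip(timesteps, bar_lengths)]  # history covered per queue

for name, n, span in zip(("1h", "15min", "5min", "1min"), timesteps, spans):
    print(f"{name:>5}: {n:>3} bars cover {span}")
print("start rows skipped:", int(timesteps[0] / 24))
```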

In [190]:
env = Environment(paths, timesteps, starting_funds=100, max_length=21, start_year=2016, end_year=2019, from_start=True)

In [191]:
obs = env.reset()

In [192]:
env.action_space

['open.1.2',
 'open.1.5',
 'open.1.10',
 'open.0.2',
 'open.0.5',
 'open.0.10',
 'close',
 'hold']
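The tests below address actions by position and rely on Python's negative indexing: `-1` is `hold` and `-2` is `close`. A small lookup table, a hypothetical addition, makes those indexes explicit and shows how `step()` splits each name into its parts.

```python
ACTION_SPACE = ["open.1.2", "open.1.5", "open.1.10",
                "open.0.2", "open.0.5", "open.0.10", "close", "hold"]

# Hypothetical name -> index table, so calls like env.step(ACTION_INDEX["hold"]) read clearly
ACTION_INDEX = {name: i for i, name in enumerate(ACTION_SPACE)}


def decode(name):
    """Split an action name the way step() does: (kind, direction, percent)."""
    kind, *rest = name.split(".")
    if kind == "open":
        direction, percent = map(int, rest)
        return kind, direction, percent
    return kind, None, None


print(ACTION_SPACE[-1], ACTION_SPACE[-2])                 # hold close
print(ACTION_INDEX["open.1.5"], decode(ACTION_SPACE[3]))  # 1 ('open', 0, 2)
```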

## Testing

Holding (action `-1`, the last entry of `action_space`, i.e. `hold`).

In [193]:
_, reward, done, info = env.step(-1)

In [194]:
print("Reward: %.6f" % reward)
print("Done: {}".format(done))
print("Information: %s" % info)

Reward: 100.000000
Done: False
Information: {'Balance': 100, 'Funds': 100, 'Open_positions': 0}


Buying (action `1`, i.e. `open.1.5`) and holding for 100 steps.

In [195]:
_, reward, done, info = env.step(1)
for _ in range(100):
    _, reward, done, info = env.step(-1)

In [196]:
print("Reward: %.6f" % reward)
print("Done: {}".format(done))
print("Information: %s" % info)

Reward: 99.000000
Done: False
Information: {'Balance': 99.0, 'Funds': 95.54, 'Open_positions': 1}


Closing (action `-2`, i.e. `close`).

In [197]:
_, reward, done, info = env.step(-2)

In [198]:
print("Reward: %.6f" % reward)
print("Done: {}".format(done))
print("Information: %s" % info)

Reward: 98.900000
Done: False
Information: {'Balance': 98.9, 'Funds': 98.9, 'Open_positions': 0}


In [199]:
for _ in range(100):
    _, reward, done, info = env.step(-1)

In [200]:
print("Reward: %.6f" % reward)
print("Done: {}".format(done))
print("Information: %s" % info)

Reward: 98.900000
Done: False
Information: {'Balance': 98.9, 'Funds': 98.9, 'Open_positions': 0}


Now the other way around: selling (action `3`, i.e. `open.0.2`) and holding for 100 steps.

In [201]:
obs = env.reset()

In [202]:
_, reward, done, info = env.step(3)
for _ in range(100):
    _, reward, done, info = env.step(-1)

In [203]:
print("Reward: %.6f" % reward)
print("Done: {}".format(done))
print("Information: %s" % info)

Reward: 100.350000
Done: False
Information: {'Balance': 100.35, 'Funds': 97.77, 'Open_positions': 1}


Closing (action `-2`, i.e. `close`).

In [204]:
_, reward, done, info = env.step(-2)

In [205]:
print("Reward: %.6f" % reward)
print("Done: {}".format(done))
print("Information: %s" % info)

Reward: 100.300000
Done: False
Information: {'Balance': 100.3, 'Funds': 100.3, 'Open_positions': 0}


In [206]:
for _ in range(100):
    _, reward, done, info = env.step(-1)

In [207]:
print("Reward: %.6f" % reward)
print("Done: {}".format(done))
print("Information: %s" % info)

Reward: 100.300000
Done: False
Information: {'Balance': 100.3, 'Funds': 100.3, 'Open_positions': 0}


Splendid. I'll save this class to a file, and we can move on to the next step: creating an agent.