# Looking at other options for the reward functions

The current pressure-based reward seems to be working fine, but other changes might make the model perform even better. Changing the reward function definition would also help, and it is most probably the single biggest factor in model performance and behavior. In this notebook, I explore different reward function definitions and check how each of them can be calculated with TraCI at a given step. The actual training for them will be done in separate Colab notebooks.
___

## 1. Using waiting time as the reward function
- Waiting time can be a good candidate for the reward function as well.
- It is calculated as $R = -\sum_{l} \text{WaitTime}_l$, where the sum runs over the incoming lanes $l$.
- However, I need to check how it behaves across the different stages of the simulation. I expect it to increase as rush hour approaches, but it should also decrease once rush hour has ended; otherwise, it will not be a good candidate for a reward function.
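As a quick sanity check of the formula, here is a minimal sketch that computes the reward from a dictionary of per-lane waiting times, like the ones printed by the cells below. The helper name `wait_time_reward` is hypothetical; inside the environment the values would come from `traci.lane.getWaitingTime`. The example values are the same ones used as a reference at the end of this notebook.

```python
def wait_time_reward(wait_times):
    """Return -1 * the total waiting time (in seconds) over all incoming lanes.

    `wait_times` maps a lane ID to its accumulated waiting time, e.g. the
    dictionaries built from traci.lane.getWaitingTime in this notebook.
    """
    return -sum(wait_times.values())


# per-lane waiting times at one step of the simulation (non-zero lanes only)
example = {'E3_1': 18.0, 'E3_2': 18.0, '-E2_2': 27.0, '-E4_1': 4.0, '-E4_2': 17.0}
print(wait_time_reward(example))  # -84.0
```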

In [1]:
# importing libraries
import pandas as pd
import numpy as np
import os
import traci

In [3]:
# connecting with the simulation
traci.start(["sumo", "-c", "data/SingleIntersection.sumocfg", "--log", "logs/exptLog.yaml", "--duration-log.statistics", "true"])

(22, 'SUMO 1.23.1')

In [4]:
# incoming lanes whose waiting times will be checked after each simulation step
incomingLanes = ['E1_0', 'E1_1', 'E1_2', 'E3_0', 'E3_1', 'E3_2', '-E2_0', '-E2_1', '-E2_2', '-E4_0', '-E4_1', '-E4_2']

In [5]:
for i in range(10):
    # perform a simulation step, and then print the waiting time for each lane
    traci.simulationStep()
    waitTimes = {l:traci.lane.getWaitingTime(l) for l in incomingLanes}
    print("Wait time:", waitTimes)
    print('------------------------------------------------------')

Wait time: {'E1_0': 0.0, 'E1_1': 0.0, 'E1_2': 0.0, 'E3_0': 0.0, 'E3_1': 0.0, 'E3_2': 0.0, '-E2_0': 0.0, '-E2_1': 0.0, '-E2_2': 0.0, '-E4_0': 0.0, '-E4_1': 0.0, '-E4_2': 0.0}
------------------------------------------------------
Wait time: {'E1_0': 0.0, 'E1_1': 0.0, 'E1_2': 0.0, 'E3_0': 0.0, 'E3_1': 0.0, 'E3_2': 0.0, '-E2_0': 0.0, '-E2_1': 0.0, '-E2_2': 0.0, '-E4_0': 0.0, '-E4_1': 0.0, '-E4_2': 0.0}
------------------------------------------------------
Wait time: {'E1_0': 0.0, 'E1_1': 0.0, 'E1_2': 0.0, 'E3_0': 0.0, 'E3_1': 0.0, 'E3_2': 0.0, '-E2_0': 0.0, '-E2_1': 0.0, '-E2_2': 0.0, '-E4_0': 0.0, '-E4_1': 0.0, '-E4_2': 0.0}
------------------------------------------------------
Wait time: {'E1_0': 0.0, 'E1_1': 0.0, 'E1_2': 0.0, 'E3_0': 0.0, 'E3_1': 0.0, 'E3_2': 0.0, '-E2_0': 0.0, '-E2_1': 0.0, '-E2_2': 0.0, '-E4_0': 0.0, '-E4_1': 0.0, '-E4_2': 0.0}
------------------------------------------------------
Wait time: {'E1_0': 0.0, 'E1_1': 0.0, 'E1_2': 0.0, 'E3_0': 0.0, 'E3_1': 0.0, 'E3_2':

In [7]:
# simulating for another 20 seconds
for i in range(20):
    # perform a simulation step, and then print the waiting time for each lane
    traci.simulationStep()
    waitTimes = {l:traci.lane.getWaitingTime(l) for l in incomingLanes}
    print("Wait time:", waitTimes)
    print('------------------------------------------------------')

Wait time: {'E1_0': 0.0, 'E1_1': 0.0, 'E1_2': 0.0, 'E3_0': 0.0, 'E3_1': 0.0, 'E3_2': 0.0, '-E2_0': 0.0, '-E2_1': 0.0, '-E2_2': 0.0, '-E4_0': 0.0, '-E4_1': 0.0, '-E4_2': 0.0}
------------------------------------------------------
Wait time: {'E1_0': 0.0, 'E1_1': 0.0, 'E1_2': 0.0, 'E3_0': 0.0, 'E3_1': 0.0, 'E3_2': 0.0, '-E2_0': 0.0, '-E2_1': 0.0, '-E2_2': 0.0, '-E4_0': 0.0, '-E4_1': 0.0, '-E4_2': 0.0}
------------------------------------------------------
Wait time: {'E1_0': 0.0, 'E1_1': 0.0, 'E1_2': 0.0, 'E3_0': 0.0, 'E3_1': 0.0, 'E3_2': 0.0, '-E2_0': 0.0, '-E2_1': 0.0, '-E2_2': 0.0, '-E4_0': 0.0, '-E4_1': 0.0, '-E4_2': 0.0}
------------------------------------------------------
Wait time: {'E1_0': 0.0, 'E1_1': 0.0, 'E1_2': 0.0, 'E3_0': 0.0, 'E3_1': 0.0, 'E3_2': 0.0, '-E2_0': 0.0, '-E2_1': 0.0, '-E2_2': 0.0, '-E4_0': 0.0, '-E4_1': 0.0, '-E4_2': 0.0}
------------------------------------------------------
Wait time: {'E1_0': 0.0, 'E1_1': 0.0, 'E1_2': 0.0, 'E3_0': 0.0, 'E3_1': 0.0, 'E3_2':

In [8]:
traci.simulation.getTime()

30.0

In [9]:
# simulating for another 20 seconds
for i in range(20):
    # perform a simulation step, and then print the waiting time for each lane
    traci.simulationStep()
    waitTimes = {l:traci.lane.getWaitingTime(l) for l in incomingLanes}
    print("Wait time:", waitTimes)
    print('------------------------------------------------------')

Wait time: {'E1_0': 0.0, 'E1_1': 0.0, 'E1_2': 0.0, 'E3_0': 0.0, 'E3_1': 0.0, 'E3_2': 0.0, '-E2_0': 0.0, '-E2_1': 0.0, '-E2_2': 0.0, '-E4_0': 0.0, '-E4_1': 0.0, '-E4_2': 0.0}
------------------------------------------------------
Wait time: {'E1_0': 0.0, 'E1_1': 0.0, 'E1_2': 0.0, 'E3_0': 0.0, 'E3_1': 0.0, 'E3_2': 0.0, '-E2_0': 0.0, '-E2_1': 0.0, '-E2_2': 0.0, '-E4_0': 0.0, '-E4_1': 0.0, '-E4_2': 0.0}
------------------------------------------------------
Wait time: {'E1_0': 0.0, 'E1_1': 0.0, 'E1_2': 0.0, 'E3_0': 0.0, 'E3_1': 0.0, 'E3_2': 0.0, '-E2_0': 0.0, '-E2_1': 0.0, '-E2_2': 0.0, '-E4_0': 0.0, '-E4_1': 0.0, '-E4_2': 0.0}
------------------------------------------------------
Wait time: {'E1_0': 0.0, 'E1_1': 0.0, 'E1_2': 0.0, 'E3_0': 0.0, 'E3_1': 0.0, 'E3_2': 0.0, '-E2_0': 0.0, '-E2_1': 0.0, '-E2_2': 1.0, '-E4_0': 0.0, '-E4_1': 0.0, '-E4_2': 0.0}
------------------------------------------------------
Wait time: {'E1_0': 0.0, 'E1_1': 0.0, 'E1_2': 0.0, 'E3_0': 0.0, 'E3_1': 0.0, 'E3_2':

In [11]:
traci.simulation.getTime()

50.0

In [12]:
# simulating for another 20 seconds
for i in range(20):
    # perform a simulation step, and then print the waiting time for each lane
    traci.simulationStep()
    waitTimes = {l:traci.lane.getWaitingTime(l) for l in incomingLanes}
    print("Wait time:", waitTimes)
    print('------------------------------------------------------')

Wait time: {'E1_0': 0.0, 'E1_1': 0.0, 'E1_2': 0.0, 'E3_0': 0.0, 'E3_1': 9.0, 'E3_2': 7.0, '-E2_0': 0.0, '-E2_1': 0.0, '-E2_2': 18.0, '-E4_0': 0.0, '-E4_1': 0.0, '-E4_2': 8.0}
------------------------------------------------------
Wait time: {'E1_0': 0.0, 'E1_1': 0.0, 'E1_2': 0.0, 'E3_0': 0.0, 'E3_1': 10.0, 'E3_2': 8.0, '-E2_0': 0.0, '-E2_1': 0.0, '-E2_2': 19.0, '-E4_0': 0.0, '-E4_1': 0.0, '-E4_2': 9.0}
------------------------------------------------------
Wait time: {'E1_0': 0.0, 'E1_1': 0.0, 'E1_2': 0.0, 'E3_0': 0.0, 'E3_1': 11.0, 'E3_2': 9.0, '-E2_0': 0.0, '-E2_1': 0.0, '-E2_2': 20.0, '-E4_0': 0.0, '-E4_1': 0.0, '-E4_2': 10.0}
------------------------------------------------------
Wait time: {'E1_0': 0.0, 'E1_1': 0.0, 'E1_2': 0.0, 'E3_0': 0.0, 'E3_1': 12.0, 'E3_2': 10.0, '-E2_0': 0.0, '-E2_1': 0.0, '-E2_2': 21.0, '-E4_0': 0.0, '-E4_1': 0.0, '-E4_2': 11.0}
------------------------------------------------------
Wait time: {'E1_0': 0.0, 'E1_1': 0.0, 'E1_2': 0.0, 'E3_0': 0.0, 'E3_1': 13

In [13]:
traci.simulation.getTime()

70.0

In [14]:
# simulating for another 20 seconds
for i in range(20):
    # perform a simulation step, and then print the waiting time for each lane
    traci.simulationStep()
    waitTimes = {l:traci.lane.getWaitingTime(l) for l in incomingLanes}
    print("Wait time:", waitTimes)
    print('------------------------------------------------------')

Wait time: {'E1_0': 0.0, 'E1_1': 0.0, 'E1_2': 0.0, 'E3_0': 0.0, 'E3_1': 47.0, 'E3_2': 40.0, '-E2_0': 0.0, '-E2_1': 0.0, '-E2_2': 38.0, '-E4_0': 0.0, '-E4_1': 15.0, '-E4_2': 28.0}
------------------------------------------------------
Wait time: {'E1_0': 0.0, 'E1_1': 0.0, 'E1_2': 0.0, 'E3_0': 0.0, 'E3_1': 50.0, 'E3_2': 42.0, '-E2_0': 0.0, '-E2_1': 0.0, '-E2_2': 39.0, '-E4_0': 0.0, '-E4_1': 16.0, '-E4_2': 29.0}
------------------------------------------------------
Wait time: {'E1_0': 0.0, 'E1_1': 0.0, 'E1_2': 0.0, 'E3_0': 0.0, 'E3_1': 53.0, 'E3_2': 44.0, '-E2_0': 0.0, '-E2_1': 0.0, '-E2_2': 40.0, '-E4_0': 0.0, '-E4_1': 17.0, '-E4_2': 30.0}
------------------------------------------------------
Wait time: {'E1_0': 0.0, 'E1_1': 0.0, 'E1_2': 0.0, 'E3_0': 0.0, 'E3_1': 56.0, 'E3_2': 46.0, '-E2_0': 0.0, '-E2_1': 0.0, '-E2_2': 41.0, '-E4_0': 0.0, '-E4_1': 18.0, '-E4_2': 31.0}
------------------------------------------------------
Wait time: {'E1_0': 0.0, 'E1_1': 0.0, 'E1_2': 0.0, 'E3_0': 0.0, 

In [15]:
traci.simulation.getTime()

90.0

In [16]:
# simulating 5 more steps
for i in range(5):
    # perform a simulation step, and then print the waiting time for each lane
    traci.simulationStep()
    waitTimes = {l:traci.lane.getWaitingTime(l) for l in incomingLanes}
    print("Wait time:", waitTimes)
    print('------------------------------------------------------')

Wait time: {'E1_0': 0.0, 'E1_1': 0.0, 'E1_2': 0.0, 'E3_0': 0.0, 'E3_1': 107.0, 'E3_2': 85.0, '-E2_0': 0.0, '-E2_1': 0.0, '-E2_2': 0.0, '-E4_0': 0.0, '-E4_1': 35.0, '-E4_2': 48.0}
------------------------------------------------------
Wait time: {'E1_0': 0.0, 'E1_1': 0.0, 'E1_2': 0.0, 'E3_0': 0.0, 'E3_1': 110.0, 'E3_2': 88.0, '-E2_0': 0.0, '-E2_1': 0.0, '-E2_2': 0.0, '-E4_0': 0.0, '-E4_1': 36.0, '-E4_2': 49.0}
------------------------------------------------------
Wait time: {'E1_0': 0.0, 'E1_1': 0.0, 'E1_2': 0.0, 'E3_0': 0.0, 'E3_1': 113.0, 'E3_2': 91.0, '-E2_0': 0.0, '-E2_1': 0.0, '-E2_2': 0.0, '-E4_0': 0.0, '-E4_1': 37.0, '-E4_2': 50.0}
------------------------------------------------------
Wait time: {'E1_0': 0.0, 'E1_1': 0.0, 'E1_2': 0.0, 'E3_0': 0.0, 'E3_1': 116.0, 'E3_2': 94.0, '-E2_0': 0.0, '-E2_1': 0.0, '-E2_2': 0.0, '-E4_0': 0.0, '-E4_1': 38.0, '-E4_2': 51.0}
------------------------------------------------------
Wait time: {'E1_0': 0.0, 'E1_1': 0.0, 'E1_2': 0.0, 'E3_0': 0.0, 

Okay, so a few things to note after looking at the waiting times for the different lanes:
1. The waiting time increases by 1 second for every second that a vehicle stands still in the lane.
2. If more than one vehicle is waiting in a lane, the waiting time of each of them is counted. For example, if 3 vehicles just started waiting in this step, the lane's waiting time increases by 3 after the next step, since it adds 1 second per car.
3. Once vehicles start moving, their waiting time seems to be reset to 0. For example, if only a few cars are waiting and all of them start moving, the waiting time for that lane becomes 0.
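The three observations can be reproduced with a toy model. This is a hypothetical simplification for intuition, not SUMO's actual implementation: each vehicle keeps its own counter that grows by 1 s for every step it is halted and resets to 0 when it moves, and the lane's waiting time is the sum of these counters.

```python
def step_waiting(vehicle_waits, halted):
    """Advance per-vehicle waiting counters by one 1-second step (toy model).

    vehicle_waits: dict mapping a vehicle ID to its current waiting time (s).
    halted: dict mapping each vehicle still on the lane to True if it is
            standing still this step, False if it is moving.
    """
    return {v: (vehicle_waits.get(v, 0.0) + 1.0) if halted[v] else 0.0
            for v in halted}


def lane_waiting_time(vehicle_waits):
    # the lane value is the sum of the individual vehicle counters
    return sum(vehicle_waits.values())


waits = {}
# three vehicles start waiting and stay halted for two steps
for _ in range(2):
    waits = step_waiting(waits, {'a': True, 'b': True, 'c': True})
print(lane_waiting_time(waits))  # 6.0

# one more halted step adds 3 (one second per waiting vehicle)
waits = step_waiting(waits, {'a': True, 'b': True, 'c': True})
print(lane_waiting_time(waits))  # 9.0

# all vehicles start moving: every counter is reset
waits = step_waiting(waits, {'a': False, 'b': False, 'c': False})
print(lane_waiting_time(waits))  # 0.0
```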

For point 3, I still need to check what happens when not all cars can move. My concern is that if the lane's waiting time drops to 0 as soon as a few vehicles start moving, it might give the model the wrong idea.

In [17]:
# going a little into the rush hour period, and then checking the waiting times
traci.simulation.getTime()

95.0

In [18]:
for i in range(815):
    traci.simulationStep()

In [19]:
traci.simulation.getTime()

910.0

In [25]:
# for i in range(80):
#     traci.simulationStep()
# printing the waiting times
print({l:traci.lane.getWaitingTime(l) for l in incomingLanes})

{'E1_0': 0.0, 'E1_1': 865.0, 'E1_2': 528.0, 'E3_0': 0.0, 'E3_1': 3633.0, 'E3_2': 740.0, '-E2_0': 0.0, '-E2_1': 0.0, '-E2_2': 0.0, '-E4_0': 0.0, '-E4_1': 1266.0, '-E4_2': 1653.0}


In [26]:
traci.simulation.getTime()

990.0

In [27]:
# simulating one more step and checking the waittimes
traci.simulationStep()
print({l:traci.lane.getWaitingTime(l) for l in incomingLanes})

{'E1_0': 0.0, 'E1_1': 891.0, 'E1_2': 545.0, 'E3_0': 0.0, 'E3_1': 3659.0, 'E3_2': 749.0, '-E2_0': 0.0, '-E2_1': 0.0, '-E2_2': 0.0, '-E4_0': 0.0, '-E4_1': 1055.0, '-E4_2': 1479.0}


So this confirms point 3: only the waiting time of the vehicles that have started moving is removed. Vehicles further back that are still stopped keep contributing their waiting time, so the waiting time of the entire lane is not reset to 0 when the lane gets a green light (for example, `-E4_1` drops from 1266.0 to 1055.0, not to 0).
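The partial reset is easier to see in the per-lane change between the two snapshots above (t = 990 s and one step later). A short sketch using only the printed non-zero values: the lanes that are still queued keep growing, while on the lanes where vehicles started moving (`-E4_1` and `-E4_2`) the total drops but stays well above 0.

```python
# lane waiting times printed at t = 990 s and one step later (non-zero lanes only)
before = {'E1_1': 865.0, 'E1_2': 528.0, 'E3_1': 3633.0, 'E3_2': 740.0,
          '-E4_1': 1266.0, '-E4_2': 1653.0}
after = {'E1_1': 891.0, 'E1_2': 545.0, 'E3_1': 3659.0, 'E3_2': 749.0,
         '-E4_1': 1055.0, '-E4_2': 1479.0}

# change in each lane's waiting time over one simulation step
deltas = {lane: after[lane] - before[lane] for lane in before}
for lane, d in deltas.items():
    print(f"{lane:>6}: {d:+.0f}")

# total over the incoming lanes, i.e. -1 * the waittime reward at each step
print("total:", sum(before.values()), "->", sum(after.values()))
```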

In [28]:
# confirming it by running 3 more steps
traci.simulationStep()
traci.simulationStep()
traci.simulationStep()
print({l:traci.lane.getWaitingTime(l) for l in incomingLanes})

{'E1_0': 0.0, 'E1_1': 973.0, 'E1_2': 598.0, 'E3_0': 0.0, 'E3_1': 3737.0, 'E3_2': 776.0, '-E2_0': 0.0, '-E2_1': 0.0, '-E2_2': 0.0, '-E4_0': 0.0, '-E4_1': 463.0, '-E4_2': 989.0}


### Creating a class with the cumulative waiting time as the reward function

In [15]:
class SUMOEnvironment:
    '''
        This class is the environment implemented using SUMO and TRACI for a single intersection.
    '''
    def __init__(self, sumoCfgPath, sumoMode='sumo', maxTime=3600.0, enableLogging=True, logPath='logs/RLLog.txt'):
        self.sumoMode = sumoMode
        self.sumoCfgPath = sumoCfgPath # stored as instance attributes (not class variables) so they can be reused later, e.g. in reset()
        self.maxTime = maxTime
        self.currTime = 0.0
        self.directions = ( # movement directions for calculating the reward function
            ('E1_2', '-E3_2'), # left
            ('E1_1', 'E2_1'), # straight
            ('E1_0', 'E4_0'), # right

            ('-E4_2', '-E1_2'), # left
            ('-E4_1', '-E3_1'), # straight
            ('-E4_0', 'E2_0'), # right

            ('-E2_2', 'E4_2'), # left
            ('-E2_1', '-E1_1'), # straight
            ('-E2_0', '-E3_0'), # right

            ('E3_2', 'E2_2'), # left
            ('E3_1', 'E4_1'), # straight
            ('E3_0', '-E1_0') # right
        )
        self.incoming = [t[0] for t in self.directions]
        self.capacity = 40

        # starting the simulation
        if enableLogging:
            traci.start([sumoMode, '-c', sumoCfgPath, "--log", logPath, "--duration-log.statistics", "true"])
        else:
            traci.start([sumoMode, '-c', sumoCfgPath])

    def _getState(self, intersectionId='Inter'):
        '''
            This function returns the state at the current time step.
        '''
        stArray = []
        for l in self.incoming:
            # getting the number of waiting vehicles in the lane
            vc = traci.lane.getLastStepHaltingNumber(l)
            stArray.append(vc)

        # at the end, appending the current state of the intersection
        cs = traci.trafficlight.getPhase(intersectionId)
        # ohe the phase
        csOhe = [0]*4
        csOhe[cs] = 1
        stArray.extend(csOhe)
        return stArray

    def _waitTimeReward(self):
        '''
            This function defines the reward function based on the waiting time of vehicles in the incoming lanes.
            The reward is defined as the sum over incoming lanes of (-1 * waitingTime(lane)).
        '''
        r = 0
        for l in self.incoming:
            r_i = -1 * traci.lane.getWaitingTime(l)
            r += r_i
        return r
    
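    def _vicCountReward(self):
        '''
            This function defines the reward based on halted vehicle counts: for every (incoming, outgoing) lane pair,
            the number of halted incoming vehicles is penalized, scaled by (1 - halted outgoing vehicles / self.capacity).
        '''
        r = 0
        # looping through the directions and calculating individual rewards
        for d in self.directions:
            # halted vehicles in the incoming (d[0]) and outgoing (d[1]) lanes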
            vIn = traci.lane.getLastStepHaltingNumber(d[0])
            vOut = traci.lane.getLastStepHaltingNumber(d[1])
            r_i = -1 * vIn * (1 - (vOut/self.capacity))
            r += r_i
        return r

    def _getReward(self, rewardType='waittime'):
        '''
            This function returns the reward as of the current state of the intersection.
        '''
        if rewardType=='waittime':
            r = self._waitTimeReward()
            return r
        elif rewardType=='viccounts':
            r = self._vicCountReward()
            return r
        else:
            raise ValueError("Please provide the correct reward function type! Accepted values [CASE SENSITIVE]: waittime, viccounts.")

    def _step(self, t=10):
        '''
            This function moves the simulation t timesteps ahead. If the total number of steps reaches the maximum allowed (self.maxTime), it stops early and reports that the episode has finished.
        '''
        finished = False
        for i in range(t):
            if self.currTime==self.maxTime:
                finished = True
                break
            # otherwise, advance the simulation by one step
            self.currTime  = self.currTime + 1.0
            traci.simulationStep()
        return finished

    def takeAction(self, action, intersectionId='Inter', t=10):
        '''
            This function performs the given action, steps the environment ahead for next t seconds/steps, and then returns the next state, reward and whether the simulation has finished or not.
        '''
        # take action: set the tl phase to the action value
        traci.trafficlight.setPhase(intersectionId, action)
        # simulate next t time steps and get the next state
        finished = self._step(t)
        # get the next state
        next_state = self._getState(intersectionId)
        # get the reward
        reward = self._getReward()

        return next_state, reward, finished

    def reset(self):
        '''
            This function resets the environment to the start and returns the starting state.
        '''
        # resetting the SUMO engine
        traci.load(["-c", self.sumoCfgPath])
        self.currTime = 0.0
        return self._getState()

    def close(self):
        '''
            This function closes the connection of traci with the sumo environment.
            NOTE: After calling this function, you will need to reinitialize the object, as now the connection to SUMO has been closed for this. NEED TO FIND A BETTER WAY TO DO THIS.
        '''
        traci.close()
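
For reference, the alternative `viccounts` reward implemented in `_vicCountReward` can be written out as follows, where $D$ is the set of (incoming, outgoing) lane pairs in `self.directions`, $h_\ell$ is the halting count returned by `traci.lane.getLastStepHaltingNumber` for lane $\ell$, and $C$ is `self.capacity`:

$$R_{\text{viccounts}} = -\sum_{(i,\,o)\in D} h_i\left(1 - \frac{h_o}{C}\right), \qquad C = 40$$

A worked example for a single direction with $h_i = 10$ halted incoming vehicles:

$$h_o = 20:\quad -10\left(1 - \tfrac{20}{40}\right) = -5, \qquad h_o = 0:\quad -10\left(1 - \tfrac{0}{40}\right) = -10$$

So with this definition, the fuller the outgoing lane is, the smaller the penalty for the same incoming queue.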

Okay, so I have one concern here: the rewards might explode. Waiting times can reach thousands of seconds, so during rush hour the reward can become very negative, on the order of -10k to -20k (the incoming-lane waiting times printed at t = 990 s above already add up to 8685 s). I'm not sure whether this will cause problems when training the model, and I wonder whether large reward magnitudes can lead to something like exploding gradients in RL.<br>
Apparently, very large reward values can affect training negatively. If that's the case, I need to find a good way to scale the reward. Before that, though, I'd like to train with the reward as it is and see how it performs.
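One option to keep in mind, sketched here under my own assumptions and not tested in training yet: divide each reward by a running estimate of its standard deviation, so that its magnitude stays in a similar range whether or not it is rush hour. The class name `RunningRewardScaler` is hypothetical; the running variance uses Welford's online algorithm, and the inputs below are reward values taken from this notebook.

```python
import math


class RunningRewardScaler:
    """Scale rewards by a running estimate of their standard deviation.

    Hypothetical helper: the mean and variance are tracked with Welford's
    online algorithm, and rewards are only divided (not shifted), so their
    sign is preserved.
    """

    def __init__(self, eps=1e-8):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.eps = eps  # avoids division by zero

    def update(self, r):
        # Welford update of the running mean and squared-deviation sum
        self.count += 1
        delta = r - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (r - self.mean)

    def std(self):
        if self.count < 2:
            return 1.0  # not enough samples yet: leave rewards unscaled
        return math.sqrt(self.m2 / (self.count - 1))

    def scale(self, r):
        self.update(r)
        return r / (self.std() + self.eps)


scaler = RunningRewardScaler()
for raw in [-84.0, -8685.0, -8378.0]:
    print(raw, "->", round(scaler.scale(raw), 3))
```

A simpler alternative is dividing by a fixed constant (for example a typical rush-hour magnitude); whether either variant actually helps would have to be checked in the training notebooks.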

In [17]:
# creating an env
env = SUMOEnvironment('data/SingleIntersection.sumocfg', logPath='logs/exptLog.yaml')
# env.close()

In [18]:
env._getState()

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
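The state returned by `_getState` is a flat list: the halting-vehicle count of each of the 12 incoming lanes (in the order of `self.directions`), followed by a one-hot encoding of the current phase. A small hypothetical helper, `decode_state`, that splits it back up makes the output above easier to read; it assumes exactly that layout with 4 phases.

```python
def decode_state(state, n_lanes=12, n_phases=4):
    """Split a state vector into per-lane halting counts and the phase index.

    Assumes the layout produced by SUMOEnvironment._getState: n_lanes halting
    counts followed by an n_phases-long one-hot phase vector.
    """
    counts = state[:n_lanes]
    phase_one_hot = state[n_lanes:n_lanes + n_phases]
    return counts, phase_one_hot.index(1)


counts, phase = decode_state([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0])
print(sum(counts), phase)  # 0 0
```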

In [11]:
# stepping 60 times
env._step(60)

False

In [12]:
# getting the reward
env._getReward()

-84.0

In [14]:
# for reference: manually summing the non-zero lane waiting times to verify the reward
# Wait time: {'E1_0': 0.0, 'E1_1': 0.0, 'E1_2': 0.0, 'E3_0': 0.0, 'E3_1': 18.0, 'E3_2': 18.0, '-E2_0': 0.0, '-E2_1': 0.0, '-E2_2': 27.0, '-E4_0': 0.0, '-E4_1': 4.0, '-E4_2': 17.0}
18+18+27+17+4

84