results not changing no matter what the algorithm is #672

Closed
vahidqo opened this issue Nov 23, 2021 · 1 comment
Labels

- custom gym env: Issue related to Custom Gym Env
- No tech support: We do not do tech support
- question: Further information is requested

Comments

vahidqo commented Nov 23, 2021

I'll post this as a question, as I am not quite sure it is a bug. I have been experimenting with the library for a while in a custom environment for a project, but I get the same wrong results each time. I am pretty sure that my environment is correct, as I checked it with:

from stable_baselines3.common.env_checker import check_env

env = CustomEnv(arg1, ...)
# It will check your custom environment and output additional warnings if needed
check_env(env)

So my custom env is:

import gym
import numpy as np
from gym.spaces import Box, MultiDiscrete


class bike(gym.Env):
    
    def __init__(self, *args, **kwargs):
        self.n_station = 5
        self.n_vehicle = 1
        self.vehicle_capacity = 4
        self.station_capacity = 20
        self.step_limit = 20
        self.info = {}
        
        self.action_dim1 = 2 * self.vehicle_capacity
        self.action_dim2 = 1 + self.n_station
        #self.obs_dim = self.n_vehicle * 2 + (self.n_station + 1) * 3
        self.obs_dim = self.n_vehicle * 3 + (self.n_station + 1) * 3
        box_low = np.zeros(self.obs_dim)
        box_high = np.hstack([
            np.repeat(self.n_station + 1, self.n_vehicle),
            np.repeat(self.vehicle_capacity, self.n_vehicle*2), # Vehicle capacities 6-7
            np.repeat(self.station_capacity, (self.n_station+1)*3),
        ])

        self.observation_space = Box(
            low=box_low,
            high=box_high,
            dtype=np.int64)  # np.int is deprecated; use a concrete integer dtype
                    
        self.action_space = MultiDiscrete([self.action_dim1,self.action_dim2])
        
        self.reset()
        
    def _STEP(self, action):
        done = False
        self.reward = 0

        if action[0] <= self.vehicle_capacity:
          self.load(action)   
        elif action[0] > self.vehicle_capacity:
          self.unload(action)
        else:
            raise Exception(f"Selected action ({action[0]}) outside of action space.")
        
        
        if self.step_count >= self.step_limit:
            done = True
            
        return self.state, self.reward, done, self.info
        
        
    def load(self, action):
      if self.bike[self.vehicle_location] - int(action[0]) < 0 or self.vehicle_load + int(action[0]) > self.vehicle_capacity :
        self.reward += -100
      else:
        self.bike[self.vehicle_location] -= int(action[0])
        self.vehicle_load += int(action[0])
        self.reward += float(
            (np.sum(((self.bike - self.demand) >= 0) * self.demand)
             + np.sum(((self.bike - self.demand) < 0) * self.bike)) * 2
            - np.sum(((self.bike - self.demand) < 0) * (self.demand - self.bike)) * 4
            - int(action[0]))
        self.vehicle_location = int(action[1])
        self.empty = self.vehicle_capacity - self.vehicle_load
        self.step_count += 1
        self.state = self._update_state()

    def unload(self, action):
      if self.vehicle_load - int(action[0]) < 0:
        self.reward += -100
      else:
        self.bike[self.vehicle_location] += int(action[0])-self.vehicle_capacity
        self.vehicle_load -= (int(action[0])-self.vehicle_capacity)
        self.reward += float(
            (np.sum(((self.bike - self.demand) >= 0) * self.demand)
             + np.sum(((self.bike - self.demand) < 0) * self.bike)) * 2
            - np.sum(((self.bike - self.demand) < 0) * (self.demand - self.bike)) * 4
            - (int(action[0]) - self.vehicle_capacity))
        self.vehicle_location = int(action[1])
        self.empty = self.vehicle_capacity - self.vehicle_load
        self.step_count += 1
        self.state = self._update_state()
      

    def _update_orders(self):
        self.demand = np.random.randint(0, [1, 3, 3, 2, 2, 2])
        self.returndemand = np.random.randint(0, [1, 2, 2, 3, 3, 4])
    
    def _update_state(self):

        state = np.hstack([
            np.hstack(np.array([self.vehicle_location])),
            np.hstack(np.array([self.vehicle_load])),
            np.hstack(np.array([self.empty])),
            np.hstack(self.bike),
            np.hstack(self.demand),
            np.hstack(self.returndemand),
        ])

        self.bike -= ((self.bike-self.demand)>=0)*(self.demand)
        self.bike += self.returndemand
        self._update_orders()

        return state

    def _RESET(self):
        self.step_count = 0
        self.vehicle_load =  0
        self.vehicle_location = 0
        self.empty = self.vehicle_capacity - self.vehicle_load
        self.bike = np.array([0,1,4,3,3,3])
        self._update_orders()
        self.state = np.hstack([
            np.hstack(np.array([self.vehicle_location])),
            np.hstack(np.array([self.vehicle_load])),
            np.hstack(np.array([self.empty])),
            np.hstack(self.bike),
            np.hstack(self.demand),
            np.hstack(self.returndemand),
        ])
        self._update_orders()
        return self.state
        

    def step(self, action):
        return self._STEP(action)

    def reset(self):
        return self._RESET()

And the learning model is:

from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv

env = DummyVecEnv([lambda: bike()])
model = PPO("MlpPolicy", env, verbose=1, tensorboard_log="logs")
model.learn(total_timesteps=2000000)
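
For reference, a minimal way to check whether the reward signal itself varies, independent of any RL algorithm, is to roll out a few episodes of random actions (a sketch, assuming the bike class defined above):

# Roll out random actions; if the printed returns never change,
# the problem is in the environment, not in PPO or any other algorithm.
env = bike()
for episode in range(5):
    obs = env.reset()
    done, ep_return = False, 0.0
    while not done:
        action = env.action_space.sample()  # random MultiDiscrete action
        obs, reward, done, info = env.step(action)
        ep_return += reward
    print(f"episode {episode}: return = {ep_return:.1f}")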

### Checklist

- [x] I have read the documentation (required)
- [x] I have checked that there is no similar issue in the repo (required)
- [x] I have checked my env using the env checker (required)
- [x] I have provided a minimal working example to reproduce the bug (required)
vahidqo added the custom gym env and question labels on Nov 23, 2021
araffin added the No tech support label on Nov 23, 2021
Miffyli (Collaborator) commented Nov 24, 2021

Hey. Unfortunately, we do not have the time to provide tech support for debugging custom environments. If the environment checker says the environment is ok, then it is functionally ok. The algorithms are also tested to work.
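
As a quick sanity check on the algorithm side, something like this (a minimal sketch on a standard, known-solvable task) should show the episode reward climbing steadily:

from stable_baselines3 import PPO

# If PPO's reward improves on CartPole-v1, the algorithm is working,
# and the issue lies in the custom environment's dynamics or reward.
model = PPO("MlpPolicy", "CartPole-v1", verbose=1)
model.learn(total_timesteps=50_000)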

Also, please fill in the issue template in the future.

Closing as no tech support.

Miffyli closed this as completed Nov 24, 2021