In [3]:
# !pip install stable-baselines3[extra]
a = !v4l2-ctl --device /dev/video0 --all | grep exposure_absolute
a

['              exposure_absolute 0x009a0902 (int)    : min=4 max=1250 step=1 default=156 value=156 flags=inactive']
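The `exposure_absolute` line printed above carries the legal range of the control. Here is a minimal sketch of how the `min` and `max` fields could be extracted and rounded to multiples of 10 inside that range. The helper name `parse_exposure_bounds` and the regular expression are illustrative assumptions, not part of `v4l2-ctl` or of this notebook.

```python
import math
import re

# Sample line copied from the cell output above
line = "              exposure_absolute 0x009a0902 (int)    : min=4 max=1250 step=1 default=156 value=156 flags=inactive"


def parse_exposure_bounds(text, multiple=10):
    """Return (low, high): the min/max found in `text`, rounded inward to `multiple`.

    Hypothetical helper: assumes the text contains `min=<int> max=<int>`.
    """
    match = re.search(r"min=(-?\d+)\s+max=(-?\d+)", text)
    if match is None:
        raise ValueError("no min=/max= fields found in: %r" % text)
    raw_min, raw_max = int(match.group(1)), int(match.group(2))
    low = math.ceil(raw_min / multiple) * multiple    # round UP so that low >= raw_min
    high = math.floor(raw_max / multiple) * multiple  # round DOWN so that high <= raw_max
    return low, high


low, high = parse_exposure_bounds(line)
print(low, high)
```

In the notebook, `a[0]` from the first cell could be passed in place of `line`.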

In [2]:
!apt-get install ffmpeg freeglut3-dev xvfb  # For visualization

E: Could not open lock file /var/lib/dpkg/lock-frontend - open (13: Permission denied)
E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), are you root?


## First steps with the Gym interface

An environment that follows the [gym interface](https://stable-baselines3.readthedocs.io/en/master/guide/custom_env.html) is quite simple to use.
It provides the user with three main methods:
- `reset()`: called at the beginning of an episode; it returns an observation
- `step(action)`: called to take an action in the environment; it returns the next observation, the immediate reward, whether the episode is over, and additional information
- (Optional) `render(mode='human')`: visualizes the agent in action. Note that the graphical interface does not work on Google Colab, so we cannot use it directly (we have to rely on `mode='rgb_array'` to retrieve an image of the scene)
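The call pattern for these methods can be sketched without any dependency. `CountdownEnv` below is a hypothetical stand-in, not a real gym environment: it only mimics the `reset()` and `step()` signatures described above so that the loop can run anywhere.

```python
import random


class CountdownEnv:
    """Hypothetical toy environment: the episode ends after `horizon` steps."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.remaining = horizon

    def reset(self):
        # Start a new episode and return the first observation
        self.remaining = self.horizon
        return self.remaining

    def step(self, action):
        # Apply the action, then return (observation, reward, done, info)
        self.remaining -= 1
        reward = 1.0 if action == 1 else 0.0
        done = self.remaining == 0
        return self.remaining, reward, done, {}


env = CountdownEnv(horizon=5)
obs = env.reset()
done = False
total_reward = 0.0
n_steps = 0
while not done:
    action = random.choice([0, 1])  # a random policy stands in for the agent
    obs, reward, done, info = env.step(action)
    total_reward += reward
    n_steps += 1
print(n_steps, total_reward)
```

With a real gym environment, `env.action_space.sample()` would typically replace `random.choice`.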

Under the hood, it also contains two useful properties:
- `observation_space`: one of the gym spaces (`Discrete`, `Box`, ...) that describes the type and shape of the observation
- `action_space`: also a gym space object; it describes the type of action that can be taken

The best way to learn about gym spaces is to look at the [source code](https://github.com/openai/gym/tree/master/gym/spaces), but you need to know at least the main ones:
- `gym.spaces.Box`: A (possibly unbounded) box in $\mathbb{R}^n$. Specifically, a Box represents the Cartesian product of n closed intervals. Each interval has one of the forms $[a, b]$, $(-\infty, b]$, $[a, \infty)$, or $(-\infty, \infty)$. Example: a 1D vector or an image observation can be described with the Box space.
```python
# Example for using image as input:
observation_space = spaces.Box(low=0, high=255, shape=(HEIGHT, WIDTH, N_CHANNELS), dtype=np.uint8)
```

- `gym.spaces.Discrete`: A discrete space in $\{ 0, 1, \dots, n-1 \}$
  Example: if you have two actions ("left" and "right") you can represent your action space using `Discrete(2)`, the first action will be 0 and the second 1.
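For the exposure task used in the custom environment below, a `Discrete(2)` action could be decoded as follows. This is a sketch under assumptions: the mapping 0 = decrease and 1 = increase, the bounds 10 and 1250, and the helper `apply_action` are illustrative choices; only the step of 10 comes from the skeleton's comments.

```python
EXPOSURE_STEP = 10                      # step size used in the skeleton's comments
EXPOSURE_MIN, EXPOSURE_MAX = 10, 1250   # assumed bounds, both multiples of 10

# Discrete(2) yields the integers 0 and 1; map each one to a signed change
ACTION_TO_DELTA = {0: -EXPOSURE_STEP, 1: +EXPOSURE_STEP}


def apply_action(exposure, action):
    """Return the new exposure after `action`, clamped to the legal range."""
    if action not in ACTION_TO_DELTA:
        raise ValueError("action must be 0 or 1, got %r" % (action,))
    new_exposure = exposure + ACTION_TO_DELTA[action]
    return max(EXPOSURE_MIN, min(EXPOSURE_MAX, new_exposure))


print(apply_action(150, 1))  # one step up
print(apply_action(10, 0))   # already at the lower bound, so it stays there
```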



[Documentation on custom env](https://stable-baselines3.readthedocs.io/en/master/guide/custom_env.html)

Below you can find an example of a custom environment:

In [4]:
import gym
from gym import spaces


class CustomEnv(gym.Env):
    """Custom Environment that follows gym interface"""
    metadata = {'render.modes': ['human']}

    def __init__(self, arg1, arg2):
        # arg1 and arg2 are placeholders: replace them with whatever your environment needs
        super(CustomEnv, self).__init__()

        # obtain the min and max camera exposure values
        # e.g. obtain your camera info with
        # !v4l2-ctl --device /dev/video0 --all | grep exposure_absolute
        # save the result in a string and extract min and max into 2 env variables
        # round them to a multiple of 10 INSIDE their legal range, e.g. min = 4 -> min = 10

        # Define action and observation space
        # create an observation space with five values bounded from 0 to 1
        # create the VideoCapture object, force the auto exposure to manual
        # take some shots and throw them away

        # define an action space composed of only 2 actions: increase or decrease the exposure time

    def step(self, action):
        # here the environment processes the received action
        # increase or decrease the current exposure time by 10
        # check that it stays within the lower and upper bounds
        # change the camera parameter to the new value, take one frame
        # get the new resulting SSIM and return it for the observation
        # give +1 if the SSIM is higher than the one of the previous step (yes, you need to save it somewhere)
        # give -1 if it is lower than before
        # give 5 points if it is the maximum

        # once implemented: return observation, reward, done, info
        raise NotImplementedError

    def reset(self):
        # select a random target exposure (target_exp) inside the lower and upper exposure thresholds found in __init__
        # round that target exposure to a multiple of 10
        # select a DIFFERENT starting exposure value (start_exp) and round it to a multiple of 10
        # set the exposure to target_exp, record and throw away some frames, then take one frame
        # this image (target_image) will be used to generate the metric for the RL algorithm

        # change the exposure (again, throwing away some initial frames) to start_exp
        # get the SSIM between the taken frame and target_image, return it as the initial observation

        # keep the initial reward (0) and done (False) as internal state:
        # in the gym interface, reset() returns only the observation

        # once implemented: return observation
        raise NotImplementedError

    def render(self, mode='human'):
        # write here the code that renders the collected frames as video
        # add an empty black bar on top where you print the SSIM between the current image and the target one,
        # and the current and target exposure values
        raise NotImplementedError

    def close(self):
        # close the video capture element
        pass
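Two pieces of the skeleton above are pure logic and can be sketched without a camera: choosing the target and start exposures in `reset()`, and scoring an SSIM change in `step()`. The helper names, the bounds 10 and 1250, the zero reward for an unchanged SSIM, and the reading of "the maximum" as an SSIM of 1.0 (within a tolerance) are assumptions made for illustration.

```python
import random


def pick_exposures(low, high, multiple=10, rng=random):
    """Pick distinct target and start exposures, both multiples of `multiple` in [low, high].

    Assumes `low` and `high` are already multiples of `multiple`.
    """
    candidates = list(range(low, high + 1, multiple))
    if len(candidates) < 2:
        raise ValueError("need at least two legal exposure values")
    # sample() draws without replacement, so the two values always differ
    target_exp, start_exp = rng.sample(candidates, 2)
    return target_exp, start_exp


def ssim_reward(ssim, previous_ssim, tolerance=1e-6):
    """+5 at the assumed maximum (SSIM ~ 1.0), +1 if improved, -1 if worse, 0 if unchanged."""
    if ssim >= 1.0 - tolerance:
        return 5
    if ssim > previous_ssim:
        return 1
    if ssim < previous_ssim:
        return -1
    return 0


target_exp, start_exp = pick_exposures(10, 1250)
print(target_exp, start_exp)
print(ssim_reward(0.8, 0.7), ssim_reward(0.6, 0.7), ssim_reward(1.0, 0.9))
```

Keeping these helpers outside the environment class makes them easy to test before any camera code is written.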

In [None]:
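from stable_baselines3 import A2C

# Instantiate the env
env = CustomEnv(arg1, ...)
# Define and train the agent
# 'MlpPolicy' matches the flat five-value observation described above; 'CnnPolicy' expects image observations
model = A2C('MlpPolicy', env).learn(total_timesteps=1000)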

In [None]:
from stable_baselines3.common.env_checker import check_env

env = CustomEnv(arg1, ...)
# It will check your custom environment and output additional warnings if needed
check_env(env)