# CTRL - Function Approximation Tutorial: Tabular Fitted Q Iteration (FQI)

Website: https://control-rl.github.io/

Gymnasium documentation: https://gymnasium.farama.org/

This notebook is adapted from Antonin Raffin's open-source tutorial given at RLSS 2023. The original tutorial can be found [here](https://github.com/araffin/rlss23-dqn-tutorial).

## Introduction

In this notebook, you will implement the Fitted Q Iteration (FQI) algorithm to solve the [CartPole](https://gymnasium.farama.org/environments/classic_control/cart_pole/) problem.

This notebook will first cover the basics of using the Gymnasium library: how to instantiate an environment, step through it, and collect training data for the FQI algorithm.

You will then learn how to implement the FQI algorithm step by step. FQI is the predecessor of the [Deep Q-Network (DQN)](https://stable-baselines3.readthedocs.io/en/master/modules/dqn.html) algorithm.

## Install Dependencies

In [None]:
!pip install git+https://github.com/Control-RL/Function-Approximation --upgrade

In [None]:
!apt-get install ffmpeg  # For visualization

## First steps with the Gym interface

An environment that follows the [gym interface](https://gymnasium.farama.org/) is quite simple to use.
It mainly provides the user with three methods, which have the following signatures (for gym versions >= 0.26):

- `reset()` called at the beginning of an episode, it returns an observation and a dictionary with additional info (defaults to an empty dict)
- `step(action)` called to take an action in the environment, it returns the next observation, the immediate reward, whether the new state is a terminal state (episode is finished), whether the max number of timesteps is reached (episode is artificially finished), and additional information
- (Optional) `render()` which allows you to visualize the agent in action. Note that the graphical interface does not work on Google Colab, so we cannot use it directly (we have to rely on `render_mode='rgb_array'` to retrieve an image of the scene).

Under the hood, it also contains two useful properties:
- `observation_space` which is one of the gym spaces (`Discrete`, `Box`, ...) and describes the type and shape of the observation
- `action_space` which is also a gym space object and describes the type of action that can be taken

The best way to learn about [gym spaces](https://gymnasium.farama.org/api/spaces/) is to look at the [source code](https://github.com/Farama-Foundation/Gymnasium/tree/main/gymnasium/spaces), but you need to know at least the main ones:
- `gym.spaces.Box`: A (possibly unbounded) box in $R^n$. Specifically, a Box represents the Cartesian product of n closed intervals. Each interval has the form of one of [a, b], (-oo, b], [a, oo), or (-oo, oo). Example: A 1D-Vector or an image observation can be described with the Box space.
```python
# Example for using image as input:
observation_space = spaces.Box(low=0, high=255, shape=(HEIGHT, WIDTH, N_CHANNELS), dtype=np.uint8)
```                                       

- `gym.spaces.Discrete`: A discrete space in $\{ 0, 1, \dots, n-1 \}$
  Example: if you have two actions ("left" and "right") you can represent your action space using `Discrete(2)`, the first action will be 0 and the second 1.

## CartPole Environment

For this example, we will use the CartPole environment, a classic control problem.

"A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track. The system is controlled by applying a force of +1 or -1 to the cart. The pendulum starts upright, and the goal is to prevent it from falling over. A reward of +1 is provided for every timestep that the pole remains upright. "

Cartpole environment: [https://gymnasium.farama.org/environments/classic_control/cart_pole/](https://gymnasium.farama.org/environments/classic_control/cart_pole/)

![Cartpole](https://cdn-images-1.medium.com/max/1143/1*h4WTQNVIsvMXJTCpXm_TAw.gif)

In [4]:
import gymnasium as gym
# Instantiate the environment
env_id = "CartPole-v1"
env_continuous = gym.make(env_id)

In [5]:
# Box(4,) means that it is a vector with 4 components
print("Observation space:", env_continuous.observation_space)
print("Shape:", env_continuous.observation_space.shape)
# Discrete(2) means that there are two discrete actions
print("Action space:", env_continuous.action_space)

Observation space: Box([-4.8               -inf -0.41887903        -inf], [4.8               inf 0.41887903        inf], (4,), float32)
Shape: (4,)
Action space: Discrete(2)


In [6]:
# Discretize the state space
from dqn_tutorial.env_utils import DiscretizeStateWrapper
import numpy as np

custom_bin_boundaries = gym.spaces.Box(
    low=np.array([-2.4, -10.0, -0.30, -5.0], dtype=np.float32),
    high=np.array([2.4, 10.0, 0.30, 5.0], dtype=np.float32),
)

n_bins = 4
env_discrete = DiscretizeStateWrapper(env_continuous, n_bins=n_bins, custom_bin_boundaries=custom_bin_boundaries)

In [7]:
env_discrete._bins

[(np.float32(-2.4),
  np.float32(-1.2),
  np.float32(0.0),
  np.float32(1.2),
  np.float32(2.4)),
 (np.float32(-10.0),
  np.float32(-5.0),
  np.float32(0.0),
  np.float32(5.0),
  np.float32(10.0)),
 (np.float32(-0.3),
  np.float32(-0.15),
  np.float32(0.0),
  np.float32(0.15),
  np.float32(0.3)),
 (np.float32(-5.0),
  np.float32(-2.5),
  np.float32(0.0),
  np.float32(2.5),
  np.float32(5.0))]

In [8]:
# Discrete(256) means that there are 256 discrete states (4^4: n_bins = 4 bins for each of the 4 dimensions)
print("Observation space:", env_discrete.observation_space)
print("Shape:", env_discrete.observation_space.shape)
# Discrete(2) means that there are two discrete actions
print("Action space:", env_discrete.action_space)

Observation space: Discrete(256)
Shape: ()
Action space: Discrete(2)
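
The bin edges printed above split each of the 4 observation dimensions into `n_bins = 4` intervals, which gives $4^4 = 256$ discrete states. The sketch below shows one way such a mapping could work. It is an assumption for illustration only, not the actual `DiscretizeStateWrapper` implementation: the helper name `discretize` and the ordering of the flattened index are hypothetical.

```python
import numpy as np

# Same boundaries and bin count as in the cells above
low = np.array([-2.4, -10.0, -0.30, -5.0])
high = np.array([2.4, 10.0, 0.30, 5.0])
n_bins = 4

# n_bins + 1 evenly spaced edges per dimension, shape (4, n_bins + 1)
edges = np.linspace(low, high, n_bins + 1, axis=-1)


def discretize(obs: np.ndarray) -> int:
    """Hypothetical helper: map a continuous observation to one state index."""
    bin_ids = []
    for dim in range(len(obs)):
        # Use interior edges only, so every value maps to a bin in 0..n_bins-1
        # (out-of-range values fall into the first or last bin)
        bin_ids.append(int(np.digitize(obs[dim], edges[dim, 1:-1])))
    # Flatten the per-dimension bin indices into one integer in [0, n_bins**4)
    return int(np.ravel_multi_index(tuple(bin_ids), (n_bins,) * len(obs)))


state = discretize(np.array([0.01, 0.2, -0.02, 0.3]))
print(state)
```

With this scheme, nearby continuous observations share the same integer state, which is what lets a tabular method be applied to CartPole.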


In [9]:
# The reset method is called at the beginning of an episode
obs, info = env_discrete.reset()

In [10]:
# Sample a random action
action = env_discrete.action_space.sample()
print(f"Sampled action: {action}")

Sampled action: 1


In [11]:
# step in the environment
obs, reward, terminated, truncated, info = env_discrete.step(action)

In [12]:
# Note the obs is an int (the state index)
# info is an empty dict for now but can contain any debugging info
# reward is a scalar
print(obs, reward, terminated, truncated, info)

89 1.0 False False {}


### Exercise (10 minutes): write the function to collect data

This function collects an offline dataset of transitions that will be used to train a model using the FQI algorithm.

See docstring of the function for what is expected as input/output.

In [13]:
from dataclasses import dataclass

from gymnasium import spaces


@dataclass
class OfflineData:
    """
    A class to store transitions.
    """

    observations: np.ndarray  # same as "state" in the theory
    next_observations: np.ndarray
    actions: np.ndarray
    rewards: np.ndarray
    terminateds: np.ndarray

In [14]:
def collect_data(env_discrete: DiscretizeStateWrapper, n_steps: int = 50_000) -> OfflineData:
    """
    Collect transitions using a random agent (sample action randomly).

    :param env_discrete: The environment to collect data from.
    :param n_steps: Number of steps to perform in the env_discrete.
    :return: The collected transitions.
    """

    # assert isinstance(env_discrete.observation_space, gym.spaces.Discrete)
    # Numpy arrays (buffers) to collect the data
    observations = np.zeros((n_steps, 1))
    next_observations = np.zeros((n_steps, 1))
    # Discrete actions
    actions = np.zeros((n_steps, 1))
    rewards = np.zeros((n_steps,))
    terminateds = np.zeros((n_steps,))

    # Variable to know if the episode is over (done = terminated or truncated)
    done = False
    # Start the first episode
    obs, _ = env_discrete.reset()

    ### YOUR CODE HERE
    # You need to collect transitions for `n_steps` using
    # a random agent (sample action uniformly).
    # Do not forget to reset the environment if the current episode is over
    # (done = terminated or truncated)
    #
    # TODO:
    # 1. Sample a random action
    # 2. Step in the env using this random action
    # 3. Retrieve the new transition data (observation, reward, ...)
    #  and update the numpy arrays (buffers)
    # 4. Repeat until you collected `n_steps` transitions

    for idx in range(n_steps):
        # Sample a random action
        action = env_discrete.action_space.sample()
        # Step in the environment
        next_obs, reward, terminated, truncated, info_ = env_discrete.step(action)

        # Store the transition
        # Note: we only record true termination (timeouts/truncations are artificial terminations)
        observations[idx, :] = obs
        next_observations[idx, :] = next_obs
        actions[idx, :] = action
        rewards[idx] = reward
        terminateds[idx] = terminated
        # Update current observation
        obs = next_obs
        # Check if the episode is over
        done = terminated or truncated

        # Don't forget to reset the env at the end of an episode
        if done:
            obs, _ = env_discrete.reset()

    ### END OF YOUR CODE

    return OfflineData(
        observations,
        next_observations,
        actions,
        rewards,
        terminateds,
    )

Let's try the `collect_data` function:

In [15]:
n_steps = 50_000
# Collect transitions for n_steps
data = collect_data(env_discrete=env_discrete, n_steps=n_steps)

In [16]:
# Check the length of the collected data
assert len(data.observations) == n_steps
assert len(data.actions) == n_steps
# Check that there are multiple episodes in the data
assert not np.all(data.terminateds)
assert np.any(data.terminateds)

assert data.actions.shape == (n_steps, 1)
assert data.rewards.shape == (n_steps,)


np.sum(data.terminateds) 

np.float64(2252.0)

In [17]:
from pathlib import Path

from dqn_tutorial.fqi import save_data

output_filename = Path("../data") / f"{env_id}_data_dicretized_{n_bins}"
# Create folder if it doesn't exist
output_filename.parent.mkdir(parents=True, exist_ok=True)

# Save collected data using numpy
save_data(data, output_filename)

Saving to ../data/CartPole-v1_data_dicretized_4.npz


## Fitted Q Iteration (FQI) Algorithm


<div>
    <img src="attachment:008df18a-8225-4766-aa89-c713f5f953af.png" width="500"/>
</div>

In [18]:
from functools import partial
from pathlib import Path
from typing import Optional

import gymnasium as gym
import numpy as np
from gymnasium import spaces
from sklearn import tree
from sklearn.base import RegressorMixin
from sklearn.exceptions import NotFittedError
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.neighbors import KNeighborsRegressor

### Choosing a Model

With FQI, you can use any regression model.


#### Efficient Custom Linear Regression

We will fit a regression model $f_\theta(x) = y$ where $x$ is a one-hot encoded state-action pair and $y$ is the target Q-value.

For linear regression we have $f_{\theta}(x) = \theta x $, where $\theta$ is the matrix of parameters of shape $(1, |S| |A|)$.

Because the input is one-hot encoded, only one element of $\theta$ is used for each state-action pair. So this high-dimensional
regression problem collapses into $|S| |A|$ scalar regression problems, one for each state-action pair:

$\forall i \in \{0, \dots, |S| |A| - 1\}: f_{\theta_i}(x) = \theta_i x_i$

Just like in the tabular case (previous tutorial), our model is composed of one parameter (Q-value) per state-action pair.

To make the code more efficient, we will use a custom linear regression model that solves each of these scalar regression problems separately instead of solving the high-dimensional regression problem.
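
To see why the problem decouples, the sketch below compares a full least-squares fit on one-hot inputs with the per-pair mean of the targets. It uses made-up random data and assumes a linear model without intercept; the variable names are illustrative and not part of the tutorial code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pairs = 6  # stands for |S| * |A|
n_samples = 200

# Random state-action pair index for each sample, and noisy targets
pair_idx = rng.integers(0, n_pairs, size=n_samples)
y = pair_idx.astype(float) + rng.normal(scale=0.1, size=n_samples)

# One-hot encode the pair indices: shape (n_samples, n_pairs)
X = np.eye(n_pairs)[pair_idx]

# High-dimensional least squares: theta has one entry per pair
theta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Independent scalar problems: the least-squares solution for each pair
# is the mean of the targets observed for that pair
counts = np.bincount(pair_idx, minlength=n_pairs)
sums = np.bincount(pair_idx, weights=y, minlength=n_pairs)
means = sums / counts

print(np.allclose(theta, means))
```

Both routes give the same parameters, but the per-pair version never has to build or invert a matrix with $|S| |A|$ columns.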

In [19]:
class CustomLinearRegression():
    """
    Adapt the linear regression to deal with the case where the input is a one-hot encoded vector.
    """
    def __init__(self):
        self.models = []

    def fit(self, X, y):
        self.models = []
        # Loop over all the possible state-action pairs (one column of X per pair)
        for state in range(X.shape[1]):
            # Create a mask to select the rows where this state-action pair is active
            mask = X[:, state] == 1
            # print(f"State {state} has {mask.sum()} samples")
            y_state = y[mask]
            X_state = X[mask, state]
            if len(y_state) == 0:
                # print(f"No data for state {state}")	
                # No data for this state
                model = None
            else:
                # Fit a linear regression for each state
                model = LinearRegression().fit(X_state.reshape(-1, 1), y_state)
            self.models.append(model)
        return self

    def predict(self, X):
        # Loop over all the sampled states
        for i in range(X.shape[0]):
            state = np.where(X[i, :] == 1)[0]
            assert len(state) == 1
            state = state[0]
            model = self.models[state]
            if model is None:
                y_pred = np.array([0.])
            else:
                y_pred = model.predict(X[i, state].reshape(-1, 1))
            if i == 0:
                y_preds = y_pred
            else:
                y_preds = np.concatenate((y_preds, y_pred))
        return y_preds

    def score(self, X, y):
        # Coefficient of determination R^2 over all the samples
        y_true = np.asarray(y)
        # Re-use predict() so that pairs without data (model is None)
        # predict 0 instead of raising an error
        y_preds = self.predict(X)
        u = ((y_true - y_preds) ** 2).sum()
        v = ((y_true - y_true.mean()) ** 2).sum()
        # Constant targets: R^2 is undefined, return 0 by convention
        if v == 0:
            return 0
        return 1 - u/v

In [20]:
model_class = CustomLinearRegression # LinearRegression, GradientBoostingRegressor

### Loading offline dataset

In [21]:
from dqn_tutorial.fqi import load_data

# env_id = "CartPole-v1"
output_filename = Path("../data") / f"{env_id}_data_dicretized_{n_bins}.npz"
render_mode = "rgb_array"

# Create test environment
env = gym.make(env_id, render_mode=render_mode)
env_discrete = DiscretizeStateWrapper(env, n_bins=n_bins, custom_bin_boundaries=custom_bin_boundaries)

# Load saved transitions
data = load_data(output_filename)

### First Iteration of FQI

For $n = 0$, the initial training set is defined as:

- $x = \text{one-hot-encoding}(s_t, a_t) \in \{0,1\}^{|S| |A|} $
- $y = r_t$

We fit a regression model $f_\theta(x) = y$ to obtain $ Q^{n=0}_\theta(\text{one-hot-encoding}(s, a)) $
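
As a small worked example of the encoding, the sketch below uses the same index convention as the function defined in the next cell (`index = state * n_actions + action`). The sizes and the helper name `one_hot_pair` are hypothetical and chosen only to keep the vectors short.

```python
import numpy as np

n_states, n_actions = 3, 2  # small hypothetical sizes for illustration


def one_hot_pair(state: int, action: int) -> np.ndarray:
    """Encode (state, action) as a one-hot vector of length n_states * n_actions."""
    x = np.zeros(n_states * n_actions, dtype=int)
    x[state * n_actions + action] = 1
    return x


x = one_hot_pair(state=2, action=1)
print(x)

# Recover the pair from the position of the active entry
state, action = divmod(int(np.argmax(x)), n_actions)
print(state, action)
```

Each state-action pair gets its own column, so the mapping can be inverted with a single `divmod`.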

In [22]:
import numpy as np

def create_one_hot_obs_actions(obs, actions, action_space, observation_space):
    """
    Create a batch one-hot encoded matrix that represents the combination 
    of observation and action for each sample.
    
    Each observation-action pair is encoded as a one-hot vector of length 
    observation_space.n * action_space.n. The one-hot index for a given pair 
    (o, a) is computed as:
    
        index = o * action_space.n + a

    Parameters:
        obs: np.ndarray of shape (N, 1)
            A NumPy array of integers representing observations, each in the range [0, observation_space.n - 1].
            (A single observation of shape (1,) also works, through broadcasting with actions of shape (1, 1).)
        actions: np.ndarray of shape (N, 1)
            A NumPy array of integers representing actions, each in the range [0, action_space.n - 1].
        action_space: object
            An object with an attribute `n` that specifies the number of possible actions.
        observation_space: object
            An object with an attribute `n` that specifies the number of possible observations.
    
    Returns:
        np.ndarray:
            A one-hot encoded NumPy array of shape (N, observation_space.n * action_space.n) where each row 
            corresponds to a one-hot encoded representation of the observation-action pair.
    """
    # Put obs and actions in the int type
    obs = obs.astype(int)
    actions = actions.astype(int)
    # Number of samples
    N = obs.shape[0]
    
    # Total number of possible observation-action combinations
    total_dim = observation_space.n * action_space.n
    
    # Initialize the one-hot matrix with zeros
    one_hot_matrix = np.zeros((N, total_dim), dtype=int)
    
    # Compute the index for each observation-action pair.
    # For each sample, the index is: obs[i] * action_space.n + actions[i]
    indices = obs * action_space.n + actions
    
    # Set the corresponding positions to 1 for the one-hot encoding
    one_hot_matrix[np.arange(N), indices[:,0]] = 1
    
    return one_hot_matrix

In [23]:
# First iteration:
# The target q-value is the reward obtained
targets = data.rewards.copy()
# Create input for current observations and actions
# Create a one-hot encoder of the observations and actions
# so we can predict qf(s_t, a_t)
current_obs_input = create_one_hot_obs_actions(data.observations, data.actions, env_discrete.action_space, env_discrete.observation_space)

In [24]:
# Fit the estimator for the current target
model = model_class().fit(current_obs_input, targets)

### 1. Exercise (10 minutes): write the function to predict Q-Values


<div>
    <img src="attachment:e3d00cba-8345-4c65-8027-ab06d99d2305.png" width="300"/>
</div>

In [25]:
def get_q_values(
    model: RegressorMixin,
    obs: np.ndarray,
    n_actions: int,
) -> np.ndarray:
    """
    Retrieve the q-values for a set of observations.
    qf(s_t, action) for all possible actions.

    :param model: Q-value estimator
    :param obs: A batch of observations
    :param n_actions: Number of discrete actions.
    :return: The predicted q-values for the given observations
        (batch_size, n_actions)
    """
    batch_size = len(obs)
    q_values = np.zeros((batch_size, n_actions))

    ### YOUR CODE HERE
    # TODO: for every possible action a:
    # 1. Create the regression model input $(s, a)$ for the action a
    # and states s (here a batch of observations)
    # 2. Predict the q-values for the batch of states
    # 3. Update q-values array for the current action a

    # Predict q-value for each action
    for action_idx in range(n_actions):
        # Note: the (state, action) pair is one-hot encoded below, so this also works when n_actions > 2
        # Create a vector of size (batch_size, 1) for the current action
        # This allows to do batch prediction for all the provided observations
        actions = action_idx * np.ones((batch_size, 1))
        # Create a one-hot encoder of the observations and actions
        model_input = create_one_hot_obs_actions(obs, actions, env_discrete.action_space, env_discrete.observation_space)

        # Predict q-values for the given observation/action combination
        # shape: (batch_size,)
        predicted_q_values = model.predict(model_input)
        # Update the q-values array for the current action
        q_values[:, action_idx] = predicted_q_values

    ### END OF YOUR CODE

    return q_values

Let's test it with a subset of the collected data:

In [26]:
n_observations = 2
n_actions = int(env_discrete.action_space.n)

q_values = get_q_values(model, data.observations[:n_observations], n_actions)

assert q_values.shape == (n_observations, n_actions)

### 2. Exercise (8 minutes): write the function to evaluate a model

A greedy policy $\pi(s)$ can be defined using the q-value:

$\pi(s) = \arg\max_{a \in A} Q(\text{one-hot-encoding}(s, a))$.

It is the policy that takes the action with the highest q-value for a given state.
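
As a minimal illustration, the sketch below extracts the greedy action for each state from a small table of Q-values. The numbers are made up for the example.

```python
import numpy as np

# Hypothetical Q-values for 3 states (rows) and 2 actions (columns)
q_values = np.array([
    [0.5, 1.2],
    [2.0, 0.1],
    [0.7, 0.7],
])

# Greedy action per state; np.argmax resolves ties to the lowest action index
greedy_actions = np.argmax(q_values, axis=1)
print(greedy_actions)
```

Note the tie in the last row: `np.argmax` returns the first maximum, so action 0 is chosen there.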

In [27]:
import os

from gymnasium.wrappers import RecordVideo


def evaluate(
    model: RegressorMixin,
    env_discrete: gym.Env,
    n_eval_episodes: int = 10,
    video_name: Optional[str] = None,
) -> None:
    episode_returns, episode_reward = [], 0.0
    total_episodes = 0
    done = False

    # Setup video recorder
    video_recorder = None
    if video_name is not None and env_discrete.render_mode == "rgb_array":
        os.makedirs("../logs/videos/", exist_ok=True)

        # New gym recorder always wants to cut video into episodes,
        # set video length big enough but not to inf (will cut into episodes)
        env_discrete = RecordVideo(env_discrete, "../logs/videos", step_trigger=lambda _: False, video_length=100_000)
        env_discrete.start_recording(video_name)

    obs, _ = env_discrete.reset()
    print(obs.shape)
    assert isinstance(env_discrete.action_space, spaces.Discrete), "FQI only supports discrete actions"
    n_actions = int(env_discrete.action_space.n)

    while total_episodes < n_eval_episodes:

        ### YOUR CODE HERE

        # Retrieve the q-values for the current observation
        # you need to re-use `get_q_values()`
        # Note: you need to add a batch dimension to the observation
        # you can use `obs[np.newaxis, ...]` for that: (obs_dim,) -> (batch_size=1, obs_dim)
        q_values = get_q_values(
            model,
            obs[np.newaxis, ...],
            n_actions,
        )
        # Select the action that maximizes the q-value for the current state
        # Don't forget to remove the batch dimension, you can use `.item()` for that
        best_action = int(np.argmax(q_values, axis=1).item())

        # Send the action to the env
        obs, reward, terminated, truncated, _ = env_discrete.step(best_action)

        ### END OF YOUR CODE

        episode_reward += float(reward)

        done = terminated or truncated
        if done:
            episode_returns.append(episode_reward)
            episode_reward = 0.0
            total_episodes += 1
            obs, _ = env_discrete.reset()

    if isinstance(env_discrete, RecordVideo):
        print(f"Saving video to ../logs/videos/{video_name}")
        env_discrete.close()

    print(f"Total reward = {np.mean(episode_returns):.2f} +/- {np.std(episode_returns):.2f}")

In [28]:
# Evaluate the first iteration
evaluate(model, env_discrete, n_eval_episodes=10)

()
Total reward = 9.30 +/- 0.90


In [29]:
print(f"Score: {model.score(current_obs_input, targets):.2f}")

Score: 0.00
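
A score of 0.00 is expected at this stage. CartPole gives a reward of +1 at every step, so the first-iteration targets are constant, the denominator `v` in `score()` is zero, and the method returns 0 by convention. The sketch below reproduces that computation on made-up arrays; the function name `r2_like` is hypothetical.

```python
import numpy as np


def r2_like(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """R^2 with the same zero-variance convention as `score()` above."""
    u = ((y_true - y_pred) ** 2).sum()
    v = ((y_true - y_true.mean()) ** 2).sum()
    if v == 0:
        # Constant targets: R^2 is undefined, return 0 by convention
        return 0.0
    return float(1 - u / v)


# Constant targets (like the +1 rewards): score is 0 even for a perfect fit
constant_targets = np.ones(5)
print(r2_like(constant_targets, np.ones(5)))

# Non-constant targets: a perfect fit gives a score of 1
varied_targets = np.array([1.0, 1.99, 2.97, 1.0, 1.99])
print(r2_like(varied_targets, varied_targets))
```

The score only becomes informative from the second iteration on, once bootstrapping makes the targets differ across state-action pairs.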


### 3. Exercise (20 minutes): the Fitted Q Iterations

1. Create the training set based on the previous iteration $ Q^{n-1}_\theta(s, a) $ and the transitions:
- input: $x = \text{one-hot-encoding}(s_t, a_t)$
- if $s_{t+1}$ is non-terminal: $y = r_t + \gamma \cdot \max_{a' \in A}(Q^{n-1}_\theta(\text{one-hot-encoding}(s_{t+1}, a')))$
- if $s_{t+1}$ is terminal, do not bootstrap: $y = r_t$

2. Fit a model $f_\theta$ using a regression algorithm to obtain $ Q^{n}_\theta(\text{one-hot-encoding}(s, a))$
 
\begin{aligned}
 f_\theta(x) = y
\end{aligned}

3. Repeat, $n = n + 1$
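
The steps above can be sketched on a tiny hand-made dataset. This is an illustration under stated assumptions, not the exercise solution: the four transitions, the discount `gamma = 0.5`, and the helper `fit_table` are all hypothetical, and the regression step is replaced by a per-pair mean of the targets (which is what a linear model on one-hot inputs reduces to).

```python
import numpy as np

n_states, n_actions, gamma = 2, 2, 0.5

# Hypothetical toy dataset of transitions (s, a, r, s', terminated)
obs = np.array([0, 0, 1, 1])
actions = np.array([0, 1, 0, 1])
rewards = np.array([0.0, 1.0, 2.0, 0.0])
next_obs = np.array([0, 1, 0, 0])
terminateds = np.array([True, False, True, False])


def fit_table(targets: np.ndarray) -> np.ndarray:
    """Regression step: mean target per (s, a) pair, as a (n_states, n_actions) table."""
    idx = obs * n_actions + actions
    sums = np.bincount(idx, weights=targets, minlength=n_states * n_actions)
    counts = np.bincount(idx, minlength=n_states * n_actions)
    # Pairs without data keep a Q-value of 0
    q = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    return q.reshape(n_states, n_actions)


# Iteration n = 0: the target is the immediate reward
q_table = fit_table(rewards)

for _ in range(5):
    # Step 1: build the targets, bootstrapping only for non-terminal transitions
    next_q = q_table[next_obs].max(axis=1)
    targets = rewards + gamma * np.logical_not(terminateds) * next_q
    # Step 2: fit a new model on the new targets
    q_table = fit_table(targets)

print(q_table)
```

The loop has the same structure as the exercise below: compute the next Q-values with the previous model, mask them for terminal transitions, and refit.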

First, let's define some constants:

In [30]:
# Max number of iterations
n_iterations = 20
# How often do we evaluate the learned model
eval_freq = 2
# How many episodes to evaluate every eval-freq
n_eval_episodes = 10
# discount factor
gamma = 0.99
# Number of discrete actions
n_actions = int(env_discrete.action_space.n)

Then run several iterations of the FQI algorithm:

In [31]:
for iter_idx in range(n_iterations):
    ### YOUR CODE HERE
    # TODO:
    # 1. Compute the q values for the next states using
    # the previous regression model
    # 2. Keep only the next q values that correspond
    # to the greedy-policy
    # 3. Construct the regression target (TD(0) target)
    # 4. Fit a new regression model with this new target

    # Construct TD(0) target
    # using current model and the next observations
    
    # First, retrieve the q-values for the next states
    # for all possible actions
    # you need to use `get_q_values()` method
    next_q_values = get_q_values(
        model,
        data.next_observations,
        n_actions=n_actions,
    )
    # Follow the greedy policy: use the action with the highest q-value
    # to compute the next q-values
    next_q_values = next_q_values.max(axis=1)
    # The new target is the reward + what our agent expects to get
    # if it follows a greedy policy (follow action with the highest q-value)
    # Reminder: you should not bootstrap if terminated=True
    # (you can mask the next q values for that using `np.logical_not`)
    should_bootstrap = np.logical_not(data.terminateds) # (1 - data.terminateds)
    targets = data.rewards + gamma * should_bootstrap * next_q_values
    # Update the q-value estimate with the current target,
    # i.e., fit a regression model using the new target
    model = model_class().fit(current_obs_input, targets)

    ### END OF YOUR CODE

    if (iter_idx + 1) % eval_freq == 0:
        print(f"Iter {iter_idx + 1}")
        print(f"Score: {model.score(current_obs_input, targets):.2f}")
        evaluate(model, env_discrete, n_eval_episodes)

Iter 2
Score: 0.53
()
Total reward = 42.60 +/- 8.37
Iter 4
Score: 0.66
()
Total reward = 133.90 +/- 33.19
Iter 6
Score: 0.69
()
Total reward = 135.90 +/- 50.13
Iter 8
Score: 0.70
()
Total reward = 150.40 +/- 34.98
Iter 10
Score: 0.70
()
Total reward = 143.80 +/- 39.18
Iter 12
Score: 0.70
()
Total reward = 141.20 +/- 50.93
Iter 14
Score: 0.70
()
Total reward = 133.70 +/- 46.17
Iter 16
Score: 0.70
()
Total reward = 138.60 +/- 37.52
Iter 18
Score: 0.70
()
Total reward = 132.40 +/- 27.77
Iter 20
Score: 0.70
()
Total reward = 131.70 +/- 40.40


### Record a video of the trained agent

In [34]:
eval_env = gym.make(env_id, render_mode="rgb_array")
eval_env = DiscretizeStateWrapper(eval_env, n_bins=n_bins, custom_bin_boundaries=custom_bin_boundaries)

video_name = f"FQI_{env_id}_tabular"
n_eval_episodes = 4

evaluate(model, eval_env, n_eval_episodes, video_name=video_name)

()
Saving video to ../logs/videos/FQI_CartPole-v1_tabular
Total reward = 136.75 +/- 49.23


In [35]:
from dqn_tutorial.notebook_utils import show_videos

# print(f"FQI agent on {env_id} after {n_iterations} iterations:")
show_videos("../logs/videos/", prefix=video_name)