Code corresponding to the DHP paper published in TPAMI 2018.


This repository provides the database, code and results visualization for reproducing all the results reported in the paper Predicting Head Movement in Panoramic Video: A Deep Reinforcement Learning Approach.

Accepted by TPAMI. By MC2 Lab @ Beihang University.

Specifically, this repository includes guidelines to:

Download PVS-HMEM database

Our PVS-HMEM database (Panoramic Video Sequences with Head Movement & Eye Movement) contains both head movement (HM) and eye movement (EM) data of 58 subjects on 76 panoramic videos.

  • Blue dots represent the Head Movement.
  • Translucent blue circles represent the FoV.
  • Red dots represent the Eye Movement.

Download our PVS-HM database from Dropbox. Please feel free to contact us so that we can grant you access permission to the file. Then extract it with:

tar -xzvf PVS-HM.tar.gz

Note that it contains all MP4 files of our database, along with the HM scanpath data FULLdata_per_video_frame.mat.

Note that the EM data is not included here, as it is beyond the scope of this work. However, we release the EM data in a separate repo to facilitate research in the community.

For more details on the FULLdata_per_video_frame.mat file, refer to "Details of the mat data file" below (note that you do not need to read these details if you just want to run our code and reproduce the numbers).

Set up an environment to run our code

If you are not familiar with the tools used in this section, refer to my personal basic setup for some guidelines, or simply search online.

Install Anaconda according to the guidelines on its official site, then install the other requirements with the following commands:

sudo apt-get install tmux

source ~/.bashrc

# create env
conda create -n dhp_env python=2.7

# activate env
source activate dhp_env

# install packages
pip install gym tensorflow universe

# clone project
git clone

# make remap executable
cd DHP-TensorFlow
chmod +x ./remap

Now run ./remap to make sure that remap is executable. It should print the following usage information:

./remap [-i input] [-o output] [-f filter] [-m m] [-n n] [-w w] [-h h] [-t tf] [-y] src dst
	-i ... Input  file type: cube, rect, eqar, merc                   [rect]
	-o ... Output file type: cube, rect, eqar, merc, view             [rect]
	-f ... Filter type: nearest, linear, bicubic                   [bicubic]
	-m ... Input  height list                                          [500]
	-b ... Input  width                                                 [2m]
	-n ... Output height                                               [500]
	-v ... Output width                                                 [2n]
	-w ... Viewport width                                              [200]
	-h ... Viewport height                                             [200]
	-x ... Viewport fov x in degree                                     [90]
	-y ... Viewport fov y in degree                                     [90]
	-p ... Viewport center position phi (latitude)(degrees)              [0]
	-l ... Viewport center position tht (longitude)(degrees)             [0]
	-t ... Tracking data file                                         [none]
	-y ... Blend data together (only works with orec, etc ...)         [off]
	-z ... Number of frames                                            [MAX]
	-s ... Number of the start frame                                     [0]

If it is not working, check the known issue "Remap failed." below.
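The viewport options in the usage message (-w/-h for the viewport size, -x/-y for the FoV, -p/-l for the viewport center) describe a standard equirectangular-to-rectilinear (gnomonic) projection. Below is a minimal, slow, pure-Python sketch of that geometry with nearest-neighbour sampling. It is not remap's source: the function name extract_viewport and the pixel conventions (columns span longitude -180° to 180° from left to right, rows span latitude 90° to -90° from top to bottom, and a positive center longitude turns the view to the right) are assumptions made for illustration.

```python
import math


def extract_viewport(equirect, out_w, out_h, fov_x_deg, fov_y_deg,
                     center_lat_deg, center_lon_deg):
    """Nearest-neighbour rectilinear viewport from an equirectangular image.

    equirect is a list of H rows with W values each. Assumed convention:
    columns span longitude [-180, 180) left to right, rows span latitude
    [90, -90] top to bottom. Hypothetical helper, not remap's implementation.
    """
    src_h, src_w = len(equirect), len(equirect[0])
    tan_x = math.tan(math.radians(fov_x_deg) / 2)
    tan_y = math.tan(math.radians(fov_y_deg) / 2)
    phi = math.radians(center_lat_deg)   # pitch of the viewport center
    tht = math.radians(center_lon_deg)   # yaw of the viewport center
    view = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Ray through the viewport pixel center on the plane z = 1
            # (camera frame: x right, y up, z forward).
            x = (2 * (j + 0.5) / out_w - 1) * tan_x
            y = (1 - 2 * (i + 0.5) / out_h) * tan_y
            z = 1.0
            # Pitch: rotate about the x-axis so the forward ray reaches latitude phi.
            y, z = (y * math.cos(phi) + z * math.sin(phi),
                    -y * math.sin(phi) + z * math.cos(phi))
            # Yaw: rotate about the y-axis so the forward ray reaches longitude tht.
            x, z = (x * math.cos(tht) + z * math.sin(tht),
                    -x * math.sin(tht) + z * math.cos(tht))
            lon = math.atan2(x, z)
            lat = math.atan2(y, math.hypot(x, z))
            # Spherical coordinates -> source pixel (wrap longitude, clamp latitude).
            col = int((lon + math.pi) / (2 * math.pi) * src_w) % src_w
            r = min(int((math.pi / 2 - lat) / math.pi * src_h), src_h - 1)
            row.append(equirect[r][col])
        view.append(row)
    return view
```

With remap's default values, a call would look like extract_viewport(frame, 200, 200, 90, 90, 0, 0), where frame is, for example, a decoded Y plane stored as a list of rows. A pure-Python loop like this is far too slow for the whole database, which fits the authors' choice of a compiled remap binary.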

Run our code

Please make sure you have:

  • More than 64 GB of RAM.
  • More than 600 GB of free space on the disk where you store the PVS-HM database.


This section describes the procedures to train and test offline-DHP.


Set the database_path in to your database folder.

Generate YUV files. Set mode = 'data_processor' and data_processor_id = 'mp4_to_yuv' in and run:

source ~/.bashrc
source activate dhp_env

The converted YUV files will take about 600 GB. We have to use YUV files because remap, the function that extracts the FoV from a 360° image, is a binary that takes YUV as input and produces YUV as output. We have developed a Python version of remap, but it turns out to be more than 5 times slower than simply reading and writing YUV files on disk. We are investigating whether remap is essential to produce our results.
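To illustrate why raw YUV files take so much space, here is a small sketch that computes the size of one raw frame and reads a single frame from a raw YUV file. It assumes 8-bit planar YUV 4:2:0 (I420) with no header; the README does not state the exact YUV layout remap expects, and both function names are hypothetical.

```python
def yuv420_frame_size(width, height):
    """Bytes per frame for 8-bit planar YUV 4:2:0 (I420): a full-resolution
    Y plane plus U and V planes subsampled by 2 in both directions."""
    chroma = ((width + 1) // 2) * ((height + 1) // 2)
    return width * height + 2 * chroma


def read_yuv420_frame(f, width, height, index):
    """Return the (Y, U, V) planes of frame `index` from an open binary file."""
    size = yuv420_frame_size(width, height)
    f.seek(index * size)  # raw YUV has no header, so frames are back to back
    data = f.read(size)
    if len(data) != size:
        raise EOFError("frame %d is beyond the end of the file" % index)
    luma = width * height
    chroma = (size - luma) // 2
    return data[:luma], data[luma:luma + chroma], data[luma + chroma:]
```

For a hypothetical 3840×1920 video, yuv420_frame_size(3840, 1920) gives 11,059,200 bytes (about 11 MB) per frame, so one minute at a hypothetical 30 FPS already takes roughly 20 GB.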

Note that is a script that starts multiple processes managed by tmux. Thus, after running it, you can use tmux attach-session -t a3c to see how each process is doing. More about tmux can be found in its documentation.

Warning: after you run, it will start a tmux session in which the program runs. If you are using mode = 'data_processor', please navigate to each window of the tmux session and make sure that each window exits normally and that the task completes without any errors.

Generate groundtruth heatmaps. Set mode = 'data_processor' and data_processor_id = 'generate_groundtruth_heatmaps' in and run:

source ~/.bashrc
source activate dhp_env

Set mode = 'off_line', procedure = 'train' and if_log_results = False in, then run the following:

source ~/.bashrc
source activate dhp_env

During the first few episodes, you may find that CPU usage is extremely low. This is because the sub-processes are competing for the remap function, which exchanges data with the disk. CPU usage will increase later on.

Note that we trained for number_trained_steps = 1.113 * (10^6) to produce the results in the paper. We later found that training much longer (10 times as many steps, i.e., 1.113 * (10^7)) may make the agent converge to FCB.


Note that the model is stored and restored automatically. Thus, as long as you do not change the log_dir in, the previously trained model will be restored. Set mode = 'off_line', procedure = 'test' and if_log_results = True in, then run the following:

source ~/.bashrc
source activate dhp_env

The code will generate and store predicted_heatmaps, predicted_scanpath and the CC values.
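Here, CC is presumably the linear correlation coefficient, the usual saliency metric that compares a predicted heatmap with the ground-truth heatmap. The sketch below computes the standard Pearson CC over all pixels under that assumption. The function name is hypothetical, and the repository's own implementation (for example, any smoothing or per-frame averaging) may differ.

```python
import math


def correlation_coefficient(pred, gt):
    """Pearson linear correlation coefficient (CC) between two heatmaps of the
    same size, given as lists of rows. Returns a value in [-1, 1]."""
    p = [v for row in pred for v in row]
    g = [v for row in gt for v in row]
    if len(p) != len(g):
        raise ValueError("heatmaps must have the same number of pixels")
    n = float(len(p))
    mean_p, mean_g = sum(p) / n, sum(g) / n
    cov = sum((a - mean_p) * (b - mean_g) for a, b in zip(p, g))
    var_p = sum((a - mean_p) ** 2 for a in p)
    var_g = sum((b - mean_g) ** 2 for b in g)
    if var_p == 0 or var_g == 0:
        raise ValueError("CC is undefined for a constant heatmap")
    return cov / math.sqrt(var_p * var_g)
```

CC is invariant to scaling and shifting either map, so predicted and ground-truth heatmaps do not need to share a normalization before comparison.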

If you are seeing

Starting training at step=<your-previous-global-step>

then the model is restored successfully. If you are seeing

Starting training at step=0

then the model was not restored successfully. Refer to the known issue "Restore model failed." below.

To obtain results under more evaluation protocols, you may want to generate and store groundtruth_scanpaths with mode = 'data_processor' and data_processor_id = 'generate_groundtruth_scanpaths'.

Load our trained model

To load our trained model, download it from Dropbox, extract it to ../results/, and set log_dir = "../results/reproduce_17". As mentioned above, the model in log_dir will be loaded automatically.

Visualize training from TensorBoard

The code logs multiple curves to help analyze the training process. To view them, run:

tensorboard --logdir <PATH>

where <PATH> is the log_dir in

Some hints on using the code.

  • mode = 'data_processor' is an efficient way to process data with multiple processes under our tmux manager; the code is in
  • Some features we use in TensorFlow will be deprecated in a future version; we use tf.__version__ = 1.6.0 to run our code.
  • Reinforcement-learning-based methods are inherently stochastic, so we cannot guarantee reproducing exactly the same numbers as those reported in our DHP paper. However, if you do multiple runs, we are confident that you will see consistent results.

Summarize your results after testing

After you have tested the model (setting if_log_results=True), you can run


to summarize the results. It will show results like:


which should reproduce the numbers reported in the paper. If you have any problems reproducing the numbers, please do not hesitate to contact us. Your feedback on environment and parameter settings will be greatly appreciated, since we are trying to provide the community with a solid proposal.

Kill the session

Run python to kill the session.

Running into issues?

Please do not hesitate to open an issue. We encourage you to open an issue rather than contact us directly, since that is the best way to raise your questions.

Some known issues and fixes are:

Restore model failed.

Navigate to window w-0 in tmux to check whether this worker is running properly. This worker is responsible for restoring the model from disk, while the other workers only sync with it asynchronously. Then check <log_dir>/train/checkpoint; it should look like:

model_checkpoint_path: "model.ckpt-5362890"
all_model_checkpoint_paths: "model.ckpt-5359910"
all_model_checkpoint_paths: "model.ckpt-5360655"
all_model_checkpoint_paths: "model.ckpt-5361444"
all_model_checkpoint_paths: "model.ckpt-5362210"
all_model_checkpoint_paths: "model.ckpt-5362890"

Here, model_checkpoint_path points to the latest checkpoint, and all_model_checkpoint_paths lists all available checkpoints. Please make sure that what is listed here matches the files in <log_dir>/train/.

A likely reason for a restore failure is that the most recent ckpt file was not completely written when you killed the program, yet it is already listed in the checkpoint file as available and as the one to restore. In this case, simply remove the corresponding ckpt files and edit the checkpoint file accordingly.

For example, in the above case, change <log_dir>/train/checkpoint to:

model_checkpoint_path: "model.ckpt-5362210"
all_model_checkpoint_paths: "model.ckpt-5359910"
all_model_checkpoint_paths: "model.ckpt-5360655"
all_model_checkpoint_paths: "model.ckpt-5361444"
all_model_checkpoint_paths: "model.ckpt-5362210"

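and delete the files of <log_dir>/train/model.ckpt-5362890 (TensorFlow stores each checkpoint as several files sharing this prefix, e.g. .index, .meta and .data-*). This removes the most recent ckpt at step 5362890, so the ckpt at 5362210 will be restored instead.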
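The manual fix above can be automated. The sketch below uses only the standard library, and its function names are hypothetical, not part of this repository. It parses <log_dir>/train/checkpoint, drops the newest entry, rewrites the file, and deletes that checkpoint's files. It assumes, as in the example, that entries are listed oldest first and that the newest checkpoint is the broken one. Stop all workers and back up <log_dir>/train/ before trying it.

```python
import os
import re

# One line of the text-format checkpoint file, e.g.
# model_checkpoint_path: "model.ckpt-5362890"
_ENTRY = re.compile(r'^(model_checkpoint_path|all_model_checkpoint_paths):\s*"(.*)"$')


def read_checkpoint_state(path):
    """Return (latest_path, [all listed paths in file order])."""
    latest, all_paths = None, []
    with open(path) as f:
        for line in f:
            match = _ENTRY.match(line.strip())
            if match is None:
                continue
            key, value = match.groups()
            if key == "model_checkpoint_path":
                latest = value
            else:
                all_paths.append(value)
    return latest, all_paths


def drop_latest_checkpoint(train_dir):
    """Remove the newest (possibly incomplete) checkpoint and point the
    checkpoint file at the newest remaining one. Returns (dropped, restored)."""
    ckpt_file = os.path.join(train_dir, "checkpoint")
    latest, all_paths = read_checkpoint_state(ckpt_file)
    remaining = [p for p in all_paths if p != latest]
    if latest is None or not remaining:
        raise ValueError("no older checkpoint to fall back to")
    # Rewrite the checkpoint file in the same format as the example above.
    with open(ckpt_file, "w") as f:
        f.write('model_checkpoint_path: "%s"\n' % remaining[-1])
        for p in remaining:
            f.write('all_model_checkpoint_paths: "%s"\n' % p)
    # Delete every file of the dropped checkpoint (model.ckpt-<step>.index, .meta, .data-*).
    prefix = os.path.basename(latest) + "."
    for name in os.listdir(train_dir):
        if name.startswith(prefix):
            os.remove(os.path.join(train_dir, name))
    return latest, remaining[-1]
```

For the example above, drop_latest_checkpoint("<log_dir>/train") would return ("model.ckpt-5362890", "model.ckpt-5362210") and leave the checkpoint file exactly as shown in the corrected listing.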

Remap failed.

It is likely that you are running remap from /media/... instead of /home/..., since remap is only executable under your home directory. A quick fix is to copy remap to your home directory, chmod it, and test whether it is executable. After you confirm that it works in your home directory, change ./remap in to [ABSOLUTE_PATH_TO_YOUR_HOME]/remap.
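The quick fix above can be scripted. The following is a minimal sketch: install_remap is a hypothetical helper, not part of this repository. It copies remap into your home directory by default, adds execute permission (like chmod a+x), and asks the OS whether the current user may now execute the file.

```python
import os
import shutil
import stat


def install_remap(src, dest_dir=None):
    """Copy the remap binary into dest_dir (default: your home directory),
    add execute permission, and return (absolute_path, is_executable)."""
    dest_dir = os.path.expanduser(dest_dir or "~")
    dest = os.path.abspath(os.path.join(dest_dir, os.path.basename(src)))
    shutil.copy2(src, dest)  # copies contents and permission bits
    # Like `chmod a+x`: add execute bits for user, group and others.
    mode = os.stat(dest).st_mode
    os.chmod(dest, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    # os.access asks the OS whether the current user may execute the file.
    return dest, os.access(dest, os.X_OK)
```

If is_executable is True, use the returned absolute path in place of ./remap, and run it once to confirm that it prints the usage message shown earlier.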

Results Visualization

Reward Function

We propose a reward function that can capture transitions of attention.

(Figure panels: our reward function; baseline reward function.)

Specifically, in the above example, the woman and the man pass the basketball to each other, and the subjects' attention switches between them as they pass the ball. Our reward function captures these attention transitions smoothly, while the baseline reward function makes the agent focus on the man all the time, even when the basketball is not in his hands.

Details of the mat data file.

The mat file includes 76 cells, corresponding to the HM data of all 76 videos. Each cell records the HM longitude and latitude of the 58 subjects, giving 116 columns in total (2 × 58). Longitude and latitude are arranged alternately; for example, the first and second columns are the latitude and longitude of the first subject, respectively. Note that the sampling rate of the data is twice the video frame rate (FPS). The HM data takes the front center as the origin, with up and left as the positive directions. Thus, longitude ranges from -180° to 180°, and latitude ranges from -90° to 90°.
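The layout described above can be sketched in Python. The helpers below (all names hypothetical) split one cell into per-subject (latitude, longitude) trajectories following the column example above, map HM samples to video frames using the 2× sampling rate, and convert an HM point to pixel coordinates in an equirectangular frame. Two assumptions are ours, not the README's: sample 0 is aligned with frame 0, and the front center sits at the center of the equirectangular image with x growing to the right.

```python
def split_subjects(cell_rows, num_subjects=58):
    """Split one video's HM matrix (rows = samples, 2 * num_subjects columns)
    into per-subject lists of (lat, lon) pairs, assuming column 2k holds the
    latitude and column 2k + 1 the longitude of subject k."""
    return [[(row[2 * k], row[2 * k + 1]) for row in cell_rows]
            for k in range(num_subjects)]


def sample_to_frame(sample_index):
    """HM is sampled at twice the video FPS, so two samples share one frame
    (assuming sample 0 is aligned with frame 0)."""
    return sample_index // 2


def hm_to_equirect_pixel(lat, lon, width, height):
    """Map an HM point in degrees (origin at the front center, up and left
    positive) to continuous (x, y) pixel coordinates in a width x height
    equirectangular frame whose center is the front center and whose x axis
    grows to the right. x lies in [0, width]; x == width wraps around to 0."""
    x = (0.5 - lon / 360.0) * width
    y = (0.5 - lat / 180.0) * height
    return x, y
```

To load the file in Python, scipy.io.loadmat('FULLdata_per_video_frame.mat') returns a dict keyed by MATLAB variable names, and a MATLAB cell array comes back as a NumPy object array. The variable name inside the file is not documented here, so inspect the dict keys first. Each cell's 2-D array can then be passed to split_subjects directly, since iterating over it yields rows.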


Yuhang Song Mai Xu Jianyi Wang Minglang Qiao Zulin Wang

Special Thanks

We would like to give special thanks to the following researchers for their valuable discussions and contributions to this work.

Ziyu Zhu Haochen Wang Chen Li Lai Jiang

The code is based on the A3C implementation by OpenAI, and we thank them for their contribution to the community.


Please use this bibtex if you want to cite our work.

  title={Predicting Head Movement in Panoramic Video: A Deep Reinforcement Learning Approach},
  author={Xu, Mai and Song, Yuhang and Wang, Jianyi and Qiao, MingLang and Huo, Liangyu and Wang, Zulin},
  journal={IEEE Transactions on Pattern Analysis \& Machine Intelligence},