TensorFlow implementation of the EMNLP 2018 paper "Temporally Grounding Natural Sentence in Video" by Jingyuan Chen et al.
pip install -r requirements.txt
- Download Glove word embedding data.
cd download/
sh download_glove.sh
- Download dataset features.
Put the feature hdf5 file in the corresponding directory
We decode TACoS/Charades videos using fps=16 and extract C3D (fc6) features for each non-overlapping 16-frame snippet, so each feature corresponds to a 1-second snippet. For ActivityNet, each feature corresponds to a 2-second snippet. To extract the C3D fc6 features, I mainly refer to this code.
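The snippet timing above can be sketched as a small helper that maps a feature index to the time span it covers. This is only an illustration: the names `SECONDS_PER_SNIPPET` and `snippet_span` are hypothetical and not part of this repository, and the dataset keys are assumed labels.

```python
# Hypothetical helper, not part of this repository.
# TACoS/Charades: 16 frames per snippet at fps=16 gives 16 / 16 = 1 second.
# ActivityNet: the README states 2 seconds per snippet.
SECONDS_PER_SNIPPET = {
    "tacos": 16 / 16.0,
    "charades": 16 / 16.0,
    "activitynet": 2.0,
}


def snippet_span(index, seconds_per_snippet):
    """Return the (start, end) time in seconds covered by feature `index`.

    Snippets are non-overlapping, so feature i covers
    [i * seconds_per_snippet, (i + 1) * seconds_per_snippet).
    """
    if index < 0:
        raise ValueError("index must be non-negative")
    start = index * seconds_per_snippet
    return start, start + seconds_per_snippet


if __name__ == "__main__":
    print(snippet_span(3, SECONDS_PER_SNIPPET["tacos"]))
    print(snippet_span(3, SECONDS_PER_SNIPPET["activitynet"]))
```

Such a mapping is handy when converting predicted snippet indices back to timestamps in the original video.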
- Download trained models.
Download and put the checkpoints in corresponding
- Data Preprocessing (Optional)
cd datasets/tacos/
sh prepare_data.sh
Then copy the generated data in
Use the corresponding scripts to prepare data for the other datasets.
You may skip this procedure as the prepared data is already saved in
Testing and Evaluation
sh scripts/test_tacos.sh
sh scripts/eval_tacos.sh
Use the corresponding scripts to test or evaluate on the other datasets.
The predicted results are also provided in
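For readers who want to sanity-check predictions by hand, temporal grounding results are commonly scored by the temporal IoU between a predicted segment and the ground-truth segment. The sketch below is a generic illustration under that assumption; it is not the code behind the evaluation scripts, and the function names and the (start, end) segment format are hypothetical.

```python
# Generic illustration, not the repository's evaluation code.
# Segments are assumed to be (start, end) pairs in seconds.

def temporal_iou(pred, gt):
    """Intersection over union of two 1-D time segments."""
    pred_start, pred_end = pred
    gt_start, gt_end = gt
    inter = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    union = (pred_end - pred_start) + (gt_end - gt_start) - inter
    return inter / union if union > 0 else 0.0


def recall_at_iou(preds, gts, threshold):
    """Fraction of queries whose top prediction reaches the IoU threshold."""
    if not gts:
        raise ValueError("no ground-truth segments given")
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(gts)


if __name__ == "__main__":
    preds = [(0.0, 10.0), (0.0, 1.0)]
    gts = [(0.0, 10.0), (2.0, 3.0)]
    print(recall_at_iou(preds, gts, threshold=0.5))
```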
Use the corresponding scripts to train on the other datasets.