How do I apply this to my video files? #1
Comments
Hi @rajeevchhabra,
Hi, is it possible to share the code you used to create the h5 dataset, so that I can follow it to create my own?
Hi @zijunwei,
For a detailed description of the data format, please refer to the readme.txt. Instructions for h5py can be found at http://docs.h5py.org/en/latest/quick.html. Let me know if you have any problems.
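For anyone unsure what the data looks like, here is a minimal h5py sketch for inspecting a dataset file (the file name 'dataset.h5' is a placeholder):

import h5py

with h5py.File('dataset.h5', 'r') as f:
    # print every dataset in the file along with its shape and dtype
    def show(name, obj):
        if isinstance(obj, h5py.Dataset):
            print(name, obj.shape, obj.dtype)
    f.visititems(show)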
Thanks!
How is the gtscore computed, and how is it different from gtsummary or the average of user_summary?
Hope this clarifies.
Thanks! It's very helpful!
@KaiyangZhou I'm trying to create an .h5 file for my own video, but what exactly are the features? I understand it'll be a numpy matrix of shape (n_steps, feature_dimension), but what are these features and how do I extract them from the video frames? Could you please give me more description about them? I've glanced through your paper, but I couldn't find anything about this.
Hi @chandra-siri,
You can use off-the-shelf feature extractors to achieve this, e.g. with pytorch. First, load the feature extractor, e.g. a pretrained neural network. Second, loop over the video frames and apply the feature extractor to each frame, so that each frame is represented by a long feature vector; if you use GoogLeNet, you will end up with a 1024-dimensional feature vector. Third, concatenate the extracted features to form a feature matrix, and save it to the h5 file as specified in the readme.txt. The pseudo code below might make this clearer:

features = []
for frame in video_frames:
    # frame is a numpy array of shape (channel, height, width)
    # do some preprocessing such as normalization
    frame = preprocess(frame)
    # apply the feature extractor to this frame
    feature = feature_extractor(frame)
    # save the feature
    features.append(feature)
features = concatenate(features)  # now shape is (n_steps, feature_dimension)

Hope this helps.
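For concreteness, below is a minimal runnable sketch of the steps above, assuming torchvision's pretrained GoogLeNet as the feature extractor; the helper name extract_features and the preprocessing values are illustrative, not from this repo:

import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as transforms

# load a pretrained GoogLeNet and drop its classifier so the output is
# the 1024-dim pooled feature of each frame
# (newer torchvision versions use weights='DEFAULT' instead of pretrained=True)
net = models.googlenet(pretrained=True)
net.fc = torch.nn.Identity()
net.eval()

preprocess = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),  # standard ImageNet statistics
])

def extract_features(video_frames):
    # video_frames: iterable of HxWx3 uint8 numpy arrays
    features = []
    with torch.no_grad():
        for frame in video_frames:
            x = preprocess(frame).unsqueeze(0)   # shape (1, 3, 224, 224)
            feat = net(x).squeeze(0).numpy()     # shape (1024,)
            features.append(feat)
    return np.stack(features)                    # shape (n_steps, 1024)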
@KaiyangZhou This is very informative and helpful. I'll try out what you've mentioned, using GoogLeNet (Inception v3 model), and let you know. Thanks a lot!
@KaiyangZhou As you suggested, I was able to extract the frames. But in order to get the summary I also need change_points.
@chandra-siri Specifically, change_points looks something like [[0, 10], [11, 20], [21, 30]]. This means the video is segmented into three parts: the first part ranges from frame 0 to frame 10, the second part ranges from frame 11 to frame 20, and so forth.
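As an illustration only, here is a sketch of writing such fields with h5py; the numbers are made up and the exact set of keys and layout is specified in the readme.txt:

import h5py
import numpy as np

features = np.random.rand(300, 1024).astype(np.float32)   # (n_steps, feature_dimension)
change_points = np.array([[0, 10], [11, 20], [21, 30]])    # segment boundaries, e.g. from KTS
n_frame_per_seg = change_points[:, 1] - change_points[:, 0] + 1

with h5py.File('my_dataset.h5', 'w') as f:
    video = f.create_group('video_1')
    video['features'] = features
    video['change_points'] = change_points
    video['n_frame_per_seg'] = n_frame_per_seg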
How do I know which key in the dataset corresponds to which video in the SumMe dataset?
@samrat1997
@KaiyangZhou ... Thank you. I just realized that.
@KaiyangZhou Hi. I've been trying to use the code to test on my own dataset. I used the Google Inception v3 pretrained pytorch model to generate features, and it has a 1000-class output, so my feature matrix has shape (num_frames, 1000). However, the dataset used here has 1024-dimensional features. Can you help with this? Will I have to modify and retrain the Inception model?
@harora The feature dimension does not matter; you can just feed (num_frames, any_num_dim) to the algorithm, so you don't need to retrain the model. That said, it is strange to use the class logits as feature vectors; it would make more sense to use the layer before the softmax, e.g. 1024-dim for GoogLeNet or 2048-dim for ResNet.
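As a sketch of what "the layer before the softmax" means in practice, using a torchvision ResNet-50 here (names are illustrative, not from this repo):

import torch
import torchvision.models as models

# (newer torchvision versions use weights='DEFAULT' instead of pretrained=True)
resnet = models.resnet50(pretrained=True)
# drop the final fully-connected classifier so the output is the
# 2048-dim globally pooled feature instead of the 1000 class logits
backbone = torch.nn.Sequential(*list(resnet.children())[:-1])
backbone.eval()

with torch.no_grad():
    x = torch.randn(1, 3, 224, 224)   # a preprocessed frame
    feat = backbone(x).flatten(1)     # shape (1, 2048)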
@KaiyangZhou Hi, is gtscore generated by the user manually? If not, could you show me the associated code?
@bersalimahmoud
@KaiyangZhou Regarding "Visualize summary": you can use summary2video.py, but where or how can I get the frames? Can I get frames from the .h5 files, or shall I create frames from the raw videos? Thank you very much!
@liuhaixiachina You need to decompose a video before doing other things, e.g. feature extraction. You can use ffmpeg or python to do it.
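For example, a minimal sketch that calls ffmpeg from python to dump one jpg per frame (file names are placeholders; ffmpeg must be installed and the frames/ directory must already exist):

import subprocess

# writes frames/000001.jpg, frames/000002.jpg, ...
subprocess.run(['ffmpeg', '-i', 'video.mp4', 'frames/%06d.jpg'], check=True)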
@KaiyangZhou Hi, I am trying to use the code to test on my own video. I used a pretrained model to generate features, and it has a 4096-dimensional output. I saw you said "the feature dimension does not matter" above, but I got "RuntimeError: input.size(-1) must be equal to input_size. Expected 1024, got 4096". Could you please tell me how to solve this issue? Thanks a lot!
@babyjie57 You need to change the argument --input-dim to match your feature dimension.
@KaiyangZhou Thanks for your reply. I also added '--input-dim 4096', but I got 'While copying the parameter named "rnn.weight_ih_l0_reverse", whose dimensions in the model are torch.Size([1024, 4096]) and whose dimensions in the checkpoint are torch.Size([1024, 1024]).' Can you please tell me how to solve this issue? Thanks!
I presume you are loading a model which was trained with features of 1024 dimensions but initialized with feature dimension = 4096.
Can you also publish the script for the KTS you used to generate the change points?
@mctigger You can find the code here: http://lear.inrialpes.fr/people/potapov/med_summaries.php
I have a question: if I want to use my own dataset, but there are no labels in it, what should I do with user_summary, gtscore and gtsummary when I construct the hdf5 file? Also, I see these three labels are only used in the evaluation process; does this mean I can just delete them both in the hdf5 file and in the evaluation function (I mean in the pytorch implementation)? Moreover, if I want to use the result.json to generate a summary video from a raw video, can I delete these three labels?
Have you solved this problem? I want to use my own video data, but I don't know how to deal with user_summary, gtscore and gtsummary.
How did you convert the video into a signal for Kernel Temporal Segmentation (KTS)?
@KaiyangZhou
You can decompose a video using either ffmpeg or opencv. For the latter, there is example code on the opencv website. You can write something like:

import cv2

cap = cv2.VideoCapture('video.mp4')  # path to the raw video
video_features = []
while True:
    # capture frame-by-frame
    ret, frame = cap.read()
    if not ret:  # no frames left
        break
    # maybe skip this frame for downsampling
    # feature extraction; feature_extractor is a placeholder (e.g. a pretrained CNN),
    # and extraction can also be done on minibatches to leverage the gpu
    feature = feature_extractor(frame)
    # store the feature
    video_features.append(feature)
cap.release()
summary = video_summarizer(video_features)  # placeholder for the summarization model
Yes. You can use CNN features, which capture high-level semantics. Downsampling is a common technique since neighbouring frames are redundant; 2 fps or 1 fps is good.
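A hypothetical helper (names are illustrative) for choosing which frame indices to keep when downsampling to a target rate:

import cv2

def sampled_frame_indices(video_path, target_fps=2):
    # keep roughly target_fps frames per second of video
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    cap.release()
    stride = max(int(round(fps / target_fps)), 1)
    return list(range(0, n_frames, stride))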
Annotations are not required for training. Only frame features are required by the algorithm. You can qualitatively evaluate the results by applying the trained model to unseen videos and watching the summaries.
@KaiyangZhou
Could you tell me where to download the original videos (SumMe and TVSum)?
Same question as above. Could you please tell me where to download the original videos (SumMe and TVSum)?
@KaiyangZhou
@KaiyangZhou, how do I use KTS to generate change points? I used the official KTS code and employed CNN features for each frame, but I get the same number of segments for every video. Is there any problem?
@KaiyangZhou To get the change points, should the frames of a video be the input X in "demo.py", or should the features of each frame be the input?
@wjb123 Hi, did you solve this problem? → "How do I use KTS to generate change points? I used the official KTS code and employed CNN features for each frame, but I get the same number of segments for every video. Is there any problem?"
@chenchch94
Hi,
Hi, @neda60
@KaiyangZhou Can you please share the code for creating the .h5 file? How should I deal with gtscore, gtsummary and user_summary?
Hello, I want to use your RL code to extract key frames. I now use a complex network to extract features and store them in an .h5 file, but I don't have the other attributes such as gtscore and gtsummary (because I guess the dataset needs at least these three attributes). For now I create gtscore as an all-ones numpy array, but I don't know whether this is right or wrong. If it is wrong, how can I compute gtscore? Meanwhile, I create gtsummary by randomly sampling some frames; should I sample uniformly?
Regarding "Visualize summary": where or how can I get the frames? I followed the steps mentioned in the README, but it doesn't produce video frames either. Shall I create frames from the raw videos? Are there any missing steps in the README? How could I decompose a video using ffmpeg or python, given that there are no videos in the datasets? I also read the code of summary2video.py; should I decompose "result.h5"?
Hi:
I have been able to run your algorithm on my machine (both training and test datasets). Now I would like to apply it to my dataset (my videos - they are not compressed to .h5). How do I do that? What function would I need to modify? Please guide.