How do I apply this to my video files? #1
Comments
Hi @rajeevchhabra,
Hi, is it possible to share the code you used to create the h5 dataset, so that I can follow it to create my own?
Hi @zijunwei,
For a detailed description of the data format, please refer to the readme.txt. Instructions for h5py can be found at http://docs.h5py.org/en/latest/quick.html. Let me know if you have any problems.
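For anyone unsure what the data looks like, here is a minimal h5py sketch for inspecting a dataset file (the file name 'dataset.h5' is a placeholder):

import h5py

with h5py.File('dataset.h5', 'r') as f:
    # print every dataset in the file along with its shape and dtype
    def show(name, obj):
        if isinstance(obj, h5py.Dataset):
            print(name, obj.shape, obj.dtype)
    f.visititems(show)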
Thanks!
How is the gtscore computed, and how is it different from gtsummary or the average of user_summary?
Hope this clarifies.
Thanks! It's very helpful!
@KaiyangZhou I'm trying to create an .h5 file for my own video, but what exactly are the features? I understand it'll be a numpy matrix of shape (n_steps, feature_dimension), but what are these features and how do I extract them from the video frames? Could you please give me more description about them? I've glanced through your paper, but I couldn't find anything about this.
Hi @chandra-siri,
You can use off-the-shelf feature extractors to achieve this, e.g. with pytorch. First, load the feature extractor, e.g. a pretrained neural network. Second, loop over the video frames and apply the feature extractor to each frame, so that each frame is represented by a long feature vector; if you use GoogLeNet, you will end up with a 1024-dimensional feature vector. Third, concatenate the extracted features to form a feature matrix, and save it to the h5 file as specified in the readme.txt. The pseudo code below might make this clearer:

features = []
for frame in video_frames:
    # frame is a numpy array of shape (channel, height, width)
    # do some preprocessing such as normalization
    frame = preprocess(frame)
    # apply the feature extractor to this frame
    feature = feature_extractor(frame)
    # save the feature
    features.append(feature)
features = concatenate(features)  # now shape is (n_steps, feature_dimension)

Hope this helps.
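For concreteness, below is a minimal runnable sketch of the steps above, assuming torchvision's pretrained GoogLeNet as the feature extractor; the helper name extract_features and the preprocessing values are illustrative, not from this repo:

import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as transforms

# load a pretrained GoogLeNet and drop its classifier so the output is
# the 1024-dim pooled feature of each frame
# (newer torchvision versions use weights='DEFAULT' instead of pretrained=True)
net = models.googlenet(pretrained=True)
net.fc = torch.nn.Identity()
net.eval()

preprocess = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),  # standard ImageNet statistics
])

def extract_features(video_frames):
    # video_frames: iterable of HxWx3 uint8 numpy arrays
    features = []
    with torch.no_grad():
        for frame in video_frames:
            x = preprocess(frame).unsqueeze(0)   # shape (1, 3, 224, 224)
            feat = net(x).squeeze(0).numpy()     # shape (1024,)
            features.append(feat)
    return np.stack(features)                    # shape (n_steps, 1024)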
@KaiyangZhou This is very informative and helpful. I'll try out what you've mentioned, using GoogLeNet (Inception v3 model), and let you know. Thanks a lot!
@KaiyangZhou As you suggested, I was able to extract the frames. But in order to get the summary I also need change_points.
@chandra-siri Specifically, change_points looks something like [[0, 10], [11, 20], [21, 30]]. This means the video is segmented into three parts: the first part ranges from frame 0 to frame 10, the second part ranges from frame 11 to frame 20, and so forth.
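As an illustration only, here is a sketch of writing such fields with h5py; the numbers are made up and the exact set of keys and layout is specified in the readme.txt:

import h5py
import numpy as np

features = np.random.rand(300, 1024).astype(np.float32)   # (n_steps, feature_dimension)
change_points = np.array([[0, 10], [11, 20], [21, 30]])    # segment boundaries, e.g. from KTS
n_frame_per_seg = change_points[:, 1] - change_points[:, 0] + 1

with h5py.File('my_dataset.h5', 'w') as f:
    video = f.create_group('video_1')
    video['features'] = features
    video['change_points'] = change_points
    video['n_frame_per_seg'] = n_frame_per_seg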
How do I know which key in the dataset corresponds to which video in the SumMe dataset?
@samrat1997
@KaiyangZhou ... Thank you. I just realized that.
@KaiyangZhou Hi. I've been trying to use the code to test on my own dataset. I used the Google Inception v3 pretrained pytorch model to generate features, and it has a 1000-class output, so my feature matrix has shape (num_frames, 1000). However, the dataset used here has 1024-dimensional features. Can you help with this? Will I have to modify and retrain the Inception model?
@harora The feature dimension does not matter; you can just feed (num_frames, any_num_dim) to the algorithm, so you don't need to retrain the model. That said, it is strange to use the class logits as feature vectors; it would make more sense to use the layer before the softmax, e.g. 1024-dim for GoogLeNet or 2048-dim for ResNet.
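As a sketch of what "the layer before the softmax" means in practice, using a torchvision ResNet-50 here (names are illustrative, not from this repo):

import torch
import torchvision.models as models

# (newer torchvision versions use weights='DEFAULT' instead of pretrained=True)
resnet = models.resnet50(pretrained=True)
# drop the final fully-connected classifier so the output is the
# 2048-dim globally pooled feature instead of the 1000 class logits
backbone = torch.nn.Sequential(*list(resnet.children())[:-1])
backbone.eval()

with torch.no_grad():
    x = torch.randn(1, 3, 224, 224)   # a preprocessed frame
    feat = backbone(x).flatten(1)     # shape (1, 2048)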
@KaiyangZhou Hi, is gtscore generated by the user manually? If not, could you show me the associated code?
@bersalimahmoud
@KaiyangZhou Regarding "Visualize summary": you can use summary2video.py, but where or how can I get the frames? Can I get frames from the .h5 files, or shall I create frames from the raw videos? Thank you very much!
@liuhaixiachina You need to decompose a video before doing other things, e.g. feature extraction. You can use ffmpeg or python to do it.
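For example, a minimal sketch that calls ffmpeg from python to dump one jpg per frame (file names are placeholders; ffmpeg must be installed and the frames/ directory must already exist):

import subprocess

# writes frames/000001.jpg, frames/000002.jpg, ...
subprocess.run(['ffmpeg', '-i', 'video.mp4', 'frames/%06d.jpg'], check=True)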
@KaiyangZhou Hi, I am trying to use the code to test on my own video. I used a pretrained model to generate features, and it has a 4096-dimensional output. I saw you said "the feature dimension does not matter" above, but I got "RuntimeError: input.size(-1) must be equal to input_size. Expected 1024, got 4096". Could you please tell me how to solve this issue? Thanks a lot!
@babyjie57 You need to change the argument --input-dim to match your feature dimension.
@KaiyangZhou Thanks for your reply. I also added '--input-dim 4096', but I got 'While copying the parameter named "rnn.weight_ih_l0_reverse", whose dimensions in the model are torch.Size([1024, 4096]) and whose dimensions in the checkpoint are torch.Size([1024, 1024]).' Can you please tell me how to solve this issue? Thanks!
I presume you are loading a model which was trained with features of 1024 dimensions but initialized with feature dimension = 4096.
Can you also publish the script for the KTS you used to generate the change points?
@mctigger You can find the code here: http://lear.inrialpes.fr/people/potapov/med_summaries.php
I have a question: if I want to use my own dataset, but there are no labels in it, what should I do with user_summary, gtscore and gtsummary when I construct the hdf5 file? Also, I see these three labels are only used in the evaluation process; does this mean I can just delete them both in the hdf5 file and in the evaluation function (I mean in the pytorch implementation)? Moreover, if I want to use the result.json to generate a summary video from a raw video, can I delete these three labels?
Have you solved this problem? I want to use my own video data, but I don't know how to deal with user_summary, gtscore and gtsummary.
How did you convert the video into a signal for Kernel Temporal Segmentation (KTS)?
@KaiyangZhou
You can decompose a video using either ffmpeg or opencv. For the latter, there is example code on the opencv website. You can write something like:

import cv2

cap = cv2.VideoCapture('video.mp4')  # path to the raw video
video_features = []
while True:
    # capture frame-by-frame
    ret, frame = cap.read()
    if not ret:  # no frames left
        break
    # maybe skip this frame for downsampling
    # feature extraction; feature_extractor is a placeholder (e.g. a pretrained CNN),
    # and extraction can also be done on minibatches to leverage the gpu
    feature = feature_extractor(frame)
    # store the feature
    video_features.append(feature)
cap.release()
summary = video_summarizer(video_features)  # placeholder for the summarization model
Yes. You can use CNN features, which capture high-level semantics. Downsampling is a common technique since neighbouring frames are redundant; 2 fps or 1 fps is good.
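A hypothetical helper (names are illustrative) for choosing which frame indices to keep when downsampling to a target rate:

import cv2

def sampled_frame_indices(video_path, target_fps=2):
    # keep roughly target_fps frames per second of video
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    cap.release()
    stride = max(int(round(fps / target_fps)), 1)
    return list(range(0, n_frames, stride))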
Annotations are not required for training. Only frame features are required by the algorithm. You can qualitatively evaluate the results by applying the trained model to unseen videos and watching the summaries.
@KaiyangZhou
Could you tell me where to download the original videos (SumMe and TVSum)?
Same question as above. Could you please tell me where to download the original videos (SumMe and TVSum)?
@KaiyangZhou
@KaiyangZhou, how do I use KTS to generate change points? I used the official KTS code and employed CNN features for each frame, but I get the same number of segments for every video. Is there any problem?
@KaiyangZhou To get the change points, should the frames of a video be the input X in "demo.py", or should the features of each frame be the input?
@wjb123 Hi, did you solve this problem? → "How do I use KTS to generate change points? I used the official KTS code and employed CNN features for each frame, but I get the same number of segments for every video. Is there any problem?"
@chenchch94
Hi,
Hi, @neda60
@KaiyangZhou Can you please share the code for creating the .h5 file? How should I deal with gtscore, gtsummary and user_summary?
Hello, I want to use your RL code to extract key frames. I now use a complex network to extract features and store them in an .h5 file, but I don't have the other attributes such as gtscore and gtsummary (because I guess the dataset needs at least these three attributes). For now I create gtscore as an all-ones numpy array, but I don't know whether this is right or wrong. If it is wrong, how can I compute gtscore? Meanwhile, I create gtsummary by randomly sampling some frames; should I sample uniformly?
Regarding "Visualize summary": where or how can I get the frames? I followed the steps mentioned in the README, but it doesn't produce video frames either. Shall I create frames from the raw videos? Are there any missing steps in the README? How could I decompose a video using ffmpeg or python, given that there are no videos in the datasets? I also read the code of summary2video.py; should I decompose "result.h5"?
Hi:
I have been able to run your algorithm on my machine (both training and test datasets). Now I would like to apply it to my dataset (my videos - they are not compressed to .h5). How do I do that? What function would I need to modify? Please guide.