
frame-level labeling and feature extraction output #12

Closed · HYPJUDY opened this issue Sep 10, 2017 · 41 comments

Comments


HYPJUDY commented Sep 10, 2017

Hi, when I run ./xfeat.sh I run into the following problems with feat. As you said, the output results should be stored in feat, but:

  1. I ran the demo successfully, yet nothing new was produced in demo/feat.
  2. I ran feature extraction on THUMOS/test successfully, yet there was no output related to feat.

Thanks!


HYPJUDY commented Sep 10, 2017

I found out that the feat directory and its subdirectories (e.g. feat/0000004/ in the demo) must be created before running ./xfeat.sh.
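
For reference, here is a minimal sketch of pre-creating those directories (the frame-folder location demo/frm/ is my assumption; adjust both paths to your layout):

import os

frm_dir = 'demo/frm'    # assumed location of the per-video frame folders
feat_dir = 'demo/feat'  # where ./xfeat.sh expects to write its output

# create feat/<video_id>/ (e.g. feat/0000004/) for every video before running ./xfeat.sh
for video_id in sorted(os.listdir(frm_dir)):
    os.makedirs(os.path.join(feat_dir, video_id), exist_ok=True)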

HYPJUDY closed this as completed Sep 10, 2017
@buaa-luzhi

@HYPJUDY
Have you solved the problem of labeling the training set?


HYPJUDY commented Sep 12, 2017

Hi @buaa-luzhi , I labeled the training set according to #2, but I haven't reproduced the results reported in the paper, so I am not sure whether I missed something.

@buaa-luzhi

Hi @HYPJUDY,
Thank you for your reply!
How many classes are there in the training set?
It includes actions 1-20, but does it also include background and ambiguous? And how do I determine the labels of these two classes?
Which action instances are background or ambiguous?
Thank you very much, looking forward to your reply!


HYPJUDY commented Sep 13, 2017

Hi @buaa-luzhi , you can find the annotations on the THUMOS14 website. For example, TH14_Temporal_annotations_test contains the temporal annotations of action instances in the test videos of the 20 classes.
In this case, there are 22 possible frame-level classes (from first to last: background, actions 1-20, ambiguous), labeled from 0 to 21. You need to assign the frame-level labels during training in gen_test_bin_and_list.py. During prediction, ambiguous frames are removed and the background class is not considered.
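
A minimal sketch of that prediction-time filtering (the array and function names are illustrative, not from the CDC code):

import numpy as np

AMBIGUOUS = 21  # ambiguous frames carry label 21 in the 0-21 scheme

def filter_for_evaluation(frame_labels, frame_scores):
    # frame_labels: ground-truth label per frame (0-21)
    # frame_scores: per-frame scores over the 20 action classes, shape (num_frames, 20)
    keep = frame_labels != AMBIGUOUS  # drop ambiguous frames entirely
    # background frames (label 0) stay as negatives, but background itself
    # is not one of the 20 evaluated classes
    return frame_labels[keep], frame_scores[keep]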

HYPJUDY changed the title from "feature extraction output results are not sotred in feat" to "feature extraction output results are not stored in feat" Sep 13, 2017
@buaa-luzhi

Hi @HYPJUDY,
Thank you very much!
I still do not quite understand.
TH14_Temporal_annotations_test includes Ambiguous_test.txt, BaseballPitch_test.txt, BasketballDunk_test.txt, and so on.
Will the videos in Ambiguous be used to make the training set?
If so, background videos will also be used to train the network, am I right?
Last question: how do I assign frame-level labels during training in gen_test_bin_and_list.py?

In Ambiguous_val.txt, are all video frames assigned label 0?
In BaseballPitch_val.txt, label 1?
...
In VolleyballSpiking_val.txt, label 20?
There is no annotation on the THUMOS14 website for background videos, so how do I assign their frame-level labels?

Thanks for your reply. You are very warm-hearted!


HYPJUDY commented Sep 14, 2017

Hi @buaa-luzhi , let me illustrate class labeling by giving the code for one-hot labeling of the THUMOS14 test data (all 213 test videos, i.e. those that are not entirely background videos):

### gen_oneshot_label_for_test_action_video.py ###
import os
import math
import numpy as np

# background (#0) + THUMOS 20 classes (#1 to #20) + ambiguous class (#21)
classname = ['BaseballPitch', 'BasketballDunk', 'Billiards', 'CleanAndJerk', 'CliffDiving', 'CricketBowling',
             'CricketShot', 'Diving', 'FrisbeeCatch', 'GolfSwing', 'HammerThrow', 'HighJump', 'JavelinThrow',
             'LongJump', 'PoleVault', 'Shotput', 'SoccerPenalty', 'TennisSwing', 'ThrowDiscus',
             'VolleyballSpiking', 'Ambiguous']
classlabel = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]

filepath = os.path.expanduser('~/TH14_Temporal_Annotations_Test/annotations/annotation/')
inputdir = os.path.expanduser('~/THUMOS14_frame/test/')  # one folder per video, holding all its frames
sr = 25  # sampling rate in frames per second

action_video_dict = {}  # video name -> per-frame labels, initialized to background (0)

totalframe = 0
for v in sorted(os.listdir(inputdir)):
    imglist = sorted(os.listdir(os.path.join(inputdir, v)))
    print(v, len(imglist))
    totalframe += len(imglist)
    action_video_dict[v] = np.zeros(len(imglist))
print('total frame number:', totalframe)

'''
The CDC authors used the following method for labeling:
the ground truth used during testing for evaluation is a multi-label over 21 classes,
but during training we simply use one-hot labels and treat frames that belong to
Diving but not to CliffDiving as Diving frames.
During prediction, all frames predicted as CliffDiving will also be set as Diving
to form the multi-label prediction.

In one-hot labeling, we need to assign the Diving class first and then CliffDiving.
In this order, the CliffDiving label overwrites part of the Diving label;
otherwise, all CliffDiving labels would be overwritten by the Diving label.
'''
# for i in range(21):
for i in range(20, -1, -1):
    with open(filepath + classname[i] + '_test.txt', 'r') as f:
        lines = f.readlines()
    for line in lines:
        parts = line.split()  # fields are whitespace-separated: video_name start_sec end_sec
        if not parts:
            continue
        s = int(float(parts[1]) * sr)
        e = int(math.ceil(float(parts[2]) * sr))  # e.g. 27.3 -> 28
        v_len = len(action_video_dict[parts[0]])
        if e > v_len:  # some THUMOS14 annotations are wrong!
            print(parts[0], v_len, s, e, classname[i], parts[1], parts[2])
            continue  # ignore or manually remove the wrong annotations
        for j in range(s, e + 1):  # frame numbering starts from #1
            if j == 0:
                j = 1
            action_video_dict[parts[0]][j - 1] = classlabel[i]  # but index starts from #0

with open('oneshot_label_test_action_video.lst', 'w') as fw:
    for v in sorted(action_video_dict):
        fw.write(v)
        for label in action_video_dict[v]:
            fw.write(" %d" % label)
        fw.write('\n')

This code generates oneshot_label_test_action_video.lst, where each line has the format:
video_name label_of_first_frame ... label_of_last_frame

video_test_0000004 0 0 0 0 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 0 0 0 0 0 0 0 0 0 0 0 0 0 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
video_test_0000006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 
20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
......
video_test_0001558 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

For example, the first line means video_test_0000004 has 845 frames (because the video is ~33 seconds long and FPS = 25). The label of frame #5 through frame #28 is 6 (CricketBowling). This can be found in the first line of CricketBowling_test.txt: video_test_0000004 0.2 1.1. Ambiguous frames are labeled 21. If a frame is neither in actions 1-20 nor ambiguous, it is background (labeled 0).
I think the labeling of the training set is similar. Hope this helps you.
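
As the comment in the script above notes, frames predicted as CliffDiving are also set as Diving at prediction time to form the multi-label prediction. A minimal sketch of that expansion (the function name is illustrative; the label numbers follow the 0-21 scheme above):

CLIFF_DIVING = 5  # CliffDiving is class #5
DIVING = 8        # Diving is class #8

def expand_to_multilabel(frame_label):
    # turn a one-hot frame label into the multi-label set used for evaluation
    labels = {frame_label}
    if frame_label == CLIFF_DIVING:
        labels.add(DIVING)  # every CliffDiving frame is also a Diving frame
    return labels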

HYPJUDY changed the title from "feature extraction output results are not stored in feat" to "frame-level labeling and feature extraction output" Sep 14, 2017
@sunnyxiaohu

@HYPJUDY Hi, have you reproduced the performance in the paper using the model convdeconv-TH14_iter_24390?


HYPJUDY commented Oct 20, 2017

Hi @sunnyxiaohu, yes, I have.

@sunnyxiaohu

@HYPJUDY Hi, how many segments (*.bin) did you generate for test.lst? Officially there should be 36182, but I only get 36177.


HYPJUDY commented Nov 1, 2017

Hi @sunnyxiaohu , I got exactly the same number (36177) as you. It's ok.

@shuangshuangguo

@HYPJUDY I also get 36177 segments. I found my extracted frames are always fewer than the author's, by just one frame actually. For example, for test_video_0000004 the author has 846 frames while I only have 845.

By the way, can you reproduce the THUMOS14 results from the paper? My result is 2% lower. I don't know why; could you give me a suggestion?


HYPJUDY commented Nov 8, 2017

Hi @gss-ucas , that's normal. I got lower temporal action localization mAP but higher per-frame labeling mAP. Here are my results FYI (all on THUMOS'14):

Method                       | Temporal action localization mAP (IoU 0.3 to 0.7) | Per-frame labeling mAP
CDC reported in paper        | 40.10, 29.40, 23.30, 13.10, 7.90                  | 44.40
CDC author-provided model    | 39.39, 29.18, 23.12, 13.00, 7.78                  | 45.26
CDC model trained by myself  | 37.27, 28.04, 22.01, 12.12, 6.58                  | 46.56

@sunnyxiaohu

Hi @HYPJUDY , it seems easy to get "nan" when extracting probabilities and during training. Have you also met this problem? If so, how did you solve it?


HYPJUDY commented Nov 8, 2017

Hi @sunnyxiaohu , that's true. I met "nan" when training with action windows (at least one frame belonging to an action) and too many ambiguous labels (21). I don't know why (maybe a zero division not handled in the code?), but testing at an iteration before the "nan" appears still produces results (with lower mAP, of course).


qioooo commented Dec 14, 2017

Hi @HYPJUDY , I am confused about how to make the training set. Referring to gen_test_bin_and_list.py, I kept the bin files that contain at least 1 action frame, and got 11906 bins. The test results I get:
Per-frame labeling mAP: 0.3738
0.3: 0.355168
0.4: 0.260670
0.5: 0.199775
0.6: 0.106456
0.7: 0.067963

I found the frame counts of the actions are imbalanced. Can you share your code for making the training set?
Thank you!


HYPJUDY commented Dec 15, 2017

Hi @qioooo , you can try generating bins without the "at least 1 action frame" limitation. I got 32746 bins for the THUMOS14 validation set. When I kept only the action windows (at least 1 action frame), I got 11894 bins and the performance was worse. The code is similar to the code I posted on Sep 14 in this issue. I didn't count the frame number of each action.
Thanks.
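
For reference, a minimal sketch of enumerating window start frames with and without the "at least 1 action frame" restriction. The 32-frame window length matches the spacing of the .bin files shown later in this thread; all names are illustrative, not the actual gen_test_bin_and_list.py logic:

WIN = 32  # assumed window length in frames

def window_starts(labels, require_action=False):
    # labels: one frame-level label (0-21) per frame, as in the .lst file above
    for s in range(1, len(labels) - WIN + 2, WIN):  # 1-based, non-overlapping
        window = labels[s - 1:s - 1 + WIN]
        if require_action and not any(1 <= l <= 20 for l in window):
            continue  # skip windows that contain no action frame
        yield s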

@minhtriet

Hi, is there any evidence backing the claim that the author uses one-hot encoding for the training data?

@XiongChengxin

Hi @HYPJUDY , I am trying to reproduce the THUMOS14 results, but when I run step1_gen_test_metadata.m, I find that the names of my training videos are invalid, e.g. v_ApplyEyeMakeup_g01_c01, where 'c01' can't be converted to a number. I realized I may have downloaded the wrong dataset. Could you tell me where you downloaded the dataset? Thanks in advance!


HYPJUDY commented Mar 23, 2018

Hi @XiongChengxin , I just downloaded the dataset from the official website: http://crcv.ucf.edu/THUMOS14/download.html

@XiongChengxin

@HYPJUDY you mean Test Data (1574 untrimmed videos) -- ground truth made available after the competition?


HYPJUDY commented Mar 23, 2018

@XiongChengxin I downloaded all THUMOS14 data from this website. I think the video 'v_ApplyEyeMakeup_g01_c01' is from the THUMOS14 training dataset (UCF101). You can find it there.

@XiongChengxin

@HYPJUDY Okay, thank you very much!!

@XiongChengxin

@HYPJUDY , I am sorry to bother you again... I found that unzipping the test data requires a password. Do you know the password?


HYPJUDY commented Mar 27, 2018

Please follow the instructions on the website:

Password: Please email us your affiliation information in order to receive the password required for unzipping some of the shared data.

@XiongChengxin

@HYPJUDY Thanks!

@XiongChengxin

Hi @HYPJUDY , when I tried to train my own models, I found that I need to provide 2 files for train.prototxt: one is a mean file (I use the one the author provided), and the other is a list file. So I generated a list file using gen_test_bin_and_list.py, and its content is:

/home/xiongcx/C/dataset/video_bin/window/background/video_background_0000001/000001.bin
/home/xiongcx/C/dataset/video_bin/window/background/video_background_0000001/000033.bin
/home/xiongcx/C/dataset/video_bin/window/background/video_background_0000001/000065.bin
/home/xiongcx/C/dataset/video_bin/window/background/video_background_0000001/000129.bin
/home/xiongcx/C/dataset/video_bin/window/background/video_background_0000001/000161.bin
……

You may notice that I didn't provide labels, but strangely the model can still compute the loss. The content of log.train-val is as follows:

I0405 11:39:05.419764 26132 solver.cpp:110] Iteration 0, Testing net
I0405 11:41:31.203992 26132 solver.cpp:143] Test loss: 3.77485
I0405 11:41:31.204056 26132 solver.cpp:150] Test score average: 0.0105368
I0405 11:41:41.623085 26132 solver.cpp:244] Iteration 20, lr = 1e-05
I0405 11:41:41.623474 26132 solver.cpp:91] Iteration 20, loss = 0.463634
I0405 11:41:52.365087 26132 solver.cpp:244] Iteration 40, lr = 1e-05
I0405 11:41:52.365442 26132 solver.cpp:91] Iteration 40, loss = 0.0745746
I0405 11:42:03.073410 26132 solver.cpp:244] Iteration 60, lr = 1e-05
I0405 11:42:03.073766 26132 solver.cpp:91] Iteration 60, loss = 0.113348
I0405 11:42:13.781235 26132 solver.cpp:244] Iteration 80, lr = 1e-05
I0405 11:42:13.781592 26132 solver.cpp:91] Iteration 80, loss = 0.0797165
I0405 11:42:24.492625 26132 solver.cpp:244] Iteration 100, lr = 1e-05
I0405 11:42:24.492977 26132 solver.cpp:91] Iteration 100, loss = 0.0806774
……

I guess the label may have a default value. I wonder what the right format of the lst file is. In other words, should I use the oneshot_label_test_action_video.lst generated by the code you provided to replace mine? If so, how could the model read the videos without the bin list file?

@XiongChengxin

@HYPJUDY , I have a new finding... The labels are stored in the bin files, so the one-hot labels generated by the code you provided should be used to replace 'vlabel' in gen_test_bin_and_list.py. I am going to try this!

@shanshuo

@XiongChengxin Hi, have you trained the CDC model on THUMOS14? I tried to retrain the model, but the training loss curve seems weird, and the test results of my trained model are not right.
[attached image: train_loss curve]
I reduced the batch size from 8 to 4 due to memory limits and doubled the number of iterations. The training log file and the scripts used to generate the labels and bin files are attached. I really hope to find where the bug is; I would appreciate it if you could help me.
THUMOS14_Shuo.tar.gz


fanw52 commented Apr 25, 2018

Hello,
I have generated the bin files, then predicted the results with the convdeconv-TH14_iter_24390 Caffe model, and then used the *.m files to evaluate the results. What confuses me is that the AP values are very low, some even equal to zero. Can you give me some suggestions? @gss-ucas @HYPJUDY


shanshuo commented Apr 25, 2018

Hi @kstys , maybe you can check gen_test_bin_and_list.py. The function os.listdir in Python returns directory entries in arbitrary order, which breaks the sequence of frames during testing and yields wrong ground truth during evaluation. Use sorted(os.listdir()) instead; this works for me. I don't know if this counts as a bug, since the order depends on the system, but it would be nice if the author could update the code to handle it. @zhengshou
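
A minimal illustration of the fix (the path is a placeholder):

import os

frame_dir = 'THUMOS14_frame/test/video_test_0000004'  # placeholder path

# os.listdir makes no ordering guarantee, so frame files may come back shuffled
frames_unordered = os.listdir(frame_dir)

# sorting restores temporal order, provided frame names are zero-padded
# (e.g. 000001.jpg, 000002.jpg, ...)
frames = sorted(os.listdir(frame_dir))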

@zhengshou (Collaborator)

Thank you @shanshuo for pointing this out. I did not encounter such an arbitrary-order issue before, but it is definitely good to make the code work across different systems. I will update the code according to your suggestion. Thanks.


fanw52 commented Apr 26, 2018

@shanshuo Thanks a lot. I have changed os.listdir to sorted(os.listdir), but the result is the same as before. Any other suggestions? Could you send me a link to the code you ran successfully?


fanw52 commented Apr 26, 2018

I have solved the problem: the prefix.lst file has 31682 items, which does not match the dataset I generated. Thanks a lot. @shanshuo


uxtl commented Apr 27, 2018

@shanshuo Hi, I wonder whether you have solved the training problem. I haven't trained the model on THUMOS14, but when I train it on my own dataset the loss does not seem to converge either. Does the batch_size matter? Now I am trying some other learning rates to see if that works. Can you give me some suggestions? Thank you!

@shanshuo

@uxtl No, the problem still exists. I also used the provided model to finetune on my dataset and the loss didn't converge. I don't know whether the cause is the code, the parameters, or the algorithm itself not being suitable. That's why I want to train the CDC model on THUMOS14 first: if you can get a correctly trained model, you can rule out code issues and then start changing parameters. I suggest you train the CDC model first; it's worth doing. It's also a good idea to compare your dataset's loss curve with THUMOS14's. We can discuss it further through email: s.chen3@uva.nl

@XiongChengxin

@shanshuo I am sorry for the late reply... I haven't trained my own model; I just finetuned the provided model on my dataset and used it to make predictions.


Stingsl commented May 29, 2018

@shanshuo I have the same problem as you. I also reduced the batch size from 8 to 4 due to memory limits. Have you solved the problem yet?

@shanshuo

@Stingsl No, I haven't.


Stingsl commented May 30, 2018

@shanshuo By the way, did you change the labels into one-hot format? I think there might be some experimental detail that I did not notice.


Anymake commented Aug 6, 2018

@HYPJUDY Hi, I found my extracted frames are always fewer than the author's, and my total number of test frames does not equal the author's, so I can't run the MATLAB script compute_framelevel_mAP.m to get the per-frame labeling mAP, because the dimension of multi-label-test.mat does not match mine. How did you make the right multi-label-test.mat file? By the way, my mAP@0.5 for temporal localization is only 0.168890. Thank you!
