support data_parallel training and ucf101 dataset #4819

chajchaj · 2020-08-27T06:22:16Z

support data_parallel training and ucf101 dataset

huangjun12 · 2020-08-28T10:38:23Z

dygraph/tsm/train.py

+        '--model_path_pre',
+        type=str,
+        default='tsm',
+        help='default model path pre is tsm.')


What does `model_path_pre` mean?

huangjun12 · 2020-08-28T10:40:35Z

dygraph/tsm/train.py

+    #load resnet50 pretrain
+    pre_state_dict = fluid.load_program_state(args.resnet50_dir)
+    for key in pre_state_dict.keys():
+        print('pre_state_dict.key: {}'.format(key))


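Is this print debugging code? Printing every parameter name produces very long output; I suggest commenting it out or deleting it.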
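One way to act on this comment, sketched under assumptions: log a single summary line by default and list individual keys only at debug level. The helper `summarize_state_dict` and the logger name are hypothetical, and `pre_state_dict` is modeled here as a plain dict rather than the real result of `fluid.load_program_state`.

```python
import logging

logger = logging.getLogger("tsm.train")  # hypothetical logger name


def summarize_state_dict(pre_state_dict):
    """Log one summary line; list individual keys only at DEBUG level."""
    keys = sorted(pre_state_dict.keys())
    logger.info("loaded %d pretrained parameters", len(keys))
    for key in keys:
        # Hidden unless the logger is configured for DEBUG output.
        logger.debug("pre_state_dict.key: %s", key)
    return len(keys)
```

Deleting the loop outright, as the comment suggests, is simpler; this variant only helps if the key list is still occasionally useful.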

huangjun12 · 2020-08-28T10:53:47Z

dygraph/tsm/train.py

+                       current_step_lr))
+
+            # 6.2 save checkpoint 
+            save_parameters = (not use_data_parallel) or (


Is the `or` logic needed? A single-card run can also save when `local_rank == 0`. For reference:
https://github.com/huangjun12/models/blob/9e2809a85c64115df92564d31055066300661141/dygraph/slowfast/train.py#L441
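A minimal sketch of the suggested simplification, assuming a single-card run reports a local rank of 0, as the comment states. `should_save_checkpoint` is a hypothetical helper, and `local_rank` is passed in as a plain integer rather than read from Paddle.

```python
def should_save_checkpoint(local_rank):
    """Return True only for the process that should write checkpoints.

    Assumption: a single-card run has local_rank == 0, so no separate
    `use_data_parallel` branch is needed.
    """
    return local_rank == 0


# Rank 0 (which covers the single-card case) saves; other workers skip.
for rank in range(4):
    print(rank, should_save_checkpoint(rank))
```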

huangjun12 · 2020-08-28T10:54:42Z

dygraph/tsm/train.py

            for batch_id, data in enumerate(train_reader()):
+                t1 = time.time()


Could t1 through t5 be renamed? For example, `batch_start_time`.
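A rough sketch of what descriptive timer names could look like in a training loop. Apart from `batch_start_time`, which the comment proposes, the names here (`run_timed_batches`, `reader_cost`, `batch_cost`, `train_step`) are hypothetical and not taken from the PR.

```python
import time


def run_timed_batches(batches, train_step):
    """Yield (batch_id, reader_cost, batch_cost) using descriptive timer names."""
    batch_start_time = time.time()
    for batch_id, data in enumerate(batches):
        # Time spent waiting for the reader to produce this batch.
        reader_cost = time.time() - batch_start_time
        train_step(data)
        # Reader time plus compute time for this batch.
        batch_cost = time.time() - batch_start_time
        yield batch_id, reader_cost, batch_cost
        # Restart the timer once the caller asks for the next batch.
        batch_start_time = time.time()


for batch_id, reader_cost, batch_cost in run_timed_batches([1, 2, 3], lambda d: None):
    print("batch {}: reader_cost={:.5f}s batch_cost={:.5f}s".format(
        batch_id, reader_cost, batch_cost))
```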

huangjun12 · 2020-08-28T10:56:39Z

dygraph/tsm/train.py

            video_model = fluid.dygraph.parallel.DataParallel(video_model,
                                                              strategy)

+        # 4. load checkpoint
+        if args.checkpoint:


When resuming from a checkpoint, should the epoch count be adjusted accordingly?
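One possible way to adjust the epoch count on resume, as a sketch only. It assumes the index of the last finished epoch is known when the checkpoint is loaded; how that value is stored or passed is not specified in the PR, so `epoch_range` and `last_finished_epoch` are hypothetical.

```python
def epoch_range(total_epochs, last_finished_epoch=None):
    """Return the epoch indices that still need to run.

    last_finished_epoch is the zero-based index of the last epoch completed
    before the checkpoint was written, or None for a fresh run.
    """
    start_epoch = 0 if last_finished_epoch is None else last_finished_epoch + 1
    return range(start_epoch, total_epochs)


# Fresh run covers every epoch; resuming after epoch 2 continues at epoch 3.
print(list(epoch_range(5)))
print(list(epoch_range(5, last_finished_epoch=2)))
```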

shippingwang · 2020-08-31T03:18:49Z

dygraph/tsm/train.py

                outputs = video_model(imgs)
+                t3 = time.time()
+
                loss = fluid.layers.cross_entropy(
                    input=outputs, label=labels, ignore_index=-1)
                avg_loss = fluid.layers.mean(loss)


Copy `avg_loss` to a new variable and print that instead of `avg_loss`. Otherwise the printed value is the one returned by the `scale_loss` function, which has already been divided by the number of cards.
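The effect can be illustrated with plain numbers. This is a sketch, not the PR's code: `scale_loss_stub` is a hypothetical stand-in that only mimics the division by the number of cards described in the comment, and `loss_to_log` is an illustrative name.

```python
def scale_loss_stub(loss, num_cards):
    """Hypothetical stand-in: divides the loss by the number of cards."""
    return loss / num_cards


num_cards = 4
avg_loss = 2.0                                   # per-card mean loss before scaling
loss_to_log = avg_loss                           # keep the unscaled value for printing
avg_loss = scale_loss_stub(avg_loss, num_cards)  # scaled value, used for backward

print("loss for logging: {:.4f}".format(loss_to_log))
print("loss after scaling: {:.4f}".format(avg_loss))
```

With real tensors, the value to log would need to be captured before scaling (for example by reading it out as a number), so that the logged loss stays comparable between single-card and multi-card runs.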

shippingwang · 2020-08-31T03:20:29Z

Could you add results for multi-card training?

chajchaj added 3 commits August 27, 2020 03:42

support data_parallel training and ucf101 dataset

0939bf8

delete commented code in dygraph/tsm/model.py

2518c13

set ucf101 root path config in dygraph/tsm/ucf101_reader.py

f6e637d

shippingwang self-requested a review August 28, 2020 07:33

huangjun12 reviewed Aug 28, 2020


shippingwang requested changes Aug 31, 2020


chajchaj added 3 commits September 1, 2020 04:11

add k400 pretrain and add useage in readme.

bf18cb7

add train from scratch in readme

0740cda

add reader_utils.py and add train from scratch in readme

bdd14b6

chajchaj requested a review from shippingwang September 1, 2020 07:39

chajchaj added 3 commits September 1, 2020 08:15

add script and yaml for benchmark

3434f64

fix format in readme

2982f07

update contents in readme

3d335cc

shippingwang approved these changes Sep 1, 2020


shippingwang merged commit e320130 into PaddlePaddle:develop Sep 1, 2020
