Description
Is there an existing issue for this?
- I have searched the existing issues
Bug description
My training dataset consists of 338 images, each containing 1-10 mice, with 6 bodyparts annotated.
(nose, left_ear, right_ear, mid_back, mouse_center and tail1. These are just the names I chose; judging from confusion_matrix.png, they do not all correspond to the SuperAnimal keypoints of the same name.)
I wanted to fine-tune a SuperAnimal pretrained model (sa-tvm, 27 keypoints) on my data (6 keypoints), and I got the following error:
Traceback (most recent call last):
  File "/ssd01/user_acc_data/oppa/deeplabcut/code/satvm_training.py", line 29, in <module>
    main()
  File "/ssd01/user_acc_data/oppa/deeplabcut/code/satvm_training.py", line 26, in main
    train_network(config_path, shuffle, device, net_type)
  File "/ssd01/user_acc_data/oppa/deeplabcut/code/satvm_training.py", line 6, in train_network
    deeplabcut.train_network(
  File "/share/user_data/oppa/.conda/envs/dlc-oppa/lib/python3.10/site-packages/deeplabcut/compat.py", line 245, in train_network
    return train_network(
  File "/share/user_data/oppa/.conda/envs/dlc-oppa/lib/python3.10/site-packages/deeplabcut/pose_estimation_pytorch/apis/train.py", line 326, in train_network
    train(
  File "/share/user_data/oppa/.conda/envs/dlc-oppa/lib/python3.10/site-packages/deeplabcut/pose_estimation_pytorch/apis/train.py", line 189, in train
    runner.fit(
  File "/share/user_data/oppa/.conda/envs/dlc-oppa/lib/python3.10/site-packages/deeplabcut/pose_estimation_pytorch/runners/train.py", line 170, in fit
    train_loss = self._epoch(
  File "/share/user_data/oppa/.conda/envs/dlc-oppa/lib/python3.10/site-packages/deeplabcut/pose_estimation_pytorch/runners/train.py", line 220, in _epoch
    for i, batch in enumerate(loader):
  File "/share/user_data/oppa/.conda/envs/dlc-oppa/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
    data = self._next_data()
  File "/share/user_data/oppa/.conda/envs/dlc-oppa/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1346, in _next_data
    return self._process_data(data)
  File "/share/user_data/oppa/.conda/envs/dlc-oppa/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1372, in _process_data
    data.reraise()
  File "/share/user_data/oppa/.conda/envs/dlc-oppa/lib/python3.10/site-packages/torch/_utils.py", line 705, in reraise
    raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/share/user_data/oppa/.conda/envs/dlc-oppa/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)  # type: ignore[possibly-undefined]
  File "/share/user_data/oppa/.conda/envs/dlc-oppa/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
    return self.collate_fn(data)
  File "/share/user_data/oppa/.conda/envs/dlc-oppa/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py", line 316, in default_collate
    return collate(batch, collate_fn_map=default_collate_fn_map)
  File "/share/user_data/oppa/.conda/envs/dlc-oppa/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py", line 154, in collate
    clone.update({key: collate([d[key] for d in batch], collate_fn_map=collate_fn_map) for key in elem})
  File "/share/user_data/oppa/.conda/envs/dlc-oppa/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py", line 154, in <dictcomp>
    clone.update({key: collate([d[key] for d in batch], collate_fn_map=collate_fn_map) for key in elem})
  File "/share/user_data/oppa/.conda/envs/dlc-oppa/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py", line 141, in collate
    return collate_fn_map[elem_type](batch, collate_fn_map=collate_fn_map)
  File "/share/user_data/oppa/.conda/envs/dlc-oppa/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py", line 222, in collate_numpy_array_fn
    return collate([torch.as_tensor(b) for b in batch], collate_fn_map=collate_fn_map)
  File "/share/user_data/oppa/.conda/envs/dlc-oppa/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py", line 141, in collate
    return collate_fn_map[elem_type](batch, collate_fn_map=collate_fn_map)
  File "/share/user_data/oppa/.conda/envs/dlc-oppa/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py", line 213, in collate_tensor_fn
    return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [3, 1688, 1695] at entry 0 and [3, 1688, 1726] at entry 11
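The last line of the traceback says that two entries in one batch have different widths (1695 vs 1726), so the default collate function cannot stack them. The failure mode can be illustrated outside DeepLabCut with a minimal sketch, assuming NumPy is installed. The helper `pad_and_stack` is hypothetical and only shows what a padding step would have to do; it is not DeepLabCut's or PyTorch's implementation.

```python
import numpy as np


def pad_and_stack(images):
    """Zero-pad (C, H, W) arrays on the bottom/right to a common size, then stack.

    Hypothetical helper for illustration only; not part of DeepLabCut or PyTorch.
    """
    max_h = max(img.shape[1] for img in images)
    max_w = max(img.shape[2] for img in images)
    padded = [
        np.pad(img, ((0, 0), (0, max_h - img.shape[1]), (0, max_w - img.shape[2])))
        for img in images
    ]
    return np.stack(padded, axis=0)


# Two dummy arrays with the shapes reported in the traceback
a = np.zeros((3, 1688, 1695), dtype=np.uint8)
b = np.zeros((3, 1688, 1726), dtype=np.uint8)

try:
    # Stacking arrays of unequal shape fails, analogous to torch.stack in default_collate
    np.stack([a, b], axis=0)
except ValueError as err:
    print("plain stack fails:", err)

batch = pad_and_stack([a, b])
print(batch.shape)  # (2, 3, 1688, 1726)
```

Padding only changes the image canvas; whether keypoint targets would also need adjusting depends on how the loader builds them, which this sketch does not cover.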
Operating System
Linux
DeepLabCut version
3.0.0.rc2
DeepLabCut mode
multi animal
Device type
gpu
Steps To Reproduce
create shuffle step:
from pathlib import Path

import deeplabcut
from deeplabcut.core.engine import Engine
from deeplabcut.core.weight_init import WeightInitialization
from deeplabcut.modelzoo.utils import (
    create_conversion_table,
    read_conversion_table_from_csv,
)
from deeplabcut.utils.pseudo_label import keypoint_matching


def create_training_dataset(config_path, shuffle, super_animal_name, model_name, conversion_table_path, net_type):
    # Step 1: Keypoint matching before creating the training dataset
    keypoint_matching(config_path, super_animal_name, model_name)

    # Step 2: Initialize weights for memory replay
    table = create_conversion_table(
        config=config_path,
        super_animal=super_animal_name,
        project_to_super_animal=read_conversion_table_from_csv(conversion_table_path),
    )
    weight_init = WeightInitialization(
        dataset=super_animal_name,
        conversion_array=table.to_array(),
        with_decoder=True,
        memory_replay=True,
    )

    # Step 3: Create training dataset
    deeplabcut.create_training_dataset(
        config_path,
        Shuffles=[shuffle],
        net_type=net_type,
        weight_init=weight_init,
        engine=Engine.PYTORCH,
        userfeedback=False
    )


def main():
    dlc_proj_root = Path("/ssd01/user_acc_data/oppa/deeplabcut/projects/oppamousetracker-Oppa-2024-08-23")
    super_animal_name = "superanimal_topviewmouse"
    net_type = 'top_down_hrnet_w32'
    model_name = 'hrnetw32'
    shuffle = 1
    config_path = str(dlc_proj_root / "config.yaml")
    conversion_table_path = dlc_proj_root / "memory_replay" / "conversion_table.csv"

    # Step 1: Create training dataset
    create_training_dataset(config_path, shuffle, super_animal_name, model_name, conversion_table_path, net_type)


if __name__ == "__main__":
    main()
training step:
from pathlib import Path

import deeplabcut


def train_network(config_path, shuffle, device, net_type):
    # Train the network with memory replay
    deeplabcut.train_network(
        config_path,
        shuffle=shuffle,
        device=device,
        pose_threshold=0.1,
        net_type=net_type,
        detector_batch_size=32,
        batch_size=64,
        freeze_bn_stats=False
    )


def main():
    dlc_proj_root = Path("/ssd01/user_acc_data/oppa/deeplabcut/projects/oppamousetracker-Oppa-2024-08-23")
    net_type = 'top_down_hrnet_w32'
    device = "cuda"
    shuffle = 1
    config_path = str(dlc_proj_root / "config.yaml")

    # Step 2: Train the network
    train_network(config_path, shuffle, device, net_type)


if __name__ == "__main__":
    main()
Relevant log output
pytorch_config:
Task:
scorer:
date:
multianimalproject:
identity:
project_path:
engine: tensorflow
video_sets:
bodyparts:
start:
stop:
numframes2pick:
skeleton: []
skeleton_color: black
pcutoff:
dotsize:
alphavalue:
colormap:
TrainingFraction:
iteration:
default_net_type:
default_augmenter:
snapshotindex:
detector_snapshotindex:
batch_size:
cropping:
x1:
x2:
y1:
y2:
corner2move2:
move2corner:
SuperAnimalConversionTables:
data:
  colormode: RGB
  inference:
    auto_padding:
      pad_width_divisor: 32
      pad_height_divisor: 32
    normalize_images: true
  train:
    affine:
      p: 0.5
      scaling:
      - 1.0
      - 1.0
      rotation: 30
      translation: 0
    gaussian_noise: 12.75
    normalize_images: true
    auto_padding:
      pad_width_divisor: 32
      pad_height_divisor: 32
detector:
  data:
    colormode: RGB
    inference:
      normalize_images: true
    train:
      hflip: true
      normalize_images: true
  device: auto
  model:
    type: FasterRCNN
    variant: fasterrcnn_resnet50_fpn_v2
    box_score_thresh: 0.6
    pretrained: false
  runner:
    type: DetectorTrainingRunner
    eval_interval: 50
    optimizer:
      type: AdamW
      params:
        lr: 1e-05
    scheduler:
      type: LRListScheduler
      params:
        milestones:
        - 90
        lr_list:
        - - 1e-06
    snapshots:
      max_snapshots: 5
      save_epochs: 50
      save_optimizer_state: false
  train_settings:
    batch_size: 32
    dataloader_workers: 32
    dataloader_pin_memory: true
    display_iters: 500
    epochs: 250
device: auto
metadata:
  project_path: /ssd01/user_acc_data/oppa/deeplabcut/projects/oppamousetracker-Oppa-2024-08-23
  pose_config_path:
    /ssd01/user_acc_data/oppa/deeplabcut/projects/oppamousetracker-Oppa-2024-08-23/dlc-models-pytorch/iteration-0/oppamousetrackerAug23-trainset90shuffle1/train/pose_cfg.yaml
  bodyparts:
  - nose
  - left_ear
  - right_ear
  - left_ear_tip
  - right_ear_tip
  - left_eye
  - right_eye
  - neck
  - mid_back
  - mouse_center
  - mid_backend
  - mid_backend2
  - mid_backend3
  - tail_base
  - tail1
  - tail2
  - tail3
  - tail4
  - tail5
  - left_shoulder
  - left_midside
  - left_hip
  - right_shoulder
  - right_midside
  - right_hip
  - tail_end
  - head_midpoint
  unique_bodyparts: []
  individuals:
  - individual1
  - individual2
  - individual3
  - individual4
  - individual5
  - individual6
  - individual7
  - individual8
  - individual9
  - individual10
  with_identity: false
method: td
model:
  backbone:
    type: HRNet
    model_name: hrnet_w32
    pretrained: false
    freeze_bn_stats: false
    freeze_bn_weights: false
    interpolate_branches: false
    increased_channel_count: false
  backbone_output_channels: 32
  heads:
    bodypart:
      type: HeatmapHead
      weight_init: normal
      predictor:
        type: HeatmapPredictor
        apply_sigmoid: false
        clip_scores: true
        location_refinement: false
        locref_std: 7.2801
      target_generator:
        type: HeatmapGaussianGenerator
        num_heatmaps: 27
        pos_dist_thresh: 17
        heatmap_mode: KEYPOINT
        generate_locref: false
        locref_std: 7.2801
      criterion:
        heatmap:
          type: WeightedMSECriterion
          weight: 1.0
      heatmap_config:
        channels:
        - 32
        - 27
        kernel_size:
        - 1
        strides:
        - 1
net_type: top_down_hrnet_w32
runner:
  type: PoseTrainingRunner
  key_metric: test.mAP
  key_metric_asc: true
  eval_interval: 10
  optimizer:
    type: AdamW
    params:
      lr: 1e-05
  scheduler:
    type: LRListScheduler
    params:
      lr_list:
      - - 1e-06
      - - 1e-07
      milestones:
      - 160
      - 190
  snapshots:
    max_snapshots: 5
    save_epochs: 25
    save_optimizer_state: false
train_settings:
  batch_size: 64
  dataloader_workers: 64
  dataloader_pin_memory: true
  display_iters: 500
  epochs: 200
  pretrained_weights:
  seed: 42
  weight_init:
    dataset: superanimal_topviewmouse
    with_decoder: true
    memory_replay: true
    conversion_array:
    - 0
    - 1
    - 2
    - 7
    - 9
    - 13
freeze_bn_stats: false
Anything else?
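To make the keypoint mismatch mentioned in the bug description concrete, the conversion_array under weight_init can be decoded against the 27 bodyparts listed under metadata. This is a small sketch under one assumption that I have not verified against the DeepLabCut source: that conversion_array[i] is the SuperAnimal index matched to the i-th project bodypart, in the order I listed them. The function name `decode_conversion` is hypothetical.

```python
# The 27 superanimal_topviewmouse bodyparts, copied from metadata.bodyparts in the config
SUPERANIMAL_BODYPARTS = [
    "nose", "left_ear", "right_ear", "left_ear_tip", "right_ear_tip",
    "left_eye", "right_eye", "neck", "mid_back", "mouse_center",
    "mid_backend", "mid_backend2", "mid_backend3", "tail_base",
    "tail1", "tail2", "tail3", "tail4", "tail5",
    "left_shoulder", "left_midside", "left_hip",
    "right_shoulder", "right_midside", "right_hip",
    "tail_end", "head_midpoint",
]

# The 6 bodyparts annotated in this project (from the bug description)
PROJECT_BODYPARTS = ["nose", "left_ear", "right_ear", "mid_back", "mouse_center", "tail1"]

# weight_init.conversion_array from the config
CONVERSION_ARRAY = [0, 1, 2, 7, 9, 13]


def decode_conversion(project_parts, conversion, superanimal_parts):
    """Map each project bodypart to the SuperAnimal bodypart it was matched to.

    Assumption: conversion[i] is the SuperAnimal index for project_parts[i].
    """
    if len(project_parts) != len(conversion):
        raise ValueError("one conversion index is needed per project bodypart")
    return {part: superanimal_parts[idx] for part, idx in zip(project_parts, conversion)}


mapping = decode_conversion(PROJECT_BODYPARTS, CONVERSION_ARRAY, SUPERANIMAL_BODYPARTS)
for project_name, superanimal_name in mapping.items():
    flag = "" if project_name == superanimal_name else "  <- names differ"
    print(f"{project_name} -> {superanimal_name}{flag}")
```

Under that assumption, mid_back is matched to neck and tail1 to tail_base, while the other four names coincide.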
Code of Conduct
- I agree to follow this project's Code of Conduct