
How to train Custom Dataset #220

Closed

L1aoXingyu opened this issue Aug 7, 2020 · 24 comments

Comments

@L1aoXingyu
Member

L1aoXingyu commented Aug 7, 2020

This guide explains how to train your own custom dataset with fastreid's data loaders.

Before You Start

Follow Getting Started to set up the environment and install the requirements.txt dependencies.

Train on Custom Dataset

  1. Register your dataset (i.e., tell fastreid how to obtain your dataset).

    To let fastreid know how to obtain a dataset named "MyOwnDataset", implement a class that inherits from fastreid.data.datasets.bases.ImageDataset:

    from fastreid.data.datasets import DATASET_REGISTRY
    from fastreid.data.datasets.bases import ImageDataset


    @DATASET_REGISTRY.register()
    class MyOwnDataset(ImageDataset):
        def __init__(self, root='datasets', **kwargs):
            # build train, query and gallery lists of (img_path, pid, camid) tuples here
            ...
            super().__init__(train, query, gallery)

    Here, the snippet associates a dataset named "MyOwnDataset" with a class that builds the train, query and gallery sets and passes them to the base class. The @DATASET_REGISTRY.register() decorator registers the class under that name.

    The class can do arbitrary things, but it must produce a train list of (str, str, str) tuples and query and gallery lists of (str, int, int) tuples, as below (a complete example is sketched after step 2):

    train_list = [
        (train_path1, pid1, camid1), (train_path2, pid2, camid2), ...]

    query_list = [
        (query_path1, pid1, camid1), (query_path2, pid2, camid2), ...]

    gallery_list = [
        (gallery_path1, pid1, camid1), (gallery_path2, pid2, camid2), ...]

    You can also pass an empty train list to build a test-only dataset with super().__init__([], query, gallery).

    Notice: the query and gallery sets may share camera views, but for each query identity, gallery samples taken by the same camera are excluded during evaluation. So if your dataset has no camera annotations, set the camera id of every query image to 0 and of every gallery image to 1, and you will still get testing results.

  2. Import your dataset.

    After registering your own dataset, you need to import it in train_net.py so that the registration takes effect.

    from dataset_file import MyOwnDataset
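
For reference, here is a minimal end-to-end sketch. It assumes a hypothetical layout datasets/my_dataset/{train,query,gallery}/<pid>/*.jpg with numeric identity folder names and no camera annotations (so query images get camid 0 and gallery images camid 1, as in the notice above); the folder name and the _scan helper are only illustrative, so adapt the path parsing to your own data.

    import glob
    import os

    from fastreid.data.datasets import DATASET_REGISTRY
    from fastreid.data.datasets.bases import ImageDataset


    @DATASET_REGISTRY.register()
    class MyOwnDataset(ImageDataset):
        def __init__(self, root='datasets', **kwargs):
            data_dir = os.path.join(root, 'my_dataset')
            # train ids are strings so they stay unique if several datasets are combined
            train = self._scan(os.path.join(data_dir, 'train'), camid='my_dataset_0',
                               pid_prefix='my_dataset_')
            query = self._scan(os.path.join(data_dir, 'query'), camid=0)
            gallery = self._scan(os.path.join(data_dir, 'gallery'), camid=1)
            super().__init__(train, query, gallery)

        def _scan(self, folder, camid, pid_prefix=None):
            # collect (img_path, pid, camid) tuples from <folder>/<pid>/*.jpg
            data = []
            for pid_dir in sorted(os.listdir(folder)):
                pid = pid_prefix + pid_dir if pid_prefix else int(pid_dir)
                for img_path in sorted(glob.glob(os.path.join(folder, pid_dir, '*.jpg'))):
                    data.append((img_path, pid, camid))
            return data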
@AnhPC03

AnhPC03 commented Sep 21, 2020

@L1aoXingyu I don't fully understand your documentation for custom datasets. I've tried your way but got the error below:

Traceback (most recent call last):
  File "tools/train_net.py", line 67, in <module>
    args=(args,),
  File "./fastreid/engine/launch.py", line 71, in launch
    main_func(*args)
  File "tools/train_net.py", line 53, in main
    trainer = Trainer(cfg)
  File "./fastreid/engine/defaults.py", line 204, in __init__
    data_loader = self.build_train_loader(cfg)
  File "./fastreid/engine/defaults.py", line 408, in build_train_loader
    return build_reid_train_loader(cfg)
  File "./fastreid/data/build.py", line 27, in build_reid_train_loader
    dataset = DATASET_REGISTRY.get(d)(root=_root, combineall=cfg.DATASETS.COMBINEALL)
  File "./fastreid/product_dataset.py", line 8, in __init__
    super().__init__(train, query, gallery)
NameError: name 'train' is not defined

And this is my product_dataset.py file in the fastreid folder:

from fastreid.data.datasets import DATASET_REGISTRY
from fastreid.data.datasets.bases import ImageDataset

@DATASET_REGISTRY.register()
class ProductDataset(ImageDataset):
    def __init__(self, root='datasets', **kwargs):
        ...
        super().__init__(train, query, gallery)

Even after removing the ... in the file above, I got the same error.
The ProductDataset folder is inside the datasets folder and has the following structure:

.
├── gallery
│   ├── data_38
│   ├── data_43
│   ├── data_68
│   ├── data_gro
│   └── data_grocery
├── query
│   ├── data_38
│   ├── data_43
│   ├── data_68
│   ├── data_gro
│   └── data_grocery
└── train
    ├── data_38
    ├── data_43
    ├── data_68
    ├── data_gro
    └── data_grocery

And each child folder has the structure below (e.g. train/data_38/):

.
├── 1
├── 10
├── 11
├── 12
├── 13
├── 14
├── 15
├── 16
├── 17
├── 18
├── 19
├── 2
├── 20
├── 21
├── 22
├── 23
├── 24
├── 25
├── 26
├── 27
├── 28
├── 29
├── 3
├── 30
├── 31
├── 32
├── 33
├── 34
├── 35
├── 36
├── 37
├── 38
├── 4
├── 5
├── 6
├── 7
├── 8
└── 9

Each numbered folder above contains some images.

@AnhPC03

AnhPC03 commented Sep 22, 2020

(Quoting my previous comment above.)

I've solved the problem with my dataset. The key is that the train, query and gallery arguments passed via super().__init__(train, query, gallery) must each be a list of tuples, where each tuple has the structure (path/to/image, pid, camid).
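
In other words, the ... in the guide stands for code that actually builds those lists before the base class is called; calling super().__init__(train, query, gallery) without defining them is exactly what raises the NameError above. A minimal sketch (the paths and id values here are made up, not from my real dataset):

from fastreid.data.datasets import DATASET_REGISTRY
from fastreid.data.datasets.bases import ImageDataset


@DATASET_REGISTRY.register()
class ProductDataset(ImageDataset):
    def __init__(self, root='datasets', **kwargs):
        # build the lists (normally by scanning root) before passing them on
        train = [('datasets/ProductDataset/train/data_38/1/img_001.jpg', 'product_1', 'product_cam_0')]
        query = [('datasets/ProductDataset/query/data_38/1/img_002.jpg', 1, 0)]
        gallery = [('datasets/ProductDataset/gallery/data_38/1/img_003.jpg', 1, 1)]
        super().__init__(train, query, gallery)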

@L1aoXingyu
Member Author

@AnhPC03 Yes, you are right! It doesn't matter what your data structure is. The key idea is to prepare the train, query and gallery lists as required and then pass them via super().__init__(train, query, gallery).

@addisonklinke

@L1aoXingyu What is the purpose of formatting the train pid and camid values as strings instead of integers? It seems like the latter would be more consistent with the formatting of the query and gallery sets.

@L1aoXingyu
Member Author

@addisonklinke When combining two or more datasets for training, integer ids would be ambiguous because pid 0 refers to different identities in different datasets.
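
A small sketch of why that matters (the prefix scheme below is just an illustration of the idea, not fastreid's exact internal naming):

# with integer pids, id 0 from two datasets would collapse into one identity
market_train = [('m_0001.jpg', 0, 0), ('m_0002.jpg', 1, 0)]
duke_train = [('d_0001.jpg', 0, 0)]

# prefixing the pid (and camid) with the dataset name keeps them distinct after combining
market_train = [(p, f'market1501_{pid}', f'market1501_{camid}') for p, pid, camid in market_train]
duke_train = [(p, f'dukemtmc_{pid}', f'dukemtmc_{camid}') for p, pid, camid in duke_train]
combined_train = market_train + duke_train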

@addisonklinke

@L1aoXingyu I see, that makes sense. Thank you for the clarification.

Another question I had is whether there are guidelines for splitting a dataset into train, query, and gallery subsets. Obviously, we want the identity IDs in train to be mutually exclusive with those in query and gallery in order to have an unbiased evaluation. However, when constructing query and gallery I am wondering...

  1. Is there a typical percent of the dataset used for these? i.e. 75% train / 25% query + gallery
  2. Do dataset creators usually require a minimum number of instances per identity in the query + gallery sets? Having too few instances seems like it could make the queries substantially harder
  3. Once you have a set of identity IDs that are mutually exclusive with train, is there a rule of thumb for determining how many should belong in the query vs. gallery?

@younghuvee

Hello, I am training the model on my own dataset, but training gets stuck during data loading, specifically here. Why is this?

    def _try_put_index(self):
        assert self._tasks_outstanding < 2 * self._num_workers
        try:
            index = self._next_index()
        except StopIteration:
            return
        for _ in range(self._num_workers):  # find the next active worker, if any
            worker_queue_idx = next(self._worker_queue_idx_cycle)
            if self._workers_status[worker_queue_idx]:
                break
        else:
            # not found (i.e., didn't break)
            return

@vicwer

vicwer commented Oct 9, 2020

Hi, can I rename my own dataset's images to the Market-1501 format, put them into the market dataset folder, and train directly with the Market-1501 config?

@L1aoXingyu
Member Author

@vicwer That works too, but I still recommend the custom dataset configuration described above.
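
If you do go the renaming route, Market-1501 filenames encode the identity and camera, e.g. 0001_c1s1_000151_00.jpg means pid 0001 seen by camera 1. A rough renaming sketch, assuming a hypothetical source layout of <pid>/<image>.jpg and the standard Market-1501 folder names (the sequence and frame fields here are dummy values):

import glob
import os
import shutil

src_root = 'my_dataset/train'  # hypothetical source layout: <pid>/<image>.jpg
dst_root = 'datasets/Market-1501-v15.09.15/bounding_box_train'
os.makedirs(dst_root, exist_ok=True)

for pid_dir in sorted(os.listdir(src_root)):
    pid = int(pid_dir)
    for i, img in enumerate(sorted(glob.glob(os.path.join(src_root, pid_dir, '*.jpg')))):
        # Market-1501 pattern: {pid:04d}_c{cam}s{seq}_{frame:06d}_{index:02d}.jpg
        new_name = f'{pid:04d}_c1s1_{i:06d}_00.jpg'
        shutil.copy(img, os.path.join(dst_root, new_name))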

@AnhPC03

AnhPC03 commented Nov 2, 2020

@L1aoXingyu I want to train your fast-reid repo for classification. My dataset has the following structure:

├── train
│   ├── beverage_bottle
│   ├── box
│   ├── candy_bag
│   ├── candy_jar
│   ├── cylinder
│   ├── instant_food_cup
│   ├── juice_box
│   └── tiny_candy
└── val
    ├── beverage_bottle
    ├── box
    ├── candy_bag
    ├── candy_jar
    ├── cylinder
    ├── instant_food_cup
    ├── juice_box
    └── tiny_candy

Each child folder contains some images.

I wrote the dataloader as below:

import os
from fastreid.data.datasets import DATASET_REGISTRY
from fastreid.data.datasets.bases import ImageDataset

@DATASET_REGISTRY.register()
class SuperClassDataset(ImageDataset):
    def __init__(self, root='datasets', **kwargs):
        train_path = root + '/super_class_dataset/train'
        val_path = root + '/super_class_dataset/val'
        gallery_path = root + '/super_class_dataset/train'

        self.convert_labels = {
            'beverage_bottle': 1,
            'box': 2,
            'candy_bag': 3,
            'candy_jar': 4,
            'cylinder': 5,
            'instant_food_cup': 6,
            'juice_box': 7,
            'tiny_candy': 8,
        }
        
        train_data = self.get_data(train_path, 1)
        val_data = self.get_data(val_path, 2)
        gallery_data = self.get_data(gallery_path, 3)

        super().__init__(train_data, val_data, gallery_data)
    def get_data(self, path, cam_id):
        data = []
        absolute_path = os.path.join(path)
        sub_1_dirs = os.listdir(absolute_path)
        for sub_1_dir in sub_1_dirs:
            sub_1_path = os.path.join(absolute_path, sub_1_dir)
            if sub_1_dir == '.DS_Store':
                continue
            filenames = os.listdir(sub_1_path)
            for filename in filenames:
                if filename == '.DS_Store':
                    continue
                filepath = os.path.join(sub_1_path, filename)
                data.append((filepath, self.convert_labels[sub_1_dir], cam_id))
        return data

I used the train dataset in the role of the query dataset, and val in the role of the test dataset. But when I was training, I got the error:

RuntimeError: The size of tensor a (3) must match the size of tensor b (4) at non-singleton dimension 0

But when I printed them, tensor a equaled tensor b in shape every time.
Could you give me a suggestion on the dataloader for classification?

@L1aoXingyu L1aoXingyu unpinned this issue Jan 26, 2021
@Mulbetty

Mulbetty commented Mar 8, 2021

After completing the configuration following your steps above, do the name in the config.yml file and the "datasetname" parameter of the class that inherits MyOwnDataset need to match?

@L1aoXingyu
Member Author

After completing the configuration following your steps above, do the name in the config.yml file and the "datasetname" parameter of the class that inherits MyOwnDataset need to match?

The dataset name in the config file needs to match the name of the dataset you defined. For the example above, you would write "SuperClassDataset" in the config.
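
For example, the relevant config entries would look roughly like this (a sketch based on fastreid's standard yaml layout; double-check the keys against your base config):

DATASETS:
  NAMES: ("SuperClassDataset",)
  TESTS: ("SuperClassDataset",)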

@rafaelbate

rafaelbate commented Jul 1, 2021

(Quoting the custom dataset guide from the first comment.)

Hello @L1aoXingyu. First of all, thank you for your amazing work! If I understand correctly, I can train FastReID to re-identify any custom object I want, right? In my case, I need to re-identify a certain fruit, so I just need a dataset containing images of that fruit, right?

Thank you for your contribution!

@L1aoXingyu
Member Author

(Quoting @rafaelbate's comment above.)

Yes, if you want to train a model to identify different fruits, you can collect a dataset with different kinds of fruits and train on it.

@rrjia

rrjia commented Sep 2, 2021

...when I printed, tensor a equaled tensor b in shape every time

I ran into the same bug.

@shreejalt

(Quoting @AnhPC03's classification dataset comment above.)

@AnhPC03 Did you solve this issue? It would be helpful if you could guide me through the error.

@github-actions

github-actions bot commented Oct 8, 2021

This issue is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale label Oct 8, 2021
@akashAD98

akashAD98 commented Oct 14, 2021

@AnhPC03 Can you please tell us how you solved this issue?
Is there a typical percentage of the dataset used for these, i.e. 75% train / 25% query + gallery? Should we add the same images to train, query and gallery?

@github-actions github-actions bot removed the stale label Oct 15, 2021
@AnhPC03

AnhPC03 commented Oct 20, 2021

@shreejalt @akashAD98 You got this error: RuntimeError: The size of tensor a (3) must match the size of tensor b (4) at non-singleton dimension 0, right?
If so, check whether the images in your dataset have an alpha channel. If they do, remove the alpha channel and keep only the B, G, R channels.
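
One quick way to do that, as a sketch with OpenCV (the glob pattern and paths are placeholders; Pillow's Image.convert('RGB') achieves the same thing):

import glob
import cv2

for path in glob.glob('datasets/my_dataset/**/*.png', recursive=True):
    img = cv2.imread(path, cv2.IMREAD_UNCHANGED)
    # 4-channel images (BGRA) are what trigger the size-3-vs-4 mismatch
    if img is not None and img.ndim == 3 and img.shape[2] == 4:
        img = cv2.cvtColor(img, cv2.COLOR_BGRA2BGR)  # drop alpha, keep B, G, R
        cv2.imwrite(path, img)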

@github-actions

This issue is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale label Nov 20, 2021
@github-actions

github-actions bot commented Dec 4, 2021

This issue was closed because it has been inactive for 14 days since being marked as stale.

@github-actions github-actions bot closed this as completed Dec 4, 2021
@Cippppy

Cippppy commented Jan 12, 2024

(Quoting @AnhPC03's classification dataset comment above.)

Hey @AnhPC03, I know it's been a few years, but I am trying to build a classifier like you did and I can't get it working. Did you ever run into an issue where the trainer stalls forever without erroring out?

@bai-0829

(Quoting @AnhPC03's classification dataset comment and @Cippppy's follow-up above.)

Hello, did you solve this problem? I'm in the same situation now: training doesn't report an error, but it hangs and never runs.

@Thomas-CHOCHOY

(Quoting @AnhPC03's ProductDataset comment above.)

Hi, how did you name the images?
