Can you provide a train.py file to train the detection model on a custom dataset? It's complicated without it #929

Closed
sachinbisht1 opened this issue May 5, 2023 · 11 comments

Comments

@sachinbisht1

Is your feature request related to a problem? Please describe.

A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like

A clear and concise description of what you want to happen.

Describe alternatives you've considered

A clear and concise description of any alternative solutions or features you've considered.

Additional context

Add any other context or screenshots about the feature request here.

@dagshub

dagshub bot commented May 5, 2023

@NatanBagrov
Contributor

Hello, here is a general template. Fill in your custom code:

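from super_gradients.common import MultiGPUMode
from super_gradients.common.object_names import Models
from super_gradients.training import Trainer, models
from super_gradients.training.utils.distributed_training_utils import setup_device

setup_device(multi_gpu=MultiGPUMode.OFF, num_gpus=1)
trainer = Trainer(experiment_name="my_experiment")
net = models.get(Models.YOLO_NAS_S, pretrained_weights="coco")
train_dataloader, val_dataloader = ...  # YOUR DATALOADERS
training_hyperparams = ...  # YOUR HYP
# Keyword arguments avoid passing a dataloader where training_params is expected
trainer.train(model=net, training_params=training_hyperparams, train_loader=train_dataloader, valid_loader=val_dataloader)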

For the default training hyper-parameters of COCO, use:

training_hyperparams = coco2017_yolox_train_params()

@Louis-Dupont
Contributor

@sachinbisht1 Ideally, if your data is stored in YOLOv5/v8 or COCO format, you can reuse the code from the fine-tuning Colab notebook.

If not, and you need to use your own dataset object, you can do as @NatanBagrov explained. Just make sure to have your targets in the LABEL_CXCYWH format, and with tensors of shape [BS, C, H, W].
If you are not working with the same classes as COCO, you also need to set num_classes when loading the model: models.get(Models.YOLO_NAS_S, pretrained_weights="coco", num_classes=...)
Even in this case, if you missed the fine-tuning Colab notebook, I think it's worth having a look :)
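As a rough illustration of the target layout mentioned above, here is a minimal sketch that converts corner-style boxes into rows of (label, cx, cy, w, h). It reads LABEL_CXCYWH as "class label, then box center x, center y, width, height"; the helper names `xyxy_to_label_cxcywh` and `convert_annotations` are hypothetical and not part of SuperGradients, and whether coordinates should be in pixels or normalized is not stated in this thread.

```python
from typing import List, Sequence, Tuple

Row = Tuple[float, float, float, float, float]


def xyxy_to_label_cxcywh(label: int, box: Sequence[float]) -> Row:
    """Convert one (x1, y1, x2, y2) box into a (label, cx, cy, w, h) row."""
    x1, y1, x2, y2 = box
    w = x2 - x1
    h = y2 - y1
    return (float(label), x1 + w / 2.0, y1 + h / 2.0, float(w), float(h))


def convert_annotations(labels: Sequence[int], boxes: Sequence[Sequence[float]]) -> List[Row]:
    """Convert all boxes of one image; labels and boxes must have equal length."""
    if len(labels) != len(boxes):
        raise ValueError("labels and boxes must have the same length")
    return [xyxy_to_label_cxcywh(lbl, box) for lbl, box in zip(labels, boxes)]


if __name__ == "__main__":
    # Two hypothetical annotations for a single image
    for row in convert_annotations([2, 7], [(10, 20, 50, 100), (0, 0, 8, 4)]):
        print(row)
```

A custom dataset could apply a conversion like this to each image's annotations before the rows are turned into tensors.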

@sachinbisht1
Author

I tried again without cloning the repo, just installing SG instead, but the installed package does not have the object_names.py file:
ModuleNotFoundError: No module named 'super_gradients.common.object_names'
from super_gradients.common.object_names import Models

@sachinbisht1
Author

from super_gradients.training import Trainer
from super_gradients.training import dataloaders
from super_gradients.training.dataloaders.dataloaders import coco_detection_yolo_format_train, coco_detection_yolo_format_val
from super_gradients.training import models
from super_gradients.training.losses import PPYoloELoss
from super_gradients.training.metrics import DetectionMetrics_050
from super_gradients.training.models.detection_models.pp_yolo_e import PPYoloEPostPredictionCallback
from super_gradients.common.object_names import Models

dataset_params = {
    'data_dir': '/home/administrator/cv/icdar23/dataset/COCO',
    'train_images_dir': '/home/administrator/cv/icdar23/dataset/COCO/images/train2017',
    'train_labels_dir': '/home/administrator/cv/icdar23/dataset/COCO/annotations/instances_train2017.json',
    'val_images_dir': '/home/administrator/cv/icdar23/dataset/COCO/images/val2017',
    'val_labels_dir': '/home/administrator/cv/icdar23/dataset/COCO/annotations/instances_val2017.json',
    'classes': ['Caption', 'Title', 'List-item', 'Footnote', 'Page-footer', 'Page-header', 'Picture', 'Table', 'Text']
}

CHECKPOINT_DIR = '/home/administrator/cv/icdar23/checkpoints'
trainer = Trainer(experiment_name='Layout_MODEL', ckpt_root_dir=CHECKPOINT_DIR)

train_data = coco_detection_yolo_format_train(
    dataset_params={
        'data_dir': dataset_params['data_dir'],
        'images_dir': dataset_params['train_images_dir'],
        'labels_dir': dataset_params['train_labels_dir'],
        'classes': dataset_params['classes']
    },
    dataloader_params={
        'batch_size': 16,
        'num_workers': 2
    }
)

val_data = coco_detection_yolo_format_val(
    dataset_params={
        'data_dir': dataset_params['data_dir'],
        'images_dir': dataset_params['val_images_dir'],
        'labels_dir': dataset_params['val_labels_dir'],
        'classes': dataset_params['classes']
    },
    dataloader_params={
        'batch_size': 16,
        'num_workers': 2
    }
)

model = models.get(
    Models.YOLO_NAS_L,
    num_classes=len(dataset_params['classes']),
    pretrained_weights="coco"
)

train_params = {
    # ENABLING SILENT MODE
    'silent_mode': True,
    "average_best_models": True,
    "warmup_mode": "linear_epoch_step",
    "warmup_initial_lr": 1e-6,
    "lr_warmup_epochs": 3,
    "initial_lr": 5e-4,
    "lr_mode": "cosine",
    "cosine_final_lr_ratio": 0.1,
    "optimizer": "Adam",
    "optimizer_params": {"weight_decay": 0.0001},
    "zero_weight_decay_on_bias_and_bn": True,
    "ema": True,
    "ema_params": {"decay": 0.9, "decay_type": "threshold"},
    # ONLY TRAINING FOR 10 EPOCHS FOR THIS EXAMPLE NOTEBOOK
    "max_epochs": 10,
    "mixed_precision": True,
    "loss": PPYoloELoss(
        use_static_assigner=False,
        # NOTE: num_classes needs to be defined here
        num_classes=len(dataset_params['classes']),
        reg_max=16
    ),
    "valid_metrics_list": [
        DetectionMetrics_050(
            score_thres=0.1,
            top_k_predictions=300,
            # NOTE: num_classes needs to be defined here
            num_cls=len(dataset_params['classes']),
            normalize_targets=True,
            post_prediction_callback=PPYoloEPostPredictionCallback(
                score_threshold=0.01,
                nms_top_k=1000,
                max_predictions=300,
                nms_threshold=0.7
            )
        )
    ],
    "metric_to_watch": 'mAP@0.50'
}

trainer.train(
    model=model,
    training_params=train_params,
    train_loader=train_data,
    valid_loader=val_data
)

This is the code I am trying to run. The SG version is 3.1.0.
It throws the following error:
Traceback (most recent call last):
  File "/home/administrator/miniconda3/envs/super_grad/lib/python3.9/site-packages/torch/__init__.py", line 172, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/home/administrator/miniconda3/envs/super_grad/lib/python3.9/ctypes/__init__.py", line 382, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/administrator/miniconda3/envs/super_grad/lib/python3.9/site-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.11: undefined symbol: cublasLtGetStatusString, version libcublasLt.so.11

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/administrator/cv/icdar23/train.py", line 1, in <module>
    from super_gradients.common.object_names import Models
  File "/home/administrator/miniconda3/envs/super_grad/lib/python3.9/site-packages/super_gradients/__init__.py", line 1, in <module>
    from super_gradients.common import init_trainer, is_distributed, object_names
  File "/home/administrator/miniconda3/envs/super_grad/lib/python3.9/site-packages/super_gradients/common/__init__.py", line 2, in <module>
    from super_gradients.common.crash_handler import setup_crash_handler
  File "/home/administrator/miniconda3/envs/super_grad/lib/python3.9/site-packages/super_gradients/common/crash_handler/__init__.py", line 1, in <module>
    from super_gradients.common.crash_handler.crash_handler import setup_crash_handler
  File "/home/administrator/miniconda3/envs/super_grad/lib/python3.9/site-packages/super_gradients/common/crash_handler/crash_handler.py", line 3, in <module>
    from super_gradients.common.crash_handler.crash_tips_setup import setup_crash_tips
  File "/home/administrator/miniconda3/envs/super_grad/lib/python3.9/site-packages/super_gradients/common/crash_handler/crash_tips_setup.py", line 3, in <module>
    from super_gradients.common.environment.env_variables import env_variables
  File "/home/administrator/miniconda3/envs/super_grad/lib/python3.9/site-packages/super_gradients/common/environment/__init__.py", line 4, in <module>
    from super_gradients.common.environment.ddp_utils import init_trainer, is_distributed
  File "/home/administrator/miniconda3/envs/super_grad/lib/python3.9/site-packages/super_gradients/common/environment/ddp_utils.py", line 5, in <module>
    from super_gradients.common.environment.device_utils import device_config
  File "/home/administrator/miniconda3/envs/super_grad/lib/python3.9/site-packages/super_gradients/common/environment/device_utils.py", line 3, in <module>
    import torch
  File "/home/administrator/miniconda3/envs/super_grad/lib/python3.9/site-packages/torch/__init__.py", line 217, in <module>
    _load_global_deps()
  File "/home/administrator/miniconda3/envs/super_grad/lib/python3.9/site-packages/torch/__init__.py", line 178, in _load_global_deps
    _preload_cuda_deps()
  File "/home/administrator/miniconda3/envs/super_grad/lib/python3.9/site-packages/torch/__init__.py", line 158, in _preload_cuda_deps
    ctypes.CDLL(cublas_path)
  File "/home/administrator/miniconda3/envs/super_grad/lib/python3.9/ctypes/__init__.py", line 382, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/administrator/miniconda3/envs/super_grad/lib/python3.9/site-packages/nvidia/cublas/lib/libcublas.so.11: undefined symbol: cublasLtGetStatusString, version libcublasLt.so.11

@ajithkumarmcw

@sachinbisht1 You can try this video for custom dataset training: https://www.youtube.com/watch?v=pmHW15tELZk
The error you are facing is caused by an incompatibility between torch and CUDA. This should solve it: hpcaitech/ColossalAI#2901 (comment)
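Before changing anything, it can help to see which torch and NVIDIA wheels pip has placed in the environment, since the traceback shows libcublas being loaded from site-packages/nvidia/cublas. The sketch below uses only the standard library; the helper name `installed_versions` is hypothetical, and the listing is a diagnostic aid rather than the fix described in the linked comment.

```python
from importlib import metadata
from typing import Dict, Tuple


def installed_versions(prefixes: Tuple[str, ...]) -> Dict[str, str]:
    """Return {distribution name: version} for names starting with any prefix."""
    wanted = tuple(p.lower().replace("_", "-") for p in prefixes)
    found: Dict[str, str] = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if not name:
            continue  # skip distributions with broken metadata
        normalized = name.lower().replace("_", "-")
        if normalized.startswith(wanted):
            found[name] = dist.version
    return found


if __name__ == "__main__":
    # Lists e.g. torch, torchvision and any nvidia-* wheels, if present
    for name, version in sorted(installed_versions(("torch", "nvidia-")).items()):
        print(f"{name}=={version}")
```

If the output shows a pip-installed cuBLAS wheel next to a torch build made for a different CUDA version, that mismatch is a plausible suspect for the undefined-symbol error.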

@sachinbisht1
Author

sachinbisht1 commented May 6, 2023

The console stream is logged into /home/administrator/sg_logs/console.log
[2023-05-06 06:37:27] INFO - crash_tips_setup.py - Crash tips is enabled. You can set your environment variable to CRASH_HANDLER=FALSE to disable it
Traceback (most recent call last):
  File "/home/administrator/miniconda3/envs/super_grad/lib/python3.9/site-packages/requests/compat.py", line 11, in <module>
    import chardet
ModuleNotFoundError: No module named 'chardet'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/administrator/cv/icdar23/newtrain.py", line 1, in <module>
    from super_gradients.training import Trainer
  File "/home/administrator/miniconda3/envs/super_grad/lib/python3.9/site-packages/super_gradients/__init__.py", line 2, in <module>
    from super_gradients.training import losses, utils, datasets_utils, DataAugmentation, Trainer, KDTrainer, QATTrainer
  File "/home/administrator/miniconda3/envs/super_grad/lib/python3.9/site-packages/super_gradients/training/__init__.py", line 3, in <module>
    from super_gradients.training.datasets import datasets_utils, DataAugmentation
  File "/home/administrator/miniconda3/envs/super_grad/lib/python3.9/site-packages/super_gradients/training/datasets/__init__.py", line 7, in <module>
    from super_gradients.training.datasets.detection_datasets import (
  File "/home/administrator/miniconda3/envs/super_grad/lib/python3.9/site-packages/super_gradients/training/datasets/detection_datasets/__init__.py", line 5, in <module>
    from super_gradients.training.datasets.detection_datasets.yolo_format_detection import YoloDarknetFormatDetectionDataset
  File "/home/administrator/miniconda3/envs/super_grad/lib/python3.9/site-packages/super_gradients/training/datasets/detection_datasets/yolo_format_detection.py", line 8, in <module>
    from super_gradients.training.utils.media.image import is_image
  File "/home/administrator/miniconda3/envs/super_grad/lib/python3.9/site-packages/super_gradients/training/utils/media/image.py", line 11, in <module>
    import requests
  File "/home/administrator/miniconda3/envs/super_grad/lib/python3.9/site-packages/requests/__init__.py", line 45, in <module>
    from .exceptions import RequestsDependencyWarning
  File "/home/administrator/miniconda3/envs/super_grad/lib/python3.9/site-packages/requests/exceptions.py", line 9, in <module>
    from .compat import JSONDecodeError as CompatJSONDecodeError
  File "/home/administrator/miniconda3/envs/super_grad/lib/python3.9/site-packages/requests/compat.py", line 13, in <module>
    import charset_normalizer as chardet
  File "/home/administrator/miniconda3/envs/super_grad/lib/python3.9/site-packages/charset_normalizer/__init__.py", line 23, in <module>
    from charset_normalizer.api import from_fp, from_path, from_bytes, normalize
  File "/home/administrator/miniconda3/envs/super_grad/lib/python3.9/site-packages/charset_normalizer/api.py", line 10, in <module>
    from charset_normalizer.md import mess_ratio
  File "charset_normalizer/md.py", line 5, in <module>
ImportError: cannot import name 'COMMON_SAFE_ASCII_CHARACTERS' from 'charset_normalizer.constant' (/home/administrator/miniconda3/envs/super_grad/lib/python3.9/site-packages/charset_normalizer/constant.py)

@ajithkumarmcw

Try installing chardet.
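The traceback above shows requests first trying chardet and then falling back to charset_normalizer, which itself fails to import. A small check like the following sketch, which uses only the standard library, reports which of the two modules can be imported in the current environment; the helper name `try_import` is hypothetical.

```python
import importlib
from typing import Dict, Iterable


def try_import(module_names: Iterable[str]) -> Dict[str, str]:
    """Try to import each module and report "ok" or the error that was raised."""
    results: Dict[str, str] = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = "ok"
        except Exception as exc:  # ImportError and anything raised at import time
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    for name, status in try_import(["chardet", "charset_normalizer"]).items():
        print(f"{name}: {status}")
```

If chardet reports "ok" after installation, requests should no longer need the broken charset_normalizer fallback.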

@Louis-Dupont
Contributor

@sachinbisht1 Is it solved?
Also, when showing errors or code, please embed it inside "```" ("add code"); it helps with reading :)

@sachinbisht1
Author

Hi, can you tell me about this assertion error?
The console stream is logged into /home/administrator/sg_logs/console.log
[2023-05-08 05:31:40] INFO - crash_tips_setup.py - Crash tips is enabled. You can set your environment variable to CRASH_HANDLER=FALSE to disable it
/home/administrator/miniconda3/envs/super_grad/lib/python3.9/site-packages/_distutils_hack/__init__.py:18: UserWarning: Distutils was imported before Setuptools, but importing Setuptools also replaces the distutils module in sys.modules. This may lead to undesirable behaviors or errors. To avoid these issues, avoid using distutils directly, ensure that setuptools is installed in the traditional way (e.g. not an editable install), and/or make sure that setuptools is always imported before distutils.
  warnings.warn(
/home/administrator/miniconda3/envs/super_grad/lib/python3.9/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")
Traceback (most recent call last):
  File "/home/administrator/cv/icdar23/newtrain.py", line 1, in <module>
    from super_gradients.training import Trainer
  File "/home/administrator/miniconda3/envs/super_grad/lib/python3.9/site-packages/super_gradients/__init__.py", line 2, in <module>
    from super_gradients.training import losses, utils, datasets_utils, DataAugmentation, Trainer, KDTrainer, QATTrainer
  File "/home/administrator/miniconda3/envs/super_grad/lib/python3.9/site-packages/super_gradients/training/__init__.py", line 4, in <module>
    from super_gradients.training.sg_trainer import Trainer
  File "/home/administrator/miniconda3/envs/super_grad/lib/python3.9/site-packages/super_gradients/training/sg_trainer/__init__.py", line 3, in <module>
    from super_gradients.training.sg_trainer.sg_trainer import Trainer, MultiGPUMode, StrictLoad
  File "/home/administrator/miniconda3/envs/super_grad/lib/python3.9/site-packages/super_gradients/training/sg_trainer/sg_trainer.py", line 23, in <module>
    from super_gradients.training.utils.sg_trainer_utils import get_callable_param_names
  File "/home/administrator/miniconda3/envs/super_grad/lib/python3.9/site-packages/super_gradients/training/utils/sg_trainer_utils.py", line 18, in <module>
    from torch.utils.tensorboard import SummaryWriter
  File "/home/administrator/miniconda3/envs/super_grad/lib/python3.9/site-packages/torch/utils/tensorboard/__init__.py", line 2, in <module>
    from setuptools import distutils
  File "/home/administrator/miniconda3/envs/super_grad/lib/python3.9/site-packages/setuptools/__init__.py", line 8, in <module>
    import _distutils_hack.override  # noqa: F401
  File "/home/administrator/miniconda3/envs/super_grad/lib/python3.9/site-packages/_distutils_hack/override.py", line 1, in <module>
    __import__('_distutils_hack').do_override()
  File "/home/administrator/miniconda3/envs/super_grad/lib/python3.9/site-packages/_distutils_hack/__init__.py", line 77, in do_override
    ensure_local_distutils()
  File "/home/administrator/miniconda3/envs/super_grad/lib/python3.9/site-packages/_distutils_hack/__init__.py", line 64, in ensure_local_distutils
    assert '_distutils' in core.__file__, core.__file__
AssertionError: /home/administrator/miniconda3/envs/super_grad/lib/python3.9/distutils/core.py

@Louis-Dupont
Contributor

@sachinbisht1 It looks like this is not directly related to SuperGradients.
It seems to be caused by setuptools. You can try running pip install setuptools==59.8.0 and then run your script again.
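To see whether the environment is affected before and after pinning, a quick version check can help. The sketch below rests on the assumption, from my reading of the setuptools changelog and not confirmed in this thread, that releases from 60 onward enable their bundled distutils copy by default, which is the code path that raises the assertion above. The helper name `uses_local_distutils_by_default` is hypothetical.

```python
from importlib import metadata


def uses_local_distutils_by_default(setuptools_version: str) -> bool:
    """Assumption: setuptools >= 60 overrides the stdlib distutils by default."""
    major = int(setuptools_version.split(".")[0])
    return major >= 60


if __name__ == "__main__":
    try:
        version = metadata.version("setuptools")
    except metadata.PackageNotFoundError:
        print("setuptools is not installed in this environment")
    else:
        flag = uses_local_distutils_by_default(version)
        print(f"setuptools {version}: local distutils override by default = {flag}")
```

With the suggested pin of 59.8.0 the check returns False, which matches the intent of the downgrade.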
