Training time does not decrease after increasing batch_size on 32 GB CPU instance #169

Open
suyogkute opened this issue Apr 11, 2023 · 1 comment


@suyogkute

With batch_size = 1: ETA 1 hour
With batch_size = 16: ETA 10 hours

This behaviour is the same on a 32 GB RAM CPU instance as on a local 16 GB CPU system.
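One plausible explanation, offered as an assumption rather than a confirmed diagnosis: the Matterport Mask R-CNN code, which PixelLib's training appears to be built on, runs a fixed number of steps per epoch (its `STEPS_PER_EPOCH` setting, 1000 by default) regardless of batch size. Each step then processes batch_size images, so a larger batch adds work to every epoch instead of shortening it, and on a CPU the time per step grows roughly with the number of images in it. The sketch below is a toy cost model: `epoch_seconds`, the per-step overhead, and the per-image cost are hypothetical, with the two constants solved so that it reproduces the 1 hour and 10 hour ETAs above (assuming those ETAs are per epoch).

```python
def epoch_seconds(steps_per_epoch: int, batch_size: int,
                  sec_per_image: float, overhead_per_step: float = 0.0) -> float:
    """Toy cost model (hypothetical): a fixed number of steps per epoch,
    each costing a fixed overhead plus a per-image cost times batch_size."""
    per_step = overhead_per_step + batch_size * sec_per_image
    return steps_per_epoch * per_step


STEPS = 1000               # assumed fixed step count (Matterport default)
OVERHEAD_PER_STEP = 1.44   # hypothetical, solved from the reported ETAs
SEC_PER_IMAGE = 2.16       # hypothetical, solved from the reported ETAs

for bs in (1, 16):
    hours = epoch_seconds(STEPS, bs, SEC_PER_IMAGE, OVERHEAD_PER_STEP) / 3600
    print(f"batch_size={bs:>2}: ~{hours:.1f} h per epoch")
```

Under this model, a 16x larger batch costs about 10x more time per epoch, not less: the fixed per-step overhead is amortized, but the image count per epoch grows 16x.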

@ithllc

ithllc commented Dec 7, 2023

@elbruno @mmphego @fmorenovr @prateekralhan

Any answer on this from the team would be much appreciated. I am training this on A100s in Google Colab and see the same issue. I have 12 classes, including the background, for image classification. The screenshot is below:

[Screenshot not included]

Output from the train_model method:
Train 808 images
Validate 350 images
Applying augmentation on dataset
Checkpoint Path: /content/drive/MyDrive/computer_vision_model
Selecting layers to train

The training code follows the documentation. I applied all the recommended software downgrades and am currently running TensorFlow 2.8.0.

Code below:
from google.colab import drive
drive.mount('/content/drive')

!pip3 uninstall -y tensorflow
!pip3 install tensorflow==2.8.0 # Colab's current version is 2.14.0; reinstall it after this
# No separate tensorflow-gpu install is needed: since TF 2.1 the tensorflow package includes GPU support

!pip install requests numpy pillow scipy scikit-image==0.18.3 imgaug matplotlib labelme2coco==0.1.0 pixellib==0.5.2

import pixellib
from pixellib.custom_train import instance_custom_training

import json
import numpy as np
import pandas as pd
import os
import tensorflow as tf
print(tf.__version__)

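# models_dir and exports_dir are defined elsewhere in the notebook (not shown);
# per the "Checkpoint Path" output above, models_dir appears to be /content/drive/MyDrive/computer_vision_model
train_maskrcnn = instance_custom_training()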
train_maskrcnn.modelConfig(network_backbone = 'resnet101', num_classes= 12, batch_size = 4)
train_maskrcnn.load_pretrained_model(models_dir+'/mask_rcnn_coco.h5')
train_maskrcnn.load_dataset(exports_dir)
train_maskrcnn.train_model(num_epochs = 100, augmentation=True, path_trained_models = models_dir)
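If a larger batch is meant to shorten each epoch, the number of steps per epoch has to shrink as batch_size grows, so that one epoch stays one pass over the data. A minimal sketch of that arithmetic, using the 808 training and 350 validation images reported in the output above. `steps_for_one_pass` is a hypothetical helper; the Matterport Mask R-CNN config exposes `STEPS_PER_EPOCH` and `VALIDATION_STEPS` for this, but whether PixelLib's `modelConfig` lets you set them is an assumption that would need checking against its source.

```python
import math


def steps_for_one_pass(num_images: int, batch_size: int) -> int:
    """Steps needed to see every image once (the last batch may be partial)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(num_images / batch_size)


TRAIN_IMAGES, VAL_IMAGES = 808, 350  # counts printed by train_model above

for bs in (1, 4, 16):
    print(f"batch_size={bs:>2}: "
          f"train steps={steps_for_one_pass(TRAIN_IMAGES, bs)}, "
          f"val steps={steps_for_one_pass(VAL_IMAGES, bs)}")
```

With the batch_size = 4 used in the code above, one full pass would be 202 training steps and 88 validation steps, far fewer than a fixed 1000-step epoch.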
