Train on Colab GPU #141

spectorp · 2020-06-08T15:53:52Z

Hello, I'm interested in training on a Google Colab GPU. Getting the code running on Colab is pretty straightforward, but it doesn't actually run on the GPU and is therefore quite slow. I'm not sure how to change this; could you point me in the right direction? Many thanks.

AntonMu · 2020-06-08T16:43:43Z

Hi @spectorp - to get help from others, I recommend completing the issue template. Thanks.

spectorp · 2020-06-08T18:47:42Z

Hi @AntonMu , thanks for the quick reply and sorry for not following the issue template. Here's the issue:

Have you followed the instructions exactly (word by word)? Yes

Have you checked the troubleshooting section? Yes

System information

What is the top-level directory of the model you are using: 2_Training
Have I written custom code (as opposed to using a stock example script provided in the repo): No
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04.3
TensorFlow version (use command below): v2.2.0-0-g2b96f3662b 2.2.0
CUDA/cuDNN version: Cuda compilation tools, release 10.1, V10.1.243
GPU model and memory: The GPUs available in Colab often include Nvidia K80s, T4s, P4s and P100s. There is no way to choose what type of GPU you can connect to in Colab at any given time. I think memory ranges from 12 to 16 GB.
Exact command to reproduce: If you open a new Colab notebook, the following instructions and commands should reproduce the issue:

Set the runtime (Runtime > Change runtime type > select GPU)
!git clone https://github.com/AntonMu/TrainYourOwnYOLO
!pip install -r /content/TrainYourOwnYOLO/requirements.txt
Restart runtime (click restart runtime)
!python /content/TrainYourOwnYOLO/2_Training/Download_and_Convert_YOLO_weights.py
!python /content/TrainYourOwnYOLO/2_Training/Train_YOLO.py

Describe the problem

Despite setting Colab to use a GPU, the code runs on a CPU. Each epoch takes 2-3 minutes to run, and eventually Colab will issue a popup message at the bottom of the window saying: "You are connected to a GPU runtime, but not utilizing the GPU." Any help on this would be appreciated!

Source code / logs

Let me know if there would be any useful logs for me to provide.
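One log that usually helps with a report like this is what TensorFlow itself can see, since a Colab GPU runtime only guarantees that a GPU is attached, not that the installed TensorFlow build uses it. The following is a minimal sketch, not part of the repo: the helper name tensorflow_gpu_names is made up for illustration, and it assumes a TensorFlow release that has either tf.config.list_physical_devices or the older tf.config.experimental.list_physical_devices.

```python
def tensorflow_gpu_names():
    """Return the GPU device names TensorFlow can see, or None if TensorFlow is missing.

    Hypothetical helper for illustration; it is not part of TrainYourOwnYOLO.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None  # TensorFlow is not installed in this environment

    # Newer releases expose list_physical_devices directly on tf.config;
    # older ones only have it under tf.config.experimental.
    list_devices = getattr(tf.config, "list_physical_devices", None)
    if list_devices is None:
        list_devices = tf.config.experimental.list_physical_devices

    return [device.name for device in list_devices("GPU")]


if __name__ == "__main__":
    names = tensorflow_gpu_names()
    if names is None:
        print("TensorFlow is not installed")
    elif not names:
        print("TensorFlow is installed but sees no GPU; training would run on the CPU")
    else:
        print("TensorFlow sees:", names)
```

An empty list on a GPU runtime would point at the TensorFlow/CUDA combination rather than at the Colab runtime setting.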

AntonMu · 2020-06-09T01:12:49Z

Hi - it looks like your CUDA version and TensorFlow version are both wrong. Check the compatibility table here: https://www.tensorflow.org/install/source

spectorp · 2020-06-09T05:51:15Z

Thanks @AntonMu . It works great!

AntonMu · 2020-06-09T06:00:03Z

@spectorp good to hear! If you don't mind, it would be great if you could share the code - either link it here or I can also add it somewhere in the Readme. I will credit you for your work.

Colab is a good option for many people who want to try out this repo.

spectorp · 2020-06-12T16:59:50Z

Hi @AntonMu , I'm new to Colab, and what I've learned is that it doesn't always reliably install the correct version of a package. For example, I recently needed an older version of numpy, and despite running !pip install numpy==1.17.4, every now and then the version would be 1.18.x. Restarting the runtime usually fixes the problem. So, for TrainYourOwnYOLO, the following code should work:

!git clone https://github.com/AntonMu/TrainYourOwnYOLO
!pip install -r /content/TrainYourOwnYOLO/requirements.txt
!python /content/TrainYourOwnYOLO/2_Training/Download_and_Convert_YOLO_weights.py
!python /content/TrainYourOwnYOLO/2_Training/Train_YOLO.py

However, if it doesn't, I would double check the package versions. For me, running !pip install tensorflow-gpu==1.15 fixed the problem.
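To make "double check the package versions" concrete, here is a minimal sketch that compares the versions installed in the runtime against the ones you expect. The function name find_version_mismatches and the example pins are assumptions for illustration, and importlib.metadata needs Python 3.8 or newer. It reads what is installed on disk, so a package that was already imported before the pip install can still be the old version in memory until the runtime is restarted.

```python
from importlib import metadata


def find_version_mismatches(pins, installed_version=metadata.version):
    """Return {package: (wanted, found)} for every pin that is not satisfied.

    pins maps a distribution name to the exact version string expected.
    found is None when the package is not installed at all.
    installed_version is injectable so the check can be tried without pip.
    """
    mismatches = {}
    for name, wanted in pins.items():
        try:
            found = installed_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    # Example pins taken from this thread; adjust them to your own setup.
    expected = {"numpy": "1.17.4", "tensorflow-gpu": "1.15"}
    for package, (wanted, found) in find_version_mismatches(expected).items():
        print(f"{package}: wanted {wanted}, found {found}")
```

The comparison is an exact string match, so a pin of 1.15 will be reported as a mismatch against an installed 1.15.0; pin the full version string if that matters.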

Also, it's good to know that it's really easy to link a Google Drive account via:
from google.colab import drive
drive.mount('/content/drive')

AntonMu · 2020-06-22T17:53:19Z

https://colab.research.google.com/github/AntonMu/TrainYourOwnYOLO/blob/master/TrainYourOwnYOLO.ipynb

bushra-hafeez · 2020-08-21T02:47:26Z

Hi @spectorp,
I am training the model on my own dataset, following the instructions from AntonMu. However, you haven't mentioned the annotation step, which we do on our local machine rather than on Colab. How do I make the annotations work on Colab? Even if I upload data_train and data_classes to their respective folders, I expect errors on Colab, because data_train.txt refers to the annotated images by their directories on my local machine.

spectorp · 2020-08-30T22:16:37Z

Hi @bushra-hafeez , sorry for the slow reply. You may have figured this out already, but I just linked my Google Drive and transferred the necessary files to Colab. Does that help?
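If the transferred annotation file still points at image directories from the local machine, one option is to rewrite the path prefix after copying the files over. The following is a minimal sketch, not something from the repo: it assumes each line of data_train.txt starts with the image path followed by the box entries, and the function name and both directory names are hypothetical.

```python
def rebase_annotation_paths(text, old_root, new_root):
    """Replace old_root with new_root at the start of every annotation line.

    Lines that do not start with old_root are left unchanged.
    The result has no trailing newline.
    """
    rebased = []
    for line in text.splitlines():
        if line.startswith(old_root):
            line = new_root + line[len(old_root):]
        rebased.append(line)
    return "\n".join(rebased)


if __name__ == "__main__":
    # Hypothetical example: a local image folder and a folder on the Colab VM.
    sample = (
        "/home/me/images/cat_01.jpg 10,20,110,220,0\n"
        "/home/me/images/cat_02.jpg 5,5,50,50,0 60,60,90,90,1"
    )
    print(rebase_annotation_paths(sample, "/home/me/images", "/content/training_images"))
```

In practice you would read data_train.txt, pass its contents through the function, and write the result back before training. Only the leading prefix is touched, so the box coordinates and class ids stay as they are.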

AntonMu added the "issue template not completed" label Jun 8, 2020

AntonMu removed the "issue template not completed" label Jun 9, 2020

spectorp closed this as completed Jun 9, 2020

AntonMu reopened this Jun 9, 2020

AntonMu closed this as completed Jun 22, 2020

AntonMu added the "enhancement" label Jun 22, 2020
