Train on Colab GPU #141

Closed
spectorp opened this issue Jun 8, 2020 · 9 comments
Labels
enhancement New feature or request

Comments

@spectorp

spectorp commented Jun 8, 2020

Hello, I'm interested in training on a Google Colab GPU. Getting the code running on Colab is pretty straightforward, but it doesn't actually run on the GPU and is therefore quite slow. I'm not sure how to change this; could you point me in the right direction? Many thanks.

@AntonMu added the "issue template not completed" label (Issue template was not completed - missing one or more fields.) Jun 8, 2020
@AntonMu
Owner

AntonMu commented Jun 8, 2020

Hi @spectorp - in order to receive help from others, I recommend completing the issue template. Thanks

@spectorp
Author

spectorp commented Jun 8, 2020

Hi @AntonMu, thanks for the quick reply and sorry for not following the issue template. Here's the issue:

Have you followed the instructions exactly (word by word)? Yes

Have you checked the troubleshooting section? Yes

System information

  • What is the top-level directory of the model you are using: 2_Training
  • Have I written custom code (as opposed to using a stock example script provided in the repo): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04.3
  • TensorFlow version (use command below): v2.2.0-0-g2b96f3662b 2.2.0
  • CUDA/cuDNN version: Cuda compilation tools, release 10.1, V10.1.243
  • GPU model and memory: The GPUs available in Colab often include Nvidia K80s, T4s, P4s and P100s. There is no way to choose what type of GPU you can connect to in Colab at any given time. I think memory ranges from 12 to 16 GB.
  • Exact command to reproduce: If you open a new Colab notebook, the following instructions and commands should reproduce the issue:

Set the runtime (Runtime > Change runtime type > select GPU)
!git clone https://github.com/AntonMu/TrainYourOwnYOLO
!pip install -r /content/TrainYourOwnYOLO/requirements.txt
Restart runtime (click restart runtime)
!python /content/TrainYourOwnYOLO/2_Training/Download_and_Convert_YOLO_weights.py
!python /content/TrainYourOwnYOLO/2_Training/Train_YOLO.py

Describe the problem

Despite setting Colab to use a GPU, the code runs on a CPU. Each epoch takes 2-3 minutes to run, and eventually Colab will issue a popup message at the bottom of the window saying: "You are connected to a GPU runtime, but not utilizing the GPU." Any help on this would be appreciated!

Source code / logs

Let me know if there would be any useful logs for me to provide.

@AntonMu
Owner

AntonMu commented Jun 9, 2020

Hi - it looks like your CUDA version and TensorFlow version are both wrong. See the compatibility table at https://www.tensorflow.org/install/source
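
A quick way to see which TensorFlow version a Colab runtime has loaded, and whether that build detects a GPU at all, is sketched below. This is an illustrative check rather than part of the repo; the helper name `tensorflow_gpu_report` is made up for this example.

```python
def tensorflow_gpu_report():
    """Return (tensorflow_version, gpu_device_name), or None if TensorFlow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # gpu_device_name() returns an empty string when TensorFlow cannot use a GPU.
    return tf.__version__, tf.test.gpu_device_name()


report = tensorflow_gpu_report()
if report is None:
    print("TensorFlow is not installed in this runtime.")
else:
    version, device = report
    print("TensorFlow", version, "| GPU device:", device or "none found")
```

If the device comes back empty on a GPU runtime, the installed TensorFlow build and the CUDA version on the machine are likely mismatched, which is what the compatibility table helps confirm.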

@AntonMu removed the "issue template not completed" label (Issue template was not completed - missing one or more fields.) Jun 9, 2020
@spectorp
Author

spectorp commented Jun 9, 2020

Thanks @AntonMu. It works great!

@spectorp spectorp closed this as completed Jun 9, 2020
@AntonMu
Owner

AntonMu commented Jun 9, 2020

@spectorp good to hear! If you don't mind, it would be great if you could share the code - either link it here or I can also add it somewhere in the Readme. I will credit you for your work.

Colab is a good option for many people who want to try out this repo.

@AntonMu AntonMu reopened this Jun 9, 2020
@spectorp
Author

Hi @AntonMu, I'm new to Colab, and what I've learned is that it doesn't always install the requested version of a package. For example, I recently needed an older version of numpy, and despite running !pip install numpy==1.17.4, every now and then the version would be 1.18.x. Restarting the runtime usually fixes the problem. So, for TrainYourOwnYOLO, the following code should work:

!git clone https://github.com/AntonMu/TrainYourOwnYOLO
!pip install -r /content/TrainYourOwnYOLO/requirements.txt
!python /content/TrainYourOwnYOLO/2_Training/Download_and_Convert_YOLO_weights.py
!python /content/TrainYourOwnYOLO/2_Training/Train_YOLO.py

However, if it doesn't, I would double check the package versions. For me, running !pip install tensorflow-gpu==1.15 fixed the problem.
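
Double-checking the package versions after a runtime restart can be scripted. The sketch below is only an illustration: the function name `find_version_mismatches` is hypothetical, the pins are the two versions mentioned in this thread rather than the contents of requirements.txt, and it assumes Python 3.8 or newer for `importlib.metadata`.

```python
from importlib import metadata  # standard library, Python 3.8+


def find_version_mismatches(pins, lookup=metadata.version):
    """Compare pinned versions with what is installed.

    pins maps package name -> expected version string.
    Returns {name: (expected, installed_or_None)} for every mismatch.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


# Versions mentioned in this thread; "==1.15" installs as 1.15.0.
pins = {"tensorflow-gpu": "1.15.0", "numpy": "1.17.4"}
for name, (expected, installed) in find_version_mismatches(pins).items():
    print(f"{name}: expected {expected}, found {installed}")
```

No output means every pinned package matches. Any line that is printed points at a package to reinstall before restarting the runtime again.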

Also, it's good to know that it's really easy to link a Google Drive account via:
from google.colab import drive
drive.mount('/content/drive')

@AntonMu closed this as completed Jun 22, 2020
@AntonMu added the "enhancement" label (New feature or request) Jun 22, 2020
@bushra-hafeez

Hi, @spectorp
I am training the model on my own dataset, following the instructions described by AntonMu. However, you haven't mentioned the annotation step, which is done on a local machine and not on Colab. I would very much like to know how to make the annotations work on Colab: even if I upload data_train and data_classes to their respective folders, I expect errors on Colab because data_train.txt still contains the local directories of the annotated images.

@spectorp
Author

Hi @bushra-hafeez, sorry for the slow reply. You may have figured this out already, but I just linked my Google Drive and transferred the necessary files to Colab. Does that help?
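
If the transferred data_train.txt still points at image directories from the local machine, one option is to rewrite those paths once the images are on Colab. The sketch below assumes each line has the form "<image path> <box> <box> ..." with no spaces in the path; that format, the function names, and the directory /content/training_images are assumptions made for illustration, not details confirmed in this thread.

```python
import re


def repoint_annotation_line(line, new_image_dir):
    """Replace the directory of the image path at the start of one annotation line.

    Assumes the line looks like '<image path> <box> <box> ...' with no spaces
    in the path, which is an assumption about data_train.txt.
    """
    path, sep, boxes = line.strip().partition(" ")
    # Split on both separators so paths written on Windows are handled too.
    file_name = re.split(r"[\\/]", path)[-1]
    return new_image_dir.rstrip("/") + "/" + file_name + sep + boxes


def repoint_annotation_file(src, dst, new_image_dir):
    """Write a copy of src to dst with every image path moved to new_image_dir."""
    with open(src) as f_in, open(dst, "w") as f_out:
        for line in f_in:
            if line.strip():  # skip blank lines
                f_out.write(repoint_annotation_line(line, new_image_dir) + "\n")
```

For example, `repoint_annotation_file("data_train.txt", "data_train_colab.txt", "/content/training_images")` would write a copy whose image paths all point into that hypothetical folder. Because the line format relies on whitespace, it is safer to copy the images to a directory without spaces in its name than to reference a Drive path that contains one.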
