No progress bar when training on Google Colab #1112

Closed
Vijayabhaskar96 opened this issue Mar 10, 2020 · 20 comments · Fixed by #1093
Labels
bug Something isn't working help wanted Open to be worked on

Comments

@Vijayabhaskar96

Vijayabhaskar96 commented Mar 10, 2020

🐛 Bug

To Reproduce

Steps to reproduce the behavior:

  1. Go to https://colab.research.google.com/drive/1W-_30tbOBMz_t0_yozzwJzlcu6m3xd8W
  2. Run the Trainer section of the MNIST notebook.
  3. It downloads the MNIST dataset, keeps spinning for a while, and that's it: no progress bar or anything.

Environment

Google Colab, with the current GitHub version of pytorch-lightning installed.
PyTorch version: 1.4.0
Is debug build: No
CUDA used to build PyTorch: 10.1

OS: Ubuntu 18.04.3 LTS
GCC version: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
CMake version: version 3.12.0

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 10.0.130
GPU models and configuration: GPU 0: Tesla K80
Nvidia driver version: 418.67
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5

Versions of relevant libraries:
[pip3] numpy==1.17.5
[pip3] pytorch-lightning==0.7.1
[pip3] torch==1.4.0
[pip3] torchsummary==1.5.1
[pip3] torchtext==0.3.1
[pip3] torchvision==0.5.0
[conda] Could not collect

@Vijayabhaskar96 Vijayabhaskar96 added bug Something isn't working help wanted Open to be worked on labels Mar 10, 2020
@github-actions
Contributor

Hi! Thanks for your contribution! Great first issue!

@williamFalcon
Contributor

we test this very rigorously... look at the docs and the MNIST example then check your code.

@jwallat

jwallat commented Mar 10, 2020

I'm also having issues with the progress bar. Instead of a progress bar, I got

HBox(children=(FloatProgress(value=1.0, bar_style='info', layout=Layout(flex='2'), max=1.0), HTML(value='')), …

This happened to me only when using TPUs with num_tpu_cores=8 (1 TPU core works as expected). Interestingly, only the epoch progress bar is affected; the validation progress bar shows as intended.
To reproduce:

  1. Open the mnist tpu notebook
  2. Factory reset to clear all saved states
  3. Run all

@Vijayabhaskar96
Author

@williamFalcon I literally took the "MNIST on TPU" notebook from the docs page and ran it on Google Colab, and it showed no progress bar or anything.

@lezwon
Contributor

lezwon commented Mar 11, 2020

I am having the same issue on SageMaker too. No progress bar, just the HBox(children=(FloatProgress(value=1.0, bar_style='info', layout=Layout(flex='2'), max=1.0), HTML(value='')) text.

@luiscape
Contributor

@jwallat and @lezwon I suspect the issue happens upstream, in tqdm itself. It seems to be related to the cancelling (or failing) of a running operation. They suggest the following workaround:

tqdm/tqdm#548 (comment)

We could implement that in Lightning, but other issues such as this one make me pessimistic about it working.

Any ideas?

@lezwon
Contributor

lezwon commented Mar 12, 2020

@luiscape I have another notebook (without PL) in which I use tqdm, and the progress bar works fine there. Not sure why it isn't working with PL.
I used it in the following way:
for bi, data in tqdm(enumerate(data_loader), total=int(len(dataset) / data_loader.batch_size)):
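A self-contained sketch of the plain-tqdm pattern quoted above, for anyone who wants to try it outside Lightning. The `dataset`, `iter_batches`, and `count_batches` names are hypothetical stand-ins and not from the original comment. It also assumes the last partial batch should be counted, which `int(len(dataset) / batch_size)` does not do when the dataset size is not a multiple of the batch size.

```python
import math

try:
    from tqdm import tqdm  # plain console bar, not tqdm.auto or tqdm.notebook
except ImportError:
    # Fallback so the sketch still runs where tqdm is not installed.
    def tqdm(iterable, **kwargs):
        return iterable


def iter_batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def count_batches(num_items, batch_size):
    """Number of batches, counting a final partial batch."""
    return math.ceil(num_items / batch_size)


dataset = list(range(10))  # hypothetical stand-in for a real dataset
batch_size = 4

seen = []
total = count_batches(len(dataset), batch_size)
for bi, data in tqdm(enumerate(iter_batches(dataset, batch_size)), total=total):
    seen.append((bi, data))
```

With a PyTorch `DataLoader`, `len(data_loader)` already gives the number of batches, so it can be passed as `total` directly.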

@Borda
Member

Borda commented Mar 14, 2020

Check the following fix, it should help: #1093

@Borda Borda linked a pull request Mar 14, 2020 that will close this issue
5 tasks
@lezwon
Contributor

lezwon commented Mar 16, 2020

@Borda I tried it on NextJournal, it still shows HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=157.0, style=Pr…

[Screenshot: 2020-03-17 at 2:12:35 AM]

@williamFalcon
Contributor

i don’t really know what this is either. this seems to be a colab thing.
just restart the environment

@williamFalcon
Contributor

i see this issue when:

  1. i’m training
  2. colab times out or i stop execution or something
  3. then restart training.

fixes when i reset environment.

this is not a lightning issue though... might just be tqdm or colab

@lezwon
Contributor

lezwon commented Mar 17, 2020

@williamFalcon I have tried this on SageMaker and NextJournal. tqdm works fine if I run it myself, but when using Lightning it shows the HBox text. I have tried restarting the kernel, upgrading tqdm, etc.; nothing seems to work.

@lezwon
Contributor

lezwon commented Mar 17, 2020

This seems to be a tqdm issue.
If I do `from tqdm import tqdm`, it seems to work fine. Lightning, however, imports tqdm via `from tqdm.auto import tqdm`, which in turn imports the notebook version via `from .notebook import tqdm, trange`. When I run tqdm via the notebook import, I get the HBox text.
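To see which bar class the `tqdm.auto` import described above resolves to in a given kernel, something like the following sketch may help. `describe_tqdm_backend` is a hypothetical helper name, and the check assumes the widget-based class lives in a module named `tqdm.notebook`, which may differ in older tqdm releases.

```python
import importlib.util


def describe_tqdm_backend():
    """Report which progress-bar class `tqdm.auto` resolves to in this process."""
    if importlib.util.find_spec("tqdm") is None:
        return {"tqdm_installed": False, "auto_class": None, "uses_notebook_widget": False}

    from tqdm.auto import tqdm as auto_tqdm

    # Walk the class hierarchy: the widget-based bar is assumed to be
    # defined in the `tqdm.notebook` module.
    mro_modules = [cls.__module__ for cls in auto_tqdm.__mro__]
    return {
        "tqdm_installed": True,
        "auto_class": f"{auto_tqdm.__module__}.{auto_tqdm.__qualname__}",
        "uses_notebook_widget": "tqdm.notebook" in mro_modules,
    }


report = describe_tqdm_backend()
print(report)
```

If `uses_notebook_widget` is true and the output cell still shows only the `HBox(...)` text, the widget front end is the likely place to look rather than the training loop.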

@hemanthyernagula

Even though I'm using `from tqdm import tqdm`, I'm facing the same issue. Are there any other suggestions?
Thanks in advance 😊

@Borda
Member

Borda commented Aug 25, 2020

Mind upgrading to 0.9 and trying again? :]

@czrcbl

czrcbl commented Oct 16, 2020

Hi, I am facing the same problem, no progress bar on Jupyter Lab.

HBox(children=(FloatProgress(value=1.0, bar_style='info', description='Validation sanity check', layout=Layout…

I am using pytorch_lightning 1.0.2;
Jupyter Lab 2.2.6;
Torch: 1.6.0;
Python: 3.6;

@lezwon
Contributor

lezwon commented Oct 16, 2020

@czrcbl are you using sagemaker? or running jupyter lab locally?

@czrcbl

czrcbl commented Oct 16, 2020

I am running the Jupyter Lab in an Amazon ECS instance and connecting to it through ssh port forwarding.

@lezwon
Contributor

lezwon commented Oct 17, 2020

I think this is because the ipywidgets notebook extension has not been enabled. Can you execute the instructions mentioned here before installing lightning? Let us know if it works :)
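As a first check before following the linked instructions, it may be worth confirming that ipywidgets is at least installed in the kernel's Python environment. The sketch below does only that; `ipywidgets_status` is a hypothetical helper name, it needs Python 3.8+ for `importlib.metadata`, and it cannot tell whether the notebook or JupyterLab front-end extension is actually enabled.

```python
import importlib.util
from importlib import metadata


def ipywidgets_status():
    """Return whether ipywidgets is importable here and, if known, its version."""
    if importlib.util.find_spec("ipywidgets") is None:
        return {"importable": False, "version": None}
    try:
        version = metadata.version("ipywidgets")
    except metadata.PackageNotFoundError:
        # Importable but without distribution metadata (unusual install).
        version = None
    return {"importable": True, "version": version}


status = ipywidgets_status()
print(status)
```

If `importable` is false, the widget-based bar has nothing to render with, which would be consistent with the raw `HBox(...)` text reported in this thread.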

@krishnakalyan3
Contributor

krishnakalyan3 commented Nov 12, 2020

This worked for me
Thanks @lezwon


9 participants