Tasks fail when running an agent inside Notebook 2: Remote Agent #1204

Open
sobek1886 opened this issue Feb 14, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@sobek1886

Describe the bug

Hi. I'm going through the getting-started Colab notebooks. I ran into an issue when trying to execute tasks using an agent set up through the 2nd tutorial notebook.

I start the agent using

!clearml-agent daemon --queue "default" --foreground

Then, every time I enqueue a task, it fails. Example terminal output:

Executing task id [bfd4a833c8ba497794ecda6b4332b3aa]:
repository = 
branch = 
version_num = 
tag = 
docker_cmd = 
entry_point = colab_kernel_launcher.py
working_dir = .

::: Using Cached environment /root/.clearml/venvs-cache/6323bc2138a003ed039770fdc7da9483.b0b672ff6952198853ffbbe536f4e324 :::


Adding venv into cache: /root/.clearml/venvs-builds/3.10
Running task id [bfd4a833c8ba497794ecda6b4332b3aa]:
[.]$ /root/.clearml/venvs-builds/3.10/bin/python -u /root/.clearml/venvs-builds/3.10/code/colab_kernel_launcher.py
Summary - installed python packages:
pip:
- asttokens==2.4.1
- attrs==23.2.0
- cachetools==5.3.2
- certifi==2024.2.2
- charset-normalizer==3.3.2
- clearml==1.14.3
- Cython==3.0.8
- decorator==5.1.1
- exceptiongroup==1.2.0
- executing==2.0.1
- furl==2.1.3
- google-api-core==2.17.0
- google-auth==2.27.0
- google-cloud-core==2.4.1
- google-cloud-storage==2.8.0
- google-crc32c==1.5.0
- google-resumable-media==2.7.0
- googleapis-common-protos==1.62.0
- idna==3.6
- ipykernel==5.5.6
- ipython==8.21.0
- ipython-genutils==0.2.0
- jedi==0.19.1
- jsonschema==4.21.1
- jsonschema-specifications==2023.12.1
- jupyter_client==8.6.0
- jupyter_core==5.7.1
- matplotlib-inline==0.1.6
- numpy==1.26.4
- orderedmultidict==1.0.1
- parso==0.8.3
- pathlib2==2.3.7.post1
- pexpect==4.9.0
- pillow==10.2.0
- platformdirs==4.2.0
- prompt-toolkit==3.0.43
- protobuf==4.25.2
- psutil==5.9.8
- ptyprocess==0.7.0
- pure-eval==0.2.2
- pyasn1==0.5.1
- pyasn1-modules==0.3.0
- Pygments==2.17.2
- PyJWT==2.8.0
- pyparsing==3.1.1
- python-dateutil==2.8.2
- PyYAML==6.0.1
- pyzmq==23.2.1
- referencing==0.33.0
- requests==2.31.0
- rpds-py==0.18.0
- rsa==4.9
- six==1.16.0
- stack-data==0.6.3
- tornado==6.4
- traitlets==5.14.1
- urllib3==2.2.0
- wcwidth==0.2.13

Environment setup completed successfully

Starting Task Execution:

[ColabKernelApp] CRITICAL | Bad config encountered during initialization: The 'kernel_class' trait of <__main__.ColabKernelApp object at 0x7c5aa3978160> instance must be a type, but 'google.colab._kernel.Kernel' could not be imported

Leaving process id 3941
DONE: Running task 'bfd4a833c8ba497794ecda6b4332b3aa', exit status 1
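
The CRITICAL line above suggests that the agent's virtual environment cannot import `google.colab`; no `google-colab` package appears in the installed-packages summary, so the kernel class the launcher expects is unavailable. A minimal sketch for checking this from any interpreter, using only the standard library (the helper name `can_import` is hypothetical, not part of ClearML or Colab):

```python
import importlib.util


def can_import(module_name: str) -> bool:
    """Return True if `module_name` can be located by the import system.

    For a dotted name, find_spec imports the parent packages first, so a
    missing parent raises ModuleNotFoundError, which is treated as "no".
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


if __name__ == "__main__":
    # Expected to report False inside the agent's venv shown in the log,
    # and True inside the Colab notebook kernel itself.
    print("google.colab importable:", can_import("google.colab"))
```

Running this with the agent's interpreter (`/root/.clearml/venvs-builds/3.10/bin/python` in the log) would show whether the package is missing there.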

Expected behaviour

I expected the tasks to execute successfully.

Environment

  • Server type (app.clear.ml)
  • Colab

sobek1886 added the bug label on Feb 14, 2024

@Jonasmpi

I'm having the same issue when following the tutorial for remote Colab agents.

@tkukurin
Contributor

tkukurin commented Mar 3, 2024

The problem is the diff that was probably captured in your task. I sent #1220 to hopefully fix it.

This is the diff ("uncommitted changes") you'll probably see if you open the task in your ClearML project:

# Copyright 2023 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Custom kernel launcher app to customize socket options."""

from ipykernel import kernelapp
import zmq


# We want to set the high water mark on *all* sockets to 0, as we don't want
# the backend dropping any messages. We want to set this before any calls to
# bind or connect.
#
# In principle we should override `init_sockets`, but it's hard to set options
# on the `zmq.Context` there without rewriting the entire method. Instead we
# settle for only setting this on `iopub`, as that's the most important for our
# use case.
class ColabKernelApp(kernelapp.IPKernelApp):

  def init_iopub(self, context):
    context.setsockopt(zmq.RCVHWM, 0)
    context.setsockopt(zmq.SNDHWM, 0)
    return super().init_iopub(context)


if __name__ == '__main__':
  ColabKernelApp.launch_instance()
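
The symptom described here is also visible in the agent output from the original report: the task has no repository and its entry point is `colab_kernel_launcher.py`, meaning the captured script is Colab's launcher rather than the notebook code. A rough sketch of how one might flag such tasks from the agent output; both function names are hypothetical and the heuristic is an assumption, not something ClearML provides:

```python
def parse_task_header(log_text: str) -> dict:
    """Collect the `key = value` lines printed at the top of the agent output."""
    fields = {}
    for line in log_text.splitlines():
        key, sep, value = line.partition("=")
        key = key.strip()
        # Skip lines without '=' and lines such as "- clearml==1.14.3",
        # whose left-hand side is not a single bare word.
        if sep and key and " " not in key:
            fields[key] = value.strip()
    return fields


def looks_like_colab_launcher_task(fields: dict) -> bool:
    """Heuristic (assumption): empty repository plus the Colab launcher as entry point."""
    return (
        fields.get("entry_point") == "colab_kernel_launcher.py"
        and not fields.get("repository")
    )


if __name__ == "__main__":
    sample = (
        "repository = \n"
        "branch = \n"
        "entry_point = colab_kernel_launcher.py\n"
        "working_dir = .\n"
    )
    print(looks_like_colab_launcher_task(parse_task_header(sample)))
```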

@Jonasmpi

Jonasmpi commented Mar 3, 2024

On my side the issue also occurred with regular projects; it was just publicly reproducible with the tutorial too.
Will try again tomorrow.

@pollfly
Contributor

pollfly commented May 9, 2024

Hey @sobek1886! Just letting you know that this issue has been resolved in v1.15.0. Let us know if there are any issues :)
