Can't change the number of vCPUs #22

Closed
AbubakrHassan opened this issue May 10, 2022 · 7 comments


@AbubakrHassan

I'm trying to launch a job that needs multiple CPU cores to run faster. To do that, I create the executor as follows:

xm_local.Vertex(
    requirements=xm.JobRequirements(CPU=vcpu_count)
)

Setting vcpu_count to 1, 8, 32, or 64 doesn't change the actual number of vCPUs allocated for the task. I check the number of CPUs by running:

import multiprocessing

multiprocessing.cpu_count()

I also ran the following in the job's debug terminal: cat /proc/cpuinfo | grep processor | wc -l
In all cases, both commands return 4 regardless of the requested value.
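The two checks above can be combined into one small script to run inside the job. This is only a sketch: the function name visible_cpu_counts is made up for illustration, the affinity check is Linux-only, and the /proc/cpuinfo count is only roughly equivalent to the grep pipeline.

```python
import os


def visible_cpu_counts():
    """Return the CPU count as reported by a few different mechanisms."""
    # Same value that multiprocessing.cpu_count() is based on.
    counts = {"os_cpu_count": os.cpu_count()}

    # Linux-only: the set of CPUs this process is allowed to run on.
    if hasattr(os, "sched_getaffinity"):
        counts["affinity"] = len(os.sched_getaffinity(0))

    # Roughly: cat /proc/cpuinfo | grep processor | wc -l
    if os.path.exists("/proc/cpuinfo"):
        with open("/proc/cpuinfo") as f:
            counts["proc_cpuinfo"] = sum(
                1 for line in f if line.startswith("processor")
            )

    return counts


if __name__ == "__main__":
    print(visible_cpu_counts())
```

If all reported values agree, the container really sees that many cores and the limit comes from the machine type rather than from the way the count is read.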

Background:

  • The job launches and runs to completion, although very slowly.
  • During the build (after the image is pushed to the container registry), I get this warning message:
W0510 14:00:15.198342 140373868750400 http.py:139] Encountered 403 Forbidden with reason "PERMISSION_DENIED"

Followed immediately by

I0510 14:00:15.200866 140373858600512 base.py:80] Creating CustomJob
  • The launched jobs don't show up under the Training Pipelines tab but rather the Custom Jobs tab in Vertex AI -> Training.

@andrewluchen
Collaborator

From the web page, what is the value for the key Machine type (Worker pool 0)?

@AbubakrHassan
Author

It says n1-standard-4.

@andrewluchen
Collaborator

It's always n1-standard-4 regardless of vcpu count?

@AbubakrHassan
Author

AbubakrHassan commented May 11, 2022

Yes, in all the work units I've created with varying CPU values, I get the same machine type, n1-standard-4.

@andrewluchen
Collaborator

It looks like XM will default to n1-standard-4 if either CPU or RAM is not set.

https://github.com/deepmind/xmanager/blob/main/xmanager/cloud/vertex.py#L344

Workaround: set both CPU and RAM instead of just one.
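To illustrate the behaviour described here, the following is a simplified model of such a fallback. It is not XManager's actual code (the real selection logic lives in the vertex.py file linked above and may differ). The names pick_machine_type, N1_STANDARD, and DEFAULT_MACHINE_TYPE are hypothetical, and the table only lists the public n1-standard shapes.

```python
from typing import Optional

# Public n1-standard shapes as (vCPUs, RAM in GB), smallest first.
N1_STANDARD = [
    (1, 3.75), (2, 7.5), (4, 15), (8, 30),
    (16, 60), (32, 120), (64, 240), (96, 360),
]
DEFAULT_MACHINE_TYPE = "n1-standard-4"


def pick_machine_type(cpu: Optional[float] = None,
                      ram_gb: Optional[float] = None) -> str:
    """Hypothetical model: fall back to the default unless both are set."""
    if cpu is None or ram_gb is None:
        # Mirrors the reported behaviour: one missing value means the default.
        return DEFAULT_MACHINE_TYPE
    # Otherwise pick the smallest shape that satisfies both requirements.
    for vcpus, mem in N1_STANDARD:
        if vcpus >= cpu and mem >= ram_gb:
            return f"n1-standard-{vcpus}"
    raise ValueError("no n1-standard machine type is large enough")


if __name__ == "__main__":
    print(pick_machine_type(cpu=64))             # CPU only: default is used
    print(pick_machine_type(cpu=8, ram_gb=30))   # both set: sized to fit
```

Under this model, requesting only CPU always yields n1-standard-4, which matches what was observed in the job details. Passing both CPU and RAM to xm.JobRequirements is what lets a larger machine be selected.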

@AbubakrHassan
Author

I see. Thanks, Andrew! Setting both values did lead to a different machine being allocated.

@andrewluchen
Collaborator

andrewluchen commented May 31, 2022

Closing with commit 27979f2.
