Marllib seems to never use GPU devices #203

Open
wmzfight opened this issue Nov 23, 2023 · 3 comments
Comments

@wmzfight

When I use Marllib to train a model on my custom environment, it always shows "(0.0/1.0 accelerator_type:V100S)". I have checked all the configs related to GPU usage, and both PyTorch and TensorFlow can detect all the GPUs on my machine.

However, when I train a model on the MPE environment using the example provided by Marllib, it shows the same info. Do you have any idea how to solve this problem?
[Screenshot attached (WeChat image 20231123145904)]
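
For reference, the framework-level check described above can be reproduced with a few standard calls. This is only a sketch (not taken from the issue) using the public PyTorch and TensorFlow APIs:

# Sketch: confirm the frameworks themselves can see the GPUs
# before suspecting MARLlib/Ray resource allocation.
import torch
import tensorflow as tf

print("PyTorch CUDA available:", torch.cuda.is_available())
print("PyTorch GPU count:", torch.cuda.device_count())
print("TensorFlow GPUs:", tf.config.list_physical_devices("GPU"))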

@fortyMiles (Collaborator)

Hi, glad to hear from you.

Please check your configuration to make sure the GPU is actually being utilized.
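
One way to do that check (a sketch that assumes MARLlib hands resource requests to Ray; it is not an official recipe from the maintainers) is to ask Ray, in a separate diagnostic session, which resources it registers, and then request the GPU explicitly in the fit call:

# Sketch: run in a separate Python session as a diagnostic.
import ray

ray.init(num_gpus=1)              # or rely on Ray's auto-detection
print(ray.cluster_resources())    # expect an entry like 'GPU': 1.0
ray.shutdown()

# Then request the GPU explicitly when training, e.g.
#   mappo.fit(env, model, ..., num_gpus=1)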

@Theohhhu Theohhhu added the help wanted Extra attention is needed label Dec 3, 2023
@libin-star commented Jan 2, 2024

I have another question: when I train the MAPPO algorithm with 2 GPUs, it takes more training time than with one GPU. Why is this? Do you have any solution for this problem? Thank you.
[Screenshot attached]

@satpreetsingh

I have the same issue. I'm using the following code to kick off a job; I see no increase in GPU utilization, and mostly only one CPU runs at 100% while the others stay idle.


from marllib import marl

# cooperative simple_spread scenario from MPE
env = marl.make_env(environment_name="mpe", map_name="simple_spread", force_coop=True)

# MAPPO with the MPE hyperparameter preset
mappo = marl.algos.mappo(hyperparam_source="mpe")

# GRU core with a 128-256 encoder
model = marl.build_model(env, mappo, {"core_arch": "gru", "encode_layer": "128-256"})

# start training: request 1 GPU and 32 rollout workers
mappo.fit(
    env, model,
    stop={"timesteps_total": 100000},
    checkpoint_freq=100,
    share_policy="group",
    num_gpus=1,
    num_workers=32,
)

Example top output:

top - 17:09:04 up 6 days,  1:03,  5 users,  load average: 1.16, 1.05, 0.88
Tasks: 556 total,   2 running, 554 sleeping,   0 stopped,   0 zombie
%Cpu(s):  3.6 us,  0.5 sy,  0.0 ni, 95.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem : 257572.3 total, 249544.2 free,   3948.5 used,   4079.7 buff/cache
MiB Swap:   2048.0 total,   2048.0 free,      0.0 used. 251594.3 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                       
 186567 satsingh  20   0   78.8g 889732 189528 R 100.7   0.3   1:46.04 python                        
 186582 satsingh  20   0  918108  27648  11264 S   1.0   0.0   0:00.74 gcs_server                    
 186594 satsingh  20   0   77.9g  73984  11776 S   1.0   0.0   0:01.41 raylet                        
 186685 satsingh  20   0   76.1g  76116  35840 S   1.0   0.0   0:00.92 ray::IDLE                     
 186692 satsingh  20   0   76.1g  76604  36096 S   1.0   0.0   0:00.94 ray::IDLE                     
 186695 satsingh  20   0   76.1g  76628  36352 S   1.0   0.0   0:00.91 ray::IDLE                     
 186717 satsingh  20   0   76.1g  75864  35840 S   1.0   0.0   0:00.97 ray::IDLE                     
 186726 satsingh  20   0   76.1g  76100  35840 S   1.0   0.0   0:00.92 ray::IDLE                     
 186727 satsingh  20   0   76.1g  76080  35840 S   1.0   0.0   0:00.95 ray::IDLE                     
 186730 satsingh  20   0   76.1g  76760  36352 S   1.0   0.0   0:00.92 ray::IDLE                     
 186731 satsingh  20   0   76.1g  76384  36096 S   1.0   0.0   0:00.93 ray::IDLE     
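
A quick way to narrow this down (a diagnostic sketch, assuming a local Ray cluster was started by the job above; this is not part of the original report) is to attach to the running cluster, compare Ray's total and free resources, and watch nvidia-smi at the same time:

# Sketch: run while the training job above is still alive.
import ray

ray.init(address="auto")                    # attach to the running local cluster
print("total:", ray.cluster_resources())    # should include 'GPU': 1.0
print("free: ", ray.available_resources())  # GPU should be (partly) reserved by the trainer
ray.shutdown()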
