Synchronous mode - tick hangs #2809

Vaan5 · 2020-05-04T06:06:37Z

Intro

I am running a simple script in synchronous mode which does the following:

spawn an ego vehicle
in a loop do:
- set velocity
- initiate tick
- wait to receive data
- store it.

Here is the code (i've simplified the synchronous mode example):

import carla
import queue


snapshots = queue.Queue()


def push_snapshot(snapshot):
    snapshots.put(snapshot)


def get_position(frame_id, vehicle_id):
    snapshot = snapshots.get(timeout=2)
    assert(frame_id == snapshot.frame)
    transform = snapshot.find(vehicle_id).get_transform()
    return transform.location.x, transform.location.y, transform.location.z


def main():
    sim_time = 0.0
    sim_time_end = 20.0
    delta_time = 0.05
    results = []

    client = carla.Client("127.0.0.1", 2000)
    world = client.get_world()

    # set mode to synchronous
    frame_id = world.apply_settings(carla.WorldSettings(True, False, delta_time))
    print("Synchronous mode will be applied in frame: {}".format(frame_id))

    # spawn car
    spectator = world.get_spectator()
    blueprint_library = world.get_blueprint_library()
    map_ = world.get_map()
    vehicles = blueprint_library.filter("vehicle")
    bp = vehicles[0]
    vehicle = world.spawn_actor(bp, map_.get_spawn_points()[0])
    world.tick()    # just so that I see it in the simulator

    world.on_tick(push_snapshot)

    cmd = [carla.command.ApplyVelocity(vehicle.id, carla.Vector3D(y=2.0))]    # constant for the minimal example
    while sim_time <= sim_time_end:
        client.apply_batch_sync(cmd)
        print("before tick")
        frame_id = world.tick()   # on newer versions raises an Exception from which you can't rly recover
        print("after tick {}".format(frame_id))
        sim_time += 0.05
        results.append((sim_time, get_position(frame_id, vehicle.id)))

    world.apply_settings(carla.WorldSettings(False, False, 0))
    vehicle.destroy()


if __name__ == '__main__':
    main()

Problem

The problem happens when running this multiple times, sometimes the tick call will just hang. I originally encountered this with the C++ API (0.9.7). After that, I was able to reproduce it with the PythonAPI on 0.9.7, 0.9.8, 0.9.9 as well.

In case of 0.9.7, the script would just hang forever. With the newer versions, an exception would be thrown because of the recently added tick timeout (#2556). But in my opinion this just hides the root problem.

In my opinion this is really a big shortcoming of Carla and the synchronous mode, and it basically makes the synchronous mode not that usable. If you want to have reproducible results, it is not possible as the frame (corresponding to the "hanging" tick call) would be dropped. The exception being thrown in newer versions due to a timeout is also not useful as you can't really recover from it.

I've already found a few related issues:

Added timeout to World Tick #2556
Improve sync mode performance #2070 - sadly closed, but I can see that the author and @nsubiron also commented about the same problem. It seems like the issue was forgotten.
Fixes for synchronous mode #1803
Synchronous_mode gets stuck in RL training #2581
Synchronous_mode gets stuck in endless loop #2038

So here are my questions:

Do you agree with me that this is a problem (and can be reproduced)?
Is anything already planned regarding the fix (as I can see in the linked issue that @nsubiron also suggests getting rid of the cause of the problem instead of applying workarounds?

Additional info

IMPORTANT: The described problem doesn't occur all the time. You might have to run the script multiple times to see it. I typically run it like:

FOR /L %v IN (1,1,50) DO python tick_hangs.py

Environment

OS: Win 10
Python: 3.7
Carla versions:

0.9.7 (locally built from 1f7669e) - tried both the python and C++ API
0.9.8 (official package) - tried only the python API
0.9.9 (official package) - tried only the python API

The text was updated successfully, but these errors were encountered:

nsubiron · 2020-05-05T07:59:52Z

I totally agree, the synchronous mode should not drop images, that breaks significantly the functionality. I've been working on a fix cause this has been bothering me a lot lately, but so far I couldn't get anything that works well (doing it on my free time so couldn't put a lot of effort). The problem is that needs to work both for sync and async modes, and async mode needs to drop images, so it's not so easy to find a solution that works well for both.

hh0rva1h · 2020-06-17T19:18:07Z

@nsubiron Would be really great if you could escalate the priority of this issue among devs as this is a major problem for reinforcement learning research with Carla.

germanros1987 · 2020-06-25T01:20:03Z

@doterop This issue has high priority, could you and @bernatx take a look at this?

hh0rva1h · 2020-06-25T07:14:18Z

@germanros1987 Thanks for triaging this. We looked again into it last night and found a non documented way to handle it:
@doterop added a timeout for ticks in march: 5d238b2 so the following code works around the problem:

try:
    world.tick(0.1)
except:
    print("WARNING: tick not received")

It seems the tick is done in the simulator just that the client never gets the signal.

Maybe for the time being until this is fixed

In the API it should be documented that world.tick() takes a timeout argument
This issue / workaround should be documented in the sync mode documentation

Note that we also tested to recompile Carla with the workaround proposed in #2070 and it did not reliably workaround the issue.

nsubiron · 2020-06-25T15:24:35Z

I can comment on what I've found checking this not long ago.

The origin of the frame drops is here

carla/LibCarla/source/carla/streaming/detail/tcp/ServerSession.cpp

Lines 80 to 83 in baf43b0

    
           if (_is_writing) { 
        
             log_debug("session", _session_id, ": connection too slow: message discarded"); 
        
             return; 
        
           }

That boolean is set to false when the completion callback is called and the socket starts accepting messages again. In async mode this is fine cause we cannot keep an infinite queue anyway (though it would be nice if the number of drops could be reduced), but in sync mode this can cause a deadlock. So I tried adding a double-buffer for messages, basically a queue of 2 elements, that should have solved the issue in sync mode since the server would hang until the client received the message and you have a buffer of 1 frame that compensates (since render is async you could still overlap), but to my surprise the frame drops still happened. I think one possible cause is that the callback is not called immediately after sending the message and somehow can be postponed, this is handled by the ASIO context, so even though the client already detected and forwarded the message to the API the server session is still marked as "_is_writing" when reading the next message.

doterop · 2020-07-07T16:23:00Z

Hi @Vaan5,

I've been struggling the last few days trying to reproduce this error. Although all points out in the direction that @nsubiron mentioned before, it would very handy for me to have a scenario where I can reproduce the error and therefore create a proper solution for both sync and async mode.

If you don't have this case, no worries. :)

Thanks!

Vaan5 · 2020-07-09T08:24:41Z

Hi, sry for the late reply.

Sadly i don't have a different scenario. The script i wrote in the first post is the smallest I could think of.

I tried it again yesterday with:

Carla 0.9.9.4 (downloaded from the github link)
Python 3.7.7
Windows 10 (sry I don't have an ubuntu so I can't try it on one)

Here is exactly what i did:

Start Carla Server
Open console; add the carla egg to the PYTHONPATH variable
Run python tick_hangs.py // I named the script tick_hangs.py (i just copied the code from my first post into it - without any modifications) 100 times in a loop (FOR /L %v IN (1,1,100) DO python tick_hangs.py)

Out of the 100 runs, it happened once (0 times on the second try, 2 times on the third). So it still happens. Here is an output file of the 100 runs: out.txt (just search for RuntimeError or Error).

Edit: It also occurrs if you those setter-like functions are used instead of commands (like in #2812) - might come in handy for async mode checks.

hh0rva1h · 2020-07-09T20:30:10Z

@doterop Just run the following script:

import carla

# connect to Carla
client = carla.Client("localhost", 2000)
world = client.load_world("Town04")

# set fixed time step length
settings = world.get_settings()
settings.fixed_delta_seconds = 0.05
settings.synchronous_mode = True
settings.no_rendering_mode = True
world.apply_settings(settings)

# do ticks, will deadlock eventually (will timeout after 10 seconds)
while True:
    print("Issuing tick ...")
    world.tick()
    print("Tick successful")

For me it takes less than a minute to trigger the issue:

Issuing tick ...
Tick successful
Issuing tick ...
Tick successful
Issuing tick ...
Traceback (most recent call last):
  File "tick_hang.py", line 17, in <module>
    world.tick()
RuntimeError: time-out of 10000ms while waiting for the simulator, make sure the simulator is ready and connected to localhost:2000

How easy it is to trigger that issue might depend on the hardware: on my laptop with integrated graphics only this is very easy trigger, on very fast machines with dedicated gpu it can take long to trigger.

stale · 2020-09-11T02:38:00Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

hh0rva1h · 2020-09-12T20:47:57Z

@bernatx Where did this end up in? I failed to find the commit in master / dev, but the pull request ist closed?

hh0rva1h · 2020-10-02T07:20:03Z

@germanros1987 @bernatx I tested release 0.9.10 for the issue and it's still happening. Did the linked commit not make it into the release?

bernatx · 2020-10-02T12:56:01Z

@hh0rva1h I have been running the script several times, each time for 50 times running around 20 seconds, and never happened the problem. I have tried in Town03 and also in Town04.
I'm using version 0.9.10.1 where we add a fix for a camera freeze we detected, but I don't think it is related to this.
I will do more tests in Ubuntu also and let you know.
In the meantime, can you check also that 0.9.10.1 version?

hh0rva1h · 2020-10-02T13:35:13Z

@bernatx Thanks for letting me know, I fetched 0.9.10.1 now, but it's still happening to me:
https://youtu.be/X3WZ-tmltWo
I guess 20 seconds are not long enough, it can take time until the problem occurs, also keep in mind that the problem is harder to trigger on beefy machines. I am reproducing this on an XPS 13 9360 from 2016, a dual core i5 machine with no dedicated graphics.

bernatx · 2020-10-02T18:56:09Z

ok, I have been able to reproduce using a package a let the script a bit more of time to run. It happens at random, but now I can try to check further. I will tell you what I find, thank you.

bernatx · 2020-10-05T08:55:27Z

Hi, I have opened a PR fixing this problem (#3394).
It seems it was a race condition and a wrong management of a handler, that was queued instead of executed in real time.

hh0rva1h · 2020-10-11T15:23:10Z

@bernatx Tried the nightly build, looks good so far. Thanks very much!

nsubiron · 2020-10-19T16:41:57Z

Left some comments on #3394, it fixes this issue, but I think introduces other issues.

Vaan5 · 2020-12-01T12:16:17Z

Probably isn't of much help (as I'm quite late): tried it with 0.9.10 and I couldn't reproduce it anymore.

varunjammula · 2021-01-28T05:38:02Z

This behavior is not deterministic. The tick hang occurs most of the times but the intended behavior magically works only some times

stale · 2021-04-26T15:33:35Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

egeonat · 2021-05-07T11:38:26Z

I'm running CARLA version 0.9.11 on Ubuntu and I am still experiencing this issue in synchronous mode with 1/20 fixed_delta_seconds. Is there any progress? Were the issues specified in the comments by @nsubiron fixed?

varunjammula · 2021-05-07T12:12:22Z

@egeonat Did you try the synchronous_mode in the PythonAPI examples folder? They changed the way the synchronous mode works, I think. If you follow the example, it should work and this issue will not happen again. (https://github.com/carla-simulator/carla/blob/master/PythonAPI/examples/synchronous_mode.py)

egeonat · 2021-05-07T17:26:33Z

@varunjammula I am trying to use CARLA for RL training so my code has some differences, but I have mostly followed the synchronous_mode.py example. I believe my problem might be slightly different than the original poster, as when I call world.tick() the call doesn't timeout. It simply hangs forever. That is why I was wondering if perhaps new issues were introduced to the simulator with the PR, as suggested by nsublron.

stale · 2021-07-21T01:38:54Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

hh0rva1h · 2021-08-19T11:52:24Z

@egeonat You can specify a timeout for world.tick, e.g. world.tick(1), then it wont simpy hang forever. You can catch that with a try block.

stale · 2022-01-09T10:26:29Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Julbak · 2023-03-29T15:20:40Z

Jumped to the version 0.9.14 and managed to hang the server, task manager and closing the process works. Exactly the same code worked with 0.9.13. I'm using the quick start package installation as the other option was painful beyond anything.

stale · 2023-06-01T16:20:40Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

SExpert12 · 2023-12-31T04:50:09Z

Intro

I am running a simple script in synchronous mode which does the following:

spawn an ego vehicle

in a loop do:

set velocity

initiate tick

wait to receive data

store it.

Here is the code (i've simplified the synchronous mode example):
import carla
import queue


snapshots = queue.Queue()


def push_snapshot(snapshot):
    snapshots.put(snapshot)


def get_position(frame_id, vehicle_id):
    snapshot = snapshots.get(timeout=2)
    assert(frame_id == snapshot.frame)
    transform = snapshot.find(vehicle_id).get_transform()
    return transform.location.x, transform.location.y, transform.location.z


def main():
    sim_time = 0.0
    sim_time_end = 20.0
    delta_time = 0.05
    results = []

    client = carla.Client("127.0.0.1", 2000)
    world = client.get_world()

    # set mode to synchronous
    frame_id = world.apply_settings(carla.WorldSettings(True, False, delta_time))
    print("Synchronous mode will be applied in frame: {}".format(frame_id))

    # spawn car
    spectator = world.get_spectator()
    blueprint_library = world.get_blueprint_library()
    map_ = world.get_map()
    vehicles = blueprint_library.filter("vehicle")
    bp = vehicles[0]
    vehicle = world.spawn_actor(bp, map_.get_spawn_points()[0])
    world.tick()    # just so that I see it in the simulator

    world.on_tick(push_snapshot)

    cmd = [carla.command.ApplyVelocity(vehicle.id, carla.Vector3D(y=2.0))]    # constant for the minimal example
    while sim_time <= sim_time_end:
        client.apply_batch_sync(cmd)
        print("before tick")
        frame_id = world.tick()   # on newer versions raises an Exception from which you can't rly recover
        print("after tick {}".format(frame_id))
        sim_time += 0.05
        results.append((sim_time, get_position(frame_id, vehicle.id)))

    world.apply_settings(carla.WorldSettings(False, False, 0))
    vehicle.destroy()


if __name__ == '__main__':
    main()
Problem

The problem happens when running this multiple times, sometimes the tick call will just hang. I originally encountered this with the C++ API (0.9.7). After that, I was able to reproduce it with the PythonAPI on 0.9.7, 0.9.8, 0.9.9 as well.

In case of 0.9.7, the script would just hang forever. With the newer versions, an exception would be thrown because of the recently added tick timeout (#2556). But in my opinion this just hides the root problem.

In my opinion this is really a big shortcoming of Carla and the synchronous mode, and it basically makes the synchronous mode not that usable. If you want to have reproducible results, it is not possible as the frame (corresponding to the "hanging" tick call) would be dropped. The exception being thrown in newer versions due to a timeout is also not useful as you can't really recover from it.

I've already found a few related issues:

Added timeout to World Tick #2556

Improve sync mode performance #2070 - sadly closed, but I can see that the author and @nsubiron also commented about the same problem. It seems like the issue was forgotten.

Fixes for synchronous mode #1803

Synchronous_mode gets stuck in RL training #2581

Synchronous_mode gets stuck in endless loop #2038

So here are my questions:

Do you agree with me that this is a problem (and can be reproduced)?

Is anything already planned regarding the fix (as I can see in the linked issue that @nsubiron also suggests getting rid of the cause of the problem instead of applying workarounds?

Additional info

IMPORTANT: The described problem doesn't occur all the time. You might have to run the script multiple times to see it. I typically run it like:
FOR /L %v IN (1,1,50) DO python tick_hangs.py
Environment

OS: Win 10 Python: 3.7 Carla versions:

0.9.7 (locally built from 1f7669e) - tried both the python and C++ API

0.9.8 (official package) - tried only the python API

0.9.9 (official package) - tried only the python API

Hi,

I want to retrieve data after every 10ms in my code. In this code I am not able to understand that why there is world_tick() and world_on_tick() after the vehicle spawn statement is written?
Can any one help me to understand?

How to write code if I need to update vehicle location after every 10 ms, so I can change the car crossing order?

Thanks in advance

germanros1987 assigned doterop Jun 25, 2020

germanros1987 added critical issue with high priority bug labels Jun 25, 2020

germanros1987 added this to the 0.9.10 milestone Jun 25, 2020

bernatx mentioned this issue Jul 31, 2020

Bernat/fix sync deadlock #3144

Closed

stale bot added the stale Issue has not had recent activity label Sep 11, 2020

stale bot removed the stale Issue has not had recent activity label Sep 12, 2020

stale bot added the stale Issue has not had recent activity label Apr 26, 2021

stale bot removed the stale Issue has not had recent activity label May 7, 2021

stale bot added the stale Issue has not had recent activity label Jul 21, 2021

stale bot removed the stale Issue has not had recent activity label Aug 19, 2021

stale bot added the stale Issue has not had recent activity label Jan 9, 2022

shh1v mentioned this issue Jul 13, 2022

Vehicles on autopilot crash into walls and each other HARPLab/DReyeVR#41

Closed

stale bot removed the stale Issue has not had recent activity label Mar 29, 2023

stale bot added the stale Issue has not had recent activity label Jun 1, 2023

stale bot removed the stale Issue has not had recent activity label Dec 31, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Synchronous mode - tick hangs #2809

Synchronous mode - tick hangs #2809

Vaan5 commented May 4, 2020

nsubiron commented May 5, 2020

hh0rva1h commented Jun 17, 2020

germanros1987 commented Jun 25, 2020

hh0rva1h commented Jun 25, 2020 •

edited

Loading

nsubiron commented Jun 25, 2020

doterop commented Jul 7, 2020

Vaan5 commented Jul 9, 2020 •

edited

Loading

hh0rva1h commented Jul 9, 2020 •

edited

Loading

stale bot commented Sep 11, 2020

hh0rva1h commented Sep 12, 2020

hh0rva1h commented Oct 2, 2020

bernatx commented Oct 2, 2020

hh0rva1h commented Oct 2, 2020 •

edited

Loading

bernatx commented Oct 2, 2020

bernatx commented Oct 5, 2020

hh0rva1h commented Oct 11, 2020

nsubiron commented Oct 19, 2020

Vaan5 commented Dec 1, 2020

varunjammula commented Jan 28, 2021

stale bot commented Apr 26, 2021

egeonat commented May 7, 2021

varunjammula commented May 7, 2021 •

edited

Loading

egeonat commented May 7, 2021

stale bot commented Jul 21, 2021

hh0rva1h commented Aug 19, 2021

stale bot commented Jan 9, 2022

Julbak commented Mar 29, 2023

stale bot commented Jun 1, 2023

SExpert12 commented Dec 31, 2023

Intro

Problem

Additional info

Environment

Synchronous mode - tick hangs #2809

Synchronous mode - tick hangs #2809

Comments

Vaan5 commented May 4, 2020

Intro

Problem

Additional info

Environment

nsubiron commented May 5, 2020

hh0rva1h commented Jun 17, 2020

germanros1987 commented Jun 25, 2020

hh0rva1h commented Jun 25, 2020 • edited Loading

nsubiron commented Jun 25, 2020

doterop commented Jul 7, 2020

Vaan5 commented Jul 9, 2020 • edited Loading

hh0rva1h commented Jul 9, 2020 • edited Loading

stale bot commented Sep 11, 2020

hh0rva1h commented Sep 12, 2020

hh0rva1h commented Oct 2, 2020

bernatx commented Oct 2, 2020

hh0rva1h commented Oct 2, 2020 • edited Loading

bernatx commented Oct 2, 2020

bernatx commented Oct 5, 2020

hh0rva1h commented Oct 11, 2020

nsubiron commented Oct 19, 2020

Vaan5 commented Dec 1, 2020

varunjammula commented Jan 28, 2021

stale bot commented Apr 26, 2021

egeonat commented May 7, 2021

varunjammula commented May 7, 2021 • edited Loading

egeonat commented May 7, 2021

stale bot commented Jul 21, 2021

hh0rva1h commented Aug 19, 2021

stale bot commented Jan 9, 2022

Julbak commented Mar 29, 2023

stale bot commented Jun 1, 2023

SExpert12 commented Dec 31, 2023

Intro

Problem

Additional info

Environment

hh0rva1h commented Jun 25, 2020 •

edited

Loading

Vaan5 commented Jul 9, 2020 •

edited

Loading

hh0rva1h commented Jul 9, 2020 •

edited

Loading

hh0rva1h commented Oct 2, 2020 •

edited

Loading

varunjammula commented May 7, 2021 •

edited

Loading