Skip to content

Conversation

@UncleGrumpy
Copy link
Collaborator

This fixes the bug that causes the VM to crash when the GPIO port driver is stopped on the ESP32 platform (issue #816). Also adds missing NULL check when the GPIOData struct is malloc'd.

These changes are made under both the "Apache 2.0" and the "GNU Lesser General
Public License 2.1 or later" license terms (dual license).

SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later


ctx->platform_data = NULL;
globalcontext_unregister_process(glb, gpio_atom_index);
free(gpio_data);
Copy link
Collaborator

@fadushin fadushin Feb 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is free-ing platform_data and nullifying the reference supposed to be done here or should it be deferred to context_destroy. Perhaps @pguyot should weigh in on best practices here. There is the reference in the context_destroy comments about interrupts possibly getting a handle on the context, which could be relevant here, since we have a queue attached to this process, but I haven't studied it in detail.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to hear what @pguyot has to say. This is also what is done in the I2C driver, nullifying the reference was very recently added, which pointed me in the direction of using this, and does fix the crash described in #816.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the absence of a response, I say we merge this in. The context_destroy function is defensive about whether to delete platform data if it is already null, so we don’t have to worry about double deletes.

IMHO this module needs to be re-written as a platform Nif so that it will automatically close on a GC after the last reference goes out of scope, but that is a different kettle.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this change is correct but unsufficient. If platform_data is freed (as it currently is in gpiodriver_close), it should be nullified as indeed context_destroy will attempt to free it again if it is not null, hence crash #816.

The reason the free was added in context_destroy is to avoid any memory leak. And free is called at the end of the destroy function, after the process is made unreachable (is removed from the process table), to cope with cases where the platform data is called from interruptions. This is typically the "ugly hack" mentioned in i2c driver.

It looks like there still is a problem with GPIO driver, though. gpiodriver_close loops over listeners like gpiodriver_remove_int, but doesn't do it as well. It does call gpio_isr_handler_remove, but it doesn't call sys_unregister_listener unlike gpiodriver_remove_int.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. I was unsafely removing the listener.listeners_list_head manually (without the lock), rather than using the safe sys_unregister_listener

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fixed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have 9 duplicate lines. Do you think we should factorize them?
277-286 and 457-466.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactoring may be a good idea, it could alleviate future discrepancies between the gpiodriver_close and gpiodriver_remove_int handling of removing the listener.

If I were to replace the duplicate code with a common remove_gpio_listener function would it make sense to make that an inline function in this case?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course the factorized code can be inlined.

Adds missing `NULL` check when `struct GPIOData *gpio_data` is malloc'd.

Signed-off-by: Winford <winford@object.stream>
Fixes a hard vm crash when either `gpio:stop/0` or `gpio:close/1` are used on
ESP32 platform, which emits the following error before dumping a stacktrace
(ESP-IDF native, not an erlang stacktrace) in `idf.py monitor`:

assert failed: tlsf_free tlsf.c:1119 (!block_is_free(block) && "block already marked as free")

closes atomvm#816

Signed-off-by: Winford <winford@object.stream>
@bettio bettio merged commit 8d9436a into atomvm:release-0.6 Mar 3, 2024
@UncleGrumpy UncleGrumpy deleted the esp32_gpio_stop branch March 11, 2024 03:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants