Anjay-esp32-client stop working after registration update failure #2

Closed
AxelLin opened this issue Jan 21, 2022 · 11 comments

Comments

@AxelLin
Contributor

AxelLin commented Jan 21, 2022

I hit a registration update failure when the network interface went down temporarily, after which Anjay-esp32-client stopped working.

Below are the error messages I observed:
E (105533) anjay: ERROR [avs_net] [/home/axel/Anjay-esp32-client/main/anjay/deps/avs_commons/src/net/compat/posix/avs_net_impl.c:1198]: send failed
W (105544) anjay: WARNING [anjay] [/home/axel/Anjay-esp32-client/main/anjay/src/core/servers/anjay_register.c:878]: failure while receiving Update response: No route to host
E (105562) anjay: ERROR [anjay] [/home/axel/Anjay-esp32-client/main/anjay/src/core/servers/anjay_register.c:844]: could not send registration update for SSID==1: 2

After the above messages, it stops working.
The key point is that it does not recover once the network interface is back up.
It looks like anjay_event_loop_run() does not treat this as a fatal error,
so the application does not know how to handle it.

@Mierunski
Member

Hi!

We have noticed this issue. We currently have a fix in our internal repository, and it will be addressed in the next release in the coming days.

@Mierunski
Member

Could you confirm if the issue is resolved?

@AxelLin
Contributor Author

AxelLin commented Jan 27, 2022

Hi,

  1. Tested with Ethernet: it enters offline mode but never exits it,
    i.e. it is completely broken with Ethernet now.

  2. Tested with WiFi (disconnected from WiFi for 10 seconds, then reconnected).
    My debug prints show that it enters offline mode and exits it once WiFi is back,
    but it still does not work after exiting offline mode.
    It seems the UDP socket is deleted; "anjay/src/core/servers/anjay_reload.c:176]: servers reloaded" does not help.

BTW, I'd appreciate it if you committed bug fixes as separate commits.

@AxelLin
Contributor Author

AxelLin commented Feb 4, 2022

I just checked the commit log and realized that you put a lot of changes into a single commit.
That makes it difficult to figure out what was changed, broken, or fixed.
Tags lose their value if every commit gets its own tag.

@kFYatek
Contributor

kFYatek commented Feb 4, 2022

Hi,

This is because we do most of the development on our internal repositories which include features that are only available commercially. The code is then post-processed for public open source releases. Unfortunately that makes it infeasible to publish the entire commit history, so the changes between releases are squashed into single commits for the open source repositories.

I'm sorry for this inconvenience.

@AxelLin
Contributor Author

AxelLin commented Feb 12, 2022

I'm curious whether the "stop working after registration update failure" issue only happens on Anjay-esp32-client
or also on Anjay-freertos-client and Anjay-mbedos-client.
I ask because I don't see a similar fix in Anjay-freertos-client and Anjay-mbedos-client.

@AxelLin
Contributor Author

AxelLin commented Mar 28, 2022

Hi,
This has been broken for 2 months; I'm wondering whether it will be fixed soon.
Note that 22.01.1 is worse than 22.01, because it is completely broken when using Ethernet (see #2 (comment)).

@kFYatek
Contributor

kFYatek commented Apr 6, 2022

Hi,

Sorry for not responding for so long.

We are currently not targeting the Ethernet interface - in fact we don't have any boards with a proper Ethernet port, so this will not work at the moment. Feel free to contribute support for it if you need it.

As for the broken connection issue, I suspect that the library flags the connection as failed and expects the client application to react. You can handle this case with code such as the following:

diff --git a/main/main.c b/main/main.c
index 9468ae3..9e0f4e4 100644
--- a/main/main.c
+++ b/main/main.c
@@ -155,6 +155,10 @@ static void update_connection_status_job(avs_sched_t *sched,
         connected_prev = true;
     }
 
+    if (anjay_all_connections_failed(anjay)) {
+        anjay_transport_schedule_reconnect(anjay);
+    }
+
     AVS_SCHED_DELAYED(sched, NULL, avs_time_duration_from_scalar(1, AVS_TIME_S),
                       update_connection_status_job, &anjay, sizeof(anjay));
 }

I hope that this will work for you.

Please note that this project is intended more as an example and demonstration of library usage than as something ready for use in the field, so we do not consider this a bug. Different users may have different requirements for reconnecting after a hard failure like this (immediately, after a predefined time, with exponential backoff, etc.), which is why the library does not do this automatically.
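
To illustrate the exponential-backoff option mentioned above, here is a minimal, library-independent sketch of a policy that could gate the reconnect call in the diff. It assumes it is driven once per second, as update_connection_status_job is in the diff; the reconnect_policy_t type and its functions are hypothetical and not part of Anjay.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reconnect policy (not part of Anjay): exponential backoff
 * driven by a periodic 1-second tick. */
typedef struct {
    uint32_t base_delay_s;      /* wait after the first retry */
    uint32_t max_delay_s;       /* upper bound for any wait */
    uint32_t attempts;          /* retries since the last non-failed tick */
    uint32_t ticks_until_retry; /* remaining 1-second ticks before retrying */
} reconnect_policy_t;

static void reconnect_policy_init(reconnect_policy_t *p,
                                  uint32_t base_delay_s,
                                  uint32_t max_delay_s) {
    assert(p && base_delay_s > 0 && base_delay_s <= max_delay_s);
    p->base_delay_s = base_delay_s;
    p->max_delay_s = max_delay_s;
    p->attempts = 0;
    p->ticks_until_retry = 0;
}

/* Wait following retry number `attempts` (0-based): base * 2^attempts,
 * capped at max_delay_s without overflowing. */
static uint32_t reconnect_policy_delay_s(const reconnect_policy_t *p) {
    uint32_t delay = p->base_delay_s;
    for (uint32_t i = 0; i < p->attempts; ++i) {
        if (delay > p->max_delay_s / 2) {
            return p->max_delay_s;
        }
        delay *= 2;
    }
    return delay;
}

/* Call once per second with the current failure state (e.g. the result of
 * anjay_all_connections_failed()). Returns true when a reconnect should be
 * scheduled now. The first failure retries immediately; later retries are
 * spaced base, 2*base, 4*base, ... seconds apart, up to the cap. */
static bool reconnect_policy_tick(reconnect_policy_t *p, bool all_failed) {
    if (!all_failed) {
        /* Not in the all-failed state: reset the backoff. */
        p->attempts = 0;
        p->ticks_until_retry = 0;
        return false;
    }
    if (p->ticks_until_retry > 0) {
        --p->ticks_until_retry;
    }
    if (p->ticks_until_retry > 0) {
        return false; /* still backing off */
    }
    p->ticks_until_retry = reconnect_policy_delay_s(p);
    if (p->attempts < UINT32_MAX) {
        ++p->attempts;
    }
    return true;
}
```

With this in place, the condition in the diff would become `reconnect_policy_tick(&policy, anjay_all_connections_failed(anjay))`, where `policy` is a hypothetical static variable initialized once with reconnect_policy_init(). Because anjay_all_connections_failed() may keep returning true until a reconnect attempt succeeds, the backoff keeps the job from scheduling a reconnect every second during a long outage.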

@AxelLin
Contributor Author

AxelLin commented Apr 7, 2022

@kFYatek

  1. This is indeed a bug with the Ethernet interface, especially since you have a config option to use it.
  2. The recent change to use esp_wifi_sta_get_ap_info() is wrong once the Ethernet interface is taken into account.
    In any case, as I reported, it does not work well even with the WiFi interface. I think that change needs to be reverted.
  3. The change you suggested, adding anjay_transport_schedule_reconnect(), works.
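
Regarding point 2, a connectivity check that is not tied to one interface type could look like the minimal sketch below. The net_link_t type, flag_probe() and any_link_up() are hypothetical; on ESP-IDF, each link's flag could for example be maintained from network event handlers, or a probe could wrap esp_netif_is_netif_up(), but that wiring is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical probe: returns true if the link can currently carry traffic. */
typedef bool (*link_is_up_fn)(const void *ctx);

typedef struct {
    const char *name;    /* e.g. "wifi" or "eth", for logging only */
    link_is_up_fn is_up; /* probe for this link */
    const void *ctx;     /* probe-specific state */
} net_link_t;

/* Simple probe reading a flag that network event handlers would maintain.
 * A flag shared between tasks would need an atomic or a lock in practice. */
static bool flag_probe(const void *ctx) {
    return *(const bool *) ctx;
}

/* Connected if any configured link is up, regardless of its type. */
static bool any_link_up(const net_link_t *links, size_t count) {
    assert(links != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        if (links[i].is_up != NULL && links[i].is_up(links[i].ctx)) {
            return true;
        }
    }
    return false;
}
```

The status job would then call any_link_up() in place of a WiFi-only check, so the same code path works whether the board is configured for WiFi or Ethernet.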

BTW, this project currently fails to compile against the ESP-IDF master tree. Just FYI.
I noticed that avs_commons still uses MBEDTLS_PRIVATE, which is likely to break in a future minor version of Mbed TLS.
Link: https://github.com/Mbed-TLS/mbedtls/blob/development/docs/3.0-migration-guide.md#most-structure-fields-are-now-private

@AxelLin AxelLin closed this as completed Apr 7, 2022
@kFYatek
Contributor

kFYatek commented Apr 7, 2022

@AxelLin I'm glad that the change works for you.

> This is indeed a bug with the Ethernet interface, especially since you have a config option to use it.

The option is in the config UI because we used the example app framework in the initial version. This will be removed in the upcoming release, which will only support WiFi.

> BTW, this project currently fails to compile against the ESP-IDF master tree. Just FYI.

The current version is tested using ESP-IDF 4.3, and the upcoming release will be targeting ESP-IDF 4.4. Our goal is to support the latest stable release, not necessarily the latest master tree.

> I noticed that avs_commons still uses MBEDTLS_PRIVATE, which is likely to break in a future minor version of Mbed TLS.

Yes, we are aware that this is a hack. The current version indeed does not work with Mbed TLS 3.1, as that release made some previously private fields public again; we have a fix for that in our internal branch, which will be released shortly. However, some functionality is still missing from the public API, which is why we couldn't remove the usage of MBEDTLS_PRIVATE even for Mbed TLS 3.1. We plan to regularly update avs_commons to support upcoming Mbed TLS releases. However, in our assessment, the adoption of Mbed TLS 3.x remains low for now, so we don't see it as a top priority.

In any case, thank you for all the comments and suggestions!

@AxelLin
Contributor Author

AxelLin commented Apr 8, 2022

> @AxelLin I'm glad that the change works for you.

> > This is indeed a bug with the Ethernet interface, especially since you have a config option to use it.

> The option is in the config UI because we used the example app framework in the initial version. This will be removed in the upcoming release, which will only support WiFi.

I do hope you keep the Ethernet config option and make it actually work.
(I don't see any good reason to remove it, since LwM2M should work over either WiFi or Ethernet.)

BTW, I have now upgraded to Anjay 2.15.0.
With anjay_event_loop_run_with_error_handling(), it looks fine with both WiFi and Ethernet.


3 participants