
Bug: wasm log aws.appmesh.egress_http_stats: panicked at 'called Option::unwrap() on a None value', amzn-appmesh-aggregate-stats/src/http/egress.rs:172:9 #485

Closed
tnsardesai opened this issue Jan 16, 2024 · 5 comments
Labels: Bug

Comments

@tnsardesai

Summary
What are you observing that doesn't seem right?
We are seeing a panic during normal operations, which is not expected. This panic leads to metrics getting dropped.

[2024-01-11 22:24:28.980][91][critical][wasm] [source/extensions/common/wasm/context.cc:1157] wasm log aws.appmesh.egress_http_stats: panicked at 'called `Option::unwrap()` on a `None` value', amzn-appmesh-aggregate-stats/src/http/egress.rs:172:9
[2024-01-11 22:24:28.980][91][error][wasm] [source/extensions/common/wasm/wasm_vm.cc:38] Function: proxy_on_done failed: Uncaught RuntimeError: unreachable
Proxy-Wasm plugin in-VM backtrace:
0: 0x2db69 - rust_panic
1: 0x2c9a3 - std::panicking::rust_panic_with_hook::hbafe3e603d331223
2: 0x2e0f1 - std::panicking::begin_panic_handler::_$u7b$$u7b$closure$u7d$$u7d$::h8ab6ee68d5b4c391
3: 0x2e059 - std::sys_common::backtrace::__rust_end_short_backtrace::h008f69666d134159
4: 0x1f5cc - rust_begin_unwind
5: 0x40ed - core::panicking::panic_fmt::h1d17fc068f528130
6: 0xc34f - core::panicking::panic::h27b5eefa3e4ff738
7: 0x1802e - _$LT$aws_appmesh_aggregate_stats..http..egress..EgressHttpStatsFilter$u20$as$u20$amzn_appmesh_wasm..plugin..Filter$GT$::on_done::h90cbcd72c8e2cb58
8: 0xc3f3 - _$LT$amzn_appmesh_wasm..plugin..context..http..HttpFilterContext$u20$as$u20$proxy_wasm..traits..Context$GT$::on_done::h30819f4e590de618
9: 0x26138 - proxy_on_done
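
For illustration only, here is a minimal Rust sketch of the suspected failure mode. It is not the actual amzn-appmesh-aggregate-stats source, and the struct, field, and method names are invented. It shows how an `Option::unwrap()` in an `on_done`-style callback panics when per-request metadata was never populated (for example, when a malformed request is rejected before the data the filter expects exists), and how a checked variant avoids the panic:

```rust
// Hypothetical sketch of the suspected failure mode; not the actual
// amzn-appmesh-aggregate-stats source. All names here are invented.

struct EgressStats {
    // Normally populated during the request/response callbacks; may remain
    // None if the request is rejected early (e.g. a malformed request that
    // is answered with 400 before the expected metadata exists).
    upstream_cluster: Option<String>,
    response_code: Option<u32>,
}

impl EgressStats {
    // Mirrors the crash in the log: unwrapping metadata that was never set
    // aborts with "called `Option::unwrap()` on a `None` value".
    #[allow(dead_code)]
    fn metric_key_unwrap(&self) -> String {
        format!(
            "{}_{}",
            self.upstream_cluster.as_ref().unwrap(),
            self.response_code.unwrap()
        )
    }

    // Defensive variant: return None (and skip emitting the stat) instead
    // of panicking when the metadata is missing.
    fn metric_key_checked(&self) -> Option<String> {
        let cluster = self.upstream_cluster.as_ref()?;
        let code = self.response_code?;
        Some(format!("{}_{}", cluster, code))
    }
}

fn main() {
    let stats = EgressStats {
        upstream_cluster: None,
        response_code: Some(400),
    };
    // metric_key_unwrap() would panic here; the checked variant just skips.
    assert!(stats.metric_key_checked().is_none());
}
```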

Steps to Reproduce
What are the steps you can take to reproduce this issue?
No idea. It seems to happen when a malformed request is made by some bot or attacker.

Are you currently working around this issue?
How are you currently solving this problem?
We are not.

Additional context
Anything else we should know?
I saw this log in both v1.25.1.0 and v1.27.2.0.

Attachments
If you think you might have additional information that you'd like to include via an attachment, please do - we'll take a look. (Remember to remove any personally-identifiable information.)
I see requests like [2024-01-13T17:32:43.610Z] "GET /oauth/lookup-ip HTTP/1.1" 400 - 0 0 22 21 "1 ����%2527%2522, 188.241.177.125, 130.176.91.135,127.0.0.1" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36" "6c222c47-5c31-9b08-820c-edca79a3f90b" "production--oauth--833ed5a3.prod.mesh" "127.0.0.1:80" with a bad X-Forwarded-For around the same time as this panic, which might be relevant.
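
Purely as a hypothesis (it is not known whether the plugin parses X-Forwarded-For at all), a garbled first entry like the one in that access log would make any unwrapped parse of the header panic in the same way. A hedged sketch of parsing such a header defensively, with an invented helper name:

```rust
// Hypothetical illustration: if a filter parsed X-Forwarded-For entries and
// unwrapped the result, a garbled first entry like "1 ...%2527%2522" would
// panic just like the log above. Skipping invalid entries avoids that.
use std::net::IpAddr;

fn first_valid_client_ip(xff: &str) -> Option<IpAddr> {
    xff.split(',')
        .map(str::trim)
        // Ignore entries that are not valid IP addresses instead of unwrapping.
        .find_map(|entry| entry.parse::<IpAddr>().ok())
}

fn main() {
    let xff = "1 garbage%2527%2522, 188.241.177.125, 130.176.91.135,127.0.0.1";
    assert_eq!(
        first_valid_client_ip(xff),
        "188.241.177.125".parse::<IpAddr>().ok()
    );
}
```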

@ytsssun

ytsssun commented Jan 16, 2024

Hi @tnsardesai, I see the stack trace is showing a different line of code for the failure.

amzn-appmesh-aggregate-stats/src/http/egress.rs:172:9

For this log, do you know which specific version of Envoy it is associated with? I agree this may be a separate issue that we need to address apart from the original issue #484.

Another question: are you seeing this error in older versions of Envoy as well?

@tnsardesai
Author

Oh, I didn't notice the difference in line number. Looking back at our logs:

In v1.25.1.0 we see:

[2024-01-13 17:31:01.255][82][critical][wasm] [source/extensions/common/wasm/context.cc:1157] wasm log aws.appmesh.egress_http_stats: panicked at 'called `Option::unwrap()` on a `None` value', amzn-appmesh-aggregate-stats/src/http/egress.rs:172:9
[2024-01-13 17:31:01.256][82][error][wasm] [source/extensions/common/wasm/wasm_vm.cc:38] Function: proxy_on_done failed: Uncaught RuntimeError: unreachable
Proxy-Wasm plugin in-VM backtrace:
0:  0x2db69 - rust_panic
1:  0x2c9a3 - std::panicking::rust_panic_with_hook::hbafe3e603d331223
2:  0x2e0f1 - std::panicking::begin_panic_handler::_$u7b$$u7b$closure$u7d$$u7d$::h8ab6ee68d5b4c391
3:  0x2e059 - std::sys_common::backtrace::__rust_end_short_backtrace::h008f69666d134159
4:  0x1f5cc - rust_begin_unwind
5:  0x40ed - core::panicking::panic_fmt::h1d17fc068f528130
6:  0xc34f - core::panicking::panic::h27b5eefa3e4ff738
7:  0x1802e - _$LT$aws_appmesh_aggregate_stats..http..egress..EgressHttpStatsFilter$u20$as$u20$amzn_appmesh_wasm..plugin..Filter$GT$::on_done::h90cbcd72c8e2cb58
8:  0xc3f3 - _$LT$amzn_appmesh_wasm..plugin..context..http..HttpFilterContext$u20$as$u20$proxy_wasm..traits..Context$GT$::on_done::h30819f4e590de618
9:  0x26138 - proxy_on_done

In v1.27.2.0 we see:

[2024-01-10 11:39:18.189][87][critical][wasm] [source/extensions/common/wasm/context.cc:1157] wasm log aws.appmesh.egress_http_stats: panicked at amzn-appmesh-aggregate-stats/src/http/egress.rs:302:29:
called `Option::unwrap()` on a `None` value
[2024-01-10 11:39:18.189][87][error][wasm] [source/extensions/common/wasm/wasm_vm.cc:38] Function: proxy_on_done failed: Uncaught RuntimeError: unreachable
Proxy-Wasm plugin in-VM backtrace:
0:  0x385ad - rust_panic
1:  0x37842 - std::panicking::rust_panic_with_hook::h3aa054d35a0817d7
2:  0x38a24 - std::panicking::begin_panic_handler::_$u7b$$u7b$closure$u7d$$u7d$::h2f73e4cf6cd6319a
3:  0x3898f - std::sys_common::backtrace::__rust_end_short_backtrace::h98ac61a6abbff7e9
4:  0x24033 - rust_begin_unwind
5:  0x24d1 - core::panicking::panic_fmt::h3e1dd3d08288569e
6:  0xced1 - core::panicking::panic::h0f0c05b20da93dd7
7:  0x1179d - _$LT$aws_appmesh_aggregate_stats..http..egress..EgressHttpStatsFilter$u20$as$u20$amzn_appmesh_wasm..plugin..Filter$GT$::on_done::h8b7afc1a5cddb214
8:  0xcfa5 - _$LT$amzn_appmesh_wasm..plugin..context..http..HttpFilterContext$u20$as$u20$proxy_wasm..traits..Context$GT$::on_done::h61f448a25b9c1222
9:  0x2be71 - proxy_on_done

I looked back at logs from November (when our applications were all on v1.24.0.0), and I actually see a similar log from that version too:

[2023-12-05 04:15:05.184][86][critical][wasm] [source/extensions/common/wasm/context.cc:1189] wasm log aws.appmesh.egress_http_stats: panicked at 'called `Option::unwrap()` on a `None` value', amzn-appmesh-aggregate-stats/src/http/egress.rs:172:29
[2023-12-05 04:15:05.185][86][error][wasm] [source/extensions/common/wasm/wasm_vm.cc:38] Function: proxy_on_done failed: Uncaught RuntimeError: unreachable
Proxy-Wasm plugin in-VM backtrace:
0:  0x2ed10 - rust_panic
1:  0x2d5d8 - std::panicking::rust_panic_with_hook::hb09154fa23e06c37
2:  0x2f1a6 - std::panicking::begin_panic_handler::_$u7b$$u7b$closure$u7d$$u7d$::h6091c197f0d08bf0
3:  0x2f11b - std::sys_common::backtrace::__rust_end_short_backtrace::h004afb3e6a867c40
4:  0x2029e - rust_begin_unwind
5:  0x24b5 - core::panicking::panic_fmt::h9e229748e3ae9f9d
6:  0x9db4 - core::panicking::panic::h6f5024a57ca8da86
7:  0x11d54 - _$LT$aws_appmesh_aggregate_stats..http..egress..EgressHttpStatsFilter$u20$as$u20$amzn_appmesh_wasm..plugin..Filter$GT$::on_done::hc616f22dcbbc1095
8:  0x9e58 - _$LT$amzn_appmesh_wasm..plugin..context..http..HttpFilterContext$u20$as$u20$proxy_wasm..traits..Context$GT$::on_done::h2e26b9d53a795a82
9:  0x26f7c - proxy_on_done

@ytsssun

ytsssun commented Jan 17, 2024

Thanks for getting back with the information so quickly. Yeah, this looks like an issue that has existed for a long time. We have found the corresponding code. We are working on a fix for this along with #484.

One question: was this issue affecting your fleet in any way, since it's been there for a while?

@tnsardesai
Author

> Thanks for getting back with the information so quickly. Yeah, this looks like an issue that has existed for a long time. We have found the corresponding code. We are working on a fix for this along with #484.
>
> One question: was this issue affecting your fleet in any way, since it's been there for a while?

The issue in #484 resulted in metrics getting dropped. Our setup is similar to https://github.com/aws/aws-app-mesh-examples/tree/main/walkthroughs/howto-metrics-extension-ecs#optional-filtering-metrics-with-the-aws-distro-for-opentelemetry. If I had to guess, this issue also caused something similar, but we never noticed it because the impact was very low. Looking back at our logs, I see this panic log at most 3 times for a single service in one day, so between scaling events and deployments we never noticed any impact.

@liubnu

liubnu commented Feb 28, 2024

This issue is fixed by Envoy release v1.27.3.0 (#486).

liubnu closed this as completed Feb 28, 2024