
Bug: wasm log aws.appmesh.egress_http_stats: panicked at 'called Option::unwrap() on a None value', amzn-appmesh-aggregate-stats/src/http/egress.rs:172:9 #485

Closed
tnsardesai opened this issue Jan 16, 2024 · 5 comments
Labels: Bug

Comments

@tnsardesai

Summary
What are you observing that doesn't seem right?
We are seeing a panic during normal operations, which is not expected. This panic leads to metrics getting dropped.

[2024-01-11 22:24:28.980][91][critical][wasm] [source/extensions/common/wasm/context.cc:1157] wasm log aws.appmesh.egress_http_stats: panicked at 'called `Option::unwrap()` on a `None` value', amzn-appmesh-aggregate-stats/src/http/egress.rs:172:9
[2024-01-11 22:24:28.980][91][error][wasm] [source/extensions/common/wasm/wasm_vm.cc:38] Function: proxy_on_done failed: Uncaught RuntimeError: unreachable
Proxy-Wasm plugin in-VM backtrace:
0: 0x2db69 - rust_panic
1: 0x2c9a3 - std::panicking::rust_panic_with_hook::hbafe3e603d331223
2: 0x2e0f1 - std::panicking::begin_panic_handler::_$u7b$$u7b$closure$u7d$$u7d$::h8ab6ee68d5b4c391
3: 0x2e059 - std::sys_common::backtrace::__rust_end_short_backtrace::h008f69666d134159
4: 0x1f5cc - rust_begin_unwind
5: 0x40ed - core::panicking::panic_fmt::h1d17fc068f528130
6: 0xc34f - core::panicking::panic::h27b5eefa3e4ff738
7: 0x1802e - _$LT$aws_appmesh_aggregate_stats..http..egress..EgressHttpStatsFilter$u20$as$u20$amzn_appmesh_wasm..plugin..Filter$GT$::on_done::h90cbcd72c8e2cb58
8: 0xc3f3 - _$LT$amzn_appmesh_wasm..plugin..context..http..HttpFilterContext$u20$as$u20$proxy_wasm..traits..Context$GT$::on_done::h30819f4e590de618
9: 0x26138 - proxy_on_done
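
For illustration only, here is a minimal Rust sketch of the suspected failure mode. It is not the actual amzn-appmesh-aggregate-stats source, and the struct, field, and method names are invented. It shows how an `Option::unwrap()` in an `on_done`-style callback panics when per-request metadata was never populated (for example, when a malformed request is rejected before the data the filter expects exists), and how a checked variant avoids the panic:

```rust
// Hypothetical sketch of the suspected failure mode; not the actual
// amzn-appmesh-aggregate-stats source. All names here are invented.

struct EgressStats {
    // Normally populated during the request/response callbacks; may remain
    // None if the request is rejected early (e.g. a malformed request that
    // is answered with 400 before the expected metadata exists).
    upstream_cluster: Option<String>,
    response_code: Option<u32>,
}

impl EgressStats {
    // Mirrors the crash in the log: unwrapping metadata that was never set
    // aborts with "called `Option::unwrap()` on a `None` value".
    #[allow(dead_code)]
    fn metric_key_unwrap(&self) -> String {
        format!(
            "{}_{}",
            self.upstream_cluster.as_ref().unwrap(),
            self.response_code.unwrap()
        )
    }

    // Defensive variant: return None (and skip emitting the stat) instead
    // of panicking when the metadata is missing.
    fn metric_key_checked(&self) -> Option<String> {
        let cluster = self.upstream_cluster.as_ref()?;
        let code = self.response_code?;
        Some(format!("{}_{}", cluster, code))
    }
}

fn main() {
    let stats = EgressStats {
        upstream_cluster: None,
        response_code: Some(400),
    };
    // metric_key_unwrap() would panic here; the checked variant just skips.
    assert!(stats.metric_key_checked().is_none());
}
```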

Steps to Reproduce
What are the steps you can take to reproduce this issue?
No idea. It seems to happen when a malformed request is made by some bot or attacker.

Are you currently working around this issue?
How are you currently solving this problem?
We are not.

Additional context
Anything else we should know?
I saw this log in both v1.25.1.0 and v1.27.2.0.

Attachments
If you think you might have additional information that you'd like to include via an attachment, please do - we'll take a look. (Remember to remove any personally-identifiable information.)
I see requests like [2024-01-13T17:32:43.610Z] "GET /oauth/lookup-ip HTTP/1.1" 400 - 0 0 22 21 "1 ����%2527%2522, 188.241.177.125, 130.176.91.135,127.0.0.1" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36" "6c222c47-5c31-9b08-820c-edca79a3f90b" "production--oauth--833ed5a3.prod.mesh" "127.0.0.1:80" with a bad X-Forwarded-For around the same time as this panic, which might be relevant.
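
Purely as a hypothesis (it is not known whether the plugin parses X-Forwarded-For at all), a garbled first entry like the one in that access log would make any unwrapped parse of the header panic in the same way. A hedged sketch of parsing such a header defensively, with an invented helper name:

```rust
// Hypothetical illustration: if a filter parsed X-Forwarded-For entries and
// unwrapped the result, a garbled first entry like "1 ...%2527%2522" would
// panic just like the log above. Skipping invalid entries avoids that.
use std::net::IpAddr;

fn first_valid_client_ip(xff: &str) -> Option<IpAddr> {
    xff.split(',')
        .map(str::trim)
        // Ignore entries that are not valid IP addresses instead of unwrapping.
        .find_map(|entry| entry.parse::<IpAddr>().ok())
}

fn main() {
    let xff = "1 garbage%2527%2522, 188.241.177.125, 130.176.91.135,127.0.0.1";
    assert_eq!(
        first_valid_client_ip(xff),
        "188.241.177.125".parse::<IpAddr>().ok()
    );
}
```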

@ytsssun

ytsssun commented Jan 16, 2024

Hi @tnsardesai, I see the stack trace is showing a different line of code for the failure.

amzn-appmesh-aggregate-stats/src/http/egress.rs:172:9

For this log, do you know which specific version of Envoy it is associated with? I agree this may be a separate issue that we need to address apart from the original issue #484.

Another question: are you seeing this error in older versions of Envoy as well?

@tnsardesai
Author

Oh, I didn't notice the difference in line number. Looking back at our logs:

In v1.25.1.0 we see:

[2024-01-13 17:31:01.255][82][critical][wasm] [source/extensions/common/wasm/context.cc:1157] wasm log aws.appmesh.egress_http_stats: panicked at 'called `Option::unwrap()` on a `None` value', amzn-appmesh-aggregate-stats/src/http/egress.rs:172:9
[2024-01-13 17:31:01.256][82][error][wasm] [source/extensions/common/wasm/wasm_vm.cc:38] Function: proxy_on_done failed: Uncaught RuntimeError: unreachable
Proxy-Wasm plugin in-VM backtrace:
0:  0x2db69 - rust_panic
1:  0x2c9a3 - std::panicking::rust_panic_with_hook::hbafe3e603d331223
2:  0x2e0f1 - std::panicking::begin_panic_handler::_$u7b$$u7b$closure$u7d$$u7d$::h8ab6ee68d5b4c391
3:  0x2e059 - std::sys_common::backtrace::__rust_end_short_backtrace::h008f69666d134159
4:  0x1f5cc - rust_begin_unwind
5:  0x40ed - core::panicking::panic_fmt::h1d17fc068f528130
6:  0xc34f - core::panicking::panic::h27b5eefa3e4ff738
7:  0x1802e - _$LT$aws_appmesh_aggregate_stats..http..egress..EgressHttpStatsFilter$u20$as$u20$amzn_appmesh_wasm..plugin..Filter$GT$::on_done::h90cbcd72c8e2cb58
8:  0xc3f3 - _$LT$amzn_appmesh_wasm..plugin..context..http..HttpFilterContext$u20$as$u20$proxy_wasm..traits..Context$GT$::on_done::h30819f4e590de618
9:  0x26138 - proxy_on_done

In v1.27.2.0 we see:

[2024-01-10 11:39:18.189][87][critical][wasm] [source/extensions/common/wasm/context.cc:1157] wasm log aws.appmesh.egress_http_stats: panicked at amzn-appmesh-aggregate-stats/src/http/egress.rs:302:29:
called `Option::unwrap()` on a `None` value
[2024-01-10 11:39:18.189][87][error][wasm] [source/extensions/common/wasm/wasm_vm.cc:38] Function: proxy_on_done failed: Uncaught RuntimeError: unreachable
Proxy-Wasm plugin in-VM backtrace:
0:  0x385ad - rust_panic
1:  0x37842 - std::panicking::rust_panic_with_hook::h3aa054d35a0817d7
2:  0x38a24 - std::panicking::begin_panic_handler::_$u7b$$u7b$closure$u7d$$u7d$::h2f73e4cf6cd6319a
3:  0x3898f - std::sys_common::backtrace::__rust_end_short_backtrace::h98ac61a6abbff7e9
4:  0x24033 - rust_begin_unwind
5:  0x24d1 - core::panicking::panic_fmt::h3e1dd3d08288569e
6:  0xced1 - core::panicking::panic::h0f0c05b20da93dd7
7:  0x1179d - _$LT$aws_appmesh_aggregate_stats..http..egress..EgressHttpStatsFilter$u20$as$u20$amzn_appmesh_wasm..plugin..Filter$GT$::on_done::h8b7afc1a5cddb214
8:  0xcfa5 - _$LT$amzn_appmesh_wasm..plugin..context..http..HttpFilterContext$u20$as$u20$proxy_wasm..traits..Context$GT$::on_done::h61f448a25b9c1222
9:  0x2be71 - proxy_on_done

I looked back at logs from November (when our applications were all on v1.24.0.0), and I actually see a similar log from that version too:

[2023-12-05 04:15:05.184][86][critical][wasm] [source/extensions/common/wasm/context.cc:1189] wasm log aws.appmesh.egress_http_stats: panicked at 'called `Option::unwrap()` on a `None` value', amzn-appmesh-aggregate-stats/src/http/egress.rs:172:29
[2023-12-05 04:15:05.185][86][error][wasm] [source/extensions/common/wasm/wasm_vm.cc:38] Function: proxy_on_done failed: Uncaught RuntimeError: unreachable
Proxy-Wasm plugin in-VM backtrace:
0:  0x2ed10 - rust_panic
1:  0x2d5d8 - std::panicking::rust_panic_with_hook::hb09154fa23e06c37
2:  0x2f1a6 - std::panicking::begin_panic_handler::_$u7b$$u7b$closure$u7d$$u7d$::h6091c197f0d08bf0
3:  0x2f11b - std::sys_common::backtrace::__rust_end_short_backtrace::h004afb3e6a867c40
4:  0x2029e - rust_begin_unwind
5:  0x24b5 - core::panicking::panic_fmt::h9e229748e3ae9f9d
6:  0x9db4 - core::panicking::panic::h6f5024a57ca8da86
7:  0x11d54 - _$LT$aws_appmesh_aggregate_stats..http..egress..EgressHttpStatsFilter$u20$as$u20$amzn_appmesh_wasm..plugin..Filter$GT$::on_done::hc616f22dcbbc1095
8:  0x9e58 - _$LT$amzn_appmesh_wasm..plugin..context..http..HttpFilterContext$u20$as$u20$proxy_wasm..traits..Context$GT$::on_done::h2e26b9d53a795a82
9:  0x26f7c - proxy_on_done

@ytsssun

ytsssun commented Jan 17, 2024

Thanks for getting back with the information so quickly. Yeah, this looks like an issue that has existed for a long time. We have found the corresponding code. We are working on a fix for this along with #484.

One question: was this issue affecting your fleet in any way, since it's been there for a while?

@tnsardesai
Author

> Thanks for getting back with the information so quickly. Yeah, this looks like an issue that has existed for a long time. We have found the corresponding code. We are working on a fix for this along with #484.
>
> One question: was this issue affecting your fleet in any way, since it's been there for a while?

The issue in #484 resulted in metrics getting dropped. Our setup is similar to https://github.com/aws/aws-app-mesh-examples/tree/main/walkthroughs/howto-metrics-extension-ecs#optional-filtering-metrics-with-the-aws-distro-for-opentelemetry. If I had to guess, this issue also caused something similar, but we never noticed it because the impact was very low. Looking back at our logs, I see this panic log at most 3 times for a single service in one day, so between scaling events and deployments we never noticed any impact.

@liubnu

liubnu commented Feb 28, 2024

This issue is fixed by Envoy release v1.27.3.0 (#486).

liubnu closed this as completed Feb 28, 2024