
non-exclusive-tests failing on AWS due to exceeding user data limit #1030

Closed
jlebon opened this issue Nov 25, 2021 · 12 comments
@jlebon (Member) commented Nov 25, 2021

```
19:36:35  --- FAIL: non-exclusive-tests (1.34s)
19:36:35          harness.go:1222: Cluster failed starting machines: error running instances: InvalidParameterValue: User data is limited to 16384 bytes
19:36:35  	status code: 400, request id: 683a1a1a-baee-40ad-94b6-c1acefa99df5
```

We could try experimenting with inline compression, but that would only delay the issue. Maybe we can just upload the final config to S3 and provision with a pointer config?
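For illustration, here's a minimal sketch (using aws-sdk-go, which kola already depends on) of what an upload-and-point flow could look like; the helper and the bucket/key names are hypothetical, not anything kola implements today. Ignition accepts s3:// URLs in config.replace, so the pointer stays tiny no matter how large the real config grows:

```go
package main

import (
	"bytes"
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

// uploadConfigAndBuildPointer uploads the rendered Ignition config to S3 and
// returns a small pointer config telling Ignition to replace itself with the
// uploaded one. Hypothetical helper for illustration only.
func uploadConfigAndBuildPointer(sess *session.Session, bucket, key string, conf []byte) (string, error) {
	if _, err := s3.New(sess).PutObject(&s3.PutObjectInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
		Body:   bytes.NewReader(conf),
	}); err != nil {
		return "", err
	}
	// Ignition natively understands s3:// sources in config.replace.
	return fmt.Sprintf(`{"ignition":{"version":"3.0.0","config":{"replace":{"source":"s3://%s/%s"}}}}`, bucket, key), nil
}

func main() {
	sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-east-1")}))
	ptr, err := uploadConfigAndBuildPointer(sess, "example-kola-bucket", "userdata/config.ign", []byte(`{"ignition":{"version":"3.0.0"}}`))
	if err != nil {
		fmt.Println("upload failed:", err)
		return
	}
	fmt.Println(ptr) // this tiny pointer is what would be passed as user data
}
```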

@saqibali-2k (Member)

Uploading the final config to S3 seems like the more robust option. Down the road, the user data could still grow to a point where even the compressed size is too large.

@jlebon (Member, Author) commented Nov 25, 2021

Copying from IRC:

```
14:13:47 < jlebon> hmm, the problem with this though, is that it adds another parameter we'd need to pass when running `kola run -p aws`
14:14:28 < jlebon> which isn't the end of the world, though it does increase friction a bit
14:16:44 < saqali__> Could we default to a specific bucket and region every time to avoid passing in the extra parameter(s)?
14:18:32 < jlebon> seems weird to hardcode a FCOS community account bucket name in kola
14:20:04 < jlebon> let's brainstorm a bit more
```

Another approach I guess is to commit to coreos/coreos-assembler#2516 and also bucketize based on user data size? OTOH, I think on principle we should be able to handle this because in theory a single test on its own could have a really large user data. IOW, this isn't about non-exclusive-tests specifically, but could affect any kola test.

@jlebon (Member, Author) commented Nov 25, 2021

The other thing with the S3 approach is that the IAM account used to run kola tests would then need both permissions to launch instances and S3 upload permissions.

Anyway, not sure at this point what's cleaner. Kinda leaning more towards building on top of coreos/coreos-assembler#2516 for now since we're only hitting it there. But interested in other opinions!

Re. inline compression, it could make sense to still do that even if we have a mechanism for handling overflow so that we don't engage that mechanism as often.
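As a rough sketch of what inline compression could look like (Ignition can consume gzip-compressed configs, and the EC2 RunInstances API expects user data base64-encoded on the wire); this is illustrative, not the exact change that later landed:

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/base64"
	"fmt"
)

// compressUserData gzips the rendered config and base64-encodes the result
// for the EC2 RunInstances API. Ignition detects and decompresses gzipped
// configs on its own.
func compressUserData(conf []byte) (string, error) {
	var buf bytes.Buffer
	gz := gzip.NewWriter(&buf)
	if _, err := gz.Write(conf); err != nil {
		return "", err
	}
	if err := gz.Close(); err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(buf.Bytes()), nil
}

func main() {
	out, err := compressUserData([]byte(`{"ignition":{"version":"3.0.0"}}`))
	if err != nil {
		panic(err)
	}
	fmt.Printf("compressed+encoded user data: %d bytes\n", len(out))
}
```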

@saqibali-2k (Member)

> OTOH, I think on principle we should be able to handle this because in theory a single test on its own could have a really large user data. IOW, this isn't about non-exclusive-tests specifically, but could affect any kola test.

I agree, I think this is still worth taking a look at.

> Re. inline compression, it could make sense to still do that even if we have a mechanism for handling overflow

I think maybe we should resort to inline compression right now to unblock non-exclusive tests, and then we can look towards handling overflow.

@jlebon (Member, Author) commented Nov 25, 2021

> Re. inline compression, it could make sense to still do that even if we have a mechanism for handling overflow

> I think maybe we should resort to inline compression right now to unblock non-exclusive tests, and then we can look towards handling overflow.

Ack, SGTM!

@jlebon (Member, Author) commented Dec 6, 2021

This looks fixed now! @saqibali-2k from your testing while hacking on coreos/coreos-assembler#2588, do you know how large the compressed version of the non-exclusive-tests userdata was? Just to know how far we've kicked this problem down the road.

@saqibali-2k (Member) commented Dec 8, 2021

@jlebon Sorry for the late reply!

The compressed size of the base64-encoded string is 5,828 bytes.

For the long term solution, I was thinking that we could just go with your earlier suggestion of adapting coreos/coreos-assembler#2516 to further constrain buckets depending on config size. For any test that is exclusive and has a config size that is too large, we could add a remote config the way it was done in this PR: coreos/fedora-coreos-config#1232.
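To make the bucketizing idea concrete, here's a hypothetical sketch of size-aware grouping; the greedy packing and the byte-sum approximation of merged config size are assumptions for illustration, not kola's actual logic:

```go
package main

import "fmt"

// bucketizeBySize greedily packs per-test configs into groups whose combined
// size stays under the platform's user data limit. Hypothetical helper: it
// approximates the size of merged Ignition configs by summing raw sizes.
// Oversized singletons still get their own bucket; those would need the
// remote-config fallback described above.
func bucketizeBySize(configs [][]byte, limit int) [][][]byte {
	var buckets [][][]byte
	var cur [][]byte
	size := 0
	for _, c := range configs {
		if size+len(c) > limit && len(cur) > 0 {
			buckets = append(buckets, cur)
			cur, size = nil, 0
		}
		cur = append(cur, c)
		size += len(c)
	}
	if len(cur) > 0 {
		buckets = append(buckets, cur)
	}
	return buckets
}

func main() {
	configs := [][]byte{make([]byte, 9000), make([]byte, 9000), make([]byte, 2000)}
	for i, b := range bucketizeBySize(configs, 16384) {
		fmt.Printf("bucket %d: %d configs\n", i, len(b))
	}
}
```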

@dustymabe (Member)

Thanks for the info @saqibali-2k.

@jlebon @saqibali-2k should we close this?

@saqibali-2k (Member)

Compressing the configs has only delayed the problem, so I think this should stay open until we land (or at least decide on) a permanent fix.

@dustymabe (Member)

> For the long term solution, I was thinking that we could just go with your earlier suggestion of adapting coreos/coreos-assembler#2516 to further constrain buckets depending on config size. For any test that is exclusive and has a config size that is too large, we could add a remote config the way it was done in this PR: coreos/fedora-coreos-config#1232.

I think that sounds like an excellent plan. Makes a ton of sense. I guess for coreos/coreos-assembler#2516 we'd need to be aware of the size restrictions on each platform (alternatively just limit to the lowest value on any platform) so we can make that decision. One good way to test this once we implement it would be to disable compression on AWS and see if it properly splits things into multiple buckets.
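A sketch of how the per-platform limits could be tabulated. Only AWS's 16384-byte cap is confirmed by the error in this issue; the other values are assumptions that would need to be verified against each cloud's docs, and limitFor's fallback implements the "limit to the lowest value on any platform" alternative:

```go
package main

import "fmt"

// userDataLimits maps kola platform names to user data caps in bytes. Only
// the AWS value is confirmed by this issue; the rest are assumptions for
// illustration only.
var userDataLimits = map[string]int{
	"aws":       16384,  // from the InvalidParameterValue error above
	"azure":     65536,  // assumption
	"gcp":       262144, // assumption
	"openstack": 65535,  // assumption
}

// limitFor falls back to the most conservative known cap for unknown
// platforms, i.e. the "limit to the lowest value on any platform" option.
func limitFor(platform string) int {
	if l, ok := userDataLimits[platform]; ok {
		return l
	}
	lowest := userDataLimits["aws"]
	for _, l := range userDataLimits {
		if l < lowest {
			lowest = l
		}
	}
	return lowest
}

func main() {
	fmt.Println(limitFor("aws"))     // 16384
	fmt.Println(limitFor("unknown")) // falls back to the lowest cap
}
```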

@jlebon (Member, Author) commented Dec 9, 2021

@saqibali-2k WDYT about tracking the long-term fix in a separate issue and closing this one? It would make things easier to follow for others I think.

@saqibali-2k (Member) commented Dec 9, 2021

> @saqibali-2k WDYT about tracking the long-term fix in a separate issue and closing this one? It would make things easier to follow for others I think.

Sounds like a good idea!
