
non-exclusive-tests failing on AWS due to exceeding user data limit #1030

Closed
jlebon opened this issue Nov 25, 2021 · 12 comments
@jlebon (Member) commented Nov 25, 2021

```
19:36:35  --- FAIL: non-exclusive-tests (1.34s)
19:36:35          harness.go:1222: Cluster failed starting machines: error running instances: InvalidParameterValue: User data is limited to 16384 bytes
19:36:35  	status code: 400, request id: 683a1a1a-baee-40ad-94b6-c1acefa99df5
```

We could try experimenting with inline compression, but that would only delay the issue. Maybe we can just upload the final config to S3 and provision with a pointer config?
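For illustration, here's a minimal sketch (using aws-sdk-go, which kola already depends on) of what an upload-and-point flow could look like; the helper and the bucket/key names are hypothetical, not anything kola implements today. Ignition accepts s3:// URLs in config.replace, so the pointer stays tiny no matter how large the real config grows:

```go
package main

import (
	"bytes"
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

// uploadConfigAndBuildPointer uploads the rendered Ignition config to S3 and
// returns a small pointer config telling Ignition to replace itself with the
// uploaded one. Hypothetical helper for illustration only.
func uploadConfigAndBuildPointer(sess *session.Session, bucket, key string, conf []byte) (string, error) {
	if _, err := s3.New(sess).PutObject(&s3.PutObjectInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
		Body:   bytes.NewReader(conf),
	}); err != nil {
		return "", err
	}
	// Ignition natively understands s3:// sources in config.replace.
	return fmt.Sprintf(`{"ignition":{"version":"3.0.0","config":{"replace":{"source":"s3://%s/%s"}}}}`, bucket, key), nil
}

func main() {
	sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-east-1")}))
	ptr, err := uploadConfigAndBuildPointer(sess, "example-kola-bucket", "userdata/config.ign", []byte(`{"ignition":{"version":"3.0.0"}}`))
	if err != nil {
		fmt.Println("upload failed:", err)
		return
	}
	fmt.Println(ptr) // this tiny pointer is what would be passed as user data
}
```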

@saqibali-2k (Member)

Uploading the final config to S3 seems like the more robust option. Down the road, the user data could still grow to a point where even the compressed size is too large.

@jlebon (Member, Author) commented Nov 25, 2021

Copying from IRC:

```
14:13:47 < jlebon> hmm, the problem with this though, is that it adds another parameter we'd need to pass when running `kola run -p aws`
14:14:28 < jlebon> which isn't the end of the world, though it does increase friction a bit
14:16:44 < saqali__> Could we default to a specific bucket and region every time to avoid passing in the extra parameter(s)?
14:18:32 < jlebon> seems weird to hardcode a FCOS community account bucket name in kola
14:20:04 < jlebon> let's brainstorm a bit more
```

Another approach I guess is to commit to coreos/coreos-assembler#2516 and also bucketize based on user data size? OTOH, I think on principle we should be able to handle this because in theory a single test on its own could have a really large user data. IOW, this isn't about non-exclusive-tests specifically, but could affect any kola test.

@jlebon (Member, Author) commented Nov 25, 2021

The other thing with the S3 approach is that the IAM account used to run kola tests would then need both permissions to launch instances and S3 upload permissions.

Anyway, not sure at this point what's cleaner. Kinda leaning more towards building on top of coreos/coreos-assembler#2516 for now since we're only hitting it there. But interested in other opinions!

Re. inline compression, it could make sense to still do that even if we have a mechanism for handling overflow so that we don't engage that mechanism as often.
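As a rough sketch of what inline compression could look like (Ignition can consume gzip-compressed configs, and the EC2 RunInstances API expects user data base64-encoded on the wire); this is illustrative, not the exact change that later landed:

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/base64"
	"fmt"
)

// compressUserData gzips the rendered config and base64-encodes the result
// for the EC2 RunInstances API. Ignition detects and decompresses gzipped
// configs on its own.
func compressUserData(conf []byte) (string, error) {
	var buf bytes.Buffer
	gz := gzip.NewWriter(&buf)
	if _, err := gz.Write(conf); err != nil {
		return "", err
	}
	if err := gz.Close(); err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(buf.Bytes()), nil
}

func main() {
	out, err := compressUserData([]byte(`{"ignition":{"version":"3.0.0"}}`))
	if err != nil {
		panic(err)
	}
	fmt.Printf("compressed+encoded user data: %d bytes\n", len(out))
}
```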

@saqibali-2k (Member)

> OTOH, I think on principle we should be able to handle this because in theory a single test on its own could have a really large user data. IOW, this isn't about non-exclusive-tests specifically, but could affect any kola test.

I agree, I think this is still worth taking a look at.

> Re. inline compression, it could make sense to still do that even if we have a mechanism for handling overflow

I think maybe we should resort to inline compression right now to unblock non-exclusive tests, and then we can look towards handling overflow.

@jlebon (Member, Author) commented Nov 25, 2021

> Re. inline compression, it could make sense to still do that even if we have a mechanism for handling overflow

> I think maybe we should resort to inline compression right now to unblock non-exclusive tests, and then we can look towards handling overflow.

Ack, SGTM!

@jlebon (Member, Author) commented Dec 6, 2021

This looks fixed now! @saqibali-2k from your testing while hacking on coreos/coreos-assembler#2588, do you know how large the compressed version of the non-exclusive-tests userdata was? Just to know how far we've kicked this problem down the road.

@saqibali-2k (Member) commented Dec 8, 2021

@jlebon Sorry for the late reply!

The compressed size of the base64-encoded string is 5,828 bytes.

For the long term solution, I was thinking that we could just go with your earlier suggestion of adapting coreos/coreos-assembler#2516 to further constrain buckets depending on config size. For any test that is exclusive and has a config size that is too large, we could add a remote config the way it was done in this PR: coreos/fedora-coreos-config#1232.
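To make the bucketizing idea concrete, here's a hypothetical sketch of size-aware grouping; the greedy packing and the byte-sum approximation of merged config size are assumptions for illustration, not kola's actual logic:

```go
package main

import "fmt"

// bucketizeBySize greedily packs per-test configs into groups whose combined
// size stays under the platform's user data limit. Hypothetical helper: it
// approximates the size of merged Ignition configs by summing raw sizes.
// Oversized singletons still get their own bucket; those would need the
// remote-config fallback described above.
func bucketizeBySize(configs [][]byte, limit int) [][][]byte {
	var buckets [][][]byte
	var cur [][]byte
	size := 0
	for _, c := range configs {
		if size+len(c) > limit && len(cur) > 0 {
			buckets = append(buckets, cur)
			cur, size = nil, 0
		}
		cur = append(cur, c)
		size += len(c)
	}
	if len(cur) > 0 {
		buckets = append(buckets, cur)
	}
	return buckets
}

func main() {
	configs := [][]byte{make([]byte, 9000), make([]byte, 9000), make([]byte, 2000)}
	for i, b := range bucketizeBySize(configs, 16384) {
		fmt.Printf("bucket %d: %d configs\n", i, len(b))
	}
}
```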

@dustymabe (Member)

Thanks for the info @saqibali-2k.

@jlebon @saqibali-2k should we close this?

@saqibali-2k (Member)

Compressing the configs has only delayed the problem, so I think this should stay open until we land (or at least decide on) a permanent fix.

@dustymabe (Member)

> For the long term solution, I was thinking that we could just go with your earlier suggestion of adapting coreos/coreos-assembler#2516 to further constrain buckets depending on config size. For any test that is exclusive and has a config size that is too large, we could add a remote config the way it was done in this PR: coreos/fedora-coreos-config#1232.

I think that sounds like an excellent plan. Makes a ton of sense. I guess for coreos/coreos-assembler#2516 we'd need to be aware of the size restrictions on each platform (alternatively just limit to the lowest value on any platform) so we can make that decision. One good way to test this once we implement it would be to disable compression on AWS and see if it properly splits things into multiple buckets.
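A sketch of how the per-platform limits could be tabulated. Only AWS's 16384-byte cap is confirmed by the error in this issue; the other values are assumptions that would need to be verified against each cloud's docs, and limitFor's fallback implements the "limit to the lowest value on any platform" alternative:

```go
package main

import "fmt"

// userDataLimits maps kola platform names to user data caps in bytes. Only
// the AWS value is confirmed by this issue; the rest are assumptions for
// illustration only.
var userDataLimits = map[string]int{
	"aws":       16384,  // from the InvalidParameterValue error above
	"azure":     65536,  // assumption
	"gcp":       262144, // assumption
	"openstack": 65535,  // assumption
}

// limitFor falls back to the most conservative known cap for unknown
// platforms, i.e. the "limit to the lowest value on any platform" option.
func limitFor(platform string) int {
	if l, ok := userDataLimits[platform]; ok {
		return l
	}
	lowest := userDataLimits["aws"]
	for _, l := range userDataLimits {
		if l < lowest {
			lowest = l
		}
	}
	return lowest
}

func main() {
	fmt.Println(limitFor("aws"))     // 16384
	fmt.Println(limitFor("unknown")) // falls back to the lowest cap
}
```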

@jlebon (Member, Author) commented Dec 9, 2021

@saqibali-2k WDYT about tracking the long-term fix in a separate issue and closing this one? It would make things easier to follow for others I think.

@saqibali-2k (Member) commented Dec 9, 2021

> @saqibali-2k WDYT about tracking the long-term fix in a separate issue and closing this one? It would make things easier to follow for others I think.

Sounds like a good idea!
