
Optionally separate S3 object directory by metadata #28

Closed
isbee opened this issue Oct 27, 2021 · 2 comments

Comments


isbee commented Oct 27, 2021

As far as I know, the dump file name is {CORE_UUID}-dump-{CORE_TIMESTAMP}-{CORE_HOSTNAME}-{CORE_EXE_NAME}-{CORE_PID}-{CORE_SIGNAL}, and this is the same as the S3 object name.
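To make the naming concrete, a tiny illustrative snippet; the values are made up for the example.

```rust
// Illustrative only: expands the template above into a concrete object name.
fn main() {
    let (uuid, timestamp, hostname, exe, pid, signal) =
        ("d4ad0cab-26f4-4a92-b5a0-1a64bbe5e0a2", "1635322800", "ip-10-0-0-1", "myapp", "1234", "11");
    // -> d4ad0cab-26f4-4a92-b5a0-1a64bbe5e0a2-dump-1635322800-ip-10-0-0-1-myapp-1234-11
    println!("{uuid}-dump-{timestamp}-{hostname}-{exe}-{pid}-{signal}");
}
```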

When someone uses a single S3 bucket and core-dump-handler runs across multiple clouds/clusters/regions, it is hard to figure out where a dump file came from. Since this metadata is settled statically when the daemonset is deployed, it could be injected via environment variables and used to separate the S3 upload directory.

But maybe separating S3 buckets is enough, so it would be great if we could choose between:

  1. Multiple S3 buckets + a flat S3 directory (one bucket per vendor/cluster name)
  2. A single S3 bucket + nested S3 directories (see the sketch after this list)
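To illustrate option 2, a minimal sketch assuming the metadata is injected as environment variables; CDH_VENDOR, CDH_CLUSTER and CDH_REGION are hypothetical names, not existing settings.

```rust
use std::env;

// Sketch only: nest the S3 object key under statically injected metadata.
// CDH_VENDOR, CDH_CLUSTER and CDH_REGION are hypothetical variable names.
fn object_key(dump_file_name: &str) -> String {
    let prefix: Vec<String> = ["CDH_VENDOR", "CDH_CLUSTER", "CDH_REGION"]
        .iter()
        .filter_map(|name| env::var(name).ok())
        .filter(|value| !value.is_empty())
        .collect();

    if prefix.is_empty() {
        dump_file_name.to_string() // today's flat layout (option 1)
    } else {
        format!("{}/{}", prefix.join("/"), dump_file_name) // nested layout (option 2)
    }
}

fn main() {
    // e.g. CDH_VENDOR=aws CDH_CLUSTER=prod CDH_REGION=eu-west-1 gives
    // "aws/prod/eu-west-1/<dump file name>"
    println!("{}", object_key("d4ad0cab-dump-1635322800-ip-10-0-0-1-myapp-1234-11.zip"));
}
```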

Additionally, it would be nice to be able to distinguish the namespace of the crashed container. From core-dump-handler's point of view, the crashed container's namespace is runtime metadata, so it is hard to figure out. I guess modifying core_pattern to |{HOST_LOCATION}/{CORE_DUMP_COMPOSER_NAME} -c=%c -e=%e -p=%p -s=%s -t=%t -d={HOST_DIR}/core -h=%h -E=%E -NS=$NAMESPACE might work, since %h is the crashed container's hostname, not the core-dump-handler agent's hostname.
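For reference, a rough sketch of installing such a pattern from Rust; {HOST_LOCATION}, {CORE_DUMP_COMPOSER_NAME} and {HOST_DIR} are the placeholders from above, and -NS is the proposed addition, not an existing composer flag.

```rust
use std::fs;

// Sketch only: writes the proposed core_pattern. The placeholders are kept
// as in the text above, and -NS=$NAMESPACE is the hypothetical new flag.
fn main() -> std::io::Result<()> {
    let pattern = "|{HOST_LOCATION}/{CORE_DUMP_COMPOSER_NAME} -c=%c -e=%e -p=%p \
                   -s=%s -t=%t -d={HOST_DIR}/core -h=%h -E=%E -NS=$NAMESPACE";
    fs::write("/proc/sys/kernel/core_pattern", pattern)
}
```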

@isbee isbee changed the title Separate S3 object directory by metadata Optionally separate S3 object directory by metadata Oct 27, 2021

No9 (Collaborator) commented Oct 27, 2021

Hey @isbee

Option 1, multiple S3 buckets per vendor per cluster, is something that can be done today:
just configure a different S3 location per install of CDH.
Option 2, a single S3 bucket with nested directories, seems like it could become fairly cumbersome: S3 is a key/value store, so directories would really just be a path convention, but I guess it would depend on your use case.

Maybe providing a mechanism that allows you to define the S3 path would be enough to cover the option 2 requirement?
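For example, a minimal sketch of such a mechanism, assuming a user-supplied template string; S3_PATH_TEMPLATE and its placeholder names are hypothetical, not existing options.

```rust
use std::env;

// Sketch only: expand a hypothetical S3_PATH_TEMPLATE into the object key.
fn expand(template: &str, vendor: &str, cluster: &str, file: &str) -> String {
    template
        .replace("{vendor}", vendor)
        .replace("{cluster}", cluster)
        .replace("{file}", file)
}

fn main() {
    let template = env::var("S3_PATH_TEMPLATE").unwrap_or_else(|_| "{file}".to_string());
    // S3_PATH_TEMPLATE="{vendor}/{cluster}/{file}" -> "aws/prod/<dump file>.zip"
    println!("{}", expand(&template, "aws", "prod", "d4ad0cab-dump-1635322800.zip"));
}
```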

On the namespace: that is currently provided in the JSON metadata files.
The ps, runtime and pod JSON files provide this information in some cases; you will have to test with your provider.
Unfortunately there is no way for the pod to pass this information to the composer at dump time, and the operating system has a very limited set of parameters it can capture.
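To check what your provider exposes, a quick sketch using the serde_json crate; the file name and field path below are illustrative and depend on your runtime.

```rust
use serde_json::Value;
use std::fs;

// Sketch only (requires the serde_json crate): look for the namespace in the
// pod JSON file from the dump archive. The file name and field path are
// illustrative; the actual layout depends on your provider/runtime.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let raw = fs::read_to_string("pod-info.json")?;
    let pod: Value = serde_json::from_str(&raw)?;
    if let Some(ns) = pod.pointer("/metadata/namespace").and_then(Value::as_str) {
        println!("crashed pod namespace: {ns}");
    } else {
        println!("namespace not found in this provider's pod metadata");
    }
    Ok(())
}
```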

I do like the idea of capturing the node hostname and will provide that as an additional property in the dump-info.json.

As a general comment, you may wish to consider a post processor that takes the items from S3 and stores them in a more structured manner for your use case.
I think search, personal data scrubbing and alerting are obvious requirements that are probably beyond the scope of this project, but if there is a "simple" solution to address these then I am willing to consider accepting PRs; it should be discussed prior to development.

Let me know your thoughts on the dynamic path feature and the post processor.


No9 (Collaborator) commented Nov 8, 2021

As there are no further comments I am going to close this, but feel free to reopen if you have more detail on the path requirements.
