
Optionally separate S3 object directory by metadata #28

Closed
isbee opened this issue Oct 27, 2021 · 2 comments

Comments


isbee commented Oct 27, 2021

As far as I know, the dump file name is {CORE_UUID}-dump-{CORE_TIMESTAMP}-{CORE_HOSTNAME}-{CORE_EXE_NAME}-{CORE_PID}-{CORE_SIGNAL}, and this is the same as the S3 object name.
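To make the naming concrete, a tiny illustrative snippet; the values are made up for the example.

```rust
// Illustrative only: expands the template above into a concrete object name.
fn main() {
    let (uuid, timestamp, hostname, exe, pid, signal) =
        ("d4ad0cab-26f4-4a92-b5a0-1a64bbe5e0a2", "1635322800", "ip-10-0-0-1", "myapp", "1234", "11");
    // -> d4ad0cab-26f4-4a92-b5a0-1a64bbe5e0a2-dump-1635322800-ip-10-0-0-1-myapp-1234-11
    println!("{uuid}-dump-{timestamp}-{hostname}-{exe}-{pid}-{signal}");
}
```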

When someone uses a single S3 bucket and core-dump-handler runs across multiple clouds/clusters/regions, it is hard to figure out where a dump file came from. Since this metadata is settled statically when the daemonset is deployed, it could be injected via environment variables and used to separate the S3 upload directory.

But maybe separating S3 buckets is enough, so it would be great if we could choose between:

  1. Multiple S3 buckets + a flat S3 directory (one bucket per vendor/cluster name)
  2. A single S3 bucket + nested S3 directories (see the sketch after this list)
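To illustrate option 2, a minimal sketch assuming the metadata is injected as environment variables; CDH_VENDOR, CDH_CLUSTER and CDH_REGION are hypothetical names, not existing settings.

```rust
use std::env;

// Sketch only: nest the S3 object key under statically injected metadata.
// CDH_VENDOR, CDH_CLUSTER and CDH_REGION are hypothetical variable names.
fn object_key(dump_file_name: &str) -> String {
    let prefix: Vec<String> = ["CDH_VENDOR", "CDH_CLUSTER", "CDH_REGION"]
        .iter()
        .filter_map(|name| env::var(name).ok())
        .filter(|value| !value.is_empty())
        .collect();

    if prefix.is_empty() {
        dump_file_name.to_string() // today's flat layout (option 1)
    } else {
        format!("{}/{}", prefix.join("/"), dump_file_name) // nested layout (option 2)
    }
}

fn main() {
    // e.g. CDH_VENDOR=aws CDH_CLUSTER=prod CDH_REGION=eu-west-1 gives
    // "aws/prod/eu-west-1/<dump file name>"
    println!("{}", object_key("d4ad0cab-dump-1635322800-ip-10-0-0-1-myapp-1234-11.zip"));
}
```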

Additionally, it would be nice to be able to distinguish the namespace of the crashed container. From core-dump-handler's point of view, the crashed container's namespace is runtime metadata, so it is hard to figure out. I guess modifying core_pattern to |{HOST_LOCATION}/{CORE_DUMP_COMPOSER_NAME} -c=%c -e=%e -p=%p -s=%s -t=%t -d={HOST_DIR}/core -h=%h -E=%E -NS=$NAMESPACE might work, since %h is the crashed container's hostname, not the core-dump-handler agent's hostname.
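For reference, a rough sketch of installing such a pattern from Rust; {HOST_LOCATION}, {CORE_DUMP_COMPOSER_NAME} and {HOST_DIR} are the placeholders from above, and -NS is the proposed addition, not an existing composer flag.

```rust
use std::fs;

// Sketch only: writes the proposed core_pattern. The placeholders are kept
// as in the text above, and -NS=$NAMESPACE is the hypothetical new flag.
fn main() -> std::io::Result<()> {
    let pattern = "|{HOST_LOCATION}/{CORE_DUMP_COMPOSER_NAME} -c=%c -e=%e -p=%p \
                   -s=%s -t=%t -d={HOST_DIR}/core -h=%h -E=%E -NS=$NAMESPACE";
    fs::write("/proc/sys/kernel/core_pattern", pattern)
}
```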

@isbee isbee changed the title Separate S3 object directory by metadata Optionally separate S3 object directory by metadata Oct 27, 2021

No9 (Collaborator) commented Oct 27, 2021

Hey @isbee

Option 1, multiple S3 buckets per vendor per cluster, is something that can be done today:
just configure a different S3 location per install of CDH.
Option 2, a single S3 bucket with nested directories, seems like it could become fairly cumbersome: S3 is a key/value store, so directories would really just be a path convention, but I guess it would depend on your use case.

Maybe providing a mechanism that allows you to define the S3 path would be enough to cover the option 2 requirement?
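For example, a minimal sketch of such a mechanism, assuming a user-supplied template string; S3_PATH_TEMPLATE and its placeholder names are hypothetical, not existing options.

```rust
use std::env;

// Sketch only: expand a hypothetical S3_PATH_TEMPLATE into the object key.
fn expand(template: &str, vendor: &str, cluster: &str, file: &str) -> String {
    template
        .replace("{vendor}", vendor)
        .replace("{cluster}", cluster)
        .replace("{file}", file)
}

fn main() {
    let template = env::var("S3_PATH_TEMPLATE").unwrap_or_else(|_| "{file}".to_string());
    // S3_PATH_TEMPLATE="{vendor}/{cluster}/{file}" -> "aws/prod/<dump file>.zip"
    println!("{}", expand(&template, "aws", "prod", "d4ad0cab-dump-1635322800.zip"));
}
```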

On the namespace: that is currently provided in the JSON metadata files.
The ps, runtime and pod JSON files provide this information in some cases; you will have to test with your provider.
Unfortunately there is no way for the pod to pass this information to the composer at dump time, and the operating system has a very limited set of parameters it can capture.
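To check what your provider exposes, a quick sketch using the serde_json crate; the file name and field path below are illustrative and depend on your runtime.

```rust
use serde_json::Value;
use std::fs;

// Sketch only (requires the serde_json crate): look for the namespace in the
// pod JSON file from the dump archive. The file name and field path are
// illustrative; the actual layout depends on your provider/runtime.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let raw = fs::read_to_string("pod-info.json")?;
    let pod: Value = serde_json::from_str(&raw)?;
    if let Some(ns) = pod.pointer("/metadata/namespace").and_then(Value::as_str) {
        println!("crashed pod namespace: {ns}");
    } else {
        println!("namespace not found in this provider's pod metadata");
    }
    Ok(())
}
```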

I do like the idea of capturing the node hostname and will provide that as an additional property in the dump-info.json.

As a general comment, you may wish to consider a post processor that takes the items from S3 and stores them in a more structured manner for your use case.
I think search, personal data scrubbing and alerting are obvious requirements that are probably beyond the scope of this project, but if there is a "simple" solution to address these then I am willing to consider accepting PRs; it should be discussed prior to development.

Let me know your thoughts on the dynamic path feature and the post processor.


No9 (Collaborator) commented Nov 8, 2021

As there are no further comments I am going to close this, but feel free to reopen if you have more detail on the path requirements.
