OpenShift compatibility #44
Hey,

I'm currently looking into this project for OpenShift. You mention for ARO/ROSA: …

What is missing to get this to work with RHCOS?

Cheers,
Thomas

Comments
Hey @tjungblu. Some aspects I was going to investigate further: the core-dump-composer is copied to a shared host location during the deployment. Is this actually supported by CoreOS?

a) If copying the binary to the host is supported, does it require compatible builds, or will RHEL ubi7/8 builds work OK?
b) If copying the file to the host isn't supported, the best practice for supplying host services needs to be researched. From initial reading it seems as though host services are provided as containers, and these need to be defined at cluster creation time, but I may have that totally wrong.

I don't think the issues are ARO/ROSA specific, and my next step was going to be to set up a local cluster and investigate further, but I haven't had time to get around to it. I'm also conscious that the way this project works might impact how Red Hat provides general support for aborting processes, so that's another area that will need clarification.

[edit] Sorry to give you more questions than answers, but that's the status of where I got to. Any help would be greatly appreciated.
If I understand correctly, this program sets itself up to handle coredumps from the kernel, taking over from systemd-coredump. As the binary is launched directly by the kernel, the easiest option is to place it directly on the host. RHCOS is based on RHEL, so building binaries in UBI 7/8 will give compatible binaries for RHCOS. Another option is to have the agent talk to systemd-coredump directly via its socket.
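For context, the handover happens via the kernel's core_pattern sysctl: a pattern starting with "|" makes the kernel pipe every core dump into the named program. A minimal sketch, assuming the default systemd-coredump registration and an illustrative (not the project's actual) composer path:

```
# Show the currently registered pipe handler; on RHEL-family systems this
# usually points at systemd-coredump:
cat /proc/sys/kernel/core_pattern
# |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h

# Taking over means writing a new pattern; the path and template specifiers
# below are illustrative placeholders, not the project's real template:
echo '|/var/mnt/core-dump-handler/core-dump-composer %p %e %t' | sudo tee /proc/sys/kernel/core_pattern
```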
Thanks for your help here @travier, much appreciated.
and the scc grant: …

do you think we should add a switch to helm to make it work on OpenShift? I haven't tried adding the scc change into the chart yet, but I'm sure we can also fix this somehow. Otherwise we can just use a post-installation hook job that runs the oc command.
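For reference, the scc grant mentioned above is presumably a one-liner of this shape; the service account and namespace are assumptions, not taken from the chart:

```
# Allow the handler's service account to use the privileged SCC so the
# daemonset pods can run privileged; names here are placeholders:
oc adm policy add-scc-to-user privileged -z core-dump-admin -n observe
```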
@tjungblu can you confirm the contents of the zip - there should be 7 files in there.
@No9 interesting, I got only 5:
The coredump looks fine though with objdump (as far as I can tell). I ran the segfaulter directly after installing the helm chart (no .env file set up):
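For anyone reproducing the check above: the comment used objdump, and readelf gives the same sanity check. A sketch, with file names as placeholders:

```
# List what actually made it into the uploaded zip:
unzip -l <dump-id>.zip

# A healthy dump should identify as an ELF core file:
readelf -h <dump-id>-dump.core | grep -i 'type'
# expected: Type: CORE (Core file)
```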
OK, the good news is it looks like … [Edit] @tjungblu To be clear, the test includes the original zip file, so it's just the image file that's missing.
yep, that's now on the daemonset: COMP_CRIO_IMAGE_CMD=images, and in the logs:

I'm afraid the image file isn't there, however:
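For reference, an environment variable like this would typically land on the daemonset with a command of this shape; the daemonset name and namespace are assumptions:

```
# Set the variable on the running daemonset; the rollout restarts the pods:
oc -n observe set env daemonset/core-dump-handler COMP_CRIO_IMAGE_CMD=images
```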
OK, in the pod can you run …?
debug did the trick, here's the log output:
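The oc debug pattern implied here looks roughly like the following; the node name is a placeholder:

```
# Open a debug pod on the node and query the host container runtime:
oc debug node/<node-name> -- chroot /host crictl images
```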
running the crictl command directly gives:
Hmm, but it's reporting …
@tjungblu Building: https://quay.io/repository/icdh/core-dump-handler/build/e99996b6-6385-4d72-8e7c-2aa180f6c326
sounds good, let me spin up another cluster for this |
I took the liberty of running it already, since it's Friday :) Here's the output:
full file, copied from the pod: |
Love a Friday vibe :) I was hoping the … As a fix I will also iterate over the … I'll ping when a test image is ready.
ok @tjungblu that's building here: … Let me know if you want to add the scc fix to this, and I will hold off merging this into main and bundle both aspects as a single release.
@No9 awesome. I sadly couldn't figure out which container image tag was built (I can't see the build logs), but I took the last tag: …

Now we're at six files; you mentioned earlier there should be seven. Is anything crucial missing here?
I can send you a separate PR towards the end of the week once I've got helm properly working. Are you fine with adding a post-installation hook job for this?
Hey @tjungblu, how do you want to deal with the recommendation to use …?
I think that's the least invasive; I reckon we put it behind an …
yeah, I'd suggest we do it that way - it seems to work for now :)
This is actually a little tricky, as there is currently a …

It might be better to have a flag like … Or you can just add an … My preference would be for the … There are likely other options that merge these, so feel free to suggest some ideas :)
great point, after all your explanation it seems that …

I'm wondering how many platforms we really need here; it seems that ROKS is different because it uses RHEL instead of RHCOS. I can see where you want to go with the different providers, especially in relation to the "provider-local" storage options. Another solution would be to have different values files for the respective environments you want to support (which is just a textual representation of your --set directives), as sketched below.
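A sketch of that idea, with a hypothetical file name and hypothetical chart keys (the real values live in the project's chart):

```
# values.openshift.yaml might carry the environment-specific overrides, e.g.:
#   daemonset:
#     hostDirectory: /var/mnt/core-dump-handler
#   scc:
#     create: true
helm install core-dump-handler ./charts/core-dump-handler -f values.openshift.yaml

# ...instead of repeating the same overrides on every install:
helm install core-dump-handler ./charts/core-dump-handler \
  --set daemonset.hostDirectory=/var/mnt/core-dump-handler \
  --set scc.create=true
```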
that certainly sounds like a better and more composable solution; I just have to figure out how to patch the scc from Helm :) Looking into the feature, I think we (OpenShift) should build an operator to wrap this, to have proper support on OpenShift across all envs - that also solves the issue in (3), as we can easily detect the environment and operating systems.
The other xKS services, such as those on GCP/AWS, seem to offer "own-brand Linux" by default and an Ubuntu option for their nodes, so I think this will have wider utility.
I really like the idea of a different values file! Let's go with that, along with the … Agree on the operator - there is a helm wrapper for this project here: https://github.com/IBM/core-dump-operator, but it's really just a stub at the moment. If OpenShift folks are going to pick up an operator, it would be great to understand what the plan is so I can either shut that repo down or grant access - whatever makes sense.
I couldn't get the job that would patch the existing SCC to work, which makes sense, as this would be an easy privilege escalation path. I could make it work by creating a new SCC - so please have a look at #46 :)
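A sketch of the new-SCC approach; every name and setting below is an assumption, and the real definition is the one in #46:

```
# Create a dedicated SCC instead of patching a built-in one, and bind it
# directly to the handler's service account. All names are placeholders:
oc apply -f - <<'EOF'
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: core-dump-handler
allowPrivilegedContainer: true
allowHostDirVolumePlugin: true
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: RunAsAny
fsGroup:
  type: RunAsAny
supplementalGroups:
  type: RunAsAny
volumes:
  - '*'
users:
  - system:serviceaccount:observe:core-dump-admin
EOF
```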
awesome! then let's add a couple more. Let me know what you think about the naming in the PR - I just bluntly named it openshift again.
Nice! The reason I came here is that we have a lighthouse customer that wants this functionality - I'm meeting them on Thursday and we'll decide on the operator aspects based on that. Generally we would work upstream in your operator project if it already exists, so we can discuss that when we get there. Thanks for your help so far, much appreciated. 🚀 |
ok - away from keyboard for the rest of the day but will look at the PR in the morning so you will have an update for Thursday. |
OK - I've merged the work associated with this issue and we have created separate issues for follow on work so I am closing this. |