Slow restart of Solr Pod init_container cp-solr-xml chown #537

Closed
smoldenhauer-ish opened this issue Mar 31, 2023 · 7 comments · Fixed by #548

@smoldenhauer-ish
Contributor

Solr Operator 0.6.0
In our deployments, restarting a Solr Pod takes anywhere from several minutes up to half an hour.
The init container cp-solr-xml runs the command

cp /tmp/solr.xml /tmp-config/solr.xml && chown -R 8983:8983 /var/solr/data/backup-restore/local-collection-backups-1

and the recursive chown in particular seems to be the cause. We mitigated it somewhat by cleaning up the backup volume regularly, but some deployments have a larger number of collections (~400), and restarts there are still slow.
...
The backup repository is backed by azurefile-csi storage.
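
To gauge how much work the recursive chown does on each restart, one rough diagnostic is to count the entries under the backup mount, since chown -R visits every file and directory beneath its target. On network-backed storage such as azurefile-csi, each visit is likely a remote metadata operation, so the count is a reasonable proxy for restart cost. This is a minimal sketch assuming a bash shell with find available; count_chown_targets is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic helper: print how many filesystem entries
# (the directory itself plus everything below it) chown -R would visit.
# Rough count only: file names containing newlines would be miscounted.
count_chown_targets() {
  local dir="$1"   # e.g. /var/solr/data/backup-restore/local-collection-backups-1
  find "$dir" | wc -l | tr -d ' '
}
```

Running it against the backup mount before and after a cleanup gives a quick sense of how much the regular cleanup actually helps.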

mailing list discussion: https://lists.apache.org/thread/tlxg38v6y81dhykycrgqw77mm2xdoqbr

@smoldenhauer-ish
Contributor Author

The basic idea is to skip the chown if the backup directory is already writable by the solr user.
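
The idea can be sketched in shell. This is a minimal sketch, not necessarily what the eventual fix does. Because a writability test run as root typically succeeds regardless of ownership, this variant instead compares the owner of the mount root against the expected uid:gid and runs the recursive chown only on a mismatch, similar in spirit to Kubernetes' fsGroupChangePolicy=OnRootMismatch. The name chown_if_root_mismatch is hypothetical, and stat -c assumes GNU coreutils or BusyBox.

```shell
#!/usr/bin/env bash
# Hypothetical helper: recursively chown a directory only when its root
# is not already owned by the expected uid:gid. Files deeper in the tree
# with a different owner are NOT detected (same trade-off as OnRootMismatch).
chown_if_root_mismatch() {
  local owner="$1"   # expected "uid:gid", e.g. 8983:8983
  local dir="$2"     # backup repository mount path
  local current
  # Read the numeric owner of the root directory; fail if it cannot be read.
  current="$(stat -c '%u:%g' "$dir")" || return 1
  if [ "$current" = "$owner" ]; then
    echo "skip: $dir already owned by $owner"
  else
    echo "chown: $dir is $current, changing to $owner"
    chown -R "$owner" "$dir"
  fi
}
```

In the init container this would replace the unconditional chown, e.g. cp /tmp/solr.xml /tmp-config/solr.xml && chown_if_root_mismatch 8983:8983 /var/solr/data/backup-restore/local-collection-backups-1. Checking only the root keeps the restart cost constant once the volume has been fixed up, at the price of not repairing individual files whose ownership drifted later.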

@HoustonPutman
Contributor

Yeah, sounds like a good improvement! Looks like you are testing out some options, so make a PR when you are ready.

@HoustonPutman HoustonPutman added this to the main (v0.7.0) milestone Mar 31, 2023
smoldenhauer-ish added a commit to intershop/solr-operator that referenced this issue Apr 3, 2023
smoldenhauer-ish added a commit to intershop/solr-operator that referenced this issue Apr 4, 2023
@smoldenhauer-ish
Contributor Author

The changed command works now. I'm currently waiting for a test of a forked custom image (0.6.0 + fix) in a production deployment.

@HoustonPutman
Contributor

Awesome, make the PR when your tests are done! We will probably start the process for the v0.7.0 release late next week, and it would be great to have this included.

smoldenhauer-ish added a commit to intershop/solr-operator that referenced this issue Apr 12, 2023
@HoustonPutman HoustonPutman linked a pull request Apr 12, 2023 that will close this issue
smoldenhauer-ish added a commit to intershop/solr-operator that referenced this issue Apr 12, 2023
smoldenhauer-ish added a commit to intershop/solr-operator that referenced this issue Apr 12, 2023
smoldenhauer-ish added a commit to intershop/solr-operator that referenced this issue Apr 12, 2023
@HoustonPutman
Contributor

TODO: We should think about just using the fsGroupChangePolicy available in Kubernetes v1.23

@smoldenhauer-ish
Contributor Author

I saw fsGroupChangePolicy = OnRootMismatch but wasn't sure how to apply it to the Solr backup PVC. It was something of an inspiration for the bugfix.
If that works, I guess cp-solr-xml could go back to only copying solr.xml.

@HoustonPutman
Contributor

Yeah, reading more about it, the default behavior seems to be "Always", so it's odd that it wasn't working in the first place. I guess we need to keep the logic regardless.
