Extend the retention policy of logs from the Rucio pods #495

Closed
jhonatanamado opened this issue May 1, 2023 · 5 comments · Fixed by dmwm/rucio-flux#263

@jhonatanamado

From time to time, data management needs to provide information about particular DIDs (files, datasets, or containers) to establish whether they were properly injected into Rucio, properly attached to their parent/child DIDs, or erased from Rucio.
Right now data management can only get this information from the pod logs, and the pods are restarted every 24h (or 48h).
To give proper feedback to the groups that data management supports, we need longer retention of those logs; 6 months should be enough.

@jhonatanamado
Author

FYI @amanrique1, @klannon, @amaltaro

@amaltaro

amaltaro commented May 1, 2023

Thank you for creating this ticket, Jhonatan.
I agree that having logs available for a longer period (many weeks or a few months) would help to debug and correlate data among different systems. Of course, this assumes that the Rucio server can record a meaningful access log (action, actor, timestamp, etc.).

@amaltaro

amaltaro commented May 1, 2023

I forgot to mention: I suggest that whoever works on this coordinates with the CMSWEB team, so that the process/technology can be shared. They have already implemented it for all the CMSWEB services.

@ericvaandering
Member

Some time ago we did have logs going to a special MONIT server, the name of which I forget. It was behind an SSO login.

It's possible that it has since disappeared. However, I think the CERN Kubernetes setup has the ability to do this, so it should just be a matter of turning it on and sending the logs to the right place. @arooshap or @muhammadimranfarooqi, can you offer any insight?

@arooshap
Member

arooshap commented May 3, 2023

@ericvaandering I remember that a while back, someone asked me for rucio logs, and I referred them to the following URL: https://monit-opensearch.cern.ch/dashboards/goto/05f82530b19ffdf87d770600b85c5042?security_tenant=global, but I am not sure if you are referring to this.

It's important to keep in mind that Kubernetes does not provide a built-in solution for long-term storage of logs. If you need to retain your logs for a longer period, you should consider using an external logging system that can store logs for an extended period of time. If you want to retain the stdout logs for a longer time, for example, 10 days, then you can add the following line to the API server configuration: --log-retention=10d. The default value is 5 days.

Logs are aggregated automatically at the cluster and can be pushed to the CERN IT logging infrastructure (timber) if desired by passing a logging_producer label. Refer to this link for more details.

Additionally, what you can also do is create a separate CephFS mount to store your logs. If you want to know about the details, you can access the documentation here: https://cms-http-group.docs.cern.ch/k8s_cluster/storage/#cephfs-shares, and if you have any questions, feel free to ask me.

Let me know how you would like to proceed, and whether and where you are producing logs, because I am not aware of the architectural details.
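
If the logs end up as files on a dedicated CephFS share, the 6-month retention requested in this issue would have to be enforced by something on top of the mount. The following is only a minimal sketch of that idea, not anything deployed for Rucio or CMSWEB: the function name `prune_old_logs`, the `*.log` file pattern, the 180-day default, and the mount path in the usage note are all assumptions made for illustration.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86_400


def prune_old_logs(log_dir, max_age_days=180, now=None, dry_run=True):
    """Return log files older than max_age_days; delete them unless dry_run.

    log_dir is a hypothetical directory on the shared mount, and the
    "*.log" pattern is an assumption about how the files are named.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    expired = []
    for path in sorted(Path(log_dir).rglob("*.log")):
        # Age is judged by last modification time, so a file that is
        # still being appended to is never treated as expired.
        if path.is_file() and path.stat().st_mtime < cutoff:
            expired.append(path)
            if not dry_run:
                path.unlink()
    return expired
```

Calling `prune_old_logs("/cephfs/rucio-logs")` (a made-up path) only lists what would be removed, because `dry_run` defaults to `True`; passing `dry_run=False` deletes the files. Defaulting to a dry run is a deliberate choice so that a wrong path or retention value cannot silently destroy logs.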
