Extend the retention policy of logs from the Rucio pods #495

jhonatanamado · 2023-05-01T14:51:57Z

From time to time, data management needs to provide information about particular DIDs (files, datasets or containers) to understand whether they were properly injected into Rucio, properly attached to their parent/child DIDs, or erased from Rucio.
Right now data management can only see this information in the pod logs, and the pods are restarted every 24h (or 48h).
To give proper feedback to the different groups that data management supports, we need longer retention of those logs; 6 months should be enough.

jhonatanamado · 2023-05-01T15:05:59Z

FYI @amanrique1 , @klannon @amaltaro

amaltaro · 2023-05-01T17:46:08Z

Thank you for creating this ticket, Jhonatan.
I agree that having logs available for a longer period of time (many weeks / a few months) would be helpful to debug and correlate data among different systems. Of course, this assumes that the Rucio server can record a meaningful access log (action, actor, timestamp, etc).
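
To make "meaningful access log" a bit more concrete, here is a minimal sketch of what one structured log line could look like. It is not how the Rucio server logs today; the function name, the field names and the example action are all hypothetical and only illustrate the action/actor/timestamp idea, with the DID added so that a line can be found later.

```python
import json
from datetime import datetime, timezone


def access_log_line(action, actor, did, timestamp=None):
    """Build one JSON log line (hypothetical format, not Rucio's own)."""
    # Use the given timestamp, or the current UTC time if none is passed.
    ts = timestamp if timestamp is not None else datetime.now(timezone.utc)
    record = {
        "timestamp": ts.isoformat(),
        "action": action,  # e.g. "attach_dids" (illustrative name)
        "actor": actor,    # account that issued the request
        "did": did,        # scope:name of the file, dataset or container
    }
    # sort_keys keeps the field order stable, which helps when grepping.
    return json.dumps(record, sort_keys=True)
```

One JSON object per line keeps the logs easy to filter by DID or actor once they land in whatever long-term store is chosen.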

amaltaro · 2023-05-01T17:48:39Z

I forgot to mention: I would suggest that whoever works on this coordinate with the CMSWEB team so that the process/technology can be shared. They have already implemented this for all the CMSWEB services.

ericvaandering · 2023-05-02T21:22:27Z

Some time ago we did have logs going to a special MONIT server, the name of which I forget. It was behind an SSO login.

It's possible that it has since disappeared. However, I think the CERN Kubernetes setup has the ability to do this, so it should just be a matter of turning it on and sending the logs to the right place. @arooshap or @muhammadimranfarooqi can you offer any insight?

arooshap · 2023-05-03T10:48:08Z

@ericvaandering I remember that a while back, someone asked me for rucio logs, and I referred them to the following URL: https://monit-opensearch.cern.ch/dashboards/goto/05f82530b19ffdf87d770600b85c5042?security_tenant=global, but I am not sure if you are referring to this.

It's important to keep in mind that Kubernetes does not provide a built-in solution for long-term storage of logs. If you need to retain your logs for a longer period, you should consider using an external logging system that can store them for an extended period of time. On the node itself, container stdout logs are only kept until the kubelet rotates them (governed by settings such as containerLogMaxSize and containerLogMaxFiles) or the pod is removed, so tuning that would only extend retention marginally.

Logs are aggregated automatically at the cluster level and can be pushed to the CERN IT logging infrastructure (timber) if desired by passing a logging_producer label. Refer to this link for more details.

Additionally, what you can also do is create a separate CephFS mount to store your logs. If you want to know about the details, you can access the documentation here: https://cms-http-group.docs.cern.ch/k8s_cluster/storage/#cephfs-shares, and if you have any questions, feel free to ask me.
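
If the CephFS route is chosen, something still has to enforce the retention window on the share. Below is a minimal sketch of that, assuming the pods write plain *.log files under a directory on the mount and that roughly 6 months (the figure requested above) is the cut-off. The function name, the *.log pattern and the 183-day constant are assumptions for illustration, not part of any existing setup.

```python
import time
from pathlib import Path

# Roughly 6 months, expressed in seconds (assumed retention window).
SIX_MONTHS_SECONDS = 183 * 24 * 3600


def prune_old_logs(log_dir, max_age_seconds=SIX_MONTHS_SECONDS, now=None):
    """Delete *.log files under log_dir older than max_age_seconds.

    Age is based on the file modification time. Returns the removed paths.
    """
    now = time.time() if now is None else now
    removed = []
    # rglob walks subdirectories too, e.g. one directory per pod.
    for path in sorted(Path(log_dir).rglob("*.log")):
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path)
    return removed
```

This could run periodically (for example from a CronJob) against the mounted share; files still being written have a recent modification time and are left alone.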

Let me know how you would like to proceed, and whether and where you are producing logs, because I am not aware of the architectural details.

amaltaro mentioned this issue May 2, 2023

gfal-copy failing but returning status code 0 - leaving a bunch of file mismatch in the system dmwm/WMCore#11556

Closed

dynamic-entropy self-assigned this Oct 10, 2023

ericvaandering added this to the Persist logs milestone Dec 5, 2023

This was referenced Feb 29, 2024

Meta: Persist rucio logs #729

Closed

Persist logs to cms opensearch dmwm/rucio-flux#263

Merged

ericvaandering closed this as completed in dmwm/rucio-flux#263 Mar 4, 2024
