Slurm collector: Correctly handle sending records that have already been sent to Auditor #681
Comments
We already handle this case: AUDITOR/collectors/slurm/src/auditorsender.rs, lines 167 to 173 in 65cdcdd.
So the question is: why does the delete not work?
The retries for the example record disappeared eventually:
Not sure if it's a coincidence that the job was stuck in the queue for almost exactly 2 hours.
I may have found it: AUDITOR/collectors/slurm/src/sacctcaller.rs, lines 164 to 184 in 256395c.
Your observation makes sense
At that time we had UTC+1. That one hour was not added but subtracted from UTC. This is likely the same problem as #178. I suggest just doing:

    DateTime::<Local>::from_naive_utc_and_offset(
        ts.naive_utc(),
        Local::now().offset(),
    )

I think I will make a PR for this, #811 and #812 (since it's all the Slurm collector) later this week.
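To illustrate the sign error described above, here is a minimal std-only sketch. It is not the collector's code: the function names and the use of plain epoch seconds are assumptions made for illustration. It shows that with UTC+1, subtracting the offset instead of adding it leaves the result two hours away from the correct local time, which would be consistent with the roughly two-hour delay observed earlier in the thread.

```rust
/// Seconds east of UTC for the example in this thread (UTC+1).
const OFFSET_SECS: i64 = 3600;

/// Correct conversion: local wall-clock time = UTC + offset.
/// (Hypothetical helper, not part of the Slurm collector.)
fn utc_to_local(utc_secs: i64, offset_secs: i64) -> i64 {
    utc_secs + offset_secs
}

/// The sign error: the offset is subtracted instead of added.
/// (Hypothetical helper, not part of the Slurm collector.)
fn utc_to_local_wrong_sign(utc_secs: i64, offset_secs: i64) -> i64 {
    utc_secs - offset_secs
}

/// Gap between the correct and the wrong-sign result, in seconds.
/// This is always twice the offset, independent of the timestamp.
fn discrepancy(utc_secs: i64, offset_secs: i64) -> i64 {
    utc_to_local(utc_secs, offset_secs) - utc_to_local_wrong_sign(utc_secs, offset_secs)
}
```

Calling `discrepancy(ts, OFFSET_SECS)` for any timestamp `ts` gives 7200 seconds, i.e. two hours.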
Nice, thanks a lot for catching that! I'm no expert on timezone stuff in Rust, but if you think this should work please go ahead! |
For some reason, we see lines like this one in our logs:
As the error message suggests, the record already exists in the database:
This happens not only once, but for multiple records. Currently, we have 54 such records in the sending queue.
The issue here is that these records are placed back into the sending queue. For the example record from above, a sending attempt has been made 12 times so far:
In these cases, where the record already exists in the database, we should either stop placing these records in the sending queue again and completely drop them, or place them into something similar to a dead letter queue so that they can be inspected manually.
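The dead letter queue option could look roughly like the sketch below. All names here (`Record`, `SendError`, `Sender`, `handle_result`) are hypothetical and do not mirror the actual types in `auditorsender.rs`. The idea is only that an "already exists" response is treated as terminal and the record is set aside for inspection, while other failures are still retried.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a record waiting to be sent.
#[derive(Debug, Clone, PartialEq)]
struct Record {
    id: String,
}

/// Hypothetical error type: distinguishes "record already exists"
/// from failures that are worth retrying.
#[derive(Debug, PartialEq)]
enum SendError {
    AlreadyExists,
    Other(String),
}

/// Hypothetical sender state: a retry queue plus a dead letter queue.
#[derive(Default)]
struct Sender {
    queue: VecDeque<Record>,
    dead_letters: Vec<Record>,
}

impl Sender {
    /// Decide what happens to a record after a sending attempt.
    fn handle_result(&mut self, record: Record, result: Result<(), SendError>) {
        match result {
            // Sent successfully: nothing left to do.
            Ok(()) => {}
            // Already in the database: retrying cannot succeed,
            // so keep the record aside for manual inspection.
            Err(SendError::AlreadyExists) => self.dead_letters.push(record),
            // Anything else may be transient: put it back for a retry.
            Err(SendError::Other(_)) => self.queue.push_back(record),
        }
    }
}
```

Dropping the record entirely, the other option mentioned above, would amount to replacing the `dead_letters.push(record)` arm with a log statement.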
We also should understand why this is happening in the first place.
Using Auditor and Slurm collector version v0.3.1