TS-3948 Lock g_records during in RecDumpRecords to avoid a race #304
Conversation
Under certain conditions, the data passed to RecDumpEntryCb and the data seen inside the callback can differ. The problem is that g_records is not locked.
How does this happen? There are lots of places that iterate over the records without holding the read lock. We should make sure that the lock is held correctly and consistently in all places.
@jpeach I've added a backtrace from when we got a crash: https://issues.apache.org/jira/browse/TS-3948
James, it does hold this lock in a lot of places. :) But yeah, maybe we need to file a subsequent Jira on other APIs? The crash seems to happen when something (e.g. propstats) modifies librecords while, at the same time, stats_over_http iterates over the metrics.
For example, in the stack trace, the relevant record is of type
Wanted to update: our prod issues did not go away completely, but they were greatly reduced. So I don't think this is the correct or complete fix, but it's related.
This needs to be investigated more to completely solve this issue. |
@sekimura I think we should close this for now, and maybe you can produce a more complete fix? If you agree, please close this PR. |
Since this is not a complete fix, I'll close this PR.
Is there a new issue for this? I have some more information to share. |
Well, TS-3948 is still open, use that? |