
TS-3948 Lock g_records in RecDumpRecords to avoid a race #304

Closed
sekimura wants to merge 1 commit from the TS-3948 branch

Conversation

sekimura (Contributor) commented Oct 9, 2015

Under certain conditions, the data passed to RecDumpEntryCb and the data seen inside the callback can differ. The problem is that g_records is not locked.

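To make the proposed change concrete, here is a minimal sketch of the idea: hold the read side of the records lock for the entire iteration so the dump callback never sees a half-updated record. Everything below is a hypothetical stand-in for the librecords internals named in this PR (the real Record, RecDumpEntryCb, and RecDumpRecords have different signatures), and std::shared_mutex stands in for the project's own rwlock wrapper.

```cpp
// Illustrative sketch only -- hypothetical stand-ins for the librecords
// internals discussed in this PR, not the actual Traffic Server code.
#include <mutex>
#include <shared_mutex>
#include <string>
#include <vector>

enum class RecDataType { RECD_INT, RECD_STRING };

struct Record {
  std::string name;
  RecDataType data_type = RecDataType::RECD_INT;
  long long   int_value = 0;
  std::string string_value;
};

using RecDumpEntryCb = void (*)(const Record &rec, void *edata);

std::vector<Record> g_records;        // global record store (stand-in)
std::shared_mutex   g_records_rwlock; // guards g_records (stand-in for the rwlock)

// The spirit of this PR: take the read lock around the whole iteration so a
// concurrent writer cannot change a record's type or value while the dump
// callback is reading it.
void RecDumpRecords(RecDumpEntryCb cb, void *edata)
{
  std::shared_lock<std::shared_mutex> lock(g_records_rwlock); // shared (read) lock
  for (const Record &rec : g_records) {
    cb(rec, edata); // the callback sees a consistent data_type/value pair
  }
}
```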
jpeach (Contributor) commented Oct 9, 2015

How does this happen? There are lots of places that iterate over the records without holding the read lock. We should make sure that the lock is held correctly and consistently in all places.

sekimura (Contributor, Author) commented Oct 9, 2015

@jpeach I've added the backtrace from the crash we hit to https://issues.apache.org/jira/browse/TS-3948

zwoop (Contributor) commented Oct 9, 2015

James, it does hold this lock in a lot of places. :) But yeah, maybe we need to file a subsequent Jira on the other APIs? The crash seems to happen when something (e.g. propstats) modifies librecords while, at the same time, stats_over_http iterates over the metrics.
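For illustration, the writer side of that scenario needs the matching exclusive lock; continuing the hypothetical sketch from the PR description above (stand-in names, not the real propstats or librecords code):

```cpp
// Hypothetical writer path (a "propstats"-style updater), reusing the stand-in
// types from the earlier sketch. Holding the exclusive lock means a reader
// that holds the shared lock in RecDumpRecords can never observe the type
// changed while the value has not been written yet (or vice versa).
void RecSetString(const std::string &name, const std::string &value)
{
  std::unique_lock<std::shared_mutex> lock(g_records_rwlock); // exclusive (write) lock
  for (Record &rec : g_records) {
    if (rec.name == name) {
      rec.data_type    = RecDataType::RECD_STRING; // type and value change together,
      rec.string_value = value;                    // atomically with respect to readers
      return;
    }
  }
}
```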

PSUdaemon (Contributor)

FWIW, I use this in prod and our corruption issues have gone away. We are not using hardening, so I think we see silent corruption instead of the crashes that @zwoop and @sekimura see.

jpeach (Contributor) commented Oct 9, 2015

For example, RecLookupMatchingRecords doesn't lock around g_records, likewise g_records is not locked anywhere in P_RecCore.cc. The g_records_rwlock lock seems to be used when accessing the records store via the hash table, so it is not clear to me why you need to lock here.

In the stack trace, the relevant record is of type RECD_INT, but by the time it gets to the plugin it is RECD_STRING. The proxy.process.ssl.origin_server_decryption_failed metric is a RECD_INT, so how did it get a partial update to a different type? Rather than adding extra locking, I really want an explanation of why the global lock should be held here.
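To spell out the failure mode that stack trace suggests, here is the racy (unlocked) read path in the same hypothetical sketch. Without the lock, the callback can catch a record between the write of its type and the write of its value; since the real record value lives in a tagged union, taking the wrong branch means treating an integer payload as a string pointer, which would look exactly like a RECD_INT arriving at the plugin as a RECD_STRING.

```cpp
#include <cstdio> // in addition to the includes from the earlier sketch

// The racy pattern: no lock, so the callback may observe data_type from one
// update and the value from another -- a torn record.
void RecDumpRecordsUnlocked(RecDumpEntryCb cb, void *edata)
{
  for (const Record &rec : g_records) {
    cb(rec, edata); // may deliver a mismatched data_type/value pair
  }
}

// Example callback in the style of a stats-dumping plugin (hypothetical):
// it branches on data_type, so a torn record sends it down the wrong path.
void DumpEntry(const Record &rec, void * /* edata */)
{
  if (rec.data_type == RecDataType::RECD_STRING) {
    std::printf("%s: %s\n", rec.name.c_str(), rec.string_value.c_str());
  } else {
    std::printf("%s: %lld\n", rec.name.c_str(), rec.int_value);
  }
}
```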

PSUdaemon (Contributor)

Wanted to update: our prod issues did not go away completely, but they were greatly reduced. So I don't think this is the correct or complete fix, but it's related.

bryancall (Contributor)

This needs to be investigated more to completely solve this issue.

zwoop (Contributor) commented Mar 27, 2016

@sekimura I think we should close this for now, and maybe you can produce a more complete fix? If you agree, please close this PR.

sekimura (Contributor, Author)

Since this is not a complete fix, I'll close this PR.

sekimura closed this Mar 27, 2016
PSUdaemon (Contributor)

Is there a new issue for this? I have some more information to share.

zwoop (Contributor) commented May 5, 2016

Well, TS-3948 is still open, use that?

sekimura deleted the TS-3948 branch December 1, 2016 22:39
SolidWallOfCode pushed a commit to SolidWallOfCode/trafficserver that referenced this pull request Feb 1, 2017
YTSATS-935: TSStringPercentDecode null-termination