Chris:
This is an edge case that my colleague and I reproduced in our beta environment.
Steps to reproduce:
1. Turn on replication (at least 2 copies).
2. Launch a benchmark tool; I wrote a simple one against the filer service.
3. Stop one volume server during the stress test (suppose its dat files include one named 1.dat). It then takes the master node 5-10 seconds to assign a group of new volumes, and during that heartbeat window the replication write path keeps sending write requests to 1.dat on the other volume servers.
4. Restart the volume server stopped in step 3. 1.dat can accept write requests again, but the size difference between the replicas is never reconciled.
Eventually 1.dat exceeds volumeSizeLimitMB and should be marked readonly. But here the problem happens: 1.dat is readonly on one volume server while it is still writable on another, which makes the master server keep switching the readonly status, and the replication write will never succeed whenever the master assigns a new fid on this volume while the topology status reports it as writable. (The stress test reproduces this easily.)
Screenshot from my local environment (image omitted):
There are two ways to fix this issue:
1. Have the store write path (store.go's Write method) bypass the MaxPossibleVolumeSize check, so that the volume files for the same vid on different servers eventually all exceed MaxPossibleVolumeSize together.
2. Have the master's heartbeat handling cache each volume's maximum reported file size; once it exceeds MaxPossibleVolumeSize, mark the volume readonly directly.
What's your suggestion?
I used the second approach: the master now remembers any volume that has ever been oversized.
Actually this case should not happen often, since vacuuming usually kicks in to remove those extra files. So I guess your case only happens during your benchmarking.