Invalid state in GC manifest seen #827

Open
engelsanchez opened this issue Mar 26, 2014 · 4 comments
@engelsanchez

This was observed twice at customer sites. I'm not sure how it got there, but a manifest in the GC bucket was not in the expected state. The following patch was applied to keep the daemon from crashing so that GC could continue:

1c1c6f4

@reiddraper reiddraper added this to the 1.5.0 milestone Mar 28, 2014
@jonmeredith

Customer ticket zd://7336

kuenishi added a commit that referenced this issue Apr 18, 2014
Complementary to #827, this commit prevents the manifest's
delete_block_remaining from being undefined after init/1 returns. Very
rarely, a message could arrive before prepare/2 and move the FSM into
an invalid state. This also reduces the number of states to consider
regarding whether delete_block_remaining is undefined.
kuenishi added a commit that referenced this issue Apr 18, 2014
Complementary to #827, this commit prevents the manifest's
delete_block_remaining from being undefined after init/1 returns. Very
rarely, a message could arrive before prepare/2 and move the FSM into
an invalid state. This also reduces the number of states to consider
regarding whether delete_block_remaining is undefined.

Conflicts:
	src/riak_cs_delete_fsm.erl
@reiddraper reiddraper modified the milestones: 1.5.1, 1.5.0 May 1, 2014
@kuenishi kuenishi modified the milestones: 1.5.1, 2.0.0 Sep 4, 2014
@kuenishi
Contributor

kuenishi commented Sep 4, 2014

This also happened on 1.5.0, after an upgrade from 1.4. See zd://8612

@shino
Contributor

shino commented Sep 5, 2014

memo:

A {stop, normal} with state BlockCount=0 replies to a worker (or to the GC daemon in 1.4.x) with a success message. The manifests in the GC bucket are then deleted (strictly speaking, via twop_set:delete_element/2).
If a manifest with an invalid state is actually one that should be deleted, the blocks belonging to that manifest will be orphaned.
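To make the failure mode concrete, here is a rough model of it in Python rather than Erlang. Nothing below is Riak CS code: the function names, the expected state name, and the tuple shapes are all assumptions made for illustration. It only shows why a "successful" reply with a block count of 0 can cause the GC entry to be dropped while blocks still exist.

```python
# Hypothetical model of the pre-fix behaviour; names are illustrative only.
EXPECTED_STATE = "scheduled_delete"  # assumption: state GC expects to find


def delete_fsm_reply_before_fix(manifest_state, block_count):
    """Return (total_blocks, deleted_blocks) as reported to the worker."""
    if manifest_state != EXPECTED_STATE:
        # Unexpected state: stop normally with nothing processed,
        # which is still reported as a success with BlockCount=0.
        return (0, 0)
    return (block_count, block_count)


def worker_drops_gc_entry(total_blocks, deleted_blocks):
    """The worker treats 'deleted == total' as a completed delete."""
    return total_blocks == deleted_blocks


# A manifest in an unexpected state that still owns 8 blocks.
total, deleted = delete_fsm_reply_before_fix("active", 8)
entry_dropped = worker_drops_gc_entry(total, deleted)
```

In this model `entry_dropped` comes out true even though none of the 8 blocks was touched, which is the orphaning scenario described in the memo.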

shino added a commit that referenced this issue Sep 5, 2014
See #827

Logged information includes:

- Key in GC bucket
- UUID (in manifest)
- CS bucket and key (also in manifest)

Note that if riak_cs_delete_fsm replies ok with equal total blocks
and deleted blocks to riak_cs_gc_worker, the worker attempts to delete
the manifest entry in twop_set.  Doing so before the manifests and
blocks are completely deleted leaves orphaned manifests/blocks.  In
this PR, riak_cs_delete_fsm replies with total blocks = 1 and
deleted_blocks = 0.
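
The reply convention described in this commit message can be sketched as follows. This is a Python approximation, not the actual Erlang change: the dictionary layout of the manifest, the expected state name, the reply tuple, and the use of a plain set in place of twop_set are all assumptions. The point is only that replying with total = 1 and deleted = 0 makes the worker's equality check fail, so the GC entry is kept and a warning is logged with the GC key, UUID, and CS bucket/key.

```python
import logging

log = logging.getLogger("gc_sketch")
EXPECTED_STATE = "scheduled_delete"  # assumption: state GC expects to find


def delete_fsm_reply(gc_key, manifest):
    """Return ("ok", total_blocks, deleted_blocks) for one manifest."""
    if manifest["state"] != EXPECTED_STATE:
        # Skip the manifest and log enough to locate it later.
        log.warning(
            "invalid state manifest: gc_key=%s uuid=%s bucket=%s key=%s state=%s",
            gc_key, manifest["uuid"], manifest["bucket"],
            manifest["key"], manifest["state"],
        )
        return ("ok", 1, 0)  # deliberately unequal so the entry is kept
    n = manifest["block_count"]
    return ("ok", n, n)


def gc_worker_handle(gc_set, gc_key, reply):
    """Drop the GC entry only when every block was reported deleted."""
    _, total, deleted = reply
    if total == deleted:
        gc_set.discard(gc_key)
    return gc_set


bad = {"state": "active", "uuid": "u1", "bucket": "b", "key": "k", "block_count": 8}
good = dict(bad, state=EXPECTED_STATE, uuid="u2")
remaining = {"gc-1", "gc-2"}
gc_worker_handle(remaining, "gc-1", delete_fsm_reply("gc-1", bad))
gc_worker_handle(remaining, "gc-2", delete_fsm_reply("gc-2", good))
```

After both calls, only the entry for the invalid-state manifest is left in `remaining`, so it stays visible to later GC runs and to whoever investigates the warning.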
shino added a commit that referenced this issue Sep 5, 2014
See #827

Logged information includes:

- Key in GC bucket
- UUID (in manifest)
- CS bucket and key (also in manifest)

Note that if riak_cs_delete_fsm replies ok with equal total blocks
and deleted blocks to riak_cs_gc_worker, the worker attempts to delete
the manifest entry in twop_set.  Doing so before the manifests and
blocks are completely deleted leaves orphaned manifests/blocks.  In
this PR, riak_cs_delete_fsm replies with total blocks = 1 and
deleted_blocks = 0.

Conflicts:
	src/riak_cs_delete_fsm.erl
Add GC riak_test for invalid state manifest in GC bucket
@shino
Contributor

shino commented Sep 9, 2014

The code that skips invalid-state manifests and logs a warning was merged in #964.
The root cause has not yet been identified.
