HDDS-3965. SCM failed to start up for duplicated pipeline detected. #1210

Merged
merged 3 commits into from Jul 17, 2020

Conversation

avijayanhwx

@avijayanhwx avijayanhwx commented Jul 16, 2020

What changes were proposed in this pull request?

After the patch from HDDS-3925, SCM failed to start up with an exception stating that a 'duplicated pipeline has been detected'.

RCA
Investigating this with @nandakumar131, we traced the issue to the RocksDBIterator wrapper that we have over the native RocksIterator. A call to next() returns the current value and then moves the pointer ahead, so a subsequent removeFromDB call deletes the entry after the one just returned, not the returned entry itself. On the removeFromDB call that follows the last next(), the pointer is already past the end and points to garbage. As a result, when SCM starts up with this schema change, removeFromDB deletes the next pipeline from the DB every time and leaves the "first" pipeline entry, still in the old format, undeleted. On the next restart, this causes a duplicate pipeline exception. I will attach detailed logs from the problematic state and from the fix.
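
To make the off-by-one concrete, here is a minimal, self-contained sketch of the behavior described above. It is not the Ozone code: `CursorIterator`, `keysLeftAfterMigration`, and the `removeAtCursor` flag are hypothetical names, and a `TreeMap` stands in for RocksDB. It assumes, as with a snapshot-style iterator, that the walk still visits keys deleted during iteration. The "fixed" branch only illustrates the idea of removing the entry that next() last returned; the actual change in this patch may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class IteratorRemoveSketch {

    /** Hypothetical stand-in for a cursor-based store iterator (not the Ozone class). */
    static final class CursorIterator {
        private final TreeMap<String, String> db;
        private final boolean removeAtCursor; // true reproduces the bug
        private String cursor;       // key the underlying cursor points at; null when exhausted
        private String lastReturned; // key handed out by the most recent next()

        CursorIterator(TreeMap<String, String> db, boolean removeAtCursor) {
            this.db = db;
            this.removeAtCursor = removeAtCursor;
            this.cursor = db.isEmpty() ? null : db.firstKey();
        }

        boolean hasNext() {
            return cursor != null;
        }

        String next() {
            // Return the current key, then move the cursor ahead.
            lastReturned = cursor;
            cursor = db.higherKey(cursor);
            return lastReturned;
        }

        void removeFromDB() {
            // Buggy variant deletes whatever the cursor points at now, which is
            // the entry AFTER the one next() returned. Past the end this sketch
            // does nothing; the description above says the real pointer is garbage.
            String target = removeAtCursor ? cursor : lastReturned;
            if (target != null) {
                db.remove(target);
            }
        }
    }

    /** Walks three old-format entries, removing each one, and returns what is left. */
    public static List<String> keysLeftAfterMigration(boolean removeAtCursor) {
        TreeMap<String, String> db = new TreeMap<>();
        db.put("pipeline-1", "old-format");
        db.put("pipeline-2", "old-format");
        db.put("pipeline-3", "old-format");

        CursorIterator it = new CursorIterator(db, removeAtCursor);
        while (it.hasNext()) {
            it.next();
            it.removeFromDB();
        }
        return new ArrayList<>(db.keySet());
    }

    public static void main(String[] args) {
        System.out.println("remove at cursor:      " + keysLeftAfterMigration(true));
        System.out.println("remove last returned:  " + keysLeftAfterMigration(false));
    }
}
```

With removal at the cursor, the first key survives the walk, which matches the leftover "first" pipeline entry seen on restart. With removal of the last returned key, nothing is left behind.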

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-3965

How was this patch tested?

Manually tested.
Added unit tests.

Logs

Logs which show the issue.txt
Logs with fix.txt

@avijayanhwx avijayanhwx requested a review from arp7 July 17, 2020 04:06
@avijayanhwx
cc @fapifta

@arp7 arp7 left a comment

LGTM, however I would be more comfortable if @nandakumar131 also reviews since I am not familiar with this code and might have missed some subtle point.

@nandakumar131 nandakumar131 merged commit ca4c5a1 into apache:master Jul 17, 2020
errose28 pushed a commit to errose28/ozone that referenced this pull request Jul 20, 2020
* master:
  HDDS-3984. Support filter and search the columns in recon UI (apache#1218)
  HDDS-3806. Support recognize aws v2 Authorization header. (apache#1098)
  HDDS-3955. Unable to list intermediate paths on keys created using S3G. (apache#1196)
  HDDS-3741. Reload old OM state if Install Snapshot from Leader fails (apache#1129)
  HDDS-3965. SCM failed to start up for duplicated pipeline detected. (apache#1210)
  HDDS-3855. Add upgrade smoketest (apache#1142)
  HDDS-3964. Ratis config key mismatch (apache#1204)
  HDDS-3612. Allow mounting bucket under other volume (apache#1104)
  HDDS-3926. OM Token Identifier table should use in-house serialization. (apache#1182)
  HDDS-3824: OM read requests should make SCM#refreshPipeline outside BUCKET_LOCK (apache#1164)
  HDDS-3966. Disable flaky TestOMRatisSnapshots
errose28 pushed a commit to errose28/ozone that referenced this pull request Jul 20, 2020
* HDDS-3869:
  HDDS-3984. Support filter and search the columns in recon UI (apache#1218)
  HDDS-3806. Support recognize aws v2 Authorization header. (apache#1098)
  HDDS-3955. Unable to list intermediate paths on keys created using S3G. (apache#1196)
  HDDS-3741. Reload old OM state if Install Snapshot from Leader fails (apache#1129)
  HDDS-3965. SCM failed to start up for duplicated pipeline detected. (apache#1210)
errose28 pushed a commit to errose28/ozone that referenced this pull request Jul 20, 2020
* master:
  HDDS-3984. Support filter and search the columns in recon UI (apache#1218)
  HDDS-3806. Support recognize aws v2 Authorization header. (apache#1098)
  HDDS-3955. Unable to list intermediate paths on keys created using S3G. (apache#1196)
  HDDS-3741. Reload old OM state if Install Snapshot from Leader fails (apache#1129)
  HDDS-3965. SCM failed to start up for duplicated pipeline detected. (apache#1210)
errose28 pushed a commit to errose28/ozone that referenced this pull request Jul 21, 2020
* add-deleted-block-table: (63 commits)
  Make block iterator tests use deleted blocks table, and remove the now unused #deleted#
  Replace uses of #deleted# key prefix with access to new deleted blocks table
  Add deleted blocks table to base level DB wrappers
  Have block deleting service test look for #deleted# keys in metadata table
  Move block delete to correct table and remove debugging print statement
  Import schema version when importing container data from export
  HDDS-3984. Support filter and search the columns in recon UI (apache#1218)
  HDDS-3806. Support recognize aws v2 Authorization header. (apache#1098)
  HDDS-3955. Unable to list intermediate paths on keys created using S3G. (apache#1196)
  HDDS-3741. Reload old OM state if Install Snapshot from Leader fails (apache#1129)
  Move new key value block iterator implementation and tests to new interface
  Fix checkstyle violations
  HDDS-3965. SCM failed to start up for duplicated pipeline detected. (apache#1210)
  Update comments
  Add comments on added helper method
  Remove seekToLast() from iterator interface, implementation, and tests
  Add more robust unit test with alternating key matches
  All unit tests pass after allowing keys with deleted and deleting prefixes to be made
  HDDS-3855. Add upgrade smoketest (apache#1142)
  HDDS-3964. Ratis config key mismatch (apache#1204)
  ...
ChenSammi pushed a commit that referenced this pull request Jul 22, 2020
rakeshadr pushed a commit to rakeshadr/hadoop-ozone that referenced this pull request Sep 3, 2020