Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDDS-3814. Drop a column family through debug cli tool #1083

Merged
merged 5 commits into from
Oct 9, 2020

Conversation

maobaolong
Copy link
Member

@maobaolong maobaolong commented Jun 17, 2020

What changes were proposed in this pull request?

As it could involve an incompatible commit like HDDS-3614, we have to drop some table like S3Table to let OM can start successfully.

Now, we have to build and install a ldb of rocksdb, use that tools, we can drop the table S3Table, it is not an easy job to build and install ldb.

fortunately, we have debug tools which come in Ozone by HDDS-3622, so we can create a drop command to instead of ldb in rocksdb.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-3814

How was this patch tested?

An example as follow. We can verify the xxx table is dropped.

ozone debug ldb --db=/Users/mbl/ozone/om/metadata-dirs/om.db ls
ozone debug ldb --db=/Users/mbl/ozone/om/metadata-dirs/om.db drop --column_family xxx
ozone debug ldb --db=/Users/mbl/ozone/om/metadata-dirs/om.db ls

@maobaolong
Copy link
Member Author

@sadanand48 Please take a look at this PR. Thanks.

Copy link
Contributor

@sadanand48 sadanand48 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch @maobaolong. I have a question though: To let OM start successfully do we really need to drop the S3Table from the db?

List<byte[]> cfList = RocksDB.listColumnFamilies(new Options(),
parent.getDbPath());
byte[] nameByte = name.getBytes(StandardCharsets.UTF_8);
int deleteIndex = -1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the ArrayList.indexOf() method can be used here.
int deleteIndex = cfList.indexOf(nameByte)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@maobaolong
Copy link
Member Author

@sadanand48 Thanks for your review.

I have a question though: To let OM start successfully do we really need to drop the S3Table from the db?

HDDS-3614. Remove S3Table from OmMetadataManager. After this tickets merged, the S3Table has been removed from Ozone code, but it cannot startup successfully with the old om rocksdb, because the old om rocksdb contains S3Table, i just use the ldb of rocksdb to remove the S3Table from the om rocksdb, and start the OM successfully. That's doesn't matter, but the drop table cli command is useful i think.

elek
elek previously requested changes Jun 17, 2020
Copy link
Member

@elek elek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks to create @maobaolong. Overall it looks good to me. Two comments:

  1. Can we please use consistent naming for the column family parameters? In DBScanner, it's `--column_family. I would prefer to use the same naming for all the subcommands (if you need a short version, feel free to add it, but can we have the same long version here, too?)

  2. Other usability opinion: Until now we copied the interface of the rocksdb ldb command. Seems to be reasonable to do the same here and call the subcommand as drop_column_family to avoid some confusion (it drops the whole column family not just one key)

List<ColumnFamilyDescriptor> cfs = new ArrayList<>();
final List<ColumnFamilyHandle> columnFamilyHandleList =
new ArrayList<>();
List<byte[]> cfList = RocksDB.listColumnFamilies(new Options(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: Some part of this code is duplicated vs. DBScanner. I am not sure if it's better to move to a common method (as it's a real open not just a readOnlyOpen) but might be...

@avijayanhwx
Copy link
Contributor

avijayanhwx commented Jun 17, 2020

@maobaolong If OM cannot start successfully due to removal of a column family, then it is a bug and needs to be fixed. Does your cluster have the fix for HDDS-3668? IMO, adding capability to create/delete tables using a debug tool is risky.

@bharatviswa504
Copy link
Contributor

@maobaolong If OM cannot start successfully due to removal of a column family, then it is a bug and needs to be fixed. Does your cluster have the fix for HDDS-3668? IMO, adding capability to create/delete tables using a debug tool is risky.

Agree with @avijayanhwx this looks risky to delete column families from OM/SCM DB.

@maobaolong
Copy link
Member Author

@elek Thanks for your review and suggestion, i have changed the name of the argument to --column_family to keep consistent with DBScanner, and i create a common method to avoid code duplicated.

as it's a real open not just a readOnlyOpen) but might be...

I don't catch your point? If you mean to change the open to readOnlyOpen here, i have tested that it cannot work, because drop operation is a write operation for the rocksdb.

Please take another look, thanks.

@maobaolong
Copy link
Member Author

@avijayanhwx @bharatviswa504 Thank you for telling me HDDS-3668 can help me to resolve my problem.

  • Yeah, indeed, i believe HDDS-3668 is a better way, but when i met the problem caused by HDDS-3614, there aren't HDDS-3668 yet, so have to build and install a ldb of rocksdb, and drop the S3Table column family to resolve the problem, in fact, we haven't patch HDDS-3668 still now, and OM can successfully work by remove the useless table by the ldb of rocksdb.
  • HDDS-3668 is a way to avoid start failed with the unexpected column family, but we still want to remove the useless column family from rocksdb like the S3Table, when i think we don't need this table, i need a way to remove the column family.
  • Adding capability to create/delete tables using a debug tool is risky
    I agree, but we need it. In fact, without this PR, i can still create/delete the rocksdb by rocksdb/ldb, even i can remove the rocksdb persistent file on the disk, because i'm the administrator of this Ozone cluster, so i think we should prevent anybody/user to do the risky operation to Ozone cluster, but we should let someone with high permission of the cluster like me can do something they know they can do safety, and they can be responsible for the risk by their operation.

@bharatviswa504
Copy link
Contributor

bharatviswa504 commented Jun 18, 2020

@maobaolong
Thanks for the detailed explanation.
HDDS-3668 is a workaround until upgrades are properly handled. Once upgrades are handled, then it will be the responsibility of the feature developer to handle such scenarios. This was discussed in detail in the section "Extending Layout Features for OM"

Ya agree, for now, this has caused trouble for existing deployments.

even i can remove the rocksdb persistent file on the disk

I don't think if someone goes to the metadata folder and deletes it directly, I am not sure there is a way to safeguard such a situation, and user is risking his data on the cluster. Because now we are exposing such a CLI tool to end-user, this seems risky.

so i think we should prevent anybody/user to do the risky operation to Ozone cluster, but we should let someone with high permission of the cluster like me can do something they know they can do safety, and they can be responsible for the risk by their operation.

Do you think having an admin check here would safeguard any user from doing this?
And also printing out some WARN message to let the user remind about the operation they are trying to perform here.

@avijayanhwx
Copy link
Contributor

avijayanhwx commented Jun 18, 2020

Thank you for the explanation @maobaolong.

when i think we don't need this table, i need a way to remove the column family.

Removing useless column family is a schema change, and schema changes across versions are generally handled by Upgrade handlers in systems.

In fact, without this PR, i can still create/delete the rocksdb by rocksdb/ldb, even i can remove
the rocksdb persistent file on the disk, because i'm the administrator of this Ozone cluster

Yes, these types of destructive operations can be done possibly by naive users. But at least in that case, Ozone did not provide a direct way for the user to mess up its own DB.

As @bharatviswa504 said, we need to put appropriate safeguards beforee this operation. Is it possible to move this to an "ozone admin" command, with an explicit confirmation by printing a warning message (with the table name), and the user to enter something like "delete" or "yes" to confirm?

Also, in the case of OM HA, doesn't the user have to do it in every OM individually?

@maobaolong
Copy link
Member Author

@bharatviswa504 Thank you for discuss with me.

Do you think having an admin check here would safeguard any user from doing this?
I think it worth to do for many risky operation, but for this command, it cannot be executed remotely, it can be executed locally, so who can login the node of OM with the admin user that have prove that who have the admin permission. So, i mean, it does't need to do the safeguard check for this drop_column_family command.

even i can remove the rocksdb persistent file on the disk
Sorry, i just an inappropriate example, another appropriate examples like following, they are all risky and necessary.

  • Admin move the rocksdb from one dir into another,
  • Admin backup&recover the rocksdb

Another truth, without this PR, i can still create/delete the rocksdb by rocksdb/ldb, with this PR, i don't have to intall the rocksdb/ldb only.

@maobaolong
Copy link
Member Author

@avijayanhwx Thanks for your question, it let me think more.

Removing useless column family is a schema change, and schema changes across versions are generally handled by Upgrade handlers in systems.

In fact, the column family S3Table is a good example, the schema is removed in OM, but it still exist in the rocksdb, so it doesn't be removed by upgrade handlers, i have to remote it manually though rocksdb/ldb, i thinks this PR is nothing but only the substitute of rocksdb/ldb.

these types of destructive operations can be done possibly by naive users.

I just did the drop column family operation for S3Table last week on our Ozone cluster.

But at least in that case, Ozone did not provide a direct way for the user to mess up its own DB.

In our Ozone cluster, nobody can login with the admin user to the OM node, even someone can login with the developer user, they can only read the log and rocksdb file, no write permission at all, so they cannot do the drop table, create table, remove/move rocksdb. In this sense, i think it is safety.

Also, in the case of OM HA, doesn't the user have to do it in every OM individually?
Yeah, this PR is the only substitute of rocksdb/ldb, and it operate the rocksdb directly, so the operator need to execute this command individually.

@bharatviswa504
Copy link
Contributor

bharatviswa504 commented Jun 18, 2020

@maobaolong
The issue with S3 patch is, it has just used fixed volume "/s3v", previously this volume is determined from accesskey, and it is a breaking change when upgrading from the previous version to the latest master/to be released version. As once after upgrade, users will not be able to access old buckets that they have created using the S3G. To handle that HDDS-3563 has done some code changes to make this configurable. So, that old buckets can be still accessible.
But if the previous cluster has n number of access key's we need a solution like s3 bind mount.

So, just exposing such a tool, and dropping a table from DB and upgrade the OM, will not work.

As currently, Ozone does has not have complete support for upgrades, once upgrade work is done, if any of these kinds of changes are performed, it is the developer's responsibility to handle these kinds of breaking changes. As said before, @avijayanhwx has discussed this in the upgrade design document in the section " Extending Layout Features for OM" which needs to be taken care of by LayoutFeature upgrade handlers.

So, in my view exposing such a tool from Ozone is not a good idea.

Admin move the rocksdb from one dir into another,
Admin backup&recover the rocksdb

Another truth, without this PR, i can still create/delete the rocksdb by rocksdb/ldb, with this PR, >i don't have to intall the rocksdb/ldb only.

If an admin is doing these kinds of steps, it means he is risking data on the cluster.

Finally, I want to say, any kind of incompatible changes should be taken care of by upgrade handlers to handle properly, and it should not require an admin to perform these kinds of operations. If it requires, that means our upgrade path is not correct.

@maobaolong
Copy link
Member Author

@bharatviswa504 Ok, thank you for your clarification. Do you mind to move these tool into the admin subcommand?

@elek
Copy link
Member

elek commented Jun 22, 2020

I would like to add my own opinion about the size of the risks:

I wouldn't like to convince anybody, but couldn't see very high risk. Actually I think there is a higher risk to use a wrong rm -rf command than using ozone ldb which couldn't work without very long and specific parameters. Actually it's already longer than a usual --yes confirmation flag.

  1. Users shouldn't have permission to the rocksdb files. If they have it: doesn't matter if you check admin permission, they have all the power to do bad things. Actually even just having the read access to the rocksdb violates the security model.

  2. I would prefer to improve the usability of the admin tools even if the risk is slightly higher. Administrators already learned to handle risks and usually they use the power very carefully.

Finally, I want to say, any kind of incompatible changes should be taken care of by upgrade handlers to handle properly, and it should not require an admin to perform these kinds of operations. If it requires, that means our upgrade path is not correct.

This is what we have now. If this were the only concern, we can solve it by removing this ldb subcommand when upgrade is implemented. Today we have incompatibilities where I can accept that we need tool to help admins.

@bshashikant
Copy link
Contributor

I agree with @elek here.

@bharatviswa504
Copy link
Contributor

My point here is just deleting a column family and assuming everything works with any incompatible change is wrong.

This is what we have now. If this were the only concern, we can solve it by removing this ldb subcommand when upgrade >is implemented. Today we have incompatibilities where I can accept that we need tool to help admins.

Yes, for now with an incompatible change this tool may help. But using this tool to solve such issues will not solve the problem completely, let's take an example of S3 incompatible change, we need other changes like HDDS-3563. So, I am not sure just by exposing such tool, and when we see a column family is dropped and thinking that dropping a column family will resolve my upgrade issue. So, I am not sure just using this tool will help how much here.

I have explained the same in one of my above comment in detail regarding this issue.
#1083 (comment)

@elek
Copy link
Member

elek commented Jun 24, 2020

will not solve the problem completely, let's take an example of S3 incompatible change, we need other changes like HDDS-3563

It was not a goal to solve all the upgrade issues, IMHO. Just provide a tool which can help to solve one, current issues. Where this tool can help.

I agree, to solve all the issues we need the upgrade work but that's a bigger effort and independent of this patch.

@avijayanhwx
Copy link
Contributor

I am OK with this if it moves into admin command since it is a cluster "update" operation rather than a "debug" operation.

@bharatviswa504
Copy link
Contributor

HDDS-3668 has already solved this issue, but the only thing it has not taken care of is the deleting column family, but even after the upgrade it will open the column family which does not exist anymore in the newer version after the upgrade. Why can't we extend that to delete column families also, instead of providing this tool? (And we can remove this change, once after proper upgrade support is available). In this way, we don't need any admin intervention in deleting these column families during the upgrade.

@elek
Copy link
Member

elek commented Jun 25, 2020

Why can't we extend that to delete column families also, instead of providing this tool? (

We agreed that this is a dangerous tool. I think it's better to do it explicit rather than implicit in the background. But this is just my opinion.

@maobaolong
Copy link
Member Author

Some more arguments.

In fact, with this tool, in some situation, as an administrator of Ozone cluster, if i decided to do some operation with the rocksdb, i always have approach to modify the rocksdb, i can use the rocksdb/ldb to achieve my goal. What i want to say is this PR just give me convince, without this tool, cannot prevent me to do this risk things.

So, I think, what we should do is to verify the role of user, and doing operation under the permission of this role.

@bharatviswa504
Copy link
Contributor

bharatviswa504 commented Jun 25, 2020

We agreed that this is a dangerous tool. I think it's better to do it explicit rather than implicit in the background. But this is >just my opinion.

But if we do the backup of DB, and i don't see any issue. But if we want to be conservative or cautious it might be not to do this.

In fact, with this tool, in some situation, as an administrator of Ozone cluster, if i decided to do some operation with the >rocksdb, i always have approach to modify the rocksdb, i can use the rocksdb/ldb to achieve my goal. What i want to say is >this PR just give me convince, without this tool, cannot prevent me to do this risk things.

If admin, wants to do some modifications like this, and ozone is not providing such a tool, and admin still want to build ldb tool and then need to perform these operations which is a long process. So, in this case ozone is not providing such a tool, admin wants to still perform these, not sure in which cases we might need this when code can already handle these things internally.

But it looks like as everyone is more interested in getting this tool, even if code can handle by doing minor change, I am okay to move this tool under admin, currently for admin commands, we don't perform any checks is this operation by admin is this missing?

@xiaoyuyao
Copy link
Contributor

Reading through the comments and code, I'm OK with other readonly OM DB tools. However, I share the same concern with @bharatviswa504 and @avijayanhwx on exposing tool that can directly modify OM db.

If there is a bug in Ozone that needs drop tables, we would better fix it via official code fixes over asking admin do perform hack workaround as it usually will not be approved in most customer production cluster.

@elek
Copy link
Member

elek commented Jul 30, 2020

So what is the conclusion here.

Majority seems to have concerns about providing generic R/W RocksDB tools.

I created a related patch: #1276. It makes it possible to register sub-commands runtime: anybody (with special needs) can register sub-commands to existing CLI and provide additional functionality (it should be registered).

What do you suggest with this PR?

  1. Merge it without registering the sub-command (which can be done by creating files manually in the conf files)
  2. Cancel it and leave it to the users to use this command in a custom distribution.

Any opinion?

@elek
Copy link
Member

elek commented Aug 11, 2020

/pending Any opinion?

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marking this issue as un-mergeable as requested.

Please use /ready comment when it's resolved.

Any opinion?

@maobaolong
Copy link
Member Author

@elek Thanks for your conclusion and suggested options, and thank @bharatviswa504 @sadanand48 @avijayanhwx @bshashikant @xiaoyuyao for the discussion, I will go to check #1276 and go back soon.

@maobaolong
Copy link
Member Author

@elek Thanks for the Extensible subcommands framework., it works as expected.
I did some tests as following.

➜  ozone-1.1.0-SNAPSHOT git:(HDDS-3814) ✗ bin/ozone debug ldb                                                                                                                            
Missing required option: '--db=<dbPath>'
Usage: ozone debug ldb --db=<dbPath> [COMMAND]
Parse rocksdb file content
      --db=<dbPath>   Database File Path
Commands:
  scan                      Parse specified metadataTable
  list_column_families, ls  list all column families in db.
➜  ozone-1.1.0-SNAPSHOT git:(HDDS-3814) ✗ cp -r /Users/mbl/projects/github/hadoop-ozone/hadoop-ozone/tools/target/classes/META-INF etc/hadoop/
➜  ozone-1.1.0-SNAPSHOT git:(HDDS-3814) ✗ bin/ozone debug ldb                                                                                 
Missing required option: '--db=<dbPath>'
Usage: ozone debug ldb --db=<dbPath> [COMMAND]
Parse rocksdb file content
      --db=<dbPath>   Database File Path
Commands:
  scan                      Parse specified metadataTable
  list_column_families, ls  list all column families in db.
➜  ozone-1.1.0-SNAPSHOT git:(HDDS-3814) ✗ echo "org.apache.hadoop.ozone.debug.DropTable" >>  etc/hadoop/META-INF/services/org.apache.hadoop.hdds.cli.SubcommandWithParent
➜  ozone-1.1.0-SNAPSHOT git:(HDDS-3814) ✗ bin/ozone debug ldb                                                             
Missing required option: '--db=<dbPath>'
Usage: ozone debug ldb --db=<dbPath> [COMMAND]
Parse rocksdb file content
      --db=<dbPath>   Database File Path
Commands:
  scan                      Parse specified metadataTable
  list_column_families, ls  list all column families in db.
  drop_column_family        drop column family in db.
➜  ozone-1.1.0-SNAPSHOT git:(HDDS-3814) ✗ bin/ozone debug ldb  --db=/tmp/metadata/scm.db/ ls                                                                         
default
validCerts
deletedBlocks
pipelines
revokedCerts
containers
➜  ozone-1.1.0-SNAPSHOT git:(HDDS-3814) ✗ bin/ozone debug ldb  --db=/tmp/metadata/scm.db/  drop_column_family --column_family=deletedBlocks 
➜  ozone-1.1.0-SNAPSHOT git:(HDDS-3814) ✗ bin/ozone debug ldb  --db=/tmp/metadata/scm.db/ ls                                               
default
validCerts
pipelines
revokedCerts
containers

It's awesome, for the default, we don't need to add the risk commands as subcommand, but someone who want to drop table can edit the etc/hadoop/META-INF/services/org.apache.hadoop.hdds.cli.SubcommandWithParent file and add whatever the subcommand you like.

@elek @bharatviswa504 @sadanand48 @avijayanhwx @bshashikant @xiaoyuyao thanks for your review and discussion, please take another look.

@maobaolong
Copy link
Member Author

/ready

@maobaolong maobaolong requested a review from elek August 31, 2020 02:35
@github-actions github-actions bot dismissed their stale review August 31, 2020 02:35

Blocking review request is removed.

@github-actions github-actions bot removed the pending label Aug 31, 2020
@elek
Copy link
Member

elek commented Sep 23, 2020

@bharatviswa504 @avijayanhwx Are you fine with this approach?

Command won't be visible unless you put a magic file to the right place (META-INF/services). It's a hidden helper class.

Would like to merge if you have no objectinos.

Copy link
Contributor

@avijayanhwx avijayanhwx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for getting back to this late. I am happy with the guards around the delete operation. LGTM +1.

Copy link
Contributor

@bharatviswa504 bharatviswa504 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for getting back late.
I have one comment.

break;
}
}
rocksDB.dropColumnFamily(toBeDeletedCf);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a question:
If tablename is not a columnfamily in DB, we pass null for dropColumnFamily, do we need to print some error message if table is not in DB?

And also for the success scenario, do we want to add some message?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks for the suggestion.

Copy link
Contributor

@bharatviswa504 bharatviswa504 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 LGTM.

@elek
Copy link
Member

elek commented Oct 9, 2020

Merging it now. Thanks the review @bharatviswa504 and @avijayanhwx and the patch @maobaolong

@elek elek merged commit 7704cb5 into apache:master Oct 9, 2020
errose28 added a commit to errose28/ozone that referenced this pull request Oct 14, 2020
* master: (23 commits)
  HDDS-4122. Implement OM Delete Expired Open Key Request and Response (apache#1435)
  HDDS-4336. ContainerInfo does not persist BCSID (sequenceId) leading to failed replica reports (apache#1488)
  Remove extra serialization from getBlockID (apache#1470)
  HDDS-4262. Use ClientID and CallID from Rpc Client to detect retry requests (apache#1436)
  HDDS-4285. Read is slow due to frequent calls to UGI.getCurrentUser() and getTokens() (apache#1454)
  HDDS-4312. findbugs check succeeds despite compile error (apache#1476)
  HDDS-4311. Type-safe config design doc points to OM HA (apache#1477)
  HDDS-3814. Drop a column family through debug cli tool (apache#1083)
  HDDS-3728. Bucket space: check quotaUsageInBytes when write key and allocate block. (apache#1458)
  HDDS-4316. Upgrade to angular 1.8.0 due to CVE-2020-7676 (apache#1481)
  HDDS-4325. Incompatible return codes from Ozone getconf -confKey (apache#1485). Contributed by Doroszlai, Attila.
  HDDS-4309. Fix inconsistency in recon config keys starting with recon and not ozone (apache#1478)
  HDDS-4310: Ozone getconf broke the compatibility (apache#1475)
  HDDS-4298. Use an interface in Ozone client instead of XceiverClientManager (apache#1460)
  HDDS-4280. Document notable configurations for Recon. (apache#1448)
  HDDS-4156. add hierarchical layout to Chinese doc (apache#1368)
  HDDS-4242. Copy PrefixInfo proto to new project hadoop-ozone/interface-storage (apache#1444)
  HDDS-4264. Uniform naming conventions of Ozone Shell Options. (apache#1447)
  HDDS-4271. Avoid logging chunk content in Ozone Insight (apache#1466)
  HDDS-4299. Display Ratis version with ozone version (apache#1464)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants