New FateOperations API #2780

milleruntime · 2022-06-15T17:44:27Z

Changes taken from #2215 with some improvements.

ctubbsii

Conceptually, I like the idea. However, I think the API could be streamlined.

ctubbsii · 2022-06-16T06:02:17Z

core/src/main/java/org/apache/accumulo/core/client/admin/FateOperations.java

+  List<FateTransactionStatus> fateStatus(Set<String> txids, List<String> tStatus)
+      throws AccumuloException;


If you had methods, fail() and delete() on FateTransactionStatus, this list (or set, really.. since the items should be unique) is all you'd really need, and you wouldn't need the separate FateOperations interface. It could also be a stream, so you can do things like:

client.instanceOperations().fateTransactions() .filter(tx -> tx.getCreatedTime().before(threeDaysAgo)) .peek(FateTransaction::fail) .filter(tx -> tx.getCreatedTime().before(fiveDaysAgo)) .forEach(FateTransaction::delete);

Returning it as a set and calling .stream() on the set would work too. The main point is that FateTransactionImpl should have a reference to the client, so you can act on it with fail() and delete() operations.

I think this will work. Here is a method in FateCommand that I got to compile:

public void failTx(Shell shellState, List txids) throws AccumuloException { shellState.getAccumuloClient().instanceOperations().fateTransactions().stream() .filter(fateTransaction -> txids.contains(fateTransaction.getId().canonical().toString())) .forEach(FateTransaction::fail); }

Some things to note if we go with the fail() method hanging off of FateTransactionImpl. You will only be able to make calls to fail a single TX at a time. And we won't be able to have any checked exceptions thrown from fail(). So the FateTransaction interface method will just be:

/** * Fails the fate transaction. * * @since 2.1.0 */ void fail();

I guess we could have another instance method that takes a Set<FateTransaction> if you want to fail a bunch at once.

Failing a bunch is not uncommon - I'm not sure if the shell allows multiples right now, or not, but one way to handle it was to create a command file passed to the shell that had the individual txids / commands:

fate fail txid1 fate fail txid2 ...

It would be friendly if this could be fate fail txid1 txid2... in the shell, and if the API supported it.

core/src/main/java/org/apache/accumulo/core/client/admin/FateTransactionStatus.java

ctubbsii · 2022-06-16T06:10:38Z

core/src/main/java/org/apache/accumulo/core/client/admin/FateTransactionStatus.java

+  String getTxid();
+
+  /**
+   * @return This fate operations transaction id, in its original long form.
+   */
+  long getTxidLong();


I think these might be a bit backwards. The transaction ID (what should be returned from a getter of the transaction ID) is a long. The String is merely an prettyPrinted encoding. The method that returns the transaction ID should just be called getTxid, and the method that returns the String should be the one that is called something else.

Alternatively, we don't provide any String version, and just add a javadoc that points to Long.toHexString (I think that's what we're using). Or, we could just stop encoding the long in the logs, and just use the number in decimal in the logs.

I agree but I started working on a FateTxId class, that extends AbstractId. Checkout my comment on #2495 (comment). I like the idea of using a long as the ID but I think we really need a class, like we did with TableId. We could do a cache, which won't be that useful yet since nothing is using it, but may be eventually. The only thing I wasn't sure about was extending AbstractId. If we store the internal long then we may not need to extend AbstractId. If we do, we have to convert it to a String every time its constructed.

Yeah, I'm not so sure about using AbstractId either. That class was mainly for things that had a canonical String representation, and very specifically originated as a concrete Table or Namespace ID type... it was never intended as something for all possible identifiers, certainly not to provide a canonical string representation of a numeric id.

What about an AbstractNumericId or AbstractLongId? I feel like there has to be others... Only one I could find was flushId:

// get the flush id before the new memmap is made available for write long flushId; try { flushId = getFlushID();

An abstract numeric ID would certainly be better than trying to reuse the string-based one... but I don't think the need for such a thing is particularly compelling at the moment.

core/src/main/java/org/apache/accumulo/core/client/admin/FateTransactionStatus.java

ctubbsii · 2022-06-16T06:13:44Z

core/src/main/java/org/apache/accumulo/core/client/admin/FateTransactionStatus.java

+  /**
+   * @return The debug info for the operation on the top of the stack for this Fate operation.
+   */
+  String getDebug();


This method name is confusing, because it's not clear what a "debug" is as a noun to "get". Expanding it could help: getDebugInfo or getDebuggingInformation

The way that this is used - it is the command name (set when the transaction is seeded in Fate.seedTransaction ) something like getCommand that clarified that it is the op that created the fate stack.

ctubbsii · 2022-06-16T06:14:05Z

core/src/main/java/org/apache/accumulo/core/client/admin/FateTransactionStatus.java

+  List<String> getHeldLocks();
+
+  /**
+   * @return list of namespace and table ids locked
+   */
+  List<String> getWaitingLocks();


Are these lists or sets?

For some reason the collections for the locks are List types. I am not sure why. The method to retrieve them in AdimUtil creates Maps with Lists:

Map<Long,List<String>> heldLocks, Map<Long,List<String>> waitingLocks

ctubbsii · 2022-06-16T06:15:59Z

core/src/main/java/org/apache/accumulo/core/client/admin/FateTransactionStatus.java

+  /**
+   * @return The operation on the top of the stack for this Fate operation.
+   */
+  String getTop();


This method could be collapsed with getStackInfo into a method that returns an actual stack type (a List or Queue might work in Java), whose top could easily be viewed.

ctubbsii · 2022-06-16T06:17:50Z

core/src/main/java/org/apache/accumulo/core/client/admin/FateTransactionStatus.java

+  /**
+   * @return The timestamp of when the operation was created in ISO format with UTC timezone.
+   */
+  String getTimeCreatedFormatted();
+
+  /**
+   * @return The unformatted form of the timestamp.
+   */
+  long getTimeCreated();


These should just return a Date object or similar. The consumer of this API can format it if they like, using the normal APIs for formatting dates. Only one method is needed to return the date type.

Manno15 · 2022-06-25T15:50:52Z

I am a little behind on following this PR. What else is remaining to get this ready for review?

milleruntime · 2022-06-27T14:37:35Z

I am a little behind on following this PR. What else is remaining to get this ready for review?

It is still a WIP. Working on the new API design.

milleruntime · 2022-06-27T18:57:25Z

@ctubbsii @EdColeman In my latest changes I added another method to instanceOps and a new enum type to assist with the multiple TXs use case:

void executeFateAction(FateAction action, List<FateTxId> transactions)
      throws AccumuloException, AccumuloSecurityException;

I started writing the javadoc for the API and realized we need to provide some background about how to use some of the methods, such as cancel and fail, that can only be used on FateTransactions with a certain status. I thought the name FateAction is more appropriate for the API, instead of "operation", which refers to the FATE REPO. Then changed the Thrift types to reflect this naming. I also created an enum in FateTransaction for Status that directly mirrors the internal enum TStatus in ReadOnlyTStore. Let me know what you think.

EdColeman · 2022-07-14T00:31:29Z

core/src/main/java/org/apache/accumulo/core/client/admin/FateTransaction.java

+   * @return The debug info for the operation on the top of the stack for this Fate operation.
+   */
+  String getDebug();
+


I hope we can come up with a better name / description. I think that this entry is used to (or should) describe the command (target op) that creates the stack. That enables determining the overall command that is executing rather than needing to walk the entire fate stack to find the lowest numbered repo (usually repo_000000) but not sure if that must always be the case. When used with getTop() the status of the current op and the target command can be determined.

milleruntime · 2022-08-30T12:09:34Z

There are a lot of conflicts in this PR with recent changes. I don't want to loose the design discussion here so I am going to close this in favor of a branch with all the changes merged together.

ctubbsii · 2022-08-30T18:32:56Z

There are a lot of conflicts in this PR with recent changes. I don't want to loose the design discussion here so I am going to close this in favor of a branch with all the changes merged together.

Can you link to the new PR here when it exists, so we can follow the conversation to that PR?

milleruntime · 2022-09-02T12:54:12Z

@ctubbsii what do you think of the special methods, fail() and delete() being called on a FateTransaction? These currently require the Manager to be down to free the lock, which I feel is a special circumstance. A user can easily get the Set<FateTransaction> while the Manager is running and call these methods. We could easily throw an error if the Manager is running but it seems like that could trip up users.

ctubbsii · 2022-09-02T16:01:47Z

The Manager is supposed to be off in order to prevent FaTE concurrency issues during a delicate manual failure-handling situation. If we're directly manipulating FaTE by executing these special methods, we're already in a "special circumstance". Manual manipulation is not a routine operation. I think it makes sense to continue to leave the requirement to have the Manager be down to prevent issues. That's the current requirement, and changing it could introduce a lot of unforeseen concurrency issues.

Overall, I'm inclined to scrap the idea of adding a FaTE RPC and API so we can access it from the shell. That's the whole basis for these changes... and it's become bloated with all sorts of issues. The FateAdmin command functionality never should have been rolled into the shell to begin with. Instead, I think it should be merged with the more general purpose Admin command, like some of the other recent utilities, and removed from the shell. That will satisfy the initial issue we were trying to fix, which was removing the shell's dependency on SiteConfiguration. And, it has the advantage of not needing any of these proposed RPC/API changes, or changing anything about the requirements for online/offline servers. But, it does make it more accessible, alongside the other admin utilities for server-side maintenance.

milleruntime · 2022-09-02T16:18:50Z

Overall, I'm inclined to scrap the idea of adding a FaTE RPC and API so we can access it from the shell. That's the whole basis for these changes... and it's become bloated with all sorts of issues.

I have been tinkering with new FaTE RPCs for an API and I am starting to feel the same way. It doesn't really make sense to have the tablet server execute RPCs when we are just messing with ZK. And we can't have RPCs run in the manager that require it to be down. Also, FaTE is internal to the Manager and was never really meant to be exposed at such a high level.

I think it should be merged with the more general purpose Admin command.

I like that idea. It looks like FateAdmin has been deprecated for a few versions. Do you think we could drop it for 2.1?

ctubbsii · 2022-09-02T16:25:42Z

I like that idea. It looks like FateAdmin has been deprecated for a few versions. Do you think we could drop it for 2.1?

No. We can drop it in 3, along with the fate stuff in the shell. But we should deprecate the fate stuff in the shell.

milleruntime · 2022-09-02T16:41:11Z

But we should deprecate the fate stuff in the shell.

What about the new cancel operation?

milleruntime · 2022-09-02T18:27:50Z

I started work to move the new FateCommand options (cancel and summary) added in 2.1 to the Admin command in #2914

ctubbsii · 2022-09-02T22:19:54Z

But we should deprecate the fate stuff in the shell.

What about the new cancel operation?

I seem to remember making the point when that was added, that FaTE wasn't really user-facing. It probably makes more sense to expose that as "cancel scheduled compaction" or cancelling some other scheduled operation. Perhaps a cancel flag can be added to the relevant tasks? Alternatively, just put the new cancel option in the Admin command with fail and delete, and roll back the changes made in the shell.

EdColeman · 2022-09-07T11:20:37Z

Operators are familiar with the current shell fate operations and may be using them more that expected. I'm not opposed to moving it to the admin command, but the same functionality should be preserved. It would also be nice if the shell could print a message that points to the moved command and at least a hint on how to execute it.

milleruntime mentioned this pull request Jun 15, 2022

Explore consistently formatting all ids in Accumulo logs. #2495

Open

ctubbsii reviewed Jun 16, 2022

View reviewed changes

milleruntime force-pushed the fate-command-siteConfig branch from 29adc49 to e1b2c4a Compare June 21, 2022 14:39

This was referenced Jun 21, 2022

Fix Fate print command and improve ShellServerIT test #2755

Merged

Remove the use of SiteConfiguration from FateCommand #2215

Closed

milleruntime force-pushed the fate-command-siteConfig branch from 5e08a64 to d743449 Compare June 27, 2022 18:43

milleruntime added 3 commits June 29, 2022 10:27

New FateOperations API

4533c24

Thrift stuff

731325d

Fixes

45842aa

milleruntime force-pushed the fate-command-siteConfig branch from d743449 to 45842aa Compare June 29, 2022 14:30

EdColeman reviewed Jul 14, 2022

View reviewed changes

milleruntime closed this Aug 30, 2022

milleruntime mentioned this pull request Sep 2, 2022

Deprecate FateCommand #2914

Merged

milleruntime deleted the fate-command-siteConfig branch September 21, 2022 12:55

		List<FateTransactionStatus> fateStatus(Set<String> txids, List<String> tStatus)
		throws AccumuloException;

New FateOperations API #2780

New FateOperations API #2780

Uh oh!

Conversation

milleruntime commented Jun 15, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ctubbsii left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

EdColeman Jun 22, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

EdColeman Jul 25, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Manno15 commented Jun 25, 2022

Uh oh!

milleruntime commented Jun 27, 2022

Uh oh!

milleruntime commented Jun 27, 2022

Uh oh!

EdColeman Jul 14, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

milleruntime commented Aug 30, 2022

Uh oh!

ctubbsii commented Aug 30, 2022

Uh oh!

milleruntime commented Sep 2, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ctubbsii commented Sep 2, 2022

Uh oh!

milleruntime commented Sep 2, 2022

Uh oh!

ctubbsii commented Sep 2, 2022

Uh oh!

milleruntime commented Sep 2, 2022

Uh oh!

milleruntime commented Sep 2, 2022

Uh oh!

ctubbsii commented Sep 2, 2022

Uh oh!

EdColeman commented Sep 7, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

milleruntime commented Jun 15, 2022 •

edited

Loading

EdColeman Jun 22, 2022 •

edited

Loading

EdColeman Jul 25, 2022 •

edited

Loading

EdColeman Jul 14, 2022 •

edited

Loading

milleruntime commented Sep 2, 2022 •

edited

Loading