Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PHOENIX-6627 remove tephra references - WIP #1455

Closed
wants to merge 1 commit into from

Conversation

zsgyulavari
Copy link

No description provided.

@zsgyulavari zsgyulavari changed the title WIP: PHOENIX-6627 remove tephra references PHOENIX-6627 remove tephra references - WIP Jun 20, 2022
@zsgyulavari zsgyulavari marked this pull request as draft June 20, 2022 09:32
@gjacoby126
Copy link
Contributor

Thanks for taking this up, @zsgyulavari ! Please let me know if there's anything I can do to help -- it would be great if we could get this into 5.2 which is coming soon.

On the first pass it looks like there are 151 test failures, but most of them seem to be from the same problem in TransactionFactory$Provider's fromCode method. On line 69 it's making an assumption that the length of the Provider enum is equal to the largest Provider enum value (that is, there are 2 providers, and the highest provider is enum value 2.) That's not true anymore now that the Tephra provider is gone, so the logic has to change.

@gjacoby126
Copy link
Contributor

Also, what will happen to user tables who upgrade from Phoenix 5.1 -> 5.2 and have the Provider set to Tephra? Would that cause the region to crash from trying to load an enum value that doesn't exist? If so, what sort of upgrade logic should we include? I can think of three options:

  1. Switch tables to non-transactional
  2. Switch tables to Omid
  3. Keep the Tephra enum and just WARN or ERROR on region load that it doesn't do anything

1 and 2 are both tricky to coordinate on a rolling restart, because the normal upgrade logic runs when SYSTEM.CATALOG is upgraded, but the crashing would probably come before that, when a region hosting a TEPHRA-enabled table has its region server upgraded. But I worry that some operators would miss the warning in their logs.

@stoty @jpisaac @kadirozde @virajjasani , curious on your opinions.

@stoty
Copy link
Contributor

stoty commented Jun 21, 2022

We could just fail all operations on the affected tables.
While drastic, this would make sure that the tables fail early, and there's no data corruption.

Has anyone ever seen a live Tephra user in production ?

@gjacoby126
Copy link
Contributor

@stoty - so long as the region comes online, I like the idea of the coproc just sending a clear, non-retriable exception if tephra is enabled on a region. (If the region crashes, then there's a risk of it taking down the entire cluster as the HMaster keeps trying to bring it up on other RSs, killing each one.)

And no, I've never seen a live Tephra use case in production.

@@ -987,10 +987,8 @@ public void testDivergedViewsStayDiverged() throws Exception {
}

@Test
@Ignore("tephra dependent test case")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of @Ignore-ing this test, we could just change it to test for OmidTransactionaProcessor instead.

@@ -137,9 +137,7 @@ public static synchronized Collection<Object[]> data() {
Arrays.asList(new Object[][] {
{ false, false, null, false }, { false, false, null, true },
{ false, true, "OMID", false },
{ false, true, "TEPHRA", false }, { false, true, "TEPHRA", true },
{ true, false, null, false }, { true, false, null, true },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we have a single combination that uses OMID, and four that uses Tephra (probably to save on execution time).
We should still have remove the Omid one, and change the Tephra ones to Omid instead (i.e., instead of five transactional combinations, we should have four)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correction:
Those seem to be unsupported operations for Omid, so this is fine.

@@ -305,9 +304,9 @@ public void testNonTxToTxTable() throws Exception {
}

htable = conn.unwrap(PhoenixConnection.class).getQueryServices().getTable(Bytes.toBytes( nonTxTableName));
assertTrue(htable.getDescriptor().getCoprocessors().contains(TephraTransactionalProcessor.class.getName()));
//assertTrue(htable.getDescriptor().getCoprocessors().contains(TephraTransactionalProcessor.class.getName()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, check for the Omid coprocessor instead.

@@ -1255,7 +1255,7 @@ private PTable getTableFromCells(List<Cell> tableCellList, List<List<Cell>> allC
transactionalKv.getValueOffset(),
transactionalKv.getValueLength()))) {
// For backward compat, prior to client setting TRANSACTION_PROVIDER
transactionProvider = TransactionFactory.Provider.TEPHRA;
transactionProvider = TransactionFactory.Provider.OMID;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a behaviour change for old clients that only expect Tephra to exist.
Maybe we should just error out in this case, as the expected functionality has been removed.

I am assuming from the comment the the Omid aware clients always set the Provider, but haven't checked it.

@@ -1867,7 +1867,7 @@ public static PTable createFromProto(PTableProtos.PTable table) {
transactionProvider = TransactionFactory.Provider.fromCode(table.getTransactionProvider());
} else if (table.hasTransactional()) {
// For backward compatibility prior to transactionProvider field
transactionProvider = TransactionFactory.Provider.TEPHRA;
transactionProvider = TransactionFactory.Provider.OMID;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should consider erroring out in this case as well.

TEPHRA((byte)1, tephraTransactionProvider,
tephraTransactionProvider instanceof TephraTransactionProvider),
OMID((byte)2, OmidTransactionProvider.getInstance(), true);
OMID((byte)2, OmidTransactionProvider.getInstance());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes the ordinal, which SHOULD not be a real problem as we don't use that for serialization, but keeping TEPHRA, and just renaming it to something like TEPHRA_REMOVED would possibly also make it easier to handle the hairy cases where tephra defined for old tables.

@@ -94,7 +66,7 @@ public static PhoenixTransactionProvider getTransactionProvider(byte[] txState,
return null;
}
Provider provider = (clientVersion < MetaDataProtocol.MIN_SYSTEM_TABLE_TIMESTAMP_4_14_0)
? Provider.TEPHRA
? Provider.OMID
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, should we just error out ?

}
return filteredData;
return data;
//TODO: remove this method altogether
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it shouldn't be too difficult.

@@ -60,7 +60,7 @@ public PhoenixTransactionClient getTransactionClient(Configuration config, Conne

@Override
public Provider getProvider() {
return TransactionFactory.Provider.TEPHRA;
return TransactionFactory.Provider.OMID;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not right.
We should either return null, or the UnavailableProvider's type, if we keep that,

@apurtell
Copy link
Contributor

I'm going to close this PR but we can reopen it if the alternative PR I am about to open is not acceptable for some reason.

@apurtell apurtell closed this Aug 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants