Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PHOENIX-3534 Support multi region SYSTEM.CATALOG table #303

Closed
wants to merge 81 commits into from

Conversation

twdsilva
Copy link
Contributor

This patch adds two new LinkTypes EXCLUDED_COLUMNS (used to represent a column that has been dropped) and VIEW_INDEX_PARENT_TABLE (used to link an index on a view to its parent). Views and view indexes no longer store columns derived from their ancestors in their metadata. When they are resolved the ancestors are looked up and added to the PTable that is returned to the client (see combineColumns in MetadataEndpointImpl). The PTable in the server side metadata cache only stores the columns created in the view/view index and not derived columns.
We do not propagate metadata changes made to a parent to all its children. While adding columns to a base table, we no longer lock all the children in the view hierarchy, we only validate that the columns being added does not conflict with an existing base table column. We also don't lock children while dropping a parent table column. When dropping a parent column we also drop any view indexes that need the column. This patch does not handle the case when there are concurrent changes (eg. adding a conflicting column or creating a new view index that requires a parent column that is being dropped). That will be handled in a follow-up patch.
While dropping a parent table, we don't drop all the child views metadata. This metadata needs to be cleaned-up (maybe at compaction time?) which will be handled in a follow-up patch.

There are a few test failures I am working through, which I will fix soon and update the PR.
@JamesRTaylor can you please review?

FYI @karanmehta93 @ChinmaySKulkarni

rgidwani and others added 30 commits March 17, 2017 16:14
… additive case where we take lower timestamp
Conflicts:
	phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java
	phoenix-core/src/main/java/org/apache/phoenix/coprocessor/WhereConstantParser.java
	phoenix-core/src/test/java/org/apache/phoenix/coprocessor/MetaDataEndpointImplTest.java
//TODO: when Phoenix drops support for pre-1.3 versions of HBase, redo as a WALCellFilter
if (!SchemaUtil.isMetaTable(entry.getKey().getTablename().getName())){
byte[] tableName = entry.getKey().getTablename().getName();
if (!SchemaUtil.isMetaTable(tableName) && !SchemaUtil.isChildLinkTable(tableName)){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be safe to turn on normal HBase replication on the new System.CHILD_LINK? (That is, is there any unwanted data in System.CHILD_LINK that this WALFilter wouldn't copy that normal HBase replication would?)

If normal HBase replication works for System.CHILD_LINK, and all view data left in System.Catalog starts with tenant_id, then the logic here can be greatly simplified, similar to how it was before PHOENIX-4229

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SYSTEM.CHILD_LINK contains the parent->child linking rows and cells we use to detect race conditions (eg a column of conflicting type being added at the same time to a parent and child).
The latter cells are written with a short TTL.
I think we can use HBase replication for SYSTEM.CHILD_LINK. All the tenant specific view metadata rows in SYSTEM.CATALOG start with tenant id.
I will modify this filter to how it was before PHOENIX-4229.
@gjacoby126 Thanks for the suggestion.

@JamesRTaylor
Copy link
Contributor

+1 to the patch. Great work @twdsilva and @Churrodog! I made some minor comments for some potential follow up work and had a few questions, but let's get this committed first. I'd recommend the following priority for the next JIRA as:

  1. Move views to their own table
  2. Get rid of client side code that is sending the base columns
  3. Fix corner case/race condition issues
  4. Add code that doesn't write orphaned metadata on major compaction

PColumn pColumn = myColumns.get(i);
if (pColumn.isExcluded()) {
excludedColumns.add(pColumn);
} else if (!pColumn.equals(SaltingUtil.SALTING_COLUMN)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of matching on SALTING_COLUMN, we should stop the loop at 1 above if table.getSaltBuckets() != null. The code never filters the salt column based on it's name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will change this.

// add all pk columns of parent tables to indexes
for (PColumn column : pTable.getPKColumns()) {
// don't add the salt column of ancestor tables for view indexes
if (column.equals(SaltingUtil.SALTING_COLUMN) || column.isExcluded()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as before - we should be able to match based on the column being the first column. Note that the salt column would only be in the base physical table.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will change this.

try {
int clientVersion = request.getClientVersion();
List<Mutation> tableMetadata = ProtobufUtil.getMutations(request);
MetaDataUtil.getTenantIdAndSchemaAndTableName(tableMetadata, rowKeyMetaData);
byte[] tenantIdBytes = rowKeyMetaData[PhoenixDatabaseMetaData.TENANT_ID_INDEX];
schemaName = rowKeyMetaData[PhoenixDatabaseMetaData.SCHEMA_NAME_INDEX];
tableName = rowKeyMetaData[PhoenixDatabaseMetaData.TABLE_NAME_INDEX];
fullTableName = SchemaUtil.getTableName(schemaName, tableName);
// TODO before creating a table we need to see if the table was previously created and then dropped
// and clean up any parent->child links or child views
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove TODO as isn't this done now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// been cleaned up by compaction
if (!Bytes.toString(schemaName).equals(QueryConstants.SYSTEM_SCHEMA_NAME)) {
dropChildMetadata(schemaName, tableName, tenantIdBytes);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor - indentation issue here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

/*
* For a mapped view, there is no link present to the physical table. So the
* viewPhysicalTableRow is null in that case.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix indentation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@@ -447,7 +447,7 @@
static {
Collections.sort(FUNCTION_KV_COLUMNS, KeyValue.COMPARATOR);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be good to include a class level comment that explains the overall approach at a high level.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified the class level comment.

this(propertyName, colFamSpecifiedException, isMutable, mutatingException, isValidOnView, isMutableOnView, true);
}

private TableProperty(String propertyName, SQLExceptionCode colFamSpecifiedException, boolean isMutable, SQLExceptionCode mutatingException, boolean isValidOnView, boolean isMutableOnView, boolean propagateToViews) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did you end up dealing with table property conflicts between parent and children? Is there follow up work required? Can we use the timestamp of the Cell storing the property to differentiate similar to the logic for columns? It's fine to do this work in a follow up JIRA.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I filed PHOENIX-4763 to fix this. We should be able to use the cell timestamp to differentiate, still need to figure out how to expose this since its the properties in PTable don't currently expose the timestamp.

// 3. Finally write the mutations to create the table

// From 4.15 the parent->child links are stored in a separate table SYSTEM.CHILD_LINK
List<Mutation> childLinkMutations = MetaDataUtil.removeChildLinks(tableMetadata);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO to remove this code in 4.16.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I filed PHOENIX-4810 and added a comment to reference this jira.



// if the view is a first level child, then we need to create the PARENT_TABLE link
// that was overwritten by the PHYSICAL_TABLE link
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good. So we'll be consistent with the parent link now, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this will make it so that the parent link row will be created correctly when upgrading tables to be namespace mapped.

}

@Test
public void testRecreateDroppedTableWithChildViews() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These new tests are good. These are testing that the left over metadata doesn't impact the re-creation of a table since we don't make the RPC to delete views when a base table is dropped, right? Do you think there'd be any issues if part of the rows for a view were there (i.e. say that the create view failed, but some of the rows were written)? Might be good to have a test like this - you could set it up by using HBase APIs to manually delete some rows of a view.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We write the parent->child link first, then if the table uses column encoding we update the encoded column qualifiers on the parent table, and finally use mutateRowsWithLocks to write the view metadata atomically.
We ignore views that can't be found (in case writing the child view metadata fails).
If the metadata write fails and the table uses column encoding then we will lose a few column qualifiers.
I'll add a test for this.

@twdsilva
Copy link
Contributor Author

@JamesRTaylor Thanks for the feedback, I have updated the PR. I will get this committed shortly.

@twdsilva twdsilva closed this Jul 19, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants