Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PHOENIX-6720 "create table" can't recreate column encoded tables that had columns dropped #1553

Closed
wants to merge 14 commits into from

Conversation

Aarchy
Copy link
Contributor

@Aarchy Aarchy commented Jan 23, 2023

No description provided.

phoenix-core/src/main/antlr3/PhoenixSQL.g Outdated Show resolved Hide resolved
@@ -704,19 +707,31 @@ column_defs returns [List<ColumnDef> ret]
: v = column_def {$ret.add(v);} (COMMA v = column_def {$ret.add(v);} )*
;

column_qualifier_counters returns [Map<String, Integer> ret]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:
Call we call this something like "initializiation_list" instead ?
We may want to use this for other cases later.

indexes returns [List<NamedNode> ret]
@init{ret = new ArrayList<NamedNode>(); }
: v = index_name {$ret.add(v);} (COMMA v = index_name {$ret.add(v);} )*
;

column_def returns [ColumnDef ret]
: c=column_name dt=identifier (LPAREN l=NUMBER (COMMA s=NUMBER)? RPAREN)? ar=ARRAY? (lsq=LSQUARE (a=NUMBER)? RSQUARE)? (nn=NOT? n=NULL)? (DEFAULT df=expression)? (pk=PRIMARY KEY (order=ASC|order=DESC)? rr=ROW_TIMESTAMP?)?
{ $ret = factory.columnDef(c, dt, ar != null || lsq != null, a == null ? null : Integer.parseInt( a.getText() ), nn!=null ? Boolean.FALSE : n!=null ? Boolean.TRUE : null,
: c=column_name dt=identifier (LPAREN l=NUMBER (COMMA s=NUMBER)? RPAREN)? ar=ARRAY? (lsq=LSQUARE (a=NUMBER)? RSQUARE)? (nn=NOT? n=NULL)? (DEFAULT df=expression)? ((pk=PRIMARY KEY (order=ASC|order=DESC)? rr=ROW_TIMESTAMP?)|(COLUMN_QUALIFIER_ID cq=NUMBER))?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My original intent was to use the encoded binary literal for the CQ here.
Did you have a specific reason to go with the unencoded integer here ?

nit: I'd prefer POSINT instead of number, even if they are aliases, as it is more expressive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What re the advantages of using encoded binary literals?
I believe these checks are more readable this way:
cq.getValue() < QueryConstants.ENCODED_CQ_COUNTER_INITIAL_VALUE

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use that internal representation independent of the user-facing format.

I've just verified, that the "COLUMN_QUALIFIER" syscat column always stores the qualifier, whether we use encoded qualifiers, or unencoded ones.

On the other hand, there is no way to specify the unencoded COLUMN_QUALIFER directly, so that does not rule out using the integer values here.

We should make sure to use a name that makes it clear this is only for encoded qualifiers, like
ENCODED_QUALIFER.

TableName baseTableName, ParseNode whereClause, int bindCount, Boolean immutableRows) {
List<ParseNode> splitNodes, PTableType tableType, boolean ifNotExists,
TableName baseTableName, ParseNode whereClause, int bindCount, Boolean immutableRows,
Map<String, Integer> familyCounters) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure String is the good choice here.

The hbase API uses byte[] for families.
It is possible that we already limit the usable families to valid Strings in Phoenix, but we should double check this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ColumnDef class stores family names as NamedNode, which is a String and a boolean flag for case sensitivity. So using String here was a problem, but not because hbase, rather because the case sensitive cases.

Family names for the counters are now stored as NamedNode, also few new tests have been added.

Thank you for pointing this out.

if (colDef.getColumnQualifier() != null && encodingScheme != NON_ENCODED_QUALIFIERS) {
if (cqCounter.getNextQualifier(cqCounterFamily) > ENCODED_CQ_COUNTER_INITIAL_VALUE &&
!inputCqCounters.containsKey(cqCounterFamily)) {
throw new SQLExceptionInfo.Builder(SQLExceptionCode.MISSING_CQ)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we require a CQ counter if we specify the CQ for column.

Is this really necessary ?
Can't we calculate the new counter value in this case ?

Copy link
Contributor Author

@Aarchy Aarchy Jan 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not quite the case here. This checks validates if every column has its qualifier. eg.
COL1 INTEGER, COL2 INTEGER COLUMN_QUALIFIER 13
In this case I don't think we can really figure out what the qualifier of COL1 should be. (it could be 11, 12)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK


if (statement.getFamilyCQCounters() == null ||
statement.getFamilyCQCounters().get(cqCounterFamily) == null) {
cqCounter.setValue(cqCounterFamily, colDef.getColumnQualifier());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we already check that the CQs are increasing ?
Looks like this would fail if that's not true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thank you.

@@ -2614,7 +2628,47 @@ else if (!SchemaUtil.isSystemTable(Bytes.toBytes(SchemaUtil.getTableName(schemaN
}
// Use position as column qualifier if APPEND_ONLY_SCHEMA to prevent gaps in
// the column encoding (PHOENIX-4737).
Integer encodedCQ = isPkColumn ? null : isAppendOnlySchema ? Integer.valueOf(ENCODED_CQ_COUNTER_INITIAL_VALUE + position) : cqCounter.getNextQualifier(cqCounterFamily);
Integer encodedCQ = null;
if (!isPkColumn) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use curly braces everywhere, it's more readable, and the validators alse require it.

@@ -396,6 +400,35 @@ private TableDescriptor getTableDescriptor(ConnectionQueryServices cqsi, PTable
}
}

private String convertColumnQualifiersToString(PTable table) {
StringBuilder cqBuilder = new StringBuilder();
if (shouldGenerateWithDefaults)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I won't add more comments, but make sure you don't use ifs without braces anywhere.

@@ -396,6 +400,35 @@ private TableDescriptor getTableDescriptor(ConnectionQueryServices cqsi, PTable
}
}

private String convertColumnQualifiersToString(PTable table) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be convertColumnQualifierCountersToString

@stoty
Copy link
Contributor

stoty commented Jan 25, 2023

We already have the DEFAULT_COLUMN_FAMILY property, where we specify column families in a table property.

According to the grammar page:

DEFAULT_COLUMN_FAMILY string option determines the column family used used when none is specified. The value is case sensitive. If this option is not present, a column family name of '0' is used.

So we should stick to the same semantics, and not do the "normalize to uppercase when not double quoted" stuff here.

We could argue that your current solution is more correct, but we should have a single way of handling families (at least in table properties)

@stoty
Copy link
Contributor

stoty commented Jan 25, 2023

i.e.
COLUMN_QUALIFIER_COUNTER ('A'=14, 'B'=13, 'a'=5)

Of course it's also worth checking the the docs I quoted above is correct.

Copy link
Contributor

@stoty stoty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I have one nit with a parameter/variable naming.

public ColumnDef columnDef(ColumnName columnDefName, String sqlTypeName,
boolean isArray, Integer arrSize, Boolean isNull,
Integer maxLength, Integer scale, boolean isPK,
SortOrder sortOrder, String expressionStr, Integer cq, boolean isRowTimestamp) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer 'encodedQualifier' or instead of 'cq' as a the variable name.

Copy link
Contributor

@stoty stoty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 LGTM

@stoty stoty closed this Jan 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants