-
Notifications
You must be signed in to change notification settings - Fork 994
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PHOENIX-374: Enable access to dynamic columns in * or cf.* selection #428
Conversation
b6c2a53
to
a9951fa
Compare
Please review @twdsilva, @gjacoby126, @vincentpoon @kadirozde if you have some time. |
a9951fa
to
bbb31be
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ChinmaySKulkarni Looks very good so far, I added some comments.
* turned on | ||
*/ | ||
@RunWith(Parameterized.class) | ||
public class DynamicColumnWildcardIT extends BaseTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice test coverage!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
return new RowProjector(projectedColumns, Math.max(estimatedKeySize, estimatedByteSize), isProjectEmptyKeyValue, resolver.hasUDFs(), isWildcard); | ||
return new RowProjector(projectedColumns, Math.max(estimatedKeySize, estimatedByteSize), | ||
isProjectEmptyKeyValue, resolver.hasUDFs(), isWildcard, | ||
isWildcard || !projectedFamilies.isEmpty()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you need to include the isWildcard boolean check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should actually be the wildcardIncludesDynamicCols
check. Will change it.
* QueryServices.WILDCARD_QUERY_DYNAMIC_COLS_ATTRIB | ||
* See PHOENIX-374 | ||
*/ | ||
public class DynamicColumnWildcard implements RegionObserver, RegionCoprocessor { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you just use ScanRegionObserver instead of creating a new coprocessor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@twdsilva One problem with this approach is that now the preBatchMutate
will be called in every case, irrespective of the client-side config to enable dynamic column data in wildcard queries. With the current implementation, even if this config is off (and thus there are no attributes set on any of the mutations), we will still have to:
- In the worst case (all Puts), iterate over (Number of mutations in
miniBatchOp
) * (Number of column families in each mutation), though ultimately it will be a no-op. - In the best case (all Deletes), iterate over (Number of mutations in
miniBatchOp
), though ultimately it will be a no-op.
At best, I can set an additional attribute on the Put if we don't want to store any dynamic column metadata and make sure that if the config is off, we will always only iterate over (Number of mutations in miniBatchOp
) and do no-op. There is still the additional iterations over all the mutations though. Is there a way to pass client-side configs to the server-side via the miniBatchOp
? If so, I can completely short-circuit this logic in case the config is off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Went ahead with with the additional attribute to avoid unnecessary processing in case the config is off or there are no dynamic columns. Please take a look now @twdsilva
// Scan attribute that is set in case we want to project dynamic columns | ||
public static final String WILDCARD_SCAN_INCLUDES_DYNAMIC_COLUMNS = | ||
"_WildcardScanIncludesDynCols"; | ||
public static final byte[] DYN_COLS_METADATA_CELL_QUALIFIER = Bytes.toBytes("#Dyn_"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you change this to something shorter to save space maybe "D#" ?
|
||
boolean wildcardIncludesDynamicCols = props.getBoolean( | ||
WILDCARD_QUERY_DYNAMIC_COLS_ATTRIB, DEFAULT_WILDCARD_QUERY_DYNAMIC_COLS_ATTRIB); | ||
// We register the coprocessor for dynamic columns to be exposed when issuing wildcard |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need need to always load this coprocessor, since you could have a mix of clients that have this config turned on/off.
.getDynamicColumnsList(); | ||
for (PTableProtos.PColumn dynColProto : dynColsInThisFam) { | ||
// Add a column for this dynamic column to the metadata Put operation | ||
((Put)putDynColMetaData.get(0)).addColumn(fam, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can you create a local variable for the put and then call miniBatchOp.addOperationsFromCP
if its not null
putDynColMetaData.add(new Put(m.getRow())); | ||
NavigableMap<byte[], List<Cell>> famCellMap = m.getFamilyCellMap(); | ||
for (byte[] fam : famCellMap.keySet()) { | ||
byte[] serializedDynColsList = m.getAttribute(Bytes.toString(fam)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the attribute that store the serialized dynamic pcolumns is named the same as the family name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. The Put's attribute key is the column family name and the value is the serialized list of dynamic columns in that column family
@twdsilva Addressed review comments. Please review. Once it looks good, I'll squash and rebase with master. Thanks! |
@ChinmaySKulkarni +1, I just had some minor feedback that you can fix before committing. |
|
||
public PRowImpl(KeyValueBuilder kvBuilder, ImmutableBytesWritable key, long ts, Integer bucketNum, boolean hasOnDupKey) { | ||
PRowImpl(KeyValueBuilder kvBuilder, ImmutableBytesWritable key, long ts, Integer bucketNum, boolean hasOnDupKey) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Why the change from public to protector?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gokceni This method does not require public scope hence I changed it to package-private
Mutation m = miniBatchOp.getOperation(i); | ||
// There is at max 1 extra Put (for dynamic column shadow cells) per original Put | ||
Put dynColShadowCellsPut = null; | ||
if (m instanceof Put && !Bytes.equals(m.getAttribute( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Older clients won't have this attribute set on the mutation and it will enter this loop, IMO its better to invert this flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
ByteArrayOutputStream qual = new ByteArrayOutputStream(); | ||
qual.write(DYN_COLS_METADATA_CELL_QUALIFIER); | ||
qual.write(dynCol.getColumnQualifierBytes()); | ||
if (LOG.isDebugEnabled()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might end up being a lot of logs, change it to the TRACE level.
Put dynColShadowCellsPut = null; | ||
if (m instanceof Put && !Bytes.equals(m.getAttribute( | ||
NO_DYNAMIC_COLUMN_METADATA_STORED_FOR_MUTATION), TRUE_BYTES)) { | ||
NavigableMap<byte[], List<Cell>> famCellMap = m.getFamilyCellMap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a debug log line here?
27a5f0d
to
176b059
Compare
176b059
to
376775e
Compare
Using 1 cell per dyn col for metadata storage. Upsert path done. Select path done. Serializing
dynamic column PColumn list and sending back in result Cell list done. Combined with
PhoenixResultSet rowprojector so that dynamic columns can directly be
used with the help of PhoenixResultSetMetaData. Supporting column family wildcards as well. Added tests.
Design Doc: PHOENIX-374 Design Doc