Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PHOENIX-374: Enable access to dynamic columns in * or cf.* selection #428

Closed
wants to merge 1 commit into from

Conversation

ChinmaySKulkarni
Copy link
Contributor

@ChinmaySKulkarni ChinmaySKulkarni commented Jan 15, 2019

  • Using 1 cell per dyn col for metadata storage. Upsert path done. Select path done. Serializing
    dynamic column PColumn list and sending back in result Cell list done. Combined with
    PhoenixResultSet rowprojector so that dynamic columns can directly be
    used with the help of PhoenixResultSetMetaData. Supporting column family wildcards as well. Added tests.

  • Design Doc: PHOENIX-374 Design Doc

@ChinmaySKulkarni ChinmaySKulkarni force-pushed the PHOENIX-374 branch 11 times, most recently from b6c2a53 to a9951fa Compare January 22, 2019 06:01
@ChinmaySKulkarni
Copy link
Contributor Author

Please review @twdsilva, @gjacoby126, @vincentpoon @kadirozde if you have some time.

Copy link
Contributor

@twdsilva twdsilva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ChinmaySKulkarni Looks very good so far, I added some comments.

* turned on
*/
@RunWith(Parameterized.class)
public class DynamicColumnWildcardIT extends BaseTest {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice test coverage!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

return new RowProjector(projectedColumns, Math.max(estimatedKeySize, estimatedByteSize), isProjectEmptyKeyValue, resolver.hasUDFs(), isWildcard);
return new RowProjector(projectedColumns, Math.max(estimatedKeySize, estimatedByteSize),
isProjectEmptyKeyValue, resolver.hasUDFs(), isWildcard,
isWildcard || !projectedFamilies.isEmpty());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you need to include the isWildcard boolean check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should actually be the wildcardIncludesDynamicCols check. Will change it.

* QueryServices.WILDCARD_QUERY_DYNAMIC_COLS_ATTRIB
* See PHOENIX-374
*/
public class DynamicColumnWildcard implements RegionObserver, RegionCoprocessor {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you just use ScanRegionObserver instead of creating a new coprocessor?

Copy link
Contributor Author

@ChinmaySKulkarni ChinmaySKulkarni Jan 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@twdsilva One problem with this approach is that now the preBatchMutate will be called in every case, irrespective of the client-side config to enable dynamic column data in wildcard queries. With the current implementation, even if this config is off (and thus there are no attributes set on any of the mutations), we will still have to:

  • In the worst case (all Puts), iterate over (Number of mutations in miniBatchOp) * (Number of column families in each mutation), though ultimately it will be a no-op.
  • In the best case (all Deletes), iterate over (Number of mutations in miniBatchOp), though ultimately it will be a no-op.

At best, I can set an additional attribute on the Put if we don't want to store any dynamic column metadata and make sure that if the config is off, we will always only iterate over (Number of mutations in miniBatchOp) and do no-op. There is still the additional iterations over all the mutations though. Is there a way to pass client-side configs to the server-side via the miniBatchOp? If so, I can completely short-circuit this logic in case the config is off.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went ahead with with the additional attribute to avoid unnecessary processing in case the config is off or there are no dynamic columns. Please take a look now @twdsilva

// Scan attribute that is set in case we want to project dynamic columns
public static final String WILDCARD_SCAN_INCLUDES_DYNAMIC_COLUMNS =
"_WildcardScanIncludesDynCols";
public static final byte[] DYN_COLS_METADATA_CELL_QUALIFIER = Bytes.toBytes("#Dyn_");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you change this to something shorter to save space maybe "D#" ?


boolean wildcardIncludesDynamicCols = props.getBoolean(
WILDCARD_QUERY_DYNAMIC_COLS_ATTRIB, DEFAULT_WILDCARD_QUERY_DYNAMIC_COLS_ATTRIB);
// We register the coprocessor for dynamic columns to be exposed when issuing wildcard
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need need to always load this coprocessor, since you could have a mix of clients that have this config turned on/off.

.getDynamicColumnsList();
for (PTableProtos.PColumn dynColProto : dynColsInThisFam) {
// Add a column for this dynamic column to the metadata Put operation
((Put)putDynColMetaData.get(0)).addColumn(fam,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can you create a local variable for the put and then call miniBatchOp.addOperationsFromCP if its not null

putDynColMetaData.add(new Put(m.getRow()));
NavigableMap<byte[], List<Cell>> famCellMap = m.getFamilyCellMap();
for (byte[] fam : famCellMap.keySet()) {
byte[] serializedDynColsList = m.getAttribute(Bytes.toString(fam));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the attribute that store the serialized dynamic pcolumns is named the same as the family name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The Put's attribute key is the column family name and the value is the serialized list of dynamic columns in that column family

@ChinmaySKulkarni
Copy link
Contributor Author

ChinmaySKulkarni commented Feb 4, 2019

@twdsilva Addressed review comments. Please review. Once it looks good, I'll squash and rebase with master. Thanks!

@twdsilva
Copy link
Contributor

twdsilva commented Feb 4, 2019

@ChinmaySKulkarni +1, I just had some minor feedback that you can fix before committing.


public PRowImpl(KeyValueBuilder kvBuilder, ImmutableBytesWritable key, long ts, Integer bucketNum, boolean hasOnDupKey) {
PRowImpl(KeyValueBuilder kvBuilder, ImmutableBytesWritable key, long ts, Integer bucketNum, boolean hasOnDupKey) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Why the change from public to protector?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gokceni This method does not require public scope hence I changed it to package-private

Mutation m = miniBatchOp.getOperation(i);
// There is at max 1 extra Put (for dynamic column shadow cells) per original Put
Put dynColShadowCellsPut = null;
if (m instanceof Put && !Bytes.equals(m.getAttribute(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Older clients won't have this attribute set on the mutation and it will enter this loop, IMO its better to invert this flag.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

ByteArrayOutputStream qual = new ByteArrayOutputStream();
qual.write(DYN_COLS_METADATA_CELL_QUALIFIER);
qual.write(dynCol.getColumnQualifierBytes());
if (LOG.isDebugEnabled()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might end up being a lot of logs, change it to the TRACE level.

Put dynColShadowCellsPut = null;
if (m instanceof Put && !Bytes.equals(m.getAttribute(
NO_DYNAMIC_COLUMN_METADATA_STORED_FOR_MUTATION), TRUE_BYTES)) {
NavigableMap<byte[], List<Cell>> famCellMap = m.getFamilyCellMap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a debug log line here?

@ChinmaySKulkarni ChinmaySKulkarni force-pushed the PHOENIX-374 branch 2 times, most recently from 27a5f0d to 176b059 Compare February 5, 2019 02:26
@ChinmaySKulkarni
Copy link
Contributor Author

Committed to master, 4.x-HBase-1.3 and 4.x-HBase-1.4. Thanks for the review @twdsilva and @gokceni !

@ChinmaySKulkarni ChinmaySKulkarni deleted the PHOENIX-374 branch February 7, 2019 23:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants