APEXMALHAR-2515 Operator maturity - HBase output operator Multi Table feature. #638

prasannapramod · 2017-06-29T20:25:49Z

Implemented HBase output operator multi-table insertion feature.

@venkateshkottapalli @tushargosavi @sanjaypujare @PramodSSImmaneni please see.

sanjaypujare · 2017-06-30T03:40:14Z

contrib/src/main/java/com/datatorrent/contrib/hbase/AbstractHBaseAppendOutputOperator.java

 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;

+import org.apache.hadoop.hbase.client.Append;


Why did the import order change? Is the new order correct as per the import-order guideline or was the old one better?

In the Apex checkstyle configuration, org is listed before org.apache, and the IDE ordered the imports accordingly. Maybe the older imports predate the checkstyle configuration.

Hmmm, does that mean both the old and the new orders are checkstyle compliant?

sanjaypujare · 2017-06-30T17:25:13Z

contrib/src/main/java/com/datatorrent/contrib/hbase/OutputAdapter.java

+  private static final Logger logger = LoggerFactory.getLogger(OutputAdapter.class);
+
+  HBaseStore store;
+  OutputAdapter.Driver driver;


You don't need the qualifier OutputAdapter. since Driver is nested in this class.

Also, can we get rid of warnings like the one below?

OutputAdapter.Driver is a raw type. References to generic type OutputAdapter.Driver should be parameterized
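As a rough illustration of the two points above, here is a minimal sketch of the field declared with its type argument and without the outer-class qualifier. RawTypeSketch and its trimmed-down Driver are hypothetical stand-ins, not the PR's actual OutputAdapter; only the shape of the field declaration is meant to carry over.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the PR's OutputAdapter; only the field declaration matters here.
class RawTypeSketch<T>
{
  // Hypothetical, trimmed-down version of the nested Driver interface.
  interface Driver<T>
  {
    void processTuple(T tuple);
  }

  // Raw, qualified form that triggers the warning:  OutputAdapter.Driver driver;
  // Parameterized form, no outer-class qualifier needed inside the class:
  private final Driver<T> driver;

  RawTypeSketch(Driver<T> driver)
  {
    this.driver = driver;
  }

  void processTuple(T tuple)
  {
    driver.processTuple(tuple);
  }

  public static void main(String[] args)
  {
    List<String> seen = new ArrayList<>();
    Driver<String> recorder = seen::add;
    RawTypeSketch<String> adapter = new RawTypeSketch<>(recorder);
    adapter.processTuple("row-1");
    System.out.println(seen); // prints [row-1]
  }
}
```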

sanjaypujare · 2017-06-30T22:32:49Z

contrib/src/main/java/com/datatorrent/contrib/hbase/OutputAdapter.java

+    }
+  }
+
+  interface Driver<T>


The Driver always contains the store since Driver is implemented by most HBase operators you have modified. Why can't we make this more modular/OO by encapsulating the store also in the Driver? In this case Driver becomes an abstract class as follows:

    abstract class Driver<T>
    {
      HBaseStore store;

      Driver(HBaseStore store)
      {
        this.store = store;
      }

      void processTuple(T tuple)
      {
        String tableName = getTableName(tuple);
        HTable table = store.getTable(tableName);
        if (table == null) {
          logger.debug("No table found for tuple {}", tuple);
          errorTuple(tuple);
          return;
        }
        processTuple(tuple, table);
      }

      abstract void processTuple(T tuple, HTable table);

      abstract String getTableName(T tuple);

      abstract void errorTuple(T tuple);
    }

Then OutputAdapter.processTuple(T tuple) simply calls driver.processTuple(tuple), since the logic has moved into the latter (where it belongs). This also makes it easy to reimplement the batch/window modes that have currently been removed from those classes.
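To make the proposal concrete, here is a self-contained sketch of a Driver that owns the store and does the table lookup itself. FakeStore and FakeTable are hypothetical stand-ins for HBaseStore and HTable so the example runs without HBase on the classpath, and the "table:value" tuple format in PrefixDriver is an assumption made only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DriverSketch
{
  // Hypothetical stand-in for an HBase table handle.
  static class FakeTable
  {
    final List<String> rows = new ArrayList<>();
  }

  // Hypothetical stand-in for HBaseStore: maps table names to handles.
  static class FakeStore
  {
    final Map<String, FakeTable> tables = new HashMap<>();

    FakeTable getTable(String name)
    {
      return tables.get(name); // null when the table is unknown
    }
  }

  // The Driver encapsulates the store and the lookup/error routing.
  abstract static class Driver<T>
  {
    final FakeStore store;

    Driver(FakeStore store)
    {
      this.store = store;
    }

    void processTuple(T tuple)
    {
      FakeTable table = store.getTable(getTableName(tuple));
      if (table == null) {
        errorTuple(tuple);
        return;
      }
      processTuple(tuple, table);
    }

    abstract void processTuple(T tuple, FakeTable table);

    abstract String getTableName(T tuple);

    abstract void errorTuple(T tuple);
  }

  // Assumed tuple format: "<table>:<value>", always containing a colon.
  static class PrefixDriver extends Driver<String>
  {
    final List<String> errors = new ArrayList<>();

    PrefixDriver(FakeStore store)
    {
      super(store);
    }

    @Override
    void processTuple(String tuple, FakeTable table)
    {
      table.rows.add(tuple.substring(tuple.indexOf(':') + 1));
    }

    @Override
    String getTableName(String tuple)
    {
      return tuple.substring(0, tuple.indexOf(':'));
    }

    @Override
    void errorTuple(String tuple)
    {
      errors.add(tuple);
    }
  }

  public static void main(String[] args)
  {
    FakeStore store = new FakeStore();
    store.tables.put("users", new FakeTable());
    PrefixDriver driver = new PrefixDriver(store);
    driver.processTuple("users:alice"); // table exists
    driver.processTuple("orders:42");   // no such table, routed to errorTuple
    System.out.println(store.getTable("users").rows); // prints [alice]
    System.out.println(driver.errors);                // prints [orders:42]
  }
}
```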

The problem is that the AbstractHBaseOutputOperator and AbstractHBaseWindowOutputOperator series of operators belong to two different hierarchies, both originating from the generic store operators, which are a common framework across different connectors such as jdbc, couch, memcache etc. The encapsulation was a way to reuse the commonly needed functionality without sacrificing the hierarchy or backward compatibility.
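A rough sketch of that trade-off, under stated assumptions: PerTupleBase and WindowedBase below are hypothetical stand-ins for the two generic store-operator hierarchies, and Adapter plays the role of OutputAdapter. Neither base class is modified; each concrete operator holds an adapter and delegates to it. For brevity a shared Recorder implements Driver here, whereas in the PR the operators themselves implement it.

```java
import java.util.ArrayList;
import java.util.List;

class CompositionSketch
{
  // Hypothetical, simplified Driver contract.
  interface Driver<T>
  {
    String getTableName(T tuple);

    void write(T tuple, String tableName);
  }

  // Shared logic lives in the adapter rather than in a common superclass.
  static class Adapter<T>
  {
    private final Driver<T> driver;

    Adapter(Driver<T> driver)
    {
      this.driver = driver;
    }

    void processTuple(T tuple)
    {
      driver.write(tuple, driver.getTableName(tuple));
    }
  }

  // Stand-in for hierarchy 1: handles each tuple as it arrives.
  abstract static class PerTupleBase<T>
  {
    abstract void processTuple(T tuple);
  }

  // Stand-in for hierarchy 2: buffers tuples and flushes at end of window.
  abstract static class WindowedBase<T>
  {
    private final List<T> buffer = new ArrayList<>();

    void processTuple(T tuple)
    {
      buffer.add(tuple);
    }

    void endWindow()
    {
      for (T tuple : buffer) {
        storeTuple(tuple);
      }
      buffer.clear();
    }

    abstract void storeTuple(T tuple);
  }

  // Records writes instead of touching a real table.
  static class Recorder implements Driver<String>
  {
    final List<String> writes = new ArrayList<>();

    @Override
    public String getTableName(String tuple)
    {
      return tuple.startsWith("u") ? "users" : "orders";
    }

    @Override
    public void write(String tuple, String tableName)
    {
      writes.add(tableName + "<-" + tuple);
    }
  }

  static class PerTupleOp extends PerTupleBase<String>
  {
    final Recorder recorder = new Recorder();
    private final Adapter<String> adapter = new Adapter<>(recorder);

    @Override
    void processTuple(String tuple)
    {
      adapter.processTuple(tuple);
    }
  }

  static class WindowedOp extends WindowedBase<String>
  {
    final Recorder recorder = new Recorder();
    private final Adapter<String> adapter = new Adapter<>(recorder);

    @Override
    void storeTuple(String tuple)
    {
      adapter.processTuple(tuple);
    }
  }

  public static void main(String[] args)
  {
    PerTupleOp perTuple = new PerTupleOp();
    perTuple.processTuple("u1");
    System.out.println(perTuple.recorder.writes); // prints [users<-u1]

    WindowedOp windowed = new WindowedOp();
    windowed.processTuple("o1"); // only buffered so far
    windowed.endWindow();        // flushed through the same adapter
    System.out.println(windowed.recorder.writes); // prints [orders<-o1]
  }
}
```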

sanjaypujare · 2017-06-30T22:33:48Z

contrib/src/main/java/com/datatorrent/contrib/hbase/OutputAdapter.java

+    this.driver = driver;
+  }
+
+  public void processTuple(T tuple)


See the comment on Driver<T>.

sanjaypujare · 2017-06-30T22:53:45Z

contrib/src/main/java/com/datatorrent/contrib/hbase/AbstractHBaseOutputOperator.java

+import com.datatorrent.lib.db.AbstractStoreOutputOperator;
+
+/**
+ * Created by lakshmi on 6/27/17.


More Javadoc to describe this class?

I don't see any Javadoc for this class yet

sanjaypujare · 2017-06-30T23:05:13Z