kafka-connect-hdfs upon thrift server, instead of hive metastore #116

Open
lakeofsand opened this issue Sep 14, 2016 · 7 comments

@lakeofsand

In some Spark clusters there is no Hive metastore deployed, only a Thrift server running on top of the Spark engine.
We should consider supporting kafka-connect-hdfs in this scenario.

I tried modifying it locally; with not much change it works well.
(So far, though, handling schema changes is a little difficult.)

@cotedm

cotedm commented Jan 9, 2017

@lakeofsand I believe this enhancement proposal is now obsolete given that we have the JDBC Sink Connector that can do this directly. Feel free to reopen if you are talking about something other than the thrift server for Spark

@cotedm cotedm closed this as completed Jan 9, 2017
@lakeofsand
Author

It is not exactly the same as the "JDBC Sink connector".
With the "HDFS Sink connector" we also need a Hive metastore service for sync-with-hive when a new partition's data comes in.

It needs to support sync-with-hive through the Spark Thrift server, not the Hive metastore service.
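
For illustration only, this is roughly the DDL the sync step would have to issue when a new partition's data arrives, sent through the Spark Thrift server's HiveServer2-compatible JDBC endpoint instead of the metastore API. Host, port, table name and partition values below are placeholders, not the connector's actual code.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Sketch: register a new partition via plain JDBC against the Spark Thrift
// server (HiveServer2 protocol); requires the Hive JDBC driver on the classpath.
public class AddPartitionOverThrift {
    public static void main(String[] args) throws Exception {
        try (Connection conn =
                 DriverManager.getConnection("jdbc:hive2://thrift-host:10000/default");
             Statement stmt = conn.createStatement()) {
            stmt.execute("ALTER TABLE my_topic ADD IF NOT EXISTS "
                + "PARTITION (partition=0) LOCATION '/topics/my_topic/partition=0'");
        }
    }
}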

@cotedm

cotedm commented Jan 10, 2017

@lakeofsand the spark thrift server is akin to the hiveserver2 implementation and as such has no state to sync; see http://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbcodbc-server

I'm not sure what the current implementation is lacking, but if you can lay out an example then that would be helpful.

@lakeofsand
Author

Sorry for my poor explanation...

Let me put it this way:

Right now 'kafka-connect-hdfs' uses the class 'HiveMetastore' to perform Hive actions, for example adding partitions when new data comes in. It relies on 'org.apache.hadoop.hive.metastore.*' and needs a Hive metastore service in the cluster.

In our Spark 1.6 cluster there is no Hive metastore service. We would have to deploy a new one just for 'kafka-connect-hdfs'. That is not worth it and too heavyweight.

So we added a thin implementation, 'Hive2Thrift', built only on 'java.sql.*'. It can do the same things, but it only needs the standard 'java.sql.*' package and a Spark Thrift server.

I am not an expert, but in our Spark cluster it really isn't worth deploying a heavyweight Hive metastore service.
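
A minimal sketch of what such a thin 'Hive2Thrift' helper built only on java.sql might look like; the class shape, constructor arguments and JDBC URL here are illustrative assumptions, not the exact local implementation:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch: a thin DDL executor talking to the Spark Thrift server over its
// HiveServer2-compatible JDBC endpoint, with no Hive metastore client involved.
public class Hive2Thrift implements AutoCloseable {
    private final Connection connection;

    // e.g. jdbcUrl = "jdbc:hive2://thrift-host:10000/default"
    public Hive2Thrift(String jdbcUrl, String user, String password) throws SQLException {
        this.connection = DriverManager.getConnection(jdbcUrl, user, password);
    }

    // Run a single DDL statement (CREATE DATABASE / CREATE TABLE / ALTER TABLE ...).
    public void execute(String ddl) throws SQLException {
        try (Statement stmt = connection.createStatement()) {
            stmt.execute(ddl);
        }
    }

    @Override
    public void close() throws SQLException {
        connection.close();
    }
}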

@cotedm

cotedm commented Jan 11, 2017

@lakeofsand so are you suggesting an architectural change here to remove the HiveMetastore dependency of the connector for those HDFS instances that have no Hive service associated with them? I'll reopen this but I think we need more details here because that's a pretty non-trivial change.

@cotedm cotedm reopened this Jan 11, 2017
@lakeofsand
Author

Maybe there is no need for an 'architectural change'.
In our local implementation, we just extend a class named 'ThriftUtil' from HiveUtil (io.confluent.connect.hdfs.hive), like:
public class ThriftUtil extends HiveUtil {
...
    @Override
    public void createTable(String database, String tableName, Schema schema, Partitioner partitioner) throws Hive2ThriftException {
        StringBuilder createDBDDL = new StringBuilder();
        String createTableDDL;

        createDBDDL.append("CREATE DATABASE IF NOT EXISTS ").append(database);
        hive2Thrift.execute(createDBDDL.toString());

        createTableDDL = getCreateTableDDL(database, tableName, schema, partitioner, this.lifeCycle);
        log.debug("create table ddl {}", createTableDDL);
        hive2Thrift.execute(createTableDDL);
    }
...
}

@lakeofsand
Author

But I can't find an appropriate way to override 'alterSchema()'.
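
One possible (untested) direction, assuming HiveUtil exposes alterSchema(String database, String tableName, Schema schema) and the table was created as an Avro table, would be to push the schema change as DDL through the same hive2Thrift helper, for example by rewriting the avro.schema.literal table property. The avroData field (io.confluent.connect.avro.AvroData) and the quoting/escaping of the schema literal are glossed over here:

    // Sketch only: the method signature, avroData availability and the
    // TBLPROPERTIES-based approach are assumptions, not a tested override.
    @Override
    public void alterSchema(String database, String tableName, Schema schema) throws Hive2ThriftException {
        String avroSchema = avroData.fromConnectSchema(schema).toString();
        String alterSchemaDDL = "ALTER TABLE " + database + "." + tableName
            + " SET TBLPROPERTIES ('avro.schema.literal'='" + avroSchema + "')";
        log.debug("alter schema ddl {}", alterSchemaDDL);
        hive2Thrift.execute(alterSchemaDDL);
    }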
