Skip to content

Conversation

@OopsOutOfMemory
Copy link
Contributor

issue is here:
https://issues.apache.org/jira/browse/SPARK-5011

Currently, since I find a bug which block this PR:
issues: https://issues.apache.org/jira/browse/SPARK-5009 ,

Temporarily, I replace SERDEPROPERTIES with SERDEPROP, replace TBLPROPERTIES with TBLPROP.

After fix that bug above, I will rename them back.

And the final version will be like this, see below:

val hbaseDDL = s"""
      |CREATE TEMPORARY TABLE hbase_people(row_key string, name string, age int, job string)
      |USING com.shengli.spark.hbase
      |OPTIONS (
      |  someOptions 'abcdefg'
      |) 
      |WITH SERDEPROPERTIES (
      |  'hbase.columns.mapping'=':key, profile:name, profile:age, career:job'
      |)
      |TBLPROPERTIES (
      | 'hbase.table.name' = 'people'
      |)
""".stripMargin```

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@marmbrus
Copy link
Contributor

What is the rational behind this change? You already have options for passing key/value pairs to the library. Also, there is nothing called a SerDe in the external datasources API. Why not just pass them all as options.

@OopsOutOfMemory
Copy link
Contributor Author

Hi, @marmbrus
Why I want to make a change for this is original caused by the OPTIONS keyword.
Currently the OPTIONS is an ident but not a stringLit. And I don't want to break the design here, you can also passing path parameter to avro source. so I choose to add another one to make the property more flexible.

For external datasource, there are many parameters, they all follow a kind of naming format.
I refereed Hive HBase:

CREATE TABLE hbase_table_1(key int, value string) 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "xyz");

If I use OPTIONS to pass k/v pairs. It looks like hbase_table_name or you can define other name, but you can not define hbase.table.name which sepreated by . Like:

  val hbaseDDL = s"""
        |CREATE TEMPORARY TABLE hbase_people
        |USING com.shengli.spark.hbase
        |OPTIONS (
        |  sparksql_table_schema   '(row_key string, name string, age int, job string)',
        |  hbase_table_name    'people',
        |  hbase_table_schema '(:key , profile:name , profile:age , career:job )'
        |)""".stripMargin

The format like sparksql_table_schema which sepreated by _ is a little ugly to me.
Recommend Format
I prefer the options format like this:

   stringLit ~ "=" ~stringLit

Also,I think Users are get used to use these properties which sepreated by . , so stringLit is more make sence than indent.

hbase.zookeeper.sission.timeout
hbase.zookeeper.property.clientPort
hbase.table.name
hbase.master

There I lists some common parameter format:

Hive Cassandra:

hive> CREATE EXTERNAL TABLE MyHiveTable 
        ( key int,  data string ) 
        STORED BY 'org.apache.hadoop.hive.cassandra.cql3.CqlStorageHandler' 
        TBLPROPERTIES ( "cassandra.ks.name" = "cassandra_keyspace" , 
          "cassandra.cf.name" = "exampletable" );

Hive Elasticsearch:

CREATE EXTERNAL TABLE artists (...)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'radio/artists',
              'es.index.auto.create' = 'false') ;
  1. yes, SerDe is not in API, we can do it later if there is a need. but we can put parameters in to TBLPROPERTIES first like Cassandra.

Any suggestions : )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exceeds limitation of line length

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and need a space after //

@yhuai
Copy link
Contributor

yhuai commented Jan 9, 2015

I think that putting all options in OPTIONS is better. Will backticks help you (like hbase.table.name)?

@OopsOutOfMemory
Copy link
Contributor Author

@yhuai
It's ok to use backticks, currently options key/value pair defined as
ident ~ stringLit whic is not support backticks.

I suppose to use stringLit ~ stringLit
or
(indent | repsep(ident, ".")) ~ stringLit

Which is better?

@marmbrus
Copy link
Contributor

Are you sure? I believe that ident does support backticks.

@OopsOutOfMemory
Copy link
Contributor Author

Sorry, I did not know it before.
Thanks to point it out. I will close this. : )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants