Resolved: TableCatalog doesn't support multiple columns from multiple column families #45

Closed

wants to merge 4 commits into from
Conversation

chetkhatri
Contributor

If the catalog maps multiple columns to a single column family (or to multiple column families), saving a DataFrame throws an exception. For example:

def empcatalog = s"""{
|"table":{"namespace":"empschema", "name":"emp"},
|"rowkey":"key",
|"columns":{
|"empNumber":{"cf":"rowkey", "col":"key", "type":"string"},
|"city":{"cf":"pdata", "col":"city", "type":"string"},
|"empName":{"cf":"pdata", "col":"name", "type":"string"},
|"jobDesignation":{"cf":"pdata", "col":"designation", "type":"string"},
|"salary":{"cf":"pdata", "col":"salary", "type":"string"}
|}
|}""".stripMargin

Here, the columns city, name, designation, and salary all belong to the pdata column family.
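For reference, the DataFrame write that triggers the failure looks roughly like the sketch below (hedged; the Employee case class, sample row, and region count are illustrative, and empcatalog is assumed to be the catalog string defined above):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

object SaveToHBaseSketch {
  // Illustrative schema matching the catalog columns above.
  case class Employee(empNumber: String, city: String, empName: String,
                      jobDesignation: String, salary: String)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("shc-repro").getOrCreate()
    import spark.implicits._

    val df = Seq(Employee("1", "Pune", "Chetan", "Engineer", "100")).toDF()

    // newTable -> "5" asks the connector to create the table (with 5 regions)
    // if it does not exist; this is the path through HBaseRelation.createTable
    // that fails with "Family 'pdata' already exists".
    df.write
      .options(Map(
        HBaseTableCatalog.tableCatalog -> empcatalog, // catalog string defined above
        HBaseTableCatalog.newTable -> "5"))
      .format("org.apache.spark.sql.execution.datasources.hbase")
      .save()
  }
}
```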

Exception while saving the DataFrame to HBase:

java.lang.IllegalArgumentException: Family 'pdata' already exists so cannot be added
at org.apache.hadoop.hbase.HTableDescriptor.addFamily(HTableDescriptor.java:827)
at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation$$anonfun$createTable$1.apply(HBaseRelation.scala:98)
at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation$$anonfun$createTable$1.apply(HBaseRelation.scala:95)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation.createTable(HBaseRelation.scala:95)
at org.apache.spark.sql.execution.datasources.hbase.DefaultSource.createRelation(HBaseRelation.scala:58)
at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:457)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)

The getColumnFamilies method in HBaseTableCatalog.scala returns duplicate column-family names, so HBaseRelation.createTable calls HTableDescriptor.addFamily more than once for the same family; it should return each family only once.
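A minimal, self-contained sketch of the idea, using a simplified stand-in for the catalog's column map (Field and the method body here are illustrative, not the actual SHC source):

```scala
object CatalogFamiliesSketch {
  // Simplified stand-in for a catalog column definition.
  final case class Field(cf: String, col: String)

  // getColumnFamilies should yield each real HBase column family exactly once,
  // skipping the "rowkey" pseudo-family used for row-key columns.
  def getColumnFamilies(columns: Map[String, Field]): Seq[String] =
    columns.values
      .map(_.cf)
      .filter(_ != "rowkey")
      .toSeq
      .distinct // without this, addFamily would be called repeatedly for "pdata"

  def main(args: Array[String]): Unit = {
    val columns = Map(
      "empNumber"      -> Field("rowkey", "key"),
      "city"           -> Field("pdata", "city"),
      "empName"        -> Field("pdata", "name"),
      "jobDesignation" -> Field("pdata", "designation"),
      "salary"         -> Field("pdata", "salary")
    )
    println(getColumnFamilies(columns)) // one entry per family: pdata
  }
}
```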

A unit test for this has been added at the writeCatalog definition in the HBaseTableCatalog test suite.
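A hedged illustration of the assertion such a test makes (the suite name and values here are illustrative, not the project's actual test; the real test exercises the HBaseTableCatalog parser itself):

```scala
import org.scalatest.FunSuite

class ColumnFamilyDedupSuite extends FunSuite {
  test("column families derived from the catalog contain no duplicates") {
    // city, name, designation and salary all map to "pdata" in empcatalog
    val families = Seq("pdata", "pdata", "pdata", "pdata")
    assert(families.distinct === Seq("pdata"))
  }
}
```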

@chetkhatri closed this Jan 26, 2017