Resolved: TableCatalog doesn't support multiple columns from multiple column families #45

Closed

wants to merge 4 commits into from
Conversation

chetkhatri
Contributor

If the catalog maps multiple columns to a single column family (or to multiple column families), saving a DataFrame throws an exception. For example:

def empcatalog = s"""{
|"table":{"namespace":"empschema", "name":"emp"},
|"rowkey":"key",
|"columns":{
|"empNumber":{"cf":"rowkey", "col":"key", "type":"string"},
|"city":{"cf":"pdata", "col":"city", "type":"string"},
|"empName":{"cf":"pdata", "col":"name", "type":"string"},
|"jobDesignation":{"cf":"pdata", "col":"designation", "type":"string"},
|"salary":{"cf":"pdata", "col":"salary", "type":"string"}
|}
|}""".stripMargin

Here, the columns city, name, designation, and salary all belong to the pdata column family.
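For reference, the DataFrame write that triggers the failure looks roughly like the sketch below (hedged; the Employee case class, sample row, and region count are illustrative, and empcatalog is assumed to be the catalog string defined above):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

object SaveToHBaseSketch {
  // Illustrative schema matching the catalog columns above.
  case class Employee(empNumber: String, city: String, empName: String,
                      jobDesignation: String, salary: String)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("shc-repro").getOrCreate()
    import spark.implicits._

    val df = Seq(Employee("1", "Pune", "Chetan", "Engineer", "100")).toDF()

    // newTable -> "5" asks the connector to create the table (with 5 regions)
    // if it does not exist; this is the path through HBaseRelation.createTable
    // that fails with "Family 'pdata' already exists".
    df.write
      .options(Map(
        HBaseTableCatalog.tableCatalog -> empcatalog, // catalog string defined above
        HBaseTableCatalog.newTable -> "5"))
      .format("org.apache.spark.sql.execution.datasources.hbase")
      .save()
  }
}
```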

Exception while saving the DataFrame to HBase:

java.lang.IllegalArgumentException: Family 'pdata' already exists so cannot be added
at org.apache.hadoop.hbase.HTableDescriptor.addFamily(HTableDescriptor.java:827)
at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation$$anonfun$createTable$1.apply(HBaseRelation.scala:98)
at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation$$anonfun$createTable$1.apply(HBaseRelation.scala:95)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation.createTable(HBaseRelation.scala:95)
at org.apache.spark.sql.execution.datasources.hbase.DefaultSource.createRelation(HBaseRelation.scala:58)
at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:457)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)

The getColumnFamilies method in HBaseTableCatalog.scala returns duplicate column-family names, so HBaseRelation.createTable calls HTableDescriptor.addFamily more than once for the same family; it should return each family only once.
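A minimal, self-contained sketch of the idea, using a simplified stand-in for the catalog's column map (Field and the method body here are illustrative, not the actual SHC source):

```scala
object CatalogFamiliesSketch {
  // Simplified stand-in for a catalog column definition.
  final case class Field(cf: String, col: String)

  // getColumnFamilies should yield each real HBase column family exactly once,
  // skipping the "rowkey" pseudo-family used for row-key columns.
  def getColumnFamilies(columns: Map[String, Field]): Seq[String] =
    columns.values
      .map(_.cf)
      .filter(_ != "rowkey")
      .toSeq
      .distinct // without this, addFamily would be called repeatedly for "pdata"

  def main(args: Array[String]): Unit = {
    val columns = Map(
      "empNumber"      -> Field("rowkey", "key"),
      "city"           -> Field("pdata", "city"),
      "empName"        -> Field("pdata", "name"),
      "jobDesignation" -> Field("pdata", "designation"),
      "salary"         -> Field("pdata", "salary")
    )
    println(getColumnFamilies(columns)) // one entry per family: pdata
  }
}
```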

A unit test for this has been added at the writeCatalog definition in the HBaseTableCatalog test suite.
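A hedged illustration of the assertion such a test makes (the suite name and values here are illustrative, not the project's actual test; the real test exercises the HBaseTableCatalog parser itself):

```scala
import org.scalatest.FunSuite

class ColumnFamilyDedupSuite extends FunSuite {
  test("column families derived from the catalog contain no duplicates") {
    // city, name, designation and salary all map to "pdata" in empcatalog
    val families = Seq("pdata", "pdata", "pdata", "pdata")
    assert(families.distinct === Seq("pdata"))
  }
}
```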

@chetkhatri closed this Jan 26, 2017