Issues with Hive querying on MOR tables with no partitions #2801

Closed
aditiwari01 opened this issue Apr 10, 2021 · 3 comments

@aditiwari01
Contributor

I am unable to read data via Hive from either the _ro or the _rt table when my data is not partitioned.
Reading through the Spark API works fine.

Related write configs used:

PARTITIONPATH_FIELD_OPT_KEY -> "",
"hoodie.datasource.hive_sync.enable" -> "true",
"hoodie.datasource.hive_sync.partition_fields" -> "",
HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY -> classOf[NonPartitionedExtractor].getCanonicalName

Issue faced: a NullPointerException in getTableMetaClientForBasePath of class HoodieInputFormatUtils.

My take:

  public static HoodieTableMetaClient getTableMetaClientForBasePath(FileSystem fs, Path dataPath) throws IOException {
    LOG.info("Getting Table Meta Client from path: " + dataPath.toString());
    int levels = HoodieHiveUtils.DEFAULT_LEVELS_TO_BASEPATH;
    if (HoodiePartitionMetadata.hasPartitionMetadata(fs, dataPath)) {
      HoodiePartitionMetadata metadata = new HoodiePartitionMetadata(fs, dataPath);
      metadata.readFromFS();
      levels = metadata.getPartitionDepth();
    }
    Path baseDir = HoodieHiveUtils.getNthParent(dataPath, levels);
    LOG.info("Reading hoodie metadata from path " + baseDir.toString());
    return HoodieTableMetaClient.builder().setConf(fs.getConf()).setBasePath(baseDir.toString()).build();
  }

Here, if partition metadata is not available (as is the case for a non-partitioned table), levels stays at its default of 3, so the base path that gets computed is wrong.
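To make the effect of the default concrete, here is a minimal sketch of the parent-walking step. It assumes that HoodieHiveUtils.getNthParent simply calls getParent() n times. It uses java.nio.file.Path as a stand-in for Hadoop's Path so that it runs without Hudi on the classpath. The class name BasePathDemo, the helper nthParent, and the /warehouse/tbl paths are hypothetical and used only for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class BasePathDemo {
    // Hypothetical stand-in for HoodieHiveUtils.getNthParent: walk up n parents.
    // Returns null once the walk runs past the filesystem root.
    static Path nthParent(Path path, int n) {
        Path parent = path;
        for (int i = 0; i < n && parent != null; i++) {
            parent = parent.getParent();
        }
        return parent;
    }

    public static void main(String[] args) {
        // Partitioned layout: data files sit 3 levels below the table base path.
        Path partitioned = Paths.get("/warehouse/tbl/2021/04/10");
        // Non-partitioned layout: data files sit directly in the table base path.
        Path nonPartitioned = Paths.get("/warehouse/tbl");

        System.out.println(nthParent(partitioned, 3));    // walks up to the table base path
        System.out.println(nthParent(nonPartitioned, 3)); // walks past the root
        System.out.println(nthParent(nonPartitioned, 0)); // stays at the table base path
    }
}
```

Under these assumptions, walking 3 levels up from a non-partitioned table directory overshoots the base path, and for a shallow path it runs out of parents entirely. If the real helper behaves the same way, calling baseDir.toString() on that result would be one plausible source of the NullPointerException, though the thread does not confirm the exact line.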

@aditiwari01
Contributor Author

Update:

After setting the default to 0 for the non-partitioned case, I avoid the above error and get the table meta client correctly. However, a select * query on Hive now returns an empty dataset.
From the logs I can see: .HoodieInputFormatUtils: Total paths to process after hoodie filter 0.

@aditiwari01
Contributor Author

Please ignore the above comment. The empty result from Hive was due to me missing one of the configs.

Everything works as expected after changing the default value of HoodieHiveUtils.DEFAULT_LEVELS_TO_BASEPATH to 0 (current value 3), as explained in the first comment.

I think we should permanently change this to 0, since 3 looks like an arbitrary number. I have also confirmed that this default is not used anywhere other than the code snippet I mentioned.

@aditiwari01
Contributor Author

The issue was resolved after using the correct key generator class. Closing the issue.
