@alanfgates
Contributor


import java.util.List;

public class SerDeStorageSchemaReader implements StorageSchemaReader {
Contributor

Where is this used?

Contributor Author

HiveMetaStore.get_fields_with_environment_context(). Previously, this method used the SerDe to read the fields from the SerDe parameters rather than from the storage descriptor fields. However, standalone-metastore doesn't have access to the SerDes. We haven't yet settled on a solution for this; there's a debate raging on HIVE-17714 about the right way forward. For now I've created the StorageSchemaReader interface to pull the SerDe dependency out. There is a default implementation, usable by the metastore in standalone mode, that just fails if it's called. SerDeStorageSchemaReader is the implementation for use with Hive, and it works as before.
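
A minimal sketch of the arrangement described above, for readers following along. Everything here is a simplified stand-in: the type names ending in `Sketch`, the `readSchema` method name and signature, the exception message, and the use of a comma-separated `columns` parameter are assumptions made for illustration, not the actual classes in this patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a metastore table: the columns recorded in the
// storage descriptor plus the SerDe parameters. Not the real Table class.
final class TableSketch {
    final List<String> storageDescriptorColumns;
    final Map<String, String> serdeParameters;

    TableSketch(List<String> storageDescriptorColumns, Map<String, String> serdeParameters) {
        this.storageDescriptorColumns = storageDescriptorColumns;
        this.serdeParameters = serdeParameters;
    }
}

// The seam: the metastore code depends only on this interface, so it no
// longer needs the SerDe classes on its classpath.
interface SchemaReaderSketch {
    List<String> readSchema(TableSketch table);
}

// Default for standalone mode (assumed behavior): fail loudly if called,
// because no SerDe is available to answer the question.
final class FailingSchemaReaderSketch implements SchemaReaderSketch {
    @Override
    public List<String> readSchema(TableSketch table) {
        throw new UnsupportedOperationException("Storage schema reading not supported");
    }
}

// Hive-side implementation: derives the fields from the SerDe parameters
// rather than from the storage descriptor. A real implementation would
// instantiate the SerDe; this sketch just parses a "columns" parameter.
final class SerDeSchemaReaderSketch implements SchemaReaderSketch {
    @Override
    public List<String> readSchema(TableSketch table) {
        String columns = table.serdeParameters.getOrDefault("columns", "");
        List<String> fields = new ArrayList<>();
        for (String name : columns.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                fields.add(trimmed);
            }
        }
        return Collections.unmodifiableList(fields);
    }
}
```

With this shape, which implementation is used becomes a deployment choice: Hive would plug in the SerDe-backed reader, while a standalone metastore would keep the failing default until the discussion on HIVE-17714 settles.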

@@ -1,4 +1,4 @@
/**
Contributor

This should remain part of Hive. The repl dump command is implemented in Hive, so the cleanup for it should also run in a thread whose implementation is in Hive.
I guess this falls into the same bucket as the compactor thread. What approach is being followed for that?

Contributor Author

For now the compactor is in Hive because it's tightly entwined in ql, but I plan to move it someday (though Eugene doesn't agree with me on that yet :) ). The compactor really needs to move because without it you cannot operate on ACID files.

For this, I'm fine with leaving it in Hive. I've wondered in general if the repl stuff should come along, though parts already have because of its tight integration into HiveMetaStore.

Contributor

Replication can be considered to have two parts: metastore event logging (DbNotificationListener) and repl dump/load.
In my opinion, event logging belongs in the metastore. The repl load/dump commands (they are SQL commands that generate query plans) should be in Hive. So the associated code, such as cleanup of the repl dump directory, should also be in Hive.

Contributor Author

Makes sense. I can move this class back into metastore.

@asfgit asfgit closed this in 8fcc7f3 Nov 21, 2017
b-slim pushed a commit to b-slim/hive that referenced this pull request Feb 11, 2018
…es, reviewed by Thejas Nair).

(cherry picked from commit 8fcc7f3)

Change-Id: I79bd26840eafe792595ceaaa35abe82b15178dab
SirOibaf pushed a commit to SirOibaf/hive that referenced this pull request Apr 10, 2019