Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HIVE-22488 Break up DDLSemanticAnalyzer - extract Table creation analyzers #843

Closed
wants to merge 1 commit into from

Conversation

miklosgergely
Copy link
Contributor

DDLSemanticAnalyzer is a huge class, more than 4000 lines long. The goal is to refactor it in order to have everything cut into more handleable classes under the package org.apache.hadoop.hive.ql.exec.ddl:

have a separate class for each analyzers
have a package for each operation, containing an analyzer, a description, and an operation, so the amount of classes under a package is more manageable
Step #9: extract the table creationanalyzers from DDLSemanticAnalyzer, and move them under the new package.

@@ -16,7 +16,7 @@
* limitations under the License.
*/

package org.apache.hadoop.hive.ql.ddl.table.creation;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of org.apache.hadoop.hive.ql.ddl.table.creation.create or org.apache.hadoop.hive.ql.ddl.table.creation.drop, isn't it more natural to have org.apache.hadoop.hive.ql.ddl.table.create or org.apache.hadoop.hive.ql.ddl.table.drop?

Copy link
Contributor Author

@miklosgergely miklosgergely Nov 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I choose this package structure to have a balanced structure for the table related operations. There are 46 of them, and having a package for each of them would mean an endless list of packages, this is why they were put into packages by categories.

Yet I understand your point, if someone is looking at an import it seems odd. One way to solve is that you suggested, or we could replace the "creation" with something else, like "basic", or "existence", or if you have a better suggestion :) This way we'll have a more balanced structure, where there is the main category (table), the sub category (some label), and then the actual command. But if you believe that having the undoubtedly most straightforward name that you've suggested is more important than the balanced package structure, I'm totally fine with that too, I think the most important thing is to break up the DDLSemanticAnalyzer here, and this question is secondary.

Please let me know what you think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I believe having create or drop packages within table package, even if we have them next to column, constraint, etc., is more intuitive. It seems the additional subpackage does not provide clarity, e.g., creation.drop or creation.create is confusing. Also the creation package seem small enough already, hence another option is that these classes go directly to the table package. I believe there are many options, right now it is a matter of taste... I would just avoid redundant packages.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've updated the package structure, now the 4 operations are at

org.apache.hadoop.hive.ql.ddl.table.create
org.apache.hadoop.hive.ql.ddl.table.create.like
org.apache.hadoop.hive.ql.ddl.table.create.show
org.apache.hadoop.hive.ql.ddl.table.drop

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants