-
Notifications
You must be signed in to change notification settings - Fork 4.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HIVE-22488 Break up DDLSemanticAnalyzer - extract Table creation analyzers #843
Conversation
080c1a7
to
0c9e578
Compare
@@ -16,7 +16,7 @@ | |||
* limitations under the License. | |||
*/ | |||
|
|||
package org.apache.hadoop.hive.ql.ddl.table.creation; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of org.apache.hadoop.hive.ql.ddl.table.creation.create
or org.apache.hadoop.hive.ql.ddl.table.creation.drop
, isn't it more natural to have org.apache.hadoop.hive.ql.ddl.table.create
or org.apache.hadoop.hive.ql.ddl.table.drop
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I choose this package structure to have a balanced structure for the table related operations. There are 46 of them, and having a package for each of them would mean an endless list of packages, this is why they were put into packages by categories.
Yet I understand your point, if someone is looking at an import it seems odd. One way to solve is that you suggested, or we could replace the "creation" with something else, like "basic", or "existence", or if you have a better suggestion :) This way we'll have a more balanced structure, where there is the main category (table), the sub category (some label), and then the actual command. But if you believe that having the undoubtedly most straightforward name that you've suggested is more important than the balanced package structure, I'm totally fine with that too, I think the most important thing is to break up the DDLSemanticAnalyzer here, and this question is secondary.
Please let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I believe having create
or drop
packages within table
package, even if we have them next to column
, constraint
, etc., is more intuitive. It seems the additional subpackage does not provide clarity, e.g., creation.drop
or creation.create
is confusing. Also the creation
package seem small enough already, hence another option is that these classes go directly to the table
package. I believe there are many options, right now it is a matter of taste... I would just avoid redundant packages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I've updated the package structure, now the 4 operations are at
org.apache.hadoop.hive.ql.ddl.table.create
org.apache.hadoop.hive.ql.ddl.table.create.like
org.apache.hadoop.hive.ql.ddl.table.create.show
org.apache.hadoop.hive.ql.ddl.table.drop
0c9e578
to
22bcefd
Compare
22bcefd
to
42b133a
Compare
DDLSemanticAnalyzer is a huge class, more than 4000 lines long. The goal is to refactor it in order to have everything cut into more handleable classes under the package org.apache.hadoop.hive.ql.exec.ddl:
have a separate class for each analyzers
have a package for each operation, containing an analyzer, a description, and an operation, so the amount of classes under a package is more manageable
Step #9: extract the table creationanalyzers from DDLSemanticAnalyzer, and move them under the new package.