Please sign in to comment.
First commit to run the random query generator on Hive.
With this change, random query generator can run continuously on Hive and approximately half of its generated queries are able to run. 1. Connect timeout from Impyla to HS2 was too small, increasing it to match Impala's. 2. Query timeout to wait for Hive queries was too short, making it configurable so we can play with different values. 3. Hive does not support 'with' clause in subquery, but interestingly supports it at the top-level. Added a profile flag "use_nested_with" to disable nested with's. 4. Hive does not support 'having' without 'group by'. Added a profile flag "use_having_without_groupby" to always generate a group by with having. 5. Hive does not support "interval" keyword for timestamp. Added a profile 'restrict' list to restrict certain functions, and added 'dateAdd' to this list for Hive. 6. Hive 'greatest' and 'least' UDF's do not do implicit type casting like other databases. Modified the query-generator to only choose args of the same type for these, and for HiveSqlWriter to add a cast as there were still some lingering issues like udf's on int returning bigint. 7. Hive always orders the Nulls first in ORDER BY ASC, opposite to other databases, and does not have any 'NULLS FIRST' or 'NULLS LAST' option. Thus the only workaround is to add a "nulls_order_asc" flag to the profile, and pass it in to the ref database's SqlWriter to generate the 'NULLS FIRST' or 'NULLS LAST' statement on that end. 8. Hive strangely does not support multiple sort keys in a window without frame specification. The workaround is for HiveSqlWriter to add 'rows unbounded preceding' to specify the default frame if there are no existing frames. Change-Id: I2a5b07e37378f695de1b50af49845283468b4f0f Reviewed-on: http://gerrit.cloudera.org:8080/619 Reviewed-by: Casey Ching <firstname.lastname@example.org> Tested-by: Internal Jenkins
- Loading branch information...
Showing with 186 additions and 17 deletions.