[LIVY-502] Remove dependency on hive-exec #117
Conversation
Codecov Report
@@ Coverage Diff @@
## master #117 +/- ##
=============================================
+ Coverage 27.02% 68.38% +41.36%
- Complexity 330 883 +553
=============================================
Files 100 100
Lines 5518 5583 +65
Branches 832 832
=============================================
+ Hits 1491 3818 +2327
+ Misses 3706 1227 -2479
- Partials 321 538 +217
Continue to review full report at Codecov.
I left a few comments, but most are things that are better looked at later. I think as a first step getting the Hive code in, with the minimal amount of changes necessary, is better.
I need a second look, but overall take another look at whether the things you're forking really need to be forked. Having a more verbose PR description explaining what's being forked would also be good.
if (opState.isTerminal) {
// Cancel should be a no-op
// Cancel should be a no-op in either cases
either case
schema.addToFieldSchemas(fieldSchema)
schema
}
val LOG_SCHEMA: Array[DataType] = Array(BasicDataType("string"))
Since you're touching this, indentation looks wrong.
}

def isAllowedToUse(user: String, session: InteractiveSession): Boolean = {
session.owner == user || accessManager.checkModifyPermissions(user)
}

override def stop(): Unit = {
info("Shutting down HiveServer2")
HiveServer2?
* This class is derived from Hive's one.
*/
private[auth] class TUGIAssumingTransportFactory(
    val wrapped: TTransportFactory, val ugi: UserGroupInformation) extends TTransportFactory {
One arg per line.
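For readers unfamiliar with the style being requested, a self-contained sketch of the "one arg per line" layout follows. TTransportFactory and UserGroupInformation are local placeholder classes here so the snippet runs without Thrift or Hadoop on the classpath; the real classes come from those libraries.

```scala
// Placeholders standing in for the Thrift and Hadoop classes.
class TTransportFactory
class UserGroupInformation

// One constructor argument per line, as the review suggests:
class TUGIAssumingTransportFactory(
    val wrapped: TTransportFactory,
    val ugi: UserGroupInformation)
  extends TTransportFactory

object StyleDemo {
  def main(args: Array[String]): Unit = {
    val factory = new TUGIAssumingTransportFactory(
      new TTransportFactory,
      new UserGroupInformation)
    assert(factory.wrapped != null && factory.ugi != null)
    println("ok")
  }
}
```

This layout keeps each parameter visible in diffs when arguments are added or removed.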
/**
* CallbackHandler for SASL DIGEST-MD5 mechanism.
*/
// This code is pretty much completely based on Hadoop's SaslRpcServer.SaslDigestCallbackHandler -
Put this inside the javadoc comment?
}
} catch {
case eL: IllegalStateException =>
throw new SaslException ("Invalid message format", eL)
nit: no space before (
(also in a few other places here and in other classes)
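For illustration, a minimal sketch of the spacing convention the nit refers to. RuntimeException stands in for SaslException so the snippet runs without Thrift on the classpath:

```scala
object SpacingDemo {
  def main(args: Array[String]): Unit = {
    val cause = new IllegalStateException("bad state")
    // Discouraged: a space before the argument list, e.g.
    //   throw new SaslException ("Invalid message format", eL)
    // Preferred: no space between the name and the parenthesis:
    val wrapped = new RuntimeException("Invalid message format", cause)
    assert(wrapped.getCause eq cause)
    println("ok")
  }
}
```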
* If the proxy user name is provided then check privileges to substitute the user.
*/
@throws[HiveSQLException]
private def getProxyUser(
This is another area where I think things should eventually be merged with the main server code, e.g. checking AccessManager.checkImpersonation instead of the custom configs + verifyProxyAccess being used here.
I agree. As before, I'd do this in a separate PR in order to focus carefully on how to change these parts, since they are sensitive.
* This class is ported from Hive. We cannot reuse Hive's one because we need to use the
* `LivyCLIService`, `LivyConf` and `AuthFactory` instead of Hive's one.
*/
class ThriftHttpCLIService(
One thing that might be nice is to just mount this under a namespace in the main Livy HTTP server. But not sure whether Hive clients support that (or require the server to be on "/").
I am not sure about this. Here we have a dedicated thread pool associated with this service, separate from Livy's HTTP server, so I am not sure it is feasible. We'd need to try.
* serviceUGI, which GSS-API will extract information from.
* In case of a SPNego request we use the httpUGI, for the authenticating service tickets.
*/
private def doKerberosAuth(request: HttpServletRequest): String = {
It would be nice at some point to check whether the hadoop-auth module (which provides a server filter, and is used by the main server) can replace this code.
Mmh... IIRC there was no such filter that does exactly what is done here; we might have to adapt/extend one. But if we cannot just reuse it, I am not sure it is worth switching.
* GetTypeInfoOperation.
*
*/
class GetTypeInfoOperation(sessionHandle: SessionHandle)
Any specific reason why you have to fork this class, and others in this package?
Not sure about your question:
- If the question is: what do we need them for? They are used when answering metadata calls from the JDBC driver;
- If the question is: can't we reuse Hive's? No, because Hive's classes use HiveSession and HiveConf, and we need to get rid of them.
Let me know if this doesn't answer.
I missed the HiveConf dependency when looking at the Hive code. Well, sigh.
Thanks for the review @vanzin. I have updated the PR description. Let me know if it still needs to be improved.
I am not sure, as the only difference from that was moving from Java to Scala, which saves many back-and-forth conversions of collections that would otherwise add code only to handle them.
I am pretty sure that nothing which is there is unneeded.
@jerryshao @vanzin any more comments about this?
I think this is mostly ok. I still think there's stuff to clean up here, but that can be done later. Main thing is getting rid of hive-exec and other sources of conflicts...
import org.apache.hive.service.cli.operation.Operation
import org.apache.hadoop.security.{SecurityUtil, UserGroupInformation}
import org.apache.hive.service.ServiceException
import org.apache.hive.service.cli.{FetchOrientation, FetchType, GetInfoType, GetInfoValue, HiveSQLException, OperationHandle, SessionHandle}
Just use a wildcard for long import lists...
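In Scala, the wildcard form the reviewer suggests is `_`, which imports every member of a package or object with one clause. A self-contained sketch (a local object stands in for the `org.apache.hive.service.cli` package so it runs standalone):

```scala
// Instead of enumerating each class:
//   import org.apache.hive.service.cli.{FetchOrientation, FetchType, ...}
// a wildcard imports everything at once:
//   import org.apache.hive.service.cli._

// Local stand-in for the package, to keep the demo self-contained:
object cli {
  val FetchOrientation = "FETCH_NEXT"
  val FetchType = "QUERY_OUTPUT"
}

object ImportDemo {
  import cli._ // one import brings every member into scope
  def main(args: Array[String]): Unit = {
    assert(FetchOrientation == "FETCH_NEXT")
    assert(FetchType == "QUERY_OUTPUT")
    println("ok")
  }
}
```

The trade-off is that wildcards make it harder to see where a name comes from, which is why some style guides reserve them for long import lists like this one.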
// The following configs are the same present in Hive
val THRIFT_RESULTSET_DEFAULT_FETCH_SIZE =
Entry("livy.server.thrift.resultset.default.fetch.size", 1000)
val THRIFT_SPNEGO_PRINCIPAL = Entry("livy.server.thrift.authentication.spnego.principal", "")
Won't this be exactly the same as AUTH_KERBEROS_PRINCIPAL? The SPNEGO principal must be HTTP, so it's not like they can be different. Which also means the keytab config will have to match, too.
Yes, I thought about that too but I was not sure, so I kept this. Let me remove it and replace it with AUTH_KERBEROS_PRINCIPAL then. Thanks.
}
addService(thriftCLIService)
super.init(livyConf)
Runtime.getRuntime.addShutdownHook(new Thread("LivyThriftServer Shutdown") {
Is this necessary? Seems like if needed, it's something that the main server class should handle (and then shut down all loaded plugins).
Yes, I moved it to the main server class, thanks.
* The class is taken from Hive's `HadoopThriftAuthBridge.Server`. It bridges Thrift's SASL
* transports to Hadoop's SASL callback handlers and authentication classes.
*
* This class is based on Hive's one.
Redundant comment.
// TODO: add LDAP and PAM when supported
val isAuthOther = authTypeStr.equalsIgnoreCase(AuthTypes.NONE.getAuthName) ||
authTypeStr.equalsIgnoreCase(AuthTypes.CUSTOM.getAuthName)
too many empty lines
import org.apache.livy.thriftserver.SessionInfo
import org.apache.livy.thriftserver.auth.{AuthenticationProvider, AuthFactory}

class ThriftHttpServlet(
Comment about where this comes from? (Even if it's similar to the other classes above.)
import org.apache.livy.thriftserver.types.{BasicDataType, Field, Schema}

/**
* GetCatalogsOperation.
Not a super useful doc.
import org.apache.livy.thriftserver.types.{BasicDataType, Field, Schema}

/**
* GetTableTypesOperation.
Same.
caseSensitive: Boolean, searchable: Short, unsignedAttribute: Boolean, numPrecRadix: Option[Int])

/**
* GetTypeInfoOperation.
Same.
* GetTypeInfoOperation.
*
*/
class GetTypeInfoOperation(sessionHandle: SessionHandle)
I missed the HiveConf dependency when looking at the Hive code. Well, sigh.
Thanks for the review @vanzin! I addressed your comments.
Not sure what you mean here exactly, but I think that after this patch we may start working with the Hive community in order to make some things easier to re-use. E.g. we cannot reuse all the auth classes because they depend on the
Merging to master.
What changes were proposed in this pull request?
This PR removes the dependency on hive-exec. The only Hive modules used after this PR are hive-service-rpc and hive-service, which drastically reduces the number of JARs needed by the thriftserver module. The PR takes the Hive classes we were using and adapts them where necessary (or simplifies them when we don't need something) in order to work in the Livy thriftserver.
Most of the classes are just migrated, replacing all occurrences of HiveConf, HiveSession and other Hive-specific classes. Only one class has quite different logic from Hive's: AuthFactory, as we use different UGI handling from Hive (we are not running the TS in a standalone JVM). The functionalities we are taking from Hive are:
- the classes in the cli package, i.e. the classes handling lower-level details of the communication with the client;
- the classes in the auth package, which are used by the ones in cli; this PR leaves LDAP and PAM out of scope, as they are not trivial to port and not needed for a working solution. We can add them later;
- the classes in the operation package, which are used to answer metadata queries from the JDBC driver.
package which are used to answer to metadata queries by the JDBC driver.How was this patch tested?
Existing UTs + manual tests.