-
https://cwiki.apache.org/confluence/display/INCUBATOR/New+Podling+Proposal is a helpful reference when drafting the proposal.
-
https://uspto.report/TM/90007867/ Hmm... I am not sure whether keeping the project name is a risk here. cc @jiwq
-
Greetings!
We are proposing to enter Kyuubi, an open-source distributed multi-tenant Thrift JDBC/ODBC server powered by Apache Spark, into incubation. Please see the proposal below.
Abstract
Kyuubi is a distributed multi-tenant Thrift JDBC/ODBC server for large-scale data management, processing, and analytics, built on top of Apache Spark and designed to support more engines (e.g., Flink). We aim to make Kyuubi an "out-of-the-box" tool for data warehouses and data lakes.
Proposal
Kyuubi provides a pure SQL gateway through a Thrift JDBC/ODBC interface, letting end-users manipulate large-scale data with pre-programmed and extensible Spark SQL engines. This "out-of-the-box" model minimizes the barriers and costs of using Spark on the client side. On the server side, the multi-tenant architecture of the Kyuubi server and engines gives administrators a way to achieve computing resource isolation, data security, high availability, high client concurrency, etc.
Background
In typical big data production environments, especially secured ones, all bundled services manage access control lists to restrict access to authorized users. For example, Hadoop YARN divides compute resources into queues. With queue ACLs, it can identify and control which users/groups can take actions on particular queues. Similarly, HDFS ACLs control access to HDFS files by providing a way to set different permissions for specific users/groups.
Apache Spark is a unified analytics engine for large-scale data processing. It provides a Distributed SQL Engine, a.k.a. the Spark Thrift Server (STS), designed to be seamlessly compatible with HiveServer2 and to deliver even better performance.
HiveServer2 can identify and authenticate a caller; a request then succeeds only if the caller also has permissions for the YARN queue and the HDFS files involved. STS falls short in two ways. First, STS is a single Spark application, and the user and queue it runs under are fixed at startup. Consequently, STS cannot rely on cluster managers such as YARN and Kubernetes for resource isolation and sharing, nor can it control access per caller, because everything in the system runs as that single user. Second, the Thrift server is coupled to the Spark driver's JVM process. This coupled architecture puts server stability at risk and, because the server is stateful, prevents it from handling high client concurrency or supporting high-availability techniques such as load balancing.
Kyuubi extends the use of STS to a multi-tenant model based on a unified interface, and relies on the concept of multi-tenancy when interacting with cluster managers to gain resource sharing/isolation and data security. The loosely coupled architecture of the Kyuubi server and engines dramatically improves client concurrency and the stability of the service itself.
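The contrast drawn above between a single shared application and per-tenant engines can be sketched as follows. This is a minimal sketch, not Kyuubi's implementation: the class names, the in-memory engine registry, and the user-to-queue mapping are all hypothetical and stand in for real authentication and cluster-manager submission.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Engine:
    """Stand-in for a Spark application launched on behalf of one user."""
    user: str   # submitter, and the Spark user inside the engine
    queue: str  # cluster-manager queue the engine is submitted to


@dataclass
class SharedServer:
    """STS-like model: the user and queue are fixed once, at startup."""
    engine: Engine

    def open_session(self, caller: str) -> Engine:
        # Every caller lands in the same application, whoever they are.
        return self.engine


@dataclass
class MultiTenantServer:
    """Multi-tenant model: one engine per authenticated user."""
    queues: Dict[str, str]  # hypothetical user -> queue mapping
    engines: Dict[str, Engine] = field(default_factory=dict)

    def open_session(self, caller: str) -> Engine:
        if caller not in self.queues:
            raise PermissionError(f"{caller} has no queue to submit to")
        # Reuse the caller's engine if one exists, otherwise create it.
        if caller not in self.engines:
            self.engines[caller] = Engine(user=caller, queue=self.queues[caller])
        return self.engines[caller]


if __name__ == "__main__":
    shared = SharedServer(Engine(user="spark", queue="default"))
    print(shared.open_session("alice"), shared.open_session("bob"))

    tenant = MultiTenantServer(queues={"alice": "etl", "bob": "adhoc"})
    print(tenant.open_session("alice"), tenant.open_session("bob"))
```

In the shared model both callers receive the same engine, so queue and file permissions are evaluated against the startup user. In the multi-tenant model each caller's work runs as that caller, which is what lets the cluster manager's own ACLs apply.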
You can find more information on Kyuubi at the existing open-source website: https://kyuubi.readthedocs.io/.
Rationale
Pure SQL users migrating from HiveServer2 to Spark SQL for better performance have a strong need for multi-tenancy support, which provides resource isolation and data security, as well as for client concurrency, service stability, and high availability.
To achieve these goals, Kyuubi relies on the following three key points:
Kyuubi, STS, and HiveServer2 are identical in terms of interfaces and protocols. Therefore, from the user's point of view, the way of use is unchanged.
Kyuubi applies the multi-tenant feature based on the concept of Kyuubi engines, which are pre-programmed and extensible Spark applications.
Kyuubi isolates engines according to the tenants in the whole system. The tenant, a.k.a. the user, is unified and end-to-end unique through a JDBC connection. The Kyuubi server identifies and authenticates the user and then either retrieves an engine belonging to this user from the enginespace or creates one. This user is also the submitter of the engine and must be authorized to use resources from YARN, Kubernetes, the local machine, etc. Inside the engine, the engine's user, a.k.a. the Spark user, is the same user. When an engine runs queries received over the JDBC connection, the engine's user must also have the rights to access the metadata and data.
Initial Goals
Current Status
The Kyuubi project began at NetEase, where it was maintained as a sub-module of our in-house Spark project (since 2.0.0). At that point, we needed a Spark SQL service to migrate thousands of HiveQL jobs from HiveServer2 for better performance. Meanwhile, we didn't want to lose features like fine-grained permission control, queue resource isolation, high availability, etc. We separated it into an independent project and open-sourced it under the Apache License 2.0 in Dec 2017.
Meritocracy:
This proposal intends to start building a diverse developer and user community around Kyuubi following the ASF meritocracy model. Since Kyuubi was open-sourced, many enterprises have adopted it to build multi-tenant Spark SQL services that replace HiveServer2. In return, we have received many issue reports and enhancements from them. Because Kyuubi is maintained under a personal account on GitHub, some users find it hard to trust as a fundamental platform, and we have been asked many times whether the ASF could incubate it. The codebase is now mainly managed by a group of developers from NetEase, wenjuan.com, and eBay. We will also do our best to encourage an environment that supports a meritocracy.
Community:
Kyuubi has been building a community around predecessors to this framework for the last four years, and we believe that we can get a lot of help from the Apache Spark community too.
Core Developers:
Alignment:
Kyuubi is built upon Apache Spark and many other Apache projects such as Apache Hive, ZooKeeper, Hadoop, YARN, etc. The codebase of Kyuubi is already under the Apache License Version 2.0. Meanwhile, our current core developers all have experience contributing to various Apache projects. These community connections help us focus on development practices that emphasize community engagement and naturally align us with the ASF path to meritocratic recognition.
Known Risks
Project Name
The project took its name, Kyuubi, from a character in the popular Japanese manga Naruto. It is a nine-tailed fox spirit in Chinese and Japanese mythology. Kyuubi spreads the power and energy of fire, used here to symbolize the powerful project. Its nine tails vividly represent the project's end-to-end multi-tenancy support. Based on our search results, the term Kyuubi is used as a trademark only under Class 17, so it is perfectly legal to use it as our project name.
Orphaned products
The risk of the Kyuubi project being abandoned is minimal. Many organizations are using Kyuubi to build critical big data pipelines and are willing to help develop Kyuubi's community if it becomes an ASF project.
Inexperience with Open Source:
Many of the Kyuubi committers have experience working on open source projects. They are also active committers and contributors to other Apache projects.
Homogenous Developers:
The current contributors work across various organizations, including NetEase, eBay, Wenjuan.com, Huawei, HIVE-BOX, JD.com Inc, etc. We are committed to recruiting additional committers based on their contributions to the project.
Reliance on Salaried Developers:
To date, salaried engineers from NetEase, eBay, Wenjuan.com, Huawei, HIVE-BOX, etc. have contributed to the Kyuubi project, both on salaried time and on volunteer time. They are all passionate about the project, and we are confident that the project will continue even if no salaried developers contribute to it. We are committed to recruiting additional committers, including non-salaried developers, and aim to diversify the Kyuubi user and contributor base further.
Relationships with Other Apache Products:
Kyuubi is currently closely integrated with Apache Spark, ZooKeeper, Curator, Hive, Thrift, and Commons in numerous ways.
Kyuubi inherits Hive's Hive Service RPC module to reuse the Thrift API for RPC between clients and Kyuubi servers, and internally between Kyuubi servers and engines. Clients can use the existing Hive JDBC driver or Beeline to talk to Kyuubi in the same way as they talk to Spark ThriftServer and HiveServer2. Kyuubi uses ZooKeeper and Curator to build a service registration and discovery mechanism for internal and external components. Kyuubi engines are pre-programmed Spark SQL applications that fully support pure SQL usage in Spark. They can run on any cluster manager, such as Kubernetes, Hadoop YARN, Spark Standalone, etc. In the future, the engines' pluggable design can support more Apache projects, such as Apache Flink.
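As a rough illustration of the client side, the sketch below builds the two kinds of Hive JDBC connection URLs that a client such as Beeline might use: one pointing directly at a server, and one using the Hive JDBC driver's ZooKeeper service discovery parameters (serviceDiscoveryMode and zooKeeperNamespace). The helper function names are hypothetical, and the host names, port 10009, and the kyuubi namespace are illustrative assumptions rather than values taken from this proposal.

```python
from typing import Sequence


def direct_url(host: str, port: int, database: str = "default") -> str:
    """Build a URL that targets one specific server instance."""
    return f"jdbc:hive2://{host}:{port}/{database}"


def discovery_url(zk_quorum: Sequence[str], namespace: str, database: str = "") -> str:
    """Build a URL that lets the driver pick a server registered in ZooKeeper."""
    quorum = ",".join(zk_quorum)
    return (
        f"jdbc:hive2://{quorum}/{database}"
        f";serviceDiscoveryMode=zooKeeper;zooKeeperNamespace={namespace}"
    )


if __name__ == "__main__":
    # Assumed values; substitute those of the actual deployment.
    print(direct_url("kyuubi.example.com", 10009))
    print(discovery_url(["zk1.example.com:2181", "zk2.example.com:2181"], "kyuubi"))
```

Either string could then be passed to a Hive JDBC client, for example as the argument of Beeline's -u option. The discovery form is the one that benefits from the registration mechanism, since the client never needs to know an individual server address.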
An Excessive Fascination with the Apache Brand
The primary motivation for submitting Kyuubi to the ASF is to build a diverse and strong community and to gain stability for long-term development. We also wish to encourage diverse organizations to adopt Kyuubi and contribute to Kyuubi without any concerns about ownership or licensing.
Documentation
Since Kyuubi 1.0.0, the Kyuubi online documentation has been hosted on https://readthedocs.org/.
You can find the specific version of Kyuubi documentation listed below.
For 0.8 and earlier versions, on GitHub Pages.
Initial Source
The initial source code for Kyuubi is hosted at https://github.com/yaooqinn/kyuubi
Initial Source and Intellectual Property Submission Plan
As soon as Kyuubi is approved to join the Apache Incubator, our initial committers will submit ICLA(s) and CCLA(s). The codebase is already licensed under the Apache License 2.0.
External Dependencies
Apache License 2.0
commons-codec:commons-codec
org.apache.commons:commons-lang3
org.apache.curator:curator-client
org.apache.curator:curator-framework
org.apache.curator:curator-recipes
org.apache.curator:curator-test
com.google.guava:failureaccess
com.google.guava:guava
org.apache.hadoop:hadoop-client-api
org.apache.hadoop:hadoop-client-runtime
org.apache.hive:hive-service-rpc
org.apache.htrace:htrace-core4
com.fasterxml.jackson.core:jackson-annotations
com.fasterxml.jackson.core:jackson-core
com.fasterxml.jackson.core:jackson-databind
org.javassist:javassist
org.apache.thrift:libfb303
org.apache.thrift:libthrift
log4j:log4j
io.dropwizard.metrics:metrics-core
io.dropwizard.metrics:metrics-jmx
io.dropwizard.metrics:metrics-json
io.dropwizard.metrics:metrics-jvm
org.apache.zookeeper:zookeeper
spark-*-bin-*.tgz
BSD 3-Clause
org.scala-lang:scala-library
MIT License
org.slf4j:slf4j-api
org.slf4j:slf4j-log4j12
org.slf4j:jcl-over-slf4j
Required Resources
Mailing lists
Git Repositories:
github.com/apache/incubator-kyuubi
github.com/apache/incubator-kyuubi-shaded
github.com/apache/incubator-kyuubi-site
Issue Tracking
We request the creation of an Apache-hosted JIRA.
Jira ID: KYUUBI
Initial Committers
Sponsors
Champion
Nominated Mentors
Sponsoring Entity
We request that the Apache Incubator sponsor this project.