add Apache Flink connector to Hue #1010
+1 Probably 2 ways: Option 1, a SqlAlchemy dialect for Flink; Option 2, a dedicated Hue connector.
@romainr thanks for the quick response! I think option 2 is doable now. For option 1, some questions, since I'm not familiar with SqlAlchemy: does it require Flink to have a JDBC server (Flink doesn't have one yet), or do we just need to add a SqlAlchemy dialect to Flink? Flink supports its Calcite dialect and a Hive dialect; can the Hive dialect part work now?
Option 1: it requires a way to communicate via Python, be it with REST, Thrift... Today, if we wanted to send a query via a Python program, how would we do it? (JDBC is for Java, not Python.) Some examples:
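For illustration, this is the kind of Python-side interface a SqlAlchemy-based connector ultimately needs. The sketch uses SQLite as a stand-in, since no Flink dialect exists yet; the `flink://` URL in the comment is purely hypothetical:

```python
from sqlalchemy import create_engine, text

# SQLite stands in for a hypothetical future Flink dialect; a real one
# would be registered under a URL such as "flink://gateway-host:8083".
engine = create_engine("sqlite://")

with engine.connect() as conn:
    # A SqlAlchemy connector drives dialects through exactly this API:
    # connect, execute a statement, fetch rows and column metadata.
    result = conn.execute(text("SELECT 1 AS a, 2 AS b"))
    columns = list(result.keys())
    rows = result.fetchall()

print(columns, rows)  # ['a', 'b'] [(1, 2)]
```

Whatever RPC Flink ends up exposing (REST, Thrift, ...), the dialect's job is to map this interface onto it.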
I am not 100% sure, but maybe Option 2 https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/common.html#create-a-tableenvironment could be used for the SqlAlchemy connector until there is a proper REST/Thrift server for Flink SQL. In any case, Option 2 would be easy to try (I can add the skeleton if you want?). Once we have a connector (aka "a way to send queries"), we can help provide the Flink-specific syntax highlighting and autocomplete: https://docs.gethue.com/developer/parsers/ As a bonus, Hue is going to get proper support for continuously running queries.
Got it. Option 1 probably isn't feasible at the moment since Flink doesn't have such a Thrift/REST server yet, though we are looking into it. I agree Option 2 is the way to go until a Flink Thrift server is available. It would be great if you could add the skeleton; I can help review and test!
The skeleton for Option 2 is in review, as well as the SQL autocomplete + highlighter skeletons. E.g. for the connector: https://github.com/cloudera/hue/blob/testing-romain/desktop/libs/notebook/src/notebook/connectors/flink.py#L120 Note: maybe Option 2 can be ported to SqlAlchemy at some point too; what matters is having a proper RPC from Python to Flink. At least Option 2 is easy to experiment with. The code should be in master tomorrow, and I can send a snippet of the Hue config showing how to activate the connector.
Hey @bowenli86, was there any progress on the connector? Any update on when Flink SQL will get a REST API?
Hi @romainr, yes, we are building a Flink SQL gateway and it should be available in a month or so: https://github.com/ververica/flink-sql-gateway @KurtYoung @godfreyhe can you follow up with Romain on this? I think it's valuable to 1) test the completeness of the SQL gateway with such a good use case and 2) integrate more deeply with the Cloudera stack, especially since Cloudera has launched a Flink service; we could even bring in Gyula or Marton as well.
Interesting! Would you recommend a particular Docker image with a ready-to-launch Flink and some tables? (e.g. https://github.com/ververica/sql-training... that would help do a quick scoping on the Hue side.) Then I could just point the gateway at it. Also, I am not sure what FLINK_HOME should be when running Flink in Docker.
Yes, I think https://github.com/ververica/sql-training is a good source; there are also official images on Docker Hub: https://hub.docker.com/_/flink
@romainr Could you describe your use case a little bit? Will you run both batch & streaming queries, or just one of them? What kind of cluster will you use: standalone, YARN session, or YARN per-job? Will you perform DDL operations via the SQL gateway, or just DQL & DML?
The idea would be to test DESCRIBE and simple SELECT statements via the gateway, then test more types of statements. I don't have a preference for the type of cluster; the idea is to play around to get a scoping of the gateway API / SQL autocomplete work. For example, I just did a POC for Phoenix https://github.com/cloudera/hue/blob/master/docs/designs/apache_phoenix.md and also did some tests with ksqlDB regular and stream queries. It was easy to poke around with a real system that comes out of the box via a container.
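That scoping boils down to two REST calls: open a session, then run a statement in it. A minimal sketch, assuming the endpoint paths and payload field names from the flink-sql-gateway README (`Rides` is the demo table from sql-training, and the host/port are placeholders):

```python
import json
from urllib import request

GATEWAY = "http://localhost:8083/v1"  # assumed gateway host/port

def open_session_payload(planner="blink", execution_type="streaming"):
    # Body for POST /v1/sessions, per the flink-sql-gateway README.
    return {"planner": planner, "execution_type": execution_type}

def statement_payload(sql):
    # Body for POST /v1/sessions/<session_id>/statements.
    return {"statement": sql}

def post(path, body):
    """POST a JSON body to the gateway and decode the JSON response."""
    req = request.Request(GATEWAY + path,
                          data=json.dumps(body).encode("utf-8"),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)

# With a live gateway (not run here):
#   session = post("/sessions", open_session_payload())
#   post("/sessions/%s/statements" % session["session_id"],
#        statement_payload("DESCRIBE Rides"))
```

The Hue connector side then only has to translate editor actions (autocomplete, execute, fetch results) into this session/statement pair.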
What should be FLINK_HOME for the sql-gateway when using the SQL training docker compose? |
@romainr Currently there is no Flink SQL Gateway Docker image. We need to deploy the SQL gateway independently on a gateway machine (or locally, just for testing). You should also download the Flink binary package to the same machine.
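A minimal sketch of that layout, with both packages unpacked on the same machine (the `/opt` paths and the 1.10.0 version are illustrative assumptions):

```shell
# The Flink binary package and the gateway live side by side.
export FLINK_HOME=/opt/flink-1.10.0
export GATEWAY_HOME=/opt/flink-sql-gateway

# Start the gateway (uncomment once both directories exist);
# it serves its REST API on port 8083 by default.
# "$GATEWAY_HOME/bin/sql-gateway.sh"

echo "$FLINK_HOME"   # prints /opt/flink-1.10.0
```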
Sorry again if this is a basic question, but I still could not point the SQL gateway at the SQL training properly:
Everything is on the same machine: flink-1.10.0, the SQL gateway, and the SQL training. The SQL training is working and I can connect via the SQL client.
Do we also need to backport the configs of https://github.com/ververica/sql-training/blob/master/build-image/conf/sql-client-conf.yaml into the SQL gateway? (It seems counter-intuitive to have them in the REST API.)
Hi @romainr, I think you should backport the table configs into sql-gateway-defaults.yaml.
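If backporting is the way to go, a hypothetical sql-gateway-defaults.yaml excerpt might look like this. The `Rides` table name comes from sql-training, and the `server` keys are assumptions based on the gateway README; the connector/format/schema properties would be copied verbatim from sql-client-conf.yaml:

```yaml
tables:
  - name: Rides
    type: source-table
    # connector:, format: and schema: blocks copied as-is from
    # sql-training's sql-client-conf.yaml

server:
  bind-address: 0.0.0.0   # reachable from outside the container
  port: 8083
```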
Thanks! I am now running the gateway inside the sql-training Docker client container (and updated the bind addresses in sql-gateway-defaults.yaml to the container IP), added the tables and functions, and got past the 'Could not create execution context' errors by adding the jars to the gateway: ./sql-gateway.sh -j /opt/flink/lib/flink-table_2.11-1.10.0.jar -j /opt/flink/lib/flink-table-blink_2.11-1.10.0.jar /opt/flink/lib/flink-dist_2.11-1.10.0.jar Maybe I am not opening the session properly or pointing at another context somehow, but I am still not seeing any of the pre-defined tables.
At least a basic 'SELECT 1, 2, 3' works, so once the above is working it should be demo-ready!
hi @romainr if you defined FLINK_HOME, |
Thanks! I put a write-up and the file here: https://github.com/romainr/flink-sql-gateway/tree/master/docs/demo It is done in the sql-training Docker so that it is easy to repro. Basically, as soon as I try to add the demo tables/UDFs, the curl call to create a session fails. Let me know if it is not clear enough.
After the above and testing on a live Flink, the initial connector will come in https://issues.cloudera.org/browse/HUE-9280
Never mind; I forgot the last -j, so the jars don't fix the issue:
Maybe it is a Scala 2.11 compatibility issue? Would you have a build of https://github.com/ververica/flink-sql-gateway/tree/flink_1.11_SNAPSHOT ?
The correct jars to add are actually from /opt/sql-client/lib |
First version of the connector: https://gethue.com/blog/sql-editor-for-apache-flink-sql/ There is still more to do before it is production-ready, but this is a good start for demos/POCs. The follow-up tasks will be done in their own jiras.
I’m trying to use the Hue Editor with Flink SQL installed on Kubernetes. I have Flink running on POD1 and Hue running on POD2 in the same K8s cluster. What should my settings be in sql-gateway-defaults.yaml and the Hue config? Is <what_ip_address_should_this_be> the cluster IP of the Job Manager? How do I use Hue on Kubernetes with Flink?
You could boot into the container https://github.com/romainr/query-demo/tree/master/stream-sql-demo to check, but those should all be the same: the hostname of the SQL Gateway API.
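For reference, a hedged sketch of the corresponding hue.ini interpreter entry; the `interface=flink` name and the options format follow the Hue connector docs at the time, and the hostname is an assumed Kubernetes Service name for the gateway, not the Job Manager:

```ini
[notebook]
  [[interpreters]]
    [[[flink]]]
      name=Flink
      interface=flink
      # Point at the SQL gateway service; in Kubernetes this would be
      # the gateway's Service DNS name, reachable from the Hue pod.
      options='{"url": "http://flink-sql-gateway:8083"}'
```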
@romainr Not sure how to go about that; how do I do that in Kubernetes? This is the error I see when connecting with the default sql-gateway settings with Flink:
Is the issue already present in https://github.com/cloudera/hue/issues or discussed in the forum https://discourse.gethue.com?
no
What is the Hue version or source? (e.g. open source 4.5, CDH 5.16, CDP 1.0...)
n/a
Is there a way to help reproduce it?
n/a