
add Apache Flink connector to Hue #1010

Closed
bowenli86 opened this issue Dec 6, 2019 · 26 comments


Is the issue already present in https://github.com/cloudera/hue/issues or discussed in the forum https://discourse.gethue.com?

no

What is the Hue version or source? (e.g. open source 4.5, CDH 5.16, CDP 1.0...)

n/a

Is there a way to help reproduce it?

n/a

romainr commented Dec 6, 2019

+1

There are probably two ways:

  1. Flink adds a SqlAlchemy dialect, and it should then work in Hue without major changes.
  2. We add/use the Flink Python SQL API in Hue:
    https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/common.html#query-a-table
    similarly to the KSQL connector https://github.com/cloudera/hue/blob/master/desktop/libs/notebook/src/notebook/connectors/ksql.py (more work, but an easy/basic implementation should be quick; see the sketch after this list).
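
For illustration, a rough sketch of the shape Option 2 could take, modeled on the ksql.py connector linked above. It assumes Hue's notebook connector base class (notebook.connectors.base.Api) and leaves the actual Flink calls as stubs:

```python
# Sketch only: a hypothetical Flink connector skeleton for Hue, modeled on
# ksql.py. Assumes Hue's notebook connector interface; Flink calls are stubs.
from notebook.connectors.base import Api


class FlinkApi(Api):

  def execute(self, notebook, snippet):
    # Would run snippet['statement'] through a Flink TableEnvironment (or,
    # later, a SQL gateway) and map columns/rows into Hue's result format.
    raise NotImplementedError

  def autocomplete(self, snippet, database=None, table=None, column=None, nested=None):
    # Would back database/table/column listings with SHOW DATABASES / SHOW TABLES.
    raise NotImplementedError
```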

bowenli86 commented Dec 6, 2019

@romainr thanks for the quick response!

I think option 2 is doable now.

For option 1, some questions, since I'm not familiar with SqlAlchemy: does it require Flink to have a JDBC server (Flink doesn't have one yet), or do we just need to add a SqlAlchemy dialect to Flink? Flink supports its own Calcite dialect and a Hive dialect; could the Hive dialect part work now?

romainr commented Dec 7, 2019

Option 1: it requires a way to communicate from Python, be it via REST, Thrift... Today, if we want to send a query from a Python program, how would we do it? (JDBC is for Java, not Python.)

I am not 100% sure, but maybe Option 2 https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/common.html#create-a-tableenvironment could be used to back the SqlAlchemy connector until there is a proper REST/Thrift server for Flink SQL. In any case, Option 2 would be easy to try (I can add the skeleton if you want?).

When we have a connector (aka "a way to send queries"), we can help provide Flink-specific syntax highlighting and autocomplete: https://docs.gethue.com/developer/parsers/

As a bonus, Hue is also going to get proper support for continuously running queries.

@bowenli86

Got it. Option 1 probably isn't feasible at the moment since Flink doesn't have such a Thrift/REST server yet, though we are looking into it.

I agree option 2 is the way to go until a Flink Thrift server is available. It would be great if you could add the skeleton; I can help review and test!

romainr commented Dec 9, 2019

The skeleton for Option 2 is in review, as are the SQL autocomplete and highlighter skeletons.

e.g. for the connector: https://github.com/cloudera/hue/blob/testing-romain/desktop/libs/notebook/src/notebook/connectors/flink.py#L120

Note: maybe Option 2 can be ported to SqlAlchemy at some point too; what matters is having a proper RPC from Python to Flink. At least Option 2 is easy to experiment with.

The code should be in master tomorrow, and I can send a snippet of the Hue config showing how to activate the connector.

romainr commented Apr 6, 2020

Hey @bowenli86, was there any progress on the connector? Any update on when Flink SQL will have a REST API?

bowenli86 commented Apr 6, 2020

Hi @romainr, yes, we are building a Flink SQL gateway and it should be available in a month or so: https://github.com/ververica/flink-sql-gateway

@KurtYoung @godfreyhe can you follow up with Romain on this? I think it's valuable to 1) test the completeness of the SQL gateway against such a good use case and 2) integrate more deeply with the Cloudera stack, especially since Cloudera has launched a Flink service; we could even bring in Gyula or Marton as well.

romainr commented Apr 6, 2020

Interesting! And would you recommend a particular Docker setup with a ready-to-launch Flink and some tables (e.g. https://github.com/ververica/sql-training)? That would help do a quick scoping on the Hue side; I could then just point the gateway at it. I'm also not sure what FLINK_HOME should be when running Flink in Docker.

@bowenli86

Yes, I think https://github.com/ververica/sql-training is a good source; there are also official images on Docker Hub: https://hub.docker.com/_/flink

@KurtYoung

@romainr Could you describe your use case a little bit: will you run both batch and streaming queries, or just one of them? What kind of cluster will you use: standalone, YARN session, or YARN per-job? Will you perform DDL operations via the SQL gateway, or just DQL and DML?

romainr commented Apr 7, 2020

The idea would be to test DESCRIBE and simple SELECTs via the gateway, then test more types of statements. I don't have a preference for the type of cluster; the idea is to play around to scope the gateway API / SQL autocomplete work.

For example, I just did a POC for Phoenix https://github.com/cloudera/hue/blob/master/docs/designs/apache_phoenix.md and also did some tests with ksqlDB regular and stream queries. It was easy to poke around with a real system that comes out of the box via a container.

romainr commented Apr 10, 2020

What should FLINK_HOME be for the sql-gateway when using the SQL training docker-compose setup?

@godfreyhe

@romainr Currently there is no Flink SQL Gateway Docker image, so you need to deploy the SQL gateway independently on a gateway machine (or locally, just for testing). You should also download the Flink binary package to the same machine; FLINK_HOME is the path of that Flink binary location.

You should set gateway-address to the local IP address in <sql-gateway-dir>/conf/sql-gateway-defaults.yaml, because for a SELECT query the taskmanager (in Docker) sends result data to the SQL gateway (on the physical machine) through a socket. If you run all components locally (or on a single machine), that is the only thing you need to care about; I have tested this successfully on my machine.

If you deploy the SQL gateway and the Flink Docker cluster on different machines, you also need to set jobmanager.rpc.address to the real jobmanager address (the default is localhost) in <your-flink-binary-dir>/conf/flink-conf.yaml.
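
Concretely, a minimal sketch of the two settings just described (the addresses are placeholders; the server keys match the sql-gateway-defaults.yaml excerpt shown later in this thread):

```yaml
# <sql-gateway-dir>/conf/sql-gateway-defaults.yaml
server:
  # The address that the gateway binds itself.
  bind-address: 192.168.1.10    # placeholder: the gateway machine's local IP
  # The address that should be used by clients to connect to the gateway.
  address: 192.168.1.10

# <your-flink-binary-dir>/conf/flink-conf.yaml, only needed when the gateway
# and the Flink cluster run on different machines (the default is localhost):
jobmanager.rpc.address: jobmanager-host   # placeholder hostname
```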

romainr commented Apr 11, 2020

I see, the gateway just reuses the libs in FLINK_HOME. Thanks!

I think I got the basics working:

[screenshot]

romainr self-assigned this Apr 25, 2020
romainr commented Apr 25, 2020

Sorry again if this is a basic question, but I still could not point the SQL gateway at the SQL training setup properly:

POST http://127.0.0.1:8083/v1/sessions/d61a353d8cfa7e7bb5039744bc68511e/statements {"statement": "SHOW tables", "execution_timeout": ""}
returned in 5ms: 200 {"results":[{"columns":[{"name":"tables","type":"VARCHAR(1) NOT NULL"},{"name":"type","type":"VARCHAR(5) NOT NULL"}],"data":[]}],"statement_types":["SHOW_TABLES"]}

The SQL Gateway is running, but I don't see any tables, and it behaves the same even when the Flink SQL training docker is not running (I can SHOW DATABASES/FUNCTIONS etc., but there are no tables).

Everything is on the same machine.

flink-1.10.0:
I put the IP address of the Docker job manager as jobmanager.rpc.address in conf/flink-conf.yaml.

SQL gateway:
conf/sql-gateway-defaults.yaml has gateway-address: 127.0.0.1
./sql-gateway.sh output:
Reading configuration from: file:/home/romain/projects/flink-sql-gateway-0.1-SNAPSHOT/conf/sql-gateway-defaults.yaml
Rest endpoint started.

The SQL training is working and I can connect via the SQL client:
https://github.com/ververica/sql-training/wiki/Setting-up-the-Training-Environment
https://github.com/ververica/sql-training/blob/master/build-image/conf/sql-client-conf.yaml

sql-training$ docker ps
CONTAINER ID        IMAGE                                                COMMAND                  CREATED             STATUS              PORTS                                                NAMES
7777ea00b5d4        fhueske/flink-sql-training:1-FLINK-1.10-scala_2.11   "/docker-entrypoint.…"   5 seconds ago       Up 3 seconds        6123/tcp, 8081/tcp                                   sql-training_sql-client_1
e7d664901220        flink:1.10.0-scala_2.11                              "/docker-entrypoint.…"   6 seconds ago       Up 4 seconds        6121-6123/tcp, 8081/tcp                              sql-training_taskmanager_1
e5d36587a4c8        wurstmeister/kafka:2.12-2.2.1                        "start-kafka.sh"         8 seconds ago       Up 5 seconds        0.0.0.0:9092->9092/tcp                               sql-training_kafka_1
ee4c680f36e5        wurstmeister/zookeeper:3.4.6                         "/bin/sh -c '/usr/sb…"   10 seconds ago      Up 8 seconds        22/tcp, 2888/tcp, 3888/tcp, 0.0.0.0:2181->2181/tcp   sql-training_zookeeper_1
ed4e76577a51        flink:1.10.0-scala_2.11                              "/docker-entrypoint.…"   10 seconds ago      Up 6 seconds        6123/tcp, 0.0.0.0:8081->8081/tcp                     sql-training_jobmanager_1
e996f679fcba        mysql:8.0.19                                         "docker-entrypoint.s…"   10 seconds ago      Up 9 seconds        3306/tcp, 33060/tcp                                  sql-training_mysql_1

Do we also need to backport the configs of https://github.com/ververica/sql-training/blob/master/build-image/conf/sql-client-conf.yaml into the SQL Gateway? (That seems counter-intuitive for something exposed as a REST API.)

@godfreyhe

Hi @romainr, I think you should backport the tables section and functions section to sql-gateway-defaults.yaml, because the SQL gateway loads them at startup.
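
For reference, those sections follow the Flink 1.10 SQL client environment-file format; a rough, illustrative sketch of the shape (the table name, topic, and schema here are made up, so copy the real entries from sql-client-conf.yaml):

```yaml
tables:
  - name: Rides                 # illustrative entry only
    type: source-table
    update-mode: append
    connector:
      type: kafka
      version: universal
      topic: Rides
      properties:
        bootstrap.servers: kafka:9092
    format:
      type: json
      derive-schema: true
    schema:
      - name: rideId
        data-type: BIGINT
```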

romainr commented Apr 26, 2020

Thanks!

I am now running the gateway inside the sql-training docker client container (with the bind addresses in sql-gateway-defaults.yaml updated to the container IP), added the tables and functions, and got past the 'Could not create execution context' errors by adding the jars to the gateway:

./sql-gateway.sh -j /opt/flink/lib/flink-table_2.11-1.10.0.jar -j /opt/flink/lib/flink-table-blink_2.11-1.10.0.jar /opt/flink/lib/flink-dist_2.11-1.10.0.jar

Maybe I am not opening the session properly, or pointing at another context somehow, but I still don't see any of the pre-defined tables.

POST http://172.20.0.7:8083/v1/sessions {"session_name": "test", "planner": "blink", "execution_type": "streaming", "properties": {}}
returned in 20ms: 200 {"session_id":"42cb096147dc270f342afc9536641897"}

At least a basic 'SELECT 1, 2, 3' works, so after getting the above working it should be demo ready!

[screenshot]
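
For reference, the exchange above boils down to two REST calls; a minimal sketch with Python requests, reusing the host and payloads from the traces in this thread:

```python
import requests

BASE = "http://172.20.0.7:8083/v1"  # the gateway address from the trace above

# 1. Open a session (same payload as in the trace).
session = requests.post(BASE + "/sessions", json={
    "session_name": "test",
    "planner": "blink",
    "execution_type": "streaming",
    "properties": {},
}).json()

# 2. Run a statement in that session.
result = requests.post(BASE + "/sessions/%s/statements" % session["session_id"], json={
    "statement": "SHOW TABLES",
    "execution_timeout": "",
}).json()

print(result["results"][0]["data"])  # [] here, since no tables are registered yet
```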

@godfreyhe

Hi @romainr, if you have defined FLINK_HOME, the -j /opt/flink/lib/flink-table_2.11-1.10.0.jar -j /opt/flink/lib/flink-table-blink_2.11-1.10.0.jar /opt/flink/lib/flink-dist_2.11-1.10.0.jar arguments are not needed when executing ./sql-gateway.sh. Could you provide your sql-gateway-defaults.yaml file? I can try it locally.

romainr commented Apr 27, 2020

Thanks!

I put a write-up and the file here: https://github.com/romainr/flink-sql-gateway/tree/master/docs/demo

It is done in the sql-training docker so that it is easy to repro. Basically, as soon as I try to add the demo tables/UDFs, the curl to create a session fails.

Let me know if it is not clear enough.

romainr commented Apr 28, 2020

After the above and testing on a live Flink, the initial connector will come in https://issues.cloudera.org/browse/HUE-9280

romainr commented May 2, 2020

./sql-gateway.sh -j /opt/flink/lib/flink-table_2.11-1.10.0.jar -j /opt/flink/lib/flink-table-blink_2.11-1.10.0.jar /opt/flink/lib/flink-dist_2.11-1.10.0.jar

Never mind, I forgot the last -j, so the jars don't fix the issue:

Could not find a suitable table factory for 'org.apache.flink.table.factories.TableSourceFactory' in
the classpath.

Reason: Required context properties mismatch.

The matching candidates:
org.apache.flink.table.sources.CsvAppendTableSourceFactory
Mismatched properties:
'connector.type' expects 'filesystem', but is 'kafka'
'format.type' expects 'csv', but is 'json'

Maybe it is a 2.11 compatibility issue? Would you have a build of https://github.com/ververica/flink-sql-gateway/tree/flink_1.11_SNAPSHOT?
(For some reason, mvn package fails for me on this branch.)

romainr commented May 2, 2020

The correct jars to add are actually the ones from /opt/sql-client/lib.

romainr commented May 7, 2020

First version of the connector: https://gethue.com/blog/sql-editor-for-apache-flink-sql/

[screenshot: flink_editor_v1]

There is still more to do before it is production ready, but this is a good start for demos/POCs. Follow-up tasks will be tracked in their own JIRAs.

romainr closed this as completed May 7, 2020
amitshahiplooto commented Jan 8, 2021

I'm trying to use the Hue editor with Flink SQL installed on Kubernetes. I have Flink running on POD1 and Hue running on POD2 in the same K8s cluster.

What should my settings be in:

sql-gateway-defaults.yaml

server:
  # The address that the gateway binds itself.
  bind-address: <what_ip_address_should_this_be>
  # The address that should be used by clients to connect to the gateway.
  address: <what_ip_address_should_this_be>

hue-conf

[[[flink]]]
name=Flink
interface=flink
options='{"url": "http://<what_ip_address_should_this_be>:8083"}'

Is <what_ip_address_should_this_be> the cluster IP of the Job Manager?

How do I use Hue on Kubernetes with Flink?

romainr commented Jan 8, 2021

You could boot the container from https://github.com/romainr/query-demo/tree/master/stream-sql-demo to check, but those should all be the same: the hostname of the SQL Gateway API.
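
In other words (a sketch under assumptions): from a pod inside the cluster, the gateway should be reachable through its Kubernetes Service hostname, and that hostname is what goes in both the gateway address settings and the Hue options url. A quick probe, where flink-sql-gateway is a placeholder Service name:

```python
import requests

# Placeholder Service name: use <service>.<namespace>.svc.cluster.local for
# whatever Service exposes the SQL gateway pod on port 8083.
resp = requests.get("http://flink-sql-gateway:8083/v1/info")
print(resp.status_code, resp.text)  # a response here means Hue can use this hostname
```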

amitshahiplooto commented Jan 8, 2021

@romainr I'm not sure how to go about that. How do I do that in Kubernetes? This is the error I see when connecting to Flink with the default sql-gateway settings:

[screenshot of the error]
