
add Apache Flink connector to Hue #1010

Closed
bowenli86 opened this issue Dec 6, 2019 · 26 comments


Is the issue already present in https://github.com/cloudera/hue/issues or discussed in the forum https://discourse.gethue.com?

no

What is the Hue version or source? (e.g. open source 4.5, CDH 5.16, CDP 1.0...)

n/a

Is there a way to help reproduce it?

n/a

romainr commented Dec 6, 2019

+1

There are probably two ways:

  1. Flink adds a SqlAlchemy dialect, and it should then work in Hue without major changes.
  2. We add/use the Flink Python SQL API in Hue:
    https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/common.html#query-a-table
    similarly to the KSQL connector https://github.com/cloudera/hue/blob/master/desktop/libs/notebook/src/notebook/connectors/ksql.py (more work, but an easy/basic implementation should be quick; see the sketch after this list).
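
For illustration, a rough sketch of the shape Option 2 could take, modeled on the ksql.py connector linked above. It assumes Hue's notebook connector base class (notebook.connectors.base.Api) and leaves the actual Flink calls as stubs:

```python
# Sketch only: a hypothetical Flink connector skeleton for Hue, modeled on
# ksql.py. Assumes Hue's notebook connector interface; Flink calls are stubs.
from notebook.connectors.base import Api


class FlinkApi(Api):

  def execute(self, notebook, snippet):
    # Would run snippet['statement'] through a Flink TableEnvironment (or,
    # later, a SQL gateway) and map columns/rows into Hue's result format.
    raise NotImplementedError

  def autocomplete(self, snippet, database=None, table=None, column=None, nested=None):
    # Would back database/table/column listings with SHOW DATABASES / SHOW TABLES.
    raise NotImplementedError
```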

bowenli86 commented Dec 6, 2019

@romainr thanks for the quick response!

I think option 2 is doable now.

For option 1, some questions, since I'm not familiar with SqlAlchemy: does it require Flink to have a JDBC server (Flink doesn't have one yet), or do we just need to add a SqlAlchemy dialect to Flink? Flink supports its own Calcite dialect and a Hive dialect; could the Hive dialect part work now?

romainr commented Dec 7, 2019

Option 1: it requires a way to communicate from Python, be it via REST, Thrift... Today, if we want to send a query from a Python program, how would we do it? (JDBC is for Java, not Python.)

I am not 100% sure, but maybe Option 2 https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/common.html#create-a-tableenvironment could be used to back the SqlAlchemy connector until there is a proper REST/Thrift server for Flink SQL. In any case, Option 2 would be easy to try (I can add the skeleton if you want?).

When we have a connector (aka "a way to send queries"), we can help provide Flink-specific syntax highlighting and autocomplete: https://docs.gethue.com/developer/parsers/

As a bonus, Hue is also going to get proper support for continuously running queries.

@bowenli86

Got it. Option 1 probably isn't feasible at the moment since Flink doesn't have such a Thrift/REST server yet, though we are looking into it.

I agree option 2 is the way to go until a Flink Thrift server is available. It would be great if you could add the skeleton; I can help review and test!

romainr commented Dec 9, 2019

The skeleton for Option 2 is in review, as are the SQL autocomplete and highlighter skeletons.

e.g. for the connector: https://github.com/cloudera/hue/blob/testing-romain/desktop/libs/notebook/src/notebook/connectors/flink.py#L120

Note: maybe Option 2 can be ported to SqlAlchemy at some point too; what matters is having a proper RPC from Python to Flink. At least Option 2 is easy to experiment with.

The code should be in master tomorrow, and I can send a snippet of the Hue config showing how to activate the connector.

romainr commented Apr 6, 2020

Hey @bowenli86, was there any progress on the connector? Any update on when Flink SQL will have a REST API?

bowenli86 commented Apr 6, 2020

Hi @romainr, yes, we are building a Flink SQL gateway and it should be available in a month or so: https://github.com/ververica/flink-sql-gateway

@KurtYoung @godfreyhe can you follow up with Romain on this? I think it's valuable to 1) test the completeness of the SQL gateway against such a good use case and 2) integrate more deeply with the Cloudera stack, especially since Cloudera has launched a Flink service; we could even bring in Gyula or Marton as well.

romainr commented Apr 6, 2020

Interesting! And would you recommend a particular Docker setup with a ready-to-launch Flink and some tables (e.g. https://github.com/ververica/sql-training)? That would help do a quick scoping on the Hue side; I could then just point the gateway at it. I'm also not sure what FLINK_HOME should be when running Flink in Docker.

@bowenli86

Yes, I think https://github.com/ververica/sql-training is a good source; there are also official images on Docker Hub: https://hub.docker.com/_/flink

@KurtYoung

@romainr Could you describe your use case a little bit: will you run both batch and streaming queries, or just one of them? What kind of cluster will you use: standalone, YARN session, or YARN per-job? Will you perform DDL operations via the SQL gateway, or just DQL and DML?

romainr commented Apr 7, 2020

The idea would be to test DESCRIBE and simple SELECTs via the gateway, then test more types of statements. I don't have a preference for the type of cluster; the idea is to play around to scope the gateway API / SQL autocomplete work.

For example, I just did a POC for Phoenix https://github.com/cloudera/hue/blob/master/docs/designs/apache_phoenix.md and also did some tests with ksqlDB regular and stream queries. It was easy to poke around with a real system that comes out of the box via a container.

romainr commented Apr 10, 2020

What should FLINK_HOME be for the sql-gateway when using the SQL training docker-compose setup?

@godfreyhe

@romainr Currently there is no Flink SQL Gateway Docker image, so you need to deploy the SQL gateway independently on a gateway machine (or locally, just for testing). You should also download the Flink binary package to the same machine; FLINK_HOME is the path of that Flink binary location.

You should set gateway-address to the local IP address in <sql-gateway-dir>/conf/sql-gateway-defaults.yaml, because for a SELECT query the taskmanager (in Docker) sends result data to the SQL gateway (on the physical machine) through a socket. If you run all components locally (or on a single machine), that is the only thing you need to care about; I have tested this successfully on my machine.

If you deploy the SQL gateway and the Flink Docker cluster on different machines, you also need to set jobmanager.rpc.address to the real jobmanager address (the default is localhost) in <your-flink-binary-dir>/conf/flink-conf.yaml.
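
Concretely, a minimal sketch of the two settings just described (the addresses are placeholders; the server keys match the sql-gateway-defaults.yaml excerpt shown later in this thread):

```yaml
# <sql-gateway-dir>/conf/sql-gateway-defaults.yaml
server:
  # The address that the gateway binds itself.
  bind-address: 192.168.1.10    # placeholder: the gateway machine's local IP
  # The address that should be used by clients to connect to the gateway.
  address: 192.168.1.10

# <your-flink-binary-dir>/conf/flink-conf.yaml, only needed when the gateway
# and the Flink cluster run on different machines (the default is localhost):
jobmanager.rpc.address: jobmanager-host   # placeholder hostname
```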

romainr commented Apr 11, 2020

I see, the gateway just reuses the libs in FLINK_HOME. Thanks!

I think I got the basics working:

[screenshot]

romainr self-assigned this Apr 25, 2020
romainr commented Apr 25, 2020

Sorry again if this is a basic question, but I still could not point the SQL gateway at the SQL training setup properly:

POST http://127.0.0.1:8083/v1/sessions/d61a353d8cfa7e7bb5039744bc68511e/statements {"statement": "SHOW tables", "execution_timeout": ""}
returned in 5ms: 200 {"results":[{"columns":[{"name":"tables","type":"VARCHAR(1) NOT NULL"},{"name":"type","type":"VARCHAR(5) NOT NULL"}],"data":[]}],"statement_types":["SHOW_TABLES"]}

The SQL Gateway is running, but I don't see any tables, and it behaves the same even when the Flink SQL training docker is not running (I can SHOW DATABASES/FUNCTIONS etc., but there are no tables).

Everything is on the same machine.

flink-1.10.0:
I put the IP address of the Docker job manager as jobmanager.rpc.address in conf/flink-conf.yaml.

SQL gateway:
conf/sql-gateway-defaults.yaml has gateway-address: 127.0.0.1
./sql-gateway.sh output:
Reading configuration from: file:/home/romain/projects/flink-sql-gateway-0.1-SNAPSHOT/conf/sql-gateway-defaults.yaml
Rest endpoint started.

The SQL training is working and I can connect via the SQL client:
https://github.com/ververica/sql-training/wiki/Setting-up-the-Training-Environment
https://github.com/ververica/sql-training/blob/master/build-image/conf/sql-client-conf.yaml

sql-training$ docker ps
CONTAINER ID        IMAGE                                                COMMAND                  CREATED             STATUS              PORTS                                                NAMES
7777ea00b5d4        fhueske/flink-sql-training:1-FLINK-1.10-scala_2.11   "/docker-entrypoint.…"   5 seconds ago       Up 3 seconds        6123/tcp, 8081/tcp                                   sql-training_sql-client_1
e7d664901220        flink:1.10.0-scala_2.11                              "/docker-entrypoint.…"   6 seconds ago       Up 4 seconds        6121-6123/tcp, 8081/tcp                              sql-training_taskmanager_1
e5d36587a4c8        wurstmeister/kafka:2.12-2.2.1                        "start-kafka.sh"         8 seconds ago       Up 5 seconds        0.0.0.0:9092->9092/tcp                               sql-training_kafka_1
ee4c680f36e5        wurstmeister/zookeeper:3.4.6                         "/bin/sh -c '/usr/sb…"   10 seconds ago      Up 8 seconds        22/tcp, 2888/tcp, 3888/tcp, 0.0.0.0:2181->2181/tcp   sql-training_zookeeper_1
ed4e76577a51        flink:1.10.0-scala_2.11                              "/docker-entrypoint.…"   10 seconds ago      Up 6 seconds        6123/tcp, 0.0.0.0:8081->8081/tcp                     sql-training_jobmanager_1
e996f679fcba        mysql:8.0.19                                         "docker-entrypoint.s…"   10 seconds ago      Up 9 seconds        3306/tcp, 33060/tcp                                  sql-training_mysql_1

Do we also need to backport the configs of https://github.com/ververica/sql-training/blob/master/build-image/conf/sql-client-conf.yaml into the SQL Gateway? (That seems counter-intuitive for something exposed as a REST API.)

@godfreyhe

Hi @romainr, I think you should backport the tables section and functions section to sql-gateway-defaults.yaml, because the SQL gateway loads them at startup.
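
For reference, those sections follow the Flink 1.10 SQL client environment-file format; a rough, illustrative sketch of the shape (the table name, topic, and schema here are made up, so copy the real entries from sql-client-conf.yaml):

```yaml
tables:
  - name: Rides                 # illustrative entry only
    type: source-table
    update-mode: append
    connector:
      type: kafka
      version: universal
      topic: Rides
      properties:
        bootstrap.servers: kafka:9092
    format:
      type: json
      derive-schema: true
    schema:
      - name: rideId
        data-type: BIGINT
```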

romainr commented Apr 26, 2020

Thanks!

I am now running the gateway inside the sql-training docker client container (with the bind addresses in sql-gateway-defaults.yaml updated to the container IP), added the tables and functions, and got past the 'Could not create execution context' errors by adding the jars to the gateway:

./sql-gateway.sh -j /opt/flink/lib/flink-table_2.11-1.10.0.jar -j /opt/flink/lib/flink-table-blink_2.11-1.10.0.jar /opt/flink/lib/flink-dist_2.11-1.10.0.jar

Maybe I am not opening the session properly, or pointing at another context somehow, but I still don't see any of the pre-defined tables.

POST http://172.20.0.7:8083/v1/sessions {"session_name": "test", "planner": "blink", "execution_type": "streaming", "properties": {}}
returned in 20ms: 200 {"session_id":"42cb096147dc270f342afc9536641897"}

At least a basic 'SELECT 1, 2, 3' works, so after getting the above working it should be demo ready!

[screenshot]
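
For reference, the exchange above boils down to two REST calls; a minimal sketch with Python requests, reusing the host and payloads from the traces in this thread:

```python
import requests

BASE = "http://172.20.0.7:8083/v1"  # the gateway address from the trace above

# 1. Open a session (same payload as in the trace).
session = requests.post(BASE + "/sessions", json={
    "session_name": "test",
    "planner": "blink",
    "execution_type": "streaming",
    "properties": {},
}).json()

# 2. Run a statement in that session.
result = requests.post(BASE + "/sessions/%s/statements" % session["session_id"], json={
    "statement": "SHOW TABLES",
    "execution_timeout": "",
}).json()

print(result["results"][0]["data"])  # [] here, since no tables are registered yet
```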

@godfreyhe

Hi @romainr, if you have defined FLINK_HOME, the -j /opt/flink/lib/flink-table_2.11-1.10.0.jar -j /opt/flink/lib/flink-table-blink_2.11-1.10.0.jar /opt/flink/lib/flink-dist_2.11-1.10.0.jar arguments are not needed when executing ./sql-gateway.sh. Could you provide your sql-gateway-defaults.yaml file? I can try it locally.

romainr commented Apr 27, 2020

Thanks!

I put a write-up and the file here: https://github.com/romainr/flink-sql-gateway/tree/master/docs/demo

It is done in the sql-training docker so that it is easy to repro. Basically, as soon as I try to add the demo tables/UDFs, the curl to create a session fails.

Let me know if it is not clear enough.

romainr commented Apr 28, 2020

After the above and testing on a live Flink, the initial connector will come in https://issues.cloudera.org/browse/HUE-9280

romainr commented May 2, 2020

./sql-gateway.sh -j /opt/flink/lib/flink-table_2.11-1.10.0.jar -j /opt/flink/lib/flink-table-blink_2.11-1.10.0.jar /opt/flink/lib/flink-dist_2.11-1.10.0.jar

Never mind, I forgot the last -j, so the jars don't fix the issue:

Could not find a suitable table factory for 'org.apache.flink.table.factories.TableSourceFactory' in
the classpath.

Reason: Required context properties mismatch.

The matching candidates:
org.apache.flink.table.sources.CsvAppendTableSourceFactory
Mismatched properties:
'connector.type' expects 'filesystem', but is 'kafka'
'format.type' expects 'csv', but is 'json'

Maybe it is a 2.11 compatibility issue? Would you have a build of https://github.com/ververica/flink-sql-gateway/tree/flink_1.11_SNAPSHOT?
(For some reason, mvn package fails for me on this branch.)

romainr commented May 2, 2020

The correct jars to add are actually the ones from /opt/sql-client/lib.

romainr commented May 7, 2020

First version of the connector: https://gethue.com/blog/sql-editor-for-apache-flink-sql/

[screenshot: flink_editor_v1]

There is still more to do before it is production ready, but this is a good start for demos/POCs. Follow-up tasks will be tracked in their own JIRAs.

romainr closed this as completed May 7, 2020
amitshahiplooto commented Jan 8, 2021

I'm trying to use the Hue editor with Flink SQL installed on Kubernetes. I have Flink running on POD1 and Hue running on POD2 in the same K8s cluster.

What should my settings be in:

sql-gateway-defaults.yaml

server:
  # The address that the gateway binds itself.
  bind-address: <what_ip_address_should_this_be>
  # The address that should be used by clients to connect to the gateway.
  address: <what_ip_address_should_this_be>

hue-conf

[[[flink]]]
name=Flink
interface=flink
options='{"url": "http://<what_ip_address_should_this_be>:8083"}'

Is <what_ip_address_should_this_be> the cluster IP of the Job Manager?

How do I use Hue on Kubernetes with Flink?

romainr commented Jan 8, 2021

You could boot the container from https://github.com/romainr/query-demo/tree/master/stream-sql-demo to check, but those should all be the same: the hostname of the SQL Gateway API.
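
In other words (a sketch under assumptions): from a pod inside the cluster, the gateway should be reachable through its Kubernetes Service hostname, and that hostname is what goes in both the gateway address settings and the Hue options url. A quick probe, where flink-sql-gateway is a placeholder Service name:

```python
import requests

# Placeholder Service name: use <service>.<namespace>.svc.cluster.local for
# whatever Service exposes the SQL gateway pod on port 8083.
resp = requests.get("http://flink-sql-gateway:8083/v1/info")
print(resp.status_code, resp.text)  # a response here means Hue can use this hostname
```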

amitshahiplooto commented Jan 8, 2021

@romainr I'm not sure how to go about that. How do I do that in Kubernetes? This is the error I see when connecting to Flink with the default sql-gateway settings:

[screenshot of the error]
