Question about distributed queries #39827
https://clickhouse.com/docs/en/sql-reference/table-functions/hdfs/
Thanks; you might have misunderstood my question. Presto allows several file formats in its Hive connector (ORC, Parquet, Avro, ...). Suppose someone has already created, for concreteness, a Parquet table on Presto, and has also created some ClickHouse tables and populated them with data. How would they write a query that combines a ClickHouse table 'ch' with a Presto table 'pr'? Thanks,
No, I have not. Read https://clickhouse.com/docs/en/engines/table-engines/integrations/hdfs
Let me explain my scenario. I have a Presto table whose underlying storage is OBS (Huawei's object store). I access OBS through the Hive connector, so although the Hive connector is involved, HDFS is not. Recall that my original question did not mention HDFS. It said:
I tried the method that you gave a pointer to (thanks!) but it insists on hdfs:// in the URL. Maybe I am missing something? When writing distributed queries, storage issues usually do not (or ideally, should not) come up. A multi-part name should capture the essence of a remote table: for example, "server.database.schema.table" or, because people use inconsistently named concepts, "server.catalog.schema.table". I believe this is not possible in ClickHouse. I think I can still use the S3 table engine, though.
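A minimal sketch of the S3 route mentioned above, assuming the OBS bucket is reachable through an S3-compatible endpoint and that the files behind the Presto table 'pr' are Parquet. The endpoint, bucket path, credentials, column list, and join key below are all hypothetical placeholders, not values from this thread. Note that this reads the underlying files directly with ClickHouse's s3 table function; it does not go through Presto.

```sql
-- Join a local ClickHouse table 'ch' with the Parquet files that back
-- the Presto table 'pr'. Everything in angle brackets, the column list,
-- and the join key 'id' are placeholders (assumptions).
SELECT
    ch.id,
    ch.value,
    pr.attr
FROM ch
INNER JOIN s3(
    'https://<obs-endpoint>/<bucket>/<path-to-pr>/*.parquet',
    '<access_key>',
    '<secret_key>',
    'Parquet',
    'id UInt64, attr String'
) AS pr ON ch.id = pr.id;
```

The structure argument has to match the Parquet schema that Presto wrote, so this only works if the file layout is known on the ClickHouse side.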
I have a couple of questions, but because they are different from each other, I will put them in
separate issues so that they are easier to track in the future.
The first question is about writing distributed queries. In ClickHouse, can I write a distributed
query in which some of the tables reside in an external DBMS, say Presto or Trino?
Checking the documentation, I see the Distributed table engine:
https://clickhouse.com/docs/en/engines/table-engines/special/distributed
However, this does not seem to fit the bill, unless someone can provide concrete examples.
Can anybody suggest a way to go about doing this? What I have in mind is something like this (sketch):
SELECT ....
FROM clickhouse_table1, presto_table2, presto_table3
WHERE ...
Thanks,
Arun