Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No data about SELECT queries in Superset in ClickHouse table system.processes #28007

Open
3 tasks done
berlicon opened this issue Apr 12, 2024 · 6 comments
Open
3 tasks done
Labels
data:connect:clickhouse Related to Clickhouse

Comments

@berlicon
Copy link

berlicon commented Apr 12, 2024

Bug description

In ClickHouse there is a table system.processes which contains info about all queries running right now. And we can kill any too long query.
See:
https://clickhouse.com/docs/en/operations/system-tables/processes
https://clickhouse.com/docs/en/sql-reference/statements/kill#kill-query
https://clickhouse.com/docs/en/operations/system-tables/query_log

This table works great. I can run any long (>5sec) query any type (select, insert, update, delete, alter) then I can see that query in system.processes and kill that query by query_id. I checked this functionality in DBeaver. Also I tested that in Superset / SQLLab. All queries running in SQLLab I can see in system.processes except of SELECT queries. I want to see all queries from Superset, not only ins/upd/del but also SELECT queries.

How to reproduce the bug

  1. Connect ClickHouse DB to Superset.
  2. Open DBeaver and on current ClickHouse DB create table with a lot of data. Execute SQL code below:
    create table eso.t2(v String) ENGINE = MergeTree() order by v
    insert into eso.t2(v) values(generateUUIDv4())
    insert into eso.t2(v) SELECT v from eso.t2 -- repeat that row 20+ times to have 1m+ rows in table eso.t2
  3. Open Superset/SQLLab, select UI LIMIT: 1m+ and run long query:
    SELECT v, generateUUIDv4() as uid from eso.t2 limit 500000
  4. In DBeaver on current ClickHouse DB run query:
    SELECT query, * FROM system.processes where query not like '%processes%'
  5. There are no rows returned by query. This is the BUG. Exprected result: returned one row from system.processes with query from point 3.

Success case: I run in Superset query: "insert into eso.t2(v) SELECT v from eso.t2" and can see that query in system.processes.

Screenshots/recordings

No response

Superset version

master / latest-dev

Python version

3.9

Node version

16

Browser

Chrome

Additional context

No response

Checklist

  • I have searched Superset docs and Slack and didn't find a solution to my problem.
  • I have searched the GitHub issue tracker and didn't find a similar bug report.
  • I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.
@rusackas rusackas added the data:connect:clickhouse Related to Clickhouse label Apr 12, 2024
@berlicon
Copy link
Author

I received workaround for this issue from @den-crane - ClickHouse/ClickHouse#62604 (comment)

TLDR: add sleep(1) to SELECT query. So, instead this "SELECT * FROM t" I should write this: "SELECT sleep(1),* FROM t"

@betodealmeida
Copy link
Member

I don't think this is something we can fix in Superset, it's not doing anything special with Select queries.

@berlicon
Copy link
Author

OK, I fixed it with workaround:

  1. in superset/sql_lab.py add this:
    import re -- row №25

sql = add_sleep_to_selects_in_sql(sql) -- row №264 in execute_sql_statement before 'try: query.executed_sql = sql'

def add_sleep_to_selects_in_sql(
    sql: str
) -> str:
    if (    (re.search(r"\bINSERT\b", sql, flags=re.IGNORECASE) == None) 
        and (re.search(r"\bUPDATE\b", sql, flags=re.IGNORECASE) == None) 
        and (re.search(r"\bDELETE\b", sql, flags=re.IGNORECASE) == None)):
        return re.sub(r"\bSELECT\b", "SELECT sleep(1), ", sql, flags=re.IGNORECASE)

    return sql
-- row №324
  1. in superset/result_set.py add this:
            if "sleep(1)" in column_names:
                column_names.remove("sleep(1)")
                for i, s in enumerate(data):
                    data[i] = s[1:]

-- row №123 in class SupersetResultSet before comment '# fix cursor descriptor with the deduped names'

@rusackas
Copy link
Member

If we have a workaround and can't fix this in Superset, we should probably close it. Before I/we do, would anyone want to open a Documentation PR (or at least suggestion) on this page to help others?

@berlicon
Copy link
Author

berlicon commented Apr 18, 2024

@rusackas - I think ticket could be closed. The problem will not be fixed on Superset (#28007 (comment)) neither on ClickHouse (ClickHouse/ClickHouse#62604 (comment)) side. Fortunately we have a workaround.

About Documentation PR... Is this suggestion to me? I think this info is useless on this page. If anyone will have this problem he/she will googling and find this post.

@rusackas
Copy link
Member

Right @berlicon - I think you (or anyone who understands this issue) would be helping others by making a change on that page, if it's "useless". The more we document rather than have people go fishing through issues, the better.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
data:connect:clickhouse Related to Clickhouse
Projects
None yet
Development

No branches or pull requests

3 participants