Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(sql lab): display presto and trino tracking url #20799

Merged
merged 3 commits into from
Jul 27, 2022

Conversation

ktmud
Copy link
Member

@ktmud ktmud commented Jul 21, 2022

SUMMARY

Enable tracking URL for Presto and Trino (previously it is only available for Hive queries) and also change TRACKING_URL_TRANSFORMER to run at runtime (as opposed to when queries are fetched) so that we can display different tracking URL based on query age (most tracking URLs will expire after a certain amount of time---in Trino/Preso, this is configured by query.min-expire-age.

Had to update query execution logics to add end time for failed executions, too.

While debugging, I also realized the TIMED_OUT status isn't actually in use---and now the frontend also cannot handle that status. We should probably add it back, but let's save it for another PR.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

When queries are still running:

image

We will now display the tracking url even for failed jobs, this is useful for users to debug their failed queries:

Xnip2022-07-20_14-46-55

TESTING INSTRUCTIONS

  1. Connect to a Presto or Trino dataset and run some long-running or erroneous queries
  2. Optionally, update Superset config to test customize TRACKING_URL_TRANSFORMER. For example, following setting will only display tracking urls for queries finished within the last 5 minutes:
     def TRACKING_URL_TRANSFORMER(url: str, query: "Query") -> Optional[str]:
         if not query.end_time or (
             now_as_float() - float(query.end_time)
             < timedelta(minutes=5).total_seconds() * 1000
         ):
             return re.sub(
                 r"^https?://([^/]+)/", "https://trino.datalake.example.org/", url
             )
         return None

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@codecov
Copy link

codecov bot commented Jul 21, 2022

Codecov Report

Merging #20799 (ee37fbc) into master (1b577d1) will increase coverage by 0.01%.
The diff coverage is 55.88%.

@@            Coverage Diff             @@
##           master   #20799      +/-   ##
==========================================
+ Coverage   66.23%   66.24%   +0.01%     
==========================================
  Files        1757     1757              
  Lines       66978    67031      +53     
  Branches     7117     7118       +1     
==========================================
+ Hits        44360    44408      +48     
- Misses      20801    20806       +5     
  Partials     1817     1817              
Flag Coverage Δ
hive 53.22% <40.62%> (-0.03%) ⬇️
javascript 51.91% <0.00%> (-0.01%) ⬇️
mysql 81.00% <39.06%> (-0.08%) ⬇️
postgres 81.07% <39.06%> (-0.08%) ⬇️
presto 53.12% <57.81%> (+<0.01%) ⬆️
python 81.55% <59.37%> (+0.01%) ⬆️
sqlite 79.68% <37.50%> (?)
unit 50.23% <32.81%> (-0.04%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...frontend/src/SqlLab/components/ResultSet/index.tsx 51.38% <0.00%> (-0.36%) ⬇️
superset-frontend/src/components/Button/index.tsx 100.00% <ø> (ø)
superset/sqllab/exceptions.py 67.39% <0.00%> (ø)
superset/utils/dates.py 100.00% <ø> (ø)
superset/views/core.py 75.58% <0.00%> (+0.54%) ⬆️
superset/db_engine_specs/trino.py 73.40% <15.00%> (-15.93%) ⬇️
superset/db_engine_specs/presto.py 87.97% <58.82%> (-1.06%) ⬇️
superset/config.py 91.47% <100.00%> (ø)
superset/db_engine_specs/hive.py 86.87% <100.00%> (+0.99%) ⬆️
superset/models/sql_lab.py 78.90% <100.00%> (+1.52%) ⬆️
... and 8 more

Help us with your feedback. Take ten seconds to tell us how you rate us.

@ktmud ktmud force-pushed the presto-tracking-url branch 2 times, most recently from 6068fee to 6f74d4a Compare July 21, 2022 01:15
UPDATING.md Outdated Show resolved Hide resolved
superset/config.py Show resolved Hide resolved
superset/db_engine_specs/presto.py Outdated Show resolved Hide resolved
superset/db_engine_specs/presto.py Outdated Show resolved Hide resolved
superset/db_engine_specs/presto.py Outdated Show resolved Hide resolved
superset/db_engine_specs/presto.py Show resolved Hide resolved
superset/db_engine_specs/trino.py Outdated Show resolved Hide resolved
superset/db_engine_specs/trino.py Outdated Show resolved Hide resolved
@@ -279,6 +284,27 @@ def default_endpoint(self) -> str:
def get_extra_cache_keys(query_obj: Dict[str, Any]) -> List[str]:
return []

@property
def tracking_url(self) -> Optional[str]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this just be a method called get_tracking_url and then you wouldn't need to rename the column virtually or have a getter/setter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want the FAB-based API to always return the transformed url but I couldn't find a clean way to do it without adding an (unnecessary) new field.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dpgaspar @betodealmeida do you have any recommendation on what's the best practice for such use case?

superset/sql_lab.py Show resolved Hide resolved
def handle_cursor(cls, cursor: Any, query: Query, session: Session) -> None:
def get_tracking_url(cls, cursor: "Cursor") -> Optional[str]:
try:
return cursor.info_uri
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

info_uri is only available in the latest version of the Trino client, which we haven't upgrade to yet.

superset/db_engine_specs/presto.py Outdated Show resolved Hide resolved
@@ -279,6 +284,27 @@ def default_endpoint(self) -> str:
def get_extra_cache_keys(query_obj: Dict[str, Any]) -> List[str]:
return []

@property
def tracking_url(self) -> Optional[str]:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want the FAB-based API to always return the transformed url but I couldn't find a clean way to do it without adding an (unnecessary) new field.

superset/sql_lab.py Show resolved Hide resolved
@ktmud ktmud force-pushed the presto-tracking-url branch 3 times, most recently from afe9f3e to 07f3687 Compare July 22, 2022 20:38
@@ -25,7 +25,7 @@
MSG_FORMAT = "Failed to execute {}"

if TYPE_CHECKING:
from superset.utils.sqllab_execution_context import SqlJsonExecutionContext
from superset.sqllab.sqllab_execution_context import SqlJsonExecutionContext
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bycatch: wrong import

@ktmud
Copy link
Member Author

ktmud commented Jul 22, 2022

@john-bodley I addressed your comments and added some integration tests. Can you take another look?

# pylint: disable=protected-access, line-too-long
return f"{cursor._protocol}://{cursor._host}:{cursor._port}/ui/query.html?{cursor.last_query_id}"
except AttributeError:
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pass
return None

Copy link
Member Author

@ktmud ktmud Jul 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Final return is expected to be explicit if you have a return statement earlier. I think there is a pylint or mypy warning for this.

return f"{cursor._protocol}://{cursor._host}:{cursor._port}/ui/query.html?{cursor.last_query_id}"
except AttributeError:
pass
return None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return None

# pylint: disable=protected-access, line-too-long
return f"{conn.http_scheme}://{conn.host}:{conn.port}/ui/query.html?{cursor._query.query_id}"
except AttributeError:
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See previous comment.

if tracking_url:
job_id = tracking_url.split("/")[-2]
logger.info(
"Query %s: Found the tracking url: %s",
str(query_id),
tracking_url,
)
transformer = current_app.config["TRACKING_URL_TRANSFORMER"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ktmud per you PR description this logic has changed. Can we confirmed that the previous behavior is preserved? I just want to confirm this isn't a breaking change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ktmud ktmud requested a review from john-bodley July 26, 2022 17:43
).start_time
# Test search queries on time filter
from_time = "from={}".format(int(first_query_time))
to_time = "to={}".format(int(second_query_time))
Copy link
Member Author

@ktmud ktmud Jul 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bycatch: this test is flaky. [QUERY_1, QUERY_2, QUERY_3] are executed consequently, with supposedly increasing start_time.

Since the original code int(...) basically rounded down the float epoch, the actual query results returned [QUERY_1, QUERY_2]---even though we filtered second_query_time by QUERY_3. But this still sometimes fail because the rounded down time can be anywhere between QUERY_1 start time and QUERY_3 start time, so if QUERY_1 and QUERY_2 are executed within the same second, we wouldn't have QUERY_2 in the filtered results. E.g. when the start time for the queries are 1.1, 1.5, 2---we'd be filtered to between [1, 1].

resp = self.get_resp("/superset/search_queries?" + "&".join(params))
data = json.loads(resp)
self.assertEqual(2, len(data))
# pylint: disable=line-too-long
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add this at the end of line #420, otherwise I think it disables the error for the entire block.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reformatted it so pylint disable isn't necessary.

@ktmud ktmud merged commit 77db065 into apache:master Jul 27, 2022
@john-bodley john-bodley deleted the presto-tracking-url branch February 17, 2023 22:16
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 2.1.0 labels Mar 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/L 🚢 2.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants