Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request tag(per-transaction vs. per-query?) #132

Open
apstndb opened this issue Jun 3, 2022 · 25 comments · May be fixed by #142
Open

Request tag(per-transaction vs. per-query?) #132

apstndb opened this issue Jun 3, 2022 · 25 comments · May be fixed by #142

Comments

@apstndb
Copy link
Collaborator

apstndb commented Jun 3, 2022

Currently, spanner-cli supports per-transaction tag and it is used as the request tag of all queries in the transaction.
#128

Users can be confused by the behavior that multiple queries with the same request tag string are grouped into a single row.

https://cloud.google.com/spanner/docs/introspection/troubleshooting-with-tags?hl=en#things_to_note

When you assign a REQUEST_TAG, statistics for multiple queries that have the same tag string are grouped in a single row in query statistics table. Only the text of one of those queries is shown in the TEXT field.

Possible actions:

a. Add a note to multiple queries with the same tag in the "Transaction Tags and Request Tags" section of the README
b. Add per-request request tag syntax with/without the explicit transaction

Related discussion: #129 (comment) (We are also not familiar with the behavior)

@takabow
Copy link
Contributor

takabow commented Jun 7, 2022

I confirmed that using the same request tag for different SQLs in a single transaction results in all the different SQLs being grouped together on the query stats and causing only a single SQL to be listed.

Let me confirm internally whether this behavior is the expected behavior.

@apstndb
Copy link
Collaborator Author

apstndb commented Jun 7, 2022

I think this behavior is usable when users want to group some logically-equivalent dynamic queries but it surprises users who want to group different queries by the tag loosely.

@takabow
Copy link
Contributor

takabow commented Nov 8, 2022

I apologize for not replying for so long. I have confirmed with the development team that grouping by a request tag is the intended behavior. It was also clarified in the documentation of query statistics.

https://cloud.google.com/spanner/docs/introspection/query-statistics#table_schema

Statistics for multiple queries that have the same tag string are grouped in a single row with the REQUEST_TAG matching that tag string.

The hash of the REQUEST_TAG value if present; Otherwise, the hash of the TEXT value.

@takabow
Copy link
Contributor

takabow commented Nov 8, 2022

I can't come up with a way to handle request tags in spanner-cli, but I have two simple ideas right now.

  1. stop supporting request tags in spanner-cli
  2. request tag = transaction tag + sequential number

Any ideas?

@yfuruyama
Copy link
Collaborator

Considering that users can't put a request tag on individual statement with spanner-cli, I think 2. request tag = transaction tag + sequential number might be a reasonable choice for now.

@apstndb
Copy link
Collaborator Author

apstndb commented Nov 9, 2022

I think 1. stop supporting request tags in spanner-cli is not bad because it won't surprise users
but if users want to filter spanner-cli-originated queries, spanner-cli should set request tags.

IMO, 2. request tag = transaction tag + sequential number is non-obvious behavior so it will need some documentation.

@takabow
Copy link
Contributor

takabow commented Nov 9, 2022

I just came up with an idea, how about request_tag = transaction_tag + SQL ?
The following image is a result of query stats from Query Insights.
In this example, I used tag1 as transaction_tag, and the tag added to request_tag as prefix.
image

if s.InReadWriteTransaction() {
    opts.RequestTag = s.tc.tag + ": " + stmt.SQL
    iter := s.tc.rwTxn.QueryWithOptions(ctx, stmt, opts)

@yfuruyama
Copy link
Collaborator

Request tag has maximum limit of the character length, that is 50 for now, so it's likely that moderate number of SQL queries will be truncated. That could confuse users if they're not aware of that limitation.

@yfuruyama
Copy link
Collaborator

IMO, 2. request tag = transaction tag + sequential number is non-obvious behavior so it will need some documentation.

Yeah this is not obvious, but my opinion is it's better than providing nothing. If we stop populating request tags, users can't correlate the transaction and the queries inside the transaction.

@takabow
Copy link
Contributor

takabow commented Nov 9, 2022

Thanks for pointing this out! As you mentioned, tag with a long sql text were truncated.
image

I am using spanner-cli to examine transactions and would like to correlate each query listed in the query stats with which transaction it was issued from. Unfortunately, the current query insights only shows the SQL text or the request tag. So, it would be useful to embed the transaction tag and each SQL text to request tags, even if only partially.

@yfuruyama
Copy link
Collaborator

Hmm that's a tough situation... but I thought that it might be useful to have a partial SQL text in a request tag rather than putting an unclear suffix like tag-1 or tag-2.

+1 for adding SQL text.

@takabow
Copy link
Contributor

takabow commented Nov 9, 2022

request tag = transaction tag + SQL text (partial)

@apstndb Would you give me your thoughts on this?

@apstndb
Copy link
Collaborator Author

apstndb commented Nov 9, 2022

Unfortunately, the current query insights only shows the SQL text or the request tag.

Because of this, I have always thought that Query Insights is not a good tool for examining the details of a query, and I was more concerned with how it is represented in Query Stats. I am hopeful that Query Insights will be fixed.

My personal experience is that the important part of the query is often not in the first 50 characters, but in the WHERE clause, GROUP BY clause, etc.
Therefore, if the second half of the query is truncated, there is a concern that completely different queries will be grouped together.

@apstndb
Copy link
Collaborator Author

apstndb commented Nov 9, 2022

It is possible to implement an option to switch tag naming.

@takabow
Copy link
Contributor

takabow commented Nov 9, 2022

Therefore, if the second half of the query is truncated, there is a concern that completely different queries will be grouped together.

Right, this is an important point of view.

It is possible to implement an option to switch tag naming.

It's a good idea.

For example, by default, auto tagging of request tags is disabled, and if the flag is enabled,
then reqeust_tag = transaction_tag + sequence + SQL_text

@takabow
Copy link
Contributor

takabow commented Nov 9, 2022

I'm implementing reqeust_tag = transaction_tag + sequence + SQL_text as follows.
image

Is there a delimiter/format you would recommend? tx3-1:, tx3/1:, tx3,1:, tx3-0001, [tx3,1], etc...

@takabow
Copy link
Contributor

takabow commented Nov 9, 2022

It is possible to implement an option to switch tag naming.

Currently, spanner-cli uses BEGIN RW/RO TAG <tag_name> for both transaction tags and request tags.

It is natural to use the short keyword TAG for simplicity. And since Read Only Transactions doesn't support transaction tags, BEGIN RO TAG <tag_name> is used only for request tags.

For Read Write Transactions, it may be natural to use BEGIN RW TAG <tag_name> for both transaction tags and request tags.

So, how about using the keywords TTAG or RTAG as an alternative mode to BEGIN RW/RO TAG <tag_name> to enable only one of the transaction tags or the request tags?

For example, BEGIN RW TTAG <tag_name> only enables transaction tags.

@apstndb
Copy link
Collaborator Author

apstndb commented Nov 9, 2022

Is there a delimiter/format you would recommend? tx3-1:, tx3/1:, tx3,1:, tx3-0001, [tx3,1], etc...

I am thinking about OpenTelemetery receiver for Cloud Spanner.
It collects and converts query stats into other monitoring systems and request_tag is represented as a label value.

  • Cloud Monitoring and Prometheus can use any unicode characters as label values.
  • Datadog only permit [a-zA-Z0-9_:./-] in tag values and other special characters are converted to underscores.
    • For example, tx3,1: and [tx3,1] will be converted to tx3_1_, _tx3_1_.
  • Other monitoring system may have different restriction.
  • Metrics query languages may need some efforts to use special characters.

I don't know limitations of other systems but I think simple format like tx3_1_ is better if we don't implement more flexible feature like customizable tag templates.

@apstndb
Copy link
Collaborator Author

apstndb commented Nov 9, 2022

In Datadog, we will see tag entries like request_tag:tx_3_1_select_1_from_table.

@takabow
Copy link
Contributor

takabow commented Nov 9, 2022

@apstndb Thank you for your suggestions!
I will add mode switching later, but for now, I would like to implement a change in the naming rule for request tags.

@yfuruyama
Copy link
Collaborator

Sorry for the late reply. I realized that request_tag is allowed to have only ASCII characters.

  // A per-request tag which can be applied to queries or reads, used for
  // statistics collection.
  // Both request_tag and transaction_tag can be specified for a read or query
  // that belongs to a transaction.
  // This field is ignored for requests where it's not applicable (e.g.
  // CommitRequest).
  // Legal characters for `request_tag` values are all printable characters
  // (ASCII 32 - 126) and the length of a request_tag is limited to 50
  // characters. Values that exceed this limit are truncated.
  // Any leading underscore (_) characters will be removed from the string.
  string request_tag = 2;

https://github.com/googleapis/googleapis/blob/150b4079b6441ea8b3d7f9a71d0be7bbacbb4e3a/google/spanner/v1/spanner.proto#L466-L476

SQL text could have non-ASCII characters like SELECT * FROM Users WHERE name = "テスト", so maybe it's not appropriate to have SQL text in the request_tag...?

Strange thing is that currently it looks like we can put multi-byte characters in request_tag.

For example, when I run the following statements with #142, it doesn't throw any error.

spanner> BEGIN TAG my-tag;
Query OK, 0 rows affected (0.16 sec)

spanner(rw txn)> SELECT "テスト";
+--------+
|        |
+--------+
| テスト |
+--------+
1 rows in set (1.01 msecs)

spanner(rw txn)> commit;
Query OK, 0 rows affected (0.20 sec)

spanner> select REQUEST_TAG, TEXT from SPANNER_SYS.QUERY_STATS_TOP_MINUTE where REQUEST_TAG LIKE 'my-tag%';
+--------------------------+-----------------+
| REQUEST_TAG              | TEXT            |
+--------------------------+-----------------+
| my-tag_1_SELECT "テスト" | SELECT "テスト" |
+--------------------------+-----------------+
1 rows in set (11.44 msecs)

@takabow
Copy link
Contributor

takabow commented Nov 10, 2022

Yes, I've also found the behavior. Currently, it seems that there are no problems but I'm not sure.

@takabow
Copy link
Contributor

takabow commented Nov 10, 2022

Another idea would be to add a statement type instead of SQL text?
For example, my-tag_1_SELECT, my-tag_2_INSERT, my-tag_3_DELETE, etc

@takabow
Copy link
Contributor

takabow commented Feb 27, 2023

@apstndb @yfuruyama
I'd like to move this discussion forward.

Currently, the same request tags aggregate several queries and DMLs into one record in query stats. This is an intended behavior in Cloud Spanner, but it may not be helpful for spanner-cli users. We talked about various ideas above, how about simply number them sequentially such as my-tag_1, my-tag_2, my-tag_3. If there are no objections, I would like to file a PR with the fix.

@yfuruyama
Copy link
Collaborator

Thanks! I don't have any objections, but I just want to know how it will be changed.

So if user puts BEGIN RW TAG my-tag;, then

  • Transaction tag: my-tag.
  • Request tag: my-tag_1, my-tag_2, ... etc.

And if user puts BEGIN RO TAG my-tag, then

  • Transaction tag: N/A (Read-only transaction doesn't support transaction tags)
  • Request tag: my-tag_1, my-tag_2, ... etc.

Is my understanding correct?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants