Skip to content

make normalized non-utf-8 labels unique#24

Closed
tivvit wants to merge 1 commit into
aerospike:masterfrom
tivvit:unique-non-utf-8-labels
Closed

make normalized non-utf-8 labels unique#24
tivvit wants to merge 1 commit into
aerospike:masterfrom
tivvit:unique-non-utf-8-labels

Conversation

@tivvit

@tivvit tivvit commented Jun 12, 2020

Copy link
Copy Markdown
Contributor

Normalized label names may not be unique which causes problems. I have added original string hex to make them unique and consistent. Connected to alicebob/asprom#44

@tivvit

tivvit commented Jun 18, 2020

Copy link
Copy Markdown
Contributor Author

Hi @khaf what do you think about this change?

@khaf

khaf commented Jun 18, 2020

Copy link
Copy Markdown
Contributor

I'm not sure about this. Fixing an invalid UTF8 name is one thing, changing it via appending another string at the end is another. The original fix was already for a very niche situation, what does this fix address? Do you have examples of the problem to demonstrate the issue?

@tivvit

tivvit commented Jun 18, 2020

Copy link
Copy Markdown
Contributor Author

I agree that it is a very strong edge-case. Sadly we hit that.

Yes, I do have an example. The problem we encountered is that after originally proposed utf-8 normalization is that multiple set labels were normalized to the set name. This fix overcomes this problem.

Example log from the exporter:

err: error gathering metrics: 4 error(s) occurred:
* collected metric "aerospike_set_memory_data_bytes" { label:<name:"namespace" value:"nc" > label:<name:"set" value:"\357\277\275 65533655336553365533\177" > gauge:<value:0 > } was collected before with the same name and label values

@khaf

khaf commented Jun 18, 2020

Copy link
Copy Markdown
Contributor

Could you kindly share those set names? How did invalid utf8 strings end up as your set names? What part of Aerospike toolchain was responsible for allowing it?

@tivvit

tivvit commented Jun 18, 2020

Copy link
Copy Markdown
Contributor Author

It was possible to create sets with binary data via C client.

@tivvit

tivvit commented Jun 18, 2020

Copy link
Copy Markdown
Contributor Author

It is possible for me to get the set names but it is probably simpler to imagine the situation

  • \xC0
  • \xC1
    both set names are "valid" for AS but they are invalid utf-8, therefore they are both translated to � which causes the previously mentioned error

@khaf

khaf commented Jun 18, 2020

Copy link
Copy Markdown
Contributor

Can't we do this differently? As in show the invalid characters inline and escaped? Something Like:

[]byte{set\xc0} -> "set\xc0"

@tivvit

tivvit commented Jun 18, 2020

Copy link
Copy Markdown
Contributor Author

I display the escaped value and hex of the original data (both only if it is invalid utf-8).

It may also be only the hex - I think it is sufficient

Your proposal is also nice - how would you encode it to make sure that it won't be interpreted by the prometheus?

@tivvit

tivvit commented Jul 15, 2020

Copy link
Copy Markdown
Contributor Author

@khaf what do you think about this. I know it is not a big issue. I just think it is a better solution than the first one.

@khaf

khaf commented Jul 16, 2020

Copy link
Copy Markdown
Contributor

Prometheus is written in Go, so I expect it to be fully utf8 compatible.
I'm still not clear as to why you need to add the hex string to the end. Escaping the UTF8 code point should be enough, otherwise it would be treated as 2 different sets on the server.
In this case, I don't think transforming the original value by adding a suffix is a good idea. Historically this problem has been taken care of by escaping. If that is not sufficient, I need to understand why.
We treat all issues as important, and I will dedicate bandwidth to this issue as any other, but I need to ensure the problem is well understood and the solution sufficiently engineered.

@arrowplum

arrowplum commented Jul 17, 2020

Copy link
Copy Markdown
Contributor

Regardless of any of this set names have restrictions (some implicit unfortunately). The next version of the server will have a much more strict set of rules for naming and will enforce them. It will be something you can turn off if you want to live dangerous but by default only namespace names, set names, and bin names that conform to a simplified naming scheme will be supported.

@spkesan

spkesan commented Jul 23, 2020

Copy link
Copy Markdown
Contributor

Hi @tivvit

Having a set name with binary data is not something that we recommend (although you were able to create them). Aerospike tools and monitoring could break with such invalid set names.
As @arrowplum mentioned, the future aerospike server versions will have strict rules for naming namespaces, sets, bins etc.

Also, I think this change wouldn't actually fix your problem metric ... was collected before with the same name and label values as the sanitizeUTF8() is applied over the complete info response and not per individual metric or label in the info response string. So appending hex encoded version of the input will not help in this case. Did you also test this change?

The current UTF-8 normalization would still be useful to protect against accidental non-UTF-8 values appearing in the info response (which is again very less probable).

Let me know what you think. I would suggest we can close this PR.

@tivvit

tivvit commented Jul 24, 2020

Copy link
Copy Markdown
Contributor Author

@spkesan I am actually using alicebob/asprom#44. I just wanted to handle this very special case in this "official exporter" too.

The code is tested and it works as expected for me.

I am ok with closing this PR. If the rules for sets will be more strict this won't be an issue anymore. I wanted to inform the community about this possible problem and I think that handling this directly in the Aerospike server is the best solution.

@spkesan

spkesan commented Jul 26, 2020

Copy link
Copy Markdown
Contributor

Thanks @tivvit
Closing this PR..

@spkesan spkesan closed this Jul 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants