Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make normalized non-utf-8 labels unique #24

Closed
wants to merge 1 commit into from

Conversation

tivvit
Copy link
Contributor

@tivvit tivvit commented Jun 12, 2020

Normalized label names may not be unique which causes problems. I have added original string hex to make them unique and consistent. Connected to alicebob/asprom#44

@tivvit
Copy link
Contributor Author

tivvit commented Jun 18, 2020

Hi @khaf what do you think about this change?

@khaf
Copy link
Contributor

khaf commented Jun 18, 2020

I'm not sure about this. Fixing an invalid UTF8 name is one thing, changing it via appending another string at the end is another. The original fix was already for a very niche situation, what does this fix address? Do you have examples of the problem to demonstrate the issue?

@tivvit
Copy link
Contributor Author

tivvit commented Jun 18, 2020

I agree that it is a very strong edge-case. Sadly we hit that.

Yes, I do have an example. The problem we encountered is that after originally proposed utf-8 normalization is that multiple set labels were normalized to the set name. This fix overcomes this problem.

Example log from the exporter:

err: error gathering metrics: 4 error(s) occurred:
* collected metric "aerospike_set_memory_data_bytes" { label:<name:"namespace" value:"nc" > label:<name:"set" value:"\357\277\275 65533655336553365533\177" > gauge:<value:0 > } was collected before with the same name and label values

@khaf
Copy link
Contributor

khaf commented Jun 18, 2020

Could you kindly share those set names? How did invalid utf8 strings end up as your set names? What part of Aerospike toolchain was responsible for allowing it?

@tivvit
Copy link
Contributor Author

tivvit commented Jun 18, 2020

It was possible to create sets with binary data via C client.

@tivvit
Copy link
Contributor Author

tivvit commented Jun 18, 2020

It is possible for me to get the set names but it is probably simpler to imagine the situation

  • \xC0
  • \xC1
    both set names are "valid" for AS but they are invalid utf-8, therefore they are both translated to � which causes the previously mentioned error

@khaf
Copy link
Contributor

khaf commented Jun 18, 2020

Can't we do this differently? As in show the invalid characters inline and escaped? Something Like:

[]byte{set\xc0} -> "set\xc0"

@tivvit
Copy link
Contributor Author

tivvit commented Jun 18, 2020

I display the escaped value and hex of the original data (both only if it is invalid utf-8).

It may also be only the hex - I think it is sufficient

Your proposal is also nice - how would you encode it to make sure that it won't be interpreted by the prometheus?

@tivvit
Copy link
Contributor Author

tivvit commented Jul 15, 2020

@khaf what do you think about this. I know it is not a big issue. I just think it is a better solution than the first one.

@khaf
Copy link
Contributor

khaf commented Jul 16, 2020

Prometheus is written in Go, so I expect it to be fully utf8 compatible.
I'm still not clear as to why you need to add the hex string to the end. Escaping the UTF8 code point should be enough, otherwise it would be treated as 2 different sets on the server.
In this case, I don't think transforming the original value by adding a suffix is a good idea. Historically this problem has been taken care of by escaping. If that is not sufficient, I need to understand why.
We treat all issues as important, and I will dedicate bandwidth to this issue as any other, but I need to ensure the problem is well understood and the solution sufficiently engineered.

@arrowplum
Copy link
Contributor

arrowplum commented Jul 17, 2020

Regardless of any of this set names have restrictions (some implicit unfortunately). The next version of the server will have a much more strict set of rules for naming and will enforce them. It will be something you can turn off if you want to live dangerous but by default only namespace names, set names, and bin names that conform to a simplified naming scheme will be supported.

@spkesan
Copy link
Contributor

spkesan commented Jul 23, 2020

Hi @tivvit

Having a set name with binary data is not something that we recommend (although you were able to create them). Aerospike tools and monitoring could break with such invalid set names.
As @arrowplum mentioned, the future aerospike server versions will have strict rules for naming namespaces, sets, bins etc.

Also, I think this change wouldn't actually fix your problem metric ... was collected before with the same name and label values as the sanitizeUTF8() is applied over the complete info response and not per individual metric or label in the info response string. So appending hex encoded version of the input will not help in this case. Did you also test this change?

The current UTF-8 normalization would still be useful to protect against accidental non-UTF-8 values appearing in the info response (which is again very less probable).

Let me know what you think. I would suggest we can close this PR.

@tivvit
Copy link
Contributor Author

tivvit commented Jul 24, 2020

@spkesan I am actually using alicebob/asprom#44. I just wanted to handle this very special case in this "official exporter" too.

The code is tested and it works as expected for me.

I am ok with closing this PR. If the rules for sets will be more strict this won't be an issue anymore. I wanted to inform the community about this possible problem and I think that handling this directly in the Aerospike server is the best solution.

@spkesan
Copy link
Contributor

spkesan commented Jul 26, 2020

Thanks @tivvit
Closing this PR..

@spkesan spkesan closed this Jul 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants