Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use uuid_in tracking ID. #25

Closed
hakuturu583 opened this issue Aug 5, 2019 · 41 comments
Closed

use uuid_in tracking ID. #25

hakuturu583 opened this issue Aug 5, 2019 · 41 comments

Comments

@hakuturu583
Copy link
Contributor

In order to discuss more in open Isses.
I made this Issue.

I add tracking field in Detection2D.
I understand the human readable, but I strongly disagree with use string in tracking ID field.

I have oppositions about these issues.

@hakuturu583
Copy link
Contributor Author

  1. compare between String and UUID.
String:

Human readable
Also used in movit_msgs (e.g. CollisionObject) and all messages in of object_recognition_msgs
Can stand for itself without a database, e.g. just "apple"
Can put UUIDs, URLs (linked data, to wikipedia, gazebo), or Wikidata IDs (Q812880 for cube) into it
larger than Int (but the images are much larger, so it is negligible)
slower to (de)serialize, because of length field (but object detection is much slower)
no dependency on message of UUID
meaning not described by type. Could be a URL, a string of a UUID etc. (but the is can also be a benefit. Applications have to decide what it means).
UUID, Int:

not human readable
needs database lookup to get meaning
smaller (but see above)
faster to deserialize (but see above)
Type UUID exactly describes the meaning.

It is unnecessary to read UUID by human.
It is solved by visualization.
I am developing visualizer for this message.
https://github.com/OUXT-Polaris/vision_msgs_visualization.

@hakuturu583
Copy link
Contributor Author

  1. I disagree with flexible description.

@Kukanani said that string is flexible.
I agree with you, but flexible description causes problem in Open-Source.
If you use string in trackin_id field, users and developers can use these fields free.
It problem when integrate these modules using ROS.

@hakuturu583
Copy link
Contributor Author

If you use UUID in tracking_id, vision_msgs users process data from multiple trackers very very easily.
But, if you use just string, we have to do it manually.

@hakuturu583
Copy link
Contributor Author

So, I think we should use UUID in tracking field.

@mistermult
Copy link
Contributor

We should not use UUID. A string has the benefits already described in my last comparison. The most compelling reason for UUID is that it uses less space in the msg and it is faster to deserialise because it is a number. Another benefit is, that UUID are almost guaranteed to be different when generated on different hosts without coordination.
On the other hand, if the node wants to use UUID it can just put the UUID as a string in it. Other nodes that do not want to do it, can use other strings. A node that uses stringified UUIDs is then (almost) guaranteed to use unique IDs, too. Moreover, generating UUIDs is not a lightweight operation, because it generates a random number (very slow), asks the system for device properties like MAC adress (very slow system call). This slows down nodes that publish at high frequency and should not be enforced. See other benefits in my comparison cited in #25 (comment).

@hakuturu583
Copy link
Contributor Author

hakuturu583 commented Aug 5, 2019

@mistermult I think the benefit of the vision_msgs is it enables to use all nodes using vision_msgs each other.
In terms of this view, I think string is "Too much flexible".
If we allow someone use UUID in tracking_id field and other use UUID, it cause a lot of problem.
String harms this very very big benefit.

@hakuturu583
Copy link
Contributor Author

If we use vision_msgs in multiple sensor systems, we use multiple tracking nodes.
If we use string in trackng ID field, we have to sync tracking nodes by manual.
However, if we use UUID in tracking ID, we are not necessary to sync them.

@mistermult
Copy link
Contributor

mistermult commented Aug 5, 2019

Sure, tracking_id of type string does not guarantee that nodes make unique IDs. However, some nodes might want to guarantee uniqueness in other (faster) ways, e.g. by using an internal counter. Other nodes, might want to reference a global database, which is not possible with UUIDs.
If one wants to make sure that the tracking_id is unique, then UUID is a good idea. But making a string out of a UUID, solves the same problem. So we might add the comment, that stringified UUIDs are a good choice for tracking_id.

@hakuturu583 Please explain: Why is it not possible to make strings out of UUIDs in your application? What are the drawbacks of this solution.

@hakuturu583
Copy link
Contributor Author

@mistermult If we use string in tracking_id field, someone use this field as UUID, other people use this field as other format.
It causes problem when we use integrate them.
It harms usability of vision_msgs.

@hakuturu583
Copy link
Contributor Author

I am now making a benchmark code of UUID generation.
I will post this result soon.

@gavanderhoorn
Copy link

I'm just an observer here, but if I understand @mistermult correctly, he suggests that even though ID is a string, it could still perform the same role as a UUID field: users that would like to set that field to a UUID can do that (ie: UUID -> string). Users that would like to use something else (for whatever reason) would use the string representation of that.

A field typed to be a uuid cannot contain arbitrary strings.

re: uniqueness: it is true that UUID guarantee uniqueness, which is something that cannot be guaranteed necessarily with arbitrary strings. Using the node name as a prefix/ns for each ID coming out of a specific node could already help though (but just making this up right now).

In the end all that is needed is some way of being able to compare ID fields, which should be possible with both UUIDs and strings.

@hakuturu583
Copy link
Contributor Author

I understand string can perform UUID.
But, string representation allows users to use this field in their own way.
I really afraid of it.

@hakuturu583
Copy link
Contributor Author

prefix and namespace is good idea, but it cannot prevent this risk completely.

@hakuturu583
Copy link
Contributor Author

I write test code like below.

#include <uuid/uuid.h>
#include <iostream>
#include <vector>
#include <array>
#include<chrono>

#define COUNT 10000

int main()
{
    std::cout << "start calculate UUID" << std::endl;
    std::array<double,COUNT> ticks;
    std::array<std::string,COUNT> values;
    for(int i=0; i<COUNT; i++)
    {
        std::chrono::system_clock::time_point start,end;
        start = std::chrono::system_clock::now();
        uuid_t value;
        uuid_generate(value);
        end = std::chrono::system_clock::now();
        double elapsed = std::chrono::duration_cast< std::chrono::microseconds>(end - start).count();
        ticks[i] = elapsed;
        uuid_string_t str;
        uuid_unparse_upper(value, str);
        values[i] = str;
    }
    double total_time = 0;
    for(int i=0; i<COUNT; i++)
    {
        std::cout << "UUID : " << values[i] << std::endl;
        //std::cout << "time to generation : " << ticks[i] << std::endl;
        total_time = total_time + ticks[i];
    }
    std::cout << "Total Time : " << total_time << " (micro seconds)" << std::endl;
    std::cout << "Average Time : " << total_time/COUNT << " (micro seconds)" << std::endl;
    return 0;
}

@gavanderhoorn
Copy link

But, string representation allows users to use this field in their own way.

can you clarify which "own way" would be a problem in your opinion?

Any string could be used as an ID.

What we lose with string instead of UUID is the typ-safety.

@hakuturu583
Copy link
Contributor Author

hakuturu583 commented Aug 5, 2019

In my mac book air, I can generate single UUID in 0.47 micro seconds.
This is much faster than sensor rate such as camera, lidars, tof cameras etc..
I think it is no problem to use UUID in this field in technical field of view.

@hakuturu583
Copy link
Contributor Author

can you clarify which "own way" would be a problem in your opinion?
Any string could be used as an ID.
What we lose with string instead of UUID is the typ-safety.

If we allow to use String in tracking ID field, each tracking algorithum developer use this field freely.
For example,

Jhon "I set UUID to the trackin_id field"
Tom "I set integer value which start 0"
Emily "I set integer value which start 1"
Bob "I set random number to the trackin_id field"
Eric "I set object name to the tracking_id field"

It causes problem when we want to integrate them and build complex sensor fusion system.

@hakuturu583
Copy link
Contributor Author

If we use uuid_msg in tracking_id field, developer can know the field should be represented by UUID clearly.
So, everyone follow this method and we can integrate all of them and build complex sensor fusion pipeline easily.

@mistermult
Copy link
Contributor

@mistermult If we use string in tracking_id field, someone use this field as UUID, other people use this field as other format.
It causes problem when we use integrate them.
It harms usability of vision_msgs.

There are two requirements on tracking_id's:

  • There is a operation to compare it for equality isEqual (compare if two IDs are the same): This operations is supported by UUIDs or strings.
  • The tracking_id has to be unique for each 'track'. That means that different nodes should use different IDs if they mean different things.
    That means, the string "1212-1212566-121212" (converted UUID) is a valid as "RedBox98203".

It is true that setting the type to UUID makes it likely that each developer generates a fresh UUID, so no collisions occur. If we allow strings, a programmer that converts UUIDs to strings is still sure that he uses fresh unique IDs because other nodes that do not UUIDs are very unlikely to use the exact format of UUIds (format something like: ####-######-#####).
However, there are also other possibilities, e.g. linking to a external database (Wikidata), using a counter (possible synchronized to other nodes), counter + node_id (thanks @gavanderhoorn).
Using a string is more flexible (as described above) and allows to use human-readable strings (RedBox12, BlueBox34, ...).

So it's a matter of taste: If we want to encode everything in the type system, we should use UUIDs. If we want more flexibility, use strings. I prefer the latter for the aforementioned reasons. I have some modules in python without any types and my system does not crash. Even in C++ not all constraints are modeled in the type system.

@mintar
Copy link
Contributor

mintar commented Aug 5, 2019

I understand @hakuturu583 's point: If you have multiple distributed trackers, and you want to aggregate all the different tracks from different topics, it is nice to have UUIDs that are guaranteed to be unique across all topics.

However, I still prefer strings for all the reasons discussed before (here and in #18). I believe an aggregating node should expect IDs to be unique per topic; if it aggregates multiple topics, it should prefix the IDs with the topic name (or something along those lines).

@gavanderhoorn
Copy link

@hakuturu583 wrote:

If we allow to use String in tracking ID field, each tracking algorithum developer use this field freely.

I understand what you're saying (and I expected this rationale), but if the ID field's task is to provide a unique identifier, it wouldn't seem to matter what it is set to exactly, as long as it is consistent and unique. Any string could be made unique.

Consumers of msgs could treat the ID field as an opaque blob/cookie/sequence-of-bytes: they could still be unique, be used as an identifier but would no longer need to be a UUID necessarily.

Would it matter whether id==7c798d42-67.. or id=="some_string_i_just_made_up"?

Would there be problems if I keep using "some_string_i_just_made_up" as the ID of the same object? It would seem to be the same as a UUID, no?

@hakuturu583 wrote:

It causes problem when we want to integrate them and build complex sensor fusion system.

Do you expect developers/msg consumers to introspect the ID field, instead of just using it as an opaque identifier?


Note: I'm not a stakeholder here, just an observer.

I typically prefer type-safety over convenience (so UUIDs over plain strings), but in this case I believe the loss of a UUID typed field does not lead to the problems described (if we consider an ID field as an opaque cookie/blob/sequence-of-bytes).

If we would use UUIDs to encode class then I would see a need for a strongly-typed field, but unless I'm misunderstanding, that is not the case here, correct?

@mintar
Copy link
Contributor

mintar commented Aug 5, 2019

@gavanderhoorn : You still didn't 100% understand the point raised by @hakuturu583. Maybe an example helps to clarify it:

If you have two different publishers, both of them could set the id of an object to some_string_i_just_made_up (or more probably there will be implementations that will set the ids to 0, 1, ...), even though they mean different objects.

However, if we were to use the UUID messages, it is practically guaranteed that there will not be two different tracking_ids set to 44091dba-81c3-42f0-8c3a-5bd756645476 by two different publishers.

In short, without UUIDs, there can be name clashes, but only between different nodes. This is why I think this is an edge case that should be handled differently, and we should keep using strings.

@gavanderhoorn
Copy link

I believe I did understand, but didn't articulate my comment sufficiently. I realise that UUIDs guarantee uniquess, even across different nodes, machines, etc and arbitrary strings do not.

The situation would seem somewhat similar to TF, where frame_ids also should be unique, but cannot be guaranteed to be. Clashes there are resolved with either a tf_prefix (but deprecated) or even remapping.

@Kukanani
Copy link
Collaborator

Kukanani commented Aug 5, 2019

I see where you are coming from @hakuturu583, and I also understand @mintar's clarification, but I maintain that a flexible string representation is better in terms of usability, and also doesn't add extra dependencies for all package users. The usability gain of using universally-understood strings is greater than the usability gain of standardizing to UUID, which some developers may not have used before. UUID is not entirely standardized anyway, since there are 5 different versions of UUID and the message does not enforce a version.

If you would like to maintain a unique internal representation, UUID v3 or v5 can be used to convert + (or similar) into a UUID.

I agree that UUIDs are quick to generate but speed was never a primary concern here.

@hakuturu583
Copy link
Contributor Author

@Kukanani How do you treat batting of the tracking_id from multiple trackers which made by multiple developers.
If you use string, we always exposed to this problem.
We can set anything in String field, so we cannot run from this problem if we use String in the field.

@hakuturu583
Copy link
Contributor Author

I think the compatibility is the most important thing in ROS package.
String format conflicts to this.

@hakuturu583
Copy link
Contributor Author

If you would like to maintain a unique internal representation, UUID v3 or v5 can be used to convert + (or similar) into a UUID.

I agree that UUIDs are quick to generate but speed was never a primary concern here.

There is uuid generator in uuid_msgs.(https://github.com/ros-geographic-info/unique_identifier/blob/master/unique_id/include/unique_id/unique_id.h)

Using this generator and if we write which generater should be used in message comment, we can avoid conflict of the tracking_id.

@hakuturu583
Copy link
Contributor Author

I really afraid that if we use string in the tracking ID format, people use this filed as they like.

@hakuturu583
Copy link
Contributor Author

hakuturu583 commented Aug 5, 2019

@mintar

However, if we were to use the UUID messages, it is practically guaranteed that there will not be two different tracking_ids set to 44091dba-81c3-42f0-8c3a-5bd756645476 by two different publishers.

In short, without UUIDs, there can be name clashes, but only between different nodes. This is why I think this is an edge case that should be handled differently, and we should keep using strings.

You say only two, but if we want to build complex sensor fusion system such as 3 cameras and 2 lidars like autonomous driving vehicle, the batting problem become much more complex.
So, I think we should restrict how to use this field.
If all developers of tracking nodes use UUID in tracking_id field, we are not necessary to think about this problem.

@mintar
Copy link
Contributor

mintar commented Aug 5, 2019

I believe I did understand, but didn't articulate my comment sufficiently.

OK, sorry for assuming you didn't. :)

@hakuturu583
Copy link
Contributor Author

hakuturu583 commented Aug 5, 2019

@mistermult I think adding namespace and restriction to the tracking ID is good idea.(#25 (comment))
However, I think it will be implemented as a comment.
I think it is too weak, this is because comment is a just "Gentleman's Agreement".

@mintar
Copy link
Contributor

mintar commented Aug 5, 2019

@hakuturu583 wrote :

You say only two, but if we want to build complex sensor fusion system such as 3 cameras and 2 lidars like autonomous driving vehicle, the batting problem become much more complex.

By "batting" you mean conflicting / colliding?

So, I think we should restrict how to use this field.
If all developers of tracking nodes use UUID in tracking_id field, we are not necessary to think about this problem.

Using UUIDs makes this specific problem easier to handle, at the cost of making other problems more difficult. For example, with strings, you can reuse IDs that your tracker or other software (like MoveIt) is already using. This seems to me a more common use case than aggregating multiple trackers by simply throwing all their messages together. Therefore, I prefer string.

There are ways for you to handle your problem even if we keep strings:

  1. You will probably want to remember anyway which tracker produced an ID. In this case, you can use the tracker / topic name as a namespace.
  2. You could write an adapter node that simply replaces all Ids with UUIDs and republishes the message.

@hakuturu583
Copy link
Contributor Author

hakuturu583 commented Aug 5, 2019

@mintar

You say only two, but if we want to build complex sensor fusion system such as 3 cameras and 2 lidars like autonomous driving vehicle, the batting problem become much more complex.

By "batting" you mean conflicting / colliding?

right, after that I call conflicting.

@hakuturu583
Copy link
Contributor Author

@mintar

Using UUIDs makes this specific problem easier to handle, at the cost of making other problems more difficult. For example, with strings, you can reuse IDs that your tracker or other software (like MoveIt) is already using. This seems to me a more common use case than aggregating multiple trackers by simply throwing all their messages together. Therefore, I prefer string.

We can convert UUID to String very easily, so I think you should convert them into String after you subscribe UUID in other nodes.

@hakuturu583
Copy link
Contributor Author

hakuturu583 commented Aug 5, 2019

@mintar

You will probably want to remember anyway which tracker produced an ID. In this case, you can use the tracker / topic name as a namespace.

I think this is good, but I recommend to add tracking_namespace field if you want to use String.
But, If we use UUID, trackig_namespace field is unnecessary.
Also, we can restrict developers to use UUID in this field and it makes us to build complex pipeline much more easily.

You could write an adapter node that simply replaces all Ids with UUIDs and republishes the message.

Republishing data is just a useless process and it makes the system complex more than necessary.

@mistermult
Copy link
Contributor

I'm voting for string.

@Kukanani
Copy link
Collaborator

Kukanani commented Aug 6, 2019

@hakuturu583 I'd appreciate it if you'd respond using one message rather than a chain of messages, it breaks up the dialogue and takes up visual space, making it hard to follow the flow of conversation.

Taking all this into account, I'm going to keep the tracking_id as a string. Although please note that I will soon be moving the recently merged PRs off of kinetic-devel and to a master branch to avoid breaking compatibility within the Kinetic distro. Sorry, still learning the ins and outs of maintaining a public ROS package.

As for the recent comments:

We can convert UUID to String very easily, so I think you should convert them into String after you subscribe UUID in other nodes.

Counterpoint: We can convert String to UUID very easily, so I think you should convert them into UUID after you receive them. If UUID benefits your use case there are multiple methods of creating a unique UUID based on side information, some of which have been listed in this thread.

By enforcing UUID, we are adding dependencies and also essentially requiring that everyone use a set of boilerplate code to use the tracking_id field. This is unacceptable to me. Personally I would not want to have to go to the trouble while writing my own node. I already find it annoying enough to have to remember to set w = 1 in every quaternion.

I think this is good, but I recommend to add tracking_namespace field if you want to use String.
But, If we use UUID, trackig_namespace field is unnecessary.

It is still unnecessary. You can use the fully-qualified topic name as the namespace, for example.

Also, we can restrict developers to use UUID in this field and it makes us to build complex pipeline much more easily.

The goal is not to build complex pipelines more easily, the goal is to standardize usage and make a message structure that works for multiple use cases. Many vision pipelines are simple.

@hakuturu583
Copy link
Contributor Author

hakuturu583 commented Aug 6, 2019

@Kukanani

Counterpoint: We can convert String to UUID very easily, so I think you should convert them into UUID after you receive them. If UUID benefits your use case there are multiple methods of creating a unique UUID based on side information, some of which have been listed in this thread.
By enforcing UUID, we are adding dependencies and also essentially requiring that everyone use a set of boilerplate code to use the tracking_id field. This is unacceptable to me. Personally I would not want to have to go to the trouble while writing my own node. I already find it annoying enough to have to remember to set w = 1 in every quaternion.

I agree with sometimes boilerplate code annoying me.
In this case, I think adding boilerplate code is very very useful for all users.
We can use all of tracker nodes which made by people all over the world very easily.
I think it is the most important hing in ROS development.

It is still unnecessary. You can use the fully-qualified topic name as the namespace, for example.
I think we should not split namespace by using specific characters such as using prefix.

Namespace should be splited by ROS message field.

The goal is not to build complex pipelines more easily, the goal is to standardize usage and make a message structure that works for multiple use cases. Many vision pipelines are simple.

Why do you think we are not necessary to standardize the usage of tracking_id node.
String can be used freely.

@Kukanani
Copy link
Collaborator

Kukanani commented Aug 7, 2019

If we all agree to use a string and the field is encoded as a string, then it is standardized.

At this point we're just rehashing the same arguments over and over. If any new information comes to light then I'll consider it when the time comes. Thanks for the discussion!

@hakuturu583
Copy link
Contributor Author

hakuturu583 commented Aug 8, 2019

I still strong disagree with using string.
This harms the biggest advantage of using ROS package.
ROS helps us using building complex robotics system, this is because the developer of ROS package consider compatibility.
String format harms compatibility.

@hakuturu583
Copy link
Contributor Author

hakuturu583 commented Aug 8, 2019

If we all agree to use a string and the field is encoded as a string, then it is standardized.

String is not "standardized".
We can set any value in this field.
In order to "standardize", we have to decide "what type of the value shoud be set" and "how the value was generated."
UUID shows this very very cleary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants