No support for time with micro-second precision #353
Currently, timestamps retrieved from YCQL show 6 digits after the second, but only the first 3 are ever non-zero. Therefore, the minimal time precision is only millisecond. Is it possible to get it to the level of microsecond?
I find this useful in my application. Suppose I have a high rate of requests from many users on multiple servers, so that a single millisecond could easily contain multiple requests. How do I then know the order of the requests? Does TIMEUUID guarantee the correct order? Additionally, if I want the users to know the order of the requests of all users (say I run a blockchain and want the users to be able to verify the results), how could that easily be done without adding much overhead (they do not have YugaByte installed)? Is it possible to add an id field corresponding to the order of insertion, with an ACID guarantee? Thanks!
Comments
Hi @yjiangnan
It'll be helpful to know what schema you have in mind. Do you need global order across all users, or order within the context of each user's actions? For example, is 'userid' the partition portion of your compound key, and do you have requests that need to be ordered within the context of each user (so that you need an order_id which is guaranteed increasing and is the clustering portion of your compound key)? Note that TIMEUUID is a unique time-based UUID which has a time component plus additional components to make it unique. Within the context of a single partition (such as a userid), it also has the property of being in strictly increasing order. Will that work for your needs?
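A minimal sketch of that kind of schema (table and column names are illustrative, not from this thread):

```sql
-- One partition per user; rows within it are ordered by a TIMEUUID
-- clustering column, which is strictly increasing inside the partition.
CREATE TABLE user_actions (
    userid    text,
    action_id timeuuid,
    action    text,
    PRIMARY KEY ((userid), action_id)
);

-- Returns alice's actions in insertion (time) order.
SELECT action_id, action FROM user_actions WHERE userid = 'alice';
```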
Hi @kmuthukk, thanks for the reply. I need the order for both, but the focus is on global order across all users, and I need it in a way that is easily verifiable by a user even when the network is unreliable. For example, in an online interactive game, a user establishes a REST or WebSocket connection to the server, and he wants to know exactly what his status and the other users' statuses are at the current moment. Let's also assume he is running an AI to learn from other users' visible action histories (some actions may be invisible to others), so the order and completeness of the data are very important. When the network is unstable, he constantly wants to know whether he is still connected and has retrieved all the historic data after a possible reconnection. Of course, this can be done by re-retrieving all the historic data since the last known point through a REST API, but that obviously adds a lot of network overhead. It would be much easier to see proof, in the data most recently retrieved over the WebSocket, that he has retrieved all the data. One way I have in mind is to put all the historic data in a chain, so that each new row has a field linking it to the previous one, e.g.:
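A sketch of such a chained schema (illustrative names and types):

```sql
-- Each row records which insertion preceded it, forming a verifiable chain.
CREATE TABLE history (
    id        bigint PRIMARY KEY,  -- global insertion order
    prev_id   bigint,              -- id of the previous insertion
    prev_hash text,                -- hash of the previous row's content
    data      text
);
```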
However, I am not sure there is a very efficient way to find out what the last insertion was under concurrent insertions. Additionally, if the "data" field is the hash of the content of the previous insertion (i.e., in a blockchain fashion), can that be done efficiently?
hi @yjiangnan
a) A global, monotonically increasing counter in a sharded/scale-out database needs some kind of centralized service/coordinator. For example, you can use a special single-row table to keep track of the counter and keep incrementing it with a race-condition-safe read-modify-write operation to hand out monotonically increasing values to requesting clients. However, there is no guarantee that two concurrent write operations using these counters commit in an order compatible with the counter values.
b) The approach of chaining the actions in a linked-list-like scheme would work. One caveat: currently, YCQL (our Cassandra-based data model) requires the primary key (even when it is a compound key) to have at least 1 leading partition key. This means there isn't an easy way to do global order queries efficiently, because there's no efficient way to get the MAX(id) in the table without doing a scan. But we do plan to relax this restriction (and allow the primary key to be based only on CLUSTERING/sorted columns). When we do that (i.e., implement range-based primary keys), finding the max id issued so far in log(N) time, and doing a conditional INSERT (with the IF NOT EXISTS clause) that protects against races between concurrent operations, should be straightforward. For example, in YCQL pseudo-code:
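A rough sketch of that flow (illustrative names; assumes range-based primary keys):

```sql
-- 1. Find the max id issued so far (log(N) once range primary keys exist).
SELECT MAX(id) FROM requests;

-- 2. Try to claim the next id. IF NOT EXISTS makes the insert conditional:
--    of two concurrent writers, one fails and must retry from step 1.
INSERT INTO requests (id, data) VALUES (<max_id + 1>, '...')
IF NOT EXISTS;
```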
If the INSERT fails due to concurrent operations, then the SELECT and INSERT have to be retried. Currently, because YCQL requires a hash partition in the primary key, finding the max(id) is O(N). If that's not desirable, an alternative would be to keep the current max in another single-row table, as sketched below.
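A sketch of that alternative (hypothetical table name):

```sql
-- A single well-known row tracks the max id issued so far.
CREATE TABLE max_id_tracker (
    k      int PRIMARY KEY,   -- always 0
    max_id bigint
);

-- Race-condition-safe read-modify-write: the IF clause makes the update
-- conditional on the previously read value, so concurrent bumps force a retry.
SELECT max_id FROM max_id_tracker WHERE k = 0;
UPDATE max_id_tracker SET max_id = <read_value> + 1
    WHERE k = 0
    IF max_id = <read_value>;
```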
Hi @kmuthukk, thank you very much for the reply. I thought about using a row to track the max id too, but placing it in the same table with id 0. Would that still work, and would it be better if I have multiple tables to track? I am not sure at what scope the ACID transaction locks: does it force all operations on the cluster to be executed sequentially, or does it do so per keyspace, per table, or just for the primary keys that would be affected by the transaction? I think this has huge implications for the scalability of the cluster if, say, each user can get involved in multiple games and has some credit that can be used to purchase equipment in each game, so that the credit is shared and modified from multiple games. If the ACID transaction locks operations globally, then all the games have to be executed sequentially, making everything very slow. But if only the affected primary keys are locked, then all the games could be executed in parallel. Additionally, for the last example, is it better to place ...? By the way, what is the most efficient hash function for the serialized string of the previous insertion that can be called inside the transaction? Thanks again.
Hi @yjiangnan,
Having two tables, or a single table with a fixed id (0 in this case), works about the same in terms of semantics and performance.
ACID transactions only lock the rows (in fact, just the actual columns) they are modifying, so other, non-overlapping operations are not affected. You do need to declare which tables are transactional, using the table property transactions = {'enabled': true}. So in your case the games would be executed in parallel, since only the modified columns are optimistically locked.
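A sketch of what that looks like in YCQL (illustrative schema):

```sql
-- Only tables created with this property can take part in distributed
-- transactions.
CREATE TABLE game_credits (
    userid text,
    gameid text,
    credit bigint,
    PRIMARY KEY ((userid), gameid)
) WITH transactions = { 'enabled' : true };

-- Statements grouped this way commit atomically; only the touched rows
-- and columns are locked, so transactions on other games run in parallel.
BEGIN TRANSACTION
  UPDATE game_credits SET credit = 90 WHERE userid = 'alice' AND gameid = 'g1';
  UPDATE game_credits SET credit = 10 WHERE userid = 'alice' AND gameid = 'g2';
END TRANSACTION;
```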
Today, we only support builtin functions; user-defined functions are not available yet.
Sorry, I didn't understand... could you please explain?
Hi @yjiangnan
Yes, keeping a special row (with id 0 for example) within the same table will work too.
YugaByte takes finer-grained locks. We do not take DB/keyspace/table-level locks, as those would severely limit the concurrency the system can support (as you pointed out). Updates to different rows of a table can all proceed concurrently. However, for the particular use case you mention, if you maintain a special row to track the max id issued so far, whether it is stored in the same table or a different one, that row will become a hotspot. All updates related to that table (a game, in your case) will be serialized, because every transaction on the table will also try to update that special row. But updates to different games can proceed concurrently.
regards,
Hi @rkarthik007, in Cassandra one can define a user-defined function and call it to calculate the hash of the previous insertion (see the sketch below). By using a hash, the user will also be able to verify the data integrity, so that nobody can change the historical data without changing all the data in the row and being detected.
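A sketch of such a Cassandra UDF (Cassandra, not YCQL; the function name is illustrative, and Java's hashCode is a weak stand-in for a real hash):

```sql
-- Requires enable_user_defined_functions: true in cassandra.yaml.
CREATE OR REPLACE FUNCTION row_hash (serialized text)
    RETURNS NULL ON NULL INPUT
    RETURNS int
    LANGUAGE java
    AS 'return serialized.hashCode();';

-- The function can then be called inline in queries:
SELECT id, row_hash(data) FROM history;
```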
Hi @yjiangnan, thanks for that explanation; we do not support user-defined functions in YugaByte yet. Would the default hash function applied to the serialized string not work for you? cc @robertpang @m-iancu
Hi @rkarthik007, what is the default hash function? Does it depend on the content of the row, so that changing anything in the row would necessarily change the hash? And is the default hash function implemented in common programming languages, so that users can easily verify the integrity of the data themselves?
@robertpang or @m-iancu - could one of you please answer? @yjiangnan - this is similar to the hash function that Cassandra uses by default. We have implemented it in C++ and Java, and should be able to port it to other languages.
@yjiangnan Here is the Java implementation of our default hash function - https://github.com/YugaByte/cassandra-java-driver/blob/3.2.0-yb-x/driver-core/src/main/java/com/yugabyte/driver/core/utils/Jenkins.java. It is based on Bob Jenkins' hash function (http://burtleburtle.net/bob/hash/evahash.html). It is good for sharding/partitioning but not for ensuring data integrity. For data integrity, I suggest looking into other hash functions such as SHA-2 (https://en.wikipedia.org/wiki/SHA-2).
@rkarthik007 @robertpang I think almost any hash function would work in my case, because the data to hash does not contain a nonce and there are not many ways to change the data that would cause a collision. The problem, however, is where I can call it. If it can only be called outside the database (e.g., by a client driver), then I can implement it by any means. But that may cause some delay, and frequent conflicts under concurrency, during the query-last-record --> hash --> insert sequence, as it requires two rounds of communication (although I do not have an estimate of the actual delay after cluster deployment).
@yjiangnan if a collision chance of approximately 1 in 65536 is acceptable for your purpose, the builtin hash may already be enough.
Additionally, if needed, we may be able to add a generic builtin hash function that can be called in queries. I'm not sure if this example matches your exact question; perhaps you can clarify what your table schema and main query patterns are, and then we can provide a more concrete (and accurate) response. Here is a simple example:
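An illustrative sketch of the builtin hash exposed through the token function (not necessarily the original example):

```sql
CREATE TABLE sample (id int PRIMARY KEY, v text);
INSERT INTO sample (id, v) VALUES (1, 'a');
INSERT INTO sample (id, v) VALUES (2, 'b');

-- token(id) is a 64-bit value derived from the builtin hash of the
-- partition key, so different keys map to different values with high
-- probability.
SELECT token(id), id, v FROM sample;
```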
You can see that the hash value changes whenever the hashed content changes. Would something like this help with your "nobody can change the historical data without changing all the data in the row and being detected" requirement above?
Hi @m-iancu, thanks for the example. It is interesting that an almost random int64 integer can have a collision rate as large as 1 in 65536; xxHash, for example, offers a much lower collision rate: http://fastcompression.blogspot.com/2014/07/xxhash-wider-64-bits.html. Implementing xxHash or something in the SHA family would then be a nice choice. Additionally, my purpose in keeping a hash to chain the data together is for users to easily verify the data, and I am not sure users can easily call a function themselves to get the same result as the builtin hash.
@yjiangnan We actually use a 16-bit hash internally for partitioning, and token merely maps it onto the int64 range, which is why the collision rate is as high as 1 in 65536. But, as mentioned above, if the general idea of the example matches your use case, we can use it as a starting point and provide another general builtin function (e.g., one based on a stronger hash).
Great! Then I will wait for the new version of YugaByte, and hope that it will come with one or more builtin hash functions, since the basic Jenkins hash is not suitable for ensuring data integrity.
@yjiangnan - do you mind opening a GitHub issue for a builtin hash function? Just to make sure we don't lose it in this issue. Thanks!
Hi @m-iancu, how can I locate the server of a specific tablet? It seems that the tablet information is saved in the system.partitions table.
It seems that generating the key for the partition key and comparing it to the start_key and end_key is the way to achieve this. However, I have difficulty generating the key: the token I compute is a negative number. In Python, I can see that it looks like the thing I want, but it has a negative sign and I am not sure how I can compare it to the start_key and end_key. I eventually want to do this in golang.
@yjiangnan Great question; this is again because token uses a signed 64-bit integer range (for compatibility with Cassandra), while start_key and end_key are expressed in terms of the unsigned internal hash. Note that we do plan to add an alternative function (e.g., one that returns the internal hash value directly).
Hi @m-iancu, thanks for the info. An even simpler approach would be adding a function that returns the internal hash value directly. I implemented this in golang but still got a negative number. I guess this is because I used a signed integer type for the hash.
@yjiangnan here is a working example in Go; the only thing I changed in yours is the type, from int64 to uint64, which yields the expected non-negative value.
Regarding your second question, the ranges are start-inclusive and end-exclusive. But doing one query to look this up each time is probably inefficient anyway; we recommend caching the values from the system.partitions table.
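A sketch of the lookup whose results would be cached (assuming the system.partitions layout with per-tablet key ranges and replica addresses):

```sql
-- Each row describes one tablet: the hash-key range it covers and the
-- addresses of its replicas. Filter client-side for the table of interest.
SELECT keyspace_name, table_name, start_key, end_key, replica_addresses
FROM system.partitions;
```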
For example, in our Java driver fork we have a TableSplitMetadata instance for each table, which in turn contains one PartitionMetadata per partition. We use this data in various applications, including for query locality in our Spark connector fork (which uses our Java driver fork internally): see this code.
Closing based on the discussion and workaround above. No plans to support this feature at this point.