Since the launch of Testnet v3, we have continued to make improvements to the codebase. We have since launched a new version of the testnet, v3.1.0, and published a testnet status page on our forum (https://forum.zilliqa.com/t/maoshanwang-testnet-status/151).
Here are some of the notable improvements we have made in the past few weeks.
Generalized DataSender Class
We observed from test runs that our assignment algorithm for forwarding messages between the DS committee, shards, and Lookup nodes was rather rudimentary. Additionally, the assignment code was spread out across several functions among the different classes. We have improved on this aspect by creating the DataSender class, which acts as a wrapper for the message forwarding. This new implementation selects nodes to act as message forwarders based on the co-signature of the most recently completed consensus. The fact that these nodes participated in the consensus serves as an indication of their continued liveness and increases the likelihood that the message will be forwarded successfully.
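As a rough illustration of the selection idea, the sketch below picks forwarders from the committee members whose bit is set in the co-signature bitmap of the last consensus round. The `Peer` type, the `SelectForwarders` name, and the `maxForwarders` cap are hypothetical and are not taken from the DataSender class itself.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical peer record; the real node type is not shown in this post.
struct Peer {
  std::string address;
};

// Returns up to maxForwarders peers whose bit is set in the co-signature
// bitmap of the most recently completed consensus round. A set bit is taken
// as evidence that the node was alive and reachable during that round.
std::vector<Peer> SelectForwarders(const std::vector<Peer>& committee,
                                   const std::vector<bool>& cosigBitmap,
                                   std::size_t maxForwarders) {
  assert(committee.size() == cosigBitmap.size());
  std::vector<Peer> forwarders;
  for (std::size_t i = 0;
       i < committee.size() && forwarders.size() < maxForwarders; ++i) {
    if (cosigBitmap[i]) {
      forwarders.push_back(committee[i]);
    }
  }
  return forwarders;
}
```

With a four-node committee in which only the last three nodes co-signed, asking for two forwarders returns the second and third nodes and never the silent first one.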
Changes to Transaction Processing, Verification, and Transmission
After observing our testnet perform under stress conditions, we concluded that our practice of buffering transactions with unexpected (i.e., larger) nonce values unnecessarily increased the complexity of handling transactions in a consistent manner across the nodes. This was especially the case when different recovery mechanisms such as view change would be triggered, or when transaction packets from Lookups would be received in an untimely manner. So, instead of maintaining the buffer, we chose to re-populate the current pool of transactions with the ones that would have been buffered previously, thus simplifying the overall workflow of transaction processing.
The verification algorithm was also modified to allow for an adjustable tolerance with respect to the ordering of the transactions proposed by the shard leader. Furthermore, to reduce the likelihood of the leader proposing a block with unexpected transaction ordering, we have modified the transaction packet transmission in such a way that the Lookup nodes send the transactions to the shard leader first. Should the shard backup nodes be missing some of these transactions, we already have code in place whereby the backups can fetch these missing transactions from the leader.
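One way to picture an adjustable ordering tolerance is shown below. This is only a sketch under our own assumptions: it compares the leader's proposed order against the order a backup expects, position by position, and accepts the proposal if the share of mismatched positions stays within a configurable percentage. The function name, the percentage-based parameter, and the position-wise metric are illustrative and may differ from the actual verification code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Accepts the leader's ordering if at most tolerancePercent of the positions
// differ from the order this backup expected. Transactions are represented
// by their hashes as strings (an assumption made for this sketch).
bool OrderingWithinTolerance(const std::vector<std::string>& leaderOrder,
                             const std::vector<std::string>& localOrder,
                             unsigned tolerancePercent) {
  if (leaderOrder.size() != localOrder.size()) {
    return false;  // Different transaction counts are out of scope here.
  }
  if (leaderOrder.empty()) {
    return true;
  }
  std::size_t misplaced = 0;
  for (std::size_t i = 0; i < leaderOrder.size(); ++i) {
    if (leaderOrder[i] != localOrder[i]) {
      ++misplaced;
    }
  }
  // Integer comparison avoids floating point: misplaced/size <= tol/100.
  return misplaced * 100 <=
         static_cast<std::size_t>(tolerancePercent) * leaderOrder.size();
}
```

Setting the tolerance to 0 reproduces a strict check, while larger values let a backup accept a block whose ordering differs slightly from its own view.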
More Restrictive Timestamp Check
Another recent improvement addresses potential security vulnerabilities in the way we verify timestamps across blocks. Previously, any new block proposed by the leader during consensus would be accepted as long as its timestamp was later than that of the previous block in the chain. We have now changed the acceptance criterion to be based on the difference between the backup node’s local time and the timestamp received from the leader.
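A minimal sketch of such a check follows. It assumes millisecond timestamps and a symmetric bound on the absolute difference; the function name and the `maxDriftMs` parameter are hypothetical, and the real criterion may treat past and future timestamps differently.

```cpp
#include <cstdint>

// Returns true when the leader's block timestamp is within maxDriftMs of the
// backup's local clock, in either direction. Timestamps that are far in the
// future or far in the past are both rejected.
bool IsTimestampAcceptable(std::uint64_t localTimeMs,
                           std::uint64_t blockTimeMs,
                           std::uint64_t maxDriftMs) {
  const std::uint64_t diff = localTimeMs > blockTimeMs
                                 ? localTimeMs - blockTimeMs
                                 : blockTimeMs - localTimeMs;
  return diff <= maxDriftMs;
}
```

Under the old rule, a leader could propose a timestamp arbitrarily far in the future as long as it exceeded the previous block's; a bound tied to each backup's own clock rules that out.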
Bounded Message Queues
Our use of boost::queue::push to insert messages into our incoming and outgoing queues allowed entries to be added beyond the defined size of the queue. This meant resource over-utilization was possible, including the uncontrolled spawning of hundreds of threads. We have now moved to boost::queue::bounded_push, which prevents adding new messages beyond the size limit. We have also removed the code that retries message insertion when the limit is reached. In effect, new messages are dropped when the node reaches capacity. We will be testing this new behavior in the coming weeks, particularly to analyze the resource consumption of nodes in our testnet.
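The drop-when-full behavior can be illustrated with a small standard-library queue. This is not the Boost container mentioned above; `BoundedQueue`, `TryPush`, and `TryPop` are hypothetical names used only to show the semantics of refusing an insert at capacity without retrying.

```cpp
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

// A mutex-protected queue with a hard capacity. Pushing into a full queue
// fails immediately, and the caller is expected to drop the message.
template <typename T>
class BoundedQueue {
 public:
  explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {}

  // Returns false (and stores nothing) when the queue is already full.
  bool TryPush(T item) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (items_.size() >= capacity_) {
      return false;
    }
    items_.push_back(std::move(item));
    return true;
  }

  // Moves the oldest item into out; returns false when the queue is empty.
  bool TryPop(T& out) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (items_.empty()) {
      return false;
    }
    out = std::move(items_.front());
    items_.pop_front();
    return true;
  }

 private:
  const std::size_t capacity_;
  std::mutex mutex_;
  std::deque<T> items_;
};
```

Because a failed push is not retried, memory use is capped by the queue capacity, at the cost of losing messages under sustained overload.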
Network Recovery Mechanism
To prepare for unforeseeable events such as network failure, we have implemented a recovery mechanism that lets us re-bootstrap and restore the whole network. To facilitate this, lookup nodes routinely back up their databases as a pre-emptive measure against network failure. If the network fails, a new network will be launched from the backup databases.
Update to Lookup and Introduction of Seed Nodes
We have introduced a new network layer called the seed node network. New seed nodes will be able to register with a lookup multiplier, a special node that mirrors lookup traffic to seed nodes, to be part of the seed network. The role of the seed node is to receive transactions from services such as a wallet and forward the transactions to the lookup. The lookups will then batch the transactions and assign them to the corresponding shard for processing.
Shard Node to Trigger the Rejoining Process if It Misses Final Block(s)
A shard node can miss a final block due to various issues such as intermittent network failure. We have introduced a mechanism for the shard node to securely check whether it is missing final block(s). In the event of missing final block(s), the shard node will then re-sync itself and rejoin the shard.
Addition of Tolerance for Validating IP Addresses in the Sharding Structure
In some instances, the Directory Service leader and backups may receive PoW submissions from the same node but with different IP addresses (or ports), resulting in the backup failing to validate the sharding structure proposed by the leader. Such a situation can occur when, for example, a node has been restarted by its user with a different IP address or port, or perhaps when the change in IP address is due to IP address lease expiration. To accommodate these possibilities, we have added a tolerance value when validating the sharding structure, allowing the DS backups to accept the sharding structure from the DS leader if the number of nodes with differing IP addresses is within said limit.
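The tolerance check might look roughly like the sketch below. It assumes each view of the sharding structure maps a node's public key to an "ip:port" string, and that a node unknown to the backup is grounds for rejection; `ShardingView`, `AcceptShardingStructure`, and `maxMismatches` are hypothetical names chosen for this example.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Public key (as a string) -> "ip:port" the PoW submission came from.
using ShardingView = std::map<std::string, std::string>;

// A DS backup accepts the leader's proposed structure if every proposed node
// is known locally and at most maxMismatches nodes have a different address.
bool AcceptShardingStructure(const ShardingView& proposed,
                             const ShardingView& local,
                             std::size_t maxMismatches) {
  std::size_t mismatches = 0;
  for (const auto& entry : proposed) {
    const auto it = local.find(entry.first);
    if (it == local.end()) {
      return false;  // Assumption: an unknown node is never tolerated.
    }
    if (it->second != entry.second) {
      ++mismatches;
      if (mismatches > maxMismatches) {
        return false;
      }
    }
  }
  return true;
}
```

With a tolerance of 0 this reduces to the previous strict comparison; a small positive value absorbs the occasional restart or lease renewal without letting the leader rewrite addresses at will.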
Changes in Protobuf Field Definitions to Allow Backward Compatibility in Persistent Storage
The Protobuf field definitions for serializing and deserializing messages to and from persistent storage are mostly set to “required”. This means that these fields must be set in order for the message object to be initialized. However, there is a possibility of these fields being unused or deprecated in future updates. Deserializing from persistent storage can then become an issue. As part of our effort to support backward compatibility, we have now set these fields to “optional”. The core C++ source code will now implement the checks for those fields that are considered required. This essentially moves the enforcement of required fields from the message content (i.e., the Protobuf definitions) to the source code, which is easier to change between software updates than the format of data already stored.
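To make the shift concrete, the sketch below uses a plain struct as a stand-in for a generated message whose fields are all optional, with presence flags playing the role of the generated presence accessors. The struct, its field names, and `IsValidStoredHeader` are hypothetical; the point is only that the list of required fields now lives in ordinary C++ code.

```cpp
#include <cstdint>
#include <string>

// Stand-in for a deserialized message in which every field is optional.
// Each has* flag records whether the field was present in the stored bytes.
struct StoredBlockHeader {
  bool hasBlockNum = false;
  std::uint64_t blockNum = 0;
  bool hasTimestamp = false;
  std::uint64_t timestamp = 0;
  bool hasLegacyField = false;  // A field that a later release stops writing.
  std::string legacyField;
};

// The set of fields treated as required is decided here, in code, so it can
// change between releases without touching data already on disk.
bool IsValidStoredHeader(const StoredBlockHeader& header) {
  return header.hasBlockNum && header.hasTimestamp;
}
```

A record written without `legacyField` still validates, whereas with a "required" field in the schema the same record would fail to initialize during deserialization.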