Self-monitoring reporting specs #4877

shortthefomo · 2024-01-10T14:47:34Z

Summary

There is a feature slated for development in 2.1.0 for validator self-monitoring: #3508.

It would also be useful to have some data on each node's specs, and this should be considered for addition to, or grouping with, that ticket.
Namely: the number of CPU cores, the amount of RAM and disk space, and possibly an indicator of free disk space as well, which would be amazing.
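
As a rough illustration of the figures listed above, here is a minimal sketch of how a monitoring tool might gather them on a node. It is not rippled code: the function name `collect_node_specs`, the `data_path` parameter, and the field names are all hypothetical, and the RAM lookup relies on POSIX `sysconf`, so it returns `None` where that is unavailable.

```python
import os
import shutil


def collect_node_specs(data_path="/"):
    """Gather coarse hardware figures: CPU cores, RAM, total and free disk.

    Hypothetical helper for illustration only; the field names and the
    `data_path` parameter are assumptions, not part of any rippled API.
    """
    # Total physical RAM via POSIX sysconf; not every platform exposes it.
    try:
        ram_bytes = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        ram_bytes = None

    # disk_usage returns a named tuple with total, used and free bytes.
    disk = shutil.disk_usage(data_path)
    return {
        "cpu_cores": os.cpu_count(),
        "ram_bytes": ram_bytes,
        "disk_total_bytes": disk.total,
        "disk_free_bytes": disk.free,
    }
```

Pointing `data_path` at the directory that holds the node's database would make the free-disk figure reflect the volume that actually fills up.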

Motivation

This can help UNL list builders understand how much headroom the nodes they choose to add to their lists have. It would also allow other automated node-monitoring tools to be built, for example to let operators know they may be running out of disk space (a very common issue), or to combine the other specs with the sync data available from the VHS to report possible CPU/RAM improvements.

Solution

It could be approached something like the flag ledgers (every 256 ledgers) in the validation stream, but I would even look at reporting it less often, about once a day (~20000 ledgers).
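
To make the interval idea concrete, here is a small sketch of the decision "report on a flag ledger, but only about once a day". It assumes flag ledgers fall on ledger indexes that are multiples of 256 and uses the ~20000-ledger estimate from above; the names `should_report` and `last_reported_seq` are hypothetical and do not correspond to anything in rippled.

```python
FLAG_LEDGER_INTERVAL = 256  # assumption: flag ledgers are multiples of 256
REPORT_INTERVAL = 20_000    # roughly one day of ledgers, per the estimate above


def is_flag_ledger(seq):
    """Return True when the ledger index falls on a flag ledger."""
    return seq % FLAG_LEDGER_INTERVAL == 0


def should_report(seq, last_reported_seq):
    """Report on the first flag ledger at least REPORT_INTERVAL after the last report.

    `last_reported_seq` is None when nothing has been reported yet.
    """
    if not is_flag_ledger(seq):
        return False
    return last_reported_seq is None or seq - last_reported_seq >= REPORT_INTERVAL
```

Tying the report to a flag ledger keeps it aligned with an existing cadence, while the larger interval keeps the extra data to about one report per day.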

mvadari · 2024-01-10T14:59:14Z

Note: there could be an attack vector if this info is public on the network. Then potential attackers would know how hard they need to hit the network to take it down.

shortthefomo · 2024-01-11T00:24:17Z

> Note: there could be an attack vector if this info is public on the network. Then potential attackers would know how hard they need to hit the network to take it down.

I don't fully understand this comment. Are you saying that if I throw enough transactions at the network I can bring it down? Has this been shown to happen before?

bigcjat · 2024-01-14T01:58:53Z

We could identify OS-specific or hardware-specific issues much more quickly. Take the current online delete issue: is there something in common across all affected nodes? If that can be identified, it can be fixed appropriately and swiftly.

I think that's where @mvadari's concern lies: the same data we'd use to find potential issues could be used by bad actors as an attack vector. But I think the value of the data collected for the health of the network outweighs the risk of an exploited vulnerability.

ximinez · 2024-01-16T16:57:17Z

If something like this is added, it must be opt-in; that is, the default setting must be "off".

bigcjat · 2024-01-16T19:35:01Z

> If something like this is added, it must be made opt-in, that is the default setting is "off".

You can always opt out by keeping your validator anonymous. There's no point in identifying yourself if you're not going to disclose the details required for trust anyway, other than superficial ones.

ximinez · 2024-01-22T22:15:36Z

> You can always opt-out by keeping your validator anonymous. There's no point in identifying yourself if you're not going to disclose details required for trust anyway. Other than superficial ones.

No, that's not sufficient. However unlikely, if anonymity is broken, this would give an attacker that much more information to work with. If it's not published in the first place, then it is that much harder to obtain.

intelliot · 2024-01-22T22:20:37Z

@lathanbritz - can you update your post to clarify that you're requesting the ability to share certain specs publicly over the p2p network?

Note, I believe #3508 is a very different request: it allows validator operators to self-monitor their own server. The data would be written to a private log file; it would not be shared publicly.

shortthefomo · 2024-01-23T00:34:09Z

@intelliot yes, my request was to possibly add further information to the flag ledger, but reading all the feedback here I see the pushback on it.


shortthefomo added the Feature Request label Jan 10, 2024

shortthefomo closed this as completed Jan 23, 2024
