Read-Only Geth #24445

Closed (7 commits)


haowang0402 commented Feb 21, 2022

Read-Only Geth

This PR adds an option to use geth as a shared library, which provides an abstraction for accessing chain data without spawning a backend or an RPC server.

Motivation

This is useful for scaling reads of historical Ethereum data.


Currently, a user who wants to read data from the Ethereum blockchain has to connect to an RPC endpoint (provided by Alchemy, Infura, or a self-hosted node) and make a request.

This does not scale well because the server is the bottleneck: it has to handle every kind of request from every client. Supporting a growing volume of reads means running more geth nodes.


However, if we had a read-only abstraction that accessed the LevelDB database storing the blockchain data directly (possibly on a remote file system), clients would not need to go through a server to read data.

To increase read throughput, we could then simply add more clients.
Offering read-only geth as an option makes scaling reads from Ethereum simpler.


Updates

  1. Added readOnly and LocalLib options in backend.go and node.go to disable disk writes and external networking.
  2. Exported (capitalized) some functions in the rpc package so that they can be used outside the package.
  3. Added a wrapper under cmd/read-only-lib that can be compiled as a shared library.

Example

This is a brief example of how to call the shared library from a Python wrapper (using cffi).

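from cffi import FFI

ffi = FFI()
# Declare the C functions exported by the shared library.
ffi.cdef(
    """
    struct wrapper_call_return {
        char* data;
        int len;
    };
    extern int open_database(char* datadir);
    extern void close_database();
    extern struct wrapper_call_return wrapper_call(char* cargs, int clen);
    """
)
geth_lib = ffi.dlopen("read-only-lib.so")

# Placeholder values: char* arguments must be passed as bytes.
DATADIR = b"/path/to/datadir"
json_msg = b'{"jsonrpc":"2.0","id":1,"method":"eth_blockNumber","params":[]}'

geth_lib.open_database(DATADIR)
ret = geth_lib.wrapper_call(json_msg, len(json_msg))
response = ffi.buffer(ret.data, ret.len)[:]  # copy the reply out as bytes
geth_lib.close_database()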
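
As a rough sketch of how a caller might wrap this, the helpers below build the JSON-RPC message passed to wrapper_call and decode the reply. The names build_request, parse_response, and call are hypothetical and not part of this PR. The sketch assumes the returned data/len pair holds a UTF-8 JSON-RPC 2.0 response, and it does not address who owns or frees the returned buffer.

```python
import json
from itertools import count

_ids = count(1)  # monotonically increasing request ids


def build_request(method, params=None):
    """Encode a JSON-RPC 2.0 request as UTF-8 bytes."""
    msg = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": method,
        "params": params if params is not None else [],
    }
    return json.dumps(msg).encode("utf-8")


def parse_response(raw):
    """Decode a JSON-RPC 2.0 response; return its result or raise on error."""
    reply = json.loads(raw.decode("utf-8"))
    err = reply.get("error")
    if err is not None:
        raise RuntimeError(f"JSON-RPC error {err.get('code')}: {err.get('message')}")
    return reply["result"]


def call(ffi, geth_lib, method, params=None):
    """Hypothetical convenience wrapper around wrapper_call (assumed reply format)."""
    request = build_request(method, params)
    ret = geth_lib.wrapper_call(request, len(request))
    # Copy the C buffer into Python bytes before decoding it.
    return parse_response(ffi.buffer(ret.data, ret.len)[:])
```

With the library loaded and the database opened as above, a read would then look like call(ffi, geth_lib, "eth_getBlockByNumber", ["0x1", False]).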

Acknowledgement

This PR was completed during my internship at Hudson River Trading. Thanks to Hudson River Trading and my mentor Daniel Maclennan for their sponsorship and help.

fjl (Contributor) commented Feb 22, 2022

Hi,

I have checked this out a bit, and there are many open questions. First of all, I think this is a good direction to explore. Running 'follower nodes' on a database maintained by one 'leader' geth instance is great. But the implementation details matter a lot.

This PR is actually two very different changes in one:

The C Library

Here you are adding a shared library build artifact, which provides a single method to run JSON-RPC requests. I think the addition of this library is not related to the primary goal of this PR, and we should discuss and implement the C library build in a separate PR. I'm definitely in favor of adding something like that.

Regarding implementation details, I think the RPC integration can be done in a better way. You had to export deep internals of package rpc here because you picked the wrong interface. handler is not to be exposed. The preferred thing to expose would be Server.serveSingleRequest and we can discuss in detail how to expose that exactly. What we should provide is something like ServeRawRequest(context.Context, []byte) []byte.
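
To illustrate the shape of such an interface, here is a toy model in Python (matching the example in the PR description). It is not geth's rpc package: RawServer, register, and serve_raw_request are hypothetical names, and the only point is that a single bytes-in, bytes-out entry point lets every failure be reported as a JSON-RPC error response, with no internal handler types exposed.

```python
import json


class RawServer:
    """Toy stand-in for an RPC server with one raw-request entry point."""

    def __init__(self):
        self._methods = {}

    def register(self, name, fn):
        """Make fn callable under the given JSON-RPC method name."""
        self._methods[name] = fn

    @staticmethod
    def _reply(req_id, **body):
        return json.dumps({"jsonrpc": "2.0", "id": req_id, **body}).encode("utf-8")

    def _error(self, req_id, code, message):
        return self._reply(req_id, error={"code": code, "message": message})

    def serve_raw_request(self, raw: bytes) -> bytes:
        """Handle one encoded JSON-RPC 2.0 request and return the encoded response."""
        try:
            req = json.loads(raw)
        except ValueError:
            return self._error(None, -32700, "parse error")
        if not isinstance(req, dict) or not isinstance(req.get("method"), str):
            return self._error(None, -32600, "invalid request")
        req_id = req.get("id")
        fn = self._methods.get(req["method"])
        if fn is None:
            return self._error(req_id, -32601, "method not found")
        params = req.get("params", [])
        if not isinstance(params, list):
            return self._error(req_id, -32602, "invalid params")
        try:
            result = fn(*params)
        except Exception as exc:  # report handler failures as a server error
            return self._error(req_id, -32000, str(exc))
        return self._reply(req_id, result=result)
```

A C wrapper built on an entry point like this would only need to pass the request bytes through and return the response bytes.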

Read-Only Mode

This area has many open questions still. In particular, you should know that maintaining an up-to-date follower database will require more changes than just making one node read-only. One big question is, how will the follower geth learn about updates?

There have been multiple attempts to figure out this feature in the past:

  • (1) Someone once proposed a PR to store all data in MySQL: Use MySQL as external db for geth #16474. We didn't merge this PR because we thought the K/V database is too low-level to be a good integration point.

  • (2) Recently we have become interested to see if the light client contained in geth could be used for a leader-follower setup. We have done some testing and performance seems sufficient. The big advantage with this scheme would be that it's already implemented, and that it also allows sending transactions from follower instances.

So overall, I would be more in favor of work on solution (2). What's necessary for that is:

  • We need more documentation on how to set up this functionality. It's possible but not super easy to configure.
  • Peering rules need to be improved to ensure light clients from a LAN can always connect, and will not be disconnected by the server.

If you are interested in working on this some more, we are happy to talk about it in a call.

haowang0402 (Author) commented

Hi,

We are also happy to schedule a call to talk about this more. Should I reach out to you on something like Discord or Telegram? Thank you!

holiman (Contributor) commented Mar 14, 2023

This PR and the discussion about it now seem to be stale. I'm closing this; feel free to reopen if you want to pursue this track.

holiman closed this Mar 14, 2023

6 participants