feat(cmd/rpc): Automatically detect running node for RPC requests #3246

Merged: 24 commits merged into celestiaorg:main on Apr 17, 2024

Conversation

@mastergaurang94 (Contributor) commented Mar 7, 2024

Closes #2859.

TL;DR Previously, the node.store flag had to be specified manually for each RPC request. This commit introduces automatic detection of the running node.

Assumptions:

  • presence of a lock file indicates a running node (see the detection sketch after this list)
  • networks are checked in a specific order (mainnet, mocha, arabica, private, custom), as are node types (bridge, full, light)
  • a network is expected to have only one running node of a given type; multiple nodes of the same network and type are disallowed (they receive Error: node: store is in use)
  • the auth token and other flags retain their previous behavior and take priority
  • the RPC address and port are read from the node's config
  • skipAuth skips auth for trusted setups
  • aligns with Unix daemon conventions
  • a non-default node store & cel-key still require config flags
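
A minimal sketch of the detection order described above, assuming an illustrative default store layout (~/.celestia-<type>-<network>) and a lock file named "lock" inside the store. Neither the helper names nor the paths are the PR's actual code, and a later review comment swaps full and light in the type order:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"

	"github.com/gofrs/flock"
)

// Priority order as stated in this PR description; the review below changes
// the node-type order to bridge, light, full.
var (
	networks  = []string{"mainnet", "mocha", "arabica", "private"}
	nodeTypes = []string{"bridge", "full", "light"}
)

// storeInUse reports whether another process holds the store's lock file,
// which this sketch treats as evidence of a running node.
func storeInUse(storePath string) bool {
	fl := flock.New(filepath.Join(storePath, "lock"))
	locked, err := fl.TryLock()
	if err != nil || !locked {
		return true // lock unavailable: assume a node owns it
	}
	_ = fl.Unlock() // we acquired it, so nothing is running here
	return false
}

// detectRunningStore walks networks and node types in priority order and
// returns the first default store directory that appears to be in use.
func detectRunningStore(home string) (string, bool) {
	for _, network := range networks {
		for _, typ := range nodeTypes {
			// Illustrative default layout, e.g. ~/.celestia-light-mocha.
			path := filepath.Join(home, fmt.Sprintf(".celestia-%s-%s", typ, network))
			if _, err := os.Stat(path); err != nil {
				continue // store was never initialized
			}
			if storeInUse(path) {
				return path, true
			}
		}
	}
	return "", false
}

func main() {
	home, _ := os.UserHomeDir()
	if store, ok := detectRunningStore(home); ok {
		fmt.Println("detected running node store:", store)
	} else {
		fmt.Println("no running node detected; --node.store is required")
	}
}
```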

Sample Test Cases

  1. Node store set
❯ celestia blob get 1318129 0x42690c204d39600fddd3 0MFhYKQUi2BU+U1jxPzG7QY2BVV1lb3kiU+zAK7nUiY= --node.store=$NODE_STORE
{
  "result": {
    "namespace": "AAAAAAAAAAAAAAAAAAAAAAAAAEJpDCBNOWAP3dM=",
    "data": "0x676d",
    "share_version": 0,
    "commitment": "0MFhYKQUi2BU+U1jxPzG7QY2BVV1lb3kiU+zAK7nUiY=",
    "index": 23
  }
}
  2. Node store not set, but flag specified
❯ celestia blob get 1318129 0x42690c204d39600fddd3 0MFhYKQUi2BU+U1jxPzG7QY2BVV1lb3kiU+zAK7nUiY= --node.store=
Error: cant access the auth token: root directory was not specified: unable to load in config: open config.toml: no such file or directory
  3. No node store flag specified, yay
❯ celestia blob get 1318129 0x42690c204d39600fddd3 0MFhYKQUi2BU+U1jxPzG7QY2BVV1lb3kiU+zAK7nUiY=
{ // valid result, same as 1. }
  4. Multiple networks running, will go to mainnet before mocha
❯ celestia blob get 1318129 0x42690c204d39600fddd3 0MFhYKQUi2BU+U1jxPzG7QY2BVV1lb3kiU+zAK7nUiY=
{
  "result": "RPC client error: sendRequest failed: http status 401 Unauthorized unmarshaling response: EOF"
}
  5. Multiple networks running, will go to mocha before arabica
❯ celestia blob get 1318129 0x42690c204d39600fddd3 0MFhYKQUi2BU+U1jxPzG7QY2BVV1lb3kiU+zAK7nUiY=
{
  "result": "header: given height is from the future: networkHeight: 802095, requestedHeight: 1318129"
}
  6. Run node with rpc config Address: 0.0.0.1 and Port: 25231 -- accurately directs request (see the config-reading sketch after these examples)
❯ celestia blob get 1318129 0x42690c204d39600fddd3 0MFhYKQUi2BU+U1jxPzG7QY2BVV1lb3kiU+zAK7nUiY=
{
  "result": "RPC client error: sendRequest failed: Post \"http://0.0.0.1:25231\": dial tcp 0.0.0.1:25231: connect: no route to host"
}
  7. Run node with --rpc.skip_auth=true and config with SkipAuth: true (prints log in rpc/server here)
❯ celestia blob get 1318129 0x42690c204d39600fddd3 0MFhYKQUi2BU+U1jxPzG7QY2BVV1lb3kiU+zAK7nUiY=
{ // valid result, same as 1. }
  8. Run node with --rpc.skip_auth=true and config with SkipAuth: false (server skips auth, prints log in rpc/server here)
❯ celestia blob get 1318129 0x42690c204d39600fddd3 0MFhYKQUi2BU+U1jxPzG7QY2BVV1lb3kiU+zAK7nUiY=
{ // valid result, same as 1. }
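
Cases 6-8 above exercise the RPC address, port, and skip-auth settings read from the running node's config.toml. A standalone sketch of that lookup, using an assumed struct layout rather than celestia-node's actual config types:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"

	"github.com/BurntSushi/toml"
)

// Assumed shape of the relevant section of config.toml; the real node
// config has more fields and may differ in naming.
type rpcSection struct {
	Address  string
	Port     string
	SkipAuth bool
}

type nodeConfig struct {
	RPC rpcSection
}

// rpcEndpoint builds the request target from a store's config, mirroring
// test case 6 above, where Address 0.0.0.1 and Port 25231 yield
// http://0.0.0.1:25231.
func rpcEndpoint(storePath string) (string, bool, error) {
	var cfg nodeConfig
	if _, err := toml.DecodeFile(filepath.Join(storePath, "config.toml"), &cfg); err != nil {
		return "", false, err
	}
	return fmt.Sprintf("http://%s:%s", cfg.RPC.Address, cfg.RPC.Port), cfg.RPC.SkipAuth, nil
}

func main() {
	url, skipAuth, err := rpcEndpoint(os.Getenv("NODE_STORE")) // hypothetical store path
	if err != nil {
		fmt.Println("config not found, fall back to flags:", err)
		return
	}
	fmt.Println("endpoint:", url, "skip auth:", skipAuth)
}
```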

github-actions bot added the external (Issues created by non node team members) label on Mar 7, 2024
@Wondertan (Member) left a comment

Thank you for the PR!

Let's not complicate things and stick to the former, but with full swapped with light in the order.

In the case that multiple node types are running on one network, the request is directed to one node in this order: bridge, light, full. That ordering can be changed, or alternatively many node types could be detected and the user asked to clarify via CLI input. I like the former approach since this use case seems uncommon.
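
A tiny runnable sketch of the tie-break agreed here: within a single network, prefer bridge, then light, then full (the helper name is illustrative, not the PR's code):

```go
package main

import "fmt"

// Final node-type priority after this review: bridge, light, full.
var typePriority = []string{"bridge", "light", "full"}

// pickNodeType returns the highest-priority type among those detected as
// running for a given network.
func pickNodeType(running map[string]bool) (string, bool) {
	for _, t := range typePriority {
		if running[t] {
			return t, true
		}
	}
	return "", false
}

func main() {
	// Example: both a light and a full node are up on the same network.
	chosen, _ := pickNodeType(map[string]bool{"light": true, "full": true})
	fmt.Println("requests go to the", chosen, "node") // light
}
```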

Resolved review threads (outdated): libs/fslock/locker.go, cmd/rpc.go
Wondertan added a commit that referenced this pull request Mar 13, 2024
This PR is prompted by #3246. While looking through the code and understanding how `IsLocked` worked there, I realized that our fslock lib was missing a feature. Instead of asking @mastergaurang94 to fix it, I decided to completely migrate over to a more [comprehensive locking solution](https://github.com/gofrs/flock), which additionally:
* avoids the need for custom code to maintain (NIH)
* solves the problem of locking being annoying after an ungraceful shutdown
* gives us Windows and other niche OS support
	* as well as works around niche FS edge-cases
Resolved review threads (outdated): nodebuilder/store.go, cmd/rpc.go
@mastergaurang94 (Contributor, Author) commented Mar 14, 2024

@Wondertan thanks for the comments! Updated.

Also, note that flock now leaves a .lock file in the directory - https://github.com/gofrs/flock/blob/master/flock.go#L54

@Wondertan (Member) replied:

also, note that flock now leaves a .lock file in the directory

I am aware and that's ok. The locking functionality should be around the flock syscall, not the existence of the file.
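
A small sketch of the distinction made here, contrasting a stale-file existence check with an actual flock attempt (the paths and messages are illustrative only):

```go
package main

import (
	"fmt"
	"os"

	"github.com/gofrs/flock"
)

func main() {
	const dir = "/tmp/example-store" // hypothetical store directory
	_ = os.MkdirAll(dir, 0o755)
	lockPath := dir + "/.lock"

	// Existence check: the file can survive an ungraceful shutdown, so it
	// may report a "running" node that no longer exists.
	if _, err := os.Stat(lockPath); err == nil {
		fmt.Println(".lock file exists (possibly stale)")
	}

	// flock check: the kernel releases the lock when the holding process
	// dies, so a failed TryLock means another live process holds it.
	fl := flock.New(lockPath)
	if locked, err := fl.TryLock(); err == nil && locked {
		fmt.Println("no live holder; treat the node as not running")
		_ = fl.Unlock()
	} else {
		fmt.Println("lock held by a live process; a node is running")
	}
}
```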

Resolved review threads (outdated): nodebuilder/store.go (4 threads)
@Wondertan (Member) left a comment
Thanks for pushing this forward and writing good tests!

One minor comment and one major comment about logic improvement

Resolved review threads: nodebuilder/store.go, cmd/rpc.go (outdated)
renaynay pushed a commit to renaynay/celestia-node that referenced this pull request Apr 3, 2024 (the same flock migration commit as above).
@walldiss (Member) left a comment

Looks good! Just one small nit on formatting

Review thread: nodebuilder/p2p/network.go (resolved)
@Wondertan (Member) left a comment

Thank you for this great contribution and for your persistence in resolving multiple requested changes!

@mastergaurang94 (Contributor, Author) replied:
Haven't seen the approval in a long time 🥹 Thank you for lending your time & energy to this review @Wondertan! Had fun

Wondertan added the kind:feat (Attached to feature PRs) label on Apr 17, 2024
@vgonkivs (Member) left a comment

Great work. Thank you 🚀

@vgonkivs (Member):
cc @jcstein for visibility

@jcstein (Member) commented Apr 17, 2024

Cross-posting from slack:

This flag also appears:

  1. in the cel-key utility: https://docs.celestia.org/developers/celestia-node-key#steps-for-generating-node-keys
  2. for using a different directory than the standard dir: https://docs.celestia.org/nodes/celestia-node-troubleshooting#changing-the-location-of-your-node-store

These are the two that I found on a quick search.

What impact does this have on the cel-key utility or on someone wishing to use a different directory than the default?

@Wondertan (Member) replied:

What impact does this have on the cel-key utility or on someone wishing to use a different directory than the default?

@jcstein
No impact on cel-key, and I think, as it stands, cel-key does not integrate automatic path resolution.

Users with a non-default path have to specify it using the flag.

@jcstein (Member) commented Apr 17, 2024

Users with a non-default path have to specify it using the flag.

Got it, thanks @Wondertan. If I were running with a non-default node store, making an RPC request, I would still use the node.store flag, correct?

@Wondertan (Member) replied:

No impact on cel-key, and I think, as it stands, cel-key does not integrate automatic path resolution.

Correction after looking at the code: cel-key does have automatic path resolution, but it isn't as smart as in this PR, and this PR doesn't change it.

@Wondertan (Member) replied:

Got it, thanks @Wondertan. If I were running with a non-default node store, making an RPC request, I would still use the node.store flag, correct?

Yes

@mastergaurang94 (Contributor, Author):

Thanks @vgonkivs & @walldiss! Also, the PR will need to be labeled by someone who has access so it can be merged.

Wondertan enabled auto-merge (squash) April 17, 2024 16:57
Wondertan merged commit 29678cb into celestiaorg:main Apr 17, 2024
25 of 26 checks passed
mastergaurang94 deleted the gaurang.rpc-detect-node branch April 17, 2024 17:11
@jcstein (Member) commented Apr 17, 2024

gm @mastergaurang94 - I'm working on the docs for this. Can you help me understand which network(s) you were running node(s) on for cases 4-6?

  4. Multiple networks running, will go to mainnet before mocha
❯ celestia blob get 1318129 0x42690c204d39600fddd3 0MFhYKQUi2BU+U1jxPzG7QY2BVV1lb3kiU+zAK7nUiY=
{
 "result": "RPC client error: sendRequest failed: http status 401 Unauthorized unmarshaling response: EOF"
}

I'm assuming you had a mocha node running, the request was sent to the mainnet node, but it looks like the call failed?


  5. Multiple networks running, will go to mocha before arabica
❯ celestia blob get 1318129 0x42690c204d39600fddd3 0MFhYKQUi2BU+U1jxPzG7QY2BVV1lb3kiU+zAK7nUiY=
{
  "result": "header: given height is from the future: networkHeight: 802095, requestedHeight: 1318129"
}

I'm guessing you had mocha and arabica running?


  6. Run node with rpc config Address: 0.0.0.1 and Port: 25231 -- accurately directs request
❯ celestia blob get 1318129 0x42690c204d39600fddd3 0MFhYKQUi2BU+U1jxPzG7QY2BVV1lb3kiU+zAK7nUiY=
{
  "result": "RPC client error: sendRequest failed: Post \"http://0.0.0.1:25231\": dial tcp 0.0.0.1:25231: connect: no route to host"
}

It looks like this request fails?

@mastergaurang94 (Contributor, Author) replied:

hey @jcstein, sure:

  4. mainnet full, mocha light. Re: the failure, I confirmed the request went to the right node, not the validity of the response. The 401 showed up in the mainnet node logs (neither that blob nor the namespace exists there)
  5. yes, both mocha full & arabica light. Re: the header from the future, I had deleted my node store, so the new one hadn't caught up yet

Main thing: node types are unique. These were the cases I listed, but any variation will work.

  6. Similar to the above, the request directs to http://0.0.0.1:25231. It fails because I wasn't running anything at that address; if I was, it would have gone to that node

@jcstein (Member) commented Apr 17, 2024

Awesome, thanks for the details here @mastergaurang94.

Could you please take a look at the corresponding docs PR?

Labels: external (Issues created by non node team members), kind:feat (Attached to feature PRs)

Successfully merging this pull request may close these issues: cmd: command accessing RPC should detect running node

6 participants