Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Algod: Leverage 2 SRV record providers for resolving relay addresses w/ de-duplication. #5087

Merged
merged 28 commits into from Jun 7, 2023

Conversation

gmalouf
Copy link
Contributor

@gmalouf gmalouf commented Jan 31, 2023

Summary

Introduces support for having two sets of SRV records intended to resolve to the same relay hosts. This allows for resiliency against a single DNS provider failing.

As a part of this, we are introducing the concept of primary and backup SRV bootstrap entries. For these, one can provide a deduplication pattern for merging between them:

<network>.algorand0.network?backup=<network>.algorand.network&dedup=<name>.(algorand-<network>|n-<network>.algorand0).network

Today, we support providing multiple independent bootstrap entries such as

<network>.algorand.network;<network>.myawesomebootstraponalgorand.com

In these cases, no deduplication is attempted unless resolved relay names are identical. The equivalent of that with our new backup support looks like:

<network>.algorand0.network?backup=<network>.algorand.network&dedup=<name>.(algorand-<network>|n-<network>.algorand0).network;<network>.myawesomebootstraponalgorand.com

Test Plan

  • Several new unit test cases were introduced, leveraging the Rapid library
  • We ran the node with several combinations of bootstrap ids and networks as laid out in this sheet

@codecov
Copy link

codecov bot commented Jan 31, 2023

Codecov Report

Merging #5087 (06a432d) into master (6b712a1) will increase coverage by 0.07%.
The diff coverage is 80.50%.

@@            Coverage Diff             @@
##           master    #5087      +/-   ##
==========================================
+ Coverage   55.46%   55.54%   +0.07%     
==========================================
  Files         447      448       +1     
  Lines       63290    63360      +70     
==========================================
+ Hits        35103    35191      +88     
+ Misses      25807    25787      -20     
- Partials     2380     2382       +2     
Impacted Files Coverage Δ
cmd/algod/main.go 0.00% <0.00%> (ø)
cmd/algoh/main.go 0.50% <0.00%> (-0.01%) ⬇️
cmd/catchupsrv/download.go 14.70% <0.00%> (ø)
tools/network/telemetryURIUpdateService.go 31.14% <60.00%> (ø)
network/wsNetwork.go 72.90% <79.16%> (+2.43%) ⬆️
config/dnsbootstrap.go 100.00% <100.00%> (ø)
config/localTemplate.go 70.76% <100.00%> (+10.16%) ⬆️

... and 10 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@urtho
Copy link

urtho commented Mar 14, 2023

This feature allows for user friendly way of booting off off two separate lists - not only primary/secondary.
The merger of non-overlapping entries allows for a nice scenarios where node connects to the primary relay list as well as chosen "community run" list.

It would be even better to treat 3rd and following boostrapid entries as secondary to allow the merger of even more relay sets.
Communities could enrich the primary list (or primary+secondary list) with multiple sets they trust.

network/wsNetwork.go Outdated Show resolved Hide resolved
network/wsNetwork.go Outdated Show resolved Hide resolved
network/wsNetwork_test.go Outdated Show resolved Hide resolved
network/wsNetwork_test.go Outdated Show resolved Hide resolved
Comment on lines 3828 to 3850
dedupedRelayDomains = removeDuplicateStr(relayDomains)

relayPeers = netA.GetPeers(PeersPhonebookRelays)
assert.Equal(t, len(dedupedRelayDomains), len(relayPeers))

relayAddrs = make([]string, len(relayPeers))
for pi, peer := range relayPeers {
relayAddrs[pi] = peer.(HTTPPeer).GetAddress()
}
sort.Strings(relayAddrs)
sort.Strings(dedupedRelayDomains)
assert.Equal(t, dedupedRelayDomains, relayAddrs)

archivePeers = netA.GetPeers(PeersPhonebookArchivers)
assert.Equal(t, len(dedupedArchiveDomains), len(archivePeers))

archiveAddrs = make([]string, len(archivePeers))
for pi, peer := range archivePeers {
archiveAddrs[pi] = peer.(HTTPPeer).GetAddress()
}
sort.Strings(archiveAddrs)
sort.Strings(dedupedArchiveDomains)
assert.Equal(t, dedupedArchiveDomains, archiveAddrs)
Copy link
Contributor

@winder winder Mar 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd wrap this whole check in an inline function. It was not clear to me at first that half of this test was the assertion logic and this could help.

@gmalouf gmalouf force-pushed the multi-dns-mappings branch 2 times, most recently from 5961d55 to fc99282 Compare May 12, 2023 04:38
@gmalouf gmalouf changed the title WIP: Leverage 2 SRV record providers for resolving relay addresses w/ de-duplication. Leverage 2 SRV record providers for resolving relay addresses w/ de-duplication. May 22, 2023
@gmalouf gmalouf requested review from onetechnical and removed request for algocce May 22, 2023 16:38
@gmalouf gmalouf changed the title Leverage 2 SRV record providers for resolving relay addresses w/ de-duplication. Algod: Leverage 2 SRV record providers for resolving relay addresses w/ de-duplication. May 22, 2023
@gmalouf gmalouf marked this pull request as ready for review May 22, 2023 16:40
gmalouf and others added 17 commits June 7, 2023 10:08
….net as backup, with dedup pattern between resolved addresses from the two.
… in-depth description of how ParseDNSBootstrap behaves.
…wsNetwork_test.go

Co-authored-by: Will Winder <wwinder.unh@gmail.com>
Co-authored-by: Will Winder <wwinder.unh@gmail.com>
Co-authored-by: Will Winder <wwinder.unh@gmail.com>
Co-authored-by: cce <51567+cce@users.noreply.github.com>
… validate and just parse flavors. The Validate function is now called during bootstrapping in both the algod and algoh main functions (where errors stop start up).
…Template. In practice, there is no reason to vary whether or not we stop in validate vs 'just read' modes.
GenesisID: genesisID,
NetworkID: networkID,
nodeInfo: nodeInfo,
resolveSRVRecords: tools_network.ReadFromSRV,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 nice

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@gmalouf gmalouf requested a review from cce June 7, 2023 14:28
@@ -23,13 +23,16 @@ import (
"encoding/binary"
"encoding/json"
"fmt"
"github.com/algorand/go-algorand/internal/rapidgen"
Copy link
Contributor

@cce cce Jun 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when I resurrect #3734 the gci tool can auto-organize all the imports across the codebase, there used to be a preferred structure for import sections but we have drifted a lot

Copy link
Contributor

@winder winder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@gmalouf gmalouf merged commit 9f8cc8d into algorand:master Jun 7, 2023
24 checks passed
@gmalouf gmalouf deleted the multi-dns-mappings branch June 7, 2023 18:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants