Added retry mechanism in executionEngine for executePayload #3854

dadepo · 2022-03-13T01:14:15Z

Motivation

Add retries when the consensus layer (CL) calls the execution layer (EL).

codecov · 2022-03-13T01:14:23Z

Codecov Report

Merging #3854 (f8a6521) into unstable (684c2e3) will increase coverage by 36.11%.
The diff coverage is n/a.

❗ Current head f8a6521 differs from pull request most recent head 713f213. Consider uploading reports for the commit 713f213 to get more accurate results

@@              Coverage Diff              @@
##           unstable    #3854       +/-   ##
=============================================
+ Coverage          0   36.11%   +36.11%     
=============================================
  Files             0      325      +325     
  Lines             0     9043     +9043     
  Branches          0     1421     +1421     
=============================================
+ Hits              0     3266     +3266     
- Misses            0     5632     +5632     
- Partials          0      145      +145

github-actions · 2022-03-13T01:34:39Z

Performance Report

✔️ no performance regression detected

Full benchmark results

Benchmark suite	Current: `d1864c0`	Previous: `292b36d`	Ratio
getPubkeys - index2pubkey - req 1000 vs - 250000 vc	2.3306 ms/op	2.2164 ms/op	1.05
getPubkeys - validatorsArr - req 1000 vs - 250000 vc	77.744 us/op	67.183 us/op	1.16
BLS verify - blst-native	1.8497 ms/op	2.1676 ms/op	0.85
BLS verifyMultipleSignatures 3 - blst-native	3.8003 ms/op	4.4843 ms/op	0.85
BLS verifyMultipleSignatures 8 - blst-native	8.1823 ms/op	9.6956 ms/op	0.84
BLS verifyMultipleSignatures 32 - blst-native	29.665 ms/op	35.236 ms/op	0.84
BLS aggregatePubkeys 32 - blst-native	39.564 us/op	46.980 us/op	0.84
BLS aggregatePubkeys 128 - blst-native	152.06 us/op	182.79 us/op	0.83
getAttestationsForBlock	47.806 ms/op	44.068 ms/op	1.08
isKnown best case - 1 super set check	420.00 ns/op	474.00 ns/op	0.89
isKnown normal case - 2 super set checks	414.00 ns/op	466.00 ns/op	0.89
isKnown worse case - 16 super set checks	409.00 ns/op	464.00 ns/op	0.88
CheckpointStateCache - add get delete	9.2780 us/op	9.4910 us/op	0.98
validate gossip signedAggregateAndProof - struct	4.2945 ms/op	5.0248 ms/op	0.85
validate gossip attestation - struct	2.0359 ms/op	2.3669 ms/op	0.86
altair verifyImport mainnet_s3766816:31	8.9031 s/op	8.9948 s/op	0.99
pickEth1Vote - no votes	2.0698 ms/op	2.1036 ms/op	0.98
pickEth1Vote - max votes	21.915 ms/op	22.846 ms/op	0.96
pickEth1Vote - Eth1Data hashTreeRoot value x2048	11.878 ms/op	13.212 ms/op	0.90
pickEth1Vote - Eth1Data hashTreeRoot tree x2048	20.524 ms/op	21.857 ms/op	0.94
pickEth1Vote - Eth1Data fastSerialize value x2048	1.5257 ms/op	1.5243 ms/op	1.00
pickEth1Vote - Eth1Data fastSerialize tree x2048	17.078 ms/op	15.537 ms/op	1.10
bytes32 toHexString	1.0630 us/op	1.1950 us/op	0.89
bytes32 Buffer.toString(hex)	676.00 ns/op	822.00 ns/op	0.82
bytes32 Buffer.toString(hex) from Uint8Array	942.00 ns/op	1.0910 us/op	0.86
bytes32 Buffer.toString(hex) + 0x	688.00 ns/op	826.00 ns/op	0.83
Object access 1 prop	0.36200 ns/op	0.43400 ns/op	0.83
Map access 1 prop	0.29200 ns/op	0.30500 ns/op	0.96
Object get x1000	17.732 ns/op	11.571 ns/op	1.53
Map get x1000	1.1050 ns/op	1.1340 ns/op	0.97
Object set x1000	117.70 ns/op	89.066 ns/op	1.32
Map set x1000	71.693 ns/op	59.614 ns/op	1.20
Return object 10000 times	0.36790 ns/op	0.44360 ns/op	0.83
Throw Error 10000 times	6.0166 us/op	5.9495 us/op	1.01
enrSubnets - fastDeserialize 64 bits	2.7080 us/op	3.3000 us/op	0.82
enrSubnets - ssz BitVector 64 bits	753.00 ns/op	906.00 ns/op	0.83
enrSubnets - fastDeserialize 4 bits	384.00 ns/op	479.00 ns/op	0.80
enrSubnets - ssz BitVector 4 bits	729.00 ns/op	889.00 ns/op	0.82
prioritizePeers score -10:0 att 32-0.1 sync 2-0	95.221 us/op	95.712 us/op	0.99
prioritizePeers score 0:0 att 32-0.25 sync 2-0.25	124.24 us/op	135.24 us/op	0.92
prioritizePeers score 0:0 att 32-0.5 sync 2-0.5	214.12 us/op	247.53 us/op	0.87
prioritizePeers score 0:0 att 64-0.75 sync 4-0.75	484.23 us/op	336.05 us/op	1.44
prioritizePeers score 0:0 att 64-1 sync 4-1	464.63 us/op	407.29 us/op	1.14
RateTracker 1000000 limit, 1 obj count per request	187.22 ns/op	202.64 ns/op	0.92
RateTracker 1000000 limit, 2 obj count per request	141.82 ns/op	152.16 ns/op	0.93
RateTracker 1000000 limit, 4 obj count per request	115.58 ns/op	126.26 ns/op	0.92
RateTracker 1000000 limit, 8 obj count per request	110.57 ns/op	111.02 ns/op	1.00
RateTracker with prune	5.0390 us/op	4.8710 us/op	1.03
array of 16000 items push then shift	3.2095 us/op	51.612 us/op	0.06
LinkedList of 16000 items push then shift	29.758 ns/op	17.391 ns/op	1.71
array of 16000 items push then pop	259.22 ns/op	231.28 ns/op	1.12
LinkedList of 16000 items push then pop	22.962 ns/op	14.823 ns/op	1.55
array of 24000 items push then shift	4.5769 us/op	77.388 us/op	0.06
LinkedList of 24000 items push then shift	32.876 ns/op	22.016 ns/op	1.49
array of 24000 items push then pop	215.89 ns/op	200.67 ns/op	1.08
LinkedList of 24000 items push then pop	23.880 ns/op	16.495 ns/op	1.45
intersect bitArray bitLen 8	11.757 ns/op	11.003 ns/op	1.07
intersect array and set length 8	169.04 ns/op	160.85 ns/op	1.05
intersect bitArray bitLen 128	61.864 ns/op	55.513 ns/op	1.11
intersect array and set length 128	2.3625 us/op	2.0491 us/op	1.15
pass gossip attestations to forkchoice per slot	3.6214 ms/op	2.8437 ms/op	1.27
computeDeltas	3.9844 ms/op	3.1766 ms/op	1.25
computeProposerBoostScoreFromBalances	907.60 us/op	804.27 us/op	1.13
altair processAttestation - 250000 vs - 7PWei normalcase	3.9875 ms/op	4.0912 ms/op	0.97
altair processAttestation - 250000 vs - 7PWei worstcase	5.8850 ms/op	5.9621 ms/op	0.99
altair processAttestation - setStatus - 1/6 committees join	209.30 us/op	179.37 us/op	1.17
altair processAttestation - setStatus - 1/3 committees join	393.60 us/op	343.90 us/op	1.14
altair processAttestation - setStatus - 1/2 committees join	560.68 us/op	484.51 us/op	1.16
altair processAttestation - setStatus - 2/3 committees join	712.92 us/op	633.80 us/op	1.12
altair processAttestation - setStatus - 4/5 committees join	996.41 us/op	890.37 us/op	1.12
altair processAttestation - setStatus - 100% committees join	1.1800 ms/op	1.0869 ms/op	1.09
altair processBlock - 250000 vs - 7PWei normalcase	28.444 ms/op	24.584 ms/op	1.16
altair processBlock - 250000 vs - 7PWei normalcase hashState	41.769 ms/op	34.341 ms/op	1.22
altair processBlock - 250000 vs - 7PWei worstcase	81.984 ms/op	88.620 ms/op	0.93
altair processBlock - 250000 vs - 7PWei worstcase hashState	103.44 ms/op	97.535 ms/op	1.06
phase0 processBlock - 250000 vs - 7PWei normalcase	4.5582 ms/op	4.1568 ms/op	1.10
phase0 processBlock - 250000 vs - 7PWei worstcase	49.830 ms/op	53.595 ms/op	0.93
altair processEth1Data - 250000 vs - 7PWei normalcase	1.0605 ms/op	826.60 us/op	1.28
Tree 40 250000 create	902.48 ms/op	826.25 ms/op	1.09
Tree 40 250000 get(125000)	296.02 ns/op	246.14 ns/op	1.20
Tree 40 250000 set(125000)	2.6400 us/op	2.3106 us/op	1.14
Tree 40 250000 toArray()	32.935 ms/op	27.942 ms/op	1.18
Tree 40 250000 iterate all - toArray() + loop	34.312 ms/op	28.200 ms/op	1.22
Tree 40 250000 iterate all - get(i)	113.79 ms/op	111.33 ms/op	1.02
MutableVector 250000 create	16.278 ms/op	14.161 ms/op	1.15
MutableVector 250000 get(125000)	13.059 ns/op	10.890 ns/op	1.20
MutableVector 250000 set(125000)	684.13 ns/op	604.95 ns/op	1.13
MutableVector 250000 toArray()	7.7284 ms/op	6.7037 ms/op	1.15
MutableVector 250000 iterate all - toArray() + loop	12.687 ms/op	6.8739 ms/op	1.85
MutableVector 250000 iterate all - get(i)	3.3053 ms/op	2.6940 ms/op	1.23
Array 250000 create	6.6270 ms/op	6.6154 ms/op	1.00
Array 250000 clone - spread	2.7669 ms/op	3.5862 ms/op	0.77
Array 250000 get(125000)	1.1620 ns/op	1.6720 ns/op	0.69
Array 250000 set(125000)	1.1310 ns/op	1.6420 ns/op	0.69
Array 250000 iterate all - loop	167.87 us/op	152.96 us/op	1.10
effectiveBalanceIncrements clone Uint8Array 300000	73.143 us/op	60.893 us/op	1.20
effectiveBalanceIncrements clone MutableVector 300000	781.00 ns/op	1.0940 us/op	0.71
effectiveBalanceIncrements rw all Uint8Array 300000	252.50 us/op	247.59 us/op	1.02
effectiveBalanceIncrements rw all MutableVector 300000	184.79 ms/op	192.02 ms/op	0.96
phase0 afterProcessEpoch - 250000 vs - 7PWei	180.66 ms/op	189.44 ms/op	0.95
phase0 beforeProcessEpoch - 250000 vs - 7PWei	91.580 ms/op	81.313 ms/op	1.13
altair processEpoch - mainnet_e81889	598.85 ms/op	553.51 ms/op	1.08
mainnet_e81889 - altair beforeProcessEpoch	164.92 ms/op	127.02 ms/op	1.30
mainnet_e81889 - altair processJustificationAndFinalization	30.674 us/op	16.496 us/op	1.86
mainnet_e81889 - altair processInactivityUpdates	12.051 ms/op	9.1157 ms/op	1.32
mainnet_e81889 - altair processRewardsAndPenalties	99.154 ms/op	82.473 ms/op	1.20
mainnet_e81889 - altair processRegistryUpdates	4.9400 us/op	2.7910 us/op	1.77
mainnet_e81889 - altair processSlashings	1.2420 us/op	676.00 ns/op	1.84
mainnet_e81889 - altair processEth1DataReset	1.4720 us/op	657.00 ns/op	2.24
mainnet_e81889 - altair processEffectiveBalanceUpdates	2.5095 ms/op	1.9856 ms/op	1.26
mainnet_e81889 - altair processSlashingsReset	8.6240 us/op	4.4220 us/op	1.95
mainnet_e81889 - altair processRandaoMixesReset	5.6050 us/op	4.6520 us/op	1.20
mainnet_e81889 - altair processHistoricalRootsUpdate	1.3250 us/op	710.00 ns/op	1.87
mainnet_e81889 - altair processParticipationFlagUpdates	4.1900 us/op	4.2690 us/op	0.98
mainnet_e81889 - altair processSyncCommitteeUpdates	744.00 ns/op	582.00 ns/op	1.28
mainnet_e81889 - altair afterProcessEpoch	192.53 ms/op	219.46 ms/op	0.88
phase0 processEpoch - mainnet_e58758	549.08 ms/op	498.87 ms/op	1.10
mainnet_e58758 - phase0 beforeProcessEpoch	251.64 ms/op	184.25 ms/op	1.37
mainnet_e58758 - phase0 processJustificationAndFinalization	27.949 us/op	18.502 us/op	1.51
mainnet_e58758 - phase0 processRewardsAndPenalties	129.73 ms/op	104.83 ms/op	1.24
mainnet_e58758 - phase0 processRegistryUpdates	13.852 us/op	9.1840 us/op	1.51
mainnet_e58758 - phase0 processSlashings	1.0450 us/op	688.00 ns/op	1.52
mainnet_e58758 - phase0 processEth1DataReset	994.00 ns/op	646.00 ns/op	1.54
mainnet_e58758 - phase0 processEffectiveBalanceUpdates	2.1173 ms/op	1.9674 ms/op	1.08
mainnet_e58758 - phase0 processSlashingsReset	5.0650 us/op	4.8380 us/op	1.05
mainnet_e58758 - phase0 processRandaoMixesReset	5.8210 us/op	4.2930 us/op	1.36
mainnet_e58758 - phase0 processHistoricalRootsUpdate	887.00 ns/op	713.00 ns/op	1.24
mainnet_e58758 - phase0 processParticipationRecordUpdates	6.0710 us/op	4.5490 us/op	1.33
mainnet_e58758 - phase0 afterProcessEpoch	157.25 ms/op	163.05 ms/op	0.96
phase0 processEffectiveBalanceUpdates - 250000 normalcase	2.6486 ms/op	1.9893 ms/op	1.33
phase0 processEffectiveBalanceUpdates - 250000 worstcase 0.5	2.9990 ms/op	2.2588 ms/op	1.33
altair processInactivityUpdates - 250000 normalcase	43.238 ms/op	34.449 ms/op	1.26
altair processInactivityUpdates - 250000 worstcase	42.755 ms/op	36.828 ms/op	1.16
phase0 processRegistryUpdates - 250000 normalcase	8.0430 us/op	6.1030 us/op	1.32
phase0 processRegistryUpdates - 250000 badcase_full_deposits	461.25 us/op	373.94 us/op	1.23
phase0 processRegistryUpdates - 250000 worstcase 0.5	221.16 ms/op	182.90 ms/op	1.21
altair processRewardsAndPenalties - 250000 normalcase	93.474 ms/op	113.84 ms/op	0.82
altair processRewardsAndPenalties - 250000 worstcase	120.48 ms/op	77.572 ms/op	1.55
phase0 getAttestationDeltas - 250000 normalcase	13.269 ms/op	13.250 ms/op	1.00
phase0 getAttestationDeltas - 250000 worstcase	12.796 ms/op	14.194 ms/op	0.90
phase0 processSlashings - 250000 worstcase	5.3566 ms/op	5.2039 ms/op	1.03
altair processSyncCommitteeUpdates - 250000	277.76 ms/op	298.35 ms/op	0.93
BeaconState.hashTreeRoot - No change	471.00 ns/op	537.00 ns/op	0.88
BeaconState.hashTreeRoot - 1 full validator	63.223 us/op	72.511 us/op	0.87
BeaconState.hashTreeRoot - 32 full validator	635.90 us/op	720.24 us/op	0.88
BeaconState.hashTreeRoot - 512 full validator	5.8907 ms/op	8.8544 ms/op	0.67
BeaconState.hashTreeRoot - 1 validator.effectiveBalance	78.828 us/op	90.601 us/op	0.87
BeaconState.hashTreeRoot - 32 validator.effectiveBalance	1.1649 ms/op	1.2572 ms/op	0.93
BeaconState.hashTreeRoot - 512 validator.effectiveBalance	15.895 ms/op	17.606 ms/op	0.90
BeaconState.hashTreeRoot - 1 balances	58.422 us/op	70.232 us/op	0.83
BeaconState.hashTreeRoot - 32 balances	568.56 us/op	647.20 us/op	0.88
BeaconState.hashTreeRoot - 512 balances	6.4830 ms/op	6.3652 ms/op	1.02
BeaconState.hashTreeRoot - 250000 balances	90.537 ms/op	103.61 ms/op	0.87
aggregationBits - 2048 els - zipIndexesInBitList	32.002 us/op	34.338 us/op	0.93
regular array get 100000 times	67.467 us/op	60.626 us/op	1.11
wrappedArray get 100000 times	67.448 us/op	60.709 us/op	1.11
arrayWithProxy get 100000 times	28.862 ms/op	29.090 ms/op	0.99
ssz.Root.equals	473.00 ns/op	580.00 ns/op	0.82
byteArrayEquals	448.00 ns/op	585.00 ns/op	0.77
shuffle list - 16384 els	10.995 ms/op	11.332 ms/op	0.97
shuffle list - 250000 els	161.98 ms/op	167.51 ms/op	0.97
processSlot - 1 slots	12.025 us/op	13.741 us/op	0.88
processSlot - 32 slots	1.7401 ms/op	1.9738 ms/op	0.88
getEffectiveBalanceIncrementsZeroInactive - 250000 vs - 7PWei	963.21 us/op	392.94 us/op	2.45
getCommitteeAssignments - req 1 vs - 250000 vc	5.2777 ms/op	5.3968 ms/op	0.98
getCommitteeAssignments - req 100 vs - 250000 vc	7.3146 ms/op	7.8735 ms/op	0.93
getCommitteeAssignments - req 1000 vs - 250000 vc	7.7576 ms/op	8.4522 ms/op	0.92
computeProposers - vc 250000	18.454 ms/op	19.129 ms/op	0.96
computeEpochShuffling - vc 250000	165.80 ms/op	171.22 ms/op	0.97
getNextSyncCommittee - vc 250000	269.97 ms/op	293.24 ms/op	0.92

by benchmarkbot/action

g11tech · 2022-03-13T07:12:32Z

packages/cli/src/options/beaconNodeOptions/execution.ts

+    description: "Delay time between retries when retrying calls to the execution engine API",
+    type: "number",
+    defaultDescription:
+      defaultOptions.executionEngine.mode === "http" ? String(defaultOptions.executionEngine.retryDelay) : "0",


Do we really need to tailor-make it for http? Even if there were ever a non-http mode (like ws), this would stay the same, I believe.

This was done to keep it consistent with the existing execution.urls and execution.timeout options, which are tailor-made for http.

I would just do

defaultDescription: String(defaultOptions.executionEngine.retryDelay)

I think TypeScript isn't happy otherwise (it mixes it up with the mock options), so I guess this is alright!
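The TypeScript complaint can be illustrated with a minimal sketch. It assumes, hypothetically, that the execution engine options are a union discriminated by `mode`, where only the http variant declares `retryDelay`. The type names and values below are illustrative, not Lodestar's actual definitions.

```typescript
// Hypothetical option shapes: only the "http" variant declares retry settings.
interface ExecutionEngineHttpOpts {
  mode: "http";
  urls: string[];
  retryAttempts: number;
  retryDelay: number; // ms
}

interface ExecutionEngineMockOpts {
  mode: "mock";
  genesisBlockHash: string;
}

type ExecutionEngineOpts = ExecutionEngineHttpOpts | ExecutionEngineMockOpts;

// Writing String(opts.retryDelay) directly is a compile error, because
// retryDelay does not exist on ExecutionEngineMockOpts. Checking `mode`
// first narrows the union to the http variant.
function retryDelayDescription(opts: ExecutionEngineOpts): string {
  return opts.mode === "http" ? String(opts.retryDelay) : "0";
}

// Illustrative values only, not real defaults.
const httpOpts: ExecutionEngineOpts = {
  mode: "http",
  urls: ["http://localhost:8550"],
  retryAttempts: 3,
  retryDelay: 2000,
};
const mockOpts: ExecutionEngineOpts = {mode: "mock", genesisBlockHash: "0x00"};

console.log(retryDelayDescription(httpOpts)); // prints 2000
console.log(retryDelayDescription(mockOpts)); // prints 0
```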

dapplion · 2022-03-14T04:42:18Z

packages/lodestar/src/executionEngine/http.ts

-      // treated seperate from being INVALID. For now, just pass the error upstream.
-      .catch((e: Error): EngineApiRpcReturnTypes[typeof method] => {
+      // treated separate from being INVALID. For now, just pass the error upstream.
+      .catch(async (e: Error) => {


I'm not sure that's allowed. Can a catch callback handle rejections without causing an unhandled promise rejection?
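On the question itself: passing an async callback to `.catch()` is valid. The promise returned by `.catch()` adopts the outcome of the callback's promise, so a throw inside the callback rejects the chain, and that rejection is handled by whoever awaits the chain. A minimal sketch, where `failingCall` is a hypothetical stand-in for the JSON-RPC request:

```typescript
// Hypothetical stand-in for a JSON-RPC request to the EL that fails.
async function failingCall(): Promise<string> {
  throw new Error("connection refused");
}

// The async callback returns a promise; the promise returned by .catch()
// adopts its outcome. A throw inside the callback therefore rejects the
// chain, and that rejection is handled by whoever awaits the chain.
function callWithAsyncCatch(): Promise<string> {
  return failingCall().catch(async (e: Error) => {
    // A real handler could await a retry here; this one rethrows a wrapped error.
    throw new Error(`wrapped: ${e.message}`);
  });
}

async function demoAsyncCatch(): Promise<void> {
  try {
    await callWithAsyncCatch();
  } catch (e) {
    console.log((e as Error).message); // prints wrapped: connection refused
  }
}

void demoAsyncCatch();
```

If nothing awaits or attaches a handler to the promise returned by `callWithAsyncCatch()`, the rejection is reported as unhandled, exactly as it would be for a non-async callback that throws.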

dapplion · 2022-03-14T04:44:32Z

packages/lodestar/src/executionEngine/http.ts

+        retryAttempts: this.retryAttempts,
+        retryDelay: this.retryDelay,
+      }
+    );


Here it would be very important to track retry behaviour, i.e. to add metrics for:

histogram of retry attempts per call

histogram of overall request times to the EL

So, to de-duplicate code, you can add a private method fetchEl that handles metrics and calls fetchWithRetries. You can even move the logic in fetchWithRetries here, since no one else is using it.
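A minimal sketch of such a fetchEl helper, recording both suggested histograms (attempts per call, and overall request time). `fetchWithRetries` here is a simplified stand-in rather than the PR's implementation, and `Histogram`/`InMemoryHistogram` are hypothetical substitutes for a real metrics library such as prom-client.

```typescript
// Hypothetical stand-in for a histogram metric; prom-client's Histogram has a
// comparable observe(labels, value) method.
interface Histogram {
  observe(labels: {method: string}, value: number): void;
}

// In-memory histogram that just keeps raw samples (useful in tests).
class InMemoryHistogram implements Histogram {
  readonly samples: {method: string; value: number}[] = [];
  observe(labels: {method: string}, value: number): void {
    this.samples.push({method: labels.method, value});
  }
}

interface RetryOpts {
  retryAttempts: number; // total attempts, including the first one
  retryDelay: number; // ms to wait between attempts
}

// Simplified stand-in for fetchWithRetries.
async function fetchWithRetries<T>(fn: () => Promise<T>, opts: RetryOpts): Promise<T> {
  let lastError: unknown = new Error("RetryError");
  for (let attempt = 1; attempt <= opts.retryAttempts; attempt++) {
    try {
      return await fn();
    } catch (e) {
      lastError = e;
      if (attempt < opts.retryAttempts) {
        await new Promise<void>((resolve) => setTimeout(resolve, opts.retryDelay));
      }
    }
  }
  throw lastError;
}

class ExecutionEngineHttpSketch {
  private readonly retryOpts: RetryOpts;
  private readonly attemptsPerCall: Histogram;
  private readonly requestTimeSec: Histogram;

  constructor(retryOpts: RetryOpts, attemptsPerCall: Histogram, requestTimeSec: Histogram) {
    this.retryOpts = retryOpts;
    this.attemptsPerCall = attemptsPerCall;
    this.requestTimeSec = requestTimeSec;
  }

  // The private helper suggested in review: wraps every EL call, records
  // metrics, and delegates the retry loop to fetchWithRetries.
  async fetchEl<T>(method: string, request: () => Promise<T>): Promise<T> {
    const start = Date.now();
    let attempts = 0;
    try {
      return await fetchWithRetries(() => {
        attempts++;
        return request();
      }, this.retryOpts);
    } finally {
      // Observed for successful and failed calls alike.
      this.attemptsPerCall.observe({method}, attempts);
      this.requestTimeSec.observe({method}, (Date.now() - start) / 1000);
    }
  }
}
```

Observing in `finally` keeps failed calls in both histograms too, which is what makes an unavailable EL visible in the metrics.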

You can even move the logic in fetchWithRetries here, since no one else is using it.

I decided to leave fetchWithRetries as is because, as mentioned by @g11tech here, it makes it easier to test the retry mechanism separately.

@dapplion, can you help clarify what you meant by histogram of retry attempts per call? I interpreted that to mean another histogram that captures the duration of each retry, while @g11tech interpreted it to mean a counter of the attempts made for each request.

Also, I was wondering: this suggestion only focuses on adding metrics to the retries done in executionEngine/http, while JsonRpcHttpClient is also used elsewhere, for example in eth1Provider.ts. Would it be a good idea to move metrics tracking entirely into JsonRpcHttpClient? Then clients that use JsonRpcHttpClient would not need separate metrics; they would be generated automatically for every request made through it.

If so, I can do that in another PR. Let me know what you think.

Actually, it might be a good idea to add metrics to JsonRpcHttpClient. eth1 calls that don't go to the execution engine should have metrics too.

dapplion · 2022-03-14T04:47:43Z

packages/lodestar/src/eth1/provider/jsonRpcHttpClient.ts

+        shouldRetry: opts?.shouldRetry,
+      }
+    );
+    return parseRpcResponse(res, payload);


The shouldRetry logic here should be based exclusively on networking errors: you only want to retry when the EL is unavailable. It's a bit tough here because the underlying HTTP client can change, so you should add a lot of unit (or e2e) tests that spin up a real server and try different things like:

bad URL

bad port

NGINX refusing to forward

etc.

And ensure that only those are retried. Alternatively, you can try to detect when a response actually comes from an EL client, and skip retrying only when you know the error is an "app layer" EL error.

The shouldRetry logic here should be based exclusively on networking errors: you only want to retry when the EL is unavailable

I believe this is the case already. Line 94 (parseRpcResponse) is only reached if the call returns 200; parseRpcResponse then parses the response and throws ErrorJsonRpcResponse. The request is retried only when the call fails for any other reason (the response is not ok, i.e. network errors).
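The flow described above can be sketched as a classification step followed by a retry predicate. The `RpcFailure` union and both function names are hypothetical; the sketch only encodes the rule that a 200 response carrying a JSON-RPC error is final, while anything that prevents the EL from answering is retryable.

```typescript
// Hypothetical classification of why an EL request failed.
type RpcFailure =
  | {kind: "network"; code: string} // no HTTP response at all, e.g. ECONNREFUSED
  | {kind: "httpStatus"; status: number} // e.g. a reverse proxy answering 502
  | {kind: "jsonRpcError"; code: number; message: string}; // the EL answered with an error

interface JsonRpcBody {
  result?: unknown;
  error?: {code: number; message: string};
}

// Only a 200 response is parsed as JSON-RPC; any other status means the
// request never got a usable answer from the EL. Returns null on success.
function findFailure(status: number, body: JsonRpcBody): RpcFailure | null {
  if (status !== 200) return {kind: "httpStatus", status};
  if (body.error) return {kind: "jsonRpcError", code: body.error.code, message: body.error.message};
  return null;
}

// Retry only when the EL looks unavailable. A JSON-RPC error means the EL
// received and processed the request, so resending it will not help.
function shouldRetry(failure: RpcFailure): boolean {
  return failure.kind !== "jsonRpcError";
}
```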

So you should add a lot of unit (or e2e) tests that spin up a real server

I'll add these

g11tech · 2022-03-14T07:42:24Z

packages/lodestar/test/unit/executionEngine/httpRetry.test.ts

+  describe("getPayload", async () => {
+    it("getPayload with retry", async function () {
+      this.timeout("10 min");
+      /**
+       * curl -X POST -H "Content-Type: application/json" --data '{"jsonrpc":"2.0","method":"engine_getPayload","params":["0x0"],"id":67}' http://localhost:8545
+       */


You don't need to rewrite all the execution API cases, as it will be difficult to maintain/update any changes in two places. Rather, just test the fetchWithRetries functionality exhaustively against hypothetical requests and responses (in the same way it's happening here); basically, write its test cases independently of the execution API logic.
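Testing the retry helper independently of the execution API can be sketched with a fake request that replays a scripted sequence of outcomes. Both `fetchWithRetries` and `scriptedRequest` below are simplified, hypothetical stand-ins, and the `shouldRetry` rule (treat an "app error" message as non-retryable) is an arbitrary choice for the test.

```typescript
interface RetryOpts {
  retryAttempts: number; // total attempts, including the first one
  retryDelay: number; // ms
  shouldRetry?: (err: Error) => boolean;
}

// Simplified stand-in for the helper under test.
async function fetchWithRetries<T>(fn: () => Promise<T>, opts: RetryOpts): Promise<T> {
  let lastError: Error = new Error("RetryError");
  for (let attempt = 1; attempt <= opts.retryAttempts; attempt++) {
    try {
      return await fn();
    } catch (e) {
      lastError = e as Error;
      const retryable = opts.shouldRetry ? opts.shouldRetry(lastError) : true;
      if (!retryable || attempt === opts.retryAttempts) break;
      await new Promise<void>((resolve) => setTimeout(resolve, opts.retryDelay));
    }
  }
  throw lastError;
}

// Fake request replaying scripted outcomes; the last outcome repeats forever.
function scriptedRequest(outcomes: Array<Error | string>): {request: () => Promise<string>; calls: () => number} {
  let i = 0;
  return {
    request: async () => {
      const outcome = outcomes[Math.min(i, outcomes.length - 1)];
      i++;
      if (outcome instanceof Error) throw outcome;
      return outcome;
    },
    calls: () => i,
  };
}

async function expectRejects(p: Promise<unknown>): Promise<void> {
  try {
    await p;
  } catch {
    return;
  }
  throw new Error("expected rejection");
}

async function checkRetryBehaviour(): Promise<void> {
  const opts = {retryAttempts: 3, retryDelay: 1, shouldRetry: (e: Error) => e.message !== "app error"};

  // 1. Transient failures are retried until success.
  const flaky = scriptedRequest([new Error("ECONNRESET"), new Error("ECONNRESET"), "ok"]);
  if ((await fetchWithRetries(flaky.request, opts)) !== "ok" || flaky.calls() !== 3) throw new Error("case 1");

  // 2. Persistent failures stop after retryAttempts calls.
  const down = scriptedRequest([new Error("ECONNREFUSED")]);
  await expectRejects(fetchWithRetries(down.request, opts));
  if (down.calls() !== 3) throw new Error("case 2");

  // 3. Non-retryable errors are surfaced after a single call.
  const appErr = scriptedRequest([new Error("app error")]);
  await expectRejects(fetchWithRetries(appErr.request, opts));
  if (appErr.calls() !== 1) throw new Error("case 3");
}
```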

What do you say to removing the tests as you suggested but leaving just:

notifyForkchoiceUpdate no retry when no pay load attributes and

notifyForkchoiceUpdate with retry when pay load attributes tests?

as these are specific to how the node calls the EL, and to when those calls should or should not be retried.
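The behaviour those two tests pin down can be sketched as choosing retry options per call. This assumes `retryAttempts` counts total attempts (so 1 means no retry), and all names are hypothetical. One plausible rationale: when payload attributes are sent, the EL's forkchoiceUpdated reply carries the payloadId the CL later needs to fetch the payload for a block proposal, so a transient failure is worth retrying.

```typescript
interface RetryOpts {
  retryAttempts: number; // assumed: total attempts, so 1 means "no retry"
  retryDelay: number; // ms
}

// Fields follow the engine API PayloadAttributesV1, with simplified types.
interface PayloadAttributes {
  timestamp: number;
  prevRandao: string;
  suggestedFeeRecipient: string;
}

const NO_RETRY: RetryOpts = {retryAttempts: 1, retryDelay: 0};

// Hypothetical helper: retry forkchoiceUpdated only when payload attributes
// are present, i.e. when the CL is asking the EL to start building a payload.
function forkchoiceUpdateRetryOpts(defaults: RetryOpts, payloadAttributes?: PayloadAttributes): RetryOpts {
  return payloadAttributes !== undefined ? defaults : NO_RETRY;
}
```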

OK, let's leave it at that, no need to remove them 👍

dapplion · 2022-03-23T04:39:46Z

docker/grafana/datasource.local.yml

@@ -4,6 +4,6 @@ datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
-    url: http://localhost:9090
+    url: http://host.docker.internal:9090


Why is localhost:9090 not reachable on macOS?

I think that has to do with the fact that Docker on macOS runs on top of a Linux VM and does not have direct access to the host, which makes it impossible to use localhost to refer to the actual host.

Then I would move these changes to another PR to keep this one scoped to dadepo/retry-el-executepayload.

This change was made because of the request to add metrics to the retry mechanism: it makes it easier to see the metrics via the local Prometheus/Grafana setup, instead of having to first push and run on a Linux server.

But I can move it to a separate PR as you suggested and just use the /metrics endpoint exposed by the node.

g11tech · 2022-03-24T19:04:03Z

@dadepo @dapplion I'm doing a bit of refactoring: moving metrics inside the JSON-RPC client and adding a dashboard. Hopefully things will become cleaner and abstracted away from the execution engine's point of view!

dapplion · 2022-03-25T07:09:14Z

packages/lodestar/src/executionEngine/http.ts

@@ -113,6 +126,9 @@ export class ExecutionEngineHttp implements IExecutionEngine {
        {
          retryAttempts: this.retryAttempts,
          retryDelay: this.retryDelay,
+          onEachRetryFn: () => {
+            this?.metrics?.executionEngineRequestCount.inc({method});


Can this be undefined?

I am removing this because I am moving metrics inside the JSON-RPC client. Also, the intent is to capture the retry count, not the total count.

g11tech · 2022-03-25T11:31:23Z

grafana dashboard cc: @dadepo @dapplion :

dadepo · 2022-03-25T14:49:48Z

grafana dashboard cc: @dadepo @dapplion :

Graph looks good. I just noticed that engine_getPayloadV1 is missing from the screenshot. Is it actually missing, or was it just not part of the screenshot?

g11tech · 2022-03-25T14:52:45Z

grafana dashboard cc: @dadepo @dapplion :

Graph looks good. I just noticed that engine_getPayloadV1 is missing from the screenshot. Is it actually missing, or was it just not part of the screenshot?

That method is called when a validator builds and proposes a block; this is just from my local setup, without a validator, on the Kiln network.

dadepo · 2022-04-28T10:10:18Z

@dadepo @dapplion I'm doing a bit of refactoring: moving metrics inside the JSON-RPC client and adding a dashboard. Hopefully things will become cleaner and abstracted away from the execution engine's point of view!

Hi @dapplion. Is this PR fine by you, and can it be merged? Or are there things you would still like improved?

dapplion · 2022-05-09T14:41:08Z

packages/lodestar/src/util/retry.ts

@@ -28,7 +28,7 @@ export async function retry<A>(fn: (attempt: number) => A | Promise<A>, opts?: I
  const shouldRetry = opts?.shouldRetry;

  let lastError: Error = Error("RetryError");
-  for (let i = 1; i <= maxRetries; i++) {
+  for (let i = 0; i < maxRetries; i++) {


Warning! This change can break all existing usages of retry downstream! Please review carefully.

Hi @g11tech, can you help look into this? Since you made the modification, it is probably faster for you to share the reasoning behind the change. As far as I can see, everything still looks fine.

Hey @dadepo, I think I did it so as not to count the first attempt as a retry. You can revert it and maybe fix any test that fails because of the retry-count comparison.
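One way to reconcile both concerns, sketched under assumptions: keep the loop and the `attempt` value passed to `fn` 1-based as before (assuming `fn` receives the loop index), and count retries through a separate hook that only fires from the second attempt on. The option names (`retries`, `retryDelay`, `onEachRetry`) and the default of 5 attempts are hypothetical, not the actual `retry` signature.

```typescript
// Hypothetical options; names and defaults are illustrative.
interface RetryOptions {
  retries?: number; // total attempts, including the first (default 5 here)
  retryDelay?: number; // ms to wait before each retry
  onEachRetry?: (attempt: number) => void; // fires for attempts 2..n only
}

async function retry<A>(fn: (attempt: number) => A | Promise<A>, opts?: RetryOptions): Promise<A> {
  const maxRetries = opts?.retries ?? 5;
  const retryDelay = opts?.retryDelay ?? 0;

  let lastError: Error = Error("RetryError");
  // The loop stays 1-based, so the `attempt` values seen by existing
  // callers do not shift.
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    if (attempt > 1) {
      // Counting here excludes the first attempt from the retry count.
      opts?.onEachRetry?.(attempt);
      if (retryDelay > 0) await new Promise<void>((resolve) => setTimeout(resolve, retryDelay));
    }
    try {
      return await fn(attempt);
    } catch (e) {
      lastError = e as Error;
    }
  }
  throw lastError;
}
```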

dapplion · 2022-05-09T14:55:50Z

Hi @dapplion. Is this PR fine by you, and can it be merged? Or are there things you would still like improved?

Overall looks good! Some changes I've made:

merged the metrics definitions into the lodestar file, no need to keep them separate
de-duplicated test code

Check the issue with the packages/lodestar/src/util/retry.ts changes and it should be good to go.

If I've broken the tests, I'd appreciate it if you could review them 🙏

wemeetagain

LGTM

dadepo requested review from wemeetagain, tuyennhv, dapplion and g11tech as code owners March 13, 2022 01:14

g11tech reviewed Mar 13, 2022

packages/cli/src/options/beaconNodeOptions/execution.ts (outdated, resolved)

g11tech reviewed Mar 13, 2022

packages/cli/src/options/beaconNodeOptions/execution.ts (outdated, resolved)

g11tech reviewed Mar 13, 2022

packages/lodestar/src/executionEngine/http.ts (outdated, resolved)

g11tech reviewed Mar 13, 2022

packages/lodestar/test/unit/executionEngine/http.test.ts (outdated, resolved)

dapplion reviewed Mar 14, 2022

dapplion suggested changes Mar 14, 2022

g11tech reviewed Mar 14, 2022

dadepo dismissed a stale review via 77276a9 March 18, 2022 04:51

dapplion reviewed Mar 23, 2022

dapplion reviewed Mar 25, 2022

g11tech dismissed a stale review via 2119b44 March 25, 2022 11:30

g11tech previously approved these changes Mar 25, 2022

philknows added the status-work-in-progress label Apr 5, 2022

g11tech mentioned this pull request Apr 22, 2022

Merge 🐼 Tracker #3945

Closed

22 tasks

dadepo dismissed stale reviews from ghost and g11tech via bfc911b April 28, 2022 09:40

dadepo requested a review from a team as a code owner April 28, 2022 09:40

dapplion suggested changes May 9, 2022

dapplion changed the base branch from master to unstable May 27, 2022 04:33

g11tech mentioned this pull request Jul 7, 2022

Lodestar sends same UNAVAILABLE block to EL at-infinitum #4251

Closed

g11tech force-pushed the dadepo/retry-el-executepayload branch from 1b34c47 to 4c23e6c July 9, 2022 12:55

philknows removed the status-work-in-progress label Jul 13, 2022

Add retry and metrics to the execution engine

e9ba839

g11tech force-pushed the dadepo/retry-el-executepayload branch from 4c23e6c to e9ba839 July 19, 2022 15:06

g11tech added 5 commits July 19, 2022 20:44

revert retry counter change

e976046

cleanup separate exec metrics

56e9f19

fix the test

9e7c634

retry fixes

3a1448b

fix the rpc fn

713f213

g11tech enabled auto-merge (squash) July 19, 2022 19:09

wemeetagain approved these changes Jul 20, 2022

g11tech requested a review from dapplion July 22, 2022 10:59

g11tech merged commit 8b3fef2 into unstable Jul 25, 2022

g11tech deleted the dadepo/retry-el-executepayload branch July 25, 2022 15:53

dapplion mentioned this pull request Oct 5, 2022

Fix fetchWithRetries typo #4642

Merged
