Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core/scheduler: add duty query method #384

Merged
merged 10 commits into from
Apr 12, 2022

Conversation

leolara
Copy link
Contributor

@leolara leolara commented Apr 7, 2022

Add a method to the scheduler to query a duty arguments once it has been resolved

category: feature
ticket: #352

@leolara leolara requested a review from corverroos April 7, 2022 13:29
@codecov
Copy link

codecov bot commented Apr 7, 2022

Codecov Report

Merging #384 (333f922) into main (39c2aa8) will decrease coverage by 0.12%.
The diff coverage is 80.64%.

❗ Current head 333f922 differs from pull request most recent head c1af5d0. Consider uploading reports for the commit c1af5d0 to get more accurate results

@@            Coverage Diff             @@
##             main     #384      +/-   ##
==========================================
- Coverage   56.93%   56.81%   -0.13%     
==========================================
  Files          63       63              
  Lines        5605     5506      -99     
==========================================
- Hits         3191     3128      -63     
+ Misses       1993     1965      -28     
+ Partials      421      413       -8     
Impacted Files Coverage Δ
core/scheduler/scheduler.go 73.92% <80.64%> (-0.11%) ⬇️
core/validatorapi/router.go 61.06% <0.00%> (-3.65%) ⬇️
core/validatorapi/validatorapi.go 47.85% <0.00%> (-3.58%) ⬇️
app/app.go 65.07% <0.00%> (+0.95%) ⬆️
testutil/beaconmock/beaconmock.go 68.49% <0.00%> (+1.82%) ⬆️
core/types.go 44.77% <0.00%> (+3.10%) ⬆️
core/encode.go 64.00% <0.00%> (+5.91%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 39c2aa8...c1af5d0. Read the comment docs.

@@ -290,6 +327,32 @@ type slot struct {
SlotDuration time.Duration
}

func newSlot(ctx context.Context, eth2Cl eth2Provider, height uint64) (slot, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolveDuties doesn't need the full internal slot type, it only uses slot and epoch int64s. I feel it would be simpler to refactor resolveDuties rather than adding this which adds some code duplication (from newSlotTicker).

@@ -173,6 +209,7 @@ func (s *Scheduler) scheduleSlot(ctx context.Context, slot slot) error {
}

// resolveDuties resolves the duties for the slot's epoch, caching the results.
// Do not call if you do not hold the dutiesMutex.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you do not hold the mutex when call this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The caller holds the mutex

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment not accurate anymore, can remove

Comment on lines 161 to 162
s.dutiesMutex.Lock()
defer s.dutiesMutex.Unlock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I always decrease the time locks held to a minimum. I would strongly discourage doing network calls while a lock is held for example. Beacon API calls often take multiple seconds, during which you will be locking up validatorapi in this case (or whatever other components use GetDuty

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest adding getter/setter methods for mutex protected data. Then the logic can be lock free.

func (s *Scheduler) getFetchArgSet(duty core.Duty) core.FetchArgSet, bool {
  s.dutiesMutex.Lock()
  defer s.dutiesMutex.Unlock()
  
  return s.duties[duty]
}

func (s *Scheduler) setFetchArg(duty core.Duty, pubkey core.Pubkey, core.FetchArg) bool {
  s.dutiesMutex.Lock()
  defer s.dutiesMutex.Unlock()
  
  argSet, ok := s.duties[duty]
  if !ok {
    argSet = make(core.FetchArgSet)
  }
  if _, ok := argSet[pubkey]; ok {
    return false
  }
  
  s.duties[duty] = set
  return true
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@corverroos it is not just the duties map that we are sharing, it is also the resolvedEpoch, that is why I put mutex around everything. Otherwise, we will need another mutex and another extra private methods for resolvedEpoch, that perhaps that is what you want.

I recommend a redesign, where these duties are on a external component that stores them, and also the resolved epochs

Copy link
Contributor

@corverroos corverroos Apr 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can update the resolveDuties map in setFetchArg and also add a getter resolvedEpoch with similar pattern.

Another mutex isn't required, nor a is refactor I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we add a map, I am confused why we keep resolved epochs instead of resolved slots?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I don't think I can update the resolvedEpochs or resolveSlot as an setFetchArg does not mean the whole slot is resolved.

If we move to a resolvedSlot map to bool then is ok, but I think it would be easier then to just put an empty value when a slot is resolved but there are no duties.

@@ -123,8 +125,42 @@ func (s *Scheduler) Run() error {
}
}

func (s *Scheduler) GetDuty(ctx context.Context, duty core.Duty) (core.FetchArgSet, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add a godoc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sorry about that

return s.duties[duty], nil
}

func (s *Scheduler) isEpochResolved(epoch uint64) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with your concern that this will result in skipping a bunch of duties if a VC calls some far future slot. We should refactor resolvedEpoch into a map or something

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose to refactor with an external component that has all that logic, perhaps ResolvedDutiesDB ?

Perhaps we should define:

  • Duty
  • ResolvedDuty
  • SolvedDuty

or some separated concepts and include them in the type system for clarity.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think that is necessary, we are almost there.

@leolara
Copy link
Contributor Author

leolara commented Apr 7, 2022

Let's think about this for a moment, all this is because the validatorAPI and the scheduler need to know the public key of a proposer for a given slot.

Instead of making the validtorAPI communicate with the scheduler and access some data structure that was thought to be internal of scheduler in the first time.

Lets create a new component that provides this information (and caches it if the BN cliend does not cache it) to both the scheduler and the validatorAPI.

@leolara
Copy link
Contributor Author

leolara commented Apr 8, 2022

@corverroos please check it before I add tests

return argSet, ok
}

func (s *Scheduler) setFetchArg(duty core.Duty, pubkey core.PubKey, set core.FetchArg) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not used?

@leolara
Copy link
Contributor Author

leolara commented Apr 8, 2022

@corverroos I forgot a few places to call the private methods, I think now they are all

@@ -206,7 +230,7 @@ func (s *Scheduler) resolveDuties(ctx context.Context, slot slot) error {

duty := core.Duty{Slot: int64(attDuty.Slot), Type: core.DutyAttester}

argSet, ok := s.duties[duty]
argSet, ok := s.getFetchArgSet(duty)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to call getFetchArgSet first. Just call setFetchArgSet below.

if !s.setFetchArg(duty, pubkey, arg){
  log.Debug(ctx, "Ignoring previously resolved duty", z.Any("duty", duty))
  continue
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@leolara leolara force-pushed the leolara/duty-query-scheduler branch from 612e695 to 937add0 Compare April 8, 2022 12:35
@@ -159,7 +182,7 @@ func (s *Scheduler) scheduleSlot(ctx context.Context, slot slot) error {
}

span.End()
delete(s.duties, duty)
s.deleteDuty(duty)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we delete duties after scheduling, so calls to GetDuty will error after it has been scheduled.

Suggest removing the deleteDuty function and adding a TODO:

// TODO(leo): Trim duties after some time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we had the shared component this would not be necessary

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, we will indeed not need to do this here . But we will still need to do it in DutyResolver though. So same logic is required somewhere.

Comment on lines 347 to 349
func (s *Scheduler) isEpochResolved(epoch uint64) bool {
return s.getResolvedEpoch() >= epoch
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this will return true for default s.resolvedEpoch=math.MaxUint64 before the first epoch is resolved which is technically not correct, but maybe that is not a big issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@corverroos ok, so we need to take this into account

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check now

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

Copy link
Contributor

@corverroos corverroos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@leolara
Copy link
Contributor Author

leolara commented Apr 8, 2022

Ok, I am going to add tests

@leolara leolara marked this pull request as ready for review April 11, 2022 12:24
Copy link
Contributor

@corverroos corverroos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@leolara leolara added the merge when ready Indicates bulldozer bot may merge when all checks pass label Apr 12, 2022
@obol-bulldozer obol-bulldozer bot merged commit 46ffd09 into main Apr 12, 2022
@obol-bulldozer obol-bulldozer bot deleted the leolara/duty-query-scheduler branch April 12, 2022 14:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merge when ready Indicates bulldozer bot may merge when all checks pass
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants