Trainer register etcd #3053

Closed

Contributor

@typhoonzero typhoonzero commented Jul 25, 2017

Fix #3051

Contributor

@gongweibao gongweibao left a comment

Just a comment:
Without unit tests, how can we be sure this is correct?

@typhoonzero
Contributor Author

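There is currently no good way to unit test the etcd client; we can only rely on integration tests or e2e tests. If you know a good way to test it, PRs are welcome~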

}

ctx, cancel := context.WithTimeout(context.Background(), timeout)
_, err = c.Put(ctx, DefaultTrainerPath+"/"+clientUUID.String(), trainerIP, clientv3.WithLease(resp.ID))
Contributor

How about writing the pass status into etcd, as follows?

type TrainerStatus struct {
    trainerIP string
    pass_num int
    pass_status string
}
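
A minimal sketch of how such a status could be turned into an etcd value, assuming JSON encoding (the proposal does not name an encoding). The fields are exported here because `encoding/json` skips unexported fields; the JSON keys and the `encodeStatus` helper are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TrainerStatus mirrors the proposed struct, with exported fields so that
// encoding/json can see them. The JSON keys are illustrative only.
type TrainerStatus struct {
	TrainerIP  string `json:"trainer_ip"`
	PassNum    int    `json:"pass_num"`
	PassStatus string `json:"pass_status"`
}

// encodeStatus (hypothetical helper) returns the string that could be
// passed as the value argument of an etcd Put.
func encodeStatus(s TrainerStatus) (string, error) {
	b, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	v, err := encodeStatus(TrainerStatus{TrainerIP: "10.0.0.1", PassNum: 2, PassStatus: "Finished"})
	if err != nil {
		panic(err)
	}
	fmt.Println(v)
}
```

The resulting string would take the place of `trainerIP` in the `Put` call under review, if the status were stored under the same key.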

How the master checks pass status:

  1. The master checks each trainer's pass status:
    1.1. If all trainers have finished the pass, go to 2.
    1.2. Otherwise, go to 1.
  2. The master moves the tasks from the Done queue back to the Todo queue.

How a trainer checks pass status:

  1. Fetch a task from the master:
    1.1. If NoTaskFound, update the status to Finished and go to 2.
    1.2. Otherwise, train on the task, then go to 1.
  2. If pass_num == finished_pass_num, finish training; otherwise go to 1.

@helinwang
Contributor

@typhoonzero @gongweibao Now that we have glide, we can use https://godoc.org/github.com/coreos/etcd/embed. PRs are welcome!

@helinwang
Contributor

Maybe we don't need this PR yet, given the latest change in #2948? If so, we could close it and reopen it when needed :)

@typhoonzero
Contributor Author

Closing, reopen when needed.

@typhoonzero typhoonzero deleted the trainer_register_etcd branch August 11, 2017 06:54
4 participants