DeepRec supports multiple evaluators #305

Closed
cuiddyy opened this issue Jul 10, 2022 · 1 comment

Comments

@cuiddyy

cuiddyy commented Jul 10, 2022

Background

At present, DeepRec cannot evaluate very large models on a single node. Such models need multiple ps nodes to load the model and multiple workers to evaluate it in a distributed fashion. Supporting this would let DeepRec cover more scenarios.

Implementation ideas

Unlike training, evaluation does not modify the network structure to improve model accuracy. Instead, the concern is how to increase evaluation throughput and reduce evaluation latency. DeepRec already supports distributed training, and evaluation is simpler than training because it never updates the ps nodes. In the code, DeepRec first decides, based on its parameters, whether to initialize a cluster and how to initialize it.
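Because evaluation is read-only, throughput can scale by giving each evaluator a disjoint slice of the evaluation data and merging the results afterwards. The sketch below illustrates that idea in plain Python, assuming a round-robin split and a simple accuracy metric; the helper names (`shard_for_evaluator`, `evaluate_shard`, `merge_metrics`) are hypothetical and not DeepRec APIs.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical helpers for illustration only; these are not DeepRec APIs.
Example = Tuple[int, int]  # (features, label)


def shard_for_evaluator(examples: Sequence[Example], evaluator_index: int,
                        num_evaluators: int) -> List[Example]:
    """Round-robin split: each evaluator gets a disjoint slice of the eval set."""
    if not 0 <= evaluator_index < num_evaluators:
        raise ValueError("evaluator_index must be in [0, num_evaluators)")
    return list(examples[evaluator_index::num_evaluators])


def evaluate_shard(shard: Sequence[Example],
                   predict: Callable[[int], int]) -> Tuple[int, int]:
    """Read-only pass: count correct predictions; no model parameters change."""
    correct = sum(1 for features, label in shard if predict(features) == label)
    return correct, len(shard)


def merge_metrics(partials: Sequence[Tuple[int, int]]) -> float:
    """Sum raw counts from all evaluators, then divide once (size-weighted)."""
    correct = sum(c for c, _ in partials)
    total = sum(n for _, n in partials)
    return correct / total if total else 0.0


def always_zero(features: int) -> int:
    """Toy model that always predicts label 0."""
    return 0


if __name__ == "__main__":
    examples = [(x, x % 2) for x in range(10)]
    num_evaluators = 3
    partials = [evaluate_shard(shard_for_evaluator(examples, i, num_evaluators),
                               always_zero)
                for i in range(num_evaluators)]
    print(partials, merge_metrics(partials))  # [(2, 4), (1, 3), (2, 3)] 0.5
```

Merging raw counts rather than averaging per-evaluator accuracies keeps the global metric correct even when shards have different sizes.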

The system needs to support two modes of distributed evaluation with multiple evaluators:
1. Mode 1 contains ps, worker, and evaluator nodes. DeepRec already implements this mode with a single evaluator; we need to support multiple evaluators. One approach is to add the evaluators directly to the cluster initialization list in DeepRec; another is to use the tf.distribute.Strategy interface.
2. Mode 2 contains only ps and evaluator nodes. Unlike mode 1, no training is involved: an already trained offline model is loaded into the ps nodes and its performance is evaluated directly.
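The two cluster layouts can be sketched as TF_CONFIG-style descriptions. The JSON shape with `cluster` and `task` keys follows TensorFlow's usual TF_CONFIG convention, but listing several tasks under one `evaluator` job is exactly what this issue proposes, so treat that part as an assumption rather than current DeepRec behavior. The helper names and host addresses are hypothetical.

```python
import json
from typing import Dict, List, Optional

Cluster = Dict[str, List[str]]


def build_cluster(ps_hosts: List[str], evaluator_hosts: List[str],
                  worker_hosts: Optional[List[str]] = None) -> Cluster:
    """Mode 1 if worker_hosts is given (ps + worker + evaluator), else Mode 2 (ps + evaluator)."""
    if not ps_hosts:
        raise ValueError("at least one ps is needed to hold the large model")
    if not evaluator_hosts:
        raise ValueError("at least one evaluator is needed")
    cluster: Cluster = {"ps": list(ps_hosts)}
    if worker_hosts:
        cluster["worker"] = list(worker_hosts)
    # Assumption: several evaluator tasks are listed under a single "evaluator" job.
    cluster["evaluator"] = list(evaluator_hosts)
    return cluster


def tf_config_for(cluster: Cluster, task_type: str, task_index: int) -> str:
    """Serialize a TF_CONFIG-style JSON string for one task in the cluster."""
    if task_type not in cluster or not 0 <= task_index < len(cluster[task_type]):
        raise ValueError(f"no task {task_type}:{task_index} in cluster")
    return json.dumps({"cluster": cluster,
                       "task": {"type": task_type, "index": task_index}})


if __name__ == "__main__":
    ps = ["ps0:2222", "ps1:2222"]            # hypothetical hosts
    evaluators = ["eval0:2222", "eval1:2222"]
    mode1 = build_cluster(ps, evaluators, worker_hosts=["worker0:2222"])
    mode2 = build_cluster(ps, evaluators)    # offline model, no training workers
    print(tf_config_for(mode2, "evaluator", 1))
```

In this sketch the only difference between the modes is whether a `worker` job exists; each evaluator would receive its own `task.index` and could use it to pick its data shard.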

@candyzone
Collaborator

Duplicate of #306.
