This repository has been archived by the owner on Jun 6, 2024. It is now read-only.
[Azure RDMA] Merge Azure RDMA change into master branch #2091
Conversation
…t passed in (#2010)
* add avg to singlestate panel (#2006)
* Add necessary rdma enviroment in azure to restserver's yarn container startup script.
* Issue Fix
* [Doc] update job tutorial doc about minFailedTaskCount and minSucceededTaskCount (#2009)
* update job tutorial doc
* fix comment
* fix comment
* fix min succeed task count
* Issue Fix
* fix_log_path (#2012)
* Issue Fix
* Issue Fix
* Issue Fix
* Issue Fix
* issue fix
* add more node related alerts (#2008)
* update virtual cluster doc (#1991)
* update virtual cluster doc
* change vc's definition
* add description of vc capacity and availability
* fix typo
* issue fix
* issue fix
* issue fix
* issue fix
* issue fix
* issue fix
ydye requested review from xudifsd, hao1939, Gerhut, fanyangCS, wangcan0329, sterowang and mzmssg January 28, 2019 10:41
mzmssg reviewed Jan 28, 2019
hao1939 reviewed Jan 28, 2019
hao1939 reviewed Jan 28, 2019
        logger.warning("3 Times......... Sorry, we will force stopping your operation.")
        sys.exit(1)

    def run(self):
We should choose a more readable name here.
For example, it could be upload_xxx_ and so on.
I don't think so. I think a unified entry point makes more sense.
`run` is too common a name and should be avoided.
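The two positions in this thread are not mutually exclusive. As a minimal sketch (not the code from this PR), a class can keep a single `run` entry point while the actual work lives in a descriptively named method. The class name `MaintenanceCommand`, the method `upload_to_machines`, and the host names are hypothetical and chosen only for illustration.

```python
import logging

logger = logging.getLogger(__name__)


class MaintenanceCommand:
    """Hypothetical command class; all names here are illustrative only."""

    def __init__(self, machines):
        self.machines = list(machines)
        self.uploaded = []

    def upload_to_machines(self):
        # Descriptive helper: the name states what this step does.
        for host in self.machines:
            logger.info("uploading to %s", host)
            self.uploaded.append(host)
        return self.uploaded

    def run(self):
        # Unified entry point: callers always invoke run(), which
        # delegates to the descriptively named step.
        return self.upload_to_machines()


command = MaintenanceCommand(["host-a", "host-b"])
result = command.run()
```

With this layout, callers keep one uniform way to invoke any command, and a reader searching the code base can still find the step by its specific name.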
mzmssg reviewed Jan 28, 2019
hao1939 reviewed Jan 28, 2019
yqwang-ms approved these changes Jan 28, 2019
Optimization of machine-list is done.
Depending on
Replace
hao1939 reviewed Jan 29, 2019
mzmssg approved these changes Jan 29, 2019
Move machine list generation logic to paictl. Done.
hao1939 approved these changes Jan 29, 2019
Cluster-level configuration switch for admins to enable the Azure RDMA feature.
User-level job parameter to request an Azure RDMA-capable container.
Necessary code changes in restserver to add the Azure RDMA environment to the job container.
Some useful ssh and sftp-copy tools to help admins maintain the cluster machines.
An example job running the Intel MPI benchmark over Azure RDMA, with a guide for users on running the MPI task.
A tutorial for admins on enabling Azure RDMA for the cluster.
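The first three items describe a two-level gate: the admin's cluster switch and the user's job parameter together decide whether the RDMA environment is added to the job container. A minimal sketch of that decision follows. It assumes both must be on, which the description implies but does not state. Every name here (`ClusterConfig`, `JobConfig`, `azure_rdma_enabled`, `request_azure_rdma`, `container_env`, `EXAMPLE_RDMA_VAR`) is hypothetical and does not reflect the actual configuration keys or the restserver implementation.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ClusterConfig:
    # Hypothetical cluster-level switch controlled by the admin.
    azure_rdma_enabled: bool = False


@dataclass
class JobConfig:
    # Hypothetical user-level job parameter.
    request_azure_rdma: bool = False
    env: Dict[str, str] = field(default_factory=dict)


def container_env(cluster: ClusterConfig, job: JobConfig,
                  rdma_env: Dict[str, str]) -> Dict[str, str]:
    """Return the job's environment, adding the RDMA variables only
    when both the cluster switch and the job parameter are on."""
    env = dict(job.env)  # copy so the job's own settings stay untouched
    if cluster.azure_rdma_enabled and job.request_azure_rdma:
        env.update(rdma_env)
    return env


# Placeholder variable; the real RDMA settings are not shown in this PR text.
rdma_env = {"EXAMPLE_RDMA_VAR": "1"}
job = JobConfig(request_azure_rdma=True, env={"USER_VAR": "x"})

enabled = container_env(ClusterConfig(azure_rdma_enabled=True), job, rdma_env)
disabled = container_env(ClusterConfig(azure_rdma_enabled=False), job, rdma_env)
```

Keeping the decision in one small function means jobs that do not ask for RDMA, or that run on clusters where the admin has not enabled it, get exactly the environment they had before.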
existing issue