@Mesilenceki
Contributor

No description provided.

Collaborator

@candyzone candyzone left a comment

device_name = pydev.canonical_name(spec.tensor.device)
device_idx = device_name.find("/device")
if device_idx != -1:
    canonical_device.add(device_name[:idx])
Collaborator

idx -> device_idx

Contributor Author

Already updated.
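
For reference, the rename requested in this thread can be sketched in isolation. This is a minimal illustration rather than the merged code: the helper name `canonical_device_prefixes` and the sample device strings are hypothetical, and the input is assumed to be already in the canonical form that `pydev.canonical_name` returns (for example `/job:worker/replica:0/task:0/device:CPU:0`), so the sketch runs without TensorFlow.

```python
def canonical_device_prefixes(device_names):
    """Collect the part of each canonical device name before "/device".

    Hypothetical helper: the names are assumed to be canonical already,
    i.e. shaped like "/job:<job>/replica:<r>/task:<t>/device:<type>:<id>".
    """
    canonical_device = set()
    for device_name in device_names:
        device_idx = device_name.find("/device")
        if device_idx != -1:
            # Slice with device_idx (the reviewed snippet used the
            # undefined name idx here).
            canonical_device.add(device_name[:device_idx])
    return canonical_device


# Illustrative device strings, not taken from the PR.
names = [
    "/job:worker/replica:0/task:0/device:CPU:0",
    "/job:worker/replica:0/task:0/device:GPU:0",
    "/job:ps/replica:0/task:1/device:CPU:0",
]
prefixes = canonical_device_prefixes(names)
```

Because the results go into a set, two devices on the same task collapse to one entry, and a name with no `/device` component is skipped.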

@Mesilenceki force-pushed the fix/nums-of-saveops-inconsistent-with-device branch from 17ab349 to c0ade86 on August 31, 2023 02:15
…d training.

Signed-off-by: 泊霆 <hujunqi.hjq@alibaba-inc.com>
@Mesilenceki force-pushed the fix/nums-of-saveops-inconsistent-with-device branch from c0ade86 to c2b3500 on November 8, 2023 03:23
@candyzone merged commit 29d9b46 into DeepRec-AI:main on Nov 9, 2023
@candyzone changed the title from "[Op] Prevent inconsistent number of Ops and devices during distributed training." to "[Op] Canonicalize SaveV2 Op device spec in distributed training." on Nov 10, 2023