Merged
9 changes: 5 additions & 4 deletions examples/link_prediction/heterogeneous_inference.py
@@ -266,13 +266,14 @@ def _inference_process(
f"--- Rank {rank} finished writing embeddings to GCS for node type {inference_node_type}, which took {time.time()-write_embedding_start_time:.2f} seconds"
)

# We first call barrier to ensure that all machines and processes have finished inference. Only once this is ensured is it safe to delete the data loader on the current
# machine + process -- otherwise we may fail on processes which are still doing on-the-fly subgraph sampling. We then call `gc.collect()` to cleanup the memory
# used by the data_loader on the current machine.
# We first call barrier to ensure that all machines and processes have finished inference.
# Only once all machines have finished inference is it safe to shutdown the data loader.
# Otherwise, processes which are still sampling *will* fail as the loaders they are trying to communicatate with will be shutdown.
# We then call `gc.collect()` to cleanup the memory used by the data_loader on the current machine.

barrier()

del data_loader
data_loader.shutdown()
gc.collect()

logger.info(
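The barrier-then-shutdown ordering above can be sketched in miniature. This is an illustrative stand-in, not the library's API: `FakeLoader`, `barrier`, and `finish_inference` are hypothetical names, and the no-op `barrier` stands in for a real collective such as `torch.distributed.barrier()`.

```python
import gc


class FakeLoader:
    """Stand-in for a distributed data loader with an explicit shutdown()."""

    def __init__(self):
        self.alive = True

    def shutdown(self):
        # Release sampling workers / RPC channels deterministically.
        self.alive = False


def barrier():
    # Stand-in for torch.distributed.barrier(); a no-op in a single process.
    # In the real job, every rank blocks here until all ranks arrive.
    pass


def finish_inference(loader):
    barrier()          # 1. wait until every rank has finished sampling
    loader.shutdown()  # 2. only now is it safe: no peer will contact this loader
    gc.collect()       # 3. reclaim the loader's memory on this machine
    return loader.alive
```

If `shutdown()` ran before the barrier, a straggler rank mid-sampling could still try to reach this loader and fail; the barrier guarantees no rank is sampling when teardown begins.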
9 changes: 5 additions & 4 deletions examples/link_prediction/homogeneous_inference.py
@@ -253,13 +253,14 @@ def _inference_process(
f"--- Rank {rank} finished writing embeddings to GCS, which took {time.time()-write_embedding_start_time:.2f} seconds"
)

# We first call barrier to ensure that all machines and processes have finished inference. Only once this is ensured is it safe to delete the data loader on the current
# machine + process -- otherwise we may fail on processes which are still doing on-the-fly subgraph sampling. We then call `gc.collect()` to cleanup the memory
# used by the data_loader on the current machine.
# We first call barrier to ensure that all machines and processes have finished inference.
# Only once all machines have finished inference is it safe to shutdown the data loader.
# Otherwise, processes which are still sampling *will* fail as the loaders they are trying to communicatate with will be shutdown.
# We then call `gc.collect()` to cleanup the memory used by the data_loader on the current machine.

barrier()

del data_loader
data_loader.shutdown()
gc.collect()

logger.info(
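The switch from `del data_loader` to `data_loader.shutdown()` also matters for a Python-level reason that a short sketch can show. This is a minimal illustration with a hypothetical `Loader` class, not the project's actual loader: `del` only removes one name binding, so if anything else still references the object, no cleanup runs; an explicit `shutdown()` releases resources regardless of reference counts.

```python
import gc


class Loader:
    """Hypothetical loader whose cleanup must be triggered explicitly."""

    def __init__(self):
        self.closed = False

    def shutdown(self):
        self.closed = True


a = Loader()
extra_ref = a      # e.g. a background sampler still holds a handle
del a              # drops one name only; the object survives via extra_ref
gc.collect()       # nothing to collect: the object is still reachable

# The loader was never cleaned up by `del` + gc...
assert extra_ref.closed is False

# ...whereas an explicit shutdown is deterministic:
extra_ref.shutdown()
assert extra_ref.closed is True
```

Explicit `shutdown()` makes teardown happen at a known point in the program, which is exactly what the barrier above synchronizes.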