Attempting inference on ogbn-papers100M for link prediction causes an OOM (out-of-memory) error, as shown in the attached screenshot, and the system becomes extremely unresponsive. The OOM happens in the evaluation call here: `val_mrr, test_mrr = self.evaluator.evaluate(None, test_scores, 0)`. When the evaluation call is omitted, the system successfully saves the node embeddings and relation embeddings and exits without any issue.
Experiment setup:
Dataset: ogbn-papers100M partitioned into 3
Instance: g4dn.metal
Command to run inference:
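The traceback points at the evaluator materializing all test scores at once when computing MRR. For reference, MRR itself can be computed incrementally over fixed-size chunks so peak memory stays bounded. The sketch below is illustrative only, in plain Python with hypothetical names; it is not the actual evaluator API:

```python
def chunked_mrr(pos_scores, neg_scores, chunk_size=1024):
    """Mean reciprocal rank computed chunk by chunk.

    pos_scores: score of the true edge, one float per test edge.
    neg_scores: list of negative-sample scores per test edge.
    Only `chunk_size` edges are examined at a time, so peak memory
    is bounded regardless of the total number of test edges.
    """
    rr_sum, count = 0.0, 0
    for start in range(0, len(pos_scores), chunk_size):
        chunk_pos = pos_scores[start:start + chunk_size]
        chunk_neg = neg_scores[start:start + chunk_size]
        for pos, negs in zip(chunk_pos, chunk_neg):
            # Optimistic rank: 1 + number of negatives scoring above the positive.
            rank = 1 + sum(1 for s in negs if s > pos)
            rr_sum += 1.0 / rank
            count += 1
    return rr_sum / count if count else 0.0
```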
Resolves [issue #77](#77)
Attempting distributed inference on ogbn-papers100M (3 partitions) for
link prediction causes an OOM (out-of-memory) error. This
[issue](#77) has the details.
The following screenshot was collected while running inference on 3
g4dn.metal instances; it shows memory consumption at 94% at the point
of the OOM:
<img width="1324" alt="Screenshot 2023-04-10 at 11 47 20 AM"
src="https://user-images.githubusercontent.com/18449426/231774872-c7aff289-b274-44f6-9603-402411d0a666.png">
With the proposed fix, the OOM issue is no longer encountered, and
memory consumption stays at ~14%:
<img width="1006" alt="Screenshot 2023-04-13 at 9 30 09 AM"
src="https://user-images.githubusercontent.com/18449426/231774802-be5245fc-a38c-46ee-b228-5bef50e6d71b.png">
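One way a fix of this shape can keep memory flat (a sketch under assumptions, not necessarily what this PR does): each worker reduces its local shard of test scores to two scalars, a reciprocal-rank sum and an edge count, and only those scalars are aggregated, so no single process ever holds the full score tensor. All names here are hypothetical:

```python
def local_rr_stats(pos_scores, neg_scores):
    """Per-worker partial statistics for MRR: (sum of reciprocal ranks, #edges)."""
    rr_sum = 0.0
    for pos, negs in zip(pos_scores, neg_scores):
        # Optimistic rank of the positive edge among its negatives.
        rank = 1 + sum(1 for s in negs if s > pos)
        rr_sum += 1.0 / rank
    return rr_sum, len(pos_scores)

def global_mrr(partials):
    """Combine the scalar partials from all workers into the final MRR."""
    rr_sum = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return rr_sum / count if count else 0.0
```

In a real distributed run the scalar pairs would be combined with an all-reduce rather than gathered on one rank.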
---------
Co-authored-by: Israt Nisa <nisisrat@amazon.com>
Reproduced with the following environment:
A smaller dataset like ogbn-mag works fine on a similar setup.