
bugfix: processor device, in-place ops and more #7

Merged
vxfung merged 2 commits into main from bugfix/processing on Dec 17, 2022

Conversation

@shuyijia (Collaborator)

This merge fixes/updates the following:

  1. The default processing device is now cpu, to handle large, memory-hungry datasets.
  2. Loading of data.pt files now handles both cpu and gpu cases.
  3. Removed/replaced all in-place operations in processing to allow gradient flow.
  4. Changed min-max scaling over the entire dataset to scaling between 0 and the cutoff radius.
  5. Set the tqdm progress bar to silent if logging.root.level > logging.INFO.

Tested on NERSC on STO_DOS_data and MP_data_npj.
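Item 5 above can be sketched in isolation with the standard logging module (the helper name below is illustrative, not from the diff; in the PR the result is stored on the processor as disable_tqdm and passed to tqdm's disable flag):

```python
import logging

def tqdm_disabled() -> bool:
    """Return True when tqdm progress bars should be silenced.

    Mirrors item 5 of the PR description: the bar is disabled whenever
    the root logger is set above INFO (e.g. WARNING), so batch jobs on
    a cluster stay quiet by default.
    """
    return logging.root.level > logging.INFO

# At WARNING level the bar is disabled; at INFO it is shown.
logging.root.setLevel(logging.WARNING)
print(tqdm_disabled())  # True
logging.root.setLevel(logging.INFO)
print(tqdm_disabled())  # False
```

The resulting boolean would typically be passed as tqdm(..., disable=tqdm_disabled()).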

@saraheisenach (Contributor) left a comment


Left some comments/ideas (nothing crazy, just mainly style suggestions haha) :)

Comment thread: matdeeplearn/preprocessor/datasets.py (Outdated)
Comment on lines +19 to +28
if device is None:
    try:
        self.data, self.slices = torch.load(self.processed_paths[0])
    except:
        self.data, self.slices = torch.load(self.processed_paths[0], map_location=torch.device('cpu'))
else:
    if device == 'cpu':
        self.data, self.slices = torch.load(self.processed_paths[0], map_location=torch.device(device))
    else:
        self.data, self.slices = torch.load(self.processed_paths[0])
Contributor

I'm not sure I fully understand this logic. Wouldn't this behavior be the same?

Suggested change
- if device is None:
-     try:
-         self.data, self.slices = torch.load(self.processed_paths[0])
-     except:
-         self.data, self.slices = torch.load(self.processed_paths[0], map_location=torch.device('cpu'))
- else:
-     if device == 'cpu':
-         self.data, self.slices = torch.load(self.processed_paths[0], map_location=torch.device(device))
-     else:
-         self.data, self.slices = torch.load(self.processed_paths[0])
+ try:
+     self.data, self.slices = torch.load(self.processed_paths[0], map_location=torch.device(device))
+ except:
+     self.data, self.slices = torch.load(self.processed_paths[0], map_location=torch.device('cpu'))

If map_location is None, that is the same as the default value, so it shouldn't matter whether it is passed. And then the fallback is just if whatever device is passed in doesn't work, then default to cpu, right?

Also, we should try to specify the exception that's thrown whenever possible. In this case, do you know what error would be thrown? If not, we can leave it, but it's nice to have.

Collaborator Author

There might be a situation in which GPU is available but the dataset needs to be loaded on CPU. I have re-implemented the logic to make it clearer.
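For the reviewer's question about the specific exception: torch.load typically raises a RuntimeError when a file containing CUDA tensors is deserialized on a CPU-only machine. The control flow under discussion can be sketched with a stand-in loader (load_with_fallback and fake_load are hypothetical names, not from the PR):

```python
def load_with_fallback(path, device, load_fn):
    """Try loading on the requested device, falling back to cpu.

    load_fn stands in for torch.load(path, map_location=...).  The
    specific exception caught here, RuntimeError, is what torch.load
    raises when CUDA-serialized tensors are loaded without a GPU.
    """
    try:
        return load_fn(path, map_location=device)
    except RuntimeError:
        return load_fn(path, map_location="cpu")

# Stand-in loader that only "works" on cpu, simulating a CPU-only node.
def fake_load(path, map_location):
    if map_location != "cpu":
        raise RuntimeError("Attempting to deserialize object on a CUDA device")
    return ("data", "slices")

print(load_with_fallback("data.pt", "cuda", fake_load))  # ('data', 'slices')
```

This keeps one code path whether or not a GPU is present, while still catching a named exception rather than a bare except.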

Comment thread: matdeeplearn/preprocessor/datasets.py (Outdated)

def threshold_sort(all_distances, r, n_neighbors):
    A = all_distances.clone().detach()
    # A = all_distances.clone().detach()
Contributor

Do we still need this commented out line?

Collaborator Author

I think we can leave it there for the time being.

Comment thread: matdeeplearn/preprocessor/helpers.py (Outdated)
try:
    delattr(data, attr)
- except AttributeError:
+ except:
Contributor

Like I said above, if possible, we should try to keep the specific exceptions in our try/except blocks. So if there's another exception you think is needed, you could do except (AttributeError, <other exception>):

Collaborator Author

Reimplemented this part to do without try and except blocks.
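One way to drop the try/except block entirely is to guard delattr with hasattr, as in this sketch (the Data class and remove_attrs name are illustrative stand-ins; the PR's actual rewrite may differ):

```python
class Data:
    """Minimal stand-in for a torch_geometric Data-like object."""
    def __init__(self):
        self.edge_index = [[0, 1], [1, 0]]

def remove_attrs(data, attrs):
    """Delete the listed attributes without try/except.

    delattr is only called when the attribute actually exists, so a
    missing attribute is silently skipped instead of raising
    AttributeError.
    """
    for attr in attrs:
        if hasattr(data, attr):
            delattr(data, attr)

d = Data()
remove_attrs(d, ["edge_index", "nonexistent"])
print(hasattr(d, "edge_index"))  # False
```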

# fill in the original values
self_loop_diag = distance_matrix.diagonal()
cutoff_distance_matrix.diagonal().copy_(self_loop_diag)
# if image_selfloop:
Contributor

Same here, is this something that will be implemented later or can we remove it? If we need it, maybe add a TODO saying what needs to be done?

Collaborator Author

I think we can leave it there for the time being.

Comment thread: matdeeplearn/preprocessor/processor.py (Outdated)
        self.disable_tqdm = logging.root.level > logging.INFO
        self.device = "cpu"

    def set_device(self, device):
Contributor

Is this function called anywhere? Or is the device now always cpu?

Collaborator Author

device should be cpu by default. I have removed the setter method and made device an input parameter with a default value of cpu.
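The change described above amounts to moving the device choice into the constructor, roughly like this sketch (the Processor class here is a minimal illustration, not the PR's full class):

```python
class Processor:
    """Sketch: device as a constructor parameter defaulting to cpu.

    This replaces the removed set_device setter; callers that want GPU
    processing pass the device explicitly instead of mutating it later.
    """
    def __init__(self, device: str = "cpu"):
        self.device = device

print(Processor().device)        # cpu
print(Processor("cuda").device)  # cuda
```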

Comment thread: matdeeplearn/preprocessor/processor.py (Outdated)
elif isinstance(s["y"], list):
    _y = [float(each) for each in s["y"]]
    y.append(_y)
    y_dim = len(_y)
Contributor

y_dim is the same for every s['y'] in the same run, right? Could you maybe move the y_dim assignments from lines 212 and 240 to just above line 244, outside the for loop, to keep them together? Maybe something like (or something a little less messy):

y_dim = len(original_structures[0]['y']) if isinstance(original_structures[0]['y'], list) else 1

To me, it makes more sense to keep assignments together so it's easier to read, but no pressure to move it if you feel differently!
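The suggested hoisting can be checked on a toy structure list (the sample data below is made up; original_structures is the name used in the comment above):

```python
# Toy stand-ins for the structures list discussed above.
original_structures = [
    {"y": [1.0, 2.0, 3.0]},
    {"y": [4.0, 5.0, 6.0]},
]

# Reviewer's one-liner: y_dim is constant across a run, so compute it
# once from the first structure instead of reassigning it inside the loop.
y_dim = (
    len(original_structures[0]["y"])
    if isinstance(original_structures[0]["y"], list)
    else 1
)
print(y_dim)  # 3

# Scalar targets fall back to a dimension of 1.
scalar_structures = [{"y": 0.5}]
y_dim_scalar = (
    len(scalar_structures[0]["y"])
    if isinstance(scalar_structures[0]["y"], list)
    else 1
)
print(y_dim_scalar)  # 1
```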

Collaborator Author

Thanks for the suggestion! Accepted your change.

vxfung merged commit edd54f4 into main on Dec 17, 2022
3 participants