
bugfix: processor device, in-place ops and more #7

Merged
vxfung merged 2 commits into main from bugfix/processing on Dec 17, 2022

Conversation

@shuyijia (Collaborator)

This merge fixes/updates the following:

  1. The default processing device is now cpu, to handle large, memory-hungry datasets.
  2. Loading of data.pt files now handles both cpu and gpu cases.
  3. Removed/replaced all in-place operations in processing to allow gradient flow.
  4. Changed min-max scaling over the entire dataset to scaling between 0 and the cutoff radius.
  5. Set the tqdm progress bar to silent if logging.root.level > logging.INFO.

Tested on NERSC on STO_DOS_data and MP_data_npj.
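Item 5 above can be sketched in isolation with the standard logging module (the helper name below is illustrative, not from the diff; in the PR the result is stored on the processor as disable_tqdm and passed to tqdm's disable flag):

```python
import logging

def tqdm_disabled() -> bool:
    """Return True when tqdm progress bars should be silenced.

    Mirrors item 5 of the PR description: the bar is disabled whenever
    the root logger is set above INFO (e.g. WARNING), so batch jobs on
    a cluster stay quiet by default.
    """
    return logging.root.level > logging.INFO

# At WARNING level the bar is disabled; at INFO it is shown.
logging.root.setLevel(logging.WARNING)
print(tqdm_disabled())  # True
logging.root.setLevel(logging.INFO)
print(tqdm_disabled())  # False
```

The resulting boolean would typically be passed as tqdm(..., disable=tqdm_disabled()).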

@saraheisenach (Contributor) left a comment


Left some comments/ideas (nothing crazy, just mainly style suggestions haha) :)

Comment thread: matdeeplearn/preprocessor/datasets.py (Outdated)
Comment on lines +19 to +28
if device is None:
    try:
        self.data, self.slices = torch.load(self.processed_paths[0])
    except:
        self.data, self.slices = torch.load(self.processed_paths[0], map_location=torch.device('cpu'))
else:
    if device == 'cpu':
        self.data, self.slices = torch.load(self.processed_paths[0], map_location=torch.device(device))
    else:
        self.data, self.slices = torch.load(self.processed_paths[0])
Contributor

I'm not sure I fully understand this logic. Wouldn't this behavior be the same?

Suggested change
- if device is None:
-     try:
-         self.data, self.slices = torch.load(self.processed_paths[0])
-     except:
-         self.data, self.slices = torch.load(self.processed_paths[0], map_location=torch.device('cpu'))
- else:
-     if device == 'cpu':
-         self.data, self.slices = torch.load(self.processed_paths[0], map_location=torch.device(device))
-     else:
-         self.data, self.slices = torch.load(self.processed_paths[0])
+ try:
+     self.data, self.slices = torch.load(self.processed_paths[0], map_location=torch.device(device))
+ except:
+     self.data, self.slices = torch.load(self.processed_paths[0], map_location=torch.device('cpu'))

If map_location is None, that is the same as the default value, so it shouldn't matter whether it is passed. And then the fallback is just if whatever device is passed in doesn't work, then default to cpu, right?

Also, we should try to specify the exception that's thrown whenever possible. In this case, do you know what error would be thrown? If not, we can leave it, but it's nice to have.

Collaborator Author

There might be a situation in which GPU is available but the dataset needs to be loaded on CPU. I have re-implemented the logic to make it clearer.
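For the reviewer's question about the specific exception: torch.load typically raises a RuntimeError when a file containing CUDA tensors is deserialized on a CPU-only machine. The control flow under discussion can be sketched with a stand-in loader (load_with_fallback and fake_load are hypothetical names, not from the PR):

```python
def load_with_fallback(path, device, load_fn):
    """Try loading on the requested device, falling back to cpu.

    load_fn stands in for torch.load(path, map_location=...).  The
    specific exception caught here, RuntimeError, is what torch.load
    raises when CUDA-serialized tensors are loaded without a GPU.
    """
    try:
        return load_fn(path, map_location=device)
    except RuntimeError:
        return load_fn(path, map_location="cpu")

# Stand-in loader that only "works" on cpu, simulating a CPU-only node.
def fake_load(path, map_location):
    if map_location != "cpu":
        raise RuntimeError("Attempting to deserialize object on a CUDA device")
    return ("data", "slices")

print(load_with_fallback("data.pt", "cuda", fake_load))  # ('data', 'slices')
```

This keeps one code path whether or not a GPU is present, while still catching a named exception rather than a bare except.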

Comment thread: matdeeplearn/preprocessor/datasets.py (Outdated)

def threshold_sort(all_distances, r, n_neighbors):
    A = all_distances.clone().detach()
    # A = all_distances.clone().detach()
Contributor

Do we still need this commented out line?

Collaborator Author

I think we can leave it there for the time being.

Comment thread: matdeeplearn/preprocessor/helpers.py (Outdated)
try:
    delattr(data, attr)
- except AttributeError:
+ except:
Contributor

Like I said above, if possible, we should try to keep the specific exceptions in our try/except blocks. So if there's another exception you think is needed, you could do except (AttributeError, <other exception>):

Collaborator Author

Reimplemented this part to do without try and except blocks.
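One way to drop the try/except block entirely is to guard delattr with hasattr, as in this sketch (the Data class and remove_attrs name are illustrative stand-ins; the PR's actual rewrite may differ):

```python
class Data:
    """Minimal stand-in for a torch_geometric Data-like object."""
    def __init__(self):
        self.edge_index = [[0, 1], [1, 0]]

def remove_attrs(data, attrs):
    """Delete the listed attributes without try/except.

    delattr is only called when the attribute actually exists, so a
    missing attribute is silently skipped instead of raising
    AttributeError.
    """
    for attr in attrs:
        if hasattr(data, attr):
            delattr(data, attr)

d = Data()
remove_attrs(d, ["edge_index", "nonexistent"])
print(hasattr(d, "edge_index"))  # False
```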

# fill in the original values
self_loop_diag = distance_matrix.diagonal()
cutoff_distance_matrix.diagonal().copy_(self_loop_diag)
# if image_selfloop:
Contributor

Same here, is this something that will be implemented later or can we remove it? If we need it, maybe add a TODO saying what needs to be done?

Collaborator Author

I think we can leave it there for the time being.

Comment thread: matdeeplearn/preprocessor/processor.py (Outdated)
        self.disable_tqdm = logging.root.level > logging.INFO
        self.device = "cpu"

    def set_device(self, device):
Contributor

Is this function called anywhere? Or is the device now always cpu?

Collaborator Author

device should be cpu by default. I have removed the setter method and made device an input parameter with a default value of cpu.
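The change described above amounts to moving the device choice into the constructor, roughly like this sketch (the Processor class here is a minimal illustration, not the PR's full class):

```python
class Processor:
    """Sketch: device as a constructor parameter defaulting to cpu.

    This replaces the removed set_device setter; callers that want GPU
    processing pass the device explicitly instead of mutating it later.
    """
    def __init__(self, device: str = "cpu"):
        self.device = device

print(Processor().device)        # cpu
print(Processor("cuda").device)  # cuda
```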

Comment thread: matdeeplearn/preprocessor/processor.py (Outdated)
elif isinstance(s["y"], list):
    _y = [float(each) for each in s["y"]]
    y.append(_y)
    y_dim = len(_y)
Contributor

y_dim is the same for every s['y'] in the same run, right? Could you maybe move the y_dim assignments from lines 212 and 240 to just above line 244, outside the for loop, to keep them together? Maybe something like (or something a little less messy):

y_dim = len(original_structures[0]['y']) if isinstance(original_structures[0]['y'], list) else 1

To me, it makes more sense to keep assignments together so it's easier to read, but no pressure to move it if you feel differently!
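The suggested hoisting can be checked on a toy structure list (the sample data below is made up; original_structures is the name used in the comment above):

```python
# Toy stand-ins for the structures list discussed above.
original_structures = [
    {"y": [1.0, 2.0, 3.0]},
    {"y": [4.0, 5.0, 6.0]},
]

# Reviewer's one-liner: y_dim is constant across a run, so compute it
# once from the first structure instead of reassigning it inside the loop.
y_dim = (
    len(original_structures[0]["y"])
    if isinstance(original_structures[0]["y"], list)
    else 1
)
print(y_dim)  # 3

# Scalar targets fall back to a dimension of 1.
scalar_structures = [{"y": 0.5}]
y_dim_scalar = (
    len(scalar_structures[0]["y"])
    if isinstance(scalar_structures[0]["y"], list)
    else 1
)
print(y_dim_scalar)  # 1
```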

Collaborator Author

Thanks for the suggestion! Accepted your change.

vxfung merged commit edd54f4 into main on Dec 17, 2022
3 participants