Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Raise error for inconsistent add_ml_model and add_script parameters #324

Merged
merged 17 commits into from
Jul 29, 2023

Conversation

juliaputko
Copy link
Contributor

Added error message for Model.add_ml_model and Model.add_script when devices_per_node parameter is >1 and device parameter is set to CPU.

Test for add_ml_model and add_script added to ensure error is correctly thrown.

@juliaputko juliaputko requested review from ashao and ankona July 21, 2023 18:51
smartsim/entity/dbobject.py Outdated Show resolved Hide resolved
for device_num in range(self.devices_per_node):
devices.append(f"{self.device}:{str(device_num)}")
else:
devices = [self.device]

return devices

def _check_arguments(self):
Copy link
Contributor

@ankona ankona Jul 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i love pulling complex validation out into separate methods but _check_arguments feels like it could be unclear which arguments are being checked (especially since it has none).

I'd probably rename this to _check_devices to make it clear where it's valid to use. I'd also probably pass the inputs instead of allowing the object to maybe be partially initialized.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could see this as being a general sanity checker of the initialization. There's a bit of a tradeoff between cluttering up the constructor and breaking it out into a validator for just devices and number of devices.

As currently coded up (where it only relies on the set properties after initialization is completed), it's unambiguous that the settings are correct. Thoughts @juliaputko @ankona?

Copy link
Contributor

@ankona ankona Jul 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like _check_xxx methods in the constructor follow the pattern of offloading per-field validation to another method. I wouldn't care if they're combined but _check_parameters is definitely improper naming given the existing:

self.file = self._check_filepath(file_path)

smartsim/entity/dbobject.py Outdated Show resolved Hide resolved
msg = "Cannot set devices_per_node>1 if a device numeral is specified, "
msg += f"the device was set to {self.device} and devices_per_node=={self.devices_per_node}"
raise ValueError(msg)
if self.device in ["CPU"] and self.devices_per_node > 1:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for in?

for device_num in range(self.devices_per_node):
devices.append(f"{self.device}:{str(device_num)}")
else:
devices = [self.device]

return devices

def _check_arguments(self):
devices = []
if ":" in self.device and self.devices_per_node > 1:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we do a check at the start:

if self.devices_per_node <= 1:
  return

We can avoid having compound conditionals below, making it easier to read what's required to trigger a failure.

@@ -578,3 +578,60 @@ def test_db_script_errors(fileutils, wlmutils, mlutils):
# an in-memory script
with pytest.raises(SSUnsupportedError):
colo_ensemble.add_model(colo_model)

def test_inconsistent_params_add_script(fileutils, wlmutils, mlutils):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we're testing a lot of extra code here. Aren't the new validators in a constructor?

Can we build tests around the smallest unit?

... 
# you could vary x, y, z and exercise the validation code 
# more directly instead of allowing other code to potentially break
with pytest.raises(SSUnsupportedError):
  dbscript = DBScript(x,y,z)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point especially because the add_model, add_script, add_function are essentially just factory methods for their respective DBEntity

if self.device in ["CPU"] and self.devices_per_node > 1:
raise SSUnsupportedError(
"Cannot set devices_per_node>1 if CPU is specified under devices"
)
Copy link
Contributor

@ankona ankona Jul 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to check device?

if not self._check_device(self.device):
  raise ValueError("invalid device...")

I think we have a hole since device is an attribute. Changes after the constructor won't be validated.

Consider adding properties that use _check_device on sets!

@@ -53,6 +54,7 @@ def __init__(
self.file = self._check_filepath(file_path)
self.device = self._check_device(device)
self.devices_per_node = devices_per_node
self._check_arguments()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using device: Literal["CPU", "GPU"] type hint to tighten the arguments for device (instead of str) in _check_device/__init__

Copy link
Member

@ashao ashao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this up!

A few typos and some excellent suggestions from @ankona to address

for device_num in range(self.devices_per_node):
devices.append(f"{self.device}:{str(device_num)}")
else:
devices = [self.device]

return devices

def _check_arguments(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could see this as being a general sanity checker of the initialization. There's a bit of a tradeoff between cluttering up the constructor and breaking it out into a validator for just devices and number of devices.

As currently coded up (where it only relies on the set properties after initialization is completed), it's unambiguous that the settings are correct. Thoughts @juliaputko @ankona?

msg += f"the device was set to {self.device} and devices_per_node=={self.devices_per_node}"
raise ValueError(msg)
if self.device in ["CPU", "GPU"] and self.devices_per_node > 1:
if self.device in ["GPU"] and self.devices_per_node > 1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just check against GPU (no in needed). I suppose too we could always make CPU and GPU something like an enum and attach them to DBObject so that we're comparing an object of some variety instead of a magic string

@@ -793,3 +793,37 @@ def test_colocated_db_model_errors(fileutils, wlmutils, mlutils):

with pytest.raises(SSUnsupportedError):
colo_ensemble.add_model(colo_model)

def test_inconsistent_params_add_ml_model(fileutils, wlmutils, mlutils):
"""Test error when devices_per_node parameter>1 when devices is set to CPU in addd_ml_model function"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo addd_ml_model

@@ -578,3 +578,60 @@ def test_db_script_errors(fileutils, wlmutils, mlutils):
# an in-memory script
with pytest.raises(SSUnsupportedError):
colo_ensemble.add_model(colo_model)

def test_inconsistent_params_add_script(fileutils, wlmutils, mlutils):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point especially because the add_model, add_script, add_function are essentially just factory methods for their respective DBEntity

@codecov
Copy link

codecov bot commented Jul 27, 2023

Codecov Report

Merging #324 (52abef3) into develop (4c741be) will increase coverage by 0.02%.
The diff coverage is 81.81%.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #324      +/-   ##
===========================================
+ Coverage    87.30%   87.33%   +0.02%     
===========================================
  Files           59       59              
  Lines         3522     3529       +7     
===========================================
+ Hits          3075     3082       +7     
  Misses         447      447              
Files Changed Coverage Δ
smartsim/entity/ensemble.py 99.20% <ø> (ø)
smartsim/entity/model.py 95.07% <ø> (ø)
smartsim/entity/dbobject.py 68.51% <80.00%> (+2.86%) ⬆️
smartsim/_core/utils/redis.py 87.23% <100.00%> (-0.53%) ⬇️

@juliaputko juliaputko merged commit 53bff05 into CrayLabs:develop Jul 29, 2023
10 checks passed
@MattToast MattToast added area: api Issues related to API changes type: usability Issues related to ease of use labels Sep 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: api Issues related to API changes type: usability Issues related to ease of use
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants