-
Notifications
You must be signed in to change notification settings - Fork 635
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Setting attributes should have dtype checks #2764
Comments
@richardjgowers i would like to work on this issue |
I would like to work on this issue |
@richardjgowers sir, can you please review my Pull Request #2788 regarding this issue |
@richardjgowers is this issue still relevant? Is there anything where you'd want to describe more clearly what you'd want to see so that the issue becomes suitable for the GSOC-starter label? |
is this issue still open to work on |
I don't see an open PR referencing this issue so please feel free to open a PR. Make sure that you fill in the PR comment template properly and reference this issue. |
Hello @orbeckst. I'm Fortune from the Outreachy program. I just opened a pull request for this issue. |
@richardjgowers I'm not sure you can blindly enforce dtypes without also giving a preferred actual type; the dtype of string arrays is set to The TypeError raised in the initial message isn't just because the dtype is wrong, it's because the value setter assumes the given value is either a string or a list. Then no further type checking is done. Setting multiple names works fine:
Tagging @fortune-max here as well as you've opened a PR for the issue :) |
Ok I see, its possible there are object topologyattrs that aren’t string,
so hacking the internet mixin seems good yeah.
…On Sat, Apr 23, 2022 at 09:19, Lily Wang ***@***.***> wrote:
@richardjgowers <https://github.com/richardjgowers> I'm not sure you can
blindly enforce dtypes without also giving a preferred actual type; the
dtype of string arrays is set to object, which an integer also is. Other
people have also wanted to set TopologyAttrs of arbitrary type as well, so
we can't assume every object is a string. The fix for strings might need to
edit _StringInternerMixin._set_X to just warn or error if a non-string is
given.
The TypeError raised in the initial message isn't just because the dtype
is wrong, it's because the value setter assumes the given value is
*either* a string or a list. Setting multiple names works fine:
>>> u.atoms[:3].names = [1, 2, 3]
>>> u.atoms[:3].names
array([1, 2, 3], dtype=object)
Tagging @fortune-max <https://github.com/fortune-max> here as well as
you've opened a PR for the issue :)
—
Reply to this email directly, view it on GitHub
<#2764 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACGSGBYL4DO2ZGUG322OVC3VGOXBHANCNFSM4N7PQFRA>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Okay, nice. A warning would definitely be better. What about the issue of assigning floats to ids as stated above, should that also throw a warning? @richardjgowers |
My approach would be:
def _set_X(self, ag, values, required_dtype):
if not isinstance(values, required_dtype) and not all(isinstance(v, required_dtype) for v in values):
raise TypeError(f"Values must be of type {required_dtype.__name__}")
class AtomStringAttr(_StringInternerMixin, AtomAttr):
@_check_length
def set_atoms(self, ag, values):
return self._set_X(ag, values, str)
|
@HeetVekariya you've got the string part of it very close! The overall issue here is that TopologyAttrs can have different types (which is set by the When setting the attribute, we should check that the values match this dtype and raise an error if not. Right now for example, as mentioned in the top comment, setting My comment (#2764 (comment)) is referencing that some TopologyAttrs (e.g. Resnames https://github.com/MDAnalysis/mdanalysis/blob/develop/package/MDAnalysis/core/topologyattrs.py#L2738 ) have Even in the other non-string classes,
Yes, this change should be tested thoroughly with separate cases for different types! |
@lilyminium Thank you for guiding me.
def _set_X(self, ag, values):
# Type checking logic
if self.dtype is object: # Special handling for string attributes
if not (isinstance(values, str) or all(isinstance(v, str) for v in values)):
raise TypeError("All values must be strings for string attributes")
else:
# Handling for non-string attributes
if not (isinstance(values, self.dtype) or all(isinstance(v, expected_dtype) for v in values)):
raise TypeError(f"All values must be of type {self.dtype.__name__}")
# Remaining code of _set_X function
Question:
|
No worries @HeetVekariya, it's better to have a clear approach and know what you want to do before creating PRs!
|
Thank you 🙏 😄
def _set_X(self, ag, values):
if not isinstance(values, str) and not all(isinstance(v, str) for v in values):
raise TypeError("All values must be a string") Question:
|
@HeetVekariya the reason I bring up Perhaps the easiest way for you to see what I mean is to write some tests that currently fail, but will pass with the right solution. For example, try the examples listed in the top comment #2764 (comment). In both those cases, the code @richardjgowers has given, should produce an error. Perhaps you could use pytest.raises to write a test that expects a particular error to be raised in those scenarios. There are examples of similar tests in the testsuite, e.g. here and here. Writing tests should pass with your solution, but currently fail, is a great first step in opening a PR as it allows you to check that you're actually fixing the right behaviours -- it's basically test driven development. |
|
@HeetVekariya yes sure, go ahead. |
@lilyminium Hi there ! Hope you are doing well.
Thank you for your guidance till now. |
@HeetVekariya as @lilyminium said above: start a PR where you write a test that fails. Then go from there. The PR can be small and doesn’t have to solve the problem. But it’s a lot easier for others to comment on existing code than to talk about solutions in theory. You already have a number of code contributions and are more experienced. The kind of comments that you got above are typically what we would consider sufficient for a developer to get started on a problem. |
Our Topology system is statically typed, each
TopologyAttr
(found here) should only accept one data type, e.g. atom names (here) should always be strings, resids should always be ints.Currently setting an atom name to an integer gives "
TypeError: 'int' object is not iterable
"This should instead fail and inform that atom names need to be strings.
Setting a Residue's resid to a float seems to work...
This probably is using the inbuilt rounding in numpy int arrays, but it should probably raise an error too.
To fix this, we could have a
check_dtype
decorator aroundTopologyAttr.set_atoms
(and residue/segment) to check that the supplied values are the correct type.The text was updated successfully, but these errors were encountered: