Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sanitizing ArrayData for REST-API #5613

Merged
merged 3 commits into from
Aug 17, 2022
Merged

Sanitizing ArrayData for REST-API #5613

merged 3 commits into from
Aug 17, 2022

Conversation

JPchico
Copy link
Contributor

@JPchico JPchico commented Aug 6, 2022

Sanitizing np.array from Nan and np.inf to ensure that the arrays are proper JSON and can be obtained via the REST-API. Fixes #5559

@JPchico
Copy link
Contributor Author

JPchico commented Aug 10, 2022

@sphuber, @chrisjsewell Do you agree with this PR? Or should something else be done before merging?

@sphuber
Copy link
Contributor

sphuber commented Aug 10, 2022

I'm happy with the concept of changing inf and nan to None. We should probably just add a test and maybe update the documentation of the REST API. Maybe add a note in this section. I also haven't checked in depth whether the numpy array manipulation is correct or the most efficient solution, but if there is a test for it, I am happy to accept it.

@JPchico
Copy link
Contributor Author

JPchico commented Aug 11, 2022

Hi @sphuber!

I'm happy with the concept of changing inf and nan to None. We should probably just add a test and maybe update the documentation of the REST API.

Sure thing! I'll add a test, I guess that what we are interested in is setting a test in the route to verify that the resulting JSON is valid. Since otherwise the valid entries of the array are the same.
I was looking at the test_route and I think that one should then add an array like what is done for the structure

def init_profile(self, aiida_profile_clean, aiida_localhost): # pylint: disable=unused-argument
"""Initialize the profile."""
# pylint: disable=attribute-defined-outside-init,disable=too-many-locals,too-many-statements
api = configure_api(catch_internal_server=True)
self.app = api.app
self.app.config['TESTING'] = True
self.user = orm.User.collection.get_default()
# create test inputs
cell = ((2., 0., 0.), (0., 2., 0.), (0., 0., 2.))
structure = orm.StructureData(cell=cell)
structure.append_atom(position=(0., 0., 0.), symbols=['Ba'])
structure.store()
structure.base.comments.add('This is test comment.')
structure.base.comments.add('Add another comment.')

After that one would just request for that node and check what is obtained from it like in this case

def test_structure_download(self):
"""
Test download of structure file
"""
from aiida.orm import load_node
node_uuid = self.get_dummy_data()['structuredata'][0]['uuid']
url = f'{self.get_url_prefix()}/nodes/{node_uuid}/download?download_format=xsf'
with self.app.test_client() as client:
rv_obj = client.get(url)
structure_data = load_node(node_uuid)._exportcontent('xsf')[0] # pylint: disable=protected-access
assert rv_obj.data == structure_data

Maybe add a note in this section.

Sure thing I'll add a note in the documentation indicating that if an ArrayData contains np.nan or np.inf they will be served as null to make sure that they are valid JSON.

I also haven't checked in depth whether the numpy array manipulation is correct or the most efficient solution, but if there is a test for it, I am happy to accept it.

Yes, I'm also unsure of what the most efficient implementation would be, but this is the best I could find. there is a function in numpy to replace np.nan as numbers nan_to_num. However, that only allows one to replace the np.nan by numbers, which of course means that one would need to arbitrarily choose a number to replace the np.nan and then transform it to None which can be a bit dangerous.
Trying to use a call like array[np.isnan(array)]= None will fail as None cannot be set in a numeric array, if we typecast the array to object then np.isnan does not work.
Pandas has a function that allows one to do this without problems, however, then one would need to include pandas as a dependency which seems overkill.
If there is a more efficient way of doing it I'll happily do it, just have not really found a better way without adding extra dependencies.

@JPchico
Copy link
Contributor Author

JPchico commented Aug 11, 2022

@sphuber done, test and note in docs added. If something else is needed just let me know.

@sphuber sphuber self-requested a review August 16, 2022 14:52
sphuber
sphuber previously approved these changes Aug 16, 2022
Copy link
Contributor

@sphuber sphuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @JPchico , the changes look ok to me. If you can just fix the failing docs build, this can be merged.

@sphuber
Copy link
Contributor

sphuber commented Aug 17, 2022

If the docsstring are giving you issues, it might be best to remove the typing from the docstring and use type hints directly in the function signature. Didn't request this in the review since it is a minor thing, but since you are touching it now anyway, might as well use type hints

@JPchico
Copy link
Contributor Author

JPchico commented Aug 17, 2022

Hi @sphuber , funnily enough it seems to be that there were some issues with unavailable resources (in other areas of the code), for example one of the errors that I'm getting is the following

(topics/data_types: line  752) -ignored- https://onlinelibrary.wiley.com/doi/10.1002/qua.560300306: 503 Server Error: Service Temporarily Unavailable for url: https://onlinelibrary.wiley.com/doi/10.1002/qua.560300306

I solved another in which the link for the requests documentation seemed to be pointing to an unsafe ulr that could not be accessed. This one though is a bit strange since I can enter the url but the compilation ends in error.

@JPchico
Copy link
Contributor Author

JPchico commented Aug 17, 2022

So removing the numpy.ndarray type hinting and fixing the broken links worked. I will squash the last commits into a single one to have a cleaner history.

…t they are valid JSON.

Adding a note in the documentation explaining that invalid JSON values will be replaced by None which will be rendered as nulls.
Copy link
Contributor

@sphuber sphuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @JPchico

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

REST API failing with sanitation of JSON response in Arrays
2 participants