-
-
Notifications
You must be signed in to change notification settings - Fork 274
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Arraycore #634
base: pt4
Are you sure you want to change the base?
Arraycore #634
Conversation
Looks good to me so far. I specially like the first attempt at getting rid of the node manager as h5py should be the place to handle this. |
Nice work! |
…ead() always returns in sys.byteorder)
if atom is None or shape is None: | ||
raise TypeError('if the obj parameter is not specified ' | ||
'(or None) then both the atom and shape ' | ||
'parametes should be provided.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parameters
tables/core/array.py
Outdated
if not out.flags['C_CONTIGUOUS']: | ||
raise ValueError('output array not C contiguous') | ||
|
||
np.copyto(out, arr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can avoid the arr
temporary by using read_direct(array, source_sel=None, dest_sel=None)
.
tables/core/group.py
Outdated
else: | ||
flavor = flavor_of(obj) | ||
# use a temporary object because converting obj at this stage | ||
# breaks some test. This is soultion performs a double, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"This is soultion performs " -> "This fix performs".
tables/core/group.py
Outdated
strides=(0,) * len(shape)) | ||
else: | ||
flavor = flavor_of(obj) | ||
# use a temporary object because converting obj at this stage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[STYLE] Start comments in its own line with an uppercase letter.
tables/core/leaf.py
Outdated
if self.shape == (): | ||
return 1 | ||
else: | ||
return len(self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For compactness you can make use of the ternary operator there:
return 1 if self.shape == () else len(self)
Where did the work from the Perth workshop end up? |
🐑 Never mind, I sorted it out (that code is on the pt4 branch which this is a PR into). |
np_byteorders = { | ||
'big': '>', | ||
'little': '<', | ||
'irrelevant': '|' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'irrelevant' should be 'native'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'irrelevant' is meant for ASCII strings and the like, where the order is, well, irrelevant.
tables/core/group.py
Outdated
|
||
# Finally, check whether the desired node is an instance | ||
# of the expected class. | ||
if classname: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be worth checking classname is not None
so other 'falsy' values (which should probably raise) do not safely pass.
return where.create_table(name, desc, *args, **kwargs) | ||
|
||
def get_node(self, where): | ||
return self.root[where] | ||
def get_node(self, where, name=None, classname=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to match the API on tables.file.File.get_node
?
- this will conflict with the develop branch - this will become irrelevant on this branch eventually (as all of the cython should be dropped)
- createparents is positional-only now, adjust test - match positional order of remaining args
albertosm27#1 <- getting copy to work. |
@@ -2209,7 +2209,7 @@ def test00b_zeros(self): | |||
print("First row-->", ca[0]) | |||
print("Defaults-->", ca.atom.dflt) | |||
self.assertTrue(allequal(ca[0], numpy.zeros(N, 'S3'))) | |||
self.assertTrue(allequal(ca.atom.dflt, numpy.zeros(N, 'S3'))) | |||
self.assertTrue(allequal(ca.atom.dflt, b"")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why change the tests like this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Multidimensional dtypes are giving me a hard time.
I thought h5py as numpy was putting the dtype.shape inside the array.shape so I did that for the Atom.dflt (equivalent to fillvalue in h5py) and thats why I needed to change the test like that.
I still do not know if multidimensional dtypes are fully supported in h5py, for example in tables/tests/array_mdatom.h5 you can get '/arr' and it will be a multidimensional array but trying it to copy (with create_dataset) will raise an error like 'Can't broadcast shape (5, 5, 5) to (5, 5, 5, 3)' the 3 comes from the dtype shape.
Using the h5py copy method from Group will copy the array correctly with its multidensional dtype (still reading the array returns it as numpy with the dtype shape inside the array shape)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by 'multidimensional dtypes'?
h5py
should support all dtypes that hdf5 supports (and if any are missing it should be fixed in h5py).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean dtypes with shape like this numpy.dtype('(3,)i4')
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually think this is a numpy bug
In [58]: np.zeros(5, dtype=np.dtype('(3,)i4')).shape
Out[58]: (5, 3)
In [59]: np.zeros(5, dtype=np.dtype('(3,)i4,f4')).shape
Out[59]: (5,)
In [60]: np.zeros(5, dtype=np.dtype('(3,)i4')).dtype
Out[60]: dtype('int32')
In [61]: np.zeros(5, dtype=np.dtype('(3,)i4,f4')).dtype
Out[61]: dtype([('f0', '<i4', (3,)), ('f1', '<f4')])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe:
In [63]: np.zeros(5, dtype=np.dtype('i4,i4,i4')).shape
Out[63]: (5,)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so '(3,)i4'
and 3*'i4'
are definitely different datatypes, the question is if we expect the promotion (demotion?) to a int32 array with shape + (3,)
or not.
https://docs.scipy.org/doc/numpy/user/basics.rec.html#structured-arrays
Sorry for the stream of conscious comments as I sort this out 🐑
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tacaswell I remember discussing with Travis the 'demotion' of multidimensional dtypes in NumPy for atomic dtypes (i.e. not compound or structured) quite long time ago and IIRC while he agreed that the demotion was not consistent, he argued easiness of implementation to keep the current behaviour. Perhaps the NumPy guys can change that, but IMO, this ship sailed long time ago so it is not a good idea to ask for that change now.
…tidimensional dtypes
tables/core/array.py
Outdated
@@ -136,8 +140,10 @@ def copy(self, newparent=None, newname=None, | |||
newparent = self.root._get_or_create_path(newparent, createparents) | |||
if self.__class__.__name__ == 'Array': | |||
create_function = newparent.create_array | |||
else: | |||
elif self.__class__.__name__ == 'CArray': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably it would be saner to use isinstance to test the condition, but you need to reverse the order of tests because CArraiy is an Array and EArray is a CArray.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried, but with isinstance I need to import CArray in array.py and CArray also imports Array which makes it crash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@albertosm27 yes, you are right. Now I remember the issue :/
self.assertRaises(ValueError, vlarray.__setitem__, 1, "shrt") | ||
self.assertRaises(ValueError, vlarray.__setitem__, 1, "toolong") | ||
#self.assertRaises(ValueError, vlarray.__setitem__, 1, "shrt") | ||
#self.assertRaises(ValueError, vlarray.__setitem__, 1, "toolong") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sure that you add new tests for the assignation of values that are of different length than the old values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, do not forget to add an entry abut this new feature in the CHANGELOG for PyTables 4.0.
First implementation of the new array using h5py backend. The objective of this branch is to pass the basic tests in tables/tests/test_array.py.