Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Array mods #113

Merged
merged 8 commits into from
Jun 25, 2015
Merged

Array mods #113

merged 8 commits into from
Jun 25, 2015

Conversation

JimNrao
Copy link
Contributor

@JimNrao JimNrao commented Jun 4, 2015

  1. Modified Array object by adding a new method reformOrResize. This will allow an array with a large amount of memory allocated to change shape without forcing the memory to be reallocated.

  2. Addted ElementType typedef to allow template users of Array to determin the element type; for example:

template
void f (T & x){
T tmp = x (0);
// etc.
}

   that is allocated but not currently needed.  An example would be if
   adding a Matrix of data onto a Cube.

2) Added "typedef T ElementType;" so that templates using an array can
   have access to the element type.  (STL data structures usually do this).
@tammojan
Copy link
Contributor

tammojan commented Jun 5, 2015

To get the github cross-references right: this would fix #111, so whenever this gets merged, also #111 can be closed.

@gervandiepen
Copy link
Contributor

I have a few comments. They have to be addressed in the code.

  1. The ElementType typedef should be removed. Standard STL is to use value_type for such a typedef and that is already present in Array.h.
  2. I do not understand why the implementation of the reform function has changed. baseReform already checks if the length matches. The error message suggests that reform is a nonStrict reform, which it is not.
  3. Why not inline the capacity function?
  4. In Array.h the comment for function capacity needs to be placed before the function declaration, otherwise doxygen does not pick it up.
  5. In Array.h the comments of reformOrResize contain quite some typos. It should not use matrix, but array. In the middle resize instead of copy is used. I got the impression that resizePercentage is applied to the current shape, but it is to the new shape.
  6. reformOrResize checks if the array is contiguous. Note that even if contiguous, the array can be a view on a larger array. It does not check if other Array objects are referencing the same storage. It might be better to check that no more references exist to the array's storage.
    A serious issue: it does not check that the dimensionality does not change, hence it is possible that a Vector gets resized to a 2-dim array which must not be possible.
    I would like to see the code not dependent on the template, being implemented in ArrayBase to avoid bloat. Certainly the first part checking the shape, could be put in ArrayBase.
  7. The test 'shape() == newShape' will throw an exception if their lengths mismatch. Better to use the isEqual function.
  8. Data is not copied if there is sufficient space and copyDataIfNeeded is true. Maybe easier to remove argument copyDataIfNeeded.

@tammojan
Copy link
Contributor

Reopening the pull request, so that improvements can be made in the existing pull request.

@tammojan tammojan reopened this Jun 11, 2015
@JimNrao
Copy link
Contributor Author

JimNrao commented Jun 15, 2015

Response to Ger's comments above:

I have a few comments. They have to be addressed in the code.

1. The ElementType typedef should be removed. Standard STL is to use value_type for such a typedef and that is already present in Array.h.

--> Removed this; I missed the existing typedef (lot's of typedefs in the iterator definitions).

2. I do not understand why the implementation of the reform function has changed. baseReform already checks if the length matches. The error message suggests that reform is a nonStrict reform, which it is not.

--> Reverted the implementation back since this class no longer needs a new implementation.  
    (Manually reverted it but only got the signature changed).

3. Why not inline the capacity function?

--> Done.

4. In Array.h the comment for function capacity needs to be placed before the function declaration, otherwise doxygen does not pick it up.

--> Done

5. In Array.h the comments of reformOrResize contain quite some typos. It should not use matrix, but array. In the middle resize instead of copy is used. I got the impression that resizePercentage is applied to the current shape, but it is to the new shape.

--> Fixed a couple of typos.

6. reformOrResize checks if the array is contiguous. Note that even if contiguous, the array can be a view on a larger array. It does not check if other Array objects are referencing the same storage. It might be better to check that no more references exist to the array's storage.
A serious issue: it does not check that the dimensionality does not change, hence it is possible that a Vector gets resized to a 2-dim array which must not be possible.
I would like to see the code not dependent on the template, being implemented in ArrayBase to avoid bloat. Certainly the first part checking the shape, could be put in ArrayBase.

--> Added two checks to throw an exception if the array is shared or if an attempt is made
    to change the dimensionality.

--> About 1/2 of the checks require access to data_p; it seems more readable to keep all of the
    validation checks together rather than split them between Array and ArrayBase.

7. The test 'shape() == newShape' will throw an exception if their lengths mismatch. Better to use the isEqual function.

--> Changed to isEquals.  (Rather odd that operator== doesn't simply return false if the shapes 
    are different)

8. Data is not copied if there is sufficient space and copyDataIfNeeded is true. Maybe easier to remove argument copyDataIfNeeded.

--> If there is sufficient space, the data is left in place.  The most likely use cases will
    increase or decrease the last dimension, so leaving the data in place is appropriate.
    If a different change is made (e.g., a 2x3 going to a 3x2) it's really hard to say what
    a copy ought to do.  I added a line in the comments to warn the user about this case.

@gervandiepen
Copy link
Contributor

Hi Jim,

Thanks for the changes.
I agree it is odd that IPosition::operator== throws an exception if the
lengths mismatch. Brian Glendenning has written that code long time ago (in
1992) and I always hesitated to change it. Maybe we have to bite the bullet
one time.

I still think copyDataIfNeeded should be removed. The argument promises to
copy the array elements to their new places when reshaping from e.g. [2,4]
to [3,4]. However, that won't be done if there is enough storage. Note: in
such a case element [1,1] is not the same as before the reformOrResize. It
makes it unpredictable for the caller if a copy will be made or not, so
better to never do it. Do you need that argument?

I do not agree data_p is used a lot. In fact, about everything could be
done in ArrayBase because resize is a virtual function. It only requires
data_p.nrefs() to be passed to that base function. In fact, only the call
to setEndIter() needs to be in Array.tcc.

Now there is a test on nrefs()==1, I think the test on isContiguous makes
little sense. As I already commented, even if contiguous the Array could be
a view on a part of the original, already deleted, Array object. I think it
does not matter for reformOrResize if the view is contiguous or not. What
do you think?

Ger

On Mon, Jun 15, 2015 at 7:24 PM, Jim notifications@github.com wrote:

Response to Ger's comments above:

I have a few comments. They have to be addressed in the code.

  1. The ElementType typedef should be removed. Standard STL is to use value_type for such a typedef and that is already present in Array.h.

--> Removed this; I missed the existing typedef (lot's of typedefs in the iterator definitions).

  1. I do not understand why the implementation of the reform function has changed. baseReform already checks if the length matches. The error message suggests that reform is a nonStrict reform, which it is not.

--> Reverted the implementation back since this class no longer needs a new implementation.
(Manually reverted it but only got the signature changed).

  1. Why not inline the capacity function?

--> Done.

  1. In Array.h the comment for function capacity needs to be placed before the function declaration, otherwise doxygen does not pick it up.

--> Done

  1. In Array.h the comments of reformOrResize contain quite some typos. It should not use matrix, but array. In the middle resize instead of copy is used. I got the impression that resizePercentage is applied to the current shape, but it is to the new shape.

--> Fixed a couple of typos.

  1. reformOrResize checks if the array is contiguous. Note that even if contiguous, the array can be a view on a larger array. It does not check if other Array objects are referencing the same storage. It might be better to check that no more references exist to the array's storage.
    A serious issue: it does not check that the dimensionality does not change, hence it is possible that a Vector gets resized to a 2-dim array which must not be possible.
    I would like to see the code not dependent on the template, being implemented in ArrayBase to avoid bloat. Certainly the first part checking the shape, could be put in ArrayBase.

--> Added two checks to throw an exception if the array is shared or if an attempt is made
to change the dimensionality.

--> About 1/2 of the checks require access to data_p; it seems more readable to keep all of the
validation checks together rather than split them between Array and ArrayBase.

  1. The test 'shape() == newShape' will throw an exception if their lengths mismatch. Better to use the isEqual function.

--> Changed to isEquals. (Rather odd that operator== doesn't simply return false if the shapes
are different)

  1. Data is not copied if there is sufficient space and copyDataIfNeeded is true. Maybe easier to remove argument copyDataIfNeeded.

--> If there is sufficient space, the data is left in place. The most likely use cases will
increase or decrease the last dimension, so leaving the data in place is appropriate.
If a different change is made (e.g., a 2x3 going to a 3x2) it's really hard to say what
a copy ought to do. I added a line in the comments to warn the user about this case.


Reply to this email directly or view it on GitHub
#113 (comment).

@JimNrao
Copy link
Contributor Author

JimNrao commented Jun 17, 2015

Maybe I should change the name to copyIfResizing. My use case is if
when a shape of [2,3,4] goes to [2,3,5](i.e., adding another row on a
vis cube); the user need not know if the operation will require resizing
or reforming but they do want the data in element (1,1,1) to be the same
in the resulting array. I pass the argument to the Array::resize method
if resizing is required.

I can see where a user who radically changes shape might be confused
into thinking that this parameter could be used to preserve the data
(e.g., going from [2,3,4] to [4, 2, 3]) even when resizing is not
required. The original Array::reform method just lets the data remain
at the same memory location.

Maybe there needs to be two different methods with slightly different
semantics? One method only allows altering the last dimension while
preserving the data already in place. The second method lets the user
radically reshape the array but provides no guarantees about existing
data. The underlying code might be the same (e.g. the protected
ArrayBase::reformOrResize method would be the same in both cases) with a
two thin methods providing the public API. Maybe one would be called
extendArray (preserves data) and the other could be called
reformOrReshape (no data guarantees).

On 06/16/2015 12:56 AM, Ger van Diepen wrote:

Hi Jim,

Thanks for the changes.
I agree it is odd that IPosition::operator== throws an exception if the
lengths mismatch. Brian Glendenning has written that code long time
ago (in
1992) and I always hesitated to change it. Maybe we have to bite the
bullet
one time.

I still think copyDataIfNeeded should be removed. The argument promises to
copy the array elements to their new places when reshaping from e.g. [2,4]
to [3,4]. However, that won't be done if there is enough storage. Note: in
such a case element [1,1] is not the same as before the reformOrResize. It
makes it unpredictable for the caller if a copy will be made or not, so
better to never do it. Do you need that argument?

I do not agree data_p is used a lot. In fact, about everything could be
done in ArrayBase because resize is a virtual function. It only requires
data_p.nrefs() to be passed to that base function. In fact, only the call
to setEndIter() needs to be in Array.tcc.

Now there is a test on nrefs()==1, I think the test on isContiguous makes
little sense. As I already commented, even if contiguous the Array
could be
a view on a part of the original, already deleted, Array object. I
think it
does not matter for reformOrResize if the view is contiguous or not. What
do you think?

Ger

On Mon, Jun 15, 2015 at 7:24 PM, Jim notifications@github.com wrote:

Response to Ger's comments above:

I have a few comments. They have to be addressed in the code.

  1. The ElementType typedef should be removed. Standard STL is to use
    value_type for such a typedef and that is already present in Array.h.

--> Removed this; I missed the existing typedef (lot's of typedefs
in the iterator definitions).

  1. I do not understand why the implementation of the reform function
    has changed. baseReform already checks if the length matches. The
    error message suggests that reform is a nonStrict reform, which it is not.

--> Reverted the implementation back since this class no longer
needs a new implementation.
(Manually reverted it but only got the signature changed).

  1. Why not inline the capacity function?

--> Done.

  1. In Array.h the comment for function capacity needs to be placed
    before the function declaration, otherwise doxygen does not pick it up.

--> Done

  1. In Array.h the comments of reformOrResize contain quite some
    typos. It should not use matrix, but array. In the middle resize
    instead of copy is used. I got the impression that resizePercentage is
    applied to the current shape, but it is to the new shape.

--> Fixed a couple of typos.

  1. reformOrResize checks if the array is contiguous. Note that even
    if contiguous, the array can be a view on a larger array. It does not
    check if other Array objects are referencing the same storage. It
    might be better to check that no more references exist to the array's
    storage.
    A serious issue: it does not check that the dimensionality does not
    change, hence it is possible that a Vector gets resized to a 2-dim
    array which must not be possible.
    I would like to see the code not dependent on the template, being
    implemented in ArrayBase to avoid bloat. Certainly the first part
    checking the shape, could be put in ArrayBase.

--> Added two checks to throw an exception if the array is shared or
if an attempt is made
to change the dimensionality.

--> About 1/2 of the checks require access to data_p; it seems more
readable to keep all of the
validation checks together rather than split them between Array and
ArrayBase.

  1. The test 'shape() == newShape' will throw an exception if their
    lengths mismatch. Better to use the isEqual function.

--> Changed to isEquals. (Rather odd that operator== doesn't simply
return false if the shapes
are different)

  1. Data is not copied if there is sufficient space and
    copyDataIfNeeded is true. Maybe easier to remove argument
    copyDataIfNeeded.

--> If there is sufficient space, the data is left in place. The
most likely use cases will
increase or decrease the last dimension, so leaving the data in
place is appropriate.
If a different change is made (e.g., a 2x3 going to a 3x2) it's
really hard to say what
a copy ought to do. I added a line in the comments to warn the user
about this case.


Reply to this email directly or view it on GitHub
#113 (comment).


Reply to this email directly or view it on GitHub
#113 (comment)
Bug from
https://github.com/notifications/beacon/AGPGtIjHmFFFrfkJC0bbLx7kplb-3vGkks5oT8A5gaJpZM4E42Hu.gif

Jim Jacobs

phone: 835-7235
office: 301

…Base with

needed template-specific info passed in as parameters.
and extend.  The first method promised nothing about the data contained
in the array while the second method always preserves it.
@JimNrao
Copy link
Contributor Author

JimNrao commented Jun 18, 2015

Created a new method, extend, which only allows changing the last dimension. The new method always preserves the relevant portion of the data. The old now method offers no guarantee about the underlying data. This allows removing the "copyIfNeeded" parameter and provides a clearer set of semantics for each call. The same ArrayBase method does the bulk of the work for both methods so the change is only to improve the quality of the Array API.

@gervandiepen
Copy link
Contributor

I see your use case. But adding an element to another axis than the last
one, is just as valid and will have different semantics. We could say that
if copy=True and another axis than the last one changes, a resize with copy
is always done (because that is cheaper and easier than making a temp copy
in another array; when not using a temp array, you have to figure out how
to copy which can be quite difficult).

But having a function extendArray also sounds very good to me. Do you
really need the more general reformOrResize? If not, we can as well drop
it. Although you can argue it might be useful to be able to reuse an
existing array for totally different purposes, the limitiation that the
dimensionality cannot change undermines it.
Would extendArray also allow you to make the last axis shorter? If so, the
function name is incorrect.

On Wed, Jun 17, 2015 at 7:34 PM, Jim notifications@github.com wrote:

Maybe I should change the name to copyIfResizing. My use case is if
when a shape of [2,3,4] goes to [2,3,5](i.e., adding another row on a
vis cube); the user need not know if the operation will require resizing
or reforming but they do want the data in element (1,1,1) to be the same
in the resulting array. I pass the argument to the Array::resize method
if resizing is required.

I can see where a user who radically changes shape might be confused
into thinking that this parameter could be used to preserve the data
(e.g., going from [2,3,4] to [4, 2, 3]) even when resizing is not
required. The original Array::reform method just lets the data remain
at the same memory location.

Maybe there needs to be two different methods with slightly different
semantics? One method only allows altering the last dimension while
preserving the data already in place. The second method lets the user
radically reshape the array but provides no guarantees about existing
data. The underlying code might be the same (e.g. the protected
ArrayBase::reformOrResize method would be the same in both cases) with a
two thin methods providing the public API. Maybe one would be called
extendArray (preserves data) and the other could be called
reformOrReshape (no data guarantees).

On 06/16/2015 12:56 AM, Ger van Diepen wrote:

Hi Jim,

Thanks for the changes.
I agree it is odd that IPosition::operator== throws an exception if the
lengths mismatch. Brian Glendenning has written that code long time
ago (in
1992) and I always hesitated to change it. Maybe we have to bite the
bullet
one time.

I still think copyDataIfNeeded should be removed. The argument promises
to
copy the array elements to their new places when reshaping from e.g.
[2,4]
to [3,4]. However, that won't be done if there is enough storage. Note:
in
such a case element [1,1] is not the same as before the reformOrResize.
It
makes it unpredictable for the caller if a copy will be made or not, so
better to never do it. Do you need that argument?

I do not agree data_p is used a lot. In fact, about everything could be
done in ArrayBase because resize is a virtual function. It only requires
data_p.nrefs() to be passed to that base function. In fact, only the call
to setEndIter() needs to be in Array.tcc.

Now there is a test on nrefs()==1, I think the test on isContiguous makes
little sense. As I already commented, even if contiguous the Array
could be
a view on a part of the original, already deleted, Array object. I
think it
does not matter for reformOrResize if the view is contiguous or not. What
do you think?

Ger

On Mon, Jun 15, 2015 at 7:24 PM, Jim notifications@github.com wrote:

Response to Ger's comments above:

I have a few comments. They have to be addressed in the code.

  1. The ElementType typedef should be removed. Standard STL is to use
    value_type for such a typedef and that is already present in Array.h.

--> Removed this; I missed the existing typedef (lot's of typedefs
in the iterator definitions).

  1. I do not understand why the implementation of the reform function
    has changed. baseReform already checks if the length matches. The
    error message suggests that reform is a nonStrict reform, which it is
    not.

--> Reverted the implementation back since this class no longer
needs a new implementation.
(Manually reverted it but only got the signature changed).

  1. Why not inline the capacity function?

--> Done.

  1. In Array.h the comment for function capacity needs to be placed
    before the function declaration, otherwise doxygen does not pick it up.

--> Done

  1. In Array.h the comments of reformOrResize contain quite some
    typos. It should not use matrix, but array. In the middle resize
    instead of copy is used. I got the impression that resizePercentage is
    applied to the current shape, but it is to the new shape.

--> Fixed a couple of typos.

  1. reformOrResize checks if the array is contiguous. Note that even
    if contiguous, the array can be a view on a larger array. It does not
    check if other Array objects are referencing the same storage. It
    might be better to check that no more references exist to the array's
    storage.
    A serious issue: it does not check that the dimensionality does not
    change, hence it is possible that a Vector gets resized to a 2-dim
    array which must not be possible.
    I would like to see the code not dependent on the template, being
    implemented in ArrayBase to avoid bloat. Certainly the first part
    checking the shape, could be put in ArrayBase.

--> Added two checks to throw an exception if the array is shared or
if an attempt is made
to change the dimensionality.

--> About 1/2 of the checks require access to data_p; it seems more
readable to keep all of the
validation checks together rather than split them between Array and
ArrayBase.

  1. The test 'shape() == newShape' will throw an exception if their
    lengths mismatch. Better to use the isEqual function.

--> Changed to isEquals. (Rather odd that operator== doesn't simply
return false if the shapes
are different)

  1. Data is not copied if there is sufficient space and
    copyDataIfNeeded is true. Maybe easier to remove argument
    copyDataIfNeeded.

--> If there is sufficient space, the data is left in place. The
most likely use cases will
increase or decrease the last dimension, so leaving the data in
place is appropriate.
If a different change is made (e.g., a 2x3 going to a 3x2) it's
really hard to say what
a copy ought to do. I added a line in the comments to warn the user
about this case.


Reply to this email directly or view it on GitHub
<#113 (comment)
.


Reply to this email directly or view it on GitHub
#113 (comment)

Bug from

https://github.com/notifications/beacon/AGPGtIjHmFFFrfkJC0bbLx7kplb-3vGkks5oT8A5gaJpZM4E42Hu.gif

Jim Jacobs

phone: 835-7235
office: 301


Reply to this email directly or view it on GitHub
#113 (comment).

@tammojan tammojan mentioned this pull request Jun 22, 2015
@tammojan tammojan added this to the 2.0.2 milestone Jun 23, 2015
// Perform an exact resize

resize (newShape, copyDataIfNeeded);
resetEnd = False;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think setEndIter needs to be done by the caller when the array is resized.

@JimNrao
Copy link
Contributor Author

JimNrao commented Jun 23, 2015

Actually I use both aspects of the functionality, although the "extend"
one is the most important. In both use-cases, the data always retains
the same number of dimensions. In the more frequent case, I have to
increase or decrease the last dimension of the cube (or matrix or
vector); this one can occur in the tightest of the read loops (we
normally process a set of visibilities from the same spectral window
over a modest time interval). At the next level up, we potentially
change spectral windows which can change the size of all three axes
(correlation, channel, row). This one occurs less frequently and is not
as important to performance, I think. My suspicion is that anyone using
this method would normally be interested in reusing a block of storage.

I think by splitting it into one call that reforms it to allow storage
reuse (i.e., any data in the array prior to the call should be
considered effectively lost) and one that extends the array in the last
dimension which guarantees data preservation would seem to be a
reasonable set.

Maybe the "extend" method could be called "adjustLastAxis". "Extend"
makes sense if you allow for "negative extension" but when looking at an
API I would probably scan past "extend" looking for something else
(e.g., "shrink"), etc.

On 06/22/2015 12:12 AM, Ger van Diepen wrote:

I see your use case. But adding an element to another axis than the last
one, is just as valid and will have different semantics. We could say that
if copy=True and another axis than the last one changes, a resize with
copy
is always done (because that is cheaper and easier than making a temp copy
in another array; when not using a temp array, you have to figure out how
to copy which can be quite difficult).

But having a function extendArray also sounds very good to me. Do you
really need the more general reformOrResize? If not, we can as well drop
it. Although you can argue it might be useful to be able to reuse an
existing array for totally different purposes, the limitiation that the
dimensionality cannot change undermines it.
Would extendArray also allow you to make the last axis shorter? If so, the
function name is incorrect.

On Wed, Jun 17, 2015 at 7:34 PM, Jim notifications@github.com wrote:

Maybe I should change the name to copyIfResizing. My use case is if
when a shape of [2,3,4] goes to [2,3,5](i.e., adding another row on a
vis cube); the user need not know if the operation will require resizing
or reforming but they do want the data in element (1,1,1) to be the same
in the resulting array. I pass the argument to the Array::resize method
if resizing is required.

I can see where a user who radically changes shape might be confused
into thinking that this parameter could be used to preserve the data
(e.g., going from [2,3,4] to [4, 2, 3]) even when resizing is not
required. The original Array::reform method just lets the data remain
at the same memory location.

Maybe there needs to be two different methods with slightly different
semantics? One method only allows altering the last dimension while
preserving the data already in place. The second method lets the user
radically reshape the array but provides no guarantees about existing
data. The underlying code might be the same (e.g. the protected
ArrayBase::reformOrResize method would be the same in both cases) with a
two thin methods providing the public API. Maybe one would be called
extendArray (preserves data) and the other could be called
reformOrReshape (no data guarantees).

On 06/16/2015 12:56 AM, Ger van Diepen wrote:

Hi Jim,

Thanks for the changes.
I agree it is odd that IPosition::operator== throws an exception
if the
lengths mismatch. Brian Glendenning has written that code long time
ago (in
1992) and I always hesitated to change it. Maybe we have to bite the
bullet
one time.

I still think copyDataIfNeeded should be removed. The argument
promises
to
copy the array elements to their new places when reshaping from e.g.
[2,4]
to [3,4]. However, that won't be done if there is enough storage.
Note:
in
such a case element [1,1] is not the same as before the
reformOrResize.
It
makes it unpredictable for the caller if a copy will be made or
not, so
better to never do it. Do you need that argument?

I do not agree data_p is used a lot. In fact, about everything
could be
done in ArrayBase because resize is a virtual function. It only
requires
data_p.nrefs() to be passed to that base function. In fact, only
the call
to setEndIter() needs to be in Array.tcc.

Now there is a test on nrefs()==1, I think the test on
isContiguous makes
little sense. As I already commented, even if contiguous the Array
could be
a view on a part of the original, already deleted, Array object. I
think it
does not matter for reformOrResize if the view is contiguous or
not. What
do you think?

Ger

On Mon, Jun 15, 2015 at 7:24 PM, Jim notifications@github.com wrote:

Response to Ger's comments above:

I have a few comments. They have to be addressed in the code.

  1. The ElementType typedef should be removed. Standard STL is to use
    value_type for such a typedef and that is already present in Array.h.

--> Removed this; I missed the existing typedef (lot's of typedefs
in the iterator definitions).

  1. I do not understand why the implementation of the reform function
    has changed. baseReform already checks if the length matches. The
    error message suggests that reform is a nonStrict reform, which it is
    not.

--> Reverted the implementation back since this class no longer
needs a new implementation.
(Manually reverted it but only got the signature changed).

  1. Why not inline the capacity function?

--> Done.

  1. In Array.h the comment for function capacity needs to be placed
    before the function declaration, otherwise doxygen does not pick
    it up.

--> Done

  1. In Array.h the comments of reformOrResize contain quite some
    typos. It should not use matrix, but array. In the middle resize
    instead of copy is used. I got the impression that resizePercentage is
    applied to the current shape, but it is to the new shape.

--> Fixed a couple of typos.

  1. reformOrResize checks if the array is contiguous. Note that even
    if contiguous, the array can be a view on a larger array. It does not
    check if other Array objects are referencing the same storage. It
    might be better to check that no more references exist to the array's
    storage.
    A serious issue: it does not check that the dimensionality does not
    change, hence it is possible that a Vector gets resized to a 2-dim
    array which must not be possible.
    I would like to see the code not dependent on the template, being
    implemented in ArrayBase to avoid bloat. Certainly the first part
    checking the shape, could be put in ArrayBase.

--> Added two checks to throw an exception if the array is shared or
if an attempt is made
to change the dimensionality.

--> About 1/2 of the checks require access to data_p; it seems more
readable to keep all of the
validation checks together rather than split them between Array and
ArrayBase.

  1. The test 'shape() == newShape' will throw an exception if their
    lengths mismatch. Better to use the isEqual function.

--> Changed to isEquals. (Rather odd that operator== doesn't simply
return false if the shapes
are different)

  1. Data is not copied if there is sufficient space and
    copyDataIfNeeded is true. Maybe easier to remove argument
    copyDataIfNeeded.

--> If there is sufficient space, the data is left in place. The
most likely use cases will
increase or decrease the last dimension, so leaving the data in
place is appropriate.
If a different change is made (e.g., a 2x3 going to a 3x2) it's
really hard to say what
a copy ought to do. I added a line in the comments to warn the user
about this case.


Reply to this email directly or view it on GitHub

<#113 (comment)
.


Reply to this email directly or view it on GitHub

#113 (comment)

Bug from

https://github.com/notifications/beacon/AGPGtIjHmFFFrfkJC0bbLx7kplb-3vGkks5oT8A5gaJpZM4E42Hu.gif

Jim Jacobs

phone: 835-7235
office: 301


Reply to this email directly or view it on GitHub
#113 (comment).


Reply to this email directly or view it on GitHub
#113 (comment)
Bug from
https://github.com/notifications/beacon/AGPGtLYCivXP5KRnE85Wogh_JnWaVjaPks5oV57kgaJpZM4E42Hu.gif

Jim Jacobs

phone: 835-7235
office: 301

@gervandiepen
Copy link
Contributor

I sympathize with changing extend to a name that better covers the
functionality. adjustLastAxis seems fine.
I now realize my remark about setEndIter was incorrect, because resize is
already doing setEndIter. Maybe better to add that as a comment to the code.
I'll merge the pull request tomorrow morning once you've change the name
extend.
Ger

On Tue, Jun 23, 2015 at 8:04 PM, Jim notifications@github.com wrote:

Actually I use both aspects of the functionality, although the "extend"
one is the most important. In both use-cases, the data always retains
the same number of dimensions. In the more frequent case, I have to
increase or decrease the last dimension of the cube (or matrix or
vector); this one can occur in the tightest of the read loops (we
normally process a set of visibilities from the same spectral window
over a modest time interval). At the next level up, we potentially
change spectral windows which can change the size of all three axes
(correlation, channel, row). This one occurs less frequently and is not
as important to performance, I think. My suspicion is that anyone using
this method would normally be interested in reusing a block of storage.

I think by splitting it into one call that reforms it to allow storage
reuse (i.e., any data in the array prior to the call should be
considered effectively lost) and one that extends the array in the last
dimension which guarantees data preservation would seem to be a
reasonable set.

Maybe the "extend" method could be called "adjustLastAxis". "Extend"
makes sense if you allow for "negative extension" but when looking at an
API I would probably scan past "extend" looking for something else
(e.g., "shrink"), etc.

On 06/22/2015 12:12 AM, Ger van Diepen wrote:

I see your use case. But adding an element to another axis than the last
one, is just as valid and will have different semantics. We could say
that
if copy=True and another axis than the last one changes, a resize with
copy
is always done (because that is cheaper and easier than making a temp
copy
in another array; when not using a temp array, you have to figure out how
to copy which can be quite difficult).

But having a function extendArray also sounds very good to me. Do you
really need the more general reformOrResize? If not, we can as well drop
it. Although you can argue it might be useful to be able to reuse an
existing array for totally different purposes, the limitiation that the
dimensionality cannot change undermines it.
Would extendArray also allow you to make the last axis shorter? If so,
the
function name is incorrect.

On Wed, Jun 17, 2015 at 7:34 PM, Jim notifications@github.com wrote:

Maybe I should change the name to copyIfResizing. My use case is if
when a shape of [2,3,4] goes to [2,3,5](i.e., adding another row on a
vis cube); the user need not know if the operation will require
resizing
or reforming but they do want the data in element (1,1,1) to be the
same
in the resulting array. I pass the argument to the Array::resize method
if resizing is required.

I can see where a user who radically changes shape might be confused
into thinking that this parameter could be used to preserve the data
(e.g., going from [2,3,4] to [4, 2, 3]) even when resizing is not
required. The original Array::reform method just lets the data remain
at the same memory location.

Maybe there needs to be two different methods with slightly different
semantics? One method only allows altering the last dimension while
preserving the data already in place. The second method lets the user
radically reshape the array but provides no guarantees about existing
data. The underlying code might be the same (e.g. the protected
ArrayBase::reformOrResize method would be the same in both cases) with
a
two thin methods providing the public API. Maybe one would be called
extendArray (preserves data) and the other could be called
reformOrReshape (no data guarantees).

On 06/16/2015 12:56 AM, Ger van Diepen wrote:

Hi Jim,

Thanks for the changes.
I agree it is odd that IPosition::operator== throws an exception
if the
lengths mismatch. Brian Glendenning has written that code long time
ago (in
1992) and I always hesitated to change it. Maybe we have to bite the
bullet
one time.

I still think copyDataIfNeeded should be removed. The argument
promises
to
copy the array elements to their new places when reshaping from e.g.
[2,4]
to [3,4]. However, that won't be done if there is enough storage.
Note:
in
such a case element [1,1] is not the same as before the
reformOrResize.
It
makes it unpredictable for the caller if a copy will be made or
not, so
better to never do it. Do you need that argument?

I do not agree data_p is used a lot. In fact, about everything
could be
done in ArrayBase because resize is a virtual function. It only
requires
data_p.nrefs() to be passed to that base function. In fact, only
the call
to setEndIter() needs to be in Array.tcc.

Now there is a test on nrefs()==1, I think the test on
isContiguous makes
little sense. As I already commented, even if contiguous the Array
could be
a view on a part of the original, already deleted, Array object. I
think it
does not matter for reformOrResize if the view is contiguous or
not. What
do you think?

Ger

On Mon, Jun 15, 2015 at 7:24 PM, Jim notifications@github.com
wrote:

Response to Ger's comments above:

I have a few comments. They have to be addressed in the code.

  1. The ElementType typedef should be removed. Standard STL is to
    use
    value_type for such a typedef and that is already present in Array.h.

--> Removed this; I missed the existing typedef (lot's of typedefs
in the iterator definitions).

  1. I do not understand why the implementation of the reform
    function
    has changed. baseReform already checks if the length matches. The
    error message suggests that reform is a nonStrict reform, which it is
    not.

--> Reverted the implementation back since this class no longer
needs a new implementation.
(Manually reverted it but only got the signature changed).

  1. Why not inline the capacity function?

--> Done.

  1. In Array.h the comment for function capacity needs to be placed
    before the function declaration, otherwise doxygen does not pick
    it up.

--> Done

  1. In Array.h the comments of reformOrResize contain quite some
    typos. It should not use matrix, but array. In the middle resize
    instead of copy is used. I got the impression that resizePercentage
    is
    applied to the current shape, but it is to the new shape.

--> Fixed a couple of typos.

  1. reformOrResize checks if the array is contiguous. Note that even
    if contiguous, the array can be a view on a larger array. It does not
    check if other Array objects are referencing the same storage. It
    might be better to check that no more references exist to the array's
    storage.
    A serious issue: it does not check that the dimensionality does not
    change, hence it is possible that a Vector gets resized to a 2-dim
    array which must not be possible.
    I would like to see the code not dependent on the template, being
    implemented in ArrayBase to avoid bloat. Certainly the first part
    checking the shape, could be put in ArrayBase.

--> Added two checks to throw an exception if the array is shared
or
if an attempt is made
to change the dimensionality.

--> About 1/2 of the checks require access to data_p; it seems more
readable to keep all of the
validation checks together rather than split them between Array and
ArrayBase.

  1. The test 'shape() == newShape' will throw an exception if their
    lengths mismatch. Better to use the isEqual function.

--> Changed to isEquals. (Rather odd that operator== doesn't simply
return false if the shapes
are different)

  1. Data is not copied if there is sufficient space and
    copyDataIfNeeded is true. Maybe easier to remove argument
    copyDataIfNeeded.

--> If there is sufficient space, the data is left in place. The
most likely use cases will
increase or decrease the last dimension, so leaving the data in
place is appropriate.
If a different change is made (e.g., a 2x3 going to a 3x2) it's
really hard to say what
a copy ought to do. I added a line in the comments to warn the user
about this case.


Reply to this email directly or view it on GitHub

<#113 (comment)
.


Reply to this email directly or view it on GitHub

<#113 (comment)
.Web

Bug from

https://github.com/notifications/beacon/AGPGtIjHmFFFrfkJC0bbLx7kplb-3vGkks5oT8A5gaJpZM4E42Hu.gif

Jim Jacobs

phone: 835-7235
office: 301


Reply to this email directly or view it on GitHub
<#113 (comment)
.


Reply to this email directly or view it on GitHub
#113 (comment)

Bug from

https://github.com/notifications/beacon/AGPGtLYCivXP5KRnE85Wogh_JnWaVjaPks5oV57kgaJpZM4E42Hu.gif

Jim Jacobs

phone: 835-7235
office: 301


Reply to this email directly or view it on GitHub
#113 (comment).

@JimNrao
Copy link
Contributor Author

JimNrao commented Jun 24, 2015

The name change seems like a good idea.

I was having the ArrayBase method return a bool to indicate to the
caller (in Array) that it needed to call setEndIter since that wasn't
available to ArrayBase.

On 6/23/2015 11:59 PM, Ger van Diepen wrote:

I sympathize with changing extend to a name that better covers the
functionality. adjustLastAxis seems fine.
I now realize my remark about setEndIter was incorrect, because resize is
already doing setEndIter. Maybe better to add that as a comment to the
code.
I'll merge the pull request tomorrow morning once you've change the name
extend.
Ger

On Tue, Jun 23, 2015 at 8:04 PM, Jim notifications@github.com wrote:

Actually I use both aspects of the functionality, although the "extend"
one is the most important. In both use-cases, the data always retains
the same number of dimensions. In the more frequent case, I have to
increase or decrease the last dimension of the cube (or matrix or
vector); this one can occur in the tightest of the read loops (we
normally process a set of visibilities from the same spectral window
over a modest time interval). At the next level up, we potentially
change spectral windows which can change the size of all three axes
(correlation, channel, row). This one occurs less frequently and is not
as important to performance, I think. My suspicion is that anyone using
this method would normally be interested in reusing a block of storage.

I think by splitting it into one call that reforms it to allow storage
reuse (i.e., any data in the array prior to the call should be
considered effectively lost) and one that extends the array in the last
dimension which guarantees data preservation would seem to be a
reasonable set.

Maybe the "extend" method could be called "adjustLastAxis". "Extend"
makes sense if you allow for "negative extension" but when looking at an
API I would probably scan past "extend" looking for something else
(e.g., "shrink"), etc.

On 06/22/2015 12:12 AM, Ger van Diepen wrote:

I see your use case. But adding an element to another axis than
the last
one, is just as valid and will have different semantics. We could say
that
if copy=True and another axis than the last one changes, a resize with
copy
is always done (because that is cheaper and easier than making a temp
copy
in another array; when not using a temp array, you have to figure
out how
to copy which can be quite difficult).

But having a function extendArray also sounds very good to me. Do you
really need the more general reformOrResize? If not, we can as
well drop
it. Although you can argue it might be useful to be able to reuse an
existing array for totally different purposes, the limitiation
that the
dimensionality cannot change undermines it.
Would extendArray also allow you to make the last axis shorter? If so,
the
function name is incorrect.

On Wed, Jun 17, 2015 at 7:34 PM, Jim notifications@github.com wrote:

Maybe I should change the name to copyIfResizing. My use case is if
when a shape of [2,3,4] goes to [2,3,5](i.e., adding another
row on a
vis cube); the user need not know if the operation will require
resizing
or reforming but they do want the data in element (1,1,1) to be the
same
in the resulting array. I pass the argument to the Array::resize
method
if resizing is required.

I can see where a user who radically changes shape might be confused
into thinking that this parameter could be used to preserve the data
(e.g., going from [2,3,4] to [4, 2, 3]) even when resizing is not
required. The original Array::reform method just lets the data
remain
at the same memory location.

Maybe there needs to be two different methods with slightly
different
semantics? One method only allows altering the last dimension while
preserving the data already in place. The second method lets the
user
radically reshape the array but provides no guarantees about
existing
data. The underlying code might be the same (e.g. the protected
ArrayBase::reformOrResize method would be the same in both
cases) with
a
two thin methods providing the public API. Maybe one would be called
extendArray (preserves data) and the other could be called
reformOrReshape (no data guarantees).

On 06/16/2015 12:56 AM, Ger van Diepen wrote:

Hi Jim,

Thanks for the changes.
I agree it is odd that IPosition::operator== throws an exception
if the
lengths mismatch. Brian Glendenning has written that code long
time
ago (in
1992) and I always hesitated to change it. Maybe we have to
bite the
bullet
one time.

I still think copyDataIfNeeded should be removed. The argument
promises
to
copy the array elements to their new places when reshaping
from e.g.
[2,4]
to [3,4]. However, that won't be done if there is enough storage.
Note:
in
such a case element [1,1] is not the same as before the
reformOrResize.
It
makes it unpredictable for the caller if a copy will be made or
not, so
better to never do it. Do you need that argument?

I do not agree data_p is used a lot. In fact, about everything
could be
done in ArrayBase because resize is a virtual function. It only
requires
data_p.nrefs() to be passed to that base function. In fact, only
the call
to setEndIter() needs to be in Array.tcc.

Now there is a test on nrefs()==1, I think the test on
isContiguous makes
little sense. As I already commented, even if contiguous the Array
could be
a view on a part of the original, already deleted, Array object. I
think it
does not matter for reformOrResize if the view is contiguous or
not. What
do you think?

Ger

On Mon, Jun 15, 2015 at 7:24 PM, Jim notifications@github.com
wrote:

Response to Ger's comments above:

I have a few comments. They have to be addressed in the code.

  1. The ElementType typedef should be removed. Standard STL is to
    use
    value_type for such a typedef and that is already present in
    Array.h.

--> Removed this; I missed the existing typedef (lot's of
typedefs
in the iterator definitions).

  1. I do not understand why the implementation of the reform
    function
    has changed. baseReform already checks if the length matches. The
    error message suggests that reform is a nonStrict reform,
    which it is
    not.

--> Reverted the implementation back since this class no longer
needs a new implementation.
(Manually reverted it but only got the signature changed).

  1. Why not inline the capacity function?

--> Done.

  1. In Array.h the comment for function capacity needs to be
    placed
    before the function declaration, otherwise doxygen does not pick
    it up.

--> Done

  1. In Array.h the comments of reformOrResize contain quite some
    typos. It should not use matrix, but array. In the middle resize
    instead of copy is used. I got the impression that
    resizePercentage
    is
    applied to the current shape, but it is to the new shape.

--> Fixed a couple of typos.

  1. reformOrResize checks if the array is contiguous. Note
    that even
    if contiguous, the array can be a view on a larger array. It
    does not
    check if other Array objects are referencing the same storage. It
    might be better to check that no more references exist to the
    array's
    storage.
    A serious issue: it does not check that the dimensionality
    does not
    change, hence it is possible that a Vector gets resized to a 2-dim
    array which must not be possible.
    I would like to see the code not dependent on the template,
    being
    implemented in ArrayBase to avoid bloat. Certainly the first part
    checking the shape, could be put in ArrayBase.

--> Added two checks to throw an exception if the array is
shared
or
if an attempt is made
to change the dimensionality.

--> About 1/2 of the checks require access to data_p; it
seems more
readable to keep all of the
validation checks together rather than split them between
Array and
ArrayBase.

  1. The test 'shape() == newShape' will throw an exception if
    their
    lengths mismatch. Better to use the isEqual function.

--> Changed to isEquals. (Rather odd that operator== doesn't
simply
return false if the shapes
are different)

  1. Data is not copied if there is sufficient space and
    copyDataIfNeeded is true. Maybe easier to remove argument
    copyDataIfNeeded.

--> If there is sufficient space, the data is left in place. The
most likely use cases will
increase or decrease the last dimension, so leaving the data in
place is appropriate.
If a different change is made (e.g., a 2x3 going to a 3x2) it's
really hard to say what
a copy ought to do. I added a line in the comments to warn
the user
about this case.


Reply to this email directly or view it on GitHub

<#113 (comment)
.


Reply to this email directly or view it on GitHub

<#113 (comment)
.Web

Bug from

https://github.com/notifications/beacon/AGPGtIjHmFFFrfkJC0bbLx7kplb-3vGkks5oT8A5gaJpZM4E42Hu.gif

Jim Jacobs

phone: 835-7235
office: 301


Reply to this email directly or view it on GitHub

<#113 (comment)
.


Reply to this email directly or view it on GitHub

#113 (comment)

Bug from

https://github.com/notifications/beacon/AGPGtLYCivXP5KRnE85Wogh_JnWaVjaPks5oV57kgaJpZM4E42Hu.gif

Jim Jacobs

phone: 835-7235
office: 301


Reply to this email directly or view it on GitHub
#113 (comment).


Reply to this email directly or view it on GitHub
#113 (comment)
Bug from
https://github.com/notifications/beacon/AGPGtPmkzDG7gBzYmQjqA1wSkRzsHllKks5oWj6mgaJpZM4E42Hu.gif

Jim Jacobs

gervandiepen added a commit that referenced this pull request Jun 25, 2015
Issue #111: Added the capacity feature making it possible to resize without acquiring memory.
@gervandiepen gervandiepen merged commit 26cb696 into casacore:master Jun 25, 2015
@gervandiepen
Copy link
Contributor

Finalized issue #111

@juliantaylor
Copy link
Contributor

you shouldn't merge PR's with failing travis tests, casacore now fails to build:

/home/jtaylor/eso/casa/casacore/casa/Arrays/test/tArray.cc:770:20: error: ‘class casa::Array<int>’ has no member named ‘extend’
  bool resized = a1.extend (newShape);

@JimNrao
Copy link
Contributor Author

JimNrao commented Jun 25, 2015

Fixed the error in tArray.cc. Probably needs to be repoened and remerged after travis passes on it?

@tammojan
Copy link
Contributor

@JimNrao Could you just fix it on the master branch?

@juliantaylor
Copy link
Contributor

I don't think you can add to an already merged PR, please make a new one with the fix.

@tammojan
Copy link
Contributor

A new PR is indeed better, I'll merge it once it passes travis.

@gervandiepen
Copy link
Contributor

I agree. I hadn't paid enough attention to it. A good lesson.

On Thu, Jun 25, 2015 at 4:49 PM, Julian Taylor notifications@github.com
wrote:

you shouldn't merge PR's with failing travis tests, casacore now fails to
build:

/home/jtaylor/eso/casa/casacore/casa/Arrays/test/tArray.cc:770:20: error: ‘class casa::Array’ has no member named ‘extend’
bool resized = a1.extend (newShape);


Reply to this email directly or view it on GitHub
#113 (comment).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants