# Video: Flattening Arrays with Views

This video covers the different choices, and their tradeoffs, for flattening multi-dimensional arrays back down to one dimension.




* If we want to turn a multidimensional array back into a one-dimensional array, we have a few choices.
* First we will illustrate them with a simple, contiguous array; then we will repeat the same choices with a non-contiguous example.


In [None]:
import numpy as np

* Here is a 2-dimensional array that is contiguous.

In [None]:
x_contiguous = np.array([[0, 1, 2], [3, 4, 5]])
x_contiguous

array([[0, 1, 2],
       [3, 4, 5]])

* Let's confirm that the data is contiguous.

In [None]:
x_contiguous.flags

  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False

* There is some interesting stuff in there.
* The flag that we want to check for this example is C_CONTIGUOUS.
* The C there stands for the C programming language, which uses row-major order.
* The F in F_CONTIGUOUS stands for the Fortran programming language, which uses column-major order.
* So C_CONTIGUOUS means contiguous in row-major order, not column-major order.
* Those flags show that the array is contiguous in row-major order, which is what we want for this example.
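To make the row-major versus column-major distinction concrete, here is a minimal sketch using the same small array. It assumes `np.asfortranarray` as a convenient way to get a column-major copy (`x_f` is just an illustrative name), and it divides the strides by `itemsize` so the numbers are in elements rather than bytes and do not depend on the platform's default integer size.

```python
import numpy as np

x = np.array([[0, 1, 2], [3, 4, 5]])
x_f = np.asfortranarray(x)  # same values, copied into column-major (Fortran) order

for name, arr in [("row-major", x), ("column-major", x_f)]:
    # Strides in elements rather than bytes, so the output is platform independent.
    strides_in_elements = tuple(s // arr.itemsize for s in arr.strides)
    print(name, arr.flags["C_CONTIGUOUS"], arr.flags["F_CONTIGUOUS"], strides_in_elements)
# row-major True False (3, 1)
# column-major False True (1, 2)
```

In row-major order the next element of a row is one step away in memory; in column-major order the next element of a column is.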

* Before I continue, I'm going to write a quick little function to print out a few properties that we will keep checking.

In [None]:
def check(a):
    """Print an array's id, data, base (if any), C contiguity, data ownership, and strides."""
    print("ID", id(a))
    print("DATA")
    print(a)
    if a.base is not None:
        print("BASE")
        print(a.base)
        print("BASE ID", id(a.base))
    print("C_CONTIGUOUS", a.flags["C_CONTIGUOUS"])
    print("OWNDATA", a.flags["OWNDATA"])
    print("STRIDES", a.strides)

check(x_contiguous)

ID 137930051686288
DATA
[[0 1 2]
 [3 4 5]]
C_CONTIGUOUS True
OWNDATA True
STRIDES (24, 8)
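The strides (24, 8) are byte counts: with 8-byte integers, as in the run above, moving one row skips 3 elements (24 bytes) and moving one column skips 1 element (8 bytes). A minimal sketch of that arithmetic, assuming nothing about the integer size since it divides by `itemsize`; `position` is an illustrative name for the element's index in memory.

```python
import numpy as np

x_contiguous = np.array([[0, 1, 2], [3, 4, 5]])
row_stride, col_stride = x_contiguous.strides
itemsize = x_contiguous.itemsize

# Moving one row skips a whole row of elements; moving one column skips one element.
print(row_stride // itemsize, col_stride // itemsize)  # 3 1

# Element [i, j] starts i*row_stride + j*col_stride bytes after element [0, 0].
i, j = 1, 2
offset = i * row_stride + j * col_stride
position = offset // itemsize
print(position)  # 5
```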


* The reshape function will flatten an array to one dimension if you give it a single size.
* If you specify -1 for that size, NumPy infers it, which here means the number of elements in the array.
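The -1 shortcut also works in a multi-axis shape, as long as only one axis is -1 and the remaining sizes divide the element count evenly. A small sketch on the same 6-element array; the `raised` flag is only there to record whether the bad shape was rejected.

```python
import numpy as np

x_contiguous = np.array([[0, 1, 2], [3, 4, 5]])

print(np.reshape(x_contiguous, -1).shape)       # (6,)
print(np.reshape(x_contiguous, (3, -1)).shape)  # (3, 2)
print(np.reshape(x_contiguous, (-1, 1)).shape)  # (6, 1)

raised = False
try:
    np.reshape(x_contiguous, (4, -1))  # 6 elements cannot be split into 4 equal rows
except ValueError as err:
    raised = True
    print("ValueError:", err)
```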

In [None]:
check(np.reshape(x_contiguous, -1))

ID 137930051913072
DATA
[0 1 2 3 4 5]
BASE
[[0 1 2]
 [3 4 5]]
BASE ID 137930051686288
C_CONTIGUOUS True
OWNDATA False
STRIDES (8,)


* So we can see that the output of reshape is a view linked to the input array, which the matching base ID confirms.
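Comparing ids by eye works, but Python's `is` operator does the same identity comparison directly. A minimal sketch, assuming the contiguous array from above (`flat` is an illustrative name):

```python
import numpy as np

x_contiguous = np.array([[0, 1, 2], [3, 4, 5]])
flat = np.reshape(x_contiguous, -1)

# `is` compares object identity, which is the same as comparing id() values.
print(flat.base is x_contiguous)          # True
print(id(flat.base) == id(x_contiguous))  # True
```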

* There's another function, ravel, which we used earlier, that also flattens the array and guarantees that the result will be contiguous.

In [None]:
check(np.ravel(x_contiguous))

ID 137930051775024
DATA
[0 1 2 3 4 5]
BASE
[[0 1 2]
 [3 4 5]]
BASE ID 137930051686288
C_CONTIGUOUS True
OWNDATA False
STRIDES (8,)


* As with reshape, we can see that ravel creates a view here, linked to the original array as the base ID confirms.

* Let's take an example that is not contiguous.

In [None]:
x_not_contiguous = x_contiguous[:,::2]
check(x_not_contiguous)

ID 137930051774832
DATA
[[0 2]
 [3 5]]
BASE
[[0 1 2]
 [3 4 5]]
BASE ID 137930051686288
C_CONTIGUOUS False
OWNDATA False
STRIDES (24, 16)


* We have not covered this syntax yet, but this slice takes every other value along each row.
* It is still a view.
* If you look at the strides, you can see that moving along a row now moves twice as far in memory: 16 bytes instead of the 8 before.
* This makes sense, since the slice takes every second element along a row.
* And as intended, this is the first array we have seen that is not contiguous.

* Before flattening this array, what do you think the result will look like?
* Since the data values are 0, 2, 3, 5, can it be a view of our original array?
* Walking through the original array's memory, the steps between those values are plus 2, plus 1, plus 2 elements.
* A one-dimensional view has a single stride, so those uneven steps cannot be expressed with strides.
* So we should not expect a view, since no single stride can walk through the original array and land on exactly those values.
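The uneven steps can be computed from the slice's own strides. A minimal sketch, assuming the arrays above: it finds each element's position in the original buffer in row-major reading order, then takes the differences between neighbors. The names `byte_offsets`, `positions`, and `steps` are illustrative.

```python
import numpy as np

x_contiguous = np.array([[0, 1, 2], [3, 4, 5]])
x_not_contiguous = x_contiguous[:, ::2]

# Byte offset of each element of the slice, read in row-major order,
# measured from the slice's first element (also the original's first element).
rows, cols = np.indices(x_not_contiguous.shape)
byte_offsets = rows * x_not_contiguous.strides[0] + cols * x_not_contiguous.strides[1]

# Convert to element positions so the result does not depend on the integer size.
positions = byte_offsets.ravel() // x_not_contiguous.itemsize
steps = np.diff(positions)

print(positions)  # [0 2 3 5]
print(steps)      # [2 1 2]
# A 1-D view needs one constant step, and these steps are not all equal.
print(len(set(steps.tolist())) == 1)  # False
```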

In [None]:
check(np.reshape(x_not_contiguous, -1))

ID 137930051913360
DATA
[0 2 3 5]
BASE
[[0 2]
 [3 5]]
BASE ID 137930051915664
C_CONTIGUOUS True
OWNDATA False
STRIDES (8,)


* This one surprised me when I was preparing this example.
* Because the data was not contiguous, I knew that it could not be implemented with a view of the original data.
* So I was very surprised to see that this was a view.
* The trick was that the base array here is not the original array that we started with.
* The base shape and contents look like our smaller not contiguous array, but the id does not match that either.
* Let's look at this mystery base some more.

In [None]:
mystery_base = np.reshape(x_not_contiguous, -1).base
check(mystery_base)

ID 137930051778672
DATA
[[0 2]
 [3 5]]
C_CONTIGUOUS True
OWNDATA True
STRIDES (16, 8)


* This mystery base has the same shape and contents as our not contiguous array, but this mystery base is contiguous.
* It looks like the first thing NumPy did, after detecting that a view would not work, was to make a contiguous copy of the data, and then make a view based on that copy.
* Having two array objects that share the same data is not very expensive, since the data is the part that can take a lot of memory, and the view does not duplicate it.

* The NumPy documentation says that the way to check if an object is a view is to check if the base attribute is None.
* That technically worked here.
* But the question that we really wanted answered was whether the new array was a view of the input array.
* Not whether the new array was a view of an array that we never saw before.


* Our real question was a bit trickier to answer, as you saw.
* Usually the base attribute check is enough, and honestly, I would not have noticed this if I had not been trying to highlight the different behavior here.
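NumPy also has `np.shares_memory`, which answers the real question directly: does the result share data with the array we passed in? A minimal sketch on the same arrays (the result names are illustrative). One caveat from the NumPy docs: `np.shares_memory` does an exact check that can be slow for complicated arrays, while `np.may_share_memory` only compares memory bounds and is cheaper but can report false positives.

```python
import numpy as np

x_contiguous = np.array([[0, 1, 2], [3, 4, 5]])
x_not_contiguous = x_contiguous[:, ::2]

flat_from_contiguous = np.reshape(x_contiguous, -1)
flat_from_not_contiguous = np.reshape(x_not_contiguous, -1)

# The base check: in the run above, both results had a base.
print(flat_from_contiguous.base is not None)
print(flat_from_not_contiguous.base is not None)

# The real question: does the result share memory with the input we passed in?
print(np.shares_memory(flat_from_contiguous, x_contiguous))          # True
print(np.shares_memory(flat_from_not_contiguous, x_not_contiguous))  # False
```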


* Let's wrap up checking ravel on this not contiguous array.

In [None]:
check(np.ravel(x_not_contiguous))

ID 137930051778960
DATA
[0 2 3 5]
C_CONTIGUOUS True
OWNDATA True
STRIDES (8,)


* As expected, ravel returns a new array in this case, one that owns its data and has no base.
* That is both because a view would not work with these uneven steps, and because a view would not be contiguous, while ravel promises a contiguous array.

* Wrapping up, why did I spend so much time looking into when these functions used views, and when they did not?
* First, creating a view is very fast. An operation that only creates a view takes about the same time no matter how much data there is. If you need to copy data, that will take a while if you have a lot of data.
* Second, copying is slow, but you usually get contiguous arrays afterwards, and those will be faster to operate on when you get to the real calculations. That is not to say that you want a lot of copying, but one copy at the end of a series of view transformations might speed up the rest of your code.
* Third, it is good to know which behavior is happening. Both for speed, and because if you have a view, there will be data sharing. You really want to know if writing to one array changes another. It is always a bad surprise to find that someone, including yourself, has been overwriting your data unexpectedly.
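A minimal sketch of the second and third points, assuming a chain of two view operations (a strided slice, then a transpose). `np.ascontiguousarray` is one way to make the single copy at the end; `.copy()` would also give a C-contiguous result. The names `v` and `c` are illustrative.

```python
import numpy as np

x = np.array([[0, 1, 2], [3, 4, 5]])

# A chain of view operations: a strided slice, then a transpose. No data is copied.
v = x[:, ::2].T
print(np.shares_memory(v, x), v.flags["C_CONTIGUOUS"])  # True False

# One explicit copy at the end produces a contiguous array for later calculations.
c = np.ascontiguousarray(v)
print(np.shares_memory(c, x), c.flags["C_CONTIGUOUS"])  # False True

# Views share data: writing through the view changes the original, but not the copy.
v[0, 0] = 100
print(x[0, 0], c[0, 0])  # 100 0
```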

* This video has been about using views to flatten arrays, and checking whether they were actually used.
* There is one more function, actually a method, to flatten arrays.
* Every NumPy array has a method flatten that will return a copy of the array flattened into one dimension.

In [None]:
check(x_not_contiguous.flatten())

ID 137930051914032
DATA
[0 2 3 5]
C_CONTIGUOUS True
OWNDATA True
STRIDES (8,)


* As promised, it made a copy, not a view.
* I skipped flatten earlier because it promises not to make a view.
* But I am mentioning it now for completeness, and since you will sometimes want to force a copy.
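To see the difference from ravel even when a view is possible, here is a small sketch on the contiguous array, assuming `np.shares_memory` as the check (`raveled` and `flattened` are illustrative names):

```python
import numpy as np

x_contiguous = np.array([[0, 1, 2], [3, 4, 5]])

raveled = np.ravel(x_contiguous)    # a view, because a view is possible here
flattened = x_contiguous.flatten()  # always a copy, even though a view was possible

print(np.shares_memory(raveled, x_contiguous))    # True
print(np.shares_memory(flattened, x_contiguous))  # False
print(flattened.flags["OWNDATA"])                 # True
```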

**Code Notes:**
* The NumPy function [`numpy.reshape`](https://numpy.org/doc/stable/reference/generated/numpy.reshape.html), referenced as `np.reshape`, returns a new array with the same data and a specified shape.
  * `reshape` returns a view whenever possible.
  * If you specify length -1 for one axis in the shape, NumPy will calculate that size for you based on the other sizes and the number of data elements.
* The NumPy function [`numpy.ravel`](https://numpy.org/doc/stable/reference/generated/numpy.ravel.html) returns a 1-dimensional array with the same contents as the input, and guarantees that the array will be contiguous.
  * If possible, `ravel` will return a view.
* The NumPy ndarray method [`numpy.ndarray.flatten`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.flatten.html) returns a copy of the array's data as a one-dimensional array.
  * `flatten` never attempts to return a view, which distinguishes it from `numpy.reshape` and `numpy.ravel`.