# Operating on Numpy arrays

## Objectives

Introduce numpy's **pass-by-reference** approach to handling numpy arrays and methods for avoiding pitfalls when operating on them.

## Introduction

From: http://span.ece.utah.edu/common-mistakes-in-moving-from-matlab-to-python:

"Whenever you reference an array in Python, the computer will provide the memory address for the thing you are accessing, not the actual value. This is called **pass-by-reference**. This saves memory and makes your programs faster, but it is also harder to keep straight."  

From: https://docs.python.org/2/library/copy.html

"Assignment statements in Python do not copy objects, they create bindings [pointers] between a target and an object." "... a copy is sometimes needed so one can change one copy without changing the other. The 'copy' module provides generic ... copy operations."

If you are not familiar with the **pass-by-reference** aspect of Python then I strongly suggest you read this short, informative essay on "Python Names and Values": https://nedbatchelder.com/text/names.html

We've briefly touched on this important subject in earlier tutorials.  Now we'll go into a bit more detail.

## Variable assignments

Unlike some other languages, creating a new variable with an assignment statement in Python such as ``x = some_numpy_array`` does not make a copy of ``some_numpy_array``.  Instead, the assignment statement makes ``x`` and ``some_numpy_array`` both point to the same `numpy` array in memory.  Because ``x`` and ``some_numpy_array`` both refer (or point) to the same `numpy` array in memory, the array can be changed by operations on either ``x`` or ``some_numpy_array``.  If you aren't aware of this behavior, you may run into bugs in your calculations that are very difficult to identify!

### A simple demonstration

Let's demonstrate this issue with a very simple `numpy` array

In [1]:
import numpy as np
from copy import deepcopy 
import ecco_v4_py as ecco
import xarray as xr

In [2]:
# Create a simple numpy array
a=np.array([1, 2, 3, 4, 5])

# Assign 'b' to point to the same numpy array
b=a

# Test to see if b and a point to the same thing
b is a

True

Now change the fourth element of ``b`` and print both ``a`` and ``b``

In [3]:
b[3] = 10
print(a)
print(b)

[ 1  2  3 10  5]
[ 1  2  3 10  5]
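
The same sharing happens when an array is handed to a function, which is the situation the quote in the introduction describes. The sketch below is only an illustration: the function names `zero_first_in_place` and `zero_first_on_copy` are hypothetical and not part of `numpy` or `ecco_v4_py`.

```python
import numpy as np

def zero_first_in_place(arr):
    # `arr` is just another name for the caller's array,
    # so this assignment changes the caller's data
    arr[0] = 0
    return arr

def zero_first_on_copy(arr):
    # Work on a private copy so the caller's array is left alone
    out = arr.copy()
    out[0] = 0
    return out

a = np.array([1, 2, 3, 4, 5])

c = zero_first_on_copy(a)
print(a)        # a is unchanged at this point

d = zero_first_in_place(a)
print(a)        # a has now been modified by the function
print(d is a)   # the returned array is the very same object as a
```

Whether a function should modify its input or return a fresh array is a design choice; the important thing is to know which of the two it does.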


### A fancier demonstration

Let's now demonstrate with a `numpy` array that stores ``SSH`` output.

In [4]:
# specify the location of your nctiles_monthly directory
data_dir='/Volumes/ECCO_BASE/ECCO_v4r3/nctiles_monthly/SSH/'    
var = 'SSH'
var_type = 'c'
ssh_all_tiles = ecco.load_all_tiles_from_netcdf(data_dir, var, var_type)
ecco.minimal_metadata(ssh_all_tiles)

# specify the location of your nctiles_grid directory
grid_dir='/Volumes/ECCO_BASE/ECCO_v4r3/nctiles_grid/'
var = 'GRID'
var_type = 'grid'
grid_all_tiles = ecco.load_all_tiles_from_netcdf(grid_dir, var, var_type)
ecco.minimal_metadata(grid_all_tiles)

# Merge these datasets
output_all = xr.merge([ssh_all_tiles, grid_all_tiles])


>>> LOADING TILES FROM NETCDF

loading /Volumes/ECCO_BASE/ECCO_v4r3/nctiles_monthly/SSH/SSH.0001.nc
loading /Volumes/ECCO_BASE/ECCO_v4r3/nctiles_monthly/SSH/SSH.0002.nc
loading /Volumes/ECCO_BASE/ECCO_v4r3/nctiles_monthly/SSH/SSH.0003.nc
loading /Volumes/ECCO_BASE/ECCO_v4r3/nctiles_monthly/SSH/SSH.0004.nc
loading /Volumes/ECCO_BASE/ECCO_v4r3/nctiles_monthly/SSH/SSH.0005.nc
loading /Volumes/ECCO_BASE/ECCO_v4r3/nctiles_monthly/SSH/SSH.0006.nc
loading /Volumes/ECCO_BASE/ECCO_v4r3/nctiles_monthly/SSH/SSH.0007.nc
loading /Volumes/ECCO_BASE/ECCO_v4r3/nctiles_monthly/SSH/SSH.0008.nc
loading /Volumes/ECCO_BASE/ECCO_v4r3/nctiles_monthly/SSH/SSH.0009.nc
loading /Volumes/ECCO_BASE/ECCO_v4r3/nctiles_monthly/SSH/SSH.0010.nc
loading /Volumes/ECCO_BASE/ECCO_v4r3/nctiles_monthly/SSH/SSH.0011.nc
loading /Volumes/ECCO_BASE/ECCO_v4r3/nctiles_monthly/SSH/SSH.0012.nc
loading /Volumes/ECCO_BASE/ECCO_v4r3/nctiles_monthly/SSH/SSH.0013.nc
total file load time  0.55019903183 s
concatenated all tiles.  this can

Recall the dimensions of our ``SSH`` `DataArray`:

In [5]:
output_all.SSH.dims

('time', 'tile', 'j', 'i')

Show the first four SSH values in **j** and **i** for the fifth month (May 1992) and second tile:

In [6]:
output_all.SSH[4,1,0:4,0:4].values

array([[-1.46229744, -1.45909059, -1.45165074, -1.45542085],
       [-1.41486502, -1.40930796, -1.40463269, -1.41828668],
       [-1.374681  , -1.36917841, -1.3710494 , -1.39537776],
       [-1.35166287, -1.34794021, -1.35757709, -1.38864243]])

Assign the variable `ssh_tmp` to this *subset* of the `numpy` array that ``SSH`` points to:

In [7]:
ssh_tmp = output_all.SSH[4,1,0:2,0:2].values
ssh_tmp

array([[-1.46229744, -1.45909059],
       [-1.41486502, -1.40930796]])

Now change the values of all elements of ``ssh_tmp`` to 10

In [8]:
ssh_tmp[:] = 10
ssh_tmp

array([[ 10.,  10.],
       [ 10.,  10.]])

And see that yes, in fact, this change is reflected in our ``SSH`` `DataArray`:

In [9]:
output_all.SSH[4,1,0:4,0:4].values

array([[ 10.        ,  10.        ,  -1.45165074,  -1.45542085],
       [ 10.        ,  10.        ,  -1.40463269,  -1.41828668],
       [ -1.374681  ,  -1.36917841,  -1.3710494 ,  -1.39537776],
       [ -1.35166287,  -1.34794021,  -1.35757709,  -1.38864243]])
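
If you are unsure whether two variables share data, `np.shares_memory` can check. The sketch below uses a small stand-in array rather than the ECCO fields (the names `full`, `view`, and `fancy` are made up for illustration). It also shows that basic slicing returns a view onto the original array, while indexing with a list of integers returns a new array.

```python
import numpy as np

# A small stand-in for a 2D field
full = np.zeros((4, 4))

view = full[0:2, 0:2]      # basic slicing: a view onto `full`
fancy = full[[0, 1], :]    # integer-list indexing: a new, independent array

print(np.shares_memory(full, view))    # True
print(np.shares_memory(full, fancy))   # False

view[:] = 10     # writes through to `full`
fancy[:] = -1    # does not touch `full`
print(full)
```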

## Dealing with *pass-by-reference*: right hand side operations

One way to have a new variable assignment not point to the original variable is to *perform an operation on the right hand side of the assignment statement*.  

"Python evaluates expressions from left to right. Notice that while evaluating an assignment, the right-hand side is evaluated before the left-hand side."
https://docs.python.org/2/reference/expressions.html#evaluation-order

Performing an operation on the right hand side creates new values in memory.  The new variable assignment will then point to these new values, leaving the original untouched.

### Simple demonstration 1
Operate on ``a`` by adding 1 before the assignment statement

In [10]:
# Create a simple numpy array
a=np.array([1, 2, 3, 4, 5])

b = a + 1

print(a)
print(b)

[1 2 3 4 5]
[2 3 4 5 6]


Now change the fourth element of ``b`` and print both ``a`` and ``b``

In [11]:
b[3] = 10
print(a)
print(b)

[1 2 3 4 5]
[ 2  3  4 10  6]


``a`` and ``b`` do indeed point to different values in memory.

### Simple demonstration 2

Operate on ``a`` by adding 0 before the assignment statement.  This is a kind of dummy operation.

In [12]:
# Create a simple numpy array
a=np.array([1, 2, 3, 4, 5])

# Add 0 to `a`:
b = a + 0

print(a)
print(b)

[1 2 3 4 5]
[1 2 3 4 5]


In [13]:
# Test to see if b and a point to the same thing
b is a

False

Now change the fourth element of ``b`` and print both ``a`` and ``b``

In [14]:
b[3] = 10
print(a)
print(b)

[1 2 3 4 5]
[ 1  2  3 10  5]


Once again we see that ``a`` and ``b`` do indeed point to different values in memory.
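
One related caveat, offered as a small sketch rather than a complete treatment: augmented assignments such as `+=` do not count as right hand side operations in this sense. For `numpy` arrays they update the existing array in place, so every name pointing to that array sees the change.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

b = a        # b is another name for the same array
b += 1       # in-place update of the shared array, so a changes too
print(a)

c = a + 0    # right hand side operation: c is a new array
c += 1       # in-place update of c only
print(a)
print(c)
```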

### A fancier demonstration

Let's now demonstrate with a `numpy` array that stores ``SSH`` output.

In [15]:
output_all.SSH[4,1,5:9,5:9].values

array([[-1.43729115, -1.43942964, -1.42965627, -1.41491973],
       [-1.40405977, -1.40120637, -1.38851774, -1.37088501],
       [-1.35820544, -1.3533982 , -1.33985138, -1.32088363],
       [-1.30428982, -1.29889154, -1.28463423, -1.26418531]])

In [16]:
ssh_tmp = output_all.SSH[4,1,5:9,5:9].values * output_all.RAC[1,5:9,5:9].values
ssh_tmp[:] = 10
ssh_tmp

array([[ 10.,  10.,  10.,  10.],
       [ 10.,  10.,  10.,  10.],
       [ 10.,  10.,  10.,  10.],
       [ 10.,  10.,  10.,  10.]])

In [17]:
output_all.SSH[4,1,5:9,5:9].values

array([[-1.43729115, -1.43942964, -1.42965627, -1.41491973],
       [-1.40405977, -1.40120637, -1.38851774, -1.37088501],
       [-1.35820544, -1.3533982 , -1.33985138, -1.32088363],
       [-1.30428982, -1.29889154, -1.28463423, -1.26418531]])

Operating on the right hand side of the assignment does indeed create new arrays in memory, leaving the original SSH `numpy` array untouched.

## Dealing with *pass-by-reference*: copy and deepcopy

A second way to have a new variable assignment not point to the original variable is to *use the copy or deepcopy command*.

### Simple demonstration
Use the `numpy` `copy` command.

In [18]:
# Create a simple numpy array
a=np.array([1, 2, 3, 4, 5])
b=np.copy(a)

print(a)
print(b)

[1 2 3 4 5]
[1 2 3 4 5]


Now change the fourth element of ``b`` and print both ``a`` and ``b``

In [19]:
b[3] = 10
print(a)
print(b)

[1 2 3 4 5]
[ 1  2  3 10  5]


In [20]:
output_all.SSH

<xarray.DataArray 'SSH' (time: 288, tile: 13, j: 90, i: 90)>
array([[[[      nan, ...,       nan],
         ..., 
         [-1.483355, ..., -1.354059]],

        ..., 
        [[-0.713382, ...,       nan],
         ..., 
         [-1.46542 , ...,       nan]]],


       ..., 
       [[[      nan, ...,       nan],
         ..., 
         [-1.408568, ..., -1.319521]],

        ..., 
        [[-0.580478, ...,       nan],
         ..., 
         [-1.390469, ...,       nan]]]])
Coordinates:
  * time      (time) float64 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0 ...
  * j         (j) float64 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0 12.0 ...
  * i         (i) float64 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0 12.0 ...
    tim       (time) datetime64[ns] 1992-01-16 1992-02-16 1992-03-16 ...
    timestep  (time) float64 732.0 1.428e+03 2.172e+03 2.892e+03 3.636e+03 ...
    lon_c     (tile, j, i) float64 -111.6 -111.3 -110.9 -110.5 -110.0 -109.3 ...
    lat_c     (tile, j, i) float64 -

### Fancier demonstration

`Dataset` and `DataArray` objects are too complicated for `numpy`'s `copy` command.  For complex objects such as these use the `deepcopy` command.

In [21]:
ssh_tmp = deepcopy(output_all.SSH)
ssh_tmp[:] = 10
ssh_tmp[4,1,5:9,5:9].values

array([[ 10.,  10.,  10.,  10.],
       [ 10.,  10.,  10.,  10.],
       [ 10.,  10.,  10.,  10.],
       [ 10.,  10.,  10.,  10.]])

In [22]:
output_all.SSH[4,1,5:9,5:9].values

array([[-1.43729115, -1.43942964, -1.42965627, -1.41491973],
       [-1.40405977, -1.40120637, -1.38851774, -1.37088501],
       [-1.35820544, -1.3533982 , -1.33985138, -1.32088363],
       [-1.30428982, -1.29889154, -1.28463423, -1.26418531]])

Using `deepcopy` gives us an entirely new array in memory.  Operations on ``ssh_tmp`` do not affect the original fields that we found in the `output_all.SSH` `DataArray`.
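
To see why a deep copy matters for container-like objects, here is a minimal sketch that uses a plain Python dictionary holding a `numpy` array as a hypothetical stand-in for a `Dataset` (the names `fields`, `shallow`, and `deep` are made up for illustration, and no `xarray` objects are involved). A shallow copy duplicates only the container, so the arrays inside are still shared; `deepcopy` duplicates the arrays as well.

```python
from copy import copy, deepcopy
import numpy as np

# A dictionary of arrays standing in for a more complex container
fields = {'ssh': np.array([1.0, 2.0, 3.0])}

shallow = copy(fields)      # new dictionary, same array inside
deep = deepcopy(fields)     # new dictionary and a new array

shallow['ssh'][:] = 10      # also changes fields['ssh']
print(fields['ssh'])

deep['ssh'][:] = -1         # leaves fields['ssh'] alone
print(fields['ssh'])
print(deep['ssh'])
```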

## Conclusion

You now know about the possible pitfalls of Python's **pass-by-reference** way of handling assignment statements, and about different methods for making independent copies of `numpy` arrays, `Datasets`, and `DataArrays`.