# numpy (numerical python)

For a more comprehensive introduction to NumPy, see the official tutorial here:  [https://docs.scipy.org/doc/numpy-dev/user/quickstart.html](https://docs.scipy.org/doc/numpy-dev/user/quickstart.html).

In [1]:
import numpy

* The main object (i.e., data type) in NumPy is the multidimensional array.  In numpy, dimensions are called *axes*.  The **```numpy.array()```** function creates an array from a sequence such as a list or tuple.  Note that the data must be passed as a single sequence, which is why nested sequences are wrapped in an outer pair of parentheses or brackets, e.g. **```(( ... ))```**:

In [2]:
array1 = numpy.array(( [1,2,3], [4,5,6], [7,8,9] ))
array2 = numpy.array([ (1,2), (3,4) ]) # brackets and parentheses both work
array3 = numpy.array(( (1,2), (3,4) ))
array4 = numpy.array(( 'apple', 'orange', 'banana' )) # they can hold strings too!

In [3]:
print(array1)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


* If your array contains multiple data types, they'll be converted to the same one
* Read more about numpy data types (dtypes) [here](https://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html)

In [4]:
array0 = numpy.array(( 'apple', 7.0))
print(array0)
print(type(array0[1]))

['apple' '7.0']
<class 'numpy.str_'>
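
As a small illustration of the conversion described above, the sketch below inspects the `dtype` attribute of a few arrays.  The variable names (`mixed_numbers`, `mixed_strings`, `as_float`) are made up for this example.

```python
import numpy

# Integers mixed with a float are upcast to a floating-point dtype
mixed_numbers = numpy.array((1, 2.5, 3))
print(mixed_numbers.dtype)

# A string mixed with a number converts everything to a string dtype
mixed_strings = numpy.array(('apple', 7.0))
print(mixed_strings.dtype.kind)  # 'U' means unicode string

# The dtype= argument requests a specific type explicitly
as_float = numpy.array((1, 2, 3), dtype=float)
print(as_float.dtype)
```

Checking `ARRAY.dtype` after creating an array is a quick way to catch an unintended conversion.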


## ==========> NOW YOU TRY <==========

__... DEBUGGING IN PYTHON:__
* A line number is given, which is sometimes helpful.  Toggle line numbers on/off in a notebook by typing:   ```esc``` and then ```shift```+```L```
* Here, it says ```invalid syntax```, so we've done something wrong... sometimes, the error is a line above the one reported

In [5]:
print(array1) # print function requires () in Python 3.x
print() # print empty line for separator
print(array1.shape)
print()
print(array2)
print()
print(array3
print()
print(array4)

SyntaxError: invalid syntax (<ipython-input-5-ed729f202d23>, line 8)

In [6]:
print("I had an " + array4[0] + " with lunch.") # note that + concatenates the strings before they are printed

I had an apple with lunch.


## numpy array copying versus assigning

In [7]:
array5 = array3 # assignment:  array3 and array5 are the same array, where python is concerned
print(array5)

[[1 2]
 [3 4]]


In [8]:
array5 *= 2 # multiply array5 by 2, in place
print(array5)
print(array3) # note array3 is the SAME as array5, and changes accordingly

[[2 4]
 [6 8]]
[[2 4]
 [6 8]]


* Try to avoid weird array behavior by using either [```numpy.copy()``` or ```copy.deepcopy()```](https://docs.python.org/2/library/copy.html) (the latter comes from Python's built-in ```copy``` module, not from numpy)

In [9]:
# avoid this behavior by using numpy.copy
array5 = numpy.copy(array3)
array5 *= 2
print(array5)
print(array3)

[[ 4  8]
 [12 16]]
[[2 4]
 [6 8]]
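
One way to check whether two names refer to the same array is sketched below.  It uses Python's `is` operator and `numpy.shares_memory()`; the names `original`, `alias`, and `duplicate` are placeholders for this example.

```python
import numpy

original = numpy.array(((1, 2), (3, 4)))
alias = original                  # assignment: a second name for the same array
duplicate = numpy.copy(original)  # copy: a new array with its own data

print(alias is original)                         # True
print(numpy.shares_memory(original, duplicate))  # False

alias *= 2        # in-place change through the alias
print(original)   # changed, because alias and original are the same object
print(duplicate)  # unchanged
```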


## Placeholder arrays:  zeros, ones, and empty

You can create a large placeholder array to fill in later.  This can be done using **```numpy.zeros()```**, **```numpy.ones()```**, or **```numpy.empty()```**:

In [10]:
zeros_array = numpy.zeros((5,5)) # (nrows,ncols)
ones_array = numpy.ones((5,5))
empty_array = numpy.empty((3,3))

In [11]:
print(empty_array)

[[ 2.68156159e+154 -2.00389240e+000  2.18850244e-314]
 [ 2.18852068e-314  2.18852071e-314  2.18852075e-314]
 [ 2.18852078e-314  2.18852081e-314  2.18849601e-314]]
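
The values printed for the empty array above are whatever happened to be in memory, so they should not be relied on.  A minimal sketch of the "fill in later" pattern follows; the names `squares` and `scratch` are hypothetical.

```python
import numpy

# Start from a placeholder of zeros (float64 by default) and fill it in a loop
squares = numpy.zeros(5)
for i in range(squares.size):
    squares[i] = i ** 2
print(squares)

# numpy.empty skips initialization, so assign every element before reading it
scratch = numpy.empty((2, 2))
scratch[:] = 1.5
print(scratch)
```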


## Means, sums, and dimensions

* You can take the mean of an array two different ways:
  * ```numpy.mean(ARRAY)```
  * ```ARRAY.mean()```
* Note it also accepts an ```axis=``` argument; if you don't specify an axis, it flattens the array and takes the mean of everything

In [12]:
# create a 5x5 array of random floats between 0 and 1
random_array = numpy.random.random((5,5))

In [13]:
numpy.mean(random_array, axis=0) # takes mean down columns

array([0.52154475, 0.38111369, 0.50699572, 0.61328978, 0.41147709])

In [14]:
random_array.mean(axis=0)

array([0.52154475, 0.38111369, 0.50699572, 0.61328978, 0.41147709])
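
Because the random values differ on every run, the effect of the ```axis=``` argument may be easier to see on a small fixed array.  The sketch below uses a hypothetical 2x3 array named `small`.

```python
import numpy

small = numpy.array(((1, 2, 3), (4, 5, 6)), dtype=float)  # shape (2, 3)

col_means = small.mean(axis=0)  # collapses the rows: one mean per column
row_means = small.mean(axis=1)  # collapses the columns: one mean per row
overall = small.mean()          # no axis: mean of all six values

print(col_means)  # [2.5 3.5 4.5]
print(row_means)  # [2. 5.]
print(overall)    # 3.5
```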

In [15]:
print(array1.shape) # same as numpy.shape(array1)
print(array1.mean()) # same as numpy.mean(array1)
print(array1.sum()) # same as numpy.sum(array1)

(3, 3)
5.0
45


## NaNs in numpy
* A NaN value in numpy is declared via ```numpy.nan```
* There are methods that can handle NaNs: ```numpy.nanmean()```, ```numpy.nansum()```, ```numpy.nanstd()```, ```numpy.nanvar()```, ```numpy.nanmin()```, ```numpy.nanmax()```, etc.

In [16]:
random_array[3,3] = numpy.nan
print(random_array)

[[0.37474785 0.10934025 0.02430989 0.36601704 0.2764679 ]
 [0.77335735 0.66021927 0.71612517 0.52373697 0.02690813]
 [0.68054155 0.39870562 0.53667139 0.87092262 0.51370592]
 [0.40009001 0.62741315 0.91362894        nan 0.82317538]
 [0.37898701 0.10989017 0.34424321 0.49925633 0.41712812]]


In [17]:
numpy.mean(random_array, axis=0)

array([0.52154475, 0.38111369, 0.50699572,        nan, 0.41147709])

In [18]:
numpy.nanmean(random_array, axis=0)

array([0.52154475, 0.38111369, 0.50699572, 0.56498324, 0.41147709])
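
To locate NaNs rather than just skip them, one option is `numpy.isnan()`, sketched below on a small made-up array named `values`.  Note that a NaN never compares equal to anything, including itself, so `==` cannot be used to find it.

```python
import numpy

values = numpy.array((1.0, numpy.nan, 3.0))

print(numpy.nan == numpy.nan)     # False: NaN is not equal to itself

nan_mask = numpy.isnan(values)    # boolean array, True where the value is NaN
print(nan_mask)
print(nan_mask.sum())             # number of NaNs: 1

print(values[~nan_mask].mean())   # mean of the non-NaN values: 2.0
print(numpy.nanmean(values))      # same result: 2.0
```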

## The ```numpy.arange()``` and ```numpy.linspace()``` functions

* **```numpy.arange()```** is similar to **```range()```**, but here, a numpy array is returned

In [19]:
print(numpy.arange(10)) # integer arguments give an array with an integer dtype
print(numpy.arange(0, 1, 0.1)) # a float argument gives an array with a float dtype
print(numpy.linspace(0, 1, 11)) # numpy.linspace(start, stop, num), where num is the number of points

[0 1 2 3 4 5 6 7 8 9]
[0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9]
[0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1. ]
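
A quick way to confirm the data types and lengths of these arrays is sketched below; the names `int_steps`, `float_steps`, and `points` are chosen just for this example.

```python
import numpy

int_steps = numpy.arange(10)
float_steps = numpy.arange(0, 1, 0.1)
points = numpy.linspace(0, 1, 11)

print(int_steps.dtype.kind)    # 'i' for a signed integer dtype
print(float_steps.dtype.kind)  # 'f' for a floating-point dtype
print(len(float_steps), len(points))
```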


## ==========> NOW YOU TRY <==========

* Why does ```numpy.arange(0, 1, 0.1)``` __NOT__ include 1.0 as output?
* Fix it to include 1.0 below:

In [20]:
numpy.arange(0, 1, 0.1)

array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])

## Operations on arrays

Basic arithmetic operations on NumPy arrays occur *element-wise*.  

In [21]:
A = numpy.array(( [1,2], [3,4], [5,6] ), dtype=float) # dtype=float, int, etc.
print(A)

[[1. 2.]
 [3. 4.]
 [5. 6.]]


In [22]:
B = numpy.array(( [7,8], [9,10], [11,12] ))
print(B)

[[ 7  8]
 [ 9 10]
 [11 12]]


In [23]:
print(A+B)
print(A-B)

[[ 8. 10.]
 [12. 14.]
 [16. 18.]]
[[-6. -6.]
 [-6. -6.]
 [-6. -6.]]


In [24]:
print(A*B) # element-wise multiplication

[[ 7. 16.]
 [27. 40.]
 [55. 72.]]
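
Element-wise behavior also applies when one operand is a single number, and to powers and comparisons.  The sketch below uses a made-up 2x2 array named `grid`.

```python
import numpy

grid = numpy.array(((1, 2), (3, 4)), dtype=float)

print(grid + 10)  # the scalar is added to every element
print(grid ** 2)  # every element is squared
print(grid > 2)   # comparison returns a boolean array of the same shape
```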


### Dot products on arrays

For the MATLAB users, remember numpy is a bit different when doing "matrix" algebra.  Read the [numpy for MATLAB users](https://docs.scipy.org/doc/numpy-dev/user/numpy-for-matlab-users.html) documentation if you're a MATLABer!

* Remember from linear algebra:  The product of two matrices (i.e., their dot product) requires matching interior dimensions, and the result takes the outer dimensions:

```[i x j] . [j x k] = [i x k]```

* __FYI:__  If you really, _really_ hate arrays, numpy supports matrices (2D only!) and considers them a subclass of arrays.  Convert an array to a matrix using [```numpy.matrix()```](https://docs.scipy.org/doc/numpy/reference/generated/numpy.matrix.html).  Be aware that the NumPy documentation no longer recommends using this class.
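
The shape rule for matrix products can be checked on arrays of ones, as in the sketch below.  The names `left`, `right`, and `product` are placeholders, and the `@` operator shown at the end assumes Python 3.5 or newer.

```python
import numpy

left = numpy.ones((2, 3))   # [2 x 3]
right = numpy.ones((3, 4))  # [3 x 4]

# Interior dimensions (3 and 3) match, so the product has shape [2 x 4]
product = numpy.dot(left, right)
print(product.shape)

# The @ operator is an alternative spelling for 2D arrays
print((left @ right).shape)
```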

## ==========> NOW YOU TRY <==========

__Can you solve the error below?__  Hint:  Transpose an array using ```array_name.T``` or ```numpy.transpose(array_name)``` (neither changes the original array)

In [25]:
print(A.shape)
print(B.shape)
print(numpy.dot(A,B)) # matrix dot product

(3, 2)
(3, 2)


ValueError: shapes (3,2) and (3,2) not aligned: 2 (dim 1) != 3 (dim 0)

Means, sums, minima, and maxima can be computed using either the **```A.function()```** or the **```numpy.function(A)```** notation:

In [26]:
print(A.mean(axis=None), \
      A.sum(), \
      A.min(), \
      A.max())
print(numpy.mean(A, axis=None), \
      numpy.sum(A), \
      numpy.min(A), \
      numpy.max(A))

3.5 21.0 1.0 6.0
3.5 21.0 1.0 6.0


### Indexing, slicing, iterating

To index a NumPy array, use square brackets **```[i,j]```** for **```[row,column]```**.

(Don't forget zero indexing.)

In [27]:
C = numpy.arange(100).reshape((10,10)) # note array is reshaped upon creation
C

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
       [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
       [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
       [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])

## ==========> NOW YOU TRY <==========

What are the ```[i,j]``` indices for the number __5__?  The number __98__?  The number __99__?

In [28]:
# example/hint:  find number 89...
# it's in the second-to-last row and last column
print(C[-2,-1])
print(C[8,9])

89
89


* Printing subsets of an array

In [29]:
C[:2,:3] # SAME AS BELOW
#C[0:2,0:3]
#C[[0,1],:][:,[0,1,2]] # list of row indices, then list of column indices

array([[ 0,  1,  2],
       [10, 11, 12]])
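
The commented-out alternatives above can be compared directly, as sketched below.  The sketch also uses `numpy.ix_()`, which is one way to combine a list of row indices with a list of column indices in a single step; `table` is a stand-in name for an array like `C`.

```python
import numpy

table = numpy.arange(100).reshape((10, 10))

by_slice = table[:2, :3]
by_lists = table[[0, 1], :][:, [0, 1, 2]]     # rows first, then columns
by_ix = table[numpy.ix_([0, 1], [0, 1, 2])]   # rows and columns in one step

print(numpy.array_equal(by_slice, by_lists))  # True
print(numpy.array_equal(by_slice, by_ix))     # True

# Caution: two index lists given together are paired up element by element,
# so this selects the elements at (0,0) and (1,1) rather than a 2x2 block
paired = table[[0, 1], [0, 1]]
print(paired)
```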

In [30]:
D = numpy.arange(20)
print(D)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]


* Reverse arrays using the **```[::-1]```** syntax:

In [31]:
D[::-1]

array([19, 18, 17, 16, 15, 14, 13, 12, 11, 10,  9,  8,  7,  6,  5,  4,  3,
        2,  1,  0])

* Take every Nth element using the **```[::N]```** syntax:

In [32]:
D[::2]

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

* To take every Nth element from part of an array, use the **```[start:stop:N]```** syntax:

In [33]:
D[0:16:3] # looks at the first 16 elements, in steps of 3

array([ 0,  3,  6,  9, 12, 15])
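
The general slice form is ```[start:stop:step]```, where the element at `stop` is excluded and a negative step walks backwards.  A few combinations are sketched below on a hypothetical array named `sequence`.

```python
import numpy

sequence = numpy.arange(20)

print(sequence[0:16:3])   # start at 0, stop before 16, step 3
print(sequence[5::5])     # start at 5, run to the end, step 5
print(sequence[15:0:-5])  # start at 15, walk backwards, stop before index 0
```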

### Reshaping versus resizing

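The **```numpy.reshape()```** function (and the **```ARRAY.reshape()```** method) returns a *new* array with the specified shape and leaves the original unchanged.
The **```ARRAY.resize()```** method resizes the array *in place*, whereas the **```numpy.resize()```** function returns a new array.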

Be aware of this difference when flattening or reshaping arrays.

In [34]:
F = numpy.linspace(0,10,50)
F.shape

(50,)

In [35]:
F.reshape(10,5)

array([[ 0.        ,  0.20408163,  0.40816327,  0.6122449 ,  0.81632653],
       [ 1.02040816,  1.2244898 ,  1.42857143,  1.63265306,  1.83673469],
       [ 2.04081633,  2.24489796,  2.44897959,  2.65306122,  2.85714286],
       [ 3.06122449,  3.26530612,  3.46938776,  3.67346939,  3.87755102],
       [ 4.08163265,  4.28571429,  4.48979592,  4.69387755,  4.89795918],
       [ 5.10204082,  5.30612245,  5.51020408,  5.71428571,  5.91836735],
       [ 6.12244898,  6.32653061,  6.53061224,  6.73469388,  6.93877551],
       [ 7.14285714,  7.34693878,  7.55102041,  7.75510204,  7.95918367],
       [ 8.16326531,  8.36734694,  8.57142857,  8.7755102 ,  8.97959184],
       [ 9.18367347,  9.3877551 ,  9.59183673,  9.79591837, 10.        ]])

In [36]:
F.shape # F has not been rewritten!  Shape is still (50,)

(50,)

In [37]:
F.resize(10,5)
print(F)

[[ 0.          0.20408163  0.40816327  0.6122449   0.81632653]
 [ 1.02040816  1.2244898   1.42857143  1.63265306  1.83673469]
 [ 2.04081633  2.24489796  2.44897959  2.65306122  2.85714286]
 [ 3.06122449  3.26530612  3.46938776  3.67346939  3.87755102]
 [ 4.08163265  4.28571429  4.48979592  4.69387755  4.89795918]
 [ 5.10204082  5.30612245  5.51020408  5.71428571  5.91836735]
 [ 6.12244898  6.32653061  6.53061224  6.73469388  6.93877551]
 [ 7.14285714  7.34693878  7.55102041  7.75510204  7.95918367]
 [ 8.16326531  8.36734694  8.57142857  8.7755102   8.97959184]
 [ 9.18367347  9.3877551   9.59183673  9.79591837 10.        ]]


In [38]:
print(F.shape)
print(F.size)

(10, 5)
50
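
The differences between these calls can be sketched on a small array.  The names below (`values`, `reshaped`, `inferred`, `repeated`, `fresh`) are made up for this example.

```python
import numpy

values = numpy.arange(6)

# reshape returns a new array object; values keeps its shape
reshaped = values.reshape(2, 3)
print(values.shape, reshaped.shape)

# Passing -1 lets numpy infer that dimension from the array's size
inferred = values.reshape(-1, 2)
print(inferred.shape)

# The numpy.resize function returns a new array, repeating data to fill it
repeated = numpy.resize(values, (2, 4))
print(repeated)

# The resize method changes the array itself
fresh = numpy.arange(6)
fresh.resize(3, 2)
print(fresh.shape)
```

Using ```-1``` is a common way to flatten or regroup an array without computing the remaining dimension by hand.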
