# The Numpy Library

**Burton Rosenberg, May 2017**

Python is increasingly used as a language for scientific computing, both because of its qualities as a programming language and because of community-developed libraries that extend the language's abilities. Some of these libraries are written in C to truly extend the way the language represents and manipulates data.

Numpy introduces the datatype _ndarray_, a multi-dimensional array of numbers. The ndarray improves on the Python list both in efficiency and in the collection methods it supports, including _universal functions_ and _broadcasting_. These concepts and methods make it very intuitive to use arrays for scientific computation.

Numpy is one element of a larger library for Pythonic scientific computing called Scipy. The entire Scipy library is described at [scipy.org](https://www.scipy.org/docs.html). One might also look at the [scipy-lectures](http://www.scipy-lectures.org/index.html) tutorial.

## Libraries

Libraries in Python include packages and modules. Modules are files containing Python code that is made available for use with the _import statement_. Packages are collections of modules, organized as entire directory trees of modules. An import statement takes one of two forms: import-as or from-import-as.

Import statements must first find the module in the system environment, then make its contents available by populating the local namespace. A simple _import module_ command finds a file with the same name as the module and binds the module in the local namespace under that same name. If one wishes the local name to be different, use _import module as name_.

Here we import the numpy library.

In [15]:
import numpy as np

The from form of the import statement, _from module import name as name_, operates by first looking up the module, then binding each listed name in the local namespace, renamed according to its trailing as clause if one is present. The module's own name is not bound.
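A small illustration of the from form, using the standard `math` module as an arbitrary example. The `exec` call into a fresh dictionary is only a device for showing which names a from-import actually binds.

```python
import math

# Bind a single name from a module, renamed by the as clause.
from math import sqrt as square_root
# Several names at once, without renaming.
from math import pi, floor

# Both routes reach the very same function object.
print(square_root is math.sqrt)   # True
print(floor(pi))                  # 3

# In a fresh namespace, a from-import binds only the listed (renamed) name,
# not the module name and not the original function name.
ns = {}
exec("from math import sqrt as square_root", ns)
print("square_root" in ns, "math" in ns, "sqrt" in ns)   # True False False
```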

#### Search path

Modules are found by searching the directories on the system path. This path is available as sys.path, a list of strings in the sys module.


In [16]:
# the sys module exposes the interpreter's module search path
import sys
print ("type(sys):", type(sys))
print ("type(sys.path)", type(sys.path))
print ("type(sys.path[0]):", type(sys.path[0]))
print ("sys.path:", sys.path)



type(sys): <class 'module'>
type(sys.path) <class 'list'>
type(sys.path[0]): <class 'str'>
sys.path: ['', '/Users/ojo/anaconda3/lib/python36.zip', '/Users/ojo/anaconda3/lib/python3.6', '/Users/ojo/anaconda3/lib/python3.6/lib-dynload', '/Users/ojo/anaconda3/lib/python3.6/site-packages', '/Users/ojo/anaconda3/lib/python3.6/site-packages/aeosa', '/Users/ojo/anaconda3/lib/python3.6/site-packages/IPython/extensions', '/Users/ojo/.ipython']
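As a sketch of how that search can be observed without actually importing anything, `importlib.util.find_spec` reports where the import system would load a module from. The `json` module is an arbitrary standard-library choice, and `/tmp/my_modules` is a hypothetical directory used only to show that sys.path is an ordinary, editable list.

```python
import sys
import importlib.util

# find_spec runs the import system's finders (which walk sys.path)
# and reports where the module would come from, without importing it.
spec = importlib.util.find_spec("json")
print(spec.name)     # json
print(spec.origin)   # the path of json/__init__.py in the standard library

# sys.path is a plain list; a directory appended here is searched by
# later imports. "/tmp/my_modules" is a hypothetical example directory.
sys.path.append("/tmp/my_modules")
print(sys.path[-1])  # /tmp/my_modules
```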


#### Packages

_Packages_ are directory-like spaces of modules. Imports of packages retain the directory hierarchy by dot-delimited names in the python namespace. For instance, the _from matplotlib import pyplot as plt_ command brings in matplotlib.pyplot, but binds it to the simple name plt.
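The same dotted-name idea can be checked with a package from the standard library; `xml.etree.ElementTree` is used here only as a stand-in for matplotlib.pyplot, so the sketch runs without extra installs.

```python
import xml.etree.ElementTree as ET1          # dotted package path bound to a short name
from xml.etree import ElementTree as ET2     # the same module via the from form
import xml.etree.ElementTree                 # plain dotted import binds the top name "xml"

print(ET1 is ET2)                      # True: one module object, two local names
print(ET1.__name__)                    # xml.etree.ElementTree
# With the plain import, the submodule is reached by attribute access
# through the package hierarchy.
print(xml.etree.ElementTree is ET1)    # True
```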

## Numpy arrays

The ndarray type provided by the numpy library is an efficient and flexible array for scientific computing. It holds elements of a single base type arranged in a rectangular grid. An ndarray is represented as a block of memory, an indexing function (the array's shape and strides) and a base type (its dtype, some sort of number). The indexing function gives the number of dimensions, the size along each dimension, and a _stride_ for each dimension indicating the byte offset between successive elements along that dimension.
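The pieces of that representation are exposed as ndarray attributes. A minimal sketch, assuming the default float64 dtype of np.ones (8 bytes per element):

```python
import numpy as np

m = np.ones((3, 4))      # float64 by default
print(m.ndim)            # 2 dimensions
print(m.shape)           # (3, 4): size along each dimension
print(m.dtype)           # float64: the base type
print(m.itemsize)        # 8 bytes per element
print(m.strides)         # (32, 8): next row is 4*8 bytes on, next column 8 bytes on
print(m.nbytes)          # 96 = 3*4*8 bytes in the data block
```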

In [17]:
import numpy as np
import inspect

na = np.array([1,2,3],dtype="float")
nb = np.zeros(3)
nc = np.ones((3,2))
print ("na:",na)
print ("nb:", nb)
print ("nc:", nc)
                
for i,t in enumerate(inspect.getmro(type(na))):
    print (i,t)

# advanced: two ways to build the 3x3 identity matrix

nia = np.eye(3)
nib = np.diag(np.ones(3))
print ("We hold these truths to be self-evident:", np.array_equal(nia,nib))

na: [1. 2. 3.]
nb: [0. 0. 0.]
nc: [[1. 1.]
 [1. 1.]
 [1. 1.]]
0 <class 'numpy.ndarray'>
1 <class 'object'>
We hold these truths to be self-evident: True


#### element-wise operations, masking-indexing and array functions

Numpy arrays support:
* element-wise operations such as addition, multiplication and logical operations;
* indexing operations based on masking;
* common array functions such as sum, mean, dot and norm.

See the scipy.org numpy-1.12.0 [reference](https://docs.scipy.org/doc/numpy-1.12.0/reference/routines.html) for the full list of routines.
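Masking deserves a closer look, since the cell below uses it only once. A short sketch: a comparison produces a boolean array of the same shape, and indexing with it keeps the elements where it is True. The odd-number test is just an arbitrary condition.

```python
import numpy as np

c = np.array([1, 2, 3, 4, 5])
mask = c % 2 == 1            # elementwise comparison: True where c is odd
print(mask)
print(c[mask])               # [1 3 5]: keep only where the mask is True

# Combine masks with & and | (not "and"/"or"), parenthesizing each comparison.
print(c[(c > 1) & (c < 5)])  # [2 3 4]

# Assigning through a mask modifies the selected elements in place.
c[mask] = 0
print(c)                     # [0 2 0 4 0]
```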

In [18]:
import numpy as np

a = np.ones(5)
c = np.array([1,2,3,4,5])
print ("We hold these truths to be self-evident:", c.sum()/len(c)==c.mean())
print ("The big ones:", c[c > (a+a)])
print ("We hold these truths to be self-evident:", c.dot(a) == np.sum(a[:]*c[:]))
print ("We hold these truths to be self-evident:", np.isclose(np.linalg.norm(c)**2, np.inner(c,c)))

# note that dot extends to matrix multiplication, while inner sums over
# the last axis of both arguments; for a 2-D a and a 1-D v, inner(a, v)
# reduces a by one dimension, taking the inner product of each row with v

a = np.arange(12).reshape((3,4))
print ("The shape of a:", a.shape)
print ("The rows of a:")

for i,r in enumerate(a):
    print ('\t',i,r)
print ("A vector in inner products:", np.inner(a,np.arange(4)))
print ("We hold these truths to be self-evident:", np.array_equal(
        np.dot(a,np.arange(4)),
        np.inner(a,np.arange(4))))


We hold these truths to be self-evident: True
The big ones: [3 4 5]
We hold these truths to be self-evident: True
We hold these truths to be self-evident: True
The shape of a: (3, 4)
The rows of a:
	 0 [0 1 2 3]
	 1 [4 5 6 7]
	 2 [ 8  9 10 11]
A vector in inner products: [14 38 62]
We hold these truths to be self-evident: True
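For 2-D arguments dot and inner part ways. A small sketch with arbitrary arange matrices: dot pairs the last axis of the first argument with the first axis of the second, while inner pairs the last axes of both, so inner(A, A) equals dot(A, A.T).

```python
import numpy as np

A = np.arange(6).reshape(2, 3)    # rows [0 1 2] and [3 4 5]
B = np.arange(12).reshape(3, 4)

# dot: matrix product, (2, 3) x (3, 4) -> (2, 4)
print(np.dot(A, B).shape)         # (2, 4)

# inner: each row of A against each row of A, values [[5, 14], [14, 50]]
print(np.inner(A, A))
print(np.array_equal(np.inner(A, A), np.dot(A, A.T)))  # True
```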


#### Arrays in C

The ndarray works by wrapping a native array, laid out in memory the way the C language lays out arrays, inside a Python object. To better understand ndarrays, we discuss how C handles arrays.

The C language adopts a model of memory as an array of bytes. Each byte has an address, and given an address A, the addresses A+1, A+2, etc., follow it in memory. Given a base address A and an index i, storing to or fetching from memory location A+i is extremely efficient.

C has a collection of primitive types including char, short, int, long, float and double. A variable of a given type requires a certain number of bytes of storage, as reported by the _sizeof_ operator. C mandates only the relative sizes of the integer types, for instance sizeof(short)<=sizeof(int)<=sizeof(long), and mandates no specific value other than sizeof(char) being one. However, a common situation on 64-bit Linux and macOS is that short is 2 bytes, int is 4 bytes, long is 8 bytes, float is 4 bytes and double is 8 bytes.
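The sizes on any given machine can be checked from Python through the standard `ctypes` module, which mirrors the local C ABI. A sketch; the printed sizes other than char are platform-dependent, which is exactly the point of the paragraph above.

```python
import ctypes
import numpy as np

# ctypes.sizeof reports what the local C compiler ABI uses for each type.
for name, ctype in [("char", ctypes.c_char), ("short", ctypes.c_short),
                    ("int", ctypes.c_int), ("long", ctypes.c_long),
                    ("float", ctypes.c_float), ("double", ctypes.c_double)]:
    print(name, ctypes.sizeof(ctype))

# numpy dtypes name their width explicitly, independent of the platform.
print(np.dtype(np.int32).itemsize, np.dtype(np.float64).itemsize)  # 4 8
```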

Arrays in C are stored _contiguously_ in memory, beginning at a base address associated with the name of the array (bare, without any indexing decorations), with the i-th element located i times the sizeof the element type bytes past the base. For example, _int a[10]_ declares a ten-element array of int. Then _a_ acts as a pointer to int, and _a+i_ is the address _i * sizeof(int)_ bytes from the base of the array.
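The same address arithmetic can be observed on an ndarray, whose data pointer is exposed through `__array_interface__`. A sketch, assuming a 4-byte int32 dtype to play the role of C's int; slicing `a[i:]` produces a view whose data pointer starts at element i.

```python
import numpy as np

a = np.arange(10, dtype=np.int32)          # plays the role of C's int a[10]
base = a.__array_interface__["data"][0]    # address of a[0], the "bare name"
for i in (0, 1, 5):
    # a[i:] shares memory with a; its data pointer is the address of a[i]
    addr = a[i:].__array_interface__["data"][0]
    print(i, addr - base)                  # i * 4 bytes past the base: 0, 4, 20
```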

Multidimensional arrays, such as _int a[2][3]_, are understood as a sequence of 2 sequences of 3 integers. Here is a picture:

<pre>
 int int int   int int int
+---+---+---+ +---+---+---+   sequences of three int's, each one after the other in memory
+-----------+ +-----------+   the array is two such sequences, one after the other in memory
</pre>
Numpy makes use of the idea of the _stride_, which is the number of bytes between elements that are consecutive along one index of the array. Strides can be understood the same way in the native C array on which the ndarray is built. In this instance, the stride of the rightmost index is sizeof(int), and the stride of the leftmost index is three times that. For 32-bit integers, numpy records the strides as (12,4), meaning that changing the "column" index by one finds the data element exactly 4 bytes further on in memory, and changing the "row" index by one finds the data element exactly 12 bytes further on in memory.
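A quick check of those numbers, assuming int32 elements to match the 32-bit ints above. The loop confirms that the byte offset of a[i, j] is i*12 + j*4, by reading the element back from the flattened array at offset // itemsize.

```python
import numpy as np

a = np.arange(6, dtype=np.int32).reshape(2, 3)   # like C's int a[2][3]
print(a.strides)                                 # (12, 4)

# Byte offset of a[i, j] from the start of the block is i*12 + j*4.
flat = a.reshape(-1)
for i in range(2):
    for j in range(3):
        offset = i * a.strides[0] + j * a.strides[1]
        assert a[i, j] == flat[offset // a.itemsize]
```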

As a further example, _int a[2][3][3]_ would be a sequence of 2 sequences of 3 sequences of 3 int's. Here is a picture:
<pre>
 +--+--+--+ +--+--+--+ +--+--+--+ +--+--+--+ +--+--+--+ +--+--+--+ seq of 3 int's
 +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ seq of 3 int[3]'s
 +------------------------------+ +------------------------------+ seq of 2 int[3][3]'s.
</pre>
with strides (36,12,4) for 4-byte ints, in 72 total contiguous bytes of memory.

Note, then, that ordered by memory position, the rightmost index moves fastest in this C layout (known as row-major order). That is, the elements occur from lowest address to highest as:
<pre>
a[0][0][0], a[0][0][1], a[0][0][2], a[0][1][0], a[0][1][1], ..., a[0][2][2], a[1][0][0], ... , a[1][2][2].
</pre>
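Numpy can reproduce this layout and also its mirror image. A sketch, again assuming 4-byte int32 elements: the default C order gives the strides (36,12,4) from the text, while order="F" (Fortran, column-major) makes the leftmost index move fastest instead. np.ndindex walks the indices in the C order listed above.

```python
import numpy as np

c = np.zeros((2, 3, 3), dtype=np.int32)              # C (row-major) order is the default
print(c.strides, c.nbytes)                            # (36, 12, 4) 72

f = np.zeros((2, 3, 3), dtype=np.int32, order="F")   # Fortran (column-major) order
print(f.strides)                                      # (4, 8, 24): leftmost index fastest

# np.ndindex enumerates indices in C order, rightmost index varying fastest.
print(list(np.ndindex(2, 3, 3))[:4])  # [(0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 1, 0)]
```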

#### Slices and views

The ndarray is an object that contains a C-like array in a contiguous block of memory. It also contains an indexing function that maps into the memory in ways more flexible than C syntax allows. C fixes the strides at compile time, and gives no introspective access to them or way to change them. ndarrays have a shape property that can be set, or a reshape method that can be called, to modify the indexing function. A simple exercise is to call reshape(-1), which flattens whatever the ndarray shape is into a one-dimensional sequence of values.
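The reshape(-1) exercise, sketched on a small arange array. Here -1 asks numpy to infer the length, and np.shares_memory confirms that, for this contiguous array, only the indexing function changed while the data block stayed the same.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
flat = a.reshape(-1)              # -1: infer the length, here 6
print(flat)                       # [0 1 2 3 4 5]
print(np.shares_memory(a, flat))  # True: same data block, new indexing function

a.shape = (3, 2)                  # assigning to shape re-indexes a in place
print(a.tolist())                 # [[0, 1], [2, 3], [4, 5]]
```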

A _view_ is an alternative indexing into the memory of an existing array: it shares the data rather than copying it. An operation can return a view only when the new indexing can be expressed by adjusting the shape, strides and starting offset alone; otherwise, as with reshape(-1) on a transposed array, numpy must make a copy. For instance, the slice notation [::2], that is, take every other element, does not require a copy, but simply doubles the stride in the particular dimension. The transpose operator reverses the order of the shape and stride values, creating a view of the transpose without rewriting memory.

In [19]:
a = np.arange(12)
b = a[::2].view()
print ("a.strides= {}, a.shape= {}\na= {}\nb.strides= {}, b.shape= {}\nb= {}\n".format(a.strides, 
    a.shape, a, b.strides, b.shape, b))
a.shape = (4,3)
b = a[::-2,::2].view()
print ("a.strides= {}, a.shape= {}\na= {}\nb.strides= {}, b.shape= {}\nb= {}\n".format(a.strides, 
    a.shape, a, b.strides, b.shape, b))
b = a.T.view()
a[0,0]=-1 # change an element in array a, and in array b as well, as it is a no-copy view of a.
print ("a.strides= {}, a.shape= {}\na= {}\nb.strides= {}, b.shape= {}\nb= {}\n".format(a.strides, 
    a.shape, a, b.strides, b.shape, b))


a.strides= (8,), a.shape= (12,)
a= [ 0  1  2  3  4  5  6  7  8  9 10 11]
b.strides= (16,), b.shape= (6,)
b= [ 0  2  4  6  8 10]

a.strides= (24, 8), a.shape= (4, 3)
a= [[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
b.strides= (-48, 16), b.shape= (2, 2)
b= [[ 9 11]
 [ 3  5]]

a.strides= (24, 8), a.shape= (4, 3)
a= [[-1  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
b.strides= (8, 24), b.shape= (3, 4)
b= [[-1  3  6  9]
 [ 1  4  7 10]
 [ 2  5  8 11]]
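Whether an operation produced a view or a copy can be tested directly with np.shares_memory. A sketch contrasting basic slicing (a view), indexing with a list of positions (always a copy), and flattening a transpose, which cannot be expressed with strides alone and so forces a copy.

```python
import numpy as np

a = np.arange(12).reshape(4, 3)

s = a[::2]                          # basic slicing: a view
print(np.shares_memory(a, s))       # True

fancy = a[[0, 2]]                   # indexing with a list: always a copy
print(np.shares_memory(a, fancy))   # False

t = a.T                             # transpose: a view with reversed strides
r = t.reshape(-1)                   # t is not C-contiguous, so this must copy
print(np.shares_memory(a, t), np.shares_memory(a, r))   # True False
```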





#### Broadcasting

When the shapes of two arrays are broadcast-compatible, they can be combined by broadcasting, which distributes multiple copies of one array over the other. In simple terms this includes such operations as _2*c_ where c is an array, since 2 will be broadcast over each element of c (the multiplication itself is carried out by the universal function np.multiply).

Two shapes are broadcast-compatible when, comparing their dimensions from the rightmost one leftward, each pair of sizes either agrees or has one size equal to 1. If the shapes have different numbers of dimensions, the missing leading dimensions are considered 1's.
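The rule is short enough to write out by hand. The helper `broadcast_shape` below is hypothetical, written only to illustrate the rule, and is not a numpy function; np.broadcast, which does exist, is used to compare against numpy's own answer.

```python
import numpy as np

def broadcast_shape(s1, s2):
    """Hypothetical helper: apply the broadcasting rule to two shape tuples."""
    n = max(len(s1), len(s2))
    # missing leading dimensions count as 1's
    s1 = (1,) * (n - len(s1)) + tuple(s1)
    s2 = (1,) * (n - len(s2)) + tuple(s2)
    out = []
    for d1, d2 in zip(s1, s2):
        if d1 == d2 or d2 == 1:
            out.append(d1)
        elif d1 == 1:
            out.append(d2)
        else:
            raise ValueError("shapes {} and {} are not broadcast-compatible".format(s1, s2))
    return tuple(out)

print(broadcast_shape((3, 4), (4,)))       # (3, 4)
print(broadcast_shape((3, 1), (1, 4)))     # (3, 4)
# np.broadcast computes the result shape from actual arrays
print(np.broadcast(np.empty((3, 1)), np.empty((4,))).shape)   # (3, 4)
```

A shape pair such as (3, 4) and (3,) fails, since the trailing sizes 4 and 3 neither agree nor include a 1.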


In [20]:
a = np.arange(12).reshape(3,4)
print ("We hold these truths to be self-evident:", np.array_equal(
    # broadcast a 1x4 over a 3x4
    a+np.ones(4),
    # elementwise addition of two 3x4's
    a+np.ones(12).reshape(3,4)))


We hold these truths to be self-evident: True
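Broadcasting along the other axis needs an explicit size-1 dimension. A sketch of two common idioms: np.newaxis turns a vector into a column so that multiplying by a row gives an outer product, and keepdims=True leaves row means with shape (3, 1) so they broadcast back across each row.

```python
import numpy as np

rows = np.arange(3)[:, np.newaxis]   # shape (3, 1): a column
cols = np.arange(4)                  # shape (4,), treated as (1, 4)
table = rows * cols                  # both stretched to (3, 4)
print(table.shape)                   # (3, 4)
print(np.array_equal(table, np.outer(np.arange(3), np.arange(4))))  # True

# Centering each row: keepdims leaves the row means with shape (3, 1)
a = np.arange(12).reshape(3, 4)
centered = a - a.mean(axis=1, keepdims=True)
print(centered[0])                   # [-1.5 -0.5  0.5  1.5]
```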


#### Ufuncs

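Universal functions (ufuncs) are applied elementwise to each element of an array, broadcasting their arguments as needed. This includes the arithmetic and comparison operators (for example, + is carried out by np.add and < by np.less), and other standard functions such as np.sin that have been elevated to become ufuncs.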
    

In [21]:

# times-by-2 and add-1 show broadcasting of a scalar
# elementwise modulo and less-than are applied as ufuncs
c = np.array([1,2,3,4,5])
print ("We hold these truths to be self-evident:", np.all(2*c%2==0))
print ("Vector dominance ordering:", np.all(c<(c+1)))

We hold these truths to be self-evident: True
Vector dominance ordering: True
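Ufuncs are objects in their own right, of type np.ufunc, and binary ufuncs carry methods such as reduce, accumulate and outer that reuse the same elementwise operation. A short sketch on the same array c:

```python
import numpy as np

c = np.array([1, 2, 3, 4, 5])
print(isinstance(np.add, np.ufunc), isinstance(np.sin, np.ufunc))  # True True
print(np.array_equal(c + 1, np.add(c, 1)))   # True: + on arrays is np.add

# Methods built on the same elementwise operation:
print(np.add.reduce(c))                      # 15, same as c.sum()
print(np.add.accumulate(c))                  # running sums 1, 3, 6, 10, 15
print(np.multiply.outer([1, 2], [3, 4]))     # all products: [[3, 4], [6, 8]]
```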

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

x = np.linspace(0,2*np.pi,100)  # ndarray of 100 evenly spaced values from 0 to 2*pi
s = np.sin(x) # universal function
c = np.cos(x)
plt.plot(x,s) # matlab-like plotting library
plt.plot(x,c)
plt.xlabel('Angle (rad)')
plt.legend(['sine','cosine'])
plt.show()