# The numpy library

```numpy``` is an abbreviation for the **num**eric **py**thon library. It is built around one main data structure:

* the ```ndarray``` class

The ```ndarray``` class is a numeric data structure similar to a Python ```list```, but unlike a ```list``` it broadcasts numeric operators and mathematical functions across its elements.
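As a minimal sketch of what broadcasting means in practice, the example below adds two arrays element by element. It assumes ```numpy``` is installed and imported under the conventional alias ```np```; the variable names and values are illustrative only.

```python
import numpy as np

# Two 1-dimensional arrays of integers (illustrative values).
nums1 = np.array([1, 2, 3, 4, 5])
nums2 = np.array([2, 4, 6, 8, 10])

# The + operator is applied element by element, not as concatenation.
summed = nums1 + nums2
print(summed)  # [ 3  6  9 12 15]

# Mathematical functions are also applied to every element.
roots = np.sqrt(np.array([1.0, 4.0, 9.0]))
print(roots)  # [1. 2. 3.]
```

The rest of this section looks at how far the same behaviour can be reproduced with the Python standard library alone.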

```numpy``` is one of the most commonly used third-party Python libraries. It is fundamental to other popular data science libraries:

* The Python Data Analysis Library - ```pandas```
* The Plotting Library - ```matplotlib```
* The Data Visualization Library - ```seaborn```

These libraries build upon ```numpy``` and are collectively known as the ```numpy``` stack.

## Tuples and Lists

The ```list``` is a ```builtins``` collection that can be used to store numeric data:

In [1]:
nums1 = [1, 2, 3, 4, 5]
nums2 = [2, 4, 6, 8, 10]

However, operators are set up for collections; the ```+``` operator, for example, performs concatenation instead of addition:

In [2]:
nums1 + nums2

[1, 2, 3, 4, 5, 2, 4, 6, 8, 10]

Numeric addition and other mathematical operations can be applied element by element to a ```list``` using a ```for``` loop:

In [3]:
summed = []

for idx in range(len(nums1)):
    summed.append(nums1[idx] + nums2[idx])

print(summed)

[3, 6, 9, 12, 15]


Or a slightly more elegant list comprehension:

In [4]:
[num1 + num2 for num1, num2 in zip(nums1, nums2)]

[3, 6, 9, 12, 15]
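Another option, shown here only as a sketch and not taken from the text above, is to pass ```operator.add``` to ```map```. Both ```zip``` and ```map``` stop at the end of the shorter input, so a length check is worth adding when the two lists are expected to match.

```python
import operator

nums1 = [1, 2, 3, 4, 5]
nums2 = [2, 4, 6, 8, 10]

# Guard against silently dropping elements from the longer list.
if len(nums1) != len(nums2):
    raise ValueError("both lists must have the same length")

# map applies operator.add to each pair of elements in turn.
summed = list(map(operator.add, nums1, nums2))
print(summed)  # [3, 6, 9, 12, 15]
```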

## Array Module

The ```tuple``` and ```list``` collections are very versatile and each record can be a Python ```object``` from a different class.

This versatility however becomes a disadvantage when the intent is to work with only numeric data using a ```for``` loop as seen above.

Having the wrong datatype for an element will result in a ```TypeError```.
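The following sketch illustrates the problem, assuming a hypothetical list ```mixed``` in which one number has been entered as a ```str``` by mistake. The loop works until it reaches the mismatched element:

```python
# A list can hold objects of different classes (illustrative values).
mixed = [1, 2, "3", 4, 5]
nums2 = [2, 4, 6, 8, 10]

summed = []
error_message = None

try:
    for num1, num2 in zip(mixed, nums2):
        summed.append(num1 + num2)
except TypeError as error:
    # Raised at the third pair: a str cannot be added to an int.
    error_message = str(error)

print(summed)  # [3, 6]
print(error_message)
```

The error only appears part way through the loop, which is why a collection that enforces a single datatype from the outset is preferable for numeric work.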

Python has an ```array``` module. It can be imported using:

In [5]:
import array

The ```array``` module has the following identifiers:

In [6]:
dir(array)

['ArrayType',
 '__doc__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '_array_reconstructor',
 'array',
 'typecodes']

The two main identifiers are the attribute ```typecodes```:

In [8]:
array.typecodes

'bBuhHiIlLqQfd'

And the ```array``` class which can be used to create an ```array``` of a uniform datatype:

In [7]:
array.array?

Init signature: array.array(self, /, *args, **kwargs)
Docstring:
array(typecode [, initializer]) -> array

Return a new array whose items are restricted by typecode, and
initialized from the optional initializer value, which must be a list,
string or iterable over elements of the appropriate type.

Arrays represent basic values and behave very much like lists, except
the type of objects stored in them is constrained. The type is specified
at object creation time by using a type code, which is a single character.
The following type codes are defined:

    Type code   C Type             Minimum size in bytes
    'b'         signed integer     1
    'B'         unsigned integer   1
    'u'         Unicode character  2 (see note)
    'h'         signed integer     2
    'H'         unsigned integer   2

For example the type code ```'l'``` can be used to create an array where each element is a signed integer of at least 4 bytes:

In [11]:
nums1 = array.array('l', [1, 2, 3, 4, 5])
nums2 = array.array('l', [2, 4, 6, 8, 10])

In [12]:
nums1

array('l', [1, 2, 3, 4, 5])

In [13]:
nums2

array('l', [2, 4, 6, 8, 10])
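The sizes in the docstring are minimum sizes, so the actual number of bytes per element depends on the platform. As a small sketch, the ```itemsize``` attribute reports the size in use, and appending a value of the wrong type shows that the type code is enforced:

```python
import array

nums1 = array.array('l', [1, 2, 3, 4, 5])

# itemsize is the number of bytes used per element on this platform.
# For 'l' it is at least 4, and is commonly 8 on 64-bit Linux and macOS.
print(nums1.itemsize)

# The type code is enforced when elements are added.
rejected = False
try:
    nums1.append(2.5)  # a float is not accepted by an integer array
except TypeError:
    rejected = True

print(rejected)  # True
print(nums1)     # array('l', [1, 2, 3, 4, 5])
```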

The ```array``` instance otherwise behaves consistently with a ```list``` and the ```+``` operator performs concatenation:

In [14]:
nums1 + nums2

array('l', [1, 2, 3, 4, 5, 2, 4, 6, 8, 10])

A ```list``` comprehension can be used for element-by-element addition:

In [15]:
array.array('l', [num1 + num2 for num1, num2 in zip(nums1, nums2)])

array('l', [3, 6, 9, 12, 15])

It is possible to use other type codes to conserve memory, however this comes at the expense of dynamic range. The type code ```'B'``` for example corresponds to an unsigned 1 byte integer. The number of distinct values that can be represented by 1, 2 and 4 bytes is:

In [25]:
2 ** (1 * 8) # 1 byte

256

In [35]:
2 ** (2 * 8) # 2 bytes

65536

In [36]:
2 ** (4 * 8) # 4 bytes

4294967296

Recall Python uses zero-based indexing, so for an unsigned 1 byte integer the range is ```0:256```, inclusive of the lower bound and exclusive of the upper bound. The largest value that can be stored is therefore ```255```.

The type code ```'b'``` corresponds to a signed 1 byte integer, which means half of these values correspond to negative numbers and the other half correspond to zero and the positive numbers. So this is ```-128:128```, inclusive of the lower bound and exclusive of the upper bound.
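These bounds can be checked directly, because the ```array``` class raises an ```OverflowError``` when a value does not fit the type code. The helper ```fits``` below is hypothetical and only exists to keep the sketch short:

```python
import array

def fits(typecode, value):
    """Return True if value can be stored under typecode (hypothetical helper)."""
    try:
        array.array(typecode, [value])
        return True
    except OverflowError:
        return False

# Unsigned 1 byte integer: 0 to 255.
print(fits('B', 255), fits('B', 256), fits('B', -1))  # True False False

# Signed 1 byte integer: -128 to 127.
print(fits('b', 127), fits('b', 128))    # True False
print(fits('b', -128), fits('b', -129))  # True False
```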

The type code ```'d'``` can be used to create an array where each element is an 8 byte floating point number:

In [30]:
nums3 = array.array('d', [0.1, 0.2, 0.3, 0.4, 0.5])
nums4 = array.array('d', [0.2, 0.4, 0.6, 0.8, 1.0])

In [31]:
nums3

array('d', [0.1, 0.2, 0.3, 0.4, 0.5])

In [32]:
nums4

array('d', [0.2, 0.4, 0.6, 0.8, 1.0])

Each element in this array behaves consistently with a ```float``` and is displayed in decimal but encoded in binary. Therefore the recurring rounding errors encountered previously when the ```float``` class was examined still apply:

In [33]:
array.array('d', [num3 + num4 for num3, num4 in zip(nums3, nums4)])

array('d', [0.30000000000000004, 0.6000000000000001, 0.8999999999999999, 1.2000000000000002, 1.5])

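The datatype can be changed from ```'d'``` to ```'f'```, which halves the storage per element to 4 bytes and reduces the precision to roughly 7 significant decimal digits. This can be seen by the rounding error appearing much earlier in each number: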

In [34]:
array.array('f', [num3 + num4 for num3, num4 in zip(nums3, nums4)])

array('f', [0.30000001192092896, 0.6000000238418579, 0.8999999761581421, 1.2000000476837158, 1.5])

Note the ```float``` in ```builtins``` is an 8 byte floating point number, equivalent to ```'d'```. Each lower precision ```'f'``` element is converted to a ```float``` when it is read, which is why the above displays it with the number of digits of ```'d'```.
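The difference between the two type codes can be seen with a single value. This is a minimal sketch with illustrative variable names; the ```itemsize``` values shown are the usual ones but are platform dependent:

```python
import array

single = array.array('f', [0.1])
double = array.array('d', [0.1])

print(single.itemsize, double.itemsize)  # typically 4 8

# Reading an element returns a builtins float in both cases.
print(type(single[0]) is float)  # True

# The 'd' element round-trips exactly; the 'f' element does not.
print(double[0] == 0.1)  # True
print(single[0] == 0.1)  # False
print(single[0])         # 0.10000000149011612
```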

## Dimensions

The ```array``` class above is also only 1 dimensional. It is not possible to nest other collections in it, as all the values in it by definition must be of a fixed data type. A ```list```, in contrast, can nest other collections such as a ```tuple``` for each row:

In [37]:
[(1, 2, 3),
 (4, 5, 6),
 (7, 8, 9)]

[(1, 2, 3), (4, 5, 6), (7, 8, 9)]
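The sketch below shows the restriction in practice: passing the nested rows to ```array.array``` raises a ```TypeError```. One possible workaround, which is an assumption for illustration rather than something covered above, is to flatten the rows and calculate the position by hand. The names ```flat```, ```num_cols``` and ```get``` are hypothetical.

```python
import array

rows = [(1, 2, 3),
        (4, 5, 6),
        (7, 8, 9)]

# Nested collections are rejected by an integer array.
nested_ok = True
try:
    array.array('l', rows)
except TypeError:
    nested_ok = False
print(nested_ok)  # False

# Workaround sketch: flatten the rows and compute the index by hand.
num_cols = 3
flat = array.array('l', [value for row in rows for value in row])

def get(row, col):
    """Look up an element as if flat were 2-dimensional (hypothetical helper)."""
    return flat[row * num_cols + col]

print(get(1, 2))  # 6
```

Keeping track of the row and column arithmetic manually quickly becomes error prone, which is the gap a dedicated multi-dimensional array type fills.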