# NumPy Basics

NumPy is short for *Numerical Python* and is the fundamental package for high-performance numerical computing in `Python`.

NumPy provides the following:

- `ndarray` - a fast and memory-efficient multidimensional array providing vectorized operations and *broadcasting* capabilities.
- Standard mathematical functions for fast operations on entire arrays of data without the need for loops (i.e., *vectorized* operations).
- Tools for reading and writing array data to disk and for working with memory-mapped files.
- Linear algebra, pseudo-random number generation, FFTs, etc.
- Tools for linking `Python` to very efficient low-level code written in `C`, `C++`, and `Fortran`.
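
As a rough illustration of what "vectorized" and "broadcasting" mean in the list above, the sketch below compares an explicit Python loop with the equivalent array expression. The variable names (`values`, `looped`, `vectorized`, `table`, `offsets`, `shifted`) are made up for this example.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])

# Explicit loop: square each element one at a time
looped = np.empty_like(values)
for i in range(len(values)):
    looped[i] = values[i] ** 2

# Vectorized: the same computation written as a single array expression
vectorized = values ** 2

# Broadcasting: the 1D array of shape (3,) is applied to each row
# of the 2D array of shape (2, 3)
table = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
offsets = np.array([10.0, 20.0, 30.0])
shifted = table + offsets

print(vectorized)
print(shifted)
```

Both forms give the same numbers; the vectorized form is shorter and moves the loop into compiled code.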

## The NumPy `ndarray`: A Multidimensional Array Object

The key feature of NumPy is its N-dimensional array object, `ndarray`. It is a fast, flexible container for scientific data sets that lets us perform mathematical operations on the data efficiently.

Here is a simple example:

In [1]:
import numpy as np

data = np.ones((4,4))

In [2]:
whos

Variable   Type       Data/Info
-------------------------------
data       ndarray    4x4: 16 elems, type `float64`, 128 bytes
np         module     <module 'numpy' from 'C:\<...>ges\\numpy\\__init__.py'>


In [3]:
data

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

In [4]:
data.dtype

dtype('float64')

In [7]:
data.shape

(4, 4)

## Creating `ndarrays`

The most straightforward way to create an `ndarray` is to pass a `list` to the `array` function:

In [8]:
arr = np.array([1,2,4,6.5])

In [9]:
arr

array([1. , 2. , 4. , 6.5])

In [10]:
whos

Variable   Type       Data/Info
-------------------------------
arr        ndarray    4: 4 elems, type `float64`, 32 bytes
data       ndarray    4x4: 16 elems, type `float64`, 128 bytes
np         module     <module 'numpy' from 'C:\<...>ges\\numpy\\__init__.py'>


The `array` function also works on a list that has already been assigned to a variable, and on a nested list, which produces a two-dimensional array (a matrix):

In [None]:
arr.dtype

In [16]:
arr.shape

(4,)

In [12]:
dat = [[1,2], [3,4]]
arr2 = np.array(dat)

In [13]:
arr2

array([[1, 2],
       [3, 4]])

In [21]:
rows, cols = arr2.shape
print(rows, cols)

2 2


In [15]:
whos

Variable   Type       Data/Info
-------------------------------
arr        ndarray    4: 4 elems, type `float64`, 32 bytes
arr2       ndarray    2x2: 4 elems, type `int32`, 16 bytes
dat        list       n=2
data       ndarray    4x4: 16 elems, type `float64`, 128 bytes
np         module     <module 'numpy' from 'C:\<...>ges\\numpy\\__init__.py'>


## Data Types

In [22]:
arr3 = np.array([1,2,3], dtype=np.float64)

In [23]:
arr4 = np.array([1,2,3], dtype=np.int32)

In [24]:
arr3.dtype

dtype('float64')

In [25]:
arr4.dtype

dtype('int32')

In [26]:
whos

Variable   Type       Data/Info
-------------------------------
arr        ndarray    4: 4 elems, type `float64`, 32 bytes
arr2       ndarray    2x2: 4 elems, type `int32`, 16 bytes
arr3       ndarray    3: 3 elems, type `float64`, 24 bytes
arr4       ndarray    3: 3 elems, type `int32`, 12 bytes
cols       int        2
dat        list       n=2
data       ndarray    4x4: 16 elems, type `float64`, 128 bytes
np         module     <module 'numpy' from 'C:\<...>ges\\numpy\\__init__.py'>
rows       int        2


There are other data types, but these are the most commonly used. You can also convert explicitly between types with `astype`:

In [27]:
arr5 = np.array([1,2,3,4,5])

In [28]:
arr5.dtype

dtype('int32')

In [29]:
float_arr = arr5.astype(np.float64)

In [30]:
float_arr.dtype

dtype('float64')
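
The conversion above goes from integer to float. As a small sketch of two other cases, the example below casts floats to integers and numeric strings to floats. The arrays (`float_vals`, `numeric_strings`) are invented for illustration.

```python
import numpy as np

float_vals = np.array([3.7, -1.2, 0.5, 10.9])

# Casting float to integer drops the fractional part (truncation toward zero)
int_vals = float_vals.astype(np.int32)

# Strings that represent numbers can be converted as well
numeric_strings = np.array(['1.25', '-9.6', '42'])
parsed = numeric_strings.astype(np.float64)

# astype returns a new array here; the original keeps its dtype
print(int_vals)
print(parsed)
print(float_vals.dtype)
```

Note that the float-to-integer cast truncates rather than rounds, so `3.7` becomes `3` and `-1.2` becomes `-1`.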

## Operations on Arrays

In [31]:
arr = np.array([[1.,2.,3.], [4.,5.,6.]])

In [32]:
arr

array([[1., 2., 3.],
       [4., 5., 6.]])

In [33]:
arr * arr

array([[ 1.,  4.,  9.],
       [16., 25., 36.]])

In [35]:
arr - arr

array([[0., 0., 0.],
       [0., 0., 0.]])

In [36]:
1 / arr

array([[1.        , 0.5       , 0.33333333],
       [0.25      , 0.2       , 0.16666667]])

In [37]:
arr ** 0.5

array([[1.        , 1.41421356, 1.73205081],
       [2.        , 2.23606798, 2.44948974]])

In [38]:
np.sqrt(arr)

array([[1.        , 1.41421356, 1.73205081],
       [2.        , 2.23606798, 2.44948974]])

In [40]:
2. * arr

array([[ 2.,  4.,  6.],
       [ 8., 10., 12.]])

In [41]:
np.pi * arr

array([[ 3.14159265,  6.28318531,  9.42477796],
       [12.56637061, 15.70796327, 18.84955592]])

In [42]:
type(np.pi)

float
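
The cells above apply arithmetic element by element, and a scalar is applied to every element. A brief sketch of two related behaviors, using made-up arrays `a`, `b`, and `c`: comparisons between equal-shape arrays are also element-wise, and shapes that cannot be matched up raise an error.

```python
import numpy as np

a = np.array([[1., 2., 3.], [4., 5., 6.]])
b = np.array([[0., 4., 1.], [7., 2., 12.]])

# Comparing two equal-shape arrays yields a boolean array of that shape
greater = b > a

# Shapes (2, 3) and (2,) cannot be broadcast together, so this raises ValueError
c = np.array([1., 2.])
try:
    a + c
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(greater)
print(mismatch_raised)
```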

## Basic Indexing and Slicing

In [50]:
arr = np.arange(10)

In [51]:
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [52]:
arr[5]

5

In [54]:
arr[5:8]

array([5, 6, 7])

In [55]:
arr[5:8] = 12

In [56]:
arr

array([ 0,  1,  2,  3,  4, 12, 12, 12,  8,  9])

In [57]:
arr_slice = arr[5:8]

In [58]:
whos

Variable    Type       Data/Info
--------------------------------
arr         ndarray    10: 10 elems, type `int32`, 40 bytes
arr2        ndarray    2x2: 4 elems, type `int32`, 16 bytes
arr3        ndarray    3: 3 elems, type `float64`, 24 bytes
arr4        ndarray    3: 3 elems, type `int32`, 12 bytes
arr5        ndarray    5: 5 elems, type `int32`, 20 bytes
arr_slice   ndarray    3: 3 elems, type `int32`, 12 bytes
cols        int        2
dat         list       n=2
data        ndarray    4x4: 16 elems, type `float64`, 128 bytes
float_arr   ndarray    5: 5 elems, type `float64`, 40 bytes
np          module     <module 'numpy' from 'C:\<...>ges\\numpy\\__init__.py'>
rows        int        2


In [59]:
arr_slice[1] = 12345

In [60]:
arr

array([    0,     1,     2,     3,     4,    12, 12345,    12,     8,
           9])

In [61]:
arr_slice[:] = 64

In [62]:
arr

array([ 0,  1,  2,  3,  4, 64, 64, 64,  8,  9])
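
The cells above show that a slice is a *view*: writing to `arr_slice` changed `arr` itself. The sketch below contrasts a view with an explicit copy; the names `view` and `independent` are illustrative only.

```python
import numpy as np

arr = np.arange(10)

view = arr[5:8]                 # a slice is a view onto the same memory
independent = arr[5:8].copy()   # .copy() allocates separate memory

view[:] = 64                    # this also changes arr
independent[:] = -1             # this leaves arr untouched

print(arr)
print(np.shares_memory(arr, view))
print(np.shares_memory(arr, independent))
```

Calling `.copy()` is the usual way to get data that can be modified without affecting the original array.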

In [63]:
arr2d = np.array([[1,2,3],[4,5,6],[7,8,9]])

In [64]:
arr2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [65]:
arr2d[2]

array([7, 8, 9])

In [66]:
arr2d[0][2]

3

In [67]:
arr2d[0,2]

3

In [68]:
arr3d = np.array([[[1,2,3], [4,5,6]], [[7,8,9], [10,11,12]]])

In [69]:
arr3d

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [70]:
arr3d.shape

(2, 2, 3)

In [71]:
arr3d[0]

array([[1, 2, 3],
       [4, 5, 6]])

Both scalar values and arrays can be assigned to `arr3d[0]`:

In [72]:
old_values = arr3d[0].copy()

In [73]:
arr3d[0] = 42

In [74]:
arr3d

array([[[42, 42, 42],
        [42, 42, 42]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [75]:
arr3d[0] = old_values

In [76]:
arr3d

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [77]:
arr3d[1,0]

array([7, 8, 9])

In [78]:
arr3d[1,0,0]

7

## Indexing with Slices

`ndarrays` can be sliced just like `Python` `lists`:

In [79]:
arr[1:6]

array([ 1,  2,  3,  4, 64])

In [80]:
arr2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [82]:
arr2d[:2]

array([[1, 2, 3],
       [4, 5, 6]])

In [None]:
arr2d[:2,1:]

In [None]:
arr2d[1,:2]

In [None]:
arr2d[2,:1]

In [None]:
arr2d[:,:1]

And of course you can assign using slices as well:

In [None]:
arr2d[:2,1:] = 0

In [None]:
arr2d

## Boolean Indexing

In [None]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

In [None]:
data = np.random.randn(7, 4)

In [None]:
names

In [None]:
data

In [None]:
names == 'Bob'

In [None]:
data[names == 'Bob']

In [None]:
data[names == 'Bob', 2:]

In [None]:
data[names == 'Bob', 3]

In [None]:
names != 'Bob'

In [None]:
data[~(names == 'Bob')]

In [None]:
~(names == 'Bob')

In [None]:
mask = (names == 'Bob') | (names == 'Will')

In [None]:
mask

In [None]:
data[mask]

In [None]:
data[data < 0]

In [None]:
data[data < 0] = 0

In [None]:
data

In [None]:
data[names != 'Joe'] = 7

In [None]:
data
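
Because `np.random.randn` produces different numbers on every run, the following sketch repeats the main ideas of this section with a deterministic array. The array `values` is a stand-in for `data` and is not part of the original notebook.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
values = np.arange(28).reshape(7, 4)   # row i holds 4*i .. 4*i + 3

# A boolean array with one entry per row selects the rows where it is True
is_bob = names == 'Bob'
bob_rows = values[is_bob]              # rows 0 and 3

# Combine conditions with | and &; the Python keywords `or` and `and`
# do not work element-wise on arrays
bob_or_will = (names == 'Bob') | (names == 'Will')
selected = values[bob_or_will]         # rows 0, 2, 3 and 4

# Selecting with a boolean array returns a copy, so this does not touch values
bob_rows[:] = -1

print(selected)
print(values[0])
```

Assignment through a mask on the left-hand side, as in `data[data < 0] = 0` above, does modify the original array; only the selected result on the right-hand side is a copy.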

## Fancy Indexing

In [None]:
arr = np.empty((8, 4))

In [None]:
for i in range(8):
    arr[i] = i

In [None]:
arr

In [None]:
arr[[4,3,0,6]]

Using negative indices selects rows from the end:

In [None]:
arr[[-3,-5,-7]]

In [None]:
arr = np.arange(32).reshape((8, 4))

In [None]:
arr

In [None]:
arr[[1,5,7,2], [0, 3, 1, 2]]

What just happened? The elements `(1, 0)`, `(5, 3)`, `(7, 1)`, and `(2, 2)` were selected.

In [None]:
arr[[1,5,7,2]][:, [0,3,1,2]]

Another way to index is to use the `np.ix_` function, which converts two 1D integer arrays into an indexer that selects the rectangular region formed by every combination of the given rows and columns:

In [None]:
arr[np.ix_([1,5,7,2], [0, 3, 1, 2])]
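
To make the difference between these three forms concrete, the sketch below computes all of them on the same array and compares the results. The names `row_idx`, `col_idx`, `pairs`, `chained`, and `block` are chosen for this example.

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))    # element (r, c) equals 4*r + c
row_idx = [1, 5, 7, 2]
col_idx = [0, 3, 1, 2]

# Two index lists are paired up: (1, 0), (5, 3), (7, 1), (2, 2)
pairs = arr[row_idx, col_idx]

# Selecting rows first, then reordering columns, gives a 4x4 block
chained = arr[row_idx][:, col_idx]

# np.ix_ builds the same 4x4 block in a single indexing step
block = arr[np.ix_(row_idx, col_idx)]

print(pairs)
print(block)
print(np.array_equal(block, chained))
```

The paired form returns a 1D array with one element per index pair, which here is the diagonal of the 4x4 block.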

## Transposing Arrays and Swapping Axes

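Transposing rearranges the axes of an array and returns a view on the underlying data without copying it. Arrays have the `T` attribute and the `transpose` method for this, and the `swapaxes` method exchanges a given pair of axes.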
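
A minimal sketch of transposing and axis swapping, using small arrays invented for this example (`arr`, `arr3`); it only relies on the standard `T`, `transpose`, and `swapaxes` members of `ndarray`.

```python
import numpy as np

arr = np.arange(15).reshape((3, 5))

# .T reverses the axes: shape (3, 5) becomes (5, 3), with no data copied
transposed = arr.T

# A common use: the inner matrix product X^T X
gram = np.dot(arr.T, arr)

arr3 = np.arange(24).reshape((2, 3, 4))

# swapaxes exchanges two axes: shape (2, 3, 4) becomes (2, 4, 3)
swapped = arr3.swapaxes(1, 2)

# transpose accepts a full permutation of the axes: shape becomes (3, 2, 4)
permuted = arr3.transpose((1, 0, 2))

print(transposed.shape, gram.shape)
print(swapped.shape, permuted.shape)
print(np.shares_memory(arr, transposed))
```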

## Universal Functions: Fast Element-wise Array Functions

NumPy defines the concept of a universal function, or *ufunc*, which is a function that performs elementwise operations on the data stored in `ndarrays`. This gives us very fast vectorized functions for arrays of data.

Take, for example, the functions `sqrt` and `exp`:

In [None]:
arr = np.arange(10)

In [None]:
arr

In [None]:
np.sqrt(arr)

In [None]:
np.exp(arr)

These are so-called *unary* ufuncs - i.e. they operate on a single `ndarray`. There are also *binary* ufuncs, which take two `ndarrays` as arguments and return a single array as the result.

In [None]:
x = np.random.randn(8)

In [None]:
y = np.random.randn(8)

In [None]:
x

In [None]:
y

In [None]:
np.maximum(x, y) # element-wise maximum
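
Since `x` and `y` above are random, here is the same idea with fixed inputs so the results can be checked by hand. The input values are arbitrary; `np.maximum`, `np.add`, and `np.greater` are standard binary ufuncs.

```python
import numpy as np

x = np.array([1.0, -2.0, 3.5, 0.0])
y = np.array([0.5, 4.0, 3.5, -1.0])

larger = np.maximum(x, y)     # element-wise maximum
total = np.add(x, y)          # same result as x + y
x_is_bigger = np.greater(x, y)  # same result as x > y

print(larger)
print(total)
print(x_is_bigger)
```

The arithmetic and comparison operators on arrays are shorthand for binary ufuncs such as these.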

It is uncommon, but a ufunc can return multiple arrays as output. `modf` is one such example. It returns the fractional and integral parts of a floating-point array as two separate arrays:

In [None]:
arr = np.random.randn(7) * 5

In [None]:
np.modf(arr)
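
A small sketch of `np.modf` with fixed inputs, showing how the two returned arrays can be unpacked. The names `values`, `frac`, and `whole` are chosen for this example.

```python
import numpy as np

values = np.array([3.5, -2.25, 0.75, 7.0])

# modf returns a tuple: (fractional parts, integral parts)
frac, whole = np.modf(values)

print(frac)
print(whole)

# Both parts carry the sign of the input, so they add back up to it
print(np.allclose(frac + whole, values))
```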

The following is a snapshot of Table 4-3 and Table 4-4 from the book, which gives a listing of the different ufuncs available:

![Tables 4-3 and 4-4](Table4-3-4.jpg)

## Data Processing Using Arrays

Using NumPy `ndarray`s allows us to write very compact code for complex data processing tasks that would otherwise require verbose syntax involving many loops.

As a simple example, say we want to evaluate the function `sqrt(x^2 + y^2)` across a regular grid of numerical values. The `np.meshgrid` function takes two 1D arrays and produces two 2D arrays that together hold the coordinates of every `(x, y)` pair formed from the two inputs:

In [None]:
points = np.arange(-5, 5, 0.01)  # 1000 equally spaced points

In [None]:
xs, ys = np.meshgrid(points, points)

In [None]:
ys

In [None]:
z = np.sqrt(xs ** 2 + ys ** 2)

In [None]:
z
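
The 1000x1000 grid above is too large to inspect by eye, so the sketch below repeats the computation on a tiny grid. The inputs `x` and `y` are made up, and the results are stored under new names (`grid_x`, `grid_y`, `dist`) so they do not clash with the variables used in the surrounding cells.

```python
import numpy as np

x = np.array([0.0, 3.0])
y = np.array([0.0, 4.0, 8.0])

# Both outputs have shape (len(y), len(x)) = (3, 2)
grid_x, grid_y = np.meshgrid(x, y)

print(grid_x)   # x values repeat down the rows
print(grid_y)   # y values repeat across the columns

# Distance from the origin at every grid point, with no explicit loop
dist = np.sqrt(grid_x ** 2 + grid_y ** 2)
print(dist)
```

The entry `dist[1, 1]` corresponds to the point `(3, 4)` and equals `5.0`.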

In [None]:
import matplotlib.pyplot as plt

In [None]:
plt.imshow(z, cmap=plt.cm.gray)
plt.colorbar()

In [None]:
plt.title(r"Image plot of $\sqrt{x^2 + y^2}$ for a grid of values")  # raw string so \s is not treated as an escape

![*Plot of function evaluated on grid*](figure_1.jpeg)