## NumPy Arrays

NumPy is the standard Python package for scientific computing. It provides support for multidimensional arrays which are more efficient than standard Python data structures.

To start off, we import the NumPy package. By convention, it is imported as *np* for shorthand.

In [None]:
import numpy as np

### NumPy 1D Arrays

The fundamental NumPy data structure is an *array*: a memory-efficient container that provides fast numerical operations.
Unlike standard Python lists, a NumPy array only contains a single type of value (e.g. only floats or only integers).

The simplest type of array is 1-dimensional (1D). We can create an array from an existing Python list:

In [None]:
mylist = [5,18,3,12,20,0,24]
a = np.array(mylist)
print(a)

In [None]:
a.shape

Because every element has the same type, such as an integer or a float, an array records that type in its *dtype* attribute:

In [None]:
a.dtype

In [None]:
b = np.array( [0.3, 0.12, 1.4, 2.3, 4.5] )
b

In [None]:
b.dtype
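As a small illustration of the single-type rule, the sketch below shows what happens when a list mixes integers and floats, and when floats are converted to integers. The variable names (`mixed`, `truncated`) are chosen here purely for illustration.

```python
import numpy as np

# A list mixing ints and floats: NumPy stores every value as a float
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)

# Converting floats to ints with astype() drops the fractional part
truncated = np.array([1.7, 2.2, 3.9]).astype(int)
print(truncated)
```

The integer 1 and 3 are stored as 1.0 and 3.0 in `mixed`, while `astype(int)` truncates rather than rounds.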

#### Numerical Operations
We can apply standard numerical operations to arrays using scalars (numbers). The operations are applied element-wise - i.e. applied separately to every element (value) in the array.

In [None]:
c = np.array([2,4,6,8,10,20,100])
c

In [None]:
c + 1

In [None]:
c - 2

In [None]:
c * 10

In [None]:
c / 10

In [None]:
# Square the values in the array
c**2
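To see why element-wise behaviour matters, the following sketch compares the same operator on a plain Python list and on an array. It also combines two arrays of the same shape, which works element by element in the same way. The names used are illustrative only.

```python
import numpy as np

values = [2, 4, 6]
arr = np.array(values)

# On a list, * repeats the list; on an array, it multiplies every element
list_times_two = values * 2
arr_times_two = arr * 2
print(list_times_two)
print(arr_times_two)

# Two arrays of the same shape are combined element by element
other = np.array([10, 20, 30])
total = arr + other
print(total)
```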

### NumPy 2D Arrays

An array can have more than 1 dimension. A 2D array can be viewed as a matrix, with rows and columns. Arrays can also have > 2 dimensions.

We can create 2D arrays from a list containing other Python lists. These lists must contain the same number of values. Also, make sure to include the outer [ ] brackets!

In [None]:
r1 = [ 4, 3, 2, 3 ]
r2 = [ 3, 5, 6, 4 ]
m = np.array( [ r1, r2 ] )
m

The *rank* of an array is the number of dimensions it has.

In [None]:
m.ndim

The *shape* of an array is a tuple of integers giving the length of the array in each dimension.

In [None]:
m.shape

The *size* of an array is the total number of elements it has. For the 2D array below, this is the number of rows multiplied by the number of columns.

In [None]:
m.size
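The three attributes are related, which the short sketch below tries to make explicit: the number of dimensions is the length of the shape tuple, and the size is the product of the shape entries. The 3D array `cube` is an extra example introduced here for illustration.

```python
import numpy as np

m = np.array([[4, 3, 2, 3], [3, 5, 6, 4]])
print(m.ndim, m.shape, m.size)

# ndim is the length of the shape tuple
print(len(m.shape) == m.ndim)
# size is the product of the lengths in each dimension
print(m.shape[0] * m.shape[1] == m.size)

# The same relationships hold for an array with 3 dimensions
cube = np.zeros((2, 3, 4))
print(cube.ndim, cube.shape, cube.size)
```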

### Array Creation Alternatives

Rather than using Python lists, a variety of functions are available for conveniently creating and populating arrays.

Use the *np.zeros()* function to create an array full of 0s with the required shape:

In [None]:
x = np.zeros(4)
x

In [None]:
y = np.zeros( (3,2) )
y

Use the *np.ones()* function to create an array full of 1s with the required shape:

In [None]:
v = np.ones(5)
v

In [None]:
np.ones((2,2))

The default type for the above functions is float. Use the *dtype* parameter to tell NumPy we want an array of ints, not floats.

In [None]:
np.ones((2,4),dtype=int)

We can create an array corresponding to a sequence using the *arange()* function. For instance, create an array containing values starting at 2, ending before 9, in steps of 1.

In [None]:
v = np.arange(2,9)
v

We can also use a different step size. For instance, create an array starting at 5, ending before 60, in steps of size 10.

In [None]:
v = np.arange( 5, 60, 10 )
v

The range and step sizes do not have to be integers. We can also specify floats:

In [None]:
x = np.arange(0.5, 9.4, 1.3)
x

The *linspace()* function creates an array with a specified number of evenly-spaced samples in a given range. For example, we can divide up the range [1,10] into 4 evenly-spaced values, including the endpoints:

In [None]:
np.linspace(1, 10, 4)

In [None]:
np.linspace(1, 20, 5)
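The spacing that *linspace()* uses can be worked out by hand: with both endpoints included, `num` samples leave `num - 1` gaps. The sketch below rebuilds the first example this way; `start`, `stop`, `num` and `by_hand` are names introduced here for illustration.

```python
import numpy as np

start, stop, num = 1, 10, 4
samples = np.linspace(start, stop, num)

# num samples including both endpoints means num - 1 equal gaps
step = (stop - start) / (num - 1)
by_hand = start + step * np.arange(num)

print(samples)
print(step)
print(np.allclose(samples, by_hand))
```

As a rule of thumb, *arange()* is convenient when the step size is known, and *linspace()* when the number of values is known.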

### Array Shape Manipulation

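We can change the shape of an array with *reshape()*. This returns a new array object with the specified shape and leaves the shape of the original array unchanged. Where possible, NumPy returns a view rather than copying the values, so the reshaped array may share its data with the original.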

In [None]:
x = np.arange(0,12)
x

In [None]:
m = x.reshape(3,4)
m

Note that the size of the reshaped array has to be the same as the size of the original.

In [None]:
y = np.ones(4)
y

In [None]:
y.reshape(2,2)
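A minimal sketch of the size constraint: passing -1 for one dimension asks NumPy to infer that length from the array's size, and a shape that does not match the size raises a `ValueError`. The variable `error_message` is introduced here only to capture the failure.

```python
import numpy as np

x = np.arange(12)

# -1 lets NumPy infer the missing dimension: 12 values / 3 rows = 4 columns
m = x.reshape(3, -1)
print(m.shape)
# The original array keeps its own shape
print(x.shape)

# 5 x 3 = 15 does not match the 12 values available, so this fails
error_message = None
try:
    x.reshape(5, 3)
except ValueError as err:
    error_message = str(err)
print("reshape failed:", error_message)
```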

### Accessing Values

To access a value in a 1D array, specify the position *[i]* counting from 0.

In [None]:
a = np.array( [5,18,3,12,20,0,24] )
a[2]

A negative position counts backwards from the end of the array, so -1 refers to the last value:

In [None]:
a[-1]

We can also use this notation to change the values in an existing array.

In [None]:
a[0] = 100
a

When working with arrays with more than 1 dimension, use the notation *[i,j]*, where the position in each dimension is separated by commas.

In [None]:
r1 = [ 5, 9, 2, 11 ]
r2 = [ 0, 5, 6, 4 ]
m = np.array( [ r1, r2 ] )
m

In [None]:
m[0,1]

In [None]:
m[1,3]

In [None]:
m[0,3] = 200
m

NumPy provides concise syntax to access sub-arrays via slicing. This creates a "view" on the original array, not a copy. Slicing 1D NumPy arrays works just like slicing Python lists, using the *[i:j]* notation:

In [None]:
a = np.array([4,7,3,5,1,8])
a

In [None]:
# Start at position 2, end before 4
a[2:4]

In [None]:
# From position 2 onwards
a[2:]

In [None]:
# Stop before position 4
a[:4]

Again we can also use this notation to change values in a slice of the array.

In [None]:
# Set everything from position 3 onwards to 0
a[3:] = 0
a
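One practical consequence of slices being views is that changes made through a slice show up in the original array, unlike slices of Python lists. The sketch below illustrates this and uses the array's `copy()` method to get an independent array; the variable names are illustrative.

```python
import numpy as np

a = np.array([4, 7, 3, 5, 1, 8])

# A slice is a view: writing through it changes the original array
view = a[2:4]
view[0] = 99
print(a)

# copy() gives an independent array that can be changed safely
independent = a[2:4].copy()
independent[1] = -1
print(a)

# By contrast, slicing a Python list always produces a new list
lst = [4, 7, 3, 5, 1, 8]
lst_slice = lst[2:4]
lst_slice[0] = 99
print(lst)
```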

For multidimensional arrays, we specify the slices for each dimension, separated by commas - e.g. for 2D *[i:j,p:q]*

In [None]:
r1 = [ 5, 9, 2, 11 ]
r2 = [ 0, 5, 6, 4 ]
r3 = [ 1, 8, 13, 16 ]
m = np.array( [ r1, r2, r3 ] )
m

In [None]:
m[0:2,1:3]

In [None]:
# Get a full row
m[0,:]

In [None]:
# Get a full column
m[:,2]

### Basic Array Operations

We can run batch operations on NumPy arrays without writing for loops. These operations return a new array and leave the original array unchanged.

In [None]:
d = np.array([[1,4,2], [9,8,2]])
d

In [None]:
d * 5

In [None]:
d / 2

In [None]:
d + 1

In [None]:
1.0/d

In [None]:
# note this is multiplying corresponding elements together
d * d

We can also apply functions to all elements in an array.

In [None]:
# calculate the log of every element in d
np.log(d)

In [None]:
# apply square root to every element in d
np.sqrt(d)

We can also apply standard boolean expressions to all elements in an array at once. The result is a new boolean array of the same shape.

In [None]:
# which elements are greater than 2?
d > 2

In [None]:
# return the values of the elements that are greater than 2
d[d>2]

In [None]:
# update the values that are less than 3
d[d<3] = -1
d
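Boolean arrays can be combined and counted as well. The sketch below is one way to do this: conditions are joined with `&` (and) or `|` (or), each wrapped in parentheses, and `np.where()` selects between two alternatives per element. The names `mask` and `kept` are introduced here for illustration.

```python
import numpy as np

d = np.array([[1, 4, 2], [9, 8, 2]])

# Combine two conditions; the parentheses are required
mask = (d > 2) & (d < 9)
print(mask)

# Select the matching values and count them (True counts as 1)
print(d[mask])
print(mask.sum())

# Keep values greater than 2 and replace the rest with 0, without changing d
kept = np.where(d > 2, d, 0)
print(kept)
```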

### Basic Statistics

NumPy arrays also have basic descriptive statistics functions.

In [None]:
m = np.linspace(1, 20, 5)
m

In [None]:
m.mean()

In [None]:
m.max()

In [None]:
m.min()

For multidimensional arrays, the above can also take an optional axis parameter. If this is specified, calculations are only performed along that axis (dimension) and the result is a new array.

In [None]:
d = np.array([[5,4,0],[0,1,2]])
d

In [None]:
# Mean of all values
d.mean()

In [None]:
# Mean for each of the 3 columns
d.mean(axis=0)

In [None]:
# Mean for each of the 2 rows
d.mean(axis=1)
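One way to remember the axis parameter is that the named axis is the one that disappears from the result. The sketch below checks the shapes for the same 2 x 3 array and applies the idea to `sum()` and `max()` as well; the result names are illustrative.

```python
import numpy as np

d = np.array([[5, 4, 0], [0, 1, 2]])

# axis=0 collapses the rows, leaving one value per column
col_means = d.mean(axis=0)
print(col_means, col_means.shape)

# axis=1 collapses the columns, leaving one value per row
row_means = d.mean(axis=1)
print(row_means, row_means.shape)

# Other statistics accept the axis parameter in the same way
col_sums = d.sum(axis=0)
row_maxes = d.max(axis=1)
print(col_sums, row_maxes)
```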

### Storing NumPy Data
We can save and load NumPy arrays in comma-separated format.

In [None]:
# Generate a random 1D array
v = np.random.randn(20)
# reshape this into a 2D array
a = v.reshape(5,4)
a

Save the data in the array, with a comma delimiter separating each value:

In [None]:
np.savetxt("random.csv",a,delimiter=",")

Load the data back into a new array:

In [None]:
x = np.loadtxt("random.csv",delimiter=",")
x
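To check that a save and load round trip preserves the data, we can compare the two arrays with `np.allclose()`. The sketch below writes to a temporary directory rather than the working directory; that choice, and the names `tmpdir` and `path`, are assumptions made for this example.

```python
import os
import tempfile

import numpy as np

a = np.random.randn(5, 4)

# Write to a temporary directory so that no file is left behind
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "random.csv")
    np.savetxt(path, a, delimiter=",")
    x = np.loadtxt(path, delimiter=",")

# The loaded array has the same shape and (numerically) the same values
print(x.shape)
print(np.allclose(a, x))
```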

### Visualising NumPy Data 
Matplotlib can be used in conjunction with NumPy arrays to visualise numeric data, in the same way we used it previously for values in standard lists.

In [None]:
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

As an example, we will create a scatter plot of one 1D array against another 1D array. 

First, we generate a set of example X values.

In [None]:
x = np.linspace(0.1, 1.0, 12)
x

For the Y values, we will generate an array with the same number of random values.

In [None]:
y = np.random.randn(12)
y

In [None]:
# create the figure
plt.figure(figsize=(9,5))
# create a scatter plot
plt.scatter(x, y, c="green", s=150, marker="o")
# add labels to the axes
plt.xlabel("X Numbers", fontsize=13)
plt.ylabel("Y Numbers", fontsize=13)

For 2D NumPy arrays, a common type of visualisation is a colour plot, which can be produced using Matplotlib. 

In [None]:
# Generate a random 1D array
v = np.random.randn(20)
# reshape this into a 2D array
a = v.reshape(5,4)
a

We create a plot where each entry in the coloured matrix corresponds to an entry in the original 2D array.

In [None]:
# create the figure
plt.figure(figsize=(8,6))
# draw the coloured matrix
plt.pcolor(a)    
# draw the legend bar beside the heatmap
plt.colorbar()    

### Using Pandas with NumPy
Since Pandas is built on top of NumPy, we can easily convert values between a NumPy array and a Pandas Series or DataFrame.

In [None]:
import pandas as pd

First, we will go from an array to a DataFrame:

In [None]:
# Generate a random 1D array
v = np.random.randn(12)
# reshape this into a 2D array
m = v.reshape(4,3)
print(m)

In [None]:
col_index = ["A","B","C"]
row_index = ["r1","r2","r3","r4"]
df = pd.DataFrame(m, columns=col_index, index=row_index )
df

We can also go from a DataFrame to an array. Note that we lose the row and column index information.

In [None]:
m2 = np.array( df )
m2
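As an alternative sketch, Pandas also offers the `to_numpy()` method on both DataFrames and Series. Because the labels are not carried over into the array, we can read them from `df.columns` and `df.index` first if they are needed later. The example below uses fixed values instead of random ones so that the results are easy to check.

```python
import numpy as np
import pandas as pd

m = np.arange(12, dtype=float).reshape(4, 3)
df = pd.DataFrame(m, columns=["A", "B", "C"], index=["r1", "r2", "r3", "r4"])

# Keep the labels separately, since the array will not carry them
col_labels = list(df.columns)
row_labels = list(df.index)

# Convert the whole DataFrame, or a single column (a Series), to an array
values = df.to_numpy()
col_b = df["B"].to_numpy()

print(values.shape)
print(col_labels, row_labels)
print(col_b)
```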