# Creating and Manipulating NumPy Arrays

## NumPy

[NumPy](https://numpy.org/) is one of the most commonly used packages in Python. It contains functionality which allows large amounts of data to be handled easily and efficiently.

NumPy gives access to complex, high-quality code which you can use in your projects. Using it can save a large amount of time compared to writing code yourself to produce the same results.

In addition, much of NumPy is written in the programming language C, which is significantly more computationally efficient than Python. This allows NumPy operations to run much faster than equivalent code written in pure Python. The complexity of the interface between Python and C is hidden inside the package, so we don't need to worry about how it works.

Due to the ease of use and the speed of NumPy, it's widely used in a variety of different applications and disciplines.
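
As a rough illustration of the kind of code NumPy replaces, the sketch below sums the squares of some numbers twice: once with a plain Python loop and once with a single NumPy expression. The variable names are made up for this example, and no timings are shown because they depend on the machine being used.

```python
import numpy as np

values = list(range(1000))

# Pure-Python version: loop over every element explicitly
total_loop = 0
for v in values:
    total_loop += v * v

# NumPy version: the loop over the elements happens inside NumPy's compiled code
values_array = np.array(values)
total_numpy = int((values_array * values_array).sum())

# Both approaches give the same answer
print(total_loop, total_numpy)
```

The arithmetic and the ```sum``` method used here are only a preview; the rest of this section concentrates on creating and indexing arrays.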

## Arrays

One of the most important constructs in NumPy is the ```array```. An array is an arrangement of data in one or more dimensions. A one-dimensional array is similar to a list in some ways.

To create an array, we first need to import the NumPy module into our code. When we do this, it is common to give it the alias ```np```. We can then create an array by writing ```np.array()``` and providing a sequence (such as a list or tuple) of values.

In [None]:
import numpy as np

my_array = np.array([1, 2, 3])

Once we create an array, its size is fixed, unlike a list, which can grow and shrink. As we'll see later though, we can change the values stored in the array, and we can create a version of it with a different shape.
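
To make the point about size concrete, here is a minimal sketch using ```np.append```, a NumPy function not otherwise covered in this section. It appears to add a value to an array, but it returns a new array and leaves the original alone. The variable names are chosen just for this example.

```python
import numpy as np

original = np.array([1, 2, 3])

# np.append does not grow `original`; it builds and returns a new array
extended = np.append(original, 4)

print(original)  # still has three entries
print(extended)  # a separate array with four entries
```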

## Printing Arrays

We can print arrays as normal using the ```print``` function:

In [None]:
print(my_array)

print([1, 2, 3])

Note that, when we print an array, it is printed within square brackets and without commas separating values, whilst a list is printed with commas. This allows us to easily distinguish these data types when they're printed.
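
If it helps to see the two text forms of an array side by side, the sketch below compares ```str``` (which is what ```print``` uses) with ```repr``` (which is what a notebook typically shows when an array is the last expression in a cell).

```python
import numpy as np

my_array = np.array([1, 2, 3])

print(str(my_array))   # the form used by print: no commas
print(repr(my_array))  # the form which names the type and uses commas
```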

### Array Data Types

The elements of an array must all be the same type. So we can create an array of strings or bools:

In [None]:
import numpy as np

string_array = np.array(["str1", "str2"])
print(string_array)

bool_array = np.array([True, False])
print(bool_array)

However, if we try to define an array with a mixture of types, NumPy will try to convert some or all of the values so that all values have the same type.

In [None]:
import numpy as np

mixed_array = np.array([1, 1.2])
print(mixed_array) # Both values are converted to floats

mixed_array2 = np.array([4, "str"])
print(mixed_array2) # Both values are converted to strings

We can check the type of data stored in an array using the ```dtype``` property:

In [None]:
import numpy as np

array1 = np.array([1, 2])
print(array1.dtype)

array2 = np.array([True, False])
print(array2.dtype)

array3 = np.array(["1", "2"])
print(array3.dtype)

array4 = np.array(["1234567890", "2"])
print(array4.dtype)

array5 = np.array([1.0, -2.0])
print(array5.dtype)

Note that the data types reported are different to normal Python data types: these are the fixed-size types which NumPy's underlying C code uses to store the data. For instance, ```int64``` describes a 64-bit integer, ```<U1``` describes a string of at most one character and ```float64``` describes a 64-bit float. (The default integer type can differ between platforms and NumPy versions.) Storing data in these types allows NumPy to execute many operations efficiently in its compiled C code.
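
Rather than letting NumPy choose a type, it is also possible to request one. The sketch below uses the ```dtype``` argument of ```np.array``` and the ```astype``` method; the variable names are illustrative only.

```python
import numpy as np

# Ask for floats explicitly, even though the values are written as integers
float_array = np.array([1, 2], dtype=np.float64)
print(float_array.dtype)

# astype returns a new array with the requested type; float_array is unchanged
int_array = float_array.astype(np.int64)
print(int_array.dtype)
```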

#### Non-C Types and Mixed Arrays

It is also possible to create a NumPy array with the data type ```object```, which can hold references to Python objects of any mixture of types, including ones which do not have an implementation in C.

In [None]:
import numpy as np

# Define a very simple class
# If you're not familiar with user-defined classes, don't worry about this too much
class MyClass:
    x = 1

# Define a simple function
def my_func():
    return 1

# Create an array containing an instance of our class and a reference to our function
mixed_array = np.array([MyClass(), my_func])

# The type of the array is that of a generic Python object
print(mixed_array.dtype)

When we do this, the NumPy array is storing references to the Python objects. As the data is not converted into C data types, NumPy cannot use its internal compiled C code to carry out operations efficiently. As a result, NumPy arrays with the ```object``` data type are much slower to use than arrays with C data types.
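
A small sketch of what storing references means in practice: when ```dtype=object``` is requested explicitly, values which would otherwise be converted to a common type are kept as the original Python objects. The values used here are arbitrary.

```python
import numpy as np

# Without dtype=object these values would all be converted to strings
object_array = np.array([1, "two", 3.0], dtype=object)

print(object_array.dtype)
for item in object_array:
    print(type(item))  # each entry is still the original Python object
```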

## Arrays with Multiple Dimensions

It's also possible to create arrays with multiple dimensions. We can do this using the ```array``` function. If we pass a sequence of sequences to this function, the returned value will be a two-dimensional array. A sequence of sequences of sequences would produce a three-dimensional array and so on.

We can find the number of dimensions of a NumPy array using its ```ndim``` property and the extent of the array in each dimension using its ```shape``` property.

In [None]:
import numpy as np

a = np.array([[1,2], [3,4], [5,6]])

print(a)
print(a.ndim)
print(a.shape)

Note that, when creating a multi-dimensional array, the size of the resultant array must be consistent in each dimension (i.e. you cannot create a "ragged" array). As an example, each row of a two-dimensional array must have the same number of entries, otherwise an error will be raised (in recent versions of NumPy):

In [None]:
import numpy as np

a = np.array([[1], [3,4]])

### Zeros

It's also possible to create an array of zeros using the ```zeros``` function. As an argument, this accepts a sequence of integers which specify the size of the array to be created in each dimension. As a result, the number of entries in the sequence defines the number of dimensions in the array. ```zeros``` returns an array of floats.

The following creates an array with 3 dimensions and a size of two in the first two dimensions and a size of three in the third dimension.

In [None]:
import numpy as np

zero = np.zeros([2,2,3])

print(zero)

Note that, as the number of dimensions increases, it becomes progressively more difficult to interpret the data stored within an array by printing all of it, although NumPy does its best to present it in a helpful way.
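
Returning to the point that ```zeros``` produces floats: if integers are wanted instead, one option is the ```dtype``` argument, as in the sketch below. The variable names are illustrative, and the exact integer type reported depends on the platform.

```python
import numpy as np

# The shape can be given as a list or a tuple
zero_floats = np.zeros((2, 3))
zero_ints = np.zeros((2, 3), dtype=int)

print(zero_floats.dtype)
print(zero_ints.dtype)
print(zero_ints)
```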

### Full

It's possible to create a new array with all values being the same using the ```full``` function, which works in a similar way to the ```zeros``` function. The first argument is a sequence which defines the size of the array in each dimension (so the array has as many dimensions as the sequence has entries) and the second is the value to be stored in all entries of the array.

In [None]:
import numpy as np

# Create an array with 2 rows and 3 columns
# All entries have a value of 1
a = np.full([2,3], 1)
print(a)

### Arange

The ```arange``` function works in a similar way to the ```range``` function, but returns a NumPy array containing the values specified:

In [None]:
import numpy as np

# Providing a single value causes every integer (beginning with zero) up to (but excluding) that value to be used
print(np.arange(3))


# Providing two values causes every integer beginning with and including the first value, up to but not including the second value to be used.
print(np.arange(3,5))

# Providing three values causes every integer in steps of the third value, beginning with and including the first value, up to but not including the second value, to be used.
print(np.arange(4, 13, 3)) # Every third value beginning with 4 and stopping before 13
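
For comparison, the sketch below puts ```range``` and ```arange``` side by side with the same arguments, and also shows one difference between them: ```arange``` accepts a non-integer step. The step of 0.25 is just an example value.

```python
import numpy as np

from_range = list(range(4, 13, 3))
from_arange = np.arange(4, 13, 3)

print(from_range)   # a list
print(from_arange)  # an array holding the same values

# Unlike range, arange also accepts non-integer steps
quarters = np.arange(0, 1, 0.25)
print(quarters)
```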

### Reshape

It's also possible to use the ```reshape``` method of an array to create a version of an array with a different shape (and, potentially, a different number of dimensions). For instance, here we will create a one-dimensional array with 12 entries using ```arange``` before reshaping it to be a three-dimensional array with sizes in each dimension of 2, 2 and 3.

In [None]:
import numpy as np

a = np.arange(12).reshape([2,2,3])

print(a.ndim)
print(a.shape)
print(a)

The new shape of the array must contain the same number of entries as the original shape. For instance, the following will fail:

In [None]:
import numpy as np

a = np.arange(7).reshape([2,3])
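
The sketch below catches the error from a mismatched reshape so that the message can be read without stopping the program. It also shows one convenience not mentioned above: passing ```-1``` for one dimension asks NumPy to work out that size from the total number of entries.

```python
import numpy as np

seven = np.arange(7)

try:
    seven.reshape([2, 3])  # 7 entries cannot fill a 2 x 3 array
except ValueError as error:
    print("reshape failed:", error)

# Passing -1 asks NumPy to infer that dimension from the total size
twelve = np.arange(12).reshape([2, -1, 3])
print(twelve.shape)
```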

### Exercise

In the code cell below, do the following:

* Create a one-dimensional array with a series of three bools of your choice
* Create a three-dimensional array with every value being equal to zero. This array should have a size of 4 in dimension 1, a size of 3 in dimension 2 and a size of 2 in dimension 3
* Create an array of the same dimension and size with the values 0-23 (inclusive)
* Create an array of the same dimension and size containing every other value from 4 to 50 (inclusive).

In [None]:

# Import NumPy
import numpy as np

# Create the 1D array
bools = np.array([True, False, True])
print(bools)

# Create the array of zeros
a = np.zeros([4,3,2])
print(a)

# Print a separating line to separate the arrays in the output
print("==============")

# Create the array with the increasing numbers
b = np.arange(24).reshape([4,3,2])
print(b)

# Print a separating line to separate the arrays in the output
print("==============")

# Create an array with every other value from 4 to 50
c = np.arange(4, 51, 2).reshape([4,3,2])
print(c)

## Selecting Items from an Array

To select single items from an array, we provide the indices in each dimension of the item, separated by commas within a set of square brackets. There are a number of different ways of specifying indices and it's possible to mix and match different ways of selecting indices in different dimensions. In this section, we'll visit some of the common ways of indexing a dimension.

### Selecting a Single Index

We can specify a single index by writing a single integer. It's also possible to use a calculation or variable as part of the specification of the index.

When providing a single value for an index, NumPy will reduce the number of dimensions of the returned array. If every dimension is indexed with a single value, the returned value will be a scalar.

Remember that, like other Python collections, the first entry has an index of 0.

In [None]:
import numpy as np

a = np.arange(12).reshape([2, 2, 3]) # Create the array

print(a) # Print the array for reference

i = 1 # Define a variable to use when indexing the array
print(a[0, i, i + 1]) # Print the value of a[0, 1, 2]. Because each index is a single value, a scalar is returned
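
To see the number of dimensions shrinking one index at a time, the sketch below indexes the same array with one, two and then three single values. The last line uses a negative index, which counts from the end of a dimension in the same way as for lists.

```python
import numpy as np

a = np.arange(12).reshape([2, 2, 3])

print(a[0].shape)     # one index given: a 2D array remains
print(a[0, 1].shape)  # two indices given: a 1D array remains
print(a[0, 1, 2])     # three indices given: a single value
print(a[0, 1, -1])    # -1 refers to the last index in the third dimension
```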

### Every Index in a Dimension

Using a single colon will cause every index in a dimension to be selected.

In [None]:
import numpy as np

a = np.arange(6).reshape([1, 2, 3])

print(a) # Print the array for reference

# In the first dimension, we've selected every index. Even though this dimension happens to only have an extent of 1 (the index 0), this dimension is not collapsed.
# In the second dimension, we chose only the index 1. This dimension is collapsed.
# In the third dimension we select every index
# The result is a 2D numpy ndarray of shape (1, 3)
b = a[:, 1, :]
print(b)
print(type(b))
print(b.ndim)
print(b.shape)

The same notation works for arrays with any number of dimensions: the index specifications for each dimension are separated by commas and, wherever the specification for a dimension is purely a colon, every possible index from that dimension is used for the returned data.
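
As a small sketch of the difference between a single index and a selection which happens to contain one index, the code below selects index 1 of the second dimension in two ways. The slice ```1:2``` (start at 1, stop before 2) is an assumption of this example rather than notation introduced so far.

```python
import numpy as np

a = np.arange(6).reshape([1, 2, 3])

collapsed = a[:, 1, :]  # single index: the second dimension is removed
kept = a[:, 1:2, :]     # slice containing only index 1: the dimension is kept

print(collapsed.shape)
print(kept.shape)
```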

### Everything Before an Index

If we put a colon before a single value, every index before that value will be selected. This will not collapse the dimension, even if only a single index is selected.

In [None]:
import numpy as np

a = np.arange(12).reshape([2,2,3])
print(a)

# Select the index 0 in the first dimension. The dimension will be collapsed
# Select the index 1 in the second dimension. The dimension will be collapsed
# Select the indices 0 and 1 in the third dimension
print(a[0, 1, :2])

# Select the index 0 in the first dimension. The dimension will be collapsed
# Select the index 0 in the second dimension. The dimension will not be collapsed
# Select the index 1 in the third dimension. This dimension will be collapsed
print(a[0, :1, 1])

### Every Index Starting From a Value

If we provide a single value followed by a colon in the position for a dimension, every index beginning with and including that value will be selected.

In [None]:
import numpy as np

a = np.arange(12).reshape([4, 3])
print(a)

# Select indices 2, 3 in the first dimension
# Select indices 1 and 2 in the second dimension
print(a[2:, 1:])
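
The two forms above can be combined by writing a value on both sides of the colon, as in list slicing. The sketch below assumes that notation: ```1:3``` selects indices 1 and 2 (the stop value is excluded).

```python
import numpy as np

a = np.arange(12).reshape([4, 3])

# Select indices 1 and 2 in the first dimension
# Select indices 0 and 1 in the second dimension
middle = a[1:3, :2]
print(middle)
```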

## Assigning to Items in an Array

It's possible to change the values inside an array using item assignment, selecting which entry or entries to assign to using the same notation as in the previous section. 

### Assignment with a Scalar

If a single value is provided on the right-hand side of the assignment operator then every specified location in the array will be given that value.

In [None]:
import numpy as np

a = np.arange(12).reshape([2,2,3])
a[1,1,2] = 50 # Give a single entry the value 50
a[0,:,1:] = 40 # Give the entries with indices [0, 0, 1], [0, 0, 2], [0, 1, 1], and [0, 1, 2] the value 40.

print(a)
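
One thing to watch for when assigning: the array keeps its data type, so the assigned value is converted to fit. The sketch below shows a float being assigned into an integer array, where the fractional part is dropped, and into a float array, where it is kept. The variable names are illustrative.

```python
import numpy as np

int_array = np.arange(3)
int_array[0] = 2.7  # the array stores integers, so the fractional part is dropped
print(int_array)

float_array = np.arange(3, dtype=float)
float_array[0] = 2.7  # a float array stores the value as given
print(float_array)
```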

### Assignment with an Array

If, instead, another array is on the right-hand side of the assignment operator, the values of that array will be assigned to the selected locations in the array on the left-hand side. Assignment in this way works when the array on the right-hand side and the selected locations from the array on the left-hand side have the same shape.

In [None]:
import numpy as np

a = np.zeros([4, 2, 3])

# The arrays selected on the left and right side of the assignment operator both have shape (2, 2)
a[0,:,:2] = np.arange(1,5).reshape([2,2])

print(a)

In [None]:
import numpy as np

a = np.zeros([4, 2, 3])

# This will give an error as the selection on the left has shape (2, 2) and the array on the right has shape (4,)
a[1, :, :2] = np.arange(1,5)
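
One possible way to handle this case is sketched below: the failing assignment is wrapped in ```try```/```except``` so the message can be read, and the right-hand side is then reshaped to match the selection. The name ```values``` is made up for this example.

```python
import numpy as np

a = np.zeros([4, 2, 3])
values = np.arange(1, 5)  # shape (4,)

try:
    a[1, :, :2] = values  # the selection has shape (2, 2)
except ValueError as error:
    print("assignment failed:", error)

# Reshaping the right-hand side to (2, 2) makes the shapes match
a[1, :, :2] = values.reshape([2, 2])
print(a[1])
```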

## Exercise

* Create a 2-D array from the array below, using the values with the lowest two indices in the first dimension, the highest two indices in the second dimension and only the first index of the third dimension. This array should have the values ```[[6, 8], [16, 18]]```.
* In this new array, modify the entry with indices ```[1, 1]``` to have the value ```4```.
* Also in the new array, modify the values which have an index of 0 in the first dimension to have the values ```[1,2]``` using a single assignment statement.

In [None]:
import numpy as np

start_array = np.arange(30).reshape([3,5,2])

In [None]:
import numpy as np

start_array = np.arange(30).reshape([3,5,2])

# Print the initial array for reference
print(start_array)
# Print a separating line to separate the arrays
print("==============")
a = start_array[:2,3:,0]
print(a)

# Print a separating line to separate the arrays
print("==============")
a[1,1] = 4
print(a)

# Print a separating line to separate the arrays
print("==============")
a[0,:] = np.array([1,2])
print(a)
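
A point worth knowing about the solution above: selecting with indices and colons gives a view which shares its data with the original array, so assigning to the selection also changes ```start_array```. The sketch below shows this, along with the ```copy``` method as one way to get independent data. The names ```view``` and ```independent``` are chosen for this example.

```python
import numpy as np

start_array = np.arange(30).reshape([3, 5, 2])

view = start_array[:2, 3:, 0]                # shares data with start_array
independent = start_array[:2, 3:, 0].copy()  # has its own data

view[1, 1] = 4          # also changes start_array[1, 4, 0]
independent[0, 0] = -1  # start_array is not affected

print(start_array[1, 4, 0])  # changed through the view
print(start_array[0, 3, 0])  # unchanged by the copy
```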