## Python List

The standard mutable multi-element container in Python is the list. We can create a list of integers as follows:

In [1]:
List = list(range(10))
List

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [2]:
type(List[4])

int

We can similarly create a list of strings:

In [3]:
String = [str(i) for i in List]
String

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

In [4]:
type(String[1])

str

Because of Python's dynamic typing, we can even create heterogeneous lists:

In [5]:
L1 = [True, 1, 2,'Hi', 4.67]
[type(i) for i in L1]

[bool, int, int, str, float]

In [6]:
# Sum the integers 0 through 99 with an explicit loop
result = 0
for i in range(100):
    result += i
result

4950

Here `i` ranges from 0 to 99. The sum of the first N integers is N(N+1)/2, so with N = 99 the result is 99(100)/2 = 4950.
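As a quick sanity check, the loop result can be compared against the closed form. This is only a minimal sketch; the helper name `sum_below` is introduced here for illustration and is not part of the notebook.

```python
def sum_below(n):
    """Sum the integers 0, 1, ..., n - 1 with an explicit loop."""
    total = 0
    for i in range(n):
        total += i
    return total

n = 100
loop_result = sum_below(n)
# Closed form N(N+1)/2 with N = n - 1, the largest value the loop reaches
closed_form = (n - 1) * n // 2
print(loop_result, closed_form, sum(range(n)))
```

All three values should agree, since the built-in `sum(range(n))` adds the same integers as the loop.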

## Fixed Type arrays in Python

Python offers several different options for storing data in efficient, fixed-type data buffers. The built-in `array` module can be used to create dense arrays of a uniform type:

In [7]:
import array
N = range(12)
Arr = array.array('i', N)
print(Arr)

array('i', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])


Here `'i'` is a type code indicating the contents are integers.
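To illustrate what "fixed type" means in practice, the sketch below builds two arrays with different type codes and shows that an integer array refuses a float. The variable names are illustrative only.

```python
import array

ints = array.array('i', range(5))      # 'i': signed C int
floats = array.array('d', [0.5, 1.5])  # 'd': C double

print(ints.typecode, floats.typecode)

# A fixed-type array rejects values of the wrong type
try:
    ints.append(2.5)
    rejected = False
except TypeError:
    rejected = True
print("float rejected by 'i' array:", rejected)
```

A plain Python list would accept the float without complaint; the type code is what enforces uniformity here.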

Much more useful, however, is the `ndarray` object of the NumPy package. While Python's `array` object provides efficient storage of array-based data, NumPy adds to this efficient operations on that data. 

## Creating Array from Python Lists

In [8]:
import numpy as np
np.__version__

'1.20.3'

First, we can use `np.array` to create arrays from Python lists:

In [9]:
np.array([1,2,3,4,5,6])

array([1, 2, 3, 4, 5, 6])

In [10]:
np.array(range(10))

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Remember that unlike Python lists, NumPy arrays are constrained to contain a single type. If the types do not match, NumPy will upcast if possible (here, integers are upcast to floating point):

In [11]:
np.array([3.5, 3, 4, 5])

array([3.5, 3. , 4. , 5. ])
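The upcasting can be observed directly by inspecting the `dtype` of the result. The following is a small sketch with made-up input values; the exact integer width depends on the platform, so only its kind is printed.

```python
import numpy as np

all_ints = np.array([1, 2, 3])
mixed = np.array([3.5, 3, 4, 5])       # int + float -> float
with_complex = np.array([1, 2.5, 3j])  # float + complex -> complex

print(all_ints.dtype.kind)   # 'i' (exact width is platform dependent)
print(mixed.dtype)           # float64
print(with_complex.dtype)    # complex128
```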

To set the data type of the resulting array explicitly, use the `dtype` keyword:

In [12]:
np.array([1.3,4.6,7.2,9], dtype = 'float32')

array([1.3, 4.6, 7.2, 9. ], dtype=float32)

Finally, unlike Python lists, NumPy arrays can explicitly be multi-dimensional; here's one way of initializing a multidimensional array using a list of lists:

In [13]:
# nested lists result in multi-dimensional arrays
np.array([range(i, i+3) for i in [2,2,6,4]])

array([[2, 3, 4],
       [2, 3, 4],
       [6, 7, 8],
       [4, 5, 6]])

The inner lists are treated as rows of the resulting two-dimensional array.

In [14]:
# Create a 4x3 integer array filled with zeros
np.zeros((4,3), dtype = 'int')

array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 0],
       [0, 0, 0]])

In [15]:
# Create a 2x4 integer array filled with ones
np.ones((2,4), dtype = 'int')

array([[1, 1, 1, 1],
       [1, 1, 1, 1]])

In [16]:
# Create a 3x5 array filled with 3.43
np.full((3,5), 3.43)

array([[3.43, 3.43, 3.43, 3.43, 3.43],
       [3.43, 3.43, 3.43, 3.43, 3.43],
       [3.43, 3.43, 3.43, 3.43, 3.43]])

In [17]:
# Create an array filled with a linear sequence
# Starting at 0, stopping before 20, stepping by 2
# (this is similar to the built-in range() function)
np.arange(0,20,2)

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [18]:
# Create an array of five values evenly spaced between 0 and 2
np.linspace(0,2,5)

array([0. , 0.5, 1. , 1.5, 2. ])
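The two sequence constructors differ in how they treat the end value: `np.arange` takes a step and excludes the stop value, while `np.linspace` takes a number of points and includes it by default. A short sketch of the difference:

```python
import numpy as np

stepped = np.arange(0, 20, 2)      # step given; stop value 20 is excluded
spaced = np.linspace(0, 2, 5)      # number of points given; stop value 2 is included

print(stepped[-1], len(stepped))   # 18 10
print(spaced[-1], len(spaced))     # 2.0 5

# linspace can also exclude the endpoint, which mimics arange-style spacing
half_open = np.linspace(0, 2, 4, endpoint=False)
print(half_open)
```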

In [19]:
# Create a 3x5 array of uniformly distributed
# random values between 0 and 1
np.random.random((3,5))

array([[0.51575739, 0.5425438 , 0.40000301, 0.19613225, 0.60290587],
       [0.46131842, 0.64383125, 0.93647477, 0.26499762, 0.96878241],
       [0.16968817, 0.88908059, 0.4278195 , 0.39191519, 0.24406613]])

In [20]:
# Create a 3x5 array of normally distributed random values
# with mean 0 and standard deviation 1
np.random.normal(0,1,(3,5))

array([[ 0.22338772, -1.47386013,  0.55749992, -0.12485834, -0.50657361],
       [ 1.91110565,  1.52142666,  0.66177779, -0.23772757,  0.57331003],
       [-0.3537687 ,  0.48633321, -0.85906482,  2.45475109,  0.74321373]])

In [21]:
# Create a 4x4 identity matrix
np.eye(4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

In [22]:
# Create a 3x3 array of random integers in the interval [0, 10)
np.random.randint(0,10, (3,3))

array([[5, 5, 5],
       [7, 7, 0],
       [2, 4, 8]])

In [23]:
# Create an uninitialized array of three values (floats, the default dtype)
# The values will be whatever happens to already exist at that memory location
np.empty(3)

array([4.94065646e-324, 1.07585187e-311, 5.43472210e-323])

In [24]:
import numpy as np
np.random.seed(0)  # seed for reproducibility

x1 = np.random.randint(10, size=6)  # One-dimensional array
x2 = np.random.randint(10, size=(3, 4))  # Two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5))  # Three-dimensional array
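To see what seeding buys us, the sketch below resets the seed and draws again; the two draws should be identical. The second half uses `np.random.default_rng`, a generator-based alternative that this notebook does not otherwise use, shown only for comparison.

```python
import numpy as np

np.random.seed(0)
first = np.random.randint(10, size=6)

np.random.seed(0)            # reset to the same seed
second = np.random.randint(10, size=6)

print(np.array_equal(first, second))   # True: same seed, same draws

# Alternative (not used in this notebook): local Generator objects
rng_a = np.random.default_rng(0)
rng_b = np.random.default_rng(0)
same_generator_draws = np.array_equal(rng_a.integers(10, size=6),
                                      rng_b.integers(10, size=6))
print(same_generator_draws)            # True
```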

Each array has attributes `ndim` (the number of dimensions), `shape` (the size of each dimension), and `size` (the total number of elements in the array):

In [25]:
print("x3 ndim: ", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)

x3 ndim:  3
x3 shape: (3, 4, 5)
x3 size:  60


In [26]:
print("x2 ndim: ", x2.ndim)
print("x2 shape:", x2.shape)
print("x2 size: ", x2.size)

x2 ndim:  2
x2 shape: (3, 4)
x2 size:  12


Another useful attribute is `dtype`, the data type of the array:

In [27]:
print("dtype", x2.dtype)

dtype int32


In [28]:
print("dtype", x3.dtype)

dtype int32


Other attributes include `itemsize`, which gives the size (in bytes) of each array element, and `nbytes`, which gives the total size (in bytes) of the array:

In [29]:
print("itemsize",  x3.itemsize, "bytes")
print("Nbytes", x3.nbytes, "bytes")

itemsize 4 bytes
Nbytes 240 bytes
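These attributes are related: `size` is the product of the entries of `shape`, and `nbytes` is `size` times `itemsize`. The sketch below fixes the dtype to `int32` explicitly, because the default integer type can differ between platforms and NumPy versions.

```python
import numpy as np

a = np.zeros((3, 4, 5), dtype=np.int32)

print(a.ndim, a.shape, a.size)   # 3 (3, 4, 5) 60
print(a.itemsize, a.nbytes)      # 4 240

# The relationships hold for any array
assert a.size == np.prod(a.shape)
assert a.nbytes == a.size * a.itemsize
```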


## Array Indexing: Accessing Single Elements

If you are familiar with Python's standard list indexing, indexing in NumPy will feel quite familiar. In a one-dimensional array, the ith value (counting from zero) can be accessed by specifying the desired index in square brackets, just as with Python lists:

In [30]:
x1

array([5, 0, 3, 3, 7, 9])

In [31]:
x1[2]

3

### Array Slicing: Accessing Subarrays

Just as we can use square brackets to access individual array elements, we can also use them to access subarrays with slice notation, marked by the colon (`:`) character. The NumPy slicing syntax follows that of the standard Python list; for example, to access the elements of `x1` at indices 2 and 3:

In [32]:
x1[2:4]

array([3, 3])

To index from the end of the array, you can use negative indices:

In [33]:
x1[-1]

9

In [34]:
x1[-3]

3

In [35]:
x1[-6]

5

In a multi-dimensional array, items can be accessed using a comma-separated tuple of indices:

In [36]:
x2

array([[3, 5, 2, 4],
       [7, 6, 8, 8],
       [1, 6, 7, 7]])

In [37]:
x2[2,3]

7

In [38]:
x2[:,3]

array([4, 8, 7])

In [39]:
x2[2,:]

array([1, 6, 7, 7])

In [40]:
x2[-1,:]

array([1, 6, 7, 7])

In [41]:
x2[-2,2]

8

In [42]:
x2[2,3] = 12
x2

array([[ 3,  5,  2,  4],
       [ 7,  6,  8,  8],
       [ 1,  6,  7, 12]])

Keep in mind that, unlike Python lists, NumPy arrays have a fixed type. This means, for example, that if you attempt to insert a floating-point value into an integer array, the value will be silently truncated. Don't be caught unaware by this behavior!

In [43]:
x1[0] = 3.14159  # this will be truncated!
x1

array([3, 0, 3, 3, 7, 9])

Slices take the form `x[start:stop:step]`. If any of these are unspecified, they default to the values start=0, stop=size of dimension, step=1. We'll take a look at accessing subarrays in one dimension and in multiple dimensions.

In [44]:
x = np.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [45]:
x[:5]  # first five elements

array([0, 1, 2, 3, 4])

In [46]:
x[5:] # elements from index 5 onward

array([5, 6, 7, 8, 9])

In [47]:
x[4:7] # middle sub-array

array([4, 5, 6])

In [48]:
x[::2] # every other element

array([0, 2, 4, 6, 8])

In [49]:
x[1::2] # every other element, starting at index 1

array([1, 3, 5, 7, 9])

A potentially confusing case is when the step value is negative. In this case, the defaults for start and stop are swapped. This becomes a convenient way to reverse an array:

In [50]:
x[::-1]

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

In [51]:
x[5::-1]

array([5, 4, 3, 2, 1, 0])

In [52]:
x[5::-2]

array([5, 3, 1])
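One way to see how the defaults change with a negative step is Python's built-in `slice.indices` method, which resolves omitted values for a sequence of a given length. This is a sketch for exploration rather than something you would need in everyday code.

```python
import numpy as np

x = np.arange(10)

# slice.indices(n) resolves the omitted parts for a sequence of length n
print(slice(None, None, 2).indices(len(x)))    # (0, 10, 2)
print(slice(None, None, -1).indices(len(x)))   # (9, -1, -1)
print(slice(5, None, -2).indices(len(x)))      # (5, -1, -2)

# The resolved (start, stop, step) triple feeds straight into range()
start, stop, step = slice(5, None, -2).indices(len(x))
picked = x[list(range(start, stop, step))]
print(picked)                                   # [5 3 1]
```

Note that the resolved stop of -1 means "one position before index 0" in the `range` sense; it is not the negative index -1.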

## Multi-dimensional subarrays

Multi-dimensional slices work in the same way, with multiple slices separated by commas. For example:

In [53]:
x2

array([[ 3,  5,  2,  4],
       [ 7,  6,  8,  8],
       [ 1,  6,  7, 12]])

In [54]:
x2[:2,:3]

array([[3, 5, 2],
       [7, 6, 8]])

In [55]:
x2[:3, ::2]

array([[3, 2],
       [7, 8],
       [1, 7]])

In [56]:
x2[::-1,::-1]

array([[12,  7,  6,  1],
       [ 8,  8,  6,  7],
       [ 4,  2,  5,  3]])

### Accessing array rows and columns

One commonly needed routine is accessing single rows or columns of an array. This can be done by combining indexing and slicing, using an empty slice marked by a single colon (`:`):

In [57]:
x2[:,0] # first column of x2

array([3, 7, 1])

In [58]:
x2[0,:] # first row of x2

array([3, 5, 2, 4])

In the case of row access, the empty slice can be omitted for a more compact syntax:

In [59]:
x2[0]

array([3, 5, 2, 4])

## Subarrays as no-copy views

One important, and extremely useful, thing to know about array slices is that they return views rather than copies of the array data. This is one area in which NumPy array slicing differs from Python list slicing: in lists, slices are copies. Consider our two-dimensional array from before:

In [60]:
print(x2)

[[ 3  5  2  4]
 [ 7  6  8  8]
 [ 1  6  7 12]]


Let's extract a 2×2 subarray from this:

In [61]:
x2_sub = x2[:2,:2]
print(x2_sub)

[[3 5]
 [7 6]]


Now if we modify this subarray, we'll see that the original array is changed! Observe:

In [62]:
x2_sub[0,0] = 6
print(x2_sub)

[[6 5]
 [7 6]]
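To make the effect on the original array explicit, and to contrast it with list slicing, here is a self-contained sketch. It rebuilds the array under the illustrative name `original` so that it does not depend on earlier cells.

```python
import numpy as np

original = np.array([[3, 5, 2, 4],
                     [7, 6, 8, 8],
                     [1, 6, 7, 12]])

view = original[:2, :2]          # slice: a view onto the same buffer
view[0, 0] = 6
print(original[0, 0])            # 6: the original changed too
print(np.shares_memory(original, view))   # True

# List slices, by contrast, are (shallow) copies
numbers = [3, 5, 2, 4]
piece = numbers[:2]
piece[0] = 6
print(numbers[0])                # 3: the original list is untouched
```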


This default behavior is actually quite useful: it means that when we work with large datasets, we can access and process pieces of these datasets without the need to copy the underlying data buffer.

### Creating copies of arrays

Despite the nice features of array views, it is sometimes useful to instead explicitly copy the data within an array or a subarray. This can be most easily done with the `copy()` method:

In [63]:
x2_sub_copy = x2[:2,:2].copy()
print(x2_sub_copy)

[[6 5]
 [7 6]]


If we now modify this subarray, the original array is not touched:

In [64]:
x2_sub_copy[0,0] = 22
print(x2_sub_copy)

[[22  5]
 [ 7  6]]


In [65]:
print(x2)

[[ 6  5  2  4]
 [ 7  6  8  8]
 [ 1  6  7 12]]


## Reshaping of Arrays

Another useful operation is reshaping arrays. The most flexible way of doing this is with the `reshape` method. For example, if you want to put the numbers 1 through 9 in a 3×3 grid, you can do the following:

In [66]:
grid = np.arange(1,10).reshape((3,3))
print(grid)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


Note that for this to work, the size of the initial array must match the size of the reshaped array. Where possible, the `reshape` method will use a no-copy view of the initial array, but with non-contiguous memory buffers this is not always the case.
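Both points can be checked with `np.shares_memory`. In the sketch below, reshaping a contiguous array yields a view, reshaping a transposed (non-contiguous) array forces a copy, and a size mismatch raises `ValueError`.

```python
import numpy as np

a = np.arange(6)

b = a.reshape(2, 3)                   # contiguous input: a view
print(np.shares_memory(a, b))         # True

c = b.T.reshape(6)                    # transposed (non-contiguous) input: a copy
print(np.shares_memory(a, c))         # False

# Sizes must match: 10 elements cannot fill a 3x3 grid
try:
    np.arange(10).reshape(3, 3)
    size_error = False
except ValueError:
    size_error = True
print(size_error)                     # True
```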

Another common reshaping pattern is the conversion of a one-dimensional array into a two-dimensional row or column matrix. This can be done with the `reshape` method, or more easily by making use of the `newaxis` keyword within a slice operation:

In [67]:
x = np.array([2, 4,6, 8])

In [68]:
# row vector via reshape
x.reshape(1,4)

array([[2, 4, 6, 8]])

In [69]:
# row vector via newaxis
x[np.newaxis, :]

array([[2, 4, 6, 8]])

In [70]:
# column vector via reshape
x.reshape((4,1))

array([[2],
       [4],
       [6],
       [8]])

In [71]:
# column vector via newaxis
x[:, np.newaxis]

array([[2],
       [4],
       [6],
       [8]])
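The shapes involved are easy to inspect. The sketch below also shows that `np.newaxis` is simply an alias for `None`, and that `reshape` can infer one dimension when given -1.

```python
import numpy as np

x = np.array([2, 4, 6, 8])

print(np.newaxis is None)          # True: newaxis is an alias for None
print(x.shape)                     # (4,)
print(x[np.newaxis, :].shape)      # (1, 4) row
print(x[:, np.newaxis].shape)      # (4, 1) column

# reshape can infer one dimension when given -1
print(x.reshape(-1, 1).shape)      # (4, 1)
```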

## Array Concatenation and Splitting

Concatenation, or joining of two arrays in NumPy, is primarily accomplished using the routines `np.concatenate`, `np.vstack`, and `np.hstack`. `np.concatenate` takes a tuple or list of arrays as its first argument, as we can see here:

In [72]:
x = np.array([1,4,5,7])
y = np.array([2,6,8,3])

In [73]:
np.concatenate([x,y])

array([1, 4, 5, 7, 2, 6, 8, 3])

You can also concatenate more than two arrays at once:

In [74]:
z = np.array([1,4,3,9])
np.concatenate([x,y,z])

array([1, 4, 5, 7, 2, 6, 8, 3, 1, 4, 3, 9])

It can also be used for two-dimensional arrays:

In [75]:
grid = np.array([[1,2,8],[4,6,9]])
print(grid)

[[1 2 8]
 [4 6 9]]


In [76]:
np.concatenate([grid,grid])

array([[1, 2, 8],
       [4, 6, 9],
       [1, 2, 8],
       [4, 6, 9]])

In [77]:
# concatenate along the second axis (zero-indexed)
np.concatenate([grid,grid], axis = 1)

array([[1, 2, 8, 1, 2, 8],
       [4, 6, 9, 4, 6, 9]])

For working with arrays of mixed dimensions, it can be clearer to use the `np.vstack` (vertical stack) and `np.hstack` (horizontal stack) functions:

In [78]:
x = np.array([1,2,3])
grid = np.array([[3,4,5],[6,8,9]])
np.vstack([x,grid])

array([[1, 2, 3],
       [3, 4, 5],
       [6, 8, 9]])

In [79]:
y = np.array([[7],[5]])
np.hstack([grid, y])

array([[3, 4, 5, 7],
       [6, 8, 9, 5]])
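For comparison, the sketch below shows why the stacking helpers are convenient with mixed dimensions: `np.concatenate` rejects a 1-D array next to a 2-D one, whereas `np.vstack` behaves like concatenation after promoting the 1-D array to a row.

```python
import numpy as np

x = np.array([1, 2, 3])
grid = np.array([[3, 4, 5],
                 [6, 8, 9]])

# concatenate requires matching dimensionality, so mixing 1-D and 2-D fails
try:
    np.concatenate([x, grid])
    mixed_ok = True
except ValueError:
    mixed_ok = False
print(mixed_ok)    # False

# vstack promotes the 1-D array to a row first; this is the explicit equivalent
stacked = np.vstack([x, grid])
manual = np.concatenate([x[np.newaxis, :], grid], axis=0)
print(np.array_equal(stacked, manual))   # True

# hstack on 2-D arrays matches concatenate along axis 1
col = np.array([[7], [5]])
same_hstack = np.array_equal(np.hstack([grid, col]),
                             np.concatenate([grid, col], axis=1))
print(same_hstack)                       # True
```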

Similarly, `np.dstack` will stack arrays along the third axis.

In [80]:
z = np.array([[1,2,3],[4,5,6],[7,8,9]])
print(z)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


In [81]:
np.dstack([z,z])

array([[[1, 1],
        [2, 2],
        [3, 3]],

       [[4, 4],
        [5, 5],
        [6, 6]],

       [[7, 7],
        [8, 8],
        [9, 9]]])

In [85]:
z = np.array([[1,2], [3,4]])
print(z)
np.dstack([z, z])

[[1 2]
 [3 4]]


array([[[1, 1],
        [2, 2]],

       [[3, 3],
        [4, 4]]])

### Splitting of array

The opposite of concatenation is splitting, which is implemented by the functions `np.split`, `np.hsplit`, and `np.vsplit`. For each of these, we can pass a list of indices giving the split points:

In [83]:
x = [2,4,6,8,4,8,9]
x1,x2,x3 = np.split(x,[3,5])

In [84]:
print(x1,x2,x3)

[2 4 6] [8 4] [8 9]


Notice that N split points lead to N + 1 subarrays. The related functions `np.hsplit` and `np.vsplit` are similar:

In [86]:
data = np.arange(16).reshape(4,4)
print(data)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]


In [87]:
upper, lower = np.vsplit(data,[2])
print(upper)
print(lower)

[[0 1 2 3]
 [4 5 6 7]]
[[ 8  9 10 11]
 [12 13 14 15]]


In [88]:
left, right = np.hsplit(data, [2])
print(left)
print(right)

[[ 0  1]
 [ 4  5]
 [ 8  9]
 [12 13]]
[[ 2  3]
 [ 6  7]
 [10 11]
 [14 15]]
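Since splitting is the opposite of concatenation, stacking the pieces back together should recover the original array. The sketch below checks this round trip and also counts the pieces produced by two split points; the input values are illustrative.

```python
import numpy as np

data = np.arange(16).reshape(4, 4)

# Two split points give three pieces
pieces = np.split(np.arange(7), [3, 5])
print(len(pieces))                   # 3
print([p.tolist() for p in pieces])  # [[0, 1, 2], [3, 4], [5, 6]]

# Stacking the pieces back together recovers the original array
upper, lower = np.vsplit(data, [2])
left, right = np.hsplit(data, [2])
print(np.array_equal(np.vstack([upper, lower]), data))   # True
print(np.array_equal(np.hstack([left, right]), data))    # True
```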


Similarly, `np.dsplit` will split arrays along the third axis:

In [95]:
tensor = np.arange(16).reshape(2,2,4)
print(tensor)

[[[ 0  1  2  3]
  [ 4  5  6  7]]

 [[ 8  9 10 11]
  [12 13 14 15]]]


In [96]:
x = np.dsplit(tensor, [2])

In [97]:
print(x)

[array([[[ 0,  1],
        [ 4,  5]],

       [[ 8,  9],
        [12, 13]]]), array([[[ 2,  3],
        [ 6,  7]],

       [[10, 11],
        [14, 15]]])]
