## Numerical Python: NumPy (arrays)

NumPy is a Python library that provides another structure for storing and manipulating data: the array. Arrays are similar to lists in that they store a group of values. However, arrays allow faster operations on vectors and matrices, which are used for calculating statistical information about data. Arrays are ideal for storing and manipulating dense data (i.e. data where you have values for all or most of the attributes you're interested in).

NumPy also includes linear algebra operations, such as solving systems of linear equations and computing eigenvectors and eigenvalues. These are key operations in data science, but they are not part of the standard Python math module.

The first step to working with NumPy is to import the library:

In [1]:
# When you add 'as' you can alias your library

# import [library] as [preferred_name]

import numpy as np
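
With the library imported, the operations mentioned above can be sketched briefly. This is a minimal illustration rather than a complete tour: the sample values and variable names are made up for the example, and np.linalg.solve() and np.linalg.eig() are the NumPy functions assumed here for the linear algebra steps.

```python
import numpy as np

# Hypothetical sample data, used only for illustration
prices = np.array([10.0, 12.5, 9.0, 14.5])

# Arithmetic is applied to every element at once, with no explicit loop
doubled = prices * 2
mean_price = prices.mean()
print(doubled)
print(mean_price)

# Solve the linear system  2x + y = 3,  x + 3y = 5
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])
solution = np.linalg.solve(A, b)
print(solution)

# Eigenvalues and eigenvectors of A (the eigenvectors are the columns)
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)
print(eigenvectors)
```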

The core of NumPy is the array object class (ndarray). Unlike a Python list, all the elements of a NumPy array must be of the same type, commonly a numeric type such as an integer (int) or a floating-point number (float). In NumPy, vectors (one dimension), matrices (two dimensions), and higher-dimensional structures are all called arrays.
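
The single-type rule can be seen by inspecting an array directly. The sketch below uses the dtype and ndim attributes, which are not introduced in the text above; they report the element type and the number of dimensions. The variable names are illustrative.

```python
import numpy as np

# All integers: NumPy picks an integer type (its exact width depends on the platform)
ints = np.array([1, 2, 3])
print(ints.dtype)

# Mixing ints and a float: every element is stored as a float
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)
print(mixed)

# A vector has one dimension, a matrix has two
vector = np.array([1, 2, 3])
matrix = np.array([[1, 2], [3, 4]])
print(vector.ndim, matrix.ndim)
```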

 

![image.png](attachment:image.png)

## Vectors (1D Arrays)

The 1-dimensional vector is the most commonly used array. 1D arrays store values at indexes, similar to lists.

![image.png](attachment:image.png)

### Creating a 1D array

There are several ways you can create arrays with NumPy:

1. Using the arange() function.

2. Providing a list.

3. Using the zeros() function.

### Creating a 1D array with a range

Use NumPy arange() when the values you want in your array are between a minimum and maximum value with a consistent interval (i.e. amount of space) between them. For example, if you want the even numbers between 2 and 20, the minimum is 2, the maximum is 20 and the interval is 2 — you want every second number added to your array.

The format of the arange() function is:

arange([start,] stop[, step], dtype=None)

**start** is the beginning of the interval. It is a numeric value that is included in the interval, and it is optional. The default start value is 0.

**stop** is the end of the interval. It is a numeric value that is not included in the interval.

**step** is the space between the values. It is a numeric value, and it is optional. The default step size is 1.

**dtype** indicates the type of the array. If dtype is not given, NumPy infers the data type from the other input arguments.
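
As a small sketch of the even-numbers example described earlier: because stop is excluded from the interval, it has to be set just past 20 for 20 itself to appear in the array. The variable names are illustrative.

```python
import numpy as np

# Even numbers from 2 to 20: stop is 21 so that 20 is included
evens = np.arange(2, 21, 2)
print(evens)

# With stop=20 the last value is 18, because stop itself is never included
evens_short = np.arange(2, 20, 2)
print(evens_short)

# The same values stored as floats by passing dtype explicitly
evens_float = np.arange(2, 21, 2, dtype=float)
print(evens_float)
```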

In [3]:
import numpy as np

# using NumPy arange() with just the stop argument
# start and step default to 0 and 1, and the dtype is inferred from the type of the stop argument (int)

a1 = np.arange(5)
a1

array([0, 1, 2, 3, 4])

In [4]:
# the type of the array

type(a1)

numpy.ndarray

In [5]:
# creating an array of float

a2 = np.arange(5.0)
a2

array([0., 1., 2., 3., 4.])

In [9]:
# specifying start and stop
# step defaults to 1 and dtype is inferred from the arguments (int)
# NOTE: The interval does not include 12

a3 = np.arange(5,12)
a3

array([ 5,  6,  7,  8,  9, 10, 11])

In [10]:
# specifying a step
a4 = np.arange(5,12,2)
a4

array([ 5,  7,  9, 11])

### Creating a 1D array from a list

If you have your data in list form (for example, after reading it from a CSV file), you can convert it into an array using the function array(). The main arguments of the function are the list to be converted into the array and the type.

In [11]:
import numpy as np
# create an array using the array() function
a1 = np.array([0,1,2,3,4],float)
a1

array([0., 1., 2., 3., 4.])

In [12]:
# if your list has floats and you define the type as int
# the function array() will truncate the digits after the decimal point
a2 = np.array([0,1.0,2.5,3.4,4],int)
a2

array([0, 1, 2, 3, 4])

In [13]:
# creating an array of strings
a3 = np.array([0,1.0,2.5,3.4,4],str)
a3

array(['0', '1.0', '2.5', '3.4', '4'], dtype='<U3')
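
The CSV case mentioned above might look like the following sketch. It assumes a single row of numbers and uses an in-memory string as a stand-in for a real file; the data and variable names are hypothetical. Values read from a CSV file arrive as strings, so they are converted to floats before building the array.

```python
import csv
import io

import numpy as np

# Hypothetical CSV content; a real program would open a file instead
csv_text = "1.5,2.0,3.25\n"

# csv.reader yields each row as a list of strings
row = next(csv.reader(io.StringIO(csv_text)))
print(row)

# Convert the strings to floats, then build the array from the list
values = np.array([float(v) for v in row])
print(values)
```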

### Creating a 1D array with zeros()

The zeros() function will create an array filled with zeros for you. This is useful when some values may be missing in your data — you can represent the missing information with zeros.

The format of the zeros() function is:

zeros(shape, dtype=float, order='C')

**shape** is an int or a tuple of ints that gives the number of elements along each dimension.

**dtype** indicates the type of the array. It is optional and defaults to float; you can use any of the types defined in NumPy, for example np.int64 or np.float64.

**order** defines how multi-dimensional data is stored in memory: row-major (C-style) or column-major (Fortran-style). The order argument is optional and the default is 'C'. It is unlikely you will need Fortran order, and you won't see any difference when working with the array in Python (it will look the same to you), but be aware that Fortran ordering exists should you need it.
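
To illustrate that the order argument changes only the memory layout, here is a brief sketch. It uses a two-dimensional shape, because ordering only matters when there is more than one dimension, and it inspects the flags attribute, which is not covered in the text above.

```python
import numpy as np

# Same shape, different memory layout
c_order = np.zeros((2, 3), order='C')
f_order = np.zeros((2, 3), order='F')

# The two arrays look and compare the same from Python
print(np.array_equal(c_order, f_order))

# The flags attribute reveals which layout each array uses
print(c_order.flags['C_CONTIGUOUS'])
print(f_order.flags['F_CONTIGUOUS'])
```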

In [14]:
import numpy as np
# create an array using zeros()
# only shape is specified: 3 elements
# dtype defaults to float
a4 = np.zeros(3)
a4

array([0., 0., 0.])

In [15]:
# specifying the data type
# (3) is just the integer 3, so the parentheses are not needed for a 1D shape

a4 = np.zeros(3, dtype=int)
a4

array([0, 0, 0])
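
One way to use zeros() for partly missing data, as described at the start of this section, is to create the array first and then fill in the values you do have. The readings and positions below are hypothetical, chosen only to show the pattern.

```python
import numpy as np

# Five slots, all starting at zero
readings = np.zeros(5)

# Hypothetical known values, keyed by their position in the array
known = {0: 21.5, 3: 19.0}

# Fill in the known positions; the others keep their zero placeholder
for index, value in known.items():
    readings[index] = value

print(readings)
```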

## 2D Arrays

The term shape may seem to be an odd choice of name for the number of elements. That is because the concept of shape is related to having rows and columns. In 1D arrays, you really only have columns, similar to a single row in a spreadsheet with multiple values in different columns. Often, you have multiple rows, with each row containing data related to each column. This sort of row-column structure is stored as a 2D array.

2D arrays don't just have a number of elements; they have a shape. For example, if you have 20 elements, those elements could be arranged as 4 rows of 5 columns, 10 rows of 2 columns, or 5 rows of 4 columns. All of these different shapes store 20 elements. The shape of an array is the number of rows and columns the array has.
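
The 20-element example can be sketched with the reshape() method, which is not covered in the text above and is used here only to arrange the same values in different shapes. The size attribute reports the total number of elements.

```python
import numpy as np

# 20 values: 0, 1, ..., 19
values = np.arange(20)

# The same 20 elements arranged in three different shapes
four_by_five = values.reshape(4, 5)
ten_by_two = values.reshape(10, 2)
five_by_four = values.reshape(5, 4)

for arr in (four_by_five, ten_by_two, five_by_four):
    # shape differs each time, size is always 20
    print(arr.shape, arr.size)
```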

In [25]:
import numpy as np
a1 = np.array([0,1,2,3,4],float)
# a1 is a 1D array with 5 elements, so its shape is a one-element tuple
np.shape(a1)

(5,)

A shape can also be passed to the zeros() function as a tuple giving the number of rows and columns, which creates a zero-filled 2D array. Note the different shapes created in these examples:

In [17]:
# Creating a multi-dimensional array of zeros
a5 = np.zeros((3,4))
a5

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [19]:
a6 = np.zeros((4,3))
a6

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

You can also create 2D arrays from a list of lists.

In [21]:
# Create a 2D array from nested lists
a7 = np.array([[1,2,3],[4,5,6],[7,8,9]])
a7

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

All NumPy arrays (both 1D and 2D) have a shape property, which is a tuple giving the size of each dimension; for a 2D array, that is the number of rows and columns. You can get this information by accessing the shape property of the array object.

In [23]:
# Example of shape property

# a 2D array with a single row (note the double brackets)
a1 = np.array([[1,2,3]])
rows,columns = a1.shape

print('a1 has', rows,'rows and', columns,'columns.')

a2 = np.array([[1,2,3],[4,5,6],[7,8,9]])
rows,columns = a2.shape
print('a2 has', rows,'rows and', columns,'columns.')

a2

a1 has 1 rows and 3 columns.
a2 has 3 rows and 3 columns.


array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
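
The difference between a true 1D array and a 2D array with a single row shows up in the shape tuple. The following sketch compares the two; the variable names are illustrative.

```python
import numpy as np

# One pair of brackets: a 1D array, whose shape has a single entry
flat = np.array([1, 2, 3])
print(flat.shape)

# Two pairs of brackets: a 2D array with one row and three columns
single_row = np.array([[1, 2, 3]])
print(single_row.shape)

# Unpacking must match the number of dimensions
(length,) = flat.shape
rows, columns = single_row.shape
print(length, rows, columns)
```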