# What is NumPy?

NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.

At the core of the NumPy package is the `ndarray` object. This encapsulates n-dimensional arrays of homogeneous data types, with many operations being performed in compiled code for performance. There are several important differences between NumPy arrays and the standard Python sequences:

- NumPy arrays have a **fixed** size at creation, unlike Python lists (which can grow dynamically). Changing the size of an ndarray will create a new array and delete the original.
- The elements in a NumPy array are all required to be of the **same** data type, and thus will be the same size in memory. The exception: one can have arrays of (Python, including NumPy) objects, thereby allowing for arrays of different sized elements.
- NumPy arrays facilitate advanced mathematical and other types of operations on **large** amounts of data. Typically, such operations are executed more **efficiently** and with less code than is possible using Python’s built-in sequences.
- A growing plethora of scientific and mathematical Python-based packages are using NumPy arrays; though these typically support Python-sequence input, they convert such input to NumPy arrays prior to processing, and they often output NumPy arrays. In other words, in order to efficiently use much (perhaps even most) of today’s scientific/mathematical Python-based software, just knowing how to use Python’s built-in sequence types is insufficient - one also needs to know how to use NumPy arrays.

## Why use NumPy?

Python lists are excellent, general-purpose containers. They can be “heterogeneous”, meaning that they can contain elements of a variety of types, and they are quite fast when used to perform individual operations on a handful of elements.

Depending on the characteristics of the data and the types of operations that need to be performed, other containers may be more appropriate; by exploiting these characteristics, we can improve speed, reduce memory consumption, and offer a high-level syntax for performing a variety of common processing tasks. NumPy shines when there are large quantities of “homogeneous” (same-type) data to be processed on the CPU.

## What is an “array”?

In computer programming, an array is a structure for storing and retrieving data. We often talk about an array as if it were a grid in space, with each cell storing one element of the data. For instance, if each element of the data were a number, we might visualize a “one-dimensional” array like a list:

![image.png](attachment:image.png)

A two-dimensional array would be like a table:

![image.png](attachment:image.png)



A three-dimensional array would be like a set of tables, perhaps stacked as though they were printed on separate pages. In NumPy, this idea is generalized to an arbitrary number of dimensions, and so the fundamental array class is called `ndarray`: it represents an “N-dimensional array”.

Most NumPy arrays have some restrictions. For instance:

- All elements of the array must be of the same type of data.
- Once created, the total size of the array can’t change.
- The shape must be “rectangular”, not “jagged”; e.g., each row of a two-dimensional array must have the same number of columns.

When these conditions are met, NumPy exploits these characteristics to make the array faster, more memory efficient, and more convenient to use than less restrictive data structures.
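
The first two restrictions can be observed directly. The following is a minimal sketch; the variable names `mixed`, `small`, and `bigger` are illustrative and not part of the original tutorial.

```python
import numpy as np

# Mixed int/float input is converted to one common data type.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)              # float64

# "Growing" an array builds a new array; the original keeps its size.
small = np.array([1, 2, 3])
bigger = np.append(small, 4)
print(small.size, bigger.size)  # 3 4
```

Because `np.append` allocates a new array on every call, building a large array by appending in a loop is usually much slower than collecting values in a Python list first and converting once.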

For the remainder of this document, we will use the word “array” to refer to an instance of ndarray.

## Loading NumPy

The `as` keyword creates an alias. By convention, the package `numpy` is imported under the shorter name `np`, so you then use `np` to call functions in `numpy`.

In [1]:
import numpy as np

## Array fundamentals

In [2]:
a = np.array([1, 2, 3, 4, 5])

print(a)

[1 2 3 4 5]


To access a single element, index the array by position (indices start at 0):

In [3]:
a[0]

1

The array is mutable, so you can modify an element in place:

In [4]:
a[0] = 10

print(a)

[10  2  3  4  5]


To select the elements from index 3 all the way to the end, use a slice:

In [5]:
b = a[3:]

print(b)

[4 5]
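
One point worth keeping in mind with slices: unlike slicing a Python list, slicing an array returns a view onto the same data rather than a copy. The sketch below uses illustrative names (`base`, `tail`, `tail_copy`) to show the difference.

```python
import numpy as np

base = np.array([10, 2, 3, 4, 5])

tail = base[3:]              # a view onto the last two elements of base
tail[0] = 40                 # writing through the view...
print(base)                  # ...changes base too: [10  2  3 40  5]

tail_copy = base[3:].copy()  # an independent copy
tail_copy[0] = -1
print(base)                  # base is unaffected: [10  2  3 40  5]
```

Call `.copy()` whenever you need a slice that can be modified without touching the original array.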


Two- and higher-dimensional arrays can be initialized from nested Python lists

In [6]:
arr_2d = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

print(arr_2d)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]


In NumPy, a dimension of an array is sometimes referred to as an “axis”. 

Another difference between an array and a list of lists is that an element of the array can be accessed by specifying the index along each axis within a single set of square brackets, separated by commas. For instance, the element `8` is in row `1` and column `3`:

In [7]:
arr_2d[1, 3]

8

Note: It is familiar practice in mathematics to refer to elements of a matrix by the row index first and the column index second.
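
To make the contrast with a list of lists concrete, here is a small sketch (the names `nested` and `grid` are illustrative). It also shows how a colon selects everything along an axis.

```python
import numpy as np

nested = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
grid = np.array(nested)

print(nested[1][3])  # list of lists: two separate indexing steps -> 8
print(grid[1, 3])    # array: one index per axis in a single bracket -> 8
print(grid[1])       # all of row 1: [5 6 7 8]
print(grid[:, 3])    # all of column 3: [ 4  8 12]
```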

## Array attributes

The number of dimensions of an array is contained in the `ndim` attribute.

In [8]:
arr_2d.ndim  # this is a 2-dimensional array

2

The `shape` of an array is a tuple of non-negative integers that specify the number of elements along each dimension.

In [9]:
arr_2d.shape #This array has 3 rows and 4 columns

(3, 4)

The fixed, total number of elements in an array is contained in the `size` attribute.

In [10]:
arr_2d.size

12

Arrays are typically “homogeneous”, meaning that they contain elements of only one “data type”. The data type is recorded in the `dtype` attribute.

In [11]:
arr_2d.dtype  # "int" for integer, "64" for 64-bit (the default integer width can vary by platform)

dtype('int64')
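
The data type can also be chosen explicitly when creating an array, or changed afterwards with `astype()`. A short sketch, with illustrative variable names:

```python
import numpy as np

ints = np.array([1, 2, 3])                      # dtype inferred from the input
floats = np.array([1, 2, 3], dtype=np.float64)  # dtype requested explicitly
tiny = np.array([1, 2, 3], dtype=np.int8)       # 8-bit integers

print(ints.dtype)      # the platform's default integer type, often int64
print(floats.dtype)    # float64
print(tiny.itemsize)   # bytes per element: 1

as_float32 = ints.astype(np.float32)            # astype returns a converted copy
print(as_float32.dtype)                         # float32
```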

## How to create a basic array

In [12]:
np.zeros(2) # an array of 2 zeros

array([0., 0.])

In [13]:
np.ones(5) # an array of 5 ones

array([1., 1., 1., 1., 1.])

In [14]:
np.arange(4) # an array of a sequence from 0 to 4 (excluding 4)

array([0, 1, 2, 3])

You can also create an array of evenly spaced values. To do this, specify the first number, the last number (which is excluded from the result), and the step size.

In [15]:
np.arange(2, 9, 2) # from 2 up to (but not including) 9, with a step of 2


array([2, 4, 6, 8])

You can also use `np.linspace()` to create an array with values that are spaced linearly in a specified interval:

In [16]:
np.linspace(0, 10, num=5) # between 0 and 10 have 5 numbers that are equally spaced

array([ 0. ,  2.5,  5. ,  7.5, 10. ])
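
The practical difference between the two functions is that `np.arange()` takes a step size and excludes the stop value, while `np.linspace()` takes a number of points and includes the stop value by default. A brief sketch with illustrative names:

```python
import numpy as np

stepped = np.arange(0, 10, 2.5)         # stop excluded: 0, 2.5, 5, 7.5
spaced = np.linspace(0, 10, num=5)      # stop included: 0, 2.5, 5, 7.5, 10
open_end = np.linspace(0, 10, num=4, endpoint=False)  # stop excluded on request

print(stepped)
print(spaced)
print(open_end)
```

When the step is not an integer, `np.linspace()` is generally the safer choice, because the number of elements is stated explicitly instead of depending on floating-point rounding.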

## Adding, removing, and sorting elements

Sorting an array is simple with `np.sort()`. You can specify the axis, kind, and order when you call the function.

In [17]:
messy_arr = np.array([2, 1, 5, 3, 7, 4, 6, 8])

In [18]:
np.sort(messy_arr)

array([1, 2, 3, 4, 5, 6, 7, 8])
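
The `axis` argument matters once the array has more than one dimension. The sketch below uses a small illustrative array named `table`; note that `np.sort()` returns a sorted copy and leaves its input unchanged.

```python
import numpy as np

table = np.array([[9, 1, 5],
                  [3, 7, 2]])

by_row = np.sort(table)                # default axis=-1: sort within each row
by_col = np.sort(table, axis=0)        # sort down each column
flat_sorted = np.sort(table, axis=None)  # flatten first, then sort

print(by_row)       # [[1 5 9]
                    #  [2 3 7]]
print(by_col)       # [[3 1 2]
                    #  [9 7 5]]
print(flat_sorted)  # [1 2 3 5 7 9]
print(table[0])     # the original is untouched: [9 1 5]
```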

To concatenate two arrays, use `np.concatenate()`:

In [19]:
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])

In [20]:
np.concatenate([a, b])

array([1, 2, 3, 4, 5, 6, 7, 8])

You can also concatenate arrays along a specified axis:

In [21]:
x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6]])

np.concatenate([x, y], axis=0) # axis=0 joins the arrays row-wise, so y is appended as a new row

array([[1, 2],
       [3, 4],
       [5, 6]])

You can stack arrays vertically with `np.vstack()` or horizontally with `np.hstack()`:

In [22]:
a1 = np.array([[1, 1],
               [2, 2]])

a2 = np.array([[3, 3],
               [4, 4]])

np.vstack([a1, a2])

array([[1, 1],
       [2, 2],
       [3, 3],
       [4, 4]])

In [23]:
np.hstack([a1, a2])

array([[1, 1, 3, 3],
       [2, 2, 4, 4]])
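
For two-dimensional inputs such as these, the stacking functions give the same results as `np.concatenate()` with the matching axis. A quick check, reusing the values of `a1` and `a2` from above:

```python
import numpy as np

a1 = np.array([[1, 1],
               [2, 2]])
a2 = np.array([[3, 3],
               [4, 4]])

vertical = np.vstack([a1, a2])     # shape (4, 2)
horizontal = np.hstack([a1, a2])   # shape (2, 4)

print(np.array_equal(vertical, np.concatenate([a1, a2], axis=0)))    # True
print(np.array_equal(horizontal, np.concatenate([a1, a2], axis=1)))  # True
```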

## Reshaping arrays

We can use `arr.reshape()` to give a new shape to an array without changing the data. Just remember that when you use the reshape method, the array you want to produce needs to have the same number of elements as the original array. If you start with an array with 12 elements, you’ll need to make sure that your new array also has a total of 12 elements.

If you start with this array:

In [24]:
a = np.arange(6)
print(a)

[0 1 2 3 4 5]


In [25]:
a.shape

(6,)

In [26]:
a.reshape(2,3) #2 rows and 3 columns

array([[0, 1, 2],
       [3, 4, 5]])

In [27]:
a.reshape(3,2) # 3 rows and 2 columns

array([[0, 1],
       [2, 3],
       [4, 5]])

In [28]:
# -1 tells NumPy to infer that dimension: 6 elements in 2 rows means 3 columns
a.reshape(2, -1)

array([[0, 1, 2],
       [3, 4, 5]])
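
Two details of `reshape()` are easy to verify: a shape with the wrong number of elements is rejected, and for a simple contiguous array like this one the result is a view that shares data with the original. The names `flat` and `grid` below are illustrative.

```python
import numpy as np

flat = np.arange(6)
grid = flat.reshape(2, 3)

# Here the reshaped array is a view onto the same data.
print(np.shares_memory(flat, grid))  # True

# Asking for 8 slots when there are only 6 elements raises ValueError.
try:
    flat.reshape(4, 2)
    failed = False
except ValueError as exc:
    failed = True
    print("reshape failed:", exc)
```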

## Indexing and slicing

You can index and slice NumPy arrays in the same ways you index and slice Python lists:

In [29]:
data = np.array([1, 2, 3, 4])

In [30]:
print(data[1])
print(data[0:2])
print(data[1:])
print(data[-2:])

2
[1 2]
[2 3 4]
[3 4]


If you want to select values from your array that fulfill certain conditions, it’s straightforward with NumPy.

In [31]:
a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

Get all of the values in the array that are less than 5.

In [32]:
print(a[a < 5])

[1 2 3 4]


All the numbers above 5

In [33]:
print(a[a > 5])

[ 6  7  8  9 10 11 12]


Or you can select elements that satisfy two conditions using the `&` (and) and `|` (or) operators:

In [34]:
# select the numbers greater than 2 and less than 11
condition = (a > 2) & (a < 11)

print(a[condition])

[ 3  4  5  6  7  8  9 10]
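
The `|` operator works the same way, and `~` negates a condition. Each comparison needs its own parentheses because `&` and `|` bind more tightly than `<` and `>`. A short sketch using an illustrative array named `values`:

```python
import numpy as np

values = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

outer = values[(values < 3) | (values > 10)]  # either condition holds
not_small = values[~(values < 5)]             # negation of a condition
rows, cols = np.nonzero(values < 5)           # positions where the condition holds

print(outer)       # [ 1  2 11 12]
print(not_small)   # [ 5  6  7  8  9 10 11 12]
print(rows, cols)  # [0 0 0 0] [0 1 2 3]
```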


## Basic array operations

For arrays with the same shape, arithmetic operations are applied element-wise:

In [35]:
data = np.array([1, 2])

ones = np.ones(2) # array([1., 1.])

data_add_ones = data + ones

data_minus_ones = data - ones

print(data_add_ones)
print(data_minus_ones)

[2. 3.]
[0. 1.]


In [36]:
print(data * data)

[1 4]
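
This element-wise behavior differs from Python lists, where `+` concatenates and `*` repeats. The comparison below is a small sketch with illustrative names:

```python
import numpy as np

py_list = [1, 2]
np_arr = np.array([1, 2])

print(py_list + py_list)  # lists concatenate: [1, 2, 1, 2]
print(np_arr + np_arr)    # arrays add element-wise: [2 4]

print(py_list * 2)        # lists repeat: [1, 2, 1, 2]
print(np_arr * 2)         # arrays multiply each element: [2 4]
```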


Sum all the elements in an array:

In [37]:
b = np.array([[1, 1], [2, 2]])

np.sum(b)

6

Sum along an axis

In [38]:
b.sum(axis=0)

array([3, 3])

It is worth mentioning that `np.sum(b, axis=0)` is equivalent to `b.sum(axis=0)`.

In [39]:
b.sum(axis=1)

array([2, 4])

Some other useful functions:

In [40]:
print(np.max(b))
print(np.min(b))
print(np.mean(b))

2
1
1.5


Operations can be applied to a specific axis

In [41]:
print(np.max(b, axis=0)) # maximum over all rows in each column
print(np.min(b, axis=0)) # minimum over all rows in each column
print(np.mean(b, axis=0)) # mean over all rows in each column

[2 2]
[1 1]
[1.5 1.5]
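
Because `b` is square, it is hard to see which axis is being collapsed. A non-square array makes it clearer: the axis you name is the one that disappears from the result. The array `scores` below is illustrative.

```python
import numpy as np

scores = np.array([[1, 2, 3],
                   [4, 5, 6]])      # shape (2, 3)

col_totals = scores.sum(axis=0)     # collapses the 2 rows, leaving 3 values
row_totals = scores.sum(axis=1)     # collapses the 3 columns, leaving 2 values

print(col_totals, col_totals.shape)  # [5 7 9] (3,)
print(row_totals, row_totals.shape)  # [ 6 15] (2,)
```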


## Generating random numbers

In [42]:
np.random.uniform() # a sample from the uniform distribution on [0, 1)

0.9938720536633547

In [43]:
np.random.uniform(size=10) # with size of 10

array([0.01432097, 0.94342553, 0.95038139, 0.379027  , 0.28078461,
       0.89778769, 0.83951024, 0.65497895, 0.13702502, 0.46161897])

In [44]:
np.random.normal() # from standard normal distribution with 0 mean and std of 1

1.433148534584288

In [45]:
np.random.normal(size=10) # with size of 10

array([ 1.20153157,  2.51306873,  1.06530408,  0.52067429, -0.89987521,
        1.04300725, -0.50070828,  0.85144937, -0.53259375, -0.4463089 ])

### Use a random seed to make the generated random numbers reproducible

In [46]:
np.random.seed(11)

np.random.randn()

1.7494547413051793
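
Setting the same seed again restarts the same sequence, which is what makes results repeatable. The sketch below demonstrates this and also shows `np.random.default_rng()`, the generator interface available in newer NumPy versions; using it here is a suggestion, not something the original tutorial covers.

```python
import numpy as np

np.random.seed(11)
first = np.random.uniform(size=3)

np.random.seed(11)                    # same seed, so the same sequence restarts
second = np.random.uniform(size=3)
print(np.array_equal(first, second))  # True

# A separate generator object carries its own seed and state.
rng = np.random.default_rng(11)
draws = rng.normal(size=3)
print(draws.shape)                    # (3,)
```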

## How to get unique items and counts

In [47]:
a = np.array([11, 11, 12, 13, 14, 15, 16, 17, 12, 13, 11, 14, 18, 19, 20])

In [48]:
unique_values = np.unique(a)
print(unique_values)

[11 12 13 14 15 16 17 18 19 20]
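
To get the counts as well, pass `return_counts=True`; `return_index=True` gives the position of the first occurrence of each unique value. A short sketch reusing the same data under the illustrative name `items`:

```python
import numpy as np

items = np.array([11, 11, 12, 13, 14, 15, 16, 17, 12, 13, 11, 14, 18, 19, 20])

values, counts = np.unique(items, return_counts=True)
print(values)   # [11 12 13 14 15 16 17 18 19 20]
print(counts)   # [3 2 2 2 1 1 1 1 1 1]

_, first_index = np.unique(items, return_index=True)
print(first_index)  # [ 0  2  3  4  5  6  7 12 13 14]
```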


For a 2-D array, `np.unique()` returns the unique values of the flattened array by default; pass the `axis` argument to find unique rows or columns instead:

In [49]:
a_2d = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [1, 2, 3, 4]])

In [50]:
unique_values = np.unique(a_2d)
print(unique_values)

[ 1  2  3  4  5  6  7  8  9 10 11 12]


In [51]:
unique_rows = np.unique(a_2d, axis=0) # axis=0 returns the unique rows; the duplicate row [1, 2, 3, 4] is dropped
print(unique_rows)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]


## Transposing

In [52]:
a = np.array([[1, 2, 3],
              [4, 5, 6]])

In [53]:
a

array([[1, 2, 3],
       [4, 5, 6]])

In [54]:
a.T

array([[1, 4],
       [2, 5],
       [3, 6]])
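
Transposing swaps the axes, so element `[i, j]` of the original becomes element `[j, i]` of the result. The sketch below (with illustrative names `m` and `t`) also shows that `.T` is a view and matches `np.transpose()`.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])
t = m.T

print(m.shape, t.shape)                    # (2, 3) (3, 2)
print(m[0, 2], t[2, 0])                    # 3 3
print(np.shares_memory(m, t))              # True: no data is copied
print(np.array_equal(t, np.transpose(m)))  # True
```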

## Reversing

In [55]:
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

In [56]:
np.flip(arr)

array([8, 7, 6, 5, 4, 3, 2, 1])

For a 2D array:

In [57]:
arr_2d = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

arr_2d

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

Reversing it reverses the elements along every axis while keeping the shape of the array:

In [58]:
np.flip(arr_2d)

array([[12, 11, 10,  9],
       [ 8,  7,  6,  5],
       [ 4,  3,  2,  1]])

Reverse along an axis

In [59]:
np.flip(arr_2d, axis=0) # Reverse the order of the rows (the elements in each column are reversed)

array([[ 9, 10, 11, 12],
       [ 5,  6,  7,  8],
       [ 1,  2,  3,  4]])

In [60]:
np.flip(arr_2d, axis=1) # Reverse all elements in each row

array([[ 4,  3,  2,  1],
       [ 8,  7,  6,  5],
       [12, 11, 10,  9]])

## Flattening

In [61]:
arr_2d.flatten()

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])
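
`flatten()` always returns a copy, so changing the result does not affect the original. NumPy also provides `ravel()`, which returns a view when it can; the original tutorial does not cover it, so treat this as a supplementary sketch with illustrative names.

```python
import numpy as np

grid = np.array([[1, 2],
                 [3, 4]])

flat_copy = grid.flatten()  # independent copy
flat_copy[0] = 99
print(grid[0, 0])           # 1: the original is unchanged

flat_view = grid.ravel()    # a view for a contiguous array like this one
flat_view[0] = 99
print(grid[0, 0])           # 99: the change shows up in the original
```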

## Use the `help()` function to see the documentation

In [62]:
help(np.sort)

Help on function sort in module numpy:

sort(a, axis=-1, kind=None, order=None)
    Return a sorted copy of an array.
    
    Parameters
    ----------
    a : array_like
        Array to be sorted.
    axis : int or None, optional
        Axis along which to sort. If None, the array is flattened before
        sorting. The default is -1, which sorts along the last axis.
    kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, optional
        Sorting algorithm. The default is 'quicksort'. Note that both 'stable'
        and 'mergesort' use timsort or radix sort under the covers and, in general,
        the actual implementation will vary with data type. The 'mergesort' option
        is retained for backwards compatibility.
    
        .. versionchanged:: 1.15.0.
           The 'stable' option was added.
    
    order : str or list of str, optional
        When `a` is an array with fields defined, this argument specifies
        which fields to compare first, second, etc.  A si