<a href="https://colab.research.google.com/github/DalsinaLorda/Chordzy/blob/main/aibootcamp_numpy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![](https://drive.google.com/uc?export=view&id=15_DdTY6GOFnj4oLK3vf9zS6UQoL4osGW)

# NumPy

NumPy (short for Numerical Python) provides an efficient interface to store and operate on dense data buffers. In some ways, NumPy arrays are like Python's built-in list type, but NumPy arrays provide much more efficient storage and data operations as the arrays grow larger in size. NumPy arrays form the core of nearly the entire ecosystem of data science tools in Python, so time spent learning to use NumPy effectively will be valuable no matter what aspect of data science interests you.

If you followed the advice outlined in the Preface and installed the Anaconda stack, you already have NumPy installed and ready to go. If you're more the do-it-yourself type, you can go to http://www.numpy.org/ and follow the installation instructions found there. Once you do, you can import NumPy and double-check the version:

## Libraries


In [None]:
import numpy as np
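
The version check mentioned above can be done with the `__version__` attribute. A minimal sketch (the variable name `version` is only illustrative, and the printed value depends on your installation):

```python
import numpy as np

# __version__ is a plain string such as "1.26.4"; the exact value varies
version = np.__version__
print("NumPy version:", version)
```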

## Python Data Type Representation

An integer in Python 3.4 is represented in C as follows:

```
struct _longobject {
    long ob_refcnt;
    PyTypeObject *ob_type;
    size_t ob_size;
    long ob_digit[1];
};
```

And the four pieces of data it contains are:

*   **ob_refcnt**, a reference count that helps Python silently handle memory allocation and deallocation
*   **ob_type**, which encodes the type of the variable
*   **ob_size**, which specifies the size of the following data members
*   **ob_digit**, which contains the actual integer value that we expect the Python variable to represent.


This means that there is some overhead in storing an integer in Python as compared to an integer in a compiled language like C, as illustrated in the following figure:

![](https://drive.google.com/uc?export=view&id=1KFT_BZZr-jg1NiGsMiJ32XwlZPWhsN8H)

Here PyObject_HEAD is the part of the structure containing the reference count, type code, and other pieces mentioned before.

This extra information in the Python integer structure is what allows Python to be coded so freely and dynamically. All this additional information in Python types comes at a cost, however, which becomes especially apparent in structures that combine many of these objects.

The overhead becomes more pronounced when working with lists of items. For example, the figure below shows the difference between a NumPy array and a Python list:


![](https://drive.google.com/uc?export=view&id=1s3r--lKjf5qjJLoK54TMTakRRz7bsN3w)
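
The per-element overhead can be observed directly. The sketch below is a rough illustration, assuming CPython (where `sys.getsizeof` reports the size of a single object); the names `py_list` and `np_array` are illustrative, and exact byte counts for Python integers vary by platform and version:

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# One Python int object carries a reference count, a type pointer, and a size field
int_object_bytes = sys.getsizeof(py_list[-1])
# One int64 array element is just the raw 8-byte value
array_item_bytes = np_array.itemsize

print("bytes per Python int object:", int_object_bytes)
print("bytes per array element:    ", array_item_bytes)
print("total array data bytes:     ", np_array.nbytes)
```

The list additionally stores one pointer per element, so its total footprint is larger still than the sum of its integer objects.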



## Data Types

NumPy arrays contain values of a single type, so it is important to have detailed knowledge of what those types are and what their limitations are. Because NumPy is built in C, the types will be familiar to users of C, Fortran, and other related languages.

| Data type | Description |
|---|---|
| `bool_` | Boolean (True or False) stored as a byte |
| `int_` | Default integer type (same as C long; normally either int64 or int32) |
| `intc` | Identical to C int (normally int32 or int64) |
| `intp` | Integer used for indexing (same as C ssize_t; normally either int32 or int64) |
| `int8` | Byte (-128 to 127) |
| `int16` | Integer (-32768 to 32767) |
| `int32` | Integer (-2147483648 to 2147483647) |
| `int64` | Integer (-9223372036854775808 to 9223372036854775807) |
| `uint8` | Unsigned integer (0 to 255) |
| `uint16` | Unsigned integer (0 to 65535) |
| `uint32` | Unsigned integer (0 to 4294967295) |
| `uint64` | Unsigned integer (0 to 18446744073709551615) |
| `float_` | Shorthand for float64 |
| `float16` | Half precision float: sign bit, 5 bits exponent, 10 bits mantissa |
| `float32` | Single precision float: sign bit, 8 bits exponent, 23 bits mantissa |
| `float64` | Double precision float: sign bit, 11 bits exponent, 52 bits mantissa |
| `complex_` | Shorthand for complex128 |
| `complex64` | Complex number, represented by two 32-bit floats |
| `complex128` | Complex number, represented by two 64-bit floats |
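
The limits listed above can be queried at runtime rather than memorized. A minimal sketch using `np.iinfo` and `np.finfo` (the variable names are illustrative):

```python
import numpy as np

# Integer limits for a given dtype
int8_info = np.iinfo(np.int8)
print(int8_info.min, int8_info.max)      # -128 127

uint16_max = np.iinfo(np.uint16).max
print(uint16_max)                        # 65535

# Number of bits used by a floating-point dtype
float32_bits = np.finfo(np.float32).bits
print(float32_bits)                      # 32

# The dtype chosen at creation time determines the storage per element
small = np.array([1, 2, 3], dtype=np.int8)
print(small.dtype, small.itemsize)       # int8 1
```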


## Arrays



**NumPy Array vs List**

NumPy Array:
*   Fixed data type
*   Can change data type
*   No per-element type checking needed
*   Contiguous memory
*   E.g.: a = np.array([1, 3, 4, 5])


List:
*   Heterogeneous data types
*   Can't change data type
*   Per-element type checking needed
*   Non-contiguous memory
*   E.g.: a = [1, 3, 'abc', 5]


![](https://drive.google.com/uc?export=view&id=1WyUBdIrAMiuo3M6CWbLe7PnYSG6xgbjV)

NumPy arrays can have any number of dimensions and different lengths along each dimension. We can inspect the length along each dimension using the `.shape` attribute of an array.
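
As a quick sketch of what `.shape` reports for arrays of one, two, and three dimensions (the names `vector`, `matrix`, and `cube` are illustrative):

```python
import numpy as np

vector = np.array([1, 2, 3])                  # 1 dimension, length 3
matrix = np.array([[1, 2, 3], [4, 5, 6]])     # 2 rows, 3 columns
cube = np.zeros((2, 3, 4))                    # 2 blocks of 3 rows and 4 columns

print(vector.shape)  # (3,)
print(matrix.shape)  # (2, 3)
print(cube.shape)    # (2, 3, 4)
```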

### Creation

The first step in manipulating arrays is creating them. NumPy provides various functions to create arrays with different shapes and data types. Here are some of the most commonly used array creation functions:


```
np.array: creates an array from a Python list or tuple
np.zeros: creates an array filled with zeros
np.ones: creates an array filled with ones
np.empty: creates an uninitialized array whose contents are arbitrary (whatever was already in memory)
np.arange: creates an array of evenly spaced values with a given step
np.linspace: creates an array with a specified number of evenly spaced values between a start and end point
```

Arrays in NumPy (a.k.a. ndarray) can be created by passing a Python list to `np.array()`.

In [8]:
# create a numpy array from a python list
import numpy as np
my_array = np.array( [1,2,3,4,5] )
print(my_array)

# create a numpy array from a python list, specifying an integer data type
my_array = np.array( [1,2,3,4,5], dtype=int )
print(my_array)

# create a numpy array from a python list and specifying the data type
my_array = np.array( [1,2,3,4,5], dtype='float32')
print(my_array)

# create a numpy array of 5 elements initialized to the numbers 0 through 4
my_array = np.arange(5)
print(my_array)

# create an array of 5 floating-point zeros
print(np.zeros(5))


[1 2 3 4 5]
[1 2 3 4 5]
[1. 2. 3. 4. 5.]
[0 1 2 3 4]
[0. 0. 0. 0. 0.]


There are often cases when we want NumPy to initialize the values of the array for us. NumPy provides functions like ones(), zeros(), and random.random() for these cases. We just pass them the number of elements we want to generate:

In [None]:
# create an array of 5 integers initialized to 1
my_array = np.ones(5, dtype=int)
print(my_array)

[1 1 1 1 1]


In [9]:
# create an array of 5 integers initialized to 0
my_array = np.zeros(5, dtype=int)
print(my_array)


[0 0 0 0 0]


In [None]:
# create an array of 5 random floating-point numbers in the interval [0, 1)
my_array = np.random.random(5)
print(my_array)

[0.44296732 0.76865506 0.79510828 0.11946297 0.32009556]


In [10]:
# Create an array of 5 elements filled with 3.14
my_array = np.full(5,  3.14)
print(my_array)

[3.14 3.14 3.14 3.14 3.14]


In [None]:
# Create an array filled with a linear sequence
# Starting at 0, ending at 20, stepping by 2
# (this is similar to the built-in range() function)
my_array = np.arange(0, 20, 2)
print(my_array)

[ 0  2  4  6  8 10 12 14 16 18]


In [None]:
# Create an array of five values evenly spaced between 0 and 1
my_array = np.linspace(0, 1, 5)
print(my_array)

[0.   0.25 0.5  0.75 1.  ]


In [None]:
# Create an uninitialized array of five floats (np.empty defaults to float64)
# The values will be whatever happens to already exist at that memory location
my_array = np.empty(5)
print(my_array)

[3.14 3.14 3.14 3.14 3.14]


Finally, unlike Python lists, NumPy arrays can explicitly be multi-dimensional; here's one way of initializing a multidimensional array, by passing a shape tuple to a creation function:


In [12]:
# Create a 3x5 integer array filled with ones
my_array = np.ones((3, 5), dtype=int)
print(my_array)

# Create a 3x5 array filled with 6
my_array = np.full((3, 5), 6)
print(my_array)

[[1 1 1 1 1]
 [1 1 1 1 1]
 [1 1 1 1 1]]
[[6 6 6 6 6]
 [6 6 6 6 6]
 [6 6 6 6 6]]
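
Multidimensional arrays can also be built from nested lists. A small sketch, in which each inner list (generated here with `range`, purely for illustration) becomes one row:

```python
import numpy as np

# Each inner list becomes a row of the resulting 2-D array
grid = np.array([list(range(i, i + 3)) for i in [2, 4, 6]])
print(grid)
# [[2 3 4]
#  [4 5 6]
#  [6 7 8]]
print(grid.shape)  # (3, 3)
```

All inner lists need the same length for NumPy to produce a regular two-dimensional array.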


### Array Attributes

First let's discuss some useful array attributes. We'll start by defining three random arrays, a one-dimensional, two-dimensional, and three-dimensional array. We'll use NumPy's random number generator, which we will seed with a set value in order to ensure that the same random arrays are generated each time this code is run:


In [14]:
np.random.seed(0)  # seed for reproducibility

x1 = np.random.randint(10, size=6)          # One-dimensional array
x2 = np.random.randint(10, size=(3, 4))     # Two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5))  # Three-dimensional array

In [15]:
print("x3 ndim: ", x3.ndim)               # the number of dimensions
print("x3 shape:", x3.shape)              # the size of each dimension
print("x3 size: ", x3.size)               # the total size of the array
print("dtype:", x3.dtype)                 # the data type of the array
print("itemsize:", x3.itemsize, "bytes")  # the size (in bytes) of each array element
print("nbytes:", x3.nbytes, "bytes")      # the total size (in bytes) of the array

x3 ndim:  3
x3 shape: (3, 4, 5)
x3 size:  60
dtype: int64
itemsize: 8 bytes
nbytes: 480 bytes


### Indexing

In [None]:
x = np.array([5, 0, 3, 3, 7, 9])
print(x)

[5 0 3 3 7 9]


In [None]:
y1 = x[0]  # access first element
y2 = x[4]  # access fifth element
y3 = x[-1] # access last element
y4 = x[-2] # access second-to-last element
print(y1)
print(y2)
print(y3)
print(y4)

5
7
9
7


In [None]:
x = np.random.randint(10, size=(3, 4)) # create a 2 dimensional array
print( x )

[[4 3 4 4]
 [8 4 3 7]
 [5 5 0 1]]


In [None]:
y1 = x[0, 0] # access element at row 0, column 0
y2 = x[2, 1] # access element at row 2, column 1
y3 = x[2, -1] # access element at row 2, and last column
print( y1 )
print( y2 )
print( y3 )

4
5
1


### Slicing

Just as we can use square brackets to access individual array elements, we can also use them to access subarrays with the slice notation, marked by the colon (:) character. The NumPy slicing syntax follows that of the standard Python list; to access a slice of an array x, use this:


```
x[start:stop:step]
```

If any of these are unspecified, they default to the values start=0, stop=size of dimension, step=1. We'll take a look at accessing sub-arrays in one dimension and in multiple dimensions.

In [None]:
x = np.arange(10)
print(x)

[0 1 2 3 4 5 6 7 8 9]


In [None]:
y1 = x[:5]  # first five elements
y2 = x[5:]  # elements from index 5 onward
y3 = x[4:7]  # middle sub-array
y4 = x[::2]  # every other element
y5 = x[1::2]  # every other element, starting at index 1
y6 = x[::-1]  # all elements, reversed
y7 = x[5::-2]  # reversed every other from index 5

print(y1)
print(y2)
print(y3)
print(y4)
print(y5)
print(y6)
print(y7)

[0 1 2 3 4]
[5 6 7 8 9]
[4 5 6]
[0 2 4 6 8]
[1 3 5 7 9]
[9 8 7 6 5 4 3 2 1 0]
[5 3 1]


In [None]:
x = np.random.randint(10, size=(3, 4))  # Two-dimensional array
print(x)

[[7 7 5 3]
 [7 4 0 8]
 [9 1 8 9]]


In [None]:
# two rows, three columns
y = x[:2, :3]
print(y)

[[7 7 5]
 [7 4 0]]


In [None]:
# all rows, every other column
y = x[:3, ::2]
print(y)

[[7 5]
 [7 0]
 [9 8]]


In [None]:
# Finally, subarray dimensions can even be reversed together:
y = x[::-1, ::-1]
print(y)

[[9 8 1 9]
 [8 0 4 7]
 [3 5 7 7]]


**Accessing array rows and columns**

One commonly needed routine is accessing single rows or columns of an array. This can be done by combining indexing and slicing, using an empty slice marked by a single colon (:):

In [None]:
print(x[:, 0])  # first column of x
print(x[0, :])  # first row of x
print(x[0])     # equivalent to x[0, :]

[7 7 9]
[7 7 5 3]
[7 7 5 3]


**Subarrays as no-copy views**

One important and extremely useful thing to know about array slices is that they return views rather than copies of the array data. This is one area in which NumPy array slicing differs from Python list slicing: in lists, slices are copies. Consider a two-dimensional array:

In [None]:
x = np.random.randint(10, size=(3, 4))  # Two-dimensional array
print(x)

[[2 7 8 5]
 [2 4 4 0]
 [9 6 3 9]]


In [None]:
# Let's extract a 2×2 subarray from x:
x_sub = x[:2, :2]
print(x_sub)

[[2 7]
 [2 4]]


Now if we modify this subarray, we'll see that the original array is changed! Observe in the following cell. This default behavior is actually quite useful: it means that when we work with large datasets, we can access and process pieces of these datasets without the need to copy the underlying data buffer.



In [None]:
x_sub[0, 0] = 99
print(x_sub)

[[99  7]
 [ 2  4]]


In [None]:
print(x)

[[99  7  8  5]
 [ 2  4  4  0]
 [ 9  6  3  9]]
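
The contrast with list slicing can be sketched side by side. This example uses `np.shares_memory` to confirm that a slice and its parent array use the same buffer; the names `data`, `view`, `items`, and `piece` are illustrative:

```python
import numpy as np

# NumPy: a slice is a view onto the same memory
data = np.arange(6)
view = data[:3]
view[0] = 99
print(data[0])                        # 99, the original changed
print(np.shares_memory(data, view))   # True

# Python list: a slice is an independent copy
items = [0, 1, 2, 3, 4, 5]
piece = items[:3]
piece[0] = 99
print(items[0])                       # 0, the original is untouched
```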


### Creating copies of arrays

In [None]:
x = np.random.randint(10, size=(3, 4))  # Two-dimensional array
print(x)

[[8 5 6 3]
 [2 9 4 1]
 [7 1 1 4]]


In [None]:
x_sub_copy = x[:2, :2].copy()
print(x_sub_copy)

[[8 5]
 [2 9]]


In [None]:
# If we now modify this subarray, the original array is not touched:
x_sub_copy[0, 0] = 42
print(x_sub_copy)

[[42  5]
 [ 2  9]]


In [None]:
print(x)

[[8 5 6 3]
 [2 9 4 1]
 [7 1 1 4]]


### Reshaping of Arrays

NumPy arrays can be reshaped from one shape to another as long as the total number of elements stays the same. Here is an example:

In [None]:
# create an array of integers initialized to the numbers 1 through 16
my_array = np.arange(1, 17)
print('shape:', my_array.shape)
print('array:')
print(my_array)
print()

# reshape the array to a 4x4 grid
my_array = my_array.reshape((4, 4))
print('shape:', my_array.shape)
print('array:')
print(my_array)

shape: (16,)
array:
[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16]

shape: (4, 4)
array:
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]
 [13 14 15 16]]
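
The size requirement can be checked with a short sketch. It also uses `-1` as a placeholder dimension, which asks NumPy to infer that length from the total size; the names `values` and `grid` are illustrative:

```python
import numpy as np

values = np.arange(12)

# -1 lets NumPy infer the missing dimension: 12 elements / 3 rows = 4 columns
grid = values.reshape((3, -1))
print(grid.shape)  # (3, 4)

# 12 elements cannot fill a 5x3 grid, so reshape raises a ValueError
reshape_failed = False
try:
    values.reshape((5, 3))
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```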


**Reshaping a one-dimensional array into a row vector or a column vector**

In [None]:
x = np.array([1, 2, 3])

# row vector via reshape
x_row = x.reshape((1, 3))
print(x_row)
print(x_row.shape)

[[1 2 3]]
(1, 3)


In [None]:
# row vector via newaxis
x_row = x[np.newaxis, :]
print(x_row)
print(x_row.shape)

[[1 2 3]]
(1, 3)


In [None]:
# column vector via reshape
x_col = x.reshape((3, 1))
print(x_col)
print(x_col.shape)

[[1]
 [2]
 [3]]
(3, 1)


In [None]:
# column vector via newaxis
x_col = x[:, np.newaxis]
print(x_col)
print(x_col.shape)

[[1]
 [2]
 [3]]
(3, 1)


### Array Concatenation and Splitting

NumPy can combine multiple arrays into one and, conversely, split a single array into multiple arrays.

In [None]:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
z = np.concatenate([x, y])
print(z)

[1 2 3 3 2 1]


In [None]:
# You can also concatenate more than two arrays at once:
z = [99, 99, 99]
y = np.concatenate([x, y, z])
print(y)


[ 1  2  3  3  2  1 99 99 99]


In [None]:
# It can also be used for two-dimensional arrays:
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

# concatenate along the first axis
g = np.concatenate([grid, grid])

print(g)

[[1 2 3]
 [4 5 6]
 [1 2 3]
 [4 5 6]]


In [None]:
# concatenate along the second axis (zero-indexed)
g = np.concatenate([grid, grid], axis=1)
print(g)

[[1 2 3 1 2 3]
 [4 5 6 4 5 6]]


For working with arrays of mixed dimensions, it can be clearer to use the np.vstack (vertical stack) and np.hstack (horizontal stack) functions:

In [None]:
x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

# vertically stack the arrays
g = np.vstack([x, grid])

print(g)

[[1 2 3]
 [9 8 7]
 [6 5 4]]


In [None]:
# horizontally stack the arrays
y = np.array([[99],
              [99]])
g = np.hstack([grid, y])

print(g)

[[ 9  8  7 99]
 [ 6  5  4 99]]


### Splitting of arrays

The opposite of concatenation is splitting, which is implemented by the functions np.split, np.hsplit, and np.vsplit. For each of these, we can pass a list of indices giving the split points:

In [None]:
x = [1, 2, 3, 99, 99, 3, 2, 1]

# Notice that N split points
# lead to N + 1 subarrays. The
# related functions np.hsplit and
# np.vsplit work similarly.
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3)

[1 2 3] [99 99] [3 2 1]


In [None]:
grid = np.arange(16).reshape((4, 4))
print(grid)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]


In [None]:
upper, lower = np.vsplit(grid, [2])
print(upper)
print(lower)

[[0 1 2 3]
 [4 5 6 7]]
[[ 8  9 10 11]
 [12 13 14 15]]


In [None]:
left, right = np.hsplit(grid, [2])
print(left)
print(right)

[[ 0  1]
 [ 4  5]
 [ 8  9]
 [12 13]]
[[ 2  3]
 [ 6  7]
 [10 11]
 [14 15]]


### Array arithmetic

NumPy applies arithmetic operators to arrays element by element. Keep in mind that, unlike Python lists, NumPy arrays have a fixed type. This means, for example, that if you attempt to insert a floating-point value into an integer array, the value will be silently truncated. Don't be caught unaware by this behavior!
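
A minimal sketch of that truncation, using a hypothetical integer array named `counts`:

```python
import numpy as np

counts = np.array([1, 2, 3])   # integer dtype is inferred from the values
counts[0] = 3.14159            # the fractional part is dropped without warning
print(counts)                  # [3 2 3]
print(counts.dtype.kind)       # 'i' (still an integer array)
```

If fractional values are needed, create the array with a floating-point dtype from the start, for example `np.array([1, 2, 3], dtype=float)`.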

In [None]:
x = np.arange(4)
print("x     =", x)
print("x + 5 =", x + 5)
print("x - 5 =", x - 5)
print("x * 2 =", x * 2)
print("x / 2 =", x / 2)
print("x // 2 =", x // 2)  # floor division
print("-x     = ", -x)
print("x ** 2 = ", x ** 2)
print("x % 2  = ", x % 2)

x     = [0 1 2 3]
x + 5 = [5 6 7 8]
x - 5 = [-5 -4 -3 -2]
x * 2 = [0 2 4 6]
x / 2 = [0.  0.5 1.  1.5]
x // 2 = [0 0 1 1]
-x     =  [ 0 -1 -2 -3]
x ** 2 =  [0 1 4 9]
x % 2  =  [0 1 0 1]


In [None]:
# Absolute value
x = np.array([-2, -1, 0, 1, 2])
y = np.abs(x)
print(y)

[2 1 0 1 2]


### Array Aggregation Functions

In [None]:
x = np.random.random( (4,4) )
print("x:")
print(x)
print("min = ", np.min(x) )
print("max = ", np.max(x) )
print("sum = ", np.sum(x) )
print("std = ", np.std(x) )
print("mean = ", np.mean(x) )
print("average = ", np.average(x) )
print("min axis 0 = ", np.min(x, axis=0) )
print("min axis 1 = ", np.min(x, axis=1) )


x:
[[0.08357819 0.54177736 0.92043156 0.22793658]
 [0.92179611 0.562045   0.94920277 0.13207103]
 [0.82430124 0.12445742 0.69142378 0.81835372]
 [0.08844248 0.8824531  0.48930418 0.7535714 ]]
min =  0.08357819423313484
max =  0.9492027680242644
sum =  9.011145914786976
std =  0.3206641113943162
mean =  0.563196619674186
average =  0.563196619674186
min axis 0 =  [0.08357819 0.12445742 0.48930418 0.13207103]
min axis 1 =  [0.08357819 0.13207103 0.12445742 0.08844248]
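
Because the array above is random, the effect of the `axis` argument is easier to follow on a small fixed array. A sketch with an illustrative 2x3 array `m`:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

# axis=0 collapses the rows, giving one result per column
col_sums = m.sum(axis=0)
print(col_sums)   # [5 7 9]

# axis=1 collapses the columns, giving one result per row
row_sums = m.sum(axis=1)
print(row_sums)   # [ 6 15]

# with no axis, the aggregate covers the whole array
print(m.sum())    # 21
```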


### Array Broadcasting

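Broadcasting in NumPy allows element-wise operations on arrays of different shapes. Shapes are compared dimension by dimension, starting from the trailing one; two dimensions are compatible when they are equal or when one of them is 1, and the smaller array is conceptually stretched to match the larger one.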



In [None]:
a = np.array([0, 1, 2])
b = np.array([5, 5, 5])
print('a:',a)
print('b:',b)

a: [0 1 2]
b: [5 5 5]


In [None]:
c = a + 5
print('a:',a)
print('c:',c)

a: [0 1 2]
c: [5 6 7]


In [None]:
m = np.ones((3, 3))
c = m + a
print('a:',a)
print('m:')
print(m)
print('c:')
print(c)

a: [0 1 2]
m:
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
c:
[[1. 2. 3.]
 [1. 2. 3.]
 [1. 2. 3.]]
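
Broadcasting can also stretch both operands at once, and it fails when the shapes cannot be matched. A short sketch with illustrative names `row`, `col`, and `table`:

```python
import numpy as np

row = np.arange(3)                   # shape (3,)
col = np.arange(3)[:, np.newaxis]    # shape (3, 1)

# (3, 1) and (3,) are both stretched to (3, 3)
table = row + col
print(table)
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]]

# Shapes (3, 2) and (3,) do not match in the trailing dimension (2 vs 3)
broadcast_failed = False
try:
    np.ones((3, 2)) + np.arange(3)
except ValueError as err:
    broadcast_failed = True
    print("broadcast failed:", err)
```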


### Comparisons


In [None]:
x = np.array([1, 2, 3, 4, 5])

In [None]:
c = x < 3  # less than
print(c)

[ True  True False False False]


In [None]:
c = x > 3  # greater than
print(c)

[False False False  True  True]


### Filtering an Array

You can filter a NumPy array with a list or array of boolean values that indicates, position by position, whether to keep each element. This technique is called boolean masking (or boolean indexing). For example, filtering the array [1, 2, 3] with the boolean list [True, False, True] gives [1, 3].

In [None]:
# create a numpy array
arr = np.array([1, 4, 2, 7, 9, 3, 5, 8])

# boolean array of which elements to keep, here elements less than 4
mask = arr < 4

# filter the array
arr_filtered = arr[mask]

# above filtering in a single line
arr_filtered = arr[arr < 4]

print('mask:', mask)
print('arr:', arr)
print('arr_filtered:', arr_filtered)

mask: [ True False  True False False  True False False]
arr: [1 4 2 7 9 3 5 8]
arr_filtered: [1 2 3]


Alternatively, you can use np.where() to get the indexes of the elements to keep and filter the NumPy array based on those indexes. For example:

In [None]:
# create a numpy array
arr = np.array([1, 4, 2, 7, 9, 3, 5, 8])

# indexes to keep based on the condition, here elements less than 4
indexes_to_keep = np.where(arr < 4)

# filter the array
arr_filtered = arr[indexes_to_keep]

# above filtering in a single line
arr_filtered = arr[np.where(arr < 4)]

print('indexes_to_keep:', indexes_to_keep)
print('arr:', arr)
print('arr_filtered:', arr_filtered)

indexes_to_keep: (array([0, 2, 5]),)
arr: [1 4 2 7 9 3 5 8]
arr_filtered: [1 2 3]
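
`np.where` also has a three-argument form, `np.where(condition, a, b)`, which keeps the array's length and picks from `a` where the condition holds and from `b` elsewhere. This is a related use rather than the filtering shown above; the name `replaced` is illustrative:

```python
import numpy as np

arr = np.array([1, 4, 2, 7, 9, 3, 5, 8])

# keep values less than 4, replace everything else with 0
replaced = np.where(arr < 4, arr, 0)
print(replaced)  # [1 0 2 0 0 3 0 0]
```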


**Filter array based on two conditions**

To filter the array on multiple conditions, wrap each condition in parentheses () and combine them with the element-wise AND operator &: ((condition1) & (condition2) & ...). The parentheses are required because & binds more tightly than the comparison operators.

Let’s filter the array “arr” on two conditions – greater than 5 and less than 9 using boolean masking.

In [None]:
# create a numpy array
arr = np.array([1, 4, 2, 7, 9, 3, 5, 8])

# filter array
arr_filtered = arr[(arr > 5) & (arr < 9)]

print('arr:', arr)
print('arr_filtered:', arr_filtered)

arr: [1 4 2 7 9 3 5 8]
arr_filtered: [7 8]
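
Conditions can be combined in the same way with the element-wise OR operator `|` and negated with `~`. A brief sketch on the same data (the names `outside` and `not_small` are illustrative):

```python
import numpy as np

arr = np.array([1, 4, 2, 7, 9, 3, 5, 8])

# elements less than 3 OR greater than 7
outside = arr[(arr < 3) | (arr > 7)]
print(outside)    # [1 2 9 8]

# elements that are NOT less than 4
not_small = arr[~(arr < 4)]
print(not_small)  # [4 7 9 5 8]
```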


### Miscellaneous

**Get Index of Max Value in Array**

In [None]:
# create numpy array
ar = np.array([1, 2, 5, 3, 4])

# get index of max value in array
max_idx = ar.argmax()
print('index of max val:', max_idx)

index of max val: 2


**Create a diagonal matrix**

In [None]:
# create a 1d array of diagonal elements
ar = np.array([1, 2, 3])
# create a diagonal matrix
res = np.diag(ar)
# display the returned matrix
print(res)

[[1 0 0]
 [0 2 0]
 [0 0 3]]


**Extract the diagonal elements of a numpy array**

In [None]:
# create a 2D numpy array
arr = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])
# get the diagonal elements
res = np.diag(arr)
# display the diagonal elements
print(res)

[1 5 9]


**Make All Negative Values Zero in Array**

In [None]:
# create a numpy array
x = np.array([-2, -1, 0, 1, 2, 3, -4])

# set negative values to zero
x[x < 0] = 0

# display the array
print(x)

[0 0 0 1 2 3 0]


**Splitting an array**

In [None]:
x = np.array([1, 2, 3, 4, 5, 6])

# split x into 3 equal-sized arrays
arrays = np.split(x, 3)

for a in arrays:
  print(a)

[1 2]
[3 4]
[5 6]
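
`np.split` with an integer requires that the array divide evenly. When it does not, `np.array_split` is one alternative that allows sections of unequal size. A sketch with an illustrative seven-element array:

```python
import numpy as np

x = np.arange(7)

# 7 elements cannot be divided into 3 equal parts
split_failed = False
try:
    np.split(x, 3)
except ValueError:
    split_failed = True

# array_split tolerates the remainder; the earlier sections get the extra elements
parts = np.array_split(x, 3)
for part in parts:
    print(part)
# [0 1 2]
# [3 4]
# [5 6]
```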


**Identity matrix**

In [None]:
x = np.identity(4)
print(x)

[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]


**Saving an array to a file**

In [None]:
x = np.array([22, 31, 0, 13, 2, 3, 74])

# save the array to file
np.save('myarray.npy', x)

**Load an array from a file**

In [None]:
# load an array from file
x = np.load('myarray.npy')

# print the loaded array data
print(x)

[22 31  0 13  2  3 74]
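
To store several arrays in one file, `np.savez` is a related option that writes an `.npz` archive in which each array is retrieved by name. The sketch below writes to a temporary directory so it leaves nothing behind; the file name `arrays.npz` and the keys `first` and `second` are hypothetical:

```python
import os
import tempfile
import numpy as np

a = np.array([22, 31, 0])
b = np.linspace(0, 1, 5)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "arrays.npz")  # hypothetical file name
    # keyword names become the keys inside the archive
    np.savez(path, first=a, second=b)

    # np.load on an .npz file returns a dict-like archive
    with np.load(path) as archive:
        names = sorted(archive.files)
        loaded_a = archive["first"]
        loaded_b = archive["second"]

print(names)     # ['first', 'second']
print(loaded_a)  # [22 31  0]
```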
