![](../docs/banner.png)

Book: Python Data Science Handbook:

https://github.com/jakevdp/PythonDataScienceHandbook

Chapter: 02.02-The-Basics-Of-NumPy-Arrays.ipynb

# Chapter: Introduction to NumPy

<h2>Chapter Outline<span class="tocSkip"></span></h2>
<hr>
<div class="toc"><ul class="toc-item"><li><span><a href="#1.-Introduction-to-NumPy" data-toc-modified-id="1.-Introduction-to-NumPy-1">1. Introduction to NumPy</a></span></li><li><span><a href="#2.-NumPy-Arrays" data-toc-modified-id="2.-NumPy-Arrays-2">2. NumPy Arrays</a></span></li><li><span><a href="#3.-Array-Operations-and-Broadcasting" data-toc-modified-id="3.-Array-Operations-and-Vectorization-3">3. Array Operations and Broadcasting</a></span></li><li><span><a href="#4.-Indexing-and-slicing" data-toc-modified-id="4.-Indexing-and-slicing-4">4. Indexing and slicing</a></span></li><li><span><a href="#5.-More-Useful-NumPy-Functions" data-toc-modified-id="5.-More-Useful-NumPy-Functions-5">5. More Useful NumPy Functions</a></span></li></ul></div>

## Chapter Learning Objectives
<hr>

- Use NumPy to create arrays with built-in functions including `np.array()`, `np.arange()`, `np.linspace()`, `np.full()`, `np.zeros()` and `np.ones()`.
- Be able to access values from a NumPy array by numeric indexing and slicing and boolean indexing
- Perform mathematical operations on and with arrays.
- Understand how to use built-in NumPy functions like `np.sum()`, `np.mean()` and `np.log()` as standalone functions or, where available, as methods of NumPy arrays.

## Introduction to NumPy

NumPy stands for "Numerical Python" and is the standard Python library for working with arrays (i.e., vectors and matrices), linear algebra, and other numerical computations. Its core is written in C, which makes NumPy arrays faster and more memory-efficient than Python lists.

If NumPy is not already installed, you can install it with `conda`:

```
conda install numpy
```

* NumPy is a Python package; the name stands for "Numerical Python"

*   It is a fundamental package for numerical computations in Python

* Supports N-dimensional array objects that can be used for processing multidimensional data

* NumPy arrays are homogeneous by default, which means all data inside an array must be of the same data type

* NumPy offers comprehensive mathematical functions, random number generators, linear algebra routines, Fourier transforms, and more.

* Supports different data-types
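
The claim that arrays are more memory-efficient than lists can be checked directly. The following is a minimal sketch, not a benchmark: the names `n`, `py_list`, `np_array`, `list_bytes` and `array_bytes` are illustrative, and the list estimate assumes CPython, where every integer in a list is a separate object.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)  # fix the dtype so the size is platform-independent

# A list stores pointers to separate int objects, so count both parts
# (a CPython-specific estimate).
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)

# An array stores the raw values back to back: 1000 elements x 8 bytes each.
array_bytes = np_array.nbytes

print(f"list:  ~{list_bytes} bytes")
print(f"array:  {array_bytes} bytes")
```

The exact list figure varies between Python versions, but it should come out several times larger than the array's 8000 bytes.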


# Array

Arrays are "n-dimensional" data structures that can contain all the basic Python data types, e.g., floats, integers, strings, etc., but they work best with numeric data.

* An array is a data structure that stores values of the same data type
* Lists can contain values of different data types
* Arrays in Python can only contain values of the same data type

# NumPy Array

* A numpy array is a grid of values, all of the same type, and is indexed by a tuple of non-negative integers

* The number of dimensions is the rank of the array

* The shape of an array is a tuple of integers giving the size of the array along each dimension

* NumPy arrays ("ndarrays") are homogeneous, which means that all items in the array must be of the same type.

Usually we import numpy with the alias `np` (to avoid having to type out n-u-m-p-y every time we want to use it).

Note: `import` is a keyword in Python  that means:

"Bring in code from somewhere else so I can use it here."

More specifically:

`import` allows you to reuse functionality (like functions, classes, constants) that someone else (or you) has written, without having to rewrite it yourself. Here, you are telling Python: "bring in the `numpy` library, and let me refer to it as `np` so I don't have to type `numpy` every time."

In [1]:
import numpy as np

A numpy array is sort of like a list:

In [2]:
my_list = [11, 2, -3, 40, 77]
my_list

[11, 2, -3, 40, 77]

In [3]:
my_array = np.array([1, 2, 3, 4, 5])
my_array

array([1, 2, 3, 4, 5])

But it has the type `ndarray`:

In [4]:
type(my_array)

numpy.ndarray

Unlike a list, arrays can only hold a single type (usually numbers):

In [5]:
my_list = [1, "hi"]
my_list

[1, 'hi']

In [6]:
my_array = np.array((1, "hi"))
my_array

array(['1', 'hi'], dtype='<U21')

In [7]:
my_array = np.array([1, "hi"])
my_array  # np.array() accepts either a list (square brackets) or a tuple (parentheses)

array(['1', 'hi'], dtype='<U21')

Above: NumPy converted the integer `1` into the string `'1'`!

In [8]:
my_array = np.array([1, 2.9, 3, 4, -5])
my_array #all numbers are converted to float

array([ 1. ,  2.9,  3. ,  4. , -5. ])
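
The conversions above follow from NumPy choosing one dtype that can represent every element. The sketch below inspects that choice and shows how the `dtype` argument of `np.array()` overrides it; the variable names are illustrative only.

```python
import numpy as np

# Mixed ints and floats are promoted to a floating-point dtype.
mixed_numbers = np.array([1, 2.9, 3])
print(mixed_numbers.dtype)        # float64

# Mixed numbers and strings are promoted to a Unicode string dtype (kind 'U').
mixed_text = np.array([1, "hi"])
print(mixed_text.dtype.kind)      # U

# Passing dtype explicitly overrides the inference.
as_float32 = np.array([1, 2, 3], dtype=np.float32)
print(as_float32.dtype)           # float32

# dtype=object keeps the original Python objects, at the cost of NumPy's speed.
as_objects = np.array([1, "hi"], dtype=object)
print(type(as_objects[0]), type(as_objects[1]))
```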

### Summary

ndarrays are typically created using two main methods:
1. From existing data (usually lists or tuples) using `np.array()`, like we saw above; or,
2. Using built-in functions such as `np.arange()`, `np.linspace()`, `np.zeros()`, etc.

### Creating multi-dimensional arrays

In [9]:
my_list = [1, 2, 3]
np.array(my_list)

array([1, 2, 3])

Just like you can have "multi-dimensional lists" (by nesting lists in lists), you can have multi-dimensional arrays (indicated by double square brackets `[[ ]]`):

In [10]:
list_2d = [[1, 2], [3, 4], [5, 6]]
list_2d

[[1, 2], [3, 4], [5, 6]]

In [11]:
array_2d = np.array(list_2d)
array_2d

array([[1, 2],
       [3, 4],
       [5, 6]])

In [12]:
list_2d_mix = [[1, 2], [3.1, 4.5], [5, 6]]

In [13]:
array_2d_mix = np.array(list_2d_mix)
array_2d_mix

array([[1. , 2. ],
       [3.1, 4.5],
       [5. , 6. ]])

You'll probably use the built-in NumPy array creators quite often. Here are some common ones (hint: don't forget to check the docstrings for help with these functions; if you're in Jupyter, remember the `shift + tab` shortcut):

In [14]:
np.arange(1, 5)  # from 1 inclusive to 5 exclusive

array([1, 2, 3, 4])

In [15]:
np.arange(0, 11, 2)  # step by 2 from 0 to 10

array([ 0,  2,  4,  6,  8, 10])

In [16]:
np.linspace(0, 10, 5)  # 5 equally spaced points between 0 and 10

array([ 0. ,  2.5,  5. ,  7.5, 10. ])
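
`np.arange()` and `np.linspace()` are easy to mix up: the first takes a step size and excludes the stop value, the second takes a number of points and includes the stop value by default. A short sketch of the difference, with illustrative variable names:

```python
import numpy as np

# np.arange takes a step size and excludes the stop value.
by_step = np.arange(0, 10, 2.5)
print(by_step)    # values 0, 2.5, 5, 7.5

# np.linspace takes a number of points and includes the stop value by default.
by_count = np.linspace(0, 10, 5)
print(by_count)   # values 0, 2.5, 5, 7.5, 10

# endpoint=False drops the stop value; retstep=True also returns the spacing.
open_ended, step = np.linspace(0, 10, 5, endpoint=False, retstep=True)
print(open_ended, step)   # values 0, 2, 4, 6, 8 with a spacing of 2.0
```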

In [17]:
np.ones((2, 2))  # an array of ones with size 2 x 2

array([[1., 1.],
       [1., 1.]])

In [18]:
np.zeros((2, 3))  # an array of zeros with size 2 x 3

array([[0., 0., 0.],
       [0., 0., 0.]])

In [19]:
np.full((3, 3), 3.14)  # an array of the number 3.14 with size 3 x 3

array([[3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14]])

In [20]:
np.full((3, 3, 3), 3.14)  # an array of the number 3.14 with size 3 x 3 x 3

array([[[3.14, 3.14, 3.14],
        [3.14, 3.14, 3.14],
        [3.14, 3.14, 3.14]],

       [[3.14, 3.14, 3.14],
        [3.14, 3.14, 3.14],
        [3.14, 3.14, 3.14]],

       [[3.14, 3.14, 3.14],
        [3.14, 3.14, 3.14],
        [3.14, 3.14, 3.14]]])

In [21]:
np.random.rand(5, 2)  # random numbers uniformly distributed from 0 to 1 with size 5 x 2

array([[0.38039852, 0.85880644],
       [0.44789973, 0.65986574],
       [0.87398722, 0.38884491],
       [0.33694582, 0.37043821],
       [0.12225254, 0.75943558]])
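
`np.random.rand()` belongs to NumPy's older random interface. For new code, the NumPy documentation recommends the `Generator` interface created with `np.random.default_rng()`. A minimal sketch of the equivalent calls follows; the seed value `42` and the variable names are arbitrary choices for illustration.

```python
import numpy as np

# default_rng returns a Generator object; the seed 42 is an arbitrary choice.
rng = np.random.default_rng(42)

uniform = rng.random((5, 2))            # floats in [0, 1), shape 5 x 2
integers = rng.integers(0, 10, size=6)  # ints from 0 (inclusive) to 10 (exclusive)

print(uniform.shape)   # (5, 2)
print(integers)
```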

## NumPy Array Attributes
Many useful attributes and methods can be called on NumPy arrays. The full list can be printed with `dir()`:

In [22]:
print(dir(np.ndarray))

['T', '__abs__', '__add__', '__and__', '__array__', '__array_finalize__', '__array_function__', '__array_interface__', '__array_namespace__', '__array_priority__', '__array_struct__', '__array_ufunc__', '__array_wrap__', '__bool__', '__class__', '__class_getitem__', '__complex__', '__contains__', '__copy__', '__deepcopy__', '__delattr__', '__delitem__', '__dir__', '__divmod__', '__dlpack__', '__dlpack_device__', '__doc__', '__eq__', '__float__', '__floordiv__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__iand__', '__ifloordiv__', '__ilshift__', '__imatmul__', '__imod__', '__imul__', '__index__', '__init__', '__init_subclass__', '__int__', '__invert__', '__ior__', '__ipow__', '__irshift__', '__isub__', '__iter__', '__itruediv__', '__ixor__', '__le__', '__len__', '__lshift__', '__lt__', '__matmul__', '__mod__', '__mul__', '__ne__', '__neg__', '__new__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdivmod__', '__reduce__',

## Dimension of an Array

* Dimension tells you how many directions (axes) an array has: a scalar has 0 dimensions, a vector has 1 dimension, a matrix has 2 dimensions, and so on. You can access the number of dimensions of an array using the `.ndim` attribute.

* Shape tells you how many elements lie along each direction/axis: for example, an array with shape `(3, 4)` has 3 rows and 4 columns. You can access the shape of an array using the `.shape` attribute.


In [23]:
import numpy as np

# Create a 2D array
arr = np.array([[1, 2, 3], [4, 5, 6]])

# Get the dimension
dimension = arr.ndim  # Output: 2

# Get the shape
shape = arr.shape  # Output: (2, 3)


Examples:

We'll start by defining three random arrays, a one-dimensional, two-dimensional, and three-dimensional array. We'll use NumPy's random number generator, which we will seed with a set value in order to ensure that the same random arrays are generated each time this code is run:

In [24]:
import numpy as np
np.random.seed(0)  # seed for reproducibility

x1 = np.random.randint(10, size=6)  # One-dimensional array
x2 = np.random.randint(10, size=(3, 4))  # Two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5))  # Three-dimensional array

Each array has the attributes `ndim` (the number of dimensions), `shape` (the size of each dimension), and `size` (the total number of elements in the array):

In [25]:
print("x3 ndim: ", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)

x3 ndim:  3
x3 shape: (3, 4, 5)
x3 size:  60


Another example:

In [26]:
x = np.random.rand(5, 2)
x

array([[0.65279032, 0.63505887],
       [0.99529957, 0.58185033],
       [0.41436859, 0.4746975 ],
       [0.6235101 , 0.33800761],
       [0.67475232, 0.31720174]])

Another useful attribute is `dtype`, the data type of the array:

In [27]:
print("dtype:", x3.dtype)

dtype: int64


Other attributes include `itemsize`, which lists the size (in bytes) of each array element, and `nbytes`, which lists the total size (in bytes) of the array:

In [28]:
print("itemsize:", x3.itemsize, "bytes")
print("nbytes:", x3.nbytes, "bytes")

itemsize: 8 bytes
nbytes: 480 bytes
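
These attributes are related: `nbytes` is `size` multiplied by `itemsize`. The sketch below rebuilds an array with the same shape and dtype as `x3` (the names `arr` and `small` are illustrative) and shows how a smaller dtype reduces the footprint.

```python
import numpy as np

arr = np.zeros((3, 4, 5), dtype=np.int64)  # same shape and dtype as x3 above

print(arr.size)                   # 60 = 3 * 4 * 5
print(arr.itemsize)               # 8 bytes per int64 element
print(arr.size * arr.itemsize)    # 480, the same as arr.nbytes

# A smaller dtype shrinks the footprint without changing the shape.
small = arr.astype(np.int8)
print(small.nbytes)               # 60
```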


In [29]:
x.transpose()

array([[0.65279032, 0.99529957, 0.41436859, 0.6235101 , 0.67475232],
       [0.63505887, 0.58185033, 0.4746975 , 0.33800761, 0.31720174]])

In [30]:
x.mean()

np.float64(0.5707536958539337)

`astype(int)` is a method that converts a NumPy array to an integer data type. Floating-point values are truncated toward zero rather than rounded:

In [31]:
import numpy as np

arr = np.array([1.2, 3.4, 5.6, -9, -7.1,-8.6])
arr_int = arr.astype(int)

print(arr_int)  # Output: [ 1  3  5 -9 -7 -8]

[ 1  3  5 -9 -7 -8]
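
Because `astype(int)` simply drops the fractional part, negative values move toward zero (`-8.6` becomes `-8`). If rounding is what you want, round first and convert afterwards. A short sketch comparing the options, with illustrative variable names:

```python
import numpy as np

arr = np.array([1.2, 3.4, 5.6, -9, -7.1, -8.6])

truncated = arr.astype(int)          # drops the fractional part (rounds toward zero)
rounded = np.round(arr).astype(int)  # rounds to the nearest integer first
floored = np.floor(arr).astype(int)  # always rounds down

print(truncated)  # values 1, 3, 5, -9, -7, -8
print(rounded)    # values 1, 3, 6, -9, -7, -9
print(floored)    # values 1, 3, 5, -9, -8, -9
```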


### Array Shapes

As you just saw above, arrays can be of any dimension, shape and size you desire. In fact, there are three main array attributes you need to know to work out the characteristics of an array:
- `.ndim`: the number of dimensions of an array
- `.shape`: the number of elements in each dimension (like calling `len()` on each dimension)
- `.size`: the total number of elements in an array (i.e., the product of `.shape`)

In [32]:
array_1d = np.ones(3)
print(f"Dimensions: {array_1d.ndim}")
print(f"     Shape: {array_1d.shape}")
print(f"      Size: {array_1d.size}")

Dimensions: 1
     Shape: (3,)
      Size: 3


Let's turn that print action into a function and try out some other arrays:

In [33]:
def print_array(x):
    print(f"Dimensions: {x.ndim}")
    print(f"     Shape: {x.shape}")
    print(f"      Size: {x.size}")
    print("")
    print(x)

In [34]:
array_2d = np.ones((3, 2))
print_array(array_2d)

Dimensions: 2
     Shape: (3, 2)
      Size: 6

[[1. 1.]
 [1. 1.]
 [1. 1.]]


In [35]:
array_4d = np.ones((1, 2, 3, 4))
print_array(array_4d)

Dimensions: 4
     Shape: (1, 2, 3, 4)
      Size: 24

[[[[1. 1. 1. 1.]
   [1. 1. 1. 1.]
   [1. 1. 1. 1.]]

  [[1. 1. 1. 1.]
   [1. 1. 1. 1.]
   [1. 1. 1. 1.]]]]


After 3 dimensions, printing arrays starts getting pretty messy. As you can see above, the number of square brackets (`[ ]`) at the start of the printed output indicates how many dimensions there are: for example, the output above starts with 4 square brackets `[[[[`, indicating a 4D array.

### 1-d Arrays

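One of the most confusing things about NumPy is that a vector of 5 elements can be stored with 3 possible shapes: `(5,)`, `(1, 5)` or `(5, 1)`. Strictly speaking, only the first is a 1-d array; the other two are 2-d arrays with a single row or a single column.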

In [36]:
x = np.ones(5)
print_array(x)

Dimensions: 1
     Shape: (5,)
      Size: 5

[1. 1. 1. 1. 1.]


In [37]:
y = np.ones((1, 5))
print_array(y)

Dimensions: 2
     Shape: (1, 5)
      Size: 5

[[1. 1. 1. 1. 1.]]


In [38]:
z = np.ones((5, 1))
print_array(z)

Dimensions: 2
     Shape: (5, 1)
      Size: 5

[[1.]
 [1.]
 [1.]
 [1.]
 [1.]]


We can use `np.array_equal()` to determine if two arrays have the same shape and elements:

In [39]:
np.array_equal(x, x)

True

In [40]:
np.array_equal(x, y)

False

In [41]:
np.array_equal(x, z)

False

In [42]:
np.array_equal(y, z)

False
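
You can convert between these three shapes without copying the values by hand. The sketch below uses `reshape()`, `np.newaxis` and `ravel()`; the names `row`, `col` and `flat` are illustrative.

```python
import numpy as np

x = np.ones(5)              # shape (5,)

row = x.reshape(1, -1)      # shape (1, 5); -1 lets NumPy infer that axis
col = x[:, np.newaxis]      # shape (5, 1); np.newaxis inserts an axis of length 1
flat = col.ravel()          # back to shape (5,)

print(row.shape, col.shape, flat.shape)   # (1, 5) (5, 1) (5,)
print(np.array_equal(x, flat))            # True
```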

The shape of your 1-d arrays can actually have big implications for your mathematical operations!

In [43]:
print(f"x: {x}")
print(f"y: {y}")
print(f"z: {z}")

x: [1. 1. 1. 1. 1.]
y: [[1. 1. 1. 1. 1.]]
z: [[1.]
 [1.]
 [1.]
 [1.]
 [1.]]


In [44]:
x + y  # makes sense

array([[2., 2., 2., 2., 2.]])

In [45]:
y + z  # wait, what?

array([[2., 2., 2., 2., 2.],
       [2., 2., 2., 2., 2.],
       [2., 2., 2., 2., 2.],
       [2., 2., 2., 2., 2.],
       [2., 2., 2., 2., 2.]])

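What happened in the cell above is called "broadcasting": NumPy stretched `y` (shape `(1, 5)`) and `z` (shape `(5, 1)`) along their length-1 axes to a common shape of `(5, 5)` and then added them element by element.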
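
The result of `y + z` can be reproduced step by step. This is a sketch that assumes NumPy 1.20 or newer (for `np.broadcast_shapes()`); the names `y_stretched` and `z_stretched` are illustrative.

```python
import numpy as np

y = np.ones((1, 5))
z = np.ones((5, 1))

# Shapes are compared axis by axis; an axis of length 1 is stretched to match.
print(np.broadcast_shapes(y.shape, z.shape))  # (5, 5)

# broadcast_to shows what each operand looks like after stretching.
y_stretched = np.broadcast_to(y, (5, 5))  # the single row repeated 5 times
z_stretched = np.broadcast_to(z, (5, 5))  # the single column repeated 5 times

result = y + z
print(result.shape)                                       # (5, 5)
print(np.array_equal(result, y_stretched + z_stretched))  # True
```

If the shapes cannot be matched this way (for example `(1, 5)` and `(3, 1, 4)`), NumPy raises a `ValueError` instead of guessing.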

##  Array Operations  
<hr>

### Elementwise operations

Elementwise operations refer to operations applied to each element of an array or between the paired elements of two arrays.

In [46]:
x = np.ones(4)
x

array([1., 1., 1., 1.])

In [47]:
y = x + 1
y

array([2., 2., 2., 2.])

In [48]:
x - y

array([-1., -1., -1., -1.])

In [49]:
x == y

array([False, False, False, False])

In [50]:
x * y

array([2., 2., 2., 2.])

In [51]:
x ** y

array([1., 1., 1., 1.])

In [52]:
x / y

array([0.5, 0.5, 0.5, 0.5])

In [53]:
np.array_equal(x, y)

False
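
Both `==` and `np.array_equal()` test for exact equality, which can be surprising with floating-point results. A minimal sketch of the difference, using `np.allclose()` for a tolerance-based comparison (the arrays `a` and `b` are made up for illustration):

```python
import numpy as np

a = np.array([0.1 + 0.2, 1.0])
b = np.array([0.3, 1.0])

print(a == b)                 # elementwise: the first entry differs by rounding error
print(np.array_equal(a, b))   # False, because exact equality is required
print(np.allclose(a, b))      # True, equal within a small tolerance
```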

Concepts of indexing should be pretty familiar by now. Indexing arrays is similar to indexing lists but there are just more dimensions.

### Numeric Indexing

In [54]:
x = np.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [55]:
x[3]

np.int64(3)

In [56]:
x[2:]

array([2, 3, 4, 5, 6, 7, 8, 9])

In [57]:
x[:4]

array([0, 1, 2, 3])

In [58]:
x[2:5]

array([2, 3, 4])

In [59]:
x[2:3]

array([2])

In [60]:
x[-1]

np.int64(9)

In [61]:
x[-2]

np.int64(8)

In [62]:
x[5:0:-1]

array([5, 4, 3, 2, 1])
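
One difference from lists is worth knowing: slicing a list makes a copy, but basic slicing of a NumPy array returns a view onto the same data. A short sketch, with illustrative names `view` and `independent`:

```python
import numpy as np

x = np.arange(10)

view = x[2:5]          # basic slicing returns a view onto x's data
view[0] = 100          # so writing through the view changes x as well
print(x[2])            # 100
print(np.shares_memory(x, view))  # True

independent = x[2:5].copy()       # .copy() makes a separate array
independent[0] = -1
print(x[2])            # still 100
```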

For 2D arrays:

In [63]:
x = np.random.randint(10, size=(4, 6))
x

array([[5, 9, 3, 0, 5, 0],
       [1, 2, 4, 2, 0, 3],
       [2, 0, 7, 5, 9, 0],
       [2, 7, 2, 9, 2, 3]])

In [64]:
x[3, 4]  # do this

np.int64(2)

In [65]:
x[3][4]  # same result, but I do not like this as much

np.int64(2)
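
One reason to prefer `x[3, 4]` is that the chained form indexes twice: `x[3]` first builds the row, and `[4]` then indexes into it. For single elements the result is the same, but with slices the two spellings differ. A sketch on a small made-up array:

```python
import numpy as np

x = np.arange(24).reshape(4, 6)   # a small 4 x 6 example array

# For single elements both spellings agree.
print(x[3, 4] == x[3][4])         # True

# With slices they differ: x[:, 2] is column 2, while x[:][2] is row 2,
# because x[:] is just the whole array again.
column = x[:, 2]
row = x[:][2]
print(column)   # values 2, 8, 14, 20
print(row)      # values 12 through 17
```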

In [66]:
x[3]

array([2, 7, 2, 9, 2, 3])

In [67]:
len(x)  # length of the first axis (number of rows); x.shape is less confusing

4

In [68]:
x.shape

(4, 6)

In [69]:
x[:, 2]  # the column at index 2 (i.e., the third column)

array([3, 4, 7, 2])

In [70]:
x[2:, :3]

array([[2, 0, 7],
       [2, 7, 2]])

In [71]:
x.T

array([[5, 1, 2, 2],
       [9, 2, 0, 7],
       [3, 4, 7, 2],
       [0, 2, 5, 9],
       [5, 0, 9, 2],
       [0, 3, 0, 3]])

In [72]:
x

array([[5, 9, 3, 0, 5, 0],
       [1, 2, 4, 2, 0, 3],
       [2, 0, 7, 5, 9, 0],
       [2, 7, 2, 9, 2, 3]])

In [73]:
x[1, 1] = 555555
x

array([[     5,      9,      3,      0,      5,      0],
       [     1, 555555,      4,      2,      0,      3],
       [     2,      0,      7,      5,      9,      0],
       [     2,      7,      2,      9,      2,      3]])

In [74]:
z = np.zeros(5)
z

array([0., 0., 0., 0., 0.])

In [75]:
z[0] = 5
z

array([5., 0., 0., 0., 0.])

### Boolean Indexing

In [76]:
x = np.random.rand(10)
x

array([0.91823547, 0.21682214, 0.56518887, 0.86510256, 0.50896896,
       0.91672295, 0.92115761, 0.08311249, 0.27771856, 0.0093567 ])

In [77]:
x + 1

array([1.91823547, 1.21682214, 1.56518887, 1.86510256, 1.50896896,
       1.91672295, 1.92115761, 1.08311249, 1.27771856, 1.0093567 ])

In [78]:
x_thresh = x > 0.5
x_thresh

array([ True, False,  True,  True,  True,  True,  True, False, False,
       False])

In [79]:
x[x_thresh] = 0.5  # set all elements  > 0.5 to be equal to 0.5
x

array([0.5       , 0.21682214, 0.5       , 0.5       , 0.5       ,
       0.5       , 0.5       , 0.08311249, 0.27771856, 0.0093567 ])

In [80]:
x = np.random.rand(10)
x

array([0.84234208, 0.64717414, 0.84138612, 0.26473016, 0.39782075,
       0.55282148, 0.16494046, 0.36980809, 0.14644176, 0.56961841])

In [81]:
x[x > 0.5] = 0.5
x

array([0.5       , 0.5       , 0.5       , 0.26473016, 0.39782075,
       0.5       , 0.16494046, 0.36980809, 0.14644176, 0.5       ])
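
Boolean masks can also be combined, and the same cap can be applied without modifying the original array. The sketch below uses a small made-up array; `np.where()` and `np.minimum()` are shown as alternatives to the in-place assignment above.

```python
import numpy as np

x = np.array([0.9, 0.2, 0.6, 0.05, 0.45])

# Combine conditions with & (and) and | (or); the parentheses are required.
in_band = (x > 0.1) & (x < 0.5)
print(x[in_band])                      # the values 0.2 and 0.45
print(in_band.sum())                   # 2 elements satisfy the condition

# np.where builds a new array instead of modifying x in place.
capped = np.where(x > 0.5, 0.5, x)

# np.minimum gives the same cap as x[x > 0.5] = 0.5, again without touching x.
capped_alt = np.minimum(x, 0.5)
print(np.array_equal(capped, capped_alt))  # True
```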

##  More Useful NumPy Functions

NumPy has many built-in functions for mathematical operations; it covers almost every numerical operation you might want to do. I'm not going to explore the whole library here, but as an example of some of the available functions, consider working out the hypotenuse of a right triangle with sides of 3 m and 4 m:

![](img/chapter5/triangle.png)

In [82]:
sides = np.array([3, 4])

There are several ways we could solve this problem. We could directly use Pythagoras's Theorem:

$$c = \sqrt{a^2+b^2}$$

In [83]:
np.sqrt(np.sum([np.power(sides[0], 2), np.power(sides[1], 2)]))

np.float64(5.0)

We can leverage the fact that we're dealing with a numpy array and apply a "vectorized" operation (more on that in a bit) to the whole vector at one time:

In [84]:
(sides ** 2).sum() ** 0.5

np.float64(5.0)

Or we can simply use a numpy built-in function (if it exists):

In [85]:
np.linalg.norm(sides)  # you'll learn more about norms in 573

np.float64(5.0)

In [86]:
np.hypot(*sides)

np.float64(5.0)

### Vectorization

Because NumPy arrays are homogeneous (every element has the same dtype), NumPy does not need to check the type of each element before operating on it, and the loop over the elements runs in compiled code rather than in the Python interpreter. This results in a large speed-up. You can think of it as NumPy performing an operation on the whole array at once rather than one element at a time. You can read more about vectorization [here](https://www.pythonlikeyoumeanit.com/Module3_IntroducingNumpy/VectorizedOperations.html), but all you need to know is that most operations in NumPy are vectorized, so try to do things at the "array level" rather than the "element level", e.g.:

In [87]:
# DON'T DO THIS
array = np.array(range(5))
for i, element in enumerate(array):
    array[i] = element ** 2
array

array([ 0,  1,  4,  9, 16])

In [88]:
# DO THIS
array = np.array(range(5))
array **= 2

Let's do a quick timing experiment:

In [89]:
# loop method
array = np.array(range(5))
time_loop = %timeit -q -o -r 3 for i, element in enumerate(array): array[i] = element ** 2
# vectorized method
array = np.array(range(5))
time_vec = %timeit -q -o -r 3 array ** 2
print(f"Vectorized operation is {time_loop.average / time_vec.average:.2f}x faster than looping here.")

Vectorized operation is 3.34x faster than looping here.
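
The `%timeit` magic only works in IPython and Jupyter. A rough equivalent in plain Python can be sketched with the standard-library `timeit` module. The helper functions `square_loop` and `square_vectorized`, the array size and the repeat count are all illustrative choices, and the measured ratio will vary from machine to machine.

```python
import timeit
import numpy as np

def square_loop(values):
    """Square each element one at a time (the slow pattern)."""
    out = values.copy()
    for i, element in enumerate(out):
        out[i] = element ** 2
    return out

def square_vectorized(values):
    """Square the whole array in one NumPy operation."""
    return values ** 2

data = np.arange(1_000)   # array size chosen only for illustration

t_loop = timeit.timeit(lambda: square_loop(data), number=20)
t_vec = timeit.timeit(lambda: square_vectorized(data), number=20)
print(f"loop: {t_loop:.4f}s, vectorized: {t_vec:.4f}s, ratio: {t_loop / t_vec:.1f}x")
```

With only 5 elements, as in the cell above, fixed overhead dominates the measurement; the gap typically widens as the array grows.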


# Practice Questions

1. What is the meaning of this code: `np.random.randint(1, 10, size=6)`

2.  What is the meaning of this code: `x = np.random.randint(10, size=6)`


3. What is the difference between `np.array()` and `np.zeros()`?  

4. How do you create a NumPy array with values from 1 to 10, inclusive?  

5. What does `arr.dtype` tell you about a NumPy array?  


6. How can you access the element at the second row and third column of a 2D array?  


7. What is the purpose of `np.random.seed()`?





# Answers

1. What is the meaning of this code: `np.random.randint(1, 10, size=6)`

`np.random.randint()` uses NumPy's random number generator to create an array of random integers. This call creates an array of 6 random integers between 1 (inclusive) and 10 (exclusive). So, the array will contain 6 numbers picked randomly from the set: {1, 2, 3, 4, 5, 6, 7, 8, 9}.

2.  What is the meaning of this code: `x = np.random.randint(10, size=6)`

The code creates an array named x containing 6 random integers between 0 (inclusive) and 10 (exclusive). So, x will have 6 numbers picked randomly from the set: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}.

3. What is the difference between `np.array()` and `np.zeros()`?  

* `np.array()` creates an array from existing data (like a list or tuple).
* `np.zeros()` creates a new array filled with zeros, with the shape (and, optionally, the data type) that you specify.

4. How do you create a NumPy array with values from 1 to 10, inclusive?

`arr = np.arange(1, 11)`. `np.arange()` works like Python's `range()` but creates an array; the stop value is exclusive, so `11` is needed to include 10.


5. What does `arr.dtype` tell you about a NumPy array?  

`arr.dtype` tells you the data type of the elements in the array (e.g., int64, float32, etc.). It's important for understanding how the data is stored and the operations that can be performed on it.


6. How can you access the element at the second row and third column of a 2D array?  

`arr_2d = np.array([[1, 2, 3], [4, 5, 6]])`

`element = arr_2d[1, 2]  # Access element at row 1, column 2 (which is 6)`


7. What is the purpose of `np.random.seed()`?

`np.random.seed()` sets the seed for the random number generator. This ensures that you get the same sequence of random numbers each time you run your code with the same seed value, which is useful for reproducibility.
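
A minimal sketch of this behaviour: resetting the seed to the same value reproduces the same numbers. The seed `0` and the names `first` and `second` are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(0)
first = np.random.randint(10, size=6)

np.random.seed(0)                      # reset to the same seed
second = np.random.randint(10, size=6)

print(np.array_equal(first, second))        # True: same seed, same sequence
print(first.min() >= 0, first.max() < 10)   # True True: values lie in 0..9
```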

