# Introduction to NumPy
<hr>

`NumPy` is a core extension library for Python that provides extensive support for **array and matrix operations**, along with a large collection of mathematical functions comparable to those of mathematical software such as MATLAB. NumPy is widely used in data science, engineering, and research because of its efficiency and its integration with other key libraries such as `Pandas`, `Matplotlib`, and `SciPy`.

Key Features of NumPy:
- Multi-dimensional Arrays:
    - Efficient ndarray objects for numerical computations
    - Supports vectorized operations

- Mathematical methods:
    - Linear algebra, Fourier transforms, random number generation, etc.

We first need to import the `numpy` library before using it:

In [1]:
import numpy as np

## Create an array
<hr>

The `array` method in NumPy converts a Python sequence (such as a list or tuple) directly into NumPy's array type, `ndarray`.

For example, a one-dimensional array:

In [2]:
a = np.array((1, 2, 3, 4))
print(a)
type(a)

[1 2 3 4]


numpy.ndarray

A two-dimensional array:

In [3]:
b = [[1, 2], [3, 4]]
a = np.array(b)
a

array([[1, 2],
       [3, 4]])

```{note}
The argument passed to the `array` method must **be a sequence type**; the values cannot be passed separately as in `np.array(1, 2, 3, 4)`.
```
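As a small illustration of the note above (a sketch; the variable names `ok` and `raised` are chosen only for this example), passing the numbers as separate arguments raises a `TypeError`, while wrapping them in a list works:

```python
import numpy as np

# Correct: the values are wrapped in a sequence (a list here).
ok = np.array([1, 2, 3, 4])
print(ok)

# Incorrect: the extra positional arguments are not interpreted as data.
try:
    np.array(1, 2, 3, 4)
    raised = False
except TypeError as err:
    raised = True
    print("TypeError:", err)
```

The exact error message depends on the NumPy version, so only the exception type is relied on here.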

In NumPy, the `zeros` method can create a matrix with all elements set to 0, the `ones` method can create a matrix with all elements set to 1, and the `empty` method can create an uninitialized matrix with **arbitrary element values**.

In [4]:
np.zeros((3, 4))  # a zero matrix of 3 rows and 4 columns

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [5]:
np.ones((3, 4))  # a ones matrix of 3 rows and 4 columns

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

In [6]:
np.empty((3, 4))  # an uninitialized array of 3 rows and 4 columns; the values are whatever was in memory

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

In NumPy, the `arange` method can generate an array with an arithmetic sequence.

In [7]:
np.arange(10)  # generate an array from 0 to 10 (exclusive) with a default step size of 1

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [8]:
np.arange(5, 10)  # generate an array from 5 to 10 (exclusive) with a default step size of 1

array([5, 6, 7, 8, 9])

In [9]:
np.arange(5, 10, 2)  # generate an array from 5 to 10 (exclusive) with a step size of 2

array([5, 7, 9])

Another similar method is `linspace`. The difference is that in the `arange` method, **the third argument represents the step size** of the arithmetic sequence, while in the `linspace` method, **the third argument indicates the total number of elements** to generate. If you need to generate a sequence of equally spaced floating-point numbers, using `linspace` is preferable.

In [10]:
np.linspace(0, 2, 9)  # generate 9 evenly spaced numbers from 0 to 2 (inclusive)

array([0.  , 0.25, 0.5 , 0.75, 1.  , 1.25, 1.5 , 1.75, 2.  ])
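To make the difference between the two functions concrete, here is a minimal sketch. It assumes only the standard `endpoint` and `retstep` keyword arguments of `np.linspace`; the variable names are illustrative.

```python
import numpy as np

# linspace: the third argument is the number of points,
# and the stop value is included by default.
x = np.linspace(0, 1, 5)
print(x)  # [0.   0.25 0.5  0.75 1.  ]

# endpoint=False drops the stop value, mirroring arange's half-open interval.
y = np.linspace(0, 1, 5, endpoint=False)
print(y)

# retstep=True additionally returns the spacing that linspace computed.
_, step = np.linspace(0, 1, 5, retstep=True)
print(step)  # 0.25
```

Because `linspace` fixes the number of points in advance, the length of the result does not depend on floating-point rounding of the step size.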

We can convert an ndarray to a Python list by using `list()`.

In [11]:
a = np.arange(10)
list(a)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
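One detail worth knowing: `list()` keeps the elements as NumPy scalars, whereas the `ndarray.tolist()` method converts them to plain Python numbers and also handles nested dimensions. The following sketch illustrates this (the `reshape` call, used here to build a small 2-D array, is introduced later in this chapter):

```python
import numpy as np

a = np.arange(3)
as_list = list(a)        # elements remain NumPy scalars
as_pylist = a.tolist()   # elements become plain Python ints

print(type(as_list[0]).__name__)    # a NumPy integer type such as int64
print(type(as_pylist[0]).__name__)  # int

b = np.arange(6).reshape(2, 3)
print(b.tolist())  # [[0, 1, 2], [3, 4, 5]]
```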

The common attributes for `ndarray` are listed in the table below:

| Attribute     | Meaning                                     |  
|:-------------:|---------------------------------------------|  
| ndarray.ndim  | Number of dimensions (axes) of the array    |  
| ndarray.shape | A tuple of integers giving the size of the array in each dimension |  
| ndarray.size  | Total number of elements in the array       |  
| ndarray.dtype | Data type of the elements in the array      |

In [12]:
a = np.ones((2, 3))
a.ndim

2

In [13]:
a.shape

(2, 3)

In [14]:
a.size

6

In [15]:
a.dtype

dtype('float64')

## Indexing and slicing
<hr>

For one-dimensional arrays, NumPy's indexing and slicing are similar to those of Python list types.

In [16]:
a = np.arange(4, 10)  
a[2]

6

In [17]:
a[2]  # the 3rd element of array a

6

In [18]:
a[2:4]  # the 3rd and 4th elements of the array

array([6, 7])

In [19]:
a[-1]  # the last element in the array

9

In [20]:
a[::-1]  # reverse of the array

array([9, 8, 7, 6, 5, 4])

For multidimensional arrays, NumPy array indexing and slicing use a **single pair of brackets `[ ]`** with commas separating different dimensions.

In [21]:
# Create a two-dimensional array with 3 rows and 4 columns using the 'reshape' method
# 'reshape' does not alter the values; it rearranges the original array into the specified number of rows and columns
b = np.arange(12).reshape(3, 4)  
print(b)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]


In [22]:
b[1, 2]  # the element at row 2, column 3

6

In [23]:
b[1:3, 2]  # the elements from the 2nd to 3rd row in the 3rd column of the two-dimensional array.

array([ 6, 10])

In [24]:
b[2, :]  # all the elements in the 3rd row

array([ 8,  9, 10, 11])
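A few more slicing patterns on the same 3-by-4 array, as a brief sketch (the variable names `col`, `block`, and `last_row` are illustrative only):

```python
import numpy as np

b = np.arange(12).reshape(3, 4)

col = b[:, 1]        # every row, 2nd column
block = b[0:2, 1:3]  # rows 1-2, columns 2-3
last_row = b[-1]     # a single index selects a whole row

print(col)       # [1 5 9]
print(block)
print(last_row)  # [ 8  9 10 11]
```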

In [25]:
list(b) # cast the 2-D ndarray to a list

[array([0, 1, 2, 3]), array([4, 5, 6, 7]), array([ 8,  9, 10, 11])]

## Concatenating
<hr>

In NumPy, arrays can be concatenated using the `append` or `concatenate` method: `append` joins two arrays, while `concatenate` can join two or more arrays.

In [26]:
a = np.arange(5)
a

array([0, 1, 2, 3, 4])

In [27]:
b = np.arange(3)
b

array([0, 1, 2])

In [28]:
np.append(a, b)

array([0, 1, 2, 3, 4, 0, 1, 2])

In [29]:
c = np.arange(4)
c

array([0, 1, 2, 3])

In [30]:
np.concatenate((a, b, c))  # there are parentheses () inside

array([0, 1, 2, 3, 4, 0, 1, 2, 0, 1, 2, 3])
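The sketch below adds two details that the one-dimensional examples do not show: `append` returns a new array rather than modifying its input, and `concatenate` accepts an `axis` argument for two-dimensional arrays. The arrays `x` and `y` are made up for illustration.

```python
import numpy as np

a = np.arange(5)
joined = np.append(a, [5, 6])
print(a)       # [0 1 2 3 4]  (unchanged: append returns a new array)
print(joined)  # [0 1 2 3 4 5 6]

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])
rows = np.concatenate((x, y), axis=0)  # stack vertically
cols = np.concatenate((x, y), axis=1)  # stack horizontally
print(rows.shape, cols.shape)  # (4, 2) (2, 4)
```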

## Array computations
<hr>

NumPy supports various algebraic operations on arrays.

In [31]:
a = np.arange(4)
b = np.arange(3, 7)
print(a)
print(b)
print(a - b)

[0 1 2 3]
[3 4 5 6]
[-3 -3 -3 -3]


In [32]:
print(a + b)

[3 5 7 9]


- Get the average value of an array by `mean()` or `average()`.

In [33]:
np.mean(a)

1.5

In [34]:
np.average(a)

1.5

- Get the average value of each column in a 2-D array with argument `axis=0`.
- Get the average value of each row in a 2-D array with argument `axis=1`.

In [35]:
c = np.array([[4.0, 5.0], [6.0, 7.0]])
print(np.average(c)) # or using mean
print(np.average(c, axis=0))  # or using mean
print(np.average(c, axis=1))  # or using mean

5.5
[5. 6.]
[4.5 6.5]
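The `axis` argument is easier to see on a non-square array. In this sketch (with a made-up 2-by-3 array `m`), `axis=0` collapses the rows and `axis=1` collapses the columns. The last line also shows the `weights` argument, which `average()` accepts in addition to plain averaging.

```python
import numpy as np

m = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])  # shape (2, 3)

col_means = m.mean(axis=0)  # one value per column
row_means = m.mean(axis=1)  # one value per row

print(col_means)  # [2.5 3.5 4.5]
print(row_means)  # [2. 5.]

# Weighted average: (1*3 + 2*1 + 3*0) / (3 + 1 + 0) = 1.25
weighted = np.average(np.array([1.0, 2.0, 3.0]), weights=[3, 1, 0])
print(weighted)  # 1.25
```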


- Get the maximum of an array by `max()` and the minimum by `min()`.
- Get the index of the maximum value by `argmax()` and the index of the minimum value by `argmin()`.

In [36]:
np.max(a)

3

In [37]:
np.argmin(a)

0

In [38]:
a * 2  # multiply every element by 2

array([0, 2, 4, 6])

In [39]:
a**2  # square each element of the array.

array([0, 1, 4, 9])

In [40]:
a > 2  # compare every element with a single value

array([False, False, False,  True])

In [41]:
c = np.array([[4.0, 5.0], [6.0, 7.0]])
print(c)
print(c.transpose()) # transpose a 2-D array

[[4. 5.]
 [6. 7.]]
[[4. 6.]
 [5. 7.]]


In [42]:
np.linalg.inv(c)  # inverse matrix 

array([[-3.5,  2.5],
       [ 3. , -2. ]])

In [43]:
eigenvalues, eigenvectors = np.linalg.eig(c)  # eigenvalues and eigenvectors of the 2D array c
print("eigenvalues are ", eigenvalues)
print("eigenvectors are", eigenvectors)

eigenvalues are  [-0.17890835 11.17890835]
eigenvectors are [[-0.76729658 -0.57152478]
 [ 0.64129241 -0.82058481]]


In [44]:
d = np.array([[1.0, 2.0], [3.0, 4.0]])
d

array([[1., 2.],
       [3., 4.]])

In [45]:
np.dot(c, d)  # the product of matrices c and d

array([[19., 28.],
       [27., 40.]])

In [46]:
np.multiply(c, d)  # element-wise multiplication of matrices c and d

array([[ 4., 10.],
       [18., 28.]])
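The same two products can also be written with operators. This is a short sketch using the arrays `c` and `d` from above: `@` performs matrix multiplication and `*` performs element-wise multiplication. The final check multiplies `c` by its inverse to confirm that the result is (numerically) the identity matrix.

```python
import numpy as np

c = np.array([[4.0, 5.0], [6.0, 7.0]])
d = np.array([[1.0, 2.0], [3.0, 4.0]])

matrix_product = c @ d  # same result as np.dot(c, d) for 2-D arrays
elementwise = c * d     # same result as np.multiply(c, d)

print(matrix_product)
print(elementwise)

# A matrix times its inverse should be close to the identity matrix.
identity_check = np.allclose(c @ np.linalg.inv(c), np.eye(2))
print(identity_check)  # True
```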

The NumPy library also includes several other commonly used methods, as shown in the table below:  

| Method | Description |  
|--------|-----------|  
| `np.abs(x)` | Computes the absolute value of each element |  
| `np.sqrt(x)` | Computes the square root of each element |  
| `np.square(x)` | Computes the square of each element |  
| `np.sign(x)` | Determines the sign (positive/negative) of each element |  
| `np.ceil(x)` | Rounds each element up to the nearest integer |  
| `np.floor(x)` | Rounds each element down to the nearest integer |  
| `np.exp(x)` | Computes the exponential value of each element |  
| `np.log(x)`,`np.log10(x)`,`np.log2(x)` | Computes the natural logarithm, base-10 logarithm, and base-2 logarithm of each element |  
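As a quick illustration of the table, each of these functions is applied element by element. The input array `x` below is made up for the example.

```python
import numpy as np

x = np.array([-1.5, 0.0, 2.25])

abs_x = np.abs(x)      # 1.5, 0.0, 2.25
sign_x = np.sign(x)    # -1.0, 0.0, 1.0
ceil_x = np.ceil(x)    # -1.0, 0.0, 3.0
floor_x = np.floor(x)  # -2.0, 0.0, 2.0
root = np.sqrt(abs_x)  # square roots of the absolute values (the last is 1.5)
log2_8 = np.log2(8.0)  # approximately 3.0, because 2**3 == 8

print(abs_x, sign_x, ceil_x, floor_x, root, log2_8)
```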

- NumPy performs numerical computations significantly faster than Python's built-in list. **For large-scale mathematical operations, prioritize using NumPy for processing**.
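A rough way to see the speed difference is to time the same computation both ways. The sketch below sums the squares of one million integers; it uses `time.perf_counter()` from the standard library, and the array size `n` is an arbitrary choice. The measured times depend on the machine, so no specific numbers are claimed here.

```python
import time
import numpy as np

n = 1_000_000
values = list(range(n))
arr = np.arange(n, dtype=np.int64)  # int64 avoids overflow on any platform

# Pure Python: loop over a list.
start = time.perf_counter()
list_total = sum(v * v for v in values)
list_seconds = time.perf_counter() - start

# NumPy: one vectorized expression.
start = time.perf_counter()
array_total = int(np.sum(arr * arr))
array_seconds = time.perf_counter() - start

print(f"list:  {list_seconds:.4f} s")
print(f"numpy: {array_seconds:.4f} s")
print(list_total == array_total)  # True: both approaches give the same sum
```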

Additionally, NumPy provides a specialized two-dimensional array type called `matrix`. Note that the NumPy documentation no longer recommends this class, and regular `ndarray` objects are preferred, even for linear algebra. Interested readers can refer to the [official documentation](https://numpy.org/doc/stable/reference/generated/numpy.matrix.html) for details.  

## Generate random values using NumPy*[^1]
<hr>

[^1]: \* means this section may not be delivered in class.

The `random` module in NumPy supports a wide range of common probability distributions and can generate whole multi-row, multi-column arrays of random numbers at once.

In [47]:
import numpy as np

np.random.uniform(1, 10, [2, 2])  # generate a 2-row, 2-column array of random numbers uniformly distributed between 1 (inclusive) and 10 (exclusive)

array([[7.6715444 , 1.50785508],
       [1.04294057, 9.47483915]])

In [48]:
np.random.uniform(1, 10, 5)  # generate 5 random numbers uniformly distributed between 1 (inclusive) and 10 (exclusive)

array([7.45077053, 8.46464059, 4.64659544, 6.08784796, 7.86502348])

In [49]:
np.random.randint(1, 10, [2, 2])  # generate a 2-row, 2-column array of random integers uniformly distributed between 1 (inclusive) and 10 (exclusive)

array([[1, 1],
       [3, 7]])

In [50]:
np.random.normal(5, 1, [2, 2])  # generate 2 row 2 column random numbers normally distributed with mean 5 and standard deviation 1

array([[5.84011476, 3.3582419 ],
       [4.0798618 , 4.44776339]])

In [51]:
np.random.poisson(5, [2, 2])  # generate a 2-row, 2-column array of Poisson-distributed random numbers with rate (lambda) 5

array([[5, 7],
       [5, 2]])

NumPy also allows setting the seed for the random number generator via the `random.seed()` method. Using the same seed reproduces the same sequence of random numbers.

In [52]:
np.random.seed(500)
np.random.normal(5, 1, 6)  # generate 6 random numbers normally distributed with mean 5 and standard deviation 1

array([4.62263642, 5.16675892, 5.68280238, 6.92137877, 4.8029632 ,
       4.24012124])

NumPy also allows you to define a random number generator object via `RandomState()`, where the argument inside the parentheses is the random seed. You can then use this object to call specific random distribution generator methods. For example:

In [53]:
rvs = np.random.RandomState(500)
rvs.normal(5, 1, 6) 

array([4.62263642, 5.16675892, 5.68280238, 6.92137877, 4.8029632 ,
       4.24012124])
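Newer NumPy versions (1.17 and later) also provide `np.random.default_rng()`, which the NumPy documentation recommends for new code. The sketch below assumes such a version is installed. Note that a `Generator` created this way uses a different algorithm from `RandomState`, so the same seed does not produce the numbers shown above.

```python
import numpy as np

# Two generators created with the same seed produce the same stream.
rng1 = np.random.default_rng(500)
rng2 = np.random.default_rng(500)

s1 = rng1.normal(5, 1, 6)
s2 = rng2.normal(5, 1, 6)
print(np.array_equal(s1, s2))  # True

# Random integers: 1 (inclusive) to 10 (exclusive), in a 2x2 array.
ints = rng1.integers(1, 10, size=(2, 2))
print(ints)
```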

## `time`, `datetime` library
<hr>

### `time` library
<hr>

`time` is a Python built-in library that provides many time-related methods, such as getting the current time, pausing program execution, measuring execution time, and more.

- Get the current time as the number of seconds since the epoch (January 1, 1970, 00:00:00 UTC) by `time()`.

In [54]:
import time

timestamp = time.time()
print(timestamp)

1750068922.496598


- Get the current local time as a structured tuple (`struct_time`) by `localtime()`.

In [55]:
current_time = time.localtime()
print(current_time)

time.struct_time(tm_year=2025, tm_mon=6, tm_mday=16, tm_hour=11, tm_min=15, tm_sec=22, tm_wday=0, tm_yday=167, tm_isdst=1)


- Get the formatted time by `strftime`.

In [56]:
formatted_time = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
print(formatted_time)

2025-06-16 11:15:22


The codes for time format in `strftime`:

| Code | Meaning                          |  
|------|----------------------------------|  
| %Y   | 4-digit year (e.g., 2025)        |  
| %y   | 2-digit year (e.g., 25)        |  
| %m   | 2-digit month (01-12)           |  
| %d   | 2-digit day (01-31)             |  
| %D   | Equivalent to %m/%d/%y (platform-dependent) |  
| %B   | Full month name (January-December)           |  
| %b   | Abbreviated month name (e.g. Jul)|
| %W   | Week number of the year, with Monday as the first day of the week (00-53) |  
| %w   | Weekday as a number (0 = Sunday, ..., 6 = Saturday) |  
| %H   | Hour in 24-hour format (00-23) |  
| %M   | Minute (00-59)                 |  
| %S   | Second (00-59)                 |  
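To try out a few of these codes on a fixed date rather than the current time, the sketch below first parses a string with `time.strptime()` (the inverse of `strftime`). The date is the one shown in the earlier output, a Monday. In particular, `%w` gives the weekday as a number, and `%W` gives the week of the year counted from the first Monday.

```python
import time

# Parse a fixed timestamp into a struct_time.
t = time.strptime("2025-06-16 11:15:22", "%Y-%m-%d %H:%M:%S")

short_year = time.strftime("%y", t)    # "25"
weekday = time.strftime("%w", t)       # "1"  (Monday; Sunday would be "0")
week_of_year = time.strftime("%W", t)  # "24"
clock = time.strftime("%H:%M", t)      # "11:15"

print(short_year, weekday, week_of_year, clock)
```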

- Suspend execution for a given number of seconds using `sleep()`.

In [57]:
print("Start")
time.sleep(3)  # sleep for 3 seconds
print("End")

Start


End


- Get the computation time by `time()`.

In [58]:
start = time.time()
for _ in range(1000000):
    pass
end = time.time()
print(f"Execution time: {end - start:.6f} seconds")

Execution time: 0.037595 seconds


- Get more precise computation time by `perf_counter()`.

In [59]:
start = time.perf_counter()
for _ in range(1000000):
    pass
end = time.perf_counter()
print(f"Execution time: {end - start:.6f} seconds")

Execution time: 0.026673 seconds


- Get the CPU time used by the current process.

In [60]:
start = time.process_time()
for _ in range(1000000):
    pass
end = time.process_time()
print(f"CPU time used: {end - start:.6f} seconds")

CPU time used: 0.024245 seconds
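The difference between wall-clock time and CPU time becomes visible when the program waits. In this sketch the 0.2-second sleep is an arbitrary choice. `perf_counter()` includes the time spent sleeping, while `process_time()` counts only the time the CPU spent working for this process.

```python
import time

wall_start = time.perf_counter()
cpu_start = time.process_time()

time.sleep(0.2)              # waiting: adds to wall-clock time only
total = sum(range(200_000))  # computing: adds to both clocks

wall_elapsed = time.perf_counter() - wall_start
cpu_elapsed = time.process_time() - cpu_start

print(f"Wall-clock time: {wall_elapsed:.3f} seconds")
print(f"CPU time:        {cpu_elapsed:.3f} seconds")
```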


### `datetime` library
<hr>

`datetime` is part of Python's standard built-in library, offering methods for date and time manipulation, including fetching the current time, date calculations, timezone handling, and more.

- Get current datetime by `now()`.

In [61]:
from datetime import datetime

now = datetime.now()
print(now)

2025-06-16 11:15:25.618649


- Create datetime by `datetime()`.

In [62]:
from datetime import datetime

dt = datetime(2025, 2, 17, 10, 30, 57)
print(dt)

2025-02-17 10:30:57


- Format the datetime by `strftime()`.

In [63]:
from datetime import datetime

dt = datetime(2025, 2, 17, 10, 30, 57)

formatted_str = dt.strftime("%Y-%m-%d %H:%M:%S")
print(formatted_str)

2025-02-17 10:30:57
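The reverse operation, turning a formatted string back into a `datetime` object, can be done with `datetime.strptime()` using the same format codes. A minimal sketch:

```python
from datetime import datetime

text = "2025-02-17 10:30:57"
parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")

print(parsed.year, parsed.month, parsed.day)  # 2025 2 17

# Formatting the parsed value again gives back the original string.
round_trip = parsed.strftime("%Y-%m-%d %H:%M:%S")
print(round_trip == text)  # True
```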


- Datetime computation

In [64]:
from datetime import datetime, timedelta

dt = datetime(2025, 2, 17, 10, 30, 57)
new_dt = dt + timedelta(days=5, hours=3)
print(new_dt)

2025-02-22 13:30:57


In [65]:
from datetime import datetime

dt1 = datetime(2025, 2, 17, 10, 30, 57)
dt2 = datetime(2025, 2, 20, 15, 0, 0)

diff = dt2 - dt1
print(diff)
print(diff.total_seconds())

3 days, 4:29:03
275343.0
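The difference between two datetimes is a `timedelta` object. As a short sketch using the same two dates, its `days` and `seconds` attributes hold the two parts shown in the printed form, and dividing by another `timedelta` converts the difference to other units.

```python
from datetime import datetime, timedelta

diff = datetime(2025, 2, 20, 15, 0, 0) - datetime(2025, 2, 17, 10, 30, 57)

print(diff.days)     # 3
print(diff.seconds)  # 16143  (4 h 29 min 3 s expressed in seconds)

# Dividing by a one-hour timedelta gives the difference in hours.
hours = diff / timedelta(hours=1)
print(round(hours, 2))  # 76.48
```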


## Exercises
<hr>

```{exercise}
:label: create-array
How do you create a NumPy array from a Python list?

A.&nbsp;&nbsp;  np.array(list)

B.&nbsp;&nbsp;  numpy(list)

C.&nbsp;&nbsp;  np.create(list)

D.&nbsp;&nbsp;  array(list)

```

````{solution} create-array
:class: dropdown
A
````

```{exercise}
:label: access-array
How do you access the element at the second row and third column of a 2D NumPy array 'arr'?

A.&nbsp;&nbsp;  arr[1][2]

B.&nbsp;&nbsp;  arr[2, 3]

C.&nbsp;&nbsp;  arr[2][3]

D.&nbsp;&nbsp;  arr[1, 2]

```

````{solution} access-array
:class: dropdown
D
````

```{exercise}
:label: zeros
What does the call np.zeros((1, 2)) in NumPy do?

A.&nbsp;&nbsp;  Create an array with ones, 1 row and 2 columns

B.&nbsp;&nbsp;  Create an array with zeros, 1 row and 2 columns

C.&nbsp;&nbsp;  Create an identity matrix

D.&nbsp;&nbsp;  Create an array with random values


```

````{solution} zeros
:class: dropdown
B
````

```{exercise}
:label: arange-step
What does the code np.arange(1, 10, 2) in NumPy do?

A.&nbsp;&nbsp;  Creates an array with values from 1 to 10

B.&nbsp;&nbsp;  Creates an array with values from 1 to 10 with step 2

C.&nbsp;&nbsp;  Creates an array with values from 1 to 9 with step 2

D.&nbsp;&nbsp;  Creates an array with values from 1 to 9


```

````{solution} arange-step
:class: dropdown
C
````

```{exercise}
:label: max_index
How do you find the index of the maximum value in a NumPy array 'arr'?

A.&nbsp;&nbsp;  np.max(arr)

B.&nbsp;&nbsp;  np.argmax(arr)

C.&nbsp;&nbsp;  np.maximum(arr)

D.&nbsp;&nbsp;  np.max_index(arr)

```

````{solution} max_index
:class: dropdown
B
````

```{exercise}
:label: mean-axis
How can you calculate the mean for each row of a 2D NumPy array 'arr'?

A.&nbsp;&nbsp;  np.mean(arr, axis=2)

B.&nbsp;&nbsp;  np.mean(arr, axis=0)

C.&nbsp;&nbsp;  np.mean(arr, axis=1)

D.&nbsp;&nbsp;  np.average(arr, axis=0)

```

````{solution} mean-axis
:class: dropdown
C
````
