# What is NumPy?
NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.

In [1]:
import numpy as np

# What is an “array”?
In computer programming, an array is a structure for storing and retrieving data. We often talk about an array as if it were a grid in space, with each cell storing one element of the data. 

For instance, if each element of the data were a number, we might visualize a “one-dimensional” array like a list:

![image.png](attachment:23d6a4dd-a395-429c-a85d-daf1adc78ebc.png)

A two-dimensional array would be like a table:

![image.png](attachment:ca7c97a3-0c12-4d71-8e0d-b36afc761e9d.png)

A three-dimensional array would be like a set of tables, perhaps stacked as though they were printed on separate pages. In NumPy, this idea is generalized to an arbitrary number of dimensions, and so the fundamental array class is called ndarray: it represents an “N-dimensional array”.

Most NumPy arrays have some restrictions. For instance:

- All elements of the array must be of the same type of data.

- Once created, the total size of the array can’t change.

- The shape must be “rectangular”, not “jagged”; e.g., each row of a two-dimensional array must have the same number of columns.

When these conditions are met, NumPy exploits these characteristics to make the array faster, more memory efficient, and more convenient to use than less restrictive data structures.

For the remainder of this document, we will use the word “array” to refer to an instance of ndarray.
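
The first two restrictions listed above can be illustrated with a minimal sketch. The variable names are illustrative and not part of the original notebook. It assumes an integer array: a value assigned into it is cast to the array's data type, and "growing" the array with `np.append` builds a new array instead of resizing the original.

```python
import numpy as np

ints = np.array([1, 2, 3])       # integer dtype inferred from the inputs

# Same data type: a float assigned into an integer array is cast to an
# integer (the fractional part is dropped), not stored as a float.
ints[0] = 7.9
print(ints)                      # [7 2 3]

# Fixed size: np.append does not grow `ints` in place; it returns a new,
# larger array and leaves the original untouched.
bigger = np.append(ints, 4)
print(ints.size, bigger.size)    # 3 4
```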

# Create arrays

In [11]:
a = np.array([1, 2, 3])
print("a:", a)  # [1 2 3]

b = np.array([[1, 2], [3, 4]])
print("b:\n", b)

a: [1 2 3]
b:
 [[1 2]
 [3 4]]


In [6]:
c = np.array([[1, 2], [3, 4, 5]])
print("c:", c)

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

In [12]:
# Create a 2×3 array filled with zeros
zero_array = np.zeros((2, 3))
print("zero_array:\n", zero_array)

zero_array:
 [[0. 0. 0.]
 [0. 0. 0.]]


In [13]:
# Create a 3×2 array filled with ones
one_array = np.ones((3, 2))
print("one_array:\n", one_array)

one_array:
 [[1. 1.]
 [1. 1.]
 [1. 1.]]


In [14]:
# Create a 3×3 identity matrix
e_array = np.eye(3)
print("e_array:\n", e_array)

e_array:
 [[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
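
The outputs above print `0.` and `1.` because `np.zeros`, `np.ones`, and `np.eye` default to a floating-point data type. The following sketch, with illustrative variable names, assumes you want integers instead and passes the `dtype` argument explicitly.

```python
import numpy as np

zeros_float = np.zeros((2, 3))            # default dtype is float64
zeros_int = np.zeros((2, 3), dtype=int)   # request integers explicitly
identity_int = np.eye(3, dtype=int)       # integer identity matrix

print(zeros_float.dtype)   # float64
print(zeros_int)
print(identity_int)
```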


## Array indexing and slicing
Elements of an array can be accessed in various ways. 

In [27]:
a1 = np.array([1,2,3])
print("a1:\n",a1)

a1:
 [1 2 3]


We can access an individual element of this array as we would access an element of the original list: by putting the element's integer index in square brackets.

In [7]:
print("a1[0]:",a1[0])
print("a1[1]:",a1[1])
print("a1[2]:",a1[2])

a1[0]: 1
a1[1]: 2
a1[2]: 3


In [8]:
print("a1[3]:",a1[3])

IndexError: index 3 is out of bounds for axis 0 with size 3

In [9]:
print("a1[-1]:",a1[-1])

a1[-1]: 3


In [11]:
print("a1[-2]:",a1[-2])

a1[-2]: 2


In [12]:
print("a1[0:2]:",a1[0:2])
print("a1[:2]:",a1[:2])

a1[0:2]: [1 2]
a1[:2]: [1 2]


In [13]:
a1[0]=10
print("a1 after a1[0]=10:", a1)

a1 after a1[0]=10: [10  2  3]


Slicing an array returns a view: an object that refers to the data in the original array. The original array can be mutated using the view.

In [19]:
b1 = a1[1:]
print("b1 when b1=a1[1:]:", b1)

b1 when b1=a1[1:]: [2 3]


In [20]:
b1[0] = 9
print("b1:", b1)
print("a1", a1)

b1: [9 3]
a1 [10  9  3]


In [21]:
b1[0] = 8
b1[1] = 7
print("b1:", b1)
print("a1", a1)

b1: [8 7]
a1 [10  8  7]

Assigning a new object to the name `b1` only rebinds the name. It does not write through the view, so `a1` is unchanged.


In [22]:
b1 = [6, 5]
print("b1:", b1)
print("a1", a1)

b1: [6, 5]
a1 [10  8  7]

Plain assignment such as `c1 = a1` does not copy anything: both names refer to the same array, so a change made through one is visible through the other.


In [23]:
c1 = a1
print("c1:", c1)
print("a1:", a1)

c1: [10  8  7]
a1: [10  8  7]


In [24]:
c1[0] = 1
print("c1:", c1)
print("a1:", a1)

c1: [1 8 7]
a1: [1 8 7]


In [25]:
c1[2] = 2
print("c1:", c1)
print("a1:", a1)

c1: [1 8 2]
a1: [1 8 2]


In [26]:
c1 = [3,4,5]
print("c1:", c1)
print("a1:", a1)

c1: [3, 4, 5]
a1: [1 8 2]
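
The cells above mix three different situations: a slice (a view), plain assignment (a second name for the same array), and rebinding a name to a new object. A minimal sketch, using illustrative names and the real NumPy helpers `np.shares_memory` and `ndarray.copy`, makes the distinction explicit. It is one way to check what shares data, not the only one.

```python
import numpy as np

original = np.array([10, 2, 3])

view = original[1:]             # slicing: a view onto the same data
alias = original                # plain assignment: same object, second name
independent = original.copy()   # explicit copy: separate data

print(np.shares_memory(original, view))         # True
print(alias is original)                        # True
print(np.shares_memory(original, independent))  # False

independent[0] = 99   # changes only the copy
view[0] = 9           # writes through to `original`
print(original)       # [10  9  3]
```

When you need to modify a slice without touching the source array, call `.copy()` on the slice first.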


In [28]:
a2 = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
print("a2:\n",a2)

a2:
 [[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]


In [62]:
print("a2[1,3]:\n",a2[1,3])

a2[1,3]:
 8


In [63]:
print("a2_row0:", a2[0])

a2_row0: [1 2 3 4]


In [44]:
print("a2_row2:", a2[2])
print("a2_row2:", a2[2,:])
print("a2_row2:", a2[[2]])

a2_row2: [ 9 10 11 12]
a2_row2: [ 9 10 11 12]
a2_row2: [[ 9 10 11 12]]


In [33]:
print("a2_row3:", a2[3])

IndexError: index 3 is out of bounds for axis 0 with size 3

In [35]:
print("a2_last_row:", a2[-1])

a2_last_row: [ 9 10 11 12]


In [41]:
a2_col0 = a2[:, [0]]
print("a2_col0:\n", a2_col0)

a2_col0:
 [[1]
 [5]
 [9]]


In [42]:
a2_col0 = a2[:, 0]
print("a2_col0:\n", a2_col0)

a2_col0:
 [1 5 9]
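
The row and column examples above differ in whether the result keeps two dimensions. The sketch below, with an illustrative array named `grid`, prints the shapes side by side and also shows slicing on both axes at once.

```python
import numpy as np

grid = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

print(grid[2].shape)       # (4,)   an integer index drops the row axis
print(grid[[2]].shape)     # (1, 4) a list index keeps it
print(grid[:, 0].shape)    # (3,)
print(grid[:, [0]].shape)  # (3, 1)

# Slices on both axes select a rectangular block: rows 0-1, columns 1-2.
block = grid[0:2, 1:3]
print(block)
```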


## Array attributes
This section covers the ndim, shape, size, and dtype attributes of an array.

In [45]:
a1 = np.array([1,2,3])
print("a1:\n",a1)
a2 = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
print("a2:\n",a2)

a1:
 [1 2 3]
a2:
 [[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]


In [47]:
# The number of dimensions of an array is contained in the ndim attribute.
print("a1_ndim", a1.ndim)
print("a2_ndim", a2.ndim)

a1_ndim 1
a2_ndim 2


In [51]:
# The shape of an array is a tuple of non-negative integers that specify the number of elements along each dimension.
print("a1_shape:", a1.shape)
print("a2_shape:", a2.shape)
print("a1 shape & dim:", len(a1.shape) == a1.ndim)
print("a2 shape & dim:", len(a2.shape) == a2.ndim)


a1_shape: (3,)
a2_shape: (3, 4)
a1 shape & dim: True
a2 shape & dim: True


In [52]:
# The fixed, total number of elements in an array is contained in the size attribute.
print("number of elements in a1:", a1.size)
print("number of elements in a2:", a2.size)

number of elements in a1: 3
number of elements in a2: 12


In [53]:
# Arrays are typically “homogeneous”, meaning that they contain elements of only one “data type”.
# The data type is recorded in the dtype attribute.
print("a1 data type:", a1.dtype)

a1 data type: int64
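
The attributes above are related to one another. As a small sketch (the name `table` is illustrative), `size` is the product of the entries in `shape`, and `nbytes` is `size` times the per-element `itemsize`. The dtype is set explicitly here because the default integer type can vary by platform and NumPy version.

```python
import math
import numpy as np

table = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]], dtype=np.int64)

print(table.size == math.prod(table.shape))          # True: 3 * 4 == 12
print(table.itemsize)                                # 8 bytes per int64 element
print(table.nbytes == table.size * table.itemsize)   # True: 12 * 8 == 96
```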


## Difference between Python Lists and NumPy Arrays

### 1. Data Type Flexibility

- Python lists can hold elements of different data types.
- NumPy arrays require all elements to be of the same data type.

In [55]:
# Python list with mixed types
py_list = [1, 2.5, 'three']
print("Python list:", py_list)

# NumPy array with uniform type
np_array = np.array([1, 2, 3])
print("NumPy array:", np_array)

# Trying to create a NumPy array with mixed types
np_mixed = np.array([1, 2.5, 'three'])  # all elements are coerced to strings
print("NumPy array with mixed types:", np_mixed)
print("Data type of np_mixed array:", np_mixed.dtype)

Python list: [1, 2.5, 'three']
NumPy array: [1 2 3]
NumPy array with mixed types: ['1' '2.5' 'three']
Data type of np_mixed array: <U32


### 2. Performance Comparison

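NumPy arrays are generally faster than Python lists for numerical computation, because element-wise operations run in compiled code rather than in a Python loop.
Let's compare the time taken to compute the element-wise sum of two sequences. With `size = 10`, as in this cell, both timings are close to the resolution of `time.time()`, so the gap only becomes clear for much larger sequences.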

In [5]:
import time

size = 10

# Using Python lists
list1 = list(range(size))
list2 = list(range(size))

start = time.time()
result_list = [x + y for x, y in zip(list1, list2)]
end = time.time()
print(f"Time using Python lists: {end - start:.4f} seconds")

# Using NumPy arrays
arr1 = np.arange(size)
arr2 = np.arange(size)
print("arr1:",arr1)
start = time.time()
result_array = arr1 + arr2
end = time.time()
print(f"Time using NumPy arrays: {end - start:.4f} seconds")

Time using Python lists: 0.0001 seconds
arr1: [0 1 2 3 4 5 6 7 8 9]
Time using NumPy arrays: 0.0000 seconds
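
To make the difference measurable, here is a sketch of the same comparison on larger inputs. It uses the standard-library `timeit` module and keeps the best of several runs. The size of one million elements and the repeat count are assumptions chosen for illustration, and the actual timings depend on your machine, so no expected output is shown.

```python
import timeit
import numpy as np

size = 1_000_000  # assumed size; adjust to taste

list1 = list(range(size))
list2 = list(range(size))
arr1 = np.arange(size)
arr2 = np.arange(size)

# Run each version several times and keep the fastest run.
list_time = min(timeit.repeat(
    lambda: [x + y for x, y in zip(list1, list2)], number=1, repeat=5))
array_time = min(timeit.repeat(lambda: arr1 + arr2, number=1, repeat=5))

print(f"Time using Python lists: {list_time:.4f} seconds")
print(f"Time using NumPy arrays: {array_time:.4f} seconds")
```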


### 3. Vectorized Operations

NumPy arrays support vectorized operations directly, without explicit loops.

In [57]:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print("a + b =", a + b)
print("a * b =", a * b)

# For Python lists, you must use loops or comprehensions:
list_a = [1, 2, 3]
list_b = [4, 5, 6]
list_sum = [x + y for x, y in zip(list_a, list_b)]
print("list_a + list_b =", list_sum)

a + b = [5 7 9]
a * b = [ 4 10 18]
list_a + list_b = [5, 7, 9]
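
One reason the loop or comprehension is needed for lists is that the `+` and `*` operators mean something different for them. The short sketch below contrasts the two behaviors, reusing the list names from the cell above.

```python
import numpy as np

list_a = [1, 2, 3]
list_b = [4, 5, 6]

print(list_a + list_b)   # [1, 2, 3, 4, 5, 6]  concatenation
print(list_a * 2)        # [1, 2, 3, 1, 2, 3]  repetition

print(np.array(list_a) + np.array(list_b))   # [5 7 9]  element-wise sum
print(np.array(list_a) * 2)                  # [2 4 6]  element-wise scaling
```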


### 4. Memory Usage

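NumPy arrays are usually more memory efficient than Python lists, because an array stores its elements in one contiguous block of a fixed-size type, while a list stores references to separate Python objects. Note that `sys.getsizeof` on a list counts only the list's own reference storage, not the integer objects it refers to, so the comparison in this cell understates the list's true footprint.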

In [None]:
import sys

print("Memory size of Python list with 1000 ints:", sys.getsizeof([0]*1000))
print("Memory size of NumPy array with 1000 ints:", np.array([0]*1000).nbytes)
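
A somewhat fairer comparison also counts the integer objects that the list refers to. The sketch below is an approximation under stated assumptions: it runs on CPython, uses 1000 distinct integers rather than 1000 references to the same `0`, and uses illustrative variable names. Exact byte counts vary by platform and Python version, so none are shown.

```python
import sys
import numpy as np

values = list(range(1000))
arr = np.array(values)

# The list object itself stores only references to its elements...
list_container = sys.getsizeof(values)
# ...so add the size of each int object it points to.
list_total = list_container + sum(sys.getsizeof(v) for v in values)

print("list, container only:", list_container)
print("list, including int objects:", list_total)
print("array element data (nbytes):", arr.nbytes)
```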

### 5. Multidimensional Support

NumPy arrays natively support multidimensional data, while Python lists can only achieve this by nesting.

In [59]:
# Python nested lists (2D)
py_list_2d = [[1, 2], [3, 4]]
print("Python nested list:", py_list_2d)

# NumPy 2D array
np_array_2d = np.array([[1, 2], [3, 4]])
print("NumPy 2D array:\n", np_array_2d)

Python nested list: [[1, 2], [3, 4]]
NumPy 2D array:
 [[1 2]
 [3 4]]


In [61]:
print(len(py_list_2d))         
print(py_list_2d[1][1])        
col1 = [row[1] for row in py_list_2d]
print(col1)

2
4
[2, 4]
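
For comparison, here is a sketch of the same three operations on the NumPy array. It rebuilds `np_array_2d` so that it runs on its own.

```python
import numpy as np

np_array_2d = np.array([[1, 2], [3, 4]])

print(np_array_2d.shape)    # (2, 2): rows and columns in one attribute
print(np_array_2d[1, 1])    # 4
print(np_array_2d[:, 1])    # [2 4]: column 1, no loop needed
```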
