# CA6000 (Nov 18) Topic 5: NumPy Basics and Practice

This notebook is compiled from the lecture and the slides *CA6000 - Topic 5 (NumPy)*, for review and hands-on practice.


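## 1. Introducing and Importing NumPy

NumPy = **Numerical Python**, the core Python library for numerical computing and array processing:

- Its main object is the multidimensional array, `ndarray`
- For **large numerical datasets** it is memory-efficient and fast
- Many data science/AI libraries (such as *scikit-learn*, *pandas*, and *TensorFlow*) build on or interoperate with NumPy

Import convention: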

```python
import numpy as np
```

As the lecturer also pointed out in class, we alias `numpy` as `np` to **save typing**; it is also the convention used throughout the community.


In [1]:
import numpy as np

# Check the NumPy version
print("NumPy version:", np.__version__)

# A minimal 1-D array (vector) example
arr = np.array([1, 2, 3, 4])
print("arr =", arr)
print("type(arr) =", type(arr))
print("dtype =", arr.dtype)


NumPy version: 1.26.4
arr = [1 2 3 4]
type(arr) = <class 'numpy.ndarray'>
dtype = int64


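### Exercise 1: Import and First Look

1. In the code cell above, create another array, for example one containing the last 4 digits of your student ID.
2. Use `type()` and `.dtype` to inspect its type and its element data type.
3. Convert the array's values to `float` and print it again to see the difference (you can use `arr.astype(float)`).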


In [3]:
arr1 = np.array([7,2,1,2])
print(arr1)
print(type(arr1))
print(arr1.dtype)

[7 2 1 2]
<class 'numpy.ndarray'>
int64
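
Step 3 of the exercise (converting to `float`) is not covered by the cell above. A minimal sketch, reusing the same example digits; the name `arr1_float` is only illustrative:

```python
import numpy as np

arr1 = np.array([7, 2, 1, 2])
arr1_float = arr1.astype(float)   # returns a new array; arr1 itself is unchanged

print(arr1, arr1.dtype)
print(arr1_float, arr1_float.dtype)   # the values now print with a trailing dot
```

The integer dtype printed for `arr1` may differ by platform, while `astype(float)` gives `float64`.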


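## 2. Creating Arrays from Lists and Common Creation Functions

As the slides note, if your data is already in a Python `list`, you can convert it to a NumPy array with `np.array()`.

Common ways to create arrays:

- `np.array(list_like)`: create from a list or nested list
- `np.linspace(start, stop, num)`: generate `num` **evenly spaced** points over the closed interval `[start, stop]` (the endpoint is included by default)
- `np.arange(start, stop, step)`: like `range`; generates values over the half-open interval `[start, stop)` with step `step`


---
- linspace = linear space: an evenly spaced array  
- arange = array + range: NumPy's version of `range`, except that it returns an array of values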

In [None]:
# Create an ndarray from a Python list
py_list = [10, 20, 30, 40]
np_arr = np.array(py_list)
print("Original list:", py_list)
print("NumPy array:", np_arr)
print("shape:", np_arr.shape)

# Create an evenly spaced array with linspace
lin = np.linspace(0, 1, 5)  # 5 points from 0 to 1 (both ends included)
print("linspace(0, 1, 5):", lin)

# Create an array with arange (endpoint excluded)
ar = np.arange(0, 12, 2)  # 0, 2, 4, ..., 10
print("arange(0, 12, 2):", ar)


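### Exercise 2: Create Your Own Arrays

1. Use `np.linspace` to create an array from 0 to 100 with 11 points (spacing of 10).
2. Use `np.arange` to create an array from 5 to 50 with a step of 5.
3. Consider: how do `linspace` and `arange` differ in whether they include the endpoint? Write your conclusion as a comment in the code cell.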

In [16]:
arr2 = np.linspace(0,100,11)
print("arr2:", arr2)  # not half-open: linspace includes the endpoint 100

arr2: [  0.  10.  20.  30.  40.  50.  60.  70.  80.  90. 100.]


In [17]:
arr3 = np.arange(5,50,5)
print("arr3:", arr3)  # half-open: arange excludes the endpoint 50

arr3: [ 5 10 15 20 25 30 35 40 45]


In [25]:
r = range(5)  # half-open: 0..4
print("Printing r directly:", r)  # r itself is not a list
print(list(r))

Printing r directly: range(0, 5)
[0, 1, 2, 3, 4]


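### The Python Built-in `list()`: Usage and Examples

This section summarizes the usage of Python's built-in **`list()`**, covering:

- The basic purpose and syntax of `list()`
- Building lists from `range`, strings, tuples, sets, dictionaries, and so on
- Using `list()` to materialize generators / iterators (such as `map` and `filter`)
- Using `list()` to make a shallow copy
- A few exercises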
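
The points listed above can be sketched in one short cell. This is an illustrative example rather than material from the slides, and all variable names are made up for the demonstration:

```python
# list() accepts any iterable and returns a new list.
print(list(range(3)))            # [0, 1, 2]
print(list("abc"))               # ['a', 'b', 'c']
print(list((1, 2, 3)))           # [1, 2, 3]
print(sorted(list({3, 1, 2})))   # set order is arbitrary, so sort for display
print(list({"a": 1, "b": 2}))    # iterating a dict yields its keys: ['a', 'b']

# map and filter are lazy iterators; list() materializes them.
squares = list(map(lambda n: n * n, [1, 2, 3]))
evens = list(filter(lambda n: n % 2 == 0, range(6)))
print(squares, evens)            # [1, 4, 9] [0, 2, 4]

# Shallow copy: the outer list is new, but the inner lists are shared.
nested = [[1, 2], [3, 4]]
shallow = list(nested)
shallow.append([5, 6])     # only the copy grows
shallow[0].append(99)      # visible through nested[0] too (same inner list)
print(nested)              # [[1, 2, 99], [3, 4]]
```

The last lines show why the copy is called shallow: appending to the outer list is independent, while mutating a shared inner list is not.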

## Creating array from List

If your dataset is already loaded as a standard Python `list`, you can convert it into a NumPy `ndarray` using the `np.array()` constructor.

This method applies to both 1-dimensional lists and multi-dimensional nested lists (matrices).

In [56]:
data = [2,2,10,230]
arr = np.array(data)
print(arr)
print(type(data),type(arr))

[  2   2  10 230]
<class 'list'> <class 'numpy.ndarray'>


In [69]:
# Example 2: Converting a nested list (2D array)
import numpy as np
data = [[1,2],[3,4],[5,6]]
print(data,"\n",type(data))


arr = np.array(data)
print(arr,"\n",type(arr))

# Only the NumPy array has a .shape attribute; a plain list does not:
# print(data.shape)  -> AttributeError: 'list' object has no attribute 'shape'
print(arr.shape)

[[1, 2], [3, 4], [5, 6]] 
 <class 'list'>
[[1 2]
 [3 4]
 [5 6]] 
 <class 'numpy.ndarray'>
(3, 2)


### Practice Exercises

**Task 1:** Create a list containing floating-point numbers (e.g., `3.14`, `2.71`, `1.61`). Convert it to a NumPy array and print the result.

**Task 2:** Define a nested list representing a **2x2 matrix** (2 rows, 2 columns). Convert it into a NumPy array and verify its dimensions by printing `.shape`.

In [79]:
data = [3.14,2.71,1.62]
arr = np.array(data)
print(arr,type(arr),arr.dtype)

data = [[1,2],[3,4]]
arr = np.array(data)
print(arr,type(arr),arr.dtype)
print(arr.shape)

[3.14 2.71 1.62] <class 'numpy.ndarray'> float64
[[1 2]
 [3 4]] <class 'numpy.ndarray'> int64
(2, 2)


## Array Dimension (could an exam question ask us to read `arr.shape` by eye?)

Arrays in NumPy can have different levels of depth, referred to as **dimensions** or axes.

You can inspect the structure of an array using the following attributes:
*   **`.ndim`**: Returns the number of dimensions (axes).
*   **`.shape`**: Returns a tuple giving the length of each axis, e.g. `(rows, columns)` for a 2-D array.
*   **`.size`**: Returns the total number of elements in the array.
*   **`.dtype`**: Returns the data type of the elements (e.g., `int64`, `float64`).

In [102]:
# 1-D Array: An array containing scalars
arr1 = np.array([1,2,3,4,5])
print(arr1)
print("the shape of arr1:",arr1.shape,"\n")

# A 2-D array is an array with 1-D arrays as its element:
arr2 = np.array([[1,2,3],[4,5,6]])
print(arr2)
print("the shape of arr2:",arr2.shape,"\n")

arr3 = np.array([
    [1,2],[3,4],[5,6]
])
print(arr3)
print("the shape of arr3:",arr3.shape,"\n")

arr2 = np.array([[1,2,3,4,5],[6,7,8,9,10]])
print("arr2 -> ndim:", arr2.ndim)
print("arr2 -> shape:", arr2.shape)
print("arr2 -> size:", arr2.size) #the total number of elements in the array
print("arr2 -> dtype:", arr2.dtype,"\n")

# A 3-D array is an array with 2-D arrays as its elements:
arr4 = np.array([
    [[1,2,3],[1,2,3]],[[1,2,3],[1,2,3]]
])
print(arr4)
print("the shape of arr4:",arr4.shape)

[1 2 3 4 5]
the shape of arr1: (5,) 

[[1 2 3]
 [4 5 6]]
the shape of arr2: (2, 3) 

[[1 2]
 [3 4]
 [5 6]]
the shape of arr3: (3, 2) 

arr2 -> ndim: 2
arr2 -> shape: (2, 5)
arr2 -> size: 10
arr2 -> dtype: int64 

[[[1 2 3]
  [1 2 3]]

 [[1 2 3]
  [1 2 3]]]
the shape of arr4: (2, 2, 3)


### Practice Exercises

**Task 1:** Create a **2-D array** with 3 rows and 3 columns (e.g., numbers 1 to 9). Print its `.shape` to verify it is `(3, 3)`.

**Task 2:** Check the attributes of `arr3` created above. Print its **total number of elements** (`.size`) and the **number of dimensions** (`.ndim`).
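
One possible way to approach both tasks, offered as a sketch rather than a model answer. `grid` is an illustrative name, and `arr3` is re-created here with the same values as in the cell above so the block runs on its own:

```python
import numpy as np

# Task 1: a 3x3 array holding 1..9
grid = np.arange(1, 10).reshape(3, 3)
print(grid.shape)             # (3, 3)

# Task 2: arr3 as defined earlier (3 rows, 2 columns)
arr3 = np.array([[1, 2], [3, 4], [5, 6]])
print(arr3.size, arr3.ndim)   # 6 2
```

Note that `.size` is the product of the entries of `.shape` (here 3 x 2 = 6), which is a quick way to check a shape read by eye.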

## Accessing array (by Index)

NumPy allows you to access and manipulate array elements using **indexing** and **slicing**.

*   **Indexing**: Access specific elements using integer indices. For 2D arrays, use a comma-separated tuple `[row, col]`.
*   **Slicing**: Extract a subset of the array using the `[start:end]` syntax (the `end` index is exclusive).
*   **Iteration**: You can iterate through arrays using standard loops.

In [110]:
# 1. Accessing a specific element in a 2D array
arr2 = np.array([[1, 2, 3], [7, 8, 9], [10, 11, 12]])
print(arr2)
print(arr2.shape)
print(arr2[0, 1])  # Access row 0, column 1 (Value: 2)

# 2. Slicing a 1D array
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[1:5])

# Iterating over a 1-D array with a for loop yields one element at a time
print("output from for loop print 1-D array:")
for x in arr:
    print(x)

arr = np.array([[1,2,3,4,5],[6,7,8,9,10]])
print(arr)
print(arr[1,1:4]) # row 1, columns 1 to 3 (end index 4 is exclusive): [7 8 9]

# Iterating over a 2-D array with a for loop yields one row at a time
print("output from for loop print 2-D array:")
for x in arr:
    print(x)

[[ 1  2  3]
 [ 7  8  9]
 [10 11 12]]
(3, 3)
2
[2 3 4 5]
output from for loop print 1-D array:
1
2
3
4
5
6
7
[[ 1  2  3  4  5]
 [ 6  7  8  9 10]]
[7 8 9]
output from for loop print 2-D array:
[1 2 3 4 5]
[ 6  7  8  9 10]


### Practice Exercises

**Task 1:** Create an array `[10, 20, 30, 40, 50]`. Use slicing to print the numbers `30` and `40`.

**Task 2:** Given the 2D array `matrix = np.array([[1, 2], [3, 4]])`, access and print the number `3` using `[row, col]` indexing.

In [111]:
# Task 1: Slice [30, 40]
# Write your code here


# Task 2: Access the element '3'
# Write your code here
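
A possible solution sketch for the two tasks; `values` is an illustrative name, and `matrix` is the array given in Task 2:

```python
import numpy as np

# Task 1: 30 and 40 sit at indices 2 and 3, so the slice is [2:4] (end exclusive)
values = np.array([10, 20, 30, 40, 50])
print(values[2:4])      # [30 40]

# Task 2: the number 3 is in row 1, column 0
matrix = np.array([[1, 2], [3, 4]])
print(matrix[1, 0])     # 3
```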

## Data type within the Array

Unlike standard Python lists, NumPy arrays are **homogeneous**, meaning all elements must be of the same data type.

*   **Checking Type**: Use the `.dtype` attribute to inspect the data type of elements.
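*   **Type Codes**: NumPy often reports dtypes as short codes (e.g., `<U6`, `int64`). The letter in such a code is the dtype *kind*, also available as `arr.dtype.kind`. Common kinds include:
    *   `i`: Integer
    *   `f`: Float
    *   `b`: Boolean
    *   `S`: String (bytes)
    *   `U`: Unicode String
    *   `O`: Object
*   In `<U6`, `<` is the byte order (little-endian) and `6` is the maximum string length in characters. Unlike Python lists, which can store strings of varying lengths, NumPy requires all elements in an array to occupy the same amount of memory for computational efficiency, so every slot is sized for the longest string.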
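
The kind letters can be checked directly on small arrays. A short sketch (the array names are illustrative):

```python
import numpy as np

ints = np.array([1, 2, 3])
words = np.array(['apples', 'banana', 'cherry'])
flags = np.array([True, False])

print(ints.dtype.kind)     # i
print(words.dtype.kind)    # U
print(flags.dtype.kind)    # b

# Every slot of a 6-character Unicode array reserves 6 characters of 4 bytes each.
print(words.dtype.itemsize)   # 24

# Caution: as a dtype *string*, 'b' means a signed 8-bit integer, not bool.
print(np.dtype('b'))          # int8
print(np.dtype('?'))          # bool
```

The last two lines show that the kind letter `b` and the dtype string `'b'` are different things, so `bool` or `'?'` is the safer way to request a boolean array.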

In [120]:
# Example 1: Integer Array
# NumPy automatically detects the type as integer
arr = np.array([[1,2,3,4,5],[6,7,8,9,10]])
print(arr.dtype)

# Example 2: String/Unicode Array
# NumPy detects strings. Note that 'U' stands for Unicode.
arr = np.array(['apples','banana','cherry'])
print(arr)
print(arr.dtype)
print(arr.size)

arr = np.array(['big apples','banana','cherry'])
print(arr.dtype)

int64
['apples' 'banana' 'cherry']
<U6
3
<U10


### Practice Exercises (Advanced)

**Task 1: Type Promotion (Upcasting)**
When a list mixes types, NumPy picks a single common dtype that can represent every element. Create an array from the mixed list `[10, 20, 30.5, 40]`. Print the `dtype`.
*Question: Did the integers remain integers, or were they cast to floats?*

**Task 2: Forcing Data Types**
Create an array using the list `[1, 0, 1]` but force the data type to be **boolean** (True/False) instead of integer. You can achieve this by passing `dtype=bool` (or the type character `'?'`) as an argument to `np.array()`. Note that the string `'b'` does not mean boolean here: as a dtype string it denotes a signed 8-bit integer (`int8`). Print the resulting array and its `dtype`.

In [140]:
data = [10,20,30.5,40]
arr = np.array(data)
#print(data.dtype,arr.dtype) #'list' object has no attribute 'dtype'
print("before convert:")
for x in data:
    print(type(x))
print(data)

print("\n after convert:")
print(arr.dtype)
print(arr)


before convert:
<class 'int'>
<class 'int'>
<class 'float'>
<class 'int'>
[10, 20, 30.5, 40]

 after convert:
float64
[10.  20.  30.5 40. ]


In [142]:
arr = np.array([1,0,1],dtype = bool)
print(arr)
print(arr.dtype)

[ True False  True]
bool


## Data Type and Copy

### 1. Specifying Data Types
While NumPy attempts to infer data types, you can explicitly define them using the `dtype` argument.
*   **`dtype='S'`**: Creates an array of **ByteStrings** (fixed-length 8-bit encoded characters).
*   Note: In Python 3, `S` types appear with a `b` prefix (e.g., `b'10'`), indicating they are bytes, not Unicode strings.

### 2. Copying Arrays
To create a strictly independent duplicate of an array, use the `.copy()` method.
*   **The Difference**: If you simply assign `arr2 = arr1`, both variables point to the same memory. Changing one changes the other.
*   **The Solution**: `arr2 = arr1.copy()` allocates new memory. Changes to the original do **not** affect the copy.
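
The difference between plain assignment and `.copy()` can be seen side by side. A minimal sketch with illustrative names:

```python
import numpy as np

original = np.array([1, 2, 3])
alias = original                 # same object, no new memory
independent = original.copy()    # new memory

original[0] = 10

print(alias)         # [10  2  3]  (follows the change)
print(independent)   # [1 2 3]     (unaffected)
print(alias is original, np.shares_memory(original, independent))   # True False
```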

In [144]:
arr = np.array([10,2,3,4],dtype='S')
print(arr)
print(arr.dtype)
print(arr[2])

[b'10' b'2' b'3' b'4']
|S2
b'3'


In [152]:
arr = np.array([1,2,3])
newarr = arr.copy() # copy taken before the change, so newarr keeps the old values
arr[0] = 10
print(arr)
print(newarr)

print("----------")

arr = np.array([1,2,3])
arr[0] = 10
newarr = arr.copy() # copy taken after the change, so newarr already contains 10
print(arr)
print(newarr)

[10  2  3]
[1 2 3]
----------
[10  2  3]
[10  2  3]


## Data Type Conversion (astype)

The `.astype()` method provides a way to perform **explicit type casting**.

*   **Functionality**: It converts the array to a specified data type (e.g., `float` to `int`, `int` to `bool`).
*   **Copying**: By default (`copy=True`) this method returns a **new array**. The original array remains unchanged.
*   **Truncation**: When converting from `float` to `int`, NumPy truncates the decimal part (rounds towards zero); it does **not** round to the nearest integer.

In [169]:
arr = np.array([1.1,1.9,10.0,0])
newarr = arr.astype(int)
print("Original:", arr)
print("Converted:", newarr)
print("New dtype:", newarr.dtype)

newarr = arr.astype('int32')
print("Original:", arr)
print("Converted:", newarr)
print("New dtype:", newarr.dtype)

newarr = arr.astype(np.int32)
print("Original:", arr)
print("Converted:", newarr)
print("New dtype:", newarr.dtype)

newarr = arr.astype(bool)
print("Original:", arr)
print("Converted:", newarr)
print("New dtype:", newarr.dtype)

Original: [ 1.1  1.9 10.   0. ]
Converted: [ 1  1 10  0]
New dtype: int64
Original: [ 1.1  1.9 10.   0. ]
Converted: [ 1  1 10  0]
New dtype: int32
Original: [ 1.1  1.9 10.   0. ]
Converted: [ 1  1 10  0]
New dtype: int32
Original: [ 1.1  1.9 10.   0. ]
Converted: [ True  True  True False]
New dtype: bool


### Practice Exercises (Advanced)

**Task 1: Two-Step Conversion (String parsing)**
Create an array of strings representing floating-point numbers: `data = np.array(['1.5', '2.9', '-3.7'])`.
*   Try to convert this directly to integers using `.astype(int)`. *Note the error.*
*   Correctly convert it first to `float`, and *then* chain another `.astype(int)` to convert those floats to integers. Print the final result.

**Task 2: The "Round vs. Truncate" Trap**
Create an array `arr = np.array([0.99, 1.01, 1.99, -0.99])`.
*   Convert it to `int`.
*   Then, convert that integer array to `bool`.
*   Print the final boolean array.
*   *Analysis Question: Why did `0.99` and `-0.99` both become `False`?*

In [171]:
# Task 1: String -> Float -> Int
# Write your code here



# Task 2: Float -> Int -> Bool chain
# Write your code here
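
A possible solution sketch for both tasks (the names `as_int` and `as_bool` are illustrative). It also answers the analysis question: truncation toward zero maps both `0.99` and `-0.99` to `0`, and `0` converts to `False`.

```python
import numpy as np

# Task 1: strings such as '1.5' cannot be parsed as integers directly
data = np.array(['1.5', '2.9', '-3.7'])
try:
    data.astype(int)
except ValueError as err:
    print("Direct conversion failed:", err)

as_int = data.astype(float).astype(int)   # parse as float first, then truncate
print(as_int)        # [ 1  2 -3]

# Task 2: float -> int truncates toward zero, then 0 becomes False
arr = np.array([0.99, 1.01, 1.99, -0.99])
as_bool = arr.astype(int).astype(bool)
print(as_bool)       # [False  True  True False]
```

Converting the float array straight to `bool` would give a different answer, because any nonzero float (including `0.99`) is `True`.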

## Changing the Array Shape

The `reshape()` method allows you to change the dimensions (structure) of an array without changing its data.

*   **Consistency Check**: The total number of elements in the new shape must equal the total number of elements in the original array.
*   **Flattening**: Converting a multi-dimensional array into a 1-D array is often called "flattening"; `reshape(-1)` does this.
*   **Unknown Dimension (`-1`)**: You can specify **one** dimension as `-1`. NumPy will automatically calculate this dimension based on the total number of elements.
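
The consistency check and the `-1` inference can be tried on a small array. A minimal sketch; the failing shape `(5, 3)` is chosen only to trigger the error:

```python
import numpy as np

arr = np.arange(12)

# 12 elements cannot fill a 5x3 grid (15 slots), so NumPy raises ValueError.
try:
    arr.reshape(5, 3)
except ValueError as err:
    print("reshape failed:", err)

# With -1, NumPy infers the missing length: 12 / 4 = 3 rows.
inferred = arr.reshape(-1, 4)
print(inferred.shape)    # (3, 4)

# For a contiguous array like this one, reshape returns a view of the same data.
print(np.shares_memory(arr, inferred))   # True
```

Because the result here is a view, writing to `inferred` would also change `arr`; call `.copy()` if an independent array is needed.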

In [176]:
# 1. Reshape 1D to 2D
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newarr = arr.reshape(4, 3)  # 4 rows, 3 columns
print("1D to 2D:\n", newarr)
newarr = arr.reshape(3, 4)  # 3 rows, 4 columns
print("1D to 2D:\n", newarr)

1D to 2D:
 [[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
1D to 2D:
 [[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]


In [178]:
# 2. Reshape 1D to 3D
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newarr = arr.reshape(2,3,2)
print("1D to 3D:\n", newarr)
newarr = arr.reshape(3,2,2)
print("1D to 3D:\n", newarr)

1D to 3D:
 [[[ 1  2]
  [ 3  4]
  [ 5  6]]

 [[ 7  8]
  [ 9 10]
  [11 12]]]
1D to 3D:
 [[[ 1  2]
  [ 3  4]]

 [[ 5  6]
  [ 7  8]]

 [[ 9 10]
  [11 12]]]


In [190]:
# 3. Flattening (Multi-D to 1D)
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.shape)
newarr = arr.reshape(-1)# -1 implies "flatten to 1 dimension"
print(newarr)
print(newarr.shape)

arr = np.array([
    [
        [1,2],[3,4]
    ],[
        [1,2],[3,4]
    ]
])
print(arr.shape)
newarr = arr.reshape(-1)# -1 implies "flatten to 1 dimension"
print(newarr)
print(newarr.shape)
print("so -1 does not mean 'minus one dimension'; reshape(-1) flattens the array to 1-D")

(2, 3)
[1 2 3 4 5 6]
(6,)
(2, 2, 2)
[1 2 3 4 1 2 3 4]
(8,)
so -1 does not mean 'minus one dimension'; reshape(-1) flattens the array to 1-D


In [197]:
# 4. Automatic Dimension Inference
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8,9,10,11,12])
# We specify 2 blocks and 3 rows, and let NumPy infer the number of columns (-1)
newarr = arr.reshape(2, 3, -1) 
print("\nInferred Shape:", newarr.shape)
print(newarr)

newarr = arr.reshape(2, -1, 1) 
print("\nInferred Shape:", newarr.shape)
print(newarr)


Inferred Shape: (2, 3, 2)
[[[ 1  2]
  [ 3  4]
  [ 5  6]]

 [[ 7  8]
  [ 9 10]
  [11 12]]]

Inferred Shape: (2, 6, 1)
[[[ 1]
  [ 2]
  [ 3]
  [ 4]
  [ 5]
  [ 6]]

 [[ 7]
  [ 8]
  [ 9]
  [10]
  [11]
  [12]]]


## NumPy Array Manipulation

### 1. Joining Arrays (I have not fully understood this yet, though it seems less important?)
There are multiple ways to combine arrays in NumPy, depending on how you want to handle dimensions (axes).

*   **`np.concatenate`**: Joins a sequence of arrays along an **existing** axis.
*   **`np.stack`**: Joins a sequence of arrays along a **new** axis.
*   **`np.vstack` (Vertical Stack)**: Stacks arrays row-wise (on top of each other).
*   **`np.hstack` (Horizontal Stack)**: Stacks arrays column-wise (side-by-side).
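
Comparing only the output shapes may make the difference between these functions easier to see. A small sketch with two illustrative 2x2 arrays:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])   # shape (2, 2)
b = np.array([[5, 6], [7, 8]])   # shape (2, 2)

# concatenate keeps the number of dimensions; one existing axis grows.
print(np.concatenate((a, b), axis=0).shape)   # (4, 2)
print(np.concatenate((a, b), axis=1).shape)   # (2, 4)

# stack adds a dimension; the new axis has length 2 (the number of inputs).
print(np.stack((a, b), axis=0).shape)         # (2, 2, 2)

# For 2-D inputs, vstack/hstack match concatenate along axis 0/1.
print(np.array_equal(np.vstack((a, b)), np.concatenate((a, b), axis=0)))   # True
print(np.array_equal(np.hstack((a, b)), np.concatenate((a, b), axis=1)))   # True
```

A rule of thumb that follows from this: `concatenate` makes an existing axis longer, while `stack` leaves every input intact and lines the inputs up along a brand-new axis.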

In [210]:
import numpy as np

# --- Concatenation ---

# 1. Simple Concatenation (1D)
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
newarr = np.concatenate((arr1, arr2)) # join 2 arrays
print("Concatenated 1D:", newarr)

# 2. Concatenation along Axis 1 (2D)
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])
# axis=1 means joining columns (side by side)
newarr = np.concatenate((arr1, arr2), axis=1) 
print("\nConcatenated 2D (Axis 1):\n", newarr)

newarr = np.concatenate((arr1, arr2), axis=0) 
print("\nConcatenated 2D (Axis 0):\n", newarr)

newarr = np.stack((arr1,arr2),axis=0)
print("stack(Axis 0):\n", newarr)

newarr = np.stack((arr1,arr2),axis=1)
print("stack(Axis 1):\n", newarr)

arr1 = np.array([1,2,3])
arr2 = np.array([4,5,6])
newarr = np.stack((arr1,arr2),axis=1)
print(newarr)

Concatenated 1D: [1 2 3 4 5 6]

Concatenated 2D (Axis 1):
 [[1 2 5 6]
 [3 4 7 8]]

Concatenated 2D (Axis 0):
 [[1 2]
 [3 4]
 [5 6]
 [7 8]]
stack(Axis 0):
 [[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]
stack(Axis 1):
 [[[1 2]
  [5 6]]

 [[3 4]
  [7 8]]]
[[1 4]
 [2 5]
 [3 6]]


In [208]:
# --- Stacking ---

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# 1. Stack along Axis 1
# Note: This effectively turns 1D arrays into columns of a 2D array
newarr = np.stack((arr1, arr2), axis=1)
print("Stack (Axis 1):\n", newarr)

newarr = np.stack((arr1, arr2), axis=0)
print("Stack (Axis 0):\n", newarr)

# 2. vstack (Vertical) and hstack (Horizontal)
print("\nvstack:\n", np.vstack((arr1, arr2)))
print("\nhstack:\n", np.hstack((arr1, arr2)))

Stack (Axis 1):
 [[1 4]
 [2 5]
 [3 6]]
Stack (Axis 0):
 [[1 2 3]
 [4 5 6]]

vstack:
 [[1 2 3]
 [4 5 6]]

hstack:
 [1 2 3 4 5 6]


### 2. Splitting Arrays
You can break arrays into smaller sub-arrays using **splitting** methods.

*   **`np.array_split`**: Splits an array into multiple sub-arrays. It is robust and can handle **uneven splits** (where the number of elements doesn't divide equally by the number of sections).
*   **Slicing (Manual Split)**: For dataset management (e.g., Machine Learning), it is common to manually slice data into **Training**, **Validation**, and **Test** sets using index ranges.

In [217]:
# --- Array Split ---
arr = np.array([1, 2, 3, 4, 5, 6])

# 1. Equal Split
newarr = np.array_split(arr, 3) # split into 3 equal arrays
print("Equal Split (3 parts):", newarr)

# 2. Uneven Split
# NumPy will adjust sizes if elements don't divide equally
newarr = np.array_split(arr, 4) 
print("Uneven Split (4 parts):", newarr)

# --- Manual Slicing (Train/Val/Test) ---
data = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9],
                 [10, 11, 12]])
train,val,test = data[:2,:],data[2:3,:],data[3:,:]
print("Train Set:\n", train)
print("Validation Set:\n", val)
print("Test Set:\n", test)

Equal Split (3 parts): [array([1, 2]), array([3, 4]), array([5, 6])]
Uneven Split (4 parts): [array([1, 2]), array([3, 4]), array([5]), array([6])]
Train Set:
 [[1 2 3]
 [4 5 6]]
Validation Set:
 [[7 8 9]]
Test Set:
 [[10 11 12]]


### Practice Exercises (Advanced)

**Task 1: Reconstructing a Grid (Block Matrix)**
You are given four 2x2 matrices: `A`, `B`, `C`, and `D`.
*   `A = zeros`, `B = ones`, `C = ones * 2`, `D = ones * 3` (all shape 2x2).
*   Use a combination of `hstack` and `vstack` (or `concatenate`) to assemble them into a single **4x4 matrix** where:
    *   Top-Left is A, Top-Right is B
    *   Bottom-Left is C, Bottom-Right is D.

**Task 2: Time-Series Train/Test Split**
Create an array representing 100 days of data: `time_series = np.arange(100)`.
*   You need to split this into a **Train set (80%)** and a **Test set (20%)**.
*   Calculate the split index dynamically (do not hardcode `80`).
*   Perform the split using slicing.
*   Print the shapes of the resulting arrays to verify they are `(80,)` and `(20,)`.
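
A possible solution sketch for both tasks. The names `top`, `bottom`, `grid`, and `split_idx` are illustrative, and integer arithmetic is used for the 80% cut so no float rounding is involved:

```python
import numpy as np

# Task 1: assemble [[A, B], [C, D]] from four 2x2 blocks
A = np.zeros((2, 2))
B = np.ones((2, 2))
C = np.ones((2, 2)) * 2
D = np.ones((2, 2)) * 3

top = np.hstack((A, B))          # A | B
bottom = np.hstack((C, D))       # C | D
grid = np.vstack((top, bottom))  # top over bottom
print(grid)

# Task 2: 80/20 split with the index computed from the array length
time_series = np.arange(100)
split_idx = len(time_series) * 80 // 100
train, test = time_series[:split_idx], time_series[split_idx:]
print(train.shape, test.shape)   # (80,) (20,)
```

`np.block([[A, B], [C, D]])` builds the same 4x4 matrix in a single call. For time series, slicing without shuffling keeps the test set strictly later than the training set.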

## NumPy Arithmetic

NumPy allows you to perform **element-wise** arithmetic operations on arrays. This is often referred to as **vectorization**, which is significantly faster than iterating through lists with loops.

### 1. Basic Arithmetic
You can use specific NumPy functions (like `np.add`) or standard Python operators (`+`, `-`, `*`, `/`).
*   **`np.add()`**: Adds elements from two arrays (or lists).
*   **`np.subtract()`**: Subtracts elements of the second array from the first.
*   **`np.multiply()`**: Multiplies elements.
*   **`np.divide()`**: Divides elements.
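
The operators and the named functions listed above give the same results. A brief sketch using two small illustrative arrays:

```python
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([4, 5, 6, 7])

print(x + y)    # same as np.add(x, y)
print(x - y)    # same as np.subtract(x, y)
print(x * y)    # same as np.multiply(x, y): element-wise, not a matrix product
print(x / y)    # same as np.divide(x, y): true division, so the result is float
```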

### 2. Rounding & Truncation
*   **`np.around()`**: Rounds data to a specified number of decimals.
*   **`np.trunc()`**: Removes the decimal part strictly (truncates towards zero).
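
A short sketch of the two functions on illustrative values, with `np.floor` added for contrast on negative numbers:

```python
import numpy as np

values = np.array([1.234, 2.567, -1.7])

print(np.around(values, decimals=1))   # [ 1.2  2.6 -1.7]
print(np.trunc(values))                # [ 1.  2. -1.]
print(np.floor(values))                # [ 1.  2. -2.]  (floor differs for negatives)

# Exact halves are rounded to the nearest even value.
print(np.around(np.array([0.5, 1.5, 2.5])))   # [0. 2. 2.]
```

The half-to-even rule in the last line is worth remembering, since it differs from the "always round half up" rule often taught in school.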

In [220]:
# --- Basic Addition (Lists to Array) ---
# NumPy ufuncs (universal functions) like add can accept lists directly
x = [1, 2, 3, 4]
y = [4, 5, 6, 7]
z = np.add(x, y)  # element-wise sum of the two lists
print("Addition result:", z)

Addition result: [ 5  7  9 11]


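## 3. Array Dimensions and Indexing

NumPy uses `ndarray` to represent **N-dimensional arrays**:

- 1-D: a vector, e.g. `[1, 2, 3]`
- 2-D: a matrix, e.g. `[[1, 2, 3], [4, 5, 6]]`
- 3-D and higher: commonly used for images, video, or higher-dimensional features

Related attributes and operations:

- `.shape`: the shape of the array (the length of each dimension)
- `.ndim`: the number of dimensions
- Indexing and slicing: `arr[i]`, `arr[i:j]`, `arr[row, col]`, and so on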


In [40]:
# 1-D array (vector)
v = np.array([7, 9, 12, 22, 26])
print("v:", v)
print("v.shape:", v.shape, "v.ndim:", v.ndim)

# 2-D array (matrix)
M = np.array([
    [1, 2, 3],
    [4, 5, 6]
])
print("\nM =\n", M)
print("M.shape:", M.shape, "M.ndim:", M.ndim)

# Basic indexing
print("M[0, 0] =", M[0, 0])        # first row, first column
print("M[1, 2] =", M[1, 2])        # second row, third column
print("First row:", M[0, :])
print("Second column:", M[:, 1])


v: [ 7  9 12 22 26]
v.shape: (5,) v.ndim: 1

M =
 [[1 2 3]
 [4 5 6]]
M.shape: (2, 3) M.ndim: 2
M[0, 0] = 1
M[1, 2] = 6
First row: [1 2 3]
Second column: [2 5]


In [39]:
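# np.empty allocates memory without initializing it: the contents are whatever
# happened to be in memory (zeros in this run), not randomly generated numbers
arr_uninit = np.empty([3,3])
print("Uninitialized (arbitrary) values:\n",arr_uninit)

arr_0 = np.zeros([3,3])
print("All elements are 0:\n",arr_0)

arr_1 = np.ones([3,2])
print("All elements are 1:\n",arr_1)

Uninitialized (arbitrary) values:
 [[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
All elements are 0:
 [[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
All elements are 1:
 [[1. 1.]
 [1. 1.]
 [1. 1.]]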


### Exercise 3: Indexing and Slicing

Given the following array (create it yourself in a code cell):

```python
A = np.array([
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120]
])
```

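Complete the following:

1. Extract all elements of the second row (index 1).
2. Extract all rows of the first and third columns (indices 0 and 2).
3. Extract the middle 2x2 submatrix `[[60, 70], [100, 110]]`.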


In [54]:
A = np.array([
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120]
])
print("Second row: ",A[1,:])
print("First column: ",A[:,0])
print("Third column: ",A[:,2])
print("Middle 2x2 submatrix: ",A[1:3,1:3])# slices are half-open here too

Second row:  [50 60 70 80]
First column:  [10 50 90]
Third column:  [ 30  70 110]
Middle 2x2 submatrix:  [[ 60  70]
 [100 110]]
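
Item 2 of the exercise can also be done in a single expression by passing a list of column indices. A minimal sketch; `cols` is an illustrative name:

```python
import numpy as np

A = np.array([
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120]
])

# A list of column indices selects those columns for every row.
cols = A[:, [0, 2]]
print(cols)
print(cols.shape)   # (3, 2)
```

Unlike a plain slice, indexing with a list returns a copy rather than a view, so changing `cols` leaves `A` untouched.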


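## 4. Multidimensional Arrays and the Taxi Fare Example (a Linear System)

The taxi example from the slides:

- \$25: 10 km, 30 minutes
- \$20: 8 km, 20 minutes
- \$37: 15 km, 50 minutes

Assume the fare is computed as: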
\begin{equation}
\text{fare} = w_0 \cdot \text{distance} + w_1 \cdot \text{time} + w_2
\end{equation}

We can write this in matrix form $A w = b$ and solve for $w$ with `np.linalg.solve` from NumPy's linear algebra module.
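
Written out with the three rides listed above (one row per ride, in the same order), the system reads as follows. This is a worked form of the same equation, added for illustration:

\begin{equation}
\begin{bmatrix} 10 & 30 & 1 \\ 8 & 20 & 1 \\ 15 & 50 & 1 \end{bmatrix}
\begin{bmatrix} w_0 \\ w_1 \\ w_2 \end{bmatrix}
=
\begin{bmatrix} 25 \\ 20 \\ 37 \end{bmatrix}
\end{equation}

Subtracting the second row from the first gives $2 w_0 + 10 w_1 = 5$, and subtracting the first row from the third gives $5 w_0 + 20 w_1 = 12$. Solving this pair yields $w_0 = 2$ and $w_1 = 0.1$, and the first row then gives $w_2 = 25 - 10 \cdot 2 - 30 \cdot 0.1 = 2$.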


In [None]:
# Taxi fare example: build the linear system Aw = b

# Each row of A is [distance, time, 1]
A = np.array([
    [10, 30, 1],
    [8, 20, 1],
    [15, 50, 1]
], dtype=float)

# b holds the corresponding fares
b = np.array([25, 20, 37], dtype=float)

# Solve Aw = b
w = np.linalg.solve(A, b)
w0, w1, w2 = w
print("w =", w)
print(f"fare = {w0:.3f} * distance + {w1:.3f} * time + {w2:.3f}")

# Check against the first sample
fare_pred = w0 * 10 + w1 * 30 + w2
print("\nPredicted fare for 10 km, 30 min:", fare_pred)


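### Exercise 4: Modify and Extend the Taxi Model

1. Make up one new data point, for example a ride of 12 km and 40 minutes with an unknown fare.
   - Use the formula above to compute the predicted fare.
2. Modify the input data points, rebuild A and b, and solve for w again with `np.linalg.solve`. Keep exactly three rows: `np.linalg.solve` requires a square `A`, so adding extra rows makes it raise an error.
3. Consider: if we collect many data points (more than 3) and they contain noise, is `np.linalg.solve` still suitable? What should we use instead? (Hint: regression / least squares.)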
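
For item 3, one common option is `np.linalg.lstsq`, which finds the `w` minimizing the squared error when there are more equations than unknowns. The sketch below uses synthetic data: the weights `true_w`, the value ranges, and the noise level are assumptions made up for the demonstration, not values from the slides.

```python
import numpy as np

# Hypothetical data: fares generated from assumed weights plus small noise.
rng = np.random.default_rng(0)
true_w = np.array([2.0, 0.1, 2.0])            # assumed [per-km, per-minute, base fare]
distance = rng.uniform(1, 20, size=50)
time = rng.uniform(5, 60, size=50)
A = np.column_stack((distance, time, np.ones(50)))   # 50 rows, 3 unknowns
b = A @ true_w + rng.normal(scale=0.2, size=50)

# np.linalg.solve needs a square A; lstsq minimizes ||A w - b||^2 instead.
w_hat, residuals, rank, singular_values = np.linalg.lstsq(A, b, rcond=None)
print("estimated w:", w_hat)
```

With noisy data the estimate will not match `true_w` exactly, but it should land close to it, and it generally improves as more rides are added.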


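## 5. Reshaping Arrays and Splitting Data (train / validate / test)

Common operations:

- `reshape`: change the shape of an array (the total number of elements stays the same)
- `concatenate`: join arrays (along rows or columns)
- `np.array_split`: split an array into several parts (for example training, validation, and test sets)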


In [None]:
# reshape example
x = np.arange(12)  # [0, 1, ..., 11]
print("Original x:", x)
X2 = x.reshape(3, 4)
print("\nReshaped to 3x4:\n", X2)

# Concatenation example
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
h_cat = np.concatenate([a, b], axis=1)  # horizontal (more columns)
v_cat = np.concatenate([a, b], axis=0)  # vertical (more rows)
print("\nHorizontal concatenation:\n", h_cat)
print("Vertical concatenation:\n", v_cat)

# Split example: split 0..11 into 3 parts
big_arr = np.arange(12)
parts = np.array_split(big_arr, 3)
for i, p in enumerate(parts):
    print(f"part {i}:", p)


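### Exercise 5: Simulate a train / validate / test Split

1. Use `np.arange(30)` to generate an array of 30 sample IDs, for example the indices of 30 images.
2. Use `np.array_split` to split it into 3 parts named `train_ids`, `val_ids`, and `test_ids`.
3. Print the contents and length of each part, and consider: in real projects, how are the train/validation/test proportions usually chosen? (For example 70% / 15% / 15%.)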
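
`np.array_split(ids, 3)` produces three equal parts, so an uneven split such as 70% / 15% / 15% needs explicit cut points. A minimal sketch, assuming a shuffled split is wanted; the seed and the percentages are arbitrary choices for illustration:

```python
import numpy as np

ids = np.arange(30)

# Shuffle first so each subset is a random sample (the seed is arbitrary).
rng = np.random.default_rng(42)
shuffled = rng.permutation(ids)

# Integer arithmetic avoids floating-point surprises in the cut points.
n = len(shuffled)
n_train = n * 70 // 100          # 21
n_val = n * 15 // 100            # 4
train_ids, val_ids, test_ids = np.split(shuffled, [n_train, n_train + n_val])

print(len(train_ids), len(val_ids), len(test_ids))   # 21 4 5
```

Passing a list of indices to `np.split` cuts the array at exactly those positions; the test set simply receives whatever remains after the first two cuts.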


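## 6. Array Arithmetic, Searching, Sorting, and Filtering

### Basic Arithmetic

NumPy supports element-wise addition, subtraction, multiplication, and division between arrays: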

```python
c = a + b
d = a * b
```

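### Searching and Sorting

- `np.where(condition)`: returns the indices of the elements that satisfy the condition (as a tuple of index arrays, one per dimension)
- `np.sort(arr)`: returns a new, sorted array
- `np.searchsorted(arr, x)`: finds the position where `x` should be inserted into a sorted array to keep it sorted

### Boolean Filtering

Select the elements that satisfy a condition by indexing with a boolean array: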

```python
mask = arr > 10
new_arr = arr[mask]
```

In [None]:
arr = np.array([7, 9, 12, 22, 26])
print("Original array:", arr)

# Keep only the elements greater than 10
mask = arr > 10
print("mask:", mask)
filtered = arr[mask]
print("Elements greater than 10:", filtered)

# where example: find the positions of the elements equal to 3
x = np.array([7, 2, 3, 4, 5, 3, 7, 8, 3])
idx = np.where(x == 3)
print("\nx:", x)
print("Indices where x == 3:", idx)

# Sorting and insertion position
sorted_x = np.sort(x)
pos = np.searchsorted(sorted_x, 4)
print("\nSorted:", sorted_x)
print("Insertion position for 4:", pos)


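### Exercise 6: Filtering and Searching

1. Create an array of 20 random integers in the range 0 to 100.
2. Use boolean filtering to select the even numbers and store them in `even_nums`.
3. Use `np.where` to find the indices of all elements greater than the array's mean.
4. Sort the array with `np.sort`, then use `np.searchsorted` to find the position where the number 50 should be inserted.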
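
A possible solution sketch. It uses the `np.random.default_rng` generator with an arbitrary seed so the run is repeatable; the names other than `even_nums` are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)               # arbitrary seed for repeatability
nums = rng.integers(0, 101, size=20)         # upper bound is exclusive, so 0..100

even_nums = nums[nums % 2 == 0]
above_mean_idx = np.where(nums > nums.mean())[0]   # [0] unpacks the 1-D index array

sorted_nums = np.sort(nums)
pos_50 = np.searchsorted(sorted_nums, 50)    # leftmost position that keeps the order

print(nums)
print(even_nums)
print(above_mean_idx)
print(pos_50)
```

With the default `side='left'`, the position returned by `np.searchsorted` equals the number of elements strictly smaller than 50.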


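## 7. Random Numbers, Statistics, and Normalization

NumPy provides a rich set of random and statistical functions:

- Random: `np.random.randint`, `np.random.random`, `np.random.normal`, and others
- Statistics: `np.mean`, `np.median`, `np.var`, `np.std`, and others
- Normalization: rescaling features to a common range (such as 0 to 1), which helps model training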


In [None]:
# Random integers
rand_ints = np.random.randint(0, 101, size=10)
print("Random integers:", rand_ints)

# Normal distribution example: mean 170, standard deviation 10, 250 samples
heights = np.random.normal(loc=170, scale=10, size=250)
print("\nHeight data: mean =", np.mean(heights), ", std =", np.std(heights))

# Normalize to the range 0..1
h_min, h_max = heights.min(), heights.max()
heights_norm = (heights - h_min) / (h_max - h_min)
print("After normalization: min =", heights_norm.min(), ", max =", heights_norm.max())


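### Exercise 7: Implement Your Own Normalization

1. Generate a random array of length 100 with values in the range `[50, 200]`.
2. Write a function `min_max_normalize(x)` that rescales the input array to the range 0 to 1:
   ```python
   def min_max_normalize(x):
       # TODO: your code
       return x_norm
   ```
3. Check whether the output of `min_max_normalize(x)` matches the expression `(x - x.min()) / (x.max() - x.min())` written out by hand in your code.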
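
One way the function might be written, offered as a sketch rather than the expected answer. Returning zeros for a constant input is an assumption made here to avoid dividing by zero; the exercise does not specify that case.

```python
import numpy as np

def min_max_normalize(x):
    """Scale x linearly so that its minimum maps to 0 and its maximum to 1."""
    x = np.asarray(x, dtype=float)
    value_range = x.max() - x.min()
    if value_range == 0:
        # Assumption: a constant array maps to all zeros instead of dividing by zero.
        return np.zeros_like(x)
    return (x - x.min()) / value_range

rng = np.random.default_rng(7)            # arbitrary seed
x = rng.uniform(50, 200, size=100)        # floats in [50, 200)
x_norm = min_max_normalize(x)
manual = (x - x.min()) / (x.max() - x.min())
print(np.allclose(x_norm, manual), x_norm.min(), x_norm.max())
```

Comparing with `np.allclose` rather than `==` is the usual choice for floating-point arrays, even though the two computations here follow the same steps.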


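## 8. Grids and Math Functions (Advanced Preview)

The slides also show:

- `np.meshgrid`: generates 2-D coordinate grids (X, Y), commonly used for 3-D surface plots and contour plots
- Trigonometric functions: `np.sin`, `np.cos`, and others
- Logarithmic functions: `np.log` and others

These are very useful for visualization and more complex mathematical modeling; look up the documentation and experiment when you need them.

### Exercise 8 (Optional): Generate a Grid and Evaluate a Function

1. Use `np.linspace(-5, 5, 11)` to generate 1-D x and y coordinates.
2. Use `np.meshgrid(x, y)` to obtain two 2-D arrays `X, Y`.
3. Compute `Z = np.sin(X) + np.cos(Y)` and inspect `Z.shape`.
4. If you know `matplotlib`, try plotting `Z` as a heatmap.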
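
Steps 1 to 3 can be sketched as follows; the plotting step is left out so the block depends only on NumPy:

```python
import numpy as np

x = np.linspace(-5, 5, 11)
y = np.linspace(-5, 5, 11)
X, Y = np.meshgrid(x, y)        # X varies along columns, Y varies along rows
Z = np.sin(X) + np.cos(Y)       # evaluated at every grid point at once

print(X.shape, Y.shape, Z.shape)   # (11, 11) (11, 11) (11, 11)
print(Z[5, 5])                     # grid centre (0, 0): sin(0) + cos(0) = 1.0
```

If `matplotlib` is available, `plt.imshow(Z)` after `import matplotlib.pyplot as plt` would be one way to draw the heatmap.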
