# Python Data Analysis

> Array Computing with NumPy

[數據交點](https://www.datainpoint.com/) | 郭耀仁 <yaojenkuo@datainpoint.com>

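## Exercise Guidelines

- The first code cell imports the modules (packages) we may need, along with the unit-testing module `unittest`.
- If an exercise needs to load a file, the file is stored in the same folder as the exercise, so it can be loaded from the working directory.
- Each exercise already defines the names and parameters of its functions or classes; we only need to write the body.
- The `"""docstring"""` of each function or class describes how it is tested.
- Reading the `"""docstring"""` shows how the inputs relate to the expected outputs, which helps clarify the task.
- Write the body of each function or class between the two single-line comments `### BEGIN SOLUTION` and `### END SOLUTION`.
- To run the tests, click Kernel -> Restart Kernel And Run All Cells -> Restart in the menu at the top.
- You can run the tests after each exercise or after finishing all of them.
- An exercise disconnects automatically after being idle for more than 10 minutes; click the exercise link again to restart it.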

In [1]:
import os
import json
import unittest
import numpy as np

## 01. Define a function named `create_first_five_primes_array` that is able to generate a `(5,)` array as specified.

- Expected inputs: None.
- Expected outputs: a (5,) array.

```
[ 2  3  5  7 11]
```

In [2]:
def create_first_five_primes_array():
    """
    >>> first_five_primes_array = create_first_five_primes_array()
    >>> print(first_five_primes_array)
    [ 2  3  5  7 11]
    >>> print(type(first_five_primes_array))
    <class 'numpy.ndarray'>
    >>> print(first_five_primes_array.shape)
    (5,)
    """
    ### BEGIN SOLUTION
    out_arr = np.array([2, 3, 5, 7, 11])
    return out_arr
    ### END SOLUTION
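
The solution above hard-codes the five primes. The same values can also be derived with a sieve of Eratosthenes on a boolean NumPy array. The sketch below is not required by the exercise, and `primes_below` is a hypothetical helper name:

```python
import numpy as np

def primes_below(limit):
    """Hypothetical helper: return all primes < limit using a sieve of Eratosthenes."""
    is_prime = np.ones(limit, dtype=bool)
    is_prime[:2] = False  # 0 and 1 are not prime
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            is_prime[p * p::p] = False  # strike out multiples of p, starting at p * p
    return np.flatnonzero(is_prime)  # indices still marked True are the primes

first_five = primes_below(12)[:5]
print(first_five)  # [ 2  3  5  7 11]
```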

## 02. Define a function named `create_first_ten_odds_array` that is able to generate a `(10,)` array as specified.

- Expected inputs: None.
- Expected outputs: a (10,) array.

```
[ 1  3  5  7  9 11 13 15 17 19]
```

In [3]:
def create_first_ten_odds_array():
    """
    >>> first_ten_odds_array = create_first_ten_odds_array()
    >>> print(first_ten_odds_array)
    [ 1  3  5  7  9 11 13 15 17 19]
    >>> print(type(first_ten_odds_array))
    <class 'numpy.ndarray'>
    >>> print(first_ten_odds_array.shape)
    (10,)
    """
    ### BEGIN SOLUTION
    out_arr = np.arange(1, 20, 2)
    return out_arr
    ### END SOLUTION
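
`np.arange(start, stop, step)` excludes `stop`, which is why the solution uses `stop=20` to end at 19. A short sketch checking the solution against an equivalent formula-based construction (the k-th odd number is 2k + 1):

```python
import numpy as np

# np.arange stops *before* `stop`, so stop=20 yields 1, 3, ..., 19
via_arange = np.arange(1, 20, 2)

# Alternative: the k-th odd number (k = 0, 1, ..., 9) is 2k + 1
via_formula = 2 * np.arange(10) + 1

print(via_arange)                               # [ 1  3  5  7  9 11 13 15 17 19]
print(np.array_equal(via_arange, via_formula))  # True
```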

## 03. Define a function named `create_a_square_matrix` that is able to generate a square matrix of order `n` in which every element is `fill_int`.

- Expected inputs: 2 integers.
- Expected outputs: a (n, n) array.

In [4]:
def create_a_square_matrix(n, fill_int):
    """
    >>> a_square_matrix = create_a_square_matrix(2, 5566)
    >>> print(a_square_matrix)
    [[5566 5566]
     [5566 5566]]
    >>> a_square_matrix = create_a_square_matrix(3, 55)
    >>> print(a_square_matrix)
    [[55 55 55]
     [55 55 55]
     [55 55 55]]
    >>> a_square_matrix = create_a_square_matrix(4, 66)
    >>> print(a_square_matrix)
    [[66 66 66 66]
     [66 66 66 66]
     [66 66 66 66]
     [66 66 66 66]]
    """
    ### BEGIN SOLUTION
    arr_shape = (n, n)
    out_arr = np.full(shape=arr_shape, fill_value=fill_int)
    return out_arr
    ### END SOLUTION
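
`np.full` infers the element type from `fill_value`: an integer fill gives an integer matrix, and a float fill gives a float matrix. The sketch below compares it with the `np.ones(...) * fill_int` alternative:

```python
import numpy as np

n, fill_int = 3, 55
via_full = np.full(shape=(n, n), fill_value=fill_int)
via_ones = np.ones((n, n), dtype=int) * fill_int  # scale a matrix of ones

print(np.array_equal(via_full, via_ones))  # True
print(np.full((2, 2), 5566.0).dtype)       # float64: the dtype follows fill_value
```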

## 04. Define a function named `create_a_diagonal_matrix` that is able to generate a diagonal matrix of order `n` with `fill_int` as the elements of the main diagonal.

PS You may refer to the NumPy function `eye`: <https://numpy.org/doc/stable/reference/generated/numpy.eye.html>

- Expected inputs: 2 integers.
- Expected outputs: a (n, n) array.

In [5]:
def create_a_diagonal_matrix(n, fill_int):
    """
    >>> a_diagonal_matrix = create_a_diagonal_matrix(2, 5566)
    >>> print(a_diagonal_matrix)
    [[5566    0]
     [   0 5566]]
    >>> a_diagonal_matrix = create_a_diagonal_matrix(3, 55)
    >>> print(a_diagonal_matrix)
    [[55  0  0]
     [ 0 55  0]
     [ 0  0 55]]
    >>> a_diagonal_matrix = create_a_diagonal_matrix(4, 66)
    >>> print(a_diagonal_matrix)
    [[66  0  0  0]
     [ 0 66  0  0]
     [ 0  0 66  0]
     [ 0  0  0 66]]
    """
    ### BEGIN SOLUTION
    identity_matrix = np.eye(n, dtype=int)
    out_arr = identity_matrix * fill_int
    return out_arr
    ### END SOLUTION
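
Besides scaling an identity matrix, `np.diag` can build the same matrix directly. When given a 1-D array, it returns a 2-D array with that array on the main diagonal. A small comparison sketch:

```python
import numpy as np

n, fill_int = 3, 55
# np.diag with a 1-D input builds a 2-D matrix with that input on the main diagonal
via_diag = np.diag(np.full(n, fill_int))
# The approach used above: scale an integer identity matrix
via_eye = np.eye(n, dtype=int) * fill_int

print(via_diag)
print(np.array_equal(via_diag, via_eye))  # True
```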

## 05. Define a function named `create_a_diagonal_split_matrix` that is able to generate a square matrix of order `n` with zeros on the main diagonal and `fill_int` as every element outside the main diagonal.

PS You may refer to the NumPy function `diag`: <https://numpy.org/doc/stable/reference/generated/numpy.diag.html>

- Expected inputs: 2 integers.
- Expected outputs: a (n, n) array.

In [6]:
def create_a_diagonal_split_matrix(n, fill_int):
    """
    >>> a_diagonal_split_matrix = create_a_diagonal_split_matrix(2, 5566)
    >>> print(a_diagonal_split_matrix)
    [[   0 5566]
     [5566    0]]
    >>> a_diagonal_split_matrix = create_a_diagonal_split_matrix(3, 55)
    >>> print(a_diagonal_split_matrix)
    [[ 0 55 55]
     [55  0 55]
     [55 55  0]]
    >>> a_diagonal_split_matrix = create_a_diagonal_split_matrix(4, 66)
    >>> print(a_diagonal_split_matrix)
    [[ 0 66 66 66]
     [66  0 66 66]
     [66 66  0 66]
     [66 66 66  0]]
    """
    ### BEGIN SOLUTION
    arr_shape = (n, n)
    out_arr = np.full(shape=arr_shape, fill_value=fill_int)
    diags = np.diagonal(out_arr)
    minus_arr = -np.diag(diags)
    out_arr += minus_arr
    return out_arr
    ### END SOLUTION
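
The solution above subtracts a diagonal matrix built with `np.diagonal` and `np.diag`. A shorter alternative, not required by the exercise, is `np.fill_diagonal`. It overwrites the main diagonal in place and returns `None`. `diagonal_split_via_fill` is a hypothetical name:

```python
import numpy as np

def diagonal_split_via_fill(n, fill_int):
    """Hypothetical alternative: fill everything, then zero the main diagonal in place."""
    out_arr = np.full((n, n), fill_int)
    np.fill_diagonal(out_arr, 0)  # modifies out_arr; the return value is None
    return out_arr

print(diagonal_split_via_fill(3, 55))
```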

## 06. Define a function named `create_nine_one_array` that is able to generate a `(9, 1)` array as specified.

- Expected inputs: None.
- Expected outputs: a (9, 1) array.

```
[[1]
 [2]
 [3]
 [4]
 [5]
 [6]
 [7]
 [8]
 [9]]
```

In [7]:
def create_nine_one_array():
    """
    >>> nine_one_array = create_nine_one_array()
    >>> nine_one_array.shape
    (9, 1)
    >>> nine_one_array[0, 0]
    1
    >>> nine_one_array[-1, 0]
    9
    """
    ### BEGIN SOLUTION
    no_arr = np.arange(1, 10).reshape(9, 1)
    return no_arr
    ### END SOLUTION
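
`reshape(9, 1)` spells out both dimensions. Two common equivalents let NumPy do more of the work: `-1` asks `reshape` to infer that dimension, and `np.newaxis` inserts a new length-1 axis. A sketch:

```python
import numpy as np

flat = np.arange(1, 10)              # shape (9,)
col_a = flat.reshape(-1, 1)          # -1 lets NumPy infer the row count: (9, 1)
col_b = flat[:, np.newaxis]          # insert a new length-1 axis: (9, 1)

print(col_a.shape, col_b.shape)      # (9, 1) (9, 1)
print(np.array_equal(col_a, col_b))  # True
```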

## 07. Define a function named `create_one_nine_array` that is able to generate a `(1, 9)` array as specified.

- Expected inputs: None.
- Expected outputs: a (1, 9) array.

```
[[1 2 3 4 5 6 7 8 9]]
```

In [8]:
def create_one_nine_array():
    """
    >>> one_nine_array = create_one_nine_array()
    >>> one_nine_array.shape
    (1, 9)
    >>> one_nine_array[0, 0]
    1
    >>> one_nine_array[0, -1]
    9
    """
    ### BEGIN SOLUTION
    on_arr = np.arange(1, 10).reshape(1, 9)
    return on_arr
    ### END SOLUTION
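
A `(1, 9)` row is the transpose of the `(9, 1)` column from exercise 06, so `.T` is another way to get it. A short sketch:

```python
import numpy as np

column = np.arange(1, 10).reshape(9, 1)  # the (9, 1) array from exercise 06
row = column.T                           # transposing swaps the axes: (1, 9)

print(row)                               # [[1 2 3 4 5 6 7 8 9]]
print(row.shape)                         # (1, 9)
print(np.array_equal(row, np.arange(1, 10).reshape(1, -1)))  # True
```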

## 08. Define a function named `create_nine_nine_array` that is able to generate a `(9, 9)` array as specified.

- Expected inputs: None.
- Expected outputs: a (9, 9) array.

```
[[ 1  2  3  4  5  6  7  8  9]
 [ 2  4  6  8 10 12 14 16 18]
 [ 3  6  9 12 15 18 21 24 27]
 [ 4  8 12 16 20 24 28 32 36]
 [ 5 10 15 20 25 30 35 40 45]
 [ 6 12 18 24 30 36 42 48 54]
 [ 7 14 21 28 35 42 49 56 63]
 [ 8 16 24 32 40 48 56 64 72]
 [ 9 18 27 36 45 54 63 72 81]]
```

In [9]:
def create_nine_nine_array():
    """
    >>> nine_nine_array = create_nine_nine_array()
    >>> nine_nine_array.shape
    (9, 9)
    >>> nine_nine_array[0, 0]
    1
    >>> nine_nine_array[1, 1]
    4
    >>> nine_nine_array[7, 7]
    64
    >>> nine_nine_array[8, 8]
    81
    """
    ### BEGIN SOLUTION
    nine_one_array = create_nine_one_array()
    one_nine_array = create_one_nine_array()
    nine_nine_array = nine_one_array.dot(one_nine_array)
    return nine_nine_array
    ### END SOLUTION
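
Multiplying a `(9, 1)` column by a `(1, 9)` row with `.dot` is an outer product. The same table can be produced in two other ways. Element-wise `*` broadcasts both shapes to `(9, 9)`, and `np.outer` works on the two 1-D arrays directly. A sketch checking all three agree:

```python
import numpy as np

column = np.arange(1, 10).reshape(9, 1)  # shape (9, 1)
row = np.arange(1, 10).reshape(1, 9)     # shape (1, 9)

via_dot = column.dot(row)                # matrix product: (9, 1) x (1, 9) -> (9, 9)
via_broadcast = column * row             # element-wise product broadcast to (9, 9)
via_outer = np.outer(np.arange(1, 10), np.arange(1, 10))  # outer product of two 1-D arrays

print(np.array_equal(via_dot, via_broadcast), np.array_equal(via_dot, via_outer))  # True True
```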

## 09. Define a function named `filter_evens` that is able to extract the even numbers from a given array.

- Expected inputs: an array.
- Expected outputs: an array.

In [10]:
def filter_evens(x):
    """
    >>> filter_evens(np.array([5, 5, 6, 6]))
    array([6, 6])
    >>> filter_evens(np.array([1, 2, 3, 4]))
    array([2, 4])
    >>> filter_evens(np.array([0, 1, 2, 3]))
    array([0, 2])
    """
    ### BEGIN SOLUTION
    evens = x[x % 2 == 0]
    return evens
    ### END SOLUTION
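
`x % 2 == 0` produces a boolean array of the same shape as `x`, and indexing with it keeps only the positions marked `True`. NumPy's `%` gives a result with the sign of the divisor, as Python's does. Negative even numbers therefore also have remainder 0 and are kept. A sketch with assumed sample values:

```python
import numpy as np

x = np.array([-4, -3, 0, 7, 10])
mask = x % 2 == 0   # element-wise comparison gives a boolean array
print(mask)         # [ True False  True False  True]
print(x[mask])      # [-4  0 10]
```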

## 10. Define a function named `filter_evens_then_product` that is able to compute the product of the even numbers in a given array.

- Expected inputs: an array.
- Expected outputs: a number.

In [11]:
def filter_evens_then_product(x):
    """
    >>> filter_evens_then_product(np.array([5, 5, 6, 6]))
    36
    >>> filter_evens_then_product(np.array([1, 2, 3, 4]))
    8
    >>> filter_evens_then_product(np.array([0, 1, 2, 3]))
    0
    """
    ### BEGIN SOLUTION
    evens = x[x % 2 == 0]
    return evens.prod()
    ### END SOLUTION
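
The docstring does not cover one edge case. If the array contains no even numbers, the mask selects an empty array, and NumPy's product of an empty array is 1 (the multiplicative identity). The exercise leaves open whether 1 is the desired answer. `product_of_evens` below is a hypothetical copy of the solution, renamed so it does not shadow the exercise function:

```python
import numpy as np

def product_of_evens(x):
    """Hypothetical copy of the solution above, used to probe the empty case."""
    evens = x[x % 2 == 0]
    return evens.prod()

print(product_of_evens(np.array([1, 3, 5])))     # 1: no evens, so the product is empty
print(product_of_evens(np.array([0, 1, 2, 3])))  # 0
```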

## 11. Define a function named `find_divisors` that is able to find the divisors of a given integer.

- Expected inputs: an integer.
- Expected outputs: an array.

In [12]:
def find_divisors(x):
    """
    >>> find_divisors(1)
    array([1])
    >>> find_divisors(2)
    array([1, 2])
    >>> find_divisors(3)
    array([1, 3])
    >>> find_divisors(4)
    array([1, 2, 4])
    >>> find_divisors(5)
    array([1, 5])
    """
    ### BEGIN SOLUTION
    possible_divisors = np.arange(1, x+1)
    modulos = x % possible_divisors
    divisors = possible_divisors[modulos == 0]
    return divisors
    ### END SOLUTION
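
The solution tests every candidate from 1 to `x`. Divisors come in pairs `(d, x // d)`, so testing candidates up to the integer square root of `x` is enough. The sketch below is a hypothetical variant (`find_divisors_sqrt`) built on that idea. `np.unique` sorts the result and drops the duplicate that appears when `x` is a perfect square:

```python
import math
import numpy as np

def find_divisors_sqrt(x):
    """Hypothetical variant: test candidates only up to sqrt(x), then add the paired divisors."""
    candidates = np.arange(1, math.isqrt(x) + 1)
    small = candidates[x % candidates == 0]            # divisors <= sqrt(x)
    large = x // small                                 # each small divisor d pairs with x // d
    return np.unique(np.concatenate([small, large]))   # sorted, duplicates removed

print(find_divisors_sqrt(12))  # [ 1  2  3  4  6 12]
```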

## 12. Define a function named `var` that is able to calculate the variance of a given array.

PS You may refer to the definition of variance: <https://en.wikipedia.org/wiki/Variance>

\begin{equation}
Var(X) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2
\end{equation}

- Expected inputs: an array.
- Expected outputs: a number.

In [13]:
def var(X):
    """
    >>> var(np.array([1, 1, 1, 1]))
    0.0
    >>> var(np.array([5, 5, 6, 6]))
    0.25
    >>> var(np.array([5, -5, 6, -6]))
    30.5
    >>> var(np.array([2, 4, 8, 16]))
    28.75
    """
    ### BEGIN SOLUTION
    n = X.size
    mu = X.mean()
    se = (X - mu)**2
    sse = se.sum()
    return sse / n
    ### END SOLUTION
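
The 1/n definition above is the population variance, which is also what `np.var` computes by default (`ddof=0`). Passing `ddof=1` divides by n - 1 instead, giving the sample variance. A sketch comparing them:

```python
import numpy as np

X = np.array([2, 4, 8, 16])
population_var = ((X - X.mean()) ** 2).sum() / X.size  # the 1/n definition above

print(np.var(X))          # 28.75: np.var divides by n by default (ddof=0)
print(np.var(X, ddof=1))  # divides by n - 1 instead (sample variance)
```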

## 13. Define a function named `cov` that is able to calculate the covariance given 2 same-length arrays.

PS You may refer to the definition of covariance: <https://en.wikipedia.org/wiki/Covariance>

\begin{equation}
cov(X, Y) = \frac{1}{n}\sum_{i=1}^{n}(x_i-\mu_{X})(y_i-\mu_{Y})
\end{equation}

- Expected inputs: 2 arrays.
- Expected outputs: a number.

In [14]:
def cov(X, Y):
    """
    >>> X = np.array([1, 1, 1, 1])
    >>> Y = np.array([1, 1, 1, 1])
    >>> cov(X, Y)
    0.0
    >>> X = np.array([5, 5, 6, 6])
    >>> Y = np.array([5, 5, 6, 6])
    >>> cov(X, Y)
    0.25
    >>> X = np.array([5, 5, 6, 6])
    >>> Y = np.array([5, -5, 6, -6])
    >>> cov(X, Y)
    0.0
    """
    ### BEGIN SOLUTION
    n = X.size
    mu_X = X.mean()
    mu_Y = Y.mean()
    sum_err_prod = ((X - mu_X)*(Y - mu_Y)).sum()
    return sum_err_prod / n
    ### END SOLUTION
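
NumPy's `np.cov` returns a 2x2 covariance matrix whose off-diagonal entry is cov(X, Y). By default it divides by n - 1. To match the 1/n definition used in this exercise, pass `bias=True`. A sketch:

```python
import numpy as np

X = np.array([5, 5, 6, 6])
Y = np.array([5, 5, 6, 6])

# Entry [0, 1] of the 2x2 matrix is cov(X, Y).
# The default divides by n - 1; bias=True switches to the 1/n definition used above.
print(np.cov(X, Y, bias=True)[0, 1])  # 0.25
print(np.cov(X, Y)[0, 1])             # 1/3, because it divides by n - 1
```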

## 14. Define a function named `corr` that is able to calculate the correlation coefficient of 2 same-length arrays.

PS You may refer to the definition of correlation coefficient: <https://en.wikipedia.org/wiki/Pearson_correlation_coefficient>

\begin{equation}
r_{XY} = \frac{cov(X, Y)}{\sqrt{cov(X, X)cov(Y, Y)}}
\end{equation}

- Expected inputs: 2 arrays.
- Expected outputs: a number.

In [15]:
def corr(X, Y):
    """
    >>> X = np.array([1, 2])
    >>> Y = np.array([2, 4])
    >>> corr(X, Y)
    1.0
    >>> X = np.array([1, 2])
    >>> Y = np.array([-2, -4])
    >>> corr(X, Y)
    -1.0
    >>> X = np.array([1, 2, 3])
    >>> Y = np.array([2, 4, 6])
    >>> corr(X, Y)
    1.0
    >>> X = np.array([1, 2, 3])
    >>> Y = np.array([-2, -4, -6])
    >>> corr(X, Y)
    -1.0
    """
    ### BEGIN SOLUTION
    cov_X_Y = cov(X, Y)
    cov_X_X = cov(X, X)
    cov_Y_Y = cov(Y, Y)
    return cov_X_Y / np.sqrt(cov_X_X * cov_Y_Y)
    ### END SOLUTION
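
The normalising factor (1/n or 1/(n - 1)) appears in both the numerator and the denominator, so it cancels. As a result, `np.corrcoef` agrees with the function above. If either array is constant, its variance is 0 and the ratio is undefined; NumPy then returns `nan` rather than raising an error. A sketch of the library call:

```python
import numpy as np

X = np.array([1, 2, 3])
Y = np.array([-2, -4, -6])

# np.corrcoef returns a 2x2 correlation matrix; entry [0, 1] is r_XY
r = np.corrcoef(X, Y)[0, 1]
print(r)
```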

## 15. Define a function named `mse` that is able to calculate the mean squared error given 2 same-length arrays.

PS You may refer to the definition of mean squared error: <https://en.wikipedia.org/wiki/Mean_squared_error>

\begin{equation}
MSE = \frac{1}{n}\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2
\end{equation}

- Expected inputs: 2 arrays.
- Expected outputs: a number.

In [16]:
def mse(Y, Y_hat):
    """
    >>> Y = np.array([5, 5, 6, 6])
    >>> Y_hat = np.array([5, 5, 6, 6])
    >>> mse(Y, Y_hat)
    0.0
    >>> Y = np.array([5, 5, 6, 6])
    >>> Y_hat = np.array([5, -5, 6, -6])
    >>> mse(Y, Y_hat)
    61.0
    >>> Y = np.array([5, 5, 6, 6])
    >>> Y_hat = np.array([-5, -5, -6, -6])
    >>> mse(Y, Y_hat)
    122.0
    """
    ### BEGIN SOLUTION
    n = Y.size
    errors = Y - Y_hat
    se = errors**2
    sse = se.sum()
    return sse / n
    ### END SOLUTION
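
Summing the squared errors and dividing by n is the same as averaging them. `np.mean` therefore gives the same result in one call. A sketch using the docstring's second and third cases:

```python
import numpy as np

Y = np.array([5, 5, 6, 6])
Y_hat = np.array([5, -5, 6, -6])

mse_value = np.mean((Y - Y_hat) ** 2)  # mean of squared errors = sum / n
print(mse_value)                       # 61.0
print(np.mean((Y - np.array([-5, -5, -6, -6])) ** 2))  # 122.0
```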

## Run the Tests!

Kernel -> Restart Kernel And Run All Cells -> Restart

In [17]:
class TestArrayComputingWithNumpy(unittest.TestCase):
    def test_01_create_first_five_primes_array(self):
        first_five_primes_array = create_first_five_primes_array()
        np.testing.assert_array_equal(first_five_primes_array,
                                     np.array([2, 3, 5, 7, 11]))
        self.assertIsInstance(first_five_primes_array, np.ndarray)
        self.assertEqual(first_five_primes_array.shape, (5,))
    def test_02_create_first_ten_odds_array(self):
        first_ten_odds_array = create_first_ten_odds_array()
        np.testing.assert_array_equal(first_ten_odds_array,
                                     np.array([1, 3, 5, 7, 9, 11, 13, 15, 17, 19]))
        self.assertIsInstance(first_ten_odds_array, np.ndarray)
        self.assertEqual(first_ten_odds_array.shape, (10,))
    def test_03_create_a_square_matrix(self):
        a_square_matrix = create_a_square_matrix(2, 5566)
        self.assertEqual(a_square_matrix.shape, (2, 2))
        self.assertEqual(a_square_matrix.sum(), 5566 * 2**2)
        a_square_matrix = create_a_square_matrix(3, 55)
        self.assertEqual(a_square_matrix.shape, (3, 3))
        self.assertEqual(a_square_matrix.sum(), 55 * 3**2)
        a_square_matrix = create_a_square_matrix(4, 66)
        self.assertEqual(a_square_matrix.shape, (4, 4))
        self.assertEqual(a_square_matrix.sum(), 66 * 4**2)
    def test_04_create_a_diagonal_matrix(self):
        a_diagonal_matrix = create_a_diagonal_matrix(2, 5566)
        self.assertEqual(a_diagonal_matrix.shape, (2, 2))
        self.assertEqual(a_diagonal_matrix.sum(), 5566 * 2)
        a_diagonal_matrix = create_a_diagonal_matrix(3, 55)
        self.assertEqual(a_diagonal_matrix.shape, (3, 3))
        self.assertEqual(a_diagonal_matrix.sum(), 55 * 3)
        a_diagonal_matrix = create_a_diagonal_matrix(4, 66)
        self.assertEqual(a_diagonal_matrix.shape, (4, 4))
        self.assertEqual(a_diagonal_matrix.sum(), 66 * 4)
    def test_05_create_a_diagonal_split_matrix(self):
        a_diagonal_split_matrix = create_a_diagonal_split_matrix(2, 5566)
        self.assertEqual(a_diagonal_split_matrix.shape, (2, 2))
        self.assertEqual(a_diagonal_split_matrix.sum(), 5566 * (2**2 - 2))
        a_diagonal_split_matrix = create_a_diagonal_split_matrix(3, 55)
        self.assertEqual(a_diagonal_split_matrix.shape, (3, 3))
        self.assertEqual(a_diagonal_split_matrix.sum(), 55 * (3**2 - 3))
        a_diagonal_split_matrix = create_a_diagonal_split_matrix(4, 66)
        self.assertEqual(a_diagonal_split_matrix.shape, (4, 4))
        self.assertEqual(a_diagonal_split_matrix.sum(), 66 * (4**2 - 4))
    def test_06_create_nine_one_array(self):
        nine_one_array = create_nine_one_array()
        self.assertEqual(nine_one_array.shape, (9, 1))
        self.assertEqual(nine_one_array[0, 0], 1)
        self.assertEqual(nine_one_array[-1, 0], 9)
    def test_07_create_one_nine_array(self):
        one_nine_array = create_one_nine_array()
        self.assertEqual(one_nine_array.shape, (1, 9))
        self.assertEqual(one_nine_array[0, 0], 1)
        self.assertEqual(one_nine_array[0, -1], 9)
    def test_08_create_nine_nine_array(self):
        nine_nine_array = create_nine_nine_array()
        self.assertEqual(nine_nine_array.shape, (9, 9))
        self.assertEqual(nine_nine_array[7, 7], 64)
        self.assertEqual(nine_nine_array[7, 8], 72)
        self.assertEqual(nine_nine_array[8, 7], 72)
        self.assertEqual(nine_nine_array[8, 8], 81)
    def test_09_filter_evens(self):
        np.testing.assert_equal(filter_evens(np.array([5, 5, 6, 6])), np.array([6, 6]))
        np.testing.assert_equal(filter_evens(np.array([1, 2, 3, 4])), np.array([2, 4]))
        np.testing.assert_equal(filter_evens(np.array([0, 1, 2, 3])), np.array([0, 2]))
    def test_10_filter_evens_then_product(self):
        self.assertEqual(filter_evens_then_product(np.array([5, 5, 6, 6])), 36)
        self.assertEqual(filter_evens_then_product(np.array([1, 2, 3, 4])), 8)
        self.assertEqual(filter_evens_then_product(np.array([0, 1, 2, 3])), 0)
    def test_11_find_divisors(self):
        np.testing.assert_equal(find_divisors(1), np.array([1]))
        np.testing.assert_equal(find_divisors(2), np.array([1, 2]))
        np.testing.assert_equal(find_divisors(3), np.array([1, 3]))
        np.testing.assert_equal(find_divisors(4), np.array([1, 2, 4]))
        np.testing.assert_equal(find_divisors(5), np.array([1, 5]))
    def test_12_var(self):
        self.assertAlmostEqual(var(np.array([1, 1, 1, 1])), 0.0)
        self.assertAlmostEqual(var(np.array([5, 5, 6, 6])), 0.25)
        self.assertAlmostEqual(var(np.array([5, -5, 6, -6])), 30.5)
        self.assertAlmostEqual(var(np.array([2, 4, 8, 16])), 28.75)
    def test_13_cov(self):
        X = np.array([1, 1, 1, 1])
        Y = np.array([1, 1, 1, 1])
        self.assertAlmostEqual(cov(X, Y), 0.0)
        X = np.array([5, 5, 6, 6])
        Y = np.array([5, 5, 6, 6])
        self.assertAlmostEqual(cov(X, Y), 0.25)
        X = np.array([5, 5, 6, 6])
        Y = np.array([5, -5, 6, -6])
        self.assertAlmostEqual(cov(X, Y), 0.0)
    def test_14_corr(self):
        X = np.array([1, 2])
        Y = np.array([2, 4])
        self.assertAlmostEqual(corr(X, Y), 1.0)
        X = np.array([1, 2])
        Y = np.array([-2, -4])
        self.assertAlmostEqual(corr(X, Y), -1.0)
        X = np.array([1, 2, 3])
        Y = np.array([2, 4, 6])
        self.assertAlmostEqual(corr(X, Y), 1.0)
        X = np.array([1, 2, 3])
        Y = np.array([-2, -4, -6])
        self.assertAlmostEqual(corr(X, Y), -1.0)
    def test_15_mse(self):
        Y = np.array([5, 5, 6, 6])
        Y_hat = np.array([5, 5, 6, 6])
        self.assertAlmostEqual(mse(Y, Y_hat), 0.0)
        Y = np.array([5, 5, 6, 6])
        Y_hat = np.array([5, -5, 6, -6])
        self.assertAlmostEqual(mse(Y, Y_hat), 61.0)
        Y = np.array([5, 5, 6, 6])
        Y_hat = np.array([-5, -5, -6, -6])
        self.assertAlmostEqual(mse(Y, Y_hat), 122.0)

suite = unittest.TestLoader().loadTestsFromTestCase(TestArrayComputingWithNumpy)
runner = unittest.TextTestRunner(verbosity=2)
test_results = runner.run(suite)
number_of_failures = len(test_results.failures)
number_of_errors = len(test_results.errors)
number_of_test_runs = test_results.testsRun
number_of_successes = number_of_test_runs - (number_of_failures + number_of_errors)
cwd = os.getcwd()
folder_name = os.path.basename(cwd)  # portable: works with both "/" and "\" separators
with open("../exercise_index.json", "r", encoding="utf-8") as content:
    exercise_index = json.load(content)
chapter_name = exercise_index[folder_name]

test_01_create_first_five_primes_array (__main__.TestArrayComputingWithNumpy) ... ok
test_02_create_first_ten_odds_array (__main__.TestArrayComputingWithNumpy) ... ok
test_03_create_a_square_matrix (__main__.TestArrayComputingWithNumpy) ... ok
test_04_create_a_diagonal_matrix (__main__.TestArrayComputingWithNumpy) ... ok
test_05_create_a_diagonal_split_matrix (__main__.TestArrayComputingWithNumpy) ... ok
test_06_create_nine_one_array (__main__.TestArrayComputingWithNumpy) ... ok
test_07_create_one_nine_array (__main__.TestArrayComputingWithNumpy) ... ok
test_08_create_nine_nine_array (__main__.TestArrayComputingWithNumpy) ... ok
test_09_filter_evens (__main__.TestArrayComputingWithNumpy) ... ok
test_10_filter_evens_then_product (__main__.TestArrayComputingWithNumpy) ... ok
test_11_find_divisors (__main__.TestArrayComputingWithNumpy) ... ok
test_12_var (__main__.TestArrayComputingWithNumpy) ... ok
test_13_cov (__main__.TestArrayComputingWithNumpy) ... ok
test_14_corr (__main__.TestArray

In [18]:
print('You answered {} of the {} Python exercises in the chapter "{}" correctly.'.format(number_of_successes, number_of_test_runs, chapter_name))

You answered 15 of the 15 Python exercises in the chapter "以 NumPy 運算陣列" correctly.
