# Python 50+ Exercises: A Data Science Study Handbook

> Working with Text Data

[數據交點](https://www.datainpoint.com) | 郭耀仁 <yaojenkuo@datainpoint.com>

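## Exercise Guidelines

- Because the mybinder.org service has been unstable recently, Google Colab has been added as an alternative platform for working on the exercises.
- Before you start, you can click the "Copy to Drive" button at the top to copy the notebook to your own Google Drive.
- The session disconnects automatically after more than 10 minutes of inactivity; click the exercise link again to restart it.
- The first code cell imports the modules you may need.
- If an exercise needs to load files, they are stored under the absolute path `/content`.
- Each exercise already provides the function or class, the expected inputs, and the parameter names, so we only need to write the body. The functions also come with type hints that state the types of the expected inputs and outputs.
- The docstring describes how the tests are run. Reading it shows the relationship between the expected inputs and outputs and helps us solve the exercise faster.
- Write the body of the function or class between the two comments `### BEGIN SOLUTION` and `### END SOLUTION`.
- Put the expected output after the `return` keyword; output that is only printed with `print()` will not pass the tests.
- Errors such as `SyntaxError` or `IndentationError` prevent the tests from running, so before testing, call the function in the notebook and check that it behaves as the docstring describes.
- If you get stuck, read the exercise solutions or review the course video units before continuing.
- To run the tests:
    1. Click Connect in the upper-right corner.
    2. In the menu at the top, click Runtime -> Restart and run all -> Yes -> Run anyway.
    3. Scroll to the last cell of the Google Colab notebook to see the grading results.
- First run the following two code cells to import the required modules and download the files to `/content`.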

In [None]:
import re
import numpy as np
import pandas as pd

In [None]:
!wget -N https://raw.githubusercontent.com/datainpoint/classroom-hahow-pythonfiftyplus/main/exercise_index.json
!wget -N https://raw.githubusercontent.com/datainpoint/classroom-hahow-pythonfiftyplus/main/data/covid19/UID_ISO_FIPS_LookUp_Table.csv
!wget -N https://raw.githubusercontent.com/datainpoint/classroom-hahow-pythonfiftyplus/main/data/thispy.txt
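The guidelines say that printing the expected output with `print()` does not pass the tests. The following minimal sketch uses two hypothetical functions that are not part of the exercises to show why. A function without a `return` statement implicitly returns `None`, so an equality check on its result fails even though the correct value appears on screen.

```python
def double_with_print(x: int) -> None:
    # Hypothetical example: displays the value but implicitly returns None
    print(x * 2)

def double_with_return(x: int) -> int:
    # Hypothetical example: hands the value back to the caller
    return x * 2

printed_result = double_with_print(21)   # prints 42, but the result is None
returned_result = double_with_return(21)

print(printed_result == 42)   # False
print(returned_result == 42)  # True
```

The test cell at the end of the notebook compares returned values with `assertEqual`, so only the second style can pass.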

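## 131. Convert an English letter to an ASCII integer

Define a function `convert_character_to_int()` that converts an input English letter to its ASCII integer code.

Source: <https://en.wikipedia.org/wiki/ASCII>

- Use the built-in function `ord()` <https://docs.python.org/3/library/functions.html#ord>
- Put the expected output after `return`.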
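`ord()` returns the Unicode code point of a single character. For English letters, that code point equals the ASCII value. The short sketch below uses characters other than those in the docstring. It shows that the uppercase and lowercase alphabets each occupy a contiguous block of 26 codes, a property the ROT13 exercises below can build on.

```python
import string

# ord() maps a one-character string to its integer code point
print(ord("0"))  # 48
print(ord(" "))  # 32

# Uppercase and lowercase letters each form a contiguous run of 26 codes
upper_codes = [ord(c) for c in string.ascii_uppercase]
lower_codes = [ord(c) for c in string.ascii_lowercase]
print(upper_codes[0], upper_codes[-1])  # 65 90
print(lower_codes[0], lower_codes[-1])  # 97 122

# The two cases of the same letter are always 32 apart
print(ord("q") - ord("Q"))  # 32
```

Passing a string longer than one character to `ord()` raises a `TypeError`.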

In [None]:
def convert_character_to_int(x: str) -> int:
    """
    >>> convert_character_to_int("A")
    65
    >>> convert_character_to_int("M")
    77
    >>> convert_character_to_int("N")
    78
    >>> convert_character_to_int("Z")
    90
    >>> convert_character_to_int("a")
    97
    >>> convert_character_to_int("m")
    109
    >>> convert_character_to_int("n")
    110
    >>> convert_character_to_int("z")
    122
    """
    ### BEGIN SOLUTION
    
    ### END SOLUTION

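## 132. Convert an integer to an English letter

Define a function `convert_int_to_character()` that converts an input integer in the range 65 to 90 or 97 to 122 (inclusive) to the corresponding English letter.

Source: <https://en.wikipedia.org/wiki/ASCII>

- Use the built-in function `chr()` <https://docs.python.org/3/library/functions.html#chr>
- Put the expected output after `return`.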
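`chr()` is the inverse of `ord()`: it turns an integer code point back into a one-character string. The small sketch below round-trips every letter and rebuilds both alphabets from the code ranges named above.

```python
import string

# chr() turns an integer code point into a one-character string
print(chr(48))  # 0

# chr() undoes ord() for every English letter
assert all(chr(ord(c)) == c for c in string.ascii_letters)

# Rebuild both alphabets; range() excludes its stop value, hence 91 and 123
uppercase = "".join(chr(i) for i in range(65, 91))
lowercase = "".join(chr(i) for i in range(97, 123))
print(uppercase)  # ABCDEFGHIJKLMNOPQRSTUVWXYZ
print(lowercase)  # abcdefghijklmnopqrstuvwxyz
```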

In [None]:
def convert_int_to_character(x: int) -> str:
    """
    >>> convert_int_to_character(65)
    'A'
    >>> convert_int_to_character(77)
    'M'
    >>> convert_int_to_character(78)
    'N'
    >>> convert_int_to_character(90)
    'Z'
    >>> convert_int_to_character(97)
    'a'
    >>> convert_int_to_character(109)
    'm'
    >>> convert_int_to_character(110)
    'n'
    >>> convert_int_to_character(122)
    'z'
    """
    ### BEGIN SOLUTION
    
    ### END SOLUTION

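## 133. Transform an English letter with ROT13

Define a function `rot13_character()` that transforms an input English letter with the ROT13 ("rotate by 13 places") rule. Characters that are not English letters, such as `!` and `*`, are returned unchanged.

![](https://raw.githubusercontent.com/datainpoint/classroom-hahow-pythonfiftyplus/main/16-working-with-text/ROT13.png)

Source: <https://en.wikipedia.org/wiki/ROT13>

- Use the built-in function `ord()` <https://docs.python.org/3/library/functions.html#ord>
- Use the built-in function `chr()` <https://docs.python.org/3/library/functions.html#chr>
- Use conditional statements.
- Use arithmetic operators.
- Put the expected output after `return`.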
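One way to combine `ord()`, `chr()`, a condition, and arithmetic operators is modular arithmetic on a letter's offset from `A` or `a`. The sketch below uses a hypothetical helper, `shift_letter()`, which is not part of the exercise. It rotates a letter by any number of places (a Caesar shift), and ROT13 is the special case `places=13`. Because 13 is half of 26, applying ROT13 twice returns the original letter.

```python
def shift_letter(x: str, places: int) -> str:
    """Hypothetical helper: rotate an English letter by `places` positions."""
    # Pick the first code of the alphabet the letter belongs to
    if "A" <= x <= "Z":
        base = ord("A")
    elif "a" <= x <= "z":
        base = ord("a")
    else:
        # Non-letters such as "!" or "*" pass through unchanged
        return x
    # Offset within the alphabet (0-25), rotated and wrapped with modulo 26
    offset = (ord(x) - base + places) % 26
    return chr(base + offset)

print(shift_letter("A", 13))  # N
print(shift_letter("z", 13))  # m
print(shift_letter("!", 13))  # !
print(shift_letter("C", 3))   # F
```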

In [None]:
def rot13_character(x: str) -> str:
    """
    >>> rot13_character("A")
    'N'
    >>> rot13_character("M")
    'Z'
    >>> rot13_character("N")
    'A'
    >>> rot13_character("Z")
    'M'
    >>> rot13_character("a")
    'n'
    >>> rot13_character("m")
    'z'
    >>> rot13_character("n")
    'a'
    >>> rot13_character("z")
    'm'
    >>> rot13_character("!")
    '!'
    >>> rot13_character("*")
    '*'
    """
    ### BEGIN SOLUTION
    
    ### END SOLUTION

## 134. Transform an English sentence with ROT13

Define a function `rot13_sentence()` that transforms an input English sentence with the ROT13 ("rotate by 13 places") rule.

![](https://raw.githubusercontent.com/datainpoint/classroom-hahow-pythonfiftyplus/main/16-working-with-text/ROT13.png)

Source: <https://en.wikipedia.org/wiki/ROT13>

- Use `rot13_character()`.
- Use a loop.
- Put the expected output after `return`.
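Applying a per-character function to a sentence takes a loop that collects the converted characters and joins them at the end. The sketch below keeps that loop generic with a hypothetical `map_characters()` helper. As an independent cross-check, it uses the standard library's `rot13` text codec through `codecs.encode()` in place of `rot13_character()`.

```python
import codecs
from typing import Callable

def map_characters(sentence: str, convert: Callable[[str], str]) -> str:
    """Hypothetical helper: apply `convert` to every character of `sentence`."""
    converted = []
    for character in sentence:
        converted.append(convert(character))
    # Joining once at the end avoids building many intermediate strings
    return "".join(converted)

def rot13_via_codec(character: str) -> str:
    # Stand-in for rot13_character(), using the built-in rot13 codec
    return codecs.encode(character, "rot13")

decoded = map_characters("Gur Mra bs Clguba, ol Gvz Crgref", rot13_via_codec)
print(decoded)  # The Zen of Python, by Tim Peters
```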

In [None]:
def rot13_sentence(x: str) -> str:
    """
    >>> rot13_sentence("Gur Mra bs Clguba, ol Gvz Crgref")
    'The Zen of Python, by Tim Peters'
    >>> rot13_sentence("The Zen of Python, by Tim Peters")
    'Gur Mra bs Clguba, ol Gvz Crgref'
    >>> rot13_sentence("Abj vf orggre guna arire.")
    'Now is better than never.'
    >>> rot13_sentence("Now is better than never.")
    'Abj vf orggre guna arire.'
    >>> rot13_sentence("Nygubhtu arire vf bsgra orggre guna *evtug* abj.")
    'Although never is often better than *right* now.'
    >>> rot13_sentence("Although never is often better than *right* now.")
    'Nygubhtu arire vf bsgra orggre guna *evtug* abj.'
    """
    ### BEGIN SOLUTION
    
    ### END SOLUTION

## 135. Transform the Zen of Python with ROT13

Define a function `rot13_zen_of_python()` that reads `thispy.txt` from the `/content` directory, transforms it with the ROT13 ("rotate by 13 places") rule, and returns the transformed lines as a list.

![](https://raw.githubusercontent.com/datainpoint/classroom-hahow-pythonfiftyplus/main/16-working-with-text/ROT13.png)

Source: <https://en.wikipedia.org/wiki/ROT13>

- Use `rot13_sentence()`.
- Use a `with` statement.
- Use the `open()` function.
- Use an absolute path.
- Use `TextIOWrapper.readlines()`.
- Use a loop.
- Use `str.strip()`.
- Put the expected output after `return`.
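`readlines()` keeps the trailing newline on each line, which is why `str.strip()` is listed. In the sketch below, a temporary file stands in for `/content/thispy.txt` (an assumption). The code writes two ROT13-encoded lines to that file, reads them back inside a `with` block through an absolute path, and strips each line. It decodes with the `rot13` codec instead of `rot13_sentence()`.

```python
import codecs
import os
import tempfile

# Assumption: a temporary file stands in for /content/thispy.txt
with tempfile.TemporaryDirectory() as directory:
    file_path = os.path.join(directory, "sample.txt")
    with open(file_path, "w", encoding="utf-8") as f:
        f.write("Abj vf orggre guna arire.\n")
        f.write("Nygubhtu arire vf bsgra orggre guna *evtug* abj.\n")

    # The with statement closes the file even if an error occurs
    with open(file_path, encoding="utf-8") as f:
        raw_lines = f.readlines()

decoded_lines = []
for line in raw_lines:
    # strip() removes the trailing "\n" that readlines() keeps
    decoded_lines.append(codecs.encode(line.strip(), "rot13"))

print(repr(raw_lines[0]))  # 'Abj vf orggre guna arire.\n'
print(decoded_lines)
```

Without `strip()`, a membership check such as `"Now is better than never." in decoded_lines` would fail because of the newline.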

In [None]:
def rot13_zen_of_python() -> list:
    """
    >>> zen_of_python = rot13_zen_of_python()
    >>> type(zen_of_python)
    list
    >>> "Now is better than never." in zen_of_python
    True
    >>> "Although never is often better than *right* now." in zen_of_python
    True
    """
    ### BEGIN SOLUTION
    
    ### END SOLUTION

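## 136. Replace the asterisks in a string

Define a function `replace_strs_asterisks()` that removes every `*` from a string by replacing it with an empty string.

- Use `str.replace()`.
- Put the expected output after `return`.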
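`str.replace(old, new)` returns a new string and, by default, replaces every occurrence of `old`. Passing an empty string as `new` deletes the matches. An optional third argument limits how many occurrences are replaced. A short sketch on a made-up string:

```python
text = "a*b*c"

# Every occurrence is replaced by default
print(text.replace("*", ""))     # abc
# The optional count argument limits the number of replacements
print(text.replace("*", "", 1))  # ab*c
# Strings are immutable: the original is unchanged
print(text)                      # a*b*c
```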

In [None]:
def replace_strs_asterisks(x: str) -> str:
    """
    >>> replace_strs_asterisks("Taiwan*")
    'Taiwan'
    >>> replace_strs_asterisks("Crimea Republic*")
    'Crimea Republic'
    >>> replace_strs_asterisks("Crimea Republic*, Ukraine")
    'Crimea Republic, Ukraine'
    >>> replace_strs_asterisks("Sevastopol*")
    'Sevastopol'
    >>> replace_strs_asterisks("Sevastopol*, Ukraine")
    'Sevastopol, Ukraine'
    """
    ### BEGIN SOLUTION
    
    ### END SOLUTION

## 137. Import `UID_ISO_FIPS_LookUp_Table.csv`

Define a function `import_lookup_table()` that imports `UID_ISO_FIPS_LookUp_Table.csv` from the `/content` directory as a `DataFrame`.

Source: <https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data>

- Use an absolute path.
- Use the `pd.read_csv()` function.
- Put the expected output after `return`.
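`pd.read_csv()` accepts a file path and returns a `DataFrame` whose `shape` is `(rows, columns)`. The self-contained sketch below writes a tiny CSV to a temporary directory, standing in for `/content/UID_ISO_FIPS_LookUp_Table.csv` (an assumption). Its values are copied from the docstring output in exercise 138. The sketch then loads the file through an absolute path built with `os.path.join()`.

```python
import os
import tempfile

import pandas as pd

# Sample rows with values copied from the exercise 138 docstring output
csv_text = "UID,iso2,Country_Region\n158,TW,Taiwan*\n80420,UA,Ukraine\n"

with tempfile.TemporaryDirectory() as directory:
    file_path = os.path.join(directory, "sample_lookup.csv")
    with open(file_path, "w", encoding="utf-8") as f:
        f.write(csv_text)
    sample = pd.read_csv(file_path)

print(type(sample))  # <class 'pandas.core.frame.DataFrame'>
print(sample.shape)  # (2, 3)
```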

In [None]:
def import_lookup_table() -> pd.core.frame.DataFrame:
    """
    >>> lookup_table = import_lookup_table()
    >>> type(lookup_table)
    pandas.core.frame.DataFrame
    >>> lookup_table.shape
    (4215, 12)
    """
    ### BEGIN SOLUTION
    
    ### END SOLUTION

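## 138. Filter Taiwan, the Republic of Crimea, and Sevastopol

Define a function `filter_taiwan_crimea_sevastopol()` that filters the rows for Taiwan, the Republic of Crimea, and Sevastopol from `UID_ISO_FIPS_LookUp_Table.csv` in the `/content` directory.

Source: <https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data>

- Use the `import_lookup_table()` function.
- Apply the techniques for filtering observations.
- Put the expected output after `return`.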
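Filtering observations means indexing the `DataFrame` with a boolean `Series`. The minimal sketch below uses a toy table whose first three `Combined_Key` values are copied from the docstring output. The fourth row, `Japan`, is made up for contrast. `Series.isin()` builds the condition. Like the docstring output, the filtered result keeps the original index labels.

```python
import pandas as pd

# Toy data: three keys from the docstring plus one extra row for contrast
lookup = pd.DataFrame({
    "Country_Region": ["Taiwan*", "Ukraine", "Ukraine", "Japan"],
    "Combined_Key": ["Taiwan*", "Crimea Republic*, Ukraine", "Sevastopol*, Ukraine", "Japan"],
})

# isin() returns a boolean Series: True where the value is in the list
targets = ["Taiwan*", "Crimea Republic*, Ukraine", "Sevastopol*, Ukraine"]
condition = lookup["Combined_Key"].isin(targets)
filtered = lookup[condition]

print(condition.tolist())  # [True, True, True, False]
print(filtered.shape)      # (3, 2)
```

A condition on `Country_Region == "Ukraine"` alone would not work on the real table, which contains many other Ukrainian rows, so a more specific column such as `Combined_Key` is safer.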

In [None]:
def filter_taiwan_crimea_sevastopol() -> pd.core.frame.DataFrame:
    """
    >>> taiwan_crimea_sevastopol = filter_taiwan_crimea_sevastopol()
    >>> type(taiwan_crimea_sevastopol)
    pandas.core.frame.DataFrame
    >>> taiwan_crimea_sevastopol.shape
    (3, 12)
    >>> taiwan_crimea_sevastopol
           UID iso2 iso3  code3  FIPS Admin2    Province_State Country_Region  \
    680    158   TW  TWN  158.0   NaN    NaN               NaN        Taiwan*   
    695  80404   UA  UKR  804.0   NaN    NaN  Crimea Republic*        Ukraine   
    711  80420   UA  UKR  804.0   NaN    NaN       Sevastopol*        Ukraine   

             Lat     Long_               Combined_Key  Population  
    680  23.7000  121.0000                    Taiwan*  23816775.0  
    695  45.2835   34.2008  Crimea Republic*, Ukraine   1913731.0  
    711  44.6054   33.5220       Sevastopol*, Ukraine    443211.0 
    """
    ### BEGIN SOLUTION
    
    ### END SOLUTION

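## 139. Replace the asterisks in a `Series`

Define a function `replace_series_asterisks()` that removes every `*` from the values of a `Series`. Missing values (`NaN`) stay missing.

- Use `Series.str.replace()`.
- If you set `regex=True`, note that `*` is a special character in regular expressions and must be escaped as `\*`.
- Put the expected output after `return`.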
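With `regex=False`, the pattern is matched literally. With `regex=True`, the pattern is compiled as a regular expression, and a bare `*` is invalid because the quantifier has nothing to repeat. It must be written as `r"\*"` or produced with `re.escape("*")`. The sketch below uses the docstring's `province_states` values. It also shows that missing values pass through `Series.str` methods as `NaN`.

```python
import re

import numpy as np
import pandas as pd

province_states = pd.Series([np.nan, "Crimea Republic*", "Sevastopol*"])

# Literal match: "*" has no special meaning
literal = province_states.str.replace("*", "", regex=False)

# Regex match: "*" must be escaped, here with re.escape()
escaped = province_states.str.replace(re.escape("*"), "", regex=True)

# On its own, "*" is not a valid regular expression
try:
    re.compile("*")
except re.error as error:
    print("re.error:", error)

print(literal.tolist())         # [nan, 'Crimea Republic', 'Sevastopol']
print(literal.equals(escaped))  # True
```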

In [None]:
def replace_series_asterisks(x: pd.core.series.Series) -> pd.core.series.Series:
    """
    >>> province_states = pd.Series([np.nan, "Crimea Republic*", "Sevastopol*"])
    >>> replace_series_asterisks(province_states)
    0                NaN
    1    Crimea Republic
    2         Sevastopol
    dtype: object
    >>> country_regions = pd.Series(["Taiwan*", "Ukraine", "Ukraine"])
    >>> replace_series_asterisks(country_regions)
    0     Taiwan
    1    Ukraine
    2    Ukraine
    dtype: object
    >>> combined_keys = pd.Series(["Taiwan*", "Crimea Republic*, Ukraine", "Sevastopol*, Ukraine"])
    >>> replace_series_asterisks(combined_keys)
    0                      Taiwan
    1    Crimea Republic, Ukraine
    2         Sevastopol, Ukraine
    dtype: object
    """
    ### BEGIN SOLUTION
    
    ### END SOLUTION

## 140. Export `lookup_table.csv` without asterisks

Define a function `export_lookup_table()` that performs three steps:

- Import `UID_ISO_FIPS_LookUp_Table.csv` from the `/content` directory.
- Remove the asterisks from the `Province_State`, `Country_Region`, and `Combined_Key` columns.
- Export the result to `lookup_table.csv` in the working directory.

Source: <https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data>

- Use the `import_lookup_table()` function.
- Use the `replace_series_asterisks()` function.
- Use `DataFrame.to_csv("lookup_table.csv", index=False)` to export the `DataFrame` as `lookup_table.csv` in the working directory.
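Writing with `index=False` matters for the round trip. If the row index were written, reading the file back would add an extra `Unnamed: 0` column and change the shape. The minimal sketch below uses a toy `DataFrame` with the docstring's values for the three columns. It cleans each column in a loop and then exports to a temporary directory rather than the working directory.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Toy data with the three columns that exercise 140 cleans
table = pd.DataFrame({
    "Province_State": [np.nan, "Crimea Republic*", "Sevastopol*"],
    "Country_Region": ["Taiwan*", "Ukraine", "Ukraine"],
    "Combined_Key": ["Taiwan*", "Crimea Republic*, Ukraine", "Sevastopol*, Ukraine"],
})

# Replace the asterisks column by column
for column in ["Province_State", "Country_Region", "Combined_Key"]:
    table[column] = table[column].str.replace("*", "", regex=False)

with tempfile.TemporaryDirectory() as directory:
    with_index = os.path.join(directory, "with_index.csv")
    without_index = os.path.join(directory, "without_index.csv")
    table.to_csv(with_index)  # the index becomes an unnamed first column
    table.to_csv(without_index, index=False)
    reread_with_index = pd.read_csv(with_index)
    reread_without_index = pd.read_csv(without_index)

print(reread_with_index.shape)     # (3, 4)
print(reread_without_index.shape)  # (3, 3)
print(reread_without_index["Combined_Key"].tolist())  # ['Taiwan', 'Crimea Republic, Ukraine', 'Sevastopol, Ukraine']
```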

In [None]:
def export_lookup_table() -> None:
    """
    >>> export_lookup_table()
    >>> lookup_table = pd.read_csv("lookup_table.csv")
    >>> condition = lookup_table["Combined_Key"].isin(["Taiwan", "Crimea Republic, Ukraine", "Sevastopol, Ukraine"])
    >>> lookup_table[condition]
           UID iso2 iso3  code3  FIPS Admin2   Province_State Country_Region  \
    680    158   TW  TWN  158.0   NaN    NaN              NaN         Taiwan   
    695  80404   UA  UKR  804.0   NaN    NaN  Crimea Republic        Ukraine   
    711  80420   UA  UKR  804.0   NaN    NaN       Sevastopol        Ukraine   

             Lat     Long_              Combined_Key  Population  
    680  23.7000  121.0000                    Taiwan  23816775.0  
    695  45.2835   34.2008  Crimea Republic, Ukraine   1913731.0  
    711  44.6054   33.5220       Sevastopol, Ukraine    443211.0
    """
    ### BEGIN SOLUTION
    
    ### END SOLUTION

## End of the exercises; the cells below can be ignored

In [None]:
import unittest
import json

def run_suite(test_class, chapter_index):
    suite = unittest.TestLoader().loadTestsFromTestCase(test_class)
    runner = unittest.TextTestRunner(verbosity=2)
    test_results = runner.run(suite)
    number_of_failures = len(test_results.failures)
    number_of_errors = len(test_results.errors)
    number_of_test_runs = test_results.testsRun
    number_of_successes = number_of_test_runs - (number_of_failures + number_of_errors)
    with open("exercise_index.json", "r", encoding="utf-8") as f:
        exercise_index = json.load(f)
    chapter_name = exercise_index[chapter_index]["chapter_name"]
    number_of_total_questions = 0
    number_of_completed_questions = 0
    for i in range(len(exercise_index)):
        number_of_total_questions += exercise_index[i]["number_of_exercises"]
        if i < chapter_index:
            number_of_completed_questions += exercise_index[i]["number_of_exercises"]
    number_of_completed_questions += number_of_successes
    chapter_percentage = number_of_successes * 100 / number_of_test_runs
    overall_percentage = number_of_completed_questions * 100 / number_of_total_questions
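    print("Your completion rate for the exercises in the chapter '{}' is ... {:.2f}% ({}/{})".format(chapter_name, chapter_percentage, number_of_successes, number_of_test_runs))
    print("Your cumulative completion rate for all course exercises is ... {:.2f}% ({}/{})".format(overall_percentage, number_of_completed_questions, number_of_total_questions))
    if chapter_percentage == 100 and chapter_index < 19:
        print("Well done! You have completed all the exercises in '{}'. Let's move on to the next chapter: '{}'!".format(exercise_index[chapter_index]["chapter_name"], exercise_index[chapter_index + 1]["chapter_name"]))
        if chapter_index == 4:
            print("Great! You have completed Part 1 of 'Python 50+ Exercises': fundamental concepts of Python programming. Three more parts are waiting for you!")
        elif chapter_index == 8:
            print("Excellent work! You have completed Part 2 of 'Python 50+ Exercises': advanced concepts of Python programming. Next, let's head into data science!")
        elif chapter_index == 12:
            print("Impressive! You have completed Part 3 of 'Python 50+ Exercises': fundamentals of data science with Python. Only the last mile remains before you finish the course!")
    elif chapter_percentage == 100 and chapter_index == 19:
        print("Congratulations on finishing the course! You have completed all the exercises in 'Python 50+ Exercises'. Sticking with it through every video and exercise is a remarkable achievement! There are no more exercises; you are now an analyst who is skilled at writing programs to process data!")
    elif chapter_percentage >= 50:
        print("You have completed more than half of the exercises in the chapter '{}'. Keep it up!".format(chapter_name))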

class TestWorkingWithText(unittest.TestCase):
    def test_131_convert_character_to_int(self):
        self.assertEqual(convert_character_to_int("A"), 65)
        self.assertEqual(convert_character_to_int("M"), 77)
        self.assertEqual(convert_character_to_int("N"), 78)
        self.assertEqual(convert_character_to_int("Z"), 90)
        self.assertEqual(convert_character_to_int("a"), 97)
        self.assertEqual(convert_character_to_int("m"), 109)
        self.assertEqual(convert_character_to_int("n"), 110)
        self.assertEqual(convert_character_to_int("z"), 122)
    def test_132_convert_int_to_character(self):
        self.assertEqual(convert_int_to_character(65), 'A')
        self.assertEqual(convert_int_to_character(77), 'M')
        self.assertEqual(convert_int_to_character(78), 'N')
        self.assertEqual(convert_int_to_character(90), 'Z')
        self.assertEqual(convert_int_to_character(97), 'a')
        self.assertEqual(convert_int_to_character(109), 'm')
        self.assertEqual(convert_int_to_character(110), 'n')
        self.assertEqual(convert_int_to_character(122), 'z')
    def test_133_rot13_character(self):
        self.assertEqual(rot13_character("A"), 'N')
        self.assertEqual(rot13_character("M"), 'Z')
        self.assertEqual(rot13_character("N"), 'A')
        self.assertEqual(rot13_character("Z"), 'M')
        self.assertEqual(rot13_character("a"), 'n')
        self.assertEqual(rot13_character("m"), 'z')
        self.assertEqual(rot13_character("n"), 'a')
        self.assertEqual(rot13_character("z"), 'm')
        self.assertEqual(rot13_character("!"), '!')
        self.assertEqual(rot13_character("*"), '*')
    def test_134_rot13_sentence(self):
        self.assertEqual(rot13_sentence("Gur Mra bs Clguba, ol Gvz Crgref"), 'The Zen of Python, by Tim Peters')
        self.assertEqual(rot13_sentence("The Zen of Python, by Tim Peters"), 'Gur Mra bs Clguba, ol Gvz Crgref')
        self.assertEqual(rot13_sentence("Abj vf orggre guna arire."), 'Now is better than never.')
        self.assertEqual(rot13_sentence("Now is better than never."), 'Abj vf orggre guna arire.')
        self.assertEqual(rot13_sentence("Nygubhtu arire vf bsgra orggre guna *evtug* abj."), 'Although never is often better than *right* now.')
        self.assertEqual(rot13_sentence("Although never is often better than *right* now."), 'Nygubhtu arire vf bsgra orggre guna *evtug* abj.')
    def test_135_rot13_zen_of_python(self):
        zen_of_python = rot13_zen_of_python()
        self.assertIsInstance(zen_of_python, list)
        self.assertTrue("Now is better than never." in zen_of_python)
        self.assertTrue("Although never is often better than *right* now." in zen_of_python)
    def test_136_replace_strs_asterisks(self):
        self.assertEqual(replace_strs_asterisks("Taiwan*"), 'Taiwan')
        self.assertEqual(replace_strs_asterisks("Crimea Republic*"), 'Crimea Republic')
        self.assertEqual(replace_strs_asterisks("Crimea Republic*, Ukraine"), 'Crimea Republic, Ukraine')
        self.assertEqual(replace_strs_asterisks("Sevastopol*"), 'Sevastopol')
        self.assertEqual(replace_strs_asterisks("Sevastopol*, Ukraine*"), 'Sevastopol, Ukraine')
    def test_137_import_lookup_table(self):
        lookup_table = import_lookup_table()
        self.assertIsInstance(lookup_table, pd.core.frame.DataFrame)
        self.assertEqual(lookup_table.shape, (4215, 12))
    def test_138_filter_taiwan_crimea_sevastopol(self):
        taiwan_crimea_sevastopol = filter_taiwan_crimea_sevastopol()
        self.assertIsInstance(taiwan_crimea_sevastopol, pd.core.frame.DataFrame)
        self.assertEqual(taiwan_crimea_sevastopol.shape, (3, 12))
    def test_139_replace_series_asterisks(self):
        province_states = pd.Series([np.nan, "Crimea Republic*", "Sevastopol*"])
        pd.testing.assert_series_equal(replace_series_asterisks(province_states), pd.Series([np.nan, "Crimea Republic", "Sevastopol"]))
        country_regions = pd.Series(["Taiwan*", "Ukraine", "Ukraine"])
        pd.testing.assert_series_equal(replace_series_asterisks(country_regions), pd.Series(["Taiwan", "Ukraine", "Ukraine"]))
        combined_keys = pd.Series(["Taiwan*", "Crimea Republic*, Ukraine", "Sevastopol*, Ukraine"])
        pd.testing.assert_series_equal(replace_series_asterisks(combined_keys), pd.Series(["Taiwan", "Crimea Republic, Ukraine", "Sevastopol, Ukraine"]))
    def test_140_export_lookup_table(self):
        export_lookup_table()
        lookup_table = pd.read_csv("lookup_table.csv")
        self.assertIsInstance(lookup_table, pd.core.frame.DataFrame)
        self.assertEqual(lookup_table.shape, (4215, 12))
        condition = lookup_table["Combined_Key"].isin(["Taiwan", "Crimea Republic, Ukraine", "Sevastopol, Ukraine"])
        self.assertEqual(lookup_table[condition].shape, (3, 12))

run_suite(TestWorkingWithText, 15)