# 50+ Python Exercises: A Data Science Study Handbook

> Working with Text

[數據交點](https://www.datainpoint.com) | 郭耀仁 <yaojenkuo@datainpoint.com>

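## Exercise Guidelines

- An exercise session disconnects automatically after more than 10 minutes of inactivity; click the exercise link again to restart it.
- The first code cell loads the modules you may need.
- If an exercise needs to load a file, the file is stored under the absolute path `/home/jovyan/data`.
- Each exercise already provides the function or class name, the expected inputs, and the parameter names, so you only need to write the body. Type hints are also provided to indicate the expected input and output types.
- The docstring describes how the test is run. Reading it clarifies the relationship between the expected input and the expected output, which helps you solve the exercise faster.
- Write the body of the function or class between the two comments `### BEGIN SOLUTION` and `### END SOLUTION`.
- Put the expected output after the `return` keyword; merely printing it with `print()` will not pass the test.
- Errors such as `SyntaxError` or `IndentationError` prevent the tests from running. Before testing, call the function in the notebook and check that it behaves as the docstring describes.
- If you get stuck, look at the solution walkthrough or review the course videos before continuing.
- Steps for running the tests:
    1. Choose File -> Save Notebook from the top menu to save exercises.ipynb.
    2. Choose File -> New -> Terminal from the top menu to open a terminal.
    3. In the terminal, type `python 16-working-with-text/test_runner.py` and press Enter to run the tests.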

In [1]:
import re
import numpy as np
import pandas as pd

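## 131. Convert an English Letter to an ASCII Integer

Define a function `convert_character_to_int()` that converts an input English letter to its ASCII integer.

Source: <https://en.wikipedia.org/wiki/ASCII>

- Use the built-in function `ord()` <https://docs.python.org/3/library/functions.html#ord>
- Write the expected output after `return`.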

In [3]:
def convert_character_to_int(x: str) -> int:
    """
    >>> convert_character_to_int("A")
    65
    >>> convert_character_to_int("M")
    77
    >>> convert_character_to_int("N")
    78
    >>> convert_character_to_int("Z")
    90
    >>> convert_character_to_int("a")
    97
    >>> convert_character_to_int("m")
    109
    >>> convert_character_to_int("n")
    110
    >>> convert_character_to_int("z")
    122
    """
    ### BEGIN SOLUTION
    return ord(x)
    ### END SOLUTION

In [11]:
convert_character_to_int("A")
convert_character_to_int("M")
convert_character_to_int("N")
convert_character_to_int("Z")
convert_character_to_int("a")
convert_character_to_int("m")
convert_character_to_int("n")
convert_character_to_int("z")

122

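## 132. Convert an Integer to an English Letter

Define a function `convert_int_to_character()` that converts an input integer in the range 65 to 90 or 97 to 122 into an English letter.

Source: <https://en.wikipedia.org/wiki/ASCII>

- Use the built-in function `chr()` <https://docs.python.org/3/library/functions.html#chr>
- Write the expected output after `return`.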
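
As a quick sanity check, `chr()` reverses `ord()` for the letter ranges used in these exercises. The sketch below is only an illustration; the variable names are made up for this example and are not part of the exercise.

```python
import string

# Round trip: converting a letter to its code point and back returns the same letter.
for letter in "AMNZamnz":
    code = ord(letter)
    assert chr(code) == letter

# Code points of the first and last letters of each alphabet (illustrative names).
uppercase_bounds = (ord(string.ascii_uppercase[0]), ord(string.ascii_uppercase[-1]))
lowercase_bounds = (ord(string.ascii_lowercase[0]), ord(string.ascii_lowercase[-1]))
print(uppercase_bounds)  # (65, 90)
print(lowercase_bounds)  # (97, 122)
```

These bounds are the same 65 to 90 and 97 to 122 ranges named in the exercise statement.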

In [13]:
def convert_int_to_character(x: int) -> str:
    """
    >>> convert_int_to_character(65)
    'A'
    >>> convert_int_to_character(77)
    'M'
    >>> convert_int_to_character(78)
    'N'
    >>> convert_int_to_character(90)
    'Z'
    >>> convert_int_to_character(97)
    'a'
    >>> convert_int_to_character(109)
    'm'
    >>> convert_int_to_character(110)
    'n'
    >>> convert_int_to_character(122)
    'z'
    """
    ### BEGIN SOLUTION
    return chr(x)
    ### END SOLUTION

In [22]:
convert_int_to_character(65)
convert_int_to_character(77)
convert_int_to_character(78)
convert_int_to_character(90)
convert_int_to_character(97)
convert_int_to_character(109)
convert_int_to_character(110)
convert_int_to_character(122)

'z'

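## 133. Convert an English Letter with ROT13

Define a function `rot13_character()` that converts an input English letter using the ROT13 (rotate by 13 places) rule.

![](ROT13.png)

Source: <https://en.wikipedia.org/wiki/ROT13>

- Use the built-in function `ord()` <https://docs.python.org/3/library/functions.html#ord>
- Use the built-in function `chr()` <https://docs.python.org/3/library/functions.html#chr>
- Use conditional statements.
- Use arithmetic operators.
- Write the expected output after `return`.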
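
The rotation can also be expressed with modular arithmetic instead of separate add and subtract branches. The following is a minimal sketch under that assumption, not the exercise's reference solution; `rot13_modular()` is a hypothetical name, and the standard library's `rot_13` codec is used only to cross-check the result.

```python
import codecs
import string

def rot13_modular(character: str) -> str:
    """Hypothetical helper: rotate one letter by 13 places using modulo 26."""
    if "A" <= character <= "Z":
        base = ord("A")
    elif "a" <= character <= "z":
        base = ord("a")
    else:
        return character  # non-letters pass through unchanged
    # Shift within the 26-letter alphabet and wrap around with the modulo operator.
    return chr((ord(character) - base + 13) % 26 + base)

# Cross-check every ASCII letter (plus two symbols) against the built-in codec.
for letter in string.ascii_letters + "!*":
    assert rot13_modular(letter) == codecs.encode(letter, "rot_13")

print(rot13_modular("A"), rot13_modular("N"), rot13_modular("z"), rot13_modular("!"))
```

Because 13 is half of 26, applying the rotation twice returns the original letter, so the same function both encodes and decodes.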

In [29]:
def rot13_character(x: str) -> str:
    """
    >>> rot13_character("A")
    'N'
    >>> rot13_character("M")
    'Z'
    >>> rot13_character("N")
    'A'
    >>> rot13_character("Z")
    'M'
    >>> rot13_character("a")
    'n'
    >>> rot13_character("m")
    'z'
    >>> rot13_character("n")
    'a'
    >>> rot13_character("z")
    'm'
    >>> rot13_character("!")
    '!'
    >>> rot13_character("*")
    '*'
    """
    ### BEGIN SOLUTION
    ord_x = ord(x)
    condition_A_M = 65 <= ord_x <= 77
    condition_N_Z = 78 <= ord_x <= 90
    condition_a_m = 97 <= ord_x <= 109
    condition_n_z = 110 <= ord_x <= 122
    if condition_A_M or condition_a_m:
        ord_x += 13
        return chr(ord_x)
    elif condition_N_Z or condition_n_z:
        ord_x -= 13
        return chr(ord_x)
    else:
        return x
    ### END SOLUTION

In [39]:
rot13_character("A")
rot13_character("M")
rot13_character("N")
rot13_character("Z")
rot13_character("a")
rot13_character("m")
rot13_character("n")
rot13_character("z")
rot13_character("!")
rot13_character("*")

'*'

In [28]:
# help(ord)
# help(chr)

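## 134. Convert an English Sentence with ROT13

Define a function `rot13_sentence()` that converts an input English sentence using the ROT13 (rotate by 13 places) rule.

![](ROT13.png)

Source: <https://en.wikipedia.org/wiki/ROT13>

- Use `rot13_character()`.
- Use a loop.
- Write the expected output after `return`.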
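
For comparison with the loop-based approach, a whole sentence can also be converted in one call with a translation table. This is a sketch of an alternative technique, not the method the exercise asks for; `ROT13_TABLE` and `rot13_translate()` are hypothetical names introduced here.

```python
import string

upper = string.ascii_uppercase
lower = string.ascii_lowercase

# Build the table once: each letter maps to the letter 13 places further along,
# wrapping around at the end of the alphabet.
ROT13_TABLE = str.maketrans(
    upper + lower,
    upper[13:] + upper[:13] + lower[13:] + lower[:13],
)

def rot13_translate(sentence: str) -> str:
    """Hypothetical alternative: characters not in the table are left unchanged."""
    return sentence.translate(ROT13_TABLE)

encoded = rot13_translate("Now is better than never.")
decoded = rot13_translate(encoded)
print(encoded)
print(decoded)
```

The loop version in the exercise makes each step explicit, which is the point of the practice; the table version is mainly useful once the rule itself is well understood.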

In [43]:
def rot13_sentence(x: str) -> str:
    """
    >>> rot13_sentence("Gur Mra bs Clguba, ol Gvz Crgref")
    'The Zen of Python, by Tim Peters'
    >>> rot13_sentence("The Zen of Python, by Tim Peters")
    'Gur Mra bs Clguba, ol Gvz Crgref'
    >>> rot13_sentence("Abj vf orggre guna arire.")
    'Now is better than never.'
    >>> rot13_sentence("Now is better than never.")
    'Abj vf orggre guna arire.'
    >>> rot13_sentence("Nygubhtu arire vf bsgra orggre guna *evtug* abj.")
    'Although never is often better than *right* now.'
    >>> rot13_sentence("Although never is often better than *right* now.")
    'Nygubhtu arire vf bsgra orggre guna *evtug* abj.'
    """
    ### BEGIN SOLUTION
    rotated_list = []
    for character in x:
        rotated_character = rot13_character(character)
        rotated_list.append(rotated_character)
    return "".join(rotated_list)
    ### END SOLUTION

In [49]:
rot13_sentence("Gur Mra bs Clguba, ol Gvz Crgref")
rot13_sentence("The Zen of Python, by Tim Peters")
rot13_sentence("Abj vf orggre guna arire.")
rot13_sentence("Now is better than never.")
rot13_sentence("Nygubhtu arire vf bsgra orggre guna *evtug* abj.")
rot13_sentence("Although never is often better than *right* now.")

'Nygubhtu arire vf bsgra orggre guna *evtug* abj.'

## 135. Convert the Zen of Python with ROT13

Define a function `rot13_zen_of_python()` that converts `thispy.txt`, located under `/home/jovyan/data`, using the ROT13 (rotate by 13 places) rule.

![](ROT13.png)

Source: <https://en.wikipedia.org/wiki/ROT13>

- Use `rot13_sentence()`.
- Use a `with` statement.
- Use the `open()` function.
- Use an absolute path.
- Use `TextIOWrapper.readlines()`.
- Use a loop.
- Use `str.strip()`.
- Write the expected output after `return`.
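
The reason `str.strip()` appears in the hints is that `readlines()` keeps the trailing newline on each line. The sketch below shows this with a temporary stand-in file so that it does not depend on `/home/jovyan/data`; the file name `sample.txt` and its contents are assumptions made for this illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as temp_dir:
    file_path = os.path.join(temp_dir, "sample.txt")  # hypothetical file name
    # Write three lines, the second of which is empty.
    with open(file_path, "w", encoding="utf-8") as file:
        file.write("Abj vf orggre guna arire.\n\nGur Mra bs Clguba\n")

    # The with statement closes the file automatically after reading.
    with open(file_path, encoding="utf-8") as file:
        raw_lines = file.readlines()

# Each raw line still ends with "\n"; strip() removes it.
stripped_lines = [line.strip() for line in raw_lines]
print(raw_lines)
print(stripped_lines)
```

Without the `strip()` call, membership checks such as `"Now is better than never." in zen_of_python` would fail, because the stored element would end with a newline.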

In [65]:
def rot13_zen_of_python() -> list:
    """
    >>> zen_of_python = rot13_zen_of_python()
    >>> type(zen_of_python)
    list
    >>> "Now is better than never." in zen_of_python
    True
    >>> "Although never is often better than *right* now." in zen_of_python
    True
    """
    ### BEGIN SOLUTION
    file_path = "/home/jovyan/data/thispy.txt"
    
    with open(file_path) as file:
        thispy_list = file.readlines()
        
    output_list = []
    for element in thispy_list:
        output_list.append(rot13_sentence(element).strip())
    
    return output_list
    ### END SOLUTION

In [68]:
zen_of_python = rot13_zen_of_python()
type(zen_of_python)
"Now is better than never." in zen_of_python
"Although never is often better than *right* now." in zen_of_python

True

## 136. Remove Asterisks from Text

Define a function `replace_strs_asterisks()` that replaces every `*` in the text with an empty string.

- Use `str.replace()`.
- Write the expected output after `return`.

In [75]:
def replace_strs_asterisks(x: str) -> str:
    """
    >>> replace_strs_asterisks("Taiwan*")
    'Taiwan'
    >>> replace_strs_asterisks("Crimea Republic*")
    'Crimea Republic'
    >>> replace_strs_asterisks("Crimea Republic*, Ukraine")
    'Crimea Republic, Ukraine'
    >>> replace_strs_asterisks("Sevastopol*")
    'Sevastopol'
    >>> replace_strs_asterisks("Sevastopol*, Ukraine")
    'Sevastopol, Ukraine'
    """
    ### BEGIN SOLUTION
    return x.replace("*", "")
    ### END SOLUTION

In [80]:
replace_strs_asterisks("Taiwan*")
replace_strs_asterisks("Crimea Republic*")
replace_strs_asterisks("Crimea Republic*, Ukraine")
replace_strs_asterisks("Sevastopol*")
replace_strs_asterisks("Sevastopol*, Ukraine")

'Sevastopol, Ukraine'

In [74]:
# help(str.replace)

## 137. Load `UID_ISO_FIPS_LookUp_Table.csv`

Define a function `import_lookup_table()` that loads `UID_ISO_FIPS_LookUp_Table.csv`, located under `/home/jovyan/data/covid19`, as a `DataFrame`.

Source: <https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data>

- Use an absolute path.
- Use the `pd.read_csv()` function.
- Write the expected output after `return`.

In [82]:
def import_lookup_table() -> pd.core.frame.DataFrame:
    """
    >>> lookup_table = import_lookup_table()
    >>> type(lookup_table)
    pandas.core.frame.DataFrame
    >>> lookup_table.shape
    (4215, 12)
    """
    ### BEGIN SOLUTION
    file_path = "/home/jovyan/data/covid19/UID_ISO_FIPS_LookUp_Table.csv"
    # pd.read_csv() accepts a path directly, so no explicit open() is needed.
    return pd.read_csv(file_path)
    ### END SOLUTION

In [84]:
lookup_table = import_lookup_table()
type(lookup_table)
lookup_table.shape

(4215, 12)

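## 138. Filter Taiwan, the Republic of Crimea, and Sevastopol

Define a function `filter_taiwan_crimea_sevastopol()` that selects the rows for Taiwan, the Republic of Crimea, and Sevastopol from `UID_ISO_FIPS_LookUp_Table.csv`, located under `/home/jovyan/data/covid19`.

Source: <https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data>

- Use the `import_lookup_table()` function.
- Apply the technique for filtering observations.
- Write the expected output after `return`.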
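
Filtering observations means building a boolean `Series` and using it to index the `DataFrame`. The sketch below illustrates this on a tiny made-up table rather than the real lookup file; `toy_table` and the non-target rows are assumptions for illustration only, and `Series.isin()` is shown as one possible way to combine several equality checks.

```python
import pandas as pd

# A tiny stand-in for the lookup table; only the column used for filtering is included.
toy_table = pd.DataFrame({
    "Combined_Key": [
        "Japan",
        "Taiwan*",
        "Crimea Republic*, Ukraine",
        "Kiev, Ukraine",
        "Sevastopol*, Ukraine",
    ]
})
targets = ["Taiwan*", "Crimea Republic*, Ukraine", "Sevastopol*, Ukraine"]

# isin() returns True for rows whose value appears in the list of targets.
condition = toy_table["Combined_Key"].isin(targets)
filtered = toy_table[condition]
print(filtered)
```

Combining three separate `==` conditions with `|` gives the same rows; `isin()` is simply shorter when the list of targets grows.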

In [88]:
def filter_taiwan_crimea_sevastopol() -> pd.core.frame.DataFrame:
    """
    >>> taiwan_crimea_sevastopol = filter_taiwan_crimea_sevastopol()
    >>> type(taiwan_crimea_sevastopol)
    pandas.core.frame.DataFrame
    >>> taiwan_crimea_sevastopol.shape
    (3, 12)
    >>> taiwan_crimea_sevastopol
           UID iso2 iso3  code3  FIPS Admin2    Province_State Country_Region  \
    680    158   TW  TWN  158.0   NaN    NaN               NaN        Taiwan*   
    695  80404   UA  UKR  804.0   NaN    NaN  Crimea Republic*        Ukraine   
    711  80420   UA  UKR  804.0   NaN    NaN       Sevastopol*        Ukraine   

             Lat     Long_               Combined_Key  Population  
    680  23.7000  121.0000                    Taiwan*  23816775.0  
    695  45.2835   34.2008  Crimea Republic*, Ukraine   1913731.0  
    711  44.6054   33.5220       Sevastopol*, Ukraine    443211.0 
    """
    ### BEGIN SOLUTION
    lookup_table = import_lookup_table()
    condition_taiwan = lookup_table["Combined_Key"] == "Taiwan*"
    condition_Crimea = lookup_table["Combined_Key"] == "Crimea Republic*, Ukraine"
    condition_Sevastopol = lookup_table["Combined_Key"] == "Sevastopol*, Ukraine"
    output_dataframe = lookup_table[condition_taiwan | condition_Crimea | condition_Sevastopol]
    return output_dataframe
    ### END SOLUTION

In [91]:
taiwan_crimea_sevastopol = filter_taiwan_crimea_sevastopol()
type(taiwan_crimea_sevastopol)
taiwan_crimea_sevastopol.shape
taiwan_crimea_sevastopol

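|     | UID | iso2 | iso3 | code3 | FIPS | Admin2 | Province_State | Country_Region | Lat | Long_ | Combined_Key | Population |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 680 | 158 | TW | TWN | 158.0 | NaN | NaN | NaN | Taiwan* | 23.7 | 121.0 | Taiwan* | 23816775.0 |
| 695 | 80404 | UA | UKR | 804.0 | NaN | NaN | Crimea Republic* | Ukraine | 45.2835 | 34.2008 | Crimea Republic*, Ukraine | 1913731.0 |
| 711 | 80420 | UA | UKR | 804.0 | NaN | NaN | Sevastopol* | Ukraine | 44.6054 | 33.522 | Sevastopol*, Ukraine | 443211.0 |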


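## 139. Remove Asterisks from a `Series`

Define a function `replace_series_asterisks()` that replaces every `*` in a `Series` with an empty string.

- Use `Series.str.replace()`.
- If you pass `regex=True`, keep in mind that `*` is a special character in regular expressions.
- Write the expected output after `return`.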
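
The caution about `*` can be illustrated with the standard `re` module alone. This is a small sketch of the underlying regular-expression behavior, not of pandas itself; the variable names are made up for the example.

```python
import re

text = "Crimea Republic*, Ukraine"

# A bare "*" is a quantifier with nothing in front of it to repeat,
# so the pattern fails to compile.
try:
    re.sub("*", "", text)
except re.error as error:
    print("invalid pattern:", error)

# Escaping the asterisk makes the pattern match a literal "*".
escaped_by_hand = re.sub(r"\*", "", text)
# re.escape() produces the same escaped pattern programmatically.
escaped_by_helper = re.sub(re.escape("*"), "", text)
print(escaped_by_hand)
print(escaped_by_helper)
```

With `Series.str.replace()`, the corresponding choices are to pass an escaped pattern such as `r"\*"` together with `regex=True`, or to pass `regex=False` so that `"*"` is treated as a plain character.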

In [102]:
def replace_series_asterisks(x: pd.core.series.Series) -> pd.core.series.Series:
    """
    >>> province_states = pd.Series([np.nan, "Crimea Republic*", "Sevastopol*"])
    >>> replace_series_asterisks(province_states)
    0                NaN
    1    Crimea Republic
    2         Sevastopol
    dtype: object
    >>> country_regions = pd.Series(["Taiwan*", "Ukraine", "Ukraine"])
    >>> replace_series_asterisks(country_regions)
    0     Taiwan
    1    Ukraine
    2    Ukraine
    dtype: object
    >>> combined_keys = pd.Series(["Taiwan*", "Crimea Republic*, Ukraine", "Sevastopol*, Ukraine"])
    >>> replace_series_asterisks(combined_keys)
    0                      Taiwan
    1    Crimea Republic, Ukraine
    2         Sevastopol, Ukraine
    dtype: object
    """
    ### BEGIN SOLUTION
    # regex=False treats "*" as a literal character rather than a regex quantifier.
    return x.str.replace("*", "", regex=False)
    ### END SOLUTION

In [105]:
province_states = pd.Series([np.nan, "Crimea Republic*", "Sevastopol*"])
replace_series_asterisks(province_states)
country_regions = pd.Series(["Taiwan*", "Ukraine", "Ukraine"])
replace_series_asterisks(country_regions)
combined_keys = pd.Series(["Taiwan*", "Crimea Republic*, Ukraine", "Sevastopol*, Ukraine"])
replace_series_asterisks(combined_keys)

0                      Taiwan
1    Crimea Republic, Ukraine
2         Sevastopol, Ukraine
dtype: object

## 140. Export `lookup_table.csv` Without Asterisks

Define a function `export_lookup_table()` that loads `UID_ISO_FIPS_LookUp_Table.csv` from `/home/jovyan/data/covid19`, removes the asterisks from the `Province_State`, `Country_Region`, and `Combined_Key` columns, and writes the result to `lookup_table.csv` in the working directory.

Source: <https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data>

- Use the `import_lookup_table()` function.
- Use the `replace_series_asterisks()` function.
- Use `DataFrame.to_csv("lookup_table.csv", index=False)` to write the `DataFrame` to `lookup_table.csv` in the working directory.

In [113]:
def export_lookup_table() -> None:
    """
    >>> export_lookup_table()
    >>> lookup_table = pd.read_csv("lookup_table.csv")
    >>> condition = lookup_table["Combined_Key"].isin(["Taiwan", "Crimea Republic, Ukraine", "Sevastopol, Ukraine"])
    >>> lookup_table[condition]
           UID iso2 iso3  code3  FIPS Admin2   Province_State Country_Region  \
    680    158   TW  TWN  158.0   NaN    NaN              NaN         Taiwan   
    695  80404   UA  UKR  804.0   NaN    NaN  Crimea Republic        Ukraine   
    711  80420   UA  UKR  804.0   NaN    NaN       Sevastopol        Ukraine   

             Lat     Long_              Combined_Key  Population  
    680  23.7000  121.0000                    Taiwan  23816775.0  
    695  45.2835   34.2008  Crimea Republic, Ukraine   1913731.0  
    711  44.6054   33.5220       Sevastopol, Ukraine    443211.0
    """
    ### BEGIN SOLUTION
    lookup_table = import_lookup_table()
    # Strip asterisks from the three text columns; all other columns are kept as-is.
    for column in ["Province_State", "Country_Region", "Combined_Key"]:
        lookup_table[column] = replace_series_asterisks(lookup_table[column])
    lookup_table.to_csv("lookup_table.csv", index=False)
    ### END SOLUTION

In [114]:
export_lookup_table()
lookup_table = pd.read_csv("lookup_table.csv")
condition = lookup_table["Combined_Key"].isin(["Taiwan", "Crimea Republic, Ukraine", "Sevastopol, Ukraine"])
lookup_table[condition]

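|     | UID | iso2 | iso3 | code3 | FIPS | Admin2 | Province_State | Country_Region | Lat | Long_ | Combined_Key | Population |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 680 | 158 | TW | TWN | 158.0 | NaN | NaN | NaN | Taiwan | 23.7 | 121.0 | Taiwan | 23816775.0 |
| 695 | 80404 | UA | UKR | 804.0 | NaN | NaN | Crimea Republic | Ukraine | 45.2835 | 34.2008 | Crimea Republic, Ukraine | 1913731.0 |
| 711 | 80420 | UA | UKR | 804.0 | NaN | NaN | Sevastopol | Ukraine | 44.6054 | 33.522 | Sevastopol, Ukraine | 443211.0 |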
