
**Python basics I. Outline**

1. Python basics
2. Lists, tuples, and sets
3. String
4. Dictionaries

# 1. Python basics. Outline / Python 基礎教學大綱

1. **Indenting and block structuring / 縮排與區塊結構**
2. **Differentiating comments / 區分註解**
3. **Variables and assignments / 變數與賦值**
4. **Delete statement / 刪除語句**
5. **Expressions / 表達式**
6. **Strings / 字串**
7. **Numbers and None / 數字與 None**
8. **Getting user inputs / 獲取用戶輸入**
9. **Pythonic style / Python 的編程風格**

## Indentation and block structuring / 縮排與區塊結構

* **Indentation: use four spaces per indentation level, not the Tab key.**

In [None]:
# print n inside the while loop
n = 3
i = 1
while n > 0:
    i = i * n
    n = n - 1
    print(n)

In [None]:
# print n once, after the while loop has finished
n = 3
i = 1
while n > 0:
    i = i * n
    n = n - 1
print(n)
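Indentation is part of Python's syntax, not just a matter of layout. The sketch below compiles two small snippets with the built-in `compile()` and reports whether an `IndentationError` is raised. The helper name `indentation_error_message` is hypothetical. The exact error text differs between Python versions, so the code prints it rather than predicting it.

```python
def indentation_error_message(source):
    """Compile source text; return the IndentationError message, or None if it compiles."""
    try:
        compile(source, "<example>", "exec")  # parse only, do not run
    except IndentationError as err:
        return err.msg
    return None


good = "n = 3\nwhile n > 0:\n    n = n - 1\n"  # loop body indented by four spaces
bad = "n = 3\nwhile n > 0:\nn = n - 1\n"       # loop body not indented

print(indentation_error_message(good))  # None
print(indentation_error_message(bad))   # e.g. "expected an indented block ..."
```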

* Here is how to compute the GC content of a gene sequence (the percentage of G and C nucleotides) with correct indentation:

In [None]:
sequence = "ATGCTAGCTAGGCTA"

gc_count = 0
for nucleotide in sequence:
    if nucleotide == "G" or nucleotide == "C":
        gc_count += 1

gc_content = (gc_count / len(sequence)) * 100
print(f"GC Content: {gc_content}%")


## Differentiating comments / 區分註解

In [None]:
x = 3 # Assign 3 to x
x = 1 # Now x is 1
x = "# This is not a comment"
print(x)

## Variables and assignments / 變數與賦值

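* **In Python, everything is an object.**
* **Variables in Python: buckets or labels?**    
  A variable is a label attached to an object, not a bucket that holds a value. After `b = a` and `c = b`, the names `a`, `b` and `c` all refer to the same list `[1, 2, 3]`. Changing that list through `b` (`b[1] = 5`) is therefore visible through `a` and `c` too. Rebinding `b` to a new object (`b = 5`, in the second cell) leaves `a` and `c` unchanged.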

In [None]:
a = [1, 2, 3]
b = a
c = b
print(a, b, c)
b[1] = 5
print(a, b, c)

In [None]:
a = 1
b = a
c = b
print(a, b, c)
b = 5
print(a, b, c)
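The `is` operator checks whether two names label the *same* object, while `==` compares contents. The sketch below uses it to make the "labels" idea visible. `list(a)` is used here as one way to make an independent (shallow) copy.

```python
a = [1, 2, 3]
b = a          # b is a second label on the same list object
c = list(a)    # c is a new list (a shallow copy) with the same elements

print(a is b)  # True: same object
print(a is c)  # False: different objects
print(a == c)  # True: equal contents

b[1] = 5       # mutate the shared list through b
print(a)       # [1, 5, 3]
print(c)       # [1, 2, 3]: the copy is unaffected
```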

In [None]:
# This is a gene sequence
sequence = "ATGCTAGCTAGGCTA"

# Calculate the total length of the gene sequence
length = len(sequence)  # Length of the sequence

# Output the length of the gene sequence
print(f"Length of the sequence: {length}")

# Count of 'A' in the sequence
a_count = sequence.count('A')
print(f"Number of 'A' nucleotides: {a_count}")

## Delete statement / 刪除語句

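* The `del` statement removes a name. Using that name afterwards is an error: Python raises an exception (here a `NameError`) and prints a traceback.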

In [None]:
sequence = "ATGCTAGCTAGGCTA"
print(sequence)

del sequence  # Deleting the variable after use
print(sequence)  # This will cause an error as 'sequence' is deleted
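Instead of letting the error stop the program, you can catch it with `try`/`except`. `del` also works on items of mutable containers such as lists. A minimal sketch:

```python
sequence = "ATGCTAGCTAGGCTA"
del sequence  # remove the name "sequence"

try:
    print(sequence)
except NameError as err:
    error_message = str(err)
    print("Caught NameError:", error_message)

# del also removes items from mutable containers such as lists
nucleotides = ["A", "T", "G", "C"]
del nucleotides[0]
print(nucleotides)  # ['T', 'G', 'C']
```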

## Arithmetic Expressions / 算術表達式

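* **In Python 3, dividing with `/` always produces a floating-point result.**
* **To get an integer result without the fractional part, use the `//` (floor division) operator.** Note that `//` rounds toward negative infinity, not toward zero.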


In [None]:
x = 5
y = 7
z = (x + y) / 2
print(z)
a = y / 2
print(a)
b = y // 2
print(b)
c = -y // 2
print(c)
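The difference between flooring and truncating only shows up with negative numbers. The sketch below compares `//` with `int()` (which truncates toward zero) and shows the matching `%` and `divmod()` results.

```python
import math

y = 7
print(-y // 2)             # -4: floor division rounds toward negative infinity
print(int(-y / 2))         # -3: int() truncates toward zero
print(math.trunc(-y / 2))  # -3: same as int() for floats
print(-y % 2)              # 1: the remainder takes the sign of the divisor
print(divmod(-y, 2))       # (-4, 1), because -4 * 2 + 1 == -7
```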

In [None]:
x = 2 + 4 * 5 - 6 / 3
print(x)
x = 2 + (4 * 5) - (6 / 3)
print(x)

## Strings / 字串

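* **Strings can be delimited by single quotes (' '), double quotes (" "), triple single quotes (''' ''') or triple double quotes (""" """).**
* **A backslash (\\)** can be used to **escape characters that have a special meaning**.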


In [None]:
x = "\tThis line starts with a \"tab\"."
print(x)
x = "Just print a single backslash(\\)."
print(x)

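* **Single- and double-quoted strings are equivalent; the only difference is which quote character has to be escaped inside them.**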

In [None]:
x = "Hello, World"
print(x)
x = 'Hello, World'
print(x)
x = 'Can\'t print a single quote \' here without a backslash'
print(x)
x = "Can't print a double quote \" here without a backslash"
print(x)
x = 'Or just put a double quote " inside single quotes'
print(x)

* **A single- or double-quoted string cannot span two lines: putting a newline inside it is a `SyntaxError`.**

In [None]:
x = "This raises a SyntaxError because a newline
appears inside a double-quoted string"

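* **Triple-quoted strings can span several lines and can contain single and double quotes without backslashes.**
* **Pythonic style**: PEP 8 recommends limiting every line to a maximum of 79 characters.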

In [None]:
x = """Triple-quoted strings let you include single '
and double quotes "
without backslashes."""
print(x)
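The line-length recommendation raises a practical question: how do you write a long string that should *not* contain line breaks? The sketch below shows three common options. PEP 8 prefers implicit continuation inside parentheses over a trailing backslash.

```python
# Adjacent string literals inside parentheses are concatenated automatically
message = ("This sentence is written on two source lines "
           "but becomes one string.")

# A backslash at the end of a line continues the statement
message2 = "This also becomes " \
           "one string."

# \n embeds a newline inside an ordinary string
two_lines = "first line\nsecond line"

print(message)
print(message2)
print(two_lines)
print(two_lines.splitlines())  # ['first line', 'second line']
```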

## Numbers and None / 數字與 None

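* **Python has four kinds of numbers: integers, floating-point numbers, complex numbers and Booleans.**
* **`None` is used to represent the absence of a value.**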

In [None]:
None == False

In [None]:
None == 0
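Both comparisons above are `False`, yet `None` still behaves as false in an `if` statement. The sketch below shows the idiomatic `is None` test. It also checks the types of the four kinds of numbers, including the fact that `bool` is a subclass of `int`.

```python
value = None
print(value is None)   # True: the idiomatic way to test for None
print(value == False)  # False: None is not equal to False...
print(bool(value))     # False: ...but None is falsy in a boolean context

# The four kinds of numbers
print(type(3), type(3.0), type(3 + 4j), type(True))
print(True + True)     # 2: bool is a subclass of int
```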

## The input() function to receive user input / 使用 input() 函數接收用戶輸入

In [None]:
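sequence = input("Please enter a gene sequence: ")  # for example: ATGCTAGCTAGGCTA
sequence = sequence.strip().upper()  # input() always returns a str; normalize it
if sequence:
    gc_content = (sequence.count('G') + sequence.count('C')) / len(sequence) * 100
    print(f"GC content: {gc_content:.2f}%")
else:
    print("Empty sequence: cannot compute GC content")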

## Pythonic style / Python 的編程風格

* **Python Enhancement Proposal (PEP) 8: the preferred style conventions for Python code**
* **www.python.org/dev/peps/pep-0008/**

## Which of the following variable and function names follow the **Pythonic** style?
* LongVar
* long_var
* VERYVERYLONGVARNAME
* veryverylongvarname
* very_very_long_var_name

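## Brief summary

* **Indentation**: four spaces per level, no Tab key
* **In Python, everything is an object.**
* **Python has four kinds of numbers: integers, floating-point numbers, complex numbers and Booleans.**
* The **Pythonic** style is defined by Python Enhancement Proposal (PEP) 8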

# 2. Lists, tuples, and sets / 列表、元組和集合

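* **Besides strings, Python has two main sequence types:**
    * **Lists**: flexible and powerful
    * **Tuples**: similar to lists, but **cannot** be modified
* **Sets**: represent whether objects are members of a collection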

## Lists are like arrays / 列表類似於陣列

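* **A list is a comma-separated sequence of elements enclosed in square brackets.**
* **You don't need to declare a list's size in advance, and its elements don't have to be of the same type.**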

In [None]:
x = [1, 2, 3]
x = [2, "two", [1, 2, 3]]

### List length / 列表長度

In [None]:
len([0])

In [None]:
len([])

In [None]:
len([[2,1,[1,8],8],0])

### List index / 列表索引

In [None]:
x = ["Zero", "1st", "2nd", "3rd", "4th"]
print(x[0]) # indexing starts at 0
print(x[2])
print(x[-1]) # negative indices count from the end

### List slicing / 列表切片

In [None]:
x = ["Zero", "1st", "2nd", "3rd", "4th"]
print(x[1:-1])
print(x[0:4])
print(x[-3:-1])
print(x[-1:-3])
print(x[:4])
print(x[1:])

In [None]:
x = ["Zero", "1st", "2nd", "3rd", "4th"]
start = 1
length = 4
print(x[start:start+length])
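Slices also accept an optional third value, the step. Slicing past the end of a list is forgiving, but indexing past the end is not. A short sketch:

```python
x = ["Zero", "1st", "2nd", "3rd", "4th"]

print(x[::2])    # ['Zero', '2nd', '4th']: every second element
print(x[::-1])   # ['4th', '3rd', '2nd', '1st', 'Zero']: reversed copy
print(x[1:100])  # ['1st', '2nd', '3rd', '4th']: slice bounds are clipped

y = x[:]               # a full slice makes a shallow copy
print(y == x, y is x)  # True False

try:
    x[100]             # a single index out of range is an error
except IndexError as err:
    print("IndexError:", err)  # list index out of range
```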

### Methods in Lists / 列表方法

* `insert()`, `append()`, `extend()`, `remove()`

In [None]:
x = [2, 1, 1]
x.insert(3, 8)
print(x)
x.insert(3, 8)
print(x)
y = 0
x.append(y)
print(x)
z = [0]
x.extend(z)
print(x)
x.remove(0) # removes only the first occurrence of 0
print(x)
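A common point of confusion is `append` versus `extend` when the argument is itself a list. The sketch below contrasts them and also shows `pop()`, a list method not listed above.

```python
x = [2, 1, 1]
y = [2, 1, 1]

x.append([8, 8])  # append adds its argument as ONE element
y.extend([8, 8])  # extend adds each element of the iterable
print(x)          # [2, 1, 1, [8, 8]]
print(y)          # [2, 1, 1, 8, 8]

x.insert(0, 5)    # insert(index, value) places value before position index
print(x)          # [5, 2, 1, 1, [8, 8]]

last = y.pop()    # pop() removes and returns the last element
print(last, y)    # 8 [2, 1, 1, 8]
```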

* sort()

In [None]:
x = [2, 1, 1, 8, 8, 0, 0]
x.sort()
print(x)
x.sort(reverse=True)
print(x)

In [None]:
x = ["hello", "world", "!"]
x.sort()
print(x)

In [None]:
x = [1, 2, 'hello', 3]
x.sort()
print(x)
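The cell above raises a `TypeError` because Python 3 does not define an ordering between `int` and `str`. One possible workaround, sketched below, is to pass a `key` function so that every element is compared by its string form. A failed `sort()` may leave the list partially reordered, but the final keyed sort does not depend on that.

```python
mixed = [1, 2, 'hello', 3]

try:
    mixed.sort()  # Python 3 cannot order int and str values
except TypeError as err:
    print("TypeError:", err)

mixed.sort(key=str)  # compare the string forms '1', '2', 'hello', '3' instead
print(mixed)         # [1, 2, 3, 'hello']
```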

### Customized function for sorting / 自訂排序函數

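* Function names use **all lowercase letters with underscores** for readability (PEP 8).
* When a function is passed as `key`, `sort()` calls it on each element and orders the list by the returned values (here, the number of characters).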

In [None]:
def compare_num_of_chars(string1):
    return len(string1)

In [None]:
word_list = ["National", "Tsing", "Hua", "University"]
word_list.sort()
print(word_list)

In [None]:
word_list = ["National", "Tsing", "Hua", "University"]
word_list.sort(key=compare_num_of_chars)
print(word_list)

* The built-in function `sorted()` **returns a new sorted list and leaves the original list unchanged.**

In [None]:
x = ["National", "Tsing", "Hua", "University"]
y = sorted(x)
print(x)
print(y)
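`sorted()` accepts the same `key` and `reverse` arguments as `list.sort()`. The sketch below also shows why `x = x.sort()` is a bug, and how `key=str.lower` gives a case-insensitive order.

```python
words = ["National", "Tsing", "Hua", "University"]

by_length = sorted(words, key=len, reverse=True)
print(by_length)       # ['University', 'National', 'Tsing', 'Hua']

result = words.sort()  # list.sort() sorts in place...
print(result)          # None: ...and returns None, so don't write x = x.sort()
print(words)           # ['Hua', 'National', 'Tsing', 'University']

fruits = ["banana", "apple", "Cherry"]
print(sorted(fruits))                 # ['Cherry', 'apple', 'banana']: uppercase sorts first
print(sorted(fruits, key=str.lower))  # ['apple', 'banana', 'Cherry']
```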

### More list operations / 更多列表操作

* Use the `in` operator (or `not in`) to test list membership
* [https://docs.python.org/3/tutorial/datastructures.html#more-on-lists](https://docs.python.org/3/tutorial/datastructures.html#more-on-lists)

In [None]:
2 in [2, 1, 1, 8, 8, 0, 0]

In [None]:
2 not in [2, 1, 1, 8, 8, 0, 0]

* Use `min()` and `max()` to return the smallest and largest element, respectively

In [None]:
min([2, 1, 1, 8, 8, 0, 0])

In [None]:
max([2, "Hello", [1, 1]]) # TypeError: Python 3 cannot compare int, str and list values

* Use the `+` operator to concatenate lists

In [None]:
x = [2, 1, 1] + [8, 8, 0, 0]
print(x)

* Use the `*` operator to initialize a list by repetition

In [None]:
x = [None] * 2
print(x)

In [None]:
x = [2, 1, 1] * 2
print(x)
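Repetition with `*` copies *references*, not objects. This matters when the repeated element is itself a list. The sketch below shows the pitfall and one common fix, a list comprehension (a construct not yet introduced in this notebook).

```python
grid = [[0] * 3] * 2   # repeats a reference to the SAME inner list
grid[0][0] = 1
print(grid)            # [[1, 0, 0], [1, 0, 0]]: both rows changed

safe_grid = [[0] * 3 for _ in range(2)]  # builds a new inner list per row
safe_grid[0][0] = 1
print(safe_grid)       # [[1, 0, 0], [0, 0, 0]]
```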

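* Use `index` to search a list: it returns the position of the first match and raises `ValueError` if the value is not found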

In [None]:
x = [2, 1, 1, 8, 8, 0, 0]
x.index(8)

In [None]:
x.index(5)

* Use `count` to count how many elements match a value

In [None]:
x = [2, 1, 1, 8, 8, 0, 0]
x.count(2)

In [None]:
x.count(5)

* Example: using lists to handle gene sequences:

In [None]:
# List of gene sequences from different variants
sequences = ["ATGCTAGC", "CGTACGTC", "GCTAGCTA"]

# Accessing elements in the list (first sequence)
print(f"First sequence: {sequences[0]}")

# Adding a new sequence to the list
sequences.append("TTGACCTA")
print(f"Updated sequences: {sequences}")

# Lists can contain sequences of different lengths and types
sequences = ["ATGCTAGC", 12345, ["CGTACGTC", "GCTAGCTA"]]
print(f"Complex list: {sequences}")

## Tuple / 元組

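* **Similar to lists**, but **cannot be modified**.
    * A list is a sequence enclosed in `[ ]`.
    * A tuple is a sequence enclosed in `( )`.
* **Why have both?**
    * **Tuples fill a role that lists cannot: because they are immutable, tuples of immutable values can be used as keys for dictionaries.**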
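Two details of tuple syntax are easy to miss. First, it is the comma, not the parentheses, that creates a tuple, so a one-element tuple needs a trailing comma. Second, a tuple can serve as a dictionary key. The key/value pair below is made up for illustration.

```python
empty = ()
single = (1,)        # the trailing comma makes a one-element tuple
not_a_tuple = (1)    # just the integer 1 inside parentheses
packed = 1, 2, 3     # commas alone create a tuple

print(type(empty), type(single), type(not_a_tuple), type(packed))
print(len(single))   # 1

locations = {(0, 0): "origin"}  # a tuple used as a dictionary key
print(locations[(0, 0)])        # origin
```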

* Using a tuple is very similar to using a list.

In [None]:
x = ('zero', '1st', '2nd', '3rd', '4th')
print(x[2])
print(x[1:])
print(len(x))
print(max(x))
print(min(x))

In [None]:
'1st' in x

In [None]:
'1st' not in x

* The main difference between tuples and lists is that **tuples are immutable**.

In [None]:
x[2] = 'two'

In [None]:
# Create tuples from existing ones by using the + and * operators
print(x + x)
print(2 * x)

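* Packing and unpacking tuples: each variable on the left of the assignment operator receives the corresponding value from the right-hand side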

In [None]:
(one, two, three, four, five) = (1, 2, 3, 4, 5)
print(one)
print(four)

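* In Python, **a single line** can replace several assignments, for example, **var1, var2, var3, var4, var5 = value1, value2, value3, value4, value5**
* The idea is to **pack the right-hand side into a tuple, e.g. (1, 2, 3, 4, 5), then unpack it and assign each value in turn.**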

In [None]:
# unpacking works not only with tuples but with any sequence (e.g., a list or a string)
var1, var2, var3, var4, var5 = [1, 2, 3, 4, 5] # list
print(var5)
var1, var2, var3, var4, var5 = 'fifth' # string
print(var5)

In [None]:
var1, var2, var3 = 1, 2, 3, 4 # ValueError: too many values to unpack

* **Extended unpacking** lets one target marked with `*` absorb, as a list, any number of elements that are not matched by the other targets

In [None]:
a, b, *c = (1, 2, 3, 4)
print(a, b, c)
a, *b, c = (1, 2, 3, 4)
print(a, b, c)
a, b, *_ = (1, 2, 3, 4, 5, 6) # *_ collects the remaining values into the list _
print(a, b, *_)

### Converting between lists and tuples / 轉換列表和元組之間

In [None]:
x = (1, 2, 3)
print(list(x))
print(tuple([1, 2, 3]))

In [None]:
print(list("NTHU"))

### Exercise

* Try to explain why the following operations are not allowed on the tuple `x = (1, 2, 3)`.

In [None]:
x = (1, 2, 3)
x.append(4)

In [None]:
x[0] = "hello"

In [None]:
del x[1]

* Can we use `sorted()` on the tuple `x = (2, 1, 1, 8, 8, 0, 0)`?

In [None]:
x = (2, 1, 1, 8, 8, 0, 0)
x = sorted(x) # sorted() accepts any iterable and always returns a new list
print(x)

### Pythonic vs. Unpythonic

* Example 1. (Not good)

In [None]:
tmp = a
a = b
b = tmp
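Example 1 shows only the unpythonic version. The Pythonic swap uses tuple packing and unpacking in one line, with no temporary variable:

```python
a, b = 1, 2
a, b = b, a   # the right side is packed into the tuple (2, 1), then unpacked
print(a, b)   # 2 1
```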

* Example 2. (The first version, which indexes into the returned tuple, is not good)

In [None]:
def get_user_info(user_id):  # "id" would shadow the built-in id() function
    name = "jack"
    age = 5
    email = "jack@nthu"
    return name, age, email
info = get_user_info(1)
print("name", info[0])
print("age", info[1])
print("email", info[2])

In [None]:
def get_user_info(user_id):
    name = "jack"
    age = 5
    email = "jack@nthu"
    return name, age, email
name, age, email = get_user_info(1)
print("name", name)
print("age", age)
print("email", email)

* Example 3. (Not good)

In [None]:
if b > 10 and b <= a and a <= 20:
    pass

In [None]:
if 10 < b <= a <= 20:
    pass

* However, shorter is not always better. For example:

In [None]:
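import math
data = [2, 4, 4, 4, 5, 5, 7, 9] # example data
# Standard deviation in one hard-to-read line
math.sqrt(sum(pow(x - sum(data) / len(data), 2) for x in data) / len(data))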

In [None]:
# This version is clearer
import math
data = [2, 4, 4, 4, 5, 5, 7, 9] # example data
mean = sum(data) / len(data)
variance = sum(pow(x - mean, 2) for x in data) / len(data)
std = math.sqrt(variance)
print(std)
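The readable version can be checked against the standard library. `statistics.pstdev` computes the population standard deviation, which is the same formula (dividing by `len(data)`). The sample data below is an assumed example.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # example data

mean = sum(data) / len(data)                               # 5.0
variance = sum((x - mean) ** 2 for x in data) / len(data)  # 4.0
std = math.sqrt(variance)
print(std)                      # 2.0
print(statistics.pstdev(data))  # 2.0: the standard library's population std
```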

* Example: using a tuple to store gene sequence data that should not change:

In [None]:
# Tuple of conserved regions in the genome
conserved_regions = ("ATGCTAGC", "CGTACGTC", "GCTAGCTA")

# Accessing elements in the tuple (first conserved region)
print(f"First conserved region: {conserved_regions[0]}")

# Tuples cannot be modified, so attempting to change an element will raise an error
conserved_regions[0] = "TTGACCTA"  # This would raise an error

## Sets

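* In Python, a set is **an unordered collection of unique objects**, mainly used for membership testing.
* **Like dictionary keys, set items must be immutable and hashable.** This means **integers, floats, strings and tuples** (of hashable values) can be members of a set, but **lists, dictionaries and sets** themselves cannot.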

### Add and remove element(s) in a set

In [None]:
x = set([2, 1, 1, 8, 8, 0])
print(x)
x.add(0)
print(x)
x.remove(2)
print(x)

In [None]:
print(1 in x)
print(4 in x)

* More operations
    * `|` : the union (combination) of two sets
    * `&` : the intersection
    * `^` : the symmetric difference (**elements in one set or the other, but not in both**)

In [None]:
x = set([2, 1, 1, 8])
y = set([8, 0, 0])
print(x | y)
print(x & y)
print(x ^ y)
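A few more set operations not listed above: difference (`-`), subset testing (`<=`) and `isdisjoint()`. The sketch also shows the difference between `discard()` and `remove()` for a missing element.

```python
x = {2, 1, 8}
y = {8, 0}

print(x - y == {1, 2})       # True: difference, the elements of x that are not in y
print({1, 2} <= x)           # True: {1, 2} is a subset of x
print(x.isdisjoint({5, 6}))  # True: no common elements

x.discard(99)                # discard() silently ignores a missing element...
try:
    x.remove(99)             # ...but remove() raises KeyError
except KeyError as err:
    print("KeyError:", err)
```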

### Frozensets / 凍結集合

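* Because **sets are mutable and therefore not hashable**, they **cannot be members of other sets**.
* To solve this, Python has another set type, **frozenset**, which behaves like a set but cannot be modified after it is created, so it is hashable.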

In [None]:
x = set([2, 1, 1, 8, 8, 0, 0])
z = frozenset(x)
print(z)

In [None]:
z.add(6)

In [None]:
x.add(z)
print(x)
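Because frozensets are hashable, they can go where plain sets cannot: inside other sets and as dictionary keys. In the sketch below, the `pair_scores` dictionary is a hypothetical example, not data from the source.

```python
inner = {1, 2}
try:
    outer = {inner}  # a plain set is unhashable
except TypeError as err:
    print("TypeError:", err)

outer = {frozenset(inner)}
print(frozenset({2, 1}) in outer)  # True: equal frozensets are interchangeable

pair_scores = {frozenset({"A", "T"}): 2}   # hypothetical: frozensets as dict keys
print(pair_scores[frozenset({"T", "A"})])  # 2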

* Sets are useful for storing unique elements, for example the distinct mutations found in a viral genome.
* Using sets to remove duplicate mutations: / 使用集合去除重複突變的範例：

In [None]:
# Set of mutations in a SARS-CoV-2 sequence (duplicates are automatically removed)
mutations = {"A123T", "G456C", "T789G", "A123T"}  # "A123T" appears twice but will be stored only once

print(f"Unique mutations: {mutations}")

# Adding a new mutation
mutations.add("C999A")
print(f"Updated mutations: {mutations}")

# Checking if a mutation is present
print("A123T" in mutations)  # True

## Summary

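* **Lists and tuples** are structures that embody the idea of a **sequence of elements**, as are **strings**.
* **Lists are like arrays in other languages**, **but with automatic resizing, slice notation and many convenience functions**.
* **Tuples** are **like lists but cannot be modified**, so **they use less memory and can be used as dictionary keys**.
* **Sets are iterable collections**, but they are **unordered and cannot contain duplicate elements**.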

# 3. String. Outline / 字串

1. Strings as sequences of characters / 字串作為字符序列
2. Basic string operations / 基本字串操作
3. Special characters and escape sequences / 特殊字符和轉義序列
4. String methods / 字串方法
5. Converting from objects to strings / 將物件轉換為字串

## Strings as sequences of characters

* Use index or slice notation: / 使用索引或切片表示法：

In [None]:
x = "Hello"
print(x[0])
print(x[-1])
print(x[1:])

In [None]:
x = "Hello\tWorld"
x = x[:-1]
print(x)

In [None]:
# A gene sequence as a string
sequence = "ATGCTAGCTAGGCTA"

# Access individual nucleotides by index
first_nucleotide = sequence[0]
last_nucleotide = sequence[-1]

print(f"First nucleotide: {first_nucleotide}")
print(f"Last nucleotide: {last_nucleotide}")
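Because a string is a sequence, the other sequence tools also work on it: `len()`, slices with a step, `in`, and `for` loops. A short sketch on the same gene sequence:

```python
sequence = "ATGCTAGCTAGGCTA"

print(len(sequence))      # 15
print(sequence[::-1])     # ATCGGATCGATCGTA: a step of -1 reverses the string
print("TAG" in sequence)  # True: substring membership test

for index, nucleotide in enumerate(sequence[:3]):
    print(index, nucleotide)  # 0 A, then 1 T, then 2 G
```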

## Basic string operations / 基本字串操作

* Combine strings
    * the string concatenation operator `+`
    * the string repetition operator `*`

In [None]:
x = "Hello\t" + "World"
print(x)
print(8 * x)

* You can concatenate, repeat or slice strings; for example, to join gene sequences or to extract a subsequence.

In [None]:
# Concatenate two gene sequences
sequence1 = "ATGCTAGC"
sequence2 = "GCTAGCTA"
combined_sequence = sequence1 + sequence2
print(f"Combined sequence: {combined_sequence}")

# Repeat a sequence multiple times
repeated_sequence = sequence1 * 3
print(f"Repeated sequence: {repeated_sequence}")

# Slice a portion of the gene sequence
sub_sequence = sequence1[2:6]
print(f"Sliced sequence: {sub_sequence}")

## Special characters and escape sequences / 特殊字符和轉義序列

* Two-character escape sequences can be used inside strings (see the [Python documentation](https://docs.python.org/3/reference/lexical_analysis.html)).
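Here are a few of the most common escape sequences, checked by length: each two-character escape stands for a single character. The raw-string prefix `r` is an addition not covered in the text. It turns off escape processing, which is handy for Windows paths and regular expressions.

```python
examples = {
    "newline": "a\nb",
    "tab": "a\tb",
    "backslash": "a\\b",
    "single quote": 'a\'b',
    "double quote": "a\"b",
}
for name, text in examples.items():
    # !r shows the repr, where the escape sequence is visible again
    print(f"{name:>12}: {text!r} has length {len(text)}")

raw = r"C:\new\table"  # raw string: backslashes are kept literally
print(len(raw), raw)   # 12 C:\new\table
```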


### Printing vs. evaluating strings with special characters

* What's the difference?
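    * Evaluating an expression interactively shows its representation (`repr`), with escape sequences such as `\t` visible
    * Printing the expression with `print()` shows the string itself, with the escape sequences interpreted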

In [None]:
x = 'Hello\tWorld'
print("using print():" + x)
x # only the last expression in a cell is displayed: 'Hello\tWorld'

In [None]:
print("using print():Hello\tWorld")
print("using print():Hello\tWorld", end="*")

## String methods / 字串方法

* Built-in string methods
    * Usage: `string.method()`

### Split and join string methods / 切割和連接字串方法

* `join` concatenates a list of strings, inserting the string it is called on between the elements

In [None]:
print(" ".join(["join", "with", "spaces"]))

In [None]:
print("::".join(["join", "with", "colons"]))

In [None]:
print("".join(["join", "with", "nothing"]))

* `split` function
    * **By default, `split` splits on any run of whitespace**, not just on single space characters.

In [None]:
x = "You\t\t can have tabs\t\n \t and newlines \n\n mixed in"
print(x.split())
print(x.split("newlines"))

* Use the optional second argument (`maxsplit`) to specify how many splits `split` should perform

In [None]:
x = 'a b c d'
print(x.split(' ', 1))
print(x.split(' ', 2)) # split string twice
print(x.split(' ', 3))
print(x.split(' ', 4))

* String methods are very useful for analyzing and processing gene sequences. Common ones include `count()`, `replace()`, `find()` and `split()`.

In [None]:
sequence = "ATGCTAGCTAGGCTA"

# Count the occurrences of a specific nucleotide (e.g., A)
a_count = sequence.count("A")
print(f"Number of 'A' nucleotides: {a_count}")

# Replace all occurrences of one nucleotide with another (mutation simulation)
mutated_sequence = sequence.replace("A", "T")
print(f"Mutated sequence: {mutated_sequence}")

# Find the position of a specific subsequence
position = sequence.find("GCT")
print(f"Position of 'GCT' in the sequence: {position}")
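The example above does not use `split()`. The sketch below applies it with `maxsplit=1` to a FASTA-style header line. The header is made up for illustration, and the assumption is that the ID ends at the first space. It also uses `join()` to rebuild a string.

```python
# A made-up FASTA-style header line, for illustration only
header = ">seq1 Spike gene partial"

seq_id, description = header[1:].split(" ", 1)  # drop ">", split at the first space only
print(seq_id)       # seq1
print(description)  # Spike gene partial

codons = "ATG CTA GCT".split()  # default: split on runs of whitespace
print(codons)                   # ['ATG', 'CTA', 'GCT']
print("-".join(codons))         # ATG-CTA-GCT
```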

### Converting strings to numbers

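* Use the `int` and `float` functions to convert strings to integers and floating-point numbers, respectively
* `int` accepts an optional second argument: the base in which to interpret the input string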

In [None]:
print(float('123.456'))

In [None]:
print(float('xxyy'))

In [None]:
int('3333')

In [None]:
int('123.456')

In [None]:
print(int('10000', 8)) # Interprets 10000 as octal number
print(int('101', 2))
print(int('ff', 16))

In [None]:
# you will never get the digit "6" in a base 6 number.
print(int('123456', 6))
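Several cells above end in `ValueError`. The sketch below catches those errors inside a helper that tries `int` first, then `float`. The helper `to_number` is hypothetical, and returning `None` on failure is just one possible design choice. The sketch also shows base `0`, which makes `int` infer the base from a prefix such as `0x`.

```python
def to_number(text):
    """Return text as an int if possible, else as a float, else None (hypothetical helper)."""
    try:
        return int(text)
    except ValueError:
        pass
    try:
        return float(text)
    except ValueError:
        return None


print(to_number("3333"), to_number("123.456"), to_number("xxyy"))  # 3333 123.456 None
print(int(float("123.456")))  # 123: go through float, then truncate
print(int("0x1f", 0))         # 31: base 0 infers the base from the 0x prefix
```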

### Removing extra whitespace

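* The `strip`, `lstrip` and `rstrip` methods
    * return a new string that is identical to the original, except that whitespace has been removed from both ends (`strip`), from the beginning (`lstrip`) or from the end (`rstrip`)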

In [None]:
x = "\t\t  Hello, World\t\t  "
print(x)
print(x.strip())
print(x.lstrip())
print(x.rstrip())

* Find out which characters Python considers to be whitespace

In [None]:
import string
string.whitespace

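* Pass `strip` a string of characters to remove any combination of those characters from both ends of a string; characters in the middle are kept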

In [None]:
x = "www.python.org"
print(x.strip("w")) # Strips off leading and trailing ws: .python.org
print(x.strip("gor")) # Strips off trailing gs, os and rs: www.python.
print(x.strip(".gorw")) # Strips off dots, gs, os, rs and ws at both ends: python
print(x.strip(".gorwh")) # Same result: "h" never reaches either end, so it changes nothing

In [None]:
x = "(name, date),\n" # be careful: the string ends with "\n"
print(x.strip("),\n"))
print(x.strip("\n)(,"))

In [None]:
state = "Mississippi"
print(state.strip("i"))
print(state.strip("M"))
print(state.strip("s"))
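Because `strip` treats its argument as a *set of characters*, it can remove more than intended. To remove one exact prefix or suffix, Python 3.9 and later provide `removeprefix()` and `removesuffix()`. A sketch contrasting the two approaches:

```python
url = "www.wikipedia.org"
print(url.strip("w."))           # ikipedia.org: strip also ate the "w" of "wikipedia"
print(url.removeprefix("www."))  # wikipedia.org: removes the exact prefix once

state = "Mississippi"
print(state.strip("ip"))          # Mississ: any 'i' or 'p' at either end is removed
print(state.removesuffix("ppi"))  # Mississi: only the exact suffix is removed
```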

### Searching strings

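* There are four similar basic string-searching methods: **`find`, `rfind`, `index` and `rindex`**. The `in` operator only tells you whether a substring occurs.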

In [None]:
x = "Hello world"
"Hello" in x

In [None]:
x.find("Hello")

In [None]:
x.find("world")

* Optional arguments / 可選參數
    * `start`: ignore all characters before position `start`
    * `end`: ignore all characters at or after position `end`

In [None]:
x = "Mississippi"
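print(x.find("ss", 4)) # search starts at index 4: returns 5
print(x.find("ss", 0, 4)) # search only x[0:4] ("Miss"): returns 2
print(x.find("ss", 0, 3)) # search only x[0:3] ("Mis"): no hit, returns -1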

* **`rfind`**
    * searches from the end of the string, i.e. returns the index of the last occurrence

In [None]:
x = "Mississippi"
print(x.rfind("ss"))
print(x.rfind("p"))
print(x.rfind("pp"))

* **`index` and `rindex`** behave like **`find` and `rfind`**, respectively
* **The only difference:**
    * **If `index` or `rindex` cannot find the substring**, **it does not return -1** but raises a **`ValueError` exception**.

In [None]:
x = "Mississippi"
x.index("Hello")

In [None]:
x = "Mississippi"
x.find("Hello")
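`find` returns only one position, but its `start` argument lets you collect every occurrence by restarting the search just after the previous match. The helper `find_all` below is a hypothetical sketch, not a built-in string method. Restarting at `start + 1` means overlapping matches are counted.

```python
def find_all(text, pattern):
    """Return every start index of pattern in text (overlapping matches included)."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)  # continue just after the last hit
    return positions


print(find_all("Mississippi", "ss"))        # [2, 5]
print(find_all("Mississippi", "issi"))      # [1, 4]: overlapping matches
print(find_all("ATGCTAGCTAGGCTA", "GCT"))   # [2, 6, 11]
```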

### startswith() and endswith()

In [None]:
x = "Mississippi"
print(x.startswith("Miss"))
print(x.startswith("Mist"))
print(x.endswith("pi"))
print(x.endswith("p"))
print(x.endswith(("i", "u")))

### Modifying "immutable" strings

* Strings cannot be changed in place; string methods operate on a string and return a new string

In [None]:
x = "Mississippi"
print(x.replace("ss", "++"))
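The sketch below confirms that `replace` leaves the original untouched and shows its optional `count` argument. It also shows one common way to make several character-level edits: convert to a list, edit, then `join` back into a new string.

```python
x = "Mississippi"
y = x.replace("ss", "++")
print(x)                         # Mississippi: the original is unchanged
print(y)                         # Mi++i++ippi
print(x.replace("ss", "++", 1))  # Mi++issippi: the optional count limits replacements

chars = list(x)        # for several edits, work on a mutable list of characters...
chars[0] = "m"
chars[-1] = "I"
print("".join(chars))  # mississippI: ...then join it back into a new string
```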

### Other useful methods

In [None]:
x = "123"
print(x.isdigit())
print(x.isalpha())
x = "M"
print(x.islower())
print(x.isupper())
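The summary below mentions changing case, which these cells do not demonstrate. A few standard case methods are shown here. Normalizing case before counting avoids missing lowercase nucleotides.

```python
sequence = "atgCTAgc"
print(sequence.upper())             # ATGCTAGC
print(sequence.lower())             # atgctagc
print("hello world".title())        # Hello World
print("Hello".swapcase())           # hELLO
print(sequence.upper().count("A"))  # 2: normalize case before counting
```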

## Converting objects to strings

* `repr` function
    * Almost any object can be converted to a string representation
    * `repr` returns a string that describes the object

In [None]:
x = [1]
x.append(2)
x.append([3, 4])
print('the list x is ' + repr(x))

In [None]:
repr(len)

### Useful for debugging: `str` and `repr`

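* **The `str` function** returns a readable string meant for people
* **The `repr` function** returns the formal string representation of a Python object, which is closer to what you would write as Python code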

In [None]:
from datetime import datetime
now = datetime.now()
print(str(now))
print(repr(now))
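For strings, the difference is easy to see: `repr` adds quotes and shows escape sequences. Your own classes can define both forms through the special methods `__str__` and `__repr__`. The `Mutation` class below is a hypothetical example written only to illustrate this.

```python
text = "Hello\tWorld"
print(str(text))   # the tab is rendered as whitespace
print(repr(text))  # 'Hello\tWorld': quotes and the escape are shown


class Mutation:
    """A hypothetical class used only to illustrate __str__ and __repr__."""

    def __init__(self, position, base):
        self.position = position
        self.base = base

    def __repr__(self):  # unambiguous, code-like form for debugging
        return f"Mutation(position={self.position!r}, base={self.base!r})"

    def __str__(self):   # friendly form for users
        return f"{self.base} at {self.position}"


m = Mutation(123, "T")
print(str(m))   # T at 123
print(repr(m))  # Mutation(position=123, base='T')
```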

## Quick practice
* Q1. Remove the double quotes from the strings in x

In [None]:
x = ['"abc"', 'def', '"ghi"', '"klm"', 'nop']

In [None]:
# Answer:

* Q2. Find the last "p" in "Mississippi", print its position, and remove that "p"

In [None]:
state = "Mississippi"

In [None]:
# Answer:

## Summary

* Strings are **immutable**.
* Powerful text-processing features: **searching and replacing, trimming characters, and changing case**.

# 4. Dictionaries. Outline

1. What's a dictionary / 什麼是字典
2. Using dictionary operations / 使用字典操作
3. Determining what can be used as a key / 確定什麼可以用作鍵
4. Sparse matrices / 稀疏矩陣

## What's a dictionary? / 什麼是字典？

* Also called associative arrays or hash tables: **a way to map a set of objects (keys) to associated values**
    * Syntax: `{key1: value1, key2: value2}`

In [None]:
ages = {"Tim":36, "Tony":32, "Manu":36} # keys must be unique; values may repeat
print(ages["Tim"]) # get value

In [None]:
# add new key
ages["Pop"] = 70
print(ages)

In [None]:
# Dictionary mapping gene names to their sequences
genes = {
    "Spike": "ATGCTAGCTAGGCTA",
    "Envelope": "CGTACGTCGATCGTA",
    "Membrane": "GCTAGCTAGGCTAAT"
}

# Access the sequence of the Spike gene
print(f"Spike gene sequence: {genes['Spike']}")
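Three behaviours of keys are worth checking directly. If a key is repeated in a literal, the last value wins. Looking up a missing key with `[]` raises `KeyError`. Since Python 3.7, dictionaries keep insertion order. A short sketch:

```python
ages = {"Tim": 36, "Tony": 32, "Tim": 40}  # a repeated key: the last value wins
print(ages)  # {'Tim': 40, 'Tony': 32}

try:
    ages["Manu"]  # looking up a missing key with [] raises KeyError
except KeyError as err:
    print("KeyError:", err)

ages["Tim"] = 37   # assigning to an existing key replaces its value
print(list(ages))  # ['Tim', 'Tony']: dicts keep insertion order (Python 3.7+)
```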

### Dictionary vs. List

* **Both can store objects of any type.**
* **Values in a list** are accessed by integer index, that is, by position.
* **Values in a dictionary** are accessed by key.

In [None]:
# An empty dictionary is created much like an empty list,
# but with curly braces instead of square brackets.
x = [] # I'm an empty list
y = {} # I'm an empty dictionary

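* **Once created, a dictionary is used much like a list**, except that values are stored and looked up by key instead of by position.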

In [None]:
y["two"] = 2
y["pi"] = 3.14
y["two"] * y["pi"]

## Useful dictionary operations

In [None]:
english_to_french = {}
english_to_french['red'] = 'rouge'
english_to_french['blue'] = 'bleu'
english_to_french['green'] = 'vert'
print("red is", english_to_french['red'])

In [None]:
# the number of key-value pairs in the dictionary
len(english_to_french)

In [None]:
# list of the keys
list(english_to_french.keys())

In [None]:
# list of the values
list(english_to_french.values())

In [None]:
# list of the items
list(english_to_french.items())

* **The `del` statement removes an entry (a key-value pair) from a dictionary.**

In [None]:
print(english_to_french)
del english_to_french['blue']
print(english_to_french)

* Test for a key with `in`, just as with sequences (`in` checks keys, not values)

In [None]:
'red' in english_to_french

In [None]:
'orange' in english_to_french

* **The `get` method**
    * **returns the value associated with a key** if the key is found
    * **otherwise returns the optional second argument, or `None` if it is omitted**

In [None]:
print(english_to_french.get('red', 'No translation')) # 'red' is a key (blue was deleted above): prints rouge

In [None]:
print(english_to_french.get('chartreuse', 'No translation'))

In [None]:
print(english_to_french.get('blue'))

* The `copy` method returns a (shallow) copy of a dictionary

In [None]:
x = {0: 'zero', 1: 'one'}
y = x.copy()
print(y)

* **Update method**
    * **Updates the first dictionary with all the key-value pairs of a second dictionary** (values of existing keys are overwritten).

In [None]:
z = {1: 'One', 2: 'Two'}
x = {0: 'zero', 1: '1'}
x.update(z)
print(x)
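`copy()` is *shallow*: the new dictionary shares any nested mutable values with the original. `copy.deepcopy` copies those too. Since Python 3.9, the `|` operator merges two dictionaries into a new one without modifying either. The mutation strings below are reused from the set example only as sample data.

```python
import copy

original = {"Spike": ["A123T"], "Envelope": []}
shallow = original.copy()       # new dict, but the inner lists are shared
deep = copy.deepcopy(original)  # new dict AND new inner lists

original["Spike"].append("G456C")  # mutate a nested list
print(shallow["Spike"])  # ['A123T', 'G456C']
print(deep["Spike"])     # ['A123T']

x = {0: 'zero', 1: '1'}
z = {1: 'One', 2: 'Two'}
print(x | z)  # {0: 'zero', 1: 'One', 2: 'Two'}: a new merged dict (Python 3.9+)
print(x)      # {0: 'zero', 1: '1'}: unlike update(), x itself is unchanged
```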

### Word counting

In [None]:
# count how many times each word occurs (case-sensitive: "To" and "to" differ)
sample_string = "To be or not to be"
occurrences = {}
for word in sample_string.split():
    occurrences[word] = occurrences.get(word, 0) + 1
for word in occurrences:
    print("The word", word, "occurs", occurrences[word], "times in the string")
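The standard library's `collections.Counter` is a dictionary subclass that does this counting for you. The sketch below also lowercases the text first, which is an added choice, so that "To" and "to" are counted together.

```python
from collections import Counter

sample_string = "To be or not to be"
counts = Counter(sample_string.lower().split())  # lower() so "To" and "to" match
print(counts["to"], counts["be"], counts["or"])  # 2 2 1
print(counts.most_common(2))                     # [('to', 2), ('be', 2)]
print(counts["question"])                        # 0: missing words count as zero
```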

## What can be used as a key?

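* **Any object that is immutable and hashable**
    * **Numbers are immutable**, so they can be keys.
    * **Lists are mutable** (elements can be added, changed or removed), so they cannot be keys.
    * **Tuples are immutable, but a tuple that contains mutable values is not hashable** and cannot be a key.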

### Quick practice: WHAT CAN BE A KEY?

* 1
* 'hello'
* ('hello', \[1, 2, 3\])
* \["hello-world"\]
* "hello"
* ("hello", "world")    

* Answer:
    * -
    * -
    * -
    * -
    * -
    * -

## Sparse matrices / 稀疏矩陣
* **Matrix**: a two-dimensional grid of numbers, written here as a list of lists in square brackets

In [None]:
matrix = [[2, 0, -1, 1], [0, 8, 0, 0], [0, 8, 0, 0], [0, 0, 0, 0]]
print(matrix)

* Elements of the matrix are accessed by row and column number:

In [None]:
matrix[0][0]

* Represent a sparse matrix as a dictionary indexed by `(row, column)` tuples that stores only the nonzero entries

In [None]:
matrix = {(0, 0): 2, (0, 2): -1, (0, 3): 1, (1, 1): 8, (2, 1): 8}

In [None]:
rownum = 0
colnum = 2
print(matrix[(rownum, colnum)])
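With this representation, `matrix[(1, 0)]` would raise `KeyError`, because zero entries are simply not stored. Using `get` with a default of `0` handles them. The sketch also rebuilds the dense list-of-lists form, taking the 4 x 4 size from the dense matrix defined earlier.

```python
matrix = {(0, 0): 2, (0, 2): -1, (0, 3): 1, (1, 1): 8, (2, 1): 8}

print(matrix.get((0, 1), 0))  # 0: a missing entry is treated as zero
print(matrix.get((1, 1), 0))  # 8

# Rebuild the dense 4 x 4 list-of-lists form from the nonzero entries
dense = [[matrix.get((row, col), 0) for col in range(4)] for row in range(4)]
print(dense)  # [[2, 0, -1, 1], [0, 8, 0, 0], [0, 8, 0, 0], [0, 0, 0, 0]]
```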

In [None]:
# Representing a sparse matrix of gene expression levels
# Keys are (gene, condition), values are expression levels
expression_matrix = {
    ("Gene1", "ConditionA"): 2.5,
    ("Gene1", "ConditionB"): 0.0,
    ("Gene2", "ConditionA"): 0.8,
    ("Gene2", "ConditionB"): 1.5
}

# Accessing a specific expression value
print(f"Gene1 expression in ConditionA: {expression_matrix[('Gene1', 'ConditionA')]}")

## Brief summary
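* **Dictionary keys must be immutable (more precisely, hashable).**
* **Using keys to access a collection of data (such as a matrix) can be simpler, with less code, than many other solutions.**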