# 5. Dictionaries and Sets
## 1. Dictionaries
### 1.1 Dictionary Basics
Dictionary: a data structure made up of key-value pairs.

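- Each `key: value` pair uses a colon `:` between key and value, pairs are separated by commas `,`, and the whole dictionary is enclosed in curly braces `{}`.
- Create an empty dictionary with `d = {}` or `d = dict()`.
- Key uniqueness: a dictionary cannot contain the same key twice.
- Key immutability: keys must be of an **immutable (hashable) type**, such as strings, numbers, or tuples whose elements are themselves hashable.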

In [2]:
d1 = {}
d2 = dict()
d3 = {"wav_name": "B000_S000_W000", "duration": 3.62, "text": "Hello, world!"}
print(d1)
print(d2)
print(d3)

{}
{}
{'wav_name': 'B000_S000_W000', 'duration': 3.62, 'text': 'Hello, world!'}
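
The two key rules above can be checked directly. The following is a minimal sketch with made-up values (`dup`, `ok`, and the list key are illustrative, not from the original notebook): a repeated key in a literal keeps only the last value, and an unhashable key such as a list raises `TypeError`.

```python
# Duplicate keys in a literal: the last value silently wins.
dup = {"lang": "eng", "lang": "cmn"}
print(dup)  # {'lang': 'cmn'}

# Hashable keys (str, number, tuple of hashables) are accepted.
ok = {"name": 1, 42: 2, ("B000", "S000"): 3}
print(len(ok))  # 3

# A list is mutable and unhashable, so it cannot be used as a key.
list_key_rejected = False
try:
    bad = {["B000", "S000"]: 3}
except TypeError as err:
    list_key_rejected = True
    print(f"TypeError: {err}")
```

If a list-like key is needed, converting it to a tuple first is the usual workaround.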


### 1.2 Common Dictionary Operations
Add or modify a key-value pair: `d[key] = value`

> If the key exists, its value is updated; if it does not, a new key-value pair is added.

In [3]:
d4 = {"wav_name": "B000_S000_W000", "duration": 3.62, "text": "Hello, world!"}
print(d4)
d4["lang"] = "eng"      # key does not exist: add a new key-value pair
print(d4)
d4["duration"] = 7.89   # key exists: update its value
print(d4)

{'wav_name': 'B000_S000_W000', 'duration': 3.62, 'text': 'Hello, world!'}
{'wav_name': 'B000_S000_W000', 'duration': 3.62, 'text': 'Hello, world!', 'lang': 'eng'}
{'wav_name': 'B000_S000_W000', 'duration': 7.89, 'text': 'Hello, world!', 'lang': 'eng'}


Deleting:

- `del`: delete the specified key-value pair from the dictionary (raises `KeyError` if the key is missing)

In [4]:
d5 = {"wav_name": "B000_S000_W000", "duration": 3.62, "text": "Hello, world!", "lang": "eng"}
print(d5)
del d5["lang"]
print(d5)

{'wav_name': 'B000_S000_W000', 'duration': 3.62, 'text': 'Hello, world!', 'lang': 'eng'}
{'wav_name': 'B000_S000_W000', 'duration': 3.62, 'text': 'Hello, world!'}


- `clear()`: remove all entries from the dictionary

In [5]:
d5 = {"wav_name": "B000_S000_W000", "duration": 3.62, "text": "Hello, world!", "lang": "eng"}
print(d5)
d5.clear()
print(d5)

{'wav_name': 'B000_S000_W000', 'duration': 3.62, 'text': 'Hello, world!', 'lang': 'eng'}
{}


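Lookup:
- Index directly with the key, `d[key]`, to get its value (raises `KeyError` if the key is missing)
- The safer `get()`: returns a default value (`None` unless one is given) when the key is missing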

In [8]:
d6 = {"wav_name": "B000_S000_W000", "duration": 3.62, "text": "Hello, world!", "lang": "eng"}
print(d6["wav_name"])
print(d6.get("wav_name"))
print(d6.get("dnsmos"))

B000_S000_W000
B000_S000_W000
None
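
The difference between the two lookup styles only appears when the key is missing. A small sketch, assuming a hypothetical fallback score of `-1.0` for the absent `"dnsmos"` key:

```python
d = {"wav_name": "B000_S000_W000", "duration": 3.62}

# get() with an explicit default instead of None
score = d.get("dnsmos", -1.0)
print(score)  # -1.0

# Direct indexing raises KeyError when the key is missing
try:
    d["dnsmos"]
except KeyError as err:
    missing_key = err.args[0]
    print(f"missing key: {missing_key}")  # missing key: dnsmos

# get() only reads; it never inserts the key
print("dnsmos" in d)  # False
```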


Accessing different parts of a dictionary:
- `keys()`: get all keys
- `values()`: get all values
- `items()`: get all key-value pairs

In [9]:
d7 = {"wav_name": "B000_S000_W000", "duration": 3.62, "text": "Hello, world!", "lang": "eng"}
print(d7.keys())
print(d7.values())
print(d7.items())

dict_keys(['wav_name', 'duration', 'text', 'lang'])
dict_values(['B000_S000_W000', 3.62, 'Hello, world!', 'eng'])
dict_items([('wav_name', 'B000_S000_W000'), ('duration', 3.62), ('text', 'Hello, world!'), ('lang', 'eng')])
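
The `dict_keys(...)` style output above hints that these methods return view objects rather than lists. A short sketch of what that means in practice (the variable names are illustrative): a view follows later changes to the dictionary, while `list(...)` takes a fixed snapshot.

```python
d = {"wav_name": "B000_S000_W000", "duration": 3.62}

keys_view = d.keys()       # a live view, not a copy
keys_snapshot = list(d)    # a plain list, fixed at this moment

d["lang"] = "eng"

print(list(keys_view))   # ['wav_name', 'duration', 'lang']
print(keys_snapshot)     # ['wav_name', 'duration']
```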


### 1.3 Iterating over a Dictionary

In [13]:
d8 = {"wav_name": "B000_S000_W000", "duration": 3.62, "text": "Hello, world!", "lang": "eng"}
print("------------------------------")
for key in d8.keys(): # iterate over keys
    print(key)
print("------------------------------")
for value in d8.values(): # iterate over values
    print(value)
print("------------------------------")
for key, value in d8.items(): # iterate over keys and values together
    print(f'{key}: {value}')
print("------------------------------")

------------------------------
wav_name
duration
text
lang
------------------------------
B000_S000_W000
3.62
Hello, world!
eng
------------------------------
wav_name: B000_S000_W000
duration: 3.62
text: Hello, world!
lang: eng
------------------------------
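
Two details of iteration are worth a quick sketch: looping over the dictionary itself yields its keys, and deleting entries during a loop is only safe when looping over a snapshot such as `list(d.items())`. The filtering rule used here (drop non-string values) is an arbitrary choice for illustration.

```python
d = {"wav_name": "B000_S000_W000", "duration": 3.62, "text": "Hello, world!", "lang": "eng"}

# Iterating the dict itself yields keys, same as d.keys()
keys = [key for key in d]
print(keys)  # ['wav_name', 'duration', 'text', 'lang']

# To delete while looping, iterate over a snapshot of the items;
# changing the dict's size while iterating it directly raises RuntimeError.
for key, value in list(d.items()):
    if not isinstance(value, str):
        del d[key]
print(d)  # {'wav_name': 'B000_S000_W000', 'text': 'Hello, world!', 'lang': 'eng'}
```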


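### 1.4 Dictionary Comprehensions
Dictionary comprehension: a concise way to create a dictionary

- Basic syntax: `{key_expr: value_expr for variable in iterable if condition}`

Example 1: merge two lists into a dictionary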

In [16]:
myKeys = ['wav_name', 'duration', 'text', 'lang']
myValues = ['B000_S000_W000', 3.62, 'Hello, world!', 'eng']
d9 = {myKeys[i]:myValues[i] for i in range(len(myKeys))}
d9

{'wav_name': 'B000_S000_W000',
 'duration': 3.62,
 'text': 'Hello, world!',
 'lang': 'eng'}
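
The index-based loop in Example 1 can also be written with `zip()`, which pairs the two lists element by element. This is an alternative sketch, not the notebook's original approach; `my_keys` and `my_values` are renamed copies of the lists above.

```python
my_keys = ['wav_name', 'duration', 'text', 'lang']
my_values = ['B000_S000_W000', 3.62, 'Hello, world!', 'eng']

# Comprehension over paired items
by_comprehension = {k: v for k, v in zip(my_keys, my_values)}

# dict() accepts an iterable of (key, value) pairs directly
by_constructor = dict(zip(my_keys, my_values))

print(by_comprehension == by_constructor)  # True
```

Note that `zip()` stops at the shorter input, so a length mismatch drops the extra items silently, whereas the index-based version raises `IndexError` when `myValues` is shorter than `myKeys`.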

Example 2: extract the entries of a dictionary that satisfy a condition

In [23]:
def keep_wav(wav_name: str) -> bool:
    '''
    Keep only entries whose key's first part "Bxxx" has a number >= 3.
    '''
    parts = wav_name.split('_')
    b_num = parts[0][1:]    # drop the leading "B"
    return int(b_num) >= 3


wav_dict = {"B000_S000_W000": 6.54,
            "B002_S023_W045": 8.19,
            "B003_S010_W079": 10.98,
            "B004_S012_W000": 6.43,
            "B005_S049_W085": 3.84}

wav_dict = {key: value for key, value in wav_dict.items() if keep_wav(key)}
print(wav_dict)

{'B003_S010_W079': 10.98, 'B004_S012_W000': 6.43, 'B005_S049_W085': 3.84}


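## 2. Sets
### 2.1 Set Basics
Set: an unordered collection of **unique** elements

- Elements are separated by commas `,`, and the whole set is enclosed in curly braces `{}`.
- Create an empty set with `s = set()` (note: `s = {}` creates an empty dictionary!)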

In [25]:
s1 = {1, 2, 3, 4, 5}
s2 = set()
d9 = {}
print(s1)
print(f"s2's type: {type(s2).__name__}")
print(f"d9's type: {type(d9).__name__}")

{1, 2, 3, 4, 5}
s2's type: set
d9's type: dict
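
Because a set keeps only unique elements, converting a list to a set is a quick way to deduplicate it. A minimal sketch with a made-up list of language codes:

```python
langs = ["eng", "cmn", "eng", "jpn", "cmn"]

unique_langs = set(langs)   # duplicates collapse into one element each
print(len(langs), len(unique_langs))  # 5 3

# Sets are unordered; sorted() gives a reproducible order for display
print(sorted(unique_langs))  # ['cmn', 'eng', 'jpn']
```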


### 2.2 Common Set Operations
Adding elements:

- `add()`: add a single element
- `update()`: add all elements from another iterable to the set at once

In [26]:
s3 = {1, 2, 3, 4, 5}
print(s3)
s3.add(6)
print(s3)
s3.update([7, 8, 9])
print(s3)

{1, 2, 3, 4, 5}
{1, 2, 3, 4, 5, 6}
{1, 2, 3, 4, 5, 6, 7, 8, 9}


Deleting:

- `remove()`: remove the specified element; raises `KeyError` if the element does not exist

In [28]:
s4 = {1, 2, 3, 4, 5}
s4.remove(3)
print(s4)

{1, 2, 4, 5}


- `discard()`: remove the specified element; does nothing if the element does not exist

In [30]:
s5 = {1, 2, 3, 4, 5}
s5.discard(3)
print(s5)
s5.discard(6)
print(s5)

{1, 2, 4, 5}
{1, 2, 4, 5}
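
The difference between `remove()` and `discard()` only shows when the element is missing. A short sketch of the failing case for `remove()`:

```python
s = {1, 2, 4, 5}

# remove() raises KeyError for a missing element
try:
    s.remove(3)
except KeyError as err:
    missing = err.args[0]
    print(f"KeyError: {missing}")  # KeyError: 3

# discard() on the same missing element is a silent no-op
s.discard(3)
print(sorted(s))  # [1, 2, 4, 5]
```

`remove()` suits cases where a missing element indicates a bug; `discard()` suits cases where absence is acceptable.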


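- `pop()`: remove and return an arbitrary element (the choice follows the set's internal order and is not truly random); raises `KeyError` if the set is empty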

In [37]:
s6 = {1, 2, 3, 4, 5}
value = s6.pop()
print(value)
print(s6)

1
{2, 3, 4, 5}
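
Since the element returned by `pop()` should not be relied on, a common pattern is to use it only for draining a set. A small sketch with made-up values:

```python
s = {"eng", "cmn", "jpn"}
popped = []

# An empty set is falsy, so the loop stops before pop() could raise KeyError
while s:
    popped.append(s.pop())

print(sorted(popped))  # ['cmn', 'eng', 'jpn']
print(s)               # set()
```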


- `clear()`: remove all elements from the set

In [38]:
s7 = {1, 2, 3, 4, 5}
print(s7)
s7.clear()
print(s7)

{1, 2, 3, 4, 5}
set()


Membership tests:

- `in`: check whether a value is in the set
- `not in`: check whether a value is not in the set

In [39]:
s8 = {1, 2, 3, 4, 5}
print(1 in s8)
print(1 not in s8)

True
False


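### 2.3 Set Comprehensions
Set comprehension: a concise way to create a set

- Basic syntax (like a dictionary comprehension, but with a single expression instead of a `key: value` pair): `{expr for variable in iterable if condition}`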

In [45]:
s9 = {x**2 for x in range(10) if x % 2} # squares of the odd numbers in 0-9
print(s9)

{1, 9, 81, 49, 25}
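
A set comprehension is also handy for collecting the distinct values of a derived field. The sketch below reuses the `Bxxx_Sxxx_Wxxx` naming pattern from the dictionary examples; the list `wav_names` and the name `prefixes` are illustrative, and the meaning of the `B` prefix is not stated in the source.

```python
wav_names = ["B000_S000_W000", "B000_S000_W001", "B002_S023_W045", "B002_S010_W079"]

# Take the first underscore-separated part of each name; duplicates collapse
prefixes = {name.split("_")[0] for name in wav_names}

print(sorted(prefixes))  # ['B000', 'B002']
```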


### 2.4 A Common Use of Sets
A simple vocabulary-building example: a common preprocessing step in NLP and TTS

In [53]:
import string
myList = ["Hello, world!",
          "My name is Super momo.",
          "Hello my friend."]

vocab_set = set()
for text in myList:
    text = ''.join(char for char in text if char not in string.punctuation)
    words = text.split()
    words = [word.lower() for word in words]
    vocab_set.update(words)

print(vocab_set)

{'friend', 'name', 'hello', 'is', 'my', 'momo', 'super', 'world'}
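
A vocabulary set is often turned into a word-to-id mapping as a next step. The following is a sketch of one possible way to do that, combining the set with a dictionary comprehension; the names `word_to_id` and `ids`, and the choice to sort alphabetically, are assumptions for illustration.

```python
vocab_set = {'friend', 'name', 'hello', 'is', 'my', 'momo', 'super', 'world'}

# Sets are unordered, so sort first to get reproducible ids
word_to_id = {word: idx for idx, word in enumerate(sorted(vocab_set))}

# Encode a sentence as a list of ids
ids = [word_to_id[word] for word in "hello my friend".split()]
print(ids)  # [1, 4, 0]
```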
