# <div style="text-align: center"><font color='#dc2624' face='微软雅黑'>Python Basics Series</font></div>
## <div style="text-align: center"><font color='#dc2624' face='微软雅黑'>Comprehensions</font></div>

## <font color='#dc2624' face='微软雅黑'>Table of Contents</font><a name='toc'></a>
### 1. [**<font color='#dc2624' face='微软雅黑'>List Comprehensions</font>**](#1)
1. [<font color='#2b4750' face='微软雅黑'>`for` loops</font>](#1.1)
2. [<font color='#2b4750' face='微软雅黑'>`if` conditions</font>](#1.2)
3. [<font color='#2b4750' face='微软雅黑'>Nested `for` loops and `if` conditions</font>](#1.3)

### 2. [**<font color='#dc2624' face='微软雅黑'>Set Comprehensions</font>**](#2)


### 3. [**<font color='#dc2624' face='微软雅黑'>Dict Comprehensions</font>**](#3)

### 4. [**<font color='#dc2624' face='微软雅黑'>Case Study and Exercise</font>**](#4)
---

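A comprehension is a tool for turning one iterable into another.

That sentence mentions two iterables. Loosely speaking, all of the container types (`str`, `tuple`, `list`, `dict`, `set`) are iterables.

- The first iterable (the input) can be any container type.
- The second iterable (the output) depends on the kind of comprehension:
    - List comprehension: the output is a `list`
    - Dict comprehension: the output is a `dict`
    - Set comprehension: the output is a `set`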
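
As a quick illustration of the mapping above, the sketch below feeds one input iterable into each of the three comprehension types. The string `word` is made up for this example and does not come from the original notebook.

```python
# `word` is an illustrative input; any iterable would work here.
word = "banana"

# List comprehension: keeps order and duplicates.
as_list = [ch.upper() for ch in word]

# Set comprehension: keeps only distinct elements (order is not guaranteed).
as_set = {ch.upper() for ch in word}

# Dict comprehension: maps each distinct character to how often it occurs.
as_dict = {ch: word.count(ch) for ch in word}

print(as_list)  # ['B', 'A', 'N', 'A', 'N', 'A']
print(as_dict)  # {'b': 1, 'a': 3, 'n': 2}
```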

# <font color='#dc2624' face='微软雅黑'>1. List Comprehensions</font><a name='1'></a>
[<font color='black' face='微软雅黑'>Back to contents</font>](#toc)
### <font color='#2b4750' face='微软雅黑'>1.1 `for` loops</font><a name='1.1'></a>
[<font color='black' face='微软雅黑'>Back to top of chapter</font>](#1)

There are three ways to build a list.

**Method 1**: a `for` loop

In [2]:
l = []
for i in range(10):
    l.append(i*i)
l

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

**Method 2**: the `map` function

In [9]:
m_iter = map( lambda x: x*x, range(10) )
list(m_iter)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

**Method 3**: a list comprehension

In [10]:
[ i*i for i in range(10) ]

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

A list comprehension can be read as an equivalent rewrite of a **`for` loop**.

```
l = []
for item in collection:
    l.append(item_expression)
```
<=>
```
l = [ item_expression for item in collection ]
```

### <font color='#2b4750' face='微软雅黑'>1.2 `if` conditions</font><a name='1.2'></a>
[<font color='black' face='微软雅黑'>Back to top of chapter</font>](#1)

With a condition added, there are again three ways to build the list.

**Method 1**: a `for` loop plus an `if` condition

In [11]:
l = []
for i in range(10):
    if i % 2 == 0:
        l.append(i*i)
l

[0, 4, 16, 36, 64]

**Method 2**: the `filter` function (combined with `map`)

In [15]:
f_iter = map( lambda x:x*x, filter(lambda x:x%2==0, range(10)) )
list(f_iter)

[0, 4, 16, 36, 64]

**Method 3**: a list comprehension

In [16]:
[ i * i for i in range(10) if i % 2 == 0 ]

[0, 4, 16, 36, 64]

If the one-line comprehension above is hard to read at first, you can **split it across lines**:

In [19]:
[
    i * i
    for i in range(10) 
    if i % 2 == 0
]

[0, 4, 16, 36, 64]

A list comprehension can be read as an equivalent rewrite of a **`for` loop plus an `if` condition**.
```
l = []
for item in collection:
    if condition(item):
        l.append(item_expression)
```
<=>
```
l = [ item_expression for item in collection if condition(item) ]
```
<=>
```
l = [
        item_expression 
        for item in collection
        if condition(item)
    ]
```

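Looking back at the last two sections, a list comprehension can do the work of both `filter` and `map`. Loosely speaking, a comprehension can be thought of as syntactic sugar for combining `filter` and `map`.

<font color='blue' face='微软雅黑'>*Syntactic sugar expresses the same meaning more concisely, much like a four-character idiom (chengyu) in Chinese.*</font>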
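
One practical difference is worth keeping in mind when treating the two styles as interchangeable: `map` and `filter` return lazy iterators that can be consumed only once, whereas a list comprehension builds a concrete list. A minimal sketch, with variable names chosen only for this example:

```python
squares_iter = map(lambda x: x * x, range(5))   # lazy iterator
squares_list = [x * x for x in range(5)]        # concrete list

first_pass = list(squares_iter)    # consumes the iterator
second_pass = list(squares_iter)   # nothing is left to yield

print(first_pass)                  # [0, 1, 4, 9, 16]
print(second_pass)                 # []
print(squares_list == first_pass)  # True
```

The values are the same either way; the comprehension result can simply be reused without rebuilding it.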

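### <font color='#2b4750' face='微软雅黑'>1.3 Nested `for` loops and `if` conditions</font><a name='1.3'></a>
[<font color='black' face='微软雅黑'>Back to top of chapter</font>](#1)

Square every number in the matrix that is greater than 5, and flatten the result into a single list.

**Method 1**: two `for` loops plus an `if` condition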

In [25]:
matrix = [[1,2,3], [4,5,6], [7,8,9]]
l = []

for row in matrix:
    for n in row:
        if n > 5:
            l.append(n*n)

l

[36, 49, 64, 81]

**Method 2**: a list comprehension

In [22]:
[n*n for row in matrix for n in row if n > 5]

[36, 49, 64, 81]

Split across lines, it reads more clearly:

In [24]:
[
    n*n
    for row in matrix
    for n in row
    if n > 5
]

[36, 49, 64, 81]

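The same pattern works on a nested list of strings, here keeping only the names that contain a lowercase `e`:

In [37]: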
data = [['John', 'Emily', 'Michael', 'Mary', 'Steven'],
        ['Maria', 'Javier', 'Pilar']]

result = [
            name
            for names in data
            for name in names
            if 'e' in name
         ]
result

['Michael', 'Steven', 'Javier']
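
The examples in this section all put several `for` clauses inside one comprehension, which flattens the data. That is easy to confuse with placing one comprehension inside another, which keeps the nested shape. The sketch below contrasts the two on the same `matrix` as above; the names `flat` and `nested` are chosen for this example.

```python
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Several `for` clauses in ONE comprehension: the clauses run left to right,
# like nested loops, and the result is a flat list.
flat = [n * n for row in matrix for n in row]

# A comprehension INSIDE another comprehension: the inner one builds each row,
# so the nested shape is preserved.
nested = [[n * n for n in row] for row in matrix]

print(flat)    # [1, 4, 9, 16, 25, 36, 49, 64, 81]
print(nested)  # [[1, 4, 9], [16, 25, 36], [49, 64, 81]]
```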

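# <font color='#dc2624' face='微软雅黑'>2. Set Comprehensions</font><a name='2'></a>
[<font color='black' face='微软雅黑'>Back to contents</font>](#toc)

Following the same idea as a list comprehension, swap the **square brackets `[]`** for **curly braces `{}`** and you have a set comprehension.

<font color='blue' face='微软雅黑'>*Set literals are written with `{}`.*</font>

The main reason to use a set comprehension is to get **distinct** elements.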

In [31]:
{ x * x for x in range(-9, 10) }

{0, 1, 4, 9, 16, 25, 36, 49, 64, 81}

In [33]:
s = 'I love this Python class from Steven Wang'
unique_vowels = {i for i in s.lower() if i in 'aeiou'}
unique_vowels

{'a', 'e', 'i', 'o'}
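
One caveat about the curly-brace notation: an empty pair of braces is a `dict`, not a `set`. A small sketch to check the types (variable names are illustrative):

```python
empty_braces = {}                      # an empty dict, not a set
empty_set = set()                      # the way to create an empty set
from_comprehension = {x for x in []}   # a set comprehension yields a set, even when empty

print(type(empty_braces).__name__)        # dict
print(type(empty_set).__name__)           # set
print(type(from_comprehension).__name__)  # set
```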

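# <font color='#dc2624' face='微软雅黑'>3. Dict Comprehensions</font><a name='3'></a>
[<font color='black' face='微软雅黑'>Back to contents</font>](#toc)

Following the same idea as a set comprehension, add a **colon `:`** and you have a dict comprehension.

<font color='blue' face='微软雅黑'>*Dicts use `:` to separate each key from its value.*</font>

A dict comprehension can be read as an equivalent rewrite of a **`for` loop plus an `if` condition**.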
```
d = {}
for key, val in collection:
    if condition(key, val):
        d[key] = val_expression
```
<=>
```
d = { key:val_expression for key, val in collection if condition(key, val) }
```
<=>
```
d = { 
        key:val_expression
        for key, val in collection
        if condition(key, val)
    }
```

In [34]:
{n:n**2 for n in range(6)}

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
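
The template above unpacks `key, val` pairs and applies a condition, which the `range` example does not exercise. A minimal sketch of that fuller form follows, assuming the input is itself a dict; the `scores` data is hypothetical. Iterating a dict directly yields only its keys, so `.items()` is used to get the pairs.

```python
# Hypothetical input data, made up for this example.
scores = {'Ann': 82, 'Bob': 58, 'Cid': 91}

# Keep only the passing scores and rescale them to the 0-1 range.
passed = {
    name: score / 100
    for name, score in scores.items()
    if score >= 60
}

print(passed)  # {'Ann': 0.82, 'Cid': 0.91}
```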

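# <font color='#dc2624' face='微软雅黑'>4. Case Study and Exercise</font><a name='4'></a>
[<font color='black' face='微软雅黑'>Back to contents</font>](#toc)

Here is a real data-preprocessing example. We fetch data from `yahoofinancials`, but the result comes back as dicts nested inside dicts and is very hard to read. We will use list comprehensions to convert the raw data into a readable DataFrame.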

In [38]:
from yahoofinancials import YahooFinancials
import pandas as pd

In [64]:
cryptocurrencies = ['BTC-USD', 'ETH-USD']
CRX_obj = YahooFinancials( cryptocurrencies )
start_date='2019-05-01'
end_date='2020-05-01'
CFX_daily = CRX_obj.get_historical_price_data( start_date, end_date, 'daily' )
CFX_daily

{'BTC-USD': {'eventsData': {},
  'firstTradeDate': {'formatted_date': '2014-09-16', 'date': 1410908400},
  'currency': 'USD',
  'instrumentType': 'CRYPTOCURRENCY',
  'timeZone': {'gmtOffset': 3600},
  'prices': [{'date': 1556665200,
    'high': 5418.00390625,
    'low': 5347.64599609375,
    'open': 5350.91455078125,
    'close': 5402.697265625,
    'volume': 13679528236,
    'adjclose': 5402.697265625,
    'formatted_date': '2019-04-30'},
   {'date': 1556751600,
    'high': 5522.2626953125,
    'low': 5394.21728515625,
    'open': 5402.4228515625,
    'close': 5505.28369140625,
    'volume': 14644460907,
    'adjclose': 5505.28369140625,
    'formatted_date': '2019-05-01'},
   {'date': 1556838000,
    'high': 5865.8818359375,
    'low': 5490.20166015625,
    'open': 5505.55224609375,
    'close': 5768.28955078125,
    'volume': 18720780005,
    'adjclose': 5768.28955078125,
    'formatted_date': '2019-05-02'},
   {'date': 1556924400,
    'high': 5886.8935546875,
    'low': 5645.469238

In [62]:
print( f'The type is:               {type(CFX_daily)}' )
print( f'The number of elements is: {len(CFX_daily)}' )
print( f'The keys are:              {CFX_daily.keys()}' )

The type is:               <class 'dict'>
The number of elements is: 2
The keys are:              dict_keys(['BTC-USD', 'ETH-USD'])


In [63]:
print( f"The type is:               {type(CFX_daily['BTC-USD'])}" )
print( f"The number of elements is: {len(CFX_daily['BTC-USD'])}" )
print( f"The keys are:              {CFX_daily['BTC-USD'].keys()}" )

The type is:               <class 'dict'>
The number of elements is: 6
The keys are:              dict_keys(['eventsData', 'firstTradeDate', 'currency', 'instrumentType', 'timeZone', 'prices'])


In [65]:
def data_converter( raw_data, code ):
    # convert raw data to dataframe
    columns = ['open', 'close', 'low', 'high', 'adjclose', 'volume' ]
    price_dict = raw_data[code]['prices']
    
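    # list comprehension: one date string per price record
    index = [ p['formatted_date'] for p in price_dict ]
    # nested list comprehension: one row of column values per price record
    price = [ [ p[c] for c in columns ] for p in price_dict ]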

    data = pd.DataFrame( price,
                         index=pd.Index(index, name='date'),
                         columns=pd.Index(columns, name='OHLC') )
    return data
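
To see what the two comprehensions inside `data_converter` produce without calling the API or pandas, here is a pure-Python sketch. The two price records are copied from the raw output shown earlier; the wrapper dict `sample` is a hand-made stand-in for the real response and keeps only the `prices` key.

```python
# Hand-made stand-in for the API response (only the part the comprehensions read).
sample = {
    'BTC-USD': {
        'prices': [
            {'date': 1556665200, 'high': 5418.00390625, 'low': 5347.64599609375,
             'open': 5350.91455078125, 'close': 5402.697265625,
             'volume': 13679528236, 'adjclose': 5402.697265625,
             'formatted_date': '2019-04-30'},
            {'date': 1556751600, 'high': 5522.2626953125, 'low': 5394.21728515625,
             'open': 5402.4228515625, 'close': 5505.28369140625,
             'volume': 14644460907, 'adjclose': 5505.28369140625,
             'formatted_date': '2019-05-01'},
        ]
    }
}

columns = ['open', 'close', 'low', 'high', 'adjclose', 'volume']
records = sample['BTC-USD']['prices']

# One label per record: this becomes the row index.
index = [p['formatted_date'] for p in records]
# One inner list per record, with values ordered like `columns`.
rows = [[p[c] for c in columns] for p in records]

print(index)         # ['2019-04-30', '2019-05-01']
print(len(rows[0]))  # 6
```

Each inner list in `rows` lines up with `columns`, which is the shape `pd.DataFrame` expects for row-wise data.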

In [66]:
BTC = data_converter( CFX_daily, 'BTC-USD' )
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
pd.concat([BTC.head(), BTC.tail()])

| date | open | close | low | high | adjclose | volume |
|---|---|---|---|---|---|---|
| 2019-04-30 | 5350.914551 | 5402.697266 | 5347.645996 | 5418.003906 | 5402.697266 | 13679530000.0 |
| 2019-05-01 | 5402.422852 | 5505.283691 | 5394.217285 | 5522.262695 | 5505.283691 | 14644460000.0 |
| 2019-05-02 | 5505.552246 | 5768.289551 | 5490.20166 | 5865.881836 | 5768.289551 | 18720780000.0 |
| 2019-05-03 | 5769.202637 | 5831.16748 | 5645.469238 | 5886.893555 | 5831.16748 | 17567780000.0 |
| 2019-05-04 | 5831.068359 | 5795.708496 | 5708.035156 | 5833.862793 | 5795.708496 | 14808830000.0 |
| 2020-04-26 | 7679.418945 | 7795.601074 | 7679.418945 | 7795.601074 | 7795.601074 | 36162140000.0 |
| 2020-04-27 | 7796.970215 | 7807.058594 | 7730.806641 | 7814.527344 | 7807.058594 | 33187960000.0 |
| 2020-04-28 | 7806.712402 | 8801.038086 | 7786.049316 | 8871.753906 | 8801.038086 | 60201050000.0 |
| 2020-04-29 | NaN | NaN | NaN | NaN | NaN | NaN |
| 2020-05-01 | 8674.056641 | 8789.90332 | 8674.056641 | 8823.572266 | 8789.90332 | 60928020000.0 |


In [67]:
ETH = data_converter( CFX_daily, 'ETH-USD' )
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
pd.concat([ETH.head(), ETH.tail()])

| date | open | close | low | high | adjclose | volume |
|---|---|---|---|---|---|---|
| 2019-04-30 | 162.186554 | 160.818344 | 159.660217 | 164.060684 | 160.818344 | 5789172000.0 |
| 2019-05-01 | 160.853577 | 162.122787 | 160.060699 | 162.937012 | 162.122787 | 6044171000.0 |
| 2019-05-02 | 162.075165 | 167.952408 | 161.080627 | 170.068741 | 167.952408 | 7299411000.0 |
| 2019-05-03 | 167.887207 | 164.026581 | 161.791428 | 170.645935 | 164.026581 | 6658100000.0 |
| 2019-05-04 | 164.015259 | 163.450699 | 159.700653 | 165.399979 | 163.450699 | 5938416000.0 |
| 2020-04-26 | 197.475723 | 197.224716 | 193.454163 | 199.552795 | 197.224716 | 18670190000.0 |
| 2020-04-27 | 197.273514 | 198.41539 | 194.849426 | 198.786545 | 198.41539 | 18217510000.0 |
| 2020-04-28 | 198.465195 | 216.968231 | 198.124512 | 218.454636 | 216.968231 | 26397550000.0 |
| 2020-04-29 | NaN | NaN | NaN | NaN | NaN | NaN |
| 2020-05-01 | 208.211502 | 214.030685 | 208.211502 | 214.812317 | 214.030685 | 26815010000.0 |


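## Exercise: among the integers from 0 up to (but not including) 20, find those that are not divisible by 3; return each odd number unchanged and each even number negated.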

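Hints:
- The integers from 0 to 19 form an iterable: `range(20)`
- The "not divisible by 3" condition: `if i % 3 != 0`
- "Return an odd number unchanged and an even number negated" is an expression, or more precisely a **conditional expression**: `i if i%2==1 else -i`

Then plug these into the comprehension template: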

```
l = [
        conditional_expression
        for item in container
        if condition
    ]
```

## Answer

In [29]:
[
    i if i%2==1 else -i
    for i in range(20)
    if i%3!=0
]

[1, -2, -4, 5, 7, -8, -10, 11, 13, -14, -16, 17, 19]
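
The answer uses `if` in two different roles, which can be easy to mix up. The sketch below unrolls it into an explicit loop to make the roles visible; the helper name `transform` is introduced only for this illustration.

```python
def transform(i):
    # Conditional expression: chooses WHICH value to emit.
    return i if i % 2 == 1 else -i

result = []
for i in range(20):
    if i % 3 != 0:                   # filter: decides WHETHER to emit a value
        result.append(transform(i))

# The same logic as a single comprehension.
same = [i if i % 2 == 1 else -i for i in range(20) if i % 3 != 0]

print(result == same)  # True
```

The conditional expression before `for` always needs an `else`, while the trailing `if` filter never takes one.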