# 6 Tables

- To use tables, import the `datascience` module:
    ```python
    from datascience import *
    ```
- Create an empty table:
    ```python
    Table()
    ```
- `with_columns()` adds labeled columns (`with_column()` adds only a single column)
- To read a table from a file, use `read_table()`, for example:
    ```python
    minard = Table.read_table(path_data + 'minard.csv')
    ```
- Table size
    - `num_rows` gives the number of rows
    - `num_columns` gives the number of columns
- `labels` lists the labels of all columns
- `relabeled()` renames a label and returns a new table


In [39]:
from datascience import *

flowers = Table().with_columns(
    'Number of petals', make_array(8, 34, 5),
    'Name', make_array('lotus', 'sunflower', 'rose')
)
flowers.with_columns(
    'Color', make_array('pink', 'yellow', 'red')
).show()
flowers.show()  # with_columns() does not modify the original table

Number of petals,Name,Color
8,lotus,pink
34,sunflower,yellow
5,rose,red


Number of petals,Name
8,lotus
34,sunflower
5,rose


In [40]:
minard = Table.read_table('minard.csv')
minard

Longitude,Latitude,City,Direction,Survivors
32.0,54.8,Smolensk,Advance,145000
33.2,54.9,Dorogobouge,Advance,140000
34.4,55.5,Chjat,Advance,127100
37.6,55.8,Moscou,Advance,100000
34.3,55.2,Wixma,Retreat,55000
32.0,54.6,Smolensk,Retreat,24000
30.4,54.4,Orscha,Retreat,20000
26.8,54.3,Moiodexno,Retreat,12000


In [41]:
print(minard.num_columns)
print(minard.num_rows)
minard.labels

5
8


('Longitude', 'Latitude', 'City', 'Direction', 'Survivors')

In [42]:
minard = minard.relabeled('City', 'City Name')
minard

Longitude,Latitude,City Name,Direction,Survivors
32.0,54.8,Smolensk,Advance,145000
33.2,54.9,Dorogobouge,Advance,140000
34.4,55.5,Chjat,Advance,127100
37.6,55.8,Moscou,Advance,100000
34.3,55.2,Wixma,Retreat,55000
32.0,54.6,Smolensk,Retreat,24000
30.4,54.4,Orscha,Retreat,20000
26.8,54.3,Moiodexno,Retreat,12000


### Accessing the Data in a Column

- Use a column's label to access the array of data in that column (via the `column()` method)
- A column index also works (indices count from 0)

In [43]:
minard.column('Survivors')

array([145000, 140000, 127100, 100000,  55000,  24000,  20000,  12000])

In [44]:
minard.column(4)

array([145000, 140000, 127100, 100000,  55000,  24000,  20000,  12000])

### Working with the Data in a Column

- Consider adding a column "Percent Surviving": each survivor count divided by the first count in the column


In [45]:
survivors = minard.column('Survivors')
minard = minard.with_column(
    'Percent Surviving', survivors / survivors.item(0)
)
minard

Longitude,Latitude,City Name,Direction,Survivors,Percent Surviving
32.0,54.8,Smolensk,Advance,145000,1.0
33.2,54.9,Dorogobouge,Advance,140000,0.965517
34.4,55.5,Chjat,Advance,127100,0.876552
37.6,55.8,Moscou,Advance,100000,0.689655
34.3,55.2,Wixma,Retreat,55000,0.37931
32.0,54.6,Smolensk,Retreat,24000,0.165517
30.4,54.4,Orscha,Retreat,20000,0.137931
26.8,54.3,Moiodexno,Retreat,12000,0.0827586


- A column's display format can be set with the Table method `set_format()`
    > Common formats include
    > - dates (`DateFormatter`)
    > - currencies (`CurrencyFormatter`)
    > - numbers (`NumberFormatter`)
    > - percentages (`PercentFormatter`)

In [47]:
minard.set_format('Percent Surviving', PercentFormatter)

Longitude,Latitude,City Name,Direction,Survivors,Percent Surviving
32.0,54.8,Smolensk,Advance,145000,100.00%
33.2,54.9,Dorogobouge,Advance,140000,96.55%
34.4,55.5,Chjat,Advance,127100,87.66%
37.6,55.8,Moscou,Advance,100000,68.97%
34.3,55.2,Wixma,Retreat,55000,37.93%
32.0,54.6,Smolensk,Retreat,24000,16.55%
30.4,54.4,Orscha,Retreat,20000,13.79%
26.8,54.3,Moiodexno,Retreat,12000,8.28%
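
The percentages above are the same proportions as before, shown with two decimals and a percent sign. As a rough analogy in plain Python (a sketch of the visible effect only, not of how `PercentFormatter` is implemented), the built-in `%` format spec produces the same strings for the first three proportions:

```python
# Proportions copied from the 'Percent Surviving' column shown earlier.
proportions = [1.0, 0.965517, 0.876552]

# '.2%' multiplies by 100, keeps two decimals, and appends a percent sign.
as_text = [f"{p:.2%}" for p in proportions]

print(as_text)
# The numbers themselves are untouched; only their text form differs.
print(proportions)
```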


### Choosing Sets of Columns

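- `select()` creates a new table containing only the specified columns
- Note that `column()` returns an array, whereas `select()` returns a table
- `drop()` creates a new table without the specified columns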

In [48]:
minard.select('Longitude', 'Latitude')

Longitude,Latitude
32.0,54.8
33.2,54.9
34.4,55.5
37.6,55.8
34.3,55.2
32.0,54.6
30.4,54.4
26.8,54.3


---

## 6.1 Sorting Rows

...

### 6.1.1 Named Arguments

- Consider sorting in descending order
    ```python
    nba.sort('SALARY', descending=True)
    ```
    - Here `descending=True` is a **named argument**, and `'SALARY'` is a **positional argument**
    - In a function or method, every argument has both a position and a name (both can be seen in the help text)

In [59]:
nba = Table.read_table('nba_salaries.csv')
help(nba.sort)

Help on method sort in module datascience.tables:

sort(column_or_label, descending=False, distinct=False) method of datascience.tables.Table instance
    Return a Table of rows sorted according to the values in a column.
    
    Args:
        ``column_or_label``: the column whose values are used for sorting.
    
        ``descending``: if True, sorting will be in descending, rather than
            ascending order.
    
        ``distinct``: if True, repeated values in ``column_or_label`` will
            be omitted.
    
    Returns:
        An instance of ``Table`` containing rows sorted based on the values
        in ``column_or_label``.
    
    >>> marbles = Table().with_columns(
    ...    "Color", make_array("Red", "Green", "Blue", "Red", "Green", "Green"),
    ...    "Shape", make_array("Round", "Rectangular", "Rectangular", "Round", "Rectangular", "Round"),
    ...    "Amount", make_array(4, 6, 12, 7, 9, 2),
    ...    "Price", make_array(1.30, 1.30, 2.00, 1.75, 1.40, 1.00)

- The help text shows that the **signature** of `sort()` is
    ```python
    sort(column_or_label, descending=False, distinct=False)
    ```
- When an argument is simply True or False, it’s a useful convention to include the argument name so that it’s more obvious what the argument value means.
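
To make the positional/named distinction concrete, here is a small sketch in plain Python. `sort_values` is a hypothetical stand-in that only mirrors the argument layout of `sort()`; it is not the library's code, and the salary figures are made up for illustration:

```python
import inspect

def sort_values(values, descending=False, distinct=False):
    """Hypothetical stand-in with the same argument layout as sort()."""
    if distinct:
        values = set(values)  # drop repeated values
    return sorted(values, reverse=descending)

salaries = [15.5, 25.0, 3.8, 25.0]

by_position = sort_values(salaries, True)         # legal, but what does True mean?
by_name = sort_values(salaries, descending=True)  # same call, easier to read

print(by_position == by_name)
print(inspect.signature(sort_values))
```

Both calls return the same list; the named form simply states which option the `True` applies to.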

---

## 6.2 Selecting Rows


In [67]:
import numpy as np

nba.sort('SALARY', descending=True).take(np.arange(5))  # the 5 highest salaries

PLAYER,POSITION,TEAM,SALARY
Kobe Bryant,SF,Los Angeles Lakers,25.0
Joe Johnson,SF,Brooklyn Nets,24.8949
LeBron James,SF,Cleveland Cavaliers,22.9705
Carmelo Anthony,SF,New York Knicks,22.875
Dwight Howard,C,Houston Rockets,22.3594


In [70]:
nba.where('TEAM', are.containing('Warriors')).show()    # are.containing() matches a substring

PLAYER,POSITION,TEAM,SALARY
Klay Thompson,SG,Golden State Warriors,15.501
Draymond Green,PF,Golden State Warriors,14.2609
Andrew Bogut,C,Golden State Warriors,13.8
Andre Iguodala,SF,Golden State Warriors,11.7105
Stephen Curry,PG,Golden State Warriors,11.3708
Jason Thompson,PF,Golden State Warriors,7.00847
Shaun Livingston,PG,Golden State Warriors,5.54373
Harrison Barnes,SF,Golden State Warriors,3.8734
Marreese Speights,C,Golden State Warriors,3.815
Leandro Barbosa,SG,Golden State Warriors,2.5
