In [35]:
# HIDDEN
import numpy as np
np.set_printoptions(threshold=50)

Tables are a fundamental object type for representing data sets. A table can be viewed in two ways: as a sequence of named columns, each of which describes a single aspect of every entry in a data set, or as a sequence of rows, each of which contains all the information about a single entry.
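To make the two views concrete, here is a plain-Python sketch (not how `datascience` stores tables) that holds the same three entries first as named columns, in a dictionary of lists, and then as rows, one tuple per entry. The built-in `zip` converts one view into the other.

```python
# Column view: each label maps to the values of one attribute for every entry.
columns = {
    'Numbers': [1, 2, 3],
    'Letters': ['a', 'b', 'c'],
}

# Row view: each tuple holds every attribute of a single entry.
# zip(*columns.values()) pairs up the i-th value of each column.
rows = list(zip(*columns.values()))
print(rows)  # [(1, 'a'), (2, 'b'), (3, 'c')]

# Going back: zip(*rows) regroups the rows into columns.
rebuilt = dict(zip(columns.keys(), (list(col) for col in zip(*rows))))
print(rebuilt == columns)  # True
```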

To use tables, import everything from the `datascience` module, a module created for this text.

In [36]:
from datascience import *

Empty tables can be created using the `Table` function, which optionally takes a list of column labels. Though it might seem surprising at first, a common use of the `Table` function is to create an empty table:

In [37]:
Table()

The reason for this is that the empty table can be extended with additional columns. The `with_column` and `with_columns` methods return new tables with additional labeled columns. 

Each column of a table has to be an array. The first argument of `with_column` is a string that is the column label, and the second argument is an array consisting of the data in the column.

Below, we begin each example with an empty table that has no columns. 

In [38]:
Table().with_column('Numbers', make_array(1, 2, 3))

| Numbers |
|---|
| 1 |
| 2 |
| 3 |


We can also create a table that has multiple columns.

In [39]:
Table().with_columns(
    'Numbers', make_array(1, 2, 3),
    'Letters', make_array('a', 'b', 'c')
)

| Numbers | Letters |
|---|---|
| 1 | a |
| 2 | b |
| 3 | c |


We can give this table a name, and then extend the table with another column.

In [40]:
an_example = Table().with_columns(
    'Numbers', make_array(1, 2, 3),
    'Letters', make_array('a', 'b', 'c')
)

an_example.with_column(
    'Colors', make_array('red', 'blue', 'green')
)

| Numbers | Letters | Colors |
|---|---|---|
| 1 | a | red |
| 2 | b | blue |
| 3 | c | green |


The methods `with_column` and `with_columns` create new tables each time they are called. They don't change the table on which they act. So the table `an_example` still has only the two columns that it had when it was created.

In [41]:
an_example

| Numbers | Letters |
|---|---|
| 1 | a |
| 2 | b |
| 3 | c |
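The "return a new table, leave the old one alone" behavior can be imitated with a dictionary of lists. The sketch below uses a hypothetical `with_column` helper (not the `datascience` method) that copies the label-to-column mapping before adding to it, so the original dictionary keeps its two columns.

```python
def with_column(table, label, values):
    """Hypothetical helper: return a new dict-of-lists table with one extra column.

    The input table is copied, not modified.
    """
    new_table = dict(table)  # shallow copy of the label -> column mapping
    new_table[label] = list(values)
    return new_table

an_example = {'Numbers': [1, 2, 3], 'Letters': ['a', 'b', 'c']}
extended = with_column(an_example, 'Colors', ['red', 'blue', 'green'])

print(list(extended))    # ['Numbers', 'Letters', 'Colors']
print(list(an_example))  # ['Numbers', 'Letters']  (unchanged)
```

Because only the mapping is copied, adding a column never touches the original; the existing column lists are shared, which is safe as long as nothing modifies them in place.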


Creating tables in this way involves a lot of typing. If the data have already been entered somewhere, it is usually possible to use Python to read them into a table instead of typing them all in cell by cell.

Often, tables are created from files that contain comma-separated values. Such files are called CSV files.
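A CSV file is plain text: the first line holds the column labels, and each following line holds one row, with values separated by commas. The sketch below parses a few such lines with Python's standard `csv` module, using values from the `minard` data. It illustrates the file format only and is not how `read_table` works internally.

```python
import csv
import io

# A few lines in the same format as minard.csv.
csv_text = """Longitude,Latitude,City,Direction,Survivors
32.0,54.8,Smolensk,Advance,145000
33.2,54.9,Dorogobouge,Advance,140000
37.6,55.8,Moscou,Advance,100000
"""

# The first line holds the column labels; each later line is one row.
reader = csv.reader(io.StringIO(csv_text))
labels = next(reader)
rows = list(reader)

# Regroup the rows into columns. csv.reader leaves every value as a string,
# so '145000' is text here rather than a number.
columns = {label: [row[i] for row in rows] for i, label in enumerate(labels)}
print(columns['City'])       # ['Smolensk', 'Dorogobouge', 'Moscou']
print(columns['Survivors'])  # ['145000', '140000', '100000']
```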

Below, we use the `Table` method `read_table` to read a CSV file that contains some of the data used by Minard in his graphic about Napoleon's Russian campaign. The data are placed in a table named `minard`.

In [42]:
minard = Table.read_table('minard.csv')
minard

| Longitude | Latitude | City | Direction | Survivors |
|---|---|---|---|---|
| 32.0 | 54.8 | Smolensk | Advance | 145000 |
| 33.2 | 54.9 | Dorogobouge | Advance | 140000 |
| 34.4 | 55.5 | Chjat | Advance | 127100 |
| 37.6 | 55.8 | Moscou | Advance | 100000 |
| 34.3 | 55.2 | Wixma | Retreat | 55000 |
| 32.0 | 54.6 | Smolensk | Retreat | 24000 |
| 30.4 | 54.4 | Orscha | Retreat | 20000 |
| 26.8 | 54.3 | Moiodexno | Retreat | 12000 |


We will use this small table to demonstrate some useful Table methods. We will then use those same methods, and develop other methods, on much larger tables of data.

### The Size of the Table ###

The attribute `num_columns` gives the number of columns in the table, and `num_rows` gives the number of rows. Neither is followed by parentheses.

In [43]:
minard.num_columns

5

In [44]:
minard.num_rows

8

### Column Labels ###
The attribute `labels` lists the labels of all the columns. With `minard` we don't gain much by this, but it can be very useful for tables so large that not all of their columns are visible on the screen.

In [46]:
minard.labels

('Longitude', 'Latitude', 'City', 'Direction', 'Survivors')
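In the dictionary-of-lists sketch, the same three facts fall out directly: the labels are the keys, the number of columns is how many keys there are, and the number of rows is the length of any one column, since all columns have the same length. This is only an analogy for what `labels`, `num_columns`, and `num_rows` report.

```python
minard_columns = {
    'Longitude': [32.0, 33.2, 34.4, 37.6, 34.3, 32.0, 30.4, 26.8],
    'Latitude': [54.8, 54.9, 55.5, 55.8, 55.2, 54.6, 54.4, 54.3],
    'City': ['Smolensk', 'Dorogobouge', 'Chjat', 'Moscou',
             'Wixma', 'Smolensk', 'Orscha', 'Moiodexno'],
    'Direction': ['Advance'] * 4 + ['Retreat'] * 4,
    'Survivors': [145000, 140000, 127100, 100000, 55000, 24000, 20000, 12000],
}

labels = tuple(minard_columns)               # keys, in insertion order
num_columns = len(labels)                    # how many labels there are
num_rows = len(minard_columns['Survivors'])  # length of any one column
print(labels, num_columns, num_rows)
```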

We can change column labels using the `relabeled` method. This creates a new table and leaves `minard` unchanged.

In [47]:
minard.relabeled('City', 'City Name')

| Longitude | Latitude | City Name | Direction | Survivors |
|---|---|---|---|---|
| 32.0 | 54.8 | Smolensk | Advance | 145000 |
| 33.2 | 54.9 | Dorogobouge | Advance | 140000 |
| 34.4 | 55.5 | Chjat | Advance | 127100 |
| 37.6 | 55.8 | Moscou | Advance | 100000 |
| 34.3 | 55.2 | Wixma | Retreat | 55000 |
| 32.0 | 54.6 | Smolensk | Retreat | 24000 |
| 30.4 | 54.4 | Orscha | Retreat | 20000 |
| 26.8 | 54.3 | Moiodexno | Retreat | 12000 |
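Note that the renamed column stays in its original position. A hypothetical `relabeled` helper for the dictionary-of-lists sketch (not the `datascience` method) can preserve that order by rebuilding the dictionary key by key, rather than deleting the old key and appending a new one at the end:

```python
def relabeled(table, old_label, new_label):
    """Hypothetical helper: return a copy of a dict-of-lists table with one
    column renamed, keeping the column order. The input is not modified."""
    return {(new_label if label == old_label else label): values
            for label, values in table.items()}

small = {'Longitude': [32.0, 37.6],
         'City': ['Smolensk', 'Moscou'],
         'Survivors': [145000, 100000]}
renamed = relabeled(small, 'City', 'City Name')
print(list(renamed))  # ['Longitude', 'City Name', 'Survivors']
print(list(small))    # ['Longitude', 'City', 'Survivors']
```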


### Accessing the Data in a Column ###
We can use a column's label to access the array of data in the column.

In [16]:
minard.column('Survivors')

array([145000, 140000, 127100, 100000,  55000,  24000,  20000,  12000])

The 5 columns are indexed 0, 1, 2, 3, and 4. The column `Survivors` can also be accessed by using its column index.

In [19]:
minard.column(4)

array([145000, 140000, 127100, 100000,  55000,  24000,  20000,  12000])

The 8 items in the array are indexed 0, 1, 2, and so on, up to 7. The items in the column can be accessed using `item`.

In [18]:
minard.column(4).item(0)

145000

In [21]:
minard.column(4).item(5)

24000
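Both numberings follow Python's zero-based indexing. Using built-in tuples and lists that hold the same labels and counts (a sketch, not `datascience` arrays), the lookups above correspond to the following:

```python
labels = ('Longitude', 'Latitude', 'City', 'Direction', 'Survivors')
survivors = [145000, 140000, 127100, 100000, 55000, 24000, 20000, 12000]

# Positions start at 0, so the last of the 5 labels sits at index 4.
print(labels.index('Survivors'))  # 4
print(len(labels) - 1)            # 4

# Likewise the 8 survivor counts occupy indices 0 through 7.
print(survivors[0])   # 145000
print(survivors[5])   # 24000
print(survivors[-1])  # 12000, the same item as survivors[7]
```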

### Working with the Data in a Column ###
Because columns are arrays, we can use array operations on them to discover new information. For example, we can add a column giving, at each city, the number of survivors as a proportion of the 145,000 counted at Smolensk on the advance.

In [48]:
minard.with_column(
    'Percent Surviving', minard.column('Survivors')/145000
    )

| Longitude | Latitude | City | Direction | Survivors | Percent Surviving |
|---|---|---|---|---|---|
| 32.0 | 54.8 | Smolensk | Advance | 145000 | 1.0 |
| 33.2 | 54.9 | Dorogobouge | Advance | 140000 | 0.965517 |
| 34.4 | 55.5 | Chjat | Advance | 127100 | 0.876552 |
| 37.6 | 55.8 | Moscou | Advance | 100000 | 0.689655 |
| 34.3 | 55.2 | Wixma | Retreat | 55000 | 0.37931 |
| 32.0 | 54.6 | Smolensk | Retreat | 24000 | 0.165517 |
| 30.4 | 54.4 | Orscha | Retreat | 20000 | 0.137931 |
| 26.8 | 54.3 | Moiodexno | Retreat | 12000 | 0.0827586 |
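Each entry of the new column is that city's survivor count divided by 145,000. For Moscou, for instance, 100000 / 145000 ≈ 0.689655. Here is the same computation over a plain Python list; the array version above divides every element at once, whereas a list needs an explicit comprehension.

```python
survivors = [145000, 140000, 127100, 100000, 55000, 24000, 20000, 12000]

# Divide every count by the first one (145000 at Smolensk on the advance).
proportions = [count / survivors[0] for count in survivors]

print(proportions[0])             # 1.0
print(round(proportions[3], 6))   # 0.689655 (Moscou)
print(round(proportions[-1], 7))  # 0.0827586 (Moiodexno)
```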


To make the proportions in the new column appear as percents, we can use the `set_format` method with `PercentFormatter`. The `set_format` method takes `Formatter` objects, which exist for dates (`DateFormatter`), currencies (`CurrencyFormatter`), numbers, and percentages.

In [49]:
minard_and_percents = minard.with_column(
    'Percent Surviving', minard.column('Survivors')/145000
    )

minard_and_percents.set_format('Percent Surviving', PercentFormatter)

| Longitude | Latitude | City | Direction | Survivors | Percent Surviving |
|---|---|---|---|---|---|
| 32.0 | 54.8 | Smolensk | Advance | 145000 | 100.00% |
| 33.2 | 54.9 | Dorogobouge | Advance | 140000 | 96.55% |
| 34.4 | 55.5 | Chjat | Advance | 127100 | 87.66% |
| 37.6 | 55.8 | Moscou | Advance | 100000 | 68.97% |
| 34.3 | 55.2 | Wixma | Retreat | 55000 | 37.93% |
| 32.0 | 54.6 | Smolensk | Retreat | 24000 | 16.55% |
| 30.4 | 54.4 | Orscha | Retreat | 20000 | 13.79% |
| 26.8 | 54.3 | Moiodexno | Retreat | 12000 | 8.28% |
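The displayed percentages are the proportions multiplied by 100 and rounded to two decimal places. Python's built-in `%` format specification produces the same text, which may help explain the display; this is a standard-library analogy, not necessarily how `PercentFormatter` is implemented.

```python
proportions = [100000 / 145000, 12000 / 145000]

# '.2%' multiplies by 100, rounds to two decimals, and appends a percent sign.
percent_strings = ['{:.2%}'.format(p) for p in proportions]
print(percent_strings)  # ['68.97%', '8.28%']
```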


### Choosing Sets of Columns ###
The method `select` creates a new table that contains only the specified columns.

In [28]:
minard.select('Longitude', 'Latitude')

| Longitude | Latitude |
|---|---|
| 32.0 | 54.8 |
| 33.2 | 54.9 |
| 34.4 | 55.5 |
| 37.6 | 55.8 |
| 34.3 | 55.2 |
| 32.0 | 54.6 |
| 30.4 | 54.4 |
| 26.8 | 54.3 |


The same selection can be made using column indices instead of labels.

In [51]:
minard.select(0, 1)

| Longitude | Latitude |
|---|---|
| 32.0 | 54.8 |
| 33.2 | 54.9 |
| 34.4 | 55.5 |
| 37.6 | 55.8 |
| 34.3 | 55.2 |
| 32.0 | 54.6 |
| 30.4 | 54.4 |
| 26.8 | 54.3 |


The result of using `select` is a new table, even when you select just one column.

In [29]:
minard.select('Survivors')

| Survivors |
|---|
| 145000 |
| 140000 |
| 127100 |
| 100000 |
| 55000 |
| 24000 |
| 20000 |
| 12000 |


Notice that the result is a table, unlike the result of `column`, which is an array.

In [31]:
minard.column('Survivors')

array([145000, 140000, 127100, 100000,  55000,  24000,  20000,  12000])
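The distinction mirrors the difference between a container of labeled columns and a single column's values. In the dictionary-of-lists sketch, a hypothetical `select` helper (not the `datascience` method) returns a smaller dictionary, still a table with labels, and accepts either labels or positions; looking up one label returns a bare list with no label attached.

```python
def select(table, *labels_or_indices):
    """Hypothetical helper: a new dict-of-lists table containing only the
    requested columns, given by label or by zero-based position."""
    all_labels = list(table)
    chosen = [all_labels[x] if isinstance(x, int) else x
              for x in labels_or_indices]
    return {label: table[label] for label in chosen}

table = {
    'City': ['Smolensk', 'Dorogobouge'],
    'Survivors': [145000, 140000],
}

one_column_table = select(table, 'Survivors')  # still a table, with its label
bare_values = table['Survivors']               # just the values

print(one_column_table)                 # {'Survivors': [145000, 140000]}
print(bare_values)                      # [145000, 140000]
print(select(table, 1) == one_column_table)  # True: index 1 is 'Survivors'
```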

Another way to create a new table consisting of a set of columns is to `drop` the columns you don't want.

In [50]:
minard.drop('Longitude', 'Latitude', 'Direction')

| City | Survivors |
|---|---|
| Smolensk | 145000 |
| Dorogobouge | 140000 |
| Chjat | 127100 |
| Moscou | 100000 |
| Wixma | 55000 |
| Smolensk | 24000 |
| Orscha | 20000 |
| Moiodexno | 12000 |
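Dropping the unwanted columns and selecting the wanted ones lead to the same table. A sketch of that relationship with a hypothetical `drop` helper for the dictionary-of-lists representation (not the `datascience` method):

```python
def drop(table, *labels):
    """Hypothetical helper: a new dict-of-lists table with every column
    except the given ones, in the original order."""
    return {label: values for label, values in table.items()
            if label not in labels}

table = {
    'Longitude': [32.0, 33.2],
    'Latitude': [54.8, 54.9],
    'City': ['Smolensk', 'Dorogobouge'],
    'Direction': ['Advance', 'Advance'],
    'Survivors': [145000, 140000],
}

kept = drop(table, 'Longitude', 'Latitude', 'Direction')
print(list(kept))  # ['City', 'Survivors']

# Same result as keeping the wanted columns directly.
print(kept == {label: table[label] for label in ('City', 'Survivors')})  # True
```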


All of the methods that we have used above can be applied to any table.