## Lecture Notes - Census: Review Table Methods and Visualization ##

**Helpful Resource:**
- [Python Reference](http://data8.org/sp22/python-reference.html): Cheat sheet of helpful array & table methods used in Data 8!

**Recommended Readings:**
- [Sex Ratios](https://inferentialthinking.com/chapters/06/4/Example_Sex_Ratios.html)
- [Population Trends](https://inferentialthinking.com/chapters/06/3/Example_Population_Trends.html)
- [Tables](https://inferentialthinking.com/chapters/06/Tables.html)
- [Arrays](https://inferentialthinking.com/chapters/05/1/Arrays.html)
- [Programming in Python](http://www.inferentialthinking.com/chapters/03/programming-in-python.html)
- [Data Types](https://inferentialthinking.com/chapters/04/Data_Types.html)

## Table Methods ##

- Creating tables: `Table.read_table` 
- Extending tables: `Table().with_columns`
- Finding numbers of rows in a table: `num_rows`
- Finding numbers of columns in a table: `num_columns`
- Referring to columns: by label or by index (column indices start at 0)
- Accessing data in a column: `column` takes a label or index and returns an array
- Using array methods to work with data in columns: `item`, `sum`, `min`, `max`, and so on
- Creating new tables containing some of the original columns: `select`, `drop`

## Manipulating Rows ##

- `tbl.sort(column)` sorts the rows in increasing order
- `tbl.sort(column, descending=True)` sorts the rows in decreasing order
- `tbl.take(row_numbers)` keeps the rows with the given numbers; rows are indexed starting at 0
- `tbl.where(column, condition)` keeps all rows for which the column's value satisfies *condition*, which can be a value or a predicate.

    For example:
    
    `tbl.where(column, are.equal_to(value))` keeps all rows for which the column's value equals `value`. Shorter form: `tbl.where(column, value)`

### Reading a Table from a Data File (Recap) ###

Python reference for CS88
1. Import the `datascience` module
2. Read a data file into a table

In [None]:
# Just run the cell to import the required module

from datascience import *
import numpy as np
import warnings
warnings.simplefilter('ignore', FutureWarning)

# These lines do some fancy plotting magic.
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')

In [None]:
cars = Table.read_table("Cars2015_v1.csv")
cars.show(5)

In [None]:
cars.num_rows

In [None]:
cars.num_columns

#### Sort the table `cars` by the column `LowPrice` in ascending order, then assign the sorted table to `sort_by_low_price`

In [None]:
sort_by_low_price = cars.sort("LowPrice", descending=False)
sort_by_low_price

#### Make a table containing the cheapest 7 cars, then assign the table to `lowest_7`.

*Hint: Use the table `take` method. Check out the Array Functions and Methods section on [Python Reference](http://data8.org/sp22/python-reference.html): Cheat sheet of helpful array & table methods used in this course.*

`tbl.take(row_indices)` keeps the given rows, where `row_indices` can be:

1. a series of row indices that need not be consecutive, for example: 10, 11, 12, 13 or 6, 24, 90, 56
2. a range of consecutive row indices, for example: 0, 1, 2, 3.
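
To see which index arrays could be passed to `take`, here is a small sketch using NumPy only. The variable names are illustrative and not part of the course materials.

```python
import numpy as np

# np.arange(7) gives 0, 1, ..., 6 (the stop value is excluded)
first_seven = np.arange(7)

# An explicit start of 0 gives the same array
same_seven = np.arange(0, 7)

# Indices need not be consecutive or sorted
scattered = np.array([6, 24, 90, 56])
```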



In [None]:
# make a new table with a series of row indices
lowest_7 = sort_by_low_price.take(0, 1, 2, 3, 4, 5, 6)

# or
# use the np.arange() function to get a range of row indices
#lowest_7 = sort_by_low_price.take(np.arange(7))

# or
#lowest_7 = sort_by_low_price.take(np.arange(0, 7))
lowest_7

In [None]:
random_7 = sort_by_low_price.take(6, 24, 90, 56, 88, 17, 7)
random_7

#### Compute an array containing the minimum size of a garage for each car in the table `cars`, then assign the array to `storage_spaces`.

*Hint: if you wanted to put a car in a huge gift box, how would you measure the dimensions and volume of the box? Compute it the same way as the volume of a rectangular box: width x length x height.*

Steps:
1. Use `tbl.column(column_name_or_index)` to convert a table column to an array
2. Compute the storage space for each car
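
Step 2 relies on element-wise array arithmetic: multiplying arrays of equal length multiplies matching elements. A minimal sketch with made-up dimensions for three hypothetical cars (not values from `Cars2015_v1.csv`):

```python
import numpy as np

# Hypothetical dimensions for three cars (made-up numbers)
width = np.array([70, 72, 75])
length = np.array([180, 190, 200])
height = np.array([55, 58, 60])

# Each result element is width[i] * length[i] * height[i]
volume = width * length * height
```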

In [None]:
storage_spaces = cars.column("Width") * cars.column("Length") * cars.column("Height")

# or break the long assignment statement into multiple lines
#width = cars.column("Width")
#length = cars.column("Length")
#height = cars.column("Height")
#storage_spaces = width * length * height

storage_spaces

#### Create a table with a new column

1. Add the array `storage_spaces` as a column to the table `cars`
2. Name the column `Cargo Vol`
3. Sort the resulting table by that column `Cargo Vol` in ascending order
4. Assign the resulting table to `sort_by_cargo_vol`.

*Hint: you have already created the array `storage_spaces` in a question above.*

Steps:
1. Use `tbl.with_columns(name, values)` to append a new column to the table.
    `tbl.with_columns(name, values)` can take multiple pairs of names and values.
2. Use `tbl.sort(column_name_or_index)` to sort the table.
    By default, `tbl.sort(column_name_or_index)` sorts in ascending order.

In [None]:
sort_by_cargo_vol = cars.with_columns("Cargo Vol", storage_spaces).sort("Cargo Vol")

# or break the long statement into two steps (cars_with_cargo_vol is reused in the next cell)
cars_with_cargo_vol = cars.with_columns("Cargo Vol", storage_spaces)
sort_by_cargo_vol = cars_with_cargo_vol.sort("Cargo Vol")

sort_by_cargo_vol

In [None]:
# To sort the table in descending order, set descending=True explicitly

sort_by_descending = cars_with_cargo_vol.sort("Cargo Vol", descending=True)
sort_by_descending

### Visualization ###

1. `tbl.plot(x_column, y_column)`
2. `tbl.scatter(x_column, y_column)`

#### Example 1: Create a line plot of `Cargo Vol` over `Weight` ####

Instead of looking at the table, we want to visualize it on a plot.

Use `tbl.plot(x_column, y_column)`


In [None]:
# Cargo volume over weight

sort_by_cargo_vol.plot("Weight", "Cargo Vol")


#### What pattern do you see from the plot you created above? ####

*Your observation here...*

#### Example 2: We can also draw two overlaid line plots, showing city and highway miles per gallon against fuel capacity ####

Steps:
1. Create a new table that contains only the data needed for the plot. In this example, use `tbl.select(col1, col2, ...)` to make a table with the columns `CityMPG`, `HwyMPG`, and `FuelCap`
2. Use `tbl.plot(x_column)` to draw the plot; every other column in the table is drawn as a line against `x_column`

In [None]:
# use the column indices
sort_by_cargo_vol.select(6, 7, 8).plot("FuelCap")

# or use the column names/labels

#sort_by_cargo_vol.select("CityMPG", "HwyMPG", "FuelCap").plot("FuelCap")

#### What pattern do you see from the plot you created above? ####

*Your observation here...*

In [None]:
sort_by_cargo_vol.select(6, 7, 8).sort("FuelCap")

#### Example 3: Create a scatter plot to visualize the association between a car's `Weight` and its `CityMPG` ####

In [None]:
cars.scatter("Weight", "CityMPG")

#### Visualization: create a scatter plot to visualize the association between a car's `Weight` and its `FuelCap` ####


In [None]:
cars.scatter("Weight", "FuelCap")