# Lecture 6

In [None]:
from datascience import *
import numpy as np

%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')

## Review and continuation of Table operations ##

Last class we discussed the following table methods which return new Tables as output:

1. `tb.select(label)`: constructs a new table with just the specified columns
2. `tb.drop(label)`: constructs a new table in which the specified columns are omitted
3. `tb.sort(label)`: constructs a new table with rows sorted by the specified column
4. `tb.where(label, condition)`: constructs a new table with just the rows that match the condition

There are a number of properties we can extract from a Table including:
- `num_rows`: returns the number of rows in a Table
- `num_columns`: returns the number of columns in a Table

There are also a number of additional methods for Tables including:
- `relabel('column_name', 'new_name')`: renames the column `'column_name'` to `'new_name'`. Note that `relabel` changes the original table in place; use `relabeled` to get a new table and leave the original unchanged
- `take(row_numbers)`: returns a Table with the selected row numbers


In [None]:
# Load the ice cream data. Each row represents one ice cream cone.
cones = Table.read_table('cones.csv')
cones


In [None]:
# select only the chocolate cones using the `where` method as we did last class
cones.where('Flavor', 'chocolate')


In [None]:
# print the number of rows and columns
print(cones.num_rows)
print(cones.num_columns)


In [None]:
# relabel a column (this changes `cones` itself, so the column is called 'Taste' from here on)
cones.relabel("Flavor", "Taste")

In [None]:
# extract a row
cones.take(0)

## Columns of Tables are Arrays ##

We can extract columns from a `Table` as either:

- A new `Table` with fewer columns using `tb.select()`
- An `ndarray` using `tb.column()` 
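
Because `tb.column()` returns an array, arithmetic and summary functions apply to every element at once, which is the main reason to extract a column as an array rather than as a one-column table. A minimal sketch, using a made-up `prices` array in place of `cones.column('Price')` (the values are assumptions for illustration, not the contents of `cones.csv`):

```python
import numpy as np

# Hypothetical prices standing in for cones.column('Price')
prices = np.array([4.75, 6.55, 5.25])

# Arithmetic is applied element-wise to the whole array
discounted = prices - 0.50

# Summary functions reduce the array to a single number
cheapest = prices.min()
average = np.mean(prices)
```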


In [None]:
cones.select('Price')  # still a table

In [None]:
type(cones.select('Price'))

In [None]:
cones.column('Price') # an array

In [None]:
type(cones.column('Price'))

## Lists

Lists are one of the most widely used data structures in Python. They look like ndarrays, but they can hold heterogeneous types of data.

- We construct lists using square brackets [], where the elements in the list are separated by commas.
- We can access the third item in a list called `my_list` using `my_list[2]`, because indexing starts at 0.
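
The contrast with arrays can be sketched as follows. The names `items` and `as_array` are made up for this illustration. The list keeps each element's own type, while an array built from the same values stores them all as a single type.

```python
import numpy as np

# A list keeps each element's original type
items = [3.0, 21, "unicorn"]
item_types = [type(x).__name__ for x in items]

# Indexing starts at 0, so index 2 is the third item
third_item = items[2]

# An array built from the same values converts them all to one type
# (here, strings, because one of the values is a string)
as_array = np.array(items)
```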


In [None]:
my_list = [3.0, 21, "unicorn", "pocket_lint"]

my_list

## Constructing Tables

We have created tables by loading data from comma-separated values files (.csv files). We can also create Tables from scratch by using:
 - `Table()`: constructs an empty Table
 - `tb.with_columns("Name", array)`: returns a Table with the named columns added
 - `tb.with_row(list)`: returns a Table with one row added; the list holds one value per column


Let's try creating a table that says how many blocks away different streets are from our classroom (now that we are back in person!).


In [None]:
streets = make_array('College', 'Temple', 'Church', 'Orange')
streets

In [None]:
eastside = Table().with_columns(
    'Street', streets,
    'Blocks from Campus', np.arange(4)
)
eastside

In [None]:
type(eastside.row(0))

In [None]:
eastside = eastside.with_row(['State', 4])
eastside

In [None]:
eastside = eastside.with_column('One-Way', ['No', 'Yes', 'Yes', 'No', 'No'])
eastside

In [None]:
eastside.column('One-Way')

## Example: Census data ##

The US government conducts a census every 10 years. We can examine the census data to see interesting patterns in the population of people in the United States.


In [None]:
# As of August 2021, this census file is online here: 
data = 'http://www2.census.gov/programs-surveys/popest/technical-documentation/file-layouts/2010-2019/nc-est2019-agesex-res.csv'

# A local copy can be accessed here in case census.gov moves the file:
# data = path_data + 'nc-est2019-agesex-res.csv'

full = Table.read_table(data)
full

In [None]:
# get a reduced set of columns that we want to analyze further
partial = full.select('SEX', 'AGE', 'POPESTIMATE2014', 'POPESTIMATE2019')
partial

In [None]:
# rename the columns to make them easier to work with
simple = partial.relabeled('POPESTIMATE2014', '2014').relabeled('POPESTIMATE2019', '2019')
simple

In [None]:
# let's examine the data a little more
simple.sort('AGE', descending=True)

In [None]:
# let's remove the totals (value of 999 in the AGE column)
no_999 = simple.where('AGE', are.below(999))
no_999.sort("AGE", descending=True)
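
The effect of `are.below(999)` on the `AGE` column can be sketched with a plain NumPy comparison. This is only a rough analogue under assumed values, not a description of how the `datascience` library implements `where`; the `ages` array below is made up.

```python
import numpy as np

# Hypothetical AGE values; 999 marks a "total" row in the census file
ages = np.array([0, 1, 100, 999])

# Boolean array: True where the age is below 999
keep = ages < 999

# Keep only the entries where the condition holds
real_ages = ages[keep]
```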

In [None]:
# let's split the data into male, female and everyone
everyone = no_999.where('SEX', 0).drop('SEX')
males = no_999.where('SEX', 1).drop('SEX')
females = no_999.where('SEX', 2).drop('SEX')

In [None]:
females

In [None]:
# let's see which ages have the most people
females.sort('2019', descending=True)

In [None]:
males.sort('2019', descending=True)

In [None]:
# let's create a Table with age, males, and females
pop_2019 = Table().with_columns(
    'Age', males.column('AGE'),
    'Males', males.column('2019'),
    'Females', females.column('2019')
)

In [None]:
pop_2019

In [None]:
# let's add a percent female column to our Table
percent_females = 100 * pop_2019.column('Females')/(pop_2019.column('Males') + pop_2019.column('Females'))
counts_and_percents = pop_2019.with_column('Percent Female', percent_females)

In [None]:
counts_and_percents
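
The percent female calculation works because array arithmetic is element-wise: each age's female count is divided by that same age's total. A small worked example with made-up counts (assumptions for illustration, not census values):

```python
import numpy as np

# Hypothetical counts for two ages (not real census values)
males = np.array([100, 300])
females = np.array([100, 100])

# Element-wise: each female count divided by the total for the same age
percent_female = 100 * females / (males + females)
```

With these numbers the first age is 50 percent female and the second is 25 percent female.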

## Line Graphs ##

A line plot is a useful way to visualize how one numerical variable changes as a function of another, such as time or age. We can create one using the `tb.plot('x_col_name', 'y_col_name')` method.


In [None]:
counts_and_percents.plot('Age', 'Percent Female')

In [None]:
pop_2019

In [None]:
pop_2019.plot('Age')

In [None]:
everyone = everyone.with_column(
    'Change', everyone.column('2019') - everyone.column('2014')
)

In [None]:
everyone.sort('Change', descending=True)

In [None]:
pop_growth = everyone.with_column(
    'Percent change', everyone.column('Change')/everyone.column('2014'))

pop_growth = pop_growth.set_format('Percent change', PercentFormatter)

pop_growth.sort('Percent change', descending=True)
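
The arithmetic behind the `'Change'` and `'Percent change'` columns can be checked on a tiny example. The arrays below are made-up numbers, not census values. Note that dividing the change by the 2014 value gives a proportion; multiplying by 100 expresses the same quantity in percent.

```python
import numpy as np

# Hypothetical population estimates for two ages (not real census values)
pop_2014 = np.array([200, 400])
pop_2019 = np.array([250, 300])

change = pop_2019 - pop_2014      # absolute change
proportion = change / pop_2014    # change relative to the 2014 value
percent = 100 * proportion        # the same quantity expressed in percent
```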


In [None]:
# plot percent change - any ideas why there are larger increases around age 72 and in the late 90s?
pop_growth.plot("AGE", "Percent change")

# actually plot as percentage rather than proportion
# pop_growth.with_column("Percent change", pop_growth.column("Percent change") * 100).plot("AGE", "Percent change")

In [None]:
age_to_examine = 72
print(2014 - age_to_examine)  # people who were 72 in 2014 were born in which year? 
print(2019 - age_to_examine)  # people who were 72 in 2019 were born in which year?  