## Table of Contents

1.  <a href='#section 1'>Lists</a>

    a. <a href='#subsection 1a'> What is a list?</a> 

    b. <a href='#subsection 1b'>Index Practice</a><br><br>
    
2. <a href='#section 2'>Acquiring and cleaning data</a>

    a. <a href='#subsection 1d'>Relabeled</a><br><br> 
    
    
## Learning Objectives:
- Students will be able to make and read lists.
- Students will be able to clean up data by relabelling and using column arithmetic. 


In [None]:
from datascience import *
import numpy as np

# 1. Lists <a id='section 1'></a>

### What is a list?  <a id='subsection 1a'></a>

A list is a sequence of values (just like an array), but the values can be of different types. We recognize a list by its square brackets. Note: a list can even contain a table, whereas an array is meant to hold values that are all of the same type.

In [None]:
# Here is an example of a list
our_list = [2 + 3, 'four', 4, "3", 2, Table().with_column('Q', [3, 4])]
our_list

In [None]:
# This is an array
our_array = make_array(1, 4, 2, 6)
our_array
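
As a small sketch of the idea that a list can mix types, the cell below builds a hypothetical list (`mixed` is made up for illustration and is not part of the lab) and records the type of each element.

```python
# `mixed` is a hypothetical list made up for this illustration.
mixed = [2 + 3, 'four', 4.0, "3", True]

# Collect the name of each element's type, in order.
type_names = [type(value).__name__ for value in mixed]
print(type_names)
```

Note that `"3"` is a string while `2 + 3` is an integer, even though both look like numbers when displayed.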

<div class="alert alert-warning">
<b>Question:</b> Why might we want to use a list instead of an array? And why might we want to use an array instead of a list?
   </div>

[Insert Answer here]

<div class="alert alert-warning">
<b>Question:</b> How many elements are in our_list? [Hint: Try to use one of these: sum(your_list), max(your_list), len(your_list)]
   </div>

In [None]:
num_elements = ...

### Index Practice <a id='subsection 1b'></a>

#### LISTS

We access the elements of a list with square brackets. It is important to note that indexing starts at 0, so the first element has index 0.

In [None]:
my_list = [9 + 1.001, "100", "thousand", 0.001, 1]
my_list

In [None]:
# indexing example into lists
first_element = my_list[0]
first_element
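
To make zero-based indexing concrete, here is a minimal sketch using a hypothetical list (`colors` is made up for illustration). It shows that the last valid index is one less than the length of the list.

```python
# `colors` is a hypothetical list made up for this illustration.
colors = ["red", "green", "blue", "purple"]

first = colors[0]                 # index 0 is the first element
last = colors[len(colors) - 1]    # the last index is one less than the length

# Asking for index len(colors) goes past the end and raises an IndexError.
try:
    colors[len(colors)]
    went_past_end = False
except IndexError:
    went_past_end = True

print(first, last, went_past_end)
```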

In [None]:
order = make_array("first", "second", "third", "fourth", "fifth")

Table().with_columns(
    'Index', np.arange(5),
    "Order", order,
    "Element", my_list
)

#### ARRAYS

Arrays are indexed a little differently. To get an element of an array, we will use `.item(...)`, which also counts from 0.

In [None]:
# indexing example into arrays
our_array.item(0)
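
The sketch below compares `.item(...)` with bracket indexing on a hypothetical array. It assumes `np.array` as a stand-in for `make_array`, which also builds NumPy arrays. Both approaches find the same value, but `.item(...)` hands back a plain Python number.

```python
import numpy as np

# `sample_array` is a hypothetical array made up for this illustration.
sample_array = np.array([1, 4, 2, 6])

with_item = sample_array.item(0)    # a plain Python int
with_brackets = sample_array[0]     # a NumPy integer

print(with_item, type(with_item).__name__)
print(with_brackets == with_item)
```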

Here is a table with the index, order, and elements.

In [None]:
Table().with_columns(
    'Index', np.arange(4),
    "Order", make_array("first", "second", "third", "fourth"), 
    "Elements", our_array
)

<div class="alert alert-warning">
<b>Question:</b> Access the 3rd element in my_list. What is the data type of this element? Use brackets to index into the list.
   </div>

In [None]:
third_element = ...
third_element

In [None]:
type(...)

# 2. Acquiring and cleaning data <a id='section 2'></a>

We looked at the 2010 Census yesterday. Now we are going to load it again and go through the process of cleaning it. Cleaning data makes it more usable and easier to read. It can include identifying missing values, changing column names, or changing the data types of the elements of a column so that they are easier to work with.
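
As a rough sketch of two of these cleaning steps (changing data types and identifying missing values), the cell below uses made-up values rather than the census data. Treating an empty string as "missing" is an assumption made for this illustration.

```python
import numpy as np

# Hypothetical raw values stored as text, with one missing entry (empty string).
raw_ages = ['12', '15', '', '9']

# Convert text to numbers, marking the missing entry as NaN ("not a number").
ages = np.array([float(v) if v != '' else np.nan for v in raw_ages])

# Count how many entries are missing.
num_missing = int(np.isnan(ages).sum())
print(ages, num_missing)
```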

In [None]:
path_data = '../../../../data/'
np.set_printoptions(threshold=50)

# As of Jan 2017, this census file is online here: 
data = 'http://www2.census.gov/programs-surveys/popest/datasets/2010-2015/national/asrh/nc-est2015-agesex-res.csv'

# A local copy can be accessed here in case census.gov moves the file:
# data = path_data + 'nc-est2015-agesex-res.csv'

full_census_table = Table.read_table(data)
full_census_table

Yesterday we selected columns. Today we will also simplify the labels of the selected columns.

In [None]:
# This first line we used yesterday to select specific columns
partial_census_table = full_census_table.select('SEX', 'AGE', 'POPESTIMATE2010', 'POPESTIMATE2014')
partial_census_table

In [None]:
us_pop_relabel = partial_census_table.relabeled('POPESTIMATE2010', '2010')
us_pop_relabel

We changed the label `POPESTIMATE2010` to `2010` to make the column easier to read and use in future analysis.

<div class="alert alert-warning">
<b>Question:</b> Set us_pop to a table where both population columns are relabeled, so that <code>POPESTIMATE2014</code> becomes <code>2014</code> as well.
   </div>

In [None]:
us_pop = us_pop_relabel....
us_pop

We now have a table that is easy to work with. Each column of the table is an array of the same length, and so columns can be combined using arithmetic. Here is the change in population between 2010 and 2014.

In [None]:
change = us_pop.column('2014') - us_pop.column('2010')
change
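
To see what arithmetic on two arrays of the same length does, here is a small sketch with made-up numbers (`pop_start` and `pop_end` are hypothetical and are not census values). The subtraction pairs up elements by position.

```python
import numpy as np

# Hypothetical population counts, made up for this illustration.
pop_start = np.array([100, 250, 400])
pop_end = np.array([110, 240, 400])

# Subtraction pairs up elements by position: 110-100, 240-250, 400-400.
difference = pop_end - pop_start
print(difference)
```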

Let us augment `us_pop` with columns that contain these changes, both in absolute terms and as a proportion of the value in 2010. Notice the `Percent Change` column, which we can reformat as a percentage using `PercentFormatter`.

In [None]:
#run this cell
census = us_pop.with_columns(
    'Change', change,
    'Percent Change', change / us_pop.column('2010')
)
census

In [None]:
census.set_format('Percent Change', PercentFormatter)
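
The arithmetic behind the `Percent Change` column can be sketched on made-up numbers. This example uses ordinary Python string formatting as a stand-in for `PercentFormatter`, and the names `pop_start` and `pop_end` are hypothetical.

```python
import numpy as np

# Hypothetical population counts, made up for this illustration.
pop_start = np.array([100, 250, 400])
pop_end = np.array([110, 240, 400])

# Change relative to the starting value, as a proportion.
proportion_change = (pop_end - pop_start) / pop_start

# Display each proportion as a percentage with two decimal places.
percent_labels = [f"{float(p):.2%}" for p in proportion_change]
print(percent_labels)
```

The underlying values stay as proportions (such as 0.1). Only the way they are displayed changes.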