<div style="width: 38.5%;">
    <p><strong>City College of San Francisco</strong></p>
    <hr>
    <p>MATH 108 - Foundations of Data Science</p>
</div>

# Lecture 05: Building Tables

Associated Textbook Sections: [5.0, 5.1, 5.2, 5.3](https://inferentialthinking.com/chapters/05/Sequences.html)

<h2>Set Up the Notebook</h2>

In [None]:
from datascience import *
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')

## Arrays

### Demo: Columns of Tables are Arrays

Import the out-of-date data on skyscrapers from `skyscrapers_v2.csv` as a table and explore the San Francisco content of that table.

In [None]:
skyscrapers = Table.read_table('data/skyscrapers_v2.csv')
skyscrapers

In [None]:
...

Add the Salesforce Tower to the table using [the information from Wikipedia](https://en.wikipedia.org/wiki/List_of_tallest_buildings_in_San_Francisco) and observe how adding just one very tall building can drastically change the average height. 

The method used below uses a Python object called a list, as indicated by the square brackets. We will discuss these later.


In [None]:
...
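
The effect of one very tall building on the average can be sketched with made-up numbers. The heights below are hypothetical and are not taken from `skyscrapers_v2.csv`. The sketch uses `np.append` rather than the list-based method from the demo, so treat it as an illustration of the arithmetic only.

```python
import numpy as np

# Hypothetical heights in meters (NOT values from skyscrapers_v2.csv).
heights = np.array([150, 160, 170, 180])
mean_before = np.mean(heights)

# Add one much taller, also hypothetical, building and recompute the mean.
heights_with_tall = np.append(heights, 340)
mean_after = np.mean(heights_with_tall)

print(mean_before, mean_after)
```

A single new value raises the mean of these five buildings well above the height of four of them.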

## Ranges

### Ranges

* A range is an array of consecutive numbers
    * `np.arange(stop)`: An array of increasing integers from `0` up to, but not including, `stop`
    * `np.arange(start, stop)`: An array of increasing integers from `start` up to, but not including, `stop`
    * `np.arange(start, stop, step)`: An array from `start` up to, but not including, `stop`, in which consecutive values differ by `step`
* The range always includes `start` but excludes `stop`
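
A short sketch of the three forms of `np.arange` listed above. The variable names are chosen for illustration only.

```python
import numpy as np

# One argument: start at 0 and stop before 5.
first_five = np.arange(5)          # 0, 1, 2, 3, 4

# Two arguments: start at 3 and stop before 8.
three_to_seven = np.arange(3, 8)   # 3, 4, 5, 6, 7

# Three arguments: start at 0, stop before 10, step by 3.
by_threes = np.arange(0, 10, 3)    # 0, 3, 6, 9

print(first_five, three_to_seven, by_threes)
```

In each case the `stop` value itself is left out of the array.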

### Demo: Ranges

Make an array explicitly using `make_array` and then construct it and several others using `np.arange`. Access items in an array using the `item` method. Explore the documentation for `np.arange` and force some common errors.

If you have experience with Python arrays, keep in mind that this course uses the `item` method rather than the bracket notation `[]` to access items in an array. Bracket notation may trigger an error in the auto-grader on assignments.

In [None]:
...
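
As a minimal sketch of item access, the example below builds a small range and reads values from it with `item`. It also forces one common error, asking for an index that does not exist. The array name `evens` is chosen for illustration only.

```python
import numpy as np

evens = np.arange(0, 10, 2)   # 0, 2, 4, 6, 8

first = evens.item(0)   # item at index 0
third = evens.item(2)   # item at index 2

# A common error: asking for an index past the end of the array.
try:
    evens.item(5)
    out_of_range = False
except IndexError:
    out_of_range = True

print(first, third, out_of_range)
```

Indices start at 0, so an array with five items has valid indices 0 through 4.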

### Ways to Create a Table

* `Table.read_table(filename)` - reads a table from a csv file
* `Table()` - an empty table
* and… `select`, `drop`, `where`, `sort` and a few other table methods all create new tables
* The table methods `with_column` and `with_columns` add column(s) to the end of a table and return a new table.

### Demo: Creating a Table from Scratch

Create a table containing information on the major east-west streets north of the Ocean campus and how far they are from campus. Notice that most table methods create a new table rather than modifying the original one.

[Google Maps near CCSF - Ocean Campus](https://goo.gl/maps/QVR57VvqKWqLeSA9A)

In [None]:
...

You can also access several properties of a table using dot notation, such as `labels`, `num_columns`, and `num_rows`.

In [None]:
...

## Example: W. E. B. Du Bois

Image Source: [Wikipedia - W. E. B. Du Bois](https://en.wikipedia.org/wiki/W._E._B._Du_Bois)

<a href="https://en.wikipedia.org/wiki/W._E._B._Du_Bois"><img src="./img/WEB_DuBois_1918.jpeg" width = 12%></a>

**The content of the following podcast, video, and images contains references to slavery, lynching, and the historical use of the word negro.**

* Scholar, historian, activist, and data scientist
    > "The Philadelphia Negro was the first scientific study of race in the world. [...] the first non-racist investigation of a non-white population in the world. [...] one of the first social scientific written in the U.S. using the advanced statistical methods of the time." - Dr. Tukufu Zuberi, Professor of Race Relations at the University of Pennsylvania (Source: [A Legacy of Courage: W.E.B. Du Bois and the Philadelphia Negro](https://youtu.be/PQX_0uyDgGw))
* First Black American to receive a PhD from Harvard
* NAACP founder
* Made a series of visualizations for the 1900 Paris Exposition
    * Goal: change the way people see Black Americans
    * Hundreds of photographs and patents
    * 60+ handmade graphs in 3 months
    > "All art is propaganda, and ever must be, despite the wailing of the purists. I stand in utter shamelessness and say that whatever art I have for writing has been used always for propaganda for gaining the right of black folk to love and enjoy. I do not care a damn for any art that is not used for propaganda." - W.E.B. Du Bois
* Typically compared with Booker T. Washington.

The following podcast provides an 11-minute overview of these two leaders.

In [None]:
# The content contains references to slavery, lynching, and the historical use of the word negro.
from IPython.display import IFrame
IFrame('https://open.spotify.com/embed/episode/6MdipyUuPK2bbXF0n2CYA1?utm_source=generator', width=500, height=350)

### Images from Paris Exposition

Image Sources:
* [Smithsonian Magazine - W.E.B. Du Bois’ Visionary Infographics Come Together for the First Time in Full Color](https://www.smithsonianmag.com/history/first-time-together-and-color-book-displays-web-du-bois-visionary-infographics-180970826/#)
* [WBUR - W.E.B. Du Bois Created These Infographics In 1900 To Humanize The African-American Experience](https://www.wbur.org/news/2019/02/21/web-du-bois-infographics-humanity-african-american)

<img src="//cdn.thinglink.me/api/image/1119379323288027138/1024/10/scaletowidth#tl-1119379323288027138;1750075619" width=32%>

<img src="//cdn.thinglink.me/api/image/1119380866397634562/1024/10/scaletowidth#tl-1119380866397634562;1750075619" width = 32%>

<img src="./img/WEB_DuBois_income_and_expenditure.jpeg" width = 50%>

### Demo: Reading a Table from a File

Read the `du_bois.csv` data as a table.

In [None]:
du_bois = Table.read_table('data/du_bois.csv')
du_bois

Find the income bracket (`CLASS`) that spent the highest percentage of their income on rent.

In [None]:
...

Explore the table using `select` and `column`.

In [None]:
...

Add a column to the table showing the dollar amount for food based on the presented average.

In [None]:
...
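
The arithmetic behind a dollar-amount column can be sketched with plain arrays: multiply each average income by the fraction spent on food. The numbers below are hypothetical and are not values from `du_bois.csv`.

```python
import numpy as np

# Hypothetical average incomes in dollars (NOT values from du_bois.csv).
average_income = np.array([100.0, 300.0, 800.0])

# Hypothetical fractions of income spent on food.
food_fraction = np.array([0.5, 0.25, 0.125])

# Element-wise multiplication pairs each income with its own fraction.
food_dollars = average_income * food_fraction

print(food_dollars)
```

Because array arithmetic is element-wise, the result has one dollar amount per row and can be added to a table with `with_column`.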

Format the `Food` column as a percent using the `set_format` method.

In [None]:
...

### Table Methods

* Creating and extending tables: `Table().with_column` and `Table.read_table`
* Finding the size: `num_rows` and `num_columns`
* Referring to columns: labels, relabeling, and indices 
    * `labels` and `relabeled`; column indices start at 0
* Accessing data in a column: `column` takes a label or index and returns an array
* Using array methods to work with data in columns: `item`, `sum`, `min`, `max`, and so on
* Creating new tables containing some of the original columns: `select`, `drop`

### Demo: Selecting Data in a Column

Run the following commands to explore the `movies_by_year_with_ticket_price.csv` data set using several of the methods used so far. Also, preview the `plot` method.

In [None]:
movies = Table.read_table('data/movies_by_year_with_ticket_price.csv')
movies.show()

In [None]:
gross_in_dollars = movies.column('Total Gross') * 1e6
tix_sold = gross_in_dollars / movies.column('Average Ticket Price')

In [None]:
movies = movies.with_column('Tickets sold', tix_sold)

In [None]:
movies.show(4)

In [None]:
movies.set_format('Tickets sold', NumberFormatter)

In [None]:
movies.plot('Year', 'Tickets sold')

In [None]:
movies.where('Year', are.between(2000, 2005))

In [None]:
movies.where('Year', 2002)

In [None]:
movies.where('Year', are.equal_to(2002))

In [None]:
movies.where('#1 Movie', are.containing('Harry Potter'))

In [None]:
movies.take(np.arange(2, 5))

<footer>
    <hr>
    <p>Adapted from UC Berkeley DATA 8 course materials.</p>
    <p>This content is offered under a <a href="https://creativecommons.org/licenses/by-nc-sa/4.0/">CC Attribution Non-Commercial Share Alike</a> license.</p>
</footer>