<div style="width: 38.5%;">
    <p><strong>City College of San Francisco</strong></p>
    <hr>
    <p>MATH 108 - Foundations of Data Science</p>
</div>

# Lecture 07: Building Tables

Associated Textbook Sections: [5.0, 5.1, 5.2, 5.3](https://inferentialthinking.com/chapters/05/Sequences.html)

---

## Overview

* [Creating Tables](#Creating-Tables)
* [Columns and Rows of Tables](#Columns-and-Rows-of-Tables)
* [Attribute Types](#Attribute-Types)
* [Exploring the Tallest Buildings](#Exploring-the-Tallest-Buildings)
* [Exploring Movies](#Exploring-Movies)

---

## Set Up the Notebook

In [None]:
from datascience import *
import numpy as np

---

## Creating Tables

Here are *some* of the ways that you will create a table in this class:
* `Table.read_table(filename)` - a table from a CSV file
* `Table()` - an empty table
* `select`, `drop`, `where`, `sort`, etc. - a table from existing tables
* `with_column` and `with_columns` - a table from an existing table with additional columns
* `with_row` and `with_rows` - a table from an existing table with additional rows

---

### Demo: Using `read_table`

As of February 2024, the tallest buildings in the United States (according to Wikipedia) **should** be stored in the file `tallest_buildings.csv`. This file is located in the same folder as this Jupyter Notebook.

In [None]:
tallest_buildings = ...
tallest_buildings

---

### Demo: Tables from Tables

Most (but not all) table methods return a new table and leave the original table unchanged.

In [None]:
tallest_buildings.select('Name', 'Height (ft)')

In [None]:
tallest_buildings

In [None]:
name_height = tallest_buildings.select('Name', 'Height (ft)')
name_height

In [None]:
buildings_above_1500 = tallest_buildings.where('Height (ft)', are.above_or_equal_to(1500))
buildings_above_1500

---

## Columns and Rows of Tables

* Columns:
    * Labeled NumPy arrays
    * Column labels are strings
    * All column values have the same data type
    * `t.select(column_labels_or_indexes)` - creates a table with the specified columns of table `t`.
    * `t.column(column_label_or_index)` - returns an array containing the values in the specified column
* Rows:
    * `Row` data type (similar to a `list` whose items have labels)
    * `t.take(row_indexes)` - creates a table with the specified rows
    * `t.row(row_index)` - returns a single `Row` object for the row at the specified index

---

### Demo: Rows and Columns

Compare the `row` and `take` table methods.

In [None]:
...

In [None]:
...

In [None]:
...

In [None]:
...

---

Use an array of row indexes with `take` to select more than one row.

In [None]:
tallest_buildings.take(np.arange(5))

---

Compare the `select` and `column` table methods.

In [None]:
...

In [None]:
...

---

### Demo: Creating a Table from Scratch

Create a table containing information on the major east-west streets north of the Ocean campus and how far they are from campus.

[Google Maps near CCSF - Ocean Campus](https://goo.gl/maps/QVR57VvqKWqLeSA9A)

In [None]:
from IPython.display import IFrame

# Embed the Google Maps view of the streets around the Ocean campus
map_url = ('https://www.google.com/maps/embed?pb=!1m10!1m8!1m3!1d6311.10985715617'
           '7!2d-122.4451173!3d37.7301236!3m2!1i1024!2i768!4f13.1!5e0!3m2!1sen!2sus!4v1675'
           '197446609!5m2!1sen!2sus')
IFrame(map_url, width=600, height=450)

In [None]:
streets = make_array('Judson', 'Staples', 'Flood', 'Hearst')
streets

In [None]:
Table()

In [None]:
northside = ...
northside

In [None]:
...

In [None]:
northside = ...
northside

In [None]:
...

In [None]:
northside

In [None]:
northside = ...
northside

---

Update `northside` to include Monterey by adding it using `with_row`.

In [None]:
monterey_data = ...
monterey_data

In [None]:
northside = ...
northside

---

Add multiple columns to a table at once using `with_columns`.

In [None]:
streets = make_array('Judson', 'Staples', 'Flood', 'Hearst', 'Monterey')
blocks = np.arange(5)
northside_again = ...
northside_again

---

## Attribute Types


---

### Types of Attributes

All values in a column of a table should have the same type and be comparable to each other in some way:
* **Numerical** --- Each value is from a numerical scale
    * Numerical measurements are ordered
    * Differences are meaningful
* **Categorical** --- Each value is from a fixed inventory
    * May or may not have an ordering
    * Categories are the same or different


---

### “Numerical” Attributes

Sometimes numbers represent categorical data:
* 94112 and 94110 are San Francisco ZIP codes
* Subtracting 94110 from 94112 doesn't yield a meaningful value
* ZIP codes are categorical, even though numbers were used for the categories

---

## Exploring the Tallest Buildings

---

### Tallest Buildings Attributes

In [None]:
tallest_buildings

* `'Name'` and `'City'` are categorical attributes
* The other attributes are numerical

---

### Summarizing Height

Remember that `select` and `column` produce different data types: `select` returns a table, while `column` returns an array.

In [None]:
tallest_buildings.select('Height (ft)')

In [None]:
tallest_buildings.column('Height (ft)')

---

### Average

* The average is one way to summarize numerical data like the heights of buildings.
* Use `np.average` to calculate the average of an array of values.
* Another name that we will use for average is `mean`.
* `np.mean` calculates the mean of an array of values.
* For our class, we will use average and mean interchangeably.
* The `np.average` function also calculates weighted averages.

---

#### Demo: Average

Calculate the average height of the buildings in the data set.

In [None]:
heights = ...
heights

In [None]:
average_height = ...
average_height

---

### Median

* The median can also be used to summarize numerical data like the heights of buildings.
* The median is:
    * The middle value of an odd number of sorted data points
    * The average of the two middle values of an even number of sorted data points
* Use `np.median` to calculate the median of an array of numerical values.

---

#### Demo: Median

Calculate the median height of the buildings in the table.

In [None]:
median_height = ...
median_height

---

### The Salesforce Tower

The Salesforce Tower in San Francisco is one of the tallest buildings in the country. See if it is in the `tallest_buildings` table.

In [None]:
...

---

There must have been a mistake in creating the `tallest_buildings.csv` data file: the Salesforce Tower is on [Wikipedia's page](https://en.wikipedia.org/wiki/List_of_tallest_buildings_in_the_United_States), but not in the table. Create a new table that includes all the information in `tallest_buildings` and the Salesforce Tower information.

In [None]:
tallest_buildings_with_SFT = ...
tallest_buildings_with_SFT.where('City', 'San Francisco')

---

## Exploring Movies

---

### Loading the Data

Explore the `top_movies.csv` data set that we generated from [www.the-numbers.com/market](https://www.the-numbers.com/market/). Try out several of the table methods and attributes seen so far.

In [None]:
movies = Table.read_table('top_movies.csv')
movies

---

### Data Attributes

* `'Year'`, `'Total for Year ($)'`, `'Total in 2022 dollars'`, and `'Tickets Sold'` are numerical attributes
* The other attributes are categorical

---

### Demo: Explore Movies

Calculate the average ticket price and add that information to the table.

In [None]:
total_gross = ...
number_of_tickets_sold = ...
ave_ticket_price = ...
ave_ticket_price

In [None]:
movies = ...
movies

---

Practice filtering the data using the `where` and `take` table methods.

In [None]:
...

In [None]:
...

In [None]:
...

In [None]:
...

In [None]:
sorted_movies = ...
top_5_movies = ...
top_5_movies

<footer>
    <hr>
    <p>Adapted from UC Berkeley DATA 8 course materials.</p>
    <p>This content is offered under a <a href="https://creativecommons.org/licenses/by-nc-sa/4.0/">CC Attribution Non-Commercial Share Alike</a> license.</p>
</footer>