# Lecture 6 – Table Fundamentals

### Data 6, Summer 2022

In [None]:
# Just run me
from datascience import *
import numpy as np

## Introduction to Tables

Tables allow us to organize data in a systematic and easy-to-work-with way. Each table consists of **columns**, which represent variables, and **rows**, each of which represents one individual or observation.

Most of our datasets will be stored in `.csv` files (CSV stands for "Comma Separated Values"), which we will _import_ into our notebook using the `Table.read_table(...)` function. Here, `Table` is part of the `datascience` library, which is the main library we will be using to work with tables.

We can load the same dataset of California public universities from the first lecture by passing in the _filepath_ string that says where our `.csv` file sits in our computer's folder structure. (Don't worry, you don't need to know how this works.)

In [None]:
schools = Table.read_table('data/cal_unis.csv')
schools

One of the first things we often want to know about our data or table is how big it is. We can use the `tbl.num_rows` and `tbl.num_columns` properties to find that out.

In [None]:
... # Find the number of rows in the `schools` table

In [None]:
... # Find the number of columns in the `schools` table

We will take a subset of the first five schools in the table for illustration purposes.

In [None]:
# Just run me
some_schools = schools.take(np.arange(5))
some_schools

Each column in a table is an **array**, which is useful when we want to perform arithmetic on entire columns. We can extract a particular column with the `tbl.column(...)` method. Note that when we talk about table methods in the `datascience` library, we will use `tbl` to refer to the name of a general table. When using these table methods, remember to replace `tbl` with the name of the table you're working with.

In [None]:
... # Return an array containing the city of each school in `some_schools`

In [None]:
... # Return an array of the city of each school using a column index

### Quick Check 1

In [None]:
states = Table.read_table('data/us-state-capitals.csv')

In [None]:
states

What should we pass into `.column()` in order to get the latitudes of each state capital as an array?

In [None]:
states.column(...) # Replace the three dots with your answer

Since `.column()` returns an array, can we use table properties on the result of a `.column()` call? 

_(Hint: Do table methods/properties work on any other data types besides tables?)_

In [None]:
states.column(...).num_rows # See what happens when you replace the three dots with your answer from above

## `select` and `drop`

A common workflow when working with tables is to **import** the table, **identify** relevant columns, and then make a **new table** with only the columns we want to work with. The `.select()` and `.drop()` table methods allow us to do just that. Notice how both methods achieve the same result, just by slightly different means.

In [None]:
some_schools

In [None]:
... # Select only the columns 'Name' and 'Enrollment'

In [None]:
... # Drop columns so that you are left with only 'Name' and 'Enrollment'

**Remember** that table methods such as `.select()` and `.drop()` return a **new table**, so the original `some_schools` table is not modified!

In [None]:
some_schools

## Adding Columns

Another thing we might want to do with a table is add columns that provide additional information. We can use the `tbl.with_columns()` method to add columns to an existing table.

In [None]:
some_schools

In [None]:
# Add a column with the nicknames for each of the five schools (Cal, UCD, UCI, UCLA, UCM)
some_schools.with_columns(
    ...
)

In [None]:
# Add two columns to `some_schools`: one with the nickname for the school and the other for how old the school is
some_schools.with_columns(
    ...
)

### Creating tables from scratch

We can also use `tbl.with_columns()` to make an entirely new table from scratch. Notice that `Table()` creates a blank table. This is what we will write in front of `.with_columns()` instead of `tbl`.

In [None]:
Table()

In [None]:
type(Table())

In [None]:
states = Table().with_columns(
    'State', np.array(['California', 'New York', 'Florida', 'Texas', 'Pennsylvania']),
    'Code', np.array(['CA', 'NY', 'FL', 'TX', 'PA']),
    'Population', np.array([39.3, 19.3, 21.7, 29.3, 12.8])
)
states

### Quick Check 2

Given the table `states`, fill in the blanks in the second cell to create a new table that corresponds to the following table:

| State | Code | FedVote |
| --- | --- | --- |
| California | CA | D |
| New York | NY | D |
| Florida | FL | R |
| Texas | TX | R |
| Pennsylvania | PA | D |

In [None]:
states

In [None]:
# Fill in the three blanks to replicate the table above with each state's federal vote
states._____('Population').with_columns(
    ____, ____
)

## Filtering with `.where`

The `tbl.where()` method allows us to filter the table to only the rows that match a certain condition. For right now, the syntax we will use is `tbl.where(label, value)`, where `label` is the column you are filtering by and `value` is the value you want to match.

In [None]:
... # Filter the `schools` table to only include UC schools

In [None]:
... # Filter the `schools` table to only the schools in Los Angeles

We will learn more complicated uses of `.where()` later, but for now just remember this specific syntax.

## Additional methods

Here are some additional table methods that are also useful.

### `show`

`tbl.show(n)` displays the first `n` rows of `tbl`. If `n` is not specified, it will display the entire table.

In [None]:
... # Show the first 3 rows of the `schools` table

In [None]:
schools.show()

### `labels`

The `tbl.labels` property returns a tuple (basically a list) of the column labels.

In [None]:
schools.show(5)

In [None]:
# The result is a "tuple" – think of it as a basic list
schools.labels

You can also rename a column in your table using the `tbl.relabeled()` method.

In [None]:
schools.relabeled('Name', 'University').show(5)

## Example: WNBA data

We can use this dataset of statistics from the 2020 WNBA season to get more practice working with tables.

In [None]:
wnba = Table.read_table('data/wnba-2020.csv')
wnba

In [None]:
..., ... # Find the number of rows and number of columns in the table

In [None]:
wnba_pts = ... # Create a table of only the following columns:
               # 'Player', 'Tm', 'Pos', 'G', 'PTS'
               # Name this table `wnba_pts`

In [None]:
wnba_pts

In [None]:
... # Compute the number of points scored per game (Points Per Game)

In [None]:
# Add a Points Per Game column to the existing `wnba_pts` table
wnba_pts = wnba_pts.with_columns(
    ...
)

In [None]:
wnba_pts