# Tables

The next cell has some boilerplate code. You'll find it at the top of all notebooks in CS 104. Always run this cell first: it sets up our notebook environment so that the libraries and resources used in the rest of the code are available.

In [None]:
from datascience import *
from cs104 import *
import numpy as np

%matplotlib inline

## 1. Python Review

### Expressions

**Expressions** are self-contained pieces of code that evaluate to a value.

In [None]:
24

In [None]:
24 * 7

In [None]:
24 * 60 * (60 + 5 - 3 * 2)
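The expression above mixes several operators, so it helps to see how Python groups them. This is a small sketch with numbers chosen only for illustration: multiplication happens before addition and subtraction, and parentheses override that order.

```python
# Without parentheses, 3 * 2 is evaluated before the + and -.
print(60 + 5 - 3 * 2)

# Parentheses change the grouping: (60 + 5 - 3) is evaluated first, then * 2.
print((60 + 5 - 3) * 2)
```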

In [None]:
# two to the power of four: 2 * 2 * 2 * 2
2 ** 4

In [None]:
# a hash symbol (#) starts a comment
# comments allow us to make notes to ourselves and other humans
# but they are ignored by the computer

In [None]:
'hello'

### Variables

**Variables** hold values for us. We assign a value to a variable with an **assignment statement**.
An assignment statement computes the value on the right-hand side of `=` and changes the meaning of the name to the left of the `=` symbol to be that value.

Naming rules: names may contain letters, numbers, and underscores; they are case sensitive; and they cannot start with a number (by convention, they usually start with a letter).
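Here is a minimal sketch of these rules in action. The names `price`, `quantity`, `cost`, `count`, `total`, and `Total` are made up for illustration and are not used elsewhere in this notebook.

```python
# The right-hand side is computed first, then the name on the left is bound to the result.
price = 4
quantity = 3
cost = price * quantity

# Assigning again changes what the name means.
count = 10
count = count + 1   # the right-hand side uses the old value (10), so count is now 11

# Names are case sensitive: total and Total are two different variables.
total = 1
Total = 2

print(cost, count, total, Total)
```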

Calculate the number of seconds in a year.

In [None]:
60 * 60 * 24 * 365

In [None]:
seconds_per_year = 60 * 60 * 24 * 365

In [None]:
seconds_per_year

Often, we want to assign intermediate steps to variables so that they're easier to inspect and debug. 

In [None]:
seconds_per_hour = 60 * 60
hours_per_year = 24 * 365
seconds_per_year = seconds_per_hour * hours_per_year
seconds_per_year

### Functions

A **function** returns a value based on its arguments.

In [None]:
abs(-5)

In [None]:
day_temp = 52
night_temp = 47
abs(night_temp - day_temp)

In [None]:
max(3, 4, 6, -2 , 1, 0)

In [None]:
y = max(3, 4)

In [None]:
y

In [None]:
round(123.456, 1)
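Because a function call is itself an expression, its result can be passed straight into another function. The sketch below uses only built-in functions; `min` does not appear in the cells above, but it works like `max`.

```python
# The result of one call can be used as an argument to another.
largest = max(3, 4, 6, -2, 1, 0)
distance = abs(min(-7, 2))        # min returns -7, and abs turns that into 7

# round's second argument is the number of digits to keep after the decimal point.
one_digit = round(123.456, 1)
no_digits = round(123.456)        # with no second argument, round returns a whole number

print(largest, distance, one_digit, no_digits)
```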

## 2. Tables & Obama Gifts

Tables are often stored as CSV (comma-separated values) files. Have a look!
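To make "comma-separated values" concrete, here is a sketch that reads a tiny CSV with Python's standard `csv` module rather than with the `Table` methods used in this notebook. The two rows are made up for illustration and are not taken from the gifts data.

```python
import csv
import io

# Made-up rows for illustration only; NOT from the real data set.
# The first line holds the column labels; each later line is one row.
csv_text = """donor_country,value_usd
Country A,100
Country B,250
"""

reader = csv.DictReader(io.StringIO(csv_text))
rows = list(reader)

print(reader.fieldnames)   # the column labels
print(rows[0])             # one row; every value is read as text
```

Note that the `csv` module reads every value as a string, so `'250'` would need converting before we could do arithmetic with it.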

Now we'll look at data about gifts given to President Obama and his family during 2010 when he was in the White House.

The full data set can be found [here](https://raw.githubusercontent.com/tacookson/data/master/us-government-gifts/gifts.csv).

In [None]:
gifts = Table().read_table('data/obama-gifts-2010.csv')
gifts

In [None]:
gifts.show(2)

In [None]:
gifts.show(5) 

*What questions do you have about this dataset?*
- How many gifts were given? 
- Total USD? 
- Most expensive gift? 

In [None]:
gifts.num_rows

In [None]:
gifts.num_columns

### Table Operations: Selecting and Dropping Columns

Terminology: *method*

- Select a single column 
- Errors 
- Select multiple columns 
- Methods don't modify the underlying variable


We'll stick to real data as much as we can, but sometimes it is handy to start with a small subset of our whole data set to illustrate new concepts.  Our `tiny_gifts` table contains six rows from the full list of gifts, and we'll use that to introduce some key concepts.  We'll return to the original table below.

In [None]:
tiny_gifts = Table().read_table('data/tiny-obama-gifts-2010.csv')
tiny_gifts

In [None]:
tiny_gifts.select('donor_country')

Let's try again, but ask for a column not in the table... This will cause a Python error because what we're asking for is not present.

In [None]:
tiny_gifts.select('Donor Country')   # caps and no _ between the words...

The next line will also cause an error because `donor` is treated as a variable unless it is surrounded by quotes, e.g., `'donor'`, and we have not defined any variable with that name.

In [None]:
tiny_gifts.select(donor)
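The same kind of error can be reproduced without a table at all. This sketch assumes no variable called `donor` has been defined; the `try`/`except` is only there so that we can print the error message instead of stopping.

```python
# Without quotes, donor is a variable name, and no such variable exists.
try:
    donor
except NameError as err:
    message = str(err)

print(message)

# With quotes, 'donor' is a string, which is what a column label must be.
label = 'donor'
print(label)
```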

In [None]:
tiny_gifts.select("donor_country", "value_usd")

In [None]:
tiny_gifts.drop("gift_description")

### Table operations return new tables

In [None]:
tiny_gifts

Calling a method on a table (e.g., `drop`) does not change the table; it creates a new one! We need to assign the result to a variable if we want to save the results of a method applied to a Table.

In [None]:
gifts_no_description = tiny_gifts.drop("gift_description")
gifts_no_description

In [None]:
tiny_gifts  # hasn't changed!
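Python strings behave the same way, which gives us a table-free analogy. This is only a sketch of the idea using the built-in `upper` method, not anything specific to the `Table` methods.

```python
word = 'gift'
shout = word.upper()   # upper() builds and returns a new string

print(word)    # the original is unchanged
print(shout)   # the result lives in the new variable
```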

### Method Chaining

Terminology: *method chaining* is when we call a method directly on the result returned by a previous method (in a single line).

In [None]:
tiny_gifts.drop("gift_description").show(3)
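Chaining is shorthand for storing each intermediate result in a variable and then calling the next method on it. Here is a sketch of that equivalence with the built-in string methods `strip` and `lower`; the variable names are made up for illustration.

```python
raw = '  Donor Country  '

# Step by step, with an intermediate variable...
stripped = raw.strip()
step_by_step = stripped.lower()

# ...or chained: lower() is called on whatever strip() returns.
chained = raw.strip().lower()

print(step_by_step)
print(chained)
```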

### Table Operations: Sorting

In [None]:
tiny_gifts.sort('value_usd')

We can pass **named arguments** to methods for additional functionality.

In [None]:
tiny_gifts.sort('value_usd', descending=True)
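Named arguments are a general Python feature, not something special to tables. As a sketch, the built-in `sorted` function takes a named argument `reverse` that plays a role similar to `descending` here. The numbers are made up.

```python
values = [52, 47, 60]   # made-up numbers

ascending = sorted(values)                  # default order: smallest first
descending = sorted(values, reverse=True)   # the named argument changes the behavior

print(ascending)
print(descending)
```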

Let's sort by another column now. 

In [None]:
tiny_gifts.sort('donor_country')

In [None]:
tiny_gifts.sort('value_usd')

In [None]:
tiny_gifts.sort('donor_country')

How can we get the single most expensive gift for each country?

In [None]:
tiny_gifts.sort('value_usd', descending=True).sort('donor_country', distinct=True)
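Why does this two-step sort work? The plain-Python sketch below mirrors the logic, under the assumption that `distinct=True` keeps the first row it encounters for each country. The rows are made up for illustration, and this is not how the library itself is implemented.

```python
# Made-up rows for illustration; NOT taken from the gifts data.
rows = [
    {'donor_country': 'A', 'value_usd': 100},
    {'donor_country': 'B', 'value_usd': 400},
    {'donor_country': 'A', 'value_usd': 300},
    {'donor_country': 'B', 'value_usd': 50},
]

# Step 1: put the most expensive gifts first.
by_value = sorted(rows, key=lambda row: row['value_usd'], reverse=True)

# Step 2: walk the rows in that order and keep only the first row seen per country.
first_seen = {}
for row in by_value:
    country = row['donor_country']
    if country not in first_seen:
        first_seen[country] = row

# Step 3: order the surviving rows by country.
top_per_country = [first_seen[country] for country in sorted(first_seen)]

print(top_per_country)
```

Because the expensive gifts come first after step 1, the first row seen for each country is that country's most expensive gift.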

**Now back to the full data set!!!**

In [None]:
top_gifts = gifts.sort('value_usd', descending=True).sort('donor_country', distinct=True)
top_gifts

What if we sort those by value, most expensive first?

In [None]:
top_gifts.sort('value_usd', descending=True)

## 3. Williams Majors

Here's a new dataset. Let's explore how the number of students in each major has changed over time!

In [None]:
majors = Table().read_table("data/majors.csv")
majors

In [None]:
majors.sort("2018-2021", descending=True)

### Table Operations: Selecting Rows

In [None]:
div3 = majors.where("Division", are.equal_to(3))
div3

In [None]:
div3 = div3.drop("Division")
div3

### Bar Charts to Visualize Categorical Variables

We can create a bar chart directly from a table with the `barh` method.

The first argument (categorical variable) in this method will appear on the vertical axis. 

The second (numerical variable) will appear on the horizontal axis. 

In [None]:
div3.barh("Major", "2008-2012")

#### Digression: Bar Charts for Multiple Variables

In [None]:
div3.barh("Major")

### Additional where conditions

See the complete list of `are` conditions in our [Python Reference](https://www.cs.williams.edu/~cs104/auto/python-library-ref.html#sec-where).

In [None]:
majors.where("Division", are.not_equal_to(3)).sort("2018-2021", descending=True)

In [None]:
majors.where("2018-2021", are.above(30))

In [None]:
majors.where("2018-2021", are.between(10,20))

In [None]:
majors.where("Major", are.containing('ics'))
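It is easy to be off by one at the boundaries of these conditions. The plain-Python sketch below spells out our understanding of three of them: `are.above(30)` is strictly greater than 30, `are.between(10, 20)` includes 10 but excludes 20, and `are.containing('ics')` matches any label with `ics` somewhere inside it. Treat those boundary rules as assumptions to check against the Python Reference linked above. All values here are made up.

```python
counts = [5, 10, 15, 20, 30, 31]   # made-up values

above_30 = [n for n in counts if n > 30]              # like are.above(30)
between_10_20 = [n for n in counts if 10 <= n < 20]   # like are.between(10, 20)

labels = ['Physics', 'Mathematics', 'Biology']        # made-up labels
containing_ics = [s for s in labels if 'ics' in s]    # like are.containing('ics')

print(above_30)
print(between_10_20)
print(containing_ics)
```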

## What else can we learn from these data sets?