# Lecture 4 - Table Fundamentals

### Data 6

```python
from datascience import *
import numpy as np
```

<hr style="border: 5px solid #003262;" />
<hr style="border: 1px solid #fdb515;" />

## Challenge: WWPD (What Will Python Do)?

Exercise 1:

```python
str_arr = make_array("bun",
                     "apricot",
                     "Durian",
                     "cantaloupe")
np.sort(str_arr)
```

Exercise 2:

```python
pop_by_year = make_array(23, 45, 93, 101, 118) # in millions
np.diff(pop_by_year)
```

Exercise 3a, 3b:

```python
int_arr = make_array(2, 0, 4, -3)
np.sort(int_arr)
```

```python
-1 * np.sort(-1 * int_arr)
```

<hr style="border: 5px solid #003262;" />
<hr style="border: 1px solid #fdb515;" />

## Introduction to Tables

Tables allow us to organize data in a systematic, easy-to-work-with way. Each table consists of **columns**, which represent variables, and **rows**, each of which represents one individual or observation.

Most of our datasets will be stored in `.csv` files (**CSV** stands for "Comma Separated Values").
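To see what "comma separated" means in practice, here is a minimal sketch that uses Python's built-in `csv` module (not the `datascience` library) on the header and two rows copied from our file. The quotation marks around a name like `"University of California, Berkeley"` tell the parser that the comma inside belongs to the value rather than starting a new column.

```python
import csv
import io

# A tiny CSV held in a string; the first line holds the column labels
raw = '''Name,Institution,City,County,Enrollment,Founded
"University of California, Berkeley",UC,Berkeley,Alameda,45307,1869
"University of California, Davis",UC,Davis,Yolo,39679,1905
'''

# csv.reader splits each line on commas, but respects quoted values
rows = list(csv.reader(io.StringIO(raw)))
labels = rows[0]   # first row: column labels
data = rows[1:]    # remaining rows: one school per row

print(labels)
print(data[0])
```

Note that `csv.reader` leaves every value as a string (for example `'45307'`); `Table.read_table` takes care of turning numeric columns such as `Enrollment` into numbers for us.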

We will _import_ CSVs into our notebook using the `Table.read_table(...)` function. Here, `Table` is part of the `datascience` library, which is the main library we will be using to work with tables.

We can load a dataset of [public universities in California](https://en.wikipedia.org/wiki/List_of_colleges_and_universities_in_California) by passing in the _filepath_ string that says where our `.csv` file is in our computer's folder structure. (Don't worry, you don't need to know how this works.)

```python
schools = Table.read_table('data/cal_unis.csv')
schools
```

| Name | Institution | City | County | Enrollment | Founded |
| --- | --- | --- | --- | --- | --- |
| California State Polytechnic University, Humboldt | CSU | Arcata | Humboldt | 6025 | 1913 |
| California State University, Bakersfield | CSU | Bakersfield | Kern | 9613 | 1965 |
| University of California, Berkeley | UC | Berkeley | Alameda | 45307 | 1869 |
| California State University Channel Islands | CSU | Camarillo | Ventura | 6128 | 2002 |
| California State University, Dominguez Hills | CSU | Carson | Los Angeles | 16426 | 1960 |
| California State University, Chico | CSU | Chico | Butte | 14183 | 1887 |
| University of California, Davis | UC | Davis | Yolo | 39679 | 1905 |
| California State University, Fresno | CSU | Fresno | Fresno | 23999 | 1911 |
| California State University, Fullerton | CSU | Fullerton | Orange | 40386 | 1957 |
| California State University, East Bay | CSU | Hayward | Alameda | 13673 | 1957 |


### Find Table Dimensions ("first things first")

One of the first things we often want to know about our data or table is how big it is.

```python
schools.num_rows # Find the number of rows in the `schools` table
```

```python
schools.num_columns # Find the number of columns in the `schools` table
```

`tbl.labels` returns the labels for each of the columns:

```python
# The result is a "tuple"; think of it as a basic list
schools.labels
```

### Column-first paradigm: Columns are arrays

Each column in a table is an **array**, which is useful when we want to perform arithmetic on entire columns. We can extract a particular column with the `tbl.column(...)` method, where `tbl` is the name of some table.
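Because a column is an array, one arithmetic expression is applied to every element at once. A small sketch with plain NumPy, using the `Founded` values of the first three schools typed in by hand (in the notebook you would get them from `schools.column('Founded')`):

```python
import numpy as np

# Founding years of Humboldt, Bakersfield, and Berkeley (first three rows)
founded = np.array([1913, 1965, 1869])

# Subtracting an array from a number works element by element
years_since = 2025 - founded
print(years_since)
```

No loop is needed: the single subtraction produces one result per school.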

```python
schools.column('City') # Return an array containing the city of each school in `schools`
```

```python
schools.column(2) # Column indices start at 0, so index 2 is 'City'
```

#### `select` and `drop`

A common workflow when working with tables is to **import** the table, **identify** relevant columns, and then make a **new table** with only the columns we want to work with. The `.select()` and `.drop()` table methods allow us to do just that: `select` keeps the columns you list, while `drop` removes the columns you list. Used with complementary lists of labels, both produce the same table.

```python
schools
```

```python
type(schools.column('City'))
```

```python
type(schools.select('City')) # select returns a Table, even for a single column
```

```python
# Drop columns so that only 'Name' and 'Enrollment' are left;
# schools.select('Name', 'Enrollment') gives the same table
schools.drop('Founded', 'County', 'Institution', 'City')
```

**Remember** that table methods like `select`, `drop`, `with_columns`, `where`, and `relabeled` return a **new table**, so the original `schools` table is not modified!
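One way to picture `select` and `drop` is a plain Python dictionary that maps each column label to its list of values. The helper functions `select_cols` and `drop_cols` below are hypothetical, written only for this sketch; they are not how the `datascience` library is implemented. They show that keeping some columns and dropping the others give the same result, and that building a new dictionary leaves the original untouched.

```python
# A toy "table": column label -> list of values (first two schools only)
toy = {
    "Name": ["California State Polytechnic University, Humboldt",
             "California State University, Bakersfield"],
    "Institution": ["CSU", "CSU"],
    "City": ["Arcata", "Bakersfield"],
    "Enrollment": [6025, 9613],
}

def select_cols(table, *labels):
    """Hypothetical helper: new dict with only the listed columns."""
    return {label: table[label] for label in labels}

def drop_cols(table, *labels):
    """Hypothetical helper: new dict with every column except the listed ones."""
    return {label: values for label, values in table.items() if label not in labels}

kept = select_cols(toy, "Name", "Enrollment")
remaining = drop_cols(toy, "Institution", "City")

print(kept == remaining)  # True: both keep only Name and Enrollment
print(list(toy))          # the original dict still has all four labels
```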

```python
schools
```

#### Adding Columns

Another thing we might want to do with a table is add columns that provide additional information. The `tbl.with_columns()` method returns a new table containing all of the existing columns plus the ones we add.

```python
schools.show(2)
```

| Name | Institution | City | County | Enrollment | Founded |
| --- | --- | --- | --- | --- | --- |
| California State Polytechnic University, Humboldt | CSU | Arcata | Humboldt | 6025 | 1913 |
| California State University, Bakersfield | CSU | Bakersfield | Kern | 9613 | 1965 |


Make a table with two new columns:
1. "Years since Founding" (as of 2025)
1. "County (Full)": the county name with " County" appended

For example, `Humboldt` becomes `Humboldt County`, and `Kern` becomes `Kern County`.

```python
# Add both new columns in a single with_columns call
schools.with_columns(
    'Years since Founding', 2025 - schools.column('Founded'),
    'County (Full)', schools.column("County") + " County"
)
```

| Name | Institution | City | County | Enrollment | Founded | Years since Founding | County (Full) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| California State Polytechnic University, Humboldt | CSU | Arcata | Humboldt | 6025 | 1913 | 112 | Humboldt County |
| California State University, Bakersfield | CSU | Bakersfield | Kern | 9613 | 1965 | 60 | Kern County |
| University of California, Berkeley | UC | Berkeley | Alameda | 45307 | 1869 | 156 | Alameda County |
| California State University Channel Islands | CSU | Camarillo | Ventura | 6128 | 2002 | 23 | Ventura County |
| California State University, Dominguez Hills | CSU | Carson | Los Angeles | 16426 | 1960 | 65 | Los Angeles County |
| California State University, Chico | CSU | Chico | Butte | 14183 | 1887 | 138 | Butte County |
| University of California, Davis | UC | Davis | Yolo | 39679 | 1905 | 120 | Yolo County |
| California State University, Fresno | CSU | Fresno | Fresno | 23999 | 1911 | 114 | Fresno County |
| California State University, Fullerton | CSU | Fullerton | Orange | 40386 | 1957 | 68 | Orange County |
| California State University, East Bay | CSU | Hayward | Alameda | 13673 | 1957 | 68 | Alameda County |


```python
# Same result, built by chaining two with_columns calls
schools.with_columns(
    'Years since Founding', 2025 - schools.column('Founded')
).with_columns(
    'County (Full)', schools.column("County") + " County"
)
```

| Name | Institution | City | County | Enrollment | Founded | Years since Founding | County (Full) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| California State Polytechnic University, Humboldt | CSU | Arcata | Humboldt | 6025 | 1913 | 112 | Humboldt County |
| California State University, Bakersfield | CSU | Bakersfield | Kern | 9613 | 1965 | 60 | Kern County |
| University of California, Berkeley | UC | Berkeley | Alameda | 45307 | 1869 | 156 | Alameda County |
| California State University Channel Islands | CSU | Camarillo | Ventura | 6128 | 2002 | 23 | Ventura County |
| California State University, Dominguez Hills | CSU | Carson | Los Angeles | 16426 | 1960 | 65 | Los Angeles County |
| California State University, Chico | CSU | Chico | Butte | 14183 | 1887 | 138 | Butte County |
| University of California, Davis | UC | Davis | Yolo | 39679 | 1905 | 120 | Yolo County |
| California State University, Fresno | CSU | Fresno | Fresno | 23999 | 1911 | 114 | Fresno County |
| California State University, Fullerton | CSU | Fullerton | Orange | 40386 | 1957 | 68 | Orange County |
| California State University, East Bay | CSU | Hayward | Alameda | 13673 | 1957 | 68 | Alameda County |


### Filtering with `.where`

The `tbl.where()` method filters the table down to only the rows that match a certain condition. For now, the syntax we will use is `tbl.where(label, value)`, where `label` is the column you are filtering by and `value` is the value its entries must equal.
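Conceptually, this kind of filter compares a column with a value and keeps the rows where the comparison is `True`. A minimal NumPy sketch of that idea, using the `Name` and `Institution` values of three schools typed in by hand; it illustrates the concept and is not necessarily how `where` is implemented internally.

```python
import numpy as np

names = np.array(["California State University, Bakersfield",
                  "University of California, Berkeley",
                  "University of California, Davis"])
institutions = np.array(["CSU", "UC", "UC"])

# Comparing an array to a single value gives one True/False per row
is_uc = institutions == "UC"
print(is_uc)

# Indexing with that boolean array keeps only the positions that are True
print(names[is_uc])
```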

```python
schools.where("Institution", "UC") # Filter the `schools` table to only include UC schools
```

```python
# Template: replace each ... to filter rows, keep some columns, then drop columns
schools.where(...).select(...).drop(...)
```

```python
schools.where("City", "Los Angeles") # Filter the `schools` table to only the schools in Los Angeles
```

We will learn more complicated uses of `.where()` later, but for now just remember this specific syntax.

### Method Chaining

**Method chaining** in Python is when the object returned from one method becomes the object to use in the next method.

Below, `Table()` creates a new empty table, and `.with_columns(...)` is called directly on that result to add three columns.

```python
states = Table().with_columns(
    'State', np.array(['California', 'New York', 'Florida', 'Texas', 'Pennsylvania']),
    'Code', np.array(['CA', 'NY', 'FL', 'TX', 'PA']),
    'Population (millions)', np.array([39.3, 19.3, 21.7, 29.3, 12.8])
)
states
```
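The same idea works with any Python object whose methods return a new object. A small sketch with built-in string methods: each call returns a new string, and the next method is called on that result.

```python
messy = "  university of california, davis  "

# strip() -> title() -> replace(): each method runs on the previous result
clean = messy.strip().title().replace("Of", "of")
print(clean)
```

Writing `step1 = messy.strip()`, then `step2 = step1.title()`, and so on gives the same result; chaining just skips naming the intermediate values.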

### (bonus) Additional methods

Here are some additional table methods that are also useful.

#### `show`

`tbl.show(n)` displays the first `n` rows of `tbl`. If `n` is not specified, it will display the entire table.

```python
schools.show(3) # Show the first 3 rows of the `schools` table
```

```python
schools.show()
```

#### `relabeled`

You can also rename columns using the `tbl.relabeled()` method, which takes the current label and the new label and returns a new table.

```python
schools.relabeled('Name', 'University').show(5)
```

<hr style="border: 5px solid #003262;" />
<hr style="border: 1px solid #fdb515;" />

## Table Practice

```python
schools = Table.read_table('data/cal_unis.csv')
schools
```

### Exercise 1: Variable names

How do we get all the column labels of `schools`?

```python
schools.labels
```

### Exercise 2: Reorder columns

How do we reorder the columns, as below?

| Name | Founded | Institution | City | County | Enrollment |
| --- | --- | --- | --- | --- | --- |
| ... | ... | ... | ... | ... | ... |

Hint: use one of `select`, `drop`, or `with_columns`.

```python
schools.select("Name", "Founded", "Institution", "City", "County", "Enrollment")
```

### Exercise 3: Filtering

1. How do we get a table with only **UC** schools?

```python
...
```

2. How do we get a table with all the schools in **Los Angeles**?

### Exercise 4: Rename Columns

How do we **update** `schools` so that the column `Name` is renamed `University`? _Hint_: Check out the method `relabeled`.

```python
... # using relabeled
```

There are many ways to approach a problem. Suppose you didn't know the method `relabeled` existed:

```python
schools = Table.read_table('data/cal_unis.csv')
schools = schools.with_column("University", schools.column("Name")).drop("Name")
schools
```

<hr style="border: 5px solid #003262;" />
<hr style="border: 1px solid #fdb515;" />

## `print()`


The `print` function in Python displays output on the screen. It lets you show text, numbers, variables, or any other value while a program runs. You can pass multiple arguments to `print`; by default, they are displayed on a single line, separated by spaces.
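A short sketch of those defaults: with several arguments, `print` puts a single space between them and ends the line with a newline. The optional `sep` and `end` keyword arguments replace those two strings. Here the output is sent to an in-memory buffer through `print`'s `file` argument, only so it can be inspected; without `file`, it would appear on the screen.

```python
import io

buffer = io.StringIO()

# Default: a space between arguments and a newline at the end
print("x is", 3 + 4, file=buffer)

# sep replaces the space; end replaces the newline
print("a", "b", "c", sep="-", end="!\n", file=buffer)

output = buffer.getvalue()
print(output)
```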

What happens when we run the cell below?

```python
print(15)
x = 3 + 4
x
print(14)
-3
```

### Hello World
For example, `print("Hello, world!")` would display the text "Hello, world!" on the screen. Run the cell below to try it out!

```python
print("Hello, world!")
```

### Advanced `print()` Example

Notice how we **typecast** `s` to a string with `str(s)` so that we can concatenate it with the other strings!

```python
polygon = "square"
s = 4
# Inside the parentheses of print(...), the expression can span lines without backslashes
print("The area of a " + polygon +
      " with side length " + str(s) +
      " is " + str(s ** 2) + ".")
```
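Why is `str(s)` needed? Python will not join a string and an integer with `+`; it raises a `TypeError` instead. A small sketch that catches that error and then applies the fix:

```python
s = 4

try:
    message = "side length " + s        # str + int is not allowed
except TypeError:
    message = "side length " + str(s)   # convert the number to a string first

print(message)
```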

### Hm....

Exercise 1: What happens below?

```python
my_var = print("hi")
```

```python
my_var
```

```python
print(my_var)
```

---

## NoneType

In Python, `None` is a special value that represents the absence of a value, and its type is `NoneType`.

When a function or method finishes without returning a value (for example, because it has no `return` statement), calling it produces `None`. Note that a variable that was never assigned does not default to `None`; using it raises a `NameError`.

`None` is commonly used as a default value, as a placeholder, or to signal that an operation failed or produced no result.
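A minimal sketch: the hypothetical function `show_area` below prints a message but has no `return` statement, so calling it evaluates to `None`.

```python
def show_area(side):
    # Prints a message but never returns anything
    print("Area:", side ** 2)

result = show_area(3)   # prints "Area: 9"
print(result)           # None
print(type(result))     # <class 'NoneType'>
print(result is None)   # True
```

Checking with `is None` is the usual way to test whether a value is `None`.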

What is output and/or displayed when we run the cell below?

```python
print("This value is", print(1))
```