## Lecture Notes - Tables ##

**Helpful Resource:**
- [Python Reference](http://data8.org/sp22/python-reference.html): Cheat sheet of helpful array & table methods used in Data 8!

**Recommended Readings:**
 * [Introduction to Tables](https://www.inferentialthinking.com/chapters/03/4/Introduction_to_Tables)

## Python Arithmetic ##

Already done last lecture

## Variables ##

In Python, a variable doesn't need to be declared with a data type before it is used.

The data type of a variable is determined by the value assigned to it.


### `int` and `float`

In [None]:
# hourly_rate is a float (Python has no separate double type)

hourly_rate = 20.5
type(hourly_rate)


In [None]:
# hour is an integer

hour = 16
type(hour)


In [None]:
# total's data type is determined by the data types of the expressions
#   hourly_rate and hour

total = hourly_rate * hour

total, type(total)


In [None]:
# hourly_rate is an integer

hourly_rate = 20
type(hourly_rate)

In [None]:
# hour is an integer

hour = 16
type(hour)

In [None]:
# total's data type is determined by the data types of the expressions
#   hourly_rate and hour

total = hourly_rate * hour

total, type(total)

### `str` ###

How do we assign a string value to a variable?
Enclose the text in `""` or `''`. Without the quotes, Python reads the text as a variable name, which produces an error if that name has not been defined.

In [None]:
# use ''

name = 'c'
name, type(name)

In [None]:
# use ""

name = "c"
name, type(name)

In [None]:
# this assignment statement produces an error (NameError)
#   because c is read as a variable name, and c has not been assigned a value

name = c
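
To make the difference concrete, here is a minimal sketch. The value `"Chris"` and the name `initial` are made up for illustration. Once `c` has been assigned a value, `name = c` works, while `"c"` in quotes is always the one-letter string.

```python
c = "Chris"      # the name c is now bound to a string value
name = c         # works: name gets the current value of c
initial = "c"    # the quotes make this the one-letter string c

print(name, type(name))        # Chris <class 'str'>
print(initial, type(initial))  # c <class 'str'>
```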

## Assignment Statements ##

We have seen lines of Python code that contain a variable name, an equal sign, and an expression.  They are called **assignment statements**.

<img src='assignment-statement.png' alt='assignment statement' width='280px'>

An assignment statement changes the meaning of the variable name to the left of the `=` sign.

The variable name is bound to the value of the expression to the right of the `=` sign (its current value, not the equation).
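
The point about "current value, not the equation" can be sketched as follows. It reuses `hourly_rate` and `hour` from the cells above; the names `first_total` and `new_total` are introduced here only for illustration.

```python
hourly_rate = 20.5
hour = 16
first_total = hourly_rate * hour   # bound to the value 328.0, not to the formula

hourly_rate = 30                   # rebinding hourly_rate later ...
print(first_total)                 # ... does not change first_total: 328.0

new_total = hourly_rate * hour     # the expression must be evaluated again
print(new_total)                   # 480
```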


## Why do we use variables (names)? ##

Ans: to store the values and use them later in the program

### ...more on data types and variables in next lecture... ###

## Functions and Call Expressions ##

There are two types of functions: pre-defined functions and user-defined functions. For now, we look at pre-defined functions provided by Python or by other libraries/modules.  Later we will learn to write our own functions.

### Function Anatomy ###

<img src='function-anatomy.png' alt='function anatomy' width='300px'>

For example, if we want to use math functions, we need to import the `math` module into our notebook.

In [None]:
import math

math.log(8, 2)

In [None]:
# if we specify what to import, 
#    we can omit the module name - `math` when we call it

from math import pi
pi
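
A call expression is the function name followed by its arguments in parentheses. The sketch below uses only Python's standard `math` module and built-in functions. The variable names are chosen here for illustration.

```python
import math
from math import sqrt

# function name: math.sqrt, argument: 16
root = math.sqrt(16)

# sqrt was imported by name, so no "math." prefix is needed
same_root = sqrt(16)

# an argument can itself be a call expression
nested = abs(min(-3, 2))

print(root, same_root, nested)  # 4.0 4.0 3
```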

## Some functions already come with Python 

Just call the functions directly, no need to import modules/libraries.

For example, finding an absolute value, rounding a decimal number to the nearest integer, or looking for the min or max value in a list of numbers.


In [None]:
abs(-5)

In [None]:
round(3.141592653589793)
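
As a small sketch of how `round` behaves: with one argument it returns the nearest integer, and an optional second argument gives the number of decimal places to keep. The variable names are illustrative.

```python
nearest_down = round(3.141592653589793)     # 3  (nearest integer, an int)
nearest_up = round(3.7)                     # 4
two_places = round(3.141592653589793, 2)    # 3.14 (a float)
halfway = round(2.5)                        # 2: Python 3 rounds ties to the even neighbor

print(nearest_down, nearest_up, two_places, halfway)
```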

In [None]:
# avoid naming a variable `list`: that would hide Python's built-in list type
numbers = [100, 43, 2, 90]
min(numbers), type(numbers)

In [None]:
max(numbers)

In [None]:
min(100, 43, 2, 90)

There are more pre-defined functions for you to practice with in the labs and homework.

### View a Function's Documentation ###

Type a `?` after a function name to see its documentation.


In [1]:
# Example

round?

## Tables ##

Similar to Excel in MS Office, a table in Python is composed of columns and rows.

Each column has a label (the name of the column).  Each row represents one individual entity in the dataset, and each column describes one characteristic of those entities.  In other words, the data within a column holds the values of a single attribute across all rows.

<img src='table.png' alt='data in a table' width='500px'>

To use tables in Python, we need to import a package/library called `datascience`.  It was created by UC Berkeley professors and fellow students.

In [None]:
# import everything from the datascience library/module
# once the module is imported, it can be used in all the cells below it.
from datascience import *


In [None]:
# here we read a data file to a table 
#   and then assign the table to a variable
cones = Table.read_table('cones.csv')
cones


### Default Table Display ###

In [None]:
# let's read another data file into a table
#   we don't assign the table to a variable
#   when this cell is executed, by default, only the first 10 rows
#     are displayed and the rest are hidden

Table.read_table('student_data.csv')

### Specify a Number of Rows to Display - `tbl.show()` ###

In [None]:
# show a specific number of rows

student_tbl = Table.read_table('student_data.csv')
student_tbl.show(5)


In [None]:
# or we can also chain the function calls in one line

Table.read_table('student_data.csv').show(5)


### Display All Rows - `tbl.show()` ###

To display all rows in a table, omit the number inside the parentheses.

In [None]:
# to display all rows in a table

student_tbl = Table.read_table('student_data.csv')
student_tbl.show()


In [None]:
# or chain the function calls in one line

Table.read_table('student_data.csv').show()

### `tbl.select(label)` ###

Constructs a new table with just the specified columns

In [None]:
student_tbl.select("SHOE", "HEIGHT")

### `tbl.where(label, condition)` ###

Constructs a new table with just the rows that match the condition

Each `tbl.where(label, condition)` takes one pair of label and condition at a time.

In [None]:
student_tbl.where("ZIP", 95403)

If we want to construct a new table with just the rows that match more than one condition, we need to chain another `where(label, condition)` call.

In [None]:
student_tbl.where("ZIP", 95403).where("COLOR", "Green")

### `tbl.sort(label)` ###

Constructs a new table with rows sorted by the specified column

In [None]:
# by default, sort function will sort the table in ascending order

student_tbl.sort("AGE")


In [None]:
# sometimes we may want to view the entire table

student_tbl.sort("AGE").show()

In [None]:
# we can also sort a table in descending order

student_tbl.sort("AGE", descending=True)

### `tbl.drop(label)` ###

Constructs a new table in which the specified columns are omitted


In [None]:
# When we work on tabular data, the table may be too big to view and
#   some data may not be relevant, so we can drop table columns.

student_tbl.drop("PAPER")

In [None]:
# tbl.drop() does not change the original table;
#   it returns a new table without the column,
#   so when we display student_tbl again, all columns are intact

student_tbl

In [None]:
# To keep a table with the column removed, 
#   we need to assign the updated table to a variable

student_tbl_without_paper_col = student_tbl.drop("PAPER")
student_tbl_without_paper_col

In [None]:
# the original student table - student_tbl still contains the PAPER column

student_tbl