# Lecture 3 - Demo

In [2]:
from datascience import *
import numpy as np

%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')

## Words of Caution ##
- Remember to run the cell above. It sets up the environment so you have access to everything this lecture needs. For now, don't worry about what it means: we'll learn more about what's inside it in the next few lectures.
- Data science is not just about code, so please don't study this notebook in isolation. Have the relevant textbook sections or lecture video at hand so that you can follow the discussion along with the code. Thank you!

## Python Arithmetic ##

In [3]:
# Basic arithmetic

In [4]:
2 + 6

8

In [5]:
2 * 4

8

In [6]:
# Bound to rules

In [7]:
# This will error because it is not a full expression
# * 4

In [8]:
# Same for this because the * * operator (with a space) does not exist
2 * * 4

SyntaxError: invalid syntax (<ipython-input-8-ce54d8f44df7>, line 2)

In [9]:
# The ** operator does exist; exponentiation
2 ** 4

16

In [10]:
# Note how division results in a decimal number (also called a float)
8/9

0.8888888888888888

In [11]:
# Multiplying two integers, however, results in an integer
8 * 9

72
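
As a quick check on the two cells above, here is a small sketch showing that `/` produces a float even when the division comes out even, while multiplying two whole numbers stays an integer. The names `even_division` and `product` are made up for this illustration.

```python
# Division with / always gives a float, even when the answer is a whole number
even_division = 8 / 4
print(even_division)   # 2.0

# Multiplying two integers gives an integer
product = 8 * 9
print(product)         # 72
```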

In [12]:
# PEMDAS
2 + 3 * -5

-13

In [13]:
# Let's use parentheses
(2 + 3) * -5

-25

In [14]:
# Please close your parentheses well. Error messages can also be very informative.
# ((2+3) * -5

In [15]:
# You can add different expressions together
-(1+1) + 2 * ( 3 * 4 * 5 / 6) ** 3 + 7 + 8 + 9

2022.0
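
To see where 2022.0 comes from, the long expression above can be evaluated one step at a time in the order Python follows. This is only a sketch: the names `inner`, `cubed`, `doubled`, and `result` are introduced here for illustration.

```python
# Step 1: the parenthesized part comes first
inner = 3 * 4 * 5 / 6         # 60 / 6 gives 10.0 (a float, because of /)

# Step 2: exponentiation
cubed = inner ** 3            # 1000.0

# Step 3: multiplication
doubled = 2 * cubed           # 2000.0

# Step 4: addition and subtraction, left to right
result = -(1 + 1) + doubled + 7 + 8 + 9
print(result)                 # 2022.0
```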

In [16]:
# Text data are represented as strings

In [17]:
# You can use single quotation marks
'Data 8'

'Data 8'

In [18]:
# Or double
"Hello World"

'Hello World'

In [19]:
# We can even do arithmetic with strings. More on this later!
"Hi" * 3

'HiHiHi'

## Names ##

In [20]:
# Assigning variables

In [21]:
a = 2

In [22]:
a

2

In [23]:
b = 3
b
a

2

In [24]:
# We can also do math with variables

In [25]:
a * b

6

In [26]:
2 * a

4

In [27]:
a + b = total

SyntaxError: can't assign to operator (<ipython-input-27-ab948ce2525a>, line 1)

In [28]:
total_a_and_b = a + b

In [29]:
# Reassigning a variable does not update values that were computed from it earlier

In [30]:
a = 5
a

5

In [31]:
total_a_and_b

5

In [32]:
a + b

8
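
The same behavior in one self-contained sketch, using the made-up names `x`, `y`, and `total` so it does not disturb `a` and `b`: a name is bound to the value of the expression at the moment of assignment, not to the expression itself.

```python
x = 2
y = 3
total = x + y    # total is bound to the value 5, not to the formula "x + y"

x = 5            # rebind x to a new value
print(total)     # 5 (unchanged)
print(x + y)     # 8 (a fresh computation uses the new x)

total = x + y    # to update total, assign it again
print(total)     # 8
```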

In [33]:
# Running a cell over again might cause trouble!

In [34]:
a

5

In [35]:
a = a + 1

In [36]:
a

6
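
Running the `a = a + 1` cell several times has the same effect as writing the statement several times. A minimal sketch with the hypothetical name `count`:

```python
count = 5
count = count + 1   # first run of the cell
count = count + 1   # second run of the same cell
count = count + 1   # third run of the same cell
print(count)        # 8
```

The value depends on how many times the cell ran, which is why rerunning cells out of order can give surprising results.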

In [37]:
# You can restart the kernel. Python then forgets everything you have done.

## Why names? ##

In [38]:
# Option 1: No names. This can be messy. Imagine coming back to this example in a week.
# It is easy to get confused about what each number means.
40 * 15

600

In [39]:
40 * 52 * 15

31200

On January 1, 2022, the CA minimum wage for employers with 26 or more employees increased to $15/hour. Suppose we want to calculate `hours_per_year`, `weekly_wages`, and `yearly_wages`. Giving each quantity a name makes the calculation easier to read.

In [40]:
ca_hour_minimum_wage = 15
hours_p_week = 40
weeks_per_year = 52

In [41]:
hours_per_year = hours_p_week * weeks_per_year
hours_per_year

2080

In [42]:
# hours_per_year covers a full year, so this product is the yearly wage
yearly_wages = hours_per_year * ca_hour_minimum_wage

In [43]:
yearly_wages

31200
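
The 31200 above is pay for a full year (2080 hours at $15/hour). Here is a minimal sketch that keeps weekly and yearly pay under separate names, assuming the same 40 hours per week and 52 weeks per year. The names below are chosen for this illustration and are not the ones used in the cells above.

```python
hourly_wage = 15         # dollars per hour, from the text above
hours_each_week = 40
weeks_each_year = 52

pay_per_week = hourly_wage * hours_each_week
pay_per_year = pay_per_week * weeks_each_year

print(pay_per_week)      # 600
print(pay_per_year)      # 31200
```

These match the unnamed calculations `40 * 15` and `40 * 52 * 15` from Option 1, but each number now says what it means.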

## Functions and Call Expressions ##

In [44]:
# We can call the abs function on a number
abs(-5)

5

In [45]:
# We can also call it on an expression
abs(1 - 5)

4

In [46]:
# But not on multiple numbers
abs(1, -5)

TypeError: abs() takes exactly one argument (2 given)

In [None]:
# We can call it on names that refer to numbers
# (the temperatures here are in Celsius)
day_temp = 33
min_temp = 18
abs(day_temp - min_temp)

In [47]:
# Sometimes a function takes multiple arguments
max(209, 34)

209

In [48]:
# We can assign it to a name too 
max_number = max(209, 34)
max_number

209

In [49]:
# Rounding is cool
round(123.456)

123

In [50]:
# We can use optional arguments
round(123.456, 1)

123.5

In [51]:
# To learn about a function you can:
# round?
# -- or --
# round(<SHIFT + TAB>)
# to see the use case

In [52]:
# arguments can have names
round(123.456, ndigits=1)

123.5

In [53]:
# These names only exist within the function call, not outside it
ndigits

NameError: name 'ndigits' is not defined

In [54]:
# Sometimes you don't need to name the first argument, but you can
round(number=123.456)

123

In [55]:
round(numbers=123)

TypeError: round() missing required argument 'number' (pos 1)
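
To summarize the cells above, this sketch calls `round` three ways that should all give the same answer: positionally, with only `ndigits` named, and with both arguments named. The name `value` is made up for this example. The argument names must be spelled exactly as `number` and `ndigits`.

```python
value = 123.456

print(round(value, 2))                  # 123.46
print(round(value, ndigits=2))          # 123.46
print(round(number=value, ndigits=2))   # 123.46

# Without ndigits, round gives back a whole number
print(round(value))                     # 123
```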

In [56]:
# Not all functions are built into Python
sqrt(16)

NameError: name 'sqrt' is not defined

In [57]:
# Luckily we have libraries that other people have created for us
# like math
import math

In [58]:
# Now we can do square roots
math.sqrt(16)

4.0
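
There are two common ways to reach a function that lives in a library, sketched below with `math.sqrt`. The name `root` is just for illustration.

```python
# Way 1: import the library, then use the library name as a prefix
import math
root = math.sqrt(16)
print(root)        # 4.0

# Way 2: import one function by name, then call it without the prefix
from math import sqrt
print(sqrt(16))    # 4.0
```

The second form is the same idea as `from datascience import *` at the top of this notebook, except that `*` brings in every name from the library at once.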

## Tables ##

In [59]:
from datascience import *

In [60]:
# Reading in tables! You don't need to remember how to do this. But it is cool to play around.
cones = Table.read_table('cones.csv')
cones

Flavor,Color,Price
strawberry,pink,3.55
chocolate,light brown,4.75
chocolate,dark brown,5.25
strawberry,pink,5.25
chocolate,dark brown,5.25
bubblegum,pink,4.75


In [61]:
# Show first three rows
cones.show(3)

Flavor,Color,Price
strawberry,pink,3.55
chocolate,light brown,4.75
chocolate,dark brown,5.25


In [62]:
# show only displays rows; the cones table itself is unchanged
cones

Flavor,Color,Price
strawberry,pink,3.55
chocolate,light brown,4.75
chocolate,dark brown,5.25
strawberry,pink,5.25
chocolate,dark brown,5.25
bubblegum,pink,4.75


In [63]:
# Selecting one column - Price
cones.select("Price")

Price
3.55
4.75
5.25
5.25
5.25
4.75


In [64]:
# select returns a new table; cones itself is still unchanged
cones

Flavor,Color,Price
strawberry,pink,3.55
chocolate,light brown,4.75
chocolate,dark brown,5.25
strawberry,pink,5.25
chocolate,dark brown,5.25
bubblegum,pink,4.75


In [65]:
# Now we created a new table that only has the prices :D
only_prices = cones.select("Price")
only_prices

Price
3.55
4.75
5.25
5.25
5.25
4.75


In [66]:
# Selecting multiple columns is also a thing
cones.select("Price", "Flavor")

Price,Flavor
3.55,strawberry
4.75,chocolate
5.25,chocolate
5.25,strawberry
5.25,chocolate
4.75,bubblegum


In [67]:
# Flavor vs 'Flavor'
cones.select(Flavor)

# The name Flavor does not exist

NameError: name 'Flavor' is not defined

In [68]:
# But this does work
Flavor = 'Flavor'
cones.select(Flavor)

Flavor
strawberry
chocolate
chocolate
strawberry
chocolate
bubblegum


In [69]:
cones.select("Flavor")

Flavor
strawberry
chocolate
chocolate
strawberry
chocolate
bubblegum


In [70]:
# Instead of selecting, we can also drop
cones.drop("Color")

Flavor,Price
strawberry,3.55
chocolate,4.75
chocolate,5.25
strawberry,5.25
chocolate,5.25
bubblegum,4.75


In [71]:
# Filtering values - we only want rows with strawberry as Flavor
cones.where("Flavor", "strawberry")

Flavor,Color,Price
strawberry,pink,3.55
strawberry,pink,5.25


In [72]:
# Sorting is by default in ascending order
cones.sort("Price")

Flavor,Color,Price
strawberry,pink,3.55
chocolate,light brown,4.75
bubblegum,pink,4.75
chocolate,dark brown,5.25
strawberry,pink,5.25
chocolate,dark brown,5.25


In [73]:
# Sorting the other way !
cones.sort("Price", descending=True)

Flavor,Color,Price
chocolate,dark brown,5.25
strawberry,pink,5.25
chocolate,dark brown,5.25
bubblegum,pink,4.75
chocolate,light brown,4.75
strawberry,pink,3.55


In [74]:
# Sorting also works on strings!
cones.sort("Color")

Flavor,Color,Price
chocolate,dark brown,5.25
chocolate,dark brown,5.25
chocolate,light brown,4.75
strawberry,pink,3.55
strawberry,pink,5.25
bubblegum,pink,4.75
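
For intuition about the orderings above, here is a sketch using Python's built-in `sorted` on plain lists. This is not the `Table.sort` method, only an analogy. The lists `colors` and `prices` are copied by hand from the cones table.

```python
colors = ["pink", "light brown", "dark brown", "pink", "dark brown", "pink"]
prices = [3.55, 4.75, 5.25, 5.25, 5.25, 4.75]

# Strings are put in alphabetical order
print(sorted(colors))
# ['dark brown', 'dark brown', 'light brown', 'pink', 'pink', 'pink']

# Numbers go from smallest to largest by default
print(sorted(prices))
# [3.55, 4.75, 4.75, 5.25, 5.25, 5.25]

# reverse=True flips the order, similar in spirit to descending=True
print(sorted(prices, reverse=True))
# [5.25, 5.25, 5.25, 4.75, 4.75, 3.55]
```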


## More interesting tables ##

In [75]:
# From https://github.com/erikgregorywebb/datasets/blob/master/nba-salaries.csv
nba = Table.read_table('nba_salaries.csv')
nba

PLAYER,POSITION,TEAM,'15-'16 SALARY
Paul Millsap,PF,Atlanta Hawks,18.6717
Al Horford,C,Atlanta Hawks,12.0
Tiago Splitter,C,Atlanta Hawks,9.75625
Jeff Teague,PG,Atlanta Hawks,8.0
Kyle Korver,SG,Atlanta Hawks,5.74648
Thabo Sefolosha,SF,Atlanta Hawks,4.0
Mike Scott,PF,Atlanta Hawks,3.33333
Kent Bazemore,SF,Atlanta Hawks,2.0
Dennis Schroder,PG,Atlanta Hawks,1.7634
Tim Hardaway Jr.,SG,Atlanta Hawks,1.30452


**Discussion question**: How would you create a table of just the point guards of the Atlanta Hawks?

In [76]:
# We know how to get the point guards, but this also contains non-Hawks PGs
nba.where("POSITION", "PG")

PLAYER,POSITION,TEAM,'15-'16 SALARY
Jeff Teague,PG,Atlanta Hawks,8.0
Dennis Schroder,PG,Atlanta Hawks,1.7634
Avery Bradley,PG,Boston Celtics,7.73034
Isaiah Thomas,PG,Boston Celtics,6.91287
Marcus Smart,PG,Boston Celtics,3.43104
Terry Rozier,PG,Boston Celtics,1.82436
Jarrett Jack,PG,Brooklyn Nets,6.3
Shane Larkin,PG,Brooklyn Nets,1.5
Kemba Walker,PG,Charlotte Hornets,12.0
Brian Roberts,PG,Charlotte Hornets,2.85494


In [77]:
# We know how to get the Atlanta Hawks, but this also contains non-PGs
nba.where("TEAM", "Atlanta Hawks")

PLAYER,POSITION,TEAM,'15-'16 SALARY
Paul Millsap,PF,Atlanta Hawks,18.6717
Al Horford,C,Atlanta Hawks,12.0
Tiago Splitter,C,Atlanta Hawks,9.75625
Jeff Teague,PG,Atlanta Hawks,8.0
Kyle Korver,SG,Atlanta Hawks,5.74648
Thabo Sefolosha,SF,Atlanta Hawks,4.0
Mike Scott,PF,Atlanta Hawks,3.33333
Kent Bazemore,SF,Atlanta Hawks,2.0
Dennis Schroder,PG,Atlanta Hawks,1.7634
Tim Hardaway Jr.,SG,Atlanta Hawks,1.30452


In [78]:
# We can chain calls! The second where filters the table returned by the first
nba.where("TEAM", "Atlanta Hawks").where("POSITION", "PG")

PLAYER,POSITION,TEAM,'15-'16 SALARY
Jeff Teague,PG,Atlanta Hawks,8.0
Dennis Schroder,PG,Atlanta Hawks,1.7634
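
To make the logic of two filters in a row concrete, here is a plain-Python sketch that does not use the `datascience` library at all. The helper `keep_matching` is hypothetical and written only for this illustration, and the rows are a small hand-copied sample from the table above. It also suggests that the order of the two filters should not change the result.

```python
# A handful of rows copied from the nba table above, as plain dictionaries
rows = [
    {"PLAYER": "Al Horford", "POSITION": "C", "TEAM": "Atlanta Hawks"},
    {"PLAYER": "Jeff Teague", "POSITION": "PG", "TEAM": "Atlanta Hawks"},
    {"PLAYER": "Dennis Schroder", "POSITION": "PG", "TEAM": "Atlanta Hawks"},
    {"PLAYER": "Kemba Walker", "POSITION": "PG", "TEAM": "Charlotte Hornets"},
]

def keep_matching(rows, column, value):
    """Hypothetical helper: keep only the rows whose column equals value."""
    return [row for row in rows if row[column] == value]

# Filter by team first, then by position
hawks = keep_matching(rows, "TEAM", "Atlanta Hawks")
hawks_pgs = keep_matching(hawks, "POSITION", "PG")
print([row["PLAYER"] for row in hawks_pgs])
# ['Jeff Teague', 'Dennis Schroder']

# Filter by position first, then by team: same rows come out
pgs = keep_matching(rows, "POSITION", "PG")
same_result = keep_matching(pgs, "TEAM", "Atlanta Hawks")
print(same_result == hawks_pgs)   # True
```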
