# Python Refresher
## and some new things

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/defreez/cs356-notebooks/blob/main/notebooks/python-refresher.ipynb)

## Variables and Types

Python is a dynamically typed language. It *has* types, but we don't explicitly declare the type of a variable when it is being created.

In [13]:
# Variables and types
x = 1
y = 1.

In [14]:
x

1

In [15]:
x, type(x), y, type(y)

(1, int, 1.0, float)

In [16]:
foo = 'foo'
bar = "bar"
foo, bar

('foo', 'bar')
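As a small sketch of what "dynamically typed" means in practice: a name can be rebound to a value of a different type, but the values themselves still have types that are checked at run time. The names `value` and `failed` are illustrative and not part of the original notebook.

```python
# Illustrative names; not from the original notebook.
value = 1
first_type = type(value).__name__    # 'int'

value = "one"                        # the same name now refers to a str
second_type = type(value).__name__   # 'str'

# Types are still enforced when operations run.
try:
    "1" + 1
    failed = False
except TypeError:
    failed = True

print(first_type, second_type, failed)  # int str True
```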

## Lists

Lists are ordered and mutable. They may hold items of any type (heterogeneous), although in practice a list usually holds items of one kind.

In [28]:
a_list = [1,2,3]
a_list[0] = 5

In [29]:
a_list

[5, 2, 3]
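A few other common list operations, as a minimal sketch. The variable names are illustrative. Note that a single list can hold values of different types.

```python
# Illustrative names; not from the original notebook.
mixed = [1, "two", 3.0]   # a list may mix element types
mixed.append(4)           # lists grow in place
first_two = mixed[:2]     # slicing returns a new list
last = mixed[-1]          # negative indices count from the end

print(mixed)      # [1, 'two', 3.0, 4]
print(first_two)  # [1, 'two']
print(last)       # 4
```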

## Tuples

Tuples are ordered, heterogeneous, and immutable. 

In [25]:
a_tuple = ("age", "25")
a_tuple[0] = 3

TypeError: 'tuple' object does not support item assignment
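Because a tuple cannot be changed in place, the usual pattern is to unpack it or to build a new tuple. The sketch below uses a hypothetical `point` value to show both, and catches the `TypeError` instead of letting it stop the cell.

```python
# Hypothetical example value; not from the original notebook.
point = (3, 4)
x_coord, y_coord = point       # unpacking assigns each item to a name

try:
    point[0] = 10              # tuples do not support item assignment
except TypeError as err:
    message = str(err)

moved = (10,) + point[1:]      # build a new tuple instead of mutating

print(x_coord, y_coord)  # 3 4
print(moved)             # (10, 4)
```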

## Dictionaries

Dictionaries store key/value pairs. Keys are unique and must be hashable; values can be of any type.

In [56]:
person = {"name": "lilith", "age": 28}
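A minimal sketch of reading and updating a dictionary, reusing the `person` example. The `"email"` and `"city"` keys are hypothetical additions for illustration.

```python
person = {"name": "lilith", "age": 28}

name = person["name"]                # lookup by key
email = person.get("email", "n/a")   # .get returns a default instead of raising KeyError
person["age"] = 29                   # update an existing key
person["city"] = "Ashland"           # hypothetical new key
has_age = "age" in person            # membership tests check the keys

print(name, email)  # lilith n/a
print(person)       # {'name': 'lilith', 'age': 29, 'city': 'Ashland'}
print(has_age)      # True
```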

## For loops

Python's `for` loop is always a for-each loop: it walks over the items of an iterable rather than incrementing a counter. This works well because so many Python objects are iterable.

In [39]:
for i in range(10):
    print(i)

0
1
2
3
4
5
6
7
8
9


In [40]:
for i in [1,2,3]:
    print(i)

1
2
3


In [42]:
for i in ("abc", "xyz"):
    print(i)

abc
xyz


In [54]:
person = {"name": "lilith", "age": 28}
for k in person:
    print(k)

name
age


In [57]:
for k, v in person.items():
    print(k, v)

name lilith
age 28
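Two built-in helpers come up often with this style of loop: `enumerate` when the position is needed as well as the item, and `zip` for walking two sequences together. The lists below are made-up sample data.

```python
# Made-up sample data; not from the original notebook.
names = ["lilith", "adam"]
ages = [28, 30]

# enumerate yields (index, item) pairs
indexed = []
for i, name in enumerate(names):
    indexed.append((i, name))

# zip pairs up items from several iterables
pairs = []
for name, years in zip(names, ages):
    pairs.append((name, years))

print(indexed)  # [(0, 'lilith'), (1, 'adam')]
print(pairs)    # [('lilith', 28), ('adam', 30)]
```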


In [58]:
# list comprehensions
nums = [1, 2, 3]
[x*3 for x in nums]

[3, 6, 9]
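Comprehensions can also filter items with an `if` clause, and the same syntax builds dictionaries. A short sketch with illustrative names:

```python
nums = [1, 2, 3, 4, 5, 6]

# Keep only the even numbers, then triple them
evens_tripled = [x * 3 for x in nums if x % 2 == 0]

# Dictionary comprehension: map each number to its square
squares = {x: x * x for x in nums[:3]}

print(evens_tripled)  # [6, 12, 18]
print(squares)        # {1: 1, 2: 4, 3: 9}
```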

## Functions

In [34]:
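# Functions are defined with the def keyword
# Arguments are passed by assignment: the parameter x is a new local name bound to the caller's object
# Rebinding x inside the function does not change y in the caller
# This function doesn't return a value, therefore the function call evaluates to None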
def update(x):
    x = 5

y = 10
update(y), y

(None, 10)
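The distinction that matters is between rebinding a parameter and mutating the object it refers to. The sketch below uses two hypothetical helper functions, `rebind` and `mutate`, to show that only mutation is visible to the caller.

```python
# Hypothetical helpers for illustration; not from the original notebook.
def rebind(items):
    items = [0]        # rebinds the local name only; the caller's list is untouched

def mutate(items):
    items.append(0)    # changes the object that the caller also refers to

data = [1, 2, 3]
rebind(data)
after_rebind = list(data)   # snapshot after rebind
mutate(data)

print(after_rebind)  # [1, 2, 3]
print(data)          # [1, 2, 3, 0]
```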

In [36]:
# Default arguments
def add(x, v=1):
    return x + v

add(5), add(5, 2)

(6, 7)

### Higher-order Functions

Python allows you to program in a "functional" style because it supports "first-class" functions.
This means that functions can be treated as values. We will use this feature a lot.

In [18]:
# For example, let's write a function that applies another function to every item in a list.
# This is for illustration only; it is not the most efficient way to do it.
def fnlst(fn, lst):
    result = []
    for item in lst:
        result.append(fn(item))
    return result

def double(x):
    return x * 2

In [19]:
fnlst(double, [1, 2, 3])

[2, 4, 6]
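Python already ships with this pattern. As a sketch, the built-in `map` and a list comprehension both give the same result as `fnlst` for this input:

```python
def double(x):
    return x * 2

# map returns a lazy iterator, so wrap it in list() to see the values
mapped = list(map(double, [1, 2, 3]))

# A list comprehension is often the more idiomatic spelling
comprehended = [double(x) for x in [1, 2, 3]]

print(mapped)        # [2, 4, 6]
print(comprehended)  # [2, 4, 6]
```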

### Lambda Functions

"Lambda" functions allow you to declare a function inline anonymously (without a name).

In [21]:
fnlst(lambda x: x * 2, [1,2,3])

[2, 4, 6]
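Lambdas are most often seen as short arguments to other functions, for example the `key` parameter of `sorted` or the predicate of `filter`. The `people` list below is made-up sample data.

```python
# Made-up sample data; not from the original notebook.
people = [("lilith", 28), ("adam", 35), ("eve", 22)]

# Sort by the second item of each tuple (the age)
by_age = sorted(people, key=lambda p: p[1])

# Keep only the entries whose age is above 25
over_25 = list(filter(lambda p: p[1] > 25, people))

print(by_age)   # [('eve', 22), ('lilith', 28), ('adam', 35)]
print(over_25)  # [('lilith', 28), ('adam', 35)]
```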

## NumPy Shapes and Stacking

In [59]:
import numpy as np

In [65]:
x = np.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [69]:
## Rank-1 Array
x.shape

(10,)

In [72]:
## Rank-2 Array
nums = [[1,2,3], [2,4,6], [3,6,9]]
x = np.array(nums)
x, x.shape

(array([[1, 2, 3],
        [2, 4, 6],
        [3, 6, 9]]),
 (3, 3))
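The rank of an array is available as `ndim`, and `reshape` changes the shape without changing the data. A brief sketch with illustrative names:

```python
import numpy as np

flat = np.arange(6)            # rank-1, shape (6,)
matrix = flat.reshape(2, 3)    # rank-2, shape (2, 3); same six values
row = flat[np.newaxis, :]      # rank-2, shape (1, 6); adds a leading axis

print(flat.ndim, flat.shape)      # 1 (6,)
print(matrix.ndim, matrix.shape)  # 2 (2, 3)
print(row.ndim, row.shape)        # 2 (1, 6)
```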

In [98]:
## np.stack joins arrays along a new axis, so the result has one more dimension than each input.
## The axis parameter specifies the index of the new axis in the dimensions of the result.
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

stacked = np.stack((a, b))
stacked, stacked.shape

(array([[1, 2, 3],
        [4, 5, 6]]),
 (2, 3))

In [95]:
stacked = np.stack((a, b), axis=1)
stacked, stacked.shape

(array([[1, 4],
        [2, 5],
        [3, 6]]),
 (3, 2))

In [96]:
stacked = np.stack((a, b), axis=-1)
stacked, stacked.shape

(array([[1, 4],
        [2, 5],
        [3, 6]]),
 (3, 2))

In [107]:
a = np.array([[1, 1], [2, 2]])
b = np.array([[10, 10], [20, 20]])
c = np.array([[100, 100], [200, 200]])
np.stack([a, b, c])

array([[[  1,   1],
        [  2,   2]],

       [[ 10,  10],
        [ 20,  20]],

       [[100, 100],
        [200, 200]]])

In [114]:
np.stack([a, b, c], axis=1)

array([[[  1,   1],
        [ 10,  10],
        [100, 100]],

       [[  2,   2],
        [ 20,  20],
        [200, 200]]])
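One way to keep the `axis` parameter straight is to look only at the shapes. The sketch below stacks the same three 2x2 arrays along each possible axis, and contrasts that with `np.concatenate`, which joins along an existing axis instead of creating a new one.

```python
import numpy as np

a = np.array([[1, 1], [2, 2]])
b = np.array([[10, 10], [20, 20]])
c = np.array([[100, 100], [200, 200]])

# The size-3 dimension appears at the position given by axis
shapes = {axis: np.stack([a, b, c], axis=axis).shape for axis in (0, 1, 2)}
print(shapes)  # {0: (3, 2, 2), 1: (2, 3, 2), 2: (2, 2, 3)}

# concatenate keeps the number of dimensions and lengthens an existing axis
concat_shape = np.concatenate([a, b, c]).shape
print(concat_shape)  # (6, 2)
```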

## Pandas and Tabular Data

- Indexing DataFrames
- Mapping a function over a DataFrame
- Filtering a DataFrame
- GroupBy
- Plotting

In [115]:
import pandas as pd

In [116]:
# Loading CSV data
# Can be loaded from a URL or local file
df = pd.read_csv('https://dlptest.com/sample-data.csv')

In [118]:
df.head()

Unnamed: 0,SSN,gender,birthdate,maiden name,last name,first name,address,city,state,zip,phone,email,cc_type,CCN,cc_cvc,cc_expiredate
0,172-32-1176,m,4/21/1958,Smith,White,Johnson,10932 Bigge Rd,Menlo Park,CA,94025,408 496-7223,jwhite@domain.com,m,5270-4267-6450-5516,123,2010/06/25
1,514-14-8905,f,12/22/1944,Amaker,Borden,Ashley,4469 Sherman Street,Goff,KS,66428,785-939-6046,aborden@domain.com,m,5370-4638-8881-3020,713,2011/02/01
2,213-46-8915,f,4/21/1958,Pinson,Green,Marjorie,309 63rd St. #411,Oakland,CA,94618,415 986-7020,mgreen@domain.com,v,4916-9766-5240-6147,258,2009/02/25
3,524-02-7657,m,3/25/1962,Hall,Munsch,Jerome,2183 Roy Alley,Centennial,CO,80112,303-901-6123,jmunsch@domain.com,m,5180-3807-3679-8221,612,2010/03/01
4,489-36-8350,m,1964/09/06,Porter,Aragon,Robert,3181 White Oak Drive,Kansas City,MO,66215,816-645-6936,raragon@domain.com,v,4929-3813-3266-4295,911,2011/12/01


In [119]:
# Select the state column (returns a Series)
df['state']

0     CA
1     KS
2     CA
3     CO
4     MO
5     MO
6     NE
7     NC
8     NY
9     AL
10    TX
11    LA
12    HI
13    CA
14    CA
15    TX
16    MN
17    TX
18    CT
19    MS
20    AL
21    TX
22    OH
23    IL
24    TX
25    CA
26    NC
27    PA
28    KY
29    NJ
Name: state, dtype: object

In [135]:
# Only people in California
cali = df[df['state'] == "CA"]
cali

Unnamed: 0,SSN,gender,birthdate,maiden name,last name,first name,address,city,state,zip,phone,email,cc_type,CCN,cc_cvc,cc_expiredate
0,172-32-1176,m,4/21/1958,Smith,White,Johnson,10932 Bigge Rd,Menlo Park,CA,94025,408 496-7223,jwhite@domain.com,m,5270-4267-6450-5516,123,2010/06/25
2,213-46-8915,f,4/21/1958,Pinson,Green,Marjorie,309 63rd St. #411,Oakland,CA,94618,415 986-7020,mgreen@domain.com,v,4916-9766-5240-6147,258,2009/02/25
13,559-81-1301,m,1952/01/20,Mcafee,Heard,James,2865 Driftwood Road,San Jose,CA,95129,408-370-0031,jheard@domain.com,v,4532 4220 6922 9909,311,2010/09/01
14,624-84-9181,m,1980/01/16,Frazier,Reyes,Danny,3500 Diane Street,San Luis Obispo,CA,93401,805-369-0464,dreyes@domain.com,v,4532 0065 1968 5602,713,2009/11/01
25,612-20-6832,m,1979/08/18,Banas,Edwards,Rick,4254 Walkers Ridge Way,Gardena,CA,90248,626-991-3620,redwards@domain.com,m,5293 8502 0071 3058,701,2010/08/01


In [132]:
# Combine conditions with & (and); each condition needs its own parentheses
cali_m = df[(df['state'] == "CA") & (df['gender'] == 'm')]
cali_m

Unnamed: 0,SSN,gender,birthdate,maiden name,last name,first name,address,city,state,zip,phone,email,cc_type,CCN,cc_cvc,cc_expiredate
0,172-32-1176,m,4/21/1958,Smith,White,Johnson,10932 Bigge Rd,Menlo Park,CA,94025,408 496-7223,jwhite@domain.com,m,5270-4267-6450-5516,123,2010/06/25
13,559-81-1301,m,1952/01/20,Mcafee,Heard,James,2865 Driftwood Road,San Jose,CA,95129,408-370-0031,jheard@domain.com,v,4532 4220 6922 9909,311,2010/09/01
14,624-84-9181,m,1980/01/16,Frazier,Reyes,Danny,3500 Diane Street,San Luis Obispo,CA,93401,805-369-0464,dreyes@domain.com,v,4532 0065 1968 5602,713,2009/11/01
25,612-20-6832,m,1979/08/18,Banas,Edwards,Rick,4254 Walkers Ridge Way,Gardena,CA,90248,626-991-3620,redwards@domain.com,m,5293 8502 0071 3058,701,2010/08/01
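Filtering works by building a boolean Series (a mask) and indexing the DataFrame with it. The sketch below uses a small made-up DataFrame, not the CSV loaded above, to show the mask itself along with `|` (or), `~` (not), and `isin`.

```python
import pandas as pd

# Made-up miniature dataset; not the CSV used in this notebook.
people = pd.DataFrame({
    "first name": ["Ann", "Bob", "Cy", "Dee"],
    "state": ["CA", "KS", "CA", "CA"],
    "gender": ["f", "m", "m", "f"],
})

is_ca = people["state"] == "CA"   # boolean Series, one entry per row
is_m = people["gender"] == "m"

ca_and_m = people[is_ca & is_m]   # both conditions hold
ca_or_m = people[is_ca | is_m]    # either condition holds
not_ca = people[~is_ca]           # negate a mask with ~
ca_or_ks = people[people["state"].isin(["CA", "KS"])]

print(is_ca.tolist())                   # [True, False, True, True]
print(ca_and_m["first name"].tolist())  # ['Cy']
print(not_ca["first name"].tolist())    # ['Bob']
```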


In [155]:
import datetime

# Create a new column by mapping one column to another.
# The birthdate column mixes two formats (e.g. 4/21/1958 and 1964/09/06), so try both.
def age(birthday):
    today = datetime.date.today()
    for format_str in ("%m/%d/%Y", "%Y/%m/%d"):
        try:
            bday_obj = datetime.datetime.strptime(birthday, format_str)
        except ValueError:
            continue
        # Subtract one year if this year's birthday hasn't happened yet
        return today.year - bday_obj.year - ((today.month, today.day) < (bday_obj.month, bday_obj.day))
    raise ValueError(f"Unrecognized birthdate format: {birthday!r}")

age("9/10/1989")

32

In [156]:
# This is a common warning. cali_m was created by filtering df, so pandas treats it as a possible copy:
# the new column is added to cali_m but not to the original dataframe. That's OK here.
cali_m['age'] = cali_m['birthdate'].map(age)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


In [158]:
cali_m[['birthdate', 'age']]

Unnamed: 0,birthdate,age
0,4/21/1958,63
13,1952/01/20,69
14,1980/01/16,41
25,1979/08/18,42
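The `SettingWithCopyWarning` seen earlier can usually be avoided in one of two ways: take an explicit `.copy()` of the filtered rows before adding a column, or write to the original DataFrame through `.loc`. The sketch below uses a small made-up DataFrame and hypothetical column names (`birth_year`, `decade`, `is_ca`).

```python
import pandas as pd

# Made-up data and column names; not the CSV used in this notebook.
people = pd.DataFrame({
    "state": ["CA", "KS", "CA"],
    "birth_year": [1958, 1944, 1980],
})

# Option 1: an explicit copy makes it clear the subset is independent
cali = people[people["state"] == "CA"].copy()
cali["decade"] = cali["birth_year"] // 10 * 10

# Option 2: modify the original through .loc with a row mask and a column label
people["is_ca"] = False
people.loc[people["state"] == "CA", "is_ca"] = True

print(cali["decade"].tolist())    # [1950, 1980]
print(people["is_ca"].tolist())   # [True, False, True]
print("decade" in people.columns) # False
```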


In [163]:
# Count people above age 60
gray_df = cali_m[cali_m['age'] > 60]
len(gray_df)

2
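The topic list for this section mentions GroupBy. As a minimal sketch on made-up data (not the CSV above), `groupby` splits the rows by the values in one column and then applies an aggregation to each group.

```python
import pandas as pd

# Made-up data; not the CSV used in this notebook.
people = pd.DataFrame({
    "state": ["CA", "KS", "CA", "TX", "TX", "CA"],
    "age": [63, 77, 41, 30, 52, 42],
})

# Number of rows per state
counts = people.groupby("state").size()

# Mean age per state
mean_age = people.groupby("state")["age"].mean()

print(counts.to_dict())  # {'CA': 3, 'KS': 1, 'TX': 2}
print(mean_age["TX"])    # 41.0
```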