# Python Language Workbook

Python is a rich language with several different data types and operators designed around fundamental principles documented in [PEP 20 -- The Zen of Python](https://www.python.org/dev/peps/pep-0020/).

This notebook is designed to provide a succinct reference of some of the language concepts you will likely encounter during your ML/AI learning journey.

**Tip**: Need more information about a language concept? Try using help(concept) for a comprehensive explanation and dir(concept) if you want a complete listing of a class's methods, properties, ...


In [0]:
#help(set)

In [0]:
#dir(set)

## Common data types & operations used on them


*   **bool** (boolean) - True/False
*   **int** (integer) - whole number
*   **float** - number with a decimal point
*   **str** (string) - sequence of Unicode characters (*Challenge question: How many languages can be represented in Unicode?*)
*   **list** - ordered sequence of values
*   **tuple** - immutable ordered sequence of values
*   **dict** (dictionary) - grouping of key-value pairs (keeps insertion order since Python 3.7)
*   **set** - unordered grouping of values

Use **type(x)** to check the type of object x

*Understanding data types is important because a type defines the set of operations that can (and cannot) be performed on its values. Some operators are shared across types, e.g. the addition and multiplication operators work with both numbers and strings.*
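As a small illustration of this point, the sketch below checks the type of a few sample values and shows the same operator behaving differently for numbers and strings. The variable names are made up for the example.

```python
# One sample value for each of the common data types listed above.
values = [True, 3, 2.5, 'abc', [1, 2], (1, 2), {'k': 'v'}, {1, 2}]
type_names = [type(v).__name__ for v in values]
print(type_names)

num_sum = 3 + 1        # numeric addition
str_sum = '3' + '1'    # string concatenation
print(num_sum, str_sum)

# Adding a str and an int is not defined, so Python raises a TypeError.
try:
    '3' + 1
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)
```

The last case is why knowing the type of an object matters: the operator exists for both types, but not for the combination.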

**Challenge exercise**: *look up duck typing and type checking in Python and Java and take note of the differences*

### Numeric operations (& some overlapping string operations)

In [0]:
# addition
print(3+1)
print('3' + '1')

4
31


In [0]:
# multiplication
print(1 * 74)
print('-'*74)

74
--------------------------------------------------------------------------


In [0]:
# division
1 / 2

0.5

In [0]:
# exponentiation (i.e. raise to a power)
print(2 ** 3)  # i.e. 2*2*2
print(2 ** -3)  # i.e. 1/(2*2*2)

8
0.125


In [0]:
# modulus (i.e. remainder)
print(4 % 2)
print(2 % 4)
print(4 % 3)

0
2
1


In [0]:
# order of operations - Please Excuse My Dear Aunt Sally - (), **, then * and / (equal precedence, left to right), then + and -
print(2 + 3 * 5 + 5)
print((2 + 3) * (5 + 5))

22
50


### Strings

In [0]:
'single quotes'

'single quotes'

In [0]:
"double quotes"

'double quotes'

In [0]:
" wrap lot's of other quotes"

" wrap lot's of other quotes"

### Printing strings

In [0]:
x = 'hello'

In [0]:
x

'hello'

In [0]:
print(x)

hello


In [0]:
num = 12
name = 'Sam'

In [0]:
print('My number is: {one}, and my name is: {two}'.format(one=num,two=name))

My number is: 12, and my name is: Sam


In [0]:
print('My number is: {}, and my name is: {}'.format(num,name))

My number is: 12, and my name is: Sam


In [0]:
# new in Python 3.6: formatted string literals (f-strings) - evaluate expressions inline
print(f'My number is: {num*1000:,} and my name is: {name!r}')

My number is: 12,000 and my name is: 'Sam'
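The part after the colon in an f-string field is a format specification, and `!r` asks for the `repr()` of the value. A few more hedged examples follow, using made-up values rather than anything from a real model:

```python
# Illustrative values only.
accuracy = 0.91234
n_rows = 1234567
label = 'Sam'

two_decimals = f'{accuracy:.2f}'   # fixed-point with 2 decimals
as_percent = f'{accuracy:.1%}'     # multiply by 100 and append %
thousands = f'{n_rows:,}'          # comma as thousands separator
padded_repr = f'{label!r:>8}'      # repr() of the value, right-aligned in 8 characters

print(two_decimals, as_percent, thousands, padded_repr)
```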


### Variables 
In Python a 'variable' is just a label (name), stored in a namespace, that refers to a value (i.e. an object). Many labels can refer to the same object. When a mutable object such as a list is changed in place, every label that refers to it sees the change; assigning a new value to one label does not affect the others. Other characteristics of variables include:

* *dynamically typed: the label has no fixed type, only the object it refers to does (see duck typing)*
* *do **not** need to be declared*
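A minimal sketch of the "label" idea, with illustrative names: two labels on one list share in-place changes, while rebinding a label to a new object leaves the other label alone.

```python
a = [1, 2, 3]
b = a                  # b is a second label for the same list object
same_object = a is b   # True: both names refer to one object
b.append(4)            # in-place change is visible through both labels
print(a)

x = 10
y = x
y = y + 1              # rebinding: y now refers to a new int, x is unchanged
print(x, y)

c = a.copy()           # a shallow copy is an independent list
c.append(5)
print(a, c)
```

Using `.copy()` (or `list(a)`) is one way to avoid surprising shared changes when an independent list is what you want.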

### Variable names
- case sensitive
- must begin with **letter** or **underscore**
- may **only** contain **letters**, **digits** or **underscores**
- Pythonic convention - *words separated by underscores*
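One way to check the naming rules above is `str.isidentifier()` together with the standard library `keyword` module, since reserved words such as `class` follow the character rules but still cannot be used as names. The candidate names below are only examples.

```python
import keyword

candidates = ['name_of_var', '_private', '2fast', 'my-var', 'class']
results = {}
for name in candidates:
    # isidentifier() checks the character rules; keywords are excluded separately
    results[name] = name.isidentifier() and not keyword.iskeyword(name)

print(results)
```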

### Variable assignment
- value assigned to name with **assignment operator =**, e.g. x = 10
- **multiple assignment allowed** (encouraged), e.g. X, y = df[['x1','x2','x3']], df['target']

In [0]:
# names cannot start with a digit or contain special characters
name_of_var = 2
print(name_of_var)

2


In [0]:
# can assign multiple variables of different types with one statement separated by commas
x, y, z = 2, 200, 'duh'  # Challenge question: what type of object is x, y, z? (hint: starts with 't', sounds like pupil)
print(x)
print(y)
print(z)
# print(type((x, y, z)))  # look it up first - important for understanding data structures in numpy and TF

2
200
duh


## Sequences: Lists, tuples, dictionaries, sets

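The collection data types in Python (lists, tuples, dictionaries and sets) are all **iterables** (i.e. objects that can return their members one at a time). Strictly speaking, only lists, tuples and strings are *sequences*, because they keep their items in order and support indexing.

**Iterables** are an important concept in Python and are used frequently in ML/AI libraries such as scikit-learn and TensorFlow to manage model optimization and tuning.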
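To make "return their members one at a time" concrete, here is a small sketch of what a for loop does behind the scenes, using the built-in `iter()` and `next()` functions. The variable names are illustrative.

```python
letters = ['a', 'b', 'c']
it = iter(letters)     # ask the iterable for an iterator
first = next(it)       # each next() call returns one member
second = next(it)
third = next(it)

# When no members are left, next() raises StopIteration
# (a for loop catches this for you and simply ends).
try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True

# Iterating over a dict yields its keys.
keys = [k for k in {'x': 1, 'y': 2}]
print(first, second, third, exhausted, keys)
```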


### Lists
A list is an ordered collection of objects separated by commas and enclosed in **square** brackets, e.g. ['a', 'b', 'c', 10, 9, 8, 'elephant', 'lion', 'tiger']

In [3]:
# access list elements using index and slice
a = [1, 2, 3, 4, 10, 20, 30, 40, 99]
# index
print(a[1]) # 2nd element (indexes start with 0)
print(a[-1]) # last element
print(a[-2]) # second from last element
# slice notation
print(a[1:4])
print(a[:])
print(a[:4])
print(a[4:])
print(a[1:8:2])

# list assignment operations
a = [1, 2, 3]
print(a)
print(type(a))
b = a
print('b', b)
c = b
print('c', c)

# modify list by assigning a new value to element in a list
b[0] = 11
print('b', b)
print('a', a)
a[0] = 10*a[0]
print('a', a)

2
99
40
[2, 3, 4]
[1, 2, 3, 4, 10, 20, 30, 40, 99]
[1, 2, 3, 4]
[10, 20, 30, 40, 99]
[2, 4, 20, 40]
[1, 2, 3]
<class 'list'>
b [1, 2, 3]
c [1, 2, 3]
b [11, 2, 3]
a [11, 2, 3]
a [110, 2, 3]


## List Operations

In [9]:
# list operations
a = [1, 2, 3]
print(a + a)
print(2*a)
print('min: ', min(a), 'max: ', max(a), 'N: ', len(a), 'sum: ', sum(a))

print('-'*74)
# lists are ordered and can be sorted using Python's sort method, .sort()
a.sort()  # useful for changing the order of results in scikit-learn model object 
print(a)
a.reverse()
print(a)

print('-'*74)
# add elements to list
a.append(a[0])
print(a)
a.extend(a)
print(a)

print('-'*74)
# remove/delete list elements
a.remove(3)
print('a.remove(3)', a)  # removes first occurrence of 3

print('-'*74)
# use index to delete ith element of list
del a[0]
print(a)

print('-'*74)
# remove and assign list elements by index
print('a before pop', a)
a_pop2 = a.pop(2)
print('a.pop(2) =', a_pop2)
print('a after pop', a)

print('-'*74)
# number of elements in list using len()
print(len(a)) 

# count element occurrences in list
print(f'number 3 appeared {a.count(3)} times in list a')  # f' {x} ' is a formatted string

[1, 2, 3, 1, 2, 3]
[1, 2, 3, 1, 2, 3]
min:  1 max:  3 N:  3 sum:  6
--------------------------------------------------------------------------
[1, 2, 3]
[3, 2, 1]
--------------------------------------------------------------------------
[3, 2, 1, 3]
[3, 2, 1, 3, 3, 2, 1, 3]
--------------------------------------------------------------------------
a.remove(3) [2, 1, 3, 3, 2, 1, 3]
--------------------------------------------------------------------------
[1, 3, 3, 2, 1, 3]
--------------------------------------------------------------------------
a before pop [1, 3, 3, 2, 1, 3]
a.pop(2) = 3
a after pop [1, 3, 2, 1, 3]
--------------------------------------------------------------------------
5
number 3 appeared 2 times in list a


### Lists - [item1, item2, ...] - can nest lists inside lists - nested list referenced sequentially
e.g. nested = ['a', ['b', ['c', 'd']]], so nested[1][1][1] is 'd'

In [0]:
nest = [1,2,3,[4,5,['target']]]

In [0]:
nest[3]

[4, 5, ['target']]

In [0]:
nest[3][2]

['target']

In [0]:
nest[3][2][0]

'target'

**Application**: a scikit-learn pipeline stores its steps as a list of tuples (see documentation excerpt below)

   *List of (name, transform) tuples (implementing fit/transform) that are chained, in the order in which they are chained, with the last object an estimator.*
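The "list of (name, step) tuples" shape can be sketched in plain Python. This is not scikit-learn's implementation or API: the two step functions and their names below are hypothetical stand-ins, used only to show how such a list is built and walked in order.

```python
# Hypothetical stand-ins for pipeline steps: plain functions, not sklearn objects.
def scale_to_max(values):
    top = max(values)
    return [v / top for v in values]

def square_all(values):
    return [v ** 2 for v in values]

# An ordered list of (name, step) tuples.
steps = [('scale', scale_to_max), ('square', square_all)]

data = [1, 2, 4]
for name, func in steps:   # tuple unpacking gives the name and the step
    data = func(data)      # output of one step feeds the next
    print(name, data)

step_names = [name for name, _ in steps]
print(step_names)
```

The list keeps the steps in order and each tuple pairs a readable name with the step itself, which are the two properties the documentation excerpt relies on.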

In [5]:
# full documentation for sklearn pipeline
'''from sklearn import pipeline
help(pipeline)'''

Help on module sklearn.pipeline in sklearn:

NAME
    sklearn.pipeline

DESCRIPTION
    The :mod:`sklearn.pipeline` module implements utilities to build a composite
    estimator, as a chain of transforms and estimators.

CLASSES
    sklearn.base.TransformerMixin(builtins.object)
        FeatureUnion(sklearn.base.TransformerMixin, sklearn.utils.metaestimators._BaseComposition)
    sklearn.utils.metaestimators._BaseComposition(sklearn.base.BaseEstimator)
        FeatureUnion(sklearn.base.TransformerMixin, sklearn.utils.metaestimators._BaseComposition)
        Pipeline
    
    class FeatureUnion(sklearn.base.TransformerMixin, sklearn.utils.metaestimators._BaseComposition)
     |  Concatenates results of multiple transformer objects.
     |  
     |  This estimator applies a list of transformer objects in parallel to the
     |  input data, then concatenates the results. This is useful to combine
     |  several feature extraction mechanisms into a single transformer.
     |  
     |  Pa

### Dictionaries - {key:item}

In [0]:
d = {'key1':'item1','key2':'item2'}

In [0]:
d

In [0]:
d['key1']

'item1'

In [0]:
# create an empty dict and assign new items
y = {}
y[0] = 'foo'
y[3] = 'bar'

print(y)

{0: 'foo', 3: 'bar'}


In [0]:
# create dict of ids as keys and squares as items
id_dict = {}
for i in range(10):
    id_dict[i] = i**2
    print(id_dict[i])

print("id_dict keys:")
print(list(id_dict.keys()))
print("id_dict items:")
print(id_dict)

0
1
4
9
16
25
36
49
64
81
id_dict keys:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
id_dict items:
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}


In [0]:
# assign segment codes to id_dict
id_dict_seg ={}
for i in list(id_dict.keys()):
    if id_dict[i] % 2 > 0:
        id_dict_seg[i] = 'Odd'
    elif id_dict[i] % 2 == 0:
        id_dict_seg[i] = 'Even'
    else:
        print("Error")
        
print("Odd/Even segment codes:")
print(id_dict_seg)

Odd/Even segment codes:
{0: 'Even', 1: 'Odd', 2: 'Even', 3: 'Odd', 4: 'Even', 5: 'Odd', 6: 'Even', 7: 'Odd', 8: 'Even', 9: 'Odd'}


In [0]:
''' STACK OVERFLOW EX.
mydict = {'george':16,'amber':19}
print(list(mydict.keys())[list(mydict.values()).index(16)]) # Prints george

for name, age in dictionary.items():    # for name, age in dictionary.iteritems():  (for Python 2.x)
    if age == search_age:
        print(name)
'''

# examine the 'Even' segment in dict
for key, seg in id_dict_seg.items():  # 'key' avoids shadowing the built-in id()
    if seg == 'Even':
        print(key, seg)

0 Even
2 Even
4 Even
6 Even
8 Even
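The dictionary-building loops above could also be sketched with dict comprehensions, which follow the same pattern as the list comprehensions covered later in this notebook. The names `squares`, `segments` and `even_ids` are chosen here for illustration.

```python
# ids as keys, squares as values
squares = {i: i**2 for i in range(10)}

# label each id by whether its square is even or odd
segments = {i: 'Even' if sq % 2 == 0 else 'Odd' for i, sq in squares.items()}

# collect the ids in the 'Even' segment
even_ids = [i for i, seg in segments.items() if seg == 'Even']

print(squares)
print(segments)
print(even_ids)
```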


### Booleans (type: bool) - evaluate to True/False

In [0]:
# comparison operator (==) tests a condition and evaluates to a bool
0 == 1  # False

False

In [0]:
# booleans drive if/elif/else branches inside loops
for x in range(5):
  if x < 3:
    print('x is less than 3: x =', x)
  elif x == 3:
    print('x is equal to 3: x =', x)
  else:
    print('x is greater than 3: x =', x)



x is less than 3: x = 0
x is less than 3: x = 1
x is less than 3: x = 2
x is equal to 3: x = 3
x is greater than 3: x = 4


### Tuples - (item1, item2, ...) - like lists but enclosed in parentheses ()

A tuple is **immutable**: its items cannot be reassigned, added or removed. This makes them good for: 
- keys in dicts
- multiple assignment of variables, e.g. X_train, X_test, y_train, y_test = train_test_split(xdata, ydata)

**Challenge question**: Why would it be bad if the values being unpacked were **NOT** immutable in the example above?
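As a hedged illustration of the first bullet, the sketch below uses (row, column) tuples as dictionary keys. The grid and its contents are invented for the example.

```python
# (row, col) tuples as keys of a sparse grid
grid = {(0, 0): 'start', (2, 3): 'goal'}
goal = grid[(2, 3)]
print(goal)

# A list cannot be used as a key because it is mutable (unhashable).
try:
    bad = {[0, 0]: 'start'}
    list_key_ok = True
except TypeError:
    list_key_ok = False
print(list_key_ok)
```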

In [0]:
t = (1,2,3)

In [0]:
t[0]

1

In [0]:
# tuples are immutable - should produce an error 
t[0] = 'NEW'

TypeError: 'tuple' object does not support item assignment

In [0]:
# simple example
x = ('a','b','c')

# list like operations
print(x[2])
print(x[1:])
print(len(x))
print(max(x))
print(min(x))
print(5 in x)
print(5 not in x)

# oops - x is a tuple, not a list
x[2] = 'd'  # should produce an error

c
('b', 'c')
3
c
a
False
True


TypeError: 'tuple' object does not support item assignment

In [0]:
# create new tuples from existing tuples using +, * operators
print(x+x)
print(2*x)

('a', 'b', 'c', 'a', 'b', 'c')
('a', 'b', 'c', 'a', 'b', 'c')


In [0]:
# a one-element tuple needs a trailing comma so Python knows the parentheses indicate a tuple
y = 3
z = 4
# add y + z then multiply by 2
print((y+z)*2)
# create a one-element tuple from y + z (the comma inside the outer parentheses makes it a tuple)
print(((y+z),))

14
(7,)


In [0]:
# packing & unpacking tuples - tuple on L side assigned to values on R side (SAVES MANY LINES OF CODE!)
(one, two, three, four) = (1, 2, 3, 4)
print(one)
print(four)

# don't need () for assignment
one, two, three, four = 1, 2, 3, 4
print(three)

1
4
3


In [0]:
# tuple element with * can hold multiple values not matched in packing/unpacking
one, two, *three = 1, 2, 3, 4
print(three)

one, *two, three = 1, 2, 3, 4
print(one)
print(two)
print(three)

one, two, three, four, *five = 1, 2, 3, 4
print(five)

[3, 4]
1
[2, 3]
4
[]


In [0]:
# easy to convert between tuples and lists using the list() and tuple() constructors
print(list((1,2,3,4)))
print(tuple([1,2,3,4]))

[1, 2, 3, 4]
(1, 2, 3, 4)


In [0]:
# BONUS: can use list() to break a string into individual characters inside a list
list("awesome!")

['a', 'w', 'e', 's', 'o', 'm', 'e', '!']

### Sets - {item1, item2, ...} - **items** must be [immutable and hashable](https://stackoverflow.com/questions/2671376/hashable-immutable)
- **ints**, **floats**, **strings** and **tuples** can be members of a set
- **lists**, **dictionaries** and **sets** can **NOT** be members of a set
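A quick way to test whether an object could be a set member is to call the built-in `hash()` on it. The helper `is_hashable` below is a hypothetical name written for this sketch, not a standard library function. Note the extra case it exposes: a tuple is only hashable if everything inside it is hashable too.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds (hypothetical helper for this sketch)."""
    try:
        hash(obj)
        return True
    except TypeError:
        return False

checks = {
    'int': is_hashable(1),
    'float': is_hashable(2.5),
    'str': is_hashable('a'),
    'tuple': is_hashable((1, 2)),
    'tuple holding a list': is_hashable((1, [2])),
    'list': is_hashable([1, 2]),
    'dict': is_hashable({'k': 1}),
    'set': is_hashable({1, 2}),
}
for kind, ok in checks.items():
    print(kind, ok)
```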

**Challenge questions**: 
1. Why can't **lists**, **dictionaries** and **sets** be members in a set?
2. Is a **set** itself **immutable** and **hashable**? (*hint: see examples below*)



In [0]:
s = {1,2,3}
s

{1, 2, 3}

In [0]:
s = {1,2,3,1,2,1,2,3,3,3,3,2,2,2,1,1,2}
print(s)

# set operations
s.add(4)
print(s)

s.remove(2)
print(s)

# check membership
print(1 in s)
print(2 in s)

# operations with multiple sets
ss = {4, 5, 6, 7}
print(ss)

# | set OR
print(s | ss)

# & set AND
print(s & ss)

# ^ set XOR - Exclusive OR == in A OR B AND NOT (A AND B)
print(s ^ ss)

{1, 2, 3}
{1, 2, 3, 4}
{1, 3, 4}
True
False
{4, 5, 6, 7}
{1, 3, 4, 5, 6, 7}
{4}
{1, 3, 5, 6, 7}


### Frozensets - How to include a set as a member of another set
- A frozenset can **NOT** be changed after it is created, i.e. it is **immutable** and **hashable**
- Therefore, **can** be a member of another set

In [0]:
# frozenset - immutable, so it CAN be a member of another set
sss = s ^ ss
print(sss)

fsss = frozenset(sss)
print(fsss)

# YUP
sss.add(8)
print(sss)

sss.add(fsss)
print(sss)

# NOPE
fsss.add(8)  # should generate error since immutable

{1, 3, 5, 6, 7}
frozenset({1, 3, 5, 6, 7})
{1, 3, 5, 6, 7, 8}
{1, 3, 5, 6, 7, 8, frozenset({1, 3, 5, 6, 7})}


AttributeError: 'frozenset' object has no attribute 'add'

## Comparison Operators - <, <=, ==, >, >=, !=

**Challenge question**: What is the difference between **==** and **=** ?

In [0]:
1 > 2

False

In [0]:
1 < 2

True

In [0]:
1 >= 1

True

In [0]:
1 <= 4

True

In [0]:
1 == 1

True

In [0]:
'hi' == 'bye'

False

## Logic Operators - T & F = F, T & T = T, F & F = F

In [0]:
# False & True is False
(1 > 2) and (2 < 3)

False

In [0]:
# False | True is True
(1 > 2) or (2 < 3)

True

In [0]:
# False | False is False
(1 > 2) or (2 > 3)

False

In [0]:
# False | False | True is True
(1 == 2) or (2 == 3) or (4 == 4)

True
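The cells above can be summarised as a truth table. The sketch below builds one with `itertools.product`; the column order (p, q, p and q, p or q, not p) is just a choice made for this example.

```python
from itertools import product

rows = []
for p, q in product([True, False], repeat=2):
    # one row per combination of the two inputs
    row = (p, q, p and q, p or q, not p)
    rows.append(row)
    print(row)
```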

## if, elif, else Statements - "else if" is written elif in Python

In [0]:
if 1 < 2:
    print('Yep!')

Yep!


In [0]:
if 1 < 2:
    print('yep!')

yep!


In [0]:
if 1 < 2:
    print('first')
else:
    print('last')

first


In [0]:
if 1 > 2:
    print('first')
else:
    print('last')

last


In [0]:
if 1 == 2:
    print('first')
elif 3 == 3:
    print('middle')
else:
    print('Last')

middle


## for Loops

In [0]:
# over items in list
seq = [1,2,3,4,5,4,3,2,1]
for item in seq:
    print(item)
    

1
2
3
4
5
4
3
2
1


In [0]:
# over items in set
seq_set = set(seq)
print(seq_set)

for item in seq_set:
    print(item)

{1, 2, 3, 4, 5}
1
2
3
4
5


In [0]:
for jelly in seq:  # can use almost anything to represent iterator element
    print(jelly+jelly)

2
4
6
8
10
8
6
4
2


## while Loops - python index starts with 0 (not 1)

In [0]:
i = 1
while i < 5:
    print('i is: {}'.format(i))
    i = i+1

i is: 1
i is: 2
i is: 3
i is: 4


## range(start, stop, step) - stop is excluded

In [0]:
range(5)

range(0, 5)

In [0]:
range(2,5)

range(2, 5)

In [0]:
for i in range(5):
    print(i)

0
1
2
3
4


In [0]:
for i in range(2,5):
    print(i)

2
3
4


In [0]:
for i in range(2,5,2):
    print(i)

2
4


In [0]:
for i in range(5,2,-1):
    print(i)

5
4
3


In [0]:
list(range(5))

[0, 1, 2, 3, 4]

In [0]:
list(range(2,5))

[2, 3, 4]

In [0]:
list(range(2,5,2))

[2, 4]

In [0]:
list(range(5,2,-1))

[5, 4, 3]

## list comprehension - more concise (and often slightly faster) version of *for loop*

In [100]:
x = [*range(100)]
print(x)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]


In [0]:
%%timeit?

In [103]:
%%timeit -n1000 -r 10  # for loop append method
out = []
for item in x:
  if item % 2 == 0:
    out.append((item, item**2))
    

#print(out)

1000 loops, best of 10: 19 µs per loop


In [104]:
%%timeit -n1000 -r 10  # list comprehension method
[(item, item**2) for item in x if item % 2 == 0];

1000 loops, best of 10: 17.4 µs per loop


Result: the list comprehension is slightly faster than the for loop in this simple example (17.4 µs vs 19 µs per loop)

[see explanation of time output](https://serverfault.com/questions/48455/what-are-the-differences-between-wall-clock-time-user-time-and-cpu-time)
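The `%%timeit` magic only works inside IPython/Jupyter. Outside a notebook, a similar comparison can be sketched with the standard library `timeit` module. The function names below are illustrative, and the timings will differ from machine to machine, so no expected numbers are shown.

```python
import timeit

x = list(range(100))

def with_loop():
    out = []
    for item in x:
        if item % 2 == 0:
            out.append((item, item**2))
    return out

def with_comprehension():
    return [(item, item**2) for item in x if item % 2 == 0]

# total seconds for 1000 calls of each version
loop_time = timeit.timeit(with_loop, number=1000)
comp_time = timeit.timeit(with_comprehension, number=1000)
print(f'loop: {loop_time:.4f}s  comprehension: {comp_time:.4f}s')
```

Both functions build the same list, so any difference in the timings comes from how the list is built, not from what it contains.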

## functions - important concept in Python used in ML and AI

In [0]:
# step 1 - define function
def my_func(param0, param1='default'):
    """
    Docstring goes here.
    """
    print(param0, param1)

In [120]:
# step 2 - call function
my_func(6)

6 default


In [121]:
# step 3 - pass an argument to the function either as a literal value or through a variable
my_func('new param')

new param default


In [123]:
my_func(6, 16) 

6 16
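Arguments can also be passed by name (keyword arguments), which is common when calling ML library functions that have many optional parameters. The sketch below uses a hypothetical function `describe` that returns a string instead of printing, so the results can be compared directly.

```python
def describe(param0, param1='default'):
    """Return the two parameters joined by a space (hypothetical example function)."""
    return f'{param0} {param1}'

positional = describe(6, 16)                   # both arguments by position
uses_default = describe(6)                     # param1 falls back to its default
by_keyword = describe(6, param1=16)            # second argument by name
any_order = describe(param1='b', param0='a')   # keywords can come in any order

print(positional, uses_default, by_keyword, any_order, sep=' | ')
```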


In [0]:
# simple function to return the square of a number
def square(x):
    return x**2

In [0]:
# call func on 2 and assign to out
out = square(2)

In [140]:
print(out)

4


## Processing files directly with Python will be covered in Pandas lab

