# CS 5489 Machine Learning
# Lecture 1: Python Tutorial
## Dr. Antoni B. Chan
### Dept. of Computer Science, City University of Hong Kong

# Why Python?
- General-purpose high-level programming language
- Design philosophy emphasizes programmer productivity and code readability
  - "executable pseudo-code"
- Supports multiple programming paradigms
  - object-oriented, imperative, functional
- Dynamic typing and automatic memory management

# What is special about Python?
- Object-oriented: everything is an object
- Clean: usually one way to do something, not a dozen
- Easy-to-learn: learn in 1-2 days
- Easy-to-read
- Powerful: full-fledged programming language

# Applications for Python
- Scientific Computing
  - numpy, scipy, ipython
- Data Science, Deep Learning
  - scikit-learn, matplotlib, pandas, keras, tensorflow
- Web & Internet Development
  - Django – complete web application framework
  - model-view-controller design pattern
  - templates, web server, object-relational mapper

# Disadvantages of Python
- Not as fast as Java or C
- However, you can call C-compiled libraries from Python (e.g. Boost C++)
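- Alternatively, Python code can be compiled to improve speed
  - Cython: compiles Python to C; declaring variable types gives the largest speedups
  - PyPy: an alternative interpreter with a just-in-time compiler (no type declarations needed)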

# Installing Python
- We will use Python 3
  - Python 3 is not backwards compatible with Python 2.7
- Anaconda (https://www.anaconda.com/download)
  - single bundle includes most scientific computing packages.
    - package manager for installing other libraries
  - make sure to pick version for **Python 3**.
  - easy install packages for Windows, Mac, Linux.
    - (single directory install)

# Running Python
- Interactive shell (ipython)
  - good for learning the language, experimenting with code, testing modules

***
```
Nori:CS5489 abc$ ipython
Python 3.5.4 |Anaconda, Inc.| (default, Oct  5 2017, 02:58:14) 
Type "copyright", "credits" or "license" for more information.

IPython 4.2.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: print("Hello, World")
Hello, World

In [2]: 
Do you really want to exit ([y]/n)? y
Nori:CS5489 abc$ 

```

- Script file (hello.py)
```python
#!/usr/bin/python
print("Hello, World")
```

- Standalone script
  - explicitly using python interpreter
```
Nori:~ abc$ python hello.py
Hello, World
```

  - using magic shebang (Linux, Mac OS X)
```
Nori:~ abc$ ./hello.py 
Hello, World
```

# Jupyter (ipython notebooks)
- Launch from _Anaconda Navigator_
- browser-based interactive computing environment
  - development, documenting, executing code, viewing results (inline images)
  - whole session stored in notebook document (.ipynb)
  - (also made and presented these slides!)

![ipynb](ipynb-demo.png)

# Jupyter tips
- Keyboard shortcuts
  - there are a lot of keyboard shortcuts for moving between cells, running cells, deleting and inserting cells.
- Starting directory
  - use the `--notebook-dir=mydir` option to start the notebook in a particular directory.
  - Windows: create a shortcut to run `jupyter-notebook.exe --notebook-dir=%userprofile%`.

- Problems viewing SVG images in ipynb
  - SVG images may not display due to the security model of Jupyter.
  - select "Trust Notebook" from the "File" menu to show the SVG images.
- View ipynb in slideshow mode in a web browser (like this presentation!)
```
jupyter-nbconvert --to slides file.ipynb --post serve
```
- Convert to HTML to view statically in web browser
```
jupyter-nbconvert --to html file.ipynb
```

- ValueError when using matplotlib in Jupyter
  - This mainly affects Macs where the OS locale is set to a non-English language.  Open the "Terminal" app and go to Preferences -> Profiles -> Terminal -> Environment. Deselect the option "Set locale variables automatically".
  - more info: http://stackoverflow.com/questions/15526996/ipython-notebook-locale-error
- MacOS and Anaconda
  - MacOS has a builtin python distribution. If you are using anaconda, make sure that you use the correct command-line commands. You can add "/anaconda3/bin/" in front of the command to make sure you are using the anaconda version.  Otherwise, it may default to the builtin python.



# Outline
1. Python Intro
2. **Python Basics (identifiers, types, operators)**
3. Control structures (conditional and loops)
4. Functions, Classes
5. File IO, Pickle, pandas
6. NumPy
7. matplotlib

# Python Basics
- Formatting
  - case-sensitive
  - statements end in **newline** (not semicolon)
    - use semicolon for multiple statements in one line.
  - **indentation** for code blocks (after a colon).

In [1]:
print("Hello")
print("Hello"); print("World")
name = "Bob"
if name == "George":
    print("Hi George")
else:
    print("Who are you?")

Hello
Hello
World
Who are you?


- single-line comments with `#`
- multi-line statements continued with backslash (`\`)
  - not required inside `{}`, `()`, or `[]` for data types

In [2]:
# this is a comment
a=1       # comments also can go after statements
b=2; c=3  # here too

# multiple line statement
x = a + \
    b + c

# backslash not needed when listing multi-line data
y = [1, 2, 
     3, 4]

# Identifiers and Variables
- Identifiers
  - same as in C
- Naming convention:
  - `ClassName` -- a class name
  - `varName` -- other identifier
  - `_privateVar` -- private identifier
  - `__veryPrivate` -- strongly private identifier
  - `__special__` -- language-defined special name

- Variables
  - no declaration needed
  - no need for declaring data type (automatic type)
  - need to assign to initialize
    - use of uninitialized variable raises exception
  - automatic garbage collection (reference counts)
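
The variable rules above can be sketched as follows. This is only a minimal illustration; the names `count` and `undefined_name` are made up for the example.

```python
# A name is created by assignment; no declaration or type is needed.
count = 3
type_before = type(count).__name__   # 'int'

# The same name can later be bound to an object of a different type.
count = "three"
type_after = type(count).__name__    # 'str'

# Using a name that was never assigned raises NameError.
try:
    undefined_name                   # hypothetical name, never assigned
    raised = False
except NameError:
    raised = True

print(type_before, type_after, raised)
```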

# Basic Types
- Integer number

In [3]:
4
int(4)

4

- Real number (float)

In [4]:
4.0
float(4)

4.0

- Boolean

In [5]:
True
False

False

- String literal

In [6]:
"a string"
'a string'
"concatenate " "two string literals"
"""this is a multi-line string.
it keeps the newline."""
r'raw string\no escape chars'

'raw string\\no escape chars'

# Lists
- Lists can hold anything (even other lists)

In [7]:
myList = ['abcd', 786, 2.23]
print(myList)     # print the list

['abcd', 786, 2.23]


In [8]:
print(myList[0])  # print the first element (0-indexed)

abcd


- Creating lists of numbers

In [9]:
a = range(5)   # range object covering 0 to 4 (not a list in Python 3)
print(a)
print(list(a))

range(0, 5)
[0, 1, 2, 3, 4]


In [10]:
b = range(2,12,3)  # numbers from 2 to 11, count by 3 
print(b)
print(list(b))

range(2, 12, 3)
[2, 5, 8, 11]


- append and pop 

In [11]:
a = list(range(0,5))
a.append('blah')  # add item to end
print(a)

[0, 1, 2, 3, 4, 'blah']


In [12]:
a.pop()  # remove last item and return it

'blah'

- insert and delete

In [13]:
a.insert(0,42)  # insert 42 at index 0
print(a)

[42, 0, 1, 2, 3, 4]


In [14]:
del a[2]    # delete item 2
print(a)

[42, 0, 2, 3, 4]


- more list operations

In [15]:
a.reverse()   # reverse the entries
print(a)

[4, 3, 2, 0, 42]


In [16]:
a.sort()     # sort the entries
print(a)

[0, 2, 3, 4, 42]


# Tuples
- Similar to a list
  - but immutable (read-only)
  - cannot change the contents (like a string constant)

In [17]:
# make some tuples
x = (1,2,'three')
print(x)

(1, 2, 'three')


In [18]:
y = 4,5,6           # parentheses not needed!
print(y)

(4, 5, 6)


In [19]:
z = (1,)   # tuple with 1 element (the trailing comma is required)
print(z)

(1,)


# Operators on sequences
- _Same operators_ for strings, lists, and tuples
- Slice a sublist with colon (`:`)
  - **Note**: the 2nd argument is not inclusive!

In [20]:
"hello"[0]    # the first element

'h'

In [21]:
"hello"[-1]   # the last element (index from end)

'o'

In [22]:
"hello"[1:4]  # the 2nd through 4th elements

'ell'

In [23]:
"hello"[2:]   # the 3rd through last elements

'llo'

In [24]:
"hello"[0:5:2] # indices 0,2,4 (by 2)

'hlo'

- Other operators on string, list, tuple

In [25]:
len("hello")   # length

5

In [26]:
"he" + "llo"   # concatenation

'hello'

In [27]:
"hello"*3      # repetition

'hellohellohello'
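
The examples above use strings, but the same operators apply to lists and tuples. A small sketch with made-up values (`nums` and `pair` are illustrative names):

```python
nums = [10, 20, 30, 40, 50]      # a list
pair = (1, 2)                    # a tuple

first_three = nums[0:3]          # indices 0, 1, 2 (end index excluded)
last_two = nums[-2:]             # negative index counts from the end
every_other = nums[::2]          # step of 2
reversed_nums = nums[::-1]       # negative step walks backwards

joined = pair + (3,)             # concatenation creates a new tuple
repeated = [0] * 3               # repetition

print(first_three, last_two, every_other, reversed_nums, joined, repeated)
```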

# String methods
- Useful methods

In [28]:
"112211".count("11")         # 2
"this.com".endswith(".com")  # True
"wxyz".startswith("wx")      # True
"abc".find("c")              # finds first: 2
",".join(['a', 'b', 'c'])    # join list: 'a,b,c'
"aba".replace("a", "d")      # replace all: "dbd"
"a,b,c".split(',')           # make list: ['a', 'b', 'c']
"  abc    ".strip()          # "abc",  also rstrip(), lstrip()

'abc'

- String formatting: automatically fill in type

In [29]:
"{} and {} and {}".format('string', 123, 1.6789)

'string and 123 and 1.6789'

- String formatting: specify type (similar to C)

In [30]:
"{:d} and {:f} and {:0.2f}".format(False, 3, 1.234)

'0 and 3.000000 and 1.23'
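
Python 3.6 and later also accept formatted string literals ("f-strings"), which take the same format specifications as `format` but embed the expressions directly. A brief sketch, with illustrative variable names:

```python
name = "string"
count = 123
value = 1.6789

# automatic formatting, equivalent to "{} and {} and {}".format(...)
auto = f"{name} and {count} and {value}"

# explicit format specifications after the colon
typed = f"{count:d} and {count:f} and {value:0.2f}"

print(auto)
print(typed)
```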

# Dictionaries
- Stores key-value pairs (associative array or hash table)
  - key can be a string, number, or tuple

In [31]:
mydict = {'name': 'john', 42: 'sales', ('hello', 'world'): 6734}
print(mydict)

{'name': 'john', 42: 'sales', ('hello', 'world'): 6734}


- Access

In [32]:
print(mydict['name'])         # get value for key 'name'

john


In [33]:
mydict['name'] = 'jon' # change value for key 'name'
mydict[2] = 5          # insert a new key-value pair
print(mydict)

{'name': 'jon', 42: 'sales', ('hello', 'world'): 6734, 2: 5}


In [34]:
del mydict[2]          # delete entry for key 2
print(mydict)

{'name': 'jon', 42: 'sales', ('hello', 'world'): 6734}


- Other operations:

In [35]:
mydict.keys()           # view of all keys (no random access)

dict_keys(['name', 42, ('hello', 'world')])

In [36]:
list(mydict.keys())     # convert to a list for random access

['name', 42, ('hello', 'world')]

In [37]:
mydict.values()         # view of all values

dict_values(['jon', 'sales', 6734])

In [38]:
mydict.items()          # view of (key, value) tuples

dict_items([('name', 'jon'), (42, 'sales'), (('hello', 'world'), 6734)])

In [39]:
'name' in mydict  # check the presence of a key  

True
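
Checking for a key matters because indexing a missing key raises an exception. The sketch below contrasts indexing with the dictionary's `get` method, which returns a default instead; the key `'age'` is made up for the example.

```python
mydict = {'name': 'jon', 42: 'sales'}

# Indexing a missing key raises KeyError ...
try:
    mydict['age']
    missing_raises = False
except KeyError:
    missing_raises = True

# ... while get() returns a default (None if no default is given).
age = mydict.get('age', 'unknown')
name = mydict.get('name', 'unknown')

print(missing_raises, age, name)
```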

# Operators
- Arithmetic: `+`, `-`, `*`, `/`, `%`, `**` (exponent), `//` (floor division)

In [40]:
print(6/4)     # float division

1.5


In [41]:
print(6//4)   # integer division

1


In [42]:
print(6//4.0)  # floor division

1.0


- Assignment: `=`, `+=`, `-=`, `*=`, `/=`, `%=`, `**=`, `//=`
- Equality: `==`, `!=`
- Compare: `>`, `>=`, `<`, `<=`
- Logical: `and`, `or`, `not`

- Membership: `in`, `not in`

In [43]:
2 in [2, 3, 4]

True

- Identity: `is`, `is not`
  - checks reference to the same object

In [44]:
x = [1,2,3]
y = x
x is y    # same object?

True

In [45]:
z = x[:]  # create a copy

In [46]:
z is x    # same object?

False
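
To make the difference between identity (`is`) and equality (`==`) concrete, here is a short sketch that reuses the same kind of lists as above:

```python
x = [1, 2, 3]
y = x          # y is another name for the same list object
z = x[:]       # z is a new list with equal contents

same_object = (y is x, z is x)     # identity
same_value = (y == x, z == x)      # equality

y.append(4)    # a change made through y is visible through x, but not z

print(same_object, same_value, x, z)
```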

- Tuple packing and unpacking

In [47]:
point = (1,2,3)
(x,y,z) = point
print(x)
print(y)
print(z)

1
2
3


# Sets
- a set is a collection of unique items

In [48]:
a=[1, 2, 2, 2, 4, 5, 5]
sA = set(a)
sA

{1, 2, 4, 5}

- set operations

In [49]:
sB = {4, 5, 6, 7}
print(sA - sB)    # set difference

{1, 2}


In [50]:
print (sA | sB)    # set union

{1, 2, 4, 5, 6, 7}


In [51]:
print (sA & sB)    # set intersect

{4, 5}


# Outline
1. Python Intro
2. Python Basics (identifiers, types, operators)
3. **Control structures (conditional and loops)**
4. Functions, Classes
5. File IO, Pickle, pandas
6. NumPy
7. matplotlib

# Conditional Statements
- indentation used for code blocks after colon (:)
- if-elif-else statement

In [52]:
if x==2:
    print("foo")
elif x==3:
    print("bar")
else:
    print("baz")

baz


- nested if

In [53]:
if x>1:
    if x==2:
        print("foo")
    else:
        print("bar")
else:
    print("baz")

baz


- single-line

In [54]:
if x==1: print("blah")

blah


- check existence using "if in"

In [55]:
mydict = {'name': 'john', 42: 'sales'}
if 'name' in mydict:
    print("mydict has name field")

mydict has name field


In [56]:
if 'str' in 'this is a long string':
    print('str is inside')

str is inside


# Loops
- "for-in" loop over values in a list

In [57]:
ns = range(1,6,2)    # numbers from 1 up to (but not including) 6, by 2
for n in ns:
    print(n)

1
3
5


- loop over index-value pairs

In [58]:
x = ['a', 'b', 'c']
for i,n in enumerate(x):
    print(i, n)

0 a
1 b
2 c


- looping over two lists at the same time

In [59]:
x = ['a', 'b', 'c']
y = ['A', 'B', 'C']
for i,j in zip(x,y):
    print(i,j)

a A
b B
c C


- `zip` creates pairs of items between the two lists
  - (actually creates an iterator over them)

In [60]:
list(zip(x,y))    # convert to a list (for random access)

[('a', 'A'), ('b', 'B'), ('c', 'C')]

- looping over dictionary

In [61]:
x = {'a':1, 'b':2, 'c':3}
for (key,val) in x.items():
    print(key, val)

a 1
b 2
c 3


- while loop

In [62]:
x=0
while x<5:
    x += 1
print(x)

5


In [63]:
# single line
while x<10: x += 1
print(x)

10


- loop control (same as C)
  - `break`, `continue`  
- else clause
  - runs after the list is exhausted
  - does _not_ run if the loop exits via `break`

In [64]:
for i in [0, 1, 6]:
    print(i)
else:
    print("end of list reached!")

0
1
6
end of list reached!
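
The example above never breaks out of the loop, so the `else` clause always runs. The following sketch shows both outcomes; `find_first_negative` is a hypothetical helper written for this illustration.

```python
def find_first_negative(values):
    """Describe the first negative value in values, if there is one."""
    for v in values:
        if v < 0:
            result = "found {}".format(v)
            break                        # leaving via break skips the else clause
    else:
        result = "no negative value"     # runs only if the loop was not broken
    return result

with_break = find_first_negative([3, -1, 6])
without_break = find_first_negative([0, 1, 6])
print(with_break)
print(without_break)
```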


# List Comprehension
- build a new list with a "for" loop

In [65]:
myList = [1, 2, 2, 2, 4, 5, 5]
myList4 = [4*item for item in myList]   # multiply each item by 4
myList4

[4, 8, 8, 8, 16, 20, 20]

In [66]:
# equivalent code
myList4=[]
for item in myList:
    myList4.append(4*item)
myList4

[4, 8, 8, 8, 16, 20, 20]

In [67]:
# can also use conditional to select items
[4*item*4 for item in myList if item>2]

[64, 80, 80]

# Outline
1. Python Intro
2. Python Basics (identifiers, types, operators)
3. Control structures (conditional and loops)
4. **Functions, Classes**
5. File IO, Pickle, pandas
6. NumPy
7. matplotlib

# Functions
- Defining a function
  - _required_ and _optional_ inputs (similar to C++)
  - "docstring" for optional documentation

In [68]:
def sum3(a, b=1, c=2):
    "sum a few values"
    mysum = a+b+c
    return mysum

- Calling a function

In [69]:
sum3(2,3,4)   # call function: 2+3+4

9

In [70]:
sum3(0)    # use default inputs: 0+1+2

3

In [71]:
sum3(b=1, a=5, c=2)  # use keyword arguments: 5+1+2

8

In [72]:
help(sum3)   # show documentation

Help on function sum3 in module __main__:

sum3(a, b=1, c=2)
    sum a few values



In [73]:
# ipython magic -- shows a help window about the function
? sum3        

# Classes
- Defining a class
  - `self` is a reference to the object instance (passed _implicitly_)

In [74]:
class MyList:
    "class documentation string"
    num = 0                  # a class variable
    def  __init__(self, b):  # constructor
        self.x = [b]         # an instance variable
        MyList.num += 1      # modify class variable
    def appendx(self, b):    # an instance method
        self.x.append(b)     # modify an instance variable
        self.app = 1         # create new instance variable

- Using the class

In [75]:
c = MyList(0)         # create an instance of MyList
print(c.x)

[0]


In [76]:
c.appendx(1)          # c.x = [0, 1]
print(c.x)

[0, 1]


In [77]:
c.appendx(2)          # c.x = [0, 1, 2]
print(c.x)

[0, 1, 2]


In [78]:
print(MyList.num)      # access class variable (same as c.num)

1


# More on Classes
- There are _no_ "private" members
  - everything is accessible
  - convention to indicate _private_:
    - `_variable` means private method or variable (but still accessible)
  - convention for _very private_:
    - `__variable` is not directly visible
    - actually it is renamed to `_classname__variable`

- Instance variable rules
  - On _use_ via instance (`self.x`), scope search order is:
    - (1) instance, (2) class, (3) base classes
    - also the same for method lookup
  - On _assignment_ via instance (`self.x=...`):
    - always makes an instance variable
  - Class variables "default" for instance variables
    - _class_ variable: one copy _shared_ by all
    - _instance_ variable: each instance has its own
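
The lookup and assignment rules above, along with name mangling, can be checked with a small experiment. `Counter` is a hypothetical class used only for this sketch.

```python
class Counter:
    step = 1                   # class variable: shared by all instances
    def __init__(self):
        self.__secret = 42     # stored as _Counter__secret (name mangling)

a = Counter()
b = Counter()

a.step = 10                    # assignment via an instance creates an instance variable on a only
Counter.step = 5               # changing the class variable affects instances without their own copy

lookup = (a.step, b.step)      # a finds its instance variable; b falls back to the class
mangled = a._Counter__secret   # the "very private" name is still reachable
hidden = hasattr(a, '__secret')  # the unmangled name does not exist

print(lookup, mangled, hidden)
```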

# Inheritance
- Child class inherits attributes from parents

In [79]:
class MyListAll(MyList): 
    def __init__(self, a):   # overrides MyList
        self.allx = [a]
        MyList.__init__(self, a)   # call base class constructor
    def popx(self):
        return self.x.pop()
    def appendx(self, a):          # overrides MyList
        self.allx.append(a)
        MyList.appendx(self, a)    # "super" method call

- Multiple inheritance
  - `class ChildClass(Parent1, Parent2, ...)`
  - calling method in parent
    - `super(ChildClass, self).method(args)`
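
In Python 3, `super()` can also be called without arguments inside a method, which is equivalent to `super(ChildClass, self)`. The sketch below uses hypothetical `Base` and `Child` classes that mirror the structure of the example above.

```python
class Base:
    def __init__(self, a):
        self.x = [a]
    def appendx(self, a):
        self.x.append(a)

class Child(Base):
    def __init__(self, a):
        super().__init__(a)        # same as super(Child, self).__init__(a)
        self.allx = [a]
    def appendx(self, a):
        self.allx.append(a)
        super().appendx(a)         # call the parent implementation

child = Child(0)
child.appendx(1)
print(child.x, child.allx)
```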

# Class methods & Built-in Attributes
- Useful methods to override in class

In [80]:
class MyList2:
    ...
    def __str__(self):     # string representation
        ...
    def __eq__(self, x):   # equality comparison (Python 3 no longer uses __cmp__)
        ...
    def __del__(self):     # destructor
        ...
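
As a concrete sketch of overriding special methods in Python 3, the hypothetical `Point` class below implements `__str__` for printing and `__eq__` for the `==` operator.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __str__(self):             # used by print() and str()
        return "Point({}, {})".format(self.x, self.y)
    def __eq__(self, other):       # used by the == operator
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

p = Point(1, 2)
q = Point(1, 2)
text = str(p)
equal = (p == q)          # compares contents via __eq__
identical = (p is q)      # two distinct objects
print(text, equal, identical)
```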

- Built-in attributes

In [81]:
print(c.__dict__)    # Dictionary with the namespace.
print(c.__doc__)     # Class documentation string
print(c.__module__)  # Module which defines the class

{'x': [0, 1, 2], 'app': 1}
class documentation string
__main__


In [82]:
print(MyList.__name__)    # Class name
print(MyList.__bases__)   # tuple of base classes

MyList
(<class 'object'>,)


# Outline
1. Python Intro
2. Python Basics (identifiers, types, operators)
3. Control structures (conditional and loops)
4. Functions, Classes
5. **File IO, Pickle, pandas**
6. NumPy
7. matplotlib

# File I/O
- Write a file

In [83]:
with open("myfile.txt", "w") as f:
    f.write("blah\n")
    f.writelines(['line1\n', 'line2\n', 'line3\n'])

# NOTE: using "with" will automatically close the file

- Read a whole file

In [84]:
with open("myfile.txt", "r") as f:
    contents = f.read()   # read the whole file as a string
    print(contents)

blah
line1
line2
line3



- Read line or remaining lines

In [85]:
f = open("myfile.txt", 'r')
print(f.readline())    # read a single line.

blah



In [86]:
print(f.readlines())   # read remaining lines in a list.
f.close()

['line1\n', 'line2\n', 'line3\n']


- Read line by line with a loop

In [87]:
with open("myfile.txt", 'r') as f:
    for line in f:
        print(line)    # still contains newline char

blah

line1

line2

line3
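
The blank lines in the output above appear because each line keeps its trailing newline and `print` adds another. A minimal sketch of stripping it, writing to a temporary path so that it does not touch `myfile.txt`:

```python
import os
import tempfile

# Write a small file into a temporary directory (path used only for this example).
path = os.path.join(tempfile.mkdtemp(), "myfile.txt")
with open(path, "w") as f:
    f.writelines(["blah\n", "line1\n", "line2\n"])

# rstrip("\n") removes the trailing newline from each line.
with open(path, "r") as f:
    lines = [line.rstrip("\n") for line in f]

print(lines)
```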



# Saving Objects with Pickle
- Turns almost **any** Python **object** into a byte-stream representation for saving into a file.

In [88]:
import pickle                     # load the pickle library
mylist = MyList(0)                # an object
# open file to save object (write bytes)
with open('alist.pickle', 'wb') as file: 
    pickle.dump(mylist, file)         # save the object using pickle

- Load object from file

In [89]:
with open('alist.pickle', 'rb') as file:  # (read bytes)
    mylist2 = pickle.load(file)       # load pickled object from file
print(mylist2)
print(mylist2.x)

<__main__.MyList object at 0x7fa84032a510>
[0]


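- In Python 3, `pickle` automatically uses its fast C implementation when available, so the separate `cPickle` module from Python 2 is no longer needed.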
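
Pickling also works without a file: `pickle.dumps` returns the serialized bytes and `pickle.loads` rebuilds the object. A short round-trip sketch with a made-up `record` dictionary:

```python
import pickle

record = {'name': 'jon', 'scores': [1, 2, 3], 'point': (4, 5)}

data = pickle.dumps(record)        # serialize to a bytes object
restored = pickle.loads(data)      # rebuild an equal object from the bytes

# The restored object has equal contents but is a separate object.
print(type(data).__name__, restored == record, restored is record)
```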

# Exception Handling
- Catching an exception
  - `except` block catches exceptions
  - `else` block executes if no exception occurs
  - `finally` block always executes at end

In [90]:
try:
    file = open('blah.pickle', 'rb')   # pickle files must be opened in binary mode
    blah = pickle.load(file)
    file.close()
except:               # catch everything
    print("No file!")
else:                 # executes if no exception occurred
    print("No exception!")
finally:
    print("Bye!")      # always executes

No file!
Bye!
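
Catching everything with a bare `except` can hide unrelated bugs, so it is usually safer to name the exception you expect. The sketch below assumes the only anticipated failure is a missing file; `load_or_default` is a hypothetical helper.

```python
import os
import pickle
import tempfile

def load_or_default(path, default=None):
    """Load a pickled object from path; return default if the file is missing."""
    try:
        with open(path, 'rb') as f:       # pickle files are opened in binary mode
            obj = pickle.load(f)
    except FileNotFoundError as e:        # catch only the error we expect
        print("No file:", e.filename)
        return default
    else:
        return obj                        # runs only if no exception occurred
    finally:
        print("Bye!")                     # runs in both cases

# A path inside a fresh temporary directory, so the file cannot exist.
missing = os.path.join(tempfile.mkdtemp(), 'blah.pickle')
result = load_or_default(missing, default=[])
```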


# pandas
- pandas is a Python library for data wrangling and analysis.
- `DataFrame` is a table of entries (like an Excel spreadsheet).
  - each column does not need to be the same type
  - operations to modify and operate on the table

In [91]:
# setup pandas and display
import pandas as pd

In [92]:
# read CSV file
df = pd.read_csv('mycsv.csv')

# print the dataframe
df

Unnamed: 0,Name,Location,Age
0,John,New York,24
1,Anna,Paris,13
2,Peter,Berlin,53
3,Linda,London,33


- select a column

In [93]:
df['Name']

0     John
1     Anna
2    Peter
3    Linda
Name: Name, dtype: object

- query the table

In [94]:
# select Age greater than 30
df[df.Age > 30]

Unnamed: 0,Name,Location,Age
2,Peter,Berlin,53
3,Linda,London,33


- compute statistics 

In [95]:
df.mean(numeric_only=True)   # average the numeric columns only (required in recent pandas)

Age    30.75
dtype: float64

# Outline
1. Python Intro
2. Python Basics (identifiers, types, operators)
3. Control structures (conditional and loops)
4. Functions, Classes
5. File IO, Pickle, pandas
6. **NumPy**
7. matplotlib

# NumPy
- Library for multidimensional arrays and 2D matrices
- `ndarray` class for multidimensional arrays
  - elements are all the same type
  - aliased to `array`

In [96]:
from numpy import *     # import all classes from numpy
a = arange(15)
a

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [97]:
b = a.reshape(3,5)  # rows x columns
b

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [98]:
b.shape  # get the shape (num rows x num columns)

(3, 5)

In [99]:
b.ndim   # get number of dimensions

2

In [100]:
b.size   # get number of elements

15

In [101]:
b.dtype  # get the element type

dtype('int64')

# Array Creation

In [102]:
a = array([1, 2, 3, 4])       # use a list to initialize
a

array([1, 2, 3, 4])

In [103]:
b = array([[1.1,2,3], [4,5,6]]) # or list of lists
b

array([[1.1, 2. , 3. ],
       [4. , 5. , 6. ]])

In [104]:
zeros( (3,4) )   # 3x4 array of zeros

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [105]:
ones( (2,4) )  # 2x4 array of ones

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.]])

In [106]:
full( (3,4), 8.8)  # 3x4 array with all 8.8

array([[8.8, 8.8, 8.8, 8.8],
       [8.8, 8.8, 8.8, 8.8],
       [8.8, 8.8, 8.8, 8.8]])

In [107]:
empty( (2,3) )  # create an array, but do not initialize it.
                # contents are arbitrary (whatever was in memory)

array([[1.1, 2. , 3. ],
       [4. , 5. , 6. ]])

In [108]:
arange(0,5,0.5)   # from 0 to 5 (exclusive), increment by 0.5

array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5])

In [109]:
linspace(0,1,10)  # 10 evenly-spaced numbers between 0 and 1 (inclusive)

array([0.        , 0.11111111, 0.22222222, 0.33333333, 0.44444444,
       0.55555556, 0.66666667, 0.77777778, 0.88888889, 1.        ])

In [110]:
logspace(-3,3,13)  # 13 numbers evenly spaced in log-space between 1e-3 and 1e3

array([1.00000000e-03, 3.16227766e-03, 1.00000000e-02, 3.16227766e-02,
       1.00000000e-01, 3.16227766e-01, 1.00000000e+00, 3.16227766e+00,
       1.00000000e+01, 3.16227766e+01, 1.00000000e+02, 3.16227766e+02,
       1.00000000e+03])

# Array Indexing

- One-dimensional arrays are indexed, sliced, and iterated similar to Python lists.

In [111]:
a = array([1,2,3,4,5])
a[2]

3

In [112]:
a[2:5]            # index 2 through 4

array([3, 4, 5])

In [113]:
a[0:5:2]          # index 0 through 4, by 2

array([1, 3, 5])

In [114]:
# iterating with loop
for i in a:
    print(i)

1
2
3
4
5


- For multi-dimensional arrays, each axis has an index.
  - indices are given using tuples (separated by commas)

In [115]:
a = array([[1, 2, 3], [4, 5, 6], [7,8,9]])
print(a)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


In [116]:
a[0,1]    # row 0, column 1

2

In [117]:
a[:,1]    # all elements in column 1

array([2, 5, 8])

In [118]:
a[0:2, 1:3]  # sub array: rows 0-1, and columns 1-2

array([[2, 3],
       [5, 6]])

In [119]:
# "for" iterates over the first index (rows)
for r in a:
    print("--")
    print(r)

--
[1 2 3]
--
[4 5 6]
--
[7 8 9]


- indexing with a boolean mask

In [120]:
a = array([3, 1, 2, 4])
m = array([True, False, False, True])
print("m =", m)
a[m]             # select with a mask

m = [ True False False  True]


array([3, 4])
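
In practice the mask is usually computed from the array itself with a comparison. A brief sketch (it uses `import numpy as np` rather than the star import so that the block is self-contained):

```python
import numpy as np

a = np.array([3, 1, 2, 4])
m = a > 2                 # element-wise comparison yields a boolean array
selected = a[m]           # keep the entries where the mask is True
a[a < 2] = 0              # a mask can also select entries to assign to

print(m, selected, a)
```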

# multi-dimensional arrays (tensors)

- 3 x 2 x 4 tensor
  - prints as three 2x4 arrays
  - last index is iterated first

In [181]:
a = arange(24)
b = a.reshape((3,2,4))
print(b)

[[[ 0  1  2  3]
  [ 4  5  6  7]]

 [[ 8  9 10 11]
  [12 13 14 15]]

 [[16 17 18 19]
  [20 21 22 23]]]


- indexing is similar to 2-dim arrays (i,j,k)

In [123]:
b[2,0,1]

17

- extract a "slice"

In [124]:
b[1,:]  # i=1

array([[ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [125]:
b[:,1,:]  # j=1

array([[ 4,  5,  6,  7],
       [12, 13, 14, 15],
       [20, 21, 22, 23]])

In [126]:
b[:,:,1]  # k=1

array([[ 1,  5],
       [ 9, 13],
       [17, 21]])

In [127]:
# iterate over the first index
for s in b:
    print("--")
    print(s)

--
[[0 1 2 3]
 [4 5 6 7]]
--
[[ 8  9 10 11]
 [12 13 14 15]]
--
[[16 17 18 19]
 [20 21 22 23]]


# Array Shape Manipulation
- The shape of an array can be changed

In [128]:
a = array([[1,2,3], [4, 5, 6]])
print(a)
a.shape

[[1 2 3]
 [4 5 6]]


(2, 3)

In [129]:
a.ravel()      # return flattened array (last index iterated first).

array([1, 2, 3, 4, 5, 6])

In [130]:
a.transpose()  # return transposed array (swap rows and columns)

array([[1, 4],
       [2, 5],
       [3, 6]])

In [131]:
a.reshape(3,2)  # return reshaped array

array([[1, 2],
       [3, 4],
       [5, 6]])

In [132]:
a.resize(3,2)   # change the shape directly (modifies a)
print(a)

[[1 2]
 [3 4]
 [5 6]]


# Concatenating arrays

In [133]:
a = array([1, 2, 3])
b = array([4, 5, 6])
concatenate((a,b))

array([1, 2, 3, 4, 5, 6])

In [134]:
c_[a,b]      # concatenate as column vectors

array([[1, 4],
       [2, 5],
       [3, 6]])

In [135]:
r_[a,b]      # concatenate as row vectors

array([1, 2, 3, 4, 5, 6])

# Stacking arrays

In [136]:
a = array([[1, 1],
           [1, 1]])
b = array([[2, 2],
           [2, 2]])
vstack( (a,b) )     # stack vertically

array([[1, 1],
       [1, 1],
       [2, 2],
       [2, 2]])

In [137]:
hstack( (a,b) )     # stack horizontally

array([[1, 1, 2, 2],
       [1, 1, 2, 2]])

# Array Operations
- operators are applied **elementwise**

In [138]:
a = array( [20,30,40,50] )
b = arange( 4 )   # [0 1 2 3]
a - b             # element-wise subtraction

array([20, 29, 38, 47])

In [139]:
b**2              # element-wise exponentiation

array([0, 1, 4, 9])

In [140]:
10*sin(a)         # element-wise product and sin

array([ 9.12945251, -9.88031624,  7.4511316 , -2.62374854])

In [141]:
a < 35            # element-wise comparison

array([ True,  True, False, False])

- product operator (`*`) is **elementwise**
  - i.e., Hadamard product

In [142]:
A = array( [[1,1],
            [0,1]] )
B = array( [[2,0],
            [3,4]] )
A*B                         # elementwise product

array([[2, 0],
       [0, 4]])

- compound assignment: `*=`, `+=`, `-=`
- unary operators

In [143]:
a = array( [[1,2,3], [4, 5, 6]])
a.sum()

21

In [144]:
a.min()

1

In [145]:
a.max()

6

- unary operators on each axis of array

In [146]:
a = array( [[1,2,3], [4, 5, 6]])
a.sum(axis=0)    # sum along axis 0 (over the rows): one total per column

array([5, 7, 9])

In [147]:
a.sum(axis=1)    # sum along axis 1 (over the columns): one total per row

array([ 6, 15])

- NumPy provides functions for many other operations
  - `argmax`, `argmin`, `min`, `max`
  - `average`, `cov`, `std`, `mean`, `median`
  - `ceil`, `floor` (element-wise "universal functions", like `sin`)
  - `cumsum`, `cumprod`, `diff`, `sum`, `prod`
  - `linalg.inv`, `dot`, `trace`, `transpose`

# Broadcasting
- any binary operators (+, -, *, etc)
- if the two operands are not the same size
  - broadcasting tries to extend the singleton dimensions of one operand to match the other operand.
  - an Error is thrown if two operands can't be broadcast together.
- operands do not need to have the same number of dimensions
  - match dimensions from the right


In [148]:
a = array( [[1,2,3],
            [4,5,6]] )

In [149]:
b = array( [1,2,3] )

- a and b are not the same dimensions, 
  - b is "stretched" so that it fills in a 2x3 shape
```
a:      2 x 3
b:          3
result: 2 x 3
```


In [150]:
a + b

array([[2, 4, 6],
       [5, 7, 9]])

- c is stretched so that it fills in a 2x3 shape
```
a:      2 x 3
c:      2 x 1
result: 2 x 3
```
  

In [151]:
c = array( [[1],
            [2]] )

In [152]:
a+c

array([[2, 3, 4],
       [6, 7, 8]])

- b and c are both stretched to 2x3 shape
```
b:          3
c:      2 x 1
result: 2 x 3
```

In [153]:
b+c

array([[2, 3, 4],
       [3, 4, 5]])

- "newaxis" can insert an extra dimension
```
b:                3
b[:,newaxis]: 3 x 1
result:       3 x 3
```

In [154]:
b + b[:,newaxis]

array([[2, 3, 4],
       [3, 4, 5],
       [4, 5, 6]])
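
A common use of broadcasting is normalizing the rows or columns of a data matrix. The sketch below, with a made-up matrix `X`, subtracts the column means and divides each row by its sum; `np.newaxis` turns the row sums into a 2 x 1 column so that the shapes match from the right.

```python
import numpy as np

X = np.array([[1., 2., 3.],
              [4., 5., 6.]])              # shape (2, 3)

col_mean = X.mean(axis=0)                 # shape (3,): one mean per column
centered = X - col_mean                   # (2, 3) - (3,): stretched over the rows

row_sum = X.sum(axis=1)                   # shape (2,): one sum per row
normalized = X / row_sum[:, np.newaxis]   # (2, 3) / (2, 1): stretched over the columns

print(centered)
print(normalized.sum(axis=1))
```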

# Brief Linear Algebra Review
- column vector: $\mathbf{x} = \left[\begin{array}{c} x_1\\\vdots\\x_d\end{array}\right] \in \mathbb{R}^d$
- matrix: $\mathbf{A} = \left[\begin{array}{c}a_{1,1} & \cdots & a_{1,n} \\ \vdots & \ddots & \vdots \\ a_{m,1} & \cdots & a_{m,n}\end{array}\right] \in \mathbb{R}^{m\times n}$


- matrix as collection of column vectors: $\mathbf{A} = \left[\begin{array}{c} | & & | \\ \mathbf{a}_1 & \cdots & \mathbf{a}_n \\ | & & | \end{array}\right]$
  - $\mathbf{a}_i$ is the i-th column of $\mathbf{A}$.

In [155]:
x = array([1,2,3]).reshape((3,1))
print(x)

[[1]
 [2]
 [3]]


In [156]:
A = zeros((3,3))
print(A)

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]


- Transpose: swap rows and columns
  - $\mathbf{x}^T = [x_1 \cdots x_d]$

In [157]:
z = x.transpose()
print(z)

[[1 2 3]]


# Inner product
- Inner product: $\mathbf{x}^T \mathbf{y} = \sum_{i=1}^d x_i y_i$
  - measures the similarity between vectors $\mathbf{x}$ and $\mathbf{y}$.

In [158]:
x = array([1, 2, 3])
y = array([2, 1, 1])
inner(x,y)

7

- Length (norm): $||\mathbf{x}|| = \sqrt{\mathbf{x}^T\mathbf{x}} = \sqrt{\sum_{i=1}^d x_i^2}$

In [182]:
x = array([1, 2, 3])
linalg.norm(x)

3.7416573867739413

- Distance between two vectors: $||\mathbf{x}-\mathbf{y}|| = \sqrt{\sum_{i=1}^d (x_i-y_i)^2}$



In [161]:
y = array([2, 1, 1])
linalg.norm(x-y)

2.449489742783178

- Outerproduct between two vectors: $\mathbf{x}\mathbf{y}^T = \left[\begin{array}{c}y_1 \mathbf{x} & \cdots & y_d \mathbf{x} \end{array}\right]$
  - $\mathbf{x}\mathbf{y}^T = \left[\begin{array}{c}x_1y_1 & \cdots & x_1y_d \\ \vdots & \ddots & \vdots \\ x_dy_1 & \cdots & x_d y_d\end{array}\right]$

In [162]:
x = array([1, 2, 3])
y = array([2, 1, 1])
outer(x,y)

array([[2, 1, 1],
       [4, 2, 2],
       [6, 3, 3]])

# Matrix multiplication
- need compatible dimensions: $\mathbf{C}_{m \times n} = \mathbf{A}_{m\times d} \mathbf{B}_{d\times n}$
- $\mathbf{A} = \left[\begin{array}{c} \fbox{$\begin{array}{c} a_{1,1} & \cdots & a_{1,d}\end{array}$} \\  \begin{array}{c}\vdots & \ddots & \vdots\end{array} \\  \begin{array}{c}a_{m,1} & \cdots & a_{m,d}\end{array}\end{array}\right]$, $\mathbf{B} = \left[\begin{array}{c} 
\fbox{$\begin{array}{c}b_{1,1}\\\vdots\\b_{d,1}\end{array}$}
    &\begin{array}{c}\cdots\\\ddots\\\cdots\end{array}
 &\begin{array}{c}b_{1,n}\\\vdots\\b_{d,n}\end{array}
 \end{array}\right]$
- Entry in $\mathbf{C}$: $c_{i,j} = \sum_{k=1}^d a_{i,k} b_{k,j}$, i.e., the inner product of row $i$ of $\mathbf{A}$ with column $j$ of $\mathbf{B}$

In [163]:
A = array([[1, 2, 3],
           [2, 1, 0]])
B = array([[-1, 1],
           [0,  1], 
           [1,  0]])
A @ B

array([[ 2,  3],
       [-2,  3]])
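
To see what `@` computes, the entry formula $c_{i,j} = \sum_{k=1}^d a_{i,k} b_{k,j}$ can be written as three nested loops. This is only an illustrative sketch on plain Python lists (NumPy's implementation is far faster); `matmul` is a hypothetical helper.

```python
def matmul(A, B):
    """Multiply an m x d matrix A by a d x n matrix B (both lists of rows)."""
    m, d, n = len(A), len(B), len(B[0])
    assert all(len(row) == d for row in A), "inner dimensions must match"
    C = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            for k in range(d):
                C[i][j] += A[i][k] * B[k][j]   # accumulate a_ik * b_kj
    return C

A = [[1, 2, 3],
     [2, 1, 0]]
B = [[-1, 1],
     [0,  1],
     [1,  0]]
C = matmul(A, B)
print(C)
```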

# Matrix-Vector multiplication
- Different interpretations if using transpose or not.
- $\mathbf{A}\mathbf{x}$: Linear combination of the columns of $\mathbf{A}$
  - $\mathbf{A}\in\mathbb{R}^{m\times d}, \mathbf{x}\in\mathbb{R}^d$:
  - $\mathbf{y} = \mathbf{A}\mathbf{x} = 
  \left[\begin{array}{c}| & &| \\ \mathbf{a}_1 & \cdots & \mathbf{a}_d\\| && | \end{array}\right]
  \left[\begin{array}{c} x_1 \\ \vdots \\ x_d \end{array}\right]
  = \sum_{i=1}^d x_i \mathbf{a}_i \in \mathbb{R}^m$

In [164]:
A = array([[1, 2],
           [3, 5]])
x = array([-1, 1])
A @ x  # matrix multiplication

array([1, 2])

- $\mathbf{A}^T\mathbf{x}$: Vector of inner products with columns of $\mathbf{A}$
  - $\mathbf{A}\in\mathbb{R}^{d\times m}, \mathbf{x}\in\mathbb{R}^d$:
  - $\mathbf{y} = \mathbf{A}^T\mathbf{x} 
  = \left[\begin{array}{c}| & &| \\ \mathbf{a}_1 & \cdots & \mathbf{a}_m\\| && | \end{array}\right]^T\mathbf{x}
  = \left[\begin{array}{c}- & \mathbf{a}_1^T &- \\ & \vdots & \\ - & \mathbf{a}_m^T & - \end{array}\right]\mathbf{x}
  = \left[\begin{array}{c} \mathbf{a}_1^T\mathbf{x} \\ \vdots \\ \mathbf{a}_m^T\mathbf{x} \end{array}\right]
  \in \mathbb{R}^m$

In [165]:
A = array([[1, 2],
           [3, 5]])
x = array([-1, 1])
A.transpose() @ x

array([2, 3])

# Matrix-matrix multiplication
- $\mathbf{A}\mathbf{B}$: $\mathbf{A}$ multiplied by each column of $\mathbf{B}$
  - $\mathbf{A}\mathbf{B} 
  = \mathbf{A} \left[\begin{array}{c}| & &| \\ \mathbf{b}_1 & \cdots & \mathbf{b}_n\\| && | \end{array}\right]
  = \left[\begin{array}{c}| & &| \\ \mathbf{A}\mathbf{b}_1 & \cdots & \mathbf{A}\mathbf{b}_n\\| && | \end{array}\right]$


In [166]:
A = array([[1, 2],
           [2, 1]])
B = array([[-1,1],
           [0, 1]])
A @ B

array([[-1,  3],
       [-2,  3]])
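
The column-by-column view can be reproduced by multiplying `A` with each column of `B` and stacking the results side by side. A minimal sketch, assuming `import numpy as np`; `AB_cols` is an illustrative name.

```python
import numpy as np

A = np.array([[1, 2],
              [2, 1]])
B = np.array([[-1, 1],
              [0,  1]])

# A times each column of B, placed next to each other as columns
cols = [A @ B[:, j] for j in range(B.shape[1])]
AB_cols = np.column_stack(cols)
print(AB_cols)
print(np.array_equal(AB_cols, A @ B))   # True
```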

- $\mathbf{A}^T\mathbf{B}$: matrix of inner products between columns of $\mathbf{A}$ and $\mathbf{B}$
  - $\mathbf{A}^T\mathbf{B} 
  = \mathbf{A}^T \left[\begin{array}{c}| & &| \\ \mathbf{b}_1 & \cdots & \mathbf{b}_n\\| && | \end{array}\right]
  = \left[\begin{array}{c}\mathbf{a}_1^T\mathbf{b}_1 & \cdots &\mathbf{a}_1^T\mathbf{b}_n \\ \vdots & \ddots & \vdots\\ \mathbf{a}_m^T\mathbf{b}_1& \cdots & \mathbf{a}_m^T\mathbf{b}_n\end{array}\right] = \left[\mathbf{a}_i^T\mathbf{b}_j\right]_{ij}$



In [167]:
A = array([[1, 2],
           [2, 1]])
B = array([[-1,1],
           [0, 1]])
A.transpose() @ B

array([[-1,  3],
       [-2,  3]])
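
Because the matrix `A` in the cell above is symmetric, $\mathbf{A}^T\mathbf{B}$ and $\mathbf{A}\mathbf{B}$ coincide there. The sketch below swaps in a non-symmetric `A` (an assumption made only for illustration) and builds the matrix of column inner products explicitly; `G` is an illustrative name and `import numpy as np` is assumed.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 5]])    # non-symmetric, so A.T differs from A
B = np.array([[-1, 1],
              [0,  1]])

# entry (i, j) is the inner product of column i of A with column j of B
G = np.array([[A[:, i] @ B[:, j] for j in range(B.shape[1])]
              for i in range(A.shape[1])])
print(G)
print(np.array_equal(G, A.transpose() @ B))   # True
print(np.array_equal(G, A @ B))               # False: the transpose matters here
```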

- $\mathbf{A}\mathbf{B}^T$: sum of outer products between corresponding columns of $\mathbf{A}$ and $\mathbf{B}$
  - $\mathbf{A}\mathbf{B}^T
  = \left[\begin{array}{c}| & &| \\ \mathbf{a}_1 & \cdots & \mathbf{a}_n\\| && | \end{array}\right]
    \left[\begin{array}{c}- & \mathbf{b}_1^T &- \\ & \vdots & \\ - & \mathbf{b}_n^T & - \end{array}\right]
  = \sum_{i=1}^n \mathbf{a}_i \mathbf{b}_i^T
$




In [168]:
A = array([[1, 2],
           [2, 1]])
B = array([[-1,1],
           [0, 1]])
A @ B.transpose()

array([[ 1,  2],
       [-1,  1]])
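
The sum-of-outer-products form can be checked with `np.outer`, pairing column $i$ of `A` with column $i$ of `B`. A minimal sketch, assuming `import numpy as np`; `S` is an illustrative name.

```python
import numpy as np

A = np.array([[1, 2],
              [2, 1]])
B = np.array([[-1, 1],
              [0,  1]])

# add up the outer products a_i b_i^T over the shared column index i
S = sum(np.outer(A[:, i], B[:, i]) for i in range(A.shape[1]))
print(S)
print(np.array_equal(S, A @ B.transpose()))   # True
```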

# Copies and Views
- When operating on arrays, data is sometimes copied and sometimes not.
- _No copy is made for simple assignment._
  - **Be careful!**

In [169]:
a = array([1,2,3,4])
b = a               # simple assignment (no copy made!)
b is a              # yes, b references the same object

True

In [170]:
b[1] = -2           # changing b also changes a
a

array([ 1, -2,  3,  4])

- View or shallow copy
  - different array objects can share the same data (called a view)
  - happens when slicing

In [171]:
c = a.view()   # create a view of a
c is a         # not the same object

False

In [172]:
c.base is a    # but the data is owned by a

True

In [173]:
c.shape = 2,2   # change shape of c
c

array([[ 1, -2],
       [ 3,  4]])

In [174]:
a               # but the shape of a is the same

array([ 1, -2,  3,  4])
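
Slicing behaves the same way as `view()`: the slice is a new array object that shares data with the original. A short sketch, assuming `import numpy as np`; the name `s` is illustrative.

```python
import numpy as np

a = np.array([1, -2, 3, 4])
s = a[1:3]                      # slicing returns a view, not a copy
print(s is a)                   # False: a different array object
print(s.base is a)              # True: the data still belongs to a
print(np.shares_memory(a, s))   # True

s[0] = 20                       # writing through the view...
print(a)                        # ...changes a[1] as well
```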

- Deep copy

In [175]:
d = a.copy()        # create a complete copy of a (new data is created)
d is a              # not the same object

False

In [176]:
d.base is a         # not sharing the same data

False
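
The practical consequence of a deep copy is that later changes to the copy never reach the original. A short sketch, assuming `import numpy as np`:

```python
import numpy as np

a = np.array([1, -2, 3, 4])
d = a.copy()                    # new array with its own data

d[0] = 100                      # modify the copy only
print(d)
print(a)                        # a is unchanged
print(np.shares_memory(a, d))   # False: no shared data
```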

# Outline
1. Python Intro
2. Python Basics (identifiers, types, operators)
3. Control structures (conditional and loops)
4. Functions, Classes
5. File IO, Pickle, pandas
6. NumPy
7. **matplotlib**

# Visualizing Data
- Use matplotlib package to make plots and graphs
- Works with Jupyter to show plots within the notebook

In [1]:
# setup matplotlib
%matplotlib inline
# setup output image format (Chrome works best)
import IPython.core.display  
IPython.core.display.set_matplotlib_formats("svg") # file format
import matplotlib.pyplot as plt

- Each cell will start a new figure automatically.
- Plots are made piece by piece.

In [2]:
from numpy import linspace, pi, sin   # NumPy names used below
x = linspace(0,2*pi,16)
y = sin(x)
plt.plot(x, y, 'bo-')
plt.grid(True)
plt.ylabel('y label'); plt.xlabel('x label'); plt.title('my title')
plt.show()

- plot string specifies three things (e.g., `'bo-'`)
  - colors:
    - **b**lue, **r**ed, **g**reen, **m**agenta, **c**yan, **y**ellow, blac**k**, **w**hite
  - markers: 
    - '.' point; 'o' circle
    - 'v' triangle down; '^' triangle up
    - '<' triangle left; '>' triangle right
    - '8' octagon; 's' square
    - 'p' pentagon; '*' star
    - 'h' hexagon1
    - '+' plus; 'x' x
    - 'd' thin_diamond
  - line styles:
    - '-' solid line
    - '--' dashed line
    - '-.' dash-dotted line
    - ':' dotted line
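
To see how the three parts of a plot string combine, the sketch below draws the same sine curve three times with different color, marker, and line-style codes. The vertical shifts and the particular format strings are arbitrary choices for illustration, and the block assumes `import numpy as np` plus a working matplotlib installation.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 16)

fig, ax = plt.subplots()
lines = []
# each format string is color + marker + line style
for fmt, shift in [("bo-", 0.0), ("rs--", 0.5), ("g^:", 1.0)]:
    lines += ax.plot(x, np.sin(x) + shift, fmt, label=fmt)

ax.legend()      # the legend shows which string produced which curve
ax.grid(True)
plt.show()
```

Inside a notebook with `%matplotlib inline`, the figure appears below the cell.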

# Python Tutorials
- Python - https://docs.python.org/3/tutorial/
- numpy - https://docs.scipy.org/doc/numpy-dev/user/quickstart.html
- “Machine Learning in Action” – Appendix A, Ch. 1
- scikit-learn - http://scikit-learn.org/stable/tutorial/
- matplotlib - http://matplotlib.org/users/pyplot_tutorial.html
- pandas - https://pandas.pydata.org/pandas-docs/stable/tutorials.html

