# Agenda

1. Setup

2. Basics

3. Iterables 

4. Numpy (for math and matrix operations)

5. Matplotlib (for plotting)

In [None]:
# Note: This tutorial is based on Python 3.7
#       but it should apply to all Python 3.X versions
# Please note that this tutorial is NOT exhaustive
# We try to cover everything you need for class assignments
# but you should also navigate external resources
#
# More tutorials:
# NUMPY:
# https://cs231n.github.io/python-numpy-tutorial/#numpy
# https://numpy.org/doc/stable/user/quickstart.html
# MATPLOTLIB:
# https://matplotlib.org/gallery/index.html
# BASICS:
# https://www.w3schools.com/python/
# CONSULT THESE WISELY:
# The official documentation, Google, and Stack Overflow are your friends!

# 1. Setup

You can either install conda on your local machine or use Google Colab. We are going to use Google Colab in this tutorial.

## 1.1. Conda for environment management 

Anaconda: https://www.anaconda.com/ <br>
Miniconda: https://docs.conda.io/en/latest/miniconda.html

[Anaconda or Miniconda?](https://docs.conda.io/projects/conda/en/latest/user-guide/install/download.html#anaconda-or-miniconda)

### Common commands

- list all environments: `conda env list`
- create new environment w/ python 3.7: `conda create -n <envname> python=3.7`
- create environment from config file: `conda env create -f env.yml`
- activate an environment: `conda activate <envname>`
- exit environment: `conda deactivate`
- install package for current environment: `pip install <package>`
- open jupyter in current environment: `jupyter notebook`

### Package installation using conda/pip
- `conda install <package>`: checks compatibility with the packages already installed, so it can take some time <br>
- `pip install <package>`: does not check compatibility with the packages already installed
- Both install the package's dependencies, if any

### Recommended IDEs
PyCharm (the most popular choice, compatible with Anaconda) <br>
Visual Studio Code (for other programming languages as well) <br>
Spyder (bundled with Anaconda) <br>
Notepad++ (if you just need a lightweight text editor)

In [None]:
## common anaconda commands
#conda env list
#conda create -n <envname> python=3.7
#conda env create -f env.yml
#conda activate <envname>
#conda deactivate
## install packages
#conda install <package>
#pip install <package>

## 1.2. Google Colab Setup
Before getting started we need to run some boilerplate code to set up our environment. You'll need to rerun this setup code each time you start the notebook.

First, run this cell to load the [autoreload](https://ipython.readthedocs.io/en/stable/config/extensions/autoreload.html?highlight=autoreload) extension. This allows us to edit `.py` source files and re-import them into the notebook for a seamless editing and debugging experience.

In [5]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


Next we need to run a few commands to set up our environment on Google Colab. If you are running this notebook on a local machine you can skip this section.

Run the following cell to mount your Google Drive. Follow the link, sign in to your Google account (the same account you used to store this notebook!) and copy the authorization code into the text box that appears below.

In [7]:
import google

In [8]:
from google.colab import drive
drive.mount('/content/drive')


Now recall the path in your Google Drive where you uploaded this notebook and fill it in below. If everything is working correctly, running the following cell should print the filenames:

```
['python101.py', 'python101.ipynb']
```

In [None]:
import os

# TODO: Fill in the Google Drive path where you uploaded the assignment
# Example: If you create a "Colab Notebooks" folder and put all the files under this folder
GOOGLE_DRIVE_PATH_AFTER_MYDRIVE = 'Colab Notebooks'
GOOGLE_DRIVE_PATH = os.path.join('drive', 'My Drive', GOOGLE_DRIVE_PATH_AFTER_MYDRIVE)
print(os.listdir(GOOGLE_DRIVE_PATH))

Once you have successfully mounted your Google Drive and located the path to this assignment, run the following cell to allow us to import from the `.py` files of this assignment. If it works correctly, it should print the message:

```
Hello from python101.py!
```

as well as the last edit time for the file `python101.py`.

In [None]:
import sys
sys.path.append(GOOGLE_DRIVE_PATH)

import time, os
os.environ["TZ"] = "Asia/Seoul"
time.tzset()

from python101 import hello
hello()
hello('print me')

python101_path = os.path.join(GOOGLE_DRIVE_PATH, 'python101.py')
python101_edit_time = time.ctime(os.path.getmtime(python101_path))
print('python101.py last edited on %s' % python101_edit_time)
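
The file `python101.py` itself is not shown in this notebook. As a rough sketch of what such a module might contain, assuming `hello` takes an optional message and falls back to the default greeting quoted above (the parameter name `msg` and the function body are assumptions, not the actual file):

```python
# Hypothetical contents of python101.py (the real file may differ).
def hello(msg=None):
    """Print the given message, or a default greeting if none is passed."""
    text = "Hello from python101.py!" if msg is None else msg
    print(text)
    return text  # also returned so the result is easy to check


hello()            # prints the default greeting
hello("print me")  # prints the custom message
```

With autoreload enabled, editing the function in the `.py` file and calling `hello()` again should pick up the change without restarting the runtime.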

# 2. Basics

https://www.w3schools.com/python/

In [None]:
# input and output
name = input()
print("hello, " + name)

In [None]:
print("print with new line")
print("print without new line", end="")
print()
print("print multiple variables separated by a space:", name, 1, 3.0, True)

In [None]:
# line comment
"""
block 
comments
"""
print('nothing to print')

In [None]:
# variables don't need explicit declaration
var = "hello" # string
var = 10.0    # float
var = 10      # int
var = True    # boolean
var = [1,2,3] # pointer to list
var = None    # empty pointer

In [None]:
# type conversions
var = 10
print(int(var))
print(str(var))
print(float(var))
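
Conversions are most often needed when reading numbers from text, for example from `input()`. The sketch below shows a few common cases; `to_int` is a hypothetical helper introduced only for illustration:

```python
# Converting between strings and numbers.
print(int("42") + 1)      # string -> int
print(float("3.5") * 2)   # string -> float
print(int(3.9))           # float -> int truncates toward zero
print(str(10) + "0")      # int -> str, then string concatenation


def to_int(text, default=None):
    """Hypothetical helper: return int(text), or `default` if text is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        return default


print(to_int("12"))
print(to_int("twelve", default=-1))
```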

In [None]:
# basic math operations
var = 10
print("var + 4 =", var + 4)
print("var - 4 =", var - 4)
print("var * 4 =", var * 4)
print("var ** 4 =", var ** 4)       # ** for power; ^ is bitwise XOR in Python
print("mod(var, 3) =", var % 3)
print("var // 4 =", var // 4)       # // for floor (integer) division
print("var / 4 =", var / 4)         # / for float division
# All compound assignment operators available
# including += -= *= **= /= //= 
# pre/post in/decrementers not available (++ --)
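
Two details of these operators tend to surprise people coming from C or Java: floor division rounds toward negative infinity, and `^` is bitwise XOR rather than a power operator. A short illustration:

```python
# Floor division rounds toward negative infinity, not toward zero.
print(7 // 2, -7 // 2)
# The remainder takes the sign of the divisor, so (a // b) * b + a % b == a.
print(7 % 2, -7 % 2)
# divmod returns the quotient and the remainder together.
quotient, remainder = divmod(-7, 2)
print(quotient, remainder)
# ** is exponentiation; ^ is bitwise XOR.
print(10 ** 4, 10 ^ 4)
```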

In [None]:
# basic boolean operations include "and", "or", "not"
print("not True is", not True)
print("True and False is", True and False)
print("True or False is", True or False)

In [None]:
# String operations
# '' and "" are equivalent
s = "String"
#s = 'Mary said "Hello" to John'
# s = "Mary said \"Hello\" to John"
# print(s)

# basic
print(len(s)) # get length of string and any iterable type
print(s[0]) # get char by index
print(s[1:3]) # [1,3)
print("This is a " + s + "!")

# handy tools
print(s.lower())
print(s*4)
print("ring" in s)
print(s.index("ring"))

# slice by delimiter
print("I am a sentence".split(" "))
# concatenate a list of string using a delimiter
print("...".join(['a','b','c']))

# formatting variables
print("Formatting a string like %.2f"%(0.12345))
print("Or like {} {}!".format(s.upper(), s.lower()))
print("Or like {str1} {str2}!".format(str2=s.upper(), str1=s.lower()))
print(f"Or like {s.upper()} {s.lower()}! for python>=3.6")

In [None]:
# control flows
# NOTE: No parentheses or curly braces
#       Indentation is used to identify code blocks
#       So never ever mix spaces with tabs
for i in range(0, 5):
    for j in range(i, 5):
        print("inner loop ", [i, j])
    print("outer loop ", i)

In [None]:
# if-else
var = 10
if var > 10:
    print(">")
elif var == 10:
    print("=")
else:
    print("<")

# use pass as a placeholder when a branch does nothing
if True:
    pass
else:
    print("do something")

In [None]:
# use "if" to check null pointer or 0 or empty arrays
var = None
if var: 
    print(var)
var = 0
if var:
    print(var)
var = []
if var:
    print(var)
var = "object"
if var:
    print(var)
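
Because `None`, `0`, and empty containers are all falsy, a plain `if var:` cannot tell them apart. When the difference matters, one option is to test for `None` explicitly with `is None`. The helper name `describe` below is hypothetical:

```python
def describe(value):
    """Hypothetical helper: distinguish a missing value from an empty or zero one."""
    if value is None:
        return "missing"
    if not value:
        return "empty or zero"
    return "has content"


for candidate in [None, 0, [], "", "object", [0]]:
    print(repr(candidate), "->", describe(candidate))
```

Note that `[0]` counts as having content: a list is falsy only when it has no elements.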

In [None]:
# while-loop
var = 5
while var > 0:
    print(var)
    var -=1

In [None]:
# for-loop
"""
equivalent to
for (int i = 0; i < 3; i++)
"""
for i in range(3):  # prints 0 1 2
    print(i)
    
print("-------")
# range (start-inclusive, stop-exclusive, step)
"""
equivalent to
for (int i = 2; i > -3; i-=2)
"""
for i in range(2, -3, -2): 
    print(i)

In [None]:
# define function
def func(a, b):
    c= a + b
    return c
print(func(1,3))

In [None]:
# use default parameters and pass values by parameter name
def rangeCheck(a, min_val=0, max_val=10):
    return min_val < a < max_val    # syntactic sugar
out = rangeCheck(5, max_val=4)
print(out)
out = rangeCheck(5, min_val=3, max_val=6)
print(out)

In [None]:
# define class
class Foo:

    # optional constructor
    def __init__(self, x):
        # first parameter "self" is the instance reference, like "this" in Java
        self.x = x
        # class methods can be called through an instance as well as through the class
        self.printClass()

    # class variable, shared by all instances (like a static variable)
    y = 1

    # instance method
    def printX(self): # the instance reference is the required first parameter of every instance method
        print(self.x)

    # class method, receives the class rather than an instance; useful with inheritance, most likely you will never need this
    @classmethod
    def printClass(cls):
        print(cls.y)

    # static method, you don't need an instance to call this
    @staticmethod
    def printStatic():
        print("hello")

print('create a Foo instance')
obj = Foo(6)

print('call a Foo instance method')
obj.printX()
# Foo.printX() # invalid

print('call a Foo class method')
obj.printClass()
Foo.printClass()

print('call a Foo static method')
obj.printStatic()
Foo.printStatic()

In [None]:
# class inheritance - inherits variables and methods
class Bar(Foo):
    pass
obj = Bar(3)
obj.printX()
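
A subclass usually does more than `pass`: it can override methods and extend the parent constructor through `super()`. A minimal sketch, using the made-up classes `Animal` and `Dog` rather than `Foo` and `Bar`:

```python
class Animal:
    def __init__(self, name):
        self.name = name

    def speak(self):
        return self.name + " makes a sound"


class Dog(Animal):
    def __init__(self, name, tricks=None):
        super().__init__(name)       # run the parent constructor first
        self.tricks = tricks or []   # then add subclass-specific state

    def speak(self):                 # override the parent method
        return super().speak() + ": woof"


pet = Dog("Rex", tricks=["sit"])
print(pet.speak())
print(isinstance(pet, Animal))   # a Dog is also an Animal
```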

# 3. Iterables

In [None]:
alist = list()  # linear, size not fixed, not hashable
atuple = tuple() # linear, fixed size, hashable
adict = dict()  # hash table, not hashable, stores (key,value) pairs
aset = set()    # hash table, like dict but only stores keys
acopy = alist.copy() # shallow copy
import copy
aacopy = copy.deepcopy(alist) # deep copy
print(len(alist)) # gets size of any iterable type

a_list = [0,1,2,3,4,5,6,7,8,9]
print(a_list)

a_range = range(10) # lazy: values are generated only when retrieved
print(a_range)
print(list(a_range)) # cast to list

print(sum(a_list))
print(sum(a_range))
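
The difference between a shallow and a deep copy only shows up with nested containers. A small demonstration, with variable names chosen just for this example:

```python
import copy

nested = [[1, 2], [3, 4]]
shallow = nested.copy()        # new outer list, same inner lists
deep = copy.deepcopy(nested)   # new outer list and new inner lists

nested[0].append(99)           # mutate an inner list in place
print(shallow[0])              # the shallow copy sees the change
print(deep[0])                 # the deep copy does not
```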

In [None]:
# example tuple usage
# creating a dictionary to store ngram counts
d = dict()
d[("a","cat")] = 10
# d[["a","cat"]] = 11 # invalid
print(d)
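
Extending the n-gram idea above, the following sketch counts bigrams in a token list, with each adjacent pair stored as a tuple key. The function name `count_bigrams` and the sample sentence are illustrative assumptions:

```python
def count_bigrams(tokens):
    """Count adjacent token pairs, using tuples as dictionary keys."""
    counts = dict()
    for pair in zip(tokens, tokens[1:]):   # pair is a tuple such as ("a", "cat")
        counts[pair] = counts.get(pair, 0) + 1
    return counts


bigrams = count_bigrams("a cat saw a cat".split(" "))
print(bigrams)
```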

In [None]:
"""
List: not hashable (i.e. can't use as dictionary key)
      dynamic size
      allows duplicates and inconsistent element types
      dynamic array implementation
"""
# list creation
alist = []          # empty list, equivalent to list()
alist = [1,2,3,4,5] # initialized list

print(alist[0])
alist[0] = 5
print(alist)
alist[0] = 1
print(alist)


print("-"*10)
# list indexing
print(alist[0]) # get first element (at index 0)
print(alist[-1]) # get last element (at index len-1)
print(alist[3:]) # get elements starting from index 3 (inclusive: [3,len))
print(alist[:3]) # get elements stopping at index 3 (exclusive [0,3))
print(alist[2:4]) # get elements within index range [2,4)
print(alist[6:]) # prints [] (an empty list) because the start index is out of range
print(alist[::-1]) # returns a reversed list

print("-"*10)
# list modification
alist.append("new item") # insert at end
print(alist)
alist.insert(0, "new item") # insert at index 0
print(alist)
alist.extend([2,3,4]) # concatenate lists
print(alist)
# above line is equivalent to alist += [2,3,4]
i = alist.index("new item") # search by content
print(i)
j = alist.index("new item", i+1) # search by content, next
print(j)
alist.remove("new item") # remove by content
print(alist)
alist.pop(0) # remove by index
print(alist)

print("-"*10)
if "new item" in alist:
    print("found")
else:
    print("not found")

print("-"*10)
# list traversal
for ele in alist:
    print(ele)

print("-"*10)
# or traverse with index
for i, ele in enumerate(alist):
    print(i, ele)

In [None]:
"""
Tuple: hashable (i.e. can use as dictionary key)
       fixed size (no insertion or deletion)
"""
# it does not make sense to create empty tuples
atuple = (1,)
print(atuple)
atuple = (1,2,3,4,5)
print(atuple)
# or you can cast other iterables to tuple
atuple = tuple([1,2,3])
print(atuple)

# indexing and traversal are same as list
print(atuple[1:])

In [None]:
"""
Named tuples for readability
"""
from collections import namedtuple
Point = namedtuple('Point', 'x y')
pt1 = Point(1.0, 5.0)
pt2 = Point(2.5, 1.5)
print(pt1.x, pt1.y)

In [None]:
"""
Dict: not hashable 
      dynamic size
      no duplicate keys allowed
      hash table implementation which is fast for searching
"""
# dict creation
adict = {} # empty dict, equivalent to dict()
print(adict)
adict = {'a':1, 'b':2, 'c':3}
print(adict)

# get all keys in dictionary
print(adict.keys())

# get value paired with key
print(adict['a'])

# NOTE: accessing keys not in the dictionary leads to exception
key = 'e'
if key in adict:
    print(adict[key])
    
# add or modify dictionary entries
adict['e'] = 10 # insert new key
adict['e'] = 5  # modify existing keys

print("-"*10)
# traverse keys only
for key in adict:
    print(key, adict[key])

print("-"*10)
# or traverse key-value pairs together
for key, value in adict.items():
    print(key, value)

print("-"*10)
# NOTE: Checking if a key exists
key = 'e'
if key in adict: # NO .keys() here please!
    print(adict[key])
else:
    print("Not found!")
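
Besides the `in` check, dictionaries offer `get` and `setdefault` for handling missing keys without an explicit `if`. A brief sketch with example data made up for illustration:

```python
ages = {'a': 1, 'b': 2}

# get returns a fallback instead of raising KeyError; the dict is not modified
print(ages.get('a', 0))
print(ages.get('z', 0))

# setdefault inserts the fallback if the key is missing, then returns the stored value
groups = {}
for word in ["apple", "avocado", "banana"]:
    groups.setdefault(word[0], []).append(word)
print(groups)
```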

In [None]:
"""
Special dictionaries 
"""
# set is a dictionary without values
aset = set() # caution: {} is an empty dictionary, not an empty set
print(aset)
aset.add('a')
print(aset)

# deduplication short-cut using set
alist = [1,2,3,3,3,4,3]
alist = list(set(alist))
print(alist)

# defaultdict returns a value computed by a default factory function
#     for non-existent entries (and stores it under that key)
from collections import defaultdict
adict = defaultdict(lambda: 'unknown')
adict['cat'] = 'feline'
print(adict['cat'])
print(adict['dog'])

In [None]:
# counter is a dictionary with default value of 0
#     and provides handy iterable counting tools
from collections import Counter

# initialize and modify empty counter
counter1 = Counter()
print(counter1)
counter1['t'] = 10
print(counter1)
counter1['t'] += 1
print(counter1)
counter1['e'] += 1
print(counter1)
print("-"*10)

# initialize counter from iterable
counter2 = Counter("letters to be counted")
print(counter2)
print(counter2["t"])
print("-"*10)

# computations using counters
print("1,", counter1 + counter2)
print("2,", counter1 - counter2)
print("3,", counter1 & counter2) # & for intersection (min of counts), | for union (max of counts)
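
Counters behave like multisets, so they support `&` and `|` in addition to `+` and `-`. The following small example uses two short strings chosen for illustration:

```python
from collections import Counter

left = Counter("aab")     # a: 2, b: 1
right = Counter("abbc")   # a: 1, b: 2, c: 1

print(left & right)           # intersection: minimum of each count
print(left | right)           # union: maximum of each count
print(left - right)           # subtraction keeps only positive counts
print(right.most_common(1))   # the most frequent element with its count
```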

In [None]:
# sorting
a = [4,6,1,7,0,5,1,8,9]
a = sorted(a)
print(a)
a = sorted(a, reverse=True)
print(a)

In [None]:
# sorting
a = [("cat",1), ("dog", 3), ("bird", 2)]
a = sorted(a)
print(a)
# user-defined function as a key
def pick_second(x):
    return x[1]
c = sorted(a, key=pick_second)
print(c)
# lambda function as a key
b = sorted(a, key=lambda x:x[1])
print(b)

In [None]:
# useful in dictionary sorting
adict = {'cat':3, 'bird':1}
print(sorted(adict.items(), key=lambda x:x[1]))
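
A key function may return a tuple, which makes it possible to sort by several criteria at once. The sketch below ranks entries by value in descending order and breaks ties alphabetically; the dictionary contents are made up:

```python
scores = {'cat': 3, 'bird': 1, 'dog': 3}

# sort by value descending, then by key ascending to break ties
ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
print(ranked)

# rebuild a dictionary in that order (dicts preserve insertion order since Python 3.7)
ordered = dict(ranked)
print(list(ordered))
```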

In [None]:
# Syntax sugar: one-line control flow + list operation
x = [1,2,3,5,3]
"""
equivalent to
x1 = []
for xx in x:
    x1.append(xx*3 + 5)
"""
x1 = [xx*3 + 5 for xx in x]
print(x1)

x2 = [xx*3 + 5 for xx in x if xx < 3]
print(x2)

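# One-level copy of each element (elements must provide .copy());
# use copy.deepcopy(original) for arbitrarily nested objects
# copies = [obj.copy() for obj in original]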
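
The same comprehension syntax also builds dictionaries and sets, and several `for` clauses can be chained. A short sketch with sample data invented for this example:

```python
words = ["apple", "Bob", "cat", "apple"]

lengths = {w: len(w) for w in words}                      # dict comprehension
unique_lower = {w.lower() for w in words}                 # set comprehension
pairs = [(i, j) for i in range(2) for j in range(i, 2)]   # nested loops, read left to right

print(lengths)
print(sorted(unique_lower))
print(pairs)
```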

In [None]:
# Syntax sugar: * operator for repeating iterable elements
print("-"*10)
print([1]*10)

# Note: This repeats references to the same elements, not copies
#       So do not apply the trick to mutable (reference) types

# To create a double list
# DONT
doublelist = [[]]*10 # this copies a pointer to an empty list
print(doublelist)
doublelist[0].append(1)
print(doublelist)
# DO
doublelist = [[] for _ in range(10)] # this defines pointers to independent empty lists
print(doublelist)
doublelist[0].append(1)
print(doublelist)

# 4. Numpy
A powerful Python library for handling matrices and higher-dimensional arrays

In [None]:
import numpy as np

In [None]:
# create arrays
a = np.array([[1,2],[3,4],[5,6]])
print(a)
print(a.shape)
# create all-zero/one arrays
b = np.ones((3,4)) # np.zeros((3,4)), np.full((3,4),-1)
print(b)
print(b.shape)
# create identity matrix
c = np.eye(5)
print(c)
print(c.shape)
# create random matrix with standard normal init
d = np.random.normal(size=(5,5))
print(d)

In [None]:
# reshaping arrays
a = np.arange(8)         # shape [8,] -- a 1-D array is neither a row nor a column vector
b = a.reshape(1,-1)      # [1,8] row vector -- -1 for auto-fill
c = a.reshape((4,2))     # shape [4,2]
d = a.reshape((2,2,-1))  # shape [2,2,2]
e = d.flatten()          # shape [8,]
f = np.expand_dims(a, 0) # shape [1,8] ; or a[None,:]
g = np.expand_dims(a, 1) # shape [8,1] ; or a[:,None]
h = e.squeeze()          # shape [8,]    -- removes all size-1 dimensions (none here)
print('a:', a.shape)
print(a)
print('b:', b.shape)
print(b)
print('c:', c.shape)
print(c)
print('d:', d.shape)
print(d)
print('e:', e.shape)
print(e)
print('f:', f.shape)
print(f)
print('g:', g.shape)
print(g)
print('h:', h.shape)
print(h)

In [None]:
# be careful about vectors!
a = np.array([1,2,3]) # this is a 1-D array of shape (3,); transposing it has no effect
print(a)
print(a.shape)
print(a.T.shape)
b = a.reshape(-1, 1) # this is a 3x1 matrix, which you can transpose
print(b)
print(b.shape)
print(b.T)
print(b.T.shape)

In [None]:
# concatenating arrays
a = np.ones((4,3))
b = np.ones((4,3))
c = np.concatenate([a,b], axis=0)
print(c.shape)
d = np.concatenate([a,b], axis=1)
print(d.shape)

In [None]:
# access array slices by index
a = np.zeros([10, 10])
a[:3] = 1
a[:, :3] = 2
a[:3, :3] = 3
rows = [4,6,7]
cols = [9,3,5]
a[rows, cols] = 4 # assign values in (4,9), (6,3), (7,5)
print(a)

In [None]:
# transposition
a = np.arange(24).reshape(2,3,4)
print(a.shape)
print(a)
a = np.transpose(a, (2,1,0)) # swap 0th and 2nd axes
print(a.shape)
print(a)

In [None]:
c = np.array([[1,2],[3,4]])
print(np.linalg.inv(c))
# pinv computes the Moore-Penrose pseudo-inverse, which also works for singular or non-square matrices
print(np.linalg.pinv(c))
# To compute c^-1 b
b = np.array([1, 1])
print(np.linalg.inv(c)@b)
print(np.linalg.solve(c,b)) # preferred!
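
One way to convince yourself that `solve` and the explicit inverse agree is to check the residual with `np.allclose`. The names `A`, `rhs`, and `sol` below are chosen for this sketch only:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
rhs = np.array([1.0, 1.0])

sol = np.linalg.solve(A, rhs)                      # solves A @ sol = rhs without forming the inverse
print(sol)
print(np.allclose(A @ sol, rhs))                   # check the residual
print(np.allclose(sol, np.linalg.inv(A) @ rhs))    # agrees with the explicit inverse
```

`solve` is generally preferred because it avoids computing the full inverse, which is slower and can be less accurate.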

In [None]:
# vector dot product
v1 = np.array([1,2])
v2 = np.array([3,4])
print(v1.dot(v2))
print(np.dot(v1,v2))
print(v1@v2)
print(np.sum(v1*v2)) # v1*v2 is element-wise product

In [None]:
# vector outer product
print(np.outer(v1,v2))
print(v1.reshape(-1,1).dot(v2.reshape(1,-1)))

In [None]:
# Matrix multiply vector (Ax)
m = np.array([1,2,3,4]).reshape(2,2)
v = np.array([1,2])
print(m)
print(v)
print(m @ v)
print(m.dot(v))
print(np.matmul(m, v))

In [None]:
# matrix multiplication
a = np.ones((4,3)) # 4,3
b = np.ones((3,2)) # 3,2 --> 4,2
print(a @ b)
print(a.dot(b))
print(np.matmul(a,b))

In [None]:
# broadcasting
c = np.ones([4,4])
d = np.array([1,2,3,4]).reshape(4,1)
print(c.shape)
print(d.shape)
# automatic repetition along axis
print(c + d)
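
Broadcasting is commonly used to normalize data without writing loops. The sketch below centers each column and scales each row of a small made-up matrix; the variable names are illustrative:

```python
import numpy as np

data = np.arange(12, dtype=float).reshape(4, 3)   # 4 samples, 3 features

col_mean = data.mean(axis=0)                      # shape (3,)
centered = data - col_mean                        # (4,3) - (3,) broadcasts over rows

row_sum = data.sum(axis=1, keepdims=True)         # shape (4,1) thanks to keepdims
normalized = data / row_sum                       # (4,3) / (4,1) broadcasts over columns

print(centered.mean(axis=0))     # each column now has mean 0
print(normalized.sum(axis=1))    # each row now sums to 1
```

Shapes are aligned from the last axis backwards, and an axis of size 1 (or a missing axis) is stretched to match the other operand.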

In [None]:
# computing pairwise distance (using broadcasting)
samples = np.random.random([15, 5]) # suppose this is a collection of 15 5-d vectors
diff=samples[:,np.newaxis,:]-samples[np.newaxis] # np.newaxis to introduce new axis; None has the same effect
print(samples[:,np.newaxis,:].shape)
print(samples[np.newaxis,:,:].shape)
print(diff.shape)
print(diff[2,3])
print(samples[2] - samples[3])
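
The difference tensor above can be reduced to a full matrix of Euclidean distances by squaring, summing over the last axis, and taking the square root. This is a sketch with freshly generated points and names (`points`, `deltas`, `dist`) chosen for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.random((15, 5))                             # 15 vectors of dimension 5
deltas = points[:, np.newaxis, :] - points[np.newaxis]   # shape (15, 15, 5)
dist = np.sqrt((deltas ** 2).sum(axis=-1))               # shape (15, 15)

print(dist.shape)
print(np.allclose(dist, dist.T))                         # distances are symmetric
print(np.isclose(dist[2, 3], np.linalg.norm(points[2] - points[3])))
```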

In [None]:
# speed test: numpy vs list
a = np.ones((50,50))
b = np.ones((50,50))

def matrix_multiplication(X, Y):
    result = [[0]*len(Y[0]) for _ in range(len(X))]
    for i in range(len(X)):
        for j in range(len(Y[0])):
            for k in range(len(Y)):
                result[i][j] += X[i][k] * Y[k][j]
    return result

import time

# run numpy matrix multiplication for 10 times
start = time.time()
for _ in range(10):
    a @ b
end = time.time()
print("numpy spends {} seconds".format(end-start))

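# run list matrix multiplication for 10 times
list_a, list_b = a.tolist(), b.tolist() # convert to nested Python lists outside the timed loop
start = time.time()
for _ in range(10):
    matrix_multiplication(list_a, list_b)
end = time.time()
print("list operation spends {} seconds".format(end-start))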

# the difference gets more significant as matrices grow in size!
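
For short operations, a single `time.time()` measurement can be noisy. The standard-library `timeit` module repeats the measurement, which may give a steadier number. A sketch, with the matrix names `m1` and `m2` chosen for this example:

```python
import timeit
import numpy as np

m1 = np.ones((50, 50))
m2 = np.ones((50, 50))

# repeat=3 runs the 10-call measurement three times; the minimum is usually the least noisy
timings = timeit.repeat(lambda: m1 @ m2, number=10, repeat=3)
print("best of 3: {:.6f} seconds for 10 multiplications".format(min(timings)))
```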

In [None]:
# other common operations
# a = np.ones((4,4))
a = np.arange(1,17).reshape([4,4])
print(a)
print('default norm along with 0-th axis: ', np.linalg.norm(a, axis=0))
print('  default norm: ', np.linalg.norm(a))
print('      l_1 norm: ', np.linalg.norm(a, ord=1))
print('      l_2 norm: ', np.linalg.norm(a, ord=2))
print('    l_inf norm: ', np.linalg.norm(a, ord=np.inf))
print('Frobenius norm: ', np.linalg.norm(a, ord='fro'))
print(np.sum(a)) # sum all elements in matrix
print(np.sum(a, axis=0)) # sum along axis 0
print(np.sum(a, axis=1)) # sum along axis 1
# element-wise operations, for examples
print(np.log(a))
print(np.exp(a))
print(np.sin(a))
# operation with scalar is interpreted as element-wise
print(a * 3)
print(a.tolist())
print(a.tolist() * 3) # caution: python default list duplicates items

In [None]:
# invalid operations such as 0/0 produce NaN (with a RuntimeWarning)
a = np.array(0)
b = np.array(0)
print(a/b)
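
Since NaN compares unequal to everything, including itself, it has to be detected with `np.isnan` or `np.isfinite` rather than `==`. The sketch below also uses `np.errstate` to silence the warnings locally; the input values are made up:

```python
import numpy as np

values = np.array([1.0, 0.0, -1.0])
with np.errstate(divide="ignore", invalid="ignore"):   # silence the warnings inside this block only
    ratios = values / np.array([1.0, 0.0, 0.0])        # 1.0, nan, -inf

print(ratios)
print(np.isnan(ratios))        # NaN never equals itself, so test with isnan
print(np.isfinite(ratios))     # False for both NaN and infinities
cleaned = np.where(np.isfinite(ratios), ratios, 0.0)   # replace NaN and inf with 0
print(cleaned)
```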

# 5. Matplotlib
Powerful tool for visualization <br/>
Many tutorials online. We only go over the basics here <br/>

In [None]:
import numpy as np
import matplotlib.pyplot as plt

In [None]:
# line plot
x = np.arange(0, 2, 0.01)
y = 1+np.sin(2*np.pi*x)
plt.plot(x,y)
plt.show()

In [None]:
# scatter plot
x = [1,3,2]
y = [1,2,3]
plt.scatter(x,y)
plt.show()

In [None]:
# bar plots
plt.bar(x,y)
plt.show()

In [None]:
# plot configurations
x = [1,2,3]
y1 = [1,3,2]
y2 = [4,0,4]

# set figure size
plt.figure(figsize=(5,5))

# set axes
plt.xlim(0,5)
plt.ylim(0,5)
plt.xlabel("x label")
plt.ylabel("y label")

# add title
plt.title("My Plot")

plt.plot(x,y1, label="data1", color="red", marker="*", dashes=[5,1])
plt.plot(x,y2, label="data2", color="green", marker=".")
plt.grid()
plt.legend()
plt.show()

In [None]:
# subplots
f, ax = plt.subplots(2,2,figsize=(5,5))
ax[0][0].plot(x,y)
ax[0][1].scatter(x,y)
ax[1][0].bar(x,y)
ax[1][1].hist(x) # hist takes the data only; a second positional argument would be the bins
plt.show()
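
The axes returned by `plt.subplots` can be configured individually through their own `set_*` methods, which mirror the `plt.xlabel`/`plt.title` calls used earlier. A minimal sketch with made-up data and titles:

```python
import numpy as np
import matplotlib.pyplot as plt

t = np.arange(0, 2, 0.01)
fig, axes = plt.subplots(1, 2, figsize=(8, 3))

axes[0].plot(t, np.sin(2 * np.pi * t), label="sin")
axes[0].set_title("line")
axes[0].set_xlabel("t")
axes[0].legend()

axes[1].hist(np.sin(2 * np.pi * t), bins=10)   # bins sets the number of histogram bars
axes[1].set_title("histogram")

fig.tight_layout()   # avoid overlapping labels between subplots
plt.show()
```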

In [None]:
# plot area under curve
probs = [1, 1, 0.95, 0.9, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
thres = np.arange(0,1,0.1)
plt.fill_between(x=thres, y1=probs, y2=0, step='post')
plt.show()

In [None]:
import seaborn as sn
import matplotlib.pyplot as plt

array = [[13,1,1,0,2,0],
         [3,9,6,0,1,0],
         [0,0,16,2,0,0],
         [0,0,0,13,0,0],
         [0,0,0,0,15,0],
         [0,0,1,0,0,15]]
labels = 'A B C D E F'.split(' ')
sn.heatmap(array, annot=True, annot_kws={"size": 16}, cmap="rocket_r")
plt.xticks(ticks=np.arange(len(labels))+0.5,labels=labels)
plt.yticks(ticks=np.arange(len(labels))+0.5,labels=labels,rotation=0)
plt.show()