# Python lesson plans

**Goal:** write efficient, readable _Python_ code that uses Python's constructs as intended (i.e. pythonic code)

**The Zen of Python:** (by Tim Peters)
- beautiful is better than ugly
- explicit is better than implicit 
- simple is better than complex
- complex is better than complicated
- flat is better than nested
- sparse is better than dense
- readability counts
- special cases aren't special enough to break the rules
- practicality beats purity
- errors should never pass silently unless explicitly silenced
- in the face of ambiguity, refuse the temptation to guess


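Variable naming conventions: 
- Not permitted: names cannot start with a digit and cannot contain operators, spaces, slashes, or characters such as ., ?, !, $, #; reserved keywords (e.g. `for`, `if`) cannot be used as names either
- Permitted: letters, underscores, and digits (digits anywhere except the first position)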
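The naming rules above can be checked programmatically. The following is a minimal sketch using `str.isidentifier()` and the standard `keyword` module; the helper name `is_valid_name` and the candidate names are made up for illustration.

```python
import keyword

# Candidate names, invented for this illustration
candidates = ["var", "_private", "total_2", "2nd_value", "my-var", "my var", "for"]

def is_valid_name(name):
    """Return True if `name` can be used as a variable name."""
    # isidentifier() checks the character rules; keywords are excluded separately
    return name.isidentifier() and not keyword.iskeyword(name)

for name in candidates:
    print(f"{name!r}: {is_valid_name(name)}")
```

Names such as `2nd_value`, `my-var`, and `for` are rejected, while `_private` and `total_2` pass.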

In [1]:
var = 2 # variable names conventionally start with a lowercase letter

In [2]:
var

2

In [3]:
Array1 = [1,2,3] # a list (Python's built-in array-like type)

In [4]:
Array1

[1, 2, 3]

In [5]:
2*Array1

[1, 2, 3, 1, 2, 3]

In [6]:
Array2 = ([1,2],[3,4]) # a tuple containing two lists

In [7]:
Array2

([1, 2], [3, 4])

In [8]:
Array2 + Array2

([1, 2], [3, 4], [1, 2], [3, 4])

In [9]:
A_B = 1

In [10]:
A_B

1

Conditional statement conventions: 
- The condition after `if` does not need parentheses, and blocks are not wrapped in curly or square brackets
- A colon ends the `if`/`else` line, and indentation defines the block

In [11]:
if var == 1: 
    print("var is 1")
else:
    print("var is not 1")
    

var is not 1


In [12]:
names = ['Jerry', 'Tom', 'Hardy', 'George']

better_list = []
for name in names:
    if len(name) >= 6:
        better_list.append(name)
print(better_list)

['George']


In [13]:
print ("1")
print ("2")
print ("3")
print ("1, 2, 3")

1
2
3
1, 2, 3


In [14]:
print ("1, 2, 3")

1, 2, 3


For loop conventions:  
- `for` is one of Python's reserved keywords
- iterate over lists (arrays) with a `for` loop or a `while` loop

In [15]:
for i in Array1:
    print(i)
    

1
2
3


In [16]:
i = 15
while i>10:
    print(i)
    i = i-1
    

15
14
13
12
11


_Note 1:_ Explicit loops can be slow in Python. Where possible, replace `for` and `while` loops with **map()**, or use **numpy** arrays, which apply a calculation to every element at once (e.g. np.mean on an np.array).

_Note 2:_ A loop that cannot be removed can often be improved by looking at what each iteration actually does. 
- If a calculation gives the same result on every iteration, do it once, above the loop.  
- If a conversion can be applied to the whole result at once, do it once, below the loop.   
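As a rough illustration of Note 2, the sketch below computes the same values twice: once with a constant and a conversion repeated inside the loop, and once with both moved out of it. The data (`radii`) and variable names are assumptions made for this example.

```python
import math

radii = [1.0, 2.5, 4.0]  # hypothetical input data

# Slower pattern: the constant is recomputed and the rounding is done on every pass
circ_slow = []
for r in radii:
    two_pi = 2 * math.pi               # same value on every iteration
    circ_slow.append(round(two_pi * r, 2))

# Better: compute the constant once, above the loop
two_pi = 2 * math.pi
circ_fast = [two_pi * r for r in radii]
# ... and convert the whole result once, below the loop
circ_fast = [round(c, 2) for c in circ_fast]

print(circ_slow == circ_fast)
```

Both versions give the same list; the second simply does less work per iteration.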

# Control flow and functions
- Use `def` to define a function 
- The function body is indented under the `def` line; a function can take any number of arguments, including none  
- Arguments can be passed by name and can have default values  
- Functions can have `side effect(s)` and/or `return value(s)` 
- The body can use `while`, `for`, and `if`/`else`
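The points above can be sketched in two small functions. The names `describe` and `countdown` are hypothetical and only serve this illustration: the first shows a default value, a named argument, and a variable number of arguments; the second combines a side effect (printing) with a return value.

```python
def describe(name, greeting="Hello", *extras):
    """Build a greeting; `greeting` has a default, `extras` collects further positional arguments."""
    message = f"{greeting}, {name}"
    for extra in extras:
        message += f" and {extra}"
    return message  # return value only, no side effect


def countdown(start):
    """Print start, start-1, ..., 1 (a side effect) and return how many lines were printed."""
    printed = 0
    while start > 0:
        print(start)
        start -= 1
        printed += 1
    return printed


print(describe("Tom"))                          # default greeting
print(describe("Tom", greeting="Hi"))           # argument passed by name
print(describe("Tom", "Hi", "Jerry", "Hardy"))  # extra positional arguments
lines = countdown(3)
```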

In [1]:
# this is an example of a function with return values
def double_me(what_to_double):
    print(what_to_double)
    return what_to_double * 2 

In [2]:
double_me(2) # Jupyter shows the printed 2 (a side effect) and then the return value 4

2


4

In [3]:
# this is an example of a function with side effects
def print_me(what_to_print, what_to_print_next): 
    print(what_to_print)
    print(what_to_print_next)

In [4]:
print_me("me", "you") # "me", "cat", "mouse", ... are string literals
print_me("cat", "dog")
print_me("mouse", 1)


me
you
cat
dog
mouse
1


In [5]:
# a function can be assigned to another name
shadow_print_me = print_me

In [8]:
shadow_print_me(1,2)

1
2


# Data Structures
- classes have members or properties; an object such as family_one is an instance of the class
- functions and classes can be combined: a function defined inside a class is called a method, and its first parameter is always the instance it is called on
- the first parameter of a method is normally named self (e.g. in family_three.calculate_tax(), self is family_three)
- object oriented programming achieves encapsulation by bundling data and the methods that use it in a class
- another advantage of object oriented programming is polymorphism: different classes can offer the same method name, so the same call works on each of them
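Encapsulation and polymorphism can be sketched with two small classes. `Family` and `SeniorFamily`, along with the 0.1 rate, are hypothetical names and values invented for this example; they are not the classes used in the cells of these notes.

```python
class Family:
    """Hypothetical class bundling data with the method that uses it (encapsulation)."""

    tax_rate = 0.2

    def __init__(self, member_count, total_income):
        self.member_count = member_count
        self.total_income = total_income

    def calculate_tax(self):
        # `self` is the instance the method was called on
        return (self.total_income / self.member_count) * self.tax_rate


class SeniorFamily(Family):
    """Hypothetical subclass: same interface, different (made-up) rate."""

    tax_rate = 0.1


families = [Family(5, 100000), SeniorFamily(2, 50000)]
for family in families:
    # The same call works on both types (polymorphism)
    print(type(family).__name__, family.calculate_tax())
```

Passing the values to `__init__` also avoids setting each attribute in a separate statement after the object is created.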

Other data structures:
- non-structured dataset
- structured datasets


In [26]:
class bc_family:
    member_count = 0
    postal_code = ""
    total_income = 100

    def calculate_tax(self):
        return (self.total_income / self.member_count) * 0.2

In [27]:
family_one = bc_family()

In [28]:
family_two = bc_family()

In [29]:
family_three = bc_family()

In [30]:
family_one.member_count = 5 

In [31]:
family_two.member_count = 2

In [32]:
family_three.member_count = 4

In [33]:
family_one.total_income = 100000

In [34]:
family_two.total_income = 50000

In [35]:
family_three.total_income = 150000

In [36]:
family_one

<__main__.bc_family at 0x119ccf630>

In [37]:
family_two

<__main__.bc_family at 0x119dcbc18>

In [16]:
def calculate_tax(family):
    return (family.total_income / family.member_count) * 0.2

In [17]:
calculate_tax(family_one)

4000.0

In [18]:
calculate_tax(family_two)

5000.0

In [19]:
calculate_tax(family_three)

7500.0

In [25]:
bc_family

__main__.bc_family

In [38]:
family_one.calculate_tax()

4000.0

In [39]:
family_two.calculate_tax()

5000.0

In [40]:
family_three.calculate_tax()

7500.0

# Python standard library and built-in functions:
- Built-in types include: list, tuple, set, dict  
-- shorthand syntax for list is [], for dict is {}, and for tuple is ()
- Built-in functions include: print(), len(), range(), round(), enumerate(), map(), zip(), and others   
- Built-in modules: os, sys, itertools, collections, math, and others
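A short sketch of the built-in types and one standard-library module listed above. The variable names and values are made up for illustration.

```python
import math

point = (3, 4)                       # tuple: fixed-size and immutable
scores = {"Tom": 40, "Jerry": 45}    # dict: maps keys to values
unique = {1, 2, 2, 3}                # set: duplicates collapse
empty_set = set()                    # note: {} creates an empty dict, not a set

print(type(point), type(scores), type(unique), type({}))
print(len(unique))                   # number of distinct elements
print(math.hypot(*point))            # unpack the tuple into two arguments
```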

In [45]:
# Create a list using the formal name
formal_list = list()
print(formal_list)

# Create a list using the literal syntax
literal_list = []
print(literal_list)

# Print out the type of formal_list
print(type(formal_list))

[]
[]
<class 'list'>


In [46]:
nums = range(1,10) # range creates a lazy range object, not a list
print(nums)

range(1, 10)


In [47]:
# Create a list of integers (0-50) using list comprehension
nums_list_comp = [num for num in range(51)]
print(nums_list_comp)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50]


In [48]:
# Create a list of integers (0-50) by unpacking range
nums_unpack = [*range(51)]
print(nums_unpack)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50]


In [49]:
list(nums)

[1, 2, 3, 4, 5, 6, 7, 8, 9]

In [50]:
even_nums = range(2,11,2) 
print(list(even_nums))

[2, 4, 6, 8, 10]


In [51]:
letters = ['a', 'z', 'x', 'b']
index_letters = enumerate(letters) #create an indexed list of objects
print(list(index_letters))


[(0, 'a'), (1, 'z'), (2, 'x'), (3, 'b')]


In [52]:
# another way of combining objects is with zip
names = ['a', 'z', 'x', 'b']
numbers = [40, 45, 50, 55]
combine_names_numbers = zip(names, numbers)
print(type(combine_names_numbers))

combine_names_numbers_list = [*combine_names_numbers] # each item is a tuple with one element from each list
print(combine_names_numbers_list)

<class 'zip'>
[('a', 40), ('z', 45), ('x', 50), ('b', 55)]


In [53]:
# combinations can also be built with a function that has no side effects
names = ['a', 'z', 'x', 'b']
numbers = ['1', '2', '3', '4']

def combo(name, num):
    result = [] 
    for x in name:
        for y in num:
            result.append((x,y))
            result.append((y,x))
    return result

In [40]:
combo(names, numbers)

[('a', '1'),
 ('1', 'a'),
 ('a', '2'),
 ('2', 'a'),
 ('a', '3'),
 ('3', 'a'),
 ('a', '4'),
 ('4', 'a'),
 ('z', '1'),
 ('1', 'z'),
 ('z', '2'),
 ('2', 'z'),
 ('z', '3'),
 ('3', 'z'),
 ('z', '4'),
 ('4', 'z'),
 ('x', '1'),
 ('1', 'x'),
 ('x', '2'),
 ('2', 'x'),
 ('x', '3'),
 ('3', 'x'),
 ('x', '4'),
 ('4', 'x'),
 ('b', '1'),
 ('1', 'b'),
 ('b', '2'),
 ('2', 'b'),
 ('b', '3'),
 ('3', 'b'),
 ('b', '4'),
 ('4', 'b')]

In [60]:
# Counter counts how often each object occurs; it is faster than counting in a loop
# and displays the counts from most to least common
from collections import Counter
type_counts = Counter(names)
print(type_counts)

Counter({'a': 1, 'z': 1, 'x': 1, 'b': 1})


In [61]:
# Import combinations from itertools
from itertools import combinations

# Create a combinations object holding every pair of names
combos_obj = combinations(names, 2)
print(type(combos_obj), '\n')

# Convert combos_obj to a list by unpacking
combos_2 = [*combos_obj]
print(combos_2, '\n')

<class 'itertools.combinations'> 

[('a', 'z'), ('a', 'x'), ('a', 'b'), ('z', 'x'), ('z', 'b'), ('x', 'b')] 



In [62]:
nums = [1.1, 2.0, 5.4, 9.8]
rnd_nums = map(round, nums) # map applies a function (here round) to every element
print(list(rnd_nums))

[1, 2, 5, 10]


In [63]:
nums = [1, 2, 5, 4, 9, 8]
sqrd_nums = map(lambda x: x ** 2, nums) # map applies an anonymous (lambda) function to every element
print(list(sqrd_nums))

[1, 4, 25, 16, 81, 64]


In [64]:
# Create a range object that goes from 0 to 5
nums = range(6)
print(type(nums))

<class 'range'>


In [65]:
# Convert nums to a list
nums_list = list(nums)
print(nums_list)


[0, 1, 2, 3, 4, 5]


In [66]:
# Create a new list of odd numbers from 1 to 11 by unpacking a range object
nums_list2 = [*range(1,12,2)]
print(nums_list2)

[1, 3, 5, 7, 9, 11]


In [67]:
# Rewrite for loop to use enumerate
indexed_names = []
for i,name in enumerate(names):
    index_name = (i,name)
    indexed_names.append(index_name) 
print(indexed_names)


[(0, 'a'), (1, 'z'), (2, 'x'), (3, 'b')]


In [68]:
# Rewrite the above for loop using list comprehension
indexed_names_comp = [(i,name) for i,name in enumerate(names)]
print(indexed_names_comp)

[(0, 'a'), (1, 'z'), (2, 'x'), (3, 'b')]


In [69]:
# Unpack an enumerate object with a starting index of one
indexed_names_unpack = [*enumerate(names, 1)]
print(indexed_names_unpack)

[(1, 'a'), (2, 'z'), (3, 'x'), (4, 'b')]


In [70]:
# Use map to apply str.upper to each element in names
names_map  = map(str.upper, names)

# Print the type of the names_map
print(type(names_map))

# Unpack names_map into a list
names_uppercase = [*names_map]

# Print the list created by unpacking the map object
print(names_uppercase)

<class 'map'>
['A', 'Z', 'X', 'B']


# Set Theory
- Branch of mathematics applied to a collection of objects, i.e. sets
- Python has a built-in set datatype with accompanying methods:  
-- intersection(): all elements that are in both sets  
-- difference(): all elements in one set but not the other    
-- symmetric_difference(): all elements in exactly one set  
-- union(): all elements that are in either set   
- sets allow fast membership testing with the `in` operator, i.e. checking whether a value is in the collection 

In [73]:
list_a = ['a', 'b', 'c', 'd']
list_b = ['a', 'b', 'x', 'y']
set_a = set(list_a)
print(set_a)

set_b = set(list_b)
print(set_b)

set_a.intersection(set_b) # one simple line of code, no loop needed,
                          # and the run time is much faster

set_b.difference(set_a) # elements in set_b but not in set_a

set_a.symmetric_difference(set_b) # elements in exactly one of the two sets

set_a.union(set_b) # to combine the two sets

%timeit 'x' in set_a # to time a membership test
print('x' in set_a)

{'b', 'd', 'a', 'c'}
{'y', 'b', 'x', 'a'}
33.4 ns ± 0.572 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
False


In [74]:
# can also use set to convert a list to a set
list_a = ['a', 'b', 'c', 'd']
list_a_set = set(list_a)
print(list_a_set)

{'b', 'd', 'a', 'c'}
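Only the last expression of a cell is displayed, so the results of the individual set methods are easy to miss. The sketch below prints each one for two small lists; `sorted()` is used only to make the printed order predictable, since sets are unordered. The name `in_both_loop` is made up for this example.

```python
list_a = ['a', 'b', 'c', 'd']
list_b = ['a', 'b', 'x', 'y']
set_a, set_b = set(list_a), set(list_b)

print(sorted(set_a.intersection(set_b)))          # in both sets
print(sorted(set_a.difference(set_b)))            # in set_a only
print(sorted(set_b.difference(set_a)))            # in set_b only
print(sorted(set_a.symmetric_difference(set_b)))  # in exactly one set
print(sorted(set_a.union(set_b)))                 # in either set

# Loop-style equivalent of intersection, for comparison
in_both_loop = [item for item in list_a if item in set_b]
print(in_both_loop)
```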


# The power of numpy arrays

Intro to numpy features:
- numpy arrays are a more efficient alternative to python lists for numerical data
- arrays, vectors, and matrices 
- less verbose
- matrix multiplications
- eigenvalues and eigenvectors
- solving linear systems
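To make the "less verbose" point concrete, here is a small sketch that squares the same values four ways: an explicit loop, a list comprehension, `map()`, and a numpy array. The variable names are made up for illustration.

```python
import numpy as np

values = [1, 2, 3, 4, 5]

# Explicit loop
squares_loop = []
for v in values:
    squares_loop.append(v ** 2)

# List comprehension and map() do the same without the bookkeeping
squares_comp = [v ** 2 for v in values]
squares_map = list(map(lambda v: v ** 2, values))

# numpy applies the operation to the whole array at once
squares_np = np.array(values) ** 2

print(squares_loop)
print(squares_np, squares_np.mean())
```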

In [20]:
import numpy as np

In [76]:
Array = np.array([1,2,3])
print(Array)

[1 2 3]


In [77]:
# find the type of each element using dtype
Array.dtype #returns integer

dtype('int64')

In [78]:
for e in Array:
    print (e)

1
2
3


In [79]:
Array + Array

array([2, 4, 6])

In [80]:
2* Array

array([2, 4, 6])

In [81]:
Array ** 2 #squaring arrays all at once

array([1, 4, 9])

In [82]:
np.sqrt(Array) # element-wise square root of Array

array([1.        , 1.41421356, 1.73205081])

In [83]:
np.log(Array) # element-wise natural log of Array

array([0.        , 0.69314718, 1.09861229])

In [84]:
np.exp(Array) # element-wise exponential of Array

array([ 2.71828183,  7.3890561 , 20.08553692])

In [85]:
np.sum(Array)

6

In [86]:
np.linalg.norm(Array)

3.7416573867739413

In [87]:
M = np.matrix([[1,5], [6,8]])

In [88]:
M

matrix([[1, 5],
        [6, 8]])

In [89]:
print(M[1,:]) # to print the second row of the matrix

[[6 8]]


In [90]:
print(M[M>6]) # print all elements of M that are greater than six

[[8]]


In [91]:
# Double every element of M
M_dbl = M * 2
print(M_dbl)

[[ 2 10]
 [12 16]]


In [92]:
# Add 1 to the second column of M (M is modified in place)
M[:,1] = M[:,1] + 1
print(M)

[[1 6]
 [6 9]]


In [93]:
A = np.array(M)

In [94]:
A

array([[1, 6],
       [6, 9]])

In [95]:
# Create a list of arrival times
arrival_times = [*range(10,60,10)] # intervals by 10
print(arrival_times)

# Convert arrival_times to an array and update the times
arrival_times_np = np.array(arrival_times)
new_times = arrival_times_np - 3

print(new_times)


[10, 20, 30, 40, 50]
[ 7 17 27 37 47]


In [96]:
A.T # use T for the transpose (A is symmetric here, so A.T equals A)

array([[1, 6],
       [6, 9]])

In [97]:
Z = np.zeros(11) # generate a 1-D array of 11 zeros

In [98]:
Z

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [99]:
Z = np.zeros((11,11)) # generate an 11x11 matrix of zeros

In [100]:
Z

array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])

In [101]:
X = np.ones((10,10)) # generate a 10x10 matrix of ones

In [102]:
X

array([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])

In [103]:
Y = np.random.random((10,10)) # uniform random numbers in [0, 1)

In [104]:
Y

array([[0.35579184, 0.7747728 , 0.48992445, 0.10885725, 0.13355319,
        0.24313407, 0.33213702, 0.47434992, 0.27057785, 0.67484345],
       [0.18027774, 0.68205535, 0.21582425, 0.45393513, 0.98849781,
        0.06727891, 0.86296432, 0.85806475, 0.70462158, 0.41747367],
       [0.63585985, 0.79375969, 0.19900827, 0.42952425, 0.99007405,
        0.45055262, 0.65327612, 0.65199999, 0.18807456, 0.67610806],
       [0.45626637, 0.9112365 , 0.65579309, 0.23352172, 0.12605694,
        0.8874374 , 0.09226782, 0.35317557, 0.99983465, 0.28922897],
       [0.6113417 , 0.72609919, 0.75346267, 0.94643661, 0.80181448,
        0.2040487 , 0.6776737 , 0.79612692, 0.4359737 , 0.74743377],
       [0.77624566, 0.34085088, 0.56797176, 0.6472214 , 0.04323505,
        0.77159063, 0.30089441, 0.68352157, 0.65685336, 0.73470674],
       [0.11851079, 0.26598064, 0.5814779 , 0.91118739, 0.10549918,
        0.12440416, 0.17892103, 0.68772458, 0.35065721, 0.52550234],
       [0.13088131, 0.73353483, 0.2962186

In [105]:
G = np.random.randn(10,10) # random numbers from the standard normal distribution

In [106]:
G

array([[ 0.91143767, -0.55808767,  0.00659945,  0.52263833, -2.16127977,
        -0.46332669,  0.70056805,  0.90228063, -0.0034718 ,  0.23001443],
       [-0.215752  , -0.56601005, -1.5166436 ,  0.05202422, -0.22106233,
        -0.4372219 , -0.08597568,  0.50040426, -0.83263128,  0.95423562],
       [-0.32844142, -0.04040685,  0.50025567,  0.09797489,  0.22086392,
         2.24726242, -0.15895142, -0.13304282,  0.5930657 ,  1.18404819],
       [-0.50300548, -0.10871293,  0.2526728 , -0.25144401, -1.93836694,
        -0.62753417,  2.01746912, -0.57860827, -0.75252584,  0.93085931],
       [ 0.09158355,  1.5593627 , -0.47570951,  0.96725051, -0.50873691,
        -0.49310754, -0.86189119, -1.70871096,  0.66671084, -0.03423764],
       [ 0.1890798 ,  1.6355293 , -0.56832202, -0.10865605,  1.57663891,
         1.07309038,  0.34715536, -0.86404805, -1.58937919,  0.96551855],
       [ 0.66597886,  0.20410198, -0.05542084, -0.45059691, -1.60308066,
         1.61762353,  0.66918669, -1.91099481

In [107]:
G.mean() #calculate mean

-0.04795203813450835

In [108]:
G.var() # calculate variance

0.7907787429157171

In [109]:
Ginv = np.linalg.inv(G) # inverse of G; multiplying it by G should give the identity matrix

In [110]:
Ginv

array([[ 3.40667537e-02,  9.44379942e-01, -4.15435671e-01,
        -3.89445865e-01, -2.24921366e-01,  1.94162315e-01,
         2.90961408e-01, -1.00660962e-01, -1.02055935e+00,
         6.80575293e-02],
       [ 2.21624389e-01, -6.63183758e-01,  1.00196004e-01,
         2.01282229e-01,  2.47317663e-01,  2.27911896e-01,
        -2.69942987e-01,  3.15555574e-01,  5.43593908e-01,
         1.15181236e-01],
       [-3.10418267e-01,  7.40339754e-01, -2.99086989e-01,
        -5.22664346e-02, -1.23989201e-01,  1.20189413e-01,
         1.70599791e-01, -2.65831390e-01, -9.22834502e-01,
         5.80342569e-01],
       [ 8.76247490e-01, -1.48940254e+00,  3.80743331e-01,
        -1.35036897e-01,  2.93430437e-01,  1.37334473e-01,
        -2.77579325e-01, -3.12579698e-01,  1.04418097e+00,
        -5.75744433e-01],
       [-7.33195154e-03, -3.66635964e-01, -3.84937555e-02,
        -1.30651417e-01, -1.43623084e-01,  1.26683567e-01,
        -1.01514942e-01, -1.09743113e-01,  1.09083327e-01,
        -3.

In [111]:
Ginv.dot(G) # identity matrix, up to floating-point rounding error

array([[ 1.00000000e+00,  7.60033988e-17,  2.13517610e-16,
        -4.52536655e-17, -9.93054323e-17, -5.09558810e-16,
         3.05347233e-16,  5.96396316e-17, -2.22044605e-16,
         2.22044605e-16],
       [-2.84692107e-17,  1.00000000e+00, -2.47164525e-16,
         1.08480991e-16, -1.35802811e-16,  7.48797161e-17,
        -1.92210660e-16, -2.66643859e-17,  2.22044605e-16,
         0.00000000e+00],
       [-3.00219265e-16, -2.26401162e-16,  1.00000000e+00,
         1.94145305e-16,  2.58063247e-16, -2.41582513e-16,
        -2.10725669e-16, -1.25739864e-16, -4.44089210e-16,
         3.33066907e-16],
       [-9.50082748e-19,  3.04589605e-16,  1.85457926e-16,
         1.00000000e+00, -5.34997789e-16,  2.79112903e-16,
         1.81781241e-16, -1.80168058e-16,  9.99200722e-16,
        -7.77156117e-16],
       [ 2.65295269e-17,  1.82784619e-17,  2.42367218e-16,
        -2.92833500e-17,  1.00000000e+00, -8.44248075e-17,
         2.88052832e-16,  2.40765860e-17,  3.60822483e-16,
        -2.

In [112]:
G.dot(Ginv) # identity matrix, up to floating-point rounding error

array([[ 1.00000000e+00, -1.38055898e-16, -5.49393457e-17,
         7.18839082e-17,  6.27706948e-17,  2.62980792e-17,
        -3.78463659e-17,  1.87019114e-16,  1.66533454e-16,
         0.00000000e+00],
       [-1.81131382e-16,  1.00000000e+00, -1.34209205e-16,
        -7.73686750e-18, -6.69638919e-17,  1.38917786e-16,
         1.42127635e-16, -6.84638787e-18, -6.66133815e-16,
         4.44089210e-16],
       [ 1.18573194e-17,  2.87648417e-16,  1.00000000e+00,
        -4.98818752e-17, -9.42438559e-18, -1.44275683e-17,
         1.68132762e-17, -5.87739840e-18, -2.84494650e-16,
         6.93889390e-17],
       [ 3.97593601e-17, -1.05165315e-16,  1.26014407e-16,
         1.00000000e+00,  6.36169736e-17,  1.94815966e-17,
         3.50348372e-17,  7.53306068e-18,  0.00000000e+00,
        -1.11022302e-16],
       [-2.84316910e-16,  1.12999365e-15, -2.38657537e-17,
         1.08428650e-16,  1.00000000e+00, -7.24111646e-18,
         8.38442850e-17,  6.47684507e-17, -1.66533454e-16,
         2.

In [113]:
np.linalg.det(G) #matrix determinant

-698.2796205607748

In [114]:
np.diag(G) #diagonal elements in a vector

array([ 0.91143767, -0.56601005,  0.50025567, -0.25144401, -0.50873691,
        1.07309038,  0.66918669, -0.01757793, -1.70479781, -0.66598481])

In [115]:
np.diag([1,2]) # given a vector, diag builds a diagonal matrix instead

array([[1, 0],
       [0, 2]])

In [41]:
# to find the most efficient code, we can time it!
%timeit np.random.rand(1000) # the built-in %timeit magic measures how long a statement takes

# can also set the number of runs (-r) and the number of loops per run (-n)
%timeit -r2 -n10 np.random.rand(1000) 

# %timeit (one %) times a single line; %%timeit (two %) at the top of a cell times the whole cell
# the -o flag saves the result in a variable
time = %timeit -o np.random.rand(1000)

18.2 µs ± 3.17 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
The slowest run took 10.17 times longer than the fastest. This could mean that an intermediate result is being cached.
102 µs ± 83.7 µs per loop (mean ± std. dev. of 2 runs, 10 loops each)
16.6 µs ± 1.23 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
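The `%timeit` magic only works inside IPython or Jupyter. In a plain Python script, the standard `timeit` module gives comparable measurements. The sketch below times a loop against a list comprehension; the statements being timed and the variable names are illustrative assumptions, and the timings will differ from machine to machine.

```python
import timeit

setup = "nums = list(range(1000))"

loop_stmt = """
squares = []
for n in nums:
    squares.append(n ** 2)
"""
comp_stmt = "squares = [n ** 2 for n in nums]"

# repeat=3 runs of number=200 loops each (like -r3 -n200); keep the fastest run
loop_time = min(timeit.repeat(loop_stmt, setup=setup, repeat=3, number=200))
comp_time = min(timeit.repeat(comp_stmt, setup=setup, repeat=3, number=200))

print(f"for loop:           {loop_time:.4f} s for 200 loops")
print(f"list comprehension: {comp_time:.4f} s for 200 loops")
```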


In [117]:
## code profiling analyzes run time line-by-line using the package line_profiler
## (function_name below is a placeholder for the function to profile)
#pip install line_profiler
#%load_ext line_profiler
#%lprun -f function_name function_name(arguments) # profiles the named function line-by-line

## memory_profiler does the same for memory usage (the memory footprint)
#pip install memory_profiler 
#%load_ext memory_profiler
#%mprun -f function_name function_name(arguments) # the function must be imported from a file

import sys
sys.getsizeof(np.array(range(20))) # size of the array object in bytes


256

In [1]:
## imagine a list of 480 superheroes called heroes has been loaded into your session,
## as well as a list of each hero's corresponding publisher (called publishers)

def get_publisher_heroes(heroes, publishers, desired_publisher):

    desired_heroes = []

    for i,pub in enumerate(publishers):
        if pub == desired_publisher:
            desired_heroes.append(heroes[i])

    return desired_heroes

def get_publisher_heroes_np(heroes, publishers, desired_publisher):

    heroes_np = np.array(heroes)
    pubs_np = np.array(publishers)

    desired_heroes = heroes_np[pubs_np == desired_publisher]

    return desired_heroes


# Use get_publisher_heroes() to gather Star Wars heroes
star_wars_heroes = get_publisher_heroes(heroes, publishers, 'George Lucas')

print(star_wars_heroes)
print(type(star_wars_heroes))

# Use get_publisher_heroes_np() to gather Star Wars heroes
star_wars_heroes_np = get_publisher_heroes_np(heroes, publishers, 'George Lucas')

print(star_wars_heroes_np)
print(type(star_wars_heroes_np))


NameError: name 'heroes' is not defined
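The `NameError` appears because `heroes` and `publishers` were never loaded in this session. A minimal runnable sketch of the same comparison follows, using a four-entry stand-in list invented for this example (the original 480-hero data is not part of these notes).

```python
import numpy as np

# Hypothetical stand-in data; not the original 480-hero list
heroes = ['Luke Skywalker', 'Batman', 'Yoda', 'Spider-Man']
publishers = ['George Lucas', 'DC Comics', 'George Lucas', 'Marvel Comics']

def get_publisher_heroes(heroes, publishers, desired_publisher):
    """Loop version: collect heroes whose publisher matches."""
    desired_heroes = []
    for i, pub in enumerate(publishers):
        if pub == desired_publisher:
            desired_heroes.append(heroes[i])
    return desired_heroes

def get_publisher_heroes_np(heroes, publishers, desired_publisher):
    """numpy version: select with a boolean mask instead of a loop."""
    heroes_np = np.array(heroes)
    pubs_np = np.array(publishers)
    return heroes_np[pubs_np == desired_publisher]

star_wars_heroes = get_publisher_heroes(heroes, publishers, 'George Lucas')
star_wars_heroes_np = get_publisher_heroes_np(heroes, publishers, 'George Lucas')

print(star_wars_heroes, type(star_wars_heroes))
print(star_wars_heroes_np, type(star_wars_heroes_np))
```

The two functions return the same heroes, one as a list and the other as a numpy array.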

In [118]:
np.outer(M.T,M)

array([[ 1,  6,  6,  9],
       [ 6, 36, 36, 54],
       [ 6, 36, 36, 54],
       [ 9, 54, 54, 81]])

In [119]:
np.inner(M.T,M)

matrix([[ 37,  60],
        [ 60, 117]])

In [120]:
M.dot(M.T) #matrix multiplication

matrix([[ 37,  60],
        [ 60, 117]])

In [121]:
np.diag(M).sum()

10

In [122]:
np.trace(M)

10

In [123]:
S = np.random.randn(100,3) #synthetic random data with 100 samples and 3 features

In [124]:
cov = np.cov(S.T)

In [125]:
cov # np.cov expects one variable per row, so S is transposed before computing the covariance matrix

array([[ 1.04166055,  0.08876389, -0.04038847],
       [ 0.08876389,  0.94483293,  0.0034369 ],
       [-0.04038847,  0.0034369 ,  0.97412574]])

In [126]:
cov.shape

(3, 3)

In [127]:
np.linalg.eigh(cov) # eigenvalues and eigenvectors of a symmetric (Hermitian) matrix

(array([0.88547882, 0.97216958, 1.10297081]),
 array([[-0.52766844,  0.07587895, -0.84605461],
        [ 0.80485449,  0.36313726, -0.4694045 ],
        [-0.27161603,  0.92864079,  0.25268759]]))

In [128]:
np.linalg.eig(cov) # general-purpose eig; same eigenvalues as eigh here, but in a different order and with some eigenvector signs flipped

(array([1.10297081, 0.88547882, 0.97216958]),
 array([[ 0.84605461,  0.52766844,  0.07587895],
        [ 0.4694045 , -0.80485449,  0.36313726],
        [-0.25268759,  0.27161603,  0.92864079]]))

In [129]:
np.linalg.inv(M.T).dot(M) # M is symmetric here, so the result is the identity; solve is a better way to do this

matrix([[1., 0.],
        [0., 1.]])

In [130]:
np.linalg.solve(M.T, M) # same answer as above; prefer solve over an explicit inverse

matrix([[1.00000000e+00, 0.00000000e+00],
        [1.23358114e-17, 1.00000000e+00]])
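The more common use of `solve` is a linear system `A x = b`. The sketch below solves one both ways; the right-hand side `b` is a made-up vector chosen so that the solution is easy to check by hand.

```python
import numpy as np

A = np.array([[1.0, 6.0],
              [6.0, 9.0]])
b = np.array([13.0, 24.0])   # hypothetical right-hand side

# Explicit inverse: works, but does more arithmetic and is less accurate in general
x_inv = np.linalg.inv(A).dot(b)

# solve: finds x directly without forming the inverse
x_solve = np.linalg.solve(A, b)

print(x_solve)
print(np.allclose(x_inv, x_solve))      # both approaches agree
print(np.allclose(A.dot(x_solve), b))   # the solution satisfies A x = b
```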

# Loading data into Python


In [38]:
# data input
X = []

In [35]:
open("data.csv") # returns a file object; nothing has been read yet

<_io.TextIOWrapper name='data.csv' mode='r' encoding='UTF-8'>

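In [11]:
"a.b.c".split(".") # string method: split on a separator

['a', 'b', 'c']

In [14]:
"  a.b.c ".strip() # string method: remove leading and trailing whitespace

'a.b.c'

In [15]:
"@a.b.c@".strip("@") # string method: remove the given leading and trailing characters

'a.b.c'

In [16]:
"@a.b.c@".upper() # string method: convert to upper case

'@A.B.C@'

In [17]:
"@a.B.c@".lower() # string method: convert to lower case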

'@a.b.c@'

In [125]:
X = []
with open("data.csv") as f: # the with block closes the file when done
    for line in f:
        row = line.split(",")
        sample = list(map(str.strip, row)) # in Python 3 map returns an iterator, so wrap it in list()
        sample = list(map(float, sample)) # float is a type; calling it converts each string
        X.append(sample)

In [126]:
X = np.array(X)

In [127]:
X.shape

(21, 3)

In [128]:
X

array([[2.0000e+03, 9.0990e+03, 1.2000e+01],
       [2.0000e+03, 9.0100e+03, 1.1000e+01],
       [2.0000e+03, 9.0110e+03, 1.1000e+01],
       [2.0000e+03, 9.0120e+03, 1.1000e+01],
       [2.0000e+03, 9.0130e+03, 1.1000e+01],
       [2.0000e+03, 9.0240e+03, 1.1000e+01],
       [2.0000e+03, 9.0350e+03, 1.1000e+01],
       [2.0000e+03, 9.0460e+03, 1.1000e+01],
       [2.0000e+03, 9.0470e+03, 1.1000e+01],
       [2.0000e+03, 9.0480e+03, 1.1000e+01],
       [2.0000e+03, 9.0590e+03, 1.1000e+01],
       [2.0000e+03, 5.7160e+03, 6.0000e+00],
       [2.0000e+03, 5.7160e+03, 6.0000e+00],
       [2.0000e+03, 5.7160e+03, 8.0000e+00],
       [2.0000e+03, 5.7170e+03, 9.0000e+00],
       [2.0000e+03, 5.7200e+03, 9.0000e+00],
       [2.0000e+03, 6.1932e+04, 6.1000e+01],
       [2.0000e+03, 6.1932e+04, 6.1000e+01],
       [2.0000e+03, 6.1932e+04, 6.1000e+01],
       [2.0000e+03, 6.1932e+04, 6.1000e+01],
       [2.0000e+03, 6.1932e+04, 6.1000e+01]])
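The manual parsing above can also be done with the standard `csv` module or with `np.loadtxt`. The sketch below uses an in-memory string as a stand-in for data.csv so that it runs without the file; the three rows and the names `raw`, `X_csv`, and `X_loadtxt` are assumptions made for this example.

```python
import csv
import io

import numpy as np

# Stand-in for data.csv: a few rows in the same comma-separated layout
raw = "2000,9099,12\n2000,9010,11\n2000,5716,6\n"

rows = []
with io.StringIO(raw) as handle:   # for a real file: with open("data.csv") as handle:
    for row in csv.reader(handle):
        # strip() tolerates padding around values; float() converts the text
        rows.append([float(cell.strip()) for cell in row])
X_csv = np.array(rows)

# np.loadtxt does the split and conversion in one call
X_loadtxt = np.loadtxt(io.StringIO(raw), delimiter=",")

print(X_csv.shape)
print(np.array_equal(X_csv, X_loadtxt))
```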

# The power of Pandas 
- Pandas is a library used for data analysis
- Pandas is built on top of numpy
- The main data structure is the dataframe   
-- tabular data with labeled rows and columns  
-- built on top of the numpy array structure  

In [4]:
import pandas as pd

In [5]:
X = pd.read_csv("data.csv", header = None)

In [6]:
type(X)

pandas.core.frame.DataFrame

In [7]:
X.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21 entries, 0 to 20
Data columns (total 3 columns):
0    21 non-null int64
1    21 non-null int64
2    21 non-null int64
dtypes: int64(3)
memory usage: 584.0 bytes


In [8]:
X.head()

Unnamed: 0,0,1,2
0,2000,9099,12
1,2000,9010,11
2,2000,9011,11
3,2000,9012,11
4,2000,9013,11


In [9]:
X.head(10)

Unnamed: 0,0,1,2
0,2000,9099,12
1,2000,9010,11
2,2000,9011,11
3,2000,9012,11
4,2000,9013,11
5,2000,9024,11
6,2000,9035,11
7,2000,9046,11
8,2000,9047,11
9,2000,9048,11


In [38]:
X.head()

Unnamed: 0,0,1,2
0,2000,9099,12
1,2000,9010,11
2,2000,9011,11
3,2000,9012,11
4,2000,9013,11


In [10]:
X.shape

(21, 3)

In [21]:
X.iloc[0:10] # select rows by integer position (here the first 10 rows)

Unnamed: 0,0,1,2
0,2000,9099,12
1,2000,9010,11
2,2000,9011,11
3,2000,9012,11
4,2000,9013,11
5,2000,9024,11
6,2000,9035,11
7,2000,9046,11
8,2000,9047,11
9,2000,9048,11


In [22]:
X.iloc[0]

0    2000
1    9099
2      12
Name: 0, dtype: int64

In [23]:
X.iloc[-1] # to get the last row

0     2000
1    61932
2       61
Name: 20, dtype: int64

In [33]:
X.iloc[:,1:] # all rows, every column from position 1 onward

Unnamed: 0,1,2
0,9099,12
1,9010,11
2,9011,11
3,9012,11
4,9013,11
5,9024,11
6,9035,11
7,9046,11
8,9047,11
9,9048,11


In [30]:
X.iloc[:5,:1] # first five rows, first column only

Unnamed: 0,0
0,2000
1,2000
2,2000
3,2000
4,2000


In [12]:
M = X.values # as_matrix() is deprecated; .values (or .to_numpy() in newer pandas) returns the numpy array


In [14]:
type(M)

numpy.ndarray

In [15]:
X[0] # pandas is header based: returns the column with the name of 0

0     2000
1     2000
2     2000
3     2000
4     2000
5     2000
6     2000
7     2000
8     2000
9     2000
10    2000
11    2000
12    2000
13    2000
14    2000
15    2000
16    2000
17    2000
18    2000
19    2000
20    2000
Name: 0, dtype: int64

In [37]:
X[1]

0      9099
1      9010
2      9011
3      9012
4      9013
5      9024
6      9035
7      9046
8      9047
9      9048
10     9059
11     5716
12     5716
13     5716
14     5717
15     5720
16    61932
17    61932
18    61932
19    61932
20    61932
Name: 1, dtype: int64

In [39]:
type(X[0])

pandas.core.series.Series

In [57]:
type(X[0] < 5)

pandas.core.series.Series

In [40]:
X.loc[0] # .ix is deprecated; use .loc for label-based or .iloc for positional indexing

0    2000
1    9099
2      12
Name: 0, dtype: int64

In [42]:
X[[0,1,2]]

Unnamed: 0,0,1,2
0,2000,9099,12
1,2000,9010,11
2,2000,9011,11
3,2000,9012,11
4,2000,9013,11
5,2000,9024,11
6,2000,9035,11
7,2000,9046,11
8,2000,9047,11
9,2000,9048,11


In [47]:
X[X[1]>6000]

Unnamed: 0,0,1,2
0,2000,9099,12
1,2000,9010,11
2,2000,9011,11
3,2000,9012,11
4,2000,9013,11
5,2000,9024,11
6,2000,9035,11
7,2000,9046,11
8,2000,9047,11
9,2000,9048,11


In [53]:
X[X.iloc[:,1]>6000] # same result as cell 47

Unnamed: 0,0,1,2
0,2000,9099,12
1,2000,9010,11
2,2000,9011,11
3,2000,9012,11
4,2000,9013,11
5,2000,9024,11
6,2000,9035,11
7,2000,9046,11
8,2000,9047,11
9,2000,9048,11


In [60]:
type(X[[0,1,2]])

pandas.core.frame.DataFrame
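The difference between positional, label-based, and boolean selection is easier to see when the row labels are not integers. The sketch below uses a small DataFrame whose column names (`year`, `time`, `month`) and row labels are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"year": [2000, 2000, 2000], "time": [9099, 5716, 61932], "month": [12, 6, 61]},
    index=["a", "b", "c"],   # string labels make .loc and .iloc easy to tell apart
)

first_by_position = df.iloc[0]      # first row, whatever its label
first_by_label = df.loc["a"]        # row labelled "a"

mask = df["time"] > 6000            # boolean Series, one value per row
large = df[mask]                    # keeps the rows where the mask is True
large_month = df.loc[mask, "month"] # rows chosen by mask, column chosen by label

print(large)
print(large_month.tolist())
```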

In [61]:
X.columns = ["h1", "h2", "h3"]

In [64]:
X

Unnamed: 0,h1,h2,h3
0,2000,9099,12
1,2000,9010,11
2,2000,9011,11
3,2000,9012,11
4,2000,9013,11
5,2000,9024,11
6,2000,9035,11
7,2000,9046,11
8,2000,9047,11
9,2000,9048,11


In [66]:
X["h1"]

0     2000
1     2000
2     2000
3     2000
4     2000
5     2000
6     2000
7     2000
8     2000
9     2000
10    2000
11    2000
12    2000
13    2000
14    2000
15    2000
16    2000
17    2000
18    2000
19    2000
20    2000
Name: h1, dtype: int64

In [67]:
X.iloc[:,0] # all rows, first column

0     2000
1     2000
2     2000
3     2000
4     2000
5     2000
6     2000
7     2000
8     2000
9     2000
10    2000
11    2000
12    2000
13    2000
14    2000
15    2000
16    2000
17    2000
18    2000
19    2000
20    2000
Name: h1, dtype: int64

In [70]:
X.h1

0     2000
1     2000
2     2000
3     2000
4     2000
5     2000
6     2000
7     2000
8     2000
9     2000
10    2000
11    2000
12    2000
13    2000
14    2000
15    2000
16    2000
17    2000
18    2000
19    2000
20    2000
Name: h1, dtype: int64

In [77]:
X["hellos"] = "hello"

In [78]:
X["goodbyes"] = "goodbye"

In [79]:
X

Unnamed: 0,h1,h2,h3,hellos,goodbye,goodbyes
0,2000,9099,12,hello,goodbye,goodbye
1,2000,9010,11,hello,goodbye,goodbye
2,2000,9011,11,hello,goodbye,goodbye
3,2000,9012,11,hello,goodbye,goodbye
4,2000,9013,11,hello,goodbye,goodbye
5,2000,9024,11,hello,goodbye,goodbye
6,2000,9035,11,hello,goodbye,goodbye
7,2000,9046,11,hello,goodbye,goodbye
8,2000,9047,11,hello,goodbye,goodbye
9,2000,9048,11,hello,goodbye,goodbye


In [81]:
X.head()

Unnamed: 0,h1,h2,h3,hellos,goodbye,goodbyes
0,2000,9099,12,hello,goodbye,goodbye
1,2000,9010,11,hello,goodbye,goodbye
2,2000,9011,11,hello,goodbye,goodbye
3,2000,9012,11,hello,goodbye,goodbye
4,2000,9013,11,hello,goodbye,goodbye


# Importing data with header 

In [104]:
Y = pd.read_csv("data_1.csv", engine = "python", skipfooter =2) # skipfooter skips the last two rows; it needs the python engine

In [105]:
Y

Unnamed: 0,YEAR,TIME,MONTH
0,2000,9099,12
1,2000,9010,11
2,2000,9011,11
3,2000,9012,11
4,2000,9013,11
5,2000,9024,11
6,2000,9035,11
7,2000,9046,11
8,2000,9047,11
9,2000,9048,11


In [88]:
from datetime import datetime

In [92]:
# test datetime.strptime, which parses a string into a datetime
datetime.strptime("1999-09-22","%Y-%m-%d")

datetime.datetime(1999, 9, 22, 0, 0)

In [94]:
Y["time_year"] = Y.apply(lambda row: row["YEAR"], axis = 1)

In [95]:
Y

Unnamed: 0,YEAR,TIME,MONTH,time_year
0,2000,9099,12,2000
1,2000,9010,11,2000
2,2000,9011,11,2000
3,2000,9012,11,2000
4,2000,9013,11,2000
5,2000,9024,11,2000
6,2000,9035,11,2000
7,2000,9046,11,2000
8,2000,9047,11,2000
9,2000,9048,11,2000


In [106]:
Y["time_year"] = Y[["YEAR", "MONTH"]].apply(lambda row: "-".join(row.astype(str)), axis = 1)

In [107]:
Y

Unnamed: 0,YEAR,TIME,MONTH,time_year
0,2000,9099,12,2000-12
1,2000,9010,11,2000-11
2,2000,9011,11,2000-11
3,2000,9012,11,2000-11
4,2000,9013,11,2000-11
5,2000,9024,11,2000-11
6,2000,9035,11,2000-11
7,2000,9046,11,2000-11
8,2000,9047,11,2000-11
9,2000,9048,11,2000-11


In [111]:
Y["date"] = Y.apply(lambda row: datetime.strptime(row["time_year"], "%Y-%m"), axis = 1) # axis = 1 applies the function to each row

In [112]:
Y

Unnamed: 0,YEAR,TIME,MONTH,time_year,separate,date
0,2000,9099,12,2000-12,2000-12-01,2000-12-01
1,2000,9010,11,2000-11,2000-11-01,2000-11-01
2,2000,9011,11,2000-11,2000-11-01,2000-11-01
3,2000,9012,11,2000-11,2000-11-01,2000-11-01
4,2000,9013,11,2000-11,2000-11-01,2000-11-01
5,2000,9024,11,2000-11,2000-11-01,2000-11-01
6,2000,9035,11,2000-11,2000-11-01,2000-11-01
7,2000,9046,11,2000-11,2000-11-01,2000-11-01
8,2000,9047,11,2000-11,2000-11-01,2000-11-01
9,2000,9048,11,2000-11,2000-11-01,2000-11-01


In [113]:
Y.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19 entries, 0 to 18
Data columns (total 6 columns):
YEAR         19 non-null int64
TIME         19 non-null int64
MONTH        19 non-null int64
time_year    19 non-null object
separate     19 non-null datetime64[ns]
date         19 non-null datetime64[ns]
dtypes: datetime64[ns](2), int64(3), object(1)
memory usage: 992.0+ bytes
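Row-wise `apply` with `axis = 1` is effectively a loop over rows. As a possible alternative, the same columns can be built with whole-column operations and `pd.to_datetime`. The sketch below uses a small made-up DataFrame (`Y_demo`) with the same `YEAR` and `MONTH` column names.

```python
import pandas as pd

# Hypothetical stand-in for Y with the same column names
Y_demo = pd.DataFrame({"YEAR": [2000, 2000, 2000], "MONTH": [12, 11, 11]})

# String concatenation on whole columns replaces the row-wise "-".join(...)
Y_demo["time_year"] = Y_demo["YEAR"].astype(str) + "-" + Y_demo["MONTH"].astype(str)

# pd.to_datetime parses the whole column at once; the day defaults to 1
Y_demo["date"] = pd.to_datetime(Y_demo["time_year"], format="%Y-%m")

print(Y_demo)
print(Y_demo.dtypes)
```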


In [None]:
Y.iloc[()]