# 1. List Comprehension

Form of list comprehension:

[expression for item in iterable if condition]

The 'if condition' part is optional.
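A list comprehension is shorthand for a loop that appends to a new list. The short sketch below uses a made-up list `source` to show the same filter-and-transform written both ways:

```python
source = [1, 2, 3, 4, 5, 6]  # hypothetical input data

# Comprehension form: keep the even numbers and square them
squares_of_evens = [x * x for x in source if x % 2 == 0]

# Equivalent explicit loop
squares_of_evens_loop = []
for x in source:
    if x % 2 == 0:
        squares_of_evens_loop.append(x * x)

print(squares_of_evens)                           # [4, 16, 36]
print(squares_of_evens == squares_of_evens_loop)  # True
```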

In [1]:
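# Each row: name, a numeric score, year of study, in-state flag (meanings inferred from the cells below)
students = [
    ['Harry', 3.5, 3, True],
    ['Ron', 2.7, 2, True],
    ['Hermes', 3.9, 1, False],
    ['Dan', 2.9, 4, True],
    ['Chad', 3.1, 4, False]
]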

In [3]:
# Goal: get the name of every student
names = [student[0] for student in students]
names

['Harry', 'Ron', 'Hermes', 'Dan', 'Chad']

In [6]:
# Goal: get the length of the name of each student
len_names = [len(student[0]) for student in students]
len_names

[5, 3, 6, 3, 4]

In [20]:
# Goal: get the name of every 4th-year student
fourth_year_students = [student[0] for student in students if student[2] == 4]
fourth_year_students

['Dan', 'Chad']

In [23]:
# Goal: get every student who is in 4th year or in-state
fourth_year_or_instate = [student for student in students if student[2] == 4 or student[3]]
fourth_year_or_instate

[['Harry', 3.5, 3, True],
 ['Ron', 2.7, 2, True],
 ['Dan', 2.9, 4, True],
 ['Chad', 3.1, 4, False]]
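Indexing with `student[2]` and `student[3]` works, but unpacking each row into named variables can make the condition easier to read. A sketch, assuming the column meanings name, score, year, in-state flag (the names `score`, `year`, `in_state` are illustrative, not from the original):

```python
students = [
    ['Harry', 3.5, 3, True],
    ['Ron', 2.7, 2, True],
    ['Hermes', 3.9, 1, False],
    ['Dan', 2.9, 4, True],
    ['Chad', 3.1, 4, False],
]

# Unpack each row; the variable names are assumptions about what each column means
fourth_year_or_instate_names = [
    name for name, score, year, in_state in students if year == 4 or in_state
]
print(fourth_year_or_instate_names)  # ['Harry', 'Ron', 'Dan', 'Chad']
```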

### Nested list comprehension

In [28]:
# Goal: get the data type of each field of every student
students1 = [[type(item) for item in student] for student in students]
students1

[[str, float, int, bool],
 [str, float, int, bool],
 [str, float, int, bool],
 [str, float, int, bool],
 [str, float, int, bool]]
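In a nested comprehension, the inner comprehension is simply the expression of the outer one, so it builds one inner list per row. A sketch of the equivalent nested loops, using two of the rows from above:

```python
students = [
    ['Harry', 3.5, 3, True],
    ['Ron', 2.7, 2, True],
]

types_comp = [[type(item) for item in student] for student in students]

# Equivalent loops: the outer loop builds one inner list per student
types_loop = []
for student in students:
    row_types = []
    for item in student:
        row_types.append(type(item))
    types_loop.append(row_types)

print(types_comp == types_loop)  # True
```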

In [46]:
%%timeit
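import numpy as np
l1 = np.array([1,2,3,4,5,6])
l2 = l1 ** 2    # element-wise square (vectorized)
l3 = l2.sum()   # NumPy's sum method
l4 = sum(l2)    # Python's built-in sum, iterating element by element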

15.7 µs ± 1.43 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [40]:
# Goal: get the square of each even integer from 2 to 10 (inclusive)
l1 = [x**2 for x in range(2,11) if x%2 == 0]
l1

[4, 16, 36, 64, 100]
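When the filter only selects every other number, a step argument to `range` gives the same result without the `if` clause. A quick comparison:

```python
# Filter version, as in the cell above
evens_squared_filter = [x**2 for x in range(2, 11) if x % 2 == 0]

# Step version: range(2, 11, 2) yields 2, 4, 6, 8, 10 directly
evens_squared_step = [x**2 for x in range(2, 11, 2)]

print(evens_squared_step)                          # [4, 16, 36, 64, 100]
print(evens_squared_filter == evens_squared_step)  # True
```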

In [44]:
# Goal: 'flatten' the following matrix into a single list

M = [[1,2,3],[4,5,6],[7,8,9]]
M_flat = [item for row in M for item in row]
M_flat

[1, 2, 3, 4, 5, 6, 7, 8, 9]
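The two `for` clauses in the flattening comprehension are read left to right, in the same order as nested loops: the first clause is the outer loop. A sketch of the loop version:

```python
M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
M_flat = [item for row in M for item in row]

M_flat_loop = []
for row in M:          # first 'for' clause: outer loop
    for item in row:   # second 'for' clause: inner loop
        M_flat_loop.append(item)

print(M_flat_loop)            # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(M_flat == M_flat_loop)  # True
```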

# 2. Dictionary = Hash Map = Hash Table

### A way to quickly look up a value given a key

#### Why is looking up an item in a dictionary fast?
What does d['key'] do internally?

1. 'key' is passed through a hash function h, giving h('key').
2. h('key') is mapped to a slot (memory location) in the table.
3. The value for 'key' is stored in that slot, so it can be read directly.

So a dictionary lookup costs O(1) on average. (In the worst case, when many keys collide in the same slot, a lookup can degrade to O(n).)
##### What matters is computing the hash and jumping to its memory location, not the number of items in the dictionary.
##### Important note: if your problem is about looking something up by key, a dictionary is a good choice; if you need to keep things sorted, it is not. (Dictionaries remember insertion order since Python 3.7, but they are not sorted.)
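To make steps 1 to 3 concrete, here is a toy hash table. It is a simplified illustration using separate chaining (each slot holds a small list of key-value pairs); CPython's real `dict` uses a different scheme (open addressing) and resizes itself as it grows, which this sketch does not do, so its O(1) behavior only holds while the buckets stay short. The class name `ToyHashMap` is hypothetical.

```python
class ToyHashMap:
    """A simplified hash table using separate chaining (illustration only)."""

    def __init__(self, num_buckets=8):
        # Each bucket holds a list of (key, value) pairs that hashed to it
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key):
        # Steps 1 and 2: hash the key, then map the hash to a bucket index
        return hash(key) % len(self.buckets)

    def __setitem__(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))

    def __getitem__(self, key):
        # Step 3: go straight to the key's bucket instead of scanning every item
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)


m = ToyHashMap()
m['red'] = [1, 2, 3, 4]
m['blue'] = 4
print(m['blue'])  # 4
```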

### Implications
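Because a key's hash determines where its value is stored, the keys of a dictionary (hash map) must be immutable. If a key could change after insertion, its hash would change too, and the stored value could no longer be found.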

Immutable data types: int, float, bool, str, tuple (as long as all of its elements are immutable too)

Mutable data types (cannot be used as keys): list, set, dictionary
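A quick way to check which values can serve as keys is to try building a one-item dictionary and catch the `TypeError` Python raises for unhashable keys. The helper `is_valid_key` below is a hypothetical name used only for this illustration:

```python
def is_valid_key(candidate):
    """Return True if `candidate` can be used as a dictionary key."""
    try:
        {candidate: None}  # raises TypeError if candidate is unhashable
        return True
    except TypeError:
        return False


print(is_valid_key('red'))          # True  (str is immutable)
print(is_valid_key((1, 2)))         # True  (tuple of immutables)
print(is_valid_key([1, 2]))         # False (list is mutable)
print(is_valid_key((1, [2, 3])))    # False (tuple containing a list)
print(is_valid_key({'a': 1}))       # False (dict is mutable)
```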

#### Basic dictionary methods

In [49]:
d = {'red':[1,2,3,4], 'blue':4, 'black':(1,2,5)}

In [50]:
d.keys()

dict_keys(['red', 'blue', 'black'])

In [51]:
d.values()

dict_values([[1, 2, 3, 4], 4, (1, 2, 5)])

In [52]:
d.items()

dict_items([('red', [1, 2, 3, 4]), ('blue', 4), ('black', (1, 2, 5))])
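Beyond `keys()`, `values()` and `items()`, the everyday operations are indexing, `get` with a default, membership tests, and iterating over pairs. A short sketch on the same dictionary (the key `'green'` is added just for illustration):

```python
d = {'red': [1, 2, 3, 4], 'blue': 4, 'black': (1, 2, 5)}

print(d['blue'])          # 4     (raises KeyError if the key is missing)
print(d.get('green'))     # None  (no error for a missing key)
print(d.get('green', 0))  # 0     (custom default)
print('red' in d)         # True  (membership tests check keys)

d['green'] = 7                # add a new key-value pair
for key, value in d.items():  # iterate over (key, value) pairs
    print(key, value)
```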

# 3. Vectorization

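Vectorization replaces an explicit Python loop with a single operation on a whole array. Instead of the interpreter handling the elements one by one, the operation runs in compiled code over the entire array (and can use the CPU's SIMD instructions to process several elements per instruction), which greatly speeds up the computation.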

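We use the NumPy library in Python for vectorization because of three main properties:
1. Fast, parallel-friendly computation: operations run in compiled code, with SIMD instructions for element-wise work and optimized (often multithreaded) BLAS routines for matrix products.
2. All elements of an array share the same data type (dtype).
3. Locality: the elements of an array are stored in one contiguous area of memory.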
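Properties 2 and 3 can be inspected directly on an array. In the sketch below, a Python list holding an int and a float is converted into a single float64 array whose elements sit back to back in memory:

```python
import numpy as np

mixed_list = [1, 2.5, 3]    # a Python list can hold objects of different types
arr = np.array(mixed_list)  # NumPy converts them to one common dtype

print(arr.dtype)                  # float64
print(arr.itemsize)               # 8 (bytes per float64 element)
print(arr.flags['C_CONTIGUOUS'])  # True: one contiguous block of memory
```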


#### Example 1. Adding two vectors

In [2]:
# Importing Numpy library
import numpy as np

In [3]:
# Create two random vectors
n = 100000
v1 = np.random.rand(n)
v2 = np.random.rand(n)

##### Adding the elements of two vectors with a for loop (no vectorization) and timing it

In [14]:
%%timeit
sum_vectors = []
for i in range(n):
    sum_vectors.append(v1[i]+v2[i])

40.1 ms ± 3.52 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


##### Adding the two vectors with vectorization

This only works for NumPy arrays. For regular Python lists, + concatenates the lists instead of adding them element by element.

In [21]:
%%timeit
sum_vectorized = v1 + v2

79.7 µs ± 8.72 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
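The sketch below contrasts + on plain lists and on NumPy arrays, and checks that the loop and the vectorized sum give the same values (with a smaller, arbitrary n so it runs quickly):

```python
import numpy as np

a_list = [1, 2, 3]
b_list = [10, 20, 30]
print(a_list + b_list)  # [1, 2, 3, 10, 20, 30]  (concatenation)

# Element-wise sum of plain lists needs a loop or a comprehension
print([a + b for a, b in zip(a_list, b_list)])  # [11, 22, 33]

a_arr = np.array(a_list)
b_arr = np.array(b_list)
print(a_arr + b_arr)  # [11 22 33]  (element-wise, vectorized)

# The loop and the vectorized version agree on random vectors
n = 1000
v1 = np.random.rand(n)
v2 = np.random.rand(n)
loop_sum = [v1[i] + v2[i] for i in range(n)]
print(np.allclose(loop_sum, v1 + v2))  # True
```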


#### Example 2. Matrix multiplication

In [24]:
n = 100
A = np.random.rand(n,n)
B = np.random.rand(n,n)
C = np.zeros((n,n))

##### Multiplying two matrices with nested for loops (no vectorization)

In [29]:
%%timeit
for i in range(n):
    row = A[i]              # i-th row of A
    for j in range(n):
        column = B[:,j]     # j-th column of B
        sum1 = 0
        for k in range(n):  # dot product of the row and the column
            sum1 += row[k] * column[k]
        C[i,j] = sum1

358 ms ± 24.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


##### Multiplying two matrices with vectorization

In [33]:
%%timeit
C1 = np.dot(A,B)

42.8 µs ± 2.91 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
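Speed only matters if the results match. The sketch below repeats the triple loop on smaller (arbitrarily chosen) 50 × 50 matrices and compares it with `np.dot` and with the `@` operator, which computes the same matrix product for 2-D arrays. `np.allclose` is used because the two approaches may add terms in a different order, giving tiny floating-point differences:

```python
import numpy as np

n = 50
A = np.random.rand(n, n)
B = np.random.rand(n, n)

# Triple-loop product, same algorithm as the loop cell above
C_loop = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        total = 0.0
        for k in range(n):
            total += A[i, k] * B[k, j]
        C_loop[i, j] = total

C_dot = np.dot(A, B)  # vectorized product
C_at = A @ B          # operator form of the same product

print(np.allclose(C_loop, C_dot))  # True
print(np.allclose(C_dot, C_at))    # True
```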
