# 1. List Comprehension

General form of a list comprehension:

[expression for item in iterable if condition]
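
As a rough sketch of what this form means, the comprehension below and the explicit loop build the same list. The data and the names `source`, `big_values`, and `big_values_loop` are made up for illustration.

```python
# Illustrative data; any iterable can serve as the source.
source = [3, 8, 1, 10, 7]

# Comprehension: keep items greater than 5 and double them.
big_values = [x * 2 for x in source if x > 5]

# Equivalent explicit loop.
big_values_loop = []
for x in source:
    if x > 5:
        big_values_loop.append(x * 2)

print(big_values)       # [16, 20, 14]
print(big_values_loop)  # [16, 20, 14]
```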

In [1]:
students = [
    ['Harry',3.5,3,True],
    ['Ron', 2.7,2,True],
    ['Hermes',3.9,1,False],
    ['Dan',2.9,4,True],
    ['Chad',3.1,4,False]
]

In [3]:
#Goal: get the name of every student
names = [student[0] for student in students]
names

['Harry', 'Ron', 'Hermes', 'Dan', 'Chad']

In [6]:
# Goal: get the length of the name of each student
len_names = [len(student[0]) for student in students]
len_names

[5, 3, 6, 3, 4]

In [20]:
#Goal: get the name of every student who is a 4th year
fourth_year_students = [student[0] for student in students if student[2] == 4]
fourth_year_students

['Dan', 'Chad']

In [23]:
#Goal: get every 4th year or in-state student
fourth_year_or_instate = [student for student in students if student[2] == 4 or student[3]]
fourth_year_or_instate

[['Harry', 3.5, 3, True],
 ['Ron', 2.7, 2, True],
 ['Dan', 2.9, 4, True],
 ['Chad', 3.1, 4, False]]

### Nested list comprehension

In [28]:
#Goal: get the data type of each piece in 'students'
students1 = [[type(item) for item in student] for student in students]
students1

[[str, float, int, bool],
 [str, float, int, bool],
 [str, float, int, bool],
 [str, float, int, bool],
 [str, float, int, bool]]

In [46]:
%%timeit
import numpy as np
l1 = np.array([1,2,3,4,5,6])
l2 = l1 ** 2
l3 = l2.sum()
l4 = sum(l2)

15.7 µs ± 1.43 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [40]:
#Goal: get the square of each even integer between 2 and 10
l1 = [x**2 for x in range(2,11) if x%2 == 0]
l1

[4, 16, 36, 64, 100]

In [44]:
#Goal: 'Flatten' the following matrix

M = [[1,2,3],[4,5,6],[7,8,9]]
M_flat = [item for row in M for item in row]
M_flat

[1, 2, 3, 4, 5, 6, 7, 8, 9]
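
The order of the two `for` clauses in the flattening comprehension can be confusing. One way to read it, sketched below, is that the clauses appear left to right in the same order as the equivalent nested loops. `M_flat_loop` is a name introduced only for this comparison.

```python
M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Nested-loop version: outer loop first, inner loop second.
M_flat_loop = []
for row in M:          # corresponds to the first "for" clause
    for item in row:   # corresponds to the second "for" clause
        M_flat_loop.append(item)

# Comprehension version with the clauses in the same order.
M_flat = [item for row in M for item in row]

print(M_flat_loop == M_flat)  # True
```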

# 2. Dictionary = Hash Map = Hash Table

### A way to quickly look up a value given a key

#### Why is a dictionary fast for looking up items?
What does d['key'] do?
Internally:
(1) 'key' => [hash function h] => h('key')
(2) h('key') maps to a memory location
(3) the value for that key is stored at that memory location

So the computational cost of a dictionary lookup is O(1) on average.
##### The lookup cost depends on computing the memory location, not on the number of items in the dictionary.
##### Important note: if your problem is about looking up something by key, a dictionary is a good choice, but if you need to sort things, a dictionary is not a good choice.
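
The three steps above can be sketched with a toy hash table. This is a simplified illustration only: the class `ToyHashTable`, its `n_buckets` parameter, and its methods are hypothetical names, and Python's real `dict` is implemented quite differently in detail.

```python
class ToyHashTable:
    """Simplified illustration of hashing; not how dict is really built."""

    def __init__(self, n_buckets=8):
        # Each bucket holds the (key, value) pairs that map to that slot.
        self.buckets = [[] for _ in range(n_buckets)]

    def _index(self, key):
        # Steps (1) and (2): hash the key, then map the hash to a slot.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key):
        # Step (3): only the one bucket for this key is searched.
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        raise KeyError(key)


table = ToyHashTable()
table.put('red', 1)
table.put('blue', 4)
table.put('red', 10)       # same key, so the old value is replaced
print(table.get('red'))    # 10
print(table.get('blue'))   # 4
```

Because `get` goes straight to one bucket, the work does not grow with the total number of stored items as long as the buckets stay short.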

### Implications
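Because the memory location is computed from the key's hash, the keys of a dictionary or hash map must be hashable, which in practice means immutable: a key cannot change after it is inserted.

Immutable data types: int, float, str, tuple (a tuple is hashable only if all of its elements are)

Mutable data types: list, set, dict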
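
A short sketch of this rule in practice: a tuple works as a key, while a list is rejected with a `TypeError`. The dictionary `grades` and the flag `list_key_allowed` are names made up for this example.

```python
grades = {}

# A tuple of immutable values is hashable, so it works as a key.
grades[('Harry', 3)] = 3.5

# A list is mutable and unhashable, so Python rejects it as a key.
try:
    grades[['Ron', 2]] = 2.7
    list_key_allowed = True
except TypeError as err:
    list_key_allowed = False
    print('TypeError:', err)

print(grades)            # {('Harry', 3): 3.5}
print(list_key_allowed)  # False
```

The restriction applies to keys only; values can be of any type, including lists.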

#### Basic dictionary functions

In [49]:
d = {'red':[1,2,3,4], 'blue':4, 'black':(1,2,5)}

In [50]:
d.keys()

dict_keys(['red', 'blue', 'black'])

In [51]:
d.values()

dict_values([[1, 2, 3, 4], 4, (1, 2, 5)])

In [52]:
d.items()

dict_items([('red', [1, 2, 3, 4]), ('blue', 4), ('black', (1, 2, 5))])
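
As a small follow-up sketch, these views are most often used in loops and membership tests. The variable names `has_red`, `has_four`, and `green` are introduced here for illustration.

```python
d = {'red': [1, 2, 3, 4], 'blue': 4, 'black': (1, 2, 5)}

# Loop over key/value pairs.
for key, value in d.items():
    print(key, value)

# Membership tests check keys, not values.
has_red = 'red' in d    # True
has_four = 4 in d       # False: 4 is a value, not a key

# get() returns a default instead of raising KeyError for a missing key.
green = d.get('green', 'not found')

print(has_red, has_four, green)  # True False not found
```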

# 3. Vectorization

Vectorization applies an operation to a whole array at once instead of looping over the elements one by one in Python. The loop runs in optimized compiled code, which can use the parallel (SIMD) capabilities of the processor, so the computation is much faster.

We use the NumPy library in Python for vectorization because of three main properties:
1. Parallel computation.
2. All elements of an array have the same data type.
3. Locality: the array is stored in one contiguous area of memory.
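
Properties 2 and 3 can be inspected directly on an array, as in the sketch below (property 1 happens inside NumPy and is not visible from Python). The arrays `arr` and `mixed` are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# Property 2: every element shares one data type.
# (The exact integer type printed depends on the platform.)
print(arr.dtype)

# Property 3: the elements sit in one contiguous block of memory.
print(arr.flags['C_CONTIGUOUS'])  # True

# Mixing ints and floats makes NumPy convert everything to a common type.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64
```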


#### Example 1. Adding two lists

In [2]:
# Import the NumPy library
import numpy as np

In [3]:
# Create two random vectors
n = 100000
v1 = np.random.rand(n)
v2 = np.random.rand(n)

##### Adding the elements of two vectors without vectorization (using a for loop) and measuring the computation time

In [14]:
%%timeit
sum_vectors = []
for i in range(n):
    sum_vectors.append(v1[i]+v2[i])

40.1 ms ± 3.52 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


##### Adding the elements of two vectors using vectorization

This does not work for regular Python lists: for lists, + concatenates instead of adding element by element.

In [21]:
%%timeit
sum_vectorized = v1 + v2

79.7 µs ± 8.72 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
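
For comparison, here is a minimal sketch of what `+` does on plain Python lists, and how element-wise addition can be written without NumPy. The lists `a` and `b` are made-up sample data.

```python
a = [1.0, 2.0, 3.0]
b = [10.0, 20.0, 30.0]

# On plain lists, + concatenates instead of adding element by element.
concatenated = a + b
print(concatenated)  # [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]

# Element-wise addition of plain lists needs an explicit loop or comprehension.
elementwise = [x + y for x, y in zip(a, b)]
print(elementwise)   # [11.0, 22.0, 33.0]
```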


#### Example 2. Matrix Multiplication

In [24]:
n = 100
A = np.random.rand(n,n)
B = np.random.rand(n,n)
C = np.zeros((n,n))

##### Multiplying two matrices without vectorization, using for loops.

In [29]:
%%timeit
for i in range(n):
    row = A[i]
    for j in range(n):
        column = B[:,j]
        sum1 = 0
        for k in range(n):
            sum1 += row[k] * column[k]
        C[i,j] = sum1

358 ms ± 24.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


##### Multiplying two matrices using vectorization.

In [33]:
%%timeit
C1 = np.dot(A,B)

42.8 µs ± 2.91 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
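
Before trusting the speed-up, it is worth checking that both approaches give the same result. The sketch below does that on a smaller matrix (n = 20 is an arbitrary choice to keep the check quick) and also shows the `@` operator as an alternative spelling of the matrix product.

```python
import numpy as np

n = 20  # smaller than the n = 100 used above, only to keep the check quick
A = np.random.rand(n, n)
B = np.random.rand(n, n)

# Loop version: same triple-loop algorithm as in the timed cell.
C = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        total = 0.0
        for k in range(n):
            total += A[i, k] * B[k, j]
        C[i, j] = total

# Vectorized versions: np.dot and the @ operator compute the same product.
C1 = np.dot(A, B)
C2 = A @ B

# allclose compares with a tolerance, since floating-point sums
# can differ slightly depending on the order of operations.
print(np.allclose(C, C1), np.allclose(C1, C2))  # True True
```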


# Python Interview Practice for Data Science

In [6]:
# Assume the list below
nums = [1,2,3,4,5,6,7,8,9,10]

1. Use a for loop to generate the list nums

In [7]:
list1 = []
for i in range(1,11):
    list1.append(i)
print(list1)

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


2. Use a list comprehension to generate the list nums

In [8]:
list2 = [i for i in range(1,11)]
print(list2)

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


3. Calculate n*n for each n in nums (hint: a list comprehension is the better choice, but a for loop works too)

In [10]:
list3 = [n*n for n in nums]
print(list3)

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]


4. Calculate n*n for each n in nums by using map and lambda.

In [14]:
list4 = list(map(lambda x:x**2, nums))
list4

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

5. Print out 'n' for each 'n' in nums if 'n' is even. (Because there is a condition, you should use filter rather than map.)

In [20]:
# Using for loop (simplest form)
list5 = []
for i in nums:
    if i%2 == 0:
        list5.append(i)
print(list5)

[2, 4, 6, 8, 10]


In [23]:
# Using map & lambda (map keeps every element, so this returns True/False flags, not the even numbers)
list6 = list(map(lambda n:n%2==0,nums))
list6

[False, True, False, True, False, True, False, True, False, True]

In [22]:
# Using filter & lambda
list7 = list(filter(lambda n: n%2 == 0, nums))
list7

[2, 4, 6, 8, 10]

In [24]:
# Using list comprehension
list8 = [n for n in nums if n%2 == 0]
list8

[2, 4, 6, 8, 10]

##### Let's practice with strings and numbers

6. Print out a (letter, number) pair for each letter in 'abcd' and each number from 0 to 3

In [25]:
# Using for loop (simplest form!)
list9 = []
for letter in 'abcd':
    for number in range(4):
        list9.append((letter,number))
print(list9)

[('a', 0), ('a', 1), ('a', 2), ('a', 3), ('b', 0), ('b', 1), ('b', 2), ('b', 3), ('c', 0), ('c', 1), ('c', 2), ('c', 3), ('d', 0), ('d', 1), ('d', 2), ('d', 3)]


In [26]:
# Using list comprehension
list10 = [(letter,number) for letter in 'abcd' for number in range(4)]
print(list10)

[('a', 0), ('a', 1), ('a', 2), ('a', 3), ('b', 0), ('b', 1), ('b', 2), ('b', 3), ('c', 0), ('c', 1), ('c', 2), ('c', 3), ('d', 0), ('d', 1), ('d', 2), ('d', 3)]


##### Let's practice dictionaries and dictionary comprehensions

In [27]:
fruits = ['apple','orange','banana','blackberry']
colors = ['red','orange','yellow','black']

7. Use the lists 'fruits' and 'colors' to create a dictionary in the form of {'fruit':'color'} for the fruits whose name and color are not the same.

In [31]:
# Using a for loop with zip.
dict1 = {}
for fruit, color in zip(fruits,colors):
    if fruit != color:
        dict1[fruit] = color
print(dict1)

{'apple': 'red', 'banana': 'yellow', 'blackberry': 'black'}


In [32]:
# Using dictionary comprehension.
dict2 = {fruit:color for fruit,color in zip(fruits,colors) if fruit != color}
print(dict2)

{'apple': 'red', 'banana': 'yellow', 'blackberry': 'black'}


##### Let's practice sets and set comprehensions.

Sets are like lists, except that a set contains only unique values and is unordered.

8. Create a set from the list below.

In [33]:
numbers = [1,1,2,2,2,3,3,3,3,4,4,5,6,7,8,8,8,9,4,6,5]

In [35]:
# Using for loop.
set1 = set()
for num in numbers:
    set1.add(num)
print(set1)

{1, 2, 3, 4, 5, 6, 7, 8, 9}


In [36]:
# Using set comprehension.
set2 = {n for n in numbers}
print(set2)

{1, 2, 3, 4, 5, 6, 7, 8, 9}
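
A third option, sketched below, is to pass the list straight to `set()`. A set comprehension becomes the better tool when each element is also filtered or transformed. The names `set3` and `even_squares` are introduced only for this example.

```python
numbers = [1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 7, 8, 8, 8, 9, 4, 6, 5]

# Passing the list to set() removes duplicates directly.
set3 = set(numbers)
print(set3 == {1, 2, 3, 4, 5, 6, 7, 8, 9})  # True

# A set comprehension is useful when elements are filtered or transformed.
even_squares = {n * n for n in numbers if n % 2 == 0}
print(even_squares == {4, 16, 36, 64})  # True
```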


Feel free to message me directly on LinkedIn (Shahrad Shakerian) with your questions about getting a job as a data scientist, whatever your background and experience.