[source:] https://www.analyticsvidhya.com/blog/2016/01/python-tutorial-list-comprehension-examples/

## For vs List Comprehension

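A for-loop that filters values and produces an output has this general shape:

```
for (set of values to iterate):
  if (conditional filtering):
    output_expression()
```

The equivalent list comprehension packs the same three parts into a single expression:

```
[ output_expression() for (set of values to iterate) if (conditional filtering) ]
```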

### Example: { x: x is a natural number less than or equal to 100, x is a perfect square }


In [3]:
for i in range(1,101):     #the iterator
   if int(i**0.5)==i**0.5: #conditional filtering
     print(i)               #output-expression

1
4
9
16
25
36
49
64
81
100


In [4]:
[i for i in range(1,101) if int(i**0.5)==i**0.5]

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
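
The test `int(i**0.5)==i**0.5` relies on floating-point square roots, which is fine for this small range but can lose precision for very large integers. A minimal sketch of an integer-only check, assuming Python 3.8+ for `math.isqrt` (the helper name `is_perfect_square` is hypothetical and not part of the original notebook):

```python
import math


def is_perfect_square(n):
    # math.isqrt returns the exact integer floor of the square root
    root = math.isqrt(n)
    return root * root == n


# Same list comprehension as above, with the integer-only filter
squares = [i for i in range(1, 101) if is_perfect_square(i)]
print(squares)
# [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
```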

### Removing vowels from a sentence

In [6]:
def using_for(sentence):
    vowels = 'aeiou'
    filtered_list = []
    for ch in sentence:          # visit every character
        if ch not in vowels:     # keep only non-vowels
            filtered_list.append(ch)
    return ''.join(filtered_list)


def using_lc(sentence):
    vowels = 'aeiou'
    # same filter expressed as a list comprehension
    return ''.join([ch for ch in sentence if ch not in vowels])


sentence = 'hello this is a list comprehension exercise'
print("for-loop: " + using_for(sentence))
print("LC: " + using_lc(sentence))

for-loop: hll ths s  lst cmprhnsn xrcs
LC: hll ths s  lst cmprhnsn xrcs
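
Both functions above only match lowercase vowels, so a capital `A` or `I` would be left in place. A small sketch of a case-insensitive variant, assuming that is the desired behavior (the name `remove_vowels` is hypothetical):

```python
def remove_vowels(sentence):
    vowels = set('aeiou')
    # lower() makes the membership test case-insensitive;
    # the original character is kept when it is not a vowel
    return ''.join(ch for ch in sentence if ch.lower() not in vowels)


print(remove_vowels('Hello This Is A List'))
# Hll Ths s  Lst
```

Here a generator expression is passed straight to `join`, which avoids naming an intermediate list; the filtering logic is otherwise the same as in `using_lc`.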


### Comparing run time

In [10]:
def for_sqr(arr):
    res = []
    for i in arr:
        res.append(i**2)
    return res
%timeit for_sqr(range(1,11))

3.5 µs ± 51.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [11]:
def map_sqr(arr):
    return map(lambda x: x**2, arr)  # the lambda must take x as its parameter
%timeit map_sqr(range(1,11))

412 ns ± 10.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [12]:
def lc_sqr(arr):
    return [i**2 for i in arr]
%timeit lc_sqr(range(1,11))

3.38 µs ± 134 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
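
In Python 3, `map` returns a lazy iterator, so timing a function that only creates the `map` object does not include the cost of computing the squares. The sketch below is one way to make the three variants do the same amount of work, by wrapping `map` in `list(...)` and timing with the standard `timeit` module instead of the `%timeit` magic. The repetition count is an arbitrary choice, and results will differ by machine and Python version.

```python
import timeit


def for_sqr(arr):
    res = []
    for i in arr:
        res.append(i**2)
    return res


def map_sqr(arr):
    # list(...) consumes the lazy map object, so the squares are really computed
    return list(map(lambda x: x**2, arr))


def lc_sqr(arr):
    return [i**2 for i in arr]


data = range(1, 11)
runs = 10_000  # arbitrary repetition count for this sketch

# All three strategies should produce the same list
assert for_sqr(data) == map_sqr(data) == lc_sqr(data)

for fn in (for_sqr, map_sqr, lc_sqr):
    # timeit.timeit returns the total time in seconds for `runs` calls
    seconds = timeit.timeit(lambda: fn(data), number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e9:.0f} ns per call")
```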


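#### Takeaways

A plain for-loop should be preferred when we simply need to iterate without building a result. LC is faster when simple expressions are involved, and LC and `map` perform nearly the same for complex functions. Keep in mind that in Python 3 `map` returns a lazy iterator, so the `map` cells in this section time only the creation of that iterator, not the function calls themselves.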

In [13]:
def empty_for(arr):
    for i in arr:
        pass
%timeit empty_for(range(1,11))

399 ns ± 17.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [14]:
def empty_map(arr):
    map(lambda x: None,arr)
%timeit empty_map(range(1,11))

461 ns ± 17.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [15]:
def empty_lc(arr):
    [None for i in arr]
%timeit empty_lc(range(1,11))

The slowest run took 6.77 times longer than the fastest. This could mean that an intermediate result is being cached.
2.21 µs ± 2.02 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)


## -- Simple expression: i*2 --

In [16]:
def x2_for(arr):
    for i in arr:
        i*2
%timeit x2_for(range(1,11))

672 ns ± 65.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [17]:
def x2_map(arr):
    map(lambda x: x*2,arr)
%timeit x2_map(range(1,11))

482 ns ± 27.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [18]:
def x2_lc(arr):
    [i*2 for i in arr]
%timeit x2_lc(range(1,11))

The slowest run took 7.35 times longer than the fastest. This could mean that an intermediate result is being cached.
2.52 µs ± 1.78 µs per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


## -- ML example --

### Load the dataset - skills.csv - [https://www.analyticsvidhya.com/wp-content/uploads/2016/01/skills.csv]

In [20]:
import pandas as pd
data = pd.read_csv("skills.csv")
data.head(10)

|   | personID | skills |
| --- | --- | --- |
| 0 | 1 | cricket;tabletennis;football |
| 1 | 2 | tabletennis;badminton |
| 2 | 3 | tabletennis |
| 3 | 4 | cricket;tabletennis;volleyball |
| 4 | 5 | football;volleyball |
| 5 | 6 | tabletennis;football |
| 6 | 7 | cricket;volleyball;badminton |
| 7 | 8 | football;cricket |
| 8 | 9 | badminton;tabletennis;volleyball |
| 9 | 10 | cricket;volleyball |


### To make this data usable in a predictive model, create a new column for each sport and mark it 1 or 0 depending on whether the person plays it or not

In [21]:
data['skills_list'] = data['skills'].apply(lambda x: x.split(';'))
data.head(10)

|   | personID | skills | skills_list |
| --- | --- | --- | --- |
| 0 | 1 | cricket;tabletennis;football | [cricket, tabletennis, football] |
| 1 | 2 | tabletennis;badminton | [tabletennis, badminton] |
| 2 | 3 | tabletennis | [tabletennis] |
| 3 | 4 | cricket;tabletennis;volleyball | [cricket, tabletennis, volleyball] |
| 4 | 5 | football;volleyball | [football, volleyball] |
| 5 | 6 | tabletennis;football | [tabletennis, football] |
| 6 | 7 | cricket;volleyball;badminton | [cricket, volleyball, badminton] |
| 7 | 8 | football;cricket | [football, cricket] |
| 8 | 9 | badminton;tabletennis;volleyball | [badminton, tabletennis, volleyball] |
| 9 | 10 | cricket;volleyball | [cricket, volleyball] |


### We need a unique list of games to determine how many columns are required

In [23]:
skills_unq = set()
skills_unq.update(sport for l in data['skills_list'] for sport in l)
print(skills_unq)

{'badminton', 'tabletennis', 'cricket', 'football', 'volleyball'}
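
A set has no guaranteed order, and with string hash randomization the printed order can change from one Python session to the next. If a reproducible column order matters, one option is to sort the unique values. The sketch below uses a small hand-written sample in place of `data['skills_list']` (the variable `skills_lists` is an assumption for illustration):

```python
# Hand-written sample standing in for data['skills_list']
skills_lists = [
    ['cricket', 'tabletennis', 'football'],
    ['tabletennis', 'badminton'],
    ['football', 'volleyball'],
]

# Set comprehension removes duplicates; sorted() fixes the order
skills_unq = sorted({sport for row in skills_lists for sport in row})
print(skills_unq)
# ['badminton', 'cricket', 'football', 'tabletennis', 'volleyball']
```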


### Generating a matrix using LC containing 5 columns with 0-1 tags corresponding to each sport

In [26]:
skills_unq = list(skills_unq)
sport_matrix = [ [1 if skill in row else 0 for skill in skills_unq] for row in data['skills_list']]
sport_matrix

[[0, 1, 1, 1, 0],
 [1, 1, 0, 0, 0],
 [0, 1, 0, 0, 0],
 [0, 1, 1, 0, 1],
 [0, 0, 0, 1, 1],
 [0, 1, 0, 1, 0],
 [1, 0, 1, 0, 1],
 [0, 0, 1, 1, 0],
 [1, 1, 0, 0, 1],
 [0, 0, 1, 0, 1],
 [1, 0, 0, 1, 0],
 [0, 0, 1, 1, 1],
 [0, 0, 0, 1, 1],
 [1, 0, 0, 0, 0],
 [0, 1, 1, 1, 0]]
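
To see what the nested comprehension does, it can help to unroll it into the equivalent nested for-loops. This is only an illustrative sketch: it hard-codes the column order printed earlier and the first three rows of the dataset rather than reading `skills.csv`.

```python
# Column order as printed for skills_unq above (assumed fixed for this sketch)
skills_unq = ['badminton', 'tabletennis', 'cricket', 'football', 'volleyball']

# First three rows of skills_list, copied from the table above
skills_lists = [
    ['cricket', 'tabletennis', 'football'],
    ['tabletennis', 'badminton'],
    ['tabletennis'],
]

sport_matrix = []
for row in skills_lists:            # outer loop: one output row per person
    flags = []
    for skill in skills_unq:        # inner loop: one 0/1 flag per sport
        flags.append(1 if skill in row else 0)
    sport_matrix.append(flags)

# The nested list comprehension builds exactly the same matrix
assert sport_matrix == [[1 if skill in row else 0 for skill in skills_unq]
                        for row in skills_lists]
print(sport_matrix)
# [[0, 1, 1, 1, 0], [1, 1, 0, 0, 0], [0, 1, 0, 0, 0]]
```

The outer `for ... in` of the comprehension corresponds to the outer loop, and the inner bracketed comprehension corresponds to the inner loop.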

### Convert this into a pandas dataframe

In [27]:
data = pd.concat([data, pd.DataFrame(sport_matrix, columns = skills_unq)], axis = 1)
data

|   | personID | skills | skills_list | badminton | tabletennis | cricket | football | volleyball |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | cricket;tabletennis;football | [cricket, tabletennis, football] | 0 | 1 | 1 | 1 | 0 |
| 1 | 2 | tabletennis;badminton | [tabletennis, badminton] | 1 | 1 | 0 | 0 | 0 |
| 2 | 3 | tabletennis | [tabletennis] | 0 | 1 | 0 | 0 | 0 |
| 3 | 4 | cricket;tabletennis;volleyball | [cricket, tabletennis, volleyball] | 0 | 1 | 1 | 0 | 1 |
| 4 | 5 | football;volleyball | [football, volleyball] | 0 | 0 | 0 | 1 | 1 |
| 5 | 6 | tabletennis;football | [tabletennis, football] | 0 | 1 | 0 | 1 | 0 |
| 6 | 7 | cricket;volleyball;badminton | [cricket, volleyball, badminton] | 1 | 0 | 1 | 0 | 1 |
| 7 | 8 | football;cricket | [football, cricket] | 0 | 0 | 1 | 1 | 0 |
| 8 | 9 | badminton;tabletennis;volleyball | [badminton, tabletennis, volleyball] | 1 | 1 | 0 | 0 | 1 |
| 9 | 10 | cricket;volleyball | [cricket, volleyball] | 0 | 0 | 1 | 0 | 1 |


### -- Example 2: powers of a number --

In [29]:
data2 = pd.DataFrame([1,2,3,4,5], columns=['number'])
data2

|   | number |
| --- | --- |
| 0 | 1 |
| 1 | 2 |
| 2 | 3 |
| 3 | 4 |
| 4 | 5 |


In [31]:
deg=6
power_matrix = [ [i**p for p in range(2,deg+1) ] for i in data2['number'] ]  
power_matrix

[[1, 1, 1, 1, 1],
 [4, 8, 16, 32, 64],
 [9, 27, 81, 243, 729],
 [16, 64, 256, 1024, 4096],
 [25, 125, 625, 3125, 15625]]

In [32]:
cols = ['power_%d'%i for i in range(2,deg+1)]
data2 = pd.concat([data2, pd.DataFrame(power_matrix,columns=cols)],axis=1)
data2

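|   | number | power_2 | power_3 | power_4 | power_5 | power_6 |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | 1 | 1 | 1 | 1 | 1 |
| 1 | 2 | 4 | 8 | 16 | 32 | 64 |
| 2 | 3 | 9 | 27 | 81 | 243 | 729 |
| 3 | 4 | 16 | 64 | 256 | 1024 | 4096 |
| 4 | 5 | 25 | 125 | 625 | 3125 | 15625 |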
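
The column names and the values can also be built together, so each value is stored under its own label and does not depend on the two lists staying in the same order. A minimal sketch in plain Python using a dict comprehension nested inside a list comprehension (the names `numbers` and `rows` are assumptions for illustration, standing in for `data2['number']`):

```python
deg = 6
numbers = [1, 2, 3, 4, 5]      # stands in for data2['number']
exponents = range(2, deg + 1)

# One dict per input number: the original value plus one entry per power
rows = [
    {"number": n, **{f"power_{p}": n**p for p in exponents}}
    for n in numbers
]
print(rows[1])
# {'number': 2, 'power_2': 4, 'power_3': 8, 'power_4': 16, 'power_5': 32, 'power_6': 64}
```

A list of dicts like `rows` can be passed to `pd.DataFrame`, which should yield the same columns as the `pd.concat` approach above.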
