## Python Data Science Toolbox Part 2

## Chapter 2 - List Comprehensions

### List Comprehensions

For loops can be verbose, costing both coding time and space, and they are often slower than the alternative. With a list comprehension, it is often possible to do the same work in a single line of code. A list comprehension is written inside square brackets: first comes the expression that produces each value of the new list, referred to as the output expression, followed by a for clause that iterates over the original list.

In [2]:
#Populate a list with a loop
nums = [12, 8, 21, 3, 16]

#Add a 1 to each element in nums and create a new list
num_1 = []
for num in nums:
    num_1.append(num + 1)
print(num_1)

# A List Comprehension
new_nums = [num + 1 for num in nums]
print(new_nums)

[13, 9, 22, 4, 17]
[13, 9, 22, 4, 17]


#### For Loop and List Comprehension Syntax Comparison
Notice the similarities between the For Loop syntax and the List Comprehension syntax:
![forlooplistcompresyntax.png](attachment:forlooplistcompresyntax.png)

A list comprehension is not limited to lists; it can be written over any iterable. Here is an example of a list comprehension using a range.

In [3]:
result = [num for num in range(11)]
print(result)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
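As a minimal sketch of the same idea over other iterables, the comprehensions below walk a string and a tuple. The names `word`, `letters`, `temps_c`, and `temps_f` are illustrative and not part of the course material.

```python
# Iterate over a string: each character becomes one element
word = "python"
letters = [char.upper() for char in word]
print(letters)

# Iterate over a tuple of Celsius readings and convert each to Fahrenheit
temps_c = (0, 25, 100)
temps_f = [c * 9 / 5 + 32 for c in temps_c]
print(temps_f)
```

In both cases the result is a new list, regardless of the type of the iterable that was looped over.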


#### Nested loops (1)

List comprehensions can be used in place of nested for loops. Below is an example of a nested for loop to create a list of tuples from two ranges.

In [5]:
pairs_1 = []

for num1 in range(0,2):
    for num2 in range(6,8):
        pairs_1.append((num1, num2))

print(pairs_1)

[(0, 6), (0, 7), (1, 6), (1, 7)]


This same nested for loop can be written as a list comprehension. Within the square brackets, place the desired output expression, in this case a tuple of the numbers from each of the ranges, followed by both of the required for clauses in the same order as the nested loops. In the example below, the code fits on a single line, but at some cost to readability.

In [8]:
pairs_2 = [(num1, num2) for num1 in range(0,2) for num2 in range(6,8)]
print(pairs_2)

[(0, 6), (0, 7), (1, 6), (1, 7)]
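One possible way to recover some readability is to split the comprehension across lines, one clause per line. The sketch below also shows that the order of the for clauses matters; `pairs_3` and `pairs_swapped` are illustrative names.

```python
# Same nested comprehension, split across lines so each for clause stands out
pairs_3 = [
    (num1, num2)
    for num1 in range(0, 2)   # outer loop comes first
    for num2 in range(6, 8)   # inner loop comes second
]
print(pairs_3)

# Swapping the clauses changes the order in which the tuples are produced
pairs_swapped = [(num1, num2) for num2 in range(6, 8) for num1 in range(0, 2)]
print(pairs_swapped)
```

The clauses read left to right (or top to bottom) in the same order as the equivalent nested for loops.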


### Exercise 1

#### Writing list comprehensions
You now have all the knowledge necessary to begin writing list comprehensions! Your job in this exercise is to write a list comprehension that produces a list of the squares of the numbers ranging from 0 to 9.

__Instructions:__
* Using the range of numbers from 0 to 9 as your iterable and i as your iterator variable, write a list comprehension that produces a list of numbers consisting of the squared values of i.

In [1]:
# Create list comprehension: squares
squares = [i**2 for i in range(0,10)]
print(squares)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


#### Nested list comprehensions
Great! At this point, you have a good grasp of the basic syntax of list comprehensions. Let's push your code-writing skills a little further. In this exercise, you will be writing a list comprehension within another list comprehension, or nested list comprehensions. It sounds a little tricky, but you can do it!

Let's step aside for a while from strings. One of the ways in which lists can be used is to represent multi-dimensional objects such as matrices. Matrices can be represented as a list of lists in Python. For example, a 5 x 5 matrix with values 0 to 4 in each row can be written as:

![matrix.png](attachment:matrix.png)

Your task is to recreate this matrix by using nested list comprehensions. Recall that you can create one of the rows of the matrix with a single list comprehension. To create the list of lists, you simply have to supply the list comprehension as the output expression of the overall list comprehension:

[[output expression] for iterator variable in iterable]

Note that here, the output expression is itself a list comprehension.

In [2]:
# Create a 5 x 5 matrix using a list of lists: matrix
matrix = [[col for col in range(0,5)] for row in range(0,5)]

# Print the matrix
for row in matrix:
    print(row)

[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
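To make the nesting concrete, here is a sketch of the explicit loops that the nested comprehension stands in for. The names `matrix_loop`, `new_row`, and `matrix_comp` are illustrative.

```python
# Loop version: build each row, then append it to the matrix
matrix_loop = []
for row in range(0, 5):
    new_row = []
    for col in range(0, 5):
        new_row.append(col)
    matrix_loop.append(new_row)

# Comprehension version: the inner comprehension is the output expression
matrix_comp = [[col for col in range(0, 5)] for row in range(0, 5)]

# Both approaches build the same list of lists
print(matrix_loop == matrix_comp)
```

The inner comprehension plays the role of the inner loop, and it is evaluated once for every value of `row`.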


### Advanced Comprehensions

#### Conditionals in Comprehensions

By placing a conditional statement on the iterable, we are able to filter the output of a list comprehension. In the example below, the output is the square of each value in range(10), on the condition that the value itself is even.

In [14]:
[num ** 2 for num in range(10) if num % 2 == 0]

[0, 4, 16, 36, 64]

We can also put a conditional statement on the output expression. In this example, over range(10), we output the square of each even integer and 0 for each odd integer.

In [16]:
[num ** 2 if num % 2 == 0 else 0 for num in range(10)]

[0, 0, 4, 0, 16, 0, 36, 0, 64, 0]
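The two forms are easy to confuse, so here is a small side-by-side sketch. The names `nums`, `filtered`, and `replaced` are illustrative.

```python
nums = range(10)

# Filter: the condition after the for clause drops odd values entirely
filtered = [num ** 2 for num in nums if num % 2 == 0]

# Conditional expression: every value is kept, odd ones are mapped to 0
replaced = [num ** 2 if num % 2 == 0 else 0 for num in nums]

print(len(filtered), filtered)
print(len(replaced), replaced)
```

A condition on the iterable can shorten the result, while a condition on the output expression always produces one element per input value.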

#### Dictionary Comprehensions

We can also write dictionary comprehensions to create new dictionaries from iterables. The syntax is mostly the same, with these differences:

* Use curly braces {} instead of square brackets []
* The key and value are separated by a colon (:) in the output expression

In this example, we are creating a dictionary with the numbers 0 through 8 as the keys and the negation of each number as the corresponding value.

In [17]:
pos_neg = {num: -num for num in range(9)}
print(pos_neg)
print(type(pos_neg))

{0: 0, 1: -1, 2: -2, 3: -3, 4: -4, 5: -5, 6: -6, 7: -7, 8: -8}
<class 'dict'>
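The conditionals described earlier should carry over to dictionary comprehensions as well. The following sketch filters the keys; `even_squares` is an illustrative name.

```python
# Keep only even keys; each value is the square of its key
even_squares = {num: num ** 2 for num in range(9) if num % 2 == 0}
print(even_squares)
print(type(even_squares))
```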


### Exercise 2

#### Using conditionals in comprehensions (1)
You've been using list comprehensions to build lists of values, sometimes using operations to create these values.

An interesting mechanism in list comprehensions is that you can also create lists containing only the values that meet a certain condition. One way of doing this is by using conditionals on iterator variables. In this exercise, you will do exactly that!

Recall from the video that you can apply a conditional statement to test the iterator variable by adding an if statement in the optional predicate expression part after the for statement in the comprehension:

> [ output expression for iterator variable in iterable if predicate expression ].

You will use this recipe to write a list comprehension for this exercise. You are given a list of strings fellowship and, using a list comprehension, you will create a list that only includes the members of fellowship that have 7 characters or more.

__Instructions:__
* Use member as the iterator variable in the list comprehension. For the conditional, use len() to evaluate the iterator variable. Note that you only want strings with 7 characters or more.

In [3]:
# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Create list comprehension: new_fellowship
new_fellowship = [member for member in fellowship if len(member)>=7]

# Print the new list
print(new_fellowship)

['samwise', 'aragorn', 'legolas', 'boromir']


#### Using conditionals in comprehensions (2)
In the previous exercise, you used an if conditional statement in the predicate expression part of a list comprehension to evaluate an iterator variable. In this exercise, you will use an if-else statement on the output expression of the list.

You will work on the same list, fellowship and, using a list comprehension and an if-else conditional statement in the output expression, create a list that keeps members of fellowship with 7 or more characters and replaces others with an empty string. Use member as the iterator variable in the list comprehension.

__Instructions:__
* In the output expression, keep the string as-is if the number of characters is >= 7, else replace it with an empty string - that is, '' or "".

In [4]:
# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Create list comprehension: new_fellowship
new_fellowship = [member if len(member) >= 7  else "" for member in fellowship]

# Print the new list
print(new_fellowship)

['', 'samwise', '', 'aragorn', 'legolas', 'boromir', '']


#### Dict comprehensions
Comprehensions aren't relegated merely to the world of lists. There are many other objects you can build using comprehensions, such as dictionaries, pervasive objects in Data Science. You will create a dictionary using the comprehension syntax for this exercise. In this case, the comprehension is called a dict comprehension.

Recall that the main difference between a list comprehension and a dict comprehension is the use of curly braces {} instead of []. Additionally, members of the dictionary are created using a colon :, as in <key> : <value>.

You are given a list of strings fellowship and, using a dict comprehension, create a dictionary with the members of the list as the keys and the length of each string as the corresponding values.

__Instructions:__
* Create a dict comprehension where the key is a string in fellowship and the value is the length of the string. Remember to use the syntax <key> : <value> in the output expression part of the comprehension to create the members of the dictionary. Use member as the iterator variable.

In [5]:
# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Create dict comprehension: new_fellowship
new_fellowship = {member:len(member) for member in fellowship}

# Print the new dictionary
print(new_fellowship)

{'frodo': 5, 'samwise': 7, 'merry': 5, 'aragorn': 7, 'legolas': 7, 'boromir': 7, 'gimli': 5}


### Introduction to Generator Expressions

#### Generator Expressions

A list comprehension is written with square brackets. Writing the same syntax (output expression for value in iterable) inside parentheses instead creates a generator object.

In [23]:
(2*num for num in range(10))

<generator object <genexpr> at 0x0000025560A288C8>

#### List Comprehensions vs Generators

A generator expression is like a list comprehension, but where a list comprehension creates a list that is stored in memory, a generator expression returns a generator object that can be iterated over to produce the elements as they are required. Here we can see that looping over a generator expression produces the elements of the analogous list.

In [24]:
result = (num for num in range(6))
for num in result:
    print(num)

0
1
2
3
4
5
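A rough way to see the memory difference is to compare the sizes reported by `sys.getsizeof`. This is only a sketch: `getsizeof` measures the container object itself, the exact byte counts vary between Python versions, and `n`, `as_list`, and `as_gen` are illustrative names.

```python
import sys

n = 100_000  # illustrative size

as_list = [num for num in range(n)]   # all values are stored at once
as_gen = (num for num in range(n))    # values are produced on demand

print(sys.getsizeof(as_list))  # grows with n
print(sys.getsizeof(as_gen))   # small, and does not depend on n
```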


We can also pass a generator to the function list() to create a list. And, like any other iterator, we can pass a generator to the function next() to step through its elements one at a time. This is an example of lazy evaluation, where the evaluation of an expression is delayed until its value is needed. Lazy evaluation is very useful when working with extremely large sequences: a list comprehension would store the entire list in memory, whereas a generator produces the elements of the sequence on the fly.

In [28]:
result = (num for num in range(6))
print(list(result))


result = (num for num in range(6,11))
print(next(result))
print(next(result))
print(next(result))
print(next(result))
print(next(result))

[0, 1, 2, 3, 4, 5]
6
7
8
9
10
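One consequence of lazy evaluation worth keeping in mind is that a generator can only be consumed once. The sketch below illustrates this using standard Python behaviour; the name `fresh` is illustrative.

```python
result = (num for num in range(3))

print(list(result))   # consumes every value
print(list(result))   # the generator is now exhausted, so this is empty

fresh = (num for num in range(2))
print(next(fresh))
print(next(fresh))
try:
    next(fresh)       # no values are left
except StopIteration:
    print("generator exhausted")

# next() accepts a default to return instead of raising StopIteration
print(next(fresh, None))
```

To iterate a second time, create a new generator object from the same expression.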


#### Conditionals in Generator Expressions

Anything we can do in a list comprehension such as filtering and applying conditional statements can also be done to generator expressions.

In [30]:
even_nums = (num for num in range(10) if num % 2 == 0)
print(list(even_nums))

[0, 2, 4, 6, 8]


#### Generator Functions

Generator functions are functions that produce generator objects when called. These functions are defined using the def keyword like regular functions, but rather than returning a single value, they yield a sequence of values using the keyword yield.

In [35]:
def num_sequence(n):
    """Generate values from 0 up to, but not including, n"""
    i = 0
    while i < n:
        yield i
        i += 1

result = num_sequence(10)

for item in result:
    print(item)

0
1
2
3
4
5
6
7
8
9
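The following sketch reuses the same function to show that calling it only creates a generator object, and that the body runs step by step as values are requested. The name `gen` is illustrative.

```python
def num_sequence(n):
    """Yield the integers 0, 1, ..., n - 1 one at a time."""
    i = 0
    while i < n:
        yield i
        i += 1

gen = num_sequence(3)
print(type(gen))   # a generator object, no values computed yet
print(next(gen))   # runs the body up to the first yield
print(next(gen))   # resumes after the yield and runs to the next one
print(list(gen))   # collects whatever values remain
```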


### Exercise 3
You've seen from the videos that list comprehensions and generator expressions look very similar in their syntax, except for the use of parentheses () in generator expressions and brackets [] in list comprehensions.

> \# List of strings <br>
> fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

> \# List comprehension <br>
> fellow1 = [member for member in fellowship if len(member) >= 7]

> \# Generator expression <br>
> fellow2 = (member for member in fellowship if len(member) >= 7)


#### Write your own generator expressions
You are familiar with what generators and generator expressions are, as well as how they differ from list comprehensions. In this exercise, you will practice building generator expressions on your own.

Recall that generator expressions have basically the same syntax as list comprehensions, except that they use parentheses () instead of brackets []; this should make things feel familiar! Furthermore, if you have ever iterated over a dictionary with .items(), or used the range() function, you have already worked with objects that behave similarly: strictly speaking they are a view object and a range object rather than generators, but like generators they supply values on demand instead of building a list up front.

Now, you will start simple by creating a generator object that produces numeric values.

__Instructions:__
* Create a generator object that will produce values from 0 to 30. Assign the result to result and use num as the iterator variable in the generator expression.
* Print the first 5 values by using next() appropriately in print().
* Print the rest of the values by using a for loop to iterate over the generator object.

In [6]:
# Create generator object: result
result = (num for num in range(31))

# Print the first 5 values
print(next(result))
print(next(result))
print(next(result))
print(next(result))
print(next(result))

# Print the rest of the values
for value in result:
    print(value)

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30


#### Changing the output in generator expressions
Great! At this point, you already know how to write a basic generator expression. In this exercise, you will push this idea a little further by adding to the output expression of a generator expression. Because generator expressions and list comprehensions are so alike in syntax, this should be a familiar task for you!

You are given a list of strings lannister and, using a generator expression, create a generator object that you will iterate over to print its values.

__Instructions:__
* Write a generator expression that will generate the lengths of each string in lannister. Use person as the iterator variable. Assign the result to lengths.
* Supply the correct iterable in the for loop for printing the values in the generator object.

In [7]:
# Create a list of strings: lannister
lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey']

# Create a generator object: lengths
lengths = (len(person) for person in lannister)

# Iterate over and print the values in lengths
for value in lengths:
    print(value)

6
5
5
6
7


#### Build a generator
In previous exercises, you've dealt mainly with writing generator expressions, which use comprehension syntax. Being able to use comprehension syntax for generator expressions made your work so much easier!

Now, recall from the video that not only are there generator expressions, there are generator functions as well. Generator functions are functions that, like generator expressions, yield a series of values, instead of returning a single value. A generator function is defined just like a regular function, but whenever it generates a value, it uses the keyword yield instead of return.

In this exercise, you will create a generator function with a similar mechanism as the generator expression you defined in the previous exercise:

> lengths = (len(person) for person in lannister)

__Instructions:__
* Complete the function header for the function get_lengths() that has a single parameter, input_list.
* In the for loop in the function definition, yield the length of the strings in input_list.
* Complete the iterable part of the for loop for printing the values generated by the get_lengths() generator function. Supply the call to get_lengths(), passing in the list lannister.

In [8]:
# Create a list of strings
lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey']

# Define generator function get_lengths
def get_lengths(input_list):
    """Generator function that yields the
    length of the strings in input_list."""

    # Yield the length of a string
    for person in input_list:
        yield len(person)

# Print the values generated by get_lengths()
for value in get_lengths(lannister):
    print(value)

6
5
5
6
7
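As a quick check that the generator function and the generator expression are interchangeable here, the sketch below compares their output and passes a generator straight to `sum()`. The names `from_function`, `from_expression`, and `total` are illustrative.

```python
lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey']

def get_lengths(input_list):
    """Yield the length of each string in input_list."""
    for person in input_list:
        yield len(person)

from_function = list(get_lengths(lannister))
from_expression = list(len(person) for person in lannister)
print(from_function == from_expression)

# A generator can be handed directly to any function that consumes an iterable
total = sum(get_lengths(lannister))
print(total)
```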


### Lecture - Wrapping Up Comprehensions and Generators

#### Recap: List Comprehensions
Basic - Structured in square brackets: [output expression for iterator variable in iterable]

Advanced - Can include conditionals on the output expression or the iterable: [output expression + conditional on output for iterator variable in iterable + conditional on iterable]
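The forms covered in this chapter can be summarized in one short sketch. All of the names below are illustrative.

```python
values = range(6)

basic = [v * 2 for v in values]                      # output expression only
filtered = [v * 2 for v in values if v > 2]          # conditional on the iterable
branched = [v * 2 if v > 2 else -1 for v in values]  # conditional on the output
as_dict = {v: v * 2 for v in values}                 # dictionary comprehension
lazy = (v * 2 for v in values)                       # generator expression

print(basic)
print(filtered)
print(branched)
print(as_dict)
print(list(lazy))
```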


### Exercise 4

#### List comprehensions for time-stamped data
You will now make use of what you've learned in this chapter to solve a simple data extraction problem. You will also be introduced to a data structure, the pandas Series, in this exercise. We won't elaborate on it much here, but you should know that it is a data structure you will work with often when analyzing data in pandas DataFrames. You can think of DataFrame columns as one-dimensional arrays called Series.

In this exercise, you will be using a list comprehension to extract the time from time-stamped Twitter data. The pandas package has been imported as pd and the file 'tweets.csv' has been imported as the df DataFrame for your use.

__Instructions:__
* Extract the column 'created_at' from df and assign the result to tweet_time. Fun fact: the extracted column in tweet_time here is a Series data structure!
* Create a list comprehension that extracts the time from each row in tweet_time. Each row is a string that represents a timestamp, and you will access the 12th to 19th characters in the string to extract the time. Use entry as the iterator variable and assign the result to tweet_clock_time. Remember that Python uses 0-based indexing!

In [10]:
# Behind the scenes work to load data used in exercise
import os
os.chdir('c:\\datacamp\\data\\')

import pandas as pd
df = pd.read_csv('tweets.csv')

# Extract the created_at column from df: tweet_time
tweet_time = df['created_at']

# Extract the clock time: tweet_clock_time
tweet_clock_time = [entry[11:19] for entry in tweet_time]

# Print the extracted times
print(tweet_clock_time)

['23:40:17', '23:40:17', '23:40:17', '23:40:17', '23:40:17', '23:40:17', '23:40:18', '23:40:17', '23:40:18', '23:40:18', '23:40:18', '23:40:17', '23:40:18', '23:40:18', '23:40:17', '23:40:18', '23:40:18', '23:40:17', '23:40:18', '23:40:17', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:17', '23:40:18', '23:40:18', '23:40:17', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:19', '23:40:18', '23:40:18', '23:40:18', '23:40:19', '23:40:19', '23:40:19', '23:40:18', '23:40:19', '23:40:19', '23:40:19', '23:40:18', '23:40:19', '23:40:19', '23:40:19', '23:40:18', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23

#### Conditional list comprehensions for time-stamped data
Great, you've successfully extracted the data of interest, the time, from a pandas DataFrame! Let's tweak your work further by adding a conditional that further specifies which entries to select.

In this exercise, you will be using a list comprehension to extract the time from time-stamped Twitter data. You will add a conditional expression to the list comprehension so that you only select the times in which entry[17:19] is equal to '19'. The pandas package has been imported as pd and the file 'tweets.csv' has been imported as the df DataFrame for your use.

__Instructions:__
* Extract the column 'created_at' from df and assign the result to tweet_time.
* Create a list comprehension that extracts the time from each row in tweet_time. Each row is a string that represents a timestamp, and you will access the 12th to 19th characters in the string to extract the time. Use entry as the iterator variable and assign the result to tweet_clock_time. Additionally, add a conditional expression that checks whether entry[17:19] is equal to '19'.

In [11]:
# Extract the created_at column from df: tweet_time
tweet_time = df['created_at']

# Extract the clock time: tweet_clock_time
tweet_clock_time = [entry[11:19] for entry in tweet_time if entry[17:19] == '19']

# Print the extracted times
print(tweet_clock_time)

['23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19']
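Since 'tweets.csv' is not included here, the slicing can be tried on a few hand-written strings instead. This is a sketch only: the timestamps in `sample_times` are hypothetical and merely assume a layout in which the clock time occupies indices 11 to 18, which is consistent with the output above.

```python
# Hypothetical timestamps, assumed to follow the layout of 'created_at'
sample_times = [
    'Tue Mar 29 23:40:17 +0000 2016',
    'Tue Mar 29 23:40:18 +0000 2016',
    'Tue Mar 29 23:40:19 +0000 2016',
]

# Characters 12 through 19 (indices 11 to 18) hold HH:MM:SS
clock_times = [entry[11:19] for entry in sample_times]
print(clock_times)

# Keep only entries whose seconds field (indices 17 and 18) is '19'
at_19 = [entry[11:19] for entry in sample_times if entry[17:19] == '19']
print(at_19)
```

Because a slice end is exclusive, `entry[11:19]` stops just before index 19, which is why the 12th to 19th characters map to the indices 11 through 18.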
