# Contents:
## 1. Iterators
## 2. List comprehensions and generators
## 3. Bringing it all together

## 1. Iterators

#### Enumerators
Here, you have 2 options:
1. Use `enumerate()` along with `list()` to get a list of `(index, value)` tuples
2. Use `enumerate()` to create an enumerate object and then pass it to a `for` loop

In [5]:
# Create a list of strings: mutants
mutants = ['charles xavier', 
            'bobby drake', 
            'kurt wagner', 
            'max eisenhardt', 
            'kitty pryde']

# Create a list of tuples: mutant_list
mutant_list = list(enumerate(mutants))

# Print the list of tuples
print(mutant_list) # first result
print("   ")

# Unpack and print the tuple pairs
for index1, value1 in enumerate(mutants):
    print(index1, value1) # second result

print("   ")

# Change the start index (no list() needed: the for loop consumes the enumerate object directly)
for index2, value2 in enumerate(mutants, start=1):
    print(index2, value2) # third result


[(0, 'charles xavier'), (1, 'bobby drake'), (2, 'kurt wagner'), (3, 'max eisenhardt'), (4, 'kitty pryde')]
   
0 charles xavier
1 bobby drake
2 kurt wagner
3 max eisenhardt
4 kitty pryde
   
1 charles xavier
2 bobby drake
3 kurt wagner
4 max eisenhardt
5 kitty pryde
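Both options work because `enumerate()` returns a lazy `enumerate` object: it hands out `(index, value)` pairs one at a time and can be consumed only once. The sketch below redefines the `mutants` list so it runs on its own and uses `next()` to make this visible.

```python
# Same list of strings as above
mutants = ['charles xavier', 'bobby drake', 'kurt wagner',
           'max eisenhardt', 'kitty pryde']

# enumerate() returns an iterator, not a list
e = enumerate(mutants, start=1)
print(type(e).__name__)   # enumerate

# next() pulls one (index, value) pair at a time
first = next(e)
print(first)              # (1, 'charles xavier')

# list() consumes whatever is left in the iterator
rest = list(e)
print(len(rest))          # 4

# Once exhausted, the iterator produces nothing more
print(list(e))            # []
```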


#### Zip Object
Here, you have 2 options:
1. Use `zip()` along with `list()` to get a list of tuples
2. Use `zip()` to create a zip object and then pass it to a `for` loop

In [10]:
aliases = ['prof x', 'iceman', 'nightcrawler', 'magneto', 'shadowcat']

powers = ['telepathy',
 'thermokinesis',
 'teleportation',
 'magnetokinesis',
 'intangibility']

# Create a list of tuples: mutant_data
mutant_data = list(zip(mutants,aliases,powers))

# Print the list of tuples
print(mutant_data)
print("  ")

# Create a zip object using the three lists: mutant_zip
mutant_zip = zip(mutants,aliases,powers)

# Print the zip object
print(mutant_zip)
print("  ")

# Unpack the zip object and print the tuple values
for value1, value2, value3 in mutant_zip:
    print(value1, value2, value3)

[('charles xavier', 'prof x', 'telepathy'), ('bobby drake', 'iceman', 'thermokinesis'), ('kurt wagner', 'nightcrawler', 'teleportation'), ('max eisenhardt', 'magneto', 'magnetokinesis'), ('kitty pryde', 'shadowcat', 'intangibility')]
  
<zip object at 0x103092488>
  
charles xavier prof x telepathy
bobby drake iceman thermokinesis
kurt wagner nightcrawler teleportation
max eisenhardt magneto magnetokinesis
kitty pryde shadowcat intangibility
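Like `enumerate()`, `zip()` returns a one-shot iterator, which is why the unzip example below has to re-create `z1` after printing it. It also stops as soon as its shortest input runs out. A small self-contained sketch of both behaviours:

```python
mutants = ['charles xavier', 'bobby drake', 'kurt wagner',
           'max eisenhardt', 'kitty pryde']
powers = ['telepathy', 'thermokinesis', 'teleportation',
          'magnetokinesis', 'intangibility']

z = zip(mutants, powers)
first_pass = list(z)    # consumes the zip object
second_pass = list(z)   # already exhausted, so this is empty
print(len(first_pass), len(second_pass))   # 5 0

# zip() stops when the shortest input is exhausted
short = list(zip(mutants, powers[:3]))
print(len(short))   # 3
```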


#### Using * to 'Unzip' a zipped object

In [14]:
# Create a zip object from mutants and powers: z1
z1 = zip(mutants, powers)

# Print the tuples in z1 by unpacking with *
print(*z1)
print("  ")

# Re-create a zip object from mutants and powers: z1
# (the first z1 was exhausted by print(*z1))
z1 = zip(mutants, powers)

# 'Unzip' the tuples in z1 by unpacking with * and zip(): result1, result2
result1, result2 = zip(*z1)

# Check if unpacked tuples are equivalent to original lists
print(result1 == mutants)
print(result2 == powers)
# Both checks print False: zip() returns tuples, and a tuple never compares
# equal to a list, even when the elements are identical.
# Printing result1 and result2 separately shows that the contents do match;
# list(result1) == mutants and list(result2) == powers both evaluate to True.

('charles xavier', 'telepathy') ('bobby drake', 'thermokinesis') ('kurt wagner', 'teleportation') ('max eisenhardt', 'magnetokinesis') ('kitty pryde', 'intangibility')
  
False
False


In [15]:
print(result1)

('charles xavier', 'bobby drake', 'kurt wagner', 'max eisenhardt', 'kitty pryde')


In [17]:
print(result2)

('telepathy', 'thermokinesis', 'teleportation', 'magnetokinesis', 'intangibility')
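To confirm that unzipping really recovers the original data, compare like with like: either convert the unzipped tuples to lists or convert the original lists to tuples. A self-contained sketch:

```python
mutants = ['charles xavier', 'bobby drake', 'kurt wagner',
           'max eisenhardt', 'kitty pryde']
powers = ['telepathy', 'thermokinesis', 'teleportation',
          'magnetokinesis', 'intangibility']

# zip(*pairs) regroups the pairs into one tuple per position
result1, result2 = zip(*zip(mutants, powers))

print(result1 == mutants)        # False: tuple vs. list
print(list(result1) == mutants)  # True
print(result2 == tuple(powers))  # True
```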


### Working with Chunks

In [21]:
import pandas as pd

# Creating a string object with file location: tweetstr
tweetstr = 'https://assets.datacamp.com/production/repositories/464/datasets/82e9842c09ad135584521e293091c2327251121d/tweets.csv'

# Load the whole file into a DataFrame: tweets
tweets = pd.read_csv(tweetstr)


In [20]:
worldbank = pd.read_csv("https://assets.datacamp.com/production/repositories/464/datasets/2175fef4b3691db03449bbc7ddffb740319c1131/world_ind_pop_data.csv")

In [22]:
# Initialize an empty dictionary: counts_dict
counts_dict = {}

# Iterate over the file chunk by chunk (10 rows at a time)
for chunk in pd.read_csv(tweetstr, chunksize=10):

    # Iterate over the 'lang' column in each chunk
    for entry in chunk['lang']:
        if entry in counts_dict:
            counts_dict[entry] += 1
        else:
            counts_dict[entry] = 1

# Print the populated dictionary
print(counts_dict)


{'en': 97, 'et': 1, 'und': 2}


In [23]:
## Creating a generic function for the above
# Define count_entries()
def count_entries(csv_file, c_size, colname):
    """Return a dictionary with counts of
    occurrences as value for each key."""
    
    # Initialize an empty dictionary: counts_dict
    counts_dict = {}

    # Iterate over the file chunk by chunk (c_size rows at a time)
    for chunk in pd.read_csv(csv_file, chunksize=c_size):

        # Iterate over the requested column in each chunk
        for entry in chunk[colname]:
            if entry in counts_dict:
                counts_dict[entry] += 1
            else:
                counts_dict[entry] = 1

    # Return counts_dict
    return counts_dict

# Call count_entries(): result_counts
result_counts = count_entries(tweetstr, 10, 'lang')

# Print result_counts
print(result_counts)


{'en': 97, 'et': 1, 'und': 2}
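For comparison, here is a dependency-free sketch of the same chunk-by-chunk counting using only the standard library: `itertools.islice` pulls fixed-size chunks of rows from a `csv.DictReader`, and `collections.Counter` replaces the manual if/else tally. This is a swapped-in technique rather than `pd.read_csv(chunksize=...)`; the helper name `count_entries_stdlib` and the inline sample CSV are hypothetical, chosen only so the example runs without the tweets file.

```python
import csv
import io
from collections import Counter
from itertools import islice

def count_entries_stdlib(file_obj, chunk_size, colname):
    """Hypothetical helper: count values of `colname`, reading `chunk_size` rows at a time."""
    reader = csv.DictReader(file_obj)
    counts = Counter()
    while True:
        # islice takes at most chunk_size rows without reading the rest of the file
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            break  # no rows left
        counts.update(row[colname] for row in chunk)
    return dict(counts)

# Small in-memory stand-in for a CSV file (hypothetical data)
sample = io.StringIO("id,lang\n1,en\n2,en\n3,et\n4,und\n5,en\n6,und\n7,en\n")
result = count_entries_stdlib(sample, 3, 'lang')
print(result)   # {'en': 4, 'et': 1, 'und': 2}
```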


## 2. List comprehensions and generators

### List comprehensions
List comprehensions:
- Collapse `for` loops that build lists into a single line
- Have three components:
    1. Iterable
    2. Iterator variable (represents the members of the iterable)
    3. Output expression


In [25]:
# Basic syntax for list comprehensions
# Create list comprehension: squares
squares = [i**2 for i in range(0,10)]
squares

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
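To map the three components onto the `squares` example, the sketch below builds the same list with an explicit `for` loop; the comments mark which part of the loop corresponds to each component.

```python
# List comprehension: [output expression for iterator variable in iterable]
squares = [i**2 for i in range(0, 10)]

# The same list built with an explicit for loop
squares_loop = []
for i in range(0, 10):         # iterable: range(0, 10); iterator variable: i
    squares_loop.append(i**2)  # output expression: i**2

print(squares == squares_loop)  # True
```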

#### Nested list comprehensions

To create a list of lists, supply a list comprehension as the output expression of the outer list comprehension:

`[[output expression] for iterator variable in iterable]`

Note that here, the output expression is itself a list comprehension.

In [26]:
# Create a 5 x 5 matrix using a list of lists: matrix
matrix = [[col for col in range(5)] for row in range(5)]

# Print the matrix
for row in matrix:
    print(row)

[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
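The nested comprehension above is equivalent to two nested loops: the outer loop produces the rows and the inner loop fills each row. The inner expression may also refer to the outer iterator variable; the multiplication table below is an illustrative example, not part of the original exercise.

```python
# Nested comprehension: the inner comprehension is the output expression
matrix = [[col for col in range(5)] for row in range(5)]

# Equivalent nested loops
matrix_loop = []
for row in range(5):
    inner = []
    for col in range(5):
        inner.append(col)
    matrix_loop.append(inner)

print(matrix == matrix_loop)   # True

# The inner expression can use the outer variable, e.g. a 3 x 3 multiplication table
table = [[row * col for col in range(1, 4)] for row in range(1, 4)]
print(table)   # [[1, 2, 3], [2, 4, 6], [3, 6, 9]]
```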


#### Conditionals in list comprehensions
An interesting mechanism in list comprehensions is that you can create lists containing only the values that meet a certain condition. One way of doing this is to apply a conditional to the iterator variable.

To test the iterator variable, add an `if` clause (the optional predicate expression) after the `for` clause of the comprehension:

```python
[output_expression for iterator_variable in iterable if predicate_expression]
```

##### Just if statement

In [28]:
# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Create list comprehension: new_fellowship
# we only want members with 7 characters or more in their names
new_fellowship = [member for member in fellowship if len(member) >= 7 ]

# Print the new list
print(new_fellowship)

['samwise', 'aragorn', 'legolas', 'boromir']


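##### With an if-else expression

To keep every member but choose the output based on a condition, put an `if`-`else` conditional expression in the output expression, before the `for` clause:

`[output_expression if condition else other_expression for iterator_variable in iterable]`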

In [32]:
# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Create list comprehension: new_fellowship
# the goal is to return the member name if the member has 7 chars or more
# OR return a blank string otherwise
new_fellowship = [member if len(member) >= 7 else '' for member in fellowship]

# Print the new list
print(new_fellowship)

['', 'samwise', '', 'aragorn', 'legolas', 'boromir', '']
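The two placements of the condition behave differently: a trailing `if` filters, so the result can be shorter than the input, while an `if`-`else` in the output expression maps every element, so the length is preserved. The two can also be combined; the `startswith('a')` rule below is only an illustration.

```python
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Trailing `if`: a filter, so short names are dropped entirely
filtered = [m for m in fellowship if len(m) >= 7]

# `if ... else` in the output expression: every member maps to some value
mapped = [m if len(m) >= 7 else '' for m in fellowship]

print(len(filtered), len(mapped))   # 4 7

# Combined: filter first, then transform what remains
upper_long = [m.upper() if m.startswith('a') else m for m in fellowship if len(m) >= 7]
print(upper_long)   # ['samwise', 'ARAGORN', 'legolas', 'boromir']
```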


### Dict Comprehensions

In [33]:
# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Create dict comprehension: new_fellowship
new_fellowship = {member : len(member) for member in fellowship}

# Print the new dictionary
print(new_fellowship)

{'frodo': 5, 'samwise': 7, 'merry': 5, 'aragorn': 7, 'legolas': 7, 'boromir': 7, 'gimli': 5}
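Dict comprehensions accept the same trailing `if` filter as list comprehensions, and both the key and the value can be arbitrary expressions. Note that duplicate keys silently overwrite earlier entries, as the second (illustrative) example shows.

```python
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Keep only members with 7 or more characters
long_names = {member: len(member) for member in fellowship if len(member) >= 7}
print(long_names)   # {'samwise': 7, 'aragorn': 7, 'legolas': 7, 'boromir': 7}

# Using the length as the key: later members overwrite earlier ones with the same length
by_length = {len(member): member for member in fellowship}
print(by_length)    # {5: 'gimli', 7: 'boromir'}
```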


### Generators

A list comprehension returns a list. A generator expression returns a generator object, which can be iterated over.

Generator expressions look like list comprehensions (with parentheses instead of square brackets), but they do not construct a list object.

Instead of creating a list and keeping the whole sequence in memory, a generator produces the next element on demand.

When a normal function is called, it terminates as soon as it reaches a `return` statement. A function that uses `yield` instead saves its state at each `yield` and resumes from that point the next time a value is requested from it (for example with `next()` or by a `for` loop).

A generator expression lets us create a generator without writing a function that uses the `yield` keyword.

```python
# List of strings
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# List comprehension
fellow1 = [member for member in fellowship if len(member) >= 7]

# Generator expression
fellow2 = (member for member in fellowship if len(member) >= 7)
```
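The sketch below checks the difference between `fellow1` and `fellow2`: both describe the same elements, but only the list stores them. `sys.getsizeof` is used as a rough indicator; exact byte counts vary between Python versions, so only the comparison is meaningful.

```python
import sys

fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

fellow1 = [member for member in fellowship if len(member) >= 7]
fellow2 = (member for member in fellowship if len(member) >= 7)

print(type(fellow1).__name__, type(fellow2).__name__)   # list generator

# Consuming the generator yields the same elements as the list
consumed = list(fellow2)
print(consumed == fellow1)   # True

# A generator keeps its state, not its results, so its size does not grow with the input
big_list = [n for n in range(1_000_000)]
big_gen = (n for n in range(1_000_000))
print(sys.getsizeof(big_gen) < sys.getsizeof(big_list))   # True
```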

In [34]:
# Create generator object: result
result = (num for num in range(31))

# Print the first 5 values
print(next(result))
print(next(result))
print(next(result))
print(next(result))
print(next(result))

# Print the rest of the values
for value in result:
    print(value)

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
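After the `for` loop, `result` is exhausted. Calling `next()` on an exhausted generator raises `StopIteration` (which a `for` loop handles silently), unless a default value is passed as the second argument. A short sketch with a smaller range:

```python
result = (num for num in range(3))

first_three = [next(result), next(result), next(result)]
print(first_three)   # [0, 1, 2]

# The generator is now exhausted, so next() raises StopIteration...
exhausted = False
try:
    next(result)
except StopIteration:
    exhausted = True
print(exhausted)     # True

# ...unless a default value is supplied
fallback = next(result, None)
print(fallback)      # None
```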


#### Generator functions

Besides generator expressions, there are generator functions as well.
Generator functions are functions that, like generator expressions, yield a series of values instead of returning a single value.
A generator function is defined like a regular function, but whenever it produces a value it uses the keyword `yield` instead of `return`.

In [35]:
# Create a list of strings
lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey']

# Define generator function get_lengths
def get_lengths(input_list):
    """Generator function that yields the
    length of the strings in input_list."""

    # Yield the length of a string
    for person in input_list:
        yield len(person)

# Print the values generated by get_lengths()
for value in get_lengths(lannister):
    print(value)

6
5
5
6
7
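To see that a generator function keeps its local state between values, the hypothetical `running_total` generator below accumulates a sum across `yield`s; the input numbers happen to be the name lengths printed above. Each `next()` call resumes right after the previous `yield`, with `total` still holding its last value.

```python
def running_total(numbers):
    """Hypothetical example: yield the cumulative sum of `numbers` one step at a time."""
    total = 0
    for n in numbers:
        total += n   # `total` survives between next() calls
        yield total

gen = running_total([6, 5, 5, 6, 7])
first = next(gen)
second = next(gen)   # execution resumed after the yield, with total still 6
rest = list(gen)     # the remaining values
print(first, second, rest)   # 6 11 [16, 22, 29]
```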


## 3. Bringing it all together

Below, we create a list comprehension that extracts the clock time from the `created_at` column.
The slice `entry[11:19]` (characters 12 to 19, counting from 1) holds the time.
We also add a condition: the slice `entry[17:19]`, which holds the seconds, must equal `'19'`.

In [38]:
# Extract the created_at column from df: tweet_time
tweet_time = tweets['created_at']

# Extract the clock time: tweet_clock_time
tweet_clock_time = [entry[11:19] for entry in tweet_time if entry[17:19] == '19']

# Print the extracted times
print(tweet_clock_time)


['23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19']
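To check the slice positions, the sketch below applies the same comprehension to a few hypothetical timestamps, assuming the `created_at` values follow Twitter's `'Tue Mar 29 23:40:17 +0000 2016'` layout (consistent with the output above, but not verified against the whole dataset). It also shows the generator-expression version, which produces the times lazily.

```python
# Hypothetical timestamps in the assumed Twitter created_at format
tweet_time = [
    'Tue Mar 29 23:40:17 +0000 2016',
    'Tue Mar 29 23:40:19 +0000 2016',
    'Tue Mar 29 23:41:19 +0000 2016',
]

sample = tweet_time[0]
print(sample[11:19])   # 23:40:17  (indices 11-18: the clock time)
print(sample[17:19])   # 17        (indices 17-18: the seconds)

# Same comprehension as above: keep the clock time only when the seconds are '19'
tweet_clock_time = [entry[11:19] for entry in tweet_time if entry[17:19] == '19']
print(tweet_clock_time)   # ['23:40:19', '23:41:19']

# Generator-expression version: values are produced only when requested
lazy_times = (entry[11:19] for entry in tweet_time if entry[17:19] == '19')
first_lazy = next(lazy_times)
print(first_lazy)   # 23:40:19
```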
