# Iterable objects and representatives
This chapter focuses on iterable objects. We'll refresh the definition of iterable objects and explain how to identify one. Next, we'll cover list comprehensions, a distinctive Python feature for defining lists concisely. Then, we'll recall how to combine several iterable objects into one. Finally, we'll cover how to create custom iterable objects using generators.

# 1. What are iterable objects?
## 1.1 enumerate()
Given a string, your task is to define the function `retrieve_character_indices()`, which creates a dictionary `character_indices` where each key is a unique character from the string and the corresponding value is a list of the indices (positions) at which this character appears in the string.

For example, passing the string `'ukulele'` to the `retrieve_character_indices()` function should result in the following output: `{'e': [4, 6], 'k': [1], 'l': [3, 5], 'u': [0, 2]}`.

For this task, you are not allowed to use any string methods!

### Instructions:
* Define the `for` loop that iterates over the characters in the string and their indices.
* Update the dictionary if the key already exists.
* Update the dictionary if the key is absent.

In [1]:
def retrieve_character_indices(string):
    character_indices = dict()
    # Define the 'for' loop
    for index, character in enumerate(string):
        # Update the dictionary if the key already exists
        if character in character_indices:
            character_indices[character].append(index)
        # Update the dictionary if the key is absent
        else:
            character_indices[character] = [index]
            
    return character_indices
  
print(retrieve_character_indices('enumerate an Iterable'))

{'e': [0, 4, 8, 15, 20], 'n': [1, 11], 'u': [2], 'm': [3], 'r': [5, 16], 'a': [6, 10, 17], 't': [7, 14], ' ': [9, 12], 'I': [13], 'b': [18], 'l': [19]}


__A little trick__: you can pass a second argument, `start`, to `enumerate()`. In this case, counting starts from that value instead of 0.
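A minimal sketch of the `start` parameter; the list `letters` is just an illustration:

```python
letters = ['a', 'b', 'c']

# Default: counting starts at 0
default_pairs = list(enumerate(letters))

# start=1: counting starts at 1 (it still advances by 1 per element)
shifted_pairs = list(enumerate(letters, start=1))

print(default_pairs)  # [(0, 'a'), (1, 'b'), (2, 'c')]
print(shifted_pairs)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```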

## 1.2 Iterators
Let's check your knowledge on Iterators!

As we discussed, every Iterable, such as a `list`, `set`, or `dict`, has an associated Iterator that you can obtain with `iter()`. You are given the dictionary `pets`, whose keys are Harry Potter characters and whose values are the creature companions they had. Your task is to answer a set of questions about the Iterator created from the `pets` dictionary. Use the console to help you answer them!

#### Question:
What would be the second element of the Iterator created from the `pets` dictionary?
Possible Answers:
1. 'Harry'
2. 'Hermione'
3. 'Hedwig the owl'
4. 'Crookshanks the cat'

In [2]:
pets = {'Harry': 'Hedwig the owl', 'Hermione': 'Crookshanks the cat', 'Ron': 'Scabbers the rat'}

In [7]:
for i, pet in enumerate(pets, start=1):
    print(f'{i}: {pet}')

1: Harry
2: Hermione
3: Ron
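As the output shows, iterating over a dictionary yields its keys in insertion order (guaranteed since Python 3.7), so the second element is `'Hermione'`. A small sketch of the other views you can iterate over, reusing the same `pets` dictionary:

```python
pets = {'Harry': 'Hedwig the owl', 'Hermione': 'Crookshanks the cat', 'Ron': 'Scabbers the rat'}

keys = list(pets)             # iterating a dict yields its keys
values = list(pets.values())  # the values view
pairs = list(pets.items())    # (key, value) tuples

print(keys)      # ['Harry', 'Hermione', 'Ron']
print(values)    # ['Hedwig the owl', 'Crookshanks the cat', 'Scabbers the rat']
print(pairs[0])  # ('Harry', 'Hedwig the owl')
```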


#### Question
Assuming that you retrieved the Iterator from the `pets` dictionary and called the `next()` function on it twice, what will be the output when you convert the Iterator to a list?
Possible Answers:
1. `['Ron']`
2. `[]`
3. `StopIteration` error is raised
4. `['Hermione', 'Ron']`
5. `['Harry', 'Hermione', 'Ron']`

In [9]:
iterator = iter(pets)
next(iterator)
next(iterator)
list(iterator)

['Ron']

#### Question
Assuming that you retrieved the Iterator from the `pets` dictionary and converted it to a list, what will be the output if you call the `next()` function on it?
Possible Answers:
1. `'Ron'`
2. `'Hermione'`
3. `'Harry'`
4. `StopIteration` error is raised

In [10]:
iterator = iter(pets)
list(iterator)
next(iterator)

StopIteration

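The answer is 4: `StopIteration` is raised because the Iterator does not contain any more elements to go through after converting it to a list. Note that calling `next()` on the resulting list instead raises `TypeError: 'list' object is not an iterator`, since a list is an Iterable but not an Iterator.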
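To see why, it helps to spell out roughly what a `for` loop does with an Iterable: it calls `iter()` once, then calls `next()` repeatedly until `StopIteration` is raised. A simplified sketch (the function name `manual_for()` is hypothetical; the real loop is implemented inside the interpreter, not like this):

```python
def manual_for(iterable):
    """Collect the elements of an iterable in the order a for loop visits them."""
    collected = []
    iterator = iter(iterable)       # ask the Iterable for its Iterator
    while True:
        try:
            item = next(iterator)   # fetch the next element
        except StopIteration:       # raised once the Iterator is exhausted
            break
        collected.append(item)
    return collected

pets = {'Harry': 'Hedwig the owl', 'Hermione': 'Crookshanks the cat', 'Ron': 'Scabbers the rat'}
print(manual_for(pets))  # ['Harry', 'Hermione', 'Ron']

# An exhausted Iterator stays exhausted; next() with a default avoids the exception
iterator = iter(pets)
list(iterator)
print(next(iterator, 'nothing left'))  # prints: nothing left
```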

## 1.3 Traversing a DataFrame
Let's iterate through a DataFrame! You are given the `heroes` DataFrame you're already familiar with. This time, it contains only categorical data and no missing values. You have to create the following dictionary from this dataset:

* Each key is a column name.
* Each value is another dictionary:
    * Each key is a unique category from the column.
    * Each value is the number of heroes falling into this category.

Tip: a `Series` object is also an Iterable. It iterates over the values it stores when you use it in a `for` loop or pass it to the `list()`, `tuple()`, or `set()` initializers.

### Instructions:
* Traverse through the columns in the `heroes` DataFrame.
* Retrieve the values stored in `series` in a list form.
* Traverse through unique categories in `values`.
* Count the appearance of `category` in `values`.

In [19]:
import pandas as pd

# Loading the data
heroes = pd.read_csv('_datasets/heroes_information.csv', index_col = 1, na_values = ("-", -99))
heroes = heroes[['Gender', 'Eye color', 'Race', 'Hair color', 'Publisher', 'Skin color', 'Alignment']]
heroes = heroes.dropna()
heroes.head()

| name | Gender | Eye color | Race | Hair color | Publisher | Skin color | Alignment |
|---|---|---|---|---|---|---|---|
| Abe Sapien | Male | blue | Icthyo Sapien | No Hair | Dark Horse Comics | blue | good |
| Abin Sur | Male | blue | Ungaran | No Hair | DC Comics | red | good |
| Apocalypse | Male | red | Mutant | Black | Marvel Comics | grey | bad |
| Archangel | Male | blue | Mutant | Blond | Marvel Comics | blue | good |
| Ardina | Female | white | Alien | Orange | Marvel Comics | gold | good |


In [20]:
column_counts = dict()

# Traverse through the columns in the heroes DataFrame
# (DataFrame.items() replaces iteritems(), which was removed in pandas 2.0)
for column_name, series in heroes.items():
    # Retrieve the values stored in series in a list form
    values = list(series)
    category_counts = dict()  
    # Traverse through unique categories in values
    for category in set(values):
        # Count the appearance of category in values
        category_counts[category] = values.count(category)
    
    column_counts[column_name] = category_counts
    
print(column_counts)

{'Gender': {'Female': 13, 'Male': 46}, 'Eye color': {'red': 16, 'brown': 1, 'purple': 1, 'green': 10, 'white': 8, 'yellow (without irises)': 1, 'yellow': 5, 'grey': 1, 'gold': 1, 'black': 4, 'blue': 11}, 'Race': {'Ungaran': 1, 'Mutant': 11, 'Demon': 2, 'God / Eternal': 3, 'Human': 8, 'New God': 2, 'Martian': 1, 'Strontian': 1, 'Human / Radiation': 3, 'Frost Giant': 1, 'Talokite': 1, 'Kakarantharaian': 1, "Yoda's species": 1, 'Metahuman': 1, 'Human-Kree': 1, 'Zen-Whoberian': 1, 'Human / Cosmic': 2, 'Icthyo Sapien': 1, 'Eternal': 1, 'Bizarro': 1, 'Alien': 4, 'Bolovaxian': 1, 'Luphomoid': 1, 'Czarnian': 1, 'Tamaranean': 1, 'Human / Altered': 1, 'Inhuman': 1, 'Korugaran': 1, 'Android': 3, 'Neyaphem': 1}, 'Hair color': {'Black': 14, 'Green': 3, 'Silver': 1, 'White': 4, 'Magenta': 1, 'Blue': 2, 'Purple': 1, 'Auburn': 1, 'Red': 2, 'Red / Orange': 1, 'Blond': 2, 'No Hair': 25, 'Orange': 1, 'Brown': 1}, 'Publisher': {'DC Comics': 22, 'IDW Publishing': 2, 'George Lucas': 2, 'Dark Horse Comics': 
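The inner loop calls `values.count()` once per category, so each column is scanned many times. A sketch of a single-pass alternative with `collections.Counter`, shown here on plain lists built from the five rows of `heroes.head()` (an assumption made so the example runs without the CSV file). Since a `Series` is also an Iterable, passing a column directly to `Counter()` should work the same way:

```python
from collections import Counter

# Three columns from the first five rows of heroes, as plain lists (not a DataFrame)
columns = {
    'Gender': ['Male', 'Male', 'Male', 'Male', 'Female'],
    'Eye color': ['blue', 'blue', 'red', 'blue', 'white'],
    'Alignment': ['good', 'good', 'bad', 'good', 'good'],
}

# Counter tallies every element of an iterable in one pass
column_counts = {name: dict(Counter(values)) for name, values in columns.items()}
print(column_counts)
```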

# 2. What is a list comprehension?
## 2.1 Basic list comprehensions
For this task, you will have to create a bag-of-words representation of the spam email stored in the `spam` variable (you can explore its content using the shell). Recall that a bag-of-words is simply a count of how many times each unique word appears in a given text. This representation can then be used for text classification, e.g. for spam detection (given enough training examples).

We created a small auxiliary function `create_word_list()` to help you split a string into words, e.g. applying it to `'To infinity... and beyond!'` will return `['To', 'infinity', 'and', 'beyond']`.

In [29]:
import re

# A supplementary function that creates a word list from a string
def create_word_list(string):
  
    # Finding all the words
    pattern = re.compile(r'\w+')
    words = re.findall(pattern, string)
        
    return words

spam = "Dear User,\n\nOur Administration Team needs to inform you that you are reaching the storage limit of your Mailbox account.\nYou have to verify your account within the next 24 hours.\nOtherwise, it will not be possible to use the service.\nPlease, click on the link below to verify your account and continue using our service.\n\nYour Administration Team."

print(spam)

Dear User,

Our Administration Team needs to inform you that you are reaching the storage limit of your Mailbox account.
You have to verify your account within the next 24 hours.
Otherwise, it will not be possible to use the service.
Please, click on the link below to verify your account and continue using our service.

Your Administration Team.


### Instructions:
* Convert the text to lower case and create a word list.
* Create a set that will store only unique words from the list.
* Using a list comprehension, create a dictionary that counts how many times each word appears in the `words` list.
* Print the words that appear in `word_counter` more than once.

In [31]:
# Convert the text to lower case and create a word list
words = create_word_list(spam.lower())

# Create a set storing only unique words
word_set = set(words)

# Create a dictionary that counts each word in the list
tuples = [(word, words.count(word)) for word in word_set]
word_counter = dict(tuples)

# Printing words that appear more than once
for (key, value) in word_counter.items():
    if value > 1:
        print("{}: {}".format(key, value))

team: 2
administration: 2
our: 2
service: 2
to: 4
the: 4
verify: 2
your: 4
account: 3
you: 3
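Because `word_set` is a set, the order of the printed words above may differ between runs. The list of tuples passed to `dict()` can also be written directly as a dictionary comprehension, and the final filtering step as a second one. A minimal sketch on a short made-up text (the `re.findall(r'\w+', ...)` call mirrors `create_word_list()`), sorting the result so the output is stable:

```python
import re

text = "The cat and the hat and the bat"  # hypothetical sample text
words = re.findall(r'\w+', text.lower())

# Dict comprehension: one key per unique word, value = number of occurrences
word_counter = {word: words.count(word) for word in set(words)}

# Keep only the words that appear more than once
repeated = {word: count for word, count in word_counter.items() if count > 1}
print(sorted(repeated.items()))  # [('and', 2), ('the', 3)]
```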


## 2.2 Prime number sequence
A prime number is a natural number greater than 1 that is divisible only by 1 and itself (e.g. 3, 7, 11). In particular, 1 is not a prime number.

Your task is, given a list of candidate numbers `cands`, to filter only prime numbers in a new list `primes`.

But first, you need to create a function `is_prime()` that returns `True` if the input number $n$ is prime and `False` otherwise. A number $n \ge 2$ is prime if it is not divisible by any integer from 2 to $\lfloor\sqrt{n}\rfloor$. Checking further is unnecessary: if $n = a \cdot b$ with $a \le b$, then $a \le \sqrt{n}$, so every nontrivial divisor of $n$ comes paired with one that is at most $\sqrt{n}$.

Tip: you might need to use the `%` operator that calculates a remainder from a division (e.g. `8 % 3` is `2`).
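To see why checking up to $\sqrt{n}$ is enough, you can list the divisor pairs $(a, b)$ with $a \cdot b = n$ and $a \le b$: the smaller member of every pair never exceeds $\sqrt{n}$. A small sketch (the helper `divisor_pairs()` is hypothetical, not part of the exercise; `math.isqrt()` requires Python 3.8+):

```python
import math

def divisor_pairs(n):
    """Return all pairs (a, b) with a * b == n and a <= b."""
    pairs = []
    # Only a up to isqrt(n) is needed: the partner b = n // a is then >= a
    for a in range(1, math.isqrt(n) + 1):
        if n % a == 0:
            pairs.append((a, n // a))
    return pairs

print(divisor_pairs(36))  # [(1, 36), (2, 18), (3, 12), (4, 9), (6, 6)]
print(divisor_pairs(49))  # [(1, 49), (7, 7)]: 7 <= sqrt(49) exposes 49 as composite
print(divisor_pairs(13))  # [(1, 13)]: no divisor between 2 and 3, so 13 is prime
```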

### Instructions 1/2:
* Define the initial check: numbers lower than 2 are not prime.
* Define the loop checking whether the number `n` is not prime.

In [34]:
import math

def is_prime(n):
    # Define the initial check
    if n < 2:
        return False
    # Define the loop checking if a number is not prime
    # (math.isqrt, Python 3.8+, avoids floating-point rounding of math.sqrt for large n)
    for i in range(2, math.isqrt(n) + 1):
        if n % i == 0:
            return False
    return True

In [35]:
cands = [1, 5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49]

# Filter prime numbers into the new list
primes = [num for num in cands if is_prime(num)]

print("primes = " + str(primes))

primes = [5, 13, 17, 29, 37, 41]


## 2.3 Coprime number sequence
Two numbers $a$ and $b$ are coprime if their Greatest Common Divisor (GCD) is 1. The GCD is the largest positive integer that divides both $a$ and $b$. For example, the numbers 7 and 9 are coprime because their GCD is 1.

Given two lists `list1` and `list2`, your task is to create a new list `coprimes` that contains all coprime pairs `(i, j)` with `i` taken from `list1` and `j` taken from `list2`.

But first, you need to write a function for the GCD using the following algorithm:

1. check if $b=0$
    * if true, return $a$ as the GCD between $a$ and $b$
    * if false, go to step 2
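2. simultaneously substitute $a \leftarrow b$ and $b \leftarrow a \bmod b$, where both right-hand sides use the values of $a$ and $b$ from before the substitution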
3. go back to step 1
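Tracing the algorithm by hand on $a = 48$, $b = 18$ gives $(48, 18) \to (18, 12) \to (12, 6) \to (6, 0)$, so the GCD is 6. The sketch below records the same states in code (the name `gcd_trace()` is hypothetical) and cross-checks the result against the standard library's `math.gcd()`:

```python
import math

def gcd_trace(a, b):
    """Run the GCD algorithm above and record every (a, b) state."""
    states = [(a, b)]
    while b != 0:            # step 1: stop once b == 0
        a, b = b, a % b      # step 2: a <- b and b <- a % b, both from the old values
        states.append((a, b))
    return a, states

result, states = gcd_trace(48, 18)
print(states)                      # [(48, 18), (18, 12), (12, 6), (6, 0)]
print(result)                      # 6
print(result == math.gcd(48, 18))  # True
```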

### Instructions 1/2:
* Define the while loop as described above.
* Complete the return statement.

In [36]:
def gcd(a, b):
    # Define the while loop as described
    while b!=0:
        temp_a = a
        a = b
        b = temp_a % a 
    # Complete the return statement
    return a

In [37]:
list1 = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70]
list2 = [7, 14, 21, 28, 35, 42, 49, 56, 63, 70, 77, 84, 91, 98]

# Create a list of tuples defining pairs of coprime numbers
coprimes = [(i, j) for i in list1 for j in list2 if gcd(i, j)==1]
print(coprimes)

[(5, 7), (5, 14), (5, 21), (5, 28), (5, 42), (5, 49), (5, 56), (5, 63), (5, 77), (5, 84), (5, 91), (5, 98), (10, 7), (10, 21), (10, 49), (10, 63), (10, 77), (10, 91), (15, 7), (15, 14), (15, 28), (15, 49), (15, 56), (15, 77), (15, 91), (15, 98), (20, 7), (20, 21), (20, 49), (20, 63), (20, 77), (20, 91), (25, 7), (25, 14), (25, 21), (25, 28), (25, 42), (25, 49), (25, 56), (25, 63), (25, 77), (25, 84), (25, 91), (25, 98), (30, 7), (30, 49), (30, 77), (30, 91), (40, 7), (40, 21), (40, 49), (40, 63), (40, 77), (40, 91), (45, 7), (45, 14), (45, 28), (45, 49), (45, 56), (45, 77), (45, 91), (45, 98), (50, 7), (50, 21), (50, 49), (50, 63), (50, 77), (50, 91), (55, 7), (55, 14), (55, 21), (55, 28), (55, 42), (55, 49), (55, 56), (55, 63), (55, 84), (55, 91), (55, 98), (60, 7), (60, 49), (60, 77), (60, 91), (65, 7), (65, 14), (65, 21), (65, 28), (65, 42), (65, 49), (65, 56), (65, 63), (65, 77), (65, 84), (65, 98)]
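The two `for` clauses in the comprehension behave like nested loops: the first clause is the outer loop and the second the inner one, which is why all pairs starting with 5 come first. A sketch of this equivalence on short made-up lists, using the standard library's `math.gcd()` in place of the hand-written `gcd()` so the block runs on its own:

```python
import math

list1 = [2, 3, 4]  # hypothetical small inputs
list2 = [3, 5]

# Comprehension: 'for' clauses read left to right, like nested loops
pairs_comp = [(i, j) for i in list1 for j in list2 if math.gcd(i, j) == 1]

# Equivalent explicit nested loops
pairs_loop = []
for i in list1:          # outer loop = first 'for' clause
    for j in list2:      # inner loop = second 'for' clause
        if math.gcd(i, j) == 1:
            pairs_loop.append((i, j))

print(pairs_comp)                # [(2, 3), (2, 5), (3, 5), (4, 3), (4, 5)]
print(pairs_comp == pairs_loop)  # True
```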


Writing an algorithm to find the greatest common divisor is also one of the most popular coding interview questions. Now you know how to proceed! By the way, to impress interviewers, you can replace the three assignments inside the `while` loop (the ones using `temp_a`) with a single line, `a, b = b, a % b`:

In [38]:
def gcd(a, b):
    # Define the while loop as described
    while b!=0:
        a, b = b, a % b
    # Complete the return statement
    return a
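The one-liner works because Python evaluates the whole right-hand side `(b, a % b)` before assigning anything. Writing the two assignments sequentially would not be equivalent, since `b` would then be computed from the already-updated `a`. A quick check on $a = 48$, $b = 18$:

```python
# Tuple assignment: the right-hand side uses the old a and b
a, b = 48, 18
a, b = b, a % b
print(a, b)  # 18 12

# Sequential assignment: the second line sees the new value of c
c, d = 48, 18
c = d
d = c % d    # this is 18 % 18, not 48 % 18
print(c, d)  # 18 0
```

This is why the longer version needs the temporary variable `temp_a`.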