
## Iterables and Iterators


### University of Virginia
### DS 1002: Programming for Data Science
---  

### PREREQUISITES
- data types
- variables
- `for` loop

### SOURCES
- Iterable objects  
http://tutorial.eyehunts.com/python/python-iterable-object-lists-tuples-dictionaries-and-sets/


- Iterators  
https://www.geeksforgeeks.org/iterators-in-python/


### OBJECTIVES
- Define iterables and iterators
- Using two methods, show how iterators can be used to return data from sets, lists, strings, tuples, dicts:
  - `for` loops    
  - `iter()` and `next()`



### CONCEPTS

- `iterable objects` or `iterables`
- iterators
- iteration
- sequence
- collection


---

## I. Defining Iterables and Iterators

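`Iterable objects` or `iterables` are objects that can return their elements one at a time  

An `iterator` is the object that does the stepping: it is created from an iterable such as a set, list, tuple, dictionary, or string, and it hands back one element each time it is asked for the next one  

`Iteration` can be implemented:
- with a `for` loop
- with the built-in functions `iter()` and `next()`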
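
The difference between the two terms can be sketched in a few lines. This is only an illustration: `colors` and `color_iter` are made-up names, not part of the lesson data. The list is the iterable, and `iter()` hands back a separate iterator that remembers its position.

```python
colors = ['red', 'green', 'blue']   # an iterable (hypothetical example data)

color_iter = iter(colors)           # ask the iterable for an iterator
first = next(color_iter)            # 'red'
second = next(color_iter)           # 'green'

# The iterator keeps track of its position; the list itself is unchanged.
remaining = list(color_iter)        # consumes whatever is left

# An iterator is used up after one pass, but the iterable can be walked again.
leftover = list(color_iter)         # nothing left in the old iterator
fresh_pass = list(iter(colors))     # a new iterator starts from the beginning

print(first, second, remaining, leftover, fresh_pass)
```
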



## II. Misc

Other useful tools for iterating

**Ranges**

If you just want to iterate a known number of times, use `range()`

In [None]:
for i in range(10) :
    print(str(i+1).zfill(2), (i+1)**2 * '|')

`zfill()` adds zeros to the front of a string until it reaches the specified length
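
A few quick cases may help show how `zfill()` behaves. The input strings here are arbitrary examples chosen for illustration.

```python
short = '7'.zfill(3)        # padded with two zeros
exact = '123'.zfill(3)      # already long enough, returned unchanged
longer = '12345'.zfill(3)   # never truncated
signed = '-5'.zfill(4)      # zeros go after the sign

print(short, exact, longer, signed)
```
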

**Formatted String Literals (i.e. f-strings)**

f-strings are string literals that have an `f` at the beginning and curly braces containing expressions that will be replaced with their values. The expressions are evaluated at runtime and then formatted using the `__format__` protocol.

Example:

In [None]:
name = 'Ted Lasso'
age = 42

f"Hello, {name}. You are {age}."
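
Because the braces hold expressions and the result goes through `__format__`, an f-string can also call methods and apply a format specification after a colon. The sketch below assumes a made-up value `win_rate` purely for illustration.

```python
name = 'Ted Lasso'
win_rate = 0.4567   # hypothetical number, not from the lesson

greeting = f"Hello, {name.upper()}!"   # any expression works inside the braces
rounded = f"{win_rate:.2f}"            # two decimal places
percent = f"{win_rate:.1%}"            # shown as a percentage
padded = f"{7:03d}"                    # zero-padded to width 3

print(greeting, rounded, percent, padded)
```
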

**Get iteration number**  

To keep track of the iteration number inside a loop, use `enumerate()`  

`enumerate()` yields an `(index, item)` pair on each iteration, with the index starting at 0 by default. When the iterable is a dict, the item is the key

In [None]:
ff_teams = {'NCSU' : 3, 'Iowa' : 1, 'South Carolina' : 1, 'Connecticut': 3}

enumerate(ff_teams)
#print(type(enumerate(ff_teams)))
#for x in enumerate(ff_teams): print (type(x))
#list(enumerate(ff_teams))



In [None]:
for school in ff_teams:
  print(f"School:  {school} \t Seed: {ff_teams[school]}")

print(50*'-')

for i, school in enumerate(ff_teams) :
   school_name = f"{str(i).zfill(2)}_{school}: \t Seed: {ff_teams[school]}"
   print(school_name)
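
If the numbering should begin at 1 rather than 0, `enumerate()` accepts a `start` argument. A small sketch using the same dictionary, where the `labels` list is only a helper introduced here to collect the results:

```python
ff_teams = {'NCSU': 3, 'Iowa': 1, 'South Carolina': 1, 'Connecticut': 3}

labels = []   # helper list for this illustration
for i, school in enumerate(ff_teams, start=1):
    # :02d zero-pads the counter, similar to str(i).zfill(2)
    labels.append(f"{i:02d}_{school}")

print(labels)
```
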

## III. Sequences and Collections

Iterables: `list`, `str`, `tuple`, `set`, `dict`

Lists, tuples, and strings are `sequences`: their elements are stored by position and come out in the same order they were put in. Dictionaries are not sequences but mappings; their values are accessed by key rather than by position.

Sets are not sequences either, since they don't keep elements in a defined order.
They are unordered `collections`.  The ordering of the items is arbitrary.

NOTE: Dictionaries used to be unordered as well. This changed in Python 3.7:  

> the insertion-order preservation nature of dict objects has been declared to be an official part of the Python language spec.\
-- [What's New in Python 3.7](https://docs.python.org/3.7/whatsnew/3.7.html)
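
A short check of these ordering rules, assuming Python 3.7 or later. The `seeds` and `letters` names are invented for this sketch.

```python
# Hypothetical bracket data, inserted in a deliberate order
seeds = {}
seeds['NCSU'] = 3
seeds['Iowa'] = 1
seeds['Connecticut'] = 3

key_order = list(seeds)   # keys come back in insertion order (Python 3.7+)

# A set drops duplicates and does not support positional indexing
letters = set(['b', 'a', 'b', 'c'])
try:
    letters[0]
    indexable = True
except TypeError:
    indexable = False

print(key_order, len(letters), indexable)
```
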

## IV. Lists

**iterating using `for`**

In [None]:
tokens = ['living room','was','quite','large']

for tok in tokens:
    print(tok)

**iterating using `iter()` and `next()`**

`iter()` - returns an iterator for an iterable.  
`next()` - gets the next item from the iterator. Each call returns one value.

In [None]:
tokens = ['living room','was','quite','large']

myit = iter(tokens)
print(next(myit))
print(next(myit))
print(next(myit))
print(next(myit))

Calling `next()` when the iterator has reached the end of the list raises a `StopIteration` exception:

In [None]:
print(next(myit))
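
There are two common ways to deal with an exhausted iterator, sketched below under the assumption that we simply want to avoid a crash: catch `StopIteration`, or pass a default value as the second argument to `next()`. The `exhausted` and `fallback` names are introduced only for this example.

```python
tokens = ['living room', 'was', 'quite', 'large']
myit = iter(tokens)

# Consume all four items
for _ in range(len(tokens)):
    next(myit)

# Option 1: catch the exception
try:
    next(myit)
    exhausted = False
except StopIteration:
    exhausted = True

# Option 2: supply a default, which is returned instead of raising
fallback = next(myit, 'no more tokens')

print(exhausted, fallback)
```
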

Next, look at the type of the iterator, and the documentation

In [None]:
type(myit)

In [None]:
help(myit)

In [None]:
help(next)

**Note that `for` implicitly creates an iterator and calls `next()` on each loop iteration. This is generally the best way to iterate through a list-like object.**
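
To make the implicit steps visible, here is a rough hand-written equivalent of a `for` loop. It is a sketch of the idea rather than a description of the interpreter's internals, and the `printed` list is a helper added for this illustration.

```python
tokens = ['living room', 'was', 'quite', 'large']

printed = []          # helper list so the result can be inspected
it = iter(tokens)     # what `for` does first: get an iterator
while True:
    try:
        tok = next(it)        # what `for` does on every pass
    except StopIteration:     # the signal that the items have run out
        break
    printed.append(tok)

print(printed)
```
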

## V. Strings

**iterating using `for`**

In [None]:
strn = 'datum'

for s in strn:
    print(s)

**iterating using `iter()` and `next()`**

In [None]:
st = iter(strn)

print(next(st))
print(next(st))
print(next(st))
print(next(st))

## VI. Tuples

**iterating using `for` loop**

In [None]:
metrics = ('auc','recall','precision','support')

for met in metrics:
    print(met)

**iterating using `iter()` and `next()`**

In [None]:
metrics = ('auc','recall','precision','support')

tup_metrics = iter(metrics)
print(next(tup_metrics))
print(next(tup_metrics))
print(next(tup_metrics))
print(next(tup_metrics))

**`break` and `continue`**

In [None]:
for met in metrics:
    print(met)
    if met == 'precision':
      break
    print(met)

print(40*'-')

for met in metrics:
    print(met)
    if met == 'precision':
      continue
    print(met)
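
As a compact restatement of the cell above: `break` leaves the loop entirely, while `continue` skips the rest of the current pass and moves on to the next item. The sketch below collects the results in two helper lists (names invented here) instead of printing.

```python
metrics = ('auc', 'recall', 'precision', 'support')

before_break = []
for met in metrics:
    if met == 'precision':
        break                  # stop looping altogether
    before_break.append(met)

without_precision = []
for met in metrics:
    if met == 'precision':
        continue               # skip this item only
    without_precision.append(met)

print(before_break)
print(without_precision)
```
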


## VII. Dictionaries


**iterating using `for` loop**

In [None]:
courses = {'fall':['regression','python'], 'spring':['capstone','pyspark','nlp']}

*Python's default is to iterate over the keys*

In [None]:
# iterate over keys

for k in courses:
    print(k)

In [None]:
# iterate over keys, using keys() method

for k in courses.keys():
    print(k)

In [None]:
# iterate over values

for v in courses.values():
    print(v)

In [None]:
# iterate over keys and values using `items()`

for k, v in courses.items():
    print("key  :", k)
    print("value:", v)
    print("-"*40)

Alternatively, keys and values can be extracted from the dict by:
- looping over the keys
- extracting the value by indexing into the dict with the key

In [None]:
# iterate over keys and values using `keys()`.

for k in courses.keys():
    print("key  :", k)
    print("value:", courses[k]) # index into the dict with the key
    print("-"*40)

In [None]:
# iterate over keys and values using `keys()`.
for k in courses.keys():
    print(f"{k}:\t{', '.join(courses[k])}") # index into the dict with the key


`enumerate()` returns an `(index, key)` tuple for each entry in the dict

In [None]:
for k in enumerate(courses):
    print(k)
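
To get the index, key, and value together, one option is to combine `enumerate()` with `items()` and unpack the nested pair. A minimal sketch, where `rows` is a helper list introduced for this example:

```python
courses = {'fall': ['regression', 'python'], 'spring': ['capstone', 'pyspark', 'nlp']}

rows = []
# enumerate yields (index, (key, value)); the inner parentheses unpack the pair
for i, (semester, classes) in enumerate(courses.items()):
    rows.append((i, semester, len(classes)))

print(rows)
```
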

## VIII. Sets

**iterating using `for`**  
note: a set has no notion of order, so the elements may come out in any order

In [None]:
princesses = {'belle','cinderella','rapunzel'}

for princess in princesses:
    print(princess)

**iterating using `iter()` and `next()`**

In [None]:
princesses = {'aaron','belle','cinderella','rapunzel'}

myset = iter(princesses)
print(next(myset))
print(next(myset))
print(next(myset))

In [None]:
enumerate(princesses)
list(enumerate(princesses))
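
Since the index that `enumerate()` assigns to a set element depends on an arbitrary order, one possible workaround, when a repeatable numbering is needed, is to sort the set first. `sorted()` returns a list:

```python
princesses = {'belle', 'cinderella', 'rapunzel'}

ordered = sorted(princesses)                   # a list in alphabetical order
numbered = list(enumerate(ordered, start=1))   # stable (index, name) pairs

print(numbered)
```
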

---

### TRY FOR YOURSELF (UNGRADED EXERCISES)

1a) Create a list of strings, where each string contains a mix of uppercase and lowercase letters.  
Write a `for` loop to iterate over the strings and:
- lowercase the string (hint: `lower()`)
- print the string

In [None]:
names = ['John', 'Paul', 'George', 'Ringo']


1b) Using the list from (1a), use `iter()` and `next()` to iterate over the list, printing each string.  
The strings don't need to be lowercased.

2a) Create a dictionary. Use a `for` loop with `items()` to print each key-value pair.

In [None]:
city_zip = {'Santa Barbara':93103, 'Charlottesville':22903}



2b) Using the dictionary from (2a), use a `for` loop with `keys()` to print each key-value pair.  
To extract the values, use the key to index into the dict.

---

## IX. Nested Loops  

Iterations can be nested!

This works well with nested data structures, like dicts within dicts.

This is basically how parsed `JSON` data is handled, since JSON objects and arrays load into Python as nested dicts and lists.

Be careful, though -- these can get complicated.

In [None]:
courses

In [None]:
for i, semester in enumerate(courses):
    print(f"{i+1}. {semester.upper()}:")
    for j, course in enumerate(courses[semester]):
        print(f"\t{i+1}.{j+1}. {course}")
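
To connect this to JSON, here is a small sketch using the standard-library `json` module. The JSON text is a hand-written stand-in that mirrors `courses`, not a real file, and the `raw`, `catalog`, and `lines` names are assumptions made for the example.

```python
import json

# Hypothetical JSON text; in practice this might be read from a file
raw = '{"fall": ["regression", "python"], "spring": ["capstone", "pyspark", "nlp"]}'

catalog = json.loads(raw)   # JSON object -> dict, JSON array -> list

lines = []
for semester, classes in catalog.items():   # outer loop: dict entries
    for course in classes:                  # inner loop: list elements
        lines.append(f"{semester}: {course}")

print(lines)
```
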

## X. List Comprehensions

Start with this `for` loop:

In [None]:
vals = [1, 5, 6, 8, 12, 15]
is_odd = []

for val in vals:
    if val % 2: # if remainder is one, val is odd
        is_odd.append(True)
    else:       # else it's not odd
        is_odd.append(False)

is_odd

The code loops over each value in the list, checks the condition, and appends to a new list.  

The code works, but it's lengthy compared to a list comprehension.  

The approach takes extra time to write and understand.  

Let's solve with a list comprehension:

In [None]:
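print(is_odd)
del is_odd   # remove the loop-built list so the comprehension starts fresh
# print(is_odd) would now raise a NameError, because the name no longer exists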

In [None]:
is_odd = [val % 2 == 1 for val in vals]
is_odd


Much shorter and, once you understand the syntax, quicker to interpret.

Note the in-place use of an expression.

Now let's discuss the syntax.

## Comprehensions in General

Comprehensions provide a concise way to iterate over any iterable and build a new list-like object from it.

There are comprehensions for several built-in types:
* List comprehensions
* Dictionary comprehensions
* Set comprehensions

Python has no tuple comprehension. Parentheses around the same syntax create a generator expression, which can be passed to `tuple()` to build a tuple.

Comprehensions are essentially very concise `for` loops. They are compact visually, and they are often somewhat faster than an equivalent loop that calls `append()` repeatedly.

All comprehensions have the form:

listlike_result = `[expression + context]`

The type of comprehension is indicated by the enclosing pair, just like anonymous constructors:

* List comprehensions       `[expression + context]`
* Dictionary comprehensions `{expression + context}`
* Set comprehensions        `{expression + context}`
* Generator expressions     `(expression + context)`


**Expression** defines what to do with each selected element. Its form depends on the kind of comprehension: dictionary comprehension expressions take the form `k:v`, while list and set comprehensions use a single value `v`.

**Context** defines which elements to select. It consists of at least one `for` clause plus any number of additional `for` and `if` clauses.
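
The sketch below puts the kinds side by side, reusing the `vals` list from the previous section. The result names are invented for illustration. Note that parentheses produce a generator expression rather than a tuple; wrapping the expression in `tuple()` builds one.

```python
vals = [1, 5, 6, 8, 12, 15]

# List: expression `v ** 2`, context `for v in vals if v % 2 == 0`
even_squares = [v ** 2 for v in vals if v % 2 == 0]

# Dictionary: the expression has the form key: value
parity = {v: v % 2 == 1 for v in vals}

# Set: duplicates collapse automatically
remainders = {v % 3 for v in vals}

# Parentheses give a lazy generator, not a tuple
squares_gen = (v ** 2 for v in vals)
squares_tuple = tuple(v ** 2 for v in vals)

print(even_squares, parity, remainders, squares_tuple)
```
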

---

**Another example:**

*Stop Word Remover*

Create list of words, and list of stop words.  
Filter out the stop words (considered not important).

In [None]:
stop_words = ['a','am','an','i','the','of']
words      = ['i','am','not','a','fan','of','the','film']

clean_words = [wd for wd in words if wd not in stop_words]
clean_words

- The expression is very simple: `wd`. It keeps the word if it meets the condition.  
- The condition does the work: if the word isn't in the list of stop words, keep it.

Side note: This task can also be done with sets, if you are not concerned with multiple instances of the same word or with word order:

In [None]:
s1 = set(stop_words)
s2 = set(words)
s3 = s2 - s1

s3
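
A possible middle ground, if order and repeats do matter, is to keep the list comprehension but test membership against a set. The word list below is extended with made-up repeats to show the difference.

```python
stop_words = ['a', 'am', 'an', 'i', 'the', 'of']
# Hypothetical input with repeated words
words = ['i', 'am', 'not', 'a', 'fan', 'of', 'the', 'film', 'not', 'a', 'fan']

stop_set = set(stop_words)   # used only for membership tests

# List comprehension: keeps order and repeated words
clean_words = [wd for wd in words if wd not in stop_set]

# Set difference: unique words only, in no particular order
unique_words = set(words) - stop_set

print(clean_words)
print(unique_words)
```
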

---
**Another Example**  

From a list of measurements, retain the elements containing mmHg

In [None]:
units = 'mmHg'
measures = ['20', '115mmHg', '5mg', '10 mg', '7.5dl', '120 mmHg']

meas_mmhg = [meas for meas in measures if units in meas]

meas_mmhg

Filter on 2 conditions

In [None]:
units1 = 'mmHg'
units2 = 'dl'
meas_mmhg_dl = [meas for meas in measures if units1 in meas or units2 in meas]

meas_mmhg_dl

For clarity:

In [None]:
[meas
 for meas in measures
 if units1 in meas
 or units2 in meas]
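
When the number of units grows, chaining `or` gets unwieldy. One alternative, sketched here with an assumed tuple named `wanted_units`, is to test all units with `any()`:

```python
measures = ['20', '115mmHg', '5mg', '10 mg', '7.5dl', '120 mmHg']
wanted_units = ('mmHg', 'dl')   # hypothetical container for the units to keep

# Keep a measurement if at least one wanted unit appears in it
selected = [meas for meas in measures
            if any(unit in meas for unit in wanted_units)]

print(selected)
```
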

---
## Dictionary Comprehensions

**Dictionary comprehensions** provide a concise method for iterating over a dictionary to create a new dictionary.

This is common when data is structured as key-value pairs, and we'd like to filter the dict.

In [None]:
# various deep learning models and their depths

model_arch = {'cnn_1':'15 layers', 'cnn_2':'20 layers', 'rnn': '10 layers'}

In [None]:
# create a new dict containing only key-value pairs where the key contains 'cnn'

cnns = {key:model_arch[key] for key in model_arch.keys() if 'cnn' in key}
cnns

We build the key-value pairs using the expression `key:model_arch[key]`, where the key indexes into the dict `model_arch`.
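
An equivalent way to write the same filter is to loop over `items()`, which supplies the key and value together so no indexing is needed. The second comprehension is an extra illustration, not part of the lesson: it assumes every value has the form `'<number> layers'` and transforms it into an integer.

```python
model_arch = {'cnn_1': '15 layers', 'cnn_2': '20 layers', 'rnn': '10 layers'}

# Filter: same result as indexing with model_arch[key]
cnns = {name: depth for name, depth in model_arch.items() if 'cnn' in name}

# Transform: take the leading number from each value (assumes '<number> layers')
layer_counts = {name: int(depth.split()[0]) for name, depth in model_arch.items()}

print(cnns)
print(layer_counts)
```
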