# Lecture 4.1 - Unfolding Sequences

In this lecture, we will cover one more method for processing sequences: unfolding sequences.

Our primary goal in this module is to apply what we have learned about regular expressions and processing sequences to the cleaning of messy text files. Unfolding sequences is a necessary skill for this task.



## Unfolding a sequence is like peeling an onion

<img src="https://luminexusa.org/wp-content/uploads/bfi_thumb/onion-n2fhsqcdk8a1irebz8ua3d5ne782hyz8xa8ek3jph4.jpg" width="400"/>

1. Pull off one layer at a time.
2. We don't know in advance how many layers there are.

## How to unfold an onion recursively

<img src="./img/unfold_the_onion.png" width="800"/>

## How to unfold an onion recursively

<img src="./img/unfold_the_onion_1.png" width="800"/>

## How to unfold an onion recursively

<img src="./img/unfold_the_onion_2.png" width="800"/>

## How to unfold an onion recursively

<img src="./img/unfold_the_onion_3.png" width="800"/>

<h2> <font color="red"> Exercise 4.1.1 </font> </h2>

Please answer each of the following questions.

#### Question: Why do we need to use a `while` loop?

**Answer:** A `while` loop is the right tool when we don't know in advance how many iterations we need.

#### Question: How do we know when to stop?

**Answer:** Define a stop condition and check it on every pass through the loop.

#### Question: How do we know it *will* stop?

**Answer:** Write the update step so that the stop condition is eventually met, and test the loop to confirm.
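The three answers above can be seen together in a small sketch. It unfolds the digits of an integer, which is a hypothetical example and not part of the lecture; the names `unfold_digits`, `digits`, and `remaining` are assumptions chosen for illustration. The loop should always stop here because integer division by 10 makes `remaining` strictly smaller on every pass until it reaches 0.

```python
def unfold_digits(n):
    """Peel the digits off a non-negative integer, last digit first."""
    digits = []
    remaining = n
    # Stop condition: nothing is left to peel.
    while remaining != 0:
        digits = digits + [remaining % 10]  # the current layer
        remaining = remaining // 10         # what is left; smaller on every pass
    return digits

print(unfold_digits(4102))  # [2, 0, 1, 4]
```

We cannot write this as a `for` loop over a fixed range without first working out how many digits the number has, which is exactly the information we don't have up front.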

## Example - Splitting a string on spaces

When learning a new process, it is often useful to recreate existing functions to help us understand the mechanics involved.  In this exercise, we will split a string on spaces *without* using the `split` method.  Instead we will use a `while` loop to unfold the string.

In [None]:
example_quote = "Bad programmers worry about code. Good programmers worry about data structures and their relationships."

### Step 1 - Create the `get_layer` and `get_remaining` functions

#### Finding the split location

In [None]:
help(example_quote.find)

In [None]:
example_quote.find(' ')
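Before slicing with `find`, it is worth checking what it returns in both cases. The sketch below uses a shortened, hypothetical string `quote` rather than the full `example_quote`; the point is that `find` gives the index of the first match and `-1` when there is no match at all.

```python
quote = "Bad programmers worry about code."

first_space = quote.find(' ')          # index of the first space
missing = "relationships.".find(' ')   # this string contains no space

print(first_space)  # 3
print(missing)      # -1
```

The `-1` return value matters later, because a negative number is still a valid index in a Python slice.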

#### Building `get_layer`

In [None]:
first_layer = example_quote[:example_quote.find(' ')]
first_layer

In [None]:
get_layer = lambda s: s[:s.find(' ')]
get_layer(example_quote)

#### Building `get_remaining`

In [None]:
remaining = example_quote[example_quote.find(' ') + 1:]
remaining

In [None]:
get_remaining = lambda s: s[s.find(' ') + 1:]
get_remaining(example_quote)

#### Building `stop_condition`

In [None]:
stop_condition = lambda s: len(s) == 0

### Step 2 - Set the initial conditions

In [None]:
new_seq = []
remaining_layers = example_quote
new_seq, remaining_layers

### Step 3a - Test out the iteration

In [None]:
new_seq = new_seq + [get_layer(remaining_layers)]
remaining_layers = get_remaining(remaining_layers)
new_seq, remaining_layers

In [None]:
new_seq = new_seq + [get_layer(remaining_layers)]
remaining_layers = get_remaining(remaining_layers)
new_seq, remaining_layers

In [None]:
new_seq = new_seq + [get_layer(remaining_layers)]
remaining_layers = get_remaining(remaining_layers)
new_seq, remaining_layers

### Step 3b - Iterate with a while loop

In [None]:
while not stop_condition(remaining_layers):
    new_seq = new_seq + [get_layer(remaining_layers)]
    remaining_layers = get_remaining(remaining_layers)
    print(new_seq, remaining_layers)

### Oops!

It looks like we created an infinite loop. Notice that `remaining_layers` stays at `"relationships."` forever. Once no spaces are left, `find` returns `-1`, so `s.find(' ') + 1` is `0` and the slice `s[0:]` hands back the whole string unchanged. Let's fix our `get_remaining` function.

In [None]:
'relationships.'['relationships.'.find(' ') + 1:]
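To see why the loop never ends, it can help to trace the slice one step at a time. This is a minimal sketch using hypothetical variable names (`last_word`, `idx`, `start`, `leftover`) that do not appear elsewhere in the lecture.

```python
last_word = "relationships."

idx = last_word.find(' ')      # -1, because there is no space
start = idx + 1                # 0, so the slice starts at the beginning
leftover = last_word[start:]   # the whole string again

print(idx, start, leftover == last_word)  # -1 0 True
```

Since `leftover` is never shorter than the input, `stop_condition` can never become true.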

#### Building `get_remaining`--attempt 2

In [None]:
remaining = example_quote[example_quote.find(' ') + 1:] if ' ' in example_quote else ''
remaining

In [None]:
get_remaining = lambda s: s[s.find(' ') + 1:] if ' ' in s else ''
get_remaining(example_quote)

### Let's try that again!

In [None]:
new_seq = []
remaining_layers = example_quote
while not stop_condition(remaining_layers):
    new_seq = new_seq + [get_layer(remaining_layers)]
    remaining_layers = get_remaining(remaining_layers)
    print(new_seq, remaining_layers)

In [None]:
# Helper functions
get_layer = lambda s: s[:s.find(' ')]
get_remaining = lambda s: s[s.find(' ') + 1:] if ' ' in s else ''
stop_condition = lambda s: len(s) == 0

def split_on_space(s):
    ''' splits a string into a list of words (based on spaces).
    
    Args:
        s: a string
        
    Returns:
        a list of the words in the original string, where a "word" is defined
        by spaces.  Note that the spaces are removed.
    '''
    new_seq = []
    remaining_layers = s
    while not stop_condition(remaining_layers):
        new_seq = new_seq + [get_layer(remaining_layers)]
        remaining_layers = get_remaining(remaining_layers)
        print(new_seq, remaining_layers)
    return new_seq

def test_split_on_space():
    assert split_on_space("My cat") == ['My', 'cat']
    assert split_on_space('') == []
test_split_on_space()

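#### Building `get_layer`--attempt 2

The test above fails on the last word. When there is no space, `find` returns `-1`, so `s[:-1]` drops the final character and `'cat'` becomes `'ca'`.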

In [None]:
first_layer = example_quote[:example_quote.find(' ')]
first_layer

In [None]:
s = 'no_spaces'
s.find(' ')

In [None]:
s[:s.find(' ')]

In [None]:
get_layer = lambda s: s[:s.find(' ')] if ' ' in s else s
get_layer(s)

In [None]:
get_remaining(s)

In [None]:
# Helper functions
get_layer = lambda s: s[:s.find(' ')] if ' ' in s else s
get_remaining = lambda s: s[s.find(' ') + 1:] if ' ' in s else ''
stop_condition = lambda s: len(s) == 0

def split_on_space(s):
    ''' splits a string into a list of words (based on spaces).
    
    Args:
        s: a string
        
    Returns:
        a list of the words in the original string, where a "word" is defined
        by spaces.  Note that the spaces are removed.
    '''
    new_seq = []
    remaining_layers = s
    while not stop_condition(remaining_layers):
        new_seq = new_seq + [get_layer(remaining_layers)]
        remaining_layers = get_remaining(remaining_layers)
        # print(new_seq, remaining_layers)
    return new_seq

def test_split_on_space():
    assert split_on_space("My cat is cute.") == ['My', 'cat', 'is', 'cute.']
    assert split_on_space('') == []
test_split_on_space()
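Since the goal was to recreate an existing method, a natural check is to compare the result with the built-in `str.split(' ')`. The sketch below repeats the final helpers so it runs on its own; the edge-case inputs are assumptions picked for illustration. The two agree on ordinary input but differ on the empty string and on a trailing space.

```python
# Helper functions, copied from the final version above
get_layer = lambda s: s[:s.find(' ')] if ' ' in s else s
get_remaining = lambda s: s[s.find(' ') + 1:] if ' ' in s else ''
stop_condition = lambda s: len(s) == 0

def split_on_space(s):
    new_seq = []
    remaining_layers = s
    while not stop_condition(remaining_layers):
        new_seq = new_seq + [get_layer(remaining_layers)]
        remaining_layers = get_remaining(remaining_layers)
    return new_seq

# Ordinary input and repeated spaces: both versions agree.
assert split_on_space("My cat is cute.") == "My cat is cute.".split(' ')
assert split_on_space("a  b") == "a  b".split(' ')

# Edge cases: our version stops as soon as nothing remains.
print(split_on_space(''), ''.split(' '))          # [] ['']
print(split_on_space('cat '), 'cat '.split(' '))  # ['cat'] ['cat', '']
```

Whether these differences matter depends on the task; they come from our choice of `stop_condition`, which treats an empty remainder as "done" rather than as one last empty layer.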

<h2> <font color="red"> Exercise 4.1.2 </font> </h2>

Redo the last problem, but this time add an argument `sep` and split on that value instead of a space.

**Hint:** Don't forget to replace the `+ 1` with a better value!
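The hint can be seen with a separator longer than one character. The sketch below uses a hypothetical string `text` and separator `sep`; it only shows the problem with keeping `+ 1` and leaves the fix to you.

```python
text = "red, green, blue"
sep = ", "  # hypothetical two-character separator

# Reusing the single-space offset of + 1 skips only the comma,
# so the space that belongs to the separator is left behind.
leftover = text[text.find(sep) + 1:]
print(repr(leftover))  # ' green, blue'
```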

In [None]:
# Your code here

<h2> <font color="red"> Exercise 4.1.3 </font> </h2>

Create a function called `partition` that has two arguments `n` (an int) and `seq` (some sequence) and returns a list with the original content partitioned into `tuple`s of size `n`.

Example: `partition(2, [1, 2, 3, 4, 5]) == [(1,2), (3,4), (5,)]`

**Note:** To get credit for this problem, you need to:

1. Document playing around with an example.
2. Document the creation and testing of your three `lambda` functions (`get_layer`, `get_remaining`, and `stop_condition`).
3. Package the code in a `def` statement with a good docstring and a test function.

In [None]:
# Your code here