# Lecture 4.1 - Unfolding Sequences

In this lecture, we will cover one more method for processing sequences: unfolding sequences.

Our primary goal in this module is to apply what we have learned about regular expressions and sequence processing to cleaning messy text files. Unfolding sequences is an essential skill for this task.



## Unfolding a sequence is like peeling an onion

<img src="https://luminexusa.org/wp-content/uploads/bfi_thumb/onion-n2fhsqcdk8a1irebz8ua3d5ne782hyz8xa8ek3jph4.jpg" width="400"/>

1. Pull off one layer at a time.
2. We don't know how many layers there are until we start peeling.

## How to unfold an onion recursively

<img src="./img/unfold_the_onion.png" width="800"/>

<img src="./img/unfold_the_onion_1.png" width="800"/>

<img src="./img/unfold_the_onion_2.png" width="800"/>

<img src="./img/unfold_the_onion_3.png" width="800"/>
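The same peel-one-layer-at-a-time process can be written once as a generic helper. This is a minimal sketch that assumes the three-part structure used throughout this lecture (`get_layer`, `get_remaining`, `stop_condition`); `unfold` is a hypothetical name, not a built-in or library function.

```python
def unfold(seq, get_layer, get_remaining, stop_condition):
    """Peel layers off `seq` until `stop_condition` says to stop.

    Hypothetical helper: the lecture writes this loop out by hand each time.
    """
    new_seq = []              # layers collected so far
    remaining_layers = seq    # what is left of the "onion"
    while not stop_condition(remaining_layers):
        new_seq = new_seq + [get_layer(remaining_layers)]
        remaining_layers = get_remaining(remaining_layers)
    return new_seq


# Usage: peel one character at a time off a string.
letters = unfold("onion",
                 get_layer=lambda s: s[0],
                 get_remaining=lambda s: s[1:],
                 stop_condition=lambda s: len(s) == 0)
print(letters)  # ['o', 'n', 'i', 'o', 'n']
```

Passing the three pieces as arguments keeps the loop itself fixed; only the helpers change from problem to problem.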

<h2> <font color="red"> Exercise 4.1.1 </font> </h2>

Please answer each of the following questions.

#### Question: Why do we need to use a `while` loop?

**Answer:** <font color="orange"> <b> We don't know how many iterations there will be</b> </font>

#### Question: How do we know when to stop?

**Answer:** <font color="orange"> <b> When there are no more layers left to peel (the remaining sequence is empty)</b> </font>

#### Question: How do we know it *will* stop?

**Answer:** <font color="orange"> <b> Each iteration removes a layer, so the remaining sequence gets strictly smaller and must eventually become empty </b> </font>
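The termination argument only holds if every iteration really does shrink the remaining sequence. One way to check that claim instead of assuming it is sketched below; `unfold_checked` is a hypothetical helper, not part of the lecture's code, and it raises `RuntimeError` as soon as an iteration makes no progress.

```python
def unfold_checked(seq, get_layer, get_remaining, stop_condition):
    """Unfold `seq`, but fail fast if an iteration makes no progress.

    Hypothetical helper (not part of the lecture's code): raises
    RuntimeError when `get_remaining` does not return something shorter.
    """
    new_seq = []
    remaining_layers = seq
    while not stop_condition(remaining_layers):
        new_seq = new_seq + [get_layer(remaining_layers)]
        next_layers = get_remaining(remaining_layers)
        # "It gets smaller each iteration": verify it rather than trust it.
        if len(next_layers) >= len(remaining_layers):
            raise RuntimeError(f"no progress on {remaining_layers!r}")
        remaining_layers = next_layers
    return new_seq


get_layer = lambda s: s[:s.find(' ')] if ' ' in s else s
good_remaining = lambda s: s[s.find(' ') + 1:] if ' ' in s else ''
bad_remaining = lambda s: s[s.find(' ') + 1:]  # keeps the whole string when no space is left
stop_condition = lambda s: len(s) == 0

print(unfold_checked("two words", get_layer, good_remaining, stop_condition))  # ['two', 'words']

message = None
try:
    unfold_checked("two words", get_layer, bad_remaining, stop_condition)
except RuntimeError as err:
    message = str(err)
print(message)  # no progress on 'words'
```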

## Example - Splitting a string on spaces

When learning a new process, it is often useful to recreate existing functions to help us understand the mechanics involved. In this example, we will split a string on spaces *without* using the `split` method. Instead, we will use a `while` loop to unfold the string.

In [57]:
example_quote = "Bad programmers worry about code. Good programmers worry about data structures and their relationships."

### Step 1: Create the `get_layer`, `get_remaining`, and `stop_condition` functions

#### Finding the split location

In [58]:
help(example_quote.find)

Help on built-in function find:

find(...) method of builtins.str instance
    S.find(sub[, start[, end]]) -> int
    
    Return the lowest index in S where substring sub is found,
    such that sub is contained within S[start:end].  Optional
    arguments start and end are interpreted as in slice notation.
    
    Return -1 on failure.



In [59]:
example_quote.find(' ')

3
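The `help` text says `find` returns -1 on failure. Because -1 is also a valid slice index, it is worth seeing this directly. The quick check below compares `find`, the `in` operator, and `str.index`, which raises `ValueError` instead of returning -1.

```python
example_quote = "Bad programmers worry about code. Good programmers worry about data structures and their relationships."

print(example_quote.find(' '))      # 3: index of the first space
print("relationships.".find(' '))   # -1: no space found
print(' ' in "relationships.")      # False: a safer way to test first

# str.index is the strict alternative: it raises instead of returning -1.
try:
    "relationships.".index(' ')
except ValueError as err:
    print("index raised ValueError:", err)
```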

#### Building `get_layer`

In [60]:
first_layer = example_quote[:example_quote.find(' ')]
first_layer

'Bad'

In [61]:
get_layer = lambda s: s[:s.find(' ')]
get_layer(example_quote)

'Bad'

#### Building `get_remaining`

In [62]:
remaining = example_quote[example_quote.find(' ') + 1:]
remaining

'programmers worry about code. Good programmers worry about data structures and their relationships.'

In [63]:
get_remaining = lambda s: s[s.find(' ') + 1:]
get_remaining(example_quote)

'programmers worry about code. Good programmers worry about data structures and their relationships.'

#### Building `stop_condition`

In [64]:
stop_condition = lambda s: len(s) == 0
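An equivalent spelling you will often see relies on empty sequences being falsy in Python. The sketch below (with `stop_condition_alt` as a hypothetical name) checks that both versions agree on strings, lists, and tuples.

```python
stop_condition = lambda s: len(s) == 0   # version used in this lecture
stop_condition_alt = lambda s: not s     # empty sequences are falsy

for value in ["", "a", [], [1, 2], ()]:
    assert stop_condition(value) == stop_condition_alt(value)
print("both versions agree")
```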

### Step 2: Set the initial conditions

In [65]:
new_seq = []
remaining_layers = example_quote
new_seq, remaining_layers

([],
 'Bad programmers worry about code. Good programmers worry about data structures and their relationships.')

### Step 3a: Test out the iteration by hand

In [66]:
new_seq = new_seq + [get_layer(remaining_layers)]
remaining_layers = get_remaining(remaining_layers)
new_seq, remaining_layers

(['Bad'],
 'programmers worry about code. Good programmers worry about data structures and their relationships.')

In [67]:
new_seq = new_seq + [get_layer(remaining_layers)]
remaining_layers = get_remaining(remaining_layers)
new_seq, remaining_layers

(['Bad', 'programmers'],
 'worry about code. Good programmers worry about data structures and their relationships.')

In [68]:
new_seq = new_seq + [get_layer(remaining_layers)]
remaining_layers = get_remaining(remaining_layers)
new_seq, remaining_layers

(['Bad', 'programmers', 'worry'],
 'about code. Good programmers worry about data structures and their relationships.')
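A handy way to check each hand-written step is an invariant: while text remains, joining the peeled words with spaces and then appending a space and the remaining text should rebuild the original quote. This is a sketch of that check (the invariant is our suggestion, not part of the original steps), repeating the three steps above with the Step 1 helpers.

```python
example_quote = "Bad programmers worry about code. Good programmers worry about data structures and their relationships."
get_layer = lambda s: s[:s.find(' ')]
get_remaining = lambda s: s[s.find(' ') + 1:]

new_seq = []
remaining_layers = example_quote
for _ in range(3):  # the same three manual steps as above
    new_seq = new_seq + [get_layer(remaining_layers)]
    remaining_layers = get_remaining(remaining_layers)
    # Invariant: the peeled words plus the rest give back the original.
    assert ' '.join(new_seq) + ' ' + remaining_layers == example_quote

print(new_seq)  # ['Bad', 'programmers', 'worry']
```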

### Step 3b: Iterate with a `while` loop

In [None]:
while not stop_condition(remaining_layers):
    new_seq = new_seq + [get_layer(remaining_layers)]
    remaining_layers = get_remaining(remaining_layers)
    print(new_seq, remaining_layers)

### Oops!

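Looks like we created an infinite loop. Notice that `remaining_layers` stays `"relationships."` once every space has been used up. With no space left, `find` returns -1, so `get_remaining` computes `s[-1 + 1:]`, which is `s[0:]`, the whole string; nothing is ever removed. Let's fix our `get_remaining` function.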
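A quick check of both Step 1 helpers on the final word confirms this, and also exposes a second problem: `get_layer` silently drops the last character when there is no space.

```python
# The original Step 1 helpers, before either fix.
get_layer = lambda s: s[:s.find(' ')]
get_remaining = lambda s: s[s.find(' ') + 1:]

last = "relationships."
print(last.find(' '))             # -1: there is no space
print(repr(get_remaining(last)))  # 'relationships.'  (s[0:] is the whole string)
print(repr(get_layer(last)))      # 'relationships'   (s[:-1] drops the last character)
```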

#### Building `get_remaining`: attempt 2

In [69]:
remaining = example_quote[example_quote.find(' ') + 1:] if ' ' in example_quote else ''
remaining

'programmers worry about code. Good programmers worry about data structures and their relationships.'

In [70]:
get_remaining = lambda s: s[s.find(' ') + 1:] if ' ' in s else ''
get_remaining(example_quote)

'programmers worry about code. Good programmers worry about data structures and their relationships.'

### Let's try that again!

In [71]:
new_seq = []
remaining_layers = example_quote
while not stop_condition(remaining_layers):
    new_seq = new_seq + [get_layer(remaining_layers)]
    remaining_layers = get_remaining(remaining_layers)
    print(new_seq, remaining_layers)

['Bad'] programmers worry about code. Good programmers worry about data structures and their relationships.
['Bad', 'programmers'] worry about code. Good programmers worry about data structures and their relationships.
['Bad', 'programmers', 'worry'] about code. Good programmers worry about data structures and their relationships.
['Bad', 'programmers', 'worry', 'about'] code. Good programmers worry about data structures and their relationships.
['Bad', 'programmers', 'worry', 'about', 'code.'] Good programmers worry about data structures and their relationships.
['Bad', 'programmers', 'worry', 'about', 'code.', 'Good'] programmers worry about data structures and their relationships.
['Bad', 'programmers', 'worry', 'about', 'code.', 'Good', 'programmers'] worry about data structures and their relationships.
['Bad', 'programmers', 'worry', 'about', 'code.', 'Good', 'programmers', 'worry'] about data structures and their relationships.
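['Bad', 'programmers', 'worry', 'about', 'code.', 'Good', 'programmers', 'worry', 'about'] data structures and their relationships.
['Bad', 'programmers', 'worry', 'about', 'code.', 'Good', 'programmers', 'worry', 'about', 'data'] structures and their relationships.
['Bad', 'programmers', 'worry', 'about', 'code.', 'Good', 'programmers', 'worry', 'about', 'data', 'structures'] and their relationships.
['Bad', 'programmers', 'worry', 'about', 'code.', 'Good', 'programmers', 'worry', 'about', 'data', 'structures', 'and'] their relationships.
['Bad', 'programmers', 'worry', 'about', 'code.', 'Good', 'programmers', 'worry', 'about', 'data', 'structures', 'and', 'their'] relationships.
['Bad', 'programmers', 'worry', 'about', 'code.', 'Good', 'programmers', 'worry', 'about', 'data', 'structures', 'and', 'their', 'relationships']

The loop now stops, but the last word came out as `'relationships'` without its period. With no space left, `s.find(' ')` returns -1 and `get_layer` returns `s[:-1]`, which drops the final character, so `get_layer` needs the same `if ' ' in s` fix.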

In [None]:
# Helper functions
get_layer = lambda s: s[:s.find(' ')] if ' ' in s else s
get_remaining = lambda s: s[s.find(' ') + 1:] if ' ' in s else ''
stop_condition = lambda s: len(s) == 0

def split_on_space(s):
    ''' splits a string into a list of words (based on spaces).
    
    Args:
        s: a string
        
    Returns:
        a list of the words in the original string, where a "word" is defined
        by spaces.  Note that the spaces are removed.
    '''
    new_seq = []
    remaining_layers = s
    while not stop_condition(remaining_layers):
        new_seq = new_seq + [get_layer(remaining_layers)]
        remaining_layers = get_remaining(remaining_layers)
        print(new_seq, remaining_layers)
    return new_seq

def test_split_on_space():
    assert split_on_space("My cat") == ['My', 'cat']
    assert split_on_space('') == []
test_split_on_space()

#### Building `get_layer`: attempt 2

In [None]:
first_layer = example_quote[:example_quote.find(' ')]
first_layer

In [None]:
s = 'no_spaces'
s.find(' ')

In [None]:
s[:s.find(' ')]

In [None]:
get_layer = lambda s: s[:s.find(' ')] if ' ' in s else s
get_layer(s)

In [None]:
get_remaining(s)

In [None]:
# Helper functions
get_layer = lambda s: s[:s.find(' ')] if ' ' in s else s
get_remaining = lambda s: s[s.find(' ') + 1:] if ' ' in s else ''
stop_condition = lambda s: len(s) == 0

def split_on_space(s):
    ''' splits a string into a list of words (based on spaces).
    
    Args:
        s: a string
        
    Returns:
        a list of the words in the original string, where a "word" is defined
        by spaces.  Note that the spaces are removed.
    '''
    new_seq = []
    remaining_layers = s
    while not stop_condition(remaining_layers):
        new_seq = new_seq + [get_layer(remaining_layers)]
        remaining_layers = get_remaining(remaining_layers)
        # print(new_seq, remaining_layers)
    return new_seq

def test_split_on_space():
    assert split_on_space("My cat is cute.") == ['My', 'cat', 'is', 'cute.']
    assert split_on_space('') == []
test_split_on_space()
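Since the goal was to recreate `split`, it is worth comparing the result against the built-in. The sketch below compares with `str.split(' ')` (not `str.split()` with no arguments, which splits on runs of any whitespace and drops empty strings). It repeats the final helpers so the block runs on its own.

```python
get_layer = lambda s: s[:s.find(' ')] if ' ' in s else s
get_remaining = lambda s: s[s.find(' ') + 1:] if ' ' in s else ''
stop_condition = lambda s: len(s) == 0

def split_on_space(s):
    new_seq = []
    remaining_layers = s
    while not stop_condition(remaining_layers):
        new_seq = new_seq + [get_layer(remaining_layers)]
        remaining_layers = get_remaining(remaining_layers)
    return new_seq

for text in ["My cat is cute.", "a  b", " a", "a ", ""]:
    ours, builtin = split_on_space(text), text.split(' ')
    print(repr(text), ours, builtin, "same" if ours == builtin else "DIFFERENT")
```

In these cases the two disagree only when the string is empty or ends with a space: `str.split(' ')` returns a trailing `''`, while our loop stops as soon as nothing remains. Whether that matters depends on how you plan to use the result.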

<h2> <font color="red"> Exercise 4.1.2 </font> </h2>

Redo the last problem, but this time add an argument `sep` and split on that value.

**Hint:** Don't forget to replace the `+ 1` with a better value!
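To see why `+ 1` stops working once the separator can be longer than one character, compare the two slices on a multi-character separator:

```python
s = "My cat is a cactus."
sep = "ca"

print(s[s.find(sep) + 1:])         # at is a cactus.  (the 'a' of the separator is left behind)
print(s[s.find(sep) + len(sep):])  # t is a cactus.   (the whole separator is skipped)
```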

In [32]:
get_layer = lambda s, sep: s[:s.find(sep)] if sep in s else s
get_remaining = lambda s, sep: s[s.find(sep) + len(sep):] if sep in s else ''
stop_condition = lambda s: len(s) == 0

def split_on_sep(s, sep):
    ''' splits a string on a separator.
    
    Args:
        s: a string
        sep: the separating substring
        
    Returns:
        a list of the pieces of s between occurrences of sep.  Note that the sep values are removed.
    '''
    new_seq = []
    remaining_layers = s
    while not stop_condition(remaining_layers):
        new_seq = new_seq + [get_layer(remaining_layers, sep)]
        remaining_layers = get_remaining(remaining_layers, sep)
        # print(new_seq, remaining_layers)
    return new_seq

def test_split_on_sep():
    assert split_on_sep("My cat is cute.", ' ') == ['My', 'cat', 'is', 'cute.']
    assert split_on_sep('', ' ') == []
test_split_on_sep()

In [35]:
split_on_sep("My cat is cute.", 'c')

['My ', 'at is ', 'ute.']

In [31]:
split_on_sep("My cat is a cactus.", "ca")

['My ', 't is a ', 'ctus.']
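One edge case is worth knowing about: an empty separator. The empty string is found at index 0 of every string, so a `get_remaining` built as above removes nothing and the loop makes no progress. The built-in `str.split` raises `ValueError` in this case; adding a similar check at the top of `split_on_sep` (for example, raising when `sep == ''`) is one reasonable way to guard against it.

```python
s = "abc"

print('' in s)                       # True: the empty string is contained in every string
print(s.find(''))                    # 0
print(s[s.find('') + len(''):])      # abc: nothing was peeled off

# The built-in str.split rejects an empty separator instead of looping forever.
try:
    s.split('')
except ValueError as err:
    print("str.split raised ValueError:", err)
```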

<h2> <font color="red"> Exercise 4.1.3 </font> </h2>

Create a function called `partition` that has two arguments `n` (an int) and `seq` (some sequence) and returns a list with the original content partitioned into `tuple`s of size `n`.

Example: `partition(2, [1, 2, 3, 4, 5]) == [(1,2), (3,4), (5,)]`
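The last tuple in the example is shorter than `n`, and no special case is needed for that: Python slicing past the end of a sequence simply stops early instead of raising an error.

```python
seq = [1, 2, 3, 4, 5]

print(seq[4:6])         # [5]: slicing past the end just stops early
print(tuple(seq[4:6]))  # (5,)
print(seq[6:8])         # []: an empty slice, not an IndexError
```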

**Note:** To get credit for this problem, you need to:

1. Document your experiments with an example.
2. Document the creation and testing of your three `lambda` functions (`get_layer`, `get_remaining`, and `stop_condition`).
3. Package the code in a `def` statement with a good docstring and a test function.

In [90]:
s = [1,2,3,4,5]
tuple(s)

(1, 2, 3, 4, 5)

In [96]:
tuple(s[:2])
s[4:]

[5]

In [97]:
tuple(s[4:])

(5,)



#### Building the `lambda` functions



In [76]:
seq = [3,4,5,6]

In [104]:
get_lay = lambda seq, n: tuple(seq[:n])
get_lay(seq,2)

(3, 4)

In [85]:
get_remain = lambda seq, n: tuple(seq[n:])
get_remain(seq,2)

(5, 6)

In [86]:
stop_cond = lambda seq: len(seq) == 0

In [101]:
seq = [1,2,3,4,5,6,7,8,9,10,11]
new_seq = []
remaining_layers = seq
n = 3
while not stop_cond(remaining_layers):
    new_seq = new_seq + [get_lay(remaining_layers,n)]
    remaining_layers = get_remain(remaining_layers,n)
    print(new_seq, remaining_layers)

[(1, 2, 3)] (4, 5, 6, 7, 8, 9, 10, 11)
[(1, 2, 3), (4, 5, 6)] (7, 8, 9, 10, 11)
[(1, 2, 3), (4, 5, 6), (7, 8, 9)] (10, 11)
[(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11)] ()


In [107]:
get_lay = lambda n, seq: tuple(seq[:n])
get_remain = lambda n, seq: tuple(seq[n:])
stop_cond = lambda seq: len(seq) == 0

def partition(n, seq):
    ''' partitions a list into tuples of a specified length.
    
    Args:
        n: size of tuple (int)
        seq: sequence (list)
        
    Returns:
        a list with the original content partitioned into tuples of size n.
    '''
    new_seq = []
    remaining_layers = seq
    while not stop_cond(remaining_layers):
        new_seq = new_seq + [get_lay(n, remaining_layers)]
        remaining_layers = get_remain(n, remaining_layers)
    return new_seq

def test_partition():
    assert partition(2,[1,2,3,4,5]) == [(1,2), (3,4), (5,)]
    assert partition(3,[0,1,2,3,4,5]) == [(0,1,2), (3,4,5)]
    assert partition(2,[-2,0,3,13]) == [(-2,0), (3,13)]
test_partition()
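As an extra test, the unfolding version can be cross-checked against an index-based version that slices at fixed offsets. `partition_by_index` is a hypothetical helper used only for this comparison; the unfolding `partition` is repeated so the block runs on its own.

```python
get_lay = lambda n, seq: tuple(seq[:n])
get_remain = lambda n, seq: tuple(seq[n:])
stop_cond = lambda seq: len(seq) == 0

def partition(n, seq):
    new_seq = []
    remaining_layers = seq
    while not stop_cond(remaining_layers):
        new_seq = new_seq + [get_lay(n, remaining_layers)]
        remaining_layers = get_remain(n, remaining_layers)
    return new_seq

def partition_by_index(n, seq):
    """Hypothetical non-unfolding version, used only as a cross-check."""
    return [tuple(seq[i:i + n]) for i in range(0, len(seq), n)]

# Compare the two on many sizes and lengths, including the empty list.
for n in range(1, 5):
    for length in range(0, 9):
        data = list(range(length))
        assert partition(n, data) == partition_by_index(n, data)
print("partition matches the index-based version")
```

Both versions assume `n >= 1`. With `n = 0`, `get_remain` returns the whole sequence, so the unfolding loop would never shrink its input (and `range` with a step of 0 raises `ValueError`).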