##### Travis' Notes 2020.09.08

# Generators

Here we'll take a deeper dive into Python generators, including *generator expressions* and *generator functions*.

## Generator Expressions

The difference between list comprehensions and generator expressions is sometimes confusing; here we'll quickly outline the differences between them:

### List comprehensions use square brackets, while generator expressions use parentheses
This is a representative list comprehension:

In [1]:
[n ** 2 for n in range(12)]

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121]

While this is a representative generator expression:

In [2]:
(n ** 2 for n in range(12))

<generator object <genexpr> at 0x000002B518831CC8>

Notice that printing the generator expression does not print the contents; one way to print the contents of a generator expression is to pass it to the ``list`` constructor:

In [3]:
G = (n ** 2 for n in range(12))

In [4]:
list(G)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121]

### A list is a collection of values, while a generator is a recipe for producing values
When you create a list, you are actually building a collection of values, and there is some memory cost associated with that.
When you create a generator, you are not building a collection of values, but a recipe for producing those values.
Both expose the same iterator interface, as we can see here:

In [5]:
L = [n ** 2 for n in range(12)]
for val in L:
    print(val, end=' ')

0 1 4 9 16 25 36 49 64 81 100 121 

In [6]:
G = (n ** 2 for n in range(12))
for val in G:
    print(val, end=' ')

0 1 4 9 16 25 36 49 64 81 100 121 

The difference is that a generator expression does not actually compute the values until they are needed.
This not only leads to memory efficiency, but to computational efficiency as well!
This also means that while the size of a list is limited by available memory, the size of a generator expression is unlimited!
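
As a rough illustration of the memory point, the sketch below compares what `sys.getsizeof` reports for a list comprehension and for the equivalent generator expression. The variable names are illustrative only, and the exact byte counts depend on the Python build, so none are quoted here.

```python
import sys

# Build one million squares eagerly, and the equivalent lazy recipe
squares_list = [n ** 2 for n in range(1_000_000)]
squares_gen = (n ** 2 for n in range(1_000_000))

# getsizeof measures the container itself: the list stores a million
# references, while the generator stores only its paused state
list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)
print(list_size > gen_size)
```

Note that `getsizeof` does not count the integer objects the list refers to, so the real gap is larger than this comparison suggests.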

An example of an infinite generator expression can be created using the ``count`` iterator defined in ``itertools``:

In [7]:
from itertools import count
count()

count(0)

In [8]:
for i in count():
    print(i, end=' ')
    if i >= 10: break

0 1 2 3 4 5 6 7 8 9 10 

The ``count`` iterator will go on happily counting forever until you tell it to stop; this makes it convenient to create generators that will also go on forever:

In [9]:
factors = [2, 3, 5, 7]
G = (i for i in count() if all(i % n > 0 for n in factors))
for val in G:
    print(val, end=' ')
    if val > 40: break

1 11 13 17 19 23 29 31 37 41 

You might see what we're getting at here: if we were to expand the list of factors appropriately, what we would have the beginnings of is a prime number generator, using the Sieve of Eratosthenes algorithm. We'll explore this more momentarily.
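
Instead of breaking out of the loop by hand, one option is `itertools.islice`, which takes a fixed number of items from any iterator, including an infinite one. This is a minimal sketch reusing the generator from above; `first_five` is just an illustrative name.

```python
from itertools import count, islice

factors = [2, 3, 5, 7]
G = (i for i in count() if all(i % n > 0 for n in factors))

# Take exactly the first five values; nothing beyond them is computed
first_five = list(islice(G, 5))
print(first_five)
```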

### A list can be iterated multiple times; a generator expression is single-use
This is one of those potential gotchas of generator expressions.
With a list, we can straightforwardly do this:

In [10]:
L = [n ** 2 for n in range(12)]
for val in L:
    print(val, end=' ')
print()

for val in L:
    print(val, end=' ')

0 1 4 9 16 25 36 49 64 81 100 121 
0 1 4 9 16 25 36 49 64 81 100 121 

A generator expression, on the other hand, is used up after one iteration:

In [11]:
G = (n ** 2 for n in range(12))
list(G)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121]

In [12]:
list(G)

[]

This can be very useful because it means iteration can be stopped and started:

In [13]:
G = (n**2 for n in range(12))
for n in G:
    print(n, end=' ')
    if n > 30: break

print("\ndoing something in between")

for n in G:
    print(n, end=' ')

0 1 4 9 16 25 36 
doing something in between
49 64 81 100 121 

One place I've found this useful is when working with collections of data files on disk; it means that you can quite easily analyze them in batches, letting the generator keep track of which ones you have yet to see.
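
A sketch of that batching idea, under the assumption that the files are identified by name: the file names below are hypothetical stand-ins, and `itertools.islice` is one possible way to pull a fixed-size batch while the generator remembers where it left off.

```python
from itertools import islice

# Hypothetical file names, standing in for real data files on disk
filenames = (f"data_{i:03d}.csv" for i in range(7))

batches = []
while True:
    # islice pulls up to three names and leaves the rest in the generator
    batch = list(islice(filenames, 3))
    if not batch:
        break
    batches.append(batch)

print(batches)
```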

## Generator Functions: Using ``yield``
We saw in the previous section that list comprehensions are best used to create relatively simple lists, while using a normal ``for`` loop can be better in more complicated situations.
The same is true of generator expressions: we can make more complicated generators using *generator functions*, which make use of the ``yield`` statement.

Here we have two ways of constructing the same list:

In [14]:
L1 = [n ** 2 for n in range(12)]

L2 = []
for n in range(12):
    L2.append(n ** 2)

print(L1)
print(L2)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121]


Similarly, here are two ways of constructing equivalent generators:

In [16]:
G1 = (n ** 2 for n in range(12))

def gen():
    for n in range(12):
        yield n ** 2

G2 = gen()
print(*G1)
print(*G2)

0 1 4 9 16 25 36 49 64 81 100 121
0 1 4 9 16 25 36 49 64 81 100 121


A generator function is a function that, rather than using ``return`` to return a value once, uses ``yield`` to yield a (potentially infinite) sequence of values.
Just as in generator expressions, the state of the generator is preserved between partial iterations, but if we want a fresh copy of the generator we can simply call the function again.
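
To make the preserved state visible, here is a small sketch that steps through the same `gen` function with the built-in `next`, and then calls the function again to get a fresh generator. The variable names are illustrative.

```python
def gen():
    for n in range(12):
        yield n ** 2

G = gen()
first = next(G)      # runs the body up to the first yield
second = next(G)     # resumes where it left off
rest = list(G)       # consumes whatever remains

fresh = list(gen())  # a new call starts over from the beginning
print(first, second, rest[0], fresh[:3])
```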

## Example: Prime Number Generator
Here I'll show my favorite example of a generator function: a function to generate an unbounded series of prime numbers.
A classic algorithm for this is the *Sieve of Eratosthenes*, which works something like this:

In [17]:
# Generate a list of candidates
L = [n for n in range(2, 40)]
print(L)

[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39]


In [18]:
# Remove all multiples of the first value
L = [n for n in L if n == L[0] or n % L[0] > 0]
print(L)

[2, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39]


In [19]:
# Remove all multiples of the second value
L = [n for n in L if n == L[1] or n % L[1] > 0]
print(L)

[2, 3, 5, 7, 11, 13, 17, 19, 23, 25, 29, 31, 35, 37]


In [20]:
# Remove all multiples of the third value
L = [n for n in L if n == L[2] or n % L[2] > 0]
print(L)

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]


If we repeat this procedure enough times on a large enough list, we can generate as many primes as we wish.

Let's encapsulate this logic in a generator function:

In [21]:
def gen_primes(N):
    """Generate primes up to N"""
    primes = set()
    for n in range(2, N):
        if all(n % p > 0 for p in primes):
            primes.add(n)
            yield n

print(*gen_primes(100))

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97


That's all there is to it!
While this is certainly not the most computationally efficient implementation of the Sieve of Eratosthenes, it illustrates how convenient the generator function syntax can be for building more complicated sequences.
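
The function above stops at `N`. For a truly unbounded series, one possible variation swaps `range` for `itertools.count`. The name `gen_primes_unbounded` is hypothetical, and the approach is still plain trial division against earlier primes rather than an optimized sieve.

```python
from itertools import count, islice

def gen_primes_unbounded():
    """Yield primes indefinitely by trial division (a sketch, not a true sieve)."""
    primes = []
    for n in count(2):
        # n is prime if no previously found prime divides it
        if all(n % p > 0 for p in primes):
            primes.append(n)
            yield n

first_ten = list(islice(gen_primes_unbounded(), 10))
print(first_ten)
```

Because the generator never ends on its own, the caller decides how much to take, here with `islice`.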

## Regular Expression Generators for Python

#### Related Videos and Documentation:
**[Gabe's Video](https://www.youtube.com/watch?v=5jwV3zxXc8E)**

**[Marissa's RegExOps](https://docs.python.org/2/library/re.html)**

**[Marissa's RegExCheatsheet](https://learnbyexample.github.io/cheatsheet/python/python-regex-cheatsheet/)**

### [Pyregex](http://www.pyregex.com/)

### [Pythex](https://pythex.org/) 

[Regular Expression Cheatsheet](https://learnbyexample.github.io/cheatsheet/python/python-regex-cheatsheet/)
<br>
[Regular Expression Operations Documentation](https://docs.python.org/2/library/re.html)

## Regular Expressions

### Examples

We have already seen that we can ask whether a string `str`
begins with some substring as follows:
`str.startswith('Apple')`.
If we would like to know whether it starts with `"Apple"` or
`"apple"`, we would have to call the `startswith` method twice.
Regular expressions offer a simpler solution:
`re.match(r"[Aa]pple", str)`.
The bracket notation is one example of the special syntax of
*regular expressions*. In this case it says that any of the
characters inside brackets will do: either `"A"` or `"a"`. The other
letters in `"pple"` will act normally. The string `r"[Aa]pple"` is
called a *pattern*.

A more complicated example asks whether the string `str`
starts with either `apple` or `banana` (no matter if the first letter
is capital or not):
`re.match(r"[Aa]pple|[Bb]anana", str)`.
In this example we saw a new special character `|` that denotes
an alternative. On either side of the bar character we have a
*subpattern*.

A legal variable name in Python starts with a letter or an
underscore character, and the following characters can also be
digits.
So legal names are, for instance: `_hidden`, `L_value`, `A123_`.
But the name `2abc` is not a valid variable name.
Let’s see what would be the regular expression pattern to
recognise valid variable names:
`r"[A-Za-z_][A-Za-z_0-9]*\Z"`.
Here we have used a shorthand for character ranges: `A-Z`.
This means all the characters from `A` to `Z`.

The first character of the variable name is defined in the first
brackets. The subsequent characters are defined in the second
brackets.
The special character `*` means that we allow any number
(0,1,2, . . . ) of the previous subpattern. For example the
pattern `r"ba*"` allows strings `"b"`, `"ba"`, `"baa"`, `"baaa"`, and
so on.
The special syntax `\Z` denotes the end of the string.
Without it we would also accept `abc-` as a valid name since
the `match` function normally checks only that a string starts with a pattern.
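
A quick check of this pattern against the names mentioned above. The helper `is_valid_name` is a hypothetical name introduced for illustration; it only tests the ASCII pattern from the text and does not consider Python keywords.

```python
import re

name_pattern = r"[A-Za-z_][A-Za-z_0-9]*\Z"

def is_valid_name(name):
    """Return True if name matches the variable-name pattern above."""
    return re.match(name_pattern, name) is not None

checks = {name: is_valid_name(name)
          for name in ["_hidden", "L_value", "A123_", "2abc", "abc-"]}
print(checks)
```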

The special notations, like `\Z`, also cause problems with string
handling.
Remember that normally in string literals we have some
special notation: `\n` stands for newline, `\t` stands for tab, and
so on.
So, both string literals and regular expressions use similar
looking notations, which can create serious confusion.
This can be solved by using the so-called *raw strings*. We
denote a raw string by having an `r` letter before the first
quotation mark, for example `r"ab*\Z"`.
When using raw strings, the newline (`\n`), tab (`\t`), and other
special string literal notations aren’t interpreted. One should
always use raw strings when defining regular expression
patterns!
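
A small sketch of what goes wrong without a raw string: in a normal literal, `"\b"` is the backspace character, so the `re` module never sees the word-boundary notation. The sample text and variable names are illustrative.

```python
import re

# In a normal literal "\b" is a backspace character; in a raw literal it
# stays as backslash + b, which the re module reads as a word boundary
plain = "\bcat\b"
raw = r"\bcat\b"

plain_result = re.search(plain, "a cat sat")  # looks for literal backspaces
raw_result = re.search(raw, "a cat sat")      # looks for the word "cat"
print(len(plain), len(raw))
print(plain_result is None, raw_result is not None)
```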

### Patterns

A pattern represents a set of strings. This set can even be
potentially infinite.
They can be used to describe a set of strings that have some
commonality; some regular structure.
Regular expressions (RE) are a classical computer science topic.
They are very common in programming tasks. Scripting
languages, like Python, are very fluent in regular expressions.
Very complex text processing can be achieved using regular
expressions.

In patterns, normal characters (letters, numbers) just represent
themselves, unless preceded by a backslash, which may trigger
some special meaning.
Punctuation characters have special meaning, unless preceded
by backslash (`\`), which deprives their special meaning.
Use `\\` to represent a backslash character without any special
meaning.
Below we go through some of the more
common RE notations.

```
. Matches any character (except a newline, unless the DOTALL flag is set)
[...] Matches any character contained within the brackets
[^...] Matches any character not appearing after the hat (^)
^ Matches the start of the string
$ Matches the end of the string
* Matches zero or more of the previous RE
+ Matches one or more of the previous RE
{m,n} Matches m to n occurrences of the previous RE
? Matches zero or one occurrence of the previous RE
```
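
To see the repetition operators side by side, here is a minimal sketch that tests a few candidate strings against each one. The helper `matches` is a hypothetical name, and `re.fullmatch` is used so that the whole string has to fit the pattern.

```python
import re

def matches(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

candidates = ["b", "ba", "baa", "baaa"]
star = matches(r"ba*", candidates)        # zero or more "a"
plus = matches(r"ba+", candidates)        # one or more "a"
optional = matches(r"ba?", candidates)    # zero or one "a"
ranged = matches(r"ba{2,3}", candidates)  # two to three "a"
print(star, plus, optional, ranged)
```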

We have already seen that a `|` character denotes alternatives.
For example, the pattern `r"Get (on|off|ready)"` matches
the following strings: `"Get on"`, `"Get off"`, `"Get ready"`.
We can use parentheses to create groupings inside a pattern:
`r"(ab)+"` will match the strings `"ab"`, `"abab"`, `"ababab"`,
and so on.
These groups are also given a reference number starting from 1. 
We can refer to groups using backreferences: `\number`.
For example, we can find separated patterns that get
repeated: `r"([a-z]{3,}) \1 \1"`.
This will recognise, for example, the following strings: `"aca
aca aca"`, `"turn turn turn"`. But not the strings `"aca
aba aca"` or `"ac ac ac"`.
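
The backreference pattern can be tried on the four strings just mentioned. This sketch uses `re.fullmatch`, which is one reasonable reading of "recognise" here; the variable names are illustrative.

```python
import re

pattern = r"([a-z]{3,}) \1 \1"
samples = ["aca aca aca", "turn turn turn", "aca aba aca", "ac ac ac"]

# \1 must repeat exactly the text captured by the first group
recognised = [bool(re.fullmatch(pattern, s)) for s in samples]
print(recognised)
```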


In the following, note that a hat (`^`) as the first character
inside brackets will create a complement set of characters:

```
\d same as [0-9], matches a digit
\D same as [^0-9], matches anything but a digit
\s matches a whitespace character (space, newline, tab, ... )
\S matches a non-whitespace character
\w same as [a-zA-Z0-9_], matches one alphanumeric character or underscore
\W matches one character that \w does not match
```

Using the above notation we can now shorten our previous
variable name example to `r'[a-zA-Z_]\w*\Z'`.

The patterns `\A`, `\b`, `\B`, and `\Z` will all match an empty
string, but in specific places.
The patterns `\A` and `\Z` will recognise the beginning and end
of the string, respectively.
Note that the patterns `^` and `$` can in some cases also match
after a newline and before a newline, respectively.
So, `\A` is distinct from `^`, and `\Z` is distinct from `$`.
The pattern `\b` matches at the start or end of a word. The
pattern `\B` does the reverse.
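
The difference between `^` and `\A` shows up when the `re.MULTILINE` flag is passed, which is the case where `^` also matches after a newline. A minimal sketch with an illustrative two-line string:

```python
import re

text = "first line\nsecond line"

# Without MULTILINE, ^ only matches at the very start of the string
caret_default = re.findall(r"^\w+", text)
# With MULTILINE, ^ also matches right after each newline
caret_multiline = re.findall(r"^\w+", text, re.MULTILINE)
# \A matches only at the start of the string, regardless of the flag
start_only = re.findall(r"\A\w+", text, re.MULTILINE)
print(caret_default, caret_multiline, start_only)
```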

### Match and search functions

We have so far only used the `re.match` function, which tries
to find a match at the beginning of a string.
The function `re.search` can match any substring of a
string.
Example: `re.search(r'\bback\b', s)` will match
strings `"back"`, `"a back, is a body part"`, `"get back"`. But it
will not match the strings `"backspace"` or `"comeback"`.

The function `re.search` finds only the first occurrence.
We can use the `re.findall` function to find all occurrences.
Let’s say we want to find all present participle words in a
string `s`. The present participle words have ending `'ing'`.
The function call would look like this:
`re.findall(r'\w+ing\b', s)`.
Let’s try running this:

In [25]:
import re
s = "Doing things, going home, staying awake, sleeping later"
re.findall(r'\w+ing\b', s)

['Doing', 'going', 'staying', 'sleeping']

Let’s say we want to pick up all the integers from a string.
We can try that with the following function call:
`re.findall(r'[+-]?\d+', s)`.
An example run:

In [24]:
re.findall(r'[+-]?\d+', "23 + -24 = -1")

['23', '-24', '-1']

Suppose we are given a string of if/then sentences, and we would like to extract the conditions from these sentences. Let’s try the following function call: `re.findall(r'[Ii]f (.*), then', s)`. An example run:

In [26]:
s = ("If I'm not in a hurry, then I should stay. " +
    "On the other hand, if I leave, then I can sleep.")
re.findall(r'[Ii]f (.*), then', s)

["I'm not in a hurry, then I should stay. On the other hand, if I leave"]

But I wanted a result: `["I'm not in a hurry", 'I leave']`. That
is, the condition from both sentences. How can this be fixed?

The problem is that the pattern `.*` tries to match as many
characters as possible.
This is called *greedy matching*.
One way of solving this problem is to notice that the two
sentences are separated by a full-stop (.).
So, instead of matching all the characters, we need to match
everything but the dot character.
This can be achieved by using the complement character
class: `[^.]`. The hat character (`^`) at the beginning of a
character class means the complement character class.

After the modification the function call looks like this:
`re.findall(r'[Ii]f ([^.]*), then', s)`.
Another way of solving this problem is to use non-greedy
matching.
The repetition specifiers `+`, `*`, `?`, and `{m,n}` have
corresponding non-greedy versions: `+?`, `*?`, `??`, and `{m,n}?`.
These expressions use as few characters as possible to make
the whole pattern match some substring.
Using the non-greedy version, the function call looks like this:
`re.findall(r'[Ii]f (.*?), then', s)`.
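
Both fixes can be run against the same two sentences to confirm they give the result we wanted. The names `complement` and `non_greedy` are illustrative.

```python
import re

s = ("If I'm not in a hurry, then I should stay. "
     "On the other hand, if I leave, then I can sleep.")

# Fix 1: forbid the match from crossing a full stop
complement = re.findall(r'[Ii]f ([^.]*), then', s)
# Fix 2: let the repetition stop at the earliest possible ", then"
non_greedy = re.findall(r'[Ii]f (.*?), then', s)
print(complement)
print(non_greedy)
```

The complement-class version relies on the sentences being separated by a full stop, while the non-greedy version does not, so the two can differ on other inputs.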

### Functions in the `re` module

Below is a list of the most common functions in the `re` module:

* `re.match(pattern, str)`
* `re.search(pattern, str)`
* `re.findall(pattern, str)`
* `re.finditer(pattern, str)`
* `re.sub(pattern, replacement, str, count=0)`

Functions `match` and `search` return a *match object*, or `None` if nothing was found.
A match object describes the found occurrence.
The function `findall` returns a list of all the occurrences of
the pattern. The elements in the list are strings (or tuples of
strings, if the pattern contains more than one group).
The function `finditer` works like the `findall` function except
that instead of returning a list, it returns an iterator whose
items are match objects.
The function `sub` replaces the occurrences of the pattern in
`str` with the string replacement and returns the new string.
By default (`count=0`) all occurrences are replaced.
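
A short sketch of `finditer`, reusing the integer pattern from earlier. Since each item is a match object, the position of every occurrence is available alongside the text; the variable names are illustrative.

```python
import re

text = "23 + -24 = -1"

# finditer yields match objects lazily, so positions are available too
found = [(m.group(), m.start()) for m in re.finditer(r'[+-]?\d+', text)]
print(found)
```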

An example: the following program replaces every standalone "she"
with "he":

```
import re

# Named text rather than str to avoid shadowing the built-in str type
text = "She goes where she wants to, she's a sheriff."
newstr = re.sub(r'\b[Ss]he\b', 'he', text)
print(newstr)
```

This will print `he goes where he wants to, he's a sheriff.`

The `sub` function can also use backreferences to refer to the
matched string. The backreferences `\1`, `\2`, and so on, refer
to the groups of the pattern, in order.
An example:
```
import re

text = """He is the president of Russia.
He’s a powerful man."""
# count=1 limits the substitution to the first occurrence
newstr = re.sub(r'(\b[Hh]e\b)', r'\1 (Putin)', text, count=1)
print(newstr)
```

This will print
```
He (Putin) is the president of Russia.
He’s a powerful man.
```

### Match object

Functions `match`, `search`, and `finditer` use match objects
to describe the found occurrence.
The method `groups()` of the match object returns the tuple
of all the substrings matched by the groups of the pattern.
Each pair of parentheses in the pattern creates a new group.
These groups are referred to by indices 1, 2, ...
The group 0 is a special one: it refers to the match created by
the whole pattern.

Let’s look at the match object returned by the call

```
mo = re.search(r'\d+ (\d+) \d+ (\d+)',
'first 123 45 67 890 last')
```

The call `mo.groups()` returns a tuple `('45', '890')`.
We can access just some individual groups by using the
method `group(gid, ...)`.
For example, the call `mo.group(1)` will return `'45'`.
The zeroth group will represent the whole match:
`'123 45 67 890'`.

In addition to accessing the strings matched by the pattern
and its groups, the corresponding indices of the original string
can be accessed:

* The `start(gid=0)` and `end(gid=0)` methods return the start
and end indices of the matched group gid, respectively
* The method `span(gid)` just returns the pair of these start
and end indices
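
These index methods can be tried on the match object from the example above. A minimal sketch, with illustrative variable names:

```python
import re

text = 'first 123 45 67 890 last'
mo = re.search(r'\d+ (\d+) \d+ (\d+)', text)

whole_span = mo.span()      # indices of the whole match (group 0)
group1_span = mo.span(1)    # indices of the first group
# Slicing the original string with start/end recovers the matched text
recovered = text[mo.start(1):mo.end(1)]
print(whole_span, group1_span, recovered)
```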

The match object mo can also be used like a boolean value:

```python
mo = re.search(...)
if mo:
    # do something
```

will do something if a match was found.
Alternatively, the match object can be converted to a boolean
value by the call `found = bool(mo)`.

### Miscellaneous stuff

If the same pattern is used in many function calls, it may be
wise to precompile the pattern, mainly for efficiency reasons.
This can be done using the `compile(pattern, flags=0)` function
in the `re` module. The function returns a so-called RE object.
The RE object has method versions of the functions found in
module `re`.
The only difference is that the first parameter is not the
pattern since the precompiled pattern is stored in the RE
object.
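
A small sketch of precompiling, reusing the integer pattern from earlier. The name `integer_re` and the sample lines are illustrative.

```python
import re

# Compile once, then reuse the RE object's methods without repeating the pattern
integer_re = re.compile(r'[+-]?\d+')

lines = ["23 + -24 = -1", "no numbers here", "x = 7"]
extracted = [integer_re.findall(line) for line in lines]
print(extracted)
```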

The details of matching operation can be specified using
optional flags.
These flags can be given either inside the pattern or as a
parameter to the compile function.
Some of the more common flags are given in the following
table:

| Inline flag | Module attribute |
|-----|--------------|
|`(?i)` | re.IGNORECASE|
|`(?m)` | re.MULTILINE|
|`(?s)` | re.DOTALL|

The elements on the left are written inside the pattern and
belong at its very beginning; Python 3.11 and later raise an
error if such a global flag appears anywhere else.
On the right there are attributes of the `re` module that can be
given to the compile function as the second parameter.

The `IGNORECASE` flag makes lower- and uppercase
characters appear as equal.
The `MULTILINE` flag makes the special characters `^` and `$`
match the beginning and end of each line in addition to the
beginning and end of the whole string. This flag makes `\A`
differ from `^`, and `\Z` differ from `$`.
The `DOTALL` flag makes the character class `.` (dot) also
accept the newline character, in addition to all the other
characters.

When giving multiple flags to the compile function, the flags
can be separated with the `|` sign.
For example, `re.compile(pattern, re.MULTILINE | re.DOTALL)`.
This is equal to `re.compile('(?m)(?s)' + pattern)`.
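
The sketch below checks that passing flags as an argument and writing them inline behave the same on a small made-up string, and shows the effect of `DOTALL` on the dot. All names and the sample text are illustrative.

```python
import re

text = "Apple\nbanana\nAPPLE pie"

# Flags passed as an argument, combined with |
by_argument = re.compile(r"^apple", re.IGNORECASE | re.MULTILINE)
# The same flags written inline at the start of the pattern
inline = re.compile(r"(?i)(?m)^apple")

# By default the dot refuses to match a newline; DOTALL lifts that limit
dot_default = re.search(r"Apple.banana", text)
dot_all = re.search(r"Apple.banana", text, re.DOTALL)

print(by_argument.findall(text), inline.findall(text))
print(dot_default is None, dot_all is not None)
```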

# String Manipulation and Regular Expressions

In [2]:
from IPython.display import HTML

# Youtube
HTML('<iframe width="560" height="315" src="https://www.youtube.com/embed/K8L6KVGG-7o?rel=0&amp;controls=0&amp;showinfo=0" frameborder="0" allowfullscreen></iframe>')


One place where the Python language really shines is in the manipulation of strings. This section will cover some of Python's built-in string methods and formatting operations, before moving on to a quick guide to the extremely useful subject of regular expressions. Such string manipulation patterns come up often in the context of data science work, and are one big perk of Python in this context.

Strings in Python can be defined using either single or double quotations (they are functionally equivalent):

In [29]:
x = 'a string'
y = "a string"
x == y

True

In addition, it is possible to define multi-line strings using a triple-quote syntax:

In [31]:
multiline = """
one
two
three
"""
print(multiline)


one
two
three



With this, let's take a quick tour of some of Python's string manipulation tools.

## Simple String Manipulation in Python

For basic manipulation of strings, Python's built-in string methods can be extremely convenient.
If you have a background working in C or another low-level language, you will likely find the simplicity of Python's methods extremely refreshing.
We introduced Python's string type and a few of these methods earlier; here we'll dive a bit deeper.

### Formatting strings: Adjusting case

Python makes it quite easy to adjust the case of a string.
Here we'll look at the ``upper()``, ``lower()``, ``capitalize()``, ``title()``, and ``swapcase()`` methods, using the following messy string as an example:

In [33]:
fox = "tHe qUICk bROWn fOx."

To convert the entire string into upper-case or lower-case, you can use the ``upper()`` or ``lower()`` methods respectively:

In [34]:
fox.upper()

'THE QUICK BROWN FOX.'

In [35]:
fox.lower()

'the quick brown fox.'

A common formatting need is to capitalize just the first letter of each word, or perhaps the first letter of each sentence.
This can be done with the ``title()`` and ``capitalize()`` methods:

In [36]:
fox.title()

'The Quick Brown Fox.'

In [37]:
fox.capitalize()

'The quick brown fox.'

The cases can be swapped using the ``swapcase()`` method:

In [38]:
fox.swapcase()

'ThE QuicK BrowN FoX.'

### Formatting strings: Adding and removing spaces

Another common need is to remove spaces (or other characters) from the beginning or end of the string.
The basic method of removing characters is the ``strip()`` method, which strips whitespace from the beginning and end of the line:

In [39]:
line = '         this is the content         '
line.strip()

'this is the content'

To remove just space to the right or left, use ``rstrip()`` or ``lstrip()`` respectively:

In [40]:
line.rstrip()

'         this is the content'

In [41]:
line.lstrip()

'this is the content         '

To remove characters other than spaces, you can pass the desired character to the ``strip()`` method:

In [42]:
num = "000000000000435"
num.strip('0')

'435'

The opposite of this operation, adding spaces or other characters, can be accomplished using the ``center()``, ``ljust()``, and ``rjust()`` methods.

For example, we can use the ``center()`` method to center a given string within a given number of spaces:

line = "this is the content"
line.center(30)

'     this is the content      '

Similarly, ``ljust()`` and ``rjust()`` will left-justify or right-justify the string within spaces of a given length:

In [43]:
line.ljust(30)

'this is the content           '

In [44]:
line.rjust(30)

'           this is the content'

All these methods additionally accept any character which will be used to fill the space.
For example:

In [45]:
'435'.rjust(10, '0')

'0000000435'

Because zero-filling is such a common need, Python also provides ``zfill()``, which is a special method to left-pad a string with zeros:

In [47]:
'435'.zfill(10)

'0000000435'

### Finding and replacing substrings

If you want to find occurrences of a certain character in a string, the ``find()``/``rfind()``, ``index()``/``rindex()``, and ``replace()`` methods are the best built-in methods.

``find()`` and ``index()`` are very similar, in that they search for the first occurrence of a character or substring within a string, and return the index of the substring:

In [48]:
line = 'the quick brown fox jumped over a lazy dog'
line.find('fox')

16

In [49]:
line.index('fox')

16

The only difference between ``find()`` and ``index()`` is their behavior when the search string is not found; ``find()`` returns ``-1``, while ``index()`` raises a ``ValueError``:

In [50]:
line.find('bear')

-1

In [52]:
line.index('bear')

ValueError: substring not found

The related ``rfind()`` and ``rindex()`` work similarly, except they search for the first occurrence from the end rather than the beginning of the string:

In [53]:
line.rfind('a')

35

For the special case of checking for a substring at the beginning or end of a string, Python provides the ``startswith()`` and ``endswith()`` methods:

In [54]:
line.endswith('dog')

True

In [55]:
line.startswith('fox')

False

To go one step further and replace a given substring with a new string, you can use the ``replace()`` method.
Here, let's replace ``'brown'`` with ``'red'``:

In [56]:
line.replace('brown', 'red')

'the quick red fox jumped over a lazy dog'

The ``replace()`` function returns a new string, and will replace all occurrences of the input:

In [57]:
line.replace('o', '--')

'the quick br--wn f--x jumped --ver a lazy d--g'

### Splitting and partitioning strings

If you would like to find a substring *and then* split the string based on its location, the ``partition()`` and/or ``split()`` methods are what you're looking for.
Both will return a sequence of substrings.

The ``partition()`` method returns a tuple with three elements: the substring before the first instance of the split-point, the split-point itself, and the substring after:

In [58]:
line.partition('fox')

('the quick brown ', 'fox', ' jumped over a lazy dog')

The ``rpartition()`` method is similar, but searches from the right of the string.

The ``split()`` method is perhaps more useful; it finds *all* instances of the split-point and returns the substrings in between.
The default is to split on any whitespace, returning a list of the individual words in a string:

In [59]:
line.split()

['the', 'quick', 'brown', 'fox', 'jumped', 'over', 'a', 'lazy', 'dog']

A related method is ``splitlines()``, which splits on newline characters.
Let's do this with a Haiku, popularly attributed to the 17th-century poet Matsuo Bashō:

In [60]:
haiku = """matsushima-ya
aah matsushima-ya
matsushima-ya"""

haiku.splitlines()

['matsushima-ya', 'aah matsushima-ya', 'matsushima-ya']

Note that if you would like to undo a ``split()``, you can use the ``join()`` method, which returns a string built from a splitpoint and an iterable:

In [61]:
'--'.join(['1', '2', '3'])

'1--2--3'

A common pattern is to use the special character ``"\n"`` (newline) to join together lines that have been previously split, and recover the input:

In [62]:
print("\n".join(['matsushima-ya', 'aah matsushima-ya', 'matsushima-ya']))

matsushima-ya
aah matsushima-ya
matsushima-ya


## Format Strings

In the preceding methods, we have learned how to extract values from strings, and to manipulate strings themselves into desired formats.
Another use of string methods is to manipulate string *representations* of values of other types.
Of course, string representations can always be found using the ``str()`` function; for example:

In [63]:
pi = 3.14159
str(pi)

'3.14159'

In [64]:
"The value of pi is " + str(pi)

'The value of pi is 3.14159'

A more flexible way to do this is to use *format strings*, which are strings with special markers (noted by curly braces) into which string-formatted values will be inserted.
Here is a basic example:

In [65]:
"The value of pi is {}".format(pi)

'The value of pi is 3.14159'

Inside the ``{}`` marker you can also include information on exactly *what* you would like to appear there.
If you include a number, it will refer to the index of the argument to insert:

In [66]:
"""First letter: {0}. Last letter: {1}.""".format('A', 'Z')

'First letter: A. Last letter: Z.'

If you include a string, it will refer to the key of any keyword argument:

In [67]:
"""First letter: {first}. Last letter: {last}.""".format(last='Z', first='A')

'First letter: A. Last letter: Z.'

Finally, for numerical inputs, you can include format codes which control how the value is converted to a string.
For example, to print a number as a floating point with three digits after the decimal point, you can use the following:

In [68]:
"pi = {0:.3f}".format(pi)

'pi = 3.142'

As before, here the "``0``" refers to the index of the value to be inserted.
The "``:``" marks that format codes will follow.
The "``.3f``" encodes the desired precision: three digits beyond the decimal point, floating-point format.

This style of format specification is very flexible, and the examples here barely scratch the surface of the formatting options available.
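
A few more format codes, as a hedged sampler rather than a complete reference. The values and variable names are illustrative; the last line uses an f-string, which accepts the same codes after the colon.

```python
pi = 3.14159

padded = "{:>10.2f}".format(pi)       # right-aligned in 10 columns, 2 decimals
scientific = "{:.2e}".format(pi)      # scientific notation
percent = "{:.1%}".format(0.256)      # percentage with 1 decimal
with_commas = "{:,}".format(1234567)  # thousands separator
fstring = f"pi = {pi:.3f}"            # f-strings accept the same format codes
print(repr(padded), scientific, percent, with_commas, fstring)
```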