## 8.2 Formatting Strings

8.2.1 Presentation Types

When you specify a placeholder for a value in an f-string, Python assumes the value should be displayed as a string unless you specify another type. In some cases, the type is required. For example, let’s format the float value 17.489 rounded to the hundredths position:

In [1]:
f'{17.489:.2f}'

'17.49'

The d presentation type formats integer values as strings:

In [2]:
f'{10:d}'

'10'

The c presentation type formats an integer character code as the corresponding character:

In [3]:
f'{65:c} {97:c}'

'A a'

Strings

The s presentation type is the default. If you specify s explicitly, the value to format must be a variable that references a string, an expression that produces a string or a string literal, as in the first placeholder below. If you do not specify a presentation type, as in the second placeholder below, non-string values like the integer 7 are converted to strings:

In [4]:
f'{"hello":s} {7}'

'hello 7'

Floating-Point and Decimal Values

Let’s show the difference between f and e for a large value, each with three digits of precision to the right of the decimal point:

In [5]:
from decimal import Decimal

In [6]:
f'{Decimal("10000000000000000000000000.0"):.3f}'

'10000000000000000000000000.000'

In [7]:
f'{Decimal("10000000000000000000000000.0"):.3e}'

'1.000e+25'

For the e presentation type in snippet [7], the formatted value 1.000e+25 is equivalent to

1.000 x 10^25
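As a small sketch of the same contrast with an ordinary float (the name `value` and the number itself are chosen only for illustration), the f and e presentation types can be compared side by side:

```python
# Format one float with the f and e presentation types.
value = 12345.6789  # illustrative value, not from the text above

fixed = f'{value:.3f}'       # fixed-point, three digits after the decimal point
scientific = f'{value:.3e}'  # exponential notation, three digits after the point

print(fixed)       # 12345.679
print(scientific)  # 1.235e+04
```

Both results are strings, so converting `scientific` back with `float` gives the rounded value 12350.0 rather than the original number.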

Self Check: use the type specifier c to display the characters that correspond to the character codes 58, 45 and 41.

In [8]:
print(f'{58:c}{45:c}{41:c}')

:-)


8.2.2 Field Widths and Alignment

By default, Python right-aligns numbers and left-aligns other values such as strings. We enclose the results below in brackets ([]) so you can see how the values align in the field:

In [9]:
f'[{27:10d}]'

'[        27]'

In [10]:
f'[{3.5:10f}]'

'[  3.500000]'

In [11]:
f'[{"hello":10}]'

'[hello     ]'

Explicitly Specifying Left and Right Alignment in a Field

Recall that you can specify left and right alignment with < and >:

In [12]:
f'[{27:<15d}]'

'[27             ]'

In [13]:
f'[{3.5:<15f}]'

'[3.500000       ]'

In [14]:
f'[{"hello":>15}]'

'[          hello]'

Centering a Value in a Field

In [15]:
f'[{27:^7d}]'

'[  27   ]'

In [16]:
f'[{3.5:^7.1f}]'

'[  3.5  ]'

In [17]:
f'[{"hello":^7}]'

'[ hello ]'
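Field widths and alignment are often combined to line up columns of output. The following is a minimal sketch; the `rows` data and the chosen widths are hypothetical:

```python
# Hypothetical (name, quantity, price) rows used only for illustration.
rows = [('apple', 3, 0.5), ('watermelon', 12, 3.25)]

lines = []
for name, quantity, price in rows:
    # name: left-aligned in 12 columns
    # quantity: right-aligned in 5 columns
    # price: right-aligned in 8 columns with two digits of precision
    lines.append(f'[{name:<12}][{quantity:>5d}][{price:>8.2f}]')

print('\n'.join(lines))
```

Because every placeholder has a fixed width, each formatted line has the same length and the brackets line up vertically.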

Self Check: Display on separate lines the name 'Amanda' right-, center- and left-aligned in a field of 10 characters. Enclose each result in brackets so you can see the alignment results more clearly.

In [18]:
print(f'[{"Amanda":>10}]\n[{"Amanda":^10}]\n[{"Amanda":<10}]')

[    Amanda]
[  Amanda  ]
[Amanda    ]


8.2.3 Numeric Formatting

Formatting Positive Numbers with Signs

Sometimes it’s desirable to force the sign on a positive number:

In [19]:
f'[{27:+10d}]'

'[       +27]'

The + before the field width specifies that a positive number should be preceded by a +. A negative number always starts with a -. To fill the remaining characters of the field with 0s rather than spaces, place a 0 before the field width (and after the + if there is one):

In [20]:
f'[{27:+010d}]'

'[+000000027]'

Using a Space Where a + Sign Would Appear in a Positive Value

A space indicates that positive numbers should show a space character in the sign position. This is useful for aligning positive and negative values for display purposes:

In [22]:
print(f'{27:d}\n{27: d}\n{-27: d}')

27
 27
-27


Grouping Digits

You can format numbers with thousands separators by using a comma (,), as follows:

In [23]:
f'{12345678:,d}'

'12,345,678'

In [24]:
f'{123456.78:,.2f}'

'123,456.78'
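The sign, field width, grouping and precision options can appear together in one format specifier. As a sketch with made-up amounts, the following formats a small column of values so the decimal points align:

```python
# Illustrative amounts; any floats would do.
amounts = [1234567.891, -98765.4321, 42.0]

# + forces a sign, 15 is the field width, the comma groups thousands
# and .2f keeps two digits to the right of the decimal point.
formatted = [f'{amount:+15,.2f}' for amount in amounts]

for line in formatted:
    print(line)
```

The options must appear in this order: sign, width, comma, then precision and presentation type.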

Self Check: Print the values 10240.473 and -3210.9521, each preceded by its sign, in 10-character fields with thousands separators, their decimal points aligned vertically and two digits of precision.

In [25]:
print(f'{10240.473:+10,.2f}\n{-3210.9521:+10,.2f}')

+10,240.47
 -3,210.95


8.2.4 String’s format Method

Python’s f-strings were added to the language in version 3.6. Before that, formatting was performed with the string method format. In fact, f-string formatting is based on the format method’s capabilities. We show you the format method here because you’ll encounter it in code written prior to Python 3.6.

In [26]:
'{:.2f}'.format(17.489)

'17.49'

Multiple Placeholders

In [27]:
'{} {}'.format('Amanda',  'Cyan')

'Amanda Cyan'

Referencing Arguments By Position Number

In [28]:
'{0} {0} {1}'.format('Happy', 'Birthday')

'Happy Happy Birthday'

Referencing Keyword Arguments

In [29]:
'{first} {last}'.format(first='Amanda', last='Gray')

'Amanda Gray'

In [31]:
'{last} {first}'.format(first='Amanda', last='Gray')

'Gray Amanda'
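Format specifiers work the same way in the format method as in f-strings. The sketch below, which uses illustrative values for `name` and `balance`, builds the same string three ways:

```python
name = 'Amanda'
balance = 1234.5  # illustrative value

# Positional references with format specifiers after the colon
with_positions = '{0:<10}|{1:>12,.2f}'.format(name, balance)

# Keyword references with the same specifiers
with_keywords = '{who:<10}|{amount:>12,.2f}'.format(who=name, amount=balance)

# Equivalent f-string
with_fstring = f'{name:<10}|{balance:>12,.2f}'

print(with_positions == with_keywords == with_fstring)  # True
```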

Self Check

In [32]:
print('{:c}{:c}{:c}'.format(58, 45, 41))

:-)


In [33]:
print('[{0:>10}]\n[{0:^10}]\n[{0:<10}]'.format('Amanda'))

[    Amanda]
[  Amanda  ]
[Amanda    ]


In [34]:
print('{:+10,.2f}\n{:+10,.2f}'.format(10240.473, -3210.9521))

+10,240.47
 -3,210.95


## 8.3 Concatenating and Repeating Strings

In earlier chapters, we used the + operator to concatenate strings and the * operator to repeat strings. You also can perform these operations with augmented assignments. Strings are immutable, so each operation assigns a new string object to the variable:

In [35]:
s1 = 'happy'

In [36]:
s2 = 'birthday'

In [37]:
s1 += ' ' + s2

In [38]:
s1

'happy birthday'

In [39]:
symbol = '>'

In [40]:
symbol *= 5

In [41]:
symbol

'>>>>>'
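One way to see that an augmented assignment creates a new string rather than modifying the existing one is to keep a second reference to the original. In this sketch the names `greeting` and `alias` are illustrative:

```python
greeting = 'happy'
alias = greeting          # second name bound to the same string object

greeting += ' birthday'   # builds a new string and rebinds greeting to it

print(greeting)  # happy birthday
print(alias)     # happy
```

The variable `alias` still refers to the original text, which shows that the original string was never changed.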

Self Check:  Use the += operator to concatenate your first and last name. Then use the *= operator to create a bar of asterisks with the same number of characters as your full name and display the bar above and below your name.

In [52]:
name = 'Samantha'

In [53]:
name += ' Cress'

In [54]:
bar = '*'

In [55]:
bar *= len(name)

In [56]:
print(f'{bar}\n{name}\n{bar}')

**************
Samantha Cress
**************


## 8.4 Stripping Whitespace from Strings

There are several string methods for removing whitespace from the ends of a string. Because strings are immutable, each method returns a new string and leaves the original unmodified.

Removing Leading and Trailing Whitespace
Let’s use string method strip to remove the leading and trailing whitespace from a string:

In [57]:
sentence = '\t \n This is a test string. \t\t \n'

In [58]:
sentence.strip()

'This is a test string.'

Removing Leading Whitespace
Method lstrip removes only leading whitespace:

In [60]:
sentence.lstrip()

'This is a test string. \t\t \n'

Removing Trailing Whitespace
Method rstrip removes only trailing whitespace:

In [61]:
sentence.rstrip()

'\t \n This is a test string.'

As the outputs demonstrate, these methods remove all kinds of whitespace, including spaces, newlines and tabs.
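A common use of strip is cleaning a batch of input lines. The following sketch assumes a hypothetical list `raw_lines` and shows that the original strings are left untouched:

```python
# Hypothetical input lines with assorted leading and trailing whitespace.
raw_lines = ['  alpha\n', '\tbeta  ', 'gamma']

# strip returns a new string for each element
cleaned = [line.strip() for line in raw_lines]

print(cleaned)    # ['alpha', 'beta', 'gamma']
print(raw_lines)  # the original strings still contain their whitespace
```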

Self Check: Use the methods in this section to strip the whitespace from the following string, which has spaces at both the beginning and the end:

In [62]:
name = '       Margo Magenta     '

In [63]:
name.strip()

'Margo Magenta'

In [64]:
name.lstrip()

'Margo Magenta     '

In [65]:
name.rstrip()

'       Margo Magenta'

## 8.5 Changing Character Case

Capitalizing Only a String’s First Character

Method capitalize copies the original string and returns a new string with only the first letter capitalized (this is sometimes called sentence capitalization):

In [66]:
'happy birthday'.capitalize()

'Happy birthday'

Capitalizing the First Character of Every Word in a String

Method title copies the original string and returns a new string with only the first character of each word capitalized (this is sometimes called book-title capitalization):

In [67]:
'strings: a deeper look'.title()

'Strings: A Deeper Look'

Self Check: Demonstrate the results of calling capitalize and title on the string 'happy new year'.

In [68]:
test_string = 'happy new year'

In [69]:
test_string.capitalize()

'Happy new year'

In [70]:
test_string.title()

'Happy New Year'
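Both methods also convert the remaining letters to lowercase, which is easy to miss when the input is already lowercase. A brief sketch with a deliberately mixed-case string (the variable name `mixed` is illustrative):

```python
mixed = 'hAPPY nEW yEAR'

sentence_case = mixed.capitalize()  # first letter upper, the rest lower
title_case = mixed.title()          # first letter of each word upper, the rest lower

print(sentence_case)  # Happy new year
print(title_case)     # Happy New Year
```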

## 8.6 Comparison Operators for Strings

Strings may be compared with the comparison operators. Recall that strings are compared based on their underlying integer numeric values. So uppercase letters compare as less than lowercase letters because uppercase letters have lower integer values. For example, 'A' is 65 and 'a' is 97. You’ve seen that you can check character codes with ord:

In [71]:
print(f'A: {ord("A")}; a: {ord("a")}')

A: 65; a: 97


Let’s compare the strings 'Orange' and 'orange' using the comparison operators:

In [72]:
'Orange' == 'orange'

False

In [73]:
'Orange' != 'orange'

True

In [74]:
'Orange' < 'orange'

True

In [75]:
'Orange' <= 'orange'

True

In [76]:
'Orange' > 'orange'

False

In [77]:
'Orange' >= 'orange'

False
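Because uppercase letters compare as less than lowercase letters, sorting mixed-case strings places all capitalized words first. The sketch below uses an illustrative list, `fruits`, and shows one common workaround: comparing lowercase copies.

```python
fruits = ['orange', 'Banana', 'apple', 'Cherry']  # illustrative data

default_order = sorted(fruits)                    # compares character codes
case_insensitive = sorted(fruits, key=str.lower)  # compares lowercase copies

same_ignoring_case = 'Orange'.lower() == 'orange'.lower()

print(default_order)       # ['Banana', 'Cherry', 'apple', 'orange']
print(case_insensitive)    # ['apple', 'Banana', 'Cherry', 'orange']
print(same_ignoring_case)  # True
```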

## 8.7 Searching for Substrings

Counting Occurrences
String method count returns the number of times its argument occurs in the string on which the method is called:

In [78]:
sentence = 'to be or not to be that is the question'

In [79]:
sentence.count('to')

2

If you specify as the second argument a start_index, count searches only the slice string[start_index:], that is, from start_index through the end of the string:

In [80]:
sentence.count('to', 12)

1

If you specify as the second and third arguments the start_index and end_index, count searches only the slice string[start_index:end_index]—that is, from start_index up to, but not including, end_index:

In [81]:
sentence.count('that', 12, 25)

1

Like count, each of the other string methods presented in this section has start_index and end_index arguments for searching only a slice of the original string.

Locating a Substring in a String
String method index searches for a substring within a string and returns the first index at which the substring is found; otherwise, a ValueError occurs:

In [82]:
sentence.index('be')

3

String method rindex performs the same operation as index, but searches from the end of the string and returns the last index at which the substring is found; otherwise, a ValueError occurs:

In [83]:
sentence.rindex('be')

16

String methods find and rfind perform the same tasks as index and rindex but, if the substring is not found, return -1 rather than causing a ValueError.
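The difference between the two pairs of methods shows up only when the substring is missing. A minimal sketch, searching for a word that does not occur in the sentence:

```python
sentence = 'to be or not to be that is the question'

# find returns -1 when the substring is absent
position = sentence.find('whether')

# index raises a ValueError instead
try:
    sentence.index('whether')
    outcome = 'found'
except ValueError:
    outcome = 'ValueError'

print(position)  # -1
print(outcome)   # ValueError
```

Which one to use is a design choice: find suits code that tests the result with an if statement, while index suits code that treats a missing substring as an error.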

Determining Whether a String Contains a Substring 
If you need to know only whether a string contains a substring, use operator in or not in:

In [84]:
'that' in sentence

True

In [85]:
'THAT' in sentence

False

In [86]:
'THAT' not in sentence

True

Locating a Substring at the Beginning or End of a String
String methods startswith and endswith return True if the string starts with or ends with a specified substring:

In [87]:
sentence.startswith('to')

True

In [88]:
sentence.startswith('be')

False

In [89]:
sentence.endswith('question')

True

In [90]:
sentence.endswith('quest')

False

Self Check: Create a loop that locates and displays every word that starts with 't' in the string 'to be or not to be that is the question'.

In [91]:
for word in 'to be or not to be that is the question'.split():
    if word.startswith('t'):
        print(word, end=' ')

to to that the 

## 8.8 Replacing Substrings

Method replace takes two substrings. It searches a string for the substring in its first argument and replaces each occurrence with the substring in its second argument. The method returns a new string containing the results. Let’s replace tab characters with commas:

In [92]:
values = '1\t2\t3\t4\t5'

In [93]:
values.replace('\t', ',')

'1,2,3,4,5'

Self Check: Replace the spaces in the string '1 2 3 4 5' with ' --> '.

In [94]:
'1 2 3 4 5'.replace(' ', ' --> ')

'1 --> 2 --> 3 --> 4 --> 5'

## 8.9 Splitting and Joining Strings

Splitting Strings

To tokenize a string at a custom delimiter (such as each comma-and-space pair), pass the delimiter string (such as ', ') as the argument to split:

In [95]:
letters = 'A, B, C, D'

In [96]:
letters.split(', ')

['A', 'B', 'C', 'D']

If you provide an integer as the second argument, it specifies the maximum number of splits. The last token is the remainder of the string after the maximum number of splits:  

In [97]:
letters.split(', ', 2)

['A', 'B', 'C, D']

There is also an rsplit method that performs the same task as split but processes the maximum number of splits from the end of the string toward the beginning.
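As a quick sketch of the difference, the same string and the same maximum number of splits give different results with split and rsplit:

```python
letters = 'A, B, C, D'

from_left = letters.split(', ', 2)    # splits at the first two delimiters
from_right = letters.rsplit(', ', 2)  # splits at the last two delimiters

print(from_left)   # ['A', 'B', 'C, D']
print(from_right)  # ['A, B', 'C', 'D']
```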

Joining Strings
String method join concatenates the strings in its argument, which must be an iterable containing only string values; otherwise, a TypeError occurs. The separator between the concatenated items is the string on which you call join. The following code creates strings containing comma-separated lists of values:

In [98]:
letters_list = ['A', 'B', 'C', 'D']

In [99]:
','.join(letters_list)

'A,B,C,D'

The next snippet joins the results of a list comprehension that creates a list of strings:

In [100]:
','.join([str(i) for i in range(10)])

'0,1,2,3,4,5,6,7,8,9'

String Methods partition and rpartition 
String method partition splits a string into a tuple of three strings based on the method’s separator argument. The three strings are

the part of the original string before the separator,
the separator itself, and
the part of the string after the separator.

This might be useful for splitting more complex strings. Consider a string representing a student’s name and grades:

In [101]:
'Amanda: 89, 97, 92'

'Amanda: 89, 97, 92'

Let’s split the original string into the student’s name, the separator ': ' and a string representing the list of grades:

In [102]:
'Amanda: 89, 97, 92'.partition(': ')

('Amanda', ': ', '89, 97, 92')
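Method rpartition works the same way but searches for the separator from the end of the string. The sketch below uses a made-up URL to separate the final path component from the rest:

```python
# Hypothetical URL used only for illustration.
url = 'http://www.example.com/books/chapter8.html'

first_split = url.partition('/')   # splits at the first '/'
last_split = url.rpartition('/')   # splits at the last '/'

print(first_split)  # ('http:', '/', '/www.example.com/books/chapter8.html')
print(last_split)   # ('http://www.example.com/books', '/', 'chapter8.html')
```

Both methods always return a three-element tuple, so the result can be unpacked into three variables without checking its length.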

String Method splitlines

Method splitlines returns a list of new strings representing the lines of text split at each newline character in the original string. Recall that Python stores multiline strings with embedded \n characters to represent the line breaks, as shown in snippet [106]:

In [105]:
lines = """This is line
This is line2
This is line3"""

In [106]:
lines

'This is line\nThis is line2\nThis is line3'

In [107]:
lines.splitlines()

['This is line', 'This is line2', 'This is line3']

Passing True to splitlines keeps the newlines at the end of each string:

In [108]:
lines.splitlines(True)

['This is line\n', 'This is line2\n', 'This is line3']

Self Check: Use split and join in one statement to reformat the string 'Pamela White' into the string 'White, Pamela'.

In [109]:
', '.join(reversed('Pamela White'.split()))

'White, Pamela'

## 8.10 Characters and Character-Testing Methods

Characters (digits, letters and symbols such as $, @, % and *) are the fundamental building blocks of programs. Every program is composed of characters that, when grouped meaningfully, represent instructions and data that the interpreter uses to perform tasks. Many programming languages have separate string and character types. In Python, a character is simply a one-character string.

Python provides string methods for testing whether a string matches certain characteristics. For example, string method isdigit returns True if the string on which you call the method contains only the digit characters (0–9). You might use this when validating user input that must contain only digits:

In [110]:
'-27'.isdigit()

False

In [111]:
'27'.isdigit()

True
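Several other character-testing methods follow the same pattern as isdigit. The sketch below applies four of them (isalpha, isalnum, isdigit and isspace) to a few illustrative strings; the `samples` list is hypothetical:

```python
# Illustrative strings, including an empty one.
samples = ['Python', 'Python3', '2024', ' \t\n', '']

results = {}
for text in samples:
    # Each method returns True only if every character passes the test
    # and the string contains at least one character.
    results[text] = (text.isalpha(), text.isalnum(),
                     text.isdigit(), text.isspace())

for text, checks in results.items():
    print(repr(text), checks)
```

Note that all four methods return False for the empty string.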

## 8.11 Raw Strings

Backslash characters in a regular string literal introduce escape sequences, such as \n for a newline, so a literal backslash must be written as two backslashes (\\). That makes strings containing many backslashes, such as Microsoft Windows file paths, hard to read. For such cases, raw strings, preceded by the character r, are more convenient. They treat each backslash as a regular character, rather than the beginning of an escape sequence:

In [1]:
file_path = r'C:\MyFolder\MySubFolder\MyFile.txt'

In [2]:
file_path

'C:\\MyFolder\\MySubFolder\\MyFile.txt'

Python converts the raw string to a regular string whose displayed representation still uses two backslash characters for each backslash, as shown in the last snippet. Raw strings can make your code more readable, particularly when using the regular expressions that we discuss in the next section. Regular expressions often contain many backslash characters.
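To make the relationship between the two spellings concrete, the following sketch compares a regular string literal with its raw equivalent. The path is made up for illustration:

```python
# The same hypothetical path written as a regular and as a raw string literal.
regular = 'C:\\MyFolder\\new_file.txt'
raw = r'C:\MyFolder\new_file.txt'

same_text = regular == raw    # both literals produce the same string

newline_length = len('\n')    # one character: a newline
raw_length = len(r'\n')       # two characters: a backslash and the letter n

print(same_text)                   # True
print(newline_length, raw_length)  # 1 2
```

The r prefix affects only how the literal is read from the source code. The resulting object is an ordinary string.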

## 8.12 Introduction to Regular Expressions

A regular expression string describes a search pattern for matching characters in other strings.
Regular expressions can help you extract data from unstructured text, such as social media posts. They’re also important for ensuring that data is in the correct format before you attempt to process it.

Validating Data
Before working with text data, you’ll often use regular expressions to validate the data. For example, you can check that:

A U.S. ZIP Code consists of five digits (such as 02215) or five digits followed by a hyphen and four more digits (such as 02215-4775).
A string last name contains only letters, spaces, apostrophes and hyphens.
An e-mail address contains only the allowed characters in the allowed order.
A U.S. Social Security number contains three digits, a hyphen, two digits, a hyphen and four digits, and adheres to other rules about the specific numbers that can be used in each group of digits.

Other Uses of Regular Expressions
In addition to validating data, regular expressions often are used to:

Extract data from text (sometimes known as scraping)—For example, locating all URLs in a web page. [You might prefer tools like BeautifulSoup, XPath and lxml.]
Clean data—For example, removing data that’s not required, removing duplicate data, handling incomplete data, fixing typos, ensuring consistent data formats, dealing with outliers and more.
Transform data into other formats—For example, reformatting data that was collected as tab-separated or space-separated values into comma-separated values (CSV) for an application that requires data to be in CSV format.

8.12.1 re Module and Function fullmatch

In [3]:
import re

One of the simplest regular expression functions is fullmatch, which checks whether the entire string in its second argument matches the pattern in its first argument.

Matching Literal Characters
Let’s begin by matching literal characters—that is, characters that match themselves:

In [4]:
pattern = '02215'

In [5]:
'Match' if re.fullmatch(pattern, '02215') else 'No match'

'Match'

In [9]:
'Match' if re.fullmatch(pattern, '51220') else 'No match'

'No match'

Metacharacters, Character Classes and Quantifiers

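The \ metacharacter begins each of the predefined character classes, each matching a specific set of characters. Let’s validate a five-digit ZIP Code. In the pattern \d{5}, \d is the character class that matches any digit (0–9), and the quantifier {5} requires exactly five consecutive occurrences of it: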

In [10]:
'Valid' if re.fullmatch(r'\d{5}', '02215') else 'Invalid'

'Valid'

In [11]:
'Valid' if re.fullmatch(r'\d{5}', '9876') else 'Invalid'

'Invalid'
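The five-digit check can be extended to also accept the ZIP+4 form mentioned earlier (such as 02215-4775). This is only a sketch: the helper name `is_valid_zip` is hypothetical, and the pattern relies on parentheses for grouping and on the ? quantifier, both of which are covered later in this section.

```python
import re

# Five digits, optionally followed by a hyphen and four more digits.
zip_pattern = r'\d{5}(-\d{4})?'

def is_valid_zip(text):
    """Return True if text is a 5-digit ZIP Code or a ZIP+4 code."""
    return re.fullmatch(zip_pattern, text) is not None

print(is_valid_zip('02215'))       # True
print(is_valid_zip('02215-4775'))  # True
print(is_valid_zip('02215-47'))    # False
```

This checks only the shape of the text, not whether the ZIP Code actually exists.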

Custom Character Classes
Square brackets, [], define a custom character class that matches a single character. For example, [aeiou] matches a lowercase vowel, [A-Z] matches an uppercase letter, [a-z] matches a lowercase letter and [a-zA-Z] matches any lowercase or uppercase letter.

Let’s validate a simple first name with no spaces or punctuation. We’ll ensure that it begins with an uppercase letter (A–Z) followed by any number of lowercase letters (a–z):

In [12]:
'Valid' if re.fullmatch('[A-Z][a-z]*', 'Wally') else 'Invalid'

'Valid'

In [13]:
'Valid' if re.fullmatch('[A-Z][a-z]*', 'eva') else 'Invalid'

'Invalid'

When a custom character class starts with a caret (^), the class matches any character that’s not specified. So [^a-z] matches any character that’s not a lowercase letter:

In [14]:
'Match' if re.fullmatch('[^a-z]', 'A') else 'No match'

'Match'

In [15]:
'Match' if re.fullmatch('[^a-z]', 'a') else 'No match'

'No match'

Metacharacters in a custom character class are treated as literal characters—that is, the characters themselves. So [*+$] matches a single *, + or $ character:

In [16]:
'Match' if re.fullmatch('[*+$]', '*') else 'No match'

'Match'

In [17]:
'Match' if re.fullmatch('[*+$]', '!') else 'No match'

'No match'

* vs. + Quantifier
If you want to require at least one lowercase letter in a first name, you can replace the * quantifier in snippet [12] with +, which matches at least one occurrence of a subexpression:

In [18]:
'Valid' if re.fullmatch('[A-Z][a-z]+', 'Wally') else 'Invalid'

'Valid'

In [19]:
'Valid' if re.fullmatch('[A-Z][a-z]+', 'E') else 'Invalid'

'Invalid'

Note: Both * and + are greedy: they match as many characters as possible. So the regular expression [A-Z][a-z]+ matches 'Al', 'Eva', 'Samantha', 'Benjamin' and any other words that begin with a capital letter followed by at least one lowercase letter.

Other Quantifiers
The ? quantifier matches zero or one occurrences of a subexpression:

In [20]:
'Match' if re.fullmatch('labell?ed', 'labelled') else 'No match'

'Match'

In [21]:
'Match' if re.fullmatch('labell?ed', 'labeled') else 'No match'

'Match'

In [22]:
'Match' if re.fullmatch('labell?ed', 'labellled') else 'No match'

'No match'

You can match at least n occurrences of a subexpression with the {n,} quantifier. The following regular expression matches strings containing at least three digits:

In [23]:
'Match' if re.fullmatch(r'\d{3,}', '123') else 'No match'

'Match'

In [24]:
'Match' if re.fullmatch(r'\d{3,}', '1234567890') else 'No match'

'Match'

In [25]:
'Match' if re.fullmatch(r'\d{3,}', '12') else 'No match'

'No match'

You can match between n and m (inclusive) occurrences of a subexpression with the {n,m} quantifier. The following regular expression matches strings containing 3 to 6 digits:

In [26]:
'Match' if re.fullmatch(r'\d{3,6}', '123') else 'No match'

'Match'

In [27]:
'Match' if re.fullmatch(r'\d{3,6}', '123456') else 'No match'

'Match'

In [28]:
'Match' if re.fullmatch(r'\d{3,6}', '1234567') else 'No match'

'No match'

In [29]:
'Match' if re.fullmatch(r'\d{3,6}', '12') else 'No match'

'No match'
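Custom character classes and the {n,m} quantifier are often combined in validation rules. The sketch below checks a made-up username rule (a lowercase letter followed by 2 to 15 lowercase letters, digits or underscores); the rule and the function name `is_valid_username` are assumptions for illustration only.

```python
import re

# Hypothetical rule: one lowercase letter, then 2 to 15 characters
# that are lowercase letters, digits or underscores.
username_pattern = r'[a-z][a-z0-9_]{2,15}'

def is_valid_username(text):
    """Return True if the entire text matches username_pattern."""
    return re.fullmatch(username_pattern, text) is not None

print(is_valid_username('wally_99'))  # True
print(is_valid_username('ab'))        # False: too short
print(is_valid_username('9lives'))    # False: starts with a digit
```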

Self Check: Create and test a regular expression that matches a street address consisting of a number with one or more digits followed by two words of one or more characters each. The tokens should be separated by one space each, as in 123 Main Street.

In [30]:
import re

In [31]:
street = r'\d+ [A-Z][a-z]* [A-Z][a-z]*'

In [32]:
'Match' if re.fullmatch(street, '123 Main Street') else 'No match'

'Match'

In [33]:
'Match' if re.fullmatch(street, 'Main Street') else 'No match'

'No match'

8.12.2 Replacing Substrings and Splitting Strings

The re module provides function sub for replacing patterns in a string, and function split for breaking a string into pieces, based on patterns.

Function sub—Replacing Patterns 
By default, the re module’s sub function replaces all occurrences of a pattern with the replacement text you specify. Let’s convert a tab-delimited string to comma-delimited:

In [34]:
import re

In [35]:
re.sub(r'\t', ', ', '1\t2\t3\t4')

'1, 2, 3, 4'

The sub function receives three required arguments:

the pattern to match (the tab character '\t')
the replacement text (', ') and
the string to be searched ('1\t2\t3\t4')

and returns a new string. The keyword argument count can be used to specify the maximum number of replacements:

In [36]:
re.sub(r'\t', ', ', '1\t2\t3\t4', count=2)

'1, 2, 3\t4'

Function split 
The split function tokenizes a string, using a regular expression to specify the delimiter, and returns a list of strings. Let’s tokenize a string by splitting it at any comma that’s followed by 0 or more whitespace characters—\s is the whitespace character class and * indicates zero or more occurrences of the preceding subexpression:

In [37]:
re.split(r',\s*', '1, 2, 3,4, 5,6,7,8')

['1', '2', '3', '4', '5', '6', '7', '8']

Use the keyword argument maxsplit to specify the maximum number of splits:

In [38]:
re.split(r',\s*', '1, 2, 3,4, 5,6,7,8', maxsplit=3)

['1', '2', '3', '4, 5,6,7,8']
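Patterns make sub and split more flexible than the string methods replace and split, because one pattern can describe several delimiters or runs of varying length. A brief sketch with illustrative input strings:

```python
import re

# Split at a comma or a semicolon, each followed by optional whitespace.
colors = re.split(r'[,;]\s*', 'red, green;blue;  yellow')

# Collapse each run of whitespace characters into a single space.
tidy = re.sub(r'\s+', ' ', 'too    many \t spaces')

print(colors)  # ['red', 'green', 'blue', 'yellow']
print(tidy)    # too many spaces
```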

Self Check: Replace each occurrence of one or more adjacent tab characters in the following string with a comma and a space:

In [39]:
import re

In [40]:
re.sub(r'\t+', ', ', 'A\tB\t\tC\t\t\tD')

'A, B, C, D'

Self Check: Use a regular expression and the split function to split the following string at one or more adjacent $ characters.

In [41]:
re.split(r'\$+', '123$Main$$Street')

['123', 'Main', 'Street']

8.12.3 Other Search Functions; Accessing Matches

Function search—Finding the First Match Anywhere in a String
Function search looks in a string for the first occurrence of a substring that matches a regular expression and returns a match object (of type re.Match in Python 3.7 and later; earlier versions display it as SRE_Match) that contains the matching substring. The match object’s group method returns that substring:

In [42]:
import re

In [43]:
result = re.search('Python', 'Python is fun')

In [44]:
result.group() if result else 'not found'

'Python'

Function search returns None if the string does not contain the pattern:

In [45]:
result2 = re.search('fun!', 'Python is fun')

In [46]:
result2.group() if result2 else 'not found'

'not found'

Ignoring Case with the Optional flags Keyword Argument

Many re module functions receive an optional flags keyword argument that changes how regular expressions are matched. For example, matches are case sensitive by default, but by using the re module’s IGNORECASE constant, you can perform a case-insensitive search:

In [47]:
result3 = re.search('Sam', 'SAM WHITE', flags=re.IGNORECASE)

In [48]:
result3.group() if result3 else 'not found'

'SAM'

Metacharacters That Restrict Matches to the Beginning or End of a String
The ^ metacharacter at the beginning of a regular expression (and not inside square brackets) is an anchor indicating that the expression matches only the beginning of a string:

In [49]:
result = re.search('^Python', 'Python is fun')

In [50]:
result.group() if result else 'not found'

'Python'

In [51]:
result = re.search('^fun', 'Python is fun')

In [52]:
result.group() if result else 'not found'

'not found'

Similarly, the $ metacharacter at the end of a regular expression is an anchor indicating that the expression matches only the end of a string:

In [53]:
result = re.search('Python$', 'Python is fun')

In [54]:
result.group() if result else 'not found'

'not found'

In [55]:
result = re.search('fun$', 'Python is fun')

In [56]:
result.group() if result else 'not found'

'fun'

Function findall and finditer—Finding All Matches in a String

Function findall finds every matching substring in a string and returns a list of the matching substrings. Let’s extract all the U.S. phone numbers from a string. For simplicity we’ll assume that U.S. phone numbers have the form ###-###-####:

In [57]:
contact = 'Wally White, Home: 555-555-1234, Work: 555-555-4321'

In [58]:
re.findall(r'\d{3}-\d{3}-\d{4}', contact)

['555-555-1234', '555-555-4321']

Function finditer works like findall, but returns a lazy iterable of match objects. For large numbers of matches, using finditer can save memory because it returns one match at a time, whereas findall returns all the matches at once:

In [59]:
for phone in re.finditer(r'\d{3}-\d{3}-\d{4}', contact):
    print(phone.group())

555-555-1234
555-555-4321


Capturing Substrings in a Match
You can use parentheses metacharacters—( and )—to capture substrings in a match. For example, let’s capture as separate substrings the name and e-mail address in the string text:

In [60]:
text = 'Charlie Cyan, e-mail: demo1@deitel.com'

In [61]:
pattern = r'([A-Z][a-z]+ [A-Z][a-z]+), e-mail: (\w+@\w+\.\w{3})'

In [62]:
result = re.search(pattern, text)

In [63]:
result.groups()

('Charlie Cyan', 'demo1@deitel.com')

The match object’s group method returns the entire match as a single string:

In [64]:
result.group()

'Charlie Cyan, e-mail: demo1@deitel.com'

You can access each captured substring by passing an integer to the group method. The captured substrings are numbered from 1 (unlike list indices, which start at 0):

In [65]:
result.group(1)

'Charlie Cyan'

In [66]:
result.group(2)

'demo1@deitel.com'
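Since groups returns a tuple, the captured substrings can be unpacked directly into variables. The sketch below wraps the search in a hypothetical helper, `parse_contact`, that returns None when the text does not match:

```python
import re

text = 'Charlie Cyan, e-mail: demo1@deitel.com'
pattern = r'([A-Z][a-z]+ [A-Z][a-z]+), e-mail: (\w+@\w+\.\w{3})'

def parse_contact(line):
    """Return a (name, email) tuple if line matches pattern, otherwise None."""
    match = re.search(pattern, line)
    return match.groups() if match else None

name, email = parse_contact(text)
missing = parse_contact('no contact details here')

print(name)     # Charlie Cyan
print(email)    # demo1@deitel.com
print(missing)  # None
```

Checking for None before unpacking avoids the AttributeError that would occur if groups were called on a failed search.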

Self Check: Assume you have a string representing an addition problem such as
'10 + 5'
Use a regular expression to break the string into three groups representing the two operands and the operator, then display the groups.

In [None]:
import re

In [143]:
result = re.search(r'(\d+) ([-+*/]) (\d+)', '10 + 5')

In [144]:
result.groups()

('10', '+', '5')

In [145]:
result.group(1)

'10'

In [146]:
result.group(2)

'+'

In [147]:
result.group(3)

'5'