# MANIPULATING STRINGS

Text is one of the most common forms of data your programs will handle. You already know how to concatenate two string values together with the `+` operator, but you can do much more than that. You can extract partial strings from string values, add or remove spacing, convert letters to lowercase or uppercase, and check that strings are formatted correctly. You can even write Python code to access the clipboard for copying and pasting text.

In this chapter, you’ll learn all this and more. Then you’ll work through two different programming projects: a simple clipboard that stores multiple strings of text and a program to automate the boring chore of formatting pieces of text.

## Working with Strings

Let’s look at some of the ways Python lets you write, print, and access strings in your code.

### String Literals

Typing string values in Python code is fairly straightforward: they begin and end with a single quote. But then how can you use a quote inside a string? Typing `'That is Alice's cat.'` won’t work, because Python thinks the string ends after `Alice`, and the rest (`s cat.'`) is invalid Python code. Fortunately, there are multiple ways to type strings.

#### Double Quotes

Strings can begin and end with double quotes, just as they do with single quotes. One benefit of using double quotes is that the string can have a single quote character in it. Enter the following into the interactive shell:

```Python
spam = "That is Alice's cat."
```

Since the string begins with a double quote, Python knows that the single quote is part of the string and not marking the end of the string. However, if you need to use both single quotes and double quotes in the string, you’ll need to use escape characters.

#### Escape Characters

An escape character lets you use characters that are otherwise impossible to put into a string. An escape character consists of a backslash (`\`) followed by the character you want to add to the string. (Despite consisting of two characters, it is commonly referred to as a singular escape character.) For example, the escape character for a single quote is `\'`. You can use this inside a string that begins and ends with single quotes. To see how escape characters work, enter the following into the interactive shell:


```Python
spam = 'Say hi to Bob\'s mother.'
```

Python knows that since the single quote in `Bob\'s` has a backslash, it is not a single quote meant to end the string value. The escape characters `\'` and `\"` let you put single quotes and double quotes inside your strings, respectively.

Table 6-1: Escape Characters

|Escape character|Prints as|
|---|---|
|`\'`|Single quote|
|`\"`|Double quote|
|`\t`|Tab|
|`\n`|Newline (line break)|
|`\\`|Backslash|

```Python
print("Hello there!\nHow are you?\nI\'m doing fine.")
```

```
Hello there!
How are you?
I'm doing fine.
```
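An escape sequence is typed as two characters, but it produces a single character in the string. A quick check with the built-in `len()` and `repr()` functions makes this visible:

```python
# Each escape sequence is typed as two characters but stores just one.
tab = '\t'
newline = '\n'
backslash = '\\'
print(len(tab), len(newline), len(backslash))  # 1 1 1

# repr() shows the escape sequences instead of their effect.
message = 'Hello there!\nHow are you?'
print(repr(message))  # 'Hello there!\nHow are you?'
print(len(message))   # 25
```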


#### Raw Strings

You can place an `r` before the beginning quotation mark of a string to make it a **raw string**. A raw string completely ignores all escape characters and prints any backslash that appears in the string.

```Python
print(r'That is Carol\'s cat.')
```

```
That is Carol\'s cat.
```


Because this is a raw string, Python considers the backslash as part of the string and not as the start of an escape character. Raw strings are helpful if you are typing string values that contain many backslashes, such as the strings used for Windows file paths like `r'C:\Users\Al\Desktop'` or regular expressions described in the next chapter.
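As a sanity check, a raw string and a regular string with doubled backslashes produce exactly the same value. The path below is the example from the paragraph above:

```python
# A raw string keeps every backslash exactly as typed.
raw_path = r'C:\Users\Al\Desktop'
# The same value written with escaped backslashes in a regular string.
escaped_path = 'C:\\Users\\Al\\Desktop'

print(raw_path == escaped_path)  # True
print(len(raw_path))             # 19
```

One related gotcha: a raw string still cannot end in a single backslash, so `r'C:\'` is a syntax error.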

#### Multiline Strings with Triple Quotes

While you can use the `\n` escape character to put a newline into a string, it is often easier to use multiline strings. A **multiline string** in Python begins and ends with either *three single quotes* or three double quotes. Any quotes, tabs, or newlines in between the “triple quotes” are considered part of the string. Python’s indentation rules for blocks do not apply to lines inside a multiline string.

```Python
print('''Dear Alice,

Eve's cat has been arrested for catnapping, cat burglary, and extortion.

Sincerely,
Bob''')
```

```
Dear Alice,

Eve's cat has been arrested for catnapping, cat burglary, and extortion.

Sincerely,
Bob
```


Notice that the single quote character in `Eve's` does not need to be escaped. Escaping single and double quotes is optional in multiline strings. The following `print()` call would print identical text but doesn’t use a multiline string:

```Python
print('Dear Alice,\n\nEve\'s cat has been arrested for catnapping, cat burglary, and extortion.\n\nSincerely,\nBob')
```

```
Dear Alice,

Eve's cat has been arrested for catnapping, cat burglary, and extortion.

Sincerely,
Bob
```


#### Multiline Comments

While the hash character (`#`) marks the beginning of a comment for the rest of the line, a multiline string is often used for comments that span multiple lines. The following is perfectly valid Python code:

```Python
"""This is a test Python program.
Written by Al Sweigart al@inventwithpython.com

This program was designed for Python 3, not Python 2.
"""

def spam():
    """This is a multiline comment to help
    explain what the spam() function does."""
    print('Hello!')
```
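Strictly speaking, a string literal placed as the first statement of a function (or module) is a *docstring*: Python keeps it at runtime on the function's `__doc__` attribute, which is where tools such as `help()` find it. A small check:

```python
def spam():
    """This is a multiline comment to help
    explain what the spam() function does."""
    print('Hello!')

# Unlike a # comment, the docstring is kept and can be read at runtime.
print(spam.__doc__.splitlines()[0])  # This is a multiline comment to help
```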

### Indexing and Slicing Strings

Strings use indexes and slices the same way lists do. You can think of the string 'Hello, world!' as a *list* and each character in the string as an item with a corresponding index.
```
'   H   e   l   l   o   ,       w   o   r   l    d    !   '
    0   1   2   3   4   5   6   7   8   9   10   11   12
```

The space and exclamation point are included in the character count, so 'Hello, world!' is $13$ characters long, from `H` at index $0$ to `!` at index $12$.

```Python
spam = 'Hello, world!'

for i, c in enumerate(spam):
    print(i, ":", c)

assert spam[0] == 'H'
assert spam[4] == 'o'
assert spam[-1] == '!'
assert spam[0:5] == 'Hello'
assert spam[:5] == 'Hello'
assert spam[7:] == 'world!'
```

```
0 : H
1 : e
2 : l
3 : l
4 : o
5 : ,
6 :  
7 : w
8 : o
9 : r
10 : l
11 : d
12 : !
```


If you specify an index, you’ll get the character at that position in the string. If you specify a range from one index to another, the starting index is included and the ending index is not. That’s why, if spam is 'Hello, world!', `spam[0:5]` is 'Hello'. The substring you get from `spam[0:5]` will include everything from `spam[0]` to `spam[4]`, leaving out the comma at index $5$ and the space at index $6$. This is similar to how `range(5)` will cause a `for` loop to iterate up to, but not including, $5$.
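Because the end index is excluded, two slices that share a boundary index fit together with no gap and no overlap. The sketch below checks that property and also shows negative indexes, which count backward from the end:

```python
spam = 'Hello, world!'

# spam[:i] and spam[i:] always rebuild the original string.
for i in range(len(spam) + 1):
    assert spam[:i] + spam[i:] == spam

# The length of an in-range slice is end - start.
print(len(spam[0:5]))  # 5

# Negative indexes count backward from the end of the string.
print(spam[-6:])       # world!
print(spam[-6:-1])     # world
```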

Note that slicing a string does not modify the original string. You can capture a slice from one variable in a separate variable.

```Python
spam = 'Hello, world!'
fizz = spam[0:5]

assert fizz == 'Hello'
```

By slicing and storing the resulting substring in another variable, you can have both the whole string and the substring handy for quick, easy access.

### The `in` and `not in` Operators with Strings

The `in` and `not in` operators can be used with strings just like with list values. An expression with two strings joined using `in` or `not in` will evaluate to a Boolean `True` or `False`. 

```Python
assert ('Hello' in 'Hello, World') == True
assert ('Hello' in 'Hello') == True
assert ('HELLO' in 'Hello, World') == False
assert ('' in 'spam') == True
assert ('cats' not in 'cats and dogs') == False
```

These expressions test whether the first string (the exact string, case-sensitive) can be found within the second string.
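One difference from lists is easier to see side by side. With a string, `in` looks for a *substring*. With a list, it looks for an *item* that is equal to the value:

```python
sentence = 'concatenate'
words = ['concatenate', 'dogs']

print('cat' in sentence)       # True: 'cat' is a substring of 'concatenate'
print('cat' in words)          # False: no list item equals 'cat'
print('concatenate' in words)  # True: one item matches exactly
```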

## Putting Strings Inside Other Strings

Putting strings inside other strings is a common operation in programming. So far, we’ve been using the `+` operator and string concatenation to do this:

```Python
name = 'Al'
age = 4000

assert 'Hello, my name is ' + name + '. I am ' + str(age) + ' years old.' == 'Hello, my name is Al. I am 4000 years old.'
```

However, this requires a lot of tedious typing. A simpler approach is to use **string interpolation**, in which each `%s` conversion specifier inside the string acts as a marker to be replaced by the values that follow the `%` operator after the string. One benefit of string interpolation is that `str()` doesn’t have to be called to convert values to strings. The `format()` string method works in a similar way, using numbered braces such as `{0}` and `{1}` as the markers.

```Python
name = 'Al'
age = 4000

assert 'My name is %s. I am %s years old.' % (name, age) == 'My name is Al. I am 4000 years old.'
assert 'My name is {0}. I am {1} years old.'.format(name, age) == 'My name is Al. I am 4000 years old.'
```

Python 3.6 introduced **f-strings**, which are similar to string interpolation except that braces are used instead of `%s`, with the expressions placed directly inside the braces. Like raw strings, f-strings have a prefix, `f`, before the starting quotation mark.

```Python
name = 'Al'
age = 4000

assert f'My name is {name}. Next year I will be {age + 1}.' == 'My name is Al. Next year I will be 4001.'
```

Remember to include the `f` prefix; otherwise, the braces and their contents will be a part of the string value:

```Python
'My name is {name}. Next year I will be {age + 1}.'
```

```
'My name is {name}. Next year I will be {age + 1}.'
```
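The braces in an f-string can also hold a *format specification* after a colon, which controls how the value is displayed, and a `!r` conversion, which uses `repr()` instead of `str()`. A few common cases; the `{age=}` debugging form requires Python 3.8 or later:

```python
name = 'Al'
age = 4000
pi = 3.14159

print(f'{pi:.2f}')    # 3.14      (round to two decimal places)
print(f'{age:,}')     # 4,000     (thousands separator)
print(f'{age:08}')    # 00004000  (zero-pad to a width of 8)
print(f'{name!r}')    # 'Al'      (repr() instead of str())
print(f'{age=}')      # age=4000  (Python 3.8+)
```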

## Useful String Methods

Several string methods analyze strings or create transformed string values. This section describes the methods you’ll be using most often.

### The `upper()`, `lower()`, `isupper()`, and `islower()` Methods

The `upper()` and `lower()` string methods return a new string where all the letters in the original string have been converted to uppercase or lowercase, respectively. Nonletter characters in the string remain unchanged. 

```Python
spam = 'Hello, world!'

spam = spam.upper()
assert spam == 'HELLO, WORLD!'

spam = spam.lower()
assert spam == 'hello, world!'
```

Note that these methods do not change the string itself but return new string values. If you want to change the original string, you have to call `upper()` or `lower()` on the string and then assign the new string to the variable where the original was stored. This is why you must use `spam = spam.upper()` to change the string in `spam` instead of simply `spam.upper()`. (This is just like if a variable `eggs` contains the value $10$. Writing `eggs + 3` does not change the value of `eggs`, but `eggs = eggs + 3` does.)

The `upper()` and `lower()` methods are helpful if you need to make a *case-insensitive comparison*. For example, the strings 'great' and 'GREat' are not equal to each other. But in the following small program, it does not matter whether the user types Great, GREAT, or grEAT, because the string is first converted to lowercase.

```Python
print('How are you?')
feeling = input()
if feeling.lower() == 'great':
    print('I feel great too.')
else:
    print('I hope the rest of your day is good.')
```

```
How are you?
gReaT
I feel great too.
```
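`lower()` covers most case-insensitive comparisons, but some languages have case rules that it does not handle. The built-in `casefold()` string method applies more aggressive Unicode case folding. For example, the German `ß` folds to `ss`:

```python
a = 'STRASSE'
b = 'straße'

print(a.lower() == b.lower())        # False: 'strasse' vs 'straße'
print(a.casefold() == b.casefold())  # True: both become 'strasse'
```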


The `isupper()` and `islower()` methods will return a Boolean `True` value if the string has at least one letter and all the letters are uppercase or lowercase, respectively. Otherwise, the method returns `False`.

```Python
spam = 'Hello, world!'

assert spam.islower() == False
assert spam.isupper() == False

assert 'HELLO'.isupper() == True
assert 'abc12345'.islower() == True
assert '12345'.islower() == False
assert '12345'.isupper() == False
```

Since the `upper()` and `lower()` string methods themselves return strings, you can call string methods on those returned string values as well. Expressions that do this will look like a chain of method calls. 

```Python
assert 'Hello'.upper() == 'HELLO'
assert 'Hello'.upper().lower() == 'hello'
assert 'Hello'.upper().lower().upper() == 'HELLO'
assert 'HELLO'.lower() == 'hello'
assert 'HELLO'.lower().islower() == True
```

### The `isX()` Methods

Along with `islower()` and `isupper()`, there are several other string methods that have names beginning with the word *is*. These methods return a Boolean value that describes the nature of the string. Here are some common isX string methods:

- `isalpha()` Returns `True` if the string consists only of letters and isn’t blank
- `isalnum()` Returns `True` if the string consists only of letters and numbers and is not blank
- `isdecimal()` Returns `True` if the string consists only of decimal digit characters (such as 0 through 9) and is not blank
- `isspace()` Returns `True` if the string consists only of spaces, tabs, and newlines and is not blank
- `istitle()` Returns `True` if the string contains at least one letter and every word in it begins with an uppercase letter followed only by lowercase letters (digits and punctuation are ignored)

```Python
assert 'hello'.isalpha() == True
assert 'hello'.isalnum() == True
assert 'hello, world'.isalpha() == False
assert 'hello123'.isalpha() == False
assert 'hello123'.isalnum() == True
assert '123'.isdecimal() == True
assert '123'.isalpha() == False
assert '123'.isalnum() == True
assert '    '.isspace() == True
assert 'This Is Title Case'.istitle() == True
assert 'This Is Title Case 123'.istitle() == True
assert 'This Is not Title Case'.istitle() == False
assert 'This Is NOT Title Case Either'.istitle() == False
```
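`isdecimal()` is the strictest of three similar methods. `isdigit()` also accepts characters such as superscript digits, and `isnumeric()` additionally accepts characters such as vulgar fractions. This makes `isdecimal()` the safer check for input you plan to pass to `int()`:

```python
# '42', superscript two ('²'), and the fraction one half ('½')
samples = ['42', '\u00b2', '\u00bd']

for s in samples:
    print(repr(s), s.isdecimal(), s.isdigit(), s.isnumeric())
# '42' True True True
# '²' False True True
# '½' False False True
```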

The `isX()` string methods are helpful when you need to *validate user input*. For example, the following program repeatedly asks users for their age and a password until they provide valid input. 

```Python
while True:
    print('Enter your age:')
    age = input()
    if age.isdecimal():
        break
    print('Please enter a number for your age.')

while True:
    print('Select a new password (letters and numbers only):')
    password = input()
    if password.isalnum():
        break
    print('Passwords can only have letters and numbers.')
```

```
Enter your age:
12,
Please enter a number for your age.
Enter your age:
12
Select a new password (letters and numbers only):
asd!
Passwords can only have letters and numbers.
Select a new password (letters and numbers only):
asd_
Passwords can only have letters and numbers.
Select a new password (letters and numbers only):
?
Passwords can only have letters and numbers.
Select a new password (letters and numbers only):
ß
```


In the first `while` loop, we ask the user for their age and store their input in `age`. If age is a valid (decimal) value, we break out of this first `while` loop and move on to the second, which asks for a password. Otherwise, we inform the user that they need to enter a number and again ask them to enter their age. In the second `while` loop, we ask for a password, store the user’s input in `password`, and break out of the loop if the input was alphanumeric. If it wasn’t, we’re not satisfied, so we tell the user the password needs to be alphanumeric and again ask them to enter a password.



By calling `isdecimal()` and `isalnum()` on these variables, we can test whether the values stored in them are decimal and alphanumeric, respectively. These tests reject the input `forty two` but accept `42`, and reject `secr3t!` but accept `secr3t`. Both methods are Unicode-aware: in the sample run above, `ß` is accepted as a password because `isalnum()` counts it as a letter.
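The two loops have the same shape, so one way to avoid repeating it is a helper that takes the check as a parameter. The sketch below is hypothetical: `prompt_until_valid` is not part of the original program. It accepts a `read` function so that the loop can be exercised with canned answers instead of the keyboard:

```python
def prompt_until_valid(prompt, is_valid, error_message, read=input):
    """Ask repeatedly until is_valid(answer) is True, then return the answer."""
    while True:
        print(prompt)
        answer = read()
        if is_valid(answer):
            return answer
        print(error_message)

# Simulate a user who first types 'forty two' and then '42'.
answers = iter(['forty two', '42'])
age = prompt_until_valid('Enter your age:', str.isdecimal,
                         'Please enter a number for your age.',
                         read=lambda: next(answers))
print(age)  # 42
```

Interactively, you would omit `read` and pass `str.isalnum` for the password check.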

### The `startswith()` and `endswith()` Methods

The `startswith()` and `endswith()` methods return `True` if the string value they are called on begins or ends (respectively) with the string passed to the method; otherwise, they return `False`.

```Python
assert ('Hello, world!'.startswith('Hello')) == True
assert ('Hello, world!'.endswith('world!')) == True
assert ('abc123'.startswith('abcdef')) == False
assert ('abc123'.endswith('12')) == False
assert ('Hello, world!'.startswith('Hello, world!')) == True
assert ('Hello, world!'.endswith('Hello, world!')) == True
```

These methods are useful alternatives to the `==` equals operator if you need to check only whether the first or last part of the string, rather than the whole thing, is equal to another string.
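Both methods also accept a tuple of strings and return `True` if any one of them matches, which is handy for checks such as file extensions. The filenames here are made-up examples:

```python
filenames = ['report.csv', 'notes.txt', 'photo.jpg']

# endswith() with a tuple matches if the name ends with any listed suffix.
text_files = [f for f in filenames if f.endswith(('.csv', '.txt'))]
print(text_files)  # ['report.csv', 'notes.txt']
```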

### The `join()` and `split()` Methods

The `join()` method is useful when you have a list of strings that need to be joined together into a single string value. The `join()` method is called on a string, gets passed a list of strings, and returns a string. The returned string is the concatenation of each string in the passed-in list.

```Python
assert ', '.join(['cats', 'rats', 'bats']) == 'cats, rats, bats'
assert ' '.join(['My', 'name', 'is', 'Simon']) == 'My name is Simon'
assert 'ABC'.join(['My', 'name', 'is', 'Simon']) == 'MyABCnameABCisABCSimon'
```

Notice that the string `join()` calls on is inserted between each string of the list argument. For example, when `join(['cats', 'rats', 'bats'])` is called on the `', '` string, the returned string is 'cats, rats, bats'.

Remember that `join()` is called on a string value and is passed a list value. (It’s easy to accidentally call it the other way around.) The `split()` method does the *opposite*: it’s called on a string value and returns a list of strings.

```Python
assert 'My name is Simon'.split() == ['My', 'name', 'is', 'Simon']
```

By default, the string 'My name is Simon' is split wherever *whitespace characters* such as the space, tab, or newline characters are found. These whitespace characters are not included in the strings in the returned list. You can pass a delimiter string to the `split()` method to specify a different string to split upon.

```Python
assert 'MyABCnameABCisABCSimon'.split('ABC') == ['My', 'name', 'is', 'Simon']
assert 'My name is Simon'.split('m') == ['My na', 'e is Si', 'on']
```
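The default behavior has one subtlety. `split()` with no argument treats any run of whitespace as a single separator and ignores leading and trailing whitespace, whereas `split(' ')` splits at every single space and can produce empty strings. The optional `maxsplit` argument limits the number of splits:

```python
text = '  My   name is  Simon '

print(text.split())     # ['My', 'name', 'is', 'Simon']
print(text.split(' '))  # ['', '', 'My', '', '', 'name', 'is', '', 'Simon', '']

# At most two splits; the rest of the string stays in the last item.
print('a,b,c,d'.split(',', maxsplit=2))  # ['a', 'b', 'c,d']
```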

A common use of `split()` is to split a multiline string along the *newline characters*.

```Python
spam = '''Dear Alice,
How have you been? I am fine.
There is a container in the fridge
that is labeled "Milk Experiment."

Please do not drink it.
Sincerely,
Bob'''

assert spam.split('\n') == ['Dear Alice,', 'How have you been? I am fine.', 'There is a container in the fridge', 'that is labeled "Milk Experiment."', '', 'Please do not drink it.', 'Sincerely,', 'Bob']
assert spam.splitlines() == ['Dear Alice,', 'How have you been? I am fine.', 'There is a container in the fridge', 'that is labeled "Milk Experiment."', '', 'Please do not drink it.', 'Sincerely,', 'Bob']
```

Passing `split()` the argument `'\n'` lets us split the multiline string stored in spam along the newlines and return a list in which each item corresponds to one line of the string.
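The `splitlines()` method gives the same result for that string, but the two approaches differ at the edges: `split('\n')` leaves an empty string after a trailing newline, while `splitlines()` does not, and `splitlines()` also treats Windows-style `\r\n` as a single line ending:

```python
text = 'first line\r\nsecond line\n'

print(text.split('\n'))   # ['first line\r', 'second line', '']
print(text.splitlines())  # ['first line', 'second line']
```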

### Splitting Strings with the `partition()` Method

The `partition()` string method can split a string into the text before and after a separator string. This method searches the string it is called on for the separator string it is passed, and returns a tuple of three substrings for the “before,” “separator,” and “after” substrings. Enter the following into the interactive shell:

```Python
assert 'Hello, world!'.partition('w') == ('Hello, ', 'w', 'orld!')
assert 'Hello, world!'.partition('world') == ('Hello, ', 'world', '!')
```

If the separator string you pass to `partition()` occurs multiple times in the string that `partition()` calls on, the method splits the string only on the first occurrence:

```Python
assert 'Hello, world!'.partition('o') == ('Hell', 'o', ', world!')
```

If the separator string can’t be found, the first string returned in the tuple will be the entire string, and the other two strings will be empty:

```Python
assert 'Hello, world!'.partition('XYZ') == ('Hello, world!', '', '')
```

You can use the multiple assignment trick to assign the three returned strings to three variables:

```Python
before, sep, after = 'Hello, world!'.partition(' ')

assert before == 'Hello,'
assert sep == ' '
assert after == 'world!'
```

The `partition()` method is useful for splitting a string whenever you need the parts before, including, and after a particular separator string.
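Its counterpart `rpartition()` splits at the *last* occurrence instead, which suits tasks like pulling the final extension off a filename. Both appear below on made-up example strings:

```python
# partition() splits at the first '=', so later '=' characters stay in the value.
key, _, value = 'greeting=Hello=World'.partition('=')
print(key, value)       # greeting Hello=World

# rpartition() splits at the last '.', isolating the final extension.
stem, _, extension = 'archive.tar.gz'.rpartition('.')
print(stem, extension)  # archive.tar gz
```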

### Justifying Text with the `rjust()`, `ljust()`, and `center()` Methods

The `rjust()` and `ljust()` string methods return a *padded version* of the string they are called on, with spaces inserted to justify the text. The first argument to both methods is an integer length for the justified string.

```Python
assert 'Hello'.rjust(10) == '     Hello'
assert 'Hello'.rjust(20) == '               Hello'
assert 'Hello, World'.rjust(20) == '        Hello, World'
assert 'Hello'.ljust(10) == 'Hello     '
```

`'Hello'.rjust(10)` says that we want to right-justify 'Hello' in a string of total length $10$. 'Hello' is five characters, so five spaces will be added to its left, giving us a string of $10$ characters with 'Hello' justified right.

An optional second argument to `rjust()` and `ljust()` will specify a fill character other than a space character.

```Python
assert 'Hello'.rjust(20, '*') == '***************Hello'
assert 'Hello'.ljust(20, '-') == 'Hello---------------'
```

The `center()` string method works like `ljust()` and `rjust()` but centers the text rather than justifying it to the left or right.

```Python
assert 'Hello'.center(20) == '       Hello        '
assert 'Hello'.center(20, '=') == '=======Hello========'
```

These methods are especially useful when you need to print tabular data that has correct spacing. 

```Python
def printPicnic(itemsDict, leftWidth, rightWidth):
    print('PICNIC ITEMS'.center(leftWidth + rightWidth, '-'))
    for k, v in itemsDict.items():
        display_item = k.ljust(leftWidth, '.')
        display_value = str(v).rjust(rightWidth)
        print(display_item + display_value)

picnicItems = {'sandwiches': 4, 'apples': 12, 'cups': 4, 'cookies': 8000}
printPicnic(picnicItems, 12, 5)
print()
printPicnic(picnicItems, 20, 6)
```

```
---PICNIC ITEMS--
sandwiches..    4
apples......   12
cups........    4
cookies..... 8000

-------PICNIC ITEMS-------
sandwiches..........     4
apples..............    12
cups................     4
cookies.............  8000
```


In this program, we define a `printPicnic()` function that takes a dictionary of information and uses `center()`, `ljust()`, and `rjust()` to display that information in a neatly aligned, table-like format.

The dictionary that we’ll pass to `printPicnic()` is `picnicItems`. In `picnicItems`, we have 4 sandwiches, 12 apples, 4 cups, and 8,000 cookies. We want to organize this information into two columns, with the name of the item on the left and the quantity on the right.

To do this, we decide how wide we want the left and right columns to be. Along with our dictionary, we’ll pass these values to `printPicnic()`.

The `printPicnic()` function takes a dictionary, a `leftWidth` for the left column of a table, and a `rightWidth` for the right column. It prints a title, `PICNIC ITEMS`, centered above the table. Then it loops through the dictionary, printing each key-value pair on a line with the key justified left and padded by periods, and the value justified right and padded by spaces.

After defining `printPicnic()`, we define the dictionary `picnicItems` and call `printPicnic()` twice, passing it different widths for the left and right table columns.

When you run this program, the picnic items are displayed twice. The first time the left column is 12 characters wide, and the right column is 5 characters wide. The second time they are 20 and 6 characters wide, respectively.

Using `rjust()`, `ljust()`, and `center()` lets you ensure that strings are neatly aligned, even if you aren’t sure how many characters long your strings are.
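For comparison, f-strings can do the same padding with the `<`, `>`, and `^` alignment characters in a format specification, optionally preceded by a fill character. This sketch reproduces one row of the first `printPicnic()` table (the centered header is left out):

```python
item, quantity = 'apples', 12

left = f'{item:.<12}'     # same result as item.ljust(12, '.')
right = f'{quantity:>5}'  # same result as str(quantity).rjust(5)
print(left + right)       # apples......   12
```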

### Removing Whitespace with the `strip()`, `rstrip()`, and `lstrip()` Methods

Sometimes you may want to strip off whitespace characters (space, tab, and newline) from the left side, right side, or both sides of a string. The `strip()` string method will return a new string without any whitespace characters at the beginning or end. The `lstrip()` and `rstrip()` methods will remove whitespace characters from the left and right ends, respectively. 

```Python
spam = '    Hello, World    '

assert spam.strip() == 'Hello, World'
assert spam.lstrip() == 'Hello, World    '
assert spam.rstrip() == '    Hello, World'

# Methods can be chained: lowercase first, then strip the trailing space.
assert "sfdljgEJLBG ".lower().strip() == 'sfdljgejlbg'
```

Optionally, a string argument will specify which characters on the ends should be stripped.

```Python
spam = 'SpamSpamBaconSpamEggsSpamSpam'
assert spam.strip('ampS') == 'BaconSpamEggs'
```

Passing `strip()` the argument `'ampS'` will tell it to strip occurrences of `a`, `m`, `p`, and capital `S` from the ends of the string stored in `spam`. The order of the characters in the string passed to `strip()` does not matter: `strip('ampS')` will do the same thing as `strip('mapS')` or `strip('Spam')`.
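Because the argument is treated as a *set of characters* rather than as a substring, `strip()` can remove more than you might expect. To remove one exact prefix or suffix instead, Python 3.9 added the `removeprefix()` and `removesuffix()` methods:

```python
spam = 'SpamSpamBaconSpamEggsSpamSpam'

# strip() removes any of 'a', 'm', 'p', 'S' from both ends, in any order.
print('Spammy'.strip('ampS'))     # y
# removeprefix()/removesuffix() remove one exact occurrence (Python 3.9+).
print(spam.removeprefix('Spam'))  # SpamBaconSpamEggsSpamSpam
print(spam.removesuffix('Spam'))  # SpamSpamBaconSpamEggsSpam
```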

## Numeric Values of Characters with the `ord()` and `chr()` Functions

Computers store information as **bytes**, which are made of binary digits, so text has to be represented as numbers. Because of this, every text character has a corresponding numeric value called a **Unicode code point**. For example, the numeric code point is $65$ for 'A', $52$ for '4', and $33$ for '!'. You can use the `ord()` function to get the code point of a one-character string, and the `chr()` function to get the one-character string of an integer code point.

```Python
assert ord('A') == 65
assert ord('4') == 52
assert ord('!') == 33
assert chr(65) == 'A'
```

These functions are useful when you need to do an ordering or mathematical operation on characters:

```Python
assert ord('B') == 66
assert (ord('A') < ord('B')) == True
assert chr(ord('A')) == 'A'
assert chr(ord('A') + 1) == 'B'
```
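As one illustration of that kind of arithmetic, here is a small Caesar-cipher sketch. It is a classic exercise used here only as an example: it shifts uppercase letters by a fixed amount and wraps from `Z` back to `A`:

```python
def shift_letter(letter, n):
    """Shift an uppercase letter A-Z forward n places, wrapping after Z."""
    offset = ord(letter) - ord('A')          # 0 for 'A', 25 for 'Z'
    return chr(ord('A') + (offset + n) % 26)

def caesar(text, n):
    """Shift every A-Z letter in text; leave other characters unchanged."""
    return ''.join(shift_letter(c, n) if 'A' <= c <= 'Z' else c for c in text)

print(shift_letter('Z', 1))       # A
print(caesar('HELLO, WORLD', 3))  # KHOOR, ZRUOG
```

Because Python's `%` always returns a non-negative result for a positive divisor, a negative shift such as `caesar(text, -3)` undoes the encoding.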

There is more to Unicode and code points, but those details are beyond the scope of this book. If you’d like to know more, I recommend watching Ned Batchelder’s 2012 PyCon talk, “Pragmatic Unicode, or, How Do I Stop the Pain?” at https://youtu.be/sgHbC6udIqc.

## Copying and Pasting Strings with the `pyperclip` Module

The `pyperclip` module has `copy()` and `paste()` functions that can send text to and receive text from your computer’s clipboard. Sending the output of your program to the clipboard will make it easy to paste it into an email, word processor, or some other software.


The `pyperclip` module does not come with Python. To install it, run the following command in a terminal (in a Jupyter notebook, prefix it with `!`):

```
pip install pyperclip
```


```Python
import pyperclip

pyperclip.copy('Hello, world!')
assert pyperclip.paste() == 'Hello, world!'
```

Of course, if something outside of your program changes the clipboard contents, the `paste()` function will return it. For example, if I copied this sentence to the clipboard and then called `paste()`, it would look like this:

```Python
assert pyperclip.paste() == 'For example, if I copied this sentence to the clipboard and then called paste(), it would look like this:'
```

## Project: Multi-Clipboard Automatic Messages

You want to be able to run this program with a command line argument that is a short key phrase—for instance, agree or busy. The message associated with that key phrase will be copied to the clipboard so that the user can paste it into an email. This way, the user can have long, detailed messages without having to retype them.

```Python
#! python3
# mclip.py - A multi-clipboard program.

TEXT = {'agree': """Yes, I agree. That sounds fine to me.""",
        'busy': """Sorry, can we do this later this week or next week?""",
        'upsell': """Would you consider making this a monthly donation?"""}

import sys, pyperclip
if len(sys.argv) < 2:
    print('Usage: py mclip.py [keyphrase] - copy phrase text')
    sys.exit()

keyphrase = sys.argv[1]    # first command line arg is the keyphrase

if keyphrase in TEXT:
    pyperclip.copy(TEXT[keyphrase])
    print('Text for ' + keyphrase + ' copied to clipboard.')
else:
    print('There is no text for ' + keyphrase)
```


Since you want to associate each piece of text with its key phrase, you can store these as strings in a dictionary. The dictionary will be the data structure that organizes your key phrases and text.

The command line arguments will be stored in the variable `sys.argv`. (See Appendix B for more information on how to use command line arguments in your programs.) The first item in the `sys.argv` list should always be a string containing the program’s filename ('mclip.py'), and the second item should be the first command line argument. For this program, this argument is the key phrase of the message you want. Since the command line argument is mandatory, you display a usage message to the user if they forget to add it (that is, if the `sys.argv` list has fewer than two values in it).

Now that the key phrase is stored as a string in the variable `keyphrase`, you need to see whether it exists in the `TEXT` dictionary as a key. If so, you want to copy the key’s value to the clipboard using `pyperclip.copy()`. (Since you’re using the `pyperclip` module, you need to import it.) Note that you don’t actually need the `keyphrase` variable; you could just use `sys.argv[1]` everywhere `keyphrase` is used in this program. But a variable named `keyphrase` is much more readable than something cryptic like `sys.argv[1]`.
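One possible refinement, sketched here as an assumption rather than as part of the original project, is to move the lookup into a function that receives the argument list and the copy function as parameters. The hypothetical `run()` below can then be tried without touching the real clipboard:

```python
TEXT = {'agree': 'Yes, I agree. That sounds fine to me.',
        'busy': 'Sorry, can we do this later this week or next week?',
        'upsell': 'Would you consider making this a monthly donation?'}

def run(argv, copy):
    """Copy the message for argv[1] using copy() and return a status line."""
    if len(argv) < 2:
        return 'Usage: py mclip.py [keyphrase] - copy phrase text'
    keyphrase = argv[1]
    if keyphrase not in TEXT:
        return 'There is no text for ' + keyphrase
    copy(TEXT[keyphrase])
    return 'Text for ' + keyphrase + ' copied to clipboard.'

# Collect "copied" text in a list instead of using the real clipboard.
clipboard = []
print(run(['mclip.py', 'busy'], clipboard.append))  # Text for busy copied to clipboard.
print(clipboard[0])  # Sorry, can we do this later this week or next week?
```

In the actual script, the call would become `print(run(sys.argv, pyperclip.copy))`.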

```
$ python mclip.py
Usage: py mclip.py [keyphrase] - copy phrase text
$ python mclip.py agree
Text for agree copied to clipboard.
```


## Project: Adding Bullets to Wiki Markup

When editing a Wikipedia article, you can create a bulleted list by putting each list item on its own line and placing a star in front. But say you have a really large list that you want to add bullet points to. You could just type those stars at the beginning of each line, one by one. Or you could automate this task with a short Python script.

The `bulletPointAdder.py` script will get the text from the clipboard, add a star and space to the beginning of each line, and then paste this new text to the clipboard. For example, if I copied the following text (for the Wikipedia article “List of Lists of Lists”) to the clipboard:

```
Lists of animals
Lists of aquarium life
Lists of biologists by author abbreviation
Lists of cultivars
```

and then ran the `bulletPointAdder.py` program, the clipboard would then contain the following:

```
* Lists of animals
* Lists of aquarium life
* Lists of biologists by author abbreviation
* Lists of cultivars
```

You want the `bulletPointAdder.py` program to do the following:

1. Paste text from the clipboard.
1. Do something to it.
1. Copy the new text to the clipboard.


```Python
#! python3
# bulletPointAdder.py - Adds Wikipedia bullet points to the start
# of each line of text on the clipboard.

import pyperclip
text = pyperclip.paste()

# TODO: Separate lines and add stars.

pyperclip.copy(text)
```

The call to `pyperclip.paste()` returns all the text on the clipboard as *one big string*. If we used the “List of Lists of Lists” example, the string stored in `text` would look like this:

```
'Lists of animals\nLists of aquarium life\nLists of biologists by author abbreviation\nLists of cultivars'
```

The `\n` newline characters in this string cause it to be displayed with multiple lines when it is printed or pasted from the clipboard. There are many “lines” in this one string value. You want to add a star to the start of each of these lines.

You could write code that searches for each `\n` newline character in the string and then adds the star just after it. But it is easier to use the `splitlines()` method, which returns a list of strings, one for each line in the original string, and then add the star to the front of each string in the list.

```Python
#! python3
# bulletPointAdder.py - Adds Wikipedia bullet points to the start
# of each line of text on the clipboard.

import pyperclip
text = pyperclip.paste()

# Separate lines and add stars.
lines = text.splitlines()
for i in range(len(lines)):    # loop through all indexes in the "lines" list
    lines[i] = '* ' + lines[i] # add star to each string in "lines" list

pyperclip.copy(text)  # still copies the unmodified text; fixed in the next step
```

The `lines` list now contains modified lines that start with stars. However, `pyperclip.copy()` expects a *single string value*, not a list of string values. To make this single string, call the `join()` method on the newline string `'\n'` and pass it `lines`. This joins the list's strings into one string with a newline between each pair. Make your program look like the following:
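Here is `join()` on its own, applied to a short list like the one the loop produces:

```Python
lines = ['* Lists of animals', '* Lists of aquarium life']

# The string that join() is called on ('\n') is placed between each pair of items.
text = '\n'.join(lines)
print(repr(text))  # '* Lists of animals\n* Lists of aquarium life'
```

Note that no newline is added after the last item.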



```Python
#! python3
# bulletPointAdder.py - Adds Wikipedia bullet points to the start
# of each line of text on the clipboard.

import pyperclip
text = pyperclip.paste()

# Separate lines and add stars.
lines = text.splitlines()
for i in range(len(lines)):    # loop through all indexes for "lines" list
    lines[i] = '* ' + lines[i] # add star to each string in "lines" list

text = '\n'.join(lines)
pyperclip.copy(text)
```


When this program is run, it replaces the text on the clipboard with text that has stars at the start of each line. Now the program is complete, and you can try running it with text copied to the clipboard.

Even if you don’t need to automate this specific task, you might want to automate some other kind of text manipulation, such as removing trailing spaces from the end of lines or converting text to uppercase or lowercase. Whatever your needs, you can use the clipboard for input and output.
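For example, the same split, modify, and join pattern can remove trailing whitespace instead of adding stars. The sketch below keeps the text processing in a plain function so you can test it without touching the clipboard. The name `stripTrailingSpaces()` is only an illustrative choice. In a script, you could wire it up with `pyperclip.copy(stripTrailingSpaces(pyperclip.paste()))`.

```Python
def stripTrailingSpaces(text):
    # Split into lines, remove whitespace (spaces, tabs) from the right end
    # of each line, and rejoin the lines with '\n'.
    lines = text.splitlines()
    for i in range(len(lines)):
        lines[i] = lines[i].rstrip()
    return '\n'.join(lines)

print(repr(stripTrailingSpaces('apples   \nbananas\t\ncherries ')))
# 'apples\nbananas\ncherries'
```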

## A Short Program: Pig Latin

*Pig Latin* is a silly made-up language that alters English words. If a word begins with a vowel, the word *yay* is added to the end of it. If a word begins with a consonant or consonant cluster (like ch or gr), that consonant or cluster is moved to the end of the word followed by *ay*.

Let’s write a Pig Latin program that will output something like this:

```
Enter the English message to translate into Pig Latin:
My name is AL SWEIGART and I am 4,000 years old.
Ymay amenay isyay ALYAY EIGARTSWAY andyay Iyay amyay 4,000 yearsyay oldyay.
```

This program works by altering a string using the methods introduced in this chapter.
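The source of `pigLatin.py` isn't included in this section, so here is a minimal sketch of one way to apply the rules above. It is not necessarily the program that produced the sample output. It assumes `y` counts as a vowel (that is what turns *years* into *yearsyay*), leaves digits and punctuation at the start or end of a word in place (so *4,000* passes through unchanged), and restores uppercase or title-case capitalization after translating. The function name `englishToPigLatin()` is hypothetical.

```Python
VOWELS = ('a', 'e', 'i', 'o', 'u', 'y')  # assumption: 'y' is treated as a vowel

def englishToPigLatin(message):
    pigLatin = []  # the translated words
    for word in message.split():
        # Separate the non-letters at the start of this word.
        prefixNonLetters = ''
        while len(word) > 0 and not word[0].isalpha():
            prefixNonLetters += word[0]
            word = word[1:]
        if len(word) == 0:
            # The "word" had no letters at all (like '4,000'), so keep it as-is.
            pigLatin.append(prefixNonLetters)
            continue

        # Separate the non-letters at the end of this word (such as '.').
        suffixNonLetters = ''
        while not word[-1].isalpha():
            suffixNonLetters = word[-1] + suffixNonLetters
            word = word[:-1]

        # Remember the original capitalization, then work in lowercase.
        wasUpper = word.isupper()
        wasTitle = word.istitle()
        word = word.lower()

        # Move the leading consonant or consonant cluster to the end.
        prefixConsonants = ''
        while len(word) > 0 and word[0] not in VOWELS:
            prefixConsonants += word[0]
            word = word[1:]
        if prefixConsonants != '':
            word += prefixConsonants + 'ay'
        else:
            word += 'yay'

        # Restore the capitalization ('I' is both, so it ends up as 'Iyay').
        if wasUpper:
            word = word.upper()
        if wasTitle:
            word = word.title()

        pigLatin.append(prefixNonLetters + word + suffixNonLetters)
    return ' '.join(pigLatin)

print(englishToPigLatin('My name is AL SWEIGART and I am 4,000 years old.'))
# Ymay amenay isyay ALYAY EIGARTSWAY andyay Iyay amyay 4,000 yearsyay oldyay.
```

To make it interactive, like the sample run, read the message with `input()` and pass it to `englishToPigLatin()`.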

A sample run of `pigLatin.py` with a one-word message:

```
Enter the English message to translate into Pig Latin:
year
yearyay
```

## Summary

Text is a common form of data, and Python comes with many helpful string methods to process the text stored in string values. You will make use of indexing, slicing, and string methods in almost every Python program you write.

The programs you are writing now don’t seem too sophisticated—they don’t have graphical user interfaces with images and colorful text. So far, you’re displaying text with `print()` and letting the user enter text with `input()`. However, the user can quickly enter large amounts of text through the clipboard. This ability provides a useful avenue for writing programs that manipulate massive amounts of text. These text-based programs might not have flashy windows or graphics, but they can get a lot of useful work done quickly.

Another way to manipulate large amounts of text is reading and writing files directly off the hard drive. You’ll learn how to do this with Python in Chapter 9.

That just about covers all the basic concepts of Python programming! You’ll continue to learn new concepts throughout the rest of this book, but you now know enough to start writing some useful programs that can automate tasks. If you’d like to see a collection of short, simple Python programs built from the basic concepts you’ve learned so far, check out https://github.com/asweigart/pythonstdiogames/.

## Practice Questions

1. What are escape characters?

2. What do the `\n` and `\t` escape characters represent?

3. How can you put a `\` backslash character in a string?

4. The string value "Howl's Moving Castle" is a valid string. Why isn’t it a problem that the single quote character in the word Howl's isn’t escaped?

5. If you don’t want to put `\n` in your string, how can you write a string with newlines in it?

```Python
>>> """
... text
... text
... text"""
'\ntext\ntext\ntext'
```

6. What do the following expressions evaluate to?

```Python
assert 'Hello, world!'[1] == 'e'
assert 'Hello, world!'[0:5] == 'Hello'
assert 'Hello, world!'[:5] == 'Hello'
assert 'Hello, world!'[3:] == 'lo, world!'
```

7. What do the following expressions evaluate to?

```Python
assert 'Hello'.upper() == 'HELLO'
assert 'Hello'.upper().isupper()
assert 'Hello'.upper().lower() == 'hello'
```

8. What do the following expressions evaluate to?

```Python
assert 'Remember, remember, the fifth of November.'.split() == ['Remember,', 'remember,', 'the', 'fifth', 'of', 'November.']
```

## Practice Projects

For practice, write programs that do the following.

### Table Printer

Write a function named `printTable()` that takes a list of lists of strings and displays it in a well-organized table with each column right-justified. Assume that all the inner lists will contain the same number of strings. For example, the value could look like this:

```
tableData = [['apples', 'oranges', 'cherries', 'banana'],
             ['Alice', 'Bob', 'Carol', 'David'],
             ['dogs', 'cats', 'moose', 'goose']]
```

Your `printTable()` function would print the following:

```
   apples Alice  dogs
  oranges   Bob  cats
 cherries Carol moose
   banana David goose
```

Hint: your code will first have to find the longest string in each of the inner lists so that the whole column can be wide enough to fit all the strings. You can store the maximum width of each column as a list of integers. The `printTable()` function can begin with `colWidths = [0] * len(tableData)`, which will create a list containing the same number of 0 values as the number of inner lists in `tableData`. That way, `colWidths[0]` can store the width of the longest string in `tableData[0]`, `colWidths[1]` can store the width of the longest string in `tableData[1]`, and so on. You can then find the largest value in the colWidths list to find out what integer width to pass to the `rjust()` string method.