# Manipulating strings

Text is one of the most common forms of data your programs will handle. You already know how to concatenate two string values with the + operator, but you can do much more than that. You can extract partial strings from string values, add or remove spacing, convert letters to lowercase or uppercase, and check that strings are formatted correctly. You can even write Python code to access the clipboard for copying and pasting text.

In this chapter, you'll learn all this and more. Then you'll work through two different programming projects: a simple password manager and a program to automate the boring chore of formatting pieces of text.

## Working with strings

Let's look at some of the ways Python lets you write, print and access strings in your code. 

## String literals

Typing string values in Python code is fairly straightforward. They begin and end with a single quote. But then how can you use a quote inside a string? 

### Double quotes

Use double quotes instead of single quotes:

In [2]:
print ("That is Alice's cat")

That is Alice's cat


### Escape characters

An escape character lets you use characters that are otherwise impossible to put into a string. It consists of a backslash (\) followed by the character you want to add to the string. 

In [3]:
print ('That is Alice\'s cat')

That is Alice's cat


Other escape characters are \" for double quote, \t for tab, \n for new line (line break) and \\ for backslash.

In [5]:
print("Hello there!\tHow are you?\nI\'m doing fine.")

Hello there!	How are you?
I'm doing fine.


## Raw strings

You can place an `r` before the beginning quotation mark of a string to make it a raw string. A raw string completely ignores all escape characters and prints any backslash that appears in the string.

In [6]:
print (r'That is Carol\'s cat.')

That is Carol\'s cat.


Because this is a raw string, Python considers the backslash as part of the string and not as the start of an escape character. Raw strings are helpful if you are typing string values that contain many backslashes, such as the strings used for regular expressions described in the next chapter. 
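One way to see the difference is to compare the length of a regular string and a raw string typed with the same characters. The following is only a small sketch; the Windows-style path is a made-up example.

```python
# In a regular string, \n is a single newline character.
regular = 'a\nb'
# In a raw string, the backslash and the n are two separate characters.
raw = r'a\nb'

print(len(regular))  # 3
print(len(raw))      # 4

# Raw strings save you from doubling every backslash.
# The path below is a hypothetical example.
path_escaped = 'C:\\new_folder\\table.txt'
path_raw = r'C:\new_folder\table.txt'
print(path_escaped == path_raw)  # True
```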

## Multiline strings with triple quotes

While you can use the `\n` escape character to put a new line into a string, it is often easier to use multiline strings. A multiline string in Python begins and ends with either three single quotes or three double quotes. Any quotes, tabs or new lines in between the triple quotes are considered part of the string. Python's indentation rules for blocks do not apply to lines inside a multiline string.

In [12]:
print('''Dear Alice,

Eve's cat has been arrested for catnapping, cat burglary, and extortion.

Sincerely,
Bob''')

Dear Alice,

Eve's cat has been arrested for catnapping, cat burglary, and extortion.

Sincerely,
Bob


Notice that the single quote character in `Eve's` doesn't need to be escaped. Escaping single and double quotes is optional in multiline strings.

The following `print()` call would print identical text, but doesn't use a multiline string:

In [11]:
print('Dear Alice,\n\nEve\'s cat has been arrested for catnapping, cat burglary, and extortion.\n\nSincerely,\nBob')

Dear Alice,

Eve's cat has been arrested for catnapping, cat burglary, and extortion.

Sincerely,
Bob
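If you want to convince yourself that the two forms produce the same string, you can compare them with `==`. This is a minimal sketch, with both strings typed without trailing spaces; the variable names are only illustrative.

```python
multiline = '''Dear Alice,

Eve's cat has been arrested for catnapping, cat burglary, and extortion.

Sincerely,
Bob'''

escaped = 'Dear Alice,\n\nEve\'s cat has been arrested for catnapping, cat burglary, and extortion.\n\nSincerely,\nBob'

# Both spellings create exactly the same string value.
print(multiline == escaped)   # True
# The line breaks typed inside the triple quotes are stored as \n characters.
print(multiline.count('\n'))  # 5
```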


## Multiline comments

While the hash character (#) marks the beginning of a comment for the rest of the line, a multiline string is often used for comments that span multiple lines.

In [15]:
"""This is a test Python program. 
Written by Al Sweigart 

This program was designed for Python 3, not Python 2.
"""

'This is a test Python program. \nWritten by Al Sweigart \n\nThis program was designed for Python 3, not Python 2.\n'

## Indexing and slicing strings

Strings use indexes and slices the same way lists do. You can think of the string `Hello World!` as a list and each character in the string as an item with a corresponding index. The space and exclamation mark are included in the character count, so `Hello World!` is 12 characters long, from `H` at index 0 to `!` at index 11. 

In [21]:
spam = "Hello World!"
print(spam[0])
print(spam[-1])
print(spam[0:5])
print(spam[6:])

H
!
Hello
World!


If you specify an index, you'll get the character at that position in the string. If you specify a range from one index to another, the starting index is included and the ending index is not.

Slicing does not modify the original string. You can capture a slice from one variable in a separate variable. 

In [22]:
spam = "Hello World!"
fizz = spam[0:5]
print(fizz)

Hello
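The rule that the ending index is left out has a handy consequence: a slice from `start` to `end` always holds `end - start` characters, as long as both indexes are inside the string. The sketch below shows this and a few other slicing cases.

```python
spam = "Hello World!"

# The slice [0:5] holds 5 - 0 = 5 characters; index 5 (the space) is left out.
print(spam[0:5])        # Hello
print(len(spam[0:5]))   # 5

# Negative indexes count from the end of the string.
print(spam[-6:])        # World!

# Leaving out both indexes copies the whole string.
print(spam[:] == spam)  # True

# spam[100] would raise an IndexError, but a slice is simply cut short.
print(spam[6:100])      # World!
```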


## The `in` and `not in` operators with strings

The `in` and `not in` operators can be used with strings just like with list values. An expression with two strings joined using `in` or `not in` will evaluate to a Boolean `True` or `False`. 

In [23]:
"Hello" in "Hello World!"

True

In [24]:
"Hello" in "HELLO"

False

In [25]:
"" in "spam"

True

In [26]:
"cats" not in "cats and dogs"

False

These expressions test whether the first string (the exact string, case sensitive) can be found within the second string.
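In practice these operators often appear in conditions, for example to filter a list of strings. Here is a small sketch; the `messages` list is a made-up example.

```python
# A hypothetical list of messages to search through.
messages = ["Hello World!", "HELLO", "Goodbye"]

# Keep only the messages that contain the exact text "Hello".
matches = [message for message in messages if "Hello" in message]
print(matches)  # ['Hello World!']

# not in gives the opposite answer.
missing = [message for message in messages if "Hello" not in message]
print(missing)  # ['HELLO', 'Goodbye']
```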

## Using string methods

Several string methods analyse strings or create transformed string values. 

## The `upper()`, `lower()`, `isupper()`, and `islower()` string methods

The `upper()` and `lower()` string methods return a new string where all the letters in the original string have been converted to uppercase and lowercase, respectively. Nonletter characters in the string remain unchanged. 

In [34]:
spam = "Hello World!"
print (spam.upper())
spam = spam.upper()
print (spam)

HELLO WORLD!
HELLO WORLD!


In [33]:
spam = spam.lower()
print (spam)

hello world!


Note that these methods do not change the string itself, but return new string values. If you want to change the original string, you have to call `upper()` or `lower()` on the string and then assign the new string to the variable where the original was stored. This is why you must use `spam = spam.upper()` to change the string in `spam` instead of simply `spam.upper()`.

The `upper()` and `lower()` methods are helpful if you need to make a case-insensitive comparison. The strings `great` and `GREat` are not equal to each other. But in the following small program, it doesn't matter whether the user types `Great`, `GREAT` or `grEAT`, because the string is converted first to lowercase.

In [2]:
print ("How are you?")
feeling = input()
if feeling.lower() == "great":
    print ("I feel great too!")
else:
    print ("I hope the rest of your day is good.")

How are you?
GReat
I feel great too!


Adding code to your program to handle variations or mistakes in user input, such as inconsistent capitalisation, will make your programs easier to use and less likely to fail.
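The same idea can be wrapped in a small function so that it can be tried without typing at the keyboard. This is only a sketch; `is_great` is a hypothetical helper name.

```python
def is_great(answer):
    """Return True if answer is 'great' in any capitalisation."""
    return answer.lower() == "great"

# Try several spellings of the same answer.
for answer in ["Great", "GREAT", "grEAT", "fine"]:
    print(answer, is_great(answer))
```

The first three answers are accepted and only `fine` is rejected.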

The `isupper()` and `islower()` methods will return a Boolean `True` value if the string has at least one letter and all the letters are uppercase or lowercase, respectively. Otherwise, the method returns `False`.

In [3]:
spam = "Hello World!"

In [4]:
spam.islower()

False

In [5]:
spam.isupper()

False

In [6]:
"abcd12335".islower()

True

In [7]:
"12345324".islower()

False

Since the `upper()` and `lower()` methods themselves return strings, you can call string methods on those returned string values as well. Expressions that do this will look like a chain of method calls.

In [8]:
print ("Hello".upper())

HELLO


In [9]:
print ("Hello".upper().lower())

hello


## The isX string methods

Along with `islower()` and `isupper()`, there are several methods that have names beginning with the word `is`. These methods return a Boolean value that describes the nature of the string. Here are some common `isX` string methods:

* `isalpha()`. Returns `True` if string consists only of letters and is not blank.

* `isalnum()`. Returns `True` if string consists only of letters and numbers and is not blank.

* `isdecimal()`. Returns `True` if string consists only of numeric characters and is not blank.

* `isspace()`. Returns `True` if string consists only of spaces, tabs, and newlines and is not blank.

* `istitle()`. Returns `True` if string consists only of words that begin with an uppercase letter followed by only lowercase letters.

In [10]:
"hello".isalpha()


True

In [11]:
"hello".isalnum()

True

In [12]:
"1234".isalnum()


True
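The remaining methods from the list can be tried in the same way. The sketch below pairs each call with a case that fails the test; the example strings are arbitrary.

```python
print("hello123".isalpha())                # False: contains digits
print("hello123".isalnum())                # True
print("123".isdecimal())                   # True
print("12.5".isdecimal())                  # False: the dot is not a digit
print(" \t\n".isspace())                   # True
print("".isspace())                        # False: the string is blank
print("This Is Title Case".istitle())      # True
print("This is not Title Case".istitle())  # False
```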

The `isX` string methods are helpful when you need to validate user input.

In [20]:

while True:
    print ("Enter your age:")
    age = input()
    if age.isdecimal():
        break
    print ("Please enter a number for your age.")

while True:
    print ("Select a new password (letters and numbers only):")
    password = input()
    if password.isalnum():
        break
    print ("Passwords can only have letters and numbers!")

Enter your age:
Five
Please enter a number for your age.
Enter your age:
15
Select a new password (letters and numbers only):
www9)
Passwords can only have letters and numbers!
Select a new password (letters and numbers only):
wwi8


Calling `isdecimal()` and `isalnum()` on variables, we're able to test whether the values stored in those variables are decimal or not, alphanumerical or not. 
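The two checks can also be pulled out into functions, which makes them easy to try on fixed values instead of keyboard input. This is a sketch under that assumption; `is_valid_age` and `is_valid_password` are hypothetical names.

```python
def is_valid_age(text):
    """Return True if text consists only of decimal digits."""
    return text.isdecimal()

def is_valid_password(text):
    """Return True if text consists only of letters and numbers."""
    return text.isalnum()

# The same values that were typed in the session above.
print(is_valid_age("Five"))        # False
print(is_valid_age("15"))          # True
print(is_valid_password("www9)"))  # False
print(is_valid_password("wwi8"))   # True
```

Once `is_valid_age()` has returned `True`, the text can be converted to a number with `int()`.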

## The `startswith()` and `endswith()` string methods

The `startswith()` and `endswith()` methods return `True` if the string value they are called on begins or ends with, respectively, the string passed to the method; otherwise they return `False`.

In [21]:
"Hello World".startswith("Hello")

True

In [22]:
"abs12312".endswith("12")

True

These methods are useful alternatives to the == operator if you need to check only whether the first or last part of the string, rather than the whole thing, is equal to another string.
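A typical use is picking files by their extension. The sketch below uses a made-up list of filenames; it also shows that both methods accept a tuple of strings and return `True` if any of them matches.

```python
# Hypothetical filenames, for illustration only.
filenames = ["report.txt", "photo.png", "notes.txt", "draft_report.doc"]

# Keep the names that end with .txt.
text_files = [name for name in filenames if name.endswith(".txt")]
print(text_files)  # ['report.txt', 'notes.txt']

# A tuple argument matches any one of several endings.
documents = [name for name in filenames if name.endswith((".txt", ".doc"))]
print(documents)   # ['report.txt', 'notes.txt', 'draft_report.doc']

# startswith() compares only the first part; == compares the whole string.
print("report.txt".startswith("report"))  # True
print("report.txt" == "report")           # False
```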

## The `join()` and `split()` string methods

The `join()` method is useful when you have a list of strings that need to be joined together into a single value. The `join()` method is called on a string, gets passed a list of strings and returns a string. The returned string is the concatenation of each string in the passed-in list. 

In [25]:
", ".join(["cats", "rats", "bats"])

'cats, rats, bats'

In [26]:
" ".join(["My", "name", "is", "Simon"])

'My name is Simon'

In [27]:
"ABSCSDSD".join(["My", "name", "is", "Simon"])

'MyABSCSDSDnameABSCSDSDisABSCSDSDSimon'

Notice that the string `join()` is called on is inserted between each string of the list argument. Remember that `join()` is called on a string value, is passed a list of strings, and returns a single string.

The `split()` method does the opposite: it's called on a string value and returns a list of strings.

In [28]:
"My name is Simon!".split()

['My', 'name', 'is', 'Simon!']

By default the string is split wherever whitespace characters such as the space, tab or newline characters are found. These whitespace characters are not included in the strings in the returned list. You can pass a delimiter string to the `split()` method to specify a different string to split upon. 

In [29]:
'MyABCnameABCisABCSimon'.split('ABC')

['My', 'name', 'is', 'Simon']

In [30]:
"My name is Simon".split("m")

['My na', 'e is Si', 'on']
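Because `split()` and `join()` are opposites, splitting a sentence and joining the pieces again gives back the original text. The sketch below shows that round trip, and also that `join()` only accepts strings, so other values have to be converted with `str()` first. The `numbers` list is a made-up example.

```python
sentence = "My name is Simon"

# Split on whitespace, then put the words back together.
words = sentence.split()
print(words)            # ['My', 'name', 'is', 'Simon']
rebuilt = " ".join(words)
print(rebuilt)          # My name is Simon

# join() raises a TypeError for non-string items, so convert them first.
numbers = [1, 2, 3]
joined_numbers = ", ".join(str(n) for n in numbers)
print(joined_numbers)   # 1, 2, 3
```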

A common use of `split()` is to split a multiline string along the newline characters.

In [32]:
spam = '''Dear Alice,
How have you been? I am fine.
There is a container in the fridge
that is labeled "Milk Experiment".

Please do not drink it.
Sincerely,
Bob'''

spam.split("\n")

['Dear Alice,',
 'How have you been? I am fine.',
 'There is a container in the fridge',
 'that is labeled "Milk Experiment".',
 '',
 'Please do not drink it.',
 'Sincerely,',
 'Bob']

Passing `split()` the argument `"\n"` lets us split the multiline string stored in `spam` along the newlines and return a list in which each item corresponds to one line of the string.
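Python strings also have a `splitlines()` method, which is not covered in this chapter. It behaves like `split("\n")` for text such as the letter above, but differs when the string ends with a newline. A short sketch with a made-up string:

```python
text = "first line\nsecond line\n"

# split("\n") produces an empty string after the final newline.
by_split = text.split("\n")
print(by_split)       # ['first line', 'second line', '']

# splitlines() does not.
by_splitlines = text.splitlines()
print(by_splitlines)  # ['first line', 'second line']
```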

## Justifying text with `rjust()`, `ljust()` and `center()`

The `rjust()` and `ljust()` string methods return a padded version of the string they are called on, with spaces inserted to justify the text. The first argument to both methods is an integer length for the justified string.  

In [33]:
"Hello".rjust(10)

'     Hello'

In [34]:
"Hello".rjust(20)

'               Hello'

In [35]:
"Hello World".ljust(20)

'Hello World         '

`"Hello".rjust(10)` says that we want to right-justify `Hello` in a string of total length 10. `Hello` is five characters, so five spaces will be added to its left, giving a string of 10 characters with `Hello` justified right.

An optional second argument to `rjust()` and `ljust()` specifies a fill character other than a space character.

In [36]:
"Hello".rjust(10, "?")

'?????Hello'

In [37]:
"Hello".ljust(20, "-")

'Hello---------------'

In [38]:
"Hello".center(30, "=")

'============Hello============='

These methods are especially useful when you need to print tabular data that has the correct spacing. 

Using `rjust()`, `ljust()` and `center()` lets you ensure strings are neatly aligned, even if you aren't sure how many characters long your strings are. We can, for example, take the information stored in a dictionary and print it in a neatly aligned table format.

In [45]:
# We want to organise the information about picnic items into a table
# with two columns; items left, number right

# We have a function that takes a dictionary and two numbers as arguments
def printPicnic(itemsDict, leftWidth, rightWidth):
    
    # Print the title of the table
    print ("PICNIC ITEMS".center(leftWidth + rightWidth, "-"))
    
    # k = key in dictionary; v = value in dictionary
    # Adjust k to the left and fill with dots to leftWidth
    # Make string out of v, adjust right, fill with space to rightWidth
    for k, v in itemsDict.items():
        print(k.ljust(leftWidth, ".") + str(v).rjust(rightWidth))
        
# dictionary picnicItems of picnic items and numbers
picnicItems = {"sandwiches": 4, "apples": 12, "cups": 4, "cookies": 8000}

# printPicnic() takes three arguments: the dictionary (picnicItems), the width
# of the left column (leftWidth: 12 or 20) and the width of the right column
# (rightWidth: 5 or 6).
printPicnic(picnicItems, 12, 5)
print ("\n")
printPicnic(picnicItems, 20, 6)

---PICNIC ITEMS--
sandwiches..    4
apples......   12
cups........    4
cookies..... 8000


-------PICNIC ITEMS-------
sandwiches..........     4
apples..............    12
cups................     4
cookies.............  8000
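If you don't want to pick the column widths by hand, they can be derived from the data. The following is a sketch of one possible approach, not part of the original program: `print_table` and its `gap` parameter are hypothetical, and the function assumes the dictionary is not empty.

```python
def print_table(items, gap=4):
    """Print a two-column table whose widths are derived from the data."""
    # The left column fits the longest key plus some dots as a gap.
    left_width = max(len(name) for name in items) + gap
    # The right column fits the longest number plus one leading space.
    right_width = max(len(str(count)) for count in items.values()) + 1

    lines = ["PICNIC ITEMS".center(left_width + right_width, "-")]
    for name, count in items.items():
        lines.append(name.ljust(left_width, ".") + str(count).rjust(right_width))

    print("\n".join(lines))
    return lines  # returned as well, so the result can be inspected

picnic_items = {"sandwiches": 4, "apples": 12, "cups": 4, "cookies": 8000}
table = print_table(picnic_items)
```

Every line of the table, including the title, ends up with the same length.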


## Removing whitespace with `strip()`, `rstrip()` and `lstrip()`

Sometimes you may want to strip off whitespace characters (space, tab and newline) from the left side, right side, or both sides of a string. The `strip()` string method will return a new string without any whitespace characters at the beginning or end. The `lstrip()` and `rstrip()` methods will remove whitespace characters from the left and right ends, respectively. 

In [49]:
spam = "         Hello World         "
print (spam)
print (spam.strip())
print (spam.lstrip())
print (spam.rstrip())

         Hello World         
Hello World
Hello World         
         Hello World


Optionally, a string argument will specify which characters on the ends should be stripped.
Passing `strip()` the argument `ampS` will tell it to strip occurrences of `a`, `m`, `p` and `S` from both ends of the string stored in `spam`. The order of characters in the string passed to `strip()` does not matter: `strip("ampS")` will do the same thing as `strip("mapS")` or `strip("Sapm")`.

In [53]:
spam = "SpamSpamBaconSpamEggsBaconSpam"
print (spam.strip("aSmp"))

BaconSpamEggsBacon
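Stripping stops at the first character that is not in the argument, which is why the `Spam` in the middle of the string survives. The sketch below compares the three methods on the same string and then shows a typical clean-up of a line of input; the `answer` value is a made-up example.

```python
spam = "SpamSpamBaconSpamEggsBaconSpam"

# Only the ends are affected; the Spam in the middle is untouched.
print(spam.strip("ampS"))   # BaconSpamEggsBacon
print(spam.lstrip("ampS"))  # BaconSpamEggsBaconSpam
print(spam.rstrip("ampS"))  # SpamSpamBaconSpamEggsBacon

# Tidy up a line of input before comparing it.
answer = "  Great!\n"
cleaned = answer.strip().lower()
print(cleaned == "great!")  # True
```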


## Copying and pasting strings with the pyperclip module

The `pyperclip` module has `copy()` and `paste()` functions that can send text to and receive text from your computer's clipboard. Sending the output of your program to the clipboard will make it easy to paste it into an email, word processor or some other software.

Pyperclip doesn't come with Python. You have to install and import it first.

In [58]:
import pyperclip
pyperclip.copy ("Hello World")
print (pyperclip.paste())

Hello World


Of course, if something outside of your program changes the clipboard contents, the `paste()` function will return the new contents.

In [57]:
# Copy something e.g. from your browser window and then return it here

print (pyperclip.paste())

To install `pyperclip`, follow the directions for installing third-party modules in Appendix A.


# Project: Password Locker

Create a password manager that uses one master password to unlock the password manager. Then you can copy any account password to the clipboard and paste it into the website’s Password field.


## Step 1: Program design and data structures

You want to be able to run this program with a command line argument that is the account’s name—for instance, email or blog. That account’s password will be copied to the clipboard so that the user can paste it into a Password field. This way, the user can have long, complicated passwords without having to memorize them.

Open a new file editor window and save the program as pw.py. You need to start the program with a `#!` (shebang) line (see Appendix B) and should also write a comment that briefly describes the program. Since you want to associate each account’s name with its password, you can store these as strings in a dictionary. The dictionary will be the data structure that organizes your account and password data. Make your program look like the following:

`#! python3`

`# pw.py - An insecure password locker program.`

`PASSWORDS = {'email': 'F7minlBDDuvMJuxESSKHFhTxFtjVB6',`

             `'blog': 'VmALvQyKAxiVH5G8v01if1MLZF3sdt',`
             
             `'luggage': '12345'}`

## Step 2: Handle command line arguments

The command line arguments will be stored in the variable `sys.argv`. (See Appendix B for more information on how to use command line arguments in your programs.) The first item in the `sys.argv` list should always be a string containing the program’s filename (`'pw.py'`), and the second item should be the first command line argument. For this program, this argument is the name of the account whose password you want. Since the command line argument is mandatory, you display a usage message to the user if they forget to add it (that is, if the `sys.argv` list has fewer than two values in it). Make your program look like the following:

`import sys`

`if len(sys.argv) < 2:`

    `print("Usage: python pw.py [account] - copy account password")`
    
    `sys.exit()`

`account = sys.argv[1]  # first command line argument is the account name`

## Step 3: Copy the right password

Now that the account name is stored as a string in the variable `account`, you need to see whether it exists in the `PASSWORDS` dictionary as a key. If so, you want to copy the key’s value to the clipboard using `pyperclip.copy()`. (Since you’re using the pyperclip module, you need to import it.) Note that you don’t actually need the account variable; you could just use `sys.argv[1]` everywhere `account` is used in this program. But a variable named `account` is much more readable than something cryptic like `sys.argv[1]`.

`import pyperclip`

`if account in PASSWORDS:`

    `pyperclip.copy(PASSWORDS[account])`
    
    `print ("Password for " + account + " copied to clipboard.")`

`else:`
    
    `print ("There is no account named " + account)`
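
Put together, the three steps might look like the sketch below. It is one possible arrangement rather than the definitive program: the lookup is moved into a hypothetical helper, `find_password`, so it can be tried without touching the clipboard, and `main` simply returns instead of calling `sys.exit()`. The third-party `pyperclip` module is imported only when a password is actually copied.

```python
#! python3
# pw.py - An insecure password locker program (assembled sketch).
import sys

PASSWORDS = {'email': 'F7minlBDDuvMJuxESSKHFhTxFtjVB6',
             'blog': 'VmALvQyKAxiVH5G8v01if1MLZF3sdt',
             'luggage': '12345'}


def find_password(argv, passwords=PASSWORDS):
    """Return (account, password); a value is None when it is unavailable.

    find_password is a hypothetical helper, not part of the original steps.
    """
    if len(argv) < 2:
        return None, None
    account = argv[1]  # first command line argument is the account name
    return account, passwords.get(account)


def main(argv):
    account, password = find_password(argv)
    if account is None:
        print("Usage: python pw.py [account] - copy account password")
    elif password is None:
        print("There is no account named " + account)
    else:
        import pyperclip  # third-party module, needed only for copying
        pyperclip.copy(password)
        print("Password for " + account + " copied to clipboard.")


if __name__ == "__main__":
    main(sys.argv)
```

Running `python pw.py email` would then copy the email password to the clipboard, while `python pw.py` on its own prints the usage message.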
    
