# Text Strings

Strings are the first example of a Python sequence. Each string is a sequence of characters.

Unlike strings in some other languages, strings in Python are immutable. You can't change a string in place, but you can copy parts of strings to another string to get the same effect.

## Create with Quotes

You make a Python string by enclosing characters in matching single or double quotes.

The interactive interpreter echoes strings with single quotes, but Python treats all of them exactly the same.

In [1]:
'Snap'

'Snap'

In [2]:
"Crackle" 

'Crackle'

Why have two kinds of quote characters? The main purpose is to create strings containing quote characters.

In [3]:
"'Nay', said the naysayer. 'Neigh?' said the horse."

"'Nay', said the naysayer. 'Neigh?' said the horse."

In [4]:
'The rare double quote in captivity: ".'

'The rare double quote in captivity: ".'

In [5]:
'A "two by four" is actually 1 1/2" x 3 1/2".'

'A "two by four" is actually 1 1/2" x 3 1/2".'

In [6]:
"'There's the man that shot my paw!' cried the limping hound."

"'There's the man that shot my paw!' cried the limping hound."

You can also use three single quotes (''') or three double quotes (""").

In [7]:
'''Boom!'''

'Boom!'

In [8]:
"""Eek!"""

'Eek!'

Triple quotes aren't very useful for short strings. Their most common use is to create multiline strings.

In [9]:
poem = '''There was a Young Lady of Norway,
Who casually sat in a doorway;
When the door squeezed her flat,
She exclaimed, "What of that?"
This courageous Young Lady of Norway.'''

In [10]:
poem = 'There was a Young Lady of Norway,
Who casually sat in a doorway;
When the door squeezed her flat,
She exclaimed, "What of that?"
This courageous Young Lady of Norway.'

SyntaxError: EOL while scanning string literal (<ipython-input-10-217efcb5be20>, line 1)

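There is a difference between the output of print( ) and the automatic echoing done by the interactive interpreter: the echo shows the quotes and escape sequences, while print( ) shows the string's actual contents.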

![Screenshot 2020-03-11 3.53.12 PM](attachment:%E8%9E%A2%E5%B9%95%E5%BF%AB%E7%85%A7%202020-03-11%20%E4%B8%8B%E5%8D%883.53.12.png)

In [11]:
poem2 = '''I do not like thee, Doctor Fell.
    The reason why, I cannot tell. 
    But this I know, and know full well:
    I do not like thee, Doctor Fell. 
'''
poem2

'I do not like thee, Doctor Fell.\n    The reason why, I cannot tell. \n    But this I know, and know full well:\n    I do not like thee, Doctor Fell. \n'

In [12]:
# print() strips the quotes from a string and prints its contents.
print(poem2) 

I do not like thee, Doctor Fell.
    The reason why, I cannot tell. 
    But this I know, and know full well:
    I do not like thee, Doctor Fell. 
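
The echoed form is what the built-in repr() function returns, so the two views can be compared in ordinary code as well. A minimal sketch, where `verse` is a made-up string used only for illustration:

```python
# 'verse' is a made-up example string used only for this illustration.
verse = 'Roses are red,\n\tViolets are blue.'

# repr() returns the quoted, escaped form that the interpreter echoes.
echoed = repr(verse)
print(echoed)

# print() writes the characters themselves, so \n and \t take effect.
print(verse)
```

The first print shows a single line with visible \n and \t; the second shows two lines with a real tab.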



There is also the empty string, which has no characters at all but is perfectly valid.

In [13]:
''

''

In [14]:
""

''

In [15]:
''''''

''

In [16]:
""""""

''

## Create with str( )

You can make a string from another data type by using the str( ) function.

In [17]:
str(98.6)

'98.6'

In [18]:
str(1.0e4)

'10000.0'

In [19]:
str(True)

'True'
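
One place where str( ) is typically needed is when a number has to be joined to text. The sketch below assumes a hypothetical `temperature` value; it only illustrates that Python does not convert the number for you.

```python
temperature = 98.6  # hypothetical value for this example

# str() turns the float into text so that + can join it to another string.
label = 'Temperature: ' + str(temperature)
print(label)

# Without str(), Python refuses to mix str and float.
try:
    'Temperature: ' + temperature
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)
```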

## Escape with \

Python lets you escape the meaning of some characters within strings to achieve effects that would otherwise be difficult to express. By preceding a character with a backslash ( \\ ), you give it a special meaning.

In [20]:
palindrome = 'A man,\nA plan,\nA canal:\nPanama.'
print(palindrome)

A man,
A plan,
A canal:
Panama.


In [21]:
print('\tabc')

	abc


In [22]:
print('a\tbc')

a	bc


In [23]:
print('ab\tc')

ab	c


In [24]:
# The final string has a terminating tab which you can't see
print('abc\t') 

abc	


In [25]:
testimony = "\" I did nothing!\" he said. \"Or that other thing.\""
testimony

'" I did nothing!" he said. "Or that other thing."'

In [26]:
print(testimony)

" I did nothing!" he said. "Or that other thing."


In [27]:
fact = "The world's largest rubber duck was 54'2\" by 65'7\" by 105'"
print(fact)

The world's largest rubber duck was 54'2" by 65'7" by 105'


In [28]:
speech = 'The backslash(\\) bends over backwards to please you.'
print(speech)

The backslash(\) bends over backwards to please you.


In [29]:
# a raw string negates escapes
info = r'Type a \n to get a new line in a normal string'
print(info)

Type a \n to get a new line in a normal string
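
Raw strings are handy whenever the text itself is full of backslashes. As a small sketch (the path below is invented for illustration), compare how the same characters are read with and without the r prefix:

```python
# An invented Windows-style path, used only to show the difference.
normal = 'C:\new\table'   # \n and \t are read as a newline and a tab
raw = r'C:\new\table'     # every backslash is kept literally

print(len(normal))  # 10
print(len(raw))     # 12
print(raw)
```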


## Combine by Using +

You can combine literal strings or string variables in Python by using the + operator

In [30]:
'Release the kraken! ' + 'No, wait!'

'Release the kraken! No, wait!'

You can also combine literal strings (not string variables) just by having one after the other

In [31]:
"My word! " "A gentleman caller!"

'My word! A gentleman caller!'

In [32]:
"Alas! ""The Kraken!"

'Alas! The Kraken!'

Python does not add spaces for you when concatenating strings, although print( ) does add a space between each of its arguments.

In [33]:
a = 'Duck.'

In [34]:
b = a

In [35]:
c = 'Grey Duck!'

In [36]:
a + b + c

'Duck.Duck.Grey Duck!'

In [37]:
print(a, b, c)

Duck. Duck. Grey Duck!
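
If you want control over the spacing, you can add the spaces yourself or change what print( ) puts between its arguments. A short sketch using the same variables; the `sep` keyword is a standard print( ) parameter that defaults to a single space:

```python
a = 'Duck.'
b = a
c = 'Grey Duck!'

# + joins the strings exactly as they are, so add any spaces yourself.
spaced = a + ' ' + b + ' ' + c
print(spaced)

# print() separates its arguments with sep, which defaults to one space.
print(a, b, c, sep='')
print(a, b, c, sep=' / ')
```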


## Duplicate with * 

You use the * operator to duplicate a string. Notice that the * has higher precedence than +

In [38]:
start = 'Na ' * 4 + '\n'

In [39]:
middle = 'Hey ' * 3 + '\n'

In [40]:
end = 'Goodbye.'

In [41]:
print(start + start + middle + end)

Na Na Na Na 
Na Na Na Na 
Hey Hey Hey 
Goodbye.
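
To see the precedence rule on its own, here is a small sketch (the variable names are only for illustration) comparing the default grouping with explicit parentheses:

```python
# * binds tighter than +, so only 'Na ' is repeated here.
chorus = 'Na ' * 2 + 'Hey '
print(chorus)    # Na Na Hey

# Parentheses change the grouping: the combined string is repeated.
grouped = ('Na ' + 'Hey ') * 2
print(grouped)   # Na Hey Na Hey
```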


## Get a Character with [ ]

To get a single character from a string, specify its offset inside square brackets after the string's name. The first (leftmost) offset is 0, the next is 1, and so on. The last (rightmost) offset can be specified with -1, so you don't have to count; going to the left are -2, -3, and so on

In [42]:
letters = 'abcdefghijklmnopqrstuvwxyz'

In [43]:
letters[0]

'a'

In [44]:
letters[1]

'b'

In [45]:
letters[-1]

'z'

In [46]:
letters[-2]

'y'

In [47]:
letters[25]

'z'

In [48]:
letters[5]

'f'

If you specify an offset that is the length of the string or longer (remember, offsets go from 0 to length - 1), you'll get an exception.

In [49]:
letters[26]

IndexError: string index out of range

In [50]:
letters[1000]

IndexError: string index out of range

Because strings are immutable, you can't insert a character directly into one or change the character at a specific index.

In [51]:
name = 'Henny'
name[0] = 'P'

TypeError: 'str' object does not support item assignment
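
The usual workaround is to build a new string instead. A minimal sketch of two ways to do it, using replace( ) and a slice (both are covered later in this section):

```python
name = 'Henny'

# replace() returns a new string; the original is left untouched.
by_replace = name.replace('H', 'P')

# A slice plus concatenation builds the same result.
by_slice = 'P' + name[1:]

print(by_replace, by_slice, name)   # Penny Penny Henny
```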

## Get a Substring with a Slice 

You can extract a substring from a string by using a slice

You define a slice by using square brackets, a start offset, an end offset, and an optional step count between them.

The slice will include characters from offset start to the one before end. If you don't specify start, the slice uses 0 (the beginning). If you don't specify end, it uses the end of the string.
* [ : ] extracts the entire sequence from start to end.
* [ start : ] specifies from the start offset to the end.
* [ : end ] specifies from the beginning to the end offset minus 1.
* [ start : end ] indicates from the start offset to the end offset minus 1.
* [ start : end : step ] extracts from the start offset to the end offset minus 1, skipping characters by step.

In [52]:
letters = 'abcdefghijklmnopqrstuvwxyz'

In [53]:
letters[:]

'abcdefghijklmnopqrstuvwxyz'

In [54]:
letters[20:]

'uvwxyz'

In [55]:
letters[10:]

'klmnopqrstuvwxyz'

In [56]:
letters[12:15]

'mno'

In [57]:
letters[-3:]

'xyz'

In [58]:
letters[18:-3]

'stuvw'

In [59]:
letters[-6:-2]

'uvwx'

In [60]:
letters[::7]

'ahov'

In [61]:
letters[4:20:3]

'ehknqt'

In [62]:
letters[19::4]

'tx'

In [63]:
letters[:21:5]

'afkpu'

Given a negative step size, this handy Python slicer can also step backward. The following slices start at the end and finish at the start, skipping nothing.

In [64]:
letters[-1::-1]

'zyxwvutsrqponmlkjihgfedcba'

In [65]:
letters[::-1]

'zyxwvutsrqponmlkjihgfedcba'
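
A reversed slice makes some checks very short. As an illustration only, the hypothetical helper `is_palindrome` below keeps just the letters of a string and compares the result with its own reverse:

```python
def is_palindrome(text):
    """Return True if the letters of text read the same in both directions."""
    # Keep only letters, lowercased, so punctuation and case are ignored.
    cleaned = ''.join(ch.lower() for ch in text if ch.isalpha())
    return cleaned == cleaned[::-1]

print(is_palindrome('A man,\nA plan,\nA canal:\nPanama.'))   # True
print(is_palindrome('Grey Duck!'))                           # False
```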

Slices are more forgiving of bad offsets than are single-index lookups with []. A slice offset earlier than the beginning of a string is treated as 0, and one after the end is treated as the length of the string (the end).

In [66]:
letters[-50:]

'abcdefghijklmnopqrstuvwxyz'

In [67]:
letters[-51:-50]

''

In [68]:
letters[:70]

'abcdefghijklmnopqrstuvwxyz'

In [69]:
letters[70:71]

''
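
The contrast with single-index lookups can be checked directly. This sketch reuses `letters`; the other names are only for illustration:

```python
letters = 'abcdefghijklmnopqrstuvwxyz'

# An end offset past the end is clamped to the length of the string.
clamped = letters[20:1000]
whole = letters[:70]

# That is not the same as -1, which would drop the last character.
all_but_last = letters[:-1]

# A single-index lookup with the same bad offset raises an exception.
try:
    letters[1000]
    lookup_failed = False
except IndexError:
    lookup_failed = True

print(clamped, len(whole), len(all_but_last), lookup_failed)
```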

## Get Length with len( )

The len( ) function counts the characters in a string.

In [70]:
len(letters)

26

In [71]:
empty=''
len(empty)

0

## Split with split( )

You can use the built-in string split( ) function to break a string into a list of smaller strings based on some separator

In [72]:
tasks = 'get gloves,get mask,give cat vitamins,call ambulance'

In [73]:
# the string function split(), with the single argument ','
tasks.split(',') 

['get gloves', 'get mask', 'give cat vitamins', 'call ambulance']

If you don't specify a separator, split() uses any sequence of whitespace characters (newlines, spaces, and tabs).

In [74]:
tasks.split()

['get', 'gloves,get', 'mask,give', 'cat', 'vitamins,call', 'ambulance']
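
The two modes also differ in how they treat repeated separators. A small sketch with made-up input strings: an explicit separator yields an empty string between two adjacent separators, while the default mode treats a whole run of whitespace as one break.

```python
# Two commas in a row produce an empty string in the result.
row = 'get gloves,,get mask'
by_comma = row.split(',')
print(by_comma)        # ['get gloves', '', 'get mask']

# With no argument, any run of spaces, tabs, or newlines counts once.
spaced = 'get   gloves\tget\nmask'
by_whitespace = spaced.split()
print(by_whitespace)   # ['get', 'gloves', 'get', 'mask']
```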

## Combine by Using join( )

The join( ) function is the opposite of split( ). It collapses a list of strings into a single string.

It looks a bit backward because you specify the string that glues everything together first, and then the list of strings to glue: string.join(list)

In [75]:
crypto_list = ['Yeti', 'Bigfoot', 'Loch Ness Monster']

In [76]:
crypto_string = ",".join(crypto_list)

In [77]:
print('Found and signing book deals:', crypto_string)

Found and signing book deals: Yeti,Bigfoot,Loch Ness Monster
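
Because join( ) and split( ) are opposites, a round trip with the same separator gives back the original list. The sketch below also shows that join( ) expects strings, so other values need str( ) first; `counts` is a made-up list for illustration.

```python
crypto_list = ['Yeti', 'Bigfoot', 'Loch Ness Monster']

# Join with ', ' and split on the same separator to get the list back.
crypto_string = ', '.join(crypto_list)
round_trip = crypto_string.split(', ')
print(round_trip == crypto_list)   # True

# join() only accepts strings, so convert numbers before joining.
counts = [1, 2, 3]
count_string = '-'.join(str(n) for n in counts)
print(count_string)                # 1-2-3
```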


## Substitute by Using replace( )

You can use replace( ) for simple substring substitution. Give it the old substring, the new one, and how many instances of the old substring to replace. It returns the changed string but does not modify the original string.

In [78]:
setup = "a duck goes into a bar ..."

In [79]:
setup.replace('duck', 'marmoset')

'a marmoset goes into a bar ...'

In [80]:
setup

'a duck goes into a bar ...'

If you omit the final count argument, it replaces all instances.

In [81]:
# When you know the exact substring(s) you want to change,
# replace() is a good choice. But watch out.
setup.replace('a', 'a famous', 100)

'a famous duck goes into a famous ba famousr ...'
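
One way to avoid the stray match inside 'bar' is to make the old substring more specific, for example by including the space after the word. A sketch of that idea, together with a count of 1 to change only the first match:

```python
setup = "a duck goes into a bar ..."

# Including the trailing space keeps the 'a' inside 'bar' from matching.
all_matches = setup.replace('a ', 'a famous ')
print(all_matches)   # a famous duck goes into a famous bar ...

# A count of 1 replaces only the first match.
first_only = setup.replace('a ', 'a famous ', 1)
print(first_only)    # a famous duck goes into a bar ...
```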

## Strip with strip( )

It's very common to strip leading or trailing "padding" characters from a string, especially spaces. If you don't give them an argument, the strip( ) functions assume that you want to get rid of whitespace characters (' ', '\t', '\n'): strip( ) strips both ends, lstrip( ) only from the left, and rstrip( ) only from the right.

In [82]:
world = "  earth  "

In [83]:
world.strip()

'earth'

In [84]:
world.strip(' ')

'earth'

In [85]:
world.lstrip()

'earth  '

In [86]:
world.rstrip()

'  earth'

In [87]:
# If the character is not there, nothing happens
world.strip('!')

'  earth  '

Besides no arguments (meaning whitespace characters) or a single character, you can also tell strip( ) to remove any character in a multicharacter string.

In [88]:
blurt = "What the ...!!?"

In [89]:
blurt.strip('.!?')

'What the '
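
The argument is treated as a set of characters to remove, not as a prefix or suffix to match. A minimal sketch with made-up strings that shows the three variants side by side:

```python
# Mixed whitespace on both ends: tab, spaces, and a newline.
raw_line = '\t  earth  \n'
both = raw_line.strip()          # 'earth'
left_only = raw_line.lstrip()    # 'earth  \n'
right_only = raw_line.rstrip()   # '\t  earth'

# Any mix of the listed characters is removed, in any order.
tagged = '***earth!!!'
untagged = tagged.strip('*!')        # 'earth'
left_untagged = tagged.lstrip('*!')  # 'earth!!!'

print(repr(both), repr(left_only), repr(right_only))
print(untagged, left_untagged)
```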

## Search and Select

In [90]:
poem = """All that doth flow we cannot liquid name
Or else would fire and water be the same;
But that is liquid which is moist and wet
Fire that property can never get.
Then 'tis not cold that doth the fire put out
But 'tis the wet that makes it die, no doubt."""

In [91]:
poem[:13]

'All that doth'

In [92]:
len(poem)

250

In [93]:
poem.startswith('All')

True

In [94]:
poem.endswith('That\'s all, folks!')

False

Python has two methods (find( ) and index( )) for finding the offset of a substring, and each has two versions: one that starts from the beginning, and one that starts from the end (rfind( ) and rindex( )). They work the same if the substring is found. If it isn't, find( ) returns -1, and index( ) raises an exception.

In [95]:
word = 'the'

In [96]:
poem.find(word)

73

In [97]:
poem.index(word)

73

In [98]:
# the offset of the last the
word = 'the'
poem.rfind(word)

214

In [99]:
poem.rindex(word)

214

In [100]:
# if the substring isn't in there
word = "duck"
poem.find(word)

-1

In [101]:
poem.rfind(word)

-1

In [102]:
poem.index(word)

ValueError: substring not found

In [103]:
poem.rindex(word)

ValueError: substring not found
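
Which method to use depends on how you want to handle a miss. The sketch below wraps index( ) in a hypothetical helper, `offset_or_none`, and also shows the `in` operator for a plain yes-or-no test:

```python
def offset_or_none(text, word):
    """Return the offset of word in text, or None if it is absent."""
    try:
        return text.index(word)
    except ValueError:
        return None

line = "But 'tis the wet that makes it die, no doubt."
found = offset_or_none(line, 'the')
missing = offset_or_none(line, 'duck')
present = 'wet' in line

print(found, missing, present)   # 9 None True
```

With find( ), remember to compare the result with -1 explicitly, because -1 is also a valid offset meaning the last character.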

How many times does the three-letter sequence "the" occur?

In [104]:
word = 'the'
poem.count(word)

3

Are all of the characters in the poem either letters or numbers?

In [105]:
poem.isalnum()

False

## Case 

In [106]:
setup = 'a duck goes into a bar...'

In [107]:
# Remove . sequences from both ends
setup.strip('.')

'a duck goes into a bar'

In [108]:
# Capitalize the first word
setup.capitalize()

'A duck goes into a bar...'

In [109]:
# Capitalize all the words
setup.title()

'A Duck Goes Into A Bar...'

In [110]:
# Convert all characters to uppercase
setup.upper()

'A DUCK GOES INTO A BAR...'

In [111]:
# Convert all characters to lowercase
setup.lower()

'a duck goes into a bar...'

In [112]:
# Swap uppercase and lowercase
setup.swapcase()

'A DUCK GOES INTO A BAR...'

## Alignment

In [113]:
# Center the string within 30 spaces
setup.center(30)

'  a duck goes into a bar...   '

In [114]:
# Left justify
setup.ljust(30)

'a duck goes into a bar...     '

In [115]:
# Right justify
setup.rjust(30)

'     a duck goes into a bar...'
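
These three methods also accept an optional second argument, a single fill character to use instead of a space. A short sketch with a made-up word:

```python
word = 'duck'

# The second argument is the fill character (one character only).
centered = word.center(10, '*')
left = word.ljust(10, '.')
right = word.rjust(10, '.')

print(centered)   # ***duck***
print(left)       # duck......
print(right)      # ......duck
```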

## Formatting

Let's look at how to interpolate data values into strings using various formats. You can use this to produce reports, forms, and other outputs where appearances need to be just so.

Python has three ways of formatting strings:
* old style (supported in Python 2 and 3)
* new style (Python 2.6 and up)
* f-string (Python 3.6 and up)

### Old-style: %

The old style of string formatting has the form format_string % data. Inside the format string, a % followed by a letter indicates the conversion type to apply to the data.

In [116]:
'%s' % 42 # %s string

'42'

In [117]:
'%d' % 42 # %d decimal integer

'42'

In [118]:
'%x' % 42 # %x hex integer

'2a'

In [119]:
'%o' % 42 # %o octal integer

'52'

In [120]:
'%s' % 7.03 # %s string

'7.03'

In [121]:
'%f' % 7.03 # %f decimal float

'7.030000'

In [122]:
'%e' % 7.03 # %e exponential float

'7.030000e+00'

In [123]:
'%g' % 7.03 # %g  decimal or exponential float

'7.03'

In [124]:
# An integer and a literal %
'%d%%' % 100

'100%'

In [125]:
actor = 'Richard Gere'
cat = 'Chester'
weight = 28

In [126]:
"My wife's favorite actor is %s" % actor

"My wife's favorite actor is Richard Gere"

In [127]:
# Multiple values must be grouped into a tuple
"Our cat %s weighs %s pounds" % (cat, weight) 

'Our cat Chester weighs 28 pounds'
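
Different conversion types can be mixed in one format string, as long as the tuple supplies one value per placeholder in the same order. A sketch under stated assumptions: `KG_PER_POUND` is an approximate conversion factor introduced only for this example.

```python
cat = 'Chester'
weight = 28
KG_PER_POUND = 0.4536   # approximate factor, used only for this example

# %s, %d, and %.1f each consume one value from the tuple, in order.
line = 'Our cat %s weighs %d pounds (%.1f kg)' % (cat, weight, weight * KG_PER_POUND)
print(line)   # Our cat Chester weighs 28 pounds (12.7 kg)

# A single value can also be passed as a one-element tuple.
single = '%s' % (cat,)
print(single)
```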

You can add other values in the format string between the % and the type specifier to designate minimum and maximum widths, alignment, and character filling:
* An initial '%' character
* An optional alignment character: nothing means right-align, and '-' means left-align. A '+' also right-aligns and forces a sign to be printed for numbers
* An optional minwidth field width to use
* An optional '.' character to separate minwidth and maxchars.
* An optional maxchars (if conversion type is s) saying how many characters to print from the data value. If the conversion type is f, this specifies precision (how many digits to print after the decimal point). If the conversion type is d, it sets the minimum number of digits to print
* The conversion type character

In [128]:
thing = 'woodchuck'

In [129]:
'%s' % thing

'woodchuck'

In [130]:
'%12s' % thing

'   woodchuck'

In [131]:
'%+12s' %thing

'   woodchuck'

In [132]:
'%-12s' %thing

'woodchuck   '

In [133]:
'%.3s' %thing

'woo'

In [134]:
'%12.3s' %thing

'         woo'

In [135]:
'%-12.3s' %thing

'woo         '

Once more with feeling, this time a float with %f variants:

In [136]:
thing = 98.6

In [137]:
'%f' %thing

'98.600000'

In [138]:
'%12f' %thing

'   98.600000'

In [139]:
'%+12f' %thing

'  +98.600000'

In [140]:
'%-12f' %thing

'98.600000   '

In [141]:
'%.3f' %thing

'98.600'

In [142]:
'%12.3f' %thing

'      98.600'

In [143]:
'%-12.3f' %thing

'98.600      '

And an integer with %d:

In [144]:
thing = 9876

In [145]:
'%d' % thing

'9876'

In [146]:
'%12d' % thing

'        9876'

In [147]:
'%+12d' % thing

'       +9876'

In [148]:
'%-12d' % thing

'9876        '

In [149]:
# For an integer, .3 sets the minimum number of digits;
# 9876 already has four, so nothing changes here
'%.3d' % thing 

'9876'

In [150]:
'%12.3d' % thing

'        9876'

In [151]:
'%+12.3d' % thing # just forces the sign to be printed

'       +9876'

In [152]:
'%-12.3d' % thing

'9876        '
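
The .3 in a %d format only becomes visible when the number has fewer than three digits. A small sketch with a made-up value, `small`, to show the zero padding:

```python
small = 7   # made-up value with fewer than three digits

# For %d, the number after '.' is the minimum number of digits.
padded = '%.3d' % small
print(padded)            # 007

# minwidth still pads the whole field with spaces.
padded_wide = '%6.3d' % small
print(repr(padded_wide)) # '   007'

# A number that already has enough digits is left alone.
unchanged = '%.3d' % 9876
print(unchanged)         # 9876
```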

### New Style: { } and format( )

"New style" formatting has the form format_string.format(data).

In [153]:
thing = 'woodchuck'

In [154]:
'{}'.format(thing)

'woodchuck'

The arguments to the format( ) function need to be in the same order as the { } placeholders in the format string.

In [155]:
thing = 'woodchuck'
place = 'lake'
'The {} is in the {}.'.format(thing, place)

'The woodchuck is in the lake.'

In [156]:
# The value 0 refers to the first argument, place,
# and 1 refers to thing
'The {1} is in the {0}.'.format(place, thing)

'The woodchuck is in the lake.'

The arguments to format( ) can also be named arguments or a dictionary

In [157]:
'The {thing} is in the {place}'.format(thing='duck', place='bathtub')

'The duck is in the bathtub'

In [158]:
# {0} is the first argument to format() (the dictionary d)
d = {'thing': 'duck', 'place': 'bathtub'}
'The {0[thing]} is in the {0[place]}.'.format(d)

'The duck is in the bathtub.'

New-style formatting has a slightly different format string definition from the old-style one:
* An initial colon ( : )
* An optional fill character (default ' ') to pad the value string if it's shorter than minwidth
* An optional alignment character. This time, left alignment is the default for strings (numbers are right-aligned by default). '<' means left, '>' means right, and '^' means center
* An optional sign for numbers. Nothing means only prepend a minus sign ('-') for negative numbers. '+' means always prepend a sign. ' ' means prepend a minus sign for negative numbers, and a space (' ') for positive ones.
* An optional minwidth
* An optional period ('.') to separate minwidth and maxchars
* An optional maxchars.
* The conversion type.

In [159]:
thing = 'wraith'
place = 'window'
'The {} is at the {}'.format(thing, place)

'The wraith is at the window'

In [160]:
'The {:10s} is at the {:10s}'.format(thing, place)

'The wraith     is at the window    '

In [161]:
'The {:<10s} is at the {:<10s}'.format(thing, place)

'The wraith     is at the window    '

In [162]:
'The {:^10s} is at the {:^10s}'.format(thing, place)

'The   wraith   is at the   window  '

In [163]:
'The {:>10s} is at the {:>10s}'.format(thing, place)

'The     wraith is at the     window'

In [164]:
'The {:!^10s} is at the {:!^10s}'.format(thing, place)

'The !!wraith!! is at the !!window!!'
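
The same pieces apply to numbers and to clipped strings. The following sketch uses made-up values and combines fill, alignment, sign, minwidth, and maxchars in a few typical ways:

```python
temperature = 98.6
count = 9876
thing = 'woodchuck'

# Right-align in 10 columns with 2 digits after the decimal point.
right_float = '{:>10.2f}'.format(temperature)
print(repr(right_float))   # '     98.60'

# '+' always shows a sign; ' ' leaves a space for positive numbers.
signed = '{:+d}'.format(count)
spaced = '{: d}'.format(count)
print(signed, repr(spaced))   # +9876 ' 9876'

# Fill with '!', center in 11 columns, and keep only 3 characters.
clipped = '{:!^11.3s}'.format(thing)
print(clipped)   # !!!!woo!!!!
```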

### Newest Style: f-strings

f-strings appeared in Python 3.6 and are now the recommended way of formatting strings.

To make an f-string:
* Type the letter f or F directly before the initial quote.
* Include variable names or expressions within curly brackets ({ }) to get their values into the string

It's like the previous section's "new-style" formatting, but without the format( ) function, and without empty brackets ({ }) or positional ones ({1}) in the format string.

In [165]:
thing = 'wereduck'
place = 'werepond'
f'The {thing} in the {place}'

'The wereduck in the werepond'

In [166]:
f'The {thing.capitalize()} is in the {place.rjust(20)}'

'The Wereduck is in the             werepond'

f-strings use the same formatting language (width, padding, alignment) as new-style formatting, after a ":" 

In [167]:
f'The {thing:>20} is in the {place:.^20}'

'The             wereduck is in the ......werepond......'
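
Expressions and format specifiers can be combined inside the same pair of brackets. A sketch under stated assumptions: `weight` and `KG_PER_POUND` are made-up values introduced only for this example.

```python
thing = 'wereduck'
weight = 28
KG_PER_POUND = 0.4536   # approximate factor, used only for this example

# An arithmetic expression followed by a format specifier after ':'.
summary = f'{thing.title()} weighs {weight * KG_PER_POUND:.1f} kg'
print(summary)       # Wereduck weighs 12.7 kg

# Function calls work too.
length_note = f'{thing} has {len(thing)} letters'
print(length_note)   # wereduck has 8 letters

# Width and alignment for an integer.
column = f'{weight:>6d}|'
print(column)        #     28|
```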