# Working with Strings

Textual data in Python is handled with `str` objects, or strings. Strings are immutable sequences of Unicode code points.

<div class="alert alert-block alert-warning">
    String handling was one of the biggest changes between Python 2 and Python 3, if not the biggest. If you are working with legacy systems, pay careful attention to how strings are handled.
</div>

String literals are written in a variety of ways:
1. Single quotes: `'allows embedded "double" quotes'`
2. Double quotes: `"allows embedded 'single' quotes"`
3. Triple quoted: `'''Three single quotes''', """Three double quotes"""`

Triple-quoted strings may span multiple lines. All whitespace between the quotes, including line breaks, is included in the string literal.

In [2]:
"""The quick brown fox
jumped over 
the lazy dog
"""

'The quick brown fox\njumped over \nthe lazy dog\n'
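
Because the whitespace is kept verbatim, indenting a triple-quoted literal to match the surrounding code also indents the string. The following is a minimal sketch of that effect, assuming the standard-library `textwrap.dedent` is an acceptable way to remove the shared indentation. The function and variable names are illustrative only.

```python
import textwrap

def indented_message():
    # The four leading spaces on each line become part of the string.
    # The backslash after the opening quotes suppresses the first newline.
    return """\
    The quick brown fox
    jumped over the lazy dog"""

raw = indented_message()
clean = textwrap.dedent(raw)  # removes whitespace common to every line

print(repr(raw))
print(repr(clean))
```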

String objects provide several handy methods. See [String Methods](https://docs.python.org/3/library/stdtypes.html#string-methods) for all the details.

We can test whether a string consists entirely of characters from a given character class.

In [5]:
"1234".isdigit()

True

In [6]:
"1234asdf".isalpha()

False

In [8]:
"1234asdf".isalnum()

True
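
These methods check every character, so a single sign, decimal point, or space makes them return `False`, and they also return `False` for an empty string. A small sketch of those edge cases (the dictionary labels are illustrative):

```python
# Each call below evaluates to False.
checks = {
    "empty string": "".isdigit(),
    "negative number": "-12".isdigit(),
    "decimal point": "1.5".isdigit(),
    "embedded space": "12 34".isalnum(),
}

for label, result in checks.items():
    print(f"{label}: {result}")
```

For validating numeric input, attempting `int(text)` or `float(text)` inside a `try` block is often a better fit than these character-class tests.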

We can search for strings in other strings.

In [9]:
'Python' in 'Python 3'

True

In [11]:
'Python 3'.startswith('Py') and 'Python 3'.endswith('3')

True

In [13]:
'Python 3'.index(' ')

6
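
Note that `index()` raises an exception when the substring is missing, whereas `find()` reports the same situation with `-1`. A brief sketch of the difference:

```python
text = "Python 3"

position = text.find("x")   # find() returns -1 when there is no match
print(position)

try:
    text.index("x")         # index() raises ValueError instead
except ValueError as err:
    message = str(err)
    print(f"index() failed: {message}")
```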

We can create new strings by transforming an existing string.

In [14]:
"chris utz".capitalize()

'Chris utz'

In [16]:
# capitalize() only uppercased the first character; title() capitalizes every word
"chris utz".title()

'Chris Utz'

In [17]:
# We can trim whitespace from the start and end
"   padded ".strip()

'padded'
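
Since strings are immutable, each of these methods returns a new string and leaves the original untouched. A short sketch, which also shows the one-sided `lstrip()` and the optional argument to `strip()` that lists the characters to remove:

```python
original = "   padded "
trimmed = original.strip()

print(repr(original))  # unchanged: strings are immutable
print(repr(trimmed))

# lstrip()/rstrip() trim one side only.
left_only = original.lstrip()
print(repr(left_only))

# An argument names the set of characters to strip instead of whitespace.
no_dots = "...done...".strip(".")
print(repr(no_dots))
```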

Joining a collection of strings or splitting a string into multiple strings is a common operation when processing textual data.

In [18]:
"The quick brown fox jumped over the lazy dog".split()

['The', 'quick', 'brown', 'fox', 'jumped', 'over', 'the', 'lazy', 'dog']

In [21]:
# Of course, we may want to use a different separator
words = "my, comma, separated, text".split(", ")
words

['my', 'comma', 'separated', 'text']

In [22]:
# We can join the resulting list with a separator string
"|".join(words)

'my|comma|separated|text'
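
Two details are easy to trip over here: `split()` with no argument treats any run of whitespace as one separator, while an explicit separator is matched literally, and `join()` accepts only strings. A minimal sketch of both points (variable names are illustrative):

```python
spaced = "a  b   c"   # runs of two and three spaces

by_whitespace = spaced.split()     # runs of whitespace count as one separator
by_space = spaced.split(" ")       # empty strings appear between adjacent spaces
print(by_whitespace)
print(by_space)

numbers = [1, 2, 3]
# join() raises TypeError on non-strings, so convert each item first.
joined = ", ".join(str(n) for n in numbers)
print(joined)
```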

As we've seen, we can create strings using string literals.

We can also create strings using the `str` class.

In [23]:
str("hello world")

'hello world'

We can turn objects into strings by passing them to the `str` class.

In [27]:
mylist = [1, 2, 3]
assert isinstance(mylist, list)
stringified = str(mylist)
assert isinstance(stringified, str)
stringified

'[1, 2, 3]'

How does this work?

`str(object)` returns `type(object).__str__(object)`, which is the “informal” or nicely printable string representation of `object`. For string objects, this is the string itself. If `object` does not define its own `__str__()` method, then `str()` falls back to returning `repr(object)`.

In [30]:
class mylist(list):
    def __str__(self):
        return "|".join(str(x) for x in self)

str(mylist([1,2,3]))

'1|2|3'
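
The fallback to `repr()` can be seen with a class that defines only `__repr__()`. This is a sketch using a hypothetical `Point` class that is not part of the original notebook:

```python
class Point:
    """Hypothetical class that defines __repr__ but not __str__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"Point(x={self.x}, y={self.y})"

p = Point(1, 2)
print(str(p))    # no __str__ defined, so this uses __repr__
print(repr(p))
```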

## Strings are sequences

Strings implement all of the [common](https://docs.python.org/3/library/stdtypes.html#typesseq-common) sequence operations. We've already seen the use of `in`. We can also find their length, slice them, iterate over them, and so on.

In [31]:
greeting = 'Hello Everyone'
len(greeting)

14

In [32]:
greeting[1:-1]

'ello Everyon'

In [33]:
list(enumerate(greeting))

[(0, 'H'),
 (1, 'e'),
 (2, 'l'),
 (3, 'l'),
 (4, 'o'),
 (5, ' '),
 (6, 'E'),
 (7, 'v'),
 (8, 'e'),
 (9, 'r'),
 (10, 'y'),
 (11, 'o'),
 (12, 'n'),
 (13, 'e')]
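
Strings support the read-only sequence operations, but because they are immutable, item assignment fails. A short sketch of a few more operations and of building a modified copy instead:

```python
greeting = 'Hello Everyone'

backwards = greeting[::-1]       # a negative step walks the string backwards
e_count = greeting.count('e')    # count() is one of the common sequence operations
print(backwards, e_count)

try:
    greeting[0] = 'J'            # item assignment is not supported
except TypeError as err:
    print(f"TypeError: {err}")

# Build a new string instead of modifying the old one.
changed = 'J' + greeting[1:]
print(changed)
```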

## Formatted String Literals

[Formatted string literals](https://docs.python.org/3/reference/lexical_analysis.html#f-strings) (also called f-strings for short) let you include the value of Python expressions inside a string by prefixing the string with `f` or `F` and writing expressions as `{expression}`.

In [34]:
import math
mystr = f"The value of pi is approximately {math.pi}"
print(type(mystr))
mystr

<class 'str'>


'The value of pi is approximately 3.141592653589793'

In [36]:
f"The portion in '{{}}' is any valid expression. Pi is roughly {22 / 7}"

"The portion in '{}' is any valid expression. Pi is roughly 3.142857142857143"

An optional format specifier can follow the expression. This allows greater control over how the value is formatted. The following example rounds pi to three places after the decimal.

In [37]:
f"The value of pi is approximately {math.pi:.3f}"

'The value of pi is approximately 3.142'
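
Format specifiers also control width and alignment, which is handy for lining up columns. A minimal sketch with made-up sample data:

```python
# Hypothetical sample rows: (name, quantity, unit price)
rows = [("apples", 3, 0.5), ("kiwis", 12, 0.25)]

lines = []
for name, qty, price in rows:
    # <10 left-aligns in 10 columns, >4 right-aligns in 4 columns,
    # 6.2f is a total width of 6 with 2 digits after the decimal point.
    lines.append(f"{name:<10}|{qty:>4}|{price:6.2f}")

print("\n".join(lines))
```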

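Conversion flags can be used to convert the value before it is formatted: `!a` applies `ascii()`, `!s` applies `str()`, and `!r` applies `repr()`. With no conversion flag, the value is passed to `format()` directly, which for most objects gives the same result as `str()`.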

In [41]:
animals = mylist(['eel', 'fox', 'cat'])
print(f'My hovercraft is full of {animals}.')
print(f'My hovercraft is full of {animals!r}.')

My hovercraft is full of eel|fox|cat.
My hovercraft is full of ['eel', 'fox', 'cat'].
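
The `!a` conversion behaves like `!r` but additionally escapes non-ASCII characters. A small sketch, using an arbitrary sample word:

```python
word = 'café'

plain = f'{word}'        # no conversion
with_repr = f'{word!r}'  # repr() adds quotes
with_ascii = f'{word!a}' # ascii() also escapes non-ASCII characters

print(plain)
print(with_repr)
print(with_ascii)
```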


The `=` specifier (available since Python 3.8) expands to the text of the expression, an equal sign, and then the `repr()` of the evaluated expression.

In [42]:
bugs = 'roaches'
count = 13
area = 'living room'
f'Debugging {bugs=} {count=} {area=}'

"Debugging bugs='roaches' count=13 area='living room'"

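The `=` specifier can be combined with a format specifier, in which case the value is formatted with that specifier rather than shown as its `repr()`. Whitespace around the `=` is carried into the output. A brief sketch, assuming Python 3.8 or later:

```python
ratio = 2 / 3

plain = f'{ratio=}'          # repr() of the value by default
rounded = f'{ratio=:.2f}'    # a format specifier after = is applied to the value
spaced = f'{ratio = :.2f}'   # whitespace around = is preserved in the output

print(plain)
print(rounded)
print(spaced)
```
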
We can even format dates and currencies using f-strings.

In [44]:
f"${35235.56:,.2f}"

'$35,235.56'

In [45]:
from datetime import datetime
now = datetime.now()
f"{now:%d-%B-%Y}"

'08-August-2023'
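
The output above depends on when the cell was run, and `%B` depends on the current locale. For reproducible results, a fixed timestamp and numeric directives can be used. The sketch below uses a made-up date and sample numbers, and also shows the `%` presentation type for percentages:

```python
from datetime import datetime

# A fixed, hypothetical timestamp so the output is reproducible.
released = datetime(2023, 8, 8, 14, 30)

stamp = f"{released:%Y-%m-%d %H:%M}"   # numeric directives are locale-independent
total = f"{1234567.891:,.2f}"          # comma as thousands separator, two decimals
share = f"{0.256:.1%}"                 # % multiplies by 100 and appends a percent sign

print(stamp)
print(total)
print(share)
```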