# Strings
One place where the Python language really shines is in the manipulation of strings. This section will cover some of Python's built-in string methods and formatting operations.

String manipulation comes up often in data science work, and Python's convenience here is one of its big perks in that context.

Python strings can be written with either single or double quotes; the two forms are equivalent:

In [1]:
x = 'Pluto is a planet'
y = "Pluto is a planet"
x == y

True

Double quotes are convenient if your string contains a single quote character (e.g. representing an apostrophe).

Similarly, it's easy to create a string that contains double-quotes if you wrap it in single quotes:

In [2]:
print("Pluto's a planet!")
print('My dog is named "Pluto"')

Pluto's a planet!
My dog is named "Pluto"


If we try to put a single quote character inside a single-quoted string, Python treats it as the end of the string and cannot parse what follows:

In [3]:
'Pluto's a planet!'

SyntaxError: invalid syntax (<ipython-input-3-a43631749f52>, line 1)

In [6]:
"Pluto's a planet!"

"Pluto's a planet!"

We can fix this by "escaping" the single quote with a backslash.

In [7]:
'Pluto\'s a planet!'

"Pluto's a planet!"

The table below summarizes some important uses of the backslash character.

| What you type... | What you get | example | `print(example)` |
|--------:|:------------|:-----------|:------|
| `\'` | `'` | `'What\'s up?'` | `What's up?` |
| `\"` | `"` | `"That's \"cool\""` | `That's "cool"` |
| `\\` | `\` | `"Look, a mountain: /\\"` | `Look, a mountain: /\` |
| `\n` | (newline) | `"1\n2 3"` | `1`, then `2 3` on the next line |

The last sequence, `\n`, represents the newline character. It causes Python to start a new line.

In [9]:
hello = "hello\nworld"
print(hello)

hello
world
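
As a quick check on the table above, here is a minimal sketch (the variable names are illustrative, not from the original) showing that an escape sequence is stored as a single character, and that escaping a quote gives the same string as switching quote styles.

```python
# Escaping an apostrophe and switching to double quotes give the same string
escaped = 'Pluto\'s a planet!'
double_quoted = "Pluto's a planet!"
print(escaped == double_quoted)

# Each escape sequence is stored as a single character
newline_example = "1\n2 3"
backslash_example = "/\\"
print(len(newline_example))    # '1', newline, '2', ' ', '3'
print(len(backslash_example))  # a slash followed by one backslash

# repr() displays the escaped form, print() displays the characters themselves
print(repr(newline_example))
print(newline_example)
```

The backslash only exists in the source code; it is not part of the resulting string.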


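Triple-quoted strings let us include newlines literally, by pressing Enter inside the string instead of typing `\n`. The result is the same string:

In [10]: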
triplequoted_hello = """hello
world"""
print(triplequoted_hello)
triplequoted_hello == hello

hello
world


True

The `print()` function automatically adds a newline character unless we specify a value for the keyword argument `end` other than the default value of `'\n'`:

In [11]:
print("hello")
print("world")
print("hello", end='')
print("pluto", end='')

hello
world
hellopluto
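
To see exactly which characters `print()` writes, one option is to send its output to an in-memory buffer. This is only a sketch: the `buffer` and `written` names are made up for illustration, and it relies on the standard `file` keyword argument of `print()` together with `io.StringIO`.

```python
import io

# Collect everything print() writes in an in-memory text buffer
buffer = io.StringIO()

print("hello", end='', file=buffer)      # no newline added
print("pluto", file=buffer)              # default end='\n'
print("a", "b", end='!\n', file=buffer)  # custom ending after the joined arguments

written = buffer.getvalue()
print(repr(written))
```

The `repr()` of the captured text makes the newline characters visible, so it is easy to confirm where `end` was applied.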

## Strings are sequences
Strings can be thought of as sequences of characters. Almost everything we've seen that we can do to a list, we can also do to a string.

In [14]:
# Indexing
planet = 'Pluto'
planet[2]

'u'

In [15]:
# Slicing
planet[-3:]

'uto'

In [16]:
# How long is this string?
len(planet)

5

In [17]:
# Yes, we can even loop over them
[char+'! ' for char in planet]

['P! ', 'l! ', 'u! ', 't! ', 'o! ']

But a major way in which they differ from lists is that they are immutable. We can't modify them.

In [18]:
planet[0] = 'B'
# planet.append doesn't work either

TypeError: 'str' object does not support item assignment
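
Because strings are immutable, the usual workaround is to build a new string from pieces of the old one. A minimal sketch, with `renamed` as an illustrative name:

```python
planet = 'Pluto'

# Build a new string from a replacement letter plus a slice of the original
renamed = 'B' + planet[1:]

print(renamed)
print(planet)  # the original string is unchanged
```

The slice `planet[1:]` copies everything after the first character, so the original `planet` is left untouched.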

## String methods
Like `list`, the type `str` has lots of very useful methods. I'll show just a few examples here.

In [19]:
# ALL CAPS
claim = "Pluto is a planet!"
claim.upper()

'PLUTO IS A PLANET!'

In [20]:
# all lowercase
claim.lower()

'pluto is a planet!'

In [21]:
# Searching for the first index of a substring
claim.index('plan')

11

In [24]:
claim.startswith(planet)

True

In [25]:
claim.endswith('dwarf planet')

False

In [27]:
claim.endswith('planet!')

True
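
One detail worth knowing about `index()`: it raises a `ValueError` when the substring is missing. The sketch below contrasts it with the related `str.find()` method, which returns `-1` instead, and chains `lower()` with `startswith()` for a case-insensitive check. The variable names are illustrative.

```python
claim = "Pluto is a planet!"

# index() raises ValueError if the substring is not present
try:
    position = claim.index('dwarf')
except ValueError:
    position = None
print(position)

# find() signals "not found" with -1 instead of raising
missing = claim.find('dwarf')
found = claim.find('plan')
print(missing, found)

# Methods return new strings, so they can be chained
is_about_pluto = claim.lower().startswith('pluto')
print(is_about_pluto)
```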

### Going between strings and lists: `.split()` and `.join()`
`str.split()` turns a string into a list of smaller strings, breaking on whitespace by default. This is super useful for taking you from one big string to a list of words.

In [37]:
words = claim.split()
words

['Pluto', 'is', 'a', 'planet!']
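
The default behavior is a little more forgiving than splitting on a single space. As a sketch (the `messy` string is a made-up example), calling `split()` with no argument treats any run of whitespace as one separator and ignores leading and trailing whitespace, while `split(' ')` breaks on every individual space:

```python
messy = "  Pluto   is\ta planet!\n"

# No argument: runs of spaces, tabs and newlines count as one separator
default_split = messy.split()
print(default_split)

# Explicit ' ': every single space is a separator, so empty strings can appear
space_split = "a  b".split(' ')
print(space_split)
```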

In [29]:
# Passing a separator splits on that string instead of on whitespace
claim.split('p')

['Pluto is a ', 'lanet!']

In [30]:
claim.split('l')

['P', 'uto is a p', 'anet!']

Occasionally you'll want to split on something other than whitespace:

In [32]:
datestr = '1956-01-31'
year, month, day = datestr.split('-')
print(year)
print(month)
print(day)

1956
01
31
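
Note that `split()` always returns strings, which is why the month printed as `01`. If numbers are needed, each piece can be converted with `int()`. A small sketch, using new variable names so the string versions above stay available:

```python
datestr = '1956-01-31'

# Convert each piece to an integer; the leading zero disappears
year_num, month_num, day_num = [int(part) for part in datestr.split('-')]

print(year_num, month_num, day_num)
```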


`str.join()` takes us in the other direction, sewing a list of strings up into one long string, using the string it was called on as a separator.

In [34]:
'.'.join([month, day, year])

'01.31.1956'

In [38]:
# Yes, we can put unicode characters right in our string literals :)
' 👏 '.join([word.upper() for word in words])

'PLUTO 👏 IS 👏 A 👏 PLANET!'
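
One caveat: `join()` only accepts strings, so other values have to be converted first. The sketch below uses a made-up `parts` list to show the failure and the fix, and also shows that `split()` followed by `join()` can rebuild the original sentence.

```python
parts = [1956, 1, 31]

# Joining non-string items raises TypeError
try:
    '-'.join(parts)
except TypeError as err:
    print("join failed:", err)

# Convert each item with str() first
joined = '-'.join(str(p) for p in parts)
print(joined)

# split() and join() are inverses when the separator is a single space
round_trip = ' '.join("Pluto is a planet!".split())
print(round_trip)
```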

### Building strings with `.format()`
Python lets us concatenate strings with the `+` operator.

In [39]:
planet + ', we miss you.'

'Pluto, we miss you.'

If we want to throw in any non-string objects, we have to be careful to call `str()` on them first.

In [40]:
position = 9
planet + ", you'll always be the " + position + "th planet to me."

TypeError: can only concatenate str (not "int") to str

In [41]:

planet + ", you'll always be the " + 5 + "th planet to me."

TypeError: can only concatenate str (not "int") to str

In [42]:
planet + ", you'll always be the " + str(position) + "th planet to me."

"Pluto, you'll always be the 9th planet to me."

This is getting hard to read and annoying to type. `str.format()` to the rescue.

In [43]:
"{}, you'll always be the {}th planet to me.".format(planet, position)

"Pluto, you'll always be the 9th planet to me."

So much cleaner! We call `.format()` on a "format string", where the Python values we want to insert are represented with `{}` placeholders.

Notice how we didn't even have to call `str()` to convert `position` from an int. `format()` takes care of that for us.

If that was all that `format()` did, it would still be incredibly useful. But as it turns out, it can do a lot more. Here's just a taste:

In [44]:
pluto_mass = 1.303 * 10**22
earth_mass = 5.9722 * 10**24
population = 52910390
# {:.2} -> 2 significant digits; {:.3%} -> percent with 3 decimal places; {:,} -> commas as thousands separators
"{} weighs about {:.2} kilograms ({:.3%} of Earth's mass). It is home to {:,} Plutonians.".format(
    planet, pluto_mass, pluto_mass / earth_mass, population,
)

"Pluto weighs about 1.3e+22 kilograms (0.218% of Earth's mass). It is home to 52,910,390 Plutonians."

In [45]:
# Referring to format() arguments by index, starting from 0
s = """Pluto's a {0}.
No, it's a {1}.
{0}!
{1}!""".format('planet', 'dwarf planet')
print(s)

Pluto's a planet.
No, it's a dwarf planet.
planet!
dwarf planet!
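
Two more `format()` details, as a hedged sketch rather than a full tour: placeholders can be named and filled with keyword arguments, and a precision such as `.2` counts significant digits for floats unless the `f` type code is added. The `template`, `name`, and `rank` names are illustrative.

```python
# Placeholders can be named and filled with keyword arguments
template = "{name}, you'll always be the {rank}th planet to me."
message = template.format(name='Pluto', rank=9)
print(message)

# A precision without a type code counts significant digits for floats,
# while the 'f' type code counts digits after the decimal point
value = 3.14159
two_significant = "{:.2}".format(value)
two_decimals = "{:.2f}".format(value)
print(two_significant, two_decimals)

# Thousands separators and fixed decimals can be combined
with_commas = "{:,.2f}".format(52910390.5)
print(with_commas)
```

Named placeholders tend to read better than positional indices once a template has more than two or three values.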


You could probably write a short book just on `str.format`, so I'll stop here and point you to [pyformat.info](https://pyformat.info/) and [the official docs](https://docs.python.org/3/library/string.html#formatstrings) for further reading.

# Dictionaries
Dictionaries are a built-in Python data structure for mapping keys to values.

In [1]:
numbers = {'one':1, 'two':2, 'three':3}