# Strings and Dictionaries


## Strings

Python is particularly well suited to string manipulation. This section covers some of Python's built-in string methods and formatting operations.

Such string manipulation patterns come up often in the context of **data science** work.

### String syntax

As a recap, strings in Python can be defined using either single or double quotation marks; the two are functionally equivalent. The example below confirms this:

In [2]:
x = 'Pluto is a planet'
y = "Pluto is a planet"
x == y

True

You might be asking yourself when it would be preferable to use one version over the other; it depends on the context. Double quotes are convenient if the string contains a single quote character; an example is a string that makes use of an apostrophe.

Similarly, it is easy to create a string that contains double quotes if you wrap it in single quotes:

In [3]:
print("Pluto's a planet!")
print('My dog is named "Pluto"')

Pluto's a planet!
My dog is named "Pluto"


If we put a single quote character inside a single-quoted string, Python gets confused about where the string ends. An example of this is below:

In [4]:
'Pluto's a planet!'

SyntaxError: unterminated string literal (detected at line 1) (1561186517.py, line 1)

One way to fix this is to "escape" the apostrophe by placing a backslash in front of it:

In [5]:
'Pluto\'s a planet!'

"Pluto's a planet!"

The table below summarises the uses of the backslash in strings:

| What you type | What you get | Example | `print(example)` |
|---|---|---|---|
| `\'` | `'` | `'What\'s up?'` | `What's up?` |
| `\"` | `"` | `"That's \"cool\""` | `That's "cool"` |
| `\\` | `\` | `"Look, a mountain: /\\"` | `Look, a mountain: /\` |
| `\n` | a newline | `"1\n2 3"` | `1`<br>`2 3` |

The last entry above, `\n`, represents the *newline* character; it causes Python to start a new line. See an example below:

In [6]:
hello = "hello\nworld"
print(hello)

hello
world
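
As a quick check, the sketch below (an illustration, not part of the original notebook) suggests that each escape sequence is stored as a single character and that the escaping backslash itself does not end up in the string. The names `escaped` and `plain` are chosen here purely for demonstration.

```python
# Two spellings of the same text: one escaped, one using double quotes.
escaped = 'Pluto\'s a planet!'
plain = "Pluto's a planet!"
same_text = escaped == plain

# Count the characters actually stored in each string.
mountain_length = len("/\\")     # a slash and one backslash
newline_length = len("1\n2 3")   # '1', newline, '2', space, '3'

print(same_text)        # True
print(mountain_length)  # 2
print(newline_length)   # 5
```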


In addition, Python's triple quote syntax for strings lets us include newlines literally, i.e. by just hitting 'Enter' on our keyboard, rather than using the special '\n' sequence. We have already seen this in the docstrings we use to document our functions, but we can use them anywhere we want to define a string; see the example below:

In [7]:
triplequoted_hello = """hello
world"""
print(triplequoted_hello)

hello
world


In [8]:
triplequoted_hello == hello # Are these the same?

True

The `print()` function automatically adds a newline character unless we pass the keyword argument `end` a value other than its default, `'\n'`:

In [9]:
print("hello")
print("world")
print("hello", end='')
print("pluto", end='')

hello
world
hellopluto

### Strings are sequences

Strings can be thought of as sequences of characters. Almost everything we have seen that we can do to a list can also be done to a string.

In [10]:
# Indexing
planet = 'Pluto'
planet[0]

'P'

In [11]:
planet[-1]

'o'

In [12]:
# Middle character, using integer division
planet[len(planet) // 2]

'u'

In [13]:
# Slicing
planet[-3:]

'uto'

In [14]:
planet[:2]

'Pl'

In [15]:
planet[1:3]

'lu'

In [16]:
# How long is this string?
len(planet)

5

In [17]:
len(planet) + len(planet) * 2*len(planet)

55

In [18]:
# We can even loop over them
[char +'!' for char in planet]

['P!', 'l!', 'u!', 't!', 'o!']

However, a major way strings differ from lists is that they are immutable (like tuples); we cannot modify them. See an example below:

In [19]:
planet[0] = 'B'
# planet.append does not work either

TypeError: 'str' object does not support item assignment
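
Because a string cannot be changed in place, the usual workaround is to build a new string from pieces of the old one. The following is a minimal sketch of that idea; the name `renamed` is hypothetical and used only for illustration.

```python
planet = 'Pluto'

# Combine a new first letter with a slice of the original string.
renamed = 'B' + planet[1:]

print(renamed)  # Bluto
print(planet)   # Pluto (the original string is untouched)
```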

### String methods

Like `list`, the type `str` has lots of very useful methods. Below are a few examples:

In [20]:
# ALL CAPS
claim = "Pluto is a planet!"
claim.upper()

'PLUTO IS A PLANET!'

In [21]:
# Searching for the first index of a substring
claim.index('plan')

11

In [22]:
claim.startswith(planet)  # True: the variable planet holds 'Pluto', and claim begins with 'Pluto'.

True

In [23]:
claim.startswith("planet")  # False: claim begins with "Pluto", not with the literal string "planet".

False

In [24]:
claim.startswith("pluto")  # False: the comparison is case sensitive.

False

In [25]:
claim.startswith("Pluto")

True

In [26]:
# False because claim ends with 'planet!', including the exclamation mark
claim.endswith('planet')

False
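
When case or trailing punctuation should not matter, one possible approach is to normalise a copy of the string before testing it. The sketch below uses `str.lower()` and `str.rstrip()`, two standard string methods not covered above; the variable names are illustrative.

```python
claim = "Pluto is a planet!"

# Lowercase a copy of the string before comparing the prefix.
starts_with_pluto = claim.lower().startswith("pluto")

# Strip trailing exclamation marks from a copy before comparing the suffix.
ends_with_planet = claim.rstrip("!").endswith("planet")

print(starts_with_pluto)  # True
print(ends_with_planet)   # True
print(claim)              # Pluto is a planet! (unchanged)
```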

#### Going between strings and lists: `.split()` and `.join()`

`str.split()` turns a string into a list of smaller strings, breaking on whitespace by default. This is super useful for taking you from one big string to a list of words. An example of this is below:

In [27]:
words = claim.split()
words

['Pluto', 'is', 'a', 'planet!']

There will come a time when you will want to split on something other than whitespace:

In [28]:
datestr = '1956-01-31'
year, month, day = datestr.split('-')

`str.join()` takes us in the other direction, sewing a list of strings up into one long string, using the string it was called on as a separator. Let us once again use the pieces we just split out of `datestr` for demonstration:

In [29]:
'/'.join([month, day, year])

'01/31/1956'

In [30]:
# Yes, we can put unicode characters right in our string literals :)
' 👏 '.join([word.upper() for word in words])

'PLUTO 👏 IS 👏 A 👏 PLANET!'
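
As a small sketch of how the two methods mirror each other, joining the pieces with the same separator should give back the original string. Note also that `split()` always returns strings, so numeric work needs an explicit conversion. The name `parts` is illustrative.

```python
datestr = '1956-01-31'

# Split into pieces, then join them back with the same separator.
parts = datestr.split('-')
rebuilt = '-'.join(parts)

# The pieces are strings; convert them before doing arithmetic.
year, month, day = (int(part) for part in parts)

print(parts)               # ['1956', '01', '31']
print(rebuilt == datestr)  # True
print(year + 1)            # 1957
```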

#### Building strings with `.format()`

We can concatenate (i.e. string together) strings with the `+` operator; see the example below demonstrating this:

In [31]:
planet + ', we miss you.'

'Pluto, we miss you.'

If you wish to concatenate items that are not of type `str`, you first have to convert them using `str()`:

In [33]:
position = 9
# Instead of taking this approach...
planet + ", you'll always be the " + position + "th planet to me."

TypeError: can only concatenate str (not "int") to str

In [34]:
# ...convert the integer with str() first:
planet + ", you'll always be the " + str(position) + "th planet to me."

"Pluto, you'll always be the 9th planet to me."

Typing out concatenations like this quickly becomes tedious; `str.format()` offers a cleaner alternative. See below:

In [35]:
"{}, you'll always be the {}th planet to me.".format(planet, position)

"Pluto, you'll always be the 9th planet to me."

`str.format()` replaces each `{}` placeholder with the corresponding argument, in the order the arguments are given.

Note that we did not need to call `str()` on `position`; `format()` performs the conversion for us.

It can do a lot more than that, as the next example shows:

In [47]:
pluto_mass = 1.303 * 10**22
earth_mass = 5.9722 * 10**24
population = 52910390
#         2 sig figures   3 decimal pts, format as percentage     separate with commas
"{} weighs about {:.2} kilograms ({:.3%} of Earth's mass). It is home to {:,} Plutonians.".format(
    planet, pluto_mass, pluto_mass / earth_mass, population)

"Pluto weighs about 1.3e+22 kilograms (0.218% of Earth's mass). It is home to 52,910,390 Plutonians."

Arguments can also be referenced by position, which lets us reuse them:

In [53]:
# Referring to format() arguments by index; remember that indexes start from 0
s = """Pluto's a {0}.
No, it's a {1}.
{0}!
{1}!""".format('planet', 'dwarf planet')
print(s)

Pluto's a planet.
No, it's a dwarf planet.
planet!
dwarf planet!
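
The format specifications used in the mass example (`.2`, `.3%` and `,`) can also be tried one at a time, which may make their individual effects easier to see. This is a minimal sketch; the value `0.25` and the variable names are chosen here for illustration only.

```python
pluto_mass = 1.303 * 10**22

# .2  -> two significant figures
two_sig_figs = "{:.2}".format(pluto_mass)

# .3% -> multiply by 100, show three decimal places, append a percent sign
as_percent = "{:.3%}".format(0.25)

# ,   -> insert commas as thousands separators
with_commas = "{:,}".format(52910390)

print(two_sig_figs)  # 1.3e+22
print(as_percent)    # 25.000%
print(with_commas)   # 52,910,390
```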


## Dictionaries

Dictionaries are a built-in Python data structure for mapping keys to values. An example of this is:

In [1]:
numbers = {'one':1, 'two':2, 'three':3}

In this case, the strings `'one'`, `'two'`, and `'three'` are the **keys**, and the integers 1, 2 and 3 are their corresponding **values**. Note that a dictionary is written inside curly brackets, each key is separated from its value by a colon, and the key-value pairs are separated by commas.

We can look up the value stored under a key by placing the key in square brackets after the name of the dictionary:

In [2]:
numbers['two']

2

Think of it as looking up a word in a printed dictionary: the key is the word, and the value returned is its definition.
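
Looking up a key that is not in the dictionary raises a `KeyError`. The sketch below shows that behaviour along with `dict.get()`, a standard method that returns a fallback value instead of raising. The key `'four'` and the name `error_seen` are illustrative.

```python
numbers = {'one': 1, 'two': 2, 'three': 3}

# Square brackets raise KeyError when the key is absent.
error_seen = False
try:
    numbers['four']
except KeyError:
    error_seen = True

# dict.get() returns None, or a supplied default, for a missing key.
missing = numbers.get('four')
with_default = numbers.get('four', 0)

print(error_seen)    # True
print(missing)       # None
print(with_default)  # 0
```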

We can also add new key-value pairs to dictionaries by assigning them as follows:

In [3]:
numbers['twelve'] = 12
numbers

{'one': 1, 'two': 2, 'three': 3, 'twelve': 12}

If we wish to change the value of an already existing key, we can reassign its value as follows:

In [5]:
numbers['one'] = 'Mars'
numbers

{'one': 'Mars', 'two': 2, 'three': 3, 'twelve': 12}

Python has *dictionary comprehensions* with a syntax similar to the list comprehensions we saw in the previous tutorial.

In [6]:
planets = ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune']

In [7]:
planet_to_initial = {planet: planet[0] for planet in planets}

In [8]:
planet_to_initial

{'Mercury': 'M',
 'Venus': 'V',
 'Earth': 'E',
 'Mars': 'M',
 'Jupiter': 'J',
 'Saturn': 'S',
 'Uranus': 'U',
 'Neptune': 'N'}
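
Like list comprehensions, dictionary comprehensions accept an optional `if` clause. As a sketch, the example below maps each planet to the length of its name and keeps only names of five letters or fewer; the name `short_names` is hypothetical.

```python
planets = ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune']

# Key: planet name. Value: length of the name. Filter: short names only.
short_names = {planet: len(planet) for planet in planets if len(planet) <= 5}

print(short_names)  # {'Venus': 5, 'Earth': 5, 'Mars': 4}
```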

The `in` operator tells us whether something is a key in the dictionary; see the examples below:

In [9]:
'Saturn' in planet_to_initial

True

In [10]:
'Nibiru' in planet_to_initial

False

A for loop over a dictionary will loop over its keys:

In [11]:
for k in numbers:
    print("{} = {}".format(k, numbers[k]))

one = Mars
two = 2
three = 3
twelve = 12


We can access a collection of all the keys or values with `dict.keys()` and `dict.values()`, respectively:

In [12]:
# Get all the initials, sort them alphabetically, and put them in a space-separated string.
' '.join(sorted(planet_to_initial.values()))

'E J M M N S U V'
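
One point that is easy to miss: the `in` operator applied to a dictionary checks its keys only. To test for a value, check against `dict.values()` instead. A minimal sketch, using a shortened version of the dictionary for brevity:

```python
# A shortened, illustrative version of planet_to_initial.
planet_to_initial = {'Saturn': 'S', 'Mars': 'M'}

# 'S' is a value, not a key, so a plain membership test is False.
is_key = 'S' in planet_to_initial

# Testing against the values finds it.
is_value = 'S' in planet_to_initial.values()

print(is_key)    # False
print(is_value)  # True
```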

The very useful `dict.items()` method lets us iterate over the keys and values of a dictionary simultaneously. (In Python jargon, an **item** refers to a key-value pair.)

In [14]:
for planet, initial in planet_to_initial.items():
    print("{} begins with \"{}\".".format(planet.rjust(10), initial))

   Mercury begins with "M".
     Venus begins with "V".
     Earth begins with "E".
      Mars begins with "M".
   Jupiter begins with "J".
    Saturn begins with "S".
    Uranus begins with "U".
   Neptune begins with "N".
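
Iterating over `dict.items()` is also handy for searching by value. As a sketch, the comprehension below collects every planet whose initial is `'M'`; the name `m_planets` is chosen here for illustration.

```python
planets = ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune']
planet_to_initial = {planet: planet[0] for planet in planets}

# Keep the key whenever its value matches the initial we are looking for.
m_planets = [name for name, initial in planet_to_initial.items() if initial == 'M']

print(m_planets)  # ['Mercury', 'Mars']
```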


For a full inventory of dictionary methods, run the cell below to read the help page:

In [15]:
help(dict)

Help on class dict in module builtins:

class dict(object)
 |  dict() -> new empty dictionary
 |  dict(mapping) -> new dictionary initialized from a mapping object's
 |      (key, value) pairs
 |  dict(iterable) -> new dictionary initialized as if via:
 |      d = {}
 |      for k, v in iterable:
 |          d[k] = v
 |  dict(**kwargs) -> new dictionary initialized with the name=value pairs
 |      in the keyword argument list.  For example:  dict(one=1, two=2)
 |
 |  Built-in subclasses:
 |      StgDict
 |
 |  Methods defined here:
 |
 |  __contains__(self, key, /)
 |      True if the dictionary has the specified key, else False.
 |
 |  __delitem__(self, key, /)
 |      Delete self[key].
 |
 |  __eq__(self, value, /)
 |      Return self==value.
 |
 |  __ge__(self, value, /)
 |      Return self>=value.
 |
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |
 |  __getitem__(self, key, /)
 |      Return self[key].
 |
 |  __gt__(self, value, /)
 |      Return self>va