# Lecture 06 - Strings and Dictionaries

This lesson will be a double-shot of essential Python types: **strings** and **dictionaries**.

# Strings

One place where the Python language really shines is in the manipulation of strings.
This section will cover some of Python's built-in string methods and formatting operations.

Such string manipulation patterns come up often in data science work, and they are one big perk of Python in this context.

## String syntax

You've already seen plenty of strings in examples during the previous lessons, but just to recap: strings in Python can be defined using either single or double quotation marks, and the two are functionally equivalent.

In [2]:
x = 'DataCracy is meaningful!'
y = "DataCracy is meaningful!"
x == y

True

Double quotes are convenient if your string contains a single quote character (e.g. representing an apostrophe).

Similarly, it's easy to create a string that contains double-quotes if you wrap it in single quotes:

In [3]:
print("Let's join DataCracy!")
print('DataCracy stands for "Data & Democracy"')

Let's join DataCracy!
DataCracy stands for "Data & Democracy"


If we try to put a single quote character inside a single-quoted string, Python gets confused:

In [4]:
'Let's join DataCracy!'

SyntaxError: invalid syntax (<ipython-input-4-e1199c679eb8>, line 1)

We can fix this by **"escaping"** the single quote with a backslash. 

In [6]:
'Let\'s join DataCracy!'

"Let's join DataCracy!"

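As a minimal sketch, we can check that the backslash only tells Python how to read the quote. It is not stored in the resulting string, so the escaped version and the double-quoted version are the same value.

```python
escaped = 'Let\'s join DataCracy!'
double_quoted = "Let's join DataCracy!"

# Both literals produce exactly the same string; the backslash is not stored
print(escaped == double_quoted)  # True
print(len(escaped))              # 21
print('\\' in escaped)           # False: no backslash character inside
```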
The table below summarizes some important uses of the **backslash (escape)** character `\`.

| What you type | What you get | Example                   | `print(example)`         |
|---------------|--------------|---------------------------|--------------------------|
| `\'`          | `'`          | `'What\'s up?'`           | `What's up?`             |
| `\"`          | `"`          | `"That's \"cool\""`       | `That's "cool"`          |
| `\\`          | `\`          | `"Look, a mountain: /\\"` | `Look, a mountain: /\`   |
| `\n`          | new line     | `"1\n2 3"`                | `1`<br/>`2 3`            |

The last sequence, `\n`, represents the ***newline character***. It causes Python to start a new line.

In [8]:
hello = "Hello\nDataCracy"
print(hello)

Hello
DataCracy
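One way to see what the table means in practice, sketched below: every escape sequence is two characters in your source code but only **one** character in the resulting string. `repr()` shows each string the way Python would write it back out.

```python
escapes = ['\'', '\"', '\\', '\n']

# Each escape sequence from the table stands for a single character
for seq in escapes:
    print(repr(seq), len(seq))

# The newline is an ordinary character, so we can even split on it
hello = "Hello\nDataCracy"
print(hello.split('\n'))  # ['Hello', 'DataCracy']
```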


In addition, Python's **triple-quote** syntax for strings lets us include newlines literally (i.e. by just hitting `Enter` on our keyboard, rather than using the special `\n` sequence). We've already seen this in the docstrings we use to document our functions, but we can use triple-quoted strings anywhere we want to define a string.

In [9]:
triplequoted_hello = """hello
DataCracy"""
print(triplequoted_hello)
triplequoted_hello == hello

hello
DataCracy


False
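A small sketch of two more things triple quotes give us: the line break you type is stored as the same `\n` character, and the string can contain both `'` and `"` without any escaping. The variable names here are just for illustration.

```python
two_lines = """first
second"""

# The line break typed inside the triple quotes is the same '\n' character
print(two_lines == 'first\nsecond')  # True
print(two_lines.count('\n'))         # 1

# Triple quotes can also hold both kinds of quote marks without escaping
mixed = '''She said "Let's join DataCracy!"'''
print(mixed)  # She said "Let's join DataCracy!"
```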

The `print()` function automatically adds a newline character unless we specify a value for the keyword argument **`end`** other than the default value of `'\n'`:

In [10]:
print("hello")
print("DataCracy")
print("hello", end='')
print("DataCracy", end='')

hello
DataCracy
helloDataCracy
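To see exactly which characters `print()` writes, here is a sketch that sends its output into an in-memory buffer instead of the screen. It uses `io.StringIO` and the `file` keyword argument of `print()`, neither of which appears elsewhere in this lesson; they are only a convenient way to inspect the output.

```python
import io

# Send print() output to an in-memory buffer so we can see every character it writes
buffer = io.StringIO()
print("hello", file=buffer)              # default end='\n'
print("DataCracy", end='', file=buffer)  # no newline added
print("!", end='\n', file=buffer)        # explicit newline
output = buffer.getvalue()
print(repr(output))  # 'hello\nDataCracy!\n'
```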

## Strings are Sequences

Strings can be thought of as **sequences of characters**. Almost everything we've seen that we can do to a **list**, we can also do to a string.

In [11]:
# Indexing
program = 'DataCracy'
program[0]

'D'

In [12]:
# Slicing
program[-3:]

'acy'

In [13]:
# How long is this string?
len(program)

9

In [14]:
# Yes, we can even loop over them
[char+'! ' for char in program]

['D! ', 'a! ', 't! ', 'a! ', 'C! ', 'r! ', 'a! ', 'c! ', 'y! ']
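A few more list-style operations that also work on strings, as a quick sketch. Note that for strings, `in` checks for a whole **substring**, not just a single element.

```python
program = 'DataCracy'

print('Cracy' in program)  # True: for strings, `in` looks for a substring
print(program[4:])         # Cracy
print(program[::-1])       # ycarCataD: a step of -1 walks backwards
print(sorted(program))     # ['C', 'D', 'a', 'a', 'a', 'c', 'r', 't', 'y']
```

`sorted()` returns a list of characters, and uppercase letters sort before lowercase ones.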

But a major way in which they differ from lists is that they are ***immutable***. We can't modify them.

In [16]:
program[0] = 'B'
# program.append doesn't work either

TypeError: 'str' object does not support item assignment
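Since strings are immutable, the usual pattern is to build a **new** string and assign it to a name. A minimal sketch:

```python
program = 'DataCracy'

# We can't change a character in place, but we can build a new string
new_program = 'B' + program[1:]
print(new_program)  # BataCracy
print(program)      # DataCracy: the original is unchanged

# String methods follow the same rule: replace() returns a new string
print(program.replace('Data', 'Big'))  # BigCracy
```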

## String Methods

Like `list`, the type `str` has lots of very useful methods. I'll show just a few examples here.

In [17]:
# ALL CAPS
claim = "DataCracy is fun!"
claim.upper()

'DATACRACY IS FUN!'

In [18]:
# all lowercase
claim.lower()

'datacracy is fun!'

In [19]:
# Searching for the first index of a substring
claim.index('fun')

13

In [21]:
claim.startswith(program)

True

In [22]:
claim.endswith('fun!')

True
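One detail worth knowing about searching: `str.index()` raises a `ValueError` when the substring is missing, while its sibling method `str.find()` returns `-1` instead. A short sketch comparing the two:

```python
claim = "DataCracy is fun!"

print(claim.find('fun'))     # 13, same answer as index()
print(claim.find('boring'))  # -1: find() reports "not found" with -1

try:
    claim.index('boring')
except ValueError as err:
    print('index() raised ValueError:', err)
```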

### Going between strings and lists: `.split()` and `.join()`

**`str.split()`** turns a string into a list of smaller strings, breaking on whitespace by default. This is super useful for taking you from one big string to a list of words.

In [23]:
words = claim.split()
words

['DataCracy', 'is', 'fun!']

Occasionally you'll want to split on something other than whitespace:

In [24]:
datestr = '1956-01-31'
year, month, day = datestr.split('-')
print(year)
print(month)
print(day)

1956
01
31
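Calling `split()` with no argument is not the same as `split(' ')`. The sketch below, using a made-up messy string, shows the difference, plus the optional `maxsplit` argument that limits how many splits happen.

```python
messy = "  DataCracy   is\tfun!  "

# With no argument, split() treats any run of whitespace as one separator
print(messy.split())     # ['DataCracy', 'is', 'fun!']

# With an explicit ' ' separator, every single space counts,
# so leading, trailing and repeated spaces produce empty strings
print(messy.split(' '))  # ['', '', 'DataCracy', '', '', 'is\tfun!', '', '']

# maxsplit limits how many splits are made
print('1956-01-31'.split('-', 1))  # ['1956', '01-31']
```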


**`str.join()`** takes us in the other direction, sewing a list of strings up into one long string, using the string it was called on as a separator.

In [25]:
'/'.join([month, day, year])

'01/31/1956'

In [26]:
# Yes, we can put Unicode characters right in our string literals :)
' 👏 '.join([word.upper() for word in words])

'DATACRACY 👏 IS 👏 FUN!'
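A common stumbling block, sketched below: `join()` only accepts strings, so a list of numbers has to be converted with `str()` first. When the separator matches, `split()` and `join()` undo each other.

```python
parts = [1956, 1, 31]

# Joining non-strings fails...
try:
    '-'.join(parts)
except TypeError as err:
    print('TypeError:', err)

# ...so convert each item with str() first
date = '-'.join(str(n) for n in parts)
print(date)  # 1956-1-31

# Splitting and re-joining on the same separator gives back the original
print('-'.join(date.split('-')) == date)  # True
```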

## Building Strings with `.format()`

Python lets us concatenate strings with the `+` operator.

In [30]:
program + ' means Data and Democracy'

'DataCracy means Data and Democracy'

If we want to throw in any non-string objects, we have to be careful to call **`str()`** on them first!

In [80]:
planet = 'Pluto'
position = 9
planet + " is the " + position + "th planet in the solar system."

TypeError: must be str, not int

In [81]:
planet + " is the " + str(position) + "th planet in the solar system."

'Pluto is the 9th planet in the solar system.'

This is getting hard to read and annoying to type. **`str.format()`** to the rescue.

In [82]:
"{} is the {}th planet in the solar system.".format(planet, position)

'Pluto is the 9th planet in the solar system.'

So much cleaner! We call `.format()` on a "format string", where the Python values we want to insert are represented with `{}` placeholders.

Notice how we didn't even have to call `str()` to convert `position` from an int. `format()` takes care of that for us.

If that was all that `format()` did, it would still be incredibly useful. But as it turns out, it can do a *lot* more. Here's just a taste:

In [83]:
pluto_mass = 1.303 * 10**22 # 1.3 x 10^22
earth_mass = 5.9722 * 10**24 # 5.9 x 10^24
population = 52910390
#         (2 significant digits)   (3 decimal places, format as percent)     (separate with commas)
"{} weighs about {:.2} kilograms ({:.3%} of Earth's mass). It is home to {:,} Plutonians.".format(
    planet, pluto_mass, pluto_mass / earth_mass, population,
)

"Pluto weighs about 1.3e+22 kilograms (0.218% of Earth's mass). It is home to 52,910,390 Plutonians."

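The format specs above are easy to mix up. In particular, `{:.2}` means two *significant digits*, not two decimal places. Here is a sketch putting a few related specs side by side (the `3.14159` value is just an illustrative number):

```python
pluto_mass = 1.303 * 10**22

print('{:.2}'.format(pluto_mass))   # 1.3e+22   (2 significant digits)
print('{:.2e}'.format(pluto_mass))  # 1.30e+22  (scientific, 2 digits after the point)
print('{:.2f}'.format(3.14159))     # 3.14      (fixed point, 2 decimal places)
print('{:.3%}'.format(0.0021818))   # 0.218%    (multiply by 100, add %)
print('{:,}'.format(52910390))      # 52,910,390
```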
In [89]:
# Referring to format() arguments by index, starting from 0
s = """
    Pluto is a {0}.
    No, it is a {1}.
    {0}!
    {1}!
    """.format('planet', 'dwarf planet')
print(s)


    Pluto is a planet.
    No, it is a dwarf planet.
    planet!
    dwarf planet!
    

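Besides numeric indexes, placeholders can also carry names that are filled in by keyword arguments, which often reads better when a value is reused. If you need a literal brace in a format string, you double it. A small sketch:

```python
# Placeholders can also be named and filled with keyword arguments
s = "{name} is a {kind}. {name} orbits the Sun.".format(name='Pluto', kind='dwarf planet')
print(s)  # Pluto is a dwarf planet. Pluto orbits the Sun.

# Literal braces are written by doubling them
print('{{}} is a placeholder for {}'.format('Pluto'))  # {} is a placeholder for Pluto
```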

There is much more to **`str.format`**, but it goes beyond what we need for this course.

If you want to go further, [pyformat.info](https://pyformat.info/) and [the official docs](https://docs.python.org/3/library/string.html#formatstrings) are good references.

# Dictionaries

Dictionaries are a built-in Python data structure for mapping **keys** to **values**.

To define a dictionary, we use curly braces `{}`:

In [68]:
numbers = {'one':1, 'two':2, 'three':3}

In this case `'one'`, `'two'`, and `'three'` are the **keys**, and 1, 2 and 3 are their corresponding **values**.
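Curly braces are not the only way to build a dictionary. As the `help(dict)` output at the end of this lesson lists, the `dict()` constructor also accepts keyword arguments or an iterable of `(key, value)` pairs. A quick sketch showing all three give the same result:

```python
numbers = {'one': 1, 'two': 2, 'three': 3}

# The dict() constructor can build the same mapping from keyword arguments...
from_kwargs = dict(one=1, two=2, three=3)
# ...or from an iterable of (key, value) pairs
from_pairs = dict([('one', 1), ('two', 2), ('three', 3)])

print(numbers == from_kwargs == from_pairs)  # True
print(len(numbers))                          # 3 key-value pairs
```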

Values are accessed via square bracket syntax `[]` similar to indexing into lists and strings.

In [69]:
numbers['one']

1

We can use the same syntax to add another `key-value` pair

In [70]:
numbers['eleven'] = 11
numbers

{'eleven': 11, 'one': 1, 'three': 3, 'two': 2}

Or to change the value associated with an existing key

In [90]:
numbers['one'] = 'Pluto'
numbers

{'eleven': 11, 'one': 'Pluto', 'three': 3, 'two': 2}
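Putting the pieces together, here is a sketch of the basic ways to modify a dictionary, including two removal tools not shown above: the `del` statement and the `dict.pop()` method. The printed order assumes Python 3.7 or later, where dictionaries remember insertion order.

```python
numbers = {'one': 1, 'two': 2, 'three': 3}

numbers['eleven'] = 11          # new key: a pair is added
numbers['one'] = 'Pluto'        # existing key: the value is replaced, no duplicate key
del numbers['two']              # remove a pair entirely
removed = numbers.pop('three')  # remove a pair and get its value back

print(numbers)  # {'one': 'Pluto', 'eleven': 11}
print(removed)  # 3
```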

Python has ***dictionary comprehensions*** with a syntax similar to the list comprehensions we saw in the previous lesson.

In [72]:
planets = ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune']
planet_to_initial = {planet: planet[0] for planet in planets}
planet_to_initial

{'Earth': 'E',
 'Jupiter': 'J',
 'Mars': 'M',
 'Mercury': 'M',
 'Neptune': 'N',
 'Saturn': 'S',
 'Uranus': 'U',
 'Venus': 'V'}
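Two more things about dictionary comprehensions, sketched below: they can filter with `if` just like list comprehensions, and because keys must be unique, a later key silently overwrites an earlier one. Mercury and Mars share the initial `'M'`, so only one of them survives when we flip the mapping around.

```python
planets = ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune']

# Like list comprehensions, dict comprehensions can filter with `if`
long_names = {planet: len(planet) for planet in planets if len(planet) > 5}
print(long_names)  # {'Mercury': 7, 'Jupiter': 7, 'Saturn': 6, 'Uranus': 6, 'Neptune': 7}

# Keys are unique: when two planets share an initial, the later one wins
initial_to_planet = {planet[0]: planet for planet in planets}
print(initial_to_planet['M'])  # Mars
print(len(initial_to_planet))  # 7, not 8
```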

The **`in`** operator tells us whether something is a key in the dictionary

In [73]:
'Saturn' in planet_to_initial

True

In [74]:
'Betelgeuse' in planet_to_initial

False
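Note that `in` only looks at the **keys**. The sketch below, using a small made-up dictionary, also shows what happens when you index with a missing key (a `KeyError`) and the `dict.get()` method, which returns a default instead.

```python
planet_to_initial = {'Saturn': 'S', 'Earth': 'E'}

print('S' in planet_to_initial)           # False: `in` only looks at keys
print('S' in planet_to_initial.values())  # True: check the values explicitly

# Indexing with a missing key raises KeyError...
try:
    planet_to_initial['Betelgeuse']
except KeyError as err:
    print('KeyError:', err)  # KeyError: 'Betelgeuse'

# ...while get() returns a default instead (None unless you pass one)
print(planet_to_initial.get('Betelgeuse'))       # None
print(planet_to_initial.get('Betelgeuse', '?'))  # ?
```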

A `for` loop over a dictionary iterates over its keys:

In [91]:
numbers = {'one':1, 'two':2, 'three':3}
for k in numbers:
    print("{} = {}".format(k, numbers[k]))

one = 1
two = 2
three = 3
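One caution when looping: adding or deleting keys while iterating directly over a dictionary raises `RuntimeError: dictionary changed size during iteration`. A common workaround, sketched here, is to loop over a snapshot of the keys made with `list()`.

```python
numbers = {'one': 1, 'two': 2, 'three': 3}

# list(numbers) copies the keys first, so deleting from the dict is safe here
for k in list(numbers):
    if numbers[k] % 2 == 1:  # drop the odd values
        del numbers[k]

print(numbers)  # {'two': 2}
```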


## Dictionary Methods

We can access a collection of all the keys or all the values with **`dict.keys()`** and **`dict.values()`**, respectively.

In [94]:
# Get all the initials, sort them alphabetically, and put them in a space-separated string.
initial_keys = planet_to_initial.keys()
initial_vals = planet_to_initial.values()
print(initial_keys)
print(initial_vals)

' '.join(sorted(initial_vals))

dict_keys(['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune'])
dict_values(['M', 'V', 'E', 'M', 'J', 'S', 'U', 'N'])


'E J M M N S U V'
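The `dict_keys` and `dict_values` objects printed above are *views*, not copies: they reflect later changes to the dictionary. The sketch below shows this, along with wrapping a view in `list()` when you need an actual list (views cannot be indexed). The printed key order assumes Python 3.7 or later.

```python
planet_to_initial = {'Mercury': 'M', 'Venus': 'V'}
keys_view = planet_to_initial.keys()

# Views are not copies: they reflect later changes to the dictionary
planet_to_initial['Earth'] = 'E'
print(keys_view)       # dict_keys(['Mercury', 'Venus', 'Earth'])
print(len(keys_view))  # 3

# Wrap a view in list() when you need a real list, e.g. to index into it
first_key = list(keys_view)[0]
print(first_key)       # Mercury
```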

The very useful **`dict.items()`** method lets us iterate over the keys and values of a dictionary simultaneously. 

(In Python jargon, an **item** refers to a **`key-value`** pair)

In [95]:
for planet, initial in planet_to_initial.items():
    print("{} begins with \"{}\"".format(planet, initial))

Mercury begins with "M"
Venus begins with "V"
Earth begins with "E"
Mars begins with "M"
Jupiter begins with "J"
Saturn begins with "S"
Uranus begins with "U"
Neptune begins with "N"
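Because `items()` yields `(key, value)` tuples, it combines well with other tools. Here is a sketch that counts how many planets share each initial (using `dict.get()` with a default of `0`) and then sorts the items by their value.

```python
planets = ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune']
planet_to_initial = {planet: planet[0] for planet in planets}

# Count how many planets share each initial, unpacking each (key, value) item
initial_counts = {}
for planet, initial in planet_to_initial.items():
    initial_counts[initial] = initial_counts.get(initial, 0) + 1

print(initial_counts['M'])  # 2 (Mercury and Mars)

# items() also pairs naturally with sorted(): order the planets by initial
by_initial = sorted(planet_to_initial.items(), key=lambda item: item[1])
print(by_initial[:2])  # [('Earth', 'E'), ('Jupiter', 'J')]
```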


For the full list of dictionary methods, check out the [official online documentation](https://docs.python.org/3/library/stdtypes.html#dict) or call `help(dict)`.

In [96]:
help(dict)

Help on class dict in module builtins:

class dict(object)
 |  dict() -> new empty dictionary
 |  dict(mapping) -> new dictionary initialized from a mapping object's
 |      (key, value) pairs
 |  dict(iterable) -> new dictionary initialized as if via:
 |      d = {}
 |      for k, v in iterable:
 |          d[k] = v
 |  dict(**kwargs) -> new dictionary initialized with the name=value pairs
 |      in the keyword argument list.  For example:  dict(one=1, two=2)
 |  
 |  Methods defined here:
 |  
 |  __contains__(self, key, /)
 |      True if D has a key k, else False.
 |  
 |  __delitem__(self, key, /)
 |      Delete self[key].
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __getitem__(...)
 |      x.__getitem__(y) <==> x[y]
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __init__(self, /, *args, **kwargs)
 |

# Your Turn 👋