# Strings and Dictionaries
---

One place where the Python language really shines is in the manipulation of strings. This section will cover some of Python's built-in string methods and formatting operations.

Such string manipulation patterns come up often in data science work, and they are one big perk of Python in this context.

## String syntax

You've already seen plenty of strings in examples during the previous lessons, but just to recap, strings in Python can be defined using either single or double quotation marks. They are functionally equivalent.

In [1]:
x = 'Pluto is a planet'
y = "Pluto is a planet"
x == y

True

Double quotes are convenient if your string contains a single quote character (e.g. representing an apostrophe).

Similarly, it's easy to create a string that contains double-quotes if you wrap it in single quotes:


In [2]:
print("Pluto's a planet!")
print('My dog is named "Pluto"')

Pluto's a planet!
My dog is named "Pluto"


If we try to put a single quote character inside a single-quoted string, Python gets confused:

In [3]:
'Pluto's a planet!'

SyntaxError: invalid syntax (<ipython-input-3-a43631749f52>, line 1)

We can fix this by "escaping" the single quote with a backslash.

In [4]:
'Pluto\'s a planet!'

"Pluto's a planet!"

The table below summarizes some important uses of the backslash character.

***What you type*** | ***What you get*** | ***Example*** | ***print(example)***
--- | --- | --- | ---
`\'` | ' | `'What\'s up?'` | `What's up?`
`\"` | " | `"That's \"cool\""` | `That's "cool"`
`\\` | \ | `"Look, a mountain: /\\"` | `Look, a mountain: /\`
`\n` |  | `"1\n2 3"` | `1`<br>`2 3`

The last sequence `\n` represents the *newline character*. It causes Python to start a new line.

In [5]:
hello = "hello\nworld"
print(hello)

hello
world
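
As a small sanity check (a sketch only; the names `length` and `shown` are chosen here for illustration), `len()` and `repr()` confirm that the two characters `\n` typed in the source are stored as a single newline character:

```python
hello = "hello\nworld"

# The two-character sequence \n in the source becomes ONE newline character,
# so the length is 5 + 1 + 5.
length = len(hello)

# repr() shows the escape sequence instead of actually breaking the line.
shown = repr(hello)

print(length)
print(shown)
```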


In addition, Python's triple quote syntax for strings lets us include newlines literally (i.e. by just hitting 'Enter' on our keyboard, rather than using the special '\n' sequence). We've already seen this in the docstrings we use to document our functions, but we can use them anywhere we want to define a string.

In [6]:
triplequoted_hello = """hello
world"""
print(triplequoted_hello)
triplequoted_hello == hello

hello
world


True

The `print()` function automatically adds a newline character unless we pass a value for the keyword argument `end` other than its default, `'\n'`:

In [7]:
print("hello")
print("world")
print("hello", end='')
print("pluto", end='')

hello
world
hellopluto

## Strings are sequences

Strings can be thought of as sequences of characters. Almost everything we've seen that we can do to a list, we can also do to a string.

In [8]:
# Indexing
planet = 'Pluto'
planet[0]

'P'

In [9]:
# Slicing
planet[-3:]

'uto'

In [10]:
# How long is this string?
len(planet)

5

In [11]:
# Yes, we can even loop over them
[char+'! ' for char in planet]

['P! ', 'l! ', 'u! ', 't! ', 'o! ']

In [12]:
print('a' in 'abcd') # True
print('ab' in 'abcd') # also True

# this doesn't work for lists
print(['a', 'b'] in ['a', 'b', 'c', 'd']) # False

True
True
False


In [13]:
# Unlike lists, strings are immutable, so item assignment raises a TypeError
planet[0] = 'B'
# planet.append doesn't work either

TypeError: 'str' object does not support item assignment
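
Because strings can't be changed in place, the usual workaround is to build a new string from pieces of the old one. A minimal sketch (the names `renamed` and `error_message` are illustrative):

```python
planet = 'Pluto'
error_message = None

# Item assignment fails because strings are immutable.
try:
    planet[0] = 'B'
except TypeError as err:
    error_message = str(err)

# Instead, build a new string by combining a new first letter with a slice.
renamed = 'B' + planet[1:]

print(error_message)
print(renamed)
print(planet)  # the original string is unchanged
```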

We can easily convert a string to a list of characters:

In [14]:
abc_list = list("abracadabra")
print(abc_list)

['a', 'b', 'r', 'a', 'c', 'a', 'd', 'a', 'b', 'r', 'a']


What if we want to convert a list of characters into a string? Using the `str` function will only give us a printable string of the list - including commas, quotes and brackets - which we may not want. To join a sequence of characters (or longer strings) together into a single string, we have to use `join`.

`join` is not a function or a sequence method – it’s a string method which takes a sequence of strings as a parameter. When we call a string’s `join` method, we are using that string to glue the strings in the sequence together. For example:

In [15]:
letters = ['a', 'b', 'r', 'a', 'c', 'a', 'd', 'a', 'b', 'r', 'a']

s = "".join(letters)
print(s)

abracadabra


We can use any string we like to join a sequence of strings together:

In [16]:
animals = ('cat', 'dog', 'fish')

# a space-separated list
print(" ".join(animals))

# a comma-separated list
print(",".join(animals))

# a comma-separated list with spaces
print(", ".join(animals))

cat dog fish
cat,dog,fish
cat, dog, fish


The opposite of joining is *splitting*. We can split up a string into a list of strings using the `split` method. If called without any parameters, `split` divides up a string into words, using any number of consecutive whitespace characters as a delimiter. We can use additional parameters to specify a different delimiter as well as a limit on the maximum number of splits to perform:

In [17]:
print("cat    dog fish\n".split())
print("cat|dog|fish".split("|"))
print("cat, dog, fish".split(", "))
print("cat, dog, fish".split(", ", 1))

['cat', 'dog', 'fish']
['cat', 'dog', 'fish']
['cat', 'dog', 'fish']
['cat', 'dog, fish']
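
Putting `split` and `join` together gives a common idiom for tidying up whitespace. The sketch below uses made-up input strings (`messy` and `record`) to show it, and to show that splitting and joining on the same explicit delimiter undo each other:

```python
messy = "  cat    dog \t fish\n"

# split() with no arguments ignores leading/trailing whitespace and
# treats any run of whitespace as a single delimiter.
words = messy.split()

# Joining with one space gives a tidy version of the original.
tidy = " ".join(words)

# With an explicit delimiter, join reverses split.
record = "cat|dog|fish"
round_trip = "|".join(record.split("|"))

print(words)
print(tidy)
print(round_trip == record)
```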


## String methods

Like `list`, the type `str` has lots of very useful methods. I'll show just a few examples here.

In [18]:
# ALL CAPS
claim = "Pluto is a planet!"
claim.upper()

'PLUTO IS A PLANET!'

In [19]:
# all lowercase
claim.lower()

'pluto is a planet!'

In [20]:
# change each word to begin with a capital letter
claim.title() 

'Pluto Is A Planet!'

In [21]:
# Searching for the first index of a substring
claim.index('plan')

11

In [22]:
claim.startswith(planet)

True

In [23]:
claim.endswith('dwarf planet')

False
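
One detail worth knowing about `index`: it raises a `ValueError` when the substring isn't there. The related `find` method returns `-1` instead, which is often easier to test for. A short sketch (variable names are illustrative):

```python
claim = "Pluto is a planet!"
index_failed = False

# index() raises ValueError when the substring is absent...
try:
    claim.index('dwarf')
except ValueError:
    index_failed = True

# ...whereas find() returns -1 for a missing substring.
found_at = claim.find('plan')
missing = claim.find('dwarf')

print(index_failed, found_at, missing)
```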

## Going between strings and lists: `.split()` and `.join()`

`str.split()` turns a string into a list of smaller strings, breaking on whitespace by default. This is super useful for taking you from one big string to a list of words.

In [24]:
words = claim.split()
words

['Pluto', 'is', 'a', 'planet!']



Occasionally you'll want to split on something other than whitespace:


In [25]:
datestr = '1956-01-31'
year, month, day = datestr.split('-')

`str.join()` takes us in the other direction, sewing a list of strings up into one long string, using the string it was called on as a separator.

In [26]:
'/'.join([month, day, year])

'01/31/1956'

In [27]:
# Yes, we can put unicode characters right in our string literals :)
' 👏 '.join([word.upper() for word in words])

'PLUTO 👏 IS 👏 A 👏 PLANET!'
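
Keep in mind that `split` always returns strings, even when the pieces look like numbers. If we need actual numbers, we can convert each piece with `int()`. A small sketch reusing the date string from above (the `_n` names are just for illustration):

```python
datestr = '1956-01-31'

# The pieces are strings, so '01' keeps its leading zero.
year, month, day = datestr.split('-')

# Convert each piece to an integer when numbers are needed.
year_n, month_n, day_n = (int(part) for part in datestr.split('-'))

print(month, month_n)
```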

## Building strings with .format()

Python lets us concatenate strings with the `+` operator.

In [28]:
planet + ', we miss you.'

'Pluto, we miss you.'

If we want to throw in any non-string objects, we have to be careful to call `str()` on them first.

In [29]:
position = 9
planet + ", you'll always be the " + position + "th planet to me."

TypeError: can only concatenate str (not "int") to str

In [30]:
planet + ", you'll always be the " + str(position) + "th planet to me."

"Pluto, you'll always be the 9th planet to me."

This is getting hard to read and annoying to type. `str.format()` to the rescue.

In [31]:
"{}, you'll always be the {}th planet to me.".format(planet, position)

"Pluto, you'll always be the 9th planet to me."

So much cleaner! We call `.format()` on a "format string", where the Python values we want to insert are represented with `{}` placeholders.

Notice how we didn't even have to call `str()` to convert `position` from an int. `format()` takes care of that for us.

If that was all that `format()` did, it would still be incredibly useful. But as it turns out, it can do a lot more. Here's just a taste:

In [32]:
pluto_mass = 1.303 * 10**22
earth_mass = 5.9722 * 10**24
population = 52910390
#          2 significant digits   3 decimal places, format as percent   separate with commas
"{} weighs about {:.2} kilograms ({:.3%} of Earth's mass). It is home to {:,} Plutonians.".format(
    planet, pluto_mass, pluto_mass / earth_mass, population,
)

"Pluto weighs about 1.3e+22 kilograms (0.218% of Earth's mass). It is home to 52,910,390 Plutonians."

In [33]:
# Referring to format() arguments by index, starting from 0
s = """Pluto's a {0}.
No, it's a {1}.
{0}!
{1}!""".format('planet', 'dwarf planet')
print(s)

Pluto's a planet.
No, it's a dwarf planet.
planet!
dwarf planet!
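
Besides positions, `format()` placeholders can also be referred to by name, which tends to read better when a value is used more than once. A minimal sketch (the placeholder names `planet` and `position` are chosen here to match the earlier examples):

```python
template = "{planet}, you'll always be the {position}th planet to me. We miss you, {planet}."

# Keyword arguments fill in the placeholders with the matching names.
message = template.format(planet='Pluto', position=9)

print(message)
```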


## F-strings

We can also use f-strings to insert a variable's value into a string:

***Note:*** F-strings were introduced in Python 3.6. If you're using Python 3.5 or earlier, you'll need to use the `format()` method. 

In [34]:
first_name = 'John'
last_name = 'Doe'

full_name = f'{first_name} {last_name}'  # an f-string
print(full_name)

# You can also use f-strings to compose a message with variables in it.
print(f'Hello, {full_name}!')

John Doe
Hello, John Doe!
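
F-strings accept the same format specifiers as `.format()`, and the braces can hold whole expressions rather than just variable names. As a sketch, here is the earlier Pluto example rewritten as an f-string (the variable `summary` is introduced only for this illustration):

```python
planet = 'Pluto'
pluto_mass = 1.303 * 10**22
earth_mass = 5.9722 * 10**24
population = 52910390

# Adjacent string literals are joined, which keeps the long line readable.
summary = (
    f"{planet} weighs about {pluto_mass:.2} kilograms "
    f"({pluto_mass / earth_mass:.3%} of Earth's mass). "
    f"It is home to {population:,} Plutonians."
)

print(summary)
```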


## Adding Whitespace to Strings with Tabs or Newlines

You can use whitespace to organize your output so it's easier for users to read. 

In [35]:
# This is a string without whitespace
print("Python")

# This adds a tab (\t) to the text
print("\tPython")

# This adds newlines (\n) to the text
print("\nLanguages:\nPython\nC\nJavaScript")

Python
	Python

Languages:
Python
C
JavaScript


## Stripping whitespace

You can remove extra whitespace from the left side, the right side, or both sides of a string using `.lstrip()`, `.rstrip()`, and `.strip()`, respectively:

In [36]:
a = b = c = '  Python  '

a = a.lstrip()   # remove whitespace on left side of the string
b = b.rstrip()   # remove whitespace on right side of the string
c = c.strip()    # remove whitespace on both sides of the string

print(f'This \'{a}\' has been stripped of whitespace on the left.')
print(f'This \'{b}\' has been stripped of whitespace on the right.')
print(f'This \'{c}\' has been stripped of whitespace on both sides.')

This 'Python  ' has been stripped of whitespace on the left.
This '  Python' has been stripped of whitespace on the right.
This 'Python' has been stripped of whitespace on both sides.
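
Stripping is handy for cleaning up a whole list of strings at once. The strip methods also accept an optional argument listing the characters to remove instead of whitespace. A sketch with made-up input data (`raw_entries`):

```python
raw_entries = ['  Mercury', 'Venus  ', '\tEarth\n', '***Mars***']

# strip() with no argument removes spaces, tabs and newlines from both ends.
cleaned = [entry.strip() for entry in raw_entries]

# With an argument, strip() removes any of the given characters from both ends.
starred = '***Mars***'.strip('*')

print(cleaned)
print(starred)
```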


For further reading on string formatting, you can refer to [pyformat.info](https://pyformat.info/) and the [official docs](https://docs.python.org/3/library/string.html#formatstrings).

## Dictionaries

Dictionaries are a built-in Python data structure for mapping keys to values. To define a dictionary literal, we put a comma-separated list of key-value pairs between curly brackets `{` and `}`. We use a colon to separate each key from its value:

In [37]:
numbers = {'one':1, 'two':2, 'three':3}

In this case 'one', 'two', and 'three' are the keys, and 1, 2 and 3 are their corresponding values.

Values are accessed via square bracket syntax similar to indexing into lists and strings.

In [38]:
numbers['one']

1

We can use the same syntax to add another key-value pair:

In [39]:
numbers['eleven'] = 11
numbers

{'one': 1, 'two': 2, 'three': 3, 'eleven': 11}

Or to change the value associated with an existing key:

In [40]:
numbers['one'] = 'Pluto'
numbers

{'one': 'Pluto', 'two': 2, 'three': 3, 'eleven': 11}

Python has dictionary comprehensions with a syntax similar to the list comprehensions we saw in the previous lesson:

In [41]:
planets = ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune']
planet_to_initial = {planet: planet[0] for planet in planets}
planet_to_initial

{'Mercury': 'M',
 'Venus': 'V',
 'Earth': 'E',
 'Mars': 'M',
 'Jupiter': 'J',
 'Saturn': 'S',
 'Uranus': 'U',
 'Neptune': 'N'}
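
Just like list comprehensions, dictionary comprehensions can include a condition and can compute the values with any expression. A small sketch (the name `long_names` and the length cut-off are arbitrary choices for illustration):

```python
planets = ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune']

# Map each planet to the length of its name, keeping only names longer than 5 letters.
long_names = {planet: len(planet) for planet in planets if len(planet) > 5}

print(long_names)
```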

The `in` operator tells us whether something is a key in the dictionary:

In [42]:
'Saturn' in planet_to_initial

True

In [43]:
'Betelgeuse' in planet_to_initial

False

A `for` loop over a dictionary will loop over its keys:

In [44]:
for k in numbers:
    print("{} = {}".format(k, numbers[k]))

one = Pluto
two = 2
three = 3
eleven = 11


## Switch-case statement

Python has no `switch` statement of the kind found in many other programming languages (Python 3.10 added a related `match` statement). We can still achieve something similar with a dictionary, as below:

In [45]:
DEPARTMENT_NAMES = {
    "CSC": "Computer Science",
    "MAM": "Mathematics and Applied Mathematics",
    "STA": "Statistical Sciences", # Trailing commas like this are allowed in Python!
}

course_code = 'CSC'

if course_code in DEPARTMENT_NAMES: # this tests whether the variable is one of the dictionary's keys
    print("Department: %s" % DEPARTMENT_NAMES[course_code])
else:
    print("Unknown course code: %s" % course_code)

Department: Computer Science
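
The same idea extends from looking up values to choosing which code to run, because functions can be stored as dictionary values too. The sketch below is only an illustration: `add`, `subtract`, `OPERATIONS` and `calculate` are hypothetical names, not part of the lesson's code.

```python
def add(a, b):
    return a + b

def subtract(a, b):
    return a - b

# Each "case" maps a symbol to the function that handles it.
OPERATIONS = {
    '+': add,
    '-': subtract,
}

def calculate(symbol, a, b):
    # .get() returns None for an unknown symbol, which plays the role of "default".
    operation = OPERATIONS.get(symbol)
    if operation is None:
        return f"Unknown operation: {symbol}"
    return operation(a, b)

print(calculate('+', 2, 3))
print(calculate('*', 2, 3))
```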


## Dictionary methods

We can access a collection of all the keys or all the values with `dict.keys()` and `dict.values()`, respectively.

In [46]:
# Get all the initials, sort them alphabetically, and put them in a space-separated string.
' '.join(sorted(planet_to_initial.values()))

'E J M M N S U V'

We can also check if a value is in the dictionary using the `in` operator in conjunction with the `dict.values()` method:

In [47]:
'M' in planet_to_initial.values()

True

The very useful `dict.items()` method lets us iterate over the keys and values of a dictionary simultaneously. (In Python jargon, an *item* refers to a key-value pair.)

In [48]:
for planet, initial in planet_to_initial.items():
    print("{} begins with \"{}\"".format(planet.rjust(10), initial))

   Mercury begins with "M"
     Venus begins with "V"
     Earth begins with "E"
      Mars begins with "M"
   Jupiter begins with "J"
    Saturn begins with "S"
    Uranus begins with "U"
   Neptune begins with "N"


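We can also use the `.get()` method to fetch the value for a given key. Unlike square brackets, `.get()` returns `None` (or a default that we supply) instead of raising a `KeyError` when the key is missing: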

In [49]:
planet_to_initial.get("Saturn")

'S'
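
The optional second argument of `.get()` is the value to return when the key is missing. That makes it convenient for counting things. A sketch with a cut-down dictionary (the names `fallback` and `counts` are illustrative):

```python
planet_to_initial = {'Saturn': 'S', 'Mars': 'M'}

# A missing key gives None by default, or the fallback we pass in.
missing = planet_to_initial.get('Betelgeuse')
fallback = planet_to_initial.get('Betelgeuse', '?')

# Counting letters: start from 0 the first time each letter is seen.
counts = {}
for char in 'abracadabra':
    counts[char] = counts.get(char, 0) + 1

print(missing, fallback)
print(counts)
```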

The `.update()` method allows us to add or update several items at once:

In [50]:
marbles = {"red": 34, "green": 30, "brown": 31, "yellow": 29 }

# Add several items to the dictionary at once, and change the value for "green" to 32
marbles.update({"orange": 34, "blue": 23, "purple": 36, "green": 32})

print(marbles)

{'red': 34, 'green': 32, 'brown': 31, 'yellow': 29, 'orange': 34, 'blue': 23, 'purple': 36}
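
Note that `.update()` changes the dictionary in place. If we'd rather keep the original untouched, one option is to build a new dictionary by unpacking both with `**`. A sketch, where `new_stock` and `combined` are names made up for this example:

```python
marbles = {"red": 34, "green": 30, "brown": 31, "yellow": 29}
new_stock = {"orange": 34, "green": 32}

# Later entries win, so "green" takes its value from new_stock.
combined = {**marbles, **new_stock}

print(combined)
print(marbles)  # the original dictionary is unchanged
```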


## Removing key-value pairs
   
You can use the `del` statement to remove a key-value pair:

In [51]:
del marbles["brown"]  # delete key-value pair identified by the key: "brown"
print(marbles)

{'red': 34, 'green': 32, 'yellow': 29, 'orange': 34, 'blue': 23, 'purple': 36}
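
If we also want the value that was removed, or want to avoid a `KeyError` when the key might not exist, the `.pop()` method is an alternative to `del`. A brief sketch (the names `removed` and `absent` are illustrative):

```python
marbles = {'red': 34, 'green': 32, 'yellow': 29}

# pop() removes the key and returns its value.
removed = marbles.pop('red')

# With a default, pop() does not raise KeyError for a missing key.
absent = marbles.pop('brown', None)

print(removed, absent)
print(marbles)
```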


## Different data types in a dictionary

The keys in a dictionary don’t have to be strings – we can mix different types of keys and different types of values, as long as every key is hashable (for example a number, a string, or a tuple of those):

In [52]:
# Tuples are used as keys; the corresponding values are booleans
battleship_guesses = {
    (3, 4): False,
    (2, 6): True,
    (2, 5): True,
}
print(battleship_guesses[(3,4)]) # print value corresponding to key: (3,4)
print(battleship_guesses.get((3,4))) # this does the same thing as the code above

False
False
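
Tuples work as keys because they are immutable. A list, by contrast, can't be used as a key. The sketch below shows the error and one way around it, converting the list to a tuple first (the names `guess` and `guesses` are illustrative):

```python
guess = [3, 4]
message = None

# Lists are mutable and therefore unhashable, so this raises TypeError.
try:
    guesses = {guess: False}
except TypeError as err:
    message = str(err)

# Converting the list to a tuple gives a valid key.
guesses = {tuple(guess): False}

print(message)
print(guesses)
```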


## Nesting

Lists and dictionaries can be *nested* inside one another: you can store dictionaries in a list, a list as a value in a dictionary, or even a dictionary inside another dictionary:

In [53]:
# Nesting dictionaries in a list

car_0 = {'brand': 'toyota', 'colour': 'silver'}
car_1 = {'brand': 'honda', 'colour': 'blue'}
car_2 = {'brand': 'mercedes', 'colour': 'white'}
cars = [car_0, car_1, car_2]
print(cars)

[{'brand': 'toyota', 'colour': 'silver'}, {'brand': 'honda', 'colour': 'blue'}, {'brand': 'mercedes', 'colour': 'white'}]
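
Once the dictionaries are in a list, we can loop over them or filter them with a comprehension. A small sketch using the same cars (the names `descriptions` and `blue_brands` are made up for this example):

```python
cars = [
    {'brand': 'toyota', 'colour': 'silver'},
    {'brand': 'honda', 'colour': 'blue'},
    {'brand': 'mercedes', 'colour': 'white'},
]

# Build a readable description of each car.
descriptions = [f"{car['colour'].title()} {car['brand'].title()}" for car in cars]

# Keep only the brands of the blue cars.
blue_brands = [car['brand'] for car in cars if car['colour'] == 'blue']

print(descriptions)
print(blue_brands)
```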


In [54]:
# Nesting a list in a dictionary
pizza = {
    'crust':    'thick',
    'toppings': ['mushrooms', 'extra cheese'],  # list of topping ingredients
}

print(f"\nYou ordered a {pizza['crust']}-crust pizza with the following toppings: ")

# loop through list of topping and print each item
for topping in pizza['toppings']:
    print(f"\t{topping}")


You ordered a thick-crust pizza with the following toppings: 
	mushrooms
	extra cheese


In [55]:
# Nesting a dictionary in a dictionary
users = {
    'aeinstein': {
        'first': 'albert',
        'last' : 'einstein',
        'location': 'princeton',
    },
    'mcurie': {
        'first': 'marie',
        'last' : 'curie',
        'location': 'paris',
    }
}

# Loop through the key-value pairs and, for each user, print the nested values: first, last and location
for username, user_info in users.items():
    print(f"\nUsername: {username}")
    full_name = f"{user_info['first']} {user_info['last']}"
    location = user_info['location']
    
    print(f"\tFull name: {full_name.title()}")
    print(f"\tLocation: {location.title()}")


Username: aeinstein
	Full name: Albert Einstein
	Location: Princeton

Username: mcurie
	Full name: Marie Curie
	Location: Paris
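
To reach a single nested value, we can chain square brackets, one pair per level. When a key might be missing, chaining `.get()` with an empty dictionary as the default avoids a `KeyError`. A sketch (the username `'nbohr'` is a made-up key that is deliberately absent):

```python
users = {
    'aeinstein': {'first': 'albert', 'last': 'einstein', 'location': 'princeton'},
    'mcurie': {'first': 'marie', 'last': 'curie', 'location': 'paris'},
}

# One pair of brackets per level of nesting.
curie_city = users['mcurie']['location']

# A missing user falls back to {}, whose .get() then returns 'unknown'.
unknown_city = users.get('nbohr', {}).get('location', 'unknown')

# Collect one field from every nested dictionary.
locations = [info['location'] for info in users.values()]

print(curie_city, unknown_city, locations)
```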


Next up, we'll learn about [working with external libraries](https://github.com/colintwh/python-basics/blob/master/external_libs.ipynb) that can help you in your own programs.