# Collections

---
   Now we will see how we can group multiple values together in a collection – like a *list* of numbers, or a *dictionary* which we can use to store and retrieve key-value pairs. Many useful collections are built-in types in Python, and we will encounter them quite often.
   
## Lists
   A *list* is a collection of items in a particular order. It is a type of sequence that we can use to store multiple values and access them sequentially or by their position, or index. In Python, square brackets `[]` indicate a list, and individual elements in the list are separated by commas:

In [1]:
# a list of strings
animals = ['cat', 'dog', 'fish', 'bison']

# a list of integers
numbers = [1, 7, 34, 20, 12, 7]

# an empty list
my_list = []

a = "1"   # this is a string
b = 2     # this is an integer
c = 3.0   # this is a float
d = "hello world!"
e = True

# we can mix the types of values we store in a list
things = [
    a,
    b,
    c, 
    d, 
    e, # this trailing comma is legal in Python
]

Because a list usually contains more than one element, it's a good practice to use plural nouns to label your list, such as *letters*, *digits*, or *names*. If you print a list in Python, the interpreter will return its representation of the list, including the square brackets.

In [2]:
print(animals)
print(numbers)
print(things)

['cat', 'dog', 'fish', 'bison']
[1, 7, 34, 20, 12, 7]
['1', 2, 3.0, 'hello world!', True]


### Accessing elements in a list
   To access an element in a list, write the name of the list followed by the index of the item enclosed in square brackets. Python considers the first item in a list to be at position 0, **not** position 1. The same is true of most programming languages.

In [4]:
print(animals[0]) # prints first item: cat
print(numbers[1]) # prints second item: 7

# You can use string methods on any element in the list
print(things[3].title())

cat
7
Hello World!


In [5]:
# This will give us an error, because the list only has four elements
print(animals[6])

IndexError: list index out of range

The `index()` method lets you find the index of a value in the list:

In [6]:
# find the index of a value
a = numbers.index(34)
b = animals.index('dog')
print(f'Index position of 34 in numbers is: {a}')
print(f'Index position of dog in animals is: {b}')

# if the value appears more than once, we will get the index of the first one
c = numbers.index(7)
print(f'Index position of 7 in numbers is: {c}')

Index position of 34 in numbers is: 2
Index position of dog in animals is: 1
Index position of 7 in numbers is: 1


In [7]:
# if the value is not in the list, we will get a ValueError!
numbers.index(42)

ValueError: 42 is not in list
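
A minimal sketch of one way to avoid this `ValueError`: check for the value with the `in` membership operator (covered later in this chapter) before calling `index()`. The list `readings` and the use of `None` as a "not found" marker are illustrative choices, not part of the original example.

```python
readings = [1, 7, 34, 20, 12, 7]
target = 42

# only call index() when the value is actually present
if target in readings:
    position = readings.index(target)
else:
    position = None  # illustrative "not found" marker

print(position)  # None
```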

   Python also provides a special syntax for accessing the last element in a list: ask for the item at index `-1`. This is useful when you want to access the last item in a list without knowing exactly how long the list is. The convention extends to other negative values as well:

In [8]:
print(animals[-1])  # prints last item: bison
print(numbers[-2])  # prints second last item: 12

bison
12


### Extracting subsets
We can extract a subset of a list, which will itself be a list, using a slice. This uses almost the same syntax as accessing a single element, but instead of specifying a single index between the square brackets we specify a lower and an upper bound, separated by a colon. Note that our sublist will include the element at the lower bound, but exclude the element at the upper bound:

In [9]:
print(animals[1:3])  # prints range from second item (dog) to third item (fish)
print(animals[1:-1]) # this achieves the same result; range from second to second last item (fish)

['dog', 'fish']
['dog', 'fish']


   If one of the bounds is one of the ends of the list, we can leave it out. A slice with neither bound specified gives us a copy of the list:

In [10]:
print(animals[2:]) # prints range from third item onwards
print(animals[:2]) # prints range (from first) till second item
print(animals[:])  # prints whole list

['fish', 'bison']
['cat', 'dog']
['cat', 'dog', 'fish', 'bison']


   Note that printing `animals` and `animals[:]` yields the same output, but the two behave differently when assigned to another variable, as we will see later in this chapter.
   
   We can also include a third parameter to specify the step size:

In [11]:
print(animals[::2]) # prints every second item, starting from the first

['cat', 'fish']
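
The start, stop and step values can be combined, and the step may be negative. A short sketch using a separate list, `zoo` (a name chosen here for illustration), so that `animals` is left untouched:

```python
zoo = ['cat', 'dog', 'fish', 'bison']

# start at the second item and take every second item after it
every_other = zoo[1::2]

# a step of -1 walks through the list backwards
backwards = zoo[::-1]

print(every_other)  # ['dog', 'bison']
print(backwards)    # ['bison', 'fish', 'dog', 'cat']
print(zoo)          # the original list is unchanged
```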


### Changing, Adding and Removing Elements
   Lists are mutable – we can modify elements, add elements to them or remove elements from them. A list will change size dynamically when we add or remove elements – we don’t have to manage this ourselves:

In [12]:
# assign a new value to an existing element
animals[3] = "hamster"

# add a new element to the end of the list
animals.append("squirrel")

# remove an element by its index
del animals[2]  # deletes 'fish'

print(animals)

['cat', 'dog', 'hamster', 'squirrel']


You can add a new element at any position in your list by using the `insert()` method:

In [13]:
# insert a string value at third position
animals.insert(2, 'rabbit')  
print(animals)

# insert a numerical value at a particular index
numbers.insert(0, 45) # insert 45 at the beginning of the list
print(numbers)

['cat', 'dog', 'rabbit', 'hamster', 'squirrel']
[45, 1, 7, 34, 20, 12, 7]


Instead of using `append()`, which works with only **one** new element, you can use the `extend()` method to append multiple values at the end of the list:

In [14]:
# append several values at once to the end
numbers.extend([56, 2, 12])
print(numbers)

[45, 1, 7, 34, 20, 12, 7, 56, 2, 12]
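
The difference between the two methods is easiest to see when both are given a list. The sketch below uses two throwaway lists, `nested` and `flat` (illustrative names):

```python
nested = [1, 2, 3]
nested.append([4, 5])   # the whole list becomes ONE new element

flat = [1, 2, 3]
flat.extend([4, 5])     # each value becomes its own element

print(nested)  # [1, 2, 3, [4, 5]]
print(flat)    # [1, 2, 3, 4, 5]
print(len(nested), len(flat))  # 4 5
```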


### Pop and remove
Sometimes you'll want to use the value of an item after you remove it from a list. The `pop()` method removes the last item in a list, but it lets you work with that item after removing it. The term *pop* comes from thinking of a list as a stack of items and popping one item off the top of the stack. In this analogy, the top of a stack corresponds to the end of a list:

In [15]:
popped_animal = animals.pop() # remove the last list item, 'squirrel', and assign it to popped_animal
print(animals)
print(popped_animal)

# You can also pop items from any position in a list by specifying an index
first_animal = animals.pop(0)
print(f'The first animal in the list was {first_animal}.')

['cat', 'dog', 'rabbit', 'hamster']
squirrel
The first animal in the list was cat.


If you know the value of an item but not its position in the list, you can remove it with the `remove()` method:

In [16]:
# remove based on value of the item
motorcycles = ['honda', 'yamaha', 'suzuki', 'ducati']
motorcycles.remove('ducati')
print(motorcycles)

# Storing the value in a variable first lets you keep working with it after removal
purchased_motorcycle = 'honda'
motorcycles.remove(purchased_motorcycle)
print(f'\nI just bought a {purchased_motorcycle.title()} motorcycle.')

['honda', 'yamaha', 'suzuki']

I just bought a Honda motorcycle.


**Note**: The `remove()` method deletes only the *first* occurrence of the value you specify. If there's a possibility the value appears more than once in the list, you'll need to use a loop to remove every occurrence.
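
One possible shape for such a loop, sketched with a `while` statement and the `in` membership operator (both are explained properly later on); the list `visitors` is made up for illustration:

```python
visitors = ['cat', 'dog', 'cat', 'rabbit', 'cat']

# keep removing 'cat' until no occurrence is left
while 'cat' in visitors:
    visitors.remove('cat')

print(visitors)  # ['dog', 'rabbit']
```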


### Lists are mutable
Because lists are mutable, we can modify a list variable without assigning the variable a completely new value. Remember that if we assign the same list value to two variables, any in-place changes that we make while referring to the list by one variable name will also be reflected when we access the list through the other variable name:

In [17]:
animals = ['cat', 'dog', 'goldfish', 'canary']
pets = animals # now both variables refer to the same list object

animals.append('aardvark')
print(pets) # pets is still the same list as animals

animals = ['rat', 'gerbil', 'hamster'] # now we assign a new list value to animals
print(pets) # pets still refers to the old list

pets = animals[:] # assign a *copy* of animals to pets
animals.append('aardvark')
print(pets) # pets remains unchanged, because it refers to a copy, not the original list

['cat', 'dog', 'goldfish', 'canary', 'aardvark']
['cat', 'dog', 'goldfish', 'canary', 'aardvark']
['rat', 'gerbil', 'hamster']
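
To check whether two names refer to the same list object or merely to equal lists, we can compare them with the `is` operator, which the text above has not introduced. A small sketch with illustrative names:

```python
original = ['cat', 'dog']
alias = original        # a second name for the same list object
snapshot = original[:]  # a new list with the same elements

print(alias is original)     # True: same object
print(snapshot is original)  # False: different object...
print(snapshot == original)  # True: ...with equal contents

original.append('fish')
print(alias)     # ['cat', 'dog', 'fish']
print(snapshot)  # ['cat', 'dog']
```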


### Check for specific item in list
How do we check whether a list contains a particular value? We use `in` or `not in`, the membership operators:

In [18]:
numbers = [34, 67, 12, 29]
number = 67

if number in numbers:
    print("%d is in the list!" % number)

number = 90
if number not in numbers:
    print("%d is not in the list!" % number)


67 is in the list!
90 is not in the list!


### Organising a list

   Python provides a number of different ways to organise your lists, depending on the situation. You can use the `sort()` method to sort a list:

In [19]:
cars = ['bmw', 'audi', 'toyota', 'subaru']
print('\n---Sorted order---')

# sort the list by alphabetical order; note the list has been modified
cars.sort()   
print(cars)

numbers = [5, 3, 7, 9, 1]

# sort the list by increasing order; list has been modified
numbers.sort() 
print(numbers)


# sort the list in reverse alphabetical order
cars.sort(reverse=True)
print('\n---Reverse order---')
print(cars)


# sort the list in decreasing/descending order
numbers.sort(reverse=True)
print(numbers)


---Sorted order---
['audi', 'bmw', 'subaru', 'toyota']
[1, 3, 5, 7, 9]

---Reverse order---
['toyota', 'subaru', 'bmw', 'audi']
[9, 7, 5, 3, 1]


   As shown in the examples above, the `sort()` method modifies the list in place. To maintain the original order of a list, you can use the `sorted()` function. The `sorted()` function returns a new, sorted list, so you can display your list in a particular order without affecting the actual order of the original:

In [20]:
# print the original list
print(cars)

# print the sorted list
print(sorted(cars))

# print the original list again
print(cars)

['toyota', 'subaru', 'bmw', 'audi']
['audi', 'bmw', 'subaru', 'toyota']
['toyota', 'subaru', 'bmw', 'audi']
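
A common slip is to assign the result of `sort()` to a variable. Because `sort()` works in place, it returns `None`, whereas `sorted()` returns the new list. A brief sketch with a separate list, `brands` (an illustrative name):

```python
brands = ['bmw', 'audi', 'toyota', 'subaru']

in_order = sorted(brands)  # a new, sorted list; brands is untouched
print(in_order)  # ['audi', 'bmw', 'subaru', 'toyota']
print(brands)    # ['bmw', 'audi', 'toyota', 'subaru']

result = brands.sort()     # sorts brands in place and returns None
print(result)    # None
print(brands)    # ['audi', 'bmw', 'subaru', 'toyota']
```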


To reverse the original order of a list, you can use the `reverse()` method:

In [21]:
cars = ['bmw', 'audi', 'toyota', 'subaru']

# print original list
print(cars)

# reverse the list in place (this does not sort it) and print it
cars.reverse()
print(cars)

['bmw', 'audi', 'toyota', 'subaru']
['subaru', 'toyota', 'audi', 'bmw']


### Finding length 

You can quickly find the length of a list by using the `len()` function:

In [22]:
len(cars)

4

### Other useful functions

Several of Python's built-in functions also work on lists:

In [23]:
# the sum of a list of numbers
print(sum(numbers))

# the minimum of the list of numbers
print(min(numbers))

# the maximum of the list of numbers
print(max(numbers))

# are any of these values true?
print(any([1,0,1,0,1]))

# are all of these values true?
print(all([1,0,1,0,1]))

25
1
9
True
False


### Using arithmetic operators with lists

Some of the arithmetic operators we have used on numbers before can also be used on lists, but the effect may not always be what we expect:

In [24]:
# we can concatenate two lists by adding them
print([1, 2, 3] + [4, 5, 6])

# we can concatenate a list with itself by multiplying it by an integer
print([1, 2, 3] * 3)

[1, 2, 3, 4, 5, 6]
[1, 2, 3, 1, 2, 3, 1, 2, 3]


In [25]:
# not all arithmetic operators can be used on lists -- this will give us an error!
print([1, 2, 3] - [2, 3])

TypeError: unsupported operand type(s) for -: 'list' and 'list'
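
If what we want is "the elements of one list that are not in another", one possible approach is to build the result ourselves with a loop and the `not in` operator. This is only a sketch with illustrative names:

```python
first = [1, 2, 3]
second = [2, 3]

# collect every value of first that does not appear in second
difference = []
for value in first:
    if value not in second:
        difference.append(value)

print(difference)  # [1]
```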

### Lists vs arrays

Many other programming languages don’t have a built-in type which behaves like Python’s list; they offer *arrays* instead. Arrays are simpler, more low-level data structures, which don’t have all the functionality of a list. Here are some major differences between lists and arrays:

   + An array has a fixed size which you specify when you create it. If you need to add or remove elements, you have to make a new array.

   + If the language is statically typed, you also have to specify a single type for the values which you are going to put in the array when you create it.
   
   + In languages which have primitive types, arrays are usually not objects, so they don’t have any methods – they are just containers.

Compared to lists, arrays are less easy to use but they do have some advantages: because they are so simple, and there are so many restrictions on what you can do with them, the computer can handle them very efficiently. That means that it is often much faster to use an array than to use an object which behaves like a list. A lot of programmers use them when it is important for their programs to be fast.

Python also has an array type, provided by the `array` module in the standard library. It’s not as restrictive as an array in C or Java: you have to specify a type for the contents of the array, and you can only use it to store basic values such as numbers, but you can resize it dynamically, like a list. You will probably never need to use it.
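
For the curious, here is a minimal sketch of the standard library's `array` module. The type code `'i'` stands for signed integers; the variable name `counts` is illustrative, and the `try`/`except` statement used to catch the error has not been covered yet.

```python
from array import array

# 'i' is the type code for signed integers
counts = array('i', [1, 2, 3])
counts.append(4)  # arrays can grow, like lists

print(counts)     # array('i', [1, 2, 3, 4])
print(counts[0])  # 1

try:
    counts.append(2.5)  # a float does not fit in an integer array
except TypeError as error:
    print(f'TypeError: {error}')
```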

## Tuples

Python has another sequence type which is called `tuple`. Tuples are similar to lists in many ways, but they are **immutable**: once a tuple has been created, its items cannot be changed, added or removed. We define a tuple literal by putting a comma-separated list of values inside round brackets `(` and `)`:

In [26]:
WEEKDAYS = ('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday')
print(WEEKDAYS)

# Tuples are already used when inserting multiple values into a formatted string
print("%d %d %d" % (1, 2, 3))

('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday')
1 2 3


What happens if we try to change one of the items in the tuple?

In [27]:
def print_weekday_list(weekdays):
    weekdays[5] = 'Caturday' # this would modify a list in place, but a tuple raises a TypeError
    print(weekdays)

print_weekday_list(WEEKDAYS)

TypeError: 'tuple' object does not support item assignment
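
Because a tuple cannot be changed in place, one way to get a "modified" tuple is to build a new one, for example by going through a list. A sketch with illustrative names (the `list` and `tuple` conversion functions are discussed later in this chapter):

```python
days = ('Monday', 'Tuesday', 'Wednesday')

as_list = list(days)      # a mutable copy of the tuple's items
as_list[1] = 'Twosday'
changed = tuple(as_list)  # convert back to a new tuple

print(days)     # ('Monday', 'Tuesday', 'Wednesday')
print(changed)  # ('Monday', 'Twosday', 'Wednesday')
```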

You can define a tuple with one element by including a trailing comma:

In [28]:
a_tuple = (3,)
print(a_tuple)

# this assigns a number instead of a tuple
a_number = (3)
print(a_number)

(3,)
3


In [29]:
len(a_tuple)

1

Many of the operations shown earlier for lists (e.g. indexing, `len()`, `sum()`) work on tuples as well:

In [30]:
numbers_t = (34, 67, 12, 29)

# accessing specific item in a tuple
print(numbers_t[1])

# get index of specific value in a tuple
print(numbers_t.index(12))

# length of tuple
print(len(numbers_t))

# the sum of a tuple of numbers
print(sum(numbers_t))

# are any of these values true?
print(any((1,0,1,0,1)))

# are all of these values true?
print(all((1,0,1,0,1)))

67
2
4
142
True
False


## Sets

Python also supports another type called `set`. A set is a collection of unique elements. If we add multiple copies of the same element to a set, the duplicates will be eliminated, and we will be left with one of each element. To define a set literal, we put a comma-separated list of values inside curly brackets `{` and `}`:

In [31]:
animals = {'cat', 'dog', 'goldfish', 'canary', 'cat'}
print(animals) # the set will only contain one cat

{'cat', 'dog', 'goldfish', 'canary'}


We can perform various set operations on sets:

In [32]:
even_numbers = {2, 4, 6, 8, 10}
big_numbers = {6, 7, 8, 9, 10}

# subtraction: big numbers which are not even
print(big_numbers - even_numbers)

# union: numbers which are big or even
print(big_numbers | even_numbers)

# intersection: numbers which are big and even
print(big_numbers & even_numbers)

# numbers which are big or even but not both
print(big_numbers ^ even_numbers)

{9, 7}
{2, 4, 6, 7, 8, 9, 10}
{8, 10, 6}
{2, 4, 7, 9}
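
Each of these operators also has an equivalent set method, which some people find more readable. The sketch below uses two fresh sets with illustrative names and compares each method with its operator:

```python
evens = {2, 4, 6, 8, 10}
bigs = {6, 7, 8, 9, 10}

# each operator has an equivalent method
print(bigs.difference(evens) == bigs - evens)            # True
print(bigs.union(evens) == bigs | evens)                 # True
print(bigs.intersection(evens) == bigs & evens)          # True
print(bigs.symmetric_difference(evens) == bigs ^ evens)  # True

# subset test: is every element of {6, 8} also in evens?
print({6, 8}.issubset(evens))  # True
```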


It is important to note that, unlike lists and tuples, sets are **not** ordered. When we print a set, the order of the elements is arbitrary and may differ from the order in which we wrote them. If we want to process the contents of a set in a particular order, we will first need to convert it to a list or tuple and sort it:

In [33]:
print(animals)
print(sorted(animals))   # sorted() on a set returns a list object

{'cat', 'dog', 'goldfish', 'canary'}
['canary', 'cat', 'dog', 'goldfish']


How do we make an empty set? We have to use the `set` function. Dictionaries, which are discussed in a later section, used curly brackets before sets adopted them, so an empty pair of curly brackets is actually an empty dictionary:

In [34]:
# this is an empty dictionary
a = {}

# this is how we make an empty set
b = set()

## Range

`range` is another kind of immutable sequence type. It is very specialised: we use it to create a sequence of integers. Ranges are *lazy*, much like the *generators* which we will find out more about in later chapters, although strictly speaking a range is a sequence rather than a generator. For now we just need to know that the numbers in the range are generated one at a time as they are needed, and not all at once. In the examples below, we convert each range to a list so that all the numbers are generated and we can print them out:

In [35]:
# print the integers from 0 to 9
print(list(range(10)))

# print the integers from 1 to 10
print(list(range(1, 11)))

# print the odd integers from 1 to 10
print(list(range(1, 11, 2)))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
[1, 3, 5, 7, 9]


As observed above, a single parameter passed to the range function will be used as the upper bound. If we use two parameters, the first is the lower bound and the second forms the upper bound. If we use three, the third parameter is the step size. The default lower bound is zero, and the default step size is one. Note that the range includes the lower bound and excludes the upper bound.
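
The step size can also be negative, which counts downwards. And because a range is a sequence, it can be measured, indexed and searched without first converting it to a list. A short sketch (the name `countdown` is illustrative):

```python
countdown = range(10, 0, -2)  # from 10 down towards 0 (excluded), in steps of 2

print(list(countdown))  # [10, 8, 6, 4, 2]
print(len(countdown))   # 5
print(countdown[1])     # 8
print(6 in countdown)   # True
```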

## Dictionaries

A *dictionary* (`dict`) in Python is a collection of key-value pairs. Each *key* is connected to a value, and you can use a key to access the associated value, which can be a number, a string, a list, or even another dictionary. In other words, we look values up by key, rather than by index as we do with lists and tuples.

To define a dictionary literal, we put a comma-separated list of key-value pairs between curly brackets `{` and `}`. We use a colon to separate each key from its value:

In [36]:
marbles = {"red": 34, "green": 30, "brown": 31, "yellow": 29 }

personal_details = {
    "name": "Jane Doe",
    "age": 38,  # trailing comma is legal
}

print(marbles["green"])
print(personal_details["name"])

30
Jane Doe


In [37]:
# modify a value
marbles["red"] += 3
personal_details["name"] = "Jane Q. Doe"

# This will give us an error, because there is no such key in the dictionary
print(marbles["blue"])

KeyError: 'blue'

The keys of a dictionary don’t have to be strings: any immutable (more precisely, hashable) value, such as a number, a string or a tuple of immutable values, can be used as a key, and we can mix different types of keys and different types of values in one dictionary. Keys are unique – if we repeat a key, we will overwrite the old value with the new value. When we store a value in a dictionary, the key doesn’t have to exist beforehand – it will be created automatically:

In [38]:
# Tuples are used as keys, and the corresponding values are booleans
battleship_guesses = {
    (3, 4): False,
    (2, 6): True,
    (2, 5): True,
}
print(battleship_guesses[(3,4)]) # print value corresponding to key: (3,4)
print(battleship_guesses.get((3,4))) # this does the same thing as the code above

surnames = {} # this is an empty dictionary
surnames["John"] = "Smith"
surnames["John"] = "Doe"
print(surnames) # we overwrote the older surname

marbles = {"red": 34, "green": 30, "brown": 31, "yellow": 29 }

# add a new key-value pair: "blue": 30
marbles["blue"] = 30
print(marbles)

False
False
{'John': 'Doe'}
{'red': 34, 'green': 30, 'brown': 31, 'yellow': 29, 'blue': 30}


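Unlike sets, dictionaries remember the order in which their keys were inserted (this is guaranteed from Python 3.7 onwards), so printing a dictionary shows its items in insertion order. However, values are still looked up by key, not by position.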


### Some useful methods for dictionaries
Here are some commonly used methods of dictionary objects:

In [39]:
marbles = {"red": 34, "green": 30, "brown": 31, "yellow": 29 }

# Get a value by its key, or None if it doesn't exist
print(marbles.get("orange"))
# We can specify a different default
print(marbles.get("orange", 0))

# Add several items to the dictionary at once, and modify "green" value to 32
marbles.update({"orange": 34, "blue": 23, "purple": 36, "green": 32})

# All the keys in the dictionary
print(marbles.keys())
# All the values in the dictionary
print(marbles.values())
# All the items in the dictionary
print(marbles.items())

None
0
dict_keys(['red', 'green', 'brown', 'yellow', 'orange', 'blue', 'purple'])
dict_values([34, 32, 31, 29, 34, 23, 36])
dict_items([('red', 34), ('green', 32), ('brown', 31), ('yellow', 29), ('orange', 34), ('blue', 23), ('purple', 36)])


The last three methods return special *view* objects which are read-only views of various properties of the dictionary. We cannot edit them directly, but they will be updated when we modify the dictionary. We most often access these views because we want to iterate over them (something we will discuss in the next chapter), but we can also convert them to sequence types such as lists if we need to.
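
The "updated when we modify the dictionary" behaviour can be seen in a small sketch. The dictionary `stock` and the name `colour_view` are illustrative:

```python
stock = {"red": 34, "green": 30}
colour_view = stock.keys()    # create the view once

print(len(colour_view))       # 2

stock["blue"] = 23            # modify the dictionary afterwards

# the same view object now reflects the new key
print(len(colour_view))       # 3
print("blue" in colour_view)  # True
```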

We can check if a **key** is in the dictionary using `in` and `not in`:

In [40]:
print("purple" in marbles)
print("white" not in marbles)

True
True


We can also check if a **value** is in the dictionary using `in` in conjunction with the `values` method:

In [41]:
print("Smith" in surnames.values())

False


### Removing key-value pairs
   
   You can use the `del` statement to remove a key-value pair:

In [42]:
del marbles["brown"]  # delete key-value pair identified by the key: "brown"
print(marbles)

{'red': 34, 'green': 32, 'yellow': 29, 'orange': 34, 'blue': 23, 'purple': 36}
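
Dictionaries also have a `pop()` method which, like its list counterpart, removes an entry and hands back its value. An optional second argument supplies a default to return when the key is missing, which avoids a `KeyError`. A sketch with an illustrative dictionary, `jar`:

```python
jar = {"red": 34, "green": 32}

removed = jar.pop("red")       # removes the pair and returns its value
print(removed)                 # 34
print(jar)                     # {'green': 32}

missing = jar.pop("white", 0)  # key is absent, so the default is returned
print(missing)                 # 0
```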


### Nesting

Sometimes you will want to store a collection inside another collection; this is called *nesting*. You can nest dictionaries inside a list, a list of items inside a dictionary, or even a dictionary inside another dictionary:

In [43]:
## nesting a list of dictionaries

# we first create 3 dictionaries describing car properties
car_0 = {'brand': 'toyota', 'colour': 'silver'}
car_1 = {'brand': 'honda', 'colour': 'blue'}
car_2 = {'brand': 'mercedes', 'colour': 'white'}

# create a list of cars with above properties
cars = [car_0, car_1, car_2]
print(cars)


## nesting a list in a dictionary
pizza = {
    'crust':    'thick',
    'toppings': ['mushrooms', 'extra cheese'],  # list of topping ingredients
}

print(f"\nYou ordered a {pizza['crust']}-crust pizza with the following toppings: ")

# loop through list of topping and print each item
for topping in pizza['toppings']:
    print(f"\t{topping}")
    
    
## nesting a dictionary in a dictionary
users = {
    'aeinstein': {
        'first': 'albert',
        'last' : 'einstein',
        'location': 'princeton',
    },
    'mcurie': {
        'first': 'marie',
        'last' : 'curie',
        'location': 'paris',
    }
}

# loop through key-value pairs in dictionary and within each value, print every nested values: first, last and location
for username, user_info in users.items():
    print(f"\nUsername: {username}")
    full_name = f"{user_info['first']} {user_info['last']}"
    location = user_info['location']
    
    print(f"\tFull name: {full_name.title()}")
    print(f"\tLocation: {location.title()}")

[{'brand': 'toyota', 'colour': 'silver'}, {'brand': 'honda', 'colour': 'blue'}, {'brand': 'mercedes', 'colour': 'white'}]

You ordered a thick-crust pizza with the following toppings: 
	mushrooms
	extra cheese

Username: aeinstein
	Full name: Albert Einstein
	Location: Princeton

Username: mcurie
	Full name: Marie Curie
	Location: Paris


You will notice that in the examples above, I have used `for` loops to run through all the entries in a list or dictionary, performing the same task with each item. I will explain loops in more depth in the next chapter.

## N-dimensional sequences

Most of the sequences we have seen so far have been one-dimensional: each sequence is a row of elements. What if we want to use a sequence to represent a two-dimensional data structure, which has both rows and columns? The easiest way to do this is to make a sequence in which each element is also a sequence. For example, we can create a list of lists:

In [44]:
my_table = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12],
]

The outer list has four elements, and each of these elements is a list with three elements (which are numbers). To access one of these numbers, we need to use two indices – one for the outer list, and one for the inner list:

In [45]:
print(my_table[0][0])

# lists are mutable, so we can do this
my_table[0][0] = 42

print(my_table[0][0])

1
42
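
A word of caution when building a table with the `*` operator from earlier in this chapter: multiplying the outer list repeats references to the *same* inner list, so changing one row appears to change them all. The sketch below (with illustrative names) contrasts this with building each row separately:

```python
# two references to the SAME inner list
rows_shared = [[0] * 3] * 2
rows_shared[0][0] = 42
print(rows_shared)    # [[42, 0, 0], [42, 0, 0]]

# a new inner list is created for each row
rows_separate = []
for _ in range(2):
    rows_separate.append([0] * 3)
rows_separate[0][0] = 42
print(rows_separate)  # [[42, 0, 0], [0, 0, 0]]
```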


We can also make a three-dimensional sequence by making a list of lists of lists:

In [46]:
my_3d_list = [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
]

print(my_3d_list[0][0][0])

1


## Converting between collection types

### Implicit conversions

If we try to iterate over a collection in a `for` loop, Python will try to convert it into something that we can iterate over if it knows how to. For example, the dictionaries we saw above are not actually iterators, but Python knows how to make iterators from them, so we can use them in a `for` loop without having to convert them ourselves.

Sometimes the iterator we get by default may not be what we expected – if we iterate over a dictionary in a `for` loop, we will iterate over the keys. If what we actually want to do is iterate over the values, or key and value pairs, we will have to specify that ourselves by using the dictionary’s `values()` or `items()` method instead.
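
The three ways of looping over a dictionary can be sketched as follows, using a small illustrative dictionary, `marble_counts`:

```python
marble_counts = {"red": 34, "green": 30}

# iterating over the dictionary itself yields the keys
keys_seen = []
for colour in marble_counts:
    keys_seen.append(colour)
print(keys_seen)

# values() yields only the values
total = 0
for count in marble_counts.values():
    total += count
print(total)  # 64

# items() yields (key, value) pairs
for colour, count in marble_counts.items():
    print(f"{colour}: {count}")
```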

### Explicit conversions

We can convert between the different sequence types quite easily by using the type functions to cast sequences to the desired types – just like we would use `float` and `int` to convert numbers:


In [47]:
animals = ['cat', 'dog', 'goldfish', 'canary', 'cat']

animals_set = set(animals)
animals_unique_list = list(animals_set)
animals_unique_tuple = tuple(animals_unique_list)


We have to be more careful when converting a dictionary to a sequence: do we want to use the keys, the values or pairs of keys and values?

In [48]:
marbles = {"red": 34, "green": 30, "brown": 31, "yellow": 29 }

colours = list(marbles) # the keys will be used by default
counts = tuple(marbles.values()) # but we can use a view to get the values
marbles_set = set(marbles.items()) # or the key-value pairs

We can also convert a sequence to a dictionary, but only if it’s a sequence of pairs – each pair must itself be a sequence with two values:

In [49]:
# This works
dict([(1, 2), (3, 4)])

{1: 2, 3: 4}

In [50]:

# Python doesn't know how to convert this into a dictionary
dict([1, 2, 3, 4])

TypeError: cannot convert dictionary update sequence element #0 to a sequence
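
If the keys and the values live in two separate lists, the built-in `zip` function (not otherwise covered in this chapter) can pair them up so that `dict` accepts them. A minimal sketch with illustrative names:

```python
colour_names = ['red', 'green', 'brown']
colour_counts = [34, 30, 31]

# zip pairs up the items at matching positions
pairs = list(zip(colour_names, colour_counts))
print(pairs)  # [('red', 34), ('green', 30), ('brown', 31)]

marbles_from_pairs = dict(pairs)
print(marbles_from_pairs["green"])  # 30
```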

## Another look at strings
Strings are also a kind of sequence type – they are sequences of characters. We can find the length of a string or the index of a character in the string, and we can access individual elements of strings or slices:

In [51]:
s = "abracadabra"

print(len(s))
print(s.index("a"))

print(s[0])
print(s[3:5])

11
0
a
ac


The membership operator has special behaviour when applied to strings: we can use it to determine if a string contains a single character as an element, but we can also use it to check if a string contains a substring:

In [52]:
print('a' in 'abcd') # True
print('ab' in 'abcd') # also True

# this doesn't work for lists
print(['a', 'b'] in ['a', 'b', 'c', 'd']) # False

True
True
False


We can easily convert a string to a list of characters:

In [53]:
abc_list = list("abracadabra")
print(abc_list)

['a', 'b', 'r', 'a', 'c', 'a', 'd', 'a', 'b', 'r', 'a']


What if we want to convert a list of characters into a string? Using the `str` function on the list will give us a printable string of the list, including commas, quotes and brackets, which we may not want. To join a sequence of characters (or longer strings) together into a single string, we have to use `join`.

`join` is not a function or a sequence method – it’s a string method which takes a sequence of strings as a parameter. When we call a string’s `join` method, we are using that string to glue the strings in the sequence together. For example, to join a list of single characters into a string, with no spaces between them, we call the `join` method on the empty string:

In [54]:
letters = ['a', 'b', 'r', 'a', 'c', 'a', 'd', 'a', 'b', 'r', 'a']

s = "".join(letters)
print(s)

abracadabra


We can use any string we like to join a sequence of strings together:

In [55]:
animals = ('cat', 'dog', 'fish')

# a space-separated list
print(" ".join(animals))

# a comma-separated list
print(",".join(animals))

# a comma-separated list with spaces
print(", ".join(animals))

cat dog fish
cat,dog,fish
cat, dog, fish
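
Note that `join` only accepts strings: passing it a sequence of numbers raises a `TypeError`. One way around this, sketched here with illustrative names, is to convert each item with `str` first:

```python
digits = [1, 2, 3]

# build a list of strings first, because join() only accepts strings
digit_strings = []
for digit in digits:
    digit_strings.append(str(digit))

joined = ", ".join(digit_strings)
print(joined)  # 1, 2, 3
```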


The opposite of joining is *splitting*. We can split up a string into a list of strings using the `split` method. If called without any parameters, `split` divides up a string into words, using any number of consecutive whitespace characters as a delimiter. We can use additional parameters to specify a different delimiter as well as a limit on the maximum number of splits to perform:

In [56]:
print("cat    dog fish\n".split())
print("cat|dog|fish".split("|"))
print("cat, dog, fish".split(", "))
print("cat, dog, fish".split(", ", 1))

['cat', 'dog', 'fish']
['cat', 'dog', 'fish']
['cat', 'dog', 'fish']
['cat', 'dog, fish']
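
Splitting and joining with the same delimiter are inverse operations, as the sketch below illustrates (variable names are illustrative). It also shows that passing an explicit single space behaves differently from calling `split` with no arguments:

```python
line = "cat, dog, fish"

parts = line.split(", ")
rebuilt = ", ".join(parts)
print(rebuilt == line)    # True

# no argument: runs of whitespace count as one delimiter
print("a  b".split())     # ['a', 'b']

# explicit " ": every single space is a delimiter, leaving an empty string
print("a  b".split(" "))  # ['a', '', 'b']
```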
