# Data Structures

Python includes built-in variable types for a number of fundamental data structures, including lists, tuples, sets, and dictionaries (sometimes called maps). 

In this notebook we will look at creating and using variables for each of these types.

## Lists

A *list* is an ordered collection of values. These values can have different types. List definitions are enclosed within square brackets.

In [1]:
# create an empty list
mylist = []       
mylist

[]

In [3]:
# create a list of 3 integers
numbers = [12, 108, 21]
numbers

[12, 108, 21]

In [5]:
# a list containing 4 values of different types
somedata = ["text", 7, 0.34, True]
somedata

['text', 7, 0.34, True]

Entries in a list are accessed by specifying the *index* in square brackets - i.e. the position of the value in the list. 

Note: We always count from 0 in Python, so the first entry in a list has index 0.

In [9]:
values = [34, 9, 12, 34]
values[0]

34

We can count from the end of the list backwards by using negative index numbers. The index -1 is the last entry in the list, the index -2 is the second last entry, and so on.

In [11]:
values[-2]

12
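As a small sketch of how negative and positive indexes relate (the variable name `n` is introduced here for illustration only), the index -k refers to the same entry as the index `len(values) - k`:

```python
values = [34, 9, 12, 34]
n = len(values)  # the list has 4 entries

# index -1 is the same entry as index n - 1 (the last one)
print(values[-1] == values[n - 1])   # True
# index -2 is the same entry as index n - 2 (the second last one)
print(values[-2], values[n - 2])     # 12 12
```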

**Nesting**: 

Lists can also be contained within other lists, which allows the construction of hierarchical data structures.

In [13]:
child1 = [12, 108, 23]
child2 = [99, 4]
child3 = ["a","b","c"]
parent = [ child1, child2, child3 ]
print(parent)

[[12, 108, 23], [99, 4], ['a', 'b', 'c']]


Values in nested lists can be accessed using multiple indexes in square brackets:

In [15]:
parent[0][2]

23

In [17]:
parent[2][1]

'b'
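One way to read the double-index notation is as two separate steps: the first index selects an inner list, and the second index selects a value inside it. The sketch below uses a hypothetical intermediate variable `row` to make this explicit:

```python
parent = [[12, 108, 23], [99, 4], ["a", "b", "c"]]

row = parent[0]          # first index: pick the first inner list
print(row)               # [12, 108, 23]
print(row[2])            # 23, the same value as parent[0][2]

print(len(parent))       # 3 (number of inner lists)
print(len(parent[1]))    # 2 (number of values in the second inner list)
```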

**Slicing**: 

Lists can also be *sliced* to access subsets of that list. The notation is [i:j], where *i* is the start index inclusive and *j* is the end index exclusive. Remember that we always count from index 0.

In [19]:
fulllist = [9, 12, 23, 18, 21]
fulllist[0:2] # start at 1st item, end before 3rd item

[9, 12]

In [21]:
fulllist[0:3]  # first three items

[9, 12, 23]

When slicing, the default for i is 0, and the default for j is the end of the list.

In [23]:
fulllist[1:] # all items from the 2nd one onwards

[12, 23, 18, 21]

In [25]:
fulllist[:4] # start at 1st item, end before 5th item

[9, 12, 23, 18]
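A few further slicing cases may be useful to see side by side. This is only an illustrative sketch, and the variable names `everything`, `last_two` and `too_far` are made up for this example:

```python
fulllist = [9, 12, 23, 18, 21]

everything = fulllist[:]    # both i and j omitted: a copy of the whole list
last_two = fulllist[-2:]    # a negative start index: the final two items
too_far = fulllist[3:100]   # an end index past the end of the list is not an error

print(everything)  # [9, 12, 23, 18, 21]
print(last_two)    # [18, 21]
print(too_far)     # [18, 21]
```

Unlike indexing a single entry, slicing beyond the end of a list does not raise an error. The slice simply stops at the last item.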

**Modifying lists**: 

Entries in a list can be changed after the list is created by specifying the index and performing assignment.

In [29]:
values = [34, 9, 12, 34]
values[2] = 5000
print(values)
values[5]

[34, 9, 5000, 34]


IndexError: list index out of range

If we try to access or assign a value at an index that is beyond the end of the list, we will get an error message. Instead, we can add a value to the end of a list using the *append()* function:

In [33]:
values.append("extra")
values.append("extra")  # append a second time, so that index 5 now exists
print(values)
values[5]

[34, 9, 5000, 34, 'extra', 'extra']


'extra'
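The difference between assigning past the end of a list and appending can be sketched as follows. The list `scores` is a hypothetical example, not one of the variables used above:

```python
scores = [10, 20, 30]

try:
    scores[3] = 40          # index 3 does not exist yet
except IndexError as err:
    print("Error:", err)    # Error: list assignment index out of range

scores.append(40)           # append() grows the list by one entry
print(scores)               # [10, 20, 30, 40]
print(scores[3])            # 40
```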

We can also concatenate two or more lists together using the plus + operator:

In [35]:
values + [11, 27]

[34, 9, 5000, 34, 'extra', 'extra', 11, 27]

In [37]:
["A","B"] + ["Y","Z"]

['A', 'B', 'Y', 'Z']
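Note that the + operator builds a new list and leaves both of the original lists unchanged, so the result needs to be assigned to a variable if we want to keep it. A minimal sketch, using the made-up names `first`, `second` and `combined`:

```python
first = ["A", "B"]
second = ["Y", "Z"]

combined = first + second   # a new list containing the entries of both
print(combined)             # ['A', 'B', 'Y', 'Z']
print(first)                # ['A', 'B'] (unchanged)
print(second)               # ['Y', 'Z'] (unchanged)
```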

We can insert a new value at a particular location in a list by using its associated *insert()* function. We specify the position and the value to insert as arguments. All the entries after that position are shifted to the right.

In [39]:
values.insert(2, 88)
values

[34, 9, 88, 5000, 34, 'extra', 'extra']

**Checking lists**: 

The *in* membership operator can be used to test if a value is contained in a list. The result is a boolean value.

In [43]:
mylist = [3,6,9,12]

In [45]:
3 in mylist

True

In [47]:
27 in mylist

False

The *not in* membership operator can be used to test if a value is missing from a list.

In [49]:
27 not in mylist

True

**Related functions:** 

A variety of built-in functions can be used with lists.

For instance, we can check the length of a list using the built-in *len()* function:

In [59]:
print(len(values))
values[6]

7


'extra'

We can sort the items in a list by calling the *sort()* function on the list. Note that this sorts the list "in place" - i.e. the list itself is modified, rather than copied.

In [61]:
letters = ["b", "d", "a", "c"]
letters.sort()
print(letters)

['a', 'b', 'c', 'd']


We can also use the Python *sorted()* function, which returns a new sorted list, leaving the original list unchanged:

In [63]:
grades = ["B", "A", "C", "C", "A", "E"]
sorted(grades)

['A', 'A', 'B', 'C', 'C', 'E']
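The contrast between the two approaches can be checked directly. In this sketch the names `ordered` and `result` are introduced for illustration. Note that *sort()* returns `None`, so its result should not be assigned and used as a list:

```python
grades = ["B", "A", "C", "C", "A", "E"]

ordered = sorted(grades)   # a new list; grades is left untouched
print(ordered)             # ['A', 'A', 'B', 'C', 'C', 'E']
print(grades)              # ['B', 'A', 'C', 'C', 'A', 'E']

result = grades.sort()     # sorts grades in place and returns None
print(result)              # None
print(grades)              # ['A', 'A', 'B', 'C', 'C', 'E']
```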

## Tuples

*Tuples* are like lists but are "immutable". This means that they cannot be modified after creation. 

Tuples are created using parenthesis notation, with values separated by commas.

In [65]:
suits = ("hearts", "diamonds", "spades", "clubs")
suits

('hearts', 'diamonds', 'spades', 'clubs')

Values in tuples are also accessed using the same square bracket index notation that we saw for lists.

In [67]:
suits[0]

'hearts'

In [69]:
suits[-1]

'clubs'

Like lists, tuples can contain values of different types.

In [71]:
t = (123, True, "UCD", 123.23)
t

(123, True, 'UCD', 123.23)

However, unlike lists, we cannot modify the tuple once it has been defined. If we try to assign a new value to an index in the tuple, we will get an error message.

In [73]:
t[3] = 3435

TypeError: 'tuple' object does not support item assignment
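If a changed version of a tuple is needed, one possible approach is to build a new tuple from the old one, leaving the original as it was. The sketch below is only one way to do this, and the name `t2` is made up for the example:

```python
t = (123, True, "UCD", 123.23)

try:
    t[3] = 3435
except TypeError as err:
    print("Error:", err)   # Error: 'tuple' object does not support item assignment

# combine the first three values with a one-element tuple (note the trailing comma)
t2 = t[:3] + (3435,)
print(t2)                  # (123, True, 'UCD', 3435)
print(t)                   # (123, True, 'UCD', 123.23)
```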

## Sets

*Sets* are unordered collections which contain no duplicate values. Because sets do not have an order, we cannot index into them by position.

We can create a new set using curly brackets notation:

In [75]:
countries = {"Ireland", "Spain", "Italy", "Croatia"}
countries

{'Croatia', 'Ireland', 'Italy', 'Spain'}

In [93]:
# a set with 4 different types
mix = {"UCD", 2000, True, 15.6}
print(mix)

{'UCD', True, 15.6, 2000}


To make a set without any elements, we call the *set()* function without any arguments. We cannot use empty curly brackets for this, since {} creates an empty dictionary:

In [97]:
elements = set()
print(elements, type(set()))

set() <class 'set'>


We can also create sets from lists, strings or any other iterable value, using the *set()* function:

In [103]:
mylist = [1, 3, 1, 4, 3, 6, 8, 1, 4, 4]
print(mylist)
set(mylist)

[1, 3, 1, 4, 3, 6, 8, 1, 4, 4]


{1, 3, 4, 6, 8}

Note that only the unique values from the original list are retained:

In [125]:
winners = ["Brazil", "Germany", "Argentina", "Italy", "Argentina", "Germany", "Brazil", "France", "Brazil", "Italy"]
print(type(winners))
set(winners)

<class 'list'>
{'Brazil', 'Argentina', 'France', 'Germany', 'Italy'}
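A common use of this behaviour is counting or listing the distinct values in a list. As a sketch (the name `unique_winners` is introduced here), we can compare the lengths before and after conversion, and use *sorted()* when a predictable order is wanted, since a set itself has no order:

```python
winners = ["Brazil", "Germany", "Argentina", "Italy", "Argentina",
           "Germany", "Brazil", "France", "Brazil", "Italy"]

unique_winners = set(winners)
print(len(winners))             # 10 entries in the list
print(len(unique_winners))      # 5 distinct values
print(sorted(unique_winners))   # ['Argentina', 'Brazil', 'France', 'Germany', 'Italy']
```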


The 'in' membership operator also works on sets:

In [107]:
names = {"Bill", "Lisa", "Ted"}

In [109]:
'Bill' in names

True

In [113]:
'Sharon' not in names

True

**Modifying sets:** 

To add a single value to an existing set, we call its associated *add()* function:

In [117]:
names.add("Catherine")
names

{'Bill', 'Catherine', 'Lisa', 'Ted'}

Note that sets do not allow duplicates, so adding the same value multiple times has no effect:

In [119]:
names.add("Olivia")
names.add("Olivia")
names.add("Olivia")
names

{'Bill', 'Catherine', 'Lisa', 'Olivia', 'Ted'}

We can add multiple values to an existing set using its *update()* function. This function can take tuples, lists, strings or other sets as its argument.

In [127]:
names.update(["Bob", "Alice", "John"])
names

{'Alice', 'Bill', 'Bob', 'Catherine', 'John', 'Lisa', 'Olivia', 'Ted'}
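One detail worth checking is what happens when a string is passed: *add()* treats it as a single value, while *update()* iterates over it and adds each character separately. A short sketch with a hypothetical set named `letters`:

```python
letters = set()

letters.add("abc")       # adds the whole string as one value
print(letters)           # {'abc'}

letters.update("xyz")    # a string is iterated character by character
print(sorted(letters))   # ['abc', 'x', 'y', 'z']
```

To add a whole string with *update()*, it can be wrapped in a list first, e.g. `letters.update(["xyz"])`.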

**Comparing sets:** 

We can calculate unions, intersections and differences between pairs of sets.

In [3]:
x = {1, 2, 3, 4}
y = {3, 4, 5, 6}

In [5]:
# which values are in both x and y?
print(x.intersection(y))

{3, 4}


In [133]:
# which values are in either x or y, or both?
x.union(y)

{1, 2, 3, 4, 5, 6}

In [135]:
# which values are in x but not in y?
x.difference(y)

{1, 2}

In [137]:
# which values are in y but not in x?
y.difference(x)    

{5, 6}
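Python also provides operators that are equivalent to these functions when both operands are sets. The sketch below simply checks that each operator gives the same result as the corresponding function call:

```python
x = {1, 2, 3, 4}
y = {3, 4, 5, 6}

print((x & y) == x.intersection(y))   # True: & is intersection
print((x | y) == x.union(y))          # True: | is union
print((x - y) == x.difference(y))     # True: - is difference
print((y - x) == y.difference(x))     # True: the order of the operands matters
```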

We can convert a *set* to a *list* by calling the built-in *list()* function:

In [139]:
list(x)

[1, 2, 3, 4]

## Dictionaries

A *dictionary* (sometimes called a *map*) is a data structure consisting of *(key:value)* pairs. Each *key* is linked to a *value*, and keys are unique. 

Dictionaries can be created using curly bracket notation, and can either be initially empty or populated with one or more pairs.

In [141]:
# create an empty dictionary
d0 = {}
d0

{}

In [143]:
# create a dictionary containing two pairs 
d1 = {"Ireland":"Dublin", "France":"Paris"}
d1

{'Ireland': 'Dublin', 'France': 'Paris'}

In [145]:
# create a dictionary containing three pairs 
d2 = {"age":22, "name":"alice", 100:False}
d2

{'age': 22, 'name': 'alice', 100: False}

We can check the number of key-value pairs in a dictionary using the built-in *len()* function:

In [147]:
len(d2)

3

Note that the types of keys and values in a dictionary can be mixed:

In [149]:
mixedmap = {1:"ucd", 0.8:False, "b":10, "c":"d"}
mixedmap

{1: 'ucd', 0.8: False, 'b': 10, 'c': 'd'}

We can access a value in a dictionary by using the square bracket notation and specifying the corresponding key:

In [151]:
d1["Ireland"]

'Dublin'

In [153]:
d2["name"]

'alice'

In [155]:
mixedmap[1]

'ucd'

If we try to access a value for a non-existent key in a dictionary, we will get an error message:

In [157]:
d1["Sweden"]

KeyError: 'Sweden'

To avoid this type of error, check for the presence of a key in a dictionary using the **in** operator:

In [159]:
"Ireland" in d1

True

In [163]:
"Sweden" not in d1

True
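In practice, the membership test is usually combined with a lookup. The sketch below shows one way to do this with an *if* statement, followed by the dictionary's *get()* function, which returns `None` (or a default that we supply) instead of raising an error. The variable `country` is introduced for this example:

```python
d1 = {"Ireland": "Dublin", "France": "Paris"}

country = "Sweden"
if country in d1:
    print(d1[country])
else:
    print("No capital stored for", country)   # this branch runs

print(d1.get("Sweden"))              # None
print(d1.get("Sweden", "unknown"))   # unknown
print(d1.get("Ireland"))             # Dublin
```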

We can easily add new values to a dictionary using square bracket notation and assignment. If a value does not already exist for a given key, it will be added.

In [189]:
d1["Germany"] = "Berlin"
d1

{'Ireland': 'Dublin', 'France': 'Paris', 'Germany': 'Berlin'}

If a value for the key exists, the previous value will be over-written.

In [167]:
d1["Ireland"] = "Cork"
d1

{'Ireland': 'Cork', 'France': 'Paris', 'Germany': 'Berlin'}

Dictionaries have various associated functions to access the keys and/or values.

In [169]:
# get only the keys from a dictionary
d2.keys()

dict_keys(['age', 'name', 100])

In [171]:
# get only the values from a dictionary
d2.values()

dict_values([22, 'alice', False])

In [185]:
# get all (key:value) pairs as tuples
d2.items()

dict_items([('age', 22), ('name', 'alice'), (100, False)])
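These functions are most often used in loops. As a minimal sketch, each entry returned by *items()* is a (key, value) tuple, which can be unpacked into two loop variables. The names `key`, `value` and `pairs` are chosen here for illustration:

```python
d2 = {"age": 22, "name": "alice", 100: False}

# visit every pair in the order the keys were inserted
for key, value in d2.items():
    print(key, "->", value)

# the pairs can also be copied into a list of tuples
pairs = list(d2.items())
print(pairs[0])   # ('age', 22)
```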

In [183]:
d2

{'age': 22, 'name': 'alice', 100: False}