# Further data types

## Tuples

Tuples are written with round brackets and commas. They are like frozen lists: they are 'immutable', meaning they cannot be changed after creation. They are less flexible than lists but slightly lighter-weight. Use a tuple instead of a list when the contents won't change and you want to safeguard against accidental modification.

In [1]:
protocols = ['http', 'https', 'ftp', 'snmp', 'smtp']   # list: mutable
type(protocols)

list

Here is a ``tuple``, created with round brackets (parentheses) instead of square brackets:

In [2]:
days_of_week = ('M', 'Tu', 'W', 'Th', 'F')   # tuple: immutable
type(days_of_week)

tuple

In [3]:
protocols.append('telnet')   # okay: lists are mutable

In [4]:
days_of_week.append('Sat')  # doesn't work: tuples can't be modified

AttributeError: 'tuple' object has no attribute 'append'

In [5]:
protocols[0] = 'http'  # okay: lists are mutable

In [6]:
days_of_week[0] = 'Mon'     # doesn't work: tuples can't be modified

TypeError: 'tuple' object does not support item assignment
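
Note that this immutability is shallow: the tuple cannot be made to hold different objects, but a mutable object stored inside it can still be modified. A minimal sketch (the name ``record`` and its contents are made up for illustration):

```python
# A tuple holding a string and a list (hypothetical example data)
record = ('http', [80, 8080])

record[1].append(8000)   # okay: the list inside the tuple is still mutable
print(record)            # ('http', [80, 8080, 8000])

try:
    record[1] = [443]    # doesn't work: the tuple itself can't be changed
except TypeError as err:
    print('TypeError:', err)
```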

Here is another example: converting a sequence such as a ``list`` to a ``tuple``.

In [7]:
protocols2 = tuple(protocols)
protocols2

('http', 'https', 'ftp', 'snmp', 'smtp', 'telnet')

In [8]:
type(protocols2)

tuple

Why use a tuple? If you know you won't need to modify your data, then tuples can prevent you from accidentally modifying it. They are also marginally lighter-weight and faster than lists. Most importantly, you can use tuples as keys in dictionaries (provided their elements are themselves hashable), whereas you cannot use lists.
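
One rough way to see the "lighter-weight" claim is to compare object sizes with ``sys.getsizeof()``. This is only a sketch: the numbers are specific to CPython, vary between versions and platforms, and count only the container itself, not the strings inside it. The names ``protocols_list`` and ``protocols_tuple`` are made up for illustration.

```python
import sys

protocols_list = ['http', 'https', 'ftp', 'snmp', 'smtp']
protocols_tuple = tuple(protocols_list)

list_size = sys.getsizeof(protocols_list)
tuple_size = sys.getsizeof(protocols_tuple)

# Exact byte counts depend on the Python version and platform
print('list: ', list_size, 'bytes')
print('tuple:', tuple_size, 'bytes')
```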

A tuple of one element has a strange-looking syntax:

In [9]:
mytuple = (1,)

Without the comma, ``(1)`` is the same as an integer ``1``:

In [10]:
x = (1)
print(x)

1


Tuples are actually defined by the commas rather than the parentheses, which are optional in most contexts (the empty tuple ``()`` is the one case with no comma).

In [11]:
another_tuple = 1,
print(another_tuple)
print(type(another_tuple))

(1,)
<class 'tuple'>


Tuples are useful in assigning multiple values at once and returning multiple values from a function. Here are some examples:

#### Example 1: "tuple unpacking" for multi-variable assignment

In [12]:
a, b = 1, 2
print(a)
print(b)

1
2
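
Unpacking also works when the number of items is not known in advance: a starred name collects whatever is left over. A brief sketch, with made-up data:

```python
first, *rest = ['http', 'https', 'ftp', 'snmp']

print(first)   # http
print(rest)    # ['https', 'ftp', 'snmp']  (the starred name always gets a list)
```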


#### Example 2: the easiest way to swap the values of two variables:

In [13]:
b, a = a, b
print(a); print(b)

2
1


#### Example 3: a function that returns a tuple

Suppose you owe someone $1.65 and you have only 50c pieces to give them. How many 50c pieces do you give them, and how much is left over?

In [14]:
# The built-in ``divmod()`` function returns a tuple: (quotient, remainder)
divmod(165, 50)

(3, 15)

You might use it like this:

In [15]:
num_fifty_cent_coins, leftover_cents = divmod(165, 50)
print(num_fifty_cent_coins)

3
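
Your own functions can return several values in the same way. A minimal sketch, where ``min_and_max()`` is a hypothetical helper written for this illustration (it is not a built-in):

```python
def min_and_max(values):
    """Return the smallest and largest items as a tuple (smallest, largest)."""
    return min(values), max(values)   # the comma builds the tuple

lowest, highest = min_and_max([176, 165, 183])
print(lowest, highest)   # 165 183
```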


#### Exercise

There is a function ``mkstemp()`` in the standard library's ``tempfile`` module which creates a new temporary file securely. It returns a tuple: ``(file_descriptor, name)``. Try using it as follows:

In [16]:
from tempfile import mkstemp
(fd, name) = mkstemp()

**Task:** examine the name of your new temporary file.
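
Note that ``mkstemp()`` leaves the file open and does not delete it; the caller is responsible for both. One possible way to tidy up, sketched with ``os.close()`` and ``os.remove()`` from the standard library:

```python
import os
from tempfile import mkstemp

fd, name = mkstemp()
try:
    os.write(fd, b'some temporary data')   # fd is a low-level OS file descriptor
finally:
    os.close(fd)       # close the descriptor ...
    os.remove(name)    # ... and delete the file when finished

print(os.path.exists(name))   # False
```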

### Using tuples as keys in dictionaries

Keys of dictionaries don't have to be strings. They can be most kinds of objects (anything 'hashable'). This includes a tuple of strings, which can be quite useful.

In [28]:
hash(tuple())

3527539

In [29]:
hash(list())

TypeError: unhashable type: 'list'
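
A tuple is only hashable if everything inside it is hashable too. The sketch below checks this with a small helper, ``is_hashable()``, which is a hypothetical function written for this illustration:

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable(('USD', 'AUD')))            # True: a tuple of strings
print(is_hashable(('USD', ['AUD', 'JPY'])))   # False: the tuple contains a list
```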

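Here are two dictionaries whose keys are length-2 tuples. In the first, the keys are (first name, surname) pairs and the values are heights. In the second, the keys are pairs of currencies and the values are the corresponding exchange rates.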

In [30]:
heights3 = {('Fred', 'Flintstone'): 176,
           ('Wilma', 'Flintstone'): 165,
           ('Fred', 'Hollows'): 183}

In [31]:
exchange_rates = {('USD', 'AUD'): 1.2152,
                  ('EUR', 'JPY'): 108.2322,
                  # ...
                 }

list(exchange_rates.keys())

[('USD', 'AUD'), ('EUR', 'JPY')]

Each key is a tuple (in round brackets, as described above). We can 'unpack' keys while looping like this:

In [32]:
# Method 1: looping through keys
for currency_pair in exchange_rates:    # equivalent to "for ... in exchange_rates.keys():"
    (cur1, cur2) = currency_pair        # tuple unpacking
    # do stuff ...
    # e.g.
    print(cur1, ':', cur2, '=', exchange_rates[currency_pair])

USD : AUD = 1.2152
EUR : JPY = 108.2322


In [33]:
# Method 2: in-place 'tuple unpacking'
for (cur1, cur2) in exchange_rates:
    # do stuff ...
    # e.g.
    print(cur1, ':', cur2, '=', exchange_rates[cur1, cur2])

USD : AUD = 1.2152
EUR : JPY = 108.2322
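
A third variant, sketched here as an alternative, loops over ``.items()`` so that the key is unpacked and the value fetched in a single step, with no second dictionary lookup:

```python
exchange_rates = {('USD', 'AUD'): 1.2152,
                  ('EUR', 'JPY'): 108.2322}

# Method 3: unpack the key tuple and the value together
for (cur1, cur2), rate in exchange_rates.items():
    print(cur1, ':', cur2, '=', rate)
```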


With an automated system (or a lot more typing) we could create a dictionary mapping all pairs of currencies in this way.
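
As a small step in that direction, the reverse pairs can be derived from the existing entries. This is only a sketch: it assumes the reverse rate is simply the reciprocal, which ignores real-world buy/sell spreads, and the name ``all_rates`` is made up for illustration.

```python
exchange_rates = {('USD', 'AUD'): 1.2152,
                  ('EUR', 'JPY'): 108.2322}

all_rates = dict(exchange_rates)          # start with a copy of the known rates
for (cur1, cur2), rate in exchange_rates.items():
    all_rates[cur2, cur1] = 1 / rate      # assumption: reverse rate = reciprocal

print(sorted(all_rates))
# [('AUD', 'USD'), ('EUR', 'JPY'), ('JPY', 'EUR'), ('USD', 'AUD')]
```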

## Default Dictionaries and Counting

Dictionaries are useful for storing tallies or counts. One example is the number of occurrences of some event or statistic or message in a log file.

Here is a simple example of using a dictionary for counting manually:

In [35]:
event_counts = {}    # empty dictionary

# Count how often 'error' occurs in some log file:

# Have we already seen any error events?
if 'error' in event_counts:
    # Increment the error count:
    event_counts['error'] += 1
else:
    # First error we've seen. Set the count to 1.
    event_counts['error'] = 1

# ...
# and so on for all lines in a log file.
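
The if/else above can also be collapsed with the dictionary's ``.get()`` method, which returns a fallback value when the key is missing. A short sketch, using a made-up list of events:

```python
event_counts = {}

for event in ['error', 'warning', 'error']:   # made-up events for illustration
    # .get() returns 0 if the key is missing, so no if/else is needed
    event_counts[event] = event_counts.get(event, 0) + 1

print(event_counts)   # {'error': 2, 'warning': 1}
```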

We can streamline counting using a "defaultdict" -- a kind of dictionary where new or unknown keys are associated with a certain value by default. Here, we will assume the counts of new events start at zero.

In [36]:
from collections import defaultdict
event_counts = defaultdict(int) # the values (event counts) will be
                                # integers starting at zero by default

event_counts['error'] += 1
event_counts['warning'] += 1
event_counts['info message'] += 1
# etc.

In [37]:
# Now print out the event counts:
for (event, count) in event_counts.items():
    print('Event "{}" occurred {} times'.format(event, count))

Event "error" occurred 1 times
Event "warning" occurred 1 times
Event "info message" occurred 1 times
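
The same pattern extends to a loop over many lines. A minimal sketch, assuming each log line starts with the event name followed by a colon (the format and the sample lines are invented for illustration):

```python
from collections import defaultdict

# Hypothetical log lines; a real script would read these from a file
log_lines = [
    'error: disk full',
    'info message: backup started',
    'error: disk full',
    'warning: high load',
]

event_counts = defaultdict(int)
for line in log_lines:
    event = line.split(':')[0]    # the text before the first colon
    event_counts[event] += 1

print(dict(event_counts))
# {'error': 2, 'info message': 1, 'warning': 1}
```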


The exercise below applies the same idea to the words in a text file.

#### Exercise: word counts

Count the number of times each word occurs in ``/data/alice_in_wonderland.txt``.

**Tip:** Use ``counts = defaultdict(int)`` to initialize the dictionary.

## Sets

A set is an unordered collection of unique values.

Unordered...

In [44]:
{1,2,3} == {3,2,1}

True

Unique values...

In [45]:
{1,2,3} == {1,1,2,2,3,3}

True
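
Because of this, converting a list to a set is a quick way to remove duplicates and to test membership. A small sketch with a made-up list:

```python
protocols = ['http', 'ftp', 'http', 'smtp', 'ftp']   # made-up list with repeats

unique_protocols = set(protocols)
print(len(protocols), len(unique_protocols))   # 5 3
print(sorted(unique_protocols))                # ['ftp', 'http', 'smtp']
print('ftp' in unique_protocols)               # True
```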

Intersection of two sets...

In [46]:
{1,2,3} & {2,3,4}

{2, 3}

Union of two sets...

In [47]:
{1,2,3} | {2,3,4}

{1, 2, 3, 4}

Difference between two sets...

In [48]:
{1,2,3} - {2,3,4}   # items in the first set but not the second

{1}
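
Two further operations round out the picture: the symmetric difference (``^``) and the subset test (``<=``). Each operator also has an equivalent method form. A brief sketch:

```python
a = {1, 2, 3}
b = {2, 3, 4}

print(a ^ b)                        # {1, 4}: items in exactly one of the two sets
print({2, 3} <= a)                  # True: {2, 3} is a subset of a
print(a.intersection(b) == a & b)   # True: the method form gives the same result
```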

A "set comprehension" is a natural extension of a list or dictionary comprehension:

In [49]:
{x*x for x in [1,1,2,3]}

{1, 4, 9}
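
Set comprehensions are handy for normalising data before comparing it. A sketch that collects the distinct words in a list while ignoring case (the sample words are made up):

```python
words = ['Tuple', 'list', 'tuple', 'Set', 'set', 'LIST']   # made-up sample data

unique_lower = {word.lower() for word in words}
print(sorted(unique_lower))   # ['list', 'set', 'tuple']
```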

### Exercise: Set intersection

Use a set intersection to find all of the OECD countries in Europe. The file ``oecd_stats.csv`` contains statistics about the OECD countries, while ``Countries-Continents.csv`` contains the continent for each country.

In [None]:
# See solutions/oecd_europe.py