# Introduction to Tuples and Dictionaries

This is part of a tutorial on Python's basic objects.

Authors: ['Arthur de Fluiter', 'Cheng-yu Lam']

This particular document deals with Python's `tuple`, `namedtuple`, `dict`, `defaultdict` and `Counter`.

**Note: it might be good to read through the Lists and Sets tutorial first.**

# Tuples

Python has a data structure for simple data points, such as coordinates, called a `tuple`.

In Python, tuples are:
 - *immutable* (static) - tuples cannot be altered after creation.
 - *ordered* - elements keep the order in which they were given.
 - *not type-specific* - each element can be of a different type.

They are typically written as a comma-separated list in parentheses:

In [1]:
simple_tuple = (1,2)          # tuple with 2 elements
large_tuple  = (1,2,3,4,5,6)  # tuple with 6 elements
single_tuple = (1,)           # tuple with 1 element
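Strictly speaking, it is the comma, not the parentheses, that makes a tuple, which is why the 1-element tuple above needs a trailing comma. A small sketch (the variable names are made up for illustration):

```python
not_a_tuple = (1)      # just the integer 1 in parentheses
one_tuple   = (1,)     # the trailing comma makes it a 1-element tuple
no_parens   = 1, 2     # parentheses are optional here: this is also a tuple
empty_tuple = ()       # the empty tuple is the one case without a comma

print(type(not_a_tuple), type(one_tuple), type(no_parens), len(empty_tuple))
# <class 'int'> <class 'tuple'> <class 'tuple'> 0
```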

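Like lists and sets, tuples can also be built from a comprehension-style expression. There is no dedicated tuple comprehension: you pass a generator expression to `tuple()`, since `(i+1 for i in range(6))` on its own is a generator, not a tuple: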

In [2]:
large_tuple = tuple(i+1 for i in range(6))

**Note:** as mentioned above, you cannot change the contents of a tuple after creation.
Because of this, there is no way to add or append anything.
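A minimal sketch of what that means in practice: assigning to an element raises a `TypeError`, and "adding" to a tuple really builds a new tuple (`point` and `bigger` are illustrative names):

```python
point = (1, 2)

assignment_failed = False
try:
    point[0] = 10                 # tuples do not support item assignment...
except TypeError as e:            # ...so Python raises a TypeError
    assignment_failed = True
    print("TypeError:", e)

# concatenation creates a brand-new tuple; `point` itself is unchanged
bigger = point + (3,)
print(point, bigger)              # (1, 2) (1, 2, 3)
```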

They can, however, be read with indices and slices, just like lists.

Example:

In [3]:
print(large_tuple)
print("index   3   = ", large_tuple[3])
print("slice   3:5 = ", large_tuple[3:5])   # indices 3 and 4: the stop index 5 is excluded

(1, 2, 3, 4, 5, 6)
index   3   =  4
slice   3:5 =  (4, 5)
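Indices start at 0, which is why index 3 holds the value `4`, and a slice `[start:stop]` includes `start` but stops just before `stop`. Negative indices count from the end, and an optional third slice value gives a step. A quick sketch, rebuilding the same `large_tuple`:

```python
large_tuple = tuple(i + 1 for i in range(6))   # (1, 2, 3, 4, 5, 6)

print(large_tuple[-1])     # last element: 6
print(large_tuple[-2:])    # last two elements: (5, 6)
print(large_tuple[::2])    # every second element: (1, 3, 5)
print(large_tuple[::-1])   # reversed copy: (6, 5, 4, 3, 2, 1)
```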


Tuples are a large part of what makes Python code so readable, enabling things like nicer for-loops, sorting and string formatting.

We'll go over some of the things they enable, starting with packing/unpacking, especially in for loops.

Unpacking:

In [4]:
a, b, c, d  = (1,2,3,4) # a, b, c and d are assigned the 1st through 4th elements of the tuple
print("a =", a, "b =", b, "c =", c, "d =", d)

# imagine the following conditions
# - the tuple has at least 2 elements to unpack (with exactly 2, b ends up empty)
# - the first and last elements must end up in a and c
# - the middle one(s) must be captured in b
a, *b, c = (1,2,3,4,5,6)  # Note: only one starred name is allowed per unpacking
print(b)                  # b is a list, not a tuple

a = 1 b = 2 c = 3 d = 4
[2, 3, 4, 5]
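The opposite of unpacking is *packing*: writing several comma-separated values where one value is expected packs them into a tuple. Combined, the two give Python's idiomatic variable swap. A sketch with made-up values:

```python
packed = 1, 2, 3            # packing: the right-hand side becomes the tuple (1, 2, 3)
print(packed)               # (1, 2, 3)

x, y = 10, 20
x, y = y, x                 # pack (y, x) into a tuple, then unpack it into x and y
print(x, y)                 # 20 10

first, *rest = packed       # a starred target always collects a list...
head, *middle, tail = (1, 2)
print(rest, middle)         # [2, 3] []   ...which may be empty
```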


`for` loops can unpack tuples in the same way, as we will see in a bit.

First we'll create a list of tuples to iterate over: each tuple holds a letter and its index in the alphabet.

`'a'` has index 0, `'b'` has index 1, and so on.

We'll store them as `(index, letter)` tuples:

In [5]:
letters = [(0, 'a'), (1, 'b'), (2, 'c')]
print(letters)

[(0, 'a'), (1, 'b'), (2, 'c')]


In [6]:
# The normal way lists work: we loop over each element.
# Since they're all tuples, t is also a tuple
for t in letters:
    print(t, t[0], t[1])

(0, 'a') 0 a
(1, 'b') 1 b
(2, 'c') 2 c


In [7]:
# Now we can use the same unpacking
# the same tuple is now unpacked into index and l
for index,l in letters:
    print((index,l), index, l)

(0, 'a') 0 a
(1, 'b') 1 b
(2, 'c') 2 c
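Unpacking also works on nested structures, as long as the pattern on the left matches the shape of each element. A sketch using a hypothetical list of `(name, (x, y))` pairs:

```python
points = [('a', (0, 0)), ('b', (1, 2)), ('c', (3, 5))]   # hypothetical example data

for name, (x, y) in points:          # the inner parentheses unpack the coordinate pair
    print(name, x + y)               # prints "a 0", then "b 3", then "c 8"

# the same pattern works inside a list comprehension
sums = [x + y for name, (x, y) in points]
print(sums)                          # [0, 3, 8]
```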


Unpacking can also do one other thing when calling functions:
it can fill in a function's arguments for us.

The syntax for this is `function_name(*some_tuple)` or `function_name(*some_list)`; each element becomes a separate positional argument.

Example below:

In [8]:
print(1,2,3)

# same as
args = (1,2,3)
print(*args)

1 2 3
1 2 3
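Argument unpacking is not limited to `print`: it works for any function and any iterable, as long as the number of values matches the number of parameters. A sketch with a hypothetical `rectangle_area` helper:

```python
def rectangle_area(width, height):      # hypothetical helper, for illustration only
    return width * height

size_tuple = (3, 4)
size_list  = [5, 6]

print(rectangle_area(*size_tuple))      # same as rectangle_area(3, 4) -> 12
print(rectangle_area(*size_list))       # lists work too: rectangle_area(5, 6) -> 30

try:
    rectangle_area(*(1, 2, 3))          # three values for two parameters
except TypeError as e:
    print("TypeError:", e)
```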


### `enumerate`

Earlier we created `letters`, the list of indices and letters, by hand.

We don't want to type this out manually for every letter, but we also don't want ugly code.

This is where `enumerate` comes in. It takes something to loop over and, for each element, produces a tuple of the element's index and the element itself.

In the case below it will be `(0, 'a'), ..., (25, 'z')`

In [9]:
# some helper code: prints the (global) `letters` list, 13 pairs per line
alphabet = "abcdefghijklmnopqrstuvwxyz"

def print_letters():
    print("printing letters")
    for index, l in letters:
        print("(%2d,%s)" % (index, l), end="\n" if ((index + 1) % 13) == 0 else " ")
    print("\n")
# ---------------------------
# actual code

# Naive code
i       = 0
letters = []
for letter in alphabet:
    letters.append((i, letter))
    i += 1
print_letters()
    
    
# Better code, with enumerate
letters = []
for index, letter in enumerate(alphabet):
    letters.append((index, letter))
print_letters()
    
# Best code, list comprehension and enumerate
letters = [ (index, letter) for index, letter in enumerate(alphabet)]
print_letters()

printing letters
( 0,a) ( 1,b) ( 2,c) ( 3,d) ( 4,e) ( 5,f) ( 6,g) ( 7,h) ( 8,i) ( 9,j) (10,k) (11,l) (12,m)
(13,n) (14,o) (15,p) (16,q) (17,r) (18,s) (19,t) (20,u) (21,v) (22,w) (23,x) (24,y) (25,z)


printing letters
( 0,a) ( 1,b) ( 2,c) ( 3,d) ( 4,e) ( 5,f) ( 6,g) ( 7,h) ( 8,i) ( 9,j) (10,k) (11,l) (12,m)
(13,n) (14,o) (15,p) (16,q) (17,r) (18,s) (19,t) (20,u) (21,v) (22,w) (23,x) (24,y) (25,z)


printing letters
( 0,a) ( 1,b) ( 2,c) ( 3,d) ( 4,e) ( 5,f) ( 6,g) ( 7,h) ( 8,i) ( 9,j) (10,k) (11,l) (12,m)
(13,n) (14,o) (15,p) (16,q) (17,r) (18,s) (19,t) (20,u) (21,v) (22,w) (23,x) (24,y) (25,z)
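`enumerate` also takes an optional `start` argument, handy when you want to count from 1 instead of 0. And since it already yields `(index, element)` tuples, `list(enumerate(...))` builds the same list as the comprehension above. A short sketch:

```python
alphabet = "abcdefghijklmnopqrstuvwxyz"

print(list(enumerate("abc")))               # [(0, 'a'), (1, 'b'), (2, 'c')]
print(list(enumerate("abc", start=1)))      # [(1, 'a'), (2, 'b'), (3, 'c')]

# list(enumerate(...)) gives the same list as the comprehension version
letters = list(enumerate(alphabet))
print(letters == [(index, letter) for index, letter in enumerate(alphabet)])   # True
```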




### `zip`

`zip` allows you to "transpose" lists, tuples and other iterables, like you would a matrix.

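That is (in Python 3, `zip` returns a lazy iterator, so `list()` is used here to show its contents):

    list(zip([1,2,3,4], ['a', 'b', 'c', 'd']))          gives [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')] and
    list(zip((1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')))   gives [(1, 2, 3, 4), ('a', 'b', 'c', 'd')]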

Or viewed as a table

  | col1 | col2 | col3
---- | ---- | ---- | ---- 
row1 |    2 |    3 |    4
row2 |    5 |    6 |    7

Becomes

   | row1 | row2
---- | ---- | ----
col1 |    2 |    5
col2 |    3 |    6
col3 |    4 |    7

Zipping the transposed table again turns it back into the original.

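Example: getting the indices and the individual letters back out of `letters`, using `zip(*letters)` to pass each `(index, letter)` tuple as a separate argument: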

In [10]:
print_letters()

print("becomes\n")

indices, indiv_letters = zip(*letters)
print(indices)
print(indiv_letters)

printing letters
( 0,a) ( 1,b) ( 2,c) ( 3,d) ( 4,e) ( 5,f) ( 6,g) ( 7,h) ( 8,i) ( 9,j) (10,k) (11,l) (12,m)
(13,n) (14,o) (15,p) (16,q) (17,r) (18,s) (19,t) (20,u) (21,v) (22,w) (23,x) (24,y) (25,z)


becomes

(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25)
('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z')
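Two more details worth knowing: zipping twice gives you back what you started with, and `zip` stops as soon as the shortest input runs out. A sketch with small made-up lists:

```python
numbers = [1, 2, 3, 4]
chars   = ['a', 'b', 'c', 'd']

pairs   = list(zip(numbers, chars))   # [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]
columns = list(zip(*pairs))           # [(1, 2, 3, 4), ('a', 'b', 'c', 'd')]
back    = list(zip(*columns))         # transposing twice gives the pairs back
print(back == pairs)                  # True

print(list(zip([1, 2, 3], 'ab')))     # shortest input wins: [(1, 'a'), (2, 'b')]
```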


### String formatting

Often you will want to print output neatly, as misaligned text is hard to read.

The "old way" of formatting strings in Python (printf-style or `%`-formatting) uses tuples. It works as follows:

    "format string" % (tuple, with, data)

The format string is very similar to C-style `printf` formatting (exact details [here](https://docs.python.org/2.4/lib/typesseq-strings.html); that page is for Python 2.4, and the Python 3 documentation covers the same rules under "printf-style String Formatting").

In short:

 Symbol | Meaning
------  | --------
  `%%`  | a literal percent sign (escaped)
   `%`  | starts a conversion specifier, filled in with the next value from the tuple
  `%s`  | filled in with a string (non-strings are converted with `str`)
  `%d`  | printed as an integer
  `%x`  | integer printed in hexadecimal
`%15s`  | string padded with spaces on the left to at least 15 characters (right-aligned)
`%-15s` | same as previous, but padded on the right (left-aligned)

Note that the nth value in the tuple fills the nth specifier in the format string; if the counts don't match, Python raises a `TypeError`.

For details, again, you can check out the documentation. 
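A few one-line sketches of the specifiers from the table (the values are arbitrary examples):

```python
samples = [
    "%d%%" % 50,              # '50%'      : %% is a literal percent sign
    "%s" % [1, 2],            # '[1, 2]'   : %s calls str() on non-strings
    "%x" % 255,               # 'ff'       : integer printed in hexadecimal
    "[%5s]" % "ab",           # '[   ab]'  : padded on the left (right-aligned)
    "[%-5s]" % "ab",          # '[ab   ]'  : padded on the right (left-aligned)
    "%s is %d" % ("Bob", 7),  # 'Bob is 7' : values fill the specifiers in order
]
for s in samples:
    print(s)

try:
    "%s and %s" % ("only one",)   # 2 specifiers but only 1 value
except TypeError as e:
    print("TypeError:", e)
```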

Example below:

In [11]:
person_names =   ['Albert', 'Heijn', 'Stein']
person_incomes = [     20 ,     49 ,     10 ]

for name,income in zip(person_names, person_incomes):
    print("%-10s %5d" % (name, income))

print("\nrather than\n")

for name,income in zip(person_names, person_incomes):
    print(name, income)

Albert        20
Heijn         49
Stein         10

rather than

Albert 20
Heijn 49
Stein 10
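Newer Python code usually uses `str.format` or f-strings (Python 3.6+) instead of `%`. The format specification differs slightly: `<` means left-aligned, and numbers are right-aligned by default. A sketch that should produce the same aligned lines as the `%` version above:

```python
person_names   = ['Albert', 'Heijn', 'Stein']
person_incomes = [20, 49, 10]

for name, income in zip(person_names, person_incomes):
    old_style = "%-10s %5d" % (name, income)          # printf-style, as above
    new_style = f"{name:<10} {income:5d}"            # f-string: < pads on the right
    also_new  = "{:<10} {:5d}".format(name, income)   # str.format, same spec
    print(new_style, old_style == new_style == also_new)
```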


### Sorting

You'll also often want to sort one thing based on another: say you're dealing with people, and you want to sort them by income. Tuples come in quite handy here as well.

By default, tuples are compared element by element: the sort looks at the first element of each tuple, uses the second element only to break ties, and so on.

So sorting the following
    
    (2, "def"), (2, "aaa"), (1, "abc") 

will output

    (1, "abc"), (2, "aaa"), (2, "def")
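The same example as runnable code; the comparison in the last line is the element-by-element rule that `sorted` relies on:

```python
records = [(2, "def"), (2, "aaa"), (1, "abc")]

# compared on the first element; ties (the two 2s) are broken by the second element
print(sorted(records))           # [(1, 'abc'), (2, 'aaa'), (2, 'def')]

# tuple comparison works the same way outside of sorting
print((2, "aaa") < (2, "def"))   # True: first elements equal, "aaa" < "def"
```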


In the examples below we put the income first in each tuple, so the default tuple ordering sorts the people by income:

In [12]:
print("People sorted on salary low to high")
for income, name in sorted(zip(person_incomes, person_names)):
    print("%-10s %5d" % (name, income))

People sorted on salary low to high
Stein         10
Albert        20
Heijn         49


In [13]:
print("People sorted on salary high to low")
for income, name in sorted(zip(person_incomes, person_names), reverse=True):
    print("%-10s %5d" % (name, income))    

People sorted on salary high to low
Heijn         49
Albert        20
Stein         10


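Alternatively, we can keep the name first and the income second, and tell the sorting algorithm where to find the value to sort on by passing a function as the `key` argument; `sorted` calls it on every element and compares the results: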

In [14]:
# we want to sort based on the second argument, the income
def give_second(person):
    return person[1]

print("People sorted on salary high to low")
for name, income in  sorted(zip(person_names, person_incomes), key=give_second, reverse=True):
    print("%-10s %5d" % (name, income))   

People sorted on salary high to low
Heijn         49
Albert        20
Stein         10
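A small named function like `give_second` is often replaced by a `lambda`, or by `operator.itemgetter` from the standard library, which builds such a function for you. A sketch; both should give the same order as `give_second`:

```python
from operator import itemgetter

person_names   = ['Albert', 'Heijn', 'Stein']
person_incomes = [20, 49, 10]
people = list(zip(person_names, person_incomes))

by_lambda     = sorted(people, key=lambda person: person[1], reverse=True)
by_itemgetter = sorted(people, key=itemgetter(1), reverse=True)   # itemgetter(1)(p) == p[1]

print(by_lambda)                    # [('Heijn', 49), ('Albert', 20), ('Stein', 10)]
print(by_lambda == by_itemgetter)   # True
```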


## Named tuples

Accessing tuple elements by index (`t[0]`) can be hard to read, so to help out, named tuples were created (`namedtuple`, which you need to import from the `collections` module).

They work the same as tuples in almost every aspect (indexing, unpacking, immutability), except that you can also access elements by name:

In [15]:
from collections import namedtuple

# Create a new sort of tuple called a coordinate, consisting of an x, and a y
Coordinate = namedtuple('coordinate', ['x', 'y'])

# the same point, once as a raw tuple and once as a Coordinate
raw_coordinate = (1, 2)
proper_coordinate = Coordinate(1, 2)

# difficult to tell what argument is the x and what argument is the y
print(raw_coordinate)
print(raw_coordinate[0])   # this is the x
print(raw_coordinate[1])   # this is the y

# easier to tell the x and y apart
print(proper_coordinate)
print(proper_coordinate.x) # this is the x
print(proper_coordinate.y) # this is the y

(1, 2)
1
2
coordinate(x=1, y=2)
1
2
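A named tuple is still a tuple, so everything from the tuple section keeps working. A sketch; here the type name passed as the first argument is `'Coordinate'` (by convention it matches the variable name, and it is what shows up when printing):

```python
from collections import namedtuple

Coordinate = namedtuple('Coordinate', ['x', 'y'])
p = Coordinate(x=1, y=2)            # keyword arguments work too

x, y = p                            # unpacks like a plain tuple
print(p[0] == p.x, p == (1, 2))     # True True: indexing works, equal to a plain tuple

moved = p._replace(x=5)             # immutable, so _replace returns a new Coordinate
print(p, moved)                     # Coordinate(x=1, y=2) Coordinate(x=5, y=2)
```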


# dict

One of the most basic data types in Python is the `dict`, short for dictionary.

The idea behind it is to map one object to another. A simple example is an actual dictionary, which maps a word to its definition.

In Python terminology, a dict consists of keys (the words) and values (the definitions). The values may be of any type; the keys may be of many types too, but they must be hashable (e.g. strings, numbers or tuples, but not lists or dicts).

Going back to the dictionary example, imagine we're trying to make an Oxford `dict`. For instance, if we look up the word `"python"` in the Oxford dictionary, we get these entries:

- A large heavy-bodied non-venomous snake occurring throughout the Old World tropics, killing prey by constriction and asphyxiation.
- Computing [mass noun] A high-level general-purpose programming language.

Let's find out how we can make this, starting with an empty dictionary.

In [16]:
oxford = dict()  # long  form
oxford = {}      # short form

### Filling the `dict`

As mentioned before, a `dict` consists of keys and values: it maps each key to a value.

Let's see an example of this:

In [17]:
oxford['python'] = 'A snake'

That's all there is to it. Now let's set some more entries:

In [18]:
# needs to be created first, if it didn't exist already
oxford = dict() 

# filling the oxford with
# 'a' -> 'Letter 1 of the alphabet'
# ...
# 'z' -> 'Letter 26 of the alphabet'
for i, l in enumerate("abcdefghijklmnopqrstuvwxyz"):
    oxford[l] = 'Letter %d of the alphabet' % (i + 1)

Since Python aims to be short and readable, you can also set key-value pairs while creating a dictionary:

In [19]:
# python's simple way of creating initialised dicts (stores 2 key,value pairs)
oxford = {
    'a' : 'First letter of alphabet',
    'z' : 'Last letter of alphabet'
}

# dict comprehension form: creates all 26 key-value pairs in one expression
oxford = { l : 'Letter %d of the alphabet' % (i + 1) for i, l in enumerate("abcdefghijklmnopqrstuvwxyz") }

print(oxford['a'])

Letter 1 of the alphabet
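Another common way to build a dict ties back to `zip`: `dict()` accepts an iterable of `(key, value)` pairs. A sketch with made-up keys and values:

```python
letters_to_words = dict(zip(['a', 'b', 'c'], ['apple', 'banana', 'cherry']))
print(letters_to_words)          # {'a': 'apple', 'b': 'banana', 'c': 'cherry'}

# going the other way: items() gives the (key, value) pairs back
print(list(letters_to_words.items()))
```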


### Our oxford `dict`

As we saw earlier, `"python"` can mean 2 things, but `oxford['python']` can only hold one value, so we need to work around this.

We can do this by letting `oxford['python']` be a list rather than just a string:

In [20]:
oxford           = dict()
oxford['python'] = list()

# oxford['python'] is a list, so we can use the append function on it
oxford['python'].append('A large heavy-bodied non-venomous snake occurring throughout the Old World tropics.')
oxford['python'].append('Computing [mass noun] A high-level general-purpose programming language.')

#Let's see the definitions of python:
print("python:")
for definition in oxford['python']:
    print(" - ", definition)

python:
 -  A large heavy-bodied non-venomous snake occurring throughout the Old World tropics.
 -  Computing [mass noun] A high-level general-purpose programming language.


Alternatively, the entry we ask for might not exist at all, in which case looking it up raises a `KeyError`:

In [21]:
# Wrapped in try ... except so the error is printed instead of stopping the program
try:
    oxford["key that wasn't added"]
except KeyError as e:   # catch only the error we expect
    print(repr(e))

KeyError("key that wasn't added")
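To avoid the `KeyError` you can check with `in` first, or use the dict method `get`, which returns a default (`None` unless you pass one) for missing keys without adding them. A sketch using a small stand-in for `oxford`:

```python
oxford = {'python': ['A snake', 'A programming language']}   # small stand-in dict

word = "key that wasn't added"
if word in oxford:
    print(oxford[word])
else:
    print("no entry for", repr(word))

print(oxford.get(word))              # None: missing key, no error
print(oxford.get(word, []))          # []: the default we passed in
print(oxford.get('python', []))      # ['A snake', 'A programming language']
```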


### Advanced features of dicts

Often we want to check what a dict contains, or loop over its keys and values.

The cells below show the most common ways to do that.

In [22]:
# create a larger dictionary:

oxford = {
    'html' : [
        'Hypertext Markup Language, system for tagging text files to display World Wide Web pages.'
    ],
    'python' : [
        'A large heavy-bodied non-venomous snake occurring throughout the Old World tropics.',
        'A high-level general-purpose programming language.'
    ],
    'c' : [
        'The third letter of the alphabet.',
        'The Roman numeral for 100.',   # note the comma: without it, the two strings get glued together
        'A computer programming language.'
    ]
}

In [23]:
# Check whether a key exists in a dict
print("key that wasn't added" in oxford)
print("python" in oxford)

False
True


In [24]:
# Loop over keys (since Python 3.7, dicts keep the order in which keys were inserted)
for k in oxford.keys():
    print("-", k)

- html
- python
- c


In [25]:
# Looping over just the dictionary loops over the keys as well
for k in oxford:
    print("-", k)

- html
- python
- c


In [26]:
# Looping over values (note that you cannot easily get back to the keys from here)
for value in oxford.values():
    print("- entry:")
    for description in value:
        print("    -", description)
    print("") # leave some space

- entry:
    - Hypertext Markup Language, system for tagging text files to display World Wide Web pages.

- entry:
    - A large heavy-bodied non-venomous snake occurring throughout the Old World tropics.
    - A high-level general-purpose programming language.

- entry:
    - The third letter of the alphabet.
    - The Roman numeral for 100.
    - A computer programming language.



In [27]:
# Looping over key value pairs
for key, value in oxford.items():
    print("- entry", key)
    for description in value:
        print("    -", description)
    print("") # leave some space

- entry html
    - Hypertext Markup Language, system for tagging text files to display World Wide Web pages.

- entry python
    - A large heavy-bodied non-venomous snake occurring throughout the Old World tropics.
    - A high-level general-purpose programming language.

- entry c
    - The third letter of the alphabet.
    - The Roman numeral for 100.
    - A computer programming language.
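Combining this with the sorting from the tuple section: `sorted()` on a dict sorts its keys, and `sorted()` on `items()` sorts the `(key, value)` tuples, so you can print the dictionary alphabetically. `len()` gives the number of keys. A sketch with a trimmed-down `oxford`:

```python
oxford = {
    'html':   ['Hypertext Markup Language.'],
    'python': ['A snake.', 'A programming language.'],
    'c':      ['The third letter of the alphabet.'],
}

print(len(oxford))                  # 3 keys
print(sorted(oxford))               # ['c', 'html', 'python']

# items() yields (key, value) tuples, which sort on the key first
for key, value in sorted(oxford.items()):
    print(key, "has", len(value), "definition(s)")
```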



## `defaultdict` and `Counter`

**Note: these objects need to be imported from the `collections` module.**

Both are subclasses of `dict`, so they work mostly the same as ordinary dicts, with a few modifications.

A `defaultdict` calls a "default factory" function to create values for missing keys, and a `Counter`... well, it counts stuff.

We'll see some examples.

In our previous example with the oxford `dict`, every value was supposed to be a list. The disadvantage is that the list has to exist before you can add a definition to it:

    # assumes we already made a list at oxford['Ruby'], otherwise raises a KeyError
    oxford['Ruby'].append('precious stone') 
    
    # creates a new list, overwriting any previous definitions
    oxford['Ruby'] = [ 'precious stone' ]
    
    # overwrites any previous value with a plain string, so 'Ruby' can't hold multiple definitions
    oxford['Ruby'] = 'precious stone'       

To make this easier for Python developers, `defaultdict` was created.
It fills in a default object for any missing key. For example:

In [28]:
from collections import defaultdict

# Every key that is requested but not yet set gets an empty list (list() is called)
oxford = defaultdict(list)

# append as if the key was already in the dictionary
oxford['item that didnt yet exist'].append('a new item')

print(oxford['item that didnt yet exist'])
print(oxford['another item that didnt yet exist'])

['a new item']
[]
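For comparison, here is a sketch of the same "append to a list that may not exist yet" pattern with a plain dict, using the `dict.setdefault` method, next to the `defaultdict` version. Grouping words by their first letter is just a made-up example:

```python
from collections import defaultdict

words = ['apple', 'avocado', 'banana', 'cherry', 'blueberry']   # made-up data

# plain dict: setdefault inserts [] only if the key is missing, then returns the list
plain = {}
for word in words:
    plain.setdefault(word[0], []).append(word)

# defaultdict: a missing key automatically gets list(), i.e. []
grouped = defaultdict(list)
for word in words:
    grouped[word[0]].append(word)

print(dict(grouped))      # {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
print(plain == grouped)   # True: a defaultdict compares equal to a dict with the same items
```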


Essentially, it changes what happens when you read a key that isn't in the dictionary yet.

Usually that would raise a `KeyError`, as seen before, but a `defaultdict` instead calls the given function, stores the result under that key, and returns it.

In [29]:
# This also works with other objects/functions etc.
have_i_visited = defaultdict(bool) # bool() returns False
print(have_i_visited['grandma'])   # not even, you monster

have_i_visited['grandma'] = True
print(have_i_visited['grandma'])   # better

False
True
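Two things to keep in mind: reading a missing key from a `defaultdict` *stores* the default, so even a plain lookup adds an entry; and `defaultdict(int)` (since `int()` returns `0`) is a classic way to count things, the job `Counter` below is built for. A sketch:

```python
from collections import defaultdict

have_i_visited = defaultdict(bool)
print('grandma' in have_i_visited)     # False: nothing stored yet
have_i_visited['grandma']              # just reading the key...
print('grandma' in have_i_visited)     # True: ...stored False under 'grandma'

counts = defaultdict(int)              # int() returns 0
for letter in "abracadabra":
    counts[letter] += 1                # missing letters start at 0
print(dict(counts))                    # {'a': 5, 'b': 2, 'r': 2, 'c': 1, 'd': 1}
```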


### `Counter`

A `Counter` takes... well, a collection of objects and counts them.

The result then works as a dictionary, with the distinct objects as keys and the number of times each one occurred as values:

In [30]:
from collections import Counter

to_count = ['a', 'a', 'b', 'a', 'a', 'a', 'b']
counter = Counter(to_count)

print("Letter - Frequency")
for key, value in counter.items():
    print(key,"      ",value)

Letter - Frequency
a        5
b        2
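Unlike a plain dict, a `Counter` returns `0` for items it has never seen (without raising a `KeyError` and without adding them), and since a string is an iterable of characters, you can count letters directly. A sketch:

```python
from collections import Counter

counter = Counter(['a', 'a', 'b', 'a', 'a', 'a', 'b'])
print(counter['a'], counter['z'])       # 5 0: 'z' was never seen, so it counts as 0
print('z' in counter)                   # False: the lookup did not add 'z'

letter_counts = Counter("abracadabra")  # strings are iterables of characters
print(letter_counts['a'], letter_counts['r'])   # 5 2
```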


Additionally, it offers the following useful methods:

In [31]:
# elements()
# returns each element repeated as many times as it was counted
for word in counter.elements():
    print(word, end=' ')

a a a a a b b 

In [32]:
# most_common(n)
# returns the n most common elements, in order of most common to least.
for word, freq in counter.most_common(2):
    print(word, " - ", freq)

a  -  5
b  -  2


In [33]:
# update()
# adds the counts from another iterable to the counts it already has

# helper generator that yields every whitespace-separated word in alice.txt
# (assumes the file is in the current working directory)
def words_alice():
    with open("alice.txt") as f:
        for line in f:
            for word in line.split():
                yield word

# updating our counter with all the words in alice in wonderland
counter.update(words_alice())


for word, freq in counter.most_common(10):
    # don't pay attention to the formatting below, it's just to align the columns
    print("%-4s : %5d" % (word, freq))

the  :  1505
and  :   714
to   :   703
a    :   611
of   :   490
she  :   484
said :   416
it   :   346
in   :   344
was  :   328
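Since `alice.txt` isn't included here, this is a self-contained sketch of the same `update` pattern on two made-up lines of text. Note that `str.split()` only splits on whitespace, so "The" and "the" (or "mat." and "mat") are counted as different words. The normalization below (lowercasing and stripping punctuation) is one possible choice, not something the original code does:

```python
import string
from collections import Counter

# made-up stand-in for alice.txt, one string per line
lines = ["The cat sat on the mat.", "The dog sat too!"]

raw = Counter()
normalized = Counter()
for line in lines:
    raw.update(line.split())                         # update adds to the existing counts
    normalized.update(word.lower().strip(string.punctuation)
                      for word in line.split())      # "The" -> "the", "mat." -> "mat"

print(raw['The'], raw['the'])          # 2 1: case matters without normalizing
print(normalized.most_common(1))       # [('the', 3)]
```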
