# Programming and data analysis
*Alla Tambovtseva, NRU HSE*

*Partly based on [lectures](http://math-info.hse.ru/s17/1) of I.V.Schurov, course "Programming on Python for data collection and data analysis" (NRU HSE).*

## Tuples and dictionaries

## Tuples

At first sight, tuples do not differ much from lists. The most visible difference is that list elements are written in square brackets, while tuple elements are written in round ones (parentheses).

In [2]:
my_tuple = (1, 2, 4, 6, 9) # a tuple

We can access elements of a tuple in the same way as we do for lists:

In [3]:
my_tuple[0]

1

However, despite some similarities, tuples and lists are different structures. The main difference lies in the fact that tuples are immutable objects. In other words, we cannot change elements of a tuple:

In [4]:
my_tuple[2] = 65  # error

TypeError: 'tuple' object does not support item assignment

Sometimes this feature is useful (it gives some protection for our data), sometimes not, but we have to learn the different structures in Python so as not to be surprised in the future.
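A minimal sketch of what immutability means in practice: the assignment fails with a `TypeError`, and a common workaround is to build a new tuple from slices of the old one. The names `point` and `updated` are made up for this illustration.

```python
point = (1, 2, 4, 6, 9)

try:
    point[2] = 65              # tuples do not support item assignment
except TypeError as err:
    print("Cannot modify:", err)

# Build a new tuple instead of changing the old one
updated = point[:2] + (65,) + point[3:]
print(updated)   # (1, 2, 65, 6, 9)
print(point)     # (1, 2, 4, 6, 9), the original is unchanged
```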

If we want, we can convert a tuple into a list:

In [5]:
list(my_tuple)

[1, 2, 4, 6, 9]

And vice versa:

In [6]:
tuple([1,2,3])

(1, 2, 3)

If we look at the methods available for tuples (for example, type `my_tuple.` and press *Tab*), we will see that there are far fewer of them than for lists. This is a consequence of immutability: every method that would change the object in place is absent. However, we can still combine several tuples by concatenating them:

In [7]:
(1, 3) + (7, 8)

(1, 3, 7, 8)
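As a small illustration (the variable names `a`, `b` and `c` are chosen just for this sketch), concatenation creates a new tuple and leaves the original ones untouched. The two methods that tuples do have, `.count()` and `.index()`, only read the data:

```python
a = (1, 3)
b = (7, 8)

c = a + b            # a new tuple; a and b stay as they were
print(c)             # (1, 3, 7, 8)
print(a)             # (1, 3)

print(c.count(3))    # 1, how many times 3 occurs
print(c.index(7))    # 2, position of the first 7
print(len(c))        # 4
```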

## Dictionaries

When talking about dictionaries in Python, it helps to think of ordinary dictionaries (paper or electronic ones). What is a dictionary? A set of *key-value* pairs, or *word-list of its meanings* pairs if a word has several meanings. A dictionary in Python is an object, a structure, that also stores pairs of corresponding values.

Let's assume that we want to create a dictionary to store a programme for the musical "Notre Dame de Paris". So, in the dictionary `prog` we will store pairs *hero-actor*.

In [8]:
prog = {'Gringoire' : 'Pelletier', 
        'Frollo' : 'Lavoie', 'Phoebus': 'Fiori'}

The first element of each pair (before `:`) is called *a key* and the second element (after `:`) is called *a value*. Let's look at our dictionary:

In [9]:
prog

{'Gringoire': 'Pelletier', 'Frollo': 'Lavoie', 'Phoebus': 'Fiori'}

### Accessing dictionary elements

When we want to access an element in a dictionary, we use its key: we write the name of the dictionary and then the key in square brackets. For example, we can find who plays the role of Phoebus:

In [10]:
prog['Phoebus']

'Fiori'

And what if we ask for an element that is not in a dictionary?

In [11]:
prog['Esmeralda']

KeyError: 'Esmeralda'

In this case we get *KeyError* – there is no element with the key "Esmeralda"! 

Now imagine the following situation. We have a list of heroes (keys) and we want to loop over them and print the name of the actor who plays each hero (values). However, one hero from this list is not in the dictionary. Python will stop at this hero and raise an error. Awful, isn't it? To avoid this problem, we can use `.get()`:

In [12]:
prog.get('Esmeralda') # no result, but no KeyError - great!

If we print the result, we will see that Python shows `None`:

In [13]:
print(prog.get('Esmeralda'))

None


Instead of `None`, we can ask `.get()` to return a default value of our choice, for example `'Not found'`:

In [13]:
# if Esmeralda in prog, it returns value, if not – returns Not found
prog.get('Esmeralda', 'Not found') 

'Not found'
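The situation described above (looping over a list of heroes, one of whom is missing from the dictionary) can be sketched as follows. The list `heroes` and the helper list `actors` are hypothetical and exist only for this illustration:

```python
prog = {'Gringoire': 'Pelletier', 'Frollo': 'Lavoie', 'Phoebus': 'Fiori'}

# Hypothetical list of heroes; 'Esmeralda' is not a key in prog
heroes = ['Gringoire', 'Esmeralda', 'Phoebus']

actors = []
for hero in heroes:
    # prog[hero] would raise KeyError on 'Esmeralda'; .get() does not
    actor = prog.get(hero, 'Not found')
    actors.append(actor)
    print(hero, '-', actor)
```

The loop reaches the last hero even though the second one is absent from the dictionary.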

Let's add the element with the key 'Esmeralda':

In [15]:
prog['Esmeralda'] = 'Noa'
prog

{'Gringoire': 'Pelletier',
 'Frollo': 'Lavoie',
 'Phoebus': 'Fiori',
 'Esmeralda': 'Noa'}

As elements of a dictionary are *key-value* pairs, there should be a way to extract only keys or only values. Indeed, there are the methods `.keys()` and `.values()`. Let's ask for the keys:

In [17]:
prog.keys()

dict_keys(['Gringoire', 'Frollo', 'Phoebus', 'Esmeralda'])

The object above resembles a list, but it is not a simple list. It is a special object of type `dict_keys`. The same is true of values:

In [20]:
prog.values()

dict_values(['Pelletier', 'Lavoie', 'Fiori', 'Noa'])
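One practical consequence, shown here as a sketch that assumes Python 3.7 or later (where dictionaries keep insertion order): the object returned by `.keys()` is a view of the dictionary rather than a copy, so it reflects later changes. If we need indexing, we convert it to a list. The names `keys` and `keys_list` are made up for this example.

```python
prog = {'Gringoire': 'Pelletier', 'Frollo': 'Lavoie'}

keys = prog.keys()            # a view, not a copy
prog['Phoebus'] = 'Fiori'     # change the dictionary afterwards

print(keys)                   # dict_keys(['Gringoire', 'Frollo', 'Phoebus'])

keys_list = list(keys)        # an ordinary list that supports indexing
print(keys_list[0])           # Gringoire
```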

Dictionaries can contain not only strings, but elements of any type. One important detail: keys must be immutable (more precisely, hashable: numbers, strings, or tuples of immutable elements), while values can be either mutable or immutable (numbers, strings, tuples, lists and so on). For example, we can create a dictionary with pairs of integers *student id-grade*.

In [21]:
numbers = {1 : 7, 2 : 8, 3 : 9} 

In [22]:
# grade of student with id = 1
numbers[1] 

7
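To illustrate the restriction on keys, here is a small sketch with a hypothetical dictionary `grades_by_group` whose keys are tuples *(group, student id)*. A tuple works as a key, while a list does not:

```python
# Tuples of immutable elements are valid keys
grades_by_group = {('A', 1): 7, ('A', 2): 8, ('B', 1): 9}
print(grades_by_group[('A', 2)])   # 8

# A list is mutable, so it cannot serve as a key
message = None
try:
    bad = {[1, 2]: 'value'}
except TypeError as err:
    message = str(err)
    print('TypeError:', message)
```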

And now let's look at a dictionary that has lists as values (two English words and lists of their meanings in Russian):

In [25]:
my_dict = {'swear' : ['клясться', 'ругаться'], 
           'dream' : ['спать', 'мечтать']}

By the key we get the value – the list of meanings of the verb 'swear':

In [26]:
my_dict['swear']

['клясться', 'ругаться']

Now we can choose the first element, the first meaning:

In [27]:
my_dict['swear'][0] # 1st element

'клясться'
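Since a list stored as a value is an ordinary mutable list, it can be changed in place through the dictionary. A minimal sketch with a hypothetical dictionary `courses` (course name and list of enrolled students, all invented for this example):

```python
courses = {'Python': ['Anna', 'Boris'], 'R': ['Clara']}

courses['Python'].append('Dmitry')   # add an element to the list under 'Python'
courses['R'][0] = 'Klara'            # replace the first element under 'R'

print(courses['Python'])        # ['Anna', 'Boris', 'Dmitry']
print(courses['R'])             # ['Klara']
print(len(courses['Python']))   # 3
```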

And now let us think how to print all pairs *key-value* in a loop. The first attempt:

In [31]:
for k in prog:
    print(k)

Gringoire
Frollo
Phoebus
Esmeralda


This attempt was not successful – we got only keys. Let's try other ways.

**Task:** for each hero in `prog` print the message like

    Fiori plays the role of Phoebus

**Solution:** loop over the keys and get each value by putting the key in square brackets:

In [32]:
for k in prog:
    print(prog[k], "plays the role of", k)

Pelletier plays the role of Gringoire
Lavoie plays the role of Frollo
Fiori plays the role of Phoebus
Noa plays the role of Esmeralda


This works, but there is a special method, `.items()`, designed for exactly this task:

In [33]:
for k, v in prog.items():
    print(k, v)

Gringoire Pelletier
Frollo Lavoie
Phoebus Fiori
Esmeralda Noa


To get both keys and values, we list two variables in the for-loop. It is not necessary to call them `k` and `v` or `key` and `value`: each item is a *(key, value)* pair, so the first variable always receives the key and the second one the value. Let's see what `.items()` looks like in a loop with more meaningful names:

In [34]:
for hero, actor in prog.items():
    print(actor, "plays the role of", hero)

Pelletier plays the role of Gringoire
Lavoie plays the role of Frollo
Fiori plays the role of Phoebus
Noa plays the role of Esmeralda


If we look inside `prog.items()`, we will see that this structure resembles a list of tuples:

In [35]:
prog.items()

dict_items([('Gringoire', 'Pelletier'), ('Frollo', 'Lavoie'), ('Phoebus', 'Fiori'), ('Esmeralda', 'Noa')])

As we saw in the examples with `.keys()` and `.values()`, the object returned is not a proper list but a special object of type `dict_items`. If we need a list, we should convert it explicitly:

In [36]:
list(prog.items())

[('Gringoire', 'Pelletier'),
 ('Frollo', 'Lavoie'),
 ('Phoebus', 'Fiori'),
 ('Esmeralda', 'Noa')]
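Once we have a real list of tuples, we can index it and unpack each pair, and the built-in `dict()` turns such a list back into a dictionary. A short sketch; the names `pairs` and `restored` are made up for this illustration:

```python
prog = {'Gringoire': 'Pelletier', 'Frollo': 'Lavoie', 'Phoebus': 'Fiori'}

pairs = list(prog.items())
print(pairs[0])            # ('Gringoire', 'Pelletier')

hero, actor = pairs[0]     # unpack one (key, value) tuple
print(hero, actor)         # Gringoire Pelletier

restored = dict(pairs)     # back from a list of pairs to a dictionary
print(restored == prog)    # True
```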

The `.items()` method is helpful when we want to filter items based on their values. For the final example, let's take a dictionary with pairs *student-grade*:

In [14]:
grades = {"Viktor": 7, "Pete" : 9, "Nick" : 8, "Helena" : 8, 
          "Vasilisa" : 10}

Let's print the names of the students who got grade 8:

In [15]:
for name, grade in grades.items():
    if grade == 8:
        print(name)

Nick
Helena


Only two students: Nick and Helena. 
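The same filtering can also be written more compactly with comprehensions. This is an alternative sketch rather than part of the original example; the names `with_eight` and `high` are invented here:

```python
grades = {"Viktor": 7, "Pete": 9, "Nick": 8, "Helena": 8,
          "Vasilisa": 10}

# List of names whose grade equals 8
with_eight = [name for name, grade in grades.items() if grade == 8]
print(with_eight)   # ['Nick', 'Helena']

# A new dictionary that keeps only grades of 9 and above
high = {name: grade for name, grade in grades.items() if grade >= 9}
print(high)         # {'Pete': 9, 'Vasilisa': 10}
```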

And the last question: how do we check whether a dictionary has an element with a given key? Use the `in` operator:

In [16]:
"Nick" in grades.keys()

True

In [17]:
"Ivan" in grades.keys()

False
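Note that `in` applied to the dictionary itself also checks keys, so `.keys()` can be omitted. To look for a value rather than a key, we check `.values()`. A brief sketch (the variable `status` is hypothetical):

```python
grades = {"Viktor": 7, "Pete": 9, "Nick": 8, "Helena": 8,
          "Vasilisa": 10}

print("Nick" in grades)          # True, same as "Nick" in grades.keys()
print("Ivan" not in grades)      # True
print(10 in grades.values())     # True, somebody got grade 10

# A typical use: check the key before accessing the value
if "Ivan" in grades:
    status = grades["Ivan"]
else:
    status = "no such student"
print(status)                    # no such student
```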