# Notebook №5. Information systems

by a student of the IS-20-1 group, Khromenko Danil.
<br>

## Python programming for data collection and analysis

### Dictionaries

Consider this problem: we have information about students' grades in a certain subject and we
want to be able to work with it, for example, to find a student's grade by the student's name.
We could try to solve this problem by creating two lists, one with
the names of students and the other with their grades:

In [1]:
#Ilya got 5, Andrey got 3, etc.
students = ["Ilya", "Andrey", "Sergey", "Daniil"]
grades = [5, 3, 4, 2]
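
This works, but a lookup takes two steps and the two lists have to be kept in sync by hand. A minimal sketch of such a lookup (the names `position` and `sergey_grade` are introduced here only for illustration):

```python
# Parallel lists: position i in one list corresponds to position i in the other
students = ["Ilya", "Andrey", "Sergey", "Daniil"]
grades = [5, 3, 4, 2]

# Step 1: find the position of the name (index() scans the list from the start)
position = students.index("Sergey")
# Step 2: read the grade stored at the same position in the second list
sergey_grade = grades[position]
print(sergey_grade)  # 4
```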

It would be nice if we could have a data type in which elements are numbered
not by natural numbers, but by arbitrary objects. This data type exists: in Python it
is called a dictionary.

In [2]:
#This is how you can create a dictionary in Python:
gradebook = {"Ilya": 5, "Andrey": 3, "Sergey":4, "Daniil": 2}

This is similar to creating a list, but there are a few differences. First, we used curly brackets
instead of square ones to show that we are creating a dictionary. Second, the dictionary consists of
entries, and each entry has two parts: a key and a value, separated by a colon.
For example, the entry "Andrey": 3 has the key "Andrey" and the value 3.
In total, our gradebook dictionary now contains four entries, whose keys are the names
of students and whose values are their grades.

In [3]:
#display the dictionary on the screen
gradebook

{'Ilya': 5, 'Andrey': 3, 'Sergey': 4, 'Daniil': 2}

Note that the entries are printed in the order in which they were added. Since Python 3.7,
dictionaries preserve insertion order; in older versions the output order was arbitrary.
Either way, entries are not numbered by position, so you cannot ask for the "first entry" by index,
but you can access the entry with a given key:

In [4]:
#output the value of the key "Daniil"
gradebook['Daniil']

2

In [5]:
#output the value of the key "Ilya"
gradebook['Ilya']

5

In [6]:
#You can change the value of an entry, just like you can change a list item.
gradebook['Andrey'] = 5
gradebook

{'Ilya': 5, 'Andrey': 5, 'Sergey': 4, 'Daniil': 2}

In [7]:
#You can add a new entry
gradebook['Maria'] = 4
gradebook

{'Ilya': 5, 'Andrey': 5, 'Sergey': 4, 'Daniil': 2, 'Maria': 4}

In [8]:
#If we try to access a record that does not exist, we will receive an error message:
gradebook['Sasha']

KeyError: 'Sasha'

Often we want to be able to request a record, and if there isn't one, get some kind of "default value", not an error. To do this, use the get() method instead
of square brackets.

In [9]:
#getting a record using the get() method
gradebook.get('Alice')

In [10]:
#the previous cell displayed nothing because get() returned None; print() shows it explicitly:
print(gradebook.get('Alice'))

None


In [11]:
#getting a record using the get() method
gradebook.get('Andrey')

5

You can also pass get() a second argument: if there is no such key in the dictionary,
get() returns that second argument instead of None.

In [12]:
#if there is no key, it will return the second argument of the get() method
gradebook.get('Alice', 'No such student')

'No such student'

In [13]:
#the key exists, so get() returns its value and ignores the second argument
gradebook.get('Daniil', 'No such student')

2
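
A common use of the default value is counting. The sketch below is only an illustration (the `grades` list and the `counts` dictionary are made up for this example): it counts how many times each grade occurs, using `get()` so that a grade seen for the first time starts from 0.

```python
# Hypothetical list of grades to count
grades = [5, 3, 4, 5, 4, 5]

counts = {}
for g in grades:
    # get(g, 0) returns the current count, or 0 if g has not been seen yet
    counts[g] = counts.get(g, 0) + 1

print(counts)  # {5: 3, 3: 1, 4: 2}
```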

In [14]:
#You can get a list of all the dictionary keys:
gradebook.keys()

dict_keys(['Ilya', 'Andrey', 'Sergey', 'Daniil', 'Maria'])

Strictly speaking, keys() returns not a list but a view object (dict_keys). You can iterate over it
and use in with it, and you can make a real list out of it with list(). The same is true for values():

In [15]:
#You can get a list of all dictionary values:
gradebook.values()

dict_values([5, 5, 4, 2, 4])
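
The difference between a view and a list can be seen in a small sketch (the shortened `gradebook` and the names `names` and `names_snapshot` are chosen here for illustration): a view reflects later changes to the dictionary, while a list made from it is an independent copy.

```python
gradebook = {"Ilya": 5, "Andrey": 5}

names = gradebook.keys()        # a view: stays connected to the dictionary
names_snapshot = list(names)    # a list: a copy made at this moment

gradebook["Maria"] = 4          # add an entry after both were created

print(list(names))      # ['Ilya', 'Andrey', 'Maria']
print(names_snapshot)   # ['Ilya', 'Andrey']
```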

Dictionary keys do not have to be strings. Let's say we want to create a dictionary in which
the keys are numbers. Nothing could be easier:

In [16]:
#creating a dictionary with number keys
squares={1:1, 2:4, 3:9}

In [17]:
squares

{1: 1, 2: 4, 3: 9}

In [18]:
#In the next two lines, squares behaves roughly like a list, but if you look closely, 
#you can see that this is not a list, but still a dictionary.
print(squares[1])
print(squares[2])

1
4


In [19]:
#For example, any non-empty list has an element with index 0, but squares has no such entry:
#we refer to a non-existent dictionary key
squares[0]

KeyError: 0

### Iterating over dictionary entries

How do we process the information in a dictionary? To iterate through all the elements of a list, you can
use a for loop. What happens if you give it a dictionary instead of a list? Let's try:

In [20]:
#trying to iterate through the dictionary with a for loop
for i in gradebook:
    print(i)

Ilya
Andrey
Sergey
Daniil
Maria


I see! The for loop in this case iterates through all the keys of our dictionary. And knowing the key, you can
get the value:

In [21]:
#output of keys and values
for k in gradebook:
    print("Student", k, "has a grade", gradebook[k])

Student Ilya has a grade 5
Student Andrey has a grade 5
Student Sergey has a grade 4
Student Daniil has a grade 2
Student Maria has a grade 4


In [22]:
#there is a more elegant way to get the key and value of the next record at once: use items()
for k, v in gradebook.items():
    print("Student", k, "has a grade", v)

Student Ilya has a grade 5
Student Andrey has a grade 5
Student Sergey has a grade 4
Student Daniil has a grade 2
Student Maria has a grade 4


How does this code work? It uses the items() method, which returns a view object that yields tuples of the form (key, value). Passing it to list() shows these tuples:

In [23]:
#output a list of tuples from the dictionary (key, value)
list(gradebook.items())

[('Ilya', 5), ('Andrey', 5), ('Sergey', 4), ('Daniil', 2), ('Maria', 4)]

In this case, the for loop takes the next tuple on each pass
and assigns its first element (the key) to the variable k and its second element
(the value) to the variable v (of course, these variables could have other names).
We have already seen similar behavior when discussing enumerate().
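
The unpacking can be written out by hand to see what the loop does on each pass. This is only a sketch with a shortened dictionary; the names `pair`, `lines`, `number`, `name` and `grade` are chosen for illustration.

```python
gradebook = {"Ilya": 5, "Andrey": 5, "Sergey": 4}

# Unpacking a single tuple: the first element goes to k, the second to v
pair = ("Ilya", 5)
k, v = pair
print(k, v)  # Ilya 5

# enumerate() yields (index, item) pairs; each item here is itself a (key, value) tuple,
# so the inner parentheses unpack it in the same statement
lines = []
for number, (name, grade) in enumerate(gradebook.items(), start=1):
    lines.append(f"{number}. {name}: {grade}")
print(lines)  # ['1. Ilya: 5', '2. Andrey: 5', '3. Sergey: 4']
```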

In [24]:
#output of all students with a filter (students with a grade of 4)
for k, v in gradebook.items():
    if v == 4:
        print(k)

Sergey
Maria


Note that such a "search by value" requires going through all entries in the dictionary, so if the dictionary
is large it will take a long time, whereas a "search by key" stays fast regardless of the
dictionary's size. By the way, you can also quickly check whether the dictionary contains an entry
with a given key:

In [25]:
#checking whether a student exists in the dictionary
"Maria" in gradebook

True

In [26]:
#checking whether a student exists in the dictionary
"John" in gradebook

False

If we wanted to search among values, we would have to explicitly specify this using the values() method :

In [27]:
#checking whether any student in the dictionary has this grade
1 in gradebook.values()

False

In [28]:
#checking whether any student in the dictionary has this grade
5 in gradebook.values()

True
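
If searches by value are needed often, one option is to build a second, "inverted" dictionary once and then look grades up by key. A minimal sketch, assuming the gradebook from above (the name `students_by_grade` is introduced here for illustration):

```python
gradebook = {"Ilya": 5, "Andrey": 5, "Sergey": 4, "Daniil": 2, "Maria": 4}

# Map each grade to the list of students who received it
students_by_grade = {}
for name, grade in gradebook.items():
    # setdefault() returns the list stored under grade, creating an empty one if needed
    students_by_grade.setdefault(grade, []).append(name)

print(students_by_grade[4])        # ['Sergey', 'Maria']
print(students_by_grade.get(1, []))  # []
```

The inverted dictionary has to be rebuilt or updated whenever the original changes, so this trade-off only pays off when lookups are frequent.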

In [29]:
#In general, the in operator is not limited only to use with dictionaries:
#it can be used, for example, with lists:
print(5 in [1,2,3,4,5])
print(0 in range(1,5))

True
False


### Creating dictionaries and zip() function

There are different ways to create dictionaries. For example, you can create an empty dictionary and gradually fill it with elements:

In [30]:
#creating an empty dictionary and filling it in
my_dict = {}
my_dict[1] = 1
my_dict['hello'] = 'world'
my_dict

{1: 1, 'hello': 'world'}

Note that keys and values of different types (in this
case, strings and integers) get along well in the same dictionary.<br>
Another way to create a dictionary is to pass the dict() function a list of (key, value) pairs (in
some sense, this is the reverse of the items() method):

In [31]:
#filling the dictionary with the dict() function
my_dict2 = dict([('hello','world'), ('one', 'two')])
my_dict2

{'hello': 'world', 'one': 'two'}
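
The "reverse operation" remark can be checked directly with a round trip. A small sketch (the names `original`, `pairs` and `rebuilt` are illustrative):

```python
original = {"hello": "world", "one": "two"}

pairs = list(original.items())   # dictionary -> list of (key, value) tuples
rebuilt = dict(pairs)            # list of (key, value) tuples -> dictionary

print(pairs)                # [('hello', 'world'), ('one', 'two')]
print(rebuilt == original)  # True
```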

Let's say we have two lists: one contains the names of students, and the other contains their grades. How can
we create a dictionary from these lists in which the names are keys and the grades are values?
Like this:

In [32]:
#creating a dictionary from two lists with the zip() function
students = ["Ilya", "Andrey", "Sergey", "Daniil"]
grades = [5, 3, 4, 2]
new_gradebook = dict(zip(students, grades))
new_gradebook

{'Ilya': 5, 'Andrey': 3, 'Sergey': 4, 'Daniil': 2}

The convenient zip() function is used here; its use is not limited to creating
dictionaries. Like a zipper, it "fastens" (hence the name) several lists together.
For example, zip() makes pairs from a pair of lists (list() is needed to see them all at once):

In [33]:
#zip() makes a list of pairs from a pair of lists
list(zip([1,2,3],['a','b','c']))

[(1, 'a'), (2, 'b'), (3, 'c')]

This construction can be used when we need to iterate over the elements of two interconnected
lists. For example, this is how you can output information about which student has what grade, without using dictionaries:

In [34]:
#iterating over two related lists in parallel
for student, grade in zip(students, grades):
    print(student, "has grade", grade)

Ilya has grade 5
Andrey has grade 3
Sergey has grade 4
Daniil has grade 2


In [35]:
#The zip() function can also be used with more than two lists:
list(zip([1,2,3,4], [5,6,7,8], ['a','b','c','d']))

[(1, 5, 'a'), (2, 6, 'b'), (3, 7, 'c'), (4, 8, 'd')]

In [36]:
#If one of the lists turns out to be shorter, zip() will "trim" the rest of the lists:
list(zip([1,2,3], ['a','b']))

[(1, 'a'), (2, 'b')]
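
If the extra elements should not be dropped, the standard library offers `itertools.zip_longest`, which pads the shorter lists instead. A short sketch (the lists `numbers` and `letters` and the `"?"` filler are chosen for illustration):

```python
from itertools import zip_longest

numbers = [1, 2, 3]
letters = ["a", "b"]

# zip() stops at the shortest list
print(list(zip(numbers, letters)))
# [(1, 'a'), (2, 'b')]

# zip_longest() continues to the longest list and fills the gaps with None by default
print(list(zip_longest(numbers, letters)))
# [(1, 'a'), (2, 'b'), (3, None)]

# fillvalue replaces None with a value of your choice
print(list(zip_longest(numbers, letters, fillvalue="?")))
# [(1, 'a'), (2, 'b'), (3, '?')]
```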

### Which objects can be dictionary keys

So far, we have considered dictionaries whose keys are strings and numbers. In fact,
keys can be more complex objects. For example, here is
a fragment of an addition table implemented as a dictionary:

In [37]:
#creating a dictionary whose keys are tuples
sums = {(2,3): 5, (4, 1): 5, (5, 7): 12}
sums

{(2, 3): 5, (4, 1): 5, (5, 7): 12}

In [38]:
#output of the key value in the dictionary
print(sums[(2,3)])
print(sums[(5,7)])

5
12


In [39]:
#At this point, an important difference between tuples and lists appears:
#lists cannot be dictionary keys, because they are mutable (their contents can change)
#we will get an error when trying to create a dictionary entry whose key is a list
sums = { [1,2]: 3}

TypeError: unhashable type: 'list'
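
If the data is already in a list, it can be converted to a tuple before being used as a key. The sketch below also shows that a tuple is only usable as a key when everything inside it is hashable too (the names `pair` and `message` are illustrative):

```python
pair = [1, 2]

# Convert the list to a tuple so that it can serve as a key
sums = {tuple(pair): 3}
print(sums[(1, 2)])  # 3

# A tuple that contains a list is still rejected, because the inner list is mutable
message = ""
try:
    bad = {(1, [2, 3]): 6}
except TypeError as error:
    message = str(error)
print("TypeError:", message)
```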

### List comprehensions

We have often faced this task before: we are given a list of numbers written as
strings, and we need to create a new list in which they are actual numbers. We could solve this problem
using a loop:

In [40]:
#creating a list of numbers as strings
str_list = ["1", "5", "12", "7"]
#creating a list of numbers
int_list = []
#turning a list of strings into a list of numbers using a loop
for s in str_list:
    int_list.append(int(s))
print(int_list)

[1, 5, 12, 7]


Three lines are responsible for creating a new list. Writing them every time is quite boring, and
the creators of Python came up with (or rather, borrowed from functional programming languages, and
those borrowed it from mathematicians) a much more elegant syntax. It is arranged like this:

In [41]:
#a more elegant way to turn a list of strings into a list of numbers
int_list = [int(s) for s in str_list]
int_list

[1, 5, 12, 7]

The square brackets around the expression suggest that we are creating a list (because when
we need to create a list, we usually enclose its elements in square brackets). The expression inside
the brackets can be read literally: "int(s) for each s in str_list".

In [42]:
#The original list str_list has not changed:
str_list

['1', '5', '12', '7']

Similarly, you can apply any operation to the list items. For example, let's square all the
elements from int_list:

In [43]:
#list of squared numbers
[x**2 for x in int_list]

[1, 25, 144, 49]

In [44]:
#doubling list items
double_list = [x*2 for x in int_list]
double_list

[2, 10, 24, 14]

In [45]:
#adding one to each list item
[x+1 for x in int_list]

[2, 6, 13, 8]

In [46]:
#converting integers to floating-point numbers
[float(x) for x in int_list]

[1.0, 5.0, 12.0, 7.0]

As you can see, you can do anything with the list items! However, that's not all. List
comprehensions can also filter elements. For example, suppose we only need those elements
that are greater than 6. We can select them this way:

In [47]:
#creating a list of old items if the item is greater than 6
[x for x in int_list if x > 6]

[12, 7]

When we write x for x here, we mean that the elements of the old list go into
the new list unchanged (we only choose the right ones). But you can also modify them at the same time:

In [48]:
#creating a list of old elements if the element is greater than 6 and squaring it
[x**2 for x in int_list if x > 6]

[144, 49]
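
For comparison, here is a sketch of the same filter-and-transform step written as an ordinary loop (the names `squares_loop` and `squares_comp` are introduced only for this illustration). The condition after `if` in the comprehension plays the role of the `if` statement inside the loop.

```python
int_list = [1, 5, 12, 7]

# Loop version: test each element, then append the transformed value
squares_loop = []
for x in int_list:
    if x > 6:
        squares_loop.append(x**2)

# Comprehension version: the same logic in one expression
squares_comp = [x**2 for x in int_list if x > 6]

print(squares_loop)                  # [144, 49]
print(squares_loop == squares_comp)  # True
```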

Now let's solve this problem: there are two lists of numbers, and we want to find their element-wise sum.

In [49]:
#creating two lists of numbers
X = [2, 5, 54]
Y = [1, 3, 69]

In [50]:
#the code using list comprehensions looks nicer than if we used a loop of several lines
[x+y for x,y in zip(X,Y)]

[3, 8, 123]

In [51]:
#By the way, you can use a syntax similar to list comprehension to create dictionaries:
squared = {i: i**2 for i in range(10)}
squared

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}
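
A dictionary comprehension can also iterate over an existing dictionary and filter it. A minimal sketch, assuming the gradebook from earlier (the threshold of 4 and the name `good_grades` are arbitrary choices for illustration):

```python
gradebook = {"Ilya": 5, "Andrey": 5, "Sergey": 4, "Daniil": 2, "Maria": 4}

# Keep only the entries whose grade is at least 4
good_grades = {name: grade for name, grade in gradebook.items() if grade >= 4}

print(good_grades)  # {'Ilya': 5, 'Andrey': 5, 'Sergey': 4, 'Maria': 4}
```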

### The map() function

List comprehensions have an analogue that is now considered less convenient but still
occurs in practice: the map() function.

In [52]:
str_list #output of our list of strings

['1', '5', '12', '7']

In [53]:
#using the map() function
int_list = list(map(int, str_list))
int_list

[1, 5, 12, 7]

The map() function takes two arguments. The first is a function and the second is a list (or any
other iterable); map() applies the function to each item. It returns an iterator rather than a list,
which is why the result is wrapped in list(). In general, list(map(int, str_list)) and
[int(x) for x in str_list] are almost equivalent. <br>
When the action to be applied already exists as a function (as in the case of int), the
construction with map() looks even more concise than the list comprehension. But if we
need to do something less trivial, list comprehensions are clearly easier:

In [54]:
#if we need to do something less trivial, list comprehensions are clearly easier
[int(x)+1 for x in str_list]

[2, 6, 13, 8]

To implement this using map(), you need to declare a new function that will
return the value of the expression int(x)+1 and pass it to map().

In [55]:
#creating your own function
def my_func(x):
    return int(x)+1

#using your function in the map() function
list(map(my_func,str_list))

[2, 6, 13, 8]

For brevity, you can use lambda functions, but this approach is less readable than
list comprehensions, which are usually preferred today.
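
For reference, this is roughly what the lambda variant looks like next to the comprehension (a sketch; the names `with_lambda` and `with_comprehension` are illustrative):

```python
str_list = ["1", "5", "12", "7"]

# lambda defines a small unnamed function in place, so no separate def is needed
with_lambda = list(map(lambda x: int(x) + 1, str_list))

# The list comprehension expresses the same thing without a function at all
with_comprehension = [int(x) + 1 for x in str_list]

print(with_lambda)         # [2, 6, 13, 8]
print(with_comprehension)  # [2, 6, 13, 8]
```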

### A few words about efficiency

Using list comprehensions is not only pleasant but also practical: they usually run faster than
the equivalent code with a loop.

In [56]:
#creating a list of random numbers using list comprehensions
from random import random
from math import sqrt
N = 10000
mylist = [random() for _ in range(N)]

In [57]:
%%timeit

#measuring the time needed to fill a new list with the square roots of the old list's values
#we use a loop
newlist = []
for x in mylist:
    newlist.append(sqrt(x))

1.16 ms ± 144 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [58]:
%%timeit

#measuring the time needed to fill a new list with the square roots of the old list's values
#we use a list comprehension
newlist = [sqrt(x) for x in mylist]

778 µs ± 6.11 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [59]:
%%timeit

#measuring the time needed to fill a new list with the square roots of the old list's values
#we use the map() function
newlist = list(map(sqrt, mylist))

555 µs ± 1.53 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


As these measurements show (the %%timeit magic command measures how long a cell takes
to run), the list comprehension is faster than the ordinary loop. In this run map() was faster still, but the
gap between map() and a list comprehension is small and varies (sometimes a little slower, sometimes a little faster).
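
Outside a notebook, a similar comparison can be made with the standard `timeit` module. The sketch below is one possible setup, not the measurement used above: the function names and the choice of 20 repetitions are assumptions, and the numbers it prints will differ from machine to machine.

```python
from math import sqrt
from random import random
from timeit import timeit

N = 10000
mylist = [random() for _ in range(N)]

def with_loop():
    newlist = []
    for x in mylist:
        newlist.append(sqrt(x))
    return newlist

def with_comprehension():
    return [sqrt(x) for x in mylist]

def with_map():
    return list(map(sqrt, mylist))

REPEATS = 20  # how many times each function is called during the measurement
for func in (with_loop, with_comprehension, with_map):
    # timeit() returns the total time in seconds for all REPEATS calls
    total_seconds = timeit(func, number=REPEATS)
    print(func.__name__, round(total_seconds / REPEATS * 1e6), "microseconds per call")
```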

### Complex data structures

Lists allow you to store a series of values, but often you need to be able to work with more
complex structures, for example, tables. Some programming languages have
two-dimensional arrays. The analogue of a two-dimensional array in Python is a "list of lists", that is, a
list whose elements are other lists. We have already met with something similar. <br>

Consider an example: a table that records the results of several homework assignments for
several students. (Let's say we assigned numbers to the students, so we don't need
to know their names.) It can be written as a list of lists, row by row:

In [60]:
#creating a table (a list of lists)
table = [["HW1", "HW2", "HW3", "HW4"], [4, 3, 4, 4], [3, 4, 3, 4], [4, 5, 5, 4]]

Here, each element of the table list is a row of our table, that is, also a list.

In [61]:
#output what is in the third row and fourth column of the table
table[2][3]

4

What happened here?

In [62]:
#We first selected the third row of the table using [2]:
table[2]

[3, 4, 3, 4]

In [63]:
#And then from this third row, the fourth element was selected using [3]:
table[2][3]

4

In [64]:
#It would be possible to write this down in more detail:
row = table[2]
print(row[3])

4


In [65]:
#This is how you can print all the elements of the table line by line:
for row in table:
    print(*row)

HW1 HW2 HW3 HW4
4 3 4 4
3 4 3 4
4 5 5 4
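
Because the table is stored row by row, getting a column takes a little extra work. A sketch of two common approaches, assuming the same table (the names `header`, `rows`, `hw2` and `columns` are introduced for illustration):

```python
table = [["HW1", "HW2", "HW3", "HW4"], [4, 3, 4, 4], [3, 4, 3, 4], [4, 5, 5, 4]]

header = table[0]   # the first row holds the column names
rows = table[1:]    # the remaining rows hold the grades

# One column: take the element with the same index from every row
hw2 = [row[1] for row in rows]
print(hw2)  # [3, 4, 5]

# All columns at once: zip(*rows) passes each row to zip() as a separate argument,
# so the first tuple collects the first elements, the second the second ones, and so on
columns = list(zip(*rows))
print(columns)  # [(4, 3, 4), (3, 4, 5), (4, 3, 5), (4, 4, 4)]

# Sum of grades for each homework, labelled with the header
totals = {name: sum(column) for name, column in zip(header, columns)}
print(totals)  # {'HW1': 11, 'HW2': 12, 'HW3': 12, 'HW4': 12}
```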


Let's say now that we do want to know which student got which grade. Then, instead of a list
of lists, we could use a dictionary whose values are lists:

In [66]:
#creating a table in the form of a dictionary
gradebook = {'Bill': [4, 3, 2], 'Alice': [3, 4, 5], 'Bob': [5, 5, 4]}

In [67]:
#let's find out what grade Bob got on the second homework
gradebook["Bob"][1]

5
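
The tools from the previous sections combine naturally with such a structure. As an illustration (the names `averages` and `best` are made up for this sketch), here is one way to compute each student's average grade and find the student with the highest one:

```python
gradebook = {'Bill': [4, 3, 2], 'Alice': [3, 4, 5], 'Bob': [5, 5, 4]}

# Dictionary comprehension: student name -> average of that student's grades
averages = {name: sum(marks) / len(marks) for name, marks in gradebook.items()}
print(averages['Bill'])   # 3.0
print(averages['Alice'])  # 4.0

# max() iterates over the keys; key=averages.get compares them by their average
best = max(averages, key=averages.get)
print(best)  # Bob
```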