# Notebook №5. Information systems

by a student of the IS-20-1 group, Khromenko Danil.
<br>

## Python programming for data collection and analysis

### Dictionaries

Consider this problem: we have information about students' grades in a certain subject and we
want to be able to work with this information, for example, to look up a student's grade
by name. We could try to solve this problem by creating two lists, one with
the names of students and the other with grades:

In [1]:
#Ilya got 5, Andrey got 3, etc.
students = ["Ilya", "Andrey", "Sergey", "Daniil"]
grades = [5, 3, 4, 2]
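
With two parallel lists, looking up a grade by name takes two steps: find the position of the name, then read the grade at the same position. A minimal sketch of that approach (the helper name `grade_of` is ours and is only an illustration):

```python
students = ["Ilya", "Andrey", "Sergey", "Daniil"]
grades = [5, 3, 4, 2]

def grade_of(name):
    # hypothetical helper: locate the name, then read the grade at that position
    position = students.index(name)  # raises ValueError if the name is absent
    return grades[position]

print(grade_of("Sergey"))  # 4
```

This works, but the two lists have to be kept in sync by hand, and `index()` scans the list of names from the start on every lookup.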

It would be nice if we could have a data type in which elements are numbered
not by natural numbers, but by arbitrary objects. This data type exists: in Python it
is called a dictionary.

In [2]:
#This is how you can create a dictionary in Python:
gradebook = {"Ilya": 5, "Andrey": 3, "Sergey":4, "Daniil": 2}

This is similar to creating a list, but there are a number of differences. First, we used curly brackets
instead of square ones to show that we are creating a dictionary. Second, the dictionary consists of
entries, and each entry consists of two parts: a key and a value. The key and the value
are separated by a colon. For example, the entry "Andrey": 3 has the key "Andrey" and the value 3.
In total, our gradebook dictionary now contains four entries, whose keys are the names
of students and whose values are their grades.

In [3]:
#display the dictionary on the screen
gradebook

{'Ilya': 5, 'Andrey': 3, 'Sergey': 4, 'Daniil': 2}

Note that the entries are printed in the order in which they were added. Starting with Python 3.7,
a dictionary preserves insertion order; in older versions the output order was arbitrary.
Even so, entries are not numbered by position: you cannot access, for example, the "first entry"
by index, but you can access the entry with a given key:

In [4]:
#output the value of the key "Daniil"
gradebook['Daniil']

2

In [5]:
#output the value of the key "Ilya"
gradebook['Ilya']

5

In [6]:
#You can change the value of an entry, just like you can change a list item.
gradebook['Andrey'] = 5
gradebook

{'Ilya': 5, 'Andrey': 5, 'Sergey': 4, 'Daniil': 2}

In [7]:
#You can add a new entry
gradebook['Maria'] = 4
gradebook

{'Ilya': 5, 'Andrey': 5, 'Sergey': 4, 'Daniil': 2, 'Maria': 4}

In [8]:
#If we try to access a record that does not exist, we will receive an error message:
gradebook['Sasha']

KeyError: 'Sasha'

Often we want to be able to request a record, and if there isn't one, get some kind of "default value", not an error. To do this, use the get() method instead
of square brackets.

In [9]:
#getting a record using the get() method
gradebook.get('Alice')

In [10]:
#None was returned here:
print(gradebook.get('Alice'))

None


In [11]:
#getting a record using the get() method
gradebook.get('Andrey')

5

You can also pass get() a second argument; if the key is not in the dictionary,
that argument is returned instead of None.

In [12]:
#if there is no key, it will return the second argument of the get() method
gradebook.get('Alice', 'No such student')

'No such student'

In [13]:
#if the key exists, get() returns its value and ignores the second argument
gradebook.get('Daniil', 'No such student')

2
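
A common use of the default value is counting. The sketch below, with a made-up list of grades, counts how many times each grade occurs; `get(g, 0)` supplies 0 the first time a grade is seen:

```python
# illustrative data, not taken from the gradebook above
grades = [5, 3, 4, 2, 4, 5, 4]

counts = {}
for g in grades:
    # if g is not a key yet, get() returns 0 instead of raising KeyError
    counts[g] = counts.get(g, 0) + 1

print(counts)  # {5: 2, 3: 1, 4: 3, 2: 1}
```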

In [14]:
#You can get a list of all the dictionary keys:
gradebook.keys()

dict_keys(['Ilya', 'Andrey', 'Sergey', 'Daniil', 'Maria'])

Strictly speaking, this is not a list but a view object: it behaves almost like a list, and you can
make a list out of it with list(). The same holds for the collection of all dictionary values.

In [15]:
#You can get a list of all dictionary values:
gradebook.values()

dict_values([5, 5, 4, 2, 4])
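
What keys() and values() return are view objects rather than lists. The difference shows up when the dictionary changes afterwards: a view follows the dictionary, while a list made from it is a snapshot. A small sketch with a shortened gradebook:

```python
gradebook = {"Ilya": 5, "Andrey": 5, "Sergey": 4}

keys_view = gradebook.keys()   # a live view of the keys
names = list(keys_view)        # a separate list, fixed at this moment

gradebook["Maria"] = 4         # change the dictionary afterwards

print(names)            # ['Ilya', 'Andrey', 'Sergey']
print(list(keys_view))  # ['Ilya', 'Andrey', 'Sergey', 'Maria']
```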

Dictionary keys do not have to be strings. Let's say we want to create a dictionary in which
the keys are numbers. Nothing could be easier:

In [22]:
#creating a dictionary with number keys
squares={1:1, 2:4, 3:9}

In [17]:
squares

{1: 1, 2: 4, 3: 9}

In [18]:
#In the next two lines, squares behaves roughly like a list, but if you look closely, 
#you can see that this is not a list, but still a dictionary.
print(squares[1])
print(squares[2])

1
4


In [19]:
#For example, any non-empty list has an element with index 0, but squares does not have such an element:
#we refer to a non-existent dictionary key
squares[0]

KeyError: 0

### Iterating over dictionary entries

How do we process the information in a dictionary? To iterate through all the elements of a list, we could
use a for loop. What happens if you give it a dictionary instead of a list? Let's try:

In [20]:
#trying to iterate through the dictionary with a for loop
for i in gradebook:
    print(i)

Ilya
Andrey
Sergey
Daniil
Maria


I see! The for loop in this case iterates through all the keys of our dictionary. And knowing the key, you can
get the value:

In [25]:
#getting all keys and values using the for loop
for i in gradebook:
    print("Student", i, "has a grade", gradebook[i])

Student Ilya has a grade 5
Student Andrey has a grade 5
Student Sergey has a grade 4
Student Daniil has a grade 2
Student Maria has a grade 4


However, there is a more elegant way to get the key and value of the next record at once: use items().

In [26]:
#getting all keys and values using the for loop (a slightly different method)
for i, k in gradebook.items():
    print("Student", i, "has a grade", k)

Student Ilya has a grade 5
Student Andrey has a grade 5
Student Sergey has a grade 4
Student Daniil has a grade 2
Student Maria has a grade 4


How does this code work? It uses the items() method, which returns a view object (not quite a list,
but something you can iterate over) consisting of tuples of the form (key, value).

In [28]:
#creating a list of tuples from a dictionary
list(gradebook.items())

[('Ilya', 5), ('Andrey', 5), ('Sergey', 4), ('Daniil', 2), ('Maria', 4)]

In this case, the for loop takes the next tuple on each pass
and assigns its first element (that is, the key) to the variable i and its second element (that is,
the value) to the variable k (of course, these variables could be named differently).
We have already seen similar behavior when discussing enumerate().
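
As a reminder of that parallel, the sketch below shows that enumerate() also produces tuples, which the for loop can either take whole or unpack into two variables (the list of names is only an example):

```python
students = ["Ilya", "Andrey", "Sergey"]

# each item produced by enumerate() is an (index, element) tuple
pairs = list(enumerate(students))
print(pairs)  # [(0, 'Ilya'), (1, 'Andrey'), (2, 'Sergey')]

# the for loop unpacks every tuple into two variables, just as with items()
for index, name in enumerate(students):
    print(index, name)
```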

In [29]:
#this way you can find all records with a given value
for k, v in gradebook.items():
    if v == 4:
        print(k)

Sergey
Maria


Note that such a "value search" requires going through all the entries in the dictionary, so for a large
dictionary it will take a lot of time, whereas a "key search" is still
performed quickly. By the way, you can quickly check whether the dictionary contains an entry with a given
key:

In [30]:
#checking whether an entry with this key exists in the dictionary
"Andrey" in gradebook

True

In [31]:
#checking whether an entry with this key exists in the dictionary
"Michael" in gradebook

False

If we wanted to search among the values, we would have to say so explicitly by using
the values() method:

In [32]:
#checking whether an entry with this value exists in the dictionary
1 in gradebook.values()

False

In [33]:
#checking whether an entry with this value exists in the dictionary
5 in gradebook.values()

True
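
Because searching among values scans every entry, it can pay off to build a reverse dictionary once when such lookups are frequent. The sketch below is one possible way to do it; the name `students_by_grade` is ours:

```python
gradebook = {"Ilya": 5, "Andrey": 5, "Sergey": 4, "Daniil": 2, "Maria": 4}

# reverse dictionary: grade -> list of students with that grade
students_by_grade = {}
for name, grade in gradebook.items():
    if grade not in students_by_grade:
        students_by_grade[grade] = []   # first student with this grade
    students_by_grade[grade].append(name)

print(students_by_grade[4])          # ['Sergey', 'Maria']
print(students_by_grade.get(1, []))  # []
```

The values are lists because several students can share a grade. The reverse dictionary has to be rebuilt (or updated) whenever the gradebook changes.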

In general, the in operator is not limited to dictionaries: it can
be used, for example, with lists and ranges:

In [34]:
#the in operator when working with lists
5 in [1,2,3,5,8]

True

In [35]:
#the in operator when working with a range
6 in range(1,5)

False

### Creating dictionaries and the zip() function

There are different ways to create dictionaries. For example, you can create an empty dictionary and gradually
fill it with elements:

In [36]:
#creating an empty dictionary
my_dict = {}

In [37]:
#filling the dictionary with values
my_dict[1] = 1
my_dict['hello'] = 'world'

In [38]:
#output of dictionary keys and values
my_dict

{1: 1, 'hello': 'world'}

Note that elements of different types (in this case, strings and integers) coexist happily in the same dictionary. <br>
You can also create a dictionary by passing the dict() function a list of key-value pairs (in
a sense, this is the reverse of the items() method):

In [39]:
#using the dict() function to fill the dictionary
my_dict = dict([('hello','world'), ('one', 'two')])

In [40]:
my_dict

{'hello': 'world', 'one': 'two'}
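
The "reverse operation" remark can be checked directly: turning a dictionary into a list of pairs with items() and passing that list back to dict() gives an equal dictionary. A short sketch with a two-entry gradebook:

```python
gradebook = {"Ilya": 5, "Andrey": 3}

pairs = list(gradebook.items())   # dictionary -> list of (key, value) tuples
rebuilt = dict(pairs)             # list of pairs -> dictionary again

print(pairs)                 # [('Ilya', 5), ('Andrey', 3)]
print(rebuilt == gradebook)  # True
```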

Let's say we have two lists, one contains the names of students, and the other contains their grades. How can
we create a dictionary from these lists for which names would be keys and grades would be values? Like this:

In [44]:
#creating a dictionary from two lists using the zip() function
students = ["Ilya", "Andrey", "Sergey", "Daniil"]
grades = [5, 2, 4, 3]
new_gradebook = dict(zip(students, grades))
new_gradebook

{'Ilya': 5, 'Andrey': 2, 'Sergey': 4, 'Daniil': 3}

The convenient zip() function used here is not limited to creating
dictionaries. Like a zipper, it "fastens" (hence the name) several lists together.
For example, zip() turns a pair of lists into a sequence of pairs, which list() converts to a list:

In [45]:
#zip() makes a list of pairs from a pair of lists
list(zip([1,2,3],['a','b','c']))

[(1, 'a'), (2, 'b'), (3, 'c')]

This construction can be used when we need to iterate over the elements of two interconnected
lists. For example, this is how you can output information about which student has what grade, without using dictionaries:

In [46]:
#iterating through two lists without using dictionaries
for student, grade in zip(students, grades):
    print(student, "has grade", grade)

Ilya has grade 5
Andrey has grade 2
Sergey has grade 4
Daniil has grade 3


In [47]:
#The zip() function can also be used with more than two lists:
list(zip([1,2,3,4], [5,6,7,8], ['a','b','c','d']))

[(1, 5, 'a'), (2, 6, 'b'), (3, 7, 'c'), (4, 8, 'd')]

In [48]:
#If one of the lists turns out to be shorter, zip() will "trim" the rest of the lists:
list(zip([1,2,3], ['a','b']))

[(1, 'a'), (2, 'b')]
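
If trimming is not what you want, the standard library offers `itertools.zip_longest`, which pads the shorter lists instead. This function is not used elsewhere in the notebook; the sketch only shows the alternative:

```python
from itertools import zip_longest

# missing elements are filled with None by default
padded = list(zip_longest([1, 2, 3], ['a', 'b']))
print(padded)  # [(1, 'a'), (2, 'b'), (3, None)]

# fillvalue lets you choose the placeholder
padded_custom = list(zip_longest([1, 2, 3], ['a', 'b'], fillvalue='?'))
print(padded_custom)  # [(1, 'a'), (2, 'b'), (3, '?')]
```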

### Which objects can be dictionary keys

So far, we have considered dictionaries whose keys are strings and numbers. In fact,
keys can be more complex objects. For example, imagine this
implementation of a fragment of the addition table as a dictionary:

In [50]:
#implementation of a fragment of the addition table in the form of a dictionary
sums = {(2,3): 5, (4, 1): 5, (5, 7): 12}
sums

{(2, 3): 5, (4, 1): 5, (5, 7): 12}

Here, the keys are tuples consisting of two numbers, and the values are the sums of these numbers.

In [53]:
#output of dictionary values by key
print(sums[(2,3)])
print(sums[(5,7)])

5
12


At this point, an important difference between tuples and lists appears: lists cannot be
dictionary keys, because they are mutable and therefore not hashable:

In [55]:
#lists cannot be dictionary keys because they can change
sums = { [1,2]: 3, [4, 1]: 5, [5, 7]: 12}

TypeError: unhashable type: 'list'
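
When the data arrives as lists, one possible workaround is to convert each list to a tuple before using it as a key. A minimal sketch with illustrative data:

```python
pairs = [[2, 3], [4, 1], [5, 7]]

sums = {}
for pair in pairs:
    # tuple(pair) makes an immutable copy that is allowed as a key
    sums[tuple(pair)] = sum(pair)

print(sums)          # {(2, 3): 5, (4, 1): 5, (5, 7): 12}
print(sums[(4, 1)])  # 5
```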

### List comprehensions

We have often faced this task before: we are given a list of numbers written as
strings, and we need to create a new list in which they are actual numbers. We could solve this problem
using a loop:

In [57]:
#converting a list of strings to a list of numbers
str_list = ["1", "5", "12", "7"]
int_list = []
for s in str_list:
    int_list.append(int(s))
print(str_list)
print(int_list)

['1', '5', '12', '7']
[1, 5, 12, 7]


Three lines are responsible for creating the new list. Writing them every time is pretty boring, so
the creators of Python came up with (or rather, borrowed from functional programming languages, which
in turn borrowed it from mathematicians) a much more elegant syntax. It looks like this:

In [59]:
#converting a list of strings to a list of numbers
int_list = [int(s) for s in str_list]

The square brackets around the expression should suggest that we are creating a list (because when
we need to create a list, we usually enclose its elements in square brackets). The expression inside
the brackets should be read literally:<br>
a list consisting of the elements int(s) for each s in the list str_list

In [61]:
#list output
int_list

[1, 5, 12, 7]

See? The quotes have disappeared: we have a list of numbers in front of us. Magic! The original list
str_list has not changed:

In [62]:
#list output
str_list

['1', '5', '12', '7']

Similarly, you can apply any operation to the list items. For example, let's square all the
elements from int_list:

In [64]:
#squaring the elements of a list of numbers
[x**2 for x in int_list]

[1, 25, 144, 49]

In [66]:
#doubling all list items
double_int_list = [x*2 for x in int_list]
double_int_list

[2, 10, 24, 14]

In [67]:
#or convert the list items into floating point numbers
[float(x) for x in int_list]

[1.0, 5.0, 12.0, 7.0]

As you can see, you can do anything with the list items! However, that's not all.
List comprehensions can also filter elements. For example, suppose we only need those elements
that are greater than 6. We can select them this way:

In [68]:
#output of list items whose value is greater than 6
[x for x in int_list if x > 6]

[12, 7]

When we write x for x here, we mean that the elements of the old list
go into the new list unchanged (we only choose the right ones). But you can also modify them somehow:

In [69]:
#output of squared list items whose value is greater than 6
[x**2 for x in int_list if x > 6]

[144, 49]
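
To see what the comprehension with a condition stands for, it may help to write the same thing as an ordinary loop. The sketch below builds the list both ways and compares the results (the variable names are ours):

```python
int_list = [1, 5, 12, 7]

# the long way: loop, test the condition, append the transformed element
squares_loop = []
for x in int_list:
    if x > 6:
        squares_loop.append(x**2)

# the same thing as a list comprehension
squares_comp = [x**2 for x in int_list if x > 6]

print(squares_loop)                  # [144, 49]
print(squares_loop == squares_comp)  # True
```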

Now let's solve this problem: there are two lists with numbers, and we want to find their element-wise sum.

In [70]:
#there are two lists with numbers
X = [2, 5, 8]
Y = [1, 3, 100]

It can be solved in this way (to iterate over the elements of two lists at the same time, we use
the zip() construction discussed above):

In [74]:
#finding the element-wise sum of the two lists
Z = [x+y for x, y in zip(X, Y)]
Z

[3, 8, 108]

By the way, you can use a syntax similar to list comprehensions to create dictionaries:

In [75]:
#you can use a syntax similar to list comprehensions to create dictionaries.
squared = {i: i**2 for i in range(10)}
squared

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}
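
A dictionary comprehension can iterate over items() and take a condition, just like a list comprehension. The sketch below keeps only the students with a grade of 4 or higher; the threshold and the name `passed` are assumptions made for the example:

```python
gradebook = {"Ilya": 5, "Andrey": 5, "Sergey": 4, "Daniil": 2, "Maria": 4}

# keep only the entries whose value satisfies the condition
passed = {name: grade for name, grade in gradebook.items() if grade >= 4}

print(passed)  # {'Ilya': 5, 'Andrey': 5, 'Sergey': 4, 'Maria': 4}
```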

### The map() function

List comprehensions have an analogue that is now considered less convenient but still
occurs sometimes: the map() function.

In [77]:
#output a list of strings
str_list

['1', '5', '12', '7']