# Notebook №6. Information systems

by a student of the IS-20-1 group, Khromenko Danil.

## Python programming for data collection and analysis

### Sorting. Formatting strings

**\*\*kwargs**

Let's look at a way of passing arguments to a function using dictionaries. First, recall the two
usual ways of passing arguments, by position and by name:

In [1]:
#definition of the function that outputs x and y
def myfunc(x=0, y=1):
     print("x =", x)
     print("y =", y)

In [2]:
#the first method of passing arguments
myfunc(12,19)

x = 12
y = 19


In [3]:
#the second method of passing arguments
myfunc(y=2, x=5)

x = 5
y = 2


As the second example shows, arguments can be passed by specifying their names. Suppose we want
to write a function that accepts an arbitrary number of named arguments (we
don't even know in advance which ones). This can be done as follows:

In [4]:
#a function that accepts an indefinite number of named arguments
def new_func(**kwargs):
    print(kwargs)

In [5]:
#calling a function with random arguments
new_func(x=1, y=54, z=69, s = "Some string")

{'x': 1, 'y': 54, 'z': 69, 's': 'Some string'}


The two asterisks in the definition of new_func() mean the following: "place all named arguments
passed to this function into the kwargs dictionary." As you can see, this is exactly how it
works: for example, the argument x=1 became the entry 'x': 1 in the kwargs dictionary. However, this
happens only with arguments that do not match any named parameter of the function: if the function
had a separate parameter x, it would not have been included in kwargs.

In [6]:
#a function with a specific argument x and kwargs
def other_func(x, **kwargs):
     print(kwargs)
other_func(x=1, y=54, z=69)

{'y': 54, 'z': 69}
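
The two asterisks also work in the opposite direction, at the call site: a dictionary prefixed with ** is unpacked into named arguments. Below is a minimal sketch of this, reusing myfunc from above; the names params and split_args are made up for the illustration.

```python
# myfunc as defined earlier in this notebook
def myfunc(x=0, y=1):
    print("x =", x)
    print("y =", y)

# hypothetical dictionary whose keys match the parameter names of myfunc
params = {"x": 5, "y": 2}

# ** unpacks the dictionary: equivalent to myfunc(x=5, y=2)
myfunc(**params)

# hypothetical helper: x is a regular parameter, the rest lands in kwargs
def split_args(x, **kwargs):
    return x, kwargs

first, rest = split_args(**{"x": 1, "y": 54, "z": 69})
print(first, rest)
```

Every key in the unpacked dictionary must be a string; a key that matches no parameter raises a TypeError unless the function also accepts **kwargs.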


### Sorting

Sorting, that is, arranging the items of a list in a certain order,
is a common programming task. There are two main tools for sorting lists in Python. The first is the sort() method, which sorts in place, that is, it modifies
the list itself.

In [7]:
#creating a list of numbers
my_list = [9, 69, 2, 54, 7, 12, 8]

In [8]:
#sorting and list output
my_list.sort()
my_list

[2, 7, 8, 9, 12, 54, 69]

The sort() method changes the source list (and therefore, by the way, can only work with lists —
tuples do not have such a method). If you want to create a new list instead, you should
use the sorted() function.

In [9]:
#creating a list of numbers
my_list = [9, 69, 2, 54, 7, 12, 8]
#creating a new sorted list and displaying it
sorted_list = sorted(my_list)
sorted_list

[2, 7, 8, 9, 12, 54, 69]

In [10]:
#So we created a new list. The old one remained unchanged.
my_list

[9, 69, 2, 54, 7, 12, 8]
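
One consequence of sort() working in place is worth spelling out: the method returns None, so assigning its result to a variable is a common mistake. A small sketch (the variable names are illustrative):

```python
my_list = [9, 69, 2, 54, 7, 12, 8]

# sorted() returns a new list and leaves the original alone
new_list = sorted(my_list)

# sort() modifies the list in place and returns None
result = my_list.sort()

print(new_list)  # [2, 7, 8, 9, 12, 54, 69]
print(result)    # None
print(my_list)   # [2, 7, 8, 9, 12, 54, 69]
```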

The sorted() function can be applied not only to lists, but also to immutable sequences,
for example, to tuples. The output is always a list.

In [11]:
#sorting a tuple with the sorted() function
my_tuple = (7, 1, 2, 6)
print(sorted(my_tuple))

[1, 2, 6, 7]


### Sorting strings

You can sort lists consisting not only of numbers, but also of more complex objects, as long as
those objects can be compared with each other. For example, strings can be compared: they
are ordered lexicographically, that is, "alphabetically", the way they would appear in a dictionary
(meaning a regular paper dictionary, not the Python data type).

In [12]:
#string comparison
print("abcd" < "b")
print("abcd" < "addd")
print("a" < "aa")

True
True
True


In [13]:
#This is what sorting a list of strings looks like:
str_list = ["Bob", "Alice", "Bill", "Weigu"]
str_list.sort()
str_list

['Alice', 'Bill', 'Bob', 'Weigu']
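
One caveat, stated here as general Python behaviour rather than something shown in the cells above: strings are compared by the code points of their characters, so every uppercase Latin letter sorts before every lowercase one. The sketch below shows the effect on a made-up list, mixed_case; the key parameter in the last step is discussed later in this notebook.

```python
mixed_case = ["bob", "Alice", "bill", "Weigu"]

# uppercase letters have smaller code points than lowercase ones
print("Weigu" < "alice")  # True

# so the default order puts capitalized names first
default_order = sorted(mixed_case)
print(default_order)      # ['Alice', 'Weigu', 'bill', 'bob']

# comparing lowercased copies gives a case-insensitive order
ignore_case = sorted(mixed_case, key=str.lower)
print(ignore_case)        # ['Alice', 'bill', 'bob', 'Weigu']
```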

### Sorting and loops

You can use the sorted() function together with a for loop to process
the items of a collection in a certain order. For example, suppose we have a dictionary and want to output its
elements in ascending order of their keys.

In [14]:
#creating a dictionary
gradebook = {'Bob': 3, 'Alice': 5, 'Weigu': 4, 'Bill': 2}
#output of a dictionary sorted by key
for k in sorted(gradebook):
    print(k, gradebook[k])

Alice 5
Bill 2
Bob 3
Weigu 4


### More complex sorting examples

You can sort the list in reverse order (descending). To do this, use
the reverse parameter.

In [15]:
#sorting the list in descending order
sorted([4, 8, 1, 7], reverse=True)

[8, 7, 4, 1]

You can sort not only numbers and strings, but also more complex objects. For example, consider
the following table (implemented as a list of tuples), which records the names of students and their
grades for several papers.

In [16]:
#list of tuples - name, grades
names = [("Bob", 8, 4, 9),
        ("Alice", 7, 8, 9),
        ("Weigu", 7, 5, 3),
        ("Dan", 6, 4, 3)]

In [17]:
#sorting a list of tuples
names.sort()
names

[('Alice', 7, 8, 9), ('Bob', 8, 4, 9), ('Dan', 6, 4, 3), ('Weigu', 7, 5, 3)]

Judging by the result, it is reasonable to assume that the list was sorted by the first element, the student's name. Indeed, tuples are compared in much the same way as strings:
lexicographically. First, the first elements are compared. If they are equal, the second elements are compared, and so on.

In [18]:
#comparison of tuples
print(('a', 8) < ('b', 7))
print(('a', 8) < ('a', 7))

True
False


And what if we wanted to sort the tuples in the names list not by the first element,
but by the second or some other? To do this, use the key parameter, which specifies
the sorting key. Before we do this, we need to say a few words about how one
function can be passed to another function as a parameter.

### Digression: functions as arguments of functions

In [19]:
#a function that takes another function as an argument
def superfunc(f):
    return f(2)

This function takes some function f as an argument, calls it with the
number 2, and returns whatever f returned.

In [20]:
#calling the superfunc function with the sqrt function as an argument
from math import sqrt
superfunc(sqrt)

1.4142135623730951

We imported the sqrt() function from the math module and then passed it to the superfunc()
function as an argument. Note that there are no parentheses after sqrt when it is passed: we are not calling it, only handing it over to another function. The superfunc function took
our sqrt function and called it with the number 2 as an argument. That is, it calculated
the square root of two.

You can imagine that sqrt is a recipe written on a piece of paper. We hand this piece of paper
to the superfunc function, and it uses the recipe in its own way. If we hand it another piece of paper,
it will use that one instead.

In [21]:
#passing another argument to a function
superfunc(print)

2


In [22]:
#passing another argument to a function
def plusone(x):
    return x + 1
superfunc(plusone)

3

If we try to pass something that is not a function to superfunc, for example a string or a number,
the call fails, because superfunc tries to call its argument.

In [23]:
#argument passing error (function is expected as an argument)
superfunc("sqrt")

TypeError: 'str' object is not callable
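
If a function like superfunc may receive arbitrary input, one option is to check the argument with the built-in callable() before calling it. This is only a sketch: safe_superfunc is a hypothetical variant that does not appear in the original notebook, and the wording of the error message is an arbitrary choice.

```python
from math import sqrt

def safe_superfunc(f):
    # callable() is a built-in that reports whether an object can be called
    if not callable(f):
        raise TypeError("expected a function, got %r" % (f,))
    return f(2)

print(safe_superfunc(sqrt))

try:
    safe_superfunc("sqrt")
except TypeError as err:
    message = str(err)
    print(message)  # expected a function, got 'sqrt'
```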

### Sorting Keys

Let's return to the problem of sorting a table presented as a list of tuples. To
sort such a list by the second element, you must first create a function that
will return the second element of the tuple (or list) passed to it.

In [24]:
#the function returns the second element of the list, tuple, etc.
def get_second_element(x):
    return x[1]

In [25]:
#executing the get_second_element function
get_second_element([7, 8, 4, 2])

8

Now we pass this function as a key parameter to the sort() method (the sorted() function will also
work):

In [26]:
#sorting a list of tuples by the second element
names.sort(key=get_second_element)
names

[('Dan', 6, 4, 3), ('Alice', 7, 8, 9), ('Weigu', 7, 5, 3), ('Bob', 8, 4, 9)]

It can be seen that the rows are now ordered by the second column (the first grade): Dan has
the lowest (6), Bob has the highest (8), and Alice and Weigu have the same (7).

A natural question arises: how are the rows corresponding to Alice and
Weigu ordered? Answer: in the order in which they appeared in the list before sorting (Python's sort is stable). This is convenient if you
want to sort by one criterion and break ties by another: just sort
twice, first by the secondary (tie-breaking) criterion and then by the primary one.

To avoid defining a function like get_second_element every time, you can use a ready-made one:
the itemgetter function from the operator module.

In [27]:
from operator import itemgetter
#ordered by the third column
sorted(names, key=itemgetter(2))

[('Dan', 6, 4, 3), ('Bob', 8, 4, 9), ('Weigu', 7, 5, 3), ('Alice', 7, 8, 9)]

In [28]:
#ordered by the fourth column
sorted(names, key=itemgetter(3))

[('Dan', 6, 4, 3), ('Weigu', 7, 5, 3), ('Alice', 7, 8, 9), ('Bob', 8, 4, 9)]
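
For a one-off key, a lambda expression is another common option: an anonymous function written inline. The sketch below assumes the same names table as above; sorting by the sum of the three grades is an illustrative choice that the original notebook does not make.

```python
names = [("Bob", 8, 4, 9),
         ("Alice", 7, 8, 9),
         ("Weigu", 7, 5, 3),
         ("Dan", 6, 4, 3)]

# equivalent to key=itemgetter(1): order by the second column
by_first_grade = sorted(names, key=lambda row: row[1])
print(by_first_grade)

# a key can be any computed value, e.g. the total of all three grades
by_total = sorted(names, key=lambda row: sum(row[1:]))
print(by_total)
```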

Let's say we want to sort by the third column and, where the third column holds the same
grade, alphabetically. This can be done as follows: first sort alphabetically, and then by
the third column.

In [29]:
#first, let's sort alphabetically, and then — by the third column.
print(names)
names.sort(key=itemgetter(0))
print(names)
names.sort(key=itemgetter(2))
print(names)

[('Dan', 6, 4, 3), ('Alice', 7, 8, 9), ('Weigu', 7, 5, 3), ('Bob', 8, 4, 9)]
[('Alice', 7, 8, 9), ('Bob', 8, 4, 9), ('Dan', 6, 4, 3), ('Weigu', 7, 5, 3)]
[('Bob', 8, 4, 9), ('Dan', 6, 4, 3), ('Weigu', 7, 5, 3), ('Alice', 7, 8, 9)]
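
The same ordering can also be obtained in a single pass: itemgetter called with several indices returns a tuple, and tuples compare lexicographically. The sketch below relies on that; the descending variant with a lambda is an extra illustration, not part of the original notebook.

```python
from operator import itemgetter

names = [("Dan", 6, 4, 3), ("Alice", 7, 8, 9),
         ("Weigu", 7, 5, 3), ("Bob", 8, 4, 9)]

# the key is the tuple (third column, name): equal grades fall back to the name
one_pass = sorted(names, key=itemgetter(2, 0))
print(one_pass)

# highest grade first, names still ascending: negate the numeric part of the key
best_first = sorted(names, key=lambda row: (-row[2], row[0]))
print(best_first)
```

Negating part of the key works only for numeric values; for mixed directions on non-numeric columns, two successive stable sorts remain the simpler route.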


### Formatting strings

It is often necessary to insert the values of some variables into a string. Here is an example that we
have already met.

In [30]:
#inserting variables into a string
name = "Alice"
grade = 5
print("Student", name,"has grade", grade)

Student Alice has grade 5


Using print(), you can print such a string, but if we wanted to pass it to some other function, we would need a different approach. Fortunately, one already exists!

There are two common ways to substitute the value of variables into a string (this is often
called interpolation, although it has nothing to do with the mathematical
operation of the same name). The first method is more classic.

In [31]:
#substituting the values of variables into a string
new_str = "Student %s has grade %i" % (name, grade)
print(new_str)

Student Alice has grade 5


The % operator is used here, which performs the following operation for strings: takes the string to the left
of it, finds all the "placeholders" there — in this case, these are %s and %i, and then
takes the variables listed to the right of it (it can be one variable or a tuple
of several variables, as in this case) and substitutes them sequentially — the first
variable in place of the first placeholder, the second in place of the second, etc.

The letters in placeholders denote the type of variable: in this case, %s is a string, and %i is an integer.

In [32]:
#a few more examples of substituting the values of variables into a string
#i means integer: a float is truncated to its integer part
print("The number is %i" % 2.3)
#f means float
print("The number is %f" % 2)
#two digits after the decimal point
print("The number is %.2f" % 2.1393)
#pad with zeros to a width of four characters
print("The number is %04i" % 3)

The number is 2
The number is 2.000000
The number is 2.14
The number is 0003


When using the % operator, you need to be careful: it has the same precedence as
multiplication and division, and such operators are evaluated from left to right, so you may get an unexpected result if you don't
use parentheses.

In [33]:
#logically incorrect execution of operations
print("a = %i" % 3*3)

a = 3a = 3a = 3


The following happened here: first, the expression "a = %i" % 3 was evaluated, and then the result was multiplied
by 3 (which for strings means repeating the string three times).

In [34]:
#If you wanted to substitute the result of executing 3*3, then you had to do this
print("a = %i" % (3*3))

a = 9


The second way of formatting ("new") is to use the format() method.

In [35]:
#using the format() method.
"hello, {0}, this is {1}, again {0}, {var}".format(7, 9, var="test")

'hello, 7, this is 9, again 7, test'

There is no need to explicitly specify data types (a string representation
of the value is substituted). The same value can be used several times, and values can be referred to
by number or by name. You can also omit the numbers, in which case the values are
substituted in order.

#using the format() method without explicit numbers
"First var: {}, the second one: {}".format(8, 1)

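Format specifications similar to those of the % operator are available with format() too: they go after a colon inside the braces. A short sketch mirroring the earlier % examples (the variable names are illustrative):

```python
# two digits after the decimal point (compare with "%.2f")
rounded = "The number is {:.2f}".format(2.1393)
print(rounded)  # The number is 2.14

# pad with zeros to a width of four (compare with "%04i")
padded = "The number is {:04d}".format(3)
print(padded)   # The number is 0003

# specifications can be combined with positions and names
mixed = "{name} scored {0:.1f}".format(4.72, name="Alice")
print(mixed)    # Alice scored 4.7
```
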
### Tricks with real numbers

In [36]:
#trivial addition of real numbers
print("%f" % (0.1+0.2))

0.300000


In [37]:
#but let's increase the accuracy
print("%.18f" % (0.1+0.2))

0.300000000000000044


When we asked to output the result with 18 decimal places, unexpected extra digits appeared at the end. This is because computers store numbers in
binary, and in binary numbers like 0.1 are infinite
repeating fractions that cannot be written with a finite number of digits. The stored values are therefore rounded, and further rounding errors occur during arithmetic
operations, which leads to such effects.
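
To see what is actually stored, one can convert the float 0.1 to a Decimal, a conversion that is performed exactly. A small sketch using only the standard library:

```python
from decimal import Decimal

# Decimal(float) converts the stored binary value exactly, without rounding
stored = Decimal(0.1)
print(stored)  # 0.1000000000000000055511151231257827021181583404541015625

# the stored value is slightly larger than one tenth
print(stored > Decimal("0.1"))  # True

# a float is really a ratio of two integers with a power-of-two denominator
numerator, denominator = (0.1).as_integer_ratio()
print(numerator, denominator)
```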

Sometimes these effects become dangerous. Do you think 0.1 + 0.2 is 0.3?

In [38]:
#Your computer has a different opinion on this
0.1 + 0.2 == 0.3


False
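
When floats have to be compared anyway, a common approach is to test whether they are close enough rather than exactly equal. Here is a sketch using math.isclose from the standard library with its default tolerances:

```python
import math

total = 0.1 + 0.2

# exact comparison fails because of the rounding error
exact = (total == 0.3)
print(exact)  # False

# isclose() accepts a small relative difference (1e-09 by default)
close = math.isclose(total, 0.3)
print(close)  # True
```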

However, do not despair: you can use exact rational numbers (the fractions module) or the
decimal module for working with decimal fractions.

In [39]:
from fractions import Fraction
#use of ordinary fractions
Fraction(1, 10) + Fraction(2, 10)

Fraction(3, 10)

In [40]:
#use of ordinary fractions
Fraction(1, 3) + Fraction(1,2)

Fraction(5, 6)

In [41]:
from decimal import Decimal
#using a special decimal module to work with decimals
Decimal("0.1")+Decimal("0.2")

Decimal('0.3')

In [42]:
#using a special decimal module to work with decimals
Decimal("0.1") + Decimal("0.2") == Decimal("0.3")

True