# Notebook №7. Information Technologies

by Alex Filozop from IS/b-20-2-o

## Features of working with mutable data types

To begin with, I would like to discuss some subtleties when working with lists, dictionaries and other mutable data types. Consider a simple example of a function call.

In [15]:
# define the function that increments its argument and returns the result
def my_func(x):
	x = x + 1
	return x

This function increments its argument by one and returns the result.

In [16]:
y = 5 # assign the integer value to the variable
print("Function returns", my_func(y)) # print a return value in pretty format 
print("y =", y) # print a variable's value in pretty format

Function returns 6
y = 5


Nothing unexpected. The function did not change the variable y and should not have done so. Let's now try to do something similar with the list.

In [17]:
# define the function that appends integer 1 to a list
def other_func(my_list):
  my_list.append(1)
  return my_list

This function takes a list as an argument, appends the element 1 to it, and returns the list. Let's try to call it:

In [18]:
this_list = [6, 9, 33] # create a list of integers
print("this_list before function:",this_list) # print the list
print("function returns:",other_func(this_list)) # print a return value
print("this_list after function", this_list) # print the list again

this_list before function: [6, 9, 33]
function returns: [6, 9, 33, 1]
this_list after function [6, 9, 33, 1]


Oh. Something strange happened. The `other_func` function modified the list that was passed to it, although the list was created outside the function and was not declared `global` inside it (and, in fact, the function worked with a differently named variable, `my_list`).

Why did this happen? To understand this issue, the easiest way is to look at the visualization.

In [19]:
%load_ext tutormagic

The tutormagic extension is already loaded. To reload it, use:
  %reload_ext tutormagic


In [20]:
%%tutor --lang python3

# visualize the code below; the %%tutor cell magic must be the first line of the cell
# define the function that appends integer 1 to a list
def other_func(my_list):
  my_list.append(1)
  return my_list

this_list = [6, 9, 33] # create list of integers
other_func(this_list) # call the function


## Two sorts

Let's look at two more examples that show the difference between these two scenarios; they again involve lists. Run both visualizations and work out what the difference is. After that, read on.

In [21]:
%%tutor --lang python3

# define the function that sorts a collection by calling sorted() function
def return_sorted(my_list):
  my_list = sorted(my_list)
  return my_list

this_list = [33, 1, 55] # create a list of integers
print("this_list before function:", this_list) # print that list
print("function returned:", return_sorted(this_list)) # print a return value
print("this_list after function:", this_list) # print that list again

In [22]:
%%tutor --lang python3

# define the function that sorts a collection by calling sort() method
def sort_and_return(my_list):
  my_list.sort()
  return my_list

this_list = [33, 1, 55] # create a list 
print("this_list before function:", this_list) # print that list
print("function returned:", sort_and_return(this_list)) # print a return value
print("this_list after function:", this_list) # print that list again

So, what's the difference?

The `return_sorted()` and `sort_and_return()` functions do roughly the same job: they take a list as input, sort it, and return it. At the moment the function is entered (Step 5), the situation in both fragments is identical: outside the function there is a variable `this_list`, inside the function there is a variable `my_list`, and both refer to the same list. The key difference happens in the next step.

The `return_sorted()` function uses the `sorted()` function to create a new list and then assigns (`=`) the result to the `my_list` variable. As a result, the name `my_list` begins to refer to a new object (Step 7), while the old one (still referenced by `this_list`) remains intact.

`sort_and_return()` works quite differently: it uses the `sort()` method, which sorts the list in place. No new object is created, no assignment is performed, and the variable `my_list` continues to refer to the same list as before. That list simply ends up sorted.

Note the similarity of `return_sorted()` and `my_func()` from the example above: in both cases, the assignment operator is used. The difference is that with a number there was no other option, since numbers are immutable, whereas with a list the developer has a choice: create a new object and assign it to the old variable, or change the existing object without creating a new one.
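The difference can also be checked without the visualizer. The following is a minimal sketch that uses the `is` operator to test whether two names refer to the same object. The names `a`, `b`, `result_a` and `result_b` are introduced here only for illustration.

```python
# rebinding a name versus changing an object in place
def return_sorted(my_list):
  my_list = sorted(my_list)  # my_list now refers to a new list
  return my_list

def sort_and_return(my_list):
  my_list.sort()  # the list passed in is sorted in place
  return my_list

a = [33, 1, 55]
result_a = return_sorted(a)
print(result_a is a)  # False: a new object was returned
print(a)  # [33, 1, 55]: the original list is untouched

b = [33, 1, 55]
result_b = sort_and_return(b)
print(result_b is b)  # True: the same object was returned
print(b)  # [1, 33, 55]: the original list was sorted
```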

## Creating a copy

Suppose now that we want to write a function that accepts a list as input, returns the same list, but with one added element, and does not change the original list itself. This can be done, for example, like this:

In [23]:
# define the function that returns a copy of a list with appended second argument at the end
def return_append(L, a):
  new_L = L.copy()
  new_L.append(a)
  return new_L

In [24]:
outer_list = [7, 8, 9] # create a list of integers
print("outer_list before function", outer_list) # print that list
print("function returned", return_append(outer_list, 55)) # print a return value
print("outer_list after function", outer_list) # print that list again

outer_list before function [7, 8, 9]
function returned [7, 8, 9, 55]
outer_list after function [7, 8, 9]


The main trick here is `L.copy()`: recall that this method creates a copy of an existing list. We assign that copy to a new variable (so `new_L` is a name for a copy of the list `L`, not for the list `L` itself), and we can then do anything with the new list `new_L`. The old list will not change.
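As a quick check, the copy is a different object with equal contents. The sketch below also shows `list(L)` and the slice `L[:]`, two other common ways to get the same kind of copy. The names `original`, `copy_1`, `copy_2` and `copy_3` are introduced here only for illustration.

```python
original = [7, 8, 9]
copy_1 = original.copy()
copy_2 = list(original)  # another way to copy a list
copy_3 = original[:]  # a full slice also creates a copy

print(copy_1 == original)  # True: the contents are equal
print(copy_1 is original)  # False: these are different objects

copy_1.append(55)
print(original)  # [7, 8, 9]
print(copy_1)  # [7, 8, 9, 55]
```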

## Not just function calls

Problems similar to those discussed above arise not only when calling functions. Let's start with a simple example with a loop.

In [25]:
some_list = [7, 9, 11] # create a list of integers
# do useless increment
for x in some_list:
  x = x + 1
print(some_list) # print unchanged list

[7, 9, 11]


The list `some_list` has not changed, and this is not surprising. But let's now look at a slightly more complicated situation with a list of lists.

In [26]:
table = [[1, 5], [7, 9]] # create a complex list (table)
# for each row
for row in table:
  row.append(77) # append an integer at the end
print(table) # print the list (table)

[[1, 5, 77], [7, 9, 77]]


And again, "oh." What happened? Let's look at the visualizer:

In [27]:
%%tutor --lang python3
table = [[1, 5], [7, 9]] # create a complex list (table)
# for each row
for row in table:
  row.append(77) # append an integer at the end
print(table) # print the list (table)

When the first iteration of the loop starts (Step 3), the first element of the `table` list is assigned to the `row` variable. However, this element is itself a list, or rather a reference to a list. In the next step (Step 4), the element 77 is appended to this list. Then `row` becomes a reference to the second element of the `table` list, and 77 is appended to it as well.

Note the parallel with the previous section: here, too, a list method that changes the list *in place* is involved.
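If the original rows must stay intact, one option (a sketch, not the only approach) is to build new row lists instead of calling `append()` on the existing ones. The name `new_table` is introduced here only for illustration.

```python
table = [[1, 5], [7, 9]]
new_table = []
for row in table:
  new_table.append(row + [77])  # row + [77] creates a new list
print(new_table)  # [[1, 5, 77], [7, 9, 77]]
print(table)  # [[1, 5], [7, 9]]
```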

## Puzzle

What do you think will happen if you execute the following code? Try to do it, look at the result and try to explain.

In [28]:
A = [[]]*5 # create a list that contains 5 references to an empty list
A[0].append(1) # append an integer
print(A) # print the list

[[1], [1], [1], [1], [1]]
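One possible explanation, sketched below: `[[]]*5` repeats a reference to one and the same inner list five times, which the `is` operator confirms. Creating each inner list separately, here in a loop, gives independent lists. The names `shared` and `separate` are introduced only for illustration.

```python
shared = [[]] * 5
print(shared[0] is shared[1])  # True: every element is the same list

separate = []
for _ in range(5):
  separate.append([])  # a new empty list is created on every iteration
print(separate[0] is separate[1])  # False

separate[0].append(1)
print(separate)  # [[1], [], [], [], []]
```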


## Changing an iterated object in a loop

In the example above, we changed the contents of the inner lists, but the `table` list itself remained unchanged: its number of elements did not change, and its elements remained references to the same row lists as before. Is it possible to change the list itself while iterating over it? It turns out you can, although in most cases it is better not to. Before considering an example, let's recall how the `pop()` method works for a list:

In [29]:
L = [6, 9, 44, 8] # create a list of integers
print(L.pop()) # delete last element from the list and print it
print(L) # print whole list

8
[6, 9, 44]


It removes the last item from the list and returns it. Let's now apply it as follows:

In [30]:
L = [7, 8, 9, 10] # create a list of integers
# on each loop iteration delete the last element from the list and print it, then print the current element
for x in L:
  print("Pop element", L.pop())
  print(x)

Pop element 10
7
Pop element 9
8


The loop body is executed only twice: by the time the loop finishes processing element 8, elements 9 and 10 have already been removed from the list, so no unprocessed elements remain and the loop stops.

You can guess what will happen only by studying the code very carefully. This means that the code is not very good: with good code, it is clear at a glance what it will do.
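If the goal is to take elements from the end until the list is empty, a `while` loop states that intent more directly. This is only a sketch of one alternative, and the name `popped` is introduced for illustration.

```python
L = [7, 8, 9, 10]
popped = []
while L:  # an empty list is treated as False
  popped.append(L.pop())
print(popped)  # [10, 9, 8, 7]
print(L)  # []
```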

The situation is different with dictionaries.

In [31]:
d = {1:2, 3:4} # create a dictionary with 2 pairs
# on each iteration delete the pair with key 3 from the dictionary and print the current pair
for k, v in d.items():
  del d[3]
  print(k, v)

1 2


RuntimeError: dictionary changed size during iteration

Here the `del d[3]` statement removes the element with key 3 from the dictionary. Once the size of a dictionary changes, the iterator can no longer tell reliably which elements it has already visited and which remain, so Python refuses to continue and raises a `RuntimeError`. Note that the first pair was still printed: the error is raised when the loop tries to move on to the next element.
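If keys do have to be removed inside a loop, a common workaround is to iterate over a snapshot of the keys rather than over the dictionary itself. A minimal sketch (the third pair is added here only to make the result more visible):

```python
d = {1: 2, 3: 4, 5: 6}
for k in list(d):  # list(d) is a separate list of the keys
  if k == 3:
    del d[k]  # the dictionary changes, the list of keys does not
print(d)  # {1: 2, 5: 6}
```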

However, this does not mean that dictionary values cannot be changed inside a loop. Suppose, for example, that we want to add 1 to all the values. The following naive approach, as expected, does not work:

In [32]:
d={1:2, 3:4} # create a dictionary
# for each pair try to change a value
for k, v in d.items():
  v = v + 1
print(d) # print unchanged dictionary

{1: 2, 3: 4}


In fact, this task should be solved like this:

In [33]:
d = {1:2, 3:4} # create a dictionary
# change each dictionary's value by using for loop
for k in d:
  d[k] = d[k] + 1
print(d) # print changed dictionary

{1: 3, 3: 5}


## Sets

Another basic data type in Python is a set. It corresponds to the mathematical concept of a set, that is, a collection of distinct elements. Each element either belongs to the set or does not.

In [34]:
my_set = {6, 9, 11, 11, 9, 'hello'} # create a set

In [35]:
my_set # print that set

{11, 6, 9, 'hello'}

As this simple example shows, repeated elements are stored only once, and the elements of a set are not ordered.

In [36]:
{6, 9, 11, 11, 9, 'hello'} == {9, 'hello', 11, 6} # compare two sets

True

This is how you can check whether an element lies in the set.

In [37]:
9 in my_set # does the set contain the value 9?

True

In [38]:
10 in my_set # does the set contain the value 10?

False

Of course, the `in` operator works not only for sets. For example, `4 in [2, 4, 8, 10]` returns `True`. However, for lists this operation is slow, or rather its cost grows with the size of the list: the larger the list, the more comparisons are needed to find out whether a particular element is in it. For sets, the time of the check practically does not grow as the number of elements increases.
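The difference can be measured with the standard `timeit` module. The sketch below is only an illustration: the size `n` and the number of repetitions are arbitrary choices, and the exact timings depend on the machine, so no particular numbers are claimed.

```python
import timeit

n = 100_000
big_list = list(range(n))
big_set = set(big_list)

# look for the last element: the worst case for the list
list_time = timeit.timeit(lambda: (n - 1) in big_list, number=100)
set_time = timeit.timeit(lambda: (n - 1) in big_set, number=100)

print("list:", list_time)
print("set: ", set_time)
```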

You can perform various operations on sets, familiar from math courses. For example, the union and the intersection of two sets each give a new set:

In [39]:
{6, 8, 9} | {6, 11, 7} # get the union of two sets

{6, 7, 8, 9, 11}

In [40]:
{6, 8, 9} & {6, 11, 7} # get intersection of two sets

{6}

In [41]:
s = {"Hello", "World", "Test", "Guest", "Aaaaa", "Zzzzz","Zz","Q"} # create a set of strings
print(s) # print that set
print(sorted(s)) # print the sorted list that sorted() builds from the set

{'Q', 'Aaaaa', 'Zzzzz', 'World', 'Test', 'Hello', 'Zz', 'Guest'}
['Aaaaa', 'Guest', 'Hello', 'Q', 'Test', 'World', 'Zz', 'Zzzzz']


## Example of using sets

Let's say we ask the user to enter a command, but we want to let them enter the same command in different ways. For example, to stop a program, the user can type the word *stop* or *STOP* or *Stop* or just the letter *s* or *S*. You can handle this case with multiple conditions connected by `or`:

In [42]:
s = 'stop' # create a string
if s == 'stop' or s == 'Stop' or s == 'STOP' or s == 'S' or s == 's': # compare string
  print("Okay, stopping") # print a message

Okay, stopping


Alternatively, you can create a set of all possible variations of the stop command and check whether our command is in this set:

In [43]:
s = 'stop' # create a string
STOPS = {'stop', 'Stop', 'STOP', 'S', 's'} # create a set of strings (valid commands)
if s in STOPS: # check that the received command is a part of a set of valid commands
  print("Okay, stopping") # print a message

Okay, stopping


However, in this particular case a plain list would probably work just as well as a set.
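Another option, not used above and shown only as a sketch, is to normalize the input with the string method `lower()` so that the set needs fewer entries. The names `is_stop` and `STOPS_LOWER` are hypothetical. Note that this version also accepts mixed-case spellings such as *sTOP*, which the original set does not.

```python
STOPS_LOWER = {'stop', 's'}  # only lower-case variants are stored

def is_stop(command):
  # compare the lower-case form of the command with the valid commands
  return command.lower() in STOPS_LOWER

for s in ['stop', 'Stop', 'STOP', 'S', 's', 'go']:
  print(s, is_stop(s))
```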

## A little more about strings

I've been meaning to tell you about methods for working with strings for a long time. In general, there are many of these methods and I will not tell you about everything, but we will discuss some of them now.

In [44]:
s = "hello world, hello" # create a string
new_s = s.replace("hello", "Hi") # replace all occurrences with the specified string and assign a result to the variable
print(new_s) # print result string
print(s) # print source string

Hi world, Hi
hello world, hello


This is how, for example, you can replace a substring in a string. Note: a string is an immutable data type; therefore, unlike list methods such as `append()`, string methods never change the string itself (this is not possible at all), but instead create a new string and return it.

If you wanted to replace only the first few occurrences (for example, only the first word hello, but not the second one), you could add a third argument to the replace method — it shows how many times you need to replace.

In [45]:
"hello world, hello".replace("hello", "Hi", 1) # replace only first occurrence in the source string

'Hi world, hello'

This is how you can find a substring in a string:

In [46]:
s.index("world") # get index of first occurrence or throw exception

6

In [47]:
s.find("world") # get index of first occurrence or return -1

6

Both methods return the index of the first character of the substring. The difference is that if `index()` cannot find the substring, it raises an exception (`ValueError`), whereas `find()` in the same situation returns $-1$.
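The behaviour for a missing substring can be sketched as follows (the substring `"Python"` is chosen here simply because it does not occur in `s`):

```python
s = "hello world, hello"

print(s.find("Python"))  # -1

try:
  s.index("Python")
except ValueError as e:
  print("index() raised ValueError:", e)
```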

By the way, you can also check whether a substring is included in a string like this:

In [48]:
"world" in s # does source string contain a specific substring

True

And this is how you can count how many times a substring occurs in a string:

In [49]:
s.count("o") # count the occurrences of a substring in the source string

3

Details can be found in [the official help](https://docs.python.org/3/library/string.html).

## File I/O

We are starting to work with files. For now we will discuss only reading and writing. Running files for execution is a separate story: there is the `subprocess` module for this, and we will get to it someday. (Maybe.) Also, to begin with, we will talk about text files and text-like files (for example, Python code or a CSV file is text). There are also binary files, which are pointless to read with the naked eye; some of them will be covered separately.

Let's say we want to read a file:

In [50]:
f = open("func.txt") # open a specific file
s = f.read() # read data from file
f.close() # close the file stream
print(s) # print the read data

123
456
Gutten Tag!



What happened here? First, we opened the file *func.txt*, located in our current working directory, for reading. To find out what the current working directory is, you can do the following:

In [51]:
# import specific module
import os
os.getcwd() # get the current working directory

'/home/alex/Documents/Yandex.Disk/files/Work/University/Activity/IT/activity'

The `open()` function returned a file object, a value that can be used to work with the file. Then we read the contents of the file into the string `s`, and then closed the file. Closing files is a good habit: if you forget to close a file, another application may not be able to open it (for example, to write something to it).
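One way not to forget about closing is the `with` statement, which is not used in the examples of this notebook and is shown here only as a sketch. The file is closed automatically when the indented block ends. The helper file `example.txt` is hypothetical and is written to a temporary directory only so that the example can run by itself; writing to files is discussed later in the notebook.

```python
import os
import tempfile

# create a small file so that the example is self-contained
path = os.path.join(tempfile.mkdtemp(), "example.txt")
f = open(path, "w")
f.write("123\n456\n")
f.close()

with open(path) as f:  # the file is closed when the block ends
  s = f.read()

print(s, end="")
print(f.closed)  # True
```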

The `read()` method reads the entire file into one large string. This is not always convenient (strings in Python are immutable, so working with them is not always efficient), which is why there are various other ways of working with files. For example, you can read the contents of a file into a list by splitting it into lines.

In [52]:
f = open("func.txt") # open a file
lines = f.readlines() # get list of lines
f.close() # close the file stream
print(lines) # print list of the read lines

['123\n', '456\n', 'Gutten Tag!\n']


Note that each of the lines ends with a newline character `\n`: these characters were present in the file, and we faithfully read them from it. This is how you can print a file line by line, numbering the lines:

In [53]:
# print each line with its number, counting from 1
for i, line in enumerate(lines, 1):
  print(i, line, end="")

1 123
2 456
3 Gutten Tag!


Another way to do this is not to create a separate list, but to iterate over the file object right away:

In [54]:
f = open("func.txt") # open a file stream
# print each line with its number, counting from 1
for i, line in enumerate(f, 1):
  print(i, line, end="")
f.close() # close the file stream

1 123
2 456
3 Gutten Tag!


This method is preferable if the file is large: it may be impossible to read such a file into memory as a whole, while processing it one line at a time is quite feasible.

There are, however, some tricks. Consider, for example, the following code:

In [55]:
f = open("func.txt") # open a file stream
# print every line of the file
for line in f:
  print(line, end="")
print("----The next one----") # print a message
# try to print lines after the end of file
for line in f:
  print(line, end="")
f.close() # close a file stream

123
456
Gutten Tag!
----The next one----


What happened here? Why didn't the second loop run at all (nothing is printed after the line `----The next one----`)? The answer is simple: although the variable `f` behaves like a list of strings when we iterate over it, it is not one. An open file remembers the position up to which it has been read. Initially this position points to the very beginning of the file, but it moves forward with each iteration. Once we have read the whole file, further attempts to read from it yield nothing: the current position has moved to the very end, and the file has ended.

However, it is possible to go back to the beginning: to do this, you need to use the `seek()` method.

In [56]:
f = open("func.txt") # open a file stream
# print every line of the file
for line in f:
  print(line, end="")
print("----The next one----") # print a message
f.seek(0) # move to the beginning of the file
# print every line of the file again
for line in f:
  print(line, end="")
f.close() # close a file stream

123
456
Gutten Tag!
----The next one----
123
456
Gutten Tag!
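The same behaviour can be observed with `read()`. The sketch below uses a hypothetical file `example.txt` in a temporary directory, created only to keep the example self-contained.

```python
import os
import tempfile

# create a small file so that the example is self-contained
path = os.path.join(tempfile.mkdtemp(), "example.txt")
f = open(path, "w")
f.write("123\n456\n")
f.close()

f = open(path)
first = f.read()  # reads everything, the position moves to the end
second = f.read()  # nothing is left to read
f.seek(0)  # go back to the beginning
third = f.read()
f.close()

print(repr(first))  # '123\n456\n'
print(repr(second))  # ''
print(repr(third))  # '123\n456\n'
```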


## Writing to files

To create a file and write something to it, you need to open it for writing. This is done by passing a second argument to the `open` function: the string `"w"` (from *write*).

You can write information to a file that is open for writing, for example, using the method `write()`.

In [57]:
f = open("other.txt", "w") # open a file for writing
f.write("Hello\n") # write string
f.close() # close a file stream

Let's check what happened:

In [58]:
open('other.txt').read() # open a file stream and read data


'Hello\n'

We can see that we really did write the line `Hello\n` to the file other.txt. Note that here we opened the file for reading but did not assign the file object to any variable; instead, we immediately called its `read()` method. In this case, the file will be closed automatically some time after the command is executed. (The system issues a warning that we have not explicitly closed the file; in some cases this may lead to problems.)
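A sketch of one way to avoid that warning: the `with` statement (not used in the cells above) closes the file as soon as its block ends. The example also shows that opening an existing file with `"w"` erases its previous contents. The file is created in a temporary directory, and the name `content` is introduced here only for illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "other.txt")

with open(path, "w") as f:
  f.write("Hello\n")
  f.write("World\n")

with open(path, "w") as f:  # opening with "w" again erases the old contents
  f.write("Bye\n")

with open(path) as f:
  content = f.read()

print(repr(content))  # 'Bye\n'
```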