# List and dictionary comprehensions

## List comprehension

A list comprehension is a Python feature that offers a shorter syntax for creating a new list from the elements of another list (or any other iterable).

In [15]:
# For example, take a list of fruit names and keep only the names that contain the letter 'a'

fruits = ['apple', 'banana', 'cherry', 'kiwi', 'mango', 'pear','pineapple']
newlist = []

for i in fruits:
    if 'a' in i:
        newlist.append(i)
print(newlist)

# This loop is equivalent to the following list comprehension

newlist_comp = [fruit for fruit in fruits if 'a' in fruit]
print(newlist_comp)
print()
print("We can see the same result")

['apple', 'banana', 'mango', 'pear', 'pineapple']
['apple', 'banana', 'mango', 'pear', 'pineapple']

We can see the same result


In [17]:
# You can also filter the characters of a string with a condition

string = [letter for letter in "Hello world!" if 'o' == letter]
print(string)

# Or you can turn a string into a list of its characters

string = [letter for letter in "Hello world!"]
print(string)

['o', 'o']
['H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '!']


In [21]:
# Conditions can also be applied to lists of numbers

numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
numbers_comp = [x for x in numbers if x % 2 == 0]
print(numbers_comp)
print()

# You can even chain several if conditions (an element is kept only if all of them are true)

numbers_nested = [x for x in numbers if x % 2 == 0 if x % 5 == 0]
print(numbers_nested)
print()

# You can also use more complex conditional expressions in a list comprehension, for example,
# to build an "even/odd" mask of a list of numbers

numbers_mask = ['even' if x % 2 == 0 else 'odd' for x in numbers]
print(numbers_mask)

[2, 4, 6, 8, 10]

[10]

['odd', 'even', 'odd', 'even', 'odd', 'even', 'odd', 'even', 'odd', 'even']
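The position of if matters: an if ... else expression placed before for transforms every element, while an if placed after for filters elements out. A minimal sketch combining both on the same list; the threshold 5 is purely illustrative:

```python
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# if ... else before `for`: every element is kept, only its value changes
labels = ['even' if x % 2 == 0 else 'odd' for x in numbers]

# if after `for`: elements that fail the test are dropped
big = [x for x in numbers if x > 5]

# both at once: keep only x > 5, then label the remaining elements
big_labels = ['even' if x % 2 == 0 else 'odd' for x in numbers if x > 5]

print(labels)
print(big)
print(big_labels)  # ['even', 'odd', 'even', 'odd', 'even']
```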


In [37]:
# A list comprehension can also build a new list by transforming every element, e.g. squaring it

numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print("Initial list:", numbers)
new_numbers = [x*x for x in numbers]
print("Comprehension from initial:", new_numbers)

# Or you can get the same result by using the range() function

new_numbers = [x*x for x in range(10)]
print("Using range() function:", new_numbers)
print()

# You can even call a function to compute each element of the new list
def exponentiation_by_some_degree(x, y):
    try:
        return x ** y
    except ZeroDivisionError:  # raised e.g. by 0 ** -1
        return None

degree = 2
new_list = [exponentiation_by_some_degree(x, degree) for x in range(10)]
print("Function condition:", new_list)

Initial list: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Comprehension from initial: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
Using range() function: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Function condition: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


## Dictionary comprehension

Dictionary comprehensions work the same way as list comprehensions, but they create a dictionary: each iteration produces a key: value pair from the elements of some iterable object.

In [54]:
# Values can be computed from the keys by a function

def exponentiation_by_some_degree(x, y):
    try:
        return x ** y
    except ZeroDivisionError:  # raised e.g. by 0 ** -1
        return None

degree = 2

dictionary = {x: exponentiation_by_some_degree(x, degree) for x in range(10)}
print(dictionary)

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}


In [57]:
# Keys and values can also come from two different iterables, for example from two lists,
# paired up element by element with the zip() function

words = ['zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'ten']
numbers = list(range(11))

dictionary = {word: number for word, number in zip(words, numbers)}
# dict(zip(words, numbers)) builds the same dictionary without a comprehension
print(dictionary)

{'zero': 0, 'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5, 'six': 6, 'seven': 7, 'eight': 8, 'nine': 9, 'ten': 10}
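A dictionary comprehension can also iterate over the (key, value) pairs of an existing dictionary returned by .items(), for example to swap keys and values or to keep only some entries. A small sketch on a shortened word list; note that swapping is only lossless when all values are unique:

```python
words = ['zero', 'one', 'two', 'three']
word_to_number = {word: number for word, number in zip(words, range(len(words)))}

# Swap keys and values by iterating over the (key, value) pairs from .items()
number_to_word = {number: word for word, number in word_to_number.items()}

# A condition filters the pairs, just like in a list comprehension
even_only = {word: number for word, number in word_to_number.items() if number % 2 == 0}

print(number_to_word)  # {0: 'zero', 1: 'one', 2: 'two', 3: 'three'}
print(even_only)       # {'zero': 0, 'two': 2}
```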


List and dictionary comprehensions are shorter than the equivalent for-loops and are usually somewhat faster as well.
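The speed difference can be checked with the standard timeit module. A rough sketch comparing a loop with append() against a comprehension; the function names are illustrative, and the timings depend on the machine and Python version, so treat them as indicative only:

```python
import timeit

def squares_loop(n):
    result = []
    for x in range(n):
        result.append(x * x)
    return result

def squares_comp(n):
    return [x * x for x in range(n)]

# Both functions build exactly the same list
assert squares_loop(1000) == squares_comp(1000)

# Time 200 calls of each function
loop_time = timeit.timeit(lambda: squares_loop(1000), number=200)
comp_time = timeit.timeit(lambda: squares_comp(1000), number=200)
print(f'loop: {loop_time:.4f} s, comprehension: {comp_time:.4f} s')
```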

# Strings

A string is an ordered, immutable sequence of characters.
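Immutability means that characters cannot be changed in place; every "modification" creates a new string. A minimal sketch:

```python
s = 'Hello'

raised = False
try:
    s[0] = 'J'               # item assignment is not allowed on strings
except TypeError as err:
    raised = True
    print('TypeError:', err)

s_new = 'J' + s[1:]          # build a new string from pieces of the old one
print(s_new)  # Jello
print(s)      # Hello: the original string is unchanged
```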

## Slicing strings

In [1]:
string = 'Hello, World!'
print('string: ', string)

s1 = string[1:5] # substring from index 1 to index 4 (the stop index is not included)
# the slice syntax is string[start:stop:step]: start is included, stop is excluded
print('s1: ', s1)

s2 = string[:5] # substring from the beginning to index 4 (the stop index is not included)
print('s2: ', s2)

s3 = string[::1] # step 1: every character, i.e. the whole string
print('s3: ', s3)

s4 = string[::2] # step 2: every second character (the step is the third value in the brackets)
print('s4: ', s4)

s5 = string[::-2] # negative step: reverse the string and take every second character
print('s5: ', s5)

s6 = string[::-1] # nice little trick for reversing a string
print('s6: ', s6)

string:  Hello, World!
s1:  ello
s2:  Hello
s3:  Hello, World!
s4:  Hlo ol!
s5:  !lo olH
s6:  !dlroW ,olleH
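Negative indices count from the end of the string, and slices are forgiving about out-of-range bounds (unlike plain indexing: string[100] raises IndexError). A short sketch on the same string:

```python
string = 'Hello, World!'

last = string[-1]         # negative indices count from the end: '!'
tail = string[-6:]        # the last six characters: 'World!'
no_last = string[:-1]     # everything except the last character: 'Hello, World'
clipped = string[7:100]   # a stop index past the end is clipped, no error: 'World!'

print(last, tail, no_last, clipped)
```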


## Interesting string methods - intermediate level

In [2]:
string = "    Hello, World!    "
print('string: ', string)

s1 = string.strip() # removes leading and trailing whitespace
# strings are immutable, so strip() returns a new string and leaves the original unchanged
print('s1: ', s1)

s2 = string.upper() # converts the string to upper case
s3 = string.lower() # converts the string to lower case
print('s2: ', s2)
print('s3: ', s3)

print()
print(string.startswith(" ")) # startswith(s) returns True if the string starts with s, otherwise False
print(string.endswith("k")) # endswith(s) returns True if the string ends with s, otherwise False

symb = string.count("l") # returns the number of occurrences of a character (or substring) in the string
print("\nsymb = ", symb)
print()

string1 = 'How are you doing?'
print('initial string: ', string1)
s11 = string1.split(' ') # splits the string at every occurrence of the separator and returns a list of substrings
print('split string: ', s11)
s12 = ' '.join(s11) # joins the list items into one string, putting the separator (here ' ') between them
print('joined string: ', s12)
print()



string:      Hello, World!    
s1:  Hello, World!
s2:      HELLO, WORLD!    
s3:      hello, world!    

True
False

symb =  3

initial string:  How are you doing?
split string:  ['How', 'are', 'you', 'doing?']
joined string:  How are you doing?

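split(' ') and split() without arguments behave differently when spaces repeat, and join() accepts any string as the separator. A small sketch (the sample text with a double space is illustrative):

```python
text = 'How  are you'   # two spaces between "How" and "are"

by_space = text.split(' ')        # every single space separates: ['How', '', 'are', 'you']
by_whitespace = text.split()      # runs of whitespace count as one separator: ['How', 'are', 'you']
joined = '-'.join(by_whitespace)  # any string can be the separator: 'How-are-you'

print(by_space)
print(by_whitespace)
print(joined)
```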


## % in strings, .format() method, f-strings (Python 3.6 and later)

### % in strings

This is the oldest formatting style. A % placeholder marks the place in the string where a value should be inserted, and the value itself follows the % operator after the string:
- %s inserts a string
- %d inserts an integer
- %f inserts a float

By default, %f prints 6 digits after the decimal point. To control this, specify the precision explicitly: %.5f prints 5 digits after the decimal point.

In [3]:
s_var = 'string'
d_var = 5
f_var = 5.123456789

s1 = 'Our variable is %s' % s_var
print('string: ', s1)
s1 = 'Our variable is %d' % d_var
print('string: ', s1)
s1 = 'Our variable is %f' % f_var
print('string: ', s1)
print()

s1 = 'Our variable is %.3f' % f_var
print('float formatting string: ', s1)

string:  Our variable is string
string:  Our variable is 5
string:  Our variable is 5.123457

float formatting string:  Our variable is 5.123

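With several placeholders, the values are given as a tuple in the same order as the placeholders, and %% produces a literal percent sign. A minimal sketch with illustrative values:

```python
item = 'apple'
count = 5
price = 2.5678

# Several values go into a tuple, in the order of the placeholders
s = 'Item %s, count %d, price %.2f' % (item, count, price)
print(s)  # Item apple, count 5, price 2.57

# %% produces a literal percent sign
p = 'Progress: %d%%' % 50
print(p)  # Progress: 50%
```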

### .format() method in string

This is a newer formatting style. A pair of curly braces {} marks the place in the string where a value should be inserted, and the values are passed as arguments to .format().

For float values you can control the number of digits after the decimal point: {:.5f} prints 5 digits after the decimal point.

In [4]:
s_var = 'string'
d_var = 5
f_var = 5.123456789

s1 = 'Our variable is {}'.format(s_var)
print('string: ', s1)
s1 = 'Our variable is {}'.format(d_var)
print('digit: ', s1)
s1 = 'Our variable is {}'.format(f_var)
print('float: ', s1)
print()

s1 = 'Our variable is {:.3f}'.format(f_var)
print('float formatting string: ', s1)

string:  Our variable is string
digit:  Our variable is 5
float:  Our variable is 5.123456789

float formatting string:  Our variable is 5.123


To insert several values into a string, put several {} placeholders in it and pass all the values to .format(), in the same order.

In [5]:
s_var = 'string'
d_var = 5
f_var = 5.123456789

s1 = 'Our variable is {}, {} and {}'.format(s_var, d_var, f_var)
print(s1)

Our variable is string, 5 and 5.123456789

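Placeholders can also refer to arguments by position or by keyword name, which lets you reorder or reuse values. A short sketch with illustrative values:

```python
first = 'apple'
second = 'banana'

# Numbers inside {} select arguments by position, so they can be reordered or reused
s1 = '{1} and {0}, then {1} again'.format(first, second)
# Names inside {} select keyword arguments; a format spec can follow the colon
s2 = '{fruit} costs {price:.2f}'.format(fruit=first, price=2.5678)

print(s1)  # banana and apple, then banana again
print(s2)  # apple costs 2.57
```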

### f-strings

If you put f before a string literal, you can write variable names (or any other expressions) directly inside {} in the string, and their values are inserted when the string is created.

In [6]:
s_var = 'string'
d_var = 5
f_var = 5.123456789

s1 = f'Our variable is {s_var}, {d_var} and {f_var}'
print(s1)

Our variable is string, 5 and 5.123456789

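f-strings accept the same format specifications as .format() after a colon, and any expression can go inside the braces. A minimal sketch reusing the variables above:

```python
d_var = 5
f_var = 5.123456789

s1 = f'Rounded: {f_var:.3f}'                          # same format spec as in .format()
s2 = f'Double: {d_var * 2}, upper: {"abc".upper()}'   # any expression works inside {}

print(s1)  # Rounded: 5.123
print(s2)  # Double: 10, upper: ABC
```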

# Working with files in Python

## File opening modes and file object attributes

Python can both read from and write to files.
The mode passed to open() determines what you can do with the opened file:

* 'r': Opens a file for reading only. The file pointer is placed at the beginning of the file. This is the default mode: if no mode is specified, 'r' is used.
* 'r+': Opens a file for both reading and writing. The file pointer is placed at the beginning of the file.
* 'w': Opens a file for writing only. Overwrites the file if the file exists. If the file does not exist, creates a new file for writing.
* 'w+': Opens a file for both writing and reading. Overwrites the existing file if the file exists. If the file does not exist, it creates a new file for reading and writing.
* 'rb': Opens a file for reading only in binary format. The file pointer is placed at the beginning of the file. This mode is needed for reading and deserializing files with binary content (see the corresponding topic) and for reading binary files such as .jpg images.
* 'rb+': Opens a file for both reading and writing in binary format.
* 'wb': Opens a file for writing only in binary format. This mode is needed for serializing objects to a file (see the corresponding topic) and for writing binary files such as .jpg images.
* 'wb+': Opens a file for both writing and reading in binary format. Overwrites the existing file if the file exists. If the file does not exist, it creates a new file for reading and writing.
* 'a': Opens a file for appending. The file pointer is at the end of the file if the file exists. That is, the file is in the append mode. If the file does not exist, it creates a new file for writing.
* 'ab': Opens a file for appending in binary format. The file pointer is at the end of the file if the file exists. That is, the file is in the append mode. If the file does not exist, it creates a new file for writing.
* 'a+': Opens a file for both appending and reading. The file pointer is at the end of the file if the file exists. The file opens in the append mode. If the file does not exist, it creates a new file for reading and writing.
* 'ab+': Opens a file for both appending and reading in binary format. The file pointer is at the end of the file if the file exists. The file opens in the append mode. If the file does not exist, it creates a new file for reading and writing.
* 'x': Opens a file for exclusive creation and raises FileExistsError if the file already exists (Python 3).
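A short sketch of the difference between 'w', 'a' and 'x'. It runs in a temporary directory so that no real file is touched; the file name modes_demo.txt is hypothetical:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'modes_demo.txt')  # hypothetical file name

    f = open(path, 'w', encoding='utf-8')   # 'w' creates the file (or truncates it)
    f.write('first line\n')
    f.close()

    f = open(path, 'a', encoding='utf-8')   # 'a' keeps the content and writes at the end
    f.write('second line\n')
    f.close()

    f = open(path, 'r', encoding='utf-8')
    content = f.read()
    f.close()
    print(content)

    # 'x' refuses to open a file that already exists
    x_failed = False
    try:
        f = open(path, 'x', encoding='utf-8')
        f.close()
    except FileExistsError:
        x_failed = True
        print("'x' mode: the file already exists")
```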

Python works with file objects, which are created by the built-in open() function.
When you have finished working with a file, close it with the close() method of the file object.

In [1]:
filename = r'text.txt' # if the file is not in the same folder as this notebook, use the full path to it

f = open(filename, 'r')

print("name of file: ", f.name) # attribute holding the file name; useful when working with many files
print("mode of opening of the file: ", f.mode) # attribute holding the opening mode; useful when working with many files
print("encoding of the file: ", f.encoding) # attribute holding the text encoding
print("is file closed (before closing): ", f.closed) # True if the file is closed

f.close()

print("is file closed (after closing): ", f.closed)

name of file:  text.txt
mode of opening of the file:  r
encoding of the file:  cp1252
is file closed (before closing):  False
is file closed (after closing):  True


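Some of these attributes can be set directly through the arguments of open(). If no encoding is given, Python uses a platform-dependent default (cp1252 in the output above), so it is safer to specify encoding='utf-8' explicitly. For example: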

In [2]:
filename = r'text.txt' # if the file is not in the same folder as this notebook, use the full path to it

f = open(filename, mode='r', encoding='utf-8')

print("name of file: ", f.name) # attribute holding the file name; useful when working with many files
print("mode of opening of the file: ", f.mode) # attribute holding the opening mode; useful when working with many files
print("encoding of the file: ", f.encoding) # attribute holding the text encoding
print("is file closed (before closing): ", f.closed) # True if the file is closed

f.close()

print("is file closed (after closing): ", f.closed)

name of file:  text.txt
mode of opening of the file:  r
encoding of the file:  utf-8
is file closed (before closing):  False
is file closed (after closing):  True


## Methods of file objects - reading

### read()

Reads and returns the whole (remaining) content of the file as a single string.

In [3]:
filename = r'text.txt' # if the file is not in the same folder as this notebook, use the full path to it

f = open(filename, mode='r', encoding='utf-8')

print("All of the symbols in file: ")
print(f.read(), "\n")

f.close()

All of the symbols in file: 
agshagdhsdhgsadhga
sadfasgdafg
asgsadgsadg123124
sa gaadfgdfdsfha
gSGDFGDAHSFHAD
sdf
dfg
dffhfgh 



You can also pass read() the number of characters you want to read, here number_of_symbols:

In [4]:
filename = r'text.txt'

f = open(filename, mode='r', encoding='utf-8')
number_of_symbols = 3
print(f"The first {number_of_symbols} symbols in file: ")
print(f.read(number_of_symbols), "\n")

f.close()

The first 3 symbols in file: 
ags 



If you call read() first and then read(number_of_symbols), the second call returns an empty string: read() has already read all the available characters, so the file pointer is at the end of the file:

In [5]:
filename = r'text.txt'

f = open(filename, mode='r', encoding='utf-8')

print("read() and then read(number_of_symbols): ")
number_of_symbols = 3
print(f.read())

print(f.read(number_of_symbols))

f.close()

read() and then read(number_of_symbols): 
agshagdhsdhgsadhga
sadfasgdafg
asgsadgsadg123124
sa gaadfgdfdsfha
gSGDFGDAHSFHAD
sdf
dfg
dffhfgh



Conversely, if you call read(number_of_symbols) first and then read(), you first get number_of_symbols characters and then all the remaining ones:

In [6]:
filename = r'text.txt'

f = open(filename, mode='r', encoding='utf-8')

print("read(number_of_symbols) and then read(): ")
number_of_symbols = 3
print(f.read(number_of_symbols))

print(f.read())

f.close()

read(number_of_symbols) and then read(): 
ags
hagdhsdhgsadhga
sadfasgdafg
asgsadgsadg123124
sa gaadfgdfdsfha
gSGDFGDAHSFHAD
sdf
dfg
dffhfgh


Finally, if you call read(number_of_symbols1) and then read(number_of_symbols2), you receive two consecutive chunks of number_of_symbols1 and number_of_symbols2 characters:

In [7]:
filename = r'text.txt'

f = open(filename, mode='r', encoding='utf-8')

print("read(number_of_symbols1) and then read(number_of_symbols2): ")
number_of_symbols1 = 3
number_of_symbols2 = 5
print(f'The first {number_of_symbols1} symbols from file: ', f.read(number_of_symbols1))
print(f'Then {number_of_symbols2} symbols after: ', f.read(number_of_symbols2))
f.close()

read(number_of_symbols1) and then read(number_of_symbols2): 
The first 3 symbols from file:  ags
Then 5 symbols after:  hagdh

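Because consecutive read(n) calls continue where the previous one stopped and return an empty string at the end of the file, a large file can be processed in fixed-size chunks. A sketch using a temporary file; the file name and the chunk size 4 are illustrative:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'chunks_demo.txt')  # hypothetical file name
    f = open(path, 'w', encoding='utf-8')
    f.write('abcdefghij')
    f.close()

    f = open(path, 'r', encoding='utf-8')
    chunks = []
    while True:
        chunk = f.read(4)   # at most 4 characters per call
        if chunk == '':     # an empty string means the end of the file
            break
        chunks.append(chunk)
    f.close()

print(chunks)  # ['abcd', 'efgh', 'ij']
```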

### readable()

Returns whether the file stream can be read or not.

In [8]:
filename = r'text.txt' # if the file is not in the same folder as this notebook, use the full path to it

f1 = open(filename, mode='r', encoding='utf-8')
f2 = open(filename, mode='a', encoding='utf-8')

print("read mode of opening: ", f1.readable())
print("append mode of opening: ", f2.readable())

f1.close()
f2.close()

read mode of opening:  True
append mode of opening:  False


### readlines()

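Reads the whole file into a list of lines. Each line keeps its trailing \n (the last line has none if the file does not end with a newline).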

In [9]:
filename = r'text.txt' # if the file is not in the same folder as this notebook, use the full path to it

f = open(filename, mode='r', encoding='utf-8')

print(f.readlines())

f.close()

['agshagdhsdhgsadhga\n', 'sadfasgdafg\n', 'asgsadgsadg123124\n', 'sa gaadfgdfdsfha\n', 'gSGDFGDAHSFHAD\n', 'sdf\n', 'dfg\n', 'dffhfgh']


### readline()

Reads a single line, including its trailing \n. When there are no more lines, it returns an empty string.

In [10]:
filename = r'text.txt' # if the file is not in the same folder as this notebook, use the full path to it

f = open(filename, mode='r', encoding='utf-8')

# readline() keeps the trailing \n of each line, and print() adds its own newline,
# which produces the blank lines in the output below.
# Passing end='' to print() suppresses print's own newline, as in the first call.

print(f.readline(), end='')
print(f.readline())
print(f.readline())
print(f.readline())
print(f.readline())
print(f.readline())
print(f.readline())
print(f.readline())
print(f.readline())
print(f.readline())
print(f.readline())

f.close()

agshagdhsdhgsadhga
sadfasgdafg

asgsadgsadg123124

sa gaadfgdfdsfha

gSGDFGDAHSFHAD

sdf

dfg

dffhfgh





You can also iterate over a file object directly; each iteration yields the next line:

In [11]:
filename = r'text.txt' # if the file is not in the same folder as this notebook, use the full path to it

f = open(filename, mode='r', encoding='utf-8')

for line in f:
    print(line, end='')
    
f.close()

agshagdhsdhgsadhga
sadfasgdafg
asgsadgsadg123124
sa gaadfgdfdsfha
gSGDFGDAHSFHAD
sdf
dfg
dffhfgh

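Combined with a list comprehension, iteration gives a compact way to collect the lines without their trailing newlines. In this sketch io.StringIO stands in for text.txt; it behaves like a file opened in text mode:

```python
import io

# In-memory stand-in for a text file
f = io.StringIO('first\nsecond\nthird')

# Each line comes with its trailing '\n' (except possibly the last); rstrip('\n') removes it
lines = [line.rstrip('\n') for line in f]
f.close()

print(lines)  # ['first', 'second', 'third']
```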
### seek()

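Sets the position of the file pointer. After seek(x), reading starts from position x in the file. For files opened in text mode, Python only guarantees that 0 and values previously returned by tell() are valid positions; in an ASCII-only file like this one, other offsets in practice correspond to character positions: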

In [12]:
filename = r'text.txt'

f = open(filename, mode='r', encoding='utf-8')

print("Reading of all the file content: ")
print(f.read())
print()

f.close()

f = open(filename, mode='r', encoding='utf-8')
new_position = 2
f.seek(new_position)
print(f"Reading from the {new_position} position in file: ")
print(f.read())

f.close()

Reading of all the file content: 
agshagdhsdhgsadhga
sadfasgdafg
asgsadgsadg123124
sa gaadfgdfdsfha
gSGDFGDAHSFHAD
sdf
dfg
dffhfgh

Reading from the 2 position in file: 
shagdhsdhgsadhga
sadfasgdafg
asgsadgsadg123124
sa gaadfgdfdsfha
gSGDFGDAHSFHAD
sdf
dfg
dffhfgh


### seekable()

Returns True if the file supports random access, i.e. if seek() can change the file position.

In [13]:
f = open(filename, mode='r', encoding='utf-8')
new_position1 = 2
f.seek(new_position1)
print(f.seekable())

f.close()

True


### tell()

Returns the current position of the file pointer.

In [14]:
f = open(filename, mode='r', encoding='utf-8')
new_position = 2
f.seek(new_position)
print(f.tell())

f.close()

2

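tell() and seek() are often used together: tell() shows how far reading has advanced, and seek(0) rewinds the file so it can be read again. A sketch with io.StringIO standing in for a text file (for real text files, tell() returns an opaque number rather than a plain character count):

```python
import io

f = io.StringIO('abcdef\nxyz')  # in-memory stand-in for a text file

start_pos = f.tell()    # 0: nothing has been read yet
first = f.read(3)       # 'abc'
after_pos = f.tell()    # 3: the pointer moved past the characters read
rest = f.read()         # 'def\nxyz'
at_end = f.read()       # '': the pointer is already at the end
f.seek(0)               # rewind to the beginning
again = f.read()        # the whole content once more
f.close()

print(start_pos, after_pos, repr(at_end))  # 0 3 ''
```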

## Methods of file objects - writing

### write()

Writes the specified string to the file.

In [15]:
filename = 'text_w.txt'

# Opening in 'w' mode truncates the file, so this clears it
f = open(filename, mode='w', encoding='utf-8')
f.close()

f = open(filename, mode='r', encoding='utf-8')
print("Before writing: ", f.read())
f.close()

f = open(filename, mode='w', encoding='utf-8')
f.write('We are here!')
f.close()

f = open(filename, mode='r', encoding='utf-8')
print("After writing: ", f.read())
f.close()

Before writing:  
After writing:  We are here!

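write() returns the number of characters written, and consecutive calls simply continue where the previous one stopped: no newline is added. A sketch with io.StringIO as an in-memory stand-in for a file opened with mode='w' (getvalue() is specific to StringIO):

```python
import io

f = io.StringIO()                 # in-memory stand-in for a file opened with mode='w'
n1 = f.write('We are here!')      # write() returns the number of characters written
n2 = f.write(' And here.')        # no separator or newline is added between calls
content = f.getvalue()            # StringIO-specific: the whole text written so far
f.close()

print(n1, n2)    # 12 10
print(content)   # We are here! And here.
```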

### writable()

Returns whether the file can be written to or not.

In [16]:
filename = r'text_w.txt' # if the file is not in the same folder as this notebook, use the full path to it

f1 = open(filename, mode='r', encoding='utf-8')
f2 = open(filename, mode='a', encoding='utf-8')

print(f1.writable())
print(f2.writable())

f1.close()
f2.close()

False
True


### writelines()

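Writes a list (or any other iterable) of strings to the file. It does not add line separators, so each string has to end with \n if you want separate lines.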

In [17]:
filename = r'text_w.txt'

# Opening in 'w' mode truncates the file, so this clears it
f = open(filename, mode='w', encoding='utf-8')
f.close()

f = open(filename, mode='r', encoding='utf-8')
print("Before writing: ", f.read())
f.close()

f = open(filename, mode='w', encoding='utf-8')
f.writelines(["We are here!\n", 'and here\n', 'and here\n', 'and here'])
f.close()

f = open(filename, mode='r', encoding='utf-8')
print("After writing: ", f.read())
f.close()

Before writing:  
After writing:  We are here!
and here
and here
and here

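The effect of writelines() not adding separators is easy to see side by side. A sketch with io.StringIO standing in for a file opened in 'w' mode:

```python
import io

lines = ['first', 'second', 'third']

f = io.StringIO()
f.writelines(lines)                           # the strings are written back to back
without_newlines = f.getvalue()
f.close()

f = io.StringIO()
f.writelines(line + '\n' for line in lines)   # add the line breaks yourself
with_newlines = f.getvalue()
f.close()

print(repr(without_newlines))  # 'firstsecondthird'
print(repr(with_newlines))     # 'first\nsecond\nthird\n'
```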

## Using the context manager 'with open(...) as f'

The with statement (see the corresponding topic for details) uses the file object as a context manager, so you don't have to call close() yourself.

When the code inside the with block finishes, even because of an exception, the file is closed automatically.

### Reading with context manager

In [18]:
filename = r'text.txt'

with open(filename, mode='r', encoding='utf-8') as f:
    print(f.read())

agshagdhsdhgsadhga
sadfasgdafg
asgsadgsadg123124
sa gaadfgdfdsfha
gSGDFGDAHSFHAD
sdf
dfg
dffhfgh


### Writing with context manager

In [19]:
filename = r'text_w.txt'

with open(filename, mode='w', encoding='utf-8') as f:
    pass # opening in 'w' mode truncates the file, so this clears it

with open(filename, mode='r', encoding='utf-8') as f:
    print('Before writing: ', f.read())

with open(filename, mode='w', encoding='utf-8') as f:
    f.write('And we are here again!')

with open(filename, mode='r', encoding='utf-8') as f:
    print('After writing: ', f.read())

Before writing:  
After writing:  And we are here again!

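The automatic closing can be checked through the closed attribute, including the case where the with block is left because of an exception. A sketch in a temporary directory; the file name with_demo.txt is hypothetical:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'with_demo.txt')  # hypothetical file name

    with open(path, mode='w', encoding='utf-8') as f:
        f.write('some text')
    print(f.closed)  # True: closed as soon as the block ended

    try:
        with open(path, mode='r', encoding='utf-8') as g:
            raise ValueError('something went wrong while reading')
    except ValueError:
        pass
    print(g.closed)  # True: closed even though the block raised an exception
```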

# Regular expressions

A regular expression (shortened as regex or regexp; sometimes referred to as rational expression) is a sequence of characters that specifies a search pattern in text. Usually such patterns are used by string-searching algorithms for "find" or "find and replace" operations on strings, or for input validation. Regular expression techniques are developed in theoretical computer science and formal language theory.

In Python, regular expressions are provided by the re module.
Patterns are built from ordinary characters and special characters called metacharacters:

[ ] ^ $ . * + ? { } ( ) \ | 

Before looking at these symbols one by one, let's go over the functions of the re module:

In [1]:
import re

## Functions of module re

The most commonly used functions of the re module are:

* re.match()
* re.search()
* re.findall()
* re.split()
* re.sub()
* re.compile()

Let's look at these one by one.

### re.match()

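re.match(pattern, string) tries to match the pattern only at the start of the string. If the string does not start with the pattern, it returns None; otherwise it returns a re.Match object. Spaces are ordinary characters for the pattern, so matching is not limited to whole words.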

In [2]:
result = re.match(r'AV', 'AV Analytics Vidhya AV')
print("String starts with the pattern: ", result)

result = re.match(r'Analytics', 'AV Analytics Vidhya AV')
print("String does not start with the pattern: ", result)

String starts with the pattern:  <re.Match object; span=(0, 2), match='AV'>
String does not start with the pattern:  None
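The difference between re.match() and re.search(), and the fact that spaces are just ordinary characters, can be seen on the same string:

```python
import re

text = 'AV Analytics Vidhya AV'

m1 = re.match(r'Analytics', text)   # None: the text does not start with "Analytics"
m2 = re.search(r'Analytics', text)  # search() scans the whole text
m3 = re.match(r'AV Ana', text)      # a pattern can span a space and part of a word

print(m1)           # None
print(m2.span())    # (3, 12)
print(m3.group())   # AV Ana
```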


If there is a match, you can get the whole matched text with the re.Match.group(0) method (group() without arguments does the same).

In [4]:
result = re.match(r'AV', 'AV Analytics Vidhya AV')
print(result.group(0))

AV


The span=(start, end) field of a re.Match object holds the start index of the match and its end index (exclusive) in the string. You can get this tuple with the re.Match.span() method, and the start and end indices separately with the re.Match.start() and re.Match.end() methods.

In [7]:
result = re.match(r'AV', 'AV Analytics Vidhya AV')
print("Span: ", result.span())
print("Start: ", result.start())
print("End: ", result.end())

Span:  (0, 2)
Start:  0
End:  2


### re.search()

re.search(pattern, string) is similar to re.match(pattern, string), but it looks for the pattern anywhere in the string, not only at its start, and returns only the first match. If the pattern does not occur in the string, it returns None. Like re.match(pattern, string), it returns a re.Match object.

So the same methods are available here: re.Match.group(), re.Match.span(), re.Match.start() and re.Match.end().

In [10]:
result = re.search(r'Analytics', 'AV Analytics Vidhya AV')
print("The whole matching: ", result.group(0))
print("Span: ", result.span())
print("Start: ", result.start())
print("End: ", result.end())

The whole matching:  Analytics
Span:  (3, 12)
Start:  3
End:  12


### re.findall()

re.findall(pattern, string) returns a list of all non-overlapping matches of the pattern, found anywhere in the string, not only at its beginning.

Unlike re.search() and re.match(), it returns the matched strings themselves rather than re.Match objects, and an empty list (not None) when nothing is found. It is therefore convenient when you only need the matched text.

In [11]:
result = re.findall(r'AV', 'AV Analytics Vidhya AV')
print(result)

['AV', 'AV']
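When nothing is found, findall() returns an empty list while search() returns None. If you need the positions of all matches as well, re.finditer() yields one re.Match object per match:

```python
import re

text = 'AV Analytics Vidhya AV'

found = re.findall(r'AV', text)      # ['AV', 'AV']
not_found = re.findall(r'XYZ', text) # []: an empty list, never None
print(found, not_found)
print(re.search(r'XYZ', text))       # None

# One re.Match object per match, so positions are available too
spans = [m.span() for m in re.finditer(r'AV', text)]
print(spans)  # [(0, 2), (20, 22)]
```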


### re.split()

re.split(pattern, string) splits the string at every match of the pattern and returns the list of the resulting pieces.

In [12]:
result = re.split(r'y', 'Analytics')
print(result)

['Anal', 'tics']


### re.sub()

re.sub(pattern, repl, string) replaces every match of the pattern in the string with repl and returns the resulting string.

In [13]:
result = re.sub(r'India', 'the World', 'AV is largest Analytics community of India')
print(result)

AV is largest Analytics community of the World
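By default re.sub() replaces every match; the count argument limits the number of replacements, and the pattern can be any regular expression, not just a fixed word. A short sketch:

```python
import re

text = 'AV Analytics Vidhya AV'

all_replaced = re.sub(r'AV', 'XX', text)           # every match is replaced
first_only = re.sub(r'AV', 'XX', text, count=1)    # at most one replacement
digits_hidden = re.sub(r'[0-9]', '#', 'room 101')  # the pattern can be a character class

print(all_replaced)   # XX Analytics Vidhya XX
print(first_only)     # XX Analytics Vidhya AV
print(digits_hidden)  # room ###
```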


### re.compile()

re.compile(pattern) compiles a regular expression into a pattern object. Its methods (match(), search(), findall(), split(), sub() and others) work like the module-level functions but without the pattern argument, so you don't have to repeat the same expression in several places.

In [15]:
pattern = re.compile('AV')
result = pattern.findall('AV Analytics Vidhya AV')
print(result)
result2 = pattern.findall('AV is largest analytics community of India')
print(result2)

['AV', 'AV']
['AV']
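re.compile() also accepts flags as its second argument, and they are stored in the compiled pattern. For example, re.IGNORECASE makes matching case-insensitive:

```python
import re

# re.IGNORECASE makes the compiled pattern case-insensitive
pattern = re.compile(r'av', re.IGNORECASE)

found = pattern.findall('AV Analytics Vidhya AV')  # ['AV', 'AV']
first = pattern.search('Avocado').group()         # 'Av'

print(found)
print(first)
print(pattern.pattern)  # av: the original pattern string
```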


## Metacharacters

Now let's look at the metacharacters mentioned above:

### Square brackets ( [ ] )

Square brackets define a set of characters (a character class). The set matches a single character of the text if that character is one of the characters listed in the brackets: [abc] matches 'a', 'b' or 'c'.

In [77]:
input_str = "The film Titanic was released in 1998"

# The r prefix makes the pattern a raw string, so Python does not treat backslashes in it as escape sequences
pattern = r'[abc]'

result = re.match(pattern, input_str)
print(result)
print()
# The result is None because re.match() only looks at the beginning of the text, and 'T' is not in [abc]

# Now put a matching character at the beginning of the string

input_str = "all the film Titanic was released in 1998"
result = re.match(pattern, input_str)
print(result)
# Now we receive a Match object. To see which character from the square brackets was found (a, b or c),
# use the group() method
print(result.group())
print()

# If the string starts with b instead, we receive the following:

input_str = "ball the film Titanic was released in 1998"
result = re.match(pattern, input_str)
print(result)
print(result.group())

None

<re.Match object; span=(0, 1), match='a'>
a

<re.Match object; span=(0, 1), match='b'>
b


In [90]:
# You can also use re.search(), which scans the whole text, not only its beginning, and returns the first match

input_str = "The film Titanic was released in 1998"
pattern = r'[abc]'

result = re.search(pattern, input_str)
print(result)
# re.Match.span() returns a tuple with the start and end index of the matched string.
# span(g) for a group g that did not take part in the match returns (-1, -1).
print(result.span())

<re.Match object; span=(12, 13), match='a'>
(12, 13)


In [91]:
# To find every match of the pattern in the text, use the findall() method
# It returns a list of the matched strings

input_str = "The film Titanic was released in 1998"
pattern = r'[abc]'

result = re.findall(pattern, input_str)
print(result)

['a', 'c', 'a', 'a']


In [11]:
# [a-z] matches any lowercase letter, so this pattern returns all lowercase letters in the text
input_str = "The film Titanic was released in 1998"
pattern = r'[a-z]'

result = re.findall(pattern, input_str)
print(result)
print()

# [A-Z] matches any uppercase letter
input_str = "The film Titanic was released in 1998"
pattern = r'[A-Z]'

result = re.findall(pattern, input_str)
print(result)
print()

# And finally [0-9] matches any digit
input_str = "The film Titanic was released in 1998"
pattern = r'[0-9]'

result = re.findall(pattern, input_str)
print(result)

['h', 'e', 'f', 'i', 'l', 'm', 'i', 't', 'a', 'n', 'i', 'c', 'w', 'a', 's', 'r', 'e', 'l', 'e', 'a', 's', 'e', 'd', 'i', 'n']

['T', 'T']

['1', '9', '9', '8']


### Period ( . )

The period matches any single character except the newline character \n.

In [104]:
input_str = "The film Titanic was released in 1998"

print('Using match():')

pattern = r'.'
result = re.match(pattern, input_str)
print(result)

pattern = r'..'
result = re.match(pattern, input_str)
print(result)

pattern = r'...'

result = re.match(pattern, input_str)
print(result)
print()

print('Using search():')

pattern = r'.'
result = re.search(pattern, input_str)
print(result)

pattern = r'..'
result = re.search(pattern, input_str)
print(result)

pattern = r'...'

result = re.search(pattern, input_str)
print(result)
print()

print('Using findall():')
print("One period pattern:")
pattern = r'.'
result = re.findall(pattern, input_str)
print(result)
print()

print("Two period pattern:")
pattern = r'..'
result = re.findall(pattern, input_str)
print(result)
print()

print("Three period pattern:")
pattern = r'...'
result = re.findall(pattern, input_str)
print(result)

# As you can see, the period matches any character, including spaces.
# findall() returns non-overlapping matches, so the leftover final '8' is not included.

Using match():
<re.Match object; span=(0, 1), match='T'>
<re.Match object; span=(0, 2), match='Th'>
<re.Match object; span=(0, 3), match='The'>

Using search():
<re.Match object; span=(0, 1), match='T'>
<re.Match object; span=(0, 2), match='Th'>
<re.Match object; span=(0, 3), match='The'>

Using findall():
One period pattern:
['T', 'h', 'e', ' ', 'f', 'i', 'l', 'm', ' ', 'T', 'i', 't', 'a', 'n', 'i', 'c', ' ', 'w', 'a', 's', ' ', 'r', 'e', 'l', 'e', 'a', 's', 'e', 'd', ' ', 'i', 'n', ' ', '1', '9', '9', '8']

Two period pattern:
['Th', 'e ', 'fi', 'lm', ' T', 'it', 'an', 'ic', ' w', 'as', ' r', 'el', 'ea', 'se', 'd ', 'in', ' 1', '99']

Three period pattern:
['The', ' fi', 'lm ', 'Tit', 'ani', 'c w', 'as ', 'rel', 'eas', 'ed ', 'in ', '199']
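The one character the period does not match is the newline; the re.DOTALL flag changes that. To match a literal period, escape it with a backslash. A short sketch:

```python
import re

text = 'ab\ncd'

default = re.findall(r'.', text)              # ['a', 'b', 'c', 'd']: the newline is skipped
dotall = re.findall(r'.', text, re.DOTALL)    # ['a', 'b', '\n', 'c', 'd']
literal = re.findall(r'\.', 'v1.2.3')         # ['.', '.']: the backslash makes the period literal

print(default)
print(dotall)
print(literal)
```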


### Caret ( ^ )

The caret checks whether the string starts with the pattern.

In [107]:
print(re.search(r'^a', "a"))
print(re.search(r'^a', "abc"))
print(re.search(r'^a', "bac"))
print(re.search(r'^a', "bca"))
print(re.search(r'^a', "acb"))

<re.Match object; span=(0, 1), match='a'>
<re.Match object; span=(0, 1), match='a'>
None
None
<re.Match object; span=(0, 1), match='a'>


If the caret is the first character inside square brackets, it negates the set: [^a] matches any character except 'a'. For example:

In [37]:
print(re.search(r'[^a]', "acb"))
print(re.search(r'[^ac]', "acb"))
print(re.search(r'[^acb]', "acb"))

<re.Match object; span=(1, 2), match='c'>
<re.Match object; span=(2, 3), match='b'>
None


### Dollar ( $ )

The dollar sign checks whether the string ends with the pattern.

In [109]:
print(re.search(r'a$', "a"))
print(re.search(r'a$', "abc"))
print(re.search(r'a$', "bac"))
print(re.search(r'a$', "bca"))
print(re.search(r'a$', "acb"))

<re.Match object; span=(0, 1), match='a'>
None
None
<re.Match object; span=(2, 3), match='a'>
None


### Asterisk ( * )

The asterisk matches zero or more occurrences of the character (or group) immediately to its left.

In [121]:
print(re.search(r'a*n', "woman"))
# Zero or more 'a' characters followed by 'n' in the word woman
print(re.findall(r'a*n', "woman"))
# Zero or more 'a' characters: an empty match is returned at every position where there is no 'a'
print(re.findall(r'a*', "woman"))

<re.Match object; span=(3, 5), match='an'>
['an']
['', '', '', 'a', '', '']


Here is how this symbol works on other strings:

In [128]:
print(re.search(r'a*n', "mn"))
print(re.findall(r'a*n', "mn"))
print()

print(re.search(r'a*n', "maaaaan"))
print(re.findall(r'a*n', "maaaaan"))
print()

# Here 'n' is preceded by 'i', which counts as zero occurrences of 'a' before 'n'
print(re.search(r'a*n', "main"))
print(re.findall(r'a*n', "main"))
print()

# But if the pattern requires 'm', then zero or more 'a', and finally 'n',
# there is no match in the string "main"
print(re.search(r'ma*n', "main"))
print(re.findall(r'ma*n', "main"))
print()

<re.Match object; span=(1, 2), match='n'>
['n']

<re.Match object; span=(1, 7), match='aaaaan'>
['aaaaan']

<re.Match object; span=(3, 4), match='n'>
['n']

None
[]



### Plus ( + )

This symbol matches one or more occurrences of the character (or group) immediately to its left.

In [129]:
print(re.search(r'a+n', "woman"))
# All matches of one or more 'a' characters followed by 'n' in the word "woman"
print(re.findall(r'a+n', "woman"))
# All runs of one or more 'a' characters in "woman"; unlike *, there are no empty matches
print(re.findall(r'a+', "woman"))

<re.Match object; span=(3, 5), match='an'>
['an']
['a']


In [130]:
# "mn" has no 'a' before 'n', so there is no match
print(re.search(r'a+n', "mn"))
print(re.findall(r'a+n', "mn"))
print()

print(re.search(r'a+n', "maaaaan"))
print(re.findall(r'a+n', "maaaaan"))
print()

# In "main" the 'n' is preceded by 'i', not by 'a', so there is no match:
# + requires at least one 'a' directly before 'n'
print(re.search(r'a+n', "main"))
print(re.findall(r'a+n', "main"))
print()

# And a pattern of 'm', then one or more 'a' characters, then 'n'
# also finds no match in "main"
print(re.search(r'ma+n', "main"))
print(re.findall(r'ma+n', "main"))
print()

None
[]

<re.Match object; span=(1, 7), match='aaaaan'>
['aaaaan']

None
[]

None
[]



### Question mark ( ? )

This symbol matches zero or one occurrence of the character (or group) immediately to its left, i.e. it makes that element optional.

In [132]:
print(re.search(r'a?n', "woman"))
# All matches of an optional 'a' followed by 'n' in the word "woman"
print(re.findall(r'a?n', "woman"))
# All matches of zero or one 'a' in "woman"; empty matches are included
print(re.findall(r'a?', "woman"))

<re.Match object; span=(3, 5), match='an'>
['an']
['', '', '', 'a', '', '']


In [136]:
# "mn" has no 'a' before 'n', but zero occurrences of 'a' are allowed, so 'n' matches
print(re.search(r'a?n', "mn"))
print(re.findall(r'a?n', "mn"))
print()

# Only the last 'a' can directly precede 'n', so the match is "an" at the end of the string
print(re.search(r'a?n', "maaaaan"))
print(re.findall(r'a?n', "maaaaan"))
print()

# In "main" the 'n' is preceded by 'i', so the match is just 'n' (zero occurrences of 'a')
print(re.search(r'a?n', "main"))
print(re.findall(r'a?n', "main"))
print()

# A pattern of 'm', then an optional 'a', then 'n'
# finds no match in "main", because 'i' stands between "ma" and 'n'
print(re.search(r'ma?n', "main"))
print(re.findall(r'ma?n', "main"))
print()

<re.Match object; span=(1, 2), match='n'>
['n']

<re.Match object; span=(5, 7), match='an'>
['an']

<re.Match object; span=(3, 4), match='n'>
['n']

None
[]
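To compare the three quantifiers side by side, the sketch below applies ma*n, ma+n and ma?n to the same made-up word list. It uses re.fullmatch(), which succeeds only when the whole word matches:

```python
import re

words = ["mn", "man", "maan"]
results = {}
for quantifier in ["*", "+", "?"]:
    pattern = "ma" + quantifier + "n"
    # keep only the words that match the pattern from start to end
    results[pattern] = [w for w in words if re.fullmatch(pattern, w)]
    print(pattern, results[pattern])
# ma*n ['mn', 'man', 'maan']
# ma+n ['man', 'maan']
# ma?n ['mn', 'man']
```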



### Braces ( { } )

The form {n,m} matches at least n and at most m repetitions of the character (or group) immediately to its left.

In [6]:
# Here the pattern is 2 or 3 consecutive 'a' characters

print(re.search(r'a{2,3}', "mn"))
print(re.findall(r'a{2,3}', "mn"))
print()

# Quantifiers are greedy: the run "aaaaa" is split into 'aaa' and then 'aa'
print(re.search(r'a{2,3}', "maaaaan"))
print(re.findall(r'a{2,3}', "maaaaan"))
print()

# Here the pattern is 1 or 2 consecutive 'a' characters
print(re.search(r'a{1,2}', "main"))
print(re.findall(r'a{1,2}', "main"))
print()

# A pattern of 'm' followed by 1 or 2 'a' characters
# matches "ma" at the start of "main"
print(re.search(r'ma{1,2}', "main"))
print(re.findall(r'ma{1,2}', "main"))
print()

# A longer string: findall() returns every non-overlapping run of 1 or 2 'a' characters
print(re.search(r'a{1,2}', "aabc for an all"))
print(re.findall(r'a{1,2}', "aabc for an all"))

None
[]

<re.Match object; span=(1, 4), match='aaa'>
['aaa', 'aa']

<re.Match object; span=(1, 2), match='a'>
['a']

<re.Match object; span=(0, 2), match='ma'>
['ma']

<re.Match object; span=(0, 2), match='aa'>
['aa', 'a', 'a']


Quantifiers can also follow a character class. For example, the pattern [0-9]{2,3} means 2 or 3 characters, each of which is a digit from 0 to 9.

In [8]:
print(re.search(r'[0-9]{2,3}', "One of my men have 90 knives, 500 guns and 79 grenades"))
print(re.findall(r'[0-9]{2,3}', "One of my men have 90 knives, 500 guns and 79 grenades"))
print()


# findall() returns all non-overlapping matches: the run of 7 digits is split greedily
# into '123' and '456', and the leftover '7' is too short to match
print(re.search(r'[0-9]{2,3}', "12 and 1234567"))
print(re.findall(r'[0-9]{2,3}', "12 and 1234567"))

<re.Match object; span=(19, 21), match='90'>
['90', '500', '79']

<re.Match object; span=(0, 2), match='12'>
['12', '123', '456']


{n} matches exactly n repetitions of the character (or pattern) to its left.

In [12]:
print(re.search(r'[0-9]{3}', "12 and 1234567"))
print(re.findall(r'[0-9]{3}', "12 and 1234567"))

<re.Match object; span=(7, 10), match='123'>
['123', '456']
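Python also accepts {n,} with no upper bound, meaning "n or more repetitions". This small sketch, on an arbitrary test string, compares it with the exact form {n}:

```python
import re

text = "a aa aaa aaaa"

# {2,}: two or more consecutive 'a' characters; greedy, so each run is taken whole
at_least_two = re.findall(r'a{2,}', text)
print(at_least_two)   # ['aa', 'aaa', 'aaaa']

# {2}: exactly two; longer runs are split into non-overlapping pairs,
# and a leftover single 'a' does not match
exactly_two = re.findall(r'a{2}', text)
print(exactly_two)    # ['aa', 'aa', 'aa', 'aa']
```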


### Vertical bar ( | )

The vertical bar | is the alternation operator (a logical 'or'). A regex of the form "pattern1|pattern2" matches occurrences of either pattern1 or pattern2 in the text.

In [13]:
# This pattern matches single digits or lowercase letters in the text
input_str = "The film Titanic was released in 1998"
pattern = r'[0-9]|[a-z]'

result = re.findall(pattern, input_str)
print(result)

['h', 'e', 'f', 'i', 'l', 'm', 'i', 't', 'a', 'n', 'i', 'c', 'w', 'a', 's', 'r', 'e', 'l', 'e', 'a', 's', 'e', 'd', 'i', 'n', '1', '9', '9', '8']
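One detail worth knowing: Python tries the alternatives from left to right and stops at the first one that matches, even if a later alternative would match more text. A minimal sketch with made-up patterns:

```python
import re

# "cat" is tried first and matches, so "category" is never considered
first = re.search(r'cat|category', "category").group(0)
print(first)    # cat

# With the longer alternative first, the whole word matches
second = re.search(r'category|cat', "category").group(0)
print(second)   # category
```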


### Round brackets ( )

Round brackets group sub-patterns together. For example, the expression (a|b|c)xz matches any text in which a, b, or c is immediately followed by xz.

In [21]:
# Here we look for the letter that comes directly before xz: a, b or c
# The group (a|b|c) matches exactly one of these letters
# The round brackets limit the scope of | to the letters a, b and c
# Note: when the pattern contains a group, findall() returns the captured group, not the whole match

print(re.search(r'(a|b|c)xz', "ab xz"))
print(re.findall(r'(a|b|c)xz', "ab xz"))
print()

print(re.search(r'(a|b|c)xz', "abxz"))
print(re.findall(r'(a|b|c)xz', "abxz"))
print()

print(re.search(r'(a|b|c)xz', "axz cabxz"))
print(re.findall(r'(a|b|c)xz', "axz cabxz"))

None
[]

<re.Match object; span=(1, 4), match='bxz'>
['b']

<re.Match object; span=(0, 3), match='axz'>
['a', 'b']
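Because findall() returns only the captured group here, you may want the whole matches instead. This sketch shows two standard ways: finditer() with group(0), and a non-capturing group (?:...):

```python
import re

text = "axz cabxz"

# With one group in the pattern, findall() returns only the captured letter
print(re.findall(r'(a|b|c)xz', text))      # ['a', 'b']

# finditer() yields Match objects, so group(0) gives the whole match
whole = [m.group(0) for m in re.finditer(r'(a|b|c)xz', text)]
print(whole)                                # ['axz', 'bxz']

# A non-capturing group (?:...) groups the alternatives without capturing them
non_capturing = re.findall(r'(?:a|b|c)xz', text)
print(non_capturing)                        # ['axz', 'bxz']
```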


Let's look at the difference between ordinary patterns and grouped patterns:

In [28]:
import pandas as pd

tabledata = [
    ["bar+", 'The + metacharacter applies only to the character "r"', '"ba" followed by one or more occurrences of "r"'],
    ["(bar)+", 'The + metacharacter applies to the entire string "bar"', 'One or more occurrences of "bar" in the text'],
]

pd.DataFrame(tabledata, columns=["Regex", "Interpretation", "Matches"])

|   | Regex | Interpretation | Matches |
|---|-------|----------------|---------|
| 0 | bar+ | The + metacharacter applies only to the character "r" | "ba" followed by one or more occurrences of "r" |
| 1 | (bar)+ | The + metacharacter applies to the entire string "bar" | One or more occurrences of "bar" in the text |
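The two rows of the table can be checked directly. In this sketch on an arbitrary string, the + after 'r' repeats only that letter, while the + after the group repeats the whole "bar":

```python
import re

text = "barrr barbar"

# bar+: the + repeats only 'r'
ungrouped = re.findall(r'bar+', text)
print(ungrouped)   # ['barrr', 'bar', 'bar']

# (bar)+: the + repeats the whole group; group(0) is the full match
grouped = [m.group(0) for m in re.finditer(r'(bar)+', text)]
print(grouped)     # ['bar', 'barbar']
```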


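A more complicated pattern:

The regex (ba[rz]){2,4}(qux)? matches 2 to 4 consecutive occurrences of either 'bar' or 'baz', optionally followed by 'qux' (zero or one occurrence of 'qux'). Because the pattern has two groups, findall() returns one tuple per match; a repeated group captures only its last repetition, and an optional group that did not participate gives ''.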

In [45]:
print(re.search(r'(ba[rz]){2,4}(qux)?', "bazbarbazqux"))
print(re.findall(r'(ba[rz]){2,4}(qux)?', "bazbarbazqux"))
print()

print(re.search(r'(ba[rz]){2,4}(qux)?', "barbar"))
print(re.findall(r'(ba[rz]){2,4}(qux)?', "barbar"))
print()

<re.Match object; span=(0, 12), match='bazbarbazqux'>
[('baz', 'qux')]

<re.Match object; span=(0, 6), match='barbar'>
[('bar', '')]



To get the matched text itself, call group(0) on the Match object returned by re.search(). It returns the whole matched substring as a string:

In [48]:
print(re.search(r'(ba[rz]){2,4}(qux)?', "bazbarbazqux"))
print(re.search(r'(ba[rz]){2,4}(qux)?', "bazbarbazqux").group(0))

<re.Match object; span=(0, 12), match='bazbarbazqux'>
bazbarbazqux


group(0) always returns the whole matched text, whether or not it was captured in a group. This pattern also has two capturing groups: group(1) returns the text captured by (ba[rz]), which for a repeated group is its last repetition ('baz'), and group(2) returns 'qux'. See http://docs.python.org/library/re.html#re.MatchObject.groups.
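A quick check of the numbered groups for the same search, using the standard Match methods group() and groups():

```python
import re

m = re.search(r'(ba[rz]){2,4}(qux)?', "bazbarbazqux")

print(m.group(0))   # bazbarbazqux  (the whole match)
print(m.group(1))   # baz           (last repetition of (ba[rz]))
print(m.group(2))   # qux
print(m.groups())   # ('baz', 'qux')
```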

### Backslash ( \ )

The backslash escapes special characters, including all metacharacters. For example, the expression \$a matches a literal $ followed by a: the escaped $ is not interpreted by the regex engine as the end-of-string anchor.

In [49]:
print(re.search(r'\$a', "b$azbarb$azqux"))
print(re.search(r'\$a', "b$azbarb$azqux").group(0))

<re.Match object; span=(1, 3), match='$a'>
$a
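When the literal text comes from a variable, escaping it by hand is error-prone. The standard function re.escape() does it for you; in Python 3.7 and later it escapes only characters that are special in regular expressions. A small sketch with a made-up price string:

```python
import re

# re.escape() backslash-escapes every regex metacharacter in the string
pattern = re.escape("$5.00")
print(pattern)   # \$5\.00

# The escaped pattern matches only the literal text "$5.00"
found = re.findall(pattern, "It costs $5.00, not $5500")
print(found)     # ['$5.00']
```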


## Special sequences

Special sequences make commonly used patterns easier to write. Let's look at the most popular of these:

### \A

Matches only if the specified characters are at the very start of the string.

In [21]:
print(re.findall(r'\Athe', 'the telegraph'))
print(re.findall(r'\Athe', 'On the Sun'))

['the']
[]


### \Z

Matches if the specified characters are at the end of the string.

In [44]:
print(re.findall(r'the\Z', 'the telegraph'))
print(re.findall(r'the\Z', 'On the Sun the'))

[]
['the']
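\A and \Z look similar to ^ and $, but they differ in two standard cases: with the re.MULTILINE flag, ^ matches at the start of every line, and $ also matches just before a newline at the very end of the string. A minimal sketch on a made-up two-line string:

```python
import re

text = "first line\nsecond line\n"

# With re.MULTILINE, ^ matches at the start of every line; \A only at the start of the string
multi_caret = re.findall(r'^\w+', text, flags=re.MULTILINE)
multi_a = re.findall(r'\A\w+', text, flags=re.MULTILINE)
print(multi_caret)   # ['first', 'second']
print(multi_a)       # ['first']

# $ also matches just before a final newline; \Z only at the very end
print(bool(re.search(r'line$', text)))    # True
print(bool(re.search(r'line\Z', text)))   # False
```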


### \b

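Matches the empty string at the beginning or end of a word. A word boundary is a position between a word character (\w) and a non-word character (or the edge of the string), so punctuation delimits words just like spaces do.

* \b(word) - matches at the beginning of the word
* (word)\b - matches at the end of the word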

In [17]:
print(re.findall(r'\bfoo', 'football'))
print(re.findall(r'\bfoo', 'a football'))
print(re.findall(r'\bfoo', 'afootball'))

print(re.findall(r'foo\b', 'the foo'))
print(re.findall(r'foo\b', 'the afoo test'))
print(re.findall(r'foo\b', 'theafootest'))

['foo']
['foo']
[]
['foo']
['foo']
[]
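Two practical notes, shown in a short sketch with an arbitrary test string: punctuation also forms a word boundary, and \b must be written in a raw string, because in an ordinary Python string '\b' is the backspace character:

```python
import re

text = "foo, foo-bar, foobar"

# ',' and '-' are non-word characters, so they form boundaries too
whole_words = re.findall(r'\bfoo\b', text)
print(whole_words)   # ['foo', 'foo']

# Without the r prefix, '\b' is a backspace character, so nothing matches
no_raw = re.findall('\bfoo\b', text)
print(no_raw)        # []
```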


### \B

Matches if the specified characters are not at the beginning or end of a word. This sequence is the opposite of \b.

In [18]:
print(re.findall(r'\Bfoo', 'football'))
print(re.findall(r'\Bfoo', 'a football'))
print(re.findall(r'\Bfoo', 'afootball'))

print(re.findall(r'foo\B', 'the foo'))
print(re.findall(r'foo\B', 'the afoo test'))
print(re.findall(r'foo\B', 'theafootest'))

[]
[]
['foo']
[]
[]
['foo']


### \d

Matches any decimal digit. For ASCII text this is equivalent to [0-9]; in str patterns it also matches other Unicode decimal digits unless the re.ASCII flag is used.

In [23]:
print(re.findall(r'\d', '78cbd12'))
print(re.findall(r'\d', 'cbd'))

['7', '8', '1', '2']
[]
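The Unicode caveat can be seen with a non-ASCII digit. This sketch uses '\u0663' (ARABIC-INDIC DIGIT THREE) as an example character:

```python
import re

text = "12\u0663"   # '1', '2' and ARABIC-INDIC DIGIT THREE

unicode_digits = re.findall(r'\d', text)            # matches all three digits
ascii_digits = re.findall(r'\d', text, re.ASCII)    # ['1', '2']
class_digits = re.findall(r'[0-9]', text)           # ['1', '2']

print(unicode_digits, ascii_digits, class_digits)
```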


It can also be useful to combine this special sequence with metacharacters such as { }, ( ), + or ?:

In [29]:
print(re.search(r'(foo(bar)?)+(\d\d\d)?', 'foofoobar').group(0))
print(re.search(r'(foo(bar)?)+(\d\d\d)?', 'foofoobar123').group(0))
print(re.search(r'(foo(bar)?)+(\d\d\d)?', 'foofoo123').group(0))

foofoobar
foofoobar123
foofoo123


### \D

Matches any non-digit character. This special sequence is the opposite of \d; for ASCII text it is equivalent to [^0-9].

In [32]:
print(re.findall(r'\D', '78cbd12'))
print(re.findall(r'\D', '7812'))

['c', 'b', 'd']
[]


### \s
Matches any whitespace character. For ASCII text this is equivalent to [ \t\n\r\f\v] (note the space at the start of the set).

In [38]:
print(re.findall(r'\s', "Python is   the   best language for text parsing"))
print(re.findall(r'\s', "Pythonisthebestlanguagefortextparsing"))

[' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ']
[]
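\s is not limited to spaces. This sketch, on a made-up string with a tab and a newline, also shows re.split() with \s+, a common way to split on any run of whitespace:

```python
import re

text = "Python\tis\n  the best"

# \s matches tabs and newlines as well as spaces
spaces = re.findall(r'\s', text)
print(spaces)   # ['\t', '\n', ' ', ' ', ' ']

# \s+ splits on each run of whitespace
words = re.split(r'\s+', text)
print(words)    # ['Python', 'is', 'the', 'best']
```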


### \S

Matches any non-whitespace character. This special sequence is the opposite of \s; for ASCII text it is equivalent to [^ \t\n\r\f\v].

In [41]:
print(re.findall(r'\S', "a     b   c"))
print(re.findall(r'\S', "abc"))
print(re.findall(r'\S', "    "))

['a', 'b', 'c']
['a', 'b', 'c']
[]


### \w

Matches any word character: letters, digits and the underscore. For ASCII text this is equivalent to [a-zA-Z0-9_]; in str patterns it also matches Unicode letters and digits.

In [42]:
print(re.findall(r'\w', "12&:;c"))
print(re.findall(r'\w', "#&:;%^&"))

['1', '2', 'c']
[]
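The underscore detail matters when matching identifiers. A one-line sketch with made-up names:

```python
import re

# \w includes the underscore but not the hyphen
names = re.findall(r'\w+', "snake_case, kebab-case")
print(names)   # ['snake_case', 'kebab', 'case']
```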


### \W

Matches any non-word character (not a letter, digit or underscore). For ASCII text this is equivalent to [^a-zA-Z0-9_].

In [43]:
print(re.findall(r'\W', "12&:;c"))
print(re.findall(r'\W', "#&:;%^&"))

['&', ':', ';']
['#', '&', ':', ';', '%', '^', '&']


# Collections

## Counter

In [7]:
from collections import Counter

In [8]:
# Create an iterable data structure,
# for example, a string

string = "Can I take more these delicious French bun?"

# Counter is a dictionary subclass: its keys are the items of the data structure
# and its values are how many times each item occurs
my_counter = Counter(string)
print(my_counter)
print()

# most_common(N) returns the N most common elements with their counts
N = 2
print(my_counter.most_common(N))
print()

# elements() returns an iterator over the elements, each repeated as many times as its count
print(list(my_counter.elements()))

Counter({' ': 7, 'e': 6, 'n': 3, 'a': 2, 't': 2, 'o': 2, 'r': 2, 'h': 2, 's': 2, 'i': 2, 'c': 2, 'u': 2, 'C': 1, 'I': 1, 'k': 1, 'm': 1, 'd': 1, 'l': 1, 'F': 1, 'b': 1, '?': 1})

[(' ', 7), ('e', 6)]

['C', 'a', 'a', 'n', 'n', 'n', ' ', ' ', ' ', ' ', ' ', ' ', ' ', 'I', 't', 't', 'k', 'e', 'e', 'e', 'e', 'e', 'e', 'm', 'o', 'o', 'r', 'r', 'h', 'h', 's', 's', 'd', 'l', 'i', 'i', 'c', 'c', 'u', 'u', 'F', 'b', '?']
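Counter is just as useful for counting words. This sketch, on a made-up sentence, also shows two convenient behaviors: a missing key counts as zero, and update() adds more counts:

```python
from collections import Counter

words = "the cat and the hat and the bat".split()
counts = Counter(words)

print(counts["the"])           # 3
print(counts.most_common(1))   # [('the', 3)]

# Missing keys count as zero instead of raising KeyError
print(counts["dog"])           # 0

# update() adds counts from another iterable
counts.update(["cat", "cat"])
print(counts["cat"])           # 3
```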


## Namedtuples

In [9]:
from collections import namedtuple

In [10]:
# namedtuple('name_of_class', 'names, of, fields')
Point = namedtuple('Point', "x, y")
# Here we created class Point with fields x and y

# And we can use it
pt = Point(3, 4)
print(pt)
print()

# Or access the fields directly by name
print('fields: ',  pt.x, pt.y)

Point(x=3, y=4)

fields:  3 4
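A namedtuple is still an ordinary tuple, so it supports indexing and unpacking and cannot be modified in place. This sketch uses the standard helper methods _replace() and _asdict() (the latter returns a plain dict since Python 3.8):

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])   # field names may also be given as a list
pt = Point(3, 4)

# Indexing and unpacking work as for any tuple
x, y = pt
print(pt[0] + pt[1])        # 7

# _replace() returns a new instance with some fields changed
print(pt._replace(x=10))    # Point(x=10, y=4)

# _asdict() converts the instance to a dictionary
print(pt._asdict())         # {'x': 3, 'y': 4}

# Fields cannot be reassigned, because tuples are immutable
try:
    pt.x = 5
except AttributeError as err:
    print("AttributeError:", err)
```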


## deque

In [11]:
from collections import deque

In [12]:
# deque (double-ended queue) supports adding and removing elements at both ends

d = deque()

# append from the right side
d.append(1)
d.append(2)
d.append(3)
d.append(4)
print(d,"\n")

# append from the left side
d.appendleft(5)
print(d, '\n')

# remove one element from the right (pop) and one from the left (popleft)
d.pop()
d.popleft()
print(d,"\n")
# d.clear() removes all elements from the deque

# Extend the deque with another iterable collection,
# on the right side
d.extend([7,8,9])
# and on the left side (extendleft() adds the items one by one, so they end up in reverse order)
d.extendleft([7,8,9])
print(d, "\n")

# Lists also have an extend() method

# You can rotate all the elements of the deque by N positions (to the right for positive N)
N = 2
print('original deque: ', d)
d.rotate(N)
print('deque rotated on', N, 'positions: ', d)
# A negative N rotates to the left, undoing the rotation above
d.rotate(-N)
print('deque rotated on', N, 'positions reverse: ', d)

deque([1, 2, 3, 4]) 

deque([5, 1, 2, 3, 4]) 

deque([1, 2, 3]) 

deque([9, 8, 7, 1, 2, 3, 7, 8, 9]) 

original deque:  deque([9, 8, 7, 1, 2, 3, 7, 8, 9])
deque rotated on 2 positions:  deque([8, 9, 9, 8, 7, 1, 2, 3, 7])
deque rotated on 2 positions reverse:  deque([9, 8, 7, 1, 2, 3, 7, 8, 9])
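A deque can also be bounded with the maxlen argument: once it is full, each append on one end discards an item from the other end. This is a common way to keep "the last N items", sketched here with made-up numbers:

```python
from collections import deque

last_three = deque(maxlen=3)
for i in range(1, 6):
    last_three.append(i)   # when full, the oldest item on the left is dropped

print(last_three)   # deque([3, 4, 5], maxlen=3)
```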


# Iterators and Generators

## Iterators

An iterator is an object that implements two methods: __iter__(), which returns the iterator itself, and __next__(), which returns the next item of the underlying iterable collection (string, list, dictionary, etc.). The built-in functions iter() and next() call these methods.

An iterator is obtained from an iterable collection by calling iter() on it:

In [13]:
some_list = [1, 2, 3, 4, 5]

# iterator positioned at the beginning of this list
some_iter = iter(some_list)
print("our iterator: ", some_iter, "\n")

# step of iteration
print(next(some_iter))
# and the next one
print(next(some_iter), "\n")

our iterator:  <list_iterator object at 0x0000023809408580> 

1
2 



If you keep calling next() after the iterator has reached the end of the collection, it raises a StopIteration exception.
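To make the protocol concrete, here is a minimal sketch of a hand-written iterator. The class Countdown is hypothetical: it implements __iter__() and __next__() and raises StopIteration when it runs out, which is exactly what list() and for loops rely on:

```python
class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__()
        return self

    def __next__(self):
        # Signal the end of the iteration with StopIteration
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


it = Countdown(3)
print(list(it))   # [3, 2, 1]

# The iterator is now exhausted, so next() raises StopIteration
try:
    next(it)
except StopIteration:
    print("StopIteration raised")

# next() with a default value returns the default instead of raising
print(next(it, "done"))   # done
```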

## Generators

Generators let you write your own iterator as an ordinary function.
A generator function uses yield instead of return. Each yield hands a value to the caller and pauses the function; the next call to next() resumes it right after that yield. The generator is exhausted when the function body finishes.

A generator is an iterator, so it can be read only once: it produces values on demand and does not store them in memory after generating them.

In other words, generators create a single-use iterable collection.

In [14]:
# yield returns the next value of the generator's sequence,
# one value per next() call, until the values run out

# Let's say each of the three values is preceded by some code,
# for example, code that prints a message.

def myGenerator():
    print('First item')
    yield 10
    
    print('Second item')
    yield 20
    
    print('Last item')
    yield 30


# create a generator object; nothing runs until the first next() call
gen = myGenerator()

print(next(gen))
print(next(gen))
print(next(gen))

First item
10
Second item
20
Last item
30


The print() calls are just an example. Any code can run before a yield; for example, it can compute the value that yield returns according to some rule. In this way generators can produce arbitrary iterable sequences.

In [15]:
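def getSequenceUpTo(x):
    # yield the integers 0, 1, ..., x-1 one at a time
    # (i is always smaller than x inside range(x), so no extra check is needed)
    for i in range(x):
        yield i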
        

x = 5

# create a generator object positioned at the start of the sequence
gen = getSequenceUpTo(x)

print(next(gen))
print(next(gen))
print(next(gen))
print(next(gen))
print(next(gen), "\n")

# A for loop would normally iterate over the generator,
# but the loop below prints nothing
for i in gen:
    print(i)

# because the five next() calls above have already exhausted the generator:
# generators can only be read once, since they do not store their values

0
1
2
3
4 



In [16]:
# create a new generator with the same form

gen = getSequenceUpTo(x)

# and let the for loop drive it: the loop calls next() until StopIteration

for i in gen:
    print(i)
    
# it works!

0
1
2
3
4


Another example of a generator: the Fibonacci numbers smaller than maximum

In [17]:
def fibonacci(maximum):
    a, b = 0, 1
    
    while a < maximum:
        yield a
        a, b = b, a + b

In [18]:
maximum = 150

fib = fibonacci(maximum)

for i in fib:
    print(i)

0
1
1
2
3
5
8
13
21
34
55
89
144
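Because values are produced on demand, a generator does not even need to end. This sketch uses a hypothetical unbounded variant of fibonacci() and the standard itertools.islice() to take only the first ten values:

```python
from itertools import islice


def fibonacci_infinite():
    # Hypothetical variant of fibonacci() without an upper limit
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


# islice() stops after 10 values, so the infinite loop never has to finish
first_ten = list(islice(fibonacci_infinite(), 10))
print(first_ten)   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```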


# Itertools

The itertools module provides special functions for working with iterables. Would you like to duplicate a generator? Chain two generators into one sequence? Flatten the values of a nested list into one row? Apply map or zip without building an intermediate list?

Just add the itertools import.
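The questions above map onto standard itertools functions: tee() duplicates an iterator, chain() connects iterables, chain.from_iterable() flattens one level of nesting, and starmap() is a map() that unpacks argument tuples. A short sketch with a made-up generator:

```python
from itertools import chain, starmap, tee


def numbers():
    yield 1
    yield 2
    yield 3


# tee(): two independent iterators over one generator
# (the original generator should not be used afterwards)
first, second = tee(numbers())
print(list(first), list(second))                            # [1, 2, 3] [1, 2, 3]

# chain(): connect several iterables into one sequence
chained = list(chain(numbers(), "ab"))
print(chained)                                              # [1, 2, 3, 'a', 'b']

# chain.from_iterable(): flatten one level of nesting
flat = list(chain.from_iterable([[1, 2], [3], [4, 5]]))
print(flat)                                                 # [1, 2, 3, 4, 5]

# starmap(): like map(), but unpacks each tuple into the function's arguments
powers = list(starmap(pow, [(2, 3), (3, 2)]))
print(powers)                                               # [8, 9]
```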

## product

In [19]:
from itertools import product

product returns an iterator that yields the Cartesian product of the input iterables as tuples

In [20]:
print("For lists: ")
a = [1, 2]
b = [3, 4]
c = [5, 6]
print("product(a, b) =", list(product(a, b)))
print("product(a, b, c) =", list(product(a, b, c)), "\n")

print("For strings: ")
a = "ab"
b = "cd"
c = "ef"
print("product(a, b) =", list(product(a, b)))
print("product(a, b, c) =", list(product(a, b, c)), "\n")

print("For mix of iterable objects: ")
a = "ab"
b = [1, 2]
c = ("c", 3)
print("product(a, b) =", list(product(a, b)))
print("product(a, b, c) =", list(product(a, b, c)), "\n")

# repeat is the number of repetitions. This optional argument takes the product
# of the iterables with themselves: product(a, b, repeat=2) equals product(a, b, a, b)

N = 2
print("For",N ,"repeats : ")
a = [1, 2]
b = [3, 4]
c = [5, 6]
print("product(a, b) =", list(product(a, b, repeat=N)))
print("product(a, b, c) =", list(product(a, b, c, repeat=N)), "\n")

For lists: 
product(a, b) = [(1, 3), (1, 4), (2, 3), (2, 4)]
product(a, b, c) = [(1, 3, 5), (1, 3, 6), (1, 4, 5), (1, 4, 6), (2, 3, 5), (2, 3, 6), (2, 4, 5), (2, 4, 6)] 

For strings: 
product(a, b) = [('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd')]
product(a, b, c) = [('a', 'c', 'e'), ('a', 'c', 'f'), ('a', 'd', 'e'), ('a', 'd', 'f'), ('b', 'c', 'e'), ('b', 'c', 'f'), ('b', 'd', 'e'), ('b', 'd', 'f')] 

For mix of iterable objects: 
product(a, b) = [('a', 1), ('a', 2), ('b', 1), ('b', 2)]
product(a, b, c) = [('a', 1, 'c'), ('a', 1, 3), ('a', 2, 'c'), ('a', 2, 3), ('b', 1, 'c'), ('b', 1, 3), ('b', 2, 'c'), ('b', 2, 3)] 

For 2 repeats : 
product(a, b) = [(1, 3, 1, 3), (1, 3, 1, 4), (1, 3, 2, 3), (1, 3, 2, 4), (1, 4, 1, 3), (1, 4, 1, 4), (1, 4, 2, 3), (1, 4, 2, 4), (2, 3, 1, 3), (2, 3, 1, 4), (2, 3, 2, 3), (2, 3, 2, 4), (2, 4, 1, 3), (2, 4, 1, 4), (2, 4, 2, 3), (2, 4, 2, 4)]
product(a, b, c) = [(1, 3, 5, 1, 3, 5), (1, 3, 5, 1, 3, 6), (1, 3, 5, 1, 4, 5), (1, 3, 5, 1, 4, 6), (1, 3, 5, 2, 

## permutations

In [21]:
from itertools import permutations

permutations(Iter, N) returns an iterator that yields all possible ordered arrangements (permutations) of N elements of Iter, where Iter is an iterable object.

Number of elements: $A_{n}^{N} = \frac{n!}{(n-N)!}$, where n = len(Iter)

In [22]:
a = [1, 2, 3, 4]
N = 2
print(f"permutations(a, {N}) = ", list(permutations(a, N)))

permutations(a, 2) =  [(1, 2), (1, 3), (1, 4), (2, 1), (2, 3), (2, 4), (3, 1), (3, 2), (3, 4), (4, 1), (4, 2), (4, 3)]


## combinations 

In [23]:
from itertools import combinations

combinations(Iter, N) returns an iterator that yields all possible combinations (unordered, without repetitions) of N elements of Iter, where Iter is an iterable object.

Number of elements: $C_{n}^{N} = \frac{n!}{N!\,(n-N)!}$, where n = len(Iter)

In [24]:
a = [1, 2, 3, 4]
N = 2
print(f"combinations(a, {N}) = ", list(combinations(a, N)))

combinations(a, 2) =  [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]


## combinations_with_replacement

In [25]:
from itertools import combinations_with_replacement

combinations_with_replacement(Iter, N) returns an iterator that yields all possible combinations with replacement (combinations in which elements may repeat) of N elements of Iter, where Iter is an iterable object.

Number of elements: $C_{n + N - 1}^{N}$, where n = len(Iter)

In [26]:
a = [1, 2, 3, 4]
N = 2
print(f"combinations_with_rep(a, {N}) = ", list(combinations_with_replacement(a, N)))

combinations_with_rep(a, 2) =  [(1, 1), (1, 2), (1, 3), (1, 4), (2, 2), (2, 3), (2, 4), (3, 3), (3, 4), (4, 4)]
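The three counting formulas can be checked numerically. This sketch compares the lengths of the generated sequences with math.perm() and math.comb(), which are available in Python 3.8 and later:

```python
import math
from itertools import combinations, combinations_with_replacement, permutations

a = [1, 2, 3, 4]
N = 2
n = len(a)

n_perm = len(list(permutations(a, N)))
n_comb = len(list(combinations(a, N)))
n_comb_rep = len(list(combinations_with_replacement(a, N)))

print(n_perm, math.perm(n, N))                 # 12 12
print(n_comb, math.comb(n, N))                 # 6 6
print(n_comb_rep, math.comb(n + N - 1, N))     # 10 10
```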


## accumulate

In [27]:
from itertools import accumulate

accumulate(Iter) returns an iterator that yields running sums of Iter: each item is the sum of the current element and all previous elements of the iterable object Iter.

In [28]:
a = [1, 2, 3, 4]
acc = accumulate(a)
print("a = ", a)
print("acc = ", list(acc))

a =  [1, 2, 3, 4]
acc =  [1, 3, 6, 10]


You can also choose the accumulation rule: pass any two-argument function as func. The "operator" module provides ready-made functions of this kind.

For example, operator.mul turns the running sum into a running product:

In [29]:
from itertools import accumulate
import operator

In [30]:
a = [1, 2, 3, 4]
acc = accumulate(a, func=operator.mul)
print("a = ", a)
print("acc = ", list(acc))

a =  [1, 2, 3, 4]
acc =  [1, 2, 6, 24]
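The operator module is convenient but not required: any two-argument callable works as func, including built-ins and lambdas. This sketch on made-up data also uses the initial argument, added in Python 3.8:

```python
from itertools import accumulate

a = [3, 1, 4, 1, 5]

# The built-in max gives a running maximum
running_max = list(accumulate(a, max))
print(running_max)    # [3, 3, 4, 4, 5]

# A lambda gives a running difference
running_diff = list(accumulate(a, lambda acc, x: acc - x))
print(running_diff)   # [3, 2, -2, -3, -8]

# initial puts a starting value in front of the running sums
with_start = list(accumulate(a, initial=100))
print(with_start)     # [100, 103, 104, 108, 109, 114]
```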


## groupby

groupby(Iter, key) returns an iterator that groups consecutive elements of Iter with the same value of the "key" function. It yields (key value, group) pairs, so the input is usually sorted by the same key first.

In [31]:
from itertools import groupby

In [32]:
# key function
def smaller_than_3(x):
    return x < 3

a = [1, 2, 3, 4, 5, 6, 7]
gb = groupby(a, key=smaller_than_3)

# Key True: elements for which the "key" function returns True
# Key False: elements for which the "key" function returns False
for key, value in gb:
    print(key, list(value))

True [1, 2]
False [3, 4, 5, 6, 7]
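The example above works neatly because the list is already ordered by the key. On unsorted input, groupby() starts a new group every time the key value changes, as this sketch with a reshuffled list shows:

```python
from itertools import groupby


def smaller_than_3(x):
    return x < 3


a = [1, 5, 2, 6, 3]   # not sorted by the key

# Only consecutive elements are grouped, so the same key appears several times
unsorted_groups = [(key, list(group)) for key, group in groupby(a, key=smaller_than_3)]
print(unsorted_groups)   # [(True, [1]), (False, [5]), (True, [2]), (False, [6, 3])]

# Sorting by the same key first gives one group per key value
sorted_groups = [(key, list(group))
                 for key, group in groupby(sorted(a, key=smaller_than_3), key=smaller_than_3)]
print(sorted_groups)     # [(False, [5, 6, 3]), (True, [1, 2])]
```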


# Object serialization

Object serialization is the process of converting the state of an object into a byte stream. This byte stream can then be stored in any file-like object, such as a disk file or a memory stream, or transmitted over sockets, etc.

Correspondingly, deserialization is the process of reconstructing the object from the byte stream.

Python calls serialization and deserialization "pickling" and "unpickling", after the standard module for these operations, pickle.

The pickle module in Python's standard library defines functions for:

* Serialization: dump() and dumps()
* Deserialization: load() and loads()

Warning: the pickle module is not secure. Never unpickle data from untrusted or unauthenticated sources, because unpickling can execute arbitrary code.

In [2]:
import pickle

## Serialization (pickling): dump()

In [4]:
dct = {'name': "Alex", 'age': 24, 'gender': 'male', 'marks': 100}

# 'wb' opens the file for writing in binary mode; the with block closes it automatically
with open('pickled.txt', 'wb') as f:
    pickle.dump(dct, f)

When the above code is executed, the byte representation of the dictionary object is stored in the file 'pickled.txt'. For this operation the file must be opened in 'write binary' ('wb') mode.

## Serialization (pickling): dumps()

dumps() pickles Python data into a bytes object, without creating or writing a file.

In [7]:
dct = {'name': "Alex", 'age':24, 'gender':'male', 'marks':100}

dict_byte_str = pickle.dumps(dct)

dict_byte_str

b'\x80\x04\x955\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x04name\x94\x8c\x04Alex\x94\x8c\x03age\x94K\x18\x8c\x06gender\x94\x8c\x04male\x94\x8c\x05marks\x94Kdu.'

## Deserialization (unpickling): loads()

loads() unpickles Python data from a bytes object back into the original object.

In [8]:
dct = pickle.loads(dict_byte_str)

dct

{'name': 'Alex', 'age': 24, 'gender': 'male', 'marks': 100}

## Deserialization (unpickling): load()

In [6]:
# 'rb' opens the file for reading in binary mode; the with block closes it automatically
with open('pickled.txt', 'rb') as f:
    d = pickle.load(f)

print(d)

{'name': 'Alex', 'age': 24, 'gender': 'male', 'marks': 100}


Note that for this operation the file must be opened in 'read binary' ('rb') mode.

Also note that since Python 3.7 dictionaries preserve the insertion order of their keys, and pickling keeps that order, so the retrieved dictionary has its key-value pairs in the original order.
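Pickle also restores Python types such as sets, tuples and dates exactly. The sketch below round-trips a hypothetical record through a file; the file path in the system's temporary directory is an assumption made only for the example:

```python
import os
import pickle
import tempfile
from datetime import date

# Hypothetical record with a set, a date and a tuple
record = {"name": "Alex", "tags": {"admin", "staff"}, "joined": date(2020, 1, 15), "scores": (90, 100)}

# Hypothetical file location in the temporary directory
path = os.path.join(tempfile.gettempdir(), "record.pickle")

with open(path, "wb") as f:
    pickle.dump(record, f)

with open(path, "rb") as f:
    restored = pickle.load(f)

os.remove(path)

print(restored == record)         # True
print(type(restored["tags"]))     # <class 'set'>
print(type(restored["joined"]))   # <class 'datetime.date'>
```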

# Lambda functions

Functions of the form "lambda arguments: expression" are called lambda functions, or anonymous functions. A lambda is a short, single-expression form of a standard function. Lambdas are most useful when a small one-off function is needed; in that case defining a standard function with def would take extra space in your code.

In [33]:
# lambda with one argument
f1 = lambda x: x*x
print("f1 = ", f1(5), "\n")

# lambda with two arguments
f2 = lambda x, y: x+y
print("f2 = ", f2(5, 6), "\n")

# lambda without any argument
f3 = lambda: True
print("f3 = ", f3(), "\n")

f1 =  25 

f2 =  11 

f3 =  True 



Lambdas are often passed as the function argument of filter() and map(), or as the "key" argument of list.sort() and sorted().

## lambda in sorted()

By default, the built-in function sorted() sorts a list of tuples by the first item of each tuple (the x-axis), using later items only to break ties. A lambda passed as key can define another rule, for example sorting by the second item of each tuple (the y-axis), or any other custom rule.

In [34]:
points_2D = [(1, 4), (-2, 5), (6, -3), (4, 12), (11, 2)]
points_2D_sorted = sorted(points_2D, key=lambda x: x[1])
points_2D_sorted_sum = sorted(points_2D, key=lambda x: x[1] + x[0])

print("Initial collection: ",points_2D)
print("Sorted by y-axis: ", points_2D_sorted)
print("Sorted according to the sums of the tuple elements: ", points_2D_sorted_sum)

Initial collection:  [(1, 4), (-2, 5), (6, -3), (4, 12), (11, 2)]
Sorted by y-axis:  [(6, -3), (11, 2), (1, 4), (-2, 5), (4, 12)]
Sorted according to the sums of the tuple elements:  [(-2, 5), (6, -3), (1, 4), (11, 2), (4, 12)]
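Two more variations on the same points: a key that combines both coordinates, and list.sort(), which takes the same key argument but sorts the list in place (here together with reverse=True):

```python
points_2D = [(1, 4), (-2, 5), (6, -3), (4, 12), (11, 2)]

# Sort by squared distance from the origin (no square root needed for ordering)
by_distance = sorted(points_2D, key=lambda p: p[0]**2 + p[1]**2)
print(by_distance)   # [(1, 4), (-2, 5), (6, -3), (11, 2), (4, 12)]

# list.sort() modifies the list itself; reverse=True sorts in descending order
points_2D.sort(key=lambda p: p[1], reverse=True)
print(points_2D)     # [(4, 12), (-2, 5), (1, 4), (11, 2), (6, -3)]
```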


## lambda in map()

In [35]:
a = [1, 2, 5, 3, 5, 7]
b = map(lambda x: x*3, a)
print("Initial list: ", a)
print("Mapped list: ", list(b))

# A list comprehension gives the same result without a lambda:

print("List comprehension: ", [x*3 for x in a])

Initial list:  [1, 2, 5, 3, 5, 7]
Mapped list:  [3, 6, 15, 9, 15, 21]
List comprehension:  [3, 6, 15, 9, 15, 21]


## lambda in filter()

In [36]:
a = [1, 2, 5, 3, 6, 22, 5, 7]
b = filter(lambda x: x%2 == 0, a)

print("Initial list: ", a)
print("Filtered list: ", list(b))

# The filtered list is the same as the result of this list comprehension:

print("Filter in a list comprehension: ", [x for x in a if x%2 == 0])

Initial list:  [1, 2, 5, 3, 6, 22, 5, 7]
Filtered list:  [2, 6, 22]
Filter in a list comprehension:  [2, 6, 22]


# Function arguments

Let's look at function arguments in detail.

## The difference between arguments and parameters

In [51]:
def say_hello(name):
    print(f'Hello, {name}')

say_hello('Alex')

# name - is the parameter, 'Alex' is the argument

Hello, Alex


## Positional and keyword arguments

In [55]:
def things(a, b, c):
    print(a, b, c)
# We can pass positional arguments 1, 2, 3 (matched to the parameters by their position)
things(1, 2, 3)

# or keyword arguments (each assigned to a parameter by name, so their order does not matter)
things(b=2, a=1, c=3)

# or a mix of both kinds. Positional arguments must always come first
things(1, c=3, b=2)

# All three calls print the same result:

1 2 3
1 2 3
1 2 3


## Default arguments

In [57]:
# A default argument is a parameter that gets a default value in the function definition;
# if the caller omits it, the default is used. Parameters with defaults must come after those without


def things(a, b, c, d=4):
    print(a, b, c, d)

things(1, 2, 3)

1 2 3 4
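A default can still be overridden, positionally or by keyword. The sketch below also shows a standard pitfall with a hypothetical helper append_to(): default values are evaluated once, when def runs, so a mutable default such as [] would be shared between calls; None is the usual placeholder instead:

```python
def things(a, b, c, d=4):
    print(a, b, c, d)

things(1, 2, 3, 40)     # 1 2 3 40
things(1, 2, 3, d=40)   # 1 2 3 40


def append_to(item, target=None):
    # Hypothetical helper: create a new list on each call instead of
    # sharing one list object created at definition time
    if target is None:
        target = []
    target.append(item)
    return target

print(append_to(1), append_to(2))   # [1] [2]
```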


## Variable-length arguments (*args and **kwargs)

In [59]:
# *args collects any number of extra positional arguments
# args is a tuple

# **kwargs collects any number of extra keyword arguments
# kwargs is a dictionary

def things(a, b, *args, **kwargs):
    print(a, b)
    for arg in args:
        print(arg)
    for key in kwargs:
        print(key, kwargs[key])
        
things(2, 3, 4, 5, 6, seven=7, eight=8, nine=9, ten=10)

2 3
4
5
6
seven 7
eight 8
nine 9
ten 10


Keyword arguments always come after positional arguments in a call. In a function definition, a bare * marks the end of the positional parameters: every parameter after it is keyword-only.

In [60]:
def things(a, b, *, c, d):
    print(a, b, c, d)

# All the parameters after * must be passed as keyword arguments
things(2, 3, c=4, d=5)

2 3 4 5
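Python enforces the keyword-only rule at call time: passing those parameters positionally raises a TypeError. A minimal sketch (the exact error message varies between Python versions, so it is not shown):

```python
def things(a, b, *, c, d):
    print(a, b, c, d)

# c and d are keyword-only, so this call is rejected
try:
    things(2, 3, 4, 5)
except TypeError as err:
    print("TypeError:", err)
```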


Parameters after *args are keyword-only as well: *args collects all the positional arguments, and the parameters that follow it must be passed by keyword.

In [64]:
def things(*args, b, c):
    for arg in args:
        print(arg)
    print(b, c)
        
things(2, 3, 4, 5, 6, b=7, c=8)

2
3
4
5
6
7 8


## Container unpacking into function arguments

You can pass arguments to a function from some container.

In [86]:
# For positional arguments, unpack a list or a tuple (any iterable works) with *

def foo_args(*args):
    for arg in args:
        print(arg)

print("foo_args(*lst):")
lst = [0, 1, 2, 3]
foo_args(*lst)
print()
print("foo_args(*tpl):")
tpl = (0, 1, 2, 3)
foo_args(*tpl)
print()

# For keyword arguments, unpack a dictionary with **

def foo_kwargs(**kwargs):
    for key in kwargs:
        print(key, kwargs[key])

dct = {'a':4, 'b':5, 'd':6}
print("foo_kwargs(**dct):")
foo_kwargs(**dct)
print()

# For a mix of argument types, use a list/tuple and a dictionary together

def foo(*args, **kwargs):
    for arg in args:
        print(arg)
    for key in kwargs:
        print(key, kwargs[key])

print('foo(*tpl, **dct):')
foo(*tpl, **dct)

foo_args(*lst):
0
1
2
3

foo_args(*tpl):
0
1
2
3

foo_kwargs(**dct):
a 4
b 5
d 6

foo(*tpl, **dct):
0
1
2
3
a 4
b 5
d 6


Unpacking also works with functions that declare named parameters: the unpacked values are matched to those parameters.

In [89]:
# For positional arguments, unpack a list or a tuple with *; the number of items must match the parameters

def foo_args(a, b, c, d):
    print(a, b, c, d)

lst = [0, 1, 2, 3]
print("foo_args(*lst):")
foo_args(*lst)
print()
tpl = (0, 1, 2, 3)
print("foo_args(*tpl):")
foo_args(*tpl)
print()

# For keyword arguments, unpack a dictionary with **; its keys must match the parameter names

def foo_kwargs(a, b, c):
    print(a, b, c)

dct = {'a':4, 'b':5, 'c':6}

print("foo_kwargs(**dct):")
foo_kwargs(**dct)
print()

# For a mix of argument types, use a list/tuple and a dictionary together

def foo(a, b, c, d, *, e, f, g):
    print(a, b, c, d)
    print(e, f, g)
    
dct_1 = {'e':4, 'f':5, 'g':6}
print("foo(*tpl, **dct_1):")
foo(*tpl, **dct_1)

foo_args(*lst):
0 1 2 3

foo_args(*tpl):
0 1 2 3

foo_kwargs(**dct):
4 5 6

foo(*tpl, **dct_1):
0 1 2 3
4 5 6
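Two further details of unpacking, sketched with hypothetical functions: since Python 3.5 several * unpackings can be combined with plain arguments in one call, and an unpacked dictionary whose key does not name a parameter raises a TypeError:

```python
def total(*args):
    return sum(args)

# Several unpackings and plain arguments in one call
print(total(*[1, 2], *(3, 4), 5))   # 15


def foo_kwargs(a, b, c):
    print(a, b, c)

# 'x' is not a parameter of foo_kwargs, so this call fails
try:
    foo_kwargs(**{'a': 1, 'b': 2, 'x': 3})
except TypeError as err:
    print("TypeError:", err)
```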


# Closures in Python

Just as we can nest loops, we can also nest functions: Python lets us define functions inside other functions.

A closure is an inner function, enclosed within an outer function, that remembers variables from the outer function's scope. It can access these variables even after the outer function has finished executing.

Unlike a simple nested function, which the outer function just calls internally as a helper, a closure is a construction in which the outer function returns the inner function without executing it:

In [11]:
# Simple nested function:

def text_output(text):
    # A nested function has access to local variables of the outer
    # function scope (here the parameter text) without receiving
    # them as parameters of its own
    def printing():
        print(text)
        
    printing()

text_output('Hello!')

Hello!


In [12]:
# Closure:

def text_output(text):
    # The inner function of a closure also has access to local
    # variables of the outer function scope, and keeps that access
    # after text_output() has returned
    def printing():
        print(text)
        
    return printing

text_print = text_output('Hello!')
text_print()

Hello!
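A closure keeps access to the variables of its enclosing scope, so it can also carry state between calls. The sketch below uses a hypothetical `make_counter()` to show this. To rebind (not just read) an enclosing variable, the inner function has to declare it `nonlocal`.

```python
def make_counter():
    # 'count' lives in the enclosing scope of increment()
    count = 0

    def increment():
        # nonlocal lets the closure rebind the enclosing variable
        nonlocal count
        count += 1
        return count

    return increment

counter = make_counter()
print(counter())  # 1
print(counter())  # 2
# The captured variable is stored in a cell object on the function
print(counter.__closure__[0].cell_contents)  # 2
```

Each call to `make_counter()` creates a new, independent `count`, so two counters do not share state.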


Let's consider a more complicated example:

In [14]:
# calc() takes a function that defines the operation and returns
# a closure that applies it and prints the result

def calc(func):
    
    def math_op(*args):
        print(func(*args))
    return math_op

def add(x, y):
    return x + y
def sub(x, y):
    return x - y

add_func = calc(add)
sub_func = calc(sub)

add_func(3, 4)
sub_func(3, 4)

7
-1


# Decorators

A decorator is a design pattern that extends the functionality of a function without changing the function's code. The decorator takes a function, wraps it with extra behaviour and returns the result. In most cases the decorator is an outer function, and the wrapper that calls the decorated function is nested inside it.

A decorator can be implemented as a function or as a class, and it may or may not take arguments of its own.

## Decorator as a design pattern

As a first example, let's create a decorator without Python's special @ syntax, i.e. as a plain design pattern.

### Decorator without arguments

In [37]:
# Decorator function: extends the functionality of func()
def start_end_decorator(func):
    
    # wrapper() is the function that extends func()'s functionality
    def wrapper():
        print('Before execution')
        func()
        print('After execution')
    return wrapper
    
    
# Function we want to extend
def print_name():
    print('Alex')

# Apply the decorator manually by reassigning the name:
print_name = start_end_decorator(print_name)
print_name()

Before execution
Alex
After execution


Python offers special syntax for the same thing: put @decorator_name on the line above the function definition:

In [38]:
# Decorator function: extends the functionality of func()
def start_end_decorator(func):
    
    # wrapper() is the function that extends func()'s functionality
    def wrapper():
        print('Before execution')
        func()
        print('After execution')
    return wrapper
    
    
# Function we want to extend, decorated with the @ syntax
# (equivalent to print_name = start_end_decorator(print_name))
@start_end_decorator
def print_name():
    print('Alex')

print_name()

Before execution
Alex
After execution


### Decorator with arguments

In [39]:
# Decorator function: extends the functionality of func()
def start_end_decorator(func):
    
    # wrapper() accepts any arguments and passes them on to func()
    def wrapper(*args, **kwargs):
        print('Before execution')
        result = func(*args, **kwargs)
        print('After execution')
        # Return func()'s result so the decorated function still returns it
        return result
    return wrapper
    
    
# Function we want to extend, decorated with the @ syntax
@start_end_decorator
def plus_5(x):
    return x + 5

print(plus_5(3))

Before execution
After execution
8


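A decorator can also take arguments of its own. This requires one more level of nesting: the outermost function receives the decorator's arguments and returns the actual decorator. (functools.wraps, used here, is explained in the next section.)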

In [40]:
import functools

def repeat(num_times):
    # repeat() receives the decorator's argument and returns the real decorator
    def decorator_repeat(func):
        
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = None  # defined even if num_times == 0
            for _ in range(num_times):
                result = func(*args, **kwargs)
            return result
        return wrapper
    return decorator_repeat

N = 3
@repeat(num_times=N)
def greet(name):
    print(f'Hello, {name}')

greet('Alex')

Hello, Alex
Hello, Alex
Hello, Alex


## Preserving the metadata of the original function

If you take the decorator from the previous section and call help() on the decorated function, you will see information about the wrapper function instead of the original one:

In [41]:
# Decorator function: extends the functionality of func()
def start_end_decorator(func):
    
    # wrapper() is the function that extends func()'s functionality
    def wrapper(*args, **kwargs):
        print('Before execution')
        result = func(*args, **kwargs)
        print('After execution')
        return result
    return wrapper
    
    
# Function we want to extend, decorated with the @ syntax
@start_end_decorator
def plus_5(x):
    return x + 5

help(plus_5)
print(plus_5.__name__)

Help on function wrapper in module __main__:

wrapper(*args, **kwargs)
    # wrapper() is the function that extends func()'s functionality

wrapper


To fix this and preserve the information about the original function, import the 'functools' module and decorate the wrapper with @functools.wraps(func):

In [42]:
import functools

In [43]:
# Decorator function: extends the functionality of func()
def start_end_decorator(func):
    
    # wrapper() is the function that extends func()'s functionality
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print('Before execution')
        result = func(*args, **kwargs)
        print('After execution')
        return result
    return wrapper
    
    
# Function we want to extend, decorated with the @ syntax
@start_end_decorator
def plus_5(x):
    return x + 5

help(plus_5)
print(plus_5.__name__)

Help on function plus_5 in module __main__:

plus_5(x)
    # wrapper() is the function that extends func()'s functionality

plus_5
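functools.wraps copies more than the name. It also copies the docstring and a few other attributes. It stores the original function in the wrapper's `__wrapped__` attribute as well. The sketch below shows this. The docstring of `plus_5` is added here for illustration only.

```python
import functools

def start_end_decorator(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print('Before execution')
        result = func(*args, **kwargs)
        print('After execution')
        return result
    return wrapper

@start_end_decorator
def plus_5(x):
    """Return x increased by 5."""
    return x + 5

print(plus_5.__name__)        # plus_5
print(plus_5.__doc__)         # Return x increased by 5.
# The undecorated function stays reachable through __wrapped__
print(plus_5.__wrapped__(3))  # 8, without the Before/After lines
```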


## Decorator template

Based on the previous sections, we can write a template for well-behaved decorators.

In [44]:
import functools

# Decorator with arguments
def my_decorator_args(func):
    
    # wrapper() is the function that extends func()'s functionality
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Do something before execution
        result = func(*args, **kwargs)
        # Do something after execution
        return result
    return wrapper


# Decorator without arguments
def my_decorator_no_args(func):
    
    # wrapper() is the function that extends func()'s functionality
    @functools.wraps(func)
    def wrapper():
        # Do something before execution
        result = func()
        # Do something after execution
        return result
    return wrapper
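As an illustration, here is the template with arguments filled in to measure execution time. The decorator name `timer` and the function `slow_sum` are hypothetical. `time.perf_counter()` is the standard-library clock for timing short intervals.

```python
import functools
import time

def timer(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Do something before execution: remember the start time
        start = time.perf_counter()
        result = func(*args, **kwargs)
        # Do something after execution: report the elapsed time
        elapsed = time.perf_counter() - start
        print(f'{func.__name__} took {elapsed:.6f} s')
        return result
    return wrapper

@timer
def slow_sum(n):
    return sum(range(n))

print(slow_sum(1_000_000))
```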

## Nested decorators

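You can also stack decorators. They are applied from the bottom up (the decorator closest to the function is applied first), so when the function is called, the wrapper of the top decorator runs first.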

In [45]:
def debug(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        args_repr = [repr(a) for a in args]
        kwargs_repr = [f'{k} = {v!r}' for k, v in kwargs.items()]
        signature = ', '.join(args_repr + kwargs_repr)
        print(f"Calling {func.__name__}({signature})")
        result = func(*args, **kwargs)
        print(f'{func.__name__!r} returned {result!r}')
        return result
    return wrapper



def start_end_decorator(func):
    
    # wrapper is the function which extend the func() functionality
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print('Before execution')
        result = func(*args, **kwargs)
        print('After execution')
        return result
    return wrapper


@debug
@start_end_decorator
def say_hello(name):
    greeting = f'Hello, {name}!'
    print(greeting)
    return greeting

say_hello('Alex')
# As you can see, the code of the top decorator (debug) runs first, and its 'returned' message is printed last

Calling say_hello('Alex')
Before execution
Hello, Alex!
After execution
'say_hello' returned 'Hello, Alex!'


'Hello, Alex!'
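The @ syntax for stacked decorators is shorthand for nested calls. The decorator closest to the function is applied first, and the top one wraps the result. In the minimal sketch below, two hypothetical decorators, `outer` and `inner`, record the order in which their wrappers run.

```python
import functools

calls = []

def outer(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        calls.append('outer')
        return func(*args, **kwargs)
    return wrapper

def inner(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        calls.append('inner')
        return func(*args, **kwargs)
    return wrapper

@outer
@inner
def greet(name):
    calls.append('greet')
    return f'Hello, {name}!'

# Equivalent to: greet = outer(inner(greet))
greet('Alex')
print(calls)  # ['outer', 'inner', 'greet']
```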

## Class decorator

Finally, a decorator can be a class. This is handy when the decorator needs to keep state between calls, such as a call counter:

In [50]:
class CountClass:
    def __init__(self, func):
        self.func = func
        self.num_calls = 0
    # __call__() runs when an instance of the class is called like a function: obj(...).
    # The interpreter passes all positional and keyword arguments of that call to it.
    def __call__(self, *args, **kwargs):
        self.num_calls += 1
        print(f"This is executed {self.num_calls} times")
        return self.func(*args, **kwargs)
        

@CountClass
def say_hello():
    print('Hello')

say_hello()
say_hello()
say_hello()

This is executed 1 times
Hello
This is executed 2 times
Hello
This is executed 3 times
Hello
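Unlike a function decorator that uses functools.wraps, CountClass above does not copy the decorated function's name or docstring onto the instance. One way to fix this is to call `functools.update_wrapper()`, the function that functools.wraps is built on, in `__init__`. The sketch below does that; the class name `CountCalls` is hypothetical.

```python
import functools

class CountCalls:
    def __init__(self, func):
        # Copy __name__, __doc__, etc. from func onto this instance
        functools.update_wrapper(self, func)
        self.func = func
        self.num_calls = 0

    def __call__(self, *args, **kwargs):
        self.num_calls += 1
        return self.func(*args, **kwargs)

@CountCalls
def say_hello():
    """Print a greeting."""
    print('Hello')

say_hello()
say_hello()
print(say_hello.__name__, say_hello.num_calls)  # say_hello 2
```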


# Exceptions

## Basic information

When an error occurs and nothing handles it, the code stops. Exceptions can be seen as "planned errors": by raising and handling exceptions, you specify what the program should do when one error or another occurs, instead of letting it crash.

Exceptions are raised with the 'raise' statement, for example: raise TypeError('error message')

In [46]:
# For example:

x = 5

if x > 2:
    raise Exception('Exception raised')

Exception: Exception raised

You can also raise an error with the 'assert' statement: it raises AssertionError when its condition is False.

In [35]:
x = 5

# AssertionError is raised when the condition is False, i.e. here when x >= 5
assert x < 5, 'Error asserted'

AssertionError: Error asserted

'Exception' is the general base type of exceptions. Instead of 'Exception', you can raise any built-in Python error type or your own custom error class (see below).

Handling exceptions lets you tell the program what to do when an error occurs instead of just shutting down. To do this, use the 'try-except' construction.

In [None]:
try:
    x = 5
    x / 0
except ZeroDivisionError as e:
    # e holds the description of the built-in Python error
    print(e)
    # In the body of 'except' you tell the program what to do when ZeroDivisionError occurs
    x = x / 2
    print(x)

If your code can raise several different errors, you can handle each of them with its own 'except' clause. This works like 'elif': the first 'except' clause that matches the raised exception is executed, and the others are skipped.

In [None]:
try:
    # Uncomment the next two lines to trigger ZeroDivisionError instead
    #x = 5
    #x / 0
    
    a = 5 + "10"
except ZeroDivisionError as e:
    # e holds the description of the built-in Python error
    print(e)
    # In the body of 'except' you tell the program what to do when ZeroDivisionError occurs
    x = x / 2
    print(x)
except TypeError as e:
    print(e)

A 'try-except' construction can also have an 'else' clause, which runs only if no exception was raised, and a 'finally' clause, which always runs, whether or not an exception was raised. Two examples follow: the first raises no exception, the second raises TypeError.

In [None]:
try:
    x = 5
    x / 1
    a = 5 + 10
except ZeroDivisionError as e:
    # e holds the description of the built-in Python error
    print(e)
    # In the body of 'except' you tell the program what to do when ZeroDivisionError occurs
    x = x / 2
    print(x)
except TypeError as e:
    print(e)
else:
    # Runs only if no exception was raised in 'try'
    print('No exception raised')
finally:
    # Runs in every case
    print("I don't care about exceptions")

In [None]:
try:
    x = 5
    x / 1
    a = 5 + '10'
except ZeroDivisionError as e:
    # e holds the description of the built-in Python error
    print(e)
    # In the body of 'except' you tell the program what to do when ZeroDivisionError occurs
    x = x / 2
    print(x)
except TypeError as e:
    print(e)
else:
    # Runs only if no exception was raised in 'try'
    print('No exception raised')
finally:
    # Runs in every case
    print("I don't care about exceptions")
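To see which clauses run in each case, the sketch below records the order of execution in a list. The helper `divide()` is hypothetical.

```python
def divide(a, b):
    steps = []
    try:
        steps.append('try')
        result = a / b
    except ZeroDivisionError:
        steps.append('except')
        result = None
    else:
        # runs only if the try block raised nothing
        steps.append('else')
    finally:
        # runs in every case
        steps.append('finally')
    return result, steps

print(divide(6, 3))  # (2.0, ['try', 'else', 'finally'])
print(divide(6, 0))  # (None, ['try', 'except', 'finally'])
```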

## Custom exception class

A custom exception class should inherit from the base Exception class.

In [None]:
class ValueBelowZeroError(Exception):
    pass
# A class body containing only 'pass' is enough to create a custom error type,
# and this type can already be raised and caught

def value_check(x):
    if x < 0:
        raise ValueBelowZeroError('Value should be greater than zero!')

try:
    x = -3
    value_check(x)
except ValueBelowZeroError as e:
    print(e)
    x = abs(x)
    print(x)

In [None]:
class ValueBelowZeroError(Exception):
    pass

# If you need extra functionality (for example, additional attributes),
# define __init__ in your custom error type

class ValueGreaterThanThousandError(Exception):
    def __init__(self, value, message):
        super().__init__(message)  # lets str(e) return the message
        self.message = message
        self.value = value
        
def value_check(x):
    if x > 1000:
        raise ValueGreaterThanThousandError(x, 'Value should be smaller than 1000!')
    elif x < 0:
        raise ValueBelowZeroError('Value should be greater than zero!')
        
try:
    x = 1001
    value_check(x)
except ValueGreaterThanThousandError as e:
    print(e.message, e.value)
    x = e.value - 1000
    print(x)
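Custom exceptions can also share a common base class, so that one except clause catches all of them. In the sketch below, `ValueCheckError` and the helper `check_range()` are hypothetical. Calling `super().__init__(message)` makes `str(e)` return the message.

```python
class ValueCheckError(Exception):
    """Base class for all value check errors."""

class ValueBelowZeroError(ValueCheckError):
    pass

class ValueGreaterThanThousandError(ValueCheckError):
    def __init__(self, value, message):
        super().__init__(message)  # str(e) now returns the message
        self.value = value

def check_range(x):
    if x > 1000:
        raise ValueGreaterThanThousandError(x, 'Value should be smaller than 1000!')
    if x < 0:
        raise ValueBelowZeroError('Value should be greater than zero!')
    return x

for x in (-3, 1001, 42):
    try:
        check_range(x)
    except ValueCheckError as e:
        # catches both subclasses
        print(type(e).__name__, '-', e)
    else:
        print(x, 'is in range')
```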

# Logging

## General information and basic operations

Python's standard library provides the 'logging' module for logging.

In [4]:
import logging

There are five standard logging levels (from the least to the most severe):

* DEBUG - Detailed information, typically of interest only when diagnosing problems.
* INFO - Confirmation that things are working as expected.
* WARNING - An indication that something unexpected happened, or indicative of some problem in the near future (e.g. 'disk space low'). The software is still working as expected.
* ERROR - Due to a more serious problem, the software has not been able to perform some functions.
* CRITICAL - A serious error, indicating that the program itself may be unable to continue running.

In [2]:
# The logging module provides print()-like functions, one for each level

logging.debug('This is a DEBUG message')
logging.info('This is an INFO message')
logging.warning('This is a WARNING message')
logging.error('This is an ERROR message')
logging.critical('This is a CRITICAL message')

# 'root' is the name of the default logger
# It is good practice to create your own logger object

ERROR:root:This is a ERROR message
CRITICAL:root:This is a CRITICAL message


The default logging level is WARNING. This means that by default the logging module outputs messages of level WARNING and the more severe levels (ERROR and CRITICAL), and ignores messages of the less severe levels (INFO and DEBUG).

To change the logging level, use logging.basicConfig().
Besides the logging level, this function accepts parameters such as filename (the log file to write messages to), format (the format of a log message) and datefmt (the format of the date and time in a log message). See the documentation of the logging module for the full list.
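basicConfig() does nothing if the root logger already has handlers, which is why the cells below suggest restarting the kernel. Since Python 3.8 you can pass force=True instead, which replaces the existing configuration. The sketch below logs into an in-memory io.StringIO stream so that the effect of level is easy to inspect.

```python
import io
import logging

buffer = io.StringIO()

# force=True (Python 3.8+) removes any handlers already attached to the root logger
logging.basicConfig(stream=buffer, level=logging.INFO,
                    format='%(levelname)s - %(message)s', force=True)

logging.debug('This is a DEBUG message')   # below INFO: dropped
logging.info('This is an INFO message')
logging.warning('This is a WARNING message')

print(buffer.getvalue())
```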

In [2]:
# basicConfig() has no effect if the root logger is already configured;
# if you see the same messages as in the cell above, restart the Jupyter kernel

logging.basicConfig(level=logging.DEBUG, 
                    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
                   datefmt='%m/%d/%Y %H:%M:%S')

logging.debug('This is a DEBUG message')
logging.info('This is an INFO message')
logging.warning('This is a WARNING message')
logging.error('This is an ERROR message')
logging.critical('This is a CRITICAL message')

07/11/2022 21:21:02 - root - DEBUG - This is a DEBUG message
07/11/2022 21:21:02 - root - INFO - This is an INFO message
07/11/2022 21:21:02 - root - ERROR - This is an ERROR message
07/11/2022 21:21:02 - root - CRITICAL - This is a CRITICAL message


If you pass a filename to logging.basicConfig(), messages are written to this file instead of the console.

In [8]:
# basicConfig() has no effect if the root logger is already configured;
# if you see the same messages as in the cell above, restart the Jupyter kernel

logging.basicConfig(filename='logging.log', level=logging.DEBUG, 
                    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
                   datefmt='%m/%d/%Y %H:%M:%S')

# Empty the file before logging
with open('logging.log', 'w') as lf:
    pass

logging.debug('This is a DEBUG message')
logging.info('This is an INFO message')
logging.warning('This is a WARNING message')
logging.error('This is an ERROR message')
logging.critical('This is a CRITICAL message')

print("logging.log file opened: ")
with open('logging.log', 'r') as lf:
    print(lf.read())

logging.log file opened: 
07/11/2022 21:29:28 - root - DEBUG - This is a DEBUG message
07/11/2022 21:29:28 - root - INFO - This is an INFO message
07/11/2022 21:29:28 - root - ERROR - This is an ERROR message
07/11/2022 21:29:28 - root - CRITICAL - This is a CRITICAL message



You can also create your own loggers and give them any name you like with logging.getLogger(name).

In [18]:
# basicConfig() has no effect if the root logger is already configured;
# if you see the same messages as in the cell above, restart the Jupyter kernel

logging.basicConfig(filename='logging.log', level=logging.DEBUG, 
                    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
                   datefmt='%m/%d/%Y %H:%M:%S')
logger1 = logging.getLogger('logger1')
logger2 = logging.getLogger('logger2')

# Empty the file before logging
with open('logging.log', 'w') as lf:
    pass

logger1.debug('This is a DEBUG message')
logger1.info('This is an INFO message')
logger1.warning('This is a WARNING message')
logger1.error('This is an ERROR message')
logger1.critical('This is a CRITICAL message\n')

logger2.debug('This is a DEBUG message')
logger2.info('This is an INFO message')
logger2.warning('This is a WARNING message')
logger2.error('This is an ERROR message')
logger2.critical('This is a CRITICAL message\n')

print("As you can see, there are now two logger objects with different names")
print("logging.log file opened: \n")
with open('logging.log', 'r') as lf:
    print(lf.read())

As you can see, there are now two logger objects with different names
logging.log file opened: 

07/11/2022 21:49:03 - logger1 - DEBUG - This is a DEBUG message
07/11/2022 21:49:03 - logger1 - INFO - This is an INFO message
07/11/2022 21:49:03 - logger1 - ERROR - This is an ERROR message
07/11/2022 21:49:03 - logger1 - CRITICAL - This is a CRITICAL message

07/11/2022 21:49:03 - logger2 - DEBUG - This is a DEBUG message
07/11/2022 21:49:03 - logger2 - INFO - This is an INFO message
07/11/2022 21:49:03 - logger2 - ERROR - This is an ERROR message
07/11/2022 21:49:03 - logger2 - CRITICAL - This is a CRITICAL message




Let's implement a simple example of logging usage: logging in a Point class.

In [14]:
# basicConfig() has no effect if the root logger is already configured;
# if you see the same messages as in the cell above, restart the Jupyter kernel
logging.basicConfig(filename='logging.log', level=logging.DEBUG, 
                    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
                    datefmt='%m/%d/%Y %H:%M:%S')

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        logging.info(f'Point with cartesian coordinates x={self.x} and y={self.y} created')

# Empty the file before logging
with open('logging.log', 'w') as lf:
    pass        

P1 = Point(5, 5)
P2 = Point(15, 4)
P3 = Point(-7, 13)
        
print("logging.log file opened: ")
with open('logging.log', 'r') as lf:
    print(lf.read())

logging.log file opened: 
07/11/2022 21:40:59 - root - INFO - Point with cartesian coordinates x=5 and y=5 created
07/11/2022 21:40:59 - root - INFO - Point with cartesian coordinates x=15 and y=4 created
07/11/2022 21:40:59 - root - INFO - Point with cartesian coordinates x=-7 and y=13 created



This concludes the overview of basic operations.

## Handler objects

Handler objects send log records to particular destinations, for example a log file or the console. Each handler can have its own level and format, so you can build a custom logger from handlers as building blocks.

In [1]:
import logging

logger = logging.getLogger('custom_logger')

# create handlers
stream_h = logging.StreamHandler() # output to the console (stream handler, sys.stderr by default)
file_h = logging.FileHandler('logging.log') # output to the file (file handler)

# level
stream_h.setLevel(logging.WARNING) # only WARNING, ERROR and CRITICAL records go to the console
file_h.setLevel(logging.ERROR) # only ERROR and CRITICAL records go to the log file

# format
formatter_stream = logging.Formatter("%(name)s - %(levelname)s - %(message)s") # format of logging in console
formatter_file = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s') # format of logging in log file
stream_h.setFormatter(formatter_stream)
file_h.setFormatter(formatter_file)

# build the logger from the handlers
logger.addHandler(stream_h)
logger.addHandler(file_h)

# Empty the file before logging
with open('logging.log', 'w') as lf:
    pass        

logger.warning('This message will be only in console')
logger.warning('This message will be only in console too')
logger.error('This message will be in console and in file')
logger.error('This message will be in console and in file too')

print("logging.log file opened: ")
with open('logging.log', 'r') as lf:
    print(lf.read())

custom_logger - ERROR - This message will be in console and in file
custom_logger - ERROR - This message will be in console and in file too


logging.log file opened: 
2022-07-11 22:40:30,884 - custom_logger - ERROR - This message will be in console and in file
2022-07-11 22:40:30,885 - custom_logger - ERROR - This message will be in console and in file too
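A record passes two level checks. It must first pass the logger's own level, then each handler's level. custom_logger above has no level set, so it inherits WARNING from the root logger. The sketch below writes into an io.StringIO stream instead of the console, and the logger name 'level_demo' is arbitrary.

```python
import io
import logging

demo = logging.getLogger('level_demo')
demo.setLevel(logging.INFO)   # the logger drops DEBUG records
demo.propagate = False        # keep records away from the root logger

buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setLevel(logging.WARNING)   # the handler drops INFO records
handler.setFormatter(logging.Formatter('%(levelname)s - %(message)s'))
demo.addHandler(handler)

demo.debug('blocked by the logger level')
demo.info('passes the logger, blocked by the handler level')
demo.warning('passes both')

print(buffer.getvalue())  # WARNING - passes both
```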



Then you can use this custom logger in your programs as you like:

In [2]:
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        if self.x < 0:
            logger.warning(f'Point has x<0 (x={self.x})')
        if self.y < 0:
            logger.warning(f'Point has y<0 (y={self.y})')
            
        if self.x > 1000:
            logger.error(f'x coordinate must be <= 1000 but x={self.x}')
        if self.y > 1000:
            logger.error(f'y coordinate must be <= 1000 but y={self.y}')

# Empty the file before logging
with open('logging.log', 'w') as lf:
    pass        

P1 = Point(5000, 5)
P2 = Point(15, 4000)
P3 = Point(-7, 13)
P4 = Point(12, 22)
        
print("logging.log file opened: ")
with open('logging.log', 'r') as lf:
    print(lf.read())

custom_logger - ERROR - x coordinate must be <= 1000 but x=5000
custom_logger - ERROR - y coordinate must be <= 1000 but y=4000


logging.log file opened: 
2022-07-11 22:45:52,955 - custom_logger - ERROR - x coordinate must be <= 1000 but x=5000
2022-07-11 22:45:52,957 - custom_logger - ERROR - y coordinate must be <= 1000 but y=4000



It is good practice to define your own logger in a separate module and import it in new projects.

## Configuration files for custom loggers

Above we used logging.basicConfig() to configure logging directly in the program code. Alternatively, the configuration can be loaded from a file or from a dictionary. Here we consider file configs.

Such a file usually has a .conf or .ini extension.

In [3]:
with open('logger_config.ini', 'r') as lf:
    print(lf.read())

[loggers]
keys=root,simpleExample

[handlers]
keys=consoleHandler, fileHandler

[formatters]
keys=simpleFormatter

[logger_root]
level=DEBUG
handlers=

[logger_simpleExample]
level=DEBUG
handlers=consoleHandler, fileHandler
qualname=simpleExample
propagate=0

[handler_consoleHandler]
class=StreamHandler
level=DEBUG
formatter=simpleFormatter
args=(sys.stdout,)

[handler_fileHandler]
class=FileHandler
level=DEBUG
formatter=simpleFormatter
args=('logging.log', 'w')

[formatter_simpleFormatter]
format=%(asctime)s - %(name)s - %(levelname)s - %(message)s
datefmt=%Y-%m-%d %H:%M:%S



The [loggers] section lists the names of the loggers you can use. Suppose that after calling logging.config.fileConfig() you request a logger whose name is not listed in [loggers]. That logger has no handlers of its own, so its messages are passed to the root logger's handlers.

Every handler listed in [handlers] and every formatter listed in [formatters] must be described in its own section below ([handler_<name>] and [formatter_<name>] respectively).

Example of usage:

In [2]:
import logging
import logging.config

logging.config.fileConfig('logger_config.ini') # Load configs for logger from file

logger = logging.getLogger('simpleExample')

# Then use the logger as you like
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        if self.x < 0:
            logger.warning(f'Point has x<0 (x={self.x})')
        if self.y < 0:
            logger.warning(f'Point has y<0 (y={self.y})')
            
        if self.x > 1000:
            logger.error(f'x coordinate must be <= 1000 but x={self.x}')
        if self.y > 1000:
            logger.error(f'y coordinate must be <= 1000 but y={self.y}')

print('Logs from console: ')
P1 = Point(5000, 5)
P2 = Point(15, 4000)
P3 = Point(-7, 13)
P4 = Point(12, 22)
print()
        
print("logging.log file opened: ")
with open('logging.log', 'r') as lf:
    print(lf.read())

Logs from console: 
2022-07-11 23:44:29 - simpleExample - ERROR - x coordinate must be <= 1000 but x=5000
2022-07-11 23:44:29 - simpleExample - ERROR - y coordinate must be <= 1000 but y=4000

logging.log file opened: 
2022-07-11 23:44:29 - simpleExample - ERROR - x coordinate must be <= 1000 but x=5000
2022-07-11 23:44:29 - simpleExample - ERROR - y coordinate must be <= 1000 but y=4000
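For comparison, logging.config.dictConfig() can describe a similar setup as a dictionary. This is only a sketch. It keeps just the console handler from logger_config.ini so that no file is created, and the name LOGGING_CONFIG is arbitrary.

```python
import logging
import logging.config

LOGGING_CONFIG = {
    'version': 1,  # required key; 1 is the only schema version
    'disable_existing_loggers': False,
    'formatters': {
        'simpleFormatter': {
            'format': '%(asctime)s - %(name)s - %(levelname)s - %(message)s',
            'datefmt': '%Y-%m-%d %H:%M:%S',
        },
    },
    'handlers': {
        'consoleHandler': {
            'class': 'logging.StreamHandler',
            'level': 'DEBUG',
            'formatter': 'simpleFormatter',
            'stream': 'ext://sys.stdout',
        },
    },
    'loggers': {
        'simpleExample': {
            'level': 'DEBUG',
            'handlers': ['consoleHandler'],
            'propagate': False,
        },
    },
}

logging.config.dictConfig(LOGGING_CONFIG)
dict_logger = logging.getLogger('simpleExample')
dict_logger.info('Configured from a dictionary')
```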



## Logging with exceptions

We can also log exceptions with the logging module.

If we know which kind of exception to expect and want to see it in the logs:

In [1]:
import logging

# For simplicity we use the root logger here
try:
    a = [1,2,3]
    val = a[4]
except IndexError as e:
    logging.error(e)

ERROR:root:list index out of range


In [2]:
# To include the traceback of the exception in the log, pass exc_info=True as an additional parameter

# For simplicity we use the root logger here
try:
    a = [1,2,3]
    val = a[4]
except IndexError as e:
    logging.error(e, exc_info=True)

ERROR:root:list index out of range
Traceback (most recent call last):
  File "<ipython-input-2-b4d9a7fe1be0>", line 6, in <module>
    val = a[4]
IndexError: list index out of range


If we don't know in advance which exception may occur but still want to see it in the logs, we can catch exceptions generically and add the full traceback with the traceback module:

In [3]:
import logging
import traceback

# For simplicity we use the root logger here
try:
    a = [1,2,3]
    val = a[4]
except Exception:
    logging.error('The error is %s', traceback.format_exc())

ERROR:root:The error is Traceback (most recent call last):
  File "<ipython-input-3-81ca53e072b2>", line 7, in <module>
    val = a[4]
IndexError: list index out of range
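Inside an except block, logging offers a shortcut: logger.exception() (or logging.exception()). It logs the message at ERROR level and appends the current traceback, which is equivalent to calling error() with exc_info=True. The sketch below writes into an io.StringIO stream, and the logger name 'exc_demo' is arbitrary.

```python
import io
import logging

buffer = io.StringIO()
exc_logger = logging.getLogger('exc_demo')
exc_logger.setLevel(logging.INFO)
exc_logger.propagate = False
exc_logger.addHandler(logging.StreamHandler(buffer))

try:
    a = [1, 2, 3]
    val = a[4]
except IndexError:
    # logs at ERROR level and appends the current traceback
    exc_logger.exception('Could not read the element')

print(buffer.getvalue())
```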



## RotatingFileHandler

This file handler limits the size of a log file. It can be useful if your program generates a lot of log information.
When writing the next message would make the file reach maxBytes (passed to the constructor), the handler rolls over: the current log file is closed and renamed to a backup, and a new log file is opened. This repeats for as long as the program produces log messages.

In [1]:
import logging
from logging.handlers import RotatingFileHandler

logger = logging.getLogger('Rot_logger')
logger.setLevel(logging.INFO)

# Roll over at about 2 KB and keep backup logs RotatingFileHandler.log.1, RotatingFileHandler.log.2 etc.
handler = RotatingFileHandler('RotatingFileHandler.log', maxBytes=2000, backupCount=5)
# backupCount is the maximum number of backup files kept; the oldest ones are deleted

logger.addHandler(handler)

for i in range(10000):
    logger.info('Hello, world!')

In [4]:
import os

log_lts = [x for x in os.listdir() if 'RotatingFileHandler' in x]
log_lts

['RotatingFileHandler.log',
 'RotatingFileHandler.log.1',
 'RotatingFileHandler.log.2',
 'RotatingFileHandler.log.3',
 'RotatingFileHandler.log.4',
 'RotatingFileHandler.log.5']
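The sketch below shows how backupCount caps the number of files. It logs into a temporary directory with a deliberately tiny maxBytes. The file name demo.log and the logger name 'rotating_demo' are arbitrary.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler

log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, 'demo.log')

demo_logger = logging.getLogger('rotating_demo')
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False

# About 200 bytes per file, at most 2 backups besides the current file
handler = RotatingFileHandler(log_path, maxBytes=200, backupCount=2)
demo_logger.addHandler(handler)

for i in range(100):
    demo_logger.info('message number %d', i)

handler.close()
demo_logger.removeHandler(handler)

files = sorted(os.listdir(log_dir))
print(files)  # ['demo.log', 'demo.log.1', 'demo.log.2']
```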

## TimedRotatingFileHandler

This file handler splits log files by time instead of size. It can be useful if your program runs for a long time and you want to divide its logs into time intervals. When the interval given to the constructor (when and interval) elapses, the current log file is closed and renamed with a timestamp suffix, and a new log file is opened. This repeats for as long as the program runs.

In [1]:
import logging
import time
from logging.handlers import TimedRotatingFileHandler

logger = logging.getLogger('Rot_logger')
logger.setLevel(logging.INFO)

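# Available values of 'when' (case-insensitive):
# 'S' - seconds
# 'M' - minutes
# 'H' - hours
# 'D' - days
# 'midnight' - roll over at midnight
# 'W0'-'W6' - roll over on a weekday (0 = Monday)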

handler = TimedRotatingFileHandler('TimedRotatingFileHandler.log', when='s', interval=5, backupCount=5)
# backupCount is the maximum number of backup files kept; the oldest ones are deleted

logger.addHandler(handler)

for _ in range(6):
    logger.info('Hello, world!')
    time.sleep(5) # pause the program for 5 seconds

In [2]:
import os

log_lts = [x for x in os.listdir() if 'TimedRotatingFileHandler' in x]
log_lts

['TimedRotatingFileHandler.log',
 'TimedRotatingFileHandler.log.2022-07-12_00-37-09',
 'TimedRotatingFileHandler.log.2022-07-12_00-37-14',
 'TimedRotatingFileHandler.log.2022-07-12_00-37-19',
 'TimedRotatingFileHandler.log.2022-07-12_00-37-24',
 'TimedRotatingFileHandler.log.2022-07-12_00-37-29']

# JSON

JSON (JavaScript Object Notation) is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and arrays (or other serializable values). It is a common data format with diverse uses in electronic data interchange, including that of web applications with servers.

JSON is a language-independent data format. It was derived from JavaScript, but many modern programming languages include code to generate and parse JSON-format data. JSON filenames use the extension .json.

You can convert a dictionary to a JSON string and write it to a file.

Note that converting a dictionary to JSON is a particular kind of serialization (aka encoding), and the reverse transformation is deserialization (aka decoding). In essence this is similar to pickling and unpickling, and the json module uses the same function names as the pickle module: dump(), dumps(), load() and loads().
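A quick round trip shows how Python types map to JSON types: True becomes true, None becomes null, and a tuple comes back as a list. So the result of deserialization is not always identical to the original. The dictionary below is illustrative.

```python
import json

original = {'name': 'Jane', 'active': True, 'spouse': None, 'point': (1, 2)}

encoded = json.dumps(original)     # serialization (encoding) to a str
print(encoded)  # {"name": "Jane", "active": true, "spouse": null, "point": [1, 2]}

decoded = json.loads(encoded)      # deserialization (decoding) back to Python
print(decoded)  # {'name': 'Jane', 'active': True, 'spouse': None, 'point': [1, 2]}
print(decoded == original)         # False: the tuple came back as a list
```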

In [11]:
import json

## Creating a dictionary with data

In [10]:
datadict = {
    'firstname': 'Jane',
    'lastname': 'Doe',
    'hobbies': ['running', 'swiming', 'singing'],
    'age': 28,
    'hasChildren': True,
    'children':
        [
            {
                'firstname': 'Alex',
                'age': 5
            },
            
            {
                'firstname': 'Bob',
                'age': 7
            }
        ]
    
}

## json.dump()

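It serializes obj as a JSON-formatted stream and writes it to fp, a file-like object that supports .write(). The dump() function takes the following parameters:

* obj - Python object to serialize
* fp - file-like object with a .write() method that receives the JSON text
* skipkeys=False - if True, dict keys that are not of a basic type (str, int, float, bool or None) are skipped instead of raising TypeError
* ensure_ascii=True - if True, non-ASCII characters are escaped in the output
* allow_nan=True - if True, float values nan, inf and -inf are written as NaN, Infinity and -Infinity (not valid in strict JSON); if False, they raise ValueError
* cls=None - custom JSONEncoder subclass to use for serialization
* indent=None - number of spaces (or a string) used for indentation; None puts everything on a single line
* separators=None - (item_separator, key_separator) tuple, e.g. (',', ':') for the most compact output
* default=None - function called for objects that can't otherwise be serialized; it should return a serializable version of the object or raise TypeError
* sort_keys=False - if True, dictionary keys are sorted in the output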

In [9]:
with open("example.json", "w") as ej:
    json.dump(datadict, ej, indent=4)
    
with open("example.json", "r") as ej:
    print(ej.read())

{
    "firstname": "Jane",
    "lastname": "Doe",
    "hobbies": [
        "running",
        "swiming",
        "singing"
    ],
    "age": 28,
    "hasChildren": true,
    "children": [
        {
            "firstname": "Alex",
            "age": 5
        },
        {
            "firstname": "Bob",
            "age": 7
        }
    ]
}


## json.dumps()


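The dumps() function of the json module serializes the Python object obj into a JSON-formatted str. It takes the following parameters:

* obj - Python object to serialize
* skipkeys=False - if True, dict keys that are not of a basic type (str, int, float, bool or None) are skipped instead of raising TypeError
* ensure_ascii=True - if True, non-ASCII characters are escaped in the output
* allow_nan=True - if True, float values nan, inf and -inf are written as NaN, Infinity and -Infinity (not valid in strict JSON); if False, they raise ValueError
* cls=None - custom JSONEncoder subclass to use for serialization
* indent=None - number of spaces (or a string) used for indentation; None puts everything on a single line
* separators=None - (item_separator, key_separator) tuple, e.g. (',', ':') for the most compact output
* default=None - function called for objects that can't otherwise be serialized; it should return a serializable version of the object or raise TypeError
* sort_keys=False - if True, dictionary keys are sorted in the output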

In [17]:
personJSON = json.dumps(datadict, indent=4)
print(personJSON)
print()
print("Type of dumps() object: ", type(personJSON))

{
    "firstname": "Jane",
    "lastname": "Doe",
    "hobbies": [
        "running",
        "swiming",
        "singing"
    ],
    "age": 28,
    "hasChildren": true,
    "children": [
        {
            "firstname": "Alex",
            "age": 5
        },
        {
            "firstname": "Bob",
            "age": 7
        }
    ]
}

Type of dumps() object:  <class 'str'>


## json.load()

The load() function deserializes fp (a file-like object containing a JSON document) into a Python object. For example, you can decode your .json file into a dictionary. load() takes the following parameters:

* fp - a file-like object that supports .read()
* cls=None - a custom JSONDecoder subclass
* object_hook=None - a function called with every decoded JSON object (a dict); its return value is used instead of the dict
* parse_float=None - a function called with the string of every JSON float (float is used by default)
* parse_int=None - a function called with the string of every JSON int (int is used by default)
* parse_constant=None - a function called with one of the strings '-Infinity', 'Infinity' and 'NaN'
* object_pairs_hook=None - a function called with every decoded JSON object as an ordered list of (key, value) pairs; its return value is used instead of the dict (it takes priority over object_hook)
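As a sketch of parse_float, the snippet below decodes floats as decimal.Decimal to keep their exact decimal value. An io.StringIO object stands in for a real .json file here, purely as an assumption to keep the example self-contained.

```python
import io
import json
from decimal import Decimal

# an in-memory file stands in for a real .json file
fp = io.StringIO('{"price": 0.1, "qty": 3, "tags": ["a", "b"]}')

# parse_float=Decimal keeps the exact decimal value instead of a binary float
data = json.load(fp, parse_float=Decimal)
print(data)  # {'price': Decimal('0.1'), 'qty': 3, 'tags': ['a', 'b']}
```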

In [19]:
with open("example.json", "r") as ej:
    dict_of_json = json.load(ej)
    
print(dict_of_json)

{'firstname': 'Jane', 'lastname': 'Doe', 'hobbies': ['running', 'swiming', 'singing'], 'age': 28, 'hasChildren': True, 'children': [{'firstname': 'Alex', 'age': 5}, {'firstname': 'Bob', 'age': 7}]}


## json.loads()

The loads() function of the json module converts a JSON string into a Python object. loads() takes the following parameters:

* s - a str, bytes or bytearray instance containing a JSON document (for example, one created by dumps())
* cls=None - a custom JSONDecoder subclass
* object_hook=None - a function called with every decoded JSON object (a dict); its return value is used instead of the dict
* parse_float=None - a function called with the string of every JSON float (float is used by default)
* parse_int=None - a function called with the string of every JSON int (int is used by default)
* parse_constant=None - a function called with one of the strings '-Infinity', 'Infinity' and 'NaN'
* object_pairs_hook=None - a function called with every decoded JSON object as an ordered list of (key, value) pairs; its return value is used instead of the dict (it takes priority over object_hook)
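One practical use of object_pairs_hook is rejecting duplicate keys, which plain loads() silently collapses. This is a sketch; reject_duplicates is a hypothetical helper name, not part of the json module.

```python
import json

def reject_duplicates(pairs):
    """Hypothetical helper: build a dict but fail on repeated keys."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = value
    return result

# by default the last value of a repeated key wins
print(json.loads('{"a": 1, "a": 2}'))  # {'a': 2}

try:
    json.loads('{"a": 1, "a": 2}', object_pairs_hook=reject_duplicates)
except ValueError as err:
    print(err)  # duplicate key: 'a'
```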

In [22]:
personJSON = json.dumps(datadict, indent=4)

str_of_json = json.loads(personJSON)
print(str_of_json)

{'firstname': 'Jane', 'lastname': 'Doe', 'hobbies': ['running', 'swiming', 'singing'], 'age': 28, 'hasChildren': True, 'children': [{'firstname': 'Alex', 'age': 5}, {'firstname': 'Bob', 'age': 7}]}


Python types are converted to JSON types according to the following rules:

In [3]:
import pandas as pd

tabledata = [
         ["dict", 'object'],
         ["list, tuple",'array'],
         ["str",'string'],
         ["int, float",'number'],
         ["True",'true'],
         ["False",'false'],
         ["None",'null']
            ]

pd.DataFrame(tabledata, columns=["Python", "JSON"])

|   | Python | JSON |
|---|--------|------|
| 0 | dict | object |
| 1 | list, tuple | array |
| 2 | str | string |
| 3 | int, float | number |
| 4 | True | true |
| 5 | False | false |
| 6 | None | null |
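The conversion is not fully reversible: a JSON array always decodes to a list, and JSON object keys are always strings. A small round-trip sketch (the dictionary contents are illustrative):

```python
import json

original = {"point": (1, 2), 1: None, "flag": True}

text = json.dumps(original)
print(text)  # {"point": [1, 2], "1": null, "flag": true}

restored = json.loads(text)
print(restored)  # {'point': [1, 2], '1': None, 'flag': True}

# the tuple came back as a list, and the int key 1 became the string "1"
print(restored == original)  # False
```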


Let's consider some more complicated examples.

## JSON serialization of custom classes

In [1]:
import json

class User:
    def __init__(self, name, age):
        self.name = name
        self.age = age

user = User('Mike', 22)
jsonObject = json.dumps(user)

TypeError: Object of type User is not JSON serializable

If you execute the code above, you will see that an object of the User class cannot be serialized. To serialize custom objects, you should write a custom encoding function and pass it to dumps() via the default parameter.

In [4]:
import json

class User:
    def __init__(self, name, age):
        self.name = name
        self.age = age
        
def custom_encoding_function(obj):
    if isinstance(obj, User):
        # the optional extra key obj.__class__.__name__: True records which class was encoded (useful for decoding later)
        return {'name': obj.name, 'age': obj.age, obj.__class__.__name__: True}
    else:
        raise TypeError(f'Object of type {type(obj).__name__} is not JSON serializable')

user = User('Mike', 22)
jsonObject = json.dumps(user, default=custom_encoding_function)

jsonObject

'{"name": "Mike", "age": 22, "User": true}'

You can also use the JSONEncoder class from the json module to encode custom objects. In this case, your custom encoder class inherits from JSONEncoder, and your encoding logic goes into an override of its default() method.

In [10]:
import json

class User:
    def __init__(self, name, age):
        self.name = name
        self.age = age
        
def custom_encoding_function(obj):
    if isinstance(obj, User):
        # the optional extra key obj.__class__.__name__: True records which class was encoded (useful for decoding later)
        return {'name': obj.name, 'age': obj.age, obj.__class__.__name__: True}
    else:
        raise TypeError(f'Object of type {type(obj).__name__} is not JSON serializable')

from json import JSONEncoder
        
class UserEncoder(JSONEncoder):
    def default(self, obj):
        if isinstance(obj, User):
            # the optional extra key obj.__class__.__name__: True records which class was encoded
            return {'name': obj.name, 'age': obj.age, obj.__class__.__name__: True}
        # for any other type, let the base class raise the standard TypeError
        return super().default(obj)

# Pass the UserEncoder class to dumps() via cls instead of a custom function, as in the example above

user = User('Mike', 22)
jsonObject = json.dumps(user, cls=UserEncoder)

print("json.dumps(): ", jsonObject)

# Or you can use instead:

userJSON = UserEncoder().encode(user)
print()
print("UserEncoder().encode: ", jsonObject)

json.dumps():  {"name": "Mike", "age": 22, "User": true}

UserEncoder().encode:  {"name": "Mike", "age": 22, "User": true}


If you want to load JSON that was produced from a custom object, you can still use the standard load() or loads() functions, but they return a plain dict rather than a User object:

In [14]:
import json

class User:
    def __init__(self, name, age):
        self.name = name
        self.age = age
        
def custom_encoding_function(obj):
    if isinstance(obj, User):
        # the optional extra key obj.__class__.__name__: True records which class was encoded (useful for decoding later)
        return {'name': obj.name, 'age': obj.age, obj.__class__.__name__: True}
    else:
        raise TypeError(f'Object of type {type(obj).__name__} is not JSON serializable')

user = User('Mike', 22)
jsonObject = json.dumps(user, default=custom_encoding_function)
print(jsonObject, type(jsonObject))

# Now we have the custom json string

user_dict = json.loads(jsonObject)
print(user_dict, type(user_dict))

# As you can see, the result was decoded into a dictionary

{"name": "Mike", "age": 22, "User": true} <class 'str'>
{'name': 'Mike', 'age': 22, 'User': True} <class 'dict'>


To decode JSON back into your custom objects, you also need a custom decoding function (passed as object_hook) or a ready-made class from the json library (a JSONDecoder subclass), just as in the encoding approach.

In [19]:
import json

class User:
    def __init__(self, name, age):
        self.name = name
        self.age = age

def custom_encoding_function(obj):
    if isinstance(obj, User):
        # the extra key obj.__class__.__name__: True records which class was encoded; decode_user below relies on it
        return {'name': obj.name, 'age': obj.age, obj.__class__.__name__: True}
    else:
        raise TypeError(f'Object of type {type(obj).__name__} is not JSON serializable')

user = User('Mike', 22)
jsonObject = json.dumps(user, default=custom_encoding_function)
print("Decoded json string: ", jsonObject)
print()

# This custom function only for User class and for decoding into object of this class
def decode_user(dct):
    if User.__name__ in dct:
        return User(name=dct['name'], age=dct['age'])
    else: dct
        
user_obj = json.loads(jsonObject, object_hook=decode_user)
print('Decoding to object of class User')
print("User.name: ", user_obj.name)
print("User.age: ", user_obj.age)

Decoded json string:  {"name": "Mike", "age": 22, "User": true}

Decoding to object of class User
User.name:  Mike
User.age:  22
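The "ready-made" alternative is a JSONDecoder subclass that installs the hook itself, so callers only pass cls=. A sketch under the same assumptions as above (the "User": true marker); UserDecoder is a hypothetical name. Because object_hook is applied to every decoded dict, nested users inside other structures are converted too.

```python
import json

class User:
    def __init__(self, name, age):
        self.name = name
        self.age = age

class UserDecoder(json.JSONDecoder):
    """Hypothetical decoder: turns dicts marked with "User": true into User objects."""

    def __init__(self, *args, **kwargs):
        # install our hook unless the caller supplied their own object_hook
        kwargs.setdefault("object_hook", self.decode_user)
        super().__init__(*args, **kwargs)

    @staticmethod
    def decode_user(dct):
        if User.__name__ in dct:
            return User(name=dct["name"], age=dct["age"])
        return dct  # any other dict is left unchanged

text = '{"team": [{"name": "Mike", "age": 22, "User": true}, {"name": "Ann", "age": 30, "User": true}]}'
result = json.loads(text, cls=UserDecoder)
print([user.name for user in result["team"]])  # ['Mike', 'Ann']
```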


# Random numbers

In [None]:
import random
import secrets
import numpy as np

## Pseudo random numbers

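Pseudo-random numbers are provided by the 'random' module. They are produced by a deterministic algorithm (the Mersenne Twister), so the whole sequence is determined by its initial seed.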

In [None]:
# A random float uniformly distributed in the range [0.0, 1.0) (1.0 itself is never returned)
r = random.random()
r

In [None]:
a = 1
b = 10

# A random float uniformly distributed in the range [a, b]
r = random.uniform(a, b)
r

In [None]:
a = 1
b = 10

# A random integer uniformly distributed in the range [a, b] (b is included)
r1 = random.randint(a, b)

# A random integer uniformly distributed in the range [a, b) (b is excluded)
r2 = random.randrange(a, b)

print("r1 = ", r1)
print("r2 = ", r2)

In [None]:
mu = 0
sigma = 1

# A random float from the normal (Gaussian) distribution with mean mu and standard deviation sigma
r = random.normalvariate(mu, sigma)
r

In [None]:
lst = list("ABCDEFGHIJK")

# A random element of the non-empty sequence lst
r = random.choice(lst)
r

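## Cryptographically secure random numbers

Such numbers are provided by the 'secrets' module. It uses the best source of randomness provided by the operating system, so the numbers are not reproducible and are suitable for passwords, tokens and other security-sensitive values.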

In [None]:
N = 10

# A secure random integer in the range [0, N)
r = secrets.randbelow(N)
r

In [None]:
N = 5

# A secure random integer made of N random bits, i.e. in the range [0, 2**N)
# For example, if N = 4 the bits may be 1010 or 0111 or any other 4-bit pattern,
# and if N = 5 they may be, for example, 11001

# The result is returned as an ordinary (decimal) int

r = secrets.randbits(N)
r
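To see the bits that the comments describe, the result can be formatted as exactly N binary digits. A small sketch; the value differs on every run, so only its range is checked.

```python
import secrets

N = 5
r = secrets.randbits(N)

# format the result as exactly N binary digits (leading zeros kept)
bits = format(r, f"0{N}b")
print(r, bits)  # e.g. 25 11001 (different on every run)

# the value always fits into N bits
print(0 <= r < 2 ** N)  # True
```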

In [None]:
lst = list("ABCDEFGHIJK")

# A secure random element of the non-empty sequence lst (not reproducible)
r = secrets.choice(lst)
r

## Sequences

In [None]:
lst = list("ABCDEFGHIJK")

N = 3
# N random elements of the sequence lst, chosen without replacement:
# each element of lst can appear at most once in the result
r = random.sample(lst, N)
r

In [None]:
lst = list("ABCDEFGHIJK")

k = 3
# k random elements of the sequence lst, chosen with replacement:
# the same element of lst may appear several times in the result
r = random.choices(lst, k=k)
r

In [None]:
lst = list("ABCDEFGHIJK")

# Shuffles the list lst in place (random.shuffle returns None)
random.shuffle(lst)
lst
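If the original order must be kept, random.sample() with k=len(lst) returns a shuffled copy instead. A short sketch contrasting the two:

```python
import random

lst = list("ABCDE")

# random.sample with k=len(lst) returns a shuffled copy and leaves lst untouched
shuffled_copy = random.sample(lst, k=len(lst))
print(lst)                           # ['A', 'B', 'C', 'D', 'E']
print(sorted(shuffled_copy) == lst)  # True: same elements, possibly different order

# random.shuffle changes lst in place and returns None
result = random.shuffle(lst)
print(result)  # None
```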

## Seed

A seed is the initial value of a pseudo-random number generator; random.seed() sets it in order to make random numbers reproducible.

For example, you may want your code to produce the same "random" numbers every time it runs. This is exactly what seed() does: numbers generated by the 'random' module after seeding it with the same value are the same. Moreover, fixing the seed lets you reproduce the same random numbers when the code is run multiple times on the same or different machines.

Because anyone who knows the seed can reproduce the whole sequence, the 'random' module is not recommended for security purposes such as generating encryption keys or passwords; use the 'secrets' module for that.

In [None]:
random.seed(1)
print("seed = 1")
print(random.random())
print(random.randint(1, 10),"\n")

random.seed(2)
print("seed = 2")
print(random.random())
print(random.randint(1, 10),"\n")

random.seed(1)
print("seed = 1")
print(random.random())
print(random.randint(1, 10),"\n")

random.seed(2)
print("seed = 2")
print(random.random())
print(random.randint(1, 10))
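random.seed() changes the state of the module-wide generator. An alternative sketch uses separate random.Random instances, each with its own seed and state, so other code calling the global functions does not disturb them:

```python
import random

# two independent generators created with the same seed
gen_a = random.Random(42)
gen_b = random.Random(42)

seq_a = [gen_a.randint(1, 10) for _ in range(5)]
seq_b = [gen_b.randint(1, 10) for _ in range(5)]
print(seq_a == seq_b)  # True: same seed, same sequence

# using the global functions in between does not disturb gen_a/gen_b,
# because each Random instance keeps its own internal state
random.random()
print(gen_a.random() == gen_b.random())  # True
```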

## Arrays

Random arrays are created with the 'numpy.random' module.

In [None]:
N = 3

# Create a numpy array of N random floats uniformly distributed in [0, 1)
r = np.random.rand(N)
r

In [None]:
N = 2
M = 3

# Create a numpy array of NxM random floats uniformly distributed in [0, 1)
r = np.random.rand(N, M)
r

In [None]:
N = 2
M = 3

a = 1
b = 10

# Create a numpy array of NxM random integers from the range [a, b) (b is excluded)
r = np.random.randint(a, b, (N, M))
r

In [None]:
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

np.random.shuffle(arr)

# np.random.shuffle shuffles only along the first axis:
# the rows are reordered, but the elements within each row stay in place

arr

numpy.random also supports seeding:

In [None]:
np.random.seed(1)

r = np.random.rand(N, M)
print(r, "\n")

np.random.seed(1)

r = np.random.rand(N, M)
print(r)
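Newer NumPy versions (1.17 and later) also offer the Generator API via np.random.default_rng(seed). Each Generator keeps its own state, much like random.Random in the standard library. A sketch, assuming a NumPy version that provides it:

```python
import numpy as np

# each Generator keeps its own state; the same seed gives the same numbers
rng1 = np.random.default_rng(1)
rng2 = np.random.default_rng(1)

a = rng1.random((2, 3))   # floats in [0, 1)
b = rng2.random((2, 3))
print(np.array_equal(a, b))  # True

# integers() excludes the upper bound by default, like np.random.randint
ints = rng1.integers(1, 10, size=(2, 3))
print(ints.min() >= 1, ints.max() < 10)  # True True
```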

# Shallow vs Deep copying

First of all, note that the assignment operator does not create a copy: it only binds a second name to the same object. For immutable types this causes no problems:

In [17]:
original = 5
copy = original
# then we change the copy
copy = 6
print("original =",original)
print('copy =', copy)

# no problem here: copy = 6 rebinds the name copy to a new object, the original int is not changed

original = 5
copy = 6


But you must be careful with mutable datatypes (for example, with lists):

In [18]:
original = [1,2,3,4,5,6]
copy = original
# then we change the copy
copy.append(7)
print("original =", original)
print('copy =', copy)

# As you can see, changing copy also changes original
# This is because copy is not an actual copy: original and copy refer to the same list object in memory

original = [1, 2, 3, 4, 5, 6, 7]
copy = [1, 2, 3, 4, 5, 6, 7]


To copy mutable objects (standard datatypes or custom objects), use the copy module:

In [20]:
import copy

## Shallow copy

A shallow copy is one level deep: it creates a new outer object, but only copies references to the nested child objects.

In [21]:
import copy

original = [1,2,3,4,5,6]
copy = copy.copy(original)
# then we change the copy
copy.append(7)
print("original =", original)
print('copy =', copy)

original = [1, 2, 3, 4, 5, 6]
copy = [1, 2, 3, 4, 5, 6, 7]


As you can see above, original is not affected by changes to copy.

In [24]:
import copy

# There are several ways to create a shallow copy of a list:

original = [1,2,3,4,5,6]
copy1 = copy.copy(original)
copy2 = original.copy()
copy3 = list(original)
copy4 = original[:]
# then we change the copy
copy1.append(7)
copy2.append(8)
copy3.append(9)
copy4.append(10)
print("original =", original)
print('copy1 =', copy1)
print('copy2 =', copy2)
print('copy3 =', copy3)
print('copy4 =', copy4)

original = [1, 2, 3, 4, 5, 6]
copy1 = [1, 2, 3, 4, 5, 6, 7]
copy2 = [1, 2, 3, 4, 5, 6, 8]
copy3 = [1, 2, 3, 4, 5, 6, 9]
copy4 = [1, 2, 3, 4, 5, 6, 10]


A shallow copy works well if the object is only one level deep, i.e. it has no nested mutable objects. For nested structures, such as a list whose elements are lists themselves, it is not enough:

In [26]:
import copy

original = [[1],[2],[3],[4],[5],[6]]
# shallow copy of nested list
copy = copy.copy(original)
copy[0].append(1)

print("original =", original)
print('copy =', copy)

original = [[1, 1], [2], [3], [4], [5], [6]]
copy = [[1, 1], [2], [3], [4], [5], [6]]


Above you can see that original is affected by changing copy again! The inner lists are shared between original and its shallow copy. For objects with a nested structure, use a deep copy.
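One way to see exactly what a shallow copy shares is to compare object identities with the is operator. A small sketch (the variable names are illustrative):

```python
import copy

original = [[1], [2]]
shallow = copy.copy(original)
deep = copy.deepcopy(original)

print(shallow is original)        # False: a new outer list was created
print(shallow[0] is original[0])  # True: the inner lists are shared
print(deep[0] is original[0])     # False: the inner lists were copied too
print(deep == original)           # True: equal values, independent objects
```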

## Deep copy

A deep copy is a fully independent copy: nested child objects are copied recursively.

In [27]:
import copy

original = [[1],[2],[3],[4],[5],[6]]
# deep copy of nested list
copy = copy.deepcopy(original)
copy[0].append(1)

print("original =", original)
print('copy =', copy)

original = [[1], [2], [3], [4], [5], [6]]
copy = [[1, 1], [2], [3], [4], [5], [6]]


In [29]:
import copy

original = [[[[1]]],[2],[3],[4],[5],[6]]
# deep copy of nested list
copy = copy.deepcopy(original)
copy[0][0][0].append(1)

print("original =", original)
print('copy =', copy)

original = [[[[1]]], [2], [3], [4], [5], [6]]
copy = [[[[1, 1]]], [2], [3], [4], [5], [6]]


As you can see above, it doesn't matter how many nested levels an object has: a deep copy creates a fully independent copy in another area of memory.

This works for objects of standard Python classes and for custom objects.

You must remember the following rules of copying usage in Python:
* Use a shallow copy if your object has only one level in its structure (no nested mutable objects, including mutable attributes accessed with the dot)
* Use a deep copy if your object has at least one nested level in its structure (including mutable attributes accessed with the dot)
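For custom objects, the "nested level" is typically a mutable attribute. A sketch with a hypothetical Team class whose members attribute is a list:

```python
import copy

class Team:
    """Hypothetical class: the members attribute is a nested mutable object."""
    def __init__(self, name, members):
        self.name = name
        self.members = members

original = Team("A", ["Ann", "Bob"])
shallow = copy.copy(original)
deep = copy.deepcopy(original)

# the shallow copy shares the members list with original
shallow.members.append("Cid")
print(original.members)  # ['Ann', 'Bob', 'Cid']
print(deep.members)      # ['Ann', 'Bob']
```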

# Context managers

Context managers allow you to allocate and release resources precisely when you want to. They are used with the with statement; the most common examples are 'with open(...)' for files and 'with lock' for Lock() objects in multiprocessing/multithreading. Suppose you have two related operations which you'd like to execute as a pair, with a block of code in between. Context managers allow you to do exactly that.

In [1]:
# reading files without context managers

file = open("text.txt", "r")
try:
    file.read()
finally:
    file.close()

In [None]:
# reading files with context managers

with open("text.txt", "r")
    file.read()

Comparing the two examples above, you can see how context managers make your code shorter and clearer.

You can also implement your own context manager in Python: a class with __enter__() and __exit__() methods. For example, here is a context manager for files:

In [4]:
class ManagedFile():
    def __init__(self, filename):
        print("Init")
        self.filename = filename
    
    # this method is executed as soon as you enter the 'with' statement,
    # so here you allocate your resources
    def __enter__(self):
        print("Enter")
        self.file = open(self.filename, "r")
        return self.file
    
    # this method is executed when you leave the 'with' block: here you make sure the file is closed
    def __exit__(self, exc_type, exc_value, exc_traceback):
        if self.file:
            self.file.close()
        # exception handling:
        if exc_type is not None:
            print("Some exception has been handled")
        print("Exit")
        # returning True suppresses any exception raised inside the 'with' block
        return True
        

# now use our context manager:

with ManagedFile("text.txt") as file:
    print(file.read())

# As you can see, it works correctly

print()
# Work with an exception
with ManagedFile("text.txt") as file:
    # file objects have no method somemethod: an AttributeError is raised and then suppressed by __exit__
    print(file.somemethod())


Init
Enter
agshagdhsdhgsadhga
sadfasgdafg
asgsadgsadg123124
sa gaadfgdfdsfha
gSGDFGDAHSFHAD
sdf
dfg
dffhfgh
Exit

Init
Enter
Some exception has been handled
Exit
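ManagedFile.__exit__ returns True, which swallows every exception from the block. If __exit__ returns a false value instead, the exception propagates after cleanup. A sketch with a hypothetical Tracker class that only records what happens:

```python
class Tracker:
    """Hypothetical context manager that logs enter/exit and does not suppress exceptions."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self

    def __exit__(self, exc_type, exc_value, exc_traceback):
        self.events.append(f"exit ({exc_type.__name__ if exc_type else 'no error'})")
        return False  # falsy: the exception (if any) propagates to the caller

tracker = Tracker()
try:
    with tracker:
        raise KeyError("boom")
except KeyError:
    tracker.events.append("caught outside")

print(tracker.events)  # ['enter', 'exit (KeyError)', 'caught outside']
```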


You can also create context managers with the 'contextlib' module: its @contextmanager decorator turns a generator function into a context manager.

In [6]:
from contextlib import contextmanager

# use decorator of contextmanager
@contextmanager
def some_file_manager(filename):
    f = open(filename, "r")
    try:
        # this is a generator function: the code before yield
        # allocates the resource (the file has been opened above)
        yield f
        # yield hands the file to the 'with' block and temporarily suspends the generator,
        # so we can work with the file (do some operations) inside the block
    finally:
        # at the end the file is closed, even if an exception was raised in the block
        f.close()
        
# and now we can use the context manager with name some_file_manager
with some_file_manager("text.txt") as file:
    print(file.read())
    
# As you can see, it works correctly

agshagdhsdhgsadhga
sadfasgdafg
asgsadgsadg123124
sa gaadfgdfdsfha
gSGDFGDAHSFHAD
sdf
dfg
dffhfgh
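The try/finally around yield is what guarantees cleanup: if the with block raises, the exception is thrown into the generator at the yield, the finally clause runs, and the exception continues to propagate. A sketch with a hypothetical tag context manager that records the order of events:

```python
from contextlib import contextmanager

log = []

@contextmanager
def tag(name):
    """Hypothetical context manager: writes an opening and a closing tag to log."""
    log.append(f"<{name}>")
    try:
        yield name
    finally:
        log.append(f"</{name}>")  # runs even if the block raised

try:
    with tag("b"):
        log.append("text")
        raise RuntimeError("fail")
except RuntimeError:
    pass

print(log)  # ['<b>', 'text', '</b>']
```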


# Multithreading and multiprocessing

## Processes and Threads

First of all, it is important to understand the difference between processes and threads, as well as the advantages and disadvantages of each. Processes and threads are related to each other, but they have significant differences.

A process is an instance of a program at runtime, an independent entity that is allocated system resources (such as CPU time and memory). Each process runs in a separate address space: one process cannot access the variables and data structures of another. If a process wants to access the resources of another process, it must use inter-process communication (IPC): pipes, files, network connections between computers, and much more.

A thread runs inside the address space of its process, and multiple threads of one process share its memory and state. As a rule, every thread can read and write the same memory area, unlike processes, which cannot simply access the memory of another process. Each thread has its own registers and its own stack, but because all threads live in the same address space, other threads can technically access them.

A thread is a unit of execution within a process. When one thread changes a resource in a process, the change is immediately visible to other threads in that process.

In other words, a process is a running program, and a thread is an independent flow of execution within a process.
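The difference in memory sharing can be observed directly: a thread that modifies a module-level list changes the list seen by the main thread, while a child process only modifies its own copy. A sketch (the function and variable names are illustrative); the if __name__ == "__main__": guard is required when processes are started with the 'spawn' method, so run it as a script.

```python
from multiprocessing import Process
from threading import Thread

shared = []

def append_item():
    shared.append("changed")

if __name__ == "__main__":
    th = Thread(target=append_item)
    th.start()
    th.join()
    print("after thread:", shared)   # ['changed']: the thread shares our memory

    p = Process(target=append_item)
    p.start()
    p.join()
    print("after process:", shared)  # still ['changed']: the child changed its own copy
```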

In summary, processes and threads have the following advantages and disadvantages:

### Processes

Process: an instance of a program (e.g. a running Python interpreter)

Advantages of multiprocessing:
* Takes advantage of multiple CPUs and cores (your code can run on multiple CPUs in parallel)
* Processes use separate memory spaces, so memory is not shared between them (as a consequence, each new process runs independently of the others)
* Spreading work over multiple CPUs speeds up your code, which makes multiprocessing a good fit for CPU-bound tasks
* Processes are interruptible/killable
* Each process has its own Python interpreter and its own GIL (Global Interpreter Lock, a mutex), so processes avoid the GIL limitation (deadlocks are still possible, though, if processes share locks).


Disadvantages of multiprocessing:
* A process consumes more memory than a thread (heavyweight)
* Starting a process is slower than starting a thread
* Sharing memory between processes is not so easy
* IPC (inter-process communication) is more complicated

### Threads

Threads: an entity within a process that can be scheduled (aka "lightweight process"). A process can spawn multiple threads.

Advantages of multithreading:
* All threads within a process share the same memory
* A thread consumes less memory than a process (lightweight)
* Starting a thread is faster than starting a process
* Threads are great for I/O-bound (input/output-bound) tasks

Disadvantages of multithreading:
* Threading is limited by the GIL (Global Interpreter Lock, a mutex): only one thread at a time can execute Python bytecode
* In CPython, multithreading gives no speed-up for CPU-bound tasks written in pure Python
* Threads are not interruptible/killable
* You must be careful when several threads may modify the same object, variable or other memory area (race conditions): here you must synchronize the threads and use mutexes. Otherwise, you risk causing a bug or a crash of your program.
* A deadlock is possible when two threads acquire several mutexes in different orders.

### GIL (Global Interpreter Lock)

The GIL is a lock that allows only one thread at a time to execute Python bytecode (it behaves like a single global mutex, similar to a mutex locked with lock() in C++).

The GIL is needed in CPython because its memory management (reference counting) is not thread-safe.

You can avoid the GIL limitation by:
* Using multiprocessing
* Using a different Python implementation without a GIL (Jython, IronPython)
* Using Python as a wrapper around third-party C/C++ libraries such as NumPy or SciPy, whose heavy computations can release the GIL

## Multithreading vs Multiprocessing

Here are simple examples of multithreading and multiprocessing.

In [3]:
# Multiprocessing

from multiprocessing import Process
import os

def square_numbers(N):
    for i in range(N):
        i*i

# code that starts processes must be protected by this check: with the 'spawn' start method
# (the default on Windows and macOS) every child process imports the main module again
if __name__ == "__main__":
    processes = []
    # a common choice is one process per CPU core
    num_of_processes = os.cpu_count()
    N = 100
    for i in range(num_of_processes):
        # target is a callable object (a function) to be executed in this process
        # args is always a tuple of arguments for the function
        p = Process(target=square_numbers, args=(N,))
        processes.append(p)

    # start: each process must be started explicitly
    for p in processes:
        p.start()

    # join (similar to std::thread::join() in C++): the main process
    # waits here until every child process has finished
    for p in processes:
        p.join()

    print('End main thread')

End main thread


In [4]:
# Multithreading

from threading import Thread
import os

def square_numbers(N):
    for i in range(N):
        i*i

threads = []

num_of_threads = 10
N = 100
for i in range(num_of_threads):
    # target is a callable object (a function) to be executed in this thread
    # args is always a tuple of arguments for the function
    th = Thread(target=square_numbers, args=(N,))
    threads.append(th)
    
# start: each thread must be started explicitly
for th in threads:
    th.start()

# join (similar to std::thread::join() in C++): the main thread
# waits here until every thread has finished
for th in threads:
    th.join()

print('End main thread')

End main thread


## Multithreading

Let's look at multithreading in Python in more detail.

### Dividing work between threads

In [12]:
from threading import Thread, current_thread
import time
from datetime import datetime

def Worker1(a, b):
    time.sleep(1.2)
    print(f"This is the {current_thread().name}, a+b={a+b}")
    
def Worker2():
    time.sleep(1.3)
    print(f"This is the {current_thread().name}")

print("Calling of functions in main thread:")
nt = datetime.now()
Worker1(1, 4)
Worker2()
print(f"Execution time={datetime.now()-nt}")
print()


print("Calling of functions in different threads:")
nt = datetime.now()
th1 = Thread(target=Worker1, args=(1, 4))
th2 = Thread(target=Worker2)

th1.start()
th2.start()

th1.join()
th2.join()
print(f"Execution time={datetime.now()-nt}")

Calling of functions in main thread:
This is the MainThread, a+b=5
This is the MainThread
Execution time=0:00:02.521849

Calling of functions in different threads:
This is the Thread-14, a+b=5
This is the Thread-15
Execution time=0:00:01.322589


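As you can see above, the threaded version is significantly faster: the total time is close to the longest single sleep (1.3 s) rather than the sum of both (2.5 s), because time.sleep() releases the GIL while a thread waits.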

### Using locks to prevent race conditions (thread locks in Python)

In [11]:
from threading import Thread, Lock
import time

database_value = 0

def increase(lock):
    global database_value
    local_copy = database_value
    local_copy += 1
    time.sleep(0.1)
    
    database_value = local_copy

print("Start database value", database_value)

lock = Lock()

#here database value increases
th1 = Thread(target=increase, args=(lock,))
th2 = Thread(target=increase, args=(lock,))

th1.start()
th2.start()

th1.join()
th2.join()

print("End database value", database_value)

Start database value 0
End database value 1


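The result above (1 instead of 2) is wrong because a race condition occurs: both threads read database_value = 0 before either of them writes back its incremented copy.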

In [7]:
from threading import Thread, Lock
import time

database_value = 0

def increase(lock):
    global database_value
    # this function modifies a shared value, so the whole read-modify-write
    # sequence must be protected by a thread lock (mutex)
    lock.acquire()
    local_copy = database_value
    local_copy += 1
    time.sleep(0.1)
    database_value = local_copy
    lock.release()

print("Start database value", database_value)

lock = Lock()

#here database value increases
th1 = Thread(target=increase, args=(lock,))
th2 = Thread(target=increase, args=(lock,))

th1.start()
th2.start()

th1.join()
th2.join()

print("End database value", database_value)

Start database value 0
End database value 2


In [9]:
from threading import Thread, Lock
import time

database_value = 0

def increase(lock):
    global database_value
    # this function modifies a shared value, so the whole read-modify-write
    # sequence must be protected by a thread lock (mutex);
    # the lock can also be used as a context manager, with the same result
    with lock:
        local_copy = database_value
        local_copy += 1
        time.sleep(0.1)
        database_value = local_copy

print("Start database value", database_value)

lock = Lock()

#here database value increases
th1 = Thread(target=increase, args=(lock,))
th2 = Thread(target=increase, args=(lock,))

th1.start()
th2.start()

th1.join()
th2.join()

print("End database value", database_value)

Start database value 0
End database value 2
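The same idea scales to many threads and many increments. In the sketch below (counter, counter_lock and add_many are illustrative names) the lock is held around every increment, so the final value is deterministic:

```python
from threading import Thread, Lock

counter = 0
counter_lock = Lock()

def add_many(n):
    global counter
    for _ in range(n):
        # counter += 1 is a read-modify-write, so it is done under the lock
        with counter_lock:
            counter += 1

threads = [Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for th in threads:
    th.start()
for th in threads:
    th.join()

print(counter)  # 40000
```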


### Queues in Python and multithreading

Queues are excellent for thread-safe and process-safe data exchange and data processing in multithreading or multiprocessing environments.

In [1]:
# Consider the simple example of using
from threading import Thread, Lock
import time
from queue import Queue

q = Queue()

#putting of elements in queue
q.put(1)
q.put(2)
q.put(3)

# get() removes and returns the first item of the queue (FIFO order)
print("q is empty: ", q.empty())
first = q.get()
second = q.get()
third = q.get()
print(first, second, third)
print("q is empty: ", q.empty())

# task_done() must be called once for every item retrieved with get(),
# after that item has been processed
for _ in range(3):
    q.task_done()

# join() blocks until every item that was put into the queue
# has been retrieved and marked with task_done();
# this is similar to Thread().join(): here we block the main thread
q.join()

q is empty:  False
1 2 3
q is empty:  True


In [5]:
# Consider the more complicated example
from threading import Thread, Lock, current_thread
import time
from queue import Queue

def worker(q, lock):
    while True:
        value = q.get()
        with lock:
            print(f'Thread {current_thread().name}, value {value}')
        # alternatively, you can leave the infinite loop with break when a special 'stop' value is received
        q.task_done()

lock = Lock()
q = Queue()

num_threads = 10

for i in range(num_threads):
    thread = Thread(target=worker, args=(q, lock))
    # a daemon thread is needed because the target function runs an infinite loop:
    # a daemon thread does not prevent the program from exiting and keeps running in the background;
    # daemon threads are stopped when the main program exits, which ends their infinite loops
    thread.daemon = True
    # by default thread.daemon = False
    thread.start()

for i in range(1, 21):
    q.put(i)

q.join()

Thread Thread-21, value 1
Thread Thread-20, value 2
Thread Thread-22, value 3
Thread Thread-23, value 4
Thread Thread-24, value 5
Thread Thread-24, value 15
Thread Thread-24, value 16
Thread Thread-24, value 17
Thread Thread-24, value 18
Thread Thread-24, value 19
Thread Thread-24, value 20
Thread Thread-20, value 12
Thread Thread-22, value 13
Thread Thread-23, value 14
Thread Thread-25, value 6
Thread Thread-26, value 7
Thread Thread-18, value 8
Thread Thread-19, value 9
Thread Thread-17, value 10
Thread Thread-21, value 11
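The "break by condition" alternative mentioned in the worker's comment can be sketched with a sentinel value: each worker stops when it receives it, so no daemon threads are needed and the threads can be joined. The choice of None as the sentinel is an assumption (it must never be a real work item), and the names are illustrative.

```python
from queue import Queue
from threading import Thread

STOP = None  # sentinel value: assumed never to be a real work item

def worker(q, results):
    while True:
        item = q.get()
        if item is STOP:
            q.task_done()
            break  # leave the loop, the thread finishes normally
        results.append(item * item)
        q.task_done()

q = Queue()
results = []
threads = [Thread(target=worker, args=(q, results)) for _ in range(3)]
for th in threads:
    th.start()

for i in range(1, 6):
    q.put(i)
for _ in threads:
    q.put(STOP)  # one sentinel per worker

q.join()
for th in threads:
    th.join()

print(sorted(results))  # [1, 4, 9, 16, 25]
```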


## Multiprocessing

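(In Jupyter these examples may not work: with the 'spawn' start method, the default on Windows and macOS, every child process must import the target function from a module, and functions defined in a notebook cell cannot be imported that way. So here the code is shown without execution output; save it to a .py file and run it as a script.)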

### Dividing work between processes

In [None]:
from multiprocessing import Process, current_process
import time
from datetime import datetime

def Worker1(a, b):
    time.sleep(1.2)
    print(f"This is the {current_process().name}, a+b={a+b}")
    
def Worker2():
    time.sleep(1.3)
    print(f"This is the {current_process().name}")

print("Calling of functions in main process:")
nt = datetime.now()
Worker1(1, 4)
Worker2()
print(f"Execution time={datetime.now()-nt}")
print()


print("Calling of functions in different processes:")
nt = datetime.now()
p1 = Process(target=Worker1, args=(1, 4))
p2 = Process(target=Worker2)

p1.start()
p2.start()

p1.join()
p2.join()
print(f"Execution time={datetime.now()-nt}")

### Using locks to prevent race conditions (process locks in Python)

Race conditions can also occur in multiprocessing when several processes modify shared data structures from the multiprocessing module, such as Value and Array. Access to them must be protected with a Lock:

In [None]:
# shared value

from multiprocessing import Process, Value, Lock
import time

def increase_100(number, lock):
    for i in range(100):
        time.sleep(0.01)
        lock.acquire()
        number.value += 1
        lock.release()

if __name__ == "__main__":
    # 'i' is the typecode for a signed int
    shared_number = Value('i', 0)
    lock = Lock()
    print(f"Shared value at the beginning: {shared_number.value}")

    p1 = Process(target=increase_100, args=(shared_number, lock))
    p2 = Process(target=increase_100, args=(shared_number, lock))

    p1.start()
    p2.start()

    p1.join()
    p2.join()

    print(f"Shared value at the finish: {shared_number.value}")

In [None]:
# shared array

from multiprocessing import Process, Array, Lock
import time

def increase_100(numbers, lock):
    for i in range(100):
        time.sleep(0.01)
        for i in range(len(numbers)):
            lock.acquire()
            numbers[i] += 1
            lock.release()

# d means double datatype
shared_array = Array('d', [0.0, 100.0, 200.0])
lock = Lock()
print(f"Shared array at the begining: {shared_array[:]}")

p1 = Process(target=increase_100, args=(shared_array,lock))
p2 = Process(target=increase_100, args=(shared_array,lock))

p1.start()
p2.start()

p1.join()
p2.join()

print(f"Shared array at the finish: {shared_array[:]}")
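A design alternative is a coarser lock: hold the array's own lock (from `get_lock()`) for a whole pass over the elements, so each pass updates all elements atomically and the lock is acquired once per pass instead of once per element. This works because the default lock of a synchronized `Array` is recursive, so the element accesses inside the block can take it again. A sketch under those assumptions (`run_array_demo` is a hypothetical name):

```python
from multiprocessing import Process, Array

def increase_100(numbers):
    for _ in range(100):
        # one lock acquisition per pass; the default lock is an RLock,
        # so numbers[j] can re-acquire it inside this block
        with numbers.get_lock():
            for j in range(len(numbers)):
                numbers[j] += 1

def run_array_demo():
    shared_array = Array('d', [0.0, 100.0, 200.0])
    workers = [Process(target=increase_100, args=(shared_array,)) for _ in range(2)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
    return shared_array[:]

if __name__ == "__main__":
    print(f"Shared array at the finish: {run_array_demo()}")
```

The trade-off is less concurrency: while one process updates the array, the other waits for the entire pass rather than for a single element.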

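You can also use the lock as a context manager: `with lock:` acquires the lock on entry and releases it on exit, even if an exception is raised inside the block.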

In [None]:
# shared array

from multiprocessing import Process, Array, Lock
import time

def increase_100(numbers, lock):
    for _ in range(100):
        time.sleep(0.01)
        for j in range(len(numbers)):
            # the lock is released automatically when the block ends
            with lock:
                numbers[j] += 1

# 'd' is the typecode for a double-precision float
shared_array = Array('d', [0.0, 100.0, 200.0])
lock = Lock()
print(f"Shared array at the beginning: {shared_array[:]}")

p1 = Process(target=increase_100, args=(shared_array,lock))
p2 = Process(target=increase_100, args=(shared_array,lock))

p1.start()
p2.start()

p1.join()
p2.join()

print(f"Shared array at the finish: {shared_array[:]}")
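The `with lock:` form behaves like `acquire()` followed by a `try`/`finally` that calls `release()`. The sketch below, which needs no child processes, checks that the lock is free again after an exception in either form; the function names are hypothetical helpers for illustration.

```python
from multiprocessing import Lock

def update_with_statement(lock, fail):
    with lock:                      # acquire() on entry
        if fail:
            raise ValueError("something went wrong while holding the lock")
    # release() has already run here, even if an exception was raised

def update_with_try_finally(lock, fail):
    lock.acquire()
    try:
        if fail:
            raise ValueError("something went wrong while holding the lock")
    finally:
        lock.release()              # always runs, like the end of a with block

def lock_is_free(lock):
    # non-blocking attempt: succeeds only if nobody currently holds the lock
    if lock.acquire(False):
        lock.release()
        return True
    return False

if __name__ == "__main__":
    lock = Lock()
    for update in (update_with_statement, update_with_try_finally):
        try:
            update(lock, fail=True)
        except ValueError:
            pass
        print(update.__name__, "-> lock free afterwards:", lock_is_free(lock))
```

Without the `finally` (or the `with` block), an exception between `acquire()` and `release()` would leave the lock held and block the other processes forever.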

### Queues in Python and multiprocessing

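Unlike multithreading, where the `queue` module is used, here we import `Queue` from the `multiprocessing` module. It can be passed to child processes and safely used by several of them at once.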

In [18]:
from multiprocessing import Process, Queue

def square(numbers, queue):
    for i in numbers:
        queue.put(i*i)

def make_negative(numbers, queue):
    for i in numbers:
        queue.put(-1*i)

numbers = range(1, 6)
q = Queue()

p1 = Process(target=square, args=(numbers, q))
p2 = Process(target=make_negative, args=(numbers, q))

p1.start()
p2.start()

# each process puts len(numbers) items, so read exactly that many;
# q.empty() is not reliable across processes, and draining the queue
# before join() avoids a deadlock when a lot of data is queued
for _ in range(2 * len(numbers)):
    print(q.get())

p1.join()
p2.join()
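Another common way to know when all results have arrived is to have each producer put a sentinel value on the queue when it is done; the consumer keeps reading until it has seen one sentinel per producer. In this sketch `SENTINEL` and `collect_results` are hypothetical names, and `None` works as the sentinel only because neither producer ever puts `None` as real data.

```python
from multiprocessing import Process, Queue

SENTINEL = None  # hypothetical marker meaning "this producer has finished"

def square(numbers, queue):
    for i in numbers:
        queue.put(i * i)
    queue.put(SENTINEL)

def make_negative(numbers, queue):
    for i in numbers:
        queue.put(-1 * i)
    queue.put(SENTINEL)

def collect_results(numbers):
    q = Queue()
    producers = [Process(target=f, args=(numbers, q)) for f in (square, make_negative)]
    for p in producers:
        p.start()
    results, finished = [], 0
    # keep reading until every producer has sent its sentinel
    while finished < len(producers):
        item = q.get()
        if item is SENTINEL:
            finished += 1
        else:
            results.append(item)
    for p in producers:
        p.join()   # safe: the queue has already been drained
    return results

if __name__ == "__main__":
    print(collect_results(range(1, 6)))
```

The order of the items depends on how the two processes are scheduled, so only the set of results is predictable.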

### Pools of worker processes in Python

The multiprocessing module also introduces APIs that have no analog in the threading module. A prime example is the Pool object, which offers a convenient way to parallelize the execution of a function across multiple input values, distributing the input data across processes (data parallelism). Functions passed to a pool should be defined at the top level of a module so that the child processes can import them; with the spawn start method (the default on Windows and macOS), functions defined interactively in a notebook may not be found by the workers. The cells below show basic examples of data parallelism using Pool.

In [None]:
# map(): apply the function to every element of an iterable

from multiprocessing import Pool

def cube(number):
    return number * number * number

# Pool() with no arguments starts one worker process per available CPU
pool = Pool()

# the most important methods of Pool are map(), apply(), close() and join()
numbers = range(10)

# map() splits the iterable among the workers and returns the results in input order
result = pool.map(cube, numbers)

# close() stops new tasks from being submitted; join() waits for the workers to exit
pool.close()
pool.join()

print(result)
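`map()` passes a single argument to the function. For a function with several parameters, `Pool.starmap()` unpacks each tuple of the iterable into positional arguments. The sketch below also uses the pool as a context manager; note that leaving the `with` block calls `terminate()`, so the results must be collected inside it. The function names and `processes=4` are arbitrary choices for illustration.

```python
from multiprocessing import Pool

def power(base, exponent):
    return base ** exponent

def parallel_powers(pairs, processes=4):
    with Pool(processes=processes) as pool:
        # starmap calls power(2, 3), power(3, 2), ... and keeps the input order
        return pool.starmap(power, pairs)

if __name__ == "__main__":
    print(parallel_powers([(2, 3), (3, 2), (10, 0)]))  # [8, 9, 1]
```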

In [None]:
# apply(): call the function once with the given arguments in one worker and wait for the result

from multiprocessing import Pool

def cube(number):
    return number * number * number

pool = Pool()

# the most important methods of Pool are map(), apply(), close() and join()
numbers = range(10)

# apply() blocks until the result is ready; the arguments must be passed as a tuple
result = pool.apply(cube, (numbers[0],))

pool.close()
pool.join()

print(result)
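Because `apply()` blocks until its single call finishes, calling it repeatedly runs the calls one after another. `apply_async()` returns an `AsyncResult` immediately, so several calls can be submitted first and their results collected later with `.get()`. A minimal sketch; `cubes_async` is a hypothetical name and the 10-second timeout is an arbitrary safeguard against a hung worker.

```python
from multiprocessing import Pool

def cube(number):
    return number * number * number

def cubes_async(numbers):
    pool = Pool()
    # submit every call without waiting; each returns an AsyncResult
    pending = [pool.apply_async(cube, (n,)) for n in numbers]
    pool.close()      # no more tasks will be submitted; queued tasks still run
    # .get() waits for each result in turn
    results = [r.get(timeout=10) for r in pending]
    pool.join()       # wait for the worker processes to exit
    return results

if __name__ == "__main__":
    print(cubes_async(range(5)))
```

Since the calls are submitted before any result is requested, the workers can process them in parallel, while the list of `AsyncResult` objects keeps the results in submission order.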