# A Tour of Python
## by Pierre Nugues
A quick introduction to Python’s syntax for readers with some knowledge in programming.

## Elementary flow control

### Variables

We create variables and assign them values, such as numbers and strings, with the equal sign. We then carry out a few arithmetic operations with them:

In [10]:
a = 1
b = 2
c = a / (b
         + 1)
text = 'Result:'
print(text, c)


Result: 0.3333333333333333
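
Note that `/` performs a true division and returns a float, even with two integers. As a small complement, here are the other arithmetic operators on integers. The variable names `x` and `y` and their values are our own, chosen for illustration:

```python
x, y = 7, 2     # hypothetical values for the illustration
print(x / y)    # true division, always a float: 3.5
print(x // y)   # floor division: 3
print(x % y)    # remainder of the division: 1
print(x ** y)   # exponentiation: 49
print(-x // y)  # floor division rounds toward minus infinity: -4
```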


### The `for` loop

In Python, indentation delimits the blocks, as in the body of this `for` loop:

In [11]:
for i in [1, 2, 3, 4, 5, 6]:
    print(i)
print('Done')


1
2
3
4
5
6
Done


### Conditionals

Conditionals use the `if` and `else` keywords:

In [12]:
for i in [1, 2, 3, 4, 5, 6]:
    if i % 2 == 0:
        print('Even:', i)
    else:
        print('Odd:', i)
print('Done')


Odd: 1
Even: 2
Odd: 3
Even: 4
Odd: 5
Even: 6
Done


## Strings
We create strings with single quotes and multiline strings with triple double quotes:

In [13]:
iliad = """Sing, O goddess, the anger of Achilles son of
Peleus, that brought countless ills upon the Achaeans."""
iliad


'Sing, O goddess, the anger of Achilles son of\nPeleus, that brought countless ills upon the Achaeans.'

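A backslash at the end of a line continues the string on the next line, without inserting a newline character:

In [14]: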
iliad2 = 'Sing, O goddess, the anger of Achilles son of \
Peleus, that brought countless ills upon the Achaeans.'
iliad2


'Sing, O goddess, the anger of Achilles son of Peleus, that brought countless ills upon the Achaeans.'

We access the characters in a string through their index in square brackets:

In [15]:
alphabet = 'abcdefghijklmnopqrstuvwxyz'
alphabet[0]  # 'a'
alphabet[1]  # 'b'
alphabet[25]  # 'z'


'z'

We can also use negative indices, which count from the end of the string:

In [16]:
alphabet[-1]  # the last character of a string: 'z'
alphabet[-2]  # the second last: 'y'
alphabet[-26]  # 'a'


'a'

An index outside the range of the string raises an `IndexError`:

In [17]:
alphabet[27]


IndexError: string index out of range

We get the length with `len()`:

In [18]:
len(alphabet)  # 26


26

Strings are immutable. Trying to assign a new value to one of their characters raises a `TypeError`:

In [20]:
alphabet[0] = 'b'  # throws an error


TypeError: 'str' object does not support item assignment

### String Operations and Functions

We can concatenate and repeat strings with `+` and `*`:

In [21]:
'abc' + 'def'  # 'abcdef'


'abcdef'

In [22]:
'abc' * 3  # 'abcabcabc'


'abcabcabc'

Some string methods:

In [23]:
# join()
''.join(['abc', 'def', 'ghi'])  # equivalent to 'abc' + 'def' + 'ghi':
# 'abcdefghi'


'abcdefghi'

In [24]:
' '.join(['abc', 'def', 'ghi'])  # places a space between the
# elements: 'abc def ghi'


'abc def ghi'

In [25]:
', '.join(['abc', 'def', 'ghi'])  # 'abc, def, ghi'


'abc, def, ghi'

In [26]:
# upper() and lower()
accented_e = 'eéèêë'
accented_e.upper()  # 'EÉÈÊË'


'EÉÈÊË'

In [27]:
accented_E = 'EÉÈÊË'
accented_E.lower()  # 'eéèêë'


'eéèêë'

In [28]:
alphabet.find('def')  # 3


3

In [29]:
alphabet.find('é')  # -1, not found


-1

In [30]:
alphabet.replace('abc', 'αβγ')  # 'αβγdefghijklmnopqrstuvwxyz'


'αβγdefghijklmnopqrstuvwxyz'
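
As strings are immutable, `replace()` does not modify `alphabet`: it builds and returns a new string. A quick check, where `greek_start` is a name of our choosing:

```python
alphabet = 'abcdefghijklmnopqrstuvwxyz'
greek_start = alphabet.replace('abc', 'αβγ')  # builds and returns a new string
print(greek_start)  # αβγdefghijklmnopqrstuvwxyz
print(alphabet)     # abcdefghijklmnopqrstuvwxyz: alphabet itself is unchanged
```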

A program to extract the vowels:

In [31]:
text_vowels = ''
for c in iliad:
    if c in 'aeiou':
        text_vowels = text_vowels + c
print(text_vowels)   # 'ioeeaeoieooeeuaououeiuoeaea'


ioeeaeoieooeeuaououeiuoeaea


### Slices

Slices are substrings of strings:

In [32]:
# Slices
alphabet[0:3]  # the three first letters of the alphabet: 'abc'


'abc'

In [33]:
alphabet[:3]  # equivalent to alphabet[0:3]


'abc'

In [34]:
alphabet[3:6]  # substring from index 3 to index 5: 'def'


'def'

In [35]:
alphabet[-3:]  # the three last letters of the alphabet: 'xyz'


'xyz'

In [36]:
alphabet[10:-10]  # 'klmnop'


'klmnop'

In [37]:
alphabet[:]  # all the letters: 'a...z'


'abcdefghijklmnopqrstuvwxyz'

Concatenating the slices before and after an index gives back the whole string:

In [38]:
i = 10
alphabet[:i] + alphabet[i:]


'abcdefghijklmnopqrstuvwxyz'

Slices with a step

In [39]:
alphabet[0::2]  # every other letter: 'acegikmoqsuwy'


'acegikmoqsuwy'
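
The step can also be negative; the slice then traverses the string from the end. A short sketch reversing the alphabet and taking every other letter backwards:

```python
alphabet = 'abcdefghijklmnopqrstuvwxyz'
reversed_alphabet = alphabet[::-1]  # step -1: from the last to the first character
print(reversed_alphabet)            # zyxwvutsrqponmlkjihgfedcba
print(alphabet[::-2][:5])           # every other letter, backwards: zxvtr
```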

### Special characters

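Two characters have a special meaning in strings: the quote delimiting the string and the backslash. Inside a string, we escape them with a backslash: `\'` and `\\`. Alternatively, we can delimit a string containing single quotes with double quotes: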

In [40]:
'Python\'s strings'  # "Python's strings"


"Python's strings"

In [41]:
"Python's strings"  # "Python's strings"


"Python's strings"

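Python strings are sequences of Unicode characters. Escape sequences let us denote a character by its Unicode name, `\N{...}`, or by its code point in hexadecimal, `\xhh`, `\uhhhh`, or `\Uhhhhhhhh`: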

In [42]:
'\N{COMMERCIAL AT}'  # '@'


'@'

In [43]:
'\x40'  # '@'


'@'

In [44]:
'\u0152'  # 'Œ'


'Œ'

We use the `r` prefix (raw strings) to treat backslashes as normal characters:

In [45]:
r'\N{COMMERCIAL AT}'  # '\\N{COMMERCIAL AT}'


'\\N{COMMERCIAL AT}'

In [46]:
r'\x40'  # '\\x40'


'\\x40'

In [47]:
r'\u0152'  # '\\u0152'


'\\u0152'

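### Formatting strings

The `format()` method replaces the `{}` placeholders with its arguments, either in order or according to the indices between the braces: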

In [48]:
begin = 'my'
'{} string {}'.format(begin, 'is empty')
# 'my string is empty'


'my string is empty'

In [49]:
begin = 'my'
'{1} string {0}'.format('is empty', begin)
# 'my string is empty'


'my string is empty'
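
Since Python 3.6, formatted string literals, or f-strings, offer a more compact alternative: Python evaluates the expressions between the braces and inserts their values into the string. A short sketch, where the variable `ratio` is ours:

```python
begin = 'my'
# The expression in braces is evaluated and inserted into the string
sentence = f'{begin} string is empty'
print(sentence)  # my string is empty

ratio = 1 / 3
# A format specification follows the colon: here, three decimals
print(f'Result: {ratio:.3f}')          # Result: 0.333
print('Result: {:.3f}'.format(ratio))  # same result with format()
```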

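## Data identities and types

Every Python object has an identity, returned by `id()`, and a type, returned by `type()`. The identity values depend on the run and the machine: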

In [50]:
12


12

In [51]:
id(12)


2958542793360

In [52]:
print(12)


12


In [53]:
a = 12
id(a)


2958542793360

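`a` refers to the same object as the literal `12`, hence the identical `id()`. CPython reuses the objects of small integers; this is an implementation detail that does not hold for all values.

In [54]: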
print(type(a))  # <class 'int'>


<class 'int'>


In [55]:
print(type(12.0))  # <class 'float'>


<class 'float'>


In [56]:
print(type(True))  # <class 'bool'>


<class 'bool'>


In [57]:
print(type(1 < 2))  # <class 'bool'>


<class 'bool'>


In [58]:
print(type(None))  # <class 'NoneType'>


<class 'NoneType'>


In [59]:
id('12')


2958646330544

In [60]:
print(type('12'))


<class 'str'>


In [61]:
alphabet       # abcdefghijklmnopqrstuvwxyz


'abcdefghijklmnopqrstuvwxyz'

In [62]:
id(alphabet)


2958553330112

In [63]:

type(alphabet)     # <class 'str'>


str

Type conversions

In [64]:
int('12')  # 12


12

In [65]:
str(12)  # '12'


'12'

In [66]:
int('12.0')  # ValueError


ValueError: invalid literal for int() with base 10: '12.0'

In [None]:
int(alphabet)  # ValueError


ValueError: invalid literal for int() with base 10: 'abcdefghijklmnopqrstuvwxyz'

In [67]:
int(True)  # 1


1

In [68]:
int(False)  # 0


0

In [69]:
bool(7)  # True


True

In [70]:
bool(0)  # False


False

In [71]:
bool(None)  # False


False
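
As `int('12.0')` fails, one way to convert such a string to an integer is to go through `float()` first. Note that `int()` then truncates toward zero. A minimal sketch, with a variable name of our own:

```python
value = float('12.0')      # float() accepts a decimal point: 12.0
print(int(value))          # 12
print(int(float('-2.7')))  # int() truncates toward zero: -2
```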

## Data structures

### Lists

Lists are mutable sequences that can hold elements of any type. We read and write list elements using their indices:

In [72]:
list1 = []  # An empty list
list1 = list()  # Another way to create an empty list
list2 = [1, 2, 3]  # List containing 1, 2, and 3


Their Python type

In [73]:
print(type(list2))


<class 'list'>


In [74]:
list2[1]  # 2


2

In [75]:
list2[1] = 8
list2  # [1, 8, 3]


[1, 8, 3]

In [76]:
list2[4]  # Index error


IndexError: list index out of range

In [77]:
var1 = 3.14
var2 = 'my string'


In [78]:
list3 = [1, var1, 'Prolog', var2]
list3  # [1, 3.14, 'Prolog', 'my string']


[1, 3.14, 'Prolog', 'my string']

Slices also apply to lists, and we can assign a list to a slice:

In [79]:
list3[1:3]  # [3.14, 'Prolog']
list3[1:3] = [2.72, 'Perl', 'Python']
list3  # [1, 2.72, 'Perl', 'Python', 'my string']


[1, 2.72, 'Perl', 'Python', 'my string']

Lists of lists

In [80]:
list4 = [list2, list3]
# [[1, 8, 3], [1, 2.72, 'Perl', 'Python', 'my string']]
list4


[[1, 8, 3], [1, 2.72, 'Perl', 'Python', 'my string']]

In [81]:
list4[0][1]  # 8


8

In [82]:
list4[1][3]  # 'Python'


'Python'

We assign `list2` to a second variable, `list5`, and unpack its elements into three variables:

In [83]:
list5 = list2
[v1, v2, v3] = list5


In [84]:
[v1, v2, v3]


[1, 8, 3]

### List Copy
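#### Shallow copy

Assigning a list to a variable, as in `list5 = list2`, does not copy it: both names refer to the same object, as their `id()` values show. The `copy()` method creates a new list, a shallow copy: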

In [85]:
list2


[1, 8, 3]

In [86]:
list5


[1, 8, 3]

In [87]:
print(id(list2))
print(id(list5))


2958646362368
2958646362368


In [88]:
list6 = list2.copy()
id(list6)


2958646670912

#### Identity and equality

`==` tests if two objects have equal values, whereas `is` tests if they are the same object:

In [89]:
list2 == list5    # True


True

In [90]:
list2 == list6    # True


True

In [91]:
list2 is list5    # True


True

In [92]:
list2 is list6    # False


False

In [93]:
list2[1] = 2


In [94]:
print(list2)
print(list5)
print(list6)


[1, 2, 3]
[1, 2, 3]
[1, 8, 3]


In [95]:
id(list2)


2958646362368

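#### Deep copy

`copy()` only copies the outer list: the elements of the copy are the same objects as in the original. The first element of a copy of `list4` is thus still `list2`. The `deepcopy()` function of the `copy` module also copies the elements, recursively: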

In [96]:
id(list4.copy()[0])


2958646362368

In [97]:
import copy

id(copy.deepcopy(list4)[0])


2958646731392
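
The difference shows when we modify an inner list. In this self-contained sketch, where `inner`, `outer`, `shallow`, and `deep` are our own names, the change to `inner` is visible through the shallow copy, but not through the deep copy:

```python
import copy

inner = [1, 2, 3]
outer = [inner, ['a', 'b']]

shallow = outer.copy()       # new outer list, same inner lists
deep = copy.deepcopy(outer)  # new outer list and new inner lists

inner[0] = 100
print(shallow[0])  # [100, 2, 3]: shares inner with outer
print(deep[0])     # [1, 2, 3]: has its own copy of inner
```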

### List operations and functions

In [98]:
list2


[1, 2, 3]

In [99]:

list3[:-1]  # [1, 2.72, 'Perl', 'Python']


[1, 2.72, 'Perl', 'Python']

In [100]:
[1, 2, 3] + ['a', 'b']  # [1, 2, 3, 'a', 'b']


[1, 2, 3, 'a', 'b']

In [101]:
list2[:2] + list3[2:-1]


[1, 2, 'Perl', 'Python']

In [102]:
list2 * 2


[1, 2, 3, 1, 2, 3]

In [103]:
[0.0] * 4  # Initializes a list of four 0.0s
# [0.0, 0.0, 0.0, 0.0]


[0.0, 0.0, 0.0, 0.0]

In [104]:
len(list2)  # 3


3

In [105]:
list2.extend([4, 5])  # [1, 2, 3, 4, 5]
list2


[1, 2, 3, 4, 5]

In [106]:
list2.append(6)  # [1, 2, 3, 4, 5, 6]
list2


[1, 2, 3, 4, 5, 6]

In [107]:
list2.append([7, 8])  # [1, 2, 3, 4, 5, 6, [7, 8]]
list2


[1, 2, 3, 4, 5, 6, [7, 8]]

In [108]:
list2.pop(-1)  # removes and returns the last element, [7, 8]
list2


[1, 2, 3, 4, 5, 6]

In [109]:
list2.remove(1)  # removes the first element equal to 1: [2, 3, 4, 5, 6]
list2


[2, 3, 4, 5, 6]

In [110]:
list2.insert(0, 'a')  # ['a', 2, 3, 4, 5, 6]
list2


['a', 2, 3, 4, 5, 6]

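`list5` and `list2` are the same object, so `list5` reflects all these changes, whereas the copy `list6` does not:

In [111]: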
list5


['a', 2, 3, 4, 5, 6]

In [112]:
list6


[1, 8, 3]

### Tuples

Tuples are similar to lists, but they are immutable:

In [113]:
tuple1 = ()  # An empty tuple
tuple1 = tuple()  # Another way to create an empty tuple
tuple2 = (1, 2, 3, 4)


In [114]:
tuple2[3]  # 4


4

In [115]:
tuple2[1:4]  # (2, 3, 4)


(2, 3, 4)

In [116]:
tuple2[3] = 8  # Type error: Tuples are immutable


TypeError: 'tuple' object does not support item assignment

Tuples can hold elements of different types, including lists, which remain mutable inside the tuple (not a good programming practice). We can convert lists to tuples and conversely:

In [117]:
list7 = ['a', 'b', 'c']
tuple3 = tuple(list7)  # conversion to a tuple: ('a', 'b', 'c')
tuple3


('a', 'b', 'c')

In [118]:
type(tuple3)  # <class 'tuple'>


tuple

In [119]:
list8 = list(tuple2)  # [1, 2, 3, 4]


In [120]:
tuple([1])


(1,)

In [121]:
list((1,))


[1]
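
As the output `(1,)` shows, a tuple with one element needs a trailing comma; without it, the parentheses only group an expression. A short check, with variable names of our own:

```python
not_a_tuple = (1)    # just the integer 1 in parentheses
one_tuple = (1,)     # a tuple with one element
also_a_tuple = 1, 2  # the parentheses are optional here
print(type(not_a_tuple), type(one_tuple), type(also_a_tuple))
# <class 'int'> <class 'tuple'> <class 'tuple'>
```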

In [122]:
tuple4 = (tuple2, list7)  # ((1, 2, 3, 4), ['a', 'b', 'c'])
tuple4[0]  # (1, 2, 3, 4)


(1, 2, 3, 4)

In [123]:
tuple4[1]  # ['a', 'b', 'c']


['a', 'b', 'c']

In [124]:
tuple4[0][2]  # 3


3

In [125]:
tuple4[1][1]  # 'b'


'b'

In [126]:
tuple4[1][1] = 'β'  # ((1, 2, 3, 4), ['a', 'β', 'c'])
tuple4


((1, 2, 3, 4), ['a', 'β', 'c'])

### Sets

Sets are unordered collections without duplicates:

In [127]:
set1 = set()  # An empty set
set2 = {'a', 'b', 'c', 'c', 'b'}  # {'a', 'b', 'c'}
set2


{'a', 'b', 'c'}

In [128]:
print(type(set2))


<class 'set'>


In [129]:
set2.add('d')  # {'a', 'b', 'c', 'd'}
set2


{'a', 'b', 'c', 'd'}

In [130]:
set2.remove('a')  # {'b', 'c', 'd'}
set2


{'b', 'c', 'd'}

In [131]:
list9 = ['a', 'b', 'c', 'c', 'b']


In [132]:
set3 = set(list9)  # {'a', 'b', 'c'}
set3


{'a', 'b', 'c'}

In [133]:
iliad_chars = set(iliad.lower())
# The set of unique characters of the iliad string
iliad_chars


{'\n',
 ' ',
 ',',
 '.',
 'a',
 'b',
 'c',
 'd',
 'e',
 'f',
 'g',
 'h',
 'i',
 'l',
 'n',
 'o',
 'p',
 'r',
 's',
 't',
 'u'}

We can create a sorted list from a set

In [134]:
sorted(iliad_chars)


['\n',
 ' ',
 ',',
 '.',
 'a',
 'b',
 'c',
 'd',
 'e',
 'f',
 'g',
 'h',
 'i',
 'l',
 'n',
 'o',
 'p',
 'r',
 's',
 't',
 'u']

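By default, `sorted()` compares characters by their Unicode code points, which places accented letters after `z`.
To sort according to the conventions of a language, we pass `locale.strxfrm` as the `key` argument. This function relies on the locale support of the underlying operating system, so it may produce different results on different systems.
It did not work properly on macOS (update for macOS 11.5.1: apparently it does now).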

In [135]:
import locale

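# getdefaultlocale() is deprecated since Python 3.11;
# use setlocale(), getlocale(), and getencoding() instead
loc = locale.getdefaultlocale()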
loc


('sv_SE', 'cp1252')

In [136]:
accented = 'aàäeéèêëiîïoôöœuûüαβγ'
locale.setlocale(locale.LC_ALL, loc)
print("Without locale:\t", sorted(accented))
print("With locale ", loc, '\t', sorted(accented, key=locale.strxfrm))


Error: unsupported locale setting

Setting the default locale fails on this system. We set an English locale explicitly instead:

In [137]:
locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
print("With an English locale:\t", sorted(accented, key=locale.strxfrm))


With an English locale:	 ['a', 'à', 'ä', 'e', 'é', 'è', 'ê', 'ë', 'i', 'î', 'ï', 'o', 'ô', 'ö', 'œ', 'u', 'û', 'ü', 'α', 'β', 'γ']


With a Swedish locale

In [138]:
locale.setlocale(locale.LC_ALL, 'sv_SE.UTF-8')
accented_sv = 'aåäeéioöuαβγ'
print("With a Swedish locale:\t", sorted(accented_sv, key=locale.strxfrm))


With a Swedish locale:	 ['a', 'e', 'é', 'i', 'o', 'u', 'å', 'ä', 'ö', 'α', 'β', 'γ']


Operations on sets

In [139]:
set2.intersection(set3)  # {'c', 'b'}


{'b', 'c'}

In [140]:
set2.union(set3)  # {'d', 'b', 'a', 'c'}


{'a', 'b', 'c', 'd'}

In [141]:
set2.symmetric_difference(set3)  # {'a', 'd'}


{'a', 'd'}

In [142]:
set2.issubset(set3)  # False


False
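
Sets also support operators equivalent to these methods: `&` (intersection), `|` (union), `^` (symmetric difference), `-` (difference), and `<=` (subset). A sketch with the values that `set2` and `set3` have at this point; we sort the results only to get a deterministic display:

```python
set2 = {'b', 'c', 'd'}  # value of set2 after the add() and remove() calls above
set3 = {'a', 'b', 'c'}
print(sorted(set2 & set3))  # intersection: ['b', 'c']
print(sorted(set2 | set3))  # union: ['a', 'b', 'c', 'd']
print(sorted(set2 ^ set3))  # symmetric difference: ['a', 'd']
print(sorted(set2 - set3))  # difference: ['d']
print(set2 <= set3)         # subset test: False
```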

In [143]:
iliad_chars.intersection(set(alphabet))
# characters of the iliad string that are letters:
# {'a', 's', 'g', 'p', 'u', 'h', 'c', 'l', 'i',
#  'd', 'o', 'e', 'b', 't', 'f', 'r', 'n'}


{'a',
 'b',
 'c',
 'd',
 'e',
 'f',
 'g',
 'h',
 'i',
 'l',
 'n',
 'o',
 'p',
 'r',
 's',
 't',
 'u'}

### Dictionaries

Dictionaries are collections of values indexed by keys:

In [144]:
wordcount = {}  # We create an empty dictionary
wordcount = dict()  # Another way to create a dictionary
wordcount['a'] = 21  # The key 'a' has value 21
wordcount['And'] = 10  # 'And' has value 10
wordcount['the'] = 18


In [145]:
wordcount


{'a': 21, 'And': 10, 'the': 18}

In [146]:
print(type(wordcount))


<class 'dict'>


In [147]:
wordcount['a']  # 21


21

In [148]:
wordcount['And']  # 10


10

In [149]:
'And' in wordcount  # True


True

In [150]:
'is' in wordcount  # False


False

In [151]:
wordcount['is']  # Key error


KeyError: 'is'

### Dictionary functions

In [152]:
wordcount.get('And')  # 10


10

In [153]:
wordcount.get('is', 0)  # 0


0

In [154]:
wordcount.get('is')  # None


Since Python 3.7, dictionaries preserve the insertion order of their keys:

In [155]:
wordcount.keys()  # dict_keys(['a', 'And', 'the'])


dict_keys(['a', 'And', 'the'])

In [156]:
wordcount.values()  # dict_values([21, 10, 18])


dict_values([21, 10, 18])

In [157]:
wordcount.items()  # dict_items([('a', 21), ('And', 10),
# ('the', 18)])


dict_items([('a', 21), ('And', 10), ('the', 18)])

Keys must be hashable, which rules out mutable types: we can use tuples, but not lists:

In [158]:
my_dict = {}
my_dict[('And', 'the')] = 3  # OK, we use a tuple


In [159]:
my_dict[['And', 'the']] = 3  # Type error:
# unhashable type: 'list'


TypeError: unhashable type: 'list'

### Counting letters with a dictionary

In [160]:
letter_count = {}
for letter in iliad.lower():
    if letter in alphabet:
        if letter in letter_count:
            letter_count[letter] += 1
        else:
            letter_count[letter] = 1

print('Iliad')
letter_count


Iliad


{'s': 10,
 'i': 3,
 'n': 6,
 'g': 4,
 'o': 8,
 'd': 2,
 'e': 9,
 't': 6,
 'h': 6,
 'a': 6,
 'r': 2,
 'f': 2,
 'c': 3,
 'l': 6,
 'p': 2,
 'u': 4,
 'b': 1}
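
The `get()` method with a default value removes the need for the inner `if`/`else`. The standard library's `collections.Counter` also counts the items of an iterable directly. A sketch of both variants, which should give the same counts as above; `letter_counter` is our own name:

```python
from collections import Counter

iliad = """Sing, O goddess, the anger of Achilles son of
Peleus, that brought countless ills upon the Achaeans."""
alphabet = 'abcdefghijklmnopqrstuvwxyz'

# Variant 1: get() returns 0 when the letter is not yet a key
letter_count = {}
for letter in iliad.lower():
    if letter in alphabet:
        letter_count[letter] = letter_count.get(letter, 0) + 1

# Variant 2: Counter counts the items produced by a generator expression
letter_counter = Counter(letter for letter in iliad.lower()
                         if letter in alphabet)

print(dict(letter_counter) == letter_count)  # True
print(letter_counter.most_common(3))         # [('s', 10), ('e', 9), ('o', 8)]
```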

In [161]:
for letter in sorted(letter_count.keys()):
    print(letter, letter_count[letter])


a 6
b 1
c 3
d 2
e 9
f 2
g 4
h 6
i 3
l 6
n 6
o 8
p 2
r 2
s 10
t 6
u 4


Sorting the letters by frequency

In [162]:
for letter in sorted(letter_count.keys(),
                     key=letter_count.get, reverse=True):
    print(letter, letter_count[letter])


s 10
e 9
o 8
n 6
t 6
h 6
a 6
l 6
g 4
u 4
i 3
c 3
d 2
r 2
f 2
p 2
b 1


## Control structures

### Conditionals

In [163]:
digits = '0123456789'
punctuation = '.,;:?!'


In [164]:
char = '.'


In [165]:
if char in alphabet:
    print('Letter')
elif char in digits:
    print('Number')
elif char in punctuation:
    print('Punctuation')
else:
    print('Other')


Punctuation


### The `for...in` loop

In [166]:
# We name the variable total: sum would shadow the built-in sum() function
total = 0
for i in range(100):
    total += i
print(total)  # Sum of integers from 0 to 99: 4950
# Using the built-in sum() function,
# sum(range(100)) would produce the same result.


4950


Useful functions for `for` loops, `range()` and `enumerate()`:

In [167]:
list10 = list(range(5))  # [0, 1, 2, 3, 4]
list10


[0, 1, 2, 3, 4]

In [168]:
for inx, letter in enumerate(alphabet):
    print(inx, letter)


0 a
1 b
2 c
3 d
4 e
5 f
6 g
7 h
8 i
9 j
10 k
11 l
12 m
13 n
14 o
15 p
16 q
17 r
18 s
19 t
20 u
21 v
22 w
23 x
24 y
25 z


Assigning a new value to the iteration variable does not change the list we iterate over:

In [169]:
for i in list10:
    if i == 0:
        i = 10
list10    # [0, 1, 2, 3, 4]


[0, 1, 2, 3, 4]
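
To modify the elements of a list in a loop, we can assign them through their indices, for instance obtained with `enumerate()`. A minimal sketch on a list of our own, `numbers`:

```python
numbers = [0, 1, 2, 3, 4]
for inx, value in enumerate(numbers):
    if value == 0:
        numbers[inx] = 10  # assigns through the index: the list changes
print(numbers)  # [10, 1, 2, 3, 4]
```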

### The `while` loop

A `while` loop

In [170]:
total, i = 0, 0
while i < 100:
    total += i
    i += 1
total


4950

The same sum with an infinite loop that we exit with `break`:

In [171]:
total, i = 0, 0
while True:
    total += i
    i += 1
    if i >= 100:
        break
total


4950

### Exceptions

A bare `except` clause catches all the exceptions:

In [172]:
try:
    int(alphabet)
    int('12.0')
except:
    pass
print('Cleared the exception!')


Cleared the exception!


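We can also catch specific exceptions. Here, `int(alphabet)` raises a `ValueError`, so `int('12.0')` is never executed: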

In [173]:
try:
    int(alphabet)
    int('12.0')
except ValueError:
    print('Caught a value error!')
except TypeError:
    print('Caught a type error!')


Caught a value error!
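
In practice, we often wrap such a conversion in a function that catches the specific exceptions it may raise. A minimal sketch; `to_int` is a hypothetical helper, not part of the original code. An `except` clause with a tuple catches several exception types:

```python
def to_int(text, default=None):
    """Returns int(text), or default if the conversion fails."""
    try:
        return int(text)
    except (ValueError, TypeError) as err:  # a tuple catches both types
        print('Caught:', err)
        return default


print(to_int('12'))     # 12
print(to_int('12.0'))   # prints the ValueError message, then None
print(to_int(None, 0))  # prints the TypeError message, then 0
```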


## Functions

We define a function with the `def` keyword:

In [174]:
# lc stands for lowercase: when True, the text is converted to lowercase first
def count_letters(text, lc=True):
    letter_count = {}
    if lc:
        text = text.lower()
    for letter in text:
        if letter.lower() in alphabet:
            if letter in letter_count:
                letter_count[letter] += 1
            else:
                letter_count[letter] = 1
    return letter_count


We call the function with its default arguments:

In [175]:
odyssey = """Tell me, O Muse, of that many-sided hero who
traveled far and wide after he had sacked the famous town
of Troy."""
print('Start')
od = count_letters(odyssey)
for letter in sorted(od.keys()):
    print(letter, od[letter])


Start
a 9
c 1
d 7
e 12
f 5
h 6
i 2
k 1
l 3
m 4
n 3
o 8
r 5
s 4
t 8
u 2
v 1
w 3
y 2


Or with the `lc` parameter set to `False`:

In [176]:
od = count_letters(odyssey, False)
for letter in sorted(od.keys()):
    print(letter, od[letter])


M 1
O 1
T 2
a 9
c 1
d 7
e 12
f 5
h 6
i 2
k 1
l 3
m 3
n 3
o 7
r 5
s 4
t 6
u 2
v 1
w 3
y 2


In [177]:
print(type(count_letters))


<class 'function'>


## Comprehensions and Generators

Comprehensions and generators are compact alternatives to loops.

### Comprehensions

Generating edits of a string with comprehensions: first the splits of a word into two parts, then the deletions of one character:

In [178]:
word = 'acress'
splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
splits


[('', 'acress'),
 ('a', 'cress'),
 ('ac', 'ress'),
 ('acr', 'ess'),
 ('acre', 'ss'),
 ('acres', 's'),
 ('acress', '')]

In [179]:
deletes = [a + b[1:] for a, b in splits if b]
deletes


['cress', 'aress', 'acess', 'acrss', 'acres', 'acres']

The same with loops; comprehensions are more compact:

In [180]:
splits = []
for i in range(len(word) + 1):
    splits.append((word[:i], word[i:]))
splits


[('', 'acress'),
 ('a', 'cress'),
 ('ac', 'ress'),
 ('acr', 'ess'),
 ('acre', 'ss'),
 ('acres', 's'),
 ('acress', '')]

In [181]:
deletes = []
for a, b in splits:
    if b:
        deletes.append(a + b[1:])
deletes


['cress', 'aress', 'acess', 'acrss', 'acres', 'acres']
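
The same splits yield other edits. For instance, this comprehension, our own extension in the same style, swaps two adjacent letters (transpositions):

```python
word = 'acress'
splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
# Swap the first two characters of the right part when it has at least two
transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
print(transposes)  # ['caress', 'arcess', 'acerss', 'acrses', 'acress']
```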

### Generators

Generators are similar to comprehensions, but they create the elements on demand. We write them with parentheses instead of square brackets:

In [182]:
splits_generator = ((word[:i], word[i:])
                    for i in range(len(word) + 1))

for i in splits_generator:
    print(i)


('', 'acress')
('a', 'cress')
('ac', 'ress')
('acr', 'ess')
('acre', 'ss')
('acres', 's')
('acress', '')


We can traverse a generator only once

In [183]:
for i in splits_generator:
    print(i)  # Nothing


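### Iterators and `zip()`

`zip()` builds tuples from the elements at the same position in several sequences and returns them as an iterator: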

In [184]:
latin_alphabet = 'abcdefghijklmnopqrstuvwxyz'
len(latin_alphabet)  # 26


26

In [185]:
greek_alphabet = 'αβγδεζηθικλμνξοπρστυφχψω'
len(greek_alphabet)  # 24


24

In [186]:
cyrillic_alphabet = 'абвгдеёжзийклмнопрстуфхцчшщъыьэюя'
len(cyrillic_alphabet)  # 33


33

In [187]:
la_gr = zip(latin_alphabet[:3], greek_alphabet[:3])
la_gr


<zip at 0x2b0dd3c4300>

In [188]:
list(la_gr)


[('a', 'α'), ('b', 'β'), ('c', 'γ')]

In [189]:
list(la_gr)  # You can traverse it only once


[]

In [190]:
la_gr_cy = zip(latin_alphabet[:3], greek_alphabet[:3],
               cyrillic_alphabet[:3])
la_gr_cy


<zip at 0x2b0dd3c5840>

Iterators have a `__next__()` method that returns their next element. As we have already traversed `la_gr`, we first recreate it:

In [191]:
la_gr = zip(latin_alphabet[:3], greek_alphabet[:3])  # We recreate the iterator


In [192]:
la_gr.__next__()  # ('a', 'α')


('a', 'α')

In [193]:
la_gr.__next__()  # ('b', 'β')


('b', 'β')

In [194]:
la_gr.__next__()  # ('c', 'γ')


('c', 'γ')

Until we reach the end, where `__next__()` raises a `StopIteration` exception:

In [195]:
la_gr.__next__()


StopIteration: 
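
In practice, we rather call the built-in `next()` function, which calls `__next__()`. `next()` also accepts a default value that it returns instead of raising `StopIteration` when the iterator is exhausted. A short sketch; `pairs` is our own name:

```python
latin_alphabet = 'abcdefghijklmnopqrstuvwxyz'
greek_alphabet = 'αβγδεζηθικλμνξοπρστυφχψω'

pairs = zip(latin_alphabet[:2], greek_alphabet[:2])
print(next(pairs))        # ('a', 'α')
print(next(pairs))        # ('b', 'β')
print(next(pairs, None))  # None: the iterator is exhausted, no exception
```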

We can traverse an iterator only once. To traverse it two or more times, we convert it to a list

In [196]:
la_gr_cy_list = list(la_gr_cy)


First time

In [197]:
la_gr_cy_list  # [('a', 'α', 'а'), ('b', 'β', 'б'), ('c', 'γ', 'в')]


[('a', 'α', 'а'), ('b', 'β', 'б'), ('c', 'γ', 'в')]

Second time, etc.

In [198]:
la_gr_cy_list  # [('a', 'α', 'а'), ('b', 'β', 'б'), ('c', 'γ', 'в')]


[('a', 'α', 'а'), ('b', 'β', 'б'), ('c', 'γ', 'в')]

In contrast, the iterator itself is now exhausted:

In [199]:
list(la_gr_cy)  # []


[]

Zipping

In [200]:
la_gr_cy = list(zip(latin_alphabet[:3], greek_alphabet[:3],
                    cyrillic_alphabet[:3]))
la_gr_cy


[('a', 'α', 'а'), ('b', 'β', 'б'), ('c', 'γ', 'в')]

And unzipping

In [201]:
list(zip(*la_gr_cy))  # [('a', 'b', 'c'), ('α', 'β', 'γ'), ('а', 'б', 'в')]


[('a', 'b', 'c'), ('α', 'β', 'γ'), ('а', 'б', 'в')]

Transposing lists with `zip(*)`

In [202]:
la_gr_cy_list


[('a', 'α', 'а'), ('b', 'β', 'б'), ('c', 'γ', 'в')]

In [203]:
list(zip(*la_gr_cy_list))


[('a', 'b', 'c'), ('α', 'β', 'γ'), ('а', 'б', 'в')]
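
`zip()` stops at the shortest of its arguments: as the Latin alphabet has 26 letters and the Greek one 24, zipping the complete alphabets drops `y` and `z`. `itertools.zip_longest()` pads the shorter sequences instead. A sketch, where the fill value `'-'` is our own choice:

```python
import itertools

latin_alphabet = 'abcdefghijklmnopqrstuvwxyz'
greek_alphabet = 'αβγδεζηθικλμνξοπρστυφχψω'

print(len(list(zip(latin_alphabet, greek_alphabet))))  # 24
padded = list(itertools.zip_longest(latin_alphabet, greek_alphabet,
                                    fillvalue='-'))
print(len(padded))  # 26
print(padded[-2:])  # [('y', '-'), ('z', '-')]
```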

## Modules

The `math` module

In [204]:
import math

math.sqrt(2)  # 1.4142135623730951


1.4142135623730951

In [205]:
math.sin(math.pi / 2)  # 1.0


1.0

In [206]:
math.log(8, 2)  # 3.0


3.0

In [207]:
print(type(math))


<class 'module'>


The `statistics` module

In [208]:
import statistics as stats

stats.mean([1, 2, 3, 4, 5])  # 3


3

In [209]:
stats.stdev([1, 2, 3, 4, 5])  # 1.5811388300841898


1.5811388300841898

The `__name__` variable tells whether a file is run as a program, where it is equal to `'__main__'`, or imported as a module:

In [210]:
if __name__ == '__main__':
    print("Running the program")
    # Other statements
else:
    print("Importing the program")
    # Other statements


Running the program


## Basic File Input/Output

Before you run the code below, you will need the text files. To follow the example, download Homer's _Iliad_ and _Odyssey_ from the Internet Classics Archive at the Massachusetts Institute of Technology (MIT), http://classics.mit.edu, and store them on your computer. Then adjust the `CORPUS_PATH` variable to the folder where you stored them.

In [211]:
import os

CORPUS_PATH = os.getcwd() + "/../corpus/" 


In [212]:
try:
    # We open a file and we get a file object
    file_path = CORPUS_PATH + 'iliad.mb.txt'
    print(file_path)
    f_iliad = open(file_path, 'r', encoding='utf-8')
    iliad_txt = f_iliad.read()  # We read the whole file
    f_iliad.close()  # We close the file
except OSError as err:  # for instance, the file does not exist
    print('Could not read the file:', err)


d:\Git-Repos\edan20\labs_2023/../corpus/iliad.mb.txt


In [213]:
iliad_stats = count_letters(iliad_txt)  # We count the letters
iliad_stats


{'p': 9116,
 'r': 36487,
 'o': 51302,
 'v': 6069,
 'i': 38198,
 'd': 28351,
 'e': 77516,
 'b': 8951,
 'y': 11917,
 't': 54223,
 'h': 50214,
 'n': 42231,
 'c': 11586,
 'l': 25332,
 'a': 51052,
 's': 41285,
 'm': 16660,
 'f': 16119,
 'g': 12606,
 'u': 18422,
 'k': 4413,
 'w': 15671,
 'j': 1624,
 'q': 284,
 'z': 284,
 'x': 597}

In [214]:
with open('iliad_stats.txt', 'w') as f:
    f.write(str(iliad_stats))
    # the with statement closes the file automatically


## Collecting a Corpus

We create a dictionary with URLs

In [215]:
classics_url = {'iliad': 'http://classics.mit.edu/Homer/iliad.mb.txt',
                'odyssey': 'http://classics.mit.edu/Homer/odyssey.mb.txt',
                'eclogue': 'http://classics.mit.edu/Virgil/eclogue.mb.txt',
                'georgics': 'http://classics.mit.edu/Virgil/georgics.mb.txt',
                'aeneid': 'http://classics.mit.edu/Virgil/aeneid.mb.txt'}


We read the texts from the URLs

In [216]:
import requests

classics = {}
for key in classics_url:
    classics[key] = requests.get(classics_url[key]).text


We remove the license information to keep only the text

In [217]:
text_bounds = {'iliad': (136, -486), 'odyssey': (138, -486),
               'eclogue': (139, -486), 'georgics': (140, -486), 'aeneid': (138, -486)}


In [218]:
for key in classics:
    classics[key] = classics[key][text_bounds[key][0]:text_bounds[key][1]]


In [219]:
classics['iliad'][:50]


'The Iliad\nBy Homer\n\n\nTranslated by Samuel Butler\n\n'

We additionally write the Iliad and the Odyssey in two text files

In [220]:
with open('iliad.txt', 'w') as f_il, open('odyssey.txt', 'w') as f_od:
    f_il.write(classics['iliad'])
    f_od.write(classics['odyssey'])


We store the corpus in a JSON file

In [221]:
import json

with open('classics.json', 'w') as f:
    json.dump(classics, f)


We read it again

In [222]:
with open('classics.json', 'r') as f:
    classics = json.load(f)


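## Decorators and memo-functions

A decorator is a function that takes a function as argument and returns a new function. Writing `@memo_function` before the definition of `fibonacci` is equivalent to `fibonacci = memo_function(fibonacci)`. Here, the returned function stores the results in a cache so that each value is computed only once. We compare it with an ad hoc memoization in `fibonacci2`: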

In [223]:
__author__ = "Pierre Nugues"


def memo_function(f):
    cache = {}

    def memo(x):
        if x in cache:
            return cache[x]
        else:
            cache[x] = f(x)
            return cache[x]

    return memo


@memo_function
def fibonacci(n):
    """
    Fibonacci with memo function
    :param n:
    :return:
    """
    if n == 1:
        return 1
    elif n == 2:
        return 1
    else:
        return fibonacci(n - 1) + fibonacci(n - 2)


f_numbers = {}


def fibonacci2(n):
    """
    Fibonacci with memoization. Ad hoc implementation
    :param n:
    :return:
    """
    if n == 1:
        return 1
    elif n == 2:
        return 1
    elif n in f_numbers:
        return f_numbers[n]
    else:
        f_numbers[n] = fibonacci2(n - 1) + fibonacci2(n - 2)
        return f_numbers[n]


print(fibonacci(400))
print(fibonacci2(900))


176023680645013966468226945392411250770384383304492191886725992896575345044216019675
54877108839480000051413673948383714443800519309123592724494953427039811201064341234954387521525390615504949092187441218246679104731442473022013980160407007017175697317900483275246652938800
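
The standard library provides such a decorator: `functools.lru_cache` (and, since Python 3.9, `functools.cache`) memoizes a function. A sketch equivalent to `fibonacci` above; the name `fibonacci3` is ours:

```python
import functools


@functools.lru_cache(maxsize=None)  # unbounded cache, like memo_function
def fibonacci3(n):
    if n <= 2:
        return 1
    return fibonacci3(n - 1) + fibonacci3(n - 2)


print(fibonacci3(10))           # 55
print(fibonacci3.cache_info())  # hits, misses, and size of the cache
```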


## Classes and Objects
Defining a class

In [224]:
class Text:
    """Text class to hold and process text"""

    alphabet = 'abcdefghijklmnopqrstuvwxyz'

    def __init__(self, text=None):
        """The constructor called when an object
        is created"""

        if text is None:  # avoids len(None), which raises a TypeError
            text = ''
        self.content = text
        self.length = len(text)
        self.letter_counts = {}

    def count_letters(self, lc=True):
        """Function to count the letters of a text"""

        letter_counts = {}
        if lc:
            text = self.content.lower()
        else:
            text = self.content
        for letter in text:
            if letter.lower() in self.alphabet:
                if letter in letter_counts:
                    letter_counts[letter] += 1
                else:
                    letter_counts[letter] = 1
        self.letter_counts = letter_counts
        return letter_counts


In [225]:
print(type(Text))


<class 'type'>


Creating objects and calling methods

In [226]:
txt = Text("""Tell me, O Muse, of that many-sided hero who
traveled far and wide after he had sacked the famous town
of Troy.""")
print(type(txt))


<class '__main__.Text'>


In [227]:
print(txt.length)

print(txt.count_letters())
print(txt.count_letters(False))


111
{'t': 8, 'e': 12, 'l': 3, 'm': 4, 'o': 8, 'u': 2, 's': 4, 'f': 5, 'h': 6, 'a': 9, 'n': 3, 'y': 2, 'i': 2, 'd': 7, 'r': 5, 'w': 3, 'v': 1, 'c': 1, 'k': 1}
{'T': 2, 'e': 12, 'l': 3, 'm': 3, 'O': 1, 'M': 1, 'u': 2, 's': 4, 'o': 7, 'f': 5, 't': 6, 'h': 6, 'a': 9, 'n': 3, 'y': 2, 'i': 2, 'd': 7, 'r': 5, 'w': 3, 'v': 1, 'c': 1, 'k': 1}


We can assign values to the object attributes and even create new attributes:

In [228]:
txt.my_var = 'a'
txt.content = classics['iliad']
print(txt.count_letters())
print(txt.my_var)


{'t': 54177, 'h': 50194, 'e': 77466, 'i': 38151, 'l': 25311, 'a': 51020, 'd': 28333, 'b': 8941, 'y': 11908, 'o': 51270, 'm': 16648, 'r': 36457, 'n': 42194, 's': 41243, 'u': 18409, 'k': 4413, 'g': 12595, 'f': 16114, 'c': 11558, 'p': 9104, 'v': 6060, 'w': 15665, 'j': 1624, 'q': 283, 'z': 284, 'x': 597}
a


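### Subclassing

The `Word` class inherits the attributes and methods of `Text`; its constructor calls the constructor of `Text` with `super()` and adds a part of speech: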

In [229]:
class Word(Text):
    def __init__(self, word=None):
        super().__init__(word)
        self.part_of_speech = None

    def annotate(self, part_of_speech):
        self.part_of_speech = part_of_speech


In [230]:
type(Word)


type

In [231]:
word = Word('Muse')


In [232]:
type(word)


__main__.Word

In [233]:
word.length


4

In [234]:
word.count_letters(lc=False)


{'M': 1, 'u': 1, 's': 1, 'e': 1}

In [235]:
word.annotate('Noun')
word.part_of_speech


'Noun'

## Functional programming

`map()` applies a function to each element of an iterable and returns an iterator:

In [236]:
text_lengths = map(len, [iliad, odyssey])
list(text_lengths)  # [100, 111]


[100, 111]

In [237]:
def file_length(file):
    with open(file) as f:  # the file is closed when we exit the block
        return len(f.read())


file_length('iliad.txt')


807676

In [238]:
files = ['iliad.txt', 'odyssey.txt']

text_lengths = map(lambda x: len(open(x).read()), files)
list(text_lengths)  # [807676, 610676]


[807676, 610676]

In [239]:
text_lengths = (
    map(lambda x: (open(x).read(), len(open(x).read())),
        files))
text_lengths = list(text_lengths)
[text_lengths[0][1], text_lengths[1][1]]  # [807676, 610676]


[807676, 610676]

In [240]:
text_lengths = (
    map(lambda x: (x, len(x)),
        map(lambda x: open(x).read(), files)))
text_lengths = list(text_lengths)
[text_lengths[0][1], text_lengths[1][1]]  # [807676, 610676]


[807676, 610676]

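`reduce()` applies a function of two arguments cumulatively to the elements of an iterable, from left to right, to reduce them to a single value: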

In [241]:
import functools

char_count = functools.reduce(
    lambda total, pair: total + pair[1],  # adds the length of each (text, length) pair
    map(lambda x: (x, len(x)),
        map(lambda x: open(x).read(), files)),
    0)  # initial value of the accumulator

char_count


1418352

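`filter()` keeps the elements of an iterable for which a function returns true. We first recreate the short `iliad` string:

In [242]: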
iliad = """Sing, O goddess, the anger of Achilles son of
Peleus, that brought countless ills upon the Achaeans."""
iliad


'Sing, O goddess, the anger of Achilles son of\nPeleus, that brought countless ills upon the Achaeans.'

In [243]:
''.join(filter(lambda x: x in 'aeiou', iliad))


'ioeeaeoieooeeuaououeiuoeaea'
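
A generator expression with an `if` clause gives the same result as `filter()`; choosing one or the other is mostly a matter of style. A quick comparison, with variable names of our own:

```python
iliad = """Sing, O goddess, the anger of Achilles son of
Peleus, that brought countless ills upon the Achaeans."""

with_filter = ''.join(filter(lambda x: x in 'aeiou', iliad))
with_generator = ''.join(x for x in iliad if x in 'aeiou')
print(with_filter == with_generator)  # True
print(with_generator)                 # ioeeaeoieooeeuaououeiuoeaea
```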

In [244]:
''.join(filter(lambda x: x in 'aeiou',
               open('iliad.txt').read()))[:100]


'eiaoeaaeaueueioeeaeoieooeeuaououeiuoeaeaaaaeouiieuiooaeaaaeoiiieaeooauueooeeeoueooeuieoeaoieooeuioea'

In [245]:
map(lambda y:
    ''.join(filter(lambda x: x in 'aeiou',
                   open(y).read())),
    files)


<map at 0x2b0dd683e50>

In [246]:
list(map(len,
         map(lambda y:
             ''.join(filter(lambda x: x in 'aeiou',
                            open(y).read())),
             files)))

# print(list(map(lambda x: x if x in 'aeiuo' else '', map(lambda x: open(x).read(), files))))


[230637, 176073]