# Web Intelligence
## Python crash course

#### Prof. Claudio Lucchese

## Books

 - Learning Python. O'Reilly. Mark Lutz.
 
 - advanced notes: https://github.com/satwikkansal/wtfpython
 
## How can I run my Python code?

In this course we use **Jupyter notebooks** as provided by **Anaconda for Python 3**
 - see instructions: https://www.anaconda.com/distribution/
 - be careful: install Python 3, not Python 2.7

Jupyter notebooks allow you to:
 - write slides like these
 - write complex documents interleaving text with programs
 - run code in what is basically an interactive interpreter accessed via a browser
 
 
Additional tools:
 - PyCharm by JetBrains https://www.jetbrains.com/pycharm/

## Your best friends in learning Python

1. The Python website:
    - plenty of links to books and tutorials!
        - e.g., https://docs.python.org/3/tutorial/
0. The official Python documentation:
    - https://docs.python.org/3/library/index.html
0. Google & StackOverflow:
    - try googling for `TypeError: can't multiply sequence by non-int of type 'float'`
0. Python Tutor
    - visualizes the execution of python code
    - http://pythontutor.com/

## Who uses Python?

 - The popular *YouTube* video sharing service is largely written in Python
 - The *Dropbox* storage service codes both its server and desktop client software primarily in Python
 - The widespread *BitTorrent* peer-to-peer file sharing system began its life as a Python program
 - *Netflix* and *Yelp* have both documented the role of Python in their software infrastructures
 - *JPMorgan Chase, UBS, Getco, and Citadel* apply Python to financial market forecasting
 - *NASA, Los Alamos, Fermilab, JPL*, and others use Python for scientific programming tasks
 
 - In "The Anatomy of a Large-Scale Hypertextual Web Search Engine" (1998), the Google founders describe the Google architecture
    - the crawlers were written in Python!

# Python types

Python provides the following types:

| Object type | Examples |
|:-:|:-:|
| Numbers | `1234`, `3.1415`, `3+4j`, ... |
| Strings | `'spam'`, `"Bob's"`, ... |
| Lists   | `[1, [2, 'three'], 4.5]`, `list(range(10))`, ... |
| Dictionaries | `{'food': 'spam', 'taste': 'yum'}`, `dict(hours=10)`, ... |
| Tuples |  `(1, 'spam', 4, 'U')`, `tuple('spam')`, ...|
| Files |   `open('eggs.txt')`, `open(r'C:\ham.bin', 'wb')`, ... |
| Sets  | `set('abc')`, `{'a', 'b', 'c'}`, ... |
| Other core types | `Booleans`, `None`, ... |

 - The type of a variable is inferred from the expression.
 - You can use the function `type` to ask Python which type is being used
 - The type determines the set of valid operators

In [1]:
a = 2.0
print (type(a))
a = 7.
print (type(a))
a = "Hello!"
print (type(a))

<class 'float'>
<class 'float'>
<class 'str'>


# Numbers

Check integer vs. floating-point division. The type of the result is determined by the operator: `/` always returns a float, while `//` performs floor division.

In [1]:
print ("What is the output of 11/2:", 11/2)
print ("What is the output of 11%2:", 11%2)
print ("What is the output of  2**10:", 2**10)

What is the output of 11/2: 5.5
What is the output of 11%2: 1
What is the output of  2**10: 1024


In [2]:
print ("What is the output of 11//2:", 11//2)

What is the output of 11//2: 5


# Strings

Check the `*` operation.

In [3]:
print ("What is the output of 'a'+'b':",  'a'+'b'   )
print ("What is the output of 'a'=='b':", 'a'=='b'  )
print ("What is the output of 'a'<='b':", 'a'<='b'  )
print ("What is the output of 'a'<='A':", 'a'<='A'  )

What is the output of 'a'+'b': ab
What is the output of 'a'=='b': False
What is the output of 'a'<='b': True
What is the output of 'a'<='A': False


In [4]:
print ("What is the output of 'a'*5:",    'a'*5 )
print ("What is the output of 'a'/5:", 'a'/5 )

What is the output of 'a'*5: aaaaa


TypeError: unsupported operand type(s) for /: 'str' and 'int'

In [5]:
print ("What is the output of int('10')/5:", int('10')/5)

What is the output of int('10')/5: 2.0


# Conditional Statements

Indentation is used to identify the body of `if`-`else` and of other constructs such as `for`, `while`, and function definitions.

Check if a variable x is within the interval $[0,10]$.

In [6]:
x = 33
if x<=10 and x>=0 :
    print ("x is in the interval [0,10]")
    pass # pass does nothing
    pass
    print ("I'm here !")
else :
    print ("x is not in the interval [0,10]")
    pass
    pass


x is not in the interval [0,10]


In [7]:
x = 33
# This is a special compact form
if 0<=x<=10 :
    print ("x is in the interval [0,10]")
else :
    print ("x is not in the interval [0,10]")

x is not in the interval [0,10]


In [8]:
x = 33
if 0<=x<=10 : print ("x is in the interval [0,10]")
else : print ("x is not in the interval [0,10]")

x is not in the interval [0,10]


# While loops

Nothing new: `while`, `break`, `continue`.


In [None]:
i = 0
while i<10:
    pass
    if i==8: break
    pass
    print ("This is Iteration N.", i)
    pass
    i += 1
    pass
    pass
    if i==5: continue
    pass
    pass


This is Iteration N. 0
This is Iteration N. 1
This is Iteration N. 2
This is Iteration N. 3
This is Iteration N. 4
This is Iteration N. 5
This is Iteration N. 6
This is Iteration N. 7


# For Loops

A `range` is a special tool to create sequences of numbers, given start, end, and step parameters.

In [9]:
for i in range(5):
    print ("This is Iteration N.", i)

This is Iteration N. 0
This is Iteration N. 1
This is Iteration N. 2
This is Iteration N. 3
This is Iteration N. 4


In [10]:
for i in range(0,10,2):
    print ("This is Iteration N.", i)

This is Iteration N. 0
This is Iteration N. 2
This is Iteration N. 4
This is Iteration N. 6
This is Iteration N. 8


In [11]:
for i in range(10,0,-2):
    print ("This is Iteration N.", i)

This is Iteration N. 10
This is Iteration N. 8
This is Iteration N. 6
This is Iteration N. 4
This is Iteration N. 2


In [None]:
print ( range(5) )

This is called an **iterable**! The numbers are generated on demand rather than stored in a list, so printing it shows `range(0, 5)` instead of the elements.
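A minimal sketch of what a `range` object offers besides iteration. Converting it with `list` is one simple way to look at its elements.

```python
r = range(0, 10, 2)

# printing a range shows its parameters, not its elements
print(r)

# convert to a list to materialize the elements
as_list = list(r)
print(as_list)

# a range also supports len, indexing and membership tests
print(len(r), r[1], 4 in r)
```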

# Lists

Lists are used very frequently, and they can be dynamically modified.

In [12]:
for i in [0,1,2,3,4]:
    print ("This is Iteration N.", i)

This is Iteration N. 0
This is Iteration N. 1
This is Iteration N. 2
This is Iteration N. 3
This is Iteration N. 4


In [13]:
my_list = [1,2,3] + [4,5]
print (my_list)

[1, 2, 3, 4, 5]


In [14]:
my_list = [1,2,3]
my_list += [4,5]
print (my_list)

[1, 2, 3, 4, 5]


In [15]:
my_list = [1,2,3] + ["donald duck", 42.0]
print (my_list)

[1, 2, 3, 'donald duck', 42.0]


In [16]:
my_list = [1,2,3] + ["donald duck", ["this", "is", 1, "nested", "list"] ]
print (my_list)

[1, 2, 3, 'donald duck', ['this', 'is', 1, 'nested', 'list']]


In [17]:
print ( len([1,2,3,4,5]) )

5


In [18]:
my_list = [1,2,3,4,5]
print ( my_list[0])
print ( my_list[4])
print ( my_list[5])

1
5


IndexError: list index out of range

In [19]:
my_list = [1,2,3,4,5]
print ( my_list[-1])
print ( my_list[-2])
print ( my_list[-100])

5
4


IndexError: list index out of range

In [20]:
my_list = [1,2,3,4,5,4,3,2,1]

print ( 3 in my_list)

print ( my_list.count(3))

print ( my_list.index(1))
# print ( my_list.index(33)) # this raises an error

True
2
0


# Slicing

Slicing gives access to a sublist: `my_list[start:end:step]` selects the elements from `start` (included) to `end` (excluded).

In [21]:
my_list = ['red', 'orange', 'yellow', 'green', 'blue', 'indigo',
           'violet']

print ( my_list[1:3] ) 

['orange', 'yellow']


In [22]:
print ( my_list[3:-1] ) 

['green', 'blue', 'indigo']


In [23]:
print ( my_list[3:] )

['green', 'blue', 'indigo', 'violet']


In [24]:
print ( my_list[0:7:2] )

['red', 'yellow', 'blue', 'violet']


In [25]:
print ( my_list[0::2] )

['red', 'yellow', 'blue', 'violet']


In [26]:
print ( my_list[::2] )

['red', 'yellow', 'blue', 'violet']


In [27]:
print ( my_list[::-1] )

['violet', 'indigo', 'blue', 'green', 'yellow', 'orange', 'red']


# Lists are mutable

Elements of a list can be replaced. Sublists can be replaced with other sublists.

In [None]:
# original list
my_list = ['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet']
print (my_list)

# modify one element
my_list[-2] = 'ultramarine'

# the new list
print (my_list)

In [28]:
my_list = ['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet']
my_list[4] = ['light blue', 'blue', 'dark blue']
print (my_list)

['red', 'orange', 'yellow', 'green', ['light blue', 'blue', 'dark blue'], 'indigo', 'violet']


In [29]:
# here we replace one slice with another slice
my_list = ['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet']
my_list[4:5] = ['light blue', 'blue', 'dark blue']
print (my_list)

['red', 'orange', 'yellow', 'green', 'light blue', 'blue', 'dark blue', 'indigo', 'violet']


In [30]:
# A special case of replacement when start and end index are the same
my_list = ['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet']
my_list[5:5] = ['dark blue', 'darker blue'] 
print (my_list)

['red', 'orange', 'yellow', 'green', 'blue', 'dark blue', 'darker blue', 'indigo', 'violet']


In [31]:
my_list = ['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet']
my_list[2] = []
print (my_list)

['red', 'orange', [], 'green', 'blue', 'indigo', 'violet']


In [32]:
my_list = ['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet']
my_list[2:3] = []
print (my_list)

['red', 'orange', 'green', 'blue', 'indigo', 'violet']


In [None]:
my_list = ['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet']

print ("Is orange in the rainbow?", 'orange' in my_list )

print ("Is brown in the rainbow?", 'brown' in my_list )

print ("Is it true that cobalt is not in the rainbow?", 'cobalt' not in my_list )

# Tuple

Like lists, but **immutable**.

In [33]:
my_tuple = (1,2,3,4, "five")

print (my_tuple)
print (my_tuple[2])

(1, 2, 3, 4, 'five')
3


In [34]:
my_tuple = (1,2,3) + (4, "five")

print (my_tuple)
print (my_tuple[2])

(1, 2, 3, 4, 'five')
3


In [35]:
my_tuple[2] = 3

TypeError: 'tuple' object does not support item assignment

# Unpacking

Multiple assignment, typically used with functions that return multiple values.

In [36]:
my_tuple = (1,2,3)
a,b,c = my_tuple
print (a,b,c)

1 2 3


In [37]:
my_list = [1,2,3]
a,b,c = my_list
print (a,b,c)

1 2 3
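As a small illustration of the "function returning multiple values" case, here is a sketch with a hypothetical helper `min_max` (the name is not part of the course material). It also shows the common swap idiom, which relies on the same unpacking mechanism.

```python
def min_max(values):
    # return two results at once, packed in a tuple
    return min(values), max(values)

# the returned tuple is unpacked into two variables
lowest, highest = min_max([3, 1, 4, 1, 5])
print(lowest, highest)

# unpacking also swaps two variables without a temporary one
a, b = 1, 2
a, b = b, a
print(a, b)
```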


# Sorting

In-place vs. returning a new list.

In [None]:
my_list = [2,3,1]

my_list.sort()

print (my_list)

In [None]:
my_list = [2,3,1]

new_list = sorted( my_list )

print (my_list)
print (new_list)
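The difference between the two approaches can be made explicit with a short sketch: `sort()` changes the list and returns `None`, while `sorted()` builds a new list. The `reverse=True` argument is shown only as an example of an optional parameter.

```python
my_list = [2, 3, 1]

# sort() modifies the list in place and returns None
result = my_list.sort()
print(my_list, result)

# sorted() leaves the original untouched and returns a new list
original = [2, 3, 1]
descending = sorted(original, reverse=True)
print(original, descending)
```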

# Careful !

**Check** in python tutor: http://pythontutor.com/  !

In [38]:
a = 11
b = a
a = 22
print (a,b)

22 11


In [39]:
a = [11]
b = a
a[0] = 22
print (a,b)

[22] [22]


In [40]:
my_list = [1,2,3]
new_list = my_list
new_list[1] = 77

print ( new_list + my_list)

[1, 77, 3, 1, 77, 3]


In [41]:
my_list = [1,2,3] *2
print (my_list)

[1, 2, 3, 1, 2, 3]


In [42]:
my_list = [ [1,2,3] ]*2
print (my_list)

[[1, 2, 3], [1, 2, 3]]


In [43]:
my_list[0]+= [4]
print ( my_list )

[[1, 2, 3, 4], [1, 2, 3, 4]]


In [44]:
my_tuple = (1,2,3)
new_tuple = my_tuple
my_tuple += tuple([77])

print ( new_tuple + my_tuple)

(1, 2, 3, 1, 2, 3, 77)


In [45]:
# if you want to actually copy a list
a = [11]
b = a.copy()
a[0] = 22
print (a,b)

[22] [11]


In [None]:
a = [11]
b = list(a)
a[0] = 22
print (a,b)

In [None]:
a = [11]
b = a[:]
a[0] = 22
print (a,b)
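The three copies above are *shallow*: they duplicate the outer list only. The sketch below, which assumes a nested list as in the `[ [1,2,3] ]*2` example, compares a shallow copy with `copy.deepcopy` from the standard library and shows one way to build independent rows.

```python
import copy

nested = [[1, 2, 3], [4, 5]]

shallow = nested.copy()        # new outer list, same inner lists
deep = copy.deepcopy(nested)   # inner lists are copied too

# modify an inner list of the original
nested[0][0] = 99
print(shallow)   # the change is visible through the shallow copy
print(deep)      # the deep copy is unaffected

# independent rows: a comprehension instead of [[0] * 3] * 2
grid = [[0] * 3 for _ in range(2)]
grid[0][0] = 1
print(grid)
```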

# Iterating through lists

Or through multiple lists.

In [None]:
my_list = [2,3,1]
for x in my_list:
    print (x)

In [47]:
my_list = [2,3,1]
for i,x in enumerate(my_list):
    print (i,x)

0 2
1 3
2 1


In [46]:
my_list = [2,3,1]
for z in enumerate(my_list):
    print (z, type(z))

(0, 2) <class 'tuple'>
(1, 3) <class 'tuple'>
(2, 1) <class 'tuple'>


In [48]:
A = [2,3,1]
B = ["two", "three", "one"]
for a,b in zip(A,B):
    print (a,b)

2 two
3 three
1 one


# More about strings

Strings are like lists of characters, but they are immutable.

In [49]:
msg = "I like programming with python!"

In [50]:
print (msg[2])

l


In [51]:
print (msg[2:6])

like


In [52]:
msg[3] = "x"

TypeError: 'str' object does not support item assignment
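Since a string cannot be changed in place, the usual workaround is to build a new string. A minimal sketch of three common ways to do so:

```python
msg = "I like programming with python!"

# slicing and concatenation build a new string
changed = msg[:3] + "x" + msg[4:]
print(changed)

# replace() also returns a new string and leaves msg untouched
replaced = msg.replace("python", "Python")
print(replaced)
print(msg)

# join() glues a list of strings back together
words = msg.split()
joined = "-".join(words)
print(joined)
```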

In [53]:
for c in msg:
    print (c)

I
 
l
i
k
e
 
p
r
o
g
r
a
m
m
i
n
g
 
w
i
t
h
 
p
y
t
h
o
n
!


In [54]:
print (msg.split())

['I', 'like', 'programming', 'with', 'python!']


In [55]:
print (msg.split("i"))

['I l', 'ke programm', 'ng w', 'th python!']


In [58]:
# Remove leading and trailing whitespace

my_string = "     A Bit of Python \n"

print ( "---", my_string, "---", sep="" )
print ( "---", my_string.strip(), "---", sep="" )

---     A Bit of Python 
---
---A Bit of Python---


In [59]:
# Remove leading and trailing characters of choice

my_string = "###!#!#!##!#A Bit of Python?!!???##"

print ( "---", my_string.strip("#"), "---", sep="" )
print ( "---", my_string.strip("#?"), "---", sep="" )
print ( "---", my_string.strip("!?#"), "---", sep="" )

---!#!#!##!#A Bit of Python?!!???---
---!#!#!##!#A Bit of Python?!!---
---A Bit of Python---


# Sets

The mathematical notion of set.

In [60]:
my_set = set([1,2,3,4,5,4,3,2,1])

print (my_set)

{1, 2, 3, 4, 5}


In [61]:
A = set([1,2,3])
B = set([4,5])
C = A | B

print (C)

{1, 2, 3, 4, 5}


In [62]:
A = set([1,2,3])
B = set([3,4,5])
C = A & B

print (C)

{3}


In [63]:
A = set([1,2,3])
B = set([3,4,5])
C = A - B

print (C)

{1, 2}


In [64]:
A = set([1,2,3])
B = set([3,4,5])

print (1 in A)
print (7 not in A)

True
True


# Dictionaries

A dictionary is a map from keys to values.

In [65]:
my_dict = {1:"Jan", 2:"Feb", 3:"Mar", 4:"Apr", 5:"May", 6:"Jun",
           7:"Jul", 8:"Aug", 9:"Sep", 10:"Oct", 11:"Nov", 12:"Dec"}

print (my_dict[0])

KeyError: 0
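A missing key raises a `KeyError`. A short sketch of two ways to avoid it, a membership test and the `get` method (the default value `"unknown"` is an arbitrary choice):

```python
my_dict = {1: "Jan", 2: "Feb", 3: "Mar"}

# membership test on the keys
if 0 in my_dict:
    print(my_dict[0])
else:
    print("0 is not a valid month")

# get() returns None, or a default of your choice, instead of raising
missing = my_dict.get(0)
with_default = my_dict.get(0, "unknown")
present = my_dict.get(1, "unknown")
print(missing, with_default, present)
```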

In [66]:
my_dict = {1:"Jan", 2:"Feb", 3:"Mar", 4:"Apr", 5:"May", 6:"Jun",
           7:"Jul", 8:"Aug", 9:"Sep", 10:"Oct", 11:"Nov", 12:"Dec"}

print (my_dict[1])
print (my_dict[12])

Jan
Dec


In [67]:
my_dict = {1:"Jan", 2:"Feb", 3:"Mar", 4:"Apr", 5:"May", 6:"Jun",
           7:"Jul", 8:"Aug", 9:"Sep", 10:"Oct", 11:"Nov", 12:"Dec"}

my_dict[1] = 777
del my_dict[12]
print (my_dict)

{1: 777, 2: 'Feb', 3: 'Mar', 4: 'Apr', 5: 'May', 6: 'Jun', 7: 'Jul', 8: 'Aug', 9: 'Sep', 10: 'Oct', 11: 'Nov'}


In [68]:
my_dict[8474] = "claudio"
print (my_dict)

{1: 777, 2: 'Feb', 3: 'Mar', 4: 'Apr', 5: 'May', 6: 'Jun', 7: 'Jul', 8: 'Aug', 9: 'Sep', 10: 'Oct', 11: 'Nov', 8474: 'claudio'}


In [69]:
print (my_dict.keys())

dict_keys([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 8474])


In [70]:
print (my_dict.values())

dict_values([777, 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'claudio'])


In [71]:
my_dict = {1:"Jan", 2:"Feb", 3:"Mar", 4:"Apr", 5:"May", 6:"Jun",
           7:"Jul", 8:"Aug", 9:"Sep", 10:"Oct", 11:"Nov", 12:"Dec"}

for k in my_dict:
    print (k)

1
2
3
4
5
6
7
8
9
10
11
12


In [72]:
my_dict = {1:"Jan", 2:"Feb", 3:"Mar", 4:"Apr", 5:"May", 6:"Jun",
           7:"Jul", 8:"Aug", 9:"Sep", 10:"Oct", 11:"Nov", 12:"Dec"}

for k,v in my_dict.items():
    print (k,v)

1 Jan
2 Feb
3 Mar
4 Apr
5 May
6 Jun
7 Jul
8 Aug
9 Sep
10 Oct
11 Nov
12 Dec


# Comprehensions

Creating lists by iterating through other lists.

In [73]:
my_list = [x**2 for x in range(10)]
print (my_list)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


In [74]:
my_list = [x**2 for x in range(10) if x%2==0]
print (my_list)

[0, 4, 16, 36, 64]


In [75]:
my_dict = {x:x**2 for x in range(10) if x%2==0}
print (my_dict)

{0: 0, 2: 4, 4: 16, 6: 36, 8: 64}
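The same syntax extends to sets, nested loops, and generator expressions. The sketch below uses made-up data only to illustrate the three forms.

```python
words = ["spam", "eggs", "ham", "spam"]

# set comprehension: duplicates collapse
lengths = {len(w) for w in words}
print(lengths)

# nested loops: all (i, j) pairs with i < j
pairs = [(i, j) for i in range(3) for j in range(3) if i < j]
print(pairs)

# generator expression: values are consumed one by one, no intermediate list
total = sum(x**2 for x in range(10))
print(total)
```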


# Functions

Do not write code outside functions! 

Careful when passing lists as parameters.

You can return lists, tuples, sets, dictionaries.

Get used to moving *stable* functions into a separate `module.py`.

In [None]:
def square(x):
    return x**2

print ( square(3) )

In [76]:
def powers(x,n):
    return [ x**i for i in range(n) ]

print ( powers(2,5) )

[1, 2, 4, 8, 16]


In [77]:
copy_f = powers

print ( copy_f(2,5) )

[1, 2, 4, 8, 16]


In [78]:
powers_3 = lambda x:powers(x,3)

print (powers_3(3))

[1, 3, 9]


In [79]:
a = [1,-2,3,-4,5,-6]

print (sorted(a))

print (sorted(a, key=lambda x:abs(x)))

[-6, -4, -2, 1, 3, 5]
[1, -2, 3, -4, 5, -6]


In [80]:
def add1(x):
    x+=1
    return x

y = 10
z = add1(y)
print( y,z )

10 11


In [81]:
def add1(x):
    for i in range(len(x)):
        x[i] = x[i]+1
    return x


y = [1,2,3,4,5]
z = add1(y)
print( y,z )

[2, 3, 4, 5, 6] [2, 3, 4, 5, 6]
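In the last example the caller's list `y` was modified, because the function received a reference to the same list. One possible way to avoid this side effect is to build a new list inside the function; `add1_copy` is a hypothetical name used here for illustration.

```python
def add1_copy(x):
    # build and return a new list; the argument is left unchanged
    return [v + 1 for v in x]

y = [1, 2, 3, 4, 5]
z = add1_copy(y)
print(y, z)
```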


# JSON

JavaScript Object Notation, very popular in Web APIs.

In [17]:
import json

a  = {"key": 10}

# to string
s = json.dumps(a)

print (type(s))
print (s)


<class 'str'>
{"key": 10}


In [None]:
a = json.loads('{"key": 10}')

print (type(a))
print (a)
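A round trip through JSON does not always give back the same Python object. The sketch below, with a made-up dictionary, shows three conversions worth knowing: tuples become lists, non-string keys become strings, and `None` is written as `null`.

```python
import json

record = {"point": (1, 2), 1: "one", "flag": True, "nothing": None}

# to string: note the lowercase true and null
s = json.dumps(record)
print(s)

# back to Python
back = json.loads(s)
print(back)

# the tuple is now a list and the key 1 is now the string "1"
print(back == record)
```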

# Files

Check how to iterate through a file, and how to run shell commands in Jupyter.

In [85]:
out_file = open("test.txt", "w")
out_file.write("line 1\n")
out_file.write("line 2\n")
out_file.close()

In [18]:
!cat test.txt

'cat' is not recognized as an internal or external command,
 operable program or batch file.
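Shell commands such as `cat` and `ls` are not available in the default Windows shell, which is why the cell above fails. A portable alternative is to do the same from Python; this is a sketch using a temporary directory and an arbitrary file name `demo.txt`, so that it does not touch `test.txt`.

```python
import os
import tempfile
from pathlib import Path

# write a small file in a temporary directory (demo.txt is an arbitrary name)
tmp_dir = tempfile.mkdtemp()
path = Path(tmp_dir) / "demo.txt"
path.write_text("line 1\nline 2\n")

# portable replacement for `!cat`: read the whole file and print it
content = path.read_text()
print(content, end="")

# portable replacement for `!ls`: list the directory entries
entries = os.listdir(tmp_dir)
print(entries)
```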


In [83]:
out_file = open("test.txt", "w")
print ("line 11", file=out_file)
print ("line 22", file=out_file)
out_file.close()

In [86]:
in_file = open("test.txt", "r")
line = in_file.readline()
print ( line )
line = in_file.readline()
print ( line )

in_file.close()

line 11

line 22



In [7]:
with open("test.txt", "r") as in_file:
    line = in_file.readline()
    print ( line )

line 11



In [8]:
with open("test.txt", "r") as in_file:
    for line in in_file:
        print ( line )

line 11

line 22



In [9]:
with open("test.txt", "r") as in_file:
    for line in in_file:
        print ( "**" + line + "** ")

**line 11
** 
**line 22
** 


In [10]:
with open("test.txt", "r") as in_file:
    for line in in_file:
        print (line, end="")

line 11
line 22


# JSON

In [None]:
import json

a  = {"key": 10}

# to file
with open("test.txt", "w") as out_file:
    json.dump(a,out_file)

!cat test.txt

In [None]:
with open("test.txt", "r") as in_file:
    b = json.load(in_file)
    print (type(b))
    print (b)

# Let's collect some stats from our data

I used Excel to convert the data file from http://tennis-data.co.uk/alldata.php into a CSV file with `;` as the field separator.

In [1]:
!ls datasets/tennis/

'ls' is not recognized as an internal or external command,
 operable program or batch file.


In [2]:
!head datasets/tennis/2019.csv

'head' is not recognized as an internal or external command,
 operable program or batch file.


In [2]:
def load_data(data_file):
    # read text lines
    raw_lines = []
    with open(data_file) as f:
        raw_lines = [line.strip() for line in f]
    
    # extract header
    header = raw_lines[0]
    fields = header.split(";")
    
    # put data into a "transposed" dictionary
    data = { c:[] for c in fields }
    for line in raw_lines[1:]:
        values = line.split(";")
        for c,v in zip(fields, values):
            data[c] += [v]
    
    return data

In [3]:
dataset = "./datasets/tennis/2019.csv"

data = load_data(dataset)
print ( data.keys() )

dict_keys(['ATP', 'Location', 'Tournament', 'Date', 'Series', 'Court', 'Surface', 'Round', 'Best of', 'Winner', 'Loser', 'WRank', 'LRank', 'WPts', 'LPts', 'W1', 'L1', 'W2', 'L2', 'W3', 'L3', 'W4', 'L4', 'W5', 'L5', 'Wsets', 'Lsets', 'Comment', 'B365W', 'B365L', 'PSW', 'PSL', 'MaxW', 'MaxL', 'AvgW', 'AvgL'])


In [None]:
#print ( data["Location"] )

**Todo**: check `csv.reader` and `csv.DictReader`
 https://docs.python.org/3/library/csv.html.
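As a starting point for the todo, here is a minimal sketch of `csv.DictReader` with `;` as delimiter. It reads two made-up rows from an in-memory string instead of the real file, and the player names are placeholders.

```python
import csv
import io

# two made-up rows in the same ';'-separated layout as the dataset
text = "Winner;Loser;WRank;LRank\nPlayer A;Player B;3;10\nPlayer C;Player A;25;3\n"

# each row becomes a dictionary: column name -> value (always a string)
reader = csv.DictReader(io.StringIO(text), delimiter=";")
rows = list(reader)
print(rows[0]["Winner"], rows[0]["WRank"])

# same "transposed" layout as load_data: column name -> list of values
columns = {name: [row[name] for row in rows] for name in reader.fieldnames}
print(columns["Winner"])
```

With a real file, the `io.StringIO(text)` argument would be replaced by a file object opened with `open(data_file, newline="")`, as recommended by the `csv` documentation.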

## Now try answering the following questions

 - What is the number of matches?
 - List the tournament names
 - List the player names
 - List the comments
 - Find player with most wins
 
 
*Do not look at the answers below!*

In [18]:
print (len(data['ATP']))

2234


In [17]:
print (set(data['Tournament']))

{'ASB Classic', 'Hungarian Open', "Queen's Club Championships", 'Brisbane International', 'Antalya Open', 'Suisse Open Gstaad', 'Open 13', 'Geneva Open', 'Rosmalen Grass Court Championships', 'Hall of Fame Championships', 'Sofia Open', 'Open Sud de France', 'ABN AMRO World Tennis Tournament', 'Maharashtra Open', 'Halle Open', 'Abierto Mexicano', 'BNP Paribas Open', 'Sony Ericsson Open', 'SkiStar Swedish Open', 'Open Banco Sabadell ', 'Open de Moselle', 'French Open', 'Croatia Open', 'German Tennis Championships', 'St. Petersburg Open', 'Argentina Open', 'Delray Beach Open', 'Wimbledon', 'Winston-Salem Open at Wake Forest University', 'Citi Open', 'Monte Carlo Masters', "U.S. Men's Clay Court Championships", "Internazionali BNL d'Italia", 'Western & Southern Financial Group Masters', 'Grand Prix Hassan II', 'Sydney International', 'Cordoba Open', 'Australian Open', 'BMW Open', 'Rogers Masters', 'Rio Open', 'US Open', 'Lyon Open', 'Generali Open', 'New York Open', 'Dubai Tennis Champions

In [15]:
print (set(data['Winner']) | set(data['Loser']))

{'Murray A.', 'Altmaier D.', 'Fognini F.', 'Simon G.', 'Bachinger M.', 'Arguello F.', 'Lazarov A.', 'Kadhe A.', 'Delbonis F.', 'Li Z.', 'Garcia-Lopez G.', 'Lamasine T.', 'Djere L.', 'Eubanks C.', 'Berlocq C.', 'Schwartzman D.', 'Federer R.', 'Ramanathan R.', 'Brown D.', 'Ymer E.', 'Gromley C.', 'Kokkinakis T.', 'Verdasco F.', 'Muller A.', 'Gomez L.', 'Mcdonald M.', 'Paire B.', 'Ward J.', 'Bolelli S.', 'Hanfmann Y.', 'Bolt A.', 'Basilashvili N.', 'Jung J.', 'Lestienne C.', 'Cachin P.', 'Giannessi A.', 'Paul T.', 'Krstin P.', 'Otte O.', 'Jarry N.', 'Martin A.', 'Lin J.M.', 'Sugita Y.', 'Ramos-Vinolas A.', 'Galan D.E.', 'Tsitsipas S.', 'Albot R.', 'Koepfer D.', 'Lopez Villasenor G.', 'Thompson J.', 'Torebko P.', 'Weintraub A.', 'Del Potro J.M.', 'Skugor F.', 'Travaglia S.', 'Rublev A.', 'Uchiyama Y.', 'Evans D.', 'Gasquet R.', 'Monteiro T.', 'Coric B.', 'Escobedo E.', 'Mayer L.', 'Zverev M.', 'Kwiatkowski T.S.', 'Dimitrov G.', 'Rubin N.', 'Tiafoe F.', 'Gerasimov E.', 'Munar J.', 'Kubler J

In [16]:
print (set(data['Comment']))

{'Retired', 'Sched', 'Awarded', 'Completed', 'Walkover'}


In [8]:
maximum = 0
person = ''
for x in set(data['Winner']):
    count = data['Winner'].count(x)
    if count >= maximum:
        maximum = count
        person = x
print (person + ': ' + str(maximum))

Medvedev D.: 53


## What is the number of matches?

In [None]:
print ( len(data["ATP"]) )

## List the tournament names

In [None]:
tournament_names = set(data["Tournament"])
print ( len(tournament_names) )
print (tournament_names)

## List the player names

In [None]:
player_names = set(data["Winner"]) | set(data["Loser"])
print ( len(player_names) )
print (player_names)

## List the comments

In [None]:
comments = set(data["Comment"])
print ( len(comments) )
print (comments)

## Find player with most wins

In [4]:
def most_wins(data):
    # mapping player : number of wins
    wins = {}
    for p in data["Winner"]:
        if p not in wins:
            wins[p] = 0
        wins[p] += 1
    
    # create a list (#wins, player)
    wins = [ (w,p) for p,w in wins.items() ]
    best_player = max(wins)
    
    return best_player
    
best_wins, best_player = most_wins(data)

print ( "The best player is", best_player, "with", best_wins, "victories." )

The best player is Medvedev D. with 53 victories.
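The counting loop above can also be written with `collections.Counter` from the standard library. This is an alternative sketch, not the solution used in the course, and it runs on a made-up list of winners rather than on `data["Winner"]`.

```python
from collections import Counter

# toy data in the same layout as data["Winner"]; names are made up
winners = ["Player A", "Player B", "Player A", "Player C", "Player A", "Player B"]

# Counter builds the mapping player : number of wins in one step
wins = Counter(winners)

# most_common(1) returns a list with the single best (player, wins) pair
best_player, best_wins = wins.most_common(1)[0]
print(best_player, best_wins)
```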


## Start thinking and evaluating your baseline strategy !

For instance:
 - how many times did the better-ranked player win the match?
 - how much would you gain or lose by always betting 1€ on the better-ranked player?

In [1]:
import pandas as pd

dataset_file = './datasets/tennis/2019.xlsx'
df = pd.read_excel(dataset_file) 

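# QUESTION 1

# WRank > LRank means the winner had the worse (numerically higher) ranking,
# so this counts the matches won by the lower-ranked player
n_rows = (df[df['WRank']>df['LRank']]).shape[0]

# percentage of such matches over all matches
print(n_rows*100/(df.shape[0]))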

# QUESTION 2

38.00358102059087
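One possible way to approach the second question is sketched below on three made-up matches. It assumes that `B365W` and `B365L` are decimal odds for the winner and the loser, that a lower rank number means a better ranking, and that matches with missing values have already been filtered out. The function name `baseline_profit` is hypothetical.

```python
# made-up matches; keys mirror the dataset columns
matches = [
    {"WRank": 3, "LRank": 10, "B365W": 1.40, "B365L": 2.90},   # favourite won
    {"WRank": 25, "LRank": 3, "B365W": 3.50, "B365L": 1.30},   # favourite lost
    {"WRank": 1, "LRank": 50, "B365W": 1.10, "B365L": 7.00},   # favourite won
]

def baseline_profit(matches, stake=1.0):
    # always bet `stake` on the better-ranked player of each match
    profit = 0.0
    favourite_wins = 0
    for m in matches:
        if m["WRank"] < m["LRank"]:
            # the better-ranked player won: net gain is stake * (odds - 1)
            favourite_wins += 1
            profit += stake * (m["B365W"] - 1)
        else:
            # the better-ranked player lost: the stake is lost
            profit -= stake
    return favourite_wins, profit

favourite_wins, profit = baseline_profit(matches)
print(favourite_wins, round(profit, 2))
```

The same logic could be applied to the rows of `df`, after deciding how to treat matches where a rank or an odds value is missing.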
