# Exercise solutions: course S1 - S2

In [1]:
import pandas as pd
import sqlite3

In [2]:
# Open the European Soccer database and get a cursor to run queries
c = sqlite3.connect('data/european-soccer.sqlite')
cursor = c.cursor()

In [3]:
def exe(cursor: sqlite3.Cursor, query: str) -> None:
    """Run a SQL query and print every row of the result."""
    cursor.execute(query)
    for row in cursor.fetchall():
        print(row)

## 1. How many matches were played in Belgium?

Country names and matches are stored in two different tables, so we need a join to combine this information. If we don't know how to do a join yet, we can simply look up Belgium's `id` in the `Country` table by hand, then select the matches whose `country_id` equals it.

In [4]:
countries = 'SELECT * FROM Country'
exe(cursor, countries)

(1, 'Belgium')
(1729, 'England')
(4769, 'France')
(7809, 'Germany')
(10257, 'Italy')
(13274, 'Netherlands')
(15722, 'Poland')
(17642, 'Portugal')
(19694, 'Scotland')
(21518, 'Spain')
(24558, 'Switzerland')


Belgium's country id is 1:

In [5]:
matches_belgium = 'SELECT COUNT(*) FROM Match WHERE Country_id = 1'
exe(cursor, matches_belgium)

(1728,)


Second method, if we know how to join tables:

In [6]:
matches_belgium_join = '''
SELECT 
    COUNT(*) 
FROM 
    Match AS m
JOIN 
    Country AS c 
    ON 
        m.Country_id = c.id 
WHERE 
    c.name = 'Belgium';
'''
exe(cursor,matches_belgium_join)

(1728,)
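A third option avoids both the hard-coded id and the join: a scalar subquery (the technique from the S2 course) lets SQLite look up Belgium's id itself. The sketch below runs on a tiny in-memory stand-in for the two tables, with made-up rows, so the count is illustrative only; on the real database the same query string can be passed to `exe(cursor, ...)`.

```python
import sqlite3

# Tiny in-memory stand-in for the real database (made-up rows)
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE Country (id INTEGER PRIMARY KEY, name TEXT)')
cur.execute('CREATE TABLE Match (id INTEGER PRIMARY KEY, country_id INTEGER)')
cur.executemany('INSERT INTO Country VALUES (?, ?)', [(1, 'Belgium'), (4769, 'France')])
cur.executemany('INSERT INTO Match (country_id) VALUES (?)', [(1,), (1,), (4769,)])

# The scalar subquery returns Belgium's id, used like a constant in WHERE
query = '''
SELECT COUNT(*) FROM Match
WHERE country_id = (SELECT id FROM Country WHERE name = 'Belgium')
'''
belgium_count = cur.execute(query).fetchone()[0]
print(belgium_count)  # 2 with this toy data
```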


## 2. How many matches were played in Belgium or France?

We can use either of the methods from question 1, with an `OR` condition:

In [7]:
matches_belgium_or_france = '''
SELECT 
    count(*) 
FROM 
    Match AS m
JOIN 
    Country AS c 
    ON 
        m.Country_id = c.id 
WHERE 
    c.name = 'Belgium' 
    OR 
        c.name = 'France';
'''
exe(cursor, matches_belgium_or_france)

(4768,)
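An equivalent formulation, sketched here on made-up rows: `IN` replaces the chain of `OR` conditions, and the country names are passed as `?` parameters through `cursor.execute(query, params)` instead of being written into the query string.

```python
import sqlite3

# In-memory toy tables (made-up rows, not the real data)
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE Country (id INTEGER PRIMARY KEY, name TEXT)')
cur.execute('CREATE TABLE Match (id INTEGER PRIMARY KEY, country_id INTEGER)')
cur.executemany('INSERT INTO Country VALUES (?, ?)',
                [(1, 'Belgium'), (4769, 'France'), (1729, 'England')])
cur.executemany('INSERT INTO Match (country_id) VALUES (?)',
                [(1,), (4769,), (4769,), (1729,)])

# IN (?, ?) is shorthand for c.name = ? OR c.name = ?
query = '''
SELECT COUNT(*)
FROM Match AS m
JOIN Country AS c ON m.country_id = c.id
WHERE c.name IN (?, ?)
'''
count = cur.execute(query, ('Belgium', 'France')).fetchone()[0]
print(count)  # 3 with this toy data
```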


## 3. What is the average weight of the 20 tallest players, and of the 20 shortest?

First, let's have a look at the `Player` table:

In [8]:
players = 'SELECT * FROM Player LIMIT 10'
exe(cursor, players)

(1, 505942, 'Aaron Appindangoye', 218353, '1992-02-29 00:00:00', 182.88, 187)
(2, 155782, 'Aaron Cresswell', 189615, '1989-12-15 00:00:00', 170.18, 146)
(3, 162549, 'Aaron Doran', 186170, '1991-05-13 00:00:00', 170.18, 163)
(4, 30572, 'Aaron Galindo', 140161, '1982-05-08 00:00:00', 182.88, 198)
(5, 23780, 'Aaron Hughes', 17725, '1979-11-08 00:00:00', 182.88, 154)
(6, 27316, 'Aaron Hunt', 158138, '1986-09-04 00:00:00', 182.88, 161)
(7, 564793, 'Aaron Kuhl', 221280, '1996-01-30 00:00:00', 172.72, 146)
(8, 30895, 'Aaron Lennon', 152747, '1987-04-16 00:00:00', 165.1, 139)
(9, 528212, 'Aaron Lennox', 206592, '1993-02-19 00:00:00', 190.5, 181)
(10, 101042, 'Aaron Meijers', 188621, '1987-10-28 00:00:00', 175.26, 170)


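The average weight of the 20 tallest players can be computed in one query: the subquery selects the 20 tallest players, and the outer query averages their weights (in this dataset, weights are in pounds and heights in centimetres).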

In [11]:
query = '''
SELECT AVG(weight) FROM (
SELECT weight, height FROM player 
ORDER BY height desc 
limit 20)
'''
exe(cursor, query)

(200.9,)


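Why a subquery? In a flat query, the aggregate is computed over all the rows *before* `ORDER BY` and `LIMIT` are applied: `SELECT AVG(weight) FROM Player ORDER BY height DESC LIMIT 20` returns a single row holding the average weight of *all* players. The 20 tallest players must therefore be selected first, then averaged.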
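To see the difference concretely, here is a sketch on five made-up players, keeping the 2 tallest instead of 20. The flat query averages every row, while the subquery version averages only the rows kept by `LIMIT`.

```python
import sqlite3

# Five made-up players in an in-memory table
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE Player (player_name TEXT, height REAL, weight INTEGER)')
cur.executemany('INSERT INTO Player VALUES (?, ?, ?)', [
    ('A', 200.0, 220), ('B', 195.0, 200), ('C', 180.0, 170),
    ('D', 170.0, 150), ('E', 165.0, 140),
])

# Aggregate first, LIMIT afterwards: the average covers all 5 players
flat = cur.execute(
    'SELECT AVG(weight) FROM Player ORDER BY height DESC LIMIT 2').fetchone()[0]

# LIMIT inside the subquery, aggregate outside: only the 2 tallest count
nested = cur.execute('''
SELECT AVG(weight) FROM (
    SELECT weight FROM Player ORDER BY height DESC LIMIT 2
)''').fetchone()[0]

print(flat, nested)  # 176.0 210.0
```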

If subqueries are not known yet, a first way to work around this limitation is to use Python: get the individual weights with an SQL query, then compute their average in Python.

In [17]:
query = '''
SELECT weight, height FROM player 
ORDER BY height desc 
limit 20
'''
cursor.execute(query)
poid = []  # weights ("poids") of the 20 tallest players
for row in cursor.fetchall():
    poid.append(row[0])  # weight is the first selected column
print(sum(poid) / len(poid))

200.9


The second method is the subquery shown at the beginning of this question (a technique covered in the S2 course).

To get the average weight of the 20 shortest players, do the same, this time sorting heights in ascending order (`ORDER BY height ASC`).
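Both averages can also be computed in a single query, with two scalar subqueries side by side. The sketch below uses five made-up players and keeps 2 instead of 20. When several players share the height at the cut-off, which of them are kept is arbitrary; the second sort key `player_name` (an arbitrary choice here) makes the result reproducible.

```python
import sqlite3

# Five made-up players in an in-memory table
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE Player (player_name TEXT, height REAL, weight INTEGER)')
cur.executemany('INSERT INTO Player VALUES (?, ?, ?)', [
    ('A', 200.0, 220), ('B', 195.0, 200), ('C', 180.0, 170),
    ('D', 170.0, 150), ('E', 165.0, 140),
])

n = 2  # 20 on the real data; bound to both LIMIT placeholders
query = '''
SELECT
    (SELECT AVG(weight) FROM (SELECT weight FROM Player
                              ORDER BY height DESC, player_name LIMIT ?)) AS tallest,
    (SELECT AVG(weight) FROM (SELECT weight FROM Player
                              ORDER BY height ASC, player_name LIMIT ?)) AS shortest
'''
tallest, shortest = cur.execute(query, (n, n)).fetchone()
print(tallest, shortest)  # 210.0 145.0
```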

## 3. What are the birthdates of players named Adil?

This one is easy: filter the `TEXT` column with `LIKE`. If you know other string functions such as `SUBSTR()`, you can also format the output; here `SUBSTR(birthday, 1, 10)` keeps only the date part (`YYYY-MM-DD`) of the birthday.

In [28]:
dn='''
SELECT SUBSTR(birthday,1,10) FROM player WHERE player_name LIKE 'Adil %'
'''
exe(cursor,dn)

('1986-10-06',)
('1988-02-21',)
('1986-06-27',)
('1985-12-27',)
('1977-07-14',)
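A sketch on made-up rows comparing two ways to keep only the date part: `SUBSTR(birthday, 1, 10)` takes the first 10 characters, while SQLite's `date()` function parses the `YYYY-MM-DD HH:MM:SS` string and returns `YYYY-MM-DD`. The space in `'Adil %'` restricts the match to the first name Adil (a hypothetical "Adilson" is not matched), and SQLite's `LIKE` ignores case for ASCII letters by default.

```python
import sqlite3

# Made-up players in an in-memory table
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE Player (player_name TEXT, birthday TEXT)')
cur.executemany('INSERT INTO Player VALUES (?, ?)', [
    ('Adil Alpha', '1986-10-06 00:00:00'),
    ('adil beta', '1990-01-01 00:00:00'),      # lower case: still matched by LIKE
    ('Adilson Gamma', '1988-05-05 00:00:00'),  # no space after "Adil": not matched
])

query = '''
SELECT SUBSTR(birthday, 1, 10), date(birthday)
FROM Player
WHERE player_name LIKE 'Adil %'
ORDER BY birthday
'''
rows = cur.execute(query).fetchall()
print(rows)
# [('1986-10-06', '1986-10-06'), ('1990-01-01', '1990-01-01')]
```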


## 4. What is the average weight of players named Sylvain?
This time, an aggregate function can be used directly with a `WHERE` clause, since `WHERE` filters the rows before `AVG()` is computed:

In [29]:
pm= '''
SELECT AVG(weight) FROM player WHERE player_name LIKE 'Sylvain %'
'''
exe(cursor,pm)

(174.66666666666666,)
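The space in `'Sylvain %'` matters: it keeps only players whose first name is exactly Sylvain. A sketch with made-up rows, including a hypothetical "Sylvaine" that `'Sylvain%'` would also catch:

```python
import sqlite3

# Made-up players in an in-memory table
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE Player (player_name TEXT, weight INTEGER)')
cur.executemany('INSERT INTO Player VALUES (?, ?)', [
    ('Sylvain A', 170), ('Sylvain B', 180), ('Sylvaine C', 130),
])

def avg_weight(pattern):
    # WHERE keeps the matching rows first, then AVG() aggregates them
    return cur.execute('SELECT AVG(weight) FROM Player WHERE player_name LIKE ?',
                       (pattern,)).fetchone()[0]

print(avg_weight('Sylvain %'), avg_weight('Sylvain%'))  # 175.0 160.0
```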


## 5. How many players have their names derived from Thomas (Tomas, Tomi, etc.)?

First, let's explore by counting the players whose names start with "Tom" or "Thom" (leaving out Tomoaki Makino, who is obviously not a Thomas):

In [38]:
name='''
SELECT COUNT(*) FROM player
WHERE (player_name LIKE 'Tom%' OR player_name LIKE 'Thom%')
    AND player_name <> 'Tomoaki Makino'
'''
exe(cursor, name)

(134,)
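To actually look at the names rather than just count them, the matching players can be grouped by first name. A sketch with made-up names; the first name is extracted with `INSTR()` (position of the first space) and `SUBSTR()`, assuming every name contains a space:

```python
import sqlite3

# Made-up names in an in-memory table
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE Player (player_name TEXT)')
cur.executemany('INSERT INTO Player VALUES (?)', [
    ('Thomas A',), ('Tomas B',), ('Tomas C',), ('Tomasz D',),
    ('Tommy E',), ('Tom F',), ('Tomoaki G',), ('Tomane H',),
])

# Everything before the first space is treated as the first name
query = '''
SELECT SUBSTR(player_name, 1, INSTR(player_name, ' ') - 1) AS first_name,
       COUNT(*) AS n
FROM Player
WHERE player_name LIKE 'Tom%' OR player_name LIKE 'Thom%'
GROUP BY first_name
ORDER BY first_name
'''
first_names = cur.execute(query).fetchall()
for row in first_names:
    print(row)
```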


Some names beginning with "Tom" are obviously not derived from Thomas (Tomoaki, Tomane…), so we should refine our test condition.
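One possible refinement, sketched on made-up names: instead of excluding the intruders one by one, list the accepted prefixes explicitly. Which prefixes count as derived from Thomas is a choice to make after exploring the real names, so the `patterns` list below is only an assumption. Accented forms such as Tomás would need their own pattern, since `LIKE` only ignores case for ASCII letters.

```python
import sqlite3

# Made-up names in an in-memory table
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE Player (player_name TEXT)')
cur.executemany('INSERT INTO Player VALUES (?)', [
    ('Thomas A',), ('Tomas B',), ('Tomasz D',),
    ('Tommy E',), ('Tom F',), ('Tomoaki G',), ('Tomane H',),
])

# Assumed list of Thomas variants; 'Tom %' (with a space) is Tom alone
patterns = ['Thom%', 'Tom %', 'Tomas%', 'Tomi%', 'Tommy%']
# Only '?' placeholders are formatted into the string; patterns are bound as parameters
condition = ' OR '.join('player_name LIKE ?' for _ in patterns)
query = f'SELECT COUNT(*) FROM Player WHERE {condition}'
thomas_count = cur.execute(query, patterns).fetchone()[0]
print(thomas_count)  # 5: Tomoaki and Tomane are left out
```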

## 6. How many matches were played in each country? In each league?

This time we need `GROUP BY`, which computes one `COUNT(*)` per `country_id`:

In [44]:
query = '''
SELECT COUNT(*) FROM Match GROUP BY country_id
'''
exe(cursor, query)


(1728,)
(3040,)
(3040,)
(2448,)
(3017,)
(2448,)
(1920,)
(2052,)
(1824,)
(3040,)
(1422,)


The result above does not say which count belongs to which country. If we know how to do a join, we can display the country name instead of `country_id`:

In [45]:
query = '''
SELECT COUNT(*), Country.name
FROM Match JOIN Country ON Country.id = Match.country_id
GROUP BY country_id
'''
exe(cursor, query)

(1728, 'Belgium')
(3040, 'England')
(3040, 'France')
(2448, 'Germany')
(3017, 'Italy')
(2448, 'Netherlands')
(1920, 'Poland')
(2052, 'Portugal')
(1824, 'Scotland')
(3040, 'Spain')
(1422, 'Switzerland')


Present the previous results by descending number of matches:

In [46]:
query = '''
SELECT COUNT(*) AS m_nb, Country.name
FROM Match JOIN Country ON Country.id = Match.country_id
GROUP BY country_id
ORDER BY m_nb DESC
'''
exe(cursor, query)

(3040, 'Spain')
(3040, 'France')
(3040, 'England')
(3017, 'Italy')
(2448, 'Netherlands')
(2448, 'Germany')
(2052, 'Portugal')
(1920, 'Poland')
(1824, 'Scotland')
(1728, 'Belgium')
(1422, 'Switzerland')
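Spain, France and England all have 3040 matches (and Germany and the Netherlands 2448), and SQL does not guarantee any particular order between tied rows. A sketch on made-up counts that adds the country name as a second sort key, so ties come out alphabetically:

```python
import sqlite3

# In-memory toy tables (made-up rows)
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE Country (id INTEGER PRIMARY KEY, name TEXT)')
cur.execute('CREATE TABLE Match (id INTEGER PRIMARY KEY, country_id INTEGER)')
cur.executemany('INSERT INTO Country VALUES (?, ?)',
                [(1, 'Spain'), (2, 'England'), (3, 'Belgium')])
# Spain and England tie with 2 matches each, Belgium has 1
cur.executemany('INSERT INTO Match (country_id) VALUES (?)',
                [(1,), (1,), (2,), (2,), (3,)])

query = '''
SELECT COUNT(*) AS m_nb, Country.name
FROM Match JOIN Country ON Country.id = Match.country_id
GROUP BY Country.id
ORDER BY m_nb DESC, Country.name ASC
'''
ranking = cur.execute(query).fetchall()
print(ranking)  # [(2, 'England'), (2, 'Spain'), (1, 'Belgium')]
```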


Matches for each league:

In [47]:
query = '''
SELECT COUNT(*) AS m_nb, League.name
FROM Match JOIN League ON League.id = Match.league_id
GROUP BY League.name
ORDER BY m_nb DESC
'''
exe(cursor, query)

(3040, 'Spain LIGA BBVA')
(3040, 'France Ligue 1')
(3040, 'England Premier League')
(3017, 'Italy Serie A')
(2448, 'Netherlands Eredivisie')
(2448, 'Germany 1. Bundesliga')
(2052, 'Portugal Liga ZON Sagres')
(1920, 'Poland Ekstraklasa')
(1824, 'Scotland Premier League')
(1728, 'Belgium Jupiler League')
(1422, 'Switzerland Super League')


## 7. Who are the 10 players with the best ratings?

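As usual, let's start with a first look. Be careful with the join condition used here: `Player.id` and `Player_Attributes.id` are each table's own row number, so `Player.id = Player_Attributes.id` pairs rows that have nothing to do with each other, and the names below are not the real best-rated players. The column the two tables share is `player_api_id`.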

In [53]:
query= '''
SELECT player_name, Player_Attributes.overall_rating AS nb FROM Player JOIN Player_Attributes ON Player.id = Player_Attributes.id ORDER BY nb DESC LIMIT 10
'''
exe(cursor,query)

('Manu Molina', 91)
('Manu Torres', 91)
('Fede Vico', 89)
('Manu Lanzarote', 88)
('Lorenzo Pique', 87)
('Lorenzo Squizzi', 87)
('Lorenzo Stovini', 87)
('Lorenzo Tonelli', 87)
('Miguel Portillo', 87)
('Faysel Kasmi', 86)


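A player has several ratings (`Player_Attributes` stores one row per player and rating date). Therefore, we should:
1. group the ratings by player and average them
2. order the players by average rating, in descending order
3. limit the results to 10
4. optional: join with `Player` to get the players' names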
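Steps 1 to 3 can be sketched as follows, on a made-up `Player_Attributes` table reduced to the columns needed (`player_api_id`, `overall_rating`) and keeping the top 2 instead of 10. `AVG()` ignores `NULL` ratings.

```python
import sqlite3

# Made-up ratings in an in-memory table
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE Player_Attributes '
            '(id INTEGER PRIMARY KEY, player_api_id INTEGER, overall_rating INTEGER)')
cur.executemany(
    'INSERT INTO Player_Attributes (player_api_id, overall_rating) VALUES (?, ?)', [
        (100, 90), (100, 80),   # average 85
        (200, 70), (200, 76),   # average 73
        (300, 88),              # average 88
    ])

query = '''
SELECT player_api_id, AVG(overall_rating) AS avg_rating
FROM Player_Attributes
GROUP BY player_api_id       -- 1. one row per player
ORDER BY avg_rating DESC     -- 2. best average first
LIMIT 2                      -- 3. 10 on the real data
'''
top = cur.execute(query).fetchall()
print(top)  # [(300, 88.0), (100, 85.0)]
```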

If we know how to do a join, we can display the players' names rather than their API ids.
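Step 4, sketched on made-up rows: the join goes through `player_api_id`, the column both tables share, rather than through each table's own `id`.

```python
import sqlite3

# In-memory toy tables (made-up players and ratings)
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE Player '
            '(id INTEGER PRIMARY KEY, player_api_id INTEGER, player_name TEXT)')
cur.execute('CREATE TABLE Player_Attributes '
            '(id INTEGER PRIMARY KEY, player_api_id INTEGER, overall_rating INTEGER)')
cur.executemany('INSERT INTO Player (player_api_id, player_name) VALUES (?, ?)',
                [(100, 'Player One'), (200, 'Player Two'), (300, 'Player Three')])
cur.executemany(
    'INSERT INTO Player_Attributes (player_api_id, overall_rating) VALUES (?, ?)',
    [(100, 90), (100, 80), (200, 70), (200, 76), (300, 88)])

query = '''
SELECT p.player_name, AVG(pa.overall_rating) AS avg_rating
FROM Player AS p
JOIN Player_Attributes AS pa ON pa.player_api_id = p.player_api_id
GROUP BY p.player_api_id
ORDER BY avg_rating DESC
LIMIT 2
'''
best = cur.execute(query).fetchall()
print(best)  # [('Player Three', 88.0), ('Player One', 85.0)]
```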

## 8. For each league, how many matches were played? Order your answer by country name.
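A sketch on made-up rows, assuming the `League` table has `id`, `country_id` and `name` columns as in the real database: `Match` is joined to `League` for the count, and `League` to `Country` for the ordering.

```python
import sqlite3

# In-memory toy tables (made-up rows)
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE Country (id INTEGER PRIMARY KEY, name TEXT)')
cur.execute('CREATE TABLE League (id INTEGER PRIMARY KEY, country_id INTEGER, name TEXT)')
cur.execute('CREATE TABLE Match (id INTEGER PRIMARY KEY, league_id INTEGER)')
cur.executemany('INSERT INTO Country VALUES (?, ?)', [(1, 'Spain'), (2, 'Belgium')])
cur.executemany('INSERT INTO League VALUES (?, ?, ?)',
                [(10, 1, 'Spain League'), (20, 2, 'Belgium League')])
cur.executemany('INSERT INTO Match (league_id) VALUES (?)', [(10,), (10,), (20,)])

query = '''
SELECT c.name AS country, l.name AS league, COUNT(*) AS m_nb
FROM Match AS m
JOIN League AS l ON l.id = m.league_id
JOIN Country AS c ON c.id = l.country_id
GROUP BY l.id
ORDER BY c.name
'''
per_league = cur.execute(query).fetchall()
print(per_league)  # [('Belgium', 'Belgium League', 1), ('Spain', 'Spain League', 2)]
```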

## 9. Make an inner join between the `Country` table and the `League_null` table. Note the difference: which rows disappeared?
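The `League_null` table is not shown in this notebook; the sketch below assumes it is a copy of `League` in which some `country_id` values were set to `NULL`. On made-up rows, the inner join keeps only matching pairs, so both the league without a country and the country without a league disappear, while a `LEFT JOIN` from `Country` keeps every country.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE Country (id INTEGER PRIMARY KEY, name TEXT)')
# Hypothetical League_null: a League copy with a NULL country_id
cur.execute('CREATE TABLE League_null (id INTEGER PRIMARY KEY, country_id INTEGER, name TEXT)')
cur.executemany('INSERT INTO Country VALUES (?, ?)', [(1, 'Belgium'), (2, 'France')])
cur.executemany('INSERT INTO League_null VALUES (?, ?, ?)',
                [(10, 1, 'Belgium League'), (20, None, 'Orphan League')])

inner = cur.execute('''
SELECT c.name, l.name FROM Country AS c
JOIN League_null AS l ON l.country_id = c.id
ORDER BY c.name''').fetchall()

left = cur.execute('''
SELECT c.name, l.name FROM Country AS c
LEFT JOIN League_null AS l ON l.country_id = c.id
ORDER BY c.name''').fetchall()

print(inner)  # [('Belgium', 'Belgium League')]
print(left)   # [('Belgium', 'Belgium League'), ('France', None)]
```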

## 10. Write RIGHT JOIN clauses (inclusive and exclusive)
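SQLite only supports `RIGHT JOIN` natively since version 3.39.0, so depending on the SQLite library bundled with Python it may raise a syntax error. The sketch below, on made-up tables `A` and `B` (hypothetical names), shows the portable equivalent: `A RIGHT JOIN B` returns the same rows as `B LEFT JOIN A`, and the exclusive version keeps only the rows of `B` without a match by testing the joined key for `NULL`.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE A (id INTEGER PRIMARY KEY, label TEXT)')
cur.execute('CREATE TABLE B (id INTEGER PRIMARY KEY, a_id INTEGER, label TEXT)')
cur.executemany('INSERT INTO A VALUES (?, ?)', [(1, 'a1'), (2, 'a2')])
cur.executemany('INSERT INTO B VALUES (?, ?, ?)', [(10, 1, 'b1'), (20, 3, 'b2')])

# Inclusive: A RIGHT JOIN B ON ...  is equivalent to  B LEFT JOIN A ON ...
inclusive = cur.execute('''
SELECT A.label, B.label FROM B LEFT JOIN A ON A.id = B.a_id
ORDER BY B.id''').fetchall()

# Exclusive: only the rows of B that have no match in A
exclusive = cur.execute('''
SELECT A.label, B.label FROM B LEFT JOIN A ON A.id = B.a_id
WHERE A.id IS NULL''').fetchall()

print(inclusive)  # [('a1', 'b1'), (None, 'b2')]
print(exclusive)  # [(None, 'b2')]

if sqlite3.sqlite_version_info >= (3, 39, 0):
    # Native syntax, available on recent SQLite: same rows as `inclusive`
    native = cur.execute('''
    SELECT A.label, B.label FROM A RIGHT JOIN B ON A.id = B.a_id
    ORDER BY B.id''').fetchall()
    assert native == inclusive
```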

## 11. Try to implement a full join 
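Like `RIGHT JOIN`, `FULL OUTER JOIN` only exists natively in SQLite 3.39.0 and later. A common portable emulation, sketched on made-up tables `A` and `B` (hypothetical names): the left join (every row of `A`) plus the rows of `B` without a match, combined with `UNION ALL`. `UNION ALL` is enough because the second part only contains rows of `B` that the first part did not already produce.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE A (id INTEGER PRIMARY KEY, label TEXT)')
cur.execute('CREATE TABLE B (id INTEGER PRIMARY KEY, a_id INTEGER, label TEXT)')
cur.executemany('INSERT INTO A VALUES (?, ?)', [(1, 'a1'), (2, 'a2')])
cur.executemany('INSERT INTO B VALUES (?, ?, ?)', [(10, 1, 'b1'), (20, 3, 'b2')])

full = cur.execute('''
SELECT A.label, B.label FROM A LEFT JOIN B ON A.id = B.a_id
UNION ALL
SELECT A.label, B.label FROM B LEFT JOIN A ON A.id = B.a_id
WHERE A.id IS NULL
''').fetchall()

# Rows: ('a1', 'b1'), ('a2', None), (None, 'b2'), in no guaranteed order
print(full)
```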

## 12. Cross join
1. Create a new database called `club.sqlite`
2. Create a table `Members` with id, name (you can quickly create data by writing a `.csv` file, importing it and converting it to a table with `pandas`)
3. Insert some members (3 or 4)
4. Create a table `Reunions` with id, reunion_date (you can set the default to the current datetime to save time)
5. Insert some reunions: you can save time by writing a script that fills the table in a loop…
6. Write a query that produces a matrix with every reunion for each member (`CROSS JOIN`), ordered by date
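A sketch of steps 1 to 6 using `sqlite3` only (with `pandas`, step 2 could instead load the `.csv` into a DataFrame and write it with `DataFrame.to_sql`). It uses an in-memory database so it leaves no file behind; passing `'club.sqlite'` to `sqlite3.connect` creates the file instead. Member names and dates are made up, and the dates are given explicitly so the output is predictable, although the column defaults to the current datetime (`CURRENT_TIMESTAMP`).

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # 'club.sqlite' to create the file
cur = conn.cursor()
cur.execute('CREATE TABLE Members (id INTEGER PRIMARY KEY, name TEXT)')
cur.execute('CREATE TABLE Reunions (id INTEGER PRIMARY KEY, '
            'reunion_date TEXT DEFAULT CURRENT_TIMESTAMP)')
cur.executemany('INSERT INTO Members (name) VALUES (?)',
                [('Alice',), ('Bob',), ('Chloe',)])
# Fill Reunions in a loop: one reunion on the 1st of each month (made-up dates)
for month in range(1, 3):
    cur.execute('INSERT INTO Reunions (reunion_date) VALUES (?)',
                (f'2024-{month:02d}-01 18:00:00',))
conn.commit()

# Every (reunion, member) pair: 2 reunions x 3 members = 6 rows
matrix = cur.execute('''
SELECT r.reunion_date, m.name
FROM Reunions AS r
CROSS JOIN Members AS m
ORDER BY r.reunion_date, m.name
''').fetchall()
for row in matrix:
    print(row)
```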

## 13. Self join

Create a detailed `Employee` table with a manager/managed relationship (for example, a `manager_id` column referencing another employee's `id`). You can imagine other situations in which a SELF JOIN would be relevant. Write a query that shows, for a list of employees, their manager (take care of the presentation of the results: order them by manager, and use string functions to produce a sentence like "X manages Y").
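A sketch with a hypothetical `Employee` table in which `manager_id` points to the `id` of another row of the same table. The table is joined to itself under two aliases, and SQLite's `||` operator builds the sentence. A `LEFT JOIN` would additionally keep employees without a manager (here, the boss).

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('''CREATE TABLE Employee (
    id INTEGER PRIMARY KEY,
    name TEXT,
    manager_id INTEGER REFERENCES Employee(id)  -- NULL for the top boss
)''')
# Made-up employees: Ada manages Ben and Cleo, Ben manages Dan
cur.executemany('INSERT INTO Employee VALUES (?, ?, ?)', [
    (1, 'Ada', None), (2, 'Ben', 1), (3, 'Cleo', 1), (4, 'Dan', 2),
])

query = '''
SELECT mgr.name || ' manages ' || emp.name AS sentence
FROM Employee AS emp
JOIN Employee AS mgr ON emp.manager_id = mgr.id
ORDER BY mgr.name, emp.name
'''
sentences = [row[0] for row in cur.execute(query)]
for s in sentences:
    print(s)
# Ada manages Ben
# Ada manages Cleo
# Ben manages Dan
```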