# SQL basics (2)

In [1]:
import sqlite3
import pandas as pd

In [2]:
conn = sqlite3.connect('data/european-soccer.sqlite')
c = conn.cursor()

In [7]:
def exe(cursor: sqlite3.Cursor, query: str) -> None:
    """Execute a query and print every row of its result."""
    cursor.execute(query)
    for row in cursor.fetchall():
        print(row)

In [4]:
query = '''
SELECT *
FROM Country;
'''

exe(c, query)

(1, 'Belgium')
(1729, 'England')
(4769, 'France')
(7809, 'Germany')
(10257, 'Italy')
(13274, 'Netherlands')
(15722, 'Poland')
(17642, 'Portugal')
(19694, 'Scotland')
(21518, 'Spain')
(24558, 'Switzerland')


## Explore the database

Being able to list the tables of a database and their columns is useful. Alas, there is no instruction for this that works in every DBMS: accessing the metadata of a database is handled at the level of the DBMS, and each one has its own method. For example, once logged into a user account, you can use `\dt` or `\dt+` in PostgreSQL, `show tables;` in MySQL, `.tables` in SQLite, `SELECT table_name FROM dba_tables;` in Oracle, `db2 list tables for schema schema_name` in IBM Db2 or `SELECT * FROM information_schema.tables;` in Microsoft SQL Server.

When using SQLite with Python, there are other methods to get this metadata, and some of them are powerful, but expect to learn the specific instructions that fit the environment and DBMS of your project, company or client. Keep in mind that even though SQLite is widely used (which is why you should know it), it is an embedded, single-file database: it is not designed to serve many concurrent writers in a large client/server production environment.
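One of those other methods is to query SQLite's internal catalogue table, `sqlite_master`, with ordinary SQL. The sketch below is only an illustration: it uses an in-memory database and two hypothetical tables rather than the course database.

```python
import sqlite3

# In-memory database with two hypothetical tables, just for illustration
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE Country (id INTEGER PRIMARY KEY, name TEXT)')
conn.execute('CREATE TABLE League (id INTEGER PRIMARY KEY, country_id INTEGER, name TEXT)')

# sqlite_master is SQLite's internal catalogue: one row per table, index, view...
rows = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()
table_names = [row[0] for row in rows]
print(table_names)

conn.close()
```

Because this is a plain `SELECT`, the result can be filtered or sorted like any other query.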

### `PRAGMA` statements

SQLite uses [PRAGMAs](https://www.sqlite.org/pragma.html) to configure or change some behaviours of the engine. For example, the instruction `PRAGMA case_sensitive_like = TRUE` makes the `LIKE` operator case sensitive (by default it ignores case for ASCII characters). You can also use `PRAGMA` statements to access metadata.
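As a small illustration of a `PRAGMA` changing the engine's behaviour, the sketch below runs the same `LIKE` comparison before and after the `case_sensitive_like` setting. It uses an in-memory database; the two column aliases are arbitrary names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(':memory:')

# Default behaviour: LIKE ignores case for ASCII characters
default_match = conn.execute(
    "SELECT 'Belgium' LIKE 'belgium' AS default_match"
).fetchone()[0]

# After the PRAGMA, the same comparison becomes case sensitive
conn.execute('PRAGMA case_sensitive_like = TRUE')
strict_match = conn.execute(
    "SELECT 'Belgium' LIKE 'belgium' AS strict_match"
).fetchone()[0]

print(default_match, strict_match)
conn.close()
```

The setting only lasts for the current connection, so it has to be issued again each time the database is reopened.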

#### Get the list of tables

In [5]:
query = '''
PRAGMA table_list;
'''

exe(c, query)

('main', 'Team_Attributes', 'table', 25, 0, 0)
('main', 'Team', 'table', 5, 0, 0)
('main', 'Country', 'table', 2, 0, 0)
('main', 'League', 'table', 3, 0, 0)
('main', 'Match', 'table', 115, 0, 0)
('main', 'Player', 'table', 7, 0, 0)
('main', 'Player_Attributes', 'table', 42, 0, 0)
('main', 'sqlite_sequence', 'table', 2, 0, 0)
('main', 'sqlite_schema', 'table', 5, 0, 0)
('temp', 'sqlite_temp_schema', 'table', 5, 0, 0)


1. The first column is the [schema](https://database.guide/what-is-a-database-schema/) to which the table belongs
2. The 2nd column is the table name
3. The 3rd column is the table type. The most common are `table` and `view`. A view is a query that has been saved under a name and can then be queried like a table
4. The 4th column is the number of columns (fields) in the table
5. The 5th and 6th columns are options set when the table was created (`WITHOUT ROWID` and `STRICT`); no need to go that deep for the moment in this introductory course.

#### Get column metadata

`PRAGMA table_info(<table_name>)` returns metadata about the columns of the table passed as argument:

In [6]:
query = '''
PRAGMA table_info(Country)
'''

exe(c, query)

(0, 'id', 'INTEGER', 0, None, 1)
(1, 'name', 'TEXT', 0, None, 0)


1. The first column is the column index
2. The 2nd is the column name
3. The 3rd is the column type
4. The 4th (`notnull`) indicates whether the column has a `NOT NULL` constraint (0 = no, the column accepts `NULL`; 1 = yes, `NULL` is rejected)
5. The 5th gives the column default value (when defined)
6. The 6th indicates whether the column is part of the primary key (0 = no)
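To make these flags easier to read, each row of `table_info` can be turned into a dictionary keyed by the result column names. This is a minimal sketch on a hypothetical `Demo` table in an in-memory database, not on the course data.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
# Hypothetical table, only used to illustrate the flags
conn.execute('''
CREATE TABLE Demo(
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    note TEXT DEFAULT 'n/a'
)''')

cur = conn.execute('PRAGMA table_info(Demo)')
# cur.description holds the names of the six result columns
keys = [d[0] for d in cur.description]
columns = [dict(zip(keys, row)) for row in cur.fetchall()]

for col in columns:
    print(col['name'], col['type'], col['notnull'], col['dflt_value'], col['pk'])
conn.close()
```

Here `name` is the only column declared `NOT NULL`, `note` is the only one with a default value, and `id` is the only one flagged as primary key.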

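### Access metadata with Python

A `Cursor` object has a `.description` attribute that describes the columns of the result of the last query executed. Right after the `PRAGMA table_info(Country)` call above, it therefore contains the column names of that result: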

In [7]:
c.description

(('cid', None, None, None, None, None, None),
 ('name', None, None, None, None, None, None),
 ('type', None, None, None, None, None, None),
 ('notnull', None, None, None, None, None, None),
 ('dflt_value', None, None, None, None, None, None),
 ('pk', None, None, None, None, None, None))

`.description` is `None` until the cursor has executed a query, and it always describes the last one. To inspect a table, we therefore send a query on that table, adding `LIMIT 0` because we don’t want any rows, only the column descriptions.

In [8]:
c.execute('SELECT * FROM Country LIMIT 0')
c.description

(('id', None, None, None, None, None, None),
 ('name', None, None, None, None, None, None))
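The behaviour of `.description` before and after a query can be checked with a short sketch. It assumes an in-memory database containing a hypothetical copy of the `Country` table structure.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE Country (id INTEGER PRIMARY KEY, name TEXT)')

cur = conn.cursor()
before = cur.description  # no query executed yet on this cursor

# LIMIT 0 returns no rows, but the column descriptions are still filled in
cur.execute('SELECT * FROM Country LIMIT 0')
after = [d[0] for d in cur.description]

print(before, after)
conn.close()
```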

We can extract the column names with a list comprehension to get a one-liner (we could have done the same with the `PRAGMA` method, of course):

In [9]:
columns_names = [description[0] for description in c.description]
columns_names

['id', 'name']

The ability to get the list of columns from within our script (whether via `PRAGMA` or the `Cursor` class) lets us adapt our queries dynamically to the current state of the database. There are numerous use cases where an app has to change the schema of a database it uses; in that case our queries can adapt automatically to the modified schema. For example, without even knowing all the fields of the `Match` table, we can automatically generate a query that calls those columns:

In [10]:
c.execute('SELECT * FROM Match LIMIT 0')
columns_names = [description[0] for description in c.description]
query = 'SELECT ' + ', '.join(columns_names) + ' FROM Match'
print(query)

SELECT id, country_id, league_id, season, stage, date, match_api_id, home_team_api_id, away_team_api_id, home_team_goal, away_team_goal, home_player_X1, home_player_X2, home_player_X3, home_player_X4, home_player_X5, home_player_X6, home_player_X7, home_player_X8, home_player_X9, home_player_X10, home_player_X11, away_player_X1, away_player_X2, away_player_X3, away_player_X4, away_player_X5, away_player_X6, away_player_X7, away_player_X8, away_player_X9, away_player_X10, away_player_X11, home_player_Y1, home_player_Y2, home_player_Y3, home_player_Y4, home_player_Y5, home_player_Y6, home_player_Y7, home_player_Y8, home_player_Y9, home_player_Y10, home_player_Y11, away_player_Y1, away_player_Y2, away_player_Y3, away_player_Y4, away_player_Y5, away_player_Y6, away_player_Y7, away_player_Y8, away_player_Y9, away_player_Y10, away_player_Y11, home_player_1, home_player_2, home_player_3, home_player_4, home_player_5, home_player_6, home_player_7, home_player_8, home_player_9, home_player_10, 

Of course, selecting all columns that way is useless, but we could write a more elaborate script that selects only the columns starting with `'home_'`, for example. Try it as an exercise! (Modify the line where `columns_names` is assigned.)

In [11]:
# your code here




Another exercise: create a function that lists the tables in a database, and another that lists the columns of a table.

In [12]:
# your code here




Test your functions, then close the cursor and the connection.

In [13]:
c.close()
conn.close()

## Create a database

### From scratch

Let’s create a database about movies. It will be made of two tables: the table `Movies`, which contains information such as title, budget, votes obtained…, and the table `Credits`, which contains information about the cast, the producer…

Here is the ERD:



The steps to create a database from scratch are:

1. Create the database along with a connection
2. Create a cursor
3. Execute `CREATE TABLE` instructions, specifying the field/column names, their types and, if needed, whether a column is a `PRIMARY KEY`, plus other options
4. Unlike read-only queries, executing a `CREATE TABLE` instruction is not enough: you have to "commit" it, i.e. transmit it to the database through the connection with the `.commit()` method
5. Once created, the table is just a container: content (data) must be gathered and inserted into the database with `INSERT INTO` instructions indicating the `VALUES` to be inserted

Note: SQLite has very few data types (5 storage classes): `NULL`, `INTEGER`, `REAL` (float), `TEXT` and `BLOB`, whereas PostgreSQL, for example, has around 45 types: several types of int, float, char and byte, and more structured types like timestamp, geometry and even XML, JSON, IP address… Such precision about types (for example ints of different sizes, from two to eight bytes) improves data integrity and efficiency: storing an int on 2 bytes rather than 8 saves a lot of space when the values are small enough…

A `BLOB` is data stored exactly as it was input. Dates can be stored as `INTEGER`, `REAL` or `TEXT`, using the corresponding built-in functions: respectively as Unix time (number of seconds since 1 January 1970), as a Julian day number or as an ISO 8601 string (`'YYYY-MM-DD HH:MM:SS.SSS'`).
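The three date representations can be converted into one another with SQLite's built-in date functions. The sketch below is a quick illustration on an in-memory database; the date used is an arbitrary example.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
day = '2024-12-04'

# TEXT -> INTEGER (Unix time) and TEXT -> REAL (Julian day number)
unix_time, julian_day = conn.execute(
    "SELECT CAST(strftime('%s', ?) AS INTEGER), julianday(?)", (day, day)
).fetchone()

# Both numeric forms convert back to the same ISO 8601 date
from_unix, from_julian = conn.execute(
    "SELECT date(?, 'unixepoch'), date(?)", (unix_time, julian_day)
).fetchone()

print(unix_time, julian_day, from_unix, from_julian)
conn.close()
```

Whatever the storage choice, it is worth sticking to a single representation per column so that comparisons and sorting stay meaningful.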

In [14]:
# connector creation
conn_mdb = sqlite3.connect('data/movies.db')
# cursor creation
c_mdb = conn_mdb.cursor()

Instructions to create the `Movies` table :

In [15]:
creation_instructions = '''
CREATE TABLE IF NOT EXISTS Movies(
    Id INTEGER PRIMARY KEY AUTOINCREMENT UNIQUE,
    Title TEXT,
    Date TEXT,
    Duration INTEGER,
    Budget INTEGER,
    First_week_viewers INTEGER,
    Votes REAL
    );
'''

Some explanations of the keywords:

* `IF NOT EXISTS`: the table is created only if it doesn’t exist yet
* `PRIMARY KEY`: identifies the field as a primary key (thank you, Captain Obvious!)
* `AUTOINCREMENT`: when a new row is added without an explicit `Id`, the primary key (PK) value is generated automatically, and an `Id` that has been used once is never reused, even after its row is deleted
* `UNIQUE`: puts a constraint on this field; each `Id` value must be unique (which a PK already is by definition), so adding a row whose `Id` is already in use raises an error. Many keywords at this stage are just constraints that prevent mistakes or inconsistencies.
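What these constraints do in practice can be seen in a short sketch. It assumes an in-memory database and a reduced, hypothetical version of the `Movies` table with only two columns.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE Movies(Id INTEGER PRIMARY KEY AUTOINCREMENT UNIQUE, Title TEXT)')

conn.execute("INSERT INTO Movies(Id, Title) VALUES (1, 'A good movie')")
# No Id given: SQLite generates the next one
conn.execute("INSERT INTO Movies(Title) VALUES ('Another good movie')")

try:
    # Id 1 is already taken, so the constraint rejects this row
    conn.execute("INSERT INTO Movies(Id, Title) VALUES (1, 'A duplicate')")
    rejected = False
except sqlite3.IntegrityError as error:
    rejected = True
    print('Rejected:', error)

ids = [row[0] for row in conn.execute('SELECT Id FROM Movies ORDER BY Id')]
print(ids)
conn.close()
```

In Python, a violated constraint surfaces as a `sqlite3.IntegrityError`, which a script can catch to report or skip the faulty row.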

Once the instructions are written, execute them and commit them to the database:


In [16]:
c_mdb.execute(creation_instructions)
conn_mdb.commit()

Verify that we now have a database and a `Movies` table (replace the cell with a call to the function you wrote in the exercises above):

In [17]:
query = '''
PRAGMA table_list;
'''

exe(c_mdb, query)

('main', 'Movies', 'table', 7, 0, 0)
('main', 'sqlite_sequence', 'table', 2, 0, 0)
('main', 'sqlite_schema', 'table', 5, 0, 0)
('temp', 'sqlite_temp_schema', 'table', 5, 0, 0)


Now verify the columns of the `Movies` table (replace the cell with a call to the other function you wrote in the exercises above):

In [18]:
query = '''
PRAGMA table_info(Movies);
'''

exe(c_mdb, query)

(0, 'Id', 'INTEGER', 0, None, 1)
(1, 'Title', 'TEXT', 0, None, 0)
(2, 'Date', 'TEXT', 0, None, 0)
(3, 'Duration', 'INTEGER', 0, None, 0)
(4, 'Budget', 'INTEGER', 0, None, 0)
(5, 'First_week_viewers', 'INTEGER', 0, None, 0)
(6, 'Votes', 'REAL', 0, None, 0)


Let’s put some data in it! Remember: a record (a row) in a table is a *tuple*, so the data will be a list of tuples:

In [19]:
data = [(1, 'A good movie', '2024-12-04', 120, 2000000, 259023, 4.36),
        (2, 'Another good movie, slightly better', '2024-12-05', 110, 500000, 354352, 4.63),
        (3, 'A bad movie, but with some success', '1985-01-01', 84, 600000, 165904, 4.26),
        (4, 'A very bad movie', '2005-04-25', 93, 1000000, 235, 2.86),
        (5, 'A not so bad movie', '2019-03-23', 104, 1500000, 40334, 3.86)]

Data will be inserted line by line (using a for loop) with the command `INSERT INTO`:

In [20]:
for d in data:
    c_mdb.execute('''
    INSERT INTO Movies(id, Title, Date, Duration, Budget, First_week_viewers, Votes)
    VALUES(?, ?, ?, ?, ?, ?, ?);''', d)

Don’t forget to commit:

In [21]:
conn_mdb.commit()
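As an alternative to the explicit `for` loop, the `sqlite3` module offers `executemany()`, which applies one parameterised statement to every tuple of a list. A minimal sketch, assuming an in-memory database and a shortened, hypothetical `Movies` table:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE Movies(Id INTEGER PRIMARY KEY, Title TEXT, Duration INTEGER)')

data = [(1, 'A good movie', 120),
        (2, 'A very bad movie', 93)]

# One call inserts every tuple of the list
conn.executemany('INSERT INTO Movies(Id, Title, Duration) VALUES(?, ?, ?)', data)
conn.commit()

count = conn.execute('SELECT COUNT(*) FROM Movies').fetchone()[0]
print(count)
conn.close()
```

The `?` placeholders work exactly as in the loop version: the values are passed separately from the SQL text instead of being pasted into the string.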

Of course we don’t forget to close connector and cursor :

In [22]:
c_mdb.close()
conn_mdb.close()

Let’s verify nothing went wrong by printing the content of the table : 

In [23]:
# re-open the database
conn_mdb = sqlite3.connect('data/movies.db')
c_mdb = conn_mdb.cursor()

#query all the lines
query = 'SELECT * FROM Movies'
exe(c_mdb, query)

(1, 'A good movie', '2024-12-04', 120, 2000000, 259023, 4.36)
(2, 'Another good movie, slightly better', '2024-12-05', 110, 500000, 354352, 4.63)
(3, 'A bad movie, but with some success', '1985-01-01', 84, 600000, 165904, 4.26)
(4, 'A very bad movie', '2005-04-25', 93, 1000000, 235, 2.86)
(5, 'A not so bad movie', '2019-03-23', 104, 1500000, 40334, 3.86)


### From a .csv file

While it can be convenient in some situations to add a few records to a table by hardcoding them in a script, this quickly becomes tedious and confusing. In most cases, data is available in flat files like `.csv`, a format whose simplicity favours accessibility. Loading the data from a `.csv` file into a database makes data processing and distribution safer and more efficient.

The file `data/credits.csv`, which will feed the `Credits` table, contains 8 columns:
```
Id;Movie_id;Direction;Producer;Studio;Playscreen;Cast;Country
1;3;"Big director";"Big producer";"Big studio";"Big screenwriter";"Big Actor 1, Big Actor 2, Other big actors";"Big country"
2;1;"Unknown director";"Unknown producteur";"Unknown studio";"Unknown screenwriter";"Unknown actor 1, Unknown acteur 2, Other unknown actors";"Unknown country"
3;2;"Small director";"Small producer";"Small studio";"Small screenwriter";"Small actor 1, Small actor 2, Small other actors";"Small country"
4;5;"Acceptable director";"Acceptable producer";"Acceptable studio";"Acceptable screenwriter";"Acceptable actor 1, Acceptable actor 2, Other acceptable actors";"Acceptable country"
5;4;"Incompetent director";"Incompetent producer";"Incompetent studio"; "Incompetent screenwriter";"Incompetente actor 1, Incompetent actor 2, Other incompetent actors";"Incompetent country"
```

Let’s see how to get the data for our second table, `Credits`, from this `.csv` file. The difficult part is parsing the data (identifying which part of the file corresponds to which piece of information and where it should go in the database). Each line must be broken down into values matching the fields and their types. Everything in the file is loaded as a string! We’ll have to split and cast.

1. Declare `data` as an empty list: it will be a list of lists, each row of the file being stored as a list
2. Open the `data/credits.csv` file, get rid of the header line, read the rest line by line, split each line (the separator is `';'` in our example) and cast the two id fields to int
3. Create the `Credits` table. Declare all the columns as before, indicating their types, including the column that will be the foreign key (`Movie_id`). We just add the `NOT NULL` constraint to its declaration, to be sure that no credits record is stored without a movie id.
4. To declare the `Movie_id` column as a foreign key (FK), we write another line with the keywords `FOREIGN KEY`, specifying between parentheses which column is the FK, then indicate with the keyword `REFERENCES` which table and PK it relates to:

In [24]:
data = [] 
with open('data/credits.csv', 'r') as f:
    f.readline() # get rid of the first line containing headers - we could test if headers match columns name
    for line in f:
        data.append(line.rstrip().split(';')) 

for d in data:
    d[0] = int(d[0]) # cast the id (primary key) to int 
    d[1] = int(d[1]) # cast the movie_id (foreign key) to int 

# creation instructions 
create_credits_table = '''
CREATE TABLE IF NOT EXISTS Credits(
    Id INTEGER PRIMARY KEY AUTOINCREMENT UNIQUE,
    Movie_id INTEGER NOT NULL,
    Direction TEXT,
    Producer TEXT,
    Studio TEXT,
    Playscreen TEXT,
    Cast TEXT,
    Country TEXT,
    FOREIGN KEY (Movie_id)
        REFERENCES Movies (Id)
);
'''

# table creation
c_mdb.execute(create_credits_table)
conn_mdb.commit()

# pull data in database

# instructions
insert_instructions = '''
INSERT INTO Credits(Id,
                    Movie_id,
                    Direction,
                    Producer,
                    Studio,
                    Playscreen,
                    Cast,
                    Country)
    VALUES(?, ?, ?, ?, ?, ?, ?, ?);
'''
for d in data:
    c_mdb.execute(insert_instructions, d)
conn_mdb.commit()

In [27]:
query = 'SELECT * FROM Credits'
exe(c_mdb, query)

(1, 3, '"Big director"', '"Big producer"', '"Big studio"', '"Big screenwriter"', '"Big Actor 1, Big Actor 2, Other big actors"', '"Big country"')
(2, 1, '"Unknown director"', '"Unknown producteur"', '"Unknown studio"', '"Unknown screenwriter"', '"Unknown actor 1, Unknown acteur 2, Other unknown actors"', '"Unknown country"')
(3, 2, '"Small director"', '"Small producer"', '"Small studio"', '"Small screenwriter"', '"Small actor 1, Small actor 2, Small other actors"', '"Small country"')
(4, 5, '"Acceptable director"', '"Acceptable producer"', '"Acceptable studio"', '"Acceptable screenwriter"', '"Acceptable actor 1, Acceptable actor 2, Other acceptable actors"', '"Acceptable country"')
(5, 4, '"Incompetent director"', '"Incompetent producer"', '"Incompetent studio"', ' "Incompetent screenwriter"', '"Incompetente actor 1, Incompetent actor 2, Other incompetent actors"', '"Incompetent country"')
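Splitting on `';'` keeps the double quotes (and a stray leading space) inside the stored text, as the output above shows. One possible way to avoid this is the standard `csv` module, which understands quoted fields. The sketch below is an illustration only: it reads two lines copied from the sample file through `io.StringIO` instead of opening `data/credits.csv`.

```python
import csv
import io

# Two lines copied from the sample above; io.StringIO stands in for the real file
sample = '''Id;Movie_id;Direction;Producer;Studio;Playscreen;Cast;Country
1;3;"Big director";"Big producer";"Big studio";"Big screenwriter";"Big Actor 1, Big Actor 2, Other big actors";"Big country"
5;4;"Incompetent director";"Incompetent producer";"Incompetent studio"; "Incompetent screenwriter";"Incompetente actor 1, Incompetent actor 2, Other incompetent actors";"Incompetent country"
'''

# skipinitialspace=True ignores the space that follows a delimiter
reader = csv.reader(io.StringIO(sample), delimiter=';', skipinitialspace=True)
next(reader)  # skip the header line

rows = []
for fields in reader:
    # cast the two ids, keep the other fields as (unquoted) text
    rows.append([int(fields[0]), int(fields[1])] + fields[2:])

print(rows[0][2], '|', rows[1][5])
```

With a real file, the same reader can be built from the file object returned by `open('data/credits.csv', newline='')`.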


Of course we close connector and cursor :

In [28]:
c_mdb.close()
conn_mdb.close()
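A point worth knowing about the `FOREIGN KEY` clause: SQLite only enforces it on connections where `PRAGMA foreign_keys = ON` has been issued, and this setting is off by default in most builds. The sketch below shows the effect on an in-memory database with reduced, hypothetical versions of `Movies` and `Credits`.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('PRAGMA foreign_keys = ON')  # must be set on each new connection

conn.execute('CREATE TABLE Movies(Id INTEGER PRIMARY KEY, Title TEXT)')
conn.execute('''
CREATE TABLE Credits(
    Id INTEGER PRIMARY KEY,
    Movie_id INTEGER NOT NULL,
    Direction TEXT,
    FOREIGN KEY (Movie_id) REFERENCES Movies (Id)
)''')
conn.execute("INSERT INTO Movies VALUES (1, 'A good movie')")
conn.execute("INSERT INTO Credits VALUES (1, 1, 'Unknown director')")  # movie 1 exists

try:
    conn.execute("INSERT INTO Credits VALUES (2, 99, 'Big director')")  # no movie 99
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)
conn.close()
```

Without the `PRAGMA`, the second insert into `Credits` would be accepted and the table would contain a credit pointing to a movie that does not exist.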

### From a dataframe

We will process most of the data using dataframes. Moreover, `pandas` can load data from multiple sources, saving us from writing boilerplate to load different formats and to handle many exceptions, difficulties, etc. It is therefore important to know how to push data to a database directly from a dataframe.

`df.to_sql(<table name>, <connector name>)` is the method that writes a dataframe to a database table, given the table name and the connection to the database. Two other optional arguments may be useful: `if_exists='replace'` to replace the table if a table with the same name already exists (`'fail'` by default; you can also choose `'append'`), and `index=False` if the dataframe index is of no interest (`True` by default, in which case the index is written as an extra column that can be named with `index_label`).

1. Load the data into a dataframe (if it does not already exist)
2. Create a connection to a database (the database is created if it doesn’t exist yet)
3. Use the `.to_sql()` method. It returns the number of rows written.
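Before moving to real data, these three steps can be tried on a tiny example. The sketch below assumes `pandas` is installed and uses a hypothetical two-row dataframe with an in-memory database.

```python
import sqlite3
import pandas as pd

# Hypothetical mini-dataframe standing in for the real data
df = pd.DataFrame({'id': [1, 2], 'title': ['A good movie', 'A very bad movie']})

conn = sqlite3.connect(':memory:')
df.to_sql('Movies', conn, if_exists='replace', index=False)

# With index=False, only the dataframe columns become table columns
column_names = [row[1] for row in conn.execute('PRAGMA table_info(Movies)')]
n_rows = conn.execute('SELECT COUNT(*) FROM Movies').fetchone()[0]
print(column_names, n_rows)
conn.close()
```

Running it again with `index=True` is a quick way to see the extra index column appear in `PRAGMA table_info`.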

Let’s forget the mock-up data for this part and download real data from [Kaggle](https://www.kaggle.com/datasets/tmdb/tmdb-movie-metadata?resource=download). Take the time to read the context and metadata.

In [5]:
df = pd.read_csv("data/TMDB-moviedatabase/tmdb_5000_movies.csv")

conn_mdb = sqlite3.connect('data/tmdb_movies.db')
df.to_sql('Movies', conn_mdb, if_exists='replace', index=False)

4803

In [10]:
c_mdb = conn_mdb.cursor()
show = 'SELECT * FROM Movies LIMIT 1' # always put a limit 
exe(c_mdb, show)

(237000000, '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]', 'http://www.avatarmovie.com/', 19995, '[{"id": 1463, "name": "culture clash"}, {"id": 2964, "name": "future"}, {"id": 3386, "name": "space war"}, {"id": 3388, "name": "space colony"}, {"id": 3679, "name": "society"}, {"id": 3801, "name": "space travel"}, {"id": 9685, "name": "futuristic"}, {"id": 9840, "name": "romance"}, {"id": 9882, "name": "space"}, {"id": 9951, "name": "alien"}, {"id": 10148, "name": "tribe"}, {"id": 10158, "name": "alien planet"}, {"id": 10987, "name": "cgi"}, {"id": 11399, "name": "marine"}, {"id": 13065, "name": "soldier"}, {"id": 14643, "name": "battle"}, {"id": 14720, "name": "love affair"}, {"id": 165431, "name": "anti war"}, {"id": 193554, "name": "power relations"}, {"id": 206690, "name": "mind and soul"}, {"id": 209714, "name": "3d"}]', 'en', 'Avatar', 'In the 22nd century, a paraplegic Marine is dispatched t

In [11]:
columns = 'PRAGMA table_info(Movies)'
exe(c_mdb, columns)

(0, 'budget', 'INTEGER', 0, None, 0)
(1, 'genres', 'TEXT', 0, None, 0)
(2, 'homepage', 'TEXT', 0, None, 0)
(3, 'id', 'INTEGER', 0, None, 0)
(4, 'keywords', 'TEXT', 0, None, 0)
(5, 'original_language', 'TEXT', 0, None, 0)
(6, 'original_title', 'TEXT', 0, None, 0)
(7, 'overview', 'TEXT', 0, None, 0)
(8, 'popularity', 'REAL', 0, None, 0)
(9, 'production_companies', 'TEXT', 0, None, 0)
(10, 'production_countries', 'TEXT', 0, None, 0)
(11, 'release_date', 'TEXT', 0, None, 0)
(12, 'revenue', 'INTEGER', 0, None, 0)
(13, 'runtime', 'REAL', 0, None, 0)
(14, 'spoken_languages', 'TEXT', 0, None, 0)
(15, 'status', 'TEXT', 0, None, 0)
(16, 'tagline', 'TEXT', 0, None, 0)
(17, 'title', 'TEXT', 0, None, 0)
(18, 'vote_average', 'REAL', 0, None, 0)
(19, 'vote_count', 'INTEGER', 0, None, 0)


In [12]:
c_mdb.close()
conn_mdb.close()

Of course, the reverse is also possible: you can read data from a database and store it in a dataframe with the `pd.read_sql()` function, giving it a query and a connection. A useful optional argument is `index_col`, the column(s) to set as the dataframe index:

In [19]:
conn_sdb = sqlite3.connect('data/european-soccer.sqlite')
c_sdb = conn_sdb.cursor()

selection = '''
SELECT *
FROM Country;
'''

df = pd.read_sql(selection, conn_sdb, index_col='id')
df

| id | name |
|---|---|
| 1 | Belgium |
| 1729 | England |
| 4769 | France |
| 7809 | Germany |
| 10257 | Italy |
| 13274 | Netherlands |
| 15722 | Poland |
| 17642 | Portugal |
| 19694 | Scotland |
| 21518 | Spain |


With a more complex query : 

In [22]:
selection = '''
SELECT p.player_name, pa.overall_rating
FROM Player_Attributes AS pa
JOIN Player AS p ON pa.id = p.id
ORDER BY overall_rating DESC
LIMIT 20;
'''

df = pd.read_sql(selection, conn_sdb)
df

|   | player_name | overall_rating |
|---|---|---|
| 0 | Manu Molina | 91 |
| 1 | Manu Torres | 91 |
| 2 | Fede Vico | 89 |
| 3 | Manu Lanzarote | 88 |
| 4 | Lorenzo Pique | 87 |
| 5 | Lorenzo Squizzi | 87 |
| 6 | Lorenzo Stovini | 87 |
| 7 | Lorenzo Tonelli | 87 |
| 8 | Miguel Portillo | 87 |
| 9 | Faysel Kasmi | 86 |

If you expect to need, or plan to use, pandas with relational databases, you should get familiar with [SQLAlchemy](https://www.sqlalchemy.org/), an [Object Relational Mapper](https://en.wikipedia.org/wiki/Object%E2%80%93relational_mapping). That’s true for any context involving a database with Python, actually.

## The big part : `join`

We do a join when we query several tables at once, the tables being linked by relationships between keys: the primary key of one table is a foreign key in the other. A join operation selects the records for which the primary key value in one table equals the foreign key value in another table, and outputs the matching data. That’s where the different kinds of relationships (one-to-one, one-to-many, etc.) have an impact.

In the first lecture, we outlined how much the underlying model of databases was formally inspired by set theory. Joins can be seen as operations on sets (of data). Each operation defines a different way to cross-reference, or match, data from different tables. The different joins are:

* inner join
* left and right join
* outer join (or exclusive join)
* cross join
* self join

and we can represent most of them with Venn diagrams.

![Venn diagram of joins](./images/sql_joins.png)

Those are theoretical joins; not every DBMS implements all of them. `RIGHT JOIN`, for example, is sometimes missing, since it is equivalent to a `LEFT JOIN` with the tables taken in reverse order. Likewise, the variants that exclude the matching rows are not separate keywords: they are written as an outer join plus a `WHERE ... IS NULL` clause. SQLite, for example, has only supported `RIGHT JOIN` and `FULL OUTER JOIN` since version 3.39.0; older versions offer only inner, left and cross joins.
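These equivalences can be sketched with `LEFT JOIN` alone, which works in every SQLite version. The tables and rows below are hypothetical and live in an in-memory database: one league has no country and one country has no league.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
# Hypothetical tables: league 3 has no country, country 30 has no league
conn.executescript('''
CREATE TABLE Country(id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE League(id INTEGER PRIMARY KEY, country_id INTEGER, name TEXT);
INSERT INTO Country VALUES (10, 'Belgium'), (20, 'France'), (30, 'Spain');
INSERT INTO League VALUES (1, 10, 'League A'), (2, 20, 'League B'), (3, NULL, 'League C');
''')

# "Right join" of League with Country, written as a LEFT JOIN with the tables swapped
right_like = conn.execute('''
SELECT l.name, c.name
FROM Country AS c
LEFT JOIN League AS l ON l.country_id = c.id
ORDER BY c.id
''').fetchall()

# Exclusive variant: keep only the countries that found no league
only_countries = conn.execute('''
SELECT c.name
FROM Country AS c
LEFT JOIN League AS l ON l.country_id = c.id
WHERE l.id IS NULL
''').fetchall()

print(right_like)
print(only_countries)
conn.close()
```

In the first result, the country without a league is kept and its league name is `None` (SQL `NULL`); the second query keeps only that unmatched country.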

Let’s see the differences in detail with SQLite.

In [5]:
conn_sdb = sqlite3.connect('data/european-soccer.sqlite')
c_sdb = conn_sdb.cursor()

### Inner Join

The default join. It only returns the rows where a foreign key matches a primary key. Rows in A or B without a counterpart in the other table are ignored.

Reminder of the column names of `Country`:

In [13]:
query = '''
PRAGMA table_info(Country);
'''

exe(c_sdb, query)

(0, 'id', 'INTEGER', 0, None, 1)
(1, 'name', 'TEXT', 0, None, 0)


In [9]:
query = 'SELECT * FROM Country;'
exe(c_sdb, query)

(1, 'Belgium')
(1729, 'England')
(4769, 'France')
(7809, 'Germany')
(10257, 'Italy')
(13274, 'Netherlands')
(15722, 'Poland')
(17642, 'Portugal')
(19694, 'Scotland')
(21518, 'Spain')
(24558, 'Switzerland')


Reminder of the column names of `League`:

In [15]:
query = '''
PRAGMA table_info(League);
'''

exe(c_sdb, query)

(0, 'id', 'INTEGER', 0, None, 1)
(1, 'country_id', 'INTEGER', 0, None, 0)
(2, 'name', 'TEXT', 0, None, 0)


In [11]:
query = 'SELECT * FROM League'
exe(c_sdb, query)

(1, 1, 'Belgium Jupiler League')
(1729, 1729, 'England Premier League')
(4769, 4769, 'France Ligue 1')
(7809, 7809, 'Germany 1. Bundesliga')
(10257, 10257, 'Italy Serie A')
(13274, 13274, 'Netherlands Eredivisie')
(15722, 15722, 'Poland Ekstraklasa')
(17642, 17642, 'Portugal Liga ZON Sagres')
(19694, 19694, 'Scotland Premier League')
(21518, 21518, 'Spain LIGA BBVA')
(24558, 24558, 'Switzerland Super League')


Execution of the inner join to show which League takes place in which country :

In [27]:
inner_join = '''
SELECT l.name, c.name
FROM League AS l
JOIN Country AS c ON l.country_id = c.id;
'''
exe(c_sdb, inner_join)

('Belgium Jupiler League', 'Belgium')
('England Premier League', 'England')
('France Ligue 1', 'France')
('Germany 1. Bundesliga', 'Germany')
('Italy Serie A', 'Italy')
('Netherlands Eredivisie', 'Netherlands')
('Poland Ekstraklasa', 'Poland')
('Portugal Liga ZON Sagres', 'Portugal')
('Scotland Premier League', 'Scotland')
('Spain LIGA BBVA', 'Spain')
('Switzerland Super League', 'Switzerland')


A country without an associated league would not have been listed here (the same goes for a league without a country, but it would be strange to have a league not associated with any country). Let’s build a table `League_null` with missing countries. It will be an excuse to see how to copy a table and how to change some values.

1. Create a new table
2. Copy the values by selecting them from the original table

In [20]:
create = '''
CREATE TABLE IF NOT EXISTS League_null
(
Id INTEGER,
Country_id INTEGER,
Name TEXT
);
'''
exe(c_sdb, create)
conn_sdb.commit()

copy = '''
INSERT INTO League_null
(
Id,
Country_id,
Name
)

SELECT Id, Country_id, Name
FROM League;
'''
exe(c_sdb, copy)
conn_sdb.commit()

In [22]:
show = 'SELECT * FROM League_null'
exe(c_sdb, show)

(1, 1, 'Belgium Jupiler League')
(1729, 1729, 'England Premier League')
(4769, 4769, 'France Ligue 1')
(7809, 7809, 'Germany 1. Bundesliga')
(10257, 10257, 'Italy Serie A')
(13274, 13274, 'Netherlands Eredivisie')
(15722, 15722, 'Poland Ekstraklasa')
(17642, 17642, 'Portugal Liga ZON Sagres')
(19694, 19694, 'Scotland Premier League')
(21518, 21518, 'Spain LIGA BBVA')
(24558, 24558, 'Switzerland Super League')


3. Modify some values (we set 2 values of `Country_id` to `NULL`):

In [24]:
modify = '''
UPDATE League_null
SET Country_id = NULL
WHERE Id = 1729 OR Id = 4769
ORDER BY Id
LIMIT -1;
'''
exe(c_sdb, modify)
conn_sdb.commit()
exe(c_sdb, show)

(1, 1, 'Belgium Jupiler League')
(1729, None, 'England Premier League')
(4769, None, 'France Ligue 1')
(7809, 7809, 'Germany 1. Bundesliga')
(10257, 10257, 'Italy Serie A')
(13274, 13274, 'Netherlands Eredivisie')
(15722, 15722, 'Poland Ekstraklasa')
(17642, 17642, 'Portugal Liga ZON Sagres')
(19694, 19694, 'Scotland Premier League')
(21518, 21518, 'Spain LIGA BBVA')
(24558, 24558, 'Switzerland Super League')


* `UPDATE <table name>`: this keyword introduces a modification of values in the named table
* `SET <column_1> = <new_value_1>, <column_2> = <new_value_2>`: we set the new values (several columns can be updated at once)
* `WHERE <conditions>`: selects the rows to update according to our conditions; without it, every row of the table is updated
* `ORDER BY <column>`: optional, and only meaningful together with `LIMIT`: as we don’t know in which order the rows are stored in the database, it fixes the order in which they are processed
* `LIMIT <number of lines>`: optional; caps the number of rows that will be processed. With a negative value there is no cap: every row that meets the conditions is updated. On `UPDATE`, `ORDER BY` and `LIMIT` are only accepted by SQLite builds compiled with the `SQLITE_ENABLE_UPDATE_DELETE_LIMIT` option; elsewhere, drop them and rely on `WHERE`.
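Here is a minimal sketch of an `UPDATE` that relies on `WHERE` alone, a form that every SQLite build accepts. It uses an in-memory database and a hypothetical `League_demo` table; `cursor.rowcount` reports how many rows were modified.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE League_demo(Id INTEGER, Country_id INTEGER, Name TEXT)')
conn.executemany('INSERT INTO League_demo VALUES (?, ?, ?)',
                 [(1, 1, 'League A'), (2, 2, 'League B'), (3, 3, 'League C')])

cur = conn.cursor()
# WHERE alone selects the rows to modify; the ids are passed as parameters
cur.execute('UPDATE League_demo SET Country_id = NULL WHERE Id IN (?, ?)', (2, 3))
updated = cur.rowcount  # number of rows modified by the UPDATE
conn.commit()

remaining = conn.execute(
    'SELECT Id FROM League_demo WHERE Country_id IS NOT NULL').fetchall()
print(updated, remaining)
conn.close()
```

Checking `rowcount` after an `UPDATE` is a cheap safeguard: a value different from the expected one usually means the `WHERE` clause matched more or fewer rows than intended.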

Exercise: make an inner join between the `Country` table and the `League_null` table. Note the difference: which rows disappeared?

In [30]:
# Your code here


### Left/Right Join

### Outer join

### Self join

## Sub-query : `WITH`

## Conditions (2) : `CASE WHEN`

## Window Function (`OVER`)

## Other SQL keywords





### `UPDATE`

### `DELETE`

### `UNION`

### String functions

## Exercises