<a href="https://colab.research.google.com/github/UniVR-DH/DBMS-course/blob/main/notebooks/lab03-duckdb.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# SQL exercise with DuckDB in Jupyter Notebooks
In this notebook we use DuckDB as a DBMS, plus we use some plugins to simplify your way to run SQL queries.

## Library Import and Configuration

In [1]:
!pip install --quiet duckdb
!pip install --quiet jupysql
!pip install --quiet duckdb-engine
!pip install --quiet pandas

In [2]:
import duckdb
import pandas as pd
# Import jupysql Jupyter extension to create SQL cells
# this avoids the need to run SQL in python
%load_ext sql

**We configure jupysql to return data as a Pandas dataframe and have less verbose output**

In [3]:
%config SqlMagic.autopandas = True
%config SqlMagic.feedback = False
%config SqlMagic.displaycon = False

## Initialize the Database

In [4]:
# Run DuckDB in main memory, remember to export to file
%sql duckdb:///:memory:
# If we want to save the DB to file we can use the following,
# but then we need to remember to download the file
# %sql duckdb:///myfile.db

An entire Jupyter cell can be used as a SQL cell by placing `%%sql` at the start of the cell. Query results will be displayed as a Pandas DF.

In [5]:
%%sql
SELECT 1=2 as test, 'Hello people' as message, 3*12345 as math  ;

Unnamed: 0,test,message,math
0,False,Hello people,37035


**We can use any CSV file**, we can add it to jupyter or download it from the web

In [6]:
!wget  -O irpef.regione.2024.csv https://www1.finanze.gov.it/finanze/analisi_stat/public/v_4_0_0/contenuti/REG_calcolo_irpef_2024.csv?d=1615465800
!wget  -O irpef.sesso.2024.csv https://www1.finanze.gov.it/finanze/analisi_stat/public/v_4_0_0/contenuti/sesso_calcolo_irpef_2024.csv?d=1615465800

--2025-11-20 12:39:33--  https://www1.finanze.gov.it/finanze/analisi_stat/public/v_4_0_0/contenuti/REG_calcolo_irpef_2024.csv?d=1615465800
Resolving www1.finanze.gov.it (www1.finanze.gov.it)... 217.175.52.178
Connecting to www1.finanze.gov.it (www1.finanze.gov.it)|217.175.52.178|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 194624 (190K) [text/csv]
Saving to: ‘irpef.regione.2024.csv’


2025-11-20 12:39:34 (370 KB/s) - ‘irpef.regione.2024.csv’ saved [194624/194624]

--2025-11-20 12:39:34--  https://www1.finanze.gov.it/finanze/analisi_stat/public/v_4_0_0/contenuti/sesso_calcolo_irpef_2024.csv?d=1615465800
Resolving www1.finanze.gov.it (www1.finanze.gov.it)... 217.175.52.178
Connecting to www1.finanze.gov.it (www1.finanze.gov.it)|217.175.52.178|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 22044 (22K) [text/csv]
Saving to: ‘irpef.sesso.2024.csv’


2025-11-20 12:39:35 (168 KB/s) - ‘irpef.sesso.2024.csv’ saved [22044/22044]



In [7]:
%%sql
SELECT *
FROM read_csv('irpef.regione.2024.csv', header=true, auto_detect=true)
LIMIT 10;

Unnamed: 0,Classi di reddito complessivo in euro,Regione,Numero contribuenti,Reddito complessivo - Frequenza,Reddito complessivo - Ammontare in euro,Reddito complessivo al netto della cedolare secca - Frequenza,Reddito complessivo al netto della cedolare secca - Ammontare in euro,Deduzione per abitazione principale - Frequenza,Deduzione per abitazione principale - Ammontare in euro,Oneri deducibili - Frequenza,...,Differenza - Frequenza,Differenza - Ammontare in euro,Eccedenza d'imposta risultante dalla precedente dichiarazione - Frequenza,Eccedenza d'imposta risultante dalla precedente dichiarazione - Ammontare in euro,Acconti versati - Frequenza,Acconti versati - Ammontare in euro,Irpef a credito - Frequenza,Irpef a credito - Ammontare in euro,Irpef a debito - Frequenza,Irpef a debito - Ammontare in euro
0,minore di -1.000,Piemonte,152.0,152.0,-2.565.663,152.0,-2.625.873,69.0,47.662,75.0,...,76.0,-329.201,70.0,666.133,,,112.0,1.021.709,0.0,0.0
1,da -1.000 a 0,Piemonte,53.0,53.0,-22.715,53.0,-22.715,20.0,11.090,27.0,...,10.0,-10.968,23.0,77.873,,,28.0,89.321,0.0,0.0
2,zero,Piemonte,85.41,0.0,0,0.0,0,0.0,0,6.591,...,1.294,-1.358.567,5.675,5.337.157,1.2,1.992.348,7.59,8.714.771,59.0,29.348
3,da 0 a 1.000,Piemonte,141.686,141.686,60.396.511,141.053,59.633.501,54.997,21.474.821,9.824,...,16.53,-1.916.819,6.814,6.609.629,3.321,3.045.579,15.349,11.817.548,6.699,327.446
4,da 1.000 a 1.500,Piemonte,34.992,34.992,43.455.671,34.539,42.106.068,8.605,6.296.115,2.505,...,7.368,-634.617,1.595,1.705.394,1.384,698.141,5.388,3.151.579,2.669,252.451
5,da 1.500 a 2.000,Piemonte,29.656,29.656,51.894.397,29.146,49.440.257,5.633,3.805.535,1.977,...,6.657,-876.305,1.212,1.210.843,1.141,668.142,5.135,2.807.682,2.028,260.045
6,da 2.000 a 2.500,Piemonte,26.388,26.388,59.397.838,25.678,55.557.875,4.859,2.754.442,1.821,...,6.394,-939.782,1.222,1.238.735,1.169,728.478,5.161,2.920.728,1.796,281.736
7,da 2.500 a 3.000,Piemonte,24.353,24.353,67.017.265,23.729,62.444.522,4.496,2.260.772,1.714,...,6.144,-1.170.343,1.076,909.266,1.049,762.668,5.031,2.774.464,1.632,296.634
8,da 3.000 a 3.500,Piemonte,22.801,22.801,74.124.075,22.39,69.952.704,4.168,2.080.273,1.813,...,6.245,-1.479.297,1.092,1.000.044,1.014,688.844,5.297,3.073.033,1.512,311.43
9,da 3.500 a 4.000,Piemonte,22.045,22.045,82.709.166,21.404,77.079.073,4.106,1.916.134,1.928,...,6.489,-1.667.360,1.074,1.081.212,970.0,692.550,5.397,3.342.558,1.608,400.866


In [8]:
%%sql
SELECT *
FROM read_csv('irpef.sesso.2024.csv', header=true, auto_detect=true)
LIMIT 10;

Unnamed: 0,Classi di reddito complessivo in euro,Sesso,Numero contribuenti,Reddito complessivo - Frequenza,Reddito complessivo - Ammontare in euro,Reddito complessivo al netto della cedolare secca - Frequenza,Reddito complessivo al netto della cedolare secca - Ammontare in euro,Deduzione per abitazione principale - Frequenza,Deduzione per abitazione principale - Ammontare in euro,Oneri deducibili - Frequenza,...,Differenza - Frequenza,Differenza - Ammontare in euro,Eccedenza d'imposta risultante dalla precedente dichiarazione - Frequenza,Eccedenza d'imposta risultante dalla precedente dichiarazione - Ammontare in euro,Acconti versati - Frequenza,Acconti versati - Ammontare in euro,Irpef a credito - Frequenza,Irpef a credito - Ammontare in euro,Irpef a debito - Frequenza,Irpef a debito - Ammontare in euro
0,minore di -1.000,Maschi,1.542,1.542,-17.102.700,1.542,-17.580.273,633.0,490.148,757.0,...,857.0,-4.327.869,781.0,5.346.184,132.0,733.188,1.182,10.422.880,,
1,da -1.000 a 0,Maschi,620.0,620.0,-263.406,620.0,-304.120,203.0,119.149,227.0,...,199.0,-380.691,285.0,1.087.917,35.0,91.166,387.0,1.559.427,,
2,zero,Maschi,670.362,0.0,0,0.0,0,,,54.58,...,11.219,-11.847.552,46.769,53.444.288,9.993,17.454.036,63.276,83.031.191,539.0,325.796
3,da 0 a 1.000,Maschi,985.306,985.306,426.389.300,981.773,422.724.226,324.404,127.973.613,73.263,...,95.871,-16.278.952,51.312,58.610.090,20.758,24.179.017,108.252,100.235.347,30.623,1.725.834
4,da 1.000 a 1.500,Maschi,247.408,247.408,307.421.998,244.94,301.135.929,50.466,37.818.719,18.136,...,43.355,-4.756.572,11.643,13.201.800,8.414,5.670.715,35.789,24.084.423,13.102,1.312.454
5,da 1.500 a 2.000,Maschi,203.54,203.54,356.339.225,200.936,346.592.293,30.026,20.503.816,13.899,...,40.351,-6.307.929,8.878,10.066.952,6.653,4.506.608,33.859,21.008.991,10.381,1.494.586
6,da 2.000 a 2.500,Maschi,177.689,177.689,400.180.779,174.587,385.941.091,23.595,13.676.228,12.526,...,39.721,-8.040.480,8.024,9.191.017,5.722,3.963.986,33.986,20.882.382,9.332,1.628.892
7,da 2.500 a 3.000,Maschi,164.554,164.554,453.110.419,161.157,434.547.619,21.88,11.455.524,12.037,...,40.398,-9.609.673,7.602,8.221.015,5.816,4.359.743,34.429,21.466.600,9.493,1.939.041
8,da 3.000 a 3.500,Maschi,151.461,151.461,492.558.279,149.386,474.556.252,20.857,10.105.001,11.985,...,41.951,-11.195.955,7.176,7.928.412,5.748,4.375.089,35.852,22.775.087,9.5,2.125.362
9,da 3.500 a 4.000,Maschi,146.523,146.523,549.653.236,143.079,523.599.921,20.194,9.508.470,12.066,...,43.489,-12.834.961,7.402,7.989.632,5.788,4.490.835,37.103,24.309.981,9.902,2.431.176


In [9]:
%%sql
-- In case of previous errors
ROLLBACK;

-- Create sequence for auto-increment
CREATE SEQUENCE classe_id_seq START 1;
CREATE SEQUENCE sesso_id_seq START 1;
CREATE SEQUENCE regione_id_seq START 1;

-- In case of previous errors
DROP TABLE IF EXISTS classe_codes;
DROP TABLE IF EXISTS sesso_codes;
DROP TABLE IF EXISTS regione_codes;


-- Create lookup tables with auto-increment ID
CREATE TABLE IF NOT EXISTS classe_codes (
    id INTEGER PRIMARY KEY DEFAULT nextval('classe_id_seq'),
    classe_name VARCHAR(255) UNIQUE
);

CREATE TABLE IF NOT EXISTS sesso_codes (
    id INTEGER PRIMARY KEY DEFAULT nextval('sesso_id_seq'),
    sesso_name VARCHAR(255) UNIQUE
);

CREATE TABLE IF NOT EXISTS regione_codes (
    id INTEGER PRIMARY KEY DEFAULT nextval('regione_id_seq'),
    regione_name VARCHAR(255) UNIQUE
);

-- Extract distinct names from CSV with auto-increment
INSERT INTO classe_codes (classe_name)
    SELECT DISTINCT "Classi di reddito complessivo in euro"
    FROM read_csv('irpef.sesso.2024.csv', header=true)
    ORDER BY "Classi di reddito complessivo in euro";


INSERT INTO sesso_codes (sesso_name)
    SELECT DISTINCT "Sesso"
    FROM read_csv('irpef.sesso.2024.csv', header=true)
    ORDER BY "Sesso";


INSERT INTO regione_codes (regione_name)
    SELECT DISTINCT "Regione"
    FROM read_csv('irpef.regione.2024.csv', header=true)
    ORDER BY "Regione";




SELECT * FROM sesso_codes;

Unnamed: 0,id,sesso_name
0,1,Femmine
1,2,Maschi


In [10]:
%%sql
-- Show the generated IDs
SELECT * FROM classe_codes;

Unnamed: 0,id,classe_name
0,1,da -1.000 a 0
1,2,da 0 a 1.000
2,3,da 1.000 a 1.500
3,4,da 1.500 a 2.000
4,5,da 10.000 a 12.000
5,6,da 100.000 a 120.000
6,7,da 12.000 a 15.000
7,8,da 120.000 a 150.000
8,9,da 15.000 a 20.000
9,10,da 150.000 a 200.000


In [11]:
%%sql
-- Add min_value and max_value columns
ALTER TABLE classe_codes ADD COLUMN min_value BIGINT;
ALTER TABLE classe_codes ADD COLUMN max_value BIGINT;

-- Update for standard range format: "da X a Y"
UPDATE classe_codes
SET
    min_value = CAST(REPLACE(REGEXP_EXTRACT(classe_name, 'da (-?[0-9.]+)', 1), '.', '') AS BIGINT),
    max_value = CAST(REPLACE(REGEXP_EXTRACT(classe_name, 'a (-?[0-9.]+)$', 1), '.', '') AS BIGINT)
WHERE classe_name LIKE 'da % a %';

-- Update for "minore di -1.000"
UPDATE classe_codes
SET
    min_value = NULL,
    max_value = CAST(REPLACE(REGEXP_EXTRACT(classe_name, 'minore di (-?[0-9.]+)', 1), '.', '') AS BIGINT)
WHERE classe_name LIKE 'minore di %';

-- Update for "oltre X" format
UPDATE classe_codes
SET
    min_value = CAST(REPLACE(REGEXP_EXTRACT(classe_name, 'oltre ([0-9.]+)', 1), '.', '') AS BIGINT),
    max_value = NULL
WHERE classe_name LIKE 'oltre %';

-- Update for "zero" or exact value
UPDATE classe_codes
SET
    min_value = 0,
    max_value = 0
WHERE classe_name = 'zero';

Unnamed: 0,Success


In [13]:
%%sql
-- Show the generated IDs
SELECT * FROM classe_codes ORDER BY min_value;

Unnamed: 0,id,classe_name,min_value,max_value
0,1,da -1.000 a 0,-1000.0,0.0
1,2,da 0 a 1.000,0.0,1000.0
2,34,zero,0.0,0.0
3,3,da 1.000 a 1.500,1000.0,1500.0
4,4,da 1.500 a 2.000,1500.0,2000.0
5,11,da 2.000 a 2.500,2000.0,2500.0
6,12,da 2.500 a 3.000,2500.0,3000.0
7,17,da 3.000 a 3.500,3000.0,3500.0
8,18,da 3.500 a 4.000,3500.0,4000.0
9,20,da 4.000 a 5.000,4000.0,5000.0


In [42]:
%%sql
ROLLBACK;
SELECT classe_name, REPLACE(REGEXP_EXTRACT(classe_name, 'da ([0-9.]+)', 1), '.', ''), REPLACE(REGEXP_EXTRACT(classe_name, 'a ([0-9.]+)$', 1), '.', '')
FROM classe_codes
WHERE classe_name LIKE 'da % a %';


Unnamed: 0,classe_name,"""replace""(regexp_extract(classe_name, 'da ([0-9.]+)', 1), '.', '')","""replace""(regexp_extract(classe_name, 'a ([0-9.]+)$', 1), '.', '')"
0,da -1.000 a 0,,0
1,da 0 a 1.000,0.0,1000
2,da 1.000 a 1.500,1000.0,1500
3,da 1.500 a 2.000,1500.0,2000
4,da 10.000 a 12.000,10000.0,12000
5,da 100.000 a 120.000,100000.0,120000
6,da 12.000 a 15.000,12000.0,15000
7,da 120.000 a 150.000,120000.0,150000
8,da 15.000 a 20.000,15000.0,20000
9,da 150.000 a 200.000,150000.0,200000


In [12]:
%%sql
-- Show the generated IDs
SELECT * FROM regione_codes;

Unnamed: 0,id,regione_name
0,1,Abruzzo
1,2,Basilicata
2,3,Calabria
3,4,Campania
4,5,Emilia Romagna
5,6,Friuli Venezia Giulia
6,7,Lazio
7,8,Liguria
8,9,Lombardia
9,10,Mancante/errata


In [14]:
%%sql
ROLLBACK;
-- Drop the table to clear the aborted state and old data
DROP TABLE IF EXISTS irpef_reg;

CREATE TABLE IF NOT EXISTS irpef_reg (
    classe_id INTEGER,
    regione_id INTEGER,
    contribuenti BIGINT,
    reddito BIGINT,
    PRIMARY KEY (classe_id,regione_id),
    FOREIGN KEY (classe_id) REFERENCES classe_codes(id),
    FOREIGN KEY (regione_id) REFERENCES regione_codes(id)
);

INSERT INTO irpef_reg (classe_id, regione_id, contribuenti, reddito)
    SELECT
    cc.id,
    rc.id,
    CAST(REPLACE("Numero contribuenti", '.', '') AS BIGINT),
    CAST(REPLACE("Reddito complessivo - Ammontare in euro", '.', '') AS BIGINT),
    FROM read_csv('irpef.regione.2024.csv', header=true) csv
       INNER JOIN classe_codes cc ON csv."Classi di reddito complessivo in euro" = cc.classe_name
       INNER JOIN regione_codes rc ON csv."Regione" = rc.regione_name;

SELECT * FROM irpef_reg;

Unnamed: 0,classe_id,regione_id,contribuenti,reddito
0,15,13,152,-2565663
1,18,13,53,-22715
2,17,13,85410,0
3,19,13,141686,60396511
4,20,13,34992,43455671
...,...,...,...,...
743,23,10,18,1968045
744,25,10,10,1398303
745,27,10,13,2267911
746,31,10,14,3549434


In [17]:
%%sql
ROLLBACK;
-- Drop the table to clear the aborted state and old data
DROP TABLE IF EXISTS irpef_sex;

CREATE TABLE IF NOT EXISTS irpef_sex (
    classe_id INTEGER,
    sesso_id INTEGER,
    contribuenti BIGINT,
    reddito BIGINT,
    PRIMARY KEY (classe_id,sesso_id),
    FOREIGN KEY (classe_id) REFERENCES classe_codes(id),
    FOREIGN KEY (sesso_id) REFERENCES sesso_codes(id)
);

INSERT INTO irpef_sex (classe_id, sesso_id, contribuenti, reddito)
    SELECT
    cc.id,
    sc.id,
    CAST(REPLACE("Numero contribuenti", '.', '') AS BIGINT),
    CAST(REPLACE("Reddito complessivo - Ammontare in euro", '.', '') AS BIGINT),
    FROM read_csv('irpef.sesso.2024.csv', header=true) csv
       INNER JOIN classe_codes cc ON csv."Classi di reddito complessivo in euro" = cc.classe_name
       INNER JOIN sesso_codes sc ON csv."Sesso" = sc.sesso_name;

SELECT * FROM irpef_sex;

Unnamed: 0,classe_id,sesso_id,contribuenti,reddito
0,15,1,1542,-17102700
1,18,1,620,-263406
2,17,1,670362,0
3,19,1,985306,426389300
4,20,1,247408,307421998
...,...,...,...,...
63,23,2,61362,6680643546
64,25,2,42981,5716927324
65,27,2,28322,4839525975
66,31,2,15578,3695921516


In [23]:
%%sql
SELECT regione_name, classe_name, reddito, contribuenti FROM irpef_reg i
JOIN regione_codes rc ON i.regione_id = rc.id
JOIN classe_codes cc ON i.classe_id = cc.id
ORDER BY contribuenti DESC LIMIT 5 ;


Unnamed: 0,regione_name,classe_name,reddito,contribuenti
0,Lombardia,da 20.000 a 26.000,30126521933,1308594
1,Lombardia,da 15.000 a 20.000,15283334501,868094
2,Lombardia,da 29.000 a 35.000,27541910667,866816
3,Veneto,da 20.000 a 26.000,16165822218,701585
4,Emilia Romagna,da 20.000 a 26.000,14498038846,629660


In [26]:
%%sql
SELECT 'Somma Regioni:', SUM(contribuenti) FROM irpef_reg
UNION
SELECT 'Somma Sesso:', SUM(contribuenti) FROM irpef_sex;

Unnamed: 0,'Somma Regioni:',sum(contribuenti)
0,Somma Sesso:,42570078.0
1,Somma Regioni:,42570059.0


In [None]:
%%sql
SELECT 'Somma Regioni:', SUM(contribuenti) FROM irpef_reg WHERE classe_id = 1
UNION
SELECT 'Somma Sesso:', SUM(contribuenti) FROM irpef_sex;

----------------

## Different File: Reviews

In [27]:
!wget https://gist.github.com/mosesvemana/f9868d6d2980b39bf8bf5287a28c7d21/raw/d6ba88f7952370582ecc206d47c4fd0d5448ae20/reviews.csv

--2025-11-20 11:41:43--  https://gist.github.com/mosesvemana/f9868d6d2980b39bf8bf5287a28c7d21/raw/d6ba88f7952370582ecc206d47c4fd0d5448ae20/reviews.csv
Resolving gist.github.com (gist.github.com)... 140.82.112.3
Connecting to gist.github.com (gist.github.com)|140.82.112.3|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://gist.githubusercontent.com/mosesvemana/f9868d6d2980b39bf8bf5287a28c7d21/raw/d6ba88f7952370582ecc206d47c4fd0d5448ae20/reviews.csv [following]
--2025-11-20 11:41:43--  https://gist.githubusercontent.com/mosesvemana/f9868d6d2980b39bf8bf5287a28c7d21/raw/d6ba88f7952370582ecc206d47c4fd0d5448ae20/reviews.csv
Resolving gist.githubusercontent.com (gist.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to gist.githubusercontent.com (gist.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1277892 (1.2M) [text/plain]
Saving to: ‘re

In [28]:
%%sql
SELECT * FROM read_csv('reviews.csv') LIMIT 10 ;

Unnamed: 0,listing_id,id,date,reviewer_id,reviewer_name,comments
0,7202016,38917982,2015-07-19,28943674,Bianca,Cute and cozy place. Perfect location to every...
1,7202016,39087409,2015-07-20,32440555,Frank,Kelly has a great room in a very central locat...
2,7202016,39820030,2015-07-26,37722850,Ian,"Very spacious apartment, and in a great neighb..."
3,7202016,40813543,2015-08-02,33671805,George,Close to Seattle Center and all it has to offe...
4,7202016,41986501,2015-08-10,34959538,Ming,Kelly was a great host and very accommodating ...
5,7202016,43979139,2015-08-23,1154501,Barent,"Kelly was great, place was great, just what I ..."
6,7202016,45265631,2015-09-01,37853266,Kevin,Kelly was great! Very nice and the neighborhoo...
7,7202016,46749120,2015-09-13,24445447,Rick,hola all bnb erz - Just left Seattle where I h...
8,7202016,47783346,2015-09-21,249583,Todd,Kelly's place is conveniently located on a qui...
9,7202016,48388999,2015-09-26,38110731,Tatiana,"The place was really nice, clean, and the most..."


## Move some data inside a table

In [29]:
%%sql
CREATE TABLE reviewer (
    rid BIGINT PRIMARY KEY,
    rname VARCHAR(255)
);

INSERT INTO reviewer (rid, rname)
    SELECT DISTINCT reviewer_id as 'rid', reviewer_name as 'rname'
    FROM read_csv('reviews.csv');

Unnamed: 0,Success


In [31]:
%%sql
SELECT * FROM reviewer ORDER BY rid LIMIT 10;

Unnamed: 0,rid,rname
0,2543,Mike And Fabian
1,9763,Taylor
2,12793,Kelly
3,15174,Scott
4,17196,Kawika
5,19457,Ron
6,26098,Jonathan
7,37709,Seh
8,38157,Annie
9,41555,Craig


In [32]:
%%sql
DROP TABLE IF EXISTS review;
CREATE TABLE review (
    review_id BIGINT PRIMARY KEY,
    apartment_id BIGINT,
    reviewer_id BIGINT,
    date DATE,
    comment TEXT
);

INSERT INTO review (review_id, apartment_id, reviewer_id, date, comment)
    SELECT  id, listing_id, reviewer_id, date, comments
    FROM read_csv('reviews.csv');

Unnamed: 0,Success


In [33]:
%%sql
SELECT COUNT(*) FROM review;

Unnamed: 0,count_star()
0,3044


In [34]:
%%sql
SELECT * FROM review LIMIT 10;

Unnamed: 0,review_id,apartment_id,reviewer_id,date,comment
0,38917982,7202016,28943674,2015-07-19,Cute and cozy place. Perfect location to every...
1,39087409,7202016,32440555,2015-07-20,Kelly has a great room in a very central locat...
2,39820030,7202016,37722850,2015-07-26,"Very spacious apartment, and in a great neighb..."
3,40813543,7202016,33671805,2015-08-02,Close to Seattle Center and all it has to offe...
4,41986501,7202016,34959538,2015-08-10,Kelly was a great host and very accommodating ...
5,43979139,7202016,1154501,2015-08-23,"Kelly was great, place was great, just what I ..."
6,45265631,7202016,37853266,2015-09-01,Kelly was great! Very nice and the neighborhoo...
7,46749120,7202016,24445447,2015-09-13,hola all bnb erz - Just left Seattle where I h...
8,47783346,7202016,249583,2015-09-21,Kelly's place is conveniently located on a qui...
9,48388999,7202016,38110731,2015-09-26,"The place was really nice, clean, and the most..."


In [35]:
%%sql
SELECT date, comment
FROM review
WHERE comment LIKE '%pool%';

Unnamed: 0,date,comment
0,2015-05-26,Kirsten's home is lovely. We had access to a n...
1,2014-05-28,The apartment was very nice and luxurious. It ...
2,2015-05-21,"Jordan & Stay Alfred provided thorough, detail..."
3,2015-11-27,We had a great trip and loved this condo and l...
4,2014-11-24,The room was as described and the view was won...


In [36]:
%%sql
SELECT COUNT(*)
FROM review
WHERE date BETWEEN  '2015-07-01' AND  '2015-07-31';

Unnamed: 0,count_star()
0,231


In SQL we can use some special functions to process the values in the columns,
for example see `strftime` which works with date/time and how to extract values

https://duckdb.org/docs/sql/functions/dateformat.html

In [37]:
%%sql
SELECT strftime('%Y', date) AS review_year, COUNT(*) AS review_count
FROM review
GROUP BY review_year
ORDER BY review_year DESC;

Unnamed: 0,review_year,review_count
0,2016,6
1,2015,1932
2,2014,650
3,2013,234
4,2012,183
5,2011,39


## Exercises

### Q1: Find the number of different apartments with a review

### Q2: Find the date of the first review written

### Q3: Find the number of apartments with more than 1 review

### Q4: Find the names of all reviewers with more than 3 review

__Can you use a nested query?__

### Q5: Find the user that has written the largest number of reviews

### Q6: Find the top-5 apartments with the largest number of reviews


## We can export to file and also load from file
**Note:** DuckDB export 3 files:

1. A file to create the scehma
2. One or more files containing the data
3. A file to load the data

In [None]:
%%sql
EXPORT DATABASE 'reviews_db';

Unnamed: 0,Success


In [None]:
%%sql
DROP TABLE IF EXISTS reviewer;
DROP TABLE IF EXISTS review;
IMPORT DATABASE 'reviews_db';

Unnamed: 0,Success
