### **Basic SQL**

<font color="red">File access required:</font> In Colab this notebook requires first uploading files **Cities.csv**, **Countries.csv**, **Players.csv**, and **Teams.csv** using the *Files* feature in the left toolbar. If running the notebook on a local computer, simply ensure these files are in the same workspace as the notebook.

In [1]:
!pip install prettytable==0.7.2
!pip install ipython-sql


In [2]:
# Set-up
%load_ext sql
%sql sqlite://
import pandas as pd

In [3]:
# Create database tables from CSV files
with open('Cities.csv') as f: Cities = pd.read_csv(f, index_col=0)
%sql drop table if exists Cities;
%sql --persist Cities


with open('Countries.csv') as f: Countries = pd.read_csv(f, index_col=0)
%sql drop table if exists Countries;
%sql --persist Countries

 * sqlite://
Done.
 * sqlite://
 * sqlite://
Done.
 * sqlite://


'Persisted countries'

#### Look at sample of Cities and Countries tables

In [4]:
%%sql
select * from Cities limit 5

 * sqlite://
Done.


city,country,latitude,longitude,temperature
Aalborg,Denmark,57.03,9.92,7.52
Aberdeen,United Kingdom,57.17,-2.08,8.1
Abisko,Sweden,63.35,18.83,0.2
Adana,Turkey,36.99,35.32,18.67
Albacete,Spain,39.0,-1.87,12.62


In [5]:
%%sql
select * from Countries limit 5

 * sqlite://
Done.


country,population,EU,coastline
Albania,2.9,no,yes
Andorra,0.07,no,no
Austria,8.57,yes,no
Belarus,9.48,no,no
Belgium,11.37,yes,yes


### Basic Select statement
Select columns  
From tables  
Where condition  

*Find all countries not in the EU*

In [6]:
%%sql
select country
from Countries
where EU = 'no'

 * sqlite://
Done.


country
Albania
Andorra
Belarus
Bosnia and Herzegovina
Iceland
Kosovo
Liechtenstein
Macedonia
Moldova
Montenegro


*Find all cities with temperature between -5 and 5; return city, country, and temperature*

In [7]:
%%sql
select city, country, temperature
from Cities
where temperature > -5 and temperature < 5

 * sqlite://
Done.


city,country,temperature
Abisko,Sweden,0.2
Augsburg,Germany,4.54
Bergen,Norway,1.75
Bodo,Norway,4.5
Helsinki,Finland,4.19
Innsbruck,Austria,4.54
Kiruna,Sweden,-2.2
Orsha,Belarus,4.93
Oslo,Norway,2.32
Oulu,Finland,1.45


### Ordering

*Modify previous query to sort by temperature*

In [8]:
%%sql
select city, country, temperature
from Cities
where temperature > -5 and temperature < 5
order by temperature

 * sqlite://
Done.


city,country,temperature
Kiruna,Sweden,-2.2
Abisko,Sweden,0.2
Oulu,Finland,1.45
Bergen,Norway,1.75
Oslo,Norway,2.32
Tampere,Finland,3.59
Uppsala,Sweden,4.17
Helsinki,Finland,4.19
Tartu,Estonia,4.36
Bodo,Norway,4.5


*Modify previous query to sort by country, then temperature descending*

In [9]:
%%sql
select city, country, temperature
from Cities
where temperature > -5 and temperature < 5
order by country ASC, temperature DESC

 * sqlite://
Done.


city,country,temperature
Salzburg,Austria,4.62
Innsbruck,Austria,4.54
Orsha,Belarus,4.93
Tallinn,Estonia,4.82
Tartu,Estonia,4.36
Turku,Finland,4.72
Helsinki,Finland,4.19
Tampere,Finland,3.59
Oulu,Finland,1.45
Augsburg,Germany,4.54


### <font color = 'green'>**Your Turn**</font>

*Find all countries with no coastline and with population > 9. Return the country and population, in descending order of population.*

In [11]:
%%sql
SELECT country, population
FROM Countries
WHERE coastline = 'no'
  AND population > 9
ORDER BY population DESC;

 * sqlite://
Done.


country,population
Czech Republic,10.55
Hungary,9.82
Belarus,9.48


### Multiple tables in From clause - Joins

*Find all cities with longitude < 10 not in the EU, return city and longitude*

In [12]:
Cities.head(2)  # pandas DataFrame method (Python, not SQL)

city,country,latitude,longitude,temperature
Aalborg,Denmark,57.03,9.92,7.52
Aberdeen,United Kingdom,57.17,-2.08,8.1


In [13]:
Countries.head(2)

country,population,EU,coastline
Albania,2.9,no,yes
Andorra,0.07,no,no


In [14]:
%%sql
select city, longitude
from Cities, Countries -- two tables
where Cities.country = Countries.country -- join condition: match rows from the two tables
and longitude < 10 and EU = 'no' -- filter conditions

-- In SQL, "--" starts a comment

 * sqlite://
Done.


city,longitude
Andorra,1.52
Basel,7.59
Bergen,5.32
Geneva,6.14
Stavanger,5.68
Zurich,8.56


*Modify previous query to also return country (error then fix)*

In [18]:
%%sql
SELECT c.city,
       c.longitude,
       co.country
FROM Cities c
JOIN Countries co
  ON c.country = co.country
WHERE c.longitude < 10
  AND co.EU = 'no';

 * sqlite://
Done.


city,longitude,country
Andorra,1.52,Andorra
Basel,7.59,Switzerland
Bergen,5.32,Norway
Geneva,6.14,Switzerland
Stavanger,5.68,Norway
Zurich,8.56,Switzerland
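
The "error then fix" step can be reproduced outside the notebook. The sketch below is a minimal illustration using Python's built-in `sqlite3` module rather than the `%%sql` magic; the two tiny tables are stand-ins (an assumption) holding rows copied from the outputs above. Asking for an unqualified `country` fails because both tables have a column with that name, and qualifying it as `Cities.country` fixes the query.

```python
import sqlite3

# Stand-in tables holding two rows copied from the outputs in this notebook.
conn = sqlite3.connect(":memory:")
conn.execute("create table Cities (city text, country text, longitude real)")
conn.execute("create table Countries (country text, EU text)")
conn.executemany(
    "insert into Cities values (?, ?, ?)",
    [("Basel", "Switzerland", 7.59), ("Aalborg", "Denmark", 9.92)],
)
conn.executemany(
    "insert into Countries values (?, ?)",
    [("Switzerland", "no"), ("Denmark", "yes")],
)

# Shared From/Where clauses of the query being modified.
from_where = """
from Cities, Countries
where Cities.country = Countries.country
and longitude < 10 and EU = 'no'
"""

# "country" exists in both tables, so the unqualified name is rejected.
try:
    conn.execute("select city, longitude, country" + from_where)
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)

# Qualifying the column with its table name resolves the ambiguity.
fixed_rows = conn.execute(
    "select city, longitude, Cities.country" + from_where
).fetchall()

print(error_message)
print(fixed_rows)
```

Either `Cities.country` or `Countries.country` works here, since the join condition forces the two values to be equal.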


*Find all cities with latitude < 50 in a country with population < 5; return city, country, and population, sorted by country*

In [19]:
%%sql
select city, Cities.country, population
from Cities, Countries
where Cities.country = Countries.country
and latitude < 50 and population < 5
order by Cities.country

 * sqlite://
Done.


city,country,population
Elbasan,Albania,2.9
Andorra,Andorra,0.07
Sarajevo,Bosnia and Herzegovina,3.8
Rijeka,Croatia,4.23
Split,Croatia,4.23
Skopje,Macedonia,2.08
Balti,Moldova,4.06
Chisinau,Moldova,4.06
Podgorica,Montenegro,0.63
Ljubljana,Slovenia,2.07


#### Inner Join -- just FYI

*Same query as above*

In [20]:
%%sql
select city, Cities.country, population
from Cities inner join Countries
     on Cities.country = Countries.country -- condition of the INNER JOIN.
where latitude < 50 and population < 5
order by Cities.country

 * sqlite://
Done.


city,country,population
Elbasan,Albania,2.9
Andorra,Andorra,0.07
Sarajevo,Bosnia and Herzegovina,3.8
Rijeka,Croatia,4.23
Split,Croatia,4.23
Skopje,Macedonia,2.08
Balti,Moldova,4.06
Chisinau,Moldova,4.06
Podgorica,Montenegro,0.63
Ljubljana,Slovenia,2.07
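
To check that the comma-style join and the explicit `inner join` really are the same query, the sketch below runs both forms with Python's built-in `sqlite3` module and compares the results. The tables are tiny stand-ins with rows copied from outputs in this notebook, and the population threshold is raised to 9 (an assumption made only so the small sample returns rows).

```python
import sqlite3

# Stand-in tables with a few rows copied from this notebook's outputs.
conn = sqlite3.connect(":memory:")
conn.execute("create table Cities (city text, country text, latitude real)")
conn.execute("create table Countries (country text, population real)")
conn.executemany(
    "insert into Cities values (?, ?, ?)",
    [
        ("Graz", "Austria", 47.08),
        ("Vienna", "Austria", 48.2),
        ("Abisko", "Sweden", 63.35),
    ],
)
conn.executemany(
    "insert into Countries values (?, ?)",
    [("Austria", 8.57), ("Albania", 2.9)],
)

# Join written as a comma-separated From clause plus a Where condition.
comma_join = conn.execute("""
    select city, Cities.country, population
    from Cities, Countries
    where Cities.country = Countries.country
      and latitude < 50 and population < 9
    order by city
""").fetchall()

# The same join written with explicit inner join ... on syntax.
inner_join = conn.execute("""
    select city, Cities.country, population
    from Cities inner join Countries
         on Cities.country = Countries.country
    where latitude < 50 and population < 9
    order by city
""").fetchall()

print(comma_join == inner_join)
print(inner_join)
```

Abisko drops out in both forms: it fails the latitude filter, and Sweden has no row in the stand-in Countries table.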


### Select *

*Modify previous queries to return all columns*
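
As a rough sketch of what `select *` does on a join, the example below uses Python's built-in `sqlite3` module with one-row stand-in tables (rows copied from outputs in this notebook). `select *` returns every column of every table in the From clause, so `country` appears twice; `Cities.*` restricts the result to the columns of one table.

```python
import sqlite3

# One-row stand-in tables, values copied from this notebook's outputs.
conn = sqlite3.connect(":memory:")
conn.execute("create table Cities (city text, country text, latitude real)")
conn.execute("create table Countries (country text, population real)")
conn.execute("insert into Cities values (?, ?, ?)", ("Graz", "Austria", 47.08))
conn.execute("insert into Countries values (?, ?)", ("Austria", 8.57))

# All columns from both tables, in From-clause order.
all_columns = conn.execute("""
    select *
    from Cities, Countries
    where Cities.country = Countries.country
""").fetchall()

# Only the columns of Cities.
cities_only = conn.execute("""
    select Cities.*
    from Cities, Countries
    where Cities.country = Countries.country
""").fetchall()

print(all_columns)
print(cities_only)
```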

### <font color = 'green'>**Your Turn**</font>

*Find all cities with latitude > 45 in a country with no coastline and with population > 9. Return the city, country, latitude, and whether it's in the EU.*

In [21]:
%%sql
SELECT
    c.city,
    c.country,
    c.latitude,
    co.EU AS in_EU
FROM Cities c
JOIN Countries co ON c.country = co.country
WHERE c.latitude > 45
    AND co.coastline = 'no'
    AND co.population > 9
ORDER BY c.latitude DESC;

 * sqlite://
Done.


city,country,latitude,in_EU
Orsha,Belarus,54.52,no
Minsk,Belarus,53.9,no
Hrodna,Belarus,53.68,no
Pinsk,Belarus,52.13,no
Brest,Belarus,52.1,no
Mazyr,Belarus,52.05,no
Prague,Czech Republic,50.08,yes
Ostrava,Czech Republic,49.83,yes
Brno,Czech Republic,49.2,yes
Gyor,Hungary,47.7,yes


### Aggregation and Grouping

*Find the average temperature for all cities*

In [22]:
%%sql
select avg(temperature) as avgTemp
from Cities

 * sqlite://
Done.


avgTemp
9.497840375586858


*Modify previous query to find average temperature of cities with latitude > 55*

In [23]:
%%sql
select avg(temperature)
from Cities
where latitude > 55

 * sqlite://
Done.


avg(temperature)
4.985185185185185


*Modify previous query to also find minimum and maximum temperature of cities with latitude > 55*

In [24]:
%%sql
select min(temperature) as Min_val, max(temperature) as Max_val
from Cities
where latitude > 55

 * sqlite://
Done.


Min_val,Max_val
-2.2,8.6


*Modify previous query to return number of cities with latitude > 55*

In [33]:
%%sql
SELECT COUNT(*) AS number_of_cities
FROM Cities
WHERE latitude > 55;

 * sqlite://
Done.


number_of_cities
27


*Rename result column as northerns*
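
One possible way to do the rename is with `as`, sketched here using Python's built-in `sqlite3` module instead of the `%%sql` magic. The table is a stand-in holding only the five sample rows shown at the top of this notebook, so the count differs from the full data.

```python
import sqlite3

# Stand-in table: the five sample cities shown earlier (city, latitude only).
conn = sqlite3.connect(":memory:")
conn.execute("create table Cities (city text, latitude real)")
conn.executemany(
    "insert into Cities values (?, ?)",
    [
        ("Aalborg", 57.03),
        ("Aberdeen", 57.17),
        ("Abisko", 63.35),
        ("Adana", 36.99),
        ("Albacete", 39.0),
    ],
)

# "as northerns" names the result column.
cursor = conn.execute("""
    select count(*) as northerns
    from Cities
    where latitude > 55
""")
column_name = cursor.description[0][0]  # name of the first result column
(northerns,) = cursor.fetchone()

print(column_name, northerns)
```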



In [25]:
Cities.head(1)

city,country,latitude,longitude,temperature
Aalborg,Denmark,57.03,9.92,7.52


In [26]:
Countries.head(1)

country,population,EU,coastline
Albania,2.9,no,yes


*Find the minimum and maximum temperature of cities in the EU (then not in the EU)*

In [27]:
%%sql
select min(temperature), max(temperature)
from Cities, Countries
where Cities.country = Countries.country
and EU = 'no'

 * sqlite://
Done.


min(temperature),max(temperature)
1.75,18.67


### <font color = 'green'>**Your Turn**</font>

*Find the number of cities with latitude > 45 in countries with no coastline and with population > 9; also return the minimum and maximum latitude among those cities*

In [29]:
%%sql
SELECT
    COUNT(*) AS number_of_cities,
    MIN(c.latitude) AS min_latitude,
    MAX(c.latitude) AS max_latitude
FROM Cities c
JOIN Countries co ON c.country = co.country
WHERE c.latitude > 45
    AND co.coastline = 'no'
    AND co.population > 9;

 * sqlite://
Done.


number_of_cities,min_latitude,max_latitude
13,46.25,54.52


*Find the average temperature for each country*

In [28]:
%%sql
select country, avg(temperature)
from Cities
group by country

 * sqlite://
Done.


country,avg(temperature)
Albania,15.18
Andorra,9.6
Austria,6.144
Belarus,5.946666666666666
Belgium,9.65
Bosnia and Herzegovina,9.6
Bulgaria,10.44
Croatia,10.865
Czech Republic,7.856666666666666
Denmark,7.625


*Modify previous query to sort by descending average temperature*

In [31]:
%%sql
SELECT country, AVG(temperature) AS avg_temperature
FROM Cities
GROUP BY country
ORDER BY AVG(temperature) DESC;

 * sqlite://
Done.


country,avg_temperature
Greece,16.9025
Albania,15.18
Portugal,14.47
Spain,14.238333333333332
Italy,13.474666666666668
Turkey,11.726666666666665
Croatia,10.865
Bulgaria,10.44
France,10.151111111111112
Montenegro,9.99


*Modify previous query to show countries only*

In [32]:
%%sql
SELECT country
FROM Cities
GROUP BY country
ORDER BY AVG(temperature) DESC;

 * sqlite://
Done.


country
Greece
Albania
Portugal
Spain
Italy
Turkey
Croatia
Bulgaria
France
Montenegro


*Find the average temperature for cities in countries with and without coastline*

In [30]:
%%sql
select coastline, avg(temperature)
from Cities, Countries
where Cities.country = Countries.country
group by coastline

 * sqlite://
Done.


coastline,avg(temperature)
no,7.748000000000001
yes,9.784699453551914


*Modify previous query to find the average temperature for cities in the EU and not in the EU, then all combinations of coastline and EU*

In [34]:
%%sql
-- Average temperature for cities in the EU and not in the EU:
SELECT EU, AVG(temperature) AS avg_temperature
FROM Cities, Countries
WHERE Cities.country = Countries.country
GROUP BY EU;

 * sqlite://
Done.


EU,avg_temperature
no,9.030476190476188
yes,9.694133333333331


In [35]:
%%sql
-- All combinations of coastline and EU:
SELECT coastline, EU, AVG(temperature) AS avg_temperature
FROM Cities, Countries
WHERE Cities.country = Countries.country
GROUP BY coastline, EU;

 * sqlite://
Done.


coastline,EU,avg_temperature
no,no,7.67375
no,yes,7.832857142857144
yes,no,9.492340425531914
yes,yes,9.885735294117652
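
Grouping by two columns forms one group per distinct (coastline, EU) pair that actually occurs in the joined rows. The sketch below illustrates this with Python's built-in `sqlite3` module on small stand-in tables (rows copied from outputs in this notebook); the `count(*)` column and the `order by` are additions for illustration, so each group's size is visible and the row order is fixed.

```python
import sqlite3

# Stand-in tables with rows copied from this notebook's outputs.
conn = sqlite3.connect(":memory:")
conn.execute("create table Cities (city text, country text, temperature real)")
conn.execute("create table Countries (country text, EU text, coastline text)")
conn.executemany(
    "insert into Cities values (?, ?, ?)",
    [
        ("Graz", "Austria", 6.91),
        ("Vienna", "Austria", 7.86),
        ("Brest", "Belarus", 6.73),
        ("Hrodna", "Belarus", 6.07),
        ("Elbasan", "Albania", 15.18),
    ],
)
conn.executemany(
    "insert into Countries values (?, ?, ?)",
    [
        ("Albania", "no", "yes"),
        ("Austria", "yes", "no"),
        ("Belarus", "no", "no"),
    ],
)

# One output row per (coastline, EU) combination present in the data.
groups = conn.execute("""
    select coastline, EU, count(*) as cities, avg(temperature) as avg_temperature
    from Cities, Countries
    where Cities.country = Countries.country
    group by coastline, EU
    order by coastline, EU
""").fetchall()

for row in groups:
    print(row)
```

The combination coastline = 'yes', EU = 'yes' does not appear in the result: a group with no rows is never created.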


*Modify previous query to only include cities with latitude < 50, then latitude < 40*

In [36]:
%%sql
-- Only include cities with latitude < 50:
SELECT coastline, AVG(temperature) AS avg_temperature
FROM Cities, Countries
WHERE Cities.country = Countries.country
AND Cities.latitude < 50
GROUP BY coastline;

 * sqlite://
Done.


coastline,avg_temperature
no,8.204782608695655
yes,11.52513043478261


In [37]:
%%sql
-- Only include cities with latitude < 40:
SELECT coastline, AVG(temperature) AS avg_temperature
FROM Cities, Countries
WHERE Cities.country = Countries.country
AND Cities.latitude < 40
GROUP BY coastline;

 * sqlite://
Done.


coastline,avg_temperature
yes,14.177647058823528


### <font color = 'green'>**Your Turn**</font>

*For each country in the EU, find the latitude of the northernmost city in the country, i.e., the maximum latitude. Return the country and its maximum latitude, in descending order of maximum latitude.*

In [38]:
%%sql
SELECT Cities.country, MAX(Cities.latitude) AS max_latitude
FROM Cities, Countries
WHERE Cities.country = Countries.country
AND Countries.EU = 'yes'
GROUP BY Cities.country
ORDER BY MAX(Cities.latitude) DESC;

 * sqlite://
Done.


country,max_latitude
Sweden,67.85
Finland,65.0
Estonia,59.43
United Kingdom,57.47
Denmark,57.03
Latvia,56.95
Lithuania,55.72
Poland,54.2
Germany,54.07
Ireland,53.33


#### A Bug in SQLite - just FYI

In [39]:
%%sql
select country, avg(temperature)
from Cities
group by country

 * sqlite://
Done.


country,avg(temperature)
Albania,15.18
Andorra,9.6
Austria,6.144
Belarus,5.946666666666666
Belgium,9.65
Bosnia and Herzegovina,9.6
Bulgaria,10.44
Croatia,10.865
Czech Republic,7.856666666666666
Denmark,7.625


*Modify previous query - add city to Select clause*

In [42]:
%%sql
SELECT country, city, AVG(temperature) AS avg_temperature
FROM Cities
GROUP BY country, city;

 * sqlite://
Done.


country,city,avg_temperature
Albania,Elbasan,15.18
Andorra,Andorra,9.6
Austria,Graz,6.91
Austria,Innsbruck,4.54
Austria,Linz,6.79
Austria,Salzburg,4.62
Austria,Vienna,7.86
Belarus,Brest,6.73
Belarus,Hrodna,6.07
Belarus,Mazyr,6.25


*Now focus on Austria and Sweden*

In [40]:
%%sql
select *
from Cities
where country = 'Austria' or country = 'Sweden'
order by country

 * sqlite://
Done.


city,country,latitude,longitude,temperature
Graz,Austria,47.08,15.41,6.91
Innsbruck,Austria,47.28,11.41,4.54
Linz,Austria,48.32,14.29,6.79
Salzburg,Austria,47.81,13.04,4.62
Vienna,Austria,48.2,16.37,7.86
Abisko,Sweden,63.35,18.83,0.2
Göteborg,Sweden,57.75,12.0,5.76
Kiruna,Sweden,67.85,20.22,-2.2
Malmö,Sweden,55.58,13.03,7.33
Stockholm,Sweden,59.35,18.1,6.26


In [41]:
%%sql
select country, city, avg(temperature)
from Cities
where country = 'Austria' or country = 'Sweden'
group by country

 * sqlite://
Done.


country,city,avg(temperature)
Austria,Graz,6.144
Sweden,Abisko,3.586666666666668


*Modify previous query to use min(temperature), then max(temperature), then both together in both orders*

In [45]:
%%sql
-- Min temperature:
SELECT country, city, MIN(temperature) AS min_temperature
FROM Cities
WHERE country = 'Austria' OR country = 'Sweden'
GROUP BY country;

 * sqlite://
Done.


country,city,min_temperature
Austria,Innsbruck,4.54
Sweden,Kiruna,-2.2


In [46]:
%%sql
-- Max temperature:
SELECT country, city, MAX(temperature) AS max_temperature
FROM Cities
WHERE country = 'Austria' OR country = 'Sweden'
GROUP BY country;

 * sqlite://
Done.


country,city,max_temperature
Austria,Vienna,7.86
Sweden,Malmö,7.33


In [47]:
%%sql
-- Both - min then max:
SELECT country, city, MIN(temperature) AS min_temperature, MAX(temperature) AS max_temperature
FROM Cities
WHERE country = 'Austria' OR country = 'Sweden'
GROUP BY country;

 * sqlite://
Done.


country,city,min_temperature,max_temperature
Austria,Vienna,4.54,7.86
Sweden,Malmö,-2.2,7.33
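
The behavior above comes from how SQLite treats a "bare" column (here `city`) that is neither grouped nor aggregated. SQLite documents that with a single `min()` or `max()` the bare column is taken from the row holding that minimum or maximum; in other cases the row it comes from is arbitrary, and many other database systems reject such queries outright. The sketch below, using Python's built-in `sqlite3` module on the Austria and Sweden rows shown above, reproduces the `min()` case and compares it with one portable alternative (a correlated subquery, offered as a suggestion rather than as part of the original exercise).

```python
import sqlite3

# Austria and Sweden rows copied from the output shown above.
conn = sqlite3.connect(":memory:")
conn.execute("create table Cities (city text, country text, temperature real)")
conn.executemany(
    "insert into Cities values (?, ?, ?)",
    [
        ("Graz", "Austria", 6.91),
        ("Innsbruck", "Austria", 4.54),
        ("Linz", "Austria", 6.79),
        ("Salzburg", "Austria", 4.62),
        ("Vienna", "Austria", 7.86),
        ("Abisko", "Sweden", 0.2),
        ("Göteborg", "Sweden", 5.76),
        ("Kiruna", "Sweden", -2.2),
        ("Malmö", "Sweden", 7.33),
        ("Stockholm", "Sweden", 6.26),
    ],
)

# SQLite-specific: the bare column "city" follows the row with the minimum.
bare_min = conn.execute("""
    select country, city, min(temperature)
    from Cities
    group by country
    order by country
""").fetchall()

# Portable alternative: keep each city whose temperature equals the
# minimum temperature of its own country.
portable_min = conn.execute("""
    select country, city, temperature
    from Cities
    where temperature = (select min(temperature)
                         from Cities as Other
                         where Other.country = Cities.country)
    order by country
""").fetchall()

print(bare_min)
print(portable_min)
```

With `avg()`, or with `min()` and `max()` in the same query, the city shown next to the aggregate should not be relied on.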


### The Limit clause

*Return any three countries with population > 20*

In [43]:
%%sql
select country
from Countries
where population > 20
limit 3

 * sqlite://
Done.


country
France
Germany
Italy


*Find the ten coldest cities*

In [44]:
%%sql
select city, temperature
from Cities
order by temperature
limit 10

 * sqlite://
Done.


city,temperature
Kiruna,-2.2
Abisko,0.2
Oulu,1.45
Bergen,1.75
Oslo,2.32
Tampere,3.59
Uppsala,4.17
Helsinki,4.19
Tartu,4.36
Bodo,4.5


### <font color = 'green'>**Your Turn**</font>

*Find the five easternmost (greatest longitude) cities in countries with no coastline. Return the city and country names.*

In [48]:
%%sql
SELECT Cities.city, Cities.country
FROM Cities, Countries
WHERE Cities.country = Countries.country
AND Countries.coastline = 'no'
ORDER BY Cities.longitude DESC
LIMIT 5;


### <font color = 'green'>**Your Turn - Basic SQL on World Cup Data**</font>

In [49]:
# Create database tables from CSV files
with open('Players.csv') as f: Players = pd.read_csv(f, index_col=0)
%sql drop table if exists Players;
%sql --persist Players
with open('Teams.csv') as f: Teams = pd.read_csv(f, index_col=0)
%sql drop table if exists Teams;
%sql --persist Teams

 * sqlite://
Done.
 * sqlite://
 * sqlite://
Done.
 * sqlite://


'Persisted teams'

#### Look at sample of Players and Teams tables

In [50]:
%%sql
select * from Players limit 5

 * sqlite://
Done.


surname,team,position,minutes,shots,passes,tackles,saves
Abdoun,Algeria,midfielder,16,0,6,0,0
Belhadj,Algeria,defender,270,1,146,8,0
Boudebouz,Algeria,midfielder,74,3,28,1,0
Bougherra,Algeria,defender,270,1,89,11,0
Chaouchi,Algeria,goalkeeper,90,0,17,0,2


In [51]:
%%sql
select * from Teams limit 5

 * sqlite://
Done.


team,ranking,games,wins,draws,losses,goalsFor,goalsAgainst,yellowCards,redCards
Brazil,1,5,3,1,1,9,4,7,2
Spain,2,6,5,0,1,7,2,3,0
Portugal,3,4,1,2,1,7,1,8,1
Netherlands,4,6,6,0,0,12,5,15,0
Italy,5,3,0,2,1,4,5,5,0


*1) What player on a team with “ia” in the team name played fewer than 200 minutes and made more than 100 passes? Return the player surname. Note: To check if attribute A contains string S use "A like '%S%'"*

In [52]:
%%sql
SELECT surname
FROM Players
WHERE team LIKE '%ia%'
AND minutes < 200
AND passes > 100;

 * sqlite://
Done.


surname
Kuzmanovic


*2) Find all players who took more than 20 shots. Return all player information in descending order of shots taken.*

In [54]:
%%sql
SELECT *
FROM Players
WHERE shots > 20
ORDER BY shots DESC;

 * sqlite://
Done.


surname,team,position,minutes,shots,passes,tackles,saves
Gyan,Ghana,forward,501,27,151,1,0
Villa,Spain,forward,529,22,169,2,0
Messi,Argentina,forward,450,21,321,10,0


*3) Find the goalkeepers of teams that played more than four games. List the surname of the goalkeeper, the team, and the number of minutes the goalkeeper played.*

In [55]:
%%sql
SELECT Players.surname, Players.team, Players.minutes
FROM Players, Teams
WHERE Players.team = Teams.team
AND Players.position = 'goalkeeper'
AND Teams.games > 4;

 * sqlite://
Done.


surname,team,minutes
Romero,Argentina,450
Julio Cesar,Brazil,450
Neuer,Germany,540
Kingson,Ghana,510
Stekelenburg,Netherlands,540
Villar,Paraguay,480
Casillas,Spain,540
Muslera,Uruguay,570


*4) How many players who play on a team with ranking < 10 played more than 350 minutes? Return one number in a column named 'superstar'.*

In [56]:
%%sql
SELECT COUNT(*) AS superstar
FROM Players, Teams
WHERE Players.team = Teams.team
AND Teams.ranking < 10
AND Players.minutes > 350;

 * sqlite://
Done.


superstar
54


*5) What is the average number of passes made by forwards? By midfielders? Write one query that gives both values with the corresponding position.*

In [57]:
%%sql
SELECT position, AVG(passes) AS avg_passes
FROM Players
WHERE position = 'forward' OR position = 'midfielder'
GROUP BY position;

 * sqlite://
Done.


position,avg_passes
forward,50.82517482517483
midfielder,95.2719298245614


*6) Which team has the highest ratio of goalsFor to goalsAgainst? Return the team and the ratio.*

In [58]:
%%sql
SELECT team, (goalsFor * 1.0 / goalsAgainst) AS ratio
FROM Teams
WHERE goalsAgainst > 0
ORDER BY (goalsFor * 1.0 / goalsAgainst) DESC
LIMIT 1;

 * sqlite://
Done.


team,ratio
Portugal,7.0
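
The `* 1.0` in the query above matters if the goal columns are stored as integers, which the sample output suggests but which is an assumption here. In SQLite, dividing one integer by another discards the fractional part. A minimal sketch with Python's built-in `sqlite3` module, using three rows copied from the Teams sample:

```python
import sqlite3

# Stand-in Teams table with integer goal columns (three rows from the sample).
conn = sqlite3.connect(":memory:")
conn.execute("create table Teams (team text, goalsFor integer, goalsAgainst integer)")
conn.executemany(
    "insert into Teams values (?, ?, ?)",
    [("Brazil", 9, 4), ("Spain", 7, 2), ("Portugal", 7, 1)],
)

# integer / integer truncates; multiplying by 1.0 first forces real division.
ratios = conn.execute("""
    select team,
           goalsFor / goalsAgainst as int_ratio,
           goalsFor * 1.0 / goalsAgainst as ratio
    from Teams
    order by team
""").fetchall()

for row in ratios:
    print(row)
```

Portugal still ranks first either way in this sample, but with integer division teams such as Brazil (9/4) would be reported as 2 rather than 2.25.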


### <font color = 'green'>**Your Turn Extra - Basic SQL on Titanic Data**</font>

<font color="red">File access required:</font> In Colab these extra problems require first uploading **Titanic.csv** using the *Files* feature in the left toolbar. If running the notebook on a local computer, simply ensure this file is in the same workspace as the notebook.

In [59]:
# Create database table from CSV file
with open('Titanic.csv') as f: Titanic = pd.read_csv(f, index_col=0)
%sql drop table if exists Titanic;
%sql --persist Titanic

 * sqlite://
Done.
 * sqlite://


'Persisted titanic'

#### Look at sample of Titanic table

In [60]:
%%sql
select * from Titanic limit 5

 * sqlite://
Done.


last,first,gender,age,class,fare,embarked,survived
Abbing,Mr. Anthony,M,42.0,3,7.55,Southampton,no
Abbott,Mrs. Stanton (Rosa Hunt),F,35.0,3,20.25,Southampton,yes
Abbott,Mr. Rossmore Edward,M,16.0,3,20.25,Southampton,no
Abelson,Mr. Samuel,M,30.0,2,24.0,Cherbourg,no
Abelson,Mrs. Samuel (Hannah Wizosky),F,28.0,2,24.0,Cherbourg,yes


*1) How many passengers sailed for free (i.e., fare is zero)?*

In [61]:
%%sql
SELECT COUNT(*) AS free_passengers
FROM Titanic
WHERE fare = 0;

 * sqlite://
Done.


free_passengers
15


*2) How many married women over age 50 embarked in Cherbourg? (Married women’s first names begin with "Mrs."). Note: To check if attribute A begins with string S use "A like 'S%'"*

In [62]:
%%sql
SELECT COUNT(*) AS married_women_over_50
FROM Titanic
WHERE first LIKE 'Mrs.%'
AND age > 50
AND embarked = 'Cherbourg';

 * sqlite://
Done.


married_women_over_50
4


*3) Write three queries to find: (i) the total number of passengers; (ii) the number of passengers under 18; (iii) the number of passengers 18 or older. Notice that the second and third numbers don't add up to the first.*

In [63]:
%%sql
SELECT COUNT(*) AS total_passengers
FROM Titanic;

 * sqlite://
Done.


total_passengers
891


In [64]:
%%sql
SELECT COUNT(*) AS passengers_under_18
FROM Titanic
WHERE age < 18;

 * sqlite://
Done.


passengers_under_18
113


In [65]:
%%sql
SELECT COUNT(*) AS passengers_18_or_older
FROM Titanic
WHERE age >= 18;

 * sqlite://
Done.


passengers_18_or_older
601


*Missing values in SQL tables are given a special value called 'null', and the conditions 'A is null' and 'A is not null' can be used in Where clauses to check whether attribute A has the 'null' value. Write a query to find the number of passengers whose age is missing -- now your passenger numbers should add up. Modify the query to also return the average fare paid by those passengers.*

In [72]:
%%sql
SELECT COUNT(*) AS passengers_with_missing_age
FROM Titanic
WHERE age IS NULL;

 * sqlite://
Done.


passengers_with_missing_age
177


In [74]:
%%sql
SELECT COUNT(*) AS passengers_with_missing_age, AVG(fare) AS avg_fare
FROM Titanic
WHERE age IS NULL;

 * sqlite://
Done.


passengers_with_missing_age,avg_fare
177,22.15949152542376
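
The reason the earlier counts did not add up is that a comparison with null is neither true nor false, so a row with a missing age satisfies neither `age < 18` nor `age >= 18`. The sketch below shows this with Python's built-in `sqlite3` module. Three ages are copied from the Titanic sample above; the fourth row, with a missing age, is a made-up placeholder added purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Titanic (last text, age real)")
conn.executemany(
    "insert into Titanic values (?, ?)",
    [
        ("Abbing", 42.0),
        ("Abbott", 16.0),
        ("Abelson", 30.0),
        ("Placeholder", None),  # hypothetical passenger; None is stored as null
    ],
)

def count_where(condition):
    """Count the rows of Titanic that satisfy the given Where condition."""
    query = "select count(*) from Titanic where " + condition
    return conn.execute(query).fetchone()[0]

total = conn.execute("select count(*) from Titanic").fetchone()[0]
under_18 = count_where("age < 18")
adults = count_where("age >= 18")
missing = count_where("age is null")

# count(age) skips nulls, unlike count(*).
known_ages = conn.execute("select count(age) from Titanic").fetchone()[0]

print(total, under_18, adults, missing, known_ages)
```

Only after adding the `age is null` count do the three groups sum to the total.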


*4) Find all passengers whose age is not an integer; return last name, first name, and age, from youngest to oldest. Note: Consider using the round() function*

In [68]:
%%sql
SELECT last, first, age
FROM Titanic
WHERE age IS NOT NULL
AND age != ROUND(age)
ORDER BY age ASC;

 * sqlite://
Done.


last,first,age
Thomas,Master Assad Alexander,0.42
Hamalainen,Master Viljo,0.67
Baclini,Miss Helene Barbara,0.75
Baclini,Miss Eugenie,0.75
Caldwell,Master Alden Gates,0.83
Richards,Master George Sibley,0.83
Allison,Master Hudson Trevor,0.92
Zabour,Miss Hileni,14.5
Lovell,"Mr. John Hall (""Henry"")",20.5
Hanna,Mr. Mansour,23.5


*5) What is the most common last name among passengers, and how many passengers have that last name?*

In [69]:
%%sql
SELECT last, COUNT(*) AS count
FROM Titanic
GROUP BY last
ORDER BY COUNT(*) DESC
LIMIT 1;

 * sqlite://
Done.


last,count
Andersson,9


*6) What is the average fare paid by passengers in the three classes, and the average age of passengers in the three classes?*

In [70]:
%%sql
SELECT class, AVG(fare) AS avg_fare, AVG(age) AS avg_age
FROM Titanic
GROUP BY class;

 * sqlite://
Done.


class,avg_fare,avg_age
1,84.15499999999999,38.233440860215055
2,20.66222826086957,29.87763005780347
3,13.676863543788176,25.14061971830986


*7) For male survivors, female survivors, male non-survivors, and female non-survivors, how many passengers are in each of those four categories and what is their average fare? Return your results from lowest to highest average fare.*

In [71]:
%%sql
SELECT gender, survived, COUNT(*) AS count, AVG(fare) AS avg_fare
FROM Titanic
GROUP BY gender, survived
ORDER BY AVG(fare) ASC;

 * sqlite://
Done.


gender,survived,count,avg_fare
M,no,468,21.961923076923
F,no,81,23.02555555555556
M,yes,109,40.82229357798166
F,yes,233,51.93901287553647
