# Querying Data
Queries are what SQL does best. A query is a request for data or information from a database table or combination of tables. 

These exercises use the cities.db database. The data comes from https://simplemaps.com/data/world-cities. If you are interested, you can see the processing used to create the cities database in `setup/create_cities.ipynb`.

The cities.db database has just one table: `cities`.

## Part 1: Basic SELECT queries
**The code below:**
- Imports the duckdb library (this has to run once per session)
- Connects to the database using the duckdb library
- Runs a simple query to select all columns from the cities table
- Shows the results of the query

In [1]:
# Run this cell once per session to load the database.
import duckdb
import pandas as pd

# Note we are using the cities.db database. This should already be in your data folder, but if not you can
# re-create it by opening the setup notebook (setup/create_cities.ipynb) and running the cells there.
%load_ext sql
# Forward slashes work as path separators on Windows, macOS and Linux alike.
conn = duckdb.connect('../data/cities.db')
%sql conn --alias duckdb

# Raise the display limit so up to 20 rows of each query result are shown
%config SqlMagic.displaylimit = 20
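One thing worth knowing about the setup cell: `duckdb.connect()` does not check that the database file already exists. If the relative path is wrong, it quietly creates a new, empty database at that location, and the mistake only shows up later as `Catalog Error: Table with name cities does not exist!` when you query it. The sketch below is a minimal guard that checks the path first. The helper name `require_database` is hypothetical, and only the standard library's `pathlib` is used, so the duckdb call itself is left as a comment.

```python
from pathlib import Path


def require_database(path: str) -> Path:
    """Return the database path, or raise if the file is missing.

    Hypothetical helper: without this check, duckdb.connect() would create
    a new, empty database file at a wrong path instead of failing.
    """
    db_path = Path(path)
    if not db_path.is_file():
        raise FileNotFoundError(
            f"{db_path.resolve()} not found; re-create it with setup/create_cities.ipynb"
        )
    return db_path


# Usage in the setup cell (needs duckdb installed):
# conn = duckdb.connect(str(require_database("../data/cities.db")))
```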


In [2]:
%%sql

SELECT * 
FROM cities;




## Things to notice: ##
- The query is written in a cell that starts with the `%%sql` magic; JupySQL passes it to the duckdb connection set up in the first cell and displays the result as a table
- The cities table has 7 columns: `city`, `lat`, `lng`, `country`, `country_code`, `capital` and `population`
- Each column has a data type:
    - `city`, `country`, `country_code` and `capital` are all `VARCHAR`, which means they store text
    - `lat` and `lng` are both `DOUBLE`, which means they store floating-point (decimal) numbers
    - `population` is `INTEGER`, which means it stores whole numbers
- Only 20 rows are shown (the display limit set in the first cell), but the table clearly has more. *It would be interesting to see how many rows there are in total.*
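To see how those seven columns and their types fit together, the sketch below recreates an empty table with the same schema and lists it back. It assumes Python's built-in `sqlite3` module as a stand-in for duckdb, so it runs without cities.db; SQLite accepts the same `VARCHAR`, `DOUBLE` and `INTEGER` type names (mapping them onto its own storage classes). `PRAGMA table_info` is SQLite-specific; in duckdb, `DESCRIBE cities;` gives a similar listing.

```python
import sqlite3

# In-memory SQLite database (stand-in for duckdb; no file needed).
conn = sqlite3.connect(":memory:")

# Same columns and declared types as the cities table described above.
conn.execute(
    """
    CREATE TABLE cities (
        city         VARCHAR,
        lat          DOUBLE,
        lng          DOUBLE,
        country      VARCHAR,
        country_code VARCHAR,
        capital      VARCHAR,
        population   INTEGER
    )
    """
)

# Each PRAGMA row is (cid, name, type, notnull, default_value, pk).
schema = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(cities)")]
for name, col_type in schema:
    print(f"{name:<13} {col_type}")
```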

## Next we will... ##
- Write a query to count the number of rows in the cities table (using `COUNT(*)`)
- Find the city names and population for cities in Australia
- Order the Australian cities by latitude
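The three steps can be tried on a tiny table before running them against cities.db. This sketch assumes Python's built-in `sqlite3` as a stand-in for duckdb and uses a cut-down, made-up version of the table (fictional city names and numbers, not real data); the SQL is the same as in the cells that follow.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cities (city VARCHAR, lat DOUBLE, country_code VARCHAR, population INTEGER)"
)
# Hypothetical rows for illustration only.
conn.executemany(
    "INSERT INTO cities VALUES (?, ?, ?, ?)",
    [
        ("City A", -34.0, "AUS", 5000),
        ("City B", -38.0, "AUS", 4000),
        ("City C", -12.0, "AUS", 100),
        ("City D", 41.0, "USA", 8000),
    ],
)

# 1. Count every row in the table.
total = conn.execute("SELECT COUNT(*) FROM cities").fetchone()[0]

# 2. Filter to one country (row order is not guaranteed without ORDER BY).
aus = conn.execute(
    "SELECT city, population FROM cities WHERE country_code = 'AUS'"
).fetchall()

# 3. Same filter, sorted by latitude (ascending, so the most southerly city comes first).
aus_by_lat = conn.execute(
    "SELECT city, population, lat FROM cities WHERE country_code = 'AUS' ORDER BY lat"
).fetchall()

print(total)
print(aus_by_lat)
```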


In [None]:
%%sql
-- Counting the number of rows in the table. 

SELECT COUNT(*)
FROM cities;

In [None]:
%%sql
-- Finding the Australian cities in the table.
SELECT city, population
FROM cities
WHERE country_code = 'AUS';

In [None]:
%%sql
-- Australian cities, ordered by latitude.
-- ORDER BY sorts in ascending order by default, so the most southerly city
-- (the most negative latitude) comes first.

SELECT city, population, lat 
FROM cities
WHERE country_code = 'AUS'
ORDER BY lat;

## Your Turn ##
Write queries to:
- Find all the capital cities in the world (capital = 'primary')
- Find the cities in Germany, ordered by longitude

In [None]:
%%sql
-- All the capital cities in the table


In [None]:
%%sql
-- Cities in Germany ordered by longitude.


## Aggregate Functions in SQL ##
Aggregate functions are used to perform calculations on a set of values to return a single value. We already used a simple aggregate function in the previous exercise: `COUNT(*)`. 

**Aggregate functions include:**
- `COUNT()`: returns the number of rows that match the query's conditions
- `SUM()`: returns the sum of all values in a column
- `AVG()`: returns the average *(mean)* of all values in a column
- `MIN()`: returns the minimum value in a column
- `MAX()`: returns the maximum value in a column

Note that `COUNT(*)` counts whole rows, while every other function here needs the column you want to perform the calculation on (for example `SUM(population)`). `COUNT(column)` also works, and counts only the rows where that column is not `NULL`.
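A small example makes the difference between `COUNT(*)` and `COUNT(column)` concrete, and shows that the other aggregates skip missing (`NULL`) values rather than treating them as zero. This is a sketch using Python's built-in `sqlite3` in place of duckdb, with a made-up four-row table; duckdb follows the same standard SQL rules for `NULL` in aggregates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down, hypothetical version of the cities table; one population is unknown (NULL).
conn.execute("CREATE TABLE cities (city VARCHAR, country_code VARCHAR, population INTEGER)")
conn.executemany(
    "INSERT INTO cities VALUES (?, ?, ?)",
    [
        ("City A", "AUS", 300),
        ("City B", "AUS", 100),
        ("City C", "USA", 200),
        ("City D", "USA", None),  # missing population
    ],
)

result = conn.execute(
    """
    SELECT COUNT(*)          AS all_rows,    -- every row, NULL or not
           COUNT(population) AS known_pops,  -- only non-NULL populations
           SUM(population)   AS total_pop,
           AVG(population)   AS mean_pop,    -- NULL is skipped, not treated as 0
           MIN(population)   AS smallest,
           MAX(population)   AS largest
    FROM cities
    """
).fetchone()
print(result)
```

The average is 600 / 3, not 600 / 4, because the `NULL` row is left out.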

**The code below shows:**
- Largest (`MAX`) population of a city in the table
- Total (`SUM`) population of all cities in the table
- Average (`AVG`) population of US cities in the table

In [4]:
%%sql

-- Example:
--  * Largest population in the table.
SELECT MAX(population) AS Largest_Population
FROM cities;


In [None]:
%%sql

-- Example:
--  * Total population from all cities in the table.
SELECT SUM(population) AS Total_Population
FROM cities;


In [3]:
%%sql
-- Example:
--  * Average population of US cities in the table.
SELECT AVG(population) AS Average_US_City_Population
FROM cities
WHERE country_code = 'USA';


## Now You Try ##
Write queries to:
- Find the minimum population in the database
- Find the total population of cities in Australia


In [None]:
%%sql
-- Min population in the cities table



In [None]:
%%sql
-- Total population of cities in Australia


## GROUP BY ##
The `GROUP BY` clause is used with aggregate functions to group the result set by one or more columns. Instead of performing a calculation on all the rows, you can perform it on each group of rows that share the same value in those columns.

The order of SQL clauses is important. `GROUP BY` must come after any `WHERE` clause, but before `HAVING` and `ORDER BY`.
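Here is that clause order in action: `WHERE` removes rows first, `GROUP BY` then forms one group per country, and `ORDER BY` sorts the grouped result. The sketch again assumes Python's built-in `sqlite3` as a stand-in for duckdb and uses hypothetical rows, not data from cities.db.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (city VARCHAR, country_code VARCHAR, population INTEGER)")
# Hypothetical rows for illustration only.
conn.executemany(
    "INSERT INTO cities VALUES (?, ?, ?)",
    [
        ("City A", "AUS", 300),
        ("City B", "AUS", 100),
        ("City C", "USA", 200),
        ("City D", "USA", 50),
        ("City E", "DEU", 500),
    ],
)

grouped = conn.execute(
    """
    SELECT country_code, COUNT(*) AS n_cities, SUM(population) AS total_pop
    FROM cities
    WHERE population >= 100     -- 1. drop individual rows (City D)
    GROUP BY country_code       -- 2. one group per country
    ORDER BY total_pop DESC     -- 3. sort the grouped result
    """
).fetchall()
print(grouped)
```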

## HAVING ##
When you use `GROUP BY`, you can add a `HAVING` clause to filter the groups based on specified conditions.
- `WHERE` filters the rows before the calculation is applied (only the matching rows are counted)
- `HAVING` filters the groups after the calculation is applied (like a filter on the results)
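The contrast is easiest to see by running both filters on the same data. In this sketch (Python's built-in `sqlite3` standing in for duckdb, hypothetical rows), the `WHERE` query counts only the cities above a population threshold, while the `HAVING` query counts every city and then keeps only the countries with at least two.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (city VARCHAR, country_code VARCHAR, population INTEGER)")
# Hypothetical rows for illustration only.
conn.executemany(
    "INSERT INTO cities VALUES (?, ?, ?)",
    [
        ("City A", "AUS", 300),
        ("City B", "AUS", 100),
        ("City C", "USA", 200),
        ("City D", "USA", 50),
        ("City E", "DEU", 500),
    ],
)

# WHERE: drop small cities first, then count what is left in each country.
where_rows = conn.execute(
    """
    SELECT country_code, COUNT(*) AS big_cities
    FROM cities
    WHERE population > 150
    GROUP BY country_code
    ORDER BY country_code
    """
).fetchall()

# HAVING: count every city per country, then keep only countries with 2 or more.
having_rows = conn.execute(
    """
    SELECT country_code, COUNT(*) AS n_cities
    FROM cities
    GROUP BY country_code
    HAVING COUNT(*) >= 2
    ORDER BY country_code
    """
).fetchall()

print(where_rows)
print(having_rows)
```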

**The queries in the code below:**
- Count the cities with more than 1 million people in each country, keep only the countries with more than 5 such cities, and order the result by that count (largest first)

*Note: This query uses a column alias (`big_cities`) to make the output more readable. The `AS` keyword is used to create an alias.*

In [None]:
%%sql
-- Countries with more than 5 cities of over 1 million people

SELECT country, COUNT(*) AS big_cities
FROM cities
WHERE population > 1000000
GROUP BY country
HAVING big_cities > 5
ORDER BY big_cities DESC;
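To check that `WHERE` and `HAVING` each do their part in a query of this shape, the sketch below runs it on generated data. It assumes Python's built-in `sqlite3` in place of duckdb, uses made-up country labels `X`, `Y` and `Z`, and passes the two thresholds as `?` parameters. It repeats `COUNT(*)` in `HAVING` instead of using the alias, which works in any SQL dialect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (country VARCHAR, population INTEGER)")

# Hypothetical generated rows: X has 7 big cities, Y has 6 big and 4 small, Z has 3 big.
rows = (
    [("X", 2_000_000)] * 7
    + [("Y", 1_500_000)] * 6
    + [("Y", 500_000)] * 4
    + [("Z", 3_000_000)] * 3
)
conn.executemany("INSERT INTO cities VALUES (?, ?)", rows)

query = """
    SELECT country, COUNT(*) AS big_cities
    FROM cities
    WHERE population > ?          -- keep only cities over the population threshold
    GROUP BY country
    HAVING COUNT(*) > ?           -- keep only countries with enough such cities
    ORDER BY big_cities DESC
"""
result = conn.execute(query, (1_000_000, 5)).fetchall()
print(result)
```

Without the `WHERE` clause, `Y` would be credited with all 10 of its cities; without `HAVING`, `Z` would appear with 3.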


# TODO: Still need some examples here