# SELECT basics
world

In [1]:
import pandas as pd

In [2]:
pd.read_clipboard()

Unnamed: 0,name,continent,area,population,gdp
0,Afghanistan,Asia,652230.0,25500100.0,20343000000.0
1,Albania,Europe,28748.0,2831741.0,12960000000.0
2,Algeria,Africa,2381741.0,37100000.0,188681000000.0
3,Andorra,Europe,468.0,78115.0,3712000000.0
4,Angola,Africa,1246700.0,20609294.0,100990000000.0
5,....,,,,


**1. The example uses a WHERE clause to show the population of 'France'. Note that strings (pieces of text that are data) should be in 'single quotes'.**

**Modify it to show the population of Germany**

`SELECT population FROM world
  WHERE name = 'France'`

> `SELECT population FROM world
  WHERE name = 'Germany'`

**2. Checking a list: The word IN allows us to check if an item is in a list. The example shows the name and population for the countries 'Brazil', 'Russia', 'India' and 'China'.**

**Show the name and the population for 'Sweden', 'Norway' and 'Denmark'.**

`SELECT name, population FROM world
  WHERE name IN ('Brazil', 'Russia', 'India', 'China');`

> `SELECT name, population FROM world
  WHERE name IN ('Sweden', 'Norway', 'Denmark');`

**3. Which countries are not too small and not too big? BETWEEN allows range checking (range specified is inclusive of boundary values). The example below shows countries with an area of 250,000-300,000 sq. km.**

**Modify it to show the country and the area for countries with an area between 200,000 and 250,000.**

`SELECT name, area FROM world
  WHERE area BETWEEN 250000 AND 300000`

> `SELECT name, area FROM world
  WHERE area BETWEEN 200000 AND 250000`

# SELECT Quiz


In [3]:
pd.read_clipboard()

Unnamed: 0,name,region,area,population,gdp
0,Afghanistan,South Asia,652225.0,26000000.0,
1,Albania,Europe,28728.0,3200000.0,6656000000.0
2,Algeria,Middle East,2400000.0,32900000.0,75012000000.0
3,Andorra,Europe,468.0,64000.0,
4,...,,,,


**1. Select the code which produces this table**

In [4]:
pd.read_clipboard()

Unnamed: 0,name,population
0,Bahrain,1234571
1,Swaziland,1220000
2,Timor-Leste,1066409


> `SELECT name, population
  FROM world
 WHERE population BETWEEN 1000000 AND 1250000`
 
**2. Pick the result you would obtain from this code:**

`SELECT name, population
FROM world
WHERE name LIKE 'Al%'`

> Table E

| name | population |
| ---- | ---------- |
| Albania | 3200000 |
| Algeria | 32900000 |

**3. Select the code which shows the countries that end in A or L**

> `SELECT name FROM world
 WHERE name LIKE '%a' OR name LIKE '%l'`
 
**4. Pick the result from the query**

`SELECT name,length(name)
FROM world
WHERE length(name)=5 and continent='Europe'`



In [2]:
pd.read_clipboard()

Unnamed: 0,name,length(name)
0,Italy,5
1,Malta,5
2,Spain,5


**5. Here are the first few rows of the world table:**

In [4]:
pd.read_clipboard()

Unnamed: 0,name,region,area,population,gdp
0,Afghanistan,South Asia,652225.0,26000000.0,
1,Albania,Europe,28728.0,3200000.0,6656000000.0
2,Algeria,Middle East,2400000.0,32900000.0,75012000000.0
3,Andorra,Europe,468.0,64000.0,
4,...,,,,


**Pick the result you would obtain from this code:**

`SELECT name, area*2 FROM world WHERE population = 64000`

> | name | area*2 | 
| ----- | ------ |
| Andorra | 936 |

**6. Select the code that would show the countries with an area larger than 50000 and a population smaller than 10000000**

> `SELECT name, area, population
  FROM world
 WHERE area > 50000 AND population < 10000000`

**7. Select the code that shows the population density of China, Australia, Nigeria and France**

> `SELECT name, population/area
  FROM world
 WHERE name IN ('China', 'Nigeria', 'France', 'Australia')`
 
 

***
# Additional Notes
### Where filters
`SELECT attribute-list
   FROM table-name
   WHERE condition`
   
- SELECT attribute-list
    - This is usually a comma separated list of attributes (field names)
    - Expressions involving these attributes may be used. The normal mathematical operators `+, -, *, /` may be used on numeric values. String values may be concatenated using `||`
    - To select all attributes use `*`
    - The attributes in this case are: `name`, `region`, `area`, `population` and `gdp`

- FROM table-name
    - In these examples the table is always world.

- WHERE condition
    - This is a boolean expression which each row must satisfy.
    - Operators which may be used include `AND`, `OR`, `NOT`, `>`, `>=`, `=`, `<`, `<=`
    - The `LIKE` operator permits strings to be compared using 'wild cards'. The symbol `_` matches exactly one character and `%` matches any sequence of characters, including none. (Note that MS Access SQL uses `?` and `*` instead of `_` and `%`.)
    - The `IN` operator allows an item to be tested against a list of values.
    - There is a `BETWEEN` operator for checking ranges.
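
The operators listed above can be tried out without a database server. The following is a minimal sketch, assuming Python's standard `sqlite3` module and an in-memory table loaded with only the five sample rows shown at the top of these notes. The helper `names` is a hypothetical convenience function, and SQLite may differ from other engines in details such as the case sensitivity of `LIKE`.

```python
import sqlite3

# First five rows of the world table as shown in the sample output above:
# (name, continent, area, population, gdp)
ROWS = [
    ("Afghanistan", "Asia", 652230, 25500100, 20343000000),
    ("Albania", "Europe", 28748, 2831741, 12960000000),
    ("Algeria", "Africa", 2381741, 37100000, 188681000000),
    ("Andorra", "Europe", 468, 78115, 3712000000),
    ("Angola", "Africa", 1246700, 20609294, 100990000000),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE world "
    "(name TEXT, continent TEXT, area REAL, population REAL, gdp REAL)"
)
conn.executemany("INSERT INTO world VALUES (?, ?, ?, ?, ?)", ROWS)


def names(sql):
    """Hypothetical helper: run a query, return its first column sorted."""
    return sorted(row[0] for row in conn.execute(sql))


# LIKE with the % wildcard: names that start with 'Al'
like_al = names("SELECT name FROM world WHERE name LIKE 'Al%'")

# IN: values missing from the table (here 'France') simply match nothing
in_list = names(
    "SELECT name FROM world WHERE name IN ('Andorra', 'Angola', 'France')"
)

# BETWEEN is inclusive: both boundary areas (468 and 28748) are returned
between = names("SELECT name FROM world WHERE area BETWEEN 468 AND 28748")

# AND / NOT combined with a comparison
combined = names(
    "SELECT name FROM world "
    "WHERE continent = 'Africa' AND NOT population > 30000000"
)

print(like_al)   # ['Albania', 'Algeria']
print(in_list)   # ['Andorra', 'Angola']
print(between)   # ['Albania', 'Andorra']
print(combined)  # ['Angola']
```

The `BETWEEN` bounds were chosen to equal the areas of Andorra and Albania so that the inclusive behaviour is visible in the result.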

### WHERE clause examples

**1. The population of 'France'. Strings should be in 'single quotes'**

> `SELECT population FROM world
  WHERE name = 'France'`

**2. The names and population densities for the very large countries. We can use mathematical and string expressions as well as field names and constants.**

> `SELECT name, population/area 
FROM world
  WHERE area > 5000000`

**3. Where to find some very small, very rich countries. We use AND to ensure that two or more conditions hold true.**

> `SELECT name
  FROM world
  WHERE area < 2000
    AND gdp > 5000000000`
    
**4. Which of Ceylon, Iran, Persia and Sri Lanka is the name of a country? The word IN allows us to check if an item is in a list.**

> `SELECT name FROM world
  WHERE name IN ('Sri Lanka', 'Ceylon',
                 'Persia',    'Iran')`

**5. What are the countries beginning with D? The word LIKE permits pattern matching - % is the wildcard.**

> `SELECT name FROM world
  WHERE name LIKE 'D%'`

**6. Which countries are not too small and not too big? BETWEEN allows range checking - note that it is inclusive.**

> `SELECT name, area FROM world
  WHERE area BETWEEN 207600 AND 244820`

### ROUND
ROUND(attribute, number) returns attribute rounded to number decimal places.

The number of decimal places may be negative: -1 rounds to the nearest 10, -2 to the nearest 100, -3 to the nearest 1000, and so on.

### ROUND examples
`ROUND(7253.86, 0)    ->  7254
ROUND(7253.86, 1)    ->  7253.9
ROUND(7253.86,-3)    ->  7000`
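
The same arithmetic can be checked outside SQL. This is a small sketch using Python's built-in `round`, which also takes a value and a number of decimal places and accepts negative places. It is only an analogue of SQL `ROUND`: Python rounds exact ties to the nearest even digit, and individual database engines may handle ties or negative places differently.

```python
value = 7253.86

# Map "number of decimal places" to the rounded result
rounded = {places: round(value, places) for places in (0, 1, -3)}

print(rounded[0])   # 7254.0
print(rounded[1])   # 7253.9
print(rounded[-3])  # 7000.0
```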

**1. In this example we calculate the population in millions to one decimal place.**

> `SELECT name,
       ROUND(population/1000000,1)
  FROM world`

### FLOOR function
FLOOR(f) returns the largest integer that is less than or equal to f. FLOOR always rounds __*down*__.

### FLOOR examples
`  FLOOR(2.7) ->  2
  FLOOR(-2.7) -> -3`
  
**1. In this example we calculate the population in millions.**

> `SELECT name,
       FLOOR(population/1000000)
  FROM world`

### CEIL function
CEIL(c) is the ceiling function: it returns the smallest integer that is greater than or equal to c. CEIL always rounds __*up*__.

### CEIL examples
` CEIL(2.7)  ->  3
 CEIL(-2.7) -> -2`
 
**1. In this example we calculate the population in millions.**

> `SELECT population/1000000 AS a,
       CEIL(population/1000000) AS b
  FROM world`
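
To see how FLOOR and CEIL differ, especially for negative numbers, here is a short sketch using Python's `math.floor` and `math.ceil` as stand-ins for the SQL functions. The population figure is Afghanistan's from the sample rows at the top of these notes; the variable names are illustrative only.

```python
import math

# FLOOR rounds towards negative infinity, CEIL towards positive infinity
floor_examples = (math.floor(2.7), math.floor(-2.7))
ceil_examples = (math.ceil(2.7), math.ceil(-2.7))

# Population in millions for Afghanistan (25500100 in the sample rows)
millions = 25500100 / 1000000
whole_millions_down = math.floor(millions)
whole_millions_up = math.ceil(millions)

print(floor_examples)       # (2, -3)
print(ceil_examples)        # (3, -2)
print(whole_millions_down)  # 25
print(whole_millions_up)    # 26
```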
  
### MOD function
MOD(a,b) returns the remainder when a is divided by b.

If you use MOD(a, 2) you get 0 for even numbers and 1 for odd numbers.

If you use MOD(a, 10) you get the last digit of the number a.

### MOD examples
` MOD(27,2) ->  1
 MOD(27,10) ->  7`
 
**1. In this example you get the final digit of the year of each games.**

> `SELECT MOD(yr,10),
       yr, city
  FROM games`
  
### LEN function
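LEN(s) returns the number of characters in string s. LEN is the SQL Server spelling; other engines such as PostgreSQL and SQLite call the same function LENGTH(s), as in the quiz query above.

### LEN examples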
`LEN('Hello') -> 5`

`SELECT LEN(name), name
  FROM world`
  
### SUBSTRING function
SUBSTRING allows you to extract part of a string.

### SUBSTRING examples
`SUBSTRING('Hello world', 2, 3) -> 'ell'`

**1. In this example you get five characters, starting at the 2nd character, from each country's name.**

> `SELECT name,
       SUBSTRING(name, 2, 5)
  FROM world`
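
The second and third arguments of SUBSTRING are a start position and a length, not a start and an end. A minimal sketch that confirms this, assuming Python's standard `sqlite3` module: SQLite spells the function `SUBSTR` (and the length function `LENGTH`), so those names are used here in place of `SUBSTRING` and `LEN`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def scalar(sql):
    """Hypothetical helper: return the single value produced by a query."""
    return conn.execute(sql).fetchone()[0]


# Start at character 2, take 3 characters
hello_part = scalar("SELECT SUBSTR('Hello world', 2, 3)")

# Start at character 2, take 5 characters (characters 2 to 6 of the name)
name_part = scalar("SELECT SUBSTR('Afghanistan', 2, 5)")

# Number of characters in a string
hello_len = scalar("SELECT LENGTH('Hello')")

print(hello_part)  # ell
print(name_part)   # fghan
print(hello_len)   # 5
```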
  
### CONCAT function
CONCAT allows you to stick two or more strings together.

This operation is concatenation.

### CONCAT examples
`CONCAT(s1, s2 ...)`


**1. In this example you put the name, a space, and continent together.**

> `SELECT CONCAT(name, ' ', continent)
  FROM world`
  
### TRIM function
TRIM(s) returns the string with leading and trailing spaces removed.

### TRIM examples
`TRIM('Hello world  ') -> 'Hello world'`

**1. This function is particularly useful when working with CHAR fields. Typically a CHAR field is padded with spaces. In contrast a VARCHAR field does not require padding.**

> `SELECT name,
       TRIM(name)
  FROM world`
  
### LEFT function
LEFT(s, n) allows you to extract n characters from the start of the string s.

### LEFT examples
`LEFT('Hello world', 4) -> 'Hell'`

`SELECT name, LEFT(name, 3) FROM world -> Afghanistan	Afg`

### RIGHT function
RIGHT(s, n) allows you to extract n characters from the end of the string s.

### RIGHT examples
`RIGHT('Hello world', 4) -> 'orld'`

`SELECT name, RIGHT(name, 3) FROM world -> Afghanistan	tan`

### POSITION function
POSITION(s1 IN s2) returns the character position of the substring s1 within the larger string s2. The first character is in position 1. If s1 does not occur in s2 it returns 0.

### POSITION examples
`POSITION('ll' IN 'Hello world') -> 3`

**1. In this example you return the position of the string 'an' within the name of the country.**

> `SELECT name,
       POSITION('an' IN name)
  FROM bbc`
  
***
# SELECT FROM world Tutorial
In this tutorial you will use the SELECT command on the table world:


In [5]:
pd.read_clipboard()

Unnamed: 0,name,continent,area,population,gdp
0,Afghanistan,Asia,652230.0,25500100.0,20343000000.0
1,Albania,Europe,28748.0,2831741.0,12960000000.0
2,Algeria,Africa,2381741.0,37100000.0,188681000000.0
3,Andorra,Europe,468.0,78115.0,3712000000.0
4,Angola,Africa,1246700.0,20609294.0,100990000000.0
5,...,,,,


**1. Show the name for the countries that have a population of at least 200 million. 200 million is 200000000, there are eight zeros.**
> `SELECT name
  FROM world
 WHERE population >= 200000000`
 
**2. Give the name and the per capita GDP for those countries with a population of at least 200 million.** 

**(Per capita GDP is the GDP divided by the population: gdp/population.)**

> `SELECT name, gdp/population
FROM world
WHERE population >= 200000000`

**3. Show the name and population in millions for the countries of the continent 'South America'. Divide the population by 1000000 to get population in millions.**

> `SELECT name, population/1000000 AS pop_in_millions
FROM world
WHERE continent = 'South America'`

**4. Show the name and population for France, Germany, Italy**

> `SELECT name, population
FROM world
WHERE name IN ('France', 'Germany', 'Italy')`

**5. Show the countries which have a name that includes the word 'United'**

> `SELECT name
FROM world
WHERE name LIKE '%United%'`

**6. Two ways to be big: A country is big if it has an area of more than 3 million sq km or it has a population of more than 250 million.**

**Show the countries that are big by area or big by population. Show name, population and area.**

> `SELECT name, population, area
FROM world
WHERE area > 3000000 OR population > 250000000`

**7. Exclusive OR (XOR). Show the countries that are big by area (more than 3 million) or big by population (more than 250 million) but not both. Show name, population and area.**

> `SELECT name, population, area
FROM world
WHERE (area > 3000000 OR population > 250000000) AND NOT (area > 3000000 AND population > 250000000)`
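
The exclusive-OR condition can be checked against a table that covers all four combinations of the two tests. This is a sketch only: the four rows and their names are invented for illustration and are not real countries, and it assumes Python's standard `sqlite3` module. The second query relies on SQLite evaluating a comparison to 0 or 1, which may not carry over to every engine.

```python
import sqlite3

# Hypothetical rows, one per combination of "big by area" / "big by population"
ROWS = [
    ("BigArea", 4000000, 1000000),    # big by area only
    ("BigPop", 1000, 300000000),      # big by population only
    ("BigBoth", 4000000, 300000000),  # big by both
    ("Small", 1000, 1000000),         # big by neither
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE world (name TEXT, area INTEGER, population INTEGER)")
conn.executemany("INSERT INTO world VALUES (?, ?, ?)", ROWS)

# The form used in the answer above: (A OR B) AND NOT (A AND B)
xor_rows = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM world "
        "WHERE (area > 3000000 OR population > 250000000) "
        "AND NOT (area > 3000000 AND population > 250000000) "
        "ORDER BY name"
    )
]

# Alternative form: the two tests must give different truth values
alt_rows = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM world "
        "WHERE (area > 3000000) <> (population > 250000000) "
        "ORDER BY name"
    )
]

print(xor_rows)  # ['BigArea', 'BigPop']
print(alt_rows)  # ['BigArea', 'BigPop']
```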

**8. Show the name and population in millions and the GDP in billions for the countries of the continent 'South America'. Use the ROUND function to show the values to two decimal places.**

**For South America show population in millions and GDP in billions both to 2 decimal places.**

**Divide by 1000000 (6 zeros) for millions. Divide by 1000000000 (9 zeros) for billions.**

> `SELECT name, ROUND(population/1000000, 2) AS pop_in_millions, ROUND(gdp/1000000000, 2) AS gdp_in_billions
FROM world
WHERE continent = 'South America'`

**9. Show the name and per-capita GDP for those countries with a GDP of at least one trillion (1000000000000; that is 12 zeros). Round this value to the nearest 1000.** 

**Show per-capita GDP for the trillion dollar countries to the nearest $1000.**

> `SELECT name, ROUND(gdp/population, -3)
FROM world
WHERE gdp >= 1000000000000`

**10. Greece has capital Athens. Each of the strings 'Greece', and 'Athens' has 6 characters.** 

**Show the name and capital where the name and the capital have the same number of characters. You can use the LENGTH function to find the number of characters in a string.**

> `SELECT name, capital
FROM world
WHERE LENGTH(name) = LENGTH(capital)`

**11. The capital of Sweden is Stockholm. Both words start with the letter 'S'.** 

**Show the name and the capital where the first letters of each match. Don't include countries where the name and the capital are the same word.** 

**You can use the function LEFT to isolate the first character. You can use <> as the NOT EQUALS operator.**

> `SELECT name, capital
FROM world
WHERE LEFT(name, 1) = LEFT(capital, 1) AND NOT name = capital`

**12. Equatorial Guinea and Dominican Republic have all of the vowels (a e i o u) in the name. They don't count because they have more than one word in the name.** 

**Find the country that has all the vowels and no spaces in its name.** 

**You can use the phrase name NOT LIKE '%a%' to exclude characters from your results. The query shown misses countries like Bahamas and Belarus because they contain at least one 'a'**

> `SELECT name FROM world
WHERE name LIKE '%a%' 
  AND name LIKE '%e%' 
  AND name LIKE '%i%' 
  AND name LIKE '%o%' 
  AND name LIKE '%u%' 
  AND name NOT LIKE '% %';`
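
The chained `LIKE` tests can be exercised on a handful of names. A minimal sketch, assuming Python's standard `sqlite3` module and a one-column table: the first four names are the ones mentioned in the question text, and Mozambique is added as a single-word name that contains all five vowels.

```python
import sqlite3

NAMES = [
    ("Equatorial Guinea",),   # all vowels, but contains a space
    ("Dominican Republic",),  # all vowels, but contains a space
    ("Bahamas",),             # only the vowel 'a'
    ("Belarus",),             # no 'i' and no 'o'
    ("Mozambique",),          # all vowels, no space
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE world (name TEXT)")
conn.executemany("INSERT INTO world VALUES (?)", NAMES)

all_vowels = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM world "
        "WHERE name LIKE '%a%' "
        "AND name LIKE '%e%' "
        "AND name LIKE '%i%' "
        "AND name LIKE '%o%' "
        "AND name LIKE '%u%' "
        "AND name NOT LIKE '% %'"
    )
]

print(all_vowels)  # ['Mozambique']
```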

## Examples
### Using SUM, COUNT, MAX, DISTINCT and ORDER BY

#### BBC Country Profile

##### Aggregates
The functions `SUM`, `COUNT`, `MAX` and `AVG` are "aggregates". Each may be applied to a numeric attribute, resulting in a single row being returned by the query. (These functions are even more useful when used with the `GROUP BY` clause.)

##### Distinct
By default the result of a `SELECT` may contain duplicate rows. We can remove these duplicates using the `DISTINCT` keyword.

##### Order by
`ORDER BY` permits us to see the result of a `SELECT` in any particular order. We may indicate `ASC` or `DESC` for ascending (smallest first, largest last) or descending order.

1. The total population and GDP of Europe.

> `SELECT SUM(population), SUM(gdp)
  FROM bbc
  WHERE region = 'Europe'`

2. What are the regions?

> `SELECT DISTINCT region FROM bbc`

3. Show the name and population for each country with a population of more than 100000000. Show countries in descending order of population.

> `SELECT name, population
  FROM bbc
  WHERE population > 100000000
  ORDER BY population DESC`
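
The three queries above can be run on a tiny copy of the table. The sketch below assumes Python's standard `sqlite3` module and loads only the four `bbc` rows shown earlier in these notes, with the empty gdp cells stored as NULL. Because the sample is so small, the population threshold in the last query is lowered to 1000000, which is an assumption made purely for the demonstration.

```python
import sqlite3

# First four rows of the bbc table as shown earlier; empty gdp cells are NULL
ROWS = [
    ("Afghanistan", "South Asia", 652225, 26000000, None),
    ("Albania", "Europe", 28728, 3200000, 6656000000),
    ("Algeria", "Middle East", 2400000, 32900000, 75012000000),
    ("Andorra", "Europe", 468, 64000, None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bbc "
    "(name TEXT, region TEXT, area INTEGER, population INTEGER, gdp INTEGER)"
)
conn.executemany("INSERT INTO bbc VALUES (?, ?, ?, ?, ?)", ROWS)

# Aggregates collapse the matching rows into one row; SUM skips NULL values
europe_totals = conn.execute(
    "SELECT SUM(population), SUM(gdp) FROM bbc WHERE region = 'Europe'"
).fetchone()

# DISTINCT removes the duplicate 'Europe'
regions = sorted(
    row[0] for row in conn.execute("SELECT DISTINCT region FROM bbc")
)

# ORDER BY ... DESC lists the largest population first
by_population = [
    row[0]
    for row in conn.execute(
        "SELECT name, population FROM bbc "
        "WHERE population > 1000000 "
        "ORDER BY population DESC"
    )
]

print(europe_totals)  # (3264000, 6656000000)
print(regions)        # ['Europe', 'Middle East', 'South Asia']
print(by_population)  # ['Algeria', 'Afghanistan', 'Albania']
```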
  
### Using GROUP BY and HAVING
#### World Country Profile

##### GROUP BY
By including a `GROUP BY` clause, functions such as `SUM` and `COUNT` are applied to groups of items sharing values. When you specify `GROUP BY` `continent` the result is that you get only one row for each different value of `continent`. All the other columns must be "aggregated" by one of `SUM`, `COUNT` ...

##### HAVING
The `HAVING` clause allows us to filter the groups which are displayed. The `WHERE` clause filters rows before the aggregation, the `HAVING` clause filters after the aggregation.

If an `ORDER BY` clause is included we can refer to columns by their position.

1. For each continent show the number of countries:

> `SELECT continent, COUNT(name)
  FROM world
 GROUP BY continent`

2. For each continent show the total population:

> `SELECT continent, SUM(population)
  FROM world
 GROUP BY continent`

3. `WHERE` and `GROUP BY`. The `WHERE` filter takes place before the aggregating function. For each relevant continent show the number of countries that have a population of at least 200000000.

> `SELECT continent, COUNT(name)
  FROM world
 WHERE population>=200000000
 GROUP BY continent`

4. `GROUP BY` and `HAVING`. The `HAVING` clause is tested after the `GROUP BY`. You can test the aggregated values with a `HAVING` clause. Show the total population of those continents with a total population of at least half a billion.

> `SELECT continent, SUM(population)
  FROM world
 GROUP BY continent
HAVING SUM(population)>=500000000`
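
The difference between filtering before and after aggregation shows up even on five rows. This is a sketch assuming Python's standard `sqlite3` module and the five sample `world` rows from the top of these notes. The thresholds (10 million per country, 50 million per continent) are scaled down from the ones in the examples so that the small sample produces visible results.

```python
import sqlite3

# (name, continent, population) for the five sample rows of world
ROWS = [
    ("Afghanistan", "Asia", 25500100),
    ("Albania", "Europe", 2831741),
    ("Algeria", "Africa", 37100000),
    ("Andorra", "Europe", 78115),
    ("Angola", "Africa", 20609294),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE world (name TEXT, continent TEXT, population INTEGER)")
conn.executemany("INSERT INTO world VALUES (?, ?, ?)", ROWS)

# One row per continent
counts = conn.execute(
    "SELECT continent, COUNT(name) FROM world "
    "GROUP BY continent ORDER BY continent"
).fetchall()

# WHERE removes small countries BEFORE grouping, so Europe disappears
where_first = conn.execute(
    "SELECT continent, COUNT(name) FROM world "
    "WHERE population >= 10000000 "
    "GROUP BY continent ORDER BY continent"
).fetchall()

# HAVING removes whole groups AFTER the totals have been computed
having_after = conn.execute(
    "SELECT continent, SUM(population) FROM world "
    "GROUP BY continent "
    "HAVING SUM(population) >= 50000000"
).fetchall()

print(counts)        # [('Africa', 2), ('Asia', 1), ('Europe', 2)]
print(where_first)   # [('Africa', 2), ('Asia', 1)]
print(having_after)  # [('Africa', 57709294)]
```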

***


# SUM and COUNT
### World Country Profile: Aggregate functions
This tutorial is about aggregate functions such as COUNT, SUM and AVG. An aggregate function takes many values and delivers just one value. For example the function SUM would aggregate the values 2, 4 and 5 to deliver the single value 11.

`world(name, continent, area, population, gdp)`


In [2]:
pd.read_clipboard()

Unnamed: 0,name,continent,area,population,gdp
0,Afghanistan,Asia,652230.0,25500100.0,20343000000.0
1,Albania,Europe,28748.0,2831741.0,12960000000.0
2,Algeria,Africa,2381741.0,37100000.0,188681000000.0
3,Andorra,Europe,468.0,78115.0,3712000000.0
4,Angola,Africa,1246700.0,20609294.0,100990000000.0
5,...,,,,


1. Show the total `population` of the world.

> `SELECT SUM(population)
FROM world`

2. List all the continents - just once each.
> `SELECT DISTINCT continent
FROM world;`

3. Give the total GDP of Africa
> `SELECT SUM(gdp)
FROM world
WHERE continent = 'Africa'
GROUP BY continent;`

4. How many countries have an `area` of at least 1000000?
>`SELECT COUNT(name)
FROM world
WHERE area >= 1000000;`

5. What is the total `population` of ('Estonia', 'Latvia', 'Lithuania')?
> `SELECT SUM(population)
FROM world
WHERE name IN ('Estonia', 'Latvia', 'Lithuania');`

6. For each `continent` show the `continent` and number of countries.
> `SELECT continent, COUNT(name)
FROM world
GROUP BY continent;`

7. For each `continent` show the `continent` and number of countries with populations of at least 10 million.
> `SELECT continent, COUNT(name)
FROM world
WHERE population >= 10000000
GROUP BY continent;`

8. List the continents that `have` a total population of at least 100 million.
> `SELECT continent
FROM world
GROUP BY continent
HAVING SUM(population) >= 100000000;`

# SUM and COUNT Quiz
bbc

In [3]:
pd.read_clipboard()

Unnamed: 0,name,region,area,population,gdp
0,Afghanistan,South Asia,652225.0,26000000.0,
1,Albania,Europe,28728.0,3200000.0,6656000000.0
2,Algeria,Middle East,2400000.0,32900000.0,75012000000.0
3,Andorra,Europe,468.0,64000.0,
4,...,,,,


1. Select the statement that shows the sum of population of all countries in 'Europe'

> ` SELECT SUM(population) FROM bbc WHERE region = 'Europe'`

2. Select the statement that shows the number of countries with population smaller than 150000
> ` SELECT COUNT(name) FROM bbc WHERE population < 150000`

3. Select the list of core SQL aggregate functions
> `AVG(), COUNT(), MAX(), MIN(), SUM()`

4. Select the result that would be obtained from the following code:
` SELECT region, SUM(area)
   FROM bbc 
  WHERE SUM(area) > 15000000 
  GROUP BY region`
  
> No result due to invalid use of the WHERE function
(an aggregate such as SUM cannot appear in a WHERE clause; filter the groups with a HAVING clause instead)
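
This behaviour can be reproduced locally. The sketch below assumes Python's standard `sqlite3` module and the four sample `bbc` rows shown above; the exact error text is engine specific, so it is captured rather than hard-coded. The area threshold is lowered to 1000000 (an assumption for the demonstration) so that the `HAVING` version returns a row from the small sample.

```python
import sqlite3

# (name, region, area) for the four sample rows of bbc
ROWS = [
    ("Afghanistan", "South Asia", 652225),
    ("Albania", "Europe", 28728),
    ("Algeria", "Middle East", 2400000),
    ("Andorra", "Europe", 468),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bbc (name TEXT, region TEXT, area INTEGER)")
conn.executemany("INSERT INTO bbc VALUES (?, ?, ?)", ROWS)

# An aggregate inside WHERE is rejected before any rows are read
try:
    conn.execute(
        "SELECT region, SUM(area) FROM bbc "
        "WHERE SUM(area) > 1000000 GROUP BY region"
    )
    where_error = None
except sqlite3.Error as exc:
    where_error = str(exc)

# The same test expressed with HAVING is accepted
having_rows = conn.execute(
    "SELECT region, SUM(area) FROM bbc "
    "GROUP BY region HAVING SUM(area) > 1000000"
).fetchall()

print(where_error is not None)  # True
print(having_rows)              # [('Middle East', 2400000)]
```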

5. Select the statement that shows the average population of 'Poland', 'Germany' and 'Denmark'
> ` SELECT AVG(population) FROM bbc WHERE name IN ('Poland', 'Germany', 'Denmark')`

6. Select the statement that shows the medium population density of each region
> ` SELECT region, SUM(population)/SUM(area) AS density FROM bbc GROUP BY region`

7. Select the statement that shows the name and population density of the country with the largest population
> ` SELECT name, population/area AS density FROM bbc WHERE population = (SELECT MAX(population) FROM bbc)`
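
The subquery form in the last answer can be tried on the four sample `bbc` rows. A minimal sketch, assuming Python's standard `sqlite3` module; the columns are declared `REAL` so that `population/area` is a floating-point division in SQLite.

```python
import sqlite3

# (name, area, population) for the four sample rows of bbc
ROWS = [
    ("Afghanistan", 652225, 26000000),
    ("Albania", 28728, 3200000),
    ("Algeria", 2400000, 32900000),
    ("Andorra", 468, 64000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bbc (name TEXT, area REAL, population REAL)")
conn.executemany("INSERT INTO bbc VALUES (?, ?, ?)", ROWS)

# The inner SELECT finds the largest population; the outer SELECT
# returns the row (or rows) whose population equals that value
largest = conn.execute(
    "SELECT name, population/area AS density FROM bbc "
    "WHERE population = (SELECT MAX(population) FROM bbc)"
).fetchall()

largest_name, largest_density = largest[0]
print(largest_name)               # Algeria
print(round(largest_density, 2))  # 13.71
```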

8. Pick the result that would be obtained from the following code:
` SELECT region, SUM(area) 
   FROM bbc 
  GROUP BY region 
  HAVING SUM(area)<= 20000000`
  
> | region | SUM(area) |
| --------| ---------- |
| Americas | 732240 |
| Middle East | 13403102 |
| South America | 17740392 |
| South Asia | 9437710 |