# SQL: Query data based on criteria _ Deal with null values _ Create a table from others
---- 
## OUTLINE:
1. SELECT statement:
- Query all the data from the table
- Query specified columns
- Show the number of rows within the table: COUNT()
- Show unique values: DISTINCT 
- Show the number of unique values: COUNT(DISTINCT )
- NESTED STATEMENTS
2. WHERE statement:
- pass the condition into the query
- Logical operators
- Deal with missing values: null or empty strings?
3. Wildcards and LIKE operator:
- Query rows associated with values that are partly remembered
4. Create a table from other tables: CREATE TABLE tab_name AS

WELL-NOTED:
- An `empty string` is `NOT a null` in SQL.
=> Before processing the data, check how the empty cells are represented: as nulls or as empty strings.
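
The difference can be checked on a toy table. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table `demo` and its three rows are made up for illustration and are not part of this notebook's data.

```python
import sqlite3

# In-memory database with a hypothetical table `demo` (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (label TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO demo VALUES (?, ?)",
    [("filled", "42"), ("empty string", ""), ("null", None)],
)

# IS NULL matches only the true NULL, not the empty string.
null_rows = conn.execute(
    "SELECT label FROM demo WHERE value IS NULL"
).fetchall()

# = '' matches only the empty string, not the NULL.
empty_rows = conn.execute(
    "SELECT label FROM demo WHERE value = ''"
).fetchall()

print(null_rows)   # [('null',)]
print(empty_rows)  # [('empty string',)]
conn.close()
```

Each condition finds a different row, which is why a check for missing data has to match the representation actually used in the table.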

In [1]:
%load_ext sql
#the magic command to load the SQL extension

%sql sqlite:///testdatabase.db
#connect to the database; SQLite creates the file on your computer if it does not exist

### a. Query all the data from the table:
```sql
SELECT *
FROM table
```
- The symbol `*` stands for all columns of the table.

In [2]:
%%sql
SELECT *
FROM population

 * sqlite:///testdatabase.db
Done.


YEAR,ETHNIC_GROUP,GENDER,AGE,VALUE
2022,All ethnicity,All genders,All ages,3553749.0
2022,All ethnicity,All genders,0 - 4 Years,169201.0
2022,All ethnicity,All genders,5 - 9 Years,182516.0
2022,All ethnicity,All genders,10 - 14 Years,179901.0
2022,All ethnicity,All genders,15 - 19 Years,184960.0
2022,All ethnicity,All genders,20 - 24 Years,215664.0
2022,All ethnicity,All genders,25 - 29 Years,242043.0
2022,All ethnicity,All genders,30 - 34 Years,254000.0
2022,All ethnicity,All genders,35 - 39 Years,222105.0
2022,All ethnicity,All genders,40 - 44 Years,226300.0


### b. Query specified columns from the table:
```sql
SELECT col1, col2,...
FROM table
```

In [3]:
%%sql
SELECT YEAR, ETHNIC_GROUP 
FROM population

 * sqlite:///testdatabase.db
Done.


YEAR,ETHNIC_GROUP
2022,All ethnicity
2022,All ethnicity
2022,All ethnicity
2022,All ethnicity
2022,All ethnicity
2022,All ethnicity
2022,All ethnicity
2022,All ethnicity
2022,All ethnicity
2022,All ethnicity


### c. Show the number of rows within the table:
```sql
SELECT COUNT(*)
FROM table
```

In [4]:
%%sql
SELECT COUNT(*) AS number_of_rows
FROM population

 * sqlite:///testdatabase.db
Done.


number_of_rows
13125


### d. SQL alias:

- `number_of_rows` in the query above is a SQL alias, introduced with the `AS` keyword.
> **SQL alias**: 
- A SQL alias is a temporary name assigned to a column (or expression) or to a table within a SQL statement. 
- A SQL alias makes it easier to work with tables and columns in a complex SQL statement.
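
As a small illustration of both kinds of alias, the sketch below runs a query against a toy `population` table built in memory with Python's `sqlite3` module. The three rows and the table alias `p` are assumptions made up for this example.

```python
import sqlite3

# Toy stand-in for the population table (made-up rows, illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE population (YEAR INTEGER, ETHNIC_GROUP TEXT)")
conn.executemany(
    "INSERT INTO population VALUES (?, ?)",
    [(2021, "Malays"), (2022, "Malays"), (2022, "Chinese")],
)

# `number_of_rows` is a column alias; `p` is a table alias.
cursor = conn.execute(
    "SELECT COUNT(*) AS number_of_rows "
    "FROM population AS p "
    "WHERE p.YEAR = 2022"
)
column_names = [d[0] for d in cursor.description]
result = cursor.fetchone()

print(column_names, result)  # ['number_of_rows'] (2,)
conn.close()
```

The alias only exists for the duration of the statement; the underlying table and column names are unchanged.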

### e. Show all the unique values of a column or unique tuples of a table:
```sql
SELECT DISTINCT col1, col2,...
FROM table
```

In [5]:
%%sql
SELECT DISTINCT ETHNIC_GROUP
FROM population

 * sqlite:///testdatabase.db
Done.


ETHNIC_GROUP
All ethnicity
Malays
Chinese
Indians
Other Ethnic Groups


### f. Show the total number of unique values within a column:
```sql
SELECT COUNT(DISTINCT COL)
FROM table
```

In [6]:
%%sql
SELECT COUNT(DISTINCT ETHNIC_GROUP) AS numb_of_groups
FROM population

 * sqlite:///testdatabase.db
Done.


numb_of_groups
5


In [7]:
%%sql
SELECT COUNT(DISTINCT YEAR, ETHNIC_GROUP, GENDER, AGE, VALUE)
FROM population;

 * sqlite:///testdatabase.db
(sqlite3.OperationalError) wrong number of arguments to function COUNT()
[SQL: SELECT COUNT(DISTINCT YEAR, ETHNIC_GROUP, GENDER, AGE, VALUE)
FROM population;]
(Background on this error at: https://sqlalche.me/e/14/e3q8)


> An error arises because SQLite's COUNT() function does not accept the DISTINCT keyword together with multiple columns. To count the unique tuples (rows) in the table, we must use a SQL NESTED STATEMENT.

### g. NESTED STATEMENTS:
- A SQL NESTED STATEMENT, also called a SUBQUERY, is a statement embedded within another SQL statement and enclosed in a pair of parentheses. 
- A nested statement is used to break down a complicated process into smaller parts. For instance:

In [8]:
%%sql
SELECT COUNT(*) AS num_of_unique
FROM (
    SELECT DISTINCT YEAR, ETHNIC_GROUP, GENDER, AGE, VALUE
    FROM population
)

 * sqlite:///testdatabase.db
Done.


num_of_unique
13125


> The number of unique rows (13125) equals the total number of rows in the table, so the table contains no duplicate rows.

### h. Pass the condition into the query:
- Use the `WHERE` clause to keep only the rows that satisfy a condition.

### i. LOGICAL OPERATORS in SQL:
- AND: all of the specified criteria are satisfied
- OR: at least one of the specified criteria is satisfied
- NOT: negates a criterion (true when the criterion is not satisfied)
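
A minimal sketch of the three operators inside a `WHERE` clause, again using Python's `sqlite3` module. The four rows below are invented for the example and only mimic the shape of the `population` table; the helper `values_where` is hypothetical.

```python
import sqlite3

# Toy stand-in for the population table (made-up rows, illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE population "
    "(YEAR INTEGER, ETHNIC_GROUP TEXT, GENDER TEXT, VALUE INTEGER)"
)
conn.executemany(
    "INSERT INTO population VALUES (?, ?, ?, ?)",
    [
        (2021, "Indians", "Female", 10),
        (2021, "Indians", "Male", 11),
        (2022, "Malays", "Female", 20),
        (2022, "Chinese", "Male", 30),
    ],
)

def values_where(condition):
    """Return the sorted VALUE column of the rows matching `condition`."""
    rows = conn.execute(f"SELECT VALUE FROM population WHERE {condition}")
    return sorted(v for (v,) in rows)

# AND: both criteria must hold.
and_rows = values_where("GENDER = 'Female' AND ETHNIC_GROUP = 'Indians'")
# OR: at least one criterion must hold.
or_rows = values_where("ETHNIC_GROUP = 'Malays' OR ETHNIC_GROUP = 'Chinese'")
# NOT: keeps the rows for which the criterion is false.
not_rows = values_where("NOT GENDER = 'Male'")

print(and_rows)  # [10]
print(or_rows)   # [20, 30]
print(not_rows)  # [10, 20]
conn.close()
```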

### j. Deal with missing values: null or empty strings?
- Show the rows that contain null values

In [9]:
%%sql
SELECT *
FROM population
WHERE YEAR IS NULL 
OR ETHNIC_GROUP IS NULL
OR GENDER IS NULL
OR AGE IS NULL
OR VALUE IS NULL


 * sqlite:///testdatabase.db
Done.


YEAR,ETHNIC_GROUP,GENDER,AGE,VALUE


> No rows are returned even though there are blank cells within the table (we can tell by looking at the CSV file). 
- The reason is that, when the table was loaded into the database, all empty cells were stored as empty strings rather than nulls.
--> Find the rows that contain empty strings

In [10]:
%%sql
SELECT *
FROM population
WHERE YEAR =''
OR ETHNIC_GROUP =''
OR GENDER  =''
OR AGE =''
OR VALUE =''


 * sqlite:///testdatabase.db
Done.


YEAR,ETHNIC_GROUP,GENDER,AGE,VALUE
1999,All ethnicity,All genders,85 - 89 Years,
1999,All ethnicity,All genders,90 Years & Over,
1999,All ethnicity,Male,85 - 89 Years,
1999,All ethnicity,Male,90 Years & Over,
1999,All ethnicity,Female,85 - 89 Years,
1999,All ethnicity,Female,90 Years & Over,
1999,Malays,All genders,85 - 89 Years,
1999,Malays,All genders,90 Years & Over,
1999,Malays,Male,85 - 89 Years,
1999,Malays,Male,90 Years & Over,
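
When a table might contain both representations, one option is the standard `NULLIF(x, '')` function, which returns null when `x` is an empty string, so that a single `IS NULL` test covers both cases. The sketch below is an illustration on made-up rows using Python's `sqlite3` module, not a step performed in this notebook; the helper `count_where` is hypothetical.

```python
import sqlite3

# Toy stand-in for the population table (made-up rows, illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE population (YEAR INTEGER, ETHNIC_GROUP TEXT, VALUE TEXT)")
conn.executemany(
    "INSERT INTO population VALUES (?, ?, ?)",
    [
        (1999, "All ethnicity", ""),           # empty string
        (1999, "Malays", ""),                  # empty string
        (2000, "Malays", None),                # true NULL
        (2022, "All ethnicity", "3553749.0"),  # real value
    ],
)

def count_where(condition):
    """Count the rows of the toy table that satisfy `condition`."""
    query = f"SELECT COUNT(*) FROM population WHERE {condition}"
    return conn.execute(query).fetchone()[0]

null_only = count_where("VALUE IS NULL")
empty_only = count_where("VALUE = ''")
# NULLIF(VALUE, '') turns '' into NULL, so this counts both kinds of gap.
missing_either = count_where("NULLIF(VALUE, '') IS NULL")

print(null_only, empty_only, missing_either)  # 1 2 3
conn.close()
```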


### k. How to query rows with values that you partly remember.
=> Use wildcards and `LIKE` operator
- `WILDCARDS` are characters used to represent other characters:
    - `%`: represents any sequence of zero or more characters
    - `_`: represents exactly one character
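
The two wildcards can be compared on a short list of names. The sketch below uses Python's `sqlite3` module with a hypothetical one-column table `ethnic_groups` holding the five group names that appear in this notebook; the helper `match` is also made up for the example.

```python
import sqlite3

# Hypothetical one-column table holding the five group names (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ethnic_groups (name TEXT)")
conn.executemany(
    "INSERT INTO ethnic_groups VALUES (?)",
    [("All ethnicity",), ("Malays",), ("Chinese",), ("Indians",),
     ("Other Ethnic Groups",)],
)

def match(pattern):
    """Return the sorted names that match a LIKE pattern."""
    rows = conn.execute(
        "SELECT name FROM ethnic_groups WHERE name LIKE ?", (pattern,)
    )
    return sorted(name for (name,) in rows)

starts_with_m = match("M%")          # 'M' followed by anything
ends_with_ns = match("%ns")          # anything followed by 'ns'
one_char_then = match("_ndians")     # exactly one character, then 'ndians'
contains_ethnic = match("%ethnic%")  # 'ethnic' anywhere in the name

print(starts_with_m)    # ['Malays']
print(ends_with_ns)     # ['Indians']
print(one_char_then)    # ['Indians']
print(contains_ethnic)  # ['All ethnicity', 'Other Ethnic Groups']
conn.close()
```

The last pattern matches `Other Ethnic Groups` as well because SQLite's `LIKE` is case-insensitive for ASCII letters by default; other database systems may behave differently.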

- For instance, suppose you want to extract all data for the Malay group from the table but do not remember the exact name assigned to this group, only that it starts with `M`.

In [11]:
%%sql
SELECT *
FROM population 
WHERE ETHNIC_GROUP LIKE 'M%'

 * sqlite:///testdatabase.db
Done.


YEAR,ETHNIC_GROUP,GENDER,AGE,VALUE
2022,Malays,All genders,All ages,538454.0
2022,Malays,All genders,0 - 4 Years,40132.0
2022,Malays,All genders,5 - 9 Years,34562.0
2022,Malays,All genders,10 - 14 Years,30918.0
2022,Malays,All genders,15 - 19 Years,33922.0
2022,Malays,All genders,20 - 24 Years,41578.0
2022,Malays,All genders,25 - 29 Years,46312.0
2022,Malays,All genders,30 - 34 Years,46975.0
2022,Malays,All genders,35 - 39 Years,38015.0
2022,Malays,All genders,40 - 44 Years,29643.0


- Show the number of Indian females of each age group in 2021:

In [12]:
%%sql
SELECT *
FROM population 
WHERE GENDER = 'Female'
AND ETHNIC_GROUP = 'Indians'
AND YEAR = 2021

 * sqlite:///testdatabase.db
Done.


YEAR,ETHNIC_GROUP,GENDER,AGE,VALUE
2021,Indians,Female,0 - 4 Years,5991
2021,Indians,Female,10 - 14 Years,7422
2021,Indians,Female,15 - 19 Years,8448
2021,Indians,Female,20 - 24 Years,8965
2021,Indians,Female,25 - 29 Years,9434
2021,Indians,Female,30 - 34 Years,8991
2021,Indians,Female,35 - 39 Years,8147
2021,Indians,Female,40 - 44 Years,8201
2021,Indians,Female,45 - 49 Years,8738
2021,Indians,Female,5 - 9 Years,6468


### l. Create a table from other tables:
- For the convenience of our analysis, let's create a table of the queried result.
```sql
CREATE TABLE tab_name AS
SELECT col1, col2, ...
FROM tab
WHERE <condition>
```
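
The template above can be tried end to end on a toy table. The sketch below uses Python's `sqlite3` module; the three rows and the table name `indian_female_2021_demo` are assumptions made up for this example.

```python
import sqlite3

# Toy stand-in for the population table (made-up rows, illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE population "
    "(YEAR INTEGER, ETHNIC_GROUP TEXT, GENDER TEXT, VALUE REAL)"
)
conn.executemany(
    "INSERT INTO population VALUES (?, ?, ?, ?)",
    [
        (2021, "Indians", "Female", 10.0),
        (2021, "Indians", "Male", 11.0),
        (2022, "Indians", "Female", 12.0),
    ],
)

# Drop any earlier copy so the script can be re-run without an error.
conn.execute("DROP TABLE IF EXISTS indian_female_2021_demo")

# The new table takes its columns and rows from the SELECT result.
conn.execute(
    """
    CREATE TABLE indian_female_2021_demo AS
    SELECT *
    FROM population
    WHERE GENDER = 'Female'
    AND ETHNIC_GROUP = 'Indians'
    AND YEAR = 2021
    """
)

new_rows = conn.execute("SELECT * FROM indian_female_2021_demo").fetchall()
print(new_rows)  # [(2021, 'Indians', 'Female', 10.0)]
conn.close()
```

The new table is an independent copy: later changes to `population` do not propagate to it.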

In [13]:
#drop the table if it already exists, so that the next cell can be re-run without an error:
%sql DROP TABLE IF EXISTS Indian_female_2021

 * sqlite:///testdatabase.db
Done.


[]

In [14]:
%%sql
CREATE TABLE Indian_female_2021 AS
SELECT *
FROM population 
WHERE GENDER = 'Female'
AND ETHNIC_GROUP = 'Indians'
AND YEAR = 2021

 * sqlite:///testdatabase.db
Done.


[]

In [15]:
#drop the table if it already exists, so that the next cell can be re-run without an error:
%sql DROP TABLE IF EXISTS Indian_male_2021

 * sqlite:///testdatabase.db
Done.


[]

In [16]:
%%sql
CREATE TABLE Indian_male_2021 AS
SELECT *
FROM population 
WHERE GENDER = 'Male'
AND ETHNIC_GROUP = 'Indians'
AND YEAR = 2021

 * sqlite:///testdatabase.db
Done.


[]

In [17]:
#drop the table if it already exists, so that the next cell can be re-run without an error:
%sql DROP TABLE IF EXISTS Indian_all_2021

 * sqlite:///testdatabase.db
Done.


[]

In [18]:
%%sql
CREATE TABLE Indian_all_2021 AS
SELECT *
FROM population 
WHERE GENDER = 'All genders'
AND ETHNIC_GROUP = 'Indians'
AND YEAR = 2021

 * sqlite:///testdatabase.db
Done.


[]