## Filtering Data

* We use the `WHERE` clause to filter data.

* Comparison operators such as `=`, `!=`, `>`, `<`, `<=`, and `>=` can be used to compare a column, expression, or literal with another column, expression, or literal.

* We can use `LIKE` with the `%` wildcard, or `~` with regular expressions, for pattern matching.

* Boolean `OR` and `AND` can be used to combine multiple conditions.
  * Get all orders with `order_status` equal to COMPLETE or CLOSED. We can also use the `IN` operator.
  
  * Get all orders from January 2014 with `order_status` equal to COMPLETE or CLOSED.
    
* We can also use `BETWEEN` along with `AND` to compare a column or expression against a range of values.

* We need to use `IS NULL` and `IS NOT NULL` to check for null values; a comparison such as `= NULL` never evaluates to true.
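
The point about null values can be illustrated with a small, self-contained sketch. It uses Python's built-in `sqlite3` module with an in-memory table instead of the PostgreSQL `orders` table, so the table name `demo_orders`, the helper `count_where`, and the rows are all hypothetical. The behavior of comparing against `NULL` is standard SQL and should be the same in PostgreSQL.

```python
import sqlite3

# Hypothetical in-memory table; not the PostgreSQL orders table used in this notebook.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_orders (order_id INTEGER, order_status TEXT)")
conn.executemany(
    "INSERT INTO demo_orders VALUES (?, ?)",
    [(1, "COMPLETE"), (2, None), (3, "CLOSED"), (4, None)],
)

def count_where(condition):
    """Return the number of rows in demo_orders that satisfy the condition."""
    query = f"SELECT count(*) FROM demo_orders WHERE {condition}"
    return conn.execute(query).fetchone()[0]

# Comparing with = NULL yields NULL (unknown), so no row passes the filter.
equals_null = count_where("order_status = NULL")
# IS NULL and IS NOT NULL are the predicates that actually test for nulls.
is_null = count_where("order_status IS NULL")
is_not_null = count_where("order_status IS NOT NULL")

print(equals_null, is_null, is_not_null)
```

With these made-up rows, the `= NULL` filter matches nothing, while `IS NULL` and `IS NOT NULL` each match two rows.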

In [1]:
%load_ext sql

In [2]:
%env DATABASE_URL=postgresql://suryakantkumar:None@localhost:5432/suryakantkumar

env: DATABASE_URL=postgresql://suryakantkumar:None@localhost:5432/suryakantkumar


* Get the number of orders that are either COMPLETE or CLOSED

In [3]:
%%sql 

SELECT
    COUNT(order_id)
FROM
    orders
WHERE
    order_status IN ('COMPLETE', 'CLOSED')

1 rows affected.


count
30455


* Get the same count using `OR` instead of `IN`


In [4]:
%%sql 

SELECT
    count(1)
FROM
    orders 
WHERE 
    order_status = 'COMPLETE' 
    OR 
    order_status = 'CLOSED'

 * postgresql://suryakantkumar:***@localhost:5432/suryakantkumar
1 rows affected.


count
30455
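
One thing worth keeping in mind when an `OR` condition like the one above is combined with `AND`: `AND` binds more tightly than `OR`, so the `OR` part generally needs parentheses, or can be written with `IN`. The sketch below uses Python's built-in `sqlite3` with a hypothetical `demo_orders` table and made-up rows rather than the notebook's PostgreSQL data; the precedence of `AND` over `OR` is the same in PostgreSQL.

```python
import sqlite3

# Hypothetical table and rows for illustration; dates are stored as text here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo_orders (order_id INTEGER, order_date TEXT, order_status TEXT)"
)
conn.executemany(
    "INSERT INTO demo_orders VALUES (?, ?, ?)",
    [
        (1, "2014-01-05", "COMPLETE"),
        (2, "2014-01-09", "CLOSED"),
        (3, "2014-02-11", "COMPLETE"),
        (4, "2014-02-12", "CLOSED"),
        (5, "2014-01-20", "PENDING"),
    ],
)

def count_where(condition):
    """Return the number of rows in demo_orders that satisfy the condition."""
    return conn.execute(
        f"SELECT count(*) FROM demo_orders WHERE {condition}"
    ).fetchone()[0]

in_january = "order_date < '2014-02-01'"

# Without parentheses this reads as: COMPLETE OR (CLOSED AND in January).
no_parens = count_where(
    f"order_status = 'COMPLETE' OR order_status = 'CLOSED' AND {in_january}"
)
# Parentheses restrict both statuses to January.
with_parens = count_where(
    f"(order_status = 'COMPLETE' OR order_status = 'CLOSED') AND {in_january}"
)
# IN avoids the issue because it is a single condition.
with_in = count_where(f"order_status IN ('COMPLETE', 'CLOSED') AND {in_january}")

print(no_parens, with_parens, with_in)
```

In this made-up data the unparenthesized filter also counts the February order that is COMPLETE, so it returns one more row than the other two forms.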


* Get all the orders placed on '2014-01-01'

In [5]:
%%sql

SELECT 
    * 
FROM 
    orders
WHERE 
    order_date = '2014-01-01 00:00:00'
LIMIT
    5

 * postgresql://suryakantkumar:***@localhost:5432/suryakantkumar
5 rows affected.


order_id,order_date,order_customer_id,order_status
25876,2014-01-01 00:00:00,3414,PENDING_PAYMENT
25877,2014-01-01 00:00:00,5549,PENDING_PAYMENT
25878,2014-01-01 00:00:00,9084,PENDING
25879,2014-01-01 00:00:00,5118,PENDING
25880,2014-01-01 00:00:00,10146,CANCELED


* The time portion can be omitted; `'2014-01-01'` is interpreted as midnight on that date


In [6]:
%%sql

SELECT 
    * 
FROM 
    orders
WHERE 
    order_date = '2014-01-01'
LIMIT 
    5

 * postgresql://suryakantkumar:***@localhost:5432/suryakantkumar
5 rows affected.


order_id,order_date,order_customer_id,order_status
25876,2014-01-01 00:00:00,3414,PENDING_PAYMENT
25877,2014-01-01 00:00:00,5549,PENDING_PAYMENT
25878,2014-01-01 00:00:00,9084,PENDING
25879,2014-01-01 00:00:00,5118,PENDING
25880,2014-01-01 00:00:00,10146,CANCELED


* Get all the orders placed in January 2014

* This query does not work: `LIKE` operates on character data, not on date or timestamp columns such as `order_date`

In [7]:
%%sql

SELECT 
    * 
FROM 
    orders
WHERE 
    order_date LIKE '2014-01%'
LIMIT 
    5

 * postgresql://suryakantkumar:***@localhost:5432/suryakantkumar
(psycopg2.errors.UndefinedFunction) operator does not exist: timestamp without time zone ~~ unknown
LINE 6:     order_date LIKE '2014-01%'
                       ^
HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.

[SQL: SELECT 
    * 
FROM 
    orders
WHERE 
    order_date LIKE '2014-01%%'
LIMIT 
    5]
(Background on this error at: https://sqlalche.me/e/14/f405)


* Get all the completed or closed orders placed in January 2014. Converting `order_date` to text with `to_char` lets us use `LIKE`

In [8]:
%%sql

SELECT
    * 
FROM 
    orders 
WHERE 
    order_status IN ('COMPLETE', 'CLOSED')
    AND 
    to_char(order_date, 'yyyy-MM-dd') LIKE '2014-01%'
LIMIT
    5

 * postgresql://suryakantkumar:***@localhost:5432/suryakantkumar
5 rows affected.


order_id,order_date,order_customer_id,order_status
25882,2014-01-01 00:00:00,4598,COMPLETE
25888,2014-01-01 00:00:00,6735,COMPLETE
25889,2014-01-01 00:00:00,10045,COMPLETE
25891,2014-01-01 00:00:00,3037,CLOSED
25895,2014-01-01 00:00:00,1044,COMPLETE


* Get the count of completed or closed orders placed in January 2014

In [9]:
%%sql

SELECT 
    count(1) 
FROM 
    orders 
WHERE 
    order_status IN ('COMPLETE', 'CLOSED')
    AND 
    to_char(order_date, 'yyyy-MM-dd') LIKE '2014-01%'

 * postgresql://suryakantkumar:***@localhost:5432/suryakantkumar
1 rows affected.


count
2544


* Get all the completed or closed orders placed in January 2014, this time comparing the year and month with `=`

In [10]:
%%sql

SELECT 
    * 
FROM 
    orders 
WHERE 
    order_status IN ('COMPLETE', 'CLOSED')
    AND 
    to_char(order_date, 'yyyy-MM') = '2014-01'
LIMIT 
    5

 * postgresql://suryakantkumar:***@localhost:5432/suryakantkumar
5 rows affected.


order_id,order_date,order_customer_id,order_status
25882,2014-01-01 00:00:00,4598,COMPLETE
25888,2014-01-01 00:00:00,6735,COMPLETE
25889,2014-01-01 00:00:00,10045,COMPLETE
25891,2014-01-01 00:00:00,3037,CLOSED
25895,2014-01-01 00:00:00,1044,COMPLETE


In [11]:
%%sql

SELECT 
    count(1) 
FROM 
    orders 
WHERE 
    order_status IN ('COMPLETE', 'CLOSED')
    AND 
    to_char(order_date, 'yyyy-MM') = '2014-01'

 * postgresql://suryakantkumar:***@localhost:5432/suryakantkumar
1 rows affected.


count
2544


* Get the same count using the `~` regular expression operator


In [12]:
%%sql

SELECT 
    count(1) 
FROM 
    orders 
WHERE 
    order_status IN ('COMPLETE', 'CLOSED')
    AND 
    to_char(order_date, 'yyyy-MM-dd') ~ '2014-01'

 * postgresql://suryakantkumar:***@localhost:5432/suryakantkumar
1 rows affected.


count
2544
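
Because `~` matches its pattern anywhere in the string, a short pattern can match more than intended. The sketch below imitates that behavior with Python's `re.search`, which is likewise unanchored. It is only an analogy for the PostgreSQL operator, and the sample date strings are made up.

```python
import re

# Made-up date strings in the same yyyy-MM-dd layout produced by to_char above.
dates = ["2014-01-15", "2014-03-01", "2013-01-07", "2014-12-01"]

# Unanchored: '-01' is found in the month position and in the day position.
unanchored = [d for d in dates if re.search(r"-01", d)]

# Anchored with ^ so the pattern must start at the beginning of the string.
anchored = [d for d in dates if re.search(r"^2014-01", d)]

print(unanchored)
print(anchored)
```

The pattern `'2014-01'` used above is long enough to be unambiguous for this date layout, but anchoring it, as in `to_char(order_date, 'yyyy-MM-dd') ~ '^2014-01'`, makes the intent explicit.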


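* Get the count, the earliest and latest order dates, and the number of distinct order dates for completed or closed orders placed between January and March 2014, using `BETWEEN`

In [13]: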
%%sql

SELECT 
    count(1), 
    min(order_date), 
    max(order_date), 
    count(DISTINCT order_date) 
FROM 
    orders 
WHERE 
    order_status IN ('COMPLETE', 'CLOSED')
    AND 
    order_date BETWEEN '2014-01-01' AND '2014-03-31'

 * postgresql://suryakantkumar:***@localhost:5432/suryakantkumar
1 rows affected.


count,min,max,count_1
7594,2014-01-01 00:00:00,2014-03-31 00:00:00,89
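
`BETWEEN` includes both endpoints, and an upper bound of `'2014-03-31'` on a timestamp column means midnight at the start of that day. The rows shown in this notebook all carry a `00:00:00` time, so nothing appears to be lost here, but rows with a later time on the last day would be excluded. The sketch below illustrates this with Python's built-in `sqlite3`, a hypothetical `demo_orders` table, and made-up timestamps. SQLite stores and compares these values as text, so the bounds are written out as full timestamps to mirror how PostgreSQL would compare them.

```python
import sqlite3

# Hypothetical table; timestamps are stored as zero-padded text in SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_orders (order_id INTEGER, order_date TEXT)")
conn.executemany(
    "INSERT INTO demo_orders VALUES (?, ?)",
    [
        (1, "2014-01-01 00:00:00"),
        (2, "2014-03-31 00:00:00"),
        (3, "2014-03-31 18:30:00"),  # later on the last day of the range
        (4, "2014-04-01 00:00:00"),
    ],
)

def ids_where(condition):
    """Return the order_id values of rows that satisfy the condition."""
    rows = conn.execute(
        f"SELECT order_id FROM demo_orders WHERE {condition} ORDER BY order_id"
    ).fetchall()
    return [row[0] for row in rows]

# Inclusive range ending at midnight on March 31: the 18:30 row is left out.
between_ids = ids_where(
    "order_date BETWEEN '2014-01-01 00:00:00' AND '2014-03-31 00:00:00'"
)
# Half-open range: everything from January 1 up to, but not including, April 1.
half_open_ids = ids_where(
    "order_date >= '2014-01-01 00:00:00' AND order_date < '2014-04-01 00:00:00'"
)

print(between_ids)
print(half_open_ids)
```

The half-open form (`>=` the first day, `<` the day after the last) keeps the 18:30 row while still excluding April 1, which is why it is often preferred when a timestamp column can hold non-midnight times.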
