
In [1]:
import sys
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# If running in Google Colab, install PostgreSQL and restore the database
if 'google.colab' in sys.modules:
    # Refresh the package lists
    !sudo apt-get update -qq > /dev/null 2>&1

    # Install PostgreSQL
    !sudo apt-get install postgresql -qq > /dev/null 2>&1

    # Start PostgreSQL service (suppress output)
    !sudo service postgresql start > /dev/null 2>&1

    # Set password for the 'postgres' user to avoid authentication errors (suppress output)
    !sudo -u postgres psql -c "ALTER USER postgres WITH PASSWORD 'password';" > /dev/null 2>&1

    # Create the 'contoso_100k' database (suppress output)
    !sudo -u postgres psql -c "CREATE DATABASE contoso_100k;" > /dev/null 2>&1

    # Download the PostgreSQL .sql dump
    !wget -q -O contoso_100k.sql https://github.com/lukebarousse/Int_SQL_Data_Analytics_Course/releases/download/v.0.0.0/contoso_100k.sql

    # Restore the dump file into the PostgreSQL database (suppress output)
    !sudo -u postgres psql contoso_100k < contoso_100k.sql > /dev/null 2>&1

    # Replace ipython-sql with jupysql for the SQL magics
    !pip uninstall -y ipython-sql > /dev/null 2>&1
    !pip install jupysql > /dev/null 2>&1

# Load the sql extension for SQL magic
%load_ext sql

# Connect to the PostgreSQL database
%sql postgresql://postgres:password@localhost:5432/contoso_100k

# Enable automatic conversion of SQL results to pandas DataFrames
%config SqlMagic.autopandas = True

# Enable named parameters for SQL magic
%config SqlMagic.named_parameters = "enabled"

# Display pandas floats with two decimal places
pd.options.display.float_format = '{:.2f}'.format



---



# **DATE & TIME--Date & Time Formatting**

**a) DATE_TRUNC()--Revenue & Customers by Month**

In [14]:
%%sql

SELECT
orderdate,
-- DATE_TRUNC('month', orderdate) truncates orderdate to the first day of its month.
-- ::DATE casts the result (a timestamp) back to a DATE.
-- AS order_month names the new column 'order_month'.
DATE_TRUNC('month', orderdate)::DATE AS order_month
FROM
sales
ORDER BY RANDOM()
LIMIT 10

| | orderdate | order_month |
|---|---|---|
| 0 | 2015-01-22 | 2015-01-01 |
| 1 | 2018-05-02 | 2018-05-01 |
| 2 | 2024-02-10 | 2024-02-01 |
| 3 | 2022-02-24 | 2022-02-01 |
| 4 | 2018-12-29 | 2018-12-01 |
| 5 | 2017-03-09 | 2017-03-01 |
| 6 | 2018-12-18 | 2018-12-01 |
| 7 | 2022-07-09 | 2022-07-01 |
| 8 | 2022-03-12 | 2022-03-01 |
| 9 | 2022-05-28 | 2022-05-01 |
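
To make the DATE_TRUNC('month', orderdate)::DATE step concrete, here is a minimal Python analogue run on a few dates from the sample above. It only illustrates the idea and is not how PostgreSQL implements it; trunc_month is a hypothetical helper name.

```python
from datetime import date


def trunc_month(d: date) -> date:
    """Rough analogue of DATE_TRUNC('month', d)::DATE: same year and month, day reset to 1."""
    return date(d.year, d.month, 1)


# A few order dates from the sample output above
for d in [date(2015, 1, 22), date(2018, 5, 2), date(2024, 2, 10)]:
    print(d, "->", trunc_month(d))
# 2015-01-22 -> 2015-01-01
# 2018-05-02 -> 2018-05-01
# 2024-02-10 -> 2024-02-01
```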


q) Return order_month, the net revenue for each month, and the number of unique customers per month

In [16]:
%%sql
SELECT
DATE_TRUNC('month', orderdate)::DATE AS order_month,
SUM(quantity*netprice*exchangerate) AS net_revenue,
COUNT(DISTINCT customerkey) AS unique_customers
FROM
sales
GROUP BY
order_month
ORDER BY
order_month

| | order_month | net_revenue | unique_customers |
|---|---|---|---|
| 0 | 2015-01-01 | 384092.66 | 200 |
| 1 | 2015-02-01 | 706374.12 | 291 |
| 2 | 2015-03-01 | 332961.59 | 139 |
| 3 | 2015-04-01 | 160767.00 | 78 |
| 4 | 2015-05-01 | 548632.63 | 236 |
| ... | ... | ... | ... |
| 107 | 2023-12-01 | 2928550.93 | 1484 |
| 108 | 2024-01-01 | 2677498.55 | 1340 |
| 109 | 2024-02-01 | 3542322.55 | 1718 |
| 110 | 2024-03-01 | 1692854.89 | 877 |
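
The monthly query groups rows by truncated month, sums quantity*netprice*exchangerate, and counts distinct customers. A rough Python sketch of the same aggregation, assuming a handful of made-up rows (the sales list below is hypothetical, not data from contoso_100k):

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows: (orderdate, customerkey, quantity, netprice, exchangerate)
sales = [
    (date(2015, 1, 3), 101, 2, 10.0, 1.0),
    (date(2015, 1, 20), 101, 1, 5.0, 1.0),
    (date(2015, 1, 25), 102, 1, 8.0, 0.5),
    (date(2015, 2, 1), 103, 3, 4.0, 1.0),
]

revenue = defaultdict(float)   # order_month -> net revenue
customers = defaultdict(set)   # order_month -> distinct customer keys

for orderdate, customerkey, quantity, netprice, exchangerate in sales:
    order_month = orderdate.replace(day=1)            # DATE_TRUNC('month', orderdate)::DATE
    revenue[order_month] += quantity * netprice * exchangerate
    customers[order_month].add(customerkey)           # feeds COUNT(DISTINCT customerkey)

for order_month in sorted(revenue):                   # ORDER BY order_month
    print(order_month, round(revenue[order_month], 2), len(customers[order_month]))
# 2015-01-01 29.0 2
# 2015-02-01 12.0 1
```

A set per month plays the role of COUNT(DISTINCT ...): customer 101 orders twice in January but is counted once.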




---



**b) TO_CHAR()--Revenue & Customers by Month**

In [19]:
%%sql

SELECT
orderdate,
TO_CHAR(orderdate, 'YYYY-MM')
-- TO_CHAR() converts dates, numbers, and intervals into a formatted text string,
-- which makes it very handy for controlling how dates or numbers are displayed.
-- Other patterns to try:
-- TO_CHAR(orderdate, 'YYYY')                        -- year only
-- TO_CHAR(orderdate, 'YYYY-MM') AS order_year_month -- same format, with an alias
FROM
sales
ORDER BY RANDOM()
LIMIT 10

| | orderdate | to_char |
|---|---|---|
| 0 | 2015-01-29 | 2015-01 |
| 1 | 2024-02-12 | 2024-02 |
| 2 | 2023-08-19 | 2023-08 |
| 3 | 2024-01-27 | 2024-01 |
| 4 | 2022-08-11 | 2022-08 |
| 5 | 2019-07-03 | 2019-07 |
| 6 | 2022-12-28 | 2022-12 |
| 7 | 2018-02-13 | 2018-02 |
| 8 | 2020-09-09 | 2020-09 |
| 9 | 2023-11-12 | 2023-11 |
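
TO_CHAR(orderdate, 'YYYY-MM') behaves roughly like Python's strftime('%Y-%m'). This sketch maps only these two patterns (the two formatting languages are not interchangeable in general) and also shows why sorting or grouping by the 'YYYY-MM' text still comes out in chronological order: the strings are zero-padded, so text order matches date order.

```python
from datetime import date

# A few order dates from the sample output above
dates = [date(2015, 1, 29), date(2024, 2, 12), date(2023, 8, 19), date(2019, 7, 3)]

# Rough counterpart of TO_CHAR(orderdate, 'YYYY-MM'): the result is a plain string
labels = [d.strftime("%Y-%m") for d in dates]
print(labels)          # ['2015-01', '2024-02', '2023-08', '2019-07']

# Zero-padded 'YYYY-MM' text sorts in the same order as the underlying dates
print(sorted(labels))  # ['2015-01', '2019-07', '2023-08', '2024-02']
```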




---



The same monthly revenue query, this time grouping by TO_CHAR(orderdate, 'YYYY-MM') instead of DATE_TRUNC(). Aggregating by month removes a lot of day-level noise, which makes the data much easier to visualise.

In [23]:
%%sql
SELECT
TO_CHAR(orderdate, 'YYYY-MM') AS order_month,
SUM(quantity*netprice*exchangerate) AS net_revenue,
COUNT(DISTINCT customerkey) AS unique_customers
FROM
sales
GROUP BY
order_month
ORDER BY
order_month

| | order_month | net_revenue | unique_customers |
|---|---|---|---|
| 0 | 2015-01 | 384092.66 | 200 |
| 1 | 2015-02 | 706374.12 | 291 |
| 2 | 2015-03 | 332961.59 | 139 |
| 3 | 2015-04 | 160767.00 | 78 |
| 4 | 2015-05 | 548632.63 | 236 |
| ... | ... | ... | ... |
| 107 | 2023-12 | 2928550.93 | 1484 |
| 108 | 2024-01 | 2677498.55 | 1340 |
| 109 | 2024-02 | 3542322.55 | 1718 |
| 110 | 2024-03 | 1692854.89 | 877 |




---



# **DATE & TIME--Date & Time Filtering**

**a) DATE_PART() & EXTRACT()--Category Net Revenue Per Year**

DATE_PART

· DATE_PART() extracts specific components (e.g., year, month, day) from a date or timestamp.
· Syntax:

DATE_PART('unit', source)

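Example units: hour, minute, second, day, year, month

Note: DATE_PART() returns a double precision value, which is why the results below carry decimals (e.g., 2023.0 instead of 2023).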

In [24]:
%%sql

SELECT
orderdate,
DATE_PART('year', orderdate) AS order_year,
DATE_PART('month', orderdate) AS order_month,
DATE_PART('day', orderdate) AS order_day
FROM
sales
ORDER BY RANDOM( )
LIMIT 10

| | orderdate | order_year | order_month | order_day |
|---|---|---|---|---|
| 0 | 2023-07-22 | 2023.0 | 7.0 | 22.0 |
| 1 | 2017-08-14 | 2017.0 | 8.0 | 14.0 |
| 2 | 2023-08-12 | 2023.0 | 8.0 | 12.0 |
| 3 | 2018-06-05 | 2018.0 | 6.0 | 5.0 |
| 4 | 2023-12-27 | 2023.0 | 12.0 | 27.0 |
| 5 | 2023-07-17 | 2023.0 | 7.0 | 17.0 |
| 6 | 2015-11-21 | 2015.0 | 11.0 | 21.0 |
| 7 | 2022-04-18 | 2022.0 | 4.0 | 18.0 |
| 8 | 2015-02-06 | 2015.0 | 2.0 | 6.0 |
| 9 | 2023-06-08 | 2023.0 | 6.0 | 8.0 |




---



**EXTRACT**

· EXTRACT() is a more verbose way to extract specific components from a date or timestamp.

· Syntax:

EXTRACT(unit FROM source)

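EXTRACT() is very similar to DATE_PART() but is the more commonly used, SQL-standard form. In PostgreSQL 14 and later it returns numeric rather than double precision, so the results below have no decimals.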
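
The visible difference between the two outputs comes down to return types. As a Python analogy (an illustration only, not the driver's actual code), DATE_PART() values behave like floats, while EXTRACT() values in PostgreSQL 14+ behave like Decimal, the type that numeric values usually arrive as on the Python side.

```python
from datetime import date
from decimal import Decimal

d = date(2023, 7, 22)

# DATE_PART() returns double precision: comparable to a Python float, shown with ".0"
date_part = {"year": float(d.year), "month": float(d.month), "day": float(d.day)}

# EXTRACT() returns numeric (PostgreSQL 14+): comparable to a Decimal, no trailing ".0"
extract = {"year": Decimal(d.year), "month": Decimal(d.month), "day": Decimal(d.day)}

print(date_part)  # {'year': 2023.0, 'month': 7.0, 'day': 22.0}
print(extract)    # {'year': Decimal('2023'), 'month': Decimal('7'), 'day': Decimal('22')}
```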

In [27]:
%%sql

SELECT
orderdate,
EXTRACT(DAY FROM orderdate) AS order_day,
EXTRACT(MONTH FROM orderdate) AS order_month,
EXTRACT(YEAR FROM orderdate) AS order_year
FROM
sales
ORDER BY RANDOM( )
LIMIT 10

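| | orderdate | order_day | order_month | order_year |
|---|---|---|---|---|
| 0 | 2022-02-20 | 20 | 2 | 2022 |
| 1 | 2024-02-23 | 23 | 2 | 2024 |
| 2 | 2022-07-12 | 12 | 7 | 2022 |
| 3 | 2024-03-15 | 15 | 3 | 2024 |
| 4 | 2024-02-01 | 1 | 2 | 2024 |
| 5 | 2015-12-18 | 18 | 12 | 2015 |
| 6 | 2015-01-12 | 12 | 1 | 2015 |
| 7 | 2020-05-20 | 20 | 5 | 2020 |
| 8 | 2017-07-24 | 24 | 7 | 2017 |
| 9 | 2020-09-08 | 8 | 9 | 2020 |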




---



q) Now let's use EXTRACT() to find net revenue (and unique customers) per year and month

In [29]:
%%sql
SELECT
EXTRACT(YEAR FROM orderdate) AS order_year,
EXTRACT(MONTH FROM orderdate) AS order_month,
SUM(quantity*netprice*exchangerate) AS net_revenue,
COUNT(DISTINCT customerkey) AS unique_customers
FROM
sales
GROUP BY
order_year, order_month
ORDER BY
order_year, order_month

| | order_year | order_month | net_revenue | unique_customers |
|---|---|---|---|---|
| 0 | 2015 | 1 | 384092.66 | 200 |
| 1 | 2015 | 2 | 706374.12 | 291 |
| 2 | 2015 | 3 | 332961.59 | 139 |
| 3 | 2015 | 4 | 160767.00 | 78 |
| 4 | 2015 | 5 | 548632.63 | 236 |
| ... | ... | ... | ... | ... |
| 107 | 2023 | 12 | 2928550.93 | 1484 |
| 108 | 2024 | 1 | 2677498.55 | 1340 |
| 109 | 2024 | 2 | 3542322.55 | 1718 |
| 110 | 2024 | 3 | 1692854.89 | 877 |




---



**b) CURRENT_DATE & NOW()--Net Revenue Last 5 Years**

In [30]:
%%sql
SELECT
CURRENT_DATE -- This function returns the current date without the time component.

| | current_date |
|---|---|
| 0 | 2026-02-22 |


In [31]:
%%sql
SELECT
NOW() -- Returns the current date and time, including time zone information.

/* CURRENT_DATE: use it to find sales that happened today, or within a certain number of days of today.
For example, to find orders placed in the last week, compare orderdate with CURRENT_DATE - INTERVAL '7 days'.
NOW(): use it when you need more precise, timestamp-based filtering. For example, to find all system log entries from the last hour,
compare log_timestamp with NOW() - INTERVAL '1 hour'. */

| | now |
|---|---|
| 0 | 2026-02-22 13:46:50.553594+00:00 |
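
A rough Python counterpart of CURRENT_DATE, NOW(), and the "last week" filter described in the comment above. It is an analogy only: date.today() uses the machine's local time zone rather than the database session's, and PostgreSQL's NOW() returns the start time of the current transaction. The order dates are made up.

```python
from datetime import date, datetime, timedelta, timezone

today = date.today()                # roughly CURRENT_DATE: a date, no time component
now = datetime.now(timezone.utc)    # roughly NOW(): date, time, and time zone (UTC here)

# Hypothetical order dates: 0, 3, 7, 8 and 30 days before today
orders = [today - timedelta(days=n) for n in (0, 3, 7, 8, 30)]

# Mirrors: WHERE orderdate >= CURRENT_DATE - INTERVAL '7 days'
last_week = [d for d in orders if d >= today - timedelta(days=7)]

print(today, now.isoformat())
print(len(last_week))  # 3 (the orders from 0, 3 and 7 days ago)
```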




---



q) Find net revenue per category for orders from 5 years ago up to today (first attempt, filtering by year)

In [56]:
%%sql
SELECT
CURRENT_DATE,
orderdate,
p.categoryname,
SUM(s.quantity * s.netprice * s.exchangerate) AS net_revenue
FROM sales s
LEFT JOIN product p ON s.productkey = p.productkey
WHERE
EXTRACT(YEAR FROM s.orderdate) = EXTRACT(YEAR FROM CURRENT_DATE) - 5
GROUP BY
p.categoryname,
orderdate
ORDER BY
p.categoryname,
orderdate

| | current_date | orderdate | categoryname | net_revenue |
|---|---|---|---|---|
| 0 | 2026-02-22 | 2021-01-01 | Audio | 1206.67 |
| 1 | 2026-02-22 | 2021-01-02 | Audio | 1262.09 |
| 2 | 2026-02-22 | 2021-01-04 | Audio | 241.58 |
| 3 | 2026-02-22 | 2021-01-05 | Audio | 719.64 |
| 4 | 2026-02-22 | 2021-01-06 | Audio | 323.17 |
| ... | ... | ... | ... | ... |
| 2529 | 2026-02-22 | 2021-12-27 | TV and Video | 16013.10 |
| 2530 | 2026-02-22 | 2021-12-28 | TV and Video | 15376.15 |
| 2531 | 2026-02-22 | 2021-12-29 | TV and Video | 8155.00 |
| 2532 | 2026-02-22 | 2021-12-30 | TV and Video | 47705.52 |




---



# **DATE & TIME--Date & Time Differences**

**INTERVAL--Net Revenue Last 5 Years (contd.)**

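The previous query did not give the intended result. Filtering on EXTRACT(YEAR FROM orderdate) = EXTRACT(YEAR FROM CURRENT_DATE) - 5 returns the whole calendar year five years back: it includes January even though it is February now (so it goes back too far), and it stops at the end of that year. INTERVAL fixes this.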

INTERVAL

· INTERVAL represents a span of time, such as days, months, hours, or seconds.

· Commonly used for date arithmetic (e.g., CURRENT_DATE + INTERVAL '1 month' adds one month to the current date).

· Syntax:

SELECT INTERVAL 'value unit'

Units: years, months, days, hours, minutes, seconds, microseconds, millenniums, centuries, decades, weeks
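
Month and year intervals are calendar-aware: adding INTERVAL '1 month' keeps the day of the month and, when that day does not exist in the target month, clamps to the month's last day. A minimal Python sketch of that rule, where add_months is a hypothetical helper (PostgreSQL returns a timestamp for date + interval, while this sketch returns a date):

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Calendar-aware month arithmetic, clamping to the last day of the target month."""
    total = d.year * 12 + (d.month - 1) + months   # months counted from year 0
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]  # number of days in the target month
    return date(year, month, min(d.day, last_day))


print(add_months(date(2026, 2, 22), 1))    # 2026-03-22, like CURRENT_DATE + INTERVAL '1 month'
print(add_months(date(2024, 1, 31), 1))    # 2024-02-29, clamped (2024 is a leap year)
print(add_months(date(2026, 2, 22), -60))  # 2021-02-22, five years back
```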



---



In [52]:
%%sql
SELECT INTERVAL '5 months' -- PostgreSQL itself returns '5 mons'; the 150 days below comes from converting the interval to a Python timedelta

| | interval |
|---|---|
| 0 | 150 days |
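
Why 150 days? A Python timedelta has no month unit, so turning INTERVAL '5 months' into one means assuming a month length. The output is consistent with 30 days per month (5 × 30 = 150), which appears to be the driver's conversion rule; treat that as an assumption. Real five-month spans on the calendar vary with the start date:

```python
from datetime import date, timedelta

# Assumed conversion: 30 days per month, since timedelta cannot store months
approx = timedelta(days=5 * 30)
print(approx)  # 150 days, 0:00:00

# Actual lengths of two different five-month calendar spans
print((date(2026, 6, 1) - date(2026, 1, 1)).days)  # 151
print((date(2026, 8, 1) - date(2026, 3, 1)).days)  # 153
```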


In [53]:
%%sql
SELECT
CURRENT_DATE,
orderdate
FROM sales
WHERE
-- EXTRACT(YEAR FROM orderdate) = EXTRACT(YEAR FROM CURRENT_DATE) - 5 would return the whole calendar year, including January, even though it is February now
orderdate >= CURRENT_DATE - INTERVAL '5 years' -- keeps every order from exactly five years ago onward

| | current_date | orderdate |
|---|---|---|
| 0 | 2026-02-22 | 2021-02-22 |
| 1 | 2026-02-22 | 2021-02-22 |
| 2 | 2026-02-22 | 2021-02-22 |
| 3 | 2026-02-22 | 2021-02-22 |
| 4 | 2026-02-22 | 2021-02-22 |
| ... | ... | ... |
| 111903 | 2026-02-22 | 2024-04-20 |
| 111904 | 2026-02-22 | 2024-04-20 |
| 111905 | 2026-02-22 | 2024-04-20 |
| 111906 | 2026-02-22 | 2024-04-20 |
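
To compare the two filters side by side, here is a small Python sketch with a fixed "today" of 2026-02-22 (matching the output above) and a few hypothetical order dates. years_ago is a hypothetical helper standing in for CURRENT_DATE - INTERVAL '5 years'; it moves Feb 29 to Feb 28 when the target year is not a leap year, which is how I understand PostgreSQL to handle that case.

```python
from datetime import date


def years_ago(d: date, years: int) -> date:
    """Same month and day `years` earlier; Feb 29 becomes Feb 28 in non-leap target years."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:  # Feb 29 does not exist in the target year
        return d.replace(year=d.year - years, day=28)


today = date(2026, 2, 22)  # fixed so the example is reproducible
orders = [date(2021, 1, 15), date(2021, 2, 22), date(2021, 12, 30), date(2024, 4, 20)]

# Attempt 1: EXTRACT(YEAR FROM orderdate) = EXTRACT(YEAR FROM CURRENT_DATE) - 5
by_year = [d for d in orders if d.year == today.year - 5]

# Attempt 2: orderdate >= CURRENT_DATE - INTERVAL '5 years'
rolling = [d for d in orders if d >= years_ago(today, 5)]

print(by_year)  # includes 2021-01-15, misses 2024-04-20
print(rolling)  # starts at 2021-02-22 and continues to the latest order
```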


Now apply this filter to the previous question:


In [55]:
%%sql
SELECT
CURRENT_DATE,
orderdate,
p.categoryname,
SUM(s.quantity * s.netprice * s.exchangerate) AS net_revenue
FROM sales s
LEFT JOIN product p ON s.productkey = p.productkey
WHERE
orderdate>=CURRENT_DATE - INTERVAL '5 years'
GROUP BY
p.categoryname,
orderdate
ORDER BY
p.categoryname,
orderdate

| | current_date | orderdate | categoryname | net_revenue |
|---|---|---|---|---|
| 0 | 2026-02-22 | 2021-02-22 | Audio | 1284.44 |
| 1 | 2026-02-22 | 2021-02-23 | Audio | 2678.84 |
| 2 | 2026-02-22 | 2021-02-24 | Audio | 47.95 |
| 3 | 2026-02-22 | 2021-02-25 | Audio | 3103.17 |
| 4 | 2026-02-22 | 2021-02-26 | Audio | 884.34 |
| ... | ... | ... | ... | ... |
| 8635 | 2026-02-22 | 2024-04-13 | TV and Video | 9583.14 |
| 8636 | 2026-02-22 | 2024-04-17 | TV and Video | 1880.06 |
| 8637 | 2026-02-22 | 2024-04-18 | TV and Video | 1229.48 |
| 8638 | 2026-02-22 | 2024-04-19 | TV and Video | 2756.54 |




---



**AGE() & EXTRACT()--Average Processing Time**

In [65]:
%%sql

-- AGE(timestamp_end, timestamp_start) returns the difference between two dates/timestamps as an interval.
-- Example: AGE('2024-01-14', '2024-01-08') = 6 days; EXTRACT(DAYS FROM ...) gives 6, and subtracting 5 leaves 1.
SELECT EXTRACT(DAYS FROM AGE('2024-01-14', '2024-01-08')) - 5

| | ?column? |
|---|---|
| 0 | 1 |
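
A simplified Python sketch of AGE() for two dates helps explain the result above and a pitfall in the next query: EXTRACT(DAYS FROM AGE(...)) keeps only the day field of the interval, so any whole months are dropped. The age function here is hypothetical, and its borrowing rule (negative days borrow the length of the start date's month) is my reading of PostgreSQL's behaviour, so treat it as approximate.

```python
import calendar
from datetime import date


def age(end: date, start: date) -> tuple[int, int, int]:
    """Simplified AGE(end, start) for dates with end >= start, returned as (years, months, days)."""
    years = end.year - start.year
    months = end.month - start.month
    days = end.day - start.day
    if days < 0:  # borrow one month, sized by the start date's month
        months -= 1
        days += calendar.monthrange(start.year, start.month)[1]
    if months < 0:  # borrow one year
        years -= 1
        months += 12
    return years, months, days


# The example above: 6 days, so EXTRACT(DAYS ...) - 5 = 1
print(age(date(2024, 1, 14), date(2024, 1, 8)))  # (0, 0, 6)

# Pitfall: the day field alone is 2, but the real gap is 62 days
print(age(date(2024, 3, 10), date(2024, 1, 8)))    # (0, 2, 2)
print((date(2024, 3, 10) - date(2024, 1, 8)).days)  # 62
```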


q) Calculate net revenue and the average processing (waiting) time for an order to be delivered, per year, over the last 5 years

In [78]:
%%sql df_sql_result <<
SELECT
DATE_PART('year', orderdate) AS order_year,
ROUND(AVG(EXTRACT(DAYS FROM AGE(deliverydate, orderdate))), 2) AS avg_procressing_time,
CAST(SUM(quantity*netprice*exchangerate) AS INTEGER) AS net_revenue
FROM
sales
WHERE
orderdate >= CURRENT_DATE - INTERVAL '5 years'
GROUP BY
order_year
ORDER BY
order_year
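
For reference, a Python sketch of what this query computes per year on hypothetical rows: the average delivery wait in days and the total net revenue rounded to a whole number (tie-breaking may differ from the SQL CAST). It uses plain date subtraction for the wait, which matches EXTRACT(DAYS FROM AGE(...)) only when a delivery takes less than a month.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows: (orderdate, deliverydate, quantity, netprice, exchangerate)
sales = [
    (date(2022, 3, 1), date(2022, 3, 4), 1, 100.0, 1.0),
    (date(2022, 6, 10), date(2022, 6, 11), 2, 50.0, 1.0),
    (date(2023, 1, 5), date(2023, 1, 12), 1, 80.0, 0.5),
]

waits = defaultdict(list)     # order_year -> delivery waits in days
revenue = defaultdict(float)  # order_year -> net revenue

for orderdate, deliverydate, quantity, netprice, exchangerate in sales:
    year = orderdate.year                                # DATE_PART('year', orderdate)
    waits[year].append((deliverydate - orderdate).days)  # processing time in days
    revenue[year] += quantity * netprice * exchangerate

results = {
    year: (round(sum(waits[year]) / len(waits[year]), 2),  # ROUND(AVG(...), 2)
           round(revenue[year]))                           # whole-number revenue
    for year in sorted(waits)                              # ORDER BY order_year
}
print(results)  # {2022: (2.0, 200), 2023: (7.0, 40)}
```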

[Date and time formatting Google Colab notes](https://colab.research.google.com/drive/1nvcyM43PtFue9DNhdIW6D5rZucECKNiq?authuser=0#scrollTo=9JzEVDlCu7Jx)