# **Lecture 10C**
# **Using date, datetime and time**
After we read date, datetime and time columns into a DataFrame as datetime64 and timedelta64 values, we can use them in many kinds of calculations and applications.

In [1]:
# Run the code below to access files in your Google Drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
# We also need the pandas module in this lecture
# Import the pandas module
import pandas as pd

---
**Example 1:** One common operation is to extract the year, month, day, hour, minute and second from a given **date** or **datetime**. This can be done using **pd.DatetimeIndex()**.
* **pd.DatetimeIndex(*Date*).year** will return the year part of the date.
* **pd.DatetimeIndex(*Date*).month** will return the month part of the date.
* **pd.DatetimeIndex(*Date*).day** will return the day part of the date.
* Suffixes like **.hour**, **.minute**, **.second** and **.microsecond** are also allowed.
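The attributes listed above can be tried without any Excel file. The following is a minimal sketch that builds a small Series in memory from two of the sample datetimes used in this lecture; the names `stamps`, `idx` and `parts` are illustrative only.

```python
import pandas as pd

# Two datetimes copied from the sample data used in this lecture
stamps = pd.Series(pd.to_datetime(["2022-07-21 14:22:51.510",
                                   "2022-12-15 06:27:18.060"]))

# Wrap the column once, then read each component as an attribute
idx = pd.DatetimeIndex(stamps)
parts = pd.DataFrame({
    "year": idx.year,
    "month": idx.month,
    "day": idx.day,
    "hour": idx.hour,
    "minute": idx.minute,
    "second": idx.second,
    "microsecond": idx.microsecond,
})
print(parts)
```

Note that **.microsecond** holds the whole fractional part of the second, so .510 seconds is reported as 510000.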



In [3]:
# Read date_examples.xlsx data file
# In the worksheet "datetime", the SubmissionDT column contains datetime values in Excel
data = pd.read_excel("/content/drive/MyDrive/Data/date_examples.xlsx",sheet_name="datetime")
display(data)

# Extract the year and put it in a new column
data["SubmitYear"] = pd.DatetimeIndex(data["SubmissionDT"]).year

# Extract the month and put it in a new column
data["SubmitMonth"] = pd.DatetimeIndex(data["SubmissionDT"]).month

# Extract the day and put it in a new column
data["SubmitDay"] = pd.DatetimeIndex(data["SubmissionDT"]).day

# Extract the hour and put it in a new column
data["SubmitHour"] = pd.DatetimeIndex(data["SubmissionDT"]).hour

# Extract the minute and put it in a new column
data["SubmitMin"] = pd.DatetimeIndex(data["SubmissionDT"]).minute

# Extract the second and put it in a new column
data["SubmitSec"] = pd.DatetimeIndex(data["SubmissionDT"]).second

# Extract the microsecond and put it in a new column
data["SubmitMicroSec"] = pd.DatetimeIndex(data["SubmissionDT"]).microsecond
print(data.dtypes)
display(data)

Unnamed: 0,student,SubmissionDT
0,1,2022-07-21 14:22:51.510
1,2,2022-12-15 06:27:18.060
2,3,2022-01-01 18:09:11.340
3,4,2022-03-06 11:12:33.100
4,5,2022-10-30 09:45:11.220


student                    int64
SubmissionDT      datetime64[ns]
SubmitYear                 int64
SubmitMonth                int64
SubmitDay                  int64
SubmitHour                 int64
SubmitMin                  int64
SubmitSec                  int64
SubmitMicroSec             int64
dtype: object


Unnamed: 0,student,SubmissionDT,SubmitYear,SubmitMonth,SubmitDay,SubmitHour,SubmitMin,SubmitSec,SubmitMicroSec
0,1,2022-07-21 14:22:51.510,2022,7,21,14,22,51,510000
1,2,2022-12-15 06:27:18.060,2022,12,15,6,27,18,60000
2,3,2022-01-01 18:09:11.340,2022,1,1,18,9,11,340000
3,4,2022-03-06 11:12:33.100,2022,3,6,11,12,33,100000
4,5,2022-10-30 09:45:11.220,2022,10,30,9,45,11,220000


---
**Example 2:** Another common operation is to obtain a new date by adding or subtracting **days** (or other durations) from a given date. For example, subtracting 3 days from 1988-08-22 will give 1988-08-19. It is very useful for calculating things such as deadlines. The offset is calculated by the function **pd.DateOffset()**.

The offset can be specified by arguments such as **years=**, **months=**, **days=**, **weeks=**, **hours=**, **minutes=**, **seconds=**.
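As a quick check of how these arguments behave, here is a small sketch on single timestamps rather than a DataFrame column. The variable names are illustrative, and the month-end case is an extra illustration that is not part of the lecture data.

```python
import pandas as pd

birth = pd.Timestamp("1988-08-22")

# Subtract 3 days, then add 1 month and 2 days
three_days_before = birth - pd.DateOffset(days=3)
month_and_two_days = birth + pd.DateOffset(months=1, days=2)
print(three_days_before)    # 1988-08-19 00:00:00
print(month_and_two_days)   # 1988-09-24 00:00:00

# Month arithmetic clips to the last valid day of the target month
month_end = pd.Timestamp("2022-01-31") + pd.DateOffset(months=1)
print(month_end)            # 2022-02-28 00:00:00
```

The last line shows why **months=1** is not the same as adding 30 or 31 days: the result stays within the following calendar month.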

In [4]:
# Read date_examples.xlsx data file
# In the worksheet "date", the BirthDate column is already a date in Excel
data = pd.read_excel("/content/drive/MyDrive/Data/date_examples.xlsx",sheet_name="date")
display(data)
print(data.dtypes)

# Subtract 3 days from BirthDate
data["BD1"] = data["BirthDate"]+pd.DateOffset(days=-3)

# Alternatively you can subtract a positive offset
data["BD1"] = data["BirthDate"]-pd.DateOffset(days=3)

# Add 1 month to BirthDate
data["BD2"] = data["BirthDate"]+pd.DateOffset(months=1)

# Subtract 1 week from BirthDate
data["BD3"] = data["BirthDate"]+pd.DateOffset(weeks=-1)

# Add 2 years to BirthDate
data["BD4"] = data["BirthDate"]+pd.DateOffset(years=2)

# Add 1 month & 2 days to BirthDate
data["BD5"] = data["BirthDate"]+pd.DateOffset(months=1,days=2)

# Add 1 hour, 30 minutes to BirthDate
data["BD6"] = data["BirthDate"]+pd.DateOffset(hours=1,minutes=30)

# A DateOffset is an object in its own right and can be stored in a variable
x = pd.DateOffset(days=3)
display(x)
display(data)

Unnamed: 0,student,BirthDate
0,1,1988-08-22
1,2,1990-07-30
2,3,2002-12-10
3,4,2008-01-15
4,5,1997-06-07


student               int64
BirthDate    datetime64[ns]
dtype: object


<DateOffset: days=3>

Unnamed: 0,student,BirthDate,BD1,BD2,BD3,BD4,BD5,BD6
0,1,1988-08-22,1988-08-19,1988-09-22,1988-08-15,1990-08-22,1988-09-24,1988-08-22 01:30:00
1,2,1990-07-30,1990-07-27,1990-08-30,1990-07-23,1992-07-30,1990-09-01,1990-07-30 01:30:00
2,3,2002-12-10,2002-12-07,2003-01-10,2002-12-03,2004-12-10,2003-01-12,2002-12-10 01:30:00
3,4,2008-01-15,2008-01-12,2008-02-15,2008-01-08,2010-01-15,2008-02-17,2008-01-15 01:30:00
4,5,1997-06-07,1997-06-04,1997-07-07,1997-05-31,1999-06-07,1997-07-09,1997-06-07 01:30:00


---
**Example 3:** Date calculations will often refer to fixed dates or datetimes, such as deadlines. "Today" or "current datetime" are often used in calculations as well.
* **pd.to_datetime("today")** will give you the current datetime.
* **pd.to_datetime("today").floor("d")** will give you the date part only.
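Because "today" changes on every run, the effect of **.floor()** is easier to see on a fixed timestamp. The sketch below uses the datetime shown in this lecture's sample output as a stand-in; `fixed` is an illustrative name, and `normalize()` is mentioned only as an equivalent way to drop the time part.

```python
import pandas as pd

now = pd.to_datetime("today")   # current local datetime; differs on every run
today = now.floor("D")          # same day with the time part set to 00:00:00
print(now, today)

# A fixed timestamp makes the effect of floor() reproducible
fixed = pd.Timestamp("2022-10-30 07:30:21.633806")
print(fixed.floor("D"))         # 2022-10-30 00:00:00
print(fixed.normalize())        # same result as floor("D")
```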


In [None]:
# Read date_examples.xlsx data file
# In the worksheet "date", the BirthDate column is already a date in Excel
data = pd.read_excel("/content/drive/MyDrive/Data/date_examples.xlsx",sheet_name="date")
display(data)

# Create a new column with current datetime
data["now_datetime"] = pd.to_datetime("today")

# Create a new column with current date only
data["now_date"] = pd.to_datetime("today").floor("d")

# Show the DataFrame
display(data)

Unnamed: 0,student,BirthDate
0,1,1988-08-22
1,2,1990-07-30
2,3,2002-12-10
3,4,2008-01-15
4,5,1997-06-07


Unnamed: 0,student,BirthDate,now_datetime,now_date
0,1,1988-08-22,2022-10-30 07:30:21.633806,2022-10-30
1,2,1990-07-30,2022-10-30 07:30:21.633806,2022-10-30
2,3,2002-12-10,2022-10-30 07:30:21.633806,2022-10-30
3,4,2008-01-15,2022-10-30 07:30:21.633806,2022-10-30
4,5,1997-06-07,2022-10-30 07:30:21.633806,2022-10-30


---
**Example 4:** When you need to calculate the number of days before or after a certain deadline, you often need to refer to some "constant" dates or datetimes. You can also use **pd.to_datetime()** to define such constant dates or datetimes.
* **pd.to_datetime("yyyy/mm/dd")** will produce the datetime64 value for the date given as a string.
* **pd.to_datetime("yyyy/mm/dd hh:mm:ss")** will produce the datetime64 value for the datetime given as a string.
* The 2 cases listed above use the default format. If your string is not in the default format, you can add the **format=** option as in Lecture 10A - Example 4.
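To see that the default format and an explicit **format=** string lead to the same kind of value, the sketch below writes one deadline in two ways and compares the results. The second string and its format are made up for illustration.

```python
import pandas as pd

# Default format: no format= option needed
default_style = pd.to_datetime("2022/12/20 15:20:58")

# Day-first string with a 12-hour clock: describe it with format=
# %d day, %m month, %Y 4-digit year, %I hour (12-hour), %p AM/PM
custom_style = pd.to_datetime("20-12-2022 03:20:58 PM",
                              format="%d-%m-%Y %I:%M:%S %p")

print(default_style)
print(custom_style)
print(default_style == custom_style)   # True
```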

In [None]:
# Read date_examples.xlsx data file
# In the worksheet "datetime", the SubmissionDT column contains datetime values in Excel
data = pd.read_excel("/content/drive/MyDrive/Data/date_examples.xlsx",sheet_name="datetime")
display(data)

# Create a constant date in DataFrame
data["deadline1"] = pd.to_datetime("2022/12/20")

# Create a constant datetime in DataFrame
data["deadline2"] = pd.to_datetime("2022/12/20 15:20:58")

# Create a constant datetime using non-standard format
# Note: %I is Hour (12 hour), %p is AM/PM
data["deadline3"] = pd.to_datetime("31/12/2021 02:33:01 PM",format="%d/%m/%Y %I:%M:%S %p")

display(data)

Unnamed: 0,student,SubmissionDT
0,1,2022-07-21 14:22:51.510
1,2,2022-12-15 06:27:18.060
2,3,2022-01-01 18:09:11.340
3,4,2022-03-06 11:12:33.100
4,5,2022-10-30 09:45:11.220


Unnamed: 0,student,SubmissionDT,deadline1,deadline2,deadline3
0,1,2022-07-21 14:22:51.510,2022-12-20,2022-12-20 15:20:58,2021-12-31 14:33:01
1,2,2022-12-15 06:27:18.060,2022-12-20,2022-12-20 15:20:58,2021-12-31 14:33:01
2,3,2022-01-01 18:09:11.340,2022-12-20,2022-12-20 15:20:58,2021-12-31 14:33:01
3,4,2022-03-06 11:12:33.100,2022-12-20,2022-12-20 15:20:58,2021-12-31 14:33:01
4,5,2022-10-30 09:45:11.220,2022-12-20,2022-12-20 15:20:58,2021-12-31 14:33:01


---
**Example 5:**
If you want to convert a given **Time** to a numeric value in seconds, minutes or hours, you can divide the **Time** by a duration of 1 second, 1 minute or 1 hour.
* **Time**/pd.to_timedelta("00:00:01") will give you total no. of seconds.
* **Time**/pd.to_timedelta("00:01:00") will give you total no. of minutes.
* **Time**/pd.to_timedelta("01:00:00") will give you total no. of hours.
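The division can be checked by hand on a single duration. This is a small sketch with an illustrative value of 11 hours, 32 minutes and 11 seconds; `total_seconds()` is shown only as a cross-check on the first result.

```python
import pandas as pd

duration = pd.to_timedelta("11:32:11")

# Dividing one duration by another gives a plain number
total_sec = duration / pd.to_timedelta("00:00:01")
total_min = duration / pd.to_timedelta("00:01:00")
total_hour = duration / pd.to_timedelta("01:00:00")

print(total_sec)    # 41531.0  (11*3600 + 32*60 + 11)
print(total_min)
print(total_hour)

# Cross-check using the built-in method of a single Timedelta
print(duration.total_seconds())   # 41531.0
```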

In [None]:
# Read time_examples.xlsx data file
# In the worksheet "time_num", the duration is stored as numeric hour, minute and second columns
data = pd.read_excel("/content/drive/MyDrive/Data/time_examples.xlsx",sheet_name="time_num")

# Build Duration (type timedelta64) from the total number of seconds
data["Duration"] = pd.to_timedelta(data["Duration_Hour"]*3600+data["Duration_Min"]*60+data["Duration_Sec"],unit="seconds")
display(data)
print(data.dtypes)

# Convert Duration into total no. of seconds (numeric)
data["Total_sec"] = data["Duration"]/pd.to_timedelta("00:00:01")

# Convert Duration into total no. of minutes (numeric)
data["Total_min"] = data["Duration"]/pd.to_timedelta("00:01:00")

# Convert Duration into total no. of hours (numeric)
data["Total_hour"] = data["Duration"]/pd.to_timedelta("01:00:00")

display(data)
print(data.dtypes)


Unnamed: 0,JobID,Duration_Hour,Duration_Min,Duration_Sec,Duration
0,1,45,45,23.45,1 days 21:45:23.450000
1,2,11,32,11.3,0 days 11:32:11.300000
2,3,56,11,55.1,2 days 08:11:55.100000
3,4,18,29,45.4,0 days 18:29:45.400000
4,5,9,38,7.88,0 days 09:38:07.880000


JobID                      int64
Duration_Hour              int64
Duration_Min               int64
Duration_Sec             float64
Duration         timedelta64[ns]
dtype: object


Unnamed: 0,JobID,Duration_Hour,Duration_Min,Duration_Sec,Duration,Total_sec,Total_min,Total_hour
0,1,45,45,23.45,1 days 21:45:23.450000,164723.45,2745.390833,45.756514
1,2,11,32,11.3,0 days 11:32:11.300000,41531.3,692.188333,11.536472
2,3,56,11,55.1,2 days 08:11:55.100000,202315.1,3371.918333,56.198639
3,4,18,29,45.4,0 days 18:29:45.400000,66585.4,1109.756667,18.495944
4,5,9,38,7.88,0 days 09:38:07.880000,34687.88,578.131333,9.635522


JobID                      int64
Duration_Hour              int64
Duration_Min               int64
Duration_Sec             float64
Duration         timedelta64[ns]
Total_sec                float64
Total_min                float64
Total_hour               float64
dtype: object


---
**Example 6:** We can easily calculate the difference between two dates or datetimes. The difference has the type timedelta64 because a date difference is a time duration. This is useful for finding quantities such as the number of days past the deadline.
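To turn such a difference into a number of days, one option is to divide by a duration of 1 day, in the same way as in Example 5. The sketch below uses two of the submission times from this example; the `.dt.days` accessor is an additional pandas feature not covered above, included here only for comparison.

```python
import pandas as pd

deadline = pd.to_datetime("2022-10-02 12:00:00")
submission = pd.Series(pd.to_datetime(["2022-10-03 15:06:12",
                                       "2022-09-30 02:05:44"]))

# Subtracting datetimes gives a timedelta64 Series
diff = submission - deadline

# Fractional days past the deadline (negative means submitted early)
days_late = diff / pd.to_timedelta("1 days")

# Whole-day component only (matches the "days" part of the display)
whole_days = diff.dt.days

print(diff)
print(days_late)
print(whole_days)
```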


In [None]:
# Read DatetimeDemo.xlsx data file
# In the worksheet "Sheet1", the Submission column contains datetime values in Excel
data = pd.read_excel("/content/drive/MyDrive/Data/DatetimeDemo.xlsx",sheet_name="Sheet1")
display(data)

# Define a constant deadline in the DataFrame
data["Deadline"] = pd.to_datetime("2022-10-02 12:00:00")

# Calculate the difference between submission time and deadline
data["Diff"] = data["Submission"] - data["Deadline"]

# Flag late submissions: a positive difference means the deadline has passed
data["late"] = data["Diff"] > pd.to_timedelta("0:0:0")

# Output
# Note that Diff is NOT datetime64. It has the type timedelta64.
display(data)
print(data.dtypes)

Unnamed: 0,ID,Name,Submission
0,1,David,2022-10-03 15:06:12
1,2,Nancy,2022-10-02 23:45:01
2,3,John,2022-09-30 02:05:44
3,4,Susan,2022-10-03 11:23:12


Unnamed: 0,ID,Name,Submission,Deadline,Diff,late
0,1,David,2022-10-03 15:06:12,2022-10-02 12:00:00,1 days 03:06:12,True
1,2,Nancy,2022-10-02 23:45:01,2022-10-02 12:00:00,0 days 11:45:01,True
2,3,John,2022-09-30 02:05:44,2022-10-02 12:00:00,-3 days +14:05:44,False
3,4,Susan,2022-10-03 11:23:12,2022-10-02 12:00:00,0 days 23:23:12,True


ID                      int64
Name                   object
Submission     datetime64[ns]
Deadline       datetime64[ns]
Diff          timedelta64[ns]
late                     bool
dtype: object


---
**Example 7:** The Excel file assignment.xlsx contains the assignment scores and the assignment submission times of several students. Suppose the submission deadline is 16-Nov-2022 6:00PM. Students who submit late are subject to a deduction of 10 marks.
* Construct a column **IsLate** to indicate if the submission is late or not.
* Construct a column **adjScore**, which is the assignment score after late deduction.

In [None]:
# Read assignment.xlsx data file
data = pd.read_excel("/content/drive/MyDrive/Data/assignment.xlsx",sheet_name="Sheet1")
display(data)

# Create the column IsLate. True if the submission is late, otherwise False
data["IsLate"] = (data["SubmissionTime"]-pd.to_datetime("2022/11/16 18:00:00")) > pd.to_timedelta("0:0:0")

# Construct the column adjScore, i.e. deduct 10 marks if the submission is late
data["adjScore"] = data["AssignScore"]
data.loc[data["IsLate"]==True,"adjScore"] = data["adjScore"] - 10
display(data)

Unnamed: 0,ID,Name,AssignScore,SubmissionTime
0,1,Tom,89,2022-11-16 17:12:03
1,2,Berry,91,2022-11-15 23:52:13
2,3,Jerry,30,2022-11-16 11:08:11
3,4,Doris,67,2022-11-16 19:05:34
4,5,Frank,88,2022-11-17 01:09:42


Unnamed: 0,ID,Name,AssignScore,SubmissionTime,IsLate,adjScore
0,1,Tom,89,2022-11-16 17:12:03,False,89
1,2,Berry,91,2022-11-15 23:52:13,False,91
2,3,Jerry,30,2022-11-16 11:08:11,False,30
3,4,Doris,67,2022-11-16 19:05:34,True,57
4,5,Frank,88,2022-11-17 01:09:42,True,78


---
**Example 8:** Using the **BirthDate** given in the worksheet **date** of **date_examples.xlsx**, calculate the **Age** on a given day (say 2010-01-15).

Calculation of Age is not as simple as subtracting 2 dates. Typically, a person's age will increase by 1 when he/she passes the birthday in that year.
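The rule can be sketched for a single person before applying it to a whole column. The helper `age_on` below is hypothetical (it is not a pandas function); it uses the five birth dates from the worksheet and the reference day 2010-01-15.

```python
import pandas as pd

def age_on(birth, ref):
    """Completed years between birth and ref (hypothetical helper)."""
    years = ref.year - birth.year
    # If ref's (month, day) comes before the birthday's (month, day),
    # the birthday has not been reached yet in ref's year
    if (ref.month, ref.day) < (birth.month, birth.day):
        years -= 1
    return years

ref = pd.Timestamp("2010-01-15")
births = pd.to_datetime(["1988-08-22", "1990-07-30", "2002-12-10",
                         "2008-01-15", "1997-06-07"])

ages = [age_on(b, ref) for b in births]
print(ages)   # [21, 19, 7, 2, 12]
```

The fourth person turns 2 exactly on the reference day, so no adjustment is made in that case.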

In [None]:
# Read date_examples.xlsx data file
# In the worksheet "date", the BirthDate column is already a date in Excel
data = pd.read_excel("/content/drive/MyDrive/Data/date_examples.xlsx",sheet_name="date")

# Get the year, month and day on 2010-01-15
data["Now"] = pd.to_datetime("2010-01-15").floor("d")
data["Now_y"] = pd.DatetimeIndex(data["Now"]).year
data["Now_m"] = pd.DatetimeIndex(data["Now"]).month
data["Now_d"] = pd.DatetimeIndex(data["Now"]).day

# Get the year, month and day of BirthDate
data["Birth_y"] = pd.DatetimeIndex(data["BirthDate"]).year
data["Birth_m"] = pd.DatetimeIndex(data["BirthDate"]).month
data["Birth_d"] = pd.DatetimeIndex(data["BirthDate"]).day

display(data)

# Calculate Age by year difference, but it may be bigger than expected.
data["Age"] = data["Now_y"] - data["Birth_y"]

# check if month+day of 2010-01-15 is less than that 
# of the month+day of BirthDate (i.e. not passing birthday yet)
# If True, the age should be reduced by 1.
data["cond"] = (data["Now_m"]<data["Birth_m"]) | ((data["Now_m"]==data["Birth_m"]) & (data["Now_d"]<data["Birth_d"]))

# Adjust the Age if needed
data.loc[data["cond"]==True,"Age"] = data["Age"]-1

display(data)

Unnamed: 0,student,BirthDate,Now,Now_y,Now_m,Now_d,Birth_y,Birth_m,Birth_d
0,1,1988-08-22,2010-01-15,2010,1,15,1988,8,22
1,2,1990-07-30,2010-01-15,2010,1,15,1990,7,30
2,3,2002-12-10,2010-01-15,2010,1,15,2002,12,10
3,4,2008-01-15,2010-01-15,2010,1,15,2008,1,15
4,5,1997-06-07,2010-01-15,2010,1,15,1997,6,7


Unnamed: 0,student,BirthDate,Now,Now_y,Now_m,Now_d,Birth_y,Birth_m,Birth_d,Age,cond
0,1,1988-08-22,2010-01-15,2010,1,15,1988,8,22,21,True
1,2,1990-07-30,2010-01-15,2010,1,15,1990,7,30,19,True
2,3,2002-12-10,2010-01-15,2010,1,15,2002,12,10,7,True
3,4,2008-01-15,2010-01-15,2010,1,15,2008,1,15,2,False
4,5,1997-06-07,2010-01-15,2010,1,15,1997,6,7,12,True


---
**Example 9:** We can also calculate summary statistics on dates, datetimes and time durations. For example, **race_times.xlsx** contains the times taken by some runners to complete a marathon race. We can calculate summary statistics such as the average time for completing the race and the standard deviation of the time.
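The same statistics can be tried on an in-memory Series. This sketch types in the five race times from the worksheet as strings; `race` and `mean_minutes` are illustrative names, and the last step reuses the division idea from Example 5 to express the mean in minutes.

```python
import pandas as pd

# Race times as strings, converted to timedelta64
race = pd.to_timedelta(pd.Series(["3:08:32.12", "3:15:42.11", "3:07:12.41",
                                  "2:58:39.15", "2:50:21.17"]))

# Mean and SD are themselves time durations
print("Mean of race times =", race.mean())
print("SD of race times =", race.std())

# Express the mean as a plain number of minutes
mean_minutes = race.mean() / pd.to_timedelta("00:01:00")
print(mean_minutes)
```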

In [None]:
# Read race_times.xlsx data file
# In the worksheet "time_txt", the RaceTime column is stored as a string
data = pd.read_excel("/content/drive/MyDrive/Data/race_times.xlsx",sheet_name="time_txt")
display(data)
print(data.dtypes)
# Convert RaceTime from string to timedelta64
data["RaceTime"] = pd.to_timedelta(data["RaceTime"])
display(data)
print(data.dtypes)

# Mean and SD of RaceTime
print("Mean of race times =",data["RaceTime"].mean())
print("SD of race times =",data["RaceTime"].std())

# We can also use describe() on RaceTime
display(data["RaceTime"].describe())


Unnamed: 0,Runner,RaceTime
0,1,3:08:32.12
1,2,3:15:42.11
2,3,3:07:12.41
3,4,2:58:39.15
4,5,2:50:21.17


Runner       int64
RaceTime    object
dtype: object


Unnamed: 0,Runner,RaceTime
0,1,0 days 03:08:32.120000
1,2,0 days 03:15:42.110000
2,3,0 days 03:07:12.410000
3,4,0 days 02:58:39.150000
4,5,0 days 02:50:21.170000


Runner                int64
RaceTime    timedelta64[ns]
dtype: object
Mean of race times = 0 days 03:04:05.392000
SD of race times = 0 days 00:09:46.794529132


count                            5
mean        0 days 03:04:05.392000
std      0 days 00:09:46.794529132
min         0 days 02:50:21.170000
25%         0 days 02:58:39.150000
50%         0 days 03:07:12.410000
75%         0 days 03:08:32.120000
max         0 days 03:15:42.110000
Name: RaceTime, dtype: object

---
**Example 10:** We can compare time durations in the same way as dates. Suppose runners who complete the race in less than 3 hours receive an award. We can create a column **Award** to indicate whether the runner has an award or not.

Besides comparison, we can also perform subtraction and addition on time durations. For example, we can calculate the additional time needed by each runner as compared to the fastest runner.
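Both operations can be sketched on an in-memory DataFrame that does not depend on the Excel file. The race times below are typed in from the worksheet shown in Example 9; the DataFrame name `race` is illustrative.

```python
import pandas as pd

race = pd.DataFrame({
    "Runner": [1, 2, 3, 4, 5],
    "RaceTime": pd.to_timedelta(["3:08:32.12", "3:15:42.11", "3:07:12.41",
                                 "2:58:39.15", "2:50:21.17"]),
})

# Comparison: True if the runner finished in under 3 hours
race["Award"] = race["RaceTime"] < pd.to_timedelta("03:00:00")

# Subtraction: extra time needed compared with the fastest runner
fastest = race["RaceTime"].min()
race["Delta"] = race["RaceTime"] - fastest

print(fastest)
print(race)
```

Runners 4 and 5 finish under 3 hours, and the fastest runner has a **Delta** of zero.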

In [None]:
# Read race_times.xlsx data file
# In the worksheet "time_txt", the RaceTime column is stored as a string
data = pd.read_excel("/content/drive/MyDrive/Data/race_times.xlsx",sheet_name="time_txt")

# Convert RaceTime from string to timedelta64
data["RaceTime"] = pd.to_timedelta(data["RaceTime"])
display(data)

# Create Award column
data["Award"] = data["RaceTime"]<pd.to_timedelta("03:00:00")

# Find the RaceTime of the fastest runner
fastest = data["RaceTime"].min()
print(fastest)

# Find the additional time needed by each runner as 
# compared to the fastest runner
data["Delta"] = data["RaceTime"] - fastest

display(data)

Unnamed: 0,Runner,RaceTime
0,1,0 days 03:08:32.120000
1,2,0 days 03:15:42.110000
2,3,0 days 03:07:12.410000
3,4,0 days 02:58:39.150000
4,5,0 days 02:50:21.170000


TypeError: ignored