# How to work with dates in a <font color='red'>DataFrame</font>

In the previous lesson, we covered different ways to extract a piece of information from a <font color='red'>Timestamp</font> object. However, we only worked with a single object. In most cases, we don’t work with single values but with data stored in a <font color='red'>DataFrame</font>. A typical <font color='red'>DataFrame</font> of transactional sales at a retail store might contain millions of rows.

For instance, an operation that extracts the month from a date is usually performed on an entire column rather than on a single value. The Pandas library provides attributes and methods for such tasks, which are accessed via the <font color='red'>dt</font> accessor. The <font color='red'>dt</font> accessor is similar to the <font color='red'>str</font> accessor: it serves as a gateway to Pandas functions for date and time manipulation on a column of a <font color='red'>DataFrame</font>. Let’s run through an example using the <font color='red'>staff</font> <font color='red'>DataFrame</font>:

In [1]:
import pandas as pd

# read DataFrame
staff = pd.read_csv("staff.csv")

# change the data type of date columns
staff = staff.astype({
    "date_of_birth": "datetime64[ns]",
    "start_date": "datetime64[ns]",
})

# create start_month column
staff["start_month"] = staff["start_date"].dt.month

print(staff[["start_date","start_month"]])

  start_date  start_month
0 2018-08-11            8
1 2017-08-24            8
2 2020-04-16            4
3 2021-02-11            2
4 2020-09-01            9
5 2021-10-20           10
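
The comparison between the <font color='red'>dt</font> and <font color='red'>str</font> accessors can be seen side by side in a minimal sketch. The <font color='red'>demo</font> DataFrame below is hypothetical in-memory data that mirrors the first rows of the output above, so the snippet runs without the CSV file.

```python
import pandas as pd

# Hypothetical in-memory data (not read from staff.csv); values mirror the lesson's first rows
demo = pd.DataFrame({
    "name": ["John Doe", "Jane Doe", "Matt smith"],
    "start_date": pd.to_datetime(["2018-08-11", "2017-08-24", "2020-04-16"]),
})

# str accessor: vectorized string operation applied to a whole text column
demo["name_upper"] = demo["name"].str.upper()

# dt accessor: vectorized date operation applied to a whole datetime column
demo["start_month"] = demo["start_date"].dt.month

print(demo)
```

In both cases the accessor applies the operation to every row at once, so no explicit loop is needed.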


In [3]:
staff

Unnamed: 0,name,city,date_of_birth,start_date,salary,department,start_month
0,John Doe,"Houston, TX",1998-11-04,2018-08-11,"$65,000",Accounting,8
1,Jane Doe,"San Jose, CA",1995-08-05,2017-08-24,"$70,000",Field Quality,8
2,Matt smith,"Dallas, TX",1996-11-25,2020-04-16,"$58,500",human resources,4
3,Ashley Harris,"Miami, FL",1995-01-08,2021-02-11,"$49,500",accounting,2
4,Jonathan targett,"Santa Clara, CA",1998-08-14,2020-09-01,"$62,000",field quality,9
5,Hale Cole,"Atlanta, GA",2000-10-24,2021-10-20,"$54,500",engineering,10


# Methods under the <font color='red'>dt</font> accessor

We can extract the year and day parts in a similar way using <font color='red'>year</font> and <font color='red'>day</font>. Some other attributes available through the <font color='red'>dt</font> accessor are:

1. <font color='red'>weekday</font>
2. <font color='red'>hour</font>
3. <font color='red'>minute</font>
4. <font color='red'>second</font>
5. <font color='red'>week</font> (deprecated since version 1.1.0)
6. <font color='red'>weekofyear</font> (deprecated since version 1.1.0)
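
As a rough illustration of the items listed above, the following sketch extracts several parts from one column. The <font color='red'>events</font> DataFrame and its <font color='red'>ts</font> column are made-up names, and the timestamps include a time of day so that the hour, minute, and second are not all zero.

```python
import pandas as pd

# Hypothetical timestamps with a time component
events = pd.DataFrame({
    "ts": pd.to_datetime(["2018-08-11 09:30:15", "2021-10-20 17:05:40"]),
})

events["year"] = events["ts"].dt.year
events["day"] = events["ts"].dt.day
events["weekday"] = events["ts"].dt.weekday  # Monday = 0, Sunday = 6
events["hour"] = events["ts"].dt.hour
events["minute"] = events["ts"].dt.minute
events["second"] = events["ts"].dt.second

print(events)
```

Note that <font color='red'>weekday</font> counts from zero, so a Saturday such as 2018-08-11 is reported as 5.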

In Pandas version 1.1.0 or higher, the <font color='red'>isocalendar</font> method is the recommended replacement for <font color='red'>week</font> and <font color='red'>weekofyear</font>, both of which were removed in Pandas 2.0. When applied to a column, it returns a <font color='red'>DataFrame</font> that contains the ISO year, the ISO week number, and the ISO day of the week (Monday is 1 and Sunday is 7).

In [8]:
staff[["year", "week", "day"]] = staff["start_date"].dt.isocalendar()

In [9]:
staff

Unnamed: 0,name,city,date_of_birth,start_date,salary,department,start_month,year,week,day
0,John Doe,"Houston, TX",1998-11-04,2018-08-11,"$65,000",Accounting,8,2018,32,6
1,Jane Doe,"San Jose, CA",1995-08-05,2017-08-24,"$70,000",Field Quality,8,2017,34,4
2,Matt smith,"Dallas, TX",1996-11-25,2020-04-16,"$58,500",human resources,4,2020,16,4
3,Ashley Harris,"Miami, FL",1995-01-08,2021-02-11,"$49,500",accounting,2,2021,6,4
4,Jonathan targett,"Santa Clara, CA",1998-08-14,2020-09-01,"$62,000",field quality,9,2020,36,2
5,Hale Cole,"Atlanta, GA",2000-10-24,2021-10-20,"$54,500",engineering,10,2021,42,3
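
One detail worth checking when using <font color='red'>isocalendar</font> is that the year it returns is the ISO year, which can differ from the calendar year for dates close to January 1. The sketch below uses a few hypothetical dates around a year boundary to compare the two; the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical dates around a year boundary
dates = pd.Series(pd.to_datetime(["2020-12-31", "2021-01-01", "2021-01-04"]))

# isocalendar() returns a DataFrame with the columns year, week, and day
iso = dates.dt.isocalendar()
print(iso)

# Compare the calendar year with the ISO year for each date
comparison = pd.DataFrame({
    "date": dates,
    "calendar_year": dates.dt.year,
    "iso_year": iso["year"],
    "iso_week": iso["week"],
})
print(comparison)
```

Here 2021-01-01 falls in ISO week 53 of ISO year 2020, so grouping by the <font color='red'>week</font> column alone, or pairing it with the calendar year, can mix up rows around the new year.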


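It’s important to note that a column must have a datetime-like data type, such as <font color='red'>datetime64[ns]</font>, before the <font color='red'>dt</font> accessor can be used on it. Calling <font color='red'>dt</font> on a column of plain strings raises an <font color='red'>AttributeError</font>.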
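
The data type requirement can be verified with a small sketch. The <font color='red'>raw</font> DataFrame is a hypothetical stand-in for a column of date strings as it would look right after reading a CSV file, and <font color='red'>pd.to_datetime</font> is used here as one possible way to convert it (the lesson's own example uses <font color='red'>astype</font>).

```python
import pandas as pd

# Hypothetical column of date strings, not yet converted to datetime
raw = pd.DataFrame({"start_date": ["2018-08-11", "2017-08-24"]})
print(raw["start_date"].dtype)  # a text dtype, not a datetime dtype

raised = False
try:
    raw["start_date"].dt.month
except AttributeError as err:
    # pandas does not expose .dt on values that are not datetime-like
    raised = True
    print("AttributeError:", err)

# Convert the column first; after that the accessor works
raw["start_date"] = pd.to_datetime(raw["start_date"])
months = raw["start_date"].dt.month.tolist()
print(months)
```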


