# Working With Dates in Pandas

In [1]:
import pandas as pd

### Pandas Timestamp
**Timestamp** is the standard Pandas datetime object.<br>Pandas is very flexible and can convert strings in many formats to a Timestamp

In [7]:
pd.Timestamp('2019 - 05 - 07')

Timestamp('2019-05-07 00:00:00')

In [5]:
pd.Timestamp('2018 / 10 / 19')

Timestamp('2018-10-19 00:00:00')

In [9]:
pd.Timestamp('2019, 4, 3')

Timestamp('2019-04-03 00:00:00')

In [10]:
pd.Timestamp('09-14-2015')

Timestamp('2015-09-14 00:00:00')

We can even put the day first. As long as the 'day' value is greater than 12, Pandas will understand what we mean

In [12]:
pd.Timestamp('14-09-2015')

Timestamp('2015-09-14 00:00:00')

If the day value is 12 or less and we use this format, Pandas will default to the 'American' format (MM-DD-YYYY)

In [13]:
pd.Timestamp('08-09-2015')

Timestamp('2015-08-09 00:00:00')
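
When the day/month order matters, it may be safer not to rely on guessing at all. A minimal sketch, assuming we know the layout of the string in advance, is to pass an explicit ***format*** to ***to_datetime***:

```python
import pandas as pd

# Same string, two explicit interpretations
day_first = pd.to_datetime('08-09-2015', format='%d-%m-%Y')    # 8 September
month_first = pd.to_datetime('08-09-2015', format='%m-%d-%Y')  # 9 August

print(day_first)    # 2015-09-08 00:00:00
print(month_first)  # 2015-08-09 00:00:00
```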

We can even use month names

In [20]:
pd.Timestamp('23 March 2017')

Timestamp('2017-03-23 00:00:00')

And month abbreviations as well...

In [17]:
pd.Timestamp('jun 15, 2010')

Timestamp('2010-06-15 00:00:00')

In [19]:
pd.Timestamp('apr 23rd 2017')

Timestamp('2017-04-23 00:00:00')

The same goes for time parsing

In [22]:
pd.Timestamp('2000-12-07, 10:5:30')

Timestamp('2000-12-07 10:05:30')

In [27]:
pd.Timestamp('2000-12-07, 10:5:30 PM')

Timestamp('2000-12-07 22:05:30')

## DatetimeIndex
A DatetimeIndex is like a Series of Pandas dates. Its constructor can take a sequence of strings and convert them into a DatetimeIndex object

In [53]:
dates = ['01/01/2000', '2000-06-15', '2000-DEC-24']

Instead of doing this

In [43]:
pandas_dates = []
for date in dates:
    pandas_dates.append(pd.Timestamp(date))

pd.Series(pandas_dates)

0   2000-01-01
1   2000-06-15
2   2000-12-24
dtype: datetime64[ns]

We can use the DatetimeIndex constructor

In [45]:
pd.DatetimeIndex(dates)

DatetimeIndex(['2000-01-01', '2000-06-15', '2000-12-24'], dtype='datetime64[ns]', freq=None)

We will usually use it as the index of a Series or Dataframe

In [51]:
values = [100, 200, 300]
date_index = pd.DatetimeIndex(dates)

In [52]:
pd.Series(data = values, index = date_index)

2000-01-01    100
2000-06-15    200
2000-12-24    300
dtype: int64
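
One reason a DatetimeIndex is handy as an index is that rows can be selected with partial date strings. The sketch below rebuilds the Series above with ISO-formatted dates (an assumption made here to keep the example self-contained) and selects by month:

```python
import pandas as pd

index = pd.DatetimeIndex(['2000-01-01', '2000-06-15', '2000-12-24'])
s = pd.Series([100, 200, 300], index=index)

# A month string selects every row that falls inside that month
june = s.loc['2000-06']

# Slices with date strings include both endpoints
second_half = s.loc['2000-06':'2000-12']

print(june.tolist())         # [200]
print(second_half.tolist())  # [200, 300]
```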

## to_datetime()
The ***to_datetime*** function converts different kinds of arguments to datetime objects. It is ideal for converting columns inside a Dataframe to a datetime type

In [78]:
dates = ['01/01/2000', '2000-06-15', '2000-DEC-24']
pd.to_datetime(dates)

DatetimeIndex(['2000-01-01', '2000-06-15', '2000-12-24'], dtype='datetime64[ns]', freq=None)

In [86]:
values = ['01/01/2000', '2000', 'abc', True]
ser = pd.Series(values)
ser

0    01/01/2000
1          2000
2           abc
3          True
dtype: object

### coerce
Trying to use ***to_datetime*** on a sequence that contains invalid values raises an error by default.<br>We can overcome this by setting the ***errors*** parameter to 'coerce'.<br>
Invalid values are then replaced with NaT, which stands for "Not a Time". It is the equivalent of the NaN values we saw in numeric and text columns

In [87]:
pd.to_datetime(ser, errors = 'coerce')

0   2000-01-01
1   2000-01-01
2          NaT
3          NaT
dtype: datetime64[ns]
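
Since 'coerce' silently turns bad values into NaT, it can be useful to check which original values failed to parse. A small sketch, using made-up values and an explicit format, that keeps the raw strings next to the parsed result:

```python
import pandas as pd

# Hypothetical raw column: one valid date, one non-date, one impossible date
raw = pd.Series(['2000-01-01', 'abc', '2000-02-30'])
parsed = pd.to_datetime(raw, format='%Y-%m-%d', errors='coerce')

# Rows where parsing produced NaT point back to the offending inputs
invalid = raw[parsed.isna()]
print(invalid.tolist())  # ['abc', '2000-02-30']
```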

We can also convert numbers in the "Unix" format (number of seconds since January 1st, 1970) to dates by setting the ***unit*** parameter

In [74]:
pd.to_datetime([1500000000,1510000000,1520000000,1530000000,1540000000, 1550000000], unit='s')

DatetimeIndex(['2017-07-14 02:40:00', '2017-11-06 20:26:40',
               '2018-03-02 14:13:20', '2018-06-26 08:00:00',
               '2018-10-20 01:46:40', '2019-02-12 19:33:20'],
              dtype='datetime64[ns]', freq=None)
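
The ***unit*** parameter also accepts other resolutions such as 'ms' (milliseconds). The sketch below shows that, and one possible way to go back from dates to Unix seconds by subtracting the epoch and dividing by a one-second Timedelta:

```python
import pandas as pd

stamps = pd.to_datetime([1500000000, 1510000000], unit='s')

# The same instant expressed in milliseconds
from_ms = pd.to_datetime(1500000000000, unit='ms')

# Back to whole seconds since 1970-01-01
seconds = (stamps - pd.Timestamp('1970-01-01')) // pd.Timedelta('1s')

print(from_ms)           # 2017-07-14 02:40:00
print(seconds.tolist())  # [1500000000, 1510000000]
```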

## Generating Datetime Ranges
We can use the ***date_range*** function to generate custom sequences of timestamps.<br>The simplest way is to define a start date and an end date

In [88]:
pd.date_range('Jan 1st, 2000', 'Jan 20th, 2000')

DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04',
               '2000-01-05', '2000-01-06', '2000-01-07', '2000-01-08',
               '2000-01-09', '2000-01-10', '2000-01-11', '2000-01-12',
               '2000-01-13', '2000-01-14', '2000-01-15', '2000-01-16',
               '2000-01-17', '2000-01-18', '2000-01-19', '2000-01-20'],
              dtype='datetime64[ns]', freq='D')

The default gap between the values is 1 day but we can change that using the ***freq*** (frequency) parameter.<br>For example, 3 days

In [91]:
pd.date_range('Jan 1st, 2000', 'Jan 20th, 2000', freq = '3D')

DatetimeIndex(['2000-01-01', '2000-01-04', '2000-01-07', '2000-01-10',
               '2000-01-13', '2000-01-16', '2000-01-19'],
              dtype='datetime64[ns]', freq='3D')

We can ask to get only business days

In [92]:
pd.date_range('Jan 1st, 2000', 'Jan 20th, 2000', freq = 'B')

DatetimeIndex(['2000-01-03', '2000-01-04', '2000-01-05', '2000-01-06',
               '2000-01-07', '2000-01-10', '2000-01-11', '2000-01-12',
               '2000-01-13', '2000-01-14', '2000-01-17', '2000-01-18',
               '2000-01-19', '2000-01-20'],
              dtype='datetime64[ns]', freq='B')

Using "M" will give us the end of month

In [93]:
pd.date_range('Jan 1st, 2000', 'Jan 20th, 2001', freq = 'M')

DatetimeIndex(['2000-01-31', '2000-02-29', '2000-03-31', '2000-04-30',
               '2000-05-31', '2000-06-30', '2000-07-31', '2000-08-31',
               '2000-09-30', '2000-10-31', '2000-11-30', '2000-12-31'],
              dtype='datetime64[ns]', freq='M')

If we want the first day of the month we'll use "MS" (Month Start).<br>Notice the extra value we get in this example compared to the previous one: January 1st, 2001 falls inside the range, while January 31st, 2001 does not

In [94]:
pd.date_range('Jan 1st, 2000', 'Jan 20th, 2001', freq = 'MS')

DatetimeIndex(['2000-01-01', '2000-02-01', '2000-03-01', '2000-04-01',
               '2000-05-01', '2000-06-01', '2000-07-01', '2000-08-01',
               '2000-09-01', '2000-10-01', '2000-11-01', '2000-12-01',
               '2001-01-01'],
              dtype='datetime64[ns]', freq='MS')
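
Frequency strings are shorthand for offset objects, and the aliases have changed between pandas releases (newer versions, as far as we know, prefer "ME" over "M" for month end). A sketch that avoids the string aliases by passing the offset objects from ***pd.offsets*** directly:

```python
import pandas as pd

# MonthEnd / MonthBegin are the offsets behind the month-end and "MS" aliases
month_ends = pd.date_range('2000-01-01', '2001-01-20', freq=pd.offsets.MonthEnd())
month_starts = pd.date_range('2000-01-01', '2001-01-20', freq=pd.offsets.MonthBegin())

print(len(month_ends), len(month_starts))  # 12 13
print(month_ends[0], month_starts[-1])     # 2000-01-31 00:00:00 2001-01-01 00:00:00
```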

We can use "W" to specify week frequency

In [96]:
pd.date_range('Jan 1st, 2000', 'March 21st, 2000', freq = 'W')

DatetimeIndex(['2000-01-02', '2000-01-09', '2000-01-16', '2000-01-23',
               '2000-01-30', '2000-02-06', '2000-02-13', '2000-02-20',
               '2000-02-27', '2000-03-05', '2000-03-12', '2000-03-19'],
              dtype='datetime64[ns]', freq='W-SUN')

By default, "W" means "W-SUN", so we get the date of each Sunday, but we can change that as well.<br>In this example we ask for the dates of all Fridays in the selected range

In [97]:
pd.date_range('Jan 1st, 2000', 'March 21st, 2000', freq = 'W-FRI')

DatetimeIndex(['2000-01-07', '2000-01-14', '2000-01-21', '2000-01-28',
               '2000-02-04', '2000-02-11', '2000-02-18', '2000-02-25',
               '2000-03-03', '2000-03-10', '2000-03-17'],
              dtype='datetime64[ns]', freq='W-FRI')
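
A quick way to convince ourselves that the anchor works as described is to ask the resulting index for its day names. A short sketch using the same range:

```python
import pandas as pd

fridays = pd.date_range('2000-01-01', '2000-03-21', freq='W-FRI')

# Every generated date should fall on the anchored weekday
print(fridays.day_name().unique().tolist())  # ['Friday']
print(len(fridays))                          # 11
```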

We can use "Y" to get the end of each year

In [104]:
pd.date_range('Jan 1st, 2000', 'March 21st, 2012', freq = 'Y')

DatetimeIndex(['2000-12-31', '2001-12-31', '2002-12-31', '2003-12-31',
               '2004-12-31', '2005-12-31', '2006-12-31', '2007-12-31',
               '2008-12-31', '2009-12-31', '2010-12-31', '2011-12-31'],
              dtype='datetime64[ns]', freq='A-DEC')

And "YS" for the Year Start

In [105]:
pd.date_range('Jan 1st, 2000', 'March 21st, 2012', freq = 'YS')

DatetimeIndex(['2000-01-01', '2001-01-01', '2002-01-01', '2003-01-01',
               '2004-01-01', '2005-01-01', '2006-01-01', '2007-01-01',
               '2008-01-01', '2009-01-01', '2010-01-01', '2011-01-01',
               '2012-01-01'],
              dtype='datetime64[ns]', freq='AS-JAN')

If we are interested in generating a certain number of dates we can define a number of "periods" instead of calculating the desired end date

In [4]:
pd.date_range('Jan 1st, 2000', periods=24, freq = '3 D')

DatetimeIndex(['2000-01-01', '2000-01-04', '2000-01-07', '2000-01-10',
               '2000-01-13', '2000-01-16', '2000-01-19', '2000-01-22',
               '2000-01-25', '2000-01-28', '2000-01-31', '2000-02-03',
               '2000-02-06', '2000-02-09', '2000-02-12', '2000-02-15',
               '2000-02-18', '2000-02-21', '2000-02-24', '2000-02-27',
               '2000-03-01', '2000-03-04', '2000-03-07', '2000-03-10'],
              dtype='datetime64[ns]', freq='3D')

Get a list of 20 business days

In [5]:
pd.date_range('Jan 1st, 2000', periods=20, freq = 'B')

DatetimeIndex(['2000-01-03', '2000-01-04', '2000-01-05', '2000-01-06',
               '2000-01-07', '2000-01-10', '2000-01-11', '2000-01-12',
               '2000-01-13', '2000-01-14', '2000-01-17', '2000-01-18',
               '2000-01-19', '2000-01-20', '2000-01-21', '2000-01-24',
               '2000-01-25', '2000-01-26', '2000-01-27', '2000-01-28'],
              dtype='datetime64[ns]', freq='B')

We can state the end date and the number of periods we want to get until that date

In [6]:
pd.date_range(end = 'Jan 1st, 2000', periods=24, freq = '3 D')

DatetimeIndex(['1999-10-24', '1999-10-27', '1999-10-30', '1999-11-02',
               '1999-11-05', '1999-11-08', '1999-11-11', '1999-11-14',
               '1999-11-17', '1999-11-20', '1999-11-23', '1999-11-26',
               '1999-11-29', '1999-12-02', '1999-12-05', '1999-12-08',
               '1999-12-11', '1999-12-14', '1999-12-17', '1999-12-20',
               '1999-12-23', '1999-12-26', '1999-12-29', '2000-01-01'],
              dtype='datetime64[ns]', freq='3D')
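
Anchoring on ***start*** or on ***end*** are two views of the same sequence. A small sketch with a shorter range to show that counting backwards from the last date gives back the same dates:

```python
import pandas as pd

# Four dates, three days apart, counted forward from the start
forward = pd.date_range(start='2000-01-01', periods=4, freq='3D')

# Four dates, three days apart, counted backward from the last date above
backward = pd.date_range(end=forward[-1], periods=4, freq='3D')

print(forward.equals(backward))  # True
```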

### dt Sub Library
Just like the str sub library (accessor) that gives us access to many string-manipulating methods, the dt sub library allows us to perform various computations on the datetime values of a Series

In [2]:
sales = pd.read_csv('JanuarySales2014.csv', index_col='SalesOrderID',usecols = ['SalesOrderID', 'OrderDate', 'DueDate', 'ShipDate', 'Status', 'Freight'])
sales.head(3)

SalesOrderID,OrderDate,DueDate,ShipDate,Status,Freight
63363,1/1/2014 0:00,1/13/2014 0:00,1/8/2014 0:00,5,0.682
63364,1/1/2014 0:00,1/13/2014 0:00,1/8/2014 0:00,5,0.7483
63365,1/1/2014 0:00,1/13/2014 0:00,1/8/2014 0:00,5,0.1248


In [3]:
sales[['OrderDate','DueDate','ShipDate']] = sales[['OrderDate','DueDate','ShipDate']].astype('datetime64[ns]')
sales.head(3)

SalesOrderID,OrderDate,DueDate,ShipDate,Status,Freight
63363,2014-01-01,2014-01-13,2014-01-08,5,0.682
63364,2014-01-01,2014-01-13,2014-01-08,5,0.7483
63365,2014-01-01,2014-01-13,2014-01-08,5,0.1248
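
An alternative to ***astype*** is to apply ***to_datetime*** to each column with an explicit format, which makes the expected layout of the strings visible in the code. The sketch below uses a small hypothetical Dataframe that mimics the string columns above rather than the real CSV file:

```python
import pandas as pd

# Hypothetical two-row frame with the same string layout as the sales data
df = pd.DataFrame({'OrderDate': ['1/1/2014 0:00', '1/2/2014 0:00'],
                   'ShipDate': ['1/8/2014 0:00', '1/9/2014 0:00']})

date_cols = ['OrderDate', 'ShipDate']

# Month/day/year followed by hour:minute
df[date_cols] = df[date_cols].apply(pd.to_datetime, format='%m/%d/%Y %H:%M')

print(df.dtypes)
```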


In [28]:
# weekday_name and weekofyear were removed in newer pandas releases; day_name() and isocalendar().week replace them
sales['OrderDate'].sample(3).to_frame().assign(day = sales['OrderDate'].dt.day,
                                               day_name = sales['OrderDate'].dt.day_name(),
                                               week = sales['OrderDate'].dt.isocalendar().week,
                                               month_name = sales['OrderDate'].dt.month_name(),
                                               quarter = sales['OrderDate'].dt.quarter,
                                               year = sales['OrderDate'].dt.year)

SalesOrderID,OrderDate,day,day_name,week,month_name,quarter,year
63973,2014-01-10,10,Friday,2,January,1,2014
64878,2014-01-24,24,Friday,4,January,1,2014
64262,2014-01-15,15,Wednesday,3,January,1,2014


Group by day of the week

In [33]:
sales['weekday'] = sales['OrderDate'].dt.day_name()
sales.groupby('weekday').size().sort_values(ascending = False)

weekday
Wednesday    477
Thursday     339
Sunday       293
Friday       274
Monday       263
Saturday     248
Tuesday      247
dtype: int64
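
Sorting by count is one option. If we would rather see the weekdays in calendar order, one possible approach is to reindex the counts with an explicit list of day names. The dates below are made up for illustration:

```python
import pandas as pd

# Hypothetical order dates: two Wednesdays, one Thursday, one Sunday
order_dates = pd.Series(pd.to_datetime(['2014-01-01', '2014-01-02',
                                        '2014-01-05', '2014-01-08']))

calendar_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
                  'Friday', 'Saturday', 'Sunday']

# Count per weekday, then force calendar order; missing days get 0
counts = (order_dates.dt.day_name()
          .value_counts()
          .reindex(calendar_order, fill_value=0))

print(counts.tolist())  # [0, 0, 2, 1, 0, 0, 1]
```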

### strftime()
We can use the ***strftime*** method to convert the dates back to strings and display them in any format we want

In [40]:
sales['OrderDate'].dt.strftime('%d//%m--%Y A.D').sample(3)

SalesOrderID
63597    04//01--2014 A.D
65027    27//01--2014 A.D
64159    13//01--2014 A.D
Name: OrderDate, dtype: object
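
The format string is built from the usual strftime codes, for example %d (day), %m (month), %Y (four-digit year), %H (hour) and %M (minute). A short sketch on two made-up dates:

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2014-01-04', '2014-01-27']))

print(dates.dt.strftime('%d/%m/%Y').tolist())  # ['04/01/2014', '27/01/2014']
print(dates.dt.strftime('%Y%m%d').tolist())    # ['20140104', '20140127']
print(dates.dt.strftime('%H:%M').tolist())     # ['00:00', '00:00']
```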

### is_...
We have several properties with the ***is_*** prefix to check various facts about our dates

In [41]:
some_dates = sales['OrderDate'].sample(5)
some_dates

SalesOrderID
64856   2014-01-24
65471   2014-01-31
65261   2014-01-29
63938   2014-01-09
64265   2014-01-15
Name: OrderDate, dtype: datetime64[ns]

In [44]:
some_dates.dt.is_month_end

SalesOrderID
64856    False
65471     True
65261    False
63938    False
64265    False
Name: OrderDate, dtype: bool

We can use these properties to filter our Dataframe based on those criteria

In [48]:
sales[sales['OrderDate'].dt.is_month_end].head(3)

SalesOrderID,OrderDate,DueDate,ShipDate,Status,Freight,weekday
65452,2014-01-31,2014-02-12,2014-02-07,5,44.4488,Friday
65453,2014-01-31,2014-02-12,2014-02-07,5,43.8745,Friday
65454,2014-01-31,2014-02-12,2014-02-07,5,44.4868,Friday


In [50]:
sales[sales['DueDate'].dt.is_quarter_end].head(3)

SalesOrderID,OrderDate,DueDate,ShipDate,Status,Freight,weekday
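
The result above is empty, most likely because none of the due dates in this January data falls on the last day of a quarter. A sketch with a few made-up dates shows what ***is_quarter_end*** and its counterpart ***is_quarter_start*** return when such dates are present:

```python
import pandas as pd

# Hypothetical due dates: only the 2nd and 3rd close a quarter
due = pd.Series(pd.to_datetime(['2014-02-12', '2014-03-31',
                                '2014-06-30', '2014-07-01']))

print(due.dt.is_quarter_end.tolist())    # [False, True, True, False]
print(due.dt.is_quarter_start.tolist())  # [False, False, False, True]
```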
