In [2]:
import pandas as pd
import numpy as np
from datetime import datetime

In [4]:
myyear = 2015
mymonth = 1
myday = 1
myhour = 2
mymin = 30
mysec = 15

In [5]:
mydate = datetime(myyear, mymonth, myday)

In [6]:
mydate

datetime.datetime(2015, 1, 1, 0, 0)

In [7]:
mydatetime = datetime(myyear, mymonth, myday, myhour, mymin, mysec)

In [8]:
mydatetime

datetime.datetime(2015, 1, 1, 2, 30, 15)

In [9]:
mydatetime.year

2015
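
The other components are exposed the same way as .year. A minimal sketch using only the standard library and the same values as above; the weekday and ISO string calls are extra illustrations, not part of the original cells:

```python
from datetime import datetime

mydatetime = datetime(2015, 1, 1, 2, 30, 15)

# Each component is a plain integer attribute
print(mydatetime.month, mydatetime.hour, mydatetime.second)

# weekday() counts from Monday=0, so Thursday is 3
print(mydatetime.weekday())

# isoformat() returns the ISO 8601 string for the same moment
print(mydatetime.isoformat())
```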

In [11]:
myser = pd.Series(['Nov 3, 1990', '2000-01-01', None])

In [12]:
myser

0    Nov 3, 1990
1     2000-01-01
2           None
dtype: object

Pandas stores these values as plain strings and does not yet treat them as dates, so date attributes such as .year are not available:

In [15]:
# myser[0].year  # left commented out: a str has no .year attribute, so this raises AttributeError
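
To see why that line is left commented out, here is a small sketch that catches the error instead of letting the cell fail. The names first and has_year are only for illustration:

```python
import pandas as pd

myser = pd.Series(['Nov 3, 1990', '2000-01-01', None])

# The first element is still a Python str, which has no date attributes
first = myser[0]
print(type(first))

try:
    first.year
    has_year = True
except AttributeError as err:
    has_year = False
    print(err)
```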

In [17]:
timeser = pd.to_datetime(myser)

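These strings are _automatically_ converted to the same standard datetime representation, even though they were written in different styles. Pretty impressive, and very useful. Recent pandas versions (2.0 and later) try to infer a single format from the first non-missing value, so mixed-format input like this may trigger a warning or need format='mixed'

The standard display format is year-month-day: YYYY-MM-DD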

In [18]:
timeser

0   1990-11-03
1   2000-01-01
2          NaT
dtype: datetime64[ns]

The dtype is datetime64[ns], a datetime type with nanosecond precision. The missing value None has become NaT ("not a time")
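
NaT also shows up when a string cannot be parsed, provided errors='coerce' is passed. A minimal sketch with made-up input values (the series raw is not from the notebook) and an explicit format:

```python
import pandas as pd

raw = pd.Series(['2000-01-01', 'not a date', None])

# errors='coerce' turns anything that fails to parse into NaT instead of raising
parsed = pd.to_datetime(raw, format='%Y-%m-%d', errors='coerce')
print(parsed)

# NaT behaves like a missing value, so isna() finds it
print(parsed.isna().tolist())
```

Without errors='coerce', the unparseable string would raise an error instead.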

In [20]:
timeser[0].year

1990

In [21]:
obvi_euro_date = '31-12-2000'
pd.to_datetime(obvi_euro_date)

  pd.to_datetime(obvi_euro_date)


Timestamp('2000-12-31 00:00:00')

The function returned a result, but it also issued a warning about the format: the string is day-first, which is not what the default settings expect

Now try a case where the correct reading is not obvious:

In [22]:
euro_date = '10-12-2000'

Because this is a European style date, the result should be December 10, 2000, but an American reader would take it to be October 12

In [23]:
pd.to_datetime(euro_date)

Timestamp('2000-10-12 00:00:00')

By default Pandas reads an ambiguous date month-first (dayfirst=False), the American convention, so it interprets this as October 12th

To get euro style:

In [24]:
pd.to_datetime(euro_date, dayfirst = True)

Timestamp('2000-12-10 00:00:00')

Otherwise, the default is month-first (American style)

If your data has a mix of American and Euro style dates, then the problem is with the data itself, and you'll need to clean it before applying any of these functions
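
When the layout of the dates is known, spelling it out with format is another way to remove the ambiguity. A short sketch on the same euro_date string; the names explicit and flagged are only for illustration:

```python
import pandas as pd

euro_date = '10-12-2000'

# An explicit format states the order outright: day, then month, then year
explicit = pd.to_datetime(euro_date, format='%d-%m-%Y')

# dayfirst=True gives the same reading for this string
flagged = pd.to_datetime(euro_date, dayfirst=True)

print(explicit, flagged)
```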

In [25]:
style_date = '12--Dec--2000'

In [26]:
pd.to_datetime(style_date, format = '%d--%b--%Y')

Timestamp('2000-12-12 00:00:00')

The codes %d, %b, %Y etc. are the standard Python strftime/strptime format codes; the full list of codes and which ones to use is in the Python datetime documentation, which the pd.to_datetime() docs point to
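
The same format codes can be tried out with the standard library alone. A minimal sketch, assuming an English locale (the month abbreviation matched by %b is locale-dependent):

```python
from datetime import datetime

style_date = '12--Dec--2000'

# %d = day of month, %b = abbreviated month name, %Y = four-digit year
parsed = datetime.strptime(style_date, '%d--%b--%Y')
print(parsed)

# The same codes work in reverse with strftime
print(parsed.strftime('%Y-%m-%d'))
print(parsed.strftime('%d--%b--%Y'))
```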

In [27]:
custom_date = "12th of Dec 2000"

In [28]:
pd.to_datetime(custom_date)

Timestamp('2000-12-12 00:00:00')

In [29]:
pushing_it = "Dec 12th, 2000"
pd.to_datetime(pushing_it)

Timestamp('2000-12-12 00:00:00')

It was even able to correctly interpret this format without any help

In [30]:
sales = pd.read_csv('RetailSales_BeerWineLiquor.csv')

In [31]:
sales

,DATE,MRTSSM4453USN
0,1992-01-01,1509
1,1992-02-01,1541
2,1992-03-01,1597
3,1992-04-01,1675
4,1992-05-01,1822
...,...,...
335,2019-12-01,6630
336,2020-01-01,4388
337,2020-02-01,4533
338,2020-03-01,5562


In [32]:
sales['DATE']

0      1992-01-01
1      1992-02-01
2      1992-03-01
3      1992-04-01
4      1992-05-01
          ...    
335    2019-12-01
336    2020-01-01
337    2020-02-01
338    2020-03-01
339    2020-04-01
Name: DATE, Length: 340, dtype: object

In [33]:
sales['DATE'] = pd.to_datetime(sales['DATE'])

In [35]:
sales['DATE']

0     1992-01-01
1     1992-02-01
2     1992-03-01
3     1992-04-01
4     1992-05-01
         ...    
335   2019-12-01
336   2020-01-01
337   2020-02-01
338   2020-03-01
339   2020-04-01
Name: DATE, Length: 340, dtype: datetime64[ns]

Now the DATE column holds datetime values, and each element is a Timestamp:

In [36]:
sales['DATE'][0]

Timestamp('1992-01-01 00:00:00')

In [37]:
sales['DATE'][0].year

1992

In [39]:
sales.head()

,DATE,MRTSSM4453USN
0,1992-01-01,1509
1,1992-02-01,1541
2,1992-03-01,1597
3,1992-04-01,1675
4,1992-05-01,1822


In [40]:
sales = pd.read_csv('RetailSales_BeerWineLiquor.csv', parse_dates=[0])

This replaces the manual step of converting DATE with pd.to_datetime: the column at position 0 is parsed as dates while the table is being imported

If you know which column is intended to be a date, doing it this way is very efficient
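
parse_dates also accepts column names, and it can be combined with index_col. A small sketch that uses an in-memory string as a stand-in for the CSV file; the layout of csv_text is an assumption based on the first rows shown above, not a copy of the real file:

```python
from io import StringIO

import pandas as pd

# Stand-in for the CSV file (assumed layout, first rows only)
csv_text = """DATE,MRTSSM4453USN
1992-01-01,1509
1992-02-01,1541
1992-03-01,1597
"""

# Name the date column instead of giving its position
by_name = pd.read_csv(StringIO(csv_text), parse_dates=['DATE'])
print(by_name.dtypes)

# index_col plus parse_dates=True parses the index itself as dates
indexed = pd.read_csv(StringIO(csv_text), index_col='DATE', parse_dates=True)
print(type(indexed.index))
```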

In [42]:
sales['DATE']

0     1992-01-01
1     1992-02-01
2     1992-03-01
3     1992-04-01
4     1992-05-01
         ...    
335   2019-12-01
336   2020-01-01
337   2020-02-01
338   2020-03-01
339   2020-04-01
Name: DATE, Length: 340, dtype: datetime64[ns]

In [44]:
sales = sales.set_index('DATE')

In [45]:
sales.head()

DATE,MRTSSM4453USN
1992-01-01,1509
1992-02-01,1541
1992-03-01,1597
1992-04-01,1675
1992-05-01,1822


Date is commonly used as an index, and a datetime index is what makes time-based operations such as resample possible
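
One thing a datetime index allows is selecting rows with partial date strings. A minimal sketch on a small stand-in table built from a few of the values shown above (not the full file):

```python
import pandas as pd

dates = pd.to_datetime(['1992-01-01', '1992-02-01', '2019-12-01',
                        '2020-01-01', '2020-02-01'])
sales = pd.DataFrame({'MRTSSM4453USN': [1509, 1541, 6630, 4388, 4533]},
                     index=dates)
sales.index.name = 'DATE'

# Partial string indexing: all rows that fall in 2020
print(sales.loc['2020'])

# Slice between two dates (both endpoints are included)
print(sales.loc['1992-02-01':'2019-12-31'])
```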

In [52]:
sales.resample(rule = 'A').mean().head()

DATE,MRTSSM4453USN
1992-12-31,1807.25
1993-12-31,1794.833333
1994-12-31,1841.75
1995-12-31,1833.916667
1996-12-31,1929.75


The rule argument is an offset alias, and the full table of aliases is in the pandas time series documentation: aliases like AS, MS, CBMS, etc. select different time series offsets, so the result depends on the length of time you intend to aggregate over

A is for year-end frequency, B for business day frequency, Q for quarter-end frequency, etc. In recent pandas versions (2.2 and later) A and Q are deprecated in favor of YE and QE
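
To see how the rule changes the grouping, here is a sketch on the first five monthly values shown above, using the quarter-start alias QS and the month-start alias MS. The small table is a stand-in, not the full file:

```python
import pandas as pd

dates = pd.to_datetime(['1992-01-01', '1992-02-01', '1992-03-01',
                        '1992-04-01', '1992-05-01'])
sales = pd.DataFrame({'MRTSSM4453USN': [1509, 1541, 1597, 1675, 1822]},
                     index=dates)

# 'QS' groups rows by quarter and labels each group with the quarter's first day
quarterly = sales.resample(rule='QS').mean()
print(quarterly)

# 'MS' (month start) keeps one row per month for data that is already monthly
monthly = sales.resample(rule='MS').sum()
print(len(monthly))
```

The first quarter averages three months and the second only the two months present in the data.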

In [55]:
sales = pd.read_csv('RetailSales_BeerWineLiquor.csv', parse_dates=[0])

In [56]:
sales.head()

,DATE,MRTSSM4453USN
0,1992-01-01,1509
1,1992-02-01,1541
2,1992-03-01,1597
3,1992-04-01,1675
4,1992-05-01,1822


In [58]:
sales['DATE'].dt.year

0      1992
1      1992
2      1992
3      1992
4      1992
       ... 
335    2019
336    2020
337    2020
338    2020
339    2020
Name: DATE, Length: 340, dtype: int64

Note: .dt is to datetime columns what .str is to string columns: an accessor that applies datetime attributes and methods to every element of the Series at once

In [59]:
sales['DATE'].dt.month

0       1
1       2
2       3
3       4
4       5
       ..
335    12
336     1
337     2
338     3
339     4
Name: DATE, Length: 340, dtype: int64
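
The .dt results are ordinary Series, so they can be used for grouping without making DATE the index. A minimal sketch on a stand-in table built from a few of the values shown above; yearly_mean is an illustrative name:

```python
import pandas as pd

sales = pd.DataFrame({
    'DATE': pd.to_datetime(['1992-01-01', '1992-02-01',
                            '2020-01-01', '2020-02-01']),
    'MRTSSM4453USN': [1509, 1541, 4388, 4533],
})

# month_name() is another .dt method, returning the month as text
print(sales['DATE'].dt.month_name().tolist())

# Group on the extracted year to get a per-year average
yearly_mean = sales.groupby(sales['DATE'].dt.year)['MRTSSM4453USN'].mean()
print(yearly_mean)
```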