## Datetime indexing

If the observations in a dataset have a date or time column, it is often convenient to index the data by that column. When the data is loaded from an external source such as a file, dates and times may be read as strings by default, so we need to explicitly convert that column to the `datetime` type:

In [1]:
import pandas as pd
import numpy as np
%matplotlib inline

In [2]:
data = {
    '2018-01-01': 5.3,
    '2018-01-02': 6.6,
    '2018-01-03': 7.1,
    '2018-01-04': 8.8,
    '2018-01-05': 7.6,
    '2018-01-06': 6.0
}

df = pd.DataFrame({'data': data})
df

            data
2018-01-01   5.3
2018-01-02   6.6
2018-01-03   7.1
2018-01-04   8.8
2018-01-05   7.6
2018-01-06   6.0


The dataframe is created from the dictionary, and the dictionary keys become the index of the dataframe.

Let's check the data type of the index:

In [3]:
df.index

Index(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04', '2018-01-05',
       '2018-01-06'],
      dtype='object')

The `dtype` of the index is `object`, which, as the output above shows, is how Pandas represents strings here.

Now let's convert it to `datetime`. The `to_datetime` function takes an array-like object such as an Index or a Series and returns a new object with its values converted to the `datetime` type (an Index becomes a `DatetimeIndex`, a Series stays a Series). In doing so, the function automatically infers the format in which the dates (and times) are represented.

In [4]:
df.index = pd.to_datetime(df.index)

In [5]:
df.index

DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',
               '2018-01-05', '2018-01-06'],
              dtype='datetime64[ns]', freq=None)

Now the `dtype` of the index is shown as `datetime64[ns]`. From now on, the data can be easily manipulated in ways that take the date and time information into account. For example, the data can be grouped by months or years. Or, we can plot the data, and the appropriate labels for the dates will be used in the plot:

In [None]:
df.plot()

As you can see, the x-axis labels are conveniently shown as days, months and years.
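
The grouping by months or years mentioned above can be sketched as follows. This is a minimal illustration rather than the only way to do it: it rebuilds the same sample values, and the names `monthly_mean` and `first_three_days` are chosen here for the example.

```python
import pandas as pd

# Same sample values as above, with the index already converted to datetime
df = pd.DataFrame(
    {'data': [5.3, 6.6, 7.1, 8.8, 7.6, 6.0]},
    index=pd.to_datetime([
        '2018-01-01', '2018-01-02', '2018-01-03',
        '2018-01-04', '2018-01-05', '2018-01-06',
    ]),
)

# Group the rows by calendar month, using the month attribute of the index
monthly_mean = df.groupby(df.index.month)['data'].mean()

# Select a range of dates by label; both endpoints are included
first_three_days = df.loc['2018-01-01':'2018-01-03']
```

Grouping by year works the same way with `df.index.year`. Both attributes are only available once the index is a `DatetimeIndex`, which is why the conversion comes first.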

Sometimes, however, the `to_datetime` function guesses the date format incorrectly: the dates may follow the European (day-first) convention, while the function assumes the American (month-first) format. For example:

In [None]:
# the dates are in the European format: 5th June, 6th June, 7th June
data = {
    '05-06-2019': 5.3,
    '06-06-2019': 6.6,
    '07-06-2019': 7.1
}

df = pd.DataFrame({'data': data})
df.index = pd.to_datetime(df.index)

# however, `to_datetime` guessed the dates to be 6th May, 6th June, 6th July
df

To deal with that, we can explicitly specify the string format by passing the `format` argument:

In [None]:
data = {
    '05-06-2019': 5.3,
    '06-06-2019': 6.6,
    '07-06-2019': 7.1
}

df = pd.DataFrame({'data': data})

# explicitly indicate that the day comes first, then the month, then the year
df.index = pd.to_datetime(df.index, format="%d-%m-%Y")

df

The dates in the index now appear as intended.
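
The same conversion applies when the dates arrive as a column of a file rather than as dictionary keys. The sketch below assumes a small in-memory CSV in place of a real file; the text `csv_text`, the column names `date` and `value`, and the variables `raw`, `df_csv` and `alt` are all hypothetical and used only for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real file on disk
csv_text = """date,value
05-06-2019,5.3
06-06-2019,6.6
07-06-2019,7.1
"""

# Without extra arguments, read_csv leaves the date column as strings
raw = pd.read_csv(io.StringIO(csv_text))

# Convert the column with an explicit day-month-year format,
# then use it as the index of the dataframe
raw['date'] = pd.to_datetime(raw['date'], format='%d-%m-%Y')
df_csv = raw.set_index('date')

# Alternative: keep the automatic parsing, but tell it that the day comes first
alt = pd.to_datetime(['05-06-2019', '06-06-2019', '07-06-2019'], dayfirst=True)
```

An explicit `format` is the stricter of the two options, since a string that does not match it raises an error instead of being parsed by a guess.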