## 17 Time Analysis Mini Series Pt 4 - to_datetime
Time series analysis is very important in financial data analysis. Pandas has built-in support for time series that makes analyzing them easy and efficient. In this tutorial we will go over the to_datetime function, which converts date/time strings into a datetime object (a Timestamp) or a DatetimeIndex. We will also see how to handle invalid datetime strings, different date formats, etc.<br>
One of the most common problems in data analysis is the lack of uniformity in the structure of input data, e.g. think of how many ways you can write a date. When processing data, it can come in any of these formats. The challenge for you is to convert it to a common format as a datetime datatype.<br><br>
We are going to look at two datetime functions to see how we can perform this conversion...

In [1]:
import pandas as pd
dates = ['2017-01-05', 'Jan 5, 2017', '01/05/2017', '2017.01.05', '2017/01/05','20170105']
pd.to_datetime(dates)

DatetimeIndex(['2017-01-05', '2017-01-05', '2017-01-05', '2017-01-05',
               '2017-01-05', '2017-01-05'],
              dtype='datetime64[ns]', freq=None)

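We see that to_datetime was able to convert all these different date formats into one single format, ready for use in a DataFrame. (In pandas 2.0 and later, a list that mixes formats like this one typically needs format='mixed'; otherwise a ValueError is raised.)<br><br>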
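A version-independent way to get the same result is to parse each string on its own, so that no single format is assumed for the whole list. This is only a sketch, and it assumes that every entry is unambiguous when read by itself; it is also slower than one vectorised call on large inputs.

```python
import pandas as pd

dates = ['2017-01-05', 'Jan 5, 2017', '01/05/2017', '2017.01.05', '2017/01/05', '20170105']

# Parse each string separately, then collect the results into one DatetimeIndex
parsed = pd.DatetimeIndex([pd.to_datetime(d) for d in dates])
print(parsed)
```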
Not only this but the to_datetime function can handle time as well...

In [3]:
dates = ['2017-01-05 2:30:00 PM', 'Jan 5, 2017 14:30:00', '01/05/2017', '2017.01.05', '2017/01/05','20170105']
pd.to_datetime(dates)

DatetimeIndex(['2017-01-05 14:30:00', '2017-01-05 14:30:00',
               '2017-01-05 00:00:00', '2017-01-05 00:00:00',
               '2017-01-05 00:00:00', '2017-01-05 00:00:00'],
              dtype='datetime64[ns]', freq=None)

**Note:** While it handles these time formats without a problem, it does not handle times written with a dot instead of a colon (for example 14.30.00)<br><br>
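If your times do use a dot, one option is to spell the layout out with the format argument. A minimal sketch, assuming a made-up input string where hours and minutes are separated by a dot:

```python
import pandas as pd

# The format string states explicitly that a dot sits between hours and minutes
ts = pd.to_datetime('2017-01-05 14.30', format='%Y-%m-%d %H.%M')
print(ts)
```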
**Handling US date formats**<br>
By default Pandas reads ambiguous dates as per the US format, i.e. MM/DD/YYYY, whereas in the UK we like to use DD/MM/YYYY. This means that we, as non-US users, can get some rather unexpected results when dealing with dates in Pandas...

In [6]:
pd.to_datetime('01-12-2017')

Timestamp('2017-01-12 00:00:00')

We meant the 1st December 2017, but Pandas has returned what looks like the 12th January 2017. You can see how this might cause some issues. Fortunately, Pandas does have an easy way to deal with this...

In [8]:
pd.to_datetime('01-12-2017', dayfirst = True)

Timestamp('2017-12-01 00:00:00')

Now the string is read as the 1st December 2017, as intended.<br><br>
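dayfirst is best thought of as a preference for the parser rather than a strict rule. When you know that every value is DD-MM-YYYY, an alternative is to state the format outright so nothing is left to guesswork. A small sketch with two made-up UK-style dates:

```python
import pandas as pd

uk_dates = ['01-12-2017', '25-12-2017']  # illustrative values, day first

# An explicit format removes any ambiguity between day and month
parsed = pd.to_datetime(uk_dates, format='%d-%m-%Y')
print(parsed)
```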
**Custom date time format**<br>
If your date looks like this...05\$01\$2017, and I do not know why it ever would, trying to parse it in Pandas without any help will give you an error. However, you can tell Pandas how to deal with this and it will parse the date correctly...

In [10]:
pd.to_datetime('05$01$2017', format='%d$%m$%Y')
# Here we are telling Pandas that the first element is the day - %d
# $ - which, inexplicably, is your separator
# Then the month - %m
# Finally, the year - %Y
# These are strftime-style codes, familiar from the date command in Bash

Timestamp('2017-01-05 00:00:00')

We can see that Pandas now recognises this as a datetime datatype. You could use absolutely anything for the separator, much like a CSV file, as long as you tell Pandas what it is through the format string.<br>
You can even mix them up...

In [11]:
pd.to_datetime('05$01#2017', format='%d$%m#%Y')

Timestamp('2017-01-05 00:00:00')

to_datetime still comes back with a valid datetime object.<br><br>
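The same format argument works on a whole DataFrame column, which is where you are most likely to need it. A minimal sketch; the DataFrame and its column names (raw, date) are made up for illustration:

```python
import pandas as pd

# Hypothetical frame holding dates with the odd $ separator
df = pd.DataFrame({'raw': ['05$01$2017', '06$01$2017']})

# Convert the whole column in one call using the custom format
df['date'] = pd.to_datetime(df['raw'], format='%d$%m$%Y')
print(df)
```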
**Dealing with Errors/Exceptions**<br>
Going back to our original list of dates above, if we supply it with something that is not a date or a time, a ValueError exception is thrown. This error stops anything from being processed. This is because, by default, the to_datetime function has the argument *<font color = blue>errors = 'raise'</font>*. If, however, we pass the errors argument with the value 'ignore', the exception is suppressed and the input is handed back unconverted. (Note that errors = 'ignore' is deprecated in recent pandas releases.)

In [15]:
dates = ['2017-01-05', 'Jan 5, 2017', '01/05/2017', '2017.01.05', '2017/01/05','abc']
pd.to_datetime(dates, errors = 'ignore')

array(['2017-01-05', 'Jan 5, 2017', '01/05/2017', '2017.01.05',
       '2017/01/05', 'abc'], dtype=object)

My 'abc' string is still there, but notice that Pandas has handed back the whole input as plain strings rather than converting anything.<br><br>
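For comparison, the default errors = 'raise' behaviour can be handled with an ordinary try/except. This is just a sketch using a shortened two-element list; the variable names failed and message are mine:

```python
import pandas as pd

dates = ['2017-01-05', 'abc']

try:
    pd.to_datetime(dates)          # default is errors='raise'
    failed = False
except ValueError as exc:          # the invalid string triggers a ValueError
    failed = True
    message = str(exc)
    print('Could not parse:', message)
```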
**What about my conversion?**<br>
If you want Pandas to skip just the invalid element but still perform the conversion on the valid elements, then you have to pass 'coerce' as the errors argument...

In [16]:
dates = ['2017-01-05', 'Jan 5, 2017', '01/05/2017', '2017.01.05', '2017/01/05','abc']
pd.to_datetime(dates, errors = 'coerce')

DatetimeIndex(['2017-01-05', '2017-01-05', '2017-01-05', '2017-01-05',
               '2017-01-05', 'NaT'],
              dtype='datetime64[ns]', freq=None)

Here Pandas has converted all the valid dates into datetime objects, but the invalid string is now NaT, or Not a Time.<br><br>
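Once the bad values have become NaT, it is often useful to see which original strings caused them. A minimal sketch, assuming the data sits in a Series and the valid entries share one format; the names raw, parsed and bad_values are illustrative:

```python
import pandas as pd

raw = pd.Series(['2017-01-05', '2017-01-06', 'abc'])

# Invalid entries become NaT instead of raising
parsed = pd.to_datetime(raw, errors='coerce')

# Use the NaT positions to pull out the offending original strings
bad_values = raw[parsed.isna()].tolist()
print(bad_values)
```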
**Dealing with UNIX Epoch format**<br>
Unix/Epoch time is the number of seconds that have passed since midnight UTC on 1 Jan 1970. We are going to save our epoch time (1527796896) in a variable, pass that to to_datetime and see what we get...

In [18]:
epoch = 1527796896
pd.to_datetime(epoch, unit ='s') # by default, Pandas deals in nanoseconds but we are supplying seconds

Timestamp('2018-05-31 20:01:36')

Once we told Pandas that we were going to supply it seconds rather than nanoseconds, it was able to convert our Epoch time without any issues. The result looks an hour behind UK local time because epoch values are interpreted as UTC, and the UK was on British Summer Time (UTC+1) on that date.<br><br>
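To get a local time rather than UTC, one approach is to ask for a timezone-aware result and then convert it. A sketch, assuming the Europe/London zone is the one you want:

```python
import pandas as pd

epoch = 1527796896

# utc=True returns a timezone-aware Timestamp in UTC
utc_time = pd.to_datetime(epoch, unit='s', utc=True)

# Convert to UK local time (British Summer Time on this date)
london_time = utc_time.tz_convert('Europe/London')
print(utc_time, london_time)
```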
**Turning our Epoch time to an index**

In [19]:
dt = pd.to_datetime([epoch], unit='s') # Supplying a list gives us a DatetimeIndex
dt

DatetimeIndex(['2018-05-31 20:01:36'], dtype='datetime64[ns]', freq=None)

In [20]:
dt.view('int64') # Converts it back to Epoch time in nanoseconds, hence the extra zeros

array([1527796896000000000], dtype=int64)
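If you want the epoch back in seconds rather than nanoseconds, one way that does not depend on the internal storage is to subtract the epoch start and floor-divide by one second. A minimal sketch; the variable name seconds is mine:

```python
import pandas as pd

epoch = 1527796896
dt = pd.to_datetime([epoch], unit='s')

# Time elapsed since 1 Jan 1970, expressed as a whole number of seconds
seconds = (dt - pd.Timestamp('1970-01-01')) // pd.Timedelta('1s')
print(seconds)
```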