## Working with Datetime Variables in pandas

In [None]:
import pandas as pd

accidents = pd.read_csv('../data/Traffic_Accidents__2019_.csv')
accidents.head(2)

Recall that we can get more information about the columns by using the `.info()` method.

In [None]:
accidents.info()

From this, we can see that the `Date and Time` column has the `object` dtype, which means pandas is treating its values as text. That makes it awkward to do anything date-related with the column, such as filtering by month or computing time differences.

To make better use of this column, we want to convert it to a datetime type, which can be done using the [`to_datetime` function](https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html).

When using this function, we can either let pandas infer the format or specify it ourselves. Specifying it usually speeds up processing and removes any ambiguity about which part is the month and which is the day. The format is written using the appropriate [format codes](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes).

In [None]:
# Example date and time value: 01/15/2019 07:40:00 PM

accidents['Date and Time'] = pd.to_datetime(accidents['Date and Time'],
                                           format="%m/%d/%Y %I:%M:%S %p")

Afterwards, we can verify that we have a datetime type.

In [None]:
accidents.info()
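The format codes used above can also be tried on a small standalone Series. The following is a minimal sketch on made-up values (not rows from the accidents file), and it assumes we would rather get a missing value than an error when a string does not match the format, which is what `errors="coerce"` does.

```python
import pandas as pd

# Made-up values in the same layout as the example above; the last one is deliberately invalid
raw = pd.Series(["01/15/2019 07:40:00 PM", "12/01/2019 12:05:30 AM", "not a date"])

# %I is the 12-hour clock hour and %p is the AM/PM marker.
# errors="coerce" turns entries that do not match the format into NaT instead of raising.
parsed = pd.to_datetime(raw, format="%m/%d/%Y %I:%M:%S %p", errors="coerce")

print(parsed)
print(parsed.isna().sum())  # number of values that failed to parse
```

Counting the `NaT` values after a coerced conversion is a quick way to check how many rows did not match the expected format.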

Once we have converted, we can extract individual parts of the date and time, using [pandas datetime functionality](https://pandas.pydata.org/docs/user_guide/timeseries.html).

To use this functionality on a column, we go through the `.dt` accessor, which tells pandas that we want the datetime methods and attributes rather than the regular Series ones.

For example, let's say we want to extract the month into a new column.

In [None]:
accidents['month'] = accidents['Date and Time'].dt.month
accidents.head()

Now we can use this new column to answer questions such as: what is the maximum number of motor vehicles involved in a single accident in July?

In [None]:
accidents.loc[accidents['month'] == 7, 'Number of Motor Vehicles'].max()
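The same kind of question can be answered without first creating a helper column, and for every month at once. This is a minimal sketch on a made-up three-row DataFrame; the column names are borrowed from the accidents data, but the values are invented for illustration.

```python
import pandas as pd

# Toy data: column names mirror the accidents file, values are made up
toy = pd.DataFrame({
    "Date and Time": pd.to_datetime(
        ["2019-07-04 10:00", "2019-07-20 18:30", "2019-08-01 09:15"]
    ),
    "Number of Motor Vehicles": [2, 5, 3],
})

# Build the boolean mask directly from the datetime column
july_mask = toy["Date and Time"].dt.month == 7
july_max = toy.loc[july_mask, "Number of Motor Vehicles"].max()
print(july_max)

# Group by the extracted month to get the maximum for each month
per_month = toy.groupby(toy["Date and Time"].dt.month)["Number of Motor Vehicles"].max()
print(per_month)
```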

There are [many different components](https://pandas.pydata.org/docs/user_guide/timeseries.html#time-date-components) we can extract.

In [None]:
accidents['Date and Time'].dt.time.head()

In [None]:
accidents['Date and Time'].dt.date.head()

In [None]:
accidents['Date and Time'].dt.weekday.head()

In [None]:
accidents['Date and Time'].dt.is_leap_year.head()
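A few more components are sketched below on two made-up timestamps. Note that `.dt.weekday` numbers the days starting from Monday as 0, so `.dt.day_name()` can be easier to read when the goal is a label rather than a number.

```python
import pandas as pd

# Two made-up timestamps for illustration
stamps = pd.Series(pd.to_datetime(["2019-01-15 19:40:00", "2019-03-03 08:05:00"]))

print(stamps.dt.hour)        # hour of the day on a 24-hour clock
print(stamps.dt.weekday)     # Monday=0 through Sunday=6
print(stamps.dt.day_name())  # weekday as a name; this one is a method, so it needs parentheses
```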

You can use comparison operators with datetime columns, too.

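For example, suppose we want to find out how many accidents happened before March 3. Comparing the column to a date string gives a Series of `True`/`False` values, and summing it counts the `True` values.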

In [None]:
(accidents['Date and Time'] < '03/03/2019').sum()
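One detail worth keeping in mind is that a date without a time is interpreted as midnight at the start of that day. The sketch below uses made-up timestamps and an explicit `pd.Timestamp` cutoff to show this, and it also uses `Series.between` to count values inside a date range.

```python
import pandas as pd

# Made-up timestamps around the cutoff date
stamps = pd.Series(pd.to_datetime([
    "2019-01-15 19:40:00",
    "2019-03-02 23:59:59",
    "2019-03-03 00:00:00",
    "2019-06-10 12:00:00",
]))

# The cutoff is 2019-03-03 00:00:00, so nothing on March 3 itself counts as "before"
cutoff = pd.Timestamp("2019-03-03")
before = (stamps < cutoff).sum()
print(before)

# between() includes both endpoints by default
in_march = stamps.between(pd.Timestamp("2019-03-01"), pd.Timestamp("2019-03-31")).sum()
print(in_march)
```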

You can also perform calculations on datetime columns.

For example, let's say we want to find the amount of time between each accident and the first accident listed in the dataset (the row with index label 0, which is not necessarily the earliest one).

In [None]:
accidents['Date and Time'] - accidents.loc[0, 'Date and Time']

Notice that the result is a Series of [timedelta](https://pandas.pydata.org/docs/user_guide/timedeltas.html) values, each of which represents a difference between two points in time.
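Timedelta Series have their own `.dt` accessor. As a minimal sketch on made-up timestamps, the differences can be converted to whole days or to hours, and compared against a `pd.Timedelta`.

```python
import pandas as pd

# Made-up timestamps for illustration
stamps = pd.Series(pd.to_datetime([
    "2019-01-01 00:00:00",
    "2019-01-01 06:30:00",
    "2019-01-03 12:00:00",
]))

# Time elapsed since the first row
elapsed = stamps - stamps.iloc[0]

days = elapsed.dt.days                     # whole-day component of each difference
hours = elapsed.dt.total_seconds() / 3600  # full duration expressed in hours
more_than_a_day = elapsed > pd.Timedelta(days=1)

print(days)
print(hours)
print(more_than_a_day)
```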