# Datetime and Timedelta

This chapter covers two distinct concepts, datetimes and timedeltas, and how they are created and used in pandas. A datetime represents a specific **moment** in time. A timedelta represents an **amount** of time.

## Date vs Time vs Datetime

The terms **date**, **time**, and **datetime** each mean something different.

* **date** - Only the month, day, and year. So 2016-01-01 would represent January 1, 2016 and be considered a **date**.
* **time** - Only the hours, minutes, seconds, and parts of a second (milli, micro, nano, etc.). For example, 5 hours, 45 minutes, and 6.74234 seconds would be considered a **time**.
* **datetime** - A combination of a date and time. It has both date (year, month, day) and time (hour, minute, second, part of second) components. January 1, 2016 at 5:45 p.m. would be an example of a **datetime**.

The Python standard library contains the [datetime module][1]. It is a popular and important module, but will not be covered here since pandas builds its own datetime and timedelta objects that are more powerful.

[1]: https://docs.python.org/3/library/datetime.html

### Datetimes in numpy

In part 5 of this book, we covered the numpy datetime data type. It is more powerful and flexible than core Python's datetime module, but does not have the features of the pandas datetime object. This chapter only covers datetimes in pandas.

## Creating single datetime objects in pandas

In part 5, we used the Series constructor to create Series of datetimes. It's actually possible to create single datetime objects with the `to_datetime` function and the `Timestamp` constructor.

### Creating a single datetime with the `to_datetime` function

The `to_datetime` function converts its input to a single scalar datetime with nanosecond precision. Scalar datetimes are analogous to single integers, floats, or strings; they are not part of an array, Series, or DataFrame. The function accepts a variety of inputs. We begin by converting a string with the format `'YYYY-MM-DD'` to a datetime.

In [None]:
import pandas as pd
d = pd.to_datetime('2020-01-05')
d

This is a new type of object. Let's retrieve its type.

In [None]:
type(d)
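
The `Timestamp` constructor mentioned earlier is the other way to build a single datetime. The following is a minimal sketch (the variable names are only illustrative) suggesting that both routes produce the same kind of object, and that this object is also a standard library `datetime`.

```python
import datetime
import pandas as pd

# Two ways to build the same scalar datetime
from_function = pd.to_datetime('2020-01-05')
from_constructor = pd.Timestamp('2020-01-05')

# Both results are pandas Timestamp objects that compare equal
same_value = from_function == from_constructor
same_type = type(from_function) is type(from_constructor)

# Timestamp is a subclass of the standard library's datetime.datetime
is_std_datetime = isinstance(from_function, datetime.datetime)

print(same_value, same_type, is_std_datetime)
```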

### Why is a Timestamp object returned?

The type that pandas uses for individual datetimes is `Timestamp`. In general, the word 'timestamp' has the same meaning as datetime. If you look at the docstring for `to_datetime`, it states the following:

> Convert argument to datetime.

It would have been nice if pandas had chosen the name `Datetime` for the type so that it could match the name of the data type and function, but it did not, and thus there is potential for confusion. Let's create a Series of datetimes to show that the data type is `'datetime64[ns]'`.

In [None]:
s = pd.Series(['2020-01-05', '2020-01-06'], dtype='datetime64[ns]')
s

When selecting a single value from this Series, a Timestamp object is returned. The official documentation uses the words 'timestamp' and 'datetime' interchangeably for the same concept: an object with year, month, day, hour, minute, second, and part of second components.

In [None]:
s.loc[0]

### More string formats

Let's see more examples of strings with different formats that can be converted to datetimes. Remember that `to_datetime` is a function, not a Series or DataFrame method, so it must be accessed directly from `pd`. Here, we use a hyphen to separate the components but omit the leading zero in front of the month and day.

In [None]:
pd.to_datetime('2016-1-5')

The hour, minute, second, and part of second components were not explicitly given, so pandas sets them to 0. Let's slowly create more datetimes by adding one more component each time. Here, we add the hour.

In [None]:
pd.to_datetime('2020-1-5 15')

The hour and minute are separated by a colon.

In [None]:
pd.to_datetime('2020-1-5 15:39')

The minute and second are also separated by a colon.

In [None]:
pd.to_datetime('2020-1-5 15:39:55')

The part of second must be separated from the second by a decimal point. Precision stops at nanoseconds, which are nine places after the decimal point, so the last two digits below are truncated.

In [None]:
pd.to_datetime('2020-1-5 15:39:55.12345678912')

Forward slashes can be used instead of hyphens to separate the date components. The hour, minute, and second components do not require any separator.

In [None]:
pd.to_datetime('2020/01/05 153955.123456789')

The date components also don't need a separator.

In [None]:
pd.to_datetime('20200105 153955.123456789')

You can also use the month name spelled out as a string, have an ending for the day, and use AM/PM to denote part of day.

In [None]:
pd.to_datetime('January 5th, 2020 03:39:55 PM')
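
To see that the separators and spellings above are only different ways of writing the same moment, the sketch below parses a few spellings of January 5, 2020 one string at a time and compares the results. The list `spellings` is an illustrative choice, not an exhaustive set of accepted formats.

```python
import pandas as pd

# Different spellings of January 5, 2020, each parsed on its own
spellings = ['2020-01-05', '2020-1-5', '2020/01/05', 'January 5, 2020']
parsed = [pd.to_datetime(text) for text in spellings]

# With no time given, every spelling lands on midnight of the same day
target = pd.Timestamp(year=2020, month=1, day=5)
all_equal = all(ts == target for ts in parsed)

print(all_equal)
```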

### Epoch

The term epoch refers to the origin of a particular era. Like many other programming languages, Python uses January 1, 1970 (also known as the Unix epoch) as its epoch for keeping track of datetime. In pandas, integers are used to represent the number of nanoseconds that have elapsed since the epoch.
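
The underlying integer can be inspected through the Timestamp's `value` attribute. This short sketch checks the epoch itself and the moment exactly one day later; the variable names are illustrative.

```python
import pandas as pd

epoch = pd.Timestamp('1970-01-01')
one_day_later = pd.Timestamp('1970-01-02')

# .value is the integer number of nanoseconds since the epoch
epoch_ns = epoch.value
one_day_ns = one_day_later.value

# One day is 86,400 seconds, and each second is 10**9 nanoseconds
print(epoch_ns, one_day_ns == 86_400 * 10**9)
```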

### Converting numbers to Timestamps

The `to_datetime` function also accepts numbers and converts them to Timestamps. By default, the number is interpreted as nanoseconds since the epoch. The following creates a datetime 100 nanoseconds after January 1, 1970.

In [None]:
pd.to_datetime(100)

### Specify unit

The default unit is nanoseconds, but you can specify a different one with the `unit` parameter. Here, we create a datetime 100 seconds after the epoch.

In [None]:
pd.to_datetime(100, unit='s')

Here a datetime 20,000 days after the epoch is created.

In [None]:
pd.to_datetime(20_000, unit='d')
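
As a quick check of how the `unit` parameter shifts the result, the sketch below compares each number-based datetime with the same moment written out as a string. It uses the uppercase `'D'` for days, which is assumed here to be the safer spelling across pandas versions.

```python
import pandas as pd

# A number plus a unit is an offset measured from the epoch
hundred_seconds = pd.to_datetime(100, unit='s')
twenty_thousand_days = pd.to_datetime(20_000, unit='D')

# 100 seconds is 1 minute 40 seconds past midnight on January 1, 1970
matches_seconds = hundred_seconds == pd.Timestamp('1970-01-01 00:01:40')

# 20,000 days after the epoch falls in October 2024
matches_days = twenty_thousand_days == pd.Timestamp('2024-10-04')

print(matches_seconds, matches_days)
```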

### Datetimes in DataFrames

It's more common to encounter datetimes in a DataFrame. Let's read in the City of Houston employee dataset converting the `hire_date` column to a datetime.

In [None]:
emp = pd.read_csv('../data/employee.csv', parse_dates=['hire_date'])
emp.dtypes

### Each individual value in the datetime columns is a Timestamp

If we extract the `hire_date` column as a Series and print the first few rows, the data type (at the bottom of the output) is still written as `datetime64[ns]`.

In [None]:
hire_date = emp['hire_date']
hire_date.head()

If we select the first value in the Series, we get a Timestamp.

In [None]:
hire_date.loc[0]
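
When a column has already been read in as strings, `to_datetime` can convert it after the fact. The sketch below uses a small made-up DataFrame (`emp_demo`, with hypothetical names and dates) in place of the employee file, and checks both the column's data type and the type of a single value.

```python
import pandas as pd

# Hypothetical stand-in for the employee data; the values are made up
emp_demo = pd.DataFrame({
    'name': ['Ana', 'Ben'],
    'hire_date': ['2006-06-12', '2015-02-03'],
})

# Convert the string column to datetimes
emp_demo['hire_date'] = pd.to_datetime(emp_demo['hire_date'])

# The column is a datetime column, and each value in it is a Timestamp
is_datetime_column = pd.api.types.is_datetime64_any_dtype(emp_demo['hire_date'])
first_value = emp_demo['hire_date'].loc[0]

print(is_datetime_column, type(first_value))
```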

## Timestamp attributes and methods

Timestamp objects have attributes and methods similar to those of the `dt` Series accessor. Let's create a Timestamp and retrieve some of these attributes.

In [None]:
ts = pd.to_datetime('February 14, 2019 20:45:56')

In [None]:
ts.year

In [None]:
ts.month

In [None]:
ts.day

In [None]:
ts.second

In [None]:
ts.weekofyear

In [None]:
ts.dayofyear

Several methods also exist, which we use below.

In [None]:
ts.day_name()

In [None]:
ts.month_name()

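The offset aliases are used for the `round`, `ceil`, and `floor` methods. Here, we round to the nearest hour with the alias `'h'`; older pandas versions also accept the uppercase `'H'`, which newer releases deprecate.

In [None]:
ts.round('h')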
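
The three rounding methods differ only in direction. The sketch below applies each of them to an illustrative Timestamp (`ts_demo`) and also reads the same attribute through the `dt` accessor of a one-element Series. The lowercase alias `'h'` is used on the assumption that it is accepted by both older and newer pandas versions.

```python
import pandas as pd

ts_demo = pd.Timestamp('2019-02-14 20:45:56')

rounded = ts_demo.round('h')  # nearest hour boundary
floored = ts_demo.floor('h')  # hour boundary at or before the time
ceiled = ts_demo.ceil('h')    # hour boundary at or after the time

# The same attribute name is available on a Series through dt
s_demo = pd.Series([ts_demo])
series_year = s_demo.dt.year.iloc[0]

print(rounded, floored, ceiled, series_year == ts_demo.year)
```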

## Timedelta - an amount of time

A timedelta is a specific amount of time, such as 20 seconds or 13 days, 5 minutes, and 10 seconds. Use the `to_timedelta` function to create a Timedelta object. It works analogously to the `to_datetime` function, and a wide variety of strings can be converted to Timedeltas.

In [None]:
pd.to_timedelta('5 days 03:12:45.123')

In [None]:
pd.to_timedelta('10h 13ms')
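
To make the meaning of these strings explicit, the sketch below rebuilds the same two amounts with the `Timedelta` constructor and keyword arguments, then compares them with the parsed strings. The variable names are illustrative.

```python
import pandas as pd

from_clock_string = pd.to_timedelta('5 days 03:12:45.123')
from_unit_string = pd.to_timedelta('10h 13ms')

# The same amounts of time spelled out component by component
clock_by_keywords = pd.Timedelta(days=5, hours=3, minutes=12,
                                 seconds=45, milliseconds=123)
units_by_keywords = pd.Timedelta(hours=10, milliseconds=13)

same_clock = from_clock_string == clock_by_keywords
same_units = from_unit_string == units_by_keywords

print(same_clock, same_units)
```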

### Converting numbers to Timedeltas with `to_timedelta`

As with `to_datetime`, a number passed to `to_timedelta` is treated as a number of nanoseconds by default. Use the `unit` parameter to change the time unit. We start by converting 123,000 nanoseconds to a timedelta.

In [None]:
pd.to_timedelta(123_000)

Here, we create a timedelta of exactly 500 days.

In [None]:
pd.to_timedelta(500, unit='d')

Here, a little over 705 hours is converted to a timedelta.

In [None]:
pd.to_timedelta(705.87, unit='h')

Because a year is not a fixed length of time, you'll get an error if you use its unit abbreviation, 'y'. A month is not a fixed length either, so it cannot be used as a unit.

In [None]:
pd.to_timedelta(23, unit='y')
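
The sketch below catches the error so it can be inspected without stopping a notebook. It assumes the installed pandas raises a `ValueError` for this unit. The 365-day figure at the end is only an illustrative approximation of a year that ignores leap days; it is not something pandas provides.

```python
import pandas as pd

# Assumption: the ambiguous unit 'y' is rejected with a ValueError
try:
    pd.to_timedelta(23, unit='y')
    rejected = False
except ValueError as err:
    rejected = True
    print(err)

# A day has a fixed length, so an explicit day count always works.
# 23 * 365 is a rough stand-in for 23 years that ignores leap days.
approx_23_years = pd.to_timedelta(23 * 365, unit='D')
print(approx_23_years)
```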

### No name confusion with Timedelta

The pandas Timedelta is built upon numpy's timedelta64 data type, which is more capable than the timedelta in the standard library's datetime module. Fortunately, the pandas developers kept the name timedelta for the data type, matching numpy's. There is no name confusion here as there is with timestamp/datetime.

## Timedelta attributes and methods

There are many attributes and methods available to Timedelta objects. Let's see some below:

In [None]:
td = pd.to_timedelta(705.87, unit='h')
td

In [None]:
td.days

In [None]:
td.seconds

In [None]:
td.components
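
One detail that is easy to misread is that `seconds` holds only the seconds left over after the whole days are removed, not the entire duration. The sketch below uses an illustrative Timedelta built from a string (`td_demo`) and contrasts `seconds` with the `total_seconds` method.

```python
import pandas as pd

td_demo = pd.to_timedelta('29 days 09:52:12')

whole_days = td_demo.days            # 29
leftover_seconds = td_demo.seconds   # 9 h 52 min 12 s expressed in seconds
total = td_demo.total_seconds()      # the entire duration in seconds

# components splits the amount into days, hours, minutes, and so on
hours_part = td_demo.components.hours
minutes_part = td_demo.components.minutes

print(whole_days, leftover_seconds, total, hours_part, minutes_part)
```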

## Creating timedeltas by subtracting datetimes

It is possible to create timedeltas by subtracting two datetimes.

In [None]:
dt1 = pd.to_datetime('2012-12-21 5:30')
dt2 = pd.to_datetime('2016-1-1 12:45:12')
dt2 - dt1

### Negative Timedeltas

Timedeltas can be negative, just as numbers can.

In [None]:
dt1 - dt2
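
A negative Timedelta is stored as a negative number of whole days plus a positive remainder, which is why its `days` attribute can look one day larger in magnitude than expected. The sketch below repeats the two subtractions and checks how they relate; `forward` and `backward` are illustrative names.

```python
import pandas as pd

dt1 = pd.to_datetime('2012-12-21 5:30')
dt2 = pd.to_datetime('2016-1-1 12:45:12')

forward = dt2 - dt1    # positive amount of time
backward = dt1 - dt2   # the same amount, negated

# Negating one difference gives the other
is_negation = backward == -forward

# Adding the difference back to the earlier datetime recovers the later one
restores = dt1 + forward == dt2

print(forward.days, backward.days, is_negation, restores)
```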

### Math with Timedeltas

Many arithmetic operations work between two timedeltas, and between a timedelta and a number.

In [None]:
td1 = pd.to_timedelta('05:23:10')
td2 = pd.to_timedelta('00:02:20')
td1 - td2

In [None]:
td2 + 5 * td2

Dividing two timedeltas will remove the units and return a number.

In [None]:
td1 / td2
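
Floor division and the modulo operator also work between two timedeltas. The sketch below reuses the two values from above to show which operations return a plain number and which return a Timedelta.

```python
import pandas as pd

td1 = pd.to_timedelta('05:23:10')
td2 = pd.to_timedelta('00:02:20')

ratio = td1 / td2           # plain float, the units cancel
whole = td1 // td2          # how many whole times td2 fits into td1
remainder = td1 % td2       # leftover amount, still a Timedelta
six_times = td2 + 5 * td2   # multiplying by a number keeps the units

# The whole part and the remainder rebuild the original amount
rebuilds = whole * td2 + remainder == td1

print(ratio, whole, remainder, six_times, rebuilds)
```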

### Creating Timedeltas in a DataFrame by subtracting two Datetime columns

The bikes dataset has two datetime columns, `starttime` and `stoptime`.

In [None]:
bikes = pd.read_csv('../data/bikes.csv', parse_dates=['starttime', 'stoptime'])
bikes.head(2)

Let's find the amount of time that elapsed between the start and stop times.

In [None]:
time_elapsed = bikes['stoptime'] - bikes['starttime']
time_elapsed.head()

Because `starttime` and `stoptime` are both datetime columns, subtracting them produces a timedelta column. Days are the largest unit a timedelta reports; there are no month or year components.
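
A timedelta column can be turned into plain numbers through the `dt` accessor. The sketch below uses two made-up rides (the `rides` DataFrame is a hypothetical stand-in for the bikes data) and converts the elapsed time into whole days and into minutes.

```python
import pandas as pd

# Hypothetical rides standing in for the bikes data; the values are made up
rides = pd.DataFrame({
    'starttime': pd.to_datetime(['2013-06-28 19:01:00', '2013-06-28 22:53:00']),
    'stoptime': pd.to_datetime(['2013-06-28 19:17:00', '2013-06-29 23:03:00']),
})

elapsed = rides['stoptime'] - rides['starttime']

# Subtracting two datetime columns yields a timedelta column
is_timedelta_column = pd.api.types.is_timedelta64_dtype(elapsed)

# Whole days in each amount, and the full amount expressed in minutes
elapsed_days = elapsed.dt.days
elapsed_minutes = elapsed.dt.total_seconds() / 60

print(is_timedelta_column, list(elapsed_days), list(elapsed_minutes))
```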

## Exercises

### Exercise 1
<span  style="color:green; font-size:16px">What day of the week was Jan 15, 1997?</span>

### Exercise 2
<span  style="color:green; font-size:16px">Was 1925 a leap year?</span>

### Exercise 3
<span  style="color:green; font-size:16px">What year will it be 1 million hours after the UNIX epoch?</span>

### Exercise 4
<span  style="color:green; font-size:16px">Create the datetime July 20, 1969 at 2:56 a.m. and 15 seconds.</span>

### Exercise 5
<span  style="color:green; font-size:16px">Neil Armstrong stepped on the moon at the time in the last exercise. How many days have passed since that happened? Use the string 'today' when creating your datetime.</span>

### Exercise 6
<span  style="color:green; font-size:16px">Which is larger - 35 days or 700 hours?</span>

### Exercise 7

<span style="color:green; font-size:16px">The City of Houston employee data was retrieved on June 1, 2019. Can you calculate the exact number of years of experience for each employee and assign it as a new column named `experience`?</span>