# Datetime Series Methods

In this chapter we focus on methods that work for Series containing datetime data. Just as pandas has the `str` accessor to give us access to string-only methods, it also has the `dt` accessor to give us access to datetime-only methods. Let's read in the bikes dataset, which has two datetime columns, `starttime` and `stoptime`.

In [1]:
import pandas as pd
bikes = pd.read_csv('../data/bikes.csv', parse_dates=['starttime', 'stoptime'])
bikes.head(3)

Unnamed: 0,trip_id,usertype,gender,starttime,stoptime,tripduration,from_station_name,latitude_start,longitude_start,dpcapacity_start,to_station_name,latitude_end,longitude_end,dpcapacity_end,temperature,visibility,wind_speed,precipitation,events
0,7147,Subscriber,Male,2013-06-28 19:01:00,2013-06-28 19:17:00,993,Lake Shore Dr & Monroe St,41.88105,-87.61697,11.0,Michigan Ave & Oak St,41.90096,-87.623777,15.0,73.9,10.0,12.7,-9999.0,mostlycloudy
1,7524,Subscriber,Male,2013-06-28 22:53:00,2013-06-28 23:03:00,623,Clinton St & Washington Blvd,41.88338,-87.64117,31.0,Wells St & Walton St,41.89993,-87.63443,19.0,69.1,10.0,6.9,-9999.0,partlycloudy
2,10927,Subscriber,Male,2013-06-30 14:43:00,2013-06-30 15:01:00,1040,Sheffield Ave & Kingsbury St,41.909592,-87.653497,15.0,Dearborn St & Monroe St,41.88132,-87.629521,23.0,73.0,10.0,16.1,-9999.0,mostlycloudy
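To see what `parse_dates` changes, here is a minimal sketch that reads the same text with and without it. The two-row CSV and the names `csv_text`, `raw`, and `parsed` are made up for illustration and only stand in for the real bikes file.

```python
import io
import pandas as pd

# Hypothetical two-row CSV standing in for bikes.csv
csv_text = """trip_id,starttime,stoptime
7147,2013-06-28 19:01:00,2013-06-28 19:17:00
7524,2013-06-28 22:53:00,2013-06-28 23:03:00
"""

# Without parse_dates the time columns are read as plain strings
raw = pd.read_csv(io.StringIO(csv_text))

# With parse_dates they are converted to datetimes
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=['starttime', 'stoptime'])

print(raw.dtypes)
print(parsed.dtypes)
```

Only the parsed version can be used with the `dt` accessor.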


## The `dt` accessor
The primary focus of this chapter will be the methods that follow the `dt` accessor. [Visit the API][1] to view all the possible datetime attributes and methods that are available.

[1]: http://pandas.pydata.org/pandas-docs/stable/reference/series.html#api-series-dt

### Only available for Series
The `dt` and `str` accessors are only available to Series objects and not DataFrames. You will have to select a single Series first in order to use them. Let's begin by selecting the `starttime` column as a Series.

In [3]:
start = bikes['starttime']
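As a quick check of the Series-only rule, the sketch below builds a small made-up DataFrame (the column name `when` is an assumption, not part of the bikes data) and shows that the accessor exists on the column but not on the DataFrame.

```python
import pandas as pd

# Hypothetical DataFrame with a single datetime column
df = pd.DataFrame({
    'when': pd.to_datetime(['2013-06-28 19:01:00', '2013-06-30 14:43:00'])
})

# The accessor is available on the selected Series...
print(df['when'].dt.year)

# ...but the DataFrame itself has no `dt` attribute
print(hasattr(df, 'dt'))
```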

### Datetime attributes and methods are simpler than strings
Almost all the attributes and methods available to datetime Series are simple and straightforward. Let's begin by outputting the head of the Series so that we can visually verify the results of the attributes and methods.

In [4]:
start.head(3)

0   2013-06-28 19:01:00
1   2013-06-28 22:53:00
2   2013-06-30 14:43:00
Name: starttime, dtype: datetime64[ns]

Many attributes return a particular part of the datetime, such as `year`, `month`, `day`, `hour`, `minute`, and `second`. Notice that the data type for all of these new Series is integer.

In [5]:
start.dt.year.head(3)

0    2013
1    2013
2    2013
Name: starttime, dtype: int64

In [6]:
start.dt.month.head(3)

0    6
1    6
2    6
Name: starttime, dtype: int64

In [7]:
start.dt.minute.head(3)

0     1
1    53
2    43
Name: starttime, dtype: int64

In [8]:
# Monday is 0 and Sunday is 6
start.dt.dayofweek.head(3)

0    4
1    4
2    6
Name: starttime, dtype: int64
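The sketch below collects several of these attributes side by side for two of the timestamps shown above. The Series `s` and the DataFrame `parts` are illustrative names. Note that `day_name` is a method rather than an attribute, so it needs parentheses.

```python
import pandas as pd

s = pd.Series(pd.to_datetime(['2013-06-28 19:01:00', '2013-06-30 14:43:00']))

# Each attribute returns a new Series aligned with the original index
parts = pd.DataFrame({
    'year': s.dt.year,
    'month': s.dt.month,
    'day': s.dt.day,
    'hour': s.dt.hour,
    'dayofweek': s.dt.dayofweek,  # Monday is 0, Sunday is 6
    'day_name': s.dt.day_name(),  # a method, so it is called
})
print(parts)
```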

## Datetime methods
There are only a few methods available through the `dt` accessor, the most useful being `ceil`, `round`, `floor`, `strftime`, and `to_period`. To use these methods you need to be familiar with the [offset aliases](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases), which are short strings, usually one character, that represent a unit of time. Below are a few of the offset aliases. Recent pandas versions (2.2 and later) deprecate the uppercase `H`, `T`, and `S` in favor of the lowercase `h`, `min`, and `s`.

* `D` - day
* `H` or `h` - hour
* `T` or `min` - minute
* `S` or `s` - second

### Use offset aliases with datetime methods
Let's output our datetime Series again and then call some of these methods that require offset aliases.

In [9]:
start.head(3)

0   2013-06-28 19:01:00
1   2013-06-28 22:53:00
2   2013-06-30 14:43:00
Name: starttime, dtype: datetime64[ns]

### `ceil` rounds up to the nearest unit

Round up to the nearest hour by using the offset alias 'H'.

In [10]:
start.dt.ceil('H').head(3)

0   2013-06-28 20:00:00
1   2013-06-28 23:00:00
2   2013-06-30 15:00:00
Name: starttime, dtype: datetime64[ns]

Round up to nearest day.

In [11]:
start.dt.ceil('D').head(3)

0   2013-06-29
1   2013-06-29
2   2013-07-01
Name: starttime, dtype: datetime64[ns]

### `floor` rounds down to the nearest unit

Round down to nearest minute.

In [12]:
start.dt.floor('min').head(3)

0   2013-06-28 19:01:00
1   2013-06-28 22:53:00
2   2013-06-30 14:43:00
Name: starttime, dtype: datetime64[ns]

### `round` rounds to nearest whole unit
The `round` method uses typical rounding logic. Here, we round to the nearest hour.

In [13]:
start.dt.round('H').head(3)

0   2013-06-28 19:00:00
1   2013-06-28 23:00:00
2   2013-06-30 15:00:00
Name: starttime, dtype: datetime64[ns]
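The three rounding methods are easiest to compare side by side. The sketch below uses a 15-minute bucket, which is an assumption chosen for illustration (a number in front of an alias gives a multiple of that unit), on two of the timestamps from above.

```python
import pandas as pd

s = pd.Series(pd.to_datetime(['2013-06-28 19:01:00', '2013-06-28 22:53:00']))

# Compare the three methods using the same 15-minute bucket
rounded = pd.DataFrame({
    'original': s,
    'floor_15min': s.dt.floor('15min'),  # always down
    'ceil_15min': s.dt.ceil('15min'),    # always up
    'round_15min': s.dt.round('15min'),  # whichever is nearer
})
print(rounded)
```

Here 19:01 floors and rounds to 19:00 but ceils to 19:15, while 22:53 floors to 22:45 but ceils and rounds to 23:00.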

## Format time as a string with `strftime`
The name `strftime` stands for **string format time**. The method converts each datetime value into a string object. You use **string directives** to convert a part of a datetime to a string. For instance, '%A' converts to the weekday name. Consult [Python's documentation](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior) to view all of the string directives. Below is an example using multiple string directives to form a complex string from a datetime. You can intersperse any other text with the directives.

By default, pandas truncates displayed column values at 50 characters. The `set_option` function is used to increase this width so that the entire value is viewable in the output.

In [14]:
pd.set_option('display.max_colwidth', 100)
start.dt.strftime('On %A, %B %d, %Y at %X something great happened').head(3)

0    On Friday, June 28, 2013 at 19:01:00 something great happened
1    On Friday, June 28, 2013 at 22:53:00 something great happened
2    On Sunday, June 30, 2013 at 14:43:00 something great happened
Name: starttime, dtype: object
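As a smaller illustration, the sketch below uses only numeric directives (`%Y`, `%m`, `%d`, `%H`, `%M`), which do not depend on locale settings. The Series `s` and the format string are made up for this example.

```python
import pandas as pd

s = pd.Series(pd.to_datetime(['2013-06-28 19:01:00', '2013-06-30 14:43:00']))

# Literal text such as '/' and ' at ' is copied into the result unchanged
labels = s.dt.strftime('%Y/%m/%d at %H:%M')
print(labels.tolist())

# The values are now ordinary Python strings, not datetimes
print(type(labels[0]))
```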

## Convert to period
A period is a special data type unique to pandas (it does not exist in NumPy) that represents an entire span of time, such as the whole month of June 2012, the whole year 1998, or the whole minute of June 11, 2011 12:34 p.m. This contrasts with a datetime, which represents a single moment in time with nanosecond precision.

### Use offset aliases to convert to a period
To convert to a period, use the same [offset aliases](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases) from above. Let's convert the start datetime column to a period column representing an entire month.

In [18]:
per = start.dt.to_period('M').head()
per

0    2013-06
1    2013-06
2    2013-06
3    2013-07
4    2013-07
Name: starttime, dtype: period[M]

Let's verify that the data type of this Series is indeed a period.

In [19]:
per.dtype

period[M]

Let's see another example with a different offset alias converting the datetime to a time period of an hour.

In [23]:
start.dt.to_period('h').head(3)

0    2013-06-28 19:00
1    2013-06-28 22:00
2    2013-06-30 14:00
Name: starttime, dtype: period[H]

### Period Series also have a `dt` accessor
A Series with a period data type has its own special attributes and methods accessible with the `dt` accessor. They overlap substantially with the datetime `dt` attributes and methods. Currently the [official documentation only shows the period properties](https://pandas.pydata.org/pandas-docs/stable/reference/series.html#period-properties). You can discover all of the attributes and methods, and how to use them, by placing a dot after the `dt` and pressing tab. Below, we get the start and end of the period. Note that pandas returns these values as datetimes and not periods.

In [24]:
per.dt.start_time

0   2013-06-01
1   2013-06-01
2   2013-06-01
3   2013-07-01
4   2013-07-01
Name: starttime, dtype: datetime64[ns]

In [25]:
per.dt.end_time

0   2013-06-30 23:59:59.999999999
1   2013-06-30 23:59:59.999999999
2   2013-06-30 23:59:59.999999999
3   2013-07-31 23:59:59.999999999
4   2013-07-31 23:59:59.999999999
Name: starttime, dtype: datetime64[ns]
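The round trip from datetime to period and back to the period's boundaries can be sketched on a small made-up Series (the second timestamp is an assumption added so that two different months appear).

```python
import pandas as pd

s = pd.Series(pd.to_datetime(['2013-06-28 19:01:00', '2013-07-03 08:15:00']))

# Each datetime is replaced by the month that contains it
months = s.dt.to_period('M')
print(months)

# The boundaries of each period come back as datetimes
print(months.dt.start_time)
print(months.dt.end_time)
```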

## Timedeltas
Timedeltas are a separate data type that represents an amount of time, such as 5 minutes and 34 seconds. The largest unit of a timedelta is days, and timedeltas always have nanosecond precision. Timedeltas are also available in NumPy. Timedelta Series have special attributes and methods accessible with the `dt` accessor, as you can [find in the documentation](https://pandas.pydata.org/pandas-docs/stable/reference/series.html#timedelta-properties).

### Creating a Timedelta
One way to create a timedelta Series is to subtract one datetime Series from another. Here, we select `stoptime` as a Series and subtract the `start` Series from it.

In [26]:
stop = bikes['stoptime']
ride_length = stop - start
ride_length.head(3)

0   00:16:00
1   00:10:00
2   00:18:00
dtype: timedelta64[ns]

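Again, a good way to discover the attributes and methods is to place a dot after `dt` and press tab. Let's begin by getting the number of seconds in each timedelta with the `seconds` attribute. Note that `seconds` returns only the seconds component (what remains after whole days are removed), so for durations of a day or longer use the `total_seconds` method to get the full length.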

In [27]:
ride_length.dt.seconds.head(3)

0     960
1     600
2    1080
dtype: int64

There are a few timedelta methods that take offset aliases. Numbers may be placed next to offset aliases to designate a more specific amount of time. Below, we round to the nearest 10 minutes.

In [None]:
ride_length.dt.round('10min').head(3)
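The sketch below uses two made-up durations, one of them longer than a day, to show how the `seconds` attribute differs from the `total_seconds` method and what rounding to 10 minutes produces. The Series name `durations` is an assumption for illustration.

```python
import pandas as pd

# Hypothetical durations, including one longer than a day
durations = pd.Series(pd.to_timedelta(['0 days 00:16:00', '1 days 00:07:00']))

# Seconds component only: the whole day in the second value is ignored
print(durations.dt.seconds.tolist())

# Full length of each duration in seconds, returned as floats
print(durations.dt.total_seconds().tolist())

# Round each duration to the nearest multiple of 10 minutes
print(durations.dt.round('10min'))
```

For the second value, `seconds` gives 420 while `total_seconds` gives 86820.0, so the two agree only for durations shorter than one day.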

## Exercises

### Exercise 1
<span  style="color:green; font-size:16px">What percentage of bike rides happen in January?</span>

In [34]:
filt = start.dt.month == 1
start[filt].count()/len(start)

0.027191598953862126

### Exercise 2
<span  style="color:green; font-size:16px">What percentage of bike rides happen on the weekend?</span>

In [47]:
# weekday_name was removed in pandas 1.0; use the day_name method instead
filt = start.dt.day_name().isin(['Saturday', 'Sunday'])
start[filt].count()/len(start)

0.19692946555131866

### Exercise 3
<span  style="color:green; font-size:16px">What percentage of bike rides happen on the last day of the month?</span>

In [48]:
filt = start.dt.is_month_end
start[filt].count()/len(start)

0.031563816406795904
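The solutions above return a fraction between 0 and 1. As an alternative sketch, the mean of a boolean Series gives the same fraction directly, and multiplying by 100 turns it into a percentage. The four dates below are made up for illustration and are not from the bikes data.

```python
import pandas as pd

s = pd.Series(pd.to_datetime(['2013-01-15', '2013-06-28', '2013-06-30', '2014-01-31']))

# The mean of a boolean Series is the fraction of True values
pct_january = (s.dt.month == 1).mean() * 100
pct_weekend = (s.dt.dayofweek >= 5).mean() * 100  # Saturday is 5, Sunday is 6
pct_month_end = s.dt.is_month_end.mean() * 100

print(pct_january, pct_weekend, pct_month_end)
```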