# Working with Time Series

Pandas was developed in the context of financial modeling, so as you might expect, it contains a fairly extensive set of tools for working with dates, times, and time-indexed data. Date and time data comes in a few flavors, which we will discuss here:

- *Time stamps* reference particular moments in time (e.g., July 4th, 2015 at 7:00 am).
- *Time intervals* and *periods* reference a length of time between a particular beginning and end point; for example, the year 2015. Periods usually reference a special case of time intervals in which each interval is of uniform length and does not overlap (e.g., 24-hour-long periods comprising days).
- *Time deltas* or *durations* reference an exact length of time (e.g., a duration of 22.56 seconds).
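As a rough preview, these three flavors correspond to three Pandas scalar types. The sketch below assumes Pandas is installed; the variable names and the specific values are only illustrative.

```python
import pandas as pd

# A time stamp: one particular moment in time
stamp = pd.Timestamp("2015-07-04 07:00")

# A period: the whole span of the year 2015 (annual frequency is inferred)
period = pd.Period("2015")

# A time delta: an exact length of time
delta = pd.Timedelta(seconds=22.56)

# The time stamp falls inside the period
print(period.start_time <= stamp <= period.end_time)

# Adding a time delta to a time stamp gives another time stamp
print(stamp + delta)
```

Each of these types is discussed in more detail later in the section.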

In this section, we will introduce how to work with each of these types of date/time data in Pandas. This short section is by no means a complete guide to the time series tools available in Python or Pandas, but instead is intended as a broad overview of how you as a user should approach working with time series. We will start with a brief discussion of tools for dealing with dates and times in Python, before moving more specifically to a discussion of the tools provided by Pandas. After listing some resources that go into more depth, we will review some short examples of working with time series data in Pandas.

## Dates and Times in Python

The Python world has a number of available representations of dates, times, deltas, and timespans. While the time series tools provided by Pandas tend to be the most useful for data science applications, it is helpful to see their relationship to other packages used in Python.

### Native Python dates and times: `datetime` and `dateutil`

Python's basic objects for working with dates and times reside in the built-in `datetime` module. Along with the third-party `dateutil` module, you can use it to quickly perform a host of useful operations on dates and times. For example, you can manually build a date using the `datetime` type:

In [2]:
from datetime import datetime 
datetime(year=2015, month=7, day=4)

datetime.datetime(2015, 7, 4, 0, 0)

Or, using the `dateutil` module, you can parse dates from a variety of string formats:

In [3]:
from dateutil import parser
date = parser.parse("4th of July, 2015")
date

datetime.datetime(2015, 7, 4, 0, 0)

Once you have a `datetime` object, you can do things like printing the day of the week:

In [4]:
date.strftime('%A')

'Saturday'

In the final line, we've used one of the standard string format codes for printing dates (`"%A"`). A related package to be aware of is `pytz`, which contains tools for working with the most migraine-inducing piece of time series data: time zones.
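To give a feel for how format codes combine, here is a small sketch using only the standard library. The particular combination of codes is just an illustration, and the weekday and month names depend on the active locale.

```python
from datetime import datetime

date = datetime(year=2015, month=7, day=4)

# %A: full weekday name, %d: zero-padded day of the month,
# %B: full month name, %Y: four-digit year
print(date.strftime("%A, %d %B %Y"))  # Saturday, 04 July 2015 (English locale)

# %j: day of the year as a zero-padded three-digit number
day_of_year = date.strftime("%j")
print(day_of_year)  # 185
```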


The power of `datetime` and `dateutil` lies in their flexibility and easy syntax: you can use these objects and their built-in methods to easily perform nearly any operation you might be interested in. Where they break down is when you wish to work with large arrays of dates and times: just as lists of Python numerical variables are suboptimal compared to NumPy-style typed numerical arrays, lists of Python datetime objects are suboptimal compared to typed arrays of encoded dates.

### Typed arrays of times: NumPy's `datetime64`

The weaknesses of Python's datetime format inspired the NumPy team to add a set of native time series data types to NumPy. The `datetime64` dtype encodes dates as 64-bit integers, and thus allows arrays of dates to be represented very compactly. The `datetime64` dtype requires a very specific input format:

In [5]:
import numpy as np
date = np.array('2015-07-04', dtype=np.datetime64)
date

array('2015-07-04', dtype='datetime64[D]')

Once we have this date formatted, however, we can quickly do vectorized operations on it:

In [6]:
date + np.arange(12)

array(['2015-07-04', '2015-07-05', '2015-07-06', '2015-07-07',
       '2015-07-08', '2015-07-09', '2015-07-10', '2015-07-11',
       '2015-07-12', '2015-07-13', '2015-07-14', '2015-07-15'],
      dtype='datetime64[D]')

Because of the uniform type in NumPy `datetime64` arrays, this type of operation can be accomplished much more quickly than if we were working directly with Python's `datetime` objects, especially as arrays get large (we introduced this type of vectorization in Computation on NumPy arrays: Universal Functions).
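For comparison, the following sketch builds the same twelve days both ways: with a Python list of `datetime` objects and with a `datetime64` array. It makes no timing claims; it only illustrates that the list version needs one explicit addition per element while the NumPy version is a single vectorized expression. The variable names are illustrative.

```python
from datetime import datetime, timedelta

import numpy as np

start = datetime(2015, 7, 4)

# Pure Python: one object and one explicit addition per element
py_dates = [start + timedelta(days=i) for i in range(12)]

# NumPy: a single vectorized addition on a typed array
np_dates = np.datetime64("2015-07-04") + np.arange(12)

# Both approaches describe the same twelve consecutive days;
# tolist() converts day-resolution values to datetime.date objects
same = [d.date() for d in py_dates] == np_dates.tolist()
print(same)
```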

One detail of the `datetime64` and `timedelta64` objects is that they are built on a *fundamental time unit*. Because the `datetime64` object is limited to 64-bit precision, the range of encodable times is $2^{64}$ times this fundamental unit. In other words, `datetime64` imposes a trade-off between *time resolution* and *maximum time span*.
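This trade-off can be checked with a little arithmetic. The sketch below is plain Python, not part of NumPy: the helper `span_in_years` is a hypothetical name, and it assumes an average Gregorian year of 365.2425 days.

```python
# Seconds in an average Gregorian year (assumption: 365.2425 days)
SECONDS_PER_YEAR = 365.2425 * 24 * 60 * 60


def span_in_years(unit_seconds):
    """Total span covered by 2**64 ticks of a unit lasting `unit_seconds`."""
    return 2**64 * unit_seconds / SECONDS_PER_YEAR


# Length of each fundamental unit, in seconds
units = {"s": 1.0, "ms": 1e-3, "us": 1e-6, "ns": 1e-9}

for code, seconds in units.items():
    print(f"{code}: about {span_in_years(seconds):.3g} years in total")
```

Each step toward a finer unit shrinks the total span by the same factor by which the resolution improves.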

For example, if you want a time resolution of one nanosecond, you only have enough information to encode a range of $2^{64}$ nanoseconds, or just under 600 years. NumPy will infer the desired unit from the input: for example, here is a day-based datetime:

In [7]:
np.datetime64('2015-07-04')

numpy.datetime64('2015-07-04')

Here is a minute-based datetime:

In [8]:
np.datetime64('2015-07-04 12:00')

numpy.datetime64('2015-07-04T12:00')

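Notice that `datetime64` values are timezone-naive: NumPy stores no time zone information with them (releases before NumPy 1.11 instead interpreted such strings in the local time zone of the computer executing the code). You can force any desired fundamental unit using one of many format codes; for example, here we'll force a nanosecond-based time: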

In [9]:
np.datetime64('2015-07-04 12:59:59.50', 'ns')

numpy.datetime64('2015-07-04T12:59:59.500000000')
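As a small sketch of how to check which unit NumPy picked, you can inspect the `dtype` of a value, and convert between units with `astype`. The variable names here are illustrative.

```python
import numpy as np

day = np.datetime64("2015-07-04")           # unit inferred as days
minute = np.datetime64("2015-07-04 12:00")  # unit inferred as minutes

print(day.dtype, minute.dtype)  # datetime64[D] datetime64[m]

# Converting to a finer unit keeps the same moment in time
day_ns = day.astype("datetime64[ns]")
print(day_ns.dtype)   # datetime64[ns]
print(day_ns == day)  # True
```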

The following table lists the available format codes along with the relative and absolute timespans that they can encode:

| Code | Meaning     | Time span (relative) | Time span (absolute)   |
|------|-------------|----------------------|------------------------|
| `Y`  | Year        | ± 9.2e18 years       | [9.2e18 BC, 9.2e18 AD] |
| `M`  | Month       | ± 7.6e17 years       | [7.6e17 BC, 7.6e17 AD] |
| `W`  | Week        | ± 1.7e17 years       | [1.7e17 BC, 1.7e17 AD] |
| `D`  | Day         | ± 2.5e16 years       | [2.5e16 BC, 2.5e16 AD] |
| `h`  | Hour        | ± 1.0e15 years       | [1.0e15 BC, 1.0e15 AD] |
| `m`  | Minute      | ± 1.7e13 years       | [1.7e13 BC, 1.7e13 AD] |
| `s`  | Second      | ± 2.9e11 years       | [2.9e11 BC, 2.9e11 AD] |
| `ms` | Millisecond | ± 2.9e8 years        | [2.9e8 BC, 2.9e8 AD]   |
| `us` | Microsecond | ± 2.9e5 years        | [290301 BC, 294241 AD] |
| `ns` | Nanosecond  | ± 292 years          | [1678 AD, 2262 AD]     |
| `ps` | Picosecond  | ± 106 days           | [1969 AD, 1970 AD]     |
| `fs` | Femtosecond | ± 2.6 hours          | [1969 AD, 1970 AD]     |
| `as` | Attosecond  | ± 9.2 seconds        | [1969 AD, 1970 AD]     |

For the types of data we see in the real world, a useful default is `datetime64[ns]`, as it can encode a useful range of modern dates with a suitably fine precision.
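To see exactly where the nanosecond range begins and ends, one option is to build `datetime64[ns]` values directly from the extreme 64-bit integers. This is only a sketch; it relies on the fact that NumPy reserves the smallest 64-bit integer for `NaT` ("not a time").

```python
import numpy as np

int64 = np.iinfo(np.int64)

# Largest representable nanosecond timestamp
latest = np.datetime64(int64.max, "ns")

# The smallest int64 value is reserved for NaT, so the earliest
# valid timestamp is one nanosecond after it
earliest = np.datetime64(int64.min + 1, "ns")

print(earliest)  # 1677-09-21T00:12:43.145224193
print(latest)    # 2262-04-11T23:47:16.854775807
```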

Finally, we will note that while the `datetime64` data type addresses some of the deficiencies of the built-in Python `datetime` type, it lacks many of the convenient methods and functions provided by `datetime` and especially `dateutil`.

### Dates and times in Pandas: best of both worlds

Pandas builds upon all the tools just discussed to provide a `Timestamp` object, which combines the ease of use of `datetime` and `dateutil` with the efficient storage and vectorized interface of `numpy.datetime64`. From a group of these `Timestamp` objects, Pandas can construct a `DatetimeIndex` that can be used to index data in a `Series` or `DataFrame`; we'll see many examples of this below.

For example, we can use Pandas tools to repeat the demonstration from above. We can parse a flexibly formatted string date, and use format codes to output the day of the week:

In [10]:
import pandas as pd 
date = pd.to_datetime("4th of July, 2015")
date

Timestamp('2015-07-04 00:00:00')

In [11]:
date.strftime('%A')

'Saturday'

Additionally, we can do NumPy-style vectorized operations directly on this same object:

In [12]:
date + pd.to_timedelta(np.arange(12), 'D')

DatetimeIndex(['2015-07-04', '2015-07-05', '2015-07-06', '2015-07-07',
               '2015-07-08', '2015-07-09', '2015-07-10', '2015-07-11',
               '2015-07-12', '2015-07-13', '2015-07-14', '2015-07-15'],
              dtype='datetime64[ns]', freq=None)
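The resulting `DatetimeIndex` keeps the convenience methods as well. As a brief sketch (the variable names are illustrative, and an ISO-formatted string is used for the starting date), `day_name()` gives the weekday of every entry at once, and `dayofweek` can drive a Boolean mask:

```python
import numpy as np
import pandas as pd

date = pd.to_datetime("2015-07-04")
dates = date + pd.to_timedelta(np.arange(12), "D")

# Vectorized counterpart of strftime('%A'), applied to every entry
print(dates.day_name()[:3].tolist())  # ['Saturday', 'Sunday', 'Monday']

# dayofweek counts from Monday=0, so 5 and 6 are Saturday and Sunday
weekend = dates.dayofweek >= 5
print(dates[weekend])
```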

In the next section, we will take a closer look at manipulating time series data with the tools provided by Pandas.

## Pandas Time Series: Indexing by Time

Where the Pandas time series tools really become useful is when you begin to *index data by timestamps*. For example, we can construct a `Series` object that has time-indexed data:

In [13]:
index = pd.DatetimeIndex(['2014-07-04', '2014-08-04',
                          '2015-07-04', '2015-08-04'])
data = pd.Series([0, 1, 2, 3], index=index)
data

2014-07-04    0
2014-08-04    1
2015-07-04    2
2015-08-04    3
dtype: int64

Now that we have this data in a `Series`, we can make use of any of the `Series` indexing patterns we discussed in previous sections, passing values that can be coerced into dates:

In [14]:
data['2014-07-04': '2015-07-04']

2014-07-04    0
2014-08-04    1
2015-07-04    2
dtype: int64

There are additional special date-only indexing operations, such as passing a year to obtain a slice of all data from that year:

In [15]:
data['2015']

2015-07-04    2
2015-08-04    3
dtype: int64
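The same idea extends to finer partial dates. The sketch below rebuilds the `Series` from above and uses `.loc` with a year-and-month string, both as a single key and as slice bounds; note that both slice endpoints are inclusive.

```python
import pandas as pd

index = pd.DatetimeIndex(["2014-07-04", "2014-08-04",
                          "2015-07-04", "2015-08-04"])
data = pd.Series([0, 1, 2, 3], index=index)

# A year-and-month string selects every entry in that month
july_2015 = data.loc["2015-07"]
print(july_2015)

# Partial strings also work as slice bounds; the end bound
# covers the whole of July 2015
aug_to_july = data.loc["2014-08":"2015-07"]
print(aug_to_july)
```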

Later, we will see additional examples of the convenience of dates-as-indices. But first, a closer look at the available time series data structures.
