(time-series)=
# Time Series

In this chapter, we'll look at time series. If you haven't yet looked at the two sections on **pandas**, the [Data Quickstart](data-quickstart) and [Working with Data](working-with-data) chapters, it might be worth taking a quick spin through them first. You may also find it useful to be familiar with a few of the concepts from the previous [chapter on time](time-intro).

This chapter has benefitted from the [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/) by Jake VanderPlas, and Tom Augspurger's [Effective Pandas](https://github.com/TomAugspurger/effective-pandas).

Let's first do our imports.

In [None]:
import numpy as np
import pandas as pd
from rich import inspect

## Time Series with **pandas**

[**pandas**](https://pandas.pydata.org/) is the workhorse of time series analysis in Python. The basic object is a *timestamp*. The `pd.to_datetime` function creates timestamps from strings that could reasonably represent datetimes. Let's see an example of using `pd.to_datetime` to create a timestamp and then inspect all of the methods and attributes of the created timestamp using **rich**'s `inspect` function.

In [None]:
date = pd.to_datetime("16th of February, 2020")
inspect(date)

This is of type `Timestamp` and you can see that it has many of the same properties as the built-in Python `datetime.datetime` class from the previous chapter. As with that, the default setting for `tz` (timezone) and `tzinfo` is `None`. There are some extra properties, though. For example, older versions of **pandas** exposed `freq` (frequency) on a timestamp; in recent versions it lives on the datetime index instead, where it is very useful for manipulating time *series* as opposed to just one or two datetimes.
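As a minimal sketch of the attributes mentioned above, the snippet below builds a timestamp directly with `pd.Timestamp` and prints a few of its properties. The ISO-format date string is used only to keep the parse unambiguous; it is an illustrative choice.

```python
import pandas as pd

# A timestamp built from an unambiguous ISO-format string
ts = pd.Timestamp("2020-02-16")

# Attributes shared with the built-in datetime.datetime class
print(ts.year, ts.month, ts.day)
print(ts.tzinfo)  # None unless a timezone has been attached

# A couple of pandas-specific extras
print(ts.day_name())
print(ts.quarter)
```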

### Creating Time Series

There are two main scenarios in which you might be creating time series using **pandas**: i) creating one from scratch or ii) reading in data from a file. Let's look at a few ways to do i) first. 

You can create a time series with **pandas** by taking a date as created above and extending it with an array of offsets built by the `pd.to_timedelta` function:

In [None]:
date + pd.to_timedelta(np.arange(12), "D")

This has created a datetime index of type `datetime64[ns]` (remember, an index is a special type of **pandas** column), where "ns" stands for nanosecond resolution.
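A datetime index becomes a time series once it labels some values. The sketch below uses a made-up start date and made-up values, purely for illustration, to build a short index with `pd.to_timedelta` and attach it to a `pd.Series`.

```python
import numpy as np
import pandas as pd

# Hypothetical start date, chosen only for illustration
start = pd.Timestamp("2020-02-16")

# Adding an array of day offsets to a timestamp gives a DatetimeIndex
index = start + pd.to_timedelta(np.arange(3), unit="D")

# Attach the datetime index to some made-up values to get a time series
series = pd.Series([1.0, 2.5, 4.0], index=index)

print(index.dtype)
print(series)
```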

Another method is to create a range of dates with `pd.date_range`. The default frequency is daily; pass the `freq=` keyword argument to choose a different one:

In [None]:
pd.date_range(start="2018/1/1", end="2018/1/8")
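The call above relies on the default daily frequency. As a rough sketch of what the `freq=` argument does, the example below asks for month-start dates (`"MS"`) and for weekly dates anchored on Mondays (`"W-MON"`). The date ranges themselves are arbitrary, and `periods=` is used in the second call as an alternative to giving an end date.

```python
import pandas as pd

# Month-start frequency: one timestamp at the start of each month
monthly = pd.date_range(start="2018-01-01", end="2018-06-01", freq="MS")

# A fixed number of periods instead of an end date:
# four weekly dates, each falling on a Monday
weekly = pd.date_range(start="2018-01-01", periods=4, freq="W-MON")

print(monthly)
print(weekly)
```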

Now let's see how to turn data into a vector of datetimes. This happens *all the time* in practice. We'll read in some data on job vacancies for information and communication jobs, ONS code UNEM-JP9P, and then try to wrangle the given "date" column into a **pandas** datetime column.

In [None]:
import requests

url = "https://api.ons.gov.uk/timeseries/JP9P/dataset/UNEM/data"

# Get the data from the ONS API:
df = pd.json_normalize(requests.get(url).json()["months"])
df = df[["date", "value"]]
df.head()

We have the data in. Let's look at the column types that arrived.

In [None]:
df.info()

The date column has arrived with the default 'object' type, but we want it to have type `datetime64[ns]`, which is a datetime type. Again, we use `pd.to_datetime`:

In [None]:
pd.to_datetime(df["date"])

To save this in our original column, we would use assignment, either via `df['date'] = ...` or via `.assign(date=lambda x: pd.to_datetime(x['date']))`. In this case, the conversion from the input format, for example "2001 MAY", to datetime worked out-of-the-box. But that's not always the case.
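The two ways of saving the converted column can be sketched on a small stand-in dataframe. The data below are made up to mimic the "2001 MAY" style, and passing `format="%Y %b"` is an assumption added here to make the parse explicit; the conversion in the text relied on **pandas** inferring the format.

```python
import pandas as pd

# Hypothetical stand-in for the ONS data, using the same "2001 MAY" style
df = pd.DataFrame(
    {"date": ["2001 MAY", "2001 JUN", "2001 JUL"], "value": ["10", "12", "11"]}
)

# Option 1: direct assignment, which overwrites the column in place
df_direct = df.copy()
df_direct["date"] = pd.to_datetime(df_direct["date"], format="%Y %b")

# Option 2: .assign, which returns a new dataframe and leaves df untouched
df_assigned = df.assign(date=lambda x: pd.to_datetime(x["date"], format="%Y %b"))

print(df_assigned.dtypes)
```

The `.assign` form is convenient in method chains because it does not modify the original dataframe.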

What happens if we have a more tricky-to-read-in datetime column? Let's find out with an example ...