# Handling Time Series with Pandas

A **time series** is a sequence of data points **indexed in time order**. Because such data fits naturally into a data frame with a time-based index, `pandas` has very good support for working with it.

In [None]:
%matplotlib inline

In [None]:
import pandas
import seaborn
import matplotlib.pyplot as plt

In [None]:
seaborn.set_style("ticks")
plt.rcParams["axes.grid"] = True
plt.rcParams["figure.figsize"] = (15, 5)

## Example: The Bitcoin Price Over Time

Due to varying interest and hype, the price of Bitcoin (BTC) is extremely volatile. Clearly, understanding trends or even predicting future values of this time series could pay off. Let's see what we can find out.

In [None]:
data_path = "../.assets/data/bitcoin/BTC_USD Bitfinex Historical Data.csv"

In [None]:
data = pandas.read_csv(data_path)

In [None]:
data.head()

In [None]:
data.dtypes

By default, `pandas` has read all fields as strings (`numpy` datatype `object`). There is, however, a special, efficient datatype for representing timestamps: `datetime64`. The first step in every time series analysis should be to identify the time column, convert it to `datetime64`, and make it the **index** of the dataframe. In `pandas`, a time series is simply a `pandas.DataFrame` or `pandas.Series` with a `pandas.DatetimeIndex`:

In [None]:
data = pandas.read_csv(data_path, parse_dates=["Date"])

In [None]:
data.dtypes

In [None]:
data.head()

In [None]:
data = data.set_index("Date")

In [None]:
data.index

In [None]:
data.head()

Let's also get the index sorted:

In [None]:
data.index.is_monotonic_increasing

In [None]:
data = data.sort_index()

In [None]:
data.index.is_monotonic_increasing
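
The preparation steps so far (parse the time column, make it the index, sort it) can be condensed into a small, self-contained sketch. The three-row table below is made up for illustration and does not come from the Bitcoin file; `pandas.to_datetime` is used here as an alternative to the `parse_dates` argument.

```python
import pandas

# Hypothetical, unsorted toy data standing in for the real CSV.
raw = pandas.DataFrame(
    {
        "Date": ["2018-01-03", "2018-01-01", "2018-01-02"],
        "Price": [15.0, 13.0, 14.0],
    }
)

# Convert the string column to timestamps after loading.
raw["Date"] = pandas.to_datetime(raw["Date"])

# Make the timestamps the index and sort them chronologically.
series = raw.set_index("Date").sort_index()

print(type(series.index).__name__)
print(series.index.is_monotonic_increasing)
```

Converting after loading is handy when the time column needs cleaning before it can be parsed.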

The `pandas.DatetimeIndex` can be used to access data points by time. The `pandas.DataFrame.loc` indexer accepts strings representing time in many formats. For example, getting the data points from January 2018 looks like this...

In [None]:
data.loc["2018-01"]

... or like this:

In [None]:
data.loc["January 2018"]
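
String-based access also works for ranges. The following sketch uses a synthetic daily series (the values are made up and only stand in for the Bitcoin data) to show partial string indexing next to a string slice. Note that, unlike ordinary Python slices, a label slice on a sorted `DatetimeIndex` includes both endpoints.

```python
import numpy
import pandas

# Synthetic daily data from December 2017 to March 2018 (values are arbitrary).
index = pandas.date_range("2017-12-01", "2018-03-31", freq="D")
demo = pandas.DataFrame(
    {"Price": numpy.arange(len(index), dtype="float")}, index=index
)

# Partial string indexing: every row that falls in January 2018.
january = demo.loc["2018-01"]

# String slicing: both the start and the end date are included.
mid_january = demo.loc["2018-01-10":"2018-01-20"]

print(len(january), len(mid_january))
```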

Let's also fix the data types. Clearly, the price should be a floating point number:

In [None]:
data["Price"] = data["Price"].str.replace(",", "").astype("float")

In [None]:
data.head()
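
As an alternative to cleaning the strings afterwards, `pandas.read_csv` has a `thousands` parameter that strips the separator while parsing. The sketch below uses a hypothetical two-row CSV held in memory; its date format and values are assumptions for illustration and need not match the real file.

```python
import io

import pandas

# Hypothetical CSV snippet with a thousands separator in the price column.
csv_text = (
    "Date,Price\n"
    '2018-01-02,"14,754.1"\n'
    '2018-01-01,"13,354.0"\n'
)

demo = pandas.read_csv(
    io.StringIO(csv_text),
    parse_dates=["Date"],  # convert the time column to datetime64
    index_col="Date",      # and use it as the index right away
    thousands=",",         # treat "," as a thousands separator in numbers
)

print(demo.dtypes)
```

This keeps the loading logic in one place, but only helps with columns that are purely numeric apart from the separator.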

We can plot the values of a time series using the familiar `pandas` plotting methods:

In [None]:
data["Price"].plot(kind="line")

Often, one would like to look at a smoother version of the time series to identify trends rather than random noise. Smoothing can be done in various ways, for example by **resampling** the series to a new frequency. Here we convert daily data points to weekly data points by taking the median of the values in each week. Notice how this is done by chaining the `resample` and `median` method calls, and how this works much like the `groupby` method we have already seen.

In [None]:
data["Price"].resample("1W").median().plot(kind="line")
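
To make the similarity to `groupby` concrete, here is a minimal sketch on synthetic data (four weeks of made-up daily values, not the Bitcoin prices). It computes the weekly median once with `resample` and once with `groupby` combined with `pandas.Grouper`, and checks that both give the same numbers.

```python
import numpy
import pandas

# Four weeks of synthetic daily values, starting on a Monday.
index = pandas.date_range("2018-01-01", periods=28, freq="D")
prices = pandas.Series(numpy.arange(28, dtype="float"), index=index)

# Weekly median via resampling (weekly bins end on Sunday by default).
weekly = prices.resample("1W").median()

# The same aggregation expressed as a group-by over weekly time bins.
grouped = prices.groupby(pandas.Grouper(freq="1W")).median()

print(weekly)
print((weekly == grouped).all())
```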

Another way to obtain a smoother time series is applying a **[moving average](https://en.wikipedia.org/wiki/Moving_average)** operation:

In [None]:
data["Price"].plot(kind="line")

In [None]:
data["Price"].rolling(50).mean().plot(kind="line")

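What we did here is move a window of 50 values over the time series and build a new time series from the mean of the 50 values in each window. Because a full window is only available from the 50th data point onwards, the first 49 entries of the result are `NaN`: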

In [None]:
data["Price"].rolling(50).mean().head(51)
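
The mechanics are easier to follow on a tiny example. The sketch below applies a window of size 3 to six made-up values; the `min_periods` argument is shown as one possible way to get values for the incomplete windows at the start as well.

```python
import pandas

values = pandas.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Window of 3: the first two positions lack a full window and become NaN.
rolled = values.rolling(3).mean()

# With min_periods=1 the mean is taken over whatever values are available.
rolled_partial = values.rolling(3, min_periods=1).mean()

print(rolled.tolist())
print(rolled_partial.tolist())
```

Whether partial windows are acceptable depends on the analysis, since the early values are averaged over fewer points and are therefore less smooth.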

---
_This notebook is licensed under a [Creative Commons Attribution 4.0 International License (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/). Copyright © 2019 [Point 8 GmbH](https://point-8.de)_