<a href="https://colab.research.google.com/github/dss5202-2410/Notebooks/blob/main/Time_series.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Working with time series

`Pandas` contains a fairly extensive set of tools for working with dates, times, and time-indexed data. Date and time data come in a few flavors, including the following:

+ Time stamps reference particular moments in time (e.g., March 26, 2024 at 10pm).

+ Time intervals and periods reference a length of time between a particular beginning and end point; for example, the year 2023. Periods usually reference a special case of time intervals in which each interval is of uniform length and does not overlap (e.g., 24 hour-long periods comprising days).

+ Time deltas or durations reference an exact length of time (e.g., a duration of 22.56 seconds).

We will introduce how to work with each of these types of date/time data in Pandas.
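As a quick preview, the three flavors above map onto three Pandas scalar types. The following is a minimal sketch, assuming a reasonably recent Pandas release; the variable names are illustrative only.

```python
import pandas as pd

# A time stamp: one particular moment in time
ts = pd.Timestamp("2024-03-26 22:00")

# A period: the whole span of the year 2023 (annual frequency)
period = pd.Period("2023", freq="Y")

# A time delta: an exact duration
delta = pd.Timedelta(seconds=22.56)

print(ts)
print(period.start_time, period.end_time)  # first and last instants of 2023
print(ts + delta)                          # a time stamp shifted by a duration
```

A period knows where it begins and ends, whereas a time delta carries only a length, so it can be added to any time stamp.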

## Date and time in Python

The Python world has a number of available representations of dates, times, deltas, and timespans. While the time series tools provided by Pandas tend to be the most useful for data science applications, it is helpful to see their relationship to other packages used in Python.

Python's basic objects for working with dates and times reside in the built-in `datetime` module. Together with the third-party `dateutil` module, it lets you quickly perform many useful operations on dates and times.

For example, you can manually build a date using the `datetime` type:

In [1]:
from datetime import datetime
datetime(year=2024, month=3, day=26)

datetime.datetime(2024, 3, 26, 0, 0)

Or, using the `dateutil` module, you can parse dates from a variety of string formats:

In [3]:
from dateutil import parser
date = parser.parse("26th of March, 2024")
date

datetime.datetime(2024, 3, 26, 0, 0)

Once you have a `datetime` object, you can do things like print the day of the week with `strftime`.

The full list of format codes is in the [strftime section of Python's datetime documentation](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior).

In [4]:
date.strftime('%A')

'Tuesday'
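Formatting is only one of the available operations. The sketch below uses just the standard library to show date arithmetic with `timedelta` and parsing with an explicit format via `strptime`; the variable names are illustrative, and the weekday and month names assume the default English locale.

```python
from datetime import datetime, timedelta

date = datetime(2024, 3, 26)

# Shift a date by a fixed duration
one_week_later = date + timedelta(days=7)

# Subtracting two datetimes yields a timedelta
elapsed = date - datetime(2024, 1, 1)

# Parse a string using an explicit format (the inverse of strftime)
parsed = datetime.strptime("2024-03-26 22:00", "%Y-%m-%d %H:%M")

print(one_week_later.strftime("%A, %d %B %Y"))  # Tuesday, 02 April 2024
print(elapsed.days)                             # 85
print(parsed.hour)                              # 22
```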

The power of `datetime` and `dateutil` lies in their flexibility and easy syntax: you can use these objects and their built-in methods to perform nearly any operation you might be interested in.

Where they break down is when you wish to work with large arrays of dates and times: Just as lists of Python numerical variables are sub-optimal compared to NumPy-style typed numerical arrays, lists of Python datetime objects are sub-optimal compared to typed arrays of encoded dates.

## `NumPy`'s `datetime64` type

The weaknesses of Python's datetime format inspired the `NumPy` team to add a set of native time series data types to `NumPy`. The `datetime64` dtype encodes dates as 64-bit integers, and thus allows arrays of dates to be represented very compactly. The `datetime64` type requires a specific input format, namely an ISO 8601 date or datetime string:

In [7]:
import numpy as np
date1 = np.datetime64('2024-03-26')
date1

numpy.datetime64('2024-03-26')
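To make the integer encoding concrete, here is a small sketch (variable names are illustrative) that casts a `datetime64` value to its underlying integer. With day resolution, that integer counts days since the Unix epoch, 1970-01-01.

```python
import numpy as np

# One day after the epoch is stored as the integer 1
epoch_plus_one = np.datetime64("1970-01-02")
print(epoch_plus_one.astype("int64"))      # 1

# Every element of a datetime64 array occupies 8 bytes (64 bits)
print(np.dtype("datetime64[D]").itemsize)  # 8
```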

Once a date is stored in this format, we can quickly do vectorized operations on it. Adding an integer array, for example, yields an array of consecutive dates:

In [8]:
date1 + np.arange(12)

array(['2024-03-26', '2024-03-27', '2024-03-28', '2024-03-29',
       '2024-03-30', '2024-03-31', '2024-04-01', '2024-04-02',
       '2024-04-03', '2024-04-04', '2024-04-05', '2024-04-06'],
      dtype='datetime64[D]')
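A few other operations follow the same pattern. This is a minimal sketch with illustrative names: subtracting two dates gives a `timedelta64`, `np.arange` can build a date range directly, and the resolution of a `datetime64` value is inferred from the input string.

```python
import numpy as np

date1 = np.datetime64("2024-03-26")

# Difference between two dates is a timedelta64 with day resolution
elapsed = date1 - np.datetime64("2024-01-01")

# A half-open range of consecutive days (the end date is excluded)
week = np.arange("2024-01-01", "2024-01-08", dtype="datetime64[D]")

# Including a time of day switches the inferred unit to minutes
minute = np.datetime64("2024-03-26 12:00")

print(elapsed)       # 85 days
print(len(week))     # 7
print(minute.dtype)  # datetime64[m]
```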

To be continued: https://jakevdp.github.io/PythonDataScienceHandbook/03.11-working-with-time-series.html