# Time of Day and “as of” Data Selection

Suppose you have a long time series of intraday market data and want to extract the price at a particular time of day on each day. What if the data are irregular, so that observations do not fall exactly on the desired time? Handled carelessly, this kind of data munging is error-prone. Here is an example:

In [1]:
import pandas as pd
import numpy as np
from pandas import DataFrame, Series

In [2]:
# Make an intraday date range and time series

rng = pd.date_range('2022-01-06 09:30', '2022-01-06 15:59', freq='min')

In [3]:
# Extend to four business days of 9:30-15:59 values (BDay skips the weekend)

rng = rng.append([rng + pd.offsets.BDay(i) for i in range(1, 4)])

In [4]:
ts = Series(np.arange(len(rng), dtype=float), index=rng)

In [5]:
ts

2022-01-06 09:30:00       0.0
2022-01-06 09:31:00       1.0
2022-01-06 09:32:00       2.0
2022-01-06 09:33:00       3.0
2022-01-06 09:34:00       4.0
                        ...  
2022-01-11 15:55:00    1555.0
2022-01-11 15:56:00    1556.0
2022-01-11 15:57:00    1557.0
2022-01-11 15:58:00    1558.0
2022-01-11 15:59:00    1559.0
Length: 1560, dtype: float64

Indexing with a Python datetime.time object will extract values at those times:

In [6]:
from datetime import time

In [7]:
ts[time(12, 0)]

2022-01-06 12:00:00     150.0
2022-01-07 12:00:00     540.0
2022-01-10 12:00:00     930.0
2022-01-11 12:00:00    1320.0
dtype: float64

Under the hood, this uses the at_time instance method (available on both Series and DataFrame objects):

In [8]:
ts.at_time(time(10, 0))

2022-01-06 10:00:00      30.0
2022-01-07 10:00:00     420.0
2022-01-10 10:00:00     810.0
2022-01-11 10:00:00    1200.0
dtype: float64
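
As a small sketch of the DataFrame case, assume a hypothetical frame named prices with made-up bid and ask columns on a two-day minute index (none of these names or values come from the example above). The time can also be passed as a string:

```python
from datetime import time

import numpy as np
import pandas as pd

# Hypothetical two-day minute index; the values are placeholders, not market data
idx = pd.date_range("2022-01-06 09:30", "2022-01-06 15:59", freq="min")
idx = idx.append(idx + pd.Timedelta(days=1))

prices = pd.DataFrame(
    {
        "bid": np.arange(len(idx), dtype=float),
        "ask": np.arange(len(idx), dtype=float) + 1.0,
    },
    index=idx,
)

# One row per day at exactly 10:00; the string is parsed as a time of day
at_ten = prices.at_time("10:00")

# Passing a datetime.time object selects the same rows
same_rows = prices.at_time(time(10, 0))
```

Only rows whose timestamp falls exactly on 10:00 are returned, so a day with no observation at that minute contributes no row.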

You can select values between two times using the related between_time method:

In [9]:
ts.between_time(time(10, 0), time(10, 5))

2022-01-06 10:00:00      30.0
2022-01-06 10:01:00      31.0
2022-01-06 10:02:00      32.0
2022-01-06 10:03:00      33.0
2022-01-06 10:04:00      34.0
2022-01-06 10:05:00      35.0
2022-01-07 10:00:00     420.0
2022-01-07 10:01:00     421.0
2022-01-07 10:02:00     422.0
2022-01-07 10:03:00     423.0
2022-01-07 10:04:00     424.0
2022-01-07 10:05:00     425.0
2022-01-10 10:00:00     810.0
2022-01-10 10:01:00     811.0
2022-01-10 10:02:00     812.0
2022-01-10 10:03:00     813.0
2022-01-10 10:04:00     814.0
2022-01-10 10:05:00     815.0
2022-01-11 10:00:00    1200.0
2022-01-11 10:01:00    1201.0
2022-01-11 10:02:00    1202.0
2022-01-11 10:03:00    1203.0
2022-01-11 10:04:00    1204.0
2022-01-11 10:05:00    1205.0
dtype: float64
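
Note that both endpoints appear in the output above: 10:00 and 10:05 are each included. The sketch below illustrates this on a one-day series and, assuming pandas 1.4 or later (where the inclusive keyword is available), shows how to drop the right endpoint:

```python
from datetime import time

import numpy as np
import pandas as pd

# One day of minute data, 09:30-15:59, with the position as the value
idx = pd.date_range("2022-01-06 09:30", "2022-01-06 15:59", freq="min")
ts = pd.Series(np.arange(len(idx), dtype=float), index=idx)

# Default behavior: both 10:00 and 10:05 are included
both = ts.between_time(time(10, 0), time(10, 5))

# Assumption: pandas 1.4+; "left" keeps 10:00 but excludes 10:05
left = ts.between_time(time(10, 0), time(10, 5), inclusive="left")
```

Here both holds six observations and left holds five.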

As mentioned above, there may be no observation at exactly 10 AM, in which case you might want the last known value as of 10 AM:

In [10]:
# Set most of the time series randomly to NA

indexer = np.sort(np.random.permutation(len(ts))[700:])

In [11]:
indexer[:10]

array([ 2,  3,  5,  7,  8,  9, 14, 15, 16, 17])

In [12]:
irr_ts = ts.copy()

In [13]:
irr_ts.iloc[indexer] = np.nan  # indexer holds integer positions, so use iloc

In [16]:
irr_ts

2022-01-06 09:30:00       0.0
2022-01-06 09:31:00       1.0
2022-01-06 09:32:00       NaN
2022-01-06 09:33:00       NaN
2022-01-06 09:34:00       4.0
                        ...  
2022-01-11 15:55:00    1555.0
2022-01-11 15:56:00       NaN
2022-01-11 15:57:00    1557.0
2022-01-11 15:58:00       NaN
2022-01-11 15:59:00       NaN
Length: 1560, dtype: float64

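By passing an array of timestamps to the asof method, you obtain the last valid (non-NA) value at or before each timestamp. To get the value as of 10 AM on each trading day, the timestamps themselves must fall at 10 AM on business days, for example pd.date_range('2022-01-06 10:00', periods=4, freq='B'). The selection below instead uses midnight on four consecutive calendar days, so the first result is NaN (no observation precedes midnight on the first day), and each later result is the last valid value from the most recent trading day before that midnight: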

In [19]:
selection = pd.date_range('01-06-2022', periods=4, freq='D')

In [21]:
irr_ts.asof(selection)

2022-01-06      NaN
2022-01-07    387.0
2022-01-08    779.0
2022-01-09    779.0
Freq: D, dtype: float64
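
For a 10 AM lookup with a predictable result, here is a minimal sketch that does not reuse the random data above. It assumes a two-day series in which the observations from 09:58 through 10:02 are deliberately set to NA (a made-up gap chosen for illustration), then queries asof at 10:00 on each business day:

```python
import numpy as np
import pandas as pd

# Two business days (Thursday and Friday) of minute data, 09:30-15:59
day = pd.date_range("2022-01-06 09:30", "2022-01-06 15:59", freq="min")
idx = day.append(day + pd.Timedelta(days=1))
ts = pd.Series(np.arange(len(idx), dtype=float), index=idx)

# Illustrative gap: blank out 09:58 through 10:02 on each day
irr_ts = ts.copy()
gap = irr_ts.between_time("09:58", "10:02").index
irr_ts.loc[gap] = np.nan

# 10:00 on each business day; freq="B" keeps the time of day from the start
selection = pd.date_range("2022-01-06 10:00", periods=2, freq="B")

# Last valid value at or before 10:00 on each day
result = irr_ts.asof(selection)
```

Because 10:00 itself is NA in this sketch, each result comes from the 09:57 observation on the same day (27.0 and 417.0).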