
# Time Series Functions
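- `df.asof()`: Get the last row without NaN values at or before a specified date.
- `df.shift()`: Shift the data by a number of periods, or shift the index by a time frequency.
- `df.diff()`: Calculate the difference between each element and the one a given number of periods away.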

In [1]:
import pandas as pd
import numpy as np


In [2]:
data = {
    'Date': pd.date_range(start='2024-01-01', periods=10, freq='D'),
    'Apple_Sales': [10, 20, np.nan, 40, 50, np.nan, 70, 80, 90, 100],
    'Banana_Sales': [15, np.nan, 25, 35, np.nan, 55, 65, 75, np.nan, 95],
    'Cherry_Sales': [np.nan, 12, 22, 32, 42, 52, np.nan, 72, 82, 92]
}

df = pd.DataFrame(data)
df.set_index('Date', inplace=True)

print("Original DataFrame:")
print(df)


Original DataFrame:
            Apple_Sales  Banana_Sales  Cherry_Sales
Date                                               
2024-01-01         10.0          15.0           NaN
2024-01-02         20.0           NaN          12.0
2024-01-03          NaN          25.0          22.0
2024-01-04         40.0          35.0          32.0
2024-01-05         50.0           NaN          42.0
2024-01-06          NaN          55.0          52.0
2024-01-07         70.0          65.0           NaN
2024-01-08         80.0          75.0          72.0
2024-01-09         90.0           NaN          82.0
2024-01-10        100.0          95.0          92.0


# pandas.DataFrame.shift

`DataFrame.shift(periods=1, freq=None, axis=0, fill_value=<no_default>, suffix=None)`
Shift index by desired number of periods with an optional time freq.

When `freq` is not passed, shift the index without realigning the data. If `freq` is passed (in this case, the index must be date or datetime, or it will raise a NotImplementedError), the index will be increased using the periods and the freq. `freq` can be inferred when specified as “infer” as long as either `freq` or `inferred_freq` attribute is set in the index.

## Parameters

- **`periods`**: int or Sequence
  Number of periods to shift. Can be positive or negative. If an iterable of ints, the data will be shifted once by each int. This is equivalent to shifting by one value at a time and concatenating all resulting frames. The resulting columns will have the shift suffixed to their column names. For multiple periods, `axis` must not be 1.

- **`freq`**: DateOffset, tseries.offsets, timedelta, or str, optional
  Offset to use from the tseries module or time rule (e.g. ‘EOM’). If `freq` is specified then the index values are shifted but the data is not realigned. That is, use `freq` if you would like to extend the index when shifting and preserve the original data. If `freq` is specified as “infer” then it will be inferred from the `freq` or `inferred_freq` attributes of the index. If neither of those attributes exist, a ValueError is thrown.

- **`axis`**: {0 or ‘index’, 1 or ‘columns’, None}, default None
  Shift direction. For Series this parameter is unused and defaults to 0.

- **`fill_value`**: object, optional
  The scalar value to use for newly introduced missing values. The default depends on the dtype of self. For numeric data, `np.nan` is used. For datetime, timedelta, or period data, `NaT` is used. For extension dtypes, `self.dtype.na_value` is used.

- **`suffix`**: str, optional
  If str and `periods` is an iterable, this is added after the column name and before the shift value for each shifted column name.

## Returns

- **DataFrame**
  Copy of input object, shifted.
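
The parameters above can be illustrated with a minimal sketch. The frame `small` and its column names `a` and `b` are made up for this example and are not part of the notebook's data; the comments describe what each call is expected to produce on this toy frame.

```python
import pandas as pd

# Hypothetical toy frame, used only for illustration.
idx = pd.date_range(start="2024-01-01", periods=4, freq="D")
small = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0, 4.0], "b": [10.0, 20.0, 30.0, 40.0]},
    index=idx,
)

# Positive periods push values down; the first row becomes NaN.
lagged = small.shift(1)

# fill_value replaces the newly introduced missing values.
filled = small.shift(1, fill_value=0)

# Negative periods pull values up; the last row becomes NaN.
lead = small.shift(-1)

# axis=1 shifts across columns: "b" takes the values of "a", "a" becomes NaN.
sideways = small.shift(1, axis=1)
```

In each case the index is left unchanged; only the position of the data relative to the index moves.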


In [8]:
df.shift(6)

Date,Apple_Sales,Banana_Sales,Cherry_Sales
2024-01-01,,,
2024-01-02,,,
2024-01-03,,,
2024-01-04,,,
2024-01-05,,,
2024-01-06,,,
2024-01-07,10.0,15.0,
2024-01-08,20.0,,12.0
2024-01-09,,25.0,22.0
2024-01-10,40.0,35.0,32.0


In [9]:
df.shift(freq='D')

Date,Apple_Sales,Banana_Sales,Cherry_Sales
2024-01-02,10.0,15.0,
2024-01-03,20.0,,12.0
2024-01-04,,25.0,22.0
2024-01-05,40.0,35.0,32.0
2024-01-06,50.0,,42.0
2024-01-07,,55.0,52.0
2024-01-08,70.0,65.0,
2024-01-09,80.0,75.0,72.0
2024-01-10,90.0,,82.0
2024-01-11,100.0,95.0,92.0
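
As a follow-up sketch, `periods` and `freq` can be combined, and `freq="infer"` reuses the frequency stored on the index. The frame `small` below is a hypothetical stand-in, built with `pd.date_range(..., freq="D")` so that the index carries a frequency to infer.

```python
import pandas as pd

# Hypothetical toy frame; the index keeps freq="D" from date_range.
idx = pd.date_range(start="2024-01-01", periods=4, freq="D")
small = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0]}, index=idx)

# Move the index forward by two days; the data stays attached to its rows.
moved = small.shift(periods=2, freq="D")

# Let pandas take the frequency from the index instead of naming it.
inferred = small.shift(periods=1, freq="infer")
explicit = small.shift(periods=1, freq="D")
```

Unlike a plain `shift(2)`, no NaN rows are introduced here, because the values are never realigned against the old index.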


In [10]:
df.shift([2,4],suffix='_shifted')

Date,Apple_Sales_shifted_2,Banana_Sales_shifted_2,Cherry_Sales_shifted_2,Apple_Sales_shifted_4,Banana_Sales_shifted_4,Cherry_Sales_shifted_4
2024-01-01,,,,,,
2024-01-02,,,,,,
2024-01-03,10.0,15.0,,,,
2024-01-04,20.0,,12.0,,,
2024-01-05,,25.0,22.0,10.0,15.0,
2024-01-06,40.0,35.0,32.0,20.0,,12.0
2024-01-07,50.0,,42.0,,25.0,22.0
2024-01-08,,55.0,52.0,40.0,35.0,32.0
2024-01-09,70.0,65.0,,50.0,,42.0
2024-01-10,80.0,75.0,72.0,,55.0,52.0


# pandas.DataFrame.asof

`DataFrame.asof(where, subset=None)`
Return the last row(s) without any NaNs before where.

The last row (for each element in where, if list) without any NaN is taken. For a DataFrame, only the columns in `subset` (if not None) are checked for NaN when choosing that row.

If there is no good value, NaN is returned for a Series or a Series of NaN values for a DataFrame.
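
The behaviour described above can be sketched on the same sales data. The frame is rebuilt here as `sales` so the block runs on its own; the name and the chosen dates are illustrative. The results suggest that a row dated exactly at `where` also qualifies.

```python
import numpy as np
import pandas as pd

dates = pd.date_range(start="2024-01-01", periods=10, freq="D")
sales = pd.DataFrame(
    {
        "Apple_Sales": [10, 20, np.nan, 40, 50, np.nan, 70, 80, 90, 100],
        "Banana_Sales": [15, np.nan, 25, 35, np.nan, 55, 65, 75, np.nan, 95],
        "Cherry_Sales": [np.nan, 12, 22, 32, 42, 52, np.nan, 72, 82, 92],
    },
    index=dates,
)

# No subset: every column must be non-NaN. The row for 2024-01-05 has a NaN
# in Banana_Sales, so the row for 2024-01-04 (40, 35, 32) is returned.
full_row = sales.asof(pd.Timestamp("2024-01-05"))

# Up to 2024-01-03 every row contains at least one NaN, so there is no
# "good" row and a Series of NaN comes back.
no_match = sales.asof(pd.Timestamp("2024-01-03"))

# With subset, only Apple_Sales is checked. Apple_Sales is NaN on 2024-01-03,
# so the row for 2024-01-02 is returned, including its NaN in Banana_Sales.
by_apple = sales.asof(pd.Timestamp("2024-01-03"), subset="Apple_Sales")
```

The returned Series is labelled with the requested date, not with the date of the row the values came from.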

## Parameters

- **`where`**: date or array-like of dates
  Date(s) before which the last row(s) are returned.

- **`subset`**: str or array-like of str, default None
  For DataFrame, if not None, only use these columns to check for NaNs.

## Returns

- **scalar, Series, or DataFrame**
  The return can be:
  - **scalar**: when self is a Series and where is a scalar
  - **Series**: when self is a Series and where is an array-like, or when self is a DataFrame and where is a scalar
  - **DataFrame**: when self is a DataFrame and where is an array-like
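
The three return types listed above can be seen in a short sketch. It rebuilds the sales data as `sales` so that it is self-contained; the variable names and the dates passed as `where` are illustrative choices.

```python
import numpy as np
import pandas as pd

dates = pd.date_range(start="2024-01-01", periods=10, freq="D")
sales = pd.DataFrame(
    {
        "Apple_Sales": [10, 20, np.nan, 40, 50, np.nan, 70, 80, 90, 100],
        "Banana_Sales": [15, np.nan, 25, 35, np.nan, 55, 65, 75, np.nan, 95],
        "Cherry_Sales": [np.nan, 12, 22, 32, 42, 52, np.nan, 72, 82, 92],
    },
    index=dates,
)
apple = sales["Apple_Sales"]

# Series + scalar where -> scalar (last non-NaN value at or before the date).
scalar_result = apple.asof(pd.Timestamp("2024-01-03"))

# Series + array-like where -> Series, one value per requested date.
series_result = apple.asof(pd.DatetimeIndex(["2024-01-03", "2024-01-06"]))

# DataFrame + array-like where -> DataFrame, one row per requested date.
frame_result = sales.asof(pd.DatetimeIndex(["2024-01-05", "2024-01-09"]))
```

Here `frame_result` holds the rows for 2024-01-04 and 2024-01-08, the last fully populated rows at or before each requested date, re-indexed by the requested dates.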


In [18]:
df.asof('2024-01-02',subset='Apple_Sales')

Apple_Sales     20.0
Banana_Sales     NaN
Cherry_Sales    12.0
Name: 2024-01-02 00:00:00, dtype: float64

In [15]:
df

Date,Apple_Sales,Banana_Sales,Cherry_Sales
2024-01-01,10.0,15.0,
2024-01-02,20.0,,12.0
2024-01-03,,25.0,22.0
2024-01-04,40.0,35.0,32.0
2024-01-05,50.0,,42.0
2024-01-06,,55.0,52.0
2024-01-07,70.0,65.0,
2024-01-08,80.0,75.0,72.0
2024-01-09,90.0,,82.0
2024-01-10,100.0,95.0,92.0


# pandas.DataFrame.diff

`DataFrame.diff(periods=1, axis=0)`
First discrete difference of element.

Calculates the difference of a DataFrame element compared with another element in the DataFrame (default is element in previous row).

## Parameters

- **`periods`**: int, default 1
  Periods to shift for calculating difference, accepts negative values.

- **`axis`**: {0 or ‘index’, 1 or ‘columns’}, default 0
  Take difference over rows (0) or columns (1).

## Returns

- **DataFrame**
  First differences of the DataFrame.
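
One way to read `diff` is as a subtraction of a shifted copy. The sketch below checks that on a hypothetical frame `small` (not the notebook's data) and also shows negative `periods` and `axis=1`.

```python
import pandas as pd

# Hypothetical toy frame, used only for illustration.
small = pd.DataFrame({"a": [1.0, 4.0, 9.0, 16.0], "b": [2.0, 3.0, 5.0, 8.0]})

# Default: each row minus the previous row; the first row is NaN.
step = small.diff()

# The same result written out with shift.
manual = small - small.shift(1)

# Negative periods: each row minus the following row; the last row is NaN.
backward = small.diff(periods=-1)

# axis=1: each column minus the column to its left; "a" has no left neighbour.
across = small.diff(axis=1)
```

Because the result is a plain subtraction, a NaN in either of the two rows being compared produces a NaN in the output.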


In [19]:
df

Date,Apple_Sales,Banana_Sales,Cherry_Sales
2024-01-01,10.0,15.0,
2024-01-02,20.0,,12.0
2024-01-03,,25.0,22.0
2024-01-04,40.0,35.0,32.0
2024-01-05,50.0,,42.0
2024-01-06,,55.0,52.0
2024-01-07,70.0,65.0,
2024-01-08,80.0,75.0,72.0
2024-01-09,90.0,,82.0
2024-01-10,100.0,95.0,92.0


In [22]:
df.diff(1)

Date,Apple_Sales,Banana_Sales,Cherry_Sales
2024-01-01,,,
2024-01-02,10.0,,
2024-01-03,,,10.0
2024-01-04,,10.0,10.0
2024-01-05,10.0,,10.0
2024-01-06,,,10.0
2024-01-07,,10.0,
2024-01-08,10.0,10.0,
2024-01-09,10.0,,10.0
2024-01-10,10.0,,10.0
