# Pandas Code Utils

This notebook collects pandas utilities for working with time series.

In [1]:
# Import Standard Libraries
import pandas as pd
import numpy as np

# DatetimeIndex

## Date Range

`pd.date_range` creates a sequence of dates that starts at the given origin and spans the given number of periods at a fixed frequency.

In [2]:
# Seven consecutive days, from 2020-01-01 to 2020-01-07
pd.date_range('2020-01-01', periods=7, freq='D')

DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
               '2020-01-05', '2020-01-06', '2020-01-07'],
              dtype='datetime64[ns]', freq='D')
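
As a small sketch of the same idea, the range can also be bounded by an explicit `end` date instead of a period count, and the frequency can be changed. The variable names `by_end` and `business_days` are illustrative only.

```python
import pandas as pd

# Same seven days, specified by start and end instead of a period count
by_end = pd.date_range(start='2020-01-01', end='2020-01-07', freq='D')

# Business days only (Monday to Friday) over the same span
business_days = pd.date_range(start='2020-01-01', end='2020-01-07', freq='B')

print(len(by_end))         # 7
print(len(business_days))  # 5 (the weekend of 4-5 January is skipped)
```

Note that the `'B'` frequency only skips weekends; it does not know about public holidays.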

## To Datetime

Convert the given values to the `datetime` data type. Passing `format='mixed'` lets each element be parsed with its own inferred format.

In [5]:
# Transform values into datetime
pd.to_datetime(['01/02/2018', 'Jan 03, 2018'], format='mixed')

DatetimeIndex(['2018-01-02', '2018-01-03'], dtype='datetime64[ns]', freq=None)
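
When the input format is known, it may be safer to state it explicitly, and `errors='coerce'` can turn unparseable entries into `NaT` instead of raising. The following is a minimal sketch; the sample strings are made up for illustration.

```python
import pandas as pd

# Explicit format: day/month/year, so '01/02/2018' is 1 February 2018
parsed = pd.to_datetime(['01/02/2018', '15/02/2018'], format='%d/%m/%Y')

# Unparseable entries become NaT instead of raising an error
coerced = pd.to_datetime(['2018-01-02', 'not a date'], errors='coerce')

print(parsed)
print(coerced)
```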

## Pandas DataFrame with DatetimeIndex

Create a pandas DataFrame that uses a `DatetimeIndex` as its index.

In [15]:
# Create index
index = pd.date_range('01/01/2024', periods=30, freq='D')

# Create values
values = np.random.randint(100, size=30)

# Create dataframe
dataframe = pd.DataFrame(values, index=index, columns=['Sales'])

print(dataframe.head())

            Sales
2024-01-01     85
2024-01-02     65
2024-01-03     74
2024-01-04     61
2024-01-05     81
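
One practical benefit of a `DatetimeIndex` is that rows can be selected with date strings and date attributes. The sketch below uses deterministic values (`np.arange`) rather than random ones so the result is reproducible; the names `sales`, `second_week` and `mondays` are illustrative.

```python
import numpy as np
import pandas as pd

index = pd.date_range('2024-01-01', periods=30, freq='D')
sales = pd.DataFrame({'Sales': np.arange(30)}, index=index)

# Slice by date strings; both endpoints are included
second_week = sales.loc['2024-01-08':'2024-01-14']

# Date attributes of the index can be used as filters (0 = Monday)
mondays = sales[sales.index.dayofweek == 0]

print(len(second_week))            # 7
print(mondays.index.day.tolist())  # [1, 8, 15, 22, 29]
```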


# Operations

## Resample

Resampling converts a time series to a different frequency and aggregates the values that fall into each new period.

In [20]:
# Create a dataframe with monthly values of sales
index = pd.date_range('01/01/2020', periods=48, freq='ME')
values = np.random.randint(1000, size=48)
dataframe = pd.DataFrame(values, index=index, columns=['Sales'])

In [27]:
dataframe

            Sales
2020-01-31    135
2020-02-29    403
2020-03-31    446
2020-04-30    867
2020-05-31    145
2020-06-30     68
2020-07-31    148
2020-08-31    295
2020-09-30    720
2020-10-31    832


In [23]:
# Resample to a yearly frequency and compute the mean of each year
dataframe.resample(rule='YE').mean()

                 Sales
2020-12-31  428.083333
2021-12-31  556.083333
2022-12-31  495.750000
2023-12-31  566.166667
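
Several aggregations can be computed in one pass with `agg`. The sketch below resamples daily data to month-start (`'MS'`) bins, chosen here only so the example does not depend on the newer `'ME'`/`'YE'` aliases; the values are constructed so the results are easy to check by hand.

```python
import pandas as pd

# 60 daily values: 1 for each of the 31 days of January 2020,
# 2 for each of the 29 days of February 2020
index = pd.date_range('2020-01-01', periods=60, freq='D')
sales = pd.Series([1] * 31 + [2] * 29, index=index, name='Sales')

# One row per month, one column per aggregation
monthly = sales.resample('MS').agg(['sum', 'mean'])

print(monthly)
```

January sums to 31 with a mean of 1, and February sums to 58 with a mean of 2.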


## Custom Resample

Besides standard aggregations like `mean()`, custom functions can also be applied, here to the resampled result.

In [38]:
# Create a function that doubles the sales
def double_sales(entry):
    return entry * 2

# Resample to a yearly frequency, take the mean, then apply 'double_sales'
dataframe.resample(rule='YE').mean().apply(double_sales)

                  Sales
2020-12-31   856.166667
2021-12-31  1112.166667
2022-12-31   991.500000
2023-12-31  1132.333333
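
A custom function can also serve as the aggregation itself, receiving all values of each resampled period. The following is a sketch under assumptions: `sales_range` is a hypothetical helper, the data is a simple counter, and year-start (`'YS'`) bins are used instead of `'YE'` to keep the example independent of the pandas version.

```python
import numpy as np
import pandas as pd

# 48 monthly values 0..47, one per month start from January 2020
index = pd.date_range('2020-01-01', periods=48, freq='MS')
sales = pd.Series(np.arange(48), index=index, name='Sales')

def sales_range(group):
    """Hypothetical helper: spread between the best and worst month of a period."""
    return group.max() - group.min()

# The function is called once per year with that year's 12 values
yearly_range = sales.resample('YS').agg(sales_range)

print(yearly_range.tolist())  # [11, 11, 11, 11]
```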


## Shifting

Shift the values of every column in a pandas DataFrame by a fixed number of rows, or shift the index itself by a given frequency.

In [3]:
# Create a dataframe with monthly values of sales
index = pd.date_range('01/01/2020', periods=48, freq='ME')
values = np.random.randint(1000, size=48)
dataframe = pd.DataFrame(values, index=index, columns=['Sales'])

# Shift the values forward by one
print(dataframe.head())
print(dataframe.shift(1).head())

            Sales
2020-01-31    584
2020-02-29    731
2020-03-31    677
2020-04-30    686
2020-05-31    938
            Sales
2020-01-31    NaN
2020-02-29  584.0
2020-03-31  731.0
2020-04-30  677.0
2020-05-31  686.0
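
A common use of a one-step shift is computing the change from the previous observation. The sketch below uses a small hand-written series (made-up values) and checks that the result matches the built-in `diff()`.

```python
import pandas as pd

index = pd.date_range('2020-01-01', periods=5, freq='D')
sales = pd.Series([10, 12, 9, 15, 15], index=index, name='Sales')

# Change relative to the previous row; the first row has no predecessor
change = sales - sales.shift(1)

# diff() is the built-in shortcut for the same computation
same_as_diff = change.equals(sales.diff())

print(change.tolist())  # [nan, 2.0, -3.0, 6.0, 0.0]
print(same_as_diff)     # True
```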


In [5]:
# Shift the index two months ahead; the values stay attached to their rows
print(dataframe.shift(periods=2, freq='ME').head())

            Sales
2020-03-31    584
2020-04-30    731
2020-05-31    677
2020-06-30    686
2020-07-31    938
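
The difference between the two kinds of shift can be seen side by side. This is a minimal sketch on a three-row daily series with made-up values; `by_rows` and `by_index` are illustrative names.

```python
import pandas as pd

index = pd.date_range('2020-01-01', periods=3, freq='D')
sales = pd.Series([5, 6, 7], index=index, name='Sales')

# Without freq: values move down two rows, the index is unchanged
by_rows = sales.shift(2)

# With freq: the index moves two days ahead, the values are unchanged
by_index = sales.shift(2, freq='D')

print(by_rows.isna().sum())  # 2
print(by_index.index[0])     # 2020-01-03 00:00:00
print(by_index.tolist())     # [5, 6, 7]
```

Shifting without `freq` introduces missing values at the start, while shifting with `freq` keeps every value and only relabels the dates.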


## Rolling

Compute a statistic over a sliding window of fixed size.

In [9]:
# Create a dataframe with daily values of sales
index = pd.date_range('01/01/2020', periods=90, freq='D')
values = np.random.randint(1000, size=90)
dataframe = pd.DataFrame(values, index=index, columns=['Sales'])

In [11]:
# Compute the mean of a sliding window
dataframe.rolling(window=7, center=True).mean()

                 Sales
2020-01-01         NaN
2020-01-02         NaN
2020-01-03         NaN
2020-01-04  284.714286
2020-01-05  352.142857
...                ...
2020-03-26  610.714286
2020-03-27  656.571429
2020-03-28         NaN
2020-03-29         NaN


In [18]:
# The first non-NaN value is exactly the mean of the first 7 days
# Because center=True, it is placed at the center of the window (2020-01-04)
dataframe.iloc[:7, :].mean()

Sales    284.714286
dtype: float64
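
The missing values at the edges appear because a full window is not available there. If partial windows are acceptable, `min_periods` relaxes that requirement. The sketch below uses a tiny made-up series and a trailing (non-centered) window so the arithmetic is easy to verify by hand.

```python
import pandas as pd

sales = pd.Series([1, 2, 3, 4, 5])

# Default: a result is produced only once 3 observations are available
strict = sales.rolling(window=3).mean()

# min_periods=1: average whatever is available, up to 3 observations
relaxed = sales.rolling(window=3, min_periods=1).mean()

print(strict.tolist())   # [nan, nan, 2.0, 3.0, 4.0]
print(relaxed.tolist())  # [1.0, 1.5, 2.0, 3.0, 4.0]
```

The trade-off is that the first values of `relaxed` are averages over fewer observations and are therefore noisier than the rest.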