# Table of Contents

1. [Handling Dates, Timezones, Unix Timestamps](#time)
    - [Handling Dates and Times](#htd)
    - [Handling Timezones](#Tzone)
    - [Handling Unix Timestamps](#unix)
    - [Date Range](#drange)
    - [Resample](#resample)
2. [Introduction to Rolling Operations](#roll_intro)
    - [Rolling Sum](#rsum)
    - [Rolling Mean](#rmean)
    - [Rolling Rank](#rrank)
3. [Shifting and Lagging](#sl) 

# 1.Handling Dates, Timezones, Unix Timestamps <a class = 'anchor' id = time></a> 

### Handling Dates and Times <a class = 'anchor' id = htd></a>
Pandas has a number of methods for working with dates and times. We can create a new DataFrame column of datetime objects from an existing column containing dates using the `to_datetime()` method:

In [1]:
# create a sample DataFrame
import pandas as pd
df = pd.DataFrame({
    'date': ['2023-04-01', '2023-04-02', '2023-04-03', '2023-04-04'],
    'value': [1, 2, 3, 4]})
df

Unnamed: 0,date,value
0,2023-04-01,1
1,2023-04-02,2
2,2023-04-03,3
3,2023-04-04,4


In [2]:
df.dtypes

date     object
value     int64
dtype: object

- Here the `date` column has dtype `object`, meaning the dates are stored as plain strings. Converting them to datetime objects lets us perform **time series** tasks.
- The pandas function `to_datetime()` casts the data to a datetime type.

In [3]:
# convert the 'date' column to a datetime object
df['date'] = pd.to_datetime(df['date'])

# print the resulting DataFrame
df

Unnamed: 0,date,value
0,2023-04-01,1
1,2023-04-02,2
2,2023-04-03,3
3,2023-04-04,4


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   date    4 non-null      datetime64[ns]
 1   value   4 non-null      int64         
dtypes: datetime64[ns](1), int64(1)
memory usage: 192.0 bytes


- This overwrites the `date` column with datetime objects (`datetime64[ns]`) that can be used for further calculations or manipulations.
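
When the input strings are not in ISO format, or may contain invalid entries, it can help to state the format explicitly. The following is a minimal sketch on made-up data (the `raw` Series and its values are assumptions, not part of the example above); it relies on the `format` and `errors` parameters of `pd.to_datetime()` and on the `.dt` accessor.

```python
import pandas as pd

# Hypothetical day-first strings, one of them invalid
raw = pd.Series(['01/04/2023', '02/04/2023', 'not a date'])

# An explicit format avoids day/month ambiguity;
# errors='coerce' turns unparseable entries into NaT instead of raising
parsed = pd.to_datetime(raw, format='%d/%m/%Y', errors='coerce')

# Once the dtype is datetime64, the .dt accessor exposes date components
print(parsed.dt.day_name().tolist())
print(parsed.isna().tolist())
```

Coercing to `NaT` keeps the column usable for time series work while leaving the bad rows easy to find with `isna()`.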

### Handling Timezones <a class = 'anchor' id = Tzone ></a>
Pandas supports timezone-aware datetimes. To attach a timezone to a naive (timezone-free) datetime column, we can use the `tz_localize()` method, which is reached through the `dt` accessor. To convert an already timezone-aware column to another timezone, we use `tz_convert()`.


In [5]:
import pytz  # time zone python module

# create a sample DataFrame
df = pd.DataFrame({
    'date': ['2023-04-01', '2023-04-02', '2023-04-03', '2023-04-04'],
    'value': [1, 2, 3, 4]})
df

Unnamed: 0,date,value
0,2023-04-01,1
1,2023-04-02,2
2,2023-04-03,3
3,2023-04-04,4


In [6]:
# convert the 'date' column to a datetime object (no timezone is set yet)
df['date'] = pd.to_datetime(df['date'])
# print the resulting DataFrame
df

Unnamed: 0,date,value
0,2023-04-01,1
1,2023-04-02,2
2,2023-04-03,3
3,2023-04-04,4


In [7]:
# now let's check time zone
df['date']

0   2023-04-01
1   2023-04-02
2   2023-04-03
3   2023-04-04
Name: date, dtype: datetime64[ns]

In [8]:
df['date'].info()

<class 'pandas.core.series.Series'>
RangeIndex: 4 entries, 0 to 3
Series name: date
Non-Null Count  Dtype         
--------------  -----         
4 non-null      datetime64[ns]
dtypes: datetime64[ns](1)
memory usage: 160.0 bytes


- We can see that no timezone is specified. Let's set the timezone to `US/Eastern`.
- We use the `dt` accessor to call the **`tz_localize()`** method, which sets the timezone to `'US/Eastern'`. This replaces the column with `timezone-aware` datetime objects.

In [9]:
df['date'] = pd.to_datetime(df['date']).dt.tz_localize('US/Eastern')
# print the resulting DataFrame
df

Unnamed: 0,date,value
0,2023-04-01 00:00:00-04:00,1
1,2023-04-02 00:00:00-04:00,2
2,2023-04-03 00:00:00-04:00,3
3,2023-04-04 00:00:00-04:00,4


In [10]:
df['date'].info()

<class 'pandas.core.series.Series'>
RangeIndex: 4 entries, 0 to 3
Series name: date
Non-Null Count  Dtype                     
--------------  -----                     
4 non-null      datetime64[ns, US/Eastern]
dtypes: datetime64[ns, US/Eastern](1)
memory usage: 160.0 bytes


- Here, we can see that the time zone has changed from `None` to `US/Eastern`.
- But this raises a question:
### How many time zones are available?
The full list of names is in `pytz.all_timezones`.

In [11]:
# print(pytz.all_timezones)

#### How to convert time zones?
The `tz_convert()` method is used to convert between time zones.

Let's convert the dates to the **Indian** and **Israeli** time zones and store the results in new columns.

In [12]:
df['date_Indian_time_zone'] = pd.to_datetime(df['date']).dt.tz_convert('Asia/Kolkata')
# print the resulting DataFrame
df


Unnamed: 0,date,value,date_Indian_time_zone
0,2023-04-01 00:00:00-04:00,1,2023-04-01 09:30:00+05:30
1,2023-04-02 00:00:00-04:00,2,2023-04-02 09:30:00+05:30
2,2023-04-03 00:00:00-04:00,3,2023-04-03 09:30:00+05:30
3,2023-04-04 00:00:00-04:00,4,2023-04-04 09:30:00+05:30


In [13]:
df['Israel_time_zone'] = pd.to_datetime(df['date']).dt.tz_convert('Israel')
# print the resulting DataFrame
df

Unnamed: 0,date,value,date_Indian_time_zone,Israel_time_zone
0,2023-04-01 00:00:00-04:00,1,2023-04-01 09:30:00+05:30,2023-04-01 07:00:00+03:00
1,2023-04-02 00:00:00-04:00,2,2023-04-02 09:30:00+05:30,2023-04-02 07:00:00+03:00
2,2023-04-03 00:00:00-04:00,3,2023-04-03 09:30:00+05:30,2023-04-03 07:00:00+03:00
3,2023-04-04 00:00:00-04:00,4,2023-04-04 09:30:00+05:30,2023-04-04 07:00:00+03:00
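
The difference between `tz_localize()` and `tz_convert()` is easy to mix up, so here is a small side-by-side sketch. The `naive` Series is made up for illustration; the methods themselves are the ones used in the cells above.

```python
import pandas as pd

naive = pd.Series(pd.to_datetime(['2023-04-01 00:00', '2023-04-02 00:00']))

# tz_localize attaches a zone: the wall-clock time stays 00:00
eastern = naive.dt.tz_localize('US/Eastern')

# tz_convert re-expresses the same instants in another zone
kolkata = eastern.dt.tz_convert('Asia/Kolkata')

# tz_localize(None) drops the zone again and keeps the local wall-clock time
naive_again = kolkata.dt.tz_localize(None)

print(eastern[0], kolkata[0], naive_again[0], sep='\n')
```

In short, localizing changes which instant a value refers to (by interpreting the wall-clock time in a zone), while converting keeps the instant and changes only how it is displayed.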


### Handling Unix Timestamps <a class ='anchor' id = unix ></a>
- Unix timestamps represent the number of seconds that have elapsed since **January 1, 1970** at **00:00:00 UTC**. 
- It is a common way to represent time in computer systems. 
- Pandas provides functions to convert Unix timestamps to datetime objects, allowing for easier manipulation and analysis.
- The **`to_datetime()`** function can be used to convert a Unix timestamp column to a datetime column
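
Unix timestamps are not always stored in seconds; milliseconds are also common. As a hedged sketch (the epoch values below are made up for illustration), the `unit` parameter of `pd.to_datetime()` tells pandas how to interpret the numbers, and `utc=True` makes the UTC reference explicit.

```python
import pandas as pd

# Hypothetical epoch values for the same instants, in seconds and in milliseconds
seconds = pd.Series([1686745845, 1686753930])
millis = seconds * 1000

from_s = pd.to_datetime(seconds, unit='s')
from_ms = pd.to_datetime(millis, unit='ms')

# utc=True returns timezone-aware values instead of naive ones
aware = pd.to_datetime(seconds, unit='s', utc=True)

print(from_s.tolist())
print(aware.dt.tz)
```

Passing the wrong `unit` does not raise an error in general; it silently produces dates far from the intended ones, so it is worth checking one value by hand.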

In [14]:
# Create a sample DataFrame with datetime data
data = {
    'Date': ['2023-06-14 12:30:45', '2023-06-14 14:45:30', '2023-06-14 18:20:15'],
    'Value': [10, 20, 30]
}
df = pd.DataFrame(data)

# Convert 'Date' column to datetime
# df['Date'] = pd.to_datetime(df['Date'])
df

Unnamed: 0,Date,Value
0,2023-06-14 12:30:45,10
1,2023-06-14 14:45:30,20
2,2023-06-14 18:20:15,30


In [16]:
# Convert 'Date' column to datetime
df['Date'] = pd.to_datetime(df['Date'])
df

Unnamed: 0,Date,Value
0,2023-06-14 12:30:45,10
1,2023-06-14 14:45:30,20
2,2023-06-14 18:20:15,30


In [17]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   Date    3 non-null      datetime64[ns]
 1   Value   3 non-null      int64         
dtypes: datetime64[ns](1), int64(1)
memory usage: 176.0 bytes


In [18]:
# Convert datetime to Unix timestamp
df['Timestamp'] = df['Date'].apply(lambda x: x.timestamp())
df

Unnamed: 0,Date,Value,Timestamp
0,2023-06-14 12:30:45,10,1686745845.0
1,2023-06-14 14:45:30,20,1686753930.0
2,2023-06-14 18:20:15,30,1686766815.0


In [19]:
# Convert Unix timestamp back to datetime
df['Date_from_timestamp'] = pd.to_datetime(df['Timestamp'], unit='s')
df

Unnamed: 0,Date,Value,Timestamp,Date_from_timestamp
0,2023-06-14 12:30:45,10,1686745845.0,2023-06-14 12:30:45
1,2023-06-14 14:45:30,20,1686753930.0,2023-06-14 14:45:30
2,2023-06-14 18:20:15,30,1686766815.0,2023-06-14 18:20:15


In [20]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 4 columns):
 #   Column               Non-Null Count  Dtype         
---  ------               --------------  -----         
 0   Date                 3 non-null      datetime64[ns]
 1   Value                3 non-null      int64         
 2   Timestamp            3 non-null      float64       
 3   Date_from_timestamp  3 non-null      datetime64[ns]
dtypes: datetime64[ns](2), float64(1), int64(1)
memory usage: 224.0 bytes
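
The conversion from datetime to Unix seconds can also be written without `apply()`. The sketch below is one possible vectorized form, shown on a small made-up Series; it subtracts the epoch and divides by one second, treating naive datetimes as UTC.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2023-06-14 12:30:45', '2023-06-14 14:45:30']))

# Subtracting the epoch gives Timedeltas; floor-dividing by one second
# yields whole seconds as integers, with no Python-level loop
epoch = pd.Timestamp('1970-01-01')
unix_seconds = (dates - epoch) // pd.Timedelta(seconds=1)

# Round trip back to datetime
restored = pd.to_datetime(unix_seconds, unit='s')

print(unix_seconds.tolist())
```

Keeping the result as integers avoids the float rounding that can appear when large timestamps are displayed.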


- **Timestamps** are used throughout data science and machine learning: time series forecasting, fraud detection, user behavior analysis, time-based recommendations, time-dependent predictive modeling, A/B testing, resource allocation, real-time analytics, and event streaming.
- In these settings they make it possible to order events, measure the time elapsed between them, and align records that come from different sources.

## Date Range <a class = 'anchor' id = drange></a>

The **`date_range()`** function in pandas is used to generate a fixed frequency DatetimeIndex. It allows us to create a **range of dates** or **timestamps** based on a specified frequency.  

   
**`pd.date_range(start=None, end=None, periods=None, freq=None, tz=None)`**  

Parameters:
- `start`: The starting date or timestamp of the range.
- `end`: The ending date or timestamp of the range.
- `periods`: The number of periods to generate. Of the four parameters `start`, `end`, `periods`, and `freq`, exactly three must be specified; if `freq` is omitted, it defaults to `'D'` (daily) unless `start`, `end`, and `periods` are all given.
- `freq`: The frequency at which to generate the dates. It can be a frequency string (offset alias) or a pandas offset object. Some common aliases include `'D'` for **daily**, `'H'` for **hourly**, `'M'` for **month end**, `'Y'` for **year end**, etc.
     - Aliases can be combined with a multiplier or chosen for other anchors, such as `'2H'` for every **2 hours**, `'W'` for **weekly**, `'MS'` for **month start**, `'B'` for **business days**, etc.
     - Newer pandas releases (2.2 and later) deprecate `'H'`, `'M'`, and `'Y'` in favor of `'h'`, `'ME'`, and `'YE'`.
- `tz`: The time zone for the generated dates. It can be a string representing a time zone name or a pytz timezone object.  
For more information, please check the documentation [Here](https://pandas.pydata.org/docs/reference/api/pandas.date_range.html).
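
To make the parameter combinations concrete, here is a short sketch of three variations that the cells in this section do not cover. The dates are chosen only for illustration, and the aliases used (`'B'`, `'MS'`) are the ones listed above.

```python
import pandas as pd

# start + end + periods (no freq): evenly spaced points between the endpoints
even = pd.date_range(start='2023-04-01', end='2023-04-10', periods=4)

# 'B' skips weekends; 2023-04-01 is a Saturday, so the range starts on Monday
business = pd.date_range(start='2023-04-01', periods=5, freq='B')

# 'MS' anchors every date to the first day of a month
month_starts = pd.date_range(start='2023-04-01', periods=3, freq='MS')

print(even)
print(business)
print(month_starts)
```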

In [21]:
# Generate a range of dates
date_range = pd.date_range(start='2023-04-01', end='2023-04-10', freq='D')

# Print the generated dates
print(date_range)


DatetimeIndex(['2023-04-01', '2023-04-02', '2023-04-03', '2023-04-04',
               '2023-04-05', '2023-04-06', '2023-04-07', '2023-04-08',
               '2023-04-09', '2023-04-10'],
              dtype='datetime64[ns]', freq='D')


In [22]:
pd.date_range(start = '2023-04-01', periods = 10)

DatetimeIndex(['2023-04-01', '2023-04-02', '2023-04-03', '2023-04-04',
               '2023-04-05', '2023-04-06', '2023-04-07', '2023-04-08',
               '2023-04-09', '2023-04-10'],
              dtype='datetime64[ns]', freq='D')

In [23]:
# freq
pd.date_range(start = '04-01-2023',periods = 5,freq  = 'Y' )  # here alias is 'Y' year

DatetimeIndex(['2023-12-31', '2024-12-31', '2025-12-31', '2026-12-31',
               '2027-12-31'],
              dtype='datetime64[ns]', freq='A-DEC')

In [24]:
# freq with a multiplier
pd.date_range(start = '04-01-2023',periods = 5,freq  = '5H' )  # here alias is '5H', 5 hours

DatetimeIndex(['2023-04-01 00:00:00', '2023-04-01 05:00:00',
               '2023-04-01 10:00:00', '2023-04-01 15:00:00',
               '2023-04-01 20:00:00'],
              dtype='datetime64[ns]', freq='5H')

In [25]:
# timezone-aware dates: set the time zone to "Asia/Kolkata"
pd.date_range(start = '04-01-2023',periods = 5,freq  = 'Y' , tz = 'Asia/Kolkata')  # here alias is 'Y' year

DatetimeIndex(['2023-12-31 00:00:00+05:30', '2024-12-31 00:00:00+05:30',
               '2025-12-31 00:00:00+05:30', '2026-12-31 00:00:00+05:30',
               '2027-12-31 00:00:00+05:30'],
              dtype='datetime64[ns, Asia/Kolkata]', freq='A-DEC')

In [26]:
# timezone-aware dates: set the time zone to "US/Eastern"
pd.date_range(start = '04-01-2023', periods = 5, freq = 'Y', tz = "US/Eastern")

DatetimeIndex(['2023-12-31 00:00:00-05:00', '2024-12-31 00:00:00-05:00',
               '2025-12-31 00:00:00-05:00', '2026-12-31 00:00:00-05:00',
               '2027-12-31 00:00:00-05:00'],
              dtype='datetime64[ns, US/Eastern]', freq='A-DEC')

## Resample  <a class = 'anchor' id = resample></a>

- The `resample()` function in pandas is used to **resample time-series data**. 
- It allows you to change the frequency of your data and then aggregate it or apply other transformations (mean, median, sum, max, min, std, etc.).

In [27]:
# creating dataframe

data = {"Date" : pd.date_range(start = '3 March 2023', periods = 200, freq = "D"),
       "Value" : range(10,210)}

df = pd.DataFrame(data)

print("The shape of the data : ",df.shape)
df.head()

The shape of the data :  (200, 2)


Unnamed: 0,Date,Value
0,2023-03-03,10
1,2023-03-04,11
2,2023-03-05,12
3,2023-03-06,13
4,2023-03-07,14


In [28]:
# Resample the DataFrame to a monthly frequency and calculate the mean
df_resampled_mean = df.resample('M',on = 'Date').mean()
df_resampled_mean

Date,Value
2023-03-31,24.0
2023-04-30,53.5
2023-05-31,84.0
2023-06-30,114.5
2023-07-31,145.0
2023-08-31,176.0
2023-09-30,200.5


In [29]:
df_resampled_median = df.resample('M', on = 'Date').median()
df_resampled_median

Date,Value
2023-03-31,24.0
2023-04-30,53.5
2023-05-31,84.0
2023-06-30,114.5
2023-07-31,145.0
2023-08-31,176.0
2023-09-30,200.5


In [30]:
# we can resample the data on "weekly"
df_resampled_median1 = df.resample('W', on = 'Date').median()
df_resampled_median1

Date,Value
2023-03-05,11.0
2023-03-12,16.0
2023-03-19,23.0
2023-03-26,30.0
2023-04-02,37.0
2023-04-09,44.0
2023-04-16,51.0
2023-04-23,58.0
2023-04-30,65.0
2023-05-07,72.0


The **`resample()`** method in pandas has several useful applications for working with time-series data. Here are some common applications of the **`resample()`** method :
- **Changing the Frequency**: Adjust the frequency of time-series data (upsampling or downsampling).
- **Aggregating Data**: Calculate summary statistics over specific time intervals.
- **Handling Missing Data**: Fill or interpolate missing values in the time-series.
- **Time-Series Plotting**: Downsample data for better visualization.
- **Grouping and Aggregating by Time**: Perform complex operations on grouped time-series data.
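
The cells above only downsample. As a hedged sketch of the other applications in this list, the example below upsamples a small made-up series, fills the resulting gaps in two ways, and computes several weekly statistics in one call. The series `s` and its values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical series observed every second day
s = pd.Series([1.0, 2.0, 3.0],
              index=pd.date_range('2023-04-01', periods=3, freq='2D'))

# Upsampling to daily frequency introduces gaps (NaN) on the new dates
daily = s.resample('D').asfreq()

# Two common ways to fill those gaps
filled = s.resample('D').ffill()              # carry the last value forward
interpolated = s.resample('D').interpolate()  # linear interpolation

# Downsampling: several summary statistics per week in one call
weekly = s.resample('W').agg(['min', 'max', 'sum'])

print(daily, filled, interpolated, weekly, sep='\n\n')
```

Forward filling suits values that stay constant until the next observation, while interpolation suits quantities that change smoothly; which one is appropriate depends on the data.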

# 2.Introduction to Rolling Operations <a class = 'anchor' id = roll_intro></a>
The rolling operations are useful for calculating rolling statistics or aggregations, such as **rolling mean, rolling sum, rolling standard deviation, etc**. These operations allow you to analyze data **trends** over time or **to smooth out noisy data** by calculating aggregated values over a specified window.


- Rolling operations in pandas refer to performing calculations over a sliding window of data points in a time series or a moving window of data points in a numerical series. The `rolling()` function in pandas is used to perform rolling operations on a pandas dataframe or a pandas series.
- The `rolling()` function generates a rolling object which can be used to apply various mathematical and statistical functions over a rolling window of data points. The size of the rolling window can be specified using the window parameter of the `rolling()` function.
- We can then apply various aggregation functions, such as **`mean()`, `sum()`, `std()`, `min()`, `max()`**, etc., to the Rolling object to compute the desired rolling statistic.
#### Rolling Sum <a class = 'anchor' id = 'rsum'></a>
The rolling sum is the sum of the values in a rolling window. The `rolling()` function in pandas can be used to calculate the rolling sum of a pandas series or a pandas dataframe.

In [31]:
# create a pandas series
data = pd.Series([10, 20, 30, 40, 50, 60])

# calculate the rolling sum with window size 3
rolling_sum = data.rolling(window=3).sum()

rolling_sum

0      NaN
1      NaN
2     60.0
3     90.0
4    120.0
5    150.0
dtype: float64

- We first create a pandas series data with values `10, 20, 30, 40, 50`, and `60`. 
- We then use the rolling() function to calculate the rolling sum of the series with a window size of 3. 
- The resulting rolling sum is `[NaN, NaN, 60.0, 90.0, 120.0, 150.0]`.
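
The first two results are `NaN` because a full window of 3 values is not yet available. A minimal sketch of two `rolling()` parameters that change this behavior, `min_periods` and `center`, on the same series:

```python
import pandas as pd

data = pd.Series([10, 20, 30, 40, 50, 60])

# min_periods=1 emits a result as soon as one observation is available,
# so the first two positions are partial sums instead of NaN
partial = data.rolling(window=3, min_periods=1).sum()

# center=True aligns each result with the middle of its window
centered = data.rolling(window=3, center=True).sum()

print(partial.tolist())
print(centered.tolist())
```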

#### Rolling Mean <a class = 'anchor' id = 'rmean'></a>
The rolling mean is the mean of the values in a rolling window. The `rolling()` function in pandas can be used to calculate the rolling mean of a pandas series or a pandas dataframe.

In [32]:
# create a pandas series
data = pd.Series([10, 20, 30, 40, 50, 60])

# calculate the rolling mean with window size 3
rolling_mean = data.rolling(window=3).mean()

print(rolling_mean)

0     NaN
1     NaN
2    20.0
3    30.0
4    40.0
5    50.0
dtype: float64


- We first create a pandas series data with values `10, 20, 30, 40, 50`, and `60`. 
- We then use the rolling() function to calculate the rolling mean of the series with a window size of 3. 
- The resulting rolling mean is `[NaN, NaN, 20.0, 30.0, 40.0, 50.0]`.

#### Rolling Rank <a class = 'anchor' id = 'rrank'></a>
The rolling rank is the rank of the values in a rolling window. The `rolling()` function in pandas can be used to calculate the rolling rank of a pandas series or a pandas dataframe.

In [33]:
# create a pandas series
data = pd.Series([10, 20, 30, 40, 50, 60])

# calculate the rolling rank with window size 3
rolling_rank = data.rolling(window=3).apply(lambda x: pd.Series(x).rank().values[-1])

print(rolling_rank)

0    NaN
1    NaN
2    3.0
3    3.0
4    3.0
5    3.0
dtype: float64


- We first create a pandas series data with values `10, 20, 30, 40, 50, and 60`. We then use the `rolling()` function to calculate the rolling rank of the series with a window size of 3. 
- The `apply()` method is used to apply a **lambda function** that calculates the rank of the values in the rolling window and returns the last rank value. 
- The resulting rolling rank is `[NaN, NaN, 3.0, 3.0, 3.0, 3.0]`.
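
Because the sample series is strictly increasing, the newest value always ranks highest in its window, which is why every result is `3.0`. The sketch below uses a made-up non-monotonic series to show varying ranks. It also uses the built-in `rank()` method on the rolling object, which, as far as the pandas documentation indicates, is available from pandas 1.4 onward; treat that version requirement as an assumption to verify for your installation.

```python
import pandas as pd

values = pd.Series([30, 10, 20, 50, 40])

# Built-in rolling rank: rank of the newest value within each 3-element window
builtin = values.rolling(window=3).rank()

# Same computation with apply(); raw=True passes NumPy arrays to the lambda
manual = values.rolling(window=3).apply(
    lambda w: pd.Series(w).rank().iloc[-1], raw=True)

print(builtin.tolist())
print(manual.tolist())
```

The built-in method avoids constructing a new Series for every window, which tends to matter on long series.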

- Rolling operations in pandas are useful for analyzing time series and numerical data. The `rolling()` function in pandas can be used to calculate the `rolling sum`, `rolling rank`, and other **`rolling statistics`**. 
- By specifying the window size, we can control the size of the rolling window and the number of data points included in the rolling calculation.

In the same way, we can perform other rolling operations.

In [34]:
# Calculate rolling sum with a window size of 3
rolling_sum = data.rolling(window=3).sum()
rolling_sum

0      NaN
1      NaN
2     60.0
3     90.0
4    120.0
5    150.0
dtype: float64

In [35]:
# Calculate rolling standard deviation with a window size of 4
rolling_std = data.rolling(window=4).std()
print(rolling_std)

0          NaN
1          NaN
2          NaN
3    12.909944
4    12.909944
5    12.909944
dtype: float64


In [36]:
# Calculate rolling minimum with a window size of 2
rolling_min = data.rolling(window=2).min()
print(rolling_min)

0     NaN
1    10.0
2    20.0
3    30.0
4    40.0
5    50.0
dtype: float64


In [37]:
# Calculate rolling maximum with a window size of 3
rolling_max = data.rolling(window=3).max()
print(rolling_max)

0     NaN
1     NaN
2    30.0
3    40.0
4    50.0
5    60.0
dtype: float64
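
All windows above are defined by a number of rows. When the index holds dates, `rolling()` also accepts a time offset as the window. The following sketch uses a made-up irregular series to contrast the two; with an offset such as `'2D'`, each window covers the two days ending at the current timestamp.

```python
import pandas as pd

# Hypothetical irregular series: note that 2023-04-04 is missing
idx = pd.to_datetime(['2023-04-01', '2023-04-02', '2023-04-03', '2023-04-05'])
ts = pd.Series([1, 2, 3, 4], index=idx)

# '2D' covers the two days ending at each timestamp, so the number of rows
# per window depends on the dates rather than on row positions
by_time = ts.rolling('2D').sum()

# A fixed-size window of 2 rows ignores the gap in the dates
by_rows = ts.rolling(window=2).sum()

print(by_time.tolist())
print(by_rows.tolist())
```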


# 3.Shifting and Lagging <a class = 'anchor' id = sl></a>  
- A **`shifting`** method (or function) refers to the process of moving the values in a column or a time series (time dependent data points) by a specified number of periods. 
- It allows us to create new columns based on the shifted values of existing columns or perform calculations based on the lagged values of a time series.

- The **`shift()`** function in pandas is used to **shift** the values in a Series or DataFrame. It takes an optional parameter **periods** that specifies the number of periods to shift by.
- If periods is **positive**, the values are **shifted forward** in time (down the column), while a **negative** value of **`periods` shifts the values backward** in time (up the column).

In [38]:
import pandas as pd

data = {'A': [1, 2, 3, 4, 5]}
df = pd.DataFrame(data)

# Shift values in column 'A' by 1 period forward
df['Shifted'] = df['A'].shift(1)

df

Unnamed: 0,A,Shifted
0,1,
1,2,1.0
2,3,2.0
3,4,3.0
4,5,4.0


The values in column **`A`** are shifted by **1 period forward**, resulting in a new column **`'Shifted'`** where each value is the previous value of **`A`**.

In [39]:
# Shift values in column 'A' by 2 periods forward
df['Shifted'] = df['A'].shift(2)
df

Unnamed: 0,A,Shifted
0,1,
1,2,
2,3,1.0
3,4,2.0
4,5,3.0


In [40]:
# Shift values in column 'A' by 1 period backward (periods=-1)
df['Shifted'] = df['A'].shift(-1)
df

Unnamed: 0,A,Shifted
0,1,2.0
1,2,3.0
2,3,4.0
3,4,5.0
4,5,


In [41]:
# Shift values in column 'A' by 2 periods backward (periods=-2)
df['Shifted'] = df['A'].shift(-2)
df

Unnamed: 0,A,Shifted
0,1,3.0
1,2,4.0
2,3,5.0
3,4,
4,5,


Shifting and lagging in data science and machine learning are applied for **time series forecasting, feature engineering, and temporal analysis, allowing for prediction, extraction of informative features, and identification of patterns and trends in data**.
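
As a small illustration of the feature engineering use case, the sketch below builds a lag feature and derives changes from it. The DataFrame and column names are made up for this example; `shift()`, `diff()`, and `pct_change()` are standard pandas methods.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 4, 8, 16]})

# Lag feature: the previous observation
df['lag_1'] = df['A'].shift(1)

# Period-over-period change; equivalent to df['A'].diff()
df['change'] = df['A'] - df['A'].shift(1)

# Relative change; equivalent to df['A'].pct_change()
df['pct'] = df['A'] / df['A'].shift(1) - 1

# fill_value replaces the NaN introduced at the edge
df['lag_1_filled'] = df['A'].shift(1, fill_value=0)

print(df)
```

Using `fill_value` keeps the integer dtype, whereas the default `NaN` forces the shifted column to float.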