### **Time Series Basics in Pandas**
Working with **time series data** is a fundamental skill in data analysis, especially for applications involving trends, forecasting, and time-based insights.
Pandas provides extensive functionality for handling and analyzing time-indexed data.

---
##### ➡️ **Importance of Time Series Analysis**
| Objective                                       | Description                                                  |
| ----------------------------------------------- | ------------------------------------------------------------ |
| **Analyzing trends over time**                  | Observe patterns or long-term movements in data.             |
| **Forecasting future values**                   | Use past data to predict future behavior.                    |
| **Identifying seasonal patterns**               | Detect periodic fluctuations (e.g., daily, monthly, yearly). |
| **Aggregating data at different time scales**   | Summarize data by day, month, quarter, or year.              |
| **Aligning and comparing multiple time series** | Match and analyze datasets based on their timestamps.        |


Time series handling forms the foundation for advanced analytics such as **trend analysis**, **forecasting**, and **seasonal decomposition**.

🔹 **Suppose you are given hourly temperature data for a city over a year and need to perform the following analyses:**
* Calculate the **average daily temperature**.
* Find the **hottest** and **coldest** months.
* Create a **weekly temperature trend** showing the **min, mean, and max** temperatures.

In [6]:
import pandas as pd
import numpy as np

# Generate hourly temperature data for a year
date_range = pd.date_range(start='2023-01-01', end='2023-12-31 23:00:00', freq='h')
hourly_temp = 15 + 10 * np.sin(np.arange(len(date_range)) * 2 * np.pi / (365 * 24)) + np.random.randn(len(date_range)) * 3

df = pd.DataFrame({'Timestamp': date_range, 'Temperature': hourly_temp})
df.set_index('Timestamp', inplace=True)

# 1. Average daily temperature
daily_avg = df.resample('D').mean()
print(f"Average Daily Temperature (first 5 days):\n{daily_avg.head()}\n")

# 2. Hottest and coldest months
monthly_avg = df.resample('ME').mean()

hottest_month = monthly_avg['Temperature'].idxmax().strftime('%B')
coldest_month = monthly_avg['Temperature'].idxmin().strftime('%B')

print(
    f"\nHottest month: {hottest_month}\n"
    f"Coldest month: {coldest_month}"
    )

# 3. Weekly temperature trend
weekly_stats = df.resample('W').agg({'Temperature': ['min', 'mean', 'max']})
weekly_stats.columns = ['Min Temp', 'Mean Temp', 'Max Temp']

print(f"\nWeekly Temperature Trend (first 5 weeks):\n{weekly_stats.head()}")

Average Daily Temperature (first 5 days):
            Temperature
Timestamp              
2023-01-01    14.849217
2023-01-02    15.257481
2023-01-03    15.156089
2023-01-04    15.197810
2023-01-05    15.980859


Hottest month: April
Coldest month: October

Weekly Temperature Trend (first 5 weeks):
             Min Temp  Mean Temp   Max Temp
Timestamp                                  
2023-01-01   9.762382  14.849217  19.213801
2023-01-08   9.231402  15.796064  24.138616
2023-01-15   8.804393  17.191361  27.488612
2023-01-22   6.630788  17.925737  24.947905
2023-01-29  12.621506  19.292890  25.967869


### **Date Range and Frequency in Pandas**
A **date range** in Pandas is a sequence of evenly spaced timestamps, stored as a `DatetimeIndex`.
It is created with the **`pd.date_range()`** function, which generates dates or timestamps at a fixed frequency.

---
##### ➡️ **Key Parameters of `pd.date_range()`**

| Parameter   | Description                                            |
| ----------- | ------------------------------------------------------ |
| **start**   | The start date of the range                            |
| **end**     | The end date of the range                              |
| **periods** | The number of date points to generate                  |
| **freq**    | The frequency of the date range (daily, monthly, etc.) |
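
The sketch below shows how these parameters combine. Pandas expects three of the four to pin down a range, and `freq` falls back to daily when omitted. The dates are arbitrary and chosen only for illustration.

```python
import pandas as pd

# start + end: every calendar day in the interval (freq defaults to 'D')
by_bounds = pd.date_range(start='2023-01-01', end='2023-01-05')

# start + periods: five daily timestamps counted forward from start
by_start = pd.date_range(start='2023-01-01', periods=5, freq='D')

# end + periods: five daily timestamps counted backward from end
by_end = pd.date_range(end='2023-01-05', periods=5, freq='D')

# All three calls describe the same five days
print(by_bounds.equals(by_start), by_start.equals(by_end))
```

Supplying `periods` instead of `end` is handy when the number of observations is known but the final timestamp is not.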

---
##### ➡️ **Understanding Frequency (`freq`)**
Frequency defines the time interval between each date or time in the generated range.
Pandas supports many **frequency aliases** for working with time-based data at various granularities.

| Alias            | Frequency Description  |
| ---------------- | ---------------------- |
| **D**            | Calendar day frequency |
| **B**            | Business day frequency |
| **W**            | Weekly frequency       |
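| **ME**           | Month-end frequency    |
| **QE**           | Quarter-end frequency  |
| **YE**           | Year-end frequency     |
| **h**            | Hourly frequency       |
| **min**          | Minute frequency       |
| **s**            | Second frequency       |

The older spellings `M`, `Q`, `Y`, `H`, `T`, and `S` are deprecated as of pandas 2.2 in favor of the aliases above.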
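
Aliases can also carry a multiple or a weekday anchor. The following is a small sketch of both forms; the start date and variable names are illustrative only.

```python
import pandas as pd

# A number in front of the alias widens the step: every 15 minutes
quarter_hours = pd.date_range(start='2023-01-01', periods=4, freq='15min')

# Every second calendar day
alternate_days = pd.date_range(start='2023-01-01', periods=3, freq='2D')

# An anchored weekly alias picks the weekday: 'W-MON' lands on Mondays
mondays = pd.date_range(start='2023-01-01', periods=3, freq='W-MON')

print(quarter_hours)
print(alternate_days)
print(mondays)
```

Because 2023-01-01 is a Sunday, the `'W-MON'` range starts on the following day, 2023-01-02.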

➡️ **`Example 1:` Creating a daily date range for a month**


In [10]:
daily_range = pd.date_range(start='2023-01-01', end='2023-01-31', freq='D')
print(f"Daily Date Range:\n{daily_range}")

Daily Date Range:
DatetimeIndex(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04',
               '2023-01-05', '2023-01-06', '2023-01-07', '2023-01-08',
               '2023-01-09', '2023-01-10', '2023-01-11', '2023-01-12',
               '2023-01-13', '2023-01-14', '2023-01-15', '2023-01-16',
               '2023-01-17', '2023-01-18', '2023-01-19', '2023-01-20',
               '2023-01-21', '2023-01-22', '2023-01-23', '2023-01-24',
               '2023-01-25', '2023-01-26', '2023-01-27', '2023-01-28',
               '2023-01-29', '2023-01-30', '2023-01-31'],
              dtype='datetime64[ns]', freq='D')


➡️ **`Example 2:` Creating a business-day range for a month**

In [9]:
business_range = pd.date_range(start='2023-01-01', end='2023-01-31', freq='B')
print(f"Business Daily Range:\n{business_range}")

Business Daily Range:
DatetimeIndex(['2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05',
               '2023-01-06', '2023-01-09', '2023-01-10', '2023-01-11',
               '2023-01-12', '2023-01-13', '2023-01-16', '2023-01-17',
               '2023-01-18', '2023-01-19', '2023-01-20', '2023-01-23',
               '2023-01-24', '2023-01-25', '2023-01-26', '2023-01-27',
               '2023-01-30', '2023-01-31'],
              dtype='datetime64[ns]', freq='B')


➡️ **`Example 3:` Creating a weekly range for a year**

In [11]:
weekly_range = pd.date_range(start='2023-01-01', end='2023-12-31', freq='W')
print(f"Weekly Range:\n{weekly_range}")

Weekly Range:
DatetimeIndex(['2023-01-01', '2023-01-08', '2023-01-15', '2023-01-22',
               '2023-01-29', '2023-02-05', '2023-02-12', '2023-02-19',
               '2023-02-26', '2023-03-05', '2023-03-12', '2023-03-19',
               '2023-03-26', '2023-04-02', '2023-04-09', '2023-04-16',
               '2023-04-23', '2023-04-30', '2023-05-07', '2023-05-14',
               '2023-05-21', '2023-05-28', '2023-06-04', '2023-06-11',
               '2023-06-18', '2023-06-25', '2023-07-02', '2023-07-09',
               '2023-07-16', '2023-07-23', '2023-07-30', '2023-08-06',
               '2023-08-13', '2023-08-20', '2023-08-27', '2023-09-03',
               '2023-09-10', '2023-09-17', '2023-09-24', '2023-10-01',
               '2023-10-08', '2023-10-15', '2023-10-22', '2023-10-29',
               '2023-11-05', '2023-11-12', '2023-11-19', '2023-11-26',
               '2023-12-03', '2023-12-10', '2023-12-17', '2023-12-24',
               '2023-12-31'],
              dtype='datetime64[ns]', freq='W-SUN')

➡️ **`Example 4:` Creating a monthly range for a year**

In [13]:
monthly_range = pd.date_range(start='2023-01-01', end='2023-12-31', freq='ME')
print(f"Monthly Range:\n{monthly_range}")

Monthly Range:
DatetimeIndex(['2023-01-31', '2023-02-28', '2023-03-31', '2023-04-30',
               '2023-05-31', '2023-06-30', '2023-07-31', '2023-08-31',
               '2023-09-30', '2023-10-31', '2023-11-30', '2023-12-31'],
              dtype='datetime64[ns]', freq='ME')


➡️ **`Example 5:` Creating an hourly range for a day**

In [15]:
hourly_range = pd.date_range(start='2023-01-01', periods=24, freq='h')
print(f"Hourly Range:\n{hourly_range}")

Hourly Range:
DatetimeIndex(['2023-01-01 00:00:00', '2023-01-01 01:00:00',
               '2023-01-01 02:00:00', '2023-01-01 03:00:00',
               '2023-01-01 04:00:00', '2023-01-01 05:00:00',
               '2023-01-01 06:00:00', '2023-01-01 07:00:00',
               '2023-01-01 08:00:00', '2023-01-01 09:00:00',
               '2023-01-01 10:00:00', '2023-01-01 11:00:00',
               '2023-01-01 12:00:00', '2023-01-01 13:00:00',
               '2023-01-01 14:00:00', '2023-01-01 15:00:00',
               '2023-01-01 16:00:00', '2023-01-01 17:00:00',
               '2023-01-01 18:00:00', '2023-01-01 19:00:00',
               '2023-01-01 20:00:00', '2023-01-01 21:00:00',
               '2023-01-01 22:00:00', '2023-01-01 23:00:00'],
              dtype='datetime64[ns]', freq='h')


➡️ **`Task:` Create a DataFrame with a daily date range and add a 'Value' column populated from the given list**

In [19]:
import pandas as pd
import numpy as np

value = [5, 4, 3, 6, 7, 2, 8]

# Creating Date_range
date_range = pd.date_range(
    start='2024-01-01',
    end='2024-01-07',
    freq='D')

# Creating DataFrame
df = pd.DataFrame({
    'Date': date_range,
    'Value': value
})
print(df)

        Date  Value
0 2024-01-01      5
1 2024-01-02      4
2 2024-01-03      3
3 2024-01-04      6
4 2024-01-05      7
5 2024-01-06      2
6 2024-01-07      8


### **Resampling Time Series Data in Pandas**
**Resampling** is the process of changing the frequency of time series data, either reducing it (downsampling) or increasing it (upsampling).
It helps in trend analysis, smoothing, and aligning multiple time-based datasets.

---
##### **Why Resample?**
* **Downsampling** → Aggregate data to a lower frequency (e.g., daily → weekly)
* **Upsampling** → Increase frequency (e.g., daily → hourly)
* **Align** → Combine datasets with different frequencies
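
As a rough sketch of the alignment case, two series recorded at different frequencies can be brought to a common frequency before they are compared. The series names and values below are made up for illustration.

```python
import pandas as pd

# Hypothetical data: one reading every 12 hours, and one reading per day
half_daily = pd.Series(
    [10.0, 14.0, 12.0, 16.0],
    index=pd.date_range(start='2023-01-01', periods=4, freq='12h'),
)
daily = pd.Series(
    [100.0, 200.0],
    index=pd.date_range(start='2023-01-01', periods=2, freq='D'),
)

# Downsample the finer series to daily means so both share one index
half_daily_as_daily = half_daily.resample('D').mean()

# Place the two series side by side, matched on their timestamps
aligned = pd.concat(
    {'half_daily_mean': half_daily_as_daily, 'daily': daily}, axis=1
)
print(aligned)
```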
---
➡️ **Syntax:**
```python
df.resample(rule, on=None).agg_function()
```
| Parameter        | Description                                        |
| ---------------- | -------------------------------------------------- |
| **rule**         | Target frequency alias (e.g., `'D'`, `'W'`, `'ME'`) |
| **on**           | Column to use for resampling if not the index      |
| **agg_function** | Aggregation like `sum()`, `mean()`, `max()`, etc.  |
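
When the timestamps sit in an ordinary column rather than the index, `on` points `resample` at that column. A minimal sketch, assuming a hypothetical `sales` frame with `Date` and `Value` columns:

```python
import pandas as pd

# Hypothetical frame where the timestamps live in a regular column
sales = pd.DataFrame({
    'Date': pd.date_range(start='2024-01-01', periods=14, freq='D'),
    'Value': range(1, 15),
})

# on='Date' tells resample which column holds the timestamps
weekly_total = sales.resample('W', on='Date')['Value'].sum()
print(weekly_total)
```

The data starts on a Monday, so the 14 days fall into two complete weeks labeled by their closing Sundays, 2024-01-07 and 2024-01-14.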

---
➡️ **Other Common Aggregations**
```python
weekly_mean = daily_data.resample('W').mean()
weekly_max = daily_data.resample('W').max()
```
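
Several aggregations can also be computed in one pass by handing a list to `agg()`. The sketch below uses a small hypothetical series (`daily_values`) so the numbers are easy to check by hand.

```python
import pandas as pd

# Hypothetical daily series: the values 1..14 over two Monday-to-Sunday weeks
daily_values = pd.Series(
    range(1, 15),
    index=pd.date_range(start='2024-01-01', periods=14, freq='D'),
)

# Passing a list to agg() returns one column per aggregation
weekly_summary = daily_values.resample('W').agg(['min', 'mean', 'max'])
print(weekly_summary)
```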
---
➡️ **Upsampling Example:** Increasing the frequency creates new time slots with no data, which can then be filled:
```python
upsampled = daily_data.resample('h').ffill()  # Forward fill
```
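
Forward fill is only one way to populate the new slots. The sketch below compares it with leaving the gaps empty and with linear interpolation, using a hypothetical three-day series upsampled to 12-hour steps.

```python
import pandas as pd

# Hypothetical daily readings
daily_readings = pd.Series(
    [10.0, 20.0, 30.0],
    index=pd.date_range(start='2024-01-01', periods=3, freq='D'),
)

# Upsample to 12-hour steps; asfreq() leaves the new slots as NaN
with_gaps = daily_readings.resample('12h').asfreq()

# Forward fill repeats the last known value into each gap
forward_filled = daily_readings.resample('12h').ffill()

# Linear interpolation draws a straight line between known values
interpolated = with_gaps.interpolate(method='linear')

print(pd.DataFrame({
    'asfreq': with_gaps,
    'ffill': forward_filled,
    'interpolate': interpolated,
}))
```

Forward fill suits values that hold until the next observation, while interpolation is usually a better fit for quantities that change gradually.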

➡️ **`Example 1:` Downsampling from daily to monthly**

In [27]:
# Create a sample daily time series data
dates = pd.date_range(start='2023-01-01', end='2023-12-31', freq='D')
daily_data = pd.Series(np.random.randn(len(dates)), index=dates)

print(f"Original Daily Data (first 5 rows):\n{daily_data.head()}\n")

# Example 1: Downsampling from daily to monthly
monthly_data = daily_data.resample('ME').mean()
print(f"Downsampled to Monthly Data:\n{monthly_data.head()}")

Original Daily Data (first 5 rows):
2023-01-01   -0.118694
2023-01-02   -0.327641
2023-01-03    1.602167
2023-01-04    0.666203
2023-01-05    0.705005
Freq: D, dtype: float64

Downsampled to Monthly Data:
2023-01-31    0.186668
2023-02-28    0.091677
2023-03-31    0.133282
2023-04-30   -0.033500
2023-05-31   -0.103888
Freq: ME, dtype: float64


➡️ **`Example 2:` Downsampling (Daily → Weekly)**

In [26]:
weekly_sum = daily_data.resample('W').sum()
print(f"Weekly Sum Data:\n{weekly_sum.head()}")

Weekly Sum Data:
2023-01-01   -1.740572
2023-01-08    2.374792
2023-01-15   -0.441526
2023-01-22   -2.911413
2023-01-29    2.155111
Freq: W-SUN, dtype: float64
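
The `W-SUN` in the output reflects that `'W'` bins end on Sundays and are labeled by that closing date, which is why the first bin above covers only 2023-01-01. A small sketch with hypothetical values shows how a different anchor shifts the bins:

```python
import pandas as pd

# Eight days of hypothetical values, starting on Sunday 2023-01-01
eight_days = pd.Series(
    range(1, 9),
    index=pd.date_range(start='2023-01-01', periods=8, freq='D'),
)

# 'W' is shorthand for 'W-SUN': each bin is labeled by the Sunday that ends it
weeks_ending_sunday = eight_days.resample('W').sum()

# 'W-SAT' ends each bin on Saturday instead
weeks_ending_saturday = eight_days.resample('W-SAT').sum()

print(weeks_ending_sunday)
print(weeks_ending_saturday)
```

With the Sunday anchor the first bin holds a single day; with the Saturday anchor the first seven days fall into one bin.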


➡️ **`Example 3:` Upsampling from monthly to daily**

In [29]:
# Upsampling from monthly to daily
monthly_data_sparse = pd.Series([1, 2, 3, 4], index=pd.date_range(start='2023-01-31', end='2023-04-30', freq='ME'))
daily_data_filled = monthly_data_sparse.resample('D').ffill()

print(f"Upsampled from Monthly to Daily (Forward Fill):\n{daily_data_filled.head(10)}")

Upsampled from Monthly to Daily (Forward Fill):
2023-01-31    1
2023-02-01    1
2023-02-02    1
2023-02-03    1
2023-02-04    1
2023-02-05    1
2023-02-06    1
2023-02-07    1
2023-02-08    1
2023-02-09    1
Freq: D, dtype: int64


➡️ **`Example 4:` Weekly and monthly average prices from daily price data**

In [34]:
import pandas as pd

data = {
    'Date': [
        '2023-01-03', '2023-01-04', '2023-01-05', '2023-01-06', '2023-01-09', '2023-01-10', '2023-01-11', '2023-01-12', '2023-01-13',
        '2023-01-17', '2023-01-18', '2023-01-19', '2023-01-20', '2023-01-23', '2023-01-24', '2023-01-25', '2023-01-26', '2023-01-27',
        '2023-01-30', '2023-01-31', '2023-02-01', '2023-02-02', '2023-02-03', '2023-02-06', '2023-02-07', '2023-02-08', '2023-02-09',
        '2023-02-10', '2023-02-13', '2023-02-14', '2023-02-15', '2023-02-16', '2023-02-17', '2023-02-21', '2023-02-22', '2023-02-23',
        '2023-02-24', '2023-02-27', '2023-02-28', '2023-03-01', '2023-03-02', '2023-03-03', '2023-03-06', '2023-03-07', '2023-03-08',
        '2023-03-09', '2023-03-10', '2023-03-13', '2023-03-14', '2023-03-15', '2023-03-16', '2023-03-17', '2023-03-20', '2023-03-21',
        '2023-03-22', '2023-03-23', '2023-03-24', '2023-03-27', '2023-03-28', '2023-03-29', '2023-03-30', '2023-03-31'
    ],
    'Price': [
        130.28, 126.36, 125.02, 129.62, 130.15, 130.73, 133.49, 133.41, 134.76, 135.94, 135.21, 135.27, 133.42, 135.87, 143.00,
        141.86, 143.96, 141.99, 142.92, 144.29, 145.43, 150.82, 154.50, 151.73, 154.65, 151.92, 150.87, 151.01, 153.20, 153.71,
        155.33, 153.71, 152.55, 148.91, 148.91, 149.40, 146.71, 147.92, 147.41, 145.31, 150.59, 150.09, 153.83, 151.60, 152.87,
        150.59, 148.50, 150.47, 151.83, 152.22, 155.85, 155.00, 157.40, 159.28, 157.83, 158.93, 159.28, 158.28, 157.65, 160.77,
        160.19, 160.03
    ]
}

df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)

# Weekly average prices
weekly_resample = df.resample('W').mean()
print(f"Weekly Average Price:\n{weekly_resample}\n")

# Monthly average prices
monthly_resample = df.resample('ME').mean()
print(f"Monthly average prices:\n{monthly_resample}")


Weekly Average Price:
               Price
Date                
2023-01-08  127.8200
2023-01-15  132.5080
2023-01-22  134.9600
2023-01-29  141.3360
2023-02-05  147.5920
2023-02-12  152.0360
2023-02-19  153.7000
2023-02-26  148.4825
2023-03-05  148.2640
2023-03-12  151.4780
2023-03-19  153.0740
2023-03-26  158.5440
2023-04-02  159.3840

Monthly average prices:
                 Price
Date                  
2023-01-31  135.377500
2023-02-28  150.983684
2023-03-31  154.712609
