<a href="https://colab.research.google.com/github/BENMEZIAN/Pandas-tutorial/blob/main/Time_series_in_pandas.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [7]:
import pandas as pd

# Creating a time series using date_range
date_range = pd.date_range(start='2025-01-01', end='2025-01-10', freq='D')
#print(date_range)

# Example dataframe with time series
df = pd.DataFrame({
    'date': date_range,
    'value': range(10)
})

# Time series indexing
df.set_index('date', inplace=True)

#df.reset_index(inplace=True)

print(df)

# Convert date type column to datetime
# df['date'] = pd.to_datetime(df['date'])

df.dtypes

# Resample to several frequencies and sum 'value' within each bucket
df_resampled_month = df.resample('ME').sum()  # 'ME' = month-end frequency (pandas >= 2.2; formerly 'M')
df_resampled_day = df.resample('D').sum()  # 'D' = daily frequency
df_resampled_week = df.resample('W').sum()  # 'W' = weekly frequency, weeks ending on Sunday
df_resampled_year = df.resample('YE').sum()  # 'YE' = year-end frequency (pandas >= 2.2; formerly 'Y')

# print them
print("Sample with month :")
print(df_resampled_month)
print("----------------------------")
print("Sample with day :")
print(df_resampled_day)
print("----------------------------")
print("Sample with week :")
print(df_resampled_week)
print("----------------------------")
print("Sample with year :")
print(df_resampled_year)
print("----------------------------")


            value
date             
2025-01-01      0
2025-01-02      1
2025-01-03      2
2025-01-04      3
2025-01-05      4
2025-01-06      5
2025-01-07      6
2025-01-08      7
2025-01-09      8
2025-01-10      9
Sample with month :
            value
date             
2025-01-31     45
----------------------------
Sample with day :
            value
date             
2025-01-01      0
2025-01-02      1
2025-01-03      2
2025-01-04      3
2025-01-05      4
2025-01-06      5
2025-01-07      6
2025-01-08      7
2025-01-09      8
2025-01-10      9
----------------------------
Sample with week :
            value
date             
2025-01-05     10
2025-01-12     35
----------------------------
Sample with year :
            value
date             
2025-12-31     45
----------------------------


In [None]:
"""
# prompt: Would you like to explain to me how the sampling is performed?

The provided code uses pandas' `resample()` function to perform downsampling of a time series. Let's break down how the sampling is done for each frequency:

**Understanding `resample()`**

The `resample()` function groups data into specified time intervals (e.g., monthly, weekly, daily) and then applies an aggregation function (e.g., `sum()`, `mean()`, `max()`) to each group. In this code, the aggregation function is `sum()`.

**Different Frequencies**

1. **`'ME'` (Month End):**
   - Groups the daily data into calendar-month buckets and labels each bucket with the last day of that month (here 2025-01-31).
   - `sum()` adds up the 'value' column within each month. Because the data only spans January 1st to January 10th, the single January bucket holds a partial-month total: 0 + 1 + ... + 9 = 45.

2. **`'D'` (Day):**
   - The data is already daily, so 'D' does not change the granularity; each bucket holds exactly one row.
   - `sum()` therefore returns the original daily 'value' unchanged.

3. **`'W'` (Week):**
   - Groups the daily data into weekly buckets. `'W'` is shorthand for `'W-SUN'`: each week runs Monday through Sunday and is labeled with its closing Sunday.
   - `sum()` calculates the sum of the 'value' column for each week. The first bucket (labeled 2025-01-05) holds 2025-01-01 (Wednesday) through 2025-01-05 (Sunday), giving 10; the second (labeled 2025-01-12) holds 2025-01-06 through 2025-01-10, giving 35.


4. **`'YE'` (Year End):**
   - Groups the daily data into yearly buckets.
   - `sum()` calculates the sum of the 'value' column for the entire year.  Since the data is for a small period in January 2025, the sum will correspond to the entire dataset.


**In summary:** `resample()` groups rows into time intervals and applies the chosen aggregation to each group. The frequency aliases (`'ME'`, `'D'`, `'W'`, `'YE'`) define those intervals. The aggregation applies to every column of the DataFrame; here that is only `value`, because `date` is the index and serves solely to decide which rows fall into which interval.
"""
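
The weekly bucketing described above can be checked directly. The following is a minimal sketch that rebuilds the same ten-day frame and lists which dates land in each `'W'` bucket; the variable names `weekly` and `members` are chosen here for illustration and do not appear in the original notebook.

```python
import pandas as pd

# Same ten daily rows as in the first cell
idx = pd.date_range(start='2025-01-01', end='2025-01-10', freq='D', name='date')
df = pd.DataFrame({'value': range(10)}, index=idx)

# 'W' is shorthand for 'W-SUN': every bucket is labeled by the Sunday that closes it
weekly = df.resample('W')['value'].agg(['sum', 'count'])
print(weekly)

# Iterate over the resampler to see the first and last date inside each bucket
members = {}
for label, group in df.resample('W'):
    members[label.date()] = (group.index[0].date(), group.index[-1].date())
    print(label.date(), '<-', group.index[0].date(), 'to', group.index[-1].date())
```

Adding `count` next to `sum` makes partial buckets visible: both weeks contain only five rows, even though a full week would contain seven.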

### Get Day of Week, Week of Year and Leap Year

In [13]:
import pandas as pd

# create a dataframe with a datetime column
df = pd.DataFrame({'datetime': ['2021-01-01', '2024-02-02', '2023-03-03']})

# convert the 'datetime' column to datetime type
df['datetime'] = pd.to_datetime(df['datetime'])

# get the day of the week
df['day_of_week'] = df['datetime'].dt.day_name()

# get the week of the year
df['week_of_year'] = df['datetime'].dt.isocalendar().week

# check for leap year
df['leap_year'] = df['datetime'].dt.is_leap_year

print(df)

    datetime day_of_week  week_of_year  leap_year
0 2021-01-01      Friday            53      False
1 2024-02-02      Friday             5       True
2 2023-03-03      Friday             9      False
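
The value 53 for 2021-01-01 looks odd at first glance. It arises because `isocalendar()` follows ISO 8601 week numbering, where the first days of January can belong to the last week of the previous ISO year. A small sketch, using the same three dates, that puts the ISO year next to the calendar year (the names `dates` and `iso` are illustrative):

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2021-01-01', '2024-02-02', '2023-03-03']))

# isocalendar() returns a DataFrame with the columns 'year', 'week' and 'day'
iso = dates.dt.isocalendar()
print(iso)

# Calendar year versus ISO year: they differ for 2021-01-01
print(dates.dt.year.tolist())
print(iso['year'].tolist())
```

When grouping by week across year boundaries, pairing `iso['week']` with `iso['year']` rather than with the calendar year avoids mixing the first days of January into the wrong year.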


In [8]:
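# Rebuild the date-indexed frame from the first cell; df was reassigned
# to a frame without a DatetimeIndex in the cell above
date_range = pd.date_range(start='2025-01-01', end='2025-01-10', freq='D')
df = pd.DataFrame({'value': range(10)}, index=date_range.rename('date'))

# Extracting components from the datetime index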
df['year'] = df.index.year
df['month'] = df.index.month
df['day'] = df.index.day
print(df)

            value  year  month  day
date                               
2025-01-01      0  2025      1    1
2025-01-02      1  2025      1    2
2025-01-03      2  2025      1    3
2025-01-04      3  2025      1    4
2025-01-05      4  2025      1    5
2025-01-06      5  2025      1    6
2025-01-07      6  2025      1    7
2025-01-08      7  2025      1    8
2025-01-09      8  2025      1    9
2025-01-10      9  2025      1   10


In [10]:
# Create a time series with UTC time zone
df_utc = df.copy()
df_utc.index = df_utc.index.tz_localize('UTC')
print(df_utc)

# Convert to a different time zone (e.g., 'US/Eastern')
df_eastern = df_utc.copy()
df_eastern.index = df_eastern.index.tz_convert('US/Eastern')
print(df_eastern)

                           value  year  month  day
date                                              
2025-01-01 00:00:00+00:00      0  2025      1    1
2025-01-02 00:00:00+00:00      1  2025      1    2
2025-01-03 00:00:00+00:00      2  2025      1    3
2025-01-04 00:00:00+00:00      3  2025      1    4
2025-01-05 00:00:00+00:00      4  2025      1    5
2025-01-06 00:00:00+00:00      5  2025      1    6
2025-01-07 00:00:00+00:00      6  2025      1    7
2025-01-08 00:00:00+00:00      7  2025      1    8
2025-01-09 00:00:00+00:00      8  2025      1    9
2025-01-10 00:00:00+00:00      9  2025      1   10
                           value  year  month  day
date                                              
2024-12-31 19:00:00-05:00      0  2025      1    1
2025-01-01 19:00:00-05:00      1  2025      1    2
2025-01-02 19:00:00-05:00      2  2025      1    3
2025-01-03 19:00:00-05:00      3  2025      1    4
2025-01-04 19:00:00-05:00      4  2025      1    5
2025-01-05 19:00:00-05:00      
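
The difference between the two calls in this cell: `tz_localize` attaches a zone to naive timestamps without moving the clock, while `tz_convert` keeps the instant and changes the displayed wall-clock time. A minimal sketch on a three-day index; it assumes `'America/New_York'`, the canonical IANA name for the zone the notebook calls `'US/Eastern'`, and the variable names are illustrative.

```python
import pandas as pd

naive = pd.date_range('2025-01-01', periods=3, freq='D')

# Attach a zone: the wall-clock values stay 00:00, now interpreted as UTC
utc = naive.tz_localize('UTC')

# Convert: same instants, shown as local time five hours behind UTC in January
eastern = utc.tz_convert('America/New_York')
print(eastern[0])

# Element-wise comparison of tz-aware indexes compares instants, not wall-clock text
same_instant = bool((utc == eastern).all())
print(same_instant)

# Date components are read from the local wall-clock time after conversion
print(utc[0].year, eastern[0].year)
```

This also explains the output above: the `year`, `month` and `day` columns were extracted before the conversion, so they keep the UTC calendar values while the index shows the previous evening in Eastern time.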

In [11]:
# Handling Missing Data
# Creating a dataframe with missing values
date_range = pd.date_range(start='2025-01-01', end='2025-01-10', freq='D')
df = pd.DataFrame({
    'date': date_range,
    'value': [10, None, 30, None, 50, None, 70, None, 90, 100]
})
df.set_index('date', inplace=True)

# Forward fill to replace missing values
# (fillna(method='ffill') is deprecated and raises a FutureWarning; use ffill())
df_filled = df.ffill()
print(df_filled)

# Interpolation to fill missing values
df_interpolated = df.interpolate()
print(df_interpolated)

            value
date             
2025-01-01   10.0
2025-01-02   10.0
2025-01-03   30.0
2025-01-04   30.0
2025-01-05   50.0
2025-01-06   50.0
2025-01-07   70.0
2025-01-08   70.0
2025-01-09   90.0
2025-01-10  100.0
            value
date             
2025-01-01   10.0
2025-01-02   20.0
2025-01-03   30.0
2025-01-04   40.0
2025-01-05   50.0
2025-01-06   60.0
2025-01-07   70.0
2025-01-08   80.0
2025-01-09   90.0
2025-01-10  100.0
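
The two outputs above differ because forward fill copies the last known value, whereas `interpolate()` (linear by default) draws a straight line between the neighbouring known values. The sketch below uses a small made-up series, not the notebook's data, to show two edge cases: a leading gap, and a run of consecutive gaps capped with `limit`.

```python
import pandas as pd

idx = pd.date_range('2025-01-01', periods=5, freq='D', name='date')
# Illustrative data: one leading gap and two consecutive gaps in the middle
s = pd.Series([None, 10.0, None, None, 40.0], index=idx, name='value')

filled = s.ffill()           # the leading NaN stays: there is no earlier value to copy
limited = s.ffill(limit=1)   # fill at most one consecutive missing value per gap
linear = s.interpolate()     # evenly spaced steps between 10 and 40

print(pd.DataFrame({'ffill': filled, 'ffill_limit_1': limited, 'interpolate': linear}))
```

Neither method fills the first row here, so a remaining leading gap has to be handled separately, for example by dropping it or by back filling.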


