## **1. The datetime Data Type**
The key to unlocking Pandas' time series functionality is to ensure your date/time columns are of the correct **datetime64[ns]** data type.

In [1]:
import pandas as pd

# Load the data. read_csv leaves date columns as strings unless parse_dates is passed, so we convert manually.
df = pd.read_csv("daily_temperature_data.csv")
df.info() # Note that 'Date' is initially an 'object' (string)

# Convert the 'Date' column to datetime objects
df['Date'] = pd.to_datetime(df['Date'])
df.info() # Now 'Date' is datetime64[ns]

# Set the 'Date' column as the index. This is standard practice for time series analysis.
df.set_index('Date', inplace=True)
print("\n--- DataFrame with DatetimeIndex ---")
df

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15 entries, 0 to 14
Data columns (total 2 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Date         15 non-null     object 
 1   Temperature  15 non-null     float64
dtypes: float64(1), object(1)
memory usage: 372.0+ bytes
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15 entries, 0 to 14
Data columns (total 2 columns):
 #   Column       Non-Null Count  Dtype         
---  ------       --------------  -----         
 0   Date         15 non-null     datetime64[ns]
 1   Temperature  15 non-null     float64       
dtypes: datetime64[ns](1), float64(1)
memory usage: 372.0 bytes

--- DataFrame with DatetimeIndex ---


Date,Temperature
2023-01-01,35.2
2023-01-02,34.9
2023-01-03,36.1
2023-01-04,33.8
2023-01-05,37.0
2023-01-15,38.5
2023-01-16,39.1
2023-01-30,40.2
2023-01-31,41.5
2023-02-01,42.0
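
The conversion can also happen while the file is being read. The sketch below is a minimal illustration, not the notebook's own method: it assumes a CSV with the same two columns and uses an in-memory string (`csv_text`, a hypothetical stand-in) so it runs without any file on disk.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file; only the column names match the tutorial data.
csv_text = """Date,Temperature
2023-01-01,35.2
2023-01-02,34.9
2023-01-03,36.1
"""

# parse_dates converts the column while reading; index_col makes it the index.
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

print(type(sample.index).__name__)  # DatetimeIndex
print(sample)
```

This replaces the separate `pd.to_datetime` and `set_index` steps when the date format is unambiguous. For messy input, `pd.to_datetime(..., errors="coerce")` turns unparseable values into `NaT` instead of raising.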


## **2. Time-based Indexing and Slicing**
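Once you have a DatetimeIndex, you can select rows by passing date strings to .loc: a full date selects a single day, a partial string such as '2023-01' selects the whole month, and a slice selects a range that includes both endpoints.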

In [2]:
# Select a single day's data
print("\n--- Temperature on Jan 5, 2023 ---")
print(df.loc['2023-01-05'])


--- Temperature on Jan 5, 2023 ---
Temperature    37.0
Name: 2023-01-05 00:00:00, dtype: float64


In [3]:
# Select an entire month
print("\n--- All data for January 2023 ---")
df.loc['2023-01']


--- All data for January 2023 ---


Date,Temperature
2023-01-01,35.2
2023-01-02,34.9
2023-01-03,36.1
2023-01-04,33.8
2023-01-05,37.0
2023-01-15,38.5
2023-01-16,39.1
2023-01-30,40.2
2023-01-31,41.5


In [4]:
# Slice a range of dates
print("\n--- Data from Jan 15 to Feb 15, 2023 ---")
df.loc['2023-01-15':'2023-02-15']


--- Data from Jan 15 to Feb 15, 2023 ---


Date,Temperature
2023-01-15,38.5
2023-01-16,39.1
2023-01-30,40.2
2023-01-31,41.5
2023-02-01,42.0
2023-02-02,43.1
2023-02-15,44.5
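
Two details of label slicing are easy to miss: both endpoints are included, and the endpoints do not have to exist in the index as long as the index is sorted. A minimal sketch, using a small hypothetical Series (`temps`) with gaps between dates:

```python
import pandas as pd

# Hypothetical sample with gaps between dates, mirroring the shape of the tutorial data.
dates = pd.to_datetime(
    ["2023-01-05", "2023-01-15", "2023-01-31", "2023-02-01", "2023-02-15"]
)
temps = pd.Series([37.0, 38.5, 41.5, 42.0, 44.5], index=dates, name="Temperature")

# A partial string selects every row that falls inside that period.
january = temps.loc["2023-01"]

# Label slices include BOTH endpoints, unlike positional slices.
mid_range = temps.loc["2023-01-15":"2023-02-01"]

# Endpoints need not be present in the index (this relies on a sorted index).
no_exact_match = temps.loc["2023-01-10":"2023-01-20"]

print(len(january), len(mid_range), len(no_exact_match))  # 3 3 1
```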


## **3. The .dt Accessor**
Just like .str for strings, the .dt accessor allows you to easily extract properties from datetime objects in a Series.

In [5]:
# Reload the data and keep 'Date' as a regular column to demonstrate the .dt accessor
df_no_index = pd.read_csv("daily_temperature_data.csv")
df_no_index['Date'] = pd.to_datetime(df_no_index['Date'])

# Extract various properties
df_no_index['Year'] = df_no_index['Date'].dt.year
df_no_index['Month'] = df_no_index['Date'].dt.month
df_no_index['Day'] = df_no_index['Date'].dt.day
df_no_index['Day of Week'] = df_no_index['Date'].dt.day_name() # e.g., 'Monday'
df_no_index['Week of Year'] = df_no_index['Date'].dt.isocalendar().week

print("\n--- DataFrame with extracted date properties ---")
df_no_index.head()


--- DataFrame with extracted date properties ---


,Date,Temperature,Year,Month,Day,Day of Week,Week of Year
0,2023-01-01,35.2,2023,1,1,Sunday,52
1,2023-01-02,34.9,2023,1,2,Monday,1
2,2023-01-03,36.1,2023,1,3,Tuesday,1
3,2023-01-04,33.8,2023,1,4,Wednesday,1
4,2023-01-05,37.0,2023,1,5,Thursday,1
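
The extracted properties are ordinary Series, so they can feed filters and labels directly. A small sketch on three hypothetical dates, using only `.dt.dayofweek` and `.dt.strftime`:

```python
import pandas as pd

# Hypothetical dates: a Sunday, a Monday, and a Saturday.
dates = pd.Series(pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-07"]))

# dayofweek counts Monday as 0, so 5 and 6 are Saturday and Sunday.
is_weekend = dates.dt.dayofweek >= 5

# strftime formats each timestamp as a string, here as year-month labels.
month_labels = dates.dt.strftime("%Y-%m")

print(is_weekend.tolist())    # [True, False, True]
print(month_labels.tolist())  # ['2023-01', '2023-01', '2023-01']
```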


## **4. Resampling**
Resampling is the process of changing the time frequency of your data. This is one of the most powerful time series features.
- **Downsampling:** Aggregating data to a lower frequency (e.g., from daily to monthly). You must provide an aggregation function (like .mean(), .sum(), .max()).
- **Upsampling:** Converting data to a higher frequency (e.g., from daily to hourly). You must specify how to fill the new, empty data points (e.g., with .ffill() or .interpolate()).
- The .resample() method is used for this. It works very much like .groupby().
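
Upsampling is the less familiar direction, so here is a minimal sketch of it. It assumes a three-day hypothetical Series (`daily`) and a 12-hour target frequency chosen only for illustration:

```python
import pandas as pd

# Hypothetical daily readings on consecutive days.
daily = pd.Series(
    [35.2, 34.9, 36.1],
    index=pd.date_range("2023-01-01", periods=3, freq="D"),
    name="Temperature",
)

# Upsample to 12-hour steps; asfreq() leaves the new slots as NaN.
raw = daily.resample("12h").asfreq()

# Forward-fill: each new slot repeats the last known value.
filled = daily.resample("12h").ffill()

# Linear interpolation: each new slot lies between its neighbours.
interpolated = daily.resample("12h").interpolate()

print(raw.isna().sum())  # 2
print(filled.tolist())   # [35.2, 35.2, 34.9, 34.9, 36.1]
print(interpolated)
```

Forward-filling suits values that hold until the next reading; interpolation suits quantities that change gradually between readings.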

In [6]:
# Use the DataFrame with the DatetimeIndex
print("\n--- Original DataFrame ---")
df


--- Original DataFrame ---


Date,Temperature
2023-01-01,35.2
2023-01-02,34.9
2023-01-03,36.1
2023-01-04,33.8
2023-01-05,37.0
2023-01-15,38.5
2023-01-16,39.1
2023-01-30,40.2
2023-01-31,41.5
2023-02-01,42.0


In [7]:
# Downsample to monthly frequency, taking the mean temperature for each month
monthly_mean_temp = df['Temperature'].resample('ME').mean()
# 'ME' is the frequency string for 'Month End'. Others include 'W' (weekly) and 'QE' (quarter end).
# 'ME' and 'QE' require pandas 2.2 or later; older versions use 'M' and 'Q'.
print("\n--- Monthly Mean Temperature (Downsampled) ---")
monthly_mean_temp


--- Monthly Mean Temperature (Downsampled) ---


Date
2023-01-31    37.366667
2023-02-28    43.100000
2023-03-31    45.650000
Freq: ME, Name: Temperature, dtype: float64

In [8]:
# Downsample to weekly, taking the max temperature
weekly_max_temp = df['Temperature'].resample('W').max()
print("\n--- Weekly Max Temperature (Downsampled) ---")
weekly_max_temp


--- Weekly Max Temperature (Downsampled) ---


Date
2023-01-01    35.2
2023-01-08    37.0
2023-01-15    38.5
2023-01-22    39.1
2023-01-29     NaN
2023-02-05    43.1
2023-02-12     NaN
2023-02-19    44.5
2023-02-26     NaN
2023-03-05    46.0
Freq: W-SUN, Name: Temperature, dtype: float64
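
The NaN rows appear because resampling creates every bin between the first and last date, including weeks with no observations. A minimal sketch with two hypothetical readings two weeks apart shows the effect and two ways to handle it:

```python
import pandas as pd

# Hypothetical readings on two Mondays, with an empty week between them.
dates = pd.to_datetime(["2023-01-16", "2023-01-30"])
temps = pd.Series([39.1, 40.2], index=dates, name="Temperature")

# 'W' bins end on Sunday; the week with no data still gets a row, filled with NaN.
weekly_max = temps.resample("W").max()

# count() makes the empty bins explicit.
weekly_count = temps.resample("W").count()

# Drop the empty weeks when only observed periods matter.
observed_only = weekly_max.dropna()

print(weekly_max)
print(weekly_count.tolist())  # [1, 0, 1]
print(len(observed_only))     # 2
```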

## **5. Rolling Windows and Shifting**
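- **Rolling Windows (.rolling()):** Calculates an aggregation over a sliding window. An integer window (e.g., window=3) counts rows, not calendar days, so on data with gaps it spans whatever dates the last three observations fall on. Commonly used for moving averages that smooth out short-term noise.
- **Shifting (.shift()):** Moves the values forward or backward by a number of rows while the index stays fixed. Useful for comparing a value to the value from the previous observation.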
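
On data with missing days, row-based and time-based versions of these operations give different answers. The sketch below contrasts them on four hypothetical readings with a ten-day gap; the 2-row and 2-day window sizes are chosen only to keep the example short.

```python
import pandas as pd

# Hypothetical readings: two consecutive days, a gap, then two more consecutive days.
dates = pd.to_datetime(["2023-01-04", "2023-01-05", "2023-01-15", "2023-01-16"])
temps = pd.Series([33.8, 37.0, 38.5, 39.1], index=dates, name="Temperature")

# Integer window: always the last 2 rows, however far apart they are in time.
by_rows = temps.rolling(window=2).mean()

# Offset window: only rows within the 2 calendar days ending at each timestamp.
by_time = temps.rolling("2D").mean()

# shift(1) pairs each row with the previous ROW, which may be many days older.
previous_row = temps.shift(1)

# Shifting the index by one day and realigning gives the true previous-day value,
# with NaN wherever the day before was not observed.
yesterday = temps.shift(freq="D").reindex(temps.index)

print(pd.DataFrame({"by_rows": by_rows, "by_time": by_time}))
print(yesterday.isna().tolist())  # [True, False, True, False]
```

Note that the row-based average for 2023-01-15 mixes in a reading from ten days earlier, while the time-based window uses only that day's value.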

In [9]:
# Calculate the rolling average over the last 3 observations (3 rows, which is 3 days only where dates are consecutive)
df['3-Day Rolling Avg'] = df['Temperature'].rolling(window=3).mean()
print("\n--- DataFrame with 3-Day Rolling Average ---")
df # Notice the first two values are NaN as there's not enough data for the window


--- DataFrame with 3-Day Rolling Average ---


Date,Temperature,3-Day Rolling Avg
2023-01-01,35.2,
2023-01-02,34.9,
2023-01-03,36.1,35.4
2023-01-04,33.8,34.933333
2023-01-05,37.0,35.633333
2023-01-15,38.5,36.433333
2023-01-16,39.1,38.2
2023-01-30,40.2,39.266667
2023-01-31,41.5,40.266667
2023-02-01,42.0,41.233333


In [10]:
# Create a column holding the previous row's temperature (yesterday's only where dates are consecutive)
df['Yesterday\'s Temp'] = df['Temperature'].shift(1)
print("\n--- DataFrame with Yesterday's Temperature ---")
df


--- DataFrame with Yesterday's Temperature ---


Date,Temperature,3-Day Rolling Avg,Yesterday's Temp
2023-01-01,35.2,,
2023-01-02,34.9,,35.2
2023-01-03,36.1,35.4,34.9
2023-01-04,33.8,34.933333,36.1
2023-01-05,37.0,35.633333,33.8
2023-01-15,38.5,36.433333,37.0
2023-01-16,39.1,38.2,38.5
2023-01-30,40.2,39.266667,39.1
2023-01-31,41.5,40.266667,40.2
2023-02-01,42.0,41.233333,41.5


In [11]:
# Calculate the change in temperature from the previous observation
df['Daily Change'] = df['Temperature'] - df['Yesterday\'s Temp']
print("\n--- DataFrame with Daily Change ---")
df


--- DataFrame with Daily Change ---


Date,Temperature,3-Day Rolling Avg,Yesterday's Temp,Daily Change
2023-01-01,35.2,,,
2023-01-02,34.9,,35.2,-0.3
2023-01-03,36.1,35.4,34.9,1.2
2023-01-04,33.8,34.933333,36.1,-2.3
2023-01-05,37.0,35.633333,33.8,3.2
2023-01-15,38.5,36.433333,37.0,1.5
2023-01-16,39.1,38.2,38.5,0.6
2023-01-30,40.2,39.266667,39.1,1.1
2023-01-31,41.5,40.266667,40.2,1.3
2023-02-01,42.0,41.233333,41.5,0.5
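
The shift-then-subtract pattern is common enough that pandas provides `.diff()` for it, and `.pct_change()` for the relative version. A minimal sketch on four hypothetical consecutive readings:

```python
import pandas as pd

# Hypothetical readings on four consecutive days.
temps = pd.Series(
    [35.2, 34.9, 36.1, 33.8],
    index=pd.date_range("2023-01-01", periods=4, freq="D"),
    name="Temperature",
)

# The manual approach: subtract the previous row's value.
manual_change = temps - temps.shift(1)

# diff() computes the same row-to-row difference in one call.
change = temps.diff()

# pct_change() expresses the change as a fraction of the previous value.
relative_change = temps.pct_change()

print(pd.DataFrame({"manual": manual_change, "diff": change, "pct": relative_change}))
```

Like `.shift(1)`, both methods compare adjacent rows, so on data with gaps they measure change since the previous observation rather than since the previous day.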


## **Exercises**

**1. Date Range Generation and Indexing:**
- Create a date range for every day in May 2024 using pd.date_range(). (Hint: pd.date_range(start='2024-05-01', end='2024-05-31', freq='D')).
- Create a Pandas Series with this date range as the index and random integer values between 50 and 80 as the data. Name this series may_temps.
- From may_temps, select the temperature for May 10th, 2024.
- Select all temperatures for the second week of May (May 7th to May 13th).

In [12]:
import numpy as np

may_dates = pd.date_range(start ='2024-05-01', end='2024-05-31', freq='D')
may_temps = pd.Series(np.random.randint(50, 81, size = len(may_dates)), index=may_dates, name='Temperature')
may_temps.head()

2024-05-01    78
2024-05-02    54
2024-05-03    66
2024-05-04    73
2024-05-05    52
Freq: D, Name: Temperature, dtype: int32

In [13]:
temp_may_10 = may_temps['2024-05-10']
print(f"\n--- Temperature for May 10th, 2024: {temp_may_10} ---")

second_week_temps = may_temps['2024-05-07':'2024-05-13']

print("\n--- Temperatures for the Second Week of May (May 7th to May 13th) ---")
print(second_week_temps)


--- Temperature for May 10th, 2024: 62 ---

--- Temperatures for the Second Week of May (May 7th to May 13th) ---
2024-05-07    75
2024-05-08    71
2024-05-09    72
2024-05-10    62
2024-05-11    71
2024-05-12    76
2024-05-13    50
Freq: D, Name: Temperature, dtype: int32


**2. Resampling Practice:**
- Use the may_temps Series you just created.
- Resample the daily temperatures to a weekly frequency, calculating the mean, min, and max temperature for each week. Use the .agg() method.

In [14]:
weekly_summary_temps = may_temps.resample('W').agg(['mean', 'min', 'max'])

print("\n--- Weekly Temperature Summary (Mean, Min, Max) ---")
weekly_summary_temps


--- Weekly Temperature Summary (Mean, Min, Max) ---


,mean,min,max
2024-05-05,64.6,52,78
2024-05-12,69.0,56,76
2024-05-19,63.285714,50,79
2024-05-26,63.0,51,78
2024-06-02,64.4,52,80


In [15]:
weekly_summary_temps_named = may_temps.resample('W').agg(
    Weekly_Mean='mean',
    Weekly_Min='min',
    Weekly_Max='max'
)

print("\n--- Weekly Temperature Summary (with Custom Column Names) ---")
weekly_summary_temps_named


--- Weekly Temperature Summary (with Custom Column Names) ---


,Weekly_Mean,Weekly_Min,Weekly_Max
2024-05-05,64.6,52,78
2024-05-12,69.0,56,76
2024-05-19,63.285714,50,79
2024-05-26,63.0,51,78
2024-06-02,64.4,52,80


**3. Rolling Windows and Analysis:**
- Use the original df (loaded from daily_temperature_data.csv with a DatetimeIndex).
- Calculate the 7-day rolling standard deviation of the temperature. Standard deviation is a measure of volatility or variation. Store this in a new column called '7-Day Temp Volatility'.
- Find the date with the highest 7-day temperature volatility. (Hint: after calculating the column, you can use .idxmax() on that column to get the index/date of the maximum value).

In [17]:
# Reload the data to start from a clean DataFrame without the columns added in section 5
df = pd.read_csv("daily_temperature_data.csv")
df.info() # Note that 'Date' is initially an 'object' (string)

# Convert the 'Date' column to datetime objects
df['Date'] = pd.to_datetime(df['Date'])
df.info() # Now 'Date' is datetime64[ns]

# Set the 'Date' column as the index. This is standard practice for time series analysis.
df.set_index('Date', inplace=True)
print("\n--- DataFrame with DatetimeIndex ---")
df

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15 entries, 0 to 14
Data columns (total 2 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Date         15 non-null     object 
 1   Temperature  15 non-null     float64
dtypes: float64(1), object(1)
memory usage: 372.0+ bytes
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15 entries, 0 to 14
Data columns (total 2 columns):
 #   Column       Non-Null Count  Dtype         
---  ------       --------------  -----         
 0   Date         15 non-null     datetime64[ns]
 1   Temperature  15 non-null     float64       
dtypes: datetime64[ns](1), float64(1)
memory usage: 372.0 bytes

--- DataFrame with DatetimeIndex ---


Date,Temperature
2023-01-01,35.2
2023-01-02,34.9
2023-01-03,36.1
2023-01-04,33.8
2023-01-05,37.0
2023-01-15,38.5
2023-01-16,39.1
2023-01-30,40.2
2023-01-31,41.5
2023-02-01,42.0


In [18]:
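# Rolling standard deviation over the last 7 observations (rows, not calendar days); the first 6 results are NaN
df['7-Day Temp Volatility'] = df['Temperature'].rolling(window=7).std()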

print("\n--- DataFrame with 7-Day Temperature Volatility (first 10 rows) ---")
print(df.head(10)) # Show more rows to see non-NaN values
print("-" * 30)

if not df['7-Day Temp Volatility'].dropna().empty:
    date_highest_volatility = df['7-Day Temp Volatility'].dropna().idxmax()
    highest_volatility_value = df['7-Day Temp Volatility'].max()

    print(f"\n--- Date with Highest 7-Day Temperature Volatility: {date_highest_volatility.strftime('%Y-%m-%d')} ---")
    print(f"--- Highest Volatility Value: {highest_volatility_value:.2f} ---")
else:
    print("\n--- No 7-Day Temperature Volatility calculated (data might be too short or sparse). ---")
print("-" * 30)


--- DataFrame with 7-Day Temperature Volatility (first 10 rows) ---
            Temperature  7-Day Temp Volatility
Date                                          
2023-01-01         35.2                    NaN
2023-01-02         34.9                    NaN
2023-01-03         36.1                    NaN
2023-01-04         33.8                    NaN
2023-01-05         37.0                    NaN
2023-01-15         38.5                    NaN
2023-01-16         39.1               1.940545
2023-01-30         40.2               2.320509
2023-01-31         41.5               2.607498
2023-02-01         42.0               2.824721
------------------------------

--- Date with Highest 7-Day Temperature Volatility: 2023-02-01 ---
--- Highest Volatility Value: 2.82 ---
------------------------------
