# 🐼 Pandas - Class 8: Working with Time Series
Welcome to **Class 8** of our Pandas series. Today we'll learn how to handle and analyze time series data.

## 1. Converting Columns to Datetime (`pd.to_datetime`)
- Use `pd.to_datetime()` to convert strings or numbers into datetime objects.
- Once converted, the column has a `datetime64` dtype, which enables date arithmetic, the `.dt` accessor, date-based slicing, and resampling.
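
As a minimal sketch of the two bullet points above, the snippet below converts strings with an explicit format and converts numbers that are assumed to be Unix timestamps in seconds. The sample values and variable names are illustrative only and are not part of the class dataset.

```python
import pandas as pd

# Strings with a known layout: an explicit format avoids ambiguity.
from_strings = pd.to_datetime(
    pd.Series(["2024-01-05", "2024-01-20"]), format="%Y-%m-%d"
)

# Numbers interpreted as seconds since the Unix epoch (an assumed unit).
from_numbers = pd.to_datetime(pd.Series([1704412800]), unit="s")

# Once the dtype is datetime64, the .dt accessor exposes date parts.
print(from_strings.dt.day_name().tolist())
print(from_numbers.dt.strftime("%Y-%m-%d").tolist())
```

Passing `format=` is optional, but it makes the intended layout explicit instead of leaving it to inference.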

In [90]:
import pandas as pd

# 1. Create the dataset
data = {
    "OrderID": [1001, 1002, 1003, 1004, 1005],
    "OrderDate": ["2024-01-05", "2024/01/12", "15-01-2024", "2024-01-20", "20240125"],
    "Customer": ["Alice", "Bob", "Charlie", "David", "Emma"],
    "Amount": [250, 400, 150, 300, 500]
}

df = pd.DataFrame(data)
print("Original DataFrame:")
df


Original DataFrame:


   OrderID   OrderDate Customer  Amount
0     1001  2024-01-05    Alice     250
1     1002  2024/01/12      Bob     400
2     1003  15-01-2024  Charlie     150
3     1004  2024-01-20    David     300
4     1005    20240125     Emma     500


In [91]:
# 2. Convert the 'OrderDate' column to datetime
df["OrderDate"] = pd.to_datetime(df["OrderDate"], dayfirst=True, errors="coerce")
print("\nAfter converting OrderDate to datetime:")
df


After converting OrderDate to datetime:


   OrderID  OrderDate Customer  Amount
0     1001 2024-05-01    Alice     250
1     1002        NaT      Bob     400
2     1003        NaT  Charlie     150
3     1004        NaT    David     300
4     1005        NaT     Emma     500
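
Note that only the first row converted, and it was read as 1 May rather than 5 January. A likely explanation is that recent pandas versions (2.0 and later) infer a single format from the first value and apply it to the whole column. With `dayfirst=True` that inferred format is year-day-month, so the remaining strings either do not match or contain an invalid month, and `errors="coerce"` turns them into `NaT`. One way to handle mixed layouts is sketched below. The helper `parse_mixed_dates` and the list of formats are assumptions made for this illustration, not part of pandas.

```python
import pandas as pd

def parse_mixed_dates(values, formats):
    """Hypothetical helper: try each format in order, keep the first match per row."""
    raw = pd.Series(values, dtype="object")
    result = pd.Series(pd.NaT, index=raw.index, dtype="datetime64[ns]")
    for fmt in formats:
        # Rows that do not match this exact layout become NaT instead of raising.
        attempt = pd.to_datetime(raw, format=fmt, errors="coerce")
        # Fill only the rows that are still unparsed.
        result = result.fillna(attempt)
    return result

order_dates = ["2024-01-05", "2024/01/12", "15-01-2024", "2024-01-20", "20240125"]
# Assumed layouts, listed explicitly so nothing is left to inference.
known_formats = ["%Y-%m-%d", "%Y/%m/%d", "%d-%m-%Y", "%Y%m%d"]

parsed = parse_mixed_dates(order_dates, known_formats)
print(parsed)
```

Listing the formats explicitly is more verbose than a single call, but every accepted layout is visible, and anything outside the list still surfaces as `NaT` for review.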


In [92]:
# 3. Verify data types
print("\nData types after conversion:")
print(df.dtypes)


Data types after conversion:
OrderID               int64
OrderDate    datetime64[ns]
Customer             object
Amount                int64
dtype: object


In [93]:
# 4. (Optional) Handle any rows where conversion failed
if df["OrderDate"].isna().any():
    print("\nRows with invalid dates:")
    print(df[df["OrderDate"].isna()])


Rows with invalid dates:
   OrderID OrderDate Customer  Amount
1     1002       NaT      Bob     400
2     1003       NaT  Charlie     150
3     1004       NaT    David     300
4     1005       NaT     Emma     500


## 2. Setting DateTime Index & Resampling
- Set a datetime column as the index with `set_index()`.
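- Resample a time series with a rule string, e.g. `resample('M').mean()` for monthly means; other rules include 'D' (daily), 'W' (weekly), and 'Y' (yearly). In pandas 2.2 and later, 'M' and 'Y' are deprecated in favor of 'ME' and 'YE'.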
- Useful for aggregating data by time periods.
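
A minimal sketch of these steps on cleanly parsed dates follows. The sample rows are made up for illustration. The rule `"MS"` (month start) is used here on the assumption that it behaves the same across pandas versions, whereas the month-end alias differs between releases.

```python
import pandas as pd

orders = pd.DataFrame(
    {
        "OrderDate": pd.to_datetime(
            ["2024-01-05", "2024-01-12", "2024-01-15", "2024-02-02", "2024-02-20"]
        ),
        "Amount": [250, 400, 150, 300, 500],
    }
)

# A DatetimeIndex is what makes resample() available.
orders_idx = orders.set_index("OrderDate")

# "MS" groups rows into calendar months labelled by the month's first day.
monthly_total = orders_idx["Amount"].resample("MS").sum()
# "W" groups rows into weeks ending on Sunday.
weekly_mean = orders_idx["Amount"].resample("W").mean()

print(monthly_total)
print(weekly_mean)
```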

In [94]:
# 1️⃣ Convert OrderDate to datetime
df["OrderDate"] = pd.to_datetime(df["OrderDate"], dayfirst=True, errors="coerce")
print("DataFrame after converting dates:")
df

DataFrame after converting dates:


   OrderID  OrderDate Customer  Amount
0     1001 2024-05-01    Alice     250
1     1002        NaT      Bob     400
2     1003        NaT  Charlie     150
3     1004        NaT    David     300
4     1005        NaT     Emma     500


In [95]:
# 2️⃣ Set OrderDate as the index
df_idx = df.set_index("OrderDate")
print("\nDataFrame with OrderDate as index:")
df_idx


DataFrame with OrderDate as index:


            OrderID Customer  Amount
OrderDate
2024-05-01     1001    Alice     250
NaT            1002      Bob     400
NaT            1003  Charlie     150
NaT            1004    David     300
NaT            1005     Emma     500


In [96]:
# 3️⃣ Resample by month to compute total Amount per month
monthly_total = df_idx.resample("M")["Amount"].sum()
print("\nTotal Amount per month:")
monthly_total


Total Amount per month:



            Amount
OrderDate
2024-05-31     250


In [97]:
# 4️⃣ Resample by week to compute average Amount per week
weekly_avg = df_idx.resample("W")["Amount"].mean()
print("\nAverage Amount per week:")
weekly_avg


Average Amount per week:


            Amount
OrderDate
2024-05-05   250.0


In [98]:
# Convert OrderDate to datetime
df["OrderDate"] = pd.to_datetime(df["OrderDate"], dayfirst=True, errors="coerce")
print("\nAfter converting to datetime:")
print(df)

# Set OrderDate as the index
df = df.set_index("OrderDate")
print("\nDataFrame with DateTime index:")
print(df)

# Slice all orders in January 2024
print("\nOrders in January 2024:")
print(df.loc["2024-01"])

# Filter with condition: Amount > 300 after Jan 12
print("\nOrders after 2024-01-12 with Amount > 300:")
print(df[(df.index > "2024-01-12") & (df["Amount"] > 300)])


After converting to datetime:
   OrderID  OrderDate Customer  Amount
0     1001 2024-05-01    Alice     250
1     1002        NaT      Bob     400
2     1003        NaT  Charlie     150
3     1004        NaT    David     300
4     1005        NaT     Emma     500

DataFrame with DateTime index:
            OrderID Customer  Amount
OrderDate                           
2024-05-01     1001    Alice     250
NaT            1002      Bob     400
NaT            1003  Charlie     150
NaT            1004    David     300
NaT            1005     Emma     500

Orders in January 2024:
Empty DataFrame
Columns: [OrderID, Customer, Amount]
Index: []

Orders after 2024-01-12 with Amount > 300:
Empty DataFrame
Columns: [OrderID, Customer, Amount]
Index: []
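
Both selections above come back empty because the index holds only `2024-05-01` and `NaT`, so no row falls in January 2024. The sketch below runs the same two operations on an index of cleanly parsed dates. The sample rows are made up for illustration.

```python
import pandas as pd

orders = pd.DataFrame(
    {"Amount": [250, 400, 150, 300, 500]},
    index=pd.to_datetime(
        ["2024-01-05", "2024-01-12", "2024-01-15", "2024-01-20", "2024-02-03"]
    ),
).sort_index()

# Partial-string indexing: a "YYYY-MM" key selects the whole month.
january = orders.loc["2024-01"]

# Boolean mask combining a date condition with a value condition.
mask = (orders.index > pd.Timestamp("2024-01-12")) & (orders["Amount"] > 300)
late_large = orders[mask]

print(january)
print(late_large)
```

Sorting the index first is a cautious choice here, since range-style date slices generally expect a sorted `DatetimeIndex`.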


In [99]:
# TASK: Use rolling() to compute a moving average.
# Use expanding() to calculate cumulative metrics.
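
One possible approach to this task is sketched below on a made-up daily series. The values, the window size of 3, and the variable names are assumptions chosen for illustration.

```python
import pandas as pd

daily = pd.Series(
    [10, 20, 30, 40, 50],
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
    name="Visitors",
)

# Moving average over the current and two previous observations.
# The first two results are NaN because the window is not yet full.
moving_avg = daily.rolling(window=3).mean()

# Cumulative total and cumulative mean from the start of the series.
running_total = daily.expanding().sum()
running_mean = daily.expanding().mean()

print(pd.DataFrame({
    "Visitors": daily,
    "moving_avg": moving_avg,
    "running_total": running_total,
    "running_mean": running_mean,
}))
```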

## Mini-Project

This mini-project practices two skills from this class:

- Converting columns to datetime (`pd.to_datetime`)
- Setting a DateTime index & resampling

In [100]:
import pandas as pd

# 1️⃣ Create a dataset of website visits (string dates + visit counts)
data = {
    "VisitDate": [
        "2024-01-03", "2024/01/08", "15-01-2024", "2024-02-02",
        "2024-02-15", "2024/03/01", "2024-03-12", "01-04-2024"
    ],
    "Visitors": [120, 150, 90, 200, 180, 220, 210, 300]
}

df = pd.DataFrame(data)
print("Original DataFrame:")
df

Original DataFrame:


    VisitDate  Visitors
0  2024-01-03       120
1  2024/01/08       150
2  15-01-2024        90
3  2024-02-02       200
4  2024-02-15       180
5  2024/03/01       220
6  2024-03-12       210
7  01-04-2024       300


In [101]:
# 2️⃣ Convert VisitDate to datetime
df["VisitDate"] = pd.to_datetime(df["VisitDate"], dayfirst=True, errors="coerce")
print("\nAfter converting VisitDate to datetime:")
df


After converting VisitDate to datetime:


   VisitDate  Visitors
0 2024-03-01       120
1        NaT       150
2        NaT        90
3 2024-02-02       200
4        NaT       180
5        NaT       220
6 2024-12-03       210
7        NaT       300


In [102]:
# 3️⃣ Set VisitDate as the index
df = df.set_index("VisitDate")
print("\nDataFrame with VisitDate as index:")
df


DataFrame with VisitDate as index:


            Visitors
VisitDate
2024-03-01       120
NaT              150
NaT               90
2024-02-02       200
NaT              180
NaT              220
2024-12-03       210
NaT              300


In [103]:
# 4️⃣ Resample to get total visitors per month
monthly_total = df.resample("M")["Visitors"].sum()
print("\nTotal visitors per month:")
monthly_total


Total visitors per month:



            Visitors
VisitDate
2024-02-29       200
2024-03-31       120
2024-04-30         0
2024-05-31         0
2024-06-30         0
2024-07-31         0
2024-08-31         0
2024-09-30         0
2024-10-31         0
2024-11-30         0


In [104]:
# 5️⃣ Resample to get average visitors per week
weekly_avg = df.resample("W")["Visitors"].mean()
print("\nAverage visitors per week:")
weekly_avg


Average visitors per week:


            Visitors
VisitDate
2024-02-04     200.0
2024-02-11       NaN
2024-02-18       NaN
2024-02-25       NaN
2024-03-03     120.0
2024-03-10       NaN
2024-03-17       NaN
2024-03-24       NaN
2024-03-31       NaN
2024-04-07       NaN
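
The two resampled outputs treat periods without data differently: the monthly totals report 0, while the weekly averages have no value. The sketch below reproduces that difference on a small made-up series and shows the `min_count` argument of `sum()`, which can make empty periods show up as missing instead of 0. The sample values and the `"MS"` rule are assumptions for illustration.

```python
import pandas as pd

visits = pd.Series(
    [5, 7],
    index=pd.to_datetime(["2024-01-10", "2024-03-10"]),
    name="Visitors",
)

monthly = visits.resample("MS")

total_default = monthly.sum()            # empty February becomes 0
total_strict = monthly.sum(min_count=1)  # empty February becomes NaN
average = monthly.mean()                 # empty February becomes NaN

print(total_default)
print(total_strict)
print(average)
```

Whether 0 or a missing value is the better label for an empty period depends on the data: 0 suits counts that are truly zero, while a missing value is safer when the rows may simply have failed to parse.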


---
## Summary
- Converted columns to datetime with `pd.to_datetime`.
- Set DateTime index and resampled data.
- Performed date-based filtering and slicing.
- Set up a practice task on rolling (`rolling()`) and expanding (`expanding()`) window calculations.