# Lecture: Handling DateTime Data in Pandas

## Objectives
By the end of this lecture, students will be able to:
1. Understand how to handle datetime types in Pandas.
2. Convert string dates to datetime objects and vice versa.
3. Perform various datetime operations such as indexing, filtering, and resampling.
4. Utilize datetime functionalities for data analysis.



____

## Introduction to Datetime in Pandas

**What is Datetime?**
Datetime refers to the representation of dates and times in a standard format. In Pandas, datetime objects are essential for time series analysis, data manipulation, and visualization.

Pandas provides two primary types for handling dates and times:
- `datetime64[ns]`: A numpy data type for timestamps, with nanosecond precision.
- `timedelta64[ns]`: A numpy data type for durations, i.e. the difference between two datetime values.
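As a minimal sketch of how the two types relate (the variable names here are illustrative, not part of the lecture dataset), subtracting one timestamp from another yields a timedelta, both for single values and for whole columns:

```python
import pandas as pd

# Two single points in time
start = pd.Timestamp('2023-01-01 08:00')
end = pd.Timestamp('2023-01-03 20:30')

# Subtracting timestamps gives a Timedelta (a duration)
elapsed = end - start
print(elapsed)  # 2 days 12:30:00

# The same works element-wise on a Series of datetimes
stamps = pd.Series(pd.to_datetime(['2023-01-01', '2023-01-05']))
gaps = stamps - stamps.iloc[0]

print(stamps.dtype)  # a datetime64 dtype
print(gaps.dtype)    # a timedelta64 dtype
```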

_____

## Dataset

Let's create a sample dataset that contains sales data spread across 2023, using string representations for the dates. The dataset will include:
- `date`: Dates of sales in string format.
- `sales`: Amount of sales.
- `category`: Category of the product sold.

In [None]:
import pandas as pd
import numpy as np

# Create a list of dates in string format
date_strings = [
    '2023-02-01 08:45', '2023-01-02 14:30', '2023-05-03 09:15', '2023-02-04 16:20', '2023-12-05 12:00',
    '2023-04-06 18:50', '2023-02-07 10:25', '2023-12-08 13:40', '2023-03-09 11:00', '2023-12-10 15:30',
    '2023-06-11 09:10', '2023-12-12 17:00', '2023-11-13 08:15', '2023-04-14 12:30', '2023-11-15 14:20',
    '2023-08-16 19:45', '2023-11-17 10:10', '2023-11-18 11:50', '2023-05-19 13:30', '2023-10-20 18:25',
    '2023-10-21 14:05', '2023-01-22 15:35', '2023-06-23 09:55', '2023-06-24 16:45', '2023-09-25 08:20',
    '2023-09-26 12:15', '2023-03-27 18:30', '2023-02-28 10:50', '2023-07-29 13:05', '2023-08-30 14:40',
    '2023-10-31 17:55'
]

# Generate random sales data
sales_data = np.random.randint(100, 500, size=len(date_strings))

# Define categories
categories = ['Electronics', 'Clothing', 'Groceries', 'Home', 'Sports']
category_data = np.random.choice(categories, size=len(date_strings))

# Create the DataFrame
sales_df = pd.DataFrame({
    'date': date_strings,
    'sales': sales_data,
    'category': category_data
})

In [None]:
sales_df.info()

______

## Converting Between String and Datetime

To convert the string representation of dates into datetime objects, we can use the `pd.to_datetime()` function.

#### Step-by-Step Conversion
1. **Select the column to convert**: In this case, we will convert the `date` column.
2. **Use `pd.to_datetime()`**: This function infers the format from the date strings and converts them. Our strings follow the ISO-style `YYYY-MM-DD HH:MM` layout, which is parsed unambiguously.


In [None]:
sales_df['date'] = pd.to_datetime(sales_df['date'])

# Only the last expression in a cell is displayed, so print the preview explicitly
print(sales_df.head())
sales_df.info()
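Going the other way, from datetime back to string, can be done with `.dt.strftime()`. The sketch below uses a small stand-alone Series (an assumption for illustration, not the lecture's `sales_df`) and a day-first output layout built from standard format codes such as `%d`, `%m`, and `%Y`:

```python
import pandas as pd

# A small stand-alone Series of datetimes (illustrative data)
dates = pd.Series(pd.to_datetime(['2023-02-01 08:45', '2023-12-05 12:00']))

# Format each datetime as a day-first string
as_text = dates.dt.strftime('%d/%m/%Y %H:%M')

print(as_text.tolist())  # ['01/02/2023 08:45', '05/12/2023 12:00']
```

The result holds plain strings again, so the `.dt` accessor is no longer available on it.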

_____

## Working with Datetime

After converting to datetime, you can easily access components like year, month, day, etc.

In [None]:
sales_df['date']

In [None]:
sales_df['year'] = sales_df['date'].dt.year
sales_df['month'] = sales_df['date'].dt.month
sales_df['day'] = sales_df['date'].dt.day
sales_df['hour'] = sales_df['date'].dt.hour
sales_df['minute'] = sales_df['date'].dt.minute
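The `.dt` accessor exposes more than the basic calendar fields. As a small sketch on stand-alone data (the Series below is illustrative), `dt.dayofweek` returns the weekday as a number with Monday as 0, and `dt.day_name()` returns its English name by default:

```python
import pandas as pd

# Illustrative datetimes, separate from the lecture dataset
dates = pd.Series(pd.to_datetime(['2023-02-01 08:45', '2023-12-05 12:00']))

weekday_number = dates.dt.dayofweek   # Monday = 0, ..., Sunday = 6
weekday_name = dates.dt.day_name()    # English day names by default

print(weekday_number.tolist())  # [2, 1]
print(weekday_name.tolist())    # ['Wednesday', 'Tuesday']
```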


____

## Filtering by date

We can filter the data based on specific dates in a natural way.

In [None]:
sales_df

In [None]:
december_filter = sales_df['month'] == 12

sales_df[december_filter]

In [None]:
december_filter = sales_df['date'].dt.month == 12

sales_df[december_filter]


In [None]:
winter_months_filter = sales_df['date'].dt.month.isin([12, 1, 2])

sales_df[winter_months_filter]

In [None]:
autumn_months_filter = sales_df['date'].dt.month.between(9, 11)

sales_df[autumn_months_filter]


In [None]:
afternoon_hours = sales_df['date'].dt.hour.between(12, 17)

sales_df[afternoon_hours]
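Besides filtering on components, a datetime column can be compared directly against a date. The sketch below uses a small hypothetical `demo_df` so that the result is reproducible; pandas converts the date strings on the right-hand side to timestamps before comparing:

```python
import pandas as pd

# Hypothetical mini dataset with fixed values
demo_df = pd.DataFrame({
    'date': pd.to_datetime(['2023-01-22 15:35', '2023-03-09 11:00',
                            '2023-03-27 18:30', '2023-06-11 09:10']),
    'sales': [120, 340, 275, 410],
})

# Everything on or after 1 March 2023
from_march = demo_df['date'] >= '2023-03-01'

# Only March: inclusive lower bound, exclusive upper bound
in_march = (demo_df['date'] >= '2023-03-01') & (demo_df['date'] < '2023-04-01')

print(demo_df[from_march])
print(demo_df[in_march])
```

Using an exclusive upper bound at the start of the next month avoids having to spell out the last moment of March.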

____

## Plotting with Datetime

Since datetime is ordered, it's easy to plot data over time.



In [None]:
afternoon_sales_df = sales_df[afternoon_hours]

afternoon_sales_df

In [None]:
import matplotlib.pyplot as plt

plt.figure(figsize=(12, 6))
plt.scatter(sales_df['date'], sales_df['sales'], marker='o', label='Daily Sales')
plt.title('Sales in 2023')  # the dates span the whole year, not just January
plt.xlabel('Date')
plt.ylabel('Sales')
plt.xticks(rotation=45)
plt.legend()
plt.tight_layout()
plt.show()
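To plot or analyze aggregated values rather than individual points, the datetime column can be used as the index and then resampled. This is a minimal sketch on a hypothetical `demo_df` with fixed values; `'MS'` is the pandas frequency alias for "month start":

```python
import pandas as pd

# Hypothetical mini dataset with fixed values
demo_df = pd.DataFrame({
    'date': pd.to_datetime(['2023-01-02 14:30', '2023-01-22 15:35',
                            '2023-02-01 08:45', '2023-02-28 10:50']),
    'sales': [100, 150, 200, 50],
})

# Use the datetime column as a sorted index
indexed = demo_df.set_index('date').sort_index()

# Partial-string indexing: all rows from February 2023
february = indexed.loc['2023-02']

# Resampling: total sales per calendar month
monthly_totals = indexed['sales'].resample('MS').sum()

print(february)
print(monthly_totals)
```

The resulting `monthly_totals` Series has one row per month, labeled with the first day of that month, and can be passed to a plotting call in the same way as the raw data.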

______

## Some more basics, and manual intervention

Here are some more cute basics.

In [None]:
from datetime import datetime

In [None]:
now = datetime.now()

print(now)

## Specifying a Format with `pd.to_datetime`

Sometimes, we need to provide a specific format to `pd.to_datetime()` because Pandas might not automatically interpret certain date string formats correctly. Specifying the format makes the conversion faster and more accurate.

*When can it go wrong?*

1. **Ambiguous Formats**: Some date strings can be ambiguous. For example, `01-02-2023` could be interpreted as January 2, 2023 (MM-DD-YYYY) or February 1, 2023 (DD-MM-YYYY). Specifying the format clarifies this.
2. **Non-Standard Formats**: Dates in formats like `2023/01/01` or `01-Jan-2023 08:30` are often parsed correctly by default, but format inference is not guaranteed, and unusual or mixed layouts may be misread or raise an error.
3. **Speed Optimization**: Specifying the format can make the conversion faster, especially with large datasets, as Pandas doesn’t have to infer the format.
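The ambiguity described in the first point can be seen directly. In this short sketch, the same string is parsed with two different explicit formats and produces two different dates:

```python
import pandas as pd

ambiguous = '01-02-2023'

# Read as month-day-year
as_month_first = pd.to_datetime(ambiguous, format='%m-%d-%Y')

# Read as day-month-year
as_day_first = pd.to_datetime(ambiguous, format='%d-%m-%Y')

print(as_month_first)  # 2023-01-02 00:00:00
print(as_day_first)    # 2023-02-01 00:00:00
```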



**The language of datetime**

The `format` parameter in `pd.to_datetime()` accepts the standard `strftime` format codes. When parsing, the numeric codes accept both zero-padded and unpadded values; when formatting, they always produce zero-padded output. Here are a few examples:

**%Y**: Year with century as a number (e.g., '2023').

**%y**: Year without century as a zero-padded number (e.g., '23' for 2023).

**%m**: Month as a number (e.g., '03' or '3' for March, and '12' for December).

**%d**: Day of the month as a number (e.g., '09' or '9', and '18').

**%H**: Hour on the 24-hour clock (e.g., '12' for noon, '03' or '3' for 3 AM, and '15' for 3 PM).

**%I**: Hour on the 12-hour clock (e.g., '12' for noon or midnight, and '03' or '3' for 3 o'clock); combine it with **%p**.

**%M**: Minute as a number (e.g., '03' or '3', and '22').

**%S**: Second as a number (e.g., '00' or '0').

**%p**: AM or PM, used with the 12-hour clock (e.g., 'AM' or 'PM').

For example, if you have dates in the format `"DD-MM-YYYY HH:MM"`, you would specify it as `"%d-%m-%Y %H:%M"`.
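As a further sketch, the 12-hour codes `%I` and `%p` can be combined with a two-digit year (`%y`). The input strings below are made up for illustration:

```python
import pandas as pd

# Illustrative strings: day-month-two-digit-year, 12-hour clock with AM/PM
raw = pd.Series(['09-03-23 03:05 PM', '18-12-23 12:00 AM'])

parsed = pd.to_datetime(raw, format='%d-%m-%y %I:%M %p')

print(parsed.tolist())
# 03:05 PM becomes 15:05, and 12:00 AM becomes 00:00 (midnight)
```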



In [None]:
# Sample dataset with date strings in a day-first (DD/MM/YYYY) format

data = {
    'date': ['01/02/2023 08:30', '15/03/2023 10:45', '28/04/2023 14:20'],
    'sales': [200, 300, 250]
}

# Create a DataFrame
df = pd.DataFrame(data)

# Specifying the format explicitly
df['date_manual'] = pd.to_datetime(df['date'], format='%d/%m/%Y %H:%M')

# Display the DataFrame to compare
df