# A. Python's datetime.datetime class and pandas' Timestamp and DatetimeIndex classes

pandas builds on Python's built-in `datetime.datetime`: `Timestamp` is a subclass of it that adds nanosecond resolution and extra methods, and `DatetimeIndex` stores many timestamps in a single array-backed index for fast, vectorized time-series operations.

In [1]:
# import the libraries 
import pandas as pd
import numpy as np
import datetime
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('darkgrid')

%matplotlib inline

`datetime.datetime`:

This is a built-in Python class for representing date and time. It provides basic functionalities for date and time arithmetic, comparisons, and formatting. It is **part of the datetime module** and can be created with a specific date and time.

In [3]:
date_1 = datetime.datetime(2024, 9, 3, 17, 52, 44)
print(f'The date is given by : {date_1}')
print(f'The type of the variable, date_1, is: {type(date_1)}')
print(f'Is date_1 an instance of the datetime.datetime class? {isinstance(date_1,datetime.datetime)}')

The date is given by : 2024-09-03 17:52:44
The type of the variable, date_1, is: <class 'datetime.datetime'>
Is date_1 an instance of the datetime.datetime class? True


`pandas.Timestamp`:

`pandas.Timestamp` is pandas' equivalent of `datetime.datetime`, but it is **more powerful and optimized for use with pandas data structures**.

**Timestamp is a subclass of `datetime.datetime` and adds features**, such as:
- **Nanosecond precision**: `datetime.datetime` resolves time only to the microsecond, whereas `pandas.Timestamp` can store nanoseconds.
- **Compatibility**: Because it is a subclass, a `pandas.Timestamp` can be used in most places where a `datetime.datetime` is expected.
- **Additional attributes and methods**: Besides the components inherited from `datetime.datetime` (`.year`, `.month`, `.day`), it offers extras such as `.quarter`, `.dayofyear`, and `day_name()`, time zone handling (`tz_localize`, `tz_convert`), and rounding to a frequency (`floor`, `ceil`, `round`).

In [3]:
# convert the datetime.datetime object to a pd.Timestamp object
date_2 = pd.Timestamp(date_1)
print(f'The date is given by : {date_2}')
print(f'The type of the variable, date_2, is: {type(date_2)}')
print(f'Is date_2 an instance of the pd.Timestamp class? {isinstance(date_2,pd.Timestamp)}')

The date is given by : 2024-09-03 17:52:44
The type of the variable, date_2, is: <class 'pandas._libs.tslibs.timestamps.Timestamp'>
Is date_2 an instance of the pd.Timestamp class? True


In [4]:
# We have created two instances of different classes. Are they equal?
date_1 == date_2

True

In [5]:
# Why?
print(f'Is date_2 an instance of the pd.Timestamp class? {isinstance(date_2, pd.Timestamp)}')  # True
print(f'Is date_2 an instance of the datetime.datetime class? {isinstance(date_2, datetime.datetime)}')  # True

# Both are True because pandas' Timestamp class is a subclass of the datetime class
# from the datetime module.

# By the same subclass logic, date_1 is not an instance of the Timestamp class.
print(f'Is date_1 an instance of the pd.Timestamp class? {isinstance(date_1, pd.Timestamp)}')  # False

Is date_2 an instance of the pd.Timestamp class? True
Is date_2 an instance of the datetime.datetime class? True
Is date_1 an instance of the pd.Timestamp class? False


In [6]:
# Access date components on the Timestamp object (these attributes are inherited from datetime.datetime)
print(f'The year from the date, {date_2}, is : {date_2.year}')
print(f'The month from the date, {date_2}, is : {date_2.month}')
print(f'The day from the date, {date_2}, is : {date_2.day}')

The year from the date, 2024-09-03 17:52:44, is : 2024
The month from the date, 2024-09-03 17:52:44, is : 9
The day from the date, 2024-09-03 17:52:44, is : 3
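
The features that `Timestamp` adds on top of `datetime.datetime` can be sketched as follows. This is a minimal illustration, assuming a made-up timestamp string with nanosecond digits and the `America/New_York` zone chosen only as an example:

```python
import pandas as pd

# Nanosecond resolution: the trailing digits survive parsing
ts = pd.Timestamp("2024-09-03 17:52:44.123456789")
print(ts.nanosecond)  # 789

# Rounding down to a coarser unit
print(ts.floor("D"))  # 2024-09-03 00:00:00

# Attach a time zone, then express the same instant in another one
ts_utc = ts.tz_localize("UTC")
ts_ny = ts_utc.tz_convert("America/New_York")
print(ts_ny.hour)  # 13 (New York is UTC-4 in September)
```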


`pandas.DatetimeIndex`:

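`DatetimeIndex` is an **index class for pandas DataFrames or Series**, specifically designed to handle date and time data. It behaves like a **collection of pandas.Timestamp objects**: the values are stored in a compact `datetime64` array, and indexing a single element returns a `Timestamp`.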

- DatetimeIndex allows for efficient date and time indexing, slicing, and resampling. It is ideal for time-series data.
- Provides methods for quickly accessing date components (year, month, day), handling time zones, performing date arithmetic, and more.
- Offers vectorized operations on date ranges, making it highly efficient for operations like filtering or grouping by specific time intervals.

In [7]:
# Create a DatetimeIndex from a list of datetime.datetime objects
datetime_list = [datetime.datetime(2023, 9, 3), datetime.datetime(2024, 1, 1)]
datetime_index = pd.DatetimeIndex(datetime_list)
print(datetime_index)  # Output: DatetimeIndex(['2023-09-03', '2024-01-01'], dtype='datetime64[ns]', freq=None)

DatetimeIndex(['2023-09-03', '2024-01-01'], dtype='datetime64[ns]', freq=None)


In [8]:
print(f'The first date from the datetime_index is : {datetime_index[0]}')
print(f'The second date from the datetime_index is : {datetime_index[1]}')

The first date from the datetime_index is : 2023-09-03 00:00:00
The second date from the datetime_index is : 2024-01-01 00:00:00


In [3]:
# Create a DatetimeIndex from a list of pd.Timestamp objects
datetime_list = [pd.Timestamp(year=2024, month=9, day=3), pd.Timestamp('2024-01-01')]
datetime_index = pd.DatetimeIndex(datetime_list)
print(datetime_index)  # Output: DatetimeIndex(['2024-09-03', '2024-01-01'], dtype='datetime64[ns]', freq=None)

DatetimeIndex(['2024-09-03', '2024-01-01'], dtype='datetime64[ns]', freq=None)
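
The indexing and filtering benefits of a `DatetimeIndex` can be sketched with a small series. The data here is hypothetical (a counter over January and February 2024) and only serves to show partial-string selection and a vectorized filter on a date component:

```python
import pandas as pd

# Hypothetical daily series covering January and February 2024
idx = pd.date_range(start="2024-01-01", end="2024-02-29", freq="D")
s = pd.Series(range(len(idx)), index=idx)

# Partial-string indexing: select every row in February 2024
feb = s.loc["2024-02"]
print(len(feb))  # 29

# Vectorized filtering on a date component: keep Mondays only
mondays = s[s.index.dayofweek == 0]
print(mondays.index[0])  # 2024-01-01 00:00:00
```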


### How They Relate:
**Conversion and Interoperability**: You can convert datetime.datetime objects to pandas.Timestamp or DatetimeIndex and vice versa.

- A datetime.datetime object can be directly converted to pandas.Timestamp using the pd.Timestamp() constructor.
- Similarly, a list of datetime.datetime objects can be converted to a DatetimeIndex using pd.DatetimeIndex().

**Performance and Precision**: pandas.Timestamp and DatetimeIndex are optimized for performance, especially when dealing with large datasets, and provide higher precision than the native datetime.datetime.
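
The reverse direction, from pandas objects back to standard-library datetimes, can be sketched with `to_pydatetime()`. Note that a `datetime.datetime` cannot hold nanoseconds, so any nanosecond component would be lost in this conversion:

```python
import datetime
import pandas as pd

# Timestamp -> datetime.datetime
ts = pd.Timestamp("2024-09-03 17:52:44")
py_dt = ts.to_pydatetime()
print(type(py_dt))  # <class 'datetime.datetime'>

# DatetimeIndex -> NumPy object array of datetime.datetime
idx = pd.DatetimeIndex(["2023-09-03", "2024-01-01"])
py_array = idx.to_pydatetime()
print(py_array[0])  # 2023-09-03 00:00:00
```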

# B. How to_datetime() Behaves
`pandas.to_datetime()` is a powerful and flexible function used to **convert various types of data into datetime objects in pandas.** It can handle multiple input formats such as strings, integers, floats, datetime.datetime objects, and more, converting them into pandas.Timestamp or DatetimeIndex objects.


1. **Converting Strings to Datetime**: 
`to_datetime()` can parse strings representing dates and times into pandas.Timestamp objects. It recognizes common layouts such as `YYYY-MM-DD`, `DD/MM/YYYY`, and `MM-DD-YYYY`. Ambiguous strings like `03/09/2024` are read month-first unless you pass `dayfirst=True` or an explicit `format`.

In [9]:
# Example: Convert string to Timestamp
date_string = "2024-09-03"
converted_date = pd.to_datetime(date_string)
print(converted_date)  # Output: 2024-09-03 00:00:00

2024-09-03 00:00:00
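
Strings whose day and month could be swapped deserve care. A minimal sketch of how the `dayfirst` argument changes the interpretation of the same text (the string is an arbitrary example):

```python
import pandas as pd

ambiguous = "03/09/2024"

# Default: the first field is read as the month
month_first = pd.to_datetime(ambiguous)
print(month_first)  # 2024-03-09 00:00:00

# dayfirst=True: the first field is read as the day
day_first = pd.to_datetime(ambiguous, dayfirst=True)
print(day_first)    # 2024-09-03 00:00:00
```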


By default, it **converts a single date string to a pandas.Timestamp**, a **list or array of date strings to a DatetimeIndex**, and a **Series of date strings to a Series of datetime values**.

In [10]:
print(type(converted_date))

<class 'pandas._libs.tslibs.timestamps.Timestamp'>


In [11]:
# Now consider a list of dates and see what happens when we apply to_datetime
date_list = ["2024-09-03", "2024-10-03"]
converted_date = pd.to_datetime(date_list)
print(converted_date)  # Output: DatetimeIndex(['2024-09-03', '2024-10-03'], dtype='datetime64[ns]', freq=None)

DatetimeIndex(['2024-09-03', '2024-10-03'], dtype='datetime64[ns]', freq=None)


In [12]:
print(type(converted_date))

<class 'pandas.core.indexes.datetimes.DatetimeIndex'>


2. **Handling Different Input Formats**:
`to_datetime()` can handle various input formats, including:

- Single strings: Converts a single string into a Timestamp.
- List or array of strings: Converts a list or array of strings into a DatetimeIndex.
- Integers/Floats: Interprets numbers as offsets from the **Unix epoch (1970-01-01 00:00:00 UTC)**. The default unit is nanoseconds, so pass `unit='s'` for timestamps expressed in seconds.

In [9]:
# Example: Convert Unix timestamp to datetime
timestamp = 1693747200
converted_date = pd.to_datetime(timestamp, unit='s')
print(f"{converted_date} is of type : {type(converted_date)}")  # Output: 2023-09-03 13:20:00 is of type : <class 'pandas._libs.tslibs.timestamps.Timestamp'>

2023-09-03 13:20:00 is of type : <class 'pandas._libs.tslibs.timestamps.Timestamp'>


In [11]:
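# Another way, using the standard library
from datetime import datetime  # note: this rebinds the name `datetime` to the class
timestamp = 1693747200  # this is a POSIX time (seconds since the epoch)
# Convert POSIX time to datetime. fromtimestamp() returns *local* time, so the result
# differs from the UTC value above by the machine's UTC offset (here +05:30).
timestamp_dt = datetime.fromtimestamp(timestamp)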
print(f"{timestamp_dt} is of type : {type(timestamp_dt)}")

converted_date = pd.Timestamp(timestamp_dt)
print(f"{converted_date} is of type : {type(converted_date)}")

2023-09-03 18:50:00 is of type : <class 'datetime.datetime'>
2023-09-03 18:50:00 is of type : <class 'pandas._libs.tslibs.timestamps.Timestamp'>
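
The two routes agree once both are made timezone-aware. A minimal sketch, assuming the same epoch value; the alias `dt` is used here only to avoid clashing with other names:

```python
from datetime import datetime as dt, timezone
import pandas as pd

epoch_seconds = 1693747200

# pandas: utc=True returns a timezone-aware Timestamp
aware_ts = pd.to_datetime(epoch_seconds, unit="s", utc=True)
print(aware_ts)  # 2023-09-03 13:20:00+00:00

# Standard library: pass tz explicitly instead of relying on local time
aware_dt = dt.fromtimestamp(epoch_seconds, tz=timezone.utc)
print(aware_dt)  # 2023-09-03 13:20:00+00:00

print(aware_ts == aware_dt)  # True
```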


3. **Dealing with Different Date Formats**:
`to_datetime()` is **flexible with date formats**. If you know the format of your date strings, you can specify it using the format parameter to speed up the conversion.

In [13]:
# Example: Specifying date format
date_string = "03/09/2024"
converted_date = pd.to_datetime(date_string, format='%d/%m/%Y')
print(converted_date.date())  # Output: 2024-09-03

2024-09-03


In [12]:
# Alternative way
date_string = "03/09/2024"
date = datetime.strptime(date_string, "%d/%m/%Y").date()
print(date)

2024-09-03


In [14]:
# Day/Month/Year Format
date1 = pd.to_datetime("03/09/2024", format="%d/%m/%Y")  # Output: 2024-09-03 00:00:00

# Month/Day/Year Format
date2 = pd.to_datetime("09/03/2024", format="%m/%d/%Y")  # Output: 2024-09-03 00:00:00

# Year-Month-Day Format
date3 = pd.to_datetime("2024-09-03", format="%Y-%m-%d")  # Output: 2024-09-03 00:00:00

# Full Month Name with Day and Year
date4 = pd.to_datetime("September 3, 2024", format="%B %d, %Y")  # Output: 2024-09-03 00:00:00

# Abbreviated Month Name with Day and Year
date5 = pd.to_datetime("Sep 3, 2024", format="%b %d, %Y")  # Output: 2024-09-03 00:00:00

# ISO 8601 Format with Time
date6 = pd.to_datetime("2024-09-03T14:05:07", format="%Y-%m-%dT%H:%M:%S")  # Output: 2024-09-03 14:05:07

# 12-Hour Clock with AM/PM
date7 = pd.to_datetime("09/03/2024 02:05 PM", format="%m/%d/%Y %I:%M %p")  # Output: 2024-09-03 14:05:00

# Datetime with Microseconds
date8 = pd.to_datetime("2024-09-03 14:05:07.123456", format="%Y-%m-%d %H:%M:%S.%f")  # Output: 2024-09-03 14:05:07.123456

# UTC Offset Time
date9 = pd.to_datetime("2024-09-03 14:05:07+0000", format="%Y-%m-%d %H:%M:%S%z")  # Output: 2024-09-03 14:05:07+00:00

# Weekday and Date
date10 = pd.to_datetime("Tue Sep 3 2024", format="%a %b %d %Y")  # Output: 2024-09-03 00:00:00

# Print all parsed dates
print(date1)
print(date2)
print(date3)
print(date4)
print(date5)
print(date6)
print(date7)
print(date8)
print(date9)
print(date10)

2024-09-03 00:00:00
2024-09-03 00:00:00
2024-09-03 00:00:00
2024-09-03 00:00:00
2024-09-03 00:00:00
2024-09-03 14:05:07
2024-09-03 14:05:00
2024-09-03 14:05:07.123456
2024-09-03 14:05:07+00:00
2024-09-03 00:00:00


4. **Handling Errors**:
`to_datetime()` has an errors parameter to handle errors during conversion:

- errors='raise' (default): Raises an error if any conversion fails.
- errors='coerce': Converts invalid parsing to **NaT (Not a Time)**.
- errors='ignore': Returns the input without conversion if parsing fails (deprecated since pandas 2.2; catch the exception explicitly instead).

In [26]:
# Example: Handling invalid date with 'coerce'
invalid_date_string = "invalid_date"
converted_date = pd.to_datetime(invalid_date_string, errors='coerce')
print(converted_date)  # Output: NaT

NaT
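
`errors='coerce'` is most useful on collections, where a single bad value should not abort the whole conversion. A small sketch with a made-up list:

```python
import pandas as pd

raw = ["2024-09-03", "not a date", "2024-09-05"]

# Unparseable entries become NaT instead of raising
parsed = pd.to_datetime(raw, errors="coerce")
print(parsed.isna().tolist())  # [False, True, False]

# Drop the failed conversions
valid = parsed.dropna()
print(len(valid))  # 2
```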


5. **Dealing with Time Zones**:
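`to_datetime()` handles time zones through its `utc` parameter; it has no `tz` parameter, so other zones are set on the result:

- utc=True: Returns timezone-aware values in UTC (naive inputs are treated as UTC).
- `tz_localize(tz)` / `tz_convert(tz)` on the result: Attach a time zone to naive values, or convert aware values to another zone.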

In [14]:
import pytz
asia_tz = list(filter(lambda x: 'Asia' in x , pytz.all_timezones))
asia_tz

['Asia/Aden',
 'Asia/Almaty',
 'Asia/Amman',
 'Asia/Anadyr',
 'Asia/Aqtau',
 'Asia/Aqtobe',
 'Asia/Ashgabat',
 'Asia/Ashkhabad',
 'Asia/Atyrau',
 'Asia/Baghdad',
 'Asia/Bahrain',
 'Asia/Baku',
 'Asia/Bangkok',
 'Asia/Barnaul',
 'Asia/Beirut',
 'Asia/Bishkek',
 'Asia/Brunei',
 'Asia/Calcutta',
 'Asia/Chita',
 'Asia/Choibalsan',
 'Asia/Chongqing',
 'Asia/Chungking',
 'Asia/Colombo',
 'Asia/Dacca',
 'Asia/Damascus',
 'Asia/Dhaka',
 'Asia/Dili',
 'Asia/Dubai',
 'Asia/Dushanbe',
 'Asia/Famagusta',
 'Asia/Gaza',
 'Asia/Harbin',
 'Asia/Hebron',
 'Asia/Ho_Chi_Minh',
 'Asia/Hong_Kong',
 'Asia/Hovd',
 'Asia/Irkutsk',
 'Asia/Istanbul',
 'Asia/Jakarta',
 'Asia/Jayapura',
 'Asia/Jerusalem',
 'Asia/Kabul',
 'Asia/Kamchatka',
 'Asia/Karachi',
 'Asia/Kashgar',
 'Asia/Kathmandu',
 'Asia/Katmandu',
 'Asia/Khandyga',
 'Asia/Kolkata',
 'Asia/Krasnoyarsk',
 'Asia/Kuala_Lumpur',
 'Asia/Kuching',
 'Asia/Kuwait',
 'Asia/Macao',
 'Asia/Macau',
 'Asia/Magadan',
 'Asia/Makassar',
 'Asia/Manila',
 'Asia/Muscat',


In [18]:
# Example: Convert to UTC
date_string = "2024-09-03 10:00:00"
converted_date = pd.to_datetime(date_string, utc=True)
print(converted_date)  # Output: 2024-09-03 10:00:00+00:00

# Example: Attach a specific timezone (tz_localize keeps the wall-clock time unchanged)
converted_date = pd.to_datetime(date_string).tz_localize('Asia/Kuwait')
print(converted_date)  # Output: 2024-09-03 10:00:00+03:00

2024-09-03 10:00:00+00:00
2024-09-03 10:00:00+03:00


In [19]:
converted_date.utcoffset()  # UTC offset implied by the time zone: 10800 s = 3 h

datetime.timedelta(seconds=10800)
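
The difference between attaching a time zone and converting between zones can be sketched as follows, reusing the same example time and zone:

```python
import pandas as pd

naive = pd.to_datetime("2024-09-03 10:00:00")

# tz_localize: same wall-clock time, now labelled as Kuwait time
kuwait = naive.tz_localize("Asia/Kuwait")
print(kuwait)  # 2024-09-03 10:00:00+03:00

# tz_convert: same instant, expressed in another zone
utc = kuwait.tz_convert("UTC")
print(utc)     # 2024-09-03 07:00:00+00:00

print(kuwait == utc)  # True: both refer to the same moment
```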

6. **Handling Incomplete Dates**:
`to_datetime()` can **parse incomplete dates by assuming missing parts**. For example, if only the year is provided, it will assume the first day of that year.

In [17]:
# Example: Incomplete date
incomplete_date = "2024"
converted_date = pd.to_datetime(incomplete_date)
print(converted_date)  # Output: 2024-01-01 00:00:00

2024-01-01 00:00:00


The `infer_datetime_format` parameter of `pandas.to_datetime()` was used to **speed up parsing by inferring the format of the date strings automatically.** When set to True, pandas guessed the format from the input and reused it for the whole conversion.

- Since pandas 2.0 the parameter is deprecated: a strict version of this behavior is now the default (see https://pandas.pydata.org/pdeps/0004-consistent-to-datetime-parsing.html), so the argument can safely be removed.

In [18]:
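# Let us write some code and extract some useful information from dates.
# The inputs mix several formats and types, so we opt out of strict parsing with format='mixed'.
import datetime  # make sure the name refers to the module, not the class imported earlier
dates = ['2024-11-30', 
         '2/1/2024',  # read month-first: 1 February 2024
         np.datetime64('2024-07-01'), # numpy datetime64
         datetime.datetime(2024, 8, 1), # python datetime
         pd.Timestamp(2024,10,1) # pandas Timestamp
]
# With format='mixed', the format is inferred for each element individually. This is flexible
# but risky: ambiguous strings may be parsed differently from what was intended.
parsed_dates = pd.to_datetime(dates, errors='coerce', format='mixed')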

In [12]:
parsed_dates

DatetimeIndex(['2024-11-30', '2024-02-01', '2024-07-01', '2024-08-01',
               '2024-10-01'],
              dtype='datetime64[ns]', freq=None)

In [19]:
print(f'Name of Day : {parsed_dates.day_name()}')
print(f'Month : {parsed_dates.month_name()}')
print(f'Year : {parsed_dates.year}')
print(f'Days in Month : {parsed_dates.days_in_month}')
print(f'Quarter : {parsed_dates.quarter}')
print(f'Quarter Start : {parsed_dates.is_quarter_start}')
print(f'Leap Year : {parsed_dates.is_leap_year}')
print(f'Month Start : {parsed_dates.is_month_start}')
print(f'Month End : {parsed_dates.is_month_end}')
print(f'Year Start : {parsed_dates.is_year_start}')

Name of Day : Index(['Saturday', 'Thursday', 'Monday', 'Thursday', 'Tuesday'], dtype='object')
Month : Index(['November', 'February', 'July', 'August', 'October'], dtype='object')
Year : Index([2024, 2024, 2024, 2024, 2024], dtype='int32')
Days in Month : Index([30, 29, 31, 31, 31], dtype='int32')
Quarter : Index([4, 1, 3, 3, 4], dtype='int32')
Quarter Start : [False False  True False  True]
Leap Year : [ True  True  True  True  True]
Month Start : [False  True  True  True  True]
Month End : [ True False False False False]
Year Start : [False False False False False]


# C. date_range() Function

The `pandas.date_range()` function is used to **generate a sequence of dates (or a range of dates) with a specified frequency.** It is particularly useful for creating time series data, building a timeline, or working with financial data that requires regular time intervals. The `date_range()` function returns a `DatetimeIndex` object, which is an index containing datetime objects.

**Key Parameters of date_range**
- start: The starting date of the range. You can pass a string, a `datetime.datetime` object, or a `pandas.Timestamp`.
- end: The ending date of the range. Similar to start, this can be a string, `datetime.datetime`, or `pandas.Timestamp`.
- periods: Specifies the number of periods (or number of dates) to generate. Combined with start (or end) and a frequency, it determines the other endpoint, so that endpoint is not needed.
- freq: Frequency of the dates to generate. The default is daily ('D'), but it can be set to various other frequencies, such as 'H' for hourly, 'M' for month-end, 'B' for business day, etc. (pandas 2.2 and later prefer 'h' and 'ME' for the first two).
- tz: Time zone information. You can specify a time zone string (like 'UTC' or 'America/New_York').
- normalize: If set to True, normalizes the start and end dates to midnight (00:00:00).
- inclusive: Which endpoints to include: "both" (default), "neither", "left", or "right".

In [20]:
# Generate a range of dates from September 1, 2024, to September 5, 2024
date_range_example = pd.date_range(start='2024-09-01', end='2024-09-05')
print(date_range_example)

DatetimeIndex(['2024-09-01', '2024-09-02', '2024-09-03', '2024-09-04',
               '2024-09-05'],
              dtype='datetime64[ns]', freq='D')


In [24]:
# Generate a range of dates from September 1, 2024, to September 5, 2024 with the inclusive parameter
# inclusive='left' keeps the start date and drops the end date
date_range_example = pd.date_range(start='2024-09-01', end='2024-09-05', inclusive='left')
print(date_range_example)

DatetimeIndex(['2024-09-01', '2024-09-02', '2024-09-03', '2024-09-04'], dtype='datetime64[ns]', freq='D')


In [21]:
# Generate 5 daily dates starting from September 1, 2024
date_range_example = pd.date_range(start='2024-09-01', periods=5)
print(date_range_example)

DatetimeIndex(['2024-09-01', '2024-09-02', '2024-09-03', '2024-09-04',
               '2024-09-05'],
              dtype='datetime64[ns]', freq='D')


In [22]:
# Generate an hourly range of dates from September 1, 2024, to September 2, 2024
hourly_range = pd.date_range(start='2024-09-01', end='2024-09-02', freq='H')
print(hourly_range)

DatetimeIndex(['2024-09-01 00:00:00', '2024-09-01 01:00:00',
               '2024-09-01 02:00:00', '2024-09-01 03:00:00',
               '2024-09-01 04:00:00', '2024-09-01 05:00:00',
               '2024-09-01 06:00:00', '2024-09-01 07:00:00',
               '2024-09-01 08:00:00', '2024-09-01 09:00:00',
               '2024-09-01 10:00:00', '2024-09-01 11:00:00',
               '2024-09-01 12:00:00', '2024-09-01 13:00:00',
               '2024-09-01 14:00:00', '2024-09-01 15:00:00',
               '2024-09-01 16:00:00', '2024-09-01 17:00:00',
               '2024-09-01 18:00:00', '2024-09-01 19:00:00',
               '2024-09-01 20:00:00', '2024-09-01 21:00:00',
               '2024-09-01 22:00:00', '2024-09-01 23:00:00',
               '2024-09-02 00:00:00'],
              dtype='datetime64[ns]', freq='H')


In [17]:
# Generate a range of business days
business_days = pd.date_range(start='2024-09-01', periods=5, freq='B')
print(business_days)

DatetimeIndex(['2024-09-02', '2024-09-03', '2024-09-04', '2024-09-05',
               '2024-09-06'],
              dtype='datetime64[ns]', freq='B')


Here, 'B' stands for business day, so the output includes only weekdays, skipping September 1st since it's a Sunday.

In [23]:
# Generate a range of dates in UTC
utc_dates = pd.date_range(start='2024-09-01', periods=3, freq='D', tz='UTC')
print(utc_dates)

DatetimeIndex(['2024-09-01 00:00:00+00:00', '2024-09-02 00:00:00+00:00',
               '2024-09-03 00:00:00+00:00'],
              dtype='datetime64[ns, UTC]', freq='D')


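**Frequency Aliases for date_range**
`date_range()` supports several frequency aliases. In pandas 2.2 and later, some of the older spellings are deprecated; the newer alias is given in parentheses:

- 'D': Day
- 'B': Business day
- 'H': Hour ('h')
- 'T' or 'min': Minute ('min')
- 'S': Second ('s')
- 'L': Millisecond ('ms')
- 'U': Microsecond ('us')
- 'N': Nanosecond ('ns')
- 'W': Weekly
- 'M': Month-end ('ME')
- 'Q': Quarter-end ('QE')
- 'A': Year-end ('YE')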
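
Frequency aliases can also carry a multiplier or an anchor. The sketch below uses only spellings that are valid across recent pandas versions (`'15min'`, `'W-MON'`, and `'MS'` for month start); the dates are arbitrary:

```python
import pandas as pd

# Every 15 minutes for one hour (5 points, both endpoints included)
quarter_hours = pd.date_range("2024-09-01 00:00", "2024-09-01 01:00", freq="15min")
print(len(quarter_hours))  # 5

# Weekly, anchored on Mondays
mondays = pd.date_range("2024-09-01", periods=3, freq="W-MON")
print(mondays[0])  # 2024-09-02 00:00:00

# First calendar day of each month
month_starts = pd.date_range("2024-01-01", periods=3, freq="MS")
print(month_starts[-1])  # 2024-03-01 00:00:00
```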

# D. Creating a Time Series DataFrame 

In [21]:
# Create sample data
data = {
    'dates': ['2024-09-01', '2024-09-02', '2024-09-03', '2024-09-04', '2024-09-05'],  # Dates as strings
    'sales': [200, 450, 300, 500, 700]  # Sales numbers
}

# Convert the 'dates' column to datetime format
df = pd.DataFrame(data)
df['dates'] = pd.to_datetime(df['dates'], format='%Y-%m-%d')  # Converting string to datetime

# Display the DataFrame
print(df)

       dates  sales
0 2024-09-01    200
1 2024-09-02    450
2 2024-09-03    300
3 2024-09-04    500
4 2024-09-05    700


In [26]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   dates   5 non-null      datetime64[ns]
 1   sales   5 non-null      int64         
dtypes: datetime64[ns](1), int64(1)
memory usage: 212.0 bytes


In [27]:
# However, note that df['dates'] is a Series object.
# Setting it as the index turns it into a DatetimeIndex object.

df.set_index('dates', inplace=True)
df

            sales
dates
2024-09-01    200
2024-09-02    450
2024-09-03    300
2024-09-04    500
2024-09-05    700


In [28]:
df.index

DatetimeIndex(['2024-09-01', '2024-09-02', '2024-09-03', '2024-09-04',
               '2024-09-05'],
              dtype='datetime64[ns]', name='dates', freq=None)

In [29]:
# add the day column to the dataframe 
df['Day'] = df.index.day_name()
df

            sales        Day
dates
2024-09-01    200     Sunday
2024-09-02    450     Monday
2024-09-03    300    Tuesday
2024-09-04    500  Wednesday
2024-09-05    700   Thursday


In [53]:
# We can also add parts of the date as columns in the DataFrame without converting to a DatetimeIndex object
# Sample data
data = {
    'days': ['2024-09-01', '2024-09-02', '2024-09-03', '2024-09-04', '2024-09-05'],
    'sales': [200, 450, 300, 500, 700]
}

# Create DataFrame
df = pd.DataFrame(data)

# Convert 'days' column to datetime
df['days'] = pd.to_datetime(df['days'])

# Using various properties and methods of the .dt accessor
df['day_name'] = df['days'].dt.day_name()
df['month_name'] = df['days'].dt.month_name()
df['year'] = df['days'].dt.year
df['month'] = df['days'].dt.month
df['day'] = df['days'].dt.day
df['dayofweek'] = df['days'].dt.dayofweek
df['dayofyear'] = df['days'].dt.dayofyear
df['quarter'] = df['days'].dt.quarter
df['is_leap_year'] = df['days'].dt.is_leap_year
df['formatted_date'] = df['days'].dt.strftime('%d-%b-%Y')

# Display the DataFrame
df

        days  sales   day_name  month_name  year  month  day  dayofweek  dayofyear  quarter  is_leap_year  formatted_date
0 2024-09-01    200     Sunday   September  2024      9    1          6        245        3          True     01-Sep-2024
1 2024-09-02    450     Monday   September  2024      9    2          0        246        3          True     02-Sep-2024
2 2024-09-03    300    Tuesday   September  2024      9    3          1        247        3          True     03-Sep-2024
3 2024-09-04    500  Wednesday   September  2024      9    4          2        248        3          True     04-Sep-2024
4 2024-09-05    700   Thursday   September  2024      9    5          3        249        3          True     05-Sep-2024


The `.dt` accessor in pandas is used to **access the datetime properties and methods of a Series containing datetime-like data**. When you have a pandas Series with datetime objects (like `datetime64[ns]`), the `.dt` accessor provides a convenient way to perform **vectorized datetime operations** directly on the Series.

**What Does `.dt` Do?**
- **Accesses Date and Time Components**: It allows you to extract specific components like the year, month, day, day of the week, hour, minute, etc., from each datetime object in the Series.

- **Performs Vectorized Operations**: Instead of applying datetime methods one-by-one through a loop, .dt allows you to apply operations over an entire Series in a single, efficient step.

- **Handles Datetime-Specific Methods**: .dt gives you access to datetime-specific methods, such as `strftime()`, `normalize()`, and `tz_localize()`, among others, which are designed specifically to work with datetime data.

**Properties**:
- year, month, day, hour, minute, second, microsecond, dayofweek (0=Monday, 6=Sunday), is_month_start, is_month_end, is_year_start, is_year_end, is_leap_year, days_in_month

**Methods:**

- `day_name()`, `month_name()`: Return the name of the weekday or month.
- `strftime(format)`: Formats the datetime to a string based on a given format.
- `normalize()`: Resets the time component to midnight.
- `tz_localize(tz)`: Sets a timezone for the datetime.
- `tz_convert(tz)`: Converts the time to a different timezone.
- `floor(freq)`, `ceil(freq)`, `round(freq)`: Round the datetime down, up, or to the nearest multiple of the given frequency (e.g., day, hour).
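
The `.dt` methods listed above can be sketched on a small Series. The event times are hypothetical and `Asia/Kolkata` is chosen only as an example zone:

```python
import pandas as pd

# Hypothetical event times, used only for illustration
events = pd.Series(pd.to_datetime(["2024-09-03 14:05:07", "2024-09-04 23:59:59"]))

# floor rounds down to the day; round goes to the nearest day
print(events.dt.floor("D").dt.day.tolist())  # [3, 4]
print(events.dt.round("D").dt.day.tolist())  # [4, 5]

# normalize() resets the time to midnight, which matches floor("D") here
print(events.dt.normalize().equals(events.dt.floor("D")))  # True

# Attach UTC, then view the same instants in another zone
local = events.dt.tz_localize("UTC").dt.tz_convert("Asia/Kolkata")
print(local.dt.hour.tolist())  # [19, 5]
```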

# E. pd.Period Object 

`pd.Period` is a **pandas object used to represent a single time period, such as a specific month, day, or year.** Unlike a single datetime value, which represents an exact point in time (including both date and time), a Period represents a time span or interval (such as "March 2024" or "2024 Q1"). This makes it particularly useful when working with time series data where you are interested in periods rather than exact timestamps.

**Key Features of `pd.Period`**
- **Represents a Time Interval**: A Period object **represents a span of time**, such as a month, quarter, or year, rather than an exact timestamp. This is useful for data analysis focused on aggregated or period-based data (like monthly sales, quarterly revenues, etc.).

- **Supports Different Frequencies**: You can specify the **frequency of the period**, such as daily (D), monthly (M), yearly (A or Y), and many others. This allows you to define what kind of period you are working with.

- **Arithmetics and Operations**: You can **perform arithmetic operations (like addition and subtraction) with periods**. For example, you can add or subtract a number to a Period object to get a new period.

In [32]:
# Create a Period object representing March 2024
p1 = pd.Period('2024-03', freq='M')  # 'M' stands for monthly frequency
print(p1)  # Output: 2024-03

# Create a Period object representing the year 2024
p2 = pd.Period('2024', freq='A')  # 'A' stands for annual frequency
print(p2)  # Output: 2024

# Create a Period object representing the first quarter of 2024
p3 = pd.Period('2024Q1', freq='Q')  # 'Q' stands for quarterly frequency
print(p3)  # Output: 2024Q1

# Perform arithmetic operations
next_month = p1 + 1  # Adds one month
print(next_month)  # Output: 2024-04

previous_year = p2 - 1  # Subtracts one year
print(previous_year)  # Output: 2023

2024-03
2024
2024Q1
2024-04
2023


**Common Frequencies in `pd.Period`**
Here are some of the common frequencies (freq) that you can use with `pd.Period`:

- D: Day
- M: Month
- Q: Quarter
- A or Y: Year (annual frequency)
- H: Hour
- T or min: Minute
- S: Second
- B: Business day

**Usage with PeriodIndex**
`pd.Period` objects are often used to create a PeriodIndex for a pandas DataFrame, which is particularly useful in time series analysis.

In [33]:
# Create a PeriodIndex for a DataFrame
periods = pd.period_range(start='2024-01', end='2024-06', freq='M')  # Range of periods from January to June 2024
data = pd.DataFrame({'sales': [200, 300, 400, 500, 600, 700]}, index=periods)
data

         sales
2024-01    200
2024-02    300
2024-03    400
2024-04    500
2024-05    600
2024-06    700


In [34]:
type(periods)

pandas.core.indexes.period.PeriodIndex

**Key Methods and Attributes of `pd.Period`**

- start_time: The start time of the period.
- end_time: The end time of the period.
- freq: The frequency of the period.
- strftime(format): Formats the period to a string representation according to the specified format.
- asfreq(freq): Converts the period to the specified frequency.

In [35]:
p = pd.Period('2024-03', freq='M')

# Start and end time of the period
print(p.start_time)  # Output: 2024-03-01 00:00:00
print(p.end_time)    # Output: 2024-03-31 23:59:59.999999999

# Convert to a different frequency
print(p.asfreq('D', 'start'))  # Convert to daily frequency, start of the month
# Output: 2024-03-01

print(p.asfreq('D', 'end'))    # Convert to daily frequency, end of the month
# Output: 2024-03-31

2024-03-01 00:00:00
2024-03-31 23:59:59.999999999
2024-03-01
2024-03-31
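
Because a period has a start and an end, you can check whether a timestamp falls inside it. A minimal sketch with an arbitrary timestamp in March 2024:

```python
import pandas as pd

march = pd.Period("2024-03", freq="M")
ts = pd.Timestamp("2024-03-15 09:30")

# A timestamp falls inside a period when it lies between the bounds
inside = march.start_time <= ts <= march.end_time
print(inside)  # True

# Equivalent check: convert the timestamp to the same frequency
same_month = ts.to_period("M") == march
print(same_month)  # True

# Subtracting two periods of the same frequency gives an offset; .n is the count
months_apart = (pd.Period("2024-06", freq="M") - march).n
print(months_apart)  # 3
```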


In [36]:
# apply the methods to the periodIndex object
periods.asfreq('D', 'start')

PeriodIndex(['2024-01-01', '2024-02-01', '2024-03-01', '2024-04-01',
             '2024-05-01', '2024-06-01'],
            dtype='period[D]')

In [72]:
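# The day of the month is ignored: the period is January 2017
date = pd.Period('2017-01-31', freq='M')
date += 1  # move to February 2017
print(date.asfreq('D', 'end'))  # last day of February 2017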

2017-02-28


**When to Use `pd.Period`?**
- Use `pd.Period` when you need to represent time intervals, like months or quarters, rather than specific points in time. It's especially useful in financial, economic, or any analysis where time spans are more meaningful than specific timestamps.

# F. to_period
`to_period` is a **method that converts Timestamp, DatetimeIndex, or datetime-valued Series objects to Period objects with a specified frequency.** There is no top-level `pd.to_period` function; it is called on the object itself (`Timestamp.to_period`, `DatetimeIndex.to_period`) or through the `.dt` accessor of a Series (`Series.dt.to_period`). Each datetime value is converted to a Period, which represents a time span.

- Usage
>`to_period` is primarily used when you want to convert datetime-like data to period data, such as converting daily data to monthly data.

**Example of `to_period`**

Here's how to use `to_period` to convert datetime data to periods:

In [37]:
# to_period converts a Timestamp object to a Period object
date_timestamp = pd.Timestamp(2024,8,9)
print(date_timestamp)
print(type(date_timestamp))

date_period = date_timestamp.to_period('M')
print(date_period)
print(type(date_period))

2024-08-09 00:00:00
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
2024-08
<class 'pandas._libs.tslibs.period.Period'>


In [43]:
date_range =  pd.date_range(start=pd.Timestamp(2024,10,2),
                            end=pd.Timestamp(2024,10,10))
print(date_range)
print(type(date_range))

period_range = date_range.to_period('M')
print(period_range)
print(type(period_range))

DatetimeIndex(['2024-10-02', '2024-10-03', '2024-10-04', '2024-10-05',
               '2024-10-06', '2024-10-07', '2024-10-08', '2024-10-09',
               '2024-10-10'],
              dtype='datetime64[ns]', freq='D')
<class 'pandas.core.indexes.datetimes.DatetimeIndex'>
PeriodIndex(['2024-10', '2024-10', '2024-10', '2024-10', '2024-10', '2024-10',
             '2024-10', '2024-10', '2024-10'],
            dtype='period[M]')
<class 'pandas.core.indexes.period.PeriodIndex'>


In [44]:
# Create a DataFrame with a datetime column
data = {
    'date': ['2024-09-01', '2024-09-02', '2024-09-03', '2024-09-04', '2024-09-05']
}
df = pd.DataFrame(data)

# Convert the 'date' column to datetime
df['date'] = pd.to_datetime(df['date'])

# Convert datetime to period with monthly frequency
df['period_month'] = df['date'].dt.to_period('M')  # Convert to monthly periods
df['period_next_month'] = df['period_month']+1

# Convert datetime to period with yearly frequency
df['period_year'] = df['date'].dt.to_period('A')  # Convert to annual periods

df

        date period_month period_next_month period_year
0 2024-09-01      2024-09           2024-10        2024
1 2024-09-02      2024-09           2024-10        2024
2 2024-09-03      2024-09           2024-10        2024
3 2024-09-04      2024-09           2024-10        2024
4 2024-09-05      2024-09           2024-10        2024


**Key Points of `to_period`**

- Converts datetime-like data to period data, where each period represents a specific span of time.
- Supports various frequencies (freq), such as 'D' (daily), 'M' (monthly), 'Q' (quarterly), 'A' (annually), etc.
- Allows easy conversion from a datetime format to a period format.
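
A common use of this conversion is aggregating daily rows by the period they fall in. The following is a sketch with hypothetical sales figures spanning two months:

```python
import pandas as pd

# Hypothetical daily sales spanning two months
daily = pd.DataFrame({
    "date": pd.to_datetime(["2024-08-30", "2024-08-31", "2024-09-01", "2024-09-02"]),
    "sales": [100, 150, 200, 250],
})

# Group rows by the month each date falls in and total the sales
monthly = daily.groupby(daily["date"].dt.to_period("M"))["sales"].sum()
print(monthly.tolist())  # [250, 450]
```

The resulting Series is indexed by a `PeriodIndex` with one entry per month.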

**pd.PeriodIndex**

`pd.PeriodIndex` is a **pandas index that is used to store and handle a sequence of Period objects, which are time spans rather than specific points in time**. It is similar to DatetimeIndex but represents periods instead of exact timestamps.

- Usage
> `PeriodIndex` is used when you want the index of a pandas DataFrame or Series to represent periods (like months, quarters, or years) rather than specific datetime values.

Example of `pd.PeriodIndex`

Here's an example showing how to create a PeriodIndex:

In [45]:
# Create a PeriodIndex representing months from January to June 2024
period_index = pd.period_range(start='2024-01', end='2024-06', freq='M')

# Create a DataFrame using the PeriodIndex
df = pd.DataFrame({'sales': [200, 300, 400, 500, 600, 700]}, index=period_index)

# Set the index name
df.index.name = 'Month'

df

Month,sales
2024-01,200
2024-02,300
2024-03,400
2024-04,500
2024-05,600
2024-06,700


In [59]:
# Let's use both pd.to_period and pd.PeriodIndex together

# Create a DataFrame with a datetime column
data = {
    'date': ['2024-01-01', '2024-02-01', '2024-03-01', '2024-04-01', '2024-05-01'],
    'sales':[100,200,300,400,500]
}
df = pd.DataFrame(data)

# Convert the 'date' column to datetime and then convert it to period with freq set as month
df['date'] = pd.to_datetime(df['date'])
df['date'] = df['date'].dt.to_period('M')

# A period-typed column becomes a PeriodIndex when it is set as the index,
# so an explicit pd.PeriodIndex(df['date'], freq='M') call is not required
df.set_index('date', inplace=True)

df

date,sales
2024-01,100
2024-02,200
2024-03,300
2024-04,400
2024-05,500


In [57]:
type(df.index)

pandas.core.indexes.period.PeriodIndex
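
A `PeriodIndex` can also be built directly with the `pd.PeriodIndex` constructor. The sketch below is one way to do that from month labels, followed by a lookup and a shift; the names `months`, `sales`, `feb_sales`, and `shifted` are made up for illustration.

```python
import pandas as pd

# Build a PeriodIndex directly from month labels
months = pd.PeriodIndex(["2024-01", "2024-02", "2024-03"], freq="M")
sales = pd.Series([100, 200, 300], index=months)
print(type(sales.index))

# Look up a row by its Period label
feb_sales = sales.loc[pd.Period("2024-02", freq="M")]
print(feb_sales)

# Period arithmetic works in units of the frequency (here, months)
shifted = months + 1
print(shifted)
```

Adding an integer to a `PeriodIndex` moves every entry forward by that many periods, which is the same operation used earlier to build the `period_next_month` column.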

# G. pd.to_timestamp
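`to_timestamp` is a **method in pandas used to convert period data (Period or PeriodIndex) to timestamp data (Timestamp or DatetimeIndex).** Although this section writes it as `pd.to_timestamp`, there is no top-level `pd.to_timestamp` function: the method is called on the period data itself (`Period.to_timestamp`, `PeriodIndex.to_timestamp`, `Series.to_timestamp`, or `DataFrame.to_timestamp`). It is particularly useful when you need to map a period (such as a month or quarter) to a single point in time marking the start or end of that period (such as the first day of the month or quarter).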

- Usage of `pd.to_timestamp`
>This method is primarily used when you want to transform period-based data (e.g., quarterly or monthly data) into specific timestamps, often for more granular time series analysis or when you need exact date representations rather than period intervals.

Example of `pd.to_timestamp`

Let's see how to use pd.to_timestamp to convert period data to timestamps:

In [64]:
# Create a PeriodIndex representing months
period_index = pd.period_range(start='2024-01', end='2024-06', freq='M')
print(period_index)
print(type(period_index))

datetime_index = period_index.to_timestamp(how='start')
print(datetime_index)
print(type(datetime_index))
print(type(datetime_index[0]))

PeriodIndex(['2024-01', '2024-02', '2024-03', '2024-04', '2024-05', '2024-06'], dtype='period[M]')
<class 'pandas.core.indexes.period.PeriodIndex'>
DatetimeIndex(['2024-01-01', '2024-02-01', '2024-03-01', '2024-04-01',
               '2024-05-01', '2024-06-01'],
              dtype='datetime64[ns]', freq='MS')
<class 'pandas.core.indexes.datetimes.DatetimeIndex'>
<class 'pandas._libs.tslibs.timestamps.Timestamp'>


In [60]:
# Create a PeriodIndex representing months
period_index = pd.period_range(start='2024-01', end='2024-06', freq='M')

# Convert PeriodIndex to a DataFrame
df = pd.DataFrame({'sales': [200, 300, 400, 500, 600, 700]}, index=period_index)

# Convert the PeriodIndex to Timestamp (default: start of the period)
df.index = df.index.to_timestamp()

df

,sales
2024-01-01,200
2024-02-01,300
2024-03-01,400
2024-04-01,500
2024-05-01,600
2024-06-01,700


**Parameters of `pd.to_timestamp`**
- freq: String or DateOffset, optional. The target frequency of the resulting timestamps. Defaults to 'D' for periods of a week or longer, and to seconds otherwise.
- how: String, default 'start'. Accepts 'S' / 'start' or 'E' / 'end'. It determines whether each period is converted to its first or its last instant.
- axis: Integer, default 0. The axis to convert (only applies to `DataFrame.to_timestamp`).
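
The `how` and `axis` parameters listed above can be tried out on a small example. This is a sketch only: the quarterly `Period` and the one-row `wide` DataFrame with period-labelled columns are invented for illustration.

```python
import pandas as pd

q = pd.Period("2024Q1", freq="Q")

# how selects which edge of the period becomes the timestamp
start = q.to_timestamp(how="start")  # first instant of the quarter
end = q.to_timestamp(how="end")      # last nanosecond of the quarter
print(start)
print(end)

# axis=1 converts period-labelled columns instead of the row index
wide = pd.DataFrame(
    [[1000, 1500]],
    columns=pd.period_range("2024Q1", periods=2, freq="Q"),
)
wide_ts = wide.to_timestamp(axis=1)
print(wide_ts.columns)
```

With the default `axis=0`, `DataFrame.to_timestamp` expects the row index to be a `PeriodIndex`, which is the situation in the other examples of this section.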

**Converting to the End of the Period**

You can specify whether to convert to the start or the end of the period using the how parameter:

In [78]:
# Create a PeriodIndex representing months
period_index = pd.period_range(start='2024-01', end='2024-06', freq='M')

# Convert PeriodIndex to a DataFrame
df = pd.DataFrame({'sales': [200, 300, 400, 500, 600, 700]}, index=period_index)

# Convert to Timestamps at the end of each month
df.index = df.index.to_timestamp(how='E')
df

,sales
2024-01-31 23:59:59.999999999,200
2024-02-29 23:59:59.999999999,300
2024-03-31 23:59:59.999999999,400
2024-04-30 23:59:59.999999999,500
2024-05-31 23:59:59.999999999,600
2024-06-30 23:59:59.999999999,700


In [79]:
type(df.index)

pandas.core.indexes.datetimes.DatetimeIndex

**Key Points of `pd.to_timestamp`**

- Converts Period to Timestamp: pd.to_timestamp is used to convert a period (like a month, quarter, or year) to a specific timestamp.
- Supports Different Frequencies: You can specify the frequency (such as 'D', 'M', 'Q') and the exact point in time (start or end of the period).
- Versatile in Time Series Analysis: Useful when moving from period-based data to exact time-based data for detailed analysis.
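
End-of-period timestamps carry a time of 23:59:59.999999999, which is often more detail than needed. One possible way to keep only the calendar date is `DatetimeIndex.normalize()`, sketched below with illustrative variable names; converting back with `to_period` shows that the round trip recovers the original periods.

```python
import pandas as pd

periods = pd.period_range(start="2024-01", end="2024-03", freq="M")

# End-of-period timestamps carry the last nanosecond of each month
month_ends = periods.to_timestamp(how="end")
print(month_ends)

# normalize() resets the time of day to midnight, leaving only the date
month_end_dates = month_ends.normalize()
print(month_end_dates)

# Converting back with to_period recovers the original monthly periods
round_trip = month_end_dates.to_period("M")
print(round_trip)
```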

**Example of Using `pd.to_timestamp` with Different Frequencies**

Here’s another example to show how you can use pd.to_timestamp with various frequencies:

In [81]:
# Create a DataFrame with quarterly periods
period_index = pd.period_range(start='2024Q1', end='2024Q4', freq='Q')

# Convert PeriodIndex to a DataFrame
df = pd.DataFrame({'revenue': [1000, 1500, 1200, 1800]}, index=period_index)

# Convert the PeriodIndex to Timestamp at the start of each quarter
df.index = df.index.to_timestamp(how='S')

df

,revenue
2024-01-01,1000
2024-04-01,1500
2024-07-01,1200
2024-10-01,1800


In [82]:
# Create a DataFrame with quarterly periods
period_index = pd.period_range(start='2024Q1', end='2024Q4', freq='Q')

# Convert PeriodIndex to a DataFrame
df = pd.DataFrame({'revenue': [1000, 1500, 1200, 1800]}, index=period_index)

# Convert the PeriodIndex to Timestamp at the end of each quarter
df.index = df.index.to_timestamp(how='E')

df

,revenue
2024-03-31 23:59:59.999999999,1000
2024-06-30 23:59:59.999999999,1500
2024-09-30 23:59:59.999999999,1200
2024-12-31 23:59:59.999999999,1800


# H. timedelta in Python

`timedelta` is a class in Python's `datetime` module that represents a duration: the difference between two dates or times, stored internally as days, seconds, and microseconds. You can use timedelta to add to, subtract from, or compare date and time objects. pandas provides its own counterpart, `pd.Timedelta`, which is used in the examples of this section and supports nanosecond resolution.

**timedelta Constructor**

The timedelta object can be created using the `datetime.timedelta()` constructor, which takes the following optional arguments:

`datetime.timedelta(days=0, seconds=0, microseconds=0, milliseconds=0, minutes=0, hours=0, weeks=0)`

Each of these arguments represents a particular unit of time and can be positive or negative.
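
As a minimal sketch of the constructor described above, the following uses the standard-library `datetime.timedelta` directly. The specific durations and variable names are chosen only for illustration.

```python
import datetime

import pandas as pd

# Mixed units are normalized into days, seconds, and microseconds
delta = datetime.timedelta(weeks=1, hours=36, minutes=90)
print(delta)
print(delta.days, delta.seconds, delta.microseconds)

# Adding a timedelta to a datetime gives a new datetime
start = datetime.datetime(2024, 9, 3, 17, 52, 44)
later = start + delta
print(later)

# Negative arguments are allowed and move backwards in time
before = start + datetime.timedelta(days=-1)
print(before)

# pd.Timedelta is a subclass of datetime.timedelta
print(issubclass(pd.Timedelta, datetime.timedelta))
```

Because `pd.Timedelta` subclasses `datetime.timedelta`, the two can be mixed in most arithmetic with datetimes and timestamps.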

In [23]:
# Example 1: Creating a Timedelta using a string format
# Represents a duration of 2 days, 5 hours, and 30 minutes
td1 = pd.Timedelta("2 days 5 hours 30 minutes")
print("Timedelta from string:", td1)

# Example 2: Creating a Timedelta with keyword arguments
# Represents 2 days, 5 hours, and 30 minutes as separate arguments
td2 = pd.Timedelta(days=2, hours=5, minutes=30)
print("Timedelta with keyword arguments:", td2)

# Example 3: Creating a Timedelta in seconds
# Represents a duration of 5000 seconds
td3 = pd.Timedelta(seconds=5000)
print("Timedelta in seconds:", td3)

# Example 4: Adding Timedelta to a Timestamp
# Define a starting Timestamp
start_time = pd.Timestamp("2024-10-03 10:00:00")
# Add 3 days and 6 hours
new_time = start_time + pd.Timedelta(days=3, hours=6)
print("Adding Timedelta to Timestamp:", new_time)

# Example 5: Subtracting Timedelta from a Timestamp
# Subtract 12 hours from the start_time
past_time = start_time - pd.Timedelta(hours=12)
print("Subtracting Timedelta from Timestamp:", past_time)

# Example 6: Finding difference between two Timestamps
# Define two Timestamps
ts1 = pd.Timestamp("2024-10-01")
ts2 = pd.Timestamp("2024-10-05")
# Subtract to get Timedelta (time difference)
time_diff = ts2 - ts1
print("Difference between Timestamps:", time_diff)

# Example 7: Timedelta unit conversions
# Define a Timedelta
td4 = pd.Timedelta(days=1, hours=6)
# Convert to total seconds
total_seconds = td4.total_seconds()
print("Total seconds in Timedelta:", total_seconds)

# Example 8: Using Timedelta in a DataFrame
# Create a DataFrame with start and end dates
data = {
    "start": pd.to_datetime(["2024-09-30", "2024-10-01", "2024-10-02"]),
    "end": pd.to_datetime(["2024-10-02", "2024-10-05", "2024-10-04"])
}
df = pd.DataFrame(data)
# Calculate duration by subtracting start from end
df["duration"] = df["end"] - df["start"]
print("\nDataFrame with calculated durations:")
print(df)

# Example 9: Timedelta properties
# Define a Timedelta
td5 = pd.Timedelta(days=1, hours=5, minutes=45)
# Access properties like days, seconds, and microseconds
print("Days:", td5.days)               # Total days in the Timedelta
print("Seconds:", td5.seconds)         # Seconds portion, excluding days
print("Microseconds:", td5.microseconds)  # Microseconds portion

# Example 10: Total duration in seconds
# Get total duration in seconds for td5
print("Total duration in seconds:", td5.total_seconds())

Timedelta from string: 2 days 05:30:00
Timedelta with keyword arguments: 2 days 05:30:00
Timedelta in seconds: 0 days 01:23:20
Adding Timedelta to Timestamp: 2024-10-06 16:00:00
Subtracting Timedelta from Timestamp: 2024-10-02 22:00:00
Difference between Timestamps: 4 days 00:00:00
Total seconds in Timedelta: 108000.0

DataFrame with calculated durations:
       start        end duration
0 2024-09-30 2024-10-02   2 days
1 2024-10-01 2024-10-05   4 days
2 2024-10-02 2024-10-04   2 days
Days: 1
Seconds: 20700
Microseconds: 0
Total duration in seconds: 107100.0
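
Beyond `total_seconds()`, a duration can be expressed in other units. The sketch below shows a few options that pandas offers (division by a unit `Timedelta`, the `components` attribute, and the `.dt` accessor on a Series); the sample durations are made up for illustration.

```python
import pandas as pd

td = pd.Timedelta(days=1, hours=5, minutes=45)

# Dividing by a unit Timedelta expresses the duration in that unit
hours = td / pd.Timedelta(hours=1)
print("Hours:", hours)

# components splits the duration into days, hours, minutes, and so on
parts = td.components
print(parts.days, parts.hours, parts.minutes)

# On a Series of durations, the .dt accessor exposes the same information
durations = pd.Series(pd.to_timedelta(["2 days", "4 days 12 hours"]))
whole_days = durations.dt.days
as_hours = durations.dt.total_seconds() / 3600
print(whole_days.tolist())
print(as_hours.tolist())
```

Note that `.dt.days` returns only the whole-day component, so the 12 extra hours in the second duration show up in `as_hours` but not in `whole_days`.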
