
In [1]:
import numpy as np
import pandas as pd

# Timestamp Object

Timestamps reference particular moments in time (e.g., Oct 24th, 2022 at 7:00 pm).

# Creating Timestamp objects

In [3]:
# creating timestamp
pd.Timestamp('2023/1/5')

Timestamp('2023-01-05 00:00:00')

In [4]:
type(pd.Timestamp('2023/1/5'))

pandas._libs.tslibs.timestamps.Timestamp

In [5]:
# variation
pd.Timestamp('2023-1-5')

Timestamp('2023-01-05 00:00:00')

In [7]:
pd.Timestamp('2023, 1, 5')

Timestamp('2023-01-05 00:00:00')

In [8]:
# only year
pd.Timestamp('2023')

Timestamp('2023-01-01 00:00:00')

In [9]:
pd.Timestamp('5th january 2023')

Timestamp('2023-01-05 00:00:00')

In [10]:
# providing time also
pd.Timestamp('2023/1/5/9:31')

Timestamp('2023-01-05 09:31:00')

In [13]:
pd.Timestamp('2023, 1, 5, 9:31')

Timestamp('2023-01-05 09:31:00')

In [17]:
pd.Timestamp('5th january 2023 9:21AM')

Timestamp('2023-01-05 09:21:00')

In [None]:
# AM to PM
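
This cell is left empty in the notebook. A minimal sketch of what an AM-to-PM variation might look like, assuming the same string format as the previous cell (the variable names `morning` and `evening` are only illustrative):

```python
import pandas as pd

# Same string as the previous cell, once with AM and once with PM
morning = pd.Timestamp('5th january 2023 9:21AM')
evening = pd.Timestamp('5th january 2023 9:21PM')

print(morning.hour)  # 9
print(evening.hour)  # 21
```

The PM suffix shifts the parsed hour to 24-hour time, so the two timestamps are 12 hours apart.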


In [27]:
# using datetime.datetime object   (python object)
import datetime as dt
dt.datetime(2023,1,5,9,21,56)

datetime.datetime(2023, 1, 5, 9, 21, 56)

In [25]:
pd.Timestamp(dt.datetime(2023,1,5,9,21,56))  # a Timestamp can also be built from a datetime object

Timestamp('2023-01-05 09:21:56')

The main benefit of using a Timestamp is that we can fetch any piece of information from it as an attribute.

In [29]:
# fetching attributes
import datetime as dt

x = pd.Timestamp(dt.datetime(2023,1,5,9,21,56))
x

Timestamp('2023-01-05 09:21:56')

In [30]:
x.year

2023

In [31]:
x.month

1

In [32]:
x.day

5

In [33]:
x.hour

9

In [34]:
x.minute

21

In [35]:
x.second

56
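
Beyond the six basic fields, a Timestamp also offers calendar-aware attributes. The following is a small sketch using the same moment in time as `x` above; which attributes are worth showing is a choice made here, not something the notebook itself covers:

```python
import pandas as pd

x = pd.Timestamp('2023-01-05 09:21:56')

print(x.day_name())    # Thursday
print(x.month_name())  # January
print(x.dayofweek)     # 3 (Monday is 0)
print(x.quarter)       # 1
print(x.is_leap_year)  # False
```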

## Why separate objects to handle date and time when Python already has datetime functionality?

- Syntax-wise, Python's datetime is very convenient.
- But performance takes a hit when working with large amounts of data, much like Python lists compared with NumPy arrays.
- The weaknesses of Python's datetime format inspired the NumPy team to add a set of native time series data types to NumPy.
- The datetime64 dtype encodes dates as 64-bit integers, and thus allows arrays of dates to be represented very compactly.

In [36]:
import numpy as np
date = np.array('2023-01-05', dtype=np.datetime64)
date 

array('2023-01-05', dtype='datetime64[D]')

In [38]:
date + np.arange(12)

array(['2023-01-05', '2023-01-06', '2023-01-07', '2023-01-08',
       '2023-01-09', '2023-01-10', '2023-01-11', '2023-01-12',
       '2023-01-13', '2023-01-14', '2023-01-15', '2023-01-16'],
      dtype='datetime64[D]')

- Because of the uniform type in NumPy datetime64 arrays, this type of operation can be accomplished much more quickly than if we were working directly with Python's datetime objects, especially as arrays get large.

- Pandas Timestamp object combines the ease-of-use of python datetime with the efficient storage and vectorized interface of numpy.datetime64

- From a group of these Timestamp objects, Pandas can construct a DatetimeIndex that can be used to index data in a Series or DataFrame
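
To make the contrast in the points above concrete, here is a rough sketch that builds the same 12 consecutive dates both ways. It only illustrates the difference in style and storage; it does not measure speed, and the variable names are illustrative:

```python
import datetime as dt
import numpy as np

start = dt.datetime(2023, 1, 5)

# Pure Python: one datetime object per element, built in a loop
py_dates = [start + dt.timedelta(days=i) for i in range(12)]

# NumPy: a single vectorized addition over a compact datetime64 array
np_dates = np.datetime64('2023-01-05') + np.arange(12)

print(py_dates[-1])       # 2023-01-16 00:00:00
print(np_dates[-1])       # 2023-01-16
print(np_dates.itemsize)  # 8 (bytes per element, i.e. a 64-bit integer)
```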

### DatetimeIndex Object

A collection of pandas Timestamp objects.

- To store a single date  -> use Timestamp
- To store multiple dates  -> use DatetimeIndex

In [41]:
# from strings
pd.DatetimeIndex(['2023/1/5', '2023/2/6', '2023/3/7'])

DatetimeIndex(['2023-01-05', '2023-02-06', '2023-03-07'], dtype='datetime64[ns]', freq=None)

In [42]:
pd.DatetimeIndex(['2023/1/5', '2023/2/6', '2023/3/7'])[0]

Timestamp('2023-01-05 00:00:00')

In [43]:
type(pd.DatetimeIndex(['2023/1/5', '2023/2/6', '2023/3/7']))

pandas.core.indexes.datetimes.DatetimeIndex

In [44]:
type(pd.DatetimeIndex(['2023/1/5', '2023/2/6', '2023/3/7'])[0])

pandas._libs.tslibs.timestamps.Timestamp

In [46]:
# using python datetime object
pd.DatetimeIndex([dt.datetime(2023,1,5),dt.datetime(2023,1,6),dt.datetime(2023,1,7)])

DatetimeIndex(['2023-01-05', '2023-01-06', '2023-01-07'], dtype='datetime64[ns]', freq=None)

In [48]:
# using pd.Timestamp objects
pd.DatetimeIndex([pd.Timestamp(2023,1,5),pd.Timestamp(2023,1,6),pd.Timestamp(2023,1,7)])

DatetimeIndex(['2023-01-05', '2023-01-06', '2023-01-07'], dtype='datetime64[ns]', freq=None)

In [49]:
# using a DatetimeIndex as the index of a Series
dt_index = pd.DatetimeIndex([pd.Timestamp(2023,1,5),pd.Timestamp(2023,1,6),pd.Timestamp(2023,1,7)])
pd.Series([1,2,3], index=dt_index)

2023-01-05    1
2023-01-06    2
2023-01-07    3
dtype: int64
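
Once a Series has a DatetimeIndex, rows can be selected by date. A short sketch, assuming the same three dates as above (the index is rebuilt from strings here only to keep the example self-contained):

```python
import pandas as pd

dt_index = pd.DatetimeIndex(['2023-01-05', '2023-01-06', '2023-01-07'])
s = pd.Series([1, 2, 3], index=dt_index)

# Slice by date strings; both endpoints are included
print(s.loc['2023-01-06':'2023-01-07'].tolist())  # [2, 3]

# Partial string: every row in January 2023
print(s.loc['2023-01'].tolist())  # [1, 2, 3]

# Look up a single row with a Timestamp
print(s.loc[pd.Timestamp('2023-01-05')])  # 1
```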

# summary:
- `Timestamp` (pandas object): a single moment in time. It exposes fields such as year, month, day, hour, minute, and second.
- `DatetimeIndex` (pandas object): a collection of Timestamps, used to store multiple moments in time.

# date_range function

In [54]:
# generate daily dates in a given range
pd.date_range('2023/1/1', end='2023/1/31', freq='D')

DatetimeIndex(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04',
               '2023-01-05', '2023-01-06', '2023-01-07', '2023-01-08',
               '2023-01-09', '2023-01-10', '2023-01-11', '2023-01-12',
               '2023-01-13', '2023-01-14', '2023-01-15', '2023-01-16',
               '2023-01-17', '2023-01-18', '2023-01-19', '2023-01-20',
               '2023-01-21', '2023-01-22', '2023-01-23', '2023-01-24',
               '2023-01-25', '2023-01-26', '2023-01-27', '2023-01-28',
               '2023-01-29', '2023-01-30', '2023-01-31'],
              dtype='datetime64[ns]', freq='D')

In [55]:
# generate alternate dates in a given range using freq=2D
pd.date_range('2023/1/1', end='2023/1/31', freq='2D')

DatetimeIndex(['2023-01-01', '2023-01-03', '2023-01-05', '2023-01-07',
               '2023-01-09', '2023-01-11', '2023-01-13', '2023-01-15',
               '2023-01-17', '2023-01-19', '2023-01-21', '2023-01-23',
               '2023-01-25', '2023-01-27', '2023-01-29', '2023-01-31'],
              dtype='datetime64[ns]', freq='2D')

In [56]:
pd.date_range('2023/1/1', end='2023/1/31', freq='3D')

DatetimeIndex(['2023-01-01', '2023-01-04', '2023-01-07', '2023-01-10',
               '2023-01-13', '2023-01-16', '2023-01-19', '2023-01-22',
               '2023-01-25', '2023-01-28', '2023-01-31'],
              dtype='datetime64[ns]', freq='3D')

In [57]:
# B -> business days
pd.date_range('2023/1/1', end='2023/1/31', freq='B')  # monday - friday

DatetimeIndex(['2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05',
               '2023-01-06', '2023-01-09', '2023-01-10', '2023-01-11',
               '2023-01-12', '2023-01-13', '2023-01-16', '2023-01-17',
               '2023-01-18', '2023-01-19', '2023-01-20', '2023-01-23',
               '2023-01-24', '2023-01-25', '2023-01-26', '2023-01-27',
               '2023-01-30', '2023-01-31'],
              dtype='datetime64[ns]', freq='B')

In [58]:
# W -> one day per week (anchored on Sunday by default)
pd.date_range('2023/1/1', end='2023/1/31', freq='W')  #sunday

DatetimeIndex(['2023-01-01', '2023-01-08', '2023-01-15', '2023-01-22',
               '2023-01-29'],
              dtype='datetime64[ns]', freq='W-SUN')

In [60]:
pd.date_range('2023/1/1', end='2023/1/31', freq='W-THU')

DatetimeIndex(['2023-01-05', '2023-01-12', '2023-01-19', '2023-01-26'], dtype='datetime64[ns]', freq='W-THU')

In [61]:
# H -> hourly frequency
pd.date_range('2023/1/1', end='2023/2/28', freq='H')

DatetimeIndex(['2023-01-01 00:00:00', '2023-01-01 01:00:00',
               '2023-01-01 02:00:00', '2023-01-01 03:00:00',
               '2023-01-01 04:00:00', '2023-01-01 05:00:00',
               '2023-01-01 06:00:00', '2023-01-01 07:00:00',
               '2023-01-01 08:00:00', '2023-01-01 09:00:00',
               ...
               '2023-02-27 15:00:00', '2023-02-27 16:00:00',
               '2023-02-27 17:00:00', '2023-02-27 18:00:00',
               '2023-02-27 19:00:00', '2023-02-27 20:00:00',
               '2023-02-27 21:00:00', '2023-02-27 22:00:00',
               '2023-02-27 23:00:00', '2023-02-28 00:00:00'],
              dtype='datetime64[ns]', length=1393, freq='H')

In [62]:
# 6H -> every 6 hours (a multiple of the hourly frequency)
pd.date_range('2023/1/1', end='2023/2/28', freq='6H')

DatetimeIndex(['2023-01-01 00:00:00', '2023-01-01 06:00:00',
               '2023-01-01 12:00:00', '2023-01-01 18:00:00',
               '2023-01-02 00:00:00', '2023-01-02 06:00:00',
               '2023-01-02 12:00:00', '2023-01-02 18:00:00',
               '2023-01-03 00:00:00', '2023-01-03 06:00:00',
               ...
               '2023-02-25 18:00:00', '2023-02-26 00:00:00',
               '2023-02-26 06:00:00', '2023-02-26 12:00:00',
               '2023-02-26 18:00:00', '2023-02-27 00:00:00',
               '2023-02-27 06:00:00', '2023-02-27 12:00:00',
               '2023-02-27 18:00:00', '2023-02-28 00:00:00'],
              dtype='datetime64[ns]', length=233, freq='6H')

In [64]:
# M -> Month end
pd.date_range('2023/1/1', end='2023/2/28', freq='M')

DatetimeIndex(['2023-01-31', '2023-02-28'], dtype='datetime64[ns]', freq='M')

In [65]:
# MS -> Month start
pd.date_range('2023/1/1', end='2023/2/28', freq='MS')

DatetimeIndex(['2023-01-01', '2023-02-01'], dtype='datetime64[ns]', freq='MS')

In [67]:
# A -> Year end
pd.date_range('2023/1/1', end='2030/2/28', freq='A')

DatetimeIndex(['2023-12-31', '2024-12-31', '2025-12-31', '2026-12-31',
               '2027-12-31', '2028-12-31', '2029-12-31'],
              dtype='datetime64[ns]', freq='A-DEC')

In [68]:
# AS -> Year start
pd.date_range(start='2023/1/5',end='2030/2/28',freq='AS')

DatetimeIndex(['2024-01-01', '2025-01-01', '2026-01-01', '2027-01-01',
               '2028-01-01', '2029-01-01', '2030-01-01'],
              dtype='datetime64[ns]', freq='AS-JAN')

In [70]:
# using periods(number of results)
pd.date_range(start='2023/1/5', periods=25, freq='D')

DatetimeIndex(['2023-01-05', '2023-01-06', '2023-01-07', '2023-01-08',
               '2023-01-09', '2023-01-10', '2023-01-11', '2023-01-12',
               '2023-01-13', '2023-01-14', '2023-01-15', '2023-01-16',
               '2023-01-17', '2023-01-18', '2023-01-19', '2023-01-20',
               '2023-01-21', '2023-01-22', '2023-01-23', '2023-01-24',
               '2023-01-25', '2023-01-26', '2023-01-27', '2023-01-28',
               '2023-01-29'],
              dtype='datetime64[ns]', freq='D')

In [72]:
# using periods (number of results) -> every 6 hours
pd.date_range(start='2023/1/5', periods=25, freq='6H')

DatetimeIndex(['2023-01-05 00:00:00', '2023-01-05 06:00:00',
               '2023-01-05 12:00:00', '2023-01-05 18:00:00',
               '2023-01-06 00:00:00', '2023-01-06 06:00:00',
               '2023-01-06 12:00:00', '2023-01-06 18:00:00',
               '2023-01-07 00:00:00', '2023-01-07 06:00:00',
               '2023-01-07 12:00:00', '2023-01-07 18:00:00',
               '2023-01-08 00:00:00', '2023-01-08 06:00:00',
               '2023-01-08 12:00:00', '2023-01-08 18:00:00',
               '2023-01-09 00:00:00', '2023-01-09 06:00:00',
               '2023-01-09 12:00:00', '2023-01-09 18:00:00',
               '2023-01-10 00:00:00', '2023-01-10 06:00:00',
               '2023-01-10 12:00:00', '2023-01-10 18:00:00',
               '2023-01-11 00:00:00'],
              dtype='datetime64[ns]', freq='6H')

In [73]:
# the next 25 month-end dates after the start date
pd.date_range(start='2023/1/5', periods=25, freq='M')

DatetimeIndex(['2023-01-31', '2023-02-28', '2023-03-31', '2023-04-30',
               '2023-05-31', '2023-06-30', '2023-07-31', '2023-08-31',
               '2023-09-30', '2023-10-31', '2023-11-30', '2023-12-31',
               '2024-01-31', '2024-02-29', '2024-03-31', '2024-04-30',
               '2024-05-31', '2024-06-30', '2024-07-31', '2024-08-31',
               '2024-09-30', '2024-10-31', '2024-11-30', '2024-12-31',
               '2025-01-31'],
              dtype='datetime64[ns]', freq='M')
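
The cells above combine `start` with either `end` or `periods`. As a hedged sketch of two other combinations that `pd.date_range` accepts (the variable names `evenly` and `last_week` are illustrative only):

```python
import pandas as pd

# start + end + periods (no freq): evenly spaced points, endpoints included
evenly = pd.date_range(start='2023-01-01', end='2023-01-31', periods=4)
print(evenly)

# end + periods + freq: count backwards from the end date
last_week = pd.date_range(end='2023-01-31', periods=7, freq='D')
print(last_week[0], last_week[-1])
```

In the first call the 30-day span is divided into three equal steps of 10 days. The second call returns the seven days ending on 31 January.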

### to_datetime function

Converts existing objects (for example, strings or a Series of strings) to pandas Timestamp/DatetimeIndex objects.

In [76]:
# simple series example
s = pd.Series(['2023/1/1','2022/1/1','2021/1/1'])
s

0    2023/1/1
1    2022/1/1
2    2021/1/1
dtype: object

In [78]:
s.str.split('/').str.get(0)

0    2023
1    2022
2    2021
dtype: object
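
String splitting works, but it leaves the values as plain strings. A minimal sketch of the alternative, assuming the same Series `s` as above: `pd.to_datetime` parses the strings once, after which the `.dt` accessor exposes date fields directly.

```python
import pandas as pd

s = pd.Series(['2023/1/1', '2022/1/1', '2021/1/1'])

# Parse the strings into datetime values
parsed = pd.to_datetime(s)

# The .dt accessor exposes the same fields as a single Timestamp
print(parsed.dt.year.tolist())        # [2023, 2022, 2021]
print(parsed.dt.day_name().tolist())  # ['Sunday', 'Saturday', 'Friday']
```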