# `pandas datetime remaining topics`

In [1]:
import numpy as np
import pandas as pd

#### `timedelta`

    A Timedelta object represents a duration. Subtracting one pandas Timestamp from another returns a Timedelta.

In [2]:
t1 = pd.Timestamp('26 March 2024 20:04')
t1

Timestamp('2024-03-26 20:04:00')

In [3]:
t2 = pd.Timestamp('30 April 2024 20:04')
t2

Timestamp('2024-04-30 20:04:00')

In [4]:
t2 - t1

Timedelta('35 days 00:00:00')

In [5]:
type(t2 - t1)

pandas._libs.tslibs.timedeltas.Timedelta

In [6]:
(t2 - t1).__class__.__name__

'Timedelta'

##### `We can also create a timedelta object manually by using the pd.Timedelta() constructor.`

    Available kwargs: {days, seconds, microseconds,
    milliseconds, minutes, hours, weeks}

In [7]:
pd.Timedelta(days = 2)

Timedelta('2 days 00:00:00')

In [8]:
delta = pd.Timedelta(weeks = 2, days = 2, hours = 5, minutes = 78, seconds = 12)
delta

Timedelta('16 days 06:18:12')
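A Timedelta is not only the result of a subtraction; it can also be added back to a Timestamp. The sketch below is a small illustration of that, reusing the values from the cells above. The variable names `start`, `end`, `seconds`, and `parts` are only chosen for this example.

```python
import pandas as pd

start = pd.Timestamp('26 March 2024 20:04')
delta = pd.Timedelta(weeks=2, days=2, hours=5, minutes=78, seconds=12)

# Adding a Timedelta to a Timestamp gives a new Timestamp.
end = start + delta
print(end)

# total_seconds() expresses the whole duration as a float number of seconds.
seconds = delta.total_seconds()
print(seconds)

# .components splits the duration into days, hours, minutes, seconds, ...
parts = delta.components
print(parts.days, parts.hours, parts.minutes, parts.seconds)
```

Note that the 78 minutes passed to the constructor are normalized into 1 hour and 18 minutes, which is why the components report 6 hours and 18 minutes.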

##### `We can also create a timedelta index object. It is a collection of timedelta objects.`

In [9]:
pd.TimedeltaIndex(['2 days', '3 days 6 hours 30 minutes 40 seconds 90 microseconds'])

TimedeltaIndex(['2 days 00:00:00', '3 days 06:30:40.000090'], dtype='timedelta64[ns]', freq=None)
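One possible use of a TimedeltaIndex is to shift a single Timestamp by several durations at once. The following is a minimal sketch under that assumption; the names `offsets`, `base`, and `shifted` are made up for this example.

```python
import pandas as pd

offsets = pd.TimedeltaIndex(['2 days', '3 days 6 hours'])
base = pd.Timestamp('2024-03-26 20:04')

# Timestamp + TimedeltaIndex gives a DatetimeIndex, one entry per offset.
shifted = base + offsets
print(shifted)

# Like a DatetimeIndex, a TimedeltaIndex exposes its fields directly.
print(offsets.days)
```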

## `Applying methods on DatetimeIndex object`

    Once we have created a pandas DatetimeIndex object, we can either use it as a column in our
    dataframe or choose to use it as the index of our series or dataframe.

In [10]:
index = pd.DatetimeIndex(['2 March 2022', '4 July 1899', '21 June 2022', '21 Sep 2023'])
index

DatetimeIndex(['2022-03-02', '1899-07-04', '2022-06-21', '2023-09-21'], dtype='datetime64[ns]', freq=None)

### `Note`
`When we want to use datetime methods on an index object, we do not need the dt accessor. We only need the dt accessor when we are dealing with a datetime column of a dataframe.`

In [11]:
# Only the last expression in a cell is displayed, so the output below is from day_name().
index.month_name()
index.day_name()

Index(['Wednesday', 'Tuesday', 'Tuesday', 'Thursday'], dtype='object')

In [12]:
# Only the last expression in a cell is displayed, so the output below is from index.second.
index.quarter
index.day
index.year
index.month
index.minute
index.hour
index.second

Index([0, 0, 0, 0], dtype='int32')
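As a rough illustration of the note above, the sketch below calls the same method once on a DatetimeIndex and once on a datetime column. The dataframe `df` and its column name `when` are hypothetical and exist only for this example.

```python
import pandas as pd

dates = ['2022-03-02', '1899-07-04', '2022-06-21', '2023-09-21']

index = pd.DatetimeIndex(dates)
df = pd.DataFrame({'when': pd.to_datetime(dates)})

# On a DatetimeIndex the method is called directly.
index_months = index.month_name()
print(index_months)

# On a datetime column (a Series) the same method sits behind .dt
column_months = df['when'].dt.month_name()
print(column_months)
```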

# `Time series data`

    Time series data is data recorded over a period of time. This type of data is very important for a data
    scientist.
    
    With time series data we can do two things: analysis and forecasting.
    
    Time series is a very broad topic, so we are not going to study it in depth and we will not do any forecasting.
    We are just learning to do a bit of analysis on time series data, and only as far as it is relevant to our context.

In [13]:
google = pd.read_csv('google.csv')
google

,Date,Open,High,Low,Close,Adj Close,Volume
0,2004-08-19,49.813290,51.835709,47.800831,49.982655,49.982655,44871361
1,2004-08-20,50.316402,54.336334,50.062355,53.952770,53.952770,22942874
2,2004-08-23,55.168217,56.528118,54.321388,54.495735,54.495735,18342897
3,2004-08-24,55.412300,55.591629,51.591621,52.239197,52.239197,15319808
4,2004-08-25,52.284027,53.798351,51.746044,52.802086,52.802086,9232276
...,...,...,...,...,...,...,...
4466,2022-05-16,2307.679932,2332.149902,2286.699951,2295.850098,2295.850098,1164100
4467,2022-05-17,2344.550049,2344.550049,2306.750000,2334.030029,2334.030029,1078800
4468,2022-05-18,2304.750000,2313.913086,2242.840088,2248.020020,2248.020020,1399100
4469,2022-05-19,2236.820068,2271.750000,2209.360107,2214.909912,2214.909912,1459600


`The first step in any time series analysis is to convert the date column into a pandas datetime column and then make it the index of the dataframe.`

In [14]:
google.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4471 entries, 0 to 4470
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Date       4471 non-null   object 
 1   Open       4471 non-null   float64
 2   High       4471 non-null   float64
 3   Low        4471 non-null   float64
 4   Close      4471 non-null   float64
 5   Adj Close  4471 non-null   float64
 6   Volume     4471 non-null   int64  
dtypes: float64(5), int64(1), object(1)
memory usage: 244.6+ KB


#### `As we can see, the Date column currently has the object data type, so we need to convert it to datetime.`

In [15]:
google['Date'] = pd.to_datetime(google['Date'])
google.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4471 entries, 0 to 4470
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   Date       4471 non-null   datetime64[ns]
 1   Open       4471 non-null   float64       
 2   High       4471 non-null   float64       
 3   Low        4471 non-null   float64       
 4   Close      4471 non-null   float64       
 5   Adj Close  4471 non-null   float64       
 6   Volume     4471 non-null   int64         
dtypes: datetime64[ns](1), float64(5), int64(1)
memory usage: 244.6 KB


#### `Now we need to set this Date column as the index of our dataframe.`

In [16]:
google = google.set_index('Date')
google.head()

Date,Open,High,Low,Close,Adj Close,Volume
2004-08-19,49.81329,51.835709,47.800831,49.982655,49.982655,44871361
2004-08-20,50.316402,54.336334,50.062355,53.95277,53.95277,22942874
2004-08-23,55.168217,56.528118,54.321388,54.495735,54.495735,18342897
2004-08-24,55.4123,55.591629,51.591621,52.239197,52.239197,15319808
2004-08-25,52.284027,53.798351,51.746044,52.802086,52.802086,9232276
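The two steps above (converting the Date column and setting it as the index) can also be done while reading the file, using the `parse_dates` and `index_col` parameters of `pd.read_csv`. The sketch below assumes nothing about `google.csv` itself: it reads a few rows copied from the output above out of an in-memory string, and the names `csv_text` and `sample` are only for this example. It also shows one thing a DatetimeIndex makes possible, namely selecting rows with a partial date string.

```python
import io
import pandas as pd

# A few rows copied from the output above, standing in for the real file.
csv_text = """Date,Open,High,Low,Close,Adj Close,Volume
2004-08-19,49.813290,51.835709,47.800831,49.982655,49.982655,44871361
2004-08-20,50.316402,54.336334,50.062355,53.952770,53.952770,22942874
2004-08-23,55.168217,56.528118,54.321388,54.495735,54.495735,18342897
2022-05-16,2307.679932,2332.149902,2286.699951,2295.850098,2295.850098,1164100
2022-05-17,2344.550049,2344.550049,2306.750000,2334.030029,2334.030029,1078800
"""

# parse_dates converts the column to datetime, index_col makes it the index.
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'], index_col='Date')
print(type(sample.index))

# With a DatetimeIndex, .loc accepts partial date strings.
rows_2004 = sample.loc['2004']         # every row from the year 2004
rows_may_2022 = sample.loc['2022-05']  # every row from May 2022
print(rows_2004)
print(rows_may_2022)
```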
