Readme:


We encourage you to explore further functionality in 'Python for Data Analysis, 3E' by Wes McKinney, Chapter 11: 'Time Series'.</br>
Link: https://wesmckinney.com/book/time-series

In [1]:
import numpy as np
import pandas as pd
from datetime import datetime

<h3><b>Task 1 </b></h3>
<p>
pandas is generally oriented toward working with arrays of dates, whether used as an axis index or a column in a DataFrame.  </br>
The pandas.to_datetime method parses many different kinds of date representations. Standard date formats like ISO 8601 can be parsed quickly. </br>
Run the code below and examine the data type it returns. </br>
</p>


In [3]:
import pandas as pd

datestrs = ["2011-07-06 12:00:00", "2011-08-06 00:00:00"]
dt = pd.to_datetime(datestrs)
print(dt)
print(type(dt))


DatetimeIndex(['2011-07-06 12:00:00', '2011-08-06 00:00:00'], dtype='datetime64[ns]', freq=None)
<class 'pandas.core.indexes.datetimes.DatetimeIndex'>
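<p>
For date strings that are not in ISO 8601 form, an explicit format can be passed to pandas.to_datetime. The following is a minimal sketch; the day-first sample strings are invented for illustration and are not part of the exercise data.
</p>

```python
import pandas as pd

# Hypothetical day-first strings (DD/MM/YYYY), made up for this example.
raw = ["06/07/2011", "06/08/2011"]

# An explicit format removes the ambiguity between day-first and
# month-first input that automatic parsing would have to guess at.
parsed = pd.to_datetime(raw, format="%d/%m/%Y")

print(parsed)
print(type(parsed))
```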


<h3><b>Task 2 </b></h3>
<p>
Scalar values from a DatetimeIndex are pandas Timestamp objects.</br>
Now print the second item from the DatetimeIndex and check what data type it is.</br></br>

Note: A pandas.Timestamp can be substituted most places where you would use a datetime object. The reverse is not true, however, because pandas.Timestamp can store nanosecond precision data, while datetime stores only up to microseconds. </br>
Additionally, pandas.Timestamp understands how to do time zone conversions and other kinds of manipulation. (Before pandas 2.0 a Timestamp could also store frequency information; that attribute has since been removed.) </br>
</p>


In [4]:
import pandas as pd

datestrs = ["2011-07-06 12:00:00", "2011-08-06 00:00:00"]
dt = pd.to_datetime(datestrs)

# Access and print the second item
second_item = dt[1]
print(second_item)
print(type(second_item))


2011-08-06 00:00:00
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
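<p>
The precision difference described in the note can be seen directly. This is a small sketch using a made-up timestamp with a nanosecond component; the variable names are only illustrative.
</p>

```python
from datetime import datetime

import pandas as pd

# Illustrative timestamp with nine fractional digits (nanoseconds).
ts = pd.Timestamp("2011-08-06 00:00:00.123456789")

# Timestamp is a subclass of datetime, which is why it can be
# substituted in most places that expect a datetime object.
print(isinstance(ts, datetime))

# The nanosecond field has no counterpart on a plain datetime.
print(ts.microsecond, ts.nanosecond)

# Converting back keeps only microseconds; warn=False silences the
# warning pandas emits about the discarded nanoseconds.
py_dt = ts.to_pydatetime(warn=False)
print(py_dt)
```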


<h3><b>Task 3 </b></h3>
<p>
Now run the code below to see how the None value is parsed. What does 'NaT' mean? </br>
Then call isna() on 'idx' and analyze the output.</br>
</p>


In [6]:
import pandas as pd

datestrs = ["2011-07-06 12:00:00", "2011-08-06 00:00:00"]
idx = pd.to_datetime(datestrs + [None])

print(idx)
print(pd.isna(idx))


DatetimeIndex(['2011-07-06 12:00:00', '2011-08-06 00:00:00', 'NaT'], dtype='datetime64[ns]', freq=None)
[False False  True]
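<p>
Once missing timestamps have been located, they usually need to be filtered out. A short sketch of two equivalent ways to do that, reusing the same input as above:
</p>

```python
import pandas as pd

idx = pd.to_datetime(["2011-07-06 12:00:00", "2011-08-06 00:00:00", None])

# notna() is the complement of isna(): True where a real timestamp exists.
valid_mask = idx.notna()

# Filter with the boolean mask ...
clean = idx[valid_mask]
print(clean)

# ... or drop the missing entries directly; both give the same index.
print(idx.dropna().equals(clean))
```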


<h3><b>Task 4 </b></h3>
<p>
Create a Series of length 1000, populated with random numbers, with a daily date index starting at 2000-01-01. </br>
</p>


In [None]:
import pandas as pd
import numpy as np

# Create date range
date_index = pd.date_range(start="2000-01-01", periods=1000)

# Generate random numbers
random_series = pd.Series(np.random.randn(1000), index=date_index)

print(random_series.head())


2000-01-01    2.349186
2000-01-02    0.259037
2000-01-03   -0.350644
2000-01-04   -1.644959
2000-01-05   -0.521223
Freq: D, dtype: float64
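<p>
Because np.random.randn is unseeded, the values above change on every run. If reproducible numbers are wanted, one option is a seeded NumPy Generator. This is a sketch; the seed value 42 and the name seeded_series are arbitrary choices, not part of the original task.
</p>

```python
import numpy as np
import pandas as pd

# A seeded generator yields the same "random" numbers on every run.
rng = np.random.default_rng(seed=42)  # seed value is arbitrary

date_index = pd.date_range(start="2000-01-01", periods=1000)
seeded_series = pd.Series(rng.standard_normal(1000), index=date_index)

print(seeded_series.head())
print(seeded_series.index[-1])
```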


<h3><b>Task 5 </b></h3>
<p>
Select the data whose index falls in the year 2002. </br>
</p>


In [11]:
# Select data from the year 2002
data_2002 = random_series["2002"]

print(data_2002)


2002-01-01    0.596732
2002-01-02    0.199104
2002-01-03   -1.980473
2002-01-04    1.020998
2002-01-05   -0.728190
                ...   
2002-09-22   -0.826895
2002-09-23   -0.726109
2002-09-24    1.636765
2002-09-25    1.151573
2002-09-26    2.067038
Freq: D, Length: 269, dtype: float64
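<p>
The same partial-string selection works at finer granularity and in slices. The sketch below uses a stand-in Series (named ts here, filled with a simple counter rather than random numbers) so that the lengths are easy to check.
</p>

```python
import numpy as np
import pandas as pd

# Stand-in series: 1000 daily values starting 2000-01-01.
ts = pd.Series(
    np.arange(1000.0),
    index=pd.date_range(start="2000-01-01", periods=1000),
)

# A year-month string selects every day in that month.
may_2001 = ts.loc["2001-05"]

# Partial strings also work as slice endpoints; both ends are inclusive.
spring_2001 = ts.loc["2001-03":"2001-05"]

print(len(may_2001), len(spring_2001))
```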


<h3><b>Task 6 </b></h3>
<p>
Remove all data after 2001-01-01 and display the result.</br>
</p>


In [12]:
# Keep data up to and including 2001-01-01
filtered_data = random_series[: "2001-01-01"]

print(filtered_data)


2000-01-01    2.349186
2000-01-02    0.259037
2000-01-03   -0.350644
2000-01-04   -1.644959
2000-01-05   -0.521223
                ...   
2000-12-28   -0.551923
2000-12-29    0.502325
2000-12-30    0.657067
2000-12-31   -1.061003
2001-01-01   -2.549900
Freq: D, Length: 367, dtype: float64
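<p>
Series.truncate offers an equivalent, more explicit way to cut a time series at a date. A minimal sketch on a stand-in Series (ts is an illustrative name):
</p>

```python
import numpy as np
import pandas as pd

ts = pd.Series(
    np.arange(1000.0),
    index=pd.date_range(start="2000-01-01", periods=1000),
)

# Slicing with a date string keeps everything up to and including that date.
by_slice = ts[:"2001-01-01"]

# truncate(after=...) drops everything after the given date.
by_truncate = ts.truncate(after="2001-01-01")

print(by_slice.equals(by_truncate))
print(len(by_truncate))
```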


<h3><b>Task 7 </b></h3>
<p>
Create a DatetimeIndex of length 100, starting from 2000-01-01, with weekly frequency anchored on Tuesdays. </br>
</p>


In [13]:
import pandas as pd

# Create a DatetimeIndex with weekly frequency on Tuesdays
dt_index = pd.date_range(start="2000-01-01", periods=100, freq="W-TUE")

print(dt_index)


DatetimeIndex(['2000-01-04', '2000-01-11', '2000-01-18', '2000-01-25',
               '2000-02-01', '2000-02-08', '2000-02-15', '2000-02-22',
               '2000-02-29', '2000-03-07', '2000-03-14', '2000-03-21',
               '2000-03-28', '2000-04-04', '2000-04-11', '2000-04-18',
               '2000-04-25', '2000-05-02', '2000-05-09', '2000-05-16',
               '2000-05-23', '2000-05-30', '2000-06-06', '2000-06-13',
               '2000-06-20', '2000-06-27', '2000-07-04', '2000-07-11',
               '2000-07-18', '2000-07-25', '2000-08-01', '2000-08-08',
               '2000-08-15', '2000-08-22', '2000-08-29', '2000-09-05',
               '2000-09-12', '2000-09-19', '2000-09-26', '2000-10-03',
               '2000-10-10', '2000-10-17', '2000-10-24', '2000-10-31',
               '2000-11-07', '2000-11-14', '2000-11-21', '2000-11-28',
               '2000-12-05', '2000-12-12', '2000-12-19', '2000-12-26',
               '2001-01-02', '2001-01-09', '2001-01-16', '2001-01-23',
      


<h3><b>Task 8 </b></h3>
<p>
Generating Date Ranges. </br>
By default, pandas.date_range generates daily timestamps. </br>
Create a DatetimeIndex in the range from 2000-01-01 to 2000-12-01 with frequency 'business end of month'. </br>
</p>


In [17]:
import pandas as pd

# Generate date range with business month end frequency using 'BME'
dt_index = pd.date_range(start="2000-01-01", end="2000-12-01", freq="BME")

print(dt_index)


DatetimeIndex(['2000-01-31', '2000-02-29', '2000-03-31', '2000-04-28',
               '2000-05-31', '2000-06-30', '2000-07-31', '2000-08-31',
               '2000-09-29', '2000-10-31', '2000-11-30'],
              dtype='datetime64[ns]', freq='BME')
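<p>
String aliases such as 'BME' have changed across pandas releases (older versions spelled this one 'BM'). Passing the offset object itself is one way to sidestep the alias name. The sketch below assumes nothing beyond pandas.offsets.BMonthEnd and checks that every generated date is a weekday.
</p>

```python
import pandas as pd

# The offset object behind the business-month-end alias.
bme = pd.offsets.BMonthEnd()

idx = pd.date_range(start="2000-01-01", end="2000-12-01", freq=bme)
print(idx)

# Every business month end must fall on Monday (0) through Friday (4).
print((idx.dayofweek < 5).all())
```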


<h3><b>Task 9 </b></h3>
<p>
pandas.date_range by default preserves the time (if any) of the start or end timestamp.</br>
Run the code below and look at the start and end values. </br>
</p>


In [18]:
pd.date_range("2012-05-02 12:56:31", periods=5)

DatetimeIndex(['2012-05-02 12:56:31', '2012-05-03 12:56:31',
               '2012-05-04 12:56:31', '2012-05-05 12:56:31',
               '2012-05-06 12:56:31'],
              dtype='datetime64[ns]', freq='D')

<h3><b>Task 10 </b></h3>
<p>
Sometimes you will have start or end dates with time information but want to generate a set of timestamps normalized to midnight as a convention. </br>
To do this, there is a normalize option. Run the code below and analyze the output. </br>
</p>


In [19]:
pd.date_range("2012-05-02 12:56:31", periods=5, normalize=True)

DatetimeIndex(['2012-05-02', '2012-05-03', '2012-05-04', '2012-05-05',
               '2012-05-06'],
              dtype='datetime64[ns]', freq='D')
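<p>
An index that already exists can be normalized after the fact with DatetimeIndex.normalize. A brief sketch comparing the two routes:
</p>

```python
import pandas as pd

# Index that keeps the 12:56:31 time component.
with_time = pd.date_range("2012-05-02 12:56:31", periods=5)

# Reset every timestamp to midnight after the fact.
normalized = with_time.normalize()

# Same result as asking date_range to normalize up front.
expected = pd.date_range("2012-05-02 12:56:31", periods=5, normalize=True)
print(normalized.equals(expected))
```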

<h3><b>Task 11 </b></h3>
<p>
Frequencies and Date Offsets. </br>
1. Create a DatetimeIndex in range from 2000-01-01 to 2000-01-03 23:59 with frequency '6 hours'. </br>
2. Then change the frequency to '2 hours and 30 minutes'. </br>

</p>


In [22]:
import pandas as pd

# 1. Create DatetimeIndex with 6-hour frequency using lowercase 'h'
dt_index_6h = pd.date_range(start="2000-01-01", end="2000-01-03 23:59", freq="6h")
print("6-hour frequency:")
print(dt_index_6h)

# 2. Create DatetimeIndex with 2 hours 30 minutes frequency using lowercase 'h'
dt_index_2h30 = pd.date_range(start="2000-01-01", end="2000-01-03 23:59", freq="2h30min")
print("\n2 hours 30 minutes frequency:")
print(dt_index_2h30)


6-hour frequency:
DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 06:00:00',
               '2000-01-01 12:00:00', '2000-01-01 18:00:00',
               '2000-01-02 00:00:00', '2000-01-02 06:00:00',
               '2000-01-02 12:00:00', '2000-01-02 18:00:00',
               '2000-01-03 00:00:00', '2000-01-03 06:00:00',
               '2000-01-03 12:00:00', '2000-01-03 18:00:00'],
              dtype='datetime64[ns]', freq='6h')

2 hours 30 minutes frequency:
DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 02:30:00',
               '2000-01-01 05:00:00', '2000-01-01 07:30:00',
               '2000-01-01 10:00:00', '2000-01-01 12:30:00',
               '2000-01-01 15:00:00', '2000-01-01 17:30:00',
               '2000-01-01 20:00:00', '2000-01-01 22:30:00',
               '2000-01-02 01:00:00', '2000-01-02 03:30:00',
               '2000-01-02 06:00:00', '2000-01-02 08:30:00',
               '2000-01-02 11:00:00', '2000-01-02 13:30:00',
               '2000-01-02 16:00:00', '2000-01-
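<p>
Frequencies can also be built from offset objects instead of strings, which makes a combined step such as '2 hours and 30 minutes' explicit. A minimal sketch using Hour and Minute from pandas.tseries.offsets:
</p>

```python
import pandas as pd
from pandas.tseries.offsets import Hour, Minute

# Fixed-length offsets can be added together into a single step.
step = Hour(2) + Minute(30)

idx = pd.date_range(start="2000-01-01", end="2000-01-03 23:59", freq=step)

print(step)
print(len(idx))
print(idx[-1])
```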

<h3><b>Task 12 </b></h3>
<p>
Now create a DatetimeIndex in the range from 2012-01-01 to 2012-09-01 containing the fourth Wednesday of each month. </br>
</p>


In [23]:
import pandas as pd

# 'WOM-4WED' is the week-of-month alias for the fourth Wednesday of each month.
# Adding Week(weekday=2) * 3 to month starts gives only the third Wednesday
# unless the month happens to begin on a Wednesday, so it is not used here.
# September's fourth Wednesday (2012-09-26) lies past the end date and is excluded.
fourth_wednesdays = pd.date_range(start="2012-01-01", end="2012-09-01", freq="WOM-4WED")

print(fourth_wednesdays)


DatetimeIndex(['2012-01-25', '2012-02-22', '2012-03-28', '2012-04-25',
               '2012-05-23', '2012-06-27', '2012-07-25', '2012-08-22'],
              dtype='datetime64[ns]', freq='WOM-4WED')
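<p>
A useful sanity check for this kind of task: the fourth Wednesday of any month always falls on day 22 through 28. The sketch below computes the dates by hand from the month starts, as an independent cross-check; the variable names are illustrative.
</p>

```python
import pandas as pd

month_starts = pd.date_range(start="2012-01-01", end="2012-09-01", freq="MS")

# Days from the 1st to the first Wednesday (Monday=0, so Wednesday=2),
# then three more weeks to land on the fourth Wednesday.
days_to_first_wed = (2 - month_starts.dayofweek) % 7
fourth_wed = month_starts + pd.to_timedelta(days_to_first_wed + 21, unit="D")

# Keep only dates inside the requested range.
fourth_wed = fourth_wed[fourth_wed <= pd.Timestamp("2012-09-01")]

print(fourth_wed)
print((fourth_wed.dayofweek == 2).all())
```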


<h3><b>Task 13</b></h3>
<p>
Periods and Period Arithmetic. </br>
1. Create a pandas.Period that represents the full time span from January 1, 2011, to December 31, 2011, inclusive.</br>
2. Then add 5 to it and analyze the result.</br>
</p>


In [24]:
import pandas as pd

# 1. Create a Period representing the full year 2011
p = pd.Period('2011', freq='Y')  # 'Y' for yearly frequency

print("Original Period:", p)

# 2. Add 5 periods (years) to it
p_added = p + 5

print("Period after adding 5:", p_added)


Original Period: 2011
Period after adding 5: 2016
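<p>
A Period is a span rather than a single instant, so it exposes both of its boundaries, and subtracting two periods of the same frequency gives the number of units between them. A short sketch (the 2014 period is just an example value):
</p>

```python
import pandas as pd

p = pd.Period("2011", freq="Y")

# The span covered by the period: first and last instant of 2011.
print(p.start_time)
print(p.end_time)

# Subtracting two annual periods yields an offset whose .n is the
# number of years between them.
later = pd.Period("2014", freq="Y")
diff = later - p
print(diff.n)
```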


<h3><b>Task 14 </b></h3>
<p>
Compare period and date range. </br>
Run the code below and analyze the result.
</p>


In [26]:
import pandas as pd

# Creating a PeriodIndex with monthly frequency
periods = pd.period_range("2000-01-01", "2000-06-30", freq="M") 

# Creating a DatetimeIndex with monthly frequency (end of month)
dt = pd.date_range("2000-01-01", "2000-06-30", freq="ME")  # use 'ME' instead of deprecated 'M'

print("PeriodIndex:")
print(periods)
print("\nDatetimeIndex:")
print(dt)


PeriodIndex:
PeriodIndex(['2000-01', '2000-02', '2000-03', '2000-04', '2000-05', '2000-06'], dtype='period[M]')

DatetimeIndex:
DatetimeIndex(['2000-01-31', '2000-02-29', '2000-03-31', '2000-04-30',
               '2000-05-31', '2000-06-30'],
              dtype='datetime64[ns]', freq='ME')
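<p>
The two index types can be converted into each other, which helps when comparing them. The sketch below passes the MonthEnd offset object instead of a string alias so that it does not depend on which alias spelling the installed pandas accepts.
</p>

```python
import pandas as pd

periods = pd.period_range("2000-01-01", "2000-06-30", freq="M")
month_ends = pd.date_range(
    "2000-01-01", "2000-06-30", freq=pd.offsets.MonthEnd()
)

# Timestamps -> periods: each month-end date maps to its containing month.
print(month_ends.to_period("M").equals(periods))

# Periods -> timestamps: pick the first instant of each monthly span.
starts = periods.to_timestamp(how="start")
print(starts)
```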


<h3><b>Task 15 </b></h3>
<p>
Run the code below and analyze the output. </br>
Think about how you could use this functionality in real life. </br>

</p>


In [27]:
values = ["2001Q3", "2002Q2", "2003Q1"]
index = pd.PeriodIndex(values, freq="Q-DEC") 

print(index)

PeriodIndex(['2001Q3', '2002Q2', '2003Q1'], dtype='period[Q-DEC]')
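<p>
One real-life use is financial reporting, where quarter labels depend on when the fiscal year ends. The sketch below shows the calendar dates behind the same labels under 'Q-DEC' and under a hypothetical fiscal year ending in January ('Q-JAN'), chosen purely for illustration.
</p>

```python
import pandas as pd

values = ["2001Q3", "2002Q2", "2003Q1"]

# Calendar-year quarters: Q3 of 2001 runs July through September 2001.
calendar_q = pd.PeriodIndex(values, freq="Q-DEC")
print(calendar_q.start_time)
print(calendar_q.end_time.normalize())

# Same labels, hypothetical fiscal year ending in January:
# fiscal 2001 ends in January 2001, so its Q3 starts in August 2000.
fiscal_q = pd.PeriodIndex(values, freq="Q-JAN")
print(fiscal_q.start_time)
```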
