## 16 Time Analysis Mini Series Pt3 - Holidays
Time series analysis is very important in financial data analysis. Pandas has built-in support for time series functionality, which makes analyzing time series easy and efficient. In this tutorial we will cover how to handle holidays in time series analysis. Using CustomBusinessDay and AbstractHolidayCalendar you can create a custom holiday calendar. USFederalHolidayCalendar is a ready-made calendar available in the pandas library that serves as an example for those who want to create their own custom calendar.

In [6]:
import pandas as pd
aapl = pd.read_csv('D:\\Pandas\\CodeBasics\\datasets\\16aaplnodates.tsv', sep = '\t')
aapl 

Unnamed: 0,Open,High,Low,Close,Volume
0,144.88,145.3,143.1,143.5,14277848
1,143.69,144.79,142.72,144.09,21569557
2,143.02,143.5,142.41,142.73,24128782
3,142.9,144.75,142.9,144.18,19201712
4,144.11,145.95,143.37,145.06,21090636
5,144.73,145.85,144.38,145.53,19781836
6,145.87,146.18,144.82,145.74,24884478
7,145.5,148.49,145.44,147.77,25199373
8,147.97,149.33,147.33,149.04,20132061
9,148.82,150.9,148.57,149.56,23793456


Our df is fine except that it has no dates, which renders the data meaningless. I do know that it runs from 1 July 2017 to 21 July 2017, so we should be able to build a date range for it and insert that into our df. We can make a date range for our df by...

In [7]:
rng = pd.date_range(start = "7/1/2017", end = "7/21/2017", freq = 'B')
rng # This uses the business frequency which means that it ignores weekends when there is no trading

DatetimeIndex(['2017-07-03', '2017-07-04', '2017-07-05', '2017-07-06',
               '2017-07-07', '2017-07-10', '2017-07-11', '2017-07-12',
               '2017-07-13', '2017-07-14', '2017-07-17', '2017-07-18',
               '2017-07-19', '2017-07-20', '2017-07-21'],
              dtype='datetime64[ns]', freq='B')

We notice one issue, however: 4th July 2017 is in our range. As this is a holiday in the USA and stocks are not traded, the range is not usable as it stands. We want to exclude all US holidays, and 'B' cannot do that. In fact, there is no predefined frequency that takes holidays into account.
<br>We therefore have to use the US holiday calendar for this. The two classes that need to be imported are...

In [8]:
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay
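
Before building a frequency from the calendar, it can help to see which dates it actually treats as holidays. The sketch below uses the calendar's `holidays()` method over the same window as our date range; the variable names `cal` and `july_holidays` are only illustrative.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

cal = USFederalHolidayCalendar()

# holidays() returns a DatetimeIndex of observed holidays between start and end
july_holidays = cal.holidays(start="2017-07-01", end="2017-07-21")
print(july_holidays)

# return_name=True returns a Series that maps each date to its holiday name
print(cal.holidays(start="2017-07-01", end="2017-07-21", return_name=True))
```

For this window the only entry should be 4th July 2017, which is exactly the date that 'B' failed to skip.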

**Creating a custom business day**

In [9]:
usb = CustomBusinessDay(calendar=USFederalHolidayCalendar())
usb

<CustomBusinessDay>

We have to create an instance of the class CustomBusinessDay and supply USFederalHolidayCalendar as its calendar.
<br>We can then pass that instance as the freq argument, i.e...

In [10]:
pd.date_range(start = "7/1/2017", end = "7/21/2017", freq = usb)

DatetimeIndex(['2017-07-03', '2017-07-05', '2017-07-06', '2017-07-07',
               '2017-07-10', '2017-07-11', '2017-07-12', '2017-07-13',
               '2017-07-14', '2017-07-17', '2017-07-18', '2017-07-19',
               '2017-07-20', '2017-07-21'],
              dtype='datetime64[ns]', freq='C')

We see that the 4th July is now excluded, as are weekends and all other US federal holidays. We can now use this as an index on our df...

In [11]:
rng = pd.date_range(start = "7/1/2017", end = "7/21/2017", freq = usb)
aapl.set_index(rng, inplace = True)
aapl.head()

Unnamed: 0,Open,High,Low,Close,Volume
2017-07-03,144.88,145.3,143.1,143.5,14277848
2017-07-05,143.69,144.79,142.72,144.09,21569557
2017-07-06,143.02,143.5,142.41,142.73,24128782
2017-07-07,142.9,144.75,142.9,144.18,19201712
2017-07-10,144.11,145.95,143.37,145.06,21090636
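
Setting a date range as the index only works when the range has exactly one entry per row. A minimal sketch of one way to guarantee that is to ask `pd.date_range` for `periods=len(df)` instead of an end date. The small `prices` frame here is a hypothetical stand-in for the file, reusing the first three closing prices shown above.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay

# Hypothetical stand-in for the price data: three rows, no dates
prices = pd.DataFrame({"Close": [143.50, 144.09, 142.73]})

usb = CustomBusinessDay(calendar=USFederalHolidayCalendar())

# Request exactly as many trading days as there are rows
rng = pd.date_range(start="2017-07-01", periods=len(prices), freq=usb)

# Guard against a length mismatch before assigning the index
assert len(rng) == len(prices)
prices.index = rng
print(prices)
```

Because 1 July 2017 is a Saturday, the range rolls forward to Monday 3 July and then skips the 4th.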


**Note:** While there is a US holiday calendar class available, pandas does not ship such classes for other countries. If you would like one for your own country, you can subclass AbstractHolidayCalendar, using the definitions on the Pandas Github page (https://github.com/pandas-dev/pandas/blob/master/pandas/tseries/holiday.py) as a template, and make your own...

**Making our own holiday custom class - AbstractHolidayCalendar()**
<br>First we copy the details of the US Federal Holiday Calendar class...<br>class USFederalHolidayCalendar(AbstractHolidayCalendar):<br>
    """
    US Federal Government Holiday Calendar based on rules specified by:
    https://www.opm.gov/policy-data-oversight/snow-dismissal-procedures/federal-holidays/
    """<br>
    rules = [<br>
        Holiday('New Years Day', month=1, day=1, observance=nearest_workday),<br>
        USMartinLutherKingJr,<br>
        USPresidentsDay,<br>
        USMemorialDay,<br>
        Holiday('July 4th', month=7, day=4, observance=nearest_workday),<br>
        USLaborDay,<br>
        USColumbusDay,<br>
        Holiday('Veterans Day', month=11, day=11, observance=nearest_workday),<br>
        USThanksgivingDay,<br>
        Holiday('Christmas', month=12, day=25, observance=nearest_workday)<br>
    ]<br>
We want to remove every rule except one, which we keep as a guide to the syntax.<br>
###### Let's build one for our birthday<br>
*<font color=blue> from pandas.tseries.holiday import AbstractHolidayCalendar, nearest_workday, Holiday</font>* - We need to import these <br>
*<font color=blue>class myBirthdayCalendar(AbstractHolidayCalendar):</font>*<br>
*<font color=blue>'''This is a custom holiday calendar to include my birthday'''</font>*<br>
*<font color=blue>rules = [</font>*<br>
        *<font color=blue>Holiday("Roly's Birthday", month=2, day=4, observance=nearest_workday)</font>*<br>
*<font color=blue>]</font>*<br>
<br>**We are now going to create our own frequency**<br>
*<font color=blue>myc = CustomBusinessDay(calendar = myBirthdayCalendar())</font>*<br>
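*<font color=blue>pd.date_range('2/1/2017', '2/28/2017', freq=myc)</font>* - When we execute this, the observed birthday should be missing from our date range. Note that 4 February 2017 falls on a Saturday, so with nearest_workday it is Friday 3 February 2017 that drops out<br>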
**Dealing with weekends...again**<br>
When a holiday falls on a weekend it is often observed on the following weekday. The way we deal with that is by specifying the *<font color=blue>observance=nearest_workday</font>* argument.<br>
**Note:** Be careful with this argument, as it literally picks the nearest weekday, i.e. if the holiday falls on a Saturday, the observance is deemed to be on the Friday before, and if it falls on a Sunday, on the Monday after.
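
The snippets above are shown as formatted text, so here is a runnable sketch of the same birthday calendar. It checks which date actually drops out in two different years, to show `nearest_workday` moving the observance in both directions. The names `feb_2017` and `feb_2018` are only illustrative.

```python
import pandas as pd
from pandas.tseries.holiday import AbstractHolidayCalendar, Holiday, nearest_workday
from pandas.tseries.offsets import CustomBusinessDay


class myBirthdayCalendar(AbstractHolidayCalendar):
    """This is a custom holiday calendar to include my birthday"""
    rules = [
        Holiday("Roly's Birthday", month=2, day=4, observance=nearest_workday),
    ]


myc = CustomBusinessDay(calendar=myBirthdayCalendar())
feb_2017 = pd.date_range("2/1/2017", "2/28/2017", freq=myc)
feb_2018 = pd.date_range("2/1/2018", "2/28/2018", freq=myc)

# 4 Feb 2017 is a Saturday, so the observance moves back to Friday 3 Feb
print(pd.Timestamp("2017-02-03") in feb_2017)

# 4 Feb 2018 is a Sunday, so the observance moves forward to Monday 5 Feb
print(pd.Timestamp("2018-02-05") in feb_2018)
```

Both checks should print `False`, since each observed date is removed from its range.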

###### The case of Egypt
The case of Egypt is special because their weekend is Friday and Saturday, and Sunday is a normal working day.<br>To deal with this, we again look to the CustomBusinessDay class for help. The weekmask argument specifies the working days of the week. By default they are Mon-Fri, but this can be changed to whatever days you want...

In [13]:
CustomBusinessDay(weekmask = 'Mon Tue Wed Thu Fri') # This is the default
egypt = CustomBusinessDay(weekmask = 'Sun Mon Tue Wed Thu')
pd.date_range('2/1/2018', '2/28/2018', freq=egypt) # We can now use it as our frequency argument

DatetimeIndex(['2018-02-01', '2018-02-04', '2018-02-05', '2018-02-06',
               '2018-02-07', '2018-02-08', '2018-02-11', '2018-02-12',
               '2018-02-13', '2018-02-14', '2018-02-15', '2018-02-18',
               '2018-02-19', '2018-02-20', '2018-02-21', '2018-02-22',
               '2018-02-25', '2018-02-26', '2018-02-27', '2018-02-28'],
              dtype='datetime64[ns]', freq='C')
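
A quick way to confirm that the weekmask did what we wanted is to look at the weekday names in the resulting range. This is a small sketch using `DatetimeIndex.day_name()`; the variable `working_days` is only illustrative.

```python
import pandas as pd
from pandas.tseries.offsets import CustomBusinessDay

egypt = CustomBusinessDay(weekmask="Sun Mon Tue Wed Thu")
rng = pd.date_range("2/1/2018", "2/28/2018", freq=egypt)

# Collect the distinct weekday names that appear in the range
working_days = set(rng.day_name())
print(working_days)
```

Friday and Saturday should not appear in the set, while Sunday should.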

**Holidays in Egypt**<br>
To take into account the holidays in Egypt, which will be different from those in the US, you have to provide them as an argument to CustomBusinessDay...

In [15]:
egypt = CustomBusinessDay(weekmask = 'Sun Mon Tue Wed Thu', holidays = ['2018-02-04'])
pd.date_range('2/1/2018', '2/28/2018', freq=egypt)

DatetimeIndex(['2018-02-01', '2018-02-05', '2018-02-06', '2018-02-07',
               '2018-02-08', '2018-02-11', '2018-02-12', '2018-02-13',
               '2018-02-14', '2018-02-15', '2018-02-18', '2018-02-19',
               '2018-02-20', '2018-02-21', '2018-02-22', '2018-02-25',
               '2018-02-26', '2018-02-27', '2018-02-28'],
              dtype='datetime64[ns]', freq='C')

Now 4th February 2018 is missing from the date range, as our custom frequency treats it as a holiday. As the square brackets denote, we can pass a list of dates for all holidays to exclude them from the business days.
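
The same custom frequency can also be used for date arithmetic, not just for `date_range`. The sketch below passes two holidays and adds one business day to a timestamp. Both dates in `closed` are hypothetical and chosen only to illustrate the syntax.

```python
import pandas as pd
from pandas.tseries.offsets import CustomBusinessDay

# Hypothetical holiday dates, for illustration only
closed = ["2018-02-04", "2018-02-05"]
egypt = CustomBusinessDay(weekmask="Sun Mon Tue Wed Thu", holidays=closed)

# Thursday 1 Feb 2018 plus one business day skips the Fri/Sat weekend
# and then both listed holidays
next_day = pd.Timestamp("2018-02-01") + egypt
print(next_day)  # 2018-02-06 00:00:00
```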