In [75]:
import pandas as pd
import os
import utility

In [76]:
#import data: 
datapath = os.getcwd() + '/Data/'
holidays = pd.read_csv(datapath + 'holidays_events.csv')

## Type of holidays 
The holiday type describes what a particular day means for a particular store. A day can be:

- holiday, celebrated
- holiday, not celebrated
- not holiday, celebrated 
- not holiday, not celebrated
- event


For a particular holiday, we don't know whether it is the day off or the holiday itself that affects sales. So I'll also add "whether it's celebrated (day off)" as a feature.

## Location and description
There are three locales (Local, Regional, National). Based on the locale, we can determine whether a particular store celebrates the holiday.

Different holidays likely come with different shopping habits, so each holiday should be considered separately.

So at first glance, I will add four columns:
- Local holiday (0 is none, 1 - 27 are the different descriptions of the holidays.)
- Regional holiday (0 is none, 1 - 4 are four different holidays.)
- National holiday (0 is none, 1 - 29 are different holidays.)
- National events (0 is none, 1 - 43 are different events.)
  - Events are not recurring. Hence they should be labeled separately.
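
The integer coding described above could be sketched as follows. This is only a minimal illustration on toy rows that mimic the columns of `holidays_events.csv`; the helper name `encode_descriptions` and the `*_code` column names are my own assumptions, not something defined elsewhere in this notebook.

```python
import pandas as pd

# Toy rows with the same column names as holidays_events.csv (illustrative only).
toy = pd.DataFrame({
    "date": pd.to_datetime(["2013-01-01", "2013-04-01", "2013-04-12", "2014-04-12"]),
    "locale": ["National", "Regional", "Local", "Local"],
    "description": ["Primer dia del ano", "Provincializacion de Cotopaxi",
                    "Fundacion de Cuenca", "Fundacion de Cuenca"],
})

def encode_descriptions(frame, locale):
    """Hypothetical helper: 0 = no holiday of this locale, 1..N = distinct descriptions."""
    is_locale = frame["locale"] == locale
    # factorize assigns 0..N-1 in order of first appearance; shift so 0 can mean "none".
    codes, _ = pd.factorize(frame.loc[is_locale, "description"])
    out = pd.Series(0, index=frame.index)
    out.loc[is_locale] = codes + 1
    return out

for locale in ["Local", "Regional", "National"]:
    toy[f"{locale.lower()}_code"] = encode_descriptions(toy, locale)

print(toy[["date", "local_code", "regional_code", "national_code"]])
```

Events would get their own column in the same way, filtering on `type == 'Event'` instead of on `locale`.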

One problem: 2012-12-31 is both "Puente Primer dia del ano" (Bridge) and "Primer dia del ano-1" (Additional).
It turned out that "Puente Primer dia del ano" means "the bridge day for New Year", and "Primer dia del ano-1" is the day before New Year's Day, i.e. New Year's Eve.

Other than that, there are no overlapping holidays.
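
One way to check for overlaps like this is to look for rows that share the same date and locale. The sketch below uses three toy rows (the two 2012-12-31 entries mentioned above plus one more); the `locale_name` value and the choice of key columns are assumptions for illustration.

```python
import pandas as pd

# Toy rows; on the real data this would run on the `holidays` DataFrame instead.
toy = pd.DataFrame({
    "date": pd.to_datetime(["2012-12-31", "2012-12-31", "2013-01-01"]),
    "type": ["Bridge", "Additional", "Holiday"],
    "locale": ["National", "National", "National"],
    "locale_name": ["Ecuador", "Ecuador", "Ecuador"],  # assumed value
    "description": ["Puente Primer dia del ano", "Primer dia del ano-1",
                    "Primer dia del ano"],
})

# Rows are "overlapping" when more than one shares the same date and place.
key = ["date", "locale", "locale_name"]
overlaps = toy[toy.duplicated(subset=key, keep=False)]
print(overlaps[["date", "type", "description"]])
```

`keep=False` marks every member of a duplicated group, so both conflicting rows show up rather than only the second one.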

Notice that "Bridge", "Work Day" and "Transfer" days change every year based on the day of the week,
while "Additional" days always accompany the holiday itself.

# Different holidays 

Not every holiday falls on the same day every year.

In [77]:
# Check how holidays are distributed every year. 
holidays['date'] = pd.to_datetime(holidays['date'])
holidays['year'] = holidays.date.dt.year
holidays['month-day'] = holidays['date'].dt.strftime('%m-%d')

# Show all rows
pd.set_option('display.max_rows', None)
# First: National
national = holidays[(holidays.locale == 'National') & (holidays.type == 'Holiday')]

In [78]:

table_national = national.pivot_table(index='month-day', columns='year', values='description', aggfunc=lambda x: ', '.join(x)).fillna(0)
print(table_national)


year                                2012                           2013  \
month-day                                                                 
01-01                                  0             Primer dia del ano   
02-08                                  0                              0   
02-09                                  0                              0   
02-11                                  0                       Carnaval   
02-12                                  0                       Carnaval   
02-16                                  0                              0   
02-17                                  0                              0   
02-27                                  0                              0   
02-28                                  0                              0   
03-03                                  0                              0   
03-04                                  0                              0   
03-25                    

Not all national holidays fall on the same date every year.
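
Instead of reading the pivot table by eye, the same question can be asked programmatically: how many distinct month-day values does each description have? A rough sketch on toy rows follows; the 2014 Carnaval date here is made up for illustration, and on the real data the `national` DataFrame from above would be used.

```python
import pandas as pd

# Toy rows; dates are illustrative, not copied from the full table.
toy = pd.DataFrame({
    "date": pd.to_datetime(["2013-01-01", "2014-01-01", "2013-02-11", "2014-03-03"]),
    "description": ["Primer dia del ano", "Primer dia del ano", "Carnaval", "Carnaval"],
})
toy["month-day"] = toy["date"].dt.strftime("%m-%d")

# Count the distinct calendar days each holiday has landed on.
dates_per_holiday = toy.groupby("description")["month-day"].nunique()

# Holidays with more than one distinct month-day move around between years.
moving = dates_per_holiday[dates_per_holiday > 1].index.tolist()
print(moving)
```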

In [79]:
# Then Regional 
Regional = holidays[(holidays.locale == 'Regional') & (holidays.type == 'Holiday')]
table_regional = Regional.pivot_table(index='month-day', columns='year', values='description', aggfunc=lambda x: ', '.join(x)).fillna(0)
print(table_regional)

year                                     2012  \
month-day                                       
04-01           Provincializacion de Cotopaxi   
06-25           Provincializacion de Imbabura   
11-06      Provincializacion de Santo Domingo   
11-07           Provincializacion Santa Elena   

year                                     2013  \
month-day                                       
04-01           Provincializacion de Cotopaxi   
06-25           Provincializacion de Imbabura   
11-06      Provincializacion de Santo Domingo   
11-07           Provincializacion Santa Elena   

year                                     2014  \
month-day                                       
04-01           Provincializacion de Cotopaxi   
06-25           Provincializacion de Imbabura   
11-06      Provincializacion de Santo Domingo   
11-07           Provincializacion Santa Elena   

year                                     2015  \
month-day                                       
04-01           P

Regional holidays fall on the same date every year.

In [80]:
# Finally, local:
Local = holidays[(holidays.locale == 'Local') & (holidays.type == 'Holiday')]
table_local = Local.pivot_table(index='month-day', columns='year', values='description', aggfunc=lambda x: ', '.join(x)).fillna(0)
print(table_local)

year                                                    2012  \
month-day                                                      
03-02                                     Fundacion de Manta   
04-12                                    Fundacion de Cuenca   
04-14                              Cantonizacion de Libertad   
04-21                              Cantonizacion de Riobamba   
05-12                                 Cantonizacion del Puyo   
06-23                              Cantonizacion de Guaranda   
06-25       Cantonizacion de Latacunga, Fundacion de Machala   
07-03      Fundacion de Santo Domingo, Cantonizacion de E...   
07-23                               Cantonizacion de Cayambe   
07-24                                                      0   
07-25                                                      0   
08-05                                Fundacion de Esmeraldas   
08-15                                  Fundacion de Riobamba   
08-24                                   

Local holidays are much less regular.
Fundacion de Guayaquil, for example, is observed differently from year to year.

# Summary on holiday types 

Because the holidays are so complicated, it's easier to use holidays_events.csv as a lookup table and not try to predict the holiday dates.

Treat every description separately.
To deal with 2012-12-31, label it as "Primer dia del ano" for now, since New Year's Eve affects more than just a "bridge day". (If anything goes wrong, this is the "error" we introduce.)
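
A possible way to apply this rule is to drop the Bridge row whenever another row exists for the same date and place. The sketch below is one interpretation under my own assumptions: the toy rows, the `locale_name` value, and the temporary `_is_bridge` column are illustrative, and it keeps the "Additional" row's description as-is rather than renaming it.

```python
import pandas as pd

# Toy rows reproducing the 2012-12-31 conflict (locale_name is an assumed value).
toy = pd.DataFrame({
    "date": pd.to_datetime(["2012-12-31", "2012-12-31", "2013-01-01"]),
    "type": ["Bridge", "Additional", "Holiday"],
    "locale": ["National", "National", "National"],
    "locale_name": ["Ecuador", "Ecuador", "Ecuador"],
    "description": ["Puente Primer dia del ano", "Primer dia del ano-1",
                    "Primer dia del ano"],
})

# Sort so that non-Bridge rows come first within each date (False sorts before True),
# then keep only the first row per date and place.
deduped = (
    toy.assign(_is_bridge=toy["type"].eq("Bridge"))
       .sort_values(["date", "_is_bridge"])
       .drop_duplicates(subset=["date", "locale", "locale_name"], keep="first")
       .drop(columns="_is_bridge")
       .reset_index(drop=True)
)
print(deduped[["date", "type", "description"]])
```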



In [118]:


# Create a new dataset (.copy() avoids SettingWithCopyWarning when adding columns to a slice):
Local_holidays = holidays[holidays.locale == 'Local'].copy()
Local_holidays['local_holiday'] = Local_holidays['description']
# Set 'Local_celebrated' to True where 'type' means a day off
Local_holidays['Local_celebrated'] = Local_holidays['type'].isin(['Holiday','Bridge', 'Additional','Transfer'])
# Update 'Local_celebrated' to False where 'transferred' is True
Local_holidays.loc[Local_holidays['transferred'] == True, 'Local_celebrated'] = False
# Take only the needed columns
Local_holidays = Local_holidays[['date', 'local_holiday', 'locale_name','Local_celebrated']]

# For regional too
Regional_holidays = holidays[holidays.locale == 'Regional'].copy()
Regional_holidays['Regional_holiday'] = Regional_holidays['description']
# Set 'Regional_celebrated' to True where 'type' means a day off
Regional_holidays['Regional_celebrated'] = Regional_holidays['type'].isin(['Holiday','Bridge', 'Additional','Transfer'])
# Update 'Regional_celebrated' to False where 'transferred' is True
Regional_holidays.loc[Regional_holidays['transferred'] == True, 'Regional_celebrated'] = False
# Take only the needed columns
Regional_holidays = Regional_holidays[['date', 'Regional_holiday', 'locale_name','Regional_celebrated']]

# Same with National
National_holidays = holidays[holidays.locale == 'National'].copy()
National_holidays['National_holiday'] = National_holidays['description']
# Set 'National_celebrated' to True where 'type' means a day off
National_holidays['National_celebrated'] = National_holidays['type'].isin(['Holiday','Bridge', 'Additional','Transfer'])
# Update 'National_celebrated' to False where 'transferred' is True
National_holidays.loc[National_holidays['transferred'] == True, 'National_celebrated'] = False
# Take only the needed columns
National_holidays = National_holidays[['date', 'National_holiday', 'locale_name','National_celebrated']]

In [125]:
holiday_final = Local_holidays.merge(Regional_holidays, how = 'outer')
holiday_final = holiday_final.merge(National_holidays , how = 'outer')
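
When `on` is not given, `merge` joins on every column the two frames share, which here means `date` and `locale_name`. The toy sketch below (made-up rows, including a placeholder national holiday) shows the consequence: a local and a national holiday on the same date stay on separate rows because their `locale_name` values differ.

```python
import pandas as pd

# Toy frames with the same columns as Local_holidays / National_holidays.
local = pd.DataFrame({
    "date": pd.to_datetime(["2013-04-12"]),
    "local_holiday": ["Fundacion de Cuenca"],
    "locale_name": ["Cuenca"],
    "Local_celebrated": [True],
})
national = pd.DataFrame({
    "date": pd.to_datetime(["2013-04-12"]),           # same date, on purpose
    "National_holiday": ["Example national holiday"],  # placeholder, not real data
    "locale_name": ["Ecuador"],
    "National_celebrated": [True],
})

# The default join keys are the columns present in both frames.
shared = [c for c in local.columns if c in national.columns]
merged = local.merge(national, how="outer", on=shared)
print(shared)
print(merged)
```

Spelling out `on=` makes the join keys visible, and it also makes clear why the locale names have to be mapped to stores before the three tables can be combined per store.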

In [127]:
# holiday_final

11/7 notes: I need to separate locations into city/state/country to merge the different listings.
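
A rough sketch of what that separation could look like: build a store-by-date calendar, then join local holidays on the city and regional holidays on the state. Everything here is an assumption for illustration: the `stores` table and its `store_nbr`, `city` and `state` columns are not loaded anywhere in this notebook, and the rows are toy data.

```python
import pandas as pd

# Hypothetical stores table; column names are assumed.
stores = pd.DataFrame({
    "store_nbr": [1, 2],
    "city": ["Cuenca", "Latacunga"],
    "state": ["Azuay", "Cotopaxi"],
})
dates = pd.DataFrame({"date": pd.to_datetime(["2013-04-01", "2013-04-12"])})

# One row per (store, date). how="cross" needs pandas 1.2 or newer.
calendar = stores.merge(dates, how="cross")

# Toy holiday tables shaped like Local_holidays / Regional_holidays.
local = pd.DataFrame({
    "date": pd.to_datetime(["2013-04-12"]),
    "locale_name": ["Cuenca"],
    "local_holiday": ["Fundacion de Cuenca"],
})
regional = pd.DataFrame({
    "date": pd.to_datetime(["2013-04-01"]),
    "locale_name": ["Cotopaxi"],
    "Regional_holiday": ["Provincializacion de Cotopaxi"],
})

# Local holidays match on the store's city, regional holidays on its state.
calendar = calendar.merge(local.rename(columns={"locale_name": "city"}),
                          on=["date", "city"], how="left")
calendar = calendar.merge(regional.rename(columns={"locale_name": "state"}),
                          on=["date", "state"], how="left")
print(calendar)
```

National holidays would then be joined on `date` alone, since they apply to every store.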