<h2>Pre-processing Notebook 4 for 90-day client data</h2><br>
<em>Pre-processing the data to produce time-stamped tables (ts-tables) for each client covering the first 90 days of interaction. Done for all attributes.</em>
<br><em>Includes a description of the EDA and feature engineering.</em>

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import scipy as sci
import scipy.special as scisp
import scipy.stats as scist
import datetime, copy, sys

sys.path.append('../../lib')

from tqdm import tqdm
from mpl_toolkits.mplot3d import Axes3D

tqdm.pandas()
plt.ion()

In [None]:
validClientsFirst90DaysDf2 = pd.read_hdf('validClientsFirst90DaysDf2.h5')

In [None]:
validClientsFirst90DaysDf2.tail(2)

<h3> Exploratory Data Analysis (EDA) Summary for feature engineering </h3>
<br>
1. <strong>Age</strong> of each client does not change in the first 90 days.<br>
2. <strong>BarDuration:</strong> the duration of a bar (ban) that stops a client from accessing the DI services. Entries can be numeric (2 days, 3 days, 5 days, etc.) or categorical (less than 24 hrs, Lifetime, Conditional, etc.).<br>
3. However, there are many cases where the barDuration entries <strong>mismatch</strong> on a particular day. Hypothetical example: the barDuration entry is 2 days for client 123 for the Sleep EntryType and 5 days for the Log EntryType on the same day.<br>
4. <strong>Location</strong> specifies the location in the DI shelter where the client accesses the facility or sleeps. This attribute is ignored because it is unreliable in the long run.<br>
5. <strong>Word counts</strong>: From GettingStarted.ipynb (master branch), each of the 32 buckets (below) has a number of associated words. The counts of these words were extracted from the logs to keep the data anonymized while still providing useful insights from the logs:
<br><br>Addiction,Bar,Biometrics,Brawl,CPS,Conflict,Death,EMS,Education,Employment,Financial,FriendsFamily,Gun,Health,Housing<br>ID,Indigenous,Justice,Knife,Medication,MentalHealth,NegativeWord,Overdose,PhysicalHealth,PhysicalViolence,PositiveWord,<br>Property,Seniors,SexualViolence,Spray,Supports,Weapon.<br>
6. <strong>EmsLogFlag:</strong> is 1 in the log entry for an event where EMS was called. A few clients have multiple log entries with EmsLogFlag set on the same day.<br>
7. <strong>PoliceLogFlag:</strong> is 1 in the log entry for an event where the police were called. A few clients have multiple log entries with PoliceLogFlag set on the same day.<br>
8. <strong>ClientState:</strong> The state of a client can change within a day, as seen in the data. E.g., a client can be sober in the morning and intoxicated in the evening.<br>
9. <strong>EntryType:</strong> The entry types present in the dataset are Bar, CounsellorsNotes, ProgressDetails, Log, and Sleep. Multiple entries of each type can be present on a particular day for each client.<br>

<h3> Raw Attribute Description and Feature Engineering </h3>
<br>
1. <strong>EmployeeId:</strong> Unique employee IDs are counted to compute the total number of employees who interacted with the client on a particular day. <br>
2. ClientId - index <br>
3. Date - index <br>
4. <strong>Age:</strong> does not change by date; it is the same for a particular client across the first 90 days <br>
5. <strong>BarDuration:</strong> the maximum BarDuration over all entries on a particular day is taken. For pre-processing, categorical entries are assigned numeric values: Warning 0.1 days, Conditional 0.5 days, less than 24 hrs 0.9 days, and Lifetime 500 days. After taking the maximum, these values are replaced by the original categorical entries. <br>
6. <strong>Word Counts:</strong> For each of the 32 word counts, the sum of the individual counts was computed for each day. This pre-processing step was performed in the previous notebook (preProcessing_mergedCounts_90_Days_From_FirstSleepDate)<br>
7. <strong>EmployeeIsCounsellor:</strong> Total number of counsellor interactions of the client on a particular day, computed as the daily sum of the EmployeeIsCounsellor column. <br>
8. <strong>PoliceLogFlag:</strong> Total number of times the police were called for a client on a particular day, computed as the daily sum of the PoliceLogFlag column<br>
9. <strong>EmsLogFlag:</strong> Total number of times EMS was called for a client on a particular day, computed as the daily sum of the EmsLogFlag column <br>
10. <strong>ClientState:</strong> Since a client's state can change within a day and the states are categorical, each state type is turned into its own feature. These engineered features indicate whether or not a client was in that state on a particular day. Hypothetical example: if client 123 was ever sober on a specific day, then clientSoberState would be 1. These features are flags, so they are 1 even if multiple sober entries are found on that day.<br>
11. <strong>EntryType:</strong> Similar to ClientState, multiple new features are generated from this attribute to indicate the presence of a particular entry type on a particular day. Hypothetical example: if client 123 had a sleep entry on a specific day, then ClientSleepEntry would be 1.<br>
12. However, the EntryType <strong>Bar</strong> is ignored because barDuration carries the same information along with the exact length of the bar/ban from the DI shelter. Hence this EntryType would have been redundant.
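
The BarDuration handling described in item 5 can be sketched on toy data. This is only an illustration under assumptions: the frame `toy`, the dictionaries `to_numeric` and `from_numeric`, and the sample values are hypothetical and are not taken from the dataset; only the placeholder values (0.1, 0.5, 0.9, 500) come from the description above.

```python
import pandas as pd

# Hypothetical toy log for one client; BarDuration is stored as strings.
toy = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2017-05-22 09:00", "2017-05-22 21:00",
             "2017-05-23 10:00", "2017-05-23 11:00"]
        ),
        "BarDuration": ["2", "5", "Warning", "Conditional"],
    }
)

# Placeholder values for the categorical entries, as listed in item 5.
to_numeric = {"Warning": 0.1, "Conditional": 0.5, "-24 Hours": 0.9, "Life": 500}
from_numeric = {value: label for label, value in to_numeric.items()}

# Replace categorical labels by placeholders, then cast everything to float.
numeric = toy["BarDuration"].map(lambda s: to_numeric.get(s, s)).astype(float)

# Keep the longest bar recorded on each calendar day.
daily_max = numeric.groupby(toy["Date"].dt.date).max()

# Map placeholders back to their labels; real numeric durations stay numeric.
restored = daily_max.map(lambda v: from_numeric.get(v, v))
print(restored)
```

One consequence of this scheme is that any numeric duration outranks Warning, Conditional and less-than-24-hours entries on the same day, while Lifetime outranks everything.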

In [None]:
validClientsFirst90DaysDf2 = validClientsFirst90DaysDf2.set_index('Date')

### Feature Engineering 
- All functions that compute aggregates (sums) are named DailyAggXYZ, and all functions that count the total number of daily entries are named DailyABCDCount.
- Age doesn't change within a single day, so all values are the same. Hence, the max is taken for convenience when appending to the merged DataFrame
- ClientState values are used to engineer separate binary feature vectors. The engineered features are flags that indicate the presence or absence of that state on a particular day. E.g., if a sober entry is present on that day (1 or more sober entries), then ClientStateSober = 1
- BarDuration has categorical and numerical entries. Both are stored as strings. Numerical values are mapped to integer values, and an assumption is made to assign numerical values to the categorical entries of BarDuration
- The aggregations are split into separate functions due to RAM constraints
- The computed individual aggregates are initially stored in temporary variables and are then appended, in sequence, to one of two dataframes depending on their type: categorical features to tempMergeCategorical and non-categorical features to tempMergeNonCategorical
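
The difference between a daily aggregate and a daily flag can be sketched as follows. This is a minimal illustration, not the notebook's own code: `daily_sum` and `daily_state_flag` are hypothetical generic helpers, and the records in `toy` are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical records for a single client, indexed by timestamp.
toy = pd.DataFrame(
    {
        "ClientState": ["Sober", "Sober", "Intoxicated", "Sober"],
        "EmsLogFlag": [0, 1, 1, 0],
    },
    index=pd.to_datetime(
        ["2017-05-22 08:00", "2017-05-22 20:00",
         "2017-05-22 23:00", "2017-05-24 09:00"]
    ),
)

def daily_sum(tbl, column):
    """Daily aggregate: sum of a numeric column per calendar day."""
    return tbl[column].groupby(tbl.index.date).sum()

def daily_state_flag(tbl, state):
    """Daily flag: 1 if at least one record has the given state that day."""
    subset = tbl.loc[tbl["ClientState"] == state, "ClientState"]
    # count() is >= 1 on every day that appears, so sign() turns it into 1.
    return np.sign(subset.groupby(subset.index.date).count())

ems_per_day = daily_sum(toy, "EmsLogFlag")
sober_flag = daily_state_flag(toy, "Sober")
intoxicated_flag = daily_state_flag(toy, "Intoxicated")
print(ems_per_day, sober_flag, intoxicated_flag, sep="\n\n")
```

Days on which a state never occurs are absent from the flag series rather than set to 0, which is why the categorical features are later joined on Date.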

In [None]:
def ClientAge(tbl):
    dayVal = tbl.Age.groupby(tbl.index.date).max().reset_index()
    return dayVal

In [None]:
def DailyAggEmployeeIsCounsellor(tbl): # sum of total counsellor interactions per day
    dayVal = tbl.EmployeeIsCounsellor.groupby(tbl.index.date).sum().reset_index() 
    return dayVal

In [None]:
def DailyAggEmsLogFlag(tbl): 
    dayVal = tbl.EmsLogFlag.groupby(tbl.index.date).sum().reset_index() 
    return dayVal

In [None]:
def DailyAggPoliceLogFlag(tbl): 
    dayVal = tbl.PoliceLogFlag.groupby(tbl.index.date).sum().reset_index() 
    return dayVal

In [None]:
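def DailyEmployeeIdCount(tbl): # count() gives non-null EmployeeId entries per day; .nunique() would count distinct employees instead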
    dayVal = tbl.EmployeeId.groupby(tbl.index.date).count().reset_index() 
    return dayVal

In [None]:
def ClientStateSober(tbl): # flag
    dayVal = np.sign(tbl.ClientState[tbl.ClientState=='Sober'].groupby(tbl[tbl.ClientState=='Sober'].index.date).count()).reset_index() 
    return dayVal

In [None]:
def ClientStateUnder(tbl): # flag
    dayVal = np.sign(tbl.ClientState[tbl.ClientState=='Under'].groupby(tbl[tbl.ClientState=='Under'].index.date).count()).reset_index() 
    return dayVal

In [None]:
def ClientStateIntoxicated(tbl): # flag
    dayVal = np.sign(tbl.ClientState[tbl.ClientState=='Intoxicated'].groupby(tbl[tbl.ClientState=='Intoxicated'].index.date).count()).reset_index() 
    return dayVal

In [None]:
def ClientStateDrugged(tbl): # flag
    dayVal = np.sign(tbl.ClientState[tbl.ClientState=='Drugged'].groupby(tbl[tbl.ClientState=='Drugged'].index.date).count()).reset_index() 
    return dayVal

In [None]:
def ClientStateDruggedIntoxicated(tbl): # flag
    dayVal = np.sign(tbl.ClientState[tbl.ClientState=='Drugged & Intoxicated'].groupby(tbl[tbl.ClientState=='Drugged & Intoxicated'].index.date).count()).reset_index() 
    return dayVal

In [None]:
# Numeric strings map to integers; categorical entries get assumed placeholder values
di = {'1':1, '2':2, '3':3, '5':5, '7':7, '14':14, '21':21, '30':30, '60':60, '90':90, '120':120,
      'Warning':0.1, 'Conditional':0.5, '-24 Hours':0.9, 'Life':500}
validClientsFirst90DaysDf2 = validClientsFirst90DaysDf2.replace({"BarDuration": di})

In [None]:
def ClientBarDuration(tbl): # barDuration: max of the entries kept if multiple entries present
    dayVal = tbl.BarDuration.groupby(tbl.index.date).max().reset_index()
    return dayVal

In [None]:
def ClientSleepEntry(tbl): #indicates sleep entry on a day- Flag type
    dayVal = np.sign(tbl.EntryType[tbl.EntryType=='Sleep'].groupby(tbl.EntryType[tbl.EntryType=='Sleep'].index.date).count()).reset_index() 
    return dayVal

In [None]:
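def DailyCounsellorsNotesCount(tbl): # Count: total entries in a day
    # NOTE: the filter below uses 'counsellorsNotes' (lower-case c), while the EDA summary spells it 'CounsellorsNotes'; check the casing in the data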
    dayVal = tbl.EntryType[tbl.EntryType=='counsellorsNotes'].groupby(tbl.EntryType[tbl.EntryType=='counsellorsNotes'].index.date).count().reset_index() 
    return dayVal

In [None]:
def DailyProgressDetailsCount(tbl): 
    dayVal = tbl.EntryType[tbl.EntryType=='ProgressDetails'].groupby(tbl.EntryType[tbl.EntryType=='ProgressDetails'].index.date).count().reset_index() 
    return dayVal

In [None]:
def DailyLogEntryCount(tbl): # Count: total Log entries in a day
    dayVal = tbl.EntryType[tbl.EntryType=='Log'].groupby(tbl.EntryType[tbl.EntryType=='Log'].index.date).count().reset_index() 
    return dayVal

In [None]:
# Bar EntryType is redundant: it is covered by BarDuration

<h3> Feature Engineering of categorical EntryType and ClientState </h3><br>
- Pre-processing of categorical features differs from that of non-categorical features, as Date is required as the index for feature engineering<br>
- The index of the non-categorical features is the per-client record number (Ind)<br>
- Date is required as the index because not all days have a particular type of entry or state. Hence the index must be Date in order to merge with the non-categorical features later
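
Why the Date index matters can be seen in a small sketch. The two frames below are hypothetical stand-ins for the flag tables built in the following cells; the values are made up. An outer join on (ClientId, Date) aligns rows by calendar date and leaves NaN where a state was not recorded.

```python
import pandas as pd

# Hypothetical flag tables: each has rows only for days on which the state occurred.
idx_sober = pd.MultiIndex.from_tuples(
    [(1, "2017-05-22"), (1, "2017-05-24")], names=["ClientId", "Date"]
)
idx_under = pd.MultiIndex.from_tuples(
    [(1, "2017-05-23"), (1, "2017-05-24")], names=["ClientId", "Date"]
)
sober = pd.DataFrame({"SoberState": [1, 1]}, index=idx_sober)
under = pd.DataFrame({"UnderState": [1, 1]}, index=idx_under)

# Outer join keeps the union of dates; missing flags show up as NaN.
merged = sober.join(under, how="outer")
print(merged)
```

Had both tables been indexed by a positional row number instead, row 1 of each table would have been paired even though the two rows refer to different dates.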

In [None]:
tempFuncTbl1 = validClientsFirst90DaysDf2.groupby("ClientId").progress_apply(ClientStateSober)

tempFuncTbl2 = validClientsFirst90DaysDf2.groupby("ClientId").progress_apply(ClientStateUnder)

In [None]:
tempFuncTbl1

In [None]:
tempDf = pd.DataFrame(tempFuncTbl1)
tempDf = tempDf.reset_index(level=[0,1])
tempDf = tempDf.rename(columns={'index':'Date','ClientState':'SoberState'})
tempDf = tempDf.drop(columns=['level_1'])# to have only one Ind column at the end 
tempDf = tempDf.set_index(['ClientId','Date'])
tempDf

In [None]:
tempDf2 = pd.DataFrame(tempFuncTbl2)
tempDf2 = tempDf2.reset_index(level=[0,1])
tempDf2 = tempDf2.rename(columns={'index':'Date','ClientState':'UnderState'})
tempDf2 = tempDf2.drop(columns=['level_1']) # to have only one Ind column at the end 
tempDf2 = tempDf2.set_index(['ClientId','Date'])
tempDf2

In [None]:
tempMergeCategorical=tempDf.join(tempDf2, how ='outer')

In [None]:
tempFuncTbl2 = validClientsFirst90DaysDf2.groupby("ClientId").progress_apply(ClientStateIntoxicated)
tempDf2 = pd.DataFrame(tempFuncTbl2)
tempDf2 = tempDf2.reset_index(level=[0,1])
tempDf2 = tempDf2.rename(columns={'index':'Date','ClientState':'IntoxicatedState'})
tempDf2 = tempDf2.drop(columns=['level_1']) # to have only one Ind column at the end 
tempDf2 = tempDf2.set_index(['ClientId','Date'])

In [None]:
tempMergeCategorical = tempMergeCategorical.join(tempDf2, how ='outer')

In [None]:
tempFuncTbl2 = validClientsFirst90DaysDf2.groupby("ClientId").progress_apply(ClientStateDrugged)
tempDf2 = pd.DataFrame(tempFuncTbl2)
tempDf2 = tempDf2.reset_index(level=[0,1])
tempDf2 = tempDf2.rename(columns={'index':'Date','ClientState':'DruggedState'})
tempDf2 = tempDf2.drop(columns=['level_1'])# to have only one Ind column at the end 
tempDf2 = tempDf2.set_index(['ClientId','Date'])
tempDf2

In [None]:
tempMergeCategorical = tempMergeCategorical.join(tempDf2, how ='outer')

In [None]:
tempFuncTbl2 = validClientsFirst90DaysDf2.groupby("ClientId").progress_apply(ClientStateDruggedIntoxicated)
tempDf2 = pd.DataFrame(tempFuncTbl2)
tempDf2 = tempDf2.reset_index(level=[0,1])
tempDf2 = tempDf2.rename(columns={'index':'Date','ClientState':'DruggedIntoxicatedState'})
tempDf2 = tempDf2.drop(columns=['level_1'])# to have only one Ind column at the end 
tempDf2 = tempDf2.set_index(['ClientId','Date'])
tempMergeCategorical = tempMergeCategorical.join(tempDf2, how ='outer')

In [None]:
tempFuncTbl2 = validClientsFirst90DaysDf2.groupby("ClientId").progress_apply(ClientSleepEntry)
tempDf2 = pd.DataFrame(tempFuncTbl2)
tempDf2 = tempDf2.reset_index(level=[0,1])
tempDf2 = tempDf2.rename(columns={'index':'Date','EntryType':'SleepEntry'})
tempDf2 = tempDf2.drop(columns=['level_1'])# to have only one Ind column at the end 
tempDf2 = tempDf2.set_index(['ClientId','Date'])
tempMergeCategorical = tempMergeCategorical.join(tempDf2, how ='outer')

In [None]:
tempFuncTbl2 = validClientsFirst90DaysDf2.groupby("ClientId").progress_apply(DailyCounsellorsNotesCount)
tempDf2 = pd.DataFrame(tempFuncTbl2)
tempDf2 = tempDf2.reset_index(level=[0,1])
tempDf2 = tempDf2.rename(columns={'index':'Date','EntryType':'CounsellorNotes'})
tempDf2 = tempDf2.drop(columns=['level_1'])# to have only one Ind column at the end 
tempDf2 = tempDf2.set_index(['ClientId','Date'])
tempMergeCategorical = tempMergeCategorical.join(tempDf2, how ='outer')

In [None]:
tempFuncTbl2= validClientsFirst90DaysDf2.groupby("ClientId").progress_apply(DailyProgressDetailsCount)
tempDf2 = pd.DataFrame(tempFuncTbl2)
tempDf2 = tempDf2.reset_index(level=[0,1])
tempDf2 = tempDf2.rename(columns={'index':'Date','EntryType':'ProgressDetails'})
tempDf2 = tempDf2.drop(columns=['level_1'])# to have only one Ind column at the end 
tempDf2 = tempDf2.set_index(['ClientId','Date'])
tempMergeCategorical = tempMergeCategorical.join(tempDf2, how ='outer')

In [None]:
tempFuncTbl2 = validClientsFirst90DaysDf2.groupby("ClientId").progress_apply(DailyLogEntryCount)
tempDf2 = pd.DataFrame(tempFuncTbl2)
tempDf2 = tempDf2.reset_index(level=[0,1])
tempDf2 = tempDf2.rename(columns={'index':'Date','EntryType':'LogEntry'})
tempDf2 = tempDf2.drop(columns=['level_1'])# to have only one Ind column at the end 
tempDf2 = tempDf2.set_index(['ClientId','Date'])
tempMergeCategorical = tempMergeCategorical.join(tempDf2, how ='outer')

In [None]:
tempMergeCategorical

In [None]:
# tempMergeCategorical: the index would need to be set back to Ind to zero-pad the df

<h3>Feature Engineering of non-categorical features</h3> <br>
- Except for barDuration, all the features below have non-categorical values. For pre-processing, the categorical values of barDuration are converted into numerical values. <br>
- For pre-processing, the per-client record number (Ind, the ordinal of each day that has data) is used as the index <br>
- To merge correctly with the categorical features, the index is changed to Date before merging <br>
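
The index switch described above can be sketched on toy data. Note that this sketch derives the record number with `groupby(...).cumcount()`, which is a substitute for the `level_1 + 1` step used in the cells below; the frame `daily` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical daily table: one row per client per day with data.
daily = pd.DataFrame(
    {
        "ClientId": [1, 1, 2],
        "Date": ["2017-05-22", "2017-05-24", "2017-06-01"],
        "EmsLogFlag": [2, 0, 1],
    }
)

# 1-based ordinal of each daily row within its client (assumes rows are date-sorted).
daily["Ind"] = daily.groupby("ClientId").cumcount() + 1

# Index by record number while the non-categorical features are assembled...
by_ind = daily.set_index(["ClientId", "Ind"])

# ...then switch to Date so the table lines up with the categorical features.
by_date = by_ind.reset_index().set_index(["ClientId", "Date"])
print(by_date)
```

After the switch, Ind survives as an ordinary column, so the table can be re-indexed by it again once the join is done.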

In [None]:
tempDf = validClientsFirst90DaysDf2.groupby("ClientId").progress_apply(DailyEmployeeIdCount)
tempDf = tempDf.reset_index(level=[0,1])
tempDf['level_1']=tempDf['level_1']+1
tempDf = tempDf.rename(columns={'level_1':'Ind','index':'Date'})
tempDf = tempDf.set_index(['ClientId','Ind'])
tempDf

In [None]:
tempDf2 = validClientsFirst90DaysDf2.groupby("ClientId").progress_apply(ClientAge)
tempDf2 = tempDf2.reset_index(level=[0,1])
tempDf2['level_1']=tempDf2['level_1']+1
tempDf2 = tempDf2.rename(columns={'level_1':'Ind','index':'Date'})
tempDf2 = tempDf2.set_index(['ClientId','Ind'])
tempDf2

In [None]:
tempDf2 = tempDf2.drop(columns=['Date']) # to have only one Date column at the end 
tempMergeNonCategorical = tempDf.join(tempDf2, how ='outer')
tempMergeNonCategorical

In [None]:
tempDf2 = validClientsFirst90DaysDf2.groupby("ClientId").progress_apply(DailyAggEmployeeIsCounsellor)
tempDf2 = tempDf2.reset_index(level=[0,1])
tempDf2['level_1'] = tempDf2['level_1']+1
tempDf2 = tempDf2.rename(columns={'level_1':'Ind','index':'Date'})
tempDf2 = tempDf2.set_index(['ClientId','Ind'])
tempDf2

In [None]:
tempDf2 = tempDf2.drop(columns=['Date'])
tempMergeNonCategorical = tempMergeNonCategorical.join(tempDf2, how ='outer')
tempMergeNonCategorical

In [None]:
tempDf2 = validClientsFirst90DaysDf2.groupby("ClientId").progress_apply(DailyAggEmsLogFlag)
tempDf2 = tempDf2.reset_index(level=[0,1])
tempDf2['level_1']=tempDf2['level_1']+1
tempDf2 = tempDf2.rename(columns={'level_1':'Ind','index':'Date'})
tempDf2 = tempDf2.set_index(['ClientId','Ind'])
tempDf2 = tempDf2.drop(columns=['Date'])
tempMergeNonCategorical = tempMergeNonCategorical.join(tempDf2, how ='outer')
tempMergeNonCategorical

In [None]:
tempDf2 = validClientsFirst90DaysDf2.groupby("ClientId").progress_apply(DailyAggPoliceLogFlag)
tempDf2 = tempDf2.reset_index(level=[0,1])
tempDf2['level_1']=tempDf2['level_1']+1
tempDf2 = tempDf2.rename(columns={'level_1':'Ind','index':'Date'})
tempDf2 = tempDf2.set_index(['ClientId','Ind'])
tempDf2 = tempDf2.drop(columns=['Date'])
tempMergeNonCategorical = tempMergeNonCategorical.join(tempDf2, how ='outer')
tempMergeNonCategorical

In [None]:
print(validClientsFirst90DaysDf2['BarDuration'].dtype)

X2 = validClientsFirst90DaysDf2['BarDuration'].unique()
print(X2)

In [None]:
validClientsFirst90DaysDf2['BarDuration'] = validClientsFirst90DaysDf2['BarDuration'].astype(float)

In [None]:
X2 = validClientsFirst90DaysDf2['BarDuration'].unique()
print(X2)

In [None]:
tempDf2 = validClientsFirst90DaysDf2.groupby("ClientId").progress_apply(ClientBarDuration)
tempDf2 = tempDf2.reset_index(level=[0,1])
tempDf2['level_1']=tempDf2['level_1']+1
tempDf2 = tempDf2.rename(columns={'level_1':'Ind','index':'Date'})
tempDf2 = tempDf2.set_index(['ClientId','Ind'])
tempDf2 = tempDf2.drop(columns=['Date'])
tempMergeNonCategorical = tempMergeNonCategorical.join(tempDf2, how ='outer')
tempMergeNonCategorical

In [None]:
tempMergeNonCategorical = tempMergeNonCategorical.reset_index(level=[0,1])
tempMergeNonCategorical = tempMergeNonCategorical.set_index(['ClientId','Date'])
tempMergeNonCategorical

In [None]:
tempDfMergeAll = tempMergeNonCategorical.join(tempMergeCategorical, how ='outer')
tempDfMergeAll

In [None]:
tempDfMergeAll = tempDfMergeAll.reset_index(level=[0,1])
tempDfMergeAll = tempDfMergeAll.set_index(['ClientId','Ind'])
tempDfMergeAll

<h3>Merging Word-counts</h3><br>
- Merging the pre-processed word counts, stored as mergedCounts_90DaysDf2.h5, with the features pre-processed in this notebook

In [None]:
mergedCounts90daysDF = pd.read_hdf('mergedCounts_90DaysDf2.h5')
mergedCounts90daysDF = mergedCounts90daysDF.drop(columns=['Date'])

In [None]:
dfMergeAll = tempDfMergeAll.join(mergedCounts90daysDF, how ='outer')

dfMergeAll

### Zero-Padding

In [None]:
anyFeature = tempDf.reset_index(level=1) # using any feature to create a Blank TS-Table first
anyFeature

In [None]:
def BlanktsTbl(tbl):
    # Blank frame with day numbers 1..90 in column 0; tbl is only used to get one frame per client
    return pd.DataFrame(list(range(1, 91)))

tsTblBlank = anyFeature.groupby("ClientId").progress_apply(BlanktsTbl)

In [None]:
tsTblBlank = tsTblBlank.reset_index(level=[0,1])
tsTblBlank = tsTblBlank.set_index(['ClientId',0])
tsTblBlank.level_1 = 0
tsTblBlank = tsTblBlank.rename_axis(index=['ClientId', 'Ind'])
tsTblBlank = tsTblBlank.rename(columns={'level_1':'EncodedVector'})

In [None]:
tsTblBlank.head(182)

In [None]:
validClientsFirst90DaysDf2 = pd.read_hdf('validClientsFirst90DaysDf2.h5')

### Generating the day number from the first registration date of each entry 
- Useful to differentiate between the index number of a record and its day number
- Client number 2532818 is taken as an example to illustrate this difference: 4 records are present for 4 different days. Since there is no record for 2017-05-24, record number 4 on 2017-05-26 has day number 5 counted from the first registration date (2017-05-22).
- The day numbers are the calendar days counted from the first access date of each client ID. Day number 1 is the first access day for each client ID
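
The distinction between record index and day number can be sketched on made-up data. The client ID and dates below are hypothetical, and the sketch uses `transform` and `cumcount` rather than the per-client function defined in the next cell; it assumes one row per client per date, already sorted by date.

```python
import pandas as pd

# Hypothetical access dates for one made-up client (no record on 2020-01-03).
toy = pd.DataFrame(
    {
        "ClientId": [1, 1, 1],
        "Date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-04"]),
    }
)

# First access date of each client, repeated on every row of that client.
first_date = toy.groupby("ClientId")["Date"].transform("min")

# Ind: position of the record within the client (1, 2, 3, ...).
toy["Ind"] = toy.groupby("ClientId").cumcount() + 1

# Day: calendar days since the first access date, with the first day numbered 1.
toy["Day"] = (toy["Date"] - first_date).dt.days + 1
print(toy)
```

Ind and Day agree until the first gap in the dates; after that, Day runs ahead of Ind by the number of skipped calendar days.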

In [None]:
def GenShelterAcessIndex(tbl):
    dates = tbl.Date.dt.date.drop_duplicates().sort_values()   # Distinct access dates, in order.
    dateFirst = dates.iloc[0]                                   # First access date.
    dayNumber = dates - dateFirst                               # Offset from the first access date.
    return pd.DataFrame({
        'Date': dates,                 # Date of each stay.
        'Ind': range(1,len(dates)+1),  # Index (record number) of each stay.
        'Day': (dayNumber/np.timedelta64(1, 'D') + 1).astype(int)  # Calendar day number; the first access day is 1.
    })
        
shelterAcessIndx = validClientsFirst90DaysDf2.groupby('ClientId').progress_apply(GenShelterAcessIndex)
shelterAcessIndx = shelterAcessIndx.reset_index(level=[0,1])
shelterAcessIndx[shelterAcessIndx.ClientId == 2532818]

In [None]:
dfShelterAcess2 = shelterAcessIndx.set_index(['ClientId','Ind'])
dfShelterAcess2 = dfShelterAcess2.drop(columns=['Date', 'level_1'])  #drop date and level 1

<h3> Merging features with day numbers and index of records</h3>

In [None]:
dfShelterAccessFeatures = dfShelterAcess2.join(dfMergeAll, how='left')
dfShelterAccessFeatures

<h3> Encoded vector and final padded dataframe: tsTblPadded </h3>

In [None]:
dfShelterAccessFeatures = dfShelterAccessFeatures.reset_index(level=[0,1])

In [None]:
shelterFeaturesDayIndex = dfShelterAccessFeatures.set_index(['ClientId','Day'])
shelterFeaturesDayIndex = shelterFeaturesDayIndex.rename(columns={'Ind':'index'})
shelterFeaturesDayIndex = shelterFeaturesDayIndex.rename_axis(['ClientId','Ind'])
tsTblPadded = pd.merge(tsTblBlank,shelterFeaturesDayIndex, how='left', left_index=True, right_index=True)
tsTblPadded.EncodedVector = tsTblPadded.Date.notnull()
tsTblPadded
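
The padding step can be sketched on toy data. This sketch uses `reindex` against a full (ClientId, Day) grid in place of the left merge with the blank table used above; the frame `features`, its values, and the shortened horizon `n_days` are hypothetical.

```python
import pandas as pd

# Hypothetical features, indexed by client and calendar day number.
features = pd.DataFrame(
    {"ClientId": [1, 1, 2], "Day": [1, 3, 2], "EmsLogFlag": [2, 1, 1]}
).set_index(["ClientId", "Day"])

n_days = 4  # shortened for the example; the notebook pads to 90 days

# Every (client, day) combination, whether or not a record exists.
full_index = pd.MultiIndex.from_product(
    [features.index.get_level_values("ClientId").unique(), range(1, n_days + 1)],
    names=["ClientId", "Day"],
)

# Rows without a record are created and filled with NaN.
padded = features.reindex(full_index)

# Mark which rows come from real records, analogous to EncodedVector.
padded["EncodedVector"] = padded["EmsLogFlag"].notnull()
print(padded)
```

As in the merge above, padded rows hold NaN rather than 0, so a later step that needs literal zeros would still have to call `fillna(0)` on the feature columns.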

In [None]:
tsTblPadded = tsTblPadded.rename_axis(["ClientId","Day"])

In [None]:
di2 = {0.1:'Warning',0.9:'-24 Hours',500:'Life',0.5:'Conditional'}
tsTblPadded = tsTblPadded.replace({"BarDuration": di2})
tsTblPadded

In [None]:
tsTblPadded[tsTblPadded.BarDuration.notnull()].head(2)

<h3> Test case</h3> <br>
- To demonstrate the final dataframe, the index levels are reset so that it prints properly<br>
- The final dataset is a multi-index dataframe with ClientId and Day as the index levels<br>

In [None]:
tsTblPadded_Columns = tsTblPadded.reset_index()
tsTblPadded = tsTblPadded.reset_index()

In [None]:
tsTblPadded.head(1).transpose().index #all the final features after pre-processing

<h3>Saving the final dataframe to disk</h3>

In [None]:
tsTblPadded.to_hdf('clientTsTables90DaysPaddedDF.h5',key='df',mode='w')