# Preppin Data

## Week 25: Housing Happy Hotel Guests

https://preppindata.blogspot.com/2022/06/2022-week-25-housing-happy-hotel-guests.html

Note: the official solution has 217 rows (218 with headers). It does not remove excess child capacity from each of the rooms in the final step (*"Finally, for the rooms with the largest capacity, we want to ensure guests with larger parties are prioritized. Filter the data to remove parties that could fit into smaller rooms."*). If you remove the excess child capacity as well, you will have 153 rows (154 with headers).

### Import Pandas and Numpy

In [1]:
import pandas as pd
import numpy as np

### Import dataframes

In [2]:
file = '2022W25 Input.xlsx'

rooms = pd.read_excel(file, sheet_name='Hotel Rooms')

rooms.head()

Unnamed: 0,Room,Adults,Children,Features
0,101,2,,"Accessible, Near to lift, Double"
1,102,2,,"Accessible, Double"
2,103,2,,"Accessible, Double"
3,104,2,,"Accessible, Double"
4,201,2,,"Near to lift, Double"


In [3]:
guests = pd.read_excel(file, sheet_name='Guests')
guests.head()

Unnamed: 0,Party,Adults,Children,Double/Twin,Requires Accessible Room?,Additional Requests
0,Corain,4,0,Double,N,"Bath, High Floor, NOT Near to lift"
1,Aarons,2,0,Twin,N,Bath
2,Saph,2,1,Double,N,"Bath, NOT Near to lift"
3,Baxstare,1,1,Double,N,"Bath, High Floor"
4,Kelle,1,1,Twin,N,Bath


### Make a separate column to flag each additional request a guest has

Use np.where to set a new column based on the contents of another column.
 - For example, if guests['Additional Requests'] contains 'Bath', return 1 in the new column; otherwise return 0.
     - Pass na=False to str.contains so that null values are treated as False.
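As a minimal sketch of this flagging pattern, here it is on a toy Series. The data below is invented for illustration and is not taken from the input file.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical); None stands in for a guest with no requests
requests = pd.Series(['Bath, High Floor', 'NOT Near to lift', None])

# Without na=False, str.contains returns a missing value for the None entry
has_bath = requests.str.contains('Bath', na=False)

# Turn the boolean mask into a 1/0 flag
flag = np.where(has_bath, 1, 0)
print(flag.tolist())
```

The missing entry ends up with a flag of 0 rather than propagating a null into the new column.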

In [4]:
guests['Request Bath'] = np.where(guests['Additional Requests'].str.contains('Bath', na=False), 1, 0)

In [5]:
guests['Request High Floor'] = np.where(guests['Additional Requests'].str.contains('High Floor', na=False), 1, 0)

In [6]:
guests['Request Not Near Lift'] = np.where(guests['Additional Requests'].str.contains('lift', na=False), 1, 0)

### Sum those columns to produce a column detailing how many requests a guest has

In [7]:
guests['Count of Requests'] = guests['Request Bath'] + guests['Request High Floor'] + guests['Request Not Near Lift']
guests['Count of Requests'].value_counts()

2    15
1     7
0     6
3     2
Name: Count of Requests, dtype: int64

### Make separate columns to use as flags for bath, high floor, and near lift in the rooms dataframe

In [8]:
rooms['Bath'] = np.where(rooms['Features'].str.contains('Bath', na=False), 1, 0)

In [9]:
rooms['High Floor'] = np.where(rooms['Features'].str.contains('High Floor', na=False), 1, 0)

In [10]:
rooms['Near Lift'] = np.where(rooms['Features'].str.contains('Near to lift', na=False), 1, 0)

In [11]:
#Use None (not np.nan) as the fallback so np.where does not coerce it to the string 'nan'
rooms['Double/Twin'] = np.where(rooms['Features'].str.contains('Double', na=False), 'Double', np.where(rooms['Features'].str.contains('Twin', na=False), 'Twin', None))

In [12]:
rooms['Accessible'] = np.where(rooms['Features'].str.contains('Accessible', na=False), 'Y', 'N')

### Join the tables together so that the guests only receive the bed size of their preference

Note: this will produce a table with more rows than we started with, because 'Double' and 'Twin' are not unique keys, so every guest is paired with every room of the matching bed type. This is memory intensive, so it is less than optimal.

I will also rename the adults and children columns in the guests dataframe so that they stay distinct from the room capacity columns after the join.
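To see why the row count grows, here is a small sketch of a merge on a non-unique key. The two toy tables are invented for illustration; only the 'Double/Twin' column name comes from the notebook.

```python
import pandas as pd

# Toy tables (hypothetical data)
toy_guests = pd.DataFrame({'Party': ['A', 'B', 'C'],
                           'Double/Twin': ['Double', 'Double', 'Twin']})
toy_rooms = pd.DataFrame({'Room': [101, 102, 201],
                          'Double/Twin': ['Double', 'Double', 'Twin']})

# Each guest is paired with every room that has the matching bed type
pairs = pd.merge(toy_guests, toy_rooms, on='Double/Twin', how='outer')
print(len(pairs))
```

Two 'Double' guests times two 'Double' rooms gives four rows, plus one 'Twin' pairing, so three guests become five candidate rows.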

In [13]:
#Rename columns using df.rename
guests = guests.rename(columns={'Adults' : 'Adults in Party', 'Children' :'Children in Party'})
guests.head()

Unnamed: 0,Party,Adults in Party,Children in Party,Double/Twin,Requires Accessible Room?,Additional Requests,Request Bath,Request High Floor,Request Not Near Lift,Count of Requests
0,Corain,4,0,Double,N,"Bath, High Floor, NOT Near to lift",1,1,1,3
1,Aarons,2,0,Twin,N,Bath,1,0,0,1
2,Saph,2,1,Double,N,"Bath, NOT Near to lift",1,0,1,2
3,Baxstare,1,1,Double,N,"Bath, High Floor",1,1,0,2
4,Kelle,1,1,Twin,N,Bath,1,0,0,1


In [14]:
#Merge tables on 'Double/Twin'
join = pd.merge(guests, rooms, on='Double/Twin', how='outer', indicator=True)
join

Unnamed: 0,Party,Adults in Party,Children in Party,Double/Twin,Requires Accessible Room?,Additional Requests,Request Bath,Request High Floor,Request Not Near Lift,Count of Requests,Room,Adults,Children,Features,Bath,High Floor,Near Lift,Accessible,_merge
0,Corain,4,0,Double,N,"Bath, High Floor, NOT Near to lift",1,1,1,3,101,2,,"Accessible, Near to lift, Double",0,0,1,Y,both
1,Corain,4,0,Double,N,"Bath, High Floor, NOT Near to lift",1,1,1,3,102,2,,"Accessible, Double",0,0,0,Y,both
2,Corain,4,0,Double,N,"Bath, High Floor, NOT Near to lift",1,1,1,3,103,2,,"Accessible, Double",0,0,0,Y,both
3,Corain,4,0,Double,N,"Bath, High Floor, NOT Near to lift",1,1,1,3,104,2,,"Accessible, Double",0,0,0,Y,both
4,Corain,4,0,Double,N,"Bath, High Floor, NOT Near to lift",1,1,1,3,201,2,,"Near to lift, Double",0,0,1,N,both
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
675,Guisler,1,0,Twin,N,"Bath, High Floor, NOT Near to lift",1,1,1,3,404,2,1.0,"High Floor, Twin",0,1,0,N,both
676,Guisler,1,0,Twin,N,"Bath, High Floor, NOT Near to lift",1,1,1,3,405,2,,"High Floor, Bath, Twin",1,1,0,N,both
677,Guisler,1,0,Twin,N,"Bath, High Floor, NOT Near to lift",1,1,1,3,406,2,,"High Floor, Twin",0,1,0,N,both
678,Guisler,1,0,Twin,N,"Bath, High Floor, NOT Near to lift",1,1,1,3,407,2,,"High Floor, Bath, Twin",1,1,0,N,both


### Filter the dataframe to keep only rows where the number of guests (adults and children) staying in the room does not exceed the room capacity

1. Fill blank values in the 'Children' column with 0s so they can be compared
2. Filter with boolean masks, first on adults then on children
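The two steps above can be sketched on a toy dataframe as follows. The rows are invented for illustration, and combining both conditions into one mask is a variation on the two separate filters used in this notebook, not a different result.

```python
import numpy as np
import pandas as pd

# Toy guest/room pairs (hypothetical data); NaN means the room has no child capacity listed
toy = pd.DataFrame({
    'Adults in Party': [2, 4, 1],
    'Children in Party': [1, 0, 0],
    'Adults': [2, 2, 2],
    'Children': [np.nan, 1.0, np.nan],
})

# Step 1: treat a blank child capacity as 0 so the comparison is defined
toy['Children'] = toy['Children'].fillna(0)

# Step 2: keep rows where the party fits on both adults and children
fits = (toy['Adults in Party'] <= toy['Adults']) & (toy['Children in Party'] <= toy['Children'])
toy_fit = toy[fits].copy()
print(toy_fit)
```

Only the last row survives: the first party has a child but the room has no child capacity, and the second party has too many adults.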

In [15]:
join['Children'] = join['Children'].fillna(0)

In [16]:
join['Children'].value_counts()

0.0    430
1.0    250
Name: Children, dtype: int64

In [17]:
join = join[join['Adults in Party'] <= join['Adults']].copy()

In [18]:
join = join[join['Children in Party'] <= join['Children']].copy()

In [19]:
join.shape

(451, 19)

### Ensure guests with accessibility requirements have accessible rooms

1. Split the dataframes into people who require accessible rooms and those who don't
2. For the people requiring accessible rooms (acc), keep only the rooms that are accessible
    - No changes are required for the people who don't require accessible rooms
3. Join the two dataframes to produce the full dataset
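As an alternative sketch (not the approach used in the cells that follow), the same rule can be written as a single boolean mask: keep a row if the party does not need an accessible room, or if the room is accessible. The toy rows are invented for illustration.

```python
import pandas as pd

# Toy guest/room pairs (hypothetical data)
toy = pd.DataFrame({
    'Party': ['A', 'A', 'B', 'B'],
    'Requires Accessible Room?': ['Y', 'Y', 'N', 'N'],
    'Accessible': ['Y', 'N', 'Y', 'N'],
})

# Keep the row unless the party needs an accessible room and the room is not accessible
ok = (toy['Requires Accessible Room?'] == 'N') | (toy['Accessible'] == 'Y')
toy_ok = toy[ok]
print(toy_ok)
```

Unlike the split-and-recombine steps, this keeps the rows in their original order, which only matters if later steps depend on row order.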

In [20]:
#Split dataframes into people who require accessible rooms and those who don't

acc = join[join['Requires Accessible Room?'] == 'Y'].copy()
nacc = join[join['Requires Accessible Room?'] == 'N'].copy()

print(acc.shape)
print(nacc.shape)

(84, 19)
(367, 19)


In [21]:
#For the accessible data frame, keep only accessible rooms

acc = acc[acc['Accessible'] == 'Y'].copy()
acc

Unnamed: 0,Party,Adults in Party,Children in Party,Double/Twin,Requires Accessible Room?,Additional Requests,Request Bath,Request High Floor,Request Not Near Lift,Count of Requests,Room,Adults,Children,Features,Bath,High Floor,Near Lift,Accessible,_merge
224,Fearby,2,0,Double,Y,,0,0,0,0,101,2,0.0,"Accessible, Near to lift, Double",0,0,1,Y,both
225,Fearby,2,0,Double,Y,,0,0,0,0,102,2,0.0,"Accessible, Double",0,0,0,Y,both
226,Fearby,2,0,Double,Y,,0,0,0,0,103,2,0.0,"Accessible, Double",0,0,0,Y,both
227,Fearby,2,0,Double,Y,,0,0,0,0,104,2,0.0,"Accessible, Double",0,0,0,Y,both
392,Norcutt,2,0,Double,Y,NOT Near to lift,0,0,1,1,101,2,0.0,"Accessible, Near to lift, Double",0,0,1,Y,both
393,Norcutt,2,0,Double,Y,NOT Near to lift,0,0,1,1,102,2,0.0,"Accessible, Double",0,0,0,Y,both
394,Norcutt,2,0,Double,Y,NOT Near to lift,0,0,1,1,103,2,0.0,"Accessible, Double",0,0,0,Y,both
395,Norcutt,2,0,Double,Y,NOT Near to lift,0,0,1,1,104,2,0.0,"Accessible, Double",0,0,0,Y,both
476,Iczokvitz,2,0,Double,Y,Bath,1,0,0,1,101,2,0.0,"Accessible, Near to lift, Double",0,0,1,Y,both
477,Iczokvitz,2,0,Double,Y,Bath,1,0,0,1,102,2,0.0,"Accessible, Double",0,0,0,Y,both


In [22]:
#No changes are required for the nacc dataframe, so join both dataframes back together to produce the full set of possible rooms

#DataFrame.append was removed in pandas 2.0, so use pd.concat instead
full = pd.concat([nacc, acc])
full

Unnamed: 0,Party,Adults in Party,Children in Party,Double/Twin,Requires Accessible Room?,Additional Requests,Request Bath,Request High Floor,Request Not Near Lift,Count of Requests,Room,Adults,Children,Features,Bath,High Floor,Near Lift,Accessible,_merge
24,Corain,4,0,Double,N,"Bath, High Floor, NOT Near to lift",1,1,1,3,601,4,0.0,"High Floor, Bath, Near to lift, Double",1,1,1,N,both
25,Corain,4,0,Double,N,"Bath, High Floor, NOT Near to lift",1,1,1,3,602,4,0.0,"High Floor, Bath, Double",1,1,0,N,both
26,Corain,4,0,Double,N,"Bath, High Floor, NOT Near to lift",1,1,1,3,603,4,0.0,"High Floor, Bath, Double",1,1,0,N,both
27,Corain,4,0,Double,N,"Bath, High Floor, NOT Near to lift",1,1,1,3,604,4,0.0,"High Floor, Bath, Double",1,1,0,N,both
34,Saph,2,1,Double,N,"Bath, NOT Near to lift",1,0,1,2,203,2,1.0,Double,0,0,0,N,both
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
395,Norcutt,2,0,Double,Y,NOT Near to lift,0,0,1,1,104,2,0.0,"Accessible, Double",0,0,0,Y,both
476,Iczokvitz,2,0,Double,Y,Bath,1,0,0,1,101,2,0.0,"Accessible, Near to lift, Double",0,0,1,Y,both
477,Iczokvitz,2,0,Double,Y,Bath,1,0,0,1,102,2,0.0,"Accessible, Double",0,0,0,Y,both
478,Iczokvitz,2,0,Double,Y,Bath,1,0,0,1,103,2,0.0,"Accessible, Double",0,0,0,Y,both


### Calculate the request satisfaction % for each guest in each room

1. Using df.apply(), assess whether each Request column is matched by the corresponding room feature column
    1. First, write the functions to use with df.apply().
        - You will write three functions, one each for bath, high floor, and not near lift
    2. Use df.apply on the dataframe to call the functions
2. Sum the resulting columns to find out how many matching requests the guests had. This column is called 'Requirements Met'
3. Divide the 'Requirements Met' column by the 'Count of Requests' column. Round and multiply by 100 to produce a percent
    - Guests who have zero requests will result in a NaN, due to dividing by zero. Fill this with 0 using series.fillna()
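The same calculation can also be sketched without df.apply(), using column arithmetic on the 1/0 flags. This is an alternative to the row-wise functions below, shown on invented toy rows; it rounds after multiplying by 100 rather than before, which is a small variation.

```python
import pandas as pd

# Toy guest/room pairs (hypothetical data), using the notebook's column names
toy = pd.DataFrame({
    'Request Bath': [1, 1, 0],
    'Request High Floor': [1, 0, 0],
    'Request Not Near Lift': [1, 0, 0],
    'Bath': [1, 0, 1],
    'High Floor': [1, 1, 0],
    'Near Lift': [1, 0, 0],
})

# A request is met when both flags are 1; "not near lift" is met when Near Lift is 0
met = (toy['Request Bath'] * toy['Bath']
       + toy['Request High Floor'] * toy['High Floor']
       + toy['Request Not Near Lift'] * (1 - toy['Near Lift']))
count = toy['Request Bath'] + toy['Request High Floor'] + toy['Request Not Near Lift']

# 0 / 0 gives NaN for guests with no requests, so fill those with 0
satisfaction = (met / count * 100).round(2).fillna(0)
print(satisfaction.tolist())
```

The first row meets two of three requests, the second meets none of one, and the third has no requests at all and is filled with 0.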

In [24]:
#Calculate whether each column is matched (or irrelevant)

#Bath match
def bath(x):
    if x['Request Bath'] == 1 and x['Bath'] == 1.0:
        return 1
    else:
        return 0
    
#High Floor
def hf(x):
    if x['Request High Floor'] == 1 and x['High Floor'] == 1.0:
        return 1
    else:
        return 0

#Not near lift
def nnl(x):
    if x['Request Not Near Lift'] == 1 and x['Near Lift'] == 0.0:
        return 1
    else:
        return 0

In [25]:
#Produce new columns to state whether these are a match

full['match_bath'] = full.apply(bath, axis=1)
full['match_bath'].value_counts()

0    271
1    108
Name: match_bath, dtype: int64

In [26]:
full['match_hf'] = full.apply(hf, axis=1)
full['match_hf'].value_counts()

0    271
1    108
Name: match_hf, dtype: int64

In [27]:
full['Near Lift'].value_counts()

0    323
1     56
Name: Near Lift, dtype: int64

In [28]:
full['match_nnl'] = full.apply(nnl, axis=1)
full['match_nnl'].value_counts()

0    262
1    117
Name: match_nnl, dtype: int64

In [29]:
#Sum match columns

full['Requirements Met'] = full['match_bath'] + full['match_hf'] + full['match_nnl']
full['Requirements Met'].value_counts()

0    150
1    131
2     92
3      6
Name: Requirements Met, dtype: int64

In [30]:
#Divide sum match columns by number of requests to produce percentage

full['Request Satisfaction'] = round(full['Requirements Met']/full['Count of Requests'], 4) * 100
full['Request Satisfaction']

24      66.67
25     100.00
26     100.00
27     100.00
34      50.00
        ...  
395    100.00
476      0.00
477      0.00
478      0.00
479      0.00
Name: Request Satisfaction, Length: 379, dtype: float64

In [41]:
#Fill NaNs resulting from dividing by zero above with 0

full['Request Satisfaction'] = full['Request Satisfaction'].fillna(0)

### Filter so that guests are left only with the highest request satisfaction

1. Create a smaller dataset containing only the 'Party' and 'Request Satisfaction' fields.
2. Group by 'Party' and take the maximum request satisfaction for each party
3. Join the grouped result back to the full dataset on both 'Party' and 'Request Satisfaction', using a left join from the grouped result so that only the rows matching each party's maximum are kept
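A shorter alternative to the groupby-then-merge steps is groupby().transform('max'), which broadcasts each party's maximum back onto every row. This is a sketch on invented toy rows, not the approach used in the cells that follow.

```python
import pandas as pd

# Toy guest/room pairs (hypothetical data)
toy = pd.DataFrame({
    'Party': ['A', 'A', 'B', 'B'],
    'Room': [101, 102, 101, 102],
    'Request Satisfaction': [50.0, 100.0, 0.0, 0.0],
})

# Per-row copy of each party's best satisfaction score
best = toy.groupby('Party')['Request Satisfaction'].transform('max')

# Keep only the rows that reach their party's maximum (ties are all kept)
toy_best = toy[toy['Request Satisfaction'] == best]
print(toy_best)
```

Party A keeps only room 102, while party B keeps both rooms because they tie at 0.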

In [42]:
#Create a smaller dataframe
gb = full[['Party', 'Request Satisfaction']]

In [44]:
#Create a groupby on party with the maximum of request satisfaction
gb = gb.groupby('Party').max()
gb

Party,Request Satisfaction
Aarons,100.0
Abramowitz,100.0
Baxstare,100.0
Chese,100.0
Corain,100.0
Cullum,0.0
Fallens,0.0
Fearby,0.0
Gendrich,100.0
Ghiriardelli,100.0


In [45]:
#Join the groupby to the full dataframe to keep only the rooms with the highest request satisfaction

full2 = pd.merge(gb, full, on=['Party', 'Request Satisfaction'], how='left')
full2

Unnamed: 0,Party,Request Satisfaction,Adults in Party,Children in Party,Double/Twin,Requires Accessible Room?,Additional Requests,Request Bath,Request High Floor,Request Not Near Lift,...,Features,Bath,High Floor,Near Lift,Accessible,_merge,match_bath,match_hf,match_nnl,Requirements Met
0,Aarons,100.0,2,0,Twin,N,Bath,1,0,0,...,"Bath, Twin",1,0,0,N,both,1,0,0,1
1,Aarons,100.0,2,0,Twin,N,Bath,1,0,0,...,"Bath, Twin",1,0,0,N,both,1,0,0,1
2,Aarons,100.0,2,0,Twin,N,Bath,1,0,0,...,"Bath, Twin",1,0,0,N,both,1,0,0,1
3,Aarons,100.0,2,0,Twin,N,Bath,1,0,0,...,"High Floor, Bath, Near to lift, Twin",1,1,1,N,both,1,0,0,1
4,Aarons,100.0,2,0,Twin,N,Bath,1,0,0,...,"High Floor, Bath, Twin",1,1,0,N,both,1,0,0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
237,Vowell,100.0,4,0,Double,N,"High Floor, NOT Near to lift",0,1,1,...,"High Floor, Bath, Double",1,1,0,N,both,0,1,1,2
238,Winmill,100.0,1,1,Double,N,"High Floor, NOT Near to lift",0,1,1,...,"High Floor, Bath, Double",1,1,0,N,both,0,1,1,2
239,Winmill,100.0,1,1,Double,N,"High Floor, NOT Near to lift",0,1,1,...,"High Floor, Bath, Double",1,1,0,N,both,0,1,1,2
240,Winmill,100.0,1,1,Double,N,"High Floor, NOT Near to lift",0,1,1,...,"High Floor, Bath, Double",1,1,0,N,both,0,1,1,2


### Remove rooms that are too large for the party

1. Remove the request columns. (This is not strictly necessary to do in this order.)
2. For each child and adult capacity, we will create a groupby that contains the minimum size of the room needed for each party
    1. For adults, create a smaller df with only Party and Adults (room capacity) information
    2. Produce a groupby to take the minimum. Use reset_index to produce a flat df rather than a hierarchical index.
    3. Repeat steps 1 and 2 for Children
    4. Combine these two groupbys using pd.merge()
3. Join the capacity dataframe created during step 2 with the full dataframe from the cleanup step. This results in the solution for the exercise.
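The steps above can be sketched on a toy dataframe as follows. The rows are invented for illustration, and the two minima are computed in one groupby here rather than in two separate ones.

```python
import pandas as pd

# Toy candidate rooms per party (hypothetical data)
toy = pd.DataFrame({
    'Party': ['A', 'A', 'A', 'B'],
    'Room': [201, 202, 601, 601],
    'Adults': [2, 2, 4, 4],
    'Children': [0.0, 1.0, 0.0, 0.0],
})

# Smallest adult and child capacity available to each party
smallest = toy.groupby('Party')[['Adults', 'Children']].min().reset_index()

# Keep only the rooms whose capacity equals both minima
toy_min = pd.merge(smallest, toy, on=['Party', 'Adults', 'Children'], how='left')
print(toy_min)
```

One thing to be aware of: the two minima are taken independently, so if no single room has both the minimum adult capacity and the minimum child capacity, the left join returns a row with missing room details for that party.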

In [50]:
#Clean up extra columns

full2 = full2[['Party',  'Adults in Party', 'Children in Party',
       'Double/Twin', 'Requires Accessible Room?', 'Additional Requests',
       'Request Satisfaction',  'Room', 'Adults', 'Children', 'Features']]
full2

Unnamed: 0,Party,Adults in Party,Children in Party,Double/Twin,Requires Accessible Room?,Additional Requests,Request Satisfaction,Room,Adults,Children,Features
0,Aarons,2,0,Twin,N,Bath,100.0,302,2,1.0,"Bath, Twin"
1,Aarons,2,0,Twin,N,Bath,100.0,303,2,0.0,"Bath, Twin"
2,Aarons,2,0,Twin,N,Bath,100.0,304,2,1.0,"Bath, Twin"
3,Aarons,2,0,Twin,N,Bath,100.0,401,2,0.0,"High Floor, Bath, Near to lift, Twin"
4,Aarons,2,0,Twin,N,Bath,100.0,402,2,1.0,"High Floor, Bath, Twin"
...,...,...,...,...,...,...,...,...,...,...,...
237,Vowell,4,0,Double,N,"High Floor, NOT Near to lift",100.0,604,4,0.0,"High Floor, Bath, Double"
238,Winmill,1,1,Double,N,"High Floor, NOT Near to lift",100.0,503,2,1.0,"High Floor, Bath, Double"
239,Winmill,1,1,Double,N,"High Floor, NOT Near to lift",100.0,504,2,1.0,"High Floor, Bath, Double"
240,Winmill,1,1,Double,N,"High Floor, NOT Near to lift",100.0,506,2,1.0,"High Floor, Bath, Double"


In [57]:
#Create a groupby that keeps only the minimum size of the room for adults

gb2 = full2[['Party', 'Adults']]
gb2 = gb2.groupby(by='Party').min().reset_index()
gb2

Unnamed: 0,Party,Adults
0,Aarons,2
1,Abramowitz,2
2,Baxstare,2
3,Chese,2
4,Corain,4
5,Cullum,2
6,Fallens,2
7,Fearby,2
8,Gendrich,4
9,Ghiriardelli,2


In [58]:
#Create a groupby that keeps only the minimum size of the room for children

gb3 = full2[['Party', 'Children']]
gb3 = gb3.groupby(by='Party').min().reset_index()
gb3

Unnamed: 0,Party,Children
0,Aarons,0.0
1,Abramowitz,1.0
2,Baxstare,1.0
3,Chese,0.0
4,Corain,0.0
5,Cullum,0.0
6,Fallens,0.0
7,Fearby,0.0
8,Gendrich,0.0
9,Ghiriardelli,0.0


In [59]:
#Join the adults and children groupbys together

gb4 = pd.merge(gb2, gb3, on='Party')
gb4

Unnamed: 0,Party,Adults,Children
0,Aarons,2,0.0
1,Abramowitz,2,1.0
2,Baxstare,2,1.0
3,Chese,2,0.0
4,Corain,4,0.0
5,Cullum,2,0.0
6,Fallens,2,0.0
7,Fearby,2,0.0
8,Gendrich,4,0.0
9,Ghiriardelli,2,0.0


In [66]:
#Join the combined minimum groupby with the full2 dataframe

full3 = pd.merge(gb4, full2, on=['Party', 'Adults','Children'], how='left')
full3

Unnamed: 0,Party,Adults,Children,Adults in Party,Children in Party,Double/Twin,Requires Accessible Room?,Additional Requests,Request Satisfaction,Room,Features
0,Aarons,2,0.0,2,0,Twin,N,Bath,100.0,303,"Bath, Twin"
1,Aarons,2,0.0,2,0,Twin,N,Bath,100.0,401,"High Floor, Bath, Near to lift, Twin"
2,Aarons,2,0.0,2,0,Twin,N,Bath,100.0,405,"High Floor, Bath, Twin"
3,Aarons,2,0.0,2,0,Twin,N,Bath,100.0,407,"High Floor, Bath, Twin"
4,Abramowitz,2,1.0,2,1,Double,N,"Bath, High Floor",100.0,501,"High Floor, Bath, Near to lift, Double"
...,...,...,...,...,...,...,...,...,...,...,...
148,Vowell,4,0.0,4,0,Double,N,"High Floor, NOT Near to lift",100.0,604,"High Floor, Bath, Double"
149,Winmill,2,1.0,1,1,Double,N,"High Floor, NOT Near to lift",100.0,503,"High Floor, Bath, Double"
150,Winmill,2,1.0,1,1,Double,N,"High Floor, NOT Near to lift",100.0,504,"High Floor, Bath, Double"
151,Winmill,2,1.0,1,1,Double,N,"High Floor, NOT Near to lift",100.0,506,"High Floor, Bath, Double"


### Export to csv

In [67]:
full3.to_csv('pandas_output.csv', index=False)