# AirBnB assignment problem

You are planning a weekend vacation in Boston for a large group of students. You would like to stay in AirBnB's, since this is generally cheaper than staying at an hotel.
Unfortunately, this is quite complicated to organize!

* AirBnBs accommodate different numbers of guests, so it is difficult to divide everyone into groups
* Since there are many of you, some people might try to book the same listings and be disappointed
* All the AirBnBs have different features and prices

Instead of everyone trying to organize their own groups and bookings, you have volunteered to formulate an optimization model to determine where each person should stay.
You sent out an initial survey to your classmates to find out what preferences and constraints you need to consider:
* Some students requested to be placed in a Male only/Female only lisitng, while some were happy to share with anyone
* Some students requested specific amenities: Kitchen, Air-con etc.

You also decided to restrict the listings that you would consider based on certain criteria
* You all want to be near each other for group activities, so we will only consider listings near central Boston or Back Bay
* We will only consider listings with high review scores



In [88]:
#import Pkg
#Pkg.add("CSV")
#Pkg.add("DataFrames")

In [1]:
# first, set the working directory to the 3_optimization folder
cd("/Users/Emma/Documents/MBAN/mban_orientation/1_orientation/3_optimization/")

In [2]:
# load packages
using JuMP, Gurobi, DataFrames, CSV

Yesterday we practiced filtering and manipulating the AirBnB data in R. Instead of starting from scratch in Julia, let's use our R skills to do some initial data processing.
To do this, we'll need to do the following:
* an R script that contains all the code we want to run: input_data.R
* a way to tell Julia to run the R script
* a way to read the output files from the R script into Julia

To do this we'll need two commands:
* the run( [command] ) function tells Julia to send a command to the terminal
* the Rscript [script] [arguments] command tells your computer to open and R session and run a specific script

In [3]:
# our R script takes one argument: the date that we will start our stay

# set the date that we want to start our booking
date = "2019-11-25"

# this line will run the R script 
run(`Rscript input_data.R $date`)

── Attaching packages ─────────────────────────────────────── tidyverse 1.2.1 ──
✔ ggplot2 3.1.0     ✔ purrr   0.2.5
✔ tibble  1.4.2     ✔ dplyr   0.7.7
✔ tidyr   0.8.0     ✔ stringr 1.3.1
✔ readr   1.1.1     ✔ forcats 0.3.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
1: package ‘ggplot2’ was built under R version 3.4.4 
2: package ‘purrr’ was built under R version 3.4.4 
3: package ‘dplyr’ was built under R version 3.4.4 
4: package ‘stringr’ was built under R version 3.4.4 
Loading required package: methods

Attaching package: ‘lubridate’

The following object is masked from ‘package:base’:

    date

package ‘lubridate’ was built under R version 3.4.4 


[1] "2019-11-25"


Parsed with column specification:
cols(
  .default = col_character(),
  id = col_integer(),
  scrape_id = col_double(),
  last_scraped = col_date(format = ""),
  host_id = col_integer(),
  host_since = col_date(format = ""),
  host_listings_count = col_integer(),
  host_total_listings_count = col_integer(),
  latitude = col_double(),
  longitude = col_double(),
  accommodates = col_integer(),
  bathrooms = col_double(),
  bedrooms = col_integer(),
  beds = col_integer(),
  square_feet = col_integer(),
  guests_included = col_integer(),
  minimum_nights = col_integer(),
  maximum_nights = col_integer(),
  minimum_minimum_nights = col_integer(),
  maximum_minimum_nights = col_integer(),
  minimum_maximum_nights = col_integer()
  # ... with 24 more columns
)
See spec(...) for full column specifications.
package ‘bindrcpp’ was built under R version 3.4.4 
Parsed with column specification:
cols(
  listing_id = col_integer(),
  date = col_date(format = ""),
  available = col_character(),
  p

[1] "Success!"


Process(`[4mRscript[24m [4minput_data.R[24m [4m2019-11-25[24m`, ProcessExited(0))

In [4]:
# now, let's read in the output that we saved from this script 
listings = CSV.read("filtered_listings.csv")
L=size(listings,1)
first(listings,5)

Unnamed: 0_level_0,id,listing_url,scrape_id,last_scraped,name
Unnamed: 0_level_1,Int64,String,Int64,Dates…,String
1,311240,https://www.airbnb.com/rooms/311240,20190714024644,2019-07-14,"Upscale Back Bay Studio, River Views"
2,1090413,https://www.airbnb.com/rooms/1090413,20190714024644,2019-07-14,HynesConventionCtr-5min-Lge Rm-Pvt Bath
3,1090545,https://www.airbnb.com/rooms/1090545,20190714024644,2019-07-14,BostonCtrBackBay-Pvt Bath&Rm-Comfy Bed
4,1868124,https://www.airbnb.com/rooms/1868124,20190714024644,2019-07-14,Lux Downtown Boston 1 Bedroom Apt w/pool
5,1868513,https://www.airbnb.com/rooms/1868513,20190714024644,2019-07-14,Lux 1 Bedroom in Post-War Back Bay building w/WiFi


In [5]:
# check if this is a logical or string

amenities = [:Kitchen]
listings[amenities]

│   caller = top-level scope at In[5]:4
└ @ Core In[5]:4


Unnamed: 0_level_0,Kitchen
Unnamed: 0_level_1,String
1,TRUE
2,TRUE
3,TRUE
4,TRUE
5,TRUE
6,TRUE
7,TRUE
8,TRUE
9,TRUE
10,TRUE


In [6]:
# read the student preferences 
preferences = CSV.read("preferences.csv")
N=size(preferences,1)
first(preferences,5)

Unnamed: 0_level_0,Name,room_A,room_F,room_M,Kitchen,Air_conditioning
Unnamed: 0_level_1,String,Int64,Int64,Int64,Int64,Int64
1,name1,0,1,0,0,1
2,name2,0,0,1,0,0
3,name3,1,0,1,0,0
4,name4,1,0,0,1,1
5,name5,1,0,0,0,1


In [7]:
# now select the column in the listings data that gives the cost on this date
# the column name will be total_price_$date
column_name = "stay 1"
println("The column name is: ",column_name)

# the column_name variable is a string. We need to make it into a symbol to use it in the listings DataFrame
cost = listings[!,Symbol(column_name)]

The column name is: stay 1


118-element CSV.Column{Int64,Int64}:
  555
  300
  300
 1371
 1335
 1680
 1851
 1851
 1680
  840
 1335
  480
  810
    ⋮
  567
  667
  667
 1197
  567
  285
  612
  612
  960
  663
  180
  267

In [8]:
# look at the cost vector: some of the entries are numbers and others are missing
# R uses NA to represent missing data, but Julia uses its own 'missing' type

# the missing costs are for loactions that are not available on this date
# let's make a new variable to show which listings ARE available
available = .!ismissing.(cost)
println(available[1:10])
println("Number of listings available: ",sum(available))
println("Number of listings not available: ",sum(1 .- available))

Bool[true, true, true, true, true, true, true, true, true, true]
Number of listings available: 118
Number of listings not available: 0


In [9]:
# Let's try a simple model to select the cheapest listings that are available for this period
model1 = Model(with_optimizer(Gurobi.Optimizer,TimeLimit=60))

# variables
# y[j]=1 listing j is booked, and 0 otherwise
@variable(model1, y[1:L], Bin)
# x[i,j]=1 if person i stays in listing j, and 0 otherwise
@variable(model1, x[1:N,1:L], Bin)

# constraints
# everyone has to be assigned exactly one listing
@constraint(model1,[i=1:N],sum(x[i,:])==1) 
# we can only put people in a listing if it is booked
@constraint(model1,[i=1:N,j=1:L],x[i,j]<=y[j]) 
# we can only book available listings
@constraint(model1,[i=1:N,j=1:L],y[j]<=available[j] ) 
# maximum occupancy per unit
@constraint(model1,[j=1:L],sum(x[:,j])<=listings[j,:accommodates]*y[j]) 

# objective
@objective(model1, Min, sum( y[j]*cost[j] for j=findall(available) ))

optimize!(model1)

Optimize a model with 14101 rows, 7080 columns and 34928 nonzeros
Variable types: 0 continuous, 7080 integer (7080 binary)
Coefficient statistics:
  Matrix range     [1e+00, 8e+00]
  Objective range  [2e+02, 2e+03]
  Bounds range     [0e+00, 0e+00]
  RHS range        [1e+00, 1e+00]
Found heuristic solution: objective 29891
Presolve removed 7139 rows and 0 columns
Presolve time: 0.13s
Presolved: 6962 rows, 7080 columns, 27612 nonzeros
Variable types: 0 continuous, 7080 integer (7080 binary)

Root relaxation: objective 6.635667e+03, 7939 iterations, 0.23 seconds

    Nodes    |    Current Node    |     Objective Bounds      |     Work
 Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

     0     0 6635.66667    0   13 29891.0000 6635.66667  77.8%     -    0s
H    0     0                    6929.0000000 6635.66667  4.23%     -    0s
H    0     0                    6922.0000000 6635.66667  4.14%     -    0s
H    0     0                    6706.0000000 6635.66667 

In [10]:
println("Cost per person: ",JuMP.objective_value(model1)/N)
println("Number of listings: ",sum(JuMP.value.(y)))
println("Selected listings: ",findall(JuMP.value.(y).>0) )
println("Guests per listing: ",listings[findall(JuMP.value.(y).>0),:accommodates] )
println("Beds per listing: ",listings[findall(JuMP.value.(y).>0),:beds] )

Cost per person: 112.64406779661017
Number of listings: 11.0
Selected listings: [36, 49, 56, 63, 83, 90, 91, 101, 104, 112, 118]
Guests per listing: [4, 4, 8, 8, 6, 6, 5, 6, 4, 4, 4]
Beds per listing: [2, 2, 4, 4, 2, 3, 3, 2, 2, 2, 1]


In [11]:
room_gender = [:room_A ,:room_F ,:room_M ]
amenities = [:Kitchen,:Air_conditioning]
for a in amenities
    listings[!,a] = listings[!,a].=="TRUE"
end

In [12]:
model1 = Model(with_optimizer(Gurobi.Optimizer,TimeLimit=60))

# variables
# y[j]=1 listing j is booked, and 0 otherwise
@variable(model1, y[1:L], Bin)
# x[i,j]=1 if person i stays in listing j, and 0 otherwise
@variable(model1, x[1:N,1:L], Bin)

# constraints
# everyone has to be assigned exactly one listing
@constraint(model1,[i=1:N],sum(x[i,:])==1) 
# we can only put people in a listing if it is booked
@constraint(model1,[i=1:N,j=1:L],x[i,j]<=y[j]) 
# we can only book available listings
@constraint(model1,[i=1:N,j=1:L],y[j]<=available[j] ) 
# maximum occupancy per unit
@constraint(model1,[j=1:L],sum(x[:,j])<=listings[j,:accommodates]*y[j]) 

# objective
@objective(model1, Min, sum( y[j]*cost[j] for j=findall(available) ))# variables
# g[i,k] = 1 if listing i is type k
@variable(model1, g[1:L,room_gender], Bin)

# gender preferences: x[i,j] can only be 1 if g[j,t] matches student i's gender preferences
@constraint(model1,[i=1:N,j=1:L],x[i,j]<=sum(preferences[i,t]*g[j,t]  for t in room_gender))

# amenity preferences: x[i,j] can only be 1 if listing[j] matches student i's preferences
@constraint(model1,[i=1:N,j=1:L,a=amenities],x[i,j]<=(1-preferences[i,a])+listings[j,a]*preferences[i,a] )

optimize!(model1)

Optimize a model with 34987 rows, 7434 columns and 66198 nonzeros
Variable types: 0 continuous, 7434 integer (7434 binary)
Coefficient statistics:
  Matrix range     [1e+00, 8e+00]
  Objective range  [2e+02, 2e+03]
  Bounds range     [0e+00, 0e+00]
  RHS range        [1e+00, 1e+00]
Found heuristic solution: objective 30245
Presolve removed 28314 rows and 686 columns
Presolve time: 0.18s
Presolved: 6673 rows, 6748 columns, 26370 nonzeros
Variable types: 0 continuous, 6748 integer (6748 binary)

Root relaxation: objective 6.635667e+03, 7579 iterations, 0.23 seconds

    Nodes    |    Current Node    |     Objective Bounds      |     Work
 Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

     0     0 6635.66667    0   13 30245.0000 6635.66667  78.1%     -    0s
H    0     0                    6929.0000000 6635.66667  4.23%     -    0s
H    0     0                    6922.0000000 6635.66667  4.14%     -    0s
H    0     0                    6706.0000000 6635.666

In [13]:
println("Cost per person: ",JuMP.objective_value(model1)/N)
println("Number of listings: ",sum(JuMP.value.(y)))
println("Selected listings: ",findall(JuMP.value.(y).>0) )
println("Guests per listing: ",listings[findall(JuMP.value.(y).>0),:accommodates] )
println("Beds per listing: ",listings[findall(JuMP.value.(y).>0),:beds] )

Cost per person: 112.64406779661017
Number of listings: 11.0
Selected listings: [36, 49, 56, 63, 83, 90, 91, 101, 104, 112, 118]
Guests per listing: [4, 4, 8, 8, 6, 6, 5, 6, 4, 4, 4]
Beds per listing: [2, 2, 4, 4, 2, 3, 3, 2, 2, 2, 1]


In [14]:
# maximum occupancy per unit
@constraint(model1,[j=1:L],sum(x[:,j])<=listings[j,:beds]*y[j]) 
optimize!(model1)

Optimize a model with 35105 rows, 7434 columns and 73278 nonzeros
Variable types: 0 continuous, 7434 integer (7434 binary)
Coefficient statistics:
  Matrix range     [1e+00, 8e+00]
  Objective range  [2e+02, 2e+03]
  Bounds range     [0e+00, 0e+00]
  RHS range        [1e+00, 1e+00]
Found heuristic solution: objective 36362
Presolve removed 31754 rows and 686 columns
Presolve time: 0.13s
Presolved: 3351 rows, 6748 columns, 19726 nonzeros

MIP start did not produce a new incumbent solution
MIP start violates constraint R35022 by 2.000000000

Variable types: 0 continuous, 6748 integer (6748 binary)

Root relaxation: objective 1.544300e+04, 7083 iterations, 0.37 seconds

    Nodes    |    Current Node    |     Objective Bounds      |     Work
 Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

*    0     0               0    15443.000000 15443.0000  0.00%     -    0s

Explored 0 nodes (7083 simplex iterations) in 0.52 seconds
Thread count was 4 (of 4 available pro

In [15]:
println("Cost per person: ",JuMP.objective_value(model1)/N)
println("Number of listings: ",sum(JuMP.value.(y)))
println("Selected listings: ",listings[findall(JuMP.value.(y).>0),:id] )
println("Guests per listing: ",listings[findall(JuMP.value.(y).>0),:accommodates] )
println("Beds per listing: ",listings[findall(JuMP.value.(y).>0),:beds] )

Cost per person: 261.7457627118644
Number of listings: 26.0
Selected listings: [1090413, 1090545, 13686161, 14994014, 15166858, 15630129, 17559756, 17559846, 17573922, 18396209, 19448926, 19469134, 19552273, 21648590, 22897065, 22934377, 23871901, 27584770, 27669626, 29704910, 32330288, 32330728, 32331200, 33136211, 33549296, 33781676]
Guests per listing: [1, 1, 3, 4, 3, 4, 5, 3, 3, 8, 6, 5, 8, 6, 6, 5, 4, 4, 6, 4, 4, 4, 3, 4, 1, 4]
Beds per listing: [1, 1, 2, 2, 2, 2, 3, 2, 2, 4, 3, 3, 4, 2, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 1, 1]


In [19]:
# find out which listing each person is assigned to
X = JuMP.value.(x)
assigned_listing = [ listings[findall(X[i,:].>0)[1],:id] for i=1:N]

59-element Array{Int64,1}:
 19552273
 33136211
 17559846
 17573922
  1090413
 18396209
 21648590
 19469134
 15630129
 32330728
 15166858
 22934377
 32331200
        ⋮
 19469134
 21648590
 19448926
 15166858
 23871901
 15630129
 17573922
 27669626
 14994014
 22897065
 32330288
 23871901

In [20]:
# make a new data frame with each individual's assignment
assignments = DataFrame()
assignments[!,:Name] = preferences[!,:Name]
assignments[!,:listing] = assigned_listing
first(assignments,5)

Unnamed: 0_level_0,Name,listing
Unnamed: 0_level_1,String,Int64
1,name1,19552273
2,name2,33136211
3,name3,17559846
4,name4,17573922
5,name5,1090413


In [21]:
# choose a name for the output file

filename = "airbnb_solution"

CSV.write("$filename.csv",assignments)

"airbnb_solution.csv"