# **Housing Prices in Vancouver vs Calgary**

In [2]:
library(broom)
library(repr)
library(gridExtra)
library(glmnet)
library(cowplot)
library(modelr)
library(tidyverse)



In [3]:
CanadaRent_dataset <- read.csv("rentfaster.csv") %>%
  select(-link) # drop the listing URL column

head(CanadaRent_dataset)

Unnamed: 0_level_0,rentfaster_id,city,province,address,latitude,longitude,lease_term,type,price,beds,baths,sq_feet,furnishing,availability_date,smoking,cats,dogs
Unnamed: 0_level_1,<int>,<chr>,<chr>,<chr>,<dbl>,<dbl>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
1,468622,Airdrie,Alberta,69 Gateway Dr NE,51.30596,-114.0125,Long Term,Townhouse,2495,2 Beds,2.5,1403,Unfurnished,Immediate,Non-Smoking,True,True
2,468622,Airdrie,Alberta,69 Gateway Dr NE,51.30596,-114.0125,Long Term,Townhouse,2695,3 Beds,2.5,1496,Unfurnished,Immediate,Non-Smoking,True,True
3,468622,Airdrie,Alberta,69 Gateway Dr NE,51.30596,-114.0125,Long Term,Townhouse,2295,2 Beds,2.5,1180,Unfurnished,Immediate,Non-Smoking,True,True
4,468622,Airdrie,Alberta,69 Gateway Dr NE,51.30596,-114.0125,Long Term,Townhouse,2095,2 Beds,2.5,1403,Unfurnished,November 18,Non-Smoking,True,True
5,468622,Airdrie,Alberta,69 Gateway Dr NE,51.30596,-114.0125,Long Term,Townhouse,2495,2 Beds,2.5,1403,Unfurnished,Immediate,Non-Smoking,True,True
6,468622,Airdrie,Alberta,69 Gateway Dr NE,51.30596,-114.0125,Long Term,Townhouse,2095,2 Beds,2.5,1403,Unfurnished,November 18,Non-Smoking,True,True


### Project objective:

The real estate market is a critical driver of economic activity and a significant factor in determining quality of life. This project compares housing prices in Vancouver and Calgary, two major Canadian cities with distinct economic landscapes and housing markets. Vancouver, known for its scenic beauty and global appeal, consistently ranks as one of the most expensive cities in Canada (1). Calgary, on the other hand, offers a comparatively affordable market, driven by its strong ties to the energy sector and a growing urban population (2). This analysis aims to explore the factors influencing housing prices in these cities, identify key trends, and assess how economic, geographic, and demographic differences shape their respective housing markets. Ultimately, it asks whether the housing markets in the two cities differ.

### Data Description:

The data were obtained from Kaggle and were originally pulled from the website https://www.rentfaster.com using Python libraries. The data set contains real estate listings in Canada, broken down by province and city. It has 25000+ entries and 17 columns:

rentfaster_id <- id of property on https://www.rentfaster.com

city <- city of the property eg. Vancouver, Calgary, Toronto etc

province <- province of property eg. Alberta, British Columbia, Ontario etc

address <- address of the rental properties

latitude <- latitude coordinates of the property

longitude <- longitude coordinates of the property

lease_term <- rental period of the property eg. Long Term, Negotiable, 12 months etc

type <- type of property eg. Apartment, house etc

price <- price of the rental property (this will be the response variable of interest)

beds <- number of bedrooms in the property, or Studio; a studio is treated as a 1-bedroom unit

baths <- number of bathrooms in the property

sq_feet <- property area in square feet

furnishing <- if the property is furnished or not

availability_date <- availability of the property eg. immediate, Date etc

smoking <- whether smoking is permitted or prohibited in the property

cats <- if cats are permitted in the property

dogs <- if dogs are permitted in the property.

Before proceeding with the analysis, it is necessary to clean and wrangle the data so that it is easier to work with. This involves converting categorical variables to binary or numeric values; for example, the beds column is stored as text (1 Bed, 2 Beds), which can be converted into numerical values. It also involves removing unnecessary variables such as rentfaster_id and address.
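
As a small illustration of the text-to-number conversion described above, the sketch below parses a few made-up `beds` entries using base R only. The toy values and the names `beds_raw`, `beds_trim`, `has_digit`, and `beds_num` are assumptions for illustration; the notebook itself uses `str_extract()` for this step.

```r
# Toy values mimicking the raw `beds` column (made up for illustration)
beds_raw <- c("1 Bed", " 2 Beds", "Studio", "3 Beds")

# Strip leading/trailing whitespace before matching
beds_trim <- trimws(beds_raw)

# Entries that contain a digit get that number; everything else starts as NA
has_digit <- grepl("[0-9]", beds_trim)
beds_num <- rep(NA_real_, length(beds_trim))
beds_num[has_digit] <- as.numeric(
  sub("^[^0-9]*([0-9]+).*$", "\\1", beds_trim[has_digit])
)

# A studio is treated as a 1-bedroom unit
beds_num[beds_trim == "Studio"] <- 1

beds_num
```

Any entry that is neither a studio nor contains a digit stays `NA`, so it can be dropped later together with the other missing values.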

In [8]:
# keep only the Calgary listings
Calgary_rent <- CanadaRent_dataset %>%
  filter(city == "Calgary")

head(Calgary_rent,3)
nrow(Calgary_rent)

Unnamed: 0_level_0,rentfaster_id,city,province,address,latitude,longitude,lease_term,type,price,beds,baths,sq_feet,furnishing,availability_date,smoking,cats,dogs
Unnamed: 0_level_1,<int>,<chr>,<chr>,<chr>,<dbl>,<dbl>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
1,571702,Calgary,Alberta,1055 11 Street SW,51.04338,-114.089,Long Term,Apartment,1725,1 Bed,1,483,Unfurnished,Immediate,Non-Smoking,True,True
2,571702,Calgary,Alberta,1055 11 Street SW,51.04338,-114.089,Long Term,Apartment,1875,1 Bed,1,527,Unfurnished,Immediate,Non-Smoking,True,True
3,571702,Calgary,Alberta,1055 11 Street SW,51.04338,-114.089,Long Term,Apartment,2400,2 Beds,2,854,Unfurnished,Immediate,Non-Smoking,True,True


In [37]:
Calgary_rent_clean <- Calgary_rent %>%
  select(-rentfaster_id, -address, -latitude, -longitude) %>% # remove unnecessary variables
  mutate(furnishing = trimws(furnishing)) %>% # strip leading/trailing whitespace
  mutate(furnishing = ifelse(furnishing == "Unfurnished", 0, 1)) %>% # binary: 0 = unfurnished, 1 = otherwise
  mutate(smoking = trimws(smoking)) %>% # strip leading/trailing whitespace
  mutate(smoking = ifelse(smoking == "Non-Smoking", 0, 1)) %>% # binary: 0 = non-smoking, 1 = otherwise
  mutate(cats = trimws(cats)) %>% # strip leading/trailing whitespace
  mutate(cats = ifelse(cats == "True", 1, 0)) %>% # binary: 1 = cats permitted, 0 = otherwise
  mutate(dogs = trimws(dogs)) %>% # strip leading/trailing whitespace
  mutate(dogs = ifelse(dogs == "True", 1, 0)) %>% # binary: 1 = dogs permitted, 0 = otherwise
  mutate(beds = trimws(beds)) %>% # strip leading/trailing whitespace
  mutate(beds = ifelse(beds == "Studio", 1, as.numeric(str_extract(beds, "\\d+")))) %>% # numeric bed count; a studio counts as 1
  na.omit() %>% # drop rows with missing values
  filter(sq_feet == as.integer(sq_feet) & sq_feet > 0) # keep entries whose sq_feet is a single positive whole number



head(Calgary_rent_clean,3)


[1m[22m[36mℹ[39m In argument: `sq_feet == as.integer(sq_feet) & sq_feet > 0`.
[33m![39m NAs introduced by coercion”


Unnamed: 0_level_0,city,province,lease_term,type,price,beds,baths,sq_feet,furnishing,availability_date,smoking,cats,dogs
Unnamed: 0_level_1,<chr>,<chr>,<chr>,<chr>,<dbl>,<dbl>,<chr>,<chr>,<dbl>,<chr>,<dbl>,<dbl>,<dbl>
1,Calgary,Alberta,Long Term,Apartment,1725,1,1,483,0,Immediate,0,1,1
2,Calgary,Alberta,Long Term,Apartment,1875,1,1,527,0,Immediate,0,1,1
3,Calgary,Alberta,Long Term,Apartment,2400,2,2,854,0,Immediate,0,1,1
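
The coercion warning above appears because `sq_feet` is stored as text and some entries are not plain numbers. The sketch below shows one possible alternative on made-up values: convert the column to numeric once, then filter on the numeric result. This is an assumption-laden illustration and not the notebook's method; applying it to the real data could keep a slightly different set of rows (for example, it keeps `"820.0"`). The names `sq_feet_raw`, `sq_feet_num`, and `keep` are hypothetical.

```r
# Toy values mimicking the raw `sq_feet` column (made up for illustration)
sq_feet_raw <- c("483", "820.0", "", "N/A", "0", "1403")

# Convert once, up front; non-numeric text becomes NA
# (the coercion warning is suppressed on purpose here)
sq_feet_num <- suppressWarnings(as.numeric(sq_feet_raw))

# Keep only positive whole-number areas
keep <- !is.na(sq_feet_num) & sq_feet_num > 0 & sq_feet_num == round(sq_feet_num)

sq_feet_num[keep]
```

Filtering on a numeric column also avoids comparing text with numbers, where `>` falls back to string ordering.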


Now we repeat the same steps for Vancouver.
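
Because the Vancouver listings go through the same recoding as the Calgary ones, the repeated steps could be collected in one function and applied per city. The sketch below does this for the four text flags only, in base R and on made-up rows. The helper name `recode_flags` and the toy data frame `toy` are hypothetical and not part of the original notebook.

```r
# Hypothetical helper (not in the original notebook): recode the text
# flags of a listings data frame into 0/1 columns
recode_flags <- function(df) {
  df$furnishing <- ifelse(trimws(df$furnishing) == "Unfurnished", 0, 1)
  df$smoking    <- ifelse(trimws(df$smoking) == "Non-Smoking", 0, 1)
  df$cats       <- ifelse(trimws(df$cats) == "True", 1, 0)
  df$dogs       <- ifelse(trimws(df$dogs) == "True", 1, 0)
  df
}

# Toy listings for two cities (values made up for illustration)
toy <- data.frame(
  city       = c("Calgary", "Vancouver", "Vancouver"),
  furnishing = c("Unfurnished", "Furnished", "Unfurnished "),
  smoking    = c("Non-Smoking", "Non-Smoking", "Smoking Allowed"),
  cats       = c("True", "False", "True"),
  dogs       = c("True", "False", "False"),
  stringsAsFactors = FALSE
)

# Split by city and apply the same recoding to each piece
cleaned <- lapply(split(toy, toy$city), recode_flags)

cleaned$Vancouver
```

Keeping the recoding in one place means a later change to a rule, such as how furnishing is coded, only has to be made once.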

In [24]:
# keep only the Vancouver listings
Vancouver_rent <- CanadaRent_dataset %>%
  filter(city == "Vancouver")

head(Vancouver_rent, 3)

nrow(Vancouver_rent)

Unnamed: 0_level_0,rentfaster_id,city,province,address,latitude,longitude,lease_term,type,price,beds,baths,sq_feet,furnishing,availability_date,smoking,cats,dogs
Unnamed: 0_level_1,<int>,<chr>,<chr>,<chr>,<dbl>,<dbl>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
1,544095,Vancouver,British Columbia,1770 Pendrell Street,49.28746,-123.1405,Long Term,Apartment,3895,2 Beds,2,820.0,Unfurnished,Immediate,Non-Smoking,True,True
2,544095,Vancouver,British Columbia,1770 Pendrell Street,49.28746,-123.1405,Long Term,Apartment,2695,Studio,1,440.0,Unfurnished,Immediate,Non-Smoking,True,True
3,544095,Vancouver,British Columbia,1770 Pendrell Street,49.28746,-123.1405,Long Term,Apartment,4395,2 Beds,2,,Unfurnished,Immediate,Non-Smoking,True,True


In [30]:
Vancouver_rent_clean <- Vancouver_rent %>%
  select(-rentfaster_id, -address, -latitude, -longitude) %>% # remove unnecessary variables
  mutate(furnishing = trimws(furnishing)) %>% # strip leading/trailing whitespace
  mutate(furnishing = ifelse(furnishing == "Unfurnished", 0, 1)) %>% # binary: 0 = unfurnished, 1 = otherwise
  mutate(smoking = trimws(smoking)) %>% # strip leading/trailing whitespace
  mutate(smoking = ifelse(smoking == "Non-Smoking", 0, 1)) %>% # binary: 0 = non-smoking, 1 = otherwise
  mutate(cats = trimws(cats)) %>% # strip leading/trailing whitespace
  mutate(cats = ifelse(cats == "True", 1, 0)) %>% # binary: 1 = cats permitted, 0 = otherwise
  mutate(dogs = trimws(dogs)) %>% # strip leading/trailing whitespace
  mutate(dogs = ifelse(dogs == "True", 1, 0)) %>% # binary: 1 = dogs permitted, 0 = otherwise
  mutate(beds = trimws(beds)) %>% # strip leading/trailing whitespace
  mutate(beds = ifelse(beds == "Studio", 1, as.numeric(str_extract(beds, "\\d+")))) %>% # numeric bed count; a studio counts as 1
  na.omit() %>% # drop rows with missing values
  filter(sq_feet == as.integer(sq_feet) & sq_feet > 0) # keep entries whose sq_feet is a single positive whole number



head(Vancouver_rent_clean,3)

[1m[22m[36mℹ[39m In argument: `sq_feet == as.integer(sq_feet) & sq_feet > 0`.
[33m![39m NAs introduced by coercion”


Unnamed: 0_level_0,city,province,lease_term,type,price,beds,baths,sq_feet,furnishing,availability_date,smoking,cats,dogs
Unnamed: 0_level_1,<chr>,<chr>,<chr>,<chr>,<dbl>,<dbl>,<chr>,<chr>,<dbl>,<chr>,<dbl>,<dbl>,<dbl>
1,Vancouver,British Columbia,Long Term,Apartment,3895,2,2,820,0,Immediate,0,1,1
2,Vancouver,British Columbia,Long Term,Apartment,2695,1,1,440,0,Immediate,0,1,1
3,Vancouver,British Columbia,Long Term,Apartment,3300,1,1,650,0,Immediate,0,1,1


In [41]:
# take a random sample of 50 listings from each city

set.seed(2345) # fix the seed so the samples are reproducible

Calgary_sample <- sample_n(Calgary_rent_clean, size = 50) # sample 50 Calgary listings
head(Calgary_sample, 3)

Vancouver_sample <- sample_n(Vancouver_rent_clean, size = 50) # sample 50 Vancouver listings

Unnamed: 0_level_0,city,province,lease_term,type,price,beds,baths,sq_feet,furnishing,availability_date,smoking,cats,dogs
Unnamed: 0_level_1,<chr>,<chr>,<chr>,<chr>,<dbl>,<dbl>,<chr>,<chr>,<dbl>,<chr>,<dbl>,<dbl>,<dbl>
1,Calgary,Alberta,Long Term,House,2700,4,2,1800,0,Immediate,0,1,1
2,Calgary,Alberta,Negotiable,Room For Rent,800,1,1,2500,1,Immediate,0,0,0
3,Calgary,Alberta,Long Term,Apartment,1575,1,1,550,0,July 01,0,0,0


In [45]:
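# extract the sampled prices for each city
Calgary_price <- Calgary_sample$price
Vancouver_price <- Vancouver_sample$price

# one-sided Welch two-sample t-test (unequal variances)
# H0: mean Vancouver price - mean Calgary price = 0
# H1: mean Vancouver price - mean Calgary price > 0
tidy(t.test(Vancouver_price, Calgary_price, alternative = "greater", paired = FALSE,
            mu = 0, var.equal = FALSE, conf.level = 0.95))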

estimate,estimate1,estimate2,statistic,p.value,parameter,conf.low,conf.high,method,alternative
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<chr>
522.8,3023.5,2500.7,3.056,0.001447756,97.26117,238.7034,inf,Welch Two Sample t-test,greater
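
The output above comes from a Welch two-sample t-test. As a rough illustration of what the `statistic`, `parameter`, and `p.value` columns represent, the sketch below recomputes them by hand on two small made-up price vectors and compares the result with `t.test()`. The vectors `x` and `y` are assumptions for illustration and are not the notebook's samples.

```r
# Toy price vectors (made up for illustration; not the notebook's samples)
x <- c(3200, 2800, 3500, 2950, 3100)
y <- c(2400, 2600, 2300, 2750, 2500)

# Squared standard error of each sample mean (variances are not pooled)
se2_x <- var(x) / length(x)
se2_y <- var(y) / length(y)

# Welch statistic: difference in means over the unpooled standard error
t_stat <- (mean(x) - mean(y)) / sqrt(se2_x + se2_y)

# Welch-Satterthwaite approximation to the degrees of freedom
df <- (se2_x + se2_y)^2 /
  (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))

# One-sided p-value for H1: mean(x) > mean(y)
p_val <- pt(t_stat, df, lower.tail = FALSE)

# Compare with the built-in test
res <- t.test(x, y, alternative = "greater", var.equal = FALSE)
c(by_hand = t_stat, built_in = unname(res$statistic))
```

The non-integer degrees of freedom in the notebook's output (the `parameter` column) come from this approximation, which is why they are not simply the two sample sizes minus two.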
