# Creating a Recommender System for Airbnb to Enhance User Experience and Retention and Boost Business

GROUP 2
* Mercy Onduso
* Nurulain Abdi
* Amos Kibet
* Beth Mithamo

# Overview

Airbnb is a global online marketplace that offers housing and other accommodations to travelers. The platform has grown significantly in popularity over the years, with millions of hosts and guests using the platform for their travel needs.


The aim of this project is to create a recommender system that helps stakeholders and their clients make better-informed decisions. The system should also help stakeholders schedule renovations of their listings efficiently, without inconveniencing their clients.

# Business Problem 

Airbnb has become a popular alternative to traditional hotels for tourists and visitors in Cape Town, South Africa. However, despite its many advantages, users often face several challenges when using the platform. These include poor recommendations, unreliable pricing, and subpar customer experience. Moreover, stakeholders often struggle to renovate their listings to meet the needs of their target customers.
A South Africa-based housing company wants to venture into the Airbnb business and needs a sustainable and profitable business model that can compete with established players in the market. The company's stakeholders aim to ensure customer retention and customer satisfaction, and to grow their business as a new entrant on the Airbnb platform. As data scientists, we are expected to answer their questions and provide recommendations.

Some of the questions we are expected to answer are:
1. What is the best month to visit Cape Town if you are on a budget?
2. What is the best time to list your property on Airbnb, and how should price rates be set according to the time of year?
3. What is the best time of year for owners to take down their listings for maintenance and repair?
4. When is the best time to attract clients with offers ahead of an upcoming low season? (Time series analysis)
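
The seasonal questions above can be approached by summarising price and availability per calendar month. The following is a minimal sketch, assuming a table with one row per listing per night; the `calendar` frame, its column names, and all of its values are hypothetical and serve only to illustrate the aggregation.

```python
import pandas as pd

# Hypothetical nightly records (made-up values); a real analysis would
# load one row per listing per date from an external file.
calendar = pd.DataFrame({
    "date": pd.to_datetime([
        "2023-01-15", "2023-01-20", "2023-05-10",
        "2023-05-18", "2023-12-24", "2023-12-30",
    ]),
    "price": [2400.0, 2600.0, 1100.0, 1300.0, 3900.0, 4100.0],
    "available": [False, False, True, True, False, True],
})


def monthly_profile(df):
    """Median price and share of available nights per calendar month."""
    by_month = df.groupby(df["date"].dt.month)
    profile = by_month.agg(
        median_price=("price", "median"),
        availability_rate=("available", "mean"),
    )
    profile.index.name = "month"
    return profile


profile = monthly_profile(calendar)
# The month with the lowest median price is a candidate "budget" month.
cheapest_month = int(profile["median_price"].idxmin())
print(profile)
print("Cheapest month:", cheapest_month)
```

A month with a low median price and a high availability rate would be a candidate for both budget travel and scheduled maintenance, although that reading should be checked against the real data.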



# Data

We extracted the data from Inside Airbnb, which publishes data from the Airbnb platform. The dataset is available at http://insideairbnb.com/get-the-data/

The data provides insights into the availability, pricing, and characteristics of short-term rental properties such as apartments, houses, and rooms. It can be used to understand supply and demand dynamics in the market, as well as the preferences of guests and hosts. It can also help identify trends and patterns in guest behavior, such as popular locations, amenities, and property types. Additionally, the data will be used to develop a recommender system that makes personalized recommendations to guests based on their preferences and past behavior. This will help both stakeholders and their clients make better decisions.

# Airbnb Exploration

* Who needs the recommender system? The host: a South Africa-based housing company.
* Which technologies does the Airbnb platform currently use to provide recommendations? Cookies, mobile identifiers, tracking URLs, and log data.
* Airbnb uses two machine learning models to predict prices for clients: Smart Pricing and Price Tips.


# Possible Algorithms 

1. Collaborative Filtering: This algorithm is based on the idea that people who had similar preferences in the past are likely to have similar preferences in the future. Collaborative filtering can be further divided into two types: user-based and item-based. In user-based collaborative filtering, recommendations are made based on the preferences of similar users. In item-based collaborative filtering, recommendations are made based on the similarity between items.

2. Content-Based Filtering: This algorithm is based on the idea that recommendations can be made based on the characteristics of the items being recommended. For example, if a user has shown a preference for properties with a specific location or amenities, a content-based filtering algorithm can recommend similar properties based on these characteristics.

3. Matrix Factorization: This algorithm is based on the idea that users and items can both be represented in a shared lower-dimensional space. Matrix factorization algorithms try to find latent factors that explain the observed interactions between users and items, and use these factors to make recommendations.

4. Hybrid Algorithms: Hybrid algorithms combine two or more recommendation techniques to make more accurate and diverse recommendations. For example, a hybrid algorithm could combine collaborative filtering and content-based filtering to provide a more personalized and diverse set of recommendations.
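
As an illustration of the content-based approach described above, here is a minimal sketch that ranks listings by cosine similarity of their features. It borrows column names from `listings.csv` (`room_type`, `price`, `minimum_nights`), but the rows are made up, and the helper names `build_features` and `similar_listings` are hypothetical. It is not the project's final model.

```python
import numpy as np
import pandas as pd

# Toy listings using column names from listings.csv; the values are made up.
listings = pd.DataFrame({
    "id": [101, 102, 103, 104],
    "room_type": ["Entire home/apt", "Private room",
                  "Entire home/apt", "Private room"],
    "price": [1500, 600, 1700, 650],
    "minimum_nights": [2, 1, 3, 1],
})


def build_features(df):
    """One-hot encode room_type and min-max scale the numeric columns."""
    numeric = df[["price", "minimum_nights"]].astype(float)
    span = (numeric.max() - numeric.min()).replace(0, 1.0)
    scaled = (numeric - numeric.min()) / span
    dummies = pd.get_dummies(df["room_type"], dtype=float)
    return pd.concat([scaled, dummies], axis=1).to_numpy()


def similar_listings(df, listing_id, top_n=2):
    """Return the ids of the top_n listings most similar to listing_id."""
    features = build_features(df)
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    unit = features / norms
    similarity = unit @ unit.T  # pairwise cosine similarity
    ids = df["id"].to_numpy()
    position = int(np.flatnonzero(ids == listing_id)[0])
    scores = similarity[position].copy()
    scores[position] = -np.inf  # exclude the query listing itself
    best = np.argsort(scores)[::-1][:top_n]
    return ids[best].tolist()


print(similar_listings(listings, 101, top_n=1))
```

Cosine similarity is used here because it compares the direction of the feature vectors rather than their magnitude; other distance measures would be equally valid starting points.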


In [1]:
import pandas as pd

In [2]:
# Raw string so the backslashes in the Windows path are not treated as escape sequences
data = pd.read_csv(r"E:\Data Science\DATA\listings.csv")
data.head()

|  | id | name | host_id | host_name | neighbourhood_group | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 | number_of_reviews_ltm | license |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 736534 | Enjoy a Private Room in Chartfield Guesthouse | 3007248 | Florian | NaN | Ward 64 | -34.12637 | 18.44963 | Private room | 1250 | 1 | 69 | 2022-12-13 | 0.82 | 18 | 310 | 16 | NaN |
| 1 | 3191 | Malleson Garden Cottage | 3754 | Brigitte | NaN | Ward 57 | -33.94739 | 18.476 | Entire home/apt | 515 | 3 | 70 | 2022-12-23 | 0.6 | 1 | 297 | 14 | NaN |
| 2 | 1625734 | Fabulous Villa, Large Entertainment Area Patio... | 8643899 | Linda | NaN | Ward 62 | -34.03014 | 18.42844 | Entire home/apt | 12500 | 7 | 5 | 2022-04-25 | 0.1 | 4 | 340 | 1 | NaN |
| 3 | 742345 | Room with a View - Green Point | 3886732 | Koos | NaN | Ward 115 | -33.90999 | 18.41148 | Entire home/apt | 750 | 3 | 12 | 2022-12-11 | 0.29 | 80 | 123 | 6 | NaN |
| 4 | 1626659 | Big Bay La Paloma - 2 bedroom suite | 5646468 | Madeleine | NaN | Ward 23 | -33.780788 | 18.454124 | Private room | 800 | 1 | 12 | 2022-12-20 | 0.11 | 2 | 162 | 4 | NaN |


In [4]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19670 entries, 0 to 19669
Data columns (total 18 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   id                              19670 non-null  int64  
 1   name                            19669 non-null  object 
 2   host_id                         19670 non-null  int64  
 3   host_name                       19670 non-null  object 
 4   neighbourhood_group             0 non-null      float64
 5   neighbourhood                   19670 non-null  object 
 6   latitude                        19670 non-null  float64
 7   longitude                       19670 non-null  float64
 8   room_type                       19670 non-null  object 
 9   price                           19670 non-null  int64  
 10  minimum_nights                  19670 non-null  int64  
 11  number_of_reviews               19670 non-null  int64  
 12  last_review                     

In [5]:
data.describe()

|  | id | host_id | neighbourhood_group | latitude | longitude | price | minimum_nights | number_of_reviews | reviews_per_month | calculated_host_listings_count | availability_365 | number_of_reviews_ltm |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 19670.0 | 19670.0 | 0.0 | 19670.0 | 19670.0 | 19670.0 | 19670.0 | 19670.0 | 14653.0 | 19670.0 | 19670.0 | 19670.0 |
| mean | 1.796446e+17 | 148264000.0 | NaN | -33.959494 | 18.475509 | 4041.962938 | 4.136299 | 20.416777 | 0.877358 | 10.268022 | 216.172547 | 5.840569 |
| std | 3.056252e+17 | 147070100.0 | NaN | 0.100235 | 0.127855 | 18936.811003 | 13.301732 | 39.997375 | 1.088386 | 21.893164 | 125.099667 | 10.783907 |
| min | 3191.0 | 3754.0 | NaN | -34.2644 | 18.31941 | 153.0 | 1.0 | 0.0 | 0.01 | 1.0 | 0.0 | 0.0 |
| 25% | 20594450.0 | 28199270.0 | NaN | -34.022103 | 18.402673 | 900.0 | 1.0 | 0.0 | 0.16 | 1.0 | 93.0 | 0.0 |
| 50% | 40154280.0 | 88739820.0 | NaN | -33.930055 | 18.429465 | 1680.0 | 2.0 | 4.0 | 0.5 | 2.0 | 261.0 | 1.0 |
| 75% | 5.560003e+17 | 242432700.0 | NaN | -33.910435 | 18.4835 | 3300.0 | 3.0 | 21.0 | 1.16 | 7.0 | 328.0 | 7.0 |
| max | 7.913132e+17 | 492970200.0 | NaN | -33.55177 | 18.930557 | 979144.0 | 730.0 | 635.0 | 17.75 | 139.0 | 365.0 | 142.0 |


In [9]:
data.isnull().sum()

id                                    0
name                                  1
host_id                               0
host_name                             0
neighbourhood_group               19670
neighbourhood                         0
latitude                              0
longitude                             0
room_type                             0
price                                 0
minimum_nights                        0
number_of_reviews                     0
last_review                        5017
reviews_per_month                  5017
calculated_host_listings_count        0
availability_365                      0
number_of_reviews_ltm                 0
license                           19608
dtype: int64
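
The null counts above show that `neighbourhood_group` is entirely empty, `license` is almost entirely empty, and `last_review` and `reviews_per_month` are missing for the same number of rows. One possible way to handle this is sketched below on a toy frame with made-up values. The `clean_missing` helper and its 95% threshold are hypothetical, and treating a missing `reviews_per_month` as zero assumes those listings simply have no reviews yet.

```python
import numpy as np
import pandas as pd

# Toy frame mimicking the missing-value pattern reported above (made-up values).
listings = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Garden Cottage", None, "Sea View Flat"],
    "neighbourhood_group": [np.nan, np.nan, np.nan],
    "number_of_reviews": [12, 0, 0],
    "last_review": ["2022-12-11", None, None],
    "reviews_per_month": [0.29, np.nan, np.nan],
    "license": [None, None, None],
})


def clean_missing(df, max_null_share=0.95):
    """Drop mostly-empty columns and fill review gaps for unreviewed listings."""
    out = df.copy()
    # Drop columns whose share of nulls exceeds the (assumed) threshold.
    null_share = out.isnull().mean()
    out = out.drop(columns=null_share[null_share > max_null_share].index)
    # Assumption: a missing reviews_per_month means the listing has no reviews.
    no_reviews = out["number_of_reviews"] == 0
    out.loc[no_reviews, "reviews_per_month"] = (
        out.loc[no_reviews, "reviews_per_month"].fillna(0.0)
    )
    # Parse dates; listings without reviews keep a missing (NaT) last_review.
    out["last_review"] = pd.to_datetime(out["last_review"])
    out["name"] = out["name"].fillna("Unknown")
    return out


cleaned = clean_missing(listings)
print(cleaned.isnull().sum())
```

Leaving `last_review` as a missing date, rather than inventing a placeholder, keeps unreviewed listings distinguishable in later analysis.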

In [8]:
data.nunique()

id                                19670
name                              19292
host_id                           10686
host_name                          4401
neighbourhood_group                   0
neighbourhood                        90
latitude                          13713
longitude                         13303
room_type                             4
price                              4224
minimum_nights                       45
number_of_reviews                   308
last_review                        1429
reviews_per_month                   582
calculated_host_listings_count       52
availability_365                    366
number_of_reviews_ltm               107
license                              46
dtype: int64

# Data Understanding