# Topic: Predict Airbnb listing price in San Francisco 

## Problem Statement

Goals:
1. Help customers estimate a reasonable price for the listings they are interested in
2. Help hosts set a reasonable listing price
3. Help hosts understand which features of the house/room and of the listing itself matter most for the price
4. Help Airbnb understand how listing prices differ before and during COVID

If I can find customer data:
5. Help Airbnb identify customers who are likely to book again (and could be sent ads) and customers who are likely to be lost (and could be sent incentives)

Audience: Airbnb customers and hosts

## Methods

### Data
Airbnb listing and review data from [Inside Airbnb](http://insideairbnb.com/get-the-data.html). 

The listing data includes:
- Features of the house/room: number of bedrooms and bathrooms, amenities, etc.
- Host services: communication, response time, etc.
- Availability
- Features of the listing itself: title, description
- Review scores: overall and by category

The review data includes:
- Reviews for the listings

### Models
1. Predict the listing price with regression models
- Linear regression with 
- kNN
- Decision tree
- Random forest
- Gradient Boosting regression
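
A minimal sketch of how the candidate regressors listed above could be compared with scikit-learn. The feature matrix `X` and target `y` here are synthetic stand-ins (an assumption, not the Airbnb data), and the hyperparameters are defaults or arbitrary choices rather than tuned values.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the cleaned feature matrix and price target (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 100 + 30 * X[:, 0] - 20 * X[:, 1] + rng.normal(scale=5, size=200)

# One instance of each model family named in the list above.
models = {
    "linear": LinearRegression(),
    "knn": KNeighborsRegressor(n_neighbors=5),
    "tree": DecisionTreeRegressor(random_state=0),
    "forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "gboost": GradientBoostingRegressor(random_state=0),
}

# Mean 5-fold cross-validated R^2 for each candidate model.
cv_r2 = {
    name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    for name, model in models.items()
}

# Print the models from best to worst cross-validated score.
for name, score in sorted(cv_r2.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>7}: R^2 = {score:.3f}")
```

On real listing data the ranking may look quite different, since the synthetic target here is linear by construction.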

2. Understand how the listing title and description affect the listing price and popularity
- CountVectorizer / TfidfVectorizer + regression models

3. Understand how the review texts are related to the review scores
- CountVectorizer / TfidfVectorizer + regression models
- Sentiment analysis of reviews + regression

### Model selection
1. Performance metrics: MSE, R²
2. Grid search
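
As a rough sketch of the two steps above, a grid search can pick hyperparameters by cross-validated MSE, and the selected model can then be scored on a held-out set with both MSE and R². The data is synthetic and the parameter grid is a small assumed example, not the grid used in this project.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for listing features and prices (assumption).
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))
y = 150 + 40 * X[:, 0] + 25 * X[:, 1] ** 2 + rng.normal(scale=5, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Small illustrative grid; scikit-learn maximizes scores, hence negative MSE.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=3,
)
search.fit(X_train, y_train)

# Evaluate the refit best estimator on the held-out test set.
y_pred = search.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(search.best_params_, f"MSE={mse:.1f}", f"R2={r2:.3f}")
```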

# Read in data

In [2]:
import numpy as np
import pandas as pd


In [4]:
listing = pd.read_csv('../Data/listings.csv')
listing.head()

Unnamed: 0,id,listing_url,scrape_id,last_scraped,name,description,neighborhood_overview,picture_url,host_id,host_url,...,review_scores_communication,review_scores_location,review_scores_value,license,instant_bookable,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
0,958,https://www.airbnb.com/rooms/958,20211102175524,2021-11-02,"Bright, Modern Garden Unit - 1BR/1BTH",Please check local laws re Covid before you re...,Quiet cul de sac in friendly neighborhood<br /...,https://a0.muscache.com/pictures/b7c2a199-4c17...,1169,https://www.airbnb.com/users/show/1169,...,4.9,4.98,4.78,City Registration Pending,f,1,1,0,0,3.57
1,5858,https://www.airbnb.com/rooms/5858,20211102175524,2021-11-02,Creative Sanctuary,<b>The space</b><br />We live in a large Victo...,I love how our neighborhood feels quiet but is...,https://a0.muscache.com/pictures/17714/3a7aea1...,8904,https://www.airbnb.com/users/show/8904,...,4.85,4.77,4.68,,f,1,1,0,0,0.76
2,7918,https://www.airbnb.com/rooms/7918,20211102175524,2021-11-02,A Friendly Room - UCSF/USF - San Francisco,Nice and good public transportation. 7 minute...,"Shopping old town, restaurants, McDonald, Whol...",https://a0.muscache.com/pictures/26356/8030652...,21994,https://www.airbnb.com/users/show/21994,...,4.6,4.73,4.0,,f,9,0,9,0,0.17
3,8142,https://www.airbnb.com/rooms/8142,20211102175524,2021-11-02,Friendly Room Apt. Style -UCSF/USF - San Franc...,Nice and good public transportation. 7 minute...,,https://a0.muscache.com/pictures/27832/3b1f9e5...,21994,https://www.airbnb.com/users/show/21994,...,4.75,4.63,4.63,,f,9,0,9,0,0.1
4,8339,https://www.airbnb.com/rooms/8339,20211102175524,2021-11-02,Historic Alamo Square Victorian,Pls email before booking. <br />Interior featu...,,https://a0.muscache.com/pictures/213fbf05-3545...,24215,https://www.airbnb.com/users/show/24215,...,5.0,4.94,4.75,STR-0000264,f,2,2,0,0,0.19


In [5]:
listing.shape

(6508, 74)

# Data Cleaning and Exploration

### Check for missing values

In [13]:
listing.isnull().sum().sort_values(ascending=False).head(42)

neighbourhood_group_cleansed    6508
bathrooms                       6508
calendar_updated                6508
license                         2718
host_about                      1922
neighborhood_overview           1793
neighbourhood                   1793
host_response_rate              1424
host_response_time              1424
review_scores_value             1412
review_scores_location          1412
review_scores_checkin           1412
review_scores_communication     1411
review_scores_cleanliness       1411
review_scores_accuracy          1411
reviews_per_month               1378
first_review                    1378
review_scores_rating            1378
last_review                     1378
host_acceptance_rate            1314
bedrooms                         937
host_neighbourhood               395
beds                             296
description                       69
host_location                     19
host_picture_url                  13
host_since                        13
h

In [None]:
# drop columns that are entirely missing (6508 of 6508 rows): neighbourhood_group_cleansed, bathrooms, calendar_updated
# license is missing in 2718 of 6508 rows, so it is not dropped here

In [None]:
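# pass columns= (drop defaults to row labels) and assign the result, since drop is not in place
listing = listing.drop(columns=['neighbourhood_group_cleansed', 'bathrooms', 'calendar_updated'])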
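
Instead of naming the columns by hand, the drop could also be driven by a missing-value threshold. The sketch below uses a small hypothetical frame in place of `listing`, and the 0.9 cutoff is an assumed value that would need to be chosen for the real data.

```python
import numpy as np
import pandas as pd

# Small hypothetical frame standing in for `listing` (values invented).
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "bathrooms": [np.nan, np.nan, np.nan, np.nan],
    "license": ["STR-1", np.nan, np.nan, "STR-2"],
    "beds": [1.0, 2.0, np.nan, 1.0],
})

# Share of missing values per column, between 0 and 1.
missing_share = df.isnull().mean()

# Assumed cutoff: drop columns with more than 90% missing values.
threshold = 0.9
cols_to_drop = missing_share[missing_share > threshold].index.tolist()

cleaned = df.drop(columns=cols_to_drop)
print(cols_to_drop)  # ['bathrooms']
```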