## Introduction
Airbnb lists thousands of places for rent all over the world, with a wide range of options and prices for travelers. A regular Airbnb listing has a main picture, a title, and a price rate. Guests can also leave a review after their stay, so others can read it and get a better idea of what to expect from the place.

So there is a variety of data that can be collected. Now the question is: is there a hidden data treasure worth millions somewhere in this data? Probably not, but it's worth exploring for the sake of learning (... and for passing my college course).

Without further ado, this project has the following parts:
1. Data extraction: unofficial Airbnb API. [airbnb01](posts/airbnb01/)
2. Image recognition and Sentiment Analysis. [airbnb02](posts/airbnb02/)
3. Data exploration and Dashboard design. [airbnb03](posts/airbnb03/)

## Data Extraction
First, it is important to mention that Airbnb has not released an official API. However, it is still possible to get JSON responses from the URL, and this unofficial API is very well documented [here](https://stevesie.com/apps/airbnb-api).

Using the unofficial Airbnb API, we can fetch the data and store it in a .csv file. One of the fields holds the URL of each listing's main picture, which will later be used in our image recognition analysis. (Spoiler alert: using Azure Cognitive Services.)

In [1]:
# unofficial Airbnb API: https://stevesie.com/apps/airbnb-api

import airbnb
import pandas as pd
api = airbnb.Api(randomize=True)

# Pagination offset for the API calls
page = 0

# Initialize a dictionary and data frame to store listings details.
place_dtl = {}
columns = ['id','city','neighborhood','name','lat','lng','person_capacity','space_type','picture_url',
          'price_rate']
airbnb_df = pd.DataFrame(columns=columns)
# Exploring data to build the schema for the dataframe
# data = api.get_homes('Toronto, ON', items_per_grid=50, offset=500)
# len(data['explore_tabs'][0]['sections'][0]['listings'])
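
The commented-out exploration lines hint at how the response is nested. As a minimal sketch, assuming the response keeps the `explore_tabs`, `sections`, `listings` nesting used in this post, a small helper can pull the listings out without raising when a level is missing. `extract_listings` is a hypothetical helper name and `sample_response` is made-up data for illustration, not real API output.

```python
def extract_listings(data):
    """Return the list of listing entries from a search response.

    Assumes the nesting used in this post:
    data['explore_tabs'][0]['sections'][0]['listings'].
    Returns an empty list if any level is missing or empty.
    """
    try:
        return data['explore_tabs'][0]['sections'][0]['listings']
    except (KeyError, IndexError, TypeError):
        return []


# Made-up response with the same shape, for illustration only.
sample_response = {
    'explore_tabs': [
        {'sections': [
            {'listings': [
                {'listing': {'id': 1, 'name': 'Example condo'},
                 'pricing_quote': {'rate': {'amount': 100}}},
            ]},
        ]},
    ],
}

listings = extract_listings(sample_response)
print(len(listings))                # number of listings on this page
print(sorted(listings[0].keys()))   # top-level keys of one entry
```

Printing the keys of one entry is a quick way to decide which fields belong in the dataframe schema.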

### Listing details extraction
Nothing too fancy, just storing details in the dataframe and later exporting it to **airbnb_listing.csv**.

In [56]:
while page <= 300:
    data = api.get_homes('Toronto, ON', items_per_grid=50, offset=page)
    for h in data['explore_tabs'][0]['sections'][0]['listings']:
        try:
            place_dtl['id'] = h['listing']['id']
            place_dtl['city'] = h['listing']['city']
            place_dtl['neighborhood'] = h['listing']['neighborhood']
            place_dtl['name'] = h['listing']['name']
            place_dtl['lat'] = h['listing']['lat']
            place_dtl['lng'] = h['listing']['lng']
            place_dtl['person_capacity'] = h['listing']['person_capacity']
            place_dtl['space_type'] = h['listing']['space_type']
            place_dtl['price_rate'] = h['pricing_quote']['rate']['amount']
            place_dtl['picture_url'] = h['listing']['picture_url']
            airbnb_df.loc[len(airbnb_df)] = place_dtl
        except (KeyError, TypeError):
            # Skip listings that are missing one of the expected fields
            continue
    print(f"last id from page: {page} : {place_dtl['id']}")
    page += 50
# Export without the index column
airbnb_df.to_csv('airbnb_listing.csv', index=False, header=True)

last id from page: 0 : 40204854
last id from page: 50 : 25719877
last id from page: 100 : 17682704
last id from page: 150 : 29945949
last id from page: 200 : 12836500
last id from page: 250 : 34899472
last id from page: 300 : 43948056
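
Appending with `airbnb_df.loc[len(airbnb_df)]` works, but it grows the frame one row at a time and reuses a single dict for every listing. A common alternative, sketched here under the assumption that each entry has the `listing` and `pricing_quote` keys used above, is to build a fresh dict per listing and create the DataFrame once at the end. `flatten_listing` is a hypothetical helper and `sample_listings` is made-up data.

```python
import pandas as pd

LISTING_FIELDS = ['id', 'city', 'neighborhood', 'name', 'lat', 'lng',
                  'person_capacity', 'space_type', 'picture_url']


def flatten_listing(h):
    """Flatten one search result into a row dict, or return None if a field is missing."""
    try:
        row = {field: h['listing'][field] for field in LISTING_FIELDS}
        row['price_rate'] = h['pricing_quote']['rate']['amount']
        return row
    except (KeyError, TypeError):
        return None


# Made-up entries: one complete, one incomplete.
sample_listings = [
    {'listing': {'id': 1, 'city': 'Toronto', 'neighborhood': 'Downtown Toronto',
                 'name': 'Example condo', 'lat': 43.64, 'lng': -79.38,
                 'person_capacity': 4, 'space_type': 'Entire condominium',
                 'picture_url': 'https://example.com/1.jpg'},
     'pricing_quote': {'rate': {'amount': 120}}},
    {'listing': {'id': 2}},  # incomplete entry, will be skipped
]

# Keep only the entries that flattened cleanly, then build the frame once.
rows = [row for row in map(flatten_listing, sample_listings) if row is not None]
listings_df = pd.DataFrame(rows, columns=LISTING_FIELDS + ['price_rate'])
print(listings_df.shape)
```

Because every row is its own dict, a listing that fails halfway through cannot leave stale values behind for the next one.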


### Reviews details extraction
A similar process but now for the reviews.

In [None]:
comment_dtl = {}

# The listing id is stored as 'id' so it matches airbnb_listing.csv
r_columns = ['id','author','rating','comments']
reviews_df = pd.DataFrame(columns=r_columns)

# Read the listings exported in the previous step
listing = pd.read_csv('airbnb_listing.csv')

In [None]:
for i in listing['id']:
    reviews = api.get_reviews(i, limit=10)
    for r in reviews['reviews']:
        comment_dtl['id'] = i
        comment_dtl['author'] = r['author']['smart_name']
        comment_dtl['rating'] = r['rating']
        comment_dtl['comments'] = r['comments']
        reviews_df.loc[len(reviews_df)] = comment_dtl
# Export without the index column
reviews_df.to_csv('airbnb_reviews.csv', index=False, header=True)
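
The loop above makes one request per listing, so a single failed call would stop the run before anything is saved. A minimal sketch of a more forgiving version follows. It takes the fetch function as a parameter so it can be tried without network access. `collect_reviews` and `fake_get_reviews` are hypothetical names, and the response shape (`reviews`, `author.smart_name`, `rating`, `comments`) is assumed to match the one used in this post.

```python
def collect_reviews(listing_ids, get_reviews, limit=10):
    """Return (rows, failed_ids) after fetching reviews for each listing id.

    get_reviews is any callable taking (listing_id, limit=...) and returning
    a dict with a 'reviews' list, such as api.get_reviews as called in this post.
    """
    rows, failed_ids = [], []
    for listing_id in listing_ids:
        try:
            response = get_reviews(listing_id, limit=limit)
            listing_rows = [
                {
                    # 'id' matches the column shown in the reviews output
                    'id': listing_id,
                    'author': r['author']['smart_name'],
                    'rating': r['rating'],
                    'comments': r['comments'],
                }
                for r in response['reviews']
            ]
        except Exception:
            # Remember the listing so it can be retried later.
            failed_ids.append(listing_id)
            continue
        rows.extend(listing_rows)
    return rows, failed_ids


def fake_get_reviews(listing_id, limit=10):
    """Stand-in for the real API call; returns made-up data."""
    if listing_id == 2:
        raise ConnectionError('simulated network failure')
    return {'reviews': [{'author': {'smart_name': 'Sam'},
                         'rating': 5,
                         'comments': 'Great place'}][:limit]}


rows, failed = collect_reviews([1, 2, 3], fake_get_reviews)
print(len(rows), failed)
```

The returned `rows` list can be passed to `pd.DataFrame(rows)` and exported as before, while `failed` lists the ids worth retrying.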

## The datasets
Finally, let's check what we got.

In [4]:
listing = pd.read_csv('airbnb_listing.csv')
reviews = pd.read_csv('airbnb_reviews.csv')

### Listing dataset: airbnb_listing.csv
These are the first 3 rows (and first 9 columns) of the listing dataset:

In [23]:
listing.iloc[:, 0:9].head(3)

| | id | city | neighborhood | name | lat | lng | person_capacity | space_type | picture_url |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 28103946 | Toronto | Downtown Toronto | Chic, Modern condo downtown by Scotiabank Arena! | 43.64327 | -79.38115 | 4 | Entire condominium | https://a0.muscache.com/im/pictures/36fa807f-2... |
| 1 | 11533218 | Toronto | Downtown Toronto | Luxurious Condo near CN Tower with FREE PARKING | 43.64252 | -79.39617 | 4 | Entire condominium | https://a0.muscache.com/im/pictures/a22f9757-3... |
| 2 | 12290314 | Toronto | Downtown Toronto | Take in Panoramic City Views from a Sophistica... | 43.65391 | -79.38273 | 2 | Entire condominium | https://a0.muscache.com/4ea/air/v2/pictures/f3... |


### Reviews dataset: airbnb_reviews.csv 
We have collected a maximum of 5 reviews per listing.

In [21]:
reviews.loc[reviews['id'] == 28103946]

| | id | author | rating | comments |
|---|---|---|---|---|
| 1259 | 28103946 | Tyler | 5 | This condo is an incredible location in downto... |
| 1260 | 28103946 | Emmanuel | 5 | Outstanding place and perfect location for one... |
| 1261 | 28103946 | Alicia | 5 | Great place at a great location! |
| 1262 | 28103946 | Miah | 5 | This place was in the heart of Toronto! So clo... |
| 1263 | 28103946 | Lalji | 5 | a greal location, easy check in and clean place |
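
The per-listing review cap can be checked directly by counting rows per listing. The sketch below uses a small made-up frame with the same columns as the output above, so the ids and ratings are illustrative only. On the real data the same `groupby` calls would run on `reviews`.

```python
import pandas as pd

# Made-up sample with the same columns as airbnb_reviews.csv.
sample_reviews = pd.DataFrame({
    'id': [101, 101, 101, 202, 202],
    'author': ['A', 'B', 'C', 'D', 'E'],
    'rating': [5, 4, 5, 3, 5],
    'comments': ['Nice', 'Good', 'Great', 'Okay', 'Lovely'],
})

# Number of reviews collected for each listing.
reviews_per_listing = sample_reviews.groupby('id').size()
print(reviews_per_listing.max())    # largest number of reviews for any listing

# Average rating per listing, a quick sanity check on the ratings column.
print(sample_reviews.groupby('id')['rating'].mean())
```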
