<a href="https://colab.research.google.com/github/analyticsariel/projects/blob/master/Analyze_Airbnb_Listings_Apify.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Analyze Airbnb Listings (Apify)

## Overview
| Detail Tag            | Information                                                                                        |
|-----------------------|----------------------------------------------------------------------------------------------------|
| Originally Created By | Ariel Herrera (arielherrera@analyticsariel.com) |
| External References   | API |
| Input Datasets        | Source name |
| Output Datasets       | Source name |
| Input Data Source     | Pandas DataFrame |
| Output Data Source    | Pandas DataFrame |

## History
| Date         | Developed By  | Reason                                                |
|--------------|---------------|-------------------------------------------------------|
| 1st Jun 2021 | Ariel Herrera | Create notebook. |

## Getting Started
1. Copy this notebook: File -> Save a copy in Drive
2. Directions

## Useful Resources
- [Google Colab Cheat Sheet](https://towardsdatascience.com/cheat-sheet-for-google-colab-63853778c093)

## <font color="blue">Install Packages</font>

## <font color="blue">Imports</font>

In [74]:
from google.colab import drive, files # specific to Google Colab
import plotly.express as px # visualization
import pandas as pd # data manipulation
import io

## <font color="blue">Functions</font>

## <font color="blue">Locals & Constants</font>

In [75]:
############
# OPTIONAL #
############

# mount drive
drive.mount('/content/drive', force_remount=False)

# data location
file_dir = '/content/drive/My Drive/Colab Data/input/' # optional

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [76]:
# read in api key file
df_api_keys = pd.read_csv(file_dir + 'api_keys.csv')

# get keys
mapbox_api_key = df_api_keys.loc[df_api_keys['API'] == 'mapbox', 'KEY'].iloc[0] # replace this with your own key
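
The lookup above expects `api_keys.csv` to contain an `API` column and a `KEY` column. As a minimal sketch, assuming one row per service, the same lookup can be wrapped in a helper that fails with a clear message when the service is missing. The helper name `get_api_key` and the sample token are hypothetical and not part of the original notebook.

```python
import pandas as pd

# Hypothetical stand-in for the contents of api_keys.csv (one row per service)
df_api_keys = pd.DataFrame({
    'API': ['mapbox'],
    'KEY': ['pk.example-token'],  # placeholder value, not a real token
})

def get_api_key(df_keys: pd.DataFrame, api_name: str) -> str:
    """Return the first KEY whose API column equals api_name."""
    matches = df_keys.loc[df_keys['API'] == api_name, 'KEY']
    if matches.empty:
        # Raise early instead of the IndexError that .iloc[0] would produce
        raise KeyError(f"No key found for API '{api_name}'")
    return matches.iloc[0]

mapbox_api_key = get_api_key(df_api_keys, 'mapbox')
```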

## <font color="blue">Data</font>

In [77]:
# upload
uploaded = files.upload()

Saving dataset_airbnb-scraper_2022-06-08_21-00-58-787.csv to dataset_airbnb-scraper_2022-06-08_21-00-58-787 (1).csv


In [96]:
# get file name
file_name = list(uploaded.keys())[0]

# read file
df_upload = pd.read_csv(io.BytesIO(uploaded[file_name]))
print('Num of rows:', len(df_upload))
print('Num of columns:', len(df_upload.columns))

Num of rows: 500
Num of columns: 17


In [94]:
# view elements in first row
df_upload.iloc[0]

address                                                        Bradenton, Florida, United States
location/lat                                                                               27.49
location/lng                                                                             -82.586
name                                                “Happy”—cozy TWIN bed: bedroom 3of5 in home.
numberOfGuests                                                                                 1
pricing/rate/amount                                                                         7000
pricing/rate/amount_formatted                                                             $7,000
pricing/rate/currency                                                                        USD
pricing/rate/is_micros_accuracy                                                            False
pricing/rate_type                                                                        nightly
pricing/rate_with_service_fee/

In [80]:
# list all columns
df_upload.columns

Index(['address', 'location/lat', 'location/lng', 'name', 'numberOfGuests',
       'pricing/rate/amount', 'pricing/rate/amount_formatted',
       'pricing/rate/currency', 'pricing/rate/is_micros_accuracy',
       'pricing/rate_type', 'pricing/rate_with_service_fee/amount',
       'pricing/rate_with_service_fee/amount_formatted',
       'pricing/rate_with_service_fee/currency',
       'pricing/rate_with_service_fee/is_micros_accuracy', 'roomType', 'stars',
       'url'],
      dtype='object')

## <font color="blue">Transformations</font>

In [81]:
# cities
df_grp_cities = df_upload.groupby(by=['address'])['name'].count().reset_index()\
  .rename(columns={'name': 'count'})\
  .sort_values(by=['count'], ascending=False)
df_grp_cities['percent'] = (df_grp_cities['count'] / df_grp_cities['count'].sum()) * 100
df_grp_cities

|   | address | count | percent |
|---|---------|-------|---------|
| 5 | Siesta Key, Florida, United States | 302 | 60.4 |
| 4 | Sarasota, Florida, United States | 184 | 36.8 |
| 1 | Lido Key, Florida, United States | 8 | 1.6 |
| 0 | Bradenton, Florida, United States | 4 | 0.8 |
| 2 | Longboat Key, Florida, United States | 1 | 0.2 |
| 3 | Sarasota , Florida, United States | 1 | 0.2 |
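
The output above lists `Sarasota , Florida, United States` (with a stray space before the comma) as a separate group from `Sarasota, Florida, United States`. A minimal sketch of one way to merge such variants before grouping, assuming the only difference is whitespace around the comma-separated parts. The sample frame, the `normalize_address` helper, and the `address_clean` column are hypothetical.

```python
import pandas as pd

# Hypothetical sample mirroring the address values shown above
df_demo = pd.DataFrame({
    'address': [
        'Sarasota, Florida, United States',
        'Sarasota , Florida, United States',  # stray space before the comma
        'Siesta Key, Florida, United States',
    ],
    'name': ['A', 'B', 'C'],
})

def normalize_address(address: str) -> str:
    """Trim whitespace around each comma-separated part of an address."""
    return ', '.join(part.strip() for part in address.split(','))

# Keep the raw column and add a cleaned one for grouping
df_demo['address_clean'] = df_demo['address'].map(normalize_address)
city_counts = df_demo['address_clean'].value_counts()
print(city_counts)
```

Keeping the original `address` column alongside the cleaned one makes it easy to check which raw values were merged.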


In [84]:
# filter on Siesta Key, Florida
df = df_upload.loc[df_upload['address'] == 'Siesta Key, Florida, United States']
print('Num of rows:', len(df))
df.head()

Num of rows: 302


|   | address | location/lat | location/lng | name | numberOfGuests | pricing/rate/amount | pricing/rate/amount_formatted | pricing/rate/currency | pricing/rate/is_micros_accuracy | pricing/rate_type | pricing/rate_with_service_fee/amount | pricing/rate_with_service_fee/amount_formatted | pricing/rate_with_service_fee/currency | pricing/rate_with_service_fee/is_micros_accuracy | roomType | stars | url |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | Siesta Key, Florida, United States | 27.284 | -82.565 | Modern Marvel - Stunning Gulf Front Home! Amaz... | 14 | 2747 | \$2,747 | USD | False | nightly | 2747 | \$2,747 | USD | False | Entire home | 4.5 | https://www.airbnb.com/rooms/38351053 |
| 7 | Siesta Key, Florida, United States | 27.27446 | -82.56757 | The Oasis-Flexible Cancellation Policy | 16 | 3696 | \$3,696 | USD | False | nightly | 3696 | \$3,696 | USD | False | Entire home | NaN | https://www.airbnb.com/rooms/53935001 |
| 8 | Siesta Key, Florida, United States | 27.272 | -82.563 | Ocean Overlook - Nearly Beachfront Home w/ Gor... | 16 | 4056 | \$4,056 | USD | False | nightly | 4056 | \$4,056 | USD | False | Entire home | 5.0 | https://www.airbnb.com/rooms/50259570 |
| 9 | Siesta Key, Florida, United States | 27.271 | -82.559 | Sea Spray - Gulf Access with Dock. Resort Styl... | 16 | 3371 | \$3,371 | USD | False | nightly | 3371 | \$3,371 | USD | False | Entire home | 5.0 | https://www.airbnb.com/rooms/20071265 |
| 10 | Siesta Key, Florida, United States | 27.274 | -82.567 | Sugar Sand Cottage - Gorgeous 5 Bedroom Home, ... | 16 | 3371 | \$3,371 | USD | False | nightly | 3371 | \$3,371 | USD | False | Entire home | 5.0 | https://www.airbnb.com/rooms/43122199 |


In [82]:
# room type
df_grp_rm_type = df.groupby(by=['roomType'])['name'].count().reset_index()\
  .rename(columns={'name': 'count'})\
  .sort_values(by=['count'], ascending=False)
df_grp_rm_type['percent'] = round((df_grp_rm_type['count'] / df_grp_rm_type['count'].sum()) * 100, 2)
df_grp_rm_type

|    | roomType | count | percent |
|----|----------|-------|---------|
| 4  | Entire home | 103 | 34.11 |
| 1  | Entire condo | 82 | 27.15 |
| 5  | Entire rental unit | 62 | 20.53 |
| 7  | Entire townhouse | 24 | 7.95 |
| 2  | Entire cottage | 9 | 2.98 |
| 12 | Private room in resort | 6 | 1.99 |
| 8  | Entire villa | 5 | 1.66 |
| 3  | Entire guest suite | 4 | 1.32 |
| 0  | Entire bungalow | 2 | 0.66 |
| 10 | Private room in home | 2 | 0.66 |


In [83]:
# listing count & average nightly price by star rating
df.groupby('stars')\
  .agg({'name':'count','pricing/rate/amount':'mean'}).reset_index()\
  .rename(columns={'name': 'count'})\
  .sort_values(by=['count'], ascending=False)

|   | stars | count | pricing/rate/amount |
|---|-------|-------|---------------------|
| 3 | 5.0 | 175 | 746.114286 |
| 2 | 4.5 | 44 | 562.681818 |
| 1 | 4.0 | 3 | 626.666667 |
| 0 | 3.5 | 2 | 357.0 |
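
The counts above sum to 224, while the filtered frame has 302 rows. This is consistent with `groupby` dropping rows whose `stars` value is missing, which is its default behavior. A minimal sketch of how unrated listings could be kept as their own group, assuming pandas 1.1 or later for the `dropna` argument. The sample data and the `avg_price` column name are hypothetical.

```python
import pandas as pd

# Hypothetical sample: one listing has no star rating
df_demo = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D'],
    'stars': [5.0, 5.0, None, 4.5],
    'pricing/rate/amount': [300, 500, 250, 400],
})

summary = (
    df_demo.groupby('stars', dropna=False)  # keep the NaN group
    .agg(
        count=('name', 'count'),
        avg_price=('pricing/rate/amount', 'mean'),
    )
    .reset_index()
    .sort_values('count', ascending=False)
)
print(summary)
```

With `dropna=False`, the row whose `stars` value is `NaN` reports how many listings have no rating and their average price.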


## <font color="blue">Visualization</font>

In [85]:
# view number of guests
fig = px.histogram(df, x="numberOfGuests", nbins=10)
fig.show()

In [86]:
# nightly price distribution
fig = px.histogram(df, x="pricing/rate/amount")
fig.show()

In [87]:
# nightly price vs. number of guests
fig = px.scatter(df, x="pricing/rate/amount", y="numberOfGuests")
fig.show()

### Map all Airbnb Listings

In [88]:
# take a single example to test logic of getting listing id
test_url = df['url'].iloc[0]
print('Test URL:', test_url)
print('Listing ID:', test_url.split('/')[-1])

Test URL: https://www.airbnb.com/rooms/38351053
Listing ID: 38351053
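
Splitting on `/` works for URLs shaped like the one above. As a hedged sketch, a version based on `urllib.parse` from the standard library would also tolerate a trailing slash or a query string, should the scraped URLs ever include them. The function name `listing_id_from_url` and the second example URL are hypothetical.

```python
from urllib.parse import urlparse

def listing_id_from_url(url: str) -> str:
    """Return the last path segment of a listing URL.

    The query string and any trailing slash are ignored.
    """
    path = urlparse(url).path.rstrip('/')
    return path.split('/')[-1]

# URL taken from the notebook output
print(listing_id_from_url('https://www.airbnb.com/rooms/38351053'))
# Hypothetical variant with a trailing slash and a query string
print(listing_id_from_url('https://www.airbnb.com/rooms/38351053/?adults=2'))
```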


In [89]:
# create plot dataframe
df_plot = df.copy()
# feature #1 - listing id (last path segment of the listing URL)
df_plot['listing_id'] = df_plot['url'].str.split('/').str[-1]
# feature #2 - header (listing id + listing name)
df_plot['header'] = df_plot['listing_id'] + ' - ' + df_plot['name']
df_plot['header'].head()

6     38351053 - Modern Marvel - Stunning Gulf Front...
7     53935001 - The Oasis-Flexible Cancellation Policy
8     50259570 - Ocean Overlook - Nearly Beachfront ...
9     20071265 - Sea Spray - Gulf Access with Dock. ...
10    43122199 - Sugar Sand Cottage - Gorgeous 5 Bed...
Name: header, dtype: object

In [90]:
# set mapbox key
px.set_mapbox_access_token(mapbox_api_key)
# create scatter plot
fig = px.scatter_mapbox(
    df_plot, 
    lat="location/lat", 
    lon="location/lng",     
    color="pricing/rate/amount", 
    size="numberOfGuests", 
    color_continuous_scale=px.colors.cyclical.Edge,
    hover_name="header",
    zoom=13
)
# set the figure height so the map shows all of Siesta Key (town specific)
fig.update_layout(
    height=800
)
fig.show()

In [97]:
# view single listing
df_single_list = df_plot.loc[df_plot['listing_id'] == '38351053']
# get URL to view listing photos and detail
print('Listing URL:', df_single_list['url'].iloc[0])
# view df
df_single_list

Listing URL: https://www.airbnb.com/rooms/38351053


|   | address | location/lat | location/lng | name | numberOfGuests | pricing/rate/amount | pricing/rate/amount_formatted | pricing/rate/currency | pricing/rate/is_micros_accuracy | pricing/rate_type | pricing/rate_with_service_fee/amount | pricing/rate_with_service_fee/amount_formatted | pricing/rate_with_service_fee/currency | pricing/rate_with_service_fee/is_micros_accuracy | roomType | stars | url | listing_id | header |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | Siesta Key, Florida, United States | 27.284 | -82.565 | Modern Marvel - Stunning Gulf Front Home! Amaz... | 14 | 2747 | \$2,747 | USD | False | nightly | 2747 | \$2,747 | USD | False | Entire home | 4.5 | https://www.airbnb.com/rooms/38351053 | 38351053 | 38351053 - Modern Marvel - Stunning Gulf Front... |


## <font color="blue">Output</font>

In [93]:
# download file (uncomment the two lines below to run)
# df.to_csv('output.csv', index=False)
# files.download('output.csv')

# End Notebook