**Preprocessing Airbnb Calendar for Time Series Analysis**

# Introduction

## Read in libraries, data, and set notebook preferences

**Read in libraries**

In [21]:
#Read in libraries
import pandas as pd
import numpy as np

**Read in Data**

In [22]:
#Set path to local machine for data
path = r'C:\Users\kishe\Documents\Data Science\Projects\Python Projects\In Progress\Air BnB - SF\Data\02_Intermediate/'

#Read in Airbnb Calendar data
calendar = pd.read_csv(path + '2020_0407_Calendar_Cleaned.csv', sep = ',',dtype = {'listing_id':'category'},
                       parse_dates=['date'], low_memory=True,index_col=0)

**Set notebook preferences**

In [23]:
#Set float format
pd.options.display.float_format = '{:.02f}'.format

#Suppress future warnings
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

# Preview Data

## Airbnb Calendar Data

In [24]:
#View shape and preview calendar data
print(calendar.shape)
calendar.head()

(16010035, 3)


date,available,listing_id,price
2019-04-03,0,187730,80.0
2019-04-04,0,187730,80.0
2019-04-05,1,187730,82.0
2019-04-06,1,187730,82.0
2019-04-07,1,187730,81.0


# Data preparation and feature engineering

## Listings data for booked Airbnbs

In [25]:
#Count listings per date, split by availability (available and unavailable)
listings_df = calendar.groupby(['date','available'])['listing_id'].count().reset_index()

#Set index as date and rename columns
listings_df.set_index('date', inplace = True)
listings_df.columns=['available', 'count'] 

#Filter for booked listings and drop the available column
#(drop without inplace=True returns a new frame and avoids SettingWithCopyWarning)
booked_listings = listings_df.loc[listings_df['available'] == 0].drop(columns = 'available')

#Check
display(booked_listings.head())


date,count
2018-10-03,5897
2018-10-04,5806
2018-10-05,5847
2018-10-06,5822
2018-10-07,5493
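
The booked-listing count above can be sketched on a small toy frame to make the logic easy to check by hand. This is a minimal illustration, not the notebook's actual data: `toy_calendar` and its values are hypothetical, and it assumes, as in the calendar data, that `available == 0` marks a booked night.

```python
import pandas as pd

# Toy stand-in for the calendar frame (hypothetical values, not the real data)
toy_calendar = pd.DataFrame(
    {
        "available": [0, 1, 0, 0, 1],
        "listing_id": ["a", "b", "c", "a", "b"],
        "price": [80.0, 90.0, 100.0, 80.0, 90.0],
    },
    index=pd.to_datetime(
        ["2019-04-03", "2019-04-03", "2019-04-03", "2019-04-04", "2019-04-04"]
    ).rename("date"),
)

# Keep booked rows only (available == 0), then count rows per date
toy_booked = (
    toy_calendar.loc[toy_calendar["available"] == 0]
    .groupby("date")["listing_id"]
    .count()
    .to_frame("count")
)

print(toy_booked)
```

Filtering before grouping gives the same per-date counts as grouping on `['date', 'available']` and filtering afterwards, and it never modifies a slice in place.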


## Airbnb net rental income for hosts

In [26]:
#Capture booked listings dated 2019-01-09 or later
booked = calendar.loc[calendar['available'] == 0].sort_values(by = 'listing_id')
booked = booked.loc[(booked.index >= '2019-01-09')]

#Sum price across all booked listings for each day
net_income = booked.groupby('date')['price'].sum().reset_index()

#Set index and rename columns
net_income.set_index('date', inplace= True)
net_income.columns = ['net_income']

#Check
display(net_income.head())

date,net_income
2019-01-09,1322563.0
2019-01-10,1111389.0
2019-01-11,1062798.0
2019-01-12,1060895.0
2019-01-13,1008958.0
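
Since these frames feed a time series analysis, it may be worth confirming that the daily index has no gaps before writing them out. The sketch below uses a hypothetical `toy_income` frame with one day deliberately missing. It is one possible check, not a step the notebook performs.

```python
import pandas as pd

# Toy daily series with 2019-01-11 deliberately missing (hypothetical values)
toy_income = pd.DataFrame(
    {"net_income": [100.0, 120.0, 90.0]},
    index=pd.to_datetime(["2019-01-09", "2019-01-10", "2019-01-12"]).rename("date"),
)

# Build the complete daily range and list any dates absent from the index
full_range = pd.date_range(toy_income.index.min(), toy_income.index.max(), freq="D")
missing_dates = full_range.difference(toy_income.index)
print(missing_dates)

# Conform to a daily frequency; missing days appear as NaN rows
toy_daily = toy_income.asfreq("D")
print(toy_daily)
```

How to treat any NaN rows that appear (leave, interpolate, or drop) depends on the downstream model and is left open here.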


# Write out dataframes

In [27]:
#Set path to write listings
path = r'C:\Users\kishe\Documents\Data Science\Projects\Python Projects\In Progress\Air BnB - SF\Data\03_Processed/'

#Write booked_listings to path
booked_listings.to_csv(path +'2020_0417_Booked_Listings.csv', sep=',')

#Write net_income to path
net_income.to_csv(path +'2020_0417_Daily_Net_Rental_Income.csv', sep=',')
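
When these files are read back later, the date index is plain text unless it is parsed explicitly. The round trip below is a small sketch using an in-memory buffer in place of a file on disk. The `toy` frame and its values are hypothetical.

```python
import io

import pandas as pd

# Toy frame shaped like net_income (hypothetical values)
toy = pd.DataFrame(
    {"net_income": [1000.0, 1500.0]},
    index=pd.to_datetime(["2019-01-09", "2019-01-10"]).rename("date"),
)

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
toy.to_csv(buffer, sep=",")
buffer.seek(0)

# Read back, restoring 'date' as a parsed DatetimeIndex
restored = pd.read_csv(buffer, index_col="date", parse_dates=["date"])
print(restored.index)
```

Passing `index_col` and `parse_dates` on read mirrors how the calendar file is loaded at the top of this notebook.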