Compare the distribution of Airbnbs with that of traditional accommodation types such as hotels.

Data source: https://data.cityofnewyork.us/City-Government/Hotels-Properties-Citywide/tjus-cn27

In [1]:
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import sys
%matplotlib inline

In [12]:
df_hotel = pd.read_csv('../data/Hotels_Properties_Citywide.csv')

In [13]:
df_hotel.columns

Index(['PARID', 'BOROCODE', 'BLOCK', 'LOT', 'TAXYEAR', 'STREET NUMBER',
       'STREET NAME', 'Postcode', 'BLDG_CLASS', 'TAXCLASS', 'OWNER_NAME',
       'Borough', 'Latitude', 'Longitude', 'Community Board',
       'Council District', 'Census Tract', 'BIN', 'BBL', 'NTA'],
      dtype='object')

Column descriptions:  

'PARID': No description given; doesn't seem important.  

'BOROCODE': Numeric borough code; redundant with 'Borough', so not needed.  

'BLOCK': Borough, Block, and Lot (BBL) is the parcel number system used to identify each unit of real estate in New York City for numerous city purposes. It consists of three numbers, separated by slashes; the borough, which is 1 digit; the block number, which is up to 5 digits; and the lot number, which is up to 4 digits.  

'LOT': Lot number within the block; the third part of the BBL (see 'BLOCK').  

'TAXYEAR': Tax year, an annual accounting period for keeping records and reporting income and expenses. We're not investigating tax, so we don't need it.

'STREET NUMBER', 'STREET NAME', 'Postcode': Address fields.

'BLDG_CLASS': Building class (Use and Occupancy classification: https://igpny.com/wp-content/uploads/2019/05/NYC-DOB-Building-Code-Chapter-3-Use-and-Occupancy-Classification.pdf). I don't think we need it.

'TAXCLASS': Tax class; we're not interested in tax here.

'OWNER_NAME': Do we want to check whether the same owners appear in both the Airbnb and the hotel data?

'Borough': We need it

'Latitude': We need it

'Longitude': We need it

'Community Board': Community Boards are local representative bodies. There are 59 throughout the city. Each Board consists of up to 50 unsalaried members appointed by the Borough President, with half nominated by the City Council Members who represent the community district.
Do we want to compare Airbnbs and hotels by community district?

'Council District': New York City is divided into 51 City Council districts, each of which elects one Council Member.
Do we want to compare Airbnbs and hotels by council district?

'Census Tract': Census tract identifier; NTAs are built by aggregating census tracts (see 'NTA').

'BIN': Building Identification Number. Don't think we need this.

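'BBL': Borough, Block, and Lot combined into a single parcel number (see 'BLOCK'). In this file the values look rounded: the first row has BBL 1000080000.0 while its PARID is 1000080039.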

'NTA': Neighborhood Tabulation Areas, created by the NYC Department of City Planning by aggregating census tracts into 195 neighborhood-like areas.
Potentially useful, since NTAs approximate neighborhoods.


Questions: Do we need BBL, street number/name, and Postcode?  
For purely geographic use, aren't Borough, Latitude, and Longitude enough?
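A minimal sketch of trimming the table to the geographic columns flagged above (Borough, Latitude, Longitude, plus NTA as a possible neighborhood key). The two sample rows are copied from the `head()` preview of `df_hotel`; in the notebook the same selection would be applied to the full frame. The `GEO_COLUMNS` list is a hypothetical choice, not a settled decision.

```python
import pandas as pd

# Two rows copied from the df_hotel.head() preview (most columns omitted)
df_hotel = pd.DataFrame({
    'PARID': [1000080039, 1000080051],
    'TAXYEAR': [2021, 2021],
    'OWNER_NAME': ['32 PEARL, LLC', 'AI IV LLC'],
    'Borough': ['MANHATTAN', 'MANHATTAN'],
    'Latitude': [40.703235, 40.702744],
    'Longitude': [-74.012421, -74.012201],
    'NTA': ['Battery Park City-Lower Manhattan'] * 2,
})

# Hypothetical selection: location fields plus NTA as a neighborhood key
GEO_COLUMNS = ['Borough', 'Latitude', 'Longitude', 'NTA']
df_geo = df_hotel[GEO_COLUMNS].copy()
print(df_geo.columns.tolist())
```

Listing the columns to keep, rather than the ones to drop, keeps the cell valid even if the source file gains extra columns later.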

In [14]:
df_hotel.head()

,PARID,BOROCODE,BLOCK,LOT,TAXYEAR,STREET NUMBER,STREET NAME,Postcode,BLDG_CLASS,TAXCLASS,OWNER_NAME,Borough,Latitude,Longitude,Community Board,Council District,Census Tract,BIN,BBL,NTA
0,1000080039,1,8,39,2021,32,PEARL STREET,10004,H3,4,"32 PEARL, LLC",MANHATTAN,40.703235,-74.012421,101.0,1.0,9.0,1078968.0,1000080000.0,Battery Park City-Lower Manhattan
1,1000080051,1,8,51,2021,6,WATER STREET,10004,H2,4,AI IV LLC,MANHATTAN,40.702744,-74.012201,101.0,1.0,9.0,1090472.0,1000080000.0,Battery Park City-Lower Manhattan
2,1000100033,1,10,33,2021,8,STONE STREET,10004,H2,4,"B.H. 8 STONE STREET AG, LLC",MANHATTAN,40.704025,-74.012638,101.0,1.0,9.0,1087618.0,1000100000.0,Battery Park City-Lower Manhattan
3,1000110029,1,11,29,2021,11,STONE STREET,10004,H2,4,"PREMIER EMERALD, LLC",MANHATTAN,40.704039,-74.012317,101.0,1.0,9.0,1000041.0,1000110000.0,Battery Park City-Lower Manhattan
4,1000161301,1,16,1301,2021,102,NORTH END AVENUE,10282,RH,4,GOLDMAN SACHS,MANHATTAN,40.714812,-74.016153,101.0,1.0,31703.0,1085867.0,1000168000.0,Battery Park City-Lower Manhattan
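The PARID values in this preview look like 10-digit BBLs: 1000080039 splits into borough 1, block 00008, and lot 0039, matching the BOROCODE, BLOCK, and LOT columns of that row. Assuming the pattern holds for every row (worth verifying on the full table), a sketch like the following recovers the three parts with integer arithmetic. The function name `split_bbl` is hypothetical.

```python
def split_bbl(bbl: int) -> tuple[int, int, int]:
    """Split a 10-digit BBL into (borough, block, lot).

    Layout assumed: 1-digit borough, 5-digit block, 4-digit lot.
    """
    borough, rest = divmod(bbl, 10**9)  # leading digit
    block, lot = divmod(rest, 10**4)    # next 5 digits, last 4 digits
    return borough, block, lot

print(split_bbl(1000080039))  # (1, 8, 39)
print(split_bbl(1000161301))  # (1, 16, 1301)
```

On the full frame, `(df_hotel['PARID'] // 10**9 == df_hotel['BOROCODE']).all()` is a quick vectorised check of the borough digit.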


In [21]:
# Check the data size, column types, and missing values
df_hotel.info()  # info() prints its summary itself and returns None
print(df_hotel.describe())
print("Data's shape:", df_hotel.shape)
print('\nType of features\n', df_hotel.dtypes.value_counts())
# isna() and isnull() are aliases, so a single count is enough
isnull_series = df_hotel.isnull().sum()
print('\nNull columns and counts:\n', isnull_series[isnull_series > 0].sort_values(ascending=False))

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5519 entries, 0 to 5518
Data columns (total 20 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   PARID             5519 non-null   int64  
 1   BOROCODE          5519 non-null   int64  
 2   BLOCK             5519 non-null   int64  
 3   LOT               5519 non-null   int64  
 4   TAXYEAR           5519 non-null   int64  
 5   STREET NUMBER     5514 non-null   object 
 6   STREET NAME       5519 non-null   object 
 7   Postcode          5519 non-null   int64  
 8   BLDG_CLASS        5519 non-null   object 
 9   TAXCLASS          5519 non-null   int64  
 10  OWNER_NAME        5519 non-null   object 
 11  Borough           5514 non-null   object 
 12  Latitude          5502 non-null   float64
 13  Longitude         5502 non-null   float64
 14  Community Board   5502 non-null   float64
 15  Council District  5502 non-null   float64
 16  Census Tract      5502 non-null   float64
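The `info()` summary reports 5,502 non-null Latitude and Longitude values out of 5,519 rows, so at least 17 hotels cannot be placed on a map. A minimal sketch of dropping those rows before any spatial comparison, shown on a small made-up frame (the third row, with missing coordinates, is hypothetical):

```python
import numpy as np
import pandas as pd

# Made-up frame: the third row mimics a hotel with no coordinates
df_hotel = pd.DataFrame({
    'Borough': ['MANHATTAN', 'MANHATTAN', 'BROOKLYN'],
    'Latitude': [40.703235, 40.702744, np.nan],
    'Longitude': [-74.012421, -74.012201, np.nan],
})

# Drop any row missing either coordinate (dropna defaults to how='any')
df_geo = df_hotel.dropna(subset=['Latitude', 'Longitude'])
print(f'dropped {len(df_hotel) - len(df_geo)} of {len(df_hotel)} rows')
```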


In [18]:
df_hotel.value_counts(['Community Board']).sort_index()

Community Board
1.0                165
2.0                513
3.0                 96
4.0                116
5.0                786
                  ... 
413.0               12
414.0                7
501.0                6
502.0                7
503.0                4
Length: 77, dtype: int64
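NYC community board codes combine a borough digit with a two-digit district number: 101 is Manhattan Community District 1 and 503 is Staten Island Community District 3. The sorted counts above also begin with 1.0 through 5.0, which do not fit that pattern, and there are 77 distinct values against 59 community boards, so the codes may need cleaning before grouping. A sketch that splits each code and flags values outside the three-digit range; the sample values come from the output above, and the `valid` rule is an assumption.

```python
import pandas as pd

# Sample codes from the value_counts output above (floats, as in df_hotel)
cb = pd.Series([1.0, 5.0, 101.0, 413.0, 414.0, 503.0], name='Community Board')

# dropna() first: astype(int) fails on NaN, and the real column has missing values
codes = cb.dropna().astype(int)
decoded = pd.DataFrame({
    'code': codes,
    'borough_code': codes // 100,  # leading digit: 1 = Manhattan ... 5 = Staten Island
    'district': codes % 100,       # last two digits
})
# Assumed rule: well-formed codes fall between 101 and 599
decoded['valid'] = decoded['code'].between(101, 599)
print(decoded)
```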

In [20]:
df_hotel.value_counts(['NTA']).sort_index()

NTA                                       
Allerton-Pelham Gardens                         2
Annadale-Huguenot-Prince's Bay-Eltingville      2
Astoria                                         9
Auburndale                                      4
Baisley Park                                   15
                                             ... 
Williamsbridge-Olinville                        4
Williamsburg                                    2
Woodlawn-Wakefield                              9
Woodside                                        8
Yorkville                                     220
Length: 136, dtype: int64
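Toward the comparison this notebook sets out to make, hotel counts can be lined up against Airbnb listing counts per area. A minimal sketch at borough level, assuming a hypothetical `df_airbnb` frame with a `neighbourhood_group` column that holds borough names; both frames below are made-up toy data. An NTA-level comparison would additionally need each listing assigned to an NTA, for example through a spatial join on latitude and longitude.

```python
import pandas as pd

# Toy data; df_airbnb and its 'neighbourhood_group' column are assumptions
df_hotel = pd.DataFrame({'Borough': ['MANHATTAN', 'MANHATTAN', 'BROOKLYN', None]})
df_airbnb = pd.DataFrame({'neighbourhood_group': ['Manhattan', 'Brooklyn', 'Brooklyn', 'Queens']})

# Normalise borough names to title case before counting
hotels = df_hotel['Borough'].dropna().str.title().value_counts().rename('hotels')
airbnbs = df_airbnb['neighbourhood_group'].str.title().value_counts().rename('airbnbs')

# Outer-align on borough so areas present in only one source still appear (count 0)
comparison = pd.concat([hotels, airbnbs], axis=1).fillna(0).astype(int)
# Share of each source's total per borough, so the two distributions are comparable
shares = comparison / comparison.sum()
print(comparison)
print(shares.round(2))
```

Comparing shares rather than raw counts matters because the two sources can have very different totals. On the real data, check that borough spellings actually match after normalising before trusting the join.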