<a href="https://colab.research.google.com/github/20rashmi128/NYC2019-Airbnb-Bookings-Analysis/blob/main/Notebook_Airbnb_Bookings_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Airbnb Bookings Analysis (EDA)**

Since 2008, guests and hosts have used Airbnb to expand travel possibilities and offer a more unique, personalized way of experiencing the world. Today, Airbnb is a one-of-a-kind service that is used and recognized around the world. Analysis of the millions of listings on Airbnb is crucial for the company. These listings generate a lot of data that can be analyzed and used for security, business decisions, understanding the behavior and performance of customers and providers (hosts) on the platform, guiding marketing initiatives, implementing innovative additional services, and much more.

This dataset has around 49,000 observations and 16 columns, with a mix of categorical and numeric values.

Explore and analyze the data to discover key insights (not limited to these), such as:
  What can we learn about different hosts and areas?
  What can we learn from predictions? (e.g., locations, prices, reviews)
  Which hosts are the busiest and why?
  Is there any noticeable difference in traffic among different areas, and what could be the reason for it?

# 1. Basic Information about Dataset

### 1.(a) Importing relevant Python libraries and loading the dataset CSV file using pandas.

In [1]:
# Importing relevant libraries.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
# Google Drive is assumed to be mounted at /content/drive already.
# Store the path of the dataset file in a variable, then read the CSV file into a pandas DataFrame.

path = '/content/drive/MyDrive/EDA_AirBnB/Copy of Airbnb NYC 2019.csv'
df = pd.read_csv(path)

### 1.(b) Getting basic details/information about the data, its columns and respective data types.


In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48895 entries, 0 to 48894
Data columns (total 16 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   id                              48895 non-null  int64  
 1   name                            48879 non-null  object 
 2   host_id                         48895 non-null  int64  
 3   host_name                       48874 non-null  object 
 4   neighbourhood_group             48895 non-null  object 
 5   neighbourhood                   48895 non-null  object 
 6   latitude                        48895 non-null  float64
 7   longitude                       48895 non-null  float64
 8   room_type                       48895 non-null  object 
 9   price                           48895 non-null  int64  
 10  minimum_nights                  48895 non-null  int64  
 11  number_of_reviews               48895 non-null  int64  
 12  last_review                     

In [4]:
df.columns

Index(['id', 'name', 'host_id', 'host_name', 'neighbourhood_group',
       'neighbourhood', 'latitude', 'longitude', 'room_type', 'price',
       'minimum_nights', 'number_of_reviews', 'last_review',
       'reviews_per_month', 'calculated_host_listings_count',
       'availability_365'],
      dtype='object')

In [5]:
for column in df.columns:
  print("Column",' "',column,'" ', "is of -", type(df.loc[0,column]), "data type")

Column  " id "  is of - <class 'numpy.int64'> data type
Column  " name "  is of - <class 'str'> data type
Column  " host_id "  is of - <class 'numpy.int64'> data type
Column  " host_name "  is of - <class 'str'> data type
Column  " neighbourhood_group "  is of - <class 'str'> data type
Column  " neighbourhood "  is of - <class 'str'> data type
Column  " latitude "  is of - <class 'numpy.float64'> data type
Column  " longitude "  is of - <class 'numpy.float64'> data type
Column  " room_type "  is of - <class 'str'> data type
Column  " price "  is of - <class 'numpy.int64'> data type
Column  " minimum_nights "  is of - <class 'numpy.int64'> data type
Column  " number_of_reviews "  is of - <class 'numpy.int64'> data type
Column  " last_review "  is of - <class 'str'> data type
Column  " reviews_per_month "  is of - <class 'numpy.float64'> data type
Column  " calculated_host_listings_count "  is of - <class 'numpy.int64'> data type
Column  " availability_365 "  is of - <class 'numpy.int64'

A) Observations:-


*   There are 16 columns in total: 10 are numerical (integer or float) and the remaining 6 hold string data.
*   The dataset has 48,895 rows.
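
The type check above looks only at the value in row 0, so a column whose first value happens to be missing could be reported as a float. A minimal sketch of counting numeric and non-numeric columns from the column dtypes instead is given below. The `sample` frame and its values are illustrative stand-ins, not the Airbnb data.

```python
import pandas as pd

# Toy frame standing in for the Airbnb data (values are illustrative only).
sample = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["a", None, "c"],
    "price": [100, 80, 120],
    "reviews_per_month": [0.5, None, 1.2],
})

# select_dtypes works on the column dtype, so a missing value in row 0 cannot mislead it.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
text_cols = sample.select_dtypes(exclude="number").columns.tolist()

print("Numeric columns:", numeric_cols)
print("Non-numeric columns:", text_cols)
```

Applied to `df`, the two lists should reproduce the 10/6 split noted in the observations.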

In [6]:
#  Going through first 5 rows of dataset
df.head()

#  Observations:
#  One row has 0 reviews, and its last_review and reviews_per_month values are missing.
#  One row has 0 days of availability for booking.
#  The last_review column contains the date of the last review, so it needs to be converted to datetime format.

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,2019-05-21,0.38,2,355
2,3647,THE VILLAGE OF HARLEM....NEW YORK !,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.9419,Private room,150,3,0,,,1,365
3,3831,Cozy Entire Floor of Brownstone,4869,LisaRoxanne,Brooklyn,Clinton Hill,40.68514,-73.95976,Entire home/apt,89,1,270,2019-07-05,4.64,1,194
4,5022,Entire Apt: Spacious Studio/Loft by central park,7192,Laura,Manhattan,East Harlem,40.79851,-73.94399,Entire home/apt,80,10,9,2018-11-19,0.1,1,0


In [7]:
#  checking the last 5 rows of dataset
df.tail()

# Observations -
# All of the last 5 entries have 0 reviews and missing values in "last_review" and "reviews_per_month".
# To check: whether 0 reviews is always accompanied by NaN in last_review and reviews_per_month.
# If so, these missing values are natural and need no treatment.

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
48890,36484665,Charming one bedroom - newly renovated rowhouse,8232441,Sabrina,Brooklyn,Bedford-Stuyvesant,40.67853,-73.94995,Private room,70,2,0,,,2,9
48891,36485057,Affordable room in Bushwick/East Williamsburg,6570630,Marisol,Brooklyn,Bushwick,40.70184,-73.93317,Private room,40,4,0,,,2,36
48892,36485431,Sunny Studio at Historical Neighborhood,23492952,Ilgar & Aysel,Manhattan,Harlem,40.81475,-73.94867,Entire home/apt,115,10,0,,,1,27
48893,36485609,43rd St. Time Square-cozy single bed,30985759,Taz,Manhattan,Hell's Kitchen,40.75751,-73.99112,Shared room,55,1,0,,,6,2
48894,36487245,Trendy duplex in the very heart of Hell's Kitchen,68119814,Christophe,Manhattan,Hell's Kitchen,40.76404,-73.98933,Private room,90,7,0,,,1,23
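
The deduction noted in the observations (zero reviews should always coincide with missing `last_review` and `reviews_per_month`) can be tested directly by comparing boolean masks. The sketch below uses a small hypothetical `sample` frame; on the real data the same three masks would be built from `df`.

```python
import pandas as pd

# Illustrative rows only; the real check would run on df.
sample = pd.DataFrame({
    "number_of_reviews": [9, 0, 270, 0],
    "last_review": ["2018-10-19", None, "2019-07-05", None],
    "reviews_per_month": [0.21, None, 4.64, None],
})

no_reviews = sample["number_of_reviews"] == 0
missing_last = sample["last_review"].isna()
missing_rate = sample["reviews_per_month"].isna()

# True only if the three masks select exactly the same rows.
consistent = bool(
    (no_reviews == missing_last).all() and (no_reviews == missing_rate).all()
)
print("Zero reviews <=> missing review fields:", consistent)
```

If `consistent` is `True` on the full dataset, the missing values in the two review columns are explained entirely by listings without reviews.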


### 1.(c) Renaming a few columns for better clarity

In [8]:
df.rename(columns = {'id':'property_id','name':'property_name','price':'price_dollar'}, inplace = True)
df.head(2)

Unnamed: 0,property_id,property_name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price_dollar,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,2019-05-21,0.38,2,355


### 1.(d) Descriptive statistical analysis of Numerical type columns/features.

In [9]:
# Checking descriptive statistics such as mean, standard deviation and quartiles for numerical columns.
df.describe()

# Observations:-
# 1. reviews_per_month has 10,052 missing values (38,843 non-null out of 48,895).
# 2. The other 9 numerical columns have no missing values.
# 3. The minimum value in the price_dollar column is 0.
# 4. Some properties have 0 reviews.
# 5. Some properties have 0 days of availability for booking (at least 25% of listings, since the 25th percentile is 0).

Unnamed: 0,property_id,host_id,latitude,longitude,price_dollar,minimum_nights,number_of_reviews,reviews_per_month,calculated_host_listings_count,availability_365
count,48895.0,48895.0,48895.0,48895.0,48895.0,48895.0,48895.0,38843.0,48895.0,48895.0
mean,19017140.0,67620010.0,40.728949,-73.95217,152.720687,7.029962,23.274466,1.373221,7.143982,112.781327
std,10983110.0,78610970.0,0.05453,0.046157,240.15417,20.51055,44.550582,1.680442,32.952519,131.622289
min,2539.0,2438.0,40.49979,-74.24442,0.0,1.0,0.0,0.01,1.0,0.0
25%,9471945.0,7822033.0,40.6901,-73.98307,69.0,1.0,1.0,0.19,1.0,0.0
50%,19677280.0,30793820.0,40.72307,-73.95568,106.0,3.0,5.0,0.72,1.0,45.0
75%,29152180.0,107434400.0,40.763115,-73.936275,175.0,5.0,24.0,2.02,2.0,227.0
max,36487240.0,274321300.0,40.91306,-73.71299,10000.0,1250.0,629.0,58.5,327.0,365.0
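
The summary above suggests that `price_dollar` is strongly right-skewed (75th percentile 175, maximum 10,000) and that it contains zero prices. One common way to flag such values is the interquartile-range rule. The sketch below is an illustration on a hypothetical `prices` series; the 1.5 multiplier is the conventional default, not a choice made in this notebook.

```python
import pandas as pd

# Illustrative prices only; on the real data this would be df["price_dollar"].
prices = pd.Series([0, 69, 106, 175, 10000], name="price_dollar")

q1 = prices.quantile(0.25)
q3 = prices.quantile(0.75)
iqr = q3 - q1

# Conventional IQR fences: anything outside them is flagged as a potential outlier.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = prices[(prices < lower) | (prices > upper)]

zero_priced = int((prices == 0).sum())

print("Fences:", lower, upper)
print("Flagged values:", outliers.tolist())
print("Listings priced at 0:", zero_priced)
```

Flagged rows are candidates for inspection rather than automatic removal, since a very expensive listing may still be genuine.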


In [10]:
print("Total number of unique listings:", df['property_id'].nunique())

Total number of unique listings: 48895


Observation:-
1) Each row is uniquely identified by its property_id.

# 2. Data Preparation & Cleaning

## Steps followed:-
   1. Converting the last_review column from string to datetime.
   2. Checking for missing values and unique values in each column.
   3. Handling missing values.
   4. Checking for duplicate values.
   5. Checking for outliers and handling them (if any).
   6. Dropping any column/feature that seems to be of zero importance.

# 1. Converting the last_review dates from string data type to datetime without timestamp.

In [11]:
#  Converting the last_review column data to datetime format without timestamp.

# Creating a new column with last_review data converted to datetime (with timestamp).
df['date_col'] = pd.to_datetime(df.last_review)

# Creating new column with just the date part without timestamp.
df['last_review_date'] = df['date_col'].dt.date
df['last_review_date']

0        2018-10-19
1        2019-05-21
2               NaT
3        2019-07-05
4        2018-11-19
            ...    
48890           NaT
48891           NaT
48892           NaT
48893           NaT
48894           NaT
Name: last_review_date, Length: 48895, dtype: object

In [12]:
df.drop(['date_col'], axis =1, inplace = True)
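
Note that `.dt.date` yields Python `date` objects, so `last_review_date` ends up with `object` dtype and loses the `.dt` accessor. If datetime operations are needed later, one alternative is to keep the column as `datetime64` and set the time part to midnight with `.dt.normalize()`. The sketch below is a minimal illustration on a hypothetical `sample` frame, not the approach used in this notebook.

```python
import pandas as pd

# Illustrative rows only; None stands for a listing with no reviews.
sample = pd.DataFrame({"last_review": ["2018-10-19", None, "2019-07-05"]})

# errors="coerce" turns unparseable strings into NaT instead of raising.
sample["last_review_date"] = (
    pd.to_datetime(sample["last_review"], errors="coerce").dt.normalize()
)

# The column stays datetime64, so date arithmetic and the .dt accessor still work.
print(sample["last_review_date"].dtype)
print("Most recent review:", sample["last_review_date"].max())
```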

# 2. Checking and Handling Missing Values

In [13]:
# checking for missing values against all columns of dataset
df.isnull().sum()

property_id                           0
property_name                        16
host_id                               0
host_name                            21
neighbourhood_group                   0
neighbourhood                         0
latitude                              0
longitude                             0
room_type                             0
price_dollar                          0
minimum_nights                        0
number_of_reviews                     0
last_review                       10052
reviews_per_month                 10052
calculated_host_listings_count        0
availability_365                      0
last_review_date                  10052
dtype: int64
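
One possible way to handle these missing values is sketched below, under two assumptions that the notebook does not state: missing names can be replaced by a placeholder, and a missing `reviews_per_month` means the listing has no reviews and can be set to 0. The `sample` frame and the `"Unknown"` placeholder are hypothetical.

```python
import pandas as pd

# Illustrative rows only; the real treatment would be applied to df.
sample = pd.DataFrame({
    "property_name": ["Skylit Midtown Castle", None],
    "host_name": [None, "Jennifer"],
    "number_of_reviews": [45, 0],
    "reviews_per_month": [0.38, None],
})

cleaned = sample.copy()

# "Unknown" is an arbitrary placeholder (assumption); names are not used numerically.
cleaned["property_name"] = cleaned["property_name"].fillna("Unknown")
cleaned["host_name"] = cleaned["host_name"].fillna("Unknown")

# Assumption: a listing with no reviews has a review rate of zero.
cleaned["reviews_per_month"] = cleaned["reviews_per_month"].fillna(0)

print(cleaned.isna().sum())
```

The review date columns are left untouched here, because a listing without reviews has no meaningful date to fill in.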

In [14]:
df[df.property_name.isnull()]

Unnamed: 0,property_id,property_name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price_dollar,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,last_review_date
2854,1615764,,6676776,Peter,Manhattan,Battery Park City,40.71239,-74.0162,Entire home/apt,400,1000,0,,,1,362,NaT
3703,2232600,,11395220,Anna,Manhattan,East Village,40.73215,-73.98821,Entire home/apt,200,1,28,2015-06-08,0.45,1,341,2015-06-08
5775,4209595,,20700823,Jesse,Manhattan,Greenwich Village,40.73473,-73.99244,Entire home/apt,225,1,1,2015-01-01,0.02,1,0,2015-01-01
5975,4370230,,22686810,Michaël,Manhattan,Nolita,40.72046,-73.9955,Entire home/apt,215,7,5,2016-01-02,0.09,1,0,2016-01-02
6269,4581788,,21600904,Lucie,Brooklyn,Williamsburg,40.7137,-73.94378,Private room,150,1,0,,,1,0,NaT
6567,4756856,,1832442,Carolina,Brooklyn,Bushwick,40.70046,-73.92825,Private room,70,1,0,,,1,0,NaT
6605,4774658,,24625694,Josh,Manhattan,Washington Heights,40.85198,-73.93108,Private room,40,1,0,,,1,0,NaT
8841,6782407,,31147528,Huei-Yin,Brooklyn,Williamsburg,40.71354,-73.93882,Private room,45,1,0,,,1,0,NaT
11963,9325951,,33377685,Jonathan,Manhattan,Hell's Kitchen,40.76436,-73.98573,Entire home/apt,190,4,1,2016-01-05,0.02,1,0,2016-01-05
12824,9787590,,50448556,Miguel,Manhattan,Harlem,40.80316,-73.95189,Entire home/apt,300,5,0,,,5,0,NaT


# Properties with 0 days of availability for booking.

In [15]:
#  Checking the number of entries whose availability for booking out of 365 days is zero.
# These properties are listed on Airbnb, but were not available for booking in the year the data was collected.
print( "The number of AirBnB properties which are not available for booking this year is", len(df[df['availability_365']==0]))

The number of AirBnB properties which are not available for booking this year is 17533
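
To see whether zero availability is concentrated in particular areas, the count above can be broken down into a share per `neighbourhood_group`. The sketch below shows the idea on a hypothetical `sample` frame; the values are illustrative and say nothing about the real distribution.

```python
import pandas as pd

# Illustrative rows only; on the real data the same expression would use df.
sample = pd.DataFrame({
    "neighbourhood_group": ["Brooklyn", "Brooklyn", "Manhattan", "Manhattan"],
    "availability_365": [365, 0, 0, 0],
})

# The mean of a boolean mask is the fraction of True values in each group.
zero_share = (
    sample["availability_365"].eq(0)
    .groupby(sample["neighbourhood_group"])
    .mean()
)

print(zero_share)
```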


# Properties having zero Reviews

In [16]:
#  Getting the names and total count of properties that have received no reviews.
null_rvw_df = df[df['number_of_reviews']==0]
print("The below",len(null_rvw_df), "properties have zero reviews:", '\n', null_rvw_df['property_name'].unique())

The below 10052 properties have zero reviews: 
 ['THE VILLAGE OF HARLEM....NEW YORK !' 'Huge 2 BR Upper East  Cental Park'
 'Magnifique Suite au N de Manhattan - vue Cloitres' ...
 'Sunny Studio at Historical Neighborhood'
 '43rd St. Time Square-cozy single bed'
 "Trendy duplex in the very heart of Hell's Kitchen"]


# 3. Data Exploration and Visualization

## Columns to analyse:-
1. Duplicate property names.
2. Host_id, host_name and property_name
3. Neighbourhood area and location 

# 1. Duplicate Property Names

In [17]:
print("#unique values in 'property_id' column:", df.property_id.nunique())
print("#unique values in 'property_name' column:", df.property_name.nunique())

#unique values in 'property_id' column: 48895
#unique values in 'property_name' column: 47905


In [18]:
duplicate_property = df[df.duplicated(subset=['property_name'])]
print("The number of property listing names that are duplicated is", duplicate_property.property_name.nunique())
duplicate_property[3:12]

The number of property listing names that are duplicated is 645


Unnamed: 0,property_id,property_name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price_dollar,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,last_review_date
661,250537,The Lenox in Harlem,1313306,Yvette,Manhattan,Harlem,40.81122,-73.94279,Entire home/apt,400,5,0,,,2,365,NaT
669,253471,Loft Suite @ The Box House Hotel,417504,The Box House Hotel,Brooklyn,Greenpoint,40.73641,-73.9533,Entire home/apt,199,3,24,2018-11-06,0.32,28,84,2018-11-06
670,253475,Loft Suite @ The Box House Hotel,417504,The Box House Hotel,Brooklyn,Greenpoint,40.73794,-73.95254,Entire home/apt,199,3,59,2019-06-24,0.66,28,60,2019-06-24
674,253800,Loft Suite @ Box House Hotel,417504,The Box House Hotel,Brooklyn,Greenpoint,40.7373,-73.95323,Entire home/apt,199,3,24,2019-04-25,0.26,28,60,2019-04-25
675,253803,Loft Suite @ The Box House Hotel,417504,The Box House Hotel,Brooklyn,Greenpoint,40.73708,-73.95271,Entire home/apt,199,3,23,2019-06-22,0.26,28,60,2019-06-22
676,253806,Loft Suite @ The Box House Hotel,417504,The Box House Hotel,Brooklyn,Greenpoint,40.73652,-73.95236,Entire home/apt,199,3,43,2019-07-02,0.47,28,60,2019-07-02
677,253811,Loft Suite @ The Box House Hotel,417504,The Box House Hotel,Brooklyn,Greenpoint,40.73693,-73.95316,Entire home/apt,199,3,30,2019-07-03,0.32,28,56,2019-07-03
678,253815,Loft Suite @ The Box House Hotel,417504,The Box House Hotel,Brooklyn,Greenpoint,40.73784,-73.95324,Entire home/apt,199,3,39,2019-06-29,0.44,28,84,2019-06-29
679,253828,Duplex w/ Terrace @ Box House Hotel,417504,The Box House Hotel,Brooklyn,Greenpoint,40.73674,-73.95247,Private room,349,3,8,2018-07-26,0.09,28,58,2018-07-26
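
By default `duplicated()` skips the first occurrence of each name, and in pandas missing names may be treated as duplicates of one another. A minimal sketch of listing every row that shares a name, with missing names excluded first, is given below. The `sample` frame is hypothetical.

```python
import pandas as pd

# Illustrative rows only; two listings share a name and two have no name.
sample = pd.DataFrame({
    "property_id": [1, 2, 3, 4, 5],
    "property_name": ["Loft Suite", "Loft Suite", None, None, "Cozy Room"],
})

# Drop rows without a name so that missing values are not counted as duplicates.
named = sample.dropna(subset=["property_name"])

# keep=False marks every member of a duplicated group, including the first one.
repeated = named[named.duplicated(subset=["property_name"], keep=False)]
name_counts = repeated["property_name"].value_counts()

print(name_counts)
```

`name_counts` then gives the number of listings behind each repeated name, which makes hotel-style hosts easy to spot.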


# 2. Information regarding Duplicate Hosts and their multiple Listings

In [19]:
#  missing values in host_id, host_name
print("Number of missing values in host_id column:", df.host_id.isnull().sum())
print("Number of unique values in 'host_id' column:", df.host_id.nunique())
print(" ")
print("Number of missing values in host_name column:", df.host_name.isnull().sum())
print("Number of unique values in 'host_name' column:", df.host_name.nunique())

Number of missing values in host_id column: 0
Number of unique values in 'host_id' column: 37457
 
Number of missing values in host_name column: 21
Number of unique values in 'host_name' column: 11452


In [20]:
duplicate_hosts = df[df['host_id'].duplicated()]

print("The number of hosts with more than one listing is:", duplicate_hosts['host_id'].nunique())
print("The number of distinct host names among them is:", duplicate_hosts['host_name'].nunique())
print("The number of additional listings (beyond each host's first) is:", len(duplicate_hosts))
print(" ")
print("5 rows from hosts with multiple listings:")
duplicate_hosts[:5]

The number of hosts with more than one listing is: 5154
The number of distinct host names among them is: 2942
The number of additional listings (beyond each host's first) is: 11438
 
5 rows from hosts with multiple listings:


Unnamed: 0,property_id,property_name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price_dollar,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,last_review_date
22,8025,CBG Helps Haiti Room#2.5,22486,Lisel,Brooklyn,Park Slope,40.67989,-73.97798,Private room,80,1,39,2019-01-01,0.37,6,364,2019-01-01
23,8110,CBG Helps Haiti Rm #2,22486,Lisel,Brooklyn,Park Slope,40.68001,-73.97865,Private room,110,2,71,2019-07-02,0.61,6,304,2019-07-02
33,9783,back room/bunk beds,32294,Ssameer Or Trip,Manhattan,Harlem,40.8213,-73.95318,Private room,50,3,273,2019-07-01,2.37,3,359,2019-07-01
35,10962,"Lovely room 2 & garden; Best area, Legal rental",9744,Laurie,Brooklyn,South Slope,40.66869,-73.9878,Private room,89,4,168,2019-06-21,1.41,3,340,2019-06-21
39,12048,LowerEastSide apt share shortterm 1,7549,Ben,Manhattan,Lower East Side,40.71401,-73.98917,Shared room,40,1,214,2019-07-05,1.81,4,188,2019-07-05


In [21]:
df[df.host_id == 7549]

Unnamed: 0,property_id,property_name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price_dollar,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,last_review_date
9,5238,Cute & Cozy Lower East Side 1 bdrm,7549,Ben,Manhattan,Chinatown,40.71344,-73.99037,Entire home/apt,150,1,160,2019-06-09,1.33,4,188,2019-06-09
39,12048,LowerEastSide apt share shortterm 1,7549,Ben,Manhattan,Lower East Side,40.71401,-73.98917,Shared room,40,1,214,2019-07-05,1.81,4,188,2019-07-05
4767,3373030,"Cute,Cozy Lower East Side 1bdrm",7549,Ben,Manhattan,Lower East Side,40.71307,-73.99025,Entire home/apt,150,1,60,2019-06-25,1.0,4,188,2019-06-25
5778,4215595,LowerEastSide apt share shortterm 3,7549,Ben,Manhattan,Lower East Side,40.71329,-73.99047,Shared room,40,1,88,2019-05-19,1.53,4,197,2019-05-19


In [22]:
df[df.host_id == 32294]

Unnamed: 0,property_id,property_name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price_dollar,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,last_review_date
30,9668,front room/double bed,32294,Ssameer Or Trip,Manhattan,Harlem,40.82245,-73.95104,Private room,50,3,242,2019-06-01,2.04,3,355,2019-06-01
33,9783,back room/bunk beds,32294,Ssameer Or Trip,Manhattan,Harlem,40.8213,-73.95318,Private room,50,3,273,2019-07-01,2.37,3,359,2019-07-01
100,22918,loft bed - near transportation-15min to times sq,32294,Ssameer Or Trip,Manhattan,Harlem,40.82279,-73.95139,Private room,60,3,11,2019-01-03,0.87,3,219,2019-01-03


In [23]:
df[df.host_id == 9744]

Unnamed: 0,property_id,property_name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price_dollar,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,last_review_date
12,5803,"Lovely Room 1, Garden, Best Area, Legal rental",9744,Laurie,Brooklyn,South Slope,40.66829,-73.98779,Private room,89,4,167,2019-06-24,1.34,3,314,2019-06-24
35,10962,"Lovely room 2 & garden; Best area, Legal rental",9744,Laurie,Brooklyn,South Slope,40.66869,-73.9878,Private room,89,4,168,2019-06-21,1.41,3,340,2019-06-21
192,50447,Lovely Apt & Garden; Legal; Best Area; Ameni...,9744,Laurie,Brooklyn,South Slope,40.6693,-73.98804,Entire home/apt,135,5,151,2019-06-22,1.43,3,162,2019-06-22


In [24]:
df[df.host_id == 22486]

Unnamed: 0,property_id,property_name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price_dollar,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,last_review_date
21,8024,CBG CtyBGd HelpsHaiti rm#1:1-4,22486,Lisel,Brooklyn,Park Slope,40.68069,-73.97706,Private room,130,2,130,2019-07-01,1.09,6,347,2019-07-01
22,8025,CBG Helps Haiti Room#2.5,22486,Lisel,Brooklyn,Park Slope,40.67989,-73.97798,Private room,80,1,39,2019-01-01,0.37,6,364,2019-01-01
23,8110,CBG Helps Haiti Rm #2,22486,Lisel,Brooklyn,Park Slope,40.68001,-73.97865,Private room,110,2,71,2019-07-02,0.61,6,304,2019-07-02
476,167222,CBG# 4Tiny room w/ huge window/AC,22486,Lisel,Brooklyn,Park Slope,40.6788,-73.97643,Private room,60,1,20,2018-08-24,0.21,6,258,2018-08-24
588,222054,CBG Helps Haiti Rm #3,22486,Lisel,Brooklyn,Park Slope,40.68012,-73.97847,Private room,120,2,23,2018-09-15,0.24,6,342,2018-09-15
1783,801626,CBG HelpsHaiti #5 Suite,22486,Lisel,Brooklyn,Park Slope,40.68015,-73.978,Private room,115,2,25,2019-05-26,0.36,6,89,2019-05-26


In [25]:
print(len(df[df.host_id == 417504]))
print(df[df.host_id == 417504].calculated_host_listings_count.unique())

28
[28]


Observations:
 1. 5,154 hosts manage more than one property. Together they hold 16,592 listings: the 11,438 rows flagged by duplicated() above plus each host's first listing (11,438 + 5,154).
 2. Several of these hosts, such as Lisel, Ben and Laurie, have listed the different rooms/flats in their property as separate units that can be booked separately. This matches the column "calculated_host_listings_count", which is the total number of properties listed by a given host.
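
The host counts in these observations can also be derived from a single `value_counts()` call, which additionally ranks hosts by number of listings. The sketch below uses a small hypothetical `sample` frame to show how the count of multi-listing hosts, their total listings, and the `duplicated()` row count relate to each other.

```python
import pandas as pd

# Illustrative rows only; on the real data this would use df["host_id"].
sample = pd.DataFrame({
    "host_id": [7549, 7549, 7549, 9744, 9744, 2787],
    "host_name": ["Ben", "Ben", "Ben", "Laurie", "Laurie", "John"],
})

# Listings per host, sorted from the busiest host downwards.
listings_per_host = sample["host_id"].value_counts()
multi = listings_per_host[listings_per_host > 1]

hosts_with_many = len(multi)
total_listings = int(multi.sum())

# duplicated() skips each host's first row, so it counts only the extra listings.
extra_listings = int(sample["host_id"].duplicated().sum())

print("Hosts with more than one listing:", hosts_with_many)
print("Listings held by those hosts:", total_listings)
print("Extra listings beyond the first:", extra_listings)
```

The identity `total_listings == extra_listings + hosts_with_many` is why the `duplicated()` row count understates the number of properties these hosts manage.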

# 3. Neighborhood Area and locations

In [26]:
print("#unique values in 'neighbourhood_group' column:", df.neighbourhood_group.nunique())
print("#unique values in 'neighbourhood' column:", df.neighbourhood.nunique())

#unique values in 'neighbourhood_group' column: 5
#unique values in 'neighbourhood' column: 221
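
A natural next step for comparing areas is a per-group summary. The sketch below is a hypothetical illustration of counting listings, distinct neighbourhoods and the median price for each `neighbourhood_group` using named aggregation. The `sample` rows are illustrative and the choice of median (rather than mean) is an assumption made here because of the skewed prices.

```python
import pandas as pd

# Illustrative rows only; on the real data the same groupby would run on df.
sample = pd.DataFrame({
    "neighbourhood_group": ["Brooklyn", "Manhattan", "Manhattan", "Brooklyn"],
    "neighbourhood": ["Kensington", "Midtown", "Harlem", "Clinton Hill"],
    "price_dollar": [149, 225, 150, 89],
})

# Named aggregation: each output column is defined as (source column, function).
summary = sample.groupby("neighbourhood_group").agg(
    listings=("price_dollar", "size"),
    neighbourhoods=("neighbourhood", "nunique"),
    median_price=("price_dollar", "median"),
)

print(summary)
```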


# 4. Questions and Answers

# 5. Summary:


Insights:-
  1. 
  2. 

## Any future project ideas:-


