# 🏠 Airbnb Booking Analysis Using Exploratory Data Analysis (EDA)



## 📌 Introduction

This project performs Exploratory Data Analysis (EDA) on the Airbnb, Inc. booking dataset. The goal is to clean the data, uncover patterns, and visualize insights about property listings, pricing, reviews, and host behavior.

EDA helps us understand the structure of the dataset, detect missing values, analyze distributions, and identify relationships between variables before building any predictive model.
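As a rough illustration of those four steps (structure, missing values, distributions, relationships), here is a minimal sketch. The toy DataFrame and the `quick_overview` helper are hypothetical and only stand in for the real dataset loaded later.

```python
import pandas as pd

# Hypothetical toy data; the real project uses the Airbnb CSV instead.
toy = pd.DataFrame({
    "price": [100.0, 250.0, None, 80.0],
    "minimum_nights": [1, 3, 2, 30],
    "room_type": ["Private room", "Entire home/apt", "Shared room", None],
})

def quick_overview(df: pd.DataFrame) -> dict:
    """Collect four first-pass EDA views of a DataFrame."""
    return {
        "shape": df.shape,                          # structure: (rows, columns)
        "missing": df.isna().sum(),                 # missing values per column
        "summary": df.describe(),                   # distribution of numeric columns
        "correlation": df.corr(numeric_only=True),  # pairwise linear relationships
    }

overview = quick_overview(toy)
print(overview["shape"])
print(overview["missing"])
```

Returning the views in a dictionary keeps the sketch easy to inspect piece by piece; in a notebook, each view would more likely be displayed in its own cell.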

## 🌍 About the Airbnb Platform

Airbnb is a global online marketplace that connects property owners (hosts) with travelers seeking short-term accommodations. Founded in 2008, the platform has expanded rapidly and now operates in over 220 countries and regions.

Airbnb offers diverse lodging options, including:

- Private rooms

- Entire apartments or houses

- Shared spaces

- Unique stays (e.g., treehouses, villas, castles)

This diversity allows travelers to choose accommodations that match their preferences and budget, while hosts can generate income from unused or underutilized spaces.

## 🎯 Objective of the Analysis

- Perform data cleaning and preprocessing

- Analyze price distribution and booking trends

- Study host and neighborhood patterns

- Visualize review and availability relationships

- Extract actionable insights from the dataset

In [6]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [5]:
# Load the dataset
data = pd.read_csv('Data_Set/Airbnb_Open_Data.csv')


In [15]:
# Shape of the dataset: (rows, columns)
data.shape

(102599, 26)

In [16]:
# Column names
data.columns


Index(['id', 'NAME', 'host id', 'host_identity_verified', 'host name',
       'neighbourhood group', 'neighbourhood', 'lat', 'long', 'country',
       'country code', 'instant_bookable', 'cancellation_policy', 'room type',
       'Construction year', 'price', 'service fee', 'minimum nights',
       'number of reviews', 'last review', 'reviews per month',
       'review rate number', 'calculated host listings count',
       'availability 365', 'house_rules', 'license'],
      dtype='str')

In [18]:
# Data type of each column
data.dtypes

id                                  int64
NAME                                  str
host id                             int64
host_identity_verified                str
host name                             str
neighbourhood group                   str
neighbourhood                         str
lat                               float64
long                              float64
country                               str
country code                          str
instant_bookable                   object
cancellation_policy                   str
room type                             str
Construction year                 float64
price                                 str
service fee                           str
minimum nights                    float64
number of reviews                 float64
last review                           str
reviews per month                 float64
review rate number                float64
calculated host listings count    float64
availability 365                  
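The listing above shows `price`, `service fee`, and `last review` stored as strings, which blocks numeric and date analysis. Below is a minimal sketch of one way to convert them. The sample rows, the `$1,060` currency style, the `%m/%d/%Y` date format, and the `currency_to_float` helper are all assumptions for illustration, not confirmed properties of the dataset.

```python
import pandas as pd

# Hypothetical sample rows; the currency style and date format are assumptions.
sample = pd.DataFrame({
    "price": ["$966", "$1,060", None],
    "service fee": ["$193", "$212", "$41"],
    "last review": ["10/19/2021", None, "7/5/2019"],
})

def currency_to_float(col: pd.Series) -> pd.Series:
    """Strip '$', ',' and whitespace, then convert to float64 (bad values become NaN)."""
    cleaned = col.str.replace(r"[$,\s]", "", regex=True)
    return pd.to_numeric(cleaned, errors="coerce").astype("float64")

sample["price"] = currency_to_float(sample["price"])
sample["service fee"] = currency_to_float(sample["service fee"])

# errors="coerce" turns missing or unparseable dates into NaT instead of raising
sample["last review"] = pd.to_datetime(
    sample["last review"], format="%m/%d/%Y", errors="coerce"
)

print(sample.dtypes)
```

Using `errors="coerce"` trades strictness for robustness: malformed entries become missing values rather than stopping the conversion, so it is worth re-checking the missing counts afterwards.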

In [22]:
# Number of missing values per column
data.isna().sum()

id                                     0
NAME                                 250
host id                                0
host_identity_verified               289
host name                            406
neighbourhood group                   29
neighbourhood                         16
lat                                    8
long                                   8
country                              532
country code                         131
instant_bookable                     105
cancellation_policy                   76
room type                              0
Construction year                    214
price                                247
service fee                          273
minimum nights                       409
number of reviews                    183
last review                        15893
reviews per month                  15879
review rate number                   326
calculated host listings count       319
availability 365                     448
house_rules     
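Raw counts are easier to judge relative to the number of rows: for example, the 15893 missing `last review` values are about 15% of the 102599 rows. The sketch below shows one possible way to report counts and percentages together. The toy frame and the `missing_report` helper are hypothetical.

```python
import pandas as pd

# Hypothetical toy frame standing in for `data`
toy = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "last review": ["10/19/2021", None, None, None],
    "price": ["$966", "$142", None, "$368"],
})

def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Return missing count and percentage per column, worst columns first."""
    report = pd.DataFrame({
        "missing": df.isna().sum(),
        "percent": (df.isna().mean() * 100).round(2),  # share of rows that are missing
    })
    return report.sort_values("missing", ascending=False)

report = missing_report(toy)
print(report)
```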

In [20]:
# Number of fully duplicated rows
data.duplicated().sum()

np.int64(541)
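The count above reports 541 fully duplicated rows. The following is a minimal sketch of how such rows might be removed, shown on a hypothetical toy frame; whether the project drops duplicates in exactly this way is an assumption.

```python
import pandas as pd

# Hypothetical toy frame with one exact duplicate row
toy = pd.DataFrame({
    "id": [1001, 1002, 1002, 1003],
    "price": ["$966", "$142", "$142", "$368"],
})

# duplicated() flags rows that are identical to an earlier row
n_dupes = int(toy.duplicated().sum())

# Keep the first occurrence of each row and rebuild a clean 0..n-1 index
deduped = toy.drop_duplicates(keep="first").reset_index(drop=True)

print(n_dupes, len(toy), len(deduped))
```

Applied to the real DataFrame, the same pattern would be `data = data.drop_duplicates().reset_index(drop=True)`.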

In [30]:
# Standardize column names: convert to lowercase
data.columns = data.columns.str.lower()

In [29]:
# Column names as captured at In [29], i.e. before the lowercasing cell (In [30]) was run
data.columns

Index(['id', 'NAME', 'host id', 'host_identity_verified', 'host name',
       'neighbourhood group', 'neighbourhood', 'lat', 'long', 'country',
       'country code', 'instant_bookable', 'cancellation_policy', 'room type',
       'Construction year', 'price', 'service fee', 'minimum nights',
       'number of reviews', 'last review', 'reviews per month',
       'review rate number', 'calculated host listings count',
       'availability 365', 'house_rules', 'license'],
      dtype='str')

In [34]:
# Remove leading/trailing spaces first, so they do not turn into underscores
data.columns = data.columns.str.strip()

In [35]:
# Replace the remaining (inner) spaces with underscores
data.columns = data.columns.str.replace(' ', '_')

In [36]:
data.columns

Index(['id', 'name', 'host_id', 'host_identity_verified', 'host_name',
       'neighbourhood_group', 'neighbourhood', 'lat', 'long', 'country',
       'country_code', 'instant_bookable', 'cancellation_policy', 'room_type',
       'construction_year', 'price', 'service_fee', 'minimum_nights',
       'number_of_reviews', 'last_review', 'reviews_per_month',
       'review_rate_number', 'calculated_host_listings_count',
       'availability_365', 'house_rules', 'license'],
      dtype='str')
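The three renaming steps can also be chained into a single reusable function. This is a minimal sketch, not the notebook's own code: `standardize_columns` is a hypothetical helper, and the padded `' Construction year '` name is invented to show why stripping comes first.

```python
import pandas as pd

def standardize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy whose column names are stripped, lowercased, and snake_cased."""
    out = df.copy()
    out.columns = (
        out.columns
        .str.strip()                           # drop leading/trailing whitespace first
        .str.lower()
        .str.replace(r"\s+", "_", regex=True)  # each run of inner whitespace -> one underscore
    )
    return out

# Hypothetical frame; the padded name is not from the real dataset
raw = pd.DataFrame(columns=["NAME", "host id", " Construction year ", "availability 365"])
clean = standardize_columns(raw)
print(list(clean.columns))
```

Stripping before replacing matters: otherwise a leading or trailing space would become a stray underscore. Returning a copy leaves the original DataFrame untouched, which makes the step safe to re-run in a notebook.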