# Airbnb Hotel Booking Analysis

This notebook analyzes the **Airbnb Open Data** to answer key business questions related to New York City Airbnb listings.

---

### Problem Statement:
The hospitality industry has undergone a transformation with online platforms like Airbnb facilitating short-term lodging. This analysis explores the New York City Airbnb dataset to extract meaningful insights about pricing, host behavior, property types, and customer reviews.

---

### Key Questions:
1. What are the different property types in the dataset?
2. Which neighborhood group has the highest number of listings?
3. Which neighborhoods have the highest average prices for Airbnb listings?
4. Is there a relationship between a property's construction year and its price?
5. Who are the top 10 hosts by calculated host listing count?
6. Are hosts with verified identities more likely to receive positive reviews?
7. Is there a correlation between the price of a listing and its service fee?


## 1. Import Libraries and Load Dataset

In [None]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Load dataset
df = pd.read_csv("Airbnb_Open_Data (1) (2).csv")
df.head()


## 2. Data Cleaning

In [None]:

# Remove $ from price and service fee, convert to numeric
# Raw strings keep '\$' from being read as an invalid Python escape sequence
df['price'] = df['price'].replace(r'[\$,]', '', regex=True).astype(float)
df['service fee'] = df['service fee'].replace(r'[\$,]', '', regex=True).astype(float)

# Drop duplicates if any
df.drop_duplicates(inplace=True)

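# Fill missing values for simplicity in analysis.
# Assign the result back rather than calling fillna(inplace=True) on a column
# selection, which newer pandas versions warn about and may not apply to df.
df['neighbourhood group'] = df['neighbourhood group'].fillna('Unknown')
df['neighbourhood'] = df['neighbourhood'].fillna('Unknown')
df['host_identity_verified'] = df['host_identity_verified'].fillna('unconfirmed')
# Note: 0 is a placeholder for "no rating", not an actual rating
df['review rate number'] = df['review rate number'].fillna(0)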

df.info()
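
The `astype(float)` conversion above raises an error if any entry is not a clean number after the symbols are stripped. A more forgiving variant is sketched below, assuming the column holds strings; `parse_currency` is a hypothetical helper name and the sample values are made up for illustration, not taken from the dataset.

```python
import pandas as pd

def parse_currency(series: pd.Series) -> pd.Series:
    """Strip '$', ',' and whitespace, then convert to numbers.

    Entries that still cannot be parsed become NaN instead of raising.
    Assumes the input column holds strings (or missing values).
    """
    cleaned = series.str.replace(r"[\$,\s]", "", regex=True)
    return pd.to_numeric(cleaned, errors="coerce")

# Toy values for illustration only
sample = pd.Series(["$1,200 ", "$85", None, "n/a"])
parsed = parse_currency(sample)
print(parsed)
```

Counting `parsed.isna()` afterwards shows how many entries could not be converted, which is worth checking before computing averages.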


## 3. What are the different property types?

In [None]:

# Property types are read from the 'room type' column
df['room type'].value_counts().plot(kind='bar', color='skyblue')
plt.title("Distribution of Property Types")
plt.ylabel("Count")
plt.show()


## 4. Neighborhood group with highest number of listings

In [None]:

sns.countplot(data=df, x='neighbourhood group', order=df['neighbourhood group'].value_counts().index, palette="viridis")
plt.title("Number of Listings by Neighborhood Group")
plt.show()


## 5. Neighborhoods with the highest average Airbnb prices

In [None]:

top_prices = df.groupby('neighbourhood')['price'].mean().sort_values(ascending=False).head(10)
top_prices.plot(kind='bar', color='salmon')
plt.title("Top 10 Neighborhoods by Average Price")
plt.ylabel("Average Price")
plt.show()
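
A plain average can be dominated by neighborhoods with only one or two listings. The sketch below filters on a minimum listing count before ranking. The threshold `MIN_LISTINGS` is a hypothetical value chosen for the toy data, and the data itself is made up for illustration.

```python
import pandas as pd

# Toy data for illustration only (not taken from the dataset)
toy = pd.DataFrame({
    "neighbourhood": ["A", "A", "A", "B", "C", "C"],
    "price": [100.0, 120.0, 110.0, 900.0, 200.0, 220.0],
})

MIN_LISTINGS = 2  # hypothetical threshold; tune for the real data

# Mean price and number of listings per neighborhood
stats = toy.groupby("neighbourhood")["price"].agg(["mean", "count"])

# Keep only neighborhoods with enough listings, then rank by mean price
reliable = stats[stats["count"] >= MIN_LISTINGS].sort_values("mean", ascending=False)
print(reliable)
```

In the toy data, neighborhood `B` has the highest average but only one listing, so it is excluded from the ranking.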


## 6. Relationship between construction year and price

In [None]:

sns.scatterplot(data=df, x='Construction year', y='price', alpha=0.3)
plt.title("Price vs Construction Year")
plt.show()
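
A scatter plot alone makes it hard to judge whether any relationship exists. One way to quantify it is a Pearson correlation together with the mean price per construction year, sketched here on made-up toy data that reuses the notebook's column names.

```python
import pandas as pd

# Toy data for illustration only (not taken from the dataset)
toy = pd.DataFrame({
    "Construction year": [2003, 2005, 2010, 2015, 2020, None],
    "price": [300.0, 500.0, 400.0, 600.0, 200.0, 350.0],
})

# Drop rows where either value is missing before measuring the relationship
pair = toy[["Construction year", "price"]].dropna()

# Pearson correlation: values near 0 suggest no linear relationship
pearson_r = pair["Construction year"].corr(pair["price"])

# Mean price per construction year, useful for spotting trends over time
mean_by_year = pair.groupby("Construction year")["price"].mean()

print(f"Pearson r: {pearson_r:.3f}")
print(mean_by_year)
```

A coefficient close to zero would indicate that construction year has little linear association with price, though it would not rule out nonlinear patterns.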


## 7. Top 10 Hosts by Listing Count

In [None]:

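# Assumption: 'calculated host listings count' is a per-host total repeated on
# each of that host's rows, so take the max per host name rather than the sum
# (a sum would count the total once per listing). Different hosts may share a name.
top_hosts = df.groupby('host name')['calculated host listings count'].max().sort_values(ascending=False).head(10)
top_hosts.plot(kind='bar', color='orange')
plt.title("Top 10 Hosts by Listings")
plt.ylabel("Listing Count")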
plt.show()
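
The choice of aggregation matters for `calculated host listings count`. If, as the column name suggests, the value is already a per-host total repeated on every row for that host, summing it inflates the result. The sketch below compares `sum` and `max` on made-up toy data under that assumption.

```python
import pandas as pd

# Toy data for illustration only: "Ana" has 3 listings, "Ben" has 1,
# and each row repeats the host's total listing count.
toy = pd.DataFrame({
    "host name": ["Ana", "Ana", "Ana", "Ben"],
    "calculated host listings count": [3, 3, 3, 1],
})

grouped = toy.groupby("host name")["calculated host listings count"]

by_sum = grouped.sum()  # counts the per-host total once per row
by_max = grouped.max()  # recovers the per-host total

print(pd.DataFrame({"sum": by_sum, "max": by_max}))
```

Here the sum reports 9 listings for a host who has 3, while the max reports 3.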


## 8. Verified Hosts vs Review Ratings

In [None]:

sns.boxplot(data=df, x='host_identity_verified', y='review rate number')
plt.title("Review Ratings: Verified vs Unverified Hosts")
plt.show()
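
Because the cleaning step fills missing ratings with 0, those placeholders can pull the box plot downward for both groups. A numeric comparison that excludes them is sketched below on made-up toy data. Treating every 0 as "no rating" is an assumption and should be checked against the real data.

```python
import pandas as pd

# Toy data for illustration only (not taken from the dataset)
toy = pd.DataFrame({
    "host_identity_verified": ["verified", "verified", "unconfirmed", "unconfirmed", "verified"],
    "review rate number": [4.0, 5.0, 3.0, 0.0, 0.0],
})

# Exclude rows where 0 stands in for a missing rating (assumption)
rated = toy[toy["review rate number"] > 0]

# Compare rating statistics between verified and unconfirmed hosts
summary = rated.groupby("host_identity_verified")["review rate number"].agg(
    ["mean", "median", "count"]
)
print(summary)
```

Comparing the `count` column alongside the means shows how many rated listings each group contributes, which helps judge whether a difference is meaningful.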


## 9. Correlation between Price and Service Fee

In [None]:

sns.scatterplot(data=df, x='price', y='service fee', alpha=0.3)
plt.title("Correlation between Price and Service Fee")
plt.show()

correlation = df[['price','service fee']].corr()
correlation
