# **Project Name**

Project Type - AIRBNB Dashboards (Power BI Project)

Contribution - Individual

Name - Nikita Rajput

# **Project Summary**

This project analyzes Airbnb listings in Chicago and New Orleans to identify key trends in pricing, popular neighborhoods, and host behavior.

The goal is to provide data-driven insights for potential Airbnb hosts and travelers.


# **GitHub Link -**

# **Problem Statement**

- Identify the most popular neighborhoods based on listing count.
- Analyze the distribution of property types and their pricing trends.
- Examine the relationship between reviews and pricing.
- Understand host behavior, including the number of listings per host.

# **Dataset Overview**

We are working with two datasets:
1. **Chicago listings dataset**: Contains detailed information about Airbnb listings in Chicago.
2. **New Orleans listings dataset**: Includes similar data for New Orleans.
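Both files share the same columns, so one option is to stack them into a single frame with a `city` column. Later comparisons then become a single `groupby`. The sketch below assumes the two frames have matching column names. The `combine_cities` helper and the `city` column are hypothetical additions, not part of the original notebook.

```python
import pandas as pd

def combine_cities(frames: dict) -> pd.DataFrame:
    """Stack per-city listing frames into one frame with a 'city' column.

    `frames` maps a city label to its DataFrame (hypothetical helper).
    """
    # Dict keys become the outer index level, named 'city'
    combined = pd.concat(frames, names=["city", "row"])
    # Move 'city' into a regular column and rebuild a clean 0..n-1 index
    return combined.reset_index(level="city").reset_index(drop=True)

# Toy example with made-up rows
chicago = pd.DataFrame({"id": [1, 2], "price": [100, 150]})
new_orleans = pd.DataFrame({"id": [3], "price": [200]})
both = combine_cities({"Chicago": chicago, "New Orleans": new_orleans})
print(both.groupby("city")["price"].median())
```

With the real data, the call would be `combine_cities({"Chicago": chicago_df, "New Orleans": new_orleans_df})`.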


# **Key Features**

- `id`: Unique identifier for each listing
- `neighbourhood`: Name of the neighborhood
- `price`: Cost per night for the listing
- `room_type`: Type of property (Entire home, Private room, etc.)
- `reviews_per_month`: Number of reviews per month
- `calculated_host_listings_count`: Number of listings owned by a host
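Before plotting, it can help to confirm that each file actually contains the columns listed above. The following is a minimal sketch that assumes the column names are spelled exactly as listed. `check_listing_columns` is a hypothetical helper.

```python
import pandas as pd

# Columns the analysis below relies on (taken from the list above)
REQUIRED_COLUMNS = [
    "id", "neighbourhood", "price", "room_type",
    "reviews_per_month", "calculated_host_listings_count",
]

def check_listing_columns(df: pd.DataFrame) -> list:
    """Return the required columns that are missing from `df`."""
    return [col for col in REQUIRED_COLUMNS if col not in df.columns]

# Example: a toy frame missing two of the expected columns
toy = pd.DataFrame(columns=["id", "neighbourhood", "price", "room_type"])
missing = check_listing_columns(toy)
print(missing)
```

On the real data, `check_listing_columns(chicago_df)` should return an empty list.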

# **Key Observations**

- Some neighborhoods have significantly more listings than others.
- Price distributions vary based on room type and location.
- Hosts with multiple listings may follow different pricing strategies.
- Listings with more reviews per month tend to be priced more competitively.


# **Let's Begin From Here....**

# **1. Popular Neighborhoods Analysis**

In [None]:
# Step 1: Upload files in Google Colab
from google.colab import files
uploaded = files.upload()
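In Colab, `files.upload()` returns a dictionary that maps each uploaded file name to its raw bytes. The frames can therefore also be built directly from that dictionary instead of re-reading the files by name. The sketch below works under that assumption. `frames_from_upload` is a hypothetical helper, and the toy dictionary stands in for the real `uploaded` object.

```python
import io
import pandas as pd

def frames_from_upload(uploaded: dict) -> dict:
    """Parse every uploaded CSV (file name -> bytes) into a DataFrame."""
    return {
        name: pd.read_csv(io.BytesIO(content))
        for name, content in uploaded.items()
        if name.lower().endswith(".csv")
    }

# Stand-in for the dict returned by files.upload() in Colab
fake_uploaded = {"Chicago listings.csv": b"id,price\n1,100\n2,150\n"}
frames = frames_from_upload(fake_uploaded)
print(frames["Chicago listings.csv"].shape)
```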



In [None]:
# Step 2: Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Step 3: Load the datasets
chicago_df = pd.read_csv("Chicago listings.csv")
new_orleans_df = pd.read_csv("New Orleans Listing.csv")

# Step 4: Display the first few rows to verify
print(chicago_df.head())
print(new_orleans_df.head())

# Step 5: Count listings per neighborhood
chicago_neighborhood_stats = chicago_df.groupby('neighbourhood').agg({'id': 'count'})\
    .rename(columns={'id': 'listings'})\
    .sort_values(by='listings', ascending=False)

new_orleans_neighborhood_stats = new_orleans_df.groupby('neighbourhood').agg({'id': 'count'})\
    .rename(columns={'id': 'listings'})\
    .sort_values(by='listings', ascending=False)

# Step 6: Visualization of Top 10 Neighborhoods in Chicago
plt.figure(figsize=(12,5))
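# hue mirrors y so the palette applies without a legend (seaborn >= 0.13)
sns.barplot(y=chicago_neighborhood_stats.head(10).index,
            x=chicago_neighborhood_stats['listings'].head(10),
            hue=chicago_neighborhood_stats.head(10).index,
            palette='viridis', legend=False)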
plt.xlabel("Number of Listings")
plt.ylabel("Neighborhood")
plt.title("Top 10 Neighborhoods in Chicago by Listings")
plt.show()

# Step 7: Visualization of Top 10 Neighborhoods in New Orleans
plt.figure(figsize=(12,5))
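# hue mirrors y so the palette applies without a legend (seaborn >= 0.13)
sns.barplot(y=new_orleans_neighborhood_stats.head(10).index,
            x=new_orleans_neighborhood_stats['listings'].head(10),
            hue=new_orleans_neighborhood_stats.head(10).index,
            palette='coolwarm', legend=False)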
plt.xlabel("Number of Listings")
plt.ylabel("Neighborhood")
plt.title("Top 10 Neighborhoods in New Orleans by Listings")
plt.show()
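The `groupby(...).agg({'id': 'count'})` chain counts the non-null `id` values per neighbourhood. Assuming every row has an `id`, calling `value_counts()` on the `neighbourhood` column gives the same ranking in one call. The sketch below also reports each neighbourhood's share of all listings, which makes the two cities comparable despite their different sizes. `top_neighbourhoods` is a hypothetical helper.

```python
import pandas as pd

def top_neighbourhoods(df: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Listing count and share of all listings for the n busiest neighbourhoods."""
    counts = df["neighbourhood"].value_counts()  # sorted, most listings first
    table = pd.DataFrame({
        "listings": counts,
        "share_pct": (counts / counts.sum() * 100).round(1),
    })
    return table.head(n)

# Toy data: four listings across two neighbourhoods
toy = pd.DataFrame({"id": [1, 2, 3, 4],
                    "neighbourhood": ["Loop", "Loop", "Loop", "Uptown"]})
table = top_neighbourhoods(toy, n=2)
print(table)
```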


# **2. Property Type Analysis**

In [None]:
plt.figure(figsize=(10,5))
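# hue mirrors y so the palette applies without a legend (seaborn >= 0.13)
sns.countplot(y=chicago_df['room_type'], hue=chicago_df['room_type'],
              order=chicago_df['room_type'].value_counts().index,
              palette='Set2', legend=False)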
plt.xlabel("Count")
plt.ylabel("Room Type")
plt.title("Room Type Distribution in Chicago")
plt.show()

plt.figure(figsize=(10,5))
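# hue mirrors y so the palette applies without a legend (seaborn >= 0.13)
sns.countplot(y=new_orleans_df['room_type'], hue=new_orleans_df['room_type'],
              order=new_orleans_df['room_type'].value_counts().index,
              palette='Set1', legend=False)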
plt.xlabel("Count")
plt.ylabel("Room Type")
plt.title("Room Type Distribution in New Orleans")
plt.show()
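The countplots show how many listings fall into each room type, but the problem statement also asks about pricing trends by property type. Assuming `price` has already been converted to numbers, a `groupby` summary is one way to cover that. The median is reported alongside the mean because nightly prices tend to be pulled upward by a few very expensive listings. `price_by_room_type` is a hypothetical helper.

```python
import pandas as pd

def price_by_room_type(df: pd.DataFrame) -> pd.DataFrame:
    """Number of priced listings, median and mean nightly price per room type."""
    summary = (
        df.groupby("room_type")["price"]
          .agg(listings="count", median_price="median", mean_price="mean")
          .sort_values("median_price", ascending=False)
    )
    return summary.round(2)

# Toy data with made-up prices
toy = pd.DataFrame({
    "room_type": ["Entire home/apt", "Entire home/apt",
                  "Private room", "Private room", "Private room"],
    "price": [200, 300, 60, 80, 400],
})
summary = price_by_room_type(toy)
print(summary)
```

In the toy data, the single 400 listing pulls the private-room mean well above its median. This is the kind of gap the median is meant to expose.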


# **3. Price vs. Reviews Analysis**

In [None]:
plt.figure(figsize=(10,6))
sns.scatterplot(x=chicago_df['reviews_per_month'], y=chicago_df['price'], alpha=0.5)
plt.xlabel("Reviews per Month")
plt.ylabel("Price ($)")
plt.title("Price vs. Reviews (Chicago)")
plt.show()

plt.figure(figsize=(10,6))
sns.scatterplot(x=new_orleans_df['reviews_per_month'], y=new_orleans_df['price'], alpha=0.5, color='red')
plt.xlabel("Reviews per Month")
plt.ylabel("Price ($)")
plt.title("Price vs. Reviews (New Orleans)")
plt.show()
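In a plain scatterplot, the link between reviews and price is hard to read because a handful of very expensive listings stretch the y-axis. As a rough numeric check, the sketch below drops rows with missing values and trims prices above a chosen quantile. It then computes a Spearman rank correlation (the Pearson correlation of the ranks), which is less sensitive to outliers than Pearson's r. The 0.99 cutoff and the `price_review_spearman` helper are assumptions for illustration.

```python
import pandas as pd

def price_review_spearman(df: pd.DataFrame, upper_quantile: float = 0.99) -> float:
    """Spearman correlation between reviews_per_month and price after trimming outliers."""
    data = df[["reviews_per_month", "price"]].dropna()
    cutoff = data["price"].quantile(upper_quantile)
    data = data[data["price"] <= cutoff]  # drop the most extreme prices
    # Spearman's rho is the Pearson correlation of the ranked values
    return data["reviews_per_month"].rank().corr(data["price"].rank())

# Toy data: more reviews, lower price
toy = pd.DataFrame({
    "reviews_per_month": [1.0, 2.0, 3.0, 4.0, 5.0],
    "price": [500, 400, 300, 200, 100],
})
rho = price_review_spearman(toy)
print(round(rho, 3))
```

A negative value on the real data would be consistent with frequently reviewed listings being priced lower. It would not, on its own, show that reviews drive prices.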

# **4. Host Analysis**

In [None]:
plt.figure(figsize=(12,5))
sns.histplot(chicago_df['calculated_host_listings_count'], bins=30, kde=True, color='blue')
plt.xlabel("Number of Listings per Host")
plt.ylabel("Count")
plt.title("Distribution of Host Listings in Chicago")
plt.show()

plt.figure(figsize=(12,5))
sns.histplot(new_orleans_df['calculated_host_listings_count'], bins=30, kde=True, color='green')
plt.xlabel("Number of Listings per Host")
plt.ylabel("Count")
plt.title("Distribution of Host Listings in New Orleans")
plt.show()
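`calculated_host_listings_count` is repeated on each of a host's listings, so a host with 20 listings contributes 20 rows to the histograms above. To summarize host behavior from the listings' side, one option is to bucket that count and report the share of listings that belong to multi-listing hosts. The bucket edges below are arbitrary choices, and `host_listing_buckets` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

def host_listing_buckets(df: pd.DataFrame) -> pd.Series:
    """Share of listings by the size of the host's portfolio."""
    buckets = pd.cut(
        df["calculated_host_listings_count"],
        bins=[0, 1, 5, np.inf],  # (0, 1], (1, 5], (5, inf)
        labels=["1 listing", "2-5 listings", "6+ listings"],
    )
    # sort=False keeps the buckets in category order
    return buckets.value_counts(normalize=True, sort=False)

# Toy data: five listings from hosts of different sizes
toy = pd.DataFrame({"calculated_host_listings_count": [1, 1, 2, 5, 12]})
shares = host_listing_buckets(toy)
print(shares)

# Share of listings whose host has more than one listing
multi_share = (toy["calculated_host_listings_count"] > 1).mean()
```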


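# **Data Cleaning**

The analysis cells above read `price` and `reviews_per_month` as loaded. Running this cell right after Step 3 ensures that the plots use the cleaned values.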

In [None]:
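# Convert 'price' to numeric, stripping any '$' and thousands separators first
# (values that still cannot be parsed become NaN)
chicago_df['price'] = pd.to_numeric(
    chicago_df['price'].astype(str).str.replace(r'[$,]', '', regex=True), errors='coerce')
new_orleans_df['price'] = pd.to_numeric(
    new_orleans_df['price'].astype(str).str.replace(r'[$,]', '', regex=True), errors='coerce')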

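# Fill missing review rates with 0 (assign back instead of chained inplace fillna)
chicago_df['reviews_per_month'] = chicago_df['reviews_per_month'].fillna(0)
new_orleans_df['reviews_per_month'] = new_orleans_df['reviews_per_month'].fillna(0)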

# Remove duplicate rows
chicago_df.drop_duplicates(inplace=True)
new_orleans_df.drop_duplicates(inplace=True)
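The same three cleaning steps are applied to both cities, so they can be wrapped in one function that returns a cleaned copy instead of modifying the frames in place. The following is a minimal sketch. `clean_listings` is a hypothetical helper, and the regular expression assumes that text prices look like `$1,200.00`.

```python
import pandas as pd

def clean_listings(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy: numeric price, no missing review rates, no duplicate rows."""
    out = df.copy()
    # Strip '$' and thousands separators, then parse; unparseable values become NaN
    out["price"] = pd.to_numeric(
        out["price"].astype(str).str.replace(r"[$,]", "", regex=True),
        errors="coerce",
    )
    # Listings without reviews typically have no reviews_per_month value; treat them as 0
    out["reviews_per_month"] = out["reviews_per_month"].fillna(0)
    return out.drop_duplicates().reset_index(drop=True)

# Toy data: one text price with a separator, one duplicate row, one unparseable price
raw = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "price": ["$1,200.00", "$85.00", "$85.00", "n/a"],
    "reviews_per_month": [0.5, None, None, 2.0],
})
cleaned = clean_listings(raw)
print(cleaned)
```

With the real data, the calls would be `chicago_df = clean_listings(chicago_df)` and `new_orleans_df = clean_listings(new_orleans_df)`.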

In [None]:
# Save cleaned data
chicago_df.to_csv("cleaned_chicago_listings.csv", index=False)
new_orleans_df.to_csv("cleaned_new_orleans_listings.csv", index=False)

# Download the files
from google.colab import files
files.download("cleaned_chicago_listings.csv")
files.download("cleaned_new_orleans_listings.csv")
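`google.colab.files` is only importable inside Colab, so this cell raises `ImportError` when the notebook runs in Jupyter or as a script. The sketch below always writes the CSVs and only triggers the browser download when the Colab module is available. `save_cleaned` is a hypothetical helper.

```python
import os
import pandas as pd

def save_cleaned(frames: dict, out_dir: str = ".") -> list:
    """Write each DataFrame to out_dir/<file name>; download it too when running in Colab."""
    try:
        from google.colab import files  # only importable inside Colab
    except ImportError:
        files = None

    paths = []
    for filename, df in frames.items():
        path = os.path.join(out_dir, filename)
        df.to_csv(path, index=False)
        if files is not None:
            files.download(path)
        paths.append(path)
    return paths
```

Usage with the notebook's frames: `save_cleaned({"cleaned_chicago_listings.csv": chicago_df, "cleaned_new_orleans_listings.csv": new_orleans_df})`.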

