# Airbnb Seattle Data Analysis

## Introduction
This notebook provides exploratory data analysis (EDA) and insights into Seattle's Airbnb data.
The goal of this project is to uncover patterns and trends in Airbnb listings, such as factors influencing prices, seasonal availability, and the impact of customer reviews.
By analyzing these datasets, stakeholders such as hosts, guests, and policymakers can make informed decisions.

## Key Objectives
- Understand what factors contribute to listing prices.
- Explore seasonal trends and their influence on availability.
- Evaluate how customer reviews and ratings impact a listing's success.

The data for this analysis includes information about listings, calendar availability, and customer reviews, offering a comprehensive view of the Airbnb market in Seattle.


## Step 1: Inspect, Clean, and Save the Dataset

## Installation
To install the required dependencies, run:

In [None]:
pip install -r requirements.txt

### Import necessary libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

### Define dataset paths

In [2]:
listings_path = 'listings.csv'
calendar_path = 'calendar.csv'
reviews_path = 'reviews.csv'

### Load the data

In [3]:
listings = pd.read_csv(listings_path)
calendar = pd.read_csv(calendar_path)
reviews = pd.read_csv(reviews_path)
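
The calendar and reviews files both carry a `date` column that later cells convert with `pd.to_datetime`. As an alternative, the dates can be parsed at load time with the `parse_dates` argument of `pd.read_csv`. The sketch below uses a two-row, made-up CSV string in place of `calendar.csv`; the column layout is an assumption for illustration only.

```python
import io

import pandas as pd

# Made-up stand-in for calendar.csv (assumed columns, illustrative values).
csv_text = (
    "listing_id,date,available,price\n"
    "1,2016-01-04,t,$85.00\n"
    "1,2016-01-05,f,\n"
)

# parse_dates converts the named column to datetime64 while reading.
calendar_toy = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

print(calendar_toy.dtypes)
print(calendar_toy["price"].isna().sum(), "row(s) with a missing price")
```

With the real files, the same argument would be passed alongside the path variables defined above.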

### Inspect Listings Dataset

In [None]:
print("Listings Dataset Info:")
listings.info()
print("\nListings Dataset Sample:")
print(listings.head())

### Clean Listings Dataset

In [5]:
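# Strip "$" and thousands separators, then convert to float.
# Raw strings (r'...') avoid Python's invalid-escape-sequence warning for "\$".
listings['price'] = listings['price'].replace(r'[\$,]', '', regex=True).astype(float)
listings['cleaning_fee'] = listings['cleaning_fee'].replace(r'[\$,]', '', regex=True).astype(float)
listings['security_deposit'] = listings['security_deposit'].replace(r'[\$,]', '', regex=True).astype(float)
# Drop rows missing any of the fields used in the price analysis.
listings.dropna(subset=['price', 'bedrooms', 'bathrooms', 'accommodates'], inplace=True)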
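
The cleaning cell above calls `.astype(float)` directly, which raises an error if any value cannot be parsed as a number. A more forgiving variant is sketched below: a hypothetical helper, `parse_currency` (not part of the original notebook), strips the currency symbols and lets `pd.to_numeric(..., errors="coerce")` turn anything unparseable into `NaN`. The sample values are made up.

```python
import pandas as pd


def parse_currency(series: pd.Series) -> pd.Series:
    """Convert strings such as '$1,250.00' to floats; unparseable values become NaN."""
    cleaned = series.str.replace(r"[\$,]", "", regex=True)
    return pd.to_numeric(cleaned, errors="coerce")


# Made-up sample: two valid prices, one missing value, one junk string.
raw = pd.Series(["$85.00", "$1,250.00", None, "n/a"])
parsed = parse_currency(raw)
print(parsed)
```

The trade-off is that bad values are silently turned into missing ones, so it may be worth counting `parsed.isna()` before and after cleaning.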

### Save cleaned listings dataset

In [None]:
listings.to_csv('listings_cleaned.csv', index=False)
print("Cleaned listings dataset saved.")

### Inspect Calendar Dataset

In [None]:
print("\nCalendar Dataset Info:")
calendar.info()
print("\nCalendar Dataset Sample:")
print(calendar.head())

### Clean Calendar Dataset

In [8]:
calendar['price'] = calendar['price'].replace(r'[\$,]', '', regex=True).astype(float)
calendar['available'] = calendar['available'].map({'t': True, 'f': False})
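
`Series.map` with a dictionary returns `NaN` for any value that has no key in the dictionary, so an unexpected flag (for example an uppercase `'T'`) would silently become missing. The sketch below shows one way to check for that, using a small made-up Series rather than the real `available` column.

```python
import pandas as pd

# Made-up flags, including one unexpected value and one genuinely missing value.
flags = pd.Series(["t", "f", "t", "T", None])
mapped = flags.map({"t": True, "f": False})

# Values that were present in the input but had no entry in the mapping.
unmapped = flags[mapped.isna() & flags.notna()]
print(unmapped.tolist())
```

An empty result would indicate that every non-missing flag was covered by the mapping.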

### Save cleaned calendar dataset

In [None]:
calendar.to_csv('calendar_cleaned.csv', index=False)
print("Cleaned calendar dataset saved.")

### Inspect Reviews Dataset

In [None]:
print("\nReviews Dataset Info:")
reviews.info()
print("\nReviews Dataset Sample:")
print(reviews.head())

### Clean Reviews Dataset

In [11]:
reviews.dropna(subset=['comments'], inplace=True)

### Save cleaned reviews dataset

In [None]:
reviews.to_csv('reviews_cleaned.csv', index=False)
print("Cleaned reviews dataset saved.")

## Step 2: Exploratory Data Analysis (EDA)

### Load cleaned datasets

In [13]:
listings = pd.read_csv('listings_cleaned.csv')
calendar = pd.read_csv('calendar_cleaned.csv')
reviews = pd.read_csv('reviews_cleaned.csv')

### Listings Dataset EDA

In [None]:
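# Only the last expression in a cell is displayed automatically,
# so both results are printed explicitly.
print("Listings Dataset Summary:")
print(listings.describe())

print("\nListings Dataset Column Types:")
print(listings.dtypes)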

In [None]:
plt.figure(figsize=(10, 6))
sns.histplot(listings['price'], bins=50, kde=True)
plt.title('Price Distribution of Listings (Before Outlier Removal)')
plt.xlabel('Price')
plt.ylabel('Frequency')
plt.show()

### Remove Outliers for Better Insights

In [None]:
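# Keep listings priced below the 95th percentile.
# .copy() avoids a SettingWithCopyWarning when columns are added to this frame later.
listings = listings[listings['price'] < listings['price'].quantile(0.95)].copy()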

plt.figure(figsize=(10, 6))
sns.histplot(listings['price'], bins=50, kde=True)
plt.title('Price Distribution of Listings (After Outlier Removal)')
plt.xlabel('Price')
plt.ylabel('Frequency')
plt.show()
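
To make the percentile cutoff concrete, the sketch below applies the same `quantile(0.95)` filter to ten made-up prices and compares it with the common 1.5 × IQR rule. The IQR rule is shown only as an alternative for comparison; it is not what the notebook uses.

```python
import pandas as pd

# Made-up prices with one extreme value.
prices = pd.Series([50, 60, 70, 80, 90, 100, 110, 120, 130, 1000], dtype=float)

# Percentile cutoff, as in the cell above.
cutoff = prices.quantile(0.95)
trimmed = prices[prices < cutoff]

# Alternative: flag values above Q3 + 1.5 * IQR.
q1, q3 = prices.quantile(0.25), prices.quantile(0.75)
upper = q3 + 1.5 * (q3 - q1)
iqr_trimmed = prices[prices <= upper]

print(f"95th percentile cutoff: {cutoff}, rows kept: {len(trimmed)}")
print(f"IQR upper fence: {upper}, rows kept: {len(iqr_trimmed)}")
```

Note that the percentile filter always removes roughly the top 5% of rows, whether or not they are unusual, while the IQR fence only removes values that sit far from the bulk of the data.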

### Room Type Distribution

In [None]:
plt.figure(figsize=(8, 5))
room_type_counts = listings['room_type'].value_counts()
room_type_counts.plot(kind='bar', color='skyblue')
plt.title('Room Type Distribution')
plt.xlabel('Room Type')
plt.ylabel('Count')
plt.show()

### Calendar Dataset EDA

In [None]:
print("Calendar Dataset Summary:")
calendar.describe()

### Availability Over Time

In [None]:
calendar['date'] = pd.to_datetime(calendar['date'])
availability_over_time = calendar.groupby('date')['available'].mean()
plt.figure(figsize=(12, 6))
availability_over_time.plot()
plt.title('Availability Over Time')
plt.xlabel('Date')
plt.ylabel('Proportion Available')
plt.show()
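
Daily availability can swing with the day of the week, which may hide the longer-term trend. One option, sketched below on made-up data, is to smooth the daily series with a 7-day rolling mean before plotting. The series `daily` is a hypothetical stand-in for `availability_over_time`.

```python
import pandas as pd

# Two made-up weeks: higher availability on weekdays, lower on weekends.
dates = pd.date_range("2016-01-04", periods=14, freq="D")
daily = pd.Series([0.6, 0.6, 0.6, 0.6, 0.6, 0.4, 0.4] * 2, index=dates)

# A 7-day window averages out the weekly cycle; the first 6 values are NaN.
smoothed = daily.rolling(window=7).mean()
print(smoothed.dropna().round(3))
```

Because the toy pattern repeats exactly every week, the smoothed series is flat; on real data, what remains after smoothing is the slower seasonal movement.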

### Reviews Dataset EDA

In [None]:
print("Reviews Dataset Summary:")
reviews.describe()

### Number of Reviews Over Time

In [None]:
reviews['date'] = pd.to_datetime(reviews['date'])
reviews_per_month = reviews.groupby(reviews['date'].dt.to_period('M')).size()
plt.figure(figsize=(12, 6))
reviews_per_month.plot()
plt.title('Number of Reviews Over Time')
plt.xlabel('Date')
plt.ylabel('Number of Reviews')
plt.show()

print("Exploratory Data Analysis Completed.")

## Step 3: Define Key Questions and Prepare Analysis

### Question 1: What factors influence the price of a listing?

In [None]:
correlation_features = ['price', 'bedrooms', 'bathrooms', 'accommodates']
correlation_matrix = listings[correlation_features].corr()

plt.figure(figsize=(8, 6))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt='.2f')
plt.title('Correlation Matrix of Features Influencing Price')
plt.show()
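
`DataFrame.corr()` computes Pearson correlation by default, which measures linear association and can be pulled around by a few very expensive listings. A rank-based (Spearman) correlation is a common complement. The sketch below computes it as the Pearson correlation of the ranked columns, on a made-up five-row frame with one extreme price.

```python
import pandas as pd

# Made-up data: price rises with bedrooms, with one extreme value.
toy = pd.DataFrame({
    "price": [50, 75, 100, 150, 900],
    "bedrooms": [1, 1, 2, 3, 4],
})

pearson = toy.corr()          # linear association on raw values
spearman = toy.rank().corr()  # Spearman: Pearson correlation of the ranks

print("Pearson: ", round(pearson.loc["price", "bedrooms"], 3))
print("Spearman:", round(spearman.loc["price", "bedrooms"], 3))
```

In this toy case the rank-based value is higher because the relationship is monotonic but not linear. Whether the same holds for the Seattle listings would need to be checked on the real data.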

### Question 2: Are there seasonal trends in Airbnb bookings?

In [None]:
calendar['month'] = calendar['date'].dt.month
monthly_availability = calendar.groupby('month')['available'].mean()

plt.figure(figsize=(10, 6))
monthly_availability.plot(kind='bar', color='orange')
plt.title('Average Availability by Month')
plt.xlabel('Month')
plt.ylabel('Average Availability')
plt.show()
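
Availability is only a proxy for bookings: a date can be unavailable because it was booked or because the host blocked it. With that caveat, the sketch below shows how the monthly series could be turned into a rough occupancy proxy and how to pick out the month with the lowest availability. The values are made up and stand in for `monthly_availability`.

```python
import pandas as pd

# Made-up average availability for three months (index = month number).
monthly_toy = pd.Series({1: 0.50, 2: 0.70, 3: 0.60}, name="available")

# Share of listing-days that were NOT available (booked or blocked).
occupancy_proxy = 1 - monthly_toy

# Month with the lowest availability, i.e. the highest occupancy proxy.
busiest_month = monthly_toy.idxmin()
print(occupancy_proxy)
print("Lowest availability in month:", busiest_month)
```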

### Question 3: How do customer reviews impact a listing's success?

In [None]:
reviews_per_listing = reviews.groupby('listing_id').size()
listings['total_reviews'] = listings['id'].map(reviews_per_listing)

plt.figure(figsize=(10, 6))
sns.scatterplot(data=listings, x='total_reviews', y='review_scores_rating', alpha=0.6)
plt.title('Relationship Between Total Reviews and Review Scores')
plt.xlabel('Total Reviews')
plt.ylabel('Review Scores Rating')
plt.show()
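
Mapping review counts onto listing IDs leaves `NaN` for any listing that has no rows in the reviews table, so those listings drop out of the scatter plot and of `describe()`. If zero-review listings should be counted as zero instead, one approach is sketched below with made-up IDs; `listings_toy` and `reviews_toy` are hypothetical stand-ins for the real frames.

```python
import pandas as pd

# Made-up data: listing 102 has no reviews.
listings_toy = pd.DataFrame({"id": [101, 102, 103]})
reviews_toy = pd.DataFrame({"listing_id": [101, 101, 103]})

counts = reviews_toy.groupby("listing_id").size()

# map() yields NaN for IDs missing from counts; fill those with 0.
listings_toy["total_reviews"] = (
    listings_toy["id"].map(counts).fillna(0).astype(int)
)
print(listings_toy)
```

Whether to fill with zero depends on the question: for a plot of rating against review count, listings without reviews have no rating either, so leaving them as missing is also defensible.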

print("Key questions analyzed and visualized.")

## Step 4: Evaluate Results and Generate Insights

### Evaluation for Question 1

In [None]:
print("\nEvaluation for Question 1:")
print("Correlation Matrix:")
correlation_matrix

### Evaluation for Question 2

In [None]:
print("\nEvaluation for Question 2:")
seasonal_trends = monthly_availability.describe()
print("Monthly Availability Summary:")
seasonal_trends

### Evaluation for Question 3

In [None]:
print("\nEvaluation for Question 3:")
review_stats = listings[['total_reviews', 'review_scores_rating']].describe()
print("Review Statistics:")
review_stats

In [None]:
insights = """
Key Insights:
1. Price is moderately correlated with the number of bedrooms and with guest capacity (accommodates), indicating that larger properties tend to be more expensive.
2. Seasonal trends show lower availability in peak months, suggesting higher demand.
3. Listings with more reviews generally maintain higher review scores, but factors not examined in this notebook, such as amenities and neighborhood, likely also play a role.
4. Hosts can focus on improving property features like amenities to increase listing appeal.
"""
print(insights)
