# Airbnb EDA Project - Albany, New York


## Project Description
This project analyzes Airbnb listings in Albany, New York using Exploratory Data Analysis (EDA).
The goal is to uncover trends in price, availability, location, and property type to provide valuable insights for decision-making.


In [None]:

# Import necessary libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the dataset
df = pd.read_csv('listings.csv')

# Display the first few rows of the dataset
df.head()
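Some Airbnb exports store `price` as text such as `$1,250.00`, in which case `df['price'].mean()` would fail. Whether that applies to this `listings.csv` is an assumption; checking `df['price'].dtype` settles it. A minimal sketch of the conversion, using made-up sample values and a hypothetical helper named `to_numeric_price`:

```python
import pandas as pd

# Made-up sample values that mimic a text-formatted price column.
raw = pd.DataFrame({"price": ["$1,250.00", "$85.00", None, "99"]})


def to_numeric_price(series: pd.Series) -> pd.Series:
    """Strip '$' and ',' and convert to float; unparseable values become NaN."""
    cleaned = series.astype(str).str.replace(r"[$,]", "", regex=True)
    return pd.to_numeric(cleaned, errors="coerce")


raw["price"] = to_numeric_price(raw["price"])
print(raw["price"].tolist())
```

If the column is already numeric, the helper leaves the values unchanged apart from casting them to float.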


## Data Cleaning

In [None]:

# Count missing values per column (printed because only a cell's last expression is displayed)
print(df.isnull().sum())

# Handle missing values by filling or dropping
df.fillna({'price': df['price'].mean(), 'availability_365': 0}, inplace=True)  # Example filling missing values

# Remove duplicates
df.drop_duplicates(inplace=True)

# Keep only listings with a positive price
df = df[df['price'] > 0]  # Drops rows where price is zero or negative
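The `price > 0` filter removes invalid prices but not extreme high ones. One common option, shown here only as a sketch and not as the method used in this notebook, is an interquartile-range (IQR) fence. The sample values and the 1.5 multiplier are assumptions for illustration:

```python
import pandas as pd

# Made-up nightly prices with one extreme value.
prices = pd.Series([40, 55, 60, 75, 80, 90, 110, 5000], name="price")

# Interquartile range and the conventional upper fence.
q1, q3 = prices.quantile([0.25, 0.75])
iqr = q3 - q1
upper = q3 + 1.5 * iqr

# Keep positive prices that fall at or below the upper fence.
kept = prices[(prices > 0) & (prices <= upper)]
print(f"upper fence: {upper}, rows kept: {len(kept)} of {len(prices)}")
```

Whether a high price is an error or a genuine luxury listing is a judgment call, so it may be safer to inspect the flagged rows before dropping them.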


## Exploratory Data Analysis (EDA)

### 1. Price Distribution by Neighbourhood


**Question:** How does the price vary across different neighbourhoods in Albany?
**Insight:** A boxplot per neighbourhood shows the median, spread, and outliers of prices in each area. Higher prices in certain areas could indicate better amenities or location advantages.


In [None]:

# Price distribution by neighbourhood
plt.figure(figsize=(10, 6))
sns.boxplot(x='neighbourhood', y='price', data=df)
plt.xticks(rotation=90)
plt.title('Price Distribution by Neighbourhood')
plt.show()
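A numeric summary can complement the boxplot. The sketch below uses a made-up sample with the same `neighbourhood` and `price` column names to compute the listing count and median price per neighbourhood, sorted from most to least expensive:

```python
import pandas as pd

# Made-up sample; the real notebook would use df instead.
sample = pd.DataFrame({
    "neighbourhood": ["A", "A", "B", "B", "B", "C"],
    "price": [100, 140, 60, 80, 70, 200],
})

# Listing count and median price per neighbourhood, highest median first.
summary = (
    sample.groupby("neighbourhood")["price"]
    .agg(listings="count", median_price="median")
    .sort_values("median_price", ascending=False)
)
order = summary.index.tolist()
print(summary)
```

The resulting `order` list can be passed to `sns.boxplot(..., order=order)` so the boxes appear sorted by median price. Neighbourhoods with very few listings deserve caution, since their medians are unstable.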


### 2. Price Distribution by Property Type


**Question:** What is the price distribution for different property types? Does property type significantly affect the price?
**Insight:** This graph will allow us to identify how different property types (e.g., apartment, house, etc.) vary in price. Price alone does not measure demand, but it shows which property types command higher prices in Albany.


In [None]:

# Price distribution by property type
plt.figure(figsize=(10, 6))
sns.boxplot(x='property_type', y='price', data=df)
plt.xticks(rotation=45)
plt.title('Price Distribution by Property Type')
plt.show()
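If `property_type` has many rare categories, the boxplot can become hard to read. One possible approach, sketched here with made-up values and a hypothetical threshold `min_listings`, is to group infrequent types under a single "Other" label before plotting:

```python
import pandas as pd

# Made-up property types; the real notebook would use df['property_type'].
types = pd.Series(
    ["Entire home", "Entire home", "Entire home",
     "Private room", "Private room", "Treehouse", "Boat"],
    name="property_type",
)

min_listings = 2  # hypothetical threshold for a "common" type

# Types with at least min_listings rows stay; the rest become "Other".
counts = types.value_counts()
common = counts[counts >= min_listings].index
grouped = types.where(types.isin(common), "Other")
print(grouped.value_counts())
```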


### 3. Correlation Analysis


**Question:** What are the correlations between price, availability, and review scores? Can we infer any relationships?
**Insight:** The correlation matrix helps us understand how variables such as price, availability, and review scores are related. A strong positive correlation between price and review scores would suggest that higher-rated listings tend to be priced higher, although correlation alone does not establish cause.


In [None]:

# Extended correlation matrix
corr_matrix = df[['price', 'availability_365', 'number_of_reviews', 'review_scores_rating']].corr()
plt.figure(figsize=(8, 6))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt='.2f')
plt.title('Correlation Matrix (Extended)')
plt.show()
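`DataFrame.corr()` defaults to the Pearson coefficient, which measures linear association and is sensitive to extreme prices. A rank-based (Spearman) coefficient is a reasonable companion for skewed data. The sketch below compares the two on made-up values where price falls steadily, but not linearly, as availability rises:

```python
import pandas as pd

# Made-up values: a perfectly monotonic but non-linear relationship.
sample = pd.DataFrame({
    "price": [50, 80, 120, 200, 1000],
    "availability_365": [300, 250, 200, 100, 10],
})

pearson = sample.corr(method="pearson")
spearman = sample.corr(method="spearman")

print("Pearson: ", round(pearson.loc["price", "availability_365"], 3))
print("Spearman:", round(spearman.loc["price", "availability_365"], 3))
```

A large gap between the two coefficients hints that outliers or a non-linear relationship are driving the Pearson value.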


### 4. Price vs Availability


**Question:** Is there any correlation between price and availability? Does a higher availability impact the price?
**Insight:** A scatter plot between price and availability will give insights into whether availability of listings influences their pricing. If higher availability is linked to lower prices, it could suggest competitive pricing in less popular listings.


In [None]:

# Scatter plot: Price vs Availability_365
plt.figure(figsize=(10, 6))
sns.scatterplot(x='availability_365', y='price', data=df, alpha=0.6)
plt.title('Price vs Availability (365 days)')
plt.xlabel('Availability (365 days)')
plt.ylabel('Price')
plt.show()
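Scatter plots with many overlapping points can hide a trend. As a rough complement, availability can be binned and the median price compared per bin. The sample data and the bin edges below are assumptions chosen for illustration:

```python
import pandas as pd

# Made-up sample; the real notebook would use df.
sample = pd.DataFrame({
    "availability_365": [0, 30, 90, 120, 200, 300, 365],
    "price": [150, 130, 120, 100, 95, 80, 70],
})

# Four availability bands of roughly a quarter year each.
bins = [0, 90, 180, 270, 365]
sample["availability_band"] = pd.cut(
    sample["availability_365"], bins=bins, include_lowest=True
)

# Median price within each band.
median_by_band = (
    sample.groupby("availability_band", observed=True)["price"].median()
)
print(median_by_band)
```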


### 5. Combined Price Distribution by Neighbourhood and Property Type


**Question:** How does the price distribution vary when we consider both neighbourhood and property type together?
**Insight:** This combined boxplot shows how neighbourhood and property type together relate to listing prices. This can help identify neighbourhoods where particular property types command higher prices.


In [None]:

# Combined Price Distribution by Neighbourhood and Property Type
plt.figure(figsize=(14, 8))
sns.boxplot(x='neighbourhood', y='price', hue='property_type', data=df)
plt.xticks(rotation=90)
plt.title('Price Distribution by Neighbourhood and Property Type')
plt.show()
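With many neighbourhoods and property types, a grouped boxplot can get crowded. A pivot table of median prices is a compact alternative. The sketch below uses made-up rows with the same column names and also reports the count behind each cell:

```python
import pandas as pd

# Made-up sample; the real notebook would use df.
sample = pd.DataFrame({
    "neighbourhood": ["A", "A", "A", "B", "B"],
    "property_type": ["House", "House", "Apartment", "Apartment", "Apartment"],
    "price": [100, 200, 80, 60, 90],
})

# Median price for each neighbourhood / property type combination.
medians = sample.pivot_table(
    index="neighbourhood", columns="property_type",
    values="price", aggfunc="median",
)

# Number of listings behind each median; missing combinations are NaN.
counts = sample.pivot_table(
    index="neighbourhood", columns="property_type",
    values="price", aggfunc="count",
)
print(medians)
print(counts)
```

The `medians` table can also be passed to `sns.heatmap` for a visual overview.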


### 6. Seasonality of Availability


**Question:** Does availability show seasonal trends? Is there a month with high or low availability?
**Insight:** The listings file has no monthly availability figures, so we group `availability_365` by the month of each listing's last review as a rough proxy. Seasonal trends could suggest periods of higher tourist demand and help in pricing strategies, but this proxy should be interpreted with caution.


In [None]:

# Derive a 'month' column from the last review date as a rough proxy for seasonality
# (listings without a review date get NaN)
df['month'] = pd.to_datetime(df['last_review']).dt.month

# Plot availability by month
plt.figure(figsize=(10, 6))
sns.boxplot(x='month', y='availability_365', data=df)
plt.title('Availability by Month')
plt.xlabel('Month')
plt.ylabel('Availability (365 days)')
plt.show()
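Because `availability_365` is a single yearly figure per listing, a per-day calendar table would give a more direct view of seasonality. The sketch below assumes such a table exists with columns `listing_id`, `date`, and `available` (coded `'t'`/`'f'`). That layout is an assumption, not something this notebook loads, and the rows are made up:

```python
import pandas as pd

# Made-up rows in the assumed calendar layout.
calendar = pd.DataFrame({
    "listing_id": [1, 1, 1, 2, 2, 2],
    "date": ["2024-01-10", "2024-01-11", "2024-07-04",
             "2024-01-10", "2024-07-04", "2024-07-05"],
    "available": ["t", "f", "f", "t", "t", "f"],
})

calendar["date"] = pd.to_datetime(calendar["date"])
calendar["is_available"] = calendar["available"].eq("t")

# Share of listing-days marked available in each calendar month.
monthly_share = (
    calendar.groupby(calendar["date"].dt.month)["is_available"].mean()
)
print(monthly_share)
```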
