






<h2 id='part1'>A Look at the Data</h2>

Using the Seattle Airbnb dataset from Kaggle (https://www.kaggle.com/datasets/airbnb/seattle/data), we explore three business-oriented questions:

1. What are the seasonal trends in pricing across Seattle?
2. What factors impact a listing's average rating or occupancy?
3. Which neighborhoods or property types have the most bookings?

<h2 id='part2'>About the Dataset</h2>

<h3 id='Context'>Context</h3>

Since 2008, guests and hosts have used Airbnb to travel in a more unique, personalized way. As part of the Inside Airbnb initiative, this dataset describes the listing activity of homestays in Seattle, WA.

<h3 id='Content'>Content</h3>
The following Airbnb activity is included in this Seattle dataset:

- Listings, including full descriptions and average review score
- Reviews, including a unique id for each reviewer and detailed comments
- Calendar, including listing id and the price and availability for that day



In [1]:

import pandas as pd

# Load datasets
listings = pd.read_csv('listings.csv')
reviews = pd.read_csv('reviews.csv')
calendar = pd.read_csv('calendar.csv')

# View the first few rows of each dataset
display(listings.head())
display(reviews.head())
display(calendar.head())


Unnamed: 0,id,listing_url,scrape_id,last_scraped,name,summary,space,description,experiences_offered,neighborhood_overview,...,review_scores_value,requires_license,license,jurisdiction_names,instant_bookable,cancellation_policy,require_guest_profile_picture,require_guest_phone_verification,calculated_host_listings_count,reviews_per_month
0,241032,https://www.airbnb.com/rooms/241032,20160104002432,2016-01-04,Stylish Queen Anne Apartment,,Make your self at home in this charming one-be...,Make your self at home in this charming one-be...,none,,...,10.0,f,,WASHINGTON,f,moderate,f,f,2,4.07
1,953595,https://www.airbnb.com/rooms/953595,20160104002432,2016-01-04,Bright & Airy Queen Anne Apartment,Chemically sensitive? We've removed the irrita...,"Beautiful, hypoallergenic apartment in an extr...",Chemically sensitive? We've removed the irrita...,none,"Queen Anne is a wonderful, truly functional vi...",...,10.0,f,,WASHINGTON,f,strict,t,t,6,1.48
2,3308979,https://www.airbnb.com/rooms/3308979,20160104002432,2016-01-04,New Modern House-Amazing water view,New modern house built in 2013. Spectacular s...,"Our house is modern, light and fresh with a wa...",New modern house built in 2013. Spectacular s...,none,Upper Queen Anne is a charming neighborhood fu...,...,10.0,f,,WASHINGTON,f,strict,f,f,2,1.15
3,7421966,https://www.airbnb.com/rooms/7421966,20160104002432,2016-01-04,Queen Anne Chateau,A charming apartment that sits atop Queen Anne...,,A charming apartment that sits atop Queen Anne...,none,,...,,f,,WASHINGTON,f,flexible,f,f,1,
4,278830,https://www.airbnb.com/rooms/278830,20160104002432,2016-01-04,Charming craftsman 3 bdm house,Cozy family craftman house in beautiful neighb...,Cozy family craftman house in beautiful neighb...,Cozy family craftman house in beautiful neighb...,none,We are in the beautiful neighborhood of Queen ...,...,9.0,f,,WASHINGTON,f,strict,f,f,1,0.89


Unnamed: 0,listing_id,id,date,reviewer_id,reviewer_name,comments
0,7202016,38917982,2015-07-19,28943674,Bianca,Cute and cozy place. Perfect location to every...
1,7202016,39087409,2015-07-20,32440555,Frank,Kelly has a great room in a very central locat...
2,7202016,39820030,2015-07-26,37722850,Ian,"Very spacious apartment, and in a great neighb..."
3,7202016,40813543,2015-08-02,33671805,George,Close to Seattle Center and all it has to offe...
4,7202016,41986501,2015-08-10,34959538,Ming,Kelly was a great host and very accommodating ...


Unnamed: 0,listing_id,date,available,price
0,241032,2016-01-04,t,$85.00
1,241032,2016-01-05,t,$85.00
2,241032,2016-01-06,f,
3,241032,2016-01-07,f,
4,241032,2016-01-08,f,


In [2]:
print(f'Listings shape: {listings.shape}')
print(f'Reviews shape: {reviews.shape}')
print(f'Calendar shape: {calendar.shape}')

# Check for data types and null values
listings.info()
reviews.info()
calendar.info()

Listings shape: (3818, 92)
Reviews shape: (84849, 6)
Calendar shape: (1393570, 4)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3818 entries, 0 to 3817
Data columns (total 92 columns):
id                                  3818 non-null int64
listing_url                         3818 non-null object
scrape_id                           3818 non-null int64
last_scraped                        3818 non-null object
name                                3818 non-null object
summary                             3641 non-null object
space                               3249 non-null object
description                         3818 non-null object
experiences_offered                 3818 non-null object
neighborhood_overview               2786 non-null object
notes                               2212 non-null object
transit                             2884 non-null object
thumbnail_url                       3498 non-null object
medium_url                          3498 non-null object
picture_url    

In [3]:
# Remove dollar signs and thousands separators, then convert `price` to numeric.
# The raw string avoids an invalid-escape warning for `\$`.
calendar['price'] = calendar['price'].replace(r'[\$,]', '', regex=True).astype(float)

# Check for missing values in `calendar`
calendar.isna().sum()


listing_id         0
date               0
available          0
price         459028
dtype: int64
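
The sample rows of `calendar` show an empty `price` wherever `available` is `f`, which suggests the 459,028 missing prices are unavailable days rather than data-entry gaps. A minimal sketch of how that could be checked, using a small hand-made frame in place of `calendar.csv` (the name `toy_calendar` and its values are illustrative, not taken from the dataset). It also treats the share of unavailable days as a rough occupancy proxy, which is an assumption, since a host can block dates without a booking:

```python
import pandas as pd

# Toy stand-in for calendar.csv (illustrative values only)
toy_calendar = pd.DataFrame({
    'listing_id': [1, 1, 1, 2, 2, 2],
    'date': ['2016-01-04', '2016-01-05', '2016-01-06'] * 2,
    'available': ['t', 't', 'f', 'f', 'f', 't'],
    'price': [85.0, 85.0, None, None, None, 120.0],
})

# Cross-tabulate availability against whether a price is missing
price_missing = toy_calendar['price'].isna()
print(pd.crosstab(toy_calendar['available'], price_missing))

# True only if missing prices and unavailable days coincide on every row
is_unavailable = toy_calendar['available'] == 'f'
missing_matches_unavailable = bool((price_missing == is_unavailable).all())
print(missing_matches_unavailable)

# Share of unavailable days per listing (rough occupancy proxy; an assumption)
occupancy_proxy = is_unavailable.groupby(toy_calendar['listing_id']).mean()
print(occupancy_proxy)
```

If the same check holds on the real `calendar`, the missing prices can be left as `NaN`: `mean()` skips them, so price averages describe available days only.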

In [5]:
# Count the number of reviews per listing.
# Mapping onto `id` (rather than merging) keeps this cell safe to re-run: a repeated merge
# would produce suffixed `review_count_x` / `review_count_y` columns instead of `review_count`.
review_counts = reviews.groupby('listing_id').size()
listings['review_count'] = listings['id'].map(review_counts).fillna(0).astype(int)

In [7]:
# Fill missing values; assigning the result back avoids chained `inplace` updates on a column
listings['review_scores_value'] = listings['review_scores_value'].fillna(listings['review_scores_value'].median())
listings['neighborhood_overview'] = listings['neighborhood_overview'].fillna('No overview')

In [8]:
# Convert `date` in `calendar.csv` to datetime
calendar['date'] = pd.to_datetime(calendar['date'])
calendar['month'] = calendar['date'].dt.month

# Average price per month
monthly_price_trend = calendar.groupby('month')['price'].mean()

import matplotlib.pyplot as plt

plt.figure(figsize=(10, 5))
monthly_price_trend.plot(kind='line', marker='o')
plt.title('Average Monthly Price Trend')
plt.xlabel('Month')
plt.ylabel('Average Price ($)')
plt.grid(True)
plt.show()

[Figure: line plot "Average Monthly Price Trend"; x-axis: Month, y-axis: Average Price ($)]
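
Averaging over calendar rows gives more weight to listings that have many priced days in a month. One possible refinement, sketched here on toy data (the frame `toy_calendar` and its values are hypothetical), is to average within each listing first and then across listings, so that every listing counts once per month:

```python
import pandas as pd

# Toy stand-in for the cleaned calendar (illustrative values only)
toy_calendar = pd.DataFrame({
    'listing_id': [1, 1, 1, 2, 1, 2],
    'date': pd.to_datetime(['2016-01-04', '2016-01-05', '2016-01-06',
                            '2016-01-04', '2016-07-01', '2016-07-01']),
    'price': [100.0, 100.0, 100.0, 200.0, 150.0, 250.0],
})
toy_calendar['month'] = toy_calendar['date'].dt.month

# Row-level mean: every priced day counts once
row_level = toy_calendar.groupby('month')['price'].mean()

# Listing-level mean: average per listing and month, then across listings
listing_level = (
    toy_calendar.groupby(['month', 'listing_id'])['price'].mean()
    .groupby(level='month').mean()
)

print(pd.DataFrame({'row_level': row_level, 'listing_level': listing_level}))
```

In the toy data the two measures differ in January, where listing 1 contributes three days and listing 2 only one. Which measure is preferable depends on whether the question is about the typical available night or the typical listing.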

In [9]:
import seaborn as sns

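# `price` in `listings` is not numeric (it is stored as a string such as "$85.00"),
# so convert it to a float before correlating or plotting
listings['price'] = pd.to_numeric(
    listings['price'].astype(str).str.replace(r'[\$,]', '', regex=True), errors='coerce'
)

# Correlation analysis for numerical fields (requires `review_count` from the earlier cell)
numeric_cols = ['review_scores_value', 'price', 'review_count']
sns.heatmap(listings[numeric_cols].corr(), annot=True, cmap='coolwarm')
plt.title('Correlation Between Review Score and Other Factors')
plt.show()

# Scatter plot for price vs. review scores
plt.figure(figsize=(10, 5))
sns.scatterplot(data=listings, x='price', y='review_scores_value')
plt.title('Price vs Review Score')
plt.xlabel('Price ($)')
plt.ylabel('Review Score')
plt.grid(True)
plt.show()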
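
Review scores are bounded and prices tend to be skewed, so a rank-based correlation may be a useful complement to the default Pearson coefficient. A minimal sketch on toy data (the frame `toy_listings` and its values are made up for illustration); Spearman's coefficient is computed here as Pearson's correlation on ranks, which needs nothing beyond pandas:

```python
import pandas as pd

# Toy stand-in for the numeric listing columns (illustrative values only)
toy_listings = pd.DataFrame({
    'price': [50.0, 80.0, 100.0, 150.0, 400.0],
    'review_scores_value': [10.0, 9.0, 9.0, 8.0, 7.0],
})

# Pearson: sensitive to the one very expensive listing
pearson = toy_listings['price'].corr(toy_listings['review_scores_value'])

# Spearman: Pearson correlation of the ranks (ties get average ranks)
ranked = toy_listings.rank()
spearman = ranked['price'].corr(ranked['review_scores_value'])

print(f'Pearson:  {pearson:.3f}')
print(f'Spearman: {spearman:.3f}')
```

Note that filling missing review scores with the median before correlating pulls the coefficient toward zero; dropping those rows for this step is an alternative worth comparing.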

In [10]:
# Group by neighborhood and calculate average price and review count.
# `price` must be numeric for the mean; coerce it in case it is still a "$85.00"-style string.
neighborhood_stats = (
    listings.assign(price=pd.to_numeric(
        listings['price'].astype(str).str.replace(r'[\$,]', '', regex=True), errors='coerce'))
    .groupby('neighbourhood_cleansed')
    .agg({'price': 'mean', 'review_count': 'mean'})
    .reset_index()
)

# Keep the ten neighborhoods with the highest average review count
top_neighborhoods = neighborhood_stats.sort_values(by='review_count', ascending=False).head(10)

plt.figure(figsize=(12, 6))
sns.barplot(data=top_neighborhoods, x='review_count', y='neighbourhood_cleansed')
plt.title('Top 10 Neighborhoods by Average Number of Reviews')
plt.xlabel('Average Number of Reviews')
plt.ylabel('Neighborhood')
plt.grid(True)
plt.show()
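
The third question also asks about property types. A sketch of the same aggregation by property type, on toy data: it assumes `listings` has a `property_type` column (not visible in the truncated output above) and, like the neighborhood analysis, uses review counts as a stand-in for bookings. The frame `toy_listings` and its values are hypothetical. Including the number of listings per group makes it easier to spot averages that rest on very few listings:

```python
import pandas as pd

# Toy stand-in for listings (illustrative values only)
toy_listings = pd.DataFrame({
    'neighbourhood_cleansed': ['Belltown', 'Belltown', 'Fremont', 'Fremont', 'Fremont'],
    'property_type': ['Apartment', 'Apartment', 'House', 'House', 'Apartment'],
    'review_count': [40, 20, 10, 30, 5],
})

# Named aggregation: group size and average review count per property type
type_stats = (
    toy_listings.groupby('property_type')
    .agg(listings=('review_count', 'size'),
         avg_reviews=('review_count', 'mean'))
    .sort_values('avg_reviews', ascending=False)
)

# Drop groups with too few listings to give a stable average (threshold is arbitrary)
min_listings = 2
type_stats = type_stats[type_stats['listings'] >= min_listings]
print(type_stats)
```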

<h2 id='part3'>Key Insights and Conclusions</h2>

1. **Seasonal Price Trends**: Prices in Seattle tend to peak in the summer months, suggesting higher demand in warmer weather.
2. **Factors Affecting Ratings**: Higher prices correlate slightly with lower ratings, possibly because of raised expectations or perceived value.
3. **Popular Neighborhoods**: Listings in areas such as Capitol Hill and Downtown Seattle have higher review counts, indicating popularity.