# Lab Instructions

Create 3 visualizations from a spatial and time-series dataset of your choice.  Describe your dataset including where it came from and the features it contains.  Each visualization should be accompanied by at least 1 - 2 sentences explaining how the features do (or do not!) change over time and throughout space.

## Dataset Description

For this lab, I'm using the **Airbnb listings dataset for the United States from 2023**. This dataset comes from Inside Airbnb (http://insideairbnb.com/), a project that provides data about Airbnb's impact on residential communities.

**Dataset Features:**
- `latitude` and `longitude`: Geographic coordinates of listings (spatial component)
- `price`: Nightly rental price in USD
- `neighbourhood`: Neighborhood name
- `room_type`: Type of accommodation (Entire home/apt, Private room, Shared room, Hotel room)
- `minimum_nights`: Minimum stay requirement
- `number_of_reviews`: Total reviews received
- `last_review`: Date of most recent review (time component)
- `reviews_per_month`: Average monthly review rate
- `availability_365`: Number of days available per year

This dataset suits spatial and time-series analysis: the geographic coordinates show WHERE listings are located across the US, and the review dates show WHEN each listing was last reviewed. Together they let us explore how Airbnb listings are distributed geographically and how their activity patterns change over time.

In [12]:
# Import required libraries
import plotly.express as px
import pandas as pd

# Load the Airbnb US dataset
df = pd.read_csv('../Lecture/Week 4/assets/AB_US_2023.csv')

# Convert last_review to datetime
df['last_review'] = pd.to_datetime(df['last_review'], errors='coerce')

# Clean price data: keep listings with 0 < price < 1000 USD (crude outlier filter)
df = df[(df['price'] > 0) & (df['price'] < 1000)]

# Display basic information
print(f"Dataset shape: {df.shape}")
print(f"\nColumn names:")
print(df.columns.tolist())
print(f"\nFirst few rows:")
print(df.head())
print(f"\nRoom types: {df['room_type'].unique()}")
print(f"\nPrice range: ${df['price'].min():.2f} - ${df['price'].max():.2f}")
print(f"\nDate range for reviews: {df['last_review'].min()} to {df['last_review'].max()}")

Dataset shape: (226084, 18)

Column names:
['id', 'name', 'host_id', 'host_name', 'neighbourhood_group', 'neighbourhood', 'latitude', 'longitude', 'room_type', 'price', 'minimum_nights', 'number_of_reviews', 'last_review', 'reviews_per_month', 'calculated_host_listings_count', 'availability_365', 'number_of_reviews_ltm', 'city']

First few rows:
     id                                               name  host_id  \
0   958              Bright, Modern Garden Unit - 1BR/1BTH     1169   
1  5858                                 Creative Sanctuary     8904   
2  8142  Friendly Room Apt. Style -UCSF/USF - San Franc...    21994   
3  8339                    Historic Alamo Square Victorian    24215   
4  8739                Mission Sunshine, with Private Bath     7149   

          host_name neighbourhood_group     neighbourhood  latitude  \
0             Holly                 NaN  Western Addition  37.77028   
1  Philip And Tania                 NaN    Bernal Heights  37.74474   
2           


Columns (4) have mixed types. Specify dtype option on import or set low_memory=False.
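
The warning above is pandas' `DtypeWarning`; column index 4 corresponds to `neighbourhood_group` in the column list printed earlier. One way to address it is to declare that column's type when reading the file. The sketch below assumes the column should be treated as text and uses a made-up inline CSV in place of the real file, so the values are illustrative only.

```python
import io
import pandas as pd

# Made-up two-row sample standing in for the real CSV (illustration only)
csv_text = (
    "id,neighbourhood_group,price\n"
    "1,,202\n"
    "2,Manhattan,235\n"
)

# Declaring the dtype up front means pandas does not have to infer it chunk by chunk
sample = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"neighbourhood_group": "string"},
)

print(sample.dtypes)
```

Passing `low_memory=False`, as the warning suggests, is the other option; it lets pandas infer one type from the whole column at the cost of more memory.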



## Visualization 1: Geographic Distribution of Airbnb Listings by Price (Spatial)

This scatter map shows the spatial distribution of Airbnb listings across the United States, with color indicating price levels and size representing the number of reviews (popularity). This reveals WHERE expensive vs. affordable listings are concentrated geographically.

In [13]:
# Clean data for mapping
df_map = df.dropna(subset=['latitude', 'longitude', 'price']).copy()

# Sample data for better performance (take representative sample)
df_sample = df_map.sample(n=min(5000, len(df_map)), random_state=42)

# Calculate center coordinates
center_lat = df_sample['latitude'].mean()
center_lon = df_sample['longitude'].mean()

# Create scatter map
fig = px.scatter_map(
    df_sample,
    lat='latitude',
    lon='longitude',
    color='price',
    size='number_of_reviews',
    hover_name='name',
    hover_data=['neighbourhood', 'room_type', 'price'],
    color_continuous_scale='Viridis',
    size_max=15,
    zoom=3,
    map_style='open-street-map',
    center={'lat': center_lat, 'lon': center_lon},
    height=600,
    title='US Airbnb Listings: Geographic Price Distribution'
)

fig.show()

**Analysis:** The map reveals clear geographic patterns in pricing as of 2023, with major coastal cities and tourist destinations showing dense clusters of expensive listings (bright yellow/green colors), particularly in areas like New York, Los Angeles, San Francisco, and Miami. In contrast, listings in the interior and rural areas of the country show lower prices (darker purple colors), demonstrating how location and proximity to major metropolitan areas strongly influence Airbnb pricing across space.
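
A map of sampled points is easier to trust when a summary table backs it up. The sketch below shows one way to compare typical prices by city using the dataset's `city` and `price` column names. The rows are made up for illustration, and the median is my choice here, not part of the original analysis.

```python
import pandas as pd

# Made-up listings (illustration only); `city` and `price` mirror the dataset's column names
listings = pd.DataFrame({
    "city": ["A", "A", "A", "B", "B", "C"],
    "price": [300, 250, 900, 90, 110, 150],
})

# Median is less sensitive than the mean to a few very expensive listings
city_prices = (
    listings.groupby("city")["price"]
    .agg(median_price="median", listings="count")
    .sort_values("median_price", ascending=False)
)
print(city_prices)
```

Running the same aggregation on `df` would show whether the cities that look brightest on the map are also the ones with the highest median price.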

## Visualization 2: Review Activity Over Time

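This time series counts listings by the month of their most recent review (`last_review`) from 2019 through 2023. Because the dataset is a 2023 snapshot, this is a proxy for review activity rather than a full review history, but it can still reveal temporal patterns and major disruptions in the travel industry.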

In [14]:
# Prepare time series data
df_reviews = df.dropna(subset=['last_review']).copy()

# Filter to recent years with good data
df_reviews = df_reviews[df_reviews['last_review'] >= '2019-01-01']

# Extract year-month for grouping
df_reviews['year_month'] = df_reviews['last_review'].dt.to_period('M').dt.to_timestamp()

# Count listings by the month of their most recent review
reviews_by_month = df_reviews.groupby('year_month').size().reset_index(name='review_count')

# Create line plot
fig = px.line(
    reviews_by_month,
    x='year_month',
    y='review_count',
    title='Airbnb Review Activity Over Time (2019-2023)',
    labels={'year_month': 'Date', 'review_count': 'Number of Reviews'},
    height=500
)

fig.update_layout(
    xaxis_title='Date',
    yaxis_title='Listings (by month of last review)',
    hovermode='x unified'
)

fig.show()

**Analysis:** The chart covers 2019 through 2023. Activity looks fairly steady before 2020, then drops sharply in mid-2020, which is consistent with COVID lockdowns and travel restrictions. The recovery after 2020 is gradual and appears to take a couple of years. The seasonal ups and downs suggest that guests travel more in certain months, probably during the summer. One caveat: `last_review` records only each listing's most recent review, so the chart counts listings by when they were last reviewed rather than all reviews written in each month, and listings that are still active are counted in their latest month regardless of their earlier history. Even so, the visualization shows how an external event such as a pandemic can disrupt travel and hospitality trends over time.
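
To see how counting by `last_review` differs from counting every review, the sketch below compares the two on a tiny review log. The log is made up for illustration, and a per-review table with `listing_id` and `date` columns is an assumption, since the file used in this lab has one row per listing.

```python
import pandas as pd

# Made-up review log (illustration only): listing 1 is reviewed three times, listing 2 once
reviews = pd.DataFrame({
    "listing_id": [1, 1, 1, 2],
    "date": pd.to_datetime(["2019-07-04", "2021-07-10", "2023-03-01", "2019-07-20"]),
})

# Full activity: every review counts in the month it was written
all_reviews = reviews.groupby(reviews["date"].dt.to_period("M")).size()

# Snapshot view: only each listing's most recent review survives, as in `last_review`
last_review = reviews.groupby("listing_id")["date"].max()
snapshot = last_review.groupby(last_review.dt.to_period("M")).size()

print(all_reviews)
print(snapshot)
```

In the snapshot view, listing 1's 2019 and 2021 reviews disappear, so earlier months are undercounted for any listing that stayed active.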

## Visualization 3: Average Price by Neighborhood Over Time

This animated visualization combines spatial and temporal dimensions: it shows average Airbnb prices in the 20 most-listed neighborhoods, with one animation frame per year of each listing's most recent review.

In [15]:
# Prepare data with both time and location
df_spacetime = df.dropna(subset=['last_review', 'neighbourhood', 'latitude', 'longitude']).copy()

# Filter to recent years
df_spacetime = df_spacetime[df_spacetime['last_review'] >= '2019-01-01']

# Extract year from last_review
df_spacetime['year'] = df_spacetime['last_review'].dt.year

# Get top neighborhoods by number of listings
top_neighborhoods = df_spacetime['neighbourhood'].value_counts().head(20).index

# Filter to top neighborhoods
df_spacetime = df_spacetime[df_spacetime['neighbourhood'].isin(top_neighborhoods)]

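# Calculate average price, listing count, and centroid by neighborhood and year.
# Latitude/longitude are averaged rather than used as group keys; grouping on
# them would produce one group per listing instead of one per neighborhood.
price_trends = (
    df_spacetime
    .groupby(['neighbourhood', 'year'])
    .agg(
        latitude=('latitude', 'mean'),
        longitude=('longitude', 'mean'),
        avg_price=('price', 'mean'),
        listing_count=('id', 'count'),
    )
    .reset_index()
    .sort_values('year')  # keep animation frames in chronological order
)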

# Create animated scatter map
fig = px.scatter_map(
    price_trends,
    lat='latitude',
    lon='longitude',
    size='listing_count',
    color='avg_price',
    hover_name='neighbourhood',
    hover_data={'latitude': False, 'longitude': False, 'avg_price': ':.2f', 'listing_count': True},
    animation_frame='year',
    color_continuous_scale='Viridis',
    title='Average Airbnb Prices by Neighborhood Over Time',
    height=600,
    zoom=3
)

fig.update_layout(coloraxis_colorbar=dict(title="Avg Price ($)"))

fig.show()

**Analysis:** This animated map shows average prices across the most-listed neighborhoods for 2019 through 2023. Some neighborhoods stay expensive in every frame while others fluctuate more. Bubble size shows how many listings each area has, which indicates where listings are concentrated. In the pandemic years there may be shifts in where expensive listings are concentrated, possibly because travel preferences changed or hosts adjusted their pricing. One caveat: `price` is each listing's price in the 2023 snapshot and `year` is the year of its most recent review, so each frame groups listings by how recently they were reviewed rather than showing historical prices. With that in mind, the map combines the spatial element (where listings are located) with the temporal element (when they were last active) to give a picture of the market across space and time.
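
For a map like this, each neighborhood needs exactly one marker per year. The sketch below shows one way to get that: group by neighborhood and year only, and average the coordinates to place the marker. The rows and neighborhood names are made up for illustration; the column names follow the dataset.

```python
import pandas as pd

# Made-up listings (illustration only) using the dataset's column names
listings = pd.DataFrame({
    "neighbourhood": ["North", "North", "North", "South"],
    "year": [2022, 2022, 2023, 2023],
    "latitude": [37.76, 37.75, 37.76, 37.79],
    "longitude": [-122.42, -122.41, -122.42, -122.40],
    "price": [100.0, 200.0, 180.0, 250.0],
})

# One row per neighborhood and year: the marker sits at the mean coordinate
per_area = (
    listings.groupby(["neighbourhood", "year"], as_index=False)
    .agg(
        latitude=("latitude", "mean"),
        longitude=("longitude", "mean"),
        avg_price=("price", "mean"),
        listing_count=("price", "size"),
    )
    .sort_values("year")
)
print(per_area)
```

Adding `latitude` and `longitude` to the group keys would instead create one group per distinct coordinate pair, so averages and counts would describe individual listings rather than neighborhoods.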