<center><h1>Load Data</h1></center>
    
This notebook is the first step in the ETL pipeline of our **Airbnb Smart Pricing project**. It loads the raw data files, which are provided as gzip-compressed CSVs (.csv.gz), into pandas DataFrames for further processing. It then saves uncompressed CSV copies of the same data so that downstream notebooks and scripts can load them faster. No cleaning is applied at this stage.

In [1]:
# Load necessary libraries
import os
import pandas as pd

In [2]:
# Define file paths
data_dir = "../data/raw/"
listings_fp = os.path.join(data_dir, "listings.csv.gz")
calendar_fp = os.path.join(data_dir, "calendar.csv.gz")
reviews_fp = os.path.join(data_dir, "reviews.csv.gz")

In [3]:
# Function to load .csv.gz files
def load_gz_csv(file_path):
    """
    Load a compressed CSV (.gz) file into a pandas DataFrame.
    """
    if not os.path.exists(file_path):
        raise FileNotFoundError(f"File not found: {file_path}")
    
    df = pd.read_csv(file_path, compression='gzip', low_memory=False)
    # os.path.basename handles both "/" and OS-specific path separators
    print(f"Loaded {os.path.basename(file_path)} with shape {df.shape}")
    return df
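
The calendar file is large (over 13 million rows in the output below), so it can help to tell pandas which columns to read and how to type them. The following is a minimal sketch, not part of the original pipeline: `load_gz_csv_typed` is a hypothetical variant of `load_gz_csv` that forwards extra keyword arguments to `pd.read_csv`, and the tiny `calendar_demo.csv.gz` file is created only so the example runs without the real data.

```python
import gzip
import os
import tempfile

import pandas as pd


def load_gz_csv_typed(file_path, **read_csv_kwargs):
    """Hypothetical variant of load_gz_csv that forwards options to pd.read_csv."""
    if not os.path.exists(file_path):
        raise FileNotFoundError(f"File not found: {file_path}")
    df = pd.read_csv(file_path, compression="gzip", **read_csv_kwargs)
    print(f"Loaded {os.path.basename(file_path)} with shape {df.shape}")
    return df


# Tiny stand-in file (assumption: same column names as calendar.csv.gz)
tmp_dir = tempfile.mkdtemp()
demo_fp = os.path.join(tmp_dir, "calendar_demo.csv.gz")
with gzip.open(demo_fp, "wt", encoding="utf-8") as fh:
    fh.write("listing_id,date,available,price\n")
    fh.write("5269,2025-03-06,f,$185.00\n")
    fh.write("5269,2025-03-07,t,$185.00\n")

demo_df = load_gz_csv_typed(
    demo_fp,
    usecols=["listing_id", "date", "available"],  # skip columns not needed
    dtype={"listing_id": "int64", "available": "category"},
    parse_dates=["date"],  # parse the date column while reading
)
print(demo_df.dtypes)
```

Reading only the needed columns and storing the `t`/`f` flag as a category should reduce memory use on the full file, although the actual saving depends on the data.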

In [4]:
# Load datasets
listings_df = load_gz_csv(listings_fp)
calendar_df = load_gz_csv(calendar_fp)
reviews_df = load_gz_csv(reviews_fp)

Loaded listings.csv.gz with shape (36125, 79)
Loaded calendar.csv.gz with shape (13196822, 7)
Loaded reviews.csv.gz with shape (1358162, 6)


In [6]:
# Create the processed directory
processed_dir = "../data/processed"
os.makedirs(processed_dir, exist_ok=True)
# Save uncompressed versions for faster loading later
listings_df.to_csv(os.path.join(processed_dir, "listings.csv"), index=False)
calendar_df.to_csv(os.path.join(processed_dir, "calendar.csv"), index=False)
reviews_df.to_csv(os.path.join(processed_dir, "reviews.csv"), index=False)
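
The `index=False` argument matters for downstream notebooks: without it, `to_csv` writes the DataFrame index as an extra unnamed column, which `read_csv` then loads as `Unnamed: 0`. The sketch below illustrates this on a small made-up DataFrame written to a temporary directory; the file names and the second row's values are assumptions for the demonstration only.

```python
import os
import tempfile

import pandas as pd

# Small made-up frame standing in for one of the real datasets
demo_df = pd.DataFrame(
    {"listing_id": [5269, 5387], "price": ["$185.00", "$120.00"]}
)

tmp_dir = tempfile.mkdtemp()
with_index_fp = os.path.join(tmp_dir, "with_index.csv")
no_index_fp = os.path.join(tmp_dir, "no_index.csv")

demo_df.to_csv(with_index_fp)             # default: index is written too
demo_df.to_csv(no_index_fp, index=False)  # same setting as in the cell above

with_index_cols = pd.read_csv(with_index_fp).columns.to_list()
no_index_cols = pd.read_csv(no_index_fp).columns.to_list()

print(with_index_cols)
print(no_index_cols)

# Quick round-trip check: the reloaded file has the original shape
reloaded = pd.read_csv(no_index_fp)
print(reloaded.shape == demo_df.shape)
```

A similar shape comparison against the real processed files is a cheap way to confirm that nothing was lost when saving.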

In [9]:
# Check columns
print("Columns in listings.csv", listings_df.columns.to_list())
print("\nColumns in calendar.csv", calendar_df.columns.to_list())
print("\nColumns in reviews.csv", reviews_df.columns.to_list())

Columns in listings.csv ['id', 'listing_url', 'scrape_id', 'last_scraped', 'source', 'name', 'description', 'neighborhood_overview', 'picture_url', 'host_id', 'host_url', 'host_name', 'host_since', 'host_location', 'host_about', 'host_response_time', 'host_response_rate', 'host_acceptance_rate', 'host_is_superhost', 'host_thumbnail_url', 'host_picture_url', 'host_neighbourhood', 'host_listings_count', 'host_total_listings_count', 'host_verifications', 'host_has_profile_pic', 'host_identity_verified', 'neighbourhood', 'neighbourhood_cleansed', 'neighbourhood_group_cleansed', 'latitude', 'longitude', 'property_type', 'room_type', 'accommodates', 'bathrooms', 'bathrooms_text', 'bedrooms', 'beds', 'amenities', 'price', 'minimum_nights', 'maximum_nights', 'minimum_minimum_nights', 'maximum_minimum_nights', 'minimum_maximum_nights', 'maximum_maximum_nights', 'minimum_nights_avg_ntm', 'maximum_nights_avg_ntm', 'calendar_updated', 'has_availability', 'availability_30', 'availability_60', 'avai
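
With 79 columns in the listings data, a flat list of names is hard to scan. One possible alternative is a per-column summary of dtype and missing values. The sketch below assumes a hypothetical helper, `summarize_columns`, and runs on a three-row frame whose values are copied from the first listings rows shown further down.

```python
import pandas as pd


def summarize_columns(df):
    """Return one row per column: dtype, non-null count, and percent missing."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "non_null": df.notna().sum(),
            "missing_pct": (df.isna().mean() * 100).round(1),
        }
    )


# Three-row stand-in for listings_df (hypothetical subset of columns)
demo_df = pd.DataFrame(
    {
        "id": [5269, 5387, 5480],
        "review_scores_value": [4.85, 4.78, None],
        "license": ["119-269-5808-01R", "TA-163-133-0304-01", None],
    }
)

summary = summarize_columns(demo_df)
print(summary)
```

On the real data this would be called as `summarize_columns(listings_df)`, and sorting the result by `missing_pct` would surface the sparsest columns first.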

In [11]:
# Get a view of listings.csv data
listings_df.head(5)

Unnamed: 0,id,listing_url,scrape_id,last_scraped,source,name,description,neighborhood_overview,picture_url,host_id,...,review_scores_communication,review_scores_location,review_scores_value,license,instant_bookable,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
0,5269,https://www.airbnb.com/rooms/5269,20250306035046,2025-03-14,city scrape,Upcountry Hospitality in the 'Auwai Suite,"The 'Auwai Suite is a lovely, self-contained a...","We are located on the ""sunny side"" of Waimea, ...",https://a0.muscache.com/pictures/5b52b72f-5a09...,7620,...,4.85,5.0,4.85,119-269-5808-01R,f,3,3,0,0,0.24
1,5387,https://www.airbnb.com/rooms/5387,20250306035046,2025-03-15,city scrape,Hale Koa Studio & 1 Bedroom Units!!,This Wonderful Spacious Studio apt/flat is in ...,IN a Farm belt area with small commercial farm...,https://a0.muscache.com/pictures/1170713/dca6a...,7878,...,4.88,4.74,4.78,TA-163-133-0304-01,t,2,2,0,0,1.26
2,5480,https://www.airbnb.com/rooms/5480,20250306035046,2025-03-14,city scrape,Isle Of You Naturally Farm Retreat,The Best Choice for your Clothing Optional Nud...,We are located on a rural one lane road going ...,https://a0.muscache.com/pictures/75530989/8ed3...,8145,...,,,,,f,3,0,3,0,
3,5532,https://www.airbnb.com/rooms/5532,20250306035046,2025-03-12,previous scrape,2BR Waialua Beach Condo w/ Saltwater Pool & Sauna,This split-level condo is right across the str...,,https://a0.muscache.com/pictures/13743/134691a...,8279,...,,,,,f,1,1,0,0,
4,7888,https://www.airbnb.com/rooms/7888,20250306035046,2025-03-07,city scrape,Pineapple House 2 Bed 1 Bath with Loft Entire ...,We offer a medium term rental requiring a mini...,,https://a0.muscache.com/pictures/hosting/Hosti...,22083,...,4.76,4.25,4.6,GE-104-390-7584-01,f,1,1,0,0,0.72


In [13]:
# Get a view of calendar.csv data
calendar_df.head(5)

Unnamed: 0,listing_id,date,available,price,adjusted_price,minimum_nights,maximum_nights
0,5269,2025-03-06,f,$185.00,,,
1,5269,2025-03-07,t,$185.00,,,
2,5269,2025-03-08,f,$185.00,,,
3,5269,2025-03-09,t,$185.00,,,
4,5269,2025-03-10,t,$185.00,,,
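
The preview shows that `price` is stored as text such as `$185.00` and `available` as `t`/`f` flags, so both need conversion before any numeric work. The following is a minimal sketch of one way to do that in a later step, assuming prices use a dollar sign and optional thousands separators; it runs on a small sample copied from the rows above rather than on `calendar_df` itself.

```python
import pandas as pd

# Sample copied from the first rows of the calendar preview
calendar_sample = pd.DataFrame(
    {
        "listing_id": [5269, 5269, 5269],
        "date": ["2025-03-06", "2025-03-07", "2025-03-08"],
        "available": ["f", "t", "f"],
        "price": ["$185.00", "$185.00", "$185.00"],
    }
)

cleaned = calendar_sample.assign(
    # Parse ISO dates into datetime64 values
    date=pd.to_datetime(calendar_sample["date"]),
    # Map the t/f flags to booleans
    available=calendar_sample["available"].map({"t": True, "f": False}),
    # Strip "$" and "," then convert; unparseable values become NaN
    price=pd.to_numeric(
        calendar_sample["price"].str.replace(r"[$,]", "", regex=True),
        errors="coerce",
    ),
)

print(cleaned.dtypes)
```

Using `errors="coerce"` keeps the conversion from failing on unexpected strings, at the cost of silently producing missing values, so a count of NaNs afterwards is a sensible follow-up check.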


In [14]:
# Get a view of reviews.csv data
reviews_df.head(5)

Unnamed: 0,listing_id,id,date,reviewer_id,reviewer_name,comments
0,5269,289800,2011-05-31,452192,Gert,Very warm welcome. Great place to stay. Highl...
1,5269,742885,2011-11-25,1135109,Lene,Barrie was very kind and sweet but it could no...
2,5269,494178707,2019-07-23,131185347,Kathleen,"Great place, location & wonderful hostess. Tha..."
3,5269,523932651,2019-09-04,5708075,Martha,This is such a charming and cozy place to stay...
4,5269,536049410,2019-09-25,85727419,Brent,Cute little place with easy access to Waimea a...
