# Storefronts Vacancy Analysis for Office Apocalypse Algorithm

## Overview
This notebook analyzes NYC storefront vacancy data to understand **street-level economic vitality** and **neighborhood commercial health**, both of which directly affect office building desirability and occupancy. Storefront vacancy serves as a timely indicator of local economic conditions that shape the office worker experience and building attractiveness.

## Why Storefront Vacancy Data is Critical:
- **Street-Level Economic Health**: Vacant storefronts indicate declining neighborhood vitality
- **Office Worker Ecosystem**: Retail and services that support office workers (restaurants, shops, personal services)
- **Pedestrian Activity Indicator**: Active storefronts drive foot traffic and area vibrancy
- **Leading Economic Indicator**: Retail vacancy can precede office vacancy in declining areas
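Whether retail vacancy actually leads office vacancy is an empirical question that can be checked once both rates are aggregated to the same periods. A minimal sketch, assuming two hypothetical pandas Series on a shared, time-ordered index (for example one value per year or quarter). The numbers are synthetic, built so that the office series copies the retail series two periods later:

```python
import pandas as pd


def best_leading_lag(leader: pd.Series, follower: pd.Series, max_lag: int = 4):
    """Return (lag, corr) for the lag k in 0..max_lag that maximizes
    corr(follower[t], leader[t - k]). Both series share one time-ordered index."""
    results = {}
    for k in range(max_lag + 1):
        # shift(k) moves the leader forward k periods, so row t holds leader[t - k];
        # Series.corr ignores the NaN rows the shift introduces
        results[k] = follower.corr(leader.shift(k))
    best = max(results, key=lambda k: results[k])
    return best, results[best]


# Synthetic illustration only: office vacancy echoes retail vacancy two periods later
retail = pd.Series([0.05, 0.06, 0.09, 0.08, 0.12, 0.11, 0.15, 0.14, 0.18, 0.17])
office = retail.shift(2).fillna(retail.iloc[0])
lag, corr = best_leading_lag(retail, office, max_lag=4)
print(lag, round(corr, 3))
```

On real, trending data the correlation tends to be high at every lag, so differencing both series first (`series.diff()`) gives a less misleading comparison.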

## Unique Street-Level Intelligence Only Storefront Data Provides:
- **Immediate Economic Conditions**: Timely, storefront-level assessment of neighborhood commercial health
- **Worker Support Ecosystem**: Availability of amenities that office workers need
- **Area Desirability**: Vacant storefronts create negative perception and reduce area appeal
- **Economic Recovery Signals**: Storefront occupancy recovery indicates neighborhood resilience

## Types of Critical Storefront Analysis:
- **Vacancy Clustering**: Areas with high storefront vacancy concentrations
- **Business Type Mix**: Essential services vs. luxury retail availability
- **Temporal Patterns**: Seasonal and trend-based vacancy changes
- **Proximity Impact**: Distance-based effects on nearby office buildings
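Vacancy clustering and business-type mix both start from the same step: the share of storefronts reported vacant within each group. A minimal, non-spatial sketch, assuming a yes/no vacancy flag column; the names `borough` and `vacant_flag` are hypothetical placeholders to be matched against `df_storefronts.columns`, and `min_count` is an arbitrary threshold. Grouping by a business-type column instead of an area column works the same way.

```python
import pandas as pd


def vacancy_rate_by_area(df: pd.DataFrame, area_col: str, flag_col: str,
                         min_count: int = 30) -> pd.DataFrame:
    """Share of records flagged vacant per area, sorted highest first.

    Areas with fewer than `min_count` usable records are dropped so that a
    handful of storefronts cannot produce an extreme rate."""
    # Normalize YES/NO-style text to 1.0/0.0; any other value becomes NaN
    flag = (df[flag_col].astype(str).str.strip().str.upper()
            .map({'YES': 1.0, 'NO': 0.0}))
    usable = flag.notna()
    stats = (flag[usable]
             .groupby(df.loc[usable, area_col])
             .agg(['count', 'mean'])
             .rename(columns={'count': 'n', 'mean': 'vacancy_rate'}))
    return stats[stats['n'] >= min_count].sort_values('vacancy_rate', ascending=False)


# Hypothetical column names and values, for illustration only
sample = pd.DataFrame({
    'borough': ['A'] * 4 + ['B'] * 4 + ['C'],
    'vacant_flag': ['YES', 'NO', 'NO', 'NO', 'YES', 'YES', 'no', 'YES', 'YES'],
})
result = vacancy_rate_by_area(sample, 'borough', 'vacant_flag', min_count=2)
print(result)
```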

## Impact on Office Building Prediction:
- High storefront vacancy → Reduced office building desirability, higher vacancy risk
- Vibrant retail ecosystem → Premium office locations, stable occupancy
- Essential services presence → Worker convenience, lower office turnover
- Vacancy trend patterns → Early warning signals for office market decline
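The proximity and impact ideas above suggest a per-building feature: the vacant share of storefronts within walking distance of each office building. A minimal sketch, assuming building and storefront coordinates in decimal degrees plus a boolean vacancy flag (all hypothetical inputs); the 250 m radius is an illustrative choice, not a tuned value.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters; inputs in degrees, broadcastable arrays."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


def nearby_vacancy_share(bldg_lat, bldg_lon, sf_lat, sf_lon, sf_vacant, radius_m=250.0):
    """For each building, the share of storefronts within `radius_m` that are
    vacant; NaN when no storefront is in range."""
    # Pairwise distances, shape (n_buildings, n_storefronts)
    d = haversine_m(np.asarray(bldg_lat)[:, None], np.asarray(bldg_lon)[:, None],
                    np.asarray(sf_lat)[None, :], np.asarray(sf_lon)[None, :])
    within = d <= radius_m
    n_near = within.sum(axis=1)
    n_vacant = (within & np.asarray(sf_vacant, dtype=bool)[None, :]).sum(axis=1)
    # 0/0 for buildings with no nearby storefronts is replaced by NaN
    with np.errstate(invalid='ignore', divide='ignore'):
        return np.where(n_near > 0, n_vacant / n_near, np.nan)


# Hypothetical inputs: two office buildings and three storefronts
bldg_lat = [40.7580, 40.6000]
bldg_lon = [-73.9855, -74.1000]
sf_lat = [40.7580, 40.7590, 40.8000]   # second point is about 111 m north of building 0
sf_lon = [-73.9855, -73.9855, -73.9500]
sf_vacant = [True, False, True]
share = nearby_vacancy_share(bldg_lat, bldg_lon, sf_lat, sf_lon, sf_vacant, radius_m=250.0)
print(share)
```

The dense pairwise matrix keeps the example short but grows as buildings × storefronts. At the scale of this dataset, a spatial index such as scikit-learn's `BallTree` with `metric='haversine'` (which expects coordinates in radians) is the more practical route.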

## Dataset Source
- **File**: `Storefronts_Reported_Vacant_or_Not_20250915.csv`
- **Source**: NYC Department of Small Business Services
- **Coverage**: All reported storefronts with vacancy status across NYC

In [2]:
# Import Required Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)

# Configure plotting
plt.style.use('default')
sns.set_palette("magma")
plt.rcParams['figure.figsize'] = (12, 8)

## 1. Load and Explore Storefronts Dataset Structure

Let's examine the storefront vacancy data to understand how street-level commercial activity serves as a leading indicator for neighborhood economic health and office building market conditions.

In [3]:
# Load Storefronts vacancy dataset
storefronts_path = r"c:\Users\pcric\Desktop\capstone_project\office_apocalypse_algorithm_project\data\raw\Storefronts_Reported_Vacant_or_Not_20250915.csv"
print("Loading NYC Storefronts Vacancy dataset...")
df_storefronts = pd.read_csv(storefronts_path)

print(f"Dataset shape: {df_storefronts.shape}")
print(f"Number of storefront records: {len(df_storefronts):,}")
print(f"Number of features: {len(df_storefronts.columns)}")
print("\n" + "="*60)
print("Storefronts Vacancy Dataset Overview")
print("="*60)

Loading NYC Storefronts Vacancy dataset...
Dataset shape: (348297, 27)
Number of storefront records: 348,297
Number of features: 27

Storefronts Vacancy Dataset Overview
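The overview printed above stops at the shape of the table. Because the plotting cells below pick date and borough columns by matching substrings in column names, a quick per-column profile helps confirm which columns actually hold dates, flags, or locations. A minimal sketch; the helper name `profile_columns` is hypothetical and the demo frame is made up.

```python
import pandas as pd


def profile_columns(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: dtype, share of non-missing values, number of
    distinct values, and the first non-missing value as an example."""
    return pd.DataFrame({
        'dtype': df.dtypes.astype(str),
        'non_null_share': df.notna().mean().round(3),
        'n_unique': df.nunique(),
        'example': df.apply(lambda s: s.dropna().iloc[0] if s.notna().any() else None),
    })


# Made-up frame for illustration
demo = pd.DataFrame({'Borough': ['MANHATTAN', None, 'BRONX'],
                     'Zip Code': [10001, 10002, None]})
prof = profile_columns(demo)
print(prof)
```

In the notebook this would be called as `profile_columns(df_storefronts)`.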


In [5]:
# Sample plotting cell (fast): monthly counts of storefront records (vacant or not),
# with counts by borough as the fallback when no date-like column is found
try:
    import os
    import pandas as pd
    import matplotlib.pyplot as plt
    root = r"c:\Users\pcric\Desktop\capstone_project\office_apocalypse_algorithm_project"
    fig_dir = os.path.join(root, 'figures')
    os.makedirs(fig_dir, exist_ok=True)

    path = os.path.join(root, 'data', 'raw', 'Storefronts_Reported_Vacant_or_Not_20250915.csv')
    # nrows takes the first 100,000 rows of the file, not a random sample
    df = pd.read_csv(path, nrows=100000, low_memory=False)

    # Use the first column whose name looks date-related
    date_col = None
    for c in df.columns:
        lc = c.lower()
        if 'date' in lc or 'vacant on' in lc or 'reported' in lc:
            date_col = c
            break
    if date_col is not None:
        # Unparseable values become NaT, and groupby drops NaT keys
        df['vac_date'] = pd.to_datetime(df[date_col], errors='coerce')
        df['month'] = df['vac_date'].dt.to_period('M')
        monthly = df.groupby('month').size()
        if len(monthly) > 0:
            monthly.index = monthly.index.to_timestamp()
            plt.figure(figsize=(8, 3))
            monthly.plot(marker='o')
            plt.title('Storefront Records (sample): Monthly Count')
            plt.xlabel('Month')
            plt.ylabel('Records')
            out = os.path.join(fig_dir, 'storefronts_sample_monthly_reports.png')
            plt.tight_layout()
            plt.savefig(out)
            plt.close()
            print('Saved', out)
    else:
        # Fallback: record counts by borough
        bcol = next((c for c in df.columns if 'boro' in c.lower()), None)
        if bcol:
            counts = df[bcol].fillna('UNKNOWN').value_counts()
            plt.figure(figsize=(6, 4))
            counts.plot(kind='bar', color='purple')
            plt.title('Storefront Records by Borough (sample)')
            out = os.path.join(fig_dir, 'storefronts_sample_borough_counts.png')
            plt.tight_layout()
            plt.savefig(out)
            plt.close()
            print('Saved', out)
except Exception as e:
    print('Storefronts sample plotting failed:', e)


Saved c:\Users\pcric\Desktop\capstone_project\office_apocalypse_algorithm_project\figures\storefronts_sample_monthly_reports.png
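This cell and the next one both use the first column whose name contains `'date'`, `'vacant on'`, or `'reported'`. A name match alone does not guarantee the column holds dates: if the first match is a yes/no flag, every value is coerced to `NaT`, `groupby` drops them all, and the cell quietly saves nothing. A minimal sketch of a sturdier choice, keeping the same keywords but ranking candidates by how many values actually parse; the helper name, the 50% threshold, and the demo columns are assumptions.

```python
import pandas as pd


def pick_date_column(df, keywords=('date', 'vacant on', 'reported'), min_parse_rate=0.5):
    """Among columns whose lowercased names contain one of `keywords`, return the
    one whose values parse as dates most often; None if no candidate parses at
    least `min_parse_rate` of its rows."""
    best_col, best_rate = None, 0.0
    for col in df.columns:
        if not any(k in str(col).lower() for k in keywords):
            continue
        if pd.api.types.is_numeric_dtype(df[col]):
            continue  # integers would otherwise parse as epoch timestamps
        rate = pd.to_datetime(df[col], errors='coerce').notna().mean()
        if rate > best_rate:
            best_col, best_rate = col, rate
    return best_col if best_rate >= min_parse_rate else None


# Made-up columns: a yes/no flag whose name matches 'vacant on', and a real date column
demo = pd.DataFrame({
    'Vacant On (flag)': ['YES', 'NO', 'YES'],
    'Filing Date': ['2023-06-30', '2024-06-30', 'not recorded'],
})
print(pick_date_column(demo))
```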


In [6]:
# Ultra-fast EDA plotting cell (storefront records, first 2,000 rows only)
# Note: it saves to the same filename as the previous cell, overwriting that
# 100,000-row figure with this 2,000-row version.
try:
    import os
    import pandas as pd
    import matplotlib.pyplot as plt
    root = r"c:\Users\pcric\Desktop\capstone_project\office_apocalypse_algorithm_project"
    fig_dir = os.path.join(root, 'figures')
    os.makedirs(fig_dir, exist_ok=True)
    path = os.path.join(root, 'data', 'raw', 'Storefronts_Reported_Vacant_or_Not_20250915.csv')
    df = pd.read_csv(path, nrows=2000, low_memory=False)
    # Use the first column whose name looks date-related
    date_col = None
    for c in df.columns:
        lc = c.lower()
        if 'date' in lc or 'reported' in lc or 'vacant on' in lc:
            date_col = c
            break
    if date_col:
        # Unparseable values become NaT, and groupby drops NaT keys
        df['vac_date'] = pd.to_datetime(df[date_col], errors='coerce')
        df['month'] = df['vac_date'].dt.to_period('M')
        monthly = df.groupby('month').size()
        if len(monthly) > 0:
            monthly.index = monthly.index.to_timestamp()
            plt.figure(figsize=(8, 3))
            monthly.plot(marker='o')
            plt.title('Storefront Records (sample): Monthly Count')
            plt.tight_layout()
            out = os.path.join(fig_dir, 'storefronts_sample_monthly_reports.png')
            plt.savefig(out)
            plt.close()
            print('Saved', out)
    # Borough counts fallback, used only when no date-like column was found
    bcol = next((c for c in df.columns if 'boro' in c.lower()), None)
    if (not date_col) and bcol:
        counts = df[bcol].fillna('UNKNOWN').value_counts()
        plt.figure(figsize=(6, 3))
        counts.plot(kind='bar', color='purple')
        plt.title('Storefront Records (sample): Count by Borough')
        plt.tight_layout()
        out2 = os.path.join(fig_dir, 'storefronts_sample_borough_counts.png')
        plt.savefig(out2)
        plt.close()
        print('Saved', out2)
except Exception as e:
    print('Storefronts sample plotting failed:', e)


Saved c:\Users\pcric\Desktop\capstone_project\office_apocalypse_algorithm_project\figures\storefronts_sample_monthly_reports.png
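The monthly charts above count every storefront record per month, vacant or not, so they track reporting volume rather than vacancy itself. For the temporal patterns listed in the overview, the more relevant series is the vacant share per period. A minimal sketch, assuming a parseable date column and a YES/NO vacancy flag (the helper and demo column names are hypothetical); whether monthly bins are meaningful depends on how often storefronts are actually reported.

```python
import pandas as pd


def monthly_vacancy_rate(df: pd.DataFrame, date_col: str, flag_col: str) -> pd.DataFrame:
    """Monthly record count and vacant share. Rows whose date fails to parse or
    whose flag is not YES/NO are dropped, and the number dropped is printed."""
    dates = pd.to_datetime(df[date_col], errors='coerce')
    flag = (df[flag_col].astype(str).str.strip().str.upper()
            .map({'YES': 1.0, 'NO': 0.0}))
    ok = dates.notna() & flag.notna()
    print(f'dropped {int((~ok).sum())} of {len(df)} rows with unparseable date or flag')
    out = (flag[ok]
           .groupby(dates[ok].dt.to_period('M'))
           .agg(['count', 'mean'])
           .rename(columns={'count': 'n', 'mean': 'vacancy_rate'}))
    out.index = out.index.to_timestamp()  # PeriodIndex -> month-start timestamps
    return out


# Made-up rows: one unparseable date and one unrecognized flag
demo = pd.DataFrame({
    'filed': ['2023-01-05', '2023-01-20', '2023-02-02', 'bad', '2023-02-15'],
    'vacant': ['YES', 'NO', 'NO', 'YES', 'maybe'],
})
rates = monthly_vacancy_rate(demo, 'filed', 'vacant')
print(rates)
```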


In [7]:
# Fallback: lightweight CSV-based borough counts (no pandas) - fast
try:
    import os
    import csv
    import matplotlib.pyplot as plt
    from collections import Counter

    root = r"c:\Users\pcric\Desktop\capstone_project\office_apocalypse_algorithm_project"
    fig_dir = os.path.join(root, 'figures')
    os.makedirs(fig_dir, exist_ok=True)
    path = os.path.join(root, 'data', 'raw', 'Storefronts_Reported_Vacant_or_Not_20250915.csv')

    # newline='' is what the csv module expects for file objects it reads
    with open(path, 'r', encoding='utf-8', errors='ignore', newline='') as fh:
        reader = csv.reader(fh)
        header = next(reader)
        # Find the borough column ('boro' also matches 'borough')
        bidx = next((i for i, h in enumerate(header) if 'boro' in h.lower()), None)
        counts = Counter()
        if bidx is not None:
            # Count the first 20,000 data rows; blank values are labeled UNKNOWN
            for i, row in enumerate(reader):
                if i >= 20000:
                    break
                if bidx < len(row):  # skip short or malformed rows
                    counts[row[bidx].strip() or 'UNKNOWN'] += 1
            if counts:
                labels, vals = zip(*counts.most_common())
                plt.figure(figsize=(6, 3))
                plt.bar(labels, vals, color='purple')
                plt.title('Storefront Records (CSV sample): Count by Borough')
                plt.xticks(rotation=45)
                plt.tight_layout()
                out = os.path.join(fig_dir, 'storefronts_sample_borough_counts_csv.png')
                plt.savefig(out)
                plt.close()
                print('Saved', out)
        else:
            print('No borough-like column found in storefronts CSV header')
except Exception as e:
    print('Storefronts csv fallback failed:', e)


Saved c:\Users\pcric\Desktop\capstone_project\office_apocalypse_algorithm_project\figures\storefronts_sample_borough_counts_csv.png
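The fallback above locates the borough column by position and scans the header by hand. The same count can be written with `csv.DictReader`, which maps fields by header name, plus `itertools.islice` to cap the rows read. A minimal sketch; the helper name is hypothetical, the `'boro'` hint and the 20,000-row cap mirror the cell above, and the demo CSV text is made up.

```python
import csv
import io
from collections import Counter
from itertools import islice


def count_column_values(fh, column_hint='boro', max_rows=20000):
    """Stream a CSV and count values in the first column whose name contains
    `column_hint`, reading at most `max_rows` data rows."""
    reader = csv.DictReader(fh)
    col = next((h for h in (reader.fieldnames or []) if column_hint in h.lower()), None)
    if col is None:
        raise ValueError(f'no column name contains {column_hint!r}')
    counts = Counter()
    for row in islice(reader, max_rows):
        # Short rows give None for missing fields; blanks are labeled UNKNOWN
        value = (row.get(col) or '').strip()
        counts[value or 'UNKNOWN'] += 1
    return counts


# Made-up CSV text; the last row has a blank borough
demo_csv = "Borough,Zip Code\nMANHATTAN,10001\nBRONX,10451\nMANHATTAN,10002\n,10003\n"
counts = count_column_values(io.StringIO(demo_csv))
print(counts.most_common())
```

For the real file, pass a handle opened with `open(path, newline='', encoding='utf-8', errors='ignore')`, since the `csv` module expects `newline=''` on files it reads.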
