# Approved Building Permits Dataset

The Approved Building Permits (ABP) dataset contains information on building permits issued across Boston from 2010 to 2024. It provides insights into the city's maintenance and development patterns. More information is available at https://data.boston.gov/dataset/approved-building-permits.

In this notebook, we will:
- Clean and pre-process the dataset
- Conduct a baseline analysis of the dataset
- Gather useful insights on building permits in District 7 from 2021 to 2024

To understand the quality of life in District 7 under the Councilor's guidance, we will focus further on the period from 2021 to 2024. In the upcoming weeks, this notebook will also help us compare the District's performance with that of the City of Boston as a whole.

### 1. Load Necessary Packages

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import zipfile

### 2. Load Approved Building Permits Dataset
Because the raw dataset is too large to upload to GitHub, it can be loaded in one of two ways.

**Method 1**: Download the dataset from https://data.boston.gov/dataset/approved-building-permits, replace the example file path in the `read_csv` call below with your own, and run the rest of this section to pre-process the data.

In [None]:
df = pd.read_csv("../data/approved_building_permits.csv", low_memory=False)

In [None]:
df
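The cleaning step below compares the `zip` column against five-character strings such as `"02119"`. By default, `pd.read_csv` parses a column like that as integers, which silently drops the leading zero. A minimal sketch on a tiny made-up CSV (the rows are hypothetical) shows the difference that passing `dtype={"zip": str}` makes:

```python
import io

import pandas as pd

# Tiny in-memory CSV standing in for the real download (hypothetical rows)
csv_text = "permitnumber,zip\nA1,02119\nA2,02125\n"

# Default parsing infers an integer column, so the leading zeros disappear
default_read = pd.read_csv(io.StringIO(csv_text))
# Declaring the column as text keeps the codes exactly as written
string_read = pd.read_csv(io.StringIO(csv_text), dtype={"zip": str})

print(default_read["zip"].tolist())  # [2119, 2125]
print(string_read["zip"].tolist())   # ['02119', '02125']
```

With the real file, the equivalent call would be `pd.read_csv("../data/approved_building_permits.csv", low_memory=False, dtype={"zip": str})`, assuming the column is named `zip` as in the cleaning code.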

### 3. Data Cleaning

In [None]:
# Collapse the raw worktype codes into broader work type categories
worktype_mapping = {
    'INTEXT': 'Renovation & Interior/Exterior Work',
    'INTREN': 'Renovation & Interior/Exterior Work',
    'EXTREN': 'Renovation & Interior/Exterior Work',
    'OTHER': 'Miscellaneous',
    'SPRINK': 'Fire Protection & Safety',
    'ADDITION': 'Renovation & Interior/Exterior Work',
    'COB': 'Miscellaneous',
    'FA': 'Fire Protection & Safety',
    'ERECT': 'Construction & New Installations',
    'SITE': 'Temporary Structures & Events',
    'VIOL': 'Miscellaneous',
    'PLUMBING': 'Electrical, Plumbing & Utility Systems',
    'SPCEVE': 'Temporary Structures & Events',
    'NEWCON': 'Construction & New Installations',
    'SIGNES': 'Signage & Canopy',
    'SPRNK9': 'Fire Protection & Safety',
    'EXTDEM': 'Demolition',
    'SD': 'Miscellaneous',
    'ROOF': 'Renovation & Interior/Exterior Work',
    'GARAGE': 'Construction & New Installations',
    'AWNING': 'Signage & Canopy',
    'FENCE2': 'Renovation & Interior/Exterior Work',
    'INSUL': 'Renovation & Interior/Exterior Work',
    'SIGNS': 'Signage & Canopy',
    'FSTTRK': 'Temporary Structures & Events',
    'CHGOCC': 'Occupancy & Use Change',
    'CELL': 'Temporary Structures & Events',
    'NROCC': 'Miscellaneous',
    'SOL': 'Construction & New Installations',
    'INTDEM': 'Demolition',
    'SPFT': 'Miscellaneous',
    'RAZE': 'Demolition',
    'TMPSER': 'Temporary Structures & Events',
    'ELECTRICAL': 'Electrical, Plumbing & Utility Systems',
    'GEN': 'Electrical, Plumbing & Utility Systems',
    'CANP': 'Signage & Canopy',
    'FENCE': 'Renovation & Interior/Exterior Work',
    'SIDE': 'Renovation & Interior/Exterior Work',
    'HOLVEN': 'Miscellaneous',
    'CONVRT': 'Miscellaneous',
    'SRVCHG': 'Electrical, Plumbing & Utility Systems',
    'LVOLT': 'Electrical, Plumbing & Utility Systems',
    'MAINT': 'Miscellaneous',
    'Service': 'Miscellaneous',
    'DRIVE': 'Construction & New Installations',
    'INDBLR': 'Electrical, Plumbing & Utility Systems',
    'TEMTRL': 'Temporary Structures & Events',
    'FLAM': 'Fire Protection & Safety',
    'COMPAR': 'Miscellaneous',
    'TVTRK': 'Temporary Structures & Events',
    'New': 'Construction & New Installations',
    'GAS': 'Electrical, Plumbing & Utility Systems',
    'INDFUR': 'Electrical, Plumbing & Utility Systems',
    'AWNRNW': 'Signage & Canopy',
    'RNWSIG': 'Signage & Canopy',
    'RESPAR': 'Miscellaneous',
    'AWNRET': 'Signage & Canopy',
    'BFCHMINFIN': 'Miscellaneous',
    'BFCHMTENT': 'Temporary Structures & Events',
    'General': 'Miscellaneous',
    'Dumpsters': 'Miscellaneous',
    'TMPUSOC': 'Occupancy & Use Change',
    'OSEAT': 'Temporary Structures & Events',
    'CANPRN': 'Signage & Canopy',
    'TCOO': 'Temporary Structures & Events'
}

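# Drop rows that are missing a value in any column
df = df.dropna().copy()

# Parse the date columns; values that cannot be parsed become NaT
df.loc[:, 'issued_date'] = pd.to_datetime(df['issued_date'], errors='coerce', format='mixed').dt.date
df.loc[:, 'expiration_date'] = pd.to_datetime(df['expiration_date'], errors='coerce', format='mixed').dt.date

# Normalize ZIP codes to 5-character strings (e.g. 2119 or "2119.0" -> "02119")
# BEFORE filtering, so codes stored as numbers still match the District 7 list
df['zip'] = df['zip'].astype(str).str.extract(r'^(\d{4,5})', expand=False).str.zfill(5)
d7_zip = ["02119", "02120", "02121", "02122", "02124", "02125", "02115", "02215", "02118"]
df = df[df['zip'].isin(d7_zip)].copy()

# Keep only the neighborhoods that make up District 7
d7_city = ["Dorchester", "Fenway", "Roxbury", "South End"]
df = df[df['city'].isin(d7_city)].copy()

# Remove identifier and GPS columns that are not used in the analysis
columns_to_drop = ['property_id', 'parcel_id', 'gpsy', 'gpsx']
df = df.drop(columns=[col for col in columns_to_drop if col in df.columns])

# Map raw worktype codes to broader categories; unmapped codes become NaN and are dropped
df['new_worktype'] = df['worktype'].map(worktype_mapping)
df = df.dropna(subset=['new_worktype'])

df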
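The last two steps of the cleaning cell rely on how `Series.map` treats a dictionary: any code that is not a key in the dictionary becomes `NaN`, so `dropna(subset=['new_worktype'])` removes permits whose work type is not covered by the mapping. A minimal sketch on made-up rows (`"XYZ"` is a hypothetical code that is absent from the mapping):

```python
import pandas as pd

# Toy stand-in for the permits table (hypothetical rows)
toy = pd.DataFrame({"worktype": ["PLUMBING", "ROOF", "XYZ"]})
toy_mapping = {
    "PLUMBING": "Electrical, Plumbing & Utility Systems",
    "ROOF": "Renovation & Interior/Exterior Work",
}

# Series.map returns NaN for any code missing from the dictionary...
toy["new_worktype"] = toy["worktype"].map(toy_mapping)
# ...so dropping NaN in the new column removes the unmapped codes
toy = toy.dropna(subset=["new_worktype"])
print(toy)
```

Note that the lookup is case-sensitive, which matters here because the mapping mixes upper-case codes (`'PLUMBING'`) with capitalized ones (`'Service'`, `'General'`).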

**Method 2**: Run the `zipfile` code below to load the already-cleaned dataset, which is uploaded to GitHub as a zip file.

In [None]:
with zipfile.ZipFile("../data/d7-approved-building-permits.csv.zip") as z:
    print(z.namelist())
    with z.open('d7-approved-building-permits.csv') as f:
        df = pd.read_csv(f, low_memory=False)
df
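As an alternative to opening the archive with `zipfile`, `pd.read_csv` can read a `.zip` archive directly when it contains a single CSV, inferring the compression from the file extension. The sketch below builds a tiny hypothetical archive in a temporary folder so that it runs on its own; the file names and rows are made up.

```python
import os
import tempfile
import zipfile

import pandas as pd

# Build a small stand-in archive (hypothetical contents) so the example is self-contained
tmpdir = tempfile.mkdtemp()
zip_path = os.path.join(tmpdir, "example-permits.csv.zip")
with zipfile.ZipFile(zip_path, "w") as z:
    z.writestr("example-permits.csv", "permitnumber,zip\nA1,02119\nA2,02125\n")

# compression="infer" (the default) detects zip from the ".zip" suffix;
# this works only when the archive holds exactly one file
df_zip = pd.read_csv(zip_path, dtype={"zip": str})
print(df_zip)
```

With the real archive, the equivalent call would be `pd.read_csv("../data/d7-approved-building-permits.csv.zip", low_memory=False)`, assuming the archive still contains only `d7-approved-building-permits.csv`.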

### 4. Data Visualization
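Each visualization cell below begins by converting `issued_date` with `pd.to_datetime(..., errors='coerce')`, dropping rows that fail to parse, and extracting the year. A minimal sketch on made-up values shows what that step does:

```python
import pandas as pd

# Hypothetical issued_date values; the last one cannot be parsed
toy = pd.DataFrame({"issued_date": ["2021-03-04", "2023-11-30", "not a date"]})

# errors="coerce" turns unparseable entries into NaT instead of raising an error
toy["issued_date"] = pd.to_datetime(toy["issued_date"], errors="coerce")
# Rows with NaT are removed before the year is extracted
toy = toy.dropna(subset=["issued_date"])
toy["issued_year"] = toy["issued_date"].dt.year
print(toy)
```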

In [None]:
df['issued_date'] = pd.to_datetime(df['issued_date'], errors='coerce')
df = df.dropna(subset=['issued_date'])

df['issued_year'] = df['issued_date'].dt.year

worktype_counts_per_year = df.groupby(['issued_year', 'new_worktype']).size().unstack(fill_value=0)

# DataFrame.plot creates its own figure, so no separate plt.figure() call is needed
worktype_counts_per_year.plot(kind='line', marker='o', linestyle='-', figsize=(20, 10))

plt.title('Total Permits by Work Types (2010-2024)', fontsize=20)
plt.xlabel('Year', fontsize=16)
plt.ylabel('Total Number of Permits', fontsize=16)
plt.xticks(rotation=0)
plt.legend(title='Work Type', bbox_to_anchor=(1.05, 1), loc='upper left')

plt.grid(axis='y', linestyle='--', alpha=0.6)
plt.tight_layout()
plt.show()

**Findings**: 
- Electrical, Plumbing & Utility Systems is the most common work type from 2010 to 2024, followed by Renovation & Interior/Exterior Work.
- The remaining work types stay below 1,000 permits per year; they are examined more closely in the following charts.
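The yearly counts behind this chart come from a `groupby` + `size` + `unstack` pattern. A minimal sketch on made-up permits shows how that produces one row per year and one column per work type, with `fill_value=0` recording years that have no permits of a given type:

```python
import pandas as pd

# Hypothetical permits, one row per permit
toy = pd.DataFrame({
    "issued_year": [2021, 2021, 2021, 2022, 2022],
    "new_worktype": ["Demolition", "Signage & Canopy", "Demolition",
                     "Signage & Canopy", "Signage & Canopy"],
})

# size() counts rows per (year, work type); unstack() pivots work types into
# columns, and fill_value=0 fills combinations that never occur
counts = toy.groupby(["issued_year", "new_worktype"]).size().unstack(fill_value=0)
print(counts)
```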

In [None]:
df['issued_date'] = pd.to_datetime(df['issued_date'], errors='coerce')
df = df.dropna(subset=['issued_date'])

df['issued_year'] = df['issued_date'].dt.year

worktype_counts_per_year = df.groupby(['issued_year', 'new_worktype']).size().unstack(fill_value=0)

worktype_percentages = worktype_counts_per_year.div(worktype_counts_per_year.sum(axis=1), axis=0) * 100

worktype_percentages = worktype_percentages.loc[2021:2024]

primary_worktypes = ['Electrical, Plumbing & Utility Systems', 'Renovation & Interior/Exterior Work']
secondary_worktypes = ['Construction & New Installations', 'Miscellaneous', 'Fire Protection & Safety']

primary_worktype_data = worktype_percentages[primary_worktypes]
secondary_worktype_data = worktype_percentages[secondary_worktypes]
other_worktype_data = worktype_percentages.drop(columns=primary_worktypes + secondary_worktypes)

# DataFrame.plot creates its own figure, so no separate plt.figure() call is needed
ax1 = primary_worktype_data.plot(kind='line', marker='o', linestyle='-', figsize=(14, 7))

for line in ax1.get_lines():
    for x, y in zip(line.get_xdata(), line.get_ydata()):
        label = f"{y:.1f}%"
        ax1.annotate(label, (x, y), textcoords="offset points", xytext=(0, 5), ha='center')

plt.title('Percentage of Permits by Primary Work Types (2021-2024)', fontsize=18)
plt.xlabel('Year', fontsize=14)
plt.ylabel('Percentage of Permits (%)', fontsize=14)
plt.xticks(ticks=primary_worktype_data.index, labels=primary_worktype_data.index.astype(int), rotation=0)
plt.legend(title='Primary Work Type', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.grid(axis='y', linestyle='--', alpha=0.6)
plt.tight_layout()
plt.show()

ax2 = secondary_worktype_data.plot(kind='line', marker='o', linestyle='-', figsize=(14, 7))

for line in ax2.get_lines():
    for x, y in zip(line.get_xdata(), line.get_ydata()):
        label = f"{y:.1f}%"
        ax2.annotate(label, (x, y), textcoords="offset points", xytext=(0, 5), ha='center')

plt.title('Percentage of Permits by Secondary Work Types (2021-2024)', fontsize=18)
plt.xlabel('Year', fontsize=14)
plt.ylabel('Percentage of Permits (%)', fontsize=14)
plt.xticks(ticks=secondary_worktype_data.index, labels=secondary_worktype_data.index.astype(int), rotation=0)
plt.legend(title='Secondary Work Type', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.grid(axis='y', linestyle='--', alpha=0.6)
plt.tight_layout()
plt.show()

ax3 = other_worktype_data.plot(kind='line', marker='o', linestyle='-', figsize=(14, 7))

for line in ax3.get_lines():
    for x, y in zip(line.get_xdata(), line.get_ydata()):
        label = f"{y:.1f}%"
        ax3.annotate(label, (x, y), textcoords="offset points", xytext=(0, 5), ha='center')

plt.title('Percentage of Permits by Other Work Types (2021-2024)', fontsize=18)
plt.xlabel('Year', fontsize=14)
plt.ylabel('Percentage of Permits (%)', fontsize=14)
plt.xticks(ticks=other_worktype_data.index, labels=other_worktype_data.index.astype(int), rotation=0)
plt.legend(title='Other Work Type', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.grid(axis='y', linestyle='--', alpha=0.6)
plt.tight_layout()
plt.show()

**Findings**: 
- The share of Electrical, Plumbing & Utility Systems permits generally decreased, while the share of Renovation & Interior/Exterior Work slowly increased.
- Together, these two work types accounted for around 78% of all permits.
- Demand for Construction & New Installations and for Fire Protection & Safety fluctuated significantly, while Miscellaneous remained generally stable.
- Demolition, Occupancy & Use Change, Temporary Structures & Events, and Signage & Canopy appear to be under control, given their low percentages and relatively flat trends.
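The percentages in these charts divide each year's counts by that year's total, so every row sums to 100%, and a combined share such as the one quoted for the two primary work types is simply the sum of their columns. A minimal sketch with made-up counts:

```python
import pandas as pd

# Hypothetical yearly counts (rows = years, columns = work types)
counts = pd.DataFrame(
    {
        "Electrical, Plumbing & Utility Systems": [30, 45],
        "Renovation & Interior/Exterior Work": [50, 40],
        "Miscellaneous": [20, 15],
    },
    index=pd.Index([2021, 2022], name="issued_year"),
)

# Divide each row by its own total (axis=0 aligns the divisor on the row index)
shares = counts.div(counts.sum(axis=1), axis=0) * 100

# Combined share of the two primary work types in each year
primary = ["Electrical, Plumbing & Utility Systems", "Renovation & Interior/Exterior Work"]
combined = shares[primary].sum(axis=1)
print(shares.round(1))
print(combined.round(1))
```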

In [None]:
df['issued_date'] = pd.to_datetime(df['issued_date'], errors='coerce')
df = df.dropna(subset=['issued_date'])

df['issued_year'] = df['issued_date'].dt.year

occupancytype_counts_per_year = df.groupby(['issued_year', 'occupancytype']).size().unstack(fill_value=0)

# DataFrame.plot creates its own figure, so no separate plt.figure() call is needed
occupancytype_counts_per_year.plot(kind='line', marker='o', linestyle='-', figsize=(20, 10))

plt.title('Total Permits by Occupancy Types (2010-2024)', fontsize=20)
plt.xlabel('Year', fontsize=16)
plt.ylabel('Total Number of Permits', fontsize=16)
plt.xticks(rotation=0)
plt.legend(title='Occupancy Type', bbox_to_anchor=(1.05, 1), loc='upper left')

plt.grid(axis='y', linestyle='--', alpha=0.6)
plt.tight_layout()
plt.show()

**Findings**: 
- Since 2023, every occupancy type has either decreased or stayed roughly the same.
- The most common occupancy type is 1-2 family buildings.
- 1-3 family, 1-4 family, and single-unit (1 unit) buildings are also common occupancy types.
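A quick way to rank occupancy types overall is `value_counts()`, which returns counts sorted from most to least common. A minimal sketch on made-up labels (these are hypothetical and may not match the dataset's actual `occupancytype` codes):

```python
import pandas as pd

# Hypothetical occupancy labels, one per permit
toy = pd.Series(
    ["1-2 family", "1-2 family", "1-3 family", "1 unit", "1-2 family", "1-3 family"],
    name="occupancytype",
)

# value_counts() sorts in descending order, so the first entry is the most common type
ranking = toy.value_counts()
print(ranking)
print("Most common:", ranking.index[0])
```

On the notebook's data, the same call would be `df['occupancytype'].value_counts()`.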

In [None]:
status_counts_per_year = df.groupby(['issued_year', 'status']).size().unstack(fill_value=0)

# Pass figsize to DataFrame.plot; a separate plt.figure() would leave an empty figure
status_counts_per_year.plot(kind='line', marker='o', linestyle='-', figsize=(12, 6))

plt.title('Total Permits by Status (2010-2024)', fontsize=16)
plt.xlabel('Year', fontsize=14)
plt.ylabel('Total Number of Permits', fontsize=14)
plt.xticks(rotation=0)
plt.legend(title='Status', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.grid(axis='y', linestyle='--', alpha=0.6)
plt.tight_layout()
plt.show()

**Findings**: 
- Open permits outnumber Closed permits throughout 2010 to 2024.
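A claim like this can be checked year by year instead of by eye. A minimal sketch with made-up counts, assuming the `status` column uses the labels `"Open"` and `"Closed"` as the chart legend suggests:

```python
import pandas as pd

# Hypothetical yearly counts by status (rows = years, columns = statuses)
status_counts = pd.DataFrame(
    {"Open": [120, 110, 130], "Closed": [80, 95, 60]},
    index=pd.Index([2022, 2023, 2024], name="issued_year"),
)

# Boolean Series: True for each year in which Open permits outnumber Closed ones
open_exceeds_closed = status_counts["Open"] > status_counts["Closed"]
print(open_exceeds_closed)
print("Every year:", bool(open_exceeds_closed.all()))
```

On the notebook's data, the same comparison would apply to `status_counts_per_year`.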

In [None]:
df['issued_date'] = pd.to_datetime(df['issued_date'], errors='coerce')
df = df.dropna(subset=['issued_date'])

df['issued_year'] = df['issued_date'].dt.year

df_filtered = df[df['issued_year'].between(2021, 2024)]

status_counts_per_year = df_filtered.groupby(['issued_year', 'status']).size().unstack(fill_value=0)

status_percentages_per_year = status_counts_per_year.div(status_counts_per_year.sum(axis=1), axis=0) * 100

# Pass figsize to DataFrame.plot; a separate plt.figure() would leave an empty figure
status_percentages_per_year.plot(kind='line', marker='o', linestyle='-', figsize=(12, 6))

for year in status_percentages_per_year.index:
    for status in status_percentages_per_year.columns:
        plt.text(year, status_percentages_per_year.loc[year, status], 
                 f"{status_percentages_per_year.loc[year, status]:.1f}%", 
                 ha='center', va='bottom', fontsize=8)

plt.title('Percentage of Permits by Status (2021-2024)', fontsize=16)
plt.xlabel('Year', fontsize=14)
plt.ylabel('Percentage of Permits (%)', fontsize=14)
plt.xticks(ticks=status_percentages_per_year.index, labels=status_percentages_per_year.index.astype(int), rotation=0)
plt.legend(title='Status', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.grid(axis='y', linestyle='--', alpha=0.6)
plt.tight_layout()
plt.show()


**Findings**: 
- The share of Open permits gradually decreased from 2021 to 2023 but began increasing afterward.
- Conversely, the share of Closed permits increased slightly from 2021 to 2023 but started decreasing afterward.
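Turning points like these can be read off the year-over-year change in each share, which `Series.diff()` computes (the first year has no predecessor, so it is `NaN`). A minimal sketch with made-up Open-permit shares:

```python
import pandas as pd

# Hypothetical share (%) of Open permits per year
open_share = pd.Series(
    [60.0, 57.0, 55.0, 63.0],
    index=pd.Index([2021, 2022, 2023, 2024], name="issued_year"),
)

# diff() subtracts the previous year's value; negative = decrease, positive = increase
yoy_change = open_share.diff()
print(yoy_change)
```

On the notebook's data, the equivalent would be `status_percentages_per_year['Open'].diff()`, assuming the column is labeled `Open`.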

### 5. Conclusion

- Electrical, Plumbing & Utility Systems and Renovation & Interior/Exterior Work are the most common work types among building permits.
- Construction & New Installations and Fire Protection & Safety fluctuated significantly and should be monitored closely to prevent larger issues from occurring.
- Since 2023, every occupancy type has either decreased or stayed roughly the same.
- About 75% of permits remain open, a significantly higher share than in previous years.