# District 7 Housing
This project aims to conduct an in-depth data analysis of housing in District 7. It explores property trends, violation statistics, income-restricted housing needs, and other housing-related questions in District 7. Finally, we will perform a comparative analysis between District 7 and Boston as a whole. If time permits, we will also compare District 7 to Massachusetts.

## 1. Understanding of Datasets

We gathered data to address several housing-related questions posed by the councilor. For instance, we used Boston Census data to determine the number of housing units, both owner-occupied and renter-occupied, which gives a clearer view of the housing situation in District 7.

### 1.1 Income-Restricted Housing Inventory

We found data on income-restricted housing, which refers to units designated as affordable for lower-income individuals or families. While this mainly applies to rental units, there are also cases where purchasing restrictions are applied through affordable housing programs.

The councilor's questions focused on how the number of income-restricted units has changed over time, highlighting the growing difficulty people face in obtaining affordable housing and the need for government subsidies. We also addressed questions about Section 8 vouchers, a form of government assistance that helps low-income families, elderly individuals, and people with disabilities afford housing in the private market. Our analysis sought to determine how many units accept these vouchers, offering insight into the availability of subsidized housing.

### 1.2 RentSmart

We utilized the RentSmart dataset for an in-depth analysis of inspectional services, which enforce building, housing, health, and environmental regulations to protect public health and safety. The dataset, ranging from 2016 to 2024, includes information such as violation types, addresses, neighborhoods, zip codes, and property characteristics (e.g., building and remodel years).

From this data, we can analyze how the total numbers of violations and properties have changed over time and explore the correlation between them. Preliminary analysis indicates that enforcement violations are the most common type and have fluctuated significantly since 2020. Notably, households with three residents are the most prevalent across all areas of District 7. Together, these findings help us identify which violations are under control and which require more attention, while accounting for the number of residents per household.
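A minimal pandas sketch of the year-by-year comparison described above, assuming the cleaned RentSmart data has columns named `date`, `address`, and `violation_type` (hypothetical names; the real headers may differ). It counts violations and distinct properties per year and computes the share of each violation type per year; the sample rows are toy values, not real RentSmart records.

```python
import pandas as pd


def yearly_summary(
    df: pd.DataFrame,
    date_col: str = "date",            # hypothetical column names
    property_col: str = "address",
    type_col: str = "violation_type",
) -> pd.DataFrame:
    """Count violations and distinct properties per calendar year."""
    years = pd.to_datetime(df[date_col]).dt.year
    summary = df.assign(year=years).groupby("year").agg(
        total_violations=(type_col, "count"),  # rows with a recorded violation type
        properties=(property_col, "nunique"),  # distinct addresses cited that year
    )
    summary["violations_per_property"] = (
        summary["total_violations"] / summary["properties"]
    )
    return summary


# Toy records for illustration only (not real RentSmart data).
sample = pd.DataFrame({
    "date": ["2020-01-05", "2020-06-10", "2020-07-01", "2021-02-03"],
    "address": ["1 A St", "1 A St", "2 B St", "3 C St"],
    "violation_type": ["Enforcement", "Enforcement", "Housing", "Enforcement"],
})
summary = yearly_summary(sample)

# Share of each violation type within each year (rows sum to 1),
# e.g. to track how enforcement violations fluctuate over time.
type_share = pd.crosstab(
    pd.to_datetime(sample["date"]).dt.year,
    sample["violation_type"],
    normalize="index",
)
```

Once several years are available, `summary["total_violations"].corr(summary["properties"])` gives the Pearson correlation between the two yearly series; with only two toy years, as here, that number is not meaningful.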

### 1.3 Boston Census

This comprehensive dataset contains detailed housing occupancy information, including the number of units per structure, the number of rooms, ownership status, and utilities. Using this data, we can examine housing density and distribution across District 7, compare it with other districts, and assess housing affordability and availability across neighborhoods. Additionally, we can study occupancy and ownership trends and analyze neighborhood infrastructure and utility needs. These insights are valuable for future urban planning and development in District 7.

## 2. Data Cleaning and Preprocessing

### 2.1 Income-Restricted Housing Dataset

While cleaning the income-restricted housing dataset, we experimented with various methods before settling on the following approach:

- We dropped rows with missing values, as they accounted for less than 5% of the dataset.

- The ZIP code column initially contained floating-point numbers. We corrected this by converting the values to integers, then to strings, and finally added leading zeros for proper ZIP code formatting.

- We filtered the data by the ZIP codes associated with District 7's neighborhoods. Some ZIP codes were shared with neighborhoods outside the councilor's jurisdiction, so we further refined the dataset by cleaning and standardizing neighborhood names before filtering for the relevant ones.
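The cleaning steps above can be sketched in pandas roughly as follows. This is an illustration under stated assumptions, not our exact script: the column names `zip_code` and `neighborhood`, the helper name `clean_income_restricted`, and the "strip whitespace and title-case" rule for standardizing neighborhood names are all hypothetical. The ZIP codes and neighborhood names passed in the example are placeholders, not a verified list of District 7 areas.

```python
import pandas as pd


def clean_income_restricted(
    df: pd.DataFrame,
    zip_codes: set[str],
    neighborhoods: set[str],
    zip_col: str = "zip_code",        # hypothetical column names
    nbhd_col: str = "neighborhood",
) -> pd.DataFrame:
    """Drop missing rows, fix ZIP formatting, and filter to the given areas."""
    # Drop any row with a missing value (reported as < 5% of the data).
    out = df.dropna().copy()

    # Float -> int -> str -> zero-padded, e.g. 2121.0 -> 2121 -> "2121" -> "02121".
    out[zip_col] = out[zip_col].astype(int).astype(str).str.zfill(5)

    # Standardize neighborhood names: trim whitespace and use title case.
    out[nbhd_col] = out[nbhd_col].str.strip().str.title()

    # Keep rows whose ZIP code AND neighborhood are both in the target sets,
    # so shared ZIP codes do not pull in neighborhoods outside the district.
    mask = out[zip_col].isin(zip_codes) & out[nbhd_col].isin(neighborhoods)
    return out[mask].reset_index(drop=True)


# Toy input for illustration only.
raw = pd.DataFrame({
    "zip_code": [2121.0, 2118.0, 2199.0, None],
    "neighborhood": [" roxbury", "SOUTH END", "Back Bay", "Roxbury"],
    "units": [40, 12, 8, 5],
})
clean = clean_income_restricted(
    raw,
    zip_codes={"02121", "02118"},            # placeholder ZIP codes
    neighborhoods={"Roxbury", "South End"},  # placeholder neighborhood names
)
```

Filtering on both columns at once reflects the point above that some ZIP codes are shared with neighborhoods outside the councilor's jurisdiction; a ZIP-only filter would keep those rows.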

### 2.2 RentSmart Dataset

To clean the RentSmart dataset, we applied the following steps:

- The date column included unnecessary time information, so we removed the hours, minutes, and seconds, retaining only the year, month, and day.

- In the description column, we grouped similar violation descriptions. For example, "Work without permit" and "Work w/o Permit" were merged into "Work Without Permit."

- ZIP codes were missing their leading zeros, and the year-built and year-remodeled columns contained trailing ".0" suffixes, so we reformatted these columns accordingly.

- We filtered the data by ZIP code and neighborhood, following the same approach used for the income-restricted housing dataset to ensure that only District 7 data was included.
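A rough pandas sketch of the first three RentSmart steps, assuming columns named `date`, `description`, `zip_code`, `year_built`, and `year_remodeled` (hypothetical names). The variant-to-canonical mapping is a hypothetical example built from the "Work Without Permit" case above; the real grouping covers many more descriptions. The ZIP/neighborhood filter is omitted here for brevity.

```python
import pandas as pd

# Hypothetical mapping from lower-cased variant spellings to one canonical label.
DESCRIPTION_MAP = {
    "work without permit": "Work Without Permit",
    "work w/o permit": "Work Without Permit",
}


def clean_rentsmart(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()

    # Parse timestamps and drop hours/minutes/seconds; normalize() sets the
    # time to midnight while keeping a datetime dtype for later grouping.
    out["date"] = pd.to_datetime(out["date"]).dt.normalize()

    # Map known variants to a canonical description; leave others unchanged.
    key = out["description"].str.strip().str.lower()
    out["description"] = key.map(DESCRIPTION_MAP).fillna(out["description"])

    # Restore leading zeros, e.g. 2119 -> "02119".
    out["zip_code"] = out["zip_code"].astype(str).str.zfill(5)

    # Float years such as 1920.0 -> 1920; nullable Int64 keeps missing years as <NA>.
    for col in ("year_built", "year_remodeled"):
        out[col] = pd.to_numeric(out[col], errors="coerce").astype("Int64")

    return out


# Toy input for illustration only.
raw = pd.DataFrame({
    "date": ["2021-03-04 13:45:10", "2022-07-19 08:00:00"],
    "description": ["Work w/o Permit", "Unsafe structure"],
    "zip_code": [2119, 2120],
    "year_built": [1920.0, 1985.0],
    "year_remodeled": [2005.0, None],
})
clean = clean_rentsmart(raw)
```

Using the nullable `Int64` dtype instead of plain `int` matters for the year-remodeled column, where many properties presumably have no remodel year; a plain integer cast would fail on the missing values.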

### 2.3 Boston Census Dataset

In our initial preprocessing of the Boston Census dataset, we took the following steps:

- We selected the relevant ZIP codes for District 7 and downloaded housing data for the years 2011 to 2022. Although we expected a single dataset, we received 12 datasets (one per year), each containing over 500 columns.

- Each dataset had 4 sub-columns for each primary column: estimates, margins of error, percentages, and percentage margins of error. For preprocessing, we focused on the estimate and percentage columns.

- The column names were complex and difficult to interpret. For example, the column for the estimated total number of housing units was labeled "Estimate!!HOUSING OCCUPANCY!!Total housing units." We are working on cleaning these column names to make them more readable and consistent across datasets.

- We aim to consolidate the data from multiple years into a single dataset for year-by-year comparisons across ZIP codes.
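One possible way to do the column selection, renaming, and year-by-year consolidation is sketched below. It assumes each yearly file has already been loaded into a DataFrame with one row per ZIP code (a `zip_code` column, hypothetical) and column names of the form "Estimate!!HOUSING OCCUPANCY!!Total housing units". The helper names and the snake_case naming rule are our own illustrative choices, and the toy frames stand in for the real 500+ column files.

```python
import re

import pandas as pd

# Keep estimates and percentages; "Margin of Error!!" and
# "Percent Margin of Error!!" columns do not match these prefixes.
KEEP_PREFIXES = ("Estimate!!", "Percent!!")


def simplify_name(col: str) -> str:
    """'Estimate!!HOUSING OCCUPANCY!!Total housing units'
    -> 'estimate_housing_occupancy_total_housing_units'"""
    parts = [p.strip().lower() for p in col.split("!!")]
    return "_".join(re.sub(r"[^a-z0-9]+", "_", p).strip("_") for p in parts)


def tidy_year(df: pd.DataFrame, year: int, id_col: str = "zip_code") -> pd.DataFrame:
    keep = [c for c in df.columns if c.startswith(KEEP_PREFIXES)]
    out = df[[id_col] + keep].rename(columns={c: simplify_name(c) for c in keep})
    out.insert(0, "year", year)  # tag every row with its survey year
    return out


def consolidate(frames_by_year: dict[int, pd.DataFrame]) -> pd.DataFrame:
    # Columns missing in some years (labels change over time) become NaN.
    return pd.concat(
        [tidy_year(df, y) for y, df in sorted(frames_by_year.items())],
        ignore_index=True,
    )


label = "HOUSING OCCUPANCY!!Total housing units"


def toy_frame(zip_code: str) -> pd.DataFrame:
    # Toy values only, not real census figures.
    return pd.DataFrame({
        "zip_code": [zip_code],
        "Estimate!!" + label: [1],
        "Margin of Error!!" + label: [0],
        "Percent!!" + label: [100.0],
        "Percent Margin of Error!!" + label: [0.0],
    })


combined = consolidate({2012: toy_frame("02121"), 2011: toy_frame("02121")})
```

The resulting long table (one row per year and ZIP code) can then be filtered or pivoted for year-by-year comparisons across ZIP codes.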

## 3. Challenges

### 3.1 District 7 Jurisdiction

We faced challenges determining which projects from the income-restricted housing dataset are in District 7, as the dataset identifies areas by neighborhood rather than district. For projects in the South End, we suspect that the provided ZIP codes may be incorrect, as multiple ZIP codes are listed when the South End should only have one.

We consulted the research guide librarian, Lucy Flamm, who recommended using the "Find My Councilor" tool to clarify which addresses in the South End belong to District 7. For addresses that cannot be verified through this tool, we will compare them with the Massachusetts Interactive Property Map and Boston's precinct map. This process will only apply to projects in the South End, making it feasible to handle manually.

A CSV file containing only income-restricted housing projects in District 7 is now available on GitHub, but it hasn't been used in the current data cleaning and preprocessing due to the time required for manual verification. We plan to incorporate this dataset into future analyses.

**Update:** The client has indicated that he is most interested in ZIP codes 02199 and 02121, since these ZIP codes fall wholly within District 7 and make up the majority of the district. We are confirming with the client whether this is truly what he wants, because restricting to these two ZIP codes would filter out a large share of the income-restricted housing dataset.
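To quantify how much data the two-ZIP restriction would remove before confirming with the client, a quick check like the following could help. It is a sketch assuming a cleaned DataFrame with a zero-padded `zip_code` string column (hypothetical name); the helper `share_retained` and the toy rows are illustrative only.

```python
import pandas as pd

CLIENT_ZIPS = {"02199", "02121"}  # ZIP codes named by the client


def share_retained(
    df: pd.DataFrame, zip_col: str = "zip_code", zips: set[str] = CLIENT_ZIPS
) -> tuple[int, int, float]:
    """Return (rows kept, total rows, fraction kept) under a ZIP-only filter."""
    kept = df[zip_col].isin(zips)
    return int(kept.sum()), len(df), float(kept.mean())


# Toy rows for illustration only.
toy = pd.DataFrame({"zip_code": ["02121", "02118", "02199", "02119"]})
kept, total, fraction = share_retained(toy)
```

Running the same check with a units column summed instead of row counts would show how many income-restricted units, rather than projects, fall outside the two ZIP codes.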

## 4. Insufficient Data

Determining whether a South End address is part of District 7 remains challenging, particularly for larger datasets like RentSmart, which cannot be manually processed. We are considering reaching out to Boston’s city data team for assistance, as Lucy Flamm suggested, in case they have privately stored data that could help us.