## Key Insights
> **1. Distribution of Rental Prices**
>- Rental prices show a **right-skewed distribution**, with an average rent of approximately **¥129,149**.
>- The **95% confidence interval** indicates that the **true mean rent for listings in the designated area** is likely within **¥129,148.68 ± ¥1,941.43**.
>- For staff searching for housing, this range provides a useful baseline for estimating expected rental costs.
>
> **2. Relationships Between Rent & Other Features**
>- Floor Area: The Pearson correlation coefficient (~0.881) with a near-zero p-value indicates a **strong positive relationship** between `area` and `rent`.
>- Building Age: The negative correlation coefficient (approximately -0.400), paired with a very small p-value, suggests a **moderate inverse relationship** between `building_age` and `rent`. In other words, **newer buildings tend to be more expensive**.
>- Floor Plan: **More complex or larger** `floor_plan` categories generally correspond to **higher rental prices**, which aligns with the increase in usable space (`area`).
>- Station Proximity: Contrary to expectations, `distance_to_nearest_station` **does not appear to meaningfully affect** `rent`.
>- Nearest Station: Similarly, **no clear relationship exists** between `nearest_station` and `rent`, suggesting that most stations offer **comparable rates** in this dataset. This finding raises a natural follow-up question: **"What are the conditions or characteristics of typical listings near each station?"** For example, some stations may have newer developments nearby than others, so differences in the housing stock could be masking a location effect. This is worth exploring in future analyses.
>- Building Size: **Rental prices tend to increase as** `building_size` **increases** in a mostly linear fashion.
>- Floor Level: In general, `rent` **increases with** `floor` in a fairly linear manner.
>- Understanding how rent relates to various housing features provides staff with meaningful context about which factors significantly influence pricing and which have minimal impact. This insight can help incoming field workers evaluate their housing preferences, prioritize features, and form realistic expectations about how specific amenities or characteristics may affect availability and cost. 
>
> **3. Rental Price Prediction**
>- We built a **multivariate linear regression model** to predict `rent` using the **most influential features**: `area`, `building_age`, `building_size`, `floor`, and `floor_plan`.
>- The model achieved an $R^2$ of approximately **89.3% on the training set** and **88.0% on the testing set**, meaning it explains roughly 88% of the variation in rental prices on unseen listings. The small gap between the two scores suggests limited overfitting, which indicates **strong overall performance**.
>- This model provides a practical tool for estimating rental costs based on key housing characteristics, supporting more informed budgeting and housing decisions for current and future staff.
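
The statistics summarized above (a 95% confidence interval for the mean, Pearson correlations, and train/test $R^2$ for a linear regression) can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the data are synthetic stand-ins generated on the spot, the coefficients and the four `floor_plan` categories are hypothetical, and the confidence interval uses a normal approximation (z = 1.96), which is reasonable for samples in the thousands.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-in data: NOT the scraped listings. All coefficients are made up.
area = rng.uniform(15, 80, n)                  # floor area in square meters
building_age = rng.uniform(0, 40, n)           # years since construction
building_size = rng.integers(2, 15, n)         # number of storeys in the building
floor = rng.integers(1, building_size + 1)     # unit's floor, never above building_size
plan_codes = rng.integers(0, 4, n)             # four hypothetical floor_plan categories
plan_effect = np.array([0.0, 3000.0, 8000.0, 15000.0])

rent = (20000 + 2500 * area - 600 * building_age + 500 * building_size
        + 400 * floor + plan_effect[plan_codes] + rng.normal(0, 8000, n))


def mean_confidence_interval(values, z=1.96):
    """Return (mean, half_width) of a normal-approximation confidence interval."""
    values = np.asarray(values, dtype=float)
    half_width = z * values.std(ddof=1) / np.sqrt(values.size)
    return values.mean(), half_width


# 1. Confidence interval for the mean rent.
mean_rent, half_width = mean_confidence_interval(rent)
print(f"mean rent: {mean_rent:,.2f} +/- {half_width:,.2f}")

# 2. Pearson correlation of rent with two numeric features.
r_area = np.corrcoef(area, rent)[0, 1]
r_age = np.corrcoef(building_age, rent)[0, 1]
print(f"corr(area, rent) = {r_area:.3f}, corr(building_age, rent) = {r_age:.3f}")

# 3. Multivariate linear regression with one-hot encoded floor_plan.
#    The first category is dropped to avoid redundancy with the intercept.
plan_dummies = np.eye(4)[plan_codes][:, 1:]
X = np.column_stack([area, building_age, building_size, floor, plan_dummies])
X_train, X_test, y_train, y_test = train_test_split(
    X, rent, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
r2_train = model.score(X_train, y_train)   # R^2 on the data used for fitting
r2_test = model.score(X_test, y_test)      # R^2 on held-out data
print(f"R^2 train = {r2_train:.3f}, R^2 test = {r2_test:.3f}")
```

Comparing `r2_train` with `r2_test` is the same check as comparing the training and testing scores reported above: a large gap would point to overfitting.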

## Limitations & Future Directions 
> **1. Incomplete Coverage of the Full Housing Dataset**
> 
> Because of the structure of *SUUMO.jp*, it was not possible to capture all available listings. Approximately 5,000 listings were successfully scraped, but they represent only a portion of the full rental market. As a result, the analysis should be interpreted as a general overview of market trends rather than a comprehensive assessment of all available properties. Despite this limitation, the dataset is sufficiently large to support regression modeling and to identify typical rental characteristics and pricing patterns. Future iterations of this analysis could explore more robust data collection methods to increase coverage and improve the representativeness of the sample.
> 
> **2. Continuously Changing Market Data**
> 
> Our dataset represents only a single snapshot in time, whereas *SUUMO.jp* updates its listings frequently. We scraped the site once, stored the data in SQLite, and performed all analysis on the static dataset. To better reflect real market behavior, future iterations should incorporate an automated, recurring scraping process (e.g., daily or weekly) to capture new listings and pricing changes over time. This expanded dataset could improve model performance and would also provide more meaningful insights into housing availability, pricing trends, and market volatility.
> 
> **3. Nonlinear Relationships Among Features**
> 
> Visualizing the relationships between rent and other key features revealed some nonlinear patterns. For simplicity, our current model assumes linearity. While the resulting $R^2$ indicates strong predictive performance, future analyses could benefit from applying polynomial transformations (e.g., using `PolynomialFeatures` in scikit-learn) to better capture these nonlinear effects. Incorporating polynomial features has the potential to improve model accuracy, but it also increases the risk of overfitting because the number of model terms grows quickly with the polynomial degree. In such cases, regularization (e.g., ridge or lasso regression) is recommended to ensure the model generalizes well to unseen data.
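
As a rough illustration of the polynomial-plus-regularization idea, the sketch below compares a plain linear model with a degree-2 polynomial ridge regression. It is only a sketch under assumptions: the data are synthetic with a deliberately curved relationship, only two features are used for brevity, and the degree (2) and penalty strength (`alpha=1.0`) are arbitrary placeholders that would need tuning (for example by cross-validation) on the real dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(1)
n = 2000

# Synthetic stand-in data with built-in curvature: NOT the scraped listings.
area = rng.uniform(15, 80, n)           # floor area in square meters
building_age = rng.uniform(0, 40, n)    # years since construction
rent = (30000 + 1000 * area + 25 * area**2
        - 1500 * building_age + 20 * building_age**2
        + rng.normal(0, 5000, n))

X = np.column_stack([area, building_age])
X_train, X_test, y_train, y_test = train_test_split(
    X, rent, test_size=0.2, random_state=0
)

# Baseline: ordinary least squares on the raw features.
linear = LinearRegression().fit(X_train, y_train)

# Degree-2 polynomial terms, standardized so the ridge penalty treats
# all terms on a comparable scale, then an L2-regularized linear fit.
poly_ridge = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    StandardScaler(),
    Ridge(alpha=1.0),
).fit(X_train, y_train)

linear_r2 = linear.score(X_test, y_test)
poly_r2 = poly_ridge.score(X_test, y_test)
print(f"test R^2, linear:           {linear_r2:.4f}")
print(f"test R^2, polynomial ridge: {poly_r2:.4f}")
```

Scoring both models on the same held-out split keeps the comparison fair: an improvement that appears only on the training data would be a sign of overfitting rather than a better model.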

## Conclusion
> This analysis offers a data-driven overview of Tokyo's rental housing market, centered around the Nakai neighborhood, and identifies the features most strongly associated with rental prices. Rental prices are right-skewed. Floor area shows the strongest association with rent, followed by building age, building size, and floor level. In contrast, station proximity and specific station choice showed little influence on pricing within this dataset, suggesting relative price consistency across locations.
>
> The multivariate regression model performed well, explaining approximately 88% of the variation in rental prices on the testing set and providing a practical framework for estimating housing costs. Despite limitations in listing coverage and the single-snapshot nature of the data, the results offer actionable insights for current and incoming field staff. With more comprehensive scraping and more flexible predictive models (e.g., regularized polynomial regression), future iterations of this project could deliver more accurate insights into rental prices and market behavior.