# Question Formulation

## 1. Research Question 1: Unit Price Efficiency Analysis
**The Question**

How does the relationship between property size (`square_feet`) and unit price (`price_per_sqft`) vary across different BHK configurations (1, 2, 3, 4+) and area types (Super Area vs. Carpet Area)?

Specifically: Does a specific **"area threshold"** exist where increasing the square footage no longer results in a competitive unit price (diminishing returns), and does this threshold differ between BHK groups?
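
One way to look for such a threshold is to bin listings by size and trace the median unit price per bin within each BHK group. The sketch below is illustrative only: the function names, the 250 sq ft `bin_width`, and the "first upturn" definition of a threshold are assumptions, not part of the dataset or a settled method. It uses only the standard library.

```python
from collections import defaultdict
from statistics import median


def median_unit_price_by_size_bin(rows, bin_width=250):
    """rows: (bhk_group, square_feet, price_per_sqft) tuples.

    Returns {bhk_group: [(bin_start, median_price_per_sqft), ...]} sorted by size.
    """
    buckets = defaultdict(list)
    for bhk_group, square_feet, price_per_sqft in rows:
        # Assign each listing to a fixed-width size bin (assumed width).
        bin_start = int(square_feet // bin_width) * bin_width
        buckets[(bhk_group, bin_start)].append(price_per_sqft)

    curves = defaultdict(list)
    for (bhk_group, bin_start), prices in sorted(buckets.items()):
        curves[bhk_group].append((bin_start, median(prices)))
    return dict(curves)


def first_upturn(curve):
    """Hypothetical threshold: first bin whose median exceeds the previous bin's."""
    for (_, prev), (bin_start, cur) in zip(curve, curve[1:]):
        if cur > prev:
            return bin_start
    return None


rows = [
    ("2", 600, 4000), ("2", 700, 4200),
    ("2", 900, 3500), ("2", 1100, 3900),
    ("3", 1200, 3000),
]
curves = median_unit_price_by_size_bin(rows)
threshold_2bhk = first_upturn(curves["2"])
```

Comparing the resulting thresholds across BHK groups (and separately for Super Area and Carpet Area listings) addresses the second half of the question. Medians are used so that a few extreme listings do not dominate a bin.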

**Motivation & Benefits**
* **Why this is worth investigating:** Real estate pricing is rarely linear. This analysis explores the "Core Pricing Logic," verifying whether larger properties offer economies of scale or if there is a point of diminishing returns.
* **Insights provided:** It identifies the "sweet spot" size for each apartment configuration (e.g., the optimal size for a 2BHK before it becomes overpriced relative to the market).
* **Stakeholders:**
    * **Home Buyers:** To identify "value-for-money" properties and avoid paying a premium for inefficiently large spaces.
    * **Investors:** To maximize rental yield potential by purchasing efficiently sized units.
    * **Real Estate Agents:** To justify pricing strategies to sellers based on market data.
* **Real-world decision:** This informs the decision of choosing a property that offers the best utility-to-price ratio.

## 2. Research Question 2: Neighborhood Price Premium
**The Question**

After controlling for structural and listing attributes (e.g., `square_feet`, `bhk`, `areaWithType`, `transaction`, `status`, `furnishing`, and floor-related variables such as `floor`, `num_floor`), which localities in Surat exhibit statistically significant price premiums or discounts in `price_per_sqft`?

Concretely, we will extract the locality from `property_name` (e.g., “... in Dindoli Surat”) and estimate each locality’s adjusted premium/discount relative to the market baseline while holding other factors constant.
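
The locality extraction step could look like the minimal sketch below. It assumes that every `property_name` ends with "in <Locality> Surat", as in the example above. The helper name `extract_locality` and the regular expression are illustrative, and real listings may need extra cleaning.

```python
import re

# Assumed pattern: "<description> in <Locality> Surat" at the end of the string.
# The greedy ".*" makes the match start at the LAST " in " before "Surat".
_LOCALITY_RE = re.compile(r".*\bin\s+(.+?)\s+Surat\s*$", re.IGNORECASE)


def extract_locality(property_name):
    """Return the locality named before a trailing 'Surat', or None if absent."""
    match = _LOCALITY_RE.match(property_name.strip())
    if match is None:
        return None
    # Drop stray separators such as the comma in "Vesu, Surat".
    return match.group(1).strip(" ,")


locality = extract_locality("2 BHK Apartment for Sale in Dindoli Surat")
```

Names that do not match return `None`, so they can be inspected or dropped before estimating locality effects.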

**Motivation & Benefits**
* **Why this is worth investigating:** Location is typically the strongest driver of real-estate prices, but naive comparisons (e.g., mean price by locality) can be misleading because different localities may have different mixes of property types (BHK, size, furnishing level, etc.). We need an approach that isolates the “pure” location effect.
* **Insights provided:** This analysis produces an evidence-based ranking of localities by adjusted price premium/discount, including uncertainty (confidence intervals). It reveals where price differences persist even after controlling for property attributes.
* **Stakeholders:**
    * **Home buyers**, **sellers**, **brokers/agents**, **investors**, and **property listing platforms** (pricing guidance, valuation, and anomaly detection).
* **Real-world decision:** This informs which locality to buy or list in, and whether a locality's asking prices reflect a genuine location premium rather than its mix of property types.

## 3. Research Question 3: Pricing Uncertainty by Segment
**The Question**

Which market segments in Surat have the highest pricing uncertainty in `price_per_sqft`?
We quantify uncertainty using robust dispersion metrics, primarily the interquartile range (IQR), computed on `price_per_sqft_c` (`price_per_sqft` clipped to 1%–99%). Specifically:

* Rank segments defined by `transaction` (New vs Resale) × `bhk_group` (1, 2, 3, 4+) by their IQR of `price_per_sqft_c`.
* Test whether dispersion differs significantly between New vs Resale within each `bhk_group`.
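
The sketch below shows how the clipping, the IQR ranking, and a dispersion test could be computed using only the standard library. All function names are hypothetical. The permutation test on median-centered values is one assumed choice of test, not something fixed by the question, and the linear-interpolation percentile is likewise an assumption.

```python
import random
from collections import defaultdict
from statistics import median


def percentile(sorted_vals, q):
    """Linearly interpolated percentile (q in [0, 100]) of a pre-sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def iqr(values):
    s = sorted(values)
    return percentile(s, 75) - percentile(s, 25)


def clip_1_99(values):
    """Clip values to their own 1st-99th percentile range."""
    s = sorted(values)
    low, high = percentile(s, 1), percentile(s, 99)
    return [min(max(v, low), high) for v in values]


def rank_segments_by_iqr(rows):
    """rows: (transaction, bhk_group, price_per_sqft_c) tuples; widest IQR first."""
    groups = defaultdict(list)
    for transaction, bhk_group, price in rows:
        groups[(transaction, bhk_group)].append(price)
    return sorted(((seg, iqr(p)) for seg, p in groups.items()),
                  key=lambda item: item[1], reverse=True)


def iqr_permutation_pvalue(a, b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in IQR (assumed test)."""
    # Center each group on its median so only spread, not price level, is compared.
    med_a, med_b = median(a), median(b)
    a = [v - med_a for v in a]
    b = [v - med_b for v in b]
    observed = abs(iqr(a) - iqr(b))
    pooled = a + b
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(iqr(pooled[:len(a)]) - iqr(pooled[len(a):])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


rows = ([("New Property", "2", p) for p in [1000, 2000, 3000, 4000, 5000]]
        + [("Resale", "2", 3000)] * 5)
ranked = rank_segments_by_iqr(rows)
```

Running `iqr_permutation_pvalue` once per `bhk_group`, with the New and Resale prices as the two samples, yields one p-value per group. With four groups, a multiple-comparison correction may be worth considering.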

**Motivation & Benefits**
* **Why this is worth investigating:** Average prices alone do not capture market stability. Two segments may share similar medians but have very different spreads: one is predictable (stable pricing), the other is volatile (hard to price, larger negotiation risk).
* **Insights provided:** 
  * A clear ranking of “most volatile” vs “most stable” segments using an outlier-robust spread metric.
  * Evidence on whether New and Resale markets differ in price dispersion for the same BHK size.
* **Stakeholders:**
    * Home buyers: understand where listing prices are less reliable and where negotiation may matter.
    * Agents / listing platforms: provide price ranges (confidence bands) rather than single-point prices.
    * Investors: identify segments with greater mispricing opportunities due to higher dispersion.
* **Real-world decision:** “Should I trust listing prices in this segment, or do I need more comparable checks and stronger negotiation because the market is noisy?”

## 4. Research Question 4: Floor Effect Analysis
**Question**

Does a floor effect exist in residential property prices, and does this effect differ by ``transaction`` type (``New Property`` vs. ``Resale``) and ``furnishing``?

This question is worth investigating because the floor effect offers practical and economic insight into how properties are priced.

By answering this question, we can:
- Quantify how much price premium is associated with higher floors.
- Determine whether this premium depends on building height (``floor_total``).
- Identify whether ``New Property`` and ``Resale`` markets behave differently, reflecting differences in buyer expectations, building age, and marketing strategies.
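
As a first, unadjusted look at the floor premium, one could fit a least-squares slope of unit price on floor number within each group. The sketch below assumes that floor has already been parsed to a number and that unit price is the outcome. The function names are hypothetical, and the slope does not control for size, locality, or building height.

```python
from collections import defaultdict


def ols_slope(xs, ys):
    """Least-squares slope of y on x (change in y per one-unit change in x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def floor_premium_by_group(rows):
    """rows: (group, floor, unit_price) tuples -> {group: price change per floor}."""
    floors, prices = defaultdict(list), defaultdict(list)
    for group, floor, price in rows:
        floors[group].append(floor)
        prices[group].append(price)
    return {g: ols_slope(floors[g], prices[g]) for g in floors}


rows = [
    ("New Property", 1, 4000), ("New Property", 2, 4050),
    ("New Property", 3, 4100), ("New Property", 4, 4150),
    ("Resale", 1, 3000), ("Resale", 2, 3000),
    ("Resale", 3, 3000), ("Resale", 4, 3000),
]
premiums = floor_premium_by_group(rows)
```

Grouping by ``furnishing`` instead of ``transaction``, or by a low-rise/high-rise split, reuses the same helper. A full answer would need a regression with controls and interaction terms.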

Several stakeholders care about this analysis:
- **Home buyers**: can decide whether paying extra for higher floors is justified given their budget and priorities.
- **Investors**: can evaluate whether floor premiums translate into better resale value or rental yield.

This question generalizes several real-world scenarios:
- *How much more should I pay for an apartment on the 12th floor compared to the 3rd floor?*
- *Is the floor premium only meaningful in high-rise buildings?*
- *Does it make sense to pay extra for a higher floor in a resale apartment?*

## 5. Research Question 5: Property Market Segmentation in Surat

**Question**

Can clustering techniques identify meaningful groups of properties such as affordable low-rise resale units, luxury high-rise new developments, and mid-range family apartments?

This question is worth investigating because the real estate market is heterogeneous. Properties differ not only in price but also in physical structure, vertical position, and intended buyer segment. Simple averages hide these differences.

By answering this question, we can:
- Discover natural groupings without predefined labels
- Understand how price, height, and size jointly shape market segments
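
To illustrate the idea, the sketch below runs plain k-means (Lloyd's algorithm) on standardized features. K-means is only one assumed choice of clustering technique. The feature order (price, floor, size), the toy values, and the function names are all hypothetical.

```python
import random


def standardize(points):
    """Scale each feature column to zero mean and unit variance."""
    cols = list(zip(*points))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    return [tuple((v - m) / s for v, m, s in zip(p, means, stds)) for p in points]


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels


# Toy rows: (price, floor, size); values are made up for illustration.
points = [(20, 2, 500), (22, 3, 550), (21, 2, 520),
          (150, 12, 2000), (160, 14, 2100), (155, 13, 2050)]
labels = kmeans(standardize(points), k=2)
```

Standardizing first matters because price and size are on much larger scales than floor number and would otherwise dominate the distance. The number of clusters `k` is a free choice that would need its own justification.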

Several stakeholders care about this analysis:
- **Home buyers**: choose properties aligned with budget and preferences
- **Investors**: tailor pricing and marketing

## 6. Research Question 6: Price Prediction and Feature Importance

**Question**

Can we build machine learning models that predict housing prices (``price_lakh``) from structural property features and explain which factors most strongly influence price?

This is a high-value real-world problem in real estate analytics: automated property valuation based on listing data. The resulting insights help detect underpriced and overpriced listings.