# Davis Housing Market Analysis

This project demonstrates the end-to-end process of preparing housing listing data for a Power BI dashboard.

**Data Collection:**  
- Scraped Davis apartment listings from Craigslist using Python, Selenium, and HTML parsing.  
- Key fields collected: Rent, Number of Bedrooms, Square Footage, and Listing URL.  
- Simulated data is used in place of the scraped results, because Selenium needs a live browser session (ChromeDriver) to run.
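A rough sketch of the HTML-parsing part of collection, using only Python's standard-library `html.parser`, is below. It runs on a small hand-written fragment. The markup is an assumption for illustration and is not verified Craigslist HTML: the `price` and `housing` class names, the `2br - 900ft2` text layout, and the example.com URL are all hypothetical. In a live run, the HTML would come from Selenium (for example `driver.page_source`). `ListingParser` and `parse_listing` are hypothetical names.

```python
import re
from html.parser import HTMLParser

# Hypothetical listing markup for illustration only; the class names and the
# "2br - 900ft2" layout are assumptions, not verified Craigslist HTML.
SAMPLE_HTML = """
<li class="result">
  <span class="price">$1800</span>
  <span class="housing">2br - 900ft2</span>
  <a class="title" href="https://example.com/listing/1">Sunny 2BR near campus</a>
</li>
"""


class ListingParser(HTMLParser):
    """Collect the text of price/housing spans and the first link's href."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None  # which field the next text chunk belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        css_class = attrs.get("class")
        if css_class in ("price", "housing"):
            self._current = css_class
        elif tag == "a" and "href" in attrs and "url" not in self.fields:
            self.fields["url"] = attrs["href"]

    def handle_endtag(self, tag):
        self._current = None

    def handle_data(self, data):
        if self._current and data.strip():
            self.fields[self._current] = data.strip()


def parse_listing(html):
    """Return one raw listing record with column names like those in the sample data."""
    parser = ListingParser()
    parser.feed(html)
    housing = parser.fields.get("housing", "")
    bedrooms = re.search(r"\d+br", housing)
    sqft = re.search(r"\d+ft2", housing)
    return {
        "Rent Amount": parser.fields.get("price"),
        "Number of Bedrooms": bedrooms.group(0) if bedrooms else None,
        "Square Footage": sqft.group(0) if sqft else None,
        "Listing URL": parser.fields.get("url"),
    }


record = parse_listing(SAMPLE_HTML)
print(record)
```

The fields are kept as raw strings on purpose, so the cleaning steps below apply to scraped and simulated data alike.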


```python
import pandas as pd
```

## Step 1: Create Sample Housing Data

Simulate scraped data with the following columns:
- Rent Amount
- Square Footage
- Locality / Region
- Number of Bedrooms / Bathrooms
- Type of House
- Amenities


```python
# Sample simulated listings, stored as raw strings the way a scraper returns them
data = {
    "Rent Amount": ["$1800", "$2200", "$1500"],
    "Square Footage": ["900ft2", "1200ft2", "750ft2"],
    "Locality": ["Davis", "Davis", "Davis"],
    "Region": ["CA", "CA", "CA"],
    "Number of Bedrooms": ["2br", "3br", "1br"],
    "Number of Bathrooms": ["1", "2", "1"],
    "Type of House": ["Apartment", "Apartment", "Studio"],
    "Amenities": ["Pool, Gym", "Gym", "Pool"]
}
df = pd.DataFrame(data)
df.head()
```
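Three rows are enough to test the cleaning logic, but a dashboard is easier to try out with more. A minimal sketch for generating additional listings in the same messy string format is below. `simulate_listings` is a hypothetical helper, and its rent and size ranges are arbitrary placeholders, not real Davis market figures. Seeding `random.Random` makes the output reproducible.

```python
import random

import pandas as pd


def simulate_listings(n, seed=0):
    """Generate n fake Davis listings as raw strings, matching the sample's columns."""
    rng = random.Random(seed)  # seeded so repeated runs give identical data
    rows = []
    for _ in range(n):
        bedrooms = rng.randint(1, 4)
        # Placeholder ranges only; not real Davis market figures
        sqft = 300 + 300 * bedrooms + rng.randint(-100, 100)
        rent = 900 + 500 * bedrooms + rng.randint(-200, 200)
        house_type = "Studio" if bedrooms == 1 and rng.random() < 0.5 else "Apartment"
        rows.append({
            "Rent Amount": f"${rent}",
            "Square Footage": f"{sqft}ft2",
            "Locality": "Davis",
            "Region": "CA",
            "Number of Bedrooms": f"{bedrooms}br",
            "Number of Bathrooms": str(rng.randint(1, min(bedrooms, 3))),
            "Type of House": house_type,
            "Amenities": rng.choice(["Pool, Gym", "Gym", "Pool"]),
        })
    return pd.DataFrame(rows)


sim_df = simulate_listings(50)
sim_df.head()
```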

## Step 2: Clean Numeric Columns

Convert messy strings to numeric values:
- Rent → float
- Bedrooms → float
- Square Footage → float
- Optional: round to 2 decimals


```python
# Convert the string columns to numeric values and round to 2 decimals
df['Rent'] = df['Rent Amount'].str.replace('$', '', regex=False).astype(float).round(2)  # "$1800" -> 1800.0
df['Bedrooms'] = df['Number of Bedrooms'].str.extract(r'(\d+)')[0].astype(float).round(2)  # "2br" -> 2.0
df['SqFt'] = df['Square Footage'].str.extract(r'(\d+)')[0].astype(float).round(2)  # "900ft2" -> 900.0 (first number only)
# Drop the original string columns now that numeric versions exist
df = df.drop(columns=['Rent Amount', 'Number of Bedrooms', 'Square Footage'])
df.head()
```
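The cell above assumes every value is well-formed. For example, `astype(float)` raises a `ValueError` on a rent written with a thousands separator such as `$1,800`. A more forgiving sketch is below. It uses a hypothetical `to_number` helper that extracts the first numeric token and strips commas. It then relies on `pd.to_numeric(..., errors="coerce")` to turn anything unparseable (`"N/A"`, `None`) into `NaN` instead of raising. The same helper also converts `Number of Bathrooms`, which the cell above leaves as text.

```python
import pandas as pd


def to_number(series):
    """Pull the first number out of each string, e.g. "$1,800" -> 1800.0, "900ft2" -> 900.0.

    Values with no number in them (missing sqft, "N/A", None) become NaN instead of raising.
    """
    first_number = (
        series.astype(str)
        .str.extract(r"(\d[\d,]*(?:\.\d+)?)", expand=False)  # first numeric token only
        .str.replace(",", "", regex=False)                    # drop thousands separators
    )
    return pd.to_numeric(first_number, errors="coerce")


raw = pd.DataFrame({
    "Rent Amount": ["$1,800", "$2200", None],
    "Square Footage": ["900ft2", "N/A", "750ft2"],
    "Number of Bathrooms": ["1", "1.5", "2"],
})
clean = raw.apply(to_number)  # applies to_number column by column
print(clean)
```

Coercing to `NaN` keeps one bad listing from stopping the whole run; the quality check in the next step is where those gaps should surface.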

## Step 3: Summary and Quality Check

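Check the cleaned data: `df.describe()` summarizes the numeric columns (count, mean, standard deviation, min/max, and quartiles), and `df.info()` lists each column's dtype and non-null count, confirming that Rent, Bedrooms, and SqFt are now `float64`.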


```python
df.describe()
```

```python
df.info()
```
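`describe()` and `info()` are visual checks. To make them explicit and repeatable, a small sketch like the one below can list problems in the cleaned columns. `quality_issues` is a hypothetical helper. Its rules are assumptions about what "clean" means here: numeric dtype, no missing values, no negative values, no duplicate rows.

```python
import pandas as pd


def quality_issues(df, numeric_cols=("Rent", "Bedrooms", "SqFt")):
    """Return a list of problems found in the cleaned data; an empty list means every check passed."""
    issues = []
    for col in numeric_cols:
        if col not in df.columns:
            issues.append(f"{col}: column missing")
            continue
        if not pd.api.types.is_numeric_dtype(df[col]):
            issues.append(f"{col}: dtype {df[col].dtype}, expected numeric")
            continue
        n_missing = int(df[col].isna().sum())
        if n_missing:
            issues.append(f"{col}: {n_missing} missing value(s)")
        if (df[col] < 0).any():
            issues.append(f"{col}: negative value(s)")
    n_dupes = int(df.duplicated().sum())
    if n_dupes:
        issues.append(f"{n_dupes} duplicate row(s)")
    return issues


# Cleaned numeric columns from the sample data
cleaned = pd.DataFrame({
    "Rent": [1800.0, 2200.0, 1500.0],
    "Bedrooms": [2.0, 3.0, 1.0],
    "SqFt": [900.0, 1200.0, 750.0],
})
print(quality_issues(cleaned))  # → []

# The same data with one square-footage value missing
print(quality_issues(cleaned.assign(SqFt=[900.0, None, 750.0])))  # → ['SqFt: 1 missing value(s)']
```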

## Step 4: Notes and Next Steps

- Simulated data stands in for scraped Davis apartment listings.
- Cleaning steps included:
  - Converting Rent, Bedrooms, and SqFt to numeric types so Power BI can aggregate them (sums, averages, and so on)
  - Dropping the original string columns, which are redundant after conversion, to keep the imported dataset smaller
- Data is now ready for:
  - Power BI import
  - Building a star schema
  - DAX measures
  - Interactive dashboards
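One way to prepare the import and star-schema steps from pandas is sketched below. It splits the cleaned table into a fact table and dimension tables and writes each one to CSV for import, for example through Power BI Desktop's Text/CSV connector. This is a sketch under assumptions. The table and key names (`FactListings`, `DimHouseType`, `DimLocation`, `HouseTypeKey`, `LocationKey`) and the `powerbi_export` folder are hypothetical. The same modeling could also be done inside Power BI with Power Query.

```python
from pathlib import Path

import pandas as pd


def build_star_schema(df):
    """Split a cleaned listings table into one fact table and two dimension tables."""
    # Dimension: one row per distinct house type, with an integer surrogate key
    dim_house_type = df[["Type of House"]].drop_duplicates().reset_index(drop=True)
    dim_house_type.insert(0, "HouseTypeKey", list(range(1, len(dim_house_type) + 1)))

    # Dimension: one row per distinct locality/region pair
    dim_location = df[["Locality", "Region"]].drop_duplicates().reset_index(drop=True)
    dim_location.insert(0, "LocationKey", list(range(1, len(dim_location) + 1)))

    # Fact: numeric measures plus foreign keys pointing at the dimensions
    fact = (
        df.merge(dim_house_type, on="Type of House", how="left")
        .merge(dim_location, on=["Locality", "Region"], how="left")
    )
    fact = fact[["HouseTypeKey", "LocationKey", "Rent", "Bedrooms", "SqFt"]]
    return {"FactListings": fact, "DimHouseType": dim_house_type, "DimLocation": dim_location}


def export_tables(tables, out_dir):
    """Write each table to <out_dir>/<name>.csv, one file per Power BI table."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, table in tables.items():
        table.to_csv(out / f"{name}.csv", index=False)


# Cleaned sample data (descriptive columns kept, numeric columns converted)
cleaned = pd.DataFrame({
    "Locality": ["Davis", "Davis", "Davis"],
    "Region": ["CA", "CA", "CA"],
    "Type of House": ["Apartment", "Apartment", "Studio"],
    "Rent": [1800.0, 2200.0, 1500.0],
    "Bedrooms": [2.0, 3.0, 1.0],
    "SqFt": [900.0, 1200.0, 750.0],
})
tables = build_star_schema(cleaned)
export_tables(tables, "powerbi_export")  # hypothetical output folder
```

Amenities holds comma-separated values ("Pool, Gym"), so it would fit better as a separate bridge table than as a single dimension column; it is left out of this sketch.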
