# California Housing Market Analysis (Python-based)

In [5]:
import pandas as pd
import numpy as np

# Load the dataset
housing_data = pd.read_csv('Data/housing_data.csv')

In [6]:
# Fill missing values in numeric columns only, using each column's median
numeric_columns = housing_data.select_dtypes(include=[np.number])
housing_data[numeric_columns.columns] = numeric_columns.fillna(numeric_columns.median())
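
As a quick sanity check on the imputation step, the sketch below applies the same median fill to a small made-up frame and compares missing-value counts before and after. The `toy` frame and its values are assumptions for illustration, not rows from `housing_data.csv`.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values: one gap in a numeric column, one in a text column.
toy = pd.DataFrame({
    "total_bedrooms": [100.0, np.nan, 300.0],
    "ocean_proximity": ["INLAND", None, "NEAR BAY"],
})

missing_before = toy.isna().sum()

# Same approach as the cell above: fill numeric columns with their own median.
numeric_cols = toy.select_dtypes(include=[np.number]).columns
toy[numeric_cols] = toy[numeric_cols].fillna(toy[numeric_cols].median())

missing_after = toy.isna().sum()
print(missing_before.to_dict())
print(missing_after.to_dict())
```

Non-numeric columns are left untouched, so any gaps in a column such as `ocean_proximity` would still need separate handling.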


## **Data Analysis**

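**Correlation Between Income and House Value**: We select the `median_income` and `median_house_value` columns and sort them by income in descending order to inspect how house value varies with income. Sorting only orders the rows; it does not by itself produce a correlation coefficient.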

In [7]:
income_vs_value = housing_data[['median_income', 'median_house_value']].sort_values(by='median_income', ascending=False)
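
To put a number on the relationship, one option is the Pearson correlation coefficient, available through pandas' `Series.corr`. The sketch below uses a small made-up frame (the values are assumptions, not taken from the dataset); on the real data the same call would be applied to `housing_data`.

```python
import pandas as pd

# Toy data with made-up values, only to illustrate the call.
toy = pd.DataFrame({
    "median_income": [1.5, 2.5, 3.5, 4.5, 8.0],
    "median_house_value": [90000.0, 140000.0, 190000.0, 230000.0, 450000.0],
})

# Pearson correlation is the default method of Series.corr.
pearson_r = toy["median_income"].corr(toy["median_house_value"])
print(f"Pearson r: {pearson_r:.3f}")
```

A value near 1 indicates a strong positive linear relationship, near 0 little linear relationship, and near -1 a strong negative one.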

**Income Segmentation and House Prices**: We define an `income_category` function that assigns each `median_income` value to one of five bands, then calculate the average house value for each band.

In [8]:
def income_category(income):
    if income < 2:
        return 'Very Low'
    elif 2 <= income < 3:
        return 'Low'
    elif 3 <= income < 4:
        return 'Median'
    elif 4 <= income < 5:
        return 'High'
    else:
        return 'Very High'

housing_data['income_category'] = housing_data['median_income'].apply(income_category)
income_segmentation = housing_data.groupby('income_category')['median_house_value'].mean().reset_index().sort_values(by='median_house_value', ascending=False)
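
The same banding can also be expressed with `pd.cut`, which avoids a row-by-row `apply` and yields an ordered categorical. This is a sketch of an alternative, not the method used above; the toy values are made up, and `right=False` is chosen so that each bin includes its lower edge, matching the `2 <= income < 3` style of the function.

```python
import numpy as np
import pandas as pd

# Toy data with made-up values, including edge cases at 2.0 and 5.0.
toy = pd.DataFrame({
    "median_income": [1.2, 2.0, 3.7, 4.9, 5.0, 9.3],
    "median_house_value": [80000.0, 120000.0, 180000.0, 220000.0, 300000.0, 480000.0],
})

# Bin edges mirror the thresholds in income_category; right=False gives [low, high).
bins = [-np.inf, 2, 3, 4, 5, np.inf]
labels = ["Very Low", "Low", "Median", "High", "Very High"]
toy["income_category"] = pd.cut(
    toy["median_income"], bins=bins, labels=labels, right=False
)

# Group by the ordered categorical; rows come back in band order.
segmentation = (
    toy.groupby("income_category", observed=True)["median_house_value"]
    .mean()
    .reset_index()
)
print(segmentation)
```

Because the result is an ordered categorical, the grouped output is listed from `Very Low` to `Very High` rather than alphabetically.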

**Room and Bedroom Counts Impact on House Prices**: We run two separate analyses, one on `total_bedrooms` and one on `total_rooms`. In each, the values are rounded to the nearest 10 with `round(-1)` to form groups, and the average house value is calculated for each group.

In [9]:
# By Bedroom Count
housing_data['bedroom_group'] = housing_data['total_bedrooms'].round(-1)
bedroom_counts = housing_data.groupby('bedroom_group')['median_house_value'].mean().reset_index().sort_values(by='median_house_value', ascending=False)

# By Room Count
housing_data['room_group'] = housing_data['total_rooms'].round(-1)
room_counts = housing_data.groupby('room_group')['median_house_value'].mean().reset_index().sort_values(by='median_house_value', ascending=False)
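
Rounding to the nearest 10 can produce many groups that contain only one or two rows, so their means may be noisy. A minimal sketch of one way to see this, using made-up values, is to report the group size alongside the mean:

```python
import pandas as pd

# Toy data with made-up values.
toy = pd.DataFrame({
    "total_rooms": [1234.0, 1236.0, 1241.0, 880.0],
    "median_house_value": [200000.0, 220000.0, 300000.0, 150000.0],
})

# round(-1) rounds to the nearest 10: 1234 -> 1230, 1236 -> 1240, 1241 -> 1240.
toy["room_group"] = toy["total_rooms"].round(-1)

# Report both the mean and the number of rows behind it.
room_summary = (
    toy.groupby("room_group")["median_house_value"]
    .agg(["mean", "count"])
    .reset_index()
)
print(room_summary)
```

Groups with a very small `count` could then be filtered out or merged into wider bins before comparing averages.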

**Ocean Proximity and House Prices**: We group the data by `ocean_proximity` and calculate the average house prices for each proximity group.

In [10]:
ocean_proximity = housing_data.groupby('ocean_proximity')['median_house_value'].mean().reset_index().sort_values(by='median_house_value', ascending=False)

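**Z-Score Distribution of House Prices**: We calculate a Z-score for each `median_house_value`, that is, its distance from the mean measured in standard deviations. These scores can be used to identify outliers and evaluate the price distribution.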

In [11]:
avg_house_value = housing_data['median_house_value'].mean()
stddev_house_value = housing_data['median_house_value'].std()

housing_data['z_score'] = (housing_data['median_house_value'] - avg_house_value) / stddev_house_value
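
The cell above stores the Z-scores but does not yet flag anything. A common rule of thumb treats values with an absolute Z-score above 3 as outliers; the sketch below applies that rule to made-up data. The threshold name `Z_THRESHOLD` and its value are assumptions for illustration. Note that pandas' `.std()` uses the sample standard deviation (`ddof=1`) by default.

```python
import pandas as pd

# Toy data with made-up values: 19 similar prices plus one much higher price.
values = [180000.0 + 2000.0 * i for i in range(19)] + [500001.0]
toy = pd.DataFrame({"median_house_value": values})

mean_value = toy["median_house_value"].mean()
std_value = toy["median_house_value"].std()  # sample std (ddof=1)
toy["z_score"] = (toy["median_house_value"] - mean_value) / std_value

# Hypothetical cutoff: flag rows more than 3 standard deviations from the mean.
Z_THRESHOLD = 3.0
outliers = toy[toy["z_score"].abs() > Z_THRESHOLD]
print(outliers)
```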

**Price Distribution by Location (Longitude and Latitude)**: We round the `longitude` and `latitude` values to one decimal place to group nearby locations into grid cells, then calculate the average house value for each cell.

In [12]:
housing_data['rounded_longitude'] = housing_data['longitude'].round(1)
housing_data['rounded_latitude'] = housing_data['latitude'].round(1)
price_distribution = housing_data.groupby(['rounded_longitude', 'rounded_latitude'])['median_house_value'].mean().reset_index()
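
To illustrate how rounding merges nearby coordinates into one cell, here is a small sketch on made-up coordinates and prices. It also records how many rows fall into each cell, which is an addition for illustration and not part of the analysis above.

```python
import pandas as pd

# Toy data with made-up coordinates and prices.
toy = pd.DataFrame({
    "longitude": [-122.23, -122.18, -118.24],
    "latitude": [37.88, 37.86, 34.06],
    "median_house_value": [400000.0, 300000.0, 250000.0],
})

# One decimal place of a degree defines the grid cell.
toy["rounded_longitude"] = toy["longitude"].round(1)
toy["rounded_latitude"] = toy["latitude"].round(1)

# The first two rows share the cell (-122.2, 37.9); the third is alone.
cell_summary = (
    toy.groupby(["rounded_longitude", "rounded_latitude"])["median_house_value"]
    .agg(["mean", "count"])
    .reset_index()
)
print(cell_summary)
```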

**Income to House Price Ratio by Ocean Proximity**: We calculate the ratio of `median_house_value` to `median_income` for each row, then average that ratio within each `ocean_proximity` group.

In [13]:
housing_data['price_income_ratio'] = housing_data['median_house_value'] / housing_data['median_income']
income_to_price_ratio = housing_data.groupby('ocean_proximity')['price_income_ratio'].mean().reset_index().sort_values(by='price_income_ratio', ascending=False)
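
The scale of this ratio depends on the units of `median_income`. If the column is expressed in tens of thousands of dollars, as it is in the widely distributed California housing dataset (an assumption about this file, not something stated above), the raw ratio is 10,000 times larger than a price-to-income ratio in matching units. A sketch of the rescaling on made-up values:

```python
import pandas as pd

# Toy data with made-up values; median_income assumed to be in units of $10,000.
toy = pd.DataFrame({
    "median_income": [4.0, 8.0],
    "median_house_value": [200000.0, 320000.0],
})

INCOME_UNIT_USD = 10_000  # assumption about the column's units

# Raw ratio, as computed in the cell above.
toy["price_income_ratio"] = toy["median_house_value"] / toy["median_income"]

# Ratio with both quantities in dollars.
toy["price_income_ratio_usd"] = toy["median_house_value"] / (
    toy["median_income"] * INCOME_UNIT_USD
)
print(toy)
```

Rescaling by a constant does not change the ranking of the `ocean_proximity` groups; it only affects how the numbers read.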

In [14]:
# Display results (optional)
print("Income vs House Value:\n", income_vs_value.head())
print("Income Segmentation:\n", income_segmentation)
print("Bedroom Count Impact:\n", bedroom_counts)
print("Room Count Impact:\n", room_counts)
print("Ocean Proximity Impact:\n", ocean_proximity)
print("Price Distribution by Location:\n", price_distribution)
print("Income to Price Ratio:\n", income_to_price_ratio)

Income vs House Value:
        median_income  median_house_value
4352         15.0001            500001.0
10673        15.0001            500001.0
8849         15.0001            500001.0
4606         15.0001            500001.0
5257         15.0001            500001.0
Income Segmentation:
   income_category  median_house_value
3       Very High       328225.139277
0            High       227656.486584
2          Median       190584.716560
1             Low       144222.172148
4        Very Low       112512.674867
Bedroom Count Impact:
      bedroom_group  median_house_value
340         5420.0       500001.000000
286         3020.0       491200.000000
193         1930.0       472000.000000
315         3860.0       451100.000000
207         2080.0       402050.500000
..             ...                 ...
237         2400.0        95266.666667
221         2230.0        81300.000000
291         3100.0        78500.000000
224         2260.0        74400.000000
292         3110.0        28