# San Francisco Housing Cost Analysis

In this assignment, you will perform fundamental analysis for the San Francisco housing market to allow potential real estate investors to choose rental investment properties.

In [6]:
# imports
import plotly.express as px
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from pathlib import Path

import warnings
warnings.filterwarnings('ignore')

## Load Data

In [7]:
# Read the census data into a Pandas DataFrame
file_path = Path("Data/sfo_neighborhoods_census_data.csv")
sfo_data = pd.read_csv(file_path, index_col="year")
sfo_data.head()

year,neighborhood,sale_price_sqr_foot,housing_units,gross_rent
2010,Alamo Square,291.182945,372560,1239
2010,Anza Vista,267.932583,372560,1239
2010,Bayview,170.098665,372560,1239
2010,Buena Vista Park,347.394919,372560,1239
2010,Central Richmond,319.027623,372560,1239


- - - 

## Housing Units Per Year

In this section, you will calculate the number of housing units per year and visualize the results as a bar chart using the Pandas plot function.

**Hint:** Use the Pandas `groupby` function.

**Optional challenge:** Use the min, max, and std to scale the y limits of the chart.
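
A minimal sketch of this step, assuming a small toy stand-in for `sfo_data` (the `toy_data` name is hypothetical; the housing-unit counts are the 2010 to 2012 values shown in this section's output). Padding the y limits by one standard deviation is one reasonable reading of the optional challenge, not the only valid one.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import pandas as pd

# Toy stand-in for sfo_data (hypothetical), indexed by year like the real DataFrame.
toy_data = pd.DataFrame(
    {
        "neighborhood": ["Alamo Square", "Anza Vista"] * 3,
        "housing_units": [372560, 372560, 374507, 374507, 376454, 376454],
    },
    index=pd.Index([2010, 2010, 2011, 2011, 2012, 2012], name="year"),
)

# Mean number of housing units per year ("year" is the name of the index level).
housing_units = toy_data.groupby("year")["housing_units"].mean()

# Bar chart drawn with the Pandas plot function.
ax = housing_units.plot.bar(
    figsize=(8, 5),
    title="Housing Units in San Francisco, 2010 to 2012",
)
ax.set_xlabel("Year")
ax.set_ylabel("Housing Units")

# Scale the y limits: pad min and max by one standard deviation so that
# the small year-over-year differences remain visible.
padding = housing_units.std()
ax.set_ylim(housing_units.min() - padding, housing_units.max() + padding)
```

Without the adjusted limits, the bars start at zero and look nearly identical because the yearly change is small relative to the totals.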



In [8]:
# Calculate the mean number of housing units per year (hint: use groupby) 


,year,housing_units
0,2010,372560.0
1,2011,374507.0
2,2012,376454.0
3,2013,378401.0
4,2014,380348.0


In [9]:
# Save the dataframe as a csv file


In [None]:
# Use the Pandas plot function to plot the average housing units per year.
# Note: You will need to manually adjust the y limit of the chart using the min and max values from above.


# Optional Challenge: Use the min, max, and std to scale the y limits of the chart


# Set plot size for better readability


# Plotting


# Adjust y limits based on min, max, and std


# Show plot


- - - 

## Average Housing Costs in San Francisco Per Year

In this section, you will calculate the average monthly rent and the average sale price per square foot for each year. An investor may wish to understand how the sale price of a rental property changes over time. For example, a customer will want to know whether to expect an increase or a decrease in property value so they can decide how long to hold the rental property. Plot the results as two line charts.

**Optional challenge:** Plot each line chart in a different color.
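
The calculation and the two charts could be sketched as follows. The `toy_data` frame is a hypothetical stand-in for `sfo_data`: the 2010 rows come from the `head()` output above, while the 2011 rows are made-up placeholders. The colors are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Toy stand-in for sfo_data (hypothetical). 2011 values are placeholders.
toy_data = pd.DataFrame(
    {
        "neighborhood": ["Alamo Square", "Anza Vista", "Alamo Square", "Anza Vista"],
        "sale_price_sqr_foot": [291.182945, 267.932583, 300.0, 280.0],
        "gross_rent": [1239, 1239, 1500, 1500],
    },
    index=pd.Index([2010, 2010, 2011, 2011], name="year"),
)

# Average sale price per square foot and average gross rent for each year.
avg_costs = toy_data.groupby("year")[["sale_price_sqr_foot", "gross_rent"]].mean()

# Two line charts, one per metric, each in its own color.
fig, (ax_price, ax_rent) = plt.subplots(2, 1, figsize=(8, 8), sharex=True)

avg_costs["sale_price_sqr_foot"].plot(
    ax=ax_price, color="purple", title="Average Sale Price per Square Foot by Year"
)
ax_price.set_ylabel("Price per Sq. Ft.")

avg_costs["gross_rent"].plot(
    ax=ax_rent, color="red", title="Average Gross Rent by Year"
)
ax_rent.set_ylabel("Gross Rent")
ax_rent.set_xlabel("Year")

fig.tight_layout()
```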

In [11]:
# Calculate the average sale price per square foot and average gross rent


In [None]:
# Create two line charts, one to plot the average sale price per square foot and another for average monthly rent

# Line chart for average sale price per square foot


# Line chart for average monthly rent


- - - 

## Average Prices by Neighborhood

In this section, you'll create a function named `average_price_by_neighborhood` to analyze and visualize housing market trends in a specific San Francisco neighborhood. The function first filters the housing data for the chosen neighborhood. It then cleans the data by converting sale prices to numeric values and dropping any rows with missing values. Next, it calculates the yearly average sale price per square foot. Finally, it generates a Plotly Express line plot of this trend over time, with labeled axes and a descriptive title. When called with a neighborhood's name, the function returns the trend plot. The same steps apply to the analysis of average gross rent trends.

In [13]:
"""
Write a function that: 
- Calculates the average sale price per square foot for a given neighborhood in San Francisco,
- Filters the data for the specified neighborhood,
- Cleans and processes the data,
- Calculates the average price per square foot for each year,
- Creates a line plot to visualize the trend over the years.
"""

# def average_price_by_neighborhood(neighborhood):
    # df_prices = 
    # Convert 'sale_price_sqr_foot' to a numeric type; errors='coerce' turns unparseable values into NaN
    # df_prices['sale_price_sqr_foot'] = pd.to_numeric(df_prices['sale_price_sqr_foot'], errors='coerce')
    # Drop rows with NaN values in 'sale_price_sqr_foot' after conversion
    
    # Group by 'year' and calculate mean, ensuring 'sale_price_sqr_foot' is now numeric
    
    # Create and return the plot


In [None]:
# Test your function by passing a neighborhood name.

# average_price_by_neighborhood("Alamo Square")

In [15]:
# Use plotly to create an interactive line chart of the average monthly rent.
# def average_rent_by_neighborhood(neighborhood):

    # Convert 'gross_rent' to a numeric type, coercing unparseable values to NaN

    # Drop rows with NaN values in 'gross_rent' after conversion

    # Group by 'year' and calculate mean, ensuring 'gross_rent' is now numeric

    # Create and return the plot


In [None]:
# average_rent_by_neighborhood("Alamo Square")

## The Top 10 Most Expensive Neighborhoods

In this section, you will group the data by year and neighborhood, create a new DataFrame of the mean values, and calculate the mean sale price per square foot for each neighborhood. Then sort the values to obtain the top 10 most expensive neighborhoods on average, and plot the results as a bar chart.
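
The ranking step might look like the sketch below. It assumes a hypothetical `toy_data` frame with three neighborhoods (2010 prices from the `head()` output above, 2011 prices made up), so it keeps the top 2 rather than the top 10; `TOP_N` is an illustrative name.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import pandas as pd

# Toy stand-in for sfo_data (hypothetical). 2011 prices are placeholders.
toy_data = pd.DataFrame(
    {
        "neighborhood": ["Alamo Square", "Anza Vista", "Bayview"] * 2,
        "sale_price_sqr_foot": [291.182945, 267.932583, 170.098665, 300.0, 280.0, 180.0],
    },
    index=pd.Index([2010, 2010, 2010, 2011, 2011, 2011], name="year"),
)

TOP_N = 2  # use 10 on the real data

# Mean sale price per square foot for each neighborhood across all years,
# sorted from most to least expensive, keeping the first TOP_N rows.
top_neighborhoods = (
    toy_data.groupby("neighborhood")["sale_price_sqr_foot"]
    .mean()
    .sort_values(ascending=False)
    .head(TOP_N)
    .reset_index()
)

# Bar chart of the most expensive neighborhoods.
ax = top_neighborhoods.plot.bar(
    x="neighborhood",
    y="sale_price_sqr_foot",
    legend=False,
    rot=45,
    title=f"Top {TOP_N} Most Expensive Neighborhoods",
)
ax.set_xlabel("Neighborhood")
ax.set_ylabel("Avg. Sale Price per Sq. Ft.")
```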

In [17]:
# Getting the data from the top 10 expensive neighborhoods to own


,neighborhood,sale_price_sqr_foot
65,Union Square District,903.993258
36,Merced Heights,788.844818
38,Miraloma Park,779.810842
51,Pacific Heights,689.555817
71,Westwood Park,687.087575
63,Telegraph Hill,676.506578
57,Presidio Heights,675.350212
10,Cow Hollow,665.964042
56,Potrero Hill,662.013613
60,South Beach,650.124479


In [None]:
# Plotting the data from the top 10 expensive neighborhoods


# Show the plot


- - - 

## Comparing cost to purchase versus rental income

In this section, you will define a function that takes a selected neighborhood as input, filters the data for that neighborhood, creates a bar chart using Plotly Express, and returns the chart as a result.

In [19]:
"""
Write a function that: 
- creates a new DataFrame called df_costs containing only the rows from the DataFrame "sfo_data" for the selected neighborhood
- generates a plotly bar chart comparing the sale_price_sqr_foot and gross_rent
columns.
- sets the barmode parameter to 'group'
- returns the plot
"""



# def most_expensive_neighborhoods_rent_sales(selected_neighborhood):
    # df_costs = 

In [None]:
# testing the function


- - - 

## Neighborhood Map

In this section, you will read in neighborhood location data and build an interactive map with the average house value per neighborhood. Use `scatter_mapbox` from Plotly Express to create the visualization.

### Load Location Data

In [21]:
# Load neighborhoods coordinates data


### Data Preparation

You will need to join the location data with the mean values per neighborhood.

1. Calculate the mean values for each neighborhood.

2. Join the average values with the neighborhood locations.
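
The two steps above can be sketched as follows. The location column names (`Neighborhood`, `Lat`, `Lon`) and the coordinates are assumptions for illustration only; check the actual location file for its real column names. The 2011 prices and rents are placeholders.

```python
import pandas as pd

# Toy stand-in for sfo_data (hypothetical). 2011 values are placeholders.
toy_data = pd.DataFrame(
    {
        "neighborhood": ["Alamo Square", "Anza Vista", "Alamo Square", "Anza Vista"],
        "sale_price_sqr_foot": [291.182945, 267.932583, 300.0, 280.0],
        "housing_units": [372560, 372560, 374507, 374507],
        "gross_rent": [1239, 1239, 1500, 1500],
    },
    index=pd.Index([2010, 2010, 2011, 2011], name="year"),
)

# Hypothetical location data with approximate placeholder coordinates.
locations = pd.DataFrame(
    {
        "Neighborhood": ["Alamo Square", "Anza Vista"],
        "Lat": [37.78, 37.78],
        "Lon": [-122.43, -122.44],
    }
)

# 1. Mean values for each neighborhood across all years.
neighborhood_means = (
    toy_data.groupby("neighborhood")[
        ["sale_price_sqr_foot", "housing_units", "gross_rent"]
    ]
    .mean()
    .reset_index()
)

# 2. Join the averages with the locations on the neighborhood name.
#    An inner join keeps only neighborhoods present in both tables.
neighborhood_map_data = locations.merge(
    neighborhood_means,
    left_on="Neighborhood",
    right_on="neighborhood",
    how="inner",
).drop(columns="neighborhood")
```

On real data, neighborhood names may not match exactly between the two files, so it is worth comparing the row counts before and after the join.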

In [22]:
# Calculate the mean values for each neighborhood


In [23]:
# Join the average values with the neighborhood locations


### Mapbox Visualization

Plot the average values per neighborhood using a Plotly Express `scatter_mapbox` visualization.

In [None]:


# Create a scatter mapbox to analyze neighborhood info


- - -

## Cost Analysis - Optional Challenge

In this section, you will use Plotly Express to create visualizations that investors can use to interactively filter and explore various factors related to house values in San Francisco's neighborhoods.

### Create a DataFrame showing the most expensive neighborhoods in San Francisco by year

In [25]:
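# Average the numeric columns for each neighborhood across all years,
# then keep the 10 neighborhoods with the highest mean sale price per square foot.
df_expensive_neighborhoods = sfo_data.groupby(by="neighborhood").mean(numeric_only=True)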
df_expensive_neighborhoods = df_expensive_neighborhoods.sort_values(
    by="sale_price_sqr_foot", ascending=False
).head(10)
df_expensive_neighborhoods = df_expensive_neighborhoods.reset_index()

### Create a parallel coordinates plot and parallel categories plot of most expensive neighborhoods in San Francisco per year


In [None]:
# Parallel Categories Plot


In [None]:
# Parallel Coordinates Plot


### Create a sunburst chart to conduct a costs analysis of most expensive neighborhoods in San Francisco per year

In [None]:
# Sunburst Plot
