# Housing Rental Analysis for San Francisco

In this challenge, your job is to use your data visualization skills, including aggregation, interactive visualizations, and geospatial analysis, to find properties in the San Francisco market that are viable investment opportunities.

## Instructions

Use the `san_francisco_housing.ipynb` notebook to visualize and analyze the real-estate data.

Note that this assignment requires you to create a visualization by using hvPlot and GeoViews. Additionally, you need to read the `sfo_neighborhoods_census_data.csv` file from the `Resources` folder into the notebook and create the DataFrame that you’ll use in the analysis.

The main task in this Challenge is to visualize and analyze the real-estate data in your Jupyter notebook. Use the `san_francisco_housing.ipynb` notebook to complete the following tasks:

* Calculate and plot the housing units per year.

* Calculate and plot the average prices per square foot.

* Compare the average prices by neighborhood.

* Build an interactive neighborhood map.

* Compose your data story.

### Calculate and Plot the Housing Units per Year

For this part of the assignment, use numerical and visual aggregation to calculate the number of housing units per year, and then visualize the results as a bar chart. To do so, complete the following steps:

1. Use the `groupby` function to group the data by year. Aggregate the results by the `mean` of the groups.

2. Use the `hvplot` function to plot the `housing_units_by_year` DataFrame as a bar chart. Make the x-axis represent the `year` and the y-axis represent the `housing_units`.

3. Style and format the bar chart to ensure a professionally styled visualization.

4. Note that your resulting plot should appear similar to the following image:

![A screenshot depicts an example of the resulting bar chart.](Images/zoomed-housing-units-by-year.png)

5. Answer the following question:

    * What’s the overall trend in housing units over the period that you’re analyzing?

### Calculate and Plot the Average Sale Prices per Square Foot

For this part of the assignment, use numerical and visual aggregation to calculate the average prices per square foot, and then visualize the results as a line plot. To do so, complete the following steps:

1. Group the data by year, and then average the results. What’s the lowest gross rent that’s reported for the years that the DataFrame includes?

2. Create a new DataFrame named `prices_square_foot_by_year` by filtering out the “housing_units” column. The new DataFrame should include the averages per year for only the sale price per square foot and the gross rent.

3. Use hvPlot to plot the `prices_square_foot_by_year` DataFrame as a line plot.

    > **Hint** This single plot will include lines for both `sale_price_sqr_foot` and `gross_rent`.

4. Style and format the line plot to ensure a professionally styled visualization.

5. Note that your resulting plot should appear similar to the following image:

![A screenshot depicts an example of the resulting plot.](Images/avg-sale-px-sq-foot-gross-rent.png)

6. Use both the `prices_square_foot_by_year` DataFrame and interactive plots to answer the following questions:

    * Did any year experience a drop in the average sale price per square foot compared to the previous year?

    * If so, did the gross rent increase or decrease during that year?

### Compare the Average Sale Prices by Neighborhood

For this part of the assignment, use interactive visualizations and widgets to explore the average sale price per square foot by neighborhood. To do so, complete the following steps:

1. Create a new DataFrame that groups the original DataFrame by year and neighborhood. Aggregate the results by the `mean` of the groups.

2. Filter out the “housing_units” column to create a DataFrame that includes only the `sale_price_sqr_foot` and `gross_rent` averages per year.

3. Create an interactive line plot with hvPlot that visualizes both `sale_price_sqr_foot` and `gross_rent`. Set the x-axis parameter to the year (`x="year"`). Use the `groupby` parameter to create an interactive widget for `neighborhood`.

4. Style and format the line plot to ensure a professionally styled visualization.

5. Note that your resulting plot should appear similar to the following image:

![A screenshot depicts an example of the resulting plot.](Images/pricing-info-by-neighborhood.png)

6. Use the interactive visualization to answer the following question:

    * For the Anza Vista neighborhood, is the average sale price per square foot for 2016 more or less than the price that’s listed for 2012? 

### Build an Interactive Neighborhood Map

For this part of the assignment, explore the geospatial relationships in the data by using interactive visualizations with hvPlot and GeoViews. To build your map, use the `sfo_data_df` DataFrame (created during the initial import), which includes the neighborhood location data with the average prices. To do all this, complete the following steps:

1. Read the `neighborhood_coordinates.csv` file from the `Resources` folder into the notebook, and create a DataFrame named `neighborhood_locations_df`. Be sure to set the `index_col` of the DataFrame as “Neighborhood”.

2. Using the original `sfo_data_df` DataFrame, create a DataFrame named `all_neighborhood_info_df` that groups the data by neighborhood. Aggregate the results by the `mean` of the group.

3. Review the two code cells that concatenate the `neighborhood_locations_df` DataFrame with the `all_neighborhood_info_df` DataFrame. Note that the first cell uses the [Pandas concat function](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html) to create a DataFrame named `all_neighborhoods_df`. The second cell cleans the data and sets the “Neighborhood” column. Be sure to run these cells to create the `all_neighborhoods_df` DataFrame, which you’ll need to create the geospatial visualization.

4. Using hvPlot with GeoViews enabled, create a `points` plot for the `all_neighborhoods_df` DataFrame. Be sure to do the following:

    * Set the `geo` parameter to True.
    * Set the `size` parameter to “sale_price_sqr_foot”.
    * Set the `color` parameter to “gross_rent”.
    * Set the `frame_width` parameter to 700.
    * Set the `frame_height` parameter to 500.
    * Include a descriptive title.

Note that your resulting plot should appear similar to the following image:

![A screenshot depicts an example of a scatter plot created with hvPlot and GeoViews.](Images/6-4-geoviews-plot.png)

5. Use the interactive map to answer the following question:

    * Which neighborhood has the highest gross rent, and which has the highest sale price per square foot?

### Compose Your Data Story

Based on the visualizations that you created, answer the following questions:



* How does the trend in rental income growth compare to the trend in sales prices? Does this same trend hold true for all the neighborhoods across San Francisco?

* What insights can you share with your company about the potential one-click, buy-and-rent strategy that they're pursuing? Do neighborhoods exist that you would suggest for investment, and why?

In [38]:
# Import the required libraries and dependencies
import pandas as pd
import hvplot.pandas
from pathlib import Path

In [39]:
# Removes warnings about future versions
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

## Import the data 

In [40]:
# Using the read_csv function and Path module, create a DataFrame 

# Sets file path of the sfo_neighborhoods_census_data.csv file from the Resources folder
file_path = Path("Resources/sfo_neighborhoods_census_data.csv")

# Creates the dataframe
sfo_data_df = pd.read_csv(file_path)

# Review the first and last five rows of the DataFrame
sfo_data_df

Unnamed: 0,year,neighborhood,sale_price_sqr_foot,housing_units,gross_rent
0,2010,Alamo Square,291.182945,372560,1239
1,2010,Anza Vista,267.932583,372560,1239
2,2010,Bayview,170.098665,372560,1239
3,2010,Buena Vista Park,347.394919,372560,1239
4,2010,Central Richmond,319.027623,372560,1239
...,...,...,...,...,...
392,2016,Telegraph Hill,903.049771,384242,4390
393,2016,Twin Peaks,970.085470,384242,4390
394,2016,Van Ness/ Civic Center,552.602567,384242,4390
395,2016,Visitacion Valley,328.319007,384242,4390


---

## Calculate and Plot the Housing Units per Year

For this part of the assignment, use numerical and visual aggregation to calculate the number of housing units per year, and then visualize the results as a bar chart. To do so, complete the following steps:

1. Use the `groupby` function to group the data by year. Aggregate the results by the `mean` of the groups.

2. Use the `hvplot` function to plot the `housing_units_by_year` DataFrame as a bar chart. Make the x-axis represent the `year` and the y-axis represent the `housing_units`.

3. Style and format the bar chart to ensure a professionally styled visualization.

4. Note that your resulting plot should appear similar to the following image:

![A screenshot depicts an example of the resulting bar chart.](Images/zoomed-housing-units-by-year.png)

5. Answer the following question:

    * What’s the overall trend in housing units over the period that you’re analyzing?



### Step 1 - 3: 
1. Use the `groupby` function to group the data by year. Aggregate the results by the `mean` of the groups.
2. Use the `hvplot` function to plot the `housing_units_by_year` DataFrame as a bar chart. Make the x-axis represent the `year` and the y-axis represent the `housing_units`.
3. Style and format the bar chart to ensure a professionally styled visualization.

In [41]:
# Create a numerical aggregation that groups the data by the year and then averages the results.
housing_units_by_year = sfo_data_df[["year", "housing_units"]].groupby("year").mean()

# Displays data in a visual aggregation 
housing_units_by_year.hvplot.bar(
    xlabel='Year',
    ylabel='Housing Units',
    ylim=(365000, 385000),
    title='Housing Units in San Francisco from 2010 to 2016'
).opts(yformatter='%.0f')
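
As a quick check on the aggregation step, the sketch below rebuilds a tiny stand-in for `sfo_data_df` from a few of the rows displayed in the data preview (the name `sample_df` is only for illustration) and applies the same `groupby("year").mean()`. In those rows, every neighborhood within a year carries the same `housing_units` value, so the mean simply returns that yearly figure.

```python
import pandas as pd

# A few rows copied from the DataFrame preview (illustrative subset only)
sample_df = pd.DataFrame(
    {
        "year": [2010, 2010, 2010, 2016, 2016],
        "neighborhood": [
            "Alamo Square", "Anza Vista", "Bayview",
            "Telegraph Hill", "Twin Peaks",
        ],
        "housing_units": [372560, 372560, 372560, 384242, 384242],
    }
)

# Same aggregation as the notebook cell: one mean value per year
sample_units_by_year = sample_df[["year", "housing_units"]].groupby("year").mean()
print(sample_units_by_year)
```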

### Step 5: Answer the following question:

**Question:** What is the overall trend in housing_units over the period being analyzed?

**Answer:** Over the seven-year period being analyzed (2010 through 2016), the number of housing units increased steadily year over year. As the bar chart indicates, the rise is approximately linear. Using built-in pandas operations, we can calculate the average annual increase with the code below. Seven yearly values span six year-to-year intervals, so the total increase is divided by `len(housing_units_by_year) - 1`.

`housing_units_2010 = housing_units_by_year.loc[2010, "housing_units"]`

`housing_units_2016 = housing_units_by_year.loc[2016, "housing_units"]`

`rate_of_increase = (housing_units_2016 - housing_units_2010) / (len(housing_units_by_year) - 1)`

Running the cell below shows that the `rate_of_increase` is 1947.0 units per year.

In [42]:
# Finds the average annual increase in housing units between 2010 and 2016
# (the result is quoted in the markdown above)
housing_units_2010 = housing_units_by_year.loc[2010, "housing_units"]
housing_units_2016 = housing_units_by_year.loc[2016, "housing_units"]

# Seven yearly values span six year-to-year intervals
rate_of_increase = (housing_units_2016 - housing_units_2010) / (len(housing_units_by_year) - 1)

# Display the result
rate_of_increase

1947.0
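
To put the change in relative terms, the short sketch below computes the total percentage growth in housing units between the first and last year. The two endpoint values are copied from the 2010 and 2016 rows of the DataFrame preview; the variable names are illustrative.

```python
# Endpoint values copied from the 2010 and 2016 rows shown in the data preview
units_2010 = 372560
units_2016 = 384242

# Total growth over the period, expressed as a percentage of the 2010 level
total_increase = units_2016 - units_2010
percent_growth = total_increase / units_2010 * 100

print(f"Total increase: {total_increase} units ({percent_growth:.2f}% over 2010)")
```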

---

## Calculate and Plot the Average Sale Prices per Square Foot

For this part of the assignment, use numerical and visual aggregation to calculate the average prices per square foot, and then visualize the results as a line plot. To do so, complete the following steps:

1. Group the data by year, and then average the results. What’s the lowest gross rent that’s reported for the years that the DataFrame includes?

2. Create a new DataFrame named `prices_square_foot_by_year` by filtering out the “housing_units” column. The new DataFrame should include the averages per year for only the sale price per square foot and the gross rent.

3. Use hvPlot to plot the `prices_square_foot_by_year` DataFrame as a line plot.

    > **Hint** This single plot will include lines for both `sale_price_sqr_foot` and `gross_rent`.

4. Style and format the line plot to ensure a professionally styled visualization.

5. Note that your resulting plot should appear similar to the following image:

![A screenshot depicts an example of the resulting plot.](Images/avg-sale-px-sq-foot-gross-rent.png)

6. Use both the `prices_square_foot_by_year` DataFrame and interactive plots to answer the following questions:

    * Did any year experience a drop in the average sale price per square foot compared to the previous year?

    * If so, did the gross rent increase or decrease during that year?



### Step 1: Group the data by year, and then average the results.

In [43]:
# Create a numerical aggregation by grouping the data by year and averaging the results
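# Select the two columns with a list (double brackets); tuple-style selection is not supported in recent pandas
prices_square_foot_by_year = sfo_data_df.groupby("year")[["sale_price_sqr_foot", "gross_rent"]].mean()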

# Review the resulting DataFrame
prices_square_foot_by_year

year,sale_price_sqr_foot,gross_rent
2010,369.344353,1239.0
2011,341.903429,1530.0
2012,399.389968,2324.0
2013,483.600304,2971.0
2014,556.277273,3528.0
2015,632.540352,3739.0
2016,697.643709,4390.0


**Question:** What is the lowest gross rent reported for the years included in the DataFrame?

**Answer:** The lowest gross rent reported is 1239, in 2010, the earliest year that we have data for. As the above table indicates, the gross rent increases every year over the seven years tracked, and the price per square foot increases every year except 2011, when it decreased.
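
The lowest value can also be read programmatically instead of by eye. The following is a minimal sketch that re-creates the yearly averages from the table above in a stand-in DataFrame (named `prices_by_year` here only for illustration) and uses `min()` and `idxmin()` to find the lowest gross rent and the year it belongs to.

```python
import pandas as pd

# Yearly averages copied from the table above
prices_by_year = pd.DataFrame(
    {
        "sale_price_sqr_foot": [369.344353, 341.903429, 399.389968, 483.600304,
                                556.277273, 632.540352, 697.643709],
        "gross_rent": [1239.0, 1530.0, 2324.0, 2971.0, 3528.0, 3739.0, 4390.0],
    },
    index=pd.Index([2010, 2011, 2012, 2013, 2014, 2015, 2016], name="year"),
)

# min() gives the lowest value; idxmin() gives the index label (year) where it occurs
lowest_rent = prices_by_year["gross_rent"].min()
lowest_rent_year = prices_by_year["gross_rent"].idxmin()
print(f"Lowest average gross rent: {lowest_rent} in {lowest_rent_year}")
```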


### Step 2: Create a new DataFrame named `prices_square_foot_by_year` by filtering out the “housing_units” column. The new DataFrame should include the averages per year for only the sale price per square foot and the gross rent.

In [None]:
# Step two (filter out the 'housing_units' column) was performed above

### Step 3: Use hvPlot to plot the `prices_square_foot_by_year` DataFrame as a line plot.

> **Hint** This single plot will include lines for both `sale_price_sqr_foot` and `gross_rent`

### Step 4: Style and format the line plot to ensure a professionally styled visualization.


In [90]:
# Plot prices_square_foot_by_year.
# Include labels for the x- and y-axes, and a title.
prices_square_foot_by_year.hvplot.line(
    x='year',
    y=['sale_price_sqr_foot', 'gross_rent'],
    xlabel='Year',
    ylabel="Gross Rent / Sale Price Per Square Foot",
    title="Sale Price Per Square Foot and Average Gross Rent - 2010-2016 - San Francisco",
)

**Note:** I attempted to give the two lines separate y-axes, one on the left side of the graph and one on the right, following the code from this resource:
https://discourse.holoviz.org/t/how-to-implement-secondary-y-axis/4317/2

```python
from bokeh.models import GlyphRenderer, LinearAxis, LinearScale, Range1d

plot1 = prices_square_foot_by_year.hvplot.line(
    x='year',
    y='gross_rent',
    ylabel="Price in $USD",
    title="Average Gross Rent - 2010-2016 - San Francisco",
)

plot2 = prices_square_foot_by_year.hvplot.line(
    x='year',
    y='sale_price_sqr_foot',
    ylabel="Price in $USD",
    title="Sale Price Per Square Foot - 2010-2016 - San Francisco",
)

def overlay_hook(plot, element):
    # Adds right y-axis
    p = plot.handles["plot"]
    p.extra_y_scales = {"right": LinearScale()}
    p.extra_y_ranges = {"right": Range1d(0, 50)}
    p.add_layout(LinearAxis(y_range_name="right"), "right")

    # Finds the last line and sets it to right
    lines = [r for r in p.renderers if isinstance(r, GlyphRenderer)]
    lines[-1].y_range_name = "right"

plot1.opts(ylim=(0, 5000)) * plot2.opts(ylim=(0, 1000)).opts(hooks=[overlay_hook])
```

However, the result was unsuccessful. Still, I am keeping it here as a note for possible implementation in a future version.

In [93]:
# Bokeh models used by the hook below
from bokeh.models import GlyphRenderer, LinearAxis, LinearScale, Range1d

# Left-axis line: average gross rent
plot1 = prices_square_foot_by_year.hvplot.line(
    x='year',
    y='gross_rent',
    ylabel="Price in $USD",
    title="Average Gross Rent - 2010-2016 - San Francisco",
)

# Right-axis line: average sale price per square foot
plot2 = prices_square_foot_by_year.hvplot.line(
    x='year',
    y='sale_price_sqr_foot',
    ylabel="Price in $USD",
    title="Sale Price Per Square Foot - 2010-2016 - San Francisco",
)

def overlay_hook(plot, element):
    # Adds right y-axis
    p = plot.handles["plot"]
    p.extra_y_scales = {"right": LinearScale()}
    # Note: this range (0 to 50) does not cover the sale prices (roughly 340 to 700)
    p.extra_y_ranges = {"right": Range1d(0, 50)}
    p.add_layout(LinearAxis(y_range_name="right"), "right")

    # Finds the last line and sets it to right
    lines = [r for r in p.renderers if isinstance(r, GlyphRenderer)]
    lines[-1].y_range_name = "right"

plot1.opts(ylim=(0, 5000)) * plot2.opts(ylim=(0, 1000)).opts(hooks=[overlay_hook])

### Step 6: Use both the `prices_square_foot_by_year` DataFrame and interactive plots to answer the following questions:

**Question:** Did any year experience a drop in the average sale price per square foot compared to the previous year?

**Answer:** 2011 was the only year that experienced a drop in the average price per square foot compared to the previous year.

**Question:** If so, did the gross rent increase or decrease during that year?

**Answer:** Despite the decrease in the average price per square foot in 2011, the average gross rent went up that year.
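
Both answers can be cross-checked numerically with `DataFrame.diff()`, which subtracts each row from the one after it. The sketch below is one way to do this, assuming a stand-in DataFrame (`prices_by_year`, an illustrative name) filled with the yearly averages from the table shown in Step 1.

```python
import pandas as pd

# Yearly averages copied from the table shown in Step 1
prices_by_year = pd.DataFrame(
    {
        "sale_price_sqr_foot": [369.344353, 341.903429, 399.389968, 483.600304,
                                556.277273, 632.540352, 697.643709],
        "gross_rent": [1239.0, 1530.0, 2324.0, 2971.0, 3528.0, 3739.0, 4390.0],
    },
    index=pd.Index([2010, 2011, 2012, 2013, 2014, 2015, 2016], name="year"),
)

# Year-over-year change for both columns (the first year has no previous year, so it is NaN)
yearly_change = prices_by_year.diff()

# Keep only the years in which the average sale price per square foot fell
price_drops = yearly_change[yearly_change["sale_price_sqr_foot"] < 0]
print(price_drops)
```

A positive `gross_rent` value in a row of `price_drops` means that rent rose in a year when the sale price per square foot fell.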

---

## Compare the Average Sale Prices by Neighborhood

For this part of the assignment, use interactive visualizations and widgets to explore the average sale price per square foot by neighborhood. To do so, complete the following steps:

1. Create a new DataFrame that groups the original DataFrame by year and neighborhood. Aggregate the results by the `mean` of the groups.

2. Filter out the “housing_units” column to create a DataFrame that includes only the `sale_price_sqr_foot` and `gross_rent` averages per year.

3. Create an interactive line plot with hvPlot that visualizes both `sale_price_sqr_foot` and `gross_rent`. Set the x-axis parameter to the year (`x="year"`). Use the `groupby` parameter to create an interactive widget for `neighborhood`.

4. Style and format the line plot to ensure a professionally styled visualization.

5. Note that your resulting plot should appear similar to the following image:

![A screenshot depicts an example of the resulting plot.](Images/pricing-info-by-neighborhood.png)

6. Use the interactive visualization to answer the following question:

    * For the Anza Vista neighborhood, is the average sale price per square foot for 2016 more or less than the price that’s listed for 2012? 


### Step 1: Create a new DataFrame that groups the original DataFrame by year and neighborhood. Aggregate the results by the `mean` of the groups.

In [45]:
# Group by year and neighborhood and then create a new dataframe of the mean values
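# Select the two columns with a list (double brackets); tuple-style selection is not supported in recent pandas
prices_by_year_by_neighborhood = sfo_data_df.groupby(["neighborhood", "year"])[["sale_price_sqr_foot", "gross_rent"]].mean()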

# Review the DataFrame
prices_by_year_by_neighborhood

neighborhood,year,sale_price_sqr_foot,gross_rent
Alamo Square,2010,291.182945,1239.0
Alamo Square,2011,272.527310,1530.0
Alamo Square,2012,183.099317,2324.0
Alamo Square,2013,387.794144,2971.0
Alamo Square,2014,484.443552,3528.0
...,...,...,...
Yerba Buena,2011,438.860545,1530.0
Yerba Buena,2012,491.814003,2324.0
Yerba Buena,2013,753.011413,2971.0
Yerba Buena,2014,479.923749,3528.0


### Step 2: Filter out the “housing_units” column to create a DataFrame that includes only the `sale_price_sqr_foot` and `gross_rent` averages per year.

In [None]:
# Step 2 performed in the cell above

### Step 3: Create an interactive line plot with hvPlot that visualizes both `sale_price_sqr_foot` and `gross_rent`. Set the x-axis parameter to the year (`x="year"`). Use the `groupby` parameter to create an interactive widget for `neighborhood`.

### Step 4: Style and format the line plot to ensure a professionally styled visualization.

In [46]:
# Use hvplot to create an interactive line plot of the average price per square foot
# The plot should have a dropdown selector for the neighborhood
prices_by_year_by_neighborhood.hvplot.line(
    x='year',
    y=['sale_price_sqr_foot', 'gross_rent'],
    xlabel="Year",
    ylabel="Gross Rent / Sale Price Per Square Foot",
    groupby='neighborhood',
    title="Sale Price Per Square Foot and Average Gross Rent - 2010-2016 - By Neighborhood"  
)

### Step 6: Use the interactive visualization to answer the following question:

**Question:** For the Anza Vista neighborhood, is the average sale price per square foot for 2016 more or less than the price that’s listed for 2012? 

**Answer:** For the Anza Vista neighborhood, the average sale price per square foot in 2016 was less than it was in 2012.
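
The same kind of comparison can also be read straight from the grouped DataFrame rather than from the plot. The sketch below shows the lookup pattern on the Alamo Square rows visible in the grouped output above (the Anza Vista rows are elided there, so their values are not reproduced here); `sample_prices` is a stand-in name. With the full `prices_by_year_by_neighborhood` DataFrame, the same `.loc[(neighborhood, year), column]` lookup should apply to Anza Vista.

```python
import pandas as pd

# Alamo Square rows copied from the grouped output above
sample_prices = pd.DataFrame(
    {
        "sale_price_sqr_foot": [291.182945, 272.527310, 183.099317, 387.794144, 484.443552],
        "gross_rent": [1239.0, 1530.0, 2324.0, 2971.0, 3528.0],
    },
    index=pd.MultiIndex.from_product(
        [["Alamo Square"], [2010, 2011, 2012, 2013, 2014]],
        names=["neighborhood", "year"],
    ),
)

# Look up single values by (neighborhood, year) and compare them
price_2012 = sample_prices.loc[("Alamo Square", 2012), "sale_price_sqr_foot"]
price_2014 = sample_prices.loc[("Alamo Square", 2014), "sale_price_sqr_foot"]
print(f"2012: {price_2012:.2f}, 2014: {price_2014:.2f}, higher in 2014: {price_2014 > price_2012}")
```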

---

## Build an Interactive Neighborhood Map

For this part of the assignment, explore the geospatial relationships in the data by using interactive visualizations with hvPlot and GeoViews. To build your map, use the `sfo_data_df` DataFrame (created during the initial import), which includes the neighborhood location data with the average prices. To do all this, complete the following steps:

1. Read the `neighborhood_coordinates.csv` file from the `Resources` folder into the notebook, and create a DataFrame named `neighborhood_locations_df`. Be sure to set the `index_col` of the DataFrame as “Neighborhood”.

2. Using the original `sfo_data_df` DataFrame, create a DataFrame named `all_neighborhood_info_df` that groups the data by neighborhood. Aggregate the results by the `mean` of the group.

3. Review the two code cells that concatenate the `neighborhood_locations_df` DataFrame with the `all_neighborhood_info_df` DataFrame. Note that the first cell uses the [Pandas concat function](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html) to create a DataFrame named `all_neighborhoods_df`. The second cell cleans the data and sets the “Neighborhood” column. Be sure to run these cells to create the `all_neighborhoods_df` DataFrame, which you’ll need to create the geospatial visualization.

4. Using hvPlot with GeoViews enabled, create a `points` plot for the `all_neighborhoods_df` DataFrame. Be sure to do the following:

    * Set the `geo` parameter to True.

    * Set the `size` parameter to “sale_price_sqr_foot”.

    * Set the `color` parameter to “gross_rent”.

    * Set the `frame_width` parameter to 700.

    * Set the `frame_height` parameter to 500.

    * Include a descriptive title.

Note that your resulting plot should appear similar to the following image:

![A screenshot depicts an example of a scatter plot created with hvPlot and GeoViews.](Images/6-4-geoviews-plot.png)

5. Use the interactive map to answer the following question:

    * Which neighborhood has the highest gross rent, and which has the highest sale price per square foot?


### Step 1: Read the `neighborhood_coordinates.csv` file from the `Resources` folder into the notebook, and create a DataFrame named `neighborhood_locations_df`. Be sure to set the `index_col` of the DataFrame as “Neighborhood”.

In [47]:
# Load neighborhoods coordinates data
neighborhood_file_path = Path("Resources/neighborhoods_coordinates.csv")
neighborhood_locations_df = pd.read_csv(neighborhood_file_path, index_col="Neighborhood")

# Review the DataFrame
neighborhood_locations_df

Neighborhood,Lat,Lon
Alamo Square,37.791012,-122.402100
Anza Vista,37.779598,-122.443451
Bayview,37.734670,-122.401060
Bayview Heights,37.728740,-122.410980
Bernal Heights,37.728630,-122.443050
...,...,...
West Portal,37.740260,-122.463880
Western Addition,37.792980,-122.435790
Westwood Highlands,37.734700,-122.456854
Westwood Park,37.734150,-122.457000


### Step 2: Using the original `sfo_data_df` DataFrame, create a DataFrame named `all_neighborhood_info_df` that groups the data by neighborhood. Aggregate the results by the `mean` of the group.

In [48]:
# Calculates the mean values for each neighborhood
all_neighborhood_info_df = sfo_data_df.groupby("neighborhood").mean()

# Drops the 'year' column because we don't need that figure averaged
all_neighborhood_info_df.drop('year', axis=1, inplace=True)

# Review the resulting DataFrame
all_neighborhood_info_df

neighborhood,sale_price_sqr_foot,housing_units,gross_rent
Alamo Square,366.020712,378401.00,2817.285714
Anza Vista,373.382198,379050.00,3031.833333
Bayview,204.588623,376454.00,2318.400000
Bayview Heights,590.792839,382295.00,3739.000000
Bernal Heights,576.746488,379374.50,3080.333333
...,...,...,...
West Portal,498.488485,376940.75,2515.500000
Western Addition,307.562201,377427.50,2555.166667
Westwood Highlands,533.703935,376454.00,2250.500000
Westwood Park,687.087575,382295.00,3959.000000


### Step 3: Review the two code cells that concatenate the `neighborhood_locations_df` DataFrame with the `all_neighborhood_info_df` DataFrame. 

Note that the first cell uses the [Pandas concat function](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html) to create a DataFrame named `all_neighborhoods_df`. 

The second cell cleans the data and sets the “Neighborhood” column. 

Be sure to run these cells to create the `all_neighborhoods_df` DataFrame, which you’ll need to create the geospatial visualization.

In [61]:
# Using the Pandas `concat` function, join the
# neighborhood_locations_df and the all_neighborhood_info_df DataFrame
# The axis of the concatenation is "columns".
# The concat function will automatically align rows that share the same
# index label, while keeping the additional columns.
all_neighborhoods_df = pd.concat(
    [neighborhood_locations_df, all_neighborhood_info_df], 
    axis="columns",
    sort=False
).dropna()

# Review the resulting DataFrame
display(all_neighborhoods_df.head())
display(all_neighborhoods_df.tail())

Unnamed: 0,Lat,Lon,sale_price_sqr_foot,housing_units,gross_rent
Alamo Square,37.791012,-122.4021,366.020712,378401.0,2817.285714
Anza Vista,37.779598,-122.443451,373.382198,379050.0,3031.833333
Bayview,37.73467,-122.40106,204.588623,376454.0,2318.4
Bayview Heights,37.72874,-122.41098,590.792839,382295.0,3739.0
Buena Vista Park,37.76816,-122.43933,452.680591,378076.5,2698.833333


Unnamed: 0,Lat,Lon,sale_price_sqr_foot,housing_units,gross_rent
West Portal,37.74026,-122.46388,498.488485,376940.75,2515.5
Western Addition,37.79298,-122.43579,307.562201,377427.5,2555.166667
Westwood Highlands,37.7347,-122.456854,533.703935,376454.0,2250.5
Westwood Park,37.73415,-122.457,687.087575,382295.0,3959.0
Yerba Buena,37.79298,-122.39636,576.709848,377427.5,2555.166667
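
To illustrate what the column-wise `concat` followed by `dropna()` does, here is a small sketch under stated assumptions: the two frames reuse the Alamo Square and Anza Vista rows shown in the outputs above, while the third row in each frame is a made-up label with placeholder values, added only to show how unmatched index labels are handled. The names `locations`, `info`, `combined`, and `matched` are illustrative.

```python
import pandas as pd

# Two rows copied from the outputs above, plus one made-up label per frame
locations = pd.DataFrame(
    {"Lat": [37.791012, 37.779598, 0.0], "Lon": [-122.402100, -122.443451, 0.0]},
    index=["Alamo Square", "Anza Vista", "Only In Locations (hypothetical)"],
)
info = pd.DataFrame(
    {
        "sale_price_sqr_foot": [366.020712, 373.382198, 1.0],
        "gross_rent": [2817.285714, 3031.833333, 1.0],
    },
    index=["Alamo Square", "Anza Vista", "Only In Info (hypothetical)"],
)

# Column-wise concat aligns rows on the index; unmatched labels get NaN
combined = pd.concat([locations, info], axis="columns", sort=False)

# dropna() then removes every row that is missing values from either frame
matched = combined.dropna()
print(matched)
```

Only neighborhoods whose index label appears in both frames survive, which is why label spelling has to match exactly between the two source files.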


In [62]:
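# dropna() was called when concatenating in the cell above

# Rename the "index" column as "Neighborhood" for use in the visualization.
# The neighborhood names are still held in the unnamed index at this point, so there is
# no "index" column to rename and the DataFrame is unchanged (see the output below).
all_neighborhoods_df = all_neighborhoods_df.rename(columns={"index": "Neighborhood"})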

# Review the resulting DataFrame
display(all_neighborhoods_df.head())
display(all_neighborhoods_df.tail())

Unnamed: 0,Lat,Lon,sale_price_sqr_foot,housing_units,gross_rent
Alamo Square,37.791012,-122.4021,366.020712,378401.0,2817.285714
Anza Vista,37.779598,-122.443451,373.382198,379050.0,3031.833333
Bayview,37.73467,-122.40106,204.588623,376454.0,2318.4
Bayview Heights,37.72874,-122.41098,590.792839,382295.0,3739.0
Buena Vista Park,37.76816,-122.43933,452.680591,378076.5,2698.833333


Unnamed: 0,Lat,Lon,sale_price_sqr_foot,housing_units,gross_rent
West Portal,37.74026,-122.46388,498.488485,376940.75,2515.5
Western Addition,37.79298,-122.43579,307.562201,377427.5,2555.166667
Westwood Highlands,37.7347,-122.456854,533.703935,376454.0,2250.5
Westwood Park,37.73415,-122.457,687.087575,382295.0,3959.0
Yerba Buena,37.79298,-122.39636,576.709848,377427.5,2555.166667


### Step 4: Using hvPlot with GeoViews enabled, create a `points` plot for the `all_neighborhoods_df` DataFrame. Be sure to do the following:

* Set the `geo` parameter to True.
* Set the `size` parameter to “sale_price_sqr_foot”.
* Set the `color` parameter to “gross_rent”.
* Set the `frame_width` parameter to 700.
* Set the `frame_height` parameter to 500.
* Include a descriptive title.

In [64]:
# Create a plot to analyze neighborhood info
all_neighborhoods_df.hvplot.points(
    'Lon',
    'Lat',
    hover_cols=['index'],
    geo=True,
    tiles='OSM',
    size='sale_price_sqr_foot',
    color='gross_rent',
    frame_width=700,
    frame_height=500,
    title='All Neighborhood Info',
).opts(yformatter='%.0f')

### Step 5: Use the interactive map to answer the following question:

**Question:** Which neighborhood has the highest gross rent, and which has the highest sale price per square foot?

**Answer:** By hovering over the darkest circle, we can see that Westwood Park is the neighborhood with the highest gross rent. The neighborhood with the highest sale price per square foot is Union Square District. Although the color coding on the map above does not make the price per square foot easy to compare visually, the value can be read by hovering over each point. The code below confirms both answers.

In [77]:
# Identifies the value of the highest rent
highest_rent = all_neighborhoods_df['gross_rent'].max()

# Pulls and displays the row for the neighborhood with the highest rent
neighborhood_with_highest_rent = all_neighborhoods_df[all_neighborhoods_df['gross_rent'] == highest_rent]
print(neighborhood_with_highest_rent)

# Identifies the value of the highest sale price per square foot
highest_sale_price_per_sq_foot = all_neighborhoods_df['sale_price_sqr_foot'].max()

# Pulls and displays the row for the neighborhood with the highest sale price per square foot
neighborhood_with_highest_sale_price = all_neighborhoods_df[all_neighborhoods_df['sale_price_sqr_foot'] == highest_sale_price_per_sq_foot]
print(neighborhood_with_highest_sale_price)

                    Lat      Lon  sale_price_sqr_foot  housing_units  \
Westwood Park  37.73415 -122.457           687.087575       382295.0   

               gross_rent  
Westwood Park      3959.0  
                            Lat       Lon  sale_price_sqr_foot  housing_units  \
Union Square District  37.79101 -122.4021           903.993258       377427.5   

                        gross_rent  
Union Square District  2555.166667  
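
If only the neighborhood names are needed, `idxmax()` is a more compact option, since it returns the index label of the row holding the maximum. The sketch below demonstrates this on three rows copied from the outputs above; `sample_neighborhoods` is an illustrative stand-in, and on the full `all_neighborhoods_df` the same two calls should work as long as the neighborhood names remain in the index.

```python
import pandas as pd

# Three rows copied from the outputs above (illustrative subset only)
sample_neighborhoods = pd.DataFrame(
    {
        "sale_price_sqr_foot": [366.020712, 687.087575, 903.993258],
        "gross_rent": [2817.285714, 3959.0, 2555.166667],
    },
    index=["Alamo Square", "Westwood Park", "Union Square District"],
)

# idxmax returns the index label of the row that holds the maximum value
highest_rent_name = sample_neighborhoods["gross_rent"].idxmax()
highest_price_name = sample_neighborhoods["sale_price_sqr_foot"].idxmax()
print(highest_rent_name, "|", highest_price_name)
```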


## Compose Your Data Story

Based on the visualizations that you have created, compose a data story that synthesizes your analysis by answering the following questions:

**Question:**  How does the trend in rental income growth compare to the trend in sales prices? Does this same trend hold true for all the neighborhoods across San Francisco?

**Answer:** The graph 'Sale Price Per Square Foot and Average Gross Rent - 2010-2016 - San Francisco' shows that, on the whole, both gross rent and sale price per square foot increased fairly steadily over the seven years. But not all neighborhoods in San Francisco are the same. The graph 'Sale Price Per Square Foot and Average Gross Rent - 2010-2016 - By Neighborhood' allows us to see the same information on a neighborhood-by-neighborhood basis. There we can see that, while rents in almost all the neighborhoods rose at a steady rate, the sale price per square foot did not necessarily increase every year in each neighborhood.

**Question:** What insights can you share with your company about the potential one-click, buy-and-rent strategy that they're pursuing? Do neighborhoods exist that you would suggest for investment, and why?

**Answer:** For our proptech company, San Francisco would be a great market to test the instant, one-click service for people to buy properties and then rent them, because rents are increasing fairly steadily across the city. This steady increase in gross rent includes neighborhoods where the price per square foot fluctuates. This means that our customers (i.e., the investors who are going to use our one-click buy-and-rent service) would be able to take advantage of a relative increase in the capitalization rate that may result from 1) an increase in gross rent and/or 2) a relative decrease in the price per square foot in a given neighborhood. The bottom line is that, based on the data analyzed, San Francisco was a hot real estate market, and the players and investors in this market would likely be eager to take advantage of new proptech services.
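
The capitalization-rate argument above can be illustrated with a rough proxy. A true capitalization rate needs net operating income and a property's purchase price, neither of which is in this data set, so the sketch below only divides the citywide average gross rent by the average sale price per square foot for each year, using the yearly averages from the table earlier in the notebook. The names `prices_by_year` and `rent_to_price` are assumptions for illustration, and the ratio is meaningful only as a relative trend, not as an actual yield.

```python
import pandas as pd

# Yearly averages copied from the prices_square_foot_by_year table earlier in the notebook
prices_by_year = pd.DataFrame(
    {
        "sale_price_sqr_foot": [369.344353, 341.903429, 399.389968, 483.600304,
                                556.277273, 632.540352, 697.643709],
        "gross_rent": [1239.0, 1530.0, 2324.0, 2971.0, 3528.0, 3739.0, 4390.0],
    },
    index=pd.Index([2010, 2011, 2012, 2013, 2014, 2015, 2016], name="year"),
)

# Rough proxy only: average gross rent per dollar of sale price per square foot
rent_to_price = prices_by_year["gross_rent"] / prices_by_year["sale_price_sqr_foot"]
print(rent_to_price.round(2))
```

A rising ratio means rent grew faster than sale price per square foot over that span, which is the situation the answer above describes.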