# Housing Rental Analysis for San Francisco

In this challenge, your job is to use your data visualization skills, including aggregation, interactive visualizations, and geospatial analysis, to find properties in the San Francisco market that are viable investment opportunities.

## Instructions

Use the `san_francisco_housing.ipynb` notebook to visualize and analyze the real-estate data.

Note that this assignment requires you to create a visualization by using hvPlot and GeoViews. Additionally, you need to read the `sfo_neighborhoods_census_data.csv` file from the `Resources` folder into the notebook and create the DataFrame that you’ll use in the analysis.

The main task in this Challenge is to visualize and analyze the real-estate data in your Jupyter notebook. Use the `san_francisco_housing.ipynb` notebook to complete the following tasks:

* Calculate and plot the housing units per year.

* Calculate and plot the average prices per square foot.

* Compare the average prices by neighborhood.

* Build an interactive neighborhood map.

* Compose your data story.

### Calculate and Plot the Housing Units per Year

For this part of the assignment, use numerical and visual aggregation to calculate the number of housing units per year, and then visualize the results as a bar chart. To do so, complete the following steps:

1. Use the `groupby` function to group the data by year. Aggregate the results by the `mean` of the groups.

2. Use the `hvplot` function to plot the `housing_units_by_year` DataFrame as a bar chart. Make the x-axis represent the `year` and the y-axis represent the `housing_units`.

3. Style and format the bar chart to ensure a professionally styled visualization.

4. Note that your resulting plot should appear similar to the following image:

![A screenshot depicts an example of the resulting bar chart.](Images/zoomed-housing-units-by-year.png)

5. Answer the following question:

    * What’s the overall trend in housing units over the period that you’re analyzing?

### Calculate and Plot the Average Sale Prices per Square Foot

For this part of the assignment, use numerical and visual aggregation to calculate the average prices per square foot, and then visualize the results as a line plot. To do so, complete the following steps:

1. Group the data by year, and then average the results. What’s the lowest gross rent that’s reported for the years that the DataFrame includes?

2. Create a new DataFrame named `prices_square_foot_by_year` by filtering out the “housing_units” column. The new DataFrame should include the averages per year for only the sale price per square foot and the gross rent.

3. Use hvPlot to plot the `prices_square_foot_by_year` DataFrame as a line plot.

    > **Hint** This single plot will include lines for both `sale_price_sqr_foot` and `gross_rent`.

4. Style and format the line plot to ensure a professionally styled visualization.

5. Note that your resulting plot should appear similar to the following image:

![A screenshot depicts an example of the resulting plot.](Images/avg-sale-px-sq-foot-gross-rent.png)

6. Use both the `prices_square_foot_by_year` DataFrame and interactive plots to answer the following questions:

    * Did any year experience a drop in the average sale price per square foot compared to the previous year?

    * If so, did the gross rent increase or decrease during that year?

### Compare the Average Sale Prices by Neighborhood

For this part of the assignment, use interactive visualizations and widgets to explore the average sale price per square foot by neighborhood. To do so, complete the following steps:

1. Create a new DataFrame that groups the original DataFrame by year and neighborhood. Aggregate the results by the `mean` of the groups.

2. Filter out the “housing_units” column to create a DataFrame that includes only the `sale_price_sqr_foot` and `gross_rent` averages per year.

3. Create an interactive line plot with hvPlot that visualizes both `sale_price_sqr_foot` and `gross_rent`. Set the x-axis parameter to the year (`x="year"`). Use the `groupby` parameter to create an interactive widget for `neighborhood`.

4. Style and format the line plot to ensure a professionally styled visualization.

5. Note that your resulting plot should appear similar to the following image:

![A screenshot depicts an example of the resulting plot.](Images/pricing-info-by-neighborhood.png)

6. Use the interactive visualization to answer the following question:

    * For the Anza Vista neighborhood, is the average sale price per square foot for 2016 more or less than the price that’s listed for 2012? 

### Build an Interactive Neighborhood Map

For this part of the assignment, explore the geospatial relationships in the data by using interactive visualizations with hvPlot and GeoViews. To build your map, combine the average prices from the `sfo_data_df` DataFrame (created during the initial import) with the neighborhood location data from the `Resources` folder. To do all this, complete the following steps:

1. Read the `neighborhood_coordinates.csv` file from the `Resources` folder into the notebook, and create a DataFrame named `neighborhood_locations_df`. Be sure to set the `index_col` of the DataFrame as “Neighborhood”.

2. Using the original `sfo_data_df` DataFrame, create a DataFrame named `all_neighborhood_info_df` that groups the data by neighborhood. Aggregate the results by the `mean` of the group.

3. Review the two code cells that concatenate the `neighborhood_locations_df` DataFrame with the `all_neighborhood_info_df` DataFrame. Note that the first cell uses the [Pandas concat function](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html) to create a DataFrame named `all_neighborhoods_df`. The second cell cleans the data and sets the “Neighborhood” column. Be sure to run these cells to create the `all_neighborhoods_df` DataFrame, which you’ll need to create the geospatial visualization.

4. Using hvPlot with GeoViews enabled, create a `points` plot for the `all_neighborhoods_df` DataFrame. Be sure to do the following:

    * Set the `geo` parameter to True.
    * Set the `size` parameter to “sale_price_sqr_foot”.
    * Set the `color` parameter to “gross_rent”.
    * Set the `frame_width` parameter to 700.
    * Set the `frame_height` parameter to 500.
    * Include a descriptive title.

Note that your resulting plot should appear similar to the following image:

![A screenshot depicts an example of a scatter plot created with hvPlot and GeoViews.](Images/6-4-geoviews-plot.png)

5. Use the interactive map to answer the following question:

    * Which neighborhood has the highest gross rent, and which has the highest sale price per square foot?

### Compose Your Data Story

Based on the visualizations that you created, answer the following questions:

* How does the trend in rental income growth compare to the trend in sales prices? Does this same trend hold true for all the neighborhoods across San Francisco?

* What insights can you share with your company about the potential one-click, buy-and-rent strategy that they're pursuing? Do neighborhoods exist that you would suggest for investment, and why?

```python
# Import the required libraries and dependencies
import pandas as pd
import hvplot.pandas  # adds the .hvplot accessor; geo=True plots also need GeoViews installed
from pathlib import Path
```

## Import the data 

```python
# Using the read_csv function and Path module, create a DataFrame
# by importing the sfo_neighborhoods_census_data.csv file from the Resources folder
# (a relative path, so the notebook runs on any machine)
sfo_data_df = pd.read_csv(
    Path('Resources/sfo_neighborhoods_census_data.csv')
)

# Review the first five rows of the DataFrame
sfo_data_df.head()
```

|   | year | neighborhood | sale_price_sqr_foot | housing_units | gross_rent |
|---|---|---|---|---|---|
| 0 | 2010 | Alamo Square | 291.182945 | 372560 | 1239 |
| 1 | 2010 | Anza Vista | 267.932583 | 372560 | 1239 |
| 2 | 2010 | Bayview | 170.098665 | 372560 | 1239 |
| 3 | 2010 | Buena Vista Park | 347.394919 | 372560 | 1239 |
| 4 | 2010 | Central Richmond | 319.027623 | 372560 | 1239 |


---

## Calculate and Plot the Housing Units per Year


### Step 1: Use the `groupby` function to group the data by year. Aggregate the results by the `mean` of the groups.

```python
# Create a numerical aggregation that groups the data by year and then averages the results.
# Selecting a single column returns a Series indexed by year.
housing_units_by_year = sfo_data_df.groupby('year').housing_units.mean()

# Review the Series
display(housing_units_by_year)

# Capture the min and max values of the housing units to set the y-axis range in the graph below
min_value1 = housing_units_by_year.min()
max_value1 = housing_units_by_year.max()
```

```text
year
2010    372560.0
2011    374507.0
2012    376454.0
2013    378401.0
2014    380348.0
2015    382295.0
2016    384242.0
Name: housing_units, dtype: float64
```

### Step 2: Use the `hvplot` function to plot the `housing_units_by_year` DataFrame as a bar chart. Make the x-axis represent the `year` and the y-axis represent the `housing_units`.

### Step 3: Style and format the bar chart to ensure a professionally styled visualization.

```python
# Create a visual aggregation to explore the housing units by year
housing_units_by_year.hvplot.bar(
    x="year",
    y="housing_units",
    xlabel="Year",
    ylabel="Housing Units",
    title="Number of Housing Units by Year in San Francisco"
).opts(
    yformatter='%.0f',
    ylim=(min_value1 - 1000, max_value1 + 1000)  # restrict the y-axis to set the initial view of the graph
)
```


### Step 5: Answer the following question:

**Question:** What is the overall trend in housing_units over the period being analyzed?

**Answer:** Over the period from 2010 to 2016, the average number of housing units increases every year, from 372,560 in 2010 to 384,242 in 2016. The increase is the same each year (1,947 units), so the relationship between the year and the number of housing units is a positive linear one.
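To confirm the linear trend numerically rather than by eye, `Series.diff()` returns the change from each year to the next. This sketch rebuilds the yearly averages from the printed output above instead of reading the CSV; with the real data you would call `.diff()` on `housing_units_by_year` directly.

```python
import pandas as pd

# Yearly average housing units, copied from the printed output above
housing_units_by_year = pd.Series(
    [372560.0, 374507.0, 376454.0, 378401.0, 380348.0, 382295.0, 384242.0],
    index=pd.Index(range(2010, 2017), name="year"),
    name="housing_units",
)

# Year-over-year change; the first year has no predecessor, so drop its NaN
yearly_change = housing_units_by_year.diff().dropna()

# A single distinct value means the increase is constant, i.e. linear
print(yearly_change.unique())
```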

---

## Calculate and Plot the Average Sale Prices per Square Foot


### Step 1: Group the data by year, and then average the results.

```python
# Create a numerical aggregation by grouping the data by year and averaging the results
prices_square_foot_by_year = sfo_data_df.groupby('year').sale_price_sqr_foot.mean()

# Review the resulting Series
display(prices_square_foot_by_year)

# Save the minimum price per sqft, since it will be the lowest value on the graph
sqft_min = prices_square_foot_by_year.min()

# Print the lowest gross rent for the question below
print(f"The minimum gross rent for the years in the DataFrame is ${sfo_data_df['gross_rent'].min()}")
```

```text
year
2010    369.344353
2011    341.903429
2012    399.389968
2013    483.600304
2014    556.277273
2015    632.540352
2016    697.643709
Name: sale_price_sqr_foot, dtype: float64

The minimum gross rent for the years in the DataFrame is $1239
```


**Question:** What is the lowest gross rent reported for the years included in the DataFrame?

**Answer:** $1,239

### Step 2: Create a new DataFrame named `prices_square_foot_by_year` by filtering out the “housing_units” column. The new DataFrame should include the averages per year for only the sale price per square foot and the gross rent.

```python
# Filter out the housing_units column, creating a new DataFrame
# Keep all rows (row-level values; these are not yet averaged per year)
# Keep only the year, gross_rent, and sale_price_sqr_foot columns
prices_square_foot_by_year_df = sfo_data_df.loc[:, ['year', 'gross_rent', 'sale_price_sqr_foot']]

# Review the DataFrame
prices_square_foot_by_year_df.head()
```

|   | year | gross_rent | sale_price_sqr_foot |
|---|---|---|---|
| 0 | 2010 | 1239 | 291.182945 |
| 1 | 2010 | 1239 | 267.932583 |
| 2 | 2010 | 1239 | 170.098665 |
| 3 | 2010 | 1239 | 347.394919 |
| 4 | 2010 | 1239 | 319.027623 |
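The cell above keeps row-level values, while the step asks for the averages per year of only the sale price per square foot and the gross rent. A minimal sketch of that aggregation follows; the four rows are copied from outputs in this notebook and stand in for `sfo_data_df`, so the averages only illustrate the mechanics.

```python
import pandas as pd

# A few rows copied from the notebook outputs (2010 and 2016 only),
# standing in for sfo_data_df; the real DataFrame has every year and neighborhood
sample_df = pd.DataFrame(
    {
        "year": [2010, 2010, 2016, 2016],
        "neighborhood": ["Alamo Square", "Anza Vista", "Telegraph Hill", "Twin Peaks"],
        "sale_price_sqr_foot": [291.182945, 267.932583, 903.049771, 970.085470],
        "housing_units": [372560, 372560, 384242, 384242],
        "gross_rent": [1239, 1239, 4390, 4390],
    }
)

# Group by year and average only the two columns of interest (housing_units is left out)
prices_square_foot_by_year = (
    sample_df.groupby("year")[["sale_price_sqr_foot", "gross_rent"]].mean()
)
print(prices_square_foot_by_year)

# Lowest yearly average gross rent
print(prices_square_foot_by_year["gross_rent"].min())
```

The resulting DataFrame has one row per year and one column per measure, which is the shape the hint's single two-line plot expects.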


### Step 3: Use hvPlot to plot the `prices_square_foot_by_year` DataFrame as a line plot.

> **Hint** This single plot will include lines for both `sale_price_sqr_foot` and `gross_rent`

### Step 4: Style and format the line plot to ensure a professionally styled visualization.


```python
# Create a numerical aggregation by grouping the data by year and averaging the results
prices_square_foot_by_year = sfo_data_df.groupby('year').sale_price_sqr_foot.mean()

# Do the same as above for gross_rent
gross_rent_by_year = sfo_data_df.groupby('year').gross_rent.mean()

# Save the maximum gross rent, since it will be the highest value on the graph
gross_rent_max = gross_rent_by_year.max()

# Plot prices_square_foot_by_year.
# Include labels for the x- and y-axes, and a title.
prices_sqft_by_year_plot = prices_square_foot_by_year.hvplot(
    x='year',
    y='sale_price_sqr_foot',
    title='Price / sqft. by Year in San Francisco',
    xlabel='Year',
    ylabel='Price per Square Foot'
)

# Plot gross_rent_by_year and save it as a variable
gross_rent_by_year_plot = gross_rent_by_year.hvplot(
    x='year',
    y='gross_rent',
    title='Gross Rent by Year in San Francisco',
    xlabel='Year',
    ylabel='Gross Rent',
)

# Overlay both lines on the same plot
combined_plot = (prices_sqft_by_year_plot * gross_rent_by_year_plot).opts(
    title="Price / sqft. and Gross Rent by Year in San Francisco",
    xlabel="Year",
    ylabel="Price per Sqft / Gross Rent",
    ylim=(sqft_min - 50, gross_rent_max + 100),  # restrict the y-axis to clean up the graph
    legend_position='top_left',
    width=800,
    height=500,
)

display(combined_plot)
```

### Step 6: Use both the `prices_square_foot_by_year` DataFrame and interactive plots to answer the following questions:

**Question:** Did any year experience a drop in the average sale price per square foot compared to the previous year?

**Answer:** Yes. From 2010 to 2011, the average sale price per square foot decreased, from about $369.34 to about $341.90. After that, the annual average increased every year, reaching about $697.64 in 2016.

**Question:** If so, did the gross rent increase or decrease during that year?

**Answer:** Given our data, gross rent in San Francisco increased from 2010 to 2011.
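The drop spotted on the plot can be confirmed numerically: a negative `diff()` marks a year whose average is below the previous year's. This sketch copies the yearly averages printed earlier rather than reading the CSV; with the real data, applying the same filter to `gross_rent_by_year.diff()` would show whether rent rose or fell in the flagged years.

```python
import pandas as pd

# Average sale price per square foot by year, copied from the printed output above
sale_price_by_year = pd.Series(
    [369.344353, 341.903429, 399.389968, 483.600304, 556.277273, 632.540352, 697.643709],
    index=pd.Index(range(2010, 2017), name="year"),
    name="sale_price_sqr_foot",
)

# diff() is NaN for the first year, and NaN < 0 is False, so 2010 is never flagged
drops = sale_price_by_year[sale_price_by_year.diff() < 0]
print(list(drops.index))
```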

---

## Compare the Average Sale Prices by Neighborhood

### Step 1: Create a new DataFrame that groups the original DataFrame by year and neighborhood. Aggregate the results by the `mean` of the groups.

```python
# Group by year and neighborhood and then create a new DataFrame of the mean values
prices_by_year_by_neighborhood_df = sfo_data_df.groupby(['year', 'neighborhood']).mean()

# Review the resulting DataFrame
display(prices_by_year_by_neighborhood_df)
```

| year | neighborhood | sale_price_sqr_foot | housing_units | gross_rent |
|---|---|---|---|---|
| 2010 | Alamo Square | 291.182945 | 372560.0 | 1239.0 |
| 2010 | Anza Vista | 267.932583 | 372560.0 | 1239.0 |
| 2010 | Bayview | 170.098665 | 372560.0 | 1239.0 |
| 2010 | Buena Vista Park | 347.394919 | 372560.0 | 1239.0 |
| 2010 | Central Richmond | 319.027623 | 372560.0 | 1239.0 |
| ... | ... | ... | ... | ... |
| 2016 | Telegraph Hill | 903.049771 | 384242.0 | 4390.0 |
| 2016 | Twin Peaks | 970.085470 | 384242.0 | 4390.0 |
| 2016 | Van Ness/ Civic Center | 552.602567 | 384242.0 | 4390.0 |
| 2016 | Visitacion Valley | 328.319007 | 384242.0 | 4390.0 |


### Step 2: Filter out the “housing_units” column to create a DataFrame that includes only the `sale_price_sqr_foot` and `gross_rent` averages per year.

```python
# Filter out the housing_units column
df_1 = prices_by_year_by_neighborhood_df[["sale_price_sqr_foot", "gross_rent"]]

# Review the first and last five rows of the DataFrame
display(df_1.head())
display(df_1.tail())
```


| year | neighborhood | sale_price_sqr_foot | gross_rent |
|---|---|---|---|
| 2010 | Alamo Square | 291.182945 | 1239.0 |
| 2010 | Anza Vista | 267.932583 | 1239.0 |
| 2010 | Bayview | 170.098665 | 1239.0 |
| 2010 | Buena Vista Park | 347.394919 | 1239.0 |
| 2010 | Central Richmond | 319.027623 | 1239.0 |

| year | neighborhood | sale_price_sqr_foot | gross_rent |
|---|---|---|---|
| 2016 | Telegraph Hill | 903.049771 | 4390.0 |
| 2016 | Twin Peaks | 970.08547 | 4390.0 |
| 2016 | Van Ness/ Civic Center | 552.602567 | 4390.0 |
| 2016 | Visitacion Valley | 328.319007 | 4390.0 |
| 2016 | Westwood Park | 631.195426 | 4390.0 |
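In the head and tail shown above, every neighborhood reports the same `gross_rent` within a year ($1,239 in 2010, $4,390 in 2016). If that holds for the whole dataset, differences in gross rent between neighborhoods would come from which years each neighborhood has data for, not from neighborhood-level rents. A quick check counts the distinct values per year; the six rows here are copied from the outputs above, and running the same line on `df_1` would answer the question for every row.

```python
import pandas as pd

# Rows copied from the head/tail output above, standing in for df_1
sample = pd.DataFrame(
    {
        "year": [2010, 2010, 2010, 2016, 2016, 2016],
        "neighborhood": ["Alamo Square", "Anza Vista", "Bayview",
                         "Telegraph Hill", "Twin Peaks", "Westwood Park"],
        "gross_rent": [1239.0, 1239.0, 1239.0, 4390.0, 4390.0, 4390.0],
    }
).set_index(["year", "neighborhood"])

# Count distinct gross_rent values within each year;
# 1 means every neighborhood reports the same rent for that year
distinct_rents_per_year = sample.groupby(level="year")["gross_rent"].nunique()
print(distinct_rents_per_year)
```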


### Step 3: Create an interactive line plot with hvPlot that visualizes both `sale_price_sqr_foot` and `gross_rent`. Set the x-axis parameter to the year (`x="year"`). Use the `groupby` parameter to create an interactive widget for `neighborhood`.

### Step 4: Style and format the line plot to ensure a professionally styled visualization.

```python
# Use hvplot to create an interactive line plot of the average sale price per
# square foot and the average gross rent, with a dropdown selector for the neighborhood.
# Passing both columns to `y` as a list draws them as a single overlay, so hvPlot
# labels each line in the legend (overlaying two separately built plots with `*`
# left this plot without a working legend).
neighborhood_plot = df_1.hvplot.line(
    x='year',
    y=['sale_price_sqr_foot', 'gross_rent'],
    groupby='neighborhood',
    dynamic=False,
    title="Gross Rent and Price / sqft. by Neighborhood in San Francisco",
    xlabel="Year",
    ylabel="Price per Sqft / Gross Rent",
    width=1000,
    height=500,
    ylim=(0, gross_rent_max + 234),  # restrict the y-axis to clean up the graph
    legend='top_left',
)

display(neighborhood_plot)
```



### Step 6: Use the interactive visualization to answer the following question:

**Question:** For the Anza Vista neighborhood, is the average sale price per square foot for 2016 more or less than the price that’s listed for 2012? 

**Answer:** Less.
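The answer above is read off the interactive plot. With the `(year, neighborhood)` MultiIndex of `df_1`, `DataFrame.xs` can select one neighborhood and compare two years directly. The helper `price_change` and the toy values below are hypothetical and not taken from the dataset; they only show the selection logic.

```python
import pandas as pd

# Toy values for two hypothetical neighborhoods (NOT from the dataset),
# indexed by (year, neighborhood) like df_1
toy = pd.DataFrame(
    {
        "year": [2012, 2016, 2012, 2016],
        "neighborhood": ["Example A", "Example A", "Example B", "Example B"],
        "sale_price_sqr_foot": [400.0, 350.0, 500.0, 650.0],
    }
).set_index(["year", "neighborhood"])


def price_change(df, neighborhood, start_year, end_year):
    """Return the end-year minus start-year sale price per sqft for one neighborhood."""
    # xs keeps every year for the given neighborhood, dropping that index level
    prices = df.xs(neighborhood, level="neighborhood")["sale_price_sqr_foot"]
    return prices.loc[end_year] - prices.loc[start_year]


print(price_change(toy, "Example A", 2012, 2016))  # negative: 2016 is lower
```

With the real data, `price_change(df_1, 'Anza Vista', 2012, 2016)` would confirm the answer, provided Anza Vista has rows for both years; a negative result means 2016 is lower.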

---

## Build an Interactive Neighborhood Map

For this part of the assignment, explore the geospatial relationships in the data by using interactive visualizations with hvPlot and GeoViews. To build your map, combine the average prices from the `sfo_data_df` DataFrame (created during the initial import) with the neighborhood location data from the `Resources` folder. To do all this, complete the following steps:

1. Read the `neighborhood_coordinates.csv` file from the `Resources` folder into the notebook, and create a DataFrame named `neighborhood_locations_df`. Be sure to set the `index_col` of the DataFrame as “Neighborhood”.

2. Using the original `sfo_data_df` DataFrame, create a DataFrame named `all_neighborhood_info_df` that groups the data by neighborhood. Aggregate the results by the `mean` of the group.

3. Review the two code cells that concatenate the `neighborhood_locations_df` DataFrame with the `all_neighborhood_info_df` DataFrame. Note that the first cell uses the [Pandas concat function](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html) to create a DataFrame named `all_neighborhoods_df`. The second cell cleans the data and sets the “Neighborhood” column. Be sure to run these cells to create the `all_neighborhoods_df` DataFrame, which you’ll need to create the geospatial visualization.

4. Using hvPlot with GeoViews enabled, create a `points` plot for the `all_neighborhoods_df` DataFrame. Be sure to do the following:

    * Set the `size` parameter to “sale_price_sqr_foot”.

    * Set the `color` parameter to “gross_rent”.

    * Set the `size_max` parameter to 25.

    * Set the `zoom` parameter to 11.

Note that your resulting plot should appear similar to the following image:

![A screenshot depicts an example of a scatter plot created with hvPlot and GeoViews.](Images/6-4-geoviews-plot.png)

5. Use the interactive map to answer the following question:

    * Which neighborhood has the highest gross rent, and which has the highest sale price per square foot?


### Step 1: Read the `neighborhood_coordinates.csv` file from the `Resources` folder into the notebook, and create a DataFrame named `neighborhood_locations_df`. Be sure to set the `index_col` of the DataFrame as “Neighborhood”.

```python
# Load the neighborhood coordinates data
neighborhood_locations_df = pd.read_csv(
    Path('Resources/neighborhoods_coordinates.csv')
)

# Review the DataFrame
neighborhood_locations_df.head()

# Note that the index is an integer range.
# Note that 'Neighborhood' is capitalized in neighborhood_locations_df
```

|   | Neighborhood | Lat | Lon |
|---|---|---|---|
| 0 | Alamo Square | 37.791012 | -122.4021 |
| 1 | Anza Vista | 37.779598 | -122.443451 |
| 2 | Bayview | 37.73467 | -122.40106 |
| 3 | Bayview Heights | 37.72874 | -122.41098 |
| 4 | Bernal Heights | 37.72863 | -122.44305 |
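The step asks for `index_col`, while this notebook sets the index with `set_index` in a later cell. Both give the same result. A minimal sketch of the `index_col` route, using three rows copied from the output above and `io.StringIO` in place of the CSV file:

```python
import io

import pandas as pd

# Three rows copied from the output above; io.StringIO stands in for the CSV file
csv_text = """Neighborhood,Lat,Lon
Alamo Square,37.791012,-122.4021
Anza Vista,37.779598,-122.443451
Bayview,37.73467,-122.40106
"""

# index_col makes "Neighborhood" the index at load time
neighborhood_locations_df = pd.read_csv(io.StringIO(csv_text), index_col="Neighborhood")
print(neighborhood_locations_df)
```

If the notebook loads the file this way, the later `set_index('Neighborhood')` call must be removed, because `Neighborhood` would no longer be a regular column and that call would raise a `KeyError`.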


### Step 2: Using the original `sfo_data_df` DataFrame, create a DataFrame named `all_neighborhood_info_df` that groups the data by neighborhood. Aggregate the results by the `mean` of the group.

```python
# Calculate the mean values for each neighborhood
all_neighborhood_info_df = sfo_data_df.groupby('neighborhood').mean()

# Review the resulting DataFrame
display(all_neighborhood_info_df.head())
display(all_neighborhood_info_df.tail())

# Note the index, 'neighborhood', is uncapitalized for this all_neighborhood_info_df
```


| neighborhood | year | sale_price_sqr_foot | housing_units | gross_rent |
|---|---|---|---|---|
| Alamo Square | 2013.0 | 366.020712 | 378401.0 | 2817.285714 |
| Anza Vista | 2013.333333 | 373.382198 | 379050.0 | 3031.833333 |
| Bayview | 2012.0 | 204.588623 | 376454.0 | 2318.4 |
| Bayview Heights | 2015.0 | 590.792839 | 382295.0 | 3739.0 |
| Bernal Heights | 2013.5 | 576.746488 | 379374.5 | 3080.333333 |

| neighborhood | year | sale_price_sqr_foot | housing_units | gross_rent |
|---|---|---|---|---|
| West Portal | 2012.25 | 498.488485 | 376940.75 | 2515.5 |
| Western Addition | 2012.5 | 307.562201 | 377427.5 | 2555.166667 |
| Westwood Highlands | 2012.0 | 533.703935 | 376454.0 | 2250.5 |
| Westwood Park | 2015.0 | 687.087575 | 382295.0 | 3959.0 |
| Yerba Buena | 2012.5 | 576.709848 | 377427.5 | 2555.166667 |


```python
# Set the index of neighborhood_locations_df to 'Neighborhood' so that we can concatenate more easily.
neighborhood_locations_df = neighborhood_locations_df.set_index('Neighborhood')

# Make the index name identical in both DataFrames:
# rename the index of neighborhood_locations_df to match all_neighborhood_info_df ('neighborhood')
neighborhood_locations_df = neighborhood_locations_df.rename_axis('neighborhood')

# Ensure that the index was set correctly
display(neighborhood_locations_df.head())
```


| neighborhood | Lat | Lon |
|---|---|---|
| Alamo Square | 37.791012 | -122.4021 |
| Anza Vista | 37.779598 | -122.443451 |
| Bayview | 37.73467 | -122.40106 |
| Bayview Heights | 37.72874 | -122.41098 |
| Bernal Heights | 37.72863 | -122.44305 |


### Step 3: Review the two code cells that concatenate the `neighborhood_locations_df` DataFrame with the `all_neighborhood_info_df` DataFrame. 

Note that the first cell uses the [Pandas concat function](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html) to create a DataFrame named `all_neighborhoods_df`. 

The second cell cleans the data and sets the “Neighborhood” column. 

Be sure to run these cells to create the `all_neighborhoods_df` DataFrame, which you’ll need to create the geospatial visualization.

```python
# Using the Pandas `concat` function, join the
# neighborhood_locations_df and the all_neighborhood_info_df DataFrames.
# Concatenating along the "columns" axis aligns rows on the shared
# 'neighborhood' index and keeps all columns from both DataFrames;
# a neighborhood found in only one DataFrame gets NaN in the other's columns.
all_neighborhoods_df = pd.concat(
    [neighborhood_locations_df, all_neighborhood_info_df],
    axis="columns",
    sort=False
)

# Review the resulting DataFrame
display(all_neighborhoods_df.head())
display(all_neighborhoods_df.tail())

# Notice that our single index column is 'neighborhood'
```


| neighborhood | Lat | Lon | year | sale_price_sqr_foot | housing_units | gross_rent |
|---|---|---|---|---|---|---|
| Alamo Square | 37.791012 | -122.4021 | 2013.0 | 366.020712 | 378401.0 | 2817.285714 |
| Anza Vista | 37.779598 | -122.443451 | 2013.333333 | 373.382198 | 379050.0 | 3031.833333 |
| Bayview | 37.73467 | -122.40106 | 2012.0 | 204.588623 | 376454.0 | 2318.4 |
| Bayview Heights | 37.72874 | -122.41098 | 2015.0 | 590.792839 | 382295.0 | 3739.0 |
| Bernal Heights | 37.72863 | -122.44305 | NaN | NaN | NaN | NaN |

| neighborhood | Lat | Lon | year | sale_price_sqr_foot | housing_units | gross_rent |
|---|---|---|---|---|---|---|
| Yerba Buena | 37.79298 | -122.39636 | 2012.5 | 576.709848 | 377427.5 | 2555.166667 |
| Bernal Heights | NaN | NaN | 2013.5 | 576.746488 | 379374.5 | 3080.333333 |
| Downtown | NaN | NaN | 2013.0 | 391.434378 | 378401.0 | 2817.285714 |
| Ingleside | NaN | NaN | 2012.5 | 367.895144 | 377427.5 | 2509.0 |
| Outer Richmond | NaN | NaN | 2013.0 | 473.900773 | 378401.0 | 2817.285714 |
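The output above lists Bernal Heights twice: once with coordinates but no price data, and once with price data but no coordinates. That pattern suggests the label differs slightly between the two files, so the `dropna` call in the next cell removes both rows. The source does not say what the difference is; this sketch assumes a trailing space purely for illustration, and shows how to list unmatched labels with `Index.symmetric_difference` and align them with `str.strip()`.

```python
import pandas as pd

# Hypothetical example: the trailing space in "Bernal Heights " is an assumption
# used to show how a tiny label mismatch splits one neighborhood into two rows
locations = pd.DataFrame(
    {"Lat": [37.791012, 37.72863], "Lon": [-122.4021, -122.44305]},
    index=pd.Index(["Alamo Square", "Bernal Heights "], name="neighborhood"),
)
info = pd.DataFrame(
    {"gross_rent": [2817.285714, 3080.333333]},
    index=pd.Index(["Alamo Square", "Bernal Heights"], name="neighborhood"),
)

# Labels present in only one of the two indexes
print(locations.index.symmetric_difference(info.index).tolist())

# Unmatched labels become separate, partly empty rows after concatenation
combined = pd.concat([locations, info], axis="columns")
print(len(combined), combined.isna().any(axis=1).sum())

# Stripping surrounding whitespace before concatenating lets the rows line up
locations.index = locations.index.str.strip()
fixed = pd.concat([locations, info], axis="columns")
print(len(fixed), fixed.isna().any(axis=1).sum())
```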


```python
# Call the dropna function to remove any neighborhoods that are missing data in either DataFrame.
# No need to reset the index because we set it correctly above.
all_neighborhoods_df = all_neighborhoods_df.dropna()

# Rename the latitude and longitude columns to descriptive names for the plot below
all_neighborhoods_df = all_neighborhoods_df.rename(columns={'Lat': 'Latitude', 'Lon': 'Longitude'})

# Review the resulting DataFrame
display(all_neighborhoods_df.head())
display(all_neighborhoods_df.tail())
```

| neighborhood | Latitude | Longitude | year | sale_price_sqr_foot | housing_units | gross_rent |
|---|---|---|---|---|---|---|
| Alamo Square | 37.791012 | -122.4021 | 2013.0 | 366.020712 | 378401.0 | 2817.285714 |
| Anza Vista | 37.779598 | -122.443451 | 2013.333333 | 373.382198 | 379050.0 | 3031.833333 |
| Bayview | 37.73467 | -122.40106 | 2012.0 | 204.588623 | 376454.0 | 2318.4 |
| Bayview Heights | 37.72874 | -122.41098 | 2015.0 | 590.792839 | 382295.0 | 3739.0 |
| Buena Vista Park | 37.76816 | -122.43933 | 2012.833333 | 452.680591 | 378076.5 | 2698.833333 |

| neighborhood | Latitude | Longitude | year | sale_price_sqr_foot | housing_units | gross_rent |
|---|---|---|---|---|---|---|
| West Portal | 37.74026 | -122.46388 | 2012.25 | 498.488485 | 376940.75 | 2515.5 |
| Western Addition | 37.79298 | -122.43579 | 2012.5 | 307.562201 | 377427.5 | 2555.166667 |
| Westwood Highlands | 37.7347 | -122.456854 | 2012.0 | 533.703935 | 376454.0 | 2250.5 |
| Westwood Park | 37.73415 | -122.457 | 2015.0 | 687.087575 | 382295.0 | 3959.0 |
| Yerba Buena | 37.79298 | -122.39636 | 2012.5 | 576.709848 | 377427.5 | 2555.166667 |


### Step 4: Using hvPlot with GeoViews enabled, create a `points` plot for the `all_neighborhoods_df` DataFrame. Be sure to do the following:

* Set the `geo` parameter to True.
* Set the `size` parameter to “sale_price_sqr_foot”.
* Set the `color` parameter to “gross_rent”.
* Set the `frame_width` parameter to 700.
* Set the `frame_height` parameter to 500.
* Include a descriptive title.

```python
# Create a plot to analyze neighborhood info:
# bubble size shows the sale price per sqft. and bubble color shows the gross rent
neighborhood_points_plot = all_neighborhoods_df.hvplot.points(
    'Longitude',
    'Latitude',
    geo=True,
    title='Sale Price per sqft. and Gross Rent by Neighborhood in San Francisco',
    size="sale_price_sqr_foot",
    size_max=25,
    color='gross_rent',
    zoom=11,
    alpha=0.4,      # transparency level of the bubbles
    tiles='OSM',    # OSM: OpenStreetMap base tiles
    frame_width=700,
    frame_height=500,
    xlabel='Longitude',
    ylabel='Latitude',
)

# Show the plot
neighborhood_points_plot
```




### Step 5: Use the interactive map to answer the following question:

**Question:** Which neighborhood has the highest gross rent, and which has the highest sale price per square foot?

**Answer:** Given that bubble size represents the sale price per square foot, the areas around 37.7342, -122.4570 and 37.7199, -122.4660 appear to have the highest sale price per square foot. The first of these matches the coordinates listed for Westwood Park (37.73415, -122.457).

Given that bubble color represents gross rent, with darker blue representing a higher gross rent, the area around 37.7287, -122.4100 (close to the coordinates listed for Bayview Heights, 37.72874, -122.41098) appears to have the highest relative gross rent, followed by the area around 37.7342, -122.4570 (Westwood Park) and then 37.7286, -122.4431.

To double-check these readings, we can look up the neighborhoods with the maximum `gross_rent` and `sale_price_sqr_foot` values in `all_neighborhoods_df` and confirm that they match what the map shows.
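Reading maxima from bubble size and color is approximate, and `Series.idxmax` returns the neighborhood name directly. The sketch below applies it only to the ten rows displayed above (the head and tail of the cleaned `all_neighborhoods_df`), where it picks Westwood Park for both measures; running the same two calls on the full DataFrame would give the definitive answer.

```python
import pandas as pd

# The ten rows displayed above (head and tail of the cleaned all_neighborhoods_df);
# the full DataFrame has more neighborhoods, so the result may differ
rows = {
    "Alamo Square": (366.020712, 2817.285714),
    "Anza Vista": (373.382198, 3031.833333),
    "Bayview": (204.588623, 2318.4),
    "Bayview Heights": (590.792839, 3739.0),
    "Buena Vista Park": (452.680591, 2698.833333),
    "West Portal": (498.488485, 2515.5),
    "Western Addition": (307.562201, 2555.166667),
    "Westwood Highlands": (533.703935, 2250.5),
    "Westwood Park": (687.087575, 3959.0),
    "Yerba Buena": (576.709848, 2555.166667),
}
sample = pd.DataFrame.from_dict(
    rows, orient="index", columns=["sale_price_sqr_foot", "gross_rent"]
)

# idxmax returns the index label (the neighborhood) of each column's maximum
print(sample["gross_rent"].idxmax())
print(sample["sale_price_sqr_foot"].idxmax())
```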

## Compose Your Data Story

Based on the visualizations that you have created, compose a data story that synthesizes your analysis by answering the following questions:

**Question:**  How does the trend in rental income growth compare to the trend in sales prices? Does this same trend hold true for all the neighborhoods across San Francisco?

**Answer:** 

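Averaged across the city, the sale price per square foot rose every year in our dataset except 2011, when it dipped from about $369 to about $342, before climbing to about $698 by 2016.
The upward trend holds for most neighborhoods, but the size of the change varies by neighborhood, and the trend is not universal: in Anza Vista, for example, the 2016 average sale price per square foot is lower than the 2012 value.

Here, rental income growth is measured by gross rent.
Gross rent varies much more than sale price and does not increase steadily every year.
Flipping through the neighborhoods in our dataset, we can see that gross rent is highly variable. In some neighborhoods the gross rent decreases from 2010 to 2011 or 2012 and then increases through the end of the dataset. In other neighborhoods there is a steady increase in gross rent, while in others there appears to be no substantial change or an overall decrease.
Note, however, that in the rows displayed earlier every neighborhood reports the same gross rent within a given year (for example, $1,239 in 2010 and $4,390 in 2016), so some of these differences may reflect which years each neighborhood has data for rather than true neighborhood-level differences in rent.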
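To compare the two trends on a common scale, the percentage change from 2010 to 2016 is one option. The sketch uses the yearly sale price averages printed earlier and the gross rents shown for 2010 ($1,239) and 2016 ($4,390), assuming those figures equal the citywide yearly averages.

```python
# Citywide values copied from the notebook outputs; the gross rents assume the
# yearly average equals the value every listed row reports for that year
sale_price_2010, sale_price_2016 = 369.344353, 697.643709
gross_rent_2010, gross_rent_2016 = 1239, 4390


def percent_change(start, end):
    """Percentage change from start to end."""
    return (end - start) / start * 100


price_growth = percent_change(sale_price_2010, sale_price_2016)
rent_growth = percent_change(gross_rent_2010, gross_rent_2016)
print(f"Sale price per sqft: {price_growth:.1f}%  Gross rent: {rent_growth:.1f}%")
```

With these inputs, gross rent grows by roughly 254% and the sale price per square foot by roughly 89% over the period, so citywide rent rose considerably faster than sale prices.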

**Question:** What insights can you share with your company about the potential one-click, buy-and-rent strategy that they're pursuing? Do neighborhoods exist that you would suggest for investment, and why?

**Answer:** 
Based on the data, gross rent is highly variable, while the sale price per square foot has generally increased year over year. Therefore, I would suggest looking for neighborhoods whose gross rents have been relatively stable over time and focusing on generating profit from the increase in sale prices per square foot over time.

The company can potentially find bargains in terms of sale price per square foot by comparing the prices of houses currently on the market with those of recent sales, along with other factors. By focusing on the neighborhoods with the most stable gross rents, the company can then reduce its risk of losing money to the volatility of gross rent in the overall San Francisco housing market.
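The recommendation to favor neighborhoods with stable gross rents can be made measurable. One option, an assumption rather than something the analysis above computed, is the coefficient of variation (standard deviation divided by mean) of each neighborhood's yearly gross rent. The sketch uses toy values for two hypothetical neighborhoods; applied to `sfo_data_df`, the same grouping would rank real neighborhoods.

```python
import pandas as pd

# Toy yearly gross rents for two hypothetical neighborhoods (NOT from the dataset)
toy = pd.DataFrame(
    {
        "neighborhood": ["Example A"] * 3 + ["Example B"] * 3,
        "year": [2014, 2015, 2016] * 2,
        "gross_rent": [3000.0, 3100.0, 3050.0, 2000.0, 3500.0, 2600.0],
    }
)

# Coefficient of variation puts rent stability on a common scale;
# a lower value means a more stable rent history
rent_stats = toy.groupby("neighborhood")["gross_rent"].agg(["mean", "std"])
rent_stats["cv"] = rent_stats["std"] / rent_stats["mean"]
print(rent_stats.sort_values("cv"))
```

Neighborhoods with data for only one year have an undefined standard deviation (NaN), so they would need to be excluded or handled separately before ranking.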