# Housing Rental Analysis for San Francisco

In this challenge, your job is to use your data visualization skills, including aggregation, interactive visualizations, and geospatial analysis, to find properties in the San Francisco market that are viable investment opportunities.

## Instructions

Use the `san_francisco_housing.ipynb` notebook to visualize and analyze the real-estate data.

Note that this assignment requires you to create a visualization by using hvPlot and GeoViews. Additionally, you need to read the `sfo_neighborhoods_census_data.csv` file from the `Resources` folder into the notebook and create the DataFrame that you’ll use in the analysis.

The main task in this Challenge is to visualize and analyze the real-estate data in your Jupyter notebook. Use the `san_francisco_housing.ipynb` notebook to complete the following tasks:

* Calculate and plot the housing units per year.

* Calculate and plot the average prices per square foot.

* Compare the average prices by neighborhood.

* Build an interactive neighborhood map.

* Compose your data story.

### Calculate and Plot the Housing Units per Year

For this part of the assignment, use numerical and visual aggregation to calculate the number of housing units per year, and then visualize the results as a bar chart. To do so, complete the following steps:

1. Use the `groupby` function to group the data by year. Aggregate the results by the `mean` of the groups.

2. Use the `hvplot` function to plot the `housing_units_by_year` DataFrame as a bar chart. Make the x-axis represent the `year` and the y-axis represent the `housing_units`.

3. Style and format the bar chart to ensure a professionally styled visualization.

4. Note that your resulting plot should appear similar to the following image:

![A screenshot depicts an example of the resulting bar chart.](Images/zoomed-housing-units-by-year.png)

5. Answer the following question:

    * What’s the overall trend in housing units over the period that you’re analyzing?

### Calculate and Plot the Average Sale Prices per Square Foot

For this part of the assignment, use numerical and visual aggregation to calculate the average prices per square foot, and then visualize the results as a line plot. To do so, complete the following steps:

1. Group the data by year, and then average the results. What’s the lowest gross rent that’s reported for the years that the DataFrame includes?

2. Create a new DataFrame named `prices_square_foot_by_year` by filtering out the “housing_units” column. The new DataFrame should include the averages per year for only the sale price per square foot and the gross rent.

3. Use hvPlot to plot the `prices_square_foot_by_year` DataFrame as a line plot.

    > **Hint** This single plot will include lines for both `sale_price_sqr_foot` and `gross_rent`.

4. Style and format the line plot to ensure a professionally styled visualization.

5. Note that your resulting plot should appear similar to the following image:

![A screenshot depicts an example of the resulting plot.](Images/avg-sale-px-sq-foot-gross-rent.png)

6. Use both the `prices_square_foot_by_year` DataFrame and interactive plots to answer the following questions:

    * Did any year experience a drop in the average sale price per square foot compared to the previous year?

    * If so, did the gross rent increase or decrease during that year?

### Compare the Average Sale Prices by Neighborhood

For this part of the assignment, use interactive visualizations and widgets to explore the average sale price per square foot by neighborhood. To do so, complete the following steps:

1. Create a new DataFrame that groups the original DataFrame by year and neighborhood. Aggregate the results by the `mean` of the groups.

2. Filter out the “housing_units” column to create a DataFrame that includes only the `sale_price_sqr_foot` and `gross_rent` averages per year.

3. Create an interactive line plot with hvPlot that visualizes both `sale_price_sqr_foot` and `gross_rent`. Set the x-axis parameter to the year (`x="year"`). Use the `groupby` parameter to create an interactive widget for `neighborhood`.

4. Style and format the line plot to ensure a professionally styled visualization.

5. Note that your resulting plot should appear similar to the following image:

![A screenshot depicts an example of the resulting plot.](Images/pricing-info-by-neighborhood.png)

6. Use the interactive visualization to answer the following question:

    * For the Anza Vista neighborhood, is the average sale price per square foot for 2016 more or less than the price that’s listed for 2012? 

### Build an Interactive Neighborhood Map

For this part of the assignment, explore the geospatial relationships in the data by using interactive visualizations with hvPlot and GeoViews. To build your map, use the `sfo_data_df` DataFrame (created during the initial import), which includes the neighborhood location data with the average prices. To do all this, complete the following steps:

1. Read the `neighborhood_coordinates.csv` file from the `Resources` folder into the notebook, and create a DataFrame named `neighborhood_locations_df`. Be sure to set the `index_col` of the DataFrame as “Neighborhood”.

2. Using the original `sfo_data_df` DataFrame, create a DataFrame named `all_neighborhood_info_df` that groups the data by neighborhood. Aggregate the results by the `mean` of the group.

3. Review the two code cells that concatenate the `neighborhood_locations_df` DataFrame with the `all_neighborhood_info_df` DataFrame. Note that the first cell uses the [Pandas concat function](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html) to create a DataFrame named `all_neighborhoods_df`. The second cell cleans the data and sets the “Neighborhood” column. Be sure to run these cells to create the `all_neighborhoods_df` DataFrame, which you’ll need to create the geospatial visualization.

4. Using hvPlot with GeoViews enabled, create a `points` plot for the `all_neighborhoods_df` DataFrame. Be sure to do the following:

    * Set the `geo` parameter to True.
    * Set the `size` parameter to “sale_price_sqr_foot”.
    * Set the `color` parameter to “gross_rent”.
    * Set the `frame_width` parameter to 700.
    * Set the `frame_height` parameter to 500.
    * Include a descriptive title.

Note that your resulting plot should appear similar to the following image:

![A screenshot depicts an example of a scatter plot created with hvPlot and GeoViews.](Images/6-4-geoviews-plot.png)

5. Use the interactive map to answer the following question:

    * Which neighborhood has the highest gross rent, and which has the highest sale price per square foot?

### Compose Your Data Story

Based on the visualizations that you created, answer the following questions:

* How does the trend in rental income growth compare to the trend in sales prices? Does this same trend hold true for all the neighborhoods across San Francisco?

* What insights can you share with your company about the potential one-click, buy-and-rent strategy that they're pursuing? Do neighborhoods exist that you would suggest for investment, and why?

In [55]:
# Import the required libraries and dependencies
import pandas as pd
import hvplot.pandas
from pathlib import Path
from IPython.display import display
import holoviews as hv
import geoviews as gv

## Import the data 

---

In [56]:
# Using the read_csv function and Path module, create a DataFrame 
# by importing the sfo_neighborhoods_census_data.csv file from the Resources folder
# (a relative path, so the notebook runs on any machine that has the Resources folder)
sfo_data_df = pd.read_csv(Path("Resources/sfo_neighborhoods_census_data.csv"))

# Review the first and last five rows of the DataFrame
display(sfo_data_df.head())
display(sfo_data_df.tail())
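An absolute path that points into one user's Desktop folder only resolves on that machine. A minimal sketch of a more portable loader, assuming Jupyter is started from the project folder that contains `Resources`; `load_resource_csv` is a hypothetical helper, not part of the starter code:

```python
from pathlib import Path

import pandas as pd


def load_resource_csv(filename, resources_dir="Resources", **read_csv_kwargs):
    """Read a CSV from a folder relative to the current working directory.

    Assumes the notebook is started from the project root that contains
    the Resources folder; pass resources_dir explicitly otherwise.
    """
    csv_path = Path(resources_dir) / filename
    if not csv_path.exists():
        # Fail with a message that shows where pandas actually looked
        raise FileNotFoundError(
            f"{csv_path.resolve()} not found; start Jupyter from the folder "
            f"that contains '{resources_dir}' or pass resources_dir explicitly."
        )
    # Extra keyword arguments (e.g. index_col) are forwarded to pd.read_csv
    return pd.read_csv(csv_path, **read_csv_kwargs)
```

In the notebook this would be called as `load_resource_csv("sfo_neighborhoods_census_data.csv")`, or with `index_col="Neighborhood"` for the coordinates file.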

## Calculate and Plot the Housing Units per Year

For this part of the assignment, use numerical and visual aggregation to calculate the number of housing units per year, and then visualize the results as a bar chart. To do so, complete the following steps:

1. Use the `groupby` function to group the data by year. Aggregate the results by the `mean` of the groups.

2. Use the `hvplot` function to plot the `housing_units_by_year` DataFrame as a bar chart. Make the x-axis represent the `year` and the y-axis represent the `housing_units`.

3. Style and format the bar chart to ensure a professionally styled visualization.

4. Note that your resulting plot should appear similar to the following image:

![A screenshot depicts an example of the resulting bar chart.](Images/zoomed-housing-units-by-year.png)

5. Answer the following question:

    * What’s the overall trend in housing units over the period that you’re analyzing?



### Step 1: Use the `groupby` function to group the data by year. Aggregate the results by the `mean` of the groups.

In [57]:
housing_units_by_year = sfo_data_df.groupby('year').mean(numeric_only=True)

# Review the DataFrame
print(housing_units_by_year)

      sale_price_sqr_foot  housing_units  gross_rent
year                                                
2010           369.344353       372560.0      1239.0
2011           341.903429       374507.0      1530.0
2012           399.389968       376454.0      2324.0
2013           483.600304       378401.0      2971.0
2014           556.277273       380348.0      3528.0
2015           632.540352       382295.0      3739.0
2016           697.643709       384242.0      4390.0


### Step 2: Use the `hvplot` function to plot the `housing_units_by_year` DataFrame as a bar chart. Make the x-axis represent the `year` and the y-axis represent the `housing_units`.

### Step 3: Style and format the bar chart to ensure a professionally styled visualization.

In [58]:
bar_chart = housing_units_by_year.hvplot.bar(
    x='year',  # Specify the x-axis column
    y='housing_units',  # Specify the y-axis column
    title='Housing Units by Year',  # Set the chart title
    xlabel='Year',  # Label for the x-axis
    ylabel='Housing Units',  # Label for the y-axis
    height=400,  # Set the height of the chart
    width=600,   # Set the width of the chart
    rot=45,      # Rotate x-axis labels for better readability
    color='blue',  # Set the color of the bars
    bar_width=0.5,  # Adjust the width of the bars
    legend=False,   # Hide the legend (optional)
)
bar_chart
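Because housing units change by only a few thousand on a base of roughly 375,000, bars drawn from zero look almost identical; the "zoomed" reference image suggests narrowing the y-axis. A sketch of one way to compute limits from the data; `padded_ylim` is a hypothetical helper, and the 10% padding is an arbitrary choice:

```python
import pandas as pd


def padded_ylim(values, pad_fraction=0.1):
    """Return (low, high) y-axis limits that hug the data range.

    The padding is a fraction of the data range, so small year-to-year
    differences are not flattened by an axis that starts at zero.
    """
    low, high = float(values.min()), float(values.max())
    pad = (high - low) * pad_fraction
    return (low - pad, high + pad)


# Average housing units per year, copied from the groupby output above
housing_units = pd.Series(
    [372560.0, 374507.0, 376454.0, 378401.0, 380348.0, 382295.0, 384242.0],
    index=range(2010, 2017),
    name="housing_units",
)

print(padded_ylim(housing_units))
```

The result can be passed to hvPlot's `ylim` option, for example `housing_units_by_year.hvplot.bar(x='year', y='housing_units', ylim=padded_ylim(housing_units_by_year['housing_units']))`.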

### Step 5: Answer the following question:

**Question:** What is the overall trend in housing_units over the period being analyzed?

**Answer:** Housing units increased steadily every year, from 372,560 in 2010 to 384,242 in 2016. The increase is a constant 1,947 units per year, about 3.1% over the whole period. This points to a slow, steady expansion of housing supply; the data alone can't tell us whether that growth is driven by rising demand, urban development, or other factors.
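The constant yearly increase is easy to confirm with `Series.diff`. A minimal sketch, using the average housing units copied from the groupby output:

```python
import pandas as pd

# Average housing units per year, copied from housing_units_by_year
housing_units = pd.Series(
    [372560.0, 374507.0, 376454.0, 378401.0, 380348.0, 382295.0, 384242.0],
    index=pd.Index(range(2010, 2017), name="year"),
    name="housing_units",
)

# Change from the previous year (the first year has no predecessor, so drop it)
yearly_change = housing_units.diff().dropna()

# Percentage change over the whole period
total_pct_change = (housing_units.iloc[-1] / housing_units.iloc[0] - 1) * 100

print(yearly_change)
print(f"Total change 2010-2016: {total_pct_change:.1f}%")
```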

---

## Calculate and Plot the Average Sale Prices per Square Foot

For this part of the assignment, use numerical and visual aggregation to calculate the average prices per square foot, and then visualize the results as a line plot. To do so, complete the following steps:

1. Group the data by year, and then average the results. What’s the lowest gross rent that’s reported for the years that the DataFrame includes?

2. Create a new DataFrame named `prices_square_foot_by_year` by filtering out the “housing_units” column. The new DataFrame should include the averages per year for only the sale price per square foot and the gross rent.

3. Use hvPlot to plot the `prices_square_foot_by_year` DataFrame as a line plot.

    > **Hint** This single plot will include lines for both `sale_price_sqr_foot` and `gross_rent`.

4. Style and format the line plot to ensure a professionally styled visualization.

5. Note that your resulting plot should appear similar to the following image:

![A screenshot depicts an example of the resulting plot.](Images/avg-sale-px-sq-foot-gross-rent.png)

6. Use both the `prices_square_foot_by_year` DataFrame and interactive plots to answer the following questions:

    * Did any year experience a drop in the average sale price per square foot compared to the previous year?

    * If so, did the gross rent increase or decrease during that year?



### Step 1: Group the data by year, and then average the results.

In [59]:
# Group the original data by year and calculate the mean for each year
average_prices_by_year = sfo_data_df.groupby('year').mean(numeric_only=True)

# Review the resulting DataFrame
print(average_prices_by_year)

      sale_price_sqr_foot  housing_units  gross_rent
year                                                
2010           369.344353       372560.0      1239.0
2011           341.903429       374507.0      1530.0
2012           399.389968       376454.0      2324.0
2013           483.600304       378401.0      2971.0
2014           556.277273       380348.0      3528.0
2015           632.540352       382295.0      3739.0
2016           697.643709       384242.0      4390.0


**Question:** What is the lowest gross rent reported for the years included in the DataFrame?

**Answer:** The lowest average gross rent is \$1,239.00, reported for 2010.
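The same answer can be looked up programmatically with `Series.idxmin`. A minimal sketch, with the yearly average gross rents copied from the output above:

```python
import pandas as pd

# Average gross rent per year, copied from the grouped output
gross_rent = pd.Series(
    [1239.0, 1530.0, 2324.0, 2971.0, 3528.0, 3739.0, 4390.0],
    index=pd.Index(range(2010, 2017), name="year"),
    name="gross_rent",
)

lowest_year = gross_rent.idxmin()  # index label (year) of the minimum value
lowest_rent = gross_rent.min()

print(f"Lowest average gross rent: ${lowest_rent:,.2f} in {lowest_year}")
```

In the notebook, `average_prices_by_year['gross_rent'].idxmin()` does the same on the real DataFrame.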

### Step 2: Create a new DataFrame named `prices_square_foot_by_year` by filtering out the “housing_units” column. The new DataFrame should include the averages per year for only the sale price per square foot and the gross rent.

In [60]:
# Filter out the housing_units column, creating a new DataFrame 
# Keep only sale_price_sqr_foot and gross_rent averages per year
# (housing_units_by_year is already averaged per year, so no second groupby is needed)
prices_square_foot_by_year = housing_units_by_year.drop(columns='housing_units').reset_index()

# Review the DataFrame
print(prices_square_foot_by_year)

   year  sale_price_sqr_foot  gross_rent
0  2010           369.344353      1239.0
1  2011           341.903429      1530.0
2  2012           399.389968      2324.0
3  2013           483.600304      2971.0
4  2014           556.277273      3528.0
5  2015           632.540352      3739.0
6  2016           697.643709      4390.0


### Step 3: Use hvPlot to plot the `prices_square_foot_by_year` DataFrame as a line plot.

> **Hint** This single plot will include lines for both `sale_price_sqr_foot` and `gross_rent`

### Step 4: Style and format the line plot to ensure a professionally styled visualization.


In [61]:
# Define the line plot with custom styling
line_plot = prices_square_foot_by_year.hvplot.line(
    x='year',
    y=['sale_price_sqr_foot', 'gross_rent'],
    title='Average Sale Price per Square Foot and Gross Rent by Year',
    xlabel='Year',
    ylabel='Price per Square Foot / Gross Rent',
    legend='top_left',  # Position the legend
    height=400,         # Set the height of the plot
    width=600,           # Set the width of the plot
    line_color=hv.Cycle(['blue', 'green']),  # Set line colors
    line_width=2.0,     # Adjust line width
    line_dash=hv.Cycle(['solid', 'dashed']),  # Set line dash patterns for each line
)

# Customize the plot further (optional)
line_plot.opts(
    show_grid=True,     # Show gridlines
    fontscale=1.2,      # Adjust font size
    xlim=(min(prices_square_foot_by_year['year']), max(prices_square_foot_by_year['year'])),  # Set x-axis limits
    ylim=(0, max(prices_square_foot_by_year[['sale_price_sqr_foot', 'gross_rent']].max())),  # Set y-axis limits
)

# Show the line plot
line_plot

### Step 6: Use both the `prices_square_foot_by_year` DataFrame and interactive plots to answer the following questions:

**Question:** Did any year experience a drop in the average sale price per square foot compared to the previous year?

**Answer:** Yes, but only once. The average sale price per square foot fell from about \$369.34 in 2010 to \$341.90 in 2011, a drop of roughly 7.4%. In every other year it rose: \$399.39 in 2012, \$483.60 in 2013, \$556.28 in 2014, \$632.54 in 2015, and \$697.64 in 2016.

**Question:** If so, did the gross rent increase or decrease during that year?

**Answer:** It increased. Average gross rent rose from \$1,239 in 2010 to \$1,530 in 2011 (about 23.5%) even as sale prices fell. Gross rent increased every year of the period, reaching \$4,390 in 2016.
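The year-over-year comparison can also be computed rather than read off the chart. A minimal sketch using `DataFrame.diff`, with the yearly averages copied from the `prices_square_foot_by_year` output (indexed by year here for convenience):

```python
import pandas as pd

prices = pd.DataFrame(
    {
        "sale_price_sqr_foot": [369.344353, 341.903429, 399.389968, 483.600304,
                                556.277273, 632.540352, 697.643709],
        "gross_rent": [1239.0, 1530.0, 2324.0, 2971.0, 3528.0, 3739.0, 4390.0],
    },
    index=pd.Index(range(2010, 2017), name="year"),
)

# Change versus the previous year for both columns (first row is NaN)
changes = prices.diff()

# Years in which the sale price per square foot went down (NaN < 0 is False)
drop_years = changes.index[changes["sale_price_sqr_foot"] < 0]

# Show both changes for those years, answering both questions at once
print(changes.loc[drop_years])
```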

---

## Compare the Average Sale Prices by Neighborhood

For this part of the assignment, use interactive visualizations and widgets to explore the average sale price per square foot by neighborhood. To do so, complete the following steps:

1. Create a new DataFrame that groups the original DataFrame by year and neighborhood. Aggregate the results by the `mean` of the groups.

2. Filter out the “housing_units” column to create a DataFrame that includes only the `sale_price_sqr_foot` and `gross_rent` averages per year.

3. Create an interactive line plot with hvPlot that visualizes both `sale_price_sqr_foot` and `gross_rent`. Set the x-axis parameter to the year (`x="year"`). Use the `groupby` parameter to create an interactive widget for `neighborhood`.

4. Style and format the line plot to ensure a professionally styled visualization.

5. Note that your resulting plot should appear similar to the following image:

![A screenshot depicts an example of the resulting plot.](Images/pricing-info-by-neighborhood.png)

6. Use the interactive visualization to answer the following question:

    * For the Anza Vista neighborhood, is the average sale price per square foot for 2016 more or less than the price that’s listed for 2012? 


### Step 1: Create a new DataFrame that groups the original DataFrame by year and neighborhood. Aggregate the results by the `mean` of the groups.

In [62]:
# Group by 'neighborhood' and 'year' and calculate the mean of numeric columns
grouped_data = sfo_data_df.groupby(['neighborhood', 'year']).mean().reset_index()

# Review the resulting DataFrame
print(grouped_data)

     neighborhood  year  sale_price_sqr_foot  housing_units  gross_rent
0    Alamo Square  2010           291.182945       372560.0      1239.0
1    Alamo Square  2011           272.527310       374507.0      1530.0
2    Alamo Square  2012           183.099317       376454.0      2324.0
3    Alamo Square  2013           387.794144       378401.0      2971.0
4    Alamo Square  2014           484.443552       380348.0      3528.0
..            ...   ...                  ...            ...         ...
392   Yerba Buena  2011           438.860545       374507.0      1530.0
393   Yerba Buena  2012           491.814003       376454.0      2324.0
394   Yerba Buena  2013           753.011413       378401.0      2971.0
395   Yerba Buena  2014           479.923749       380348.0      3528.0
396   Yerba Buena  2015           963.522606       382295.0      3739.0

[397 rows x 5 columns]
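In the rows printed above, `gross_rent` (and `housing_units`) for a given year has the same value in every neighborhood, and it matches the citywide yearly average, which would mean these two columns are citywide figures repeated per row rather than neighborhood-level measurements. A minimal sketch of how to check this, shown on a few rows copied from the output; running the same `groupby(...).nunique()` call on `grouped_data` would test all 397 rows:

```python
import pandas as pd

# A few rows copied from the grouped output above
sample = pd.DataFrame({
    "neighborhood": ["Alamo Square", "Alamo Square", "Yerba Buena", "Yerba Buena"],
    "year": [2011, 2012, 2011, 2012],
    "sale_price_sqr_foot": [272.527310, 183.099317, 438.860545, 491.814003],
    "gross_rent": [1530.0, 2324.0, 1530.0, 2324.0],
})

# Number of distinct values per year: 1 means every neighborhood shares that year's value
distinct_per_year = sample.groupby("year")[["sale_price_sqr_foot", "gross_rent"]].nunique()
print(distinct_per_year)
```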


### Step 2: Filter out the “housing_units” column to create a DataFrame that includes only the `sale_price_sqr_foot` and `gross_rent` averages per year.

In [63]:
# Filter out the housing_units
sp_gr = grouped_data.drop('housing_units', axis=1)


# Review the first five rows of the DataFrame
print(sp_gr.head())

# Review the last five rows of the DataFrame
print(sp_gr.tail())

   neighborhood  year  sale_price_sqr_foot  gross_rent
0  Alamo Square  2010           291.182945      1239.0
1  Alamo Square  2011           272.527310      1530.0
2  Alamo Square  2012           183.099317      2324.0
3  Alamo Square  2013           387.794144      2971.0
4  Alamo Square  2014           484.443552      3528.0
    neighborhood  year  sale_price_sqr_foot  gross_rent
392  Yerba Buena  2011           438.860545      1530.0
393  Yerba Buena  2012           491.814003      2324.0
394  Yerba Buena  2013           753.011413      2971.0
395  Yerba Buena  2014           479.923749      3528.0
396  Yerba Buena  2015           963.522606      3739.0


### Step 3: Create an interactive line plot with hvPlot that visualizes both `sale_price_sqr_foot` and `gross_rent`. Set the x-axis parameter to the year (`x="year"`). Use the `groupby` parameter to create an interactive widget for `neighborhood`.

### Step 4: Style and format the line plot to ensure a professionally styled visualization.

In [64]:
# Use hvplot to create an interactive line plot of the average sale price per square foot
# and gross rent, with a dropdown selector (widget) for the neighborhood
line_plot = sp_gr.hvplot.line(
    x="year",
    y=["sale_price_sqr_foot", "gross_rent"],
    groupby="neighborhood",
    xlabel="Year",
    ylabel="Price (USD)",
    title="Sale Price per Square Foot and Gross Rent by Neighborhood Over Time"
)
line_plot


In [65]:
line_plot = sp_gr.hvplot.line(
    x="year",
    y=["sale_price_sqr_foot", "gross_rent"],
    groupby="neighborhood",
    xlabel="Year",
    ylabel="Price (USD)",
    title="Sale Price per Square Foot and Gross Rent by Neighborhood Over Time",
    legend="top_left",  # Change legend position
    width=800,          # Set plot width
    height=400,         # Set plot height
    line_color=hv.Cycle(["blue", "red"]),  # One color per line, as in the yearly plot
    line_width=2,       # Set line width
    line_alpha=0.8      # Set line transparency
)

# Show the styled and formatted plot
line_plot

### Step 6: Use the interactive visualization to answer the following question:

**Question:** For the Anza Vista neighborhood, is the average sale price per square foot for 2016 more or less than the price that’s listed for 2012? 

**Answer:** Less. In the interactive plot, Anza Vista's average sale price per square foot is roughly \$300 in 2012 and falls to slightly below \$200 by 2016 (approximate values read from the plot).
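The same comparison can be read directly from the data instead of the plot. A small sketch; `compare_years` is a hypothetical helper and the demo DataFrame uses invented numbers, so calling `compare_years(sp_gr, "Anza Vista", 2012, 2016)` in the notebook would give the real values:

```python
import pandas as pd


def compare_years(df, neighborhood, year_a, year_b, column="sale_price_sqr_foot"):
    """Return (value in year_a, value in year_b, difference) for one neighborhood.

    Assumes one row per (neighborhood, year), as in the grouped sp_gr DataFrame.
    """
    rows = df[df["neighborhood"] == neighborhood].set_index("year")[column]
    value_a, value_b = rows.loc[year_a], rows.loc[year_b]
    return value_a, value_b, value_b - value_a


# Hypothetical numbers for illustration only
demo = pd.DataFrame({
    "neighborhood": ["Demo Heights"] * 3,
    "year": [2012, 2014, 2016],
    "sale_price_sqr_foot": [300.0, 250.0, 190.0],
})

# A negative difference means the 2016 price is lower than the 2012 price
print(compare_years(demo, "Demo Heights", 2012, 2016))
```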

---

## Build an Interactive Neighborhood Map

For this part of the assignment, explore the geospatial relationships in the data by using interactive visualizations with hvPlot and GeoViews. To build your map, use the `sfo_data_df` DataFrame (created during the initial import), which includes the neighborhood location data with the average prices. To do all this, complete the following steps:

1. Read the `neighborhood_coordinates.csv` file from the `Resources` folder into the notebook, and create a DataFrame named `neighborhood_locations_df`. Be sure to set the `index_col` of the DataFrame as “Neighborhood”.

2. Using the original `sfo_data_df` DataFrame, create a DataFrame named `all_neighborhood_info_df` that groups the data by neighborhood. Aggregate the results by the `mean` of the group.

3. Review the two code cells that concatenate the `neighborhood_locations_df` DataFrame with the `all_neighborhood_info_df` DataFrame. Note that the first cell uses the [Pandas concat function](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html) to create a DataFrame named `all_neighborhoods_df`. The second cell cleans the data and sets the “Neighborhood” column. Be sure to run these cells to create the `all_neighborhoods_df` DataFrame, which you’ll need to create the geospatial visualization.

4. Using hvPlot with GeoViews enabled, create a `points` plot for the `all_neighborhoods_df` DataFrame. Be sure to do the following:

    * Set the `geo` parameter to True.
    * Set the `size` parameter to “sale_price_sqr_foot”.
    * Set the `color` parameter to “gross_rent”.
    * Set the `frame_width` parameter to 700.
    * Set the `frame_height` parameter to 500.
    * Include a descriptive title.

Note that your resulting plot should appear similar to the following image:

![A screenshot depicts an example of a scatter plot created with hvPlot and GeoViews.](Images/6-4-geoviews-plot.png)

5. Use the interactive map to answer the following question:

    * Which neighborhood has the highest gross rent, and which has the highest sale price per square foot?


### Step 1: Read the `neighborhood_coordinates.csv` file from the `Resources` folder into the notebook, and create a DataFrame named `neighborhood_locations_df`. Be sure to set the `index_col` of the DataFrame as “Neighborhood”.

In [69]:
# Load neighborhoods coordinates data
# Use a relative path so the notebook runs on any machine that has the Resources folder
neighborhood_locations_df = pd.read_csv(Path("Resources/neighborhoods_coordinates.csv"), index_col='Neighborhood')

# Review the DataFrame
print(neighborhood_locations_df.head())

                       Lat         Lon
Neighborhood                          
Alamo Square     37.791012 -122.402100
Anza Vista       37.779598 -122.443451
Bayview          37.734670 -122.401060
Bayview Heights  37.728740 -122.410980
Bernal Heights   37.728630 -122.443050


### Step 2: Using the original `sfo_data_df` DataFrame, create a DataFrame named `all_neighborhood_info_df` that groups the data by neighborhood. Aggregate the results by the `mean` of the group.

In [70]:
# Calculate the mean values for each neighborhood
all_neighborhood_info_df = sfo_data_df.groupby('neighborhood').mean()

# Review the resulting DataFrame
print(all_neighborhood_info_df.head())

                        year  sale_price_sqr_foot  housing_units   gross_rent
neighborhood                                                                 
Alamo Square     2013.000000           366.020712       378401.0  2817.285714
Anza Vista       2013.333333           373.382198       379050.0  3031.833333
Bayview          2012.000000           204.588623       376454.0  2318.400000
Bayview Heights  2015.000000           590.792839       382295.0  3739.000000
Bernal Heights   2013.500000           576.746488       379374.5  3080.333333


### Step 3: Review the two code cells that concatenate the `neighborhood_locations_df` DataFrame with the `all_neighborhood_info_df` DataFrame. 

Note that the first cell uses the [Pandas concat function](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html) to create a DataFrame named `all_neighborhoods_df`. 

The second cell cleans the data and sets the “Neighborhood” column. 

Be sure to run these cells to create the `all_neighborhoods_df` DataFrame, which you’ll need to create the geospatial visualization.

In [71]:
# Using the Pandas `concat` function, join the 
# neighborhood_locations_df and the all_neighborhood_info_df DataFrames.
# The axis of the concatenation is "columns", so rows are aligned on the
# neighborhood index and the columns of both DataFrames are placed side by side.
# join="inner" keeps only the neighborhoods that appear in both indexes.

all_neighborhoods_df = pd.concat(
    [neighborhood_locations_df, all_neighborhood_info_df], 
    axis="columns",
    sort=False,
    join="inner"
)

# Review the resulting DataFrame
display(all_neighborhoods_df.head(5))
display(all_neighborhoods_df.tail(5))



|                  | Lat       | Lon         | year        | sale_price_sqr_foot | housing_units | gross_rent  |
|------------------|-----------|-------------|-------------|---------------------|---------------|-------------|
| Alamo Square     | 37.791012 | -122.4021   | 2013.0      | 366.020712          | 378401.0      | 2817.285714 |
| Anza Vista       | 37.779598 | -122.443451 | 2013.333333 | 373.382198          | 379050.0      | 3031.833333 |
| Bayview          | 37.73467  | -122.40106  | 2012.0      | 204.588623          | 376454.0      | 2318.4      |
| Bayview Heights  | 37.72874  | -122.41098  | 2015.0      | 590.792839          | 382295.0      | 3739.0      |
| Buena Vista Park | 37.76816  | -122.43933  | 2012.833333 | 452.680591          | 378076.5      | 2698.833333 |


|                    | Lat      | Lon         | year    | sale_price_sqr_foot | housing_units | gross_rent  |
|--------------------|----------|-------------|---------|---------------------|---------------|-------------|
| West Portal        | 37.74026 | -122.46388  | 2012.25 | 498.488485          | 376940.75     | 2515.5      |
| Western Addition   | 37.79298 | -122.43579  | 2012.5  | 307.562201          | 377427.5      | 2555.166667 |
| Westwood Highlands | 37.7347  | -122.456854 | 2012.0  | 533.703935          | 376454.0      | 2250.5      |
| Westwood Park      | 37.73415 | -122.457    | 2015.0  | 687.087575          | 382295.0      | 3959.0      |
| Yerba Buena        | 37.79298 | -122.39636  | 2012.5  | 576.709848          | 377427.5      | 2555.166667 |
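The heads above show Bernal Heights in both `neighborhood_locations_df` and `all_neighborhood_info_df` but not in the concatenated result, which suggests the two labels don't match exactly; the printed output does not reveal why. A minimal sketch for listing the labels an inner concat drops, using hypothetical mini-DataFrames in which the mismatch is an invented trailing space:

```python
import pandas as pd

# Hypothetical mini-versions of the two DataFrames; the trailing space in
# "Bernal Heights " is an invented mismatch used only for illustration
locations = pd.DataFrame(
    {"Lat": [37.791012, 37.728630], "Lon": [-122.402100, -122.443050]},
    index=pd.Index(["Alamo Square", "Bernal Heights "], name="Neighborhood"),
)
info = pd.DataFrame(
    {"gross_rent": [2817.285714, 3080.333333]},
    index=pd.Index(["Alamo Square", "Bernal Heights"], name="neighborhood"),
)

# An inner concat keeps only labels present in both indexes
combined = pd.concat([locations, info], axis="columns", join="inner")

# List the labels that were silently dropped from each side
only_in_locations = locations.index.difference(info.index)
only_in_info = info.index.difference(locations.index)
print("Dropped from locations:", list(only_in_locations))
print("Dropped from info:", list(only_in_info))

# One possible fix for this kind of mismatch: strip whitespace, then concatenate again
locations.index = locations.index.str.strip()
info.index = info.index.str.strip()
combined_clean = pd.concat([locations, info], axis="columns", join="inner")
print(len(combined), "->", len(combined_clean), "rows")
```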


In [47]:
# Call the dropna function to remove any neighborhoods that do not have data
all_neighborhoods_df = all_neighborhoods_df.reset_index().dropna()

# Rename the "index" column as "Neighborhood" for use in the Visualization
all_neighborhoods_df = all_neighborhoods_df.rename(columns={"index": "Neighborhood"})

# Review the resulting DataFrame
display(all_neighborhoods_df.head())
display(all_neighborhoods_df.tail())

|   | Neighborhood     | Lat       | Lon         | year        | sale_price_sqr_foot | housing_units | gross_rent  |
|---|------------------|-----------|-------------|-------------|---------------------|---------------|-------------|
| 0 | Alamo Square     | 37.791012 | -122.4021   | 2013.0      | 366.020712          | 378401.0      | 2817.285714 |
| 1 | Anza Vista       | 37.779598 | -122.443451 | 2013.333333 | 373.382198          | 379050.0      | 3031.833333 |
| 2 | Bayview          | 37.73467  | -122.40106  | 2012.0      | 204.588623          | 376454.0      | 2318.4      |
| 3 | Bayview Heights  | 37.72874  | -122.41098  | 2015.0      | 590.792839          | 382295.0      | 3739.0      |
| 4 | Buena Vista Park | 37.76816  | -122.43933  | 2012.833333 | 452.680591          | 378076.5      | 2698.833333 |


|    | Neighborhood       | Lat      | Lon         | year    | sale_price_sqr_foot | housing_units | gross_rent  |
|----|--------------------|----------|-------------|---------|---------------------|---------------|-------------|
| 64 | West Portal        | 37.74026 | -122.46388  | 2012.25 | 498.488485          | 376940.75     | 2515.5      |
| 65 | Western Addition   | 37.79298 | -122.43579  | 2012.5  | 307.562201          | 377427.5      | 2555.166667 |
| 66 | Westwood Highlands | 37.7347  | -122.456854 | 2012.0  | 533.703935          | 376454.0      | 2250.5      |
| 67 | Westwood Park      | 37.73415 | -122.457    | 2015.0  | 687.087575          | 382295.0      | 3959.0      |
| 68 | Yerba Buena        | 37.79298 | -122.39636  | 2012.5  | 576.709848          | 377427.5      | 2555.166667 |
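The `index` column that gets renamed above appears most likely because the two source indexes are named differently (`Neighborhood` vs. `neighborhood`), so the concatenated index comes out unnamed. A sketch of an alternative that names the index explicitly with `rename_axis` before `reset_index`, shown on two rows copied from the outputs above:

```python
import pandas as pd

# Two rows copied from neighborhood_locations_df and all_neighborhood_info_df
locations = pd.DataFrame(
    {"Lat": [37.791012, 37.779598], "Lon": [-122.402100, -122.443451]},
    index=pd.Index(["Alamo Square", "Anza Vista"], name="Neighborhood"),
)
info = pd.DataFrame(
    {"gross_rent": [2817.285714, 3031.833333]},
    index=pd.Index(["Alamo Square", "Anza Vista"], name="neighborhood"),
)

combined = pd.concat([locations, info], axis="columns", join="inner")

# Name the index first, so reset_index creates a "Neighborhood" column directly
tidy = combined.rename_axis("Neighborhood").reset_index().dropna()
print(tidy.columns.tolist())
```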


### Step 4: Using hvPlot with GeoViews enabled, create a `points` plot for the `all_neighborhoods_df` DataFrame. Be sure to do the following:

* Set the `geo` parameter to True.
* Set the `size` parameter to “sale_price_sqr_foot”.
* Set the `color` parameter to “gross_rent”.
* Set the `frame_width` parameter to 700.
* Set the `frame_height` parameter to 500.
* Include a descriptive title.

In [53]:
# Create a plot to analyze neighborhood info
map_plot = all_neighborhoods_df.hvplot.points(
    geo=True,
    x='Lon',
    y='Lat',
    size='sale_price_sqr_foot',
    color='gross_rent',
    cmap='viridis',
    colorbar=True,
    frame_width=700,
    frame_height=500,
    title='San Francisco Neighborhoods',
    tiles='OSM',
)

# Show the plot
map_plot

### Step 5: Use the interactive map to answer the following question:

**Question:** Which neighborhood has the highest gross rent, and which has the highest sale price per square foot?

**Answer:** # YOUR ANSWER HERE
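The map answers this visually, but `Series.idxmax` gives an exact answer. A minimal sketch; `top_neighborhoods` is a hypothetical helper, and the rows below are only the five tail rows shown above, so its result is not the answer for the whole city. Calling `top_neighborhoods(all_neighborhoods_df)` in the notebook would give that. Adding `hover_cols=["Neighborhood"]` to the `hvplot.points` call would also show neighborhood names when hovering over the map.

```python
import pandas as pd


def top_neighborhoods(df):
    """Return the neighborhood with the highest value in each metric column."""
    indexed = df.set_index("Neighborhood")
    return {
        "highest_gross_rent": indexed["gross_rent"].idxmax(),
        "highest_sale_price_sqr_foot": indexed["sale_price_sqr_foot"].idxmax(),
    }


# Rows copied from the tail of all_neighborhoods_df above (a subset, not the full data)
subset = pd.DataFrame({
    "Neighborhood": ["West Portal", "Western Addition", "Westwood Highlands",
                     "Westwood Park", "Yerba Buena"],
    "sale_price_sqr_foot": [498.488485, 307.562201, 533.703935, 687.087575, 576.709848],
    "gross_rent": [2515.5, 2555.166667, 2250.5, 3959.0, 2555.166667],
})

print(top_neighborhoods(subset))
```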

## Compose Your Data Story

Based on the visualizations that you have created, compose a data story that synthesizes your analysis by answering the following questions:

**Question:** What insights can you share with your company about the potential one-click, buy-and-rent strategy that they're pursuing? Do neighborhoods exist that you would suggest for investment, and why?

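**Answer:** Citywide, average gross rent rose every year from 2010 to 2016, while the average sale price per square foot dipped only in 2011. Because rents grew faster than purchase prices, a buy-low, rent-high strategy appears reasonable. On the map, point size encodes sale price per square foot and color encodes gross rent, with dark purple marking the low end of the viridis colormap. The one or two dark purple points therefore indicate neighborhoods with low average gross rent rather than low purchase prices; the most promising candidates are neighborhoods that combine small points (lower purchase cost) with lighter colors (higher rent).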

**Question:**  How does the trend in rental income growth compare to the trend in sales prices? Does this same trend hold true for all the neighborhoods across San Francisco?

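**Answer:** Citywide, rental income grew much faster than sale prices: average gross rent rose from \$1,239 in 2010 to \$4,390 in 2016 (about 254%), while the average sale price per square foot rose from \$369.34 to \$697.64 (about 89%) and dipped in 2011. The trend does not hold uniformly across neighborhoods. Sale prices vary considerably by neighborhood and year; for example, Alamo Square fell to \$183.10 per square foot in 2012, and Yerba Buena fell from \$753.01 in 2013 to \$479.92 in 2014. Most neighborhoods follow the citywide trend, but these anomalies may be buying opportunities that deserve a closer look before any investment decision.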
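The growth comparison can be computed from the yearly averages. A minimal sketch using the 2010 and 2016 values from the `prices_square_foot_by_year` output:

```python
import pandas as pd

# First and last yearly averages, copied from prices_square_foot_by_year
yearly = pd.DataFrame(
    {"sale_price_sqr_foot": [369.344353, 697.643709], "gross_rent": [1239.0, 4390.0]},
    index=pd.Index([2010, 2016], name="year"),
)

# Percentage growth from 2010 to 2016 for each column
growth_pct = (yearly.loc[2016] / yearly.loc[2010] - 1) * 100
print(growth_pct.round(1))
```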