## Problem 3: How many people live near shopping centers? (8 points)

In the last step of this analysis, use a *spatial join* to relate data from a population grid data set to the buffer layer created in *problem 2* to find out how many people live in all population grid cells that are **within** 1.5 km distance from each shopping centre. 

Use the same population grid data set as during [lesson 3](https://autogis-site.readthedocs.io/en/latest/lessons/lesson-3/spatial-join.html) (load it directly from WFS, don’t forget to assign a CRS).


*Feel free to divide your solution into more codeblocks than prepared! Remember to add comments to your code :)*

### a) Load the population grid data set and the buffer geometries (2 points)

Use the same population grid data set as during [lesson 3](https://autogis-site.readthedocs.io/en/latest/lessons/lesson-3/spatial-join.html) (load it directly from WFS, don’t forget to assign a CRS). Load the data into a `GeoDataFrame` called `population_grid`.

(optional) If you want, discard unneeded columns and translate the remaining column names from Finnish to English.

In [1]:
%pip install geopandas
%pip install pathlib

import geopandas
import pathlib 

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


In [15]:
NOTEBOOK_PATH = pathlib.Path().resolve()
DATA_DIRECTORY = NOTEBOOK_PATH / "data"

In [16]:
#read file
#should do via WFS but seems to have SSL certificate issues
#downloaded from https://www.hsy.fi/en/environmental-information/open-data/avoin-data---sivut/population-grid-of-helsinki-metropolitan-area/
#upload from own comp

population_grid = geopandas.read_file(
    DATA_DIRECTORY / "Vaestotietoruudukko_2022.shp"
)

population_grid.crs = "EPSG:3879"  # for WFS data, the CRS needs to be specified manually

In [17]:
# NON-EDITABLE CODE CELL FOR TESTING YOUR SOLUTION
import geopandas
import pyproj

assert isinstance(population_grid, geopandas.GeoDataFrame)
assert population_grid.crs == pyproj.CRS("EPSG:3879")

In [29]:
#reduce population grid columns
#translate population grid column names
population_grid = population_grid[["ASUKKAITA", "geometry"]]
population_grid = population_grid.rename(columns={"ASUKKAITA": "population"})



Load the buffers computed in *problem 2* into a `GeoDataFrame` called `shopping_centre_buffers`. Add an `assert` statement to check whether the two data frames are in the same CRS.

In [21]:
# ADD YOUR OWN CODE HERE
#read point and buffer files
shopping_centre_buffers = geopandas.read_file(
    DATA_DIRECTORY / "shopping_centres.gpkg",
    layer = "buffers"
)

In [24]:
shopping_centre_buffers

Unnamed: 0,address,id,name,addr,geometry
0,"Tilexman, 1-7, Itäkatu, Itäkeskus, Vartiokylä,...",1,Itis,"Itäkatu 1-7, 00930 Helsinki, Finland","POLYGON ((25506082.138 6677756.530, 25506074.9..."
1,"Jungle Juice Bar, 14-20, Mannerheimintie, Kesk...",2,Forum,"Mannerheimintie 14–20, 00100 Helsinki, Finland","POLYGON ((25498047.460 6672895.008, 25498040.2..."
2,"Bangkok9 Iso Omena, 11, Piispansilta, Matinkyl...",3,Iso-omena,"Piispansilta 11, 02230 Espoo, Finland","POLYGON ((25486936.273 6671983.930, 25486929.0..."
3,"KappAhl, 3-9, Leppävaarankatu, Säteri, Etelä-L...",4,Sello,"Leppävaarankatu 3-9, 02600 Espoo, Finland","POLYGON ((25491059.611 6678430.043, 25491052.3..."
4,"Stockmann, 3, Vantaanportinkatu, Vantaanportti...",5,Jumbo,"Vantaanportinkatu 3, 01510 Vantaa, Finland","POLYGON ((25499443.932 6686656.982, 25499436.7..."
5,"Dressmann, 5, Hermannin rantatie, Verkkosaari,...",6,REDI,"Hermannin rantatie 5, 00580 Helsinki, Finland","POLYGON ((25500308.309 6674958.748, 25500301.0..."
6,,7,Tripla,"Mall of Tripla, Fredikanterassi 1, 00520 Helsi...",


In [23]:
# NON-EDITABLE CODE CELL FOR TESTING YOUR SOLUTION
assert isinstance(shopping_centre_buffers, geopandas.GeoDataFrame)
assert shopping_centre_buffers.geometry.geom_type.unique() == ["Polygon"]
assert shopping_centre_buffers.crs == pyproj.CRS("EPSG:3879")

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()


---

### b) Carry out a *spatial join* between the `population_grid` and the `shopping_centre_buffers`  (2 points)

Join the shopping centre’s `id` column (and others, if you want) to the population grid data frame, for all population grid cells that are **within** the buffer area of each shopping centre. [Use a *join-type* that retains only rows from both input data frames for which the geometric predicate is true](https://geopandas.org/en/stable/gallery/spatial_joins.html#Types-of-spatial-joins). 


In [40]:
# ADD YOUR OWN CODE HERE
population_grid_with_shopping_centre_buffers = population_grid.sjoin(
    shopping_centre_buffers,
    how="inner",
    predicate="within"
)

In [42]:
population_grid_with_shopping_centre_buffers = population_grid_with_shopping_centre_buffers[[
    "population", "geometry", "id", "name"]]

In [50]:
population_grid_with_shopping_centre_buffers.head()

Unnamed: 0,population,geometry,id,name
1134,134,"POLYGON ((25484250.000 6672499.005, 25484250.0...",3,Iso-omena
1135,75,"POLYGON ((25484250.000 6672249.006, 25484250.0...",3,Iso-omena
1136,20,"POLYGON ((25484250.000 6671748.997, 25484250.0...",3,Iso-omena
1199,106,"POLYGON ((25484499.998 6672749.004, 25484499.9...",3,Iso-omena
1200,130,"POLYGON ((25484499.998 6672499.005, 25484499.9...",3,Iso-omena


In [39]:
population_grid.head()

Unnamed: 0,population,geometry
0,5,"POLYGON ((25472499.995 6689749.005, 25472499.9..."
1,5,"POLYGON ((25472499.995 6685998.998, 25472499.9..."
2,8,"POLYGON ((25472499.995 6684249.004, 25472499.9..."
3,7,"POLYGON ((25472499.995 6683999.005, 25472499.9..."
4,10,"POLYGON ((25472499.995 6682998.998, 25472499.9..."


In [49]:
shopping_centre_buffers.head()

Unnamed: 0,address,id,name,addr,geometry
0,"Tilexman, 1-7, Itäkatu, Itäkeskus, Vartiokylä,...",1,Itis,"Itäkatu 1-7, 00930 Helsinki, Finland","POLYGON ((25506082.138 6677756.530, 25506074.9..."
1,"Jungle Juice Bar, 14-20, Mannerheimintie, Kesk...",2,Forum,"Mannerheimintie 14–20, 00100 Helsinki, Finland","POLYGON ((25498047.460 6672895.008, 25498040.2..."
2,"Bangkok9 Iso Omena, 11, Piispansilta, Matinkyl...",3,Iso-omena,"Piispansilta 11, 02230 Espoo, Finland","POLYGON ((25486936.273 6671983.930, 25486929.0..."
3,"KappAhl, 3-9, Leppävaarankatu, Säteri, Etelä-L...",4,Sello,"Leppävaarankatu 3-9, 02600 Espoo, Finland","POLYGON ((25491059.611 6678430.043, 25491052.3..."
4,"Stockmann, 3, Vantaanportinkatu, Vantaanportti...",5,Jumbo,"Vantaanportinkatu 3, 01510 Vantaa, Finland","POLYGON ((25499443.932 6686656.982, 25499436.7..."



---

### c) Compute the population sum around shopping centres (2 points)

Group the resulting (joint) data frame by shopping centre (`id` or `name`), and calculate the `sum()` of the population living inside the 1.5 km radius around them.

Print the results, for instance, in the form "12345 people live within 1.5 km from REDI".

In [64]:
# ADD YOUR OWN CODE HERE

#group by name
grouped = population_grid_with_shopping_centre_buffers.groupby("name")

#grouped.describe()

#calculate sum of each group
for key, group in grouped:
    #print sum of population in group
    print(f" {sum(group['population'])} people live within 1.5km from {key}")

#note no value for TRIPLA because it couldnt be geocoded

 55464 people live within 1.5km from Forum
 27489 people live within 1.5km from Iso-omena
 21422 people live within 1.5km from Itis
 11718 people live within 1.5km from Jumbo
 31198 people live within 1.5km from REDI
 25424 people live within 1.5km from Sello



---

### d) Reflection

Good job! You are almost done with this week’s exercise. Please quickly answer the following short questions:
    
- How challenging did you find problems 1-3 (on scale to 1-5), and why?
- What was easy?
- What was difficult?

Add your answers in a new *Markdown* cell below:

was rather tedious...