### OBTAINING PARK LAYER
The goal of this notebook is to produce a final urban green space (UGS) layer for the accessibility analysis. <br>
I will use as a baseline the R03土地利用現況 ("current land use") file, which is authored by the Tokyo Metropolitan Government (opendata.metro.tokyo). The issue with this dataset is that it also classifies cemeteries and sports facilities as parks. My solution is the following:
- Extract the parks from the land use data (LU_1: 300)
- Query OSM for cemeteries, graveyards, pitches and sports centres
- First merge (dissolve) the extracted park polygons
- Exclude the UGS polygons that mostly overlap the OSM layer (more than 50% of their area)
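
The OSM query from the second step is not part of this notebook. As a rough sketch, the unwanted features could be requested from the Overpass API with a query like the one built below. The tag list (`landuse=cemetery`, `amenity=grave_yard`, `leisure=pitch`, `leisure=sports_centre`), the bounding box and the helper name `build_overpass_query` are assumptions for illustration, not necessarily what produced `unwanted_features.geojson`.

```python
# Hypothetical helper: builds an Overpass QL query for polygons that should not count as parks.
# The tags and the bounding box are assumptions, not taken from the original workflow.
UNWANTED_TAGS = [
    ("landuse", "cemetery"),
    ("amenity", "grave_yard"),
    ("leisure", "pitch"),
    ("leisure", "sports_centre"),
]

def build_overpass_query(bbox, tags=UNWANTED_TAGS, timeout=180):
    """Return an Overpass QL string; bbox is (south, west, north, east) in degrees."""
    south, west, north, east = bbox
    area = f"({south},{west},{north},{east})"
    statements = []
    for key, value in tags:
        # polygons are mapped as closed ways or as multipolygon relations
        for element in ("way", "relation"):
            statements.append(f'  {element}["{key}"="{value}"]{area};')
    body = "\n".join(statements)
    return f"[out:json][timeout:{timeout}];\n(\n{body}\n);\nout geom;"

# Rough, illustrative bounding box around Tokyo
query = build_overpass_query((35.5, 139.5, 35.9, 139.95))
print(query)
```

The resulting string can be pasted into any Overpass front end and the result exported as GeoJSON.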


##### Import libraries and datasets

In [9]:
import geopandas as gpd
import pandas as pd

In [10]:
ugs_path = "C:\\Users\\lucap\\Documents\\thesis_work\\data\\ugs\\R03\\R03土地利用現況.shp"
unwanted_path = "C:\\Users\\lucap\\Documents\\thesis_work\\data\\ugs\\unwanted_features.geojson" # obtained through OSM
landuse = gpd.read_file(ugs_path) # layer of Tokyo landuse
unwanted = gpd.read_file(unwanted_path) # layer of cemeteries, graveyards and sports facilities

##### Preprocess parks data
- Fix CRS
- Merge adjacent parks
- Create index and recompute areas

In [11]:
print(f"original CRS: {landuse.crs}") # the data is originally in EPSG:6677
landuse = landuse.to_crs(epsg=32654) # reproject both layers to UTM zone 54N (metres) so that areas, distances and buffers share one metric CRS
unwanted = unwanted.to_crs(epsg=32654)
print(f"geometry attribute name: {landuse.geometry.name}") # name of the active geometry column (a GeoSeries)


original CRS: EPSG:6677
geometry attribute name: geometry
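
EPSG:32654 is WGS 84 / UTM zone 54N. As a quick sanity check on that choice, the standard UTM zone formula can be applied to Tokyo's longitude. The helper name `utm_epsg` and the approximate coordinates used below are illustrative assumptions.

```python
import math

def utm_epsg(lon, lat):
    """EPSG code of the WGS 84 / UTM zone containing (lon, lat).

    Ignores the special zones around Norway and Svalbard.
    """
    zone = int(math.floor((lon + 180) / 6)) + 1
    base = 32600 if lat >= 0 else 32700  # 326xx = northern hemisphere, 327xx = southern
    return base + zone

# Approximate coordinates of central Tokyo (assumed for illustration)
tokyo_epsg = utm_epsg(139.77, 35.68)
print(tokyo_epsg)  # 32654
```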


In [12]:
# filter only the parks from the land use dataframe
print(f"Total polygons in the land usage dataset: {landuse.shape[0]}")
parks = landuse[landuse["LU_1"] == 300].copy() # 300 identifies parks; .copy() avoids SettingWithCopyWarning on the assignments below
print(f"Number of parks' polygons: {parks.shape[0]}")

# add a sequential id (kept as a column, since the later merge joins on it) and update the areas
parks['park_id'] = range(1,len(parks)+1)
parks['AREA'] = parks.geometry.area # recompute the areas in the new CRS
parks = parks.rename(columns={'AREA':'area'})
initial_parks = parks.shape[0]

Total polygons in the land usage dataset: 815736
Number of parks' polygons: 14030


##### Remove unwanted features

In [13]:
# remove parks whose area is more than 50% covered by unwanted features
overlap = gpd.overlay(parks, unwanted, how='intersection')
overlap['overlap_area'] = overlap.geometry.area
tot_overlap = overlap.groupby('park_id')['overlap_area'].sum().reset_index()
parks = parks.merge(tot_overlap, on='park_id', how='left')
parks['overlap_area'] = parks['overlap_area'].fillna(0)
parks['ov_percentage'] = (parks['overlap_area']/parks['area'])* 100
filtered_parks = parks[parks['ov_percentage'] <= 50]
parks_after_removal = filtered_parks.shape[0]
n_parks_removed1 = initial_parks - parks_after_removal
print(f"{n_parks_removed1} parks removed")

# export filtered data
# filtered_parks.to_file("C:\\Users\\lucap\\Documents\\thesis_work\\data\\ugs\\filtered_parks.geojson")

1546 parks removed
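
To make the 50% rule concrete, here is a minimal sketch on toy squares that follows the same overlay, group and filter steps. All geometries, ids and the `kind` column are made up for illustration.

```python
import geopandas as gpd
from shapely.geometry import box

# Toy layers in a metric CRS; coordinates and attributes are illustrative only
parks = gpd.GeoDataFrame(
    {"park_id": [1, 2, 3]},
    geometry=[box(0, 0, 10, 10), box(20, 0, 30, 10), box(40, 0, 50, 10)],
    crs="EPSG:32654",
)
unwanted = gpd.GeoDataFrame(
    {"kind": ["pitch", "cemetery"]},
    geometry=[box(0, 0, 10, 8), box(20, 0, 30, 3)],  # cover 80% of park 1 and 30% of park 2
    crs="EPSG:32654",
)
parks["area"] = parks.geometry.area

# Intersect, then sum the overlapping area per park
overlap = gpd.overlay(parks, unwanted, how="intersection")
overlap["overlap_area"] = overlap.geometry.area
tot_overlap = overlap.groupby("park_id")["overlap_area"].sum()

# Parks without any overlap are missing from tot_overlap, hence fillna(0)
parks["overlap_area"] = parks["park_id"].map(tot_overlap).fillna(0)
parks["ov_percentage"] = parks["overlap_area"] / parks["area"] * 100

kept = parks[parks["ov_percentage"] <= 50]
print(kept[["park_id", "ov_percentage"]])
```

Park 1 is dropped and parks 2 and 3 are kept. One caveat: if unwanted polygons overlap each other (for example a pitch mapped inside a sports centre), summing the intersection areas counts the shared part twice, so dissolving the unwanted layer before the overlay may give more accurate percentages.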


The following cell used to run earlier, before the check for unwanted areas. <br>
I am now trying it here to see whether the result improves. The rationale is the following: <br>
The process dissolves and explodes the UGS, so by design the average park area increases afterwards.
Running it _after_ the removal of unwanted polygons may therefore remove more unwanted polygons, because the overlap percentage is computed on the smaller, unmerged polygons.

In [14]:
# merge the adjacent parks into a single entity
filtered_parks = filtered_parks.dissolve() # note: attribute columns (park_id, overlap_area, ...) keep only the first row's values
filtered_parks = filtered_parks.explode()
print(f"Number of parks' after merging: {filtered_parks.shape[0]}")
print(f"Step 1 eliminated {n_parks_removed1}")
print(f"Step 2 eliminated {parks_after_removal - filtered_parks.shape[0]}")
filtered_parks['area'] = filtered_parks.geometry.area # the areas change after merging, so recompute them

filtered_parks.to_file("C:\\Users\\lucap\\Documents\\thesis_work\\data\\ugs\\filtered_2.geojson")

Number of parks' after merging: 10517
Step 1 eliminated 1546
Step 2 eliminated 1967
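
The effect of dissolve followed by explode can be seen on a toy example: polygons that touch are fused into one, while isolated polygons survive unchanged. The geometries and the `name` column are made up for illustration.

```python
import geopandas as gpd
from shapely.geometry import box

# Two squares sharing an edge plus one isolated square (illustrative data)
gdf = gpd.GeoDataFrame(
    {"name": ["a", "b", "c"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(5, 5, 6, 6)],
    crs="EPSG:32654",
)

# dissolve() unions everything into one (multi)geometry;
# explode() splits it back into its connected parts
merged = gdf.dissolve().explode().reset_index(drop=True)
merged["area"] = merged.geometry.area  # areas must be recomputed after merging

print(len(merged))
print(merged[["name", "area"]])
```

Three input polygons become two. Both output rows carry `name == "a"`, because `dissolve()` keeps the first row's attributes by default, so per-park attributes computed before this step are no longer meaningful afterwards.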


##### Attempt to include bodies of water
THIS IS COMPLETELY WRONG NOW

Main idea: a body of water should be included only if most of its perimeter touches the parks. <br>
This would allow for ponds within parks to be included into the park itself, however huge rivers would be omitted. <br>
Some complications:
- Parks can be split into multiple polygons, so what matters for including a bluespace is the total perimeter touching all the polygons that compose the park.
- My solution is to dissolve all parks into a single polygon. However, a bluespace that touches multiple parks may then be wrongly added to the dataset.

HOWEVER: many smaller bodies of water are already included into the parks' dataset.
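
The perimeter criterion can be illustrated on toy geometries with shapely alone. This is only a sketch of the idea: the function name `shared_percentage`, the 2 m tolerance and the shapes are assumptions for illustration.

```python
from shapely.geometry import box

# A park with a hole, a pond filling that hole, and a river touching one side
park = box(0, 0, 10, 10).difference(box(4, 4, 6, 6))
pond = box(4, 4, 6, 6)
river = box(10, 0, 13, 50)

def shared_percentage(water, park_geom, tol=2.0):
    """Share (in %) of the water perimeter lying within `tol` of the park boundary."""
    band = park_geom.boundary.buffer(tol)  # tolerance band around the park edges
    boundary = water.boundary
    return boundary.intersection(band).length / boundary.length * 100

pond_pct = shared_percentage(pond, park)
river_pct = shared_percentage(river, park)
print(f"pond: {pond_pct:.1f}%, river: {river_pct:.1f}%")
```

With an 80% threshold the pond would be included and the river excluded. The tolerance band matters on real data, where park and water edges rarely coincide exactly.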

In [15]:
# Dissolve the parks into a single geometry (unified park boundary)
unipark = filtered_parks.dissolve()

# Create a single boundary geometry for the parks
park_boundary = unipark.geometry.boundary.union_all()  # a single LineString or MultiLineString

# Buffer the park boundary by 2 m (CRS units are metres; adjust as needed) so that nearly adjacent edges still count as shared
buffered_park_boundary = park_boundary.buffer(2)

# Filter the bodies of water (LU_1 = 700); .copy() avoids SettingWithCopyWarning on the assignments below
bluespace = landuse[landuse['LU_1'] == 700].copy()

# Calculate the boundary of each body of water
bluespace['boundary'] = bluespace.geometry.boundary

# Intersect the buffered park boundary with the water feature boundaries
bluespace['shared_boundary'] = bluespace['boundary'].apply(lambda b: b.intersection(buffered_park_boundary))

# Calculate the length of the shared boundary for each water feature
bluespace['shared_length'] = bluespace['shared_boundary'].apply(lambda b: b.length)

# Calculate the percentage of the water feature's boundary that is shared with the buffered park boundary
bluespace['perimeter'] = bluespace['boundary'].apply(lambda b: b.length)
bluespace['shared_percentage'] = (bluespace['shared_length'] / bluespace['perimeter']) * 100

# Filter water features where at least 80% of the boundary lies within the buffered park boundary
adj_bluespace = bluespace[bluespace['shared_percentage'] >= 80]

# Optional: Add these selected water features back to the parks dataset
parks_with_water = gpd.GeoDataFrame(pd.concat([filtered_parks, adj_bluespace], ignore_index=True))

The following is the first iteration of adding the bodies of water inside parks. <br>
It worked, but without buffering "only" 200+ polygons were added. Notable omissions were the ponds of Ueno Park.

In [16]:
# Dissolve the parks into a single geometry (unified park boundary)
#unipark = filtered_parks.dissolve()

# Create a single boundary geometry for the parks
#park_boundary = unipark.geometry.boundary.union_all()  # This creates a single LineString or MultiLineString

# Filter the bodies of water (LU_1 = 700)
#bluespace = landuse[landuse['LU_1'] == 700]

# Calculate the boundary of each body of water
#bluespace['boundary'] = bluespace.geometry.boundary

# Intersect the water feature boundaries with the park boundary
#bluespace['shared_boundary'] = bluespace['boundary'].apply(lambda b: b.intersection(park_boundary))

# Calculate the length of the shared boundary for each water feature
#bluespace['shared_length'] = bluespace['shared_boundary'].apply(lambda b: b.length)

# Calculate the percentage of the water feature's boundary that is shared with the park boundary
#bluespace['perimeter'] = bluespace['boundary'].apply(lambda b: b.length)
#bluespace['shared_percentage'] = (bluespace['shared_length'] / bluespace['perimeter']) * 100

# Filter water features where at least 80% of the boundary is adjacent to parks
#adj_bluespace = bluespace[bluespace['shared_percentage'] >= 80]

# Optional: Add these selected water features back to the parks dataset
#parks_with_water = gpd.GeoDataFrame(pd.concat([filtered_parks, adj_bluespace], ignore_index=True))





In [18]:
# Inspect the result
print(parks_with_water.shape)
print(filtered_parks.shape)

parks_with_water = parks_with_water.drop(columns=['boundary', 'shared_boundary'])
#parks_with_water.set_geometry('geometry', inplace=True)
parks_with_water.to_file('C:\\Users\\lucap\\Documents\\thesis_work\\data\\ugs\\parks_w_water2.geojson')

(10944, 21)
(10517, 15)


In [None]:
# TODO dissolve and explode again
# TODO add Meiji Jingu and Chiyoda