## 🔧 Data Transformation Plan

1. **Load and Inspect**:
   - Import data (CSV, SHP) into Pandas or GeoPandas.
   - Standardize encoding (UTF-8) and column headers.

2. **Clean & Normalize**:
   - Rename columns using snake_case.
   - Fill or drop nulls where applicable.
   - Standardize naming of park/playground types.

3. **Spatial Validation**:
   - Ensure all rows have valid latitude and longitude.
   - Run spatial joins with neighborhood shapefiles.

4. **Structure & Store**:
   - Tag each row with `zone_type`: `"park"` or `"playground"`.

5. **Export or Integrate**:
   - Save cleaned data as `.geojson` or upload to PostGIS/spatial DB.
   - Integrate into broader listing or neighborhood enrichment pipelines.

## 🧪 Step 1: Research & Data Modelling

### Public Parks (Grünenanlage) Berlin

In [3]:
import pandas as pd
from geopy.geocoders import Nominatim
from time import sleep

In [4]:
# Load your CSV
df = pd.read_csv("/Users/dianaterraza/Desktop/webeet.io/layered-populate-data-pool-da/recreational_zones/sources/public_parks.csv", sep=';')

In [5]:
df

Unnamed: 0,Technischer Schlüssel,Schlüssel,Objektnummer,Bezirk,Ortsteil,Art der Grünanlage,Name der Grünanlage,Namenszusatz der Grünanlage,Baujahr,letztes Sanierungsjahr,Größe in m² (Kataster),Widmung,Nummer des Planungsraumes,Name des Planungsraumes
0,00008100_001042bb,00008100:001042bb,00037,Reinickendorf,Frohnau,Grünanlage,"Im Fischgrund, ""Rosenanger""",Rosenanger,-,-,1699150,gewidmet,12400721,Frohnau Ost
1,00008100_00104621,00008100:00104621,01179,Reinickendorf,Lübars,Grünanlage,Klötzbecken bis Zabel-Krüger-Damm,einschl. Klötzbecken,-,-,5222460,gewidmet,12500929,Lübars
2,00008100_001044bd,00008100:001044bd,01074,Reinickendorf,Hermsdorf,Grünanlage,"Heidenheimer Str. (ab Friedrichsthaler Weg), W...",-,-,-,301200,gewidmet,12400722,Hermsdorf West
3,00008100_00104620,00008100:00104620,01180,Reinickendorf,Lübars,Grünanlage,"Wittenauer Str., südl. AEG-Siedlung",-,-,-,337420,gewidmet,12500929,Lübars
4,00008100_00104438,00008100:00104438,00476,Reinickendorf,Reinickendorf,Grünanlage,Kuhnpromenade u. Lindauer Allee 59/61,-,-,-,312200,gewidmet,12100206,Humboldtstraße
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2551,00008100_0014bc05,00008100:0014bc05,214310,Mitte,Moabit,Grünanlage,Essener Park,Essener Str. 7,-,-,367700,gewidmet,01200522,Elberfelder Straße
2552,00008100_0014bf15,00008100:0014bf15,340910,Mitte,Wedding,Grünanlage,Nauener Platz,Reinickendorfer Str. 55 / 56 Teil 1; 30475,-,-,244900,gewidmet,01401049,Schulstraße
2553,00008100_002df21c,00008100:002df21c,324060,Mitte,Wedding,Grünanlage,Leopoldplatz an der Alten Nazarethkirche,-,-,-,82200,gewidmet,01401048,Leopoldplatz
2554,00008100_002ed956,00008100:002ed956,105820,Mitte,Mitte,Grünanlage,Mollstr. 15-18,östlich des Spielplatzes,-,-,37900,gewidmet,01100311,Karl-Marx-Allee


In [6]:
df.head()

Unnamed: 0,Technischer Schlüssel,Schlüssel,Objektnummer,Bezirk,Ortsteil,Art der Grünanlage,Name der Grünanlage,Namenszusatz der Grünanlage,Baujahr,letztes Sanierungsjahr,Größe in m² (Kataster),Widmung,Nummer des Planungsraumes,Name des Planungsraumes
0,00008100_001042bb,00008100:001042bb,37,Reinickendorf,Frohnau,Grünanlage,"Im Fischgrund, ""Rosenanger""",Rosenanger,-,-,1699150,gewidmet,12400721,Frohnau Ost
1,00008100_00104621,00008100:00104621,1179,Reinickendorf,Lübars,Grünanlage,Klötzbecken bis Zabel-Krüger-Damm,einschl. Klötzbecken,-,-,5222460,gewidmet,12500929,Lübars
2,00008100_001044bd,00008100:001044bd,1074,Reinickendorf,Hermsdorf,Grünanlage,"Heidenheimer Str. (ab Friedrichsthaler Weg), W...",-,-,-,301200,gewidmet,12400722,Hermsdorf West
3,00008100_00104620,00008100:00104620,1180,Reinickendorf,Lübars,Grünanlage,"Wittenauer Str., südl. AEG-Siedlung",-,-,-,337420,gewidmet,12500929,Lübars
4,00008100_00104438,00008100:00104438,476,Reinickendorf,Reinickendorf,Grünanlage,Kuhnpromenade u. Lindauer Allee 59/61,-,-,-,312200,gewidmet,12100206,Humboldtstraße


In [7]:
df.columns

Index(['Technischer Schlüssel', 'Schlüssel', 'Objektnummer', 'Bezirk',
       'Ortsteil', 'Art der Grünanlage', 'Name der Grünanlage',
       'Namenszusatz der Grünanlage', 'Baujahr', 'letztes Sanierungsjahr',
       'Größe in m² (Kataster)', 'Widmung', 'Nummer des Planungsraumes',
       'Name des Planungsraumes'],
      dtype='object')

### Rename the Columns 

In [8]:
df.rename(columns={
    'Technischer Schlüssel': 'Technical ID',
    'Schlüssel': 'Key',
    'Objektnummer': 'Object Number',
    'Bezirk': 'neighborhood',
    'Ortsteil': 'Locality',
    'Art der Grünanlage': 'Type of Green Space',
    'Name der Grünanlage': 'Green Space Name',
    'Namenszusatz der Grünanlage': 'Name Extension',
    'Baujahr': 'Year Built',
    'letztes Sanierungsjahr': 'Last Renovation Year',
    'Größe in m² (Kataster)': 'Size sqm',
    'Widmung': 'Dedication',
    'Nummer des Planungsraumes': 'Planning Area Number',
    'Name des Planungsraumes': 'Planning Area Name'
}, inplace=True)


In [9]:
df.columns


Index(['Technical ID', 'Key', 'Object Number', 'neighborhood', 'Locality',
       'Type of Green Space', 'Green Space Name', 'Name Extension',
       'Year Built', 'Last Renovation Year', 'Size sqm', 'Dedication',
       'Planning Area Number', 'Planning Area Name'],
      dtype='object')

### Create Address1 Column with Green Space Name

In [23]:
df['Address1'] = (
    df['Green Space Name'].astype(str) + ", Berlin, Germany"
)

df.head()

Unnamed: 0,Technical ID,Key,Object Number,neighborhood,Locality,Type of Green Space,Green Space Name,Name Extension,Year Built,Last Renovation Year,Size sqm,Dedication,Planning Area Number,Planning Area Name,Full Address,Address1
0,00008100_001042bb,00008100:001042bb,37,Reinickendorf,Frohnau,Grünanlage,"Im Fischgrund, ""Rosenanger""",Rosenanger,-,-,1699150,gewidmet,12400721,Frohnau Ost,"Im Fischgrund, ""Rosenanger"", Frohnau Ost, Berl...","Im Fischgrund, ""Rosenanger"", Berlin, Germany"
1,00008100_00104621,00008100:00104621,1179,Reinickendorf,Lübars,Grünanlage,Klötzbecken bis Zabel-Krüger-Damm,einschl. Klötzbecken,-,-,5222460,gewidmet,12500929,Lübars,"Klötzbecken bis Zabel-Krüger-Damm, Lübars, Ber...","Klötzbecken bis Zabel-Krüger-Damm, Berlin, Ger..."
2,00008100_001044bd,00008100:001044bd,1074,Reinickendorf,Hermsdorf,Grünanlage,"Heidenheimer Str. (ab Friedrichsthaler Weg), W...",-,-,-,301200,gewidmet,12400722,Hermsdorf West,"Heidenheimer Str. (ab Friedrichsthaler Weg), W...","Heidenheimer Str. (ab Friedrichsthaler Weg), W..."
3,00008100_00104620,00008100:00104620,1180,Reinickendorf,Lübars,Grünanlage,"Wittenauer Str., südl. AEG-Siedlung",-,-,-,337420,gewidmet,12500929,Lübars,"Wittenauer Str., südl. AEG-Siedlung, Lübars, B...","Wittenauer Str., südl. AEG-Siedlung, Berlin, G..."
4,00008100_00104438,00008100:00104438,476,Reinickendorf,Reinickendorf,Grünanlage,Kuhnpromenade u. Lindauer Allee 59/61,-,-,-,312200,gewidmet,12100206,Humboldtstraße,"Kuhnpromenade u. Lindauer Allee 59/61, Humbold...","Kuhnpromenade u. Lindauer Allee 59/61, Berlin,..."


### Lets look for duplicates 

In [24]:
# Count duplicate Address values
duplicates = df['Address1'].duplicated().sum()
print(f"Found {duplicates} duplicate addresses.")

Found 19 duplicate addresses.


### Create a unique address DataFrame

In [None]:
unique_addresses = df[['Address1']].drop_duplicates().copy()

### Geocode only the sample_df (10 rows) of unique addresses using OpenStreetMap’s Nominatim API 

In [25]:
sample_df = unique_addresses.loc[0:10]
sample_df

Unnamed: 0,Address1,Latitude,Longitude
0,"Im Fischgrund, ""Rosenanger"", Berlin, Germany",,
1,"Klötzbecken bis Zabel-Krüger-Damm, Berlin, Ger...",,
2,"Heidenheimer Str. (ab Friedrichsthaler Weg), W...",,
3,"Wittenauer Str., südl. AEG-Siedlung, Berlin, G...",,
4,"Kuhnpromenade u. Lindauer Allee 59/61, Berlin,...",,
5,"Avenue Charles de Gaulle 32-33, Berlin, Germany",52.601249,13.319022
6,"Platz der US-Berlin-Brigaden WG, Berlin, Germany",,
7,"Schünemannweg N, Berlin, Germany",52.444492,13.352584
8,"Grabens. Hlgs., Lindengraben, Berlin, Germany",,
9,"BAB, Überbauung Tunnel Tegel, Berlin, Germany",,


In [26]:
geolocator = Nominatim(user_agent="berlin-geocoder")

def geocode_address(address):
    try:
        location = geolocator.geocode(address)
        sleep(1)
        if location:
            return pd.Series([location.latitude, location.longitude])
        else: 
            return pd.Series([None, None])
    except:
        return pd.Series([None, None])

In [18]:
# Geocode sample_df
sample_df[['Latitude', 'Longitude']] = sample_df['Address1'].apply(geocode_address)


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  sample_df[['Latitude', 'Longitude']] = sample_df['Address1'].apply(geocode_address)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  sample_df[['Latitude', 'Longitude']] = sample_df['Address1'].apply(geocode_address)


In [19]:
sample_df

Unnamed: 0,Address1,Latitude,Longitude
0,"Im Fischgrund, ""Rosenanger"", Berlin, Germany",,
1,"Klötzbecken bis Zabel-Krüger-Damm, Berlin, Ger...",,
2,"Heidenheimer Str. (ab Friedrichsthaler Weg), W...",,
3,"Wittenauer Str., südl. AEG-Siedlung, Berlin, G...",,
4,"Kuhnpromenade u. Lindauer Allee 59/61, Berlin,...",,
5,"Avenue Charles de Gaulle 32-33, Berlin, Germany",52.601249,13.319022
6,"Platz der US-Berlin-Brigaden WG, Berlin, Germany",,
7,"Schünemannweg N, Berlin, Germany",52.444492,13.352584
8,"Grabens. Hlgs., Lindengraben, Berlin, Germany",,
9,"BAB, Überbauung Tunnel Tegel, Berlin, Germany",,


### Geocode the unique addresses of the entire dataset using OpenStreetMap’s Nominatim API 

In [20]:
geolocator = Nominatim(user_agent="berlin-geocoder")

def geocode_address(address):
    try:
        location = geolocator.geocode(address)
        sleep(1)
        if location:
            return pd.Series([location.latitude, location.longitude])
        else: 
            return pd.Series([None, None])
    except:
        return pd.Series([None, None])

In [21]:
# Geocode unique addresses
unique_addresses[['Latitude', 'Longitude']] = unique_addresses['Address1'].apply(geocode_address)

In [22]:
unique_addresses

Unnamed: 0,Address1,Latitude,Longitude
0,"Im Fischgrund, ""Rosenanger"", Berlin, Germany",,
1,"Klötzbecken bis Zabel-Krüger-Damm, Berlin, Ger...",,
2,"Heidenheimer Str. (ab Friedrichsthaler Weg), W...",,
3,"Wittenauer Str., südl. AEG-Siedlung, Berlin, G...",,
4,"Kuhnpromenade u. Lindauer Allee 59/61, Berlin,...",,
...,...,...,...
2550,"Weddingplatz, Berlin, Germany",52.540801,13.369760
2551,"Essener Park, Berlin, Germany",52.524730,13.340990
2553,"Leopoldplatz an der Alten Nazarethkirche, Berl...",,
2554,"Mollstr. 15-18, Berlin, Germany",52.526345,13.416519
