# 🌇 Welcome to the `Step-By-Step` `End-To-End` example!

This notebook guides you through a complete `UrbanMapper` workflow, step-by-step, using the `PLUTO` dataset in `Downtown Brooklyn`.

We’ll load data, create a street intersections layer, impute missing coordinates, filter data, map it to intersections, enrich with average floors, and visualise the results interactively. In effect, this notebook combines the `Basics/[1-6]` examples into a single walkthrough.

**Data source used**:
- PLUTO data from NYC Open Data. https://www.nyc.gov/content/planning/pages/resources/datasets/mappluto-pluto-change

**What you’ll learn**:

- Load PLUTO data from a CSV file.
- Create a street intersections layer for Downtown Brooklyn.
- Impute missing longitude and latitude values.
- Filter data to the layer’s bounding box.
- Map data to the nearest intersections.
- Enrich the layer with average floors per intersection.
- Visualise the enriched layer on an interactive map.

Each step includes explanations of its purpose and the input-output transformations.

In [None]:
import urban_mapper as um

# Initialise UrbanMapper
mapper = um.UrbanMapper()

## Step 1: Load Data

**Goal**: Load the PLUTO dataset to begin our analysis.

**Input**: A CSV file path (`./pluto.csv`) containing PLUTO data with columns such as `longitude`, `latitude`, and `numfloors`. Replace it with the path to your own CSV file.

**Output**: A GeoDataFrame (`gdf`) with the loaded data, tagged with longitude and latitude columns for geospatial analysis.

Here, we use the `loader` module to read the CSV and specify the coordinate columns, making the data ready for geospatial operations.
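To make "specifying the coordinate columns" concrete, here is a minimal standard-library sketch, not UrbanMapper's implementation. It reads CSV text, converts the named longitude and latitude columns to floats, and keeps empty cells as `None` so that later steps can detect them. The inline sample rows and the helpers `parse_coordinate` and `load_points` are hypothetical.

```python
import csv
import io

# Made-up sample rows standing in for ./pluto.csv (illustrative values only).
SAMPLE_CSV = """longitude,latitude,numfloors
-73.9857,40.6928,12
,40.6931,4
-73.9870,40.6915,
"""


# Hypothetical helpers for illustration; not part of UrbanMapper.
def parse_coordinate(value):
    """Return the cell as a float, or None if it is empty."""
    value = value.strip()
    return float(value) if value else None


def load_points(text, longitude_column="longitude", latitude_column="latitude"):
    """Read CSV rows and parse the named coordinate columns into floats (or None)."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        row[longitude_column] = parse_coordinate(row[longitude_column])
        row[latitude_column] = parse_coordinate(row[latitude_column])
        records.append(row)
    return records


records = load_points(SAMPLE_CSV)
print(records[0])
```

Other columns such as `numfloors` are left as raw strings here; only the columns declared as coordinates are parsed.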

In [None]:
data = (
    mapper.loader.from_file("./pluto.csv")
    .with_columns(longitude_column="longitude", latitude_column="latitude")
    .load()
)
data.head(10)  # Preview the first ten rows

## Step 2: Create Urban Layer

**Goal**: Build a foundational layer of street intersections in `Downtown Brooklyn` to map our data onto.

**Input**: A place name (`Downtown Brooklyn, New York City, USA`) and a mapping configuration (`longitude_column`, `latitude_column`, `output_column`, and `threshold_distance`).

**Output**: An `UrbanLayer` object representing street intersections, ready to associate data points with specific intersections.

We use the `urban_layer` module with type `streets_intersections`, fetch the street network via OSMnx (using the `drive` network type), and configure the mapping to assign each data point to the nearest intersection within 50 meters.
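To make the `threshold_distance=50` setting concrete, the sketch below checks whether a point lies within 50 metres of an intersection using the haversine (great-circle) formula. This is an assumption for illustration: UrbanMapper may measure distance differently (for example, in a projected coordinate system), and `haversine_m` and `within_threshold` are hypothetical helpers.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


# Hypothetical helpers for illustration; not part of UrbanMapper.
def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two (longitude, latitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_threshold(point, intersection, threshold_distance=50):
    """True if `point` lies within `threshold_distance` metres of `intersection`."""
    return haversine_m(*point, *intersection) <= threshold_distance


# 0.0003 degrees of latitude is roughly 33 m, so this point qualifies.
print(within_threshold((-73.9857, 40.6928), (-73.9857, 40.6931)))
```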

In [None]:
layer = (
    mapper.urban_layer.with_type("streets_intersections")
    .from_place("Downtown Brooklyn, New York City, USA", network_type="drive")
    .with_mapping(
        longitude_column="longitude",
        latitude_column="latitude",
        output_column="nearest_intersection",
        threshold_distance=50,
    )
    .build()
)
layer.static_render()  # Visualise the plain intersections statically (Optional)

## Step 3: Impute Missing Data

**Goal**: Handle missing `longitude` and `latitude` values so that every remaining data point can be mapped.

**Input**: The GeoDataFrame from Step 1 (with potential missing coordinates) and the urban layer from Step 2.

**Output**: A GeoDataFrame with no missing coordinates (with `SimpleGeoImputer`, incomplete records are dropped rather than filled).

The `SimpleGeoImputer` from the `imputer` module takes a naive approach: rather than estimating missing coordinates, it drops records whose `longitude` or `latitude` is missing. See the documentation for other imputation strategies. We print the missing-value counts before and after to see the effect.
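The drop-based strategy described above amounts to a simple filter. The sketch below is a minimal standard-library illustration, not UrbanMapper's code; the helper names and sample records are hypothetical, and a missing value is assumed to be either `None` or a float `NaN` (the latter is how pandas typically stores missing floats).

```python
import math


# Hypothetical helpers for illustration; not part of UrbanMapper.
def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_missing_coordinates(records, longitude_column="longitude", latitude_column="latitude"):
    """Naive strategy: keep only records whose longitude and latitude are both present."""
    return [
        r for r in records
        if not is_missing(r.get(longitude_column)) and not is_missing(r.get(latitude_column))
    ]


def count_missing(records, column):
    """Number of records with a missing value in `column`."""
    return sum(1 for r in records if is_missing(r.get(column)))


# Made-up records: one complete, one missing longitude, one missing latitude.
records = [
    {"longitude": -73.9857, "latitude": 40.6928},
    {"longitude": None, "latitude": 40.6931},
    {"longitude": -73.9870, "latitude": float("nan")},
]
cleaned = drop_missing_coordinates(records)
print("Missing before:", count_missing(records, "longitude"), count_missing(records, "latitude"))
print("Missing after:", count_missing(cleaned, "longitude"), count_missing(cleaned, "latitude"))
print("Rows kept:", len(cleaned))
```

Dropping is cheap and predictable, but it shrinks the dataset; any strategy that fills coordinates instead would need extra information, such as an address to geocode.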

In [None]:
print(f"Missing before: {data[['longitude', 'latitude']].isna().sum()}")
imputed_data = (
    mapper.imputer.with_type("SimpleGeoImputer")
    .on_columns(longitude_column="longitude", latitude_column="latitude")
    .transform(data, layer)
)
print(f"Missing after: {imputed_data[['longitude', 'latitude']].isna().sum()}")

## Step 4: Filter Data

**Goal**: Keep only the data points that fall within Downtown Brooklyn’s bounds.

**Input**: The imputed GeoDataFrame from Step 3 and the urban layer from Step 2.

**Output**: A filtered GeoDataFrame containing only data within the layer’s bounding box.

Using the `BoundingBoxFilter` from the `filter` module, we trim the dataset to the spatial extent of our intersections layer, discarding points outside the study area.
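A bounding-box filter reduces to two range checks per point. The sketch below is a standard-library illustration, not the `BoundingBoxFilter` source: it derives the box from hypothetical intersection coordinates and keeps only the records inside it. Treating the box edges as inclusive is an assumption.

```python
# Hypothetical helpers for illustration; not UrbanMapper's BoundingBoxFilter.
def bounding_box(points):
    """Return (min_lon, min_lat, max_lon, max_lat) enclosing the (lon, lat) points."""
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    return min(lons), min(lats), max(lons), max(lats)


def filter_to_bbox(records, bbox, longitude_column="longitude", latitude_column="latitude"):
    """Keep records whose coordinates fall inside the box (edges inclusive)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return [
        r for r in records
        if min_lon <= r[longitude_column] <= max_lon
        and min_lat <= r[latitude_column] <= max_lat
    ]


# Made-up intersection coordinates standing in for the urban layer.
intersections = [(-73.990, 40.690), (-73.980, 40.700), (-73.985, 40.695)]
bbox = bounding_box(intersections)

records = [
    {"longitude": -73.985, "latitude": 40.692},  # inside the box
    {"longitude": -73.970, "latitude": 40.692},  # east of the box
    {"longitude": -73.985, "latitude": 40.710},  # north of the box
]
filtered = filter_to_bbox(records, bbox)
print(f"Rows before: {len(records)}")
print(f"Rows after: {len(filtered)}")
```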

In [None]:
print(f"Rows before: {len(imputed_data)}")
filtered_data = mapper.filter.with_type("BoundingBoxFilter").transform(
    imputed_data, layer
)
print(f"Rows after: {len(filtered_data)}")

## Step 5: Map to Nearest Layer

**Goal**: Link each data point to its nearest street intersection so that we can later enrich the intersections with aggregations or geo-statistics.

**Input**: The filtered GeoDataFrame from Step 4.

**Output**: An updated `UrbanLayer` and a GeoDataFrame with a new `nearest_intersection` column indicating the closest intersection for each point.

The `map_nearest_layer` method uses the mapping configuration from Step 2 to associate data points with intersections, enabling spatial aggregation in the next step.
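Conceptually, nearest-intersection mapping finds the closest intersection for each point and keeps it only if it lies within `threshold_distance`. The sketch below does this by brute force with haversine distances. The helper `map_nearest`, the intersection IDs, and the sample coordinates are hypothetical, and a production implementation would more likely use a spatial index (such as a k-d tree) than compare every point with every intersection.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


# Hypothetical helpers for illustration; not UrbanMapper's map_nearest_layer.
def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two (longitude, latitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def map_nearest(records, intersections, threshold_distance=50, output_column="nearest_intersection"):
    """Attach the ID of the nearest intersection within the threshold, else None."""
    mapped = []
    for r in records:
        best_id, best_distance = None, float("inf")
        for node_id, (lon, lat) in intersections.items():
            distance = haversine_m(r["longitude"], r["latitude"], lon, lat)
            if distance < best_distance:
                best_id, best_distance = node_id, distance
        nearest = best_id if best_distance <= threshold_distance else None
        mapped.append({**r, output_column: nearest})  # copy the record, add the new column
    return mapped


# Made-up intersections roughly 220 m apart, north to south.
intersections = {"A": (-73.9857, 40.6928), "B": (-73.9857, 40.6948)}
records = [
    {"longitude": -73.9857, "latitude": 40.6930},  # about 22 m from A
    {"longitude": -73.9857, "latitude": 40.6946},  # about 22 m from B
    {"longitude": -73.9857, "latitude": 40.6938},  # about 111 m from both
]
mapped = map_nearest(records, intersections)
print([r["nearest_intersection"] for r in mapped])
```

Points farther than the threshold from every intersection get `None`, so they do not contribute to any intersection's aggregate.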

In [None]:
_, mapped_data = layer.map_nearest_layer(filtered_data)  # Returns the updated layer (unused here) and the mapped data
mapped_data.head()  # Check the new 'nearest_intersection' column

## Step 6: Enrich the Layer

**Goal**: Summarise the mapped data by calculating the average number of floors per intersection.

**Input**: The mapped GeoDataFrame from Step 5 and the urban layer from Step 2.

**Output**: An enriched `UrbanLayer` with an `avg_floors` column in its GeoDataFrame.

The `enricher` module groups the mapped data by `nearest_intersection` and averages the `numfloors` column, adding the resulting `avg_floors` statistic to the layer for visualisation or further analysis (e.g., machine learning).
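The enrichment step is essentially a group-by mean. The sketch below computes the mean of `numfloors` per `nearest_intersection` with the standard library, skipping unmapped points and missing values (similar to the defaults of pandas `groupby(...).mean()`). It uses made-up records and the hypothetical helper `mean_by`; it is not the `enricher` module's implementation, and how intersections with no mapped points are represented in the layer is not shown.

```python
from collections import defaultdict


# Hypothetical helper for illustration; not part of UrbanMapper.
def mean_by(records, group_by="nearest_intersection", values_from="numfloors"):
    """Average `values_from` per `group_by` key, ignoring None keys and None values."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in records:
        key, value = r.get(group_by), r.get(values_from)
        if key is None or value is None:
            continue  # skip points not mapped to any intersection, and missing values
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Made-up mapped records.
mapped = [
    {"nearest_intersection": "A", "numfloors": 4},
    {"nearest_intersection": "A", "numfloors": 10},
    {"nearest_intersection": "B", "numfloors": 3},
    {"nearest_intersection": None, "numfloors": 20},  # unmapped point, ignored
]
avg_floors = mean_by(mapped)
print(avg_floors)  # {'A': 7.0, 'B': 3.0}
```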

In [None]:
enricher = (
    mapper.enricher.with_data(group_by="nearest_intersection", values_from="numfloors")
    .aggregate_by(method="mean", output_column="avg_floors")
    .build()
)
enriched_layer = enricher.enrich(mapped_data, layer)
enriched_layer.get_layer().head()  # Preview the enriched layer's GeoDataFrame content

## Step 7: Visualise Results

**Goal**: Display the enriched data on an interactive map for exploration.

**Input**: The enriched layer's GeoDataFrame from Step 6, obtained with `enriched_layer.get_layer()`.

**Output**: An interactive Folium map showing average floors per intersection with a dark theme.

The `visual` module creates an interactive map of type `Interactive` using the `CartoDB dark_matter` tile style, displaying the `avg_floors` column.

In [None]:
fig = (
    mapper.visual.with_type("Interactive")
    .with_style({"tiles": "CartoDB dark_matter"})
    .show(columns=["avg_floors"])  # Show the avg_floors column
    .render(enriched_layer.get_layer())
)
fig  # Display the map

## Conclusion

Congratulations! You’ve completed a full UrbanMapper workflow, step-by-step. You’ve transformed raw PLUTO data into a visually rich map of average building floors per intersection in Downtown Brooklyn. For a more streamlined approach, check out the Pipeline End-To-End notebook!