# 🌇 Welcome to the `Pipeline` `End-To-End` example!

This notebook demonstrates a streamlined UrbanMapper workflow using the `UrbanPipeline` class, replicating the step-by-step example with PLUTO data in `Downtown Brooklyn`. We’ll define all steps upfront, execute them in one go, and visualise the results.

Essentially, this notebook covers the `Basics/[7]urban_pipeline.ipynb` example.

**Data source used**:
- PLUTO data from NYC Open Data: https://www.nyc.gov/content/planning/pages/resources/datasets/mappluto-pluto-change

**What you'll learn**:
- Assemble a pipeline to load, process, enrich, and visualise PLUTO data.
- Execute the pipeline efficiently.
- Display an interactive map of average floors per intersection.

The `UrbanPipeline` simplifies complex workflows by chaining steps together, enhancing reusability and clarity.

In [None]:
from urban_mapper import UrbanMapper
from urban_mapper.pipeline import UrbanPipeline

# Initialise UrbanMapper
um = UrbanMapper()

## Step 1: Define the Pipeline

**Goal**: Set up all components of the workflow in a single pipeline.

**Input**: Configurations for each UrbanMapper module.

**Output**: An `UrbanPipeline` object ready to process data.

We define six steps (urban layer, loader, imputer, filter, enricher, and visualiser), each with a specific role:
- **Urban Layer**: Street intersections in Downtown Brooklyn.
- **Loader**: PLUTO data from CSV.
- **Imputer**: Fills missing coordinates.
- **Filter**: Trims data to the urban layer's bounding box.
- **Enricher**: Adds average floors per intersection.
- **Visualiser**: Prepares an interactive map.
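
To make the enricher's role concrete, here is a minimal sketch of a per-group mean in plain Python. It only illustrates the aggregation idea and is not UrbanMapper's implementation; the sample rows and the helper name `mean_by_group` are hypothetical.

```python
from collections import defaultdict


def mean_by_group(records, group_key, value_key):
    """Return {group: mean(value)} for a list of dict records (hypothetical helper)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for record in records:
        value = record.get(value_key)
        if value is None:  # skip rows with no value to aggregate
            continue
        totals[record[group_key]] += value
        counts[record[group_key]] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Made-up sample rows: each lot is already tagged with its nearest intersection.
lots = [
    {"nearest_intersection": 1, "numfloors": 4.0},
    {"nearest_intersection": 1, "numfloors": 6.0},
    {"nearest_intersection": 2, "numfloors": 10.0},
    {"nearest_intersection": 2, "numfloors": None},
]

avg_floors = mean_by_group(lots, "nearest_intersection", "numfloors")
print(avg_floors)
```

In the pipeline, the equivalent result is written to the `avg_floors` column configured in the enricher.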

In [None]:
urban_layer = (
    um.urban_layer.with_type("streets_intersections")
    .from_place("Downtown Brooklyn, New York City, USA", network_type="drive")
    .with_mapping(
        longitude_column="longitude",
        latitude_column="latitude",
        output_column="nearest_intersection",
        threshold_distance=50,
    )  # `with_mapping` tells `map_nearest_layer` how to map the urban data onto the urban layer.
    .build()
)

loader = (
    um.loader.from_file("./pluto.csv").with_columns("longitude", "latitude").build()
)

imputer = (
    um.imputer.with_type("SimpleGeoImputer").on_columns("longitude", "latitude").build()
)

filter_step = um.filter.with_type("BoundingBoxFilter").build()

enricher = (
    um.enricher.with_data(group_by="nearest_intersection", values_from="numfloors")
    .aggregate_by(method="mean", output_column="avg_floors")
    .build()
)

visualiser = (
    um.visual.with_type("Interactive")
    .with_style({"tiles": "CartoDB dark_matter", "colorbar_text_color": "white"})
    .build()
)

# Assemble the pipeline
pipeline = UrbanPipeline(
    [
        ("urban_layer", urban_layer),
        ("loader", loader),
        ("imputer", imputer),
        ("filter", filter_step),
        ("enricher", enricher),
        ("visualiser", visualiser),
    ]
)

# Let's preview the urban pipeline we just created
pipeline.preview()
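
The `with_mapping` configuration above assigns each record to its nearest intersection, subject to a distance threshold. The sketch below shows that idea on planar coordinates, assuming the threshold uses the same unit as the coordinates. The helper `nearest_within` and the sample points are hypothetical, and records beyond the threshold are simply left unassigned here; this notebook does not state how UrbanMapper treats them.

```python
import math


def nearest_within(point, candidates, threshold):
    """Return the id of the closest candidate within `threshold`, else None.

    `candidates` maps an id to an (x, y) pair. Hypothetical helper working on
    planar coordinates in the same unit as `threshold`.
    """
    best_id, best_dist = None, math.inf
    for cand_id, (cx, cy) in candidates.items():
        dist = math.hypot(point[0] - cx, point[1] - cy)
        if dist < best_dist:
            best_id, best_dist = cand_id, dist
    return best_id if best_dist <= threshold else None


# Made-up intersections on a flat plane.
intersections = {"A": (0.0, 0.0), "B": (100.0, 0.0)}

print(nearest_within((10.0, 0.0), intersections, threshold=50))    # closest to A
print(nearest_within((70.0, 0.0), intersections, threshold=50))    # closest to B
print(nearest_within((50.0, 200.0), intersections, threshold=50))  # too far from both
```

A linear scan like this is fine for a handful of points; real geospatial code would typically use a projected coordinate system and a spatial index.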

## Step 2: Execute the Pipeline

**Goal**: Process the data through all defined steps in one operation.

**Input**: The `UrbanPipeline` object from Step 1.

**Output**: A mapped GeoDataFrame and an enriched `UrbanLayer` with processed data.

The `compose_transform` method runs the entire workflow (loading, imputing, filtering, mapping, and enriching) in a single call, passing each step's output to the next.

In [None]:
mapped_data, enriched_layer = pipeline.compose_transform()
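
As a rough mental model of running named steps in one call, the toy sketch below feeds each step's output into the next. It is a generic illustration of the chaining pattern under simplified assumptions, not how `UrbanPipeline` is implemented; `run_steps`, the two step functions, and the sample rows are all hypothetical.

```python
def run_steps(steps, data):
    """Apply each (name, function) pair in order, feeding each output to the next step."""
    for name, step in steps:
        data = step(data)
        print(f"{name}: {len(data)} rows")
    return data


def drop_missing(rows):
    # Discard rows that have no x coordinate.
    return [row for row in rows if row["x"] is not None]


def keep_in_range(rows):
    # Keep only rows whose x coordinate lies in [0, 10].
    return [row for row in rows if 0 <= row["x"] <= 10]


toy_steps = [("drop_missing", drop_missing), ("keep_in_range", keep_in_range)]
toy_rows = [{"x": 1}, {"x": None}, {"x": 42}, {"x": 7}]

result = run_steps(toy_steps, toy_rows)
print(result)
```

Because the steps are declared as an ordered list of `(name, step)` pairs, the same definition can be rerun on new data without rewriting the glue code between steps.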

## Step 3: Visualise Results

**Goal**: Present the enriched data on an interactive map.

**Input**: The enriched layer from Step 2 and columns to display (`avg_floors`).

**Output**: An interactive Folium map showing average floors per intersection.

The pipeline’s `visualise` method leverages the pre-configured visualiser to generate the map directly from the enriched layer.

In [None]:
fig = pipeline.visualise(["avg_floors"])
fig  # Display the interactive map

## Step 4: Save and Load Pipeline

**Goal**: Preserve the pipeline for future use or sharing.

**Input**: A file path (`./my_pipeline.dill`) for saving.

**Output**: A saved pipeline file and a reloaded `UrbanPipeline` object.

Saving with `save` and loading with `load` lets you reuse or share your workflow without redefining its steps.

In [None]:
# Save the pipeline
pipeline.save("./my_pipeline.dill")

# Load it back
loaded_pipeline = UrbanPipeline.load("./my_pipeline.dill")

# Preview the loaded pipeline
loaded_pipeline.preview()

# Visualise with the loaded pipeline
fig = loaded_pipeline.visualise(["avg_floors"])
fig  # Display the interactive map
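
The `.dill` extension suggests the file is written with the third-party `dill` serialiser, though this notebook does not say so. As a minimal sketch of the same save-then-load round trip, the example below uses the standard-library `pickle` module on a hypothetical stand-in object (`workflow`); it does not reproduce what `UrbanPipeline.save` does internally.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for a configured workflow: an ordered list of named settings.
workflow = [
    ("loader", {"path": "./pluto.csv"}),
    ("enricher", {"method": "mean", "output_column": "avg_floors"}),
]

with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "workflow.pkl"

    # Serialise the object to disk...
    with target.open("wb") as handle:
        pickle.dump(workflow, handle)

    # ...and read it back as a new, equal object.
    with target.open("rb") as handle:
        restored = pickle.load(handle)

print(restored == workflow)  # same content
print(restored is workflow)  # but a distinct object
```

Only load serialised files from sources you trust, since unpickling can execute arbitrary code.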

## Conclusion

Well done! Using `UrbanPipeline`, you’ve efficiently processed and visualised PLUTO data with less code than the step-by-step approach. This method shines for its simplicity and reusability. Compare it with the Step-by-Step notebook for a detailed breakdown of each stage!