# Urban Pipeline Basics

Welcome to this Basics notebook on the `UrbanPipeline` class 👋, a cracking little tool in UrbanMapper that lets you bundle all your workflow steps into one neat package.

In this notebook, we’ll:

- Get to grips with what the `UrbanPipeline` does.
- Build a simple pipeline with a few key steps.
- Run it and show off the results.

Let’s get started! 🌟

```python
import urban_mapper as um
from urban_mapper.pipeline import UrbanPipeline

# Fire up UrbanMapper
mapper = um.UrbanMapper()
```

## What’s the `UrbanPipeline` All About?

The `UrbanPipeline` class is like the conductor of an orchestra: it brings together all the UrbanMapper steps (loading data, creating layers, imputing missing bits, filtering, enriching, and visualising) and makes them play in harmony. For the ML enthusiasts, it mimics what scikit-learn's `Pipeline` does. You define your steps, pop them into the pipeline, and it handles the rest. It’s brilliant for keeping your workflow tidy and repeatable!
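If you haven't met scikit-learn's `Pipeline`, the core idea behind the analogy is a list of named steps run one after another, each feeding the next. Here is a toy, standard-library sketch of that idea. `ToyPipeline` and its `run()` method are hypothetical and not UrbanMapper's implementation; in UrbanMapper each step (layer, loader, imputer and so on) plays a specialised role rather than being a plain function.

```python
from typing import Any, Callable, List, Tuple

# Hypothetical stand-in: each step is a (name, callable) pair, echoing the
# (name, step) tuples passed to UrbanPipeline and to scikit-learn's Pipeline.
Step = Tuple[str, Callable[[Any], Any]]


class ToyPipeline:
    """A toy illustration of chaining named steps; not UrbanMapper code."""

    def __init__(self, steps: List[Step]) -> None:
        names = [name for name, _ in steps]
        if len(names) != len(set(names)):
            raise ValueError("step names must be unique")
        self.steps = steps

    def run(self, data: Any) -> Any:
        # Feed each step's output into the next step, in declaration order.
        for _name, step in self.steps:
            data = step(data)
        return data


toy_pipeline = ToyPipeline([
    ("drop_missing", lambda rows: [r for r in rows if r is not None]),
    ("double", lambda rows: [2 * r for r in rows]),
    ("total", sum),
])
print(toy_pipeline.run([1, None, 3]))  # prints 8
```

Keeping a name next to each step is what makes the steps addressable later on, and rejecting duplicate names keeps that lookup unambiguous.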

## Setting Up a Simple Pipeline

Let’s build a pipeline that does the following:

- Loads PLUTO data from a CSV file.
- Creates a street intersections layer for Downtown Brooklyn.
- Imputes missing coordinates.
- Filters data to the layer’s bounding box.
- Enriches the layer with average building floors.
- Sets up an interactive map to visualise it all.
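To make the filtering step a bit more concrete: keeping only the data inside the layer's bounding box boils down to a range check on each coordinate. Below is a minimal sketch. It assumes the box is given as `(min_lon, min_lat, max_lon, max_lat)` in the same coordinate system as the data, and that points on the edge are kept. `filter_to_bbox` is a hypothetical helper, and the real `BoundingBoxFilter` may behave differently, for example in how it treats edges or coordinate systems.

```python
from typing import Dict, List, Tuple

# Hypothetical bounding box layout: (min_lon, min_lat, max_lon, max_lat).
BBox = Tuple[float, float, float, float]


def filter_to_bbox(rows: List[Dict[str, float]], bbox: BBox) -> List[Dict[str, float]]:
    """Keep only rows whose longitude/latitude fall inside the box (edges included)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return [
        row
        for row in rows
        if min_lon <= row["longitude"] <= max_lon
        and min_lat <= row["latitude"] <= max_lat
    ]


# Toy coordinates for illustration only, not real PLUTO records.
box = (-73.99, 40.68, -73.97, 40.70)
rows = [
    {"longitude": -73.98, "latitude": 40.69},  # inside the box
    {"longitude": -73.95, "latitude": 40.69},  # east of the box
]
print(filter_to_bbox(rows, box))  # prints [{'longitude': -73.98, 'latitude': 40.69}]
```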

We’ll define each step and slot them into our pipeline.

```python
# Define the steps
urban_layer = (
    mapper.urban_layer.with_type("streets_intersections")
    .from_place("Downtown Brooklyn, New York City, USA", network_type="drive")
    .with_mapping(
        longitude_column="longitude",
        latitude_column="latitude",
        output_column="nearest_intersection",
        threshold_distance=50,  # Optional: distance threshold (50) for nearest-intersection mapping.
        # The distance unit depends on the coordinate reference system (CRS); see the documentation.
    )
    .build()
)

loader = (
    mapper.loader.from_file("./pluto.csv").with_columns("longitude", "latitude").build()
)

imputer = (
    mapper.imputer.with_type("SimpleGeoImputer")
    .on_columns("longitude", "latitude")
    .build()
)

filter_step = mapper.filter.with_type("BoundingBoxFilter").build()

enricher = (
    mapper.enricher.with_data(group_by="nearest_intersection", values_from="numfloors")
    .aggregate_by(method="mean", output_column="avg_floors")
    .build()
)

visualiser = (
    mapper.visual.with_type("Interactive")
    .with_style({"tiles": "CartoDB dark_matter"})
    .build()
)

# Assemble the pipeline.
# Each step is a (name, step) tuple.
pipeline = UrbanPipeline(
    [
        ("urban_layer", urban_layer),
        ("loader", loader),
        ("imputer", imputer),
        ("filter", filter_step),
        ("enricher", enricher),
        ("visualiser", visualiser),
    ]
)

# We defined each step separately above for clarity, but you can also build them inline.
# For example, with only the urban layer step:

# pipeline = UrbanPipeline([
#     ("urban_layer", (
#         mapper.urban_layer
#         .with_type("streets_intersections")
#         .from_place("Downtown Brooklyn, New York City, USA", network_type="drive")
#         .with_mapping(
#             longitude_column="longitude",
#             latitude_column="latitude",
#             output_column="nearest_intersection",
#             threshold_distance=50
#         )
#         .build()
#     )),
#     # Add the other steps here
# ])

# Let's preview our urban pipeline workflow
pipeline.preview()
```
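The `with_mapping(...)` call tells the urban layer to tag each record with its nearest street intersection, and `threshold_distance=50` caps how far away that intersection may be. Below is a brute-force sketch of the idea. It assumes planar x/y coordinates (for example a projected CRS in metres), straight-line distance, and that a record with no intersection inside the threshold is left unmatched. `nearest_within` is hypothetical, and UrbanMapper's own matching may compute and index distances differently.

```python
import math
from typing import Dict, Optional, Tuple

# Hypothetical intersections keyed by ID, in planar x/y coordinates (metres assumed).
Intersections = Dict[str, Tuple[float, float]]


def nearest_within(
    point: Tuple[float, float],
    intersections: Intersections,
    threshold: Optional[float] = None,
) -> Optional[str]:
    """Return the ID of the closest intersection, or None if it lies beyond threshold."""
    best_id, best_dist = None, math.inf
    for node_id, (x, y) in intersections.items():
        dist = math.hypot(point[0] - x, point[1] - y)  # straight-line distance
        if dist < best_dist:
            best_id, best_dist = node_id, dist
    if threshold is not None and best_dist > threshold:
        return None  # too far from every intersection to be matched
    return best_id


nodes = {"A": (0.0, 0.0), "B": (100.0, 0.0)}
print(nearest_within((30.0, 40.0), nodes, threshold=50))  # prints A (distance 50.0)
print(nearest_within((0.0, 80.0), nodes, threshold=50))   # prints None (distance 80.0)
```

A linear scan is fine for a handful of points; for thousands of records a spatial index would be the usual choice.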

## Running the Pipeline

Time to put it to work! We’ll use `compose_transform` to run the entire pipeline in one go: loading, imputing, filtering, mapping, and enriching, all sorted. Then we’ll visualise the results with a snazzy interactive map.

Note, however, that you could also do this in two steps, first calling `compose()` and then `transform()`; we combine them here for simplicity.
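Splitting the work into `compose()` and `transform()` echoes scikit-learn, where `fit_transform()` combines `fit()` and `transform()`. Here is a toy sketch of such a split. It assumes `compose()` wires the steps together and `transform()` executes them; `ToyComposable` is hypothetical. Unlike `UrbanPipeline.compose_transform()`, it takes its input data as an argument instead of reading it through a loader step.

```python
from typing import Any, Callable, List, Optional, Tuple


class ToyComposable:
    """Hypothetical two-phase pipeline: compose() prepares, transform() runs."""

    def __init__(self, steps: List[Tuple[str, Callable[[Any], Any]]]) -> None:
        self.steps = steps
        self._composed: Optional[Callable[[Any], Any]] = None

    def compose(self) -> "ToyComposable":
        # Phase 1 (assumed): wire the steps into a single callable.
        def run(data: Any) -> Any:
            for _name, step in self.steps:
                data = step(data)
            return data

        self._composed = run
        return self

    def transform(self, data: Any) -> Any:
        # Phase 2: execute, refusing to run before compose() has been called.
        if self._composed is None:
            raise RuntimeError("call compose() before transform()")
        return self._composed(data)

    def compose_transform(self, data: Any) -> Any:
        # Convenience shortcut: compose() followed by transform().
        return self.compose().transform(data)


pipe = ToyComposable([("square", lambda xs: [x * x for x in xs]), ("total", sum)])
print(pipe.compose_transform([1, 2, 3]))  # prints 14
```

The two-step form is handy when you want to inspect or tweak the composed pipeline before running it; the one-liner is just shorter.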

```python
# Execute the pipeline
mapped_data, enriched_layer = pipeline.compose_transform()

# Show the results
fig = pipeline.visualise(result_columns=["avg_floors"])
# result_columns lists the columns to display on the map.
# To display a single column, you can also pass its name as a plain string.

fig  # Displays an interactive map in your notebook
```
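The `avg_floors` column on the map comes from the enricher configured with `group_by="nearest_intersection"`, `values_from="numfloors"` and `method="mean"`, which conceptually is a group-by average. Here is a plain-Python sketch with toy records. `mean_by_group` is hypothetical, and skipping records with no matched intersection is an assumption about how unmatched data is treated.

```python
from collections import defaultdict
from typing import Dict, List


def mean_by_group(rows: List[dict], group_by: str, values_from: str) -> Dict[str, float]:
    """Average values_from per group_by key, skipping rows without a group."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for row in rows:
        key = row.get(group_by)
        if key is None:  # e.g. a record left unmatched by the distance threshold
            continue
        totals[key] += row[values_from]
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Toy records for illustration only, not real PLUTO data.
rows = [
    {"nearest_intersection": "A", "numfloors": 2},
    {"nearest_intersection": "A", "numfloors": 4},
    {"nearest_intersection": "B", "numfloors": 10},
    {"nearest_intersection": None, "numfloors": 7},
]
print(mean_by_group(rows, "nearest_intersection", "numfloors"))  # prints {'A': 3.0, 'B': 10.0}
```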

## Saving and Loading Your Pipeline

Want to keep your pipeline for later? You can save it to a file and load it back up whenever you fancy. It’s like stashing your favourite recipe for a rainy day!
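The `.joblib` extension used below suggests the pipeline is serialised with joblib. joblib's `joblib.dump(value, filename)` and `joblib.load(filename)` build on Python's pickle protocol, but treat that link to UrbanMapper's internals as an assumption. The same save/load round trip can be sketched with the standard library's `pickle`. The helper names are hypothetical, and the saved object is a toy stand-in for a pipeline.

```python
import os
import pickle
import tempfile


def save_object(obj, path: str) -> None:
    """Serialise obj to path as a binary pickle."""
    with open(path, "wb") as fh:
        pickle.dump(obj, fh)


def load_object(path: str):
    """Deserialise an object written by save_object. Only load files you trust."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Round-trip a toy stand-in for a pipeline: a list of (name, config) steps.
steps = [("loader", {"path": "./pluto.csv"}), ("filter", {"type": "BoundingBoxFilter"})]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_pipeline.pkl")
    save_object(steps, path)
    restored = load_object(path)

print(restored == steps)  # prints True
```

One caveat applies to pickle and joblib alike: loading a file can execute arbitrary code, so only load pipelines from sources you trust.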

```python
# Save the pipeline
pipeline.save("./my_pipeline.joblib")

# Load it back (from the UrbanPipeline module)
loaded_pipeline = mapper.urban_pipeline.load("./my_pipeline.joblib")

# Get all step names
step_names = loaded_pipeline.get_step_names()
print("Pipeline steps:", step_names)

# Access a specific step with get_step
urban_layer_step = loaded_pipeline.get_step("urban_layer")
print("Urban Layer step:", urban_layer_step)

# Access a step with square brackets
loader_step = loaded_pipeline["loader"]
print("Loader step:", loader_step)

# Preview the entire pipeline
loaded_pipeline.preview()
```
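The three ways of reaching into the loaded pipeline (`get_step_names()`, `get_step("urban_layer")` and `loaded_pipeline["loader"]`) all amount to looking steps up by the names you gave them. Here is a minimal sketch of such a container over `(name, step)` tuples. `NamedSteps` is hypothetical, and raising `KeyError` for an unknown name is an assumption rather than documented `UrbanPipeline` behaviour.

```python
from typing import Any, List, Tuple


class NamedSteps:
    """Hypothetical container mirroring get_step_names(), get_step() and ["name"] access."""

    def __init__(self, steps: List[Tuple[str, Any]]) -> None:
        self._steps = list(steps)

    def get_step_names(self) -> List[str]:
        # Names in the order the steps were declared.
        return [name for name, _ in self._steps]

    def get_step(self, name: str) -> Any:
        for step_name, step in self._steps:
            if step_name == name:
                return step
        raise KeyError(f"no step named {name!r}")

    def __getitem__(self, name: str) -> Any:
        # Square-bracket access is just sugar for get_step().
        return self.get_step(name)


steps = NamedSteps([("urban_layer", "layer-obj"), ("loader", "loader-obj")])
print(steps.get_step_names())  # prints ['urban_layer', 'loader']
print(steps["loader"])         # prints loader-obj
```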

## Wrapping It Up

Smashing job! 🌟 You’ve built and run your first `UrbanPipeline`, bringing together all those UrbanMapper steps into one smooth workflow. You can now reuse it, share it, or tweak it as you like.