# Welcome to OSMNxMapping ☀️!

_Chapter 6_ is the first notebook to showcase a basic, concise, and reproducible workflow using the `UrbanPipeline` class in OSMNxMapping. For the connoisseur, it mimics a `Scikit-Learn` pipeline, applied to OSMNxMapping. In other words, the steps that Chapter 1 spread across several cells are now combined into a single one.

We'll process PLUTO (Primary Land Use Tax Lot Output) building data from New York City, enrich a Manhattan road network with the average number of floors per street segment, and demonstrate how to save and load the pipeline for later use.

Of course, you can reuse the pipeline, just as you would a Scikit-Learn machine learning model.

**Goal**: Learn to:
- Import the OSMNxMapping library and necessary modules.
- Initialise an OSMNxMapping instance.
- Build a pipeline with steps for loading, preprocessing, networking, enriching, and visualising.
- Execute the pipeline in one go with `compose_transform`.
- Visualise the enriched network.
- Save the pipeline to a file.
- Load the pipeline and reuse it.

Unlike previous notebooks, we won’t use Auctus here: the data must be available locally in CSV, Shapefile, or Parquet format. We'll use a sample CSV file (`pluto.csv`). For foundational steps or alternative approaches, refer to the Chapter 1 notebook.

Let’s get started! 🚀

## Step 1: Import the Library and Modules

We begin by importing the `osmnx_mapping` library with the alias `oxm`, along with the necessary pipeline and module classes to build our workflow.

In [None]:
import osmnx_mapping as oxm
from osmnx_mapping.modules.network import OSMNxNetwork
from osmnx_mapping.modules.loader import CSVLoader
from osmnx_mapping.modules.preprocessing import CreatePreprocessor
from osmnx_mapping.modules.enricher import CreateEnricher
from osmnx_mapping.modules.visualiser import InteractiveVisualiser, StaticVisualiser
from osmnx_mapping.pipeline import UrbanPipeline
from osmnx_mapping import CreateNetwork

## Step 2: Initialise an OSMNxMapping Instance

We create an instance of `OSMNxMapping` named `pluto_buildings`. This instance will serve as the foundation for managing our pipeline and urban data analysis.

In [None]:
pluto_buildings = oxm.OSMNxMapping()

## Step 3: Build the Urban Pipeline

We construct an `UrbanPipeline` with a series of steps to process our data:

- **Network**: Query a Manhattan road network using `OSMNxNetwork`. Specify which mapping you need, i.e. `map_nearest_nodes` or `map_nearest_edges`, so that your dataset's records can be enriched with the nearest node or edge in the queried network.
- **Load**: Load PLUTO building data from a CSV file (`pluto.csv`) with `CSVLoader`.
- **Impute**: Drop rows with missing latitude/longitude using `SimpleGeoImputer`.
- **Filter**: Keep data within the network’s bounding box using `BoundingBoxFilter`.
- **Enrich**: Calculate the average number of floors per street segment with `CreateEnricher`.
- **Visualise**: Set up a `StaticVisualiser` for static output (we’ll switch to interactive later).

> **Note**: Ensure the file path passed to `CSVLoader` (here `"./../data/PLUTO/csv/pluto.csv"`) matches your local CSV file’s location. Adjust column names (`"latitude"`, `"longitude"`, `"numfloors"`) if they differ in your dataset.
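To make the **Impute** and **Filter** steps more concrete, here is a minimal standard-library sketch of the idea behind them: drop records with missing coordinates, then keep only those inside a bounding box. It is not OSMNxMapping's implementation. The helper names, the sample rows, and the rough Manhattan bounding box are all hypothetical and chosen purely for illustration.

```python
# Hypothetical records standing in for PLUTO rows; the values are made up.
rows = [
    {"latitude": 40.7580, "longitude": -73.9855, "numfloors": 12},
    {"latitude": None, "longitude": -73.9900, "numfloors": 5},     # missing latitude
    {"latitude": 40.6501, "longitude": -73.9496, "numfloors": 3},  # outside the box
    {"latitude": 40.7829, "longitude": -73.9654, "numfloors": 20},
]


def drop_missing_coordinates(records, lat_col="latitude", lon_col="longitude"):
    """Keep only the records that have both a latitude and a longitude."""
    return [
        r for r in records
        if r.get(lat_col) is not None and r.get(lon_col) is not None
    ]


def keep_within_bbox(records, bbox, lat_col="latitude", lon_col="longitude"):
    """Keep records whose coordinates fall inside (south, west, north, east)."""
    south, west, north, east = bbox
    return [
        r for r in records
        if south <= r[lat_col] <= north and west <= r[lon_col] <= east
    ]


# Rough, illustrative box around Manhattan (an assumption, not derived from
# the queried network as the real filter step would do).
manhattan_bbox = (40.70, -74.02, 40.88, -73.91)

cleaned = keep_within_bbox(drop_missing_coordinates(rows), manhattan_bbox)
print([r["numfloors"] for r in cleaned])
```

Imputing before filtering matters in this sketch: the bounding-box comparison would fail on a `None` coordinate, so rows with missing values have to be removed first.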

In [None]:
pipeline = UrbanPipeline([
    # Load the PLUTO building records from the local CSV file.
    ("load", CSVLoader(file_path="./../data/PLUTO/csv/pluto.csv")),
    # Query the Manhattan drive network and map each record to its nearest node.
    ("network", CreateNetwork()
        .with_place("Manhattan, NYC", network_type="drive")
        .with_mapping(
            mapping_type="node",
            longitude_column_name="longitude",
            latitude_column_name="latitude",
            output_column="nearest_node"
        ).build()
    ),
    # Drop rows with missing latitude/longitude.
    ("impute", CreatePreprocessor()
        .with_imputer(
            imputer_type="SimpleGeoImputer",
        ).build()
    ),
    # Keep only the records within the network's bounding box.
    ("filter", CreatePreprocessor()
        .with_filter(
            filter_type="BoundingBoxFilter"
        ).build()
    ),
    # Average `numfloors` over the records mapped to each nearest node.
    ("enrich", CreateEnricher()
        .with_data(group_by="nearest_node", values_from="numfloors")
        .aggregate_with(method="mean", edge_method="average", output_column="avg_numfloors")
        .build()
    ),
    # Static Matplotlib output.
    ("viz", StaticVisualiser())
])
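As a rough illustration of what the **Network** mapping and **Enrich** steps compute, the sketch below assigns each building to its nearest node, averages the floor counts per node, and then derives a value per edge. It uses only the standard library and a tiny hypothetical network. It is not the library's code, and it assumes that `edge_method="average"` means each edge takes the mean of its two endpoint node values.

```python
import math
from collections import defaultdict
from statistics import mean

# Hypothetical network: node id -> (latitude, longitude), plus two edges.
nodes = {
    "A": (40.7580, -73.9855),
    "B": (40.7614, -73.9776),
    "C": (40.7527, -73.9772),
}
edges = [("A", "B"), ("B", "C")]

# Hypothetical building records: (latitude, longitude, numfloors).
buildings = [
    (40.7581, -73.9850, 10),
    (40.7579, -73.9858, 20),
    (40.7615, -73.9780, 40),
    (40.7530, -73.9770, 6),
]


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    radius = 6_371_000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def nearest_node(lat, lon):
    """Return the id of the node closest to the given point."""
    return min(nodes, key=lambda n: haversine_m(lat, lon, *nodes[n]))


# Group floor counts by nearest node (the role of group_by="nearest_node").
floors_by_node = defaultdict(list)
for lat, lon, floors in buildings:
    floors_by_node[nearest_node(lat, lon)].append(floors)

# method="mean": one average per node.
avg_by_node = {node: mean(values) for node, values in floors_by_node.items()}

# Assumed meaning of edge_method="average": mean of the two endpoint values.
# Every node has at least one building here, so no lookup can fail.
avg_by_edge = {(u, v): mean([avg_by_node[u], avg_by_node[v]]) for u, v in edges}

print(avg_by_node)
print(avg_by_edge)
```

A brute-force nearest-node search like this is fine for a handful of points. A real road network would need a spatial index to stay fast.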

## Step 4: Execute the Pipeline

We run the pipeline in one concise step using `compose_transform`, which configures and executes all steps (loading, preprocessing, networking, enriching). We specify the latitude and longitude column names required for the workflow.

Note that you could also call `.compose(latitude_column_name, longitude_column_name)` and `.transform()` separately. You can assume that `compose` passes the latitude and longitude column names to every step that requires them for initialisation or usage, unless you already set them when building the pipeline.

This returns the processed data, enriched graph, nodes, and edges.

In [None]:
data, graph, nodes, edges = pipeline.compose_transform(
    latitude_column_name="latitude",
    longitude_column_name="longitude"
)
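The split between composing and transforming can be sketched with a toy pipeline. `ToyStep` and `ToyPipeline` are hypothetical classes written for this illustration and say nothing about how `UrbanPipeline` is implemented. In particular, the toy `transform` receives rows as an argument, whereas the real pipeline loads its data through a step.

```python
class ToyStep:
    """A step that needs the coordinate column names before it can run."""

    def __init__(self, name, func):
        self.name = name
        self.func = func
        self.lat_col = None
        self.lon_col = None

    def configure(self, lat_col, lon_col):
        self.lat_col, self.lon_col = lat_col, lon_col

    def run(self, rows):
        return self.func(rows, self.lat_col, self.lon_col)


class ToyPipeline:
    """Named steps, configured once by compose() and executed by transform()."""

    def __init__(self, steps):
        self.steps = steps
        self.composed = False

    def compose(self, latitude_column_name, longitude_column_name):
        # Hand the column names to every step.
        for step in self.steps:
            step.configure(latitude_column_name, longitude_column_name)
        self.composed = True
        return self

    def transform(self, rows):
        if not self.composed:
            raise RuntimeError("call compose() before transform()")
        # Feed the output of each step into the next one.
        for step in self.steps:
            rows = step.run(rows)
        return rows

    def compose_transform(self, rows, latitude_column_name, longitude_column_name):
        return self.compose(latitude_column_name, longitude_column_name).transform(rows)


def drop_missing(rows, lat_col, lon_col):
    return [r for r in rows if r[lat_col] is not None and r[lon_col] is not None]


def add_point(rows, lat_col, lon_col):
    return [{**r, "point": (r[lat_col], r[lon_col])} for r in rows]


def make_pipeline():
    return ToyPipeline([ToyStep("impute", drop_missing), ToyStep("point", add_point)])


rows = [
    {"latitude": 40.75, "longitude": -73.98},
    {"latitude": None, "longitude": -73.99},
]

# Two calls...
separate = make_pipeline().compose("latitude", "longitude").transform(rows)
# ...or one call that does both.
one_go = make_pipeline().compose_transform(rows, "latitude", "longitude")
print(separate == one_go)
```

Keeping the two phases separate lets a configured pipeline be inspected before any data is processed. The combined call is simply a convenience for the common case.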

## Step 5: Visualise the Enriched Network

We visualise the enriched network using the pipeline’s `visualise` method with the `StaticVisualiser` defined in the pipeline. This creates a static Matplotlib plot showing the average number of floors per street segment.

In [None]:
pipeline.visualise(result_columns="avg_numfloors")