# Welcome to OSMNxMapping ☀️!

_Chapter 1_ reproduces the "Getting Started" section of the [OSMNxMapping README](https://github.com/VIDA-NYU/OSMNXMapping#getting-started).

We demonstrate how to use the OSMNxMapping library to map urban data to street networks, **step by step**. We'll use the example of PLUTO (Primary Land Use Tax Lot Output) buildings in New York City to:

- Import the library.
- Initialise an OSMNxMapping instance.
- Search for datasets using Auctus.
- Load the selected dataset.
- Convert the dataset for OSMNxMapping.
- Query a road network.
- Map data to street nodes.
- Preprocess the data (imputing and filtering).
- Enrich the network with aggregated data.
- Visualise the results.

Let's dive in! 🚀

## Step 1: Import the Library

We start by importing the `osmnx_mapping` library with the alias `oxm` for convenience.

In [None]:
import osmnx_mapping as oxm

## Step 2: Initialise an OSMNxMapping Instance

Next, we create an instance of `OSMNxMapping`. This instance, named `pluto_buildings`, will manage our urban data and road network analysis. At this point, no data is loaded or queried—everything is set up for the steps ahead.

In [None]:
pluto_buildings = oxm.OSMNxMapping()

## Step 3: Search for Datasets

We use the `AuctusSearchMixin` to search for datasets related to "PLUTO" via the Auctus API. Setting `display_initial_results=True` shows an interactive grid of dataset cards in the notebook.

> **Note**: After running this cell, browse the displayed datasets and click "Select This Dataset" on the one you want to use. You can also load data manually from a file—see the `examples/` folder in the repository for different examples.

In [None]:
collection = pluto_buildings.auctus.search_datasets(search_query="PLUTO", display_initial_results=True)

## Step 3-BIS: Profile the Selected Dataset

After selecting a dataset in Step 3, we can profile it using [DataProfileVis](https://github.com/soniacq/DataProfileVis), which is integrated into [auctus_search](https://github.com/VIDA-NYU/auctus_search). The current version does not yet support editing a profile; this still needs further discussion.

In [None]:
pluto_buildings.auctus.profile_selected_dataset()

## Step 4: Load the Selected Dataset

After selecting a dataset in Step 3, we load it into memory using `load_dataset_from_auctus()`. This returns a `pandas.DataFrame` or `geopandas.GeoDataFrame` and displays an interactive table preview by default.

In [None]:
dataset = pluto_buildings.auctus.load_dataset_from_auctus()

## Step 5: Load Your Auctus Dataset into OSMNxMapping

We convert the loaded dataset into a format compatible with OSMNxMapping using `load_from_dataframe`. This method converts the data into a `geopandas.GeoDataFrame`, using the latitude and longitude columns we specify. We then display it interactively with `interactive_display` (which uses `Skrub` for interactive data exploration).

> **Note**: Adjust `"latitude"` and `"longitude"` to match your dataset's actual column names if they differ.

In [None]:
loaded_data = pluto_buildings.loader.load_from_dataframe(
    input_dataframe=dataset,
    latitude_column="latitude",  # Replace with your dataset's latitude column name
    longitude_column="longitude"  # Replace with your dataset's longitude column name
)
pluto_buildings.table_vis.interactive_display(loaded_data)
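
Conceptually, turning a table with latitude and longitude columns into geospatial data means pairing the two columns into one point geometry per row, with longitude as x and latitude as y. The sketch below illustrates that idea with plain GeoJSON-style dictionaries. It is not how `load_from_dataframe` is implemented; the helper `rows_to_point_features` and the sample row are hypothetical.

```python
def rows_to_point_features(rows, latitude_column="latitude", longitude_column="longitude"):
    """Illustrative only: build GeoJSON-style point features from plain dict rows."""
    features = []
    for row in rows:
        # Everything except the coordinate columns becomes an attribute of the point.
        properties = {
            key: value
            for key, value in row.items()
            if key not in (latitude_column, longitude_column)
        }
        features.append(
            {
                "type": "Feature",
                # GeoJSON orders coordinates as [longitude, latitude], i.e. [x, y].
                "geometry": {
                    "type": "Point",
                    "coordinates": [row[longitude_column], row[latitude_column]],
                },
                "properties": properties,
            }
        )
    return features


# Hypothetical sample row, not taken from the PLUTO dataset.
sample_rows = [
    {"address": "1 Example Plaza", "numfloors": 12, "latitude": 40.7580, "longitude": -73.9855},
]
features = rows_to_point_features(sample_rows)
print(features[0]["geometry"]["coordinates"])
```

Swapping the two columns is a common mistake here: points end up far from the road network and are later removed by the bounding box filter.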

## Step 6: Query a Road Network

We query the road network for Manhattan using `network_from_place`. Setting `render=True` displays a plot of the network.

In [None]:
graph, nodes, edges = pluto_buildings.road_networks.network_from_place("Manhattan, New York City, USA", render=True)

## Step 7: Geo Preprocessing Your Dataset

We perform two preprocessing steps:

1. **Impute Missing Values**: Drop rows with missing latitude or longitude values using the default `SimpleGeoImputer`. To check whether your data has any, use Skrub's interactive view from Step 5: click on a column such as `latitude` or `longitude` to see its distribution and the number of missing values.
2. **Filter Data**: Keep only points within the road network's bounding box using `BoundingBoxFilter`.

> **Note**: Each `PreprocessingMixin` instance can only perform one action (impute or filter), so each substep below uses its own preprocessor. For advanced preprocessing, see the `PreprocessingMixin` API.

### Substep 1: Impute Missing Values

In [None]:
loaded_data = (
    pluto_buildings.preprocessing
    .with_default_imputer(latitude_column_name="latitude", longitude_column_name="longitude")
    .transform(input_data=loaded_data)
)
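
As a rough mental model of this step, dropping rows with missing coordinates can be sketched in plain Python as below. This assumes "missing" means `None` or `NaN`; the helper `drop_missing_coordinates` and the sample rows are hypothetical and are not part of the library.

```python
import math


def drop_missing_coordinates(rows, latitude_column="latitude", longitude_column="longitude"):
    """Illustrative only: keep rows whose latitude and longitude are both present."""

    def is_present(value):
        # Treat None and float NaN as missing values.
        if value is None:
            return False
        if isinstance(value, float) and math.isnan(value):
            return False
        return True

    return [
        row
        for row in rows
        if is_present(row.get(latitude_column)) and is_present(row.get(longitude_column))
    ]


# Hypothetical sample rows: only the first one has both coordinates.
sample_rows = [
    {"building_id": 1, "latitude": 40.7580, "longitude": -73.9855},
    {"building_id": 2, "latitude": None, "longitude": -73.9900},
    {"building_id": 3, "latitude": 40.7128, "longitude": float("nan")},
]
kept_rows = drop_missing_coordinates(sample_rows)
print([row["building_id"] for row in kept_rows])
```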

### Substep 2: Filter Data Within Bounding Box

Filtering is useful in most cases. For example, if your data covers all of NYC but your analysis focuses on Manhattan, filter out the points outside Manhattan; otherwise, outliers could skew the rest of your urban analysis pipeline.

Below we use `CreatePreprocessor`, which does the same job as `pluto_buildings.preprocessing` but creates a new instance. This is needed because a single preprocessing instance cannot hold two different actions (imputer and filter). For consistency we could have used `CreatePreprocessor` in both substeps, but we wanted to show both options. With a single preprocessing action, we recommend the approach from Substep 1; with more than one, `CreatePreprocessor` is easier.

In [None]:
from osmnx_mapping.modules.preprocessing import CreatePreprocessor

loaded_data = (
    CreatePreprocessor().with_filter(
        filter_type="BoundingBoxFilter",
        nodes=nodes
    ).build().transform(loaded_data)
)
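
To make the idea concrete, here is a minimal sketch of bounding box filtering in plain Python. It assumes the box is the min/max extent of the node coordinates, with `x` as longitude and `y` as latitude (the convention OSMnx uses for node tables). The helper functions and sample values are hypothetical and do not reflect how `BoundingBoxFilter` is implemented.

```python
def bounding_box(nodes):
    """Return (min_lon, min_lat, max_lon, max_lat) for a list of node dicts."""
    lons = [node["x"] for node in nodes]
    lats = [node["y"] for node in nodes]
    return min(lons), min(lats), max(lons), max(lats)


def filter_within_bounding_box(rows, nodes, latitude_column="latitude", longitude_column="longitude"):
    """Illustrative only: keep rows that fall inside the nodes' bounding box."""
    min_lon, min_lat, max_lon, max_lat = bounding_box(nodes)
    return [
        row
        for row in rows
        if min_lon <= row[longitude_column] <= max_lon
        and min_lat <= row[latitude_column] <= max_lat
    ]


# Hypothetical nodes marking two opposite corners of a rough Manhattan extent.
sample_nodes = [
    {"x": -74.02, "y": 40.70},
    {"x": -73.91, "y": 40.88},
]
# Hypothetical points: one inside the extent, one south of it.
sample_rows = [
    {"name": "inside", "latitude": 40.7580, "longitude": -73.9855},
    {"name": "outside", "latitude": 40.6782, "longitude": -73.9442},
]
inside_rows = filter_within_bounding_box(sample_rows, sample_nodes)
print([row["name"] for row in inside_rows])
```

A bounding box is a coarse filter: points inside the rectangle but outside the actual study area (for example, across a river) still pass.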

## Step 8: Map the Loaded Data to the Nearest Street Nodes

We map each data point (e.g., a building) to the nearest node in the road network using `map_nearest_street`. This adds a column (default: `"nearest_node"`) to `loaded_data` with the ID of the closest node, which is crucial for the enrichment step.

In [None]:
loaded_data = pluto_buildings.road_networks.map_nearest_street(
    data=loaded_data,
    longitude_column="longitude",
    latitude_column="latitude"
)
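
For intuition, nearest-node mapping can be sketched as a brute-force search that picks, for each point, the node with the smallest great-circle distance. This is an illustration under stated assumptions, not the library's implementation: the helpers `haversine_m` and `nearest_node` and the sample coordinates are hypothetical, and production code would typically use a spatial index instead of comparing every pair.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    radius_m = 6_371_000  # Mean Earth radius.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


def nearest_node(lat, lon, nodes):
    """Illustrative only: return the ID of the node closest to (lat, lon)."""
    return min(
        nodes,
        key=lambda node_id: haversine_m(lat, lon, nodes[node_id]["y"], nodes[node_id]["x"]),
    )


# Hypothetical nodes keyed by ID, with x = longitude and y = latitude.
sample_nodes = {
    101: {"y": 40.7580, "x": -73.9855},
    102: {"y": 40.7484, "x": -73.9857},
}
# Hypothetical buildings, each close to one of the nodes above.
sample_rows = [
    {"name": "building_a", "latitude": 40.7577, "longitude": -73.9857},
    {"name": "building_b", "latitude": 40.7490, "longitude": -73.9860},
]
for row in sample_rows:
    row["nearest_node"] = nearest_node(row["latitude"], row["longitude"], sample_nodes)

print([(row["name"], row["nearest_node"]) for row in sample_rows])
```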

## Step 9: Enrich the Network with the Loaded Data

We enrich the network by calculating the average number of floors (`"numfloors"`) per street segment. We use the `with_default` method for simplicity, configuring the enricher to aggregate data by `"nearest_node"` using the `mean` method.

Then, we apply the enricher to the network with `enrich_network`.

> **Tip**: For more advanced configurations, see the `EnricherMixin` API.

In [None]:
# Configure a default enricher
pluto_buildings.enricher.with_default(
    group_by_column="nearest_node",
    values_from_column="numfloors",
    output_column="avg_numfloors",
    method="mean",
    edge_method="average"
)

# Apply the enricher to the network
enriched_data, graph, nodes, edges = pluto_buildings.enricher.enrich_network(
    input_data=loaded_data,
    input_graph=graph,
    input_nodes=nodes,
    input_edges=edges
)
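
The aggregation behind this step can be pictured in two stages: group the rows by `nearest_node` and take the mean of `numfloors`, then give each edge a value derived from its two endpoint nodes. The sketch below assumes that `edge_method="average"` means averaging the endpoint values and that an edge with only one known endpoint takes that endpoint's value; both are assumptions made for illustration, and the helper functions are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_node(rows, group_by_column, values_from_column):
    """Group rows by node ID and return the mean value per node."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[group_by_column]].append(row[values_from_column])
    return {node_id: mean(values) for node_id, values in grouped.items()}


def average_onto_edges(edges, node_values):
    """Assumed edge rule: average the values of the endpoints that have one."""
    edge_values = {}
    for u, v in edges:
        endpoint_values = [node_values[n] for n in (u, v) if n in node_values]
        edge_values[(u, v)] = mean(endpoint_values) if endpoint_values else None
    return edge_values


# Hypothetical mapped rows: two buildings on node 101, one on node 102.
sample_rows = [
    {"nearest_node": 101, "numfloors": 10},
    {"nearest_node": 101, "numfloors": 20},
    {"nearest_node": 102, "numfloors": 4},
]
# Hypothetical edges as (start_node, end_node) pairs.
sample_edges = [(101, 102), (102, 103), (103, 104)]

node_values = aggregate_by_node(sample_rows, "nearest_node", "numfloors")
edge_values = average_onto_edges(sample_edges, node_values)
print(node_values)
print(edge_values)
```

Edges whose endpoints have no mapped data end up with `None` here, which corresponds to segments that would show no value in a visualisation.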

## Step 10: Visualise Your Enriched Network

We visualise the enriched network in two ways:

1. **Static Visualisation**: Using the default `StaticVisualiser` to create a Matplotlib plot.
2. **Interactive Visualisation**: Using `InteractiveVisualiser` for a Folium map.

> **Note**: Ensure you have the necessary Jupyter extensions installed for interactive visualisations (see the README's Installation section).

### Substep 1: Static Visualisation (Matplotlib)

In [None]:
viz = pluto_buildings.visual.visualise(graph=graph, nodes=nodes, edges=edges, result_columns="avg_numfloors")
viz

### Substep 2: Interactive Visualisation (Folium)

In [None]:
from osmnx_mapping.modules.visualiser.visualisers import InteractiveVisualiser

pluto_buildings = oxm.OSMNxMapping()  # Fresh instance, so the static and interactive visualisers do not conflict.

viz = pluto_buildings.visual(visualiser=InteractiveVisualiser()).visualise(graph=graph, nodes=nodes, edges=edges, result_columns="avg_numfloors")
viz

## Conclusion

Voilà! 🥐 You've successfully mapped the PLUTO buildings dataset to Manhattan's street network, enriched it with the average number of floors per street segment, and visualised the results both statically and interactively. 🎉

This is just the beginning—explore the [OSMNxMapping API](https://github.com/VIDA-NYU/OSMNXMapping#api) and the `examples/` directory for more advanced features and use cases. Happy urban mapping! 🌆